With the rapid adoption of electric vehicles (EVs) globally, enhancing the charging efficiency of EV power batteries has become a critical focus. Fast charging, while reducing charging time, often leads to challenges such as thermal safety risks and accelerated battery degradation. Traditional charging strategies, like Constant Current-Constant Voltage (CC-CV), lack adaptability and fail to optimize multiple objectives simultaneously. In this study, we propose an intelligent charging method based on deep reinforcement learning (DRL) to address these issues. Our approach integrates a thermoelectric coupling model for accurate battery state representation, employs an Extended Kalman Filter (EKF) for precise State of Charge (SOC) estimation, and utilizes the Deep Deterministic Policy Gradient (DDPG) algorithm to optimize the charging strategy. By incorporating a multi-objective reward function, Ornstein-Uhlenbeck (OU) noise exploration, experience replay, and soft update techniques, we achieve a balance between charging time, energy loss, and thermal safety. Simulation results on the MATLAB platform demonstrate that our method reduces the maximum internal battery temperature by approximately 2°C while maintaining comparable charging time to conventional methods, thereby optimizing both efficiency and safety for EV battery systems.
The proliferation of EVs has intensified the demand for fast-charging technologies, particularly for EV power batteries. However, high-current charging can induce adverse effects such as increased internal temperature, solid electrolyte interface (SEI) growth, and lithium plating, which compromise battery lifespan and safety. Existing methods, including rule-based and model-based strategies, often overlook the dynamic interplay between electrical and thermal behaviors. Our work bridges this gap with a framework that uses deep reinforcement learning to dynamically adjust charging currents based on real-time battery states, improving charging efficiency while enhancing the longevity and reliability of EV battery systems.

To model the behavior of EV power batteries, we construct a thermoelectric coupling model that combines an integer-order equivalent circuit model (first-order RC) with a dual-state lumped parameter thermal model. The electrical part captures polarization effects and dynamic voltage responses, while the thermal part simulates core and surface temperature variations. The discrete-time equations for the equivalent circuit model are derived as follows. The output voltage $U_t(k)$ at time $k$ is given by:
$$U_t(k) = U_{oc}(k) - U_p(k) - R_0 I(k)$$
where $U_{oc}(k)$ is the open-circuit voltage, $U_p(k)$ is the polarization voltage, $R_0$ is the ohmic resistance, and the current $I(k)$ is taken as positive during discharge. The state-space representation for the RC network is:
$$U_p(k) = \left(1 - \frac{t_s}{R_p C_p}\right) U_p(k-1) + \frac{t_s}{C_p} I(k)$$
Here, $t_s$ is the sampling time, $I(k)$ is the current, $R_p$ is the polarization resistance, and $C_p$ is the polarization capacitance. The parameters $R_0$ (ohmic resistance), $R_p$, and $C_p$ are identified with the recursive least squares (RLS) algorithm at different temperatures and SOC levels. The transfer function in the continuous domain is:
$$G(s) = \frac{U_{oc}(s) - U_t(s)}{I(s)} = \frac{R_0 + R_p + R_0 \tau_0 s}{1 + \tau_0 s}$$
where $\tau_0 = R_p C_p$ is the time constant. Note that $G(s)$ is the battery impedance: its input is the current and its output is the voltage drop $E(s) = U_{oc}(s) - U_t(s)$, not the terminal voltage itself. Discretizing this transfer function yields a difference equation with parameters $a_1$, $b_1$, and $b_2$:
$$E(k) = a_1 E(k-1) + b_1 I(k) + b_2 I(k-1)$$
The relationships between discrete and continuous parameters are:
$$R_0 = -\frac{b_2}{a_1}, \quad R_p = \frac{a_1 b_1 + b_2}{a_1 (1 - a_1)}, \quad C_p = \frac{t_s\, a_1}{a_1 b_1 + b_2}$$
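As a concrete illustration, the RLS identification step can be sketched as follows on synthetic data. The forgetting factor, initial covariance, and the "true" parameter values are illustrative assumptions, not the paper's identified values.

```python
import numpy as np

def rls_identify(y, i_meas, lam=0.99):
    """Recursive least squares with forgetting factor lam.

    Model: y[k] = a1*y[k-1] + b1*i[k] + b2*i[k-1]
    Returns the trajectory of the estimate theta = [a1, b1, b2].
    """
    theta = np.zeros(3)                  # initial parameter guess
    P = np.eye(3) * 1e3                  # large initial covariance
    history = []
    for k in range(1, len(y)):
        phi = np.array([y[k-1], i_meas[k], i_meas[k-1]])  # regressor
        gain = P @ phi / (lam + phi @ P @ phi)            # RLS gain
        theta = theta + gain * (y[k] - phi @ theta)       # correct estimate
        P = (P - np.outer(gain, phi) @ P) / lam           # covariance update
        history.append(theta.copy())
    return np.array(history)

# Noise-free synthetic voltage-drop response of a hypothetical RC model.
rng = np.random.default_rng(0)
a1_true, b1_true, b2_true = 0.95, 0.012, -0.010
i_meas = rng.uniform(0.5, 3.0, 500)      # excitation current profile (A)
y = np.zeros(500)
for k in range(1, 500):
    y[k] = a1_true*y[k-1] + b1_true*i_meas[k] + b2_true*i_meas[k-1]

est = rls_identify(y, i_meas)
print(est[-1])   # converges toward [0.95, 0.012, -0.010]
```

With noise-free, persistently exciting data the estimate converges to the true parameters; in practice the forgetting factor lets the filter track the slow drift of $R_0$, $R_p$, and $C_p$ with temperature and SOC.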
For the thermal model, the dual-state lumped parameter approach describes the core temperature $T_i$ and surface temperature $T_s$ dynamics. The state-space equations are:
$$C_e \frac{d(T_i - T_a)}{dt} = Q_g - \frac{T_i - T_s}{R_i}$$
$$C_s \frac{d(T_s - T_a)}{dt} = \frac{T_i - T_s}{R_i} - \frac{T_s - T_a}{R_o}$$
where $T_a$ is the ambient temperature, $R_i$ and $R_o$ are thermal resistances, $C_e$ and $C_s$ are thermal capacitances, and $Q_g$ is the heat generation rate. The parameters identified via RLS are summarized in Table 1.
| Parameter | Value |
|---|---|
| $R_i$ (K/W) | 4.72 |
| $R_o$ (K/W) | 6.49 |
| $C_e$ (J/K) | 117.65 |
| $C_s$ (J/K) | 13.40 |
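Using the Table 1 values, the coupled thermal dynamics can be stepped forward in time with a simple Euler scheme. The sketch below assumes a heat generation term $Q_g = I^2 R_0 + U_p^2/R_p$ (ohmic plus polarization loss) with hypothetical electrical parameters; the heat-generation model and the current/polarization values are illustrative, not the paper's.

```python
# Dual-state lumped thermal model (forward Euler) with the RLS-identified
# values from Table 1. R0, Rp, and the charging profile are hypothetical.
R_i, R_o = 4.72, 6.49      # thermal resistances (K/W)
C_e, C_s = 117.65, 13.40   # thermal capacitances (J/K)
R0, Rp = 0.05, 0.03        # hypothetical electrical resistances (ohm)
t_s, T_a = 1.0, 25.0       # sampling time (s), ambient temperature (degC)

def thermal_step(T_i, T_s, Q_g):
    """Advance core/surface temperatures by one sampling interval."""
    dT_i = (Q_g - (T_i - T_s) / R_i) / C_e           # core energy balance
    dT_s = ((T_i - T_s) / R_i - (T_s - T_a) / R_o) / C_s  # surface balance
    return T_i + t_s * dT_i, T_s + t_s * dT_s

T_i, T_s = T_a, T_a
for k in range(1800):              # 30 min of constant-current charging
    I, Up = 3.0, 0.06              # hypothetical current (A) and polarization voltage (V)
    Q_g = I**2 * R0 + Up**2 / Rp   # ohmic + polarization heat (W)
    T_i, T_s = thermal_step(T_i, T_s, Q_g)
print(round(T_i, 2), round(T_s, 2))
```

At steady state the core sits $Q_g R_i$ above the surface and the surface $Q_g R_o$ above ambient, so the core temperature is bounded by $T_a + Q_g (R_i + R_o)$ for a constant heat load.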
Accurate SOC estimation is crucial for effective charging management of EV power batteries. We employ the EKF algorithm to estimate SOC dynamically, combining the advantages of open-circuit voltage methods and ampere-hour integration. The nonlinear state-space model for the battery is:
$$x_k = A x_{k-1} + B u_{k-1} + \omega_{k-1}$$
$$y_k = h(x_k, u_k) + \nu_k$$
where $x_k = [U_p(k), S_{SOC}(k)]^T$ is the state vector, $y_k = [U_t(k)]$ is the measurement, $u_k = [I(k)]$ is the input (positive during discharge), and $\omega_k$ and $\nu_k$ are the process and measurement noises, respectively. The observation function $h(x_k, u_k) = U_{oc}(S_{SOC}(k)) - U_p(k) - R_0 I(k)$ is nonlinear in SOC through the open-circuit voltage. The state matrices and the linearized measurement matrices are:
$$A = \begin{bmatrix} 1 - \frac{t_s}{R_p C_p} & 0 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} \frac{t_s}{C_p} \\ -\frac{t_s}{Q_n} \end{bmatrix}, \quad C_k = \begin{bmatrix} -1 & \frac{\partial U_{oc}}{\partial S_{SOC}} \end{bmatrix}, \quad D = \begin{bmatrix} -R_0 \end{bmatrix}$$
where $Q_n$ is the nominal capacity and the OCV slope in $C_k$ is evaluated at the current SOC estimate.
The EKF algorithm involves prediction and update steps. The prediction step computes the prior state estimate and error covariance:
$$\hat{x}_k^- = f(\hat{x}_{k-1}, u_{k-1})$$
$$P_k^- = A_{k-1} P_{k-1} A_{k-1}^T + Q$$
The update step corrects the estimate using the measurement:
$$e_k = y_k - h(\hat{x}_k^-, u_k)$$
$$K_k = P_k^- H_k^T (H_k P_k^- H_k^T + R)^{-1}$$
$$\hat{x}_k^+ = \hat{x}_k^- + K_k e_k$$
$$P_k^+ = (I - K_k H_k) P_k^-$$
where $H_k$ is the Jacobian matrix of the observation function evaluated at the prior estimate. The SOC estimation error remains below 1.69%, ensuring reliable SOC feedback for EV battery applications.
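The predict/update cycle above can be sketched end-to-end on a simulated cell. The battery parameters and the linear OCV curve $U_{oc}(SOC) = 3.4 + 0.8\,SOC$ below are illustrative assumptions (a real OCV table would be interpolated from measurements); current is taken positive during discharge.

```python
import numpy as np

# Minimal EKF SOC-estimation sketch with hypothetical parameters.
R0, Rp, Cp = 0.05, 0.03, 2000.0    # ohm, ohm, farad (hypothetical)
Qn = 2.5 * 3600                    # nominal capacity (As)
t_s = 1.0                          # sampling time (s)

A = np.array([[1 - t_s/(Rp*Cp), 0.0], [0.0, 1.0]])
B = np.array([t_s/Cp, -t_s/Qn])
Q = np.diag([1e-7, 1e-7])          # process noise covariance
Rm = 1e-4                          # measurement noise variance

def u_oc(soc):
    return 3.4 + 0.8 * soc         # hypothetical linear OCV; slope 0.8 V per unit SOC

def ekf_step(x, P, I, y_meas):
    # --- prediction: prior state estimate and error covariance ---
    x = A @ x + B * I
    P = A @ P @ A.T + Q
    # --- update: correct with the voltage measurement ---
    H = np.array([-1.0, 0.8])      # Jacobian of h = U_oc(SOC) - Up - R0*I
    y_hat = u_oc(x[1]) - x[0] - R0 * I
    S = H @ P @ H + Rm             # innovation variance
    K = P @ H / S                  # Kalman gain
    x = x + K * (y_meas - y_hat)
    P = (np.eye(2) - np.outer(K, H)) @ P
    return x, P

# Simulate a "true" battery and track it from a wrong initial SOC guess.
rng = np.random.default_rng(1)
x_true = np.array([0.0, 0.8])                  # [Up, SOC]
x_est, P = np.array([0.0, 0.5]), np.eye(2) * 0.1
for k in range(3600):
    I = 1.0                                    # 1 A discharge
    x_true = A @ x_true + B * I
    y = u_oc(x_true[1]) - x_true[0] - R0*I + rng.normal(0, 0.01)
    x_est, P = ekf_step(x_est, P, I, y)
print(round(x_true[1], 3), round(x_est[1], 3))
```

Despite the 30% initial SOC error, the voltage feedback pulls the estimate onto the true trajectory, which is the property that lets the EKF correct ampere-hour integration drift.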
The DDPG algorithm is adopted to optimize the charging strategy for EV power batteries. This actor-critic method combines deep neural networks with deterministic policy gradients, suitable for continuous action spaces. The framework includes online and target networks for both actor (policy) and critic (value) functions. The actor network $\mu(s|\theta^\mu)$ maps states to actions (charging currents), while the critic network $Q(s,a|\theta^Q)$ evaluates the action-value function. Target networks $\mu'$ and $Q'$ are softly updated to stabilize training:
$$\theta^{\mu'} \leftarrow \tau \theta^\mu + (1-\tau) \theta^{\mu'}$$
$$\theta^{Q'} \leftarrow \tau \theta^Q + (1-\tau) \theta^{Q'}$$
where $\tau$ is the soft update coefficient. Experience replay is used to store transitions $(s_t, a_t, r_t, s_{t+1})$ in a buffer, and mini-batches are sampled to train the networks. OU noise is added to actions for exploration:
$$a_t = \mu(s_t|\theta^\mu) + \mathcal{N}_t$$
where $\mathcal{N}_t$ is OU noise. The state space includes SOC intervals from 10% to 90% in 5% increments, and the action space is the charging current, constrained to safe limits. The multi-objective reward function integrates charging time, energy loss, and temperature rise:
$$R = -J_{obj} = -(\omega_{ct} J_{ct} + \omega_{el} J_{el} + \omega_{tr} J_{tr})$$
where the loss functions are defined as:
$$J_{ct} = k + \frac{1 - S_{SOC}(k)}{I_b(k)}$$
$$J_{el} = I_b^2(k) R_0 + \frac{U_p^2(k)}{R_p}$$
$$J_{tr} = T_i(k) - T_i(k-1)$$
Weights $\omega_{ct}$, $\omega_{el}$, and $\omega_{tr}$ are set with $\omega_{el} = \omega_{tr}$, and the ratio $\beta = \omega_{ct} / \omega_{el}$ controls the trade-off between charging speed and the remaining objectives. Constraints on current, voltage, temperature, and SOC are enforced through penalty terms in the reward function, so the learned charging strategy prioritizes both efficiency and safety.
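The reward computation and OU exploration can be sketched as follows. The default resistances, the current/temperature limits, and the penalty magnitude are illustrative assumptions; only the loss terms and the $\beta = \omega_{ct}/\omega_{el}$, $\omega_{el} = \omega_{tr}$ weight structure follow the text.

```python
import numpy as np

# Multi-objective reward with beta = w_ct / w_el and w_el = w_tr.
w_el = w_tr = 1.0
beta = 9.0
w_ct = beta * w_el

def reward(k, soc, I_b, Up, Ti_now, Ti_prev, R0=0.05, Rp=0.03):
    J_ct = k + (1.0 - soc) / I_b           # remaining-charging-time proxy
    J_el = I_b**2 * R0 + Up**2 / Rp        # ohmic + polarization loss (W)
    J_tr = Ti_now - Ti_prev                # core temperature rise (degC)
    r = -(w_ct*J_ct + w_el*J_el + w_tr*J_tr)
    # Constraint penalties; the limits and penalty value are hypothetical.
    if I_b > 4.0 or Ti_now > 40.0:
        r -= 100.0
    return r

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise."""
    def __init__(self, mu=0.0, theta=0.15, sigma=0.2):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.x = mu
    def sample(self):
        # Mean-reverting step with Gaussian perturbation.
        self.x += self.theta*(self.mu - self.x) + self.sigma*np.random.randn()
        return self.x

noise = OUNoise()
a = 2.0 + noise.sample()   # exploratory charging current around a 2 A action
```

Because OU noise is autocorrelated, consecutive exploratory currents change smoothly rather than jumping, which suits a physical charging actuator better than independent Gaussian noise.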
Simulations are conducted on the MATLAB platform to validate the proposed method. The training process involves 2,000 episodes, with an average reward converging over time. Under an ambient temperature of 25°C and $\beta=9$, the charging time is 1,625 seconds. Comparative analysis with multi-stage constant current (MCC) strategies shows that our method reduces the maximum internal temperature by about 2°C while maintaining similar charging times. For instance, MCC-1 and MCC-2 achieve charging times of 1,552 s and 1,777 s, respectively, but with higher thermal stress. The DDPG strategy adapts to different ambient temperatures (15°C, 25°C, 45°C), adjusting currents to optimize performance. Table 2 summarizes the comparison results.
| Strategy | Charging Time (s) | Max Internal Temperature (°C) |
|---|---|---|
| DDPG (25°C) | 1625 | ~36 |
| MCC-1 | 1552 | ~38 |
| MCC-2 | 1777 | ~36 |
The training efficiency is evaluated based on computational time. The CPU time for one full training session is 2,408.57 seconds, and the inference time per charging-current decision is 1.2 seconds, making the method feasible for real-world deployment in EV battery management systems. The reward function design effectively balances the objectives, as shown by the stable learning curve. Additional experiments varying $\beta$ demonstrate that higher values prioritize charging time reduction, while lower values emphasize thermal safety and energy loss minimization.
In conclusion, our DDPG-based charging strategy offers a robust solution for optimizing the performance of EV power batteries. By integrating thermoelectric modeling, accurate SOC estimation, and multi-objective reinforcement learning, we achieve significant improvements in charging efficiency and thermal management. Future work will focus on refining the reward function weights, enhancing the neural network architectures, and deploying the algorithm in real-world EV charging stations to validate its practicality. The ongoing development of EV battery technologies will benefit from such intelligent charging systems, contributing to sustainable transportation solutions.
The proposed framework underscores the importance of adaptive learning in addressing the complex challenges of fast charging. As the demand for EVs grows, advanced strategies like DDPG will play a pivotal role in ensuring the safety, longevity, and efficiency of EV power batteries. We anticipate that further research will lead to even more sophisticated methods, potentially incorporating digital twin systems for continuous optimization across charging networks.