A Causal Tree-of-Thought-Based Model for Battery State-of-Charge Prediction in Electric Vehicles

As the adoption of electric vehicles accelerates globally, accurate prediction of the battery State-of-Charge (SOC) has become a critical challenge. The SOC reflects the remaining energy capacity and directly influences driving range, safety, and battery longevity. Traditional methods, such as model-based approaches or data-driven neural networks, often struggle with real-time state awareness, dynamic calibration under complex driving conditions, and long-sequence forecasting accuracy. In this article, I propose a temporal prediction framework that integrates a causal tree-of-thought reasoning mechanism with deep reinforcement learning. The framework addresses these limitations by adaptively modeling battery state transitions while remaining computationally efficient, which improves the reliability of EV battery management.

The core innovation lies in combining a hierarchical causal structure with proximal policy optimization (PPO) to create a multi-level reasoning network. By embedding time-series networks within the Actor-Critic architecture, the model hierarchically captures the direct and indirect causal influences of key variables (such as temperature, internal resistance, and current) on SOC. The tree-of-thought structure adds multi-path policy evaluation, incorporating strategy search, path tracking, and backtracking correction. This design improves prediction robustness and also makes the causal relationships that affect EV battery performance interpretable. In the experiments, the approach outperforms Transformer, FEDformer, Mamba, and LSTM baselines across various operating conditions, with mean absolute error (MAE) below 0.26%, root mean squared error (RMSE) below 0.35%, and coefficient of determination (R²) above 99.5%.

In electric vehicles, SOC prediction is essential for optimizing energy management and preventing overcharge or overdischarge. The battery pack, typically composed of numerous cells, degrades over time due to temperature fluctuations, aging, and operational stress. Existing methods often rely on simplified equivalent circuits or black-box neural networks, which may fail to adapt to dynamic conditions such as sudden acceleration or varying road conditions. This limitation can lead to inaccurate SOC estimates, potentially compromising the safety and efficiency of EV fleets. An adaptive framework that can perceive state changes and evolve continuously is therefore needed.

To formalize the problem, I consider SOC prediction as a Markov Decision Process (MDP), where the state space includes multiple variables influencing battery behavior. Let the state at time $t$ be represented as:

$$S_t = [T_t, H_t, R_t, I_t, V_t, \text{SOH}_t, \text{OC}_t, \text{OD}_t]$$

Here, $T_t$ denotes temperature, $H_t$ humidity, $R_t$ internal resistance, $I_t$ current, $V_t$ voltage, $\text{SOH}_t$ state of health, and $\text{OC}_t$ and $\text{OD}_t$ are binary indicators for overcharge and overdischarge events, respectively. These variables collectively impact the SOC, which can be expressed as a nonlinear function:

$$\text{SOC}_t = f(T_t, H_t, R_t, I_t, V_t, \text{SOH}_t, \text{OC}_t, \text{OD}_t) + \epsilon_t$$

where $\epsilon_t$ represents noise or unobserved disturbances. The relationship between SOC and effective capacity is governed by:

$$\text{SOC}(t) = \text{SOC}(t_0) - \frac{1}{Q_{\text{now}}} \int_{t_0}^{t} \eta I(\tau) \, d\tau$$

with $Q_{\text{now}} = Q_{\text{rated}} \cdot \text{SOH}$, where $\eta$ is the coulombic efficiency and $Q_{\text{rated}}$ is the rated capacity. This highlights the coupling between SOC and SOH: an error in the SOH estimate scales the effective capacity and therefore propagates directly into the SOC prediction.
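To make the SOC and SOH coupling concrete, the integral above can be discretized as simple Coulomb counting. The following is a minimal sketch, not the model itself; the function name, the ampere-hour units, the convention that positive current means discharge, and the clamping to [0, 1] are my assumptions for illustration.

```python
def coulomb_count_soc(soc0, currents_a, dt_s, q_rated_ah, soh, eta=1.0):
    """Discrete Coulomb counting: SOC_k = SOC_{k-1} - eta * I_k * dt / Q_now.

    Assumptions (not from the article): positive current means discharge,
    SOC and SOH are fractions in [0, 1], capacity is in ampere-hours.
    """
    # Effective capacity Q_now = Q_rated * SOH, converted to ampere-seconds.
    q_now_as = q_rated_ah * soh * 3600.0
    soc = soc0
    trajectory = [soc]
    for current in currents_a:
        soc -= eta * current * dt_s / q_now_as
        soc = min(1.0, max(0.0, soc))  # clamp to the physical range
        trajectory.append(soc)
    return trajectory


# One hour of 20 A discharge from a 50 Ah pack, sampled once per second.
healthy = coulomb_count_soc(0.9, [20.0] * 3600, 1.0, 50.0, soh=1.0)
aged = coulomb_count_soc(0.9, [20.0] * 3600, 1.0, 50.0, soh=0.8)
print(round(healthy[-1], 3), round(aged[-1], 3))
```

With identical current, the aged pack (SOH of 0.8) loses half of its effective capacity while the healthy pack loses 40%, which is why a wrong SOH value biases every subsequent SOC estimate.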

My solution revolves around a deep reinforcement learning framework enhanced with causal inference. The proposed model employs a multi-level PPO algorithm, where each level corresponds to a causal variable from the state vector. The Actor network, built on a time-series network (TSN), generates action policies that represent strategy factors influencing SOC, while the Critic network evaluates state values to guide optimization. The action policy at each level is sampled from a Gaussian distribution:

$$a_t \sim \mathcal{N}(\mu_t, \sigma_t^2)$$

where $\mu_t$ and $\sigma_t$ are outputs of the TSN-Actor. The state value function $V(S_t)$ is estimated by the Critic and reflects the expected cumulative reward, here tied to energy consumption. The advantage function $A_t$ is computed from temporal-difference errors using generalized advantage estimation:

$$A_t = \sum_{i=0}^{T-t} (\gamma \lambda)^i \delta_{t+i}, \quad \delta_t = r_t + \gamma V(S_{t+1}) - V(S_t)$$

Here, $r_t$ is a reward designed around energy usage, $\gamma$ is the discount factor, and $\lambda$ is the smoothing parameter that weights the temporal-difference errors. The policy loss incorporates a clipped objective for stability:

$$L^{\text{CLIP}}(\theta) = \mathbb{E}_t \left[ \min\left( \rho_t A_t, \text{clip}(\rho_t, 1-\epsilon, 1+\epsilon) A_t \right) \right]$$

with $\rho_t = \frac{\pi_\theta(a_t | s_t)}{\pi_{\theta_{\text{old}}}(a_t | s_t)}$. The total loss combines this with value function error and entropy regularization:

$$L = L^{\text{CLIP}} + c_1 (V(S_t) - \tilde{V}_t)^2 - c_2 H(\pi_\theta)$$

This formulation lets the parameters evolve continuously, so the model can adapt to changing operating conditions of the EV battery.
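The advantage and clipped-objective formulas above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the training code: the function names are hypothetical, `lam=0.95` and `eps=0.2` are common defaults that I chose for the example, and only `gamma=0.93` comes from the experimental setup.

```python
import math


def gae_advantages(rewards, values, gamma=0.93, lam=0.95):
    """A_t = sum_i (gamma*lam)^i * delta_{t+i}, computed backwards in time.

    `values` must hold len(rewards) + 1 entries; the last one is the
    bootstrap value V(S_T) of the state after the final reward.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Mean of min(rho * A, clip(rho, 1 - eps, 1 + eps) * A) over the batch."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        rho = math.exp(ln - lo)  # probability ratio pi_new / pi_old
        clipped = min(max(rho, 1.0 - eps), 1.0 + eps)
        total += min(rho * adv, clipped * adv)
    return total / len(advantages)


adv = gae_advantages([1.0, 1.0], [0.0, 0.0, 0.0], gamma=0.5, lam=1.0)
print(adv)                                        # [1.5, 1.0]
print(clipped_surrogate([0.0, 0.0], [0.0, 0.0], adv))  # 1.25
```

The clipping only matters once the new policy drifts from the old one: with a ratio of 2 and a positive advantage, the term is capped at `1 + eps` times the advantage, which removes the incentive for overly large policy steps.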

The causal tree-of-thought mechanism extends this by constructing a multi-branch reasoning network. For each causal variable, the Actor generates 1,000 candidate action strategies, from which the top 4 with the highest value estimates are selected for further propagation. This creates a tree structure where paths represent different causal influences on SOC. At each node, the state value is updated as:

$$V_n = \text{Critic}_C([x_n, V_{n-1}]) \cdot W_C + b_C$$

where $x_n$ is the current input feature. If a path yields an SOC estimate beyond physical boundaries (e.g., exceeding a threshold $P_{\text{SOC}} = \omega_i + \lambda \cdot \sigma_i$), a backtracking mechanism is triggered to correct the anomaly. This dynamic process lets the model explore multiple reasoning paths while staying consistent, which is crucial for reliable predictions across diverse driving scenarios.
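The select-and-backtrack idea can be illustrated with a small beam-style search. This is a simplified sketch and not the implementation used in the experiments: `sample_fn`, `value_fn`, and `is_valid` are hypothetical stand-ins for the Actor, the Critic, and the physical-boundary check, and expanding every surviving path with `n_candidates` samples is my assumption about how the levels are chained.

```python
import random


def expand_level(candidates, value_fn, is_valid, beam_width=4):
    """Keep the beam_width highest-valued candidates that pass is_valid.

    Invalid candidates are skipped, so the search falls back to the
    next-best option (a simple stand-in for backtracking correction).
    """
    kept = []
    for cand in sorted(candidates, key=value_fn, reverse=True):
        if is_valid(cand):
            kept.append(cand)
        if len(kept) == beam_width:
            break
    return kept


def tree_search(levels, sample_fn, value_fn, is_valid,
                n_candidates=1000, beam_width=4, seed=0):
    """Grow paths level by level, one level per causal variable."""
    rng = random.Random(seed)
    paths = [[]]
    for level in range(levels):
        extended = [path + [sample_fn(rng, level)]
                    for path in paths for _ in range(n_candidates)]
        survivors = expand_level(extended, value_fn, is_valid, beam_width)
        if survivors:  # if every extension is invalid, keep the parent paths
            paths = survivors
    return paths


# Toy example: each action nudges an SOC correction; paths whose total
# correction leaves the allowed band are rejected.
best = tree_search(
    levels=3,
    sample_fn=lambda rng, level: rng.uniform(-0.1, 0.1),
    value_fn=lambda path: -abs(sum(path)),
    is_valid=lambda path: abs(sum(path)) <= 0.15,
    n_candidates=50,
)
print(len(best), [len(p) for p in best])
```

Sorting all candidates is the simplest way to express "take the best valid ones"; for large candidate sets a partial selection would be cheaper, but the behavior is the same.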

To validate the framework, I conducted experiments using real-world data from 10 pure electric vehicles over five years, covering urban, highway, and rural driving conditions. The dataset includes variables like temperature, humidity, and SOH, with data split into training, validation, and test sets in a 7:2:1 ratio. Preprocessing involved Z-score normalization to standardize features. Key model parameters were set as follows: learning rate of 0.0001, discount factor of 0.93, and initial exploration rate of 0.65. The TSN architecture incorporated a probabilistic sparse self-attention mechanism for efficient long-sequence processing.
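The preprocessing described above can be sketched as follows. The article does not state whether the 7:2:1 split is chronological or whether the normalization statistics come from the training set only; both are assumptions I make here because they are the usual way to avoid leakage in time-series work. All function names are hypothetical.

```python
import statistics


def split_7_2_1(rows):
    """Chronological 7:2:1 split (assumed ordering: oldest rows first)."""
    n = len(rows)
    n_train = n * 7 // 10
    n_val = n * 2 // 10
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def fit_zscore(train_rows):
    """Per-feature mean and population standard deviation from training data."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by 0.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_zscore(rows, means, stds):
    """Standardize each feature as (x - mean) / std."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy rows: [temperature-like feature, constant feature]
rows = [[float(i), 10.0] for i in range(10)]
train, val, test = split_7_2_1(rows)
means, stds = fit_zscore(train)
print(len(train), len(val), len(test))       # 7 2 1
print(apply_zscore(train, means, stds)[0])   # [-1.5, 0.0]
```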

The prediction performance was evaluated using MAE, RMSE, and R². As shown in Table 1, the proposed model outperforms the baseline methods across different SOH levels and vehicle types, which demonstrates its robustness.

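Table 1. Prediction performance of each model at two SOH levels.

| Model | SOH (%) | MAE (%) | RMSE (%) | R² (%) |
|---|---|---|---|---|
| Transformer | 100 | 0.432 | 0.515 | 96.11 |
| FEDformer | 100 | 0.279 | 0.366 | 99.05 |
| Mamba | 100 | 0.297 | 0.333 | 98.51 |
| Proposed (CoT-RL) | 100 | 0.256 | 0.309 | 99.65 |
| Transformer | 80 | 0.549 | 0.675 | 81.56 |
| FEDformer | 80 | 0.337 | 0.401 | 97.79 |
| Mamba | 80 | 0.494 | 0.551 | 87.12 |
| Proposed (CoT-RL) | 80 | 0.254 | 0.303 | 99.71 |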
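For reference, the three metrics reported in the table follow their standard definitions and can be computed as below. This is a generic sketch; the function name is hypothetical, and I assume SOC values are expressed in percentage points so that MAE and RMSE come out in percent.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for two equally long sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


mae, rmse, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(mae, rmse, r2)  # 0.25 0.5 0.8
```

Multiplying `r2` by 100 gives the percentage form used in the table.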

Ablation studies further confirmed the contribution of each component. For instance, removing the SOH variable increased MAE by approximately 21.71%, underscoring its importance for SOC estimation. The causal tree-of-thought mechanism reduced MAE by up to 20.06% compared with variants without it. The model's computational cost was also assessed: with an average inference time of 78.3 ms and 3.2 million parameters, it meets real-time requirements for deployment in EV energy management systems.

The long-sequence modeling capability was tested by varying the prediction horizon from 5 to 60 seconds. As summarized in Table 2, the TSN-based Actor maintained stable performance across time scales, whereas FEDformer degraded as the horizon grew, with its MAE rising from 0.281% at 5 s to 0.412% at 60 s. This indicates that the proposed architecture captures temporal dependencies in the battery data more reliably.

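Table 2. Prediction performance of the TSN-based Actor and FEDformer at different prediction horizons.

| Prediction Horizon (s) | Model | MAE (%) | RMSE (%) | R² (%) |
|---|---|---|---|---|
| 5 | TSN | 0.261 | 0.322 | 99.57 |
| 5 | FEDformer | 0.281 | 0.325 | 98.92 |
| 30 | TSN | 0.256 | 0.309 | 99.65 |
| 30 | FEDformer | 0.315 | 0.366 | 98.13 |
| 60 | TSN | 0.266 | 0.325 | 99.51 |
| 60 | FEDformer | 0.412 | 0.469 | 94.87 |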

From a theoretical perspective, the convergence of the PPO algorithm ensures that the policy parameters approach a local optimum under mild conditions. The gradient update rules are given by:

$$\theta_A \leftarrow \theta_A - \alpha_A \nabla_{\theta_A} L^{\text{CLIP}}, \quad \theta_C \leftarrow \theta_C - \alpha_C \nabla_{\theta_C} L^{\text{CLIP}}$$

where $\alpha_A$ and $\alpha_C$ are the learning rates of the Actor and the Critic. As the number of iterations increases, the policy parameters converge such that $\lim_{n \to \infty} \theta_n = \theta^*$, provided the learning rate is sufficiently small and the policy is smooth. This stability is crucial for reliable SOC prediction under dynamic driving conditions.

In practical terms, the reward function $r_t$ is designed to reflect energy consumption, in line with the goal of efficient vehicle operation. It can be expressed as:

$$r_t = \int_t^{t+\Delta t} \left( V_{\text{bat}} I_{\text{bat}} + \frac{F_{\text{motor}} v_t}{\eta_{\text{drivetrain}}} \right) dt$$

where $V_{\text{bat}}$ and $I_{\text{bat}}$ are the battery voltage and current, $F_{\text{motor}}$ is the motor force, $v_t$ is the vehicle velocity, and $\eta_{\text{drivetrain}}$ is the drivetrain efficiency. This formulation encourages policies that minimize energy waste.
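On sampled signals, the integral can be approximated numerically. The sketch below uses the trapezoidal rule over equally spaced samples; the function name, the fixed sampling interval, and the default drivetrain efficiency of 0.9 are assumptions for illustration. It evaluates the expression exactly as written; whether the result is negated before being used as a reward is not specified in the formula, so that step is left out.

```python
def energy_term(v_bat, i_bat, f_motor, speed, dt_s, eta_drivetrain=0.9):
    """Trapezoidal approximation of the integral of
    V_bat * I_bat + F_motor * v / eta_drivetrain over the sampled window.

    Inputs are equally spaced samples (volts, amperes, newtons, m/s);
    the result is in joules.
    """
    power = [
        v * i + f * s / eta_drivetrain
        for v, i, f, s in zip(v_bat, i_bat, f_motor, speed)
    ]
    # Average neighbouring power samples and multiply by the step length.
    return sum(0.5 * (a + b) * dt_s for a, b in zip(power, power[1:]))


# Three samples one second apart: 400 V, 10 A, vehicle at rest.
print(energy_term([400.0] * 3, [10.0] * 3, [0.0] * 3, [0.0] * 3, dt_s=1.0))  # 8000.0
```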

Looking ahead, this framework offers promising extensions for real-world deployment. For instance, it can be integrated into in-vehicle edge computing platforms to enable low-latency SOC estimation. Future work will focus on making the model more lightweight and on testing under more adverse conditions, such as extreme weather or rapid charging cycles, to further improve robustness. The causal tree-of-thought approach also opens avenues for explainable AI in automotive applications, allowing users to understand the factors driving SOC predictions and to trust the system's decisions.

In conclusion, the fusion of causal tree-of-thought reasoning with deep reinforcement learning is a significant advance in SOC prediction for electric vehicle batteries. By addressing key challenges in state awareness, dynamic adaptation, and long-term forecasting, the model improves accuracy and paves the way for smarter, more reliable electric vehicles. As the automotive industry continues to evolve, such adaptive frameworks will be instrumental in optimizing battery performance and integrating electric vehicles into daily life.