The widespread adoption of battery electric vehicles (BEVs) is driven largely by advances in lithium-ion battery (LiB) technology, which offers high energy density and long cycle life. Accurate assessment of a battery's state of health (SOH), primarily reflected by its capacity fade, and of its remaining useful life (RUL) is crucial for the safe, reliable, and cost-effective operation of BEVs. However, the complex, non-linear electrochemical degradation processes within LiBs make precise estimation difficult.
Current methodologies broadly fall into two categories: experimental analysis techniques, such as Incremental Capacity Analysis (ICA) and Differential Voltage Analysis (DVA), and model-based approaches, including adaptive filters and data-driven algorithms. Adaptive filters such as dual H-infinity filters offer real-time joint estimation, but they rely on an accurate equivalent circuit model, which is difficult to identify under the diverse and uncertain real-world operating conditions of a BEV. Data-driven methods based on machine learning bypass explicit physical modeling by learning the relationship between measurable battery features and target quantities such as capacity and RUL. Techniques such as the Relevance Vector Machine (RVM), Long Short-Term Memory (LSTM) networks, and Support Vector Machines (SVM) have shown promise. However, the Least Squares SVM (LS-SVM) becomes computationally intensive for large datasets, because every training sample acts as a support vector and training requires solving a dense linear system whose size grows with the number of samples. In addition, the raw health features may not correlate linearly with the degradation targets, which limits prediction accuracy.
This paper addresses these issues with a new framework. An optimized LS-SVM model, trained on an initial dataset, provides preliminary capacity estimates, and its support vectors are selected by maximizing the quadratic Rényi entropy to reduce model complexity. The Box-Cox transformation is applied to strengthen the correlation between the raw health indicators and the estimation targets. Together, these steps enable more accurate and robust co-estimation of battery capacity and RUL for BEV applications.

Methodology
Box-Cox Transformation for Feature Enhancement
The Box-Cox transformation is a parametric power transformation that stabilizes variance and brings data closer to a normal distribution, which often also makes the relationship between variables more linear. Consider the standard linear regression model:
$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_m x_{im} + \epsilon_i, \quad i = 1, 2, \dots, n $$
$$ \epsilon_i \sim N(0, \sigma^2) $$
where \(m\) is the number of independent variables, \(\beta_0, \dots, \beta_m\) are the regression coefficients, and \(\epsilon_i\) are independent random errors. For a strictly positive dependent variable \(y\), the Box-Cox transformation is defined as:
$$ y^{(\lambda)} = \begin{cases} \frac{y^{\lambda} - 1}{\lambda} & \lambda \neq 0 \\ \ln y & \lambda = 0 \end{cases} $$
where \(\lambda\) is the transformation parameter. The corresponding inverse transformation is:
$$ y = \begin{cases} (\lambda y^{(\lambda)} + 1)^{1/\lambda} & \lambda \neq 0 \\ \exp(y^{(\lambda)}) & \lambda = 0 \end{cases} $$
To estimate \(\lambda\), the transformed responses \(y_i^{(\lambda)}\) are assumed to satisfy the normal linear model above. Because the likelihood of the original observations includes the Jacobian \(\prod_{i=1}^{n} y_i^{\lambda - 1}\) of the transformation, maximizing over \(\beta\) and \(\sigma^2\) for a fixed \(\lambda\) gives, up to an additive constant, the profile log-likelihood:
$$ L(\lambda) = -\frac{n}{2} \ln \hat{\sigma}^2(\lambda) + (\lambda - 1) \sum_{i=1}^{n} \ln y_i $$
where \(\hat{\sigma}^2(\lambda) = \mathrm{RSS}(\lambda)/n\) is the maximum likelihood estimate of the error variance, computed from the residual sum of squares of the least-squares fit of \(y^{(\lambda)}\) on the regressors. The \(\lambda\) that maximizes \(L(\lambda)\) is selected, and the feature or target variable is transformed accordingly. In this work, the transformation is used to strengthen the linear correlation between the health indicators (e.g., features from charging curves) and the battery capacity or RUL, which makes the models trained for the battery management system more accurate.
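The selection of \(\lambda\) can be illustrated with a minimal sketch, assuming a single regressor (one health feature `x`), an ordinary least-squares fit to obtain \(\hat{\sigma}^2\) as RSS/n, and a grid search over \(\lambda\). The function names, the grid, and the synthetic data are illustrative assumptions, not the paper's implementation.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of strictly positive values y."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def inv_box_cox(z, lam):
    """Inverse transform; requires lam * z + 1 > 0 when lam != 0."""
    if lam == 0:
        return [math.exp(v) for v in z]
    return [(lam * v + 1.0) ** (1.0 / lam) for v in z]

def profile_loglik(x, y, lam):
    """L(lambda) = -n/2 ln sigma_hat^2 + (lambda - 1) sum ln y_i,
    with sigma_hat^2 = RSS / n from an OLS fit of y^(lambda) on x."""
    n = len(y)
    z = box_cox(y, lam)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - mz) for a, b in zip(x, z)) / sxx
    b0 = mz - b1 * mx
    rss = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, z))
    return -0.5 * n * math.log(rss / n) + (lam - 1.0) * sum(math.log(v) for v in y)

def best_lambda(x, y, grid):
    """Grid value of lambda that maximizes the profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(x, y, lam))

if __name__ == "__main__":
    # Synthetic data (illustrative only): sqrt(y) is linear in x plus small noise.
    xs = [float(i) for i in range(30)]
    ys = [(2.0 + 0.5 * a + 0.01 * math.sin(i)) ** 2 for i, a in enumerate(xs)]
    grid = [round(-1.0 + 0.1 * k, 10) for k in range(31)]  # -1.0 ... 2.0
    print(best_lambda(xs, ys, grid))
```

On this synthetic data the search should select \(\lambda\) close to 0.5, since \(\sqrt{y}\) is linear in \(x\). The Jacobian term is what makes likelihoods at different \(\lambda\) comparable; comparing raw RSS values alone would favor whichever \(\lambda\) shrinks the scale of the data. SciPy's `scipy.stats.boxcox` also selects \(\lambda\) by maximum likelihood when none is given, but for a single sample without regressors.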
Battery Aging Experiment and Data Analysis
Aging tests were conducted on commercial lithium-ion cells at room temperature. In each cycle, the cells were charged at a constant current of 1.5 A to a cutoff voltage of 4.2 V and discharged at a constant current of 2 A to a cutoff voltage of 2.5 V. The recorded capacity fade curve exhibits a notable "capacity regeneration" phenomenon, in which the measured capacity temporarily increases during certain cycles. This effect, often more pronounced in accelerated laboratory tests at high C-rates than in real-world BEV usage, is commonly attributed to reactions at the electrode-electrolyte interface and to temperature effects. The battery's end-of-life (EOL) is defined as the point at which its capacity degrades to 60% of its nominal rated capacity, and the RUL is the number of charge-discharge cycles remaining until this EOL threshold is reached.
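The EOL and RUL definitions above can be turned into training labels with a few lines. This is a minimal sketch assuming that EOL is the first cycle whose capacity falls to or below 60% of the nominal capacity; with capacity regeneration the curve may cross the threshold more than once, and the text does not state which crossing is used. The function name and sample capacities are hypothetical.

```python
def eol_and_rul(capacities, nominal, eol_fraction=0.6):
    """Return the EOL cycle index and the RUL label for each cycle up to EOL.

    Assumption: EOL is the FIRST cycle whose capacity is at or below
    eol_fraction * nominal; later regeneration above the threshold is ignored.
    """
    threshold = eol_fraction * nominal
    eol = next((k for k, c in enumerate(capacities) if c <= threshold), None)
    if eol is None:
        raise ValueError("capacity never reached the EOL threshold")
    # RUL at cycle k = cycles remaining until EOL.
    return eol, [eol - k for k in range(eol + 1)]

# Hypothetical capacities (Ah) for a 2.0 Ah cell; note the regeneration at index 4.
eol, rul = eol_and_rul([2.0, 1.8, 1.5, 1.25, 1.3, 1.15, 1.1], nominal=2.0)
```

Here the threshold is 1.2 Ah, so EOL is cycle index 5 and the labels count down from 5 to 0.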
Stepwise Estimation of Capacity and Remaining Cycle Life
The proposed co-estimation framework combines the Box-Cox transformation with an optimized LS-SVM algorithm. The workflow consists of four stages (initial data processing, health feature extraction, offline model training, and online estimation), which are described below in three steps.
Initial Data Processing & Feature Extraction: For a given charging cycle, the voltage-capacity (Q-V) curve is obtained. The Incremental Capacity (IC) curve, \(dQ/dV\) vs. \(V\), is then computed. A Kalman Filter (KF) is applied to smooth the raw IC curve and reduce measurement noise. Key health features, such as the peak value or the integral of the IC curve within a specific voltage interval, are extracted. These features serve as indicators of the battery’s aging state.
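The feature extraction step can be sketched as follows, assuming a strictly increasing voltage segment from a constant-current charge, linear interpolation of Q onto a uniform voltage grid with a hypothetical step `dv`, and a scalar Kalman filter with a random-walk state model whose noise variances `q_var` and `r_var` are hand-picked placeholders. The paper's exact KF formulation is not specified; this forward-only filter smooths with some lag. The integral of the IC curve over a window equals the charge accumulated in that window.

```python
import bisect

def incremental_capacity(v, q, dv):
    """dQ/dV on a uniform voltage grid of step dv (v must be strictly increasing).

    Q is linearly interpolated at the grid voltages and then differenced;
    each IC value is reported at the midpoint of its voltage step.
    """
    n = int((v[-1] - v[0]) / dv + 1e-9)
    grid = [v[0] + k * dv for k in range(n + 1)]
    q_grid = []
    for x in grid:
        j = bisect.bisect_left(v, x)
        if j == 0:
            q_grid.append(q[0])
        elif j >= len(v):
            q_grid.append(q[-1])
        else:
            t = (x - v[j - 1]) / (v[j] - v[j - 1])
            q_grid.append(q[j - 1] + t * (q[j] - q[j - 1]))
    mids = [g + dv / 2 for g in grid[:-1]]
    ic = [(q_grid[k + 1] - q_grid[k]) / dv for k in range(n)]
    return mids, ic

def kalman_smooth(z, q_var=1e-4, r_var=1e-2):
    """Scalar Kalman filter, random-walk state model, forward pass only."""
    x, p = z[0], 1.0
    out = [x]
    for meas in z[1:]:
        p += q_var                # predict: state unchanged, uncertainty grows
        gain = p / (p + r_var)    # update with the next IC sample
        x += gain * (meas - x)
        p *= 1.0 - gain
        out.append(x)
    return out

def ic_features(mids, ic, v_lo, v_hi, dv):
    """Peak height, peak voltage and area of the IC curve inside [v_lo, v_hi]."""
    window = [(m, y) for m, y in zip(mids, ic) if v_lo <= m <= v_hi]
    if not window:
        raise ValueError("no IC samples inside the voltage window")
    peak_v, peak = max(window, key=lambda pair: pair[1])
    area = sum(y for _, y in window) * dv  # ~ charge accumulated in the window
    return peak, peak_v, area
```

A typical call chain is `mids, ic = incremental_capacity(v, q, dv)`, then `ic_features(mids, kalman_smooth(ic), v_lo, v_hi, dv)`.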
Feature Enhancement & Offline Training: The extracted raw features and the target values (capacity or cycles to EOL) are normalized; because the Box-Cox transformation is defined only for strictly positive values, the normalized sequences must be kept above zero (for example, by offsetting them after scaling). The Box-Cox transformation is then applied to these sequences, with the optimal \(\lambda\) selected as described above to strengthen their linear correlation. The transformed data are used to train two separate LS-SVM models: a Capacity Estimation Model and an RUL Prediction Model. A Particle Swarm Optimization (PSO) algorithm tunes the LS-SVM hyperparameters (the regularization parameter \(\gamma\) and the kernel parameter \(\sigma\)). To reduce computational complexity, the support vectors of the final model are selected from the initial training set by maximizing the quadratic Rényi entropy.
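The offline training step can be sketched in plain Python as a minimal illustration under several assumptions: a single scalar health feature, a Gaussian RBF kernel, and hand-picked values of \(\gamma\) and \(\sigma\) in place of the PSO search, which is omitted. Support-vector selection uses a deterministic greedy-swap variant of the entropy-based selection known from fixed-size LS-SVM, and, as a simplification, a standard LS-SVM is then trained on the selected subset only. All function names are hypothetical.

```python
import math

def rbf(a, b, sigma):
    """Gaussian (RBF) kernel for scalar health features."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))

def renyi_entropy(xs, sigma):
    """Quadratic Renyi entropy estimate: -log of the mean pairwise kernel value."""
    m = len(xs)
    return -math.log(sum(rbf(a, b, sigma) for a in xs for b in xs) / m ** 2)

def select_support_vectors(xs, m, sigma, max_sweeps=20):
    """Greedy swaps until no single swap raises the entropy of the m chosen samples."""
    idx = list(range(m))  # start from the first m samples
    best = renyi_entropy([xs[i] for i in idx], sigma)
    for _ in range(max_sweeps):
        improved = False
        for pos in range(m):
            for cand in range(len(xs)):
                if cand in idx:
                    continue
                trial = idx[:pos] + [cand] + idx[pos + 1:]
                h = renyi_entropy([xs[i] for i in trial], sigma)
                if h > best:
                    idx, best, improved = trial, h, True
        if not improved:
            break
    return sorted(idx)

def solve(a, rhs):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def lssvm_fit(xs, ys, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y] for the LS-SVM dual."""
    n = len(xs)
    a = [[0.0] + [1.0] * n]
    for i in range(n):
        a.append([1.0] + [rbf(xs[i], xs[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(a, [0.0] + list(ys))
    return sol[0], sol[1:]

def lssvm_predict(x, xs, b, alpha, sigma):
    """f(x) = b + sum_i alpha_i K(x, x_i)."""
    return b + sum(al * rbf(x, xi, sigma) for al, xi in zip(alpha, xs))
```

In use, `sv = select_support_vectors(xs, m, sigma)` picks the indices, and `lssvm_fit([xs[i] for i in sv], [ys[i] for i in sv], gamma, sigma)` trains on that subset, so the dense system shrinks from the full training size to m + 1 unknowns.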
Online Estimation: During operation of the BEV, the same feature extraction, normalization, and Box-Cox transformation (with the \(\lambda\) values fixed during training) are applied to the charging data of each new cycle. The transformed feature is fed into the pre-trained Capacity and RUL models, and their outputs are mapped back through the inverse Box-Cox transformation and de-normalization to give real-time estimates of the current battery capacity and its remaining useful life.
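Because the targets are also normalized and Box-Cox transformed during training, model outputs have to be mapped back before they are reported. The minimal sketch below shows that chain for a single scalar feature; the `Stage` class, the shift-by-one min-max normalization used to keep values positive, and the callable `model` are hypothetical stand-ins, not the paper's implementation.

```python
import math
from dataclasses import dataclass
from typing import Callable

def box_cox(v: float, lam: float) -> float:
    if v <= 0:
        raise ValueError("Box-Cox requires a strictly positive value")
    return math.log(v) if lam == 0 else (v ** lam - 1.0) / lam

def inv_box_cox(z: float, lam: float) -> float:
    if lam == 0:
        return math.exp(z)
    base = lam * z + 1.0
    if base <= 0:
        raise ValueError("value outside the range of the inverse transform")
    return base ** (1.0 / lam)

@dataclass
class Stage:
    """Normalization and Box-Cox parameters frozen at training time (hypothetical)."""
    lo: float   # training minimum
    hi: float   # training maximum
    lam: float  # Box-Cox lambda selected offline

    def encode(self, value: float) -> float:
        # Assumption: min-max scaling shifted by +1 keeps training-range inputs positive.
        return box_cox((value - self.lo) / (self.hi - self.lo) + 1.0, self.lam)

    def decode(self, z: float) -> float:
        return (inv_box_cox(z, self.lam) - 1.0) * (self.hi - self.lo) + self.lo

def online_estimate(raw_feature: float, feature: Stage, target: Stage,
                    model: Callable[[float], float]) -> float:
    """Encode a new feature, run a trained model, and map its output back."""
    return target.decode(model(feature.encode(raw_feature)))
```

The same function would be called twice per cycle, once with the capacity model and its target `Stage` and once with the RUL model and its target `Stage`. A feature far below the training minimum can still make the shifted value non-positive, which `box_cox` reports as an error rather than returning a meaningless estimate.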
Experimental Validation and Results Analysis
Evaluation of Aging Feature Parameters
The effectiveness of the extracted health feature is first validated by analyzing its correlation with the measured battery capacity. Here, the feature is the incremental capacity peak within a specific voltage window. As shown in the capacity fade curve, the feature closely tracks the measured capacity degradation, including the capacity regeneration segments: when the capacity temporarily increases, the feature value rises correspondingly. After the Box-Cox transformation, the Pearson correlation coefficient (\(S_p\)) between this feature and the battery capacity is 0.994, indicating a very strong linear relationship. This high correlation supports the use of the chosen feature as a robust health indicator for estimating the state of health of a BEV battery.
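The correlation check can be reproduced with the standard Pearson formula. The sketch below uses illustrative synthetic data, not the paper's measurements, to show how a Box-Cox transform can raise \(S_p\) when the raw relationship is curved: a target growing as the square of the feature becomes exactly linear under \(\lambda = 0.5\).

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient S_p between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

# Illustrative data only: the target grows quadratically with the feature.
feature = [float(i) for i in range(1, 11)]
target = [f * f for f in feature]
r_raw = pearson(feature, target)  # ~0.975
# Box-Cox with lambda = 0.5 maps f^2 to 2(f - 1), which is exactly linear in f.
r_bc = pearson(feature, [(math.sqrt(t) - 1.0) / 0.5 for t in target])  # 1.0 up to rounding
```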
Experimental Setup
The experimental setup for RUL evaluation consists of a battery test bench with a cell-cycling channel and a sensor that monitors parameters such as the cell surface temperature. The battery voltage and current are recorded during tests over the different predefined voltage intervals and transmitted to a host computer for storage and processing. The proposed Box-Cox transformation and estimation algorithms are implemented on this platform to process the data and output capacity and RUL estimates in real time.
Validation of the Estimation Algorithm
The performance of the proposed algorithm is validated on the experimental data. Half of the cycle data are used to train the optimized LS-SVM model, and the remaining half are used for testing. Battery capacity is estimated from features extracted over different voltage intervals (20 mV, 60 mV, 100 mV, and 150 mV), and the results are compared with those of the traditional ICA-based method.
The capacity estimation results are summarized in the table below. The performance metrics are the average absolute error (AAE), the maximum absolute error (MAE; here MAE denotes the maximum, not the mean, absolute error), the root mean square error (RMSE), the coefficient of determination (R²), the computation time, and a comprehensive score derived from the entropy weight method and the analytic hierarchy process (lower is better).
| Voltage interval (mV) / method | AAE (mA·h) | MAE (mA·h) | RMSE (mA·h) | R² | Computation time (ms) | Score |
|---|---|---|---|---|---|---|
| 20 | 15.2 | 59.2 | 19.2 | 0.994 | 1.2 | 0.25 |
| 60 | 10.8 | 58.3 | 13.9 | 0.992 | 1.3 | 0.18 |
| 100 | 12.4 | 62.5 | 17.2 | 0.995 | 1.1 | 0.22 |
| 150 | 13.9 | 115 | 18.3 | 0.994 | 1.2 | 0.62 |
| Traditional ICA | 25.7 | 138.5 | 31.3 | 0.937 | 0.2 | 0.87 |
The results indicate that, for every voltage interval, the proposed method is more accurate than the traditional ICA method, although ICA is faster (0.2 ms versus 1.1 to 1.3 ms). The 60 mV interval provides the best balance: it achieves the lowest AAE, MAE, and RMSE with a high R², and therefore the lowest comprehensive score (lower is better). Although the 150 mV interval has a slightly lower AAE than the 20 mV interval, its large MAE (115 mA·h) gives it the worst comprehensive score among the proposed intervals. This demonstrates the importance of selecting an appropriate voltage window for feature extraction from the operating data of a BEV.
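The accuracy metrics in these tables can be computed as follows. This is a minimal sketch assuming the common definitions, in particular \(R^2 = 1 - \mathrm{SSE}/\mathrm{SST}\); the comprehensive entropy-weight/AHP score is not reproduced because its weighting is not specified here. The function name and sample values are hypothetical.

```python
import math

def error_metrics(y_true, y_pred):
    """AAE (mean |e|), MAE (max |e|, as in the tables), RMSE and R^2."""
    n = len(y_true)
    errs = [p - t for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errs)
    mean_t = sum(y_true) / n
    sst = sum((t - mean_t) ** 2 for t in y_true)
    return {
        "AAE": sum(abs(e) for e in errs) / n,
        "MAE": max(abs(e) for e in errs),
        "RMSE": math.sqrt(sse / n),
        "R2": 1.0 - sse / sst,
    }

metrics = error_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 4.0])
```

For these hypothetical values, AAE is 0.1, MAE is 0.2, RMSE is \(\sqrt{0.015} \approx 0.122\), and \(R^2 = 0.988\).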
The RUL estimation performance is evaluated in the same way. The statistical results of RUL prediction over 20 rounds of training and testing are presented below.
| Voltage interval (mV) / method | AAE (cycles) | MAE (cycles) | RMSE (cycles) | R² | Computation time (ms) | Score |
|---|---|---|---|---|---|---|
| 20 | 9.83 | 112.45 | 13.42 | 0.996 | 1.3 | 0.68 |
| 60 | 5.28 | 46.3 | 9.24 | 0.992 | 1.2 | 0.23 |
| 100 | 5.67 | 69.24 | 9.65 | 0.996 | 1.2 | 0.24 |
| 150 | 7.29 | 83.5 | 13.54 | 0.995 | 1.1 | 0.54 |
| Traditional ICA | 13.56 | 83.54 | 17.58 | 0.947 | 0.2 | 0.76 |
The RUL estimation results are broadly consistent with those of capacity estimation. The 60 mV interval again delivers the best performance, with the lowest AAE (5.28 cycles), MAE (46.3 cycles), and RMSE (9.24 cycles), and it clearly outperforms the traditional ICA method. By contrast, the 20 mV interval yields a larger MAE (112.45 cycles) than ICA (83.54 cycles), which again shows how sensitive the results are to the voltage window. These results indicate that, among the intervals tested, the health feature extracted at 60 mV after Box-Cox transformation is the most reliable indicator for predicting both the state of health and the RUL of the tested lithium-ion cells.
Conclusion
This paper presents an integrated data-driven framework for the co-estimation of capacity and remaining useful life of lithium-ion batteries in BEV applications. The method strengthens the correlation between measurable health indicators and degradation targets through the Box-Cox transformation and uses an optimized LS-SVM model for prediction. The key findings are as follows. First, after Box-Cox transformation, the health feature extracted from the incremental capacity curve within a specific voltage window shows a very high correlation (0.994) with the measured battery capacity and captures non-monotonic behavior such as capacity regeneration. Second, the proposed algorithm achieves lower average and RMS estimation errors than the traditional ICA method across all tested voltage intervals. Third, a voltage interval of 60 mV is identified as optimal for feature extraction, giving the lowest AAE, MAE, and RMSE and the lowest comprehensive score for both capacity and RUL prediction. Reliable and accurate assessment of battery capacity and remaining life supports optimized battery usage, maintenance planning, and the long-term health and safety of BEVs.
