Electric Vehicle Remaining Range Prediction with a Three-Layer Weighted Stacking Model

In recent years, rapid economic development and rising living standards have made electric vehicles an increasingly common mode of transportation in China. Data from the Ministry of Public Security show that the number of electric vehicles in China has been growing rapidly, contributing to efforts to reduce carbon emissions. However, electric vehicles face challenges such as limited energy storage capacity and long charging times, which lead to range anxiety among drivers. Accurate prediction of the remaining range is crucial to alleviating this issue and encouraging the adoption of electric vehicles. Traditional methods for range prediction often rely on single models or simplistic feature selection, which may not capture the complex interactions between vehicle intrinsic properties, driver behavior, and external environmental factors. In this study, we propose a three-layer weighted stacking model to improve the accuracy and generalization of remaining range prediction for electric vehicles.

We begin by addressing the feature selection problem. The dataset used in this work is derived from a national electric vehicle monitoring platform, comprising one year of real-world operational data from 10 electric vehicles. Data points were collected at 15-second intervals and include variables such as vehicle speed, voltage, current, state of charge, and environmental conditions. After preprocessing, which involved filtering discharge processes, removing outliers using the Pauta criterion, and filling missing values with Lagrange interpolation, we constructed a candidate feature set. The minimum redundancy maximum relevance (mRMR) algorithm was employed to optimize the input features by evaluating both the relevance of features to the target and the redundancy among features. We combined the maximal information coefficient (MIC) and Spearman correlation coefficient (SC) to form a composite metric:

$$ M(u,v) = \omega_m \text{MIC}(u,v) + \omega_s \text{SC}(u,v) $$

where $\omega_m$ and $\omega_s$ are weights satisfying $\omega_m + \omega_s = 1$. The MIC and SC are defined as:

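$$ \text{MIC}(u,v) = \max_{a \cdot b < G} \frac{I(u,v)}{\log_2 \min(a,b)} $$

$$ \text{SC}(u,v) = 1 - \frac{6 \sum_{i=1}^n d_i^2}{n(n^2 - 1)} $$

Here, $I(u,v)$ is the mutual information between $u$ and $v$ on an $a \times b$ grid, $a$ and $b$ are the numbers of grid cells along the two axes, $G$ is the upper bound on the total grid size $a \cdot b$, $d_i$ is the difference between the ranks of the $i$-th pair of observations, and $n$ is the sample size. The mRMR algorithm aims to maximize the relevance $R(S,y)$ and minimize the redundancy $R'(S)$: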

$$ \max R(S,y), \quad R = \frac{1}{|S|} \sum_{x_i \in S} M(x_i,y) $$
$$ \min R'(S), \quad R' = \frac{1}{|S|^2} \sum_{x_i,x_j \in S} M(x_i,x_j) $$
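The composite metric $M(u,v)$ that feeds both $R$ and $R'$ can be sketched in a few lines. This is a minimal illustration, not the implementation used in the study: `grid_nmi` evaluates the normalized mutual information on a single fixed grid and is only a simplified stand-in for MIC (which maximizes over many grids), the Spearman formula assumes no tied values, and taking the absolute value of SC is an assumption made here so that strongly anti-correlated features also count as relevant. The function names and the 4-by-4 grid are hypothetical.

```python
import numpy as np

def spearman_sc(u, v):
    """Spearman coefficient from the rank-difference formula (assumes no ties)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    n = len(u)
    rank_u = np.argsort(np.argsort(u))  # 0-based ranks
    rank_v = np.argsort(np.argsort(v))
    d = rank_u - rank_v
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

def grid_nmi(u, v, a=4, b=4):
    """Mutual information on one a-by-b grid, divided by log2(min(a, b)).

    Simplified stand-in for MIC: a real MIC search maximizes this ratio
    over all grids with a * b below the bound G.
    """
    counts, _, _ = np.histogram2d(u, v, bins=(a, b))
    p = counts / counts.sum()
    pu = p.sum(axis=1, keepdims=True)   # marginal of u, shape (a, 1)
    pv = p.sum(axis=0, keepdims=True)   # marginal of v, shape (1, b)
    nz = p > 0                          # skip empty cells (0 * log 0 = 0)
    mi = np.sum(p[nz] * np.log2(p[nz] / (pu @ pv)[nz]))
    return mi / np.log2(min(a, b))

def composite_metric(u, v, w_m=0.7, w_s=0.3):
    """M(u, v) = w_m * MIC-like term + w_s * |SC|, with w_m + w_s = 1."""
    return w_m * grid_nmi(u, v) + w_s * abs(spearman_sc(u, v))

u = np.arange(16, dtype=float)
m_identical = composite_metric(u, u)        # perfectly dependent pair
sc_reversed = spearman_sc(u, -u)            # perfectly reversed ranks
```

For a perfectly dependent pair both terms reach their maximum, so the composite value is 1; for reversed ranks the Spearman coefficient is -1.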

The objective function is $\phi = R / R'$, and incremental search is used to select the optimal feature subset. We evaluated the performance using mean squared error (MSE) and found that the optimal feature subset consists of 12 features, as shown in Table 1.
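The incremental search can be illustrated with a greedy forward selection that, at each step, adds the feature giving the largest quotient $\phi = R / R'$. The sketch below is written under stated assumptions: the absolute Spearman correlation stands in for the composite metric $M$, the search starts from the single most relevant feature, and the names `mrmr_select` and `abs_spearman` are hypothetical. The redundancy term includes the diagonal pairs $M(x_i, x_i)$, as in the definition of $R'$ above.

```python
import numpy as np

def abs_spearman(u, v):
    """Absolute Spearman correlation, used here as a stand-in for M(u, v)."""
    ru = np.argsort(np.argsort(u)).astype(float)
    rv = np.argsort(np.argsort(v)).astype(float)
    return abs(np.corrcoef(ru, rv)[0, 1])

def mrmr_select(X, y, k, metric=abs_spearman):
    """Greedy forward search that maximizes phi = R / R' at each step."""
    n_features = X.shape[1]
    relevance = np.array([metric(X[:, j], y) for j in range(n_features)])
    # Pairwise metric matrix, computed once; the diagonal holds M(x_i, x_i).
    pair = np.array([[metric(X[:, p], X[:, q]) for q in range(n_features)]
                     for p in range(n_features)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_j, best_phi = -1, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            S = selected + [j]
            R = relevance[S].mean()              # (1/|S|) * sum of M(x_i, y)
            R_red = pair[np.ix_(S, S)].mean()    # (1/|S|^2) * sum of M(x_i, x_j)
            phi = R / R_red
            if phi > best_phi:
                best_j, best_phi = j, phi
        selected.append(best_j)
    return selected

# Synthetic check: feature 1 is a near-copy of feature 0, feature 2 is independent.
rng = np.random.default_rng(0)
x0 = rng.normal(size=300)
x2 = rng.normal(size=300)
x1 = x0 + 0.01 * rng.normal(size=300)
X = np.column_stack([x0, x1, x2])
y = x0 + x2
chosen = mrmr_select(X, y, k=2)
```

On this synthetic data the search keeps the independent feature and only one of the two near-duplicates, which is the behavior the redundancy term is meant to produce.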

Table 1: Optimized Input Feature Set for Electric Vehicle Remaining Range Prediction
Feature Type	Feature Names
Vehicle Intrinsic Properties	State of Charge, Minimum Temperature, Maximum Cell Voltage, Voltage, Minimum Cell Voltage, Speed, Current, Maximum Temperature
Driver Behavior	Accelerator Pedal Position, Brake Pedal Status
External Environment	Ambient Temperature, Humidity

This feature set was compared against other methods like Pearson correlation coefficient (PCC), variance threshold (VTM), and expert manual selection (EMS). The mRMR approach demonstrated superior performance, with lower MSE, MAE, MAPE, and higher R², as summarized in Table 2. This underscores the importance of considering both relevance and redundancy in feature selection for electric vehicle applications.

Table 2: Comparison of Feature Selection Methods for China EV Range Prediction
Method	MSE (km²)	MAE (km)	MAPE (%)	R²
mRMR	0.7983	0.6470	5.52	0.9831
PCC	0.8501	0.6892	5.89	0.9805
VTM	0.9015	0.7310	6.25	0.9780
EMS	0.8754	0.7103	6.08	0.9792
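For reference, the four evaluation metrics reported in the tables can be computed as follows. This is a generic sketch using the standard definitions; the function name and the toy values are illustrative and are not taken from the study's data. MAPE is undefined when a true value is zero, so the sketch assumes strictly positive remaining ranges.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, MAPE (in percent), and R² for two equal-length arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)                       # squared units of the target
    mae = np.mean(np.abs(err))                    # same units as the target
    mape = 100.0 * np.mean(np.abs(err / y_true))  # assumes y_true != 0
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "MAE": mae, "MAPE": mape, "R2": r2}

# Toy remaining-range values in km (illustrative only).
metrics = regression_metrics([10, 20, 30, 40], [11, 19, 32, 38])
```

Because MSE averages squared errors, its unit is the square of the target's unit, whereas MAE keeps the target's unit.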

Next, we developed a three-layer weighted stacking model for remaining range prediction. The stacking model integrates multiple base learners to enhance predictive performance. The first layer, or base layer, includes models such as XGBoost, random forest (RF), gradient boosting decision tree (GBDT), support vector machine (SVM), and BP neural network. The second layer, or generalization layer, uses Ridge and Lasso regression to fuse the outputs from the base layer and the original features. The third layer, or meta-model layer, combines the predictions through linear weighting. The overall prediction function is:

$$ f(x) = \vartheta_{\text{Ridge}} f_{\text{Ridge}}(x) + \vartheta_{\text{Lasso}} f_{\text{Lasso}}(x) $$

where $\vartheta_{\text{Ridge}}$ and $\vartheta_{\text{Lasso}}$ are weights optimized using Bayesian optimization. The Bayesian optimization algorithm (BOA) employs a Gaussian process as a probabilistic surrogate model and an expected improvement acquisition function to efficiently search for optimal parameters. The Gaussian process is defined as:

$$ f(x) \sim \mathcal{GP}(\mu(x), k(x,x')) $$

and the acquisition function is:

$$ \text{EI}(x) = \mathbb{E}[\max(f(x) - f_{\text{best}}, 0)] $$
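When the Gaussian process posterior at a candidate point is normal with mean $\mu$ and standard deviation $\sigma$, the expectation above has a well-known closed form, $(\mu - f_{\text{best}})\,\Phi(z) + \sigma\,\varphi(z)$ with $z = (\mu - f_{\text{best}})/\sigma$. The sketch below evaluates it for a single point; the function name and arguments are hypothetical, and it follows the maximization convention of the formula above, so an error measure such as MSE would have to be negated before use.

```python
import math

def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for a Gaussian posterior N(mu, sigma^2), maximization form."""
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(mu - f_best, 0.0)
    z = (mu - f_best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    return (mu - f_best) * cdf + sigma * pdf

# A point predicted to equal the incumbent still has positive EI
# because of its posterior uncertainty.
ei_at_incumbent = expected_improvement(mu=1.0, sigma=1.0, f_best=1.0)
```

The first term rewards candidates whose predicted mean already beats the incumbent, and the second rewards uncertainty, which is how the acquisition function balances exploitation and exploration.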

We applied BOA to determine the weights of the base models and of the two generalization-layer models, as shown in Table 3. The weights reflect the relative importance of each model: within the base layer, XGBoost, RF, and GBDT together carry 0.9920 of the total weight.

Table 3: Weight Distribution of Base Models in the Stacking Model for Electric Vehicle Prediction
Layer	Model	Weight
Base Layer	XGBoost	0.4503
Base Layer	RF	0.3697
Base Layer	GBDT	0.1720
Base Layer	SVM	0.0036
Base Layer	BP	0.0044
Generalization Layer	Ridge	0.6310
Generalization Layer	Lasso	0.3690
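As a small worked example of the third-layer fusion $f(x) = \vartheta_{\text{Ridge}} f_{\text{Ridge}}(x) + \vartheta_{\text{Lasso}} f_{\text{Lasso}}(x)$, the sketch below applies the generalization-layer weights from Table 3 to two made-up prediction vectors. The prediction values and the function name are illustrative only. How the base-layer weights enter the pipeline is not specified here, so the sketch covers only the final linear combination.

```python
import numpy as np

# Generalization-layer weights from Table 3; they sum to 1.
THETA_RIDGE = 0.6310
THETA_LASSO = 0.3690

def fuse_predictions(pred_ridge, pred_lasso,
                     theta_ridge=THETA_RIDGE, theta_lasso=THETA_LASSO):
    """Linear weighting of the two generalization-layer outputs."""
    pred_ridge = np.asarray(pred_ridge, dtype=float)
    pred_lasso = np.asarray(pred_lasso, dtype=float)
    return theta_ridge * pred_ridge + theta_lasso * pred_lasso

# Illustrative remaining-range predictions in km (not real model output).
fused = fuse_predictions([100.0, 50.0], [110.0, 40.0])
```

With these inputs the fused predictions are 0.6310 × 100 + 0.3690 × 110 = 103.69 km and 0.6310 × 50 + 0.3690 × 40 = 46.31 km.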

The training process uses 3-fold cross-validation. For each base model, the training data is split into three folds; each fold in turn is held out for prediction while the model is trained on the other two. The held-out predictions are concatenated so that every training sample receives a prediction from a model that did not see it, and these outputs are passed to the next layer. Training the next layer on such out-of-fold predictions reduces the risk of overfitting. The final meta-model is trained on these combined features to produce the remaining range prediction.
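The fold rotation can be sketched as follows. To keep the example self-contained, a closed-form ridge regression stands in for a base learner (the study's base layer uses XGBoost, RF, GBDT, SVM, and a BP network), the data are synthetic, and all function names are hypothetical. The last line shows one way of passing the out-of-fold output to the next layer together with the original features.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0                      # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]

def out_of_fold_predictions(X, y, n_folds=3, seed=0):
    """Predict each sample with a model trained on the other folds only."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    oof = np.full(len(X), np.nan)
    for k, held_out in enumerate(folds):
        train = np.concatenate([f for i, f in enumerate(folds) if i != k])
        coef = fit_ridge(X[train], y[train])
        oof[held_out] = predict_linear(coef, X[held_out])
    return oof

# Synthetic data standing in for the selected features and the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(90, 4))
y = X @ np.array([3.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=90)

oof = out_of_fold_predictions(X, y)
X_next = np.column_stack([X, oof])   # original features plus base-model output
```

With several base learners, each one would contribute its own out-of-fold column to `X_next`.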

We evaluated the proposed three-layer weighted stacking model (Stacking-A) against three variants: Stacking-B (without original features), Stacking-C (without weights), and Stacking-D (two-layer stacking). The results, presented in Table 4, show that Stacking-A achieves the best performance, with an MSE of 0.4568 km², an MAE of 0.4932 km, a MAPE of 4.34%, and an R² of 0.9852. This highlights the benefits of incorporating original features, a hierarchical structure, and weighted fusion for improving prediction accuracy.

Table 4: Performance Comparison of Stacking Models for China EV Remaining Range Prediction
Model	MSE (km²)	MAE (km)	MAPE (%)	R²
Stacking-A	0.4568	0.4932	4.34	0.9852
Stacking-B	0.4702	0.5010	4.42	0.9850
Stacking-C	0.5247	0.5326	4.68	0.9844
Stacking-D	0.6256	0.5771	5.28	0.9838

To further analyze the model, we examined the impact of feature selection on prediction error. As the number of features increases, the MSE decreases and then stabilizes after about 10 features, indicating that further features add little information and may introduce noise. The MSE was lowest when the weights for MIC and SC were $\omega_m = 0.7$ and $\omega_s = 0.3$. This optimization is crucial for handling the high-dimensional data typical of electric vehicle operation.

Because the surrogate model guides each new evaluation, Bayesian optimization typically needs fewer evaluations than an exhaustive grid search to find near-optimal parameters. The weight of 0.4503 assigned to XGBoost indicates its strong predictive capability, while the very low weights for SVM (0.0036) and BP (0.0044) suggest that these models play only a minor, complementary role. In the generalization layer, the higher weight of Ridge regression (0.6310) compared with Lasso (0.3690) is consistent with its ability to handle multicollinearity and provide stable predictions.

In conclusion, our proposed method for electric vehicle remaining range prediction leverages advanced feature selection and a weighted stacking model to achieve high accuracy and generalization. The mRMR algorithm effectively identifies relevant features while minimizing redundancy, and the three-layer stacking model with Bayesian-optimized weights integrates multiple learning paradigms. This approach addresses the limitations of single models and simplistic feature selection, making it suitable for real-world applications in China’s growing electric vehicle market. Future work could explore dynamic updates to the model based on real-time data and extend the method to other types of electric vehicles.

The adoption of electric vehicles in China is accelerating, and accurate range prediction is key to sustaining this growth. By improving prediction models, we can reduce range anxiety, optimize energy management, and support the transition to sustainable transportation. Our study demonstrates that machine learning techniques, when properly integrated, can significantly enhance the performance of electric vehicle systems, contributing to the broader goals of energy efficiency and environmental protection.

Moreover, the robustness of our model was tested under various conditions, including different driving patterns and environmental factors. The results consistently showed that the three-layer weighted stacking model outperforms traditional approaches, with lower errors and higher explanatory power. This makes it a valuable tool for manufacturers, policymakers, and consumers in the electric vehicle ecosystem.

In summary, we have developed a comprehensive framework for electric vehicle remaining range prediction that combines mRMR-based feature selection with ensemble learning. Because the data come from real-world EV operation in China, the results are of direct practical relevance, and the methodological advancements provide a foundation for future research. As the electric vehicle industry evolves, such data-driven approaches will play a critical role in enhancing vehicle performance and user experience.