Electric Vehicle Load Prediction with Attention and Multi-Scale Features

With the rapid adoption of electric cars globally, accurately forecasting electric vehicle (EV) charging load has become crucial for grid stability and energy management. In China, the EV market is expanding rapidly, making load prediction essential for efficient power distribution and infrastructure planning. Traditional methods often struggle with the randomness and nonlinearity of EV load data, leading to suboptimal accuracy. To address this, we propose a novel model that integrates variational mode decomposition (VMD), attention mechanisms, and multi-scale features within a temporal convolutional network (TCN) framework, termed VMD-AM-MSF-TCNnet. This approach enhances feature extraction and prediction performance by decomposing complex load sequences, refining network structures, and fusing multi-scale temporal information. In this article, we detail the methodology, experimental setup, and results, demonstrating significant improvements over existing models in terms of error metrics and predictive capability.

The proliferation of electric vehicles, particularly in fast-growing markets such as China, has intensified the need for reliable load forecasting. EV charging load is influenced by various factors, including user behavior, weather conditions, and temporal patterns, resulting in highly stochastic data. Previous studies have employed techniques such as recurrent neural networks (RNNs) and long short-term memory (LSTM) networks, but these often struggle with long-term dependencies and computational efficiency. TCNs offer a viable alternative due to their parallel processing capabilities and their ability to capture extended temporal contexts through dilated convolutions. However, standard TCNs may not fully exploit the multi-faceted nature of EV load data. Our model builds upon the TCN by incorporating VMD for signal decomposition, optimized with the whale optimization algorithm (WOA), and by enhancing the residual blocks with gating mechanisms and dual attention (time and channel attention). Additionally, we employ multi-scale feature fusion to capture diverse temporal patterns, making the model well suited to the dynamic, highly variable load profiles of EV charging.

Data preprocessing is a critical first step in handling EV load data. We aggregate raw charging records into hourly intervals to form a consistent time series. For missing values, linear interpolation is applied to maintain data integrity. Meteorological data, which can significantly impact electric car charging behavior, are processed by normalizing continuous features using Min-Max scaling and encoding discrete features with one-hot encoding. The normalization formula is given by:

$$ w_j' = \frac{w_j - w_{\text{min}}}{w_{\text{max}} - w_{\text{min}}} $$

where \( w_j' \) is the normalized value for the \( j \)-th meteorological feature, \( w_j \) is the original value, and \( w_{\text{min}} \) and \( w_{\text{max}} \) are the minimum and maximum values, respectively. This ensures that all features are on a comparable scale, reducing biases in model training.
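As an illustration, here is a minimal preprocessing sketch in Python using pandas and NumPy; the column names and values are hypothetical placeholders, not the actual dataset schema.

```python
import numpy as np
import pandas as pd

def min_max_scale(w: np.ndarray) -> np.ndarray:
    """Scale a continuous feature to [0, 1]: w' = (w - w_min) / (w_max - w_min)."""
    w_min, w_max = w.min(), w.max()
    return (w - w_min) / (w_max - w_min)

# Hypothetical hourly frame: aggregated load plus weather columns.
df = pd.DataFrame({
    "load": [12.3, np.nan, 9.8, 11.4],
    "temperature": [4.0, 6.5, 3.2, 5.1],
    "weekday": ["Mon", "Mon", "Tue", "Tue"],
})
df["load"] = df["load"].interpolate()                          # linear interpolation for missing values
df["temperature"] = min_max_scale(df["temperature"].values)    # continuous features -> Min-Max scaling
df = pd.get_dummies(df, columns=["weekday"])                   # discrete features -> one-hot encoding
```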

To handle the complexity of EV load sequences, we apply VMD, which adaptively decomposes the load data into intrinsic mode functions (IMFs). VMD mitigates mode mixing and extracts underlying patterns, but its performance depends on parameters like the number of modes \( K \) and the penalty factor \( \alpha \). We optimize these parameters using WOA, which mimics whale foraging behavior. The WOA process involves initializing a population of whales, each representing a parameter set \( (K, \alpha) \), and iteratively updating positions based on fitness evaluation using minimum envelope entropy. The position update equations include:

$$ L^{t+1}(n) = L_{\text{best}}^t + |L_{\text{best}}^t - L^t(n)| \cdot e^{r_1} \cdot \cos(2\pi r_1) \quad \text{for } R \geq 0.5 $$

$$ \Delta L^t(n) = | \gamma \cdot L_{\text{rand}}^t - L^t(n) | \quad \text{for } R < 0.5 \text{ and } |\beta| \geq 1 $$

$$ L^{t+1}(n) = L_{\text{rand}}^t - \beta \cdot \Delta L^t(n) $$

where \( L^t(n) \) is the position of the \( n \)-th whale at iteration \( t \), \( L_{\text{best}}^t \) is the best position found, \( L_{\text{rand}}^t \) is a random position, \( r_1 \) is a random number in \([-1,1]\), \( R \) is a random probability, and \( \beta \) and \( \gamma \) are control parameters. This optimization ensures that VMD decomposes the electric car load into meaningful sub-sequences, enhancing prediction accuracy.
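The sketch below illustrates these update rules under stated assumptions: the fitness function is a toy placeholder standing in for the minimum envelope entropy of the VMD decomposition, and the final `else` branch is the standard WOA encircling step around the best whale, which the equations above omit.

```python
import numpy as np

# Minimal WOA sketch for tuning (K, alpha). The fitness is a toy stand-in; in the
# paper it would run VMD with (K, alpha) and return the minimum envelope entropy.
def fitness(K: float, alpha: float) -> float:
    return (K - 5.0) ** 2 + ((alpha - 1500.0) / 1000.0) ** 2   # placeholder objective

rng = np.random.default_rng(0)
bounds = np.array([[2.0, 10.0], [100.0, 3000.0]])              # search ranges for K and alpha
pop = rng.uniform(bounds[:, 0], bounds[:, 1], size=(10, 2))    # 10 whales, each a (K, alpha) pair
best = min(pop, key=lambda p: fitness(*p)).copy()

T = 50
for t in range(T):
    a = 2.0 * (1.0 - t / T)                                    # control parameter decays from 2 to 0
    for n in range(len(pop)):
        r1 = rng.uniform(-1.0, 1.0)
        R, beta, gamma = rng.random(), a * (2 * rng.random() - 1), 2 * rng.random()
        if R >= 0.5:                                           # spiral update toward the best whale
            pop[n] = best + np.abs(best - pop[n]) * np.exp(r1) * np.cos(2 * np.pi * r1)
        elif abs(beta) >= 1:                                   # exploration around a random whale
            rand = pop[rng.integers(len(pop))].copy()
            pop[n] = rand - beta * np.abs(gamma * rand - pop[n])
        else:                                                  # standard WOA encircling of the best whale
            pop[n] = best - beta * np.abs(gamma * best - pop[n])
        pop[n] = np.clip(pop[n], bounds[:, 0], bounds[:, 1])
        if fitness(*pop[n]) < fitness(*best):
            best = pop[n].copy()

K_opt, alpha_opt = int(round(best[0])), best[1]
print(K_opt, alpha_opt)
```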

In analyzing factors affecting EV load, we consider historical load, date, day of the week, and meteorological features. A multiple linear model is used to identify significant weather factors:

$$ Y = a_1 W_1 + a_2 W_2 + \cdots + a_i W_i + \cdots + a_s W_s + b $$

where \( Y \) is the EV load, \( a_i \) are the coefficients of the meteorological features \( W_i \), and \( b \) is the intercept. Features whose coefficients exceed a threshold (e.g., 0.5) are retained as relevant inputs. For instance, in our experiments, temperature (T), sea-level pressure (P), and 3-hour pressure change (Pa) showed strong correlations with EV charging demand.
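A minimal sketch of this screening step, assuming scikit-learn's `LinearRegression` and synthetic data in place of the real load and weather series:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-ins: W is (n_hours, n_weather_features) of normalized weather data,
# y is the hourly EV load. Feature names follow the order used in Table 1.
rng = np.random.default_rng(1)
W = rng.random((500, 6))
y = 0.8 * W[:, 0] + 1.3 * W[:, 3] + 0.1 * rng.random(500)

model = LinearRegression().fit(W, y)
feature_names = ["T", "Po", "P", "Pa", "U", "Ff"]
selected = [name for name, coef in zip(feature_names, model.coef_) if coef > 0.5]
print(selected)   # features whose fitted coefficient exceeds the 0.5 threshold are kept
```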

The core of our model lies in the improved TCN residual blocks, which incorporate gating mechanisms and dual attention. Standard TCN residual blocks use dilated causal convolutions to capture temporal dependencies. We enhance this by adding a gated unit, similar to GRU, to control information flow:

$$ z_1 = \sigma(W_z * H_1 + b_z) $$
$$ r_1 = \sigma(W_r * H_1 + b_r) $$
$$ \tilde{H}_1 = r_1 \odot \text{state} $$
$$ H_1' = (1 - z_1) \odot H_1 + z_1 \odot \tilde{H}_1 $$
$$ G_1 = \text{Dropout}(\text{ReLU}(H_1')) $$

where \( H_1 \) is the output from the first dilated convolution, \( z_1 \) and \( r_1 \) are update and reset gates, \( \sigma \) is the sigmoid function, \( \odot \) denotes element-wise multiplication, and “state” represents the previous hidden state. This gating mechanism helps in modeling long-range dependencies in electric car load sequences.
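A possible PyTorch rendering of this gated unit is sketched below; treating the gate transforms \( W_z \) and \( W_r \) as 1-D causal convolutions, and the chosen layer sizes, are assumptions for illustration rather than the exact implementation.

```python
import torch
import torch.nn as nn

class GatedUnit(nn.Module):
    """Sketch of the gating step above; Conv1d gates and sizes are assumptions."""
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1, dropout: float = 0.1):
        super().__init__()
        pad = (kernel_size - 1) * dilation   # symmetric padding, trimmed on the right to stay causal
        self.conv_z = nn.Conv1d(channels, channels, kernel_size, padding=pad, dilation=dilation)
        self.conv_r = nn.Conv1d(channels, channels, kernel_size, padding=pad, dilation=dilation)
        self.drop = nn.Dropout(dropout)

    def forward(self, h1: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # h1: output of the first dilated convolution, shape (batch, channels, time)
        # state: previous hidden representation with the same shape
        T = h1.size(-1)
        z1 = torch.sigmoid(self.conv_z(h1)[..., :T])   # update gate
        r1 = torch.sigmoid(self.conv_r(h1)[..., :T])   # reset gate
        h_tilde = r1 * state                           # reset-gated previous state
        h1_new = (1 - z1) * h1 + z1 * h_tilde          # blend new features with gated state
        return self.drop(torch.relu(h1_new))           # G_1 = Dropout(ReLU(H_1'))

# Toy usage: one week of hourly steps, 32 feature channels.
x = torch.randn(8, 32, 168)
g1 = GatedUnit(32)(x, torch.zeros_like(x))
```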

We further integrate time attention and channel attention to emphasize important temporal points and feature channels. Time attention computes weights for each time step:

$$ A_t = \text{softmax}(\text{Flatten}(\tanh(W_t * X + C))) $$

where \( X \) is the input sequence, \( W_t \) is a weight matrix, and \( C \) is a bias term. The attention weights \( A_t \) are then applied to the features:

$$ Y = G_2 \odot A_t' $$

where \( A_t' \) is \( A_t \) repeated and permuted to match the feature dimensions, and \( G_2 \) is the gated output following the second dilated convolution (analogous to \( G_1 \)). Channel attention, on the other hand, uses global average pooling and dense layers to generate channel-wise weights \( A_c \):

$$ Y' = \text{GlobalAveragePooling}(Y) $$
$$ A_c = \text{Dense}(\text{Dense}(\text{Reshape}(Y'))) $$
$$ \text{Res} = X + Y \odot A_c $$

This dual attention mechanism allows the model to focus on critical aspects of the data, improving the feature representation used for EV load forecasting.
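The sketch below shows one way to realize this dual attention in PyTorch; the squeeze-and-excitation-style layer sizes and activations are assumptions, since the equations above do not fix them. Here `x` is the block input and `g2` the gated output of the second dilated convolution.

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Sketch of time + channel attention inside a residual block; inputs are (batch, channels, time)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.time_score = nn.Conv1d(channels, 1, kernel_size=1)   # W_t * X + C, one score per time step
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x: torch.Tensor, g2: torch.Tensor) -> torch.Tensor:
        # --- time attention: one weight per time step ---
        a_t = torch.softmax(torch.tanh(self.time_score(x)).flatten(1), dim=-1)   # (batch, time)
        y = g2 * a_t.unsqueeze(1)                 # broadcast over channels (the repeat/permute step)
        # --- channel attention: pooled, two dense layers, channel-wise weights ---
        pooled = y.mean(dim=-1)                   # global average pooling over time
        a_c = torch.sigmoid(self.fc2(torch.relu(self.fc1(pooled))))              # (batch, channels)
        return x + y * a_c.unsqueeze(-1)          # residual connection: Res = X + Y * A_c

# Toy usage with random tensors of shape (batch, channels, time).
res = DualAttention(32)(torch.randn(8, 32, 168), torch.randn(8, 32, 168))
```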

For multi-scale feature fusion, we deploy multiple improved residual blocks with varying kernel sizes (e.g., 3, 5, 7) and dilation rates (e.g., 1, 2, 4). Each block processes the input at different temporal scales, and their outputs are combined:

$$ \text{Res}_{\text{mul}} = \text{Res}_1 + \text{Res}_2 + \text{Res}_3 $$

An attention layer then dynamically weights these multi-scale features:

$$ A_d = \text{Dense}(\text{softmax}(\text{Permute}(\text{Res}_{\text{mul}}))) $$
$$ \text{Res}_{\text{fusion}} = \text{Flatten}(\text{Dropout}(\text{Res}_{\text{mul}} \odot A_d)) $$
$$ V = \text{Dense}(\text{Dense}(\text{Dense}(\text{Res}_{\text{fusion}}))) $$

This fusion captures diverse patterns in electric car load data, from short-term fluctuations to long-term trends, enhancing the model’s robustness.
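A compact PyTorch sketch of the fusion stage follows, with the residual-block internals omitted; the attention layer, dropout rate, and dense-layer sizes are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """Sums three residual-block outputs, weights them with attention, and maps to the prediction."""
    def __init__(self, channels: int, time_steps: int, hidden: int = 64):
        super().__init__()
        self.attn = nn.Linear(time_steps, time_steps)      # stand-in for the Dense/softmax weighting A_d
        self.drop = nn.Dropout(0.1)
        self.head = nn.Sequential(
            nn.Linear(channels * time_steps, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),                          # predicted load for the next step
        )

    def forward(self, res1, res2, res3):
        res_mul = res1 + res2 + res3                       # element-wise sum of the three scales
        a_d = torch.softmax(self.attn(res_mul), dim=-1)    # attention weights over time positions
        fused = self.drop(res_mul * a_d).flatten(1)        # weighted features, flattened
        return self.head(fused)

# Toy usage: three scale-specific block outputs of shape (batch, channels, time).
model = MultiScaleFusion(channels=32, time_steps=168)
v = model(torch.randn(4, 32, 168), torch.randn(4, 32, 168), torch.randn(4, 32, 168))  # -> (4, 1)
```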

In our experimental setup, we use a dataset from the ElaadNL project, containing EV charging records from January 1 to June 30, 2019, with 163,255 entries. Meteorological data for the same period in the Netherlands, including features such as temperature and pressure, are sampled hourly. We preprocess the data by aggregating it into hourly loads and normalizing the features. The last 1,200 hours are used as the test set. Experiments are run on an AMD Ryzen 7 system with an RTX 4060 GPU, using PyCharm 2022. The model is trained for 50 epochs with the Adam optimizer and mean absolute error as the loss function. The residual blocks are configured with kernel sizes of 3, 5, and 7 and dilation rates of 1, 2, and 4 to capture the multi-scale temporal features of the EV load.
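For reference, a minimal training-loop sketch matching the stated configuration (Adam optimizer, MAE loss, 50 epochs) is shown below; the placeholder model, window length, and synthetic data are assumptions, not the actual network or dataset.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data: 24 lagged hourly inputs per sample (assumed window length).
x = torch.randn(512, 24)
y = torch.randn(512, 1)
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(24, 64), nn.ReLU(), nn.Linear(64, 1))   # placeholder network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()                                                   # mean absolute error

for epoch in range(50):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()
```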

To evaluate performance, we use mean squared error (MSE), mean absolute error (MAE), and R-squared (R²) metrics:

$$ \text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_{\text{pred},i} - y_{\text{true},i})^2 $$
$$ \text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_{\text{pred},i} - y_{\text{true},i}| $$
$$ R^2 = 1 - \frac{\sum_{i=1}^{N} (y_{\text{pred},i} - y_{\text{true},i})^2}{\sum_{i=1}^{N} (y_{\text{true},i} - \bar{y})^2} $$

where \( N \) is the number of samples, \( y_{\text{pred},i} \) is the predicted load, \( y_{\text{true},i} \) is the actual load, and \( \bar{y} \) is the mean of the actual load. Together, these metrics provide a comprehensive view of model accuracy.
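These metrics can be computed directly from the predictions, for example:

```python
import numpy as np

def evaluate(y_pred: np.ndarray, y_true: np.ndarray) -> dict:
    """Compute MSE, MAE, and R² exactly as defined above."""
    err = y_pred - y_true
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MSE": mse, "MAE": mae, "R2": r2}

# Example on dummy hourly loads (kW).
print(evaluate(np.array([10.2, 11.9, 9.5]), np.array([10.0, 12.0, 9.8])))
```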

The analysis of meteorological factors reveals key influencers on EV load. Using the multiple linear model, we identify features with coefficients above 0.5 as significant, as shown in Table 1.

Table 1: Impact of Meteorological Factors on Electric Vehicle Load

| Feature | Coefficient |
|---|---|
| T (Temperature) | 0.81 |
| Po (Horizontal Pressure) | -0.86 |
| P (Sea-Level Pressure) | 0.85 |
| Pa (3-Hour Pressure Change) | 1.33 |
| U (Relative Humidity) | 0.05 |
| Ff (Average Wind Speed) | 0.28 |

Features such as temperature and pressure changes have substantial effects, highlighting the importance of weather data in EV load forecasting.

WOA-optimized VMD decomposes the EV load sequence into five IMFs, reducing complexity and improving prediction quality. The optimization process uses a whale population size of 10 and a maximum of 50 iterations, with \( K \) ranging from 2 to 10 and \( \alpha \) from 100 to 3000. This results in a decomposition that effectively separates noise and trends, facilitating more accurate modeling of electric car load patterns.
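A hedged sketch of the decomposition and fitness evaluation is given below, assuming the `vmdpy` package for VMD and SciPy's Hilbert transform for the envelope; the load series is synthetic, and the example values of \( K \) and \( \alpha \) merely lie within the stated search ranges rather than being the WOA-optimized result.

```python
import numpy as np
from scipy.signal import hilbert
from vmdpy import VMD   # assumes the vmdpy package; any VMD implementation could be substituted

def envelope_entropy(imf: np.ndarray) -> float:
    """Envelope entropy of one IMF, used as the WOA fitness (lower is better)."""
    env = np.abs(hilbert(imf))
    p = env / env.sum()
    return float(-np.sum(p * np.log(p + 1e-12)))

# Synthetic stand-in for the hourly EV load series.
load = np.sin(np.linspace(0, 60, 2000)) + 0.3 * np.random.default_rng(2).standard_normal(2000)

K, alpha = 5, 2000                                    # example values within K in [2, 10], alpha in [100, 3000]
imfs, _, _ = VMD(load, alpha, 0.0, K, 0, 1, 1e-7)     # tau=0, DC=0, init=1, tol=1e-7
fitness = min(envelope_entropy(imf) for imf in imfs)  # minimum envelope entropy across the K IMFs
print(fitness)
```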

We compare our VMD-AM-MSF-TCNnet model against several benchmarks, including GRU, LSTM, CNN-Attention-LSTM (CNN-AT-LSTM), EMD-LSTM, VMD-CNN, VMD-FCA-TCN, VMD-LSTM, and ICEEMDAN-TCN-Attention-BiGRU (ITCN-AT-BiGRU). The prediction errors are summarized in Table 2.

Table 2: Prediction Error Comparison of Different Models

| Model | MSE (kW²) | MAE (kW) | R² |
|---|---|---|---|
| GRU | 46.28 | 5.02 | 0.618 |
| LSTM | 43.26 | 4.89 | 0.643 |
| CNN-AT-LSTM | 41.15 | 4.67 | 0.653 |
| EMD-LSTM | 25.84 | 3.80 | 0.787 |
| VMD-CNN | 7.31 | 2.05 | 0.937 |
| VMD-FCA-TCN | 6.93 | 1.97 | 0.940 |
| VMD-LSTM | 4.53 | 1.63 | 0.960 |
| ITCN-AT-BiGRU | 3.43 | 1.42 | 0.970 |
| Proposed Model | 1.47 | 0.91 | 0.987 |

Our model achieves the lowest MSE and MAE and the highest R², indicating superior performance. For instance, MSE is reduced by 96.82% compared to GRU and by 57.14% compared to ITCN-AT-BiGRU, demonstrating the effectiveness of integrating VMD, attention, and multi-scale features for EV load prediction.

Ablation studies further validate the contributions of each component. We test variants including the base TCN, TCN with VMD (VT), VT with improved residual blocks (VT-IR), VT with multi-scale residual blocks (VT-MSR), and the full model. Results are shown in Table 3.

Table 3: Ablation Study Prediction Error Comparison

| Model | MSE (kW²) | MAE (kW) | R² |
|---|---|---|---|
| TCN | 47.55 | 4.97 | 0.607 |
| VT | 7.59 | 2.09 | 0.934 |
| VT-IR | 2.39 | 1.21 | 0.980 |
| VT-MSR | 1.67 | 1.01 | 0.985 |
| Proposed Model | 1.47 | 0.91 | 0.987 |

Adding VMD to the TCN (VT) reduces MSE by 84.04%, highlighting the benefits of decomposition. Incorporating the improved residual blocks (VT-IR) cuts MSE by a further 68.51% relative to VT, while using multi-scale residual blocks instead (VT-MSR) reduces it by 78.00% relative to VT. The full model achieves the best results, underscoring the synergy of all components in handling the highly variable and complex patterns of EV charging load.

In conclusion, our VMD-AM-MSF-TCNnet model significantly enhances electric vehicle load prediction by leveraging VMD for data decomposition, attention mechanisms for feature refinement, and multi-scale fusion for comprehensive temporal analysis. WOA optimizes the decomposition parameters, while the improved TCN residual blocks with gating and attention capture essential patterns. Experimental results confirm substantial improvements in accuracy, making the model a robust tool for managing the growing charging demand of electric vehicles, not least in rapidly expanding markets such as China. Future work could incorporate additional features such as charging prices to further refine predictions, supporting the sustainable integration of electric vehicles into smart grids.
