As a pivotal tool for achieving sustainable development and meeting future energy demands, electric vehicles offer a low-carbon, energy-efficient transportation solution. However, the electric vehicle industry in China remains underdeveloped, facing challenges such as low utilization of charging piles, irrational layout of charging stations, slow construction of charging infrastructure, and disparities in charging costs. These issues significantly hinder the widespread adoption of electric vehicles. Accurate prediction of charging volume at electric vehicle charging stations is therefore crucial for optimizing station planning, enhancing operational efficiency, and improving user experience. Nevertheless, newly built or renovated charging stations often suffer from data scarcity, including missing data for certain periods and insufficient historical records, which makes precise prediction difficult. Additionally, shallow neural network models struggle to capture the complex and variable input features inherent in charging data. To address these challenges, we propose a novel method for predicting the charging volume of electric vehicle charging stations based on multi-feature extraction and multi-level transfer learning. The approach integrates user charging behavior features extracted via clustering with environmental and economic factors, employs a multi-scale hybrid temporal convolutional network-bidirectional long short-term memory network with an attention mechanism (TCN-BiLSTM-Attention) for feature learning, and applies a multi-level transfer learning strategy that leverages data from source charging stations to overcome the small-sample limitation of the target domain.

The rapid growth of the electric vehicle market in China underscores the importance of efficient charging infrastructure. However, the unpredictability of charging demand at new or renovated stations poses operational challenges. Traditional forecasting methods often fail when historical data is limited, necessitating innovative approaches that can generalize from related domains. Our method focuses on extracting multifaceted features, including temporal patterns, user behavior, and external factors, to build a robust prediction model. By leveraging transfer learning, we aim to transfer knowledge from data-rich source domains to data-scarce target domains, enhancing prediction accuracy and reliability. This work contributes to the optimization of electric vehicle charging station management, supporting the broader goal of sustainable transportation in China.
In the context of electric vehicle charging prediction, feature extraction plays a critical role. We utilize the K-Means clustering algorithm to analyze charging order data from multiple charging stations, identifying distinct user charging behaviors. The clustering process computes the Euclidean distance between data points and cluster centers with the goal of minimizing the intra-cluster variance. The clustering objective \( D \), the sum of squared Euclidean distances over the cluster sets \( C_i \) (where \( i = 1, 2, \ldots, K \)), is defined as:
$$ D = \sum_{i=1}^{K} \sum_{s \in C_i} \| s - \mu_i \|^2 $$
Here, \( s \) represents a data point, \( \mu_i \) is the centroid of cluster \( C_i \), and \( \| \cdot \| \) denotes the Euclidean norm. To evaluate the clustering quality, we use the silhouette coefficient \( S(t) \), which measures how similar a data point is to its own cluster compared to other clusters. For a data point \( t \), the silhouette coefficient is calculated as:
$$ S(t) = \frac{b(t) - a(t)}{\max\{a(t), b(t)\}} $$
where \( a(t) \) is the average distance from \( t \) to the other points in its own cluster, and \( b(t) \) is the minimum average distance from \( t \) to the points of any other cluster. A value of \( S(t) \) close to 1 indicates that the point is well matched to its own cluster and well separated from neighboring clusters. Based on empirical analysis, we set the number of clusters to \( K = 4 \), which yields distinct user groups with characteristic charging patterns, such as peak charging during specific hours. These clustered features are integrated with other influencing factors, including timestamp data (year, month, day, hour), day type (workday, weekend, holiday), environmental data (maximum and minimum temperature, weather type), and economic data (electricity fee, service fee). This multi-feature input set enhances the model's ability to capture complex dependencies in the charging volume data. The table below compares silhouette coefficients and clustering times for different combinations of input features.
| Clustering Input Features | Silhouette Coefficient | Clustering Time (s) |
|---|---|---|
| Start Time | 0.365 | 5.42 |
| End Time | 0.356 | 5.52 |
| Start Time + End Time | 0.370 | 9.37 |
| Start Time + End Time + Duration | 0.372 | 10.24 |
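To make this step concrete, the following is a minimal scikit-learn sketch of the clustering and its silhouette evaluation; the synthetic order features (start time, end time, duration) and their value ranges are placeholders for the real charging-order records.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Placeholder charging-order features (start hour, end hour, duration);
# the real inputs are the order records from the charging stations.
rng = np.random.default_rng(0)
orders = np.column_stack([
    rng.uniform(0, 24, 1000),    # charging start time (hour of day)
    rng.uniform(0, 24, 1000),    # charging end time (hour of day)
    rng.uniform(0.5, 8, 1000),   # charging duration (hours)
])

# K = 4 clusters, as selected empirically in the text.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(orders)

# Mean silhouette coefficient over all points; values near 1 indicate
# compact, well-separated clusters.
print("silhouette:", silhouette_score(orders, kmeans.labels_))
```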
The proposed prediction model, the multi-scale hybrid TCN-BiLSTM-Attention, is designed to handle the temporal complexities of electric vehicle charging data. The model architecture begins with an input layer that processes normalized sequences of charging volume and associated features over a sliding window of size \( T \). At time \( t \), the input window \( X_t \) is represented as:
$$ X_t = \{x_{t-T+1}, x_{t-T+2}, \ldots, x_t\} $$
where each \( x_i \) is the multi-dimensional feature vector at step \( i \). The input is then passed through a multi-scale hybrid TCN layer, which consists of parallel TCN modules with different kernel sizes (1, 4, and 7) to capture features at various scales. The hybrid TCN incorporates dilated convolutions with increasing and then decreasing dilation factors, expanding the receptive field without additional parameters or computational cost. The output of the TCN layer for a given kernel size \( k \) is computed as:
$$ S_k = T_k(X_t) $$
where \( T_k \) denotes the TCN operation with kernel size \( k \) applied to the input window \( X_t \). The outputs from the different kernel sizes are concatenated to form a comprehensive feature map \( S \):
$$ S = C(S_1, S_2, S_3) $$
Here, \( C(\cdot) \) represents the concatenation operation. This multi-scale approach allows the model to capture both short-term and long-term dependencies in the charging data.
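As an illustration, a minimal Keras sketch of this multi-scale layer might look as follows; the simplified branches use stacked causal dilated convolutions in place of full TCN residual blocks, and the dilation schedule (1, 2, 4) is an assumption on our part.

```python
import tensorflow as tf
from tensorflow.keras import layers

T, F = 6, 14  # sliding window size and feature dimension (see parameter table)

def tcn_branch(x, kernel_size, dilations=(1, 2, 4)):
    # Simplified TCN branch: stacked causal dilated convolutions.
    # A full TCN block would add residual connections and normalization.
    for d in dilations:
        x = layers.Conv1D(64, kernel_size, dilation_rate=d,
                          padding="causal", activation="relu")(x)
    return x

inputs = layers.Input(shape=(T, F))
# Parallel branches with kernel sizes 1, 4, and 7 extract features at
# different temporal scales; concatenation yields S = C(S1, S2, S3).
S = layers.Concatenate()([tcn_branch(inputs, k) for k in (1, 4, 7)])
```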
Subsequently, the feature map \( S \) is fed into a two-layer BiLSTM network to model temporal dependencies bidirectionally. The BiLSTM computes forward and backward hidden states \( \overrightarrow{h_t} \) and \( \overleftarrow{h_t} \) at each time step \( t \), using LSTM units with input, forget, and output gates. The LSTM gate mechanisms are defined as follows:
$$ f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) $$
$$ i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) $$
$$ o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) $$
$$ g_t = \tanh(W_g \cdot [h_{t-1}, x_t] + b_g) $$
$$ C_t = f_t \cdot C_{t-1} + i_t \cdot g_t $$
$$ h_t = o_t \cdot \tanh(C_t) $$
where \( \sigma \) is the sigmoid activation function, and the \( W \) and \( b \) terms are the corresponding weight matrices and bias vectors. The BiLSTM combines the forward and backward passes to produce the hidden state \( h_t \):
$$ h_t = \sigma(W_h \cdot [\overrightarrow{h_t}, \overleftarrow{h_t}] + b_h) $$
This bidirectional processing enables the model to leverage both past and future context for improved prediction. An attention layer is then applied to weigh the importance of different time steps, enhancing the model’s focus on critical features. The attention mechanism computes a context vector \( A \) as a weighted sum of BiLSTM outputs:
$$ A = \sum_{t=1}^{T} \alpha_t h_t $$
where \( \alpha_t \) is the attention weight for time step \( t \), calculated using a softmax function over learned scores. Finally, a fully connected layer with a sigmoid activation function generates the predicted charging volume \( \hat{y}_t \):
$$ \hat{y}_t = \sigma(W_a \cdot A + b_a) $$
The output layer denormalizes the predictions to obtain the final charging volume forecasts.
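Continuing the Keras sketch above, the remaining layers might be wired up as follows; the additive (tanh-scored) attention is one common realization and an assumption on our part, as is reusing the hyperparameters from the parameter table below.

```python
# Two-layer BiLSTM over the multi-scale feature map S from the sketch above.
h = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(S)
h = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(h)

# Attention pooling: score each time step, softmax over time, and form
# the context vector A = sum_t alpha_t * h_t.
scores = layers.Dense(1, activation="tanh")(h)      # (batch, T, 1)
alpha = layers.Softmax(axis=1)(scores)              # attention weights alpha_t
A = layers.Lambda(lambda z: tf.reduce_sum(z[0] * z[1], axis=1))([alpha, h])

# Sigmoid output matches the min-max normalized target in [0, 1];
# predictions are denormalized afterwards.
outputs = layers.Dense(1, activation="sigmoid")(A)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")
```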
To address data scarcity in target charging stations, we implement a multi-level transfer learning strategy. First, source domain data from multiple charging stations are ranked based on their relevance to the target domain using the cross-entropy error function. The cross-entropy \( L \) between the predicted probability distribution \( \hat{y} \) and the actual distribution \( y \) is given by:
$$ L(\hat{y}, y) = -[y \log(\hat{y}) + (1 - y) \log(1 - \hat{y})] $$
Source stations are divided into \( M \) levels (where \( M = 7 \), matching the number of model layers) in descending order of cross-entropy, so that relevance to the target domain increases from level to level. The multi-level transfer learning process trains the model sequentially on the data of each level, starting from the least relevant. At each level \( m \), the model weights \( w_{m-1} \) from the previous level are loaded, and new weights \( w_m \) are obtained through training. If the prediction error decreases, \( w_m \) is retained; otherwise, \( w_{m-1} \) is kept. This iterative process lets the model adapt progressively to the target domain while minimizing negative transfer (a sketch of this loop is given below). Additionally, we incorporate adaptive \( L_2 \) regularization to prevent overfitting. The overall loss function \( L \) combines the mean squared error with a regularization term:
$$ L = \frac{1}{2} \sum_{t=1}^{T} (y_t^o – y_t^d)^2 + \frac{\lambda}{2Q} \| W \|^2 $$
where \( y_t^o \) and \( y_t^d \) are the predicted and actual charging volumes, respectively, \( Q \) is a normalization constant, and \( \lambda \) is a penalty factor adjusted according to the weight dispersion \( e_W \) and the prediction error change rate \( \Delta e \):
$$ e_W = \frac{1}{Z-1} \sum_{m=1}^{Z} (W_m - \bar{W})^2 $$
$$ \Delta e = \frac{e_{t+1} - e_t}{e_t} $$
$$ \lambda = \Delta e \cdot e_W $$
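Read literally, the adjustment can be computed as in the NumPy sketch below; treating the flattened model weights as the \( W_m \) and using validation errors from consecutive epochs for \( \Delta e \) is our interpretation, not a detail stated in the text.

```python
import numpy as np

def adaptive_lambda(weights, err_prev, err_curr):
    # Weight dispersion e_W: sample variance of the individual weights
    # (flattening the model weights into one vector is our interpretation).
    w = np.ravel(weights)
    e_w = np.sum((w - w.mean()) ** 2) / (w.size - 1)
    # Prediction error change rate between consecutive epochs.
    delta_e = (err_curr - err_prev) / err_prev
    return delta_e * e_w  # lambda = delta_e * e_W
```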
This dynamic adjustment enhances model generalization. Furthermore, a sensitivity-based neuron growth strategy is employed in the BiLSTM layers to optimize the network structure. The sensitivity \( \phi_k(x_n) \) of neuron \( k \) for input \( x_n \) is computed from the neuron's output gate activation \( o_t \) and cell state \( C_t \) as:
$$ \phi_k(x_n) = \tanh(o_t \cdot C_t) $$
If all neuron sensitivities fall below the threshold \( \theta_0 = 0.4 \), new neurons are added, with their weights initialized as the average of the weights of the highest-sensitivity neurons, ensuring rapid integration into the network.
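A compact sketch of the level-wise transfer loop described above is shown next; the per-level datasets, the validation split, and the use of `model.evaluate` as the error measure are placeholders for the actual pipeline.

```python
def multi_level_transfer(model, levels, x_val, y_val):
    """Fine-tune sequentially on source levels ordered from least to most
    relevant; keep the new weights only when the validation error drops.
    `levels` is a list of (x, y) training arrays, one per level."""
    best_weights = model.get_weights()
    best_err = model.evaluate(x_val, y_val, verbose=0)
    for x_src, y_src in levels:           # least relevant level first
        model.set_weights(best_weights)   # load w_{m-1}
        model.fit(x_src, y_src, epochs=200, batch_size=32, verbose=0)
        err = model.evaluate(x_val, y_val, verbose=0)
        if err < best_err:                # retain w_m only if error decreases
            best_err, best_weights = err, model.get_weights()
    model.set_weights(best_weights)
    return model
```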
| Parameter | Value |
|---|---|
| Sliding Window Size | 6 |
| Feature Dimension | 14 |
| Batch Size | 32 |
| Epochs | 200 |
| TCN Kernel Sizes | 1, 4, 7 |
| TCN Filters | 64 |
| BiLSTM Units | 128 |
| Learning Rate | 1e-4 |
| Optimizer | Adam |
We evaluate our method using real-world data from charging stations in China, focusing on electric vehicle charging volumes from January to August 2022. The dataset includes charging orders, weather conditions, and economic factors. Three scenarios simulate data scarcity: missing data from January-April, March-June, and May-August. The input features are normalized using min-max scaling:
$$ x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} $$
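In scikit-learn this step is the standard `MinMaxScaler`; fitting the scaler on training data only and reusing its extrema at test time is the assumed protocol, and the feature matrices below are placeholders.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

x_train = np.random.rand(100, 14) * 50    # placeholder feature matrices
x_test = np.random.rand(20, 14) * 50

scaler = MinMaxScaler()                    # maps each feature to [0, 1]
x_train_s = scaler.fit_transform(x_train)  # fit min/max on training data only
x_test_s = scaler.transform(x_test)        # reuse the training extrema
```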
Performance metrics include mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (\( R^2 \)):
$$ \text{MAE} = \frac{1}{N} \sum_{i=1}^{N} | \hat{y}_i - y_i | $$
$$ \text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2} $$
$$ R^2 = 1 - \frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} $$
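The three metrics translate directly into NumPy:

```python
import numpy as np

def evaluate_metrics(y_true, y_pred):
    # MAE, RMSE, and R^2 exactly as defined above.
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)
    return mae, rmse, r2
```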
Experimental results demonstrate that incorporating user charging behavior features significantly improves prediction accuracy. For instance, compared to models without these features, MAE decreases by 49.61%, RMSE by 49.67%, and \( R^2 \) increases by 3.13%. The multi-scale hybrid TCN-BiLSTM-Attention model outperforms baseline models like SVR, TCN, LSTM, and TCN-LSTM, achieving the lowest errors. The multi-level transfer learning strategy further enhances performance, reducing MAE by 10.75% and RMSE by 13.73% compared to direct transfer, with \( R^2 \) reaching 0.989. These findings validate the effectiveness of our approach in addressing data scarcity and complex feature interactions in electric vehicle charging prediction.
| Missing Period | Transfer Strategy | MAE (kWh) | RMSE (kWh) | R² |
|---|---|---|---|---|
| January-April | No Transfer | 73.96 | 94.57 | 0.872 |
| January-April | Direct Transfer | 24.17 | 31.24 | 0.985 |
| January-April | Multi-Level Transfer | 21.32 | 27.48 | 0.989 |
| March-June | No Transfer | 84.45 | 106.38 | 0.858 |
| March-June | Direct Transfer | 33.59 | 42.01 | 0.973 |
| March-June | Multi-Level Transfer | 26.64 | 33.72 | 0.980 |
| May-August | No Transfer | 71.43 | 91.02 | 0.874 |
| May-August | Direct Transfer | 23.07 | 30.29 | 0.986 |
| May-August | Multi-Level Transfer | 20.59 | 26.13 | 0.989 |
In conclusion, our method effectively predicts the charging volume of electric vehicle charging stations by integrating multi-feature extraction and multi-level transfer learning. The use of K-Means clustering to derive user charging behavior features enriches the input data, while the multi-scale hybrid TCN-BiLSTM-Attention model captures complex temporal patterns. The multi-level transfer learning strategy mitigates data scarcity by leveraging source domain knowledge in a structured manner. Experimental results confirm that the approach achieves high accuracy and robustness across the data-missing scenarios, contributing to the efficient management and planning of electric vehicle charging infrastructure in China. Future work could explore real-time adaptation and integration with smart grid technologies to further enhance the scalability of electric vehicle charging solutions.