The rapid global adoption of battery electric vehicles (BEVs) represents a pivotal shift towards sustainable transportation. However, the large-scale, uncoordinated charging of these vehicles poses significant operational challenges for existing power grids. The inherently stochastic nature of user charging behavior, compounded by diverse external factors such as weather conditions, time-of-use electricity tariffs, and holiday schedules, creates highly volatile and complex load profiles at public charging stations. Accurate short-term forecasting of this charging demand is therefore not merely an academic exercise but a critical necessity for grid stability, efficient energy management, and the planning of future charging infrastructure. Traditional forecasting methods often fall short in this complex, multi-variable environment, struggling to capture the intricate spatial and temporal dependencies inherent in the charging behavior of large BEV populations.

To address these challenges, this article proposes a novel short-term load forecasting framework specifically designed for clusters of BEV charging stations. The core innovation lies in a two-stage process that effectively reconstructs multi-source external data and leverages advanced deep learning architectures for precise temporal prediction. In the first stage, we construct a functional graph representing the charging stations, not based on precise geographical coordinates—which are often unavailable due to privacy concerns—but on the similarity of their historical load patterns. This is quantified using the Pearson correlation coefficient between station load sequences. A Graph Attention Network (GAT) is then employed to process this graph alongside homogeneous external features (e.g., city-wide weather, tariff signals). The GAT dynamically reconstructs these features into differentiated, station-specific representations, effectively encoding the spatial couplings and reducing information redundancy. In the second stage, these reconstructed multi-source features serve as exogenous variables, while the target station’s historical load acts as the endogenous variable. Both are fed into a TimeXer model, a transformer-based architecture enhanced for time series forecasting with exogenous inputs. TimeXer’s dual-attention mechanism separately captures internal temporal dependencies and cross-influences from external factors, leading to highly accurate predictions.
The Forecasting Framework: GAT-TimeXer
The overall workflow of the proposed GAT-TimeXer framework is illustrated in the following diagram and consists of four primary phases: data acquisition, preprocessing, model training, and evaluation.
The process begins with collecting multi-source data, including historical charging power sequences from multiple stations, and external information such as temperature, humidity, precipitation, time-of-day, day-of-week, holiday indicators, and electricity price. The raw data undergoes preprocessing involving data cleaning, aggregation of second-level transactions into hourly load, normalization, and the construction of an adjacency matrix based on load similarity. The preprocessed historical load forms the endogenous variable, while all other multi-source data forms the exogenous variable set. The adjacency matrix and the raw exogenous features are input into the GAT module for feature reconstruction. The output of the GAT—the reconstructed, station-differentiated exogenous features—is then paired with the endogenous variable (the target station’s history) and fed into the TimeXer model for training. Finally, the trained model is used to forecast future load (e.g., 24 to 96 hours ahead), and its performance is rigorously evaluated against multiple benchmarks and through ablation studies.
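The aggregation and normalization steps described above can be sketched with pandas. This is a minimal illustration only; the column names `timestamp` and `power_kw` are assumptions, not the actual schema of the dataset.

```python
import numpy as np
import pandas as pd

def transactions_to_hourly_load(df: pd.DataFrame) -> pd.Series:
    """Aggregate second-level charging records into an hourly load profile.

    Assumes illustrative columns 'timestamp' and 'power_kw'; hours with no
    transactions are filled with zero load.
    """
    df = df.set_index(pd.to_datetime(df["timestamp"]))
    # Mean power within each hour approximates the hourly average load.
    return df["power_kw"].resample("1h").mean().fillna(0.0)

def zscore_normalize(x: pd.Series) -> pd.Series:
    """Standard z-score normalization applied before model training."""
    return (x - x.mean()) / (x.std() + 1e-8)

# Toy usage with two hours of synthetic second-level transactions
ts = pd.date_range("2023-01-01", periods=7200, freq="s")
df = pd.DataFrame({"timestamp": ts, "power_kw": np.random.rand(len(ts)) * 60})
load = transactions_to_hourly_load(df)   # 2 hourly values
norm = zscore_normalize(load)
```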
Model Components and Methodology
1. Quantifying Station Similarity for Graph Construction
In the absence of precise geographical data, we infer relationships between charging stations based on the functional similarity of their load profiles, which often correlates with geographical or usage-pattern proximity. For any two stations \(i\) and \(j\), we calculate the Pearson correlation coefficient \(r_{ij}\) between their hourly average power sequences, \(P_i\) and \(P_j\), over a historical period \(T\):
$$
r_{ij} = \frac{\sum_{t=1}^{T}(P_{i,t} - \bar{P}_i)(P_{j,t} - \bar{P}_j)}{\sqrt{\sum_{t=1}^{T}(P_{i,t} - \bar{P}_i)^2 \sum_{t=1}^{T}(P_{j,t} - \bar{P}_j)^2}}
$$
where \(P_{i,t}\) is the power for station \(i\) at time \(t\), and \(\bar{P}_i\) is its mean power. A threshold (e.g., \(|r_{ij}| > 0.6\)) is applied to binarize the matrix and create an adjacency matrix \(\mathbf{A}\), where \(A_{ij}=1\) indicates a strong functional link. This matrix defines the graph structure for the subsequent GAT model, connecting stations that serve similar BEV user populations.
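A minimal sketch of this adjacency construction in NumPy (the diagonal is zeroed here, a design choice; GAT implementations typically add self-loops back during message passing):

```python
import numpy as np

def build_adjacency(loads: np.ndarray, threshold: float = 0.6) -> np.ndarray:
    """Binarize the Pearson correlation matrix of station load sequences.

    loads: (N, T) array of hourly average power per station.
    Returns an (N, N) 0/1 adjacency matrix with the diagonal zeroed.
    """
    r = np.corrcoef(loads)                    # pairwise Pearson r_ij
    A = (np.abs(r) > threshold).astype(int)
    np.fill_diagonal(A, 0)                    # a station is not its own neighbor
    return A

# Toy example: stations 0 and 1 share a load pattern, station 2 does not
rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, 8 * np.pi, 168))
loads = np.stack([
    base + 0.1 * rng.standard_normal(168),
    base + 0.1 * rng.standard_normal(168),
    rng.standard_normal(168),
])
A = build_adjacency(loads)   # A[0, 1] == 1, A[0, 2] == 0
```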
2. Feature Reconstruction with Graph Attention Network (GAT)
The Graph Attention Layer (GAL) is the core of the GAT. For a graph with node features \(\mathbf{h} = \{ \vec{h}_1, \vec{h}_2, \dots, \vec{h}_N \}, \vec{h}_i \in \mathbb{R}^F\), a shared linear transformation parameterized by a weight matrix \(\mathbf{W} \in \mathbb{R}^{F' \times F}\) is applied first. The attention coefficient \(e_{ij}\) between nodes \(i\) and \(j\), which indicates the importance of node \(j\)’s features to node \(i\), is computed as:
$$
e_{ij} = \text{LeakyReLU}\left(\vec{a}^T [\mathbf{W}\vec{h}_i \| \mathbf{W}\vec{h}_j] \right)
$$
where \(\vec{a} \in \mathbb{R}^{2F'}\) is a weight vector for the attention mechanism and \(\|\) denotes concatenation. These coefficients are normalized across all neighbors \(j \in \mathcal{N}_i\) of node \(i\) using the softmax function to obtain the final attention weights \(\alpha_{ij}\):
$$
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}
$$
The output feature for node \(i\) is a weighted sum of the transformed features of its neighbors, followed by a nonlinearity (e.g., ELU):
$$
\vec{h}_i' = \text{ELU}\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij} \mathbf{W} \vec{h}_j\right)
$$
We employ multi-head attention to stabilize the learning process. For \(K\) heads, their outputs are either concatenated (in intermediate layers) or averaged (in the final layer) to produce the final node representation. The GAT model takes the raw, homogeneous exogenous feature matrix \(\mathbf{X}_{exo} \in \mathbb{R}^{N \times T \times F_{exo}}\) and the adjacency matrix \(\mathbf{A}\) as input. Through training via a simple auxiliary prediction task, it learns to output a reconstructed feature matrix \(\mathbf{X}_{exo}' \in \mathbb{R}^{N \times T \times F'}\), where the features are now distinct for each station, implicitly containing information about the influence from its functionally similar neighbors. This step is crucial for tailoring general external factors like weather to the specific context of BEV charging at each location.
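A minimal single-head version of this layer can be sketched in NumPy. Random weights stand in for learned parameters; the actual model uses multiple heads and is trained end to end.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def gat_layer(H, A, W, a):
    """Single-head graph attention layer following the equations above.

    H: (N, F) node features, A: (N, N) adjacency, W: (F_out, F) shared
    weights, a: (2*F_out,) attention vector. Self-loops are added, a
    common GAT convention so each node also attends to itself.
    """
    N = H.shape[0]
    Wh = H @ W.T                                   # shared linear transform
    A = A + np.eye(N)                              # add self-loops
    H_out = np.zeros_like(Wh)
    for i in range(N):
        nbrs = np.flatnonzero(A[i])
        # e_ij = LeakyReLU(a^T [W h_i || W h_j])
        e = np.array([leaky_relu(a @ np.concatenate([Wh[i], Wh[j]]))
                      for j in nbrs])
        alpha = softmax(e)                         # normalize over N_i
        agg = alpha @ Wh[nbrs]                     # weighted neighbor sum
        H_out[i] = np.where(agg > 0, agg, np.expm1(agg))  # ELU
    return H_out

# Toy usage: 4 stations with 3 raw features each, F_out = 2
rng = np.random.default_rng(1)
H = rng.standard_normal((4, 3))
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
W = rng.standard_normal((2, 3))
a = rng.standard_normal(4)                         # length 2 * F_out
H_new = gat_layer(H, A, W, a)                      # shape (4, 2)
```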
3. Temporal Forecasting with the TimeXer Model
TimeXer is a transformer variant specifically designed to handle time series forecasting with exogenous variables. Its key innovations are tailored embedding strategies and a dual-attention mechanism. For a target station, the model receives two sequences: the endogenous history \(\mathbf{x}_{en} \in \mathbb{R}^T\) (e.g., past 168 hourly loads) and the reconstructed exogenous features \(\mathbf{Z}_{ex} \in \mathbb{R}^{T \times F'}\).
Embedding: The endogenous sequence is divided into non-overlapping patches, each linearly projected to a “temporal token” with added positional encoding. A learnable [GLOBAL] token is also appended to this sequence. Each exogenous variable is independently projected into a “variable token,” creating a set of external tokens.
Dual-Attention Mechanism: The encoder stack contains two key attention layers per block:
1. Endogenous Self-Attention: All endogenous tokens (including the [GLOBAL] token) attend to each other to capture the internal temporal dynamics of the BEV charging load.
2. Exogenous-to-Endogenous Cross-Attention: The [GLOBAL] token acts as a query to attend to all exogenous variable tokens (keys and values). This allows the model to selectively incorporate relevant external information (such as tomorrow’s predicted high temperature or off-peak electricity prices) into the global representation of the time series.
The updated [GLOBAL] token is then reintegrated, and the processed sequence is passed through a feed-forward network. After \(L\) such blocks, the output endogenous tokens are flattened and passed through a linear projection head to generate the final forecast \(\hat{\mathbf{y}} \in \mathbb{R}^S\) for the next \(S\) hours.
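The patch embedding and the [GLOBAL]-token cross-attention can be sketched in NumPy. This is a simplified illustration, not the trained TimeXer: weights are random placeholders, the positional encoding is a toy, and the learned query/key/value projections are omitted.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def embed_endogenous(x, patch_len, d_model, rng):
    """Patch-and-project the endogenous series into temporal tokens plus
    a [GLOBAL] token. Weights are random stand-ins for learned parameters."""
    patches = x.reshape(-1, patch_len)              # non-overlapping patches
    W = rng.standard_normal((patch_len, d_model)) / np.sqrt(patch_len)
    tokens = patches @ W                            # temporal tokens
    tokens += 0.01 * np.arange(len(tokens))[:, None]  # toy positional encoding
    glob = rng.standard_normal(d_model)             # [GLOBAL] token
    return tokens, glob

def global_cross_attention(glob, var_tokens):
    """[GLOBAL] token as query over exogenous variable tokens (keys = values)."""
    scores = var_tokens @ glob / np.sqrt(len(glob))
    w = softmax(scores)                             # attention over variables
    return w @ var_tokens                           # updated global summary

rng = np.random.default_rng(0)
x = rng.standard_normal(168)                        # 168 h of hourly load
tokens, glob = embed_endogenous(x, patch_len=24, d_model=8, rng=rng)
var_tokens = rng.standard_normal((9, 8))            # 9 exogenous variable tokens
glob_updated = global_cross_attention(glob, var_tokens)
```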
Experimental Validation and Results
We validate the proposed GAT-TimeXer framework using a real-world, high-resolution dataset of BEV charging transactions from 13 public stations in Jiaxing, China, spanning two years. The data includes charging power, timestamps, and associated external features such as weather and tariff information.
Experimental Setup and Metrics
The dataset was preprocessed to aggregate transactions into hourly load profiles. The exogenous features included 9 variables: temperature, humidity, precipitation, month, weekday, hour, is-workday flag, is-holiday flag, and electricity price. The GAT was configured with a two-layer architecture and 4 attention heads. The TimeXer model used a patch length of 24, 3 encoder layers, and 8 attention heads. We forecast future loads for horizons of \(S = \{24, 48, 72, 96\}\) hours. Model performance was evaluated using standard metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the Coefficient of Determination (R²).
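The four evaluation metrics are standard and can be computed directly; a minimal NumPy helper:

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray):
    """MAE, MSE, RMSE and the coefficient of determination R²."""
    err = y_true - y_pred
    mae = float(np.mean(np.abs(err)))
    mse = float(np.mean(err ** 2))
    rmse = float(np.sqrt(mse))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, rmse, r2

y = np.array([1.0, 2.0, 3.0, 4.0])
mae, mse, rmse, r2 = evaluate(y, y)   # a perfect forecast gives zero error, R² = 1
```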
Performance and Comparative Analysis
The proposed model demonstrated excellent stability across different forecast horizons. The table below shows the performance for three representative stations over 72 hours, comparing GAT-TimeXer against several strong baselines.
| Model | Bus Station | | | | Shopping Center | | | | Urban Park | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | MAE | MSE | RMSE | R² | MAE | MSE | RMSE | R² | MAE | MSE | RMSE | R² |
| Historical Mean | 0.8266 | 1.0280 | 1.0139 | -0.0198 | 1.0042 | 1.5229 | 1.2340 | 0.1364 | 0.8248 | 1.0351 | 1.0174 | 0.0232 |
| LSTM | 0.4910 | 0.4133 | 0.6429 | 0.6044 | 0.6446 | 0.7589 | 0.8712 | 0.5761 | 0.5337 | 0.4842 | 0.6959 | 0.5346 |
| Transformer | 0.5272 | 0.4569 | 0.6969 | 0.5312 | 0.6670 | 0.7955 | 0.8919 | 0.5508 | 0.5556 | 0.5507 | 0.7421 | 0.4660 |
| PatchTST | 0.4767 | 0.3763 | 0.6134 | 0.6368 | 0.5913 | 0.6046 | 0.7776 | 0.6586 | 0.4926 | 0.3973 | 0.6303 | 0.6148 |
| GAT-TimeXer (Ours) | 0.4766 | 0.3740 | 0.6116 | 0.6390 | 0.5772 | 0.5749 | 0.7582 | 0.6754 | 0.4827 | 0.3825 | 0.6185 | 0.6292 |
The results show that GAT-TimeXer achieves the best or highly competitive performance across all stations and metrics. It consistently yields the lowest error values (MAE, MSE, RMSE) and the highest R² scores, indicating a superior ability to explain the variance in BEV charging load. Visual inspection of the forecasts further confirms that the model accurately captures daily periodicity, peak/off-peak patterns, and anomalies during holidays.
Ablation Study and Feature Importance
Ablation studies were conducted to validate the contribution of each core component. We compared the full GAT-TimeXer model against two variants: 1) w/o GAT: Using raw, unprocessed exogenous features directly with TimeXer, and 2) w/o Multi-source Data: Using only historical load as input to TimeXer. The performance degradation, measured by the drop in R² score, is summarized below for the 72-hour forecast.
| Station Type | Full Model (R²) | w/o GAT (R²) | R² Drop | w/o Multi-source (R²) | R² Drop |
|---|---|---|---|---|---|
| Bus Station | 0.6390 | 0.6350 | 0.63% | 0.6380 | 0.16% |
| Shopping Center | 0.6754 | 0.6692 | 0.92% | 0.6713 | 0.61% |
| Urban Park | 0.6292 | 0.6158 | 2.13% | 0.6015 | 4.40% |
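The “R² Drop” column reports the relative decline of each variant’s R² against the full model; a quick check reproduces the tabulated percentages:

```python
def r2_drop_pct(full: float, variant: float) -> float:
    """Relative R² decline of an ablated variant versus the full model."""
    return 100.0 * (full - variant) / full

# Bus Station, w/o GAT: reproduces the 0.63% entry in the table
bus_drop = round(r2_drop_pct(0.6390, 0.6350), 2)
# Urban Park, w/o multi-source data: reproduces the 4.40% entry
park_drop = round(r2_drop_pct(0.6292, 0.6015), 2)
```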
The ablation results demonstrate the value of both the multi-source data and the GAT-based feature reconstruction module. The performance decline is most pronounced for the Urban Park station, where BEV charging behavior is likely more sensitive to external factors such as weather and holidays. The GAT module’s role in creating spatially aware feature representations is crucial for maximizing the utility of the external data.
Conclusion and Future Work
This article presented a novel, data-driven framework for short-term load forecasting at BEV charging stations. The GAT-TimeXer model successfully addresses key challenges by: 1) dynamically modeling spatial correlations between stations without relying on precise location data, 2) intelligently reconstructing homogeneous multi-source external features into station-specific inputs, and 3) leveraging a powerful temporal transformer architecture designed to fuse endogenous and exogenous information. Experimental validation on a real-world dataset confirms that the framework outperforms established benchmarks in accuracy and stability across various prediction horizons.
The methodology offers a practical and effective solution for grid operators and charging network managers to anticipate aggregated demand from BEV fleets. Future research directions include enhancing the graph construction process with adaptive thresholding or weighted adjacency, integrating richer data sources such as real-time traffic flow or dynamic user incentives, and testing the framework’s generalizability across diverse urban environments and longer-term forecasting horizons. As BEV adoption continues to grow, robust forecasting tools like the one proposed here will be indispensable for building a resilient and efficient electrified transportation ecosystem.
