With the rapid growth of the electric vehicle (EV) market, the reliability of EV charging stations has become a critical factor in user experience and the broader adoption of EVs. As essential infrastructure, EV charging stations are prone to various faults during operation, which can lead to service disruptions and increased maintenance costs. Accurately predicting the failure rate of EV charging stations enables proactive maintenance, enhancing safety and efficiency. In this study, we propose a novel failure rate prediction model that combines Long Short-Term Memory (LSTM) networks with an Embedding method (LSTM-EM). This approach captures both the temporal dependencies in time-series data and the multi-dimensional characteristics of discrete features, such as station and manufacturer information. By integrating these elements, the model achieves superior prediction accuracy, as demonstrated through comparative experiments with other deep learning models.

The dataset used in this study consists of alarm records from EV charging stations in a provincial network, collected over one month. It includes details such as device ID, fault start and end times, duration, station name, and manufacturer. Analysis reveals significant variations in failure rates across different EV charging stations and manufacturers, indicating that these factors influence fault occurrences. Additionally, temporal patterns, such as daily and weekly cycles, are observed, highlighting the time-series nature of the data. For instance, failure rates peak during specific hours and days, suggesting that operational load and environmental conditions play a role. These insights guide the feature engineering process, where we compute the daily failure rate for each EV charging station and incorporate categorical features like station and manufacturer identifiers.
To handle the data, we perform preprocessing steps including data cleaning, feature extraction, and normalization. The failure rate is calculated as the ratio of total fault duration in a day to the total minutes in a day (1440 minutes). For each EV charging station, the failure rate time series is derived as follows: Let \( D_i \) represent the set of fault time intervals for station \( i \), where each interval is defined by start and end times. After merging overlapping intervals, the daily fault duration \( T_{d,i} \) for day \( d \) is computed. The failure rate \( R_{d,i} \) is then given by:
$$ R_{d,i} = \frac{T_{d,i}}{1440} $$
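The interval merging and the ratio above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the sample alarm records, and the choice to clip faults that cross midnight to each calendar day are all assumptions.

```python
from datetime import datetime, timedelta

MINUTES_PER_DAY = 1440

def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def daily_failure_rate(intervals, day):
    """Return R_{d,i}: merged fault minutes on `day` divided by 1440."""
    day_start = datetime(day.year, day.month, day.day)
    day_end = day_start + timedelta(days=1)
    fault_minutes = 0.0
    for start, end in merge_intervals(intervals):
        # Assumption: faults crossing midnight are clipped to the day.
        lo, hi = max(start, day_start), min(end, day_end)
        if lo < hi:
            fault_minutes += (hi - lo).total_seconds() / 60
    return fault_minutes / MINUTES_PER_DAY

# Hypothetical alarm records for one station.
faults = [
    (datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 10, 0)),
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 0)),  # overlaps the first
    (datetime(2024, 1, 1, 23, 0), datetime(2024, 1, 2, 1, 0)),  # crosses midnight
]
rate_day1 = daily_failure_rate(faults, datetime(2024, 1, 1))  # 240 / 1440
rate_day2 = daily_failure_rate(faults, datetime(2024, 1, 2))  # 60 / 1440
```

Merging before summing keeps simultaneous alarms on the same station from being double counted, which also keeps the rate within [0, 1].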
This results in a time series \( R_i = [r_1, r_2, \ldots, r_D] \) for each EV charging station, where \( r_d = R_{d,i} \) and \( D \) is the number of days (distinct from the interval set \( D_i \)). We apply a sliding window of size \( W \) to generate input sequences for the model. For categorical features, such as station and manufacturer names, we use label encoding followed by Min-Max normalization to scale values between 0 and 1. The normalized feature \( x_{\text{scaled}} \) is computed as:
$$ x_{\text{scaled}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} $$
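A small sketch of label encoding followed by Min-Max scaling is shown below. The vendor names are hypothetical, and two conventions are assumptions rather than details from the study: labels are assigned in sorted order, and a column with a single distinct value is mapped to 0.

```python
def label_encode(values):
    """Map each distinct value to an integer 0..N-1 (sorted order)."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def min_max_scale(xs):
    """Scale numbers to [0, 1]; a constant column maps to 0.0 by convention."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]

manufacturers = ["VendorB", "VendorA", "VendorC", "VendorA"]  # hypothetical names
codes, mapping = label_encode(manufacturers)   # [1, 0, 2, 0]
scaled = min_max_scale(codes)                  # [0.5, 0.0, 1.0, 0.0]
```

Note that the scaled codes inherit an arbitrary ordering from the label assignment, so their numeric distances carry no meaning on their own.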
The final input for the model combines time-series and categorical features. For a window starting at index \( k \), the input is:
$$ X_i[k:k+W] = [r_k, r_{k+1}, \ldots, r_{k+W-1}, C_i, M_i] $$
where \( C_i \) and \( M_i \) are the encoded and normalized station and manufacturer features, respectively. The dataset is split into training, validation, and test sets based on time indices to ensure temporal consistency.
| Feature Type | Description | Range/Values |
|---|---|---|
| Time-Series | Daily failure rate | [0, 1] |
| Categorical | Station ID | Encoded integers (0 to N_s-1) |
| Categorical | Manufacturer ID | Encoded integers (0 to N_m-1) |
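
The windowing and time-ordered split described above might look like the sketch below. The input layout follows the formula for \( X_i[k:k+W] \); using the next day's rate \( r_{k+W} \) as the prediction target, the 70/15/15 fractions applied per sample list, and the synthetic series are assumptions for illustration.

```python
def make_windows(rates, c_i, m_i, window):
    """Build (input, target) pairs from one station's failure-rate series."""
    samples = []
    for k in range(len(rates) - window):
        # Input: W consecutive rates followed by the two categorical features.
        x = rates[k:k + window] + [c_i, m_i]
        # Assumption: the target is the rate on the day after the window.
        y = rates[k + window]
        samples.append((x, y))
    return samples

def chronological_split(samples, train_frac=0.70, val_frac=0.15):
    """Split in time order, without shuffling, to avoid look-ahead leakage."""
    n = len(samples)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])

rates = [round(0.01 * d, 2) for d in range(30)]  # hypothetical 30-day series
samples = make_windows(rates, c_i=0.25, m_i=0.5, window=7)
train_set, val_set, test_set = chronological_split(samples)
```

With 30 days and \( W = 7 \), this yields 23 samples per station, which shows how little data a single month provides for each series.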
The LSTM-EM model is designed to leverage both temporal and categorical data. The LSTM component processes the time-series input, capturing long-term dependencies through its gated architecture. The LSTM unit equations are as follows:
$$ \begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned} $$
where \( f_t \), \( i_t \), and \( o_t \) are the forget, input, and output gates, respectively; \( \tilde{C}_t \) is the candidate cell state; \( C_t \) is the cell state; \( h_t \) is the hidden state; \( \sigma \) is the sigmoid function; and \( \odot \) denotes element-wise multiplication. The output from the LSTM layer is passed through a fully connected layer with ReLU activation for feature extraction.
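To make the gate equations concrete, here is a single LSTM time step written with plain Python lists. It is a didactic sketch only: the `params` dictionary layout, the tiny dimensions, and the all-zero weights are assumptions, and a real model would rely on a framework layer instead.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Compute W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + b_j for row, b_j in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following the gate equations above."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(params["W_f"], z, params["b_f"])]
    i = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]
    c_tilde = [math.tanh(a) for a in affine(params["W_C"], z, params["b_C"])]
    # Element-wise update of the cell state.
    c = [f_j * cp + i_j * ct for f_j, cp, i_j, ct in zip(f, c_prev, i, c_tilde)]
    o = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]
    h = [o_j * math.tanh(c_j) for o_j, c_j in zip(o, c)]
    return h, c

hidden, inputs = 2, 1  # hypothetical sizes, far smaller than 200 units
params = {name: [[0.0] * (hidden + inputs) for _ in range(hidden)]
          for name in ("W_f", "W_i", "W_C", "W_o")}
params.update({name: [0.0] * hidden for name in ("b_f", "b_i", "b_C", "b_o")})

# With zero weights every gate is sigmoid(0) = 0.5 and the candidate is 0,
# so the cell state is simply halved: c == [0.5, -1.0].
h, c = lstm_step([0.3], [0.0, 0.0], [1.0, -2.0], params)
```

The zero-weight case isolates the role of the forget gate: it scales the previous cell state, which is how information is retained or discarded across the window.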
For categorical features, the embedding layer maps discrete IDs to continuous vectors, enabling the model to learn semantic relationships. Given an input ID represented as a one-hot vector \( x \), the embedding vector \( e \) is obtained as:
$$ e = E \cdot x $$
where \( E \) is the learned embedding matrix, so the product simply selects the vector associated with that ID. The embedded vectors are flattened and concatenated with the LSTM output. The combined feature vector is then processed through additional dense layers with ReLU and linear activations to produce the final prediction. The model architecture ensures that both time-series patterns and categorical influences are captured, enhancing prediction accuracy for EV charging station failure rates.
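The sketch below illustrates the embedding step under two assumptions that the text does not spell out: the ID is treated as a one-hot vector, and \( E \) stores one embedding per column. The matrix values, the dimension of 3, and the LSTM feature vector are made up for the example.

```python
def one_hot(index, size):
    return [1.0 if j == index else 0.0 for j in range(size)]

def matvec(E, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(e * v for e, v in zip(row, x)) for row in E]

def embed(E, index):
    """Select column `index` directly, which is how lookups are done in practice."""
    return [row[index] for row in E]

# Hypothetical embedding matrix: dimension 3, vocabulary of 4 manufacturers.
E = [
    [0.1, 0.4, -0.2, 0.0],
    [0.5, -0.3, 0.7, 0.2],
    [-0.6, 0.8, 0.9, 0.3],
]

manufacturer_id = 2
e_product = matvec(E, one_hot(manufacturer_id, 4))  # e = E . x
e_lookup = embed(E, manufacturer_id)                # same vector, no multiplication

lstm_features = [0.12, -0.05]         # hypothetical output of the LSTM branch
combined = lstm_features + e_lookup   # concatenation passed to the dense layers
```

Both routes return the same vector, which is why embedding layers are implemented as table lookups rather than as explicit matrix products.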
| Parameter | Value |
|---|---|
| Time window size (W) | 7 |
| LSTM units | 200 |
| Initial learning rate | 0.01 |
| Learning rate decay factor | 0.5 |
| Training epochs | 30 |
| Batch size | 16 |
| Embedding dimension | 50 |
Experiments are conducted on a machine with an Intel i9-12900K CPU and 16 GB of RAM, using the TensorFlow framework. The dataset is divided into training (70%), validation (15%), and test (15%) sets. We compare the LSTM-EM model against four baseline models: Transformer, LSTM, RNN, and CNN-LSTM. Evaluation metrics include Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), R-squared (R²), and Symmetric Mean Absolute Percentage Error (SMAPE). These metrics are defined as:
$$ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
$$ \text{RMSE} = \sqrt{\text{MSE}} $$
$$ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| $$
$$ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} $$
$$ \text{SMAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{(|y_i| + |\hat{y}_i|)/2} $$
where \( y_i \) is the actual value, \( \hat{y}_i \) is the predicted value, and \( \bar{y} \) is the mean of actual values. Lower values of MSE, RMSE, MAE, and SMAPE indicate better performance, while higher R² values denote better fit.
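The five metrics can be computed directly from their definitions, as in the sketch below. The sample values are invented, and one convention is an assumption: a pair with \( y_i = \hat{y}_i = 0 \), for which the SMAPE term is undefined, is counted as zero error.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, R2 and SMAPE (%) from paired values."""
    n = len(y_true)
    errors = [y - p for y, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_y = sum(y_true) / n
    # Note: ss_tot is zero if all actual values are identical.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    r2 = 1.0 - ss_res / ss_tot
    smape_terms = []
    for y, p in zip(y_true, y_pred):
        denom = (abs(y) + abs(p)) / 2
        # Assumption: a pair with y = y_hat = 0 contributes zero error.
        smape_terms.append(abs(y - p) / denom if denom > 0 else 0.0)
    smape = 100.0 * sum(smape_terms) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae,
            "R2": r2, "SMAPE": smape}

y_true = [0.0, 0.1, 0.2, 0.4]  # hypothetical daily failure rates
y_pred = [0.0, 0.2, 0.2, 0.3]  # hypothetical predictions
metrics = regression_metrics(y_true, y_pred)
```

Because failure rates are often exactly zero, the handling of zero denominators can noticeably affect SMAPE: a single term reaches its maximum of 200% whenever exactly one of the two values is zero.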
The results demonstrate that the LSTM-EM model outperforms all baselines across all metrics. For instance, compared to the Transformer model, LSTM-EM reduces SMAPE by 48.18 percentage points (from 82.41% to 34.23%), highlighting its effectiveness in handling the complex dynamics of EV charging station data. The integration of embedding layers allows the model to leverage categorical information, such as manufacturer differences, which are often overlooked in traditional time-series models. This is particularly important for EV charging stations, where factors like maintenance practices and component quality vary by manufacturer. The LSTM component captures temporal trends, such as the recurring daily and weekly variations in failure rates, which plausibly arise in EV charging station operations from factors like weather and usage patterns.
| Model | MSE (↓) | RMSE (↓) | MAE (↓) | R² (↑) | SMAPE (%) (↓) |
|---|---|---|---|---|---|
| Transformer | 0.0197 | 0.1405 | 0.0771 | 0.2144 | 82.41 |
| LSTM | 0.0127 | 0.1128 | 0.0440 | 0.4934 | 77.56 |
| RNN | 0.0125 | 0.1119 | 0.0448 | 0.5015 | 75.92 |
| CNN-LSTM | 0.0123 | 0.1107 | 0.0422 | 0.5122 | 71.69 |
| LSTM-EM | 0.0110 | 0.1048 | 0.0366 | 0.5291 | 34.23 |
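
As a quick check on how to read the SMAPE column, the snippet below computes both the absolute and the relative reduction achieved by LSTM-EM against a baseline. The values are copied from the table; the helper name `smape_reduction` is hypothetical.

```python
# SMAPE (%) values copied from the results table.
smape = {
    "Transformer": 82.41,
    "LSTM": 77.56,
    "RNN": 75.92,
    "CNN-LSTM": 71.69,
    "LSTM-EM": 34.23,
}

def smape_reduction(baseline, model="LSTM-EM"):
    """Return (absolute reduction in points, relative reduction in %)."""
    absolute = smape[baseline] - smape[model]
    relative = 100.0 * absolute / smape[baseline]
    return absolute, relative

abs_points, rel_percent = smape_reduction("Transformer")
# abs_points is about 48.18 (percentage points); rel_percent is about 58.46 (%)
```

The figure of 48.18 is therefore the difference between the two SMAPE values in percentage points; measured relative to the Transformer baseline, the reduction is roughly 58%.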
Further analysis of error distributions shows that LSTM-EM predictions are closely aligned with actual values, with most errors concentrated near zero. This indicates high reliability and stability, which is crucial for practical applications in EV charging station management. For example, operators can use these predictions to schedule maintenance during low-demand periods, minimizing downtime. The model’s ability to incorporate multiple data sources, such as station-specific and manufacturer-specific features, makes it adaptable to various EV charging station environments. Additionally, compared with one-hot encoding, embedding layers represent categorical data in fewer dimensions whenever the embedding size is smaller than the number of categories, improving computational efficiency without sacrificing accuracy.
In conclusion, the LSTM-EM model provides a robust solution for predicting failure rates in EV charging stations by effectively combining time-series and categorical data. The experimental results confirm its superiority over the baseline models on every metric, most notably SMAPE (34.23% versus 71.69% for the next-best model, CNN-LSTM). Future work will focus on incorporating additional features, such as real-time sensor data and environmental factors, to further enhance prediction accuracy. Moreover, extending the model to other types of electrical equipment could broaden its applicability, contributing to the development of smarter and more reliable infrastructure for the evolving EV ecosystem. As the number of EV charging stations continues to grow, such predictive models will play a vital role in ensuring operational efficiency and user satisfaction.
