Improved Decision Tree Prediction Method for EV Charging Station Failure Considering Typhoon Impact

In recent years, the rapid adoption of electric vehicles has heightened the importance of reliable EV charging station infrastructure. However, these stations are susceptible to various failures, particularly under extreme weather conditions like typhoons. Traditional fault prediction methods often rely on manual diagnostics or simplistic statistical models, which struggle to capture the complex interactions between environmental factors and equipment degradation. For instance, typhoons introduce high winds, heavy rainfall, and flooding, leading to physical damage, electrical short circuits, and corrosion in EV charging stations. These challenges necessitate advanced data-driven approaches that can integrate multidimensional data, including operational parameters and meteorological attributes, to enhance prediction accuracy. In this paper, we propose an improved decision tree-based fault prediction method for EV charging stations that explicitly incorporates typhoon influence factors and addresses feature coupling issues. By leveraging historical time-series data, fault logs, and environmental conditions, our approach aims to boost the operational reliability of EV charging stations during typhoon events, ensuring safer and more resilient charging infrastructure.

The core of our methodology involves preprocessing EV charging station data to handle inconsistencies and then enhancing the C4.5 decision tree algorithm to account for typhoon-related attributes and reduce feature redundancy. Data preprocessing is critical for model robustness, as raw data from EV charging stations often contain missing values, outliers, and features on very different scales. For missing values, we employ a dynamic imputation strategy that fills each gap with the average of a sliding window around it, preserving local trends. For outliers, we remove samples that are clearly infeasible, such as those with implausible voltage readings; these account for less than 1% of the dataset. Finally, we standardize each feature with the Z-score method, which rescales it while retaining the shape of its distribution, as shown in Equation 1:

$$x' = \frac{x - \mu}{\sigma}$$

Here, $x$ is an original value of a feature, $\mu$ and $\sigma$ are the mean and standard deviation of that feature over all samples, and $x'$ is the standardized value. This step puts EV charging station features such as charging current and temperature on a comparable scale, which facilitates their joint use in machine learning models.
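The preprocessing steps above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: missing values are represented as `None`, the window half-width, the voltage bounds, and the sample readings are hypothetical, and infeasible readings are dropped before imputation (an assumed ordering) so that they cannot distort the window averages.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence


def drop_infeasible(series: Sequence[Optional[float]], low: float, high: float) -> List[Optional[float]]:
    """Remove observed readings outside the plausible range [low, high]; keep gaps (None)."""
    return [v for v in series if v is None or low <= v <= high]


def impute_sliding_window(series: Sequence[Optional[float]], half_width: int = 2) -> List[float]:
    """Fill each gap (None) with the mean of the observed values within
    half_width positions on either side of it."""
    filled: List[float] = []
    for i, value in enumerate(series):
        if value is not None:
            filled.append(float(value))
            continue
        lo, hi = max(0, i - half_width), min(len(series), i + half_width + 1)
        neighbours = [v for v in series[lo:hi] if v is not None]
        if not neighbours:
            raise ValueError(f"no observed values near index {i}; widen the window")
        filled.append(mean(neighbours))
    return filled


def z_score(values: Sequence[float]) -> List[float]:
    """Equation 1 applied to one feature column: x' = (x - mu) / sigma."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:  # constant column: every value equals the mean
        return [0.0] * len(values)
    return [(x - mu) / sigma for x in values]


if __name__ == "__main__":
    # Hypothetical voltage readings (V) with two gaps and one implausible spike.
    voltage = [228.0, None, 231.0, 5000.0, 229.5, None, 230.0]
    voltage = drop_infeasible(voltage, low=0.0, high=300.0)  # hypothetical bounds
    voltage = impute_sliding_window(voltage, half_width=1)
    print(voltage)  # [228.0, 229.5, 231.0, 229.5, 229.75, 230.0]
    print(z_score(voltage))
```

Filtering first keeps a spike such as 5000 V out of the neighbouring window averages; in a multi-feature table the same `z_score` call would be applied column by column.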

To address the limitations of traditional C4.5 decision tree algorithms in handling feature coupling and external factors like typhoons, we introduce a modified splitting criterion that integrates typhoon impact coefficients and attribute decoupling. The standard C4.5 algorithm uses information gain ratio for feature selection, which mitigates bias toward multi-valued attributes. The information entropy for a class variable $C$ in dataset $D$ is defined as:

$$I(C, D) = -\sum_{i=1}^{m} \frac{|C_i|}{|D|} \log_2 \frac{|C_i|}{|D|}$$

where $m$ is the number of classes, $|C_i|$ is the count of samples in class $C_i$, and $|D|$ is the total sample size. For an attribute $A_j$ with $t$ discrete values, the conditional entropy given $A_j$ is:

$$I(C, D|A_j) = \sum_{k=1}^{t} \frac{|D_{jk}|}{|D|} I(C, D_{jk})$$

where $D_{jk}$ is the subset of samples in $D$ for which $A_j$ takes its $k$-th value. The information gain is then computed as $G(C, D|A_j) = I(C, D) - I(C, D|A_j)$, and the information gain ratio is:

$$GR(C, D|A_j) = \frac{G(C, D|A_j)}{I(A_j, D)}$$

where $I(A_j, D)$ is the entropy of attribute $A_j$. However, this approach does not account for correlations between attributes or external factors like typhoons, which can lead to reduced accuracy in EV charging station fault prediction. To overcome this, we define a typhoon impact coefficient $\alpha_{ty}$ that quantifies the influence of typhoon conditions on EV charging station failures:

$$\alpha_{ty} = \omega_1 \times \frac{V_{\text{max}}}{V_{\text{crit}}} + \omega_2 R_{\text{total}}$$

In this equation, $V_{\text{max}}$ is the maximum typhoon wind speed affecting the EV charging station, $V_{\text{crit}}$ is the critical wind speed threshold for station damage, $R_{\text{total}}$ is the cumulative rainfall normalized to [0, 1], and $\omega_1$ and $\omega_2$ are weight coefficients set to 0.7 and 0.3, respectively, to emphasize wind impact. This coefficient is incorporated into the feature selection process to enhance sensitivity to typhoon-related attributes. Additionally, we measure the coupling between attributes to reduce redundancy. The conditional information gain ratio between attributes $A_j$ and $A_x$ is given by:

$$GR(A_j, D|A_x) = \frac{G(A_j, D|A_x)}{I(A_x, D)}$$

The average coupling of attribute $A_j$ with others in set $E$ (excluding $A_j$) is:

$$G_{\text{avg}}R(A_j, E) = \frac{\sum_{A_x \in E} GR(A_j, D|A_x)}{|E|}$$
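To make the standard C4.5 quantities and the coupling measure concrete, the following sketch computes $I(C, D)$, $I(C, D|A_j)$, $GR(C, D|A_j)$, $GR(A_j, D|A_x)$ and $G_{\text{avg}}R(A_j, E)$ for discrete columns. It is an illustrative sketch only: the function names and the toy columns are hypothetical, and continuous attributes would first need discretization or binary splitting.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence


def entropy(column: Sequence[Hashable]) -> float:
    """Base-2 Shannon entropy of a discrete column, e.g. I(C, D) or I(A_j, D)."""
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in Counter(column).values())


def conditional_entropy(target: Sequence[Hashable], attribute: Sequence[Hashable]) -> float:
    """Entropy of `target` inside each subset D_jk (one per value of `attribute`),
    weighted by |D_jk| / |D|."""
    n = len(target)
    subsets: Dict[Hashable, List[Hashable]] = {}
    for t, a in zip(target, attribute):
        subsets.setdefault(a, []).append(t)
    return sum(len(s) / n * entropy(s) for s in subsets.values())


def gain_ratio(target: Sequence[Hashable], attribute: Sequence[Hashable]) -> float:
    """[I(target) - I(target | attribute)] / I(attribute); 0 for a single-valued attribute."""
    split_info = entropy(attribute)
    if split_info == 0:
        return 0.0
    return (entropy(target) - conditional_entropy(target, attribute)) / split_info


def average_coupling(columns: Dict[str, Sequence[Hashable]], name: str) -> float:
    """G_avgR(A_j, E): mean of GR(A_j, D | A_x) over all attributes A_x other than A_j."""
    others = [x for x in columns if x != name]
    if not others:
        raise ValueError("need at least one other attribute")
    return sum(gain_ratio(columns[name], columns[x]) for x in others) / len(others)


if __name__ == "__main__":
    # Hypothetical discretized records: fault label plus three attributes.
    fault = ["yes", "yes", "no", "no"]
    columns = {
        "rain": ["heavy", "heavy", "light", "light"],
        "wind": ["high", "high", "low", "low"],
        "temp": ["hot", "cool", "hot", "cool"],
    }
    print(gain_ratio(fault, columns["rain"]))  # 1.0, i.e. GR(C, D | rain)
    print(average_coupling(columns, "rain"))   # 0.5: rain duplicates wind, ignores temp
```

Passing an attribute column as `target` is what turns the class-level gain ratio into the attribute-to-attribute coupling $GR(A_j, D|A_x)$.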

We modify these equations to include the typhoon impact, using a feature indicator function $\delta(A_j)$ that equals 1 if $A_j$ is related to typhoon-induced failures in EV charging stations and 0 otherwise:

$$GR_{ty}(A_j, D|A_x) = GR(A_j, D|A_x) \times [1 + \alpha_{ty} \delta(A_j)]$$

$$G_{\text{avg}}R_{ty}(A_j, E) = \frac{\sum_{A_x \in E} GR_{ty}(A_j, D|A_x)}{|E|}$$
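A minimal sketch of the typhoon impact coefficient $\alpha_{ty}$ and the typhoon-weighted average coupling $G_{\text{avg}}R_{ty}(A_j, E)$ follows. It assumes the pairwise ratios $GR(A_j, D|A_x)$ have already been computed and are passed in as a dictionary, uses the weights 0.7 and 0.3 stated above as defaults, and the wind-speed, threshold, and rainfall values in the usage lines are hypothetical.

```python
from typing import Dict


def typhoon_coefficient(v_max: float, v_crit: float, r_total: float,
                        w1: float = 0.7, w2: float = 0.3) -> float:
    """alpha_ty = w1 * V_max / V_crit + w2 * R_total (R_total already normalized to [0, 1])."""
    if v_crit <= 0:
        raise ValueError("critical wind speed must be positive")
    if not 0.0 <= r_total <= 1.0:
        raise ValueError("R_total is expected to lie in [0, 1]")
    return w1 * v_max / v_crit + w2 * r_total


def average_coupling_ty(coupling: Dict[str, float], alpha_ty: float,
                        typhoon_related: bool) -> float:
    """G_avgR_ty(A_j, E): scale every GR(A_j, D | A_x) by [1 + alpha_ty * delta(A_j)],
    then average over the other attributes A_x in E."""
    if not coupling:
        raise ValueError("coupling must contain at least one attribute pair")
    delta = 1.0 if typhoon_related else 0.0  # indicator delta(A_j)
    factor = 1.0 + alpha_ty * delta
    return sum(gr * factor for gr in coupling.values()) / len(coupling)


if __name__ == "__main__":
    # Hypothetical storm: 40 m/s peak wind, 50 m/s damage threshold, normalized rainfall 0.5.
    alpha = typhoon_coefficient(v_max=40.0, v_crit=50.0, r_total=0.5)
    print(round(alpha, 2))  # 0.71
    # Hypothetical GR(A_j, D | A_x) values for a typhoon-related attribute A_j.
    print(average_coupling_ty({"temp": 0.2, "current": 0.4}, alpha, typhoon_related=True))
```

With $\delta(A_j) = 0$ the function reduces to the unweighted $G_{\text{avg}}R(A_j, E)$, which is a convenient sanity check.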

The final optimized gain ratio for splitting attributes in the decision tree is then:

$$GNR_{ty}(C, D|A_j) = \frac{GR_{ty}(C, D|A_j)}{G_{\text{avg}}R_{ty}(A_j, E)}$$

This criterion prioritizes attributes with high discriminatory power for fault classes while minimizing inter-attribute dependencies, thus improving the model’s generalization for EV charging station applications. The algorithm recursively builds the tree by selecting the attribute with the maximum $GNR_{ty}(C, D|A_j)$, handling continuous features through binary splitting, and applying pruning to avoid overfitting.
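The split-selection step described above can be sketched as follows, under explicit assumptions. For a continuous feature, candidate thresholds are placed at midpoints between consecutive distinct values, and the threshold whose binary partition has the highest gain ratio is kept; this is one common variant, and C4.5 implementations differ in such details. Because the text does not spell out how $GR_{ty}(C, D|A_j)$ is computed, the hypothetical `select_attribute` helper takes it as an input together with $G_{\text{avg}}R_{ty}(A_j, E)$ and returns the attribute with the largest $GNR_{ty}$. Recursion and pruning are not shown.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, Optional, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Base-2 Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def binary_split_gain_ratio(values: Sequence[float], labels: Sequence[Hashable],
                            threshold: float) -> float:
    """Gain ratio of the two-way partition {x <= threshold} / {x > threshold}."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    cond = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    split_info = entropy(["left"] * len(left) + ["right"] * len(right))
    if split_info == 0:  # one side empty: not a real split
        return 0.0
    return (entropy(labels) - cond) / split_info


def best_threshold(values: Sequence[float],
                   labels: Sequence[Hashable]) -> Tuple[Optional[float], float]:
    """Try midpoints between consecutive distinct values; keep the best gain ratio."""
    distinct = sorted(set(values))
    best_t: Optional[float] = None
    best_gr = 0.0
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        gr = binary_split_gain_ratio(values, labels, t)
        if gr > best_gr:
            best_t, best_gr = t, gr
    return best_t, best_gr


def select_attribute(scores: Dict[str, Tuple[float, float]]) -> str:
    """scores[name] = (GR_ty(C, D | A_j), G_avgR_ty(A_j, E)); return the argmax of GNR_ty."""
    def gnr_ty(name: str) -> float:
        numerator, coupling = scores[name]
        # Zero coupling (fully independent attribute) is treated as an unbounded score.
        return numerator / coupling if coupling > 0 else float("inf")
    return max(scores, key=gnr_ty)


if __name__ == "__main__":
    # Hypothetical wind-speed readings (m/s) and fault labels.
    wind = [12.0, 18.0, 31.0, 36.0]
    fault = ["no", "no", "yes", "yes"]
    print(best_threshold(wind, fault))  # (24.5, 1.0)
    print(select_attribute({"wind": (0.6, 0.3), "temp": (0.5, 0.5)}))  # wind
```

A full tree builder would call these helpers at every node on the subset of samples reaching it and stop when a node is pure or too small.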

To validate our approach, we conducted experiments on a dataset of 85,500 training samples and 36,644 test samples from EV charging station operations, combined with typhoon meteorological data. The dataset included both normal operation and fault instances, with a normal-to-fault class imbalance ratio of 99:1, reflecting real-world conditions in which EV charging station failures are rare but critical. We implemented the models in Python 3.12.0 with scikit-learn on an Intel Core i5-8300H CPU. The experiments compared three methods: the standard C4.5 decision tree, a version with attribute decoupling only, and our proposed method with typhoon integration. Two datasets were used: a conventional dataset containing only EV charging station operational data, and a typhoon fault dataset augmented with typhoon features such as wind speed and rainfall.

For the conventional dataset, the results highlight the limitations of the models in handling class imbalance. As shown in Table 1, accuracy exceeded 0.98 for all three methods, yet macro F1 stayed near 0.5 and AUC did not exceed 0.60, indicating near-chance detection of EV charging station failures. This suggests that the models were biased toward the majority class. In contrast, our method performed markedly better on the typhoon fault dataset. Table 2 summarizes the performance metrics: our approach achieved an accuracy of 0.9077, compared with 0.6977 for standard C4.5, an increase of 21 percentage points (a 30.1% relative improvement). Moreover, recall and AUC reached 0.9948 and 0.9977, respectively, showing enhanced sensitivity to fault conditions under typhoon influence.
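The accuracy gain quoted above can be checked directly from the Table 2 values; the short computation below separates the absolute gain in percentage points from the relative improvement over the C4.5 baseline.

```python
# Accuracies reported in Table 2.
acc_c45 = 0.6977       # standard C4.5
acc_proposed = 0.9077  # proposed method

absolute_gain_pp = (acc_proposed - acc_c45) * 100            # percentage points
relative_gain_pct = (acc_proposed - acc_c45) / acc_c45 * 100  # percent of the baseline

print(f"{absolute_gain_pp:.1f} pp, {relative_gain_pct:.1f}% relative")  # 21.0 pp, 30.1% relative
```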

Table 1: Performance Metrics on Conventional Dataset for EV Charging Station Fault Prediction
Metric	C4.5 Decision Tree	Attribute Decoupling C4.5	Proposed Method
Accuracy (E)	0.9868	0.9813	0.9836
Recall (R)	1.0000	0.0275	0.9968
Precision (P)	0.9868	1.0000	0.9868
F1 Score (F1)	0.9934	0.0559	0.9917
Macro F1 (M_F1)	0.4967	0.5232	0.4983
AUC	0.4797	0.5144	0.5976
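The pattern in Table 1 (accuracy near 0.99, macro F1 near 0.5, AUC near 0.5) is consistent with a classifier that almost always predicts the majority class. The sketch below illustrates this on synthetic labels with the 99:1 normal-to-fault ratio described above; the data are hypothetical and are not the paper's test set. AUC is computed with the pairwise (Mann-Whitney) formulation, counting ties as one half; scikit-learn's `f1_score(..., average="macro")` and `roc_auc_score` compute the corresponding quantities.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_f1(y_true, y_pred, cls) -> float:
    """F1 for one class; precision or recall with an empty denominator counts as 0."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1, so the rare fault class counts as much as normal."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(class_f1(y_true, y_pred, c) for c in classes) / len(classes)


def pairwise_auc(y_true, scores, positive) -> float:
    """Probability that a random positive scores above a random negative; ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Synthetic 99:1 test set and a model that always predicts "normal".
    y_true = ["normal"] * 990 + ["fault"] * 10
    y_pred = ["normal"] * 1000
    fault_scores = [0.0] * 1000  # constant fault score carries no ranking information
    print(accuracy(y_true, y_pred))                     # 0.99
    print(round(macro_f1(y_true, y_pred), 4))           # 0.4975
    print(pairwise_auc(y_true, fault_scores, "fault"))  # 0.5
```

Accuracy rewards the majority class, whereas macro F1 and AUC expose that no fault was detected, which is why the latter two metrics are the informative ones under a 99:1 imbalance.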

The superiority of our method is further illustrated by the metrics on the typhoon fault dataset, where the integration of typhoon factors allowed the model to better distinguish fault conditions. For example, a recall of 0.9948 indicates that nearly all EV charging station failures were identified, reducing the risk of missed alarms during typhoons. The AUC of 0.9977 indicates excellent class separation, well above the 0.7947 of standard C4.5 and the 0.6080 of attribute-decoupling C4.5. These results underscore the importance of incorporating environmental data into fault prediction systems for EV charging stations, as typhoons introduce distinct failure patterns that conventional models may overlook.

Table 2: Performance Metrics on Typhoon Fault Dataset for EV Charging Station Fault Prediction
Metric	C4.5 Decision Tree	Attribute Decoupling C4.5	Proposed Method
Accuracy (E)	0.6977	0.8620	0.9077
Recall (R)	0.6926	0.8717	0.9948
Precision (P)	0.5973	0.9906	0.9280
F1 Score (F1)	0.5105	0.9236	0.9327
Macro F1 (M_F1)	0.6357	0.6363	0.9221
AUC	0.7947	0.6080	0.9977

In conclusion, our improved decision tree method addresses the challenges of predicting EV charging station failures under typhoon conditions by integrating typhoon impact factors and optimizing feature selection. On the typhoon fault dataset, the proposed algorithm not only improves prediction accuracy but also mitigates the bias toward the majority class, strengthening fault detection in imbalanced data. This approach contributes to the resilience of EV charging station networks, supporting the broader adoption of electric vehicles in disaster-prone regions. Future work could explore real-time adaptation of the model using streaming data from EV charging stations or extend the method to other extreme weather events, further solidifying the role of data-driven solutions in smart grid infrastructure.