In recent years, the rapid growth of the electric vehicle industry has led to a significant increase in the deployment of EV charging stations. However, irregular placement of these stations, particularly roadside EV charging stations, poses serious safety hazards, such as electrical accidents and traffic disruptions. Accurate detection and localization of EV charging stations are crucial for ensuring safe and efficient operations. Traditional object detection methods often struggle with complex environments, diverse shapes, and occlusions, such as small or obscured cables. To address these challenges, we propose an enhanced YOLOv8-based model for EV charging station detection. Our approach integrates deformable convolutions, attention mechanisms, and an optimized loss function to improve feature representation and detection accuracy in dynamic road scenarios.

The core of our method lies in modifying the YOLOv8 architecture to better handle the deformations and scale variations common in EV charging station components, like cables. We replace standard convolutions in the C2f module with Deformable Convolution Networks v3 (DCNv3), forming a C2f_DCNv3 module. This adaptation allows the model to dynamically adjust sampling positions, capturing geometric transformations and enhancing robustness. Additionally, we incorporate the Large Separable Kernel Attention (LSKA) module into the backbone network to expand the receptive field and emphasize small target features, such as thin cables against complex backgrounds. Furthermore, we substitute the Complete Intersection over Union (CIoU) loss function with Weighted Intersection over Union (WIoU) to increase sensitivity to boundary details and improve localization precision. These modifications collectively address issues like low resolution, background clutter, and occlusions, which are prevalent in EV charging station environments.
To evaluate our model, we conducted experiments using a custom dataset of EV charging station images captured under varying conditions, including different times of day, weather, and traffic densities. The dataset comprises 2,000 high-resolution images (3,024×1,964 pixels), annotated and split into training, validation, and test sets in a 7:2:1 ratio. Performance metrics, including Precision (P), Recall (R), and mean Average Precision (mAP), were used to assess the model. Our results demonstrate that the proposed improvements significantly enhance detection accuracy compared to baseline YOLOv8 and other state-of-the-art methods. For instance, our model achieves a mAP50 of 0.855 and mAP50-95 of 0.712, outperforming YOLOv8n by 4.7% and 2.8%, respectively. This underscores the effectiveness of our approach in real-world EV charging station detection tasks.
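For readers who want to reproduce the data preparation step, a minimal sketch of the 7:2:1 partition is shown below; the directory name, file extension, and random seed are illustrative assumptions rather than details of our actual pipeline.

```python
import random
from pathlib import Path

def split_dataset(image_dir: str, seed: int = 0):
    """Shuffle image paths and split them 7:2:1 into train/val/test lists."""
    paths = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(paths)

    n = len(paths)
    n_train = int(0.7 * n)   # 70% training
    n_val = int(0.2 * n)     # 20% validation; the remainder is the test set
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])

if __name__ == "__main__":
    train, val, test = split_dataset("images")
    print(len(train), len(val), len(test))  # 1400 / 400 / 200 for 2,000 images
```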
The YOLOv8 algorithm, known for its efficiency and flexibility, consists of a backbone network, neck network, and head network. It employs an anchor-free detection head and advanced components like the C2f module for feature fusion. However, its fixed convolutional operations may lead to feature loss, especially for small objects like EV charging station cables. Our enhanced model addresses this by integrating DCNv3 into the C2f module, enabling adaptive feature extraction. The C2f_DCNv3 module processes input features through deformable convolutions, which learn offsets to sample relevant regions, thus handling deformations more effectively. The structure involves multiple Bottleneck layers with deformable convolutions, reducing redundancy and improving computational efficiency. Mathematically, deformable convolution can be represented as:
$$ y(p) = \sum_{k=1}^{K} w_k \cdot x(p + p_k + \Delta p_k) $$
where \( y(p) \) is the output at position \( p \), \( w_k \) denotes the kernel weights, \( x \) is the input feature map, \( p_k \) is the predefined sampling offset, and \( \Delta p_k \) is the learned offset. This allows the model to focus on irregular shapes, such as bent or folded cables in EV charging stations.
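As a concrete illustration of this equation, the following PyTorch sketch builds a deformable convolution block from `torchvision.ops.DeformConv2d` (a DCNv2-style operator used here as a stand-in, since the DCNv3 operator ships with separate libraries); the channel sizes and the 3×3 kernel are illustrative choices, not our exact C2f_DCNv3 configuration.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableConvBlock(nn.Module):
    """Minimal deformable convolution: a 3x3 conv predicts per-position
    offsets Δp_k, which DeformConv2d adds to the regular grid p + p_k."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        # 2 offset values (Δx, Δy) per kernel sampling point
        self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
        self.deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.offset_conv(x)        # learned Δp_k for every location
        return self.deform_conv(x, offsets)  # y(p) = Σ_k w_k · x(p + p_k + Δp_k)

# Example: a feature map of shape (1, 64, 40, 40)
feat = torch.randn(1, 64, 40, 40)
out = DeformableConvBlock(64, 128)(feat)
print(out.shape)  # torch.Size([1, 128, 40, 40])
```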
Another key improvement is the integration of the LSKA module, which combines large kernel convolutions with separable attention mechanisms. This module enhances the model’s ability to capture multi-scale features and emphasize small targets. The LSKA structure uses depthwise convolutions and dilated convolutions to expand the receptive field without increasing parameters significantly. For example, the attention mechanism computes weights to highlight important features, which is vital for distinguishing EV charging stations from noisy backgrounds. The attention output can be expressed as:
$$ A = \sigma \left( \text{DW-Conv}(F) \right) \otimes F $$
where \( A \) is the attended feature map, \( \sigma \) is the sigmoid function, DW-Conv denotes depthwise convolution, \( F \) is the input feature map, and \( \otimes \) represents element-wise multiplication. This helps in accurately localizing EV charging station components, even in cluttered scenes.
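The sketch below implements this gating in a simplified form: separable depthwise and dilated depthwise convolutions approximate a large kernel, and the sigmoid-weighted map re-scales the input, following the equation above. The specific kernel sizes and dilation rate are assumptions chosen for illustration, not the exact LSKA hyperparameters.

```python
import torch
import torch.nn as nn

class SimpleLargeKernelAttention(nn.Module):
    """Simplified LSKA-style attention: a separable large-kernel depthwise
    convolution produces per-pixel weights, A = sigmoid(DW-Conv(F)) * F."""

    def __init__(self, channels: int):
        super().__init__()
        # Separable 1xk and kx1 depthwise convs approximate a large kxk kernel
        self.dw_h = nn.Conv2d(channels, channels, (1, 7), padding=(0, 3), groups=channels)
        self.dw_v = nn.Conv2d(channels, channels, (7, 1), padding=(3, 0), groups=channels)
        # Dilated depthwise convs further enlarge the receptive field
        self.dw_h_d = nn.Conv2d(channels, channels, (1, 9), padding=(0, 12),
                                dilation=3, groups=channels)
        self.dw_v_d = nn.Conv2d(channels, channels, (9, 1), padding=(12, 0),
                                dilation=3, groups=channels)
        self.pw = nn.Conv2d(channels, channels, kernel_size=1)  # pointwise mixing

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        attn = self.dw_v(self.dw_h(f))
        attn = self.dw_v_d(self.dw_h_d(attn))
        attn = torch.sigmoid(self.pw(attn))  # σ(DW-Conv(F))
        return attn * f                      # element-wise re-weighting of F

feat = torch.randn(1, 64, 40, 40)
print(SimpleLargeKernelAttention(64)(feat).shape)  # torch.Size([1, 64, 40, 40])
```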
We also optimize the loss function by replacing CIoU with WIoU. The WIoU loss incorporates a weighting mechanism to penalize distance discrepancies between predicted and ground-truth bounding boxes, making it more sensitive to boundary errors. The formula for WIoU is:
$$ \text{WIoU} = \text{IoU} - \lambda \frac{d_x^2 + d_y^2}{c_x^2 + c_y^2} $$
where IoU is the intersection over union, \( \lambda \) is a weight coefficient, \( d_x \) and \( d_y \) are the differences between the predicted and ground-truth box centers along each axis, and \( c_x \) and \( c_y \) are the width and height of the smallest box enclosing both. This refinement improves the model’s precision in detecting EV charging stations, particularly for small or partially occluded objects.
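A direct implementation of this formula might look like the sketch below; the box format (x1, y1, x2, y2), the default value of λ, and the small epsilon terms are assumptions for illustration, and a training loss would typically use 1 − WIoU.

```python
import torch

def wiou(pred: torch.Tensor, target: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    """Weighted IoU as defined above, for boxes in (x1, y1, x2, y2) format.
    `lam` is the weight coefficient λ; its value here is a placeholder."""
    # Plain IoU
    inter_w = (torch.min(pred[..., 2], target[..., 2]) - torch.max(pred[..., 0], target[..., 0])).clamp(0)
    inter_h = (torch.min(pred[..., 3], target[..., 3]) - torch.max(pred[..., 1], target[..., 1])).clamp(0)
    inter = inter_w * inter_h
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + 1e-7)

    # Center-point distance differences d_x, d_y
    d_x = (pred[..., 0] + pred[..., 2]) / 2 - (target[..., 0] + target[..., 2]) / 2
    d_y = (pred[..., 1] + pred[..., 3]) / 2 - (target[..., 1] + target[..., 3]) / 2

    # Width / height of the smallest enclosing box: c_x, c_y
    c_x = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    c_y = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])

    return iou - lam * (d_x ** 2 + d_y ** 2) / (c_x ** 2 + c_y ** 2 + 1e-7)

# Example: predicted box slightly offset from the ground truth
p = torch.tensor([[10.0, 10.0, 50.0, 50.0]])
t = torch.tensor([[12.0, 12.0, 52.0, 52.0]])
print(wiou(p, t))  # WIoU score; the loss would be 1 - WIoU
```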
In our experiments, we configured the environment with a Windows OS, NVIDIA Quadro P4000 GPU, Python 3.8, PyTorch 1.8.0, and CUDA 11.6. We used standard evaluation metrics, including Precision, Recall, and mAP, defined as:
$$ P = \frac{TP}{TP + FP} $$
$$ R = \frac{TP}{TP + FN} $$
$$ \text{mAP} = \frac{1}{n} \sum_{i=1}^{n} \int_{0}^{1} P_i(R) dR $$
where TP, FP, and FN represent true positives, false positives, and false negatives, respectively, and \( n \) is the number of classes. For EV charging station detection, we consider a single class, so \( n = 1 \).
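The sketch below computes these metrics for a toy single-class case; it uses a simple precision envelope plus trapezoidal integration of \( P(R) \), and is not the exact COCO-style mAP50-95 protocol used by the YOLO tooling.

```python
from typing import Tuple
import numpy as np

def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """P = TP / (TP + FP), R = TP / (TP + FN)."""
    return tp / (tp + fp + 1e-9), tp / (tp + fn + 1e-9)

def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    """Area under the precision-recall curve, i.e. the integral of P(R) dR for one class."""
    # Add sentinel points and enforce a monotonically decreasing precision envelope
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([1.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]
    return float(np.trapz(p, r))

# Toy PR curve for the single "charging station" class (n = 1, so mAP = AP)
r = np.array([0.1, 0.4, 0.7, 0.78])
p = np.array([0.99, 0.97, 0.96, 0.93])
print(average_precision(r, p))
```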
We performed ablation studies to validate each component of our model. The results are summarized in the table below, which shows the impact of individual modifications on performance metrics. The baseline YOLOv8n model achieved a mAP50 of 0.808, and with the addition of C2f_DCNv3, LSKA, and WIoU, we observed incremental improvements.
| Configuration | C2f_DCNv3 | WIoU | LSKA | mAP50 | mAP50-95 | Precision (P) | Recall (R) |
|---|---|---|---|---|---|---|---|
| Baseline | | | | 0.808 | 0.684 | 0.936 | 0.714 |
| Config 1 | ✓ | | | 0.816 | 0.677 | 0.951 | 0.723 |
| Config 2 | | ✓ | | 0.822 | 0.686 | 0.958 | 0.748 |
| Config 3 | | | ✓ | 0.831 | 0.698 | 0.955 | 0.730 |
| Config 4 | ✓ | ✓ | | 0.829 | 0.695 | 0.953 | 0.746 |
| Config 5 | ✓ | | ✓ | 0.838 | 0.701 | 0.948 | 0.760 |
| Config 6 | | ✓ | ✓ | 0.846 | 0.706 | 0.961 | 0.728 |
| Full Model | ✓ | ✓ | ✓ | 0.855 | 0.712 | 0.967 | 0.778 |
As shown, the full model with all improvements achieves the highest mAP50 and Recall, indicating its superiority in detecting EV charging stations. The C2f_DCNv3 module alone boosts mAP50 by 0.8%, while LSKA contributes a 2.3% increase, and WIoU adds 1.4%. The combined effect results in a 4.7% improvement over YOLOv8n, highlighting the synergy of these components.
We also compared our model with other popular object detection algorithms, including Faster R-CNN, SSD, DETR, and various YOLO variants. The following table presents the results, demonstrating that our approach outperforms all others in terms of mAP50 and mAP50-95 for EV charging station detection.
| Algorithm | mAP50 | mAP50-95 | Precision (P) | Recall (R) |
|---|---|---|---|---|
| Faster R-CNN | 0.638 | 0.467 | 0.771 | 0.302 |
| SSD | 0.712 | 0.557 | 0.832 | 0.658 |
| DETR | 0.689 | 0.562 | 0.858 | 0.607 |
| YOLOv5n | 0.781 | 0.664 | 0.941 | 0.706 |
| YOLOv6n | 0.777 | 0.651 | 0.938 | 0.701 |
| YOLOv7-tiny | 0.788 | 0.678 | 0.948 | 0.712 |
| YOLOv8n | 0.808 | 0.684 | 0.936 | 0.714 |
| YOLOv9-T | 0.821 | 0.681 | 0.942 | 0.736 |
| Our Model | 0.855 | 0.712 | 0.967 | 0.778 |
Our model achieves a 7.4% higher mAP50 than YOLOv5n and a 3.4% improvement over YOLOv9-T, confirming its advanced capabilities. The enhanced Recall of 0.778 indicates fewer missed detections, which is critical for safety applications involving EV charging stations. Visual comparisons further show that our model produces higher confidence scores and better localization in complex scenes, such as those with overlapping objects or poor lighting.
In conclusion, we have developed an improved YOLOv8-based algorithm for EV charging station detection that effectively addresses challenges like deformation, occlusion, and scale variation. By integrating DCNv3, LSKA, and WIoU, we achieve significant gains in accuracy and robustness. This work provides a reliable solution for monitoring and managing EV charging infrastructure, contributing to safer urban environments. Future efforts will focus on deploying this model in real-time systems and extending it to handle additional anomalies, such as damaged components or unauthorized placements, further enhancing the utility of EV charging station networks.
The mathematical formulations and experimental validations underscore the importance of adaptive feature extraction and loss optimization in object detection. For instance, the WIoU loss function’s weighting mechanism can be extended to other applications, while the LSKA module’s large kernel approach offers insights into handling multi-scale targets. As the adoption of EV charging stations continues to grow, such advancements will play a pivotal role in ensuring efficient and secure operations.
