Data-Driven Safety Envelope for EV Battery Packs Under Side Pole Collision

The rapid proliferation of electric vehicles (EVs) has brought the crash safety of their high-voltage energy storage system, the EV battery pack, to the forefront of engineering and regulatory concerns. Among the possible impact scenarios, the side pole collision is one of the most severe and complex. A rigid pole concentrates the impact load on a narrow region, which can cause deep local intrusion into the EV battery pack structure and potentially lead to mechanical damage of battery cells, internal short circuits, thermal runaway, and catastrophic consequences. Accurate and rapid methods for evaluating and predicting the mechanical response and safety risk of an EV battery pack under side pole impact are therefore of paramount importance.

Traditional approaches rely either on physical testing, which is expensive and time-consuming, or on high-fidelity finite element (FE) simulations, which are computationally intensive and cannot provide real-time assessments. To overcome these limitations, this work develops a data-driven framework for predicting the safety of an EV battery pack subjected to side pole collisions. The methodology has three steps: generating a high-quality dataset through controlled FE simulations, engineering physically motivated input features, and training robust machine learning models that predict key safety metrics almost instantly from a set of collision boundary conditions.

1. Finite Element Modeling and Simulation Strategy

To generate the data necessary for training a predictive model, a representative and computationally efficient FE model of a typical EV battery pack was developed. The model incorporates a regional refinement strategy to balance accuracy with computational cost. The region of the EV battery pack expected to sustain direct impact is modeled with high detail, including representations of critical internal components like jellyrolls, end plates, and busbars. Areas farther from the impact zone are represented using homogenized or simplified models to reduce the total element count and simulation time.

The side pole collision scenario is parameterized by several key boundary conditions that define the crash severity and geometry. These parameters form the input space for our subsequent data-driven model:

Impact Velocity (v): The initial speed of the EV battery pack towards the stationary pole.
Impact Angle (θ): The angle between the pack’s longitudinal axis and the direction of travel, where 90° represents a pure lateral impact.
Collision Position (L): The initial contact point of the pole along the side wall of the EV battery pack, measured relative to a reference point.
Mass Compensation (m): Additional mass uniformly applied to the EV battery pack’s mounting points to simulate different vehicle loading conditions (e.g., curb weight vs. gross vehicle weight).
Pole Diameter (d): The diameter of the rigid cylindrical obstacle.

To ensure the dataset covers the multi-dimensional parameter space well without an exhaustive and costly full-factorial design, an Optimized Latin Hypercube Sampling (OLHS) strategy was employed. Standard LHS stratifies the range of each parameter individually but can still leave clusters and gaps in the joint space; OLHS additionally maximizes the minimum distance between all sample points, producing a more uniform, space-filling distribution. The distance metric used for optimization is the Euclidean distance between normalized sample points. Given a sample point $i$ with parameters $x_i = [v_i, \theta_i, L_i, m_i, d_i]$, normalized to the range [0, 1] in each dimension, the distance to another point $j$ is:
$$ d(x_i, x_j) = \sqrt{\sum_{k=1}^{5} (x_{i,k} - x_{j,k})^2} $$
The optimization goal is:
$$ \text{maximize: } \min_{i \neq j} d(x_i, x_j) $$
This process generated a well-distributed set of 200 unique simulation configurations, efficiently exploring variations in collision energy, geometry, and load case.
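The maximin idea behind OLHS can be illustrated with the following Python sketch. It is an assumption-laden illustration, not the optimizer used to build the dataset: it starts from a random Latin hypercube and applies a simple greedy column-swap heuristic, keeping a swap only if the minimum pairwise distance does not shrink. The function names `min_pairwise_distance`, `random_lhs`, and `maximin_lhs` and the iteration budget are hypothetical.

```python
import numpy as np


def min_pairwise_distance(x):
    """Smallest Euclidean distance between any two rows of x (the maximin objective)."""
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore the zero self-distances
    return dist.min()


def random_lhs(n, k, rng):
    """Plain LHS on [0, 1)^k: each column puts exactly one point in each of n strata."""
    x = np.empty((n, k))
    for j in range(k):
        x[:, j] = (rng.permutation(n) + rng.random(n)) / n
    return x


def maximin_lhs(n, k, n_iter=1000, seed=0):
    """Greedy column-swap search that raises the minimum pairwise distance of an LHS."""
    rng = np.random.default_rng(seed)
    x = random_lhs(n, k, rng)
    best = min_pairwise_distance(x)
    for _ in range(n_iter):
        j = rng.integers(k)
        a, b = rng.choice(n, size=2, replace=False)
        # Swapping two entries within one column preserves the Latin hypercube property.
        x[[a, b], j] = x[[b, a], j]
        trial = min_pairwise_distance(x)
        if trial >= best:
            best = trial
        else:
            x[[a, b], j] = x[[b, a], j]  # undo a swap that made the design worse
    return x, best


# 200 normalized configurations over the five parameters [v, theta, L, m, d]
design, d_min = maximin_lhs(200, 5)
```

Production OLHS tools typically use more elaborate searches (for example, enhanced stochastic evolutionary algorithms), but the objective being maximized is the same $\min_{i \neq j} d(x_i, x_j)$.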

Table 1: Range of Parameters for the EV Battery Pack Side Pole Collision Simulation Matrix.
Parameter	Symbol	Range	Unit
Impact Velocity	$v$	27 – 52	km/h
Impact Angle	$θ$	75 – 90	°
Collision Position	$L$	0 – 120	mm
Mass Compensation	$m$	100 – 300	kg
Pole Diameter	$d$	254 – 354	mm
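Because the distance metric operates on normalized coordinates, each sample must be mapped between [0, 1] and the physical ranges of Table 1. A minimal linear-scaling sketch follows; the column order and the names `LOWER`, `UPPER`, `to_physical`, and `to_unit` are illustrative assumptions.

```python
import numpy as np

# Table 1 bounds, columns ordered as [v (km/h), theta (deg), L (mm), m (kg), d (mm)]
LOWER = np.array([27.0, 75.0, 0.0, 100.0, 254.0])
UPPER = np.array([52.0, 90.0, 120.0, 300.0, 354.0])


def to_physical(u):
    """Map normalized samples in [0, 1] to the physical ranges of Table 1."""
    return LOWER + np.asarray(u, dtype=float) * (UPPER - LOWER)


def to_unit(x):
    """Inverse map: physical parameters back to the [0, 1] range used for the distance metric."""
    return (np.asarray(x, dtype=float) - LOWER) / (UPPER - LOWER)
```

For instance, the normalized point with every coordinate at 0.5 maps to the centre of the design space: $v = 39.5$ km/h, $\theta = 82.5^\circ$, $L = 60$ mm, $m = 200$ kg, $d = 304$ mm.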

2. Dataset Generation and Feature Engineering

Each of the 200 FE simulations was executed, and the results were processed to extract quantitative measures that define the safety state of the EV battery pack post-impact. These measures serve as the target outputs for the machine learning model. Automated image processing techniques were applied to the deformed geometry to extract the following metrics consistently and accurately:

Side Wall Maximum Intrusion Depth ($I_1$): The maximum inward deformation of the battery pack’s side wall.
Intrusion Location ($X_{\text{max}}$): The longitudinal position along the side wall where the maximum intrusion occurs.
Intrusion Width ($W$): The extent of the significant deformation zone measured along the side wall.
Jellyroll Maximum Intrusion ($I_2$): The maximum deformation measured in the internal battery jellyrolls, directly indicative of internal short-circuit risk.
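The image-processing pipeline itself is not described, so the Python sketch below only shows how the metrics relate to a deformation profile. It assumes the deformed geometry has already been reduced to a 1-D intrusion profile sampled along the side wall (or through the jellyrolls, in which case the peak gives $I_2$). The 10% threshold that defines "significant" deformation for $W$ and the function name `intrusion_metrics` are hypothetical.

```python
import numpy as np


def intrusion_metrics(x_mm, intrusion_mm, width_fraction=0.1):
    """Return (peak intrusion, location of the peak, width of the deformed zone).

    width_fraction is a hypothetical threshold: points whose intrusion is at least
    width_fraction * peak count as "significant" deformation.
    """
    x = np.asarray(x_mm, dtype=float)
    u = np.asarray(intrusion_mm, dtype=float)
    k = int(np.argmax(u))
    peak, x_peak = u[k], x[k]
    significant = u >= width_fraction * peak
    # Grow the zone outward from the peak while neighbouring points stay significant.
    lo = hi = k
    while lo > 0 and significant[lo - 1]:
        lo -= 1
    while hi < len(u) - 1 and significant[hi + 1]:
        hi += 1
    return peak, x_peak, x[hi] - x[lo]


# Synthetic Gaussian-shaped dent centred at 400 mm with a 50 mm peak (illustration only)
x = np.linspace(0.0, 1000.0, 1001)
profile = 50.0 * np.exp(-((x - 400.0) / 100.0) ** 2)
i1, x_max, w = intrusion_metrics(x, profile)
```

On this 1 mm grid the function returns a peak of 50 mm at 400 mm and a width of 302 mm.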

Feeding the raw input parameters directly to a machine learning model may not expose the underlying physics well. Feature engineering was therefore performed to derive new input features that have a stronger mechanistic link to the response of the EV battery pack.

Kinetic Energy Feature: The total initial kinetic energy is a primary driver of structural deformation. It is calculated as:
$$ e = \frac{1}{2} (m_0 + m) v^2 $$
where $m_0$ is the baseline mass of the EV battery pack model.
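Velocity Components: The impact velocity vector is decomposed into a component $v_x$ normal to the EV battery pack side wall and a component $v_y$ tangential to it. With $\theta$ measured from the pack's longitudinal axis (so that $\theta = 90^\circ$ is a pure lateral impact):
$$ v_x = v \cdot \sin(\theta) $$
$$ v_y = v \cdot \cos(\theta) $$
The normal component drives the pole into the side wall, while the tangential component governs sliding along it, so these components relate more directly to the effective impact severity and sliding behavior than speed and angle considered separately.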
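The two derived features can be computed as in this Python sketch. It assumes speeds are converted from km/h to m/s so that $e$ comes out in joules, and it uses a hypothetical baseline mass `M0_KG = 400.0` because the actual $m_0$ of the FE model is not reported. The function name `engineered_features` is likewise illustrative.

```python
import numpy as np

M0_KG = 400.0  # hypothetical baseline pack mass m_0 (kg); the real value is not reported


def engineered_features(v_kmh, theta_deg, L_mm, m_kg, m0_kg=M0_KG):
    """Map raw collision parameters to the feature vector [L, e, v_x, v_y].

    theta is measured from the pack's longitudinal axis, so 90 deg is a pure lateral
    impact. Speed is converted from km/h to m/s, which puts e in joules.
    """
    v_kmh, theta_deg, L_mm, m_kg = np.broadcast_arrays(
        *(np.asarray(a, dtype=float) for a in (v_kmh, theta_deg, L_mm, m_kg))
    )
    v = v_kmh / 3.6
    theta = np.radians(theta_deg)
    e = 0.5 * (m0_kg + m_kg) * v ** 2  # total initial kinetic energy
    v_x = v * np.sin(theta)  # normal to the side wall
    v_y = v * np.cos(theta)  # tangential to the side wall
    return np.column_stack([L_mm, e, v_x, v_y])
```

For $v = 40$ km/h, $\theta = 90^\circ$, $L = 60$ mm and $m = 200$ kg, this gives $e \approx 37{,}037$ J (with the assumed $m_0$), $v_x \approx 11.11$ m/s and $v_y \approx 0$.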

To select the final set of input features on a quantitative basis, a Pearson correlation analysis was conducted between each candidate feature and each target. The correlation coefficient $r$ between a feature $X$ and a target $Y$ over $n$ samples is given by:
$$ r_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} $$
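The coefficient and a simple ranking of candidate features by $|r|$ can be written directly from the formula, as in the Python sketch below. The names `pearson_r` and `rank_features` are hypothetical, and ranking by absolute value is an assumption about how "higher correlation" is judged.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient, written term by term from r_XY."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / (np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2)))


def rank_features(features, target):
    """Rank candidate features (dict name -> samples) by |r| with one target, strongest first."""
    scores = {name: abs(pearson_r(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_features` once per target ($I_1$, $W$, $X_{\text{max}}$, $I_2$) with the candidates $v$, $\theta$, $L$, $m$, $d$, $e$, $v_x$, $v_y$ gives the kind of comparison used to choose the final feature vector.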
The analysis showed that the derived features $e$, $v_x$, and $v_y$ had consistently higher absolute correlation coefficients $|r|$ with the target safety metrics ($I_1$, $W$, $X_{\text{max}}$, $I_2$) than the original parameters $v$, $\theta$, and $m$, while the pole diameter $d$ correlated only very weakly with the targets. The final input feature vector for model training was therefore selected as [$L$, $e$, $v_x$, $v_y$]. This step reduces the input dimensionality from five to four, mitigates multicollinearity, and focuses the model on the inputs most physically relevant to EV battery pack damage.

3. Development and Evaluation of Data-Driven Prediction Models

With the curated dataset (200 samples, 4 input features, 4 output targets), three machine learning algorithms were trained to learn the mapping from collision conditions to the EV battery pack's mechanical response. The dataset was split into a training set (70%, i.e. 140 samples) and an independent test set (30%, i.e. 60 samples).
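A minimal sketch of the 70/30 split is shown below. The text does not say whether the split was random or stratified, so a seeded random permutation is assumed here; the function name `train_test_split_indices` and the seed are hypothetical.

```python
import numpy as np


def train_test_split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and cut them into disjoint training and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return order[:n_train], order[n_train:]


train_idx, test_idx = train_test_split_indices(200)  # 140 training, 60 test indices
```

Fixing the seed makes the test set reproducible, which matters when the same split is reused for the noise experiments later on.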

3.1 Machine Learning Algorithms

Support Vector Machine (SVM): Used here in its regression form (support vector regression), the algorithm fits a function that keeps most training points within a tolerance band of width $\varepsilon$ while keeping the model as flat as possible, the regression counterpart of margin maximization. It is effective in high-dimensional spaces and can model non-linear relationships through kernel functions; a Radial Basis Function (RBF) kernel was used here.
Random Forest (RF): An ensemble method that trains many decision trees, each on a bootstrap sample of the training data, and averages their individual predictions. This averaging makes RF comparatively robust to overfitting, and the trained forest provides estimates of feature importance.
Back Propagation Neural Network (BPNN): A multi-layer perceptron that learns complex, non-linear relationships by iteratively adjusting its connection weights with gradient descent, using gradients obtained by back-propagating the prediction error through the network. The architecture used here has two hidden layers with 50 neurons each.
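The three regressors can be configured in scikit-learn roughly as follows. This is a sketch under stated assumptions: only the RBF kernel and the two 50-neuron hidden layers come from the text, while the input standardization, the number of trees, `max_iter`, and the seeds are hypothetical choices (the study's hyperparameters are not reported). Because `SVR` predicts a single target, it is wrapped in `MultiOutputRegressor` so that one model is fitted per metric. The data in the example are synthetic stand-ins, not the FE dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def build_models(seed=0):
    """Return the three regressors keyed by the names used in Table 2."""
    return {
        # RBF-kernel support vector regression, one SVR per target; inputs are
        # standardized because [L, e, v_x, v_y] live on very different scales.
        "SVM": make_pipeline(StandardScaler(), MultiOutputRegressor(SVR(kernel="rbf"))),
        # Forest of bootstrapped decision trees whose predictions are averaged.
        "Random Forest": RandomForestRegressor(n_estimators=200, random_state=seed),
        # Two hidden layers of 50 neurons, trained on back-propagated gradients
        # (scikit-learn's default Adam solver, a gradient-descent variant).
        "BPNN": make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(50, 50), max_iter=2000, random_state=seed),
        ),
    }


# Illustration on synthetic stand-in data (NOT the FE dataset): 4 features -> 4 targets
rng = np.random.default_rng(0)
X = rng.random((200, 4))
Y = np.column_stack([X.sum(axis=1), 2 * X[:, 0], X[:, 1] - X[:, 2], X[:, 3] ** 2])
predictions = {}
for name, model in build_models().items():
    model.fit(X[:140], Y[:140])  # 70 % training split
    predictions[name] = model.predict(X[140:])  # 30 % test split, shape (60, 4)
```

Scaling the targets as well (for example with `sklearn.compose.TransformedTargetRegressor`) may help the SVM and BPNN when $W$ (tens to hundreds of mm) and $I_2$ (a few mm) are fitted together.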

3.2 Model Performance and Accuracy
The performance of each trained model was evaluated on the unseen test set using three statistical metrics:
$$ \text{Mean Absolute Error (MAE)} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i| $$
$$ \text{Root Mean Squared Error (RMSE)} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} $$
$$ \text{Coefficient of Determination } (R^2) = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} $$
where $y_i$ is the actual FE simulation value, $\hat{y}_i$ is the model prediction, and $\bar{y}$ is the mean of the actual values.
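The three metrics translate directly into NumPy. In the sketch below (the function name `regression_metrics` is hypothetical), each metric is computed column-wise, so passing arrays of shape (n, 4) yields one value per target, matching the per-metric rows of Table 2.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2), computed per column for 2-D inputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err), axis=0)
    rmse = np.sqrt(np.mean(err ** 2, axis=0))
    ss_res = np.sum(err ** 2, axis=0)  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)  # total sum of squares
    return mae, rmse, 1.0 - ss_res / ss_tot
```

For example, `regression_metrics([1, 2, 3, 4], [1.5, 2, 2.5, 4.5])` gives MAE = 0.375, RMSE ≈ 0.433, and $R^2 = 0.85$. Since RMSE can never be smaller than MAE, every row of Table 2 should satisfy RMSE ≥ MAE, which it does.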

Table 2: Performance Comparison of Machine Learning Models for EV Battery Pack Safety Prediction.
Model	Target Metric	$R^2$	MAE (mm)	RMSE (mm)
SVM	Max Intrusion Depth ($I_1$)	0.984	2.56	3.51
	Intrusion Width ($W$)	0.937	18.73	25.69
	Intrusion Location ($X_{\text{max}}$)	0.964	5.20	6.96
	Jellyroll Intrusion ($I_2$)	0.964	1.42	1.77
Random Forest	Max Intrusion Depth ($I_1$)	0.957	4.59	5.75
	Intrusion Width ($W$)	0.944	17.63	24.28
	Intrusion Location ($X_{\text{max}}$)	0.931	7.35	9.62
	Jellyroll Intrusion ($I_2$)	0.954	1.34	1.99
BPNN	Max Intrusion Depth ($I_1$)	0.969	3.83	4.84
	Intrusion Width ($W$)	0.948	18.55	23.33
	Intrusion Location ($X_{\text{max}}$)	0.968	5.23	6.58
	Jellyroll Intrusion ($I_2$)	0.885	2.57	3.15

The results show that all three data-driven models achieve high predictive accuracy for the mechanical response of the EV battery pack. The SVM model delivered the best overall performance, with an average $R^2$ of 0.96 across the four target metrics. Predictions of the side wall maximum intrusion depth ($I_1$) were highly accurate for all models ($R^2 > 0.95$), which is critical because this metric is a primary indicator of overall structural compromise. Jellyroll intrusion ($I_2$) was predicted well by the SVM and RF models ($R^2$ of 0.964 and 0.954), but noticeably less accurately by the BPNN ($R^2 = 0.885$), likely due in part to the difficulty of predicting near-zero deformations in low-severity cases.

3.3 Robustness Analysis with Gaussian Noise
A practical predictive model for EV battery pack safety must be robust to uncertainties and noise inherent in real-world data or measurement systems. To evaluate robustness, Gaussian noise with zero mean and varying standard deviation ($\sigma$) was added to the entire training dataset before re-training the models. The performance on the original, clean test set was then re-evaluated.
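The noise injection step can be sketched as follows. Two assumptions are made because the text does not specify them: the noise is added to both the input features and the targets of the training set, and both have already been normalized (for example, to [0, 1]) so that a single $\sigma$ is meaningful for every column. The function name `add_gaussian_noise` is hypothetical.

```python
import numpy as np


def add_gaussian_noise(X, Y, sigma, seed=0):
    """Return noisy copies of normalized training inputs X and targets Y.

    Zero-mean Gaussian noise with standard deviation sigma is drawn independently
    for every entry; the original arrays are left untouched.
    """
    rng = np.random.default_rng(seed)
    X_noisy = X + rng.normal(0.0, sigma, size=X.shape)
    Y_noisy = Y + rng.normal(0.0, sigma, size=Y.shape)
    return X_noisy, Y_noisy
```

In the experiment, this would be called once for each noise level ($\sigma$ = 0.01, 0.1, 0.5), each model retrained on the noisy copies, and the retrained model scored on the unchanged test set.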

Table 3: Robustness Analysis of Models Under Training Data Corrupted by Gaussian Noise.
Model	Noise Level ($\sigma$)	Avg. $R^2$ ($I_1, W, X_{\text{max}}, I_2$)	Performance Notes
SVM	0.01	~0.95	Minimal degradation.
	0.1	~0.93	Moderate decrease, stable.
	0.5	~0.90	Significant but acceptable drop.
Random Forest	0.01	~0.94	Minimal degradation.
	0.1	~0.90	Larger decrease than SVM.
	0.5	~0.90	Similar to SVM at high noise.
BPNN	0.01	~0.94	Minimal degradation.
	0.1	~0.93	Very stable performance.
	0.5	~0.91	Best robustness; least degradation.

The robustness test revealed a key finding: although the SVM model achieved the highest accuracy on clean data, the BPNN model was the most robust to noise in the training data. Even at the highest noise level ($\sigma = 0.5$), the BPNN maintained an average $R^2$ of about 0.91, the smallest drop relative to its clean-data performance among the three models. This makes the BPNN a strong candidate for applications where training data may be noisy or imprecise, a common situation in early-stage EV battery pack design and analysis.
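The comparison can be checked with a few lines of arithmetic on the tabulated values. The sketch copies the clean-data $R^2$ values from Table 2 and the approximate $\sigma = 0.5$ averages from Table 3; since Table 3 reports rounded ("~") values, the computed drops are only indicative. The variable names are illustrative.

```python
# Clean-data R^2 for (I1, W, X_max, I2), copied from Table 2
clean_r2 = {
    "SVM": [0.984, 0.937, 0.964, 0.964],
    "Random Forest": [0.957, 0.944, 0.931, 0.954],
    "BPNN": [0.969, 0.948, 0.968, 0.885],
}
# Approximate average R^2 at sigma = 0.5, copied from Table 3 (rounded values)
noisy_r2 = {"SVM": 0.90, "Random Forest": 0.90, "BPNN": 0.91}

# Average over the four targets, then the degradation caused by the noise
clean_avg = {name: sum(vals) / len(vals) for name, vals in clean_r2.items()}
drop = {name: clean_avg[name] - noisy_r2[name] for name in clean_avg}
most_accurate = max(clean_avg, key=clean_avg.get)
most_robust = min(drop, key=drop.get)
```

This gives clean averages of about 0.962 (SVM), 0.947 (RF) and 0.943 (BPNN), and drops at $\sigma = 0.5$ of roughly 0.062, 0.047 and 0.033, so the SVM is the most accurate on clean data while the BPNN degrades least.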

4. Conclusion and Practical Application

This work establishes a data-driven framework for the rapid and accurate prediction of EV battery pack safety under side pole collision conditions. The methodology couples high-fidelity physics-based simulation with efficient machine learning, yielding a fast surrogate for design evaluation and risk assessment.

The key conclusions are as follows:

The integration of optimized sampling, physics-informed feature engineering (kinetic energy, velocity components), and correlation analysis is crucial for building an effective predictive model for EV battery pack response.
Among the tested algorithms, the Support Vector Machine (SVM) model delivered the highest prediction accuracy on clean data, making it well suited to scenarios where precision is paramount and data quality is high.
The Back Propagation Neural Network (BPNN) demonstrated superior robustness when the training data contained significant noise. This resilience is highly valuable for practical engineering applications involving data variability or uncertainty.
The Random Forest (RF) model offered a strong balance between accuracy, training speed, and interpretability (through feature importance scores), suitable for rapid prototyping and understanding parameter influences on EV battery pack safety.

The developed models can predict critical intrusion metrics (side wall deformation and internal jellyroll compression) almost instantly from simple inputs: collision location, kinetic energy, and velocity components. This allows engineers to:

Perform rapid virtual safety sweeps across a wide range of collision scenarios.
Identify high-risk boundary conditions for a given EV battery pack design.
Integrate the model into broader vehicle safety system simulations or optimization loops.
Potentially support real-time risk assessment in advanced vehicle safety systems.
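A virtual safety sweep of this kind might look like the following sketch. `predict_i2` stands for any fitted surrogate (for instance, a trained SVM, RF, or BPNN model) that maps rows of [$L$, $e$, $v_x$, $v_y$] to predicted jellyroll intrusion. The lambda in the example is only a placeholder that scores cases by kinetic energy, and `M0_KG`, the grid resolution `n`, and the function names are hypothetical. Pole diameter is omitted because it is not part of the final feature vector.

```python
import itertools

import numpy as np

M0_KG = 400.0  # hypothetical baseline pack mass (kg); not given in the text


def to_features(v_kmh, theta_deg, L_mm, m_kg):
    """Raw collision parameters -> [L, e, v_x, v_y] (e in J, velocities in m/s)."""
    v = v_kmh / 3.6
    theta = np.radians(theta_deg)
    e = 0.5 * (M0_KG + m_kg) * v ** 2
    return np.column_stack([L_mm, e, v * np.sin(theta), v * np.cos(theta)])


def safety_sweep(predict_i2, n=6):
    """Grid-search the Table 1 ranges and return the case with the largest predicted I2."""
    grids = [
        np.linspace(27.0, 52.0, n),    # v, km/h
        np.linspace(75.0, 90.0, n),    # theta, deg
        np.linspace(0.0, 120.0, n),    # L, mm
        np.linspace(100.0, 300.0, n),  # m, kg
    ]
    cases = np.array(list(itertools.product(*grids)))  # shape (n**4, 4)
    scores = np.asarray(predict_i2(to_features(*cases.T)))
    worst = int(np.argmax(scores))
    return cases[worst], scores[worst]


# Placeholder predictor for illustration only: treats kinetic energy as the risk score.
worst_case, worst_score = safety_sweep(lambda F: F[:, 1])
```

With this placeholder, the worst case is the highest speed and heaviest mass (52 km/h, 300 kg). A trained model would instead expose interactions with angle and collision position that energy alone cannot capture.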

The framework is inherently flexible. While demonstrated on a specific EV battery pack architecture, the approach can be extended to other pack geometries, cell formats, and even different impact modes (e.g., underbody impact, frontal corner impact) by generating appropriate simulation datasets. Future work will focus on expanding the dataset diversity, incorporating multi-physics responses such as short-circuit prediction directly, and integrating the model into full-vehicle safety assessment toolchains to further enhance the development of crashworthy electric vehicles.