Advanced Fault Detection Framework for Lithium-ion Batteries in Electric Vehicles

The widespread adoption of battery electric vehicles (BEVs) represents a pivotal shift towards sustainable transportation. The lithium-ion battery pack, serving as the core energy storage unit, is fundamental to the performance, range, and, most critically, the safety of these vehicles. However, batteries are susceptible to various failure modes, including performance degradation, thermal runaway, and internal short circuits, especially under complex operating conditions involving charge-discharge cycles, long-term usage, and extreme temperatures. Timely detection of such faults is paramount to prevent performance loss and mitigate severe safety hazards. Consequently, developing intelligent and efficient fault detection systems for battery electric vehicles has become an area of significant research interest and practical importance.

Traditional fault diagnosis methods, such as model-based analysis and signal statistics, often struggle with the high dimensionality, nonlinearity, and complex interdependencies inherent in real-world battery management system (BMS) data. While data-driven approaches, particularly deep learning, have shown promise due to their ability to learn representations automatically, challenges remain. These include effectively modeling multi-source heterogeneous time-series data (e.g., voltage, current, temperature), handling the extreme scarcity of labeled fault samples, and capturing the strong dynamic correlations among battery parameters. Many advanced models, though powerful, incur high computational costs that hinder deployment on resource-constrained vehicle platforms.

To address these challenges in fault detection for battery electric vehicles, this work proposes a novel prediction-evaluation framework centered on a Dynamic Transformer Memory Autoencoder (DTMAD). The framework operates under an unsupervised learning paradigm, eliminating the dependency on large labeled fault datasets. The core innovation lies in its dual-path architecture designed to extract and fuse key response patterns from multi-source time-series data. The framework’s prediction model learns to reconstruct normal operational patterns, while the evaluation model assesses deviations from these patterns to identify anomalies. The primary contributions of this framework for enhancing the safety of battery electric vehicles are:

  1. A Joint Feature Encoder that integrates a Gated Recurrent Unit (GRU) with a Variational Autoencoder (VAE) to perform feature fusion and dimensionality reduction on multi-source time-series data, extracting deep cross-modal representations.
  2. A Pre-response Encoder based on a self-attention mechanism, enabling parallel and efficient capture of long-term temporal dependencies within the sequential data from the battery electric vehicle.
  3. A Memory Parsing Module that employs a residual contrastive learning mechanism to align the prediction path with the actual response path, significantly enhancing the model’s discriminative power for detecting anomalous patterns.
  4. A comprehensive Evaluation Model based on the analysis of reconstruction error distributions and the Area Under the Receiver Operating Characteristic Curve (AUROC) metric, providing a robust assessment of the framework’s fault detection capability for battery electric vehicle applications.

System Model and Methodology

Overall Framework Architecture

The proposed fault detection framework for battery electric vehicles consists of two main components: the Prediction Model and the Evaluation Model. The overall goal is to learn the normal operational manifold from multi-source time-series data and identify deviations as potential faults.

The Prediction Model (DTMAD) is responsible for learning and reconstructing the key response patterns. Its architecture includes:

  1. Joint Feature Encoder: Fuses and compresses sequential response features using a GRU-VAE structure.
  2. Pre-response Encoder: Models input sequences using self-attention for efficient feature extraction.
  3. Memory Parsing Module: Integrates the outputs from the two encoders via cross-attention and residual comparison to perform the final reconstruction.

The Evaluation Model then analyzes the reconstruction errors produced by the Prediction Model. It employs a synergistic anomaly detection algorithm that calculates anomaly scores, performs threshold traversal, and ultimately evaluates performance using the AUROC metric. This two-stage approach allows for effective fault detection in battery electric vehicles without requiring fault labels during training.

The Prediction Model: DTMAD

1. Joint Feature Encoder

The Joint Feature Encoder processes the multi-variate temporal response features from the battery electric vehicle (e.g., temperatures, voltages). It is designed to capture sequential dependencies and learn a probabilistic latent representation. The encoder comprises three sub-modules:

  1. First-layer GRU Encoder: Takes the sequential response features $Y = \{y_1, y_2, \ldots, y_T\}$ as input, where $y_t$ is the feature vector at time $t$. The GRU updates its hidden state $h_t$ through its gating mechanisms, capturing temporal dependencies. The final hidden state $h_T$ serves as a compressed summary of the input sequence.
    $$h_t = \text{GRU}(y_t, h_{t-1})$$
  2. Variational Autoencoder (VAE): The final GRU hidden state $h_T$ is fed into the VAE module. The encoder maps $h_T$ to the parameters of a latent distribution, typically Gaussian. Two separate fully-connected layers predict the mean $\mu$ and log-variance $\log(\sigma^2)$.
    $$\mu = W_{\mu}^T h_T + b_{\mu}, \quad \log(\sigma^2) = W_{\sigma}^T h_T + b_{\sigma}$$
    A latent variable $z$ is sampled using the reparameterization trick: $z = \mu + \sigma \odot \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$. This introduces stochasticity and regularizes the latent space.
  3. Second-layer GRU Decoder: The sampled latent variable $z$ is used as the initial hidden state $h'_0$ of the decoder GRU. Along with additional conditional input features $X_{cond}$, the decoder reconstructs the target output sequence $\hat{Y}$.
    $$h'_t = \text{GRU}(x_{cond,t}, h'_{t-1}), \quad \hat{y}_t = f(h'_t)$$
    where $f$ is an output layer.

This structure allows the model to learn a smooth, generative latent space that encapsulates the normal operational modes of the battery electric vehicle.
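To make the sampling step concrete, the following is a minimal NumPy sketch of the VAE reparameterization applied to the final GRU hidden state. The hidden size (128) and latent dimension (16) match the experimental setup; the weight shapes and initialization are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(h_T, W_mu, b_mu, W_sigma, b_sigma):
    """mu = W_mu^T h_T + b_mu, log(sigma^2) = W_sigma^T h_T + b_sigma,
    then z = mu + sigma * eps with eps ~ N(0, I)."""
    mu = W_mu.T @ h_T + b_mu
    log_var = W_sigma.T @ h_T + b_sigma
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps     # reparameterization trick
    return z, mu, log_var

hidden, latent = 128, 16                     # GRU hidden size and VAE latent dim (per the setup)
h_T = rng.standard_normal(hidden)            # stand-in for the final GRU hidden state
W_mu = 0.01 * rng.standard_normal((hidden, latent))
W_sigma = 0.01 * rng.standard_normal((hidden, latent))
z, mu, log_var = reparameterize(h_T, W_mu, np.zeros(latent), W_sigma, np.zeros(latent))
```

Because gradients flow through $\mu$ and $\sigma$ rather than the random draw itself, the sampling step remains trainable by backpropagation.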

2. Pre-response Encoder

Operating in parallel to the Joint Feature Encoder, the Pre-response Encoder processes the same or related input features $X_{src}$ but focuses on extracting global temporal dependencies efficiently using the Transformer architecture. This is crucial for modeling long-range interactions in battery electric vehicle data.

  1. Input Embedding and Scaling: The input features are first projected into a higher-dimensional space $d_{model}$ by a linear layer and then scaled by $\sqrt{d_{model}}$ for numerical stability.
    $$X' = \text{Linear}(X_{src}) \cdot \sqrt{d_{model}}$$
  2. Positional Encoding: To inject sequential order information, sinusoidal positional encodings $PE$ are added to $X’$.
    $$X_{emb} = X’ + PE(pos)$$
  3. Multi-Head Self-Attention (MHSA): The embedded sequence is processed by a multi-head self-attention layer. For each head $i$:
    $$\text{Head}_i = \text{Softmax}\left(\frac{Q_i K_i^T}{\sqrt{d_k}}\right)V_i$$
    where $Q_i = X_{emb}W_i^Q$, $K_i = X_{emb}W_i^K$, $V_i = X_{emb}W_i^V$. The outputs of all heads are concatenated and linearly projected.
  4. Residual Connection & Layer Normalization: The output of MHSA is added to the original input $X_{emb}$ and normalized.
    $$\text{memory} = \text{LayerNorm}(X_{emb} + \text{Dropout}(\text{MHSA}(X_{emb})))$$
    The resulting tensor, denoted as $\text{memory}$, carries rich contextual information about the input sequence.
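The four steps above can be sketched in NumPy for a single attention head; the model dimension (64) is an assumed value, the sequence length (128) matches the dataset, and LayerNorm/Dropout are omitted for brevity. This is an illustration of the mechanism, not the authors' implementation.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: sin at even indices, cos at odd indices."""
    pos = np.arange(seq_len)[:, None]                      # (seq_len, 1)
    i = np.arange(d_model)[None, :]                        # (1, d_model)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention: Softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # numerically stable softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d_model = 128, 64          # 128 time steps per session; d_model is an assumed size
X_emb = rng.standard_normal((seq_len, d_model)) + positional_encoding(seq_len, d_model)
W_q, W_k, W_v = (0.1 * rng.standard_normal((d_model, d_model)) for _ in range(3))
memory = X_emb + self_attention(X_emb, W_q, W_k, W_v)     # residual connection
```

Multi-head attention repeats this with separate projections per head and concatenates the results; the single-head case suffices to show where the $\sqrt{d_k}$ scaling and the residual connection enter.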

3. Memory Parsing Module

This module is the fusion center of the DTMAD framework. It integrates the detailed, autoregressively generated path from the Joint Feature Encoder (the target path, $\text{tgt}$) with the globally contextualized memory path from the Pre-response Encoder.

  1. Target Path Self-Attention: The initial reconstruction from the Joint Feature Encoder decoder ($\text{tgt}$) is further refined using a self-attention layer to enhance its internal coherence.
  2. Cross-Attention Fusion: A multi-head cross-attention mechanism is employed where the refined $\text{tgt}$ acts as the Query, and the $\text{memory}$ tensor acts as both the Key and Value. This allows the model to “attend to” relevant contextual information from the Pre-response Encoder while generating the final output.
    $$\text{output} = \text{CrossAttention}(\text{tgt}, \text{memory}, \text{memory})$$
    $$\text{where CrossAttention}(Q, K, V) = \text{Softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
  3. Residual Contrastive Output: The final prediction $\hat{Y}_{final}$ is obtained. The reconstruction error for the $j$-th sample is computed as the Mean Squared Error (MSE) between the final prediction and the true response $Y$:
    $$\text{ReconstructionError}_j = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_{final,i})^2$$
    This error serves as the primary anomaly score.
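The fusion step above differs from self-attention only in where the Queries, Keys, and Values come from. A hedged NumPy sketch (dimensions are illustrative, layer norms and multi-head projections omitted):

```python
import numpy as np

def cross_attention(tgt, memory, W_q, W_k, W_v):
    """Cross-attention: Queries come from the target path, Keys/Values from memory."""
    Q, K, V = tgt @ W_q, memory @ W_k, memory @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(1)
T, d = 128, 64
tgt = rng.standard_normal((T, d))        # refined target path from the decoder
memory = rng.standard_normal((T, d))     # contextual memory from the Pre-response Encoder
W_q, W_k, W_v = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
output = cross_attention(tgt, memory, W_q, W_k, W_v)

# Per-sample reconstruction error (MSE) between a prediction and the true response
y_true = rng.standard_normal(T)
y_pred = y_true + 0.01 * rng.standard_normal(T)
recon_error = np.mean((y_true - y_pred) ** 2)
```

In the full framework this error, computed over all time steps and features of a segment, is what the Evaluation Model consumes as the anomaly score.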

4. Loss Function

The Prediction Model is trained by minimizing a composite loss function $\mathcal{L}_{TOTAL}$ that balances reconstruction fidelity, latent space regularization, and weak supervision from available metadata (e.g., mileage in a battery electric vehicle).

$$ \mathcal{L}_{TOTAL} = w_1 \cdot \mathcal{L}_{SM} + w_2 \cdot \mathcal{L}_{MILE} + w_3 \cdot \mathcal{L}_{KL} $$

The components are:

  1. Primary Reconstruction Loss ($\mathcal{L}_{SM}$): Measures the discrepancy between the reconstructed and actual sequences. The Smooth L1 Loss is found to be particularly effective.
    $$ \mathcal{L}_{SM} = \frac{1}{n} \sum_{i} \begin{cases} 0.5 (y_i - \hat{y}_i)^2, & \text{if } |y_i - \hat{y}_i| < 1 \\ |y_i - \hat{y}_i| - 0.5, & \text{otherwise} \end{cases} $$
  2. Weakly Supervised Mileage Loss ($\mathcal{L}_{MILE}$): Incorporates vehicle mileage as a weak signal using Mean Squared Error, helping the model adapt to usage-related changes.
    $$ \mathcal{L}_{MILE} = \frac{1}{n} \sum_{i} (m_i - \hat{m}_i)^2 $$
  3. KL Divergence Loss ($\mathcal{L}_{KL}$): Regularizes the latent space of the VAE by forcing the learned distribution $q(z|x)$ to approximate a standard normal prior $p(z) = \mathcal{N}(0, I)$. This encourages a smooth and structured latent space, improving generalization.
    $$ \mathcal{L}_{KL} = -\frac{1}{2} \sum_{j=1}^{J} \left(1 + \log(\sigma_j^2) - \mu_j^2 - \sigma_j^2\right) $$
    where $J$ is the latent dimension.

The weights $w_1, w_2, w_3$ are hyperparameters that control the influence of each term, typically with $w_1$ being the largest and $w_3$ often scheduled with an annealing strategy.
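The composite loss can be written directly from the three formulas above. The sketch below uses the weight values from the experimental setup as defaults (the annealing schedule for $w_3$ is omitted); it is a plain NumPy rendering for illustration, not the training code.

```python
import numpy as np

def smooth_l1(y, y_hat):
    """Smooth L1 loss: quadratic for |error| < 1, linear otherwise."""
    d = np.abs(y - y_hat)
    return float(np.mean(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)))

def kl_divergence(mu, log_var):
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior."""
    return float(-0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var)))

def total_loss(y, y_hat, m, m_hat, mu, log_var, w=(10.0, 0.001, 0.1)):
    """L_TOTAL = w1 * L_SM + w2 * L_MILE + w3 * L_KL (paper's weights as defaults)."""
    l_mile = float(np.mean((m - m_hat) ** 2))
    return w[0] * smooth_l1(y, y_hat) + w[1] * l_mile + w[2] * kl_divergence(mu, log_var)
```

Note that the KL term is zero exactly when $\mu = 0$ and $\sigma^2 = 1$, i.e. when the posterior already matches the standard normal prior, which is why it acts as a regularizer rather than a reconstruction objective.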

The Evaluation Model: AUROC-based Assessment

Since the framework is trained in an unsupervised manner on predominantly normal data from battery electric vehicles, its fault detection performance is evaluated based on the reconstruction error. The Evaluation Model formalizes this process.

  1. Anomaly Score Calculation: For each vehicle charging segment, the average reconstruction error across all time steps and features is computed as its anomaly score.
  2. Threshold Traversal & ROC Analysis: A detection threshold $\tau$ is varied across the range of anomaly scores. For each $\tau$, samples with scores $> \tau$ are labeled as “faulty,” and others as “normal.” Comparing these predictions against the ground truth labels (used only for evaluation) yields the True Positive Rate (TPR) and False Positive Rate (FPR).
    $$ \text{TPR}(\tau) = \frac{TP(\tau)}{TP(\tau) + FN(\tau)}, \quad \text{FPR}(\tau) = \frac{FP(\tau)}{FP(\tau) + TN(\tau)} $$
  3. AUROC Computation: Plotting TPR against FPR for all thresholds generates the Receiver Operating Characteristic (ROC) curve. The Area Under this curve (AUROC) provides a single scalar metric quantifying the model’s ability to discriminate between normal and faulty states of the battery electric vehicle, with 1.0 representing perfect discrimination and 0.5 representing random guessing.
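The threshold-traversal procedure above can be sketched in a few lines of NumPy. The toy scores and labels are fabricated solely to exercise the function (perfectly separated scores should give AUROC = 1.0); they are not results from the dataset.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC by threshold traversal: sweep tau over the unique scores (descending),
    record (FPR, TPR) at each step, and integrate with the trapezoidal rule."""
    thresholds = np.sort(np.unique(scores))[::-1]
    pos, neg = labels.sum(), (1 - labels).sum()
    tprs, fprs = [0.0], [0.0]                  # tau above every score: nothing flagged
    for tau in thresholds:
        pred = scores >= tau                   # flag segments at or above the threshold
        tprs.append(np.sum(pred & (labels == 1)) / pos)
        fprs.append(np.sum(pred & (labels == 0)) / neg)
    tprs, fprs = np.array(tprs), np.array(fprs)
    return float(np.sum((fprs[1:] - fprs[:-1]) * (tprs[1:] + tprs[:-1]) / 2.0))

# Toy segments: faulty segments (label 1) should receive higher reconstruction error
scores = np.array([0.10, 0.20, 0.15, 0.90, 0.80])   # anomaly score per charging segment
labels = np.array([0, 0, 0, 1, 1])                  # ground truth, used only for evaluation
```

Because every unique score is visited as a threshold, the resulting curve is the exact empirical ROC rather than a sampled approximation.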

Experiments and Results

Dataset and Experimental Setup

The framework is validated on a real-world dataset collected from 100 battery electric vehicles via their Battery Management Systems (BMS). The data comprises multi-variate time-series records from various charging sessions. Key statistics of the dataset are summarized below.

| Description | Value |
| --- | --- |
| Total Vehicles | 100 |
| Normal Vehicles | 91 |
| Faulty Vehicles | 9 |
| Data Shape per Session | 128 time steps × 7 features |
| Core Features | SOC (X1), Current (X2), Min/Max Temp (Y1, Y3), Min/Max Cell Voltage (Y4, Y2), Pack Voltage (Y5), Mileage (Y6) |

Data from 80% of the normal vehicles is used for unsupervised training. The test set consists of the remaining 20% of normal data and all available faulty data. All experiments employ 5-fold cross-validation for robust evaluation. Model parameters are detailed in the following table.
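The split described above can be sketched as follows; the random seed and vehicle-ID layout are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)              # seed is illustrative
normal_ids = np.arange(91)                   # 91 normal vehicles
faulty_ids = np.arange(91, 100)              # 9 faulty vehicles

# 80% of the normal vehicles go to unsupervised training
perm = rng.permutation(normal_ids)
n_train = int(0.8 * len(normal_ids))         # 72 vehicles
train_ids = perm[:n_train]

# Test set: remaining 20% of normal vehicles plus all faulty vehicles
test_ids = np.concatenate([perm[n_train:], faulty_ids])
test_labels = np.concatenate([np.zeros(len(normal_ids) - n_train),
                              np.ones(len(faulty_ids))])
```

Keeping all faulty vehicles out of training is what makes the setup unsupervised: fault labels appear only when scoring the test set.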

| Hyperparameter | Value | Hyperparameter | Value |
| --- | --- | --- | --- |
| Optimizer | AdamW | VAE Latent Dim | 16 |
| Learning Rate | 0.01 | GRU Hidden Size | 128 |
| Training Epochs | 20 | Attention Heads | 5 |
| Batch Size | 128 | Dropout Rate | 0.1 |
| Loss Weight $w_1$ | 10 | Loss Weight $w_2$ | 0.001 |
| Loss Weight $w_3$ | 0.1 (Annealed) | | |

Analysis of the Prediction Model

1. Loss Function and Training Dynamics

An ablation study on the primary reconstruction loss $\mathcal{L}_{SM}$ confirms the superiority of the Smooth L1 loss for this task on battery electric vehicle data, leading to lower overall training loss compared to MSE and MAE.

| Model | MSE Loss | MAE Loss | Smooth L1 Loss |
| --- | --- | --- | --- |
| Vanilla AE | 0.13447 | 0.27487 | 0.06663 |
| DyAD (Baseline) | 0.07550 | 0.42673 | 0.03420 |
| DTAD (Ours) | 0.09213 | 0.47081 | 0.03344 |
| DTMAD (Ours) | 0.06996 | 0.49919 | 0.02925 |

Consequently, Smooth L1 is adopted. The training progression of the composite loss $\mathcal{L}_{TOTAL}$ and its components for different model variants is analyzed. The proposed DTMAD model achieves the lowest final total loss, indicating its enhanced capacity to model the normal operational data of the battery electric vehicle.

| Model | $\mathcal{L}_{SM}$ | $\mathcal{L}_{KL}$ | $\mathcal{L}_{MILE}$ | $\mathcal{L}_{TOTAL}$ |
| --- | --- | --- | --- | --- |
| DyAD | 0.00531 | 0.16899 | 9.38419 | 0.07945 |
| DTAD | 0.00556 | 0.15506 | 9.35867 | 0.08048 |
| DMAD | 0.00546 | 0.14605 | 9.35563 | 0.07861 |
| DTMAD | 0.00492 | 0.15487 | 9.32203 | 0.07402 |

2. Reconstruction Performance and Trend Prediction

Qualitative analysis of the reconstructed time-series for key features like maximum cell voltage (Y2) and pack voltage (Y5) shows that the DTMAD model produces predictions that closely follow the true signal trends with minimal deviation, outperforming other variants. Quantitatively, the distribution of reconstruction errors across a test sample set reveals that DTMAD achieves consistently lower error values compared to the baseline and other ablations, validating its superior feature representation capability for the complex data from a battery electric vehicle.

3. Computational Efficiency

While introducing additional modules, the overall training time of DTMAD remains manageable. The Pre-response Encoder operates in parallel with the Joint Feature Encoder, mitigating significant overhead. The average total training time for DTMAD is approximately 780 seconds, which is a modest increase compared to the baseline DyAD (~704 seconds) and is justified by the substantial gain in performance, making it feasible for offline model development for battery electric vehicle diagnostics.

| Model | Average Training Time (seconds) |
| --- | --- |
| DyAD (M5) | 703.9 |
| DTAD (M6) | 716.9 |
| DMAD (M7) | 770.9 |
| DTMAD (M8) | 780.0 |

Performance of the Evaluation Model

1. Feature Combination Analysis

The impact of different combinations of response features (Y1-Y5) on the AUROC score was investigated. The results indicate that a specific subset of features, particularly those related to voltage extremes and temperatures, provides the most discriminative power for fault detection in this battery electric vehicle dataset. For instance, the combination {Y2: Max Cell Voltage, Y4: Min Cell Voltage} yielded a high AUROC of 0.924, suggesting these features are critical indicators of anomalous battery behavior.

2. Comparative Fault Detection Performance

The proposed DTMAD framework is compared against several state-of-the-art and baseline models for unsupervised anomaly detection on the battery electric vehicle dataset. The results, measured by the AUROC metric (mean ± standard deviation over 5 folds), are presented below.

| Model | Description | AUROC Score |
| --- | --- | --- |
| AE | Standard Autoencoder | 0.6686 |
| DeepSVDD | Deep Support Vector Data Description | 0.6841 |
| GDN | Graph Deviation Network | 0.8018 |
| GP | Gaussian Process (Baseline) | 0.7063 |
| DyAD (M5) | Dynamic Autoencoder (Our Baseline) | 0.8691 ± 0.029 |
| DTAD (M6) | DyAD + Pre-response Encoder | 0.8873 ± 0.031 |
| DMAD (M7) | DyAD + Memory Parsing Module | 0.8859 ± 0.036 |
| DTMAD (M8) | Full Proposed Framework | 0.9008 ± 0.026 |

The proposed DTMAD framework achieves the highest AUROC score of 0.901, a clear improvement over all baseline methods. Notably, it also exhibits a lower standard deviation (0.026) than its ablations (DTAD: 0.031, DMAD: 0.036), indicating more stable and reliable performance across data folds. This underscores how the combined architectural innovations (joint feature encoding, attention-based pre-response encoding, and memory parsing fusion) together build a robust fault detector for battery electric vehicles. The ROC curves visually confirm that DTMAD maintains a higher True Positive Rate across most of the False Positive Rate range, offering a better trade-off for practical deployment where early and accurate fault detection is critical.

Conclusion

This work presented a novel prediction-evaluation framework, DTMAD, for unsupervised fault detection in lithium-ion batteries of battery electric vehicles. The framework successfully addresses key challenges: modeling multi-source heterogeneous time-series data, operating without fault labels, and capturing complex variable interdependencies. The dual-path architecture, featuring a GRU-VAE based Joint Feature Encoder and a Transformer-based Pre-response Encoder, effectively learns a rich representation of normal battery operation. The Memory Parsing Module fuses these paths to produce accurate reconstructions, whose errors serve as sensitive anomaly indicators.

Comprehensive experiments on real-world battery electric vehicle data demonstrate that DTMAD significantly outperforms established benchmarks like Autoencoders, DeepSVDD, GDN, and its own baseline DyAD. It achieves a superior AUROC of 0.901 with enhanced stability, proving its efficacy in distinguishing faulty states from normal operation. The framework provides a potent tool for enhancing the safety and reliability of battery electric vehicles through proactive fault detection. Future work will focus on further optimizing the model’s computational footprint for real-time, on-vehicle deployment and validating its generalizability across diverse battery types, vehicle models, and driving conditions.
