EV Charging Load Forecasting Using an Enhanced Monte Carlo Simulation Framework Integrating GMM and GRA

The global transition towards sustainable transportation has catalyzed an exponential growth in the electric vehicle (EV) market. The rapid and large-scale integration of millions of battery EV cars into existing power grids presents both a significant challenge and a pivotal opportunity for modern energy systems. While electrifying transport is crucial for decarbonization, the uncoordinated charging of these battery EV cars poses substantial risks to grid stability and operational efficiency. The inherent spatial and temporal randomness of charging behaviors can lead to localized load spikes, exacerbating peak-to-valley differences, causing transformer overloads, and potentially compromising power supply reliability. Accurate forecasting of the aggregate charging load from a fleet of battery EV cars is, therefore, a fundamental prerequisite for proactive grid planning, demand-side management, and ensuring the secure and economical operation of power networks under high EV penetration scenarios.

Among various forecasting methodologies, probabilistic approaches, particularly those based on Monte Carlo simulation, have gained prominence due to their ability to model the stochastic nature of user behavior. The conventional Monte Carlo framework typically relies on assumed probability distributions for key charging characteristics, such as daily mileage and charging start time, to generate numerous simulation scenarios. However, this traditional approach is fraught with limitations that compromise its predictive accuracy. A primary shortcoming is the reliance on simplistic, often unimodal, probability distributions (e.g., single normal or lognormal distributions) to model charging parameters. Real-world data for battery EV car usage reveals that these parameters, especially charging start time and initial state of charge (SOC), frequently exhibit complex, multimodal distributions corresponding to distinct user routines (e.g., post-commute charging, mid-day top-up). A single distribution cannot capture these nuances, leading to significant fitting errors and a misrepresentation of actual load patterns.

Furthermore, traditional models often derive the initial charging SOC indirectly from daily driving distance under a “one-charge-per-day” assumption, which ignores the prevalent behavior of opportunistic charging based on remaining battery capacity rather than distance traveled. Any error in the mileage model therefore propagates into the SOC estimate and, through it, into the charging duration. Additionally, the standard Monte Carlo process typically draws a single random sample set. The load profile computed from that one draw can be heavily influenced by extreme random values, making the forecast volatile and statistically fragile. These deficiencies motivate an enhanced simulation framework that improves both the fidelity of the underlying probability models and the stability of the sampling process.

To address these gaps, this article proposes an enhancement to the Monte Carlo simulation method that integrates a Gaussian Mixture Model (GMM) for probability distribution fitting and Grey Relational Analysis (GRA) for sample set selection. The first change replaces simplistic unimodal distributions with a GMM whose parameters, including the number of Gaussian components, are determined using the Bayesian Information Criterion (BIC). This allows the model to adapt to the multimodal characteristics observed in real charging data for different types of battery EV cars. The second change replaces the single random sample with multiple candidate sample sets. GRA is then used to evaluate the relational degree between each candidate set and the original empirical data distribution, and the most representative set is selected for final load aggregation. Together, refined modeling and sample screening improve the accuracy and reliability of the predicted charging load profile.

1. Enhanced Probability Modeling with Gaussian Mixture Model (GMM)

1.1 Fundamentals of GMM

The charging behavior of a battery EV car, characterized by parameters like start time and initial SOC, is inherently stochastic. The GMM is a powerful probabilistic model that represents complex data distributions as a weighted sum of multiple Gaussian (normal) distributions. This flexibility makes it ideal for capturing the multimodal nature of real-world EV charging data, where peaks correspond to common user habits (e.g., evening home charging, daytime public charging). For a random variable $x$ representing a charging parameter, its probability density function (PDF) under a GMM with $M$ components is given by:

$$ p(x) = \sum_{m=1}^{M} \pi_m \mathcal{N}(x | \mu_m, \Sigma_m) $$

where:

$M$: The number of Gaussian components in the mixture.

$\pi_m$: The mixing coefficient or weight for the $m$-th component, satisfying $\sum_{m=1}^{M} \pi_m = 1$ and $\pi_m \geq 0$.

$\mathcal{N}(x | \mu_m, \Sigma_m)$: The PDF of the $m$-th Gaussian component with mean $\mu_m$ and covariance $\Sigma_m$.

This formulation allows the model to approximate a wide variety of distribution shapes beyond the capability of a single Gaussian, providing a much more accurate representation of the charging characteristics for a heterogeneous fleet of battery EV cars.
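The mixture density above can be evaluated directly for a scalar parameter such as charging start time. The following is a minimal sketch, assuming the one-dimensional case in which each covariance $\Sigma_m$ reduces to a variance $\sigma_m^2$. The weights, means, and standard deviations are hypothetical values chosen only to produce a small midday mode and a larger evening mode; they are not fitted to any data in this article.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a univariate normal N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def gmm_pdf(x, weights, means, sigmas):
    """p(x) = sum_m pi_m * N(x | mu_m, sigma_m^2) for scalar x."""
    if abs(sum(weights) - 1.0) > 1e-9 or min(weights) < 0.0:
        raise ValueError("mixing weights must be non-negative and sum to 1")
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))

# Hypothetical start-time mixture (hours of the day), for illustration only.
WEIGHTS = [0.3, 0.7]
MEANS = [12.5, 19.0]
SIGMAS = [1.0, 1.5]

for hour in (8.0, 12.5, 16.0, 19.0, 23.0):
    print(f"p({hour:4.1f}) = {gmm_pdf(hour, WEIGHTS, MEANS, SIGMAS):.4f}")
```

With these illustrative parameters the density is higher at the two mode centers than in the gap between them, which is the shape a single Gaussian cannot reproduce.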

1.2 Determining the Optimal Number of Components via BIC

A critical step in constructing a GMM is selecting the appropriate number of components $M$. Choosing too few components leads to underfitting, failing to capture essential modes in the data. Choosing too many leads to overfitting, where the model captures noise rather than the underlying trend. Information criteria provide a principled way to balance model fit and complexity. The two most common criteria are the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC).

The AIC is defined as:
$$ AIC = -2 \ln(L) + 2k $$
where $L$ is the maximized value of the likelihood function for the model, and $k$ is the number of estimated parameters.

The BIC replaces the constant per-parameter penalty of 2 with $\ln(n)$, which exceeds 2 once $n \geq 8$ and grows with the sample size. It is defined as:
$$ BIC = -2 \ln(L) + k \ln(n) $$
where $n$ is the sample size.

While both can be used, the BIC’s heavier penalty term makes it more conservative and generally more effective at preventing overfitting in scenarios like EV load modeling, where the true underlying distribution often has a limited number of meaningful modes (e.g., 2-3 peaks for daily charging). The model with the lowest BIC value is preferred. As demonstrated in synthetic tests, when data is generated from a 3-component GMM, BIC correctly identifies the true number of components, whereas AIC may suggest a higher, unnecessary number. Therefore, this study employs BIC to determine the optimal $M$ for modeling the charging parameters of each type of battery EV car.
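The selection rule described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the procedure used in the study: it fits a one-dimensional GMM with a plain EM loop, counts $k = 3M - 1$ free parameters ($M$ means, $M$ variances, $M - 1$ independent weights), and evaluates AIC and BIC for $M = 1, \ldots, 4$. The data are synthetic draws from two hypothetical start-time modes, and the function names are placeholders.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x, where var is the variance."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_density(x, weights, means, variances):
    return sum(w * normal_pdf(x, mu, v) for w, mu, v in zip(weights, means, variances))

def fit_gmm_1d(data, m, iters=100):
    """Plain EM for an m-component 1-D GMM; returns the final log-likelihood."""
    n = len(data)
    ordered = sorted(data)
    spread = (ordered[-1] - ordered[0]) or 1.0
    # Deterministic start: means at evenly spaced quantiles, equal weights.
    means = [ordered[int((j + 0.5) * n / m)] for j in range(m)]
    variances = [(spread / (2.0 * m)) ** 2] * m
    weights = [1.0 / m] * m
    var_floor = 1e-6 * spread ** 2  # stops a component collapsing onto one point
    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, mu, v) for w, mu, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(m):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, var_floor)
    return sum(math.log(mixture_density(x, weights, means, variances) or 1e-300)
               for x in data)

def information_criteria(data, m):
    """Return (AIC, BIC) for an m-component fit."""
    k = 3 * m - 1  # m means + m variances + (m - 1) free mixing weights
    log_l = fit_gmm_1d(data, m)
    return -2.0 * log_l + 2.0 * k, -2.0 * log_l + k * math.log(len(data))

# Synthetic bimodal start times (hours); parameters are hypothetical.
rng = random.Random(42)
data = ([rng.gauss(13.0, 1.0) for _ in range(200)]
        + [rng.gauss(21.0, 1.2) for _ in range(200)])

scores = {m: information_criteria(data, m) for m in range(1, 5)}
best_m = min(scores, key=lambda m: scores[m][1])  # lowest BIC wins
for m, (aic, bic) in scores.items():
    print(f"M={m}: AIC={aic:.1f}  BIC={bic:.1f}")
print("BIC-selected M:", best_m)
```

EM only finds a local optimum, so a production fit would normally use several restarts; the single deterministic start here keeps the sketch short.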

2. Charging Load Characteristics of Battery EV Cars

The charging load profile is primarily governed by two stochastic user-behavior variables and one technical vehicle parameter: the charging start time ($T_c$), the initial state of charge ($SOC_{start}$), and the battery capacity ($E$). The probability distributions of these parameters vary significantly across different types of battery EV cars, such as private cars, taxis, and buses.

2.1 Probability Distribution of Charging Start Time ($T_c$)

The time at which a user initiates charging is influenced by daily activity patterns. Private battery EV car owners predominantly charge after returning home from work, leading to a major peak in the late afternoon/evening (e.g., 18:00-20:00). A secondary, smaller peak may occur around midday due to opportunistic top-up charging. In contrast, electric taxis, requiring frequent recharging to maintain service, often exhibit two distinct peaks aligning with driver shift changes (e.g., early afternoon and late evening). Electric buses follow a more regimented schedule, with charging typically occurring during midday layovers and after the final evening route.

A unimodal normal distribution cannot capture these multi-peak patterns. The GMM, with the number of components chosen by BIC, fits them far better. For instance, the GMM can model the dual peaks for taxis and the primary/secondary peaks for private cars, whereas a single normal distribution would average these into one misplaced peak, distorting the temporal load profile.
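The "misplaced peak" effect can be made concrete with a small calculation. The sketch below takes a hypothetical two-peak taxi-like mixture (equal weights, modes at 13:00 and 22:00, one-hour standard deviations; these numbers are assumptions for illustration) and computes the single normal distribution that has the same mean and variance. The single normal places its maximum between the two modes, at an hour where the mixture has almost no density.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def gmm_pdf(x, weights, means, sigmas):
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))

def moment_matched_normal(weights, means, sigmas):
    """Mean and standard deviation of the normal sharing the mixture's first two moments."""
    mean = sum(w * m for w, m in zip(weights, means))
    second = sum(w * (s * s + m * m) for w, m, s in zip(weights, means, sigmas))
    return mean, math.sqrt(second - mean * mean)

# Hypothetical taxi-like start-time mixture (hours).
W, MU, SD = [0.5, 0.5], [13.0, 22.0], [1.0, 1.0]

mean, std = moment_matched_normal(W, MU, SD)
print(f"single-normal peak at {mean:.2f} h, std {std:.2f} h")
print(f"mixture density at that hour:       {gmm_pdf(mean, W, MU, SD):.6f}")
print(f"single-normal density at that hour: {normal_pdf(mean, mean, std):.6f}")
```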

2.2 Probability Distribution of Initial State of Charge ($SOC_{start}$)

A key improvement in the proposed framework is the direct modeling of $SOC_{start}$ as a fundamental stochastic variable, rather than its indirect derivation from daily mileage. Users typically plug in their battery EV car because of range anxiety or convenience, not strictly because of the distance driven. The distribution of $SOC_{start}$ also shows non-standard characteristics: while often unimodal, it is frequently skewed. For private cars, the peak is often in the mid-to-low range (e.g., 0.3-0.5), reflecting a tendency to charge before deep depletion. Taxis, with higher daily energy consumption, start charging at a lower SOC on average. Buses, needing reliable operation, may charge at a higher residual SOC. Because a mixture of Gaussians need not be symmetric, the GMM can model these skewed distributions more accurately than a single normal distribution.

The quality of the GMM fit for both $T_c$ and $SOC_{start}$ can be quantified using the correlation coefficient between the fitted probability density curve and the histogram of the original data. As the table below shows, the coefficient exceeds 0.99 for initial SOC in all three vehicle types and exceeds 0.96 for charging start time in private cars and buses; the taxi start-time fit is the weakest, at 0.8653.

EV Type	Correlation Coefficient (Charging Start Time)	Correlation Coefficient (Initial SOC)
Private Car	0.9684	0.9995
Taxi	0.8653	0.9980
Bus	0.9638	0.9986
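A fit-quality score of this kind can be computed as sketched below. The article does not state which correlation coefficient is used, so this example assumes the Pearson coefficient between the histogram density and the fitted density evaluated at the bin centers. The SOC samples are synthetic, and the mixture parameters are hypothetical stand-ins for a fitted model.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def gmm_pdf(x, weights, means, sigmas):
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))

def histogram_density(data, lo, hi, bins):
    """Return bin centers and a density-normalized histogram over [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        if lo <= x <= hi:
            counts[min(int((x - lo) / width), bins - 1)] += 1
    centers = [lo + (b + 0.5) * width for b in range(bins)]
    return centers, [c / (len(data) * width) for c in counts]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Hypothetical skewed SOC mixture standing in for a fitted GMM.
W, MU, SD = [0.7, 0.3], [0.35, 0.55], [0.08, 0.12]

# Synthetic "observed" SOC values drawn from that mixture, kept within [0, 1].
rng = random.Random(1)
soc = []
while len(soc) < 5000:
    comp = 0 if rng.random() < W[0] else 1
    value = rng.gauss(MU[comp], SD[comp])
    if 0.0 <= value <= 1.0:
        soc.append(value)

centers, observed = histogram_density(soc, 0.0, 1.0, 25)
fitted = [gmm_pdf(c, W, MU, SD) for c in centers]
r = pearson(observed, fitted)
print(f"correlation between histogram and fitted density: {r:.4f}")
```

The value depends on the bin count as well as on the fit, so coefficients are only comparable when the histograms use the same binning.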

2.3 Modeling Heterogeneous Battery Capacity

Treating all battery EV cars as having identical battery capacity introduces significant error. In reality, capacity varies by model and vehicle class. This heterogeneity is effectively modeled using a Gamma distribution, which is well-suited for representing positive, continuous quantities like battery capacity (in kWh). The probability density function of the Gamma distribution is:

$$ f(E; \alpha, \beta) = \frac{1}{\beta^{\alpha} \Gamma(\alpha)} E^{\alpha-1} e^{-E/\beta} $$

where $E$ is the battery capacity, $\alpha > 0$ is the shape parameter, $\beta > 0$ is the scale parameter, and $\Gamma(\cdot)$ is the Gamma function. Different parameters $(\alpha, \beta)$ are used for different vehicle types (e.g., small cars vs. buses). Therefore, for a fleet of $N$ battery EV cars, the battery capacity for each type is a random variable:
$$ E_{pri} \sim \text{Gamma}(\alpha_1, \beta_1), \quad E_{taxi} \sim \text{Gamma}(\alpha_2, \beta_2), \quad E_{bus} \sim \text{Gamma}(\alpha_3, \beta_3) $$
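Sampling from this distribution is straightforward with Python's standard library, whose `random.gammavariate(alpha, beta)` uses the same shape/scale parameterization as the density above. The sketch below is illustrative: the $(\alpha, \beta)$ pairs are the ones quoted in the case study in Section 4, while the sample size and seed are arbitrary choices. The sample mean is compared with the theoretical mean $\alpha\beta$.

```python
import random

def sample_capacities(alpha, beta, count, rng):
    """Draw `count` battery capacities (kWh) from Gamma(shape=alpha, scale=beta)."""
    return [rng.gammavariate(alpha, beta) for _ in range(count)]

rng = random.Random(2024)  # arbitrary seed for reproducibility

# (shape, scale) pairs as quoted in the case study.
car_capacities = sample_capacities(10.8, 3.8, 10000, rng)
bus_capacities = sample_capacities(35.8, 5.8, 10000, rng)

car_mean = sum(car_capacities) / len(car_capacities)
bus_mean = sum(bus_capacities) / len(bus_capacities)
print(f"car/taxi: sample mean {car_mean:.1f} kWh, theoretical {10.8 * 3.8:.2f} kWh")
print(f"bus:      sample mean {bus_mean:.1f} kWh, theoretical {35.8 * 5.8:.2f} kWh")
```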

2.4 Charging Load Calculation

The charging load for an individual battery EV car is calculated based on the sampled parameters. The required energy $\Delta Q$ (in kWh) is:
$$ \Delta Q = (SOC_{end} - SOC_{start}) \cdot E $$

where $SOC_{end}$ is typically 1 (or 0.95, accounting for charge tapering). The charging duration $\Delta T$ (in hours) is then:
$$ \Delta T = \frac{\Delta Q}{P_c \cdot \eta_c} = \frac{(SOC_{end} - SOC_{start}) \cdot E}{P_c \cdot \eta_c} $$

where $P_c$ is the charging power (e.g., 7 kW for slow, 20+ kW for fast charging) and $\eta_c$ is the charging efficiency (~0.9).

The single-vehicle load $P_{j,t}$ for a vehicle of type $j$ at time $t$ is a rectangular pulse of height $P_c$ starting at $T_c$ and lasting for $\Delta T$. The total grid load $P_{total,t}$ at time $t$ is the sum of the individual loads of all $N$ vehicles in the simulated fleet, where $j_p$ denotes the type of the $p$-th vehicle:
$$ P_{total,t} = \sum_{p=1}^{N} P_{j_p, t}, \quad j_p \in \{\text{private, taxi, bus}\} $$
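The energy, duration, and aggregation formulas of this subsection can be sketched as follows. The discretization is an assumption made for illustration: the day is split into 96 steps of 15 minutes, each step stores the mean power drawn in that interval, and a pulse that runs past midnight wraps to the start of the day. The three example vehicles are hypothetical.

```python
def charging_duration_h(soc_start, capacity_kwh, power_kw, soc_end=1.0, efficiency=0.9):
    """Delta T = (SOC_end - SOC_start) * E / (P_c * eta_c), in hours."""
    energy_kwh = max(soc_end - soc_start, 0.0) * capacity_kwh
    return energy_kwh / (power_kw * efficiency)

def add_pulse(profile, start_h, duration_h, power_kw, step_h=0.25):
    """Add one rectangular charging pulse to `profile` (mean kW per time step)."""
    horizon = len(profile) * step_h
    start = start_h % horizon
    end = start + min(duration_h, horizon)
    for b in range(len(profile)):
        lo, hi = b * step_h, (b + 1) * step_h
        # Overlap with the pulse itself, plus the part that wraps past midnight.
        overlap = max(0.0, min(hi, end) - max(lo, start))
        overlap += max(0.0, min(hi, end - horizon) - lo)
        profile[b] += power_kw * overlap / step_h

# Hypothetical vehicles: (start hour, SOC_start, capacity in kWh, charger power in kW).
fleet = [
    (18.5, 0.30, 40.0, 7.0),    # private car, evening slow charge
    (22.0, 0.30, 40.0, 7.0),    # private car, late charge that wraps past midnight
    (12.0, 0.50, 200.0, 20.0),  # bus, midday fast charge
]

profile = [0.0] * 96  # 24 h at 15-minute resolution
for start_h, soc_start, capacity, power in fleet:
    add_pulse(profile, start_h, charging_duration_h(soc_start, capacity, power), power)

peak_kw = max(profile)
peak_hour = profile.index(peak_kw) * 0.25
print(f"peak load {peak_kw:.1f} kW in the step starting at {peak_hour:.2f} h")
```

For the first vehicle, $\Delta Q = 0.7 \times 40 = 28$ kWh and $\Delta T = 28 / (7 \times 0.9) \approx 4.44$ h, so the energy drawn from the grid ($P_c \cdot \Delta T \approx 31.1$ kWh) exceeds the energy stored because of the charging losses.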

3. The Enhanced Monte Carlo Framework with GRA Screening

The proposed forecasting framework integrates the advanced GMM modeling with a post-sampling screening mechanism using Grey Relational Analysis (GRA). GRA is a method to measure the geometric similarity between sequences, used here to evaluate how well a randomly generated sample set reflects the original population distribution.

3.1 Grey Relational Analysis (GRA) Model

Let the reference sequence be the original empirical probability distribution data, denoted as $Y = [Y(1), Y(2), \ldots, Y(n)]$. Let a candidate sequence generated by Monte Carlo sampling be $X_i = [X_i(1), X_i(2), \ldots, X_i(n)]$, where $i$ is the sample set index. The steps are:

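1. Normalization: Transform both sequences to a [0,1] scale for comparison. Each candidate sequence is min-max scaled over its own elements, and the reference sequence $Y$ is scaled in the same way to give $Y'$.
$$ X'_i(k) = \frac{X_i(k) - \min_k X_i(k)}{\max_k X_i(k) - \min_k X_i(k)}, \quad k=1,2,\ldots,n $$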

2. Calculation of Grey Relational Coefficient: This coefficient for each element $k$ measures the local similarity.
$$ \gamma_i(k) = \frac{\min_i \min_k \Delta_i(k) + \rho \max_i \max_k \Delta_i(k)}{\Delta_i(k) + \rho \max_i \max_k \Delta_i(k)} $$

where $\Delta_i(k) = |X'_i(k) - Y'(k)|$ is the absolute difference, and $\rho$ is the distinguishing coefficient, usually set to 0.5.

3. Calculation of Grey Relational Grade (GRG): The overall relational degree is the average of the coefficients.
$$ r_i = \frac{1}{n} \sum_{k=1}^{n} \gamma_i(k) $$

The GRG $r_i$ lies in $(0, 1]$, since every coefficient $\gamma_i(k)$ is positive and cannot exceed 1. A value closer to 1 indicates the candidate sample set $X_i$ has a higher degree of similarity to the original data distribution $Y$. Among multiple generated candidate sets, the one with the highest $r_i$ is selected as the most representative for final load calculation, thereby mitigating the influence of outlier samples.
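The three steps above translate into a short function. This is a minimal sketch under the definitions in this subsection: min-max normalization of each sequence, global minimum and maximum differences taken over all candidates and all positions, and $\rho = 0.5$ by default. The handling of constant sequences and of the all-identical case is an assumption added to avoid division by zero. The example sequences are made up for illustration.

```python
def min_max(seq):
    """Scale a sequence to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        return [0.0] * len(seq)
    return [(v - lo) / (hi - lo) for v in seq]

def grey_relational_grades(reference, candidates, rho=0.5):
    """Return the grey relational grade r_i of each candidate against the reference."""
    ref = min_max(reference)
    deltas = [[abs(x - y) for x, y in zip(min_max(c), ref)] for c in candidates]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0.0:  # every candidate matches the reference exactly
        return [1.0] * len(candidates)
    return [
        sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
        for row in deltas
    ]

# Made-up histogram counts: one candidate far from the reference, one close to it.
reference = [1, 2, 6, 7, 3, 1]
far = [7, 6, 2, 1, 3, 5]
close = [1, 2, 5, 7, 3, 1]

grades = grey_relational_grades(reference, [far, close])
best = max(range(len(grades)), key=grades.__getitem__)
print("grades:", [round(g, 4) for g in grades], "-> selected candidate index", best)
```

Because the minimum and maximum differences are taken over all candidates, a grade is relative to the candidate pool: adding or removing a candidate can change the grades of the others.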

3.2 Integrated Forecasting Procedure

The step-by-step procedure of the enhanced Monte Carlo simulation is as follows:

Step 1: GMM Fitting. For each type of battery EV car (private, taxi, bus), fit the GMM to the historical data for $T_c$ and $SOC_{start}$ using BIC to determine $M$. This yields the PDFs $p_{T_c}(x)$ and $p_{SOC}(x)$.

Step 2: Multi-Set Sampling. For each vehicle type, use the fitted GMM PDFs and the Gamma distribution for battery capacity $E$ to generate $I$ independent sample sets. Each set contains $N_j$ tuples of $(T_c, SOC_{start}, E)$ for all simulated vehicles of type $j$.

Step 3: GRA-Based Optimal Set Selection. For each of the $I$ sample sets, calculate its GRG $r_i$ against the original distribution data for $T_c$ and for $SOC_{start}$. For each vehicle type and each parameter, select the sample set with the highest GRG.

Step 4: Load Aggregation. Using the optimal sample sets, calculate the individual load for each battery EV car from the required-energy and charging-duration equations in Section 2.4, then sum the loads across the entire fleet according to the total-load equation in the same section to obtain the 24-hour total charging load profile $P_{total,t}$.
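Steps 2 and 3 can be sketched end to end for a single parameter. The example below is illustrative only: it draws $I = 5$ candidate sets of charging start times from a hypothetical two-component GMM, converts each set to hourly shares, and keeps the set with the highest grey relational grade. Because no historical data accompany this article, a large draw from the same mixture stands in for the empirical reference distribution; the fleet size, seed, and mixture parameters are all assumptions.

```python
import random

def sample_start_times(weights, means, sigmas, count, rng):
    """Draw start times (hours) from a 1-D GMM, wrapped onto the 24-hour clock."""
    comps = rng.choices(range(len(weights)), weights=weights, k=count)
    return [rng.gauss(means[c], sigmas[c]) % 24.0 for c in comps]

def hourly_shares(times, bins=24):
    """Fraction of start times falling in each hour of the day."""
    counts = [0] * bins
    for t in times:
        counts[min(int(t * bins / 24.0), bins - 1)] += 1
    return [c / len(times) for c in counts]

def min_max(seq):
    lo, hi = min(seq), max(seq)
    return [0.0] * len(seq) if hi == lo else [(v - lo) / (hi - lo) for v in seq]

def grey_relational_grades(reference, candidates, rho=0.5):
    ref = min_max(reference)
    deltas = [[abs(x - y) for x, y in zip(min_max(c), ref)] for c in candidates]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0.0:
        return [1.0] * len(candidates)
    return [sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
            for row in deltas]

# Hypothetical fitted start-time mixture (Step 1 output), for illustration only.
WEIGHTS, MEANS, SIGMAS = [0.3, 0.7], [12.5, 19.0], [1.0, 1.5]
NUM_SETS, FLEET_SIZE = 5, 2000

rng = random.Random(7)
# Stand-in for the empirical distribution: a large draw from the same mixture.
reference = hourly_shares(sample_start_times(WEIGHTS, MEANS, SIGMAS, 50000, rng))

# Step 2: generate I independent candidate sample sets.
candidate_sets = [sample_start_times(WEIGHTS, MEANS, SIGMAS, FLEET_SIZE, rng)
                  for _ in range(NUM_SETS)]

# Step 3: score each set with GRA and keep the most representative one.
grades = grey_relational_grades(reference, [hourly_shares(s) for s in candidate_sets])
best = max(range(NUM_SETS), key=grades.__getitem__)
selected = candidate_sets[best]
print("grades:", [round(g, 4) for g in grades], "-> selected set #", best + 1)
```

The same scoring would be repeated for $SOC_{start}$ and for each vehicle type before the selected sets are passed to the load aggregation in Step 4.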

Step	Traditional Monte Carlo	Enhanced Monte Carlo (Proposed)
1. Modeling	Uses simple unimodal distributions (e.g., Normal).	Uses GMM optimized by BIC to fit multimodal distributions.
2. SOC Derivation	Often derived indirectly from daily mileage.	Modeled directly as a stochastic variable using GMM.
3. Sampling	Performs a single random sampling iteration.	Generates multiple sample sets ($I$ iterations).
4. Screening	No screening; uses the single sampled set.	Employs GRA to select the most representative sample set.
5. Output	Volatile, potentially skewed by random extremes.	Stable and robust, closely aligned with statistical trends.

4. Case Study and Simulation Results

A simulation was conducted for a regional fleet comprising 13,537 private battery EV cars, 2,200 electric taxis, and 1,151 electric buses. Battery capacities were modeled as: $E_{pri, taxi} \sim \text{Gamma}(10.8, 3.8)$ and $E_{bus} \sim \text{Gamma}(35.8, 5.8)$. Charging powers were set to 7 kW (slow) for private cars and 20 kW (fast) for taxis and buses, with $\eta_c = 0.9$.
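As a quick sanity check on these parameters, the mean and standard deviation implied by a Gamma distribution with shape $\alpha$ and scale $\beta$ are $\alpha\beta$ and $\sqrt{\alpha}\,\beta$. Reading the second argument as the scale parameter, as in the density of Section 2.3, gives:

$$ \mathbb{E}[E_{pri, taxi}] = 10.8 \times 3.8 = 41.04 \text{ kWh}, \quad \sqrt{10.8} \times 3.8 \approx 12.49 \text{ kWh} $$

$$ \mathbb{E}[E_{bus}] = 35.8 \times 5.8 = 207.64 \text{ kWh}, \quad \sqrt{35.8} \times 5.8 \approx 34.70 \text{ kWh} $$

These values correspond to a typical passenger-car pack of about 41 kWh and a bus pack of about 208 kWh.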

4.1 GRA Screening Results

Five ($I=5$) candidate sample sets were generated for each vehicle type and parameter. The GRG was calculated for each set. The set with the highest GRG for each parameter was selected for the final load calculation, as summarized below.

EV Type	Parameter	Selected Optimal Sample Set (#)	Grey Relational Grade ($r_i$)
Private Car	Start Time ($T_c$)	Set #2	0.6977
Private Car	Initial SOC ($SOC_{start}$)	Set #3	0.7199
Taxi	Start Time ($T_c$)	Set #3	0.6716
Taxi	Initial SOC ($SOC_{start}$)	Set #4	0.7475
Bus	Start Time ($T_c$)	Set #5	0.8363
Bus	Initial SOC ($SOC_{start}$)	Set #2	0.7793

4.2 Forecasted Load Profile and Analysis

The aggregate load forecast from the enhanced method reveals a realistic and nuanced profile. The total load peaks at approximately 49.31 MW between 21:00 and 22:00, driven primarily by private battery EV car owners charging after returning home. A secondary, distinct peak of about 43.82 MW occurs around 12:00-13:00, resulting from midday charging of buses and some opportunistic charging of other battery EV cars. The load composition clearly shows the behavioral patterns: private cars dominate the evening peak; taxis contribute to afternoon and evening peaks; buses create the midday peak and a smaller evening peak.

By contrast, a traditional Monte Carlo simulation using unimodal distributions produces a load profile with a single, misplaced peak around 16:00, which does not align with real-world user behavior patterns. This discrepancy arises because the single normal distribution averages the distinct midday and evening behavioral modes into one artificial central peak. The proposed method’s output is not only more accurate but also provides actionable insights for grid operators, such as identifying the true timing of stress periods (late evening) and potential opportunities for load shifting (e.g., incentivizing charging away from the 21:00 peak).

Performance Metric	Traditional Monte Carlo	Enhanced GMM-GRA Monte Carlo
Distribution Fit Accuracy	Low (fails to capture multimodality)	High (correlation coeff. up to 0.9995)
Peak Load Timing	Inaccurate (e.g., predicts peak at ~16:00)	Accurate (predicts peak at ~21:00, aligns with reality)
Profile Realism	Unrealistic single-peak profile	Realistic multi-peak profile matching behavior
Result Robustness	Low (susceptible to single-sample randomness)	High (uses optimal set screened via GRA)
Model Heterogeneity	Limited (often homogeneous battery capacity)	Comprehensive (Gamma-distributed capacity, vehicle-specific models)

5. Conclusion

Accurate forecasting of the charging load from a growing population of battery EV cars is essential for the secure and efficient integration of electric transportation. This article has presented a significant enhancement to the conventional Monte Carlo simulation framework by integrating two key methodological improvements: Gaussian Mixture Modeling (GMM) and Grey Relational Analysis (GRA).

The use of GMM, with its component number optimized by BIC, addresses the critical limitation of traditional unimodal distributions. It enables close fitting of the complex, multimodal probability distributions that characterize real-world charging start times and initial SOC levels for different types of battery EV cars. Directly modeling $SOC_{start}$ as a stochastic variable, rather than deriving it from mileage, further enhances model fidelity and removes the dependence on the inaccurate “one-charge-per-day” assumption. Introducing the Gamma distribution for battery capacity captures the heterogeneity within the vehicle fleet.

The incorporation of GRA introduces a robust screening layer to the probabilistic simulation. By generating multiple candidate sample sets and selecting the one with the highest relational degree to the original data, the method effectively filters out the influence of extreme random values that can distort a forecast based on a single sample. This process markedly improves the stability and representativeness of the final load prediction.

Simulation results confirm the superiority of the proposed framework. It generates a temporally accurate load profile with peaks aligned with known user behavior patterns (e.g., evening peak for private cars, midday peak for buses), which is a substantial improvement over the distorted profile produced by the traditional method. The enhanced Monte Carlo simulation provides grid planners and operators with a more reliable tool for anticipating the demand from battery EV cars, facilitating better infrastructure investment decisions, more effective demand response program design, and ultimately, a more resilient and sustainable power system for the era of electrified mobility.