Distributed Real-Time Scheduling for Battery EV Cars: A Polytope Aggregation and Consensus ADP Approach

The rapid proliferation of battery EV cars presents both a significant challenge and a substantial opportunity for modern power systems. While their uncoordinated charging can stress grid infrastructure, their inherent energy storage flexibility offers a valuable resource for enhancing system economic efficiency and reliability. However, realizing this potential through real-time scheduling is hindered by three primary challenges: the computational intractability of coordinating thousands of individual battery EV cars, privacy and reliability concerns inherent to centralized control architectures, and the pervasive stochasticity from renewable generation and loads. This work addresses these challenges by proposing a fully distributed, real-time charging coordination strategy for battery EV cars based on Approximate Dynamic Programming (ADP).

The core methodology begins with an accurate and computationally efficient representation of battery EV car fleets. Each individual battery EV car is modeled by its charging constraints over its connection horizon. These constraints define a feasible polytope in the multi-dimensional space of its possible charging power over time. To manage scalability, clusters of battery EV cars with similar connection times are formed. For each cluster, a base polytope is constructed. Then, an optimization-based polytope approximation method is employed: each individual battery EV car’s polytope is inner-approximated by a scaled and translated version of the cluster’s base polytope. The aggregation of the entire cluster is then simply the Minkowski sum of these approximated polytopes, which, due to the common base, reduces to the sum of their scaling and translation vectors. This polytope-based aggregation for battery EV cars provides a compact yet accurate representation of the cluster’s aggregate flexibility, significantly reducing model complexity while preserving a high degree of available control capability. The reverse process, disaggregation, is straightforward, allowing the aggregate power signal to be fairly distributed back to individual battery EV cars.
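To make the aggregation concrete, here is a minimal numpy sketch of the Minkowski-sum and disaggregation steps, assuming the optimization-based fitting has already produced, for each vehicle, a scaling vector and a translation vector of the cluster's base polytope; the array names (`beta`, `tau`), sizes, and random values are purely illustrative, not the paper's notation.

```python
# A minimal numpy sketch (illustrative names and sizes, not the paper's notation).
# Assumes the optimization-based fitting has already produced, for EV i, a scaling
# vector beta[i] and a translation vector tau[i] of the cluster's base polytope
# {p : A_base @ p <= b_base} such that {tau[i] + beta[i] * p} lies inside EV i's
# own feasible polytope.
import numpy as np

T, n_ev = 24, 1000                              # periods in the horizon, EVs in the cluster
rng = np.random.default_rng(0)
beta = rng.uniform(0.5, 1.0, size=(n_ev, T))    # per-EV scaling of the base polytope
tau = rng.uniform(0.0, 0.2, size=(n_ev, T))     # per-EV translation of the base polytope

# Because every EV shares the same base polytope, the Minkowski sum of the
# scaled/translated copies reduces to summing the scaling and translation vectors:
# cluster flexibility = {tau_agg + beta_agg * p : A_base @ p <= b_base}.
beta_agg = beta.sum(axis=0)
tau_agg = tau.sum(axis=0)

def disaggregate(p_cluster, beta, tau, beta_agg, tau_agg):
    """Recover individual trajectories: map the aggregate trajectory back to the
    shared base-polytope point p, then apply each EV's own scaling and translation."""
    p_base = (p_cluster - tau_agg) / beta_agg   # common base-polytope point, shape (T,)
    return tau + beta * p_base                  # shape (n_ev, T)

p_cluster = tau_agg + beta_agg * 0.5            # e.g. run the base polytope at half power
p_individual = disaggregate(p_cluster, beta, tau, beta_agg, tau_agg)
assert np.allclose(p_individual.sum(axis=0), p_cluster)   # individual setpoints re-sum to the aggregate
```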

The system-wide real-time scheduling problem aims to minimize total operational cost, including thermal generation cost, penalties for wind and solar curtailment, and the cost or revenue from transactions with the main grid, subject to power balance and device limits. The decision variables are the charging powers of the aggregated battery EV car clusters. To frame this multi-period problem under uncertainty as a sequential decision-making process, it is recast as a Markov Decision Process (MDP). Let the state vector at time $t$ be $S_t$, which includes the state-of-charge (SOC) of each battery EV car cluster, and let the decision vector $a_t$ collect the clusters' charging powers. The core optimization is governed by the Bellman optimality equation:

$$Q_t(S_t) = \min_{a_t} \left\{ C_t(S_t, a_t) + \mathbb{E}[Q_{t+\Delta t}(S_{t+\Delta t}^a)] \right\}$$

where $C_t(S_t, a_t)$ is the immediate cost at time $t$, and $Q_{t+\Delta t}(S_{t+\Delta t}^a)$ represents the cost-to-go, encapsulating the impact of current decisions on the future. Directly solving this is computationally prohibitive due to the “curse of dimensionality.”

Our solution leverages Approximate Dynamic Programming (ADP). The key idea is to approximate the future cost-to-go function $Q_{t+\Delta t}(S_{t+\Delta t}^a)$ using a piecewise linear, convex function of the post-decision SOC of the battery EV car clusters. The approximate Bellman equation becomes:

$$\hat{Q}_t(S_t) \approx \min_{a_t} \left\{ C_t(S_t, a_t) + \sum_{k \in \mathcal{K}} \sum_{d=1}^{D} v_{k,t,d}^a SOC_{k,t,d}^a \right\}$$

Here, $v_{k,t,d}^a$ is the slope of the value function for battery EV car cluster $k$ in time period $t$ and segment $d$ of the piecewise linear function, and $SOC_{k,t,d}^a$ is the corresponding post-decision SOC. The slopes $v$ are learned offline through iterative training using historical or forecasted data on renewable generation, load, and prices. This training embeds the statistical experience of the system’s stochastic environment into the value function, enabling near-optimal real-time decisions that account for future uncertainty.
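To illustrate how a single period of the approximate Bellman equation might be solved, the sketch below sets up a small linear program with one thermal unit and one battery EV car cluster whose value function has three segments. The cost numbers, segment width, efficiency, and the sign convention for the slopes (negative, so that energy stored now lowers the objective) are illustrative assumptions, and SciPy's `linprog` merely stands in for whichever solver an implementation would use.

```python
# A minimal single-period sketch of the approximate Bellman minimization, assuming
# one thermal unit, one EV cluster, and a 3-segment piecewise-linear value function.
# All numbers and names are illustrative assumptions, not the paper's data.
import numpy as np
from scipy.optimize import linprog

c_gen = 25.0            # linearized thermal cost [$ / MWh]
load = 80.0             # fixed net load in this period [MW]
eta, dt = 0.95, 1.0     # charging efficiency and period length [h]

# value-function slopes v_{k,t,d}: assumed negative (stored energy lowers future cost)
# and non-decreasing so the piecewise-linear approximation stays convex
slopes = np.array([-30.0, -20.0, -5.0])
seg_width = 20.0        # MWh per SOC segment
soc_pre = 10.0          # cluster SOC before this decision [MWh]

# decision vector x = [P_gen, p_ev, soc_seg_1, soc_seg_2, soc_seg_3]
c = np.concatenate(([c_gen, 0.0], slopes))

# power balance:      P_gen - p_ev = load
# SOC decomposition:  sum_d soc_seg_d - eta*dt*p_ev = soc_pre
A_eq = np.array([[1.0, -1.0, 0.0, 0.0, 0.0],
                 [0.0, -eta * dt, 1.0, 1.0, 1.0]])
b_eq = np.array([load, soc_pre])

bounds = [(0.0, 200.0),            # generator limits
          (0.0, 30.0),             # cluster aggregate charging limit
          (0.0, seg_width), (0.0, seg_width), (0.0, seg_width)]

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
P_gen, p_ev = res.x[0], res.x[1]
print(f"generator {P_gen:.1f} MW, cluster charging {p_ev:.1f} MW")
```

Because the slopes are non-decreasing, the LP fills the cheapest (most valuable) SOC segment first, so charging stops at the point where the marginal value of stored energy no longer outweighs the marginal generation cost.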

The critical innovation lies in solving this model in a fully distributed manner, eliminating the need for a central coordinator and protecting the privacy of all entities, including the battery EV car users. This is achieved through a two-stage integration of consensus theory with ADP.

Stage 1: Distributed Real-Time Optimization via Consensus. For a single time period $t$, given the current value function slopes, the optimization in the approximate Bellman equation is solved using a consensus-based distributed algorithm. The global problem is decomposed into local sub-problems for each agent (thermal generator, renewable plant, grid connection point, battery EV car aggregator). Each agent optimizes its own power setpoint based on a local estimate of the system’s marginal cost (the Lagrange multiplier for power balance), $\mu$, and communicates only with its neighbors in a predefined communication graph. The update rules for the marginal cost vector $\mu_t^{n_c}$ and the power mismatch vector $\delta_t^{n_c}$ at consensus iteration $n_c$ are:

$$
\begin{aligned}
\mu_t^{n_c+1} &= A \mu_t^{n_c} + \alpha_{\mu} \delta_t^{n_c} \\
\delta_t^{n_c+1} &= A \delta_t^{n_c} - (P_t^{n_c+1} - P_t^{n_c})
\end{aligned}
$$

where $A$ is a doubly stochastic mixing matrix, $\alpha_{\mu}$ is a step size, and $P_t$ is the vector of power setpoints. For devices with linear cost/utility functions (e.g., battery EV car charging approximated by its value function slope, grid exchange, renewables), a direct application can lead to oscillation as their optimal decision jumps between bounds. To ensure convergence, a power limiting factor $\kappa$ is introduced for these marginal devices, modifying their local constraint during iteration. The process iterates until $\|\mu_t^{n_c+1} - \mu_t^{n_c}\|$ and $\|\delta_t^{n_c}\|$ fall below thresholds, at which point a consensus on the optimal dispatch is reached using only local communication.
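The following sketch shows one way the two update rules above can be iterated by three dispatchable agents on a ring communication graph. The mixing matrix, step size, quadratic cost coefficients, and the clipped best-response rule are illustrative assumptions, and the power limiting factor $\kappa$ for marginal devices is omitted for brevity.

```python
# A minimal sketch of the consensus updates for the marginal-cost estimates mu
# and the power-mismatch estimates delta; all parameters are illustrative.
import numpy as np

# ring graph with a doubly stochastic mixing matrix
A = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
alpha_mu = 0.02                            # consensus step size

a_cost = np.array([0.02, 0.03, 0.025])     # quadratic cost coefficients
b_cost = np.array([20.0, 15.0, 18.0])      # linear cost coefficients
p_max = np.array([100.0, 80.0, 120.0])
demand_share = np.array([60.0, 70.0, 50.0])   # each agent only knows its local demand

def local_dispatch(mu):
    """Each agent's best response to its own marginal-cost estimate."""
    return np.clip((mu - b_cost) / (2.0 * a_cost), 0.0, p_max)

mu = b_cost.copy()                 # initial local marginal-cost estimates
P = local_dispatch(mu)
delta = demand_share - P           # initial local power mismatch

for _ in range(2000):
    mu_next = A @ mu + alpha_mu * delta            # mu^{n+1} = A mu^n + alpha * delta^n
    P_next = local_dispatch(mu_next)
    delta = A @ delta - (P_next - P)               # delta^{n+1} = A delta^n - (P^{n+1} - P^n)
    mu, P = mu_next, P_next
    if np.linalg.norm(delta) < 1e-6 and np.ptp(mu) < 1e-6:
        break

print("consensus marginal cost:", mu.round(3))
print("total generation vs demand:", P.sum().round(2), demand_share.sum())
```

Each agent touches only its own cost data and the estimates received from its two neighbors, which is what removes the need for a central coordinator.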

Stage 2: Distributed Value Function Training. The offline training of the value function slopes $v_{k,t,d}^a$ must also be distributed to maintain complete privacy. In centralized ADP, the slope update requires the global cost gradient. We derive a fully distributed update rule by recognizing that the slope of a battery EV car cluster’s value function represents the marginal impact of its SOC on the total system cost. This marginal impact can be inferred locally by the aggregator by introducing a small perturbation to its SOC and observing the resulting change in the consensus-based solution. Specifically, the update for the slope sampling estimate $\hat{v}_{k,t}^{n_a}$ in training iteration $n_a$ can be computed using only information available to the battery EV car aggregator after the consensus run:

$$
\hat{v}_{k,t}^{n_a} = \frac{\partial C_t}{\partial SOC_{k,t}^a} \approx
\begin{cases}
0, & \text{if } \Delta SOC_k \text{ causes no power change} \\
\sum_{g} (2a_g P_{g,t} + b_g) \frac{\partial P_{g,t}}{\partial P_{k,t}^{ev}} \eta_k^C, & \text{if compensated by thermal unit } g \\
c_t^{ex} \eta_k^C, & \text{if compensated by grid exchange} \\
c^{pv} \eta_k^C, & \text{if compensated by solar curtailment} \\
c^{wt} \eta_k^C, & \text{if compensated by wind curtailment}
\end{cases}
$$

The partial derivatives $\frac{\partial P_{g,t}}{\partial P_{k,t}^{ev}}$, etc., are implicitly obtained from the changes in the consensus Lagrange multipliers or power limiting factors from the training run. This allows each battery EV car aggregator to update its own value function slopes using purely local information, achieving distributed learning. The final slope is updated via $v^{n_a} = (1-\alpha)v^{n_a-1} + \alpha \hat{v}^{n_a}$, where $\alpha$ is a smoothing step size.
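For one aggregator and one period, the local slope update might look like the following sketch; the coefficients, sensitivities, and the mapping from piecewise-linear segments to compensating devices are illustrative assumptions, with the four cases of the slope formula above encoded directly.

```python
# A minimal local sketch of the slope update for one EV cluster, assuming the
# aggregator has already identified (from the consensus run) which device
# compensates a small SOC perturbation in each segment.  All coefficients and
# the `compensator` labels are illustrative assumptions, not values from the paper.
import numpy as np

eta_c = 0.95                         # cluster charging efficiency
c_exchange = 32.0                    # grid exchange price in this period [$ / MWh]
c_pv, c_wt = 45.0, 40.0              # solar / wind curtailment penalties [$ / MWh]
a_g, b_g, P_g = 0.02, 20.0, 60.0     # thermal unit cost coefficients and setpoint
dPg_dPev = 1.0                       # sensitivity of the thermal setpoint to cluster charging

def sampled_slope(compensator):
    """Sampling estimate of dC_t/dSOC, depending on which device picks up the
    perturbation (the four cases of the slope formula above)."""
    if compensator == "none":
        return 0.0
    if compensator == "thermal":
        return (2.0 * a_g * P_g + b_g) * dPg_dPev * eta_c
    if compensator == "grid":
        return c_exchange * eta_c
    if compensator == "solar":
        return c_pv * eta_c
    if compensator == "wind":
        return c_wt * eta_c
    raise ValueError(compensator)

# smoothing update  v <- (1 - alpha) v + alpha v_hat  for each PWL segment
alpha = 0.1
v_prev = np.array([20.0, 28.0, 36.0])            # slopes from the previous training iteration
compensators = ["thermal", "grid", "wind"]       # one marginal device per segment
v_hat = np.array([sampled_slope(c) for c in compensators])
v_new = (1.0 - alpha) * v_prev + alpha * v_hat
print(v_new.round(3))
```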

The overall framework operates in two phases: an offline distributed training phase, where multiple scenarios are run using the consensus-ADP algorithm to train the value functions for all battery EV car clusters, and an online distributed real-time scheduling phase, where the trained value functions are used within the consensus optimizer to determine charging setpoints for battery EV cars as new information arrives.

The efficacy of the proposed polytope aggregation, consensus optimization, and distributed ADP is validated through case studies on modified IEEE 6-bus and 118-bus systems. The polytope method for aggregating battery EV cars is compared against standard inner-approximation methods such as box and zonotope approximations; as the results below show, it captures fleet flexibility considerably more accurately.

| Aggregation Method | Total Operating Cost (USD) | Relative Error vs. No Aggregation |
| --- | --- | --- |
| No Aggregation (Benchmark) | 20,669 | 0.00% |
| Box Approximation | 23,273 | +12.60% |
| Zonotope Approximation | 23,001 | +11.28% |
| Proposed Polytope Method | 21,248 | +2.80% |

The consensus algorithm successfully converges to the optimal solution while protecting privacy. The distributed ADP algorithm’s performance is tested against a centralized global optimum solver and a distributed myopic (greedy) strategy across 100 random scenarios. The results for the larger IEEE 118-bus system with 100,000 battery EV cars are summarized below:

| Algorithm | Avg. Cost Error vs. Centralized Optimum | Max Cost Error | Key Feature |
| --- | --- | --- | --- |
| Distributed Myopic | 6.02% | 6.60% | No future consideration |
| Proposed Distributed ADP | 0.29% | 0.46% | Near-optimal, privacy-preserving |

A further test using 30 days of real-world PJM data confirmed the practical effectiveness of the strategy for scheduling battery EV cars: the distributed ADP maintained an average daily cost error below 1%, significantly outperforming the myopic approach and yielding substantial cost savings over the month.

In conclusion, this work presents a comprehensive, privacy-preserving solution for the real-time coordination of large-scale battery EV car charging. By integrating polytope-based aggregation for computational efficiency, consensus theory for distributed optimization, and a novel distributed ADP mechanism for learning under uncertainty, the strategy enables power system operators and battery EV car aggregators to collaboratively achieve near-optimal economic performance. This approach effectively harnesses the flexibility of battery EV cars to balance stochastic renewable generation while respecting the privacy and autonomy of all distributed stakeholders in the modern grid.
