Electric Vehicle Path Planning for Wireless Charging Based on Deep Reinforcement Learning

In recent years, the rapid adoption of electric vehicles globally, particularly in regions like China, has highlighted significant challenges in charging infrastructure. Widespread electric vehicle use faces issues such as high charging costs, low charging efficiency, and increased stress on urban power grids. As a solution, wireless charging systems embedded in roadways have emerged, allowing electric vehicles to recharge while in motion. This technology uses coils installed beneath driving lanes to transfer energy to receivers on electric vehicles, enabling continuous power replenishment without the need for stationary charging stations. Integrating such systems into urban environments requires efficient path planning algorithms to optimize energy usage and meet operational constraints.

The core problem involves scheduling routes for multiple electric vehicles to maximize their total remaining energy upon reaching their destinations, while adhering to deadline and energy constraints. This is formalized as the Scheduling for Wireless Charging of Electric Vehicles (SWCE) problem. Given the complexity of urban networks and the dynamic nature of electric vehicle travel, traditional optimization methods fall short. Thus, we propose a deep reinforcement learning-based approach to address this NP-hard problem, leveraging machine learning to derive near-optimal paths that enhance the overall efficiency of electric vehicle fleets. This work builds on the growing interest in smart transportation systems, where electric vehicles play a pivotal role in sustainable mobility.

To model the SWCE problem, we consider a transportation network represented as a complete graph $G = (N, A)$, where $N$ denotes nodes such as intersections, origins, and destinations, and $A$ represents edges connecting these nodes. Each electric vehicle $e \in V$ (where $V$ is the set of electric vehicles) has specific parameters: a start point $s_e$, destination $d_e$, deadline $t_e$, initial energy $E^0_e$, maximum battery capacity $E^{\text{max}}_e$, energy consumption rate per unit distance $\gamma_e$, and speed $r_a^e$ on edge $a$. A binary variable $x_a$ indicates whether edge $a$ is equipped with wireless charging coils ($x_a = 1$) or not ($x_a = 0$). The charging power is denoted by $\alpha$, and the transmission efficiency is $\gamma_0$. The remaining energy of an electric vehicle after traversing a path $p_e$ (a sequence of edges from $s_e$ to $d_e$) is given by:

$$E_a^e = \min \left\{ E^0_e - \gamma_e \left( |p_a^e| + |a| \right) + \sum_{a' \in p_e} x_{a'} \gamma_0 \alpha \frac{|a'|}{r_{a'}^e},\; E^{\text{max}}_e \right\}$$

where $|a|$ is the length of edge $a$, and $|p_a^e|$ is the length of the sub-path $p_a^e \subseteq p_e$ leading to edge $a$. The objective of the SWCE problem is to find a set of paths $P$ that maximizes the total remaining energy for all electric vehicles, subject to constraints that each vehicle reaches its destination by the deadline and maintains non-negative energy throughout. Mathematically, this is expressed as:

$$\max_{P} \sum_{e \in V} E_a^e, \quad \text{where } a \in p_e \text{ and } d_e \in a$$

subject to:

(a) $\sum_{a \in p_e} \frac{|a|}{r_a^e} \leq t_e \quad \forall e \in V$

(b) $E_a^e \geq 0 \quad \forall a \in p_e, \forall e \in V$

Constraint (a) ensures that the travel time does not exceed the deadline, while constraint (b) guarantees that the energy never drops below zero during the journey. The SWCE problem is NP-hard, as it generalizes the Restricted Shortest Path (RSP) problem, which is known to be computationally intractable. This complexity arises from the need to balance energy consumption and charging opportunities across multiple electric vehicles in a dynamic environment.
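
To make the model concrete, the sketch below computes the per-edge remaining energy $E_a^e$ and checks constraints (a) and (b) for one candidate path. The `Edge` and `Vehicle` containers and all field names are illustrative assumptions rather than part of the original formulation; charging is credited for the portion of the path traversed so far, which coincides with the formula above on the destination edge.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    length: float      # |a|, km
    speed: float       # r_a^e on this edge, km/h
    has_coils: bool    # x_a = 1 if wireless charging coils are installed

@dataclass
class Vehicle:
    e0: float          # initial energy E^0_e, kWh
    e_max: float       # battery capacity E^max_e, kWh
    gamma: float       # consumption rate gamma_e, kWh/km
    deadline: float    # deadline t_e, h

def remaining_energy(v: Vehicle, path: list[Edge], alpha: float = 30.0,
                     gamma0: float = 0.8) -> list[float]:
    """E_a^e after each edge a of the path, capped at the battery capacity."""
    energies, consumed, charged = [], 0.0, 0.0
    for edge in path:
        consumed += v.gamma * edge.length                          # gamma_e * (|p_a^e| + |a|)
        if edge.has_coils:
            charged += gamma0 * alpha * edge.length / edge.speed   # gamma_0 * alpha * |a| / r_a^e
        energies.append(min(v.e0 - consumed + charged, v.e_max))
    return energies

def is_feasible(v: Vehicle, path: list[Edge], alpha: float = 30.0,
                gamma0: float = 0.8) -> bool:
    """Constraint (a): travel time within the deadline; constraint (b): energy never negative."""
    travel_time = sum(edge.length / edge.speed for edge in path)   # sum of |a| / r_a^e
    return (travel_time <= v.deadline
            and all(E >= 0.0 for E in remaining_energy(v, path, alpha, gamma0)))
```

A greedy baseline such as GA would keep the first shortest path for which a check like `is_feasible` succeeds, whereas the learning-based scheduler must also weigh detours through charging lanes.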

To tackle this challenge, we formulate the problem as a Markov Decision Process (MDP) and apply deep reinforcement learning. In this framework, each electric vehicle acts as an agent interacting with the environment. The state at time step $t$ for vehicle $e$ is defined as $S_e(t) = \{ a_e, E_{a_e}^e, t_{a_e}^e \}$, where $a_e$ is the current edge, $E_{a_e}^e$ is the remaining energy, and $t_{a_e}^e$ is the remaining time to the deadline, calculated as $t_{a_e}^e = t_e - \sum_{a \in p_{a_e}^e} \frac{|a|}{r_a^e} - \frac{|a_e|}{r_{a_e}^e}$. The action $A_e(t)$ involves selecting the next edge $a$ from the set of available edges $A(a_e)$ connected to the current node. The reward function $r_e(S_e(t), A_e(t))$ is based on the energy gained from charging on edge $a$, given by:

$$r_e(S_e(t), a) = x_a \gamma_0 \alpha \frac{|a|}{r_a^e}$$
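
A minimal encoding of the state and reward, reusing the illustrative `Edge` container from the sketch above, could look like the following; the tuple layout is an assumption for illustration.

```python
def state(current_edge_id: int, remaining_energy_kwh: float, remaining_time_h: float) -> tuple:
    """S_e(t) = (a_e, E_{a_e}^e, t_{a_e}^e): current edge, remaining energy, time left before the deadline."""
    return (current_edge_id, remaining_energy_kwh, remaining_time_h)

def reward(edge: Edge, alpha: float = 30.0, gamma0: float = 0.8) -> float:
    """r_e(S_e(t), a) = x_a * gamma_0 * alpha * |a| / r_a^e; zero on edges without coils."""
    return gamma0 * alpha * edge.length / edge.speed if edge.has_coils else 0.0
```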

This reward encourages the agent to prioritize routes with charging opportunities, thereby maximizing overall energy retention. We employ a deep Q-network (DQN) architecture with two neural networks: an evaluation network and a target network, parameterized by $\theta$ and $\bar{\theta}$, respectively. The training process involves an $\epsilon$-greedy policy for exploration, where with probability $\epsilon$, a random action is chosen, and with probability $1-\epsilon$, the action with the highest Q-value is selected. The Q-value function $Q(S_e(t), A_e(t); \theta)$ is updated using the Bellman equation:

$$Y(t) = r_e(S_e(t), A_e(t)) + \gamma \max_{A'_e(t)} Q(S'_e(t), A'_e(t); \bar{\theta})$$

where $\gamma$ is the discount factor, and $S'_e(t)$ is the next state. The loss function for updating the parameters is defined as:

$$\text{Loss}(\theta) = \mathbb{E} \left[ \left( Y(t) - Q(S_e(t), A_e(t); \theta) \right)^2 \right]$$

We use gradient descent to minimize this loss, with the target network updated periodically to stabilize training. The algorithm proceeds over multiple episodes, with each episode involving the selection of a path for an electric vehicle while accumulating experiences in a replay buffer for batch learning.
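
The PyTorch-style sketch below illustrates this training loop: an evaluation network and a target network, $\epsilon$-greedy action selection, a replay buffer, and the squared-error loss on the Bellman target. It is a generic DQN sketch, not the exact S-DRL implementation; the environment interface (`env.reset`, `env.step`), network width, buffer size, and batch size are assumptions, while the learning rate, discount factor, $\epsilon$, and episode count follow the values reported later in the text.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class QNet(nn.Module):
    """Maps a state (current edge, remaining energy, remaining time) to Q-values over candidate next edges."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

def train(env, state_dim: int, n_actions: int, episodes: int = 1000, gamma: float = 0.9,
          eps: float = 0.9, lr: float = 0.01, batch_size: int = 32, target_sync: int = 50):
    q_eval, q_target = QNet(state_dim, n_actions), QNet(state_dim, n_actions)
    q_target.load_state_dict(q_eval.state_dict())           # start both networks identically
    opt = torch.optim.SGD(q_eval.parameters(), lr=lr)
    buffer = deque(maxlen=10_000)                            # experience replay buffer

    for ep in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy: explore with probability eps, otherwise pick the highest Q-value
            # (in the full SWCE setting, actions would be masked to the edges A(a_e) leaving the current node)
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = int(q_eval(torch.as_tensor(s, dtype=torch.float32)).argmax())
            s_next, r, done = env.step(a)
            buffer.append((s, a, r, s_next, float(done)))
            s = s_next

            if len(buffer) >= batch_size:
                S, A, R, S2, D = (torch.as_tensor(x, dtype=torch.float32)
                                  for x in zip(*random.sample(buffer, batch_size)))
                q_sa = q_eval(S).gather(1, A.long().unsqueeze(1)).squeeze(1)
                with torch.no_grad():                        # Bellman target Y(t) from the target network
                    y = R + gamma * q_target(S2).max(dim=1).values * (1.0 - D)
                loss = ((y - q_sa) ** 2).mean()              # Loss(theta)
                opt.zero_grad()
                loss.backward()
                opt.step()

        if ep % target_sync == 0:
            q_target.load_state_dict(q_eval.state_dict())    # periodic target-network update
```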

The performance of our proposed algorithm, termed S-DRL (Scheduling via Deep Reinforcement Learning), is evaluated through simulations based on real-world road networks from Brooklyn, New York. We compare S-DRL against a greedy algorithm (GA) that selects the shortest path satisfying energy and deadline constraints. The simulation parameters are summarized in the table below, with default values used unless specified otherwise. For instance, the number of electric vehicles ranges from 10 to 50, charging segments vary from 50 to 200, and deadline means extend from 1.5 to 2.1 hours. The neural network parameters include a learning rate of 0.01, discount factor of 0.9, and $\epsilon$ of 0.9, with training conducted over 1000 episodes.

Default Simulation Parameters

| Parameter | Value |
| --- | --- |
| Number of nodes $\lvert N \rvert$ | 50 |
| Number of electric vehicles $\lvert V \rvert$ | 50 |
| Energy consumption rate $\gamma_e$ | 10 kWh/100 km |
| Vehicle speed $r_a^e$ | 30–60 km/h |
| Maximum battery capacity $E^{\text{max}}_e$ | 80–100 kWh |
| Initial energy $E^0_e$ | 30–50 kWh |
| Charging power $\alpha$ | 30 kW |
| Transmission efficiency $\gamma_0$ | 80% |
| Deadline mean $\mu_1$ | 2.1 h |
| Deadline standard deviation $\sigma_1$ | 0.1 h |
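
To reproduce this setup, per-vehicle parameters could be sampled from the ranges above, reusing the illustrative `Vehicle` container from earlier. The uniform draws and the Gaussian deadline below are assumptions consistent with the listed ranges, mean $\mu_1$, and standard deviation $\sigma_1$.

```python
import random

def sample_vehicle(mu1: float = 2.1, sigma1: float = 0.1) -> Vehicle:
    """Draw one EV instance from the default parameter ranges (distribution choices assumed)."""
    return Vehicle(
        e0=random.uniform(30.0, 50.0),        # initial energy, kWh
        e_max=random.uniform(80.0, 100.0),    # battery capacity, kWh
        gamma=0.10,                           # 10 kWh / 100 km = 0.10 kWh per km
        deadline=random.gauss(mu1, sigma1),   # deadline, h
    )
```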

The results demonstrate that S-DRL consistently outperforms GA in terms of total remaining energy. For example, as the number of electric vehicles increases from 10 to 50, the total remaining energy rises for both algorithms, but S-DRL attains approximately 1.27 times the total remaining energy of GA. This improvement stems from S-DRL’s ability to optimize paths by considering both energy consumption and charging opportunities, whereas GA focuses solely on shortest paths. Similarly, when the number of charging segments increases from 50 to 200, S-DRL’s total remaining energy is about 1.25 times that of GA, highlighting its efficiency in leveraging additional charging infrastructure. The table below summarizes the comparative performance under varying conditions, emphasizing the robustness of S-DRL in handling dynamic constraints.

Performance Comparison of S-DRL and GA

| Scenario | Total remaining energy, S-DRL (kWh) | Total remaining energy, GA (kWh) | Improvement ratio |
| --- | --- | --- | --- |
| 10 electric vehicles | 950 | 750 | 1.27 |
| 50 electric vehicles | 2200 | 1730 | 1.27 |
| 50 charging segments | 1800 | 1440 | 1.25 |
| 200 charging segments | 2400 | 1920 | 1.25 |
| Deadline mean 1.5 h | 1700 | 1405 | 1.21 |
| Deadline mean 2.1 h | 2300 | 1900 | 1.21 |

Further analysis reveals that S-DRL’s advantage grows with more relaxed deadlines, as it can explore longer paths with higher charging potential. The algorithm’s time complexity is $O(|V| K T (H + \bar{H}))$, where $K$ is the number of episodes, $T$ is the number of time steps, and $H$ and $\bar{H}$ represent the training times for the evaluation and target networks, respectively. This scalability makes S-DRL suitable for large-scale deployments in urban areas, where the number of electric vehicles is expected to rise, particularly in large EV markets such as China.

In conclusion, our deep reinforcement learning-based approach effectively addresses the SWCE problem by maximizing the total remaining energy of electric vehicles under real-world constraints. The S-DRL algorithm demonstrates superior performance compared to greedy methods, thanks to its ability to learn optimal policies through environmental interactions. This work contributes to the advancement of intelligent transportation systems, supporting the sustainable growth of electric vehicle adoption. Future research could extend this framework to incorporate real-time traffic data and multi-agent coordination, further enhancing the efficiency of wireless charging networks for electric vehicles.

The implications of this study are significant for regions with high electric vehicle penetration, such as China, where infrastructure development is crucial. By optimizing path planning, we can reduce charging costs, alleviate grid pressure, and promote the widespread use of electric vehicles. As the technology evolves, integration with renewable energy sources and smart grid systems could unlock even greater benefits, making electric vehicles a cornerstone of modern urban mobility.
