Model Predictive Control for Super-Mild Hybrid Cars Using Markov Chain and Q-Learning

In recent years, the automotive industry has faced increasing pressure to address environmental concerns and energy efficiency. As a researcher focused on advanced vehicle technologies, I have been particularly interested in hybrid cars, which combine internal combustion engines with electric motors to reduce fuel consumption and emissions. Among various hybrid configurations, super-mild hybrid cars represent a promising avenue due to their cost-effectiveness and simplicity. However, optimizing energy management in these vehicles remains a critical challenge. Traditional strategies, such as rule-based or instantaneous optimization methods, often fail to achieve global optimality or real-time performance. In this article, I explore a novel approach that integrates Markov chain and Q-Learning algorithms within a model predictive control (MPC) framework to enhance energy management in super-mild hybrid cars. This method aims to balance global optimality with computational efficiency, ensuring that hybrid cars can operate more sustainably and intelligently on the road.

Hybrid cars have gained significant attention as a transitional technology toward fully electric vehicles. They leverage multiple power sources, typically an engine and a motor, to improve fuel economy and reduce tailpipe emissions. The energy management strategy (EMS) is the core component that dictates how power is distributed between these sources. Over the years, various EMS approaches have been developed, including rule-based strategies, which rely on heuristic rules but lack adaptability; instantaneous optimization methods, such as equivalent consumption minimization strategy (ECMS), which optimize for each time step but may not guarantee global efficiency; and global optimization techniques, like dynamic programming (DP), which require prior knowledge of driving cycles and are computationally intensive. Model predictive control (MPC) has emerged as a powerful alternative, as it predicts future vehicle states over a finite horizon and optimizes control actions accordingly. By combining prediction with rolling optimization, MPC can achieve near-optimal performance while maintaining real-time feasibility. In this context, I propose an MPC-based EMS that uses Markov chains for acceleration prediction and Q-Learning for optimization, specifically tailored for super-mild hybrid cars.

The architecture of a super-mild hybrid car typically involves a parallel configuration, where an engine and a motor are coupled through a clutch or transmission system. In my research, I consider a vehicle with a reflux continuously variable transmission (CVT), which offers wide ratio ranges and high efficiency. The key components include an engine, a motor, a battery pack, and associated control units. The power demand during driving is derived from overcoming resistive forces, such as rolling resistance, aerodynamic drag, and acceleration inertia. The total required power \( P_{\text{req}} \) can be expressed as:

$$ P_{\text{req}} = \left( mgf + \frac{1}{2} C_D A \rho v^2 + \delta m \frac{dv}{dt} \right) v $$

where \( m \) is the vehicle mass, \( g \) is gravitational acceleration, \( f \) is the rolling resistance coefficient, \( C_D \) is the aerodynamic drag coefficient, \( A \) is the frontal area, \( \rho \) is air density, \( v \) is velocity, \( \frac{dv}{dt} \) is acceleration, and \( \delta \) is the rotational mass factor. This equation forms the basis for calculating the power that must be supplied by the hybrid powertrain. The engine and motor models are essential for simulating their behavior. For instance, the engine torque \( T_e \) is often mapped as a function of speed and throttle opening, while the motor torque \( T_m \) depends on speed and efficiency. The battery state of charge (SOC) dynamics are governed by:

$$ \frac{dSOC}{dt} = -\frac{I}{Q_{\text{bat}}} $$

with \( I \) being the battery current and \( Q_{\text{bat}} \) the battery capacity. These models are integrated into a simulation environment, such as MATLAB/Simulink, to evaluate the performance of hybrid cars under various driving conditions.
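For readers who want to experiment with these relations outside Simulink, the short Python sketch below evaluates the required-power equation and a simple Euler step of the SOC dynamics. The numerical parameter values are illustrative placeholders in SI units, not the exact values of my simulation model.

```python
import numpy as np

# Illustrative vehicle parameters (placeholders, not the exact simulation values)
m     = 1190.0        # vehicle mass [kg]
g     = 9.81          # gravitational acceleration [m/s^2]
f     = 0.012         # rolling resistance coefficient [-]
C_D   = 0.32          # aerodynamic drag coefficient [-]
A     = 2.0           # frontal area [m^2]
rho   = 1.2           # air density [kg/m^3]
delta = 1.05          # rotational mass factor [-]
Q_bat = 6.5 * 3600.0  # battery capacity [A*s] (6.5 Ah)

def required_power(v, dv_dt):
    """Total power demand P_req [W] at velocity v [m/s] and acceleration dv_dt [m/s^2]."""
    resistive_force = m * g * f + 0.5 * C_D * A * rho * v**2 + delta * m * dv_dt
    return resistive_force * v

def soc_step(soc, current, dt):
    """One Euler step of dSOC/dt = -I / Q_bat, with current I in A and dt in s."""
    return soc - current / Q_bat * dt

# Example: cruising at 60 km/h with mild acceleration, then 1 s at 20 A discharge
v = 60.0 / 3.6
print(required_power(v, 0.3))    # power demand in W
print(soc_step(0.6, 20.0, 1.0))  # updated SOC
```

In the full model, this power demand is then split between the engine and motor torque maps described above.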

My proposed MPC strategy consists of three main steps: prediction, optimization, and feedback correction. For prediction, I employ a multi-step Markov chain model to forecast future acceleration profiles. The Markov chain is suitable because acceleration in hybrid cars can be treated as a stochastic process with Markov properties, meaning future states depend only on the current state. By analyzing historical driving data from standard cycles like ECE_EUDC and UDDS, I construct transition probability matrices for acceleration at discrete velocity intervals. The probability \( P_{z,i,j} \) of acceleration transitioning from state \( i \) to state \( j \) at velocity \( z \) is estimated using maximum likelihood estimation:

$$ P_{z,i,j} = \frac{S_{i,j}}{\sum_{j} S_{i,j}} $$

where \( S_{i,j} \) counts the transitions from \( i \) to \( j \). This multi-step approach allows for predicting acceleration over a horizon of \( p \) steps, reducing error accumulation compared to single-step methods. The predicted acceleration is then used to compute future velocity and power demand for the hybrid car. To assess prediction accuracy, I calculate the root mean square error (RMSE) between predicted and actual velocities:

$$ R_e = \sqrt{\frac{1}{L} \sum_{k=1}^{L} \left( v(k) - v_{\text{np}}(k) \right)^2 } $$

where \( L \) is the number of sampled points in the cycle. My results show that RMSE increases with longer prediction horizons, but the multi-step Markov model maintains reasonable accuracy for hybrid car energy management.
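A minimal Python sketch of this prediction pipeline is shown below. The acceleration discretization, the synthetic driving trace, and the helper names are my own illustrative choices for a single velocity interval, not the exact implementation used in the study.

```python
import numpy as np

def build_transition_matrix(acc_series, bins):
    """Maximum-likelihood estimate of the acceleration transition matrix P_z.

    acc_series : recorded acceleration samples for one velocity interval z
    bins       : edges of the discrete acceleration states
    """
    states = np.digitize(acc_series, bins)      # map samples to state indices
    n = len(bins) + 1
    counts = np.zeros((n, n))
    for i, j in zip(states[:-1], states[1:]):   # count transitions S_ij
        counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0               # avoid division by zero
    return counts / row_sums                    # P_zij = S_ij / sum_j S_ij

def predict_acceleration(P, state0, centers, p):
    """Expected acceleration over a p-step horizon, starting from state0."""
    dist = np.zeros(P.shape[0])
    dist[state0] = 1.0
    preds = []
    for _ in range(p):
        dist = dist @ P                         # propagate the state distribution
        preds.append(float(dist @ centers))     # expected acceleration at this step
    return np.array(preds)

def rmse(v_actual, v_pred):
    """Root mean square error between actual and predicted velocity."""
    return np.sqrt(np.mean((np.asarray(v_actual) - np.asarray(v_pred)) ** 2))

# Example with a synthetic acceleration trace (placeholder driving data)
acc = np.sin(np.linspace(0, 10, 200))
bins = np.linspace(-1.0, 1.0, 9)                # 10 discrete acceleration states
centers = np.linspace(-1.1, 1.1, 10)            # representative value per state
P = build_transition_matrix(acc, bins)
print(predict_acceleration(P, state0=5, centers=centers, p=5))
```

The predicted accelerations are integrated to obtain future velocities, from which the power demand over the horizon follows directly from the required-power equation above.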

For optimization, I use Q-Learning, a reinforcement learning algorithm, to solve the rolling optimization problem within the prediction horizon. Q-Learning is a data-driven method that learns optimal actions through interaction with the environment, making it efficient for complex control problems in hybrid cars. The state space includes the required power \( P_{\text{req}} \) and battery SOC, while the action space is the motor torque \( T_m \). The objective is to minimize the equivalent fuel consumption, which combines actual fuel use and battery energy conversion. The reward function \( r_t(s, a) \) at time \( t \) is defined as:

$$ r_t(s, a) = m_{\text{fuel}} + m_{\text{ele}} + \beta (SOC(t) – SOC_{\text{ref}})^2 $$

where \( m_{\text{fuel}} \) is the instantaneous fuel consumption, \( m_{\text{ele}} \) is the equivalent fuel from electricity, \( \beta \) is a weighting factor, and \( SOC_{\text{ref}} \) is the reference SOC. The Q-Learning algorithm updates the state-action value function \( Q(s, a) \) iteratively:

$$ Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] $$

with learning rate \( \alpha \), discount factor \( \gamma \), and immediate reward \( r \). Since \( r_t(s, a) \) above is a cost, it enters the update with a negative sign (equivalently, the maximization is replaced by a minimization), so that the learned policy minimizes equivalent fuel consumption. Through exploration and exploitation, Q-Learning converges to an optimal policy that dictates the motor torque distribution for the hybrid car. Constraints are imposed to protect components, such as limiting SOC between 0.4 and 0.8 and keeping engine and motor torque and speed within their rated bounds. This optimization is performed over a receding horizon, meaning only the first control action is applied, and the process repeats at each time step. This rolling mechanism enhances adaptability and real-time performance for hybrid cars.
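The sketch below illustrates a tabular Q-Learning update for this problem in Python. The grid sizes, hyperparameters, and epsilon-greedy exploration scheme are illustrative assumptions, and the cost is negated so that the standard max-based update minimizes equivalent fuel consumption.

```python
import numpy as np

# Illustrative discretization of P_req, SOC, and motor torque (not the exact grids)
N_POWER, N_SOC, N_TORQUE = 20, 10, 15
alpha, gamma, epsilon = 0.1, 0.95, 0.1

Q = np.zeros((N_POWER, N_SOC, N_TORQUE))    # state-action value table

def choose_action(state, rng):
    """Epsilon-greedy selection over the discrete motor-torque levels."""
    if rng.random() < epsilon:
        return int(rng.integers(N_TORQUE))  # explore
    p_idx, soc_idx = state
    return int(np.argmax(Q[p_idx, soc_idx]))  # exploit

def q_update(state, action, cost, next_state):
    """One Q-Learning step; the reward is the negative equivalent-fuel cost."""
    p, s = state
    p2, s2 = next_state
    reward = -cost                          # minimizing cost == maximizing reward
    td_target = reward + gamma * np.max(Q[p2, s2])
    Q[p, s, action] += alpha * (td_target - Q[p, s, action])

# Example of one interaction step (indices would come from discretizing P_req and SOC)
rng = np.random.default_rng(0)
a = choose_action((3, 5), rng)
q_update((3, 5), a, cost=0.8, next_state=(4, 5))
```

In the MPC setting, this update loop is run over the predicted power sequence within each horizon until the policy stabilizes, and only the torque command for the first step is passed to the vehicle.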

Feedback correction is the final step, where actual vehicle measurements (e.g., velocity and acceleration) are used to update the prediction model and refine the optimization. This closed-loop approach compensates for uncertainties and disturbances, ensuring robust control for hybrid cars. The overall MPC framework integrates these elements seamlessly, enabling efficient energy management without requiring prior knowledge of the entire driving cycle.
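Putting the three steps together, the receding-horizon loop can be sketched as follows. Here `predict_horizon`, `optimize_torque_split`, `measure_vehicle`, and `update_markov_model` are hypothetical placeholders standing in for the prediction, optimization, and measurement blocks described above, not named functions from my simulation model.

```python
def mpc_step(model, state, horizon_p):
    """One receding-horizon iteration of the MC-QL energy management strategy."""
    # 1. Prediction: forecast acceleration and power demand over the horizon
    acc_pred = model.predict_horizon(state, horizon_p)

    # 2. Rolling optimization: the Q-Learning policy yields a torque sequence,
    #    but only the first control action is applied to the vehicle
    torque_seq = model.optimize_torque_split(state, acc_pred)
    first_action = torque_seq[0]

    # 3. Feedback correction: measured velocity and acceleration update the
    #    Markov model, and the next optimization starts from the true state
    measured_state = model.measure_vehicle(first_action)
    model.update_markov_model(measured_state)
    return measured_state
```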

To validate my approach, I conducted simulations using MATLAB/Simulink under combined ECE_EUDC and UDDS cycles. The super-mild hybrid car parameters are summarized in the table below:

| Component | Parameter | Value |
|-----------|-----------|-------|
| Engine | Rated power | 49 kW |
| Motor | Rated power | 5 kW |
| Battery | Capacity | 6.5 Ah |
| Vehicle | Mass | 1190 kg |

The simulation settings included a prediction horizon of 5 seconds, a time step of 0.01 seconds, and an initial SOC of 0.6. I compared my Markov chain + Q-Learning (MC-QL) strategy with a baseline Markov chain + DP (MC-DP) strategy. The results demonstrated that both strategies achieved similar fuel economy, with MC-QL showing a slight increase of 3.9% in equivalent fuel consumption but offering significant computational advantages. Specifically, the simulation time for MC-QL was 6 seconds, compared to 10 seconds for MC-DP, indicating a 40% reduction in runtime. This efficiency gain is crucial for real-time applications in hybrid cars. Additionally, the SOC trajectory remained stable, with MC-QL achieving a final SOC change of 0.0013, which is 7.1% lower than MC-DP, highlighting better battery balance. The torque distribution maps revealed that the engine operated near its optimal efficiency line, and the motor worked in high-efficiency regions, confirming the effectiveness of the proposed EMS for hybrid cars.

Further analysis involved examining the impact of prediction horizon length on performance. The table below shows RMSE values for different horizons:

| Prediction horizon (s) | RMSE (km/h) |
|------------------------|-------------|
| 3 | 1.8586 |
| 5 | 3.1355 |
| 7 | 4.4881 |
| 9 | 5.8921 |

As expected, longer horizons led to higher prediction errors, but the rolling optimization and feedback correction in the MC-QL strategy limited the impact of these errors on the final fuel economy. The use of Q-Learning allowed for rapid convergence and adaptability, making it suitable for hybrid cars in dynamic environments. Moreover, the integration of Markov chains provided accurate predictions without relying on complex models or external data sources, which is beneficial for practical deployment in hybrid cars.

In conclusion, my research presents a comprehensive MPC-based energy management strategy for super-mild hybrid cars, leveraging Markov chains for prediction and Q-Learning for optimization. This approach addresses the trade-off between global optimality and real-time computation, which is a common issue in hybrid car control systems. By simulating standard driving cycles, I have shown that the MC-QL strategy achieves competitive fuel economy while significantly reducing computational time compared to DP-based methods. The hybrid car benefits from improved battery SOC maintenance and efficient torque distribution, contributing to overall sustainability. Future work could explore the integration of traffic information or machine learning techniques to further enhance prediction accuracy for hybrid cars. Ultimately, this study underscores the potential of intelligent control algorithms in advancing hybrid car technology toward a greener automotive future.

The implications of this work extend beyond academic research. As hybrid cars become more prevalent, efficient energy management systems can lead to substantial fuel savings and emission reductions on a global scale. The MC-QL strategy is particularly relevant for super-mild hybrid cars, which are often marketed as affordable alternatives to conventional vehicles. By implementing such strategies, manufacturers can improve performance without increasing costs, making hybrid cars more accessible to consumers. Additionally, the modular nature of the MPC framework allows for customization based on specific vehicle architectures or driving patterns, ensuring versatility across different hybrid car models. As I continue to investigate this topic, I aim to refine the algorithms and conduct real-world testing to validate simulation results, ultimately contributing to the evolution of hybrid cars as a key solution in the transition to sustainable transportation.
