Hybrid electric vehicles (HEVs), a low-carbon and environmentally friendly mode of transportation, are evolving rapidly and play a pivotal role in China’s EV market. However, HEV owners face challenges such as limited driving range, long charging times, and uneven distribution of charging and refueling facilities. To address these issues while preserving user privacy, I propose a station recommendation algorithm for HEVs based on vertical federated learning (VFL). This approach relies on local training and central aggregation to update models without exposing sensitive data. By integrating blockchain technology with cloud computing, the system provides a secure and trustworthy network for transmitting encrypted training parameters, and it replaces a centralized architecture prone to single points of failure with decentralized data aggregators. The result is a flexible and scalable cloud network that improves recommendation efficiency and security.
In this study, I focus on developing a recommendation system that considers station availability, user preferences, and real-time conditions. The model uses federated learning to train collaboratively across multiple parties (HEVs, charging stations (CSs), and gas stations (GSs)) without sharing raw data. This is particularly relevant in China’s growing EV ecosystem, where data privacy regulations are stringent. The algorithm aims to minimize user costs and waiting times while maximizing station utilization, thereby supporting the sustainable development of electric cars.

The system model involves $K$ participants, indexed by $k \in \{1, 2, \dots, K\}$, where each participant holds a local dataset $D_k$. For HEVs, CSs, and GSs, the data samples are $\{x_i^{\text{hev}}\}_{i \in D_{\text{hev}}}$, $\{x_i^{\text{cs}}, y_i\}_{i \in D_{\text{cs}}}$, and $\{x_i^{\text{gs}}, z_i\}_{i \in D_{\text{gs}}}$, respectively, with $D = \sum_{k=1}^K D_k$: the HEVs hold features only, while the CSs and GSs also hold the labels $y_i$ and $z_i$. The goal is to find a model parameter vector $\Theta \in \mathbb{R}^d$, where $d$ is the number of features. In VFL, each party holds a different feature set, so the parameter blocks $\Theta_{\text{hev}}$, $\Theta_{\text{cs}}$, and $\Theta_{\text{gs}}$ generally differ in size. The model performance $\sigma_{\text{vfl}}$ should closely approximate that of a centralized model $\sigma_{\text{cent}}$, satisfying the condition:
$$|\sigma_{\text{vfl}} - \sigma_{\text{cent}}| < \sigma$$
where $\sigma$ is the error tolerance. This ensures that the federated approach does not sacrifice accuracy while preserving privacy.
Feature selection is critical for improving model efficiency. I identify relevant attributes for each participant, as summarized in Table 1. These features capture location, cost, and time-related factors, which are vital for personalized recommendations for EV operations in China.
| Participant | Features |
|---|---|
| HEV | Vehicle location, average speed, current weather, surrounding infrastructure, traffic congestion, battery capacity, start/stop charging times, vehicle state (charging/refueling/driving) |
| CS | Latitude and longitude, number of charging points, average charging capacity, average charging cost, additional parking fees, service fees, charging start time, duration, and end time |
| GS | Latitude and longitude, number of fuel pumps, average refueling cost, average refueling capacity, waiting time, refueling duration |
The recommendation system framework consists of two main components: VFL model training and communication with a CloudletChain. In the VFL training phase, encrypted entity alignment is performed to match sample IDs across HEVs, CSs, and GSs without exposing raw data. This involves generating and exchanging encrypted values of sample IDs to find the intersection $I$ of local sets. Subsequently, local model training is conducted in five steps: initialization by cloud aggregators, computation of intermediate values, gradient and loss calculation with masking, aggregation by cloud nodes, and parameter updates. The training objective for the model $M_{\text{vfl}}$ is to minimize the loss function, which for ridge regression with homomorphic encryption can be expressed as:
$$\min_{\Theta_{\text{hev}}, \Theta_{\text{cs}}, \Theta_{\text{gs}}} \sum_i \left( \Theta_{\text{hev}} x_i^{\text{hev}} + \Theta_{\text{cs}} x_i^{\text{cs}} - y_i + \Theta_{\text{gs}} x_i^{\text{gs}} - z_i \right)^2 + \lambda \left( g(\Theta_{\text{hev}}) + g(\Theta_{\text{cs}}) + g(\Theta_{\text{gs}}) \right)$$
where $g(\cdot) = \frac{1}{2} \|\cdot\|^2$ is the regularization function, and $\lambda \in [0,1]$ is the regularization parameter. Let $u_i^{\text{hev}} = \Theta_{\text{hev}} x_i^{\text{hev}}$, $u_i^{\text{cs}} = \Theta_{\text{cs}} x_i^{\text{cs}}$, and $u_i^{\text{gs}} = \Theta_{\text{gs}} x_i^{\text{gs}}$. The encrypted loss $L$ is decomposed into components:
$$L = L_{\text{hev}} + L_{\text{cs}} + L_{\text{gs}} + L_{\text{hev-cs}} + L_{\text{hev-gs}} + L_{\text{cs-gs}}$$
where, for example, $L_{\text{hev}} = \sum_i (u_i^{\text{hev}})^2 + \lambda g(\Theta_{\text{hev}})$, and the cross terms take the form $L_{\text{hev-cs}} = 2 \sum_i u_i^{\text{hev}} (u_i^{\text{cs}} - y_i)$. The gradients are computed as:
$$\frac{\partial L}{\partial \Theta_{\text{hev}}} = \sum_i d_i x_i^{\text{hev}} + \lambda \Theta_{\text{hev}}$$
$$\frac{\partial L}{\partial \Theta_{\text{cs}}} = \sum_i d_i x_i^{\text{cs}} + \lambda \Theta_{\text{cs}}$$
$$\frac{\partial L}{\partial \Theta_{\text{gs}}} = \sum_i d_i x_i^{\text{gs}} + \lambda \Theta_{\text{gs}}$$
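with $d_i = u_i^{\text{hev}} + u_i^{\text{cs}} - y_i + u_i^{\text{gs}} - z_i$. Strictly, differentiating the squared residual gives $2 d_i$; the constant is dropped here, which amounts to rescaling the step size and $\lambda$. Each party’s gradient depends on the other parties only through the residual $d_i$, so the parties exchange encrypted intermediate values rather than raw features.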
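The decomposition and the gradients can be checked numerically. The following is a minimal plaintext sketch, not the encrypted protocol itself: it omits homomorphic encryption and masking entirely, the toy values for `thetas`, `data`, and `lam` are hypothetical, and the terms $L_{\text{cs}}$, $L_{\text{gs}}$, $L_{\text{hev-gs}}$, and $L_{\text{cs-gs}}$ are assumed to follow the same pattern as the two examples given in the text. It differentiates the objective exactly as written, so the data term carries the factor of 2.

```python
def dot(theta, x):
    return sum(t * v for t, v in zip(theta, x))

def g(theta):
    # Regularizer g(.) = 1/2 * ||.||^2
    return 0.5 * sum(t * t for t in theta)

def parts(thetas, row):
    # a = u_hev, b = u_cs - y, c = u_gs - z, so the residual is d = a + b + c
    a = dot(thetas["hev"], row["hev"])
    b = dot(thetas["cs"], row["cs"]) - row["y"]
    c = dot(thetas["gs"], row["gs"]) - row["z"]
    return a, b, c

def full_loss(thetas, data, lam):
    data_term = sum(sum(parts(thetas, r)) ** 2 for r in data)
    return data_term + lam * sum(g(t) for t in thetas.values())

def decomposed_loss(thetas, data, lam):
    abc = [parts(thetas, r) for r in data]
    l_hev = sum(a * a for a, _, _ in abc) + lam * g(thetas["hev"])
    l_cs = sum(b * b for _, b, _ in abc) + lam * g(thetas["cs"])
    l_gs = sum(c * c for _, _, c in abc) + lam * g(thetas["gs"])
    l_hev_cs = 2 * sum(a * b for a, b, _ in abc)
    l_hev_gs = 2 * sum(a * c for a, _, c in abc)
    l_cs_gs = 2 * sum(b * c for _, b, c in abc)
    return l_hev + l_cs + l_gs + l_hev_cs + l_hev_gs + l_cs_gs

def gradients(thetas, data, lam):
    # Exact gradient of full_loss: 2 * sum_i d_i * x_i + lam * Theta
    grads = {k: [lam * t for t in theta] for k, theta in thetas.items()}
    for r in data:
        d = sum(parts(thetas, r))
        for k in thetas:
            for j, x in enumerate(r[k]):
                grads[k][j] += 2.0 * d * x
    return grads

def numeric_grad(thetas, data, lam, party, j, eps=1e-6):
    # Central finite difference on one coordinate of one party's parameters
    def shifted(delta):
        copy = {k: list(v) for k, v in thetas.items()}
        copy[party][j] += delta
        return full_loss(copy, data, lam)
    return (shifted(eps) - shifted(-eps)) / (2 * eps)

# Hypothetical toy parameters and samples (two aligned samples, three parties)
thetas = {"hev": [0.5, -0.2], "cs": [0.1], "gs": [0.3, 0.4]}
data = [
    {"hev": [1.0, 2.0], "cs": [3.0], "gs": [0.5, 1.5], "y": 1.0, "z": 0.5},
    {"hev": [0.0, 1.0], "cs": [2.0], "gs": [1.0, 1.0], "y": 0.2, "z": 0.7},
]
lam = 0.1
grads = gradients(thetas, data, lam)
```

Comparing `full_loss` with `decomposed_loss`, and `grads` with `numeric_grad`, confirms that the six-term decomposition reproduces the objective and that each party needs only its own features plus the shared residual to form its gradient.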
Communication with the CloudletChain involves a decentralized network of cloud nodes $C = \{c_1, c_2, \dots, c_p\}$ that handle parameter aggregation. Blockchain technology is employed to maintain a secure ledger of verified nodes, preventing unauthorized access. The process includes four steps: registering new cloud nodes via transactions, validating transactions by randomly selected miner nodes, block creation using Merkle tree roots, and propagation of new blocks. For instance, the Merkle root hash for transactions $TX_1$ to $TX_n$ is computed iteratively:
$$H(TX_n + TX_{n-1}) = H(\text{hash}(TX_n)) + H(\text{hash}(TX_{n-1}))$$
$$H(TX_{n-1} + TX_{n-2}) = H(\text{hash}(TX_{n-1})) + H(\text{hash}(TX_{n-2}))$$
$$\vdots$$
$$H(TX_2 + TX_1) = H(\text{hash}(TX_2)) + H(\text{hash}(TX_1))$$
Because any change to a transaction changes the resulting root hash, this ensures data integrity and transparency. The CloudletChain also enables low-latency communication, which is crucial for real-time recommendations in dynamic environments such as China’s electric car network.
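To make the tamper-evidence property concrete, here is a minimal sketch of a standard pairwise Merkle root using Python's `hashlib`. It is an illustration under assumptions, not the CloudletChain implementation: it uses SHA-256, hashes the concatenation of the two child hashes at each level, and duplicates the last node on odd-sized levels (one common convention, used for example by Bitcoin). The transaction payloads are hypothetical.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(transactions):
    """Return the hex Merkle root of a non-empty list of byte strings."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    # Leaf level: hash every transaction
    level = [sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumed convention: duplicate the last node on odd-sized levels
            level.append(level[-1])
        # Parent = hash of the concatenated child hashes
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()

# Hypothetical registration transactions for three cloud nodes
txs = [b"register:c1", b"register:c2", b"register:c3"]
root = merkle_root(txs)

# Changing any single transaction yields a different root
tampered = merkle_root([b"register:c1", b"register:cX", b"register:c3"])
```

A verifier that stores only `root` can detect that `tampered` no longer matches, without keeping the full transaction list in the block header.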
To evaluate the algorithm, I conducted experiments using data from a Chinese city, including 20 CSs, 20 GSs, and 50 HEVs, with 10 cloud nodes distributed geographically. Table 2 compares the recommendation algorithm with two baselines, Real-Time Recommendation (RT) and Earliest Finish Time (EFT). Relative to RT, the proposed algorithm eliminates queuing and lowers total cost and parking fees; relative to EFT, it matches the zero queuing probability while still lowering both costs. It also achieves the highest time utilization and hourly revenue of the three, which indicates better station usage for EV applications in China.
| Algorithm | Queuing Probability | Total Cost (CNY) | Parking Cost (CNY) | Time Utilization | Hourly Revenue (CNY) |
|---|---|---|---|---|---|
| RT | 0.122 | 25.231 | 3.579 | 0.148 | 7,916 |
| EFT | 0 | 20.948 | 1.754 | 0.193 | 9,021 |
| Proposed | 0 | 19.426 | 0.126 | 0.202 | 9,062 |
Execution time analysis shows that the system’s processing time grows with the number of participants. Scaling from 2 GSs, 2 CSs, and 5 HEVs up to 20 GSs, 20 CSs, and 50 HEVs, growth is roughly linear at first and the curve flattens at higher scales, so the execution time remains manageable. This indicates the algorithm’s scalability for large-scale deployments in electric car networks.
Waiting time is another critical metric. For HEVs, the waiting time $T_1$ at CSs includes charging and waiting duration: $T_1 = t_{\text{charging+waiting}} + t_{\text{arrival}}$. For GSs, waiting time $T_2$ is simplified to $T_2 = t_{\text{waiting}} + t_{\text{arrival}}$, as refueling time is negligible. The difference $\rho$ between previous and current HEV waiting times is minimized, ensuring accurate recommendations. Analysis reveals that for HEVs like HEV8, HEV13, and HEV16, $\rho$ values are low, indicating avoidance of congested stations.
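The two waiting-time formulas and the difference $\rho$ can be illustrated with a short sketch. All numbers below are hypothetical values in minutes, the station names are made up, and ranking CSs and GSs in one list is an assumption made for illustration; $\rho$ is taken here as the absolute difference between a previous and a current waiting time.

```python
def cs_time(t_charging_waiting, t_arrival):
    # T_1 = t_{charging+waiting} + t_arrival
    return t_charging_waiting + t_arrival

def gs_time(t_waiting, t_arrival):
    # T_2 = t_waiting + t_arrival (refueling time treated as negligible)
    return t_waiting + t_arrival

def rho(previous, current):
    # Assumed definition: absolute difference between two waiting times
    return abs(previous - current)

# Hypothetical candidate stations for one HEV, times in minutes
candidates = {
    "CS_A": cs_time(35.0, 10.0),
    "CS_B": cs_time(50.0, 4.0),
    "GS_A": gs_time(6.0, 12.0),
}

# Rank candidates by total time, shortest first
ranked = sorted(candidates, key=candidates.get)
change = rho(previous=20.0, current=candidates[ranked[0]])
```

With these inputs the shortest total time belongs to `GS_A`, and a small `change` corresponds to the low $\rho$ values reported for vehicles that avoid congested stations.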
Communication delay, defined as the average time from an HEV’s request to its receipt of the recommendation list $L_f$, is assessed under different cloud node configurations. As shown in Table 3, a decentralized network with 10 cloud nodes reduces the delay to approximately 3 seconds, compared to 9 seconds in a centralized setup, a reduction of about two thirds. This underscores the benefits of decentralization for real-time applications in China’s EV sector.
| Cloud Node Count | Communication Delay (seconds) |
|---|---|
| None (Centralized) | 9.0 |
| 3 | 5.2 |
| 5 | 4.1 |
| 8 | 3.5 |
| 10 | 3.0 |
Blockchain mining time for new blocks is evaluated to ensure network security. As the number of cloud nodes increases from 6 to 10, the execution time for mining new blocks rises, with variance in minimum and maximum times. This is attributed to the proof-of-work consensus mechanism, which enhances tamper resistance but incurs higher computational overhead. The results confirm that the CloudletChain maintains security while supporting network expansion.
In conclusion, the proposed VFL-based recommendation algorithm for HEVs addresses privacy concerns while improving recommendation accuracy and station utilization. By combining federated learning with a blockchain-based cloud network, the system offers a scalable solution for the growing electric car market in China. Future work will extend the model to public transportation, such as electric buses, to further optimize charging schedules and enhance user experience. This aligns with the global shift towards sustainable transportation and the rapid adoption of EV technologies in China.
