Optimizing the Electronic Control System for Autonomous Vehicles from a Functional Safety Perspective

The rapid evolution of autonomous driving technology has ushered in a profound transformation within the automotive industry. At the heart of this revolution lies the Electronic Control System (ECS), a sophisticated network integrating perception, decision-making, and execution modules, which is fundamental to ensuring stable vehicle operation. However, as system complexity escalates, so do the challenges associated with functional safety. Failures within subsystems and the risks of their coupling can readily precipitate critical safety incidents. Therefore, conducting rigorous functional safety analysis, identifying potential hazards, and implementing optimized designs are of paramount practical significance for enhancing the safety of autonomous vehicles, accelerating technology deployment, and fostering standardized industrial development.

1 Functional Safety Risk Analysis for Autonomous Vehicle ECS

1.1 Risk Analysis Framework

In my analysis, I have established a risk assessment framework adhering to the ISO 26262 standard, structured around a four-stage process: System Definition, Hazard Identification, Risk Assessment, and Risk Control. During system definition, I delineated the boundaries of the ECS, encompassing core components such as perception units, decision units, execution units, and communication buses, clarifying their functional interfaces and interaction logic. For hazard identification, I employed a combined approach of Fault Tree Analysis (FTA) and Failure Mode and Effects Analysis (FMEA), focusing on typical autonomous driving scenarios like urban car-following and highway lane changes to unearth potential safety hazards. Risk assessment was conducted using a risk matrix, quantifying risk levels based on the probability of occurrence and severity of each hazard. Finally, risk control involved formulating preliminary mitigation strategies for high-risk items, laying a solid foundation for subsequent design phases. This framework thoroughly considers the inherent complexity and dynamic nature of the autonomous vehicle ECS, enabling systematic and comprehensive risk insight.
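The FTA step above can be illustrated with a small fault tree evaluator. This is a minimal sketch, not the tooling used in this analysis. It assumes statistically independent basic events, so an AND gate multiplies its input probabilities and an OR gate returns one minus the product of the complements. The class names, the two-branch "loss of braking" tree and all probabilities are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class BasicEvent:
    """Leaf of the fault tree with an assumed occurrence probability."""
    name: str
    probability: float

    def evaluate(self) -> float:
        return self.probability


@dataclass
class Gate:
    """AND/OR gate; inputs are assumed statistically independent."""
    name: str
    kind: str  # "AND" or "OR"
    children: list = field(default_factory=list)

    def evaluate(self) -> float:
        probs = [child.evaluate() for child in self.children]
        if self.kind == "AND":
            # Output fails only if every input fails.
            return prod(probs)
        if self.kind == "OR":
            # Output fails if at least one input fails.
            return 1.0 - prod(1.0 - p for p in probs)
        raise ValueError(f"unknown gate kind: {self.kind}")


# Hypothetical tree: braking is lost if both MCUs fail or both actuator windings fail.
loss_of_braking = Gate("loss of braking", "OR", [
    Gate("both MCUs fail", "AND", [
        BasicEvent("primary MCU fault", 1e-3),
        BasicEvent("secondary MCU fault", 1e-3),
    ]),
    Gate("both windings fail", "AND", [
        BasicEvent("winding A open", 2e-3),
        BasicEvent("winding B open", 2e-3),
    ]),
])

print(f"P(top event) = {loss_of_braking.evaluate():.3e}")
```

With these assumed inputs, each redundant branch contributes on the order of 1e-6, and the top event probability comes out at roughly 5e-6. A real fault tree must also account for common-cause failures, which the independence assumption ignores.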

1.2 Identification and Classification of Core Subsystem Failure Modes

The ECS can be decomposed into four core subsystems: Perception, Decision, Execution, and Power Management. Their primary failure modes are summarized below.

Subsystem	Failure Mode Examples	Potential Causes
Perception	LiDAR point cloud loss, camera image distortion, mmWave radar ranging deviation.	Sensor hardware aging, environmental interference (e.g., intense light), data transmission bit errors.
Decision	Path planning conflict, obstacle avoidance logic error, scene misclassification.	Algorithm iteration defects, insufficient computational power, inadequate training data coverage.
Execution	Electro-hydraulic brake jamming, Electric Power Steering (EPS) instability, drive motor control unit torque surge.	Actuator mechanical wear, control signal latency, faults in the motor control unit.
Power Management	Traction battery voltage sag, DC/DC converter efficiency anomaly, low-voltage power supply interruption.	Battery cell imbalance, converter component failure, wiring harness faults.

This classification process also documents the triggering conditions and associated components for each failure mode, providing precise input for targeted risk control.

1.3 Coupled Failure Analysis

Coupled failures manifest as abnormal interactions across subsystems and fall primarily into two categories: cascading failures and interactive coupling failures. Cascading failures exhibit a chain-reaction characteristic. For instance, radar data packet loss in the perception subsystem can lead the decision subsystem to generate an erroneous evasion command, which in turn causes the execution subsystem to over-brake or under-steer, potentially resulting in an accident. Interactive coupling failures arise from resource contention and signal interference when multiple subsystems operate in parallel. For example, computational resource allocation conflicts within the decision subsystem can simultaneously degrade perception data processing efficiency and execution command response rates. Communication bus latency can desynchronize data between the perception, decision, and execution units, disrupting the closed control loop. Hardware-software coupling failures are also significant: an anomaly in a microcontroller's hardware interrupt can disrupt the scheduling of embedded software tasks, causing functional degradation or complete system failure. These coupled failures are more insidious than single-subsystem failures and often require joint simulation techniques for accurate identification.
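Cascading propagation, as in the radar example, can be traced by treating subsystems as nodes in a directed dependency graph and walking that graph breadth-first from the initial fault. The sketch below uses a hypothetical, hand-written graph. It covers only cascading failures. Interactive coupling failures such as resource contention and bus latency depend on timing and load, so they are better examined with joint simulation.

```python
from collections import deque

# Hypothetical dependency graph: an edge A -> B means a failure in A can
# propagate to B. Node names are illustrative only.
PROPAGATES_TO = {
    "radar": ["perception_fusion"],
    "perception_fusion": ["decision_planner"],
    "decision_planner": ["brake_command", "steering_command"],
    "brake_command": ["brake_actuator"],
    "steering_command": ["eps_actuator"],
    "can_bus": ["brake_command", "steering_command"],
}


def cascade(initial_fault: str, graph: dict[str, list[str]]) -> list[str]:
    """Return every node reachable from the initial fault, in breadth-first order."""
    affected = [initial_fault]
    seen = {initial_fault}
    queue = deque([initial_fault])
    while queue:
        node = queue.popleft()
        for downstream in graph.get(node, []):
            if downstream not in seen:
                seen.add(downstream)
                affected.append(downstream)
                queue.append(downstream)
    return affected


print(cascade("radar", PROPAGATES_TO))
```

Starting from "radar", the walk marks the fusion stage, the planner, both command paths and both actuators as potentially affected. This mirrors the chain described in the text.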

1.4 Risk Quantification and Assessment

I employed a hybrid method combining Risk Priority Number (RPN) with Fuzzy Comprehensive Evaluation to mitigate the limitations of a single approach. The RPN is calculated by assessing three factors: Severity (S), Occurrence Probability (O), and Detectability (D).

$$ RPN = S \times O \times D $$

Severity is classified into levels such as life-threatening, severe injury, or minor injury, with reference to relevant safety standards. Occurrence probability is estimated from subsystem reliability test data, supplemented by models such as Markov chains to predict failure rates over different operating mileages. Detectability is rated according to the capability of existing diagnostic technologies. To address the uncertainty in these ratings, I introduced fuzzy logic: expert scores were used to construct fuzzy membership matrices, from which an adjusted final risk level was derived. For example, in a high-speed driving scenario, brake failure in the execution subsystem might yield an RPN of 120, placing it in the unacceptable-risk category. Conversely, a minor ranging deviation in the perception subsystem during low-speed operation might have an RPN of 16, which is deemed acceptable. This quantitative assessment establishes the priority of every risk item, so that design and validation resources are concentrated on the highest risks. The assessment of the motor control unit within the execution subsystem is particularly critical, as its failure directly affects vehicle dynamics and safety.
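The two-stage scoring can be sketched as follows. The RPN part follows the formula above and assumes the common FMEA convention of rating S, O and D on 1 to 10 scales. The text does not specify the fuzzy composition operator. This sketch therefore assumes the weighted-average operator B = W · R, followed by the maximum-membership principle. The four grade labels, the factor weights and the membership values are all hypothetical.

```python
RISK_GRADES = ("low", "medium", "high", "unacceptable")  # hypothetical grade labels


def rpn(severity: int, occurrence: int, detection: int) -> int:
    """RPN = S x O x D, with each factor assumed to be rated on a 1-10 scale."""
    for value in (severity, occurrence, detection):
        if not 1 <= value <= 10:
            raise ValueError("ratings are assumed to lie on a 1-10 scale")
    return severity * occurrence * detection


def fuzzy_evaluate(weights: list[float], membership: list[list[float]]) -> tuple[list[float], str]:
    """Weighted-average fuzzy comprehensive evaluation B = W . R.

    weights: one weight per factor (S, O, D).
    membership: one row per factor; entry j is the degree to which the expert
    panel placed that factor in RISK_GRADES[j].
    """
    n_grades = len(membership[0])
    b = [sum(w * row[j] for w, row in zip(weights, membership)) for j in range(n_grades)]
    total = sum(b)
    if total == 0:
        raise ValueError("membership matrix carries no information")
    b = [x / total for x in b]  # normalise so the degrees sum to 1
    # Maximum-membership principle: report the grade with the largest degree.
    grade = RISK_GRADES[max(range(n_grades), key=b.__getitem__)]
    return b, grade


# Illustrative inputs only; not data from the study.
weights = [0.5, 0.3, 0.2]
membership = [
    [0.0, 0.1, 0.3, 0.6],  # severity
    [0.1, 0.4, 0.4, 0.1],  # occurrence
    [0.2, 0.5, 0.2, 0.1],  # detectability
]
degrees, grade = fuzzy_evaluate(weights, membership)
print(rpn(8, 5, 3), degrees, grade)
```

With these illustrative inputs, `rpn(8, 5, 3)` returns 120, which matches the magnitude of the brake-failure example. The split 8 × 5 × 3 is only one of several combinations that give 120 and is not taken from the study. The fuzzy evaluation places its largest degree on the "unacceptable" grade.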

2 Optimization Design Scheme for Autonomous Vehicle ECS Based on Functional Safety

2.1 Safety Design Principles

The safety design for the ECS is guided by four core principles derived from ISO 26262. First, the Functional Safety-Oriented Principle dictates that all design activities target the high-risk items identified during the risk assessment, such as brake failure or perception data loss. Second, the Redundancy Design Principle mandates the use of “primary-backup” or “multi-modal heterogeneous backup” architectures for components whose single-point failure could lead to system catastrophe. Third, the Fault Detectability and Controllability Principle requires embedding diagnostic mechanisms throughout the system to ensure fault identification within a stringent timeframe (e.g., 50 ms) and the activation of predefined fallback strategies to transition the system to a safe state. Fourth, the Compatibility Principle ensures the design withstands environmental disturbances (e.g., high temperature, EM radiation) and provides interfaces for future algorithm and hardware upgrades. These principles form a coherent design logic: risk targeting, redundancy for fault tolerance, fault management, and evolution readiness. Their implementation is validated through cross-disciplinary design reviews involving hardware engineers, software algorithm developers, and functional safety experts.
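The detect-then-fallback logic of the third principle can be expressed as a small state machine. This is a hypothetical sketch. The three states, the rule that a second fault in the degraded state forces the safe state, and the latching of the safe state are all assumptions. Only the 50 ms detection budget comes from the text. That budget is checked here on the verification side, by comparing fault injection time with detection time, because a running system cannot observe the true moment a fault begins.

```python
from enum import Enum


class State(Enum):
    NOMINAL = 1
    DEGRADED = 2    # fallback strategy active with reduced capability
    SAFE_STATE = 3  # e.g. a minimal-risk manoeuvre; assumed latched until service


DETECTION_BUDGET_MS = 50.0  # example detection deadline quoted in the text


def next_state(state: State, fault_detected: bool, fallback_available: bool) -> State:
    """One step of a hypothetical fault-reaction state machine."""
    if state is State.SAFE_STATE or not fault_detected:
        # The safe state is latched; without a newly detected fault nothing changes.
        return state
    if state is State.NOMINAL and fallback_available:
        return State.DEGRADED
    # A fault with no fallback, or a further fault while already degraded.
    return State.SAFE_STATE


def detection_within_budget(fault_injected_ms: float, fault_detected_ms: float) -> bool:
    """Verification-side check: was an injected fault detected in time?"""
    return 0.0 <= fault_detected_ms - fault_injected_ms <= DETECTION_BUDGET_MS
```

Keeping the budget check outside the state machine means the same runtime logic can be exercised unchanged by fault-injection tests.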

2.2 Hardware Layer Safety Design

My hardware layer design focuses on reinforcing high-risk components. The perception unit employs a heterogeneous redundant architecture with triple-sensor fusion (LiDAR, mmWave Radar, Camera), synchronized by a hardware-level data synchronization module. If the LiDAR fails, the system seamlessly switches to fused data from the radar and camera with a latency under 10 ms. Each sensor incorporates a built-in self-diagnostic module.
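The LiDAR fallback described above amounts to selecting fusion inputs from per-sensor health flags. In the sketch below, `SensorHealth` and `plan_fusion` are hypothetical names. The health flags are assumed to come from each sensor's self-diagnostic module. The rule that fused output counts as redundant only while at least two modalities remain healthy is an assumption, not part of the described design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorHealth:
    lidar_ok: bool   # assumed to be reported by each sensor's self-diagnostics
    radar_ok: bool
    camera_ok: bool


def plan_fusion(health: SensorHealth) -> tuple[tuple[str, ...], bool]:
    """Return the healthy fusion inputs and whether coverage is still redundant."""
    healthy = tuple(
        name
        for name, ok in (
            ("lidar", health.lidar_ok),
            ("radar", health.radar_ok),
            ("camera", health.camera_ok),
        )
        if ok
    )
    # Assumption: redundancy requires at least two independent modalities.
    return healthy, len(healthy) >= 2


print(plan_fusion(SensorHealth(lidar_ok=False, radar_ok=True, camera_ok=True)))
```

With the LiDAR flagged as failed, fusion continues on radar and camera. If only one modality remains, the `False` flag could be used to request a degraded driving mode.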

For the execution subsystem, the brake control unit uses a dual heterogeneous Microcontroller Unit (MCU) architecture. The primary MCU generates brake commands during normal operation, while the secondary MCU continuously verifies the primary's calculations via a hardware comparator. If the command deviation exceeds a threshold (e.g., 0.5%), the secondary MCU immediately assumes control authority. The electro-hydraulic brake actuator features a dual-winding design with independent power circuits. Current sensors monitor each winding; if one fails, the other automatically increases its drive current to maintain braking performance. The power management subsystem adopts a dual-supply architecture (traction battery + low-voltage battery) with redundant DC/DC converters. This keeps the voltage deviation below 0.1 V and integrates over-voltage/over-current protection hardware with a response time of ≤ 2 ms.
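The dual-MCU cross-check can be sketched as an arbiter that compares the two channels' brake commands against the 0.5% example threshold. The sketch makes three assumptions:

- Deviation is measured relative to the larger command magnitude.
- The switch to the secondary channel is latched, with no automatic hand-back.
- As the text describes, the secondary channel is trusted when the two disagree.

`BrakeArbiter` and `relative_deviation` are hypothetical names. In the described design the comparison is performed by a hardware comparator, not by software.

```python
DEVIATION_THRESHOLD = 0.005  # 0.5 %, the example threshold from the text


def relative_deviation(primary: float, secondary: float) -> float:
    """Disagreement between channels, relative to the larger command magnitude."""
    scale = max(abs(primary), abs(secondary))
    if scale == 0.0:
        return 0.0
    return abs(primary - secondary) / scale


class BrakeArbiter:
    """Hypothetical software model of the primary/secondary hand-over."""

    def __init__(self) -> None:
        self.active = "primary"

    def arbitrate(self, primary_cmd: float, secondary_cmd: float) -> float:
        # Commands are brake requests in arbitrary units (assumed).
        if self.active == "primary" and relative_deviation(primary_cmd, secondary_cmd) > DEVIATION_THRESHOLD:
            self.active = "secondary"  # latched: no automatic hand-back
        return primary_cmd if self.active == "primary" else secondary_cmd


arbiter = BrakeArbiter()
print(arbiter.arbitrate(100.0, 100.3), arbiter.active)
print(arbiter.arbitrate(100.0, 101.0), arbiter.active)
```

A disagreement of 0.3 in 100 keeps control on the primary channel. A disagreement of 1 in 101 (about 0.99%) hands control to the secondary channel for all subsequent cycles.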

2.3 Software Layer Safety Design

The software architecture is built on modularity and fault tolerance. Perception software implements a multi-source data validation mechanism. LiDAR point cloud data must match visual feature points from camera images; if the matching score falls below a threshold (e.g., 85%), a filtering algorithm rejects the anomalous data. A Kalman filter is employed to predict vehicle state and correct for transient sensor errors:

$$ \hat{x}_{k|k-1} = F_k \hat{x}_{k-1|k-1} + B_k u_k $$
$$ P_{k|k-1} = F_k P_{k-1|k-1} F_k^T + Q_k $$
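The two equations above cover only the prediction step. Correcting transient sensor errors also requires the standard update step, in which the Kalman gain K_k = P_{k|k-1} H_k^T (H_k P_{k|k-1} H_k^T + R_k)^{-1} blends the prediction with a measurement z_k. The sketch below is a scalar special case in which every matrix is reduced to a number. The measurement model H_k and noise R_k are assumptions introduced here, and the noise values are illustrative. The sketch also applies the 85% matching threshold: when a measurement is rejected, the update is skipped and the filter coasts on its prediction.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Scalar Kalman filter; each matrix of the equations above becomes a number."""
    x: float          # state estimate, x_{k-1|k-1}
    p: float          # estimate variance, P_{k-1|k-1}
    f: float = 1.0    # state transition F_k
    b: float = 0.0    # control gain B_k
    q: float = 0.01   # process noise Q_k (illustrative)
    h: float = 1.0    # measurement model H_k (assumed)
    r: float = 0.25   # measurement noise R_k (assumed)

    def predict(self, u: float = 0.0) -> None:
        # x_{k|k-1} = F x_{k-1|k-1} + B u ;  P_{k|k-1} = F P F^T + Q
        self.x = self.f * self.x + self.b * u
        self.p = self.f * self.p * self.f + self.q

    def update(self, z: float) -> None:
        # K = P H^T (H P H^T + R)^-1 ; x_{k|k} = x + K (z - H x) ; P_{k|k} = (1 - K H) P
        s = self.h * self.p * self.h + self.r
        k = self.p * self.h / s
        self.x = self.x + k * (z - self.h * self.x)
        self.p = (1.0 - k * self.h) * self.p


MATCH_THRESHOLD = 0.85  # example LiDAR/camera matching threshold from the text


def filter_step(kf: ScalarKalman, z: float, match_score: float, u: float = 0.0) -> float:
    """Predict, then correct only if the measurement passed cross-validation."""
    kf.predict(u)
    if match_score >= MATCH_THRESHOLD:
        kf.update(z)
    # A rejected measurement leaves the filter coasting on its prediction.
    return kf.x


kf = ScalarKalman(x=0.0, p=1.0)
print(filter_step(kf, 1.0, match_score=0.90))    # accepted: estimate moves toward 1.0
print(filter_step(kf, 100.0, match_score=0.50))  # rejected outlier: estimate unchanged
```

Skipping the update rather than clamping the measurement keeps the variance growing while data are rejected. The filter therefore trusts the next valid measurement more strongly.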

Decision software uses a scenario-classified modular architecture. Independent decision logic is predefined for scenarios like urban roads and highways, with a scene identification module enabling real-time switching. Reinforcement learning is integrated to enhance decision-making adaptability in edge cases. Execution software employs a dual-command verification mechanism. Brake or steering commands must be computed independently by the main and redundant programs; only consistent outputs are sent to the actuator. A Real-Time Operating System (RTOS) with priority-based task scheduling is used, assigning the highest priority to brake control tasks. A watchdog timer (e.g., 10 ms period) monitors for software hangs, triggering a reset upon timeout. Data integrity is ensured using the CRC-32 algorithm, with retransmission initiated upon checksum failure.
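The dual-command check and the CRC-32 retransmission rule can be sketched together. The sketch assumes the following:

- Commands are floating-point values.
- The agreement tolerance of 1e-3 is hypothetical, since the text says only that outputs must be consistent.
- The frame layout (an 8-byte little-endian float followed by a 4-byte CRC) is invented for illustration.
- The channel is any callable that returns the frame, possibly corrupted.

Python's `zlib.crc32` computes the standard CRC-32.

```python
import struct
import zlib
from typing import Callable, Optional

AGREEMENT_TOLERANCE = 1e-3  # hypothetical tolerance between main and redundant outputs
FRAME_LEN = 12              # 8-byte float64 payload + 4-byte CRC-32 (assumed layout)


def verified_command(main_cmd: float, redundant_cmd: float) -> Optional[float]:
    """Release a command only when both independently computed values agree."""
    if abs(main_cmd - redundant_cmd) <= AGREEMENT_TOLERANCE:
        return main_cmd
    return None  # inconsistent outputs are withheld from the actuator


def encode_frame(command: float) -> bytes:
    """Pack the command as a little-endian float64 followed by its CRC-32."""
    payload = struct.pack("<d", command)
    return payload + struct.pack("<I", zlib.crc32(payload))


def decode_frame(frame: bytes) -> Optional[float]:
    """Return the command if length and checksum are valid, otherwise None."""
    if len(frame) != FRAME_LEN:
        return None
    payload, trailer = frame[:8], frame[8:]
    (received_crc,) = struct.unpack("<I", trailer)
    if zlib.crc32(payload) != received_crc:
        return None
    (command,) = struct.unpack("<d", payload)
    return command


def send_with_retry(command: float, channel: Callable[[bytes], bytes], max_attempts: int = 3) -> Optional[float]:
    """Retransmit until a frame passes the CRC check or attempts run out."""
    for _ in range(max_attempts):
        received = decode_frame(channel(encode_frame(command)))
        if received is not None:
            return received
    return None
```

CRC-32 detects every single-bit error, so a frame with one flipped bit is rejected and the command is resent. In a real ECU the retry count would also be bounded by the task's deadline.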

2.4 System Layer Safety Design

System-layer design focuses on managing coupled failures through a three-tier “Domain Control – Coordination – Monitoring” architecture. A Domain Controller (DC) architecture partitions the system into Perception, Decision, Execution, and Power domains. High-bandwidth Ethernet facilitates inter-domain data exchange, while intra-domain control commands use CAN FD to reduce cross-domain interference. A data buffering module between the Perception and Decision domains mitigates synchronization issues caused by transmission delays, with buffer time dynamically adjusted per scenario (e.g., compressed to <5 ms for high-speed driving). A cross-domain cooperative control mechanism is established. If one domain experiences a minor failure, domains negotiate to adjust control strategies—for example, the Decision domain may automatically increase the safe distance if the Perception domain reports a ranging deviation. A central Functional Safety Monitoring module collects real-time parameters (CPU load, memory usage, response time) from all domain controllers. Preset thresholds determine system status; if resource conflicts are detected, a scheduling algorithm is triggered to re-allocate computational power. For communication, an Ethernet and CAN FD redundant design is implemented for critical commands, which are transmitted simultaneously on both buses. The receiving end uses a “first-come, first-served” principle to select valid commands, preventing loss due to bus failure. This architecture ensures robust coordination, especially for critical signals from the vehicle’s various motor control unit instances.
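First-come, first-served selection across the redundant Ethernet and CAN FD paths needs a way to recognise the second copy of a command. The sketch below makes two assumptions:

- The sender stamps each critical command with a sequence counter. This is a hypothetical field, as the text does not specify one.
- Each copy has already passed its integrity check before reaching `receive()`.

The receiver accepts whichever copy arrives first and drops the later duplicate. If one bus fails silently, the copy on the other bus is used.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CriticalCommand:
    seq: int      # hypothetical per-command sequence counter set by the sender
    payload: str
    bus: str      # "ethernet" or "canfd"; kept only for diagnostics


@dataclass
class RedundantReceiver:
    """Accept the first valid copy of each command and drop later duplicates."""
    window: int = 256  # number of recent sequence numbers to remember
    _seen: set = field(default_factory=set, init=False)
    _order: deque = field(default_factory=deque, init=False)

    def receive(self, cmd: CriticalCommand) -> bool:
        """Return True if this copy is accepted, False if it is a late duplicate."""
        if cmd.seq in self._seen:
            return False
        self._seen.add(cmd.seq)
        self._order.append(cmd.seq)
        if len(self._order) > self.window:
            # Forget the oldest sequence number so memory stays bounded.
            self._seen.discard(self._order.popleft())
        return True


rx = RedundantReceiver()
print(rx.receive(CriticalCommand(1, "brake 30%", "canfd")))     # first copy accepted
print(rx.receive(CriticalCommand(1, "brake 30%", "ethernet")))  # duplicate dropped
```

The bounded window is a design choice for wrapping sequence counters. It must be long enough that a delayed duplicate cannot arrive after its number has been forgotten.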

3 Functional Safety Verification and Assessment of the Autonomous Vehicle ECS

3.1 Multi-Dimensional Verification Methodology

I established a closed-loop verification system encompassing four layers: Hardware, Software, System, and Vehicle. The methods and key technologies for each layer are summarized below.

Verification Layer	Method	Key Technology / Equipment	Objective
Hardware	Environmental Stress Screening & Fault Injection	Climate Chambers, Signal Generators (e.g., Tektronix AWG), High-speed Oscilloscopes	Validate component robustness and redundancy switch-over timing (e.g., dual-MCU).
Software	Model-in-the-Loop (MIL) & Software-in-the-Loop (SIL)	MATLAB/Simulink, Test Case Library (2000+ fault scenarios), Embedded Code Emulation	Test algorithm fault tolerance, task scheduling, and watchdog logic under CPU load.
System	Hardware-in-the-Loop (HIL)	dSPACE SCALEXIO, Prescan, CANoe, Physical Domain Controllers & Actuators	Validate system integration, domain coordination, and coupled failure responses.
Vehicle	Real-world Driving Tests	Data Loggers (e.g., Vector VN1630), Designated Test Routes	Assess overall system performance under real environmental and traffic conditions.

Specific tests included injecting fault signals (command deviation, power fluctuation) into the dual-MCU brake controller, simulating CPU saturation at 95% load in SIL, and creating bus delay and coupled failure scenarios in HIL. Real-vehicle testing covered 128,000 km across six typical operational design domains (ODDs), including adverse weather conditions, with each test repeated five times to reduce the influence of random error.

3.2 Verification Results and Assessment

The verification data demonstrates significant improvement in system robustness. Hardware-layer tests showed maximum sensor data deviation under extreme environments was 1.8%, below the 2% design threshold. The fault response time for the dual-MCU brake control unit was 3.2 ms, and brake force decay during actuator winding switchover was less than 3%. In software MIL tests, the perception data validation algorithm achieved a 98.5% anomaly rejection rate, and the reinforcement learning decision logic showed no conflicts in sudden cut-in scenarios. SIL testing confirmed the brake control task response time stabilized at 0.8 ms, with a 100% watchdog timer reset success rate. System-level HIL tests revealed an inter-domain data synchronization error of 2.7 ms, zero loss of critical commands thanks to bus redundancy, and a 99.2% accuracy rate for cross-domain cooperative strategies under coupled failure scenarios. Real-vehicle testing yielded a perception fusion accuracy of 95.3% in rain/fog and a braking distance deviation of only 0.42 m in high-speed scenarios. The comprehensive assessment confirms that the core system metrics satisfy the stringent requirements of ISO 26262 ASIL-D. A noted area for future optimization is the increased LiDAR point cloud matching latency (up to 12 ms) under extreme torrential rain, which could be addressed by enhanced filtering algorithms. Overall, the verification data comprehensively covers the 16 high-risk items identified earlier, providing a reliable quantitative basis for the system’s safe mass production.

4 Conclusion

This research has established a complete framework of risk analysis, design optimization, and verification assessment centered on the functional safety of the autonomous vehicle Electronic Control System. Precise hazard identification, targeted design enhancements and rigorous multi-dimensional validation together lay a solid foundation for system safety. The design enhancements centre on redundant hardware architectures such as the dual-MCU brake control unit, robust software layers, and a cohesive domain-based system architecture. However, technology is perpetually evolving, and new application scenarios will invariably introduce novel safety challenges. Therefore, future work must focus on integrating emerging technologies like artificial intelligence to deepen research into dynamic risk assessment, further strengthening the system's adaptive safety capabilities. This continuous effort is essential to fortify the safety bedrock for the high-quality development and widespread adoption of autonomous vehicles.