FPGA Implementation of a Low-Redundancy Communication Link for Battery Management Systems

The relentless advancement of new energy vehicles and large-scale energy storage systems has placed unprecedented demands on the accuracy, reliability, and real-time performance of Battery Management Systems (BMS). As the central nervous system monitoring and controlling battery packs, the BMS is responsible for critical tasks such as state-of-charge (SoC) estimation, state-of-health (SoH) monitoring, cell balancing, and thermal management. The efficacy of these functions is fundamentally dependent on the underlying communication architecture that gathers sensor data from potentially hundreds of individual battery cells.

Traditional communication architectures within BMS, such as Controller Area Network (CAN) buses, offer high reliability but can introduce significant wiring complexity and cost, especially in large battery packs. To address this, the daisy-chain topology has emerged as a mainstream solution for cost-sensitive applications. By serially connecting slave devices (typically Analog Front-End, or AFE, chips) and sharing a single communication bus, the daisy-chain drastically reduces the required wire harness, simplifying system integration. This topology is often combined with the Serial Peripheral Interface (SPI) protocol, forming what is commonly termed an ISO-SPI or hybrid architecture.

However, a critical bottleneck persists in these conventional daisy-chain implementations: communication efficiency. To ensure data integrity over the potentially long and noisy chain, robust error detection mechanisms are employed, most notably the Cyclic Redundancy Check (CRC). In standard implementations, a dedicated CRC checksum is appended to the data payload from each individual slave device in the chain. While effective for error detection, this approach introduces substantial protocol overhead. As documented in the communication protocols of commercial BMS AFE chips, this overhead—comprising frame headers, frame tails, and repeated CRC fields—can account for over 28% of the total transmitted frame. Consequently, the effective data throughput can drop below 75% of the theoretical bandwidth. This inefficiency becomes severely limiting in large-scale BMS applications with 64, 128, or even 256 serially connected nodes, where long frame lengths lead to increased communication latency and reduced sampling rates for vital battery parameters.

This article presents a novel solution to this challenge: an optimized, low-redundancy communication link for daisy-chain based Battery Management Systems. The core innovation lies in a dynamic frame structure coupled with an Incremental Regenerative CRC algorithm. This scheme eliminates the need for multiple, discrete CRC fields by generating a single, cumulative checksum that validates the entire data stream from all slave nodes. The proposed architecture is designed and implemented on an FPGA platform, demonstrating a significant reduction in frame length while maintaining robust error detection capabilities, thereby enhancing the overall efficiency and scalability of BMS communication.

Challenges in Conventional BMS Daisy-Chain Communication

The daisy-chain architecture, while advantageous for hardware simplification, imposes specific constraints on the communication protocol. The standard data frame format for a read operation in a traditional ISO-SPI-based BMS is described below. Each transmission is a structured sequence of fields designed to ensure reliable command delivery and data retrieval.

The frame begins with a Start Frame, a unique synchronization pattern. This is followed by a Command Frame (e.g., 8 bits) specifying the operation (Read/Write). Next, an Address Frame (e.g., 16 bits) identifies the target chip and register. The Data Frame length is variable. In a broadcast read command, each slave device in the chain sequentially appends its own data (e.g., 16 bits of voltage data) to this field as the frame propagates back to the master. Crucially, after its data segment, each slave also appends a CRC checksum calculated solely on its own 16-bit data payload. Finally, an End Frame signal terminates the transmission.

The total frame length $ L_{traditional} $ for a system with $ N $ slave nodes can be expressed as:
$$ L_{traditional} = F_{start} + F_{cmd} + F_{addr} + N \times D_{slave} + N \times C_{CRC} + F_{end} $$
where:

$ F_{start}, F_{cmd}, F_{addr}, F_{end} $ are the fixed lengths of the start, command, address, and end fields.
$ D_{slave} $ is the data payload per slave (e.g., 16 bits = 2 bytes).
$ C_{CRC} $ is the CRC checksum length per slave (e.g., 16 bits = 2 bytes).

For a typical setup using a 16-bit CRC, the overhead is severe. The following table compares the structure for a single node versus a multi-node system, clearly showing the linear growth of CRC overhead.

Field	Size (Bytes) – 1 Slave	Size (Bytes) – N Slaves	Description
Start Frame	1	1	Synchronization pattern.
Command Frame	1	1	Read/Write command.
Address Frame	2	2	Chip and register address.
Data Frame	2	N × 2	Accumulated data from slaves.
CRC Fields	2	N × 2	Individual CRC per slave data.
End Frame	1	1	End-of-frame delimiter.
Total Length	9 bytes	5 + (N × 4) bytes

As shown, for N=4 slaves, the traditional frame is 21 bytes long, with 8 bytes (38%) dedicated solely to CRC. For N=16, it grows to 69 bytes, with 32 bytes (46%) as CRC overhead. This structure is the root cause of the inefficiency plaguing large-scale BMS deployments, limiting the speed and responsiveness of the battery management system.
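The figures above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the example field sizes from the table; the function names `traditional_frame_bytes` and `crc_share` are illustrative and do not come from any BMS library.

```python
def traditional_frame_bytes(n_slaves, start=1, cmd=1, addr=2, data=2, crc=2, end=1):
    """Length in bytes of a traditional read frame with one CRC per slave."""
    fixed = start + cmd + addr + end
    return fixed + n_slaves * (data + crc)


def crc_share(n_slaves, crc=2, **sizes):
    """Fraction of the traditional frame occupied by the per-slave CRC fields."""
    return n_slaves * crc / traditional_frame_bytes(n_slaves, crc=crc, **sizes)


for n in (1, 4, 16):
    total = traditional_frame_bytes(n)
    print(f"N={n}: {total} bytes, CRC share {crc_share(n):.1%}")
```

With the default sizes this gives 21 bytes for N=4 and 69 bytes for N=16, and the CRC share rises from about 38% to about 46%, matching the numbers quoted in the text.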

Proposed Low-Redundancy Optimization Scheme

To overcome the limitations of the standard approach, we propose a fundamental redesign of the daisy-chain communication protocol. The core objective is to maintain end-to-end data integrity for the entire stream while eliminating the repetitive CRC fields. This is achieved through a two-pronged strategy: Dynamic Frame Structure Optimization and an Incremental Regenerative CRC algorithm.

Dynamic Frame Structure

The optimized frame collapses the multiple CRC fields into a single, final CRC checksum. In this scheme, slave nodes do not transmit independent CRCs. Instead, they participate in a coordinated, cumulative checksum calculation as the data frame passes through them. The structure of the optimized frame is as follows:

Start Frame: Unchanged synchronization header.
Command & Address Frames: Unchanged.
Consolidated Data Frame: This field contains the concatenated data payloads from all N slave devices, exactly as before. However, no intermediate CRCs are inserted between the data segments.
Single CRC Frame: A single CRC checksum, calculated over the entire Consolidated Data Frame (i.e., all data from all slaves), is appended at the very end of the frame, just before the End Frame.
End Frame: Unchanged.

The length of the optimized frame $ L_{optimized} $ is now:
$$ L_{optimized} = F_{start} + F_{cmd} + F_{addr} + N \times D_{slave} + C_{CRC} + F_{end} $$
Comparing this to the traditional length, the saving is:
$$ \Delta L = L_{traditional} - L_{optimized} = (N - 1) \times C_{CRC} $$
This represents a dramatic reduction in overhead, scaling linearly with the number of nodes.
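As a worked instance of the two length formulas, take the example field sizes used in this article (1-byte start, command, and end fields, a 2-byte address, and 2 bytes each for $ D_{slave} $ and $ C_{CRC} $) with $ N = 4 $. The shorthand $ F_{fixed} $ for the sum of the four fixed fields is introduced here only for compactness:
$$ L_{traditional} = 5 + 4 \times 2 + 4 \times 2 = 21, \quad L_{optimized} = 5 + 4 \times 2 + 2 = 15, \quad \Delta L = (4 - 1) \times 2 = 6 \text{ bytes} $$
The relative saving then follows as
$$ \frac{\Delta L}{L_{traditional}} = \frac{(N - 1) \times C_{CRC}}{F_{fixed} + N \times (D_{slave} + C_{CRC})} \xrightarrow{N \to \infty} \frac{C_{CRC}}{D_{slave} + C_{CRC}} $$
which equals 50% when the data and CRC fields have the same width. The absolute saving therefore grows linearly with $ N $, while the relative saving saturates.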

Incremental Regenerative CRC Algorithm

The simplified frame structure necessitates a new mechanism for CRC generation and verification within the daisy-chain. The key requirement is that each slave must contribute to the final, cumulative CRC without knowing the data from upstream slaves. This is solved by an Incremental Regenerative CRC algorithm executed by a dedicated hardware module in each slave device (Stack chip).

The algorithm operates in two concurrent modes for each slave: Regeneration Mode and Verification Mode.

1. Regeneration Mode (Forward Path): As a slave chip receives the data stream from the slaves downstream of it, it performs a continuous, incremental CRC calculation. It initializes a CRC accumulator with a standard seed value and updates the accumulator for every byte of downstream data received. Crucially, after processing all downstream data, it feeds its own local data payload (e.g., 16-bit voltage) into the same CRC accumulator. The resulting checksum is then a valid CRC for the concatenated data of [All Downstream Slaves’ Data + This Slave’s Data]. This updated checksum is passed upstream to the next slave toward the master, which continues the regeneration process. The slave closest to the master ultimately sends the single, unified CRC for the entire data stream.

The operation for a slave at position $ i $ (where $ i=1 $ is the last slave, farthest from the master, and $ i=N $ is the first) can be written as follows. Let $ D_{i} $ be the local data of slave $ i $, and $ Stream_{i-1} $ be the data stream received from slave $ i-1 $ (which includes data from slaves 1 to $ i-1 $). The regenerated CRC $ CRC_{i} $ sent upstream by slave $ i $ is:
$$ CRC_{i} = \text{CRC\_Function}( Stream_{i-1} \ || \ D_{i} ) $$
where $ || $ denotes concatenation. Because the CRC is computed incrementally, this is equivalent to starting from the accumulator state $ CRC_{i-1} $ reached after processing $ Stream_{i-1} $ and feeding only $ D_{i} $ into it, assuming the transmitted CRC is the raw accumulator state with no final XOR.
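The regeneration step can be sketched in software to show that the checksum leaving the chain equals a CRC over the whole concatenated stream. This is a minimal model, not the authors' hardware: it assumes an MSB-first, non-reflected CRC with polynomial 0x8005, a seed of 0x0000, and no final XOR, so its values need not match those reported for the FPGA. The names `crc16_update` and `regenerate` are illustrative.

```python
POLY = 0x8005  # CRC-16-IBM generator polynomial named in the text
SEED = 0x0000  # assumed seed; the text only says "a standard seed value"


def crc16_update(state, data):
    """Feed bytes into a 16-bit CRC state, MSB first, and return the new state."""
    for byte in data:
        state ^= byte << 8
        for _ in range(8):
            if state & 0x8000:
                state = ((state << 1) ^ POLY) & 0xFFFF
            else:
                state = (state << 1) & 0xFFFF
    return state


def regenerate(downstream_stream, local_data):
    """CRC_i for one slave: CRC over Stream_{i-1} || D_i."""
    state = crc16_update(SEED, downstream_stream)  # downstream data first
    return crc16_update(state, local_data)         # then the local payload


# Three slaves; index 0 is the slave farthest from the master (i = 1).
payloads = [bytes.fromhex("A530"), bytes.fromhex("A520"), bytes.fromhex("A510")]

stream = b""
crc_from_stream = SEED  # each slave recomputes over everything it has seen
crc_from_state = SEED   # each slave only extends the previous accumulator state
for d in payloads:
    crc_from_stream = regenerate(stream, d)
    crc_from_state = crc16_update(crc_from_state, d)
    stream += d

whole_frame_crc = crc16_update(SEED, b"".join(payloads))
print(hex(crc_from_stream), hex(crc_from_state), hex(whole_frame_crc))
```

All three printed values are identical: recomputing over the received stream, extending the previous accumulator state, and computing one CRC over the complete data field give the same result, which is what allows a single trailing CRC to cover every slave's payload.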

2. Verification Mode (Backward Path): To ensure the integrity of the data it has just forwarded, each slave also performs a local verification. After transmitting its own data upstream, the slave will eventually receive the final, unified CRC checksum from the master (which is echoed back down the chain for verification in a subsequent operation or in the same frame’s trailing period, depending on the protocol timing). Alternatively, in a pipelined scheme, it can verify the CRC segment sent by its immediate downstream neighbor. The slave independently calculates a CRC on the data stream it received from downstream and compares it with the CRC value it received. A mismatch flags a transmission error originating from a downstream node.

The process for slave $ i $ verifying the data from slave $ i-1 $ involves:
$$ CRC_{calc} = \text{CRC\_Function}( Stream_{i-1} ) $$
$$ \text{Error Flag} = (CRC_{calc} \neq \text{Received } CRC_{i-1}) $$
This dual-mode operation ensures that every segment of the data path is validated, preserving the robust error detection of the traditional method while using only one CRC field.
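The verification check can be illustrated with a short sketch. It assumes the same CRC parameters as an MSB-first CRC-16 with polynomial 0x8005 and a zero seed (the seed and bit order are assumptions, not taken from the source), and the helper names `crc16` and `error_flag` are hypothetical.

```python
def crc16(data, state=0x0000, poly=0x8005):
    """Bitwise MSB-first CRC-16 over a byte string (assumed seed 0x0000)."""
    for byte in data:
        state ^= byte << 8
        for _ in range(8):
            if state & 0x8000:
                state = ((state << 1) ^ poly) & 0xFFFF
            else:
                state = (state << 1) & 0xFFFF
    return state


def error_flag(stream, received_crc):
    """True when the locally calculated CRC differs from the received one."""
    return crc16(stream) != received_crc


stream = bytes.fromhex("A520A530")   # data received from downstream
received_crc = crc16(stream)         # CRC as the downstream node would send it

# Flip one bit in the first byte to emulate a transmission error.
corrupted = bytes([stream[0] ^ 0x01]) + stream[1:]

print(error_flag(stream, received_crc))     # False: stream and CRC agree
print(error_flag(corrupted, received_crc))  # True: the bit flip is detected
```

Any single-bit error changes the CRC, so the flag is raised regardless of which bit of the downstream stream is corrupted.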

System Architecture and FPGA Implementation

The proposed low-redundancy communication link was realized using a hierarchical digital design targeted for FPGA implementation. The overall system architecture for the Battery Management System (BMS) communication network consists of four key components:

Component	Role	Key Function
Host/Upper Computer	System Controller	Issues high-level commands (e.g., start logging, change parameters) via USART to the MCU and receives aggregated battery data.
Microcontroller (MCU)	Protocol Master & Data Hub	Acts as the SPI master. Parses host commands, formats SPI frames for the daisy-chain, receives and processes the aggregated data stream from the chain, and relays it to the Host.
Base Chip (Master Node)	Signal Interface Converter	Implemented on the primary FPGA. Converts standard SPI signals from the MCU into robust differential signals (e.g., ISO-SPI) for the daisy-chain and vice-versa. Manages the top-level communication timing.
Stack Chips (Slave Nodes)	Data Acquisition & Communication Node	Implemented on identical FPGAs acting as AFE substitutes. Each has a unique address, measures simulated battery cell voltages, and contains the core Incremental Regenerative CRC module. They are connected via differential ports (DC_RX/DC_TX).

The heart of the optimization lies within the design of the CRC module for each Stack chip. The module, named crc_gen, was designed in Verilog HDL with the following specification:

Polynomial: CRC-16-IBM (standard), $ P(x) = x^{16} + x^{15} + x^{2} + 1 $, represented as 0x8005.
Inputs: Local 16-bit data, received 8-bit serial data stream, chip position identifier (chip_count), clock, and control signals.
Outputs: 16-bit regenerated CRC (crc_out), CRC valid signal, and error flag (crc_error).

The internal architecture of the crc_gen module is built around a sub-module crc_16_accumulate. This sub-module performs the fundamental byte-wise CRC calculation with a crucial feature: its internal register retains the CRC state between calculations. This allows for the incremental computation required by the algorithm:
$$ \text{CRC}_{state}(t+1) = f(\text{CRC}_{state}(t), \text{Data}_{byte}(t)) $$
where $ f $ is the standard CRC-16 computation function.
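A software model makes the state-retaining behaviour concrete. The class below is a hedged sketch of what a sub-module like crc_16_accumulate might compute, not a translation of the authors' Verilog; the class name, the zero seed, and the MSB-first bit order are assumptions.

```python
class Crc16Accumulate:
    """Software model of a byte-wise CRC-16 register that keeps its state."""

    def __init__(self, seed=0x0000, poly=0x8005):
        self.poly = poly
        self.seed = seed
        self.state = seed

    def reset(self):
        """Reload the seed, as a hardware reset of the register would."""
        self.state = self.seed

    def step(self, byte):
        """One application of f: CRC_state(t+1) = f(CRC_state(t), Data_byte(t))."""
        state = self.state ^ ((byte & 0xFF) << 8)
        for _ in range(8):
            msb = state & 0x8000
            state = (state << 1) & 0xFFFF
            if msb:
                state ^= self.poly
        self.state = state
        return state


# Feed the same four bytes in one burst and in two separate bursts.
one_shot = Crc16Accumulate()
for b in bytes.fromhex("A520A530"):
    one_shot.step(b)

two_bursts = Crc16Accumulate()
for b in bytes.fromhex("A520"):
    two_bursts.step(b)
# ... an arbitrary pause here does not disturb the retained state ...
for b in bytes.fromhex("A530"):
    two_bursts.step(b)

print(hex(one_shot.state), hex(two_bursts.state))
```

Both accumulators end in the same state, because the register carries the intermediate CRC across the gap between bursts.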

The crc_gen module employs two primary counters to manage the algorithm’s state machine:

Transmit Data Counter (rx_count1): Tracks the amount of data (in bytes) that has been transmitted upstream, including the local data. It determines when the local data should be injected into the CRC accumulator and when the final regenerated CRC for upstream transmission is ready.
Receive Data Counter (rx_count2): Tracks the amount of data received from the downstream slave. It determines when the downstream data segment is complete, triggering the freeze of the local verification CRC value for the subsequent comparison with the received downstream CRC.

The position identifier chip_count is critical. It informs the module whether it is the last slave in the chain (value = 1), the second-last (value = 2), etc. This knowledge dictates its behavior—for example, the last slave knows it receives only raw data from the cell, not a downstream CRC, and thus only performs the regeneration path.
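One way to picture how chip_count could drive the two counters is to derive byte-count thresholds from it. The sketch below is an assumption-laden illustration, not the authors' state machine: it assumes 2 data bytes per slave and that a slave at position chip_count has chip_count - 1 slaves downstream of it. The function name and dictionary keys are hypothetical.

```python
DATA_BYTES_PER_SLAVE = 2  # example payload size used in the text


def counter_thresholds(chip_count):
    """Hypothetical byte-count thresholds for a slave at position chip_count.

    chip_count = 1 is the last slave in the chain, 2 the second-last, and so on.
    """
    if chip_count < 1:
        raise ValueError("chip_count starts at 1 for the last slave")
    downstream_bytes = (chip_count - 1) * DATA_BYTES_PER_SLAVE
    return {
        # The last slave has nothing downstream, so it skips verification.
        "verify_downstream": chip_count > 1,
        # rx_count2: downstream segment complete, freeze the verification CRC.
        "rx_count2_freeze_at": downstream_bytes,
        # rx_count1: downstream data forwarded, inject the local payload.
        "rx_count1_inject_local_at": downstream_bytes,
        # rx_count1: local payload processed, regenerated CRC is ready.
        "rx_count1_crc_ready_at": downstream_bytes + DATA_BYTES_PER_SLAVE,
    }


for position in (1, 2, 3):
    print(position, counter_thresholds(position))
```

For the last slave all downstream thresholds are zero and only the regeneration path is active, which mirrors the behaviour described for chip_count = 1.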

Simulation, Testing, and Performance Analysis

The design underwent rigorous verification through both software simulation and hardware FPGA testing to validate functionality and quantify performance gains for the Battery Management System (BMS).

Functional Simulation

Using Synopsys VCS and Verdi tools, a testbench was created to simulate a system with one Base and three Stack chips (S1, S2, S3). A broadcast read command was issued. Each Stack was programmed with dummy voltage data: S1=0xA510, S2=0xA520, S3=0xA530. The simulation traced the operation of the Incremental Regenerative CRC module in Stack S1, the Stack chip closest to the Base, which receives the data of S2 and S3.

The results confirmed correct operation of the dual-mode algorithm:

Regeneration Path: S1 correctly computed the cumulative CRC over the concatenated data {S2_Data, S3_Data, S1_Data} = {0xA520, 0xA530, 0xA510}, producing a final output of 0x812E to send upstream.
Verification Path: S1 independently calculated the CRC for the received downstream data {S2_Data, S3_Data} = {0xA520, 0xA530}, resulting in 0x1A42. The simulation showed this value being correctly compared against the CRC value (0x1A42) later received from S2/S3, generating a “no error” condition.

FPGA Hardware Validation

A physical prototype was built using five Cyclone IV EP4CE10F17C8 FPGA boards. One board served as the Base chip, four as Stack chips. An STM32F103C8T6 microcontroller acted as the system MCU. The boards were connected in a daisy-chain via differential pairs, with the MCU communicating with the Base chip via SPI at 2 Mbps.

In this test, the four Stack chips were programmed with data: 0xA510, 0xA520, 0xA530, and 0xA540. The MCU issued a broadcast read command. Internal logic analyzer (SignalTap) captures from the Base chip and Stack chips confirmed successful end-to-end communication.

The Base chip capture showed the successful reception of the differential data stream and its conversion into a valid SPI data packet for the MCU.
A capture from the first Stack chip (S1) showed its crc_gen module outputting the regenerated CRC 0x58EB (calculated on {S2,S3,S4,S1} data) and successfully performing the downstream data verification, comparing its calculated 0x838A against the received value.

Performance Metrics and Comparative Analysis

The most significant metric is the reduction in frame length. Using the formula and assuming 1-byte Start/Command/End frames, a 2-byte Address field, 2-byte data per slave, and a 2-byte CRC:

Number of Slaves (N)	Traditional Frame Length	Optimized Frame Length	Reduction in Bytes	Percentage Reduction
1	9 bytes	9 bytes	0	0%
4	5 + (4×4) = 21 bytes	5 + (4×2) + 2 = 15 bytes	6 bytes	28.6%
8	5 + (8×4) = 37 bytes	5 + (8×2) + 2 = 23 bytes	14 bytes	37.8%
16	5 + (16×4) = 69 bytes	5 + (16×2) + 2 = 39 bytes	30 bytes	43.5%
32	5 + (32×4) = 133 bytes	5 + (32×2) + 2 = 71 bytes	62 bytes	46.6%

For the tested 4-node system, the frame length was reduced from a traditional 19 bytes (using slightly different field sizes, as in the original paper) to 13 bytes, a 31.58% reduction. The 6-byte saving equals the $ (N-1) \times C_{CRC} = 3 \times 2 $ bytes predicted by the formula and matches the simulation. The efficiency gain increases with the scale of the BMS.
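The comparison table can be regenerated programmatically, which is convenient when other chain lengths or field sizes are of interest. This sketch hard-codes the example sizes assumed in this section; the constants and function names are illustrative.

```python
FIXED = 5  # start (1) + command (1) + address (2) + end (1) bytes
DATA = 2   # data bytes per slave
CRC = 2    # CRC bytes per checksum field


def frame_lengths(n):
    """Return (traditional, optimized) frame lengths in bytes for n slaves."""
    traditional = FIXED + n * (DATA + CRC)
    optimized = FIXED + n * DATA + CRC
    return traditional, optimized


def reduction_row(n):
    """Return (n, traditional, optimized, bytes saved, percentage saved)."""
    traditional, optimized = frame_lengths(n)
    saved = traditional - optimized
    return n, traditional, optimized, saved, 100.0 * saved / traditional


for n in (1, 4, 8, 16, 32):
    print("{:>3} {:>4} {:>4} {:>3} {:5.1f}%".format(*reduction_row(n)))
```

The rows agree with the table: no saving for a single slave, and a percentage reduction that climbs toward 50% as the chain grows.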

Reliability Testing: The FPGA system was stressed under continuous operation for 24 hours at 2 Mbps. The results confirmed robust performance:

Packet Loss Rate: 0%.
Bit Error Rate (BER) under normal conditions: < 1×10⁻⁴.
BER with induced 50 mV white noise on the differential links: ~8×10⁻³. While higher, this error rate is reported to remain within acceptable bounds for many automotive safety integrity level (ASIL) B applications, with the CRC-16 check over the complete dataset still providing error detection under noise.

Conclusion and Future Work

This article has presented the design, implementation, and validation of a novel low-redundancy communication link for daisy-chain based Battery Management Systems. By identifying the protocol overhead in traditional multi-CRC schemes as a major bottleneck for scalable BMS, we proposed an optimized solution centered on an Incremental Regenerative CRC algorithm and a dynamic frame structure.

The FPGA-based implementation proves the concept’s feasibility and tangible benefits. The key achievement is a significant reduction in communication frame length—over 31% for a small chain and theoretically approaching 50% for large-scale systems—directly translating to higher effective data throughput and lower latency for cell parameter monitoring. This enhancement is achieved without compromising the robust error-detection capabilities essential for the safety-critical nature of Battery Management System operation. The dual-mode CRC operation within each node ensures continuous data path validation, maintaining system integrity.

The proposed architecture offers a practical, efficient, and scalable communication backbone for next-generation BMS, particularly suited for electric vehicles and large energy storage systems where hundreds of cells must be monitored in real-time. Future work may focus on further optimizations, such as adapting the algorithm for other error-correcting codes, implementing the design in a low-power ASIC for integration into commercial BMS AFE chips, and exploring hybrid topologies that combine the efficiency of this daisy-chain protocol with the fault tolerance of redundant network structures.