Dependence Analysis and Structured Construction for Batched Sparse Code

Jiaxin Qing, Xiaohong Cai, Yijun Fan, Mingyang Zhu, and Raymond W. Yeung

J. Qing, X. Cai, and Y. Fan are with the Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong SAR (emails: {jqing, cx021, fy022}@ie.cuhk.edu.hk). M. Zhu is with the Institute of Network Coding, The Chinese University of Hong Kong (email: [email protected]). R. W. Yeung is with the Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong SAR. R. W. Yeung is also with the Institute of Network Coding, The Chinese University of Hong Kong, and he is also a Principal Investigator of the Centre for Perceptual and Interactive Intelligence (CPII) Limited (email: [email protected]). The work of R. W. Yeung was supported in part by a fellowship award from the Research Grants Council of the Hong Kong Special Administrative Region, China under Grant CUHK SRFS2223-4S03.

Abstract

In coding theory, codes are usually designed with a certain level of randomness to facilitate analysis and accommodate different channel conditions. However, the resulting randomly constructed code can be suboptimal in practical implementations. Represented by a bipartite graph, the Batched Sparse Code (BATS Code) is a randomly constructed erasure code that utilizes network coding to achieve near-optimal performance in wireless multi-hop networks. Previous performance analyses of the BATS code implicitly assume that the coded batches are independent. This assumption holds only asymptotically, as the number of input symbols tends to infinity; it does not generally hold in practical settings where the number of input symbols is finite, especially when the code is constructed randomly. We show that dependence among the batches significantly degrades the code's performance. To control the batch dependence through graphical design, we propose constructing the BATS code in a structured manner. In particular, we propose a hardware-friendly structured BATS code, called the Cyclic-Shift BATS (CS-BATS) code, which constructs the code from a small base graph using lightweight cyclic-shift operations. We demonstrate that, when the base graph is properly designed, the CS-BATS code achieves a higher decoding rate and lower complexity than the random BATS code.

I Introduction

Sixth-generation (6G) communication is envisioned to be reliable and intelligent, providing seamless connectivity for global computing and broadband coverage. It will be the key infrastructure supporting an even higher density of connections than 5G from a wider variety of devices, such as mobile phones, vehicles, wireless sensors, and other edge devices, creating a massive wireless network in which nodes are connected and communicate through one another [1, 2]. The wireless multi-hop network is a commonly used model for studying data transmission in wireless mesh networks and underpins a wide range of applications, such as integrated ground-air-space networks [3], smart sensing [4], autonomous driving [5, 6], the Internet of Things [7], and integrated access and backhaul (IAB) networks [8]. Such networks are crucial for providing users with a seamless, stable, and intelligent communication experience.

Figure 1: Graphical representation of the BATS code. The input symbols are represented by circles (variable nodes). The batches, each consisting of several coded symbols, are represented by squares (check nodes).

However, data loss is inevitable in wireless communication because of physical phenomena such as refraction, diffraction, and multipath reflection as electromagnetic waves propagate through the air. From the perspective of upper-layer protocols, data are represented as packets, and data transmission can be modeled by an erasure channel, in which a packet is either lost or correctly received. In addition to losses induced in the physical layer, packets can also be lost due to channel congestion and contention, or due to unreliable connections caused by the high mobility of devices [9, 10]. As a result, packet loss accumulates exponentially as data traverse the multi-hop network, and after a few hops it can easily exceed the threshold that TCP can handle through packet retransmission.¹ Although some techniques, for example, Adaptive Modulation and Coding (AMC), can keep the link-to-link packet loss below a certain threshold, they rely on either a higher transmission power or a lower data rate [13, 12, 14], which makes reliable, high-throughput communication through a wireless multi-hop network impractical. According to [15], the throughput of a 20 Mbps single-hop network drops to around 1 Mbps when the number of hops increases to 8 using IEEE 802.11a.

¹ TCP is based on the assumption that the loss is below an outage probability. This assumption holds when optical fiber or twisted-pair cable is used [11, 12].
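To make the exponential accumulation concrete, the following minimal Python sketch computes the probability that a packet survives a chain of links. It assumes independent erasures with the same loss probability on every hop and purely store-and-forward relays (no retransmission, no coding); the function name `end_to_end_delivery` and the 20% per-hop loss are illustrative choices, not values taken from [15].

```python
def end_to_end_delivery(per_hop_loss: float, hops: int) -> float:
    """Probability that a packet crosses `hops` links when every link erases it
    independently with probability `per_hop_loss` and relays only forward."""
    if not 0.0 <= per_hop_loss <= 1.0:
        raise ValueError("per_hop_loss must lie in [0, 1]")
    if hops < 0:
        raise ValueError("hops must be non-negative")
    # Survival probabilities of independent hops multiply.
    return (1.0 - per_hop_loss) ** hops


# Hypothetical per-hop loss of 20%.
for hops in (1, 2, 4, 8):
    print(hops, round(end_to_end_delivery(0.2, hops), 4))
```

Under these assumptions, only about 17% of the packets survive 8 hops, which is the kind of loss that end-to-end retransmission struggles with.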

The Batched Sparse Code (BATS Code) solves this “multi-hop curse” with a network-coded fountain [16], where the intermediate nodes perform coding on the received packets rather than simple forwarding. The end-to-end packet loss asymptotically converges to the single-hop packet loss by employing network coding. Using the BATS code, the network capacity can be approached for unicast networks and certain multicast networks under different network topologies [17].

However, the performance analyses in [17, 18, 19] implicitly assume that check nodes are independent. When the number of input symbols (variable nodes) and the number of coded symbols (check nodes) are moderately small, check nodes can be highly dependent. In this paper, we show that mutual independence of check nodes is a sufficient condition to achieve an upper bound on the decoding rate, and that the decoding rate decreases as the dependence strength increases. The random BATS code construction, in which check nodes are randomly connected to variable nodes according to a designed degree distribution, does not take the dependence among check nodes into account.

Additionally, practical deployments of the BATS code usually require a hardware implementation, for example, on a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC), to meet the increasing demand for throughput and power efficiency [20, 21]. However, the traditional random construction leads to high complexity in circuit routing and buffer allocation in hardware implementations [22, 23]. Fig. 2a gives an example of the complexity added by random code construction. To randomly select input symbols for encoding in hardware, fully connected circuitry is needed to route all possible combinations of the input symbols from memory to the register buffers for further computation. Furthermore, the number of symbols to select (i.e., the degree) is usually determined on the fly according to a probability distribution. This means the register buffers must be large enough to accommodate the maximum degree, even though that degree may occur rarely.

To address these issues, we propose constructing the BATS code in a more structured way so that better performance and lower complexity can be achieved while simultaneously preserving the desired properties of a BATS code. We summarize our contributions as follows.

1. We show that check node dependence degrades the decoding rate², and that independence is a sufficient condition to achieve the upper bound on the decoding rate.
2. We propose a new class of BATS codes, called structured BATS codes, that exploits a new design dimension and reduces node dependence in the construction. In particular, we introduce a hardware-friendly, protograph-based Cyclic-Shift construction method (CS-BATS).
3. Through extensive numerical simulations, we show that CS-BATS is superior to the random BATS code not only in implementation complexity but also in decoding rate when the code is well structured.

² The decoding rate is defined as the number of decoded symbols divided by the total number of source symbols when the decoding stops.

The remainder of this paper is organized as follows. Sections II and III present related work and preliminaries, respectively. Section IV shows, from two perspectives (conditional probability and correlation), that check node dependence degrades the decoding rate of the BATS code. Section V introduces the Cyclic-Shift BATS code and presents simulations comparing the performance and complexity of the CS-BATS and the random BATS codes. Section VI presents a reduced-complexity decoding scheme based on the CS-BATS code for hardware implementations. Finally, Section VII concludes the paper.

II Related Work

The BATS code retains the defining characteristics of the fountain code, including its rateless nature and the low complexity of its encoding and decoding processes. Compared with traditional random linear network coding methods [24, 25, 26], the BATS code offers lower encoding and decoding complexity, reduced coefficient-vector overhead, and smaller caching requirements at the intermediate nodes. Furthermore, compared with other low-complexity random linear network coding techniques, such as the Gamma code [27, 28], the EC code [29], and the L-chunked code [28], the BATS code consistently delivers a higher transmission rate and can produce an unlimited number of batches. The use of the BATS code in various network communication settings has been examined in [30, 31, 32]. More discussion of related work on the BATS code and comparisons can be found in [17, 19].

The BATS code can be described and decoded graphically using its Tanner graph [33], similar to the Low-Density Parity-Check (LDPC) code [34, 35, 36]. Motivated by protograph LDPC codes [37, 38, 23, 39, 40, 41], the authors of [42] explored the design of a structured BATS code to increase the decoding rate when the number of input symbols is small. They used a reinforcement learning approach to explore the graph space, searching for graphs that give a higher decoding rate and coding efficiency than the randomly constructed BATS code. However, this method fails to preserve the rateless property of the BATS code, and the deep learning models add extra overhead to practical implementations. Nonetheless, that work shows that random construction is suboptimal in terms of decoding rate and complexity.

Before this work, there were three main approaches to analyzing the performance of BATS codes: differential equation analysis [17], tree analysis [18], and finite-length analysis [19]. Differential equation analysis and tree analysis consider the asymptotic decoding rate as the number of input symbols goes to infinity. The finite-length analysis evaluates the decoding rate for a given number of batches, a more practical analysis that relaxes the asymptotic assumption of the previous work. However, all these analyses implicitly assume that check nodes are decoded independently. For example, in the finite-length analysis, $p_{t,s}$, the probability that a batch becomes decodable for the first time at time $t$ and has degree $s$, is used to derive a recursive formula for the decoding stopping time. In formulating $p_{t,s}$, a hypergeometric distribution is used [19], which implicitly assumes that check nodes are decoded independently. However, the decodability of different check nodes can be highly dependent when the number of input symbols ($K$) is small, especially when the Tanner graph is constructed randomly.

III BATS Code Preliminary

The BATS code is a matrix-generalized network-coded fountain code [43] designed for multi-hop networks with erasure channels. It enables an operation called “recoding” at the intermediate nodes, which performs random linear network coding (RLNC) [44] on the received packets. The design of the recoding operation is called the inner code, and careful design of the inner code increases the network throughput in different scenarios [45, 46, 47]. Besides the inner code, the BATS code consists of an outer code that performs encoding and decoding at the source and destination nodes. In this work, we mainly study the outer code; recoding is included in this section for completeness.

III-A Encoding

As a matrix generalization of the fountain code, the outer code generates encoded packets in “batches”, each comprising several coded packets. To generate a batch, we first randomly select $dg$ symbols from the $K$ input symbols, where the degree $dg$ is sampled from an optimized probability distribution $\Psi$. Each symbol is a vector of $pk$ elements from the Galois field $GF(q)$. Then, we take $M$ different linear combinations of the selected symbols to generate a batch of size $M$. This process can be described by the following linear system for generating the $i$-th batch:

\bm{X}_{i}=\bm{B}_{i}\bm{G}_{i},

(1)

where $\bm{B}_{i}\in\mathbb{F}_{q}^{pk\times dg}$ consists of the $dg$ selected symbols, and $\bm{G}_{i}\in\mathbb{F}_{q}^{dg\times M}$ specifies the linear combinations to take, resulting in a matrix $\bm{X}_{i}\in\mathbb{F}_{q}^{pk\times M}$ representing $M$ coded symbols. The resulting code can be represented by a Tanner graph, as in Fig. 1.
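The encoding step in (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reference encoder: it uses a prime field GF(Q) with Q = 251 so that arithmetic is plain modular arithmetic (the text does not fix q, and deployed BATS codes commonly use GF(2^8)), draws the entries of G_i uniformly from the non-zero field elements, and uses a made-up degree distribution `psi`. The helper names `sample_degree`, `mat_mul`, and `encode_batch` are hypothetical.

```python
import random

Q = 251  # assumed prime field size, chosen so that GF(Q) arithmetic is "mod Q"


def sample_degree(psi, rng):
    """Draw a degree d in {1, ..., len(psi)} with probability psi[d - 1]."""
    return rng.choices(range(1, len(psi) + 1), weights=psi, k=1)[0]


def mat_mul(a, b):
    """Multiply a (r x n) by b (n x c) over GF(Q)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) % Q
             for j in range(len(b[0]))]
            for i in range(len(a))]


def encode_batch(symbols, psi, M, rng):
    """Generate one batch X_i = B_i G_i from K input symbols of length pk.

    Returns the indices of the selected input symbols, G_i, and X_i."""
    K, pk = len(symbols), len(symbols[0])
    dg = min(sample_degree(psi, rng), K)
    idx = rng.sample(range(K), dg)                          # dg distinct input symbols
    B = [[symbols[t][r] for t in idx] for r in range(pk)]   # pk x dg, one column per symbol
    G = [[rng.randrange(1, Q) for _ in range(M)] for _ in range(dg)]  # dg x M, non-zero entries
    return idx, G, mat_mul(B, G)                            # X_i is pk x M


rng = random.Random(1)
K, pk, M = 16, 4, 4
symbols = [[rng.randrange(Q) for _ in range(pk)] for _ in range(K)]
psi = [0.0, 0.2, 0.3, 0.3, 0.2]  # hypothetical degree distribution over degrees 1..5
idx, G, X = encode_batch(symbols, psi, M, rng)
print(len(idx), len(X), len(X[0]))
```

Each column of `X` is one coded packet of the batch, and `idx` records the edges of the batch's check node in the Tanner graph.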

III-B Recoding

The recoding takes random linear combinations of the received packets from the same batch and generates a new batch of size $M$ . It can be described by the linear system

\bm{Y}_{i}=\bm{X}_{i}\bm{H}_{i},

(2)

where $\bm{Y}_{i}\in\mathbb{F}_{q}^{pk\times M}$ is the recoded batch, $\bm{X}_{i}\in\mathbb{F}_{q}^{pk\times m}$ consists of the $m$ received packets of the batch, and $\bm{H}_{i}\in\mathbb{F}_{q}^{m\times M}$ is called the transfer matrix. If no packet is lost during transmission, $m$ equals $M$. If $m$ is smaller than $M$, i.e., some packets are lost, the rank of the batch drops to at most $m$, and recoding cannot restore it even though it generates $M-m$ additional packets. Therefore, we can use a rank distribution to model the end-to-end channel condition from the source node to the destination node. Specifically, the rank distribution is written as $h=(h_{0},\ldots,h_{j},\ldots,h_{M})$, where $h_{j}$ is the probability of receiving a batch of rank $j$, for $j\in\{0,1,\ldots,M\}$.
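The following Python sketch estimates a rank distribution $h$ by simulating recoding (2) along a line network. It is a hedged illustration, not the paper's channel model: it assumes a prime field GF(Q) with Q = 251, independent packet erasures with the same probability on every link, and recoding coefficients drawn uniformly from all field elements. The names `rank_mod_q`, `transfer_rank`, and `empirical_rank_distribution` are hypothetical.

```python
import random

Q = 251  # assumed prime field size


def rank_mod_q(rows):
    """Rank of a matrix (list of rows) over GF(Q) by Gaussian elimination."""
    m = [list(r) for r in rows]
    if not m or not m[0]:
        return 0
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] % Q), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], Q - 2, Q)  # inverse via Fermat's little theorem (Q prime)
        m[rank] = [(v * inv) % Q for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col] % Q:
                f = m[r][col]
                m[r] = [(a - f * b) % Q for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def transfer_rank(M, hops, loss, rng):
    """Rank of the end-to-end transfer matrix of one batch over a line network.

    Each link erases each packet independently with probability `loss`; each
    intermediate node recodes its m received packets into M packets with a
    uniformly random m x M matrix, as in (2)."""
    T = [[int(r == c) for c in range(M)] for r in range(M)]  # M x M identity at the source
    for hop in range(hops):
        kept = [c for c in range(M) if rng.random() >= loss]
        if not kept:
            return 0  # the whole batch is lost on this link
        T = [[row[c] for c in kept] for row in T]  # M x m: packets that arrived
        if hop < hops - 1:  # intermediate node: recode m packets into M packets
            H = [[rng.randrange(Q) for _ in range(M)] for _ in kept]
            T = [[sum(row[k] * H[k][j] for k in range(len(kept))) % Q
                  for j in range(M)] for row in T]
    return rank_mod_q(T)


def empirical_rank_distribution(M, hops, loss, trials, seed=0):
    """Monte Carlo estimate of h = (h_0, ..., h_M)."""
    rng = random.Random(seed)
    counts = [0] * (M + 1)
    for _ in range(trials):
        counts[transfer_rank(M, hops, loss, rng)] += 1
    return [c / trials for c in counts]


h = empirical_rank_distribution(M=4, hops=3, loss=0.2, trials=2000)
print([round(p, 3) for p in h])
```

Because the destination only sees the product of the per-hop matrices, the rank of `T` is the rank of the received batch, which is exactly the quantity $h$ summarizes.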

III-C Decoding

The BATS code is decoded at a destination node where the $K$ input symbols need to be recovered. Consider the system of linear equations received at a destination node,

\bm{Y}_{i}=\bm{B}_{i}\bm{G}_{i}\bm{H}_{i},

(3)

where $\bm{Y}_{i}\in\mathbb{F}_{q}^{pk\times m}$ consists of the received packets, $\bm{B}_{i}\in\mathbb{F}_{q}^{pk\times dg_{i}}$ consists of selected input symbols and $\bm{G}_{i}\bm{H}_{i}\in\mathbb{F}_{q}^{dg_{i}\times m}$ is the product of the generator matrix and the transfer matrix. In (3), $\bm{Y}_{i}$ and $\bm{G}_{i}\bm{H}_{i}$ are known from the received information while $\bm{B}_{i}$ comprises the unknown symbols to be solved. For each batch, we need to solve this linear system and recursively substitute the already decoded symbols to other batches according to the underlying Tanner graph. This recursive linear-equation-solving and substituting decoding process is called the belief propagation (BP) decoding for the BATS code. A batch is decodable if and only if the rank of the matrix formed by the received packets (the rank of the received packets for brevity) is equal to the degree of this batch, namely $\textrm{rank}(\bm{G}_{i}\bm{H}_{i})=dg_{i}$ .
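The decodability rule above (after substituting decoded symbols, a batch is decodable when the rank of $\bm{G}_{i}\bm{H}_{i}$ restricted to its undecoded symbols equals its residual degree) can be sketched as a peeling loop. This Python sketch is only an illustration under assumptions: it tracks decodability indicators rather than solving (3) for payloads, models a single hop in which $\bm{H}_{i}$ simply keeps each column of $\bm{G}_{i}$ with probability $1-\text{loss}$, and uses a prime field GF(Q) with Q = 251. The names `random_batch` and `bp_decoding_rate` are hypothetical.

```python
import random

Q = 251  # assumed prime field size


def rank_mod_q(rows):
    """Rank of a matrix (list of rows) over GF(Q) by Gaussian elimination."""
    m = [list(r) for r in rows]
    if not m or not m[0]:
        return 0
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] % Q), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], Q - 2, Q)
        m[rank] = [(v * inv) % Q for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col] % Q:
                f = m[r][col]
                m[r] = [(a - f * b) % Q for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def random_batch(K, degree, M, loss, rng):
    """Hypothetical single-hop batch: (variable indices, rows of G_i H_i).

    Row t of the returned matrix holds the coefficients of input symbol idx[t]."""
    idx = rng.sample(range(K), degree)
    G = [[rng.randrange(1, Q) for _ in range(M)] for _ in idx]
    kept = [c for c in range(M) if rng.random() >= loss]  # H_i selects received columns
    return idx, [[row[c] for c in kept] for row in G]


def bp_decoding_rate(K, batches):
    """Fraction of the K input symbols recovered by BP (peeling) decoding."""
    decoded = [False] * K
    progress = True
    while progress:
        progress = False
        for idx, A in batches:
            # Substituting decoded symbols leaves only the rows of undecoded ones.
            rows = [A[t] for t, v in enumerate(idx) if not decoded[v]]
            if rows and rank_mod_q(rows) == len(rows):  # rank == residual degree
                for v in idx:
                    decoded[v] = True
                progress = True
    return sum(decoded) / K


rng = random.Random(7)
K, M = 32, 4
batches = [random_batch(K, rng.choice([1, 2, 3, 4]), M, 0.1, rng) for _ in range(20)]
print(bp_decoding_rate(K, batches))
```

The loop terminates because every successful pass decodes at least one new symbol, and the returned fraction is the decoding rate used in Section IV.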

However, the performance of BP decoding is not satisfactory when the number of input symbols is small. Thus, inactivation decoding [17], a compromise between BP and Gaussian elimination decoding, is usually used to recover more input symbols at the expense of higher computational complexity. The complexity of inactivation decoding grows with the number of inactivated symbols; in the extreme case where all the unsolved variables are inactivated, it becomes Gaussian elimination decoding.

III-D Degree Distribution Optimization

Given a rank distribution $h=(h_{0},\ldots,h_{M})$ at a destination node, the most important step in designing a good BATS code for that destination node is to find the asymptotically optimal degree distribution $\Psi$ , which can be obtained by solving the following optimization problem,


\begin{aligned}
\max_{\Psi}\quad & \theta && \text{(4a)}\\
\text{s.t.}\quad & \Omega(x,\Psi,h)+\theta\ln(1-x)\geq 0,\ \forall\, 0\leq x\leq\eta, && \text{(4b)}\\
& \sum_{d=1}^{K}\Psi_{d}=1,\ \Psi_{d}\geq 0,\quad d=1,\ldots,K, && \text{(4c)}
\end{aligned}

where $\theta$ is the achievable rate. The definitions of $\Omega$ and $\zeta_{r}^{k}$ can be found in [17]. From the tree analysis [18] and asymptotic analysis [17] of the BATS code, (4b) gives a sufficient condition for decoding up to $\eta K$ input symbols with probability at least $1-\exp({-cK})$ for some constant $c>0$.

IV Decoding Rate and Dependence

The decoding rate is defined as the fraction of variable nodes decoded by the end of decoding, which can be analyzed through the decodable probability of an arbitrary variable node. Notably, the decoding rate is not affected by the order in which the batches are decoded [17]. The decodable probability of a node is the probability that, starting from the initial state, the node has been decoded when decoding stops. In this section, we investigate how this probability changes as the dependence among its neighboring check nodes changes.

Throughout this section, we associate each node (variable node or check node) with a Bernoulli random variable indicating its decodability. First, under a simplified model, we analyze the expectation of a variable node's indicator and investigate how it changes with the correlation between two neighboring check nodes. Then, under a general model, we derive lower and upper bounds on the decodable probability of a variable node in terms of the decodable probabilities of its neighboring check nodes.

IV-A Correlation Decreases Decodable Expectation

The dependence relations represented by a Tanner graph can be exceedingly complicated. To develop some intuition, let us consider a simplified model where an arbitrary variable node is connected to two check nodes. Since we only consider two check nodes in this model, the Pearson correlation coefficient can be used to measure the relation between them.

Generally, the correlation coefficient is not equivalent to the dependence measured by a more general measure, the mutual information. For example, two random variables being independent implies that the correlation coefficient is zero, but the converse is not always true because the correlation coefficient only measures linear dependencies. However, as each node can be represented by a Bernoulli indicator random variable, only linear dependencies can exist between two nodes [48].

Here, we consider a variable node $\tilde{V}$ connected to two check nodes, $\tilde{C}_{1}$ and $\tilde{C}_{2}$, whose decodability indicators are the random variables $C_{1}$ and $C_{2}$. Suppose we sample $C_{1}$ and $C_{2}$ over multiple trials.³ Each trial yields a pair of binary indicators, so we can plot the pairs in a 2-dimensional space as a heat map showing the occurrence frequency of each $(C_{1},C_{2})$ pair, and compute the Pearson correlation coefficient $\rho_{C_{1},C_{2}}$ to measure the correlation between $C_{1}$ and $C_{2}$. Fig. 3 shows four possible cases and the corresponding ranges of the correlation value. As discussed above, the correlation coefficient in general captures only linear relationships, whereas dependence also covers nonlinear ones. However, uncorrelatedness and independence are equivalent for a pair of Bernoulli random variables [48], which means that $C_{1}$ and $C_{2}$ can only have linear relationships. This is further illustrated in Fig. 3, which lists all possible relationships between two Bernoulli random variables. When the occurrences of $(0,0)$ and $(1,1)$ dominate, the strong positive linear relation between the two random variables yields a high positive correlation, as shown in Fig. 3a and Fig. 3c. When $(0,0)$, $(0,1)$, $(1,0)$, and $(1,1)$ occur equally often, as shown in Fig. 3b, $C_{1}$ and $C_{2}$ have zero correlation. Note that a negative correlation is physically meaningless in our context, since one node being undecodable cannot increase the decodability of another node, and vice versa. In other words, the situation shown in Fig. 3d, where $(0,1)$ and $(1,0)$ dominate the trials, never occurs.
Therefore, we only consider $\rho_{C_{1},C_{2}}\in[0,1]$.

³ By multiple trials, we mean that decoding is performed multiple times with the same Tanner graph, rank distribution, and other BATS configurations.
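The heat-map computation can be sketched in Python: given the counts of the four outcomes $(C_{1},C_{2})$ over the trials, estimate $\alpha_{1}$, $\alpha_{2}$, and $\rho_{C_{1},C_{2}}$ as in (6). The helper `joint_from` makes the Bernoulli equivalence explicit: all four cells of the joint distribution follow from $\alpha_{1}$, $\alpha_{2}$, and $P(C_{1}=1,C_{2}=1)$. The function names are hypothetical and the counts are made-up examples, not simulation data.

```python
from math import sqrt


def bernoulli_pair_stats(n00, n01, n10, n11):
    """Estimate (alpha1, alpha2, rho) from counts of trial outcomes (C1, C2).

    n01 counts trials with C1 = 0 and C2 = 1, and so on. The correlation is
    undefined if either indicator never varies (alpha in {0, 1})."""
    n = n00 + n01 + n10 + n11
    a1 = (n10 + n11) / n  # P(C1 = 1)
    a2 = (n01 + n11) / n  # P(C2 = 1)
    e12 = n11 / n         # E(C1 C2) = P(C1 = 1, C2 = 1)
    return a1, a2, (e12 - a1 * a2) / sqrt(a1 * (1 - a1) * a2 * (1 - a2))


def joint_from(a1, a2, p11):
    """Joint pmf of a Bernoulli pair: all four cells follow from (a1, a2, P(1, 1))."""
    return {(1, 1): p11, (1, 0): a1 - p11, (0, 1): a2 - p11,
            (0, 0): 1 - a1 - a2 + p11}


print(bernoulli_pair_stats(25, 25, 25, 25))  # uniform heat map (Fig. 3b case): rho = 0.0
print(bernoulli_pair_stats(50, 0, 0, 50))    # only (0,0) and (1,1) occur: rho = 1.0
```

Setting `p11 = a1 * a2` (zero correlation) makes every cell of `joint_from` equal to the product of the marginals, which is why zero correlation already implies independence for a Bernoulli pair.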

Since a variable node is decodable when at least one of its neighboring check nodes becomes decodable, we can write $V=C_{1}+C_{2}-C_{1}C_{2}$ , where $V,C_{1}$ and $C_{2}$ are Bernoulli random variables. Specifically, we have $C_{1}\sim\textrm{Bernoulli}(\alpha_{1})$ and $C_{2}\sim\textrm{Bernoulli}(\alpha_{2})$ . Then the expectation of $V$ can be written as

\begin{aligned}\mathbb{E}(V)&=\mathbb{E}(C_{1})+\mathbb{E}(C_{2})-\mathbb{E}(C_{1}C_{2})\\&=\alpha_{1}+\alpha_{2}-\mathbb{E}(C_{1}C_{2}).\end{aligned}

(5)

The Pearson correlation of $(C_{1},C_{2})$ is given by

\rho_{C_{1},C_{2}}=\frac{\mathbb{E}(C_{1}C_{2})-\alpha_{1}\alpha_{2}}{\sqrt{\alpha_{1}(1-\alpha_{1})\alpha_{2}(1-\alpha_{2})}}.

(6)

Solving (6) for $\mathbb{E}(C_{1}C_{2})$ and substituting the result into (5), we can write $\mathbb{E}(V)$ as a function of $\rho_{C_{1},C_{2}}$:

\mathbb{E}(V)=\alpha_{1}+\alpha_{2}-\alpha_{1}\alpha_{2}-\rho_{C_{1},C_{2}}\sqrt{\alpha_{1}(1-\alpha_{1})\alpha_{2}(1-\alpha_{2})}.

(7)

When $\alpha_{1}$ and $\alpha_{2}$ are fixed, Eq. 7 shows that the expectation of $V$ decreases linearly as $\rho_{C_{1},C_{2}}$ increases. It is minimized when $\rho_{C_{1},C_{2}}=1$, which is attainable only when $\alpha_{1}=\alpha_{2}$; in that case only $(0,0)$ and $(1,1)$ occur in the trials, which corresponds to complete dependence as defined in (10). It is maximized when $\rho_{C_{1},C_{2}}=0$, that is, when $C_{1}$ and $C_{2}$ are uncorrelated and hence independent. Thus, complete dependence of the check nodes is a sufficient condition to achieve the lower bound of the decoding rate, and independence of the check nodes is a sufficient condition to achieve the upper bound. In addition, Eq. 7 characterizes how the decoding rate changes with the correlation coefficient, showing that a higher correlation among check nodes decreases the decoding rate of variable nodes. However, this correlation analysis assumes that the variable node is connected to only two check nodes. Therefore, we investigate a more general case using conditional probability in the next subsection.
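Eq. 7 is easy to evaluate numerically. The minimal Python sketch below computes $\mathbb{E}(V)$ for a few correlation values with illustrative marginals $\alpha_{1}=\alpha_{2}=0.6$ (assumed numbers, not from the paper), and adds a helper, not part of the original analysis, that returns the largest correlation compatible with given marginals. That bound follows from $P(C_{1}=1,C_{2}=1)\leq\min(\alpha_{1},\alpha_{2})$ and shows that $\rho_{C_{1},C_{2}}=1$ is reachable only when $\alpha_{1}=\alpha_{2}$. Function names are hypothetical.

```python
from math import sqrt


def expected_v(a1, a2, rho):
    """E(V) for V = C1 + C2 - C1 C2 as a function of the Pearson correlation, Eq. 7."""
    return a1 + a2 - a1 * a2 - rho * sqrt(a1 * (1 - a1) * a2 * (1 - a2))


def max_feasible_rho(a1, a2):
    """Largest rho compatible with the marginals, since P(C1 = 1, C2 = 1) <= min(a1, a2)."""
    return (min(a1, a2) - a1 * a2) / sqrt(a1 * (1 - a1) * a2 * (1 - a2))


for rho in (0.0, 0.5, 1.0):
    print(rho, round(expected_v(0.6, 0.6, rho), 4))
print(round(max_feasible_rho(0.3, 0.6), 4))  # below 1 when the marginals differ
```

At $\rho_{C_{1},C_{2}}=0$ the expression reduces to $1-(1-\alpha_{1})(1-\alpha_{2})$, the independent case, and at $\rho_{C_{1},C_{2}}=1$ with equal marginals it collapses to $\alpha_{1}$, the behavior of a single check node.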

IV-B Dependence Bounds Decodable Probability

Without loss of generality, we consider an arbitrary variable node connected to multiple check nodes, with all other connections arbitrary, as shown in Fig. 4. In BATS codes, the decodable probability of a variable node depends on its neighboring check nodes: the variable node becomes decodable if at least one of its neighboring check nodes becomes decodable. Conversely, if a check node's rank is insufficient when BP decoding starts, its decodable probability depends on how many of its neighboring variable nodes can be decoded during BP decoding.

Consider a Tanner graph defined by $\mathcal{T}=(\tilde{C},\tilde{V},\tilde{E})$. Let $\tilde{C}\triangleq\{\tilde{C}_{1},\tilde{C}_{2},...,\tilde{C}_{N}\}$ and $\tilde{V}\triangleq\{\tilde{V}_{1},\tilde{V}_{2},...,\tilde{V}_{K}\}$ be disjoint sets representing the check nodes and variable nodes, respectively, and let $\tilde{E}$ be the set of edges connecting nodes in $\tilde{V}$ to nodes in $\tilde{C}$. Here, we assume that the coding coefficients are chosen independently and uniformly at random from the non-zero elements of the base field. With this setting, we associate each node in $\mathcal{T}$ with a Bernoulli random variable indicating its decodability. Let $C\triangleq\{C_{1},C_{2},...,C_{N}\}$ and $V\triangleq\{V_{1},V_{2},...,V_{K}\}$, where $C_{n}$ corresponds to $\tilde{C}_{n}$ and $V_{k}$ corresponds to $\tilde{V}_{k}$ for $1\leq n\leq N$ and $1\leq k\leq K$, be Bernoulli random variables with parameters $\alpha_{n}$ and $\beta_{k}$, respectively, where $\alpha_{n},\beta_{k}\in(0,1]$. Specifically, $P(C_{n}=1)=\alpha_{n}$ is the decodable probability of $\tilde{C}_{n}$ and $P(V_{k}=1)=\beta_{k}$ is the decodable probability of $\tilde{V}_{k}$. Accordingly, $P(C_{n}=0)=1-\alpha_{n}$ and $P(V_{k}=0)=1-\beta_{k}$ are the probabilities of remaining undecoded.

To analyze the decodable probability of a variable node and understand how it is affected by its neighboring check nodes, we first prove an intuitive lemma: when some check nodes are known to be undecodable, the probability that another check node is undecodable either increases or remains unchanged.

Lemma 1

Assume that a variable node is connected to $n$ check nodes, $\tilde{C}_{1},\tilde{C}_{2},...,\tilde{C}_{n}$ , where $n\geq 1$ . Then for all $i\in[1,n]$ and all $\mathcal{J}\subseteq[1,n]\backslash\{i\}$ ,

P\left(C_{i}=0\right)\leq P\left(C_{i}=0|C_{j}=0,\ j\in\mathcal{J}\right).

(8)

Proof:

Let $i\in[1,n]$ be fixed. We will first prove the inequality

P\left(C_{i}=1\right)\geq P\left(C_{i}=1|C_{j}=0,\ j\in\mathcal{J}\right),

(9)

which can be used to prove (8) by considering

P(C_{i}=1|C_{j}=0,\ j\in\mathcal{J})=1-P(C_{i}=0|C_{j}=0,\ j\in\mathcal{J}),

and

P(C_{i}=1)=1-P(C_{i}=0).

We will prove (9) by induction on $|\mathcal{J}|$ . Firstly, when $|\mathcal{J}|=0$ , i.e., $\mathcal{J}=\varnothing$ , (9) obviously holds.

Now assume (9) holds for all $\mathcal{J}\subseteq[1,n]\backslash\{i\}$ , s.t. $|\mathcal{J}|=k$ , and we want to prove that it also holds for $|\mathcal{J}|=k+1$ , where $0\leq k\leq n-2$ . Now consider a subset $\mathcal{J}$ of $[1,n]\backslash\{i\}$ of size $k+1$ , and let $\alpha$ be an arbitrary element in $\mathcal{J}$ . Then

	$\displaystyle P(C_{i}=1\|C_{j}=0,j\in\mathcal{J})$
	$\displaystyle=\frac{P(C_{i}=1,C_{\alpha}=0\|C_{j}=0,j\in\mathcal{J}\backslash\{% \alpha\})}{P(C_{\alpha}=0\|C_{j}=0,j\in\mathcal{J}\backslash\{\alpha\})}$
	$\displaystyle=\frac{P(C_{i}=1\|C_{j}=0,j\in\mathcal{J}\backslash\{\alpha\})}{P(% C_{\alpha}=0\|C_{j}=0,j\in\mathcal{J}\backslash\{\alpha\})}$
	$\displaystyle\quad-\frac{P(C_{i}=1,C_{\alpha}=1\|C_{j}=0,j\in\mathcal{J}% \backslash\{\alpha\})}{P(C_{\alpha}=0\|C_{j}=0,j\in\mathcal{J}\backslash\{% \alpha\})}$
	$\displaystyle=\frac{P(C_{i}=1\|C_{j}=0,j\in\mathcal{J}\backslash\{\alpha\})}{P(% C_{\alpha}=0\|C_{j}=0,j\in\mathcal{J}\backslash\{\alpha\})}$
	$\displaystyle\quad-\frac{P(C_{i}=1\|C_{j}=0,j\in\mathcal{J}\backslash\{\alpha\}% )}{P(C_{\alpha}=0\|C_{j}=0,j\in\mathcal{J}\backslash\{\alpha\})}$
	$\displaystyle\quad\times P(C_{\alpha}=1\|C_{i}=1,C_{j}=0,j\in\mathcal{J}% \backslash\{\alpha\})$
	$\displaystyle\mathrel{\overset{\makebox[0.0pt]{\mbox{\tiny(a)}}}{\leq}}\frac{P% (C_{i}=1\|C_{j}=0,j\in\mathcal{J}\backslash\{\alpha\})}{P(C_{\alpha}=0\|C_{j}=0,% j\in\mathcal{J}\backslash\{\alpha\})}$
	$\displaystyle\quad-\frac{P(C_{i}=1\|C_{j}=0,j\in\mathcal{J}\backslash\{\alpha\}% )}{P(C_{\alpha}=0\|C_{j}=0,j\in\mathcal{J}\backslash\{\alpha\})}$
	$\displaystyle\quad\times P(C_{\alpha}=1\|C_{j}=0,j\in\mathcal{J}\backslash\{% \alpha\})$
	$\displaystyle=P(C_{i}=1\|C_{j}=0,j\in\mathcal{J}\backslash\{\alpha\})$
	$\displaystyle\quad\times\frac{1-P(C_{\alpha}=1\|C_{j}=0,j\in\mathcal{J}% \backslash\{\alpha\})}{P(C_{\alpha}=0\|C_{j}=0,j\in\mathcal{J}\backslash\{% \alpha\})}$
	$\displaystyle=P(C_{i}=1\|C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})$
	$\displaystyle\mathrel{\overset{\makebox[0.0pt]{\mbox{\tiny(b)}}}{\leq}}P(C_{i}% =1),$

where (a) follows from the observation that if check node $\tilde{C}_{i}$ is decoded, the shared variable node is decoded as well, so the degree of any other neighboring check node decreases by 1 while its rank decreases by at most 1. Hence, knowing $C_{i}=1$ cannot make $\tilde{C}_{\alpha}$ less likely to be decodable, giving $P(C_{\alpha}=1|C_{i}=1,C_{j}=0,j\in\mathcal{J}\backslash\{\alpha\})\geq P(C_{\alpha}=1|C_{j}=0,j\in\mathcal{J}\backslash\{\alpha\})$. Since $\mathcal{J}\backslash\{\alpha\}$ has size $k$, (b) follows from the induction hypothesis. Thus, (9) holds for all $\mathcal{J}\subseteq[1,n]\backslash\{i\}$ with $|\mathcal{J}|=k+1$. This completes the proof. ∎

With the help of Lemma 1, we can prove an interesting result that the decodable probability of a variable node is bounded by the decodable probability of its neighboring check nodes. Sufficient conditions for achieving the bounds can be expressed in terms of the dependence relations among the check nodes. Specifically, the upper bound is achieved when all check nodes are mutually independent; the lower bound is achieved when the check nodes are completely dependent. For $n$ check nodes, we define the complete dependence as

P\left(C_{1}=C_{2}=...=C_{n}\right)=1.

(10)

We also define the mutual independence as

P(C_{1}=x_{1},...,C_{n}=x_{n})=P(C_{1}=x_{1})\cdots P(C_{n}=x_{n}),

(11)

for all $(x_{1},x_{2},...,x_{n})\in\{0,1\}^{n}$ .

Theorem 1

For an arbitrary variable node $\tilde{V}$ with $n\geq 1$ neighboring check nodes $\tilde{C}_{1},\ldots,\tilde{C}_{n}$, the decodable probability of $\tilde{V}$ is bounded by

1-\min_{i}P(C_{i}=0)\leq P(V=1)\leq 1-\prod_{k=1}^{n}P(C_{k}=0).

(12)

Proof:

We see that $\tilde{V}$ is decodable if and only if at least one of its neighboring check nodes becomes decodable. Then

\begin{aligned}P(V=1)&=1-P(C_{1}=0,C_{2}=0,...,C_{n}=0)\\&=1-P(C_{1}=0)\prod_{k=2}^{n}P(C_{k}=0|C_{1}=0,...,C_{k-1}=0).\end{aligned}

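Since $\prod_{k=2}^{n}P(C_{k}=0|C_{1}=0,...,C_{k-1}=0)\leq 1$, we have $P(C_{1}=0,...,C_{n}=0)\leq P(C_{1}=0)$. Because the check nodes can be indexed in any order, the same argument gives $P(C_{1}=0,...,C_{n}=0)\leq P(C_{i}=0)$ for every $i$, which yields the lower bound in (12). Equality is achieved when all the neighboring check nodes are completely dependent, since then $\prod_{k=2}^{n}P(C_{k}=0|C_{1}=0,...,C_{k-1}=0)=1$ and $P(C_{i}=0)$ is the same for all $i$. The upper bound is proved by invoking Lemma 1 as follows: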

\begin{cases}P(C_{2}=0)&\leq P(C_{2}=0|C_{1}=0)\\ P(C_{3}=0)&\leq P(C_{3}=0|C_{1}=0,C_{2}=0)\\ &\cdots\\ P(C_{n}=0)&\leq P(C_{n}=0|C_{1}=0,...,C_{n-1}=0)\\ \end{cases}

\Rightarrow\ \prod_{k=1}^{n}P(C_{k}=0)\leq P(C_{1}=0)\prod_{k=2}^{n}P(C_{k}=0|C_{1}=0,...,C_{k-1}=0).

The upper bound is achievable when the check nodes are mutually independent, which gives $\prod_{k=2}^{n}P(C_{k}=0|C_{1}=0,...,C_{k-1}=0)=\prod_{k=2}^{n}P(C_{k}=0)$ . ∎

The proof shows that independence and complete dependence are sufficient conditions to achieve the upper bound and the lower bound of the decodable probability, respectively.
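Theorem 1 can be checked numerically on small joint distributions of the check-node indicators. The following Python sketch, with hypothetical function names and illustrative probabilities, computes $P(V=1)=1-P(C_{1}=\cdots=C_{n}=0)$ and the two bounds in (12) from the marginals. It builds the mutually independent case (11) and the completely dependent case (10) explicitly; a 50/50 mixture of the two, which keeps the nonnegative dependence that Lemma 1 describes, is used in the usage check. The upper bound relies on Lemma 1, so it is not expected to hold for arbitrary joint distributions.

```python
from itertools import product
from math import prod


def zero_marginals(joint):
    """P(C_i = 0) for each i; joint maps tuples in {0,1}^n to probabilities."""
    n = len(next(iter(joint)))
    return [sum(p for x, p in joint.items() if x[i] == 0) for i in range(n)]


def decodable_prob(joint):
    """P(V = 1) = 1 - P(C_1 = ... = C_n = 0): V needs one decodable neighbor."""
    n = len(next(iter(joint)))
    return 1.0 - joint.get((0,) * n, 0.0)


def theorem1_bounds(joint):
    """(lower, upper) bounds of (12), computed from the marginals alone."""
    z = zero_marginals(joint)
    return 1.0 - min(z), 1.0 - prod(z)


def independent_joint(alphas):
    """Mutually independent check nodes with P(C_i = 1) = alphas[i], as in (11)."""
    return {x: prod(a if xi else 1 - a for a, xi in zip(alphas, x))
            for x in product((0, 1), repeat=len(alphas))}


def completely_dependent_joint(alpha, n):
    """Completely dependent check nodes, P(C_1 = ... = C_n) = 1, as in (10)."""
    return {(1,) * n: alpha, (0,) * n: 1 - alpha}


indep = independent_joint([0.3, 0.5, 0.6])
dep = completely_dependent_joint(0.4, 3)
print(theorem1_bounds(indep), decodable_prob(indep))  # P(V=1) meets the upper bound
print(theorem1_bounds(dep), decodable_prob(dep))      # P(V=1) meets the lower bound
```

The two printed cases reproduce the equality conditions stated in the proof: independence attains $1-\prod_{k}P(C_{k}=0)$, and complete dependence attains $1-\min_{i}P(C_{i}=0)$.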

The intuition underlying Theorem 1 is as follows. If the neighboring check nodes are mutually independent, each check node makes an independent contribution to the decodability of the variable node, thus achieving the maximum decodable probability. On the other hand, if some of the check nodes are completely dependent, they can be reduced to a single check node, effectively reducing their contribution to the variable node. Thus, the minimum decodable probability is achieved when all check nodes are completely dependent.

IV-C Implication of Asymptotic Assumption in Dependence

To visualize the physical implication of the asymptotic assumption, we can use a result from the tree analysis [18], which states that when $K$ is sufficiently large, the subgraph expanded from each variable node in the Tanner graph, including all the nodes within its $l$ -neighborhood, converges to a tree. Therefore, we can convert a Tanner graph into a tree with $l+1$ levels, where the root is a variable node, as shown in Fig. 5. According to [18], the BP decoding can be applied to the tree level by level from the leaves, and the tree is considered decodable if the root is decodable.

In fact, the tree is the least dependent structure when the numbers of CNs and VNs are fixed. For example, in Fig. 5, the root V0 has two information propagation paths, p1 and p2, and the information passing through them is independent due to the acyclic nature of a tree. Since the two subtrees of V0 are independent of each other, the upper bound in (12) is achieved for V0 according to Theorem 1. On the other hand, if we add an edge between the two subtrees, for example, an edge between V1 and C2, a cycle is introduced and the information passing through p1 and p2 becomes dependent. If enough edges are added between the two subtrees to cause complete dependence, the lower bound in (12) is achieved. Note that adding any such edge also violates the tree assumption.
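One crude way to quantify the cycles discussed above is to count length-4 cycles, which arise whenever two check nodes share two or more variable nodes. The sketch below does this on a bi-adjacency matrix given as 0/1 rows; the function name and the use of 4-cycles as a dependence proxy are our assumptions, not a metric used in the paper.

```python
from itertools import combinations
from math import comb


def count_4_cycles(rows):
    """Count length-4 cycles in a Tanner graph given as 0/1 bi-adjacency rows.

    Two check nodes (rows) sharing s variable nodes form comb(s, 2) 4-cycles.
    """
    total = 0
    for a, b in combinations(rows, 2):
        shared = sum(x & y for x, y in zip(a, b))
        total += comb(shared, 2)
    return total


cycle_free = [[1, 1, 0, 0], [0, 0, 1, 1]]   # disjoint neighborhoods
overlapping = [[1, 1, 1, 0], [1, 1, 0, 1]]  # two shared VNs -> one 4-cycle
print(count_4_cycles(cycle_free), count_4_cycles(overlapping))  # 0 1
```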

In finite-length cases, if the BATS code is constructed randomly, there can be many cycles in the Tanner graph, especially when $K$ is moderately small, thus leading to a low decoding rate. However, if we can construct the code with a deterministic structure that reduces the dependence, we can guarantee a better decoding rate than the random BATS.

V Structured BATS Code

Recall that the random BATS code construction relies on sampling a degree distribution to determine the number of edges for each check node and then randomly connecting these edges to the variable nodes. Even though the optimized degree distribution ensures the BATS code’s asymptotic decodability, the random connection fails to control the structure and the check node dependence, which we have shown to have a significant impact on the decoding rate. Additionally, random connection is difficult to implement in hardware.

Therefore, we propose that the BATS code should be constructed in a more structured way. To this end, we design a structured BATS code that is constructed from a base graph using only lightweight cyclic-shift operations, which we call the Cyclic-Shift BATS (CS-BATS) code. This section shows that, with a properly designed base graph, the CS-BATS code can achieve better performance with lower computation and implementation complexity. Furthermore, the CS-BATS code satisfies the necessary decodability condition and preserves all the desirable properties of the random BATS code, such as the rateless property and the equal protection property.

V-A Construction of Batches

Formally, we use the bi-adjacency matrix to represent a Tanner graph, where the columns represent the variable nodes and the rows represent the check nodes. For a Tanner graph $\mathcal{T}$ representing a BATS code of $K$ input symbols, we can write $\mathcal{T}=\{t_{0},t_{1},...,t_{k},...\}$ , where $t_{k}$ is the $k$ -th row vector in $\mathcal{T}$ and $t_{k}\in\{0,1\}^{1\times K}$ .

Definition 1

(Base Graph) The protograph used to construct the code is called the base graph, denoted by $\mathcal{G}=\{g_{0},g_{1},...,g_{i},...,g_{m-1}\}$ , where $g_{i}\in\{0,1\}^{1\times K},0\leq i\leq m-1$ .

Definition 2

(Layer) The group of check nodes generated from the base graph with the same number of cyclic shifts is called a layer. Specifically, if a check node is generated with $n$ cyclic shifts, we say it belongs to the $n$ -th layer. The check nodes of the base graph form the $0$ -th layer.

Algorithm 1 Cyclic-Shift BATS Code Encoding

1: Input: base graph $\mathcal{G}=\{g_{0},...,g_{m-1}\}$
2: Output: $\mathcal{T}=\{t_{0},...,t_{N-1}\}$ for $N$ batches
3: for $i\in\{0,1,...,N-1\}$ do
4:   $t_{i}=g_{(i\bmod m)}\ggg\lfloor i/m\rfloor$
5: end for

Algorithm 1 describes the proposed procedure to construct a BATS code of $N$ batches from a base graph. Check nodes are constructed from the rows of the base graph using right cyclic-shift operations (denoted by “ $\ggg$ ”). Note that $\mathcal{G}$ and $\mathcal{T}$ have the same number of variable nodes, but $\mathcal{G}$ has far fewer check nodes than $\mathcal{T}$ . This procedure is illustrated in Fig. 6.
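A minimal Python sketch of Algorithm 1 follows, assuming rows are stored as lists of 0/1 entries indexed by variable node and that “ $\ggg$ ” moves entry $j$ to position $(j+s)\bmod K$. The helper names are hypothetical. The generator form also reflects the rateless property discussed in Section V-C, since it can produce an unlimited number of rows.

```python
from itertools import islice
from typing import Iterator, List

Row = List[int]


def right_cyclic_shift(row: Row, s: int) -> Row:
    """Right cyclic shift (assumed meaning of ">>>"): entry j moves to (j + s) mod K."""
    s %= len(row)
    return row[-s:] + row[:-s] if s else list(row)


def cs_bats_rows(base: List[Row]) -> Iterator[Row]:
    """Yield t_0, t_1, ... with t_i = g_(i mod m) >>> floor(i / m), without end."""
    m = len(base)
    i = 0
    while True:
        yield right_cyclic_shift(base[i % m], i // m)
        i += 1


def cs_bats_encode(base: List[Row], n_batches: int) -> List[Row]:
    """Algorithm 1: the first N rows of the CS-BATS Tanner graph."""
    return list(islice(cs_bats_rows(base), n_batches))


base = [[1, 1, 0, 0], [0, 1, 0, 1]]   # hypothetical base graph, m = 2, K = 4
for i, t in enumerate(cs_bats_encode(base, 5)):
    print(f"t_{i} (layer {i // len(base)}): {t}")
```

Each printed row keeps the degree of its base row, and its layer index is $\lfloor i/m\rfloor$, matching Definition 2.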

V-B Bounded Complexity

In hardware implementations, data originally stored in a large memory is usually moved in small pieces to a faster buffering memory, such as on-chip registers, for further computation. In the context of the BATS code, input symbols are selected and moved to the buffer for linear combination. As illustrated in Fig. 2a, when the input symbol selection is random and determined on the fly, a large buffer is needed to accommodate the maximum possible degree, giving a buffering complexity of $\mathcal{O}(K)$ , where $K$ is the total number of input symbols. Additionally, any input can be selected and routed to each buffer position, which leads to a routing complexity of $\mathcal{O}(K^{2})$ .

On the other hand, if the input symbol selection is structured, as in the CS-BATS code, we only need to allocate a buffer whose size is set by the row degrees of the base graph. Therefore, the buffering complexity is $\mathcal{O}(d)$ , where $d$ is the maximum degree of the base graph. Additionally, as the positions of the selected input symbols are predetermined, we can move input symbols into the buffer from fixed positions and cyclically shift all input symbols in memory. This gives a routing complexity of $\mathcal{O}(d)$ .

For example, for a random BATS code with $K=256$ , the maximum possible degree is also $256$ , which means that up to 256 packets could be moved from the data storage to the computation buffer. In hardware implementations, data storage usually consists of a large off-chip memory connected to the computation circuits through interconnects. The computation buffers are usually fast on-chip registers that make parallel processing of data possible. Therefore, we need enough buffers to accommodate the maximum number of packets, which is 256 in this example, even though such a large degree is sampled with only a small probability. If each packet consists of $256$ symbols from $GF(2^{8})$ , $64$ KBytes of buffers are required. In comparison, the maximum degree of a CS-BATS code is determined by the base graph and is typically much smaller. For example, the experiments in Section V-E use a base graph with a maximum degree of $27$ , which requires only $6.75$ KBytes of buffers on hardware.
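The buffer figures above can be checked with a short calculation. The sketch assumes one byte per $GF(2^{8})$ symbol and 1 KByte $=1024$ bytes, which is the convention that reproduces the quoted 64 and 6.75 KBytes; `buffer_bytes` is a hypothetical helper.

```python
def buffer_bytes(max_degree: int, packet_symbols: int, bits_per_symbol: int = 8) -> int:
    """Bytes needed to buffer max_degree packets of packet_symbols GF(2^bits) symbols."""
    return max_degree * packet_symbols * bits_per_symbol // 8


K, packet_symbols = 256, 256
random_bats = buffer_bytes(K, packet_symbols)   # worst case: degree can reach K
cs_bats = buffer_bytes(27, packet_symbols)      # max base-graph degree in Sec. V-E
print(random_bats / 1024, cs_bats / 1024)       # 64.0 6.75
```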

Since packets are randomly selected in the random BATS, every packet could be selected and routed to every register buffer through a multiplexer as shown in Fig. 2, which is a fully connected circuit with $256\times 256$ wires that connect the data storage to the buffers. However, if the structure is deterministic, as for the CS-BATS, the packets can be easily routed to the buffer with cyclic shift operations, requiring only $27$ wires in our example.

On FPGAs, off-chip data is usually accessed through a Direct Memory Access (DMA) unit. The random data access pattern of the random BATS is the worst case for such transfers, while the inherently structured access pattern of the CS-BATS can achieve the highest throughput through bursting, coalescing, and other optimization techniques [49].

As we can see, the implementation complexity of CS-BATS, determined by the base graph, is significantly reduced compared with the random BATS, especially when $d\ll K$ . Therefore, with the CS-BATS code, the tradeoff between the complexity and the code performance can be controlled by the design of the base graph. It does not rely on reducing $K$ to reduce the complexity, which is important for adapting the implementation to different devices without affecting the code performance.

V-C Preserved Properties

According to Algorithm 1, an unlimited number of batches can be generated with light-weight cyclic-shift operations, which preserves the rateless property of the BATS code. This is important for the adaptability of the BATS code for achieving reliable communication in a network with unknown or changing channel conditions.

The random connection in the traditional BATS code ensures that each input symbol has the same probability of being selected by a check node. The CS-BATS code also preserves this property. The empirical probability of each variable node being selected converges to the same value as the number of check nodes goes to infinity because the selection position is shifted cyclically on all input symbols.
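The equal-protection argument can be checked directly. Over $mK$ consecutive batches, every base row $g_r$ is applied at all $K$ shifts, so each variable node is selected exactly $\sum_{r}\textrm{deg}(g_r)$ times; the convergence above is therefore exact whenever $N$ is a multiple of $mK$. This observation and the toy base graph below are ours, not the paper's.

```python
def right_cyclic_shift(row, s):
    """Right cyclic shift: entry j moves to (j + s) mod K."""
    s %= len(row)
    return row[-s:] + row[:-s] if s else list(row)


def selection_counts(base, n_batches):
    """How many of the first n_batches CS-BATS check nodes select each VN."""
    m, K = len(base), len(base[0])
    counts = [0] * K
    for i in range(n_batches):
        t = right_cyclic_shift(base[i % m], i // m)
        for j, bit in enumerate(t):
            counts[j] += bit
    return counts


base = [[1, 1, 0, 0, 0], [0, 1, 0, 1, 1]]   # hypothetical, m = 2, K = 5
m, K = len(base), len(base[0])
print(selection_counts(base, m * K))   # [5, 5, 5, 5, 5]: each VN selected sum(d_r) times
print(selection_counts(base, 3))       # [1, 3, 1, 1, 1]: uneven before a full period
```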

V-D Base Graph Design

Based on our previous analysis, this section proposes two empirical conditions for a good base graph.

Convergence to an optimized degree distribution. The row degrees of the base graph should be chosen according to the optimized degree distribution $\Psi$ , which is obtained by minimizing $\theta$ in (4a) subject to the conditions in (4b) and (4c). Specifically, we take the $m$ largest probability masses of $\Psi$ and normalize them to obtain $\Psi^{\prime}$ . Assuming that the probability masses of $\Psi$ are indexed in descending order, we have

\Psi^{\prime}_{d}=\frac{\Psi_{d}}{\sum^{m}_{j=1}\Psi_{j}}.

(13)

Note that when $m\rightarrow\infty$ , $\Psi^{\prime}_{d}\rightarrow\Psi_{d}$ . Let us now design the base graph $\mathcal{G}$ with the aim of mimicking $\Psi^{\prime}_{d}$ . Let

\gamma_{d}=\#\{g\in\mathcal{G}|\textrm{deg}(g)=d\},

(14)

where $\textrm{deg}(g)$ is the degree of row $g$ . Then in order to mimic $\Psi^{\prime}_{d}$ , for $1\leq d\leq m$ , $\gamma_{d}$ should be chosen such that

\begin{cases}\sum^{K}_{d=1}\gamma_{d}=m\\ \gamma_{d}=\lceil m\Psi^{\prime}_{d}\rceil\ \textrm{or}\ \lfloor m\Psi^{\prime}_{d}\rfloor.\end{cases}

(15)

Condition (15) means that the row degrees of the base graph are designed to match the largest $m$ probability masses of the optimized degree distribution. This ensures that the necessary condition (4b) for the decodability of the BATS code is satisfied.
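A sketch of the row-degree selection in (13) to (15) is given below. Condition (15) only requires each $\gamma_d$ to be $\lceil m\Psi^{\prime}_{d}\rceil$ or $\lfloor m\Psi^{\prime}_{d}\rfloor$ with the counts summing to $m$; resolving that choice by largest-remainder rounding is our assumption, and the input distribution is a made-up example rather than the optimized $\Psi_{BP}$ used in Section V-E. The function name is hypothetical.

```python
import math
from typing import Dict


def base_row_degree_counts(psi: Dict[int, float], m: int) -> Dict[int, int]:
    """Choose gamma_d per (13)-(15): keep the m largest masses of psi,
    renormalize them to Psi', and round m * Psi'_d down or up so that
    the counts sum to m (largest-remainder rounding)."""
    # m largest probability masses; ties broken by smaller degree first.
    top = sorted(psi.items(), key=lambda kv: (-kv[1], kv[0]))[:m]
    total = sum(p for _, p in top)
    target = {d: m * p / total for d, p in top}          # m * Psi'_d
    gamma = {d: math.floor(x) for d, x in target.items()}
    # Give the remaining rows to the largest fractional parts (ceil instead of floor).
    leftover = m - sum(gamma.values())
    by_fraction = sorted(target, key=lambda d: target[d] - gamma[d], reverse=True)
    for d in by_fraction[:leftover]:
        gamma[d] += 1
    return gamma


psi = {2: 0.125, 3: 0.375, 5: 0.25, 8: 0.25}   # hypothetical Psi, not the paper's
print(base_row_degree_counts(psi, 3))           # {3: 1, 5: 1, 8: 1}
```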

TABLE I: Numerical Results. Average results from 2000 instances of randomly generated graphs are reported for the Random BATS. Each experiment is repeated 10 times. Average results from 500 repetitions using the designed base graph are reported for the CS-BATS. Higher decoding rates are highlighted.

			Number of Batches N (code rate), 10 hops					Number of Hops, 20 batches
Metric	Decoder	Code	16 (1.0)	18 (1.125)	20 (1.25)	22 (1.375)	24 (1.5)	1	5	10	15	20
Decoding Rate	Inac	CS	0.37 $\pm 0.09$	0.63 $\pm 0.07$	0.76 $\pm 0.03$	0.90 $\pm 0.06$	0.94 $\pm 0.03$	0.84 $\pm 0.04$	0.78 $\pm 0.03$	0.76 $\pm 0.03$	0.75 $\pm 0.03$	0.74 $\pm 0.04$
Decoding Rate	Inac	Rand	0.33 $\pm 0.17$	0.44 $\pm 0.20$	0.63 $\pm 0.19$	0.84 $\pm 0.12$	0.92 $\pm 0.07$	0.82 $\pm 0.11$	0.71 $\pm 0.14$	0.63 $\pm 0.19$	0.57 $\pm 0.20$	0.52 $\pm 0.21$
Decoding Rate	BP	CS	0.35 $\pm 0.08$	0.57 $\pm 0.12$	0.75 $\pm 0.06$	0.89 $\pm 0.06$	0.93 $\pm 0.03$	0.84 $\pm 0.04$	0.78 $\pm 0.04$	0.75 $\pm 0.06$	0.71 $\pm 0.12$	0.67 $\pm 0.015$
Decoding Rate	BP	Rand	0.26 $\pm 0.15$	0.32 $\pm 0.18$	0.42 $\pm 0.20$	0.54 $\pm 0.22$	0.65 $\pm 0.21$	0.66 $\pm 0.15$	0.53 $\pm 0.18$	0.42 $\pm 0.20$	0.37 $\pm 0.20$	0.31 $\pm 0.19$
$\#$ Inact Symbols	Inac	CS	17 $\pm 2.8$	11 $\pm 2.0$	4.4 $\pm 1.4$	0.4 $\pm 0.7$	0.4 $\pm 0.7$	1 $\pm 1.2$	3.3 $\pm 1.5$	4.4 $\pm 1.4$	5.4 $\pm 1.5$	5.8 $\pm 1.6$
$\#$ Inact Symbols	Inac	Rand	43 $\pm 21$	30 $\pm 16$	18 $\pm 13$	8 $\pm 10$	4 $\pm 8$	10.5 $\pm 13$	16.4 $\pm 13$	17.9 $\pm 12$	20.7 $\pm 12$	23.2 $\pm 14$
$\#$ Edges		CS	257	285	324	362	388
$\#$ Edges		Rand	742 $\pm 128$	818 $\pm 157$	890 $\pm 165$	940 $\pm 162$	1018 $\pm 183$

TABLE II: The column degree design outperforms the random base graph. CS: randomly generated base graphs; CS-CD: randomly generated base graphs with the column degree design. For each algorithm, 500 base graphs are randomly generated with the same row degrees, and the average decoding rate under BP decoding is reported.

		Number of Batches (code rate) (10 hops)
		19	20	21	22	23	24	25	26	27	28
Decoding Rate	CS	0.70 $\pm 0.03$	0.73 $\pm 0.03$	0.76 $\pm 0.03$	0.77 $\pm 0.03$	0.78 $\pm 0.03$	0.79 $\pm 0.03$	0.80 $\pm 0.03$	0.83 $\pm 0.03$	0.83 $\pm 0.03$	0.85 $\pm 0.03$
Decoding Rate	CS-CD	0.74 $\pm 0.04$	0.79 $\pm 0.02$	0.83 $\pm 0.02$	0.84 $\pm 0.02$	0.85 $\pm 0.02$	0.86 $\pm 0.02$	0.88 $\pm 0.02$	0.89 $\pm 0.01$	0.90 $\pm 0.02$	0.91 $\pm 0.02$

Column degree design. After the row degrees are chosen, we need to connect the check nodes and variable nodes in the base graph accordingly, which, in effect, determines the number of neighboring check nodes for each variable node in the base graph. As the variable nodes are represented by the columns in the bi-adjacency matrix, we also refer to the number of neighboring check nodes for a variable node as the column degree.

In an ideal case, mutual independence among check nodes is desired to achieve the upper bound on the decoding rate. Even though this condition is usually too strong to hold for the complete Tanner graph, we can still obtain a “less dependent” structure by minimizing the dependence of check nodes in the base graph. Specifically, as shown in Algorithm 2, each row of degree $d_{i}$ is connected to the $d_{i}$ columns with the smallest current column degrees, which leads to a base graph with balanced variable node connections. When multiple columns share the smallest column degree, the column is picked at random. Balanced variable node connectivity is needed to avoid strong dependence between check nodes when the check node degrees are fixed.

Algorithm 2 Column degree design

1: Input: pre-determined row degrees $\{d_{0},d_{1},...,d_{m-1}\}$
2: Output: base graph $\mathcal{G}=\{g_{0},g_{1},...,g_{m-1}\}$
3: // ensure variable node coverage
4: variable_idx $=[0,1,...,K-1]$
5: for $i\in[0,1,...,m-1]$ do
6:   idx $\leftarrow$ the $d_{i}$ indices in variable_idx with the smallest column degrees (ties broken randomly)
7:   $g_{i}=\mathbf{0}$
8:   $g_{i}[\textrm{idx}]=1$
9: end for
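Algorithm 2 can be sketched in Python as follows, assuming column degrees are counted over the rows chosen so far and that ties are broken uniformly at random, as described in the text. The function name and `seed` parameter are hypothetical. Because each row takes the least-used columns first, the column degrees of the resulting base graph never differ by more than one.

```python
import random
from typing import List, Optional


def column_degree_design(degrees: List[int], K: int,
                         seed: Optional[int] = None) -> List[List[int]]:
    """Sketch of Algorithm 2: row i picks the d_i columns with the smallest
    current column degrees, breaking ties uniformly at random."""
    rng = random.Random(seed)
    col_deg = [0] * K
    base = []
    for d in degrees:
        # Sort columns by (current column degree, random tie-breaker).
        order = sorted(range(K), key=lambda j: (col_deg[j], rng.random()))
        idx = order[:d]
        row = [0] * K                      # g_i = 0
        for j in idx:                      # g_i[idx] = 1
            row[j] = 1
            col_deg[j] += 1
        base.append(row)
    return base


G = column_degree_design([11, 12, 14, 14, 19, 20, 27], K=256, seed=1)
col_deg = [sum(col) for col in zip(*G)]
print([sum(r) for r in G], min(col_deg), max(col_deg))  # [11, 12, 14, 14, 19, 20, 27] 0 1
```

With the $m=7$ row degrees from Section V-E, the degrees sum to $117<256$, so every column of the base graph itself ends up with degree 0 or 1.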

V-E Experiments

In this section, we compare the CS-BATS code with the random BATS code in terms of decoding rate, number of edges, and number of inactivation symbols. Over $GF(2^{8})$ , we use $K=256$ input packets, a packet size $pk$ of $256$ symbols, and a batch size $M$ of $16$ for the BATS code. Every link has an i.i.d. packet loss rate of $10\%$ in the simulation. Randomly generated generator matrices over $GF(2^{8})$ are used in all experiments. The optimized degree distributions $\Psi_{BP}$ and $\Psi_{inact}$ are obtained for BP decoding and inactivation decoding, respectively. For the random BATS code, we generate 2000 random instances of the code for each experiment and report the average and standard deviation of the results; each instance is simulated 10 times for reliable results.

For CS-BATS, we choose a base graph of dimension $7\times 256$ (i.e., $m=7$ , $K=256$ ) generated by Algorithm 2. Starting from the optimized degree distribution $\Psi_{BP}$ , we apply the method described in Section V-D to obtain the following row degrees for the base graph:

\{11,12,14,14,19,20,27\},

which will be used in Algorithm 2. Notice that the number of rows in the base graph ( $m$ ) determines the size of the base graph. The implementation complexity increases with $m$ , approaching that of the random BATS. At the same time, the design space of the CS-BATS grows with $m$ , potentially leading to a better code. In the extreme case that $m$ is very large, we can pre-define the structure of every generated batch so that the batches are mutually independent. In practice, $m$ should be determined according to the implementation requirements. To demonstrate the impact of $m$ in the simulations, we also generate a base graph with $m=8$ using the same procedure, with the following row degrees:

\{11,12,14,14,16,19,20,27\}.

We will also show the effectiveness of Algorithm 2 in designing a good base graph by comparing the average decoding rate of 500 instances of randomly generated base graphs and the base graphs generated by Algorithm 2.

Decoding Rates. Recall that the decoding rate is defined as the number of decoded symbols divided by the number of input symbols, which is the most direct metric to measure the decoding effectiveness of a code under the erasure channel model. In practice, a fixed-rate precode can be added to ensure full recovery provided that the BATS code achieves a certain decoding rate [17]. However, we consider the BATS code without precoding here.

Two experiments were designed to compare the performance of the CS-BATS code and the random BATS code from different perspectives. In the first experiment, we fix the number of batches to 20 ( $\textrm{code rate}=MN/K=1.25$ ) and vary the number of hops. This experiment shows how the performance changes as the number of hops increases, which is of great practical interest for a BATS code. The second experiment fixes the number of hops to 10 and varies the number of batches, which investigates the change in performance as the number of batches $N$ changes. Specifically, we start from $N=16$ ( $\textrm{code rate}=1.0$ ), and end at $N=28$ ( $\textrm{code rate}=1.75$ ).

In Fig. 7 and Fig. 8, we plot the results of the varying-hops and varying-batches experiments, respectively, with inactivation decoding and BP decoding shown in separate panels. Part of the numerical results is also listed in Table I. From Fig. 7, we can see that as the number of hops increases from 1 to 20, the CS-BATS code retains a higher and more stable decoding rate for both inactivation decoding ( $0.84$ to $0.74$ ) and BP decoding ( $0.83$ to $0.66$ ). In contrast, the decoding rate of the random BATS code degrades from $0.82$ to $0.51$ for inactivation decoding and from $0.66$ to $0.31$ for BP decoding. From Fig. 8, as the number of batches increases, the CS-BATS also achieves a higher decoding rate than the random BATS, especially when BP decoding is used. Although the performance is close in the high-decoding-rate region under inactivation decoding, the CS-BATS uses fewer inactivation symbols and thus has a smaller decoding complexity, as shown in Table I.

By comparing Fig. 7a and Fig. 7b, and comparing Fig. 8a and Fig. 8b, we see that a significant performance gap exists between inactivation decoding and BP decoding for the random BATS. Thus, in practice, the random BATS code is usually decoded with inactivation decoding. However, this gap is greatly reduced by the CS-BATS code, making BP decoding a feasible choice for some resource-limited applications.

When comparing the performance of $m=7$ and $m=8$ for the CS-BATS, we observe that the latter generally outperforms the former, which is consistent with our analysis that a larger base graph offers more design choices. To make a fair comparison between $m=7$ and $m=8$ , we search among the base graphs generated by Algorithm 2 for those with the most balanced variable node connections after the Tanner graph is constructed.

Number of Edges. In Table I, we also record the number of edges of the Tanner graph for different numbers of check nodes (batches), as the total number of edges is usually related to the encoding and decoding complexity. The average results are reported across multiple instances of Tanner graphs. As expected, the number of edges increases with the number of batches. Graphs constructed with the CS-BATS code have far fewer edges (roughly 35% to 39% of those of the random BATS for the values in Table I), and this number is controlled by the degrees of the base graph.
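Because a cyclic shift preserves row weight, the edge count of a CS-BATS code is deterministic: batch $i$ contributes $\textrm{deg}(g_{i\bmod m})$ edges. Assuming the $m=7$ base-graph rows are ordered as in the degree list of Section V-E, the short calculation below reproduces the CS edge counts in Table I; the helper name is hypothetical.

```python
def cs_bats_edges(degrees, n_batches):
    """Edges in the first n_batches rows: cyclic shifts keep every row degree,
    so batch i contributes degrees[i mod m] edges."""
    m = len(degrees)
    full_layers, rest = divmod(n_batches, m)
    return full_layers * sum(degrees) + sum(degrees[:rest])


degrees_m7 = [11, 12, 14, 14, 19, 20, 27]
print([cs_bats_edges(degrees_m7, n) for n in (16, 18, 20, 22, 24)])
# [257, 285, 324, 362, 388], matching the CS row of Table I
```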

Number of Inactivation Symbols. The number of inactivation symbols reflects the computation needed in inactivation decoding [17]. Decoding with fewer inactivation symbols requires less computation, leading to higher throughput and shorter delay. Therefore, we also compare the number of inactivation symbols used when decoding the graphs constructed by the CS-BATS with that for the graphs constructed by the random BATS. As shown in Table I, the CS-BATS uses far fewer inactivation symbols than the random BATS, indicating a significant gain in computational efficiency.

Column Degree Design. In Table II, we compare the results obtained from randomly generated base graphs with those obtained from base graphs generated by Algorithm 2. Specifically, 500 randomly generated base graphs and another 500 randomly generated base graphs with the column degree design were simulated with different numbers of batches. Table II shows that the base graphs with the column degree design achieve a higher decoding rate.

Algorithm 3 Layered Decoding

1: Input: received Tanner graph $\mathcal{T}$ with $N$ batches
2: Output: decoded packets $\{v_{0},v_{1},...\}$
3: for layer $\mathcal{T}_{k}\in\{\mathcal{T}_{0},\mathcal{T}_{1},...,\mathcal{T}_{\lceil N/m\rceil-1}\}$ do
4:   for $c$ in the set of CNs in $\mathcal{T}_{k}$ do
5:     $\textsc{BP}(c)$
6:   end for
7:   for $v$ in the set of undecoded VNs in $\mathcal{T}_{k}$ do
8:     Inactivate $v$
9:     for $c_{v}$ in the set of neighboring CNs of $v$ do
10:       $\textsc{BP}(c_{v})$
11:     end for
12:   end for
13:   if rank of the collected linear constraints $=\#$ inactivated symbols then
14:     Solve for the inactivated symbols
15:     Substitute them into the involved VNs
16:   end if
17: end for
18:
19: procedure BP( $c$ )
20:   if rank( $c$ ) $=$ deg( $c$ ) and $c$ is unsolved then
21:     Solve $c\rightarrow\{v_{k1},v_{k2},...\}$
22:     Collect linear constraints for the inactivated symbols
23:     for $c_{k}$ in the set of neighboring CNs $\{c_{k1},c_{k2},...\}$ do
24:       Substitute $v_{k}$ into $c_{k}$
25:       $\textsc{BP}(c_{k})$ $\triangleright$ Call BP again
26:     end for
27:   end if
28: end procedure

VI Layered Decoding

As a further demonstration of how the CS-BATS reduces the hardware implementation complexity, we propose a complexity-reduced and flexible decoding algorithm tailored for hardware.

The number of batches to be transmitted in a wireless communication session depends on the channel condition during that session. In particular, more batches are needed if the channel condition is poor. Given the uncertainty in the number of arriving batches, a straightforward decoder implementation fixes a maximum processing capacity, allocating enough hardware resources in advance, such as memory buffers for storing received batches and computation circuits for solving and substituting symbols. However, this limits the flexibility and decoding ability of the design. Ideally, the decoder should be able to accommodate a potentially unlimited number of batches, and the hardware consumption should be flexible and scalable. The decoding algorithm for CS-BATS discussed next addresses this issue.

The Tanner graph constructed by the CS-BATS can be naturally divided into layers according to the number of cyclic shifts. Motivated by this layered structure and inspired by similar complexity-reducing decoding schemes for LDPC codes [50, 51, 52], we propose a layered decoding algorithm for the CS-BATS whose implementation complexity is bounded by the size of the base graph.

Consider Algorithm 3 for layered decoding with a base graph of $m$ rows. In this algorithm, BP and inactivation decoding are performed layer by layer. After the inactivation decoding consumes all the batches in a layer, the received packets of this layer can be discarded, and the decoding proceeds to the next layer. Since each layer is decoded sequentially with the same procedure, the same hardware can be reused for every layer. During the transition from one layer to the next, the only growth in memory consumption comes from the linear constraints collected during BP decoding of the previous layer. In general, the total number of linear constraints that need to be stored depends on the number of inactivation symbols, which is bounded by $\mathcal{O}(K)$ . The previous section shows that the CS-BATS uses only a few inactivation symbols. Thus, the memory consumed by the linear constraints is modest.
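The layer bookkeeping behind Algorithm 3 is simple enough to sketch. Layer $k$ holds batches $km,\dots,\min((k+1)m,N)-1$, so there are $\lceil N/m\rceil$ layers, and the received-packet storage for one layer is bounded by $m$ batches of at most $M$ packets each, independent of $N$. The sketch assumes the Section V-E parameters ( $m=7$ , $M=16$ , 256-byte packets) and ignores coefficient vectors and the stored linear constraints; both helpers are hypothetical.

```python
def layers(n_batches: int, m: int):
    """Split batch indices 0..N-1 into CS-BATS layers: layer k holds the batches
    with floor(i / m) == k, i.e., the k-th cyclic shift of every base row."""
    return [list(range(k * m, min((k + 1) * m, n_batches)))
            for k in range(-(-n_batches // m))]     # ceil(N / m) layers


def layer_buffer_bytes(m: int, M: int, packet_bytes: int) -> int:
    """Upper bound on received-packet payload for one layer: m batches of at most M packets."""
    return m * M * packet_bytes


print(layers(16, 7))                    # three layers; the last one holds batches 14 and 15
print(layer_buffer_bytes(7, 16, 256))   # 28672 bytes, independent of N
```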

Notice that Algorithm 3 can also be used by the random BATS code. However, the CS-BATS with embedded layer structure can fully exploit this algorithm with hardware reuse and low implementation complexity.

As discussed in Section V-B, the encoding complexity of the CS-BATS depends only on the base graph. The same applies to the complexity of layered decoding. Thus, the implementation complexity of decoding can also be controlled by the design of the base graph. This leads to higher flexibility and better scalability in hardware design, which is essential for adapting the implementation to various grades of devices.

VII Discussion and Conclusion

This paper analyzes the influence of check node dependence on the decoding rate of a BATS code. We show that the check node dependence degrades the performance of a BATS code, especially when the number of input symbols is finite. Based on the analysis, we propose constructing the code in a more structured way instead of relying on random connections. As an example, a structured BATS code called the Cyclic-Shift BATS is presented, which controls the check node dependence and introduces a new design dimension. Conditions supported by empirical experiments are given for designing a well-structured base graph.

We further demonstrate that the CS-BATS code achieves a significantly higher decoding rate and more stable performance with a smaller and controllable complexity compared with the random BATS. Furthermore, we propose a layered decoding algorithm that exploits the layered structure of the CS-BATS code. The implementation complexity of this algorithm is bounded by the size of the base graph.

The goal of this paper is to present a novel class of BATS codes, the structured BATS, and demonstrate that its performance can be superior to that of the random BATS when its structure is well designed. The CS-BATS is one possible instance of this class of codes, designed especially for efficient hardware implementations. Further research is necessary to design different variants of the structured BATS tailored for different purposes. For example, storing the base graph when the code length is large may incur significant overhead. In that case, more flexible construction methods could be used to construct the base graph by lifting a smaller graph.

VIII Acknowledgement

The authors would like to thank Prof. Philip H. W. Leong, Prof. Kin Hong Lee, Hoover H. F. Yin, Yulin Chen, and Fangwei Ye for the useful discussions and valuable suggestions on this work.

References

[1] B. Rong, “6G: The next horizon: From connected people and things to connected intelligence,” IEEE Wireless Communications, vol. 28, no. 5, pp. 8–8, 2021.
[2] M. Z. Chowdhury, M. Shahjalal, S. Ahmed, and Y. M. Jang, “6G wireless communication systems: Applications, requirements, technologies, challenges, and research directions,” IEEE Open Journal of the Communications Society, vol. 1, pp. 957–975, 2020.
[3] J. Liu, Y. Shi, Z. M. Fadlullah, and N. Kato, “Space-air-ground integrated network: A survey,” IEEE Communications Surveys & Tutorials, vol. 20, no. 4, pp. 2714–2741, 2018.
[4] M. Majid, S. Habib, A. R. Javed, M. Rizwan, G. Srivastava, T. R. Gadekallu, and J. C.-W. Lin, “Applications of wireless sensor networks and Internet of things frameworks in the industry revolution 4.0: A systematic literature review,” Sensors, vol. 22, no. 6, p. 2087, 2022.
[5] H. Bagheri, M. Noor-A-Rahim, Z. Liu, H. Lee, D. Pesch, K. Moessner, and P. Xiao, “5G NR-V2X: Toward connected and cooperative autonomous driving,” IEEE Communications Standards Magazine, vol. 5, no. 1, pp. 48–54, 2021.
[6] S. Yang, H. H. Yin, R. W. Yeung, X. Xiong, Y. Huang, L. Ma, M. Li, and C. Tang, “On scalable network communication for infrastructure-vehicle collaborative autonomous driving,” IEEE Open Journal of Vehicular Technology, vol. 4, pp. 310–324, 2022.
[7] Z. Ma, M. Xiao, Y. Xiao, Z. Pang, H. V. Poor, and B. Vucetic, “High-reliability and low-latency wireless communication for Internet of things: Challenges, fundamentals, and enabling technologies,” IEEE Internet of Things Journal, vol. 6, no. 5, pp. 7946–7970, 2019.
[8] H. Alghafari and M. S. Haghighi, “Decentralized joint resource allocation and path selection in multi-hop integrated access backhaul 5G networks,” Computer Networks, vol. 207, p. 108837, 2022.
[9] D. C. Salyers, A. D. Striegel, and C. Poellabauer, “Wireless reliability: Rethinking 802.11 packet loss,” in 2008 International Symposium on a World of Wireless, Mobile and Multimedia Networks. IEEE, 2008, pp. 1–4.
[10] E. Tanghe, W. Joseph, L. Verloock, and L. Martens, “Evaluation of vehicle penetration loss at wireless communication frequencies,” IEEE Transactions on Vehicular Technology, vol. 57, no. 4, pp. 2036–2041, 2008.
[11] A. M. Al-Jubari, M. Othman, B. Mohd Ali, and N. A. W. Abdul Hamid, “TCP performance in multi-hop wireless ad hoc networks: challenges and solution,” EURASIP Journal on Wireless Communications and Networking, vol. 2011, no. 1, pp. 1–25, 2011.
[12] C. Fischione, M. Butussi, K. H. Johansson, and M. D’angelo, “Power and rate control with outage constraints in CDMA wireless networks,” IEEE Transactions on Communications, vol. 57, no. 8, pp. 2225–2229, 2009.
[13] R. Fantacci, D. Marabissi, D. Tarchi, and I. Habib, “Adaptive modulation and coding techniques for OFDMA systems,” IEEE Transactions on Wireless Communications, vol. 8, no. 9, pp. 4876–4883, 2009.
[14] Y. Tian, K. Xu, and N. Ansari, “TCP in wireless environments: problems and solutions,” IEEE Communications Magazine, vol. 43, no. 3, pp. S27–S32, 2005.
[15] T. Kanematsu, K. Sanada, Z. Li, T. Pei, Y.-J. Choi, K. Nguyen, and H. Sekiya, “Throughput and delay analysis for IEEE 802.11 multi-hop networks considering data rate,” International Journal of Distributed Sensor Networks, vol. 16, no. 9, p. 1550147720959262, 2020.
[16] S. Yang and R. W. Yeung, “Coding for a network coded fountain,” in 2011 IEEE International Symposium on Information Theory (ISIT). IEEE, 2011, pp. 2647–2651.
[17] ——, “Batched sparse codes,” IEEE Transactions on Information Theory, vol. 60, no. 9, pp. 5322–5346, 2014.
[18] S. Yang and Q. Zhou, “Tree analysis of BATS codes,” IEEE Communications Letters, vol. 20, no. 1, pp. 37–40, 2015.
[19] S. Yang, T.-C. Ng, and R. W. Yeung, “Finite-length analysis of BATS codes,” IEEE Transactions on Information Theory, vol. 64, no. 1, pp. 322–348, 2017.
[20] X. Zhang, H. Jiang, L. Zhang, C. Zhang, Z. Wang, and X. Chen, “An energy-efficient ASIC for wireless body sensor networks in medical applications,” IEEE Transactions on Biomedical Circuits and Systems, vol. 4, no. 1, pp. 11–18, 2009.
[21] F. Karray, M. W. Jmal, M. Abid, M. S. BenSaleh, and A. M. Obeid, “A review on wireless sensor node architectures,” in 2014 9th International Symposium on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC). IEEE, 2014, pp. 1–8.
[22] P. Hailes, L. Xu, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, “A survey of FPGA-based LDPC decoders,” IEEE Communications Surveys & Tutorials, vol. 18, no. 2, pp. 1098–1122, 2015.
[23] Y. Fang, G. Bi, Y. L. Guan, and F. C. Lau, “A survey on protograph LDPC codes and their applications,” IEEE Communications Surveys & Tutorials, vol. 17, no. 4, pp. 1989–2016, 2015.
[24] D. S. Lun, M. Médard, R. Koetter, and M. Effros, “On coding for reliable communication over packet networks,” Physical Communication, vol. 1, no. 1, pp. 3–20, 2008.
[25] Y. Wu, “A trellis connectivity analysis of random linear network coding with buffering,” in 2006 IEEE International Symposium on Information Theory. IEEE, 2006, pp. 768–772.
[26] A. F. Dana, R. Gowaikar, R. Palanki, B. Hassibi, and M. Effros, “Capacity of wireless erasure networks,” IEEE Transactions on Information Theory, vol. 52, no. 3, pp. 789–804, 2006.
[27] K. Mahdaviani, M. Ardakani, H. Bagheri, and C. Tellambura, “Gamma codes: A low-overhead linear-complexity network coding solution,” in 2012 International Symposium on Network Coding (NetCod). IEEE, 2012, pp. 125–130.
[28] B. Tang and S. Yang, “An LDPC approach for chunked network codes,” IEEE/ACM Transactions on Networking, vol. 26, no. 1, pp. 605–617, 2018.
[29] B. Tang, S. Yang, Y. Yin, B. Ye, and S. Lu, “Expander graph based overlapped chunked codes,” in 2012 IEEE International Symposium on Information Theory Proceedings. IEEE, 2012, pp. 2451–2455.
[30] S. Yang, J. Ma, and X. Huang, “Multi-hop underwater acoustic networks based on BATS codes,” in Proceedings of the 13th International Conference on Underwater Networks & Systems, 2018, pp. 1–5.
[31] X. Xu, Y. Zeng, Y. L. Guan, and L. Yuan, “Expanding-window BATS code for scalable video multicasting over erasure networks,” IEEE Transactions on Multimedia, vol. 20, no. 2, pp. 271–281, 2017.
[32] Z. Huakai, D. Guangliang, and L. Haitao, “Simplified BATS codes for deep space multihop networks,” in 2016 IEEE Information Technology, Networking, Electronic and Automation Control Conference. IEEE, 2016, pp. 311–314.
[33] R. Tanner, “A recursive approach to low complexity codes,” IEEE Transactions on Information Theory, vol. 27, no. 5, pp. 533–547, 1981.
[34] R. Gallager, “Low-density parity-check codes,” IRE Transactions on Information Theory, vol. 8, no. 1, pp. 21–28, 1962.
[35] D. J. MacKay and R. M. Neal, “Near Shannon limit performance of low density parity check codes,” Electronics Letters, vol. 33, no. 6, pp. 457–458, 1997.
[36] D. J. MacKay, “Good error-correcting codes based on very sparse matrices,” IEEE Transactions on Information Theory, vol. 45, no. 2, pp. 399–431, 1999.
[37] R. M. Tanner, “On quasi-cyclic repeat-accumulate codes,” in Proceedings of the Annual Allerton Conference on Communication Control and Computing, vol. 37. Citeseer, 1999, pp. 249–259.
[38] R. Smarandache and P. O. Vontobel, “Quasi-cyclic LDPC codes: Influence of proto- and Tanner-graph structure on minimum Hamming distance upper bounds,” IEEE Transactions on Information Theory, vol. 58, no. 2, pp. 585–607, 2012.
[39] X. Wu, X. You, and C. Zhao, “A necessary and sufficient condition for determining the girth of quasi-cyclic LDPC codes,” IEEE Transactions on Communications, vol. 56, no. 6, pp. 854–857, 2008.
[40] M. Karimi and A. H. Banihashemi, “On the girth of quasi-cyclic protograph LDPC codes,” IEEE Transactions on Information Theory, vol. 59, no. 7, pp. 4542–4552, 2013.
[41] J. Li, K. Liu, S. Lin, and K. Abdel-Ghaffar, “Algebraic quasi-cyclic LDPC codes: Construction, low error-floor, large girth and a reduced-complexity decoding scheme,” IEEE Transactions on Communications, vol. 62, no. 8, pp. 2626–2637, 2014.
[42] J. Qing, H. H. Yin, and R. W. Yeung, “Enhancing the decoding rates of BATS codes by learning with guided information,” in 2022 IEEE International Symposium on Information Theory (ISIT). IEEE, 2022, pp. 37–42.
[43] D. J. MacKay, “Fountain codes,” IEE Proceedings-Communications, vol. 152, no. 6, pp. 1062–1068, 2005.
[44] T. Ho, M. Médard, R. Koetter, D. R. Karger, M. Effros, J. Shi, and B. Leong, “A random linear network coding approach to multicast,” IEEE Transactions on Information Theory, vol. 52, no. 10, pp. 4413–4430, 2006.
[45] H. H. Yin, K. H. Ng, A. Z. Zhong, R. W. Yeung, S. Yang, and I. Y. Chan, “Intrablock interleaving for batched network coding with blockwise adaptive recoding,” IEEE Journal on Selected Areas in Information Theory, vol. 2, no. 4, pp. 1135–1149, 2021.
[46] H. H. Yin, B. Tang, K. H. Ng, S. Yang, X. Wang, and Q. Zhou, “A unified adaptive recoding framework for batched network coding,” IEEE Journal on Selected Areas in Information Theory, vol. 2, no. 4, pp. 1150–1164, 2021.
[47] H. H. Yin, H. W. Wong, M. Tahernia, and J. Qing, “Packet size optimization for batched network coding,” in 2022 IEEE International Symposium on Information Theory (ISIT). IEEE, 2022, pp. 1584–1589.
[48] B. Dai, S. Ding, and G. Wahba, “Multivariate Bernoulli distribution,” Bernoulli, pp. 1465–1483, 2013.
[49] Y.-k. Choi, Y. Chi, W. Qiao, N. Samardzic, and J. Cong, “HBM Connect: High-performance HLS interconnect for FPGA HBM,” in The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2021, pp. 116–126.
[50] D. E. Hocevar, “A reduced complexity decoder architecture via layered decoding of LDPC codes,” in IEEE Workshop on Signal Processing Systems, 2004. SIPS 2004. IEEE, 2004, pp. 107–112.
[51] S. Kim, G. E. Sobelman, and H. Lee, “A reduced-complexity architecture for LDPC layered decoding schemes,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 6, pp. 1099–1103, 2010.
[52] K. Zhang, X. Huang, and Z. Wang, “High-throughput layered decoder implementation for quasi-cyclic LDPC codes,” IEEE Journal on Selected Areas in Communications, vol. 27, no. 6, pp. 985–994, 2009.
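The derivation that follows peels one index $\alpha\in\mathcal{J}$ off the conditioning event. Its equalities hold for any joint law of the binary variables $C_i$ with positive conditioning probabilities, while step (a) is valid exactly when $P(C_{\alpha}=1 \mid C_{i}=1, C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\}) \geq P(C_{\alpha}=1 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})$. As a rough sanity check, the Python sketch below evaluates both ends of the equality chain on randomly generated joint distributions over four binary variables (treated as a generic multivariate Bernoulli law). The helpers `random_joint`, `prob`, `cond`, `derivation_terms` and the chosen indices are hypothetical illustrations, not part of the paper, and the random distributions do not model any particular BATS code; step (b) is not checked here.

```python
import itertools
import random


def random_joint(n, seed):
    """Random strictly positive pmf over {0,1}^n (a generic multivariate Bernoulli law)."""
    rng = random.Random(seed)
    outcomes = list(itertools.product((0, 1), repeat=n))
    weights = [rng.random() + 1e-3 for _ in outcomes]
    total = sum(weights)
    return {x: w / total for x, w in zip(outcomes, weights)}


def prob(pmf, event):
    """P(C_k = v for every (k, v) in event)."""
    return sum(p for x, p in pmf.items() if all(x[k] == v for k, v in event.items()))


def cond(pmf, event, given):
    """P(event | given); event and given are dicts index -> value."""
    return prob(pmf, {**given, **event}) / prob(pmf, given)


def derivation_terms(pmf, i, alpha, J):
    """Evaluate both ends of the equality chain and the two conditionals compared in step (a)."""
    rest = {j: 0 for j in J if j != alpha}          # C_j = 0, j in J \ {alpha}
    full = {j: 0 for j in J}                        # C_j = 0, j in J
    lhs = cond(pmf, {i: 1}, full)                   # P(C_i=1 | C_j=0, j in J)
    p_i = cond(pmf, {i: 1}, rest)                   # P(C_i=1 | C_j=0, j in J \ {alpha})
    p_a0 = cond(pmf, {alpha: 0}, rest)
    p_a1 = cond(pmf, {alpha: 1}, rest)
    p_a1_given_i = cond(pmf, {alpha: 1}, {**rest, i: 1})
    rhs = p_i / p_a0 - p_i / p_a0 * p_a1_given_i    # last equality before step (a)
    return lhs, rhs, p_i, p_a1, p_a1_given_i


if __name__ == "__main__":
    i, alpha, J = 0, 1, [1, 2, 3]                   # hypothetical indices: alpha in J, i not in J
    for seed in range(5):
        lhs, rhs, p_i, p_a1, p_a1_i = derivation_terms(random_joint(4, seed), i, alpha, J)
        premise_a = p_a1_i >= p_a1
        print(f"seed={seed}: identity err={abs(lhs - rhs):.1e}, "
              f"(a) premise={premise_a}, lhs<=P(C_i=1|rest): {lhs <= p_i + 1e-12}")
```

Whenever the premise of step (a) fails for a random distribution, the sketch also reports that $P(C_{i}=1 \mid C_{j}=0,\ j\in\mathcal{J})$ exceeds $P(C_{i}=1 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})$, which illustrates why the derivation needs that condition.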

$$
\begin{aligned}
P(C_{i}=1 \mid C_{j}=0,\ j\in\mathcal{J})
&=\frac{P(C_{i}=1,C_{\alpha}=0 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})}{P(C_{\alpha}=0 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})}\\
&=\frac{P(C_{i}=1 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})}{P(C_{\alpha}=0 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})}\\
&\quad-\frac{P(C_{i}=1,C_{\alpha}=1 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})}{P(C_{\alpha}=0 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})}\\
&=\frac{P(C_{i}=1 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})}{P(C_{\alpha}=0 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})}\\
&\quad-\frac{P(C_{i}=1 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})}{P(C_{\alpha}=0 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})}\\
&\quad\times P(C_{\alpha}=1 \mid C_{i}=1,C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})\\
&\overset{\text{(a)}}{\leq}\frac{P(C_{i}=1 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})}{P(C_{\alpha}=0 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})}\\
&\quad-\frac{P(C_{i}=1 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})}{P(C_{\alpha}=0 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})}\\
&\quad\times P(C_{\alpha}=1 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})\\
&=P(C_{i}=1 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})\\
&\quad\times\frac{1-P(C_{\alpha}=1 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})}{P(C_{\alpha}=0 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})}\\
&=P(C_{i}=1 \mid C_{j}=0,\ j\in\mathcal{J}\backslash\{\alpha\})\\
&\overset{\text{(b)}}{\leq}P(C_{i}=1),
\end{aligned}
$$