Design and Analysis of Massive Uncoupled Unsourced Random Access with Bayesian Joint Decoding

Feiyan Tian, Xiaoming Chen, Yong Liang Guan, and Chau Yuen

Feiyan Tian and Xiaoming Chen are with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China (e-mail: {tian_feiyan, chen_xiaoming}@zju.edu.cn). Yong Liang Guan and Chau Yuen are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798 (e-mail: {eylguan, chau.yuen}@ntu.edu.sg).

Abstract

In this paper, we investigate unsourced random access for massive machine-type communications (mMTC) in sixth-generation (6G) wireless networks. Firstly, we establish a high-efficiency uncoupled framework for massive unsourced random access without extra parity check bits. Then, we design a low-complexity Bayesian joint decoding algorithm, including codeword detection and stitching. In particular, we present a Bayesian codeword detection approach by exploiting Bayes-optimal divergence-free orthogonal approximate message passing in the case of unknown priors. The long-term channel statistics output by the detector are then leveraged to stitch codewords and recover the original message. Thus, the spectral efficiency is improved by avoiding the use of parity bits. Moreover, we analyze the performance of the proposed Bayesian joint decoding-based massive uncoupled unsourced random access scheme in terms of computational complexity and decoding error probability. Furthermore, by asymptotic analysis, we obtain some useful insights for the design of massive unsourced random access. Finally, extensive simulation results confirm the effectiveness of the proposed scheme in 6G wireless networks.

Index Terms:

6G, mMTC, unsourced random access, Bayesian joint decoding.

I Introduction

The sixth-generation (6G) wireless networks are expected to provide various machine-type communication services for Internet-of-Things (IoT) [1, 2, 3]. Generally, machine-type communications in 6G wireless networks have the demands of massive connectivity and small payload. It is predicted that by 2030, the number of IoT devices will approach hundreds of billions and the length of a message is usually tens of bits. For such massive machine-type communications (mMTC), applying traditional grant-based random access schemes may lead to exceedingly high access latency and prohibitive signaling overhead [4]. To address these challenging issues, it is imperative to introduce grant-free random access schemes for supporting massive access according to the characteristics of mMTC in 6G wireless networks [5, 6, 7].

As a new grant-free random access scheme, unsourced random access overcomes the disadvantages of sourced grant-free random access: the preliminary acquisition of device activity information and channel state information is eliminated, so that the end-to-end delay is decreased and the spectral efficiency is increased [8, 9]. To be specific, in the unsourced random access protocol, the base station (BS) only focuses on the recovery of transmitted messages, but is not concerned about the identities and channel states of the active devices which sent the messages [10, 11, 12]. In this context, sending unique long pilots in advance is not necessary for device detection and channel estimation, which reduces the consumption of wireless resources significantly in the case of massive connectivity. In fact, the biggest advantage of unsourced random access over sourced random access is that the millions of individual codebooks of devices are avoided [8]. Hence, all devices can share the same codebook in unsourced random access. The message of each active device is transmitted over the uplink channel after mapping to a codeword from this common codebook. Subsequently, the BS performs codeword activity detection and recovers the list of transmitted codewords. Note that a codebook needs to contain at least $2^{b}$ codewords to represent a $b$ -bit message. As the message length $b$ increases, the codebook size $2^{b}$ grows exponentially, resulting in unbearable computational complexity for codeword activity detection even with short messages of tens of bits. In order to reduce the computational complexity of codeword activity detection, a divide-and-send approach is introduced, in which each segment is sent independently based on a small codebook [13].

Recently, the authors of [9] proposed a low-complexity scheme for unsourced random access based on a coupled compressed sensing problem. In their scheme, the message transmission slot is divided into several sub-slots and the message fragments are transmitted by each active device across these sub-slots. Specifically, at the sender, the message is first divided into multiple fragments by an outer encoder, with redundancy attached to form fixed-length sub-blocks. Then an inner encoder maps each sub-block to a codeword via a common codebook matrix. At the BS, the inner decoder identifies which codewords have been transmitted and a tree-based outer decoder connects the decoded sub-blocks to recover the original long messages according to the redundant parity. Based on the framework proposed in [9], the authors of [14] extended the system model from a single-antenna receiver to a large-scale receive antenna array, which widens the application scenarios of coupled unsourced random access. In [15], approximate message passing (AMP) was exploited as an inner decoder to realize codeword activity detection. Additionally, codeword activity detection of unsourced random access in the inner decoder was formulated as a non-Bayesian maximum likelihood (ML) problem in [16], where this scheme was shown to be more stable than the Bayesian AMP schemes. In [17], a beam-space tree decoder was proposed to improve the decoding performance by exploiting the beam division property. On the other hand, the authors of [18] improved the unsourced random access framework in [9] by letting the inner and outer decoders cooperate and pass information back and forth to enhance the performance of coded compressed sensing.

Meanwhile, in [19], the message of each active device was divided into just two parts. The first part determines the transmission rule and the second part is encoded by a low density parity check (LDPC) code. The involved sparse joint Tanner graph can provide some improvements in performance. Similarly, a scheme in which some message bits were mapped to pilot and spreading sequences while the other bits were processed by a polar code and QPSK modulation was studied in [35], providing good solutions for supporting a large number of active UEs with finite block length. In addition, the authors of [36] proposed an elegant semi-blind detection framework based on the bilinear generalized approximate message passing algorithm, which can support both sourced and unsourced random access.

Although the redundancy introduced by coupled unsourced random access is limited compared with the pilot sequences in sourced random access, the additional parity bits needed to ensure the correctness and uniqueness of the outer decoder output usually occupy half or more of each sub-block. The resulting low spectral efficiency is intolerable when wireless resources are scarce. In this context, a distinct uncoupled compressed sensing-based unsourced random access scheme was proposed in [20]. At the sender, no check bits are attached to the message, and the tree-based outer decoder is replaced by a clustering decoder, where the disordered sub-blocks are connected by leveraging the inherent correlations of the instantaneous channels of the same active device across sub-slots. Further, the authors of [21] studied an unsourced random access scheme that exploits the angular domain sparsity of the channel within this uncoupled framework, which achieves better error performance with high spectral efficiency.

In practice, codeword connection by clustering instantaneous channels may be prone to errors because the channel state does not remain constant over a long time slot. Intuitively, it is more reasonable to assume that the channel statistics are constant over such a slot. Hence, it makes sense to exploit the channel statistics to implement the codeword stitching. Besides, the scheme in [21] is limited to 3D channel modeling in the angle domain, leading to higher implementation complexity. In other words, existing methods trade complexity for the decoding performance of unsourced random access. In addition, to the best of the authors’ knowledge, the error performance analysis of massive unsourced random access is still an open issue.

To improve the performance of unsourced random access schemes, we aim to provide a high-efficiency and low-complexity solution for uncoupled unsourced random access in 6G wireless networks. The contributions of this paper are as follows:

1. We provide a high-efficiency unsourced random access scheme inspired by the recent uncoupled framework in which parity bits are avoided but channel characteristics are exploited for codeword concatenation.

2. We design a low-complexity Bayesian joint decoding algorithm, which implements codeword detection in the case of unknown priors and codeword stitching with the assistance of channel statistics.

3. We analyze the overall performance of the proposed Bayesian joint decoding-based uncoupled unsourced random access scheme, and confirm its low complexity and high reliability.

4. We obtain some useful insights by asymptotic analysis and prove that the error probability of codeword detection tends to zero by increasing the number of BS antennas and transmit power.

The rest of this paper is organized as follows: Section II introduces the uncoupled unsourced random access-based 6G mMTC model. Section III designs a Bayesian joint decoding algorithm for the proposed massive uncoupled unsourced random access scheme. Section IV analyzes the convergence, complexity and error performance of the proposed decoding algorithm and provides some useful insights via asymptotic analysis. Then, simulation results are given in Section V to evaluate the effectiveness of the proposed scheme. Finally, Section VI concludes the paper.

Notations: Bold upper (lower) letters denote matrices (column vectors), $(\cdot)^{T}$ denotes transpose, $(\cdot)^{H}$ denotes conjugate transpose, $\mathbb{C}^{a\times b}$ denotes a complex matrix or vector of dimension $a\times b$ , $\mathcal{CN}(\textbf{x},\textbf{Y})$ denotes the complex Gaussian distribution of a vector with mean x and covariance Y, $\textmd{Pr}(\cdot)$ denotes the probability of an event, $\mathbb{E}\{\cdot\}$ denotes expectation, $\exp(\cdot)$ denotes the exponent, $[\textbf{X}]_{a,b}$ , $[\textbf{X}]_{a,:}$ and $[\textbf{X}]_{:,b}$ denote the $(a,b)$ -th element, $a$ -th row and $b$ -th column of matrix X, respectively, $\|\cdot\|_{2}$ and $\|\cdot\|_{F}$ denote the 2-norm of a vector and Frobenius norm of a matrix respectively, and $\textmd{tr}(\cdot)$ denotes the trace of a matrix. $[\mathcal{L}]_{k}$ denotes the $k$ -th element of set $\mathcal{L}$ .

II System Model

Consider an unsourced random access protocol-based single-cell 6G mMTC system, where a BS equipped with $M$ antennas serves $K_{\textmd{tot}}$ single-antenna IoT user equipments (UEs) over the same time-frequency resource block. Due to the sporadic traffic of IoT applications, only a small set of $K_{a}(K_{a}\ll K_{\textrm{tot}})$ UEs, denoted by $\mathcal{K}_{a}$ , are active in a certain time slot. Active UE $k$ transmits a binary message $\bm{m}_{k}$ of $b$ bits to the BS and the transmitted message set is represented by $\mathcal{L}=\{\bm{m}_{k}:k\in\mathcal{K}_{a}\}$ . To reduce the complexity of the receiver, the $b$ -bit message $\bm{m}_{k}$ of the $k$ -th $(k\in\mathcal{K}_{a})$ active UE is divided into $L$ short sub-blocks of length $J$ with $b=LJ$ , such that a small codebook can be employed. Meanwhile, a time slot is partitioned into $L$ sub-slots of duration $n_{0}$ symbols each. According to the principle of unsourced random access, all UEs share the same codebook. Each active UE maps its sub-blocks to codewords based on a common codeword selection scheme. Specifically, let $\textbf{C}=[\bm{c}_{1},...,\bm{c}_{2^{J}}]\in\mathbb{C}^{n_{0}\times 2^{J}}$ be the codebook with each column $\{\bm{c}_{i}\in\mathbb{C}^{n_{0}\times 1},\ i\in[1,2^{J}]\}$ representing a codeword with unit norm $\|\bm{c}_{i}\|^{2}=1$ . In the $l$ -th ( $l\in[1,L]$ ) sub-slot, the $l$ -th sub-block of the $k$ -th ( $k\in\mathcal{K}_{a}$ ) active UE is mapped to integer $i_{k,l}$ and the $i_{k,l}$ -th codeword $\bm{c}_{i_{k,l}}$ will be transmitted. For a $J$ -bit binary sub-block, there are $2^{J}$ possible combinations, thus $i_{k,l}\in[1,2^{J}]$ . Herein, the splitting of the message and the mapping to codewords at the active UE terminal are called encoding.
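The encoding step described above can be sketched in a few lines. This is only an illustration: the rule "binary value of the sub-block plus one" used to obtain $i_{k,l}$ is an assumption made here (the text only requires some common mapping into $[1,2^{J}]$), and the message and the function name `encode_message` are hypothetical.

```python
def encode_message(bits, J):
    """Split a bit string into J-bit sub-blocks and map each to a 1-based index."""
    if len(bits) % J != 0:
        raise ValueError("message length b must equal L * J")
    sub_blocks = [bits[i:i + J] for i in range(0, len(bits), J)]
    # Assumed mapping: read the sub-block as an integer in [0, 2^J - 1],
    # then shift it into the range [1, 2^J] used for codeword indices.
    return [int(block, 2) + 1 for block in sub_blocks]


message = "101100011110"                 # hypothetical b = 12-bit message
indices = encode_message(message, J=4)   # L = 3 sub-blocks, one per sub-slot

# Codebook columns needed with and without splitting the message.
full_codebook_size = 2 ** len(message)
small_codebook_size = 2 ** 4
```

With $b=12$ and $J=4$, the per-sub-slot codebook has 16 columns instead of 4096, which is the complexity saving that motivates the split.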

After mapping, all active UEs synchronously transmit their codewords to the BS over $L$ sub-slots in sequence. Hence, the received signal $\textbf{Y}_{l}\in\mathbb{C}^{n_{0}\times M}$ at the BS in the $l$ -th sub-slot can be expressed as

$$\textbf{Y}_{l}=\sum\limits_{k\in\mathcal{K}_{a}}\bm{c}_{i_{k,l}}\bm{h}^{T}_{k,l}+\textbf{Z}_{l}=\textbf{C}\bm{\Delta}_{l}\textbf{H}_{l}+\textbf{Z}_{l}=\textbf{C}\textbf{X}_{l}+\textbf{Z}_{l}, \tag{1}$$

where $\bm{c}_{i_{k,l}}$ denotes the codeword sent by the $k$ -th active UE in the $l$ -th sub-slot. $\bm{h}_{k,l}\in\mathbb{C}^{M\times 1}$ is the channel vector from UE $k$ to the BS in the $l$ -th sub-slot. It follows Rayleigh block fading across sub-slots, i.e., $\bm{h}_{k,l}\sim\mathcal{CN}(\bm{0},\tilde{g}_{k}\mathbf{I})$ with $\tilde{g}_{k}$ being the large-scale fading coefficient, which is unknown at the BS. $\textbf{Z}_{l}$ represents the additive white Gaussian noise (AWGN) with zero mean and variance $\sigma^{2}=N_{0}/(n_{0}P_{t})$ , where $N_{0}$ is the noise power and $P_{t}$ is the per-symbol maximum transmit power. By arrangement, we let $\textbf{H}_{l}=[\bm{h}_{1,l},\bm{h}_{2,l},...,\bm{h}_{{K_{\textrm{tot}}},l}]^{T}$ . $\bm{\Delta}_{l}\in\{0,1\}^{2^{J}\times K_{\textrm{tot}}}$ is a binary codeword activity indicator matrix whose entry in the $i_{k,l}$ -th row and the $k$ -th ( $k\in\mathcal{K}_{a}$ ) column equals 1, while all other entries equal 0. With the received signals, the BS detects the active slot-wise codewords $\bm{c}_{i_{k,l}}$ and stitches the codewords across sub-slots to recover the messages $\bm{m}_{k}$ . In the following section, we design a decoding algorithm for the BS to achieve this goal.
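To make the signal model (1) concrete, the following minimal sketch generates one sub-slot of received data in pure Python. The Gaussian random codebook, the parameter values, and the helper names (`complex_gaussian`, `random_codebook`, `received_signal`) are assumptions chosen for illustration; the text only requires unit-norm codewords.

```python
import random


def complex_gaussian(variance, rng):
    """Draw one circularly symmetric CN(0, variance) sample."""
    s = (variance / 2.0) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def random_codebook(n0, n_codewords, rng):
    """Return C as a list of n0 rows; every column is scaled to unit norm."""
    cols = []
    for _ in range(n_codewords):
        col = [complex_gaussian(1.0, rng) for _ in range(n0)]
        norm = sum(abs(c) ** 2 for c in col) ** 0.5
        cols.append([c / norm for c in col])
    return [[cols[j][n] for j in range(n_codewords)] for n in range(n0)]


def received_signal(C, indices, gains, M, sigma2, rng):
    """Return (Y, X) for one sub-slot, following Y = C X + Z in (1)."""
    n0, n_codewords = len(C), len(C[0])
    # X = Delta * H: row i_k holds the channel of the UE that picked codeword i_k.
    X = [[0j] * M for _ in range(n_codewords)]
    for i_k, g_k in zip(indices, gains):
        for m in range(M):
            X[i_k - 1][m] += complex_gaussian(g_k, rng)
    Y = []
    for n in range(n0):
        row = []
        for m in range(M):
            signal = sum(C[n][j] * X[j][m] for j in range(n_codewords))
            row.append(signal + complex_gaussian(sigma2, rng))
        Y.append(row)
    return Y, X


rng = random.Random(0)
C = random_codebook(n0=8, n_codewords=16, rng=rng)
Y, X = received_signal(C, indices=[3, 7, 12], gains=[1.0, 0.5, 2.0],
                       M=4, sigma2=0.01, rng=rng)
active_rows = [j + 1 for j, row in enumerate(X) if any(x != 0 for x in row)]
```

Only the rows of `X` selected by the active UEs are non-zero, so the unknown matrix is row-sparse and its size depends on $2^{J}$ and $M$ rather than on $K_{\textrm{tot}}$.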

III Design of Bayesian Joint Decoding

In this section, we design a Bayesian joint decoding algorithm to recover the original messages from the received signals at the BS. Specifically, the decoder detects the transmitted codewords of every sub-slot based on noisy observation $\textbf{Y}_{l}$ and common codebook C without prior information including the number of active UEs $K_{a}$ , channel fading coefficient $\tilde{g}_{k}$ , and noise variance $\sigma^{2}$ . Then, the acquired codewords are unmapped to sub-blocks and stitched together to obtain original messages. The details of the Bayesian joint decoding algorithm are provided below.

III-A The preparation of decoding

Assumption: In this paper, to facilitate decoding design and performance analysis, it is assumed that the transmitted codewords within a certain sub-slot do not collide with each other due to suitable parameter settings¹,².

¹ In general, the $2^{J}$ codewords in the common codebook are selected by $K_{a}$ active UEs independently across $L$ sub-slots. The codeword collision probability can be computed as $\textmd{Pr}(\textmd{collision})=1-((1-2^{-J})^{(K_{a}-1)})^{L}$ . For convenience of decoding design and performance analysis, the ideal situation without collision under suitable parameters is considered.

² In the case of codeword collision, the following collision intervention mechanism before codeword stitching is given for reference [22]. When collisions occur in sub-slot $l$ , the number of non-zero rows of the codeword detection output $\hat{\textbf{X}}_{l}$ is less than $K_{a}$ (the $K_{a}$ estimated in a former sub-slot without collisions can be used) and the BS can only recover the superimposed channels of the collided codewords, resulting in the failure of codeword concatenation. At this time, the BS judges which codewords are sent by multiple UEs via energy detection and then feeds the indices of repeatedly transmitted codewords back to all UEs. The UEs that find they are in a collision slide their sub-block $l$ window of length $J$ bits forward with sliding length $0<L_{\rm slide}<J$ , such that the new sequences can be mapped to different codewords. Note that these new codewords are retransmitted and only used to detect the channels of the collided UEs, where the channels no longer overlap and can be used for the stitching of the original codewords.
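As a quick way to judge whether a parameter setting makes the no-collision assumption reasonable, the collision probability expression quoted in the assumption above can be evaluated directly. The function name and the numbers $K_{a}=50$, $J=12$, $L=8$ are hypothetical and chosen only for illustration.

```python
def collision_probability(K_a, J, L):
    """Pr(collision) = 1 - ((1 - 2^-J)^(K_a - 1))^L, as quoted in the text."""
    no_collision_one_slot = (1.0 - 2.0 ** (-J)) ** (K_a - 1)
    return 1.0 - no_collision_one_slot ** L


# Hypothetical setting: 50 active UEs, 12-bit sub-blocks, 8 sub-slots.
p_example = collision_probability(K_a=50, J=12, L=8)
```

Increasing $J$ lowers the probability quickly, at the cost of a codebook with $2^{J}$ columns in every sub-slot.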

Under this assumption, the following approximate distribution can be adopted in decoding. For the matrix $\textbf{X}_{l}=\bm{\Delta}_{l}\textbf{H}_{l}\in\mathbb{C}^{2^{J}\times M}$ , its $j$ -th row follows a Bernoulli-Gaussian distribution, i.e., $\forall j\in[1,2^{J}]$

$$\bm{x}_{j,l}\sim\left\{\begin{array}{ll}\mathcal{CN}(\bm{0},g_{j,l}\textbf{I}),&\textmd{probability}=\varepsilon_{j}\\ \textbf{0},&\textmd{probability}=1-\varepsilon_{j}\end{array}\right., \tag{2}$$

where $\varepsilon_{j}=1-(1-1/2^{J})^{K_{a}}\approx K_{a}/2^{J}$ denotes the non-zero probability of $\bm{x}_{j,l}$ , i.e., codeword activity probability. $g_{j,l}=\sum\nolimits_{k}\delta^{l}_{j,k}\tilde{g}_{k}$ represents the codeword variance with $\delta^{l}_{j,k}$ being the $(j,k)$ -th element of $\bm{\Delta}_{l}$ . For notational simplicity, we omit the sub-slot index $l$ since the codewords transmission and detection are identical in each sub-slot, and denote the above distribution as

$$P_{X}(\bm{x}_{j};\varepsilon_{j},g_{j})=(1-\varepsilon_{j})\delta_{0}+\varepsilon_{j}\mathcal{CN}(\bm{x}_{j};\bm{0},g_{j}\textbf{I}), \tag{3}$$

where $\delta_{0}$ is the Dirac delta at zero. In this context, X can be called a non-binary codeword state matrix and the row non-zero probability is $\varepsilon_{j}$ . Due to the sporadic activation of UEs, matrix X is row-wise sparse. Moreover, the dimensions of the codeword state matrix are independent of the total number of potential UEs $K_{\textmd{tot}}$ . Based on this model setting, the proposed Bayesian joint decoding algorithm containing codeword detection and stitching is implemented as illustrated in Fig. 1.
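The Bernoulli-Gaussian row model can be illustrated with a short sketch that compares the exact activity probability $1-(1-1/2^{J})^{K_{a}}$ with its approximation $K_{a}/2^{J}$ and draws one row from the prior. The function names and the example parameters are hypothetical.

```python
import random


def activity_probability(K_a, J):
    """Return (exact, approximate) codeword activity probability."""
    exact = 1.0 - (1.0 - 2.0 ** (-J)) ** K_a
    approx = K_a / 2.0 ** J
    return exact, approx


def sample_row(eps, g, M, rng):
    """Draw one row: all-zero with prob. 1 - eps, else CN(0, g I)."""
    if rng.random() >= eps:
        return [0j] * M
    s = (g / 2.0) ** 0.5
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(M)]


eps_exact, eps_approx = activity_probability(K_a=50, J=12)
row = sample_row(eps_exact, g=1.0, M=4, rng=random.Random(1))
```

The approximation slightly overestimates the exact value and is tight when $K_{a}\ll 2^{J}$, which is the sparse regime the detector relies on.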

Figure 1: The flow chart of Bayesian joint decoding. The decoder consists of four local modules that work together to recover the original messages $\hat{\bm{m}}_{k}$ sent from multiple active UEs according to the noisy received signals Y. In particular, the OAMP detector including linear estimator $\gamma(\cdot)$ and non-linear estimator $\phi(\cdot)$ aims to detect the codeword state matrix X. The iteration between R and S has been defined in (4) and (5). By leveraging the MMSE estimations $[\pi,\bm{\lambda},\rho]$ of OAMP detector, parameter estimator $\psi(\cdot)$ estimates the unknown system parameters $[\sigma^{2},\varepsilon,g]$ and then feeds them back. With the estimated codeword state matrix $\hat{\textbf{X}}$ and channel statistics $\hat{g}$ , the original messages $\hat{\bm{m}}_{k}$ can be reconstructed in codeword splicer $\mu(\cdot)$ .

III-B The detection of codeword state matrix

Intuitively, the recovery of lists of transmitted codewords is equivalent to the reconstruction of codeword state matrix X, which is a compressed sensing problem because X is row-wise sparse. Further, this is also a multiple measurement vectors (MMV) setup due to multiple BS antennas [23]. By leveraging the sparsity structure of X as prior, in this part, we exploit the Bayes-optimal and divergence-free orthogonal approximate message passing (OAMP) method to detect the codeword state matrix [24, 25, 26]. Meanwhile, since the BS has no knowledge of codeword activity probability $\varepsilon_{j}$ , codeword variance $g_{j}$ and noise variance $\sigma^{2}$ , an expectation maximization (EM) algorithm is adopted to estimate them [27].

Generally, we aim to obtain the minimum mean square error (MMSE) estimation of X based on the received signals (1) and sparsity structure (3). Yet, it is not trivial to address these two constraints jointly. To this end, the proposed OAMP detector works in an iterative way. Specifically, two local modules, i.e., a linear estimator $\gamma(\cdot)$ and a non-linear estimator $\phi(\cdot)$ , process linear constraint (1) and non-linear constraint (3) separately and run iteratively to obtain the final results. Define the iteration between the $\gamma(\cdot)$ and $\phi(\cdot)$ as:

\begin{align}
\textbf{R}^{t}&=\gamma(\textbf{S}^{t-1}), \tag{4}\\
\textbf{S}^{t}&=\phi(\textbf{R}^{t}), \tag{5}
\end{align}

where $t$ is the iteration index. $\textbf{R}^{t}$ in (4) is the a posteriori estimate of X generated by estimator $\gamma(\cdot)$ from the a priori mean $\textbf{S}^{t-1}$ , and $\textbf{S}^{t}$ in (5) is the a posteriori estimate of X generated by estimator $\phi(\cdot)$ from the a priori mean $\textbf{R}^{t}$ .

We now decompose the MMV problem caused by multiple-antenna deployment to simplify the OAMP iterations. For linear constraint (1), since the correlation between different antennas (i.e., different columns of X) is not considered in this paper, matrix operation $\gamma(\textbf{S}^{t-1})$ is equivalent to column-vector operation $\gamma([\textbf{S}^{t-1}]_{:,m}),m\in[1,M]$ . For non-linear constraint (3), because X is row-sparse and channels between different codewords (i.e., different rows of X) are independent, matrix operation $\phi(\textbf{R}^{t})$ is equivalent to row-vector operation $\phi([\textbf{R}^{t}]_{j,:}),j\in[1,2^{J}]$ .

First, we consider the linear estimator $\gamma([\textbf{S}^{t-1}]_{:,m})$ based on the received column-vector signals $[\textbf{Y}]_{:,m}=\textbf{C}[\textbf{X}]_{:,m}+[\textbf{Z}]_{:,m},m\in[1,M]$ . Assume that the Gaussian observation of column vector $[\textbf{X}]_{:,m}$ at the $t$ -th iteration is given by

$$[\textbf{S}^{t}]_{:,m}=[\textbf{X}]_{:,m}+\bm{n}^{t}_{m},\quad m\in[1,M], \tag{6}$$

where $\bm{n}_{m}^{t}\sim\mathcal{CN}(\bm{0},v_{m}^{t}\textbf{I})$ is the complex Gaussian random vector with variance $v_{m}^{t}=\tfrac{1}{2^{J}}\mathbb{E}\{\|[\textbf{S}^{t}]_{:,m}-[\textbf{X}]_{:,m}\|^{2}\}$ . Starting with $[\textbf{S}^{0}]_{:,m}=\bm{0}$ , $v_{m}^{0}=1$ and $t=1$ , the linear MMSE (LMMSE) estimation of $[\textbf{X}]_{:,m}$ at the $t$ -th iteration in a column-by-column way can be computed as

$$[\hat{\textbf{R}}^{t}]_{:,m}=\hat{\textbf{B}}^{t-1}_{m}([\textbf{Y}]_{:,m}-\textbf{C}[\textbf{S}^{t-1}]_{:,m}),\quad m\in[1,M] \tag{7}$$

with

$$\hat{\textbf{B}}^{t-1}_{m}=v_{m}^{t-1}\textbf{C}^{H}\left(v_{m}^{t-1}\textbf{C}\textbf{C}^{H}+(\sigma^{2})^{t-1}\textbf{I}\right)^{-1}. \tag{8}$$

To guarantee the convergence of the iterative algorithm, the output of linear estimator $\gamma(\cdot)$ at the $t$ -th iteration is obtained by applying a scaling and an orthogonalization to the LMMSE estimation as follows ( $m\in[1,M]$ )

$$[\textbf{R}^{t}]_{:,m}=\gamma([\textbf{S}^{t-1}]_{:,m})=[\textbf{S}^{t-1}]_{:,m}+\frac{2^{J}}{\textmd{tr}(\hat{\textbf{B}}^{t-1}_{m}\textbf{C})}[\hat{\textbf{R}}^{t}]_{:,m}, \tag{9}$$

and the corresponding error variance can be calculated as

$$u^{t}_{m}=v_{m}^{t-1}\left[\frac{2^{J}}{\textmd{tr}(\hat{\textbf{B}}^{t-1}_{m}\textbf{C})}-1\right]. \tag{10}$$
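As a sanity check on (8) to (10), consider the special case, assumed here purely for illustration and not required by the scheme, in which the codebook has orthogonal rows, i.e., $\textbf{C}\textbf{C}^{H}=\alpha\textbf{I}$ . Since every codeword has unit norm, $\textmd{tr}(\textbf{C}\textbf{C}^{H})=\textmd{tr}(\textbf{C}^{H}\textbf{C})=2^{J}$ , which gives $\alpha=2^{J}/n_{0}$ . Dropping the indices $m$ and $t-1$ and writing $v$ and $\sigma^{2}$ for the current variances, the quantities above reduce to

$$\hat{\textbf{B}}=\frac{v}{v\alpha+\sigma^{2}}\textbf{C}^{H},\qquad\textmd{tr}(\hat{\textbf{B}}\textbf{C})=\frac{2^{J}v}{v\alpha+\sigma^{2}},\qquad u=v\left[\frac{v\alpha+\sigma^{2}}{v}-1\right]=v(\alpha-1)+\sigma^{2}.$$

Under this assumption, the error variance passed to the non-linear estimator is the noise variance plus a term that scales with the undersampling factor $2^{J}/n_{0}-1$ ; for a square codebook with $n_{0}=2^{J}$ it collapses to $u=\sigma^{2}$ .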

Next, we consider the non-linear estimator $\phi([\textbf{R}^{t}]_{j,:})$ based on the row-sparse structure of X. Assume that the Gaussian observation of row vector $\bm{x}_{j}$ at the $t$ -th iteration is given by

$$\bm{r}_{j}^{t}=[\textbf{R}^{t}]_{j,:}=\bm{x}_{j}+\mathbbm{n}_{j}^{t},\quad j\in[1,2^{J}], \tag{11}$$

where $\mathbbm{n}_{j}^{t}\!\!\sim\!\!\mathcal{CN}(\bm{0},u^{t}\textbf{I})$ is the complex Gaussian random vector with variance $u^{t}=\tfrac{1}{M}\sum\limits_{m=1}^{M}u^{t}_{m}$ . Under this assumption, the MMSE estimations of $\bm{x}_{j}$ in row-by-row way are generated as

\begin{align}
\hat{\bm{s}}^{t}_{j}&=\pi^{t}_{j}\bm{\lambda}^{t}_{j}, \tag{12}\\
\hat{v}^{t}&=\frac{1}{2^{J}}\sum\limits_{j=1}^{2^{J}}\left[\pi^{t}_{j}\left(\rho^{t}_{j}+\|\bm{\lambda}^{t}_{j}\|_{2}^{2}\right)-\|\hat{\bm{s}}^{t}_{j}\|_{2}^{2}\right], \tag{13}
\end{align}

where posterior activity probability $\pi^{t}_{j}$ , Gaussian posterior mean $\bm{\lambda}^{t}_{j}$ and variance $\rho^{t}_{j}$ are calculated as

\begin{align}
\pi^{t}_{j}&=\left[(1/\varepsilon^{t-1}_{j}-1)(1+g^{t-1}_{j}/u^{t})^{M}\exp\left(\frac{-g^{t-1}_{j}\|\bm{r}^{t}_{j}\|_{2}^{2}}{u^{t}(g^{t-1}_{j}+u^{t})}\right)+1\right]^{-1}, \tag{14}\\
\bm{\lambda}^{t}_{j}&=\frac{g^{t-1}_{j}}{g^{t-1}_{j}+u^{t}}\bm{r}^{t}_{j}, \tag{15}\\
\rho^{t}_{j}&=\frac{g^{t-1}_{j}u^{t}}{g^{t-1}_{j}+u^{t}}. \tag{16}
\end{align}
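The per-row posterior quantities in (12) and (14) to (16) can be evaluated as in the following minimal sketch. Working in the log domain for the term raised to the power $M$ is an implementation choice made here to avoid overflow for large $M$; it is not part of the derivation. The function name `bg_posterior` and the test values are hypothetical.

```python
import math


def bg_posterior(r, eps, g, u):
    """Evaluate (14)-(16) and (12) for one row r with M complex entries."""
    M = len(r)
    energy = sum(abs(x) ** 2 for x in r)
    # Logarithm of the first term inside the bracket of (14).
    log_odds = (math.log(1.0 / eps - 1.0)
                + M * math.log1p(g / u)
                - g * energy / (u * (g + u)))
    # pi = 1 / (exp(log_odds) + 1), computed in a numerically stable way.
    if log_odds > 0:
        e = math.exp(-log_odds)
        pi = e / (1.0 + e)
    else:
        pi = 1.0 / (1.0 + math.exp(log_odds))
    shrink = g / (g + u)
    lam = [shrink * x for x in r]      # (15) Gaussian posterior mean
    rho = g * u / (g + u)              # (16) Gaussian posterior variance
    s_hat = [pi * x for x in lam]      # (12) MMSE estimate of the row
    return pi, lam, rho, s_hat


# Hypothetical inputs: an idle row (all zeros) and a clearly active row.
pi_idle, _, _, _ = bg_posterior([0j] * 4, eps=0.1, g=1.0, u=0.1)
pi_active, lam_active, rho_active, _ = bg_posterior([3 + 0j] * 4,
                                                     eps=0.1, g=1.0, u=0.1)
```

A row with little energy is pushed towards zero activity probability, while a row whose energy is far above the noise level $u^{t}$ is declared active and shrunk by the factor $g/(g+u)$ .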

After that, similarly, we carry out the scaling and orthogonalization on MMSE estimations at the $t$ -th iteration to guarantee the convergence of OAMP detector as follows

\begin{align}
v^{t}&=\frac{\hat{v}^{t}u^{t}}{u^{t}-\hat{v}^{t}}, \tag{17}\\
\bm{s}^{t}_{j}&=[\textbf{S}^{t}]_{j,:}=\phi(\bm{r}^{t}_{j})=v^{t}\left[\hat{\bm{s}}^{t}_{j}/\hat{v}^{t}-\bm{r}^{t}_{j}/u^{t}\right]. \tag{18}
\end{align}
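The scaling and orthogonalization in (17) and (18) amount to a few arithmetic operations per row, sketched below. The guard $0<\hat{v}^{t}<u^{t}$ is an assumption added here so that the division in (17) is well defined; the function name and the numbers are hypothetical.

```python
def orthogonalize(s_hat, r, v_hat, u):
    """Apply (17) and (18) to one row; returns (v, s)."""
    if not 0.0 < v_hat < u:
        # Assumed sanity condition: the posterior variance should be
        # smaller than the variance of the input observation.
        raise ValueError("expected 0 < v_hat < u")
    v = v_hat * u / (u - v_hat)                                    # (17)
    s = [v * (sh / v_hat - rj / u) for sh, rj in zip(s_hat, r)]    # (18)
    return v, s


v_new, s_new = orthogonalize(s_hat=[1 + 1j, 0.5 + 0j],
                             r=[1 + 0j, 1 + 0j],
                             v_hat=0.5, u=1.0)
```

In this toy case $v^{t}=1$ and $\bm{s}^{t}_{j}=2\hat{\bm{s}}^{t}_{j}-\bm{r}^{t}_{j}$ , which shows how the input $\bm{r}^{t}_{j}$ is subtracted from the scaled MMSE output.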

The derivations and detailed explanations of the above content can be found in our previous work [28]. Notice that when the iteration between two estimators converges, the MMSE estimation $\hat{\textbf{S}}^{t_{Last}}=[\hat{\bm{s}}^{t_{Last}}_{1},...,\hat{\bm{s}}^{t_{Last}}_{j},...,\hat{\bm{s}}^{t_{Last}}_{2^{J}}]^{T}$ before scaling and orthogonalization is the final estimation of X, i.e., $\hat{\textbf{X}}=\hat{\textbf{S}}^{t_{Last}}$ , where $t_{Last}$ is the index of the last iteration.

III-C The estimation of unknown priors

During the above iteration between $\gamma(\cdot)$ and $\phi(\cdot)$ , prior information including the noise variance $(\sigma^{2})^{t-1}$ , codeword activity probability $\varepsilon_{j}^{t-1}$ and codeword variance $g_{j}^{t-1}$ is required simultaneously. However, it is difficult for the BS to acquire this information accurately and in a timely manner in advance. Therefore, a parameter estimator $\psi(\cdot)$ is needed and combined with $\gamma(\cdot)$ and $\phi(\cdot)$ to realize the codeword detection.

In general, the maximum likelihood (ML) method is often used for parameter estimation [29]. However, the likelihood function cannot be directly solved due to the existence of hidden variables. Fortunately, the existing powerful EM algorithm is able to convert the ML problem into the maximization problem of its lower bound. In this case, the problem is done through an “expectation” step of finding the distribution of the hidden variable and a “maximization” step of maximizing the likelihood function. Inspired by that, we employ an EM algorithm to learn the codeword activity probability $\varepsilon_{j}$ , codeword variance $g_{j}$ and noise variance $\sigma^{2}$ , denoted by $\bm{\omega}\triangleq[\sigma^{2},\varepsilon_{j},g_{j}]$ .

To be specific, we try to obtain the ML estimations of these unknown parameters, i.e., $\hat{\bm{\omega}}=\textmd{arg}\max\limits_{\bm{\omega}}\ln{P(\textbf{Y}|\bm{\omega})}$ . It can be derived through the following two steps:

• Expectation Step: Solve the expectation conditioned on Y with parameters $\bm{\omega}^{t-1}$ .

\begin{align}
Q(\bm{\omega},\bm{\omega}^{t-1})&=\mathbb{E}\{\ln{P(\textbf{Y},\textbf{X}|\bm{\omega})}|\textbf{Y},\bm{\omega}^{t-1}\} \tag{19}\\
&=\int_{\textbf{X}}P(\textbf{X}|\textbf{Y},\bm{\omega}^{t-1})\ln{P(\textbf{Y},\textbf{X}|\bm{\omega})}. \tag{20}
\end{align}

• Maximization Step: Find the maximum of conditional expectation.

$$\bm{\omega}^{t}={\rm arg}\max\limits_{\bm{\omega}}Q(\bm{\omega},\bm{\omega}^{t-1}). \tag{21}$$

The joint optimization of $\bm{\omega}$ is intractable, thus we divide this ML problem into three independent parts, where one parameter is updated at one time while other parameters are fixed.

EM update of noise variance: The elements of AWGN matrix Z follow the i.i.d. Gaussian distribution $P_{\textbf{Z}}(z;\sigma^{2})=\mathcal{CN}(z;0,\sigma^{2})$ . Due to

$$P(\textbf{Y},\textbf{X}|\bm{\omega})=c_{1}P(\textbf{Y}|\textbf{X};\sigma^{2})=c_{1}\prod\limits_{m=1}^{M}P([\textbf{Y}]_{:,m}|\textbf{C}[\textbf{X}]_{:,m};\sigma^{2}) \tag{22}$$

with $c_{1}$ being a constant independent of $\sigma^{2}$ , the EM update can be rewritten as

\begin{align}
(\sigma^{2})^{t}&=\textmd{arg}\max\limits_{\sigma^{2}}\sum\limits_{m=1}^{M}\mathbb{E}\{\ln{P([\textbf{Y}]_{:,m}|\textbf{C}[\textbf{X}]_{:,m};\sigma^{2})}|\textbf{Y},\bm{\omega}^{t-1}\} \notag\\
&=\textmd{arg}\max\limits_{\sigma^{2}}\sum\limits_{m=1}^{M}P(\textbf{C}[\textbf{X}]_{:,m}|\textbf{Y},\bm{\omega}^{t-1}) \tag{23}\\
&\qquad\qquad\ln{P([\textbf{Y}]_{:,m}|\textbf{C}[\textbf{X}]_{:,m};\sigma^{2})}. \tag{24}
\end{align}

Substituting $P([\textbf{Y}]_{:,m}|\textbf{C}[\textbf{X}]_{:,m};\sigma^{2})=\mathcal{CN}([\textbf{Y}]_{:,m};\textbf{C}[\textbf{X}]_{:,m},\sigma^{2}\textbf{I})$ into the above equation and zeroing the derivative, we can derive

$$\sum\limits_{i=1}^{n_{0}}\sum\limits_{m=1}^{M}\int_{X}P([\textbf{CX}]_{i,m}|\textbf{Y},\bm{\omega}^{t-1})\frac{\textmd{d}}{\textmd{d}\sigma^{2}}\ln P([\textbf{Y}]_{i,m}|[\textbf{CX}]_{i,m};\sigma^{2})=0, \tag{25}$$

where $i$ indexes the $n_{0}$ rows of Y.

Therefore, we have

\begin{align}
(\sigma^{2})^{t}&=\frac{1}{M}\sum\limits_{m=1}^{M}\int_{X}\|[\textbf{Y}]_{:,m}-\textbf{C}[\textbf{X}]_{:,m}\|_{2}^{2}P(\textbf{C}[\textbf{X}]_{:,m}|\textbf{Y},\bm{\omega}^{t-1}) \notag\\
&=\frac{1}{M}\sum\limits_{m=1}^{M}\left[\frac{1}{n_{0}}\|[\textbf{Y}]_{:,m}-\textbf{C}[\hat{\textbf{S}}^{t}]_{:,m}\|_{2}^{2}+\hat{v}^{t}\right] \tag{26}
\end{align}

with $\hat{\textbf{S}}^{t}$ and $\hat{v}^{t}$ being MMSE estimations in the non-linear estimator $\phi(\cdot)$ .

EM update of codeword activity probability: Due to

$$P(\textbf{Y},\textbf{X}|\bm{\omega})=c_{2}P_{X}(\textbf{X};\varepsilon,g)=c_{2}\prod\limits_{j=1}^{2^{J}}P_{X}(\bm{x}_{j};\varepsilon_{j},g_{j}) \tag{27}$$

with $c_{2}$ being a constant independent of $\varepsilon$ , the EM update can be rewritten as

\begin{align}
\varepsilon^{t}_{j}&=\textmd{arg}\max\limits_{\varepsilon\in(0,1)}\mathbb{E}\{\ln P(\bm{x}_{j};\varepsilon_{j},g^{t-1}_{j})|\textbf{Y},\bm{\omega}^{t-1}\} \notag\\
&=\textmd{arg}\max\limits_{\varepsilon\in(0,1)}\int_{X}P(\bm{x}_{j}|\textbf{Y},\bm{\omega}^{t-1})\ln P(\bm{x}_{j};\varepsilon_{j},g^{t-1}_{j}). \tag{28}
\end{align}

Substituting $P_{X}(\bm{x}_{j};\varepsilon_{j},g_{j})=(1-\varepsilon_{j})\delta_{0}+\varepsilon_{j}\mathcal{CN}(\bm{x}_{j};\bm{0},g_{j}\textbf{I})$ into the above equation and zeroing the derivative, we can derive

$$\int_{X}P(\bm{x}_{j}|\textbf{Y},\bm{\omega}^{t-1})\frac{\textmd{d}}{\textmd{d}\varepsilon_{j}}\ln P(\bm{x}_{j};\varepsilon_{j},g^{t-1}_{j})=0. \tag{29}$$

Hence, we have

$$\varepsilon^{t}_{j}=\pi^{t}_{j} \tag{30}$$

with $\pi^{t}_{j}=P(\bm{x}_{j}|\textbf{Y},\bm{\omega}^{t-1})$ being the posterior activity probability in the non-linear estimator $\phi(\cdot)$ .

EM update of codeword variance: Similarly, the EM update of $g_{j}$ can be rewritten as

\begin{align}
g^{t}_{j}&=\textmd{arg}\max\limits_{g>0}\mathbb{E}\{\ln P(\bm{x}_{j};\varepsilon^{t-1}_{j},g_{j})|\textbf{Y},\bm{\omega}^{t-1}\} \notag\\
&=\textmd{arg}\max\limits_{g>0}\int_{X}P(\bm{x}_{j}|\textbf{Y},\bm{\omega}^{t-1})\ln P(\bm{x}_{j};\varepsilon^{t-1}_{j},g_{j}). \tag{31}
\end{align}

Zeroing the derivative with respect to $g_{j}$ , we can derive

$$\int_{X}P(\bm{x}_{j}|\textbf{Y},\bm{\omega}^{t-1})\frac{\textmd{d}}{\textmd{d}g_{j}}\ln P(\bm{x}_{j};\varepsilon^{t-1}_{j},g_{j})=0. \tag{32}$$

Thus, we have

$$g^{t}_{j}=\frac{1}{M}\|\bm{\lambda}^{t}_{j}\|_{2}^{2}+\rho^{t}_{j} \tag{33}$$

with $\bm{\lambda}^{t}_{j}$ and $\rho^{t}_{j}$ being the Gaussian posterior mean and variance in the non-linear estimator $\phi(\cdot)$ . So far, the parameter estimator $\left[(\sigma^{2})^{t},\varepsilon_{j}^{t},g^{t}_{j}\right]=\psi\left((\sigma^{2})^{t-1},\varepsilon_{j}^{t-1},g^{t-1}_{j}\right)$ has been designed.
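The closed-form updates (30) and (33) can be illustrated with a minimal Python sketch for a single codeword index $j$. The function name `em_update` and its argument names are hypothetical; the posterior quantities $\pi^{t}_{j}$, $\bm{\lambda}^{t}_{j}$ and $\rho^{t}_{j}$ are assumed to be supplied by the non-linear estimator.

```python
def em_update(pi, lam, rho):
    """One EM step for a single codeword index j (illustrative sketch).

    pi  : posterior activity probability pi_j^t, a float in [0, 1]
    lam : posterior mean vector lambda_j^t, a list of M complex numbers
    rho : posterior variance rho_j^t, a non-negative float
    """
    M = len(lam)
    eps_new = pi                                        # update (30)
    g_new = sum(abs(x) ** 2 for x in lam) / M + rho     # update (33)
    return eps_new, g_new


# Toy input: four antennas, each posterior-mean entry has squared magnitude 2.
eps, g = em_update(0.9, [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], 0.5)
```

In this toy call the average energy of the posterior mean is 2, so the updated variance is 2 plus the posterior variance 0.5.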

Remark: It is worth pointing out that the initial values of these unknown parameters are crucial to the above EM updates. Following related prior work [27], $\bm{\omega}^{0}=\left[(\sigma^{2})^{0},\varepsilon_{j}^{0},g_{j}^{0}\right]$ are set as

		$\displaystyle(\sigma^{2})^{0}=\frac{\|\textbf{Y}\|_{F}^{2}}{n_{0}M\left[(\|\textbf{CX}\|_{F}^{2}/\|\textbf{Z}\|_{F}^{2})^{0}+1\right]},$		(34)
		$\displaystyle\varepsilon_{j}^{0}=\frac{n_{0}}{2^{J}}\max\limits_{c_{3}>0}\frac{1-2^{J+1}\left[(1+c_{3}^{2})\Phi(-c_{3})-c_{3}\varphi(c_{3})\right]/n_{0}}{1+c_{3}^{2}-2\left[(1+c_{3}^{2})\Phi(-c_{3})-c_{3}\varphi(c_{3})\right]},$		(35)
		$\displaystyle g_{j}^{0}=\frac{n_{0}}{2^{J}M}\left\|[\textbf{C}^{H}\textbf{Y}]_{j,:}\right\|_{2}^{2},$		(36)

where $(\|\textbf{CX}\|_{F}^{2}/\|\textbf{Z}\|_{F}^{2})^{0}$ is usually set to 100, and $\Phi(\cdot)$ and $\varphi(\cdot)$ denote the cumulative distribution function (CDF) and probability density function (PDF) of the standard normal distribution, respectively. The iteration process of state detection and parameter estimation continues until the estimated codeword state matrix converges.
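A minimal sketch of the initialization (34)-(36) in plain Python follows. The maximization over $c_{3}$ in (35) is approximated here by a simple grid search, which is an assumption of this sketch rather than a method stated in the text; the function names, the grid range and the grid resolution are likewise hypothetical.

```python
import math


def std_normal_pdf(x):
    """PDF of the standard normal distribution, varphi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """CDF of the standard normal distribution, Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def init_noise_var(y_fro_sq, n0, M, snr0=100.0):
    """(34): y_fro_sq is ||Y||_F^2, snr0 is the assumed initial signal-to-noise ratio."""
    return y_fro_sq / (n0 * M * (snr0 + 1.0))


def init_activity(n0, J, c_max=10.0, steps=10000):
    """(35): grid search over c_3 in (0, c_max] (grid is an assumption of this sketch)."""
    N = 2 ** J
    best = -math.inf
    for i in range(1, steps + 1):
        c = c_max * i / steps
        h = (1.0 + c * c) * std_normal_cdf(-c) - c * std_normal_pdf(c)
        value = (1.0 - 2.0 * N * h / n0) / (1.0 + c * c - 2.0 * h)
        best = max(best, value)
    return n0 / N * best


def init_variance(row_energy, n0, J, M):
    """(36): row_energy is the squared 2-norm of the j-th row of C^H Y."""
    return n0 / (2 ** J * M) * row_energy


eps0 = init_activity(n0=1024, J=12)
sigma2_0 = init_noise_var(y_fro_sq=101.0 * 1024 * 32, n0=1024, M=32)
```

With $n_{0}=1024$ and $J=12$, the grid search returns an initial activity probability well below the ratio $n_{0}/2^{J}=0.25$, as expected for a sparse codeword state matrix.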

III-D The stitching of disordered codewords

Based on the estimated codeword state matrix $\hat{\textbf{X}}_{l}$ in the $l$ -th sub-slot, we can determine the sets of active codewords present in each sub-slot. However, it is currently unknown which active codewords originate from the same active UE, making it impossible to recover their complete messages. To address this issue, we need to classify the disordered active codewords into distinct classes based on specific characteristics. These classes represent groups of active codewords transmitted by the same device, and the characteristics used for classification are referred to as class labels. Once the active codewords belonging to the same class are mapped back to binary sub-blocks and concatenated in the correct chronological order, we can obtain the original messages from each active UE.

Specifically, we set a flexible decision threshold $\theta_{j}$ for each codeword index $j$ to obtain the list of active codewords in each sub-slot. The specific expression of the threshold $\theta_{j}$ is derived in Section IV-B. Denote the list of active codewords of sub-slot $l$ as $\mathcal{L}^{\textmd{ac}}_{l}=\{j:\|\hat{\bm{x}}_{j,l}\|_{2}^{2}>\theta_{j}\},l\in[1,L]$ , where $\hat{\bm{x}}_{j,l}$ is the $j$ -th row of the estimated codeword state matrix $\hat{\textbf{X}}_{l}$ and the indices $j$ in the list are sorted in ascending order. In other words, the $j$ -th codeword in codebook C, for $\|\hat{\bm{x}}_{j,l}\|_{2}^{2}>\theta_{j}$ , is the codeword transmitted by one of the active UEs. Meanwhile, letting $\hat{K}_{a}=|\mathcal{L}^{\textmd{ac}}_{1}|$ , we can obtain $\hat{K}_{a}$ channel vector estimations and channel fading coefficient estimations as follows, $\forall\hat{k}\in[1,\hat{K}_{a}],j=[\mathcal{L}^{\textmd{ac}}_{l}]_{\hat{k}}$ ,

	$\displaystyle\hat{\bm{h}}_{\hat{k},l}$	$\displaystyle=\hat{\bm{x}}_{j,l}=\bm{h}_{\hat{k},l}+\bm{e}\sim\mathcal{CN}\left(\bm{0},\hat{g}_{\hat{k},l}\textbf{I}\right),$		(37)
	$\displaystyle\hat{g}_{\hat{k},l}$	$\displaystyle=\tilde{g}_{\hat{k}}+\hat{v}^{t_{Last}},$		(38)

where the unknown true $\tilde{g}_{\hat{k}}$ needs to be replaced with the EM update result $g_{j,l}^{t_{Last}}(j=[\mathcal{L}^{\textmd{ac}}_{l}]_{\hat{k}})$ and the estimation error is $\bm{e}\sim\mathcal{CN}(\bm{0},\hat{v}^{t_{Last}}\textbf{I})$ . $g_{j,l}^{t_{Last}}$ and $\hat{v}^{t_{Last}}$ can be found in (33) and (13), respectively. Owing to the flexible setting of thresholds in Section IV-B, the proposed Bayesian codeword detection can achieve high detection accuracy under favorable conditions, and it is therefore assumed that $\hat{K}_{a}$ is accurately estimated. As a supplement, we conduct experiments in Section V to evaluate the influence of an inaccurate $\hat{K}_{a}$ on the algorithm performance.

With this estimated information, we employ a Bayesian classification approach to stitch the codewords sent from the same active UE and recover its original long message [30, 31]. Based on the intrinsic correlations between the slot-wise channels experienced by a certain active UE, we pick $L$ codewords from $L$ lists of active codewords in sequence for each active UE. Codewords in one list must be selected by different active UEs due to the no-collision transmission.

First, let $\hat{K}_{a}$ codewords in $\mathcal{L}^{\textmd{ac}}_{1}$ naturally be $\hat{K}_{a}$ different classes, denoted by $\{\mathcal{C}_{k^{\prime}},k^{\prime}\in[1,\hat{K}_{a}]\}$ . Define the initial labels of these classes, i.e., the common channel statistics of codewords in the same class, as $\{\xi_{k^{\prime}}=\hat{g}_{k^{\prime},1},k^{\prime}\in[1,\hat{K}_{a}]\}$ . For $l\in[2,L],\hat{k}\in[1,\hat{K}_{a}]$ , the posterior probability of channel vector estimation $\hat{\bm{h}}_{\hat{k},l}$ in each class will be calculated to predict which class the estimated $\hat{\bm{h}}_{\hat{k},l}$ is actually in. Specifically, the $k^{\prime}$ -th class $\mathcal{C}_{k^{\prime}},k^{\prime}\in[1,\hat{K}_{a}]$ with the maximum posterior probability is the class to which $\hat{\bm{h}}_{\hat{k},l}$ belongs and the corresponding codeword index $[\mathcal{L}^{\textmd{ac}}_{l}]_{\hat{k}}$ is grouped into this class. That is,

	$\displaystyle[\mathcal{L}^{\textmd{ac}}_{l}]_{\hat{k}}\in\mathcal{C}_{k^{\prime}}$	$\displaystyle=\textmd{ arg}\max\limits_{\mathcal{C}_{k^{\prime}}}P(\mathcal{C}_{k^{\prime}}|\hat{\bm{h}}_{\hat{k},l})$		(39)
		$\displaystyle=\textmd{ arg}\max\limits_{\mathcal{C}_{k^{\prime}}}P(\mathcal{C}_{k^{\prime}})\prod\limits_{m=1}^{M}P(\hat{h}_{\hat{k},l,m}|\mathcal{C}_{k^{\prime}}),$

where $P(\mathcal{C}_{k^{\prime}})=\tfrac{1}{\hat{K}_{a}}$ represents the prior probability of class $\mathcal{C}_{k^{\prime}}$ , $P(\hat{h}_{\hat{k},l,m}|\mathcal{C}_{k^{\prime}})$ represents the conditional probability and $\hat{h}_{\hat{k},l,m}$ is the $m$ -th element of $\hat{\bm{h}}_{\hat{k},l}$ . Due to the Gaussian distribution of $\hat{h}_{\hat{k},l,m}$ , the probability of observing $\hat{h}_{\hat{k},l,m}$ given the class $\mathcal{C}_{k^{\prime}}$ can be expressed as

\displaystyle P(\hat{h}_{\hat{k},l,m}|\mathcal{C}_{k^{\prime}})=\frac{1}{\sqrt{2\pi\xi_{k^{\prime}}}}{\rm exp}(-\frac{\hat{h}^{2}_{\hat{k},l,m}}{2\xi_{k^{\prime}}}).

(40)

When all indices of active codewords in list $\mathcal{L}^{\textmd{ac}}_{l}$ have been grouped, each class label $\xi_{k^{\prime}}$ is updated as

\displaystyle\xi_{k^{\prime}}=\left((l-1)\xi_{k^{\prime}}+\hat{g}_{\hat{k},l}\right)/l,k^{\prime}\in[1,\hat{K}_{a}],[\mathcal{L}^{\textmd{ac}}_{l}]_{\hat{k}}\in\mathcal{C}_{k^{\prime}}.

(41)
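The classification rule (39), the likelihood (40) and the label update (41) can be sketched in Python for a single round as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the likelihood is evaluated in the log domain, $\hat{h}^{2}$ in (40) is interpreted as the squared magnitude of a complex entry, the uniform prior is dropped because it does not affect the maximizer, and the one-codeword-per-class constraint is not enforced. All function and variable names are hypothetical.

```python
import math
import random


def log_likelihood(h, xi):
    """Log of prod_m P(h_m | class) with the Gaussian form of (40); |h_m|^2 is an assumption."""
    return sum(-0.5 * math.log(2.0 * math.pi * xi) - abs(x) ** 2 / (2.0 * xi) for x in h)


def classify(h, labels):
    """(39) with a uniform prior: index of the class with the largest log-likelihood."""
    scores = [log_likelihood(h, xi) for xi in labels]
    return max(range(len(labels)), key=scores.__getitem__)


def stitch_round(h_est, g_est, labels):
    """One round over sub-slots 2..L; h_est[l][k] and g_est[l][k] use 0-based l."""
    labels = list(labels)
    assign = [list(range(len(labels)))]      # sub-slot 1 defines the classes
    for l in range(1, len(h_est)):
        cls = [classify(h, labels) for h in h_est[l]]
        for k, c in enumerate(cls):
            # (41) written with the 0-based index l, i.e. paper index l + 1
            labels[c] = (l * labels[c] + g_est[l][k]) / (l + 1)
        assign.append(cls)
    return assign, labels


# Synthetic check: two UEs with very different large-scale fading, M = 64, L = 4.
rng = random.Random(0)
true_var, M, L = [1.0, 100.0], 64, 4
h_est, g_est, truth = [], [], []
for l in range(L):
    order = [0, 1] if l % 2 == 0 else [1, 0]   # codeword order differs across sub-slots
    s = [math.sqrt(true_var[u] / 2.0) for u in order]
    h_est.append([[complex(rng.gauss(0, sd), rng.gauss(0, sd)) for _ in range(M)] for sd in s])
    g_est.append([true_var[u] for u in order])
    truth.append(order)
assign, labels = stitch_round(h_est, g_est, g_est[0])
```

In this synthetic setting the two classes are separated by a factor of 100 in variance, so every codeword is assigned to the class of the UE that generated it and the labels stay at the true variances.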

This is one round of codeword classification. The final class labels of this round are used as the initial class labels of the next round, and the above classification process is then repeated. The classification result obtained when the class labels converge is the final result. After that, the $L$ codewords belonging to the same class are concatenated in chronological order. The long message consisting of the $L$ sub-blocks mapped back from these codewords is one of the decoder outputs. Repeating this for all classes yields the transmitted message set $\{\hat{\bm{m}}_{k^{\prime}}:k^{\prime}\in[1,\hat{K}_{a}]\}$ of all active UEs. In summary, the proposed Bayesian joint decoding algorithm is described in Algorithm 1.

Algorithm 1 : Bayesian Joint Decoding

Input: The codebook matrix C, the received signal Y.
Output: The estimated message.

1: for $l=1:L$
2: Initialize the maximum numbers of iterations $T_{D},T_{S}$ , $\hat{\textbf{X}}_{l}^{0}=\textbf{0}$ , $\textbf{S}^{0}=\textbf{0}$ , $v^{0}=1$ , and the convergence accuracies $\varpi_{D},\varpi_{S}$ ; initialize $\bm{\omega}^{0}=[(\sigma^{2})^{0},\varepsilon_{j}^{0},g_{j}^{0}]$ as in (34)-(36).
3: for $t=1:T_{D}$
4: -- Bayesian Codeword Detection --
5: % Linear Estimator $\gamma(\cdot)$
6: Compute LMMSE estimations $\hat{\textbf{B}}_{m}^{t-1}$ and $[\hat{\textbf{R}}^{t}]_{:,m}$ according to (8) and (7);
7: Obtain orthogonal outputs $[\textbf{R}^{t}]_{:,m}$ and $u_{m}^{t}$ according to (9) and (10);
8: % Non-Linear Estimator $\phi(\cdot)$
9: Set $u^{t}=\tfrac{1}{M}\sum\nolimits_{m=1}^{M}u^{t}_{m}$ and $\bm{r}_{j}^{t}=[\textbf{R}^{t}]_{j,:}$ ;
10: Compute MMSE estimations $\hat{\bm{s}}_{j}^{t}$ and $\hat{v}^{t}$ according to (12) and (13), and obtain matrix $\hat{\textbf{S}}^{t}=[\hat{\bm{s}}^{t}_{1},...,\hat{\bm{s}}^{t}_{j},...,\hat{\bm{s}}^{t}_{2^{J}}]^{T}$ ;
11: Compute orthogonal outputs $\bm{s}_{j}^{t}$ and $v^{t}$ according to (18) and (17), and obtain matrix $\textbf{S}^{t}=[\bm{s}^{t}_{1},...,\bm{s}^{t}_{j},...,\bm{s}^{t}_{2^{J}}]^{T}$ ;
12: % Parameter Estimator $\psi(\cdot)$
13: Update $(\sigma^{2})^{t},\varepsilon_{j}^{t},g_{j}^{t}$ according to (26), (30) and (33);
14: if $\|\hat{\textbf{S}}^{t}-\hat{\textbf{X}}_{l}^{t-1}\|_{F}^{2}/\|\hat{\textbf{S}}^{t}\|_{F}^{2}<\varpi_{D}$ then
15: Save $\hat{\textbf{X}}_{l}=\hat{\textbf{S}}^{t}$ and $\bm{g}_{l}=[g_{1}^{t},...,g_{j}^{t},...,g_{2^{J}}^{t}]$ ;
16: Break;
17: else
18: Set $\hat{\textbf{X}}_{l}^{t}=\hat{\textbf{S}}^{t}$ ;
19: end if
20: end for
21: -- Bayesian Codeword Stitching --
22: % Hard Decision
23: Compute decision threshold $\theta_{j}$ ;
24: Judge $\mathcal{L}_{l}^{\textmd{ac}}=\{j:\|\hat{\bm{x}}_{j,l}\|_{2}^{2}>\theta_{j},j\in[1,2^{J}]\}$ and let $\hat{K}_{a}=|\mathcal{L}_{1}^{\textmd{ac}}|$ ;
25: % Codeword Splicer $\mu(\cdot)$
26: Save $\hat{\bm{h}}_{\hat{k},l}=\hat{\bm{x}}_{j,l}$ and $\hat{g}_{\hat{k},l}=g^{t}_{j,l}+\hat{v}^{t},\forall\hat{k}\in[1,\hat{K}_{a}],j=[\mathcal{L}^{\textmd{ac}}_{l}]_{\hat{k}}$ ;
27: end for
28: Set initial class label $\{\xi_{k^{\prime}}^{0}=\hat{g}_{k^{\prime},1},k^{\prime}\in[1,\hat{K}_{a}]\}$ ;
29: for $t=1:T_{S}$
30: Let $\xi_{k^{\prime}}^{t}=\xi_{k^{\prime}}^{t-1}$ ;
31: for $l=2:L$
32: Compute $\mathcal{C}_{k^{\prime}}$ based on (39) for $\forall\hat{k},k^{\prime}\in[1,\hat{K}_{a}]$ , judge the codeword index $[\mathcal{L}^{\textmd{ac}}_{l}]_{\hat{k}}\in\mathcal{C}_{k^{\prime}}$ ;
33: Update $\xi_{k^{\prime}}^{t}$ according to (41);
34: end for
35: if $\max(|\bm{\xi}^{t}-\bm{\xi}^{t-1}|)<\varpi_{S}$ then
36: Break;
37: end if
38: end for
39: Output $\hat{\bm{m}}_{k^{\prime}}((l-1)J+1:lJ)=\textmd{demap}(j_{l})\in\mathbb{B}^{J\times 1},j_{l}\in\mathcal{C}_{k^{\prime}}\cap\mathcal{L}^{\textmd{ac}}_{l},k^{\prime}\in[1,\hat{K}_{a}],l\in[1,L]$

IV Performance Analysis of Massive Unsourced Random Access

In this section, the convergence, complexity and error probability of the proposed Bayesian joint decoding-based massive uncoupled unsourced random access scheme are analyzed. Meanwhile, asymptotic analysis in some extreme cases is performed to provide useful insights for the design of massive unsourced random access.

IV-A Convergence and Decoding Complexity

First, we discuss the convergence of the proposed decoding algorithm, which involves two iterative procedures. For codeword detection, the orthogonalization steps of OAMP in (9), (10), (17) and (18) are divergence-free and de-correlated operations (please see [24] for the proof), which ensures that $\{u^{t}\}$ and $\{v^{t}\}$ are monotonically decreasing sequences. Both sequences are also bounded below by $0$ . By the monotone convergence theorem, codeword detection is therefore convergent. The SE fixed point of the detector is derived in Appendix A. For codeword stitching, because the initial class labels of the next round of classification are assigned the final updated values of the current round, the classification result of the next round must be better than that of the current round. After several rounds of classification, the class labels no longer change. This indicates that the classification result reaches a steady state, and thereby the convergence of the proposed codeword stitching is guaranteed.

Then, let us state the complexity of the proposed decoding algorithm. On the one hand, the computational complexity of codeword detection in each iteration can be expressed as $\mathcal{O}\left((n_{0}^{2}2^{J}+n_{0}^{3})M\right)$ , which is dominated by the matrix multiplication and matrix inversion in LMMSE estimation, the norm operation in MMSE estimation and the matrix-vector multiplication in parameter estimation. If the codebook C has a special structure, such as a Hadamard or discrete Fourier transform (DFT) matrix, the computational complexity is reduced to $\mathcal{O}\left(J2^{J}M\right)$ . On the other hand, the computational complexity of codeword stitching in each round depends on the calculation and comparison of posterior probabilities, that is, $\mathcal{O}\left(M(L-1)K_{a}^{2}\right)$ . Combined with the fact presented in the simulations that the algorithm converges within 15 iterations, we can conclude that the proposed Bayesian joint decoding algorithm has a relatively low complexity when a codebook with a particular structure is exploited.
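To give a feel for the orders of magnitude, the following Python sketch evaluates the three complexity expressions for the default parameters of Section V ( $n_{0}=1024$ , $J=12$ , $M=32$ , $L=8$ , $K_{a}=50$ ). These are order-of-growth expressions with constants dropped, so the numbers are only indicative operation counts; the function names are hypothetical.

```python
def detection_cost(n0, J, M):
    """Per-iteration detection cost for a general codebook: (n0^2 * 2^J + n0^3) * M."""
    return (n0 ** 2 * 2 ** J + n0 ** 3) * M


def structured_detection_cost(J, M):
    """Per-iteration detection cost for a Hadamard/DFT-structured codebook: J * 2^J * M."""
    return J * 2 ** J * M


def stitching_cost(M, L, Ka):
    """Per-round stitching cost: M * (L - 1) * Ka^2."""
    return M * (L - 1) * Ka ** 2


general = detection_cost(n0=1024, J=12, M=32)
structured = structured_detection_cost(J=12, M=32)
stitching = stitching_cost(M=32, L=8, Ka=50)
speedup = general / structured
```

For these values the structured codebook lowers the indicative detection cost by roughly five orders of magnitude, and the stitching cost is of the same order as the structured detection cost.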

IV-B Error Probability of Decoding

To characterize the effectiveness of the proposed decoding algorithm, we focus on the error probability of decoding based on Assumption. In the following, the error probabilities of codeword detection and codeword stitching are derived in turn.

IV-B1 Bayesian codeword detection

Define the average detection error probability $P_{1}$ as

\displaystyle P_{1}=\frac{1}{L}\sum\limits_{l=1}^{L}\left[P^{\textmd{ac}}\frac{1}{2^{J}}\sum\limits_{j=1}^{2^{J}}P^{\textmd{md}}_{j}+(1-P^{\textmd{ac}})\frac{1}{2^{J}}\sum\limits_{j=1}^{2^{J}}P^{\textmd{fa}}_{j}\right],

(42)

where $P^{\textmd{ac}}=1-\left(1-1/2^{J}\right)^{K_{a}}\approx K_{a}/2^{J}$ denotes the overall codeword activity probability. $P^{\textmd{md}}_{j}=\textmd{Pr}[\hat{\delta}_{j}=0|\delta_{j}=1]$ and $P^{\textmd{fa}}_{j}=\textmd{Pr}[\hat{\delta}_{j}=1|\delta_{j}=0]$ are the misdetection probability and the false alarm probability of the $j$ -th codeword, respectively, with $\hat{\delta}_{j}$ and $\delta_{j}$ representing the estimated and actual activity indicator of the $j$ -th codeword. As mentioned above, the value of decision threshold determines the probabilities of misdetection and false alarm. Hence, we obtain these two error probabilities according to their definitions and the expression of hard threshold.

Recall that the input Gaussian observation of the non-linear estimator $\phi(\cdot)$ at the $t$ -th iteration is $\bm{r}_{j}^{t}=\bm{x}_{j}+\mathbbm{n}_{j}^{t}$ . Since $\mathbbm{n}_{j}^{t}\sim\mathcal{CN}(\bm{0},u^{t}\textbf{I})$ and $P_{X}(\bm{x}_{j};\varepsilon_{j},g_{j})=(1-\varepsilon_{j})\delta_{0}+\varepsilon_{j}\mathcal{CN}(\bm{x}_{j};\bm{0},g_{j}\textbf{I})$ , the random variable

\displaystyle X\triangleq\begin{cases}(\bm{r}^{t}_{j})^{H}\bm{r}^{t}_{j}/(2u^{t}),\quad\varepsilon_{j}=0,\\ (\bm{r}^{t}_{j})^{H}\bm{r}^{t}_{j}/(2(u^{t}+g^{t-1}_{j})),\quad\varepsilon_{j}=1\end{cases}

(43)

follows the $\chi^{2}$ distribution with $2M$ degrees of freedom [32]. In the non-linear estimator, the posterior codeword activity probability can be rewritten as

	$\displaystyle\pi^{t}_{j}$	$\displaystyle=\left[(1/\varepsilon^{t-1}_{j}-1)(1+g^{t-1}_{j}/u^{t})^{M}e^{\frac{-g^{t-1}_{j}\|\bm{r}^{t}_{j}\|_{2}^{2}}{u^{t}(g^{t-1}_{j}+u^{t})}}+1\right]^{-1}$
		$\displaystyle=\left[(1/\varepsilon^{t-1}_{j}-1)e^{-M\left(\frac{g^{t-1}_{j}\|\bm{r}^{t}_{j}\|_{2}^{2}}{Mu^{t}(g^{t-1}_{j}+u^{t})}-\log(1+g^{t-1}_{j}/u^{t})\right)}+1\right]^{-1}.$		(44)
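Evaluating (44) directly can overflow for large $M$ , since the exponent scales linearly with $M$ . A minimal Python sketch that evaluates the same expression in the log domain is given here; the function names are hypothetical, and `crossover_energy` simply returns the value of $\|\bm{r}^{t}_{j}\|_{2}^{2}$ at which the exponent in the second line of (44) vanishes.

```python
import math


def crossover_energy(M, u, g):
    """Energy ||r||^2 at which the exponent in (44) changes sign."""
    return M * u * (g + u) * math.log1p(g / u) / g


def posterior_activity(r_energy, M, u, g, eps):
    """pi_j^t from (44), computed as a numerically stable logistic function."""
    # z is the logarithm of the first term inside the square bracket of (44).
    z = math.log(1.0 / eps - 1.0) - (g * r_energy / (u * (g + u)) - M * math.log1p(g / u))
    if z >= 0.0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


# Illustrative values (assumptions): M = 32 antennas, u = 1, g = 10, eps = 50 / 4096.
M, u, g, eps = 32, 1.0, 10.0, 50 / 4096
energy_star = crossover_energy(M, u, g)
pi_active = posterior_activity(M * (u + g), M, u, g, eps)   # typical energy of an active row
pi_inactive = posterior_activity(M * u, M, u, g, eps)       # typical energy of an inactive row
pi_star = posterior_activity(energy_star, M, u, g, eps)     # equals the prior eps
```

The crossover energy lies strictly between the typical energies $Mu$ and $M(u+g)$ of inactive and active rows, and at the crossover the posterior reduces to the prior activity probability.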

Intuitively, whether $\pi^{t}_{j}$ approaches $0$ or $1$ depends on the relative magnitude of $\frac{g^{t-1}_{j}\|\bm{r}^{t}_{j}\|_{2}^{2}}{Mu^{t}(g^{t-1}_{j}+u^{t})}$ and $\log(1+g^{t-1}_{j}/u^{t})$ . Based on this observation, the codeword activity decision is set as

\displaystyle\hat{\delta}_{j}=\begin{cases}0,\quad\|\bm{r}^{t}_{j}\|_{2}^{2}<\theta_{j},\\ 1,\quad\|\bm{r}^{t}_{j}\|_{2}^{2}>\theta_{j},\end{cases}

(45)

where threshold $\theta_{j}=Mu^{t}(g^{t-1}_{j}+u^{t})\log(1+g^{t-1}_{j}/u^{t})/g^{t-1}_{j}$ . Then, the probabilities of misdetection and false alarm can be computed as

$\displaystyle P^{\rm md}_{j}$	$\displaystyle={\rm Pr}[\hat{\delta}_{j}=0|\delta_{j}=1]={\rm Pr}\left(X\leq a_{j}\right)$	(46)
	$\displaystyle=F_{2M}(a_{j})$
	$\displaystyle=\frac{\underline{\gamma}(M,a_{j}/2)}{\Gamma(M)}$

and

$\displaystyle P^{\rm fa}_{j}$	$\displaystyle={\rm Pr}[\hat{\delta}_{j}=1|\delta_{j}=0]={\rm Pr}\left(X\geq b_{j}\right)$	(47)
	$\displaystyle=1-F_{2M}(b_{j})$
	$\displaystyle=\frac{\overline{\gamma}(M,b_{j}/2)}{\Gamma(M)},$

where $a_{j}=Mu^{t}(g^{t-1}_{j}+u^{t})\log(1+g^{t-1}_{j}/u^{t})/(2g^{t-1}_{j}(u^{t}+g_{j}^{t-1}))=Mu^{t}\log(1+g^{t-1}_{j}/u^{t})/(2g^{t-1}_{j})$ and $b_{j}=Mu^{t}(g^{t-1}_{j}+u^{t})\log(1+g^{t-1}_{j}/u^{t})/(2u^{t}g^{t-1}_{j})=M(g^{t-1}_{j}+u^{t})\log(1+g^{t-1}_{j}/u^{t})/(2g^{t-1}_{j})$ . $F_{2M}(\cdot)$ is the CDF of the $\chi^{2}$ distribution with $2M$ degrees of freedom. The Gamma function and the lower and upper incomplete Gamma functions are respectively defined as

$\displaystyle\Gamma(s)$	$\displaystyle=\int\nolimits_{0}^{\infty}x^{s-1}e^{-x}dx,$	(48)
$\displaystyle\underline{\gamma}(s,x)$	$\displaystyle=\int\nolimits_{0}^{x}t^{s-1}e^{-t}dt,$
$\displaystyle\overline{\gamma}(s,x)$	$\displaystyle=\int\nolimits_{x}^{\infty}t^{s-1}e^{-t}dt.$

Therefore, the average detection error probability $P_{1}$ is

\displaystyle P_{1}=\frac{K_{a}}{2^{J}}\bar{P}^{\rm md}+(1-\frac{K_{a}}{2^{J}})\bar{P}^{\rm fa}

(49)

with $\bar{P}^{\rm md}=\frac{1}{2^{J}}\sum\nolimits_{j=1}^{2^{J}}P^{\rm md}_{j}$ and $\bar{P}^{\rm fa}=\frac{1}{2^{J}}\sum\nolimits_{j=1}^{2^{J}}P^{\rm fa}_{j}$ .

It is found that $P^{\rm md}_{j}$ and $P^{\rm fa}_{j}$ in Bayesian codeword detection have the following relationship

\displaystyle P^{\rm md}_{j}+P^{\rm fa}_{j}=1-\tau(\theta),

(50)

where $\tau(\theta)=F_{2M}(b_{j})-F_{2M}(a_{j})$ . Since $b_{j}-a_{j}=\frac{M}{2}\log(1+\frac{g^{t-1}_{j}}{u^{t}})>0$ , it follows that $0<\tau(\theta)<1$ because the CDF of the $\chi^{2}$ distribution is monotonically increasing. Hence, we can balance the misdetection probability and the false alarm probability by adjusting the threshold.
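The two error probabilities can be evaluated numerically with the standard library alone. The sketch below relies on two standard facts: for integer $M$ , the regularized lower incomplete Gamma function equals the Poisson tail probability ${\rm Pr}[N\geq M]$ with $N\sim{\rm Poisson}(x)$ , and for $\bm{r}\sim\mathcal{CN}(\bm{0},s\textbf{I}_{M})$ the ratio $\|\bm{r}\|_{2}^{2}/s$ is Gamma distributed with shape $M$ and unit scale. Under this normalization the scaled thresholds are $Mu\log(1+g/u)/g$ for an active codeword and $M(u+g)\log(1+g/u)/g$ for an inactive one. This scaling is an assumption of the sketch and may differ by constant factors from the arguments $a_{j}/2$ and $b_{j}/2$ in (46) and (47); all function names are hypothetical.

```python
import math


def _poisson_log_pmf(k, x):
    """Logarithm of the Poisson(x) probability mass at k."""
    return -x + k * math.log(x) - math.lgamma(k + 1)


def reg_lower_gamma(M, x):
    """Regularized lower incomplete Gamma P(M, x) = Pr[Poisson(x) >= M], integer M >= 1."""
    k_max = int(M + 10 * x + 200)          # truncation point of the tail sum
    return sum(math.exp(_poisson_log_pmf(k, x)) for k in range(M, k_max))


def reg_upper_gamma(M, x):
    """Regularized upper incomplete Gamma Q(M, x) = Pr[Poisson(x) <= M - 1]."""
    return sum(math.exp(_poisson_log_pmf(k, x)) for k in range(M))


def detection_errors(M, u, g):
    """Misdetection and false-alarm probabilities under the Gamma(M, 1) normalization."""
    ratio = math.log1p(g / u) / g
    p_md = reg_lower_gamma(M, M * u * ratio)          # active row falls below the threshold
    p_fa = reg_upper_gamma(M, M * (u + g) * ratio)    # inactive row exceeds the threshold
    return p_md, p_fa


p_md_32, p_fa_32 = detection_errors(M=32, u=1.0, g=10.0)
p_md_64, p_fa_64 = detection_errors(M=64, u=1.0, g=10.0)
```

For the illustrative ratio $g/u=10$ , both error probabilities are already very small at $M=32$ and shrink further at $M=64$ , in line with the expectation that more antennas improve detection.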

IV-B2 Bayesian codeword stitching

At the receiver, we adopt a Bayesian classification approach to realize the codeword stitching. In what follows, from the perspective of classification error rate (CER), we prove that the proposed Bayesian classification approach is optimal.

For any classification rule, when determining data $\bm{h}_{k,l}$ belongs to class $\mathcal{C}_{k^{\prime}}$ , the CER for a single class is defined as the probability that $\bm{h}_{k,l}$ does not belong to $\mathcal{C}_{k^{\prime}}$ , i.e., $P({\rm CER}|\mathcal{C}_{k^{\prime}})=1-P(\mathcal{C}_{k^{\prime}})$ . It can be observed that the CER now depends solely on the prior probability of class $\mathcal{C}_{k^{\prime}}$ , which can still be reduced. In this case, by considering the effect of observation $\bm{h}_{k,l}$ on classification decisions, the total CER for a class space with $K_{a}$ classes and $L$ data points in each class can be represented as

$\displaystyle P_{\rm CER}$	$\displaystyle=\sum\limits_{k^{\prime}=1}^{K_{a}}P(\mathcal{C}_{k^{\prime}})P({\rm CER}|\mathcal{C}_{k^{\prime}})$
	$\displaystyle=\sum\limits_{k^{\prime}=1}^{K_{a}}P(\mathcal{C}_{k^{\prime}})\left(1-1/L\sum\limits_{k\in\mathcal{C}_{k^{\prime}}}P(\bm{h}_{k,l}|\mathcal{C}_{k^{\prime}})\right)$
	$\displaystyle=\sum\limits_{k^{\prime}=1}^{K_{a}}P(\mathcal{C}_{k^{\prime}})-\sum\limits_{k^{\prime}=1}^{K_{a}}\left(P(\mathcal{C}_{k^{\prime}})/L\sum\limits_{k\in\mathcal{C}_{k^{\prime}}}P(\bm{h}_{k,l}|\mathcal{C}_{k^{\prime}})\right)$
	$\displaystyle=1-\sum\limits_{k^{\prime}=1}^{K_{a}}\left(1/L\sum\limits_{k\in\mathcal{C}_{k^{\prime}}}P(\mathcal{C}_{k^{\prime}})P(\bm{h}_{k,l}|\mathcal{C}_{k^{\prime}})\right),$	(51)

where $P({\rm CER}|\mathcal{C}_{k^{\prime}})=1-1/L\sum\nolimits_{k\in\mathcal{C}_{k^{% \prime}}}P(\bm{h}_{k,l}|\mathcal{C}_{k^{\prime}})$ is the CER for data $\{\bm{h}_{k,l},k\in\mathcal{C}_{k^{\prime}},l\in[1,L]\}$ in the $k^{\prime}$ -th class. Intuitively, when the sample data $\bm{h}_{k,l}$ is grouped into the class $\mathcal{C}_{k^{\prime}}$ that maximizes $P(\mathcal{C}_{k^{\prime}})P(\bm{h}_{k,l}|\mathcal{C}_{k^{\prime}})$ , the total classification correct rate $(1-P_{\rm CER})$ for the class space can reach its maximum

\displaystyle\sum\limits_{k=1,l=1}^{K_{a},L}\max\limits_{k^{\prime}}P(\mathcal{C}_{k^{\prime}})P(\bm{h}_{k,l}|\mathcal{C}_{k^{\prime}})/L.

(52)

This is equivalent to maximizing the Bayesian posterior probability $P(\mathcal{C}_{k^{\prime}}|\bm{h}_{k,l})$ . Therefore, it can be proved that the proposed Bayesian classification achieves the minimum CER, i.e.,

\displaystyle P_{\rm CER}^{\rm min}=1-\sum\limits_{k=1,l=1}^{K_{a},L}\max\limits_{k^{\prime}}P(\mathcal{C}_{k^{\prime}})P(\bm{h}_{k,l}|\mathcal{C}_{k^{\prime}})/L.

(53)

Note that Bayesian codeword stitching only needs the relative magnitudes of the class conditional probabilities $P(\bm{h}_{k,l}|\mathcal{C}_{k^{\prime}})$ to pick out the maximum, so it does not matter if a value is greater than $1$ . However, the class conditional probability should be normalized when calculating the theoretical CER here.

After codeword detection, the codeword splicer receives a correct alternative codeword with probability $1-\bar{P}^{\rm md}$ and a wrong alternative codeword with probability $\bar{P}^{\rm fa}$ . Then, the final error probability of Bayesian joint decoding can be cast as

$\displaystyle P_{2}$	$\displaystyle=f(1-\bar{P}^{\rm md},\bar{P}^{\rm fa})$
	$\displaystyle=\underbrace{(1-\frac{K_{a}}{2^{J}})\bar{P}^{\rm fa}}\limits_{\rm wrong\ codeword}$	(54)
	$\displaystyle+{\small\underbrace{\frac{K_{a}}{2^{J}}(1-\bar{P}^{\rm md})\left(1-\sum\limits_{k=1,l=1}^{k=K_{a},l=L}\max\limits_{k^{\prime}}P(\mathcal{C}_{k^{\prime}})P(\bm{h}_{k,l}|\mathcal{C}_{k^{\prime}})/L\right)}\limits_{\rm wrong\ stitching\ of\ correct\ codeword}.}$

This means that false alarms of inactive codewords and wrong stitching of detected active codewords cause the errors of Bayesian joint decoding. (Footnote 3: As an overall performance index of unsourced random access, $P_{2}$ is defined from the perspective of “codeword” and is equivalent to the sum of the traditionally used probability of misdetection and probability of false alarm of the per-user message, which are defined from the perspective of “message”.) However, it is difficult to observe the effects of key system parameters on decoding performance and to draw deterministic conclusions from this expression directly. Therefore, asymptotic analysis will be performed below to help us understand the theoretical performance.

IV-C Asymptotic Analysis

In this part, to facilitate the performance analysis of error probability of Bayesian joint decoding, some key parameters in extreme cases will be considered according to the characteristics of massive unsourced random access. For convenience, we first introduce the following lemmas.

Lemma 1

Assume that $n_{0},2^{J}\rightarrow\infty$ with a fixed ratio $n_{0}/2^{J}$ and that C is a right unitarily invariant matrix. Then the iterative performance of Bayesian codeword detection can be tracked by the following state evolution: Starting with $t=1$ and $v^{1}=1$ ,

	$\displaystyle u^{t}$	$\displaystyle=\gamma_{\rm SE}(v^{t})=v^{t}[1/\Omega_{\gamma}^{t}-1],$		(55)
	$\displaystyle v^{t+1}$	$\displaystyle=\phi_{\rm SE}(u^{t})=u^{t}[1/\Omega_{\phi}^{t+1}-1],$		(56)

where

	$\displaystyle\Omega_{\gamma}^{t}$	$\displaystyle=\frac{1}{2^{J}}{\rm tr}\left\{\textbf{C}^{H}\left(\frac{\sigma^{2}}{v^{t}}\textbf{I}+\textbf{CC}^{H}\right)^{-1}\textbf{C}\right\},$		(57)
	$\displaystyle\Omega_{\phi}^{t+1}$	$\displaystyle=1-\frac{1}{Mu^{t}2^{J}}\sum_{j}\left\|\hat{\bm{s}}_{j}^{t}-\bm{x}_{j}\right\|_{2}^{2}$		(58)

with $\hat{\bm{s}}_{j}^{t}$ being the MMSE estimation in the non-linear estimator $\phi(\cdot)$ . Please see our previous work [28] for proof.

Lemma 2

Based on Lemma 1 and $K_{a}<n_{0}$ , the fixed point of $u^{t}$ in the above state evolution will converge to

\displaystyle u^{\infty}\approx\frac{\sigma^{2}}{1-\frac{(2^{J}-n_{0})\varepsilon}{(1-\varepsilon)n_{0}}}.

(59)

Please refer to Appendix A for proof. In the following analysis, $u^{t}$ in each expression is replaced by the convergent value $u^{\infty}$ .
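As a quick numerical illustration of (59), the following Python sketch evaluates the fixed point for the default parameters of Section V. Taking $\varepsilon=K_{a}/2^{J}$ is an assumption of this sketch (motivated by the approximation $P^{\textmd{ac}}\approx K_{a}/2^{J}$ ), and the function name is hypothetical.

```python
def u_fixed_point(sigma2, n0, J, eps):
    """Approximate fixed point u^infinity of the state evolution, following (59)."""
    N = 2 ** J
    denom = 1.0 - (N - n0) * eps / ((1.0 - eps) * n0)
    if denom <= 0.0:
        raise ValueError("(59) requires a positive denominator")
    return sigma2 / denom


# n0 = 1024, J = 12, Ka = 50, with unit noise variance for readability.
u_inf = u_fixed_point(sigma2=1.0, n0=1024, J=12, eps=50 / 4096)
```

For these values the effective noise variance at the fixed point is only a few percent above $\sigma^{2}$ , which suggests that the residual interference is small when $K_{a}\ll n_{0}$ .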

IV-C1 The influence of the number of BS antennas on Bayesian joint decoding

Considering that the BS of 6G wireless networks is equipped with a very large antenna array, this analysis reveals the influence of the number of BS antennas on the error probability of Bayesian joint decoding in the extreme case.

For $\alpha_{j}=2a_{j}/M<1$ and $\beta_{j}=2b_{j}/M>1$ , the expressions of $P^{\rm md}_{j}$ and $P^{\rm fa}_{j}$ can be scaled in $M$ and expanded to [33]

$\displaystyle P^{\rm md}_{j}$	$\displaystyle=\frac{\underline{\gamma}(M,a_{j}/2)}{\Gamma(M)}$
	$\displaystyle=\frac{1}{2}{\rm erfc}\left(-c_{j}\sqrt{\frac{M}{2}}\right)-\frac% {\exp\left(-\frac{1}{2}Mc^{2}_{j}\right)}{\sqrt{2\pi M}}\left(\frac{1}{2a_{j}/% M-1}-\frac{1}{c_{j}}\right)$
	$\displaystyle\ \ \ \ \ \ \ \ \ \ \ \ \ -\emph{o}\left(\frac{\exp(-M)}{\sqrt{M}% }\right)$
	$\displaystyle=-\frac{\exp\left(-\frac{1}{2}Mc^{2}_{j}\right)}{2\sqrt{2\pi M}}% \left(\frac{1}{\alpha_{j}-1}+\frac{1}{c_{j}}\right)+\emph{o}\left(\frac{\exp(-% M)}{\sqrt{M}}\right)$	(60)

and

$\displaystyle P^{\rm fa}_{j}$	$\displaystyle=\frac{\overline{\gamma}(M,b_{j}/2)}{\Gamma(M)}$
	$\displaystyle=\frac{1}{2}{\rm erfc}\left(d_{j}\sqrt{\frac{M}{2}}\right)+\frac{% \exp\left(-\frac{1}{2}Md^{2}_{j}\right)}{\sqrt{2\pi M}}\left(\frac{1}{2b_{j}/M% -1}-\frac{1}{d_{j}}\right)$
	$\displaystyle\ \ \ \ \ \ \ \ \ \ \ \ \ +\emph{o}\left(\frac{\exp(-M)}{\sqrt{M}% }\right)$
	$\displaystyle=\frac{\exp\left(-\frac{1}{2}Md^{2}_{j}\right)}{2\sqrt{2\pi M}}% \left(\frac{1}{\beta_{j}-1}+\frac{1}{d_{j}}\right)+\emph{o}\left(\frac{\exp(-M% )}{\sqrt{M}}\right),$	(61)

where the complementary error function

\displaystyle{\rm erfc}(x)=\frac{\exp(-x^{2})}{\sqrt{\pi}x}\left(1+\emph{o}% \left(\frac{1}{x^{2}}\right)\right)

(62)

and $c_{j}=-\sqrt{2(\alpha_{j}-1-\log(\alpha_{j}))}$ , $d_{j}=\sqrt{2(\beta_{j}-1-\log(\beta_{j}))}$ . Then, we have

$\displaystyle\lim\limits_{M\rightarrow\infty}P^{\rm md}_{j}$	$\displaystyle=\lim\limits_{M\rightarrow\infty}-\frac{\exp\left(-\frac{1}{2}Mc^% {2}_{j}\right)}{2\sqrt{2\pi M}}\left(\frac{1}{\alpha_{j}-1}+\frac{1}{c_{j}}\right)$
	$\displaystyle\ \ \ \ \ \ \ \ \ \ \ \ \ +\emph{o}\left(\frac{\exp(-M)}{\sqrt{M}% }\right)$
	$\displaystyle=0$	(63)

and

$\displaystyle\lim\limits_{M\rightarrow\infty}P^{\rm fa}_{j}$	$\displaystyle=\lim\limits_{M\rightarrow\infty}\frac{\exp\left(-\frac{1}{2}Md^{% 2}_{j}\right)}{2\sqrt{2\pi M}}\left(\frac{1}{\beta_{j}-1}+\frac{1}{d_{j}}\right)$
	$\displaystyle\ \ \ \ \ \ \ \ \ \ \ \ \ +\emph{o}\left(\frac{\exp(-M)}{\sqrt{M}% }\right)$
	$\displaystyle=0.$	(64)

Thus, the average detection error probability

\displaystyle\lim\limits_{M\rightarrow\infty}P_{1}

\displaystyle=\lim\limits_{M\rightarrow\infty}\left[\frac{K_{a}}{2^{J}}\bar{P}% ^{\rm md}+(1-\frac{K_{a}}{2^{J}})\bar{P}^{\rm fa}\right]=0.

(65)

From another point of view, due to the fact that

\displaystyle\frac{g_{j}}{u^{\infty}}>\log(1+\frac{g_{j}}{u^{\infty}})>\frac{g_{j}}{g_{j}+u^{\infty}},

(66)

$\lim\limits_{M\rightarrow\infty}\|\hat{\bm{x}}_{j}\|_{2}^{2}>\theta_{j}$ when $\delta_{j}=1$ and $\lim\limits_{M\rightarrow\infty}\|\hat{\bm{x}}_{j}\|_{2}^{2}<\theta_{j}$ when $\delta_{j}=0$ always hold. In other words, the codeword activity decision is always correct when $M$ is large enough, so $P_{1}$ goes to zero.

For Bayesian codeword stitching, $M\rightarrow\infty$ means that the number of attributes of the data to be classified tends to infinity, which will cause the CER to increase. However, $P_{1}$ goes to zero faster than $P_{\rm CER}$ goes up, so that the final decoding error probability $P_{2}$ still shows a downward trend as $M\rightarrow\infty$ . On the other hand, as analyzed in Section IV-A, a large $M$ implies high complexity. Hence, it is necessary to choose an appropriate $M$ to balance the error probability and complexity of decoding.

Additionally, it is interesting to find that, due to channel hardening in the scenario with a large-scale antenna array [34], Bayesian classification in codeword stitching degenerates to random classification. In this case, the final error probability of Bayesian joint decoding when $M\rightarrow\infty$ can be simplified to

$\displaystyle\lim\limits_{M\rightarrow\infty}P_{2}$	$\displaystyle=(1-\frac{K_{a}}{2^{J}})\lim\limits_{M\rightarrow\infty}\bar{P}^{% \rm fa}$
	$\displaystyle\ \ \ \ \ +\frac{K_{a}}{2^{J}}(1-\lim\limits_{M\rightarrow\infty}% \bar{P}^{\rm md})\lim\limits_{M\rightarrow\infty}P_{\rm CER}$
	$\displaystyle\approx 0+\frac{K_{a}}{2^{J}}\left(1-\frac{1}{K_{a}}\right)$
	$\displaystyle=\frac{K_{a}-1}{2^{J}},$	(67)

which demonstrates its performance saturation value. Note that this performance constraint is affected by the Rayleigh fading channel model-based Bayesian classifier, and we can use preprocessing techniques such as dimension reduction or machine learning methods such as support vector machine to improve the performance of the channel features-based codeword concatenation.
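As a worked instance of the saturation value in (67), assuming the default values $K_{a}=50$ and $J=12$ of Section V, we obtain

$\displaystyle\lim\limits_{M\rightarrow\infty}P_{2}\approx\frac{K_{a}-1}{2^{J}}=\frac{49}{4096}\approx 1.2\times 10^{-2}.$

This number only illustrates the order of magnitude of the floor under random classification; it is not a simulated result.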

IV-C2 The influence of the transmit power on Bayesian joint decoding

Similarly, it is helpful to examine the decoding performance with a sufficiently high transmit power, which will imply an upper bound on performance regardless of transmit costs.

When transmit power $P_{t}\rightarrow\infty$ , combined with $\sigma^{2}=N_{0}/(n_{0}P_{t})$ , we have

	$\displaystyle\lim\limits_{P_{t}\rightarrow\infty}\alpha_{j}$	$\displaystyle=\lim\limits_{P_{t}\rightarrow\infty}2a_{j}/M$
		$\displaystyle=\lim\limits_{u^{\infty}\rightarrow 0}u^{\infty}\log(1+g_{j}/u^{% \infty})/g_{j}=0$		(68)

and

	$\displaystyle\lim\limits_{P_{t}\rightarrow\infty}\beta_{j}$	$\displaystyle=\lim\limits_{P_{t}\rightarrow\infty}2b_{j}/M$
		$\displaystyle=\lim\limits_{u^{\infty}\rightarrow 0}(u^{\infty}+g_{j})\log(1+g_% {j}/u^{\infty})/g_{j}=\infty.$		(69)

Therefore,

	$\displaystyle\lim\limits_{P_{t}\rightarrow\infty}P^{\rm md}_{j}$	$\displaystyle=\lim\limits_{\alpha_{j}\rightarrow 0}-\frac{\exp\left[-M\left(% \alpha_{j}-1-\log(\alpha_{j})\right)\right]}{2\sqrt{2\pi M}}$
		$\displaystyle\ \ \left(\frac{1}{\alpha_{j}-1}+\frac{1}{-\sqrt{2(\alpha_{j}-1-% \log(\alpha_{j}))}}\right)=0$		(70)

and

	$\displaystyle\lim\limits_{P_{t}\rightarrow\infty}P^{\rm fa}_{j}$	$\displaystyle=\lim\limits_{\beta_{j}\rightarrow\infty}\frac{\exp\left[-M\left(% \beta_{j}-1-\log(\beta_{j})\right)\right]}{2\sqrt{2\pi M}}$
		$\displaystyle\ \ \left(\frac{1}{\beta_{j}-1}+\frac{1}{\sqrt{2(\beta_{j}-1-\log% (\beta_{j}))}}\right)=0.$		(71)

That is, the average detection error probability

\displaystyle\lim\limits_{P_{t}\rightarrow\infty}P_{1}

\displaystyle=\lim\limits_{P_{t}\rightarrow\infty}\left[\frac{K_{a}}{2^{J}}% \bar{P}^{\rm md}+(1-\frac{K_{a}}{2^{J}})\bar{P}^{\rm fa}\right]=0.

(72)
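The limits in (68) and (69) can be checked numerically. The Python sketch below evaluates $\alpha_{j}$ and $\beta_{j}$ for a decreasing sequence of $u^{\infty}$ , which mimics $P_{t}\rightarrow\infty$ because $u^{\infty}$ scales with $\sigma^{2}=N_{0}/(n_{0}P_{t})$ in (59). The choice $g_{j}=1$ and the sequence of $u^{\infty}$ values are illustrative assumptions, and the function names are hypothetical.

```python
import math


def alpha(u, g):
    """alpha_j as in (68): u * log(1 + g / u) / g."""
    return u * math.log1p(g / u) / g


def beta(u, g):
    """beta_j as in (69): (u + g) * log(1 + g / u) / g."""
    return (u + g) * math.log1p(g / u) / g


u_values = [1.0, 1e-2, 1e-4, 1e-6]           # decreasing effective noise variance
alphas = [alpha(u, 1.0) for u in u_values]   # tends to 0
betas = [beta(u, 1.0) for u in u_values]     # grows without bound, but only logarithmically
```

The slow logarithmic growth of $\beta_{j}$ indicates that the false alarm probability benefits from a higher transmit power more gradually than from a larger $M$ .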

Yet, it is hard to obtain a closed-form expression of the final decoding error probability $\lim\limits_{P_{t}\rightarrow\infty}P_{2}$ because the CER takes an indeterminate form when $P_{t}\rightarrow\infty$ . The above theoretical results will be verified in the simulations.

V Numerical Results

In this part, we conduct extensive simulations to evaluate the effectiveness of the proposed Bayesian joint decoding-based massive uncoupled unsourced random access scheme. Unless otherwise specified, the main simulation parameters are set as: $K_{\textmd{tot}}=500$ , $K_{a}=50$ , $M=32$ , $b=96$ bits, $J=12$ bits, $L=8$ , and $n_{0}=1024$ . In addition, the codebook matrix C is obtained from $n_{0}$ randomly selected rows of a $2^{J}$ -point DFT matrix, so that $\textbf{CC}^{H}=(2^{J}/n_{0})\textbf{I}$ . The noise power is assumed to be $N_{0}=-110\ \textmd{dBm}$ and the signal-to-noise ratio (SNR) is defined as ${\rm SNR}_{k}=\frac{P_{t}\tilde{g}_{k}}{N_{0}}$ . For convenience, the minimum receive SNR is set to $15\ {\rm dB}$ . The path loss model of the uplink channel for UE $k$ is $\tilde{g}_{k}[\textmd{dB}]=-128.1-37.6\log_{10}(d_{k})$ [32], where $d_{k}$ is the distance between UE $k$ and the BS and is randomly distributed in $(0,0.5]$ km.
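The link budget implied by these parameters can be sketched in Python as follows. The sketch assumes that all UEs use the same transmit power $P_{t}$ and that the minimum receive SNR occurs at the largest distance $d_{k}=0.5$ km; neither assumption is stated explicitly in the text, and the function names are hypothetical.

```python
import math


def path_gain_db(d_km):
    """Large-scale channel gain in dB: -128.1 - 37.6 * log10(d), with d in km."""
    return -128.1 - 37.6 * math.log10(d_km)


def required_tx_power_dbm(min_snr_db, noise_dbm, d_max_km):
    """Transmit power (dBm) that yields min_snr_db at distance d_max_km."""
    # SNR[dB] = P_t[dBm] + gain[dB] - N_0[dBm], solved for P_t.
    return min_snr_db + noise_dbm - path_gain_db(d_max_km)


edge_gain_db = path_gain_db(0.5)
pt_dbm = required_tx_power_dbm(min_snr_db=15.0, noise_dbm=-110.0, d_max_km=0.5)
```

Under these assumptions the cell-edge gain is about $-116.8$ dB, so a transmit power of roughly $21.8$ dBm would be needed to reach the $15$ dB minimum receive SNR.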

We first examine the convergence of the proposed algorithm for different numbers of active UEs. As shown in Fig. 2, the mean normalized MSE (NMSE) of the proposed codeword detection, i.e., $\textmd{NMSE}=\tfrac{1}{2^{J}M}\|\hat{\textbf{X}}-\textbf{X}\|_{F}^{2}/\|\textbf{X}\|_{F}^{2}$ , plotted as three solid lines, converges within 15 iterations to a precision of $\varpi_{D}=10^{-5}$ in all three cases. The class labels, i.e., the proposed codeword stitching, plotted as the dotted line, also converge within 15 iterations to a precision of $\varpi_{S}=10^{-15}$ . These results indicate the low complexity of the proposed Bayesian joint decoder.
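For reference, the NMSE metric quoted above can be computed as in the short sketch below. The helper `has_converged` is a hypothetical stopping rule (difference between successive NMSE values below the precision), since the exact convergence test is not spelled out here.

```python
import numpy as np

def nmse(X_hat: np.ndarray, X: np.ndarray) -> float:
    """NMSE as defined in the text: ||X_hat - X||_F^2 / (2^J * M * ||X||_F^2).

    X and X_hat are 2^J x M matrices (codewords x BS antennas).
    """
    n_rows, n_cols = X.shape
    err = np.linalg.norm(X_hat - X, "fro") ** 2
    return float(err / (n_rows * n_cols * np.linalg.norm(X, "fro") ** 2))

def has_converged(history: list, tol: float = 1e-5) -> bool:
    """Hypothetical stopping rule: successive NMSE values differ by less than tol."""
    return len(history) >= 2 and abs(history[-1] - history[-2]) < tol
```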

Fig. 3 examines the impact of the SNR and of the accuracy of $\hat{K}_{a}$ on the performance of the proposed Bayesian joint decoding algorithm. The blue bars represent the error probability of $\hat{K}_{a}$ (i.e., the probability that the estimate $\hat{K}_{a}$ is larger or smaller than the true $K_{a}$ ) at different SNR levels. The red solid line corresponds to the case where $\hat{K}_{a}$ is perfectly estimated, and therefore reflects only the influence of the SNR on the final error probability. The red dashed line corresponds to the case where the BS cannot accurately determine the $K_{a}$ active codewords in each sub-slot after the proposed Bayesian codeword detection, and therefore shows the combined effect of the SNR and the estimation error of $\hat{K}_{a}$ . As the SNR increases, the gap between the two red lines gradually narrows, and they almost coincide when the SNR exceeds 10 dB. This is because, at low SNR, the estimate $\hat{K}_{a}$ is relatively inaccurate, which degrades the proposed codeword stitching. Once the SNR exceeds a certain threshold, the error in estimating the number of active devices has a negligible impact on the performance of the proposed Bayesian joint decoding algorithm. Therefore, at sufficiently high SNR, it is reasonable to assume that $\hat{K}_{a}$ is accurately estimated.

Fig. 4 compares the final error probability of the proposed uncoupled unsourced random access scheme based on Algorithm 1 with that of other unsourced random access schemes versus the number of BS antennas. For comparison, we consider the following schemes.

(i) Coupled unsourced random access with the MMV-AMP codeword detection and tree code stitching used in [16] (written as “MMV-AMP + Tree” in the legend). To keep the same message length of $b=96$ bits, the message is divided into $L=32$ sub-blocks of length $J=12$ bits, with the parity check bits in the $L$ sub-blocks occupying $\{0,9,...,9,12,12,12\}$ bits. For fairness, the same EM parameter estimation as in this paper is employed.

(ii) Coupled unsourced random access with energy codeword detection (ECD) and tree code stitching (written as “ECD + Tree” in the legend). For codeword detection, the codeword energy is computed by correlating the received signal with each column of the codebook, and the indices corresponding to the $K_{a}$ largest values are selected. For tree code stitching, the parameter settings are the same as in scheme (i).

(iii) Uncoupled unsourced random access with the MMV-OAMP codeword detection of this paper and the K-means stitching adopted in [20, 21] (written as “MMV-OAMP + K-means” in the legend). The parameters are set the same as in this paper.
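The energy codeword detection used in baseline (ii) can be sketched in a few lines. The code below is a minimal illustration rather than the implementation used in the simulations: the function name, the toy sizes, and the noise level are all assumptions made for the example.

```python
import numpy as np

def energy_codeword_detection(Y: np.ndarray, C: np.ndarray, K_a: int) -> np.ndarray:
    """Return the sorted indices of the K_a codewords with the largest correlation energy.

    Y: n0 x M received signal, C: n0 x N codebook (one codeword per column).
    """
    corr = C.conj().T @ Y                       # N x M: correlate every column of C with Y
    energy = np.sum(np.abs(corr) ** 2, axis=1)  # accumulate energy over the M antennas
    top = np.argpartition(energy, -K_a)[-K_a:]  # indices of the K_a largest energies
    return np.sort(top)

# Toy example with hypothetical sizes (not the paper's settings).
rng = np.random.default_rng(1)
n0, N, M, K_a = 64, 128, 8, 3
rows = rng.choice(N, size=n0, replace=False)
C = np.exp(-2j * np.pi * np.outer(rows, np.arange(N)) / N) / np.sqrt(n0)
active = np.sort(rng.choice(N, size=K_a, replace=False))
X = np.zeros((N, M), dtype=complex)
X[active] = (rng.standard_normal((K_a, M)) + 1j * rng.standard_normal((K_a, M))) / np.sqrt(2)
noise = 0.05 * (rng.standard_normal((n0, M)) + 1j * rng.standard_normal((n0, M)))
Y = C @ X + noise
print("active:  ", active)
print("detected:", energy_codeword_detection(Y, C, K_a))
```

Because the columns of a partial DFT codebook are not orthogonal, the correlation energies of inactive codewords contain interference from the active ones, which is one reason this simple detector is used only as a baseline.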

As expected, the error probabilities of all schemes decrease monotonically as the number of BS antennas increases, owing to the array gain, and the proposed scheme with the OAMP detector and Bayesian stitching achieves a lower error probability. Furthermore, in terms of spectral efficiency (SE) per user per channel use, the SE of the proposed scheme and of scheme (iii) is $SE_{0}=SE_{3}=\frac{J}{n_{0}}=\frac{12}{1024}$ bits/user/channel-use. Since the extra parity check bits introduced in (i) and (ii) occupy part of each sub-block, their SEs are $SE_{1}=SE_{2}=\frac{b}{Ln_{0}}=\frac{8\times 12}{32\times 1024}=\frac{1}{4}SE_{0}$ . However, the coupled transmission schemes have their own advantages: they do not exhibit the error floor that arises from channel feature-based codeword stitching, and they can potentially accommodate a larger number of active devices. For the proposed scheme and scheme (iii), which have the same SE, it can be seen that when the 3D antenna deployment of the original solution in [21] is absent, i.e., when the additional angle-domain information is unavailable, the proposed Bayesian classifier outperforms the K-means method with a small number of antennas. This is consistent with the fact that Bayesian classification attains the smallest CER. Additionally, the computational complexity of each scheme is given by

(i) $\mathcal{O}(J2^{J}M)+\mathcal{O}_{\rm tree}(K_{a},L,a_{l})$ [16],

(ii) $\mathcal{O}(2^{J}n_{0}M^{2})+\mathcal{O}_{\rm tree}(K_{a},L,a_{l})$ [16],

(iii) $\mathcal{O}(J2^{J}M+MLK_{a}^{3})$ [21],
where the complexity of tree decoding is $\mathcal{O}_{\rm tree}(K_{a},L,a_{l})=K_{a}(L-1)+K_{a}\sum\nolimits_{n=2}^{L-1}\sum\limits_{m=2}^{n}K_{a}^{n-m}(K_{a}-1)\prod\nolimits_{l=m}^{n}(2^{-a_{l}})$ and $a_{l}$ is the length of the parity check in the $l$ -th sub-block [9]. In comparison, the proposed algorithm has a complexity of $\mathcal{O}(J2^{J}M+MK_{a}^{2}(L-1))$ , which is lower. In a nutshell, these results indicate the feasibility and effectiveness of the proposed algorithm in scenarios where the receiver requires low computational cost.
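The spectral-efficiency figures quoted in this comparison can be reproduced with exact rational arithmetic. The sketch below assumes that the parity profile $\{0,9,...,9,12,12,12\}$ consists of one zero, then 28 entries equal to 9, then three entries equal to 12 (so that it has $L=32$ entries). The variable names are illustrative only.

```python
from fractions import Fraction

n0 = 1024                      # channel uses per sub-slot
J = 12                         # bits carried by each sub-block

# Proposed scheme and scheme (iii): L = 8 sub-blocks, no parity bits.
L_uncoupled = 8
b_uncoupled = L_uncoupled * J
se_uncoupled = Fraction(b_uncoupled, L_uncoupled * n0)

# Schemes (i) and (ii): L = 32 sub-blocks, assumed parity profile as described above.
L_coupled = 32
parity = [0] + [9] * (L_coupled - 4) + [12, 12, 12]
b_coupled = sum(J - a for a in parity)    # information bits left after parity
se_coupled = Fraction(b_coupled, L_coupled * n0)

print(b_uncoupled, b_coupled)             # information bits per message
print(se_uncoupled, se_coupled, se_coupled / se_uncoupled)
```

Under this reading of the parity profile, both transmission formats carry $b=96$ information bits, and the coupled schemes spend four times as many channel uses per information bit.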

Then, we show the impact of the SNR on the proposed algorithm for different numbers of BS antennas. Fig. 5 shows that the final error probability of Bayesian joint decoding decreases as the SNR increases in all cases. Moreover, the decoding performance can be enhanced by adding BS antennas, owing to the increased array gain. To achieve an error probability of $P_{2}<0.02$ , an SNR of only $0\ \textmd{dB}$ is needed for $M=64$ , whereas $20\ \textmd{dB}$ is required for $M=32$ . Thus, deploying more antennas at the BS can reduce the transmission cost of the devices.

Next, we verify the asymptotic analysis when the number of BS antennas $M$ becomes very large. It can be observed from Fig. 6 that, for both the error probability of codeword detection $P_{1}$ and the final error probability $P_{2}$ , the simulation results are consistent with the theoretical values calculated from the analytical expressions. Specifically, the error probability of codeword detection approaches zero when the number of BS antennas is sufficiently large, and the final error probability tends to the order of magnitude of $K_{a}/2^{J}$ . The reason is that, owing to the channel hardening associated with a large number of antennas, the error of the Bayesian classification method, which uses channel information for codeword stitching, tends to a constant. As a result, the final error probability first decreases as $M$ increases and then saturates when $M$ is large enough.

Finally, Fig. 7 confirms the codeword detection performance when the transmit cost is not a concern. The theoretical and simulated error probabilities of codeword detection almost overlap, and both tend to zero as the SNR increases. When the SNR is large enough, some simulated error probabilities are exactly $0$ , e.g., at $10\ \textmd{dB}$ for $K_{a}=30$ and at $30\ \textmd{dB}$ for $K_{a}=50$ , and thus cannot be shown in the figure. Meanwhile, the detection performance improves as the number of active UEs decreases, owing to reduced interference.

In summary, the proposed Bayesian joint decoding algorithm shows promising potential for improving the performance of uncoupled unsourced random access in 6G wireless networks.

VI Conclusion

This paper proposed a high-efficiency massive uncoupled unsourced random access scheme for 6G wireless networks without requiring extra parity check bits. A low-complexity Bayesian joint decoding algorithm was designed to implement codeword detection and stitching based on channel statistical information. Both theoretical analysis and numerical simulations confirmed that the proposed algorithm had low complexity and good performance. Moreover, asymptotic analysis showed that the error probability of codeword detection tended to zero as the number of BS antennas and the transmit power increased.

Appendix A Derivation of Lemma 2

Based on (55) and (56), the state evolution with DFT matrix C in Bayesian codeword detection can be simplified as below

$$u^{t}=\gamma_{\rm SE}(v^{t})=\sigma^{2}+(2^{J}/n_{0}-1)v^{t}, \tag{73}$$
$$v^{t+1}=\phi_{\rm SE}(u^{t})=\left[1/\Omega_{\rm MMSE}^{t}-1/u^{t}\right]^{-1}, \tag{74}$$

where

$$\begin{aligned}
\Omega_{\rm MMSE}^{t}&=\frac{1}{M}{\rm E}\left\{\varepsilon_{j}\left\|\hat{\bm{s}}_{j}^{t}-\bm{x}_{j}\right\|_{2}^{2}\right\}\\
&=\frac{1}{M}{\rm E}\{\varepsilon_{j}\}{\rm E}\left\{\left\|\pi_{j}\frac{g_{j}}{g_{j}+u^{t}}\left(\bm{x}_{j}+\mathbbm{n}_{j}^{t}\right)-\bm{x}_{j}\right\|_{2}^{2}\right\}\\
&\leq\frac{1}{M}{\rm E}\{\varepsilon_{j}\}{\rm E}\left\{\left(\frac{g_{j}}{g_{j}+u^{t}}\right)^{2}\left(\bm{x}_{j}+\mathbbm{n}_{j}^{t}\right)^{H}\left(\bm{x}_{j}+\mathbbm{n}_{j}^{t}\right)+\bm{x}_{j}^{H}\bm{x}_{j}-2\frac{g_{j}}{g_{j}+u^{t}}\bm{x}_{j}^{H}\bm{x}_{j}\right\}\\
&=\frac{1}{M}{\rm E}\{\varepsilon_{j}\}{\rm E}\left\{\left(1-\frac{g_{j}}{g_{j}+u^{t}}\right)^{2}\bm{x}_{j}^{H}\bm{x}_{j}+\left(\frac{g_{j}}{g_{j}+u^{t}}\right)^{2}{\mathbbm{n}_{j}^{t}}^{H}\mathbbm{n}_{j}^{t}\right\}\\
&={\rm E}\{\varepsilon_{j}\}{\rm E}\left\{\left(1-\frac{g_{j}}{g_{j}+u^{t}}\right)^{2}g_{j}+\left(\frac{g_{j}}{g_{j}+u^{t}}\right)^{2}u^{t}\right\}\\
&={\rm E}\{\varepsilon_{j}\}{\rm E}\left\{\frac{(u^{t})^{2}g_{j}+g_{j}^{2}u^{t}}{(g_{j}+u^{t})^{2}}\right\}\\
&={\rm E}\left\{\frac{\varepsilon_{j}g_{j}u^{t}}{g_{j}+u^{t}}\right\}.
\end{aligned} \tag{75}$$

Thus, we have

$$v^{t+1}=\left[1/{\rm E}\left\{\frac{\varepsilon_{j}g_{j}u^{t}}{g_{j}+u^{t}}\right\}-1/u^{t}\right]^{-1}={\rm E}\left\{\frac{\varepsilon_{j}g_{j}u^{t}}{(1-\varepsilon_{j})g_{j}+u^{t}}\right\}. \tag{76}$$
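The second equality in (76) is stated without intermediate steps. As a sketch, and under the assumption that the algebra is carried out on the quantity inside the expectation (i.e., for a fixed realization of $\varepsilon_{j}$ and $g_{j}$ ), it can be expanded as follows.

```latex
\begin{aligned}
\left[\frac{g_{j}+u^{t}}{\varepsilon_{j}g_{j}u^{t}}-\frac{1}{u^{t}}\right]^{-1}
&=\left[\frac{g_{j}+u^{t}-\varepsilon_{j}g_{j}}{\varepsilon_{j}g_{j}u^{t}}\right]^{-1}\\
&=\frac{\varepsilon_{j}g_{j}u^{t}}{(1-\varepsilon_{j})g_{j}+u^{t}}.
\end{aligned}
```

The first line uses $1/u^{t}=\varepsilon_{j}g_{j}/(\varepsilon_{j}g_{j}u^{t})$ to bring both terms over a common denominator.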

Substituting the above equation into the state evolution, the recursive expression can be rewritten as

$$u^{t+1}=\sigma^{2}+(2^{J}/n_{0}-1){\rm E}\left\{\frac{\varepsilon_{j}g_{j}u^{t}}{(1-\varepsilon_{j})g_{j}+u^{t}}\right\}. \tag{77}$$

Then, we define the function

$$f(x)=x-\sigma^{2}-(2^{J}/n_{0}-1){\rm E}\left\{\frac{\varepsilon_{j}g_{j}x}{(1-\varepsilon_{j})g_{j}+x}\right\}. \tag{78}$$

The derivative of $f(x)$ with respect to $x$ is

$$f^{\prime}(x)=1-(2^{J}/n_{0}-1){\rm E}\left\{\frac{\varepsilon_{j}}{1-\varepsilon_{j}}\right\}{\rm E}\left\{\frac{g_{j}^{2}}{\left(g_{j}+x/(1-\varepsilon_{j})\right)^{2}}\right\}. \tag{79}$$

Due to

$$0<{\rm E}\left\{\frac{g_{j}^{2}}{\left(g_{j}+x/(1-\varepsilon_{j})\right)^{2}}\right\}<1, \tag{80}$$

the condition $(2^{J}/n_{0}-1){\rm E}\left\{\frac{\varepsilon_{j}}{1-\varepsilon_{j}}\right\}<1$ , i.e., $K_{a}<n_{0}$ , must be satisfied to guarantee $f^{\prime}(x)>0$ , i.e., the monotonicity of $f(x)$ . Consequently, the fixed point of the state evolution is unique. Next, from (77), we have

$$\sigma^{2}\leq u^{t}\leq\sigma^{2}+(2^{J}/n_{0}-1){\rm E}\left\{\frac{\varepsilon_{j}}{1-\varepsilon_{j}}\right\}u^{t}, \tag{81}$$

and it can be further transformed to

$$u^{t}\leq\frac{\sigma^{2}}{1-\frac{(2^{J}-n_{0})\varepsilon_{j}}{(1-\varepsilon_{j})n_{0}}}. \tag{82}$$

Finally, substituting this upper bound of $u^{t}$ into $f(x)$ , we obtain (83).

$$\begin{aligned}
f\left(\frac{\sigma^{2}}{1-\frac{(2^{J}-n_{0})\varepsilon_{j}}{(1-\varepsilon_{j})n_{0}}}\right)&=\frac{\sigma^{2}}{1-\frac{(2^{J}-n_{0})\varepsilon_{j}}{(1-\varepsilon_{j})n_{0}}}-\sigma^{2}-(2^{J}/n_{0}-1)\frac{\sigma^{2}}{1-\frac{(2^{J}-n_{0})\varepsilon_{j}}{(1-\varepsilon_{j})n_{0}}}{\rm E}\left\{\frac{\varepsilon_{j}g_{j}}{(1-\varepsilon_{j})g_{j}+\frac{\sigma^{2}}{1-\frac{(2^{J}-n_{0})\varepsilon_{j}}{(1-\varepsilon_{j})n_{0}}}}\right\}\\
&=\frac{\sigma^{2}}{1-\frac{(2^{J}-n_{0})\varepsilon_{j}}{(1-\varepsilon_{j})n_{0}}}-\sigma^{2}-(2^{J}/n_{0}-1)\frac{\sigma^{2}}{1-\frac{(2^{J}-n_{0})\varepsilon_{j}}{(1-\varepsilon_{j})n_{0}}}{\rm E}\left\{\frac{\varepsilon_{j}}{1-\varepsilon_{j}}\right\}\underbrace{{\rm E}\left\{\frac{g_{j}}{g_{j}+\frac{\sigma^{2}}{(1-\varepsilon_{j})-\frac{(2^{J}-n_{0})\varepsilon_{j}}{n_{0}}}}\right\}}\limits_{\approx 1}\\
&\approx\frac{\sigma^{2}}{1-\frac{(2^{J}-n_{0})\varepsilon_{j}}{(1-\varepsilon_{j})n_{0}}}-\sigma^{2}-(2^{J}/n_{0}-1)\frac{\sigma^{2}}{1-\frac{(2^{J}-n_{0})\varepsilon_{j}}{(1-\varepsilon_{j})n_{0}}}{\rm E}\left\{\frac{\varepsilon_{j}}{1-\varepsilon_{j}}\right\}\\
&=0.
\end{aligned} \tag{83}$$

Therefore, it has been proved that the fixed point of the state evolution is $u^{\infty}\approx\frac{\sigma^{2}}{1-\frac{(2^{J}-n_{0})\varepsilon_{j}}{(1-\varepsilon_{j})n_{0}}}$ .
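The fixed-point result can be checked numerically by iterating the recursion in (77) and comparing the outcome with the closed-form approximation. The sketch below makes several simplifying assumptions that are not taken from the derivation itself: $\varepsilon_{j}$ is treated as the constant $K_{a}/2^{J}$, the expectation over $g_{j}$ is replaced by an average over a few hypothetical gain values well above the noise level, and $\sigma^{2}$ is normalized to one.

```python
import numpy as np

def se_fixed_point(sigma2, g, eps, J, n0, iters=200, tol=1e-14):
    """Iterate u <- sigma2 + (2^J/n0 - 1) * E{eps*g*u / ((1-eps)*g + u)} as in (77).

    The expectation over g_j is replaced by a sample mean over the array g, and
    eps is treated as a constant; both are simplifying assumptions.
    """
    c = 2 ** J / n0 - 1.0
    u = sigma2
    for _ in range(iters):
        u_next = sigma2 + c * np.mean(eps * g * u / ((1.0 - eps) * g + u))
        if abs(u_next - u) < tol * u_next:
            return float(u_next)
        u = u_next
    return float(u)

def se_closed_form(sigma2, eps, J, n0):
    """Approximate fixed point from Appendix A, accurate when g_j >> sigma2."""
    return sigma2 / (1.0 - (2 ** J - n0) * eps / ((1.0 - eps) * n0))

J, n0, K_a = 12, 1024, 50
eps = K_a / 2 ** J                       # assumed activity probability per codeword
sigma2 = 1.0
g = sigma2 * np.array([1e2, 1e3, 1e4])   # hypothetical gains, 20 to 40 dB above the noise

u_iter = se_fixed_point(sigma2, g, eps, J, n0)
u_cf = se_closed_form(sigma2, eps, J, n0)
print(u_iter, u_cf, abs(u_iter - u_cf) / u_cf)
```

With $K_{a}<n_{0}$ the iteration map is a contraction in this setting, so it settles after a handful of iterations, and the gap to the closed form shrinks as the gains grow relative to $\sigma^{2}$ , in line with the $\approx 1$ step in (83).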

References

[1] X. Chen, D. W. K. Ng, W. Yu, E. G. Larsson, N. Al-Dhahir, and R. Schober, “Massive access for 5G and beyond,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 615-637, Mar. 2021.
[2] X. Chen, Massive Access for Cellular Internet of Things Theory and Technique, Germany: Springer, 2019.
[3] Z. Gao, M. Ke, Y. Mei, L. Qiao, S. Chen, D. W. K. Ng, and H. V. Poor, “Compressive sensing-based grant-free massive access for 6G massive communication,” IEEE Internet Things J., vol. 11, no. 5, pp. 7411-7435, Mar. 2024.
[4] L. Liu, E. G. Larsson, W. Yu, P. Popovski, C. Stefanovic, and E. de Carvalho, “Sparse signal processing for grant-free massive connectivity: A future paradigm for random access protocols in the Internet of Things,” IEEE Signal Process. Mag., vol. 35, no. 5, pp. 88-99, Sep. 2018.
[5] Z. Zhang, X. Wang, Y. Zhang, and Y. Chen, “Grant-free rateless multiple access: A novel massive access scheme for internet of things,” IEEE Commun. Lett., vol. 20, no. 10, pp. 2019-2022, Oct. 2016.
[6] K. Senel and E. G. Larsson, “Grant-free massive MTC-enabled massive MIMO: A compressive sensing approach,” IEEE Trans. Commun., vol. 66, no. 12, pp. 6164-6175, Dec. 2018.
[7] M. Ke, Z. Gao, Y. Wu, X. Gao, and R. Schober, “Compressive sensing-based adaptive active user detection and channel estimation: Massive access meets massive MIMO,” IEEE Trans. Signal Process., vol. 68, pp. 764-779, 2020.
[8] Y. Polyanskiy, “A perspective on massive random access,” in Proc. IEEE International Symposium on Information Theory (ISIT), pp. 2523-2527, Aug. 2017.
[9] V. K. Amalladinne, J. Chamberland, and K. R. Narayanan, “A coupled compressive sensing scheme for unsourced multiple access,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6628-6632, Sep. 2018.
[10] V. K. Amalladinne, A. K. Pradhan, C. Rush, J. Chamberland, and K. R. Narayanan, “On approximate message passing for unsourced access with coded compressed sensing,” in Proc. IEEE International Symposium on Information Theory (ISIT), Aug. 2020.
[11] Y. Wu, X. Gao, S. Zhou, W. Yang, Y. Polyanskiy, and G. Caire, “Massive access for future wireless communication system,” IEEE Wireless Commun., vol. 27, no. 4, pp. 148-156, Apr. 2020.
[12] A. K. Pradhan, V. K. Amalladinne, K. R. Narayanan, and J. Chamberland, “Polar coding and random spreading for unsourced multiple access,” in Proc. IEEE International Conference on Communications (ICC), Jul. 2020.
[13] F. Tian, X. Chen, L. Liu, and D. W. K. Ng, “Massive unsourced random access over Rician fading channels: Design, analysis, and optimization,” IEEE Internet Things J., vol. 9, no. 18, pp. 17675-17688, Sep. 2022.
[14] A. Fengler, G. Caire, P. Jung, and S. Haghighatshoar, “Massive MIMO unsourced random access,” arXiv:1901.00828, 2019.
[15] A. Fengler, P. Jung, and G. Caire, “SPARCs for unsourced random access,” IEEE Trans. Info. Theory, vol. 67, no. 10, pp. 6894-6915, Oct. 2021.
[16] A. Fengler, S. Haghighatshoar, P. Jung, and G. Caire, “Non-Bayesian activity detection, large-scale fading coefficient estimation, and unsourced random access with a massive MIMO receiver,” IEEE Trans. Info. Theory, vol. 67, no. 5, pp. 2925-2951, Mar. 2021.
[17] J. Che, Z. Zhang, Z. Yang, X. Chen, C. Zhong, and D. W. K. Ng, “Unsourced random massive access with beam-space tree decoding,” IEEE J. Sel. Areas Commun., vol. 40, no. 4, pp. 1146-1161, Apr. 2022.
[18] V. K. Amalladinne, A. K. Pradhan, C. Rush, and J. Chamberland, “Unsourced random access with coded compressed sensing: Integrating AMP and belief propagation,” IEEE Trans. Info. Theory, vol. 68, no. 4, pp. 2384-2409, Apr. 2022.
[19] A. K. Pradhan, V. Amalladinne, A. Vem, K. R. Narayanan, and J-F. Chamberland, “Sparse IDMA: A joint graph-based coding scheme for unsourced random access,” IEEE Trans. Commun., vol. 70, no. 11, pp. 7124-7133, Nov. 2022.
[20] V. Shyianov, F. Bellili, A. Mezghani, and E. Hossain, “Massive unsourced random access based on uncoupled compressive sensing: Another blessing of massive MIMO,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 820-834, Aug. 2020.
[21] X. Xie, Y. Wu, J. An, J. Gao, W. Zhang, C. Xing, K-K. Wong, and C. Xiao, “Massive unsourced random access: exploiting angular domain sparsity,” IEEE Trans. Commun., vol. 70, no. 4, pp. 2480-2498, Apr. 2022.
[22] T. Li, Y. Wu, M. Zheng, W. Zhang, C. Xing, J. An, X. Xia, and C. Xiao, “Joint device detection, channel estimation, and data decoding with collision resolution for MIMO massive unsourced random access,” IEEE J. Sel. Areas Commun., vol. 40, no. 5, pp. 1535-1555, May 2022.
[23] J. Ziniel and P. Schniter, “Efficient high-dimensional inference in the multiple measurement vector problem,” IEEE Trans. Signal Process., vol. 61, no. 2, pp. 340-354, Jan. 2013.
[24] J. Ma and L. Ping, “Orthogonal AMP,” IEEE Access, vol. 5, pp. 2020-2033, 2017.
[25] S. Rangan, P. Schniter, and A. K. Fletcher, “Vector approximate message passing,” IEEE Trans. Info. Theory, vol. 65, no. 10, pp. 6664-6684, Oct. 2019.
[26] Y. Cheng, L. Liu, and L. Ping, “Orthogonal AMP for massive access in channels with spatial and temporal correlations,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 726-740, Mar. 2021.
[27] J. P. Vila and P. Schniter, “Expectation-maximization Gaussian-mixture approximate message passing,” IEEE Trans. Signal Process., vol. 61, no. 19, pp. 4658-4672, Oct. 2013.
[28] F. Tian, L. Liu, and X. Chen, “Generalized memory approximate message passing for generalized linear model,” IEEE Trans. Signal Process., vol. 70, pp. 6404-6418, 2022.
[29] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[30] A. H. Jahromi and M. Taheri, “A non-parametric mixture of Gaussian naive Bayes classifiers based on local independent features,” in Proc. 2017 Artificial Intelligence and Signal Processing Conference (AISP), Mar. 2018.
[31] M. Ontivero-Ortega, A. Lage-Castellanos, G. Valente, R. Goebel, and M. Valdes-Sosa, “Fast Gaussian Naive Bayes for searchlight classification analysis,” NeuroImage, vol. 163, pp. 471-479, Dec. 2017.
[32] L. Liu and W. Yu, “Massive connectivity with massive MIMO-Part 1: Device activity detection and channel estimation,” IEEE Trans. Signal Process., vol. 66, no. 11, pp. 2933-2946, Jun. 2018.
[33] W. Gautschi, “The incomplete Gamma functions since Tricomi,” Atti dei Convegni Lincei, no. 1998, pp. 203-237, 2011.
[34] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Energy and spectral efficiency of very large multiuser MIMO systems,” IEEE Trans. Commun., vol. 61, no. 4, pp. 1436-1449, Apr. 2013.
[35] M. Gkagkos, K. R. Narayanan, J.-F. Chamberland, and C. N. Georghiades, “FASURA: A scheme for quasi-static fading unsourced random access channels,” IEEE Trans. Commun., vol. 71, no. 11, pp. 6391-6401, Nov. 2023.
[36] M. Ke, Z. Gao, M. Zhou, D. Zheng, D. W. K. Ng, and H. V. Poor, “Next-generation URLLC with massive devices: A unified semi-blind detection framework for sourced and unsourced random access,” IEEE J. Sel. Areas Commun., vol. 41, no. 7, pp. 2223-2244, Jul. 2023.

$$\begin{aligned}
P_{\rm CER}&=\sum\limits_{k^{\prime}=1}^{K_{a}}P(\mathcal{C}_{k^{\prime}})P({\rm CER}|\mathcal{C}_{k^{\prime}})\\
&=\sum\limits_{k^{\prime}=1}^{K_{a}}P(\mathcal{C}_{k^{\prime}})\left(1-1/L\sum\limits_{k\in\mathcal{C}_{k^{\prime}}}P(\bm{h}_{k,l}|\mathcal{C}_{k^{\prime}})\right)\\
&=\sum\limits_{k^{\prime}=1}^{K_{a}}P(\mathcal{C}_{k^{\prime}})-\sum\limits_{k^{\prime}=1}^{K_{a}}\left(P(\mathcal{C}_{k^{\prime}})/L\sum\limits_{k\in\mathcal{C}_{k^{\prime}}}P(\bm{h}_{k,l}|\mathcal{C}_{k^{\prime}})\right)\\
&=1-\sum\limits_{k^{\prime}=1}^{K_{a}}\left(1/L\sum\limits_{k\in\mathcal{C}_{k^{\prime}}}P(\mathcal{C}_{k^{\prime}})P(\bm{h}_{k,l}|\mathcal{C}_{k^{\prime}})\right),
\end{aligned} \tag{51}$$