Joint Channel and Data Estimation for Multiuser Extremely Large-Scale MIMO Systems

Kabuto Arai, Koji Ishibashi, Hiroki Iimori, Paulo Valente Klaine, and Szabolcs Malomsoky

K. Arai and K. Ishibashi are with the Advanced Wireless and Communication Research Center (AWCC), The University of Electro-Communications, Tokyo 182-8285, Japan (e-mail: [email protected], [email protected]).

H. Iimori, P. V. Klaine, and S. Malomsoky are with Ericsson Research, Ericsson Japan K.K. (e-mail: {hiroki.iimori, paulo.valente.klaine, szabolcs.malomsoky}@ericsson.com).

Abstract

This paper proposes a joint channel and data estimation (JCDE) algorithm for uplink multiuser extremely large-scale multiple-input-multiple-output (XL-MIMO) systems. The initial channel estimation is formulated as a sparse reconstruction problem based on the angle and distance sparsity under the near-field propagation condition. This problem is solved using non-orthogonal pilots through an efficient, low-complexity two-stage compressed sensing algorithm. Furthermore, the initial channel estimates are refined by employing a JCDE framework driven by both non-orthogonal pilots and estimated data. The JCDE problem is solved by sequential expectation propagation (EP) algorithms, where the channel and data are alternately updated in an iterative manner. In the channel estimation phase, integrating Bayesian inference with a model-based deterministic approach provides precise estimates by effectively exploiting the near-field characteristics in the beam domain. In the data estimation phase, a linear minimum mean square error (LMMSE)-based filter is designed at each sub-array to address the correlation due to energy leakage in the beam domain arising from the near-field effects. Numerical simulations reveal that the proposed initial channel estimation and JCDE algorithm outperform the state-of-the-art approaches in terms of channel estimation, data detection, and computational complexity.

Index Terms:

Extremely large-scale-MIMO (XL-MIMO), near-field, joint channel and data estimation, compressed sensing

I Introduction

To meet the demands for high spectral efficiency in future 6G systems [1], it is essential to further exploit spatial multiplexing and abundant spectral resources at mid/high frequency bands such as centimeter-wave (cmWave), millimeter-wave (mmWave), and sub-terahertz (sub-THz). In light of these requirements, extremely large-scale multiple-input-multiple-output (XL-MIMO) [2, 3, 4] has emerged as a promising technology, enabling sharp directive beamforming and extensive spatial multiplexing. However, the significant increase in antenna aperture leads to an expansion of the Rayleigh distance [5, 6], defined as the border between the near-field and far-field regions. Thus, the near-field effects in XL-MIMO systems may not be negligible in some practically relevant circumstances, such as in small area coverage with high carrier frequency bands [7].

Unlike the conventional far-field channel, the near-field channel depends not only on gains and angles (e.g., angles of arrival (AoAs)) but also on distances from signal sources such as user equipments (UEs) and scatterers. Hence, conventional channel estimation methods such as [8, 9], which exploit the beam-domain sparsity under the assumption of a planar wavefront, experience significant performance degradation in the near-field due to energy leakage effects in the beam domain. To tackle this issue, the authors in [7] have proposed a polar-domain simultaneous orthogonal matching pursuit (P-SOMP) algorithm, which leverages the angle and distance sparsity, known as polar-domain sparsity, arising from the characteristics peculiar to the near field. In P-SOMP, polar (angle-distance) grids are generated by spatially quantizing the polar domain so that compressed sensing techniques can be applied. In multiuser systems, however, P-SOMP requires orthogonal pilots to separate multiple UEs. As such, this approach results in non-negligible overhead as the number of UEs grows, especially in XL-MIMO systems capable of spatially multiplexing many UEs.

Considering these challenges, a near-field channel estimation algorithm, which works even with non-orthogonal pilots, has recently been proposed in [10] in the context of grant-free XL-MIMO systems.¹ However, due to the non-orthogonality among pilots, inter-user interference still remains, so it is necessary to jointly estimate all UE channel components. As a result, this joint estimation significantly increases computational complexity because it requires UE-wise polar grids, which leads to a large grid size. Therefore, the authors in [10] proposed a 2D-compressive sampling matching pursuit (CoSaMP) algorithm, which is based on the CoSaMP algorithm in the polar-UE 2D domain constructed by UE-wise polar grids. While the 2D-CoSaMP algorithm can mitigate computational complexity, its estimation performance is hindered by overfitting to noisy measurements, which results from the inverse operation on over-sampled estimates. Consequently, subsequent data detection suffers from severe performance deterioration, particularly with high-order modulation.

¹ Note that while this method was originally developed for joint active user detection and channel estimation, it is also applicable to sole channel estimation problems without active user detection.

One of the prospective solutions for obtaining accurate channel estimates with non-orthogonal pilots is joint channel and data estimation (JCDE) [11, 12, 13, 14, 15], where not only pilot sequences but also estimated data symbols are utilized as pilot replicas, leveraging their statistical quasi-orthogonality. The JCDE problem can be formulated as a bilinear inference problem (BIP). One of the prominent algorithms based on a Bayesian framework for BIP is bilinear generalized approximate message passing (BiGAMP) [16]. BiGAMP is an extension of GAMP, which was originally designed for high-dimensional generalized-linear problems and simplifies the loopy belief propagation (BP) update by means of central limit theorem (CLT) and Taylor-series approximations in the large-system limit. Because BiGAMP depends heavily on the large-system assumption, its convergence performance deteriorates significantly when the system is too small, the pilots are too short, or the prior distribution is improper [11]. To address these issues, the authors in [11] have proposed bilinear Gaussian belief propagation (BiGaBP), which relaxes the BP update rules of BiGAMP, based on GaBP [17], without heavily relying on the approximation under a large-system-limit assumption. This relaxation of the approximation leads to performance improvements while maintaining the same complexity order as BiGAMP.

Bilinear inference algorithms that exploit physical model structures, such as channel sparsity in the beam-domain, have been investigated in [13, 14]. In these papers, the channel sparsity is modeled using a Bernoulli Gaussian (BG) prior distribution because this prior is analytically tractable with a closed-form posterior. However, the sparse structure cannot be exactly expressed by the analytically tractable prior, which leads to modeling errors resulting in performance deterioration. To tackle this issue, the authors of [12] have integrated a model-based deterministic approach [18] into a Bayesian inference framework [11], referred to as AoA-aided BiGaBP. This deterministic approach rectifies the model mismatch caused by the use of the tractable prior.

However, since this method assumes a far-field model, the model correction by the deterministic estimation is insufficient for the near-field region in XL-MIMO systems. Moreover, AoA-aided BiGaBP relies on a maximum ratio combining (MRC)-based detector, which cannot address the correlation caused by energy leakage in the beam-domain due to near-field effects. In addition, the computational complexity of the data denoising process in AoA-aided BiGaBP based on prior information for modulation constellations scales proportionally to not only the modulation order but also the number of antennas. This increase in complexity stems from the fact that BiGaBP suppresses the self-feedback of messages in the algorithmic iteration by generating antenna-wise extrinsic values based on BP rules without the Onsager correction term [11]. Consequently, the number of inputs to the denoiser function, based on modulation constellations, increases proportionally to the number of antennas.

Within the context outlined above, we propose a JCDE algorithm for multiuser XL-MIMO systems with non-orthogonal pilots. Our contributions are summarized as follows.

• Initial channel estimation for JCDE: A novel initialization mechanism for the multiuser near-field channel estimation problem with pilot contamination due to non-orthogonal pilots is proposed, enabling an accurate initial estimate that is then used in the subsequent JCDE algorithm. The proposed initial channel estimation algorithm consists of two stages to maintain low computational complexity. In the first stage, angle and distance parameters for all UEs are estimated from the polar grids using the simultaneous orthogonal matching pursuit (SOMP) algorithm without pairing between each estimated path and the corresponding UE. Subsequently, the second stage involves the UE-path pairing using 2D-OMP [19] with a reduced number of grids constructed on the angle and distance parameters derived from the first stage. Owing to the above two-stage procedure, our proposed initial channel estimation outperforms the existing state-of-the-art scheme [10], while maintaining comparable computational complexity.
• JCDE algorithm with model-based estimation: A novel bilinear JCDE inference algorithm is proposed, which integrates a model-based deterministic estimation mechanism with a Bayesian inference framework to address possible performance degradation due to modeling errors in the prior knowledge assumed in the Bayesian inference, as in [11]. In contrast to the state-of-the-art AoA-aided BiGaBP under the assumption of far-field propagation, our algorithm estimates the channels as an aggregation of two distinct quantities: 1) a model-based estimate that captures the near-field channel structure and 2) its modeling error, which captures how far the current estimate is from the true channel. The model-based estimate is alternately updated through a matching pursuit algorithm exploiting the near-field model structures, whereas the residual modeling errors and data symbols are jointly estimated by the expectation propagation (EP) algorithm [20], where an approximate posterior is calculated by minimizing the Kullback-Leibler (KL) divergence between the true posterior and the approximate posterior. To tackle the spatial correlation caused by the energy leakage across neighboring beams in the beam domain while reducing the complexity, we introduce a novel posterior calculation design that enables the implementation of a sub-array-wise linear minimum mean square error (LMMSE)-based filter, allowing parallel computation of the matrix inversion with a much smaller dimension than the array size. As shown in the simulation results, this design results in lower computational complexity than the state-of-the-art method, owing to the modification of an extrinsic value generation that does not rely on BP rules.

Notation: The notation $[\mathbf{A}]_{i,j}$ indicates the $(i,j)$ element of the matrix $\mathbf{A}$. For a random variable $x$ and a probability density function $p(x)$, $\mathbb{E}_{p(x)}[x]$ indicates the expectation of $x$ over $p(x)$. For any function $f(\mathbf{z})$, $\int_{\mathbf{z}/z_{i}}f(\mathbf{z})$ denotes the integral of $f(\mathbf{z})$ with respect to $\mathbf{z}$ except for $z_{i}$. The operator $\otimes$ denotes the Kronecker product. For the index sets $\mathcal{I}=\{1,2,\ldots,I\}$ and $\mathcal{J}=\{1,2,\ldots,J\}$, $\mathcal{I}\times\mathcal{J}$ denotes the Cartesian product of $\mathcal{I}$ and $\mathcal{J}$. $\mathcal{I}\setminus i$ represents the set $\{1,\ldots,i-1,i+1,\ldots,I\}$.

II System Model

Figure 1: Near-field channel model.

We consider an uplink XL-MIMO system in which a base station (BS) equipped with a uniform linear array (ULA) of $N$ antennas serves $U$ single-antenna UEs. The ULA is positioned along the $y$-axis, where $y^{(n)}=(n-1)d-\frac{(N-1)d}{2},\ n=1,2,\ldots,N$ is the coordinate of the $n$-th antenna, and $d=\lambda/2$ is the antenna spacing with wavelength $\lambda$, as shown in Fig. 1.

II-A Channel Model

The near-field channel in the spatial domain between a BS and the $u$ -th UE is modeled as

$$\mathbf{h}_{u}^{\mathcal{S}}=\sum_{l=1}^{L_{u}}\mathbf{a}(\theta_{u,l},r_{u,l})z_{u,l}=\mathbf{A}(\bm{\theta}_{u},\mathbf{r}_{u})\mathbf{z}_{u}, \tag{1}$$

where $\theta_{u,l}\in\mathbb{R}$ and $z_{u,l}\in\mathbb{C}$ denote the AoA and path gain of the $l$-th path of the $u$-th UE, respectively [21].

Without loss of generality, $l=1$ represents the line-of-sight (LoS) component and $l\in\{2,\ldots,L_{u}\}$ represents the non-line-of-sight (NLoS) components. Let $L\triangleq\sum_{u=1}^{U}L_{u}$ denote the total number of paths over all $U$ UEs. Accordingly, $r_{u,1}\in\mathbb{R}$ denotes the distance between the BS and the $u$-th UE, and $r_{u,l}\in\mathbb{R},\ l\neq 1$ is the distance between the BS and the $l$-th scatterer around the $u$-th UE. Besides, $\mathbf{a}(\theta_{u,l},r_{u,l})\in\mathbb{C}^{N\times 1}$ denotes the array response vector defined as

$$[\mathbf{a}(\theta_{u,l},r_{u,l})]_{n}=\exp\left[-j\frac{2\pi}{\lambda}\left(r_{u,l}^{(n)}-r_{u,l}\right)\right], \tag{2}$$

where $r_{u,l}^{(n)}=\sqrt{r_{u,l}^{2}+\left(y^{(n)}\right)^{2}-2r_{u,l}y^{(n)}\sin\theta_{u,l}}$ is the distance between the $n$-th antenna and the $u$-th UE (for $l=1$) or the $l$-th scatterer (for $l\neq 1$).
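As a minimal numerical sketch of (2), the snippet below evaluates the near-field array response for a half-wavelength ULA centered at the origin, together with its planar-wavefront limit. The function names `antenna_positions`, `near_field_response`, and `far_field_response` are illustrative assumptions and do not appear in the paper.

```python
import numpy as np

def antenna_positions(n_antennas, wavelength):
    """y^(n) = (n-1)d - (N-1)d/2 for n = 1..N, with d = wavelength / 2."""
    d = wavelength / 2.0
    n = np.arange(n_antennas)  # zero-based index, i.e. n-1 in the paper's notation
    return n * d - (n_antennas - 1) * d / 2.0

def near_field_response(theta, r, n_antennas, wavelength):
    """Array response of (2): the phase follows the exact antenna-to-source distance."""
    y = antenna_positions(n_antennas, wavelength)
    r_n = np.sqrt(r**2 + y**2 - 2.0 * r * y * np.sin(theta))
    return np.exp(-1j * 2.0 * np.pi / wavelength * (r_n - r))

def far_field_response(theta, n_antennas, wavelength):
    """Planar-wavefront limit of (2) as r grows without bound."""
    y = antenna_positions(n_antennas, wavelength)
    return np.exp(1j * 2.0 * np.pi * y / wavelength * np.sin(theta))
```

Every entry has unit modulus, so the distance $r_{u,l}$ affects only the phase profile across the array; for a very large distance the two functions agree up to a small residual phase.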

For the $u$-th UE, let us define the collections of AoAs, distances, and path gains as $\bm{\theta}_{u}\triangleq\{\theta_{u,l}\}_{l=1}^{L_{u}}$, $\mathbf{r}_{u}\triangleq\{r_{u,l}\}_{l=1}^{L_{u}}$, and $\mathbf{z}_{u}\triangleq\begin{bmatrix}z_{u,1},\ldots,z_{u,L_{u}}\end{bmatrix}^{\mathrm{T}}\in\mathbb{C}^{L_{u}\times 1}$, respectively, and the corresponding array response matrix is defined as $\mathbf{A}(\bm{\theta}_{u},\mathbf{r}_{u})\triangleq\begin{bmatrix}\mathbf{a}(\theta_{u,1},r_{u,1}),\ldots,\mathbf{a}(\theta_{u,L_{u}},r_{u,L_{u}})\end{bmatrix}\in\mathbb{C}^{N\times L_{u}}$. Then, the channel matrix $\mathbf{H}^{\mathcal{S}}\triangleq\begin{bmatrix}\mathbf{h}_{1}^{\mathcal{S}},\ldots,\mathbf{h}_{U}^{\mathcal{S}}\end{bmatrix}\in\mathbb{C}^{N\times U}$ is written as

$$\mathbf{H}^{\mathcal{S}}=\mathbf{A}(\bm{\theta},\mathbf{r})\mathbf{Z}, \tag{3}$$

where $\mathbf{A}(\bm{\theta},\mathbf{r})\triangleq\begin{bmatrix}\mathbf{A}(\bm{\theta}_{1},\mathbf{r}_{1}),\ldots,\mathbf{A}(\bm{\theta}_{U},\mathbf{r}_{U})\end{bmatrix}\in\mathbb{C}^{N\times L}$ is the array response matrix for all $U$ UEs, with AoAs, distances, and path gains defined as $\bm{\theta}\triangleq\{\bm{\theta}_{u}\}_{u=1}^{U}$, $\mathbf{r}\triangleq\{\mathbf{r}_{u}\}_{u=1}^{U}$, and $\mathbf{Z}\triangleq\mathrm{blkdiag}\begin{pmatrix}\mathbf{z}_{1},\mathbf{z}_{2},\ldots,\mathbf{z}_{U}\end{pmatrix}\in\mathbb{C}^{L\times U}$, respectively.

The array response vector in the far-field region, i.e., when $r_{u,l}\rightarrow\infty$, is expressed as $[\mathbf{a}(\theta_{u,l},\infty)]_{n}=\exp\left[j\frac{2\pi y^{(n)}}{\lambda}\sin\theta_{u,l}\right]$ from (2). As the far-field array response depends only on the angle, the far-field channel $\mathbf{h}_{u}^{\mathcal{S}}$ can be converted into a sparse beam-domain channel $\mathbf{h}_{u}=\mathbf{D}_{N}\mathbf{h}_{u}^{\mathcal{S}}$ with the discrete Fourier transform (DFT) matrix $\mathbf{D}_{N}\in\mathbb{C}^{N\times N}$. In contrast, as the far-field approximation does not hold in the near-field region, the beam-domain near-field channel $\mathbf{h}_{u}$ exhibits not a simple sparse structure but rather a cluster-sparse structure, which is caused by energy leakage due to a model mismatch between the DFT matrix $\mathbf{D}_{N}$ and the near-field array response $\mathbf{a}(\theta_{u,l},r_{u,l})$. To illustrate the energy leakage effects, Fig. 2 depicts the amplitude of the beam-domain channel vector in the near-field and far-field regions. It can be seen that the far-field channel possesses a distinct sparse structure with a sharp spike. On the other hand, the near-field channel exhibits a clustered sparsity with flatter peaks due to energy leakage. Hence, conventional channel estimation methods exploiting the beam-domain sparsity [8, 9, 18] encounter significant performance degradation in the near-field.
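The energy leakage described above can be reproduced with a short numerical sketch. It counts how many beams are needed to capture 90% of the energy of a single-path array response in the beam domain. The carrier wavelength, array size, source position, the 90% threshold, the unitary normalization of $\mathbf{D}_{N}$, and the helper name `beams_for_energy` are all assumptions chosen for illustration, not values from the paper.

```python
import numpy as np

def beams_for_energy(h_beam, fraction=0.9):
    """Smallest number of beams whose combined energy reaches `fraction` of the total."""
    energy = np.sort(np.abs(h_beam) ** 2)[::-1]
    cumulative = np.cumsum(energy) / np.sum(energy)
    return int(np.searchsorted(cumulative, fraction) + 1)

n_antennas, wavelength = 256, 3e-3           # assumed: 256 antennas, 3 mm wavelength
d = wavelength / 2.0
y = (np.arange(n_antennas) - (n_antennas - 1) / 2.0) * d
dft = np.fft.fft(np.eye(n_antennas)) / np.sqrt(n_antennas)  # unitary DFT matrix D_N

theta, r = 0.0, 5.0                          # assumed: broadside source at 5 m
a_far = np.exp(1j * 2.0 * np.pi * y / wavelength * np.sin(theta))
r_n = np.sqrt(r**2 + y**2 - 2.0 * r * y * np.sin(theta))
a_near = np.exp(-1j * 2.0 * np.pi / wavelength * (r_n - r))

n_far = beams_for_energy(dft @ a_far)        # planar wavefront: energy in one beam
n_near = beams_for_energy(dft @ a_near)      # spherical wavefront: energy spreads out
print(n_far, n_near)
```

With these settings the far-field response falls exactly on a DFT beam, whereas the quadratic phase term of the near-field response spreads the energy over a cluster of neighboring beams.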

II-B Received Signal Model

To estimate the near-field channel, the $u$-th UE transmits a non-orthogonal pilot sequence $\mathbf{x}_{\mathrm{p},u}\in\mathbb{C}^{K_{\mathrm{p}}\times 1}$ followed by data symbols $\mathbf{x}_{\mathrm{d},u}\in\mathbb{C}^{K_{\mathrm{d}}\times 1}$, where $K_{\mathrm{p}}(<U)$ and $K_{\mathrm{d}}$ are the lengths of the pilot and data sequences, respectively. Each entry of $\mathbf{x}_{\mathrm{d},u}$ is randomly generated from a $Q$-quadrature amplitude modulation (QAM) constellation $\mathcal{X}\triangleq\{\mathcal{X}_{1},\ldots,\mathcal{X}_{Q}\}$ with average symbol energy $E_{\mathrm{s}}$. Then, the received pilot $\mathbf{Y}_{\mathrm{p}}^{\mathcal{S}}\in\mathbb{C}^{N\times K_{\mathrm{p}}}$ and data $\mathbf{Y}_{\mathrm{d}}^{\mathcal{S}}\in\mathbb{C}^{N\times K_{\mathrm{d}}}$ in the spatial domain are given by

$$\mathbf{Y}_{\mathrm{p}}^{\mathcal{S}}=\mathbf{H}^{\mathcal{S}}\mathbf{X}_{\mathrm{p}}+\mathbf{N}_{\mathrm{p}}^{\mathcal{S}},\ \ \mathbf{Y}_{\mathrm{d}}^{\mathcal{S}}=\mathbf{H}^{\mathcal{S}}\mathbf{X}_{\mathrm{d}}+\mathbf{N}_{\mathrm{d}}^{\mathcal{S}}, \tag{4}$$

where $\mathbf{X}_{\mathrm{p}}\triangleq\begin{bmatrix}\mathbf{x}_{\mathrm{p},1},\ldots,\mathbf{x}_{\mathrm{p},U}\end{bmatrix}^{\mathrm{T}}\in\mathbb{C}^{U\times K_{\mathrm{p}}}$ and $\mathbf{X}_{\mathrm{d}}\triangleq\begin{bmatrix}\mathbf{x}_{\mathrm{d},1},\ldots,\mathbf{x}_{\mathrm{d},U}\end{bmatrix}^{\mathrm{T}}\in\mathbb{C}^{U\times K_{\mathrm{d}}}$ are the transmitted pilot matrix and data matrix, respectively. $\mathbf{N}_{\mathrm{p}}^{\mathcal{S}}\in\mathbb{C}^{N\times K_{\mathrm{p}}}$ and $\mathbf{N}_{\mathrm{d}}^{\mathcal{S}}\in\mathbb{C}^{N\times K_{\mathrm{d}}}$ are the additive white Gaussian noise (AWGN) matrices, whose entries are generated from $\mathcal{CN}(0,\sigma^{2})$ with noise variance $\sigma^{2}$. By stacking the received pilot $\mathbf{Y}_{\mathrm{p}}^{\mathcal{S}}$ and data $\mathbf{Y}_{\mathrm{d}}^{\mathcal{S}}$, the effective received signal $\mathbf{Y}^{\mathcal{S}}\triangleq\begin{bmatrix}\mathbf{Y}_{\mathrm{p}}^{\mathcal{S}},\mathbf{Y}_{\mathrm{d}}^{\mathcal{S}}\end{bmatrix}\in\mathbb{C}^{N\times K}$, with the total length of pilots and data $K\triangleq K_{\mathrm{p}}+K_{\mathrm{d}}$, is formulated as

$$\mathbf{Y}^{\mathcal{S}}=\mathbf{H}^{\mathcal{S}}\mathbf{X}+\mathbf{N}^{\mathcal{S}}, \tag{5}$$

with $\mathbf{N}^{\mathcal{S}}\triangleq\begin{bmatrix}\mathbf{N}_{\mathrm{p}}^{\mathcal{S}},\mathbf{N}_{\mathrm{d}}^{\mathcal{S}}\end{bmatrix}\in\mathbb{C}^{N\times K}$ and $\mathbf{X}\triangleq\begin{bmatrix}\mathbf{X}_{\mathrm{p}},\mathbf{X}_{\mathrm{d}}\end{bmatrix}\in\mathbb{C}^{U\times K}$. For future convenience, let us define the pilot and data index set as $\mathcal{K}\triangleq\mathcal{K}_{\mathrm{p}}\cup\mathcal{K}_{\mathrm{d}}$ with $\mathcal{K}_{\mathrm{p}}\triangleq\{1,2,\ldots,K_{\mathrm{p}}\}$ and $\mathcal{K}_{\mathrm{d}}\triangleq\{K_{\mathrm{p}}+1,\ldots,K_{\mathrm{p}}+K_{\mathrm{d}}\}$.

From (3) and (4), the received pilot $\mathbf{Y}_{\mathrm{p}}^{\mathcal{S}}$ is rewritten as

$$\mathbf{Y}^{\mathcal{S}}_{\mathrm{p}}=\mathbf{A}(\bm{\theta},\mathbf{r})\mathbf{V}+\mathbf{N}_{\mathrm{p}}^{\mathcal{S}}, \tag{6}$$

where $\mathbf{V}\triangleq\mathbf{Z}\mathbf{X}_{\mathrm{p}}=\begin{bmatrix}\mathbf{x}_{\mathrm{p},1}\mathbf{z}_{1}^{\mathrm{T}},\mathbf{x}_{\mathrm{p},2}\mathbf{z}_{2}^{\mathrm{T}},\ldots,\mathbf{x}_{\mathrm{p},U}\mathbf{z}_{U}^{\mathrm{T}}\end{bmatrix}^{\mathrm{T}}\in\mathbb{C}^{L\times K_{\mathrm{p}}}$ is the matrix composed of path gains and pilots.
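The factorization behind (3) and (6) can be checked numerically with a small sketch. A random matrix stands in for $\mathbf{A}(\bm{\theta},\mathbf{r})$, and the sizes, the per-UE path counts, and the helper `crandn` are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_antennas, n_pilots = 32, 2                 # N and K_p, with K_p < U (assumed sizes)
paths_per_ue = [2, 1, 3]                     # assumed L_u for U = 3 UEs
n_ues, n_paths = len(paths_per_ue), sum(paths_per_ue)

def crandn(*shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)

A = crandn(n_antennas, n_paths)              # stand-in for A(theta, r) in (3)
Xp = crandn(n_ues, n_pilots)                 # row u is the pilot sequence of UE u

# Z = blkdiag(z_1, ..., z_U): column u holds z_u in the rows of UE u's paths.
Z = np.zeros((n_paths, n_ues), dtype=complex)
row = 0
for u, L_u in enumerate(paths_per_ue):
    Z[row:row + L_u, u] = crandn(L_u)
    row += L_u

H = A @ Z                                    # spatial-domain channel, as in (3)
V = Z @ Xp                                   # path gains combined with pilots, as in (6)
```

Because $\mathbf{Z}$ is block diagonal, the rows of $\mathbf{V}$ that belong to UE $u$ equal the outer product $\mathbf{z}_{u}\mathbf{x}_{\mathrm{p},u}^{\mathrm{T}}$, so each row of $\mathbf{V}$ carries the pilot of exactly one UE.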

III Overview of the Proposed Algorithm

This section gives an overview of the proposed algorithm, whose overall procedure is illustrated in Fig. 3. As shown in the figure, the proposed algorithm mainly consists of two parts: the initial channel estimation part and the subsequent JCDE part. The initial channel estimation part yields an accurate near-field channel estimate to support the convergence of the subsequent JCDE algorithm, and is composed of two stages to reduce the computational complexity. In the first stage, the angle and distance candidates are estimated from large-size polar grids by utilizing the SOMP algorithm. In the second stage, the pairing between the path candidates obtained in the first stage and the corresponding UEs is performed via the 2D-OMP algorithm by using UE-specific pilot sequences. The first and second stages of the initial channel estimation are described in Section IV-A and Section IV-B, respectively.

In the subsequent JCDE process, the channel and data are jointly estimated via the EP algorithm with a deterministic model-based estimation approach using the initial channel estimate. To exploit the near-field model structures, the beam-domain channel matrix $\mathbf{H}\in\mathbb{C}^{N\times U}$ is decomposed into a model-based estimate $\hat{\mathbf{S}}\in\mathbb{C}^{N\times U}$ and residual channel error $\mathbf{E}\in\mathbb{C}^{N\times U}$ . $\mathbf{E}$ and $\mathbf{X}$ are jointly estimated by the EP algorithm, where the approximate joint posterior for $\mathbf{E}$ and $\mathbf{X}$ is calculated as described in Section V-C and V-D. The model-based estimate $\hat{\mathbf{S}}$ is determined by the initial channel estimate and adaptively updated in the algorithm iterations to further improve estimation performance as described in Section V-F.

IV Proposed Initial Channel Estimation

IV-A Angle and Distance Estimation

To leverage the near-field channel sparsity, the virtual channel representation in the polar-domain [8] is utilized with polar-grids. The polar-grids are designed by spatially quantizing the angle and distance domain into $G_{\theta}G_{r}$ grid points as $\tilde{\bm{\theta}}\triangleq\{\tilde{\theta}_{g_{\theta}}|g_{\theta}\in\{1,\ldots,G_{\theta}\}\}$ and $\tilde{\mathbf{r}}\triangleq\left\{\tilde{r}_{g_{r},g_{\theta}}|g_{r}\in\{1,\ldots,G_{r}\},\ g_{\theta}\in\{1,\ldots,G_{\theta}\}\right\}$ with $\tilde{\theta}_{g_{\theta}}\in[-\pi,\pi]$ and $\tilde{r}_{g_{r},g_{\theta}}\in[0,\infty]$. Using the polar-grids $\tilde{\bm{\theta}}$ and $\tilde{\mathbf{r}}$, the polar-domain dictionary (i.e., virtual array response matrix) is designed as

$$\tilde{\mathbf{A}}(\tilde{\bm{\theta}},\tilde{\mathbf{r}})=\left[\mathbf{a}(\tilde{\theta}_{1},\tilde{r}_{1,1}),\ldots,\mathbf{a}(\tilde{\theta}_{1},\tilde{r}_{G_{r},1}),\ldots,\mathbf{a}(\tilde{\theta}_{G_{\theta}},\tilde{r}_{1,G_{\theta}}),\ldots,\mathbf{a}(\tilde{\theta}_{G_{\theta}},\tilde{r}_{G_{r},G_{\theta}})\right]\in\mathbb{C}^{N\times G_{\theta}G_{r}}. \tag{7}$$
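A polar-domain dictionary with the column ordering of (7) could be assembled as in the following sketch. The paper does not specify the grid design in this section, so the uniform grid in $\sin\theta$, the three distances shared by all angles, the array parameters, and the function name `polar_dictionary` are purely illustrative assumptions.

```python
import numpy as np

def polar_dictionary(sin_grid, r_grid, n_antennas, wavelength):
    """Stack near-field array responses column by column, angle-major as in (7)."""
    d = wavelength / 2.0
    y = (np.arange(n_antennas) - (n_antennas - 1) / 2.0) * d
    columns = []
    for s in sin_grid:                       # outer loop: angle index g_theta
        for r in r_grid:                     # inner loop: distance index g_r
            r_n = np.sqrt(r**2 + y**2 - 2.0 * r * y * s)
            columns.append(np.exp(-1j * 2.0 * np.pi / wavelength * (r_n - r)))
    return np.stack(columns, axis=1)

sin_grid = np.linspace(-1.0, 1.0, 16, endpoint=False)  # assumed uniform grid in sin(theta)
r_grid = np.array([2.0, 5.0, 20.0])                     # assumed distances in meters
A_tilde = polar_dictionary(sin_grid, r_grid, n_antennas=64, wavelength=0.01)
```

In the paper the distance grid $\tilde{r}_{g_{r},g_{\theta}}$ may differ from angle to angle; sharing one distance list across all angles is a simplification of this sketch.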

From (6) and (7), the received pilot signal $\mathbf{Y}_{\mathrm{p}}^{\mathcal{S}}$ is given by

$$\mathbf{Y}_{\mathrm{p}}^{\mathcal{S}}\simeq\tilde{\mathbf{A}}(\tilde{\bm{\theta}},\tilde{\mathbf{r}})\tilde{\mathbf{V}}+\mathbf{N}^{\mathcal{S}}_{\mathrm{p}}, \tag{8}$$

where $\tilde{\mathbf{V}}\in\mathbb{C}^{G_{\theta}G_{r}\times K_{\mathrm{p}}}$ is a row-sparse matrix with only $L$ nonzero rows and $G_{\theta}G_{r}-L$ zero rows, since the channel is composed of a total of $L$ paths as defined in (1) and the number of grids is sufficiently large, i.e., $G_{\theta}G_{r}\gg L$. Equation (8) holds exactly only if there are no quantization errors in the polar grids. In actual environments, however, it holds only approximately due to the presence of quantization errors. Therefore, to compensate for the quantization errors, we overestimate the number of paths as $\hat{L}>L$ based on the propagation environment at the considered carrier frequency [7]. To estimate $\hat{L}$ path candidates from $G_{\theta}G_{r}$ grids, the sparse reconstruction problem for $\tilde{\mathbf{V}}$ is formulated as

$$\underset{\tilde{\mathbf{V}}}{\text{minimize}}\ \ \left\|\mathbf{Y}_{\mathrm{p}}^{\mathcal{S}}-\tilde{\mathbf{A}}(\tilde{\bm{\theta}},\tilde{\mathbf{r}})\tilde{\mathbf{V}}\right\|_{\mathrm{F}}^{2}\quad\text{subject to}\ \ \|\tilde{\mathbf{V}}\|_{2,0}=\hat{L}, \tag{9}$$

where $\|\tilde{\mathbf{V}}\|_{2,0}$ denotes the number of non-zero rows of $\tilde{\mathbf{V}}$ .

The problem in (9) can be approximately solved by a compressed sensing algorithm for multiple measurement vectors (MMV) problems, e.g., SOMP [8]. The computational complexity of SOMP at the $t$-th iteration in a naive implementation is $\mathcal{O}(G_{r}G_{\theta}NK_{\mathrm{p}}+Nt+Nt^{2}+t^{3})$. Its complexity can be further reduced to $\mathcal{O}(G_{r}G_{\theta}K_{\mathrm{p}}t+Nt)$ by using the matrix inversion lemma (MIL) [22, 23]. Solving the problem (9) yields the angle and distance candidates corresponding to the non-zero rows of $\tilde{\mathbf{V}}$, defined as $\check{\bm{\theta}}\triangleq\{\check{\theta}_{l}\}_{l=1}^{\hat{L}}$ and $\check{\mathbf{r}}\triangleq\{\check{r}_{l}\}_{l=1}^{\hat{L}}$.
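The greedy row selection behind SOMP can be sketched as follows. This is a naive least-squares variant without the MIL speed-up, written only to illustrate the idea; the function name `somp` and its arguments are assumptions and not the authors' implementation.

```python
import numpy as np

def somp(Y, A, n_paths):
    """Select n_paths dictionary columns that jointly explain all columns of Y."""
    residual = Y.copy()
    support = []
    coeffs = np.zeros((0, Y.shape[1]), dtype=complex)
    for _ in range(n_paths):
        # Row-wise l2 norm of the correlations, shared across all measurement vectors.
        scores = np.linalg.norm(A.conj().T @ residual, axis=1)
        if support:
            scores[support] = 0.0            # never pick the same grid point twice
        support.append(int(np.argmax(scores)))
        # Least-squares refit on the current support, then update the residual.
        A_s = A[:, support]
        coeffs, *_ = np.linalg.lstsq(A_s, Y, rcond=None)
        residual = Y - A_s @ coeffs
    return support, coeffs
```

In the first stage, `A` would be the polar-domain dictionary of (7), `Y` the received pilot matrix, and `n_paths` the overestimated path count $\hat{L}$; the returned indices then map back to the angle and distance candidates.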

IV-B UE-Path Pairing

The path candidate set $\{\check{\theta}_{l},\check{r}_{l}\}_{l=1}^{\hat{L}}$ obtained in the first stage does not specify the association of individual paths with each UE. To estimate individual channels for each UE, the second stage performs UE-path pairing, where the estimated path candidates are associated with each user using UE-specific non-orthogonal pilot sequences $\{\mathbf{x}_{\mathrm{p},u}\}_{u=1}^{U}$.² The usage of limited path candidates $\check{\bm{\theta}}$, $\check{\mathbf{r}}$, rather than large-size polar-grids $\tilde{\bm{\theta}}$, $\tilde{\mathbf{r}}$ sampling the entire polar domain, can lead to a complexity reduction. Using the path set $\{\check{\theta}_{l},\check{r}_{l}\}_{l=1}^{\hat{L}}$, the polar-domain dictionary matrix is designed as

² In the case of orthogonal pilots, one can readily see that this pairing is a straightforward task.

$$\check{\mathbf{A}}(\check{\bm{\theta}},\check{\mathbf{r}})\triangleq\begin{bmatrix}\mathbf{a}(\check{\theta}_{1},\check{r}_{1}),\mathbf{a}(\check{\theta}_{2},\check{r}_{2}),\ldots,\mathbf{a}(\check{\theta}_{\hat{L}},\check{r}_{\hat{L}})\end{bmatrix}\in\mathbb{C}^{N\times\hat{L}}. \tag{10}$$

Reducing the size of the polar grids from $G_{r}G_{\theta}$ in (7) to $\hat{L}(\ll G_{r}G_{\theta})$ in (10) can effectively lower the complexity of the following compressed sensing algorithm. Then, the channel vector for the $u$-th UE can be approximated with the polar-domain dictionary $\check{\mathbf{A}}(\check{\bm{\theta}},\check{\mathbf{r}})$ as

$$\mathbf{h}_{u}^{\mathcal{S}}=\mathbf{A}(\bm{\theta}_{u},\mathbf{r}_{u})\mathbf{z}_{u}\simeq\check{\mathbf{A}}(\check{\bm{\theta}},\check{\mathbf{r}})\check{\mathbf{z}}_{u}, \tag{11}$$

where $\check{\mathbf{z}}_{u}\in\mathbb{C}^{\hat{L}\times 1}$ is the virtual path gain vector.

From (11), the received pilot $\mathbf{Y}_{\mathrm{p}}^{\mathcal{S}}$ is approximated as

$$\mathbf{Y}_{\mathrm{p}}^{\mathcal{S}}\simeq\sum_{u=1}^{U}\check{\mathbf{A}}(\check{\bm{\theta}},\check{\mathbf{r}})\check{\mathbf{z}}_{u}\mathbf{x}_{\mathrm{p},u}^{\mathrm{T}}+\mathbf{N}_{\mathrm{p}}^{\mathcal{S}}=\check{\mathbf{A}}(\check{\bm{\theta}},\check{\mathbf{r}})\check{\mathbf{Z}}\mathbf{X}_{\mathrm{p}}+\mathbf{N}_{\mathrm{p}}^{\mathcal{S}}, \tag{12}$$

with $\check{\mathbf{Z}}\triangleq\begin{bmatrix}\check{\mathbf{z}}_{1},\check{\mathbf{z}}_{2},\ldots,\check{\mathbf{z}}_{U}\end{bmatrix}\in\mathbb{C}^{\hat{L}\times U}$.

Equation (12) can be transformed into a 1D linear equation as $\mathbf{y}_{\mathrm{p}}^{\mathcal{S}}\simeq\mathbf{\Phi}_{\mathrm{p}}\check{\mathbf{z}}$ with $\mathbf{y}_{\mathrm{p}}^{\mathcal{S}}\triangleq\mathrm{vec}(\mathbf{Y}^{\mathcal{S}}_{\mathrm{p}})$, $\check{\mathbf{z}}\triangleq\mathrm{vec}(\check{\mathbf{Z}})\in\mathbb{C}^{\hat{L}U\times 1}$ and $\mathbf{\Phi}_{\mathrm{p}}\triangleq\left(\mathbf{X}_{\mathrm{p}}^{\mathrm{T}}\otimes\check{\mathbf{A}}(\check{\bm{\theta}},\check{\mathbf{r}})\right)\in\mathbb{C}^{NK_{\mathrm{p}}\times\hat{L}U}$. Although $\check{\mathbf{z}}$ can be estimated from the vectorized observation $\mathbf{y}_{\mathrm{p}}^{\mathcal{S}}$ by various methods such as OMP [22], doing so significantly increases the complexity due to the large-size dictionary $\mathbf{\Phi}_{\mathrm{p}}$. Hence, to circumvent the high computational burden, the 2D signal representation in (12) is addressed directly, without the vectorized 1D representation. Then, the sparse reconstruction problem for $\check{\mathbf{Z}}$ in (12) is formulated as
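The vectorization step relies on the identity $\mathrm{vec}(\mathbf{A}\mathbf{Z}\mathbf{X})=(\mathbf{X}^{\mathrm{T}}\otimes\mathbf{A})\mathrm{vec}(\mathbf{Z})$ for column-stacking $\mathrm{vec}(\cdot)$. The sketch below checks it numerically with random matrices standing in for $\check{\mathbf{A}}$, $\check{\mathbf{Z}}$, and $\mathbf{X}_{\mathrm{p}}$; the sizes and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_antennas, n_cand, n_ues, n_pilots = 8, 5, 4, 3   # N, L_hat, U, K_p (assumed sizes)

def crandn(*shape):
    """Complex Gaussian samples used as stand-ins for the actual matrices."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

A_check = crandn(n_antennas, n_cand)
Z_check = crandn(n_cand, n_ues)
Xp = crandn(n_ues, n_pilots)

def vec(M):
    """Column-stacking vectorization."""
    return M.flatten(order="F")

Phi_p = np.kron(Xp.T, A_check)               # (N K_p) x (L_hat U) dictionary
lhs = vec(A_check @ Z_check @ Xp)            # vectorized noiseless pilot observation
rhs = Phi_p @ vec(Z_check)
```

The dictionary `Phi_p` has $NK_{\mathrm{p}}\hat{L}U$ entries, which illustrates why working directly on the 2D form in (12) is preferable for large arrays.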

$$\underset{\check{\mathbf{Z}}}{\text{minimize}}\ \ \left\|\mathbf{Y}_{\mathrm{p}}^{\mathcal{S}}-\check{\mathbf{A}}(\check{\bm{\theta}},\check{\mathbf{r}})\check{\mathbf{Z}}\mathbf{X}_{\mathrm{p}}\right\|_{2}^{2}\quad\text{subject to}\ \ \left\|\check{\mathbf{z}}\right\|_{0}=\hat{L}, \tag{13}$$

with $\check{\mathbf{z}}\triangleq\mathrm{vec}(\check{\mathbf{Z}})\in\mathbb{C}^{\hat{L}U\times 1}$.

The optimization problem (13) is solved via a two-dimensional compressed sensing algorithm. The conventional method [10] tackles this problem with the large-size polar dictionary $\tilde{\mathbf{A}}(\tilde{\bm{\theta}},\tilde{\mathbf{r}})$ in (7) instead of $\check{\mathbf{A}}(\check{\bm{\theta}},\check{\mathbf{r}})$ in (10) via the 2D-CoSaMP algorithm, which sacrifices estimation performance for complexity reduction compared to 2D-OMP [19]. In contrast, our proposed method solves the optimization problem (13) via the 2D-OMP algorithm using the small-size polar-domain dictionary $\check{\mathbf{A}}(\check{\bm{\theta}},\check{\mathbf{r}})$ constructed from the path candidates $\{\check{\theta}_{l},\check{r}_{l}\}_{l=1}^{\hat{L}}$ obtained in the first stage. As a result, the proposed method can outperform the conventional approach [10] while retaining comparable computational complexity. Detailed discussions regarding the complexity of the proposed algorithm are presented in Section VI.
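To make the second stage concrete, the following is a simplified greedy sketch in the spirit of 2D-OMP. It treats each rank-one matrix $\mathbf{a}_{l}\mathbf{x}_{\mathrm{p},u}^{\mathrm{T}}$ as an atom and correlates the residual with all atoms at once, without forming the Kronecker dictionary. It is not claimed to reproduce the exact algorithm of [19]; the function name `omp_2d` and its interface are assumptions.

```python
import numpy as np

def omp_2d(Y, A, X, n_atoms):
    """Greedy fit of Y ~ A Z X with at most n_atoms nonzero entries in Z."""
    n_cand, n_ues = A.shape[1], X.shape[0]
    # Frobenius norm of each atom a_l x_u^T equals ||a_l|| * ||x_u||.
    scale = np.outer(np.linalg.norm(A, axis=0), np.linalg.norm(X, axis=1))
    residual, picked = Y.copy(), []
    Z = np.zeros((n_cand, n_ues), dtype=complex)
    for _ in range(n_atoms):
        # Entry (l, u) is the inner product between atom a_l x_u^T and the residual.
        corr = np.abs(A.conj().T @ residual @ X.conj().T) / scale
        for (l, u) in picked:
            corr[l, u] = 0.0                 # skip (path, UE) pairs already chosen
        l, u = np.unravel_index(np.argmax(corr), corr.shape)
        picked.append((int(l), int(u)))
        # Least squares over the picked atoms only (one column per picked pair).
        atoms = np.stack([np.outer(A[:, l], X[u]).ravel() for l, u in picked], axis=1)
        gains, *_ = np.linalg.lstsq(atoms, Y.ravel(), rcond=None)
        residual = Y - (atoms @ gains).reshape(Y.shape)
    for (l, u), g in zip(picked, gains):
        Z[l, u] = g
    return Z, picked
```

Each picked pair `(l, u)` assigns path candidate `l` to UE `u`, which is the UE-path pairing; the nonzero entries of the returned `Z` play the role of the virtual path gains $\check{\mathbf{z}}_{u}$.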

Solving the problem (13) yields the estimated path gain vector $\hat{\mathbf{z}}_{u}\in\mathbb{C}^{\hat{L}_{u}\times 1}$, angle $\hat{\bm{\theta}}_{u}\in\mathbb{R}^{\hat{L}_{u}\times 1}$, and distance $\hat{\mathbf{r}}_{u}\in\mathbb{R}^{\hat{L}_{u}\times 1}$ corresponding to the non-zero elements of $\check{\mathbf{z}}_{u}\in\mathbb{C}^{\hat{L}\times 1}$, where $\hat{L}_{u}$ is the estimated number of paths for the $u$-th UE. Given the estimates, the initial channel estimate can be obtained as

$$\hat{\mathbf{H}}^{\mathcal{S}}_{0}=\begin{bmatrix}\hat{\mathbf{h}}_{1}^{\mathcal{S}},\ldots,\hat{\mathbf{h}}_{U}^{\mathcal{S}}\end{bmatrix}\in\mathbb{C}^{N\times U},\ \text{with }\hat{\mathbf{h}}_{u}^{\mathcal{S}}=\hat{\mathbf{A}}(\hat{\bm{\theta}}_{u},\hat{\mathbf{r}}_{u})\hat{\mathbf{z}}_{u}, \tag{14}$$

where $\hat{\mathbf{A}}(\hat{\bm{\theta}}_{u},\hat{\mathbf{r}}_{u})=\begin{bmatrix}\mathbf{a}(\hat{\theta}_{u,1},\hat{r}_{u,1}),\ldots,\mathbf{a}(\hat{\theta}_{u,\hat{L}_{u}},\hat{r}_{u,\hat{L}_{u}})\end{bmatrix}\in\mathbb{C}^{N\times\hat{L}_{u}}$ is the estimated array response. The proposed initial channel estimation method is summarized in Algorithm 1.

Algorithm 1 Proposed channel estimation algorithm

1: Input: $\mathbf{Y}_{\mathrm{p}}^{\mathcal{S}},\ \mathbf{X}_{\mathrm{p}},\ \hat{L}$

2: Output: $\hat{\mathbf{H}}_{0}^{\mathcal{S}},\ \{\hat{\bm{\theta}}_{u},\hat{\mathbf{r}}_{u},\hat{\mathbf{z}}_{u}\}_{u=1}^{U}$

4: // First Stage - Angle and distance estimation

5: Calculate polar-domain dictionary $\tilde{\mathbf{A}}(\tilde{\bm{\theta}},\tilde{\mathbf{r}})$ in (IV-A)

6: Estimate $\check{\bm{\theta}},\check{\mathbf{r}}$ by solving (IV-A) via SOMP with MIL [22]

7: // Second Stage - UE-path pairing

8: Calculate polar-domain dictionary $\check{\mathbf{A}}(\check{\bm{\theta}},\check{\mathbf{r}})$ in (10)

9: Estimate $\{\hat{\bm{\theta}}_{u},\hat{\mathbf{r}}_{u},\hat{\mathbf{z}}_{u}\}_{u=1}^{U}$ by solving (IV-B) via 2D-OMP [19]

10: Estimate channel matrix $\hat{\mathbf{H}}_{0}^{\mathcal{S}}$ in (14)
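As an illustration of the last step of Algorithm 1, the following minimal sketch assembles one column of $\hat{\mathbf{H}}_{0}^{\mathcal{S}}$ from estimated angles, distances, and path gains as in (14). The array response used here is an assumption: a spherical-wave model for a uniform linear array, which may differ from the definition of $\mathbf{a}(\theta,r)$ given earlier in the paper. The function names and the parameters `spacing` and `wavelength` are hypothetical.

```python
import numpy as np

def near_field_steering(theta, r, n_ant, spacing, wavelength):
    """Spherical-wave ULA response (assumed model, not taken from the text)."""
    # Antenna positions centred on the array midpoint.
    delta = (np.arange(n_ant) - (n_ant - 1) / 2) * spacing
    # Exact distance from a source at (theta, r) to every antenna.
    dist = np.sqrt(r**2 + delta**2 - 2 * r * delta * np.sin(theta))
    return np.exp(-2j * np.pi * (dist - r) / wavelength) / np.sqrt(n_ant)

def initial_channel(thetas, ranges, gains, n_ant, spacing, wavelength):
    """h_u = A(theta_u, r_u) z_u for one UE, following the form of (14)."""
    A = np.stack(
        [near_field_steering(t, r, n_ant, spacing, wavelength)
         for t, r in zip(thetas, ranges)],
        axis=1,
    )  # shape (N, L_u): one column per estimated path
    return A @ np.asarray(gains)
```

Stacking the vectors returned by `initial_channel` for $u=1,\ldots,U$ as columns would give the $N\times U$ matrix in (14).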

V Proposed joint channel and data estimation

Given the initial estimates obtained from Algorithm 1, we aim to improve both the channel estimation performance as well as the data estimation accuracy by jointly processing the channel estimation and data detection while considering near-field properties. This section elaborates on the proposed JCDE algorithm with the initial channel estimate.

V-A Pre-processing for Channel and Data Estimation

V-A1 Pre-processing for Channel Estimation

To exploit the channel sparsity, the received signal and channel matrix in the spatial-domain are transformed into the beam-domain as $\mathbf{Y}\triangleq\mathbf{D}_{N}\mathbf{Y}^{\mathcal{S}}\in\mathbb{C}^{N\times K}$ and $\mathbf{H}\triangleq\mathbf{D}_{N}\mathbf{H}^{\mathcal{S}}\in\mathbb{C}^{N\times U}$, where $\mathbf{D}_{N}\in\mathbb{C}^{N\times N}$ is the DFT matrix. As described in Section II-A, the near-field channel has a cluster-sparse structure due to energy leakage. To handle this structure, the channel matrix $\mathbf{H}$ is first considered as the aggregation of the model-based estimate $\hat{\mathbf{S}}\in\mathbb{C}^{N\times U}$ and the residual channel estimation error $\mathbf{E}\triangleq\mathbf{H}-\hat{\mathbf{S}}\in\mathbb{C}^{N\times U}$, resulting in

\displaystyle\mathbf{H}=\hat{\mathbf{S}}+\mathbf{E}.

(15)

An initial value for the model-based estimate $\hat{\mathbf{S}}$ is determined from the proposed initial channel estimate $\hat{\mathbf{H}}_{0}^{\mathcal{S}}$ in (14) as $\hat{\mathbf{S}}=\mathbf{D}_{N}\hat{\mathbf{H}}_{0}^{\mathcal{S}}$, and it is adaptively updated based on the near-field model structure as described in Section V-F. Since the residual error $\mathbf{E}$ is obtained by subtracting the current estimate $\hat{\mathbf{S}}$ from the beam-domain channel $\mathbf{H}$ as in (15), it admits a sparser representation than $\mathbf{H}$ itself, because the dominant path components are removed from $\mathbf{H}$ by $\hat{\mathbf{S}}$. This facilitates sparse matrix reconstruction when $\mathbf{E}$ (instead of $\mathbf{H}$) is treated as the variable to be estimated within a Bayesian inference framework.
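The pre-processing above can be sketched numerically as follows. This is only an illustration under stated assumptions: the DFT matrix is taken to be unitary (the normalization is not specified in this section), and random matrices stand in for the true channel and for the initial estimate, which would in practice come from Algorithm 1.

```python
import numpy as np

rng = np.random.default_rng(0)
N, U = 16, 3

# Unitary DFT matrix D_N (the 1/sqrt(N) normalization is an assumption).
D = np.fft.fft(np.eye(N), axis=0) / np.sqrt(N)

def crandn(*shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H_spatial = crandn(N, U)                      # stand-in for the true channel H^S
H0_spatial = H_spatial + 0.1 * crandn(N, U)   # stand-in for the initial estimate

H = D @ H_spatial       # beam-domain channel
S_hat = D @ H0_spatial  # initial model-based estimate
E = H - S_hat           # residual error, so that H = S_hat + E as in (15)

# Fraction of the channel energy that is left in the residual.
energy_ratio = np.linalg.norm(E) ** 2 / np.linalg.norm(H) ** 2
```

With a reasonably accurate initial estimate, `energy_ratio` is small, which reflects that most of the dominant path energy has been absorbed by $\hat{\mathbf{S}}$.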

V-A2 Pre-processing for Data Estimation

For low-complexity data estimation, conventional methods based on the far-field assumption, such as [13, 12, 14], utilize MRC-based detectors. These are effective in the far-field region, where the beam-domain channel exhibits a simple sparse structure with a single sharp peak and no correlation between the beam indices. However, these detectors are ineffective in the near-field scenario because the near-field channel has cluster sparsity due to energy leakage, and the leaked energy is correlated in the beam-domain. Although LMMSE-based detection methods such as [24, 25] are effective in dealing with this correlation, they require the inversion of a matrix of size $N$, which is computationally expensive, especially in XL-MIMO systems. To balance the computational complexity and detection performance, the array is virtually divided into multiple sub-arrays, and a sub-array-wise LMMSE-based detector is designed similarly to [26]. In contrast to [26], which assumes perfect channel state information (CSI), the proposed method considers the channel estimation error while jointly estimating data and channel, exploiting the near-field model structures.

Accordingly, the extra-large array with $N$ antennas is partitioned into $C$ sub-arrays, and the sub-array $c\in\mathcal{C}\triangleq\{1,2,\ldots,C\}$ has $N_{c}$ antennas satisfying $N=\sum_{c=1}^{C}N_{c}$. The received signals, residual channel errors, and model-based estimates can be partitioned correspondingly as $\mathbf{Y}=\begin{bmatrix}\mathbf{Y}_{1}^{\mathrm{T}},\mathbf{Y}_{2}^{\mathrm{T}},\dots,\mathbf{Y}_{C}^{\mathrm{T}}\end{bmatrix}^{\mathrm{T}}$, $\mathbf{E}=\begin{bmatrix}\mathbf{E}_{1}^{\mathrm{T}},\mathbf{E}_{2}^{\mathrm{T}},\dots,\mathbf{E}_{C}^{\mathrm{T}}\end{bmatrix}^{\mathrm{T}}$, and $\hat{\mathbf{S}}=\begin{bmatrix}\hat{\mathbf{S}}_{1}^{\mathrm{T}},\hat{\mathbf{S}}_{2}^{\mathrm{T}},\dots,\hat{\mathbf{S}}_{C}^{\mathrm{T}}\end{bmatrix}^{\mathrm{T}}$, with $\mathbf{Y}_{c}\in\mathbb{C}^{N_{c}\times K}$, $\mathbf{E}_{c}\in\mathbb{C}^{N_{c}\times U}$, and $\hat{\mathbf{S}}_{c}\in\mathbb{C}^{N_{c}\times U}$. The received signals $\mathbf{Y}$ and $\mathbf{Y}_{c}$ can then be rewritten as

\displaystyle\mathbf{Y}=\mathbf{E}\mathbf{X}+\hat{\mathbf{S}}\mathbf{X}+% \mathbf{N},\text{ with }\mathbf{Y}_{c}=\mathbf{E}_{c}\mathbf{X}+\hat{\mathbf{S% }}_{c}\mathbf{X}+\mathbf{N}_{c}.

(16)

For convenience, let us define $\mathcal{N}\triangleq\{1,2,\ldots,N\}$ as the antenna index set, and $\mathcal{N}_{c}\triangleq\left\{n_{c(1)},n_{c(2)},\ldots,n_{c(N_{c})}\right\}\subset\mathcal{N}$ as the antenna index set at the $c$-th sub-array such that $\mathcal{N}_{1}\cup\mathcal{N}_{2}\cup\cdots\cup\mathcal{N}_{C}=\mathcal{N}$ and $\mathcal{N}_{i}\cap\mathcal{N}_{j}=\emptyset,\ i\neq j\in\mathcal{C}$.
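A minimal sketch of this partitioning is given below. It assumes contiguous sub-arrays of nearly equal size and zero-based indices, neither of which is mandated by the text. The helper names are hypothetical.

```python
import numpy as np

def partition_antennas(n_ant, n_sub):
    """Split indices 0..n_ant-1 into n_sub contiguous, disjoint index sets N_c."""
    return np.array_split(np.arange(n_ant), n_sub)

def split_rows(M, index_sets):
    """Return the row blocks M_c of an (N x ...) matrix, one per sub-array."""
    return [M[idx, :] for idx in index_sets]
```

For example, `split_rows(Y, partition_antennas(N, C))` yields the blocks $\mathbf{Y}_{1},\ldots,\mathbf{Y}_{C}$, and the same call applied to $\mathbf{E}$ and $\hat{\mathbf{S}}$ yields $\mathbf{E}_{c}$ and $\hat{\mathbf{S}}_{c}$.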

V-B Bayesian Inference Formulation

Based on the linear observation in (16) with the deterministic variable $\hat{\mathbf{S}}$ and random variables $\mathbf{X}$ and $\mathbf{E}$ , the likelihood function for $\mathbf{X}$ and $\mathbf{E}$ can be expressed as

\displaystyle p(\mathbf{Y}|\mathbf{E},\mathbf{X})=\prod_{n\in\mathcal{N}}\prod_{k\in\mathcal{K}}p(y_{n,k}|\bar{\mathbf{e}}_{n},\bar{\mathbf{x}}_{k}),

(17)

where $p(y_{n,k}|\bar{\mathbf{e}}_{n},\bar{\mathbf{x}}_{k})=\mathcal{CN}((\bar{% \mathbf{e}}_{n}+\bar{\mathbf{s}}_{n})^{\mathrm{T}}\bar{\mathbf{x}}_{k},\ % \sigma^{2})$ with $\bar{\mathbf{e}}_{n}=[e_{n,1},\ldots,e_{n,U}]^{\mathrm{T}}\in\mathbb{C}^{U% \times 1}$ , $\bar{\mathbf{s}}_{n}=[\hat{s}_{n,1},\ldots,\hat{s}_{n,U}]^{\mathrm{T}}\in% \mathbb{C}^{U\times 1}$ , and $\bar{\mathbf{x}}_{k}=[x_{1,k},\ldots,x_{U,k}]^{\mathrm{T}}\in\mathbb{C}^{U% \times 1}$ .
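Because the entries of $\mathbf{Y}$ are conditionally independent complex Gaussians, the logarithm of (17) reduces to a scaled squared residual. The following sketch evaluates it directly. The function name is hypothetical, and the matrices are assumed to be stored with the dimensions used in (16).

```python
import numpy as np

def log_likelihood(Y, E, S_hat, X, sigma2):
    """ln p(Y | E, X) for (17): independent CN((e_n + s_n)^T x_k, sigma2) entries."""
    residual = Y - (E + S_hat) @ X   # y_{n,k} minus its conditional mean
    n_obs = Y.size                   # N * K scalar observations
    return -n_obs * np.log(np.pi * sigma2) - np.sum(np.abs(residual) ** 2) / sigma2
```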

Since each entry of $\mathbf{X}_{\mathrm{d}}$ is randomly selected from the QAM constellation point set $\mathcal{X}$ , the prior $p(\mathbf{X})$ can be written as

\displaystyle p(\mathbf{X})=\prod_{u\in\mathcal{U}}\prod_{k\in\mathcal{K}}p(x_% {u,k}),

(18)

with $p(x_{u,k_{d}})=\frac{1}{Q}\sum_{\mathcal{X}_{i}\in\mathcal{X}}\delta(x_{u,k_{d% }}-\mathcal{X}_{i}),\ \forall k_{d}\in\mathcal{K}_{d}$ and $p(x_{u,k_{p}})=\delta(x_{u,k_{p}}-[\mathbf{X}_{\mathrm{p}}]_{u,k_{p}}),\ % \forall k_{p}\in\mathcal{K}_{p}$ .

Although many conventional methods such as [13, 14] adopt an i.i.d. sparse prior for the beam-domain channel, $p(\mathbf{H})=\prod_{n\in\mathcal{N}}\prod_{u\in\mathcal{U}}p(h_{n,u})$ (e.g., a BG prior), this modeling causes a model mismatch due to energy leakage effects in the near-field region. Therefore, we design the sparse prior for the residual channel error $\mathbf{E}$ instead of $\mathbf{H}$ as

\displaystyle p(\mathbf{E};\mathbf{\Theta})=\prod_{n\in\mathcal{N}}\prod_{u\in% \mathcal{U}}p(e_{n,u};\mathbf{\Theta}),

(19)

where $p(e_{n,u};\mathbf{\Theta})=\mathcal{CN}(0,\ \sigma^{e}_{n,u})$ is a Gaussian prior distribution with zero mean and variance $\sigma^{e}_{n,u}$, which is widely used for sparse representation in the sparse Bayesian learning (SBL) algorithm [27]. Here, $\mathbf{\Theta}\triangleq\{\sigma^{e}_{n,u}\}_{n\in\mathcal{N},u\in\mathcal{U}}$ is the hyperparameter set to be optimized through the expectation maximization (EM) algorithm [28], as described in Section V-E.

From the likelihood in (17) and priors in (18), (19), the posterior can be written as

\displaystyle p(\mathbf{E},\mathbf{X}|\mathbf{Y};\mathbf{\Theta})=p(\mathbf{Y}% |\mathbf{E},\mathbf{X})p(\mathbf{X})p(\mathbf{E};\mathbf{\Theta})/p(\mathbf{Y}% ;\mathbf{\Theta}),

(20)

where $p(\mathbf{Y};\mathbf{\Theta})=\int_{\mathbf{E},\mathbf{X}}p(\mathbf{Y},\mathbf% {E},\mathbf{X};\mathbf{\Theta})$ is the marginal likelihood referred to as the evidence for parameter $\mathbf{\Theta}$ . Our objective is to estimate $\mathbf{E}$ , $\mathbf{X}$ , and $\mathbf{\Theta}$ through the posterior and the evidence.

The estimator for $\mathbf{\Theta}$ by the type-II maximum likelihood method [29] is given as

\displaystyle\hat{\mathbf{\Theta}}=\underset{\mathbf{\Theta}}{\mathrm{argmax}}% \ p(\mathbf{Y};\mathbf{\Theta}).

(21)

However, the calculation of the evidence $p(\mathbf{Y};\mathbf{\Theta})$ is intractable due to the multidimensional integral over $\mathbf{X}$ and $\mathbf{E}$. Hence, we utilize the EM algorithm, which maximizes the evidence lower bound (ELBO) in each iteration, instead of directly maximizing the evidence [28]. Given $\mathbf{\Theta}^{(t)}$ at the $t$-th iteration, $\mathbf{\Theta}^{(t+1)}$ at the $(t+1)$-th iteration is obtained through the following E-step and M-step:

		$\displaystyle\text{E-step : }\mathcal{F}(\mathbf{\Theta},\mathbf{\Theta}^{(t)})=\mathbb{E}_{p(\mathbf{E},\mathbf{X}|\mathbf{Y};\mathbf{\Theta}^{(t)})}\left[\ln p(\mathbf{Y},\mathbf{E},\mathbf{X};\mathbf{\Theta})\right]+\mathsf{c}_{0}^{(t)},$		(22)
		$\displaystyle\text{M-step : }\mathbf{\Theta}^{(t+1)}=\underset{\mathbf{\Theta}}{\mathrm{argmax}}\ \mathcal{F}(\mathbf{\Theta},\mathbf{\Theta}^{(t)}),$		(23)

where $\mathcal{F}(\mathbf{\Theta},\mathbf{\Theta}^{(t)})$ is the ELBO with the constant value $\mathsf{c}_{0}^{(t)}=\mathbb{E}_{p(\mathbf{E},\mathbf{X}|\mathbf{Y};\mathbf{% \Theta}^{(t)})}\left[\ln p(\mathbf{E},\mathbf{X}|\mathbf{Y};\mathbf{\Theta}^{(% t)})\right]$ .

Since the E-step requires the calculation of a multidimensional integral that is computationally prohibitive, we approximate the posterior by $g^{(t)}(\mathbf{E},\mathbf{X}|\mathbf{Y})\simeq p(\mathbf{E},\mathbf{X}|\mathbf{Y};\mathbf{\Theta}^{(t)})$ using the EP algorithm. After the E-step, the maximization problem in (23) with the approximate posterior $g^{(t)}(\mathbf{E},\mathbf{X}|\mathbf{Y})$ is solved, as described in detail in Section V-E. This procedure is repeated until the maximum number of iterations $T$ is reached. Finally, the last updated parameters at $t=T$ are used as the final estimates, i.e., $\hat{\mathbf{\Theta}}\triangleq\mathbf{\Theta}^{(T)}$, $\hat{\mathbf{E}}\triangleq\mathbb{E}_{g^{(T)}(\mathbf{E},\mathbf{X}|\mathbf{Y})}[\mathbf{E}]$, and $\hat{\mathbf{X}}\triangleq\mathbb{E}_{g^{(T)}(\mathbf{E},\mathbf{X}|\mathbf{Y})}[\mathbf{X}]$.

In what follows, let us drop the iteration index $t$ for notation simplicity. The approximate posterior $g(\mathbf{E},\mathbf{X}|\mathbf{Y})$ is derived by minimizing the KL divergence subject to a Gaussian distribution set $\mathbf{\Phi}$ as

\displaystyle\underset{g\in\mathbf{\Phi}}{\mathrm{minimize}}\ \mathrm{KL}\left% (p(\mathbf{E},\mathbf{X}|\mathbf{Y};\mathbf{\Theta})\|g(\mathbf{E},\mathbf{X}|% \mathbf{Y})\right),

(24)

where the approximate posterior $g(\mathbf{E},\mathbf{X}|\mathbf{Y})$ is designed as

\displaystyle g(\mathbf{E},\mathbf{X}|\mathbf{Y})

\displaystyle=Z_{g}^{-1}Q^{x}(\mathbf{X})Q^{e}(\mathbf{E})B^{x}(\mathbf{X})B^{% e}(\mathbf{E}),

(25)

where $Z_{g}=\int_{\mathbf{E},\mathbf{X}}Q^{x}(\mathbf{X})Q^{e}(\mathbf{E})B^{x}(% \mathbf{X})B^{e}(\mathbf{E})$ is a normalizing constant, and $Q^{x}(\mathbf{X})$ , $Q^{e}(\mathbf{E})$ , $B^{x}(\mathbf{X})$ , and $B^{e}(\mathbf{E})$ are the approximate factors such that $Q^{x}(\mathbf{X})Q^{e}(\mathbf{E})\simeq p(\mathbf{Y}|\mathbf{E},\mathbf{X})$ , $B^{x}(\mathbf{X})\simeq p(\mathbf{X})$ , and $B^{e}(\mathbf{E})\simeq p(\mathbf{E};\bm{\Theta})$ subject to Gaussian distribution set $\mathbf{\Phi}$ .

These approximate factors are designed as $Q^{x}(\mathbf{X})=\prod_{c,u,k}q^{x}_{c,u,k}(x_{u,k})$, $Q^{e}(\mathbf{E})=\prod_{c}\prod_{n_{c},u,k}q^{e}_{n_{c},u,k}(e_{n_{c},u})$, $B^{x}(\mathbf{X})=\prod_{u,k}b^{x}_{u,k}(x_{u,k})$, and $B^{e}(\mathbf{E})=\prod_{c}\prod_{n_{c},u}b^{e}_{n_{c},u}(e_{n_{c},u})$, where $q^{x}_{c,u,k}(\cdot)$, $q^{e}_{n_{c},u,k}(\cdot)$, $b^{x}_{u,k}(\cdot)$, and $b^{e}_{n_{c},u}(\cdot)$ are the parameterized approximate functions defined as


$\displaystyle q^{x}_{c,u,k}(x_{u,k})$	$\displaystyle\triangleq\exp\left(-|x_{u,k}-\hat{x}^{q}_{c,u,k}|^{2}/\xi^{q,x}_{c,u,k}\right),$	(26a)
$\displaystyle q^{e}_{n_{c},u,k}(e_{n_{c},u})$	$\displaystyle\triangleq\exp\left(-|e_{n_{c},u}-\hat{e}^{q}_{n_{c},u,k}|^{2}/\xi^{q,e}_{n_{c},u,k}\right),$	(26b)
$\displaystyle b^{x}_{u,k}(x_{u,k})$	$\displaystyle\triangleq\exp\left(-|x_{u,k}-\hat{x}^{b}_{u,k}|^{2}/\xi^{b,x}_{u,k}\right),$	(26c)
$\displaystyle b^{e}_{n_{c},u}(e_{n_{c},u})$	$\displaystyle\triangleq\exp\left(-|e_{n_{c},u}-\hat{e}^{b}_{n_{c},u}|^{2}/\xi^{b,e}_{n_{c},u}\right),$	(26d)

where $\bm{\pi}^{q,x}_{c,u,k}\!\triangleq\!\begin{bmatrix}\hat{x}^{q}_{c,u,k},\xi^{q,% x}_{c,u,k}\end{bmatrix}^{\mathrm{T}}$ , $\bm{\pi}^{q,e}_{n_{c},u,k}\!\triangleq\!\begin{bmatrix}\hat{e}^{q}_{n_{c},u,k}% ,\xi^{q,e}_{n_{c},u,k}\end{bmatrix}^{\mathrm{T}}$ , $\bm{\pi}^{b,x}_{u,k}\!\triangleq\!\begin{bmatrix}\hat{x}^{b}_{u,k},\xi^{b,x}_{% u,k}\end{bmatrix}^{\mathrm{T}}$ , and $\bm{\pi}^{b,e}_{n_{c},u}\!\triangleq\!\begin{bmatrix}\hat{e}^{b}_{n_{c},u},\xi% ^{b,e}_{n_{c},u}\end{bmatrix}^{\mathrm{T}}$ are unknown parameters to be optimized by minimizing the KL divergence.

Since the approximate posterior $g(\mathbf{E},\mathbf{X}|\mathbf{Y})$ in (25) is designed subject to Gaussian distribution set $\mathbf{\Phi}$ , the marginalized approximate posterior $g(x_{u,k}|\mathbf{Y})$ and $g(e_{n_{c},u}|\mathbf{Y})$ can be expressed as $g(x_{u,k}|\mathbf{Y})=\mathcal{CN}(\hat{x}_{u,k},\xi^{x}_{u,k})$ and $g(e_{n_{c},u}|\mathbf{Y})=\mathcal{CN}(\hat{e}_{n_{c},u},\xi^{e}_{n_{c},u})$ , where $\hat{x}_{u,k}$ and $\hat{e}_{n_{c},u}$ are the posterior means, and $\xi^{x}_{u,k}$ and $\xi^{e}_{n_{c},u}$ are the posterior variances.

Let $\mathbf{\Pi}\triangleq\{\bm{\pi}^{q,x}_{c,u,k},\ \bm{\pi}^{b,x}_{u,k},\ \bm{\pi}^{q,e}_{n_{c},u,k},\ \bm{\pi}^{b,e}_{n_{c},u}\}_{c\in\mathcal{C},n_{c}\in\mathcal{N}_{c},u\in\mathcal{U},k\in\mathcal{K}}$ denote the unknown parameter set to be optimized. The optimal parameter set $\hat{\mathbf{\Pi}}$ is obtained by minimizing the KL divergence in (24). However, the objective function cannot be expressed in closed form because $\mathrm{KL}\left(p(\mathbf{E},\mathbf{X}|\mathbf{Y};\mathbf{\Theta})\|g(\mathbf{E},\mathbf{X}|\mathbf{Y})\right)$ includes intractable integrals with respect to the true posterior $p(\mathbf{E},\mathbf{X}|\mathbf{Y};\mathbf{\Theta})$. To tackle this, we substitute a target distribution $\hat{p}(\mathbf{E},\mathbf{X}|\mathbf{Y})$ for the true posterior $p(\mathbf{E},\mathbf{X}|\mathbf{Y};\mathbf{\Theta})$ in the KL divergence in (24). The target distribution is designed by replacing a part of the true posterior with the approximate functions in (26a)-(26d), as described in the following sections. For notational convenience in the design of the target distribution, the approximate distributions $l^{x}_{c,k}(\mathbf{E}_{c},\bar{\mathbf{x}}_{k})\simeq p(\mathbf{y}_{c,k}|\mathbf{E}_{c},\bar{\mathbf{x}}_{k})$ and $l^{e}_{n_{c},k}(\bar{\mathbf{e}}_{n_{c}},\bar{\mathbf{x}}_{k})\simeq p(y_{n_{c},k}|\bar{\mathbf{e}}_{n_{c}},\bar{\mathbf{x}}_{k})$ are expressed using (26a)-(26d) as


		$\displaystyle l^{x}_{c,k}(\mathbf{E}_{c},\bar{\mathbf{x}}_{k})\!\propto\!\prod% _{u\in\mathcal{U}}q^{x}_{c,u,k}(x_{u,k})\prod_{u\in\mathcal{U}}\prod_{n_{c}\in% \mathcal{N}_{c}}\!q^{e}_{n_{c},u,k}(e_{n_{c},u}),\!\!\!$		(27a)
		$\displaystyle l^{e}_{n_{c},k}(\bar{\mathbf{e}}_{n_{c}},\bar{\mathbf{x}}_{k})\!% \propto\!\prod_{u\in\mathcal{U}}\!\!\left\{q^{x}_{c,u,k}(x_{u,k})\right\}^{% \frac{1}{N_{c}}}\!\!\prod_{u\in\mathcal{U}}q^{e}_{n_{c},u,k}(e_{n_{c},u}).\!\!\!$		(27b)

To solve the KL minimization problem, the alternating optimization algorithm [30] is utilized, where a target parameter is optimized while the other parameters are fixed. In what follows, the estimation methods for $\{\bm{\pi}^{q,x}_{c,u,k},\bm{\pi}^{b,x}_{u,k}\}$ and $\{\bm{\pi}^{q,e}_{n_{c},u,k},\bm{\pi}^{b,e}_{n_{c},u}\}$ are described in Sections V-C and V-D, respectively.

V-C EP for Data Estimation

V-C1 Update $\bm{\pi}^{q,x}_{c,u,k}$

While the parameter $\bm{\pi}^{q,x}_{c,u,k}$ in $q^{x}_{c,u,k}(x_{u,k})$ is updated, the other parameters $\mathbf{\Pi}\setminus\{\bm{\pi}^{q,x}_{c,u,k}\}$ are fixed at their tentative estimates; that is, the KL minimization problem for $\bm{\pi}^{q,x}_{c,u,k}$ is formulated as

\displaystyle\underset{\bm{\pi}^{q,x}_{c,u,k}}{\mathrm{minimize}}\ \mathrm{KL}% \left(\hat{p}_{c,k}^{q,x}(\mathbf{E},\mathbf{X}|\mathbf{Y})\|g(\mathbf{E},% \mathbf{X}|\mathbf{Y})\right),

(28)

where $\hat{p}_{c,k}^{q,x}(\mathbf{E},\mathbf{X}|\mathbf{Y})$ is the target distribution for $\bm{\pi}^{q,x}_{c,u,k}$ , which is designed using $l^{x}_{c,k}(\mathbf{E}_{c},\bar{\mathbf{x}}_{k})$ in (27a) as

	$\displaystyle\hat{p}_{c,k}^{q,x}(\mathbf{E},\mathbf{X}|\mathbf{Y})=C_{c,k}^{q,x}\ p(\mathbf{y}_{c,k}|\mathbf{E}_{c},\bar{\mathbf{x}}_{k})$
		$\displaystyle\prod_{(c^{\prime},k^{\prime})\in\mathcal{C}\times\mathcal{K}\setminus(c,k)}\underbrace{l^{x}_{c^{\prime},k^{\prime}}(\mathbf{E}_{c^{\prime}},\bar{\mathbf{x}}_{k^{\prime}})}_{\simeq p(\mathbf{y}_{c^{\prime},k^{\prime}}|\mathbf{E}_{c^{\prime}},\bar{\mathbf{x}}_{k^{\prime}})}\underbrace{B^{x}(\mathbf{X})B^{e}(\mathbf{E})}_{\simeq p(\mathbf{X})p(\mathbf{E};\mathbf{\Theta})},$		(29)

where $C_{c,k}^{q,x}$ is a normalizing constant.

Let $\mathcal{L}^{q,x}_{c,u,k}(\bm{\pi}^{q,x}_{c,u,k})\triangleq\mathrm{KL}\left(\hat{p}_{c,k}^{q,x}(\mathbf{E},\mathbf{X}|\mathbf{Y})\|g(\mathbf{E},\mathbf{X}|\mathbf{Y})\right)$ denote the objective function in (28), which can be written as

\displaystyle\mathcal{L}^{q,x}_{c,u,k}=\!\ln Z_{g}\!-\!\mathbb{E}_{\hat{p}^{q,% x}_{c,k}(x_{u,k}|\mathbf{Y})}\!\left[\ln q^{x}_{c,u,k}(x_{u,k})\right]+\mathrm% {const}.

(30)

Since the objective function $\mathcal{L}^{q,x}_{c,u,k}(\bm{\pi}_{c,u,k}^{q,x})$ is convex with respect to $\bm{\pi}_{c,u,k}^{q,x}$, the necessary and sufficient condition for the global optimum, i.e., $\partial\mathcal{L}^{q,x}_{c,u,k}/\partial\bm{\pi}_{c,u,k}^{q,x}=\mathbf{0}$, is equivalent to

\displaystyle g(x_{u,k}|\mathbf{Y})=\mathrm{proj}_{\mathbf{\Phi}}\left[\hat{p}% ^{q,x}_{c,k}(x_{u,k}|\mathbf{Y})\right],

(31)

where $\mathrm{proj}_{\mathbf{\Phi}}\left[p(x)\right]\triangleq\mathcal{CN}\left(\mathbb{E}_{p(x)}[x],\ \mathbb{V}_{p(x)}[x]\right)$ is the projection operator onto the Gaussian distribution set $\mathbf{\Phi}$, which amounts to moment matching, i.e., the mean and variance of the projected Gaussian match those of $p(x)$.

The marginalized approximate posterior $g(x_{u,k}|\mathbf{Y})=\int_{\mathbf{E},\mathbf{X}\setminus x_{u,k}}g(\mathbf{E% },\mathbf{X}|\mathbf{Y})$ in (31) is written as

\displaystyle g(x_{u,k}|\mathbf{Y})

\displaystyle\propto q^{x}_{c,u,k}(x_{u,k})v^{x}_{c,u,k}(x_{u,k}),

(32)

with $v^{x}_{c,u,k}(x_{u,k})\triangleq b_{u,k}^{x}(x_{u,k})\prod_{c^{\prime}\in% \mathcal{C}\setminus c}q^{x}_{c^{\prime},u,k}(x_{u,k})$ , which can also be represented as

\displaystyle v^{x}_{c,u,k}(x_{u,k})

\displaystyle=C^{v,x}_{c,u,k}\exp\left(-|x_{u,k}-\hat{x}^{v}_{c,u,k}|^{2}/\xi^% {v,x}_{c,u,k}\right),

(33)

with the normalizing constant $C^{v,x}_{c,u,k}$ .

The marginalized target distribution $\hat{p}^{q,x}_{c,k}(x_{u,k}|\mathbf{Y})=\int_{\mathbf{E},\mathbf{X}\setminus x% _{u,k}}\hat{p}^{q,x}_{c,k}(\mathbf{E},\mathbf{X}|\mathbf{Y})$ in (31) is written as

\displaystyle\hat{p}^{q,x}_{c,k}(x_{u,k}|\mathbf{Y})

\displaystyle=\bar{p}^{x}_{c,u,k}(\mathbf{y}_{c,k}|x_{u,k})v^{x}_{c,u,k}(x_{u,% k}),

(34)

where $\bar{p}^{x}_{c,u,k}(\mathbf{y}_{c,k}|x_{u,k})$ is the conditional probability distribution defined in (35), along with $v^{e}_{n_{c},u,k}(e_{n_{c},u})$ in (36) and $\tilde{v}^{e}_{c,u,k}(\mathbf{e}_{c,u})$ in (37), with $\bar{C}^{x}_{c,u,k}$ being the normalizing constant and

	$\displaystyle\hat{\mathbf{e}}^{v}_{c,u,k}$	$\displaystyle=\begin{bmatrix}\hat{e}^{v}_{n_{c(1)},u,k},\hat{e}^{v}_{n_{c(2)},% u,k},\ldots,\hat{e}^{v}_{n_{c(N_{c})},u,k}\end{bmatrix}^{\mathrm{T}}\in\mathbb% {C}^{N_{c}\times 1},$
	$\displaystyle\mathbf{\Xi}^{v,e}_{c,u,k}\!\!$	$\displaystyle=\!\mathrm{diag}\left(\!\xi^{v,e}_{n_{c(1)},u,k},\xi^{v,e}_{n_{c(% 2)},u,k},\ldots,\xi^{v,e}_{n_{c(N_{c})},u,k}\!\right)\in\mathbb{R}^{N_{c}% \times N_{c}}.$

$\displaystyle\bar{p}^{x}_{c,u,k}(\mathbf{y}_{c,k}|x_{u,k})$	$\displaystyle\triangleq\bar{C}^{x}_{c,u,k}\int_{\mathbf{E}_{c},\bar{\mathbf{x}}_{k}\setminus x_{u,k}}p(\mathbf{y}_{c,k}|\mathbf{E}_{c},\bar{\mathbf{x}}_{k})\prod_{u^{\prime}\in\mathcal{U}\setminus u}v^{x}_{c,u^{\prime},k}(x_{u^{\prime},k})\prod_{u^{\prime}\in\mathcal{U}}\prod_{n_{c}\in\mathcal{N}_{c}}v^{e}_{n_{c},u^{\prime},k}(e_{n_{c},u^{\prime}}),$	(35)
$\displaystyle v^{e}_{n_{c},u,k}(e_{n_{c},u})$	$\displaystyle\triangleq b_{n_{c},u}^{e}(e_{n_{c},u})\prod_{k^{\prime}\in\mathcal{K}\setminus k}q^{e}_{n_{c},u,k^{\prime}}(e_{n_{c},u})=C^{v,e}_{n_{c},u,k}\exp\left(-|e_{n_{c},u}-\hat{e}^{v}_{n_{c},u,k}|^{2}/\xi^{v,e}_{n_{c},u,k}\right),$	(36)
$\displaystyle\tilde{v}^{e}_{c,u,k}(\mathbf{e}_{c,u})$	$\displaystyle\triangleq\prod_{n_{c}\in\mathcal{N}_{c}}v^{e}_{n_{c},u,k}(e_{n_{c},u})\propto\exp\left\{-(\mathbf{e}_{c,u}-\hat{\mathbf{e}}^{v}_{c,u,k})^{\mathrm{H}}(\mathbf{\Xi}^{v,e}_{c,u,k})^{-1}(\mathbf{e}_{c,u}-\hat{\mathbf{e}}^{v}_{c,u,k})\right\}.$	(37)

From the conditional distribution $\bar{p}^{x}_{c,u,k}(\mathbf{y}_{c,k}|x_{u,k})$ in (35), the mean $\tilde{\mathbf{y}}^{x}_{c,u,k}\triangleq\mathbb{E}_{\bar{p}^{x}_{c,u,k}(% \mathbf{y}_{c,k}|x_{u,k})}\left[\mathbf{y}_{c,k}\right]$ and covariance $\mathbf{\Omega}^{x}_{c,k}\triangleq\mathbb{E}_{\bar{p}^{x}_{c,u,k}(\mathbf{y}_% {c,k}|x_{u,k})}\left[(\mathbf{y}_{c,k}-\tilde{\mathbf{y}}^{x}_{c,u,k})(\mathbf% {y}_{c,k}-\tilde{\mathbf{y}}^{x}_{c,u,k})^{\mathrm{H}}\right]$ , can be calculated as


$\displaystyle\tilde{\mathbf{y}}^{x}_{c,u,k}$	$\displaystyle=\mathbf{y}_{c,k}-\sum_{u^{\prime}\in\mathcal{U}\setminus u}\hat{\mathbf{h}}^{v}_{c,u^{\prime},k}\hat{x}^{v}_{c,u^{\prime},k},$	(38a)
$\displaystyle\mathbf{\Omega}^{x}_{c,k}$	$\displaystyle=\sum_{u^{\prime}\in\mathcal{U}}\Big{\{}\xi^{v,x}_{c,u^{\prime},k}\hat{\mathbf{h}}^{v}_{c,u^{\prime},k}\hat{\mathbf{h}}^{v\mathrm{H}}_{c,u^{\prime},k}$
	$\displaystyle\qquad+(\xi^{v,x}_{c,u^{\prime},k}+|\hat{x}^{v}_{c,u^{\prime},k}|^{2})\mathbf{\Xi}^{v,e}_{c,u^{\prime},k}\Big{\}}+\sigma^{2}\mathbf{I}_{N_{c}},$	(38b)

with $\hat{\mathbf{h}}^{v}_{c,u^{\prime},k}\triangleq\hat{\mathbf{e}}^{v}_{c,u^{% \prime},k}+\hat{\mathbf{s}}_{c,u^{\prime}}$ .

Substituting (32) and (34) into (31), the approximate function $q_{c,u,k}^{x}(x_{u,k})$ can be obtained as

\displaystyle q^{x}_{c,u,k}(x_{u,k})\propto\frac{\mathrm{proj}_{\mathbf{\Phi}}% \left[\bar{p}^{x}_{c,u,k}(\mathbf{y}_{c,k}|x_{u,k})v^{x}_{c,u,k}(x_{u,k})% \right]}{v^{x}_{c,u,k}(x_{u,k})}.

(39)

Under large-system conditions, by the central limit theorem (CLT), the conditional distribution can be approximated as $\bar{p}^{x}_{c,u,k}(\mathbf{y}_{c,k}|x_{u,k})\simeq\mathcal{CN}(\tilde{\mathbf{y}}^{x}_{c,u,k},\mathbf{\Omega}^{x}_{c,k})$. Thus, the approximate function can be expressed as $q^{x}_{c,u,k}(x_{u,k})\propto\bar{p}^{x}_{c,u,k}(\mathbf{y}_{c,k}|x_{u,k})$ with the mean and variance calculated as


$\displaystyle\hat{x}^{q}_{c,u,k}$	$\displaystyle=\hat{\mathbf{h}}^{v\mathrm{H}}_{c,u,k}(\mathbf{\Omega}_{c,k}^{x}% )^{-1}\tilde{\mathbf{y}}^{x}_{c,u,k}/\gamma^{x}_{c,u,k},$	(40a)
$\displaystyle\xi^{q,x}_{c,u,k}$	$\displaystyle=1/\gamma^{x}_{c,u,k}-\xi^{v,x}_{c,u,k},$	(40b)

with $\gamma^{x}_{c,u,k}=\hat{\mathbf{h}}^{v\mathrm{H}}_{c,u,k}(\mathbf{\Omega}^{x}_% {c,k})^{-1}\hat{\mathbf{h}}^{v}_{c,u,k}$ . The calculation of $\tilde{\mathbf{y}}^{x}_{c,u,k}$ in (38a) corresponds to a soft interference cancellation (Soft-IC) [12] using data replicas $\{\hat{x}^{v}_{c,u^{\prime},k}\}_{u^{\prime}\in\mathcal{U}\setminus u}$ and channel replicas $\{\hat{\mathbf{h}}^{v}_{c,u^{\prime},k}\}_{u^{\prime}\in\mathcal{U}}$ .

Unlike the conventional MRC-based detectors [13, 14, 12], the LMMSE-based detector for each sub-array $c$ in (40a) can deal with the correlation between the leaked energy in the beam-domain owing to the whitening operation by $(\mathbf{\Omega}_{c,k}^{x})^{-1}$.
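To make the sub-array-wise update concrete, the following sketch evaluates (38a), (38b), (40a), and (40b) for one sub-array $c$ and one symbol index $k$, for all UEs at once. It is a minimal illustration rather than the authors' implementation: the function name and the array layout are assumptions, and the covariance uses the replica variance $\xi^{v,x}_{c,u^{\prime},k}$ in both terms of (38b).

```python
import numpy as np

def subarray_lmmse(y, H_v, x_v, xi_x, Xi_e, sigma2):
    """One pass of (38a)-(40b) for a fixed sub-array c and symbol index k.

    y    : (Nc,)   received beam-domain vector y_{c,k}
    H_v  : (Nc, U) channel replicas h^v_{c,u,k} = e^v_{c,u,k} + s_{c,u}
    x_v  : (U,)    data replicas x^v_{c,u,k}
    xi_x : (U,)    data replica variances xi^{v,x}_{c,u,k}
    Xi_e : (Nc, U) diagonals of Xi^{v,e}_{c,u,k}
    """
    n_c = H_v.shape[0]
    # (38b): covariance shared by all UEs of this sub-array.
    omega = (H_v * xi_x) @ H_v.conj().T
    omega = omega + np.diag(Xi_e @ (xi_x + np.abs(x_v) ** 2))
    omega = omega + sigma2 * np.eye(n_c)
    # (38a): soft interference cancellation; column u keeps UE u's own term.
    y_tilde = y[:, None] - (H_v @ x_v)[:, None] + H_v * x_v
    W = np.linalg.solve(omega, H_v)                    # Omega^{-1} h_u, column-wise
    gamma = np.real(np.sum(H_v.conj() * W, axis=0))    # gamma^x_{c,u,k}
    x_q = np.sum(W.conj() * y_tilde, axis=0) / gamma   # (40a)
    xi_q = 1.0 / gamma - xi_x                          # (40b)
    return x_q, xi_q
```

Only one $N_{c}\times N_{c}$ system is solved per pair $(c,k)$. The subtraction of $\xi^{v,x}_{c,u,k}$ in (40b) compensates for the fact that $\mathbf{\Omega}^{x}_{c,k}$ also contains the contribution of UE $u$ itself, so the result coincides with an LMMSE estimate built from the covariance without that rank-one term.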

V-C2 Update $\bm{\pi}^{b,x}_{u,k}$

The KL minimization problem for $\bm{\pi}^{b,x}_{u,k}$ is formulated as

\displaystyle\underset{\bm{\pi}^{b,x}_{u,k}}{\mathrm{minimize}}\ \mathrm{KL}% \left(\hat{p}_{u,k}^{b,x}(\mathbf{E},\mathbf{X}|\mathbf{Y})\|g(\mathbf{E},% \mathbf{X}|\mathbf{Y})\right),

(41)

where $\hat{p}_{u,k}^{b,x}(\mathbf{E},\mathbf{X}|\mathbf{Y})$ is the target distribution defined as

	$\displaystyle\hat{p}_{u,k}^{b,x}(\mathbf{E},\mathbf{X}|\mathbf{Y})=\ C_{u,k}^{b,x}\ p(x_{u,k})$
	$\displaystyle\prod_{(u^{\prime},k^{\prime})\in\mathcal{U}\times\mathcal{K}\setminus(u,k)}\underbrace{b_{u^{\prime},k^{\prime}}^{x}(x_{u^{\prime},k^{\prime}})}_{\simeq p(x_{u^{\prime},k^{\prime}})}\underbrace{Q^{x}(\mathbf{X})Q^{e}(\mathbf{E})}_{\simeq p(\mathbf{Y}|\mathbf{E},\mathbf{X})}\underbrace{B^{e}(\mathbf{E})}_{\simeq p(\mathbf{E};\mathbf{\Theta})},$		(42)

where $C_{u,k}^{b,x}$ is a normalizing constant.

Similar to the derivation of (31), the optimal condition for $\bm{\pi}^{b,x}_{u,k}$ is derived as

\displaystyle g(x_{u,k}|\mathbf{Y})=\mathrm{proj}_{\mathbf{\Phi}}\left[\hat{p}% ^{b,x}_{u,k}(x_{u,k}|\mathbf{Y})\right],

(43)

where $\hat{p}^{b,x}_{u,k}(x_{u,k}|\mathbf{Y})=\int_{\mathbf{E},\mathbf{X}\setminus x% _{u,k}}\hat{p}^{b,x}_{u,k}(\mathbf{E},\mathbf{X}|\mathbf{Y})$ is the marginalized target distribution calculated as

\displaystyle\hat{p}^{b,x}_{u,k}(x_{u,k}|\mathbf{Y})

\displaystyle\propto\ p(x_{u,k})\prod_{c^{\prime}\in\mathcal{C}}q^{x}_{c^{% \prime},u,k}(x_{u,k}),

(44)

with the approximate function multiplied over the sub-array direction, $\prod_{c^{\prime}\in\mathcal{C}}q^{x}_{c^{\prime},u,k}(x_{u,k})$ , calculated as

\displaystyle\prod_{c^{\prime}\in\mathcal{C}}q^{x}_{c^{\prime},u,k}(x_{u,k})% \propto\exp\left(-|x_{u,k}-\hat{x}^{q}_{u,k}|^{2}/\xi^{q,x}_{u,k}\right),

(45)

with

\displaystyle\hat{x}_{u,k}^{q}=\xi_{u,k}^{q,x}\left(\sum_{c^{\prime}\in% \mathcal{C}}\frac{\hat{x}_{c^{\prime},u,k}^{q}}{\xi_{c^{\prime},u,k}^{q,x}}% \right),\ \xi_{u,k}^{q,x}=\left(\sum_{c^{\prime}\in\mathcal{C}}\frac{1}{\xi_{c% ^{\prime},u,k}^{q,x}}\right)^{-1}\!\!\!\!\!.

(46)
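Equation (46) is the usual precision-weighted fusion of Gaussian factors. A minimal sketch follows, assuming that the per-sub-array means and variances are stacked along the first axis. The function name is hypothetical.

```python
import numpy as np

def combine_subarrays(x_q, xi_q):
    """Precision-weighted fusion of per-sub-array Gaussian factors, as in (46).

    x_q, xi_q : arrays of shape (C, ...) with the means and variances per sub-array.
    """
    precision = np.sum(1.0 / xi_q, axis=0)   # sum of inverse variances over c
    xi = 1.0 / precision
    x = xi * np.sum(x_q / xi_q, axis=0)
    return x, xi
```

For two sub-arrays with equal variances, the fused mean is the plain average and the variance is halved. A sub-array with a very large variance contributes almost nothing to the result.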

Note that combining the mean $\{\hat{x}^{q}_{c,u,k}\}_{c\in\mathcal{C}}$ and variance $\{\xi^{q,x}_{c,u,k}\}_{c\in\mathcal{C}}$ over the sub-array direction $c\in\mathcal{C}$ , as written in (46), leads to further improvements for data detection owing to the spatial diversity. Substituting (44) into (43), the approximate posterior $g(x_{u,k}|\mathbf{Y})$ can be written as

\displaystyle g(x_{u,k}|\mathbf{Y})

\displaystyle\propto\mathrm{proj_{\mathbf{\Phi}}}\left[p(x_{u,k})\prod_{c^{% \prime}\in\mathcal{C}}q^{x}_{c^{\prime},u,k}(x_{u,k})\right].

(47)

The approximate posterior mean $\hat{x}_{u,k}$ and variance $\xi^{x}_{u,k}$ of $g(x_{u,k}|\mathbf{Y})$ can be derived using the MMSE denoiser function $\eta(\cdot)$ [31], which is designed based on the prior for the QAM constellation $p(x_{u,k})$ in (18). Then, the posterior mean and variance are expressed as $\hat{x}_{u,k}=\eta(\hat{x}^{q}_{u,k},\xi^{q,x}_{u,k})\triangleq\mathbb{E}_{g(x_{u,k}|\mathbf{Y})}[x_{u,k}]$ and $\xi^{x}_{u,k}=\xi^{q,x}_{u,k}\frac{\partial\eta(\hat{x}^{q}_{u,k},\xi^{q,x}_{u,k})}{\partial\hat{x}^{q}_{u,k}}$, which can be calculated as


		$\displaystyle\hat{x}_{u,k}=C^{g,x}_{u,k}\sum_{\mathcal{X}_{q}\in\mathcal{X}}\mathcal{X}_{q}\exp\left(-|\mathcal{X}_{q}-\hat{x}^{q}_{u,k}|^{2}/\xi^{q,x}_{u,k}\right),$		(48a)
		$\displaystyle\xi^{x}_{u,k}=C^{g,x}_{u,k}\sum_{\mathcal{X}_{q}\in\mathcal{X}}|\mathcal{X}_{q}|^{2}\exp\left(-|\mathcal{X}_{q}-\hat{x}^{q}_{u,k}|^{2}/\xi^{q,x}_{u,k}\right)-|\hat{x}_{u,k}|^{2},$		(48b)

with $(C^{g,x}_{u,k})^{-1}=\sum_{\mathcal{X}_{q}\in\mathcal{X}}\exp\left(-|\mathcal{% X}_{q}-\hat{x}^{q}_{u,k}|^{2}/\xi^{q,x}_{u,k}\right).$
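The denoiser in (48a)-(48b) can be sketched as follows for data symbols with a uniform prior over the constellation. The function name is hypothetical, and the subtraction of the largest exponent is an implementation detail added here for numerical stability; it leaves the normalized weights unchanged.

```python
import numpy as np

def qam_denoiser(x_q, xi_q, constellation):
    """Posterior mean and variance of (48a)-(48b) under a uniform prior.

    x_q, xi_q     : arrays of equal shape (means and variances of the fused factor)
    constellation : 1-D complex array holding the Q constellation points
    """
    x_q = np.asarray(x_q, dtype=complex)[..., None]
    xi_q = np.asarray(xi_q, dtype=float)[..., None]
    log_w = -np.abs(constellation - x_q) ** 2 / xi_q
    log_w = log_w - log_w.max(axis=-1, keepdims=True)   # guards against underflow
    w = np.exp(log_w)
    w = w / w.sum(axis=-1, keepdims=True)               # normalization, role of C^{g,x}
    mean = np.sum(w * constellation, axis=-1)
    var = np.sum(w * np.abs(constellation) ** 2, axis=-1) - np.abs(mean) ** 2
    return mean, var
```

When $\xi^{q,x}_{u,k}$ is small, the output approaches a hard decision on the nearest constellation point with vanishing variance. When it is large, the output tends to the prior mean and the average symbol energy.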

From (32), $v_{c,u,k}^{x}(x_{u,k})$ can be updated as

\displaystyle v^{x}_{c,u,k}(x_{u,k})\propto\ g(x_{u,k}|\mathbf{Y})/q^{x}_{c,u,% k}(x_{u,k}),

(49)

from which the associated mean and variance are given by

\displaystyle\hat{x}^{v}_{c,u,k}\!=\!\xi^{v,x}_{c,u,k}\left(\frac{\hat{x}_{u,k% }}{\xi^{x}_{u,k}}\!-\!\frac{\hat{x}^{q}_{c,u,k}}{\xi^{q,x}_{c,u,k}}\right)\!,% \ \xi^{v,x}_{c,u,k}\!=\!\left(\!\frac{1}{\xi^{x}_{u,k}}\!-\!\frac{1}{\xi^{q,x}% _{c,u,k}}\!\right)^{-1}\!.

(50)

As shown in the Soft-IC process in (38a)-(38b), $\hat{x}^{v}_{c,u,k}$ and $\xi^{v,x}_{c,u,k}$ in (50) are used as soft replicas instead of $\hat{x}_{u,k}$ and $\xi^{x}_{u,k}$ in (48a)-(48b) in order to suppress the self-noise feedback in the algorithm iterations [32]. In conventional JCDE algorithms [11, 12], the self-feedback suppression is performed before the denoising process in (48a)-(48b) by generating antenna-wise extrinsic values based on BP rules. Hence, the complexity of the denoising process is $\mathcal{O}(QNUK_{\mathrm{d}})$. In contrast, the proposed method reduces the complexity of the denoising process to $\mathcal{O}(QUK_{\mathrm{d}})$, since the extrinsic values $\hat{x}^{v}_{c,u,k}$ and $\xi^{v,x}_{c,u,k}$ in (50) are generated after the denoising process.
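The extrinsic update in (50) is a division of two scalar Gaussian factors, which reduces to a subtraction of precisions and of precision-weighted means. A minimal sketch with a hypothetical function name:

```python
import numpy as np

def gaussian_divide(x_post, xi_post, x_q, xi_q):
    """Mean and variance of g / q for scalar complex Gaussian factors, as in (50)."""
    xi_v = 1.0 / (1.0 / xi_post - 1.0 / xi_q)          # subtract precisions
    x_v = xi_v * (x_post / xi_post - x_q / xi_q)       # subtract weighted means
    return x_v, xi_v
```

The result is a valid variance only when $\xi^{x}_{u,k}<\xi^{q,x}_{c,u,k}$. The same operation, applied to the fused factor in (46) instead of a single sub-array factor, gives the form of (52).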

Finally, from (32), $b^{x}_{u,k}(x_{u,k})$ can be updated as

\displaystyle b^{x}_{u,k}(x_{u,k})\

\displaystyle\propto\ g(x_{u,k}|\mathbf{Y})\Big{/}\prod_{c^{\prime}\in\mathcal{C}}q^{x}_{c^{\prime},u,k}(x_{u,k}),

(51)

with the mean $\hat{x}^{b}_{u,k}$ and variance $\xi^{b,x}_{u,k}$ of $b^{x}_{u,k}(x_{u,k})$ being

\displaystyle\hat{x}^{b}_{u,k}\!=\!\xi^{b,x}_{u,k}\left(\frac{\hat{x}_{u,k}}{% \xi^{x}_{u,k}}-\frac{\hat{x}^{q}_{u,k}}{\xi^{q,x}_{u,k}}\right),\!\!\!

\displaystyle\xi^{b,x}_{u,k}\!=\!\left(\frac{1}{\xi^{x}_{u,k}}-\frac{1}{\xi^{q% ,x}_{u,k}}\right)^{-1}\!\!\!\!\!\!.

(52)

V-D EP for Residual Channel Error Estimation

V-D1 Update $\bm{\pi}^{q,e}_{n_{c},u,k}$

For $\bm{\pi}^{q,e}_{n_{c},u,k}$ , we minimize

\displaystyle\underset{\bm{\pi}^{q,e}_{n_{c},u,k}}{\mathrm{minimize}}\ \mathrm% {KL}\left(\hat{p}_{n_{c},k}^{q,e}(\mathbf{E},\mathbf{X}|\mathbf{Y})\|g(\mathbf% {E},\mathbf{X}|\mathbf{Y})\right),

(53)

where $\hat{p}_{n_{c},k}^{q,e}(\mathbf{E},\mathbf{X}|\mathbf{Y})$ is the target distribution designed as

	$\displaystyle\hat{p}_{n_{c},k}^{q,e}(\mathbf{E},\mathbf{X}|\mathbf{Y})=C_{n_{c},k}^{q,e}\ p(y_{n_{c},k}|\bar{\mathbf{e}}_{n_{c}},\bar{\mathbf{x}}_{k})$
	$\displaystyle\prod_{(n_{c}^{\prime},k^{\prime})\in\mathcal{N}\times\mathcal{K}\setminus(n_{c},k)}\underbrace{l^{e}_{n_{c}^{\prime},k^{\prime}}(\bar{\mathbf{e}}_{n_{c}^{\prime}},\bar{\mathbf{x}}_{k^{\prime}})}_{\simeq p(y_{n_{c}^{\prime},k^{\prime}}|\bar{\mathbf{e}}_{n_{c}^{\prime}},\bar{\mathbf{x}}_{k^{\prime}})}\underbrace{B^{x}(\mathbf{X})}_{\simeq p(\mathbf{X})}\underbrace{B^{e}(\mathbf{E})}_{\simeq p(\mathbf{E};\mathbf{\Theta})},$		(54)

where $C_{n_{c},k}^{q,e}$ is a normalizing constant.

Through the same procedure as the derivation of $q^{x}_{c,u,k}(x_{u,k})$ in (39), the mean and variance of approximate function $q^{e}_{n_{c},u,k}(e_{n_{c},u})$ are obtained as

$\displaystyle\hat{e}^{q}_{n_{c},u,k}=\frac{\hat{x}^{w\ast}_{n_{c},u,k}\tilde{y}^{e}_{n_{c},u,k}}{|\hat{x}_{n_{c},u,k}^{w}|^{2}},\quad\xi^{q,e}_{n_{c},u,k}=\frac{\phi^{e}_{n_{c},u,k}}{|\hat{x}_{n_{c},u,k}^{w}|^{2}},$	(55)

with


$\displaystyle\tilde{y}^{e}_{n_{c},u,k}=y_{n_{c},k}-\sum_{u^{\prime}\in\mathcal{U}\setminus u}\hat{x}^{w}_{c,u^{\prime},k}\hat{e}^{v}_{n_{c},u^{\prime},k}-\sum_{u^{\prime}\in\mathcal{U}}\hat{x}^{w}_{n_{c},u^{\prime},k}\hat{s}_{n_{c},u^{\prime}},$	(56a)
$\displaystyle\phi^{e}_{n_{c},u,k}=\sum_{u^{\prime}\in\mathcal{U}}\left(|\hat{e}^{v}_{n_{c},u^{\prime},k}|^{2}+|\hat{s}_{n_{c},u^{\prime}}|^{2}+\xi^{v,e}_{n_{c},u^{\prime},k}\right)\xi^{w,x}_{n_{c},u^{\prime},k}+\sum_{u^{\prime}\in\mathcal{U}\setminus u}\xi^{v,e}_{n_{c},u^{\prime},k}|\hat{x}^{w}_{c,u^{\prime},k}|^{2}+\sigma^{2},$	(56b)
$\displaystyle\hat{x}^{w}_{c,u,k}=\xi^{w,x}_{c,u,k}\left(\hat{x}_{u,k}(\xi^{x}_{u,k})^{-1}-\hat{x}^{q}_{c,u,k}(N_{c}\xi^{q,x}_{c,u,k})^{-1}\right),$	(56c)
$\displaystyle\xi^{w,x}_{c,u,k}=\left((\xi^{x}_{u,k})^{-1}-(N_{c}\xi^{q,x}_{c,u,k})^{-1}\right)^{-1}.$	(56d)
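The soft interference cancellation in (55)-(56b) can be sketched for a single antenna $n_c$ and a single time index $k$ as follows. This is a hedged NumPy illustration, not the authors' implementation: the function name, the argument layout (one vector entry per UE), and the assumption that every soft replica $\hat{x}^{w}$ is non-zero are all introduced here.

```python
import numpy as np


def residual_error_messages(y, x_w, xi_wx, e_v, xi_ve, s_hat, noise_var):
    """Per-UE mean and variance of q^e for one antenna n_c and one time index k.

    y         : complex scalar, received sample y_{n_c,k}
    x_w       : (U,) complex, soft data replicas (assumed non-zero)
    xi_wx     : (U,) real, variances of the soft data replicas
    e_v       : (U,) complex, extrinsic residual-error means
    xi_ve     : (U,) real, extrinsic residual-error variances
    s_hat     : (U,) complex, model-based channel estimates
    noise_var : real, noise variance sigma^2
    """
    x_pow = np.abs(x_w) ** 2

    # (56a): cancel the other UEs' residual-error terms and all model-based terms.
    err_terms = x_w * e_v
    y_tilde = y - (err_terms.sum() - err_terms) - np.sum(x_w * s_hat)

    # (56b): variance of the remaining interference plus noise.
    data_unc = np.sum((np.abs(e_v) ** 2 + np.abs(s_hat) ** 2 + xi_ve) * xi_wx)
    chan_terms = xi_ve * x_pow
    phi = data_unc + (chan_terms.sum() - chan_terms) + noise_var

    # (55): normalize by the soft replica of the target UE.
    e_q = np.conj(x_w) * y_tilde / x_pow
    xi_q = phi / x_pow
    return e_q, xi_q
```

As a sanity check, when the replicas are exact (all variances zero) and the observation is noise-free, `y_tilde` reduces to $\hat{x}^{w}_{u}e_{u}$ and the returned mean equals the true residual error of each UE.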

V-D2 Update $\bm{\pi}^{b,e}_{n_{c},u}$

For $\bm{\pi}^{b,e}_{n_{c},u}$ , we have

$\displaystyle\underset{\bm{\pi}^{b,e}_{n_{c},u}}{\mathrm{minimize}}\ \mathrm{KL}\left(\hat{p}_{n_{c},u}^{b,e}(\mathbf{E},\mathbf{X}|\mathbf{Y})\,\|\,g(\mathbf{E},\mathbf{X}|\mathbf{Y})\right),$	(57)

where $\hat{p}_{n_{c},u}^{b,e}(\mathbf{E},\mathbf{X}|\mathbf{Y})$ is the target distribution designed as

$\displaystyle\hat{p}_{n_{c},u}^{b,e}(\mathbf{E},\mathbf{X}|\mathbf{Y})=C_{n_{c},u}^{b,e}\ p(e_{n_{c},u};\mathbf{\Theta})\cdot\prod_{(n_{c}^{\prime},u^{\prime})\in\mathcal{N}\times\mathcal{U}\setminus(n_{c},u)}\underbrace{b_{n_{c}^{\prime},u^{\prime}}^{e}(e_{n_{c}^{\prime},u^{\prime}})}_{\simeq p(e_{n_{c},u};\mathbf{\Theta})}\underbrace{Q^{x}(\mathbf{X})Q^{e}(\mathbf{E})}_{\simeq p(\mathbf{Y}|\mathbf{E},\mathbf{X})}\underbrace{B^{x}(\mathbf{X})}_{\simeq p(\mathbf{X})},$	(58)

where $C_{n_{c},u}^{b,e}$ is a normalizing constant.

Following the same methodology used to derive $g(x_{u,k}|\mathbf{Y})$ in (47), the approximate posterior $g(e_{n_{c},u}|\mathbf{Y})$ is derived as

$\displaystyle g(e_{n_{c},u}|\mathbf{Y})\propto\mathrm{proj}_{\mathbf{\Phi}}\left[p(e_{n_{c},u};\mathbf{\Theta})\prod_{k^{\prime}\in\mathcal{K}}q_{n_{c},u,k^{\prime}}^{e}(e_{n_{c},u})\right],$	(59)

where the mean and variance of $g(e_{n_{c},u}|\mathbf{Y})$ can be calculated based on the prior distribution $p(e_{n_{c},u};\mathbf{\Theta})$ in (19) as

$\displaystyle\hat{e}_{n_{c},u}=\frac{\sigma^{e}_{n_{c},u}\hat{e}^{q}_{n_{c},u}}{\sigma^{e}_{n_{c},u}+\xi^{q,e}_{n_{c},u}},\quad\xi^{e}_{n_{c},u}=\left(\frac{1}{\sigma^{e}_{n_{c},u}}+\frac{1}{\xi^{q,e}_{n_{c},u}}\right)^{-1},$	(60)

with

$\displaystyle\hat{e}_{n_{c},u}^{q}=\xi_{n_{c},u}^{q,e}\left(\sum_{k^{\prime}\in\mathcal{K}}\frac{\hat{e}_{n_{c},u,k^{\prime}}^{q}}{\xi_{n_{c},u,k^{\prime}}^{q,e}}\right),\quad\xi_{n_{c},u}^{q,e}=\left(\sum_{k^{\prime}\in\mathcal{K}}\frac{1}{\xi_{n_{c},u,k^{\prime}}^{q,e}}\right)^{-1}.$	(61)
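Equations (60) and (61) can be read as two steps: the per-$k$ messages are first fused by adding precisions, and the fused message is then combined with the prior. The sketch below assumes that the prior in (19) is a zero-mean complex Gaussian with variance $\sigma^{e}_{n_{c},u}$, which is what the form of (60) suggests; the function names and the numbers are hypothetical.

```python
def combine_messages(means, variances):
    """Product of Gaussian messages over the time index k, as in (61)."""
    precision = sum(1.0 / v for v in variances)
    var = 1.0 / precision
    mean = var * sum(m / v for m, v in zip(means, variances))
    return mean, var


def posterior_with_gaussian_prior(mean_q, var_q, prior_var):
    """Posterior moments in (60), assuming a zero-mean Gaussian prior with variance prior_var."""
    mean = prior_var * mean_q / (prior_var + var_q)
    var = 1.0 / (1.0 / prior_var + 1.0 / var_q)
    return mean, var


# Two identical messages (mean 1, variance 1) fused, then shrunk by a prior of variance 0.5.
e_q, xi_q = combine_messages([1.0 + 0.0j, 1.0 + 0.0j], [1.0, 1.0])
e_hat, xi_e = posterior_with_gaussian_prior(e_q, xi_q, prior_var=0.5)
```

In this example the fused message has variance 0.5, equal to the prior variance, so the posterior mean is pulled halfway towards zero.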

Similarly, the approximate function $v^{e}_{n_{c},u,k}(e_{n_{c},u})$ can be derived in the same manner as (49):

$\displaystyle v^{e}_{n_{c},u,k}(e_{n_{c},u})\propto g(e_{n_{c},u}|\mathbf{Y})/q_{n_{c},u,k}^{e}(e_{n_{c},u}),$	(62)

from which the mean and variance are respectively given by


$\displaystyle\hat{e}^{v}_{n_{c},u,k}=\xi^{v,e}_{n_{c},u,k}\left((\xi^{e}_{n_{c},u})^{-1}\hat{e}_{n_{c},u}-(\xi^{q,e}_{n_{c},u,k})^{-1}\hat{e}^{q}_{n_{c},u,k}\right),$	(63a)
$\displaystyle\xi^{v,e}_{n_{c},u,k}=\left((\xi^{e}_{n_{c},u})^{-1}-(\xi^{q,e}_{n_{c},u,k})^{-1}\right)^{-1}.$	(63b)

Finally, the approximate function $b^{e}_{n_{c},u}(e_{n_{c},u})$ is obtained in a similar way as the derivation of (51) as

$\displaystyle b^{e}_{n_{c},u}(e_{n_{c},u})\propto g(e_{n_{c},u}|\mathbf{Y})\Big/\prod_{k^{\prime}\in\mathcal{K}}q_{n_{c},u,k^{\prime}}^{e}(e_{n_{c},u}),$	(64)

with the mean and variance of $b^{e}_{n_{c},u}(e_{n_{c},u})$ being

$\displaystyle\hat{e}^{b}_{n_{c},u}=\xi^{b,e}_{n_{c},u}\left(\frac{\hat{e}_{n_{c},u}}{\xi^{e}_{n_{c},u}}-\frac{\hat{e}^{q}_{n_{c},u}}{\xi^{q,e}_{n_{c},u}}\right),\quad\xi^{b,e}_{n_{c},u}=\left(\frac{1}{\xi^{e}_{n_{c},u}}-\frac{1}{\xi^{q,e}_{n_{c},u}}\right)^{-1}.$	(65)

V-E Expectation Maximization for Hyper Parameter Learning

In this section, we describe the estimation method for the hyper parameter set $\mathbf{\Theta}$ via the EM algorithm, corresponding to the M-step in (23). Using the approximate posterior $g^{(t)}(\mathbf{E},\mathbf{X}|\mathbf{Y})$ at the $t$-th step as described in Sections V-C and V-D, the ELBO $\mathcal{F}\left(\mathbf{\Theta},\mathbf{\Theta}^{(t)}\right)$ in (23) can be approximated as

$\displaystyle\mathcal{F}(\mathbf{\Theta},\mathbf{\Theta}^{(t)})\simeq-\sum_{n\in\mathcal{N}}\sum_{u\in\mathcal{U}}\Big\{\ln\sigma^{e}_{n_{c},u}+(\sigma^{e}_{n_{c},u})^{-1}\mathbb{E}_{g^{(t)}(e_{n_{c},u}|\mathbf{Y})}\left[|e_{n_{c},u}|^{2}\right]\Big\}+\mathrm{const}.$	(66)

Since the ELBO $\mathcal{F}(\mathbf{\Theta},\mathbf{\Theta}^{(t)})$ is concave in $(\sigma^{e}_{n_{c},u})^{-1}$, the maximization problem in (23) can be solved by the first-order necessary and sufficient condition $\partial\mathcal{F}(\mathbf{\Theta},\mathbf{\Theta}^{(t)})/\partial\left(\sigma_{n_{c},u}^{e}\right)^{-1}=0$, which yields the optimal variance $\sigma^{e(t+1)}_{n_{c},u}$ at the $t$-th step as

$\displaystyle\sigma^{e(t+1)}_{n_{c},u}=\mathbb{E}_{g^{(t)}(e_{n_{c},u}|\mathbf{Y})}\left[|e_{n_{c},u}|^{2}\right]=|\hat{e}^{(t)}_{n_{c},u}|^{2}+\xi^{e(t)}_{n_{c},u},$	(67)

where $\hat{e}^{(t)}_{n_{c},u}$ and $\xi^{e(t)}_{n_{c},u}$ are the approximate posterior mean and variance at the $t$ -th step, calculated in (60).
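The closed form in (67) follows from (66) in one step. A short worked derivation for a single term of the sum, writing $\lambda\triangleq(\sigma^{e}_{n_{c},u})^{-1}$ and $m\triangleq\mathbb{E}_{g^{(t)}(e_{n_{c},u}|\mathbf{Y})}[|e_{n_{c},u}|^{2}]$ as shorthand (both symbols are introduced here for illustration only):

```latex
\begin{align}
f(\lambda) &= -\left\{\ln\lambda^{-1} + \lambda m\right\} = \ln\lambda - \lambda m, \\
\frac{\partial f}{\partial\lambda} &= \frac{1}{\lambda} - m = 0
  \;\Longrightarrow\; \sigma^{e}_{n_{c},u} = \frac{1}{\lambda} = m, \\
\frac{\partial^{2} f}{\partial\lambda^{2}} &= -\frac{1}{\lambda^{2}} < 0 .
\end{align}
```

The negative second derivative confirms concavity in $\lambda$, so the stationary point is the maximizer, and $m$ equals the squared magnitude of the posterior mean plus the posterior variance.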

V-F Reinforcement for the Model-Based Estimate

To further improve the convergence performance of the EP algorithm, we update the model-based estimate $\hat{\mathbf{S}}$ in each iteration. Using the estimated residual channel error $\hat{\mathbf{E}}^{(t-1)}\triangleq\mathbb{E}_{g^{(t-1)}(\mathbf{E},\mathbf{X}|\mathbf{Y})}[\mathbf{E}]$ at the $(t-1)$-th iteration, the channel estimate for the $u$-th UE can be reconstructed as

$\displaystyle\hat{\mathbf{h}}^{(t-1)}_{u}=\hat{\mathbf{s}}^{(t-1)}_{u}+\hat{\mathbf{e}}^{(t-1)}_{u}.$	(68)

The model-based estimate at the $t$-th iteration $\hat{\mathbf{s}}_{u}^{(t)}$ is updated with the channel estimate at the previous iteration $\hat{\mathbf{h}}_{u}^{(t-1)}$ in (68). To efficiently estimate $\hat{\mathbf{s}}_{u}^{(t)}$ by leveraging the near-field sparsity, the virtual channel representation with polar grids described in Section IV-A is utilized. The grids are dynamically designed over the iterations: the center of the grids is set to the angle and distance estimates at the previous iteration, and the range of the grids decreases with the number of iterations. Thus, the angle and distance grids for the $u$-th UE and $l$-th path at the $t$-th iteration are designed as


$\displaystyle\tilde{\theta}_{u,l,g_{\theta}}^{(t)}\in\left[\hat{\theta}_{u,l}^{(t-1)}-\sigma_{\theta}^{(t)},\ \hat{\theta}_{u,l}^{(t-1)}+\sigma_{\theta}^{(t)}\right],$	(69a)
$\displaystyle\tilde{r}_{u,l,g_{r}}^{(t)}\in\left[\hat{r}_{u,l}^{(t-1)}-\sigma_{r}^{(t)},\ \hat{r}_{u,l}^{(t-1)}+\sigma_{r}^{(t)}\right],$	(69b)

with $g_{\theta}\!\in\!\{1,\ldots,\bar{G}_{\theta}\}$ , $g_{r}\!\in\!\{1,\ldots,\bar{G}_{r}\}$ . $\hat{\theta}_{u,l}^{(t-1)}$ and $\hat{r}_{u,l}^{(t-1)}$ are the angle and distance estimates at the $(t-1)$ -th iteration, respectively, and $\sigma_{\theta}^{(t)}$ and $\sigma_{r}^{(t)}$ are, respectively, the range of angle and distance grids, where the initial values $\hat{\theta}_{u,l}^{(0)}$ and $\hat{r}_{u,l}^{(0)}$ are determined using the angle and distance estimates obtained by the initial channel estimation as shown in Algorithm 1.

Note that the ranges of the angle and distance grids, $\sigma_{\theta}^{(t)}$ and $\sigma_{r}^{(t)}$, are each designed as a monotonically decreasing function of $t$, such as $\sigma_{\theta}^{(t)}=a_{\theta}\exp(-t/2)+b_{\theta}$ and $\sigma_{r}^{(t)}=a_{r}\exp(-t/2)+b_{r}$, where the constants $\{a_{\theta},b_{\theta},a_{r},b_{r}\}$ are uniquely determined by the desired ranges $\sigma_{\theta}^{(1)}$, $\sigma_{r}^{(1)}$, $\sigma_{\theta}^{(T)}$, and $\sigma_{r}^{(T)}$. Accordingly, the sets of angle and distance grids for the $u$-th UE are defined as $\tilde{\bm{\theta}}_{u,l}^{(t)}\triangleq\{\tilde{\theta}_{u,l,g_{\theta}}^{(t)}\}_{g_{\theta}=1}^{\bar{G}_{\theta}}$, $\tilde{\bm{\theta}}_{u}^{(t)}\triangleq\{\tilde{\bm{\theta}}_{u,l}^{(t)}\}_{l=1}^{\hat{L}_{u}}$, $\tilde{\mathbf{r}}_{u,l}^{(t)}\triangleq\{\tilde{r}_{u,l,g_{r}}^{(t)}\}_{g_{r}=1}^{\bar{G}_{r}}$, and $\tilde{\mathbf{r}}_{u}^{(t)}\triangleq\{\tilde{\mathbf{r}}_{u,l}^{(t)}\}_{l=1}^{\hat{L}_{u}}$.
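The two boundary conditions $\sigma^{(1)}$ and $\sigma^{(T)}$ fix the constants $a$ and $b$ of the exponential schedule through a $2\times 2$ linear system. A minimal sketch of that computation is given below; the function name is hypothetical, and the example values ($5^{\circ}$, $0.1^{\circ}$, $T=30$) are the angle-grid settings reported in Section VI-B.

```python
import math


def grid_range_schedule(sigma_first, sigma_last, num_iters):
    """Return [sigma^(1), ..., sigma^(T)] with sigma^(t) = a * exp(-t / 2) + b.

    a and b are fixed by the two boundary conditions
    sigma^(1) = sigma_first and sigma^(T) = sigma_last (requires T >= 2).
    """
    if num_iters < 2:
        raise ValueError("need at least two iterations to fit a and b")
    a = (sigma_first - sigma_last) / (math.exp(-0.5) - math.exp(-num_iters / 2))
    b = sigma_first - a * math.exp(-0.5)
    return [a * math.exp(-t / 2) + b for t in range(1, num_iters + 1)]


# Angle range shrinking from 5 degrees to 0.1 degrees over T = 30 iterations.
sigma_theta = grid_range_schedule(5.0, 0.1, 30)
```

Because the decay rate is fixed at $1/2$, most of the shrinkage happens in the first few iterations, after which the range stays close to its final value.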

Using the angle and distance grids in (69a)-(69b), the polar-domain dictionary matrix for the $u$ -th UE is designed as

$\displaystyle\tilde{\mathbf{A}}_{u}(\tilde{\bm{\theta}}_{u}^{(t)},\tilde{\mathbf{r}}_{u}^{(t)})=\begin{bmatrix}\tilde{\mathbf{A}}_{u,1}(\tilde{\bm{\theta}}_{u,1}^{(t)},\tilde{\mathbf{r}}_{u,1}^{(t)}),\ldots,\tilde{\mathbf{A}}_{u,\hat{L}_{u}}(\tilde{\bm{\theta}}_{u,\hat{L}_{u}}^{(t)},\tilde{\mathbf{r}}_{u,\hat{L}_{u}}^{(t)})\end{bmatrix},$	(70)

where $\tilde{\mathbf{A}}_{u,l}(\tilde{\bm{\theta}}_{u,l}^{(t)},\tilde{\mathbf{r}}_{u,l}^{(t)})\in\mathbb{C}^{N\times\bar{G}_{\theta}\bar{G}_{r}}$ is the virtual array response for the $u$-th UE and $l$-th path, defined as

$\displaystyle\tilde{\mathbf{A}}_{u,l}(\tilde{\bm{\theta}}_{u,l}^{(t)},\tilde{\mathbf{r}}_{u,l}^{(t)})=\begin{bmatrix}\tilde{\mathbf{a}}(\tilde{\theta}_{u,l,1}^{(t)},\tilde{r}_{u,l,1}^{(t)}),\ldots,\tilde{\mathbf{a}}(\tilde{\theta}_{u,l,\bar{G}_{\theta}}^{(t)},\tilde{r}_{u,l,\bar{G}_{r}}^{(t)})\end{bmatrix}.$

Through the virtual channel representation with the dictionary matrix $\tilde{\mathbf{A}}_{u}(\tilde{\bm{\theta}}_{u}^{(t)},\tilde{\mathbf{r}}_{u}^{(% t)})$ , the near-field channel for the $u$ -th UE can be expressed as

$\displaystyle\mathbf{h}_{u}=\sum_{l=1}^{\hat{L}_{u}}\tilde{\mathbf{A}}_{u,l}(\tilde{\bm{\theta}}_{u,l}^{(t)},\tilde{\mathbf{r}}_{u,l}^{(t)})\tilde{\mathbf{z}}_{u,l}=\tilde{\mathbf{A}}_{u}(\tilde{\bm{\theta}}_{u}^{(t)},\tilde{\mathbf{r}}_{u}^{(t)})\tilde{\mathbf{z}}_{u},$	(71)

where $\tilde{\mathbf{z}}_{u,l}\in\mathbb{C}^{\bar{G}_{\theta}\bar{G}_{r}\times 1}$ is the virtual path gain vector for the $l$-th path, and $\tilde{\mathbf{z}}_{u}=[\tilde{\mathbf{z}}_{u,1}^{\mathrm{T}},\ldots,\tilde{\mathbf{z}}_{u,\hat{L}_{u}}^{\mathrm{T}}]^{\mathrm{T}}\in\mathbb{C}^{\bar{G}_{\theta}\bar{G}_{r}\hat{L}_{u}\times 1}$ is the virtual path gain vector including all paths.
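To make the local grids in (69a)-(69b) and the per-path dictionary block concrete, the following NumPy sketch builds one block $\tilde{\mathbf{A}}_{u,l}$. Several ingredients are assumptions made for illustration: the array response is taken to be the spherical-wave model of a half-wavelength uniform linear array with unit-norm columns, the grid points are spaced uniformly within each interval, and all function names and the example center $(20^{\circ}, 6\ \mathrm{m})$ are hypothetical.

```python
import numpy as np


def near_field_steering(theta, r, num_ant, wavelength):
    """Spherical-wave response of a half-wavelength ULA centred at the origin (assumed model)."""
    d = wavelength / 2.0
    delta = np.arange(num_ant) - (num_ant - 1) / 2.0  # antenna offsets in units of d
    r_n = np.sqrt(r**2 + (delta * d) ** 2 - 2.0 * r * delta * d * np.sin(theta))
    return np.exp(-2j * np.pi * (r_n - r) / wavelength) / np.sqrt(num_ant)


def local_polar_dictionary(theta_c, r_c, sigma_theta, sigma_r, g_theta, g_r,
                           num_ant, wavelength):
    """Dictionary block for one path: all (angle, distance) pairs on a local grid."""
    thetas = np.linspace(theta_c - sigma_theta, theta_c + sigma_theta, g_theta)
    rs = np.linspace(r_c - sigma_r, r_c + sigma_r, g_r)
    atoms = [near_field_steering(th, r, num_ant, wavelength) for th in thetas for r in rs]
    return np.stack(atoms, axis=1), thetas, rs


wavelength = 3e8 / 100e9  # 100 GHz carrier
A_ul, thetas, rs = local_polar_dictionary(
    theta_c=np.deg2rad(20.0), r_c=6.0,
    sigma_theta=np.deg2rad(5.0), sigma_r=5.0,
    g_theta=5, g_r=5, num_ant=200, wavelength=wavelength)
```

With $\bar{G}_{\theta}=\bar{G}_{r}=5$ the block has only 25 columns per path, which is what keeps the refinement step light compared with a dictionary covering the full angle and distance range.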

In light of the near-field model in (71), an update of the model-based estimate $\hat{\mathbf{s}}^{(t)}_{u}$ can be obtained by

$\displaystyle\hat{\mathbf{s}}_{u}^{(t)}=\mathbf{A}(\hat{\bm{\theta}}_{u}^{(t)},\hat{\mathbf{r}}_{u}^{(t)})\hat{\mathbf{z}}_{u}^{(t)},$	(72)

with $\hat{\mathbf{z}}_{u}^{(t)}$ denoting the path gain estimates and $\mathbf{A}(\hat{\bm{\theta}}_{u}^{(t)},\hat{\mathbf{r}}_{u}^{(t)})$ being the corresponding array responses, both of which can be computed by solving

$\displaystyle\underset{\tilde{\mathbf{z}}_{u}}{\text{minimize}}$	$\displaystyle\ \ \left\|\hat{\mathbf{h}}_{u}^{(t-1)}-\hat{\mathbf{s}}^{(t)}_{u}\right\|_{2}^{2}$
subject to	$\displaystyle\ \ \hat{\mathbf{s}}^{(t)}_{u}=\tilde{\mathbf{A}}_{u}(\tilde{\bm{\theta}}_{u}^{(t)},\tilde{\mathbf{r}}_{u}^{(t)})\tilde{\mathbf{z}}_{u},$
	$\displaystyle\ \ \left\|\tilde{\mathbf{z}}_{u,l}\right\|_{0}=1,\ \ \forall l\in\{1,2,\ldots,\hat{L}_{u}\}.$	(73)
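Problem (73) selects exactly one atom per path block and fits the gains by least squares. The solver is not spelled out at this point, so the sketch below uses a simple greedy heuristic as a stand-in: blocks are visited in order, the atom most correlated with the current residual is picked, and all selected gains are refit jointly. The function name and the toy dictionary are hypothetical, and this heuristic is not guaranteed to reach the global optimum of (73).

```python
import numpy as np


def refit_model_based_estimate(h_prev, blocks):
    """Greedy sketch for (73): pick one atom per path block, then refit all gains jointly.

    h_prev : (N,) complex, previous channel estimate
    blocks : non-empty list of (N, G) complex arrays, one local dictionary per path
    Returns the chosen column index per block, the gains, and the reconstruction.
    """
    chosen_cols, chosen_idx = [], []
    residual = h_prev.copy()
    for block in blocks:
        # Normalized correlation of every atom in this block with the current residual.
        corr = np.abs(block.conj().T @ residual) / np.linalg.norm(block, axis=0)
        g = int(np.argmax(corr))
        chosen_idx.append(g)
        chosen_cols.append(block[:, g])
        # Joint least-squares refit over all atoms selected so far.
        A_sel = np.stack(chosen_cols, axis=1)
        gains, *_ = np.linalg.lstsq(A_sel, h_prev, rcond=None)
        residual = h_prev - A_sel @ gains
    return chosen_idx, gains, A_sel @ gains


# Toy check with orthogonal atoms: two blocks built from columns of the identity.
eye = np.eye(8, dtype=complex)
blocks = [eye[:, :4], eye[:, 4:]]
h = 2.0 * eye[:, 1] + (1.0 - 1.0j) * eye[:, 6]
idx, gains, s_hat = refit_model_based_estimate(h, blocks)
```

The selected indices map back to the grid points that define $\hat{\bm{\theta}}_{u}^{(t)}$ and $\hat{\mathbf{r}}_{u}^{(t)}$, and the reconstruction plays the role of $\hat{\mathbf{s}}_{u}^{(t)}$ in (72).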

The proposed algorithm is summarized in Algorithm 2, where a damping scheme [16] is introduced in lines 6, 7, 12, and 18 to enhance convergence performance.
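The exact damping rule is not written out here. A common form, assumed in the sketch below, is a convex combination of the newly computed quantity and its value from the previous iteration; the function name and the convention that `eta` weights the new value are assumptions, while the factor $0.5$ is the setting reported in Section VI-B.

```python
def damp(new_value, old_value, eta=0.5):
    """Convex combination of the current and previous iterates.

    eta = 1 disables damping; a smaller eta lets the message change more slowly.
    """
    return eta * new_value + (1.0 - eta) * old_value


# A newly computed extrinsic mean of 1.0 damped against a previous value of 0.0.
x_v_damped = damp(1.0, 0.0, eta=0.5)
```

The same helper would be applied to both means and variances of the damped messages.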

Algorithm 2 Proposed JCDE algorithm

1: Input: $\mathbf{Y},\ \mathbf{X}_{\mathrm{p}},\ \mathbf{H}^{\mathcal{S}}_{0},\ \{\hat{\bm{\theta}}_{u},\hat{\mathbf{r}}_{u}\}_{u\in\mathcal{U}},\ \{\hat{L}_{u}\}_{u=1}^{U},\ T,\ \bar{G}_{\theta},\ \bar{G}_{r},\ \sigma_{\theta}^{(1)},\ \sigma_{\theta}^{(T)},\ \sigma_{r}^{(1)},\ \sigma_{r}^{(T)}$
2: Output: $\hat{\mathbf{X}},\ \hat{\mathbf{H}},\ \{\hat{\bm{\theta}}_{u},\ \hat{\mathbf{r}}_{u}\}_{u\in\mathcal{U}}$
4: // Initialization
   $\hat{\mathbf{S}}=\mathbf{D}_{N}\mathbf{H}_{0}^{\mathcal{S}}$ from Algorithm 1
   $\hat{x}^{v}_{c,u,k_{\mathrm{d}}}=\hat{x}^{w}_{c,u,k_{\mathrm{d}}}=0$, $\xi^{v,x}_{c,u,k_{\mathrm{d}}}=\xi^{w,x}_{c,u,k_{\mathrm{d}}}=E_{s}$, $\forall k_{\mathrm{d}}\in\mathcal{K}_{\mathrm{d}}$
   $\hat{x}^{v}_{c,u,k_{\mathrm{p}}}=\hat{x}^{w}_{c,u,k_{\mathrm{p}}}=[\mathbf{X}_{\mathrm{p}}]_{u,k_{\mathrm{p}}}$, $\xi^{v,x}_{c,u,k_{\mathrm{p}}}=\xi^{w,x}_{c,u,k_{\mathrm{p}}}=0$, $\forall k_{\mathrm{p}}\in\mathcal{K}_{\mathrm{p}}$
   $\hat{e}^{v}_{n_{c},u,k}=0$, $\xi^{v,e}_{n_{c},u,k}=|[\mathbf{D}_{N}\mathbf{H}_{0}^{\mathcal{S}}]_{n_{c},u}|^{2}$, $\forall k\in\mathcal{K}$
9: for $t=1,2,\ldots,T$
10:   // EP for data estimation
11:   Calculate $\tilde{\mathbf{y}}^{x}_{c,u,k},\ \mathbf{\Omega}^{x}_{c,k}$ from (38a)-(38)
12:   Calculate $\hat{x}^{q}_{c,u,k},\ \xi^{q,x}_{c,u,k}$ from (40a)-(40b)
13:   Calculate $\hat{x}_{u,k}^{q},\ \xi_{u,k}^{q,x}$ from (46)
14:   Calculate $\hat{x}_{u,k},\ \xi^{x}_{u,k}$ from (48a)-(48b)
15:   Calculate $\hat{x}^{v}_{c,u,k},\ \hat{\xi}^{v,x}_{c,u,k}$ from (50) with damping
16:   Calculate $\hat{x}^{w}_{c,u,k},\ \hat{\xi}^{w,x}_{c,u,k}$ from (56c) with damping
17:   // EP for residual error estimation
18:   Calculate $\tilde{y}^{e}_{n_{c},u,k},\ \phi^{e}_{n_{c},u,k}$ from (56a)-(56b)
19:   Calculate $\hat{e}^{q}_{n_{c},u,k},\ \xi^{q,e}_{n_{c},u,k}$ from (55)
20:   Calculate $\hat{e}^{q}_{n_{c},u},\ \xi_{n_{c},u}^{q,e}$ from (61)
21:   Calculate $\hat{e}_{n_{c},u},\ \xi^{e}_{n_{c},u}$ from (60)
22:   Calculate $\hat{e}^{v}_{n_{c},u,k},\ \hat{\xi}^{v,e}_{n_{c},u,k}$ from (63a)-(63b) with damping
23:   Update channel estimate $\hat{\mathbf{h}}_{u}=\hat{\mathbf{s}}_{u}+\hat{\mathbf{e}}_{u}$ from (68)
24:   // EM algorithm for hyper parameter learning
25:   Update $\sigma^{e}_{n_{c},u}$ from (67)
26:   // Reinforcement for the model-based estimate
27:   Generate the grids $\tilde{\theta}_{u,l,g_{\theta}},\ \tilde{r}_{u,l,g_{r}}$ from (69a)-(69b)
28:   Design the dictionary $\tilde{\mathbf{A}}_{u}(\tilde{\bm{\theta}}_{u},\tilde{\mathbf{r}}_{u})$ from (70)
29:   Obtain $\hat{\bm{\theta}}_{u},\hat{\mathbf{r}}_{u},\hat{\mathbf{z}}_{u}$ by solving (73)
30:   Calculate $\hat{\mathbf{s}}_{u}$ from (72) with damping
31: end for

VI Simulation Results

This section evaluates the performance of the proposed initial channel estimation and subsequent JCDE algorithms under the following setup. The carrier frequency is $100\ \mathrm{GHz}$, the number of BS antennas $N$ is $200$, the number of UEs $U$ is $50$, the modulation is $64$-QAM (i.e., $Q=64$), and the pilot length $K_{\mathrm{p}}$ and data length $K_{\mathrm{d}}$ are $25$ and $100$, respectively. The non-orthogonal pilot $\mathbf{X}_{\mathrm{p}}\in\mathbb{C}^{50\times 25}$ is designed by the frame design method in [15]. The near-field channel is composed of $L_{u}=3$ paths, i.e., $1$ LoS path and $2$ NLoS paths, with a Rician $K$-factor of 10 dB. The total number of paths is $L=50\times 3=150$, and the corresponding oversampling quantity used in Algorithm 1 is set to $\hat{L}=250$. The AoAs and distances are generated uniformly at random in the ranges $\theta_{u,l}\in[-60^{\circ},60^{\circ}]$ and $r_{u,l}\in[1,10]$ m, respectively. The polar-domain dictionary $\tilde{\mathbf{A}}(\tilde{\bm{\theta}},\tilde{\mathbf{r}})$ in (IV-A) is designed with $G_{r}=7$, $G_{\theta}=395$, and desired coherence $\gamma_{\mathrm{d}}=0.6$, following [10]. The performance is evaluated by the normalized mean-squared error (NMSE) and bit error rate (BER) at various signal-to-noise ratio (SNR) values. NMSE and SNR are defined as $\mathrm{NMSE}(\mathbf{H})\triangleq\mathbb{E}\left[{\|\mathbf{H}-\hat{\mathbf{H}}\|_{\mathrm{F}}^{2}}/{\|\mathbf{H}\|_{\mathrm{F}}^{2}}\right]$ and $\mathrm{SNR}\triangleq\mathbb{E}\left[\|\mathbf{HX}\|_{\mathrm{F}}^{2}\right]/\mathbb{E}\left[\|\mathbf{N}\|_{\mathrm{F}}^{2}\right]$. In what follows, the initial channel estimation and JCDE performance are evaluated in Sections VI-A and VI-B, respectively.
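The two metrics can be evaluated per Monte Carlo realization as sketched below. This is an illustrative NumPy fragment under stated assumptions: the expectations in the definitions are replaced by single-realization values (to be averaged over trials), the noise is taken to be i.i.d. with variance $\sigma^{2}$ per entry so that $\mathbb{E}[\|\mathbf{N}\|_{\mathrm{F}}^{2}]=NK\sigma^{2}$, and the random Gaussian $\mathbf{H}$ and $\mathbf{X}$ are placeholders rather than the near-field channel and 64-QAM symbols of the actual setup.

```python
import numpy as np


def nmse(H, H_hat):
    """Single-realization normalized squared error ||H - H_hat||_F^2 / ||H||_F^2."""
    return np.linalg.norm(H - H_hat, "fro") ** 2 / np.linalg.norm(H, "fro") ** 2


def noise_variance_for_snr(H, X, snr_db):
    """Per-entry noise variance that gives the target SNR for this (H, X) realization."""
    signal_power = np.linalg.norm(H @ X, "fro") ** 2
    return signal_power / (H.shape[0] * X.shape[1] * 10.0 ** (snr_db / 10.0))


rng = np.random.default_rng(0)
H = (rng.standard_normal((200, 50)) + 1j * rng.standard_normal((200, 50))) / np.sqrt(2)
X = (rng.standard_normal((50, 125)) + 1j * rng.standard_normal((50, 125))) / np.sqrt(2)
sigma2 = noise_variance_for_snr(H, X, snr_db=10.0)
```

Averaging `nmse` over many independent realizations approximates the expectation in the definition.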

VI-A Initial Channel Estimation Performance

To evaluate the initial channel estimation performance, the following estimation methods are compared: (a) LS: a classical least squares-based channel estimation, (b) P-SOMP [7]: a near-field channel estimation without considering the non-orthogonality of pilots, (c) 2D-CoSaMP [10]: a near-field channel estimation considering non-orthogonality, and (d) the proposed initial channel estimation method in Algorithm 1.

Fig. 4 shows the NMSE against SNR. The P-SOMP exhibits limited improvement with an increase in SNR due to pilot contamination stemming from non-orthogonal pilots, whereas 2D-CoSaMP demonstrates a performance enhancement compared to P-SOMP. The proposed method surpasses these conventional methods by mitigating noise amplification through the utilization of 2D-OMP in the second stage associated with UE-path pairing, resulting in superior channel estimation. Fig. 4 and Table I show the computational complexity evaluated by floating point operations (FLOPs). As depicted in the figure, the FLOPs of the proposed method are comparable to 2D-CoSaMP, owing to the two-stage procedure separating angle-distance estimation and UE-path pairing.

TABLE I: Computational complexity of initial channel estimation

Algorithm	FLOPs
P-SOMP [7]	$\mathcal{O}\left(NU(G_{r}G_{\theta}+\hat{L})+\hat{L}^{2}U(N+G_{r}G_{\theta})\right)$
2D-CoSaMP [10]	$\mathcal{O}\left(T_{\mathrm{iter}}\left(NU(K_{\mathrm{p}}+G_{r}G_{\theta})+\hat{L}NK_{\mathrm{p}}+\hat{L}^{3}\right)\right)$
Proposed	$\mathcal{O}\Big(NK_{\mathrm{p}}(G_{r}G_{\theta}+\hat{L}+NU)+\hat{L}^{2}(G_{r}G_{\theta}K_{\mathrm{p}}+NU+NK_{\mathrm{p}})+\hat{L}^{3}K_{\mathrm{p}}+\hat{L}^{4}\Big)$

Note: $T_{\mathrm{iter}}$ is the iteration number of 2D-CoSaMP, determined by [10].

VI-B JCDE Performance

In this subsection, we evaluate the performance of the proposed JCDE algorithm. As for the JCDE algorithm parameters, the damping factor is set to $0.5$, the number of iterations is $T=30$, the numbers of grid points are $\bar{G}_{\theta}=5$ and $\bar{G}_{r}=5$, and the grid ranges are $\sigma_{\theta}^{(1)}=5^{\circ}$, $\sigma_{\theta}^{(T)}=0.1^{\circ}$, $\sigma_{r}^{(1)}=5\ \mathrm{m}$, and $\sigma_{r}^{(T)}=1\ \mathrm{m}$. The extremely large array with $N=200$ antennas is divided into $C=4$ sub-arrays with $N_{c}=50$ antennas per sub-array. For comparison, AoA-aided BiGaBP [12], a state-of-the-art JCDE algorithm, is employed as a benchmark. In addition, we consider an ideal Genie-aided case with perfect knowledge of CSI or data, which corresponds to the lower bound of the proposed method.

VI-B1 JCDE Performance with Initial Channel Estimation

This subsection reveals the NMSE and BER performance of the JCDE algorithms with various initial channel estimation methods, including P-SOMP, 2D-CoSaMP, and the proposed initial channel estimation method. To evaluate the data detection capability of the above initial channel estimation methods, the LMMSE detector is used for data estimation.

Fig. 5 shows the BER and NMSE performance. As shown in the figures, LMMSE with LS, which cannot take advantage of the near-field model structure, exhibits poor BER performance, whereas LMMSE with the other initial estimation approaches, which consider the near-field model structure, achieves a slight performance improvement. However, high error floors remain due to the non-orthogonal pilots. In contrast, the JCDE algorithms improve the BER performance by utilizing both the pilots and the consecutive data. In particular, the proposed JCDE algorithm with the proposed initial channel estimation demonstrates a significant performance gain, approaching the lower bound of perfect CSI or perfect data.

Moreover, the proposed JCDE algorithm demonstrates a notable BER improvement over the state-of-the-art AoA-aided BiGaBP [12]. The performance improvement can be attributed to two primary factors. The first factor is that the proposed algorithm can leverage the near-field model-based estimation described in Section V-F, whereas BiGaBP relies on the far-field assumption. The second factor is that the proposed sub-array-wise LMMSE-based detection in (40a) is capable of addressing the correlation between the leaked energy in the beam-domain, whereas BiGaBP is incapable of doing so because it relies on MRC-based detection. To examine these two factors, we show in Section VI-B2 the convergence analysis with and without near-field model information. In addition, we evaluate in Section VI-B3 the proposed sub-array-wise LMMSE-based detection performance and its complexity across various numbers of sub-arrays $C$.

VI-B2 Convergence Analysis

To clarify the advantages gained by leveraging the near-field model structure, we evaluate the proposed JCDE algorithm with and without the model-based estimation process explained in Section V-F. Fig. 6 illustrates the BER and NMSE convergence behavior with respect to the number of algorithmic iterations. In the figure, the red triangle marker corresponds to the proposed JCDE algorithm without the model-based estimate, i.e., $\hat{\mathbf{S}}^{(t)}=\mathbf{0}$, where the prior distribution is designed i.i.d. for each element of $\mathbf{H}$ instead of $\mathbf{E}$, akin to [13, 14]. The green square marker corresponds to the proposed JCDE algorithm with the initial model-based estimate but without updating over the iterations, i.e., $\hat{\mathbf{S}}^{(t)}=\hat{\mathbf{S}}^{(0)}$. Comparing the red triangle marker and the green square marker, we can verify the performance improvement stemming from the use of the near-field model through the decomposition of $\mathbf{H}$ into $\hat{\mathbf{S}}$ and $\mathbf{E}$ as written in (15). Furthermore, the comparison with the proposed algorithm with adaptive update shows that adaptively updating the model-based estimate $\hat{\mathbf{S}}^{(t)}$ enhances the BER and NMSE performance by further exploiting the near-field model.

VI-B3 Performance Against the Number of Sub-arrays

TABLE II: Computational complexity of JCDE algorithms

Algorithm	FLOPs
BiGaBP [12]	$\mathcal{O}\Big(UK_{\mathrm{d}}NQ+UKN+U\hat{L}_{u}^{2}N^{2}+UN^{2}\hat{L}_{u}\Big)$
Proposed	$\mathcal{O}\Big(CUK_{\mathrm{d}}N_{c}^{2}+CK_{\mathrm{d}}N_{c}^{3}+UK_{\mathrm{d}}Q+UKN+UN\bar{G}_{\theta}\bar{G}_{r}+U\hat{L}_{u}^{2}N+U\hat{L}_{u}\bar{G}_{\theta}\bar{G}_{r}\Big)$

To analyze the impact of the number of sub-arrays $C$ on the performance of the proposed JCDE algorithm employing the sub-array-wise LMMSE-based detection, we show in Fig. 7 the BER and FLOPs for various numbers of sub-arrays $C$, where $C=1$ corresponds to the full-array LMMSE-based detection and $C=N=200$ corresponds to the MRC-based detection. As depicted in the figure, the BER performance degrades as the number of sub-arrays increases (i.e., as the number of antennas at each sub-array $N_{c}$ decreases) because each sub-array fails to effectively whiten the correlation in the beam-domain, even in the perfect CSI case. In particular, the MRC-based detection corresponding to $C=200$ exhibits poor performance. In contrast, an increase in the number of sub-arrays leads to a reduction in FLOPs, attributed to the decreased size of the matrix inverse associated with the LMMSE-based detection in (40a). Despite relying on the LMMSE-based detector, the proposed algorithm can achieve lower FLOPs than BiGaBP, which relies on an MRC-based detector, when $C>4$. The reason is that the proposed method suppresses self-feedback in (50) after the denoising process in (48a)-(48b), with FLOPs $\mathcal{O}(UK_{\mathrm{d}}Q)$, whereas BiGaBP suppresses self-feedback before the denoising process [12, 32], with FLOPs $\mathcal{O}(NUK_{\mathrm{d}}Q)$, which is the dominant complexity throughout the entire process as shown in Table II. From the above results, it is evident that the proposed method outperforms the conventional method in terms of both data detection and complexity.
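The crossover at $C>4$ can be illustrated by plugging the simulation parameters into the order terms of Table II. The sketch below is a rough arithmetic illustration only: the constants hidden by the $\mathcal{O}(\cdot)$ notation are dropped, $K=K_{\mathrm{p}}+K_{\mathrm{d}}=125$ and $\hat{L}_{u}=3$ are assumed, and the function names are hypothetical.

```python
def bigabp_flops(N, U, K_d, K, Q, L_u):
    """Sum of the order terms listed for BiGaBP in Table II (constants dropped)."""
    return U * K_d * N * Q + U * K * N + U * L_u**2 * N**2 + U * N**2 * L_u


def proposed_flops(N, U, K_d, K, Q, L_u, C, G_theta, G_r):
    """Sum of the order terms listed for the proposed method in Table II (constants dropped)."""
    N_c = N // C  # antennas per sub-array; C is assumed to divide N
    return (C * U * K_d * N_c**2 + C * K_d * N_c**3 + U * K_d * Q + U * K * N
            + U * N * G_theta * G_r + U * L_u**2 * N + U * L_u * G_theta * G_r)


params = dict(N=200, U=50, K_d=100, K=125, Q=64, L_u=3)
reference = bigabp_flops(**params)
counts = {C: proposed_flops(C=C, G_theta=5, G_r=5, **params) for C in (1, 2, 4, 5, 8, 10)}
```

Under these assumptions the sum for $C=4$ is slightly above the BiGaBP sum while the sum for $C=5$ falls below it, which is in line with the $C>4$ statement; the absolute numbers should be read as order-of-magnitude indications only.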

VII Conclusion

This paper proposed an initial channel estimation algorithm and subsequent JCDE algorithm for multiuser XL-MIMO systems with non-orthogonal pilots. The initial channel estimation is performed by an efficient two-stage compressed sensing algorithm exploiting the polar-domain sparsity. Furthermore, the initial channel estimates are refined by jointly utilizing both non-orthogonal pilots and data via the EP algorithm. To improve channel estimation accuracy, the model-based deterministic approach is integrated into a Bayesian inference framework. In addition, to address the near-field specific correlation in the beam domain, a sub-array-wise LMMSE filter is designed considering the correlation and channel estimation errors for data detection. Computer simulations validated that the proposed method is superior to existing approaches in terms of channel estimation, data detection, and complexity.

References

[1] H. Tataria, M. Shafi, A. F. Molisch, M. Dohler, H. Sjöland, and F. Tufvesson, “6G wireless systems: Vision, requirements, challenges, insights, and opportunities,” Proc. IEEE, vol. 109, no. 7, pp. 1166–1199, 2021.
[2] E. D. Carvalho, A. Ali, A. Amiri, M. Angjelichinoski, and R. W. Heath, “Non-stationarities in extra-large-scale massive MIMO,” IEEE Wirel. Commun., vol. 27, no. 4, pp. 74–80, 2020.
[3] Z. Wang et al., “A tutorial on extremely large-scale MIMO for 6G: Fundamentals, signal processing, and applications,” IEEE Commun. Surveys Tuts., Early Access, 2024.
[4] H. Iimori, T. Takahashi, K. Ishibashi, G. T. F. de Abreu, D. González G., and O. Gonsa, “Joint activity and channel estimation for extra-large MIMO systems,” IEEE Trans. Wirel. Commun., vol. 21, no. 9, pp. 7253–7270, 2022.
[5] M. Cui, Z. Wu, Y. Lu, X. Wei, and L. Dai, “Near-field MIMO communications for 6G: Fundamentals, challenges, potentials, and future directions,” IEEE Commun. Mag., vol. 61, no. 1, pp. 40–46, 2023.
[6] Y. Liu, Z. Wang, J. Xu, C. Ouyang, X. Mu, and R. Schober, “Near-field communications: A tutorial review,” IEEE Open J. Commun. Soc., vol. 4, pp. 1999–2049, 2023.
[7] M. Cui and L. Dai, “Channel estimation for extremely large-scale MIMO: Far-field or near-field?” IEEE Trans. Commun., vol. 70, no. 4, pp. 2663–2677, 2022.
[8] J. Rodríguez-Fernández, N. González-Prelcic, K. Venugopal, and R. W. Heath, “Frequency-domain compressive channel estimation for frequency-selective hybrid millimeter wave MIMO systems,” IEEE Trans. Wirel. Commun., vol. 17, no. 5, pp. 2946–2960, 2018.
[9] C. Hu, L. Dai, T. Mir, Z. Gao, and J. Fang, “Super-resolution channel estimation for mmwave massive mimo with hybrid precoding,” IEEE Trans. Veh. Technol., vol. 67, no. 9, pp. 8954–8958, 2018.
[10] X. Xie, Y. Wu, J. An, D. W. K. Ng, C. Xing, and W. Zhang, “Massive unsourced random access for near-field communications,” IEEE Trans. Commun., pp. 1–1, Early Access 2024.
[11] K. Ito, T. Takahashi, S. Ibi, and S. Sampei, “Bilinear gaussian belief propagation for massive mimo detection with non-orthogonal pilots,” IEEE Trans. Commun., vol. 72, no. 2, pp. 1045–1061, 2024.
[12] K. Ito, T. Takahashi, K. Igarashi, S. Ibi, and S. Sampei, “AoA estimation-aided Bayesian receiver design via bilinear inference for mmWave massive MIMO,” in Proc. IEEE Int. Conf. Commun. (ICC), 2023, pp. 6474–6479.
[13] W. Yan and X. Yuan, “Semi-blind channel-and-signal estimation for uplink massive MIMO with channel sparsity,” IEEE Access, vol. 7, pp. 95 008–95 020, 2019.
[14] L. Chen and X. Yuan, “Blind multiuser detection in massive MIMO channels with clustered sparsity,” IEEE Wirel. Commun. Lett., vol. 8, no. 4, pp. 1052–1055, 2019.
[15] H. Iimori, T. Takahashi, K. Ishibashi, G. T. F. de Abreu, and W. Yu, “Grant-free access via bilinear inference for cell-free MIMO with low-coherence pilots,” IEEE Trans. Wirel. Commun., vol. 20, no. 11, pp. 7694–7710, 2021.
[16] J. T. Parker, P. Schniter, and V. Cevher, “Bilinear generalized approximate message passing―part I: Derivation,” IEEE Trans. Signal Process., vol. 62, no. 22, pp. 5839–5853, 2014.
[17] Y. Kabashima, “A CDMA multiuser detection algorithm on the basis of belief propagation,” J. Phys. A, Math. Gen., vol. 36, no. 43, 2003.
[18] D. Fan et al., “Angle domain channel estimation in hybrid millimeter wave massive MIMO systems,” IEEE Trans. Wirel. Commun., vol. 17, no. 12, pp. 8165–8179, 2018.
[19] Y. Fang, J. Wu, and B. Huang, “2D sparse signal recovery via 2D orthogonal matching pursuit,” Sci. China Inf. Sci., vol. 55, pp. 889–897, 2012.
[20] T. P. Minka, “Expectation propagation for approximate bayesian inference,” Proc. 17th Conf. Uncertainty Artif, pp. 362–369, 2001.
[21] H. Iimori, G. T. F. de Abreu, O. Taghizadeh, R.-A. Stoica, T. Hara, and K. Ishibashi, “Stochastic learning robust beamforming for millimeter-wave systems with path blockage,” IEEE Wirel. Commun. Lett., vol. 9, no. 9, pp. 1557–1561, 2020.
[22] Y. Pati, R. Rezaiifar, and P. Krishnaprasad, “Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition,” in Proc. Asilomar Conf. Signals, Syst., Comput, 1993, pp. 40–44 vol.1.
[23] B. L. Sturm and M. G. Christensen, “Comparison of orthogonal matching pursuit implementations,” in Proc. 20th Eur. Signal Process. Conf. (EUSIPCO), 2012, pp. 220–224.
[24] J. Ma and L. Ping, “Orthogonal AMP,” IEEE Access, vol. 5, pp. 2020–2033, 2017.
[25] S. Rangan, P. Schniter, and A. K. Fletcher, “Vector approximate message passing,” IEEE Trans. Inf. Theory, vol. 65, no. 10, pp. 6664–6684, 2019.
[26] H. Wang, A. Kosasih, C.-K. Wen, S. Jin, and W. Hardjawana, “Expectation propagation detector for extra-large scale massive MIMO,” IEEE Trans. Wirel. Commun., vol. 19, no. 3, pp. 2036–2051, 2020.
[27] A. Mishra, A. Rajoriya, A. K. Jagannatham, and G. Ascheid, “Sparse bayesian learning-based channel estimation in millimeter wave hybrid MIMO systems,” in Proc. IEEE Int. Workshop Sig. Process. Ad. Wirel. Commun. (SPAWC), 2017, pp. 1–5.
[28] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). Berlin, Germany: Springer-Verlag, 2006.
[29] M. E. Tipping, “Sparse Bayesian learning and the relevance vector machine,” J. Mach. Learn. Res., vol. 1, no. 2, pp. 211–244, 2002.
[30] P. Jain, P. Kar et al., “Non-convex optimization for machine learning,” Found. Trends Mach. Learn., vol. 10, no. 3-4, pp. 142–363, 2017.
[31] Q. Zou and H. Yang, “A concise tutorial on approximate message passing,” arXiv:2201.07487, 2022.
[32] R. Tamaki, K. Ito, T. Takahashi, S. Ibi, and S. Sampei, “Suppression of self-noise feedback in GAMP for highly correlated large MIMO detection,” in Proc. IEEE Int. Conf. Commun. (ICC), 2022, pp. 1300–1305.
