A Statistical Characterization of Wireless Channels Conditioned on Side Information

Benedikt Böck, Michael Baur, Nurettin Turan, Dominik Semmler, and Wolfgang Utschick

©This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Abstract

Statistical prior channel knowledge, such as the wide-sense-stationary-uncorrelated-scattering (WSSUS) property, and additional side information can both be used to enhance physical layer applications in wireless communication. Generally, the wireless channel’s strongly fluctuating path phases and WSSUS property characterize the channel by a zero mean and Toeplitz-structured covariance matrices (CMs) in different domains. In this work, we derive a framework to comprehensively categorize side information based on whether conditioning on it preserves or abandons these statistical features. To accomplish this, we combine insights from a generic channel model with the representation of wireless channels as probabilistic graphs. Additionally, we exemplify several applications, ranging from channel modeling to estimation and clustering, which demonstrate how the proposed framework can practically enhance physical layer methods utilizing machine learning (ML).

Index Terms:

Wide-sense-stationary-uncorrelated-scattering, probabilistic graphs, Toeplitz structure, joint communication and sensing, channel modeling.

I Introduction

Statistical knowledge about the first and second moments of the wireless channel between a user equipment (UE) and a base station (BS) is of key importance for improving their communication link. Among other things, the first and second moments of the channel can be used to improve channel estimation [1, Sec. 3.5.2], provide information about channel parameters [2, Sec. 2.6], or can be employed in other applications where instantaneous channel state information (CSI) is not available or costly to acquire, with the advantage of reduced pilot and computational overhead [3]. Hence, structural prior information about the channel’s first and second moments can be beneficial, e.g., to reduce the number of samples required for estimating these moments [4]. A prominent example of such prior information is the WSSUS assumption, which characterizes the wireless channel as a wide-sense stationary (WSS) process in both the temporal and frequency domains and, thus, constrains the channel covariance matrices (CMs) in these domains to be Toeplitz structured [5]. Analogously, the WSSUS assumption can be extended to the spatial domain, leading to a spatial CM with Toeplitz structure [6, Sec. 2.6].

Recently, leveraging additional side information about the wireless channel between the UE and the BS to enhance their communication link has attracted considerable research attention. This side information can be interpretable, e.g., in the form of the UE’s position [7], or it can be given in an abstract representation by some ML-based latent embedding [8]. Depending on the characteristics of the side information, conditioning on it can either preserve or abandon structural features of the channel’s mean and CM. In this work, we aim to establish a comprehensive framework for characterizing any side information based on its influence on the first and second channel moments. Our main contributions are the following:

• We establish a theorem leveraging the statistical relation between arbitrary side information and the complex path loss phases of a channel to describe how the WSSUS and zero-mean channel properties are either preserved or abandoned given this side information.

• By combining this theorem with a probabilistic graph representation of wireless channels, we introduce a framework that allows side information to be comprehensively categorized regarding its effect on the channel’s WSSUS and zero-mean properties.

• We present various exemplary applications of this framework. Specifically, we introduce a validation technique for the proper training of ML-based channel models, regularize channel clustering, and analyze the utility of side information for channel estimation by means of our proposed framework.

II Preliminaries

II-A Channel Model

Over the last decades, several channel models have been proposed following different paradigms and imposing slightly different assumptions [9]. In this work, we consider a generic wideband and time-varying multiple-input-multiple-output (MIMO) baseband channel, which is sampled equidistantly in the time, frequency as well as the spatial domains. This is the case in the typical orthogonal-frequency-division-multiplexing (OFDM) setup with constant subcarrier spacings and constant symbol durations, in which the transmitter and the receiver are both equipped with uniform linear arrays (ULAs). The resulting channel (tensor) is given by

\bm{H}=\sum_{\ell=1}^{L}\sqrt{p_{\ell}}\mathrm{e}^{-\operatorname{j}\beta_{\ell}}\bm{a}_{f}(\tau_{\ell})\otimes\bm{a}_{t}(\nu_{\ell})\otimes\bm{a}_{\operatorname{R}}(\theta^{(\operatorname{R})}_{\ell})\otimes\bm{a}_{\operatorname{T}}(\theta^{(\operatorname{T})}_{\ell})

(1)

characterized by $L$ paths and the channel parameters, i.e., its complex path losses $\sqrt{p_{\ell}}\mathrm{e}^{-\operatorname{j}\beta_{\ell}}$, delays $\tau_{\ell}$, Doppler shifts $\nu_{\ell}$, directions of arrival (DoAs) $\theta^{(\operatorname{R})}_{\ell}$, and directions of departure (DoDs) $\theta^{(\operatorname{T})}_{\ell}$. The vectors $\bm{a}_{(\cdot)}(\cdot)$ denote the steering vectors in the respective domains. The path loss phase shifts $\{\beta_{\ell}\}_{\ell=1}^{L}$ contain polarization effects as well as the center frequency phase shift $2\pi d_{\ell}/\lambda_{c}$ with center wavelength $\lambda_{c}$ and path distance $d_{\ell}$. Our theoretical findings in Section III build on the following model assumptions.

Assumption 1.

The phases $\beta_{\ell}$ are uniformly distributed, i.e.,

\beta_{\ell}\sim\mathcal{U}(0,2\pi)\ \text{for}\ \text{all}\ \ell.

(2)

Assumption 1 holds when the path distances $d_{\ell}$ are not known on the scale of the center wavelength $\lambda_{c}$ and $d_{\ell}\gg\lambda_{c}$ [1, Sec. 2.4.2]. This is the case in typical wireless communication scenarios, which renders Assumption 1 generally reasonable.

Assumption 2.

The phases $\beta_{\ell}$ are statistically independent of all channel parameters as well as across different paths, i.e.,

		$\displaystyle\beta_{\ell}\perp\kappa_{\ell^{\prime}}\ \text{for}\ \text{all}\ \ell,\ell^{\prime}\ \text{and}\ \kappa\in\{p,\tau,\nu,\theta^{(\operatorname{R})},\theta^{(\operatorname{T})}\},$		(3)
		$\displaystyle\beta_{\ell}\perp\beta_{\ell^{\prime}}\ \text{for}\ \text{all}\ \ell\neq\ell^{\prime}.$		(4)

Although, strictly speaking, the phases $\{\beta_{\ell}\}_{\ell=1}^{L}$ depend on the delays $\tau_{\ell}$, they are commonly modeled to satisfy Assumption 2 due to their strong fluctuations and the multitude of different effects they comprise [10, Sec. 7.5]. In addition, within small frequency, time and spatial ranges, the channel parameters and steering vectors are constant over the frequency $f$, the time $t$ and the positions of the transmitting and receiving antennas. As a result, the channel exhibits a zero mean and the WSSUS property, i.e., it is a WSS process across all domains [5]. This implies that $\bm{H}$ in (1) possesses a zero mean and a Toeplitz structured CM in any domain. The steering vectors are given by

	$\displaystyle\bm{a}_{f}(\tau_{\ell})=[1,\mathrm{e}^{-\operatorname{j}2\pi\Delta f\tau_{\ell}},\ldots,\mathrm{e}^{-\operatorname{j}2\pi\Delta f(M_{\mathrm{SC}}-1)\tau_{\ell}}]^{\operatorname{T}},$		(5)
	$\displaystyle\bm{a}_{t}(\nu_{\ell})=[1,\mathrm{e}^{-\operatorname{j}2\pi\Delta T\nu_{\ell}},\ldots,\mathrm{e}^{-\operatorname{j}2\pi\Delta T(M_{\mathrm{SN}}-1)\nu_{\ell}}]^{\operatorname{T}},$		(6)
	$\displaystyle\bm{a}_{\operatorname{R}}(\theta^{(\operatorname{R})}_{\ell})=[1,\mathrm{e}^{-\operatorname{j}\pi\sin(\theta^{(\operatorname{R})}_{\ell})},\ldots,\mathrm{e}^{-\operatorname{j}\pi(M_{\mathrm{R}}-1)\sin(\theta^{(\operatorname{R})}_{\ell})}]^{\operatorname{T}},$		(7)
	$\displaystyle\bm{a}_{\operatorname{T}}(\theta^{(\operatorname{T})}_{\ell})=[1,\mathrm{e}^{-\operatorname{j}\pi\sin(\theta^{(\operatorname{T})}_{\ell})},\ldots,\mathrm{e}^{-\operatorname{j}\pi(M_{\mathrm{T}}-1)\sin(\theta^{(\operatorname{T})}_{\ell})}]^{\operatorname{T}}$		(8)

with subcarrier spacing $\Delta f$ , symbol duration $\Delta T$ , half wavelength antenna spacing, and number of subcarriers $M_{\mathrm{SC}}$ , symbols $M_{\mathrm{SN}}$ , and receive and transmit antennas $M_{\mathrm{R}}$ and $M_{\mathrm{T}}$ , respectively. We do not explicitly specify statistical characteristics for any other channel parameters, as our subsequent findings apply in general. In the remainder of this work, we use $\bm{H}$ and its vectorized version $\text{vec}(\bm{H})$ interchangeably.
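As a minimal sketch of how (1) can be evaluated with the steering vectors (5)-(8), the following pure-Python snippet builds $\text{vec}(\bm{H})$ from a list of path parameters. The function names (`steering`, `kron`, `channel`), the dictionary keys, and the numerical values in the example are assumptions made for illustration only and do not stem from the original work.

```python
import cmath
import math


def steering(phase_step, size):
    """Return [1, e^{-j*phase_step}, ..., e^{-j*(size-1)*phase_step}]."""
    return [cmath.exp(-1j * phase_step * m) for m in range(size)]


def kron(u, v):
    """Kronecker product of two vectors given as lists."""
    return [a * b for a in u for b in v]


def channel(paths, delta_f, delta_t, m_sc, m_sn, m_r, m_t):
    """Evaluate vec(H) from (1). Each path is a dict with the
    (hypothetical) keys p, beta, tau, nu, theta_r, theta_t."""
    h = [0j] * (m_sc * m_sn * m_r * m_t)
    for q in paths:
        a_f = steering(2 * math.pi * delta_f * q["tau"], m_sc)   # (5)
        a_t = steering(2 * math.pi * delta_t * q["nu"], m_sn)    # (6)
        a_r = steering(math.pi * math.sin(q["theta_r"]), m_r)    # (7)
        a_tx = steering(math.pi * math.sin(q["theta_t"]), m_t)   # (8)
        gain = math.sqrt(q["p"]) * cmath.exp(-1j * q["beta"])    # complex path loss
        a = kron(kron(kron(a_f, a_t), a_r), a_tx)
        h = [x + gain * y for x, y in zip(h, a)]
    return h


# Example with two paths; all values are arbitrary illustration choices.
paths = [
    {"p": 0.6, "beta": 1.0, "tau": 1e-7, "nu": 50.0, "theta_r": 0.3, "theta_t": -0.5},
    {"p": 0.4, "beta": 4.0, "tau": 3e-7, "nu": -20.0, "theta_r": -0.9, "theta_t": 0.2},
]
h = channel(paths, delta_f=15e3, delta_t=1e-3, m_sc=4, m_sn=2, m_r=2, m_t=2)
print(len(h))  # 32
```

The Kronecker ordering follows (1), i.e., frequency, time, receive and transmit domain.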

II-B Statistical Independence in Bayesian Networks

A probabilistic graph is a graphical representation of a statistical model, in which random variables/vectors are modeled as nodes and statistical dependencies as edges. The causality between two dependent nodes, if clear, is encoded by a directed edge. If all edges in a probabilistic graph are directed, the graph is called a Bayesian network (BN). An example of a BN is given in Fig. 1 a). One advantage of explicitly representing a statistical model as a BN is the possibility to directly infer conditional dependencies between nodes across the whole graph. To do so, two different configurations have to be distinguished. The triplet $(A,C,B)$ in Fig. 1 a) forms a v-structure since both directed edges point towards $C$ (i.e., $A\rightarrow C\leftarrow B$). We assume that neither $A$ nor $B$ determines $C$ deterministically. Then, the endpoints $A$ and $B$ in a v-structure are said to be d-separated if and only if neither the center node $C$ nor one of its descendants (i.e., $D$ in Fig. 1 a)) is observed [11, Sec. 3.3.1]. If d-separation is sound, it implies statistical independence, which is typically the case and is assumed throughout this work [11, Sec. 3.3.2]. In any other configuration of arrows (e.g., $A\rightarrow C\rightarrow D$), the endpoints $A$ and $D$ are d-separated if and only if the center node $C$ is observed. If the two endpoints in every triplet of adjacent nodes in any trail in the BN between two specific nodes of interest exhibit d-separation, these two nodes are d-separated and, thus, statistically independent.
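The rules above can be turned into a small checker. The following sketch enumerates all trails between two nodes and uses the standard convention that a trail is blocked as soon as one of its triplets is blocked. The function names and the edge lists are illustrative assumptions; the edges for Fig. 1 b) and c) are our reading of the trails described in Section III, not a reproduction of the figure.

```python
def descendants(edges, node):
    """All nodes reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for parent, child in edges:
            if parent == cur and child not in found:
                found.add(child)
                stack.append(child)
    return found


def trails(edges, start, end, path=None):
    """Yield all simple trails (edge directions ignored) from start to end."""
    path = [start] if path is None else path
    if start == end:
        yield path
        return
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            if a == start and b not in path:
                yield from trails(edges, b, end, path + [b])


def d_separated(edges, x, y, observed):
    """True if every trail between x and y is blocked given `observed`."""
    edge_set = set(edges)
    for trail in trails(edges, x, y):
        blocked = False
        for left, mid, right in zip(trail, trail[1:], trail[2:]):
            if (left, mid) in edge_set and (right, mid) in edge_set:
                # v-structure: blocked unless mid or one of its descendants is observed
                active = ({mid} | descendants(edges, mid)) & set(observed)
                blocked = blocked or not active
            else:
                # any other configuration: blocked if the center node is observed
                blocked = blocked or mid in observed
        if not blocked:
            return False
    return True


fig1a = [("A", "C"), ("B", "C"), ("C", "D")]
print(d_separated(fig1a, "A", "B", set()))   # True
print(d_separated(fig1a, "A", "B", {"D"}))   # False
print(d_separated(fig1a, "A", "D", {"C"}))   # True

# Assumed edge sets for the sensing/modeling and direct inference setups.
fig1b = [("z", "H"), ("z", "Xi"), ("Xi", "H"), ("beta", "H")]
fig1c = [("beta", "H"), ("Xi", "H"), ("H", "z")]
print(d_separated(fig1b, "beta", "z", {"z", "Xi"}))   # True
print(d_separated(fig1c, "beta", "Xi", {"z", "Xi"}))  # False
```

Exhaustive trail enumeration is only practical for small graphs such as those considered here.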

Figure 1: a) Exemplary Bayesian network, b) the sensing and modeling setup and c) the direct inference setup.

III Main Result

Our characterization of side information $\bm{z}$ is based on the following observation. Whether conditioning on $\bm{z}$ preserves the zero channel mean and the Toeplitz structured channel CMs depends solely on the impact of $\bm{z}$ on $\bm{\beta}=\{\beta_{\ell}\}_{\ell=1}^{L}$. Formally, this observation is presented in Theorem 1.

Theorem 1.

Let $\bm{H}$ be defined according to (1) with Assumptions 1 and 2, and steering vectors (5)-(8). Let $\bm{z}$ be any side information about $\bm{H}$. Moreover, let $\bm{\Xi}$ contain the channel parameters $\{p_{\ell},\tau_{\ell},\nu_{\ell},\theta_{\ell}^{(\operatorname{R})},\theta_{\ell}^{(\operatorname{T})}\}_{\ell=1}^{L}$. Then, if

\beta_{\ell}|(\bm{\Xi},\bm{z})\sim\mathcal{U}([0,2\pi])\ \text{for}\ \text{all}\ \ell=1,\ldots,L

(9)

holds true, it implies

\operatorname{\mathbb{E}}[\bm{H}|\bm{z}]=\bm{0},\quad\operatorname{\mathbb{E}}[\bm{H}\bm{H}^{\operatorname{H}}|\bm{z}]\in\bar{\mathcal{C}}_{\mathcal{T}}

(10)

for any arbitrary distribution $p(\bm{\Xi})$ and with $\bar{\mathcal{C}}_{\mathcal{T}}=\mathcal{C}^{(M_{\mathrm{SC}})}_{\mathcal{T}}\otimes\mathcal{C}^{(M_{\mathrm{SN}})}_{\mathcal{T}}\otimes\mathcal{C}^{(M_{\mathrm{R}})}_{\mathcal{T}}\otimes\mathcal{C}^{(M_{\mathrm{T}})}_{\mathcal{T}}$, where $\mathcal{C}^{(F)}_{\mathcal{T}}$ denotes the set of $F\times F$ Toeplitz structured CMs.

Proof.

See Appendix -A. ∎

Theorem 1 states that if conditioning on $\bm{z}$ does not affect the statistical characteristics of $\beta_{\ell}$, then the channel’s WSSUS and zero-mean properties are preserved, independent of the statistical characteristics of the other channel parameters $\bm{\Xi}$ and any potential effects of $\bm{z}$ on these parameters. The cases where (9) is either true or false can be divided into just two distinct setups by means of a BN representation, which leads to a comprehensive characterization of side information according to its effect on $\beta_{\ell}$ and, thus, on the channel’s WSSUS and zero-mean properties.
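The statement of Theorem 1 can be checked numerically in a toy setting. The sketch below considers only the spatial receiver domain with steering vector (7), holds the remaining parameters fixed, and averages over the phases on an equispaced grid, which serves as an exact quadrature surrogate for independent uniform phases. The function names, the grid size, and the parameter values are assumptions chosen for illustration.

```python
import cmath
import itertools
import math


def steering(theta, size):
    """Receive steering vector (7) for half-wavelength spacing."""
    return [cmath.exp(-1j * math.pi * m * math.sin(theta)) for m in range(size)]


def conditional_moments(powers, thetas, size, grid=8):
    """Mean and covariance of H over phases on an equispaced grid,
    with the remaining parameters (powers, thetas) held fixed."""
    phases = [2 * math.pi * k / grid for k in range(grid)]
    combos = list(itertools.product(phases, repeat=len(powers)))
    mean = [0j] * size
    cov = [[0j] * size for _ in range(size)]
    for betas in combos:
        h = [0j] * size
        for p, theta, beta in zip(powers, thetas, betas):
            gain = math.sqrt(p) * cmath.exp(-1j * beta)
            h = [x + gain * a for x, a in zip(h, steering(theta, size))]
        mean = [m + x / len(combos) for m, x in zip(mean, h)]
        for i in range(size):
            for j in range(size):
                cov[i][j] += h[i] * h[j].conjugate() / len(combos)
    return mean, cov


def toeplitz_deviation(mat):
    """Largest deviation of any entry from the first entry on its diagonal."""
    n = len(mat)
    return max(abs(mat[i][j] - mat[max(i - j, 0)][max(j - i, 0)])
               for i in range(n) for j in range(n))


# Uniformly spread phases: zero mean and Toeplitz covariance (up to rounding).
mean, cov = conditional_moments([0.7, 0.3], [0.4, -1.1], size=4)
print(max(abs(m) for m in mean) < 1e-12, toeplitz_deviation(cov) < 1e-12)

# Known phases (grid of one point, beta = 0): both properties are lost.
mean_fix, cov_fix = conditional_moments([0.7, 0.3], [0.4, -1.1], size=4, grid=1)
print(max(abs(m) for m in mean_fix) > 0.1, toeplitz_deviation(cov_fix) > 0.1)
```

The second call mimics side information that fully determines the phases, in which case (9) does not hold.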

III-A Sensing and Modeling

One setup is illustrated by the BN given in Fig. 1 b). It describes the situation in which the side information $\bm{z}$ is used to infer the channel parameters $\bm{\Xi}$ and/or the channel $\bm{H}$ without being directly observed through $\bm{H}$ itself. Since we condition on $(\bm{\Xi},\bm{z})$ in (9), these variables are considered to be observed and marked gray. A multitude of different situations fall into this case, ranging from the coexistence design of joint communication and sensing with separate resources for communication and sensing functions [12] to classical and modern ML-based channel models based on, e.g., variational autoencoders (VAEs) (cf. Section IV-A). The only two trails in Fig. 1 b) between the observed $\bm{z}$ and $\bm{\beta}=\{\beta_{\ell}\}_{\ell=1}^{L}$ are $\bm{z}\rightarrow\bm{H}\leftarrow\bm{\beta}$ and $\bm{z}\rightarrow\bm{\Xi}\rightarrow\bm{H}\leftarrow\bm{\beta}$. Moreover, the only trails between the observed $\bm{\Xi}$ and $\bm{\beta}$ are given by $\bm{\Xi}\rightarrow\bm{H}\leftarrow\bm{\beta}$ and $\bm{\Xi}\leftarrow\bm{z}\rightarrow\bm{H}\leftarrow\bm{\beta}$. By applying the rules from Section II-B and assuming that $\bm{z}$ does not determine $\bm{H}$ deterministically, we see that all trails contain a v-structure with non-observed center node $\bm{H}$ and no observed descendants. We conclude that $\bm{z}$ and $\bm{\beta}$ as well as $\bm{\Xi}$ and $\bm{\beta}$ are statistically independent. Hence, $p(\beta_{\ell}|\bm{\Xi},\bm{z})$ equals $p(\beta_{\ell})$ for all $\ell=1,\ldots,L$ and due to (2), (9) holds true in general.

III-B Direct Inference

The other setup describes the situation in which the side information $\bm{z}$ is a direct observation of the channel $\bm{H}$ itself. Arguably the most important example of this setup is channel estimation, where $\bm{z}$ represents a noisy observation of the channel $\bm{H}$. The corresponding BN is given in Fig. 1 c). Since the center node $\bm{H}$ in $\bm{\beta}\rightarrow\bm{H}\rightarrow\bm{z}$ is not observed, $\bm{\beta}$ and $\bm{z}$ are not independent. Moreover, since the descendant $\bm{z}$ of the center node $\bm{H}$ in $\bm{\beta}\rightarrow\bm{H}\leftarrow\bm{\Xi}$ is observed, $\bm{\beta}$ and $\bm{\Xi}$ are also not independent. Thus, in this setup, both observed variables $(\bm{\Xi},\bm{z})$ potentially influence the statistics of $\bm{\beta}$ and we cannot claim (9) to hold true in general.
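To illustrate why (9) can fail in this setup, consider a toy example that is not taken from the original work: a scalar single-path channel $h=\mathrm{e}^{-\operatorname{j}\beta}$ with unit path loss and a noisy observation $y=h+n$ with circularly symmetric complex Gaussian noise $n$. The sketch below evaluates the posterior of $\beta$ given $y$ on a grid. The function names, the noise variance, and the grid size are assumptions for illustration.

```python
import cmath
import math


def phase_posterior(y, noise_var, grid=360):
    """Posterior of beta on a grid for y = exp(-j*beta) + n,
    n ~ CN(0, noise_var), with a uniform prior on beta."""
    betas = [2 * math.pi * k / grid for k in range(grid)]
    weights = [math.exp(-abs(y - cmath.exp(-1j * b)) ** 2 / noise_var)
               for b in betas]
    total = sum(weights)
    return betas, [w / total for w in weights]


def posterior_mean_channel(y, noise_var, grid=360):
    """Conditional mean E[exp(-j*beta) | y] under the grid posterior."""
    betas, probs = phase_posterior(y, noise_var, grid)
    return sum(p * cmath.exp(-1j * b) for b, p in zip(betas, probs))


# An observation close to h = 1 concentrates the posterior around beta = 0,
# so the phase is no longer uniform and the conditional mean is nonzero.
_, probs = phase_posterior(1 + 0j, noise_var=0.5)
print(max(probs) > 2 * min(probs))                           # True
print(abs(posterior_mean_channel(1 + 0j, noise_var=0.5)) > 0.5)  # True
```

In this toy model, the posterior is proportional to $\exp(2\,\mathrm{Re}\{y\,\mathrm{e}^{\operatorname{j}\beta}\}/\sigma^{2})$, i.e., it is flat only for $y=0$.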

IV Applications

Section III provides the means to categorize any side information based on the way it is acquired and related to the channel parameters. In this section, we discuss how this characterization can be utilized to enhance or to analyze applications for wireless communication.

IV-A Channel Modeling

One possible application is given in the context of modern ML-based channel modeling, where generative models, e.g., generative adversarial networks (GANs) [13] or VAEs [14], are used to capture the underlying channel distribution from a training set of channel realizations. In this context, VAEs aim to encode the channel-specific statistical features in a latent variable $\bm{z}$ by learning two distributions $q_{\bm{\phi}}(\bm{z}|\bm{H})$ and $p_{\bm{\theta}}(\bm{H}|\bm{z})$, where $\bm{H}$ is the channel (cf. (1)) and $(\bm{\phi},\bm{\theta})$ are neural network (NN) parameters. Typically, $p_{\bm{\theta}}(\bm{H}|\bm{z})$ and $q_{\bm{\phi}}(\bm{z}|\bm{H})$ are modeled as conditionally Gaussian, i.e., $p_{\bm{\theta}}(\bm{H}|\bm{z})=\mathcal{N}_{\mathbb{C}}(\bm{H};\bm{\mu}_{\bm{\theta}}(\bm{z}),\bm{C}_{\bm{\theta}}(\bm{z}))$ and $q_{\bm{\phi}}(\bm{z}|\bm{H})=\mathcal{N}(\bm{z};\bm{\mu}_{\bm{\phi}}(\bm{H}),\text{diag}(\bm{\sigma}_{\bm{\phi}}(\bm{H})^{2}))$. The VAE’s training is based on forwarding training channels $\bm{H}_{\text{train}}$ through an encoder-decoder processing chain yielding $(\bm{\mu}_{\bm{\phi}}(\bm{H}_{\text{train}}),\bm{\sigma}_{\bm{\phi}}(\bm{H}_{\text{train}}))$ as well as $(\bm{\mu}_{\bm{\theta}}(\tilde{\bm{z}}),\bm{C}_{\bm{\theta}}(\tilde{\bm{z}}))$ with $\tilde{\bm{z}}\sim q_{\bm{\phi}}(\bm{z}|\bm{H}_{\text{train}})$ (we refer to [14] for more details about VAEs for wireless communication). Due to the strong fluctuations of $\beta_{\ell}$ (cf. Section II-A), these phases cannot contain distinct statistical channel characteristics, rendering it unlikely that the VAE stores $\beta_{\ell}$-specific information in $\bm{z}$. Consequently, the BN in Fig. 1 b) applies, (9) holds, and $\bm{\mu}_{\bm{\theta}}(\bm{z})$ and $\bm{C}_{\bm{\theta}}(\bm{z})$ are expected to be learned to be zero and Toeplitz in any domain, respectively (cf. Section III-A). Thus, we can utilize Theorem 1 as a tool to verify the VAE’s correct training.
In Fig. 2, the behaviour of the parameterized conditional mean and CM during the VAE’s training is shown. As training set we used $50\,000$ narrowband and static channels generated by the geometry-based stochastic channel model QuaDRiGa [15], which are randomly sampled in a $120\,^{\circ}$ sector of the “3GPP_38.901_UMa_NLOS” scenario. The BS and each user are equipped with $8$ antennas and $1$ antenna, respectively, which results in $\bm{H}$ exhibiting solely the spatial receiver domain. We allowed $\bm{C}_{\bm{\theta}}(\bm{z})$ to take any unstructured CM during training by outputting its arbitrarily learnable Cholesky factor $\bm{L}_{\bm{\theta}}(\bm{z})$ with $\bm{C}_{\bm{\theta}}(\bm{z})=\bm{L}_{\bm{\theta}}(\bm{z})\bm{L}_{\bm{\theta}}(\bm{z})^{\operatorname{H}}$. Similarly, we also allowed $\bm{\mu}_{\bm{\theta}}(\bm{z})$ to take any value. In Fig. 2 a), the normalized mean squared error (NMSE) $\mathrm{nMSE}=1/500\sum_{n=1}^{500}(\|\bm{C}_{\bm{\theta}}(\bm{z}_{n})-\bm{C}^{(\operatorname{P})}_{\bm{\theta}}(\bm{z}_{n})\|_{\operatorname{F}}^{2})/\|\bm{C}_{\bm{\theta}}(\bm{z}_{n})\|_{\operatorname{F}}^{2}$ of the output covariance matrices $\bm{C}_{\bm{\theta}}(\bm{z}_{n})$ to their orthogonal projection $\bm{C}^{(\operatorname{P})}_{\bm{\theta}}(\bm{z}_{n})$ on Toeplitz matrices (cf. [16]) over the training iterations is shown. Additionally, the mean squared error (MSE) $\mathrm{MSE}=1/500\sum_{n=1}^{500}\|\bm{\mu}_{\bm{\theta}}(\bm{z}_{n})-\bm{0}\|_{2}^{2}$ of the output means $\bm{\mu}_{\bm{\theta}}(\bm{z}_{n})$ to zero is given. Both are generated from randomly sampled latent realizations $\{\bm{z}_{n}\}_{n=1}^{500}$ after each iteration. It can be seen that both measures decrease significantly, approaching zero. Remarkably, Fig. 2 a) shows that the VAE, although not explicitly constrained, is trained to solely output Toeplitz structured CMs and zero means.
This behaviour indicates correct training in the sense that $\bm{z}$ captures distribution relevant features and encodes no information about $\bm{\beta}$ . Additionally, in Fig. 2 b), the convergence of $\bm{C}_{\bm{\theta}}(\bm{z})$ towards Toeplitz structured matrices is illustrated by the real and imaginary part of one exemplary output CM, where the conditional mean $\bm{\mu}_{\bm{\phi}}(\bm{H}_{\text{val}})$ of $q_{\bm{\phi}}(\bm{z}|\bm{H}_{\text{val}})$ for a fixed validation channel $\bm{H}_{\text{val}}$ is used to generate the plotted $\bm{C}_{\bm{\theta}}(\bm{z}=\bm{\mu}_{\bm{\phi}}(\bm{H}_{\text{val}}))$ . These findings also theoretically underpin results from [14] and [17], where it is observed that constraining $\bm{C}_{\bm{\theta}}(\bm{z})$ to be Toeplitz (or circulant as Toeplitz approximation) or $\bm{\mu}_{\bm{\theta}}(\bm{z})$ to be zero results in VAEs with strong channel estimation performance.
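The Toeplitz metric described above can be sketched as follows. The snippet assumes that the orthogonal projection is taken onto the linear subspace of Toeplitz matrices with respect to the Frobenius norm, which amounts to averaging each diagonal; whether the original evaluation additionally enforces positive semidefiniteness (as in the alternating projections of [16]) is not stated, and this sketch does not enforce it. The function names are hypothetical.

```python
def toeplitz_projection(mat):
    """Frobenius-orthogonal projection onto Toeplitz matrices:
    replace every entry by the average of its diagonal."""
    n = len(mat)
    sums, counts = {}, {}
    for i in range(n):
        for j in range(n):
            sums[j - i] = sums.get(j - i, 0) + mat[i][j]
            counts[j - i] = counts.get(j - i, 0) + 1
    return [[sums[j - i] / counts[j - i] for j in range(n)] for i in range(n)]


def frobenius_sq(mat):
    """Squared Frobenius norm of a matrix given as a list of rows."""
    return sum(abs(x) ** 2 for row in mat for x in row)


def toeplitz_nmse(mats):
    """Average of ||C - C_P||_F^2 / ||C||_F^2 over a list of matrices."""
    total = 0.0
    for c in mats:
        proj = toeplitz_projection(c)
        diff = [[a - b for a, b in zip(r, s)] for r, s in zip(c, proj)]
        total += frobenius_sq(diff) / frobenius_sq(c)
    return total / len(mats)


print(toeplitz_nmse([[[2, 1j], [-1j, 2]]]))  # 0.0 (already Toeplitz)
print(toeplitz_nmse([[[1, 0], [0, 3]]]))     # 0.2
```

A value close to zero indicates that the evaluated matrices are close to Toeplitz structured.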

IV-B Channel Clustering

In the previous section, we utilized Theorem 1 to verify correct training and to interpret the information encoded in the VAE’s latent embedding. In this section, we actively build on Theorem 1 to directly regularize a clustering algorithm towards the desired outcome. We consider a situation in which we aim to cluster time-varying channel trajectories according to the users’ velocities without having access to any explicit velocity information. In this example, $\bm{z}$ represents the discrete and finite clusters. If $\bm{z}$ solely encodes the velocities and, thus, information about $\bm{\Xi}$, the BN in Fig. 1 b) applies, and (9) holds (cf. Section III-A). Consequently, we know in advance that $\operatorname{\mathbb{E}}[\bm{H}\bm{H}^{\operatorname{H}}|\bm{z}=\text{cluster}\ i]$ is Toeplitz in any domain and that the mean $\operatorname{\mathbb{E}}[\bm{H}|\bm{z}=\text{cluster}\ i]$ is zero for all $i$ (cf. Theorem 1) and, thus, that any mean-based clustering algorithm (e.g., k-means) is sub-optimal. On the other hand, while Gaussian mixture models (GMMs) can also be used as generative models as in Section IV-A, GMM-based clustering allows the insights of Theorem 1 to be incorporated. Specifically, GMMs assign an unconstrained Gaussian distribution to every cluster individually. However, it is also possible to regularize their means and CMs. In the following, we utilize the GMM covariance Toeplitz parameterization and clustering from [18], in which the cluster indices are used for CSI feedback (we refer to [19] for more details about GMMs for wireless communication). However, going beyond [18], we also enforce the GMM means to be zero, such that every cluster is regularized to have a zero mean and Toeplitz structured CM. We trained this regularized GMM with $80\,000$ single-input-single-output (SISO) narrowband time-varying channels, which are sampled $16$ times every $0.5$ ms in a $120\,^{\circ}$ sector of the “3GPP_38.901_UMa_NLOS” scenario of QuaDRiGa.
This results in $\bm{H}$ (cf. (1)) exhibiting solely the temporal domain. Each user’s velocity is randomly drawn from the density $p(v)$ illustrated in Fig. 3 a). This density exhibits four distinct velocity regions (regions 1-4), which allows a ground truth velocity clustering of a test set of $8000$ channel trajectories, where each trajectory is labeled with the corresponding user’s velocity region. After training the regularized GMM, which receives no direct velocity information during either training or testing, we cluster the test set using the GMM. For evaluation, we then compare the ground truth velocity clustering represented by the random variable $C_{v}$ with the GMM clustering represented by $C_{g}$ by means of their mutual information $I(C_{v},C_{g})$ (cf. [20]). The same is done for k-means clustering $C_{k}$ and is illustrated for different k-means and GMM cluster numbers $K$ in Fig. 3 b). It can be seen that the GMM clustering generally shows higher mutual information with the ground truth velocity clustering than k-means. Remarkably, although the GMM does not get any explicit velocity information, its mutual information reaches the entropy $H_{v}$ of the ground truth velocity clustering for $K=16$ and larger. This implies that in this regime, each GMM cluster is associated with exactly one of the velocity regions 1-4 in Fig. 3 a), i.e., the regularized GMM yields perfect velocity clustering. In the same $K$-regime, the mutual information of k-means shows a significant gap to $H_{v}$.
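The evaluation metric can be sketched as follows: the empirical mutual information between two clusterings and the entropy of the ground truth labeling. The snippet uses base-2 logarithms, which is an assumption since the unit is not specified above, and the label lists are made-up toy data rather than results from the experiment.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (in bits) of the empirical label distribution."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(labels_a, labels_b):
    """Empirical mutual information (in bits) between two clusterings."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    return sum(c / n * math.log2(c * n / (count_a[a] * count_b[b]))
               for (a, b), c in joint.items())


truth = [1, 1, 2, 2, 3, 3, 4, 4]      # toy velocity regions
fine = [0, 1, 2, 3, 4, 5, 6, 7]       # every cluster lies in exactly one region
coarse = [0, 0, 0, 0, 0, 0, 0, 0]     # a single cluster carries no information

print(entropy(truth))                     # 2.0
print(mutual_information(truth, fine))    # 2.0
print(mutual_information(truth, coarse))  # 0.0
```

The mutual information equals the ground truth entropy exactly when every cluster is contained in a single ground truth region, which is the criterion used above to identify perfect velocity clustering.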

	$\displaystyle\operatorname{\mathbb{E}}_{\bm{H}|\bm{z}}[\bm{H}]=$	$\displaystyle\operatorname{\mathbb{E}}_{(\bm{\beta},\bm{\Xi})|\bm{z}}[\bm{H}]=\operatorname{\mathbb{E}}_{\bm{\Xi}|\bm{z}}\left[\sum_{\ell=1}^{L}\sqrt{p_{\ell}}\operatorname{\mathbb{E}}_{\bm{\beta}|(\bm{z},\bm{\Xi})}\left[\mathrm{e}^{-\operatorname{j}\beta_{\ell}}\right]\bm{a}_{f,\ell}\otimes\bm{a}_{t,\ell}\otimes\bm{a}_{\operatorname{R},\ell}\otimes\bm{a}_{\operatorname{T},\ell}\right]$		(11)
	$\displaystyle\operatorname{\mathbb{E}}_{\bm{H}|\bm{z}}[\bm{H}\bm{H}^{\operatorname{H}}]=$	$\displaystyle\operatorname{\mathbb{E}}_{\bm{\Xi}|\bm{z}}\left[\sum_{\ell,\ell^{\prime}=1}^{L}\sqrt{p_{\ell}}\sqrt{p_{\ell^{\prime}}}\operatorname{\mathbb{E}}_{\bm{\beta}|(\bm{z},\bm{\Xi})}\left[\mathrm{e}^{-\operatorname{j}\beta_{\ell}}\mathrm{e}^{\operatorname{j}\beta_{\ell^{\prime}}}\right]\bm{a}_{f,\ell}\bm{a}_{f,\ell^{\prime}}^{\operatorname{H}}\otimes\bm{a}_{t,\ell}\bm{a}_{t,\ell^{\prime}}^{\operatorname{H}}\otimes\bm{a}_{\operatorname{R},\ell}\bm{a}_{\operatorname{R},\ell^{\prime}}^{\operatorname{H}}\otimes\bm{a}_{\operatorname{T},\ell}\bm{a}_{\operatorname{T},\ell^{\prime}}^{\operatorname{H}}\right]$		(12)

IV-C Channel Estimation

A further application of Section III is given in the context of channel estimation. Theorem 1 characterizes the conditional mean $\operatorname{\mathbb{E}}[\bm{H}|\bm{z}]$ of the channel $\bm{H}$ given some side information $\bm{z}$. It is well known that this conditional mean represents the minimum mean squared error (MMSE) channel estimator based on $\bm{z}$. Thus, Theorem 1 in combination with the setups in Fig. 1 establishes a framework to analyze which kind of information can be fundamentally utilized for estimating the channel. More precisely, if $\bm{z}$ is not statistically relevant for $\beta_{\ell}$, then the best channel estimate given $\bm{z}$ is the zero vector (cf. Theorem 1). Consequently, $\bm{z}$ has to contain a descendant of $\bm{H}$, i.e., a pilot observation for channel estimation (cf. Fig. 1 c)). Additionally, Section III also theoretically underpins that as long as $\bm{z}$ contains some direct observation of $\bm{H}$, any additional side information about $\bm{\Xi}$ can improve the estimation. This is due to $\beta_{\ell}$ being able to statistically depend on $\bm{\Xi}$ if and only if $\bm{H}$ or one of its descendants (i.e., a direct observation) is given (cf. Section II-B). For evaluating these insights, we trained several fully-connected NNs for channel estimation in an end-to-end fashion. Specifically, we generated $10^{5}$ channels using (1) containing solely the spatial receiver domain of dimension $16$. All these realizations contain three paths, $\beta_{\ell}$ and $\theta_{\ell}^{(\operatorname{R})}$ are uniformly distributed, and the path losses $\sqrt{p_{\ell}}$ are uniformly distributed between zero and one and then normalized to sum up to one. In Fig. 4, the estimation performance of three different NN-based estimators, denoted by sensing, pilot and joint, is shown in terms of their $\mathrm{nMSE}$ with respect to the ground truth channel over the signal-to-noise ratio (SNR). For evaluation, we used a test set of $5000$ channels.
All NNs are trained by minimizing the MSE between their output and the true channel, and their NN depth and width were optimized by a random search using a validation set of size $5000$. As input for the sensing NN, we took the ground truth steering vectors (cf. (7)) of the corresponding channel’s three paths. However, since these have no statistical relevance for $\beta_{\ell}$, (9) holds, the MMSE estimator is given by the zero vector and the sensing network does not outperform the zero vector. On the other hand, the pilot network takes a noisy channel observation as input. Thus, a descendant of $\bm{H}$ is observed, Fig. 1 c) applies and the estimator outperforms the zero vector (cf. Section III-B). The joint network takes both the ground truth steering vectors and the noisy channel observation as input. Since a descendant of $\bm{H}$ (i.e., the noisy channel observation) is given, $\bm{\beta}$ and $\bm{\Xi}$ can statistically depend on each other (cf. Section III-B), and additional information about $\bm{\Xi}$ can improve the estimation, which is seen in Fig. 4 in the form of joint outperforming pilot.

V Conclusion

In this work, we introduced a comprehensive framework that combines insights from a generic channel model with BNs to categorize side information according to its effect on the channel’s WSSUS and zero-mean properties. This framework can be utilized in various ways. We demonstrated how it can be leveraged to analyze side information for channel generation or estimation, and to directly regularize channel clustering. While we discussed three particular applications, this analysis indicates that many more exist, which are part of future work.

-A Proof of Theorem 1

To prove Theorem 1, we reformulate the conditional mean of $\bm{H}$ given $\bm{z}$ according to (11), where we insert the definition (1) of $\bm{H}$. For simplicity, we also leave out the arguments of the steering vectors. The inner expectation $\operatorname{\mathbb{E}}_{\bm{\beta}|(\bm{z},\bm{\Xi})}[\mathrm{e}^{-\operatorname{j}\beta_{\ell}}]$ equals zero according to the assumption $\beta_{\ell}|(\bm{\Xi},\bm{z})\sim\mathcal{U}([0,2\pi])$ in Theorem 1. Analogously, the conditional CM $\operatorname{\mathbb{E}}[\bm{H}\bm{H}^{\operatorname{H}}|\bm{z}]$ can be decomposed according to (12). Considering assumption (4) and the above reasoning, the inner expectation $\operatorname{\mathbb{E}}_{\bm{\beta}|(\bm{z},\bm{\Xi})}[\mathrm{e}^{-\operatorname{j}\beta_{\ell}}\mathrm{e}^{\operatorname{j}\beta_{\ell^{\prime}}}]$ equals zero for $\ell\neq\ell^{\prime}$ and one otherwise. Since $\bm{a}_{(\cdot)}(\cdot)\bm{a}_{(\cdot)}(\cdot)^{\operatorname{H}}$ is a Toeplitz structured matrix for any domain and any channel parameter configuration (cf. (5)-(8)), it is Toeplitz in expectation, which concludes the proof.
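For concreteness, the final step can be written out as follows. This is a sketch that only restates (12) after inserting the values of the inner expectation; the notation $[\cdot]_{m,n}$ for the $(m,n)$th matrix entry is introduced here for illustration.

```latex
\operatorname{\mathbb{E}}[\bm{H}\bm{H}^{\operatorname{H}}|\bm{z}]
  = \operatorname{\mathbb{E}}_{\bm{\Xi}|\bm{z}}\left[\sum_{\ell=1}^{L} p_{\ell}\,
    \bm{a}_{f,\ell}\bm{a}_{f,\ell}^{\operatorname{H}}\otimes
    \bm{a}_{t,\ell}\bm{a}_{t,\ell}^{\operatorname{H}}\otimes
    \bm{a}_{\operatorname{R},\ell}\bm{a}_{\operatorname{R},\ell}^{\operatorname{H}}\otimes
    \bm{a}_{\operatorname{T},\ell}\bm{a}_{\operatorname{T},\ell}^{\operatorname{H}}\right],
\qquad
[\bm{a}_{f,\ell}\bm{a}_{f,\ell}^{\operatorname{H}}]_{m,n}
  = \mathrm{e}^{-\operatorname{j}2\pi\Delta f(m-n)\tau_{\ell}}.
```

The entry on the right follows from (5) and depends on $m$ and $n$ only through $m-n$, which is the Toeplitz property; the same argument applies to (6)-(8).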

References

[1] D. Tse and P. Viswanath, Fundamentals of wireless communication. USA: Cambridge University Press, 2005.
[2] E. Björnson, J. Hoydis, and L. Sanguinetti, “Massive MIMO networks: Spectral, energy, and hardware efficiency,” Foundations and Trends® in Signal Processing, vol. 11, no. 3-4, pp. 154–655, 2017.
[3] E. Vagenas, G. S. Paschos, and S. A. Kotsopoulos, “Beamforming capacity optimization for MISO systems with both mean and covariance feedback,” IEEE Trans. Wireless Commun., vol. 10, no. 9, pp. 2994–3001, 2011.
[4] B. Böck, D. Semmler, B. Fesl, M. Baur, and W. Utschick, “Gohberg-Semencul estimation of Toeplitz structured covariance matrices and their inverses,” 2023, arXiv:2311.14995.
[5] P. Bello, “Characterization of randomly time-variant linear channels,” IEEE Trans. on Commun. Syst., vol. 11, no. 4, pp. 360–393, 1963.
[6] X. Yin and X. Cheng, Propagation Channel Characterization, Parameter Estimation, and Modeling for Wireless Communication. Wiley-IEEE Press, November 2016.
[7] J. A. Zhang, F. Liu, C. Masouros, R. W. Heath, Z. Feng, L. Zheng, and A. Petropulu, “An overview of signal processing techniques for joint communication and radar sensing,” IEEE J. Sel. Topics Signal Process., vol. 15, no. 6, pp. 1295–1315, 2021.
[8] C. Studer, S. Medjkouh, E. Gonultaş, T. Goldstein, and O. Tirkkonen, “Channel charting: Locating users within the radio environment using channel state information,” IEEE Access, vol. 6, pp. 47 682–47 698, 2018.
[9] P. Almers, E. Bonek, A. G. Burr, N. Czink, M. Debbah, V. Degli‐Esposti, H. Hofstetter, P. Kyösti, D. I. Laurenson, G. Matz, A. F. Molisch, C. Oestges, and H. Ozcelik, “Survey of channel and radio propagation models for wireless MIMO systems,” EURASIP J. Wireless Commun. Netw., vol. 2007, pp. 1–19, 2007.
[10] 3GPP, “Study on channel model for frequencies from 0.5 to 100 GHz (release 16),” 3rd Generation Partnership Project (3GPP), Tech. Rep. TR 38.901 version 16.1.0 Release 16, 2020.
[11] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. MIT Press, Cambridge, 2009.
[12] Z. Feng, Z. Wei, X. Chen, H. Yang, Q. Zhang, and P. Zhang, “Joint communication, sensing, and computation enabled 6G intelligent machine system,” IEEE Network, vol. 35, no. 6, pp. 34–42, 2021.
[13] Y. Yang, Y. Li, W. Zhang, F. Qin, P. Zhu, and C.-X. Wang, “Generative-adversarial-network-based wireless channel modeling: Challenges and opportunities,” IEEE Commun. Mag., vol. 57, no. 3, pp. 22–27, 2019.
[14] M. Baur, B. Fesl, and W. Utschick, “Leveraging variational autoencoders for parameterized MMSE estimation,” 2024, arXiv:2307.05352.
[15] S. Jaeckel, L. Raschkowski, K. Börner, and L. Thiele, “QuaDRiGa: A 3-D multi-cell channel model with time evolution for enabling virtual field trials,” IEEE Trans. Antennas Propag., vol. 62, no. 6, pp. 3242–3256, 2014.
[16] K. Grigoriadis, A. Frazho, and R. Skelton, “Application of alternating convex projection methods for computation of positive Toeplitz matrices,” IEEE Trans. Signal Process., vol. 42, no. 7, pp. 1873–1875, 1994.
[17] B. Fesl, N. Turan, B. Böck, and W. Utschick, “Channel estimation for quantized systems based on conditionally Gaussian latent models,” IEEE Trans. Signal Process., pp. 1–16, 2024.
[18] N. Turan, B. Fesl, and W. Utschick, “Enhanced low-complexity FDD system feedback with variable bit lengths via generative modeling,” in 2023 57th Asilomar Conf. Signals, Syst., Comput., 2023, pp. 363–369.
[19] N. Turan, B. Fesl, M. Koller, M. Joham, and W. Utschick, “A versatile low-complexity feedback scheme for FDD systems via generative modeling,” IEEE Trans. Wireless Commun., pp. 1–1, 2023.
[20] N. X. Vinh, J. Epps, and J. Bailey, “Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance,” J. Mach. Learn. Res., vol. 11, pp. 2837–2854, Dec. 2010.
