A Statistical Characterization of Wireless Channels Conditioned on Side Information

Benedikt Böck, Michael Baur, Nurettin Turan, Dominik Semmler, and Wolfgang Utschick
©This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
Abstract

Statistical prior channel knowledge, such as the wide-sense-stationary-uncorrelated-scattering (WSSUS) property, and additional side information can both be used to enhance physical layer applications in wireless communication. Generally, the wireless channel’s strongly fluctuating path phases and WSSUS property characterize the channel by a zero mean and Toeplitz-structured covariance matrices in different domains. In this work, we derive a framework to comprehensively categorize side information based on whether conditioning on it preserves or abandons these statistical features. To accomplish this, we combine insights from a generic channel model with the representation of wireless channels as probabilistic graphs. Additionally, we exemplify several applications, ranging from channel modeling to estimation and clustering, which demonstrate how the proposed framework can practically enhance physical layer methods utilizing machine learning (ML).

Index Terms:
Wide-sense-stationary-uncorrelated-scattering, probabilistic graphs, Toeplitz structure, joint communication and sensing, channel modeling.

I Introduction

Statistical knowledge about the first and second moments of the wireless channel between a user equipment (UE) and a base station (BS) is of key importance for improving their communication link. Among other things, the first and second moments of the channel can be used to improve channel estimation [1, Sec. 3.5.2], provide information about channel parameters [2, Sec. 2.6], or can be employed in other applications where instantaneous channel state information (CSI) is not available or costly to acquire, with the advantage of reduced pilot and computational overhead [3]. Hence, structural prior information about the channel’s first and second moments can be beneficial, e.g., to reduce the number of samples required for estimating these moments [4]. A prominent example of such prior information is the WSSUS assumption, which characterizes the wireless channel as a wide-sense-stationary (WSS) process in both the temporal and frequency domains and, thus, constrains the channel covariance matrices (CMs) in these domains to be Toeplitz structured [5]. Analogously, the WSSUS assumption can be extended to the spatial domain, leading to a spatial CM with Toeplitz structure [6, Sec. 2.6].

Recently, leveraging additional side information about the wireless channel between the UE and the BS to enhance their communication link has attracted a lot of research attention. This side information can be interpretable, e.g., in the form of the UE’s position [7], or it can be given as an abstract representation, e.g., an ML-based latent embedding [8]. Depending on the characteristics of the side information, conditioning on it can either preserve or abandon structural features of the channel’s mean and CM. In this work, we aim to establish a comprehensive framework for characterizing any side information based on its influence on the first and second channel moments. Our main contributions are the following:

  • We establish a theorem leveraging the statistical relation between arbitrary side information and the complex path loss phases of a channel to describe how the WSSUS and zero-mean channel properties are either preserved or abandoned given this side information.

  • By combining this theorem with a probabilistic graph representation of wireless channels, we introduce a framework that comprehensively categorizes side information regarding its effect on the channel’s WSSUS and zero-mean properties.

  • We present various exemplary applications of this framework. Specifically, we introduce a validation technique for the proper training of ML-based channel models, regularize channel clustering, and analyze the utility of side information for channel estimation by means of our proposed framework.

II Preliminaries

II-A Channel Model

Over the last decades, several channel models have been proposed following different paradigms and imposing slightly different assumptions [9]. In this work, we consider a generic wideband and time-varying multiple-input-multiple-output (MIMO) baseband channel, which is sampled equidistantly in the time, frequency, and spatial domains. This is the case in the typical orthogonal-frequency-division-multiplexing (OFDM) setup with constant subcarrier spacings and constant symbol durations, in which the transmitter and the receiver are both equipped with uniform linear arrays (ULAs). The resulting channel (tensor) is given by

$$\bm{H}=\sum_{\ell=1}^{L}\sqrt{p_{\ell}}\,\mathrm{e}^{-\mathrm{j}\beta_{\ell}}\,\bm{a}_{f}(\tau_{\ell})\otimes\bm{a}_{t}(\nu_{\ell})\otimes\bm{a}_{\mathrm{R}}(\theta^{(\mathrm{R})}_{\ell})\otimes\bm{a}_{\mathrm{T}}(\theta^{(\mathrm{T})}_{\ell}) \tag{1}$$

characterized by $L$ paths and the channel parameters, i.e., its complex path losses $\sqrt{p_{\ell}}\,\mathrm{e}^{-\mathrm{j}\beta_{\ell}}$, delays $\tau_{\ell}$, Doppler shifts $\nu_{\ell}$, directions of arrival (DoAs) $\theta^{(\mathrm{R})}_{\ell}$, and directions of departure (DoDs) $\theta^{(\mathrm{T})}_{\ell}$. The vectors $\bm{a}_{(\cdot)}(\cdot)$ denote the steering vectors in the respective domains. The path loss phase shifts $\{\beta_{\ell}\}_{\ell=1}^{L}$ contain polarization effects as well as the center frequency phase shift $2\pi d_{\ell}/\lambda_{c}$ with center wavelength $\lambda_{c}$ and path distance $d_{\ell}$. Our theoretical findings in Section III build on the following model assumptions.

Assumption 1.

The phases $\beta_{\ell}$ are uniformly distributed, i.e.,

$$\beta_{\ell}\sim\mathcal{U}(0,2\pi)\ \text{for all}\ \ell. \tag{2}$$

Assumption 1 holds when the path distances $d_{\ell}$ are not known on the scale of the center wavelength $\lambda_{c}$ and $d_{\ell}\gg\lambda_{c}$ [1, Sec. 2.4.2]. This is the case in typical wireless communication scenarios, which renders Assumption 1 generally reasonable.

Assumption 2.

The phases $\beta_{\ell}$ are statistically independent of all channel parameters as well as across different paths, i.e.,

$$\beta_{\ell}\perp\kappa_{\ell'}\ \text{for all}\ \ell,\ell'\ \text{and}\ \kappa\in\{p,\tau,\nu,\theta^{(\mathrm{R})},\theta^{(\mathrm{T})}\}, \tag{3}$$
$$\beta_{\ell}\perp\beta_{\ell'}\ \text{for all}\ \ell\neq\ell'. \tag{4}$$

Although, strictly speaking, the phases $\{\beta_{\ell}\}_{\ell=1}^{L}$ depend on the delays $\tau_{\ell}$, they are commonly modeled to satisfy Assumption 2 due to their strong fluctuations and the multitude of different effects they comprise [10, Sec. 7.5]. In addition, within small frequency, time, and spatial ranges, the channel parameters and steering vectors are constant over the frequency $f$, the time $t$, and the positions of the transmitting and receiving antennas. As a result, the channel exhibits a zero mean and the WSSUS property, i.e., it is a WSS process across all domains [5]. This implies that $\bm{H}$ in (1) has a zero mean and a Toeplitz-structured CM in any domain. The steering vectors are given by

$$\bm{a}_{f}(\tau_{\ell})=[1,\mathrm{e}^{-\mathrm{j}2\pi\Delta f\tau_{\ell}},\ldots,\mathrm{e}^{-\mathrm{j}2\pi\Delta f(M_{\mathrm{SC}}-1)\tau_{\ell}}]^{\mathrm{T}}, \tag{5}$$
$$\bm{a}_{t}(\nu_{\ell})=[1,\mathrm{e}^{-\mathrm{j}2\pi\Delta T\nu_{\ell}},\ldots,\mathrm{e}^{-\mathrm{j}2\pi\Delta T(M_{\mathrm{SN}}-1)\nu_{\ell}}]^{\mathrm{T}}, \tag{6}$$
$$\bm{a}_{\mathrm{R}}(\theta^{(\mathrm{R})}_{\ell})=[1,\mathrm{e}^{-\mathrm{j}\pi\sin(\theta^{(\mathrm{R})}_{\ell})},\ldots,\mathrm{e}^{-\mathrm{j}\pi(M_{\mathrm{R}}-1)\sin(\theta^{(\mathrm{R})}_{\ell})}]^{\mathrm{T}}, \tag{7}$$
$$\bm{a}_{\mathrm{T}}(\theta^{(\mathrm{T})}_{\ell})=[1,\mathrm{e}^{-\mathrm{j}\pi\sin(\theta^{(\mathrm{T})}_{\ell})},\ldots,\mathrm{e}^{-\mathrm{j}\pi(M_{\mathrm{T}}-1)\sin(\theta^{(\mathrm{T})}_{\ell})}]^{\mathrm{T}} \tag{8}$$

with subcarrier spacing $\Delta f$, symbol duration $\Delta T$, half-wavelength antenna spacing, and number of subcarriers $M_{\mathrm{SC}}$, symbols $M_{\mathrm{SN}}$, and receive and transmit antennas $M_{\mathrm{R}}$ and $M_{\mathrm{T}}$, respectively. We do not explicitly specify statistical characteristics for any other channel parameters, as our subsequent findings apply in general. In the remainder of this work, we use $\bm{H}$ and its vectorized version $\text{vec}(\bm{H})$ interchangeably.
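As an illustration of (1) with the steering vectors (5)-(8), the following NumPy sketch assembles $\text{vec}(\bm{H})$ path by path. The helper names `steering` and `channel` as well as all numerical values (number of paths, subcarrier spacing, symbol duration, grid and array sizes, parameter ranges) are hypothetical choices made for this example and are not taken from the paper; only the Kronecker ordering and the phase conventions follow (1) and (5)-(8).

```python
import numpy as np

def steering(phase_step, m):
    """Return [1, e^{-j*phase_step}, ..., e^{-j*phase_step*(m-1)}], cf. (5)-(8)."""
    return np.exp(-1j * phase_step * np.arange(m))

def channel(p, beta, tau, nu, theta_r, theta_t,
            delta_f, delta_t, m_sc, m_sn, m_r, m_t):
    """Vectorized channel vec(H) of (1) as a sum over paths (hypothetical helper)."""
    h = np.zeros(m_sc * m_sn * m_r * m_t, dtype=complex)
    for l in range(len(p)):
        a_f = steering(2 * np.pi * delta_f * tau[l], m_sc)   # frequency domain, (5)
        a_t = steering(2 * np.pi * delta_t * nu[l], m_sn)    # time domain, (6)
        a_r = steering(np.pi * np.sin(theta_r[l]), m_r)      # receive array, (7)
        a_x = steering(np.pi * np.sin(theta_t[l]), m_t)      # transmit array, (8)
        a = np.kron(np.kron(np.kron(a_f, a_t), a_r), a_x)    # Kronecker order as in (1)
        h += np.sqrt(p[l]) * np.exp(-1j * beta[l]) * a
    return h

# Example draw with assumed parameter ranges (illustrative only).
rng = np.random.default_rng(0)
L = 3
p = rng.dirichlet(np.ones(L))                  # path powers summing to one
beta = rng.uniform(0.0, 2.0 * np.pi, L)        # phases as in Assumption 1
tau = rng.uniform(0.0, 1e-6, L)                # delays in seconds
nu = rng.uniform(-100.0, 100.0, L)             # Doppler shifts in Hz
theta_r = rng.uniform(-np.pi / 2, np.pi / 2, L)
theta_t = rng.uniform(-np.pi / 2, np.pi / 2, L)

h = channel(p, beta, tau, nu, theta_r, theta_t,
            delta_f=15e3, delta_t=71e-6, m_sc=4, m_sn=2, m_r=8, m_t=2)
print(h.shape)  # (128,)
```

The first entry of each steering vector equals one, so the first entry of `h` reduces to the sum of the complex path losses, which gives a quick sanity check of the construction.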

II-B Statistical Independence in Bayesian Networks

A probabilistic graph is a graphical representation of a statistical model, in which random variables/vectors are modeled as nodes and statistical dependencies as edges. The causality between two dependent nodes, if clear, is encoded by a directed edge. If all edges in a probabilistic graph are directed, the graph is called a Bayesian network (BN). An example of a BN is given in Fig. 1 a). One advantage of explicitly representing a statistical model as a BN is the possibility to directly infer conditional dependencies between nodes across the whole graph. To do so, two different setups have to be distinguished. The triplet $(A,C,B)$ in Fig. 1 a) forms a v-structure since both directed edges point towards $C$ (i.e., $A\rightarrow C\leftarrow B$). We assume that neither $A$ nor $B$ deterministically determines $C$. Then, the endpoints $A$ and $B$ in a v-structure are said to be d-separated if and only if neither the center node $C$ nor one of its descendants (i.e., $D$ in Fig. 1 a)) is observed [11, Sec. 3.3.1]. If d-separation is sound, it implies statistical independence, which is typically the case and is assumed throughout this work [11, Sec. 3.3.2]. In any other configuration of arrows (e.g., $A\rightarrow C\rightarrow D$), the endpoints $A$ and $D$ are d-separated if and only if the center node $C$ is observed. If every trail in the BN between two specific nodes of interest contains at least one triplet of adjacent nodes whose endpoints are d-separated, these two nodes are d-separated and, thus, statistically independent.

Figure 1: a) Exemplary Bayesian network, b) the sensing and modeling setup and c) the direct inference setup.

III Main Result

Our characterization of side information $\bm{z}$ is based on the following observation. Whether conditioning on $\bm{z}$ preserves the zero channel mean and the Toeplitz-structured channel CMs is solely linked to the impact of $\bm{z}$ on $\bm{\beta}=\{\beta_{\ell}\}_{\ell=1}^{L}$. Formally, this observation is presented in Theorem 1.

Theorem 1.

Let $\bm{H}$ be defined according to (1) with Assumptions 1 and 2, and steering vectors (5)-(8). Let $\bm{z}$ be any side information about $\bm{H}$. Moreover, let $\bm{\Xi}$ contain the channel parameters $\{p_{\ell},\tau_{\ell},\nu_{\ell},\theta_{\ell}^{(\mathrm{R})},\theta_{\ell}^{(\mathrm{T})}\}_{\ell=1}^{L}$. Then, if

$$\beta_{\ell}\,|\,(\bm{\Xi},\bm{z})\sim\mathcal{U}([0,2\pi])\ \text{for all}\ \ell=1,\ldots,L \tag{9}$$

holds true, it implies

$$\mathbb{E}[\bm{H}|\bm{z}]=\bm{0},\quad\mathbb{E}[\bm{H}\bm{H}^{\mathrm{H}}|\bm{z}]\in\bar{\mathcal{C}}_{\mathcal{T}} \tag{10}$$

for any distribution $p(\bm{\Xi})$ and with $\bar{\mathcal{C}}_{\mathcal{T}}=\mathcal{C}^{(M_{\mathrm{SC}})}_{\mathcal{T}}\otimes\mathcal{C}^{(M_{\mathrm{SN}})}_{\mathcal{T}}\otimes\mathcal{C}^{(M_{\mathrm{R}})}_{\mathcal{T}}\otimes\mathcal{C}^{(M_{\mathrm{T}})}_{\mathcal{T}}$, where $\mathcal{C}^{(F)}_{\mathcal{T}}$ denotes the set of $F\times F$ Toeplitz-structured CMs.

Proof.

See Appendix -A. ∎

Theorem 1 states that if the conditioning on $\bm{z}$ does not affect the statistical characteristics of $\beta_{\ell}$, then the channel’s WSSUS and zero-mean properties are preserved, independent of the statistical characteristics of the other channel parameters $\bm{\Xi}$ and any potential effects of $\bm{z}$ on these parameters. The cases where (9) is either true or false can be divided into just two distinct setups by means of a BN representation, which leads to a comprehensive characterization of side information in terms of its effect on $\beta_{\ell}$ and, thus, on the channel’s WSSUS and zero-mean properties.
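A small numerical experiment can make the statement of Theorem 1 tangible. The sketch below is not the proof from the appendix; it is restricted, as a simplifying assumption, to the spatial receive domain with steering vectors as in (7), keeps one hypothetical draw of the channel parameters fixed, and samples the phases i.i.d. uniformly and independently of these parameters as required by (9). All sizes and the random seed are arbitrary choices. For fixed parameters, the covariance has the closed form $\sum_{\ell}p_{\ell}\bm{a}_{\mathrm{R}}(\theta_{\ell})\bm{a}_{\mathrm{R}}(\theta_{\ell})^{\mathrm{H}}$, whose $(i,k)$ entry depends only on $i-k$.

```python
import numpy as np

rng = np.random.default_rng(1)
M, L, N = 8, 3, 20000   # antennas, paths, Monte Carlo draws (assumed values)

# One fixed, hypothetical realization of the channel parameters (powers and DoAs).
p = rng.dirichlet(np.ones(L))
theta = rng.uniform(-np.pi / 2, np.pi / 2, L)
A = np.exp(-1j * np.pi * np.outer(np.arange(M), np.sin(theta)))  # columns a_R(theta_l), cf. (7)

# Phases drawn i.i.d. uniformly and independently of the parameters, as in (9).
beta = rng.uniform(0.0, 2.0 * np.pi, (N, L))
H = (np.sqrt(p) * np.exp(-1j * beta)) @ A.T    # one channel realization per row

mean_est = H.mean(axis=0)                      # sample mean of the channel
cov_est = H.T @ H.conj() / N                   # sample second moment E[h h^H]
cov_ref = (A * p) @ A.conj().T                 # closed form: sum_l p_l a_l a_l^H

def is_toeplitz(c, tol=1e-10):
    """True if every diagonal of c is constant up to numerical tolerance."""
    m = c.shape[0]
    return all(np.allclose(np.diag(c, k), np.diag(c, k)[0], atol=tol)
               for k in range(-m + 1, m))

print(np.abs(mean_est).max())                  # small, shrinks with growing N
print(np.abs(cov_est - cov_ref).max())         # small, shrinks with growing N
print(is_toeplitz(cov_ref))                    # True
```

Averaging the closed-form covariance over any distribution of the channel parameters is a linear operation and therefore keeps the Toeplitz structure, which matches the claim that (10) holds for any $p(\bm{\Xi})$.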

III-A Sensing and Modeling

One setup is illustrated by the BN given in Fig. 1 b). It describes the situation in which the side information $\bm{z}$ is used to infer the channel parameters $\bm{\Xi}$ and/or the channel $\bm{H}$ without being a direct observation of $\bm{H}$ itself. Since we condition on $(\bm{\Xi},\bm{z})$ in (9), these variables are considered to be observed and are marked gray. A multitude of different situations fall into this case, ranging from the coexistence design of joint communication and sensing with separate resources for communication and sensing functions [12] to classical and modern ML-based channel models based on, e.g., variational autoencoders (VAEs) (cf. Section IV-A). The only two trails in Fig. 1 b) between the observed $\bm{z}$ and $\bm{\beta}=\{\beta_{\ell}\}_{\ell=1}^{L}$ are $\bm{z}\rightarrow\bm{H}\leftarrow\bm{\beta}$ and $\bm{z}\rightarrow\bm{\Xi}\rightarrow\bm{H}\leftarrow\bm{\beta}$. Moreover, the only trails between the observed $\bm{\Xi}$ and $\bm{\beta}$ are given by $\bm{\Xi}\rightarrow\bm{H}\leftarrow\bm{\beta}$ and $\bm{\Xi}\leftarrow\bm{z}\rightarrow\bm{H}\leftarrow\bm{\beta}$. By applying the rules from Section II-B and assuming that $\bm{z}$ does not deterministically determine $\bm{H}$, we see that all trails contain a v-structure with non-observed center node $\bm{H}$ and no observed descendants. We conclude that $\bm{z}$ and $\bm{\beta}$ as well as $\bm{\Xi}$ and $\bm{\beta}$ are statistically independent. Hence, $p(\beta_{\ell}|\bm{\Xi},\bm{z})$ equals $p(\beta_{\ell})$ for all $\ell=1,\ldots,L$ and, due to (2), (9) holds true in general.

III-B Direct Inference

The other setup describes the situation in which the side information $\bm{z}$ is a direct observation of the channel $\bm{H}$ itself. Arguably the most important example of this setup is channel estimation, where $\bm{z}$ represents a noisy observation of the channel $\bm{H}$. The corresponding BN is given in Fig. 1 c). Since the center node $\bm{H}$ in $\bm{\beta}\rightarrow\bm{H}\rightarrow\bm{z}$ is not observed, $\bm{\beta}$ and $\bm{z}$ are not independent. Moreover, since the descendant $\bm{z}$ of the center node $\bm{H}$ in $\bm{\beta}\rightarrow\bm{H}\leftarrow\bm{\Xi}$ is observed, $\bm{\beta}$ and $\bm{\Xi}$ are also not independent. Thus, in this setup, both observed variables $(\bm{\Xi},\bm{z})$ potentially influence the statistics of $\bm{\beta}$ and we cannot claim (9) to hold true in general.
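The d-separation arguments of Sections III-A and III-B can also be checked mechanically. The sketch below does not enumerate trails as in the text; it uses the standard moralized-ancestral-graph criterion, which yields the same d-separation verdicts. The edge lists are an assumption: they are read off the trails named in the text for Fig. 1 b) and Fig. 1 c), and the function name `d_separated` and the node labels are hypothetical.

```python
from itertools import combinations

def d_separated(edges, x, y, observed):
    """Check whether x and y are d-separated given `observed` in a DAG.

    edges is a list of (parent, child) pairs. The test restricts the graph to
    the ancestors of x, y and the observed nodes, connects co-parents
    (moralization), drops edge directions, removes the observed nodes and
    then looks for a remaining path between x and y.
    """
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
        parents.setdefault(parent, set())

    # Ancestral set of {x, y} and all observed nodes.
    keep, stack = set(), [x, y, *observed]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))

    # Undirected moral graph on the ancestral set.
    nbrs = {node: set() for node in keep}
    for child in keep:
        pars = parents.get(child, set())
        for par in pars:
            nbrs[par].add(child)
            nbrs[child].add(par)
        for a, b in combinations(pars, 2):
            nbrs[a].add(b)
            nbrs[b].add(a)

    # Depth-first search from x that never enters an observed node.
    blocked, seen, stack = set(observed), set(), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        if node in seen:
            continue
        seen.add(node)
        stack.extend(nbrs[node] - blocked)
    return True

# Edge lists assumed from the trails described for Fig. 1 b) and Fig. 1 c).
SENSING = [("z", "H"), ("z", "Xi"), ("Xi", "H"), ("beta", "H")]
DIRECT = [("beta", "H"), ("Xi", "H"), ("H", "z")]

for name, graph in (("sensing and modeling", SENSING), ("direct inference", DIRECT)):
    print(name,
          d_separated(graph, "beta", "z", {"Xi"}),
          d_separated(graph, "beta", "Xi", {"z"}))
```

For the sensing and modeling graph both checks report d-separation, whereas for the direct inference graph neither does, in line with the conclusions drawn in the two subsections.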

IV Applications

Section III provides the means to categorize any side information based on the way it is acquired and related to the channel parameters. In this section, we discuss how this characterization can be utilized to enhance or analyze applications in wireless communication.

Figure 2: a) $\mathrm{nMSE}$ of the output covariance matrix to its Toeplitz projection and $\mathrm{MSE}$ of the output mean to zero over training iterations, b) real and imaginary parts of an exemplary output covariance matrix $\bm{C}_{\bm{\theta}}(\bm{z}=\bm{\mu}_{\bm{\phi}}(\bm{H}_{\text{val}}))$ generated by the same validation sample $\bm{H}_{\text{val}}$ after $1$, $10^{3}$, and $10^{5}$ iterations.
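The Toeplitz-distance metric named in the caption of Fig. 2 a) can be sketched in a few lines of NumPy. The exact definition used for the figure is not stated in this section, so the following is an assumption: the projection is taken as the Frobenius-orthogonal projection onto Toeplitz matrices, obtained by averaging each diagonal, and the nMSE is taken as the squared Frobenius distance to this projection normalized by the squared Frobenius norm of the matrix. The function names `toeplitz_projection` and `toeplitz_nmse` are hypothetical.

```python
import numpy as np

def toeplitz_projection(c):
    """Project a square matrix onto Toeplitz matrices by averaging each diagonal."""
    c = np.asarray(c, dtype=complex)
    m = c.shape[0]
    out = np.zeros_like(c)
    for k in range(-m + 1, m):
        rows = np.arange(max(0, -k), min(m, m - k))   # row indices of the k-th diagonal
        out[rows, rows + k] = np.diag(c, k).mean()
    return out

def toeplitz_nmse(c):
    """Assumed metric: ||C - P(C)||_F^2 / ||C||_F^2 with P the Toeplitz projection."""
    c = np.asarray(c, dtype=complex)
    return np.linalg.norm(c - toeplitz_projection(c)) ** 2 / np.linalg.norm(c) ** 2

# Unstructured covariance built from a random lower-triangular factor, C = L L^H.
rng = np.random.default_rng(0)
factor = np.tril(rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8)))
C = factor @ factor.conj().T
print(toeplitz_nmse(C))                            # clearly above zero for a random C
print(toeplitz_nmse(np.array([[2, 1j], [-1j, 2]])))  # 0.0 for a Toeplitz matrix
```

A value near zero indicates that the matrix is close to Toeplitz in the Frobenius sense, so tracking this quantity over training iterations gives a scalar summary of how strongly a learned covariance deviates from the expected structure.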

IV-A Channel Modeling

One possible application is given in the context of modern ML-based channel modeling, where generative models, e.g., generative adversarial networks (GANs) [13] or VAEs [14], are used to capture the underlying channel distribution from a training set of channel realizations. In this context, VAEs aim to encode the channel-specific statistical features in a latent variable $\bm{z}$ by learning two distributions $q_{\bm{\phi}}(\bm{z}|\bm{H})$ and $p_{\bm{\theta}}(\bm{H}|\bm{z})$, where $\bm{H}$ is the channel (cf. (1)) and $(\bm{\phi},\bm{\theta})$ are neural network (NN) parameters. Typically, $p_{\bm{\theta}}(\bm{H}|\bm{z})$ and $q_{\bm{\phi}}(\bm{z}|\bm{H})$ are modeled as conditionally Gaussian, i.e., $p_{\bm{\theta}}(\bm{H}|\bm{z})=\mathcal{N}_{\mathbb{C}}(\bm{H};\bm{\mu}_{\bm{\theta}}(\bm{z}),\bm{C}_{\bm{\theta}}(\bm{z}))$ and $q_{\bm{\phi}}(\bm{z}|\bm{H})=\mathcal{N}(\bm{z};\bm{\mu}_{\bm{\phi}}(\bm{H}),\text{diag}(\bm{\sigma}_{\bm{\phi}}(\bm{H})^{2}))$. The VAE’s training is based on forwarding training channels $\bm{H}_{\text{train}}$ through an encoder-decoder processing chain yielding $(\bm{\mu}_{\bm{\phi}}(\bm{H}_{\text{train}}),\bm{\sigma}_{\bm{\phi}}(\bm{H}_{\text{train}}))$ as well as $(\bm{\mu}_{\bm{\theta}}(\tilde{\bm{z}}),\bm{C}_{\bm{\theta}}(\tilde{\bm{z}}))$ with $\tilde{\bm{z}}\sim q_{\bm{\phi}}(\bm{z}|\bm{H}_{\text{train}})$ (we refer to [14] for more details about VAEs for wireless communication). Due to the strong fluctuations of $\beta_{\ell}$ (cf. Section II-A), these phases cannot contain distinct statistical channel characteristics, rendering it unlikely that the VAE stores $\beta_{\ell}$-specific information in $\bm{z}$. In consequence, the BN in Fig. 1 b) applies, (9) holds, and $\bm{\mu}_{\bm{\theta}}(\bm{z})$ and $\bm{C}_{\bm{\theta}}(\bm{z})$ are expected to be learned to be zero and Toeplitz in any domain, respectively (cf. Section III-A). Thus, we can utilize Theorem 1 as a tool to verify the VAE’s correct training. In Fig. 2, the behavior of the parameterized conditional mean and CM during the VAE’s training is shown. As training set, we used $50\,000$ narrowband and static channels generated by the geometry-based stochastic channel model QuaDRiGa [15], which are randomly sampled in a $120^{\circ}$ sector of the “3GPP_38.901_UMa_NLOS” scenario. The BS and each user are equipped with $8$ and $1$ antennas, respectively, which results in $\bm{H}$ exhibiting solely the spatial receiver domain.
We allowed $\bm{C}_{\bm{\theta}}(\bm{z})$ to take any unstructured CM during training by outputting its arbitrarily learnable Cholesky factor $\bm{L}_{\bm{\theta}}(\bm{z})$ with $\bm{C}_{\bm{\theta}}(\bm{z})=\bm{L}_{\bm{\theta}}(\bm{z})\bm{L}_{\bm{\theta}}(\bm{z})^{\mathrm{H}}$. Similarly, we also allowed $\bm{\mu}_{\bm{\theta}}(\bm{z})$ to take any value. In Fig. 2 a), the normalized MSE (NMSE) $\mathrm{nMSE}=1/500\sum_{n=1}^{500}(\|\bm{C}_{\bm{\theta}}(\bm{z}_{n})-\bm{C}^{(\mathrm{P})}_{\bm{\theta}}(\bm{z}_{n})\|_{\mathrm{F}}^{2})/\|\bm{C}_{\bm{\theta}}(\bm{z}_{n})\|_{\mathrm{F}}^{2}$ of the output covariance matrices $\bm{C}_{\bm{\theta}}(\bm{z}_{n})$ to their orthogonal projection $\bm{C}^{(\mathrm{P})}_{\bm{\theta}}(\bm{z}_{n})$ onto Toeplitz matrices (cf. [16]) over the training iterations is shown. 
Additionally, the mean squared error (MSE) $\mathrm{MSE}=1/500\sum_{n=1}^{500}\|\bm{\mu}_{\bm{\theta}}(\bm{z}_{n})-\bm{0}\|_{2}^{2}$ of output means $\bm{\mu}_{\bm{\theta}}(\bm{z}_{n})$ to zero is given. Both are generated from randomly sampled latent realizations $\{\bm{z}_{n}\}_{n=1}^{500}$ after each iteration. It can be seen that both measures decrease significantly, approaching zero. Remarkably, Fig. 2 a) shows that the VAE, although not explicitly constrained, is trained to solely output Toeplitz structured CMs and zero means. This behaviour indicates correct training in the sense that $\bm{z}$ captures distribution-relevant features and encodes no information about $\bm{\beta}$. Additionally, in Fig. 2 b), the convergence of $\bm{C}_{\bm{\theta}}(\bm{z})$ towards Toeplitz structured matrices is illustrated by the real and imaginary part of one exemplary output CM, where the conditional mean $\bm{\mu}_{\bm{\phi}}(\bm{H}_{\text{val}})$ of $q_{\bm{\phi}}(\bm{z}|\bm{H}_{\text{val}})$ for a fixed validation channel $\bm{H}_{\text{val}}$ is used to generate the plotted $\bm{C}_{\bm{\theta}}(\bm{z}=\bm{\mu}_{\bm{\phi}}(\bm{H}_{\text{val}}))$. These findings also theoretically underpin results from [14] and [17], where it is observed that constraining $\bm{C}_{\bm{\theta}}(\bm{z})$ to be Toeplitz (or circulant as Toeplitz approximation) or $\bm{\mu}_{\bm{\theta}}(\bm{z})$ to be zero results in VAEs with strong channel estimation performance.
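The Toeplitz check behind Fig. 2 a) can be sketched in a few lines of NumPy. The sketch assumes that the orthogonal projection is taken with respect to the Frobenius inner product onto the linear space of Toeplitz matrices, which amounts to averaging every diagonal; whether this coincides exactly with the procedure from [16] used for the figure is not stated here. The function names `toeplitz_projection` and `toeplitz_nmse` are hypothetical.

```python
import numpy as np


def toeplitz_projection(C):
    """Project a square matrix onto the Toeplitz matrices by averaging each diagonal.

    Assumption: Frobenius-orthogonal projection onto the linear Toeplitz subspace
    (positive semi-definiteness is not enforced).
    """
    C = np.asarray(C, dtype=complex)
    n = C.shape[0]
    # offset[i, j] = j - i identifies the diagonal that entry (i, j) belongs to
    offset = np.arange(n)[None, :] - np.arange(n)[:, None]
    P = np.empty_like(C)
    for k in range(-(n - 1), n):
        mask = offset == k
        P[mask] = C[mask].mean()
    return P


def toeplitz_nmse(covs):
    """Average normalized squared Frobenius distance of each matrix to its projection."""
    errs = []
    for C in covs:
        C = np.asarray(C, dtype=complex)
        num = np.linalg.norm(C - toeplitz_projection(C), "fro") ** 2
        den = np.linalg.norm(C, "fro") ** 2
        errs.append(num / den)
    return float(np.mean(errs))
```

In a training loop, one would evaluate `toeplitz_nmse` on the decoder covariances of a batch of sampled latents after each iteration; a value close to zero indicates that the learned CMs are (nearly) Toeplitz.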

IV-B Channel Clustering

Figure 3: a) Velocity probability distribution with four distinct regions, b) mutual information $I(C_{v},C)$ between the GMM Toeplitz and zero mean ($C_{g}$) or k-means ($C_{k}$) clustering and the ground truth velocity clustering $C_{v}$ with $C\in\{C_{g},C_{k}\}$.

In the previous section, we utilized Theorem 1 to verify correct training and to interpret the information encoded in the VAE's latent embedding. In this section, we actively build on Theorem 1 to directly regularize a clustering algorithm towards the desired outcome. We consider a situation in which we aim to cluster time-varying channel trajectories according to the users' velocities without having access to any explicit velocity information. In this example, $\bm{z}$ represents the discrete and finite clusters. If $\bm{z}$ solely encodes the velocities and, thus, information about $\bm{\Xi}$, the BN in Fig. 1 b) applies, and (9) holds (cf. Section III-A). In consequence, we know in advance that $\mathbb{E}[\bm{H}\bm{H}^{\mathrm{H}}|\bm{z}=\text{cluster}\ i]$ is Toeplitz in any domain and that the mean $\mathbb{E}[\bm{H}|\bm{z}=\text{cluster}\ i]$ is zero for all $i$ (cf. Theorem 1), and, thus, that any mean-based clustering algorithm (e.g., k-means) is sub-optimal. On the other hand, while Gaussian mixture models (GMMs) can also be used as generative models as in Section IV-A, GMM-based clustering allows us to incorporate the insights of Theorem 1. Specifically, standard GMMs assign an unconstrained Gaussian distribution to every cluster individually. However, it is also possible to regularize their means and CMs. In the following, we utilize the GMM covariance Toeplitz parameterization and clustering from [18], in which the cluster indices are used for CSI feedback (we refer to [19] for more details about GMMs for wireless communication). 
However, in addition to [18], we also enforce the GMM means to be zero, such that every cluster is regularized to have a zero mean and Toeplitz structured CM. We trained this regularized GMM with $80\,000$ single-input-single-output (SISO) narrowband time-varying channels, which are sampled $16$ times every $0.5$ ms in a $120^{\circ}$ sector of the "3GPP_38.901_UMa_NLOS" scenario of QuaDRiGa. This results in $\bm{H}$ (cf. (1)) exhibiting solely the temporal domain. Each user's velocity is randomly drawn from the density $p(v)$ illustrated in Fig. 3 a). This density exhibits four distinct velocity regions (regions 1-4), which allows a ground truth velocity clustering of a test set of $8000$ channel trajectories, where each trajectory is labeled with the corresponding user's velocity region. After training the regularized GMM, which receives no direct velocity information during either training or testing, we cluster the test set using the GMM. For evaluation, we then compare the ground truth velocity clustering represented by the random variable $C_{v}$ with the GMM clustering represented by $C_{g}$ by means of their mutual information $I(C_{v},C_{g})$ (cf. [20]). The same is done for the k-means clustering $C_{k}$, and the results are illustrated for different k-means and GMM cluster numbers $K$ in Fig. 3 b). It can be seen that the GMM clustering generally shows higher mutual information with the ground truth velocity clustering than k-means. 
Remarkably, although the GMM does not get any explicit velocity information, its mutual information achieves the entropy $H_{v}$ of the ground truth velocity clustering for $K=16$ and larger. This implies that in this regime, each GMM cluster is associated with exactly one of the velocity regions 1-4 in Fig. 3 a), i.e., the regularized GMM yields perfect velocity clustering. In the same $K$-regime, the mutual information of k-means shows a significant gap to $H_{v}$.
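The comparison of two clusterings via mutual information can be sketched as follows. This is a minimal plug-in estimate from the empirical contingency table, not necessarily the exact evaluation code behind Fig. 3 b); the function names and the choice of bits (base-2 logarithm) as unit are assumptions made for illustration. The sketch also makes the argument above concrete: whenever every cluster lies entirely inside one velocity region, the mutual information equals the entropy of the ground truth clustering.

```python
import numpy as np


def entropy(labels):
    """Empirical entropy (in bits) of a label sequence."""
    _, counts = np.unique(np.asarray(labels).ravel(), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def mutual_information(labels_a, labels_b):
    """Empirical mutual information (in bits) between two clusterings of the same samples."""
    a_vals, a_idx = np.unique(np.asarray(labels_a).ravel(), return_inverse=True)
    b_vals, b_idx = np.unique(np.asarray(labels_b).ravel(), return_inverse=True)
    # Joint distribution estimated from the contingency table
    joint = np.zeros((a_vals.size, b_vals.size))
    np.add.at(joint, (a_idx, b_idx), 1.0)
    joint /= joint.sum()
    p_a = joint.sum(axis=1, keepdims=True)
    p_b = joint.sum(axis=0, keepdims=True)
    nz = joint > 0  # skip empty cells, which contribute zero
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (p_a @ p_b)[nz])))
```

For example, with ground truth regions `[0, 0, 1, 1, 2, 2, 3, 3]` and a finer clustering in which each cluster contains samples of a single region, `mutual_information` returns the same value as `entropy` of the ground truth labels.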

Figure 4: $\mathrm{nMSE}$ over the SNR of the four channel estimators pilot, sensing, joint and the zero vector.
$$\mathbb{E}_{\bm{H}|\bm{z}}[\bm{H}]=\mathbb{E}_{(\bm{\beta},\bm{\Xi})|\bm{z}}[\bm{H}]=\mathbb{E}_{\bm{\Xi}|\bm{z}}\left[\sum_{\ell=1}^{L}\sqrt{p_{\ell}}\,\mathbb{E}_{\bm{\beta}|(\bm{z},\bm{\Xi})}\left[\mathrm{e}^{-\mathrm{j}\beta_{\ell}}\right]\bm{a}_{f,\ell}\otimes\bm{a}_{t,\ell}\otimes\bm{a}_{\mathrm{R},\ell}\otimes\bm{a}_{\mathrm{T},\ell}\right] \tag{11}$$

$$\mathbb{E}_{\bm{H}|\bm{z}}[\bm{H}\bm{H}^{\mathrm{H}}]=\mathbb{E}_{\bm{\Xi}|\bm{z}}\left[\sum_{\ell,\ell^{\prime}=1}^{L}\sqrt{p_{\ell}}\sqrt{p_{\ell^{\prime}}}\,\mathbb{E}_{\bm{\beta}|(\bm{z},\bm{\Xi})}\left[\mathrm{e}^{-\mathrm{j}\beta_{\ell}}\mathrm{e}^{\mathrm{j}\beta_{\ell^{\prime}}}\right]\bm{a}_{f,\ell}\bm{a}_{f,\ell^{\prime}}^{\mathrm{H}}\otimes\bm{a}_{t,\ell}\bm{a}_{t,\ell^{\prime}}^{\mathrm{H}}\otimes\bm{a}_{\mathrm{R},\ell}\bm{a}_{\mathrm{R},\ell^{\prime}}^{\mathrm{H}}\otimes\bm{a}_{\mathrm{T},\ell}\bm{a}_{\mathrm{T},\ell^{\prime}}^{\mathrm{H}}\right] \tag{12}$$

 

IV-C Channel Estimation

A further application of Section III is given in the context of channel estimation. Theorem 1 characterizes the conditional mean $\mathbb{E}[\bm{H}|\bm{z}]$ of the channel $\bm{H}$ given some side information $\bm{z}$. It is well known that this conditional mean represents the minimum mean squared error (MMSE) channel estimator based on $\bm{z}$. Thus, Theorem 1 in combination with the setups in Fig. 1 establishes a framework to analyze which kind of information can be fundamentally utilized for estimating the channel. More precisely, if $\bm{z}$ is not statistically relevant for $\beta_{\ell}$, then the best channel estimate given $\bm{z}$ is the zero vector (cf. Theorem 1). Consequently, for a non-trivial estimate, $\bm{z}$ has to contain a descendant of $\bm{H}$, i.e., a pilot observation for channel estimation (cf. Fig. 1 c)). Additionally, Section III also theoretically underpins that as long as $\bm{z}$ contains some direct observation of $\bm{H}$, any additional side information about $\bm{\Xi}$ can improve the estimation. This is due to $\beta_{\ell}$ being able to statistically depend on $\bm{\Xi}$ if and only if $\bm{H}$ or one of its descendants (i.e., a direct observation) is given (cf. Section II-B). For evaluating these insights, we trained several fully-connected NNs for channel estimation in an end-to-end fashion. Specifically, we generated $10^{5}$ channels using (1) containing solely the spatial receiver domain of dimension $16$. 
All these realizations contain three paths, $\beta_{\ell}$ and $\theta_{\ell}^{(\mathrm{R})}$ are uniformly distributed, and the path losses $\sqrt{p_{\ell}}$ are uniformly distributed between zero and one and then normalized to sum up to one. Fig. 4 shows the estimation performance, in terms of the $\mathrm{nMSE}$ with respect to the ground truth channel, of three different NN-based estimators over the signal-to-noise ratio (SNR), denoted by sensing, pilot, and joint. For evaluation, we used a test set of $5000$ channels. All NNs are trained by minimizing the MSE between their output and the true channel, and their NN depth and width were optimized by a random search using a validation set of size $5000$. As input for the sensing NN, we took the ground truth steering vectors (cf. (7)) of the corresponding channel's three paths. However, since these have no statistical relevance for $\beta_{\ell}$, (9) holds, the MMSE estimator is given by the zero vector, and the sensing network does not outperform the zero vector. On the other hand, the pilot network takes a noisy channel observation as input. Thus, a descendant of $\bm{H}$ is observed, Fig. 1 c) applies, and the estimator outperforms the zero vector (cf. Section III-B). The joint network takes both the ground truth steering vectors and the noisy channel observation as input. Since a descendant of $\bm{H}$ (i.e., the noisy channel observation) is given, $\bm{\beta}$ and $\bm{\Xi}$ can statistically depend on each other (cf. Section III-B), and additional information about $\bm{\Xi}$ can improve the estimation, which is seen in Fig. 4 in the form of joint outperforming pilot.
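The synthetic channel generation and the zero-vector baseline described above can be sketched as follows. Several details are assumptions made for illustration, since they are not fixed in this section: the steering vector is taken to be that of a half-wavelength uniform linear array (standing in for (7)), the angles are drawn uniformly from $[-\pi/2,\pi/2]$, the amplitudes $\sqrt{p_{\ell}}$ (rather than the powers) are normalized to sum to one, and the nMSE is normalized by the channel energy. The function names are hypothetical.

```python
import numpy as np


def generate_channels(num, n_rx=16, n_paths=3, rng=None):
    """Draw `num` channels with `n_paths` paths in the spatial receiver domain.

    Assumed model: H = sum_l sqrt(p_l) * exp(-j * beta_l) * a(theta_l),
    with a(theta) a half-wavelength ULA steering vector (an illustrative choice).
    """
    rng = np.random.default_rng() if rng is None else rng
    beta = rng.uniform(0.0, 2.0 * np.pi, size=(num, n_paths))        # path phases
    theta = rng.uniform(-np.pi / 2, np.pi / 2, size=(num, n_paths))  # angles of arrival
    amp = rng.uniform(0.0, 1.0, size=(num, n_paths))                 # sqrt(p_l)
    amp /= amp.sum(axis=1, keepdims=True)                            # amplitudes sum to one
    n = np.arange(n_rx)
    # Steering vectors for all samples and paths, shape (num, n_paths, n_rx)
    steering = np.exp(-1j * np.pi * np.sin(theta)[..., None] * n)
    H = np.sum((amp * np.exp(-1j * beta))[..., None] * steering, axis=1)
    return H, theta, amp


def nmse(H_hat, H):
    """Squared estimation error normalized by the total channel energy."""
    return float(np.sum(np.abs(H_hat - H) ** 2) / np.sum(np.abs(H) ** 2))
```

Under this normalization, the zero-vector estimator has an nMSE of exactly one at every SNR, which is the reference level an estimator has to beat. The empirical mean of the generated channels is close to zero, in line with the uniformly distributed path phases.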

V Conclusion

In this work, we introduced a comprehensive framework, which combines insights from a generic channel model with BNs to categorize side information according to its effect on the channel's WSSUS and zero-mean properties. This framework can be utilized in various ways. We demonstrated how it can be leveraged to analyze side information for channel generation or estimation, and to directly regularize channel clustering. While we discussed three particular applications, this analysis indicates that many more exist, which are part of future work.

Appendix A Proof of Theorem 1

To prove Theorem 1, we reformulate the conditional mean of $\bm{H}$ given $\bm{z}$ according to (11), where we insert the definition (1) of $\bm{H}$. For simplicity, we also leave out the arguments of the steering vectors. The inner expectation $\mathbb{E}_{\bm{\beta}|(\bm{z},\bm{\Xi})}[\mathrm{e}^{-\mathrm{j}\beta_{\ell}}]$ equals zero according to the assumption $\beta_{\ell}|(\bm{\Xi},\bm{z})\sim\mathcal{U}([0,2\pi])$ in Theorem 1, so the conditional mean in (11) vanishes. Analogously, the conditional CM $\mathbb{E}[\bm{H}\bm{H}^{\mathrm{H}}|\bm{z}]$ can be decomposed according to (12). 
Considering assumption (4) and the above reasoning, the inner expectation $\mathbb{E}_{\bm{\beta}|(\bm{z},\bm{\Xi})}[\mathrm{e}^{-\mathrm{j}\beta_{\ell}}\mathrm{e}^{\mathrm{j}\beta_{\ell^{\prime}}}]$ equals zero for $\ell\neq\ell^{\prime}$ and one otherwise. Hence, only the terms with $\ell=\ell^{\prime}$ remain in (12). Since $\bm{a}_{(\cdot)}(\cdot)\bm{a}_{(\cdot)}(\cdot)^{\mathrm{H}}$ is a Toeplitz structured matrix for any domain and any channel parameter configuration (cf. (5)-(8)), it is Toeplitz in expectation, concluding the proof.
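As an illustration of the last step, consider only the spatial receiver domain and assume, purely for this example, a half-wavelength uniform linear array with steering vector entries $[\bm{a}_{\mathrm{R},\ell}]_{n}=\mathrm{e}^{-\mathrm{j}\pi n\sin\theta_{\ell}^{(\mathrm{R})}}$; the explicit form and sign convention are assumptions here, and the definitions actually used are those in (5)-(8). The outer product then has the entries

$$\left[\bm{a}_{\mathrm{R},\ell}\bm{a}_{\mathrm{R},\ell}^{\mathrm{H}}\right]_{m,n}=\mathrm{e}^{-\mathrm{j}\pi m\sin\theta_{\ell}^{(\mathrm{R})}}\,\mathrm{e}^{\mathrm{j}\pi n\sin\theta_{\ell}^{(\mathrm{R})}}=\mathrm{e}^{-\mathrm{j}\pi(m-n)\sin\theta_{\ell}^{(\mathrm{R})}},$$

which depend on the indices only through the difference $m-n$. With the cross terms $\ell\neq\ell^{\prime}$ removed, the conditional CM in this spatial-only case becomes

$$\left[\mathbb{E}_{\bm{H}|\bm{z}}[\bm{H}\bm{H}^{\mathrm{H}}]\right]_{m,n}=\mathbb{E}_{\bm{\Xi}|\bm{z}}\left[\sum_{\ell=1}^{L}p_{\ell}\,\mathrm{e}^{-\mathrm{j}\pi(m-n)\sin\theta_{\ell}^{(\mathrm{R})}}\right],$$

which is again a function of $m-n$ only, i.e., the matrix is Toeplitz regardless of the distribution of $\bm{\Xi}$ given $\bm{z}$.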

References

  • [1] D. Tse and P. Viswanath, Fundamentals of wireless communication.   USA: Cambridge University Press, 2005.
  • [2] E. Björnson, J. Hoydis, and L. Sanguinetti, “Massive MIMO networks: Spectral, energy, and hardware efficiency,” Foundations and Trends® in Signal Processing, vol. 11, no. 3-4, pp. 154–655, 2017.
  • [3] E. Vagenas, G. S. Paschos, and S. A. Kotsopoulos, “Beamforming capacity optimization for MISO systems with both mean and covariance feedback,” IEEE Trans. Wireless Commun., vol. 10, no. 9, pp. 2994–3001, 2011.
  • [4] B. Böck, D. Semmler, B. Fesl, M. Baur, and W. Utschick, “Gohberg-Semencul estimation of Toeplitz structured covariance matrices and their inverses,” 2023, arXiv:2311.14995.
  • [5] P. Bello, “Characterization of randomly time-variant linear channels,” IEEE Trans. on Commun. Syst., vol. 11, no. 4, pp. 360–393, 1963.
  • [6] X. Yin and X. Cheng, Propagation Channel Characterization, Parameter Estimation, and Modeling for Wireless Communication.   Wiley-IEEE Press, November 2016.
  • [7] J. A. Zhang, F. Liu, C. Masouros, R. W. Heath, Z. Feng, L. Zheng, and A. Petropulu, “An overview of signal processing techniques for joint communication and radar sensing,” IEEE J. Sel. Topics Signal Process., vol. 15, no. 6, pp. 1295–1315, 2021.
  • [8] C. Studer, S. Medjkouh, E. Gonultaş, T. Goldstein, and O. Tirkkonen, “Channel charting: Locating users within the radio environment using channel state information,” IEEE Access, vol. 6, pp. 47 682–47 698, 2018.
  • [9] P. Almers, E. Bonek, A. G. Burr, N. Czink, M. Debbah, V. Degli‐Esposti, H. Hofstetter, P. Kyösti, D. I. Laurenson, G. Matz, A. F. Molisch, C. Oestges, and H. Ozcelik, “Survey of channel and radio propagation models for wireless MIMO systems,” EURASIP J. Wireless Commun. Netw., vol. 2007, pp. 1–19, 2007.
  • [10] 3GPP, “Study on channel model for frequencies from 0.5 to 100 GHz (release 16),” 3rd Generation Partnership Project (3GPP), Tech. Rep. TR 38.901 version 16.1.0 Release 16, 2020.
  • [11] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques.   MIT Press, Cambridge, 2009.
  • [12] Z. Feng, Z. Wei, X. Chen, H. Yang, Q. Zhang, and P. Zhang, “Joint communication, sensing, and computation enabled 6G intelligent machine system,” IEEE Network, vol. 35, no. 6, pp. 34–42, 2021.
  • [13] Y. Yang, Y. Li, W. Zhang, F. Qin, P. Zhu, and C.-X. Wang, “Generative-adversarial-network-based wireless channel modeling: Challenges and opportunities,” IEEE Commun. Mag., vol. 57, no. 3, pp. 22–27, 2019.
  • [14] M. Baur, B. Fesl, and W. Utschick, “Leveraging variational autoencoders for parameterized MMSE estimation,” 2024, arXiv:2307.05352.
  • [15] S. Jaeckel, L. Raschkowski, K. Börner, and L. Thiele, “QuaDRiGa: A 3-D multi-cell channel model with time evolution for enabling virtual field trials,” IEEE Trans. Antennas Propag., vol. 62, no. 6, pp. 3242–3256, 2014.
  • [16] K. Grigoriadis, A. Frazho, and R. Skelton, “Application of alternating convex projection methods for computation of positive Toeplitz matrices,” IEEE Trans. Signal Process., vol. 42, no. 7, pp. 1873–1875, 1994.
  • [17] B. Fesl, N. Turan, B. Böck, and W. Utschick, “Channel estimation for quantized systems based on conditionally Gaussian latent models,” IEEE Trans. Signal Process., pp. 1–16, 2024.
  • [18] N. Turan, B. Fesl, and W. Utschick, “Enhanced low-complexity FDD system feedback with variable bit lengths via generative modeling,” in 2023 57th Asilomar Conf. Signals, Syst., Comput., 2023, pp. 363–369.
  • [19] N. Turan, B. Fesl, M. Koller, M. Joham, and W. Utschick, “A versatile low-complexity feedback scheme for FDD systems via generative modeling,” IEEE Trans. Wireless Commun., pp. 1–1, 2023.
  • [20] N. X. Vinh, J. Epps, and J. Bailey, “Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance,” J. Mach. Learn. Res., vol. 11, pp. 2837–2854, Dec. 2010.