Wideband Beamforming for RIS Assisted Near-Field Communications

Ji Wang, Jian Xiao, Yixuan Zou, Wenwu Xie, and Yuanwei Liu
Abstract

A near-field wideband beamforming scheme is investigated for reconfigurable intelligent surface (RIS) assisted multiple-input multiple-output (MIMO) systems, in which a deep learning-based end-to-end (E2E) optimization framework is proposed to maximize the system spectral efficiency. To deal with the near-field double beam split effect, the base station is equipped with a frequency-dependent hybrid precoding architecture by introducing sub-connected true time delay (TTD) units, while two specific RIS architectures, namely true time delay-based RIS (TTD-RIS) and virtual subarray-based RIS (SA-RIS), are exploited to realize frequency-dependent passive beamforming at the RIS. Furthermore, efficient E2E beamforming models that do not require explicit channel state information are proposed, which jointly exploit the uplink channel training module and the downlink wideband beamforming module. In the proposed network architecture of the E2E models, classical communication signal processing methods, i.e., polarized filtering and sparsity transform, are leveraged to develop a signal-guided beamforming network. Numerical results show that the proposed E2E models achieve superior beamforming performance and robustness compared with conventional beamforming benchmarks. Furthermore, the tradeoff between the beamforming gain and the hardware complexity is investigated for different frequency-dependent RIS architectures, in which the TTD-RIS can achieve better spectral efficiency than the SA-RIS at the expense of additional energy consumption and hardware cost.

Index Terms:
Deep learning, near-field communications, reconfigurable intelligent surface, wideband beamforming.

I Introduction

The sixth-generation (6G) wireless networks aim to deliver higher throughput, achieve massive connectivity, and enhance energy efficiency. To accomplish these objectives, extremely large scale antenna arrays (ELAAs) and extremely high frequencies form a pair of prospective technological solutions. In particular, as a new type of metamaterial antenna, reconfigurable intelligent surface (RIS) technology has been regarded as one of the most anticipated candidate ELAA solutions to construct a smart radio environment [1, 2]. In this case, the near-field boundary in 6G communications will be significantly extended due to the increase of the Rayleigh distance, which is positively correlated with the array aperture and the communication frequency [3]. Considering the large bandwidth available at high frequencies, e.g., millimeter wave (mmWave) and Terahertz (THz), near-field wideband RIS communications are becoming a promising communication paradigm in the 6G era [4].

In near-field wideband RIS systems, new electromagnetic (EM) characteristics need to be considered compared to classic far-field narrowband systems. Firstly, in contrast to the planar wavefront assumption in far-field channel modeling, the near-field channel involves both angle and distance dimensions due to the spherical wavefront of near-field radiation, which results in near-field beamfocusing instead of far-field beamsteering [5, 6]. Secondly, near-field wideband channels can be strongly frequency-dependent due to the large frequency separation between subcarriers. However, for the popular hybrid beamforming architecture in ELAA systems, the typical analog beamformer is frequency-independent. Consequently, the beams generated at different frequencies may be focused at different locations, which is termed the beam split effect [7]. In particular, in RIS-enabled wideband communications, since the reflection units at the RIS only carry out passive phase shifting, the impinging and reflected beams at the RIS are also split into different physical directions at different frequencies. In this case, the frequency-independent analog precoding at the base station (BS) and the frequency-independent phase shifting at the RIS jointly cause the unique double beam split effect [8]. Consequently, efficient frequency-dependent beamforming architectures and optimization schemes urgently need to be investigated for near-field wideband RIS systems.

I-A Prior works

I-A1 Wideband RIS Communications

To deal with the beam split effect in RIS-aided mmWave/THz systems, distributed RISs and delay-adjustable RISs are two feasible solutions. In [9], a distributed RIS deployment strategy was proposed to relieve the beam split effect, which incurs a high deployment cost and still relies on the frequency-independent phase shifting architecture [10]. To overcome this limitation of the analog phase shifting circuit at the RIS, the true time delay (TTD) module proposed for the wideband hybrid precoder architecture was extended to the classic RIS architecture. Specifically, in [11], the RIS elements realize frequency-selective operation by introducing TTD units, and a sub-surface RIS architecture was provided to balance the power gain and the hardware cost. Furthermore, the authors of [8] proposed a sub-connected phase-delay-phase RIS architecture to provide an energy-efficient implementation of the frequency-dependent wideband phase shifting scheme. Moreover, the emerging simultaneously transmitting and reflecting RIS (STAR-RIS) architecture has been explored in THz wideband communications [12, 13].

I-A2 Near-Field Wideband ELAA Communications

In near-field wideband communications, the intricacies of near-field beamfocusing and wideband beam split effects are deeply coupled, significantly complicating beamforming optimization efforts. The authors of [14] investigated the near-field beam split phenomenon for conventional ELAA systems, in which a phase-delay focusing approach was proposed to improve effective beamforming gain. Furthermore, in [15], two distinct TTD configurations, i.e., a serial TTD and a hybrid serial-parallel TTD, were proposed to address the spatial-wideband effect in near-field ELAA systems. In [16], focusing on holographic metasurface antennas-based ELAA systems, a multi-user beam combining optimization framework was proposed, which accounted for both the near-field and dual-wideband effects in holographic communication systems.

I-A3 Near-Field Wideband RIS Communications

For RIS-enabled near-field wideband systems, the authors of [17] proposed an RIS configuration approach to maximize the communication rate, which effectively mitigates the beamforming losses caused by the near-field beam split effect. In [18], the delay-adjustable metasurface technique in [19], which can adjust the delays of signals reflected by different RIS elements, was applied to alleviate the beam split effect. However, the fully-connected TTD module in [18] leads to excessive hardware cost and power consumption, since the number of delay units has to be equal to the large number of RIS elements. Considering the uplink achievable rate optimization in THz RIS systems, the authors of [20] divided the RIS into multiple virtual subarrays, in which the phase shift of each subarray was optimized according to the corresponding subcarrier channel. In this way, the RIS was endowed with the ability to carry out frequency-dependent passive beamforming at different sub-bands. However, the effective aperture of the RIS shrinks under the virtual subarray architecture, which results in a significant energy loss of the received signals at the RIS and hence restricts the system performance.

I-B Motivations and Contributions

While a few research efforts have been devoted to investigating near-field wideband RIS systems, necessary prior assumptions, such as a known array manifold [18] and a line-of-sight (LOS)-dominant channel [20], were required for the RIS phase shifting derivation. In addition, the aforementioned works only focus on the phase shifting design at the RIS, while the BS was assumed to be equipped with a single antenna or a predetermined fully-digital precoder. Consequently, a comprehensive solution remains to be investigated for the joint passive and active beamforming optimization, which involves the coupled non-convex optimization of the wideband hybrid beamforming at the BS and the phase shifting at the RIS. Moreover, when the RIS and the BS are equipped with large-scale antenna arrays, the required high-dimensional channel acquisition is also an intractable challenge for the beamforming optimization.

Against the above background, in this work, we investigate the near-field wideband beamforming design for RIS-aided multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. Our main contributions are summarized as follows.

• We investigate frequency-dependent hybrid precoding and phase shifting architectures for near-field wideband RIS systems, aiming to alleviate the beamforming performance loss caused by the near-field double beam split effect. Specifically, the BS is equipped with a frequency-dependent hybrid precoding architecture by introducing sub-connected TTD units, to deal with the near-field beam split effect at the BS. Furthermore, considering the wideband beam split effect at the RIS, two specific RIS architectures, namely true time delay-based RIS (TTD-RIS) and virtual subarray-based RIS (SA-RIS), are exploited to realize frequency-dependent passive beamforming at the RIS.

• We propose a deep learning-based end-to-end (E2E) beamforming optimization framework to maximize the effective spectral efficiency in RIS-aided MIMO-OFDM systems. The proposed E2E model is composed of the uplink channel training (UL-CT) module and the downlink beamforming (DL-BF) module, in which the learnable combining matrix at the BS and phase shifting at the RIS are designed to realize the joint optimization of beamforming and channel estimation with limited pilot overhead. In contrast to the pre-defined combining matrix and reflection pattern in traditional channel estimators, the combining matrix and phase shifting in the proposed UL-CT module can be adaptively tuned according to dynamic wireless environments.

• We exploit an efficient signal-guided beamforming network architecture based on the proposed E2E optimization framework, which integrates advanced neural network architectures and classical communication signal processing methods. Specifically, in the proposed UL-CT module, we design a polar attention architecture to imitate the typical communication signal filtering in the frequency domain and time-spatial domain, which can finely learn effective latent channel semantic information from the received pilots. Motivated by the natural channel sparsity for high-frequency ELAA systems, a learnable discrete Fourier transform (DFT) is introduced into the proposed DL-BF module, which guides and accelerates the convergence of the beamforming network.

• Our numerical results reveal that a superior beamforming performance can be achieved by the proposed E2E models over the conventional beamforming benchmarks. Specifically, compared to the conventional hybrid precoding and the classic RIS architecture, the proposed TTD-RIS and SA-RIS can effectively mitigate the near-field double beam split effect. Furthermore, the proposed E2E models can jointly optimize the active and passive beamforming with the implicit CSI, which reduces the required training overhead and improves the effective spectral efficiency. Moreover, the robustness and generalization of the proposed E2E models are evaluated under various system setups.

I-C Organizations and Notations

The remainder of this paper is organized as follows. Section II introduces the near-field wideband channel modeling and system model in RIS assisted MIMO-OFDM systems. In Section III, the deep learning-based near-field wideband beamforming framework is proposed. Furthermore, the signal-guided network architecture is presented in Section IV. Section V provides numerical results of the proposed E2E models. In Section VI, this paper is comprehensively summarized.

Notations: Lower-case and upper-case boldface letters denote a vector and a matrix, respectively; $\mathbf{A}^{T}$ and $\mathbf{A}^{H}$ denote the transpose and conjugate transpose of matrix $\mathbf{A}$, respectively; $a^{*}$ denotes the conjugate of complex number $a$; $\mathrm{diag}(\mathbf{a})$ denotes the diagonal matrix with the vector $\mathbf{a}$ on its diagonal; $\mathbf{I}_{a}$ is an $a\times a$ identity matrix, while $\mathbf{1}_{a}$ is an $a\times 1$ vector satisfying $\mathbf{1}_{i}=1,\forall i=\{1,\ldots,a\}$; symbols $|\cdot|$, $\left\|\cdot\right\|$, and $\left\|\cdot\right\|_{F}$ denote the $\ell_{1}$, $\ell_{2}$, and Frobenius norm, respectively; $\propto$ denotes the proportionality relation; $\odot$ and $\otimes$ denote the Hadamard product and Kronecker product, respectively; $\Re(\mathbf{A})$ and $\Im(\mathbf{A})$ denote the real and imaginary components of the complex-valued matrix $\mathbf{A}$; $\mathrm{det}(\mathbf{A})$ denotes the determinant of the matrix $\mathbf{A}$; symbol $\mathrm{j}$ denotes the imaginary unit, and $\left\lfloor\cdot\right\rfloor$ denotes the round-down (floor) operation.

II System Model and Problem Formulation

Figure 1: Near-field wideband systems assisted by different frequency-dependent RIS architectures. (a) The TTD-RIS architecture; (b) The SA-RIS architecture.

As shown in Fig. 1, we consider an $N$-element RIS-assisted mmWave MIMO-OFDM system with $M$ transmit and $U$ receive antennas. Both the transmit antennas at the BS and the receive antennas at the user equipment (UE) are arranged in uniform linear arrays (ULAs), while the reflection elements at the RIS are arranged in a uniform planar array (UPA), i.e., $N=N_{1}\times N_{2}$. Considering the high susceptibility of mmWave signals to blockage in practical complex wireless environments, we assume that the LOS path between the BS and the UE is completely blocked, while the non-LOS (NLOS) paths are formed by a limited number of clustered scatterers. To enhance mmWave communications, the RIS is introduced to provide a virtual LOS path between the BS and the UE.

To deal with the double beam split effect caused by the coupling of the wideband beam split at the BS and at the RIS, we introduce frequency-dependent beamforming architectures. Firstly, a sub-connected TTD module is introduced into the hybrid beamforming architecture at the BS, which is composed of $K$ TTDs and $M_{\text{RF}}\ll M$ radio frequency (RF) chains. Specifically, each RF chain is connected to $K$ TTDs, and each TTD is then connected to $P=M/K$ phase shifters (PSs). By utilizing the TTD module, the traditional analog beamforming evolves into frequency-dependent analog beamforming at the BS. Then, for the wideband RIS architecture, we provide two feasible candidate solutions, i.e., the sub-connected TTD-RIS in Fig. 1(a) and the virtual SA-RIS architecture in Fig. 1(b), whose detailed design guidelines are presented in Section II-C.
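The benefit of inserting TTDs in front of the phase shifters can be illustrated with a small numerical sketch. The code below is not the architecture or optimization method of this paper: it assumes a far-field ULA with half-wavelength spacing, a single RF chain, and a simple delay rule in which the $k$-th TTD compensates the propagation delay of its subarray while the PSs are tuned at the center frequency. All function names and parameter values are hypothetical.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def steering(f, theta, num_ant, d):
    """Far-field ULA steering vector at frequency f (simplifying assumption)."""
    m = np.arange(num_ant)
    return np.exp(-1j * 2 * np.pi * f * m * d * np.sin(theta) / C)


def ps_only_weights(f_c, theta, num_ant, d):
    """Frequency-independent phase shifters matched to the center frequency."""
    return steering(f_c, theta, num_ant, d)


def ttd_ps_weights(f, f_c, theta, num_ant, num_ttd, d):
    """Sub-connected TTD + PS weights: K TTDs, each feeding P = M / K PSs."""
    p_per_ttd = num_ant // num_ttd
    k, p = np.divmod(np.arange(num_ant), p_per_ttd)
    # Assumed delay rule: TTD k removes the delay of the first element of subarray k.
    delays = k * p_per_ttd * d * np.sin(theta) / C
    ttd_phase = np.exp(-1j * 2 * np.pi * f * delays)  # frequency-dependent part
    ps_phase = np.exp(-1j * 2 * np.pi * f_c * p * d * np.sin(theta) / C)  # fixed part
    return ttd_phase * ps_phase


def normalized_gain(weights, response):
    """Normalized array gain |w^H a| / M, equal to 1 for a perfectly matched beam."""
    return np.abs(np.vdot(weights, response)) / len(response)


# Hypothetical setup: M = 256 antennas, K = 16 TTDs, 100 GHz carrier, 10 GHz bandwidth.
M, K, f_c, W, theta = 256, 16, 100e9, 10e9, np.pi / 3
d = C / (2 * f_c)
f_edge = f_c - W / 2  # lowest frequency of the band

a_edge = steering(f_edge, theta, M, d)
gain_ps = normalized_gain(ps_only_weights(f_c, theta, M, d), a_edge)
gain_ttd = normalized_gain(ttd_ps_weights(f_edge, f_c, theta, M, K, d), a_edge)
gain_center = normalized_gain(ps_only_weights(f_c, theta, M, d), steering(f_c, theta, M, d))
```

Under these assumptions, the PS-only beam loses most of its gain at the band edge, whereas the TTD-assisted beam only suffers the residual mismatch inside each $P$-element subarray.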

II-A Near-Field Boundary in RIS-Aided MIMO Systems

In near-field communications, the Rayleigh distance $R$ is a widely used criterion to define the near-field boundary, which is given by [21]

$$R=\frac{2D^{2}}{\lambda}, \tag{1}$$

where $D$ and $\lambda$ denote the antenna aperture and the wavelength, respectively. In multiple-input single-output (MISO) systems, the near-field boundary can be defined according to (1). As the array aperture and the communication frequency increase, the near-field region is significantly extended.

In particular, for RIS-aided MIMO communications, the near-field criterion is different for the BS$\to$UE direct link and the BS$\to$RIS$\to$UE cascaded link due to different EM characteristics. For the BS$\to$UE MIMO communications, the near-field boundary is given by [3]

$$R=\frac{2(D^{\text{B}}+D^{\text{U}})^{2}}{\lambda}, \tag{2}$$

where $D^{\text{B}}$ and $D^{\text{U}}$ denote the antenna aperture of the BS and the UE, respectively.

In contrast to MISO/MIMO channels in conventional communication systems, the cascaded BS$\to$RIS$\to$UE reflection link comprises the BS$\to$RIS and RIS$\to$UE links in RIS systems. According to the near-field criterion in [3], the near-field region in RIS systems can be expressed as

$$\frac{r_{1}r_{2}}{r_{1}+r_{2}}<\frac{2D^{2}}{\lambda}, \tag{3}$$

where $r_{1}$ and $r_{2}$ denote the BS$\to$RIS and RIS$\to$UE distance, respectively. We observe that the near-field region in RIS systems is determined by $\frac{r_{1}r_{2}}{r_{1}+r_{2}}$, i.e., half the harmonic mean of $r_{1}$ and $r_{2}$. Since this quantity is smaller than both $r_{1}$ and $r_{2}$, near-field communications occur in RIS systems as long as either $r_{1}$ or $r_{2}$ is shorter than the Rayleigh distance, which is due to the passive property of the RIS.
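As a quick numerical check of the criteria in (1)-(3), the following minimal sketch evaluates the Rayleigh distance and the cascaded-link condition. The function names and the example values (a 0.5 m aperture at 100 GHz) are illustrative assumptions, and $D$ in (3) is taken here to be the RIS aperture.

```python
C = 299_792_458.0  # speed of light in m/s


def rayleigh_distance(aperture_m, freq_hz):
    """Eq. (1): R = 2 D^2 / lambda."""
    wavelength = C / freq_hz
    return 2.0 * aperture_m ** 2 / wavelength


def mimo_rayleigh_distance(aperture_bs_m, aperture_ue_m, freq_hz):
    """Eq. (2): R = 2 (D_B + D_U)^2 / lambda for the direct BS-to-UE link."""
    return rayleigh_distance(aperture_bs_m + aperture_ue_m, freq_hz)


def ris_link_is_near_field(r1_m, r2_m, aperture_ris_m, freq_hz):
    """Eq. (3): r1 r2 / (r1 + r2) < 2 D^2 / lambda for the cascaded link."""
    return r1_m * r2_m / (r1_m + r2_m) < rayleigh_distance(aperture_ris_m, freq_hz)


# Hypothetical example: 0.5 m aperture at 100 GHz gives R of roughly 167 m.
R = rayleigh_distance(0.5, 100e9)

# A UE 20 m from the RIS is in the near field even if the BS is 500 m away.
near = ris_link_is_near_field(500.0, 20.0, 0.5, 100e9)

# With both links at 1000 m, the left-hand side of (3) is 500 m, above R.
far = ris_link_is_near_field(1000.0, 1000.0, 0.5, 100e9)
```

The second call illustrates the observation above: a single short hop is enough to place the cascaded link in the near-field region.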

II-B Near-Field Wideband Channel Model

Considering the three-dimensional Cartesian coordinate system in Fig. 2, we assume that both the BS and the RIS lie on the $xz$-plane, with array center coordinates $\mathbf{c}^{\text{B}}=\left(x^{\text{B}},y^{\text{B}},z^{\text{B}}\right)$ and $\mathbf{c}^{\text{R}}=\left(x^{\text{R}},y^{\text{R}},z^{\text{R}}\right)$, respectively. The UE, with coordinate $\mathbf{c}^{\text{U}}=\left(x^{\text{U}},y^{\text{U}},z^{\text{U}}\right)$, is randomly distributed around the RIS. Let $\Delta m$, $\Delta u$, and $\Delta n=\Delta n_{1}=\Delta n_{2}$ denote the distance between two adjacent antennas (elements) at the BS, the UE, and the RIS, respectively.
Generally, the antenna spacing is set to $d=\Delta m=\Delta u=\Delta n=\frac{c}{2f_{c}}$ in large-scale array communications, where $c$ and $f_{c}$ denote the speed of light and the central carrier frequency, respectively. Hence, the coordinate of BS antenna $m$ is $\mathbf{c}^{\text{B}}_{m}=\left(x^{\text{B}}+\left(m-\frac{M+1}{2}\right)d,\,y^{\text{B}},\,z^{\text{B}}\right)$. Accordingly, the coordinate of UE antenna $u$ is $\mathbf{c}^{\text{U}}_{u}=\left(x^{\text{U}}+\left(u-\frac{U+1}{2}\right)d,\,y^{\text{U}},\,z^{\text{U}}\right)$.
For the UPA-based RIS, the coordinate of the RIS element $\left(n_{1},n_{2}\right)$ is $\mathbf{c}^{\text{R}}_{n_{1},n_{2}}=\left(x^{\text{R}}+\left(n_{1}-\frac{N_{1}+1}{2}\right)d,\,y^{\text{R}},\,z^{\text{R}}+\left(n_{2}-\frac{N_{2}+1}{2}\right)d\right)$.
In this work, we consider clustered scatterer propagation environments [22], in which the coordinate of scatterer $s$ $\left(1\leq s\leq S_{c}\right)$ in cluster $c$ $\left(1\leq c\leq C_{\text{s}}\right)$ is denoted as $\mathbf{c}^{\text{S}}_{c,s}=\left(x^{\text{S}}_{c,s},y^{\text{S}}_{c,s},z^{\text{S}}_{c,s}\right)$.
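The element coordinates defined above can be generated directly from the array centers. The following NumPy sketch is only an illustration of the stated geometry; the function names and the example centers and array sizes are assumptions, not values from this paper.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def ula_coords(center, num, d):
    """Coordinates of a ULA along the x-axis, centred at `center`.

    Element m (1-based) sits at x + (m - (num + 1) / 2) * d.
    """
    idx = np.arange(1, num + 1)
    coords = np.tile(np.asarray(center, dtype=float), (num, 1))
    coords[:, 0] += (idx - (num + 1) / 2) * d
    return coords


def upa_coords(center, n1, n2, d):
    """Coordinates of an n1 x n2 UPA on the xz-plane, centred at `center`.

    Element (i1, i2) is offset by (i1 - (n1 + 1) / 2) * d along x and
    (i2 - (n2 + 1) / 2) * d along z; rows are ordered with i2 varying fastest.
    """
    off_x = (np.arange(1, n1 + 1) - (n1 + 1) / 2) * d
    off_z = (np.arange(1, n2 + 1) - (n2 + 1) / 2) * d
    gx, gz = np.meshgrid(off_x, off_z, indexing="ij")
    coords = np.tile(np.asarray(center, dtype=float), (n1 * n2, 1))
    coords[:, 0] += gx.ravel()
    coords[:, 2] += gz.ravel()
    return coords


# Hypothetical setup: 100 GHz carrier with half-wavelength spacing d = c / (2 f_c).
f_c = 100e9
d = C / (2 * f_c)
bs = ula_coords((0.0, 0.0, 0.0), 8, d)      # shape (8, 3)
ris = upa_coords((1.0, 0.0, 2.0), 4, 3, d)  # shape (12, 3)
```

Because the index offsets are symmetric about $\frac{M+1}{2}$ (and likewise for the RIS), the mean of the element coordinates coincides with the array center.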

According to the wideband ray-tracing-based channel model [23], the BS$\to$UE NLOS channel $\mathbf{D}[\tau]\in\mathbb{C}^{U\times M}$ at the $\tau$-th delay can be expressed as

$$\mathbf{D}[\tau]=\gamma^{\text{BU}}\sum\limits_{c=1}^{C_{\text{s}}}\sum\limits_{s=1}^{S_{c}}\varsigma_{c,s}^{\text{BU}}\sqrt{G_{\text{B}}G_{\text{U}}L_{c,s}^{\text{BU}}}\,\mathbf{u}^{\text{BU}}_{c,s}(\mathbf{a}_{c,s}^{\text{BU}})^{T}\delta(\tau T_{s}-\tau_{c,s}^{\text{BU}}), \tag{4}$$

where $\gamma^{\text{BU}}=\sqrt{\frac{1}{\sum\nolimits_{c=1}^{C_{\text{s}}}S_{c}}}$ is a normalization factor, $\varsigma_{c,s}^{\text{BU}}\sim\mathcal{CN}(0,1)$ is the complex gain, and $L_{c,s}^{\text{BU}}$ is the path loss for scatterer $(c,s)$. Parameters $G_{\text{B}}$ and $G_{\text{U}}$ denote the antenna gain at the transmitter and the receiver, respectively. The function $\delta(\tau)$ denotes the Dirac delta function for $T_{s}$-spaced signaling evaluated at $\tau$ seconds.
$\mathbf{a}_{c,s}^{\text{BU}}\in\mathbb{C}^{M\times 1}$ denotes the transmitting array response at the BS, and $\mathbf{u}_{c,s}^{\text{BU}}\in\mathbb{C}^{U\times 1}$ represents the receiving array response at the UE. Parameter $\tau_{c,s}^{\text{BU}}$ denotes the path delay of scatterer $(c,s)$.

Suppose system bandwidth W𝑊Witalic_W is divided into B𝐵Bitalic_B orthogonal subcarriers in OFDM systems, the communication frequency at subcarrier b𝑏bitalic_b can be expressed as fb=fc+W(2b1B)2B,(1bB)subscript𝑓𝑏subscript𝑓𝑐𝑊2𝑏1𝐵2𝐵1𝑏𝐵f_{b}=f_{c}+\frac{W(2b-1-B)}{2B},(1\leq b\leq B)italic_f start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT = italic_f start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT + divide start_ARG italic_W ( 2 italic_b - 1 - italic_B ) end_ARG start_ARG 2 italic_B end_ARG , ( 1 ≤ italic_b ≤ italic_B ). In this work, we adopt the uniform spherical wave model to characterize the near-field wideband array response vector [5]. The ULA array response at subcarrier frequency fbsubscript𝑓𝑏f_{b}italic_f start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT for BS\toscatterer (c,s)𝑐𝑠(c,s)( italic_c , italic_s ) is given by

𝐚(fb,ϕc,sBU,rc,sBU)=ej2πfbc(mdcosϕc,sBU+m2d2(sinϕc,sBU)22rc,sBU),𝐚subscript𝑓𝑏superscriptsubscriptitalic-ϕ𝑐𝑠BUsubscriptsuperscript𝑟BU𝑐𝑠superscript𝑒𝑗2𝜋subscript𝑓𝑏𝑐𝑚𝑑superscriptsubscriptitalic-ϕ𝑐𝑠BUsuperscript𝑚2superscript𝑑2superscriptsuperscriptsubscriptitalic-ϕ𝑐𝑠BU22subscriptsuperscript𝑟BU𝑐𝑠\displaystyle\mathbf{a}(f_{b},\phi_{c,s}^{\text{BU}},r^{\text{BU}}_{c,s})=e^{-% j\frac{2\pi f_{b}}{c}\left(-md\cos\phi_{c,s}^{\text{BU}}+\frac{{{m}}^{2}d^{2}(% \sin\phi_{c,s}^{\text{BU}})^{2}}{2r^{\text{BU}}_{c,s}}\right)},bold_a ( italic_f start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT , italic_ϕ start_POSTSUBSCRIPT italic_c , italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT BU end_POSTSUPERSCRIPT , italic_r start_POSTSUPERSCRIPT BU end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_s end_POSTSUBSCRIPT ) = italic_e start_POSTSUPERSCRIPT - italic_j divide start_ARG 2 italic_π italic_f start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT end_ARG start_ARG italic_c end_ARG ( - italic_m italic_d roman_cos italic_ϕ start_POSTSUBSCRIPT italic_c , italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT BU end_POSTSUPERSCRIPT + divide start_ARG italic_m start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_d start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_sin italic_ϕ start_POSTSUBSCRIPT italic_c , italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT BU end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 2 italic_r start_POSTSUPERSCRIPT BU end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_c , italic_s end_POSTSUBSCRIPT end_ARG ) end_POSTSUPERSCRIPT , (5)

where $\phi_{c,s}^{\text{BU}}$ denotes the azimuth angle of departure (AoD) for scatterer $(c,s)$ at the BS, and $r^{\text{BU}}_{c,s}=\left\|\mathbf{c}^{\text{B}}-\mathbf{c}^{\text{S}}_{c,s}\right\|$ is the distance between the BS and scatterer $(c,s)$ in the BS$\to$UE link.
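As a minimal numerical sketch of the subcarrier frequencies and of (5), the snippet below evaluates the near-field ULA response at the lowest and highest subcarriers. The function names, the parameter values, the value of the speed of light, and the centred antenna index $m\in\{-\frac{M-1}{2},\ldots,\frac{M-1}{2}\}$ are assumptions made for illustration only.

```python
import cmath
import math

C_LIGHT = 3.0e8  # speed of light in m/s (assumed value)


def subcarrier_frequency(f_c, W, B, b):
    """Frequency of subcarrier b (1-indexed): f_b = f_c + W(2b-1-B)/(2B)."""
    return f_c + W * (2 * b - 1 - B) / (2 * B)


def near_field_ula_response(f_b, phi, r, M, d):
    """Near-field ULA response following (5), one entry per antenna.

    Assumption: m runs over the centred indices -(M-1)/2, ..., (M-1)/2.
    """
    k = 2 * math.pi * f_b / C_LIGHT
    response = []
    for i in range(M):
        m = i - (M - 1) / 2
        # Linear (angle) term plus quadratic (distance) term of the path difference.
        path = -m * d * math.cos(phi) + (m * d * math.sin(phi)) ** 2 / (2 * r)
        response.append(cmath.exp(-1j * k * path))
    return response


# Hypothetical parameters, chosen only for illustration.
f_c, W, B = 100e9, 10e9, 8
d = C_LIGHT / f_c / 2  # half-wavelength spacing at the carrier frequency
freqs = [subcarrier_frequency(f_c, W, B, b) for b in range(1, B + 1)]
a_first = near_field_ula_response(freqs[0], math.pi / 3, 5.0, 16, d)
a_last = near_field_ula_response(freqs[-1], math.pi / 3, 5.0, 16, d)
```

Because the phase in (5) scales with $f_b$, `a_first` and `a_last` differ for the same angle and distance, which is the frequency dependence that the wideband beamformer has to compensate.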

Since the number of antennas at the UE is generally limited, far-field radiation with a planar wavefront can be directly adopted to model the channel between scatterer $(c,s)$ and the UE in the BS$\to$UE link. In this case, the ULA array response $\mathbf{u}$ for scatterer $(c,s)$ is given by

$$
\mathbf{u}(f_b,\vartheta_{c,s}^{\text{BU}})=\left[1,e^{-j\frac{2\pi f_b}{c}d\sin\vartheta_{c,s}^{\text{BU}}},\ldots,e^{-j\frac{2\pi f_b}{c}(U-1)d\sin\vartheta_{c,s}^{\text{BU}}}\right]^{T}, \tag{6}
$$

where $\vartheta_{c,s}^{\text{BU}}$ denotes the angle of arrival (AoA) for scatterer $(c,s)$ at the UE.
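The far-field response in (6) can be sketched in a few lines. The function name and the numerical values below are hypothetical; the check at the end only illustrates that a planar wavefront produces a constant phase step between adjacent antennas.

```python
import cmath
import math

C_LIGHT = 3.0e8  # speed of light in m/s (assumed value)


def far_field_ula_response(f_b, theta, U, d):
    """Far-field (planar-wave) ULA response of (6) for a U-antenna UE."""
    k = 2 * math.pi * f_b / C_LIGHT
    return [cmath.exp(-1j * k * u * d * math.sin(theta)) for u in range(U)]


# Hypothetical values for illustration only.
f_b = 100e9
d = C_LIGHT / f_b / 2  # half-wavelength spacing
u_vec = far_field_ula_response(f_b, math.pi / 6, 4, d)

# Planar wavefront: the ratio of neighbouring entries is the same everywhere.
steps = [u_vec[i + 1] / u_vec[i] for i in range(len(u_vec) - 1)]
```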

In OFDM systems, the time-domain signal is transformed into the frequency domain by the DFT. Correspondingly, the frequency-domain BS$\to$UE NLOS channel $\mathbf{D}[b]\in\mathbb{C}^{U\times M}$ at subcarrier $b$ is given by [23]

$$
\mathbf{D}[b]=\gamma^{\text{BU}}\sum_{c=1}^{C_{\text{s}}}\sum_{s=1}^{S_c}\varsigma_{c,s}^{\text{BU}}\sqrt{G_{\text{B}}G_{\text{U}}L_{c,s}^{\text{BU}}}\,\mathbf{u}^{\text{BU}}_{c,s}(\mathbf{a}_{c,s}^{\text{BU}})^{T}e^{\frac{-j2\pi bW\tau_{c,s}^{\text{BU}}}{B}}. \tag{7}
$$
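To make the structure of (7) concrete, the following sketch accumulates one scaled outer product per scatterer. It is an illustration under stated assumptions: the double sum over clusters and subpaths is flattened into a single list, the array responses are passed in already evaluated at the subcarrier of interest, and the `paths` container, its keys, and the toy values are all hypothetical.

```python
import cmath
import math


def outer(u, a):
    """Outer product u a^T (no conjugation), returned as a list of rows."""
    return [[ui * aj for aj in a] for ui in u]


def nlos_channel(b, W, B, gamma, paths):
    """Frequency-domain NLOS channel of (7) at subcarrier b.

    `paths` is a hypothetical container: one dict per scatterer with keys
    'gain' (complex gain), 'amp' (sqrt(G_B * G_U * L)), 'u', 'a' and 'tau'.
    """
    U, M = len(paths[0]["u"]), len(paths[0]["a"])
    D = [[0j] * M for _ in range(U)]
    for p in paths:
        # Each path delay becomes a subcarrier-dependent phase rotation.
        delay_phase = cmath.exp(-2j * math.pi * b * W * p["tau"] / B)
        coeff = gamma * p["gain"] * p["amp"] * delay_phase
        ua = outer(p["u"], p["a"])
        for i in range(U):
            for j in range(M):
                D[i][j] += coeff * ua[i][j]
    return D


# Two toy scatterers with made-up responses (U = 2, M = 3).
paths = [
    {"gain": 1 + 0j, "amp": 0.5, "u": [1, 1j], "a": [1, -1, 1j], "tau": 0.0},
    {"gain": 0.5j, "amp": 0.2, "u": [1, -1], "a": [1, 1, 1], "tau": 2e-9},
]
D = nlos_channel(b=1, W=1e9, B=4, gamma=1.0, paths=paths)
```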

Figure 2: System layout for the near-field channel modeling in the BS$\to$RIS link.

The BS$\to$RIS channel $\mathbf{G}[\tau]=\mathbf{G}_{\text{LOS}}[\tau]+\mathbf{G}_{\text{NLOS}}[\tau]\in\mathbb{C}^{N\times M}$ is composed of a stable LOS path and clustered NLOS paths. Fig. 2 presents the near-field BS$\to$RIS channel model, in which the central antenna of the BS is set as the origin of the coordinate system. The LOS channel $\mathbf{G}_{\text{LOS}}[\tau]$ between the BS and the RIS at the $\tau$-th delay can be expressed as

$$
\mathbf{G}_{\text{LOS}}[\tau]=\bar{\varsigma}^{\text{BR}}_{\text{LOS}}\mathbf{b}_{\text{LOS}}^{\text{BR}}(\mathbf{a}_{\text{LOS}}^{\text{BR}})^{T}\odot\left(\mathbf{G}^{x}_{c}\otimes\mathbf{G}^{z}_{c}\right)\delta(\tau T_s-\tau_{\text{LOS}}^{\text{BR}}), \tag{8}
$$

where $\bar{\varsigma}^{\text{BR}}_{\text{LOS}}=\sqrt{G_{\text{B}}G_{\text{R}}L_{\text{LOS}}^{\text{BR}}}$ collects the transmit antenna gain, the RIS element gain, and the path loss, and $\tau_{\text{LOS}}^{\text{BR}}$ denotes the path delay of the LOS channel. $\mathbf{a}_{\text{LOS}}^{\text{BR}}\in\mathbb{C}^{M\times 1}$ and $\mathbf{b}_{\text{LOS}}^{\text{BR}}\in\mathbb{C}^{N\times 1}$ denote the transmitting array response at the BS and the receiving array response at the RIS for the BS$\to$RIS LOS path, respectively. Owing to space limitations, the definition of $\mathbf{a}_{\text{LOS}}^{\text{BR}}$ at the BS is omitted here; it follows the form of (5). The near-field UPA array response $\mathbf{b}_{\text{LOS}}^{\text{BR}}$ at the RIS is given by

$$
\mathbf{b}_{\text{LOS}}^{\text{BR}}=\mathbf{b}_{x}(f_b,\phi^{\text{R}}_{\text{LOS}},\varphi^{\text{R}}_{\text{LOS}},r^{\text{BR}})\otimes\mathbf{b}_{z}(f_b,\varphi^{\text{R}}_{\text{LOS}},r^{\text{BR}}), \tag{9a}
$$

$$
\left[\mathbf{b}_{x}\right]_{n_1}=e^{-j\frac{2\pi f_b}{c}\left(-\bar{n}_1 d\zeta^{\text{R}}_{\text{LOS}}+\frac{\bar{n}_1^{2}d^{2}\left(1-(\zeta^{\text{R}}_{\text{LOS}})^{2}\right)}{2r^{\text{BR}}_{c,s}}\right)}, \tag{9b}
$$

$$
\left[\mathbf{b}_{z}\right]_{n_2}=e^{-j\frac{2\pi f_b}{c}\left(-\bar{n}_2 d\cos\varphi_{c,s}^{\text{R}}+\frac{\bar{n}_2^{2}d^{2}\sin^{2}\varphi_{c,s}^{\text{R}}}{2r^{\text{BR}}_{c,s}}\right)}, \tag{9c}
$$

where $\phi^{\text{R}}_{\text{LOS}}$ and $\varphi^{\text{R}}_{\text{LOS}}$ denote the azimuth and elevation AoAs of the LOS path at the RIS, respectively. The remaining parameters are $\bar{n}_1=n_1-\frac{N_1+1}{2}$, $\bar{n}_2=n_2-\frac{N_2+1}{2}$, and $\zeta^{\text{R}}_{\text{LOS}}=\cos\phi^{\text{R}}_{\text{LOS}}\sin\varphi^{\text{R}}_{\text{LOS}}$, and $r^{\text{BR}}=\left\|\mathbf{c}^{\text{B}}-\mathbf{c}^{\text{R}}\right\|$ denotes the distance between the BS and the RIS.
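The Kronecker structure of (9a) to (9c) can be illustrated with the short sketch below, which builds the two per-axis responses and combines them into a length-$N_1N_2$ vector. The function names and numerical values are hypothetical, and a single distance `r` is used in both quadratic terms as a simplifying assumption.

```python
import cmath
import math

C_LIGHT = 3.0e8  # speed of light in m/s (assumed value)


def kron_vec(x, z):
    """Kronecker product of two vectors given as Python lists."""
    return [xi * zj for xi in x for zj in z]


def upa_response(f_b, azimuth, elevation, r, N1, N2, d):
    """Near-field UPA response b_x (kron) b_z in the form of (9a)-(9c)."""
    k = 2 * math.pi * f_b / C_LIGHT
    zeta = math.cos(azimuth) * math.sin(elevation)
    b_x = []
    for n1 in range(1, N1 + 1):
        nb = n1 - (N1 + 1) / 2  # centred index, as defined after (9c)
        path = -nb * d * zeta + nb**2 * d**2 * (1 - zeta**2) / (2 * r)
        b_x.append(cmath.exp(-1j * k * path))
    b_z = []
    for n2 in range(1, N2 + 1):
        nb = n2 - (N2 + 1) / 2
        path = (-nb * d * math.cos(elevation)
                + (nb * d * math.sin(elevation)) ** 2 / (2 * r))
        b_z.append(cmath.exp(-1j * k * path))
    return kron_vec(b_x, b_z)


# Hypothetical 3 x 3 RIS at 100 GHz, 3 m away from the source.
f_b = 100e9
d = C_LIGHT / f_b / 2
b_near = upa_response(f_b, math.pi / 4, math.pi / 3, 3.0, 3, 3, d)
```

With odd $N_1$ and $N_2$, the central element has $\bar{n}_1=\bar{n}_2=0$ and therefore a zero phase reference, which is a quick sanity check on the indexing.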
Note that the near-field LOS BS$\to$RIS channel in (8) includes an additional coupled component, i.e., $\mathbf{G}^{x}_{c}\otimes\mathbf{G}^{z}_{c}$, which can be expressed as

$$
\left[\mathbf{G}^{x}_{c}\right]_{n_2}=e^{-j\frac{2\pi f_b}{cr^{\text{BR}}}\bar{n}_2 d^{2}\left(1-\zeta^{2}_{\text{LOS}}\right)}, \tag{10a}
$$

$$
\left[\mathbf{G}^{z}_{c}\right]_{n_1,m}=e^{-j\frac{2\pi f_b}{cr^{\text{BR}}}\bar{n}_1 md^{2}\sin^{2}\varphi^{\text{BR}}_{\text{LOS}}}. \tag{10b}
$$

Owing to the above coupled component, near-field LOS MIMO channels exhibit more degrees of freedom (DoFs) than far-field LOS MIMO channels. Hence, these additional near-field DoFs can be exploited by constructing a virtual LOS path in RIS systems.
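A small numerical check can illustrate why the coupled component matters. Since $\mathbf{b}\mathbf{a}^{T}\odot\mathbf{G}_c=\text{diag}(\mathbf{b})\,\mathbf{G}_c\,\text{diag}(\mathbf{a})$ and the array responses have unit-modulus entries, the LOS channel has the same rank as the coupled matrix. The sketch below builds $\mathbf{G}^{z}_{c}$ from (10b) and evaluates one $2\times 2$ minor, which is zero for any rank-one matrix. The function names, the parameter values, and the centred range of $m$ are assumptions for illustration.

```python
import cmath
import math

C_LIGHT = 3.0e8  # speed of light in m/s (assumed value)


def coupled_z(f_b, r, elevation, N1, M, d):
    """Coupled component of (10b): an N1 x M matrix of unit-modulus entries.

    Assumption: m takes the centred values -(M-1)/2, ..., (M-1)/2.
    """
    kappa = 2 * math.pi * f_b / (C_LIGHT * r) * d**2 * math.sin(elevation) ** 2
    rows = []
    for n1 in range(1, N1 + 1):
        nb = n1 - (N1 + 1) / 2
        rows.append(
            [cmath.exp(-1j * kappa * nb * (i - (M - 1) / 2)) for i in range(M)]
        )
    return rows


def minor_2x2(A, i, k, j, l):
    """Determinant of the 2x2 submatrix with rows (i, k) and columns (j, l)."""
    return A[i][j] * A[k][l] - A[i][l] * A[k][j]


# Hypothetical setup: 32-element arrays at 100 GHz, half-wavelength spacing.
f_b = 100e9
d = C_LIGHT / f_b / 2
G_near = coupled_z(f_b, 1.0, math.pi / 2, 32, 32, d)    # 1 m: near field
G_far = coupled_z(f_b, 1.0e6, math.pi / 2, 32, 32, d)   # 1000 km: far field

near_minor = abs(minor_2x2(G_near, 0, 31, 0, 31))
far_minor = abs(minor_2x2(G_far, 0, 31, 0, 31))
```

In this toy setup the corner minor is clearly nonzero at short range and nearly vanishes at long range, so the matrix is no longer rank one in the near field, which is consistent with the additional DoFs noted above.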

For the clustered NLOS components of the BS$\to$RIS channel, the NLOS channel $\mathbf{G}_{\text{NLOS}}[\tau]$ at the $\tau$-th delay can be expressed as

$$
\mathbf{G}_{\text{NLOS}}[\tau]=\gamma^{\text{BR}}\sum_{c=1}^{C_{\text{s}}}\sum_{s=1}^{S_c}\bar{\varsigma}^{\text{BR}}_{c,s}\mathbf{b}^{\text{BR}}_{c,s}(\mathbf{a}_{c,s}^{\text{BR}})^{T}\delta(\tau T_s-\tau^{\text{BR}}_{c,s}), \tag{11}
$$

where $\gamma^{\text{BR}}$ is a normalization factor for scatterer paths, and $\bar{\varsigma}^{\text{BR}}_{c,s}=\varsigma^{\text{BR}}_{c,s}\sqrt{G_{\text{B}}G_{\text{R}}L_{c,s}^{\text{BR}}}$ is composed of the complex gain, the array gain, and the path loss for scatterer $(c,s)$ in the BS$\to$RIS link. $\mathbf{a}^{\text{BR}}_{c,s}\in\mathbb{C}^{M\times 1}$ denotes the transmitting array response at the BS for the BS$\to$RIS NLOS path, and $\mathbf{b}^{\text{BR}}_{c,s}=\mathbf{b}_{x}(f_b,\phi^{\text{R}}_{c,s},\varphi^{\text{R}}_{c,s},r^{\text{R}}_{c,s})\otimes\mathbf{b}_{z}(f_b,\varphi^{\text{R}}_{c,s},r^{\text{R}}_{c,s})\in\mathbb{C}^{N\times 1}$ denotes the receiving response at the RIS. $\phi^{\text{R}}_{c,s}$ and $\varphi^{\text{R}}_{c,s}$ denote the azimuth and elevation AoAs for scatterer $(c,s)$ at the RIS, respectively. $r^{\text{R}}_{c,s}=\left\|\mathbf{c}^{\text{R}}-\mathbf{c}^{\text{S}}_{c,s}\right\|$ is the distance between the RIS and scatterer $(c,s)$, and $\tau^{\text{BR}}_{c,s}$ denotes the path delay of scatterer $(c,s)$ in the BS$\to$RIS link.

By carrying out the DFT on $\mathbf{G}[\tau]$, the frequency-domain channel $\mathbf{G}[b]=\mathbf{G}_{\text{LOS}}[b]+\mathbf{G}_{\text{NLOS}}[b]$ at the $b$-th subcarrier is given by

$$
\begin{aligned}
\mathbf{G}[b] &= \underbrace{\bar{\varsigma}^{\text{BR}}_{\text{LOS}}\mathbf{b}_{\text{LOS}}^{\text{BR}}(\mathbf{a}_{\text{LOS}}^{\text{BR}})^{T}\odot\left(\mathbf{G}^{x}_{c}\otimes\mathbf{G}^{z}_{c}\right)e^{\frac{-j2\pi bW\tau_{\text{LOS}}^{\text{BR}}}{B}}}_{\mathbf{G}_{\text{LOS}}[b]} \\
&\quad+\underbrace{\gamma^{\text{BR}}\sum_{c=1}^{C_{\text{s}}}\sum_{s=1}^{S_c}\varsigma^{\text{BR}}_{c,s}\bar{\varsigma}^{\text{BR}}_{c,s}\mathbf{b}^{\text{BR}}_{c,s}(\mathbf{a}_{c,s}^{\text{BR}})^{T}e^{\frac{-j2\pi bW\tau^{\text{BR}}_{c,s}}{B}}}_{\mathbf{G}_{\text{NLOS}}[b]}.
\end{aligned} \tag{12}
$$
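The step from the delay taps in (8) and (11) to the phase factors in (12) can be verified on a scalar toy example. The sketch below assumes a sampling period $T_s=1/W$ and a path delay that falls exactly on a tap; both are assumptions for illustration, as are the function name and the numerical values. Subcarriers are indexed from 0 here for convenience.

```python
import cmath
import math


def dft_of_taps(taps, B):
    """B-point DFT over the delay index: H[b] = sum_t h[t] exp(-j 2 pi b t / B)."""
    return [
        sum(h * cmath.exp(-2j * math.pi * b * t / B) for t, h in enumerate(taps))
        for b in range(B)
    ]


# Hypothetical single path whose delay equals three sampling periods.
B, W = 8, 1e9
T_s = 1 / W  # assumed sampling period
tau_path = 3 * T_s
coefficient = 0.7 + 0.2j

taps = [0j] * B
taps[3] = coefficient  # delta(tau * T_s - tau_path) is nonzero only at tau = 3
freq_response = dft_of_taps(taps, B)

# Closed-form phase factor used in the frequency-domain expressions.
expected = [
    coefficient * cmath.exp(-2j * math.pi * b * W * tau_path / B) for b in range(B)
]
```

The two lists agree up to rounding, so each delayed path contributes a phase that rotates linearly with the subcarrier index.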

Similar to the BS$\to$RIS channel modeling, the frequency-domain RIS$\to$UE channel $\mathbf{H}[b]=\mathbf{H}_{\text{LOS}}[b]+\mathbf{H}_{\text{NLOS}}[b]\in\mathbb{C}^{U\times N}$ can be expressed as

$$
\begin{aligned}
\mathbf{H}[b] &= \underbrace{\sqrt{G_{\text{R}}G_{\text{U}}L_{\text{LOS}}^{\text{RU}}}\,\mathbf{u}_{\text{LOS}}^{\text{RU}}(\mathbf{b}_{\text{LOS}}^{\text{RU}})^{T}\odot\left(\mathbf{H}^{x}_{c}\otimes\mathbf{H}^{z}_{c}\right)e^{\frac{-j2\pi bW\tau_{\text{LOS}}^{\text{RU}}}{B}}}_{\mathbf{H}_{\text{LOS}}[b]} \\
&\quad+\underbrace{\gamma^{\text{RU}}\sum_{c=1}^{C_{\text{s}}}\sum_{s=1}^{S_c}\varsigma^{\text{RU}}_{c,s}\sqrt{G_{\text{R}}G_{\text{U}}L_{c,s}^{\text{RU}}}\,\mathbf{u}^{\text{RU}}_{c,s}(\mathbf{b}^{\text{RU}}_{c,s})^{T}e^{\frac{-j2\pi bW\tau^{\text{RU}}_{c,s}}{B}}}_{\mathbf{H}_{\text{NLOS}}[b]},
\end{aligned} \tag{13}
$$

where $\gamma^{\text{RU}}$, $\varsigma^{\text{RU}}_{c,s}$ and $L_{c,s}^{\text{RU}}$ are the normalization factor, the complex gain and the path loss for scatterer $(c,s)$ in the RIS$\to$UE link, respectively. $\mathbf{u}^{\text{RU}}_{\text{LOS}}\in\mathbb{C}^{U\times 1}$ is the near-field ULA array response for the LOS path, while $\mathbf{u}^{\text{RU}}_{c,s}\in\mathbb{C}^{U\times 1}$ denotes the far-field array response for scatterer $(c,s)$. $\mathbf{b}^{\text{RU}}_{c,s}=\mathbf{b}_{x}(f_{b},\phi^{\text{RU}}_{c,s},\varphi^{\text{RU}}_{c,s},r^{\text{RU}}_{c,s})\otimes\mathbf{b}_{z}(f_{b},\varphi^{\text{RU}}_{c,s},r^{\text{RU}}_{c,s})\in\mathbb{C}^{N\times 1}$ denotes the receiving response at the RIS. Parameters $\phi^{\text{RU}}_{c,s}$ ($\phi^{\text{RU}}_{\text{LOS}}$) and $\varphi^{\text{RU}}_{c,s}$ ($\varphi^{\text{RU}}_{\text{LOS}}$) denote the azimuth and elevation of the AoD of scatterer path $(c,s)$ (LOS path) at the RIS, respectively.

II-C Wideband Beamforming Architecture in RIS Systems

Compared to conventional ELAA systems, the challenges of near-field beam focusing and the wideband beam split effect are significantly heightened in RIS communications. Firstly, in extremely large RIS systems, the cascaded channel model of the BS$\to$RIS$\to$UE link comprises two near-field channel components, creating a double-hop beam focusing characteristic. Additionally, similar to the analog beamformer in the hybrid beamforming architecture at the BS, the beams generated at different subcarrier frequencies by the frequency-independent phase shifting circuit at the RIS point to different physical directions, resulting in a specific near-field double beam split effect for the considered RIS system. Finally, the active hybrid beamforming at the BS and the passive beamforming at the RIS are highly coupled, necessitating a joint design to optimize the overall communication system performance. Consequently, near-field wideband beamforming in RIS systems presents greater challenges than in conventional ELAA systems.
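
The beam split effect described above can be illustrated with a minimal far-field sketch. It is a deliberate simplification and not the near-field RIS model of this paper: a uniform linear array with half-wavelength spacing at the carrier is assumed, and the function names as well as all numerical values used with them are hypothetical choices made only for illustration.

```python
import cmath
import math


def steering_phase(n, freq, fc, angle):
    """Phase of element n for a far-field ULA with half-wavelength spacing at fc."""
    return math.pi * (freq / fc) * n * math.sin(angle)


def array_gain_ps(num_elems, freq, fc, angle):
    """Normalized gain toward `angle` with frequency-independent phase
    shifters that are matched to the carrier frequency fc."""
    total = 0j
    for n in range(num_elems):
        # The PS phase is fixed and does not follow the subcarrier frequency.
        weight = cmath.exp(-1j * steering_phase(n, fc, fc, angle))
        total += weight * cmath.exp(1j * steering_phase(n, freq, fc, angle))
    return abs(total) / num_elems


def array_gain_ttd(num_elems, freq, fc, angle):
    """Same gain when each element uses an ideal true time delay instead."""
    total = 0j
    for n in range(num_elems):
        # t_n = n * d * sin(angle) / c with d = c / (2 * fc).
        delay = n * math.sin(angle) / (2.0 * fc)
        # The phase induced by a delay scales with the subcarrier frequency.
        weight = cmath.exp(-1j * 2.0 * math.pi * freq * delay)
        total += weight * cmath.exp(1j * steering_phase(n, freq, fc, angle))
    return abs(total) / num_elems
```

For example, with 256 elements, a 100 GHz carrier and a subcarrier at 105 GHz, `array_gain_ps` stays at one on the carrier but drops far below one at the offset subcarrier, whereas `array_gain_ttd` remains at one for both frequencies. This is the motivation for the TTD-based architectures introduced next.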

II-C1 TTD-Based Hybrid Beamforming at the BS

To realize frequency-dependent hybrid beamforming at the BS, the TTD-based hybrid beamforming architecture is adopted [7]. Let $\mathbf{F}_{\text{PS}}=\operatorname{blkdiag}\left(\mathbf{F}_{\text{PS},1},\ldots,\mathbf{F}_{\text{PS},M_{\text{RF}}}\right)\in\mathbb{C}^{M\times KM_{\text{RF}}}$ denote the analog beamformer achieved by PSs, where $\mathbf{F}_{\text{PS},m_{\text{RF}}}=\operatorname{blkdiag}\left(\mathbf{f}_{m_{\text{RF}},1},\ldots,\mathbf{f}_{m_{\text{RF}},K}\right)\in\mathbb{C}^{PK\times K}$ $(1\leq m_{\text{RF}}\leq M_{\text{RF}})$ denotes the PS-based analog beamformer for the subarray connected to the $m_{\text{RF}}$-th RF chain, and $\mathbf{f}_{m_{\text{RF}},k}\in\mathbb{C}^{P\times 1}$ $(1\leq k\leq K)$ denotes the analog beamformer connected to the $k$-th TTD of this chain. $\mathbf{F}_{\mathrm{BB},b}\in\mathbb{C}^{M_{\text{RF}}\times N_{\text{s}}}$ denotes the digital beamformer at subcarrier $b$, where $N_{\text{s}}$ denotes the number of transmitted data streams. According to [7], the TTD-based analog beamformer at subcarrier $b$ is given by

$$\mathbf{T}_{b}=\operatorname{blkdiag}\left(e^{-j2\pi f_{b}\mathbf{t}_{1}},\ldots,e^{-j2\pi f_{b}\mathbf{t}_{M_{\mathrm{RF}}}}\right), \tag{14}$$

where $\mathbf{t}_{m_{\text{RF}}}=[t_{m_{\text{RF}},1},\ldots,t_{m_{\text{RF}},K}]\in\mathbb{R}^{K\times 1}$ denotes the time-delay vector realized by the TTDs connected to the $m_{\text{RF}}$-th RF chain. The time delay of each TTD needs to satisfy the maximum delay constraint, i.e., $t_{m_{\text{RF}},k}\in[0,t_{\text{max}}],\forall m_{\text{RF}}=1,\ldots,M_{\text{RF}},\forall k=1,\ldots,K$.

Hence, the transmitted signal $\mathbf{x}_{b}\in\mathbb{C}^{M\times 1}$ at the BS at subcarrier $b$ is given by

$$\mathbf{x}_{b}=\mathbf{F}_{\mathrm{PS}}\mathbf{T}_{b}\mathbf{F}_{\mathrm{BB},b}\mathbf{s}_{b}, \tag{15}$$

where $\mathbf{s}_{b}\in\mathbb{C}^{N_{\text{s}}\times 1}$ denotes the information symbols at subcarrier $b$, satisfying $\mathbb{E}[\mathbf{s}_{b}\mathbf{s}^{H}_{b}]=\frac{1}{N_{\text{s}}}\mathbf{I}_{N_{\text{s}}}$.
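
As a minimal sketch of (14) and (15), the block-diagonal structure of the PS and TTD matrices means that the transmitted signal can be assembled chain by chain without forming the full matrices. The function name `transmit_signal` and the nested-list data layout are assumptions introduced here for illustration and are not part of the original scheme.

```python
import cmath
import math


def transmit_signal(f_ps, delays, f_bb, symbols, freq):
    """Compute x_b = F_PS T_b F_BB,b s_b for a subcarrier of frequency `freq`.

    f_ps[m][k]  : list of P unit-modulus PS coefficients (vector f_{m,k})
    delays[m][k]: delay t_{m,k} of the k-th TTD on RF chain m, in seconds
    f_bb[m][n]  : digital beamformer entry for RF chain m and stream n
    symbols[n]  : information symbol of stream n
    """
    x = []
    for m, chain_row in enumerate(f_bb):
        # Output of the digital beamformer on RF chain m.
        z_m = sum(w * s for w, s in zip(chain_row, symbols))
        for k, t_mk in enumerate(delays[m]):
            # Each TTD applies a frequency-dependent phase, cf. (14).
            ttd_out = z_m * cmath.exp(-1j * 2.0 * math.pi * freq * t_mk)
            # The P phase shifters behind this TTD feed P antennas.
            x.extend(coef * ttd_out for coef in f_ps[m][k])
    return x
```

The returned list has one entry per antenna, ordered by RF chain, then by TTD, then by phase shifter, which matches the nested block-diagonal ordering assumed above.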

II-C2 TTD-Based Phase-Shifting at the RIS

For the classic RIS architecture, the reflection coefficients can be expressed as $\bm{\theta}=[\beta_{1}e^{j\theta_{1}},\beta_{2}e^{j\theta_{2}},\ldots,\beta_{N}e^{j\theta_{N}}]^{T}\in\mathbb{C}^{N\times 1}$, where $\theta_{i}$ $(i=1,\ldots,N)$ and $\beta_{i}\in\{0,1\}$ denote the phase shift and the ON/OFF state of the $i$-th RIS element, respectively. To mitigate the beam split effect at the RIS, as discussed in [18, 19] and [24], the frequency-dependent RIS architecture with fully-connected TTD units is employed, in which each RIS element is connected to an independent TTD unit. However, the power consumption and hardware cost associated with TTD units are significantly higher than those of conventional PSs. Inspired by the sub-connected TTD architecture in [8], we develop a sub-connected TTD-RIS architecture, as depicted in Fig. 1(a), to realize an energy-efficient frequency-dependent phase shifting operation, in which each subarray at the RIS is connected to a common TTD unit.

The TTD-RIS is divided into $S=S_{1}\times S_{2}$ subarrays, where each subarray consists of $\bar{S}=\bar{S}_{1}\times\bar{S}_{2}$ elements, i.e., $\bar{S}_{1}=\frac{N_{1}}{S_{1}}$ and $\bar{S}_{2}=\frac{N_{2}}{S_{2}}$. To deal with the beam split effect at the RIS, each element of the subarray is equipped with double PS layers and is connected to a common TTD module. The received signal at each RIS element first passes through the first-layer PS with reflection coefficients $\bm{\theta}_{1}\in\mathbb{C}^{N\times 1}$, which aims to create the constructive received signal superposition at the subarray. Then, the impinging signal at the $s$-th RIS subarray $(1\leq s\leq S)$ is adjusted by the common TTD module with the time delay $\nu_{s}\in[0,t_{\text{max}}]$ to realize the frequency-dependent phase shifting. Finally, the signal passes through the second-layer PS with reflection coefficients $\bm{\theta}_{2}\in\mathbb{C}^{N\times 1}$ to accomplish the passive beamforming at the RIS$^{1}$.

$^{1}$In the proposed TTD-RIS architecture with sub-connected TTD units, the conventional single-layer PS circuit used in the costly fully-connected TTD architecture needs to be extended to the double-layer PS circuit. Employing a single-layer PS in the TTD-RIS architecture could lead to incoherent mixing of the received signals at a RIS subarray after they pass through a common TTD unit. This configuration might introduce critical interference and result in severe attenuation of the desired received signal [8].

Hence, the equivalent reflection phase shifting matrix at subcarrier $b$ for the TTD-RIS can be expressed as

$$\bar{\bm{\Theta}}_{b}=\bm{\Theta}_{1}\bm{\Lambda}_{b}\bm{\Theta}_{2}, \tag{16}$$

where $\bm{\Theta}_{1}=\mathrm{diag}(\bm{\theta}_{1})$, $\bm{\Theta}_{2}=\mathrm{diag}(\bm{\theta}_{2})$, and $\bm{\Lambda}_{b}=\mathrm{diag}(\bm{\Lambda}_{b,1},\ldots,\bm{\Lambda}_{b,S})$. The time delay vector $\bm{\Lambda}_{b,s}\in\mathbb{C}^{\bar{S}\times 1}$ at the $s$-th subarray is given by $\bm{\Lambda}_{b,s}=\mathbf{1}_{\bar{S}}\otimes e^{-j2\pi f_{b}\nu_{s}}$.
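
Because all three factors in (16) are diagonal, the equivalent reflection coefficient of each element is simply the product of its two PS coefficients and the TTD phase of its subarray. The following sketch makes this explicit. The function name `ttd_ris_coefficients` is hypothetical, and the elements are assumed to be indexed subarray by subarray so that element $i$ belongs to subarray $\lfloor i/\bar{S}\rfloor$.

```python
import cmath
import math


def ttd_ris_coefficients(theta1, theta2, delays, freq):
    """Diagonal of Theta_1 Lambda_b Theta_2 in (16) at subcarrier frequency `freq`.

    theta1, theta2: length-N lists of first- and second-layer PS coefficients
    delays        : length-S list of TTD delays nu_s, one per subarray
    Elements are assumed to be ordered subarray by subarray, matching the
    block structure of Lambda_b.
    """
    n_elems = len(theta1)
    n_sub = len(delays)
    if len(theta2) != n_elems or n_elems % n_sub != 0:
        raise ValueError("N must equal len(theta2) and be a multiple of S")
    per_sub = n_elems // n_sub  # number of elements sharing one TTD
    coeffs = []
    for i in range(n_elems):
        ttd_phase = cmath.exp(-1j * 2.0 * math.pi * freq * delays[i // per_sub])
        # Diagonal matrices multiply element-wise.
        coeffs.append(theta1[i] * ttd_phase * theta2[i])
    return coeffs
```

Only the TTD phase changes with the subcarrier frequency, so all elements of one subarray share the same frequency-dependent rotation while the two PS layers provide the per-element adjustment.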

II-C3 Virtual Subarray-Based Phase-Shifting at the RIS

In the above TTD-RIS architecture, additional TTD units and a double-layer phase shifting circuit are required, which increases the hardware cost and energy consumption in RIS systems. In Fig. 1(b), we present an SA-RIS architecture by dividing the RIS into $B$ virtual subarrays, in which the reflection coefficients $\tilde{\bm{\theta}}_{b}\in\mathbb{C}^{\frac{N}{B}\times 1}$ at subarray $b$ are optimized according to the channels $\mathbf{D}[b]$, $\mathbf{G}[b]$ and $\mathbf{H}[b]$ at subcarrier $b$. In this case, the reflection phase shifting matrix at the SA-RIS is given by $\tilde{\bm{\Theta}}=\mathrm{diag}(\tilde{\bm{\theta}}_{1},\ldots,\tilde{\bm{\theta}}_{B})$.

Remark 1: On the one hand, compared to the TTD-RIS architecture, the SA-RIS architecture does not require additional hardware cost, while its beamforming gain is reduced due to the RIS aperture shrinkage at each specific subcarrier. On the other hand, for the specific design of optimization algorithms, the frequency-dependent phase shifting composed of $\bm{\Theta}_{1}$, $\bm{\Theta}_{2}$ and $\bm{\Lambda}_{b}$ in the TTD-RIS architecture is more complex, while the dimension of the phase shifting in the SA-RIS architecture is similar to that of the classic RIS architecture.

II-D Problem Formulation

In the downlink signal transmission at the $q$-th slot, the received signal $\mathbf{y}_{q,b}\in\mathbb{C}^{U\times 1}$ of the UE at subcarrier $b$ can be expressed as

$$\mathbf{y}_{q,b}=\sqrt{P_{t}}\left(\mathbf{H}_{b}\bm{\Theta}^{f}_{q,b}\mathbf{G}_{b}+\mathbf{D}_{b}\right)\mathbf{x}_{q,b}+\mathbf{n}_{q,b}, \tag{17}$$

where $P_{t}$ is the transmission power per stream and $\mathbf{n}_{q,b}\sim\mathcal{CN}(0,\sigma_{b}^{2}\mathbf{I}_{U})$ denotes complex Gaussian noise. According to different RIS architectures, $\bm{\Theta}^{f}_{q,b}=\mathrm{diag}(\bm{\theta}^{f}_{q,b,1},\ldots,\bm{\theta}^{f}_{q,b,N})$, $\forall f\in\{\mathbb{P},\mathbb{T},\mathbb{V}\}$, denotes the corresponding RIS reflection coefficients at slot $q$, in which the indicator symbols $\mathbb{P}$, $\mathbb{T}$, and $\mathbb{V}$ are associated with the frequency-independent RIS, the TTD-RIS, and the SA-RIS, respectively.

In this work, we aim to maximize the spectral efficiency of near-field wideband RIS systems by optimizing the TTD-based hybrid beamforming at the BS and the frequency-dependent phase shifting at the RIS. Considering the widely used block fading channel assumption in RIS systems [25], the channels remain constant within each channel coherence block that consists of $Q$ symbol durations. The achievable communication rate of the UE at subcarrier $b$ within each channel coherence block is given by

$$R_{b}=\log_{2}\det\left(\mathbf{I}_{U}+\frac{P_{t}}{N_{\text{s}}\sigma_{b}^{2}}\mathbf{Z}_{b}\mathbf{A}_{b}\mathbf{A}^{H}_{b}\mathbf{Z}^{H}_{b}\right), \tag{18}$$

where $\mathbf{Z}_{b}=\mathbf{H}_{b}\bm{\Theta}^{f}_{q,b}\mathbf{G}_{b}+\mathbf{D}_{b}$ and $\mathbf{A}_{b}=\mathbf{F}_{\mathrm{PS}}\mathbf{T}_{b}\mathbf{F}_{\mathrm{BB},b}$.
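
The rate in (18) can be evaluated numerically once the effective channel and the overall precoder are given. The sketch below uses plain nested lists and a hand-written determinant only to stay free of dependencies; in practice a numerical linear algebra library would be the natural choice. All function names are hypothetical, and `noise_var` stands for the noise variance in the denominator of (18).

```python
import math


def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def hermitian(a):
    """Conjugate transpose of a matrix stored as nested lists."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]


def determinant(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[complex(v) for v in row] for row in a]
    n = len(a)
    det = 1 + 0j
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) == 0.0:
            return 0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def achievable_rate(z, a, power, noise_var):
    """R_b = log2 det(I_U + P_t / (N_s * noise_var) * Z A A^H Z^H), cf. (18)."""
    eff = matmul(z, a)                  # U x N_s effective channel Z_b A_b
    n_streams = len(eff[0])
    gram = matmul(eff, hermitian(eff))  # U x U Hermitian matrix
    scale = power / (n_streams * noise_var)
    u = len(gram)
    inner = [[(1.0 if i == j else 0.0) + scale * gram[i][j] for j in range(u)]
             for i in range(u)]
    # The determinant of I plus a positive semidefinite matrix is real.
    return math.log2(determinant(inner).real)
```

Here `z` plays the role of $\mathbf{Z}_{b}$ with $U$ rows and $M$ columns, and `a` plays the role of $\mathbf{A}_{b}$ with $M$ rows and $N_{\text{s}}$ columns.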

In the existing near-field wideband beamforming schemes for RIS systems [20, 18], the CSI is assumed to be perfectly known when optimizing the involved beamforming variables. In fact, for near-field wideband RIS systems, accurate channel acquisition is challenging due to the extremely large-scale antenna arrays and the passive characteristic of the RIS. Considering typical time division duplex systems with channel reciprocity, the downlink channel can be obtained by estimating the uplink channel. Hence, the effective spectral efficiency can be expressed as

$$R=\frac{Q-Q_{\text{tr}}}{Q(L_{\text{CP}}+B)}\sum_{b=1}^{B}R_{b}, \tag{19}$$

where $Q_{\text{tr}}\leq Q$ denotes the number of symbol durations for the channel training, and $L_{\text{CP}}$ denotes the length of the cyclic prefix in OFDM systems.
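
A short numerical sketch of (19) shows how the training overhead and the cyclic prefix scale the sum rate. The function name and argument names are hypothetical, and the values in the usage note are arbitrary illustrative numbers rather than simulation settings of this paper.

```python
def effective_spectral_efficiency(rates, q_total, q_train, cp_len):
    """R = (Q - Q_tr) / (Q * (L_CP + B)) * sum_b R_b, cf. (19).

    rates  : list of the B per-subcarrier rates R_b
    q_total: number Q of symbol durations per coherence block
    q_train: number Q_tr of symbol durations spent on channel training
    cp_len : cyclic prefix length L_CP
    """
    if not 0 <= q_train <= q_total:
        raise ValueError("Q_tr must satisfy 0 <= Q_tr <= Q")
    n_sub = len(rates)  # B subcarriers
    prelog = (q_total - q_train) / (q_total * (cp_len + n_sub))
    return prelog * sum(rates)
```

For instance, with eight subcarriers that each achieve 4 bit/s/Hz, $Q=100$, $Q_{\text{tr}}=20$ and $L_{\text{CP}}=2$, the prelog factor is $80/(100\cdot 10)=0.08$ and the effective spectral efficiency is $0.08\cdot 32=2.56$ bit/s/Hz. Spending the entire block on training drives the result to zero.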

Remark 2: In the considered near-field wideband RIS systems, $BMU(N+1)$ unknown entries need to be estimated in the channel estimation, which requires a high pilot training overhead $Q_{\text{tr}}$. Although several low-overhead channel estimation schemes have been investigated for far-field or narrowband RIS systems [25, 26], e.g., the compressed sensing approach exploiting the channel sparsity and the deep learning-based intelligent channel estimation scheme [27], these approaches are hard to extend directly to near-field wideband RIS systems due to the specific near-field radiation and the beam split effect. Consequently, the optimization of the effective spectral efficiency depends not only on the beamforming design but also on the channel estimation scheme.

To sum up, the spectral efficiency maximization problem in near-field wideband RIS systems can be formulated as

\begin{align}
\max_{\substack{\mathbf{F}_{\mathrm{PS}},\mathbf{T}_{b},\mathbf{F}_{\mathrm{BB},b},\\ \bm{\Theta}^{f}_{b},Q_{\text{tr}}}}\ & R\left(\mathbf{F}_{\mathrm{PS}},\mathbf{T}_{b},\mathbf{F}_{\mathrm{BB},b},\bm{\Theta}^{f}_{b},Q_{\text{tr}}\right) \tag{20a}\\
\text{s.t.}\ & \left\|\mathbf{F}_{\mathrm{PS}}\mathbf{T}_{b}\mathbf{F}_{\mathrm{BB},b}\right\|_{F}^{2}=\rho,\ \forall b, \tag{20b}\\
& |\mathbf{f}_{m_{\text{RF}},k}|=1,\ \forall m_{\text{RF}},\forall k, \tag{20c}\\
& |[\bm{\theta}^{f}_{b}]_{i}|=1,\ i=\{1,\ldots,N\}, \tag{20d}\\
& \mathbf{T}_{b}\in\mathcal{T}_{b},\ \forall b, \tag{20e}
\end{align}

where $\rho$ denotes the transmit power available to the precoder for each subcarrier at the BS. $\mathcal{T}_{b}$ is the feasible set of the TTD-based analog beamformers imposed by the structure in (14) and the maximum time delay constraint. Note that the frequency-dependent phase shifting $\bm{\Theta}^{\mathbb{T}}_{b}$ in the TTD-RIS architecture involves three coupled variables to be optimized, i.e., $\bm{\Theta}_{1}$, $\bm{\Theta}_{2}$ and $\bm{\Lambda}_{b}$, while $\bm{\Theta}^{\mathbb{V}}_{b}$ in the SA-RIS architecture is composed of $B$ independent phase shifting subarrays.
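
The constraints in (20b)-(20e) can be enforced on unconstrained candidate variables by simple projections. The sketch below shows one generic way of doing so; it is not claimed to be the mechanism used inside the E2E networks proposed in this paper, and all function names are hypothetical. The scaling step relies on the sub-connected structure: with unit-modulus PS and TTD entries, the columns of $\mathbf{F}_{\mathrm{PS}}\mathbf{T}_{b}$ are orthogonal with squared norm equal to the number of antennas per RF chain, so (20b) reduces to a scalar rescaling of $\mathbf{F}_{\mathrm{BB},b}$.

```python
import cmath
import math


def project_unit_modulus(values):
    """Keep only the phase of each entry so that |v| = 1, cf. (20c)-(20d)."""
    return [cmath.exp(1j * cmath.phase(v)) for v in values]


def clip_delays(delays, t_max):
    """Clip TTD delays to [0, t_max], the delay part of (20e)."""
    return [min(max(t, 0.0), t_max) for t in delays]


def scale_digital_beamformer(f_bb, rho, antennas_per_chain):
    """Rescale F_BB,b so that ||F_PS T_b F_BB,b||_F^2 = rho, cf. (20b).

    Assumes unit-modulus analog entries in the sub-connected structure, for
    which ||F_PS T_b F_BB,b||_F^2 = antennas_per_chain * ||F_BB,b||_F^2.
    """
    norm_sq = sum(abs(v) ** 2 for row in f_bb for v in row)
    gain = math.sqrt(rho / (antennas_per_chain * norm_sq))
    return [[gain * v for v in row] for row in f_bb]
```

In the notation of Section II-C1, `antennas_per_chain` corresponds to $PK$, i.e., $K$ TTDs per RF chain with $P$ phase shifters each.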

III Deep Learning Based End-to-End Beamforming Framework

Figure 3: The proposed E2E beamforming framework in near-field wideband RIS systems.

To solve the non-convex and high-dimensional optimization problem in (20), we develop an E2E beamforming optimization framework by leveraging the powerful non-linear mapping ability of deep learning models. As illustrated in Fig. 3, the overall E2E framework can be divided into the UL-CT module in the uplink pilot transmission stage and the DL-BF module in the downlink signal transmission stage.

III-A Deep Learning-Based Uplink Channel Training

For channel estimation in RIS systems, the estimation performance depends on the joint design of the channel estimator and the RIS reflection protocol. For instance, in [28], the DFT-based reflection protocol has been proven to be optimal for the classic minimum variance unbiased estimator in narrowband far-field RIS systems. However, the existing reflection protocols need to be further developed for near-field wideband RIS systems. Firstly, the planar-wavefront assumption of far-field communications no longer holds in near-field systems, so the RIS reflection protocol needs to match the spherical-wavefront characteristics. Secondly, in the reflection protocol design for wideband RIS systems, the frequency-dependent hybrid beamforming and phase shifting involve the new time-delay dimension. Thirdly, the key characteristic of a deep learning estimator is its ability to adaptively learn latent representations of the wireless signal, while the advantage of the RIS is its ability to shape the wireless environment as a passive reflector. Nevertheless, the reflection protocol design for deep learning-based channel estimators in near-field wideband RIS systems has not been investigated.

Consequently, in the proposed E2E optimization framework, we develop a learnable UL-CT module to learn high-dimensional channel semantics, in which the trainable wideband phase shifting at the RIS and the combining matrix at the BS are designed. In contrast to the pre-defined RIS reflection patterns in existing channel estimation works [28, 25], the phase shifting and combining matrix in the proposed E2E model can be adaptively tuned according to dynamic wireless environments. Suppose that $Q_{\text{tr}}$ OFDM pilots are used for channel training. The received pilot signal $\mathbf{Y}_{q,b} \in \mathbb{C}^{M_{\text{RF}} \times 1}$ at slot $q$ $(1 \leq q \leq Q_{\text{tr}})$ on subcarrier $b$ at the BS is given by

$$\mathbf{Y}_{q,b} = \mathbf{W}_{q,b}\left(\mathbf{H}_{b}\bm{\Theta}^{f}_{b}\mathbf{G}_{q,b} + \mathbf{D}_{b}\right)\mathbf{X}^{\text{tr}}_{q,b} + \mathbf{n}^{\text{tr}}_{q,b}, \tag{21}$$

where $\mathbf{X}^{\text{tr}}_{q,b} \in \mathbb{C}^{U \times 1}$ denotes the pilot signal sent by the UE at slot $q$ on subcarrier $b$, $\mathbf{W}_{q,b} \in \mathbb{C}^{M_{\text{RF}} \times M}$ denotes the uplink combining matrix on subcarrier $b$ at the BS, and $\mathbf{n}^{\text{tr}}_{q,b} \sim \mathcal{CN}(0, \sigma_{b}^{2}\mathbf{I}_{M_{\text{RF}}})$ denotes the complex Gaussian noise. In this work, the phase shifting $\bm{\Theta}^{f}_{q,b}$, $\forall f \in \{\mathbb{P}, \mathbb{T}, \mathbb{V}\}$, and the combining matrix $\mathbf{W}_{q,b}$ are designed as trainable tensors, which are optimized using a large amount of training data.
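
The signal model in (21) can be illustrated numerically. The following is a minimal NumPy sketch for a single slot and subcarrier, not the authors' implementation. The channel shapes ($\mathbf{H}_{b}$ as $M \times N$, $\mathbf{G}_{q,b}$ as $N \times U$, $\mathbf{D}_{b}$ as $M \times U$) are assumptions inferred from the sizes of $\mathbf{W}_{q,b}$ and $\mathbf{X}^{\text{tr}}_{q,b}$; the toy dimensions, the i.i.d. Gaussian channels, and the helper names `crandn` and `received_pilot` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not taken from the paper's simulation setup).
M, M_RF, N, U = 16, 4, 32, 2   # BS antennas, RF chains, RIS elements, UE antennas
sigma2 = 0.01                  # noise variance sigma_b^2


def crandn(*shape):
    """Draw i.i.d. CN(0, 1) entries (placeholder for a real channel model)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def received_pilot(W, H, theta, G, D, x, sigma2):
    """Evaluate (21) for one slot q and one subcarrier b.

    W: (M_RF, M) combiner, H: (M, N) RIS-BS channel, theta: (N,) RIS coefficients,
    G: (N, U) UE-RIS channel, D: (M, U) direct channel, x: (U,) pilot vector.
    """
    cascaded = H @ np.diag(theta) @ G + D          # effective (M, U) channel
    noise = np.sqrt(sigma2) * crandn(W.shape[0])   # CN(0, sigma2 * I)
    return W @ cascaded @ x + noise


H, G, D = crandn(M, N), crandn(N, U), crandn(M, U)
theta = np.exp(1j * rng.uniform(0, 2 * np.pi, N))                   # unit-modulus RIS phases
W = np.exp(1j * rng.uniform(0, 2 * np.pi, (M_RF, M))) / np.sqrt(M)  # constant-modulus combiner
x = np.ones(U, dtype=complex)                                       # all-ones pilot (assumption)

y = received_pilot(W, H, theta, G, D, x, sigma2)
print(y.shape)  # (4,)
```

In the E2E framework, `theta` and `W` would be trainable rather than randomly drawn; the random draws here only exercise the shapes.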

In near-field wideband RIS systems, $\bm{\Theta}^{f}_{q,b}$ and $\mathbf{W}_{q,b}$ need to satisfy architecture-specific constraints. Specifically, for the TTD-RIS architecture in (16), the frequency-dependent phase shifting $\bm{\Theta}^{\mathbb{T}}_{q}$ at slot $q$ consists of the learnable phase shifting tensors $\widehat{\bm{\Theta}}^{\mathbb{T}}_{q,1} \in \mathbb{C}^{N \times N}$ and $\widehat{\bm{\Theta}}^{\mathbb{T}}_{q,2} \in \mathbb{C}^{N \times N}$, and the time delay tensor $\widehat{\mathbf{\Lambda}}_{q} \in \mathbb{C}^{S \times B}$. The diagonal elements of $\widehat{\bm{\Theta}}^{\mathbb{T}}_{q,1}$ and $\widehat{\bm{\Theta}}^{\mathbb{T}}_{q,2}$ should satisfy the unit-modulus constraint. Thus, the complex exponential function $\exp(\mathrm{j}\,\cdot)$ is applied to obtain the desired uplink $\widehat{\bm{\Theta}}^{\text{up},\mathbb{T}}_{q,1}$ and $\widehat{\bm{\Theta}}^{\text{up},\mathbb{T}}_{q,2}$, which are given by

$$\left\{\begin{array}{l}
\widehat{\bm{\Theta}}^{\text{up},\mathbb{T}}_{q,1} = \exp\left(\mathrm{j}\cdot\widehat{\bm{\Theta}}^{\mathbb{T}}_{q,1}\right), \\
\widehat{\bm{\Theta}}^{\text{up},\mathbb{T}}_{q,2} = \exp\left(\mathrm{j}\cdot\widehat{\bm{\Theta}}^{\mathbb{T}}_{q,2}\right), \ \forall q.
\end{array}\right. \tag{24}$$

Due to the maximum time-delay constraint of the TTD units, each learnable time delay element $\widehat{\nu}_{s,b}$ in $\widehat{\mathbf{\Lambda}}_{q}$ needs to be bounded by the maximum delay $t_{\text{max}}$ through a scaled sigmoid function, which is given by

$$\widehat{\nu}^{\text{up}}_{s,b} = t_{\text{max}}\frac{1}{1+e^{-\widehat{\nu}_{s,b}}}, \ \forall s,b. \tag{25}$$

In the SA-RIS architecture without TTD units, the uplink phase shifting $\widehat{\bm{\Theta}}^{\text{up},\mathbb{V}}_{q}$ is directly normalized by the complex exponential function. For the learnable uplink combining tensor at the BS, $\mathbf{W}^{\text{up}}_{q,b}$ should satisfy the constant-modulus constraint, which is given by [29]

$$\mathbf{W}^{\text{up}}_{q,b} = \frac{1}{\sqrt{M}}\exp\left(1\mathrm{j}\cdot\mathbf{W}_{q,b}\right), \ \forall q,b. \tag{26}$$
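
The three constraint mappings in (24), (25), and (26) can be sketched in a few lines of NumPy. This is an illustrative sketch that assumes the underlying trainable parameters are real-valued, so that the complex exponential yields exactly unit modulus; the function names, the toy sizes, and the value of `t_max` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

N, S, B, M, M_RF = 8, 4, 16, 16, 4   # hypothetical toy sizes
t_max = 5e-9                          # hypothetical maximum TTD delay in seconds


def unit_modulus(raw_phase):
    """(24): map unconstrained real phases to unit-modulus coefficients."""
    return np.exp(1j * raw_phase)


def bounded_delay(raw_delay, t_max):
    """(25): scaled sigmoid, which keeps every delay inside (0, t_max)."""
    return t_max / (1.0 + np.exp(-raw_delay))


def constant_modulus_combiner(raw_w, M):
    """(26): every entry has modulus 1/sqrt(M)."""
    return np.exp(1j * raw_w) / np.sqrt(M)


# Random stand-ins for the trainable (real-valued, by assumption) parameters.
theta_up = unit_modulus(rng.standard_normal(N))             # diagonal of a phase-shift matrix
nu_up = bounded_delay(rng.standard_normal((S, B)), t_max)   # S x B delay tensor
W_up = constant_modulus_combiner(rng.standard_normal((M_RF, M)), M)
```

Because each mapping is smooth, gradients can flow through it to the unconstrained parameters, which is what allows the constraints to be enforced by construction during training.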

After the transmission of $Q_{\text{tr}}$ pilots, we obtain the $B \times M_{\text{RF}} \times Q_{\text{tr}}$ observation tensor $\mathbf{Y}^{\text{P}}$. To facilitate data processing in the neural network, a complex-to-real operation separates the real and imaginary parts of $\mathbf{Y}^{\text{P}}$, which are then stacked along the antenna dimension to obtain the real-valued input tensor $\bar{\mathbf{Y}}^{\text{P}} = \{\Re(\mathbf{Y}^{\text{P}}), \Im(\mathbf{Y}^{\text{P}})\} \in \mathbb{R}^{B \times 2M_{\text{RF}} \times Q_{\text{tr}}}$. We exploit an implicit CSI learning network $\mathcal{F}^{\text{C}}(\cdot)$ to extract the latent CSI semantics $\mathbf{\Omega}$ from the observation tensor $\bar{\mathbf{Y}}^{\text{P}}$, which is given by

$$\mathbf{\Omega} = \mathcal{F}^{\text{C}}\left(\bm{\omega}^{\text{C}}, \bar{\mathbf{Y}}^{\text{P}}\right), \tag{27}$$

where $\bm{\omega}^{\text{C}}$ denotes the trainable parameters of the implicit CSI learning network. The detailed network architecture of the proposed UL-CT module $\mathcal{F}^{\text{C}}(\cdot)$ is elaborated in Section IV-A.
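
The complex-to-real operation that produces the network input can be sketched as follows. This is a minimal NumPy illustration with hypothetical toy sizes and a random observation tensor; the CSI learning network itself is not modeled.

```python
import numpy as np

rng = np.random.default_rng(2)
B, M_RF, Q_tr = 8, 4, 6   # hypothetical toy sizes

# Complex observation tensor Y^P of shape (B, M_RF, Q_tr), random for illustration.
Y = rng.standard_normal((B, M_RF, Q_tr)) + 1j * rng.standard_normal((B, M_RF, Q_tr))

# Stack the real and imaginary parts along the antenna (RF-chain) axis.
Y_bar = np.concatenate([Y.real, Y.imag], axis=1)

print(Y_bar.shape, Y_bar.dtype)  # (8, 8, 6) float64
```

No information is lost in this step: the complex tensor can be recovered as `Y_bar[:, :M_RF] + 1j * Y_bar[:, M_RF:]`.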

III-B Deep Learning-Based Downlink Wideband Beamforming

The proposed DL-BF module $\mathcal{F}^{\text{B}}(\cdot)$ is composed of a low-level shared network $\mathcal{F}^{\text{B}}_{\text{sh}}(\cdot)$ and $P$ sub-networks $\mathcal{F}^{\text{B}}_{p}(\cdot)$, $1 \leq p \leq P$. In the information flow, the CSI semantics $\mathbf{\Omega}$ extracted by the UL-CT module are first delivered to the shared network, which generates the shared features $\mathbf{\Phi} = \mathcal{F}^{\text{B}}_{\text{sh}}(\bm{\omega}^{\text{B}}_{\text{sh}}, \mathbf{\Omega}) \in \mathbb{R}^{B \times N}$. Then, the downlink frequency-dependent phase shifting at the RIS and the hybrid beamforming at the BS are obtained from different sub-networks, which are given by

$$\left\{\begin{array}{l}
\widehat{\bm{\Theta}}^{\text{down},f} = \exp\left(1\mathrm{j}\cdot\mathcal{F}^{\text{B}}_{1}\left(\bm{\omega}^{\text{B}}_{1}, \mathbf{\Phi}\right)\right), \ \forall f, \\
\widehat{\mathbf{F}}^{\text{down}}_{\text{PS}} = \exp\left(1\mathrm{j}\cdot\mathcal{F}^{\text{B}}_{2}\left(\bm{\omega}^{\text{B}}_{2}, \mathbf{\Phi}\right)\right), \\
\widehat{\mathbf{T}}^{\text{down}}_{b} = \exp\left(1\mathrm{j}\cdot 2\pi f_{b}\cdot\mathcal{F}^{\text{B}}_{3}\left(\bm{\omega}^{\text{B}}_{3}, \mathbf{\Phi}\right)\right), \ \forall b, \\
\widehat{\mathbf{F}}^{\text{down}}_{\text{BB},b} = \frac{\sqrt{P_{t}}\,\mathcal{F}^{\text{B}}_{4}\left(\bm{\omega}^{\text{B}}_{4}, \mathbf{\Phi}\right)}{\left\|\mathbf{F}_{\mathrm{PS}}\mathbf{T}_{b}\mathbf{F}_{\mathrm{BB},b}\right\|_{F}}, \ \forall b,
\end{array}\right. \tag{32}$$

where $\bm{\omega}^{\text{B}}_{\text{sh}}$ and $\bm{\omega}^{\text{B}}_{p}$ $(1 \leq p \leq 4)$ denote the trainable parameters of the shared network and sub-network $p$, respectively. Similar to the tensor constraints in the proposed UL-CT module, specific constraints need to be satisfied at the output of each sub-network. Firstly, the unit-modulus constraints are imposed on the phase shifting $\widehat{\bm{\Theta}}^{\text{down},f}$ at the RIS and the PS-based analog beamformer $\widehat{\mathbf{F}}^{\text{down}}_{\text{PS}}$ at the BS by the complex exponential function. Secondly, the time-delay vector in the TTD-based analog beamformer $\widehat{\mathbf{T}}^{\text{down}}_{b}$ is normalized following the operation in (25). Finally, power normalization is carried out for the digital beamformer $\widehat{\mathbf{F}}^{\text{down}}_{\text{BB},b}$.
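
The output constraints of the BS-side sub-networks in (32) can be sketched as follows. This is a hedged NumPy illustration rather than the authors' network: random arrays stand in for the sub-network outputs, the matrix shapes and the intermediate dimension `K` are hypothetical, the block structure of $\mathbf{T}_{b}$ imposed by (14) is ignored, and the Frobenius norm is assumed to be evaluated with the un-normalized digital output.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical toy sizes and values; the structure of T_b imposed by (14) is ignored here.
M, K, M_RF, U, B = 16, 8, 4, 2, 4
P_t, t_max = 1.0, 5e-9
f = 100e9 + 1e9 * np.arange(B)     # hypothetical subcarrier frequencies f_b in Hz

# Random stand-ins for the raw outputs of the sub-networks F_2, F_3 and F_4.
raw_ps = rng.standard_normal((M, K))
raw_delay = rng.standard_normal((K, M_RF))
raw_bb = rng.standard_normal((B, M_RF, U)) + 1j * rng.standard_normal((B, M_RF, U))

F_ps = np.exp(1j * raw_ps)                                # unit-modulus phase shifters
delays = t_max / (1.0 + np.exp(-raw_delay))               # sigmoid bound as in (25)
T = np.exp(1j * 2 * np.pi * f[:, None, None] * delays)    # (B, K, M_RF), one T_b per subcarrier

# Power normalization of the digital beamformer on each subcarrier.
F_bb = np.empty_like(raw_bb)
for b in range(B):
    gain = np.linalg.norm(F_ps @ T[b] @ raw_bb[b], "fro")
    F_bb[b] = np.sqrt(P_t) * raw_bb[b] / gain
```

After this step, the overall hybrid beamformer on every subcarrier has squared Frobenius norm equal to `P_t`, so the power constraint is satisfied by construction.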

III-C Joint Optimization of E2E Beamforming Framework

To maximize the spectral efficiency in (20), the loss function $\mathcal{L}(\bm{\omega})$ is designed as the negative spectral efficiency, which is minimized in the network training by gradient descent methods. The spectral efficiency maximization problem can be reformulated as

$$\min_{\bm{\omega}} \ \mathcal{L}(\bm{\omega}) = -R\left(\bm{\omega}, \widehat{\mathbf{F}}^{\text{down}}_{\mathrm{PS}}, \widehat{\mathbf{T}}^{\text{down}}_{b}, \widehat{\mathbf{F}}^{\text{down}}_{\mathrm{BB},b}, \widehat{\bm{\Theta}}^{\text{down},f}_{b}, Q_{\text{tr}}\right) \tag{33a}$$
$$\text{s.t. } (24), (25), (26), (32), \tag{33b}$$

where $\bm{\omega}$ denotes all trainable network parameters of the UL-CT module and the DL-BF module.

In the network training, a computationally efficient stochastic optimization method based on first-order gradients is adopted to update $\bm{\omega}$ [30], which has been shown to be robust and well suited to a wide range of non-convex optimization problems. Let $g_{t} = \nabla_{\bm{\omega}}\mathcal{L}_{t}(\bm{\omega}) = \frac{\partial \mathcal{L}(\bm{\omega}_{t})}{\partial \bm{\omega}_{t}}$ denote the gradient at timestep $t$ in the network training. The moving averages of the gradient, $m_{t}$, and of the squared gradient, $v_{t}$, at timestep $t$ are defined as

$$m_{t} = \beta^{t}_{1}\cdot m_{t-1} + \left(1-\beta^{t}_{1}\right)\cdot g_{t}, \tag{34}$$
$$v_{t} = \beta^{t}_{2}\cdot v_{t-1} + \left(1-\beta^{t}_{2}\right)\cdot g_{t}^{2}, \tag{35}$$

where $\beta^{t}_{1}, \beta^{t}_{2} \in [0,1)$ denote the hyper-parameters that control the exponential decay rates of $m_{t}$ and $v_{t}$, respectively. Furthermore, the update rule of $\bm{\omega}$ at timestep $t+1$ can be expressed as

$$\bm{\omega}_{t+1} = \bm{\omega}_{t} - \alpha_{t}\cdot m_{t}/\left(\sqrt{v_{t}} + \hat{\epsilon}\right), \tag{36}$$

where $\alpha_{t} = \alpha\cdot\sqrt{1-\beta_{2}^{t}}/\left(1-\beta_{1}^{t}\right)$ denotes the adaptive learning rate at timestep $t$ based on the default learning rate $\alpha$, and $\hat{\epsilon}$ is a small regularization term that avoids division by zero.
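
The update in (34)-(36) has the form of the Adam optimizer and can be sketched in a few lines of NumPy. The sketch assumes that the decay rates in (34) and (35) are constants $\beta_{1}$ and $\beta_{2}$, and that the superscript $t$ in $\alpha_{t}$ denotes the $t$-th power, as in standard Adam. The quadratic toy loss, the function name `adam_step`, and all numerical values are hypothetical and only stand in for the negative spectral efficiency.

```python
import numpy as np


def adam_step(omega, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One update following (34)-(36); t is the 1-based timestep.

    Assumption: beta1 and beta2 are constant decay rates, and beta ** t in
    alpha_t is the t-th power (standard Adam bias correction).
    """
    m = beta1 * m + (1.0 - beta1) * grad                 # (34)
    v = beta2 * v + (1.0 - beta2) * grad ** 2            # (35)
    alpha_t = alpha * np.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
    omega = omega - alpha_t * m / (np.sqrt(v) + eps)     # (36)
    return omega, m, v


# Toy loss L(omega) = ||omega - target||^2, standing in for the negative spectral efficiency.
target = np.array([1.0, -2.0, 0.5])
omega = np.zeros(3)
m = np.zeros(3)
v = np.zeros(3)

for t in range(1, 2001):
    grad = 2.0 * (omega - target)    # exact gradient of the toy loss
    omega, m, v = adam_step(omega, grad, m, v, t, alpha=0.05)
```

On the very first step the update magnitude is approximately `alpha` in every coordinate regardless of the gradient scale, which illustrates the per-parameter step-size adaptation discussed in the text.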

Remark 3: Owing to the low computational complexity of the weight update, the above stochastic optimization method is a versatile algorithm that scales to large-scale, high-dimensional non-linear mapping problems. In addition, it adaptively adjusts the learning rate during network training, thereby improving the convergence speed and the generalization ability of the beamforming model in dynamic wireless environments.

IV Signal-Guided Beamforming Network Architecture in Near-Field Wideband RIS Systems

In this section, we present the specific beamforming network components of the E2E beamforming framework proposed in Section III, which consist of the polar-attention network architecture in the UL-CT module and the multi-task network architecture in the DL-BF module.

IV-A Polarized Self-Attention for Channel Semantic Learning

Figure 4: Polarized self-attention mechanism for channel semantic learning in the UL-CT module.

In the proposed E2E models, the UL-CT module needs to learn efficient channel semantics from the received pilot signal, which facilitates the beamforming optimization in the subsequent DL-BF module. In popular network architectures for deep learning-enabled MIMO communications, the classical convolutional neural network (CNN) with spatial modeling ability is usually used as the network backbone [31]. However, the insufficient ability of the CNN to extract global information, which stems from the limited receptive field of the local convolution window, has been widely investigated. Accordingly, the self-attention mechanism has attracted considerable attention. Motivated by its strong representation learning ability, we exploit a dedicated polarized self-attention (PSA) mechanism to characterize the implicit CSI semantics, which leverages the specific physical knowledge of wireless communications data.

In contrast to datasets in computer vision or natural language models, each dimension of the input tensor $\bar{\mathbf{Y}}^{\text{P}}$ in the proposed UL-CT module has a specific physical meaning, representing the frequency, antenna, and time domains, respectively. In the vanilla self-attention mechanism, the attention operation is only carried out in the spatial domain of the input tensor, i.e., the dimension $2M_{\text{RF}} \times Q_{\text{tr}}$ of $\bar{\mathbf{Y}}^{\text{P}}$. In this work, the self-attention mechanism is introduced into both the frequency domain and the temporal-spatial domain of $\bar{\mathbf{Y}}^{\text{P}}$ by designing a dedicated PSA module [32]. Fig. 4 presents the detailed operation of the PSA module, which consists of feature extraction branches in the frequency domain and the temporal-spatial domain. Specifically, in the frequency feature extraction branch, the output features $\mathbf{X}^{\text{F}} \in \mathbb{R}^{B \times 1 \times 1}$ can be expressed as

$$\mathbf{X}^{\text{F}} = \text{Sig}\left(\mathbf{W}^{\text{F}}_{z}\left(\text{Re}\left(\mathbf{W}^{\text{F}}_{v}(\bar{\mathbf{Y}}^{\text{P}})\right) \times \text{Soft}\left(\text{Re}\left(\mathbf{W}^{\text{F}}_{q}(\bar{\mathbf{Y}}^{\text{P}})\right)\right)\right)\right), \tag{37}$$

where $\mathbf{W}_{i}^{\text{F}}$, $i\in\{q,v,z\}$, denotes a $1\times 1$ convolutional layer that reduces or increases the frequency-domain dimension of $\bar{\mathbf{Y}}^{P}$. Specifically, $\mathbf{W}^{\text{F}}_{q}$, $\mathbf{W}^{\text{F}}_{v}$, and $\mathbf{W}^{\text{F}}_{z}$ are composed of $1$, $B/2$, and $B$ convolutional filters, respectively. The function $\text{Re}(\cdot)$ represents the tensor reshape operator, which adjusts the dimensions of the different feature tensors. The functions $\text{Soft}(\cdot)$ and $\text{Sig}(\cdot)$ denote the Softmax and Sigmoid activation functions, respectively, which can be expressed as

$$\text{Soft}(\mathbf{x})=\frac{e^{\mathbf{x}_{i}}}{\sum_{j=1}^{L}e^{\mathbf{x}_{j}}}, \tag{38}$$
$$\text{Sig}(\mathbf{x})=\frac{1}{1+e^{-\mathbf{x}_{i}}}, \tag{39}$$

where $\mathbf{x}\in\mathcal{R}^{L\times 1}$ denotes a feature tensor.
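As a minimal illustration of (38) and (39), the two activation functions can be sketched in plain Python as follows. The function names and the toy inputs are illustrative assumptions, and the shift by the maximum entry in the Softmax is a standard numerical-stability step that is not part of the original formulation.

```python
import math

def softmax(x):
    """Softmax over a list of floats, as in (38); shifted by max(x) for stability."""
    shift = max(x)
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Element-wise Sigmoid over a list of floats, as in (39)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in x]

# Toy feature vectors (illustrative values only).
weights = softmax([1.0, 2.0, 3.0])   # non-negative entries that sum to one
gates = sigmoid([-2.0, 0.0, 2.0])    # entries in (0, 1)
```

The Softmax output acts as a normalized attention weight over the $L$ entries, whereas the Sigmoid output acts as an independent gate per entry.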

Similarly, the output features in the time-spatial feature extraction branch, $\mathbf{X}^{\text{T}}\in\mathcal{R}^{1\times 2M_{\text{RF}}\times Q_{\text{tr}}}$, can be expressed as

$$\mathbf{X}^{\text{T}}=\text{Sig}\left(\text{Re}(\mathbf{W}^{\text{T}}_{v}(\bar{\mathbf{Y}}^{P}))\times\text{Soft}(\mathcal{G}(\mathbf{W}^{\text{T}}_{q}(\bar{\mathbf{Y}}^{P})))\right), \tag{40}$$

where $\mathbf{W}_{i}^{\text{T}}$, $i\in\{q,v\}$, denotes a $1\times 1$ convolutional layer composed of $B/2$ convolutional filters, and $\mathcal{G}(\cdot)$ denotes the global average pooling operator. For the feature tensor $\mathbf{F}^{\text{T}}=\mathbf{W}^{\text{T}}_{q}(\bar{\mathbf{Y}}^{P})\in\mathcal{R}^{B/2\times 2M_{\text{RF}}\times Q_{\text{tr}}}$ obtained by a convolutional layer with a $1\times 1$ kernel, the feature vector $\mathbf{z}=\left[z_{1},\cdots,z_{b},\cdots,z_{B/2}\right]\in\mathbb{R}^{B/2\times 1}$ after the pooling operation $\mathcal{G}(\mathbf{F}^{\text{T}})$ is given by

$$z_{b}=\mathcal{G}(\mathbf{F}^{\text{T}})=\frac{1}{2M_{\text{RF}}\times Q_{\text{tr}}}\sum_{m=1}^{2M_{\text{RF}}}\sum_{q=1}^{Q_{\text{tr}}}\mathbf{F}^{\text{T}}_{b}(m,q). \tag{41}$$
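The pooling step in (41) reduces each channel of the feature tensor to a single scalar. A minimal sketch, assuming the tensor is stored as a nested list indexed as [channel][row][column] and using toy sizes rather than the simulation parameters, is:

```python
def global_average_pool(feature):
    """Average every channel of a [channels][rows][cols] nested list, as in (41)."""
    pooled = []
    for channel in feature:
        total = sum(sum(row) for row in channel)
        count = len(channel) * len(channel[0])
        pooled.append(total / count)
    return pooled

# Toy tensor: B/2 = 2 channels, 2*M_RF = 2 rows, Q_tr = 3 columns (illustrative sizes).
f_t = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[0.0, 0.0, 0.0], [6.0, 6.0, 6.0]],
]
z = global_average_pool(f_t)  # one scalar per channel
```

The resulting vector has one entry per channel, which is then passed through the Softmax in (40).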

Finally, the frequency and time-spatial features are fused in the UL-CT module, which can be expressed as

$$\bm{\Omega}=\mathbf{X}^{\text{F}}\odot^{\text{F}}\bar{\mathbf{Y}}^{P}+\mathbf{X}^{\text{T}}\odot^{\text{T}}\bar{\mathbf{Y}}^{P}, \tag{42}$$

where $\odot^{i}$, $i\in\{\text{F},\text{T}\}$, denotes the channel-wise or spatial-wise multiplication operator, respectively. Note that, compared to the classic self-attention mechanism, the PSA module has lower computational complexity owing to its separable dual-branch attention architecture.²

² For the observation pilot tensor $\bar{\mathbf{Y}}^{\text{P}}\in\mathbb{R}^{B\times M_{\text{RF}}\times Q_{\text{tr}}}$ in the UL-CT module, the uplink pilots in each subcarrier are independently designed, and hence the frequency-domain features are processed in an independent branch. Since the temporal-spatial domain of the received pilot signal is strongly correlated and the number of RF chains $M_{\text{RF}}$ is relatively small, the temporal-spatial domain data tensor is processed jointly in the UL-CT module. This arrangement facilitates the feature learning process for $\bar{\mathbf{Y}}$.
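The fusion in (42) can be read as two broadcast multiplications: the frequency-branch gate holds one value per subcarrier, while the time-spatial gate holds one value per (row, column) position. The following sketch makes this explicit with nested lists; the function name, the argument layout, and the toy values are assumptions made for illustration only.

```python
def fuse(x_f, x_t, y):
    """Fuse the two attention branches as in (42).

    x_f: list of length B, one gate per subcarrier (channel-wise product).
    x_t: nested list [rows][cols], one gate per position (spatial-wise product).
    y:   nested list [B][rows][cols], the input pilot tensor.
    """
    return [
        [
            [x_f[b] * y[b][m][q] + x_t[m][q] * y[b][m][q]
             for q in range(len(y[b][m]))]
            for m in range(len(y[b]))
        ]
        for b in range(len(y))
    ]

# Toy example: B = 2 subcarriers, 1 row, 2 columns (illustrative sizes).
y_p = [[[1.0, 2.0]], [[3.0, 4.0]]]
x_freq = [0.5, 1.0]
x_time = [[0.0, 1.0]]
omega = fuse(x_freq, x_time, y_p)
```

Because each branch produces far fewer gates than a full attention map over all tensor entries, the fused output keeps the shape of the input tensor at a modest cost.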

IV-B Signal-Guided Network for Beamforming Design


Figure 5: Signal-guided shared network architecture for beamforming design in the DL-BF module.

The DL-BF network consists of a low-level shared network that extracts the shared embedded features and multiple independent sub-networks that jointly design the near-field wideband beamforming.

IV-B1 Signal-Guided Shared Network Architecture

In large-scale array communications at high frequencies, wireless channels exhibit natural sparsity due to the limited number of scatterer paths. Considering the severe energy spreading effect, the sparse representation used for the far-field channel, such as the DFT-based angular-domain sparsity [33], may no longer be applicable to the near-field channel. However, as a classic digital signal processing tool, the DFT principle is still useful for guiding the network architecture design [34], especially for dedicated networks in wireless communications. In the proposed DL-BF module, we incorporate the Fourier transform into the conventional self-attention mechanism and employ learnable filters to exchange information globally among the feature tokens in the Fourier domain, which is termed a signal-guided deep learning approach.

As shown in Fig. 5, for the feature $\bm{\Omega}$ obtained by the UL-CT module in Fig. 4, the two-dimensional DFT at subcarrier $b$ is carried out first, which can be expressed as

$$\mathbf{F}^{\text{DFT}}_{b}[u,v]=\mathcal{F}(\bm{\Omega}_{b})=\sum_{m=0}^{2M_{\text{RF}}-1}\sum_{q=0}^{Q_{\text{tr}}-1}\bm{\Omega}_{b}[m,q]\,e^{-j2\pi\left(\frac{um}{2M_{\text{RF}}}+\frac{vq}{Q_{\text{tr}}}\right)}. \tag{43}$$

Since the DFT of the real input tensor $\bm{\Omega}_{b}[m,q]$ satisfies the conjugate symmetry property, i.e., $\mathbf{F}^{\text{DFT}}_{b}[2M_{\text{RF}}-u,Q_{\text{tr}}-v]=\left(\mathbf{F}^{\text{DFT}}_{b}[u,v]\right)^{*}$, half of the spectrum, $\mathbf{F}^{\text{DFT}}\in\mathbb{R}^{B\times 2M_{\text{RF}}\times Q_{\text{tr}}/2}$, contains the full information about the frequency characteristics of $\bm{\Omega}$. We can therefore retain only half of the values in $\mathbf{F}^{\text{DFT}}$ while preserving the full information, which reduces the number of network parameters and the computational cost.
As mentioned above, the pure DFT can hardly exploit the near-field channel sparsity in full. We therefore employ a learnable filter $\bm{\Psi}\in\mathbb{R}^{B\times 2M_{\text{RF}}\times Q_{\text{tr}}/2}$ in the frequency domain to modulate the spectrum of $\mathbf{F}^{\text{DFT}}$, which can be expressed as

$$\bar{\mathbf{F}}^{\text{DFT}}=\bm{\Psi}\odot\mathbf{F}^{\text{DFT}}. \tag{44}$$

For the practical implementation of the DFT in deep neural networks, the existing efficient algorithms for computing the DFT, i.e., the well-known fast Fourier transform (FFT) algorithms, are well supported by both GPU and CPU architectures, thanks to acceleration libraries such as cuFFT and mkl-fft [35]. The learnable filter $\bm{\Psi}$ is designed as a tensor with trainable parameters, which can be adaptively optimized according to the training data in the network training stage. In the test stage, the parameters of $\bm{\Psi}$ can be directly determined based on the received pilot tensor $\bar{\mathbf{Y}}^{\text{P}}$.
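The conjugate symmetry that justifies keeping only half of the spectrum can be checked numerically. The sketch below evaluates the double sum in (43) directly on a small real-valued slice and measures the largest deviation from the symmetry relation; it is a didactic check with assumed toy values, not the FFT-based implementation used in practice, and the indices $2M_{\text{RF}}-u$ and $Q_{\text{tr}}-v$ are taken modulo the array size.

```python
import cmath

def dft2(x):
    """Direct 2D DFT of a rows-by-cols real array, following (43)."""
    rows, cols = len(x), len(x[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for m in range(rows):
                for q in range(cols):
                    angle = -2.0 * cmath.pi * (u * m / rows + v * q / cols)
                    acc += x[m][q] * cmath.exp(1j * angle)
            out[u][v] = acc
    return out

def max_symmetry_error(spec):
    """Largest deviation from F[-u mod rows, -v mod cols] = conj(F[u, v])."""
    rows, cols = len(spec), len(spec[0])
    return max(
        abs(spec[(rows - u) % rows][(cols - v) % cols] - spec[u][v].conjugate())
        for u in range(rows)
        for v in range(cols)
    )

# Toy real-valued slice for one subcarrier: 2 rows, 4 columns (illustrative values).
omega_b = [[0.3, -1.2, 0.7, 2.0], [1.5, 0.1, -0.4, 0.9]]
spectrum = dft2(omega_b)
error = max_symmetry_error(spectrum)  # close to zero for real input
```

Since every spectral entry has a mirrored conjugate partner, storing and filtering roughly half of the entries is sufficient to reconstruct the rest.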

Next, the inverse DFT is used to transform $\bar{\mathbf{F}}^{\text{DFT}}$ back into the original domain, i.e., $\mathbf{F}^{\text{IDFT}}_{b}=\mathcal{F}^{-1}(\bar{\mathbf{F}}^{\text{DFT}}_{b})\in\mathbb{R}^{B\times 2M_{\text{RF}}\times Q_{\text{tr}}},\forall b$. Then, $\mathbf{F}^{\text{IDFT}}$ is flattened into the feature tensor $\mathbf{F}^{\text{M}}\in\mathbb{R}^{2M_{\text{RF}}Q_{\text{tr}}\times B}$ along the time-spatial dimension. A multi-layer perceptron (MLP) block composed of two linear layers is used to realize the frequency-domain feature interaction of $\mathbf{F}^{\text{M}}$, which is given by

$$\mathbf{A}^{\text{f}}=\mathrm{GeLU}(\mathbf{F}^{\text{M}}\mathbf{W}_{1})\cdot\mathbf{W}_{2}+\mathbf{F}^{\text{M}}, \tag{45}$$

where the first linear layer $\mathbf{W}_{1}\in\mathbb{R}^{B\times\upsilon B}$ $(\upsilon\geq 1)$ projects the feature $\mathbf{F}^{\text{M}}$ into a high-dimensional representation space. The second linear layer $\mathbf{W}_{2}\in\mathbb{R}^{\upsilon B\times B}$ is used to recover the desired channel dimension. The function $\mathrm{GeLU}(\cdot)$ denotes the Gaussian error linear unit activation function, which provides the non-linearity of the feature transformation.
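To make the structure of (45) concrete, the following sketch implements the expansion, activation, projection, and residual connection with plain nested lists. The helper names, the bias-free linear layers, and the toy dimensions are assumptions for illustration; the GeLU is written in its exact form $v\,\Phi(v)$ using the error function.

```python
import math

def gelu(v):
    """Exact GeLU: v * Phi(v), with Phi the standard normal CDF."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))

def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def mlp_block(f_m, w1, w2):
    """A = GeLU(F W1) W2 + F, as in (45), without bias terms."""
    hidden = [[gelu(v) for v in row] for row in matmul(f_m, w1)]
    projected = matmul(hidden, w2)
    return [[p + f for p, f in zip(p_row, f_row)]
            for p_row, f_row in zip(projected, f_m)]

# Toy sizes: 3 tokens, B = 2 channels, expansion factor 2 (hidden width 4).
f_m = [[1.0, -1.0], [0.5, 2.0], [0.0, 0.0]]
w1 = [[1.0, 0.0, -1.0, 0.5], [0.0, 1.0, 0.5, -1.0]]
w2 = [[0.1, 0.0], [0.0, 0.1], [0.2, 0.0], [0.0, 0.2]]
a_f = mlp_block(f_m, w1, w2)
```

The residual term keeps the output shape equal to the input shape, so several such blocks can be stacked without any dimension adjustment.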

In the signal-guided shared network architecture, $t$ learnable DFT blocks are stacked to extract the latent representation of CSI semantics in the Fourier domain. Finally, $\mathbf{A}^{\text{f}}$ is converted into a feature tensor $\bm{\Phi}\in\mathbb{R}^{B\times N}$ by a linear layer, which is used as the input tensor for the subsequent beamforming sub-networks.

IV-B2 High-Level Sub-Network Architecture

Considering the network complexity and the convenience of tensor operations in different sub-networks, stacked linear layers are used as the basic component of the sub-network architecture. Specifically, an MLP-Mixer module [36] that consists of two MLP blocks is designed to refine the feature extraction of the shared feature tensor $\bm{\Phi}$. In the MLP-Mixer module, the first MLP block acts on the columns of $\bm{\Phi}$ to obtain the feature tensor $\bm{\Phi}_{1}$, while the second MLP block acts on the rows of $\bm{\Phi}_{1}$ to obtain the output feature tensor $\bm{\Phi}_{2}$. By leveraging the MLP operations along both the columns and the rows, the cross-variate information with global dependency can be extracted.
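The effect of mixing along the columns and then along the rows can be illustrated with a deliberately simplified sketch in which each MLP block is replaced by a single linear map, without activation or skip connection. This simplification, the function names, and the toy weights are assumptions; the point is only to show that after both steps every output entry depends on every input entry.

```python
def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def mix_columns(phi, w_col):
    """Apply one shared linear map to every column of phi (left multiplication)."""
    return matmul(w_col, phi)

def mix_rows(phi, w_row):
    """Apply one shared linear map to every row of phi (right multiplication)."""
    return matmul(phi, w_row)

# Toy feature tensor with B = 2 rows and N = 3 columns (illustrative sizes).
phi = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
w_col = [[0.5, 0.5], [0.5, 0.5]]         # averages the entries of each column
w_row = [[1.0] * 3 for _ in range(3)]    # sums the entries of each row

phi_1 = mix_columns(phi, w_col)
phi_2 = mix_rows(phi_1, w_row)
```

In this toy case each entry of the final tensor aggregates all six input values, which is the global dependency that the two-step mixing is meant to capture.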

In the output layers of the sub-networks, linear layers are designed to realize the dimension alignment between the feature tensor $\bm{\Phi}_{2}$ and the desired frequency-dependent phase shifting, as well as the hybrid precoding matrices. For the frequency-dependent phase shifting in the TTD-RIS architecture, three parallel linear layers are used to construct the outputs $\bm{\Theta}^{\text{down}}_{1}$, $\bm{\Theta}^{\text{down}}_{2}$ and $\bm{\Lambda}^{\text{down}}_{b}$, which are then normalized according to (24) and (25), respectively. The frequency-dependent phase shifting $\bm{\Theta}^{\mathbb{T}}_{b}$ is obtained by aggregating $\bm{\Theta}^{\text{down}}_{1}$, $\bm{\Theta}^{\text{down}}_{2}$ and $\bar{\mathbf{T}}^{\text{down}}_{b}$ in (16).
In the SA-RIS architecture, we utilize a linear layer with $N$ neurons to obtain $\bm{\Theta}^{\mathbb{V}}$, in which the $N$ neurons are divided into $B$ groups and each group consists of $N/B$ neurons that map the phase shifting of a RIS subarray. Similarly, the frequency-dependent hybrid precoding matrices at the BS, composed of the analog beamformer $\mathbf{F}^{\text{down}}_{\mathrm{PS}}$, the time-delay vector $\mathbf{T}^{\text{down}}_{b}$ and the digital beamformer $\mathbf{F}^{\text{down}}_{\mathrm{BB},b}$, are obtained by constructing three sub-networks with the given constraints in (32).
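The grouping of the $N$ output neurons into $B$ virtual subarrays can be sketched as a simple partition. The contiguous assignment of neurons to groups below is an assumption made for illustration, since the mapping between neuron indices and physical RIS elements is not specified here; the values $N=512$ and $B=16$ follow the simulation setup in Section V-A.

```python
def split_into_subarrays(outputs, num_subcarriers):
    """Partition N layer outputs into B contiguous groups of N/B entries each."""
    n = len(outputs)
    if n % num_subcarriers != 0:
        raise ValueError("N must be divisible by B")
    size = n // num_subcarriers
    return [outputs[g * size:(g + 1) * size] for g in range(num_subcarriers)]

N, B = 16 * 32, 16
neuron_indices = list(range(N))                    # stand-in for the layer outputs
groups = split_into_subarrays(neuron_indices, B)   # groups[b] serves subcarrier b
```

With these values each virtual subarray contains 32 elements, so a single frequency-independent phase vector of length $N$ serves all $B$ subcarriers.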

IV-C Model Deployment and Complexity Analysis

The model deployment of the proposed E2E beamforming architecture can be divided into three stages: offline training, online finetuning, and real-time testing. 1) In the offline training stage, given a general communication dataset, the E2E model is trained according to the proposed optimization framework in Section III; 2) in the finetuning stage, the network parameters of the UL-CT module and the DL-BF module, the uplink wideband phase shifting at the RIS, and the combining matrices at the BS are updated according to the received pilot signal in the specific communication scenario; 3) in the testing stage, the server transmits the trained model to the BS, which can generate the desired phase shifting at the RIS and the hybrid precoding matrices at the BS for the target communication scenario.

In the proposed E2E beamforming framework, the time complexity of the PSA-based UL-CT module can be expressed as $\mathcal{O}\left(BM_{\mathrm{RF}}Q_{\mathrm{tr}}(7B/4+1)\right)$. In the DL-BF module, the time complexity of the low-level shared network can be represented by $\mathcal{O}\left(t(2BM_{\mathrm{RF}}Q_{\mathrm{tr}}+2\mu B^{2})+2NM_{\mathrm{RF}}Q_{\mathrm{tr}}\right)$.
For the TTD-RIS architecture, the time complexity of the high-level sub-networks is given by $\mathcal{O}\left(12\mu(B^{2}+C^{2})+B(2+B)+NM_{\mathrm{RF}}(K+N_{\mathrm{s}})\right)$, while the time complexity of the high-level sub-networks can be reduced to $\mathcal{O}\left(12\mu(B^{2}+C^{2})+N/B+NM_{\mathrm{RF}}(K+N_{\mathrm{s}})\right)$ in the SA-RIS architecture due to the relatively simpler RIS configuration. Furthermore, the space complexity of the UL-CT module is given by $\mathcal{O}\left(B(7B/4+1)\right)$. For the DL-BF module, which consists of the learnable DFT and stacked linear layers, the space complexity is approximately equivalent to the time complexity.
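As a rough numerical reading of these expressions, the sketch below evaluates the terms that depend only on quantities given in Section V-A ($N=512$, $B=16$, $M_{\text{RF}}=U=4$, $Q_{\text{tr}}=NU/8$). The big-O expressions hide constant factors, so the numbers are indicative operation counts under that assumption rather than measured costs, and the function names are illustrative.

```python
def psa_time_cost(b, m_rf, q_tr):
    """Argument of the O(.) time complexity of the PSA-based module."""
    return b * m_rf * q_tr * (7 * b / 4 + 1)

def psa_space_cost(b):
    """Argument of the O(.) space complexity of the PSA-based module."""
    return b * (7 * b / 4 + 1)

def ris_specific_term(architecture, n, b):
    """The only term that differs between the two high-level sub-network costs."""
    if architecture == "TTD-RIS":
        return b * (2 + b)
    if architecture == "SA-RIS":
        return n / b
    raise ValueError("unknown architecture: " + architecture)

N, U, M_RF, B = 16 * 32, 4, 4, 16
Q_TR = N * U // 8   # channel training overhead from the simulation setup

time_cost = psa_time_cost(B, M_RF, Q_TR)
space_cost = psa_space_cost(B)
ttd_term = ris_specific_term("TTD-RIS", N, B)
sa_term = ris_specific_term("SA-RIS", N, B)
```

Under these settings the RIS-specific term drops from $B(2+B)=288$ for the TTD-RIS to $N/B=32$ for the SA-RIS, while the remaining terms of the two expressions are identical.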

V Numerical Results

In this section, we first introduce the simulation setups of the formulated near-field wideband systems and training hyper-parameters of the proposed models. Then, we compare the spectral efficiency of the proposed E2E models with the existing benchmarks, and further evaluate the beamforming performance under various system setups.

V-A Simulation Setups

In the simulation, we set $M=128$, $N=16\times 32$, $U=M_{\text{RF}}=N_{\text{s}}=4$, $B=16$ and $L_{\text{CP}}=4$. The carrier frequency is set to $f_{c}=73$ GHz and the communication bandwidth is $W=7$ GHz in OFDM systems. In the clustered scatterer channel modeling, the number of clusters in each of the BS$\to$RIS, RIS$\to$UE and BS$\to$UE links is set to $C_{\text{s}}^{\text{BR}}=C_{\text{s}}^{\text{RU}}=C_{\text{s}}^{\text{BU}}=3$, while the number of scatterers within cluster $c$ is set to $S_{c}^{\text{BR}}=6$, $S_{c}^{\text{RU}}=5$ and $S_{c}^{\text{BU}}=4$, respectively. 
For each cluster $c$, the central angles of the AoA $\phi_{c}$ and the AoD $\varphi_{c}$ follow the uniform distributions $\phi_{c}\sim\mathcal{U}[-\pi/2,\pi/2]$ and $\varphi_{c}\sim\mathcal{U}[-\pi/2,\pi/2]$. The corresponding angular spreads are set to $\sigma_{\phi}=\sigma_{\varphi}=5^{\circ}$ for the scatterer paths within cluster $c$. The coordinates of the BS and the RIS are set to $\mathbf{c}^{\text{B}}=\left(x^{\text{B}},y^{\text{B}},z^{\text{B}}\right)=(0,0,5)$ m and $\mathbf{c}^{\text{R}}=\left(x^{\text{R}},y^{\text{R}},z^{\text{R}}\right)=(0,20,5)$ m, respectively. The UE coordinates are randomly sampled at a height of 1 m within a horizontal radius of 5 m centered on the RIS. 
The number of TTD units connected to each RF chain is $K=16$ at the BS, in which the maximum time delay is $t_{\text{max}}=5$ nanoseconds for each TTD unit. In the TTD-RIS architecture, the number of subarrays is $S=8\times 8$, while the number of virtual subarrays in the SA-RIS architecture is fixed to the number of subcarriers $B$. The array gains at the transmit antenna, the receive antenna and the RIS element are set to $G_{\text{B}}=25$ dBi, $G_{\text{U}}=20$ dBi [13], and $G_{\text{R}}=5$ dBi [22], respectively. In the proposed UL-CT module, the channel training overhead is set to $Q_{\text{tr}}=NU/8$. 
The received SNR at the uplink pilot transmission stage is defined as $\text{SNR}_{\text{R}}=\frac{\left\|\left(\mathbf{H}_{b}\bm{\Theta}^{f}_{b}\mathbf{G}_{b}+\mathbf{D}_{b}\right)\mathbf{W}_{b}\right\|_{F}^{2}}{\sigma_{b}^{2}}$ for subcarrier $b$, while the transmitted SNR at the downlink beamforming stage is defined as $\text{SNR}_{\text{T}}=\frac{P_{t}}{\sigma_{0}^{2}}$ with $\sigma_{0}^{2}=\sigma_{b}^{2},\forall b$. 
In each training iteration of the proposed E2E models, the uplink $\text{SNR}_{\text{R}}$ in the UL-CT module is randomly selected from the SNR range of $[0,\ldots,20]$ dB with an interval of 5 dB to improve the robustness of the trained E2E models, while the downlink $\text{SNR}_{\text{T}}$ in the DL-BF module is fixed at 20 dB. In the test stage, the uplink and downlink SNRs are set to $\text{SNR}_{\text{R}}=10$ dB and $\text{SNR}_{\text{T}}=20$ dB, respectively, unless otherwise specified. In this work, we compare the proposed E2E models with the following wideband beamforming benchmarks.

• Projected gradient descent-based precoding (PGDP) with fully-digital beamforming architecture at the BS [37]: A joint optimization framework for the covariance matrix of the transmitted signal and the phase shifting of RIS elements. Based on the typical PGDP method, we construct an ideal PGDP method to characterize the performance upper bound in the formulated near-field wideband RIS system, in which the phase shifting of RIS elements is assumed to be independently designed for each subcarrier.

• Alternative delay-phase precoding (ADPP) with TTD-based hybrid beamforming architecture [14]: A dedicated hybrid beamforming architecture to deal with the beam split effect in conventional wideband MIMO systems.

• Alternative manifold optimization-based precoding (AMOP) with the classic hybrid beamforming architecture [38]: A joint beamforming and channel reconfiguration method for RIS-aided mmWave MIMO-OFDM systems.

The above beamforming benchmarks require accurate CSI to optimize the beamformers efficiently. To comprehensively present the performance advantages of the proposed E2E models, we first compare them with the existing beamforming benchmarks under perfect CSI, wherein the ideal spectral efficiency, i.e., omitting the training overhead in (19), is used as the performance metric. Then, we compare them with the practical beamforming benchmarks under estimated CSI, in which the parallel factor (PARAFAC) decomposition-based RIS channel estimation method is used to obtain the required CSI [39, 40]. Due to the full-rank condition of the involved LS problem, the minimum pilot training overhead required by the PARAFAC-based channel estimation method is $U(N+1)$.
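To make the overhead gap concrete, the following minimal sketch compares the training overhead $Q_{\text{tr}}=NU/8$ of the UL-CT module with the PARAFAC minimum $U(N+1)$. It assumes $N=16\times 32$ RIS elements, as used in the power consumption calculation of Section V-C. The value $U=4$ and the function names are hypothetical and chosen only for illustration; the resulting ratio $N/(8(N+1))$ does not depend on $U$.

```python
from fractions import Fraction


def e2e_pilot_overhead(n_ris: int, u: int) -> Fraction:
    """Training overhead of the UL-CT module, Q_tr = N * U / 8."""
    return Fraction(n_ris * u, 8)


def parafac_min_pilot_overhead(n_ris: int, u: int) -> int:
    """Minimum pilot overhead U * (N + 1) of PARAFAC-based estimation."""
    return u * (n_ris + 1)


N = 16 * 32  # number of RIS elements (from the simulation setup)
U = 4        # illustrative assumption; the ratio below is independent of U

q_e2e = e2e_pilot_overhead(N, U)
q_parafac = parafac_min_pilot_overhead(N, U)
ratio = q_e2e / q_parafac  # equals N / (8 * (N + 1))

print(f"E2E overhead:     {q_e2e}")
print(f"PARAFAC overhead: {q_parafac}")
print(f"Ratio:            {float(ratio):.4f}")
```

Under these assumptions, the E2E models use about one eighth of the pilot overhead required by the PARAFAC-based benchmark.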

V-B Comparison between Different Beamforming Schemes

Figure 6: Convergence of E2E models for different RIS architectures.

In Fig. 6, we present the convergence of the proposed E2E models for the TTD-RIS and SA-RIS architectures, in which the average loss on the validation dataset in each training epoch is computed according to (33). Compared to the E2E model with the SA-RIS architecture, the optimization of $\bm{\Theta}^{f}_{b}$ in the TTD-RIS architecture is more complex, which increases the network scale of the DL-BF module in the E2E model. Owing to its simplified network components, the E2E model with the SA-RIS architecture converges faster than that with the TTD-RIS architecture. However, as the number of training epochs increases, the E2E model with the TTD-RIS architecture achieves better converged performance.

Figure 7: Spectral efficiency versus downlink $\text{SNR}_{\text{T}}$ for different beamforming schemes, in which the perfect CSI is assumed to be known.

In Fig. 7, we present the spectral efficiency of the proposed E2E models and the ideal beamforming benchmarks with perfect CSI. Note that for the proposed E2E models, i.e., the proposed TTD-RIS and SA-RIS in Fig. 7, the UL-CT module is exploited to learn the implicit CSI instead of assuming perfect CSI a priori. For the existing beamforming benchmarks, the near-field double beam split degrades the achievable beamforming gain. Specifically, the AMOP method adopts the conventional hybrid beamforming architecture, which cannot realize a frequency-dependent analog beamformer at the BS or frequency-dependent phase shifting at the RIS. In the ADPP method, the BS is equipped with $M_{\text{RF}}K$ TTD units to construct frequency-dependent hybrid beamforming at the BS, while the PGDP method adopts the fully-digital precoding architecture at the BS to avoid the wideband beam split at the BS. However, neither the ADPP method nor the PGDP method specifically addresses the wideband beam split at the RIS. Under the perfect CSI assumption, the ideal PGDP method attains the performance upper bound, although its ideal RIS architecture cannot be implemented in practical communication systems. In the proposed E2E models, by exploiting frequency-dependent RIS architectures, i.e., the TTD-RIS and the SA-RIS, and developing deep learning-based beamforming networks, the beamforming gain is superior to that of the conventional beamforming benchmarks. In the SA-RIS architecture, the effective array aperture shrinks due to the virtual subarray division strategy, and hence the frequency-dependent beamforming gain is reduced compared to the TTD-RIS architecture.

Figure 8: Spectral efficiency versus downlink $\text{SNR}_{\text{T}}$ for different beamforming schemes, in which the uplink SNR is set to $\text{SNR}_{\text{R}}=10$ dB.

In Fig. 8, we further compare the effective spectral efficiency of the proposed E2E models with that of the practical beamforming benchmarks with estimated CSI. Owing to the channel estimation error and the large pilot overhead, the effective spectral efficiency of the conventional beamforming schemes decreases significantly. With less pilot overhead, the proposed E2E models with both the TTD-RIS and SA-RIS architectures achieve higher effective spectral efficiency than the existing beamforming benchmarks.

Figure 9: Spectral efficiency versus uplink $\text{SNR}_{\text{R}}$ for different beamforming schemes, in which the downlink SNR is set to $\text{SNR}_{\text{T}}=20$ dB.

In Fig. 9, we compare the spectral efficiency of the proposed E2E models with that of the existing beamforming benchmarks under different uplink $\text{SNR}_{\text{R}}$. In the low uplink SNR regime, the conventional beamforming approaches struggle because the noise-corrupted estimated CSI does not support efficient beamforming design. In contrast, in the proposed E2E beamforming models, the implicit CSI acquisition and data-driven beamforming modules are jointly designed within a single deep neural network, which avoids the explicit error propagation between the channel estimation and beamforming modules in conventional approaches. Owing to the ability of the network to learn latent representations from large amounts of communication data, the proposed E2E models are robust against disturbances in the input data. Moreover, the nonlinear mapping capability of deep learning significantly suppresses noise interference, offering a substantial improvement over the conventional approaches, particularly in low SNR conditions. As $\text{SNR}_{\text{R}}$ increases, the beamforming gain of all algorithms improves due to more accurate CSI, while the proposed E2E models remain superior to the conventional beamforming benchmarks.

Figure 10: Spectral efficiency versus bandwidth $W$ for different beamforming schemes.

In Fig. 10, to characterize the beamforming performance loss caused by the near-field double beam split effect, we present the spectral efficiency of different beamforming schemes with increasing bandwidth $W$. We observe a general degradation in the spectral efficiency of beamforming schemes that rely on the hybrid precoding architecture as $W$ increases, with the exception of the ideal PGDP algorithm implemented in a fully-digital architecture. In the ideal PGDP algorithm, the precoding matrices at the BS and the phase shifting at the RIS can be independently designed according to the specific subcarrier channel, achieving a beamforming gain that remains constant regardless of the bandwidth $W$. Compared to the existing conventional beamforming benchmarks, the proposed TTD-RIS and SA-RIS architectures exhibit better generalization and stronger robustness against the near-field double beam split effect under large communication bandwidths.

Figure 11: Spectral efficiency versus distance $r^{\text{RU}}$ for different beamforming schemes.

In Fig. 11, we compare different beamforming approaches as the near-field effect increases. Specifically, in the considered near-field RIS system, the shorter the distance $r^{\text{RU}}$ between the RIS and the user, the more pronounced the near-field effect becomes. To isolate the near-field effect, we normalize the large-scale fading components of each communication link to a reference value, which highlights the impact of distance-dependent array responses in near-field channel modeling. By leveraging the near-field effect of the virtual LOS channel introduced by the RIS, the spectral efficiency of all beamforming schemes improves as the distance $r^{\text{RU}}$ decreases. Compared to conventional beamforming approaches, the proposed beamforming models exhibit a larger performance gain as the near-field effect becomes increasingly significant, since the proposed TTD-RIS and SA-RIS architectures are specifically designed to deal with the near-field double beam split effect.

V-C Comparison for Different RIS Setups

Figure 12: Spectral efficiency versus downlink $\text{SNR}_{\text{T}}$ for different RIS setups.

Figure 13: Spectral efficiency versus uplink $\text{SNR}_{\text{R}}$ for different RIS setups.

In Fig. 12 and Fig. 13, we present the spectral efficiency of the proposed E2E models under different RIS setups versus the downlink and uplink SNR, respectively. As the number of TTD units $K$ at the BS increases, the beamforming performance of both the TTD-RIS and SA-RIS architectures is enhanced. For the TTD-RIS architecture, the spectral efficiency can be further improved by increasing the number of TTD units $S$ at the RIS, while the number of virtual subarrays in the SA-RIS architecture is fixed as the number of subcarriers $B$. Note that the introduction of TTD units and double-layer phase shifting circuits in the TTD-RIS architecture requires additional energy consumption and hardware cost. Specifically, the typical power consumption of a TTD unit is about 100 mW [7], while that of a 3-bit PS is only about 1.5 mW [8]. In the TTD-RIS architecture, with $S=8\times 8$ subarrays and $P^{\text{S}}=N\times 2=16\times 32\times 2$ PSs, the power consumption of the TTD-RIS can be calculated as $S\times 100~\text{mW}+P^{\text{S}}\times 1.5~\text{mW}=7.936$ W. For the SA-RIS architecture without TTD units, the power consumption is $N\times 1.5~\text{mW}=0.768$ W. Consequently, the SA-RIS architecture consumes significantly less power than the TTD-RIS architecture.
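The power figures above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration under the stated per-component values (100 mW per TTD unit, 1.5 mW per 3-bit PS); the function and constant names are hypothetical and not part of the proposed framework.

```python
TTD_POWER_MW = 100.0  # typical power of one TTD unit [7]
PS_POWER_MW = 1.5     # typical power of one 3-bit phase shifter [8]


def ttd_ris_power_w(num_ttd: int, num_ps: int) -> float:
    """Power of the TTD-RIS in watts: TTD units plus phase shifters."""
    return (num_ttd * TTD_POWER_MW + num_ps * PS_POWER_MW) / 1000.0


def sa_ris_power_w(num_elements: int) -> float:
    """Power of the SA-RIS in watts: one phase shifter per element, no TTDs."""
    return num_elements * PS_POWER_MW / 1000.0


N = 16 * 32   # number of RIS elements
S = 8 * 8     # number of TTD units (subarrays) at the TTD-RIS
P_S = 2 * N   # double-layer phase shifting: two PSs per element

p_ttd = ttd_ris_power_w(S, P_S)
p_sa = sa_ris_power_w(N)

print(f"TTD-RIS: {p_ttd:.3f} W")
print(f"SA-RIS:  {p_sa:.3f} W")
print(f"Ratio:   {p_ttd / p_sa:.2f}")
```

With these settings, the TTD units account for 6.4 W of the 7.936 W consumed by the TTD-RIS, so the gap to the SA-RIS is dominated by the TTD units rather than by the second phase shifting layer.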

Figure 14: Spectral efficiency of the proposed E2E models under different phase shifting quantization bits $b$.

In Fig. 14, we provide the spectral efficiency of the proposed E2E models under different numbers of phase shift quantization bits $\bar{b}$. To realize discrete phase shifts, we incorporate quantization layers into both the UL-CT module and the DL-BF module, in which the quantized phase shift $\bar{\theta}$ with $\bar{b}$ bits for the original continuous phase shift $\theta\in[0,2\pi]$ can be expressed as $\bar{\theta}=\left\lfloor\theta\times\frac{2^{\bar{b}}}{2\pi}\right\rfloor\times\frac{2\pi}{2^{\bar{b}}}$. We employ the transfer learning strategy in [33] to train the discrete E2E model with phase shift quantization modules. We observe that the beamforming performance of the proposed TTD-RIS and SA-RIS architectures, even with a modest $\bar{b}=3$ bits of phase shift quantization, closely approaches the ideal performance of the E2E model with infinite-resolution phase shifts, which illustrates the robust generalization capability of the proposed E2E model in handling discrete phase shifts. In particular, in the case of low-resolution phase shifts, the TTD-RIS architecture with sub-connected TTD units consistently exhibits more stable performance than the SA-RIS architecture.
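As an illustration of the quantization rule above, the following NumPy sketch implements only the forward mapping $\theta\mapsto\bar{\theta}$. It is a minimal example, not the quantization layer used in the E2E models: how gradients are handled during training and the transfer learning procedure of [33] are not covered, and the function name `quantize_phase` is hypothetical.

```python
import numpy as np


def quantize_phase(theta, num_bits: int) -> np.ndarray:
    """Floor-quantize phase shifts in [0, 2*pi] to a grid of 2**num_bits levels.

    Implements theta_bar = floor(theta * 2**b / (2*pi)) * (2*pi / 2**b).
    """
    theta = np.asarray(theta, dtype=float)
    levels = 2 ** num_bits
    return np.floor(theta * levels / (2.0 * np.pi)) * (2.0 * np.pi / levels)


# Illustrative continuous phase shifts in radians (arbitrary example values).
theta = np.array([1.0, 3.0, 6.0])
theta_q = quantize_phase(theta, num_bits=3)

step = 2.0 * np.pi / 2 ** 3
print("quantized:", theta_q)
print("grid index:", np.round(theta_q / step).astype(int))
print("max error:", np.max(theta - theta_q), "< step =", step)
```

Because the rule uses a floor operation, the quantization error always lies in $[0, 2\pi/2^{\bar{b}})$, so each additional bit halves the worst-case phase error.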

Figure 15: Spectral efficiency of the proposed TTD-RIS architecture versus downlink $\text{SNR}_{\text{T}}$ for different training SNR setups.

In Fig. 15, we investigate the influence of the uplink SNR setting in the training stage on the proposed E2E model with the TTD-RIS architecture, in which the numbers of TTD units at the BS and the RIS are set to $K=8$ and $S=32$, respectively. We observe that the dynamic training SNR setting employed in this work, i.e., training $\text{SNR}_{\text{R}}\in[0,\ldots,20]$ dB, achieves more stable beamforming performance across the entire SNR range than the other training SNR settings. This finding demonstrates that an appropriate training SNR setting can significantly enhance the testing performance of the deep learning-based beamforming model. The training SNR setting used in this work accounts for the interference of communication noise and also describes the effective data distribution well. In particular, a model trained with a fixed SNR setting performs satisfactorily at that SNR in the test stage, but lacks robustness over a dynamic SNR range. In the dynamic training SNR setup, the training sample space is enriched by introducing different levels of noise components. In essence, this training strategy resembles the data augmentation method in the traditional deep learning field, and it can be extended to various deep learning-empowered communication scenarios, such as mmWave MIMO channel estimation [41, 42].
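The dynamic training SNR setup can be sketched as follows. This is a minimal NumPy illustration, assuming that the noise variance is obtained by inverting the uplink SNR definition $\text{SNR}_{\text{R}}=\|\cdot\|_F^2/\sigma_b^2$ from Section V-A and that the noise is circularly symmetric complex Gaussian. All function names, the array shape, and the random placeholder signal are hypothetical; the actual training pipeline is not reproduced here.

```python
import numpy as np

# Training SNR grid: 0 dB to 20 dB with an interval of 5 dB.
TRAIN_SNR_GRID_DB = np.arange(0, 25, 5)


def sample_training_snr_db(rng: np.random.Generator) -> float:
    """Draw one uplink SNR (in dB) uniformly from the training grid."""
    return float(rng.choice(TRAIN_SNR_GRID_DB))


def noise_variance(signal: np.ndarray, snr_db: float) -> float:
    """Solve SNR_R = ||signal||_F^2 / sigma^2 for sigma^2."""
    signal_power = np.linalg.norm(signal) ** 2  # squared Frobenius norm
    return signal_power / (10.0 ** (snr_db / 10.0))


def add_pilot_noise(signal, snr_db, rng):
    """Add complex Gaussian noise with per-entry variance sigma^2 (assumed)."""
    sigma2 = noise_variance(signal, snr_db)
    noise = np.sqrt(sigma2 / 2.0) * (
        rng.standard_normal(signal.shape)
        + 1j * rng.standard_normal(signal.shape)
    )
    return signal + noise


rng = np.random.default_rng(0)
# Placeholder for the noiseless received pilot signal on one subcarrier.
clean = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))

for _ in range(3):  # one SNR draw per training iteration
    snr_db = sample_training_snr_db(rng)
    noisy = add_pilot_noise(clean, snr_db, rng)
    print(f"SNR_R = {snr_db:4.1f} dB, sigma^2 = {noise_variance(clean, snr_db):.4f}")
```

Drawing a new SNR in every iteration exposes the network to pilot observations at several noise levels, which is the data augmentation effect described above.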

VI Conclusions

In this paper, a deep learning-enabled near-field wideband beamforming scheme for RIS-aided MIMO systems was proposed to alleviate the beamforming performance loss caused by the near-field double beam split effect. Firstly, two specific RIS architectures, i.e., TTD-RIS and SA-RIS, were exploited to achieve frequency-dependent passive beamforming. Compared to the SA-RIS architecture, the TTD-RIS architecture can obtain superior beamforming performance, while requiring more energy consumption and hardware cost due to the introduction of TTD units. Furthermore, an E2E beamforming optimization framework was proposed to jointly design the high-dimensional channel estimation and the frequency-dependent wideband beamforming. Moreover, to accelerate the convergence of the proposed E2E model, advanced deep learning architectures and classical communication signal processing theory were integrated to develop an efficient beamforming network backbone. Numerical results showed that the proposed E2E models without explicit CSI achieved superior beamforming performance and robustness compared to the existing wideband beamforming benchmarks.

References

  • [1] M. Di Renzo, A. Zappone, M. Debbah, M.-S. Alouini, C. Yuen, J. de Rosny, and S. Tretyakov, “Smart radio environments empowered by reconfigurable intelligent surfaces: How it works, state of research, and the road ahead,” IEEE J. Sel. Areas Commun., vol. 38, no. 11, pp. 2450–2525, Nov. 2020.
  • [2] C. Huang, Z. Yang, G. C. Alexandropoulos, K. Xiong, L. Wei, C. Yuen, Z. Zhang, and M. Debbah, “Multi-hop RIS-empowered Terahertz communications: A DRL-based hybrid beamforming design,” IEEE J. Sel. Areas Commun., vol. 39, no. 6, pp. 1663–1677, Jun. 2021.
  • [3] M. Cui, Z. Wu, Y. Lu, X. Wei, and L. Dai, “Near-field MIMO communications for 6G: Fundamentals, challenges, potentials, and future directions,” IEEE Commun. Mag., vol. 61, no. 1, pp. 40–46, Jan. 2023.
  • [4] X. Mu, J. Xu, Y. Liu, and L. Hanzo, “Reconfigurable intelligent surface-aided near-field communications for 6G: Opportunities and challenges,” IEEE Veh. Technol. Mag., vol. 19, no. 1, pp. 65–74, Mar. 2024.
  • [5] Y. Liu, Z. Wang, J. Xu, C. Ouyang, X. Mu, and R. Schober, “Near-field communications: A tutorial review,” IEEE Open J. Commun. Soc., vol. 4, pp. 1999–2049, Aug. 2023.
  • [6] Z. Wang, X. Mu, and Y. Liu, “Near-field integrated sensing and communications,” IEEE Commun. Lett., vol. 27, no. 8, pp. 2048–2052, Aug. 2023.
  • [7] L. Dai, J. Tan, Z. Chen, and H. V. Poor, “Delay-phase precoding for wideband THz massive MIMO,” IEEE Trans. Wireless Commun., vol. 21, no. 9, pp. 7271–7286, Sep. 2022.
  • [8] R. Su, L. Dai, and D. W. Ng, “Wideband precoding for RIS-aided THz communications,” IEEE Trans. Commun., vol. 71, no. 6, pp. 3592–3604, Jun. 2023.
  • [9] W. Yan, W. Hao, C. Huang, G. Sun, O. Muta, H. Gacanin, and C. Yuen, “Beamforming analysis and design for wideband THz reconfigurable intelligent surface communications,” IEEE J. Sel. Areas Commun., vol. 41, no. 8, pp. 2306–2320, Aug. 2023.
  • [10] W. Hao, F. Zhou, M. Zeng, O. A. Dobre, and N. Al-Dhahir, “Ultra wideband THz IRS communications: Applications, challenges, key techniques, and research opportunities,” IEEE Network, vol. 36, no. 6, pp. 214–220, Dec. 2022.
  • [11] H. Sun, S. Zhang, J. Ma, and O. A. Dobre, “Time-delay unit based beam squint mitigation for RIS-aided communications,” IEEE Commun. Lett., vol. 26, no. 9, pp. 2220–2224, Sep. 2022.
  • [12] X. Mu, Y. Liu, L. Guo, J. Lin, and R. Schober, “Simultaneously transmitting and reflecting (STAR) RIS aided wireless communications,” IEEE Trans. Wireless Commun., vol. 21, no. 5, pp. 3083–3098, May 2022.
  • [13] Z. Wang, X. Mu, J. Xu, and Y. Liu, “Simultaneously transmitting and reflecting surface (STARS) for Terahertz communications,” IEEE J. Sel. Top. Signal Process., vol. 17, no. 4, pp. 861–877, Jul. 2023.
  • [14] M. Cui and L. Dai, “Near-field wideband beamforming for extremely large antenna arrays,” IEEE Trans. Wireless Commun., pp. 1–1, 2024.
  • [15] Z. Wang, X. Mu, Y. Liu, and R. Schober, “TTD configurations for near-field beamforming: Parallel, serial, or hybrid?” IEEE Trans. Commun., vol. 72, no. 6, pp. 3783–3799, Mar. 2024.
  • [16] J. Xu, L. You, G. C. Alexandropoulos, X. Yi, W. Wang, and X. Gao, “Near-field wideband extremely large-scale MIMO transmissions with holographic metasurface-based antenna arrays,” IEEE Trans. Wireless Commun., pp. 1–1, 2024.
  • [17] A. Nordio, L. Dossi, A. Tarable, and G. Virone, “Near-field IRS configuration techniques for wideband signals and THz communications,” in Proc. ICC Workshops, 2023, pp. 1198–1203.
  • [18] W. Hao, X. You, F. Zhou, Z. Chu, G. Sun, and P. Xiao, “The far-/near-field beam squint and solutions for THz intelligent reflecting surface communications,” IEEE Trans. Veh. Technol., vol. 72, no. 8, pp. 10 107–10 118, Aug. 2023.
  • [19] J. An, C. Xu, D. W. K. Ng, C. Yuen, L. Gan, and L. Hanzo, “Reconfigurable intelligent surface-enhanced OFDM communications via delay adjustable metasurface,” arXiv preprint arXiv:2110.09291, 2021.
  • [20] Y. Cheng, C. Huang, W. Peng, M. Debbah, L. Hanzo, and C. Yuen, “Achievable rate optimization of the RIS-aided near-field wideband uplink,” IEEE Trans. Wireless Commun., 2023.
  • [21] J. D. Kraus and R. J. Marhefka, Antennas for All Applications, 3rd ed. McGraw-Hill, 2002.
  • [22] E. Basar, I. Yildirim, and F. Kilinc, “Indoor and outdoor physical channel modeling and efficient positioning for reconfigurable intelligent surfaces in mmWave bands,” IEEE Trans. Commun., vol. 69, no. 12, pp. 8600–8611, Dec. 2021.
  • [23] S. Tarboush, H. Sarieddeen, H. Chen, M. H. Loukil, H. Jemaa, M.-S. Alouini, and T. Y. Al-Naffouri, “TeraMIMO: A channel simulator for wideband ultra-massive MIMO Terahertz communications,” IEEE Trans. Veh. Technol., vol. 70, no. 12, pp. 12 325–12 341, Dec. 2021.
  • [24] F. Zhao, W. Hao, X. You, Y. Wang, Z. Chu, and P. Xiao, “Joint beamforming optimization for IRS-aided THz communication with time delays,” IEEE Wireless Commun. Lett., vol. 13, no. 1, pp. 49–53, Jan. 2024.
  • [25] B. Zheng, C. You, W. Mei, and R. Zhang, “A survey on channel estimation and practical passive beamforming design for intelligent reflecting surface aided wireless communications,” IEEE Commun. Surveys Tuts., vol. 24, no. 2, pp. 1035–1071, Secondquarter 2022.
  • [26] X. Gan, C. Zhong, C. Huang, and Z. Zhang, “RIS-assisted multi-user MISO communications exploiting statistical CSI,” IEEE Trans. Commun., vol. 69, no. 10, pp. 6781–6792, Oct. 2021.
  • [27] J. Xiao, J. Wang, Z. Wang, W. Xie, and Y. Liu, “Multi-scale attention based channel estimation for RIS-aided massive MIMO systems,” IEEE Trans. Wireless Commun., pp. 1–1, 2023.
  • [28] T. L. Jensen and E. De Carvalho, “An optimal channel estimation scheme for intelligent reflecting surfaces based on a minimum variance unbiased estimator,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2020, pp. 5000–5004.
  • [29] M. Wu, Z. Gao, Y. Huang, Z. Xiao, D. W. K. Ng, and Z. Zhang, “Deep learning-based rate-splitting multiple access for reconfigurable intelligent surface-aided Terahertz massive MIMO,” IEEE J. Sel. Areas Commun., vol. 41, no. 5, pp. 1431–1451, May 2023.
  • [30] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [31] Z. Liu, L. Zhang, and Z. Ding, “Overcoming the channel estimation barrier in massive MIMO communication via deep learning,” IEEE Wireless Commun., vol. 27, no. 5, pp. 104–111, Oct. 2020.
  • [32] H. Liu, F. Liu, X. Fan, and D. Huang, “Polarized self-attention: Towards high-quality pixel-wise regression,” arXiv preprint arXiv:2107.00782, 2021.
  • [33] Z. Gao, M. Wu, C. Hu, F. Gao, G. Wen, D. Zheng, and J. Zhang, “Data-driven deep learning based hybrid beamforming for aerial massive MIMO-OFDM systems with implicit CSI,” IEEE J. Sel. Areas Commun., vol. 40, no. 10, pp. 2894–2913, Oct. 2022.
  • [34] Y. Rao, W. Zhao, Z. Zhu, J. Lu, and J. Zhou, “Global filter networks for image classification,” in Proc. NeurIPS, vol. 34, 2021, pp. 980–993.
  • [35] A. Paszke et al., “PyTorch: An imperative style, high-performance deep learning library,” in Proc. NeurIPS, vol. 32, 2019, pp. 8026–8037.
  • [36] I. Tolstikhin et al., “MLP-Mixer: An all-MLP architecture for vision,” in Proc. NeurIPS, vol. 34, 2021, pp. 24 261–24 272.
  • [37] N. S. Perović, L.-N. Tran, M. Di Renzo, and M. F. Flanagan, “Achievable rate optimization for MIMO systems with reconfigurable intelligent surfaces,” IEEE Trans. Wireless Commun., vol. 20, no. 6, pp. 3865–3882, Jun. 2021.
  • [38] H. Wang, J. Fang, and H. Li, “Joint beamforming and channel reconfiguration for RIS-assisted millimeter wave massive MIMO-OFDM systems,” IEEE Trans. Veh. Technol., vol. 72, no. 6, pp. 7627–7638, Jun. 2023.
  • [39] G. T. de Araújo, A. L. F. de Almeida, and R. Boyer, “Channel estimation for intelligent reflecting surface assisted MIMO systems: A tensor modeling approach,” IEEE J. Sel. Top. Signal Process., vol. 15, no. 3, pp. 789–802, Apr. 2021.
  • [40] L. Wei, C. Huang, G. C. Alexandropoulos, C. Yuen, Z. Zhang, and M. Debbah, “Channel estimation for RIS-empowered multi-user MISO wireless communications,” IEEE Trans. Commun., vol. 69, no. 6, pp. 4144–4157, Jun. 2021.
  • [41] J. Xiao, J. Wang, Z. Chen, and G. Huang, “U-MLP-based hybrid-field channel estimation for XL-RIS assisted millimeter-wave MIMO systems,” IEEE Wireless Commun. Lett., vol. 12, no. 6, pp. 1042–1046, Jun. 2023.
  • [42] X. Wei, C. Hu, and L. Dai, “Deep learning for beamspace channel estimation in millimeter-wave massive MIMO systems,” IEEE Trans. Commun., vol. 69, no. 1, pp. 182–193, Jan. 2021.