
Interference Cancellation Based Neural Receiver for Superimposed Pilot in Multi-Layer Transmission

Han Xiao¹, Wenqiang Tian¹**, Shi **², Wendong Liu¹, Jia Shen¹, Zhihua Shi¹ and Zhi Zhang¹

¹Dept. of Standards Research, OPPO, Bei**g, 100026, China
²National Mobile Communications Research Laboratory, Southeast University, Nan**g 211189, China
Abstract

In this paper, an interference cancellation based neural receiver for superimposed pilot (SIP) in multi-layer transmission is proposed, where the data and pilot are non-orthogonally superimposed on the same time-frequency resources. Specifically, to deal with the intra-layer and inter-layer interference of SIP under multi-layer transmission, interference cancellation with superimposed symbol aided channel estimation is leveraged in the neural receiver, accompanied by a pre-designed code-division orthogonal pilot mechanism at the transmitter. In addition, to address the complexity of inter-vendor collaboration and the generalization problem in practical deployments, respectively, this paper also provides a fixed SIP (F-SIP) design based on a constant pilot power ratio and scalable mechanisms for different modulation and coding schemes (MCSs) and numbers of transmission layers. Simulation results demonstrate the superiority of the proposed schemes over existing counterparts in terms of block error rate and throughput.

keywords:
Superimposed pilot, interference cancellation, neural receiver, model scalability

1 Introduction

Accurate channel estimation is a key issue in wireless communication systems, and in the fifth generation (5G) new radio (NR) system [1, 2, 3] it is achieved through various kinds of pilots, such as the demodulation reference signal (DMRS), channel state information reference signal (CSI-RS) and sounding reference signal (SRS). Towards the sixth generation (6G) [4, 5], greater advancements can be expected in massive multiple-input multiple-output (MIMO), hybrid beamforming and high-speed scenarios, as well as an increased focus on vertical applications such as sensing and positioning. These trends will further increase the demand for diverse kinds of pilots, which may intensify the competition between data and pilot transmission for air-interface resources.

In the 5G NR system [1], pilot design has been standardized as a series of pre-defined patterns and sequences, which fail to consider the implicit channel characteristics of specific scenarios. Recently, deep learning (DL) based methods for air interface enhancement have shown great potential for improving system performance [6, 7, 8]. Specifically, DL-based pilot designs for both sequences [9, 10, 11, 12] and patterns [13, 14], together with corresponding neural network (NN) receivers, have been proposed to learn the optimal pilot and receiver for specific channel characteristics. However, the pilots in the above solutions are allocated orthogonally to the data, which results in considerable pilot overhead and hence a loss of spectral efficiency. A non-orthogonal solution, namely the superimposed pilot (SIP) [15], allocates the pilot and data on the same time-frequency resource grid to alleviate the pilot overhead problem, where the corresponding pilot power distribution and neural receiver can be jointly trained in an end-to-end manner [16].

Despite its high throughput with reduced pilot overhead, the existing DL-based SIP [16] still suffers from several drawbacks from the perspectives of multi-layer transmission and practical deployment. Firstly, multi-layer transmission by precoding uses multiple transmit and receive antennas to simultaneously send multiple data streams, significantly enhancing throughput and making it crucial for advanced standards such as 5G and beyond. However, none of the existing DL-based SIP methods deals with the more severe intra-layer and inter-layer interference caused by SIP in multi-layer transmission, which results in performance loss and calls for new architectures at both the transmitter and the receiver. Secondly, trainable parameters reside at both the base station (BS) and the user equipment (UE), and this two-sided framework considerably complicates inter-vendor training collaboration, e.g., data collection, model training, monitoring, and other model life cycle management issues [17]. Thirdly, generalization over different configurations, such as the number of transmission layers and the modulation and coding scheme (MCS) [18], is also ignored.

In this paper, an interference cancellation based neural receiver for SIP in multi-layer transmission is proposed, which involves the design of multiple mechanisms to address the challenges of multi-layer transmission and practical deployment. The main contributions of this article are summarized as follows.

  • To deal with the intra-layer and inter-layer interference of SIP in multi-layer transmission, interference cancellation with superimposed symbol aided channel estimation is leveraged in the neural receiver, accompanied by a pre-designed code-division orthogonal pilot mechanism at the transmitter.

  • Considering practical deployment and standardization, a fixed SIP (F-SIP) based on a constant pilot power ratio is designed, whose one-sided model simplifies inter-vendor collaboration.

  • To address the generalization problem in practical deployment, scalable mechanisms for different MCSs and numbers of transmission layers are also proposed, so that a single model can work effectively across different MCS and layer configurations.

  • Extensive simulation results are provided to demonstrate the superiority of the proposed scheme over existing counterparts in terms of block error rate (BLER) and throughput. These simulations are performed with 3rd Generation Partnership Project (3GPP) link-level channels and may provide useful insights for future 3GPP discussions.

The rest of this paper is organized as follows. The system model and existing pilot solutions are introduced in Section 2. The proposed scheme, which involves the design of multiple mechanisms at the transmitter and receiver, is presented in Section 3. Numerical experiments are provided in Section 4, and conclusions are given in Section 6.

2 System Description

2.1 System Model

We consider a typical downlink MIMO system with $N_{\textrm{t}}$ transmit antennas at the BS and $N_{\textrm{r}}$ receive antennas at the UE, where $S$ subcarriers with $T$ consecutive orthogonal frequency division multiplexing (OFDM) symbols are allocated. Since we mainly focus on multi-layer transmission, the equivalent downlink channel tensor after precoding in the frequency domain is denoted as $\mathbf{H}\in\mathbb{C}^{S\times T\times L\times N_{\textrm{r}}}$, where $L$ denotes the number of layers. The received signal at the $r$th receive antenna can be expressed as

$$\mathbf{Y}_{r}=\sum_{l=1}^{L}\mathbf{H}_{r,l}\circ\mathbf{X}_{l}+\mathbf{N}_{r} \tag{1}$$

where $\mathbf{Y}_{r}\in\mathbb{C}^{S\times T}$ denotes the received signal, and $1\leq r\leq N_{\textrm{r}}$ and $1\leq l\leq L$ are the receive antenna index and layer index, respectively. $\circ$ denotes the Hadamard product, and $\mathbf{H}_{r,l}\in\mathbb{C}^{S\times T}$ is a slice of tensor $\mathbf{H}$ denoting the equivalent channel for the $r$th receive antenna and $l$th layer. $\mathbf{N}_{r}\in\mathbb{C}^{S\times T}$ is the corresponding additive white complex Gaussian noise with variance $\sigma^{2}$ per element, determined by the signal-to-noise ratio $\textrm{SNR}=10\log_{10}\big(\mathbb{E}\{\sum_{l=1}^{L}|x_{l,s,t}|^{2}\}/\sigma^{2}\big)$, where $x_{l,s,t}$ is the $(s,t)$th entry of $\mathbf{X}_{l}$.
$\mathbf{X}_{l}\in\mathbb{C}^{S\times T}$ is the matrix of transmitted symbols for the $l$th layer, which can carry data, pilots (i.e., DMRS in 5G NR or a DL-based orthogonal pilot), or superimposed pilot and data symbols as introduced later. For all schemes, the transmitted symbols are assumed to have unit average energy, i.e., $\mathbb{E}\{\sum_{l=1}^{L}|x_{l,s,t}|^{2}\}=1$, where $1\leq s\leq S$ and $1\leq t\leq T$.
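A minimal NumPy sketch of the received-signal model (1) may help fix the tensor shapes. It assumes an i.i.d. Rayleigh channel purely for illustration (the paper itself uses 3GPP link-level channels), stores the symbols as an array of shape $(L, S, T)$ (a layout choice of this sketch), and uses the unit-energy normalization above, under which $\sigma^{2}=10^{-\textrm{SNR}/10}$. The function name `simulate_received` is hypothetical.

```python
import numpy as np

def simulate_received(H, X, snr_db, rng):
    """Eq. (1): Y_r = sum_l H_{r,l} o X_l + N_r, evaluated for all receive antennas.

    H: (S, T, L, Nr) equivalent channel after precoding.
    X: (L, S, T) transmitted symbols, assumed normalized so E{sum_l |x|^2} = 1.
    Returns Y with shape (S, T, Nr) and the per-element noise variance sigma^2.
    """
    sigma2 = 10.0 ** (-snr_db / 10.0)  # SNR = 10 log10(1 / sigma^2) under unit energy
    # Element-wise (Hadamard) product per layer and antenna, summed over layers l.
    y_clean = np.einsum("stlr,lst->str", H, X)
    noise = np.sqrt(sigma2 / 2.0) * (
        rng.standard_normal(y_clean.shape) + 1j * rng.standard_normal(y_clean.shape)
    )
    return y_clean + noise, sigma2

rng = np.random.default_rng(0)
S, T, L, Nr = 12, 14, 2, 4
# Illustrative i.i.d. Rayleigh channel, not the 3GPP channels used in the paper.
H = (rng.standard_normal((S, T, L, Nr)) + 1j * rng.standard_normal((S, T, L, Nr))) / np.sqrt(2)
# Unit-modulus symbols scaled by 1/sqrt(L) so that the total energy per RE is one.
X = np.exp(1j * 2 * np.pi * rng.random((L, S, T))) / np.sqrt(L)
Y, sigma2 = simulate_received(H, X, snr_db=10.0, rng=rng)
print(Y.shape, sigma2)
```

Using `np.einsum` keeps the Hadamard product and the sum over layers in one expression, mirroring (1) directly instead of looping over $r$ and $l$.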

2.2 DMRS in 5G NR

In this subsection, a typical existing pilot solution, the DMRS standardized in 5G NR [1], is introduced, wherein pilot and data symbols are orthogonally allocated on different resource elements (REs). As shown in Fig. 1, two basic pilot patterns with $N_{\textrm{p}}=1$ and $N_{\textrm{p}}=4$ OFDM symbols carrying pilots per slot are designed for lower and higher speeds, respectively. Meanwhile, the pilots of different layers are made orthogonal by frequency-division multiplexing (FDM) and code-division multiplexing (CDM).

Based on the pilots with pre-defined patterns, the legacy receiver performs channel estimation and data detection with linear algorithms such as linear minimum mean square error (LMMSE) estimation. Such orthogonal pilot patterns inevitably introduce overhead. In addition, switching patterns for different scenarios leads to cumbersome signaling exchange between the BS and UE. Moreover, empirically designed patterns fail to consider the implicit characteristics of increasingly complex channel scenarios. These bottlenecks may result in considerable throughput loss.
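To put the overhead argument in numbers, the short sketch below computes the fraction of REs occupied by pilots for the two patterns of Fig. 1. It assumes a 14-symbol NR slot (normal cyclic prefix) and, as a simplification, that every pilot-bearing OFDM symbol is fully reserved for pilots; NR may multiplex data on some DMRS symbols, which would lower the true figure. The helper name is hypothetical.

```python
from fractions import Fraction

SYMBOLS_PER_SLOT = 14  # NR slot with normal cyclic prefix

def orthogonal_pilot_overhead(num_pilot_symbols, symbols_per_slot=SYMBOLS_PER_SLOT):
    """Fraction of REs lost to pilots, assuming each pilot-bearing OFDM symbol
    carries no data (a simplifying assumption of this sketch)."""
    return Fraction(num_pilot_symbols, symbols_per_slot)

for n_p in (1, 4):
    ovh = orthogonal_pilot_overhead(n_p)
    print(f"N_p = {n_p}: overhead = {float(ovh):.1%}")
# A superimposed pilot shares REs with data, so its RE overhead is zero by construction.
```

Under these assumptions the overhead is about 7.1% for $N_{\textrm{p}}=1$ and about 28.6% for $N_{\textrm{p}}=4$, which is the resource cost that superimposed pilots aim to remove.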

Figure 1: Pilot patterns from 5G NR, taking the number of layers $L=2$ as an example.

2.3 Pilot based on DL

DL-based pilot design shows significant improvements compared with the traditional solutions in 5G NR. For DL-based orthogonal pilot design, the trainable parameters $\Phi$ and $\Theta$ are equipped at the transmitter $g(\cdot;\Phi)$ and receiver $f(\cdot;\Theta)$, respectively, where $\Phi$ configures the pilot sequence [9] or pilot pattern [13], and $\Theta$ parameterizes the neural receiver. The parameters $\Phi$ and $\Theta$ are jointly trained in an end-to-end manner, i.e.,

$$\min_{\Phi,\Theta}\ \mathcal{L}_{\textrm{bce}}\big(\mathbf{B},f(h(g(\mathbf{B};\Phi));\Theta)\big) \tag{2}$$

where $\mathbf{B}\in\{0,1\}^{S\times T\times M}$ denotes the original encoded information bits, $g(\mathbf{B};\Phi)$ denotes the transmitted signal, $h(\cdot)$ denotes the channel, and $f(\mathbf{Y};\Theta)$ represents the recovered bits or the corresponding log-likelihood ratios (LLRs). $M$ is the number of bits per symbol according to the modulation order, $\mathbf{Y}=h(g(\mathbf{B};\Phi))\in\mathbb{C}^{S\times T\times N_{\textrm{r}}}$ is the received signal, and $\mathcal{L}_{\textrm{bce}}$ denotes the binary cross-entropy loss function. The orthogonal pilots in the above solutions still require cumbersome signaling for pattern switching and incur inevitable overhead for pilot allocation. As for the non-orthogonal DL-based solution [16], where pilot and data are superimposed and $\Phi$ configures the pilot power ratio, non-negligible challenges remain in multi-layer transmission. The problem is more difficult than non-orthogonal multiple access (NOMA) [19], which only introduces inter-user data interference: SIP suffers not only from inter-layer data interference but also from intra-layer pilot-data interference. Moreover, the complexity of the two-sided model and the generalization over MCSs and numbers of layers, mentioned in Section 1, also need to be addressed.
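For concreteness, the loss $\mathcal{L}_{\textrm{bce}}$ in (2) can be evaluated directly on receiver LLR outputs. The sketch below assumes the sign convention $\textrm{LLR}=\log P(b=1)-\log P(b=0)$, so that $P(b=1)$ is the sigmoid of the LLR; this convention and the function name are assumptions, since the paper does not state them. It uses the numerically stable softplus form via `np.logaddexp`.

```python
import numpy as np

def bce_from_llr(bits, llr):
    """Mean binary cross-entropy between target bits and receiver LLR outputs.

    Assumed sign convention (hypothetical): llr = log P(b=1) - log P(b=0), so
    P(b=1) = sigmoid(llr). Then -log P(b=1) = softplus(-llr) and
    -log P(b=0) = softplus(llr), evaluated stably with np.logaddexp.
    """
    bits = np.asarray(bits, dtype=float)
    llr = np.asarray(llr, dtype=float)
    loss = bits * np.logaddexp(0.0, -llr) + (1.0 - bits) * np.logaddexp(0.0, llr)
    return float(loss.mean())

rng = np.random.default_rng(0)
S, T, M = 12, 14, 4  # hypothetical grid; M = 4 bits per 16-QAM symbol
B = rng.integers(0, 2, size=(S, T, M))
confident = np.where(B == 1, 8.0, -8.0)  # LLRs that agree with the bits
print(bce_from_llr(B, confident), bce_from_llr(B, np.zeros_like(confident)))
```

Uninformative (all-zero) LLRs give a loss of $\log 2$ per bit, while confident correct LLRs drive it toward zero, which is the behavior end-to-end training in (2) exploits.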

3 Proposed Schemes

In this section, the motivation for the proposed schemes is first discussed. Then the proposed mechanisms at the transmitter and receiver are introduced. Finally, the overall framework is formulated by combining all proposed mechanisms.

3.1 Motivation

3.1.1 Challenge of SIP in multi-layer transmission

In single-layer SIP transmission [16], the neural receiver can handle the interference between non-orthogonal pilot and data well. However, as the number of transmission layers increases, new intra-layer and inter-layer interference arises that existing neural receivers find difficult to cope with. In more detail, accurate channel estimation is required to mitigate the inter-layer interference by exploiting the low correlation between the channels of different layers brought by precoding. Conversely, accurate channel estimation itself requires that the pilot suffer little intra-layer and inter-layer interference. Increasing the pilot power can reduce intra-layer interference, but at the cost of a lower equivalent SNR for the data. These coupled requirements create intractable contradictions and call for a novel framework design to resolve the intra-layer and inter-layer interference of SIP under multi-layer transmission.

3.1.2 Challenge of SIP in practical deployment

Considering NN models in practical wireless communication systems, it is essential to employ appropriate parameter training to adapt the model to different transmission conditions and to deploy it with low latency. However, to implement SIP, the trainable power ratio and NN model parameters reside at the transmitter and receiver, respectively. This two-sided model brings cumbersome inter-vendor collaboration, such as data collection, model training, updating and switching [17]. Moreover, designing an appropriate model structure for each system configuration raises a generalization problem. Specifically, distinct configurations of the number of layers and MCS lead to different input and output dimensions of the neural receiver and hence different model structures. Thus, a mixed dataset cannot simply be used for model training to achieve structure generalization. Consequently, it is imperative to devise an efficient one-sided neural receiver that addresses the model structure generalization challenge, rather than relying on extensive model life cycle management to ensure model deployment and application.

3.2 Pre-design at Transmitter

Before introducing the interference cancellation based neural receiver, the pre-design at the transmitter, including the intra-layer non-orthogonal F-SIP and the inter-layer orthogonal code-division pilot, is first proposed in this section to deal with the challenges of SIP in practical deployment and multi-layer transmission, respectively.

3.2.1 Intra-Layer Non-Orthogonal Fixed Superimposed Pilot

The intra-layer non-orthogonal F-SIP is first introduced in this subsection. Unlike the orthogonal pilot patterns in 5G NR, the pilot and data symbols in the proposed F-SIP are non-orthogonally superimposed in the power domain. Unlike the existing two-sided SIP model with trainable parameters at both the transmitter and receiver, a fixed power allocation ratio $0<\alpha<1$ is pre-set at the transmitter. The transmitted matrix $\mathbf{X}_{l}$ in (1) with superimposed symbols for the $l$th layer can then be written as

$$\mathbf{X}_{l}=\sqrt{1-\alpha}\,\mathbf{D}_{l}+\sqrt{\alpha}\,\mathbf{P}_{l} \tag{3}$$

where $\mathbf{D}_{l}\in\mathbb{Q}^{S\times T}$ and $\mathbf{P}_{l}\in\mathbb{C}^{S\times T}$ denote the data and pilot matrices for the $l$th layer, respectively, and $\mathbb{Q}$ denotes the constellation set according to the MCS configuration. All REs are thus assigned a unified and fixed power ratio $\alpha$, instead of a trainable parameter matrix $\Phi$ that adds model training complexity in a two-sided structure. This provides the premise for a model-management-friendly one-sided framework, and the time-frequency resource overhead of orthogonal pilots is completely eliminated.
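The superposition in (3) is simple enough to sketch in a few lines. The example below assumes QPSK data, a random QPSK pilot (it ignores the CDM structure introduced in the next subsection), and per-layer data and pilot powers of $1/L$ each, so that the unit-energy constraint of Section 2.1 holds on average. Grid size, $\alpha$, and function names are hypothetical.

```python
import numpy as np

def qpsk(num_symbols, power, rng):
    """Random QPSK symbols scaled to the given average power."""
    bits = rng.integers(0, 2, size=(num_symbols, 2))
    symbols = ((1 - 2 * bits[:, 0]) + 1j * (1 - 2 * bits[:, 1])) / np.sqrt(2)
    return np.sqrt(power) * symbols

def fsip_superimpose(D, P, alpha):
    """Eq. (3): X_l = sqrt(1 - alpha) * D_l + sqrt(alpha) * P_l, one fixed alpha for all REs."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return np.sqrt(1.0 - alpha) * D + np.sqrt(alpha) * P

rng = np.random.default_rng(0)
S, T, L, alpha = 48, 14, 2, 0.2  # hypothetical grid size and power ratio
# Assumption: per-layer data and pilot each have average power 1/L, so that
# E{sum_l |x_{l,s,t}|^2} = 1 as required in Section 2.1.
D = qpsk(L * S * T, 1.0 / L, rng).reshape(L, S, T)
P = qpsk(L * S * T, 1.0 / L, rng).reshape(L, S, T)
X = fsip_superimpose(D, P, alpha)

total_power = np.mean(np.sum(np.abs(X) ** 2, axis=0))  # close to 1 on average
data_snr_loss_db = 10 * np.log10(1.0 - alpha)            # data keeps a (1 - alpha) power share
print(f"{total_power:.3f}, {data_snr_loss_db:.2f} dB")
```

Because data and pilot are independent and zero-mean, the cross term averages out and the total power stays near one; the price is that the data alone sees its equivalent SNR scaled by $1-\alpha$, about 0.97 dB for $\alpha=0.2$.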

Figure 2: Illustration of the proposed F-SIP, taking the number of layers $L=2$ as an example for simplicity. Note that the same principle can be extended to the case of $L>2$.

3.2.2 Inter-Layer Orthogonal Code-Division Pilot

In this subsection, the F-SIP is further extended to multi-layer transmission, wherein the inter-layer pilot interference introduced by F-SIP should be eliminated. An intuitive way to deal with inter-layer interference is to use orthogonal pilots between different layers, such as FDM, time-division multiplexing (TDM) and CDM. However, since TDM and FDM allocate the pilots of different layers on orthogonal time and frequency REs, respectively, these two candidates are not suitable for F-SIP transmission, where pilots and data are superimposed on all REs, because the resulting channel estimation based on interpolation in the time or frequency domain may cause severe performance loss, especially in high-speed or strongly frequency-selective scenarios. Therefore, CDM is selected in this paper, since it achieves inter-layer pilot orthogonality while ensuring that all REs carry pilots.

Specifically, the CDM constraint for F-SIP based multi-layer transmission can be expressed as

$$\langle\mathbf{P}_{l},\mathbf{P}_{k}\rangle_{\rm F}=\operatorname{tr}\big(\mathbf{P}_{l}^{\rm H}\mathbf{P}_{k}\big)=0,\quad l\neq k \tag{4}$$

where $1\leq l\leq L$ and $1\leq k\leq L$ are the layer indices and $\langle\cdot,\cdot\rangle_{\rm F}$ denotes the Frobenius inner product.

To satisfy (4), all $S\times T$ REs in one layer are divided into $G=S\times T/L$ CDM groups, as shown in Fig. 2. The pilots of different layers in the same group are distinguished by the proposed discrete Fourier transform orthogonal mask code (DFT-OMC), i.e., the pilot sequence is generated by

$$\mathbf{p}_{l,g}=\hat{p}_{g}\mathbf{c}_{l} \tag{5}$$

where $\mathbf{p}_{l,g}\in\mathbb{C}^{L\times 1}$ is the vector of pilot symbols of layer $1\leq l\leq L$ and group $1\leq g\leq G$, $\hat{p}_{g}\in\mathbb{P}$ is the pilot seed for group $g$, and $\mathbb{P}$ denotes the set of seed constellation symbols with zero mean and average power $1/L$, e.g., binary phase shift keying (BPSK) and quadrature phase shift keying (QPSK). $\mathbf{c}_{l}\in\mathbb{C}^{L\times 1}$ is the DFT-OMC of layer $l$, which is generated from DFT vectors, i.e.,

$$\mathbf{c}_{l}=\left[1,\ldots,e^{\frac{-j2\pi n(l-1)}{L}},\ldots,e^{\frac{-j2\pi(L-1)(l-1)}{L}}\right]^{\rm T} \tag{6}$$

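where $0\leq n\leq L-1$ indexes the entries of $\mathbf{c}_{l}$. Using the proposed DFT-OMC, the pilot orthogonality between different layers is guaranteed, since $\mathbf{c}_{l}^{\rm H}\mathbf{c}_{k}=0$ for $l\neq k$, and the required power normalization constraint is satisfied, since every entry of $\mathbf{c}_{l}$ has unit modulus and each pilot RE therefore carries power $1/L$.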
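A small sketch of (5) and (6) makes the orthogonality argument checkable. It builds the DFT-OMC, generates QPSK seeds with average power $1/L$, and checks that the pilot matrices of different layers are orthogonal, i.e., that their Frobenius inner product vanishes. The RE-to-group mapping (each group is $L$ adjacent subcarriers within one OFDM symbol) is a hypothetical choice of this sketch; the actual mapping in Fig. 2 may differ, but any mapping that places the $L$ code entries on the $L$ REs of a group gives the same result.

```python
import numpy as np

def dft_omc(L):
    """Eq. (6): column (l-1) holds c_l, with entries exp(-j*2*pi*n*(l-1)/L), n = 0..L-1."""
    n = np.arange(L)[:, None]
    l_minus_1 = np.arange(L)[None, :]
    return np.exp(-1j * 2 * np.pi * n * l_minus_1 / L)  # shape (L, L)

def cdm_pilots(S, T, L, rng):
    """Build P_l for all layers via Eq. (5), p_{l,g} = seed_g * c_l.

    Hypothetical group mapping (an assumption, not taken from Fig. 2): each CDM
    group is L adjacent subcarriers in one OFDM symbol, so S must be a multiple of L.
    Seeds are QPSK symbols with zero mean and average power 1/L.
    """
    if S % L:
        raise ValueError("S must be a multiple of L for this mapping")
    G = S * T // L
    bits = rng.integers(0, 2, size=(G, 2))
    seeds = ((1 - 2 * bits[:, 0]) + 1j * (1 - 2 * bits[:, 1])) / np.sqrt(2 * L)
    C = dft_omc(L)
    p = seeds[None, :, None] * C.T[:, None, :]  # (layer, group, RE in group)
    # Group g = t * (S // L) + q covers subcarriers q*L .. q*L + L - 1 of symbol t.
    return p.reshape(L, T, S).transpose(0, 2, 1)  # (layer, S, T)

rng = np.random.default_rng(1)
S, T, L = 24, 14, 4
P = cdm_pilots(S, T, L, rng)
gram = np.array([[np.vdot(P[a], P[b]) for b in range(L)] for a in range(L)])
print(np.round(np.abs(gram), 6))  # diagonal S*T/L, off-diagonal 0
```

Every pilot RE has power exactly $1/L$ here, so all REs carry a pilot while the layers remain separable by correlating over each group.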

3.3 Interference Cancellation based Neural Receiver

In this section, a novel neural receiver for F-SIP with multiple enhancing mechanisms is proposed, where the interference cancellation and superimposed symbol aided channel estimation are introduced to cope with the challenge of SIP in multi-layer transmission. The layer and MCS scalable mechanisms are also provided to solve the challenge of SIP in practical deployment.

Figure 3: Illustration of the proposed neural receiver, where the iteration index $i$ is omitted for simplicity. Note that the principles of the proposed mechanisms are insensitive to the structure of the feature extraction models for channel estimation and data detection, so their backbones can be implemented flexibly.

3.3.1 Interference Cancellation with Superimposed Symbol Aided Channel Estimation

Next, to further handle the inter-layer and intra-layer interference when receiving F-SIP, a receiver with interference cancellation and superimposed symbol aided channel estimation is proposed, in which the algorithm performs $V$ outer iterations to realize interference cancellation. Note that, for ease of explanation, the reception of the $L$ layers is formulated as $L$ inner iterations in this paper; these can be parallelized and accelerated on a graphics processing unit (GPU) according to the layer-scalable mechanism introduced later.

Fig. 3 shows the signal processing flow of the proposed receiver, where the IC, DD, CE and Rec modules denote the interference cancellation, data detection, channel estimation and signal reconstruction procedures, and the Enc, Dec and Mod modules denote channel encoding, channel decoding and modulation, respectively. As shown in Fig. 3, for the $i$th outer iteration and $l$th inner iteration, the inputs of the channel estimation model include: the received signal $\mathbf{Y}^{\textrm{x}}_{l,i-1}\in\mathbb{C}^{S\times T\times N_{\textrm{r}}}$, from which the reconstructed interference of the other $L-1$ layers, $\widehat{\mathbf{Y}}^{\textrm{x}}_{l',i-1}\in\mathbb{C}^{S\times T\times N_{\textrm{r}}}$ with $l'\in\{u\,|\,1\leq u\leq L,\ u\neq l\}$, has been canceled; the reconstructed data of the $l$th layer $\widehat{\mathbf{D}}_{l,i-1}\in\mathbb{Q}^{S\times T}$; the reconstructed superimposed symbol of the $l$th layer $\widehat{\mathbf{X}}_{l,i-1}\in\mathbb{C}^{S\times T}$; and the pilot of the $l$th layer $\mathbf{P}_{l}\in\mathbb{C}^{S\times T}$. The reconstructed tensors are obtained in the $(i-1)$th iteration, and the interference cancellation can be formulated as

$$\mathbf{Y}^{\textrm{x}}_{l,i-1}=\mathbf{Y}-\sum_{l'\neq l}\widehat{\mathbf{Y}}^{\textrm{x}}_{l',i-1} \tag{7}$$

where $\mathbf{Y}\in\mathbb{C}^{S\times T\times N_{\textrm{r}}}$ denotes the raw received signal obtained by concatenating the $N_{\textrm{r}}$ received signals $\mathbf{Y}_{r}$ in (1). Beyond the basic information of the pilot components in the received signal $\mathbf{Y}^{\textrm{x}}_{l,i-1}$ and the pilot $\mathbf{P}_{l}$ for channel estimation, the reconstructed data $\widehat{\mathbf{D}}_{l,i-1}$ and superimposed symbol $\widehat{\mathbf{X}}_{l,i-1}$ are regarded as aided information acting as 'alternative pilots', so that the data components in $\mathbf{Y}^{\textrm{x}}_{l,i-1}$ no longer interfere with channel estimation but can instead be exploited to enhance it. In more detail, with this aided information, the channel can be estimated according to three pairs of variables as follows.

  • Pilot components in the received signal $\mathbf{Y}^{\textrm{x}}_{l,i-1}$ and the pilot $\mathbf{P}_{l}$, which constitute the basic information.

  • Data components in the received signal $\mathbf{Y}^{\textrm{x}}_{l,i-1}$ and the reconstructed data $\widehat{\mathbf{D}}_{l,i-1}$, which constitute aided information brought by the proposed method.

  • The received signal $\mathbf{Y}^{\textrm{x}}_{l,i-1}$ and the superimposed symbol $\widehat{\mathbf{X}}_{l,i-1}$, which also constitute aided information brought by the proposed method.
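Why the second and third pairs help can be illustrated with a classical, non-neural stand-in for the channel estimation model. The sketch below uses a least-squares correlator on a single layer and antenna, assumes the channel is constant over a block of $K$ REs (block fading), and assumes a correct reconstruction $\widehat{\mathbf{X}}=\mathbf{X}$. These assumptions and all parameter values are hypothetical and serve only to show the effect; they are not the paper's neural receiver.

```python
import numpy as np

def correlator_estimate(y, ref):
    """LS estimate of a scalar channel h assumed constant over the block:
    h_hat = sum(y * conj(ref)) / sum(|ref|^2)."""
    return np.vdot(ref, y) / np.vdot(ref, ref).real

def unit_qpsk(size, rng):
    """Unit-power QPSK symbols exp(j*(pi/4 + k*pi/2)), k in {0, 1, 2, 3}."""
    return np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, size)))

rng = np.random.default_rng(2)
K, alpha, sigma2, trials = 64, 0.1, 0.1, 2000  # hypothetical settings
err_pilot, err_aided = [], []
for _ in range(trials):
    h = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
    d, p = unit_qpsk(K, rng), unit_qpsk(K, rng)
    x = np.sqrt(1 - alpha) * d + np.sqrt(alpha) * p  # Eq. (3) for one layer
    n = np.sqrt(sigma2 / 2) * (rng.standard_normal(K) + 1j * rng.standard_normal(K))
    y = h * x + n
    # First pair only: the pilot is the reference and the data acts as interference.
    err_pilot.append(abs(correlator_estimate(y, np.sqrt(alpha) * p) - h) ** 2)
    # Third pair with a correct reconstruction x_hat = x: the data now helps.
    err_aided.append(abs(correlator_estimate(y, x) - h) ** 2)
print(f"pilot-only MSE {np.mean(err_pilot):.2e}, aided MSE {np.mean(err_aided):.2e}")
```

With a small $\alpha$, the pilot-only error is dominated by the data term and is roughly $((1-\alpha)+\sigma^{2})/(\alpha K)$, whereas the aided error is close to $\sigma^{2}/K$, which is the gap the iterative receiver tries to close as its reconstructions improve.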

Introducing the aided information allows a lower pilot power ratio $\alpha$, as long as channel estimation using only the pilot $\mathbf{P}_{l}$ in the first iteration, where the reconstructed tensors are initialized to all zeros, is accurate enough to bootstrap the subsequent iterations. Hence the equivalent SNR of the data remains virtually unaffected. Furthermore, although using either $\widehat{\mathbf{X}}_{l,i-1}$ or $\widehat{\mathbf{D}}_{l,i-1}$ together with $\mathbf{P}_{l}$ as input provides the same amount of information (given $\mathbf{P}_{l}$, either one can be derived from the other through (3)), feeding all three inputs can facilitate model learning during the training phase.
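The cancellation step (7), together with the all-zero initialization mentioned above, can be sketched as a single vectorized operation over all layers. Subtracting the total reconstruction and adding back the layer's own term is one way the $L$ inner iterations could be batched; this batching trick and the array layout $(L, S, T, N_{\textrm{r}})$ are assumptions of the sketch, not details given in the paper.

```python
import numpy as np

def cancel_other_layers(Y, Y_hat_x):
    """Eq. (7) for all layers at once: Y^x_l = Y - sum_{l' != l} Y_hat^x_{l'}.

    Y:       (S, T, Nr) raw received signal.
    Y_hat_x: (L, S, T, Nr) reconstructed per-layer contributions from iteration i-1
             (all zeros before the first iteration).
    Returns an (L, S, T, Nr) array whose l-th slice is the CE input Y^x_{l,i-1}.
    """
    total = Y_hat_x.sum(axis=0, keepdims=True)
    # Y - (total - Y_hat_l) removes every layer except l itself.
    return Y[None] - total + Y_hat_x

rng = np.random.default_rng(3)
L, S, T, Nr = 3, 8, 14, 2
contrib = rng.standard_normal((L, S, T, Nr)) + 1j * rng.standard_normal((L, S, T, Nr))
noise = 0.1 * (rng.standard_normal((S, T, Nr)) + 1j * rng.standard_normal((S, T, Nr)))
Y = contrib.sum(axis=0) + noise

first = cancel_other_layers(Y, np.zeros_like(contrib))  # first iteration: nothing to cancel
ideal = cancel_other_layers(Y, contrib)                  # perfect reconstruction
print(np.allclose(first[1], Y), np.allclose(ideal[1], contrib[1] + noise))
```

In the first iteration every layer simply sees the raw signal, while with perfect reconstructions each slice keeps only its own contribution plus noise; real reconstructions fall between these two cases.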

Moreover, the estimated channel of the $l$th layer $\widehat{\mathbf{H}}_{l,i}\in\mathbb{C}^{S\times T\times N_{\textrm{r}}}$ and the received signal $\mathbf{Y}^{\textrm{d}}_{l,i-1}\in\mathbb{C}^{S\times T\times N_{\textrm{r}}}$ are fed into the data detection NN model, where $\mathbf{Y}^{\textrm{d}}_{l,i-1}$ is obtained by canceling the reconstructed interference of the other $L-1$ layers, $\widehat{\mathbf{Y}}^{\textrm{x}}_{l',i-1}\in\mathbb{C}^{S\times T\times N_{\textrm{r}}}$ with $l'\in\{u\,|\,1\leq u\leq L,u\neq l\}$, and the reconstructed pilot interference of the $l$th layer, $\widehat{\mathbf{Y}}^{\textrm{p}}_{l,i-1}\in\mathbb{C}^{S\times T\times N_{\textrm{r}}}$, by using

$$\mathbf{Y}^{\textrm{d}}_{l,i-1}=\mathbf{Y}-\sum_{l'}\widehat{\mathbf{Y}}^{\textrm{x}}_{l',i-1}-\widehat{\mathbf{Y}}^{\textrm{p}}_{l,i-1}\tag{8}$$

where $\widehat{\mathbf{Y}}^{\textrm{p}}_{l,i-1}$ is constructed from

$$\widehat{\mathbf{Y}}^{\textrm{p}}_{l,i-1}=\widehat{\mathbf{H}}_{l,i-1}\circ\mathbf{P}^{\prime}_{l}\tag{9}$$

and $\mathbf{P}^{\prime}_{l}\in\mathbb{C}^{S\times T\times N_{\textrm{r}}}$ duplicates the pilot tensor $\mathbf{P}_{l}$ along the receive-antenna dimension $N_{\textrm{r}}$ times. Note that the MCS information $m$ is also an input of the data detection model and is used to achieve MCS generalization, as explained in Section 3.3.3. The data detection model outputs the LLR tensor $\widehat{\mathbf{V}}_{l,i}\in\mathbb{R}^{S\times T\times M}$, where $M$ is the number of bits per symbol determined by the modulation order of the configured MCS index $m$. In addition to being supervised during model training, $\widehat{\mathbf{V}}_{l,i}$ is also exploited to reconstruct the data and superimposed symbol tensors by using

$$\widehat{\mathbf{D}}_{l,i}=\textrm{Mod}(\textrm{Enc}(\textrm{Dec}(\widehat{\mathbf{V}}_{l,i})))\tag{10}$$

and

$$\widehat{\mathbf{X}}_{l,i}=\sqrt{1-\alpha}\,\widehat{\mathbf{D}}_{l,i}+\sqrt{\alpha}\,\mathbf{P}_{l}\tag{11}$$

where $\textrm{Dec}(\cdot)$, $\textrm{Enc}(\cdot)$ and $\textrm{Mod}(\cdot)$ represent the channel decoding, channel encoding and modulation procedures implemented according to the MCS configuration, respectively. $\widehat{\mathbf{B}}^{\prime}_{l,i}=\textrm{Dec}(\widehat{\mathbf{V}}_{l,i})\in\mathbb{R}^{S\times T\times M}$ denotes the received information bits of the $l$th layer in the $i$th iteration. Finally, the interference of the $l$th layer used for cancellation in the $(i+1)$th iteration is calculated by using

$$\widehat{\mathbf{Y}}^{\textrm{x}}_{l,i}=\widehat{\mathbf{H}}_{l,i-1}\circ\widehat{\mathbf{X}}^{\prime}_{l,i}\tag{12}$$

and $\widehat{\mathbf{X}}^{\prime}_{l,i}\in\mathbb{C}^{S\times T\times N_{\textrm{r}}}$ duplicates the reconstructed superimposed symbol tensor $\widehat{\mathbf{X}}_{l,i}$ along the receive-antenna dimension $N_{\textrm{r}}$ times.
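The cancellation and reconstruction steps in Eqs. (8), (9), (11) and (12) are element-wise operations over the $S\times T\times N_{\textrm{r}}$ grid. The pure-Python sketch below applies them to a toy two-layer example with $S=1$, $T=2$ and $N_{\textrm{r}}=2$; all numbers are made up. Two assumptions are ours: the pilot contribution is scaled by $\sqrt{\alpha}$ to match Eq. (11) (Eq. (9) as printed leaves this scaling implicit, e.g., absorbed into $\mathbf{P}^{\prime}_{l}$), and the reconstructed data and channels are taken as perfect so the result can be checked exactly.

```python
import math
from typing import List

Grid2 = List[List[complex]]        # S x T symbols of one layer
Grid3 = List[List[List[complex]]]  # S x T x Nr received-domain tensor


def duplicate(x: Grid2, n_r: int) -> Grid3:
    """Copy an S x T grid n_r times along the receive-antenna axis (P'_l, X'_{l,i})."""
    return [[[v] * n_r for v in row] for row in x]


def elementwise(op, a: Grid3, b: Grid3) -> Grid3:
    """Apply a binary op RE-by-RE and antenna-by-antenna."""
    return [[[op(u, v) for u, v in zip(ca, cb)] for ca, cb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def superimpose(d: Grid2, p: Grid2, alpha: float) -> Grid2:
    """Eq. (11): X = sqrt(1 - alpha) D + sqrt(alpha) P."""
    return [[math.sqrt(1 - alpha) * dv + math.sqrt(alpha) * pv for dv, pv in zip(dr, pr)]
            for dr, pr in zip(d, p)]


def reconstruct_layer(h: Grid3, d: Grid2, p: Grid2, alpha: float) -> Grid3:
    """Eq. (12): channel times the duplicated superimposed symbol."""
    n_r = len(h[0][0])
    return elementwise(lambda u, v: u * v, h, duplicate(superimpose(d, p, alpha), n_r))


def cancel_for_detection(y: Grid3, others: List[Grid3], h: Grid3,
                         p: Grid2, alpha: float) -> Grid3:
    """Eqs. (8)-(9): remove other layers' reconstructions and this layer's own pilot."""
    out = y
    for y_x in others:  # sum over l' != l
        out = elementwise(lambda u, v: u - v, out, y_x)
    scaled_p = [[math.sqrt(alpha) * v for v in row] for row in p]  # assumed sqrt(alpha) scaling
    y_p = elementwise(lambda u, v: u * v, h, duplicate(scaled_p, len(h[0][0])))  # Eq. (9)
    return elementwise(lambda u, v: u - v, out, y_p)


# Toy two-layer example (S=1, T=2, Nr=2); all numbers are illustrative.
alpha = 0.05
h1 = [[[1 + 0j, 0.5j], [0.8 + 0j, -0.3 + 0.2j]]]
h2 = [[[0.4 - 0.2j, 1 + 0j], [0.1j, 0.7 + 0j]]]
d1, p1 = [[1 + 1j, -1 + 1j]], [[1 + 0j, 1j]]
d2, p2 = [[1 - 1j, -1 - 1j]], [[1 + 0j, -1j]]

r2 = reconstruct_layer(h2, d2, p2, alpha)
y = elementwise(lambda u, v: u + v, reconstruct_layer(h1, d1, p1, alpha), r2)  # noiseless Y

y_d1 = cancel_for_detection(y, [r2], h1, p1, alpha)
expected = elementwise(lambda u, v: u * v, h1,
                       duplicate([[math.sqrt(1 - alpha) * v for v in row] for row in d1], 2))
err = max(abs(u - v) for a, b in zip(y_d1, expected)
          for ca, cb in zip(a, b) for u, v in zip(ca, cb))
print(err < 1e-12)  # True: only layer 1's data term survives
```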

Algorithm 1 Interference Cancellation based Neural Receiver
   Initialization: $\widehat{\mathbf{Y}}^{\textrm{x}}_{l',0}=\widehat{\mathbf{Y}}^{\textrm{p}}_{l',0}=\mathbf{0}^{S\times T\times N_{\textrm{r}}}$, $\widehat{\mathbf{D}}_{l,0}=\widehat{\mathbf{X}}_{l,0}=\mathbf{0}^{S\times T}$, $1\leq l'\leq L$ and $1\leq l\leq L$;
   Input: $\mathbf{Y}$, $\mathbf{P}_{l}$, $m$;
   Output: $\widehat{\mathbf{V}}_{l,V}$;
   for $i=1,\dots,V$ do
      for $l=1,\dots,L$ do
         Cancel the data and pilot interference of the other $L-1$ layers by using (7);
         Estimate the channel of the $l$th layer by using the model in Section 3.3.4;
         Cancel the data and pilot interference of the other $L-1$ layers and the pilot interference of the $l$th layer by using (8);
         Detect the data of the $l$th layer by using the model in Section 3.3.4;
         Reconstruct the pilot interference of the $l$th layer by using (9);
         Reconstruct the data symbol of the $l$th layer by using (10);
         Reconstruct the superimposed symbol of the $l$th layer by using (11);
         Reconstruct the data and pilot interference of the $l$th layer by using (12);
      end
   end
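The loop order of Algorithm 1 can be sketched as follows. The two NN models and the Dec/Enc/Mod chain are passed in as hypothetical callables, and tensors can be any objects supporting `+` and `-`; the toy run at the end uses plain floats and trivial stand-ins purely to exercise the control flow. Following the $i-1$ index in Eq. (8), the sketch cancels the other layers using their reconstructions from the previous iteration, which also fits the batched, layer-common processing of Section 3.3.2. We assume Eq. (7) cancels only the other layers' reconstructed signals.

```python
import math
from typing import Callable, List, Sequence


def ic_neural_receiver(y, pilots: Sequence, m: int, num_iters: int,
                       estimate_channel: Callable, detect: Callable,
                       rebuild_data: Callable, superimpose: Callable,
                       apply_channel: Callable, zero) -> List:
    """Control flow of Algorithm 1; apply_channel is assumed to handle the Nr duplication."""
    num_layers = len(pilots)
    y_x_hat = [zero] * num_layers   # reconstructed data+pilot signal per layer, Eq. (12)
    y_p_hat = [zero] * num_layers   # reconstructed pilot signal per layer, Eq. (9)
    d_hat = [zero] * num_layers     # reconstructed data symbols, Eq. (10)
    x_hat = [zero] * num_layers     # reconstructed superimposed symbols, Eq. (11)
    llr = [None] * num_layers
    for _ in range(num_iters):
        prev_y_x = list(y_x_hat)    # iteration i uses the reconstructions of iteration i-1
        for l in range(num_layers):
            others = zero
            for k in range(num_layers):
                if k != l:
                    others = others + prev_y_x[k]
            y_x = y - others                                         # Eq. (7)
            h_hat = estimate_channel(y_x, pilots[l], d_hat[l], x_hat[l])
            y_d = y_x - y_p_hat[l]                                   # Eq. (8)
            llr[l] = detect(y_d, h_hat, m)
            y_p_hat[l] = apply_channel(h_hat, pilots[l])             # Eq. (9)
            d_hat[l] = rebuild_data(llr[l], m)                       # Eq. (10): Mod(Enc(Dec(.)))
            x_hat[l] = superimpose(d_hat[l], pilots[l])              # Eq. (11)
            y_x_hat[l] = apply_channel(h_hat, x_hat[l])              # Eq. (12)
    return llr                                                       # V_hat_{l,V} for every l


# Toy run with scalar "tensors" and hypothetical stand-ins for the models.
trace = []


def toy_estimate(y_x, p, d, x):
    trace.append(("ce", d, x))
    return 1.0                      # pretend the channel is known to be 1


def toy_detect(y_d, h, m):
    trace.append(("dd", m))
    return y_d / h                  # a soft value, not a real LLR


alpha = 0.05
out = ic_neural_receiver(
    y=2.5, pilots=[1.0, -1.0], m=7, num_iters=3,
    estimate_channel=toy_estimate, detect=toy_detect,
    rebuild_data=lambda v, m: 1.0 if v > 0 else -1.0,
    superimpose=lambda d, p: math.sqrt(1 - alpha) * d + math.sqrt(alpha) * p,
    apply_channel=lambda h, x: h * x, zero=0.0)
print(len(out), len(trace))  # 2 12
```

The snapshot `prev_y_x` is what lets all layers of one iteration be processed as a single batch.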

3.3.2 Layer-Scalable Mechanism

Under multi-layer transmission, the signals from the other layers can be regarded as interference when receiving each target layer, so the problems to be solved for each layer are similar. Therefore, a layer-scalable mechanism is implemented in the proposed receiver, where the $L$ layers share the same channel estimation and data detection NN models as well as the same signal processing flow. Layer scalability follows from this layer-common structure, since the number of layers only affects the batch size of model inference rather than the size of the inner NN, so inference can be conveniently parallelized and accelerated on graphics processing units (GPUs). Meanwhile, since all layers share the NN structure and parameters, the resulting lightweight model is more friendly to terminal deployment than layer-specific models, whose complexity grows with the number of layers.
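The layer-common idea can be illustrated with a toy stand-in: one set of shared weights is applied to every layer's input, so $L$ only changes how many inputs are processed (the batch size), never the parameter count. The "model" below is a hypothetical dot product, not the paper's NN; the parameter figure reuses the $2.9777\times 10^{6}$ channel-estimation parameters reported in Section 4.3 to contrast shared and layer-specific storage.

```python
from typing import List, Sequence

CE_PARAMS = 2_977_700  # channel-estimation model size reported in Section 4.3


def shared_model(weights: Sequence[float], x: Sequence[float]) -> float:
    """Hypothetical stand-in for the shared NN: a dot product with fixed weights."""
    return sum(w * v for w, v in zip(weights, x))


def layer_common_inference(weights: Sequence[float],
                           layer_inputs: List[Sequence[float]]) -> List[float]:
    """All L layers form one batch and reuse the same weights."""
    return [shared_model(weights, x) for x in layer_inputs]


def stored_params(num_layers: int, layer_specific: bool) -> int:
    """Storage of a layer-common model vs. one model per layer."""
    return CE_PARAMS * num_layers if layer_specific else CE_PARAMS


weights = [0.5, -1.0, 2.0]
for num_layers in (2, 4):
    batch = [[float(l), 1.0, 0.5] for l in range(num_layers)]  # one toy input per layer
    outputs = layer_common_inference(weights, batch)
    print(num_layers, len(outputs),
          stored_params(num_layers, False), stored_params(num_layers, True))
```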

3.3.3 MCS-Scalable Mechanism

The MCS generalization is further addressed in this subsection. The inner NN of the proposed data detection model is designed according to the maximum number of bits per symbol $M_{\textrm{max}}$ supported by the system, supplemented by the configured MCS index $m$ as auxiliary knowledge, resulting in a model structure compatible with multiple MCSs. Note that the MCS index $m$ is tiled to a tensor $\mathbf{M}\in\{m\}^{S\times T\times 1}$ as an NN input, which facilitates concatenation with the other inputs. After feature extraction, the model produces an intermediate redundant feature map $\mathbf{V}_{l,i}\in\mathbb{R}^{S\times T\times M_{\textrm{max}}}$. By cropping $\mathbf{V}_{l,i}$ in the third dimension according to $M$, the final LLR tensor $\widehat{\mathbf{V}}_{l,i}\in\mathbb{R}^{S\times T\times M}$ is obtained, where $M$ is the number of bits per symbol determined by the modulation order of the configured MCS index $m$. The LLR tensors of all $L$ layers are then collected and fed to the subsequent channel decoder.
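A minimal sketch of the MCS-scalable input and output handling follows, with nested lists standing in for tensors. The MCS-to-$M$ mapping covers only the three MCSs evaluated in Section 4.2.1 and is a hypothetical lookup rather than the full MCS table. Keeping the *first* $M$ of the $M_{\textrm{max}}$ output channels is also our assumption, since the text only states that $\mathbf{V}_{l,i}$ is cropped in the third dimension according to $M$.

```python
from typing import Dict, List

Tensor3 = List[List[List[float]]]  # S x T x C

M_MAX = 6
BITS_PER_SYMBOL: Dict[int, int] = {3: 2, 7: 4, 14: 6}  # QPSK, 16QAM, 64QAM (Section 4.2.1)


def tile_mcs(m: int, s: int, t: int) -> Tensor3:
    """Tile the scalar MCS index m into the S x T x 1 tensor M."""
    return [[[float(m)] for _ in range(t)] for _ in range(s)]


def concat_channels(a: Tensor3, b: Tensor3) -> Tensor3:
    """Concatenate two S x T x C tensors along the last (channel) axis."""
    return [[ca + cb for ca, cb in zip(ra, rb)] for ra, rb in zip(a, b)]


def crop_llr(v_full: Tensor3, m: int) -> Tensor3:
    """Crop the S x T x M_max feature map to S x T x M for the configured MCS."""
    k = BITS_PER_SYMBOL[m]
    assert k <= M_MAX
    return [[cell[:k] for cell in row] for row in v_full]


S, T = 2, 3
features = [[[0.0] * 4 for _ in range(T)] for _ in range(S)]  # toy 4-channel input features
nn_input = concat_channels(features, tile_mcs(7, S, T))         # now S x T x 5
v_full = [[[float(c) for c in range(M_MAX)] for _ in range(T)] for _ in range(S)]  # pretend NN output
llr = crop_llr(v_full, 7)                                       # S x T x 4 for 16QAM
print(len(nn_input[0][0]), len(llr[0][0]))  # 5 4
```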

3.3.4 Model Implementation

Fig. 3 shows the NN structure implementing the channel estimation and data detection models. The well-known ResNet [20] block is utilized, in which two sequential stages of batch normalization, rectified linear unit (ReLU) activation and two-dimensional convolution (Conv2D) are combined with a residual connection in each block. Since the principle of the proposed scalable mechanisms is insensitive to the structure of the feature extraction model, other implementations such as the multi-layer perceptron mixer [21] and Transformer [22] can also be employed. The hyperparameter settings of the model are given in Table 1 in Section 4.
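To give a feel for the model size implied by Table 1, the sketch below counts the trainable parameters of the residual stack. It assumes each block contains two (BN, ReLU, Conv2D) stages with $D=128$ filters, $3\times 3$ kernels and biased convolutions, and that batch normalization contributes two trainable parameters (scale and shift) per channel; these layer details are our assumptions. With 10 blocks this gives about $2.96\times 10^{6}$ parameters, the same order as the $2.9777\times 10^{6}$ reported for the channel estimation model in Section 4.3; the input and output layers are not specified in the text and are left out.

```python
def conv2d_params(c_in: int, c_out: int, k: int = 3, bias: bool = True) -> int:
    """Weights of a k x k Conv2D layer plus an optional bias per output channel."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def batchnorm_params(c: int) -> int:
    """Trainable scale and shift per channel (running statistics are not trained)."""
    return 2 * c


def resnet_block_params(d: int, k: int = 3) -> int:
    """Two BN -> ReLU -> Conv2D stages; ReLU and the residual addition add no parameters."""
    return 2 * (batchnorm_params(d) + conv2d_params(d, d, k))


D, NUM_BLOCKS = 128, 10  # Table 1
stack = NUM_BLOCKS * resnet_block_params(D)
print(stack)  # 2956800
```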

3.4 Framework of Proposed Scheme

By combining the above mechanisms, the proposed receiver is summarized in Algorithm 1. For simplicity, Algorithm 1 presents only the main flow of the proposed receiver, to help readers understand its overall framework. Finally, the overall training problem can be formulated as

$$\begin{aligned}\min_{\hat{\Theta}_{\textrm{ce}},\hat{\Theta}_{\textrm{dd}}}\ &\frac{1}{V}\sum_{i=1}^{V}\left\{\tau\mathcal{L}_{\textrm{bce}}(\widetilde{\mathbf{B}},\widehat{\mathbf{V}}_{i})+(1-\tau)\mathcal{L}_{\textrm{mse}}(\mathbf{H},\widehat{\mathbf{H}}_{i})\right\}\\ \textrm{s.t.}\ &\widehat{\mathbf{V}}_{i},\widehat{\mathbf{H}}_{i}=\hat{f}(\mathbf{Y},\mathbf{P},L,m;\hat{\Theta}_{\textrm{ce}},\hat{\Theta}_{\textrm{dd}}),\quad 1\leq i\leq V\end{aligned}\tag{13}$$

where $\mathcal{L}_{\textrm{bce}}$ and $\mathcal{L}_{\textrm{mse}}$ denote the binary cross-entropy and mean squared error loss functions, respectively, and $\tau$ denotes the loss weight. $\widetilde{\mathbf{B}}\in\{0,1\}^{S\times T\times L\times M}$ and $\mathbf{H}\in\mathbb{C}^{S\times T\times L\times N_{\textrm{r}}}$ represent the original encoded bits and the ideal channel, respectively. $\widehat{\mathbf{V}}_{i}\in\mathbb{R}^{S\times T\times L\times M}$ and $\widehat{\mathbf{H}}_{i}\in\mathbb{C}^{S\times T\times L\times N_{\textrm{r}}}$ represent the LLRs and the estimated channels collecting $\widehat{\mathbf{V}}_{l,i}$ and $\widehat{\mathbf{H}}_{l,i}$ of the $L$ layers, respectively. $\mathbf{P}\in\mathbb{C}^{S\times T\times L}$ denotes the pilot tensor collecting $\mathbf{P}_{l}$ of the $L$ layers. $\hat{f}(\cdot)$, $\hat{\Theta}_{\textrm{ce}}$ and $\hat{\Theta}_{\textrm{dd}}$ denote the proposed receiver and the corresponding NN parameters of the channel estimation and data detection models, respectively.
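The objective in Eq. (13) can be evaluated on flattened toy tensors as sketched below. Two conventions are our assumptions: each LLR is treated as the logit of $P(\textrm{bit}=1)$ inside the binary cross-entropy, and the MSE is the mean squared magnitude of the complex channel error. With zero LLRs and a perfect channel estimate, every iteration contributes $\tau\ln 2$, so the average over the $V$ iterations is also $\tau\ln 2$.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bce_from_llr(bits: Sequence[int], llrs: Sequence[float]) -> float:
    """Mean BCE with each LLR used as the logit of P(bit = 1) (assumed convention)."""
    # -log(sigmoid(v)) = softplus(-v) and -log(1 - sigmoid(v)) = softplus(v)
    return sum(b * softplus(-v) + (1 - b) * softplus(v) for b, v in zip(bits, llrs)) / len(bits)


def mse_complex(h: Sequence[complex], h_hat: Sequence[complex]) -> float:
    """Mean squared magnitude of the channel estimation error."""
    return sum(abs(a - b) ** 2 for a, b in zip(h, h_hat)) / len(h)


def eq13_objective(bits, h, llrs_per_iter, h_hat_per_iter, tau: float = 0.5) -> float:
    """Average of tau * BCE + (1 - tau) * MSE over the V iterations."""
    terms = [tau * bce_from_llr(bits, v) + (1 - tau) * mse_complex(h, hh)
             for v, hh in zip(llrs_per_iter, h_hat_per_iter)]
    return sum(terms) / len(terms)


bits = [0, 1, 1, 0]
h = [1 + 0.5j, -0.3 + 0.2j]
V = 3
loss = eq13_objective(bits, h, [[0.0] * 4] * V, [h] * V, tau=0.5)  # tau from Table 1
print(abs(loss - 0.5 * math.log(2)) < 1e-12)  # True
```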

Compared with existing methods, the proposed framework supports multi-layer SIP transmission while remaining practical and scalable, thereby addressing the challenges identified in Section 3.1.

Table 1: Basic simulation parameters

| Parameter | Value |
|---|---|
| Carrier frequency | 4 GHz |
| Subcarrier spacing | 30 kHz |
| Number of PRBs | 8 |
| Number of subcarriers $S$ | 96 |
| Number of OFDM symbols $T$¹ | 12 |
| Tx antennas $N_{\textrm{t}}$ | 32, 4 |
| Rx antennas $N_{\textrm{r}}$ | 4 |
| Channel model | CDL |
| Channel coding scheme | LDPC |
| Delay spread $D_{\textrm{s}}$ | 100 ns, 300 ns |
| UE speed $C_{\textrm{ue}}$ | 3-900 km/h |
| Optimizer | Adam |
| Training steps | $2.5\times 10^{5}$ |
| Number of training samples | $4\times 10^{6}$ |
| Loss weight $\tau$ | 0.5 |
| Convolution filter number $D_{\textrm{ce}}$ and $D_{\textrm{dd}}$ | 128 |
| Convolution kernel size | 3×3 |
| ResNet block number $N_{\textrm{block,ce}}$ and $N_{\textrm{block,dd}}$ | 10 |

¹ Two control symbols among the 14 symbols in a slot are excluded.

4 Simulation Results

In this section, numerical results of the proposed F-SIP with the scalable neural receiver (Proposed) and two baselines are presented. Specifically, the standardized technology of the 5G NR system, i.e., the orthogonal pilots of 5G NR in Fig. 1 with LMMSE channel estimation and data detection, is used as the first baseline (Baseline I), where the covariance matrix for LMMSE channel estimation is calculated over $10^{5}$ channel samples. This comparison reflects the gain over the existing system design and can guide subsequent implementation and standardization work. In addition, the state-of-the-art academic method described by (2), with a two-sided model and trainable SIP [16], is compared as the second baseline (Baseline II). Since the proposed solution addresses several problems that Baseline II leaves unsolved, the comparison with this representative SIP solution illustrates the advantages of our design. The clustered delay line (CDL) channel model, which is widely used for link-level evaluation in 3GPP [23], is considered here. The basic simulation parameters are listed in Table 1. Unless otherwise stated, the power ratio of F-SIP is set to $\alpha=0.05$, the number of iterations to $V=3$, and the MCS index to $m=7$, which corresponds to $2^{M}=16$ quadrature amplitude modulation (QAM) with a target code rate of 490/1024. Note that the model hyperparameters in the simulation are chosen as a trade-off between performance and complexity, which is more in line with practical applications.
Singular value decomposition (SVD) precoding is used in all scenarios except the high-speed scenario, where open-loop precoder cycling with the Type I codebook [18] is adopted. During the training phase, each training sample is generated from a channel realization of the CDL model with a random SNR between $-20$ and $25$ dB, which allows a single receiver to generalize across SNRs. Therefore, there is no need to train a dedicated model for each SNR, which is convenient for actual deployment.
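A sketch of the training-time SNR randomization follows, assuming (our assumptions) that the SNR is drawn uniformly in dB over $[-20, 25]$ and that the received signal has unit average power per RE, so the noise variance is $10^{-\textrm{SNR}/10}$. The generator `random.Random` is seeded only to make the sketch reproducible.

```python
import random

SNR_RANGE_DB = (-20.0, 25.0)  # training SNR range from the text


def noise_variance(snr_db: float, signal_power: float = 1.0) -> float:
    """Noise variance giving the requested SNR for the given average signal power."""
    return signal_power / (10.0 ** (snr_db / 10.0))


def sample_training_snr(rng: random.Random) -> float:
    """Draw one per-sample SNR (uniform in dB, an assumed distribution)."""
    return rng.uniform(*SNR_RANGE_DB)


rng = random.Random(0)
snrs = [sample_training_snr(rng) for _ in range(5)]
for snr in snrs:
    print(f"SNR {snr:6.2f} dB -> noise variance {noise_variance(snr):.4g}")
```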

4.1 Effectiveness and Outperformance

4.1.1 Comparison in Low Speed Scenario

The link-level BLER performance comparison in the low-speed scenario of the CDL-C channel is presented in Fig. 4. It can be noticed that the proposed F-SIP with only $\alpha=0.05$ achieves effective BLER performance, indicating that the neural receiver processes the F-SIP well. Specifically, the proposed framework outperforms Baseline II with trainable SIP, which reveals the advantages of the proposed aided information and interference cancellation mechanisms under multi-layer transmission while greatly simplifying the system design. The performance with $V=3$ is better than that with $V=1$, indicating the improvement brought by the proposed interference cancellation and superimposed symbol aided channel estimation. Moreover, the F-SIP with $\alpha=0$ does not work well, indicating that a pilot is necessary for channel estimation in the neural receiver, even with a small power ratio such as $\alpha=0.05$. The proposed F-SIP with $\alpha=0.05$ is comparable with Baseline I, the traditional orthogonal pilot with $N_{\textrm{p}}=1$. This indicates that the non-orthogonal pilot brings almost no performance loss, while the proposed method can transmit more information bits at the same code rate, resulting in better throughput performance, as detailed below.

Figure 4: BLER performance comparison in low speed scenario ($C_{\textrm{ue}}=3$ km/h, $D_{\textrm{s}}=300$ ns, $L=4$ and $N_{\textrm{t}}=32$).
Figure 5: Throughput performance comparison in low speed scenario ($C_{\textrm{ue}}=3$ km/h, $D_{\textrm{s}}=300$ ns, $L=4$ and $N_{\textrm{t}}=32$).

The throughput comparison in the low-speed scenario of the CDL-C channel is provided in Fig. 5, where the throughput $R$ is defined as

$$R=N_{\textrm{slot}}N_{\textrm{RE}}\,\Omega\,\gamma M(1-\textrm{BLER})\tag{14}$$

wherein $N_{\textrm{RE}}=STL$ denotes the number of REs forming a slot over the $L$ layers, $N_{\textrm{slot}}$ denotes the number of slots per second, $\Omega$ denotes the ratio of REs carrying data symbols, and $\gamma$ and $M$ are the target code rate and the number of bits per symbol of the selected MCS, respectively. For the orthogonal pilot patterns in Baseline I, some REs are reserved for pilot transmission. Thus, $\Omega=11/12$ for Baseline I at 3 km/h, while $\Omega=1$ for the other methods. The proposed method with $\alpha=0.05$ and $V=3$ achieves higher throughput than Baseline I. Moreover, our proposed method with the preset $\alpha=0.05$ and $V=3$ achieves throughput comparable with Baseline II, which demonstrates that a one-sided model deployed only at the UE side can guarantee performance while allowing a more flexible model management procedure than its two-sided counterpart.
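Eq. (14) is easy to evaluate directly. The sketch below plugs in the Fig. 5 setting ($S=96$, $T=12$, $L=4$, MCS 7 with $M=4$ and $\gamma=490/1024$) and assumes $N_{\textrm{slot}}=2000$, i.e., 0.5 ms slots for the 30 kHz subcarrier spacing of Table 1; this slot rate is our assumption and is not stated in the text. The BLER values are placeholders, not results read from Fig. 5.

```python
from fractions import Fraction

N_SLOT = 2000                       # slots per second, assuming 0.5 ms slots at 30 kHz SCS
S, T, L = 96, 12, 4                 # Table 1 grid and the Fig. 5 layer count
GAMMA, M = Fraction(490, 1024), 4   # MCS 7: 16QAM with target code rate 490/1024


def throughput_bps(omega: Fraction, bler: float) -> float:
    """Eq. (14): R = N_slot * N_RE * Omega * gamma * M * (1 - BLER), with N_RE = S*T*L."""
    n_re = S * T * L
    return float(N_SLOT * n_re * omega * GAMMA * M) * (1.0 - bler)


for name, omega in (("Baseline I (N_p = 1)", Fraction(11, 12)), ("Proposed", Fraction(1))):
    for bler in (0.0, 0.1):         # placeholder BLER values
        print(f"{name:22s} BLER={bler:.1f}: {throughput_bps(omega, bler) / 1e6:.2f} Mbit/s")
```

Exact fractions keep the BLER-free part of Eq. (14) exact, so the $\Omega=11/12$ versus $\Omega=1$ difference is not blurred by rounding.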

Figure 6: BLER performance comparison in high speed scenario ($C_{\textrm{ue}}=300/900$ km/h, $D_{\textrm{s}}=100$ ns, $L=2$ and $N_{\textrm{t}}=4$).

4.1.2 Comparison in High Speed Scenario

The BLER and throughput performance comparisons in the high-speed scenario with the CDL-D extension channel [24] are depicted in Figs. 6 and 7, respectively, where Baseline I has to be configured with $N_{\textrm{p}}=4$ to estimate channels with strong time-varying characteristics. Thus, $\Omega=10/12$ for Baseline I, while $\Omega=1$ for the proposed method. In general, the proposed method achieves higher throughput than Baseline I in the high-speed scenario, since the pilot overhead is avoided. Taking SNR = 25 dB as an example, throughput gains of 19.98% and 24.45% are obtained at 300 km/h and 900 km/h, respectively. Moreover, at the extremely high speed of 900 km/h, the proposed method shows a significant BLER advantage, since the pilot is distributed over all available time- and frequency-domain resources. LMMSE in Baseline I achieves accurate channel estimation at 300 km/h based on the covariance matrix computed from $10^{5}$ channel samples, whereas the channel estimation error caused by time-domain interpolation grows dramatically at 900 km/h. In contrast, our proposed method performs accurate joint channel estimation and data detection by exploiting the statistical relationship between pilots and data symbols on all REs.
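Assuming both schemes use the same grid and the same MCS, $N_{\textrm{slot}}$, $N_{\textrm{RE}}$, $\gamma$ and $M$ cancel in the ratio of two throughputs from Eq. (14). The sketch below shows that removing the $N_{\textrm{p}}=4$ overhead alone is worth $12/10-1=20\%$ at equal BLER (about 9.1% for $N_{\textrm{p}}=1$). A measured gain above 20%, such as the 24.45% at 900 km/h, therefore implies that $(1-\textrm{BLER})$ is also higher for the proposed receiver there.

```python
def relative_gain(omega_prop: float, bler_prop: float,
                  omega_base: float, bler_base: float) -> float:
    """Throughput gain of the proposed scheme over Baseline I from the ratio of Eq. (14)."""
    return (omega_prop * (1.0 - bler_prop)) / (omega_base * (1.0 - bler_base)) - 1.0


def bler_ratio_needed(gain: float, omega_prop: float, omega_base: float) -> float:
    """(1 - BLER_prop) / (1 - BLER_base) implied by an observed gain."""
    return (1.0 + gain) * omega_base / omega_prop


print(f"overhead-only gain, N_p = 4: {relative_gain(1.0, 0.0, 10 / 12, 0.0):.2%}")
print(f"overhead-only gain, N_p = 1: {relative_gain(1.0, 0.0, 11 / 12, 0.0):.2%}")
print(f"(1-BLER) ratio implied by a 24.45% gain: {bler_ratio_needed(0.2445, 1.0, 10 / 12):.4f}")
```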

Figure 7: Throughput performance comparison in high speed scenario ($C_{\textrm{ue}}=300/900$ km/h, $D_{\textrm{s}}=100$ ns, $L=2$ and $N_{\textrm{t}}=4$).
Figure 8: Generalization study of proposed scalable neural receiver for different MCSs ($C_{\textrm{ue}}=3$ km/h, $D_{\textrm{s}}=300$ ns, $L=2$ and $N_{\textrm{t}}=32$).

4.2 Generalizability and Scalability

4.2.1 Scalability for Different MCSs

The scalability of the proposed scalable neural receiver over different MCSs in the CDL-C channel is presented in Fig. 8. Here the MCS indices $m=\{3,7,14\}$ are selected, with the corresponding modulation orders {QPSK, 16QAM, 64QAM} and code rates {449/1024, 490/1024, 719/1024}, respectively. The proposed scalable neural receiver (Mixed) is trained on the mixed datasets with $m=\{3,7,14\}$ and implemented with $M_{\textrm{max}}=6$, whereas each of the compared specific models (Specific) is implemented and trained on a single MCS without the proposed scalable mechanisms. It can be noticed that the proposed scalable neural receiver achieves performance comparable with its specific counterparts, which validates its scalability across MCSs.
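The three MCSs in the mixed training set are tabulated below. The largest modulation order in the set (64QAM, $M=6$) is what fixes $M_{\textrm{max}}=6$ for the output width, and the information bits per RE and layer, $\gamma M$, show how far apart the three operating points are. Only the values listed in this section are used; this is not the full MCS table.

```python
from fractions import Fraction

# MCS index -> (modulation, bits per symbol M, target code rate), as listed in Section 4.2.1
MCS_SET = {
    3: ("QPSK", 2, Fraction(449, 1024)),
    7: ("16QAM", 4, Fraction(490, 1024)),
    14: ("64QAM", 6, Fraction(719, 1024)),
}

# The output width of the scalable model must cover the largest modulation order in the set.
M_MAX = max(bits for _, bits, _ in MCS_SET.values())

for m, (modulation, bits, rate) in sorted(MCS_SET.items()):
    efficiency = bits * rate  # information bits per RE per layer
    print(f"m={m:2d}  {modulation:5s}  M={bits}  rate={float(rate):.4f}  "
          f"bits/RE={float(efficiency):.4f}")
print("M_max =", M_MAX)
```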

Figure 9: Generalization study of proposed scalable neural receiver for different numbers of layers ($C_{\textrm{ue}}=3$ km/h, $D_{\textrm{s}}=300$ ns, $L=2$ and $N_{\textrm{t}}=32$).

4.2.2 Scalability for Different Numbers of Layers

The scalability over different numbers of layers in the CDL-C channel is studied in Fig. 9, where the number of layers is evaluated with $L=\{2,4\}$. The proposed scalable neural receiver (Mixed) is trained on the mixed datasets with $L=\{2,4\}$, whereas each of the compared specific models (Specific) is implemented and trained on a single number of layers. Comparable performance is obtained, indicating that deployment-friendly scalability over different numbers of layers can be achieved.

Figure 10: Generalization study of proposed scalable neural receiver for different channel models ($C_{\textrm{ue}}=3$ km/h, $L=2$ and $N_{\textrm{t}}=32$).

4.2.3 Generalization for Different Channel Environments

To further explore the feasibility of practical deployment, a generalization study across different channel environments is provided in Fig. 10, since, as noted in [25], including more channel settings in the training dataset can further improve the generalization of a model trained on mixed channel models. The proposed neural receiver (Mixed) is trained on the mixed datasets of CDL-A/C with $D_{\textrm{s}}=30/300$ ns and tested on each corresponding target channel, whereas the compared specific models (Specific) are trained on the target channel model itself. It can be seen that the proposed receiver still achieves comparable BLER performance, which demonstrates good generalization to different channels in practical deployment.
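The mixed-channel training set can be pictured as a pool of channel configurations from which each training sample draws one. The Python sketch below assumes that all four combinations of CDL-A/C and $D_{\textrm{s}}=30/300$ ns are included and drawn uniformly. Both the uniform sampling and the function names are illustrative assumptions, and the actual CDL channel generation (e.g., following 3GPP TR 38.901 [23]) is left out.

```python
import itertools
import random
from collections import Counter

CHANNEL_MODELS = ("CDL-A", "CDL-C")
DELAY_SPREADS_NS = (30, 300)

# Assumption: the mixed dataset covers every (channel model, delay spread) pair
MIXED_CONFIGS = list(itertools.product(CHANNEL_MODELS, DELAY_SPREADS_NS))


def mixed_training_configs(num_samples: int, seed: int = 0) -> list[tuple[str, int]]:
    """Draw one channel configuration per training sample, uniformly over MIXED_CONFIGS."""
    rng = random.Random(seed)
    return [rng.choice(MIXED_CONFIGS) for _ in range(num_samples)]


def specific_training_configs(target: tuple[str, int], num_samples: int) -> list[tuple[str, int]]:
    """A specific model sees only its target channel configuration."""
    if target not in MIXED_CONFIGS:
        raise ValueError(f"unknown configuration {target}")
    return [target] * num_samples


if __name__ == "__main__":
    print(Counter(mixed_training_configs(10_000)))
    print(Counter(specific_training_configs(("CDL-C", 300), 10_000)))
```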

4.3 Computational and Storage Complexity

The computational and storage complexity is also evaluated. First, from a simulation perspective, the running time of the model with $M_{\rm{max}}=6$ is measured on a single NVIDIA A100 SXM 80 GPU. When processing $1.28\times 10^{5}$ transport blocks (TBs), the average computation time of each inner iteration is about 0.5 ms. Second, from an analytical perspective, the complexity of the proposed receiver is dominated by the channel estimation and data detection models, whose complexity far exceeds that of interference reconstruction and cancellation. Therefore, we report the floating point operations (FLOPs) and the number of trainable parameters of one iteration. The channel estimation model requires $2.9802\times STL\times 10^{6}$ FLOPs with $2.9777\times 10^{6}$ parameters; its computational complexity is not affected by $M_{\textrm{max}}$, and its storage complexity is not affected by $L$. The complexity of the data detection model is provided in Table 2. Its computational complexity increases with $M_{\textrm{max}}$, $T$, $S$ and $L$, although the influence of $M_{\textrm{max}}$ is negligible compared with the complexity of the model itself. Moreover, $M_{\textrm{max}}$ has only a slight impact on the storage complexity, and the number of transmission layers $L$ has no impact on the storage complexity of the proposed layer-common structure. In contrast, the storage complexity of a layer-specific model grows $L$-fold, since different layers use different structures and parameters. These results indicate that the proposed scalable receiver is feasible to deploy.

Table 2: Evaluation of FLOPs and number of trainable parameters of data detection model.
$M_{\textrm{max}}$ | FLOPs ($\times STL\times 10^{6}$) | Parameters ($\times 10^{6}$)
2 | 2.9813 | 2.9788
4 | 2.9836 | 2.9811
6 | 2.9859 | 2.9834
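The reported figures can be combined into rough per-iteration totals. The Python sketch below copies the channel estimation numbers and the Table 2 entries, adds the two models' FLOPs for one iteration, and compares data detection storage for the layer-common and layer-specific structures. The values of $S$ and $T$ in the example are placeholders rather than the paper's configuration, and the helper names are hypothetical.

```python
# Channel estimation model (independent of M_max)
CE_FLOPS_COEF = 2.9802  # in units of S*T*L*1e6 FLOPs
CE_PARAMS_M = 2.9777    # in units of 1e6 parameters

# Data detection model, Table 2, keyed by M_max
DD_FLOPS_COEF = {2: 2.9813, 4: 2.9836, 6: 2.9859}  # S*T*L*1e6 FLOPs
DD_PARAMS_M = {2: 2.9788, 4: 2.9811, 6: 2.9834}    # 1e6 parameters


def flops_per_iteration(m_max: int, s: int, t: int, num_layers: int) -> float:
    """Channel estimation plus data detection FLOPs for one iteration."""
    return (CE_FLOPS_COEF + DD_FLOPS_COEF[m_max]) * s * t * num_layers * 1e6


def dd_params(m_max: int, num_layers: int, layer_common: bool = True) -> float:
    """Data detection parameters; a layer-specific design stores one copy per layer."""
    per_model = DD_PARAMS_M[m_max] * 1e6
    return per_model if layer_common else per_model * num_layers


def relative_mmax_overhead(low: int = 2, high: int = 6) -> float:
    """Relative FLOPs increase of the data detection model from M_max=low to M_max=high."""
    return (DD_FLOPS_COEF[high] - DD_FLOPS_COEF[low]) / DD_FLOPS_COEF[low]


if __name__ == "__main__":
    s, t = 48, 14  # placeholder values, not taken from the paper
    for num_layers in (2, 4):
        print(num_layers, f"{flops_per_iteration(6, s, t, num_layers):.3e} FLOPs")
    print(f"M_max 2 -> 6 overhead: {relative_mmax_overhead():.3%}")
    print(dd_params(6, 4, layer_common=False) / dd_params(6, 4))
```

With these numbers, raising $M_{\textrm{max}}$ from 2 to 6 increases the data detection FLOPs by about 0.15%, in line with the statement that the influence of $M_{\textrm{max}}$ is negligible.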

5 Standardization Potential and Prospects

Starting from 3GPP Release 18, the study item on ‘Artificial Intelligence (AI)/Machine Learning (ML) for NR Air Interface’ has introduced DL-based solutions into the physical layer of communication systems. These DL-based approaches can relax some system design restrictions, which also makes it possible to explore new forms of reference signals in subsequent 6G research, such as learnable sequences and patterns as well as the introduction of non-orthogonality.

According to 3GPP’s work plan on DL-based solutions from 5G-Advanced to 6G, the performance gain, overhead reduction, scenario generalization, storage and computational complexity, life cycle management (LCM) [17] and potential standardization impact need to be studied. Therefore, DL-based pilot solutions in existing research also need to address the corresponding challenges to meet practical requirements and follow a standardized route toward 6G, namely i) maintaining the throughput gain in more complex environments such as high-speed scenarios or multi-layer transmission, ii) achieving lower or zero reference signal overhead, iii) keeping complexity low enough for terminal deployment, iv) generalizing to different scenarios and system configurations, and v) designing a simple framework without cumbersome LCM procedures.

The solution proposed in this article incorporates several novel mechanisms designed to address the above challenges in terms of throughput, overhead, generalization, scalability, flexibility and complexity, making SIP more amenable to standardization and practical deployment. In future work, it is worthwhile to further study the effectiveness and complexity of SIP in more practical scenarios before it is standardized. Such studies will also bring more diversity and design space for redesigning various reference signals in future 6G intelligent systems.

6 Conclusion

In this paper, an interference cancellation based neural receiver for SIP in multi-layer transmission is proposed, which incorporates several novel mechanisms designed to address the challenges of multi-layer transmission and practical deployment. Specifically, to handle the intra-layer and inter-layer interference of SIP under multi-layer transmission, interference cancellation with superimposed symbol aided channel estimation is utilized in the neural receiver, accompanied by a pre-designed pilot code-division orthogonal mechanism at the transmitter. Moreover, to address the complexity issue of inter-vendor collaboration and the generalization problem in practical deployments, respectively, a fixed SIP (F-SIP) design based on a constant pilot power ratio and scalable mechanisms for different modulation and coding schemes (MCSs) and transmission layers are also proposed. Simulation results demonstrate the superiority of the proposed schemes in terms of BLER and throughput compared with existing counterparts.

References

[1] 3GPP. 3GPP TS 38.211 v17.2.0, 3rd Generation Partnership Project; technical specification group radio access network; NR; physical channels and modulation (release 17)[J]. Tech. Rep., 2022.
[2] TANG H, YANG N, ZHANG Z, et al. 5G NR and enhancements: From R15 to R16[M]. Elsevier, 2021. Available: https://www.amazon.com/5G-NR-Enhancements-R15-R16-ebook/dp/B09KLMV5HX.
[3] ITU-R. IMT Vision - Framework and overall objectives of the future development of IMT for 2020 and beyond[J]. Recommendation ITU-R M.2083-0, 2015.
[4] WANG C X, YOU X, GAO X, et al. On the road to 6G: Visions, requirements, key technologies and testbeds[J]. IEEE Communications Surveys & Tutorials, 2023.
[5] ALSABAH M, NASER M A, MAHMMOD B M, et al. 6G wireless communications networks: A comprehensive survey[J]. IEEE Access, 2021, 9: 148191-148243.
[6] GUO J, WEN C K, JIN S, et al. Overview of deep learning-based CSI feedback in massive MIMO systems[J]. IEEE Transactions on Communications, 2022, 70(12): 8017-8045.
[7] HOYDIS J, AOUDIA F A, VALCARCE A, et al. Toward a 6G AI-native air interface[J]. IEEE Communications Magazine, 2021, 59(5): 76-81.
[8] 3GPP. 3GPP TR 38.843 v0.2.0, 3rd Generation Partnership Project; technical specification group radio access network; study on artificial intelligence (AI)/machine learning (ML) for NR air interface (release 18)[J]. Tech. Rep., 2023.
[9] MA X, GAO Z. Data-driven deep learning to design pilot and channel estimator for massive MIMO[J]. IEEE Transactions on Vehicular Technology, 2020, 69(5): 5677-5682.
[10] SOHRABI F, ATTIAH K M, YU W. Deep learning for distributed channel feedback and multiuser precoding in FDD massive MIMO[J]. IEEE Transactions on Wireless Communications, 2021, 20(7): 4044-4057.
[11] CHUN C J, KANG J M, KIM I M. Deep learning-based joint pilot design and channel estimation for multiuser MIMO channels[J]. IEEE Communications Letters, 2019, 23(11): 1999-2003.
[12] XU J, ZHU P, LI J, et al. Deep learning-based pilot design for multi-user distributed massive MIMO systems[J]. IEEE Wireless Communications Letters, 2019, 8(4): 1016-1019.
[13] SOLTANI M, POURAHMADI V, SHEIKHZADEH H. Pilot pattern design for deep learning-based channel estimation in OFDM systems[J]. IEEE Wireless Communications Letters, 2020, 9(12): 2173-2176.
[14] MASHHADI M B, GÜNDÜZ D. Pruning the pilots: Deep learning-based pilot design and channel estimation for MIMO-OFDM systems[J]. IEEE Transactions on Wireless Communications, 2021, 20(10): 6315-6328.
[15] HOEHER P, TUFVESSON F. Channel estimation with superimposed pilot sequence[C]//Seamless Interconnection for Universal Services. Global Telecommunications Conference. GLOBECOM’99. (Cat. No. 99CH37042): volume 4. [S.l.]: IEEE, 1999: 2162-2166.
[16] AOUDIA F A, HOYDIS J. End-to-end learning for OFDM: From neural receivers to pilotless communication[J]. IEEE Transactions on Wireless Communications, 2021, 21(2): 1049-1063.
[17] CHEN W, LIN X, LEE J, et al. 5G-advanced toward 6G: Past, present, and future[J]. IEEE Journal on Selected Areas in Communications, 2023, 41(6): 1592-1619.
[18] 3GPP. 3GPP TS 38.214 v17.2.0, 3rd Generation Partnership Project; technical specification group radio access network; NR; physical layer procedures for data (release 17)[J]. Tech. Rep., 2022.
[19] MOHSAN S A H, LI Y, SHVETSOV A V, et al. A survey of deep learning based NOMA: State of the art, key aspects, open challenges and future trends[J]. Sensors, 2023, 23(6): 2946.
[20] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. [S.l.: s.n.], 2016: 770-778.
[21] TOLSTIKHIN I O, HOULSBY N, KOLESNIKOV A, et al. MLP-Mixer: An all-MLP architecture for vision[J]. Advances in neural information processing systems, 2021, 34: 24261-24272.
[22] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.
[23] 3GPP. 3GPP TR 38.901 v16.1.0, 3rd Generation Partnership Project; technical specification group radio access network; study on channel model for frequencies from 0.5 to 100 GHz (release 16)[J]. Tech. Rep., 2020.
[24] 3GPP. R1-2007201, Summary of AI: 8.1.2.4 enhancements on HST-SFN deployment[J]. 2020.
[25] LIU W, TIAN W, XIAO H, et al. EVCsiNet: Eigenvector-based CSI feedback under 3GPP link-level channels[J]. IEEE Wireless Communications Letters, 2021, 10(12): 2688-2692.