Next-slot OFDM-CSI Prediction:
Multi-head Self-attention or State Space Model?

Mohamed Akrout, Faouzi Bellili,
Amine Mezghani, Robert W. Heath
The authors are with the Department of Electrical and Computer Engineering at the University of Manitoba, Winnipeg, MB, Canada (emails:[email protected], {Faouzi.Bellili,Amine.Mezghani}@umanitoba.ca). R. W. Heath is at the University of California, San Diego (email: [email protected]). This work was supported by the Discovery Grants Program of the Natural Sciences and Engineering Research Council of Canada (NSERC) and the US National Science Foundation (NSF) Grant No. ECCS-1711702 and CNS-1731658.
Abstract

The ongoing fifth-generation (5G) standardization is exploring the use of deep learning (DL) methods to enhance the new radio (NR) interface. Both in academia and industry, researchers are investigating the performance and complexity of multiple DL architecture candidates for specific one-sided and two-sided use cases such as channel state information (CSI) feedback, CSI prediction, beam management, and positioning. In this paper, we focus on the CSI prediction task and study the performance and generalization of the two main DL layers that are being extensively benchmarked within the DL community, namely, multi-head self-attention (MSA) and state-space model (SSM). We train and evaluate MSA and SSM layers to predict the next slot for uplink and downlink communication scenarios over urban microcell (UMi) and urban macrocell (UMa) OFDM 5G channel models. Our numerical results demonstrate that SSMs exhibit better prediction and generalization capabilities than MSAs only for SISO cases. For MIMO scenarios, however, the MSA layer outperforms the SSM one. While both layers represent potential DL architectures for future DL-enabled 5G use cases, the overall investigation of this paper favors MSAs over SSMs.

Index Terms:
CSI prediction, OFDM, slot, 3GPP channel models, multi-head self-attention, state space models.

I Introduction

I-A Background and motivation

Generative artificial intelligence (GenAI) has recently emerged as a result of research advancements in deep learning (DL), with a promising potential to transform the technological future across numerous areas. Specifically, large language models (LLMs) and large multi-modal models (LMMs) developed within the field of natural language processing and computer vision research communities are driving innovation by enhancing automation, language translation services, and human-computer interaction (cf. [1] for a comprehensive overview). While GenAI is being progressively adopted by different industries, some research studies at the intersection of DL and wireless communication proposed the use of LLMs as part of self-organizing networks (SONs) [2]. These networks are expected to be highly autonomous and adaptive as they continuously optimize their functions and parameters depending on the communication conditions and user demands. To accommodate such high flexibility, GenAI for wireless communication comes into play as a key technology to generate personalized communication parameters according to network patterns and KPIs learned from massive Telecom datasets. Such AI generation can target estimation or prediction of parameters pertaining to either the physical or the network layer depending on the nature of the collected datasets and the considered downstream tasks at hand.

In this context, one of the key AI use cases considered in the recent 3rd Generation Partnership Project (3GPP) Releases 17 and 18 is channel state information (CSI) prediction [3]. An important issue with the current CSI reporting system in the new radio (NR) interface is the delay between the CSI’s reporting time and the moment the CSI is actually used. This delay makes the CSI outdated due to the channel time variations. The rate at which the CSI loses its relevance depends on the channel properties and is amplified by the speed of the user equipment (UE). To address this challenge, both model-based and learning-based channel prediction techniques leverage historical CSI correlations to forecast future channel conditions and/or realizations. To capture the dynamic behavior of the channel, model-based methods employ linear extrapolation [4], sum-of-sinusoids [5], and autoregressive models [6] (see [7] for a comprehensive overview). Learning-based approaches using deep neural networks (DNNs) stood out in the 3GPP’s 5G standard discussions as a promising low-complexity strategy to predict the channel and mitigate the impact of outdated CSI. Indeed, when the channel blockage model is not available, model-based methods cannot accurately capture the large number of blockage possibilities. Such a setting is equivalent to having a non-stationary channel whose transitions cannot be predicted well using linear methods.

I-B Related work

Many standard DNN architectures have been investigated for multiple-input multiple-output (MIMO) channel prediction. A multi-layer perceptron (MLP) was used in [8] to predict the downlink CSI from the uplink one under the assumption of a direct user-channel matrix relationship, which is not always applicable. Convolutional neural networks (CNNs) and long short-term memory (LSTM) networks were employed in [9], yielding notable prediction performance compared to traditional methods like maximum likelihood and minimum mean squared error (MMSE). To improve upon this, recurrent neural networks (RNNs) were combined with CNNs for feature extraction, outperforming standalone CNNs for channel prediction [10]. Because the channel is complex-valued and CNNs are designed for real-valued processing only, a complex-valued 3D CNN was proposed for CSI prediction in [11], improving upon the CSI prediction accuracy of real-valued networks. Graph neural networks (GNNs) have also been applied to CSI prediction as a multivariate time-series forecasting problem [12] by exploiting the spectral and temporal correlations of the historical CSI. To mitigate the sequential processing nature of LSTM networks, transformers rely on the attention mechanism to process entire sequences of input data in parallel, thereby significantly reducing training times and enabling the model to scale with the amount of data more effectively [13]. When applied to CSI prediction, transformers outperformed all other DNN architectures in terms of both mean square error and achievable rate [14].

The collaborative development of the 3GPP standard, involving research bodies and industry stakeholders, is currently investigating the performance of multiple DNN models in terms of floating point operations per second (FLOPS) and memory complexity. Because the CSI prediction problem follows a one-sided model (i.e., inference can be conducted by one side: either the base station or the UE), the performance of deployable models depends heavily on specific UE/base station vendors’ hardware. (This is in contrast to two-sided models in the 3GPP standard, where the first part of the inference runs on the UE side and the second part runs on the base station side or vice versa, e.g., encoder-decoder models.) For this reason, it is of utmost importance in practice to investigate the prediction capability of AI layers and avoid stacking dozens of them with the only goal of claiming state-of-the-art performance at the cost of FLOPS and memory complexity. From a wireless communication perspective (i.e., non-language application), the AI models in the 3GPP standard should benefit from the recent architectures of AI layers which have proven effective for GenAI models, namely, the multi-head self-attention (MSA) layer [13] and the state space model (SSM) layer [15]. To see why this is possible, Fig. 1 depicts the striking similarity between CSI prediction in orthogonal frequency division multiplexing (OFDM) systems and next-token prediction for LLMs. Specifically, $K$ input word embeddings obtained after tokenization and present at the position interval $p\in[T-K,T]$ are analogous to $K$ input CSI at the time interval $t\in[T-K,T]$. In other words, the equivalence becomes more apparent when the token positions are substituted by the CSI (e.g., OFDM slot) position in time.

Fig. 1: The input-output similarity of a neural network between (a) CSI prediction and (b) next-token prediction.

The AI community is currently benchmarking these two layers for LLMs [16], vision-related tasks including classification of images [17] and videos [18], and graph-related tasks [19], just to name a few. We believe it is also timely that the communication community examines how these layers can be leveraged for CSI prediction. If the vision is for AI to be a core component of future-generation communication systems rather than one specific functionality among others, it is important to understand the capabilities of AI layers in terms of both performance and FLOPS, as well as area and power consumption metrics when AI models are implemented on FPGAs or ASICs. This is particularly important to pursue because the 3GPP discussions are still ongoing and no final decision about the AI models has been made yet.

I-C Contribution

We study the prediction capabilities of MSA and SSM layers for CSI prediction at the UE side as a one-sided model defined in the 3GPP standard. Unlike the aforementioned works, our goal is neither to beat other DNN architectures nor to obtain state-of-the-art results by cascading DNN layers at the cost of higher FLOPS. For this reason, we focus on shallow networks with either one single MSA or SSM layer and examine their CSI predictive ability. By doing so, we provide insights into the performance of MSA and SSM layers and their competitiveness to be considered by the industry among the deployable AI models on UE devices. Toward this goal, we first define the task of predicting the next-slot OFDM-CSI and describe the parameters of the 5G wireless channels used for training and evaluation, namely, urban microcell (UMi) and urban macrocell (UMa). We then conduct an exhaustive empirical comparison between MSA and SSM layers for both in-distribution (ID) and out-of-distribution (OOD) evaluations as a function of the SNR and speed of the UE. This is because rigorous investigations of AI models must examine the trade-off between generalization and accuracy [20, 21]. Our empirical investigation reveals the following main results:

  • For SISO communication scenarios: SSMs exhibit better generalization capabilities in terms of SNR and user speeds compared to MSAs. For MIMO communication scenarios, however, MSAs outperform SSMs in both ID and OOD evaluations.

  • Diversifying the communication scenarios (i.e., many SNR levels within the training dataset) over which DNNs are trained for slot prediction is only beneficial for MSAs in SISO scenarios. This diversification has a negative impact on the CSI prediction MSE for MIMO scenarios, where DNNs trained at lower SNR levels achieve a lower CSI prediction MSE. This can be justified by the fact that introducing noise as a data augmentation technique to the training samples prevents overfitting [22], improves robustness [23], and is equivalent to Tikhonov regularization [24]. Such dataset diversification confirms the challenge of choosing the training dataset and its parameters to train DNNs. This comes on top of the data/model agreement difficulty between vendors in the context of future AI-enabled communication use cases.

The code to train and test MSA and SSM layers is available at https://github.com/makrout/Next-Slot-OFDM-CSI-Prediction.

I-D Outline

We structure the rest of this paper as follows. In Section II, we introduce the relevant background of MSA and SSM layers. In Section III, we define the next-slot OFDM-CSI prediction task and present the parameters of the 3GPP OFDM channel models. Our simulation results are presented in Section IV for both SISO and MIMO communications, from which we draw some concluding remarks in Section V.

II Background

In this section, we review the architecture of the MSA and SSM layers at the level of detail needed for their exposition and comparison.

II-A Multi-head attention layer

Let $\bm{X}\in\mathbb{R}^{N\times D}$ be the input sentence, where $N$ is the sequence length and $D$ is the embedding dimension. Let also $D_h$ denote the dimension of each self-attention head (a.k.a., the query size) and $H=D/D_h$ be the number of heads. A self-attention layer starts by computing query, key, and value matrices $\bm{Q}$, $\bm{K}$ and $\bm{V}$ from $\bm{X}$ using linear transformations:

\begin{align}
\bm{Q} &= \bm{X}\,\bm{W}_{q}, \tag{1a}\\
\bm{K} &= \bm{X}\,\bm{W}_{k}, \tag{1b}\\
\bm{V} &= \bm{X}\,\bm{W}_{v}, \tag{1c}
\end{align}

where $\bm{W}_{q}\in\mathbb{R}^{D\times D_h}$, $\bm{W}_{k}\in\mathbb{R}^{D\times D_h}$, and $\bm{W}_{v}\in\mathbb{R}^{D\times D_h}$ are learnable parameters. Eq. (1) can be rewritten in a compact form as

\begin{equation}
[\bm{Q},\bm{K},\bm{V}]=\bm{X}\,\bm{W}_{qkv}, \tag{2}
\end{equation}

where $\bm{W}_{qkv}\in\mathbb{R}^{D\times 3\,D_h}$ is an overall learnable parameter matrix. The attention map $\bm{M}\in\mathbb{R}^{N\times N}$ is then computed by scaled inner products from $\bm{Q}$ and $\bm{K}$ and normalized by the softmax function as follows:

\begin{equation}
\bm{M}=\textrm{softmax}\left(\frac{\bm{Q}\,\bm{K}^{\top}}{\sqrt{D_h}}\right). \tag{3}
\end{equation}

Here, the $ij$th entry, $M_{ij}$, in $\bm{M}$ represents the attention score between $\bm{Q}_i$ and $\bm{K}_j$. The self-attention operation is then applied on the value vectors to produce the output matrix

\begin{equation}
\bm{O}=\bm{M}\,\bm{V}\ \in\mathbb{R}^{N\times D_h}. \tag{4}
\end{equation}

Finally, the output $\bm{Y}\in\mathbb{R}^{N\times D}$ of the self-attention layer is calculated by a learnable linear projection $\bm{W}_{\textrm{proj}}\in\mathbb{R}^{D\times D}$ for the concatenated self-attention outputs of each head, i.e.:

\begin{equation}
\bm{Y}=[\bm{O}_{1},\bm{O}_{2},\dots,\bm{O}_{H}]\,\bm{W}_{\textrm{proj}}. \tag{5}
\end{equation}

Overall, the MSA layer can be seen as a learnable module that takes an input $\bm{X}$ and returns an output $\bm{Y}$ of the same dimension. Note, however, that both $\bm{X}$ and $\bm{Y}$ are divided logically among $H$ heads. Consequently, different segments of $\bm{X}$ and $\bm{Y}$ are able to learn the correlation patterns of some input chunks in relation to the other ones within the sequence. This division enables the multi-head attention layer to acquire richer correlation patterns within the input sequence $\bm{X}$.
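To make the data flow of (1)–(5) concrete, the following is a minimal NumPy sketch of a single MSA layer. It is an illustration only and not the authors' implementation: the function names, the random weights, and the choice to store one $D\times 3D_h$ matrix per head in a stacked array `W_qkv` of shape `(H, D, 3*D_h)` are assumptions made for this example.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def msa_layer(X, W_qkv, W_proj, H):
    """X: (N, D); W_qkv: (H, D, 3*D_h), one matrix per head (assumed layout);
    W_proj: (D, D). Returns Y of shape (N, D)."""
    N, D = X.shape
    D_h = D // H
    heads = []
    for h in range(H):
        Q, K, V = np.split(X @ W_qkv[h], 3, axis=1)  # Eq. (2): each is (N, D_h)
        M = softmax(Q @ K.T / np.sqrt(D_h))          # Eq. (3): (N, N), rows sum to 1
        heads.append(M @ V)                          # Eq. (4): (N, D_h)
    return np.concatenate(heads, axis=1) @ W_proj    # Eq. (5): (N, D)

# Toy dimensions chosen for illustration only.
rng = np.random.default_rng(0)
N, D, H = 14, 8, 2
X = rng.standard_normal((N, D))
W_qkv = rng.standard_normal((H, D, 3 * D // H)) / np.sqrt(D)
W_proj = rng.standard_normal((D, D)) / np.sqrt(D)
Y = msa_layer(X, W_qkv, W_proj, H)
print(Y.shape)
```

The output has the same shape as the input, which is what allows such a layer to map one slot of CSI to the next.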

In terms of computational complexity, the FLOPS of the MSA layer is divided across four steps:

  • i) the three linear projections in (1) with complexity $3\,N\,D^{2}$,

  • ii) the computation of the attention map $\bm{M}$ in (3) with complexity $N^{2}\,D$,

  • iii) the self-attention operation in (4) with complexity $N^{2}\,D$,

  • iv) the linear projection for the concatenated self-attention outputs in (5) with complexity $N\,D^{2}$.

Summing the complexity of these steps yields the overall number of FLOPS for an MSA layer as $4\,N\,D^{2}+2\,N^{2}\,D$. Due to the quadratic dependence of the complexity on the sequence length $N$, AI researchers have been actively looking for novel and cheaper alternatives without sacrificing the MSA performance. The most promising among the existing competitive methods are SSMs, especially the Mamba layer, which will be described in the next section.
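As a quick sanity check, the four step costs listed above can be added up numerically. The helper below is a hypothetical illustration (the name `msa_flops` and the chosen values of $N$ and $D$ are not from the paper); it simply sums steps i) to iv) and shows how the $N^{2}D$ terms come to dominate as the sequence length grows.

```python
def msa_flops(N, D):
    """Sum of the four MSA step costs i)-iv) for sequence length N and embedding size D."""
    qkv_projections = 3 * N * D**2   # step i): three linear projections
    attention_map = N**2 * D         # step ii): Q K^T over all heads
    attention_apply = N**2 * D       # step iii): M V over all heads
    output_projection = N * D**2     # step iv): final linear projection
    return qkv_projections + attention_map + attention_apply + output_projection

D = 64  # illustrative embedding dimension
for N in (14, 140, 1400):
    total = msa_flops(N, D)
    quadratic_share = 2 * N**2 * D / total
    print(f"N={N:5d}  FLOPS={total:12d}  share of N^2 terms={quadratic_share:.2f}")
```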

II-B State-space model layer

In classical control and filtering theories, the evolution of continuous systems as a function of time $t$ with state $\bm{h}(t)\in\mathbb{R}^{D}$ and input $\bm{x}(t)\in\mathbb{R}^{N}$ is described according to the SSM:

\begin{align}
\bm{h}^{\prime}(t) &= \bm{A}\,\bm{h}(t)+\bm{B}\,\bm{x}(t), \qquad \textrm{(state equation)} \tag{6a}\\
\bm{y}(t) &= \bm{C}\,\bm{h}(t)+\bm{D}\,\bm{x}(t). \qquad \textrm{(output equation)} \tag{6b}
\end{align}

In (6), the state equation describes how the state $\bm{h}(t)$ evolves (through the matrix $\bm{A}$) and how the input $\bm{x}(t)$ influences the state (through the matrix $\bm{B}$). The output equation describes how the state $\bm{h}(t)$ is observed in the output $\bm{y}(t)\in\mathbb{R}^{N}$ (through the matrix $\bm{C}$) and how the input $\bm{x}(t)$ influences the output (through the matrix $\bm{D}$). For sequence models, the input $\bm{x}(t)$ represents the token embedding at position $t$ while $\bm{y}(t)$ denotes the next token embedding. By learning the parameters $\bm{A}$, $\bm{B}$, $\bm{C}$ and $\bm{D}$, the SSM layer captures the evolution parameters of the dynamics from one token to the other. For systems with discrete state and input like textual sequences, the continuous-time SSM in (6) must be discretized using a step size $\bm{\Delta}$ which represents the resolution of the input. In other words, a discrete input $\bm{x}_{t}$ is a sample of the continuous input $\bm{x}(t)$ where $\bm{x}_{t}=\bm{x}(t\,\bm{\Delta})$. Note that it is common to omit the parameter $\bm{D}$ during the discretization because the term $\bm{D}\,\bm{x}(t)$ is equivalent to a skip connection, which can be incorporated easily in the SSM layer architecture. Using the bilinear method [25], the discrete-time SSM is given by:

\begin{align}
\bm{h}_{t} &= \overline{\bm{A}}\,\bm{h}_{t-1}+\overline{\bm{B}}\,\bm{x}_{t}, \tag{7a}\\
\bm{y}_{t} &= \overline{\bm{C}}\,\bm{h}_{t}, \tag{7b}
\end{align}

where

\begin{align}
\overline{\bm{A}} &\triangleq (\bm{I}-\bm{\Delta}/2\cdot\bm{A})^{-1}(\bm{I}+\bm{\Delta}/2\cdot\bm{A}), \notag\\
\overline{\bm{B}} &\triangleq (\bm{I}-\bm{\Delta}/2\cdot\bm{A})^{-1}\bm{\Delta}\,\bm{B}, \tag{8a}\\
\overline{\bm{C}} &\triangleq \bm{C}. \tag{8b}
\end{align}
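The bilinear discretization can be sketched in a few lines of NumPy. This is a hedged illustration under the assumption of a scalar step size `delta` (the paper writes the step as $\bm{\Delta}$); the function name and the toy scalar system are hypothetical and serve only to show that $\overline{\bm{A}}$ closely tracks the exact discretization $e^{\Delta A}$ for small steps.

```python
import numpy as np

def discretize_bilinear(A, B, C, delta):
    """Bilinear (Tustin) discretization of (A, B, C) with a scalar step delta (assumption)."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - delta / 2 * A)
    A_bar = inv @ (I + delta / 2 * A)
    B_bar = inv @ (delta * B)
    C_bar = C  # the output matrix is left unchanged
    return A_bar, B_bar, C_bar

# Toy one-dimensional stable system h'(t) = -h(t) + x(t), used only for illustration.
A = np.array([[-1.0]])
B = np.array([[1.0]])
C = np.array([[1.0]])
A_bar, B_bar, C_bar = discretize_bilinear(A, B, C, delta=0.1)
print(A_bar[0, 0], np.exp(-0.1))  # bilinear value vs. exact exponential
```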

The fact that the learnable parameters $\overline{\bm{A}}$, $\overline{\bm{B}}$, and $\overline{\bm{C}}$ are constant means that the discrete-time SSM describes a linear time-invariant (LTI) system with strong ties to convolution. Indeed, one can set the initial state $\bm{h}_{0}$ to $\bm{0}$ for simplicity and rewrite (7)–(8) in the convolution representation for $t\in[1,T]$ as follows [26]:

\begin{equation}
\bm{y}=\overline{\bm{K}}\ast\bm{x}, \tag{9}
\end{equation}

where $\overline{\bm{K}}\triangleq\left(\overline{\bm{C}}\,\overline{\bm{B}},\,\overline{\bm{C}}\,\overline{\bm{A}}\,\overline{\bm{B}},\,\ldots,\,\overline{\bm{C}}\,\overline{\bm{A}}^{T-1}\overline{\bm{B}}\right)$ represents the SSM convolution kernel. By avoiding the standard recurrent representation, the convolution representation in (9) offers a compact and efficient parallel computation for SSM layers. However, because $\overline{\bm{K}}$ is a giant filter, the naive implementation of the convolution as in (9) is slow and memory inefficient. To sidestep this limitation, many AI studies proposed restricting the structure of the SSM parameters to specific forms. Triangular $\overline{\bm{A}}$ matrices keeping track of the Legendre polynomial’s coefficients are computationally efficient and produce a hidden state $\bm{h}_{t}$ that memorizes the input history [27]. Structured state space sequence models (S4) have also been introduced for SSMs where the parameters have a diagonal plus low-rank (DPLR) structure in the complex space [26]. Such a structure offers efficient SSMs with linear-time complexity, as opposed to the quadratic complexity of attention. More recently, Mamba [15] enhanced the S4 model by introducing a selective input mechanism that enables the model to choose relevant information based on the input $\bm{x}_{t}$. This approach, combined with an implementation that is optimized for hardware, allowed Mamba to outperform transformers on different dense modalities like language and genomics. For input and state sequences $\bm{x}_{t}$ and $\bm{h}_{t}$ of size $N$ and $D$, the number of FLOPS of the Mamba layer scales linearly in $N$; more precisely, it is $\mathcal{O}(N\,D)$. Another important aspect of the Mamba model is that it is the first SSM to be time-varying, indirectly updating $\overline{\bm{A}}$ through $\bm{\Delta}$ and directly updating $\overline{\bm{B}}$ and $\overline{\bm{C}}$ over time through its selective scan mechanism.
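The equivalence between the recurrent form (7) and the convolutional form (9) for a time-invariant SSM can be checked numerically. The sketch below is an illustration under simplifying assumptions that are not from the paper: a single scalar input channel, a zero initial state, and randomly drawn discrete parameters; all function names are hypothetical. It does not model Mamba's selective (time-varying) scan, for which the fixed kernel below would no longer apply.

```python
import numpy as np

def ssm_recurrent(A_bar, B_bar, C_bar, x):
    """Run h_t = A_bar h_{t-1} + B_bar x_t, y_t = C_bar h_t from a zero initial state."""
    h = np.zeros(A_bar.shape[0])
    y = []
    for x_t in x:
        h = A_bar @ h + B_bar * x_t
        y.append(C_bar @ h)
    return np.array(y)

def ssm_kernel(A_bar, B_bar, C_bar, T):
    """Kernel (C B, C A B, ..., C A^{T-1} B) built by repeated multiplication with A_bar."""
    kernel = []
    v = B_bar.copy()
    for _ in range(T):
        kernel.append(C_bar @ v)
        v = A_bar @ v
    return np.array(kernel)

def ssm_convolution(A_bar, B_bar, C_bar, x):
    """Causal convolution of the input with the SSM kernel, truncated to len(x) outputs."""
    K = ssm_kernel(A_bar, B_bar, C_bar, len(x))
    return np.convolve(x, K)[: len(x)]

# Illustrative random parameters: state size 4, scalar input and output.
rng = np.random.default_rng(0)
A_bar = 0.9 * np.eye(4) + 0.01 * rng.standard_normal((4, 4))
B_bar = rng.standard_normal(4)
C_bar = rng.standard_normal(4)
x = rng.standard_normal(16)

y_rec = ssm_recurrent(A_bar, B_bar, C_bar, x)
y_conv = ssm_convolution(A_bar, B_bar, C_bar, x)
print(np.allclose(y_rec, y_conv))
```

The recurrent form processes one input at a time, whereas the convolutional form exposes all time steps at once, which is what makes parallel training possible for LTI SSMs.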

III AI-based OFDM-CSI Prediction

In this section, we describe the proposed CSI prediction mechanism for the next-slot OFDM-CSI prediction. We also describe how the input dimensions of SSM and MSA layers are mapped to the CSI dimensions. We then present the 3GPP channel models and their key parameters which will be later considered in our simulation results in Section IV.

III-A Prediction tasks

In both 4G (a.k.a. LTE) and 5G networks, uplink and downlink transmissions are organized into radio frames of 10 ms each as depicted in Fig. 2. Each frame is divided into ten equally sized subframes. The duration of each subframe is 1 ms. In LTE, each subframe is further divided into two equal-size time slots, and each slot is of duration 0.5 ms. In 5G, however, the slot length changes depending on the used subcarrier spacing (a.k.a., numerology) associated with the operational frequency band and the service requirements.
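For reference, the dependence of the 5G slot length on the numerology can be written down explicitly. The sketch below assumes the standard NR relation, which is not spelled out in this paper: a numerology index $\mu$ gives a subcarrier spacing of $15\cdot 2^{\mu}$ kHz and $2^{\mu}$ slots per 1 ms subframe. The function names are hypothetical.

```python
def nr_subcarrier_spacing_khz(mu):
    """Subcarrier spacing in kHz for NR numerology index mu (assumed relation 15 * 2^mu)."""
    return 15 * 2**mu

def nr_slots_per_subframe(mu):
    """Number of slots in one 1 ms subframe for numerology index mu."""
    return 2**mu

def nr_slot_duration_ms(mu):
    """Slot duration in milliseconds: the 1 ms subframe split into 2^mu slots."""
    return 1.0 / nr_slots_per_subframe(mu)

for mu in range(5):
    print(f"mu={mu}: {nr_subcarrier_spacing_khz(mu)} kHz, "
          f"{nr_slots_per_subframe(mu)} slot(s) per subframe, "
          f"{nr_slot_duration_ms(mu)} ms per slot")
```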

Fig. 2: Illustration of the LTE radio frame structure.

In OFDM systems, the channel is a two-dimensional grid of $N_{s}$ symbols in time and $N_{f}$ sub-carriers in frequency. Specifically, consider a downlink system with $N_{r}$ antennas at the receiver (i.e., UE) and $N_{t}$ antennas at the transmitter (i.e., base station). The UE is continuously forecasting the CSI given the previously determined ones. To train and test SSM and MSA layers on this task, we consider the following CSI prediction problem: given the previous slot CSI, the UE predicts the CSI pertaining to the next slot within the same subframe as depicted in Fig. 2. This task covers slot-wise CSI prediction across subframes as well, i.e., between the last slot of subframe $i$ and the first slot of subframe $i+1$.
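One possible way to turn a time-frequency CSI grid into (previous slot, next slot) training pairs is sketched below. This is an assumption-laden illustration rather than the authors' data pipeline: it considers a single antenna pair (SISO), the helper name `next_slot_pairs` is hypothetical, and the number of OFDM symbols per slot is left as a parameter.

```python
import numpy as np

def next_slot_pairs(H, symbols_per_slot):
    """Split a CSI grid H of shape (num_symbols, N_f) into consecutive slots and
    return (inputs, targets), where targets[i] is the slot that follows inputs[i]."""
    num_slots = H.shape[0] // symbols_per_slot
    usable = H[: num_slots * symbols_per_slot]           # drop an incomplete trailing slot
    slots = usable.reshape(num_slots, symbols_per_slot, -1)
    return slots[:-1], slots[1:]

# Toy complex-valued grid: 4 slots of 14 symbols over 12 sub-carriers (illustrative sizes).
rng = np.random.default_rng(0)
H = rng.standard_normal((56, 12)) + 1j * rng.standard_normal((56, 12))
inputs, targets = next_slot_pairs(H, symbols_per_slot=14)
print(inputs.shape, targets.shape)
```

Because consecutive slots are paired regardless of subframe boundaries, the same construction also covers prediction from the last slot of one subframe to the first slot of the next.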

Fig. 3: SISO MSE of the next-slot OFDM-CSI prediction task vs. test SNRs at $f_{c}=5$ GHz for multiple MSA layers in (a) and (c) and SSM layers in (b) and (d) when each is trained with the UMi channel at different SNR values without a distribution shift in the UE speed (i.e., $v_{\textrm{train}}=v_{\textrm{test}}$). Panels: (a) MSA, $v_{\textrm{train}}=0$, $v_{\textrm{test}}=0$; (b) SSM, $v_{\textrm{train}}=0$, $v_{\textrm{test}}=0$; (c) MSA, $v_{\textrm{train}}=30$, $v_{\textrm{test}}=30$; (d) SSM, $v_{\textrm{train}}=30$, $v_{\textrm{test}}=30$.

Because the input of the tasks depends on the characteristics of the SSM and MSA layers, we associate their dimensions with those of the OFDM grid as follows:

III-A1 State-space model

Given $N_{s_0}$ OFDM symbols spanning $N_{f_0}$ sub-carriers, the input sequence $\bm{x}_t$ consists of $N_{s_0}$ symbols in time, analogous to the token positions in a sentence, while the sub-carrier dimension represents the number of channel coefficients in $\bm{x}_t$. As a result, the input $\bm{x}_t$ belongs to $\mathbb{R}^{N_{s_0} \times 2N_{f_0}}$, where the factor $2$ follows from the concatenation of the real and imaginary parts of the OFDM symbols.
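The mapping from a complex slot to the real-valued SSM input can be sketched as follows. The helper names `to_ssm_input` and `from_ssm_output`, as well as the ordering (real parts first, then imaginary parts), are assumptions made for illustration.

```python
import numpy as np


def to_ssm_input(slot):
    """Map a complex (N_s0, N_f0) slot to a real (N_s0, 2 * N_f0) sequence."""
    return np.concatenate([slot.real, slot.imag], axis=-1)


def from_ssm_output(x):
    """Inverse mapping: rebuild the complex (N_s0, N_f0) slot."""
    n_f0 = x.shape[-1] // 2
    return x[..., :n_f0] + 1j * x[..., n_f0:]


# Toy slot (hypothetical sizes): 14 symbols over 12 sub-carriers.
rng = np.random.default_rng(0)
slot = rng.standard_normal((14, 12)) + 1j * rng.standard_normal((14, 12))
x = to_ssm_input(slot)
print(x.shape)  # (14, 24)
```

Each of the $N_{s_0}$ rows plays the role of one token, and the round trip through `from_ssm_output` recovers the original complex coefficients exactly.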

III-A2 Multi-head attention

Similarly to the SSM input, the $N_{s_0}$ OFDM symbols over the $N_{f_0}$ sub-carriers form the input sequence of the attention layer. We use two attention heads for the real and imaginary parts of the sequence.
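One possible reading of this setup is sketched below with a generic scaled dot-product multi-head self-attention layer written in NumPy. It assumes the real/imaginary concatenation described for the SSM input, so that splitting the model dimension $2N_{f_0}$ into two heads of size $N_{f_0}$ aligns one head with each part before the learned projections are applied. The function names and the random projection matrices are hypothetical, and the authors' implementation may differ.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads=2):
    """x: (L, d_model), w_*: (d_model, d_model). Returns (L, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split(t):  # (L, d_model) -> (n_heads, L, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, L, L)
    attn = softmax(scores, axis=-1)  # each row sums to one
    heads = attn @ v  # (n_heads, L, d_head)
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o


# Toy input (hypothetical sizes): 14 symbols, 12 sub-carriers, real | imaginary.
rng = np.random.default_rng(0)
n_s0, n_f0 = 14, 12
x = rng.standard_normal((n_s0, 2 * n_f0))
w_q, w_k, w_v, w_o = (
    rng.standard_normal((2 * n_f0, 2 * n_f0)) / np.sqrt(2 * n_f0)
    for _ in range(4)
)
y = multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads=2)
print(y.shape)  # (14, 24)
```

With identity projections, head 0 attends over the real parts only and head 1 over the imaginary parts only; learned projections generally mix the two.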

III-B 3GPP channel models

We consider two 5G channel models from the 3GPP specification for frequency bands up to 100 GHz, namely the UMi and UMa channel models [28]. They were derived based on extensive measurement and ray tracing results across a multitude of frequencies from 0.5 GHz to 100 GHz. We summarize in Table I the key parameters we vary to assess the AI performance in next-slot OFDM-CSI prediction tasks.

Table I: Summary of the 3GPP channel parameters.
Parameter | Values
OFDM channel type | UMi, UMa
User speed [m/s] | $\{0, 10, 20, 30\}$
SNR [dB] | $\{-30, -10, 0, 10, 30\}$
Carrier frequency [GHz] | $\{5, 28\}$
Sub-carrier spacing [kHz] | $30$

Both UMi and UMa channels are considered without a line-of-sight between the base station and UEs.
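The parameter sweep of Table I can be enumerated as a simple configuration grid, as sketched below. The dictionary keys and constant names are hypothetical and are not the configuration format of the authors' repository or of any library.

```python
from itertools import product

# Values taken from Table I; the names below are illustrative only.
CHANNEL_TYPES = ("UMi", "UMa")
USER_SPEEDS_MPS = (0, 10, 20, 30)
SNRS_DB = (-30, -10, 0, 10, 30)
CARRIER_FREQS_GHZ = (5, 28)
SUBCARRIER_SPACING_KHZ = 30

scenarios = [
    {
        "channel": channel,
        "speed_mps": speed,
        "snr_db": snr,
        "fc_ghz": fc,
        "scs_khz": SUBCARRIER_SPACING_KHZ,
        "line_of_sight": False,  # both channels are used without line-of-sight
    }
    for channel, speed, snr, fc in product(
        CHANNEL_TYPES, USER_SPEEDS_MPS, SNRS_DB, CARRIER_FREQS_GHZ
    )
]
print(len(scenarios))  # 80
```

The full grid has 2 x 4 x 5 x 2 = 80 combinations; the experiments reported in this paper use a subset of them.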

IV Numerical Results and Discussions

In this section, we extensively assess the performance of SSM and MSA layers over multiple wireless scenarios. Throughout this section, we denote by $\mathcal{S} = \{-30, -10, 0, 10, 30\}$ (in dB) and $\mathcal{V} = \{0, 10, 20, 30\}$ (in m/s) the sets of possible values for the SNR and the user speed, respectively. The code to train and test SSM and MSA layers is available on GitHub at https://github.com/makrout/Next-Slot-OFDM-CSI-Prediction.

Fig. 4: SISO MSE of the next-slot OFDM-CSI prediction task vs. test SNRs at $f_c = 5$ GHz for multiple MSA layers in (a) and (c) and SSM layers in (b) and (d) when each is trained with the UMi channel at different SNR values with a distribution shift in the UE speed (i.e., $v_{\textrm{train}} \neq v_{\textrm{test}}$). Panels: (a) MSA, $v_{\textrm{train}} = 30$, $v_{\textrm{test}} = 0$; (b) SSM, $v_{\textrm{train}} = 30$, $v_{\textrm{test}} = 0$; (c) MSA, $v_{\textrm{train}} = 0$, $v_{\textrm{test}} = 30$; (d) SSM, $v_{\textrm{train}} = 0$, $v_{\textrm{test}} = 30$.

Specifically, we consider the following two communication scenarios:

  • Uplink SISO transmission between a base station and a single user, both having one antenna. The task on the user side forecasts the next-slot OFDM-CSI.

  • Downlink MIMO transmission between a base station with $n_{\textrm{T}_{\textrm{x}}}$ antennas and $n_{\textrm{U}}$ users, each with a single antenna. The task at the base station side forecasts the next-slot OFDM-CSI for all users simultaneously.

In all simulations, we consider transmissions using $2$-QAM constellations. For a fixed configuration of OFDM channel type, carrier frequency, and sub-carrier spacing (cf. Table I), we train MSA and SSM layers to minimize the MSE between the next-slot OFDM-CSI and the predicted one for a given communication scenario determined by the SNR and user speed values in $\mathcal{S}$ and $\mathcal{V}$, respectively. Users have single antennas with vertical polarization and an omnidirectional antenna pattern. The base station, however, has a uniform linear array with dual polarization, with each antenna element having a 3GPP 38.901 antenna pattern.

Then, we test the trained layers on the same speed values for ID evaluation and on different ones for OOD evaluation. We set the number of epochs to 1000 and we report the average MSE performance after 100 iterations. We use the Sionna library [29] to generate the OFDM grids for each training and test communication scenario considered.
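The ID/OOD protocol and the error metric can be sketched as follows. The sketch assumes that the MSE is the mean of the squared error magnitude over all complex CSI coefficients; on the real/imaginary stacked representation, the same error would be averaged over twice as many entries. The function names are hypothetical.

```python
from itertools import product

import numpy as np

USER_SPEEDS_MPS = (0, 10, 20, 30)  # the set V of user speeds


def evaluation_pairs(speeds=USER_SPEEDS_MPS):
    """Return (v_train, v_test) pairs for ID and OOD evaluation."""
    id_pairs = [(v, v) for v in speeds]
    ood_pairs = [(a, b) for a, b in product(speeds, repeat=2) if a != b]
    return id_pairs, ood_pairs


def csi_mse(h_true, h_pred):
    """Mean of |h_true - h_pred|^2 over all complex CSI entries."""
    return float(np.mean(np.abs(h_true - h_pred) ** 2))


id_pairs, ood_pairs = evaluation_pairs()
print(len(id_pairs), len(ood_pairs))  # 4 12
```

The figures in this section report the two extreme speeds only, i.e., the pairs built from $v = 0$ and $v = 30$.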

As mentioned in Section I-C, we emphasize that the goal of our simulations is not to design state-of-the-art DNN architectures for CSI prediction, but rather to compare the predictive capabilities of the SSM and MSA layers only.

IV-A SISO experiments

IV-A1 In-distribution evaluation

For a fixed user speed $v_{\textrm{train}}$, we train separate MSA and SSM layers for each SNR level in $\mathcal{S}$ at $f_c = 5$ GHz. We also train additional MSA and SSM layers on a communication scenario over the UMi channel with all SNR levels combined by uniformly sampling the SNR over $\mathcal{S}$, which we refer to as the SNR value “all”. Fig. 3 depicts the OFDM-CSI prediction MSE of each considered network when it is evaluated over all possible SNR levels for static and highly mobile users, i.e., $v_{\textrm{train}} = v_{\textrm{test}} \in \{0, 30\}$. By comparing Figs. 3(a) and 3(b), it is seen that SSM and MSA layers exhibit comparable MSE performance for static users only (i.e., $v = 0$). For mobile users with $v = 30$, SSMs in Fig. 3(d) outperform MSAs in Fig. 3(c), with the MSE being an order of magnitude smaller for SNR values larger than $0$ dB. However, the overall profile of the MSE over the entire SNR range increases for both models when users are mobile.
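The “all” training configuration can be illustrated with the sketch below, which draws an SNR uniformly from $\mathcal{S}$ for each sample and corrupts the CSI accordingly. It assumes that the noise is complex white Gaussian and is added directly to the CSI coefficients, which may differ from how the authors' Sionna-based pipeline applies the SNR. The function names are hypothetical.

```python
import numpy as np

SNRS_DB = (-30, -10, 0, 10, 30)  # the set S of SNR levels


def sample_training_snr(rng, snrs_db=SNRS_DB):
    """Draw one SNR level uniformly at random (the 'all' configuration)."""
    return int(rng.choice(snrs_db))


def add_awgn(h, snr_db, rng):
    """Add complex Gaussian noise at the requested SNR (assumed noise model)."""
    signal_power = np.mean(np.abs(h) ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)
    noise = np.sqrt(noise_power / 2) * (
        rng.standard_normal(h.shape) + 1j * rng.standard_normal(h.shape)
    )
    return h + noise


rng = np.random.default_rng(0)
h = (rng.standard_normal(100_000) + 1j * rng.standard_normal(100_000)) / np.sqrt(2)
snr_db = sample_training_snr(rng)
h_noisy = add_awgn(h, snr_db, rng)
measured_db = 10 * np.log10(
    np.mean(np.abs(h) ** 2) / np.mean(np.abs(h_noisy - h) ** 2)
)
print(snr_db, round(float(measured_db), 1))
```

Training at a single SNR level corresponds to replacing `sample_training_snr` with a constant.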

Fig. 5: MIMO MSE of the next-slot OFDM-CSI prediction task vs. test SNRs at $f_c = 5$ GHz for multiple MSA layers in (a) and (c) and SSM layers in (b) and (d) when each is trained with the UMi channel at different SNR values without a distribution shift in the UE speed (i.e., $v_{\textrm{train}} = v_{\textrm{test}}$). Panels: (a) MSA, $v_{\textrm{train}} = 0$, $v_{\textrm{test}} = 0$; (b) SSM, $v_{\textrm{train}} = 0$, $v_{\textrm{test}} = 0$; (c) MSA, $v_{\textrm{train}} = 30$, $v_{\textrm{test}} = 30$; (d) SSM, $v_{\textrm{train}} = 30$, $v_{\textrm{test}} = 30$.

Fig. 6: MIMO MSE of the next-slot OFDM-CSI prediction task vs. test SNRs at $f_c = 5$ GHz for multiple MSA layers in (a) and (c) and SSM layers in (b) and (d) when each is trained with the UMi channel at different SNR values with a distribution shift in the UE speed (i.e., $v_{\textrm{train}} \neq v_{\textrm{test}}$). Panels: (a) MSA, $v_{\textrm{train}} = 30$, $v_{\textrm{test}} = 0$; (b) SSM, $v_{\textrm{train}} = 30$, $v_{\textrm{test}} = 0$; (c) MSA, $v_{\textrm{train}} = 0$, $v_{\textrm{test}} = 30$; (d) SSM, $v_{\textrm{train}} = 0$, $v_{\textrm{test}} = 30$.

On the other hand, Figs. 3(a) and 3(b) show that the MSE decreases as a function of the SNR when users are static. This suggests that the task of learning the CSI prediction is more impacted by user mobility than by the SNR. It is also noteworthy that training the MSA layer with samples using all SNR levels (i.e., the gray curve) yields the lowest MSE across all test SNR levels. Interestingly, this SNR-wise diversification of samples does not offer the lowest MSE for SSMs. This can be attributed to the fact that SSMs compress the signal sequence $\bm{x}$ in the state equation given in (7a) by ensuring that $\bm{h}_t$ is a fixed-size, low-dimensional hidden state compared to $\bm{x}_t$. Such compression toward learning a state-space model or, equivalently, a transfer function (as a matter of fact, any state-space model can be seen as a transfer function in the Laplace domain [30]) is not equally impacted by diversified communication scenarios in the dataset.
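The compression argument can be illustrated with a plain linear, discrete-time state-space recurrence, sketched below under stated assumptions: the matrices are random rather than learned, the state dimension is chosen smaller than the input dimension, and the discrete form $\bm{h}_t = \bm{A}\bm{h}_{t-1} + \bm{B}\bm{x}_t$, $\bm{y}_t = \bm{C}\bm{h}_t$ is used for simplicity and need not match the exact form of (7a). The sketch also checks that the recurrence equals a causal convolution with the impulse response $\bm{C}\bm{A}^k\bm{B}$, which is the time-domain counterpart of the transfer function.

```python
import numpy as np


def ssm_scan(x, a, b, c):
    """Run h_t = A h_{t-1} + B x_t and y_t = C h_t over a (L, d_in) sequence."""
    h = np.zeros(a.shape[0])
    outputs = []
    for x_t in x:
        h = a @ h + b @ x_t  # the state keeps a fixed size N at every step
        outputs.append(c @ h)
    return np.stack(outputs), h


def ssm_kernel(a, b, c, length):
    """Impulse response K_k = C A^k B for k = 0, ..., length - 1."""
    kernel, a_power = [], np.eye(a.shape[0])
    for _ in range(length):
        kernel.append(c @ a_power @ b)
        a_power = a @ a_power
    return np.stack(kernel)  # (length, d_out, d_in)


rng = np.random.default_rng(0)
seq_len, d_in, state_dim = 14, 24, 8  # illustrative sizes, state_dim < d_in
a = 0.5 * rng.standard_normal((state_dim, state_dim)) / np.sqrt(state_dim)
b = rng.standard_normal((state_dim, d_in))
c = rng.standard_normal((d_in, state_dim))
x = rng.standard_normal((seq_len, d_in))

y, h_last = ssm_scan(x, a, b, c)

# The same output written as a causal convolution with the impulse response.
kernel = ssm_kernel(a, b, c, seq_len)
y_conv = np.stack(
    [sum(kernel[k] @ x[t - k] for k in range(t + 1)) for t in range(seq_len)]
)
print(y.shape, h_last.shape, np.allclose(y, y_conv))  # (14, 24) (8,) True
```

Whatever the sequence length, everything the layer remembers about past symbols must pass through the `state_dim`-dimensional vector `h`.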

We then repeat the same training and evaluation at the mmWave carrier frequency $f_c = 28$ GHz. Due to space limitations, the simulation results are presented in Appendix A, where the ID and OOD MSE evaluations are reported in Figs. 7 and 8, respectively. There, MSE trends similar to those at $f_c = 5$ GHz are observed. Overall, the MSE is higher due to the significant path loss in mmWave bands. It is also interesting to note that neither SSMs nor MSAs trained with all SNR values yield the lowest MSE. Similar MSE profiles, reported in Appendix B, are also obtained after training and evaluating with UMa channels. The only notable difference is that MSE values are higher for UMa channels than for UMi channels because macro-cell models cover a wider area with less dense networks.
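As a rough order of magnitude for the additional path loss at 28 GHz, the sketch below evaluates the free-space (Friis) path loss at both carrier frequencies. This is a deliberately simplified stand-in: the simulations use the non-line-of-sight UMi and UMa path loss models of [28], whose distance and frequency dependence differ from free space. The 100 m distance is an arbitrary illustrative value.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # [m/s]


def free_space_path_loss_db(distance_m, freq_hz):
    """Friis free-space path loss in dB (not the 3GPP NLOS model)."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


loss_5 = free_space_path_loss_db(100.0, 5e9)
loss_28 = free_space_path_loss_db(100.0, 28e9)
print(round(loss_28 - loss_5, 2))  # 14.96
```

In free space the gap equals $20\log_{10}(28/5) \approx 15$ dB and does not depend on the distance.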

IV-A2 Out-of-distribution evaluation

Unlike the previous experiment, we now train and test MSAs and SSMs on different user speeds. In Figs. 4(a) and 4(b), we train on mobile user scenarios (i.e., $v_{\textrm{train}} = 30$) and test on static user scenarios (i.e., $v_{\textrm{test}} = 0$). We perform the inverse training and test strategy in Figs. 4(c) and 4(d) by training on static users and testing on mobile ones. When comparing the range of the MSE in Figs. 4(a) and 4(b) against that in Figs. 4(c) and 4(d), it is seen that training on challenging CSI prediction tasks (i.e., when users are mobile) and testing on easier ones (i.e., when users are static) provides a better MSE on OOD scenarios. We also note that networks trained on lower SNR values generalize better than those trained on higher SNR levels. Indeed, it is well known that adding noise to the training samples of a DNN is equivalent to Tikhonov regularization and can lead to significant improvements in generalization performance [24]. Moreover, when testing with static users, SSMs and MSAs exhibit a similar range of MSEs, as shown in Figs. 4(a) and 4(b). However, for test scenarios with mobile users, SSMs exhibit a much lower MSE profile than MSAs, as shown in Figs. 4(c) and 4(d). Similarly to the ID evaluation in Section IV-A1, the MSA model trained on all SNR levels is among the best performers for MSA layers, unlike the SSM one.
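The link between training noise and Tikhonov regularization can be checked numerically in the simplest setting, linear least squares, where it holds exactly in expectation: for inputs $\bm{X} \in \mathbb{R}^{n \times d}$ perturbed by i.i.d. noise of variance $\sigma^2$, the expected normal equations are those of ridge regression with penalty $n\sigma^2$. The sketch below uses synthetic data and is not a DNN experiment; for nonlinear networks the equivalence in [24] is an approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma, copies = 200, 3, 0.5, 2000  # illustrative sizes

# Synthetic linear regression problem.
x = rng.standard_normal((n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = x @ w_true + 0.1 * rng.standard_normal(n)

# Ordinary least squares on many noisy copies of the inputs.
x_noisy = np.tile(x, (copies, 1)) + sigma * rng.standard_normal((copies * n, d))
y_repeated = np.tile(y, copies)
w_noisy, *_ = np.linalg.lstsq(x_noisy, y_repeated, rcond=None)

# Ridge (Tikhonov) solution on the clean inputs with penalty n * sigma^2.
w_ridge = np.linalg.solve(x.T @ x + n * sigma**2 * np.eye(d), x.T @ y)

print(np.round(w_noisy, 2), np.round(w_ridge, 2))
```

The two weight vectors agree up to sampling error, and both are shrunk toward zero relative to `w_true`, which is the regularizing effect that noisier (lower-SNR) training data can provide.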

IV-B MIMO experiments

For downlink MIMO CSI prediction, we consider a base station endowed with 20 transmit antennas communicating with 5 users, each with a single antenna.

IV-B1 In-distribution evaluation

Fig. 5 shows the CSI prediction MSE of both SSM and MSA networks when evaluated over all possible SNR levels for static and mobile users, i.e., $v_{\textrm{train}} = v_{\textrm{test}} \in \{0, 30\}$. Unlike the SISO case, where the MSE performance of SSM and MSA layers was comparable, it is seen here that MSAs provide a lower MSE for both static and mobile user scenarios. It is interesting to observe again that networks trained with all SNR values do not provide the best performance. Training both models on MIMO scenarios with mobile or static users impacts the ID evaluation in the same way. This contrasts with the SISO case, where training on mobile-user scenarios and testing on static-user scenarios yields better results than the opposite training and testing strategy. This does not mean that the user speed is not a critical parameter, but rather suggests that the trained DNNs did not capture the correlation of fast time-varying channels due to user mobility in MIMO scenarios. Indeed, the MSE is now two orders of magnitude higher than in the SISO case, suggesting that next-slot OFDM-CSI prediction is a challenging task for MIMO communication.
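To give a sense of how fast the channel varies at the considered speeds, the sketch below computes the maximum Doppler shift $f_D = v f_c / c$ and its product with the slot duration. It assumes a slot duration of 0.5 ms, which corresponds to the 30 kHz sub-carrier spacing of Table I in 5G NR; the function name is hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # [m/s]
SLOT_DURATION_S = 0.5e-3  # 5G NR slot at 30 kHz sub-carrier spacing


def max_doppler_hz(speed_mps, fc_hz):
    """Maximum Doppler shift f_D = v * f_c / c."""
    return speed_mps * fc_hz / SPEED_OF_LIGHT


for fc_hz in (5e9, 28e9):
    for speed in (0, 10, 20, 30):
        f_d = max_doppler_hz(speed, fc_hz)
        # Normalized Doppler: Doppler cycles elapsed per slot.
        print(f"fc={fc_hz / 1e9:.0f} GHz, v={speed} m/s: "
              f"f_D={f_d:.1f} Hz, f_D*T_slot={f_d * SLOT_DURATION_S:.3f}")
```

At $v = 30$ m/s and $f_c = 5$ GHz the maximum Doppler shift is about 500 Hz, i.e., roughly a quarter of a Doppler cycle per slot, whereas static users experience no Doppler shift from their own motion.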

IV-B2 Out-of-distribution evaluation

When we compare the MSE of mobile-user training with static-user evaluation against static-user training with mobile-user evaluation (i.e., Fig. 6(a) vs. Fig. 6(c) for MSAs, and Fig. 6(b) vs. Fig. 6(d) for SSMs), a negligible variation in the MSE is observed. Given the high MSE values, this does not suggest that these models exhibit a strong generalization performance over user speeds, but rather confirms the challenge in predicting the next-slot OFDM-CSI for MIMO scenarios, as already reported for the ID evaluation in Section IV-B1.

V Conclusion

The existing applications of generative AI for wireless focus on language processing applications (e.g., prompt generation for compression, semantic communication). In this paper, we investigated the predictive capabilities of two key generative AI layers (i.e., multi-head attention and state space model) for OFDM slot prediction tasks. For these signal processing use cases, we compared the in-distribution and out-of-distribution performance of these two layers and empirically showed that multi-head attention layers outperform state space models for MIMO communication. However, we emphasize that the state space model layer has many advantages over the multi-head attention layer in terms of memory and computational complexity, which are also important factors for training and inference on long CSI inputs. Several avenues for extending this work are noteworthy. It is possible to design new hybrid architectures that endow state space models with an attention-like mechanism. One can also extend our benchmark to include more scenarios and models (e.g., antenna models and frequency bands). Finally, one can incorporate more wireless knowledge in the design of generative AI layers (e.g., prediction in the beam space).

References

  • [1] J. Yang, H. Jin, R. Tang, X. Han, Q. Feng, H. Jiang, S. Zhong, B. Yin, and X. Hu, “Harnessing the power of llms in practice: A survey on chatgpt and beyond,” ACM Transactions on Knowledge Discovery from Data, vol. 18, no. 6, pp. 1–32, 2024.
  • [2] L. Bariah, Q. Zhao, H. Zou, Y. Tian, F. Bader, and M. Debbah, “Large generative ai models for telecom: The next big thing?” IEEE Communications Magazine, 2024.
  • [3] X. Lin, “An overview of the 3gpp study on artificial intelligence for 5g new radio,” arXiv preprint arXiv:2308.05315, 2023.
  • [4] H. Kim, S. Kim, H. Lee, C. Jang, Y. Choi, and J. Choi, “Massive mimo channel prediction: Kalman filtering vs. machine learning,” IEEE Transactions on Communications, vol. 69, no. 1, pp. 518–528, 2020.
  • [5] J. B. Andersen, J. Jensen, S. H. Jensen, and F. Frederiksen, “Prediction of future fading based on past measurements,” in Gateway to 21st Century Communications Village. VTC 1999-Fall. IEEE VTS 50th Vehicular Technology Conference (Cat. No. 99CH36324), vol. 1.   IEEE, 1999, pp. 151–155.
  • [6] W. Peng, M. Zou, and T. Jiang, “Channel prediction in time-varying massive mimo environments,” IEEE Access, vol. 5, pp. 23 938–23 946, 2017.
  • [7] W. Jiang and H. D. Schotten, “Neural network-based fading channel prediction: A comprehensive overview,” IEEE Access, vol. 7, pp. 118 112–118 124, 2019.
  • [8] Y. Yang, F. Gao, Z. Zhong, B. Ai, and A. Alkhateeb, “Deep transfer learning-based downlink channel prediction for fdd massive mimo systems,” IEEE Transactions on Communications, vol. 68, no. 12, pp. 7485–7497, 2020.
  • [9] C. Luo, J. Ji, Q. Wang, X. Chen, and P. Li, “Channel state information prediction for 5g wireless communications: A deep learning approach,” IEEE transactions on network science and engineering, vol. 7, no. 1, pp. 227–236, 2018.
  • [10] J. Wang, Y. Ding, S. Bian, Y. Peng, M. Liu, and G. Gui, “Ul-csi data driven deep learning for predicting dl-csi in cellular fdd systems,” IEEE Access, vol. 7, pp. 96 105–96 112, 2019.
  • [11] Y. Zhang, J. Wang, J. Sun, B. Adebisi, H. Gacanin, G. Gui, and F. Adachi, “Cv-3dcnn: Complex-valued deep learning for csi prediction in fdd massive mimo systems,” IEEE Wireless Communications Letters, vol. 10, no. 2, pp. 266–270, 2020.
  • [12] S. Mourya, P. Reddy, S. Amuru, and K. K. Kuchi, “Spectral temporal graph neural network for massive mimo csi prediction,” IEEE Wireless Communications Letters, 2024.
  • [13] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
  • [14] H. Jiang, M. Cui, D. W. K. Ng, and L. Dai, “Accurate channel prediction based on transformer: Making mobility negligible,” IEEE Journal on Selected Areas in Communications, vol. 40, no. 9, pp. 2717–2732, 2022.
  • [15] A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” arXiv preprint arXiv:2312.00752, 2023.
  • [16] S. Jelassi, D. Brandfonbrener, S. M. Kakade, and E. Malach, “Repeat after me: Transformers are better than state space models at copying,” arXiv preprint arXiv:2402.01032, 2024.
  • [17] E. Nguyen, K. Goel, A. Gu, G. Downs, P. Shah, T. Dao, S. Baccus, and C. Ré, “S4nd: Modeling images and videos as multidimensional signals with state spaces,” Advances in neural information processing systems, vol. 35, pp. 2846–2861, 2022.
  • [18] M. M. Islam and G. Bertasius, “Long movie clip classification with state-space video models,” in European Conference on Computer Vision.   Springer, 2022, pp. 87–104.
  • [19] C. Wang, O. Tsepa, J. Ma, and B. Wang, “Graph-mamba: Towards long-range graph sequence modeling with selective state spaces,” arXiv preprint arXiv:2402.00789, 2024.
  • [20] M. Akrout, A. Mezghani, E. Hossain, F. Bellili, and R. W. Heath, “From multilayer perceptron to gpt: A reflection on deep learning research for wireless physical layer,” arXiv preprint arXiv:2307.07359, 2023.
  • [21] M. Akrout, A. Feriani, F. Bellili, A. Mezghani, and E. Hossain, “Domain generalization in machine learning models for wireless communications: Concepts, state-of-the-art, and open issues,” IEEE Communications Surveys & Tutorials, 2023.
  • [22] Z. Zhong, L. Zheng, G. Kang, S. Li, and Y. Yang, “Random erasing data augmentation,” in Proceedings of the AAAI conference on artificial intelligence, vol. 34, no. 07, 2020, pp. 13 001–13 008.
  • [23] R. G. Lopes, D. Yin, B. Poole, J. Gilmer, and E. D. Cubuk, “Improving robustness without sacrificing accuracy with patch gaussian augmentation,” arXiv preprint arXiv:1906.02611, 2019.
  • [24] C. M. Bishop, “Training with noise is equivalent to tikhonov regularization,” Neural computation, vol. 7, no. 1, pp. 108–116, 1995.
  • [25] A. Tustin, “A method of analysing the behaviour of linear systems in terms of time series,” Journal of the Institution of Electrical Engineers-Part IIA: Automatic Regulators and Servo Mechanisms, vol. 94, no. 1, pp. 130–142, 1947.
  • [26] A. Gu, K. Goel, and C. Ré, “Efficiently modeling long sequences with structured state spaces,” arXiv preprint arXiv:2111.00396, 2021.
  • [27] A. Gu, T. Dao, S. Ermon, A. Rudra, and C. Ré, “Hippo: Recurrent memory with optimal polynomial projections,” Advances in neural information processing systems, vol. 33, pp. 1474–1487, 2020.
  • [28] 3GPP TR 38.901, “Study on channel model for frequencies from 0.5 to 100 GHz,” 2017.
  • [29] J. Hoydis, S. Cammerer, F. A. Aoudia, A. Vem, N. Binder, G. Marcus, and A. Keller, “Sionna: An open-source library for next-generation physical layer research,” arXiv preprint arXiv:2203.11854, 2022.
  • [30] J. R. Leigh, Control theory.   IET, 2004, vol. 64.

Appendix A SISO Simulations at $f = 28$ GHz

In this appendix, we present the evaluation of both SSM and MSA layers for the SISO communication scenario described in Section IV-A when the carrier frequency is fixed at $f_c = 28$ GHz. Figs. 7 and 8 depict the ID and OOD evaluations, respectively.

Fig. 7: SISO MSE of the next-slot OFDM-CSI prediction task vs. test SNRs at $f_c = 28$ GHz for multiple MSA layers in (a) and (c) and SSM layers in (b) and (d) when each is trained with the UMi channel at different SNR values without a distribution shift in the UE speed (i.e., $v_{\textrm{train}} = v_{\textrm{test}}$). Panels: (a) MSA, $v_{\textrm{train}} = 0$, $v_{\textrm{test}} = 0$; (b) SSM, $v_{\textrm{train}} = 0$, $v_{\textrm{test}} = 0$; (c) MSA, $v_{\textrm{train}} = 30$, $v_{\textrm{test}} = 30$; (d) SSM, $v_{\textrm{train}} = 30$, $v_{\textrm{test}} = 30$.

Fig. 8: SISO MSE of the next-slot OFDM-CSI prediction task vs. test SNRs at $f_c = 28$ GHz for multiple MSA layers in (a) and (c) and SSM layers in (b) and (d) when each is trained with the UMi channel at different SNR values with a distribution shift in the UE speed (i.e., $v_{\textrm{train}} \neq v_{\textrm{test}}$). Panels: (a) MSA, $v_{\textrm{train}} = 30$, $v_{\textrm{test}} = 0$; (b) SSM, $v_{\textrm{train}} = 30$, $v_{\textrm{test}} = 0$; (c) MSA, $v_{\textrm{train}} = 0$, $v_{\textrm{test}} = 30$; (d) SSM, $v_{\textrm{train}} = 0$, $v_{\textrm{test}} = 30$.

Appendix B Simulations with the UMa channel

In this appendix, we present the results when SSM and MSA layers are trained on UMa channels.

Fig. 9: SISO MSE of the next-slot OFDM-CSI prediction task vs. test SNRs at $f_c = 5$ GHz for multiple MSA layers in (a) and (c) and SSM layers in (b) and (d) when each is trained with the UMa channel at different SNR values without a distribution shift in the UE speed (i.e., $v_{\textrm{train}} = v_{\textrm{test}}$). Panels: (a) MSA, $v_{\textrm{train}} = 0$, $v_{\textrm{test}} = 0$; (b) SSM, $v_{\textrm{train}} = 0$, $v_{\textrm{test}} = 0$; (c) MSA, $v_{\textrm{train}} = 30$, $v_{\textrm{test}} = 30$; (d) SSM, $v_{\textrm{train}} = 30$, $v_{\textrm{test}} = 30$.

Fig. 10: SISO MSE of the next-slot OFDM-CSI prediction task vs. test SNRs at $f_c = 5$ GHz for multiple MSA layers in (a) and (c) and SSM layers in (b) and (d) when each is trained with the UMa channel at different SNR values with a distribution shift in the UE speed (i.e., $v_{\textrm{train}} \neq v_{\textrm{test}}$). Panels: (a) MSA, $v_{\textrm{train}} = 30$, $v_{\textrm{test}} = 0$; (b) SSM, $v_{\textrm{train}} = 30$, $v_{\textrm{test}} = 0$; (c) MSA, $v_{\textrm{train}} = 0$, $v_{\textrm{test}} = 30$; (d) SSM, $v_{\textrm{train}} = 0$, $v_{\textrm{test}} = 30$.