3GPP
3rd Generation Partnership Project
5G
fifth generation
AE-ELM
auto-encoder extreme learning machine
AI
artificial intelligence
AoA
angle-of-arrival
AoD
angle-of-departure
AP
access point
AWGN
additive white Gaussian noise
BLE
Bluetooth Low Energy
BW
bandwidth
BW-level
bandwidth-level
BS
base station
CM
common model
CNN
convolutional neural network
CR
compression ratio
CSI
channel state information
CU
Centralized Unit
DL
downlink
DL-AoD
downlink angle-of-departure
DoA
direction-of-arrival
DNN
dense neural network
DU
Decentralized Unit
ECDF
empirical cumulative distribution function
EEG
electroencephalogram
EKF
extended Kalman filter
ELM
extreme learning machine
EM
Encoder Model
FFT
Fast Fourier Transform
FR
frequency response
FM
full model
FP
fingerprinting
FR-Complex
complex frequency response
FR-Power
power-domain frequency response
FR-Phase
relative phase frequency response
FR-Power/Phase
power and relative phase frequency response
FR1
Frequency Range 1
FR2
Frequency Range 2
HDim
HIDDEN_DIM
GDA
Generalized Discriminant Analysis
GELU
Gaussian error linear unit
gNB
gNodeB
GNSS
Global Navigation Satellite System
ICA
Independent Component Analysis
IoT
Internet-of-Things
IPS
indoor positioning system
KF
Kalman filter
$k$-NN
$k$-Nearest Neighbors
LBS
location-based services
LDA
Linear Discriminant Analysis
LMF
Location Management Function
LoS
line-of-sight
LR
learning rate
LSTM
long short-term memory
MAC
Media Access Control
MDT
Minimization of Drive Testing
MEE
mean Euclidean error
MIMO
multiple-input multiple-output
ML
machine learning
mmWave
millimeter-wave
MRTT
multi-round trip time
NLoS
non-line-of-sight
NN
neural network
NR
New Radio
OFDM
orthogonal frequency-division multiplexing
path-AoA
path-wise angle-of-arrival
path-RP
path-wise received power
PRS
positioning reference signal
path-ToF
path-wise time-of-flight
PHY
physical-layer
PCA
principal component analysis
PF
particle filter
PL
path-loss
QoE
quality of experience
QoS
quality of service
RB
resource block
RE
resource element
ReLU
rectified linear unit
RB-level
resource block-level
RF
radio frequency
RFID
radio frequency identification device
RMS
root mean square
RMSEE
root mean squared Euclidean error
RNN
recurrent neural network
RP
reference point
RRC
radio resource control
RS
recommender system
RSRP
reference signal received power
RSRQ
reference signal received quality
RSS
received signal strength
RSSI
received signal strength indicator
RTT
round trip time
RX
receiver
SLAM
simultaneous localization and mapping
SP
scattering point
SS
synchronization signal
SSB
synchronization signal block
SRS
sounding reference signal
SSS
secondary synchronization signal
std
standard deviation
SVM
support vector machine
SVD
singular value decomposition
TDoA
time-difference-of-arrival
TL
transfer learning
TLCNN
Transfer Learning Convolutional Neural Network
ToA
time-of-arrival
ToF
time-of-flight
t-SNE
T-distributed Stochastic Neighbor Embedding
TX
transmitter
UE
user equipment
UL
uplink
UL-AoA
uplink angle-of-arrival
Wi-Fi
IEEE 802.11 network
WLAN
wireless LAN
WSN
wireless sensor network

Robust NLoS Localization in 5G mmWave Networks: Data-based Methods and Performance

Roman Klus (ORCID 0000-0002-0641-5931), Jukka Talvitie (ORCID 0000-0001-7685-7666), Julia Vinogradova (ORCID 0000-0001-8911-2065), Gabor Fodor (ORCID 0000-0002-2289-3159), Johan Torsner, and Mikko Valkama (ORCID 0000-0003-0361-0800)
Limited subset of early-stage results presented at IEEE SPAWC 2022 [14]. R. Klus, J. Talvitie, and M. Valkama are with Tampere University, Finland. J. Vinogradova and J. Torsner are with Ericsson Research, Helsinki, Finland. G. Fodor is with Ericsson Research and with KTH, Stockholm, Sweden. Data and codes openly available at https://doi.org/10.5281/zenodo.12204893
Abstract

Ensuring smooth mobility management while employing directional beamformed transmissions in 5G millimeter-wave networks calls for robust and accurate user equipment (UE) localization and tracking. In this article, we develop neural network-based positioning models with time- and frequency-domain channel state information (CSI) data in harsh non-line-of-sight (NLoS) conditions. We propose a novel frequency-domain feature extraction, which combines relative phase differences and received powers across resource blocks, and offers robust performance and reliability. Additionally, we exploit the multipath components and propose an aggregate time-domain feature combining time-of-flight, angle-of-arrival and received path-wise powers. Importantly, the temporal correlations are also harnessed in the form of sequence processing neural networks, which prove to be of particular benefit for vehicular UEs. Realistic numerical evaluations in a large-scale line-of-sight (LoS)-obstructed urban environment with moving vehicles are provided, building on full ray-tracing based propagation modeling. The results show the robustness of the proposed CSI features in terms of positioning accuracy, and that the proposed models reliably localize UEs even in the absence of a LoS path, clearly outperforming the state-of-the-art with similar or even reduced processing complexity. The proposed sequence-based neural network model is capable of tracking the UE position, speed and heading simultaneously despite the strong uncertainties in the CSI measurements. Finally, it is shown that differences between the training and online inference environments can be efficiently addressed and alleviated through transfer learning.

Index Terms:
5G New Radio, channel state information, deep learning, non-line-of-sight, positioning, tracking, vehicular systems

I Introduction

Expanding to the millimeter-wave (mmWave) frequencies makes it possible to harness large channel bandwidths in the fifth generation (5G) New Radio (NR) mobile communication networks, which improves the network capacity, peak data rates, and latency characteristics compared to legacy systems [1, 2]. In such mmWave networks, beamforming active antenna arrays are a critical technology, allowing directional transmission and reception capabilities, thus improving the link budget while mitigating co-channel interference and providing the basis for angle-based cellular positioning.

In general, accurate real-time knowledge of the locations of the network UE is critical to ensure smooth and seamless mobility management, efficient handover management, and improved reliability of the radio link, while also allowing for location-based services (LBS) [3, 4, 5, 6]. Baseline UE localization builds commonly on Global Navigation Satellite System (GNSS) based approaches. However, terrestrial positioning utilizing 5G and other signals of opportunity is of increasing importance, and is also the main technical scope of this article. This is well motivated, as the availability of GNSS is known to be compromised not only indoors but also in outdoor urban areas [4, 7], while the large channel bandwidths and directional antenna systems deployed in 5G allow for accurate time- and angle-based measurements.

There is generally a wide selection of positioning methods available in the literature [4], covering both model-based Bayesian filtering approaches [8, 9, 10, 11, 12, 13] as well as data-driven machine learning (ML) based methods [14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]. The majority of the works focus on LoS scenarios, where the channel includes a direct propagation path from the transmitter (TX) to the receiver (RX), and thus the UE location can be estimated geometrically by utilizing pseudo-ranges and/or angular information based on radio measurements. Good examples of such LoS-oriented 5G positioning works include [11, 12], building commonly on Bayesian filtering methods such as different variants of extended Kalman filter (EKF) and particle filter (PF). In case the LoS path is unavailable, geometry-based Bayesian filtering models still exist, e.g., [9, 10, 13]. Such methods are, however, typically limited to single-bounce scenarios with a single path per scattering point, and can thus easily become unreliable in realistic scattering environments while being also computationally heavy and complex [32, 33]. Therefore, UE positioning and tracking in NLoS or LoS-ambiguous scenarios call for novel solutions and models, capable of ensuring fast and reliable operation in realistic scattering environments with feasible real-time computational complexity. This is the main technical focus of this article, with a specific emphasis on vehicular systems in challenging urban environments, where NLoS scenarios commonly occur with realistic network deployments [34].

In this article, building on our initial work in [14], we propose and describe efficient ML based models for NLoS or LoS-obstructed positioning that offer robust performance, low operational complexity, good generalization properties, and wide architectural options. We harness the temporal correlation of the channel features in vehicular systems and focus on sequence processing neural network (NN) methods as the fundamental ML engine. We propose two alternative feature sets capable of describing the radio channel and further NN-based positioning, namely frequency-domain and time-domain CSI features. Additionally, we include the vehicle speed and heading as additional model outputs to enable efficient tracking using a single model.

ML-based positioning has been addressed in the recent literature, e.g. in [15, 14, 16, 17, 18, 20, 22, 23, 24, 25, 26, 27, 28, 21, 29, 30, 31]. To this end, received signal strength (RSS) measurements were adopted in [19, 20, 21], in indoor IEEE 802.11 network (Wi-Fi) fingerprinting context, while the corresponding 5G deployment was considered in [35]. The work in [24] proposed a CSI-based fingerprinting model that utilizes the amplitude response of the channel. Due to the classifier-like NN, the proposed system is, however, limited to small deployments. The authors of [15] developed an NN-based feature extractor with Wi-Fi CSI measurements considering the amplitude response only, followed by a $k$-Nearest Neighbors ($k$-NN) positioning algorithm. Similar approaches building on channel amplitude response measurements were considered also in [25] and [26]. In [27], a Wi-Fi positioning approach utilizing the channel phase response as a feature was proposed. The method is, however, not suitable for large-scale scenarios due to the classifier-based NN, while the considered phase slope calculation is also subject to ambiguities. The work in [22], in turn, presented a 5G positioning system, however, being limited to the LoS scenario while utilizing only the beam-specific reference signal received power (RSRP) values as the features. In [23], a positioning system that generates probability maps using an NN model was described with a feature representation that transforms the frequency-domain uplink (UL) CSI data to a delay-domain. The probability maps enable efficient sensor fusion, yet their scale directly affects the complexity of the underlying NN model. Furthermore, [29] described a paradigm to produce a high-accuracy 5G localization dataset, building on channel frequency response measurements.

Different hybrid solutions also exist in the literature, either in terms of aiding Bayesian filtering solutions through ML methods or fusing measurements from various different sensors [16]. To this end, [17] proposed a series of recurrent neural network (RNN) models to replace the EKF and thus enable implicit and data-driven learning. Furthermore, a sensor fusion approach using reinforcement learning-assisted particle filtering is described in [10]. Neural network based 5G fingerprinting and GNSS data fusion were, in turn, considered in [18]. Recently, in [36], an NN model classifying different propagation paths from time-domain CSI in the form of path parameters was paired with geometry-based positioning algorithm considering LoS and single-bounce NLoS paths.

ML-based localization with propagation time measurements has also gained interest in recent works [30, 33, 14]. In [30], a bidirectional RNN is employed to track the UE based on the time-of-arrival (ToA) measurements from multiple nodes, while the authors of [33] utilize a convolutional neural network (CNN) to estimate the accurate ToA from the raw channel impulse responses in LoS/NLoS scenarios. However, only LoS measurements are used for the actual localization. Finally, angular information at either the gNodeB (gNB) or the UE can also be used for ML-based localization, as shown in [31].

TABLE I: Summary of related works

| Reference | Technology | Features | ML model | Tracking | Uncertainty: Feat. | Uncertainty: Label | Uncertainty: Envir. | Vehicular Systems | NLoS |
|---|---|---|---|---|---|---|---|---|---|
| [14] | 5G mmWave | AoA and ToF | DNN, LSTM | ✓ | ✓ | ✓ | × | ✓ | ✓ |
| [15] | Wi-Fi | channel amplitude response | CNN | × | × | × | × | × | ✓* |
| [18] | 5G mmWave | RSRP | DNN | × | ✓ | × | × | × | × |
| [22] | 5G mmWave | RSRP | DNN | × | × | × | × | × | ×** |
| [23] | Wi-Fi | channel frequency response | DNN | × | ✓ | × | × | × | ✓ |
| [24] | Wi-Fi | channel amplitude response | DNN | ✓ | × | × | × | × | × |
| [25] | Wi-Fi | channel amplitude response | CNN | × | × | × | × | × | ×* |
| [26] | Wi-Fi | channel amplitude response | ML | ✓ | × | × | × | × | ×* |
| [27] | Wi-Fi | channel phase response | DNN | ✓ | × | × | × | × | × |
| [29] | 5G mmWave | channel frequency response | CNN | × | ✓ | × | × | × | × |
| [30] | Unspecified | ToA | RNN | ✓ | × | × | × | × | × |
| [31] | BLE | AoA | CNN | × | ✓ | × | × | × | ✓*** |
| [33] | LTE | channel impulse response | CNN | × | ✓ | × | × | × | ✓* |
| [36] | mmWave | path-wise CSI | DNN | × | ✓ | × | × | ✓ | ✓** |
| This Work | 5G mmWave | time- and frequency-domain CSI | DNN, LSTM | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

* Some NLoS samples available in evaluation; ** Removing the detected NLoS samples before positioning; *** RX signal subject to Rayleigh fading.

The main related works and their relevant aspects are summarized in Table I. Importantly, the NLoS positioning under rich and realistic scattering is not explicitly addressed, in particular in the context of beamforming 5G mmWave networks. Thus, complementary to the existing literature, this article focuses on ML-based reliable network localization using time- and frequency-domain CSI data with emphasis on challenging NLoS scenarios, while noting also various relevant uncertainty aspects. The application focus is on vehicular systems in urban environments with 5G mmWave deployments and CSI features that can be obtained through 3GPP standardized UL and/or downlink (DL) measurements and corresponding signaling. The contributions and novelty compared to the existing ML-based positioning literature can be stated and summarized as follows:

  • We introduce, derive, and evaluate efficient frequency-domain CSI features in the form of sparse power and phase measurements, and their combinations, for ML-based positioning models and compare their robustness with the ones introduced previously in the literature;

  • We also introduce and evaluate alternative time-domain path-wise CSI features and demonstrate their effectiveness in wireless positioning scenarios while finding the relevant and best-performing feature combinations;

  • We develop a novel hybrid NN processing model in terms of instantaneous and sequence data processing for simultaneous UE location, velocity, and heading tracking using the above channel-based features;

  • We evaluate the performance of different features and processing models in a large-scale, realistic, urban scenario with full ray-tracing-based channel measurements under harsh NLoS conditions in the context of 28 GHz mmWave 5G network, while also considering realistic training and measurement uncertainties;

  • We show that the proposed time-domain and frequency-domain features outperform the benchmark solutions, especially when combined with the sequence processing ML model, in terms of the positioning accuracy and complexity – in particular, in the challenging multi-bounce scattering environments, where the proposed positioning approach achieves comparable accuracy in both LoS and NLoS conditions, regardless of the number of bounces;

  • We also address the important practical issue of environment or gNB deployment differences between the training phase and the actual online inference phase, through transfer learning, and show that specializing to the current deployment is feasible;

  • Finally, we address the complexity of the developed methods, in comparison to the prior-art, while also openly sharing the data and codes for research reproducibility and transparency.

For clarity, we note that selected time-domain features were initially considered in [14]; however, the adopted NN models lacked the advanced sequence processing capabilities. Additionally, no frequency-domain features were considered, while the transfer learning aspects were also fully neglected.

The rest of this article is organized as follows. Section II introduces the network measurements applicable for positioning and their acquisition with 3GPP compatible reference signals and measurement procedures. Section III introduces the proposed frequency-domain and time-domain CSI data features and the related pre-processing. Additionally, the NN processing models and architectures are described incorporating both instantaneous and sequence models. Section IV describes the considered urban vehicular positioning scenario and evaluation environment, together with the practical measurement or data uncertainties. Additionally, the obtained numerical results are presented and analyzed, while also considering the important aspect of specializing to the prevailing gNB deployment through transfer learning. Finally, Section V concludes the work.

II Positioning Measurements and Data Acquisition

This section introduces the signals and standardized measurements, available in 5G NR, to extract positioning data. Specific focus is on synchronization signal block (SSB) based measurements in DL, in terms of frequency-domain data, while in time-domain we harness multi-round trip time (MRTT) and UL angle-of-arrival (AoA) based multipath measurements utilizing UL sounding reference signal (SRS) and DL positioning reference signal (PRS). The relevant measurements and data acquisition methods are illustrated conceptually in Fig. 1, while being described in detail below. For clarity, we state that the frequency- and time-domain measurements are alternative approaches to obtain positioning features and data.

II-A Signals and Measurements for 5G NR Positioning

Measuring the received signal strength is one common approach utilized in wireless positioning. In 5G NR, we distinguish RSRP, reference signal received quality (RSRQ), and received signal strength indicator (RSSI), including their beam-specific and resource-specific alternatives. Their acquisition is defined in [37] building on different reference signals, such as UL SRS and DL synchronization signal (SS) and PRS. The corresponding measurements are called UL-SRS-RSRP, SS-RSRP and DL-PRS-RSRP, respectively. Signal strength measurements are vital for numerous network functions, such as mobility management, and thus regularly collected.

Compared to signal strength-based measurements, propagation delay or time-of-flight (ToF)-based ranging benefits from large transmission bandwidths while being less sensitive to channel effects, such as reflections, diffractions, and scattering. To relax clock synchronization requirements between TX and RX, the 5G NR standard supports MRTT measurements [38] where the gNB measures the round trip time, denoted as “gNB Rx–Tx time difference”, based on PRS transmission in DL and SRS transmission in UL. In addition, the UE measures the time between receiving the PRS and sending the SRS, denoted as the “UE Rx–Tx time difference”, which is reported to the gNB [37] in order to solve the channel-dependent propagation delay. Alternatively, the ToF can be estimated indirectly at the gNB via time-difference-of-arrival (TDoA) and the related positioning calculations, as defined in [38]. Obtaining the ToF directly at the UE is currently not explicitly standardized.
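As a minimal sketch of how the two reported time differences can be combined, the following Python snippet removes the UE turnaround time from the round trip measured at the gNB and halves the remainder to obtain the one-way ToF. The function names and the numerical values are illustrative assumptions, and the UE quantity is taken here as a positive elapsed time from PRS reception to SRS transmission; the sign convention of the quantities actually reported in the specifications may differ.

```python
C = 299_792_458.0  # speed of light in m/s


def tof_from_rtt(gnb_rx_tx_diff_s, ue_turnaround_s):
    """One-way ToF from a round-trip measurement.

    gnb_rx_tx_diff_s: time from PRS transmission to SRS reception at the gNB.
    ue_turnaround_s:  time from PRS reception to SRS transmission at the UE,
                      assumed positive here (hypothetical sign convention).
    """
    return 0.5 * (gnb_rx_tx_diff_s - ue_turnaround_s)


def path_length_from_tof(tof_s):
    """Propagation path length in meters corresponding to a ToF."""
    return C * tof_s


# Illustrative numbers only (not taken from the article).
true_path_length_m = 150.0
ue_turnaround_s = 1e-3
gnb_rx_tx_diff_s = 2.0 * true_path_length_m / C + ue_turnaround_s

estimated_m = path_length_from_tof(tof_from_rtt(gnb_rx_tx_diff_s, ue_turnaround_s))
print(f"estimated path length: {estimated_m:.3f} m")
```

Note that in NLoS conditions the recovered quantity is the length of the propagation path, which can exceed the geometric gNB-to-UE distance.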

Beamformed radio access provides inherent support for angle estimation and corresponding angle-based positioning schemes. In the current 5G NR standard, angle estimation is directly specified only for the gNB-side angular information either via the uplink angle-of-arrival (UL-AoA) or the downlink angle-of-departure (DL-AoD) [38]. The UL-AoA is defined as the estimated azimuth and vertical angles of a UE, observed at a gNB [37], based on UL SRS. The exact angle estimation method used in the UL-AoA is not specified, which allows for performance optimization. On the other hand, the estimation of the gNB angle at the UE using the DL-AoD is practically restricted to the use of spatial power measurements, i.e., DL-PRS-RSRP, which limits the achievable angle estimation accuracy [38].

Figure 1: Illustration of the network data acquisition scheme with two gNBs, a UE, and a network localization entity represented by the LMF. Different UL and DL reference signals form the physical basis for obtaining positioning measurements and data. Additionally, the baseline neural processing chains from features to UE location, heading and velocity are highlighted, for both time- and frequency-domain feature scenarios.

Importantly, estimating the ranges and angles also for paths beyond the LoS component is feasible [13], offering added value to the positioning task [9]. Since the current 5G NR standard does not specify accurate estimation of path ranges or angles at the UE side, for DL-based positioning we consider observing frequency-domain CSI measurements based on SS transmissions by the gNBs. This is beneficial since the SSs are periodically transmitted and thus systematically available. Additionally, for a UL-based positioning method, we consider observing time-domain multipath measurements, including path-wise angles and propagation delays, directly at the gNBs. For obtaining the path-wise angles and propagation delays in practice, it is possible to exploit the above-discussed 5G NR specified UL-AoA and MRTT methods, respectively.

II-B Frequency-Domain Channel Measurements through Beamformed DL SSs

The frequency-domain CSI relates to the frequency response (FR) of the effective channel between the gNB and the UE. Considering orthogonal frequency-division multiplexing (OFDM) based transmission, the antenna-element-wise FR at subcarrier $n$, denoted as $\bm{\mathcal{H}}(n)\in\mathbb{C}^{N_{\text{RX}}\times N_{\text{TX}}}$, can be written as [13, 9]

$$\bm{\mathcal{H}}(n)=\sum_{k=0}^{K-1}h_{k}e^{-\frac{j2\pi n\tau_{k}F_{s}}{N}}\bm{a}_{\text{RX}}(\theta_{\text{AOA},k})\bm{a}_{\text{TX}}^{H}(\theta_{\text{AOD},k}), \tag{1}$$

where $K$ is the number of paths, while $h_{k}$, $\tau_{k}$, $\theta_{\text{AOA},k}$ and $\theta_{\text{AOD},k}$ are the complex path coefficient, ToF, AoA and angle-of-departure (AoD) for the $k^{\text{th}}$ path, respectively. Furthermore, $N_{\text{TX}}$ and $N_{\text{RX}}$ are the numbers of transmit and receive antennas in respective order, $F_{s}$ is the sampling frequency, and $N$ is the OFDM Fast Fourier Transform (FFT) size. Finally, $\bm{a}_{\text{TX}}(\cdot)\in\mathbb{C}^{N_{\text{TX}}}$ and $\bm{a}_{\text{RX}}(\cdot)\in\mathbb{C}^{N_{\text{RX}}}$ are the steering vectors, which define the phases per antenna element with respect to the array center, for given AoD and AoA.
Considering further the analog phased-arrays in mmWave systems, the TX and RX apply beamforming weights $\bm{b}_{\text{TX}}\in\mathbb{C}^{N_{\text{TX}}}$ and $\bm{b}_{\text{RX}}\in\mathbb{C}^{N_{\text{RX}}}$, respectively. The corresponding effective beamformed channel at subcarrier $n$, considered as the frequency-domain CSI, can then be expressed as

$$\mathcal{H}(n)=\bm{b}_{\text{RX}}^{H}\bm{\mathcal{H}}(n)\bm{b}_{\text{TX}}. \tag{2}$$

In practice, besides noise and interference, the CSI estimation can suffer from inaccuracies [28] due to radio frequency (RF) impairments, clock and frequency offsets between the gNB and the UE, and imperfect timing advance information. Furthermore, due to signaling overhead, CSI is often reported per blocks of subcarriers, which reduces the CSI resolution in frequency. In this paper, we consider obtaining the frequency-domain CSI via 5G NR SSs, transmitted periodically in DL by all gNBs.
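To make the structure of (1) and (2) concrete, the sketch below evaluates the effective beamformed channel at a given subcarrier for a list of paths. It is only an illustration under stated assumptions: a half-wavelength uniform linear array with an assumed steering-vector phase convention, matched beamforming weights, a 120 kHz subcarrier spacing with a 1024-point FFT, and made-up path parameters. None of these choices are taken from the article. The code uses the fact that the product in (2) factorizes per path into a receive gain and a transmit gain.

```python
import cmath
import math


def ula_steering(num_elements, angle_rad):
    """Steering vector of a half-wavelength ULA (assumed array geometry),
    with phases referenced to the array center."""
    center = (num_elements - 1) / 2.0
    return [cmath.exp(1j * math.pi * (i - center) * math.sin(angle_rad))
            for i in range(num_elements)]


def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def beamformed_fr(n, paths, b_rx, b_tx, fs_hz, fft_size):
    """Effective scalar channel at subcarrier n following (1)-(2).

    paths: iterable of (h_k, tau_k, aoa_k, aod_k) tuples.
    """
    total = 0j
    for h, tau, aoa, aod in paths:
        delay_phase = cmath.exp(-2j * math.pi * n * tau * fs_hz / fft_size)
        rx_gain = inner(b_rx, ula_steering(len(b_rx), aoa))  # b_RX^H a_RX
        tx_gain = inner(ula_steering(len(b_tx), aod), b_tx)  # a_TX^H b_TX
        total += h * delay_phase * rx_gain * tx_gain
    return total


# Illustrative single-path setup (all values are assumptions).
FS_HZ, FFT_SIZE = 122.88e6, 1024          # 120 kHz subcarrier spacing
N_ANT = 4
path = (0.5 + 0.2j, 100e-9, 0.3, -0.5)    # (h, ToF, AoA, AoD)
b_rx = [x / N_ANT for x in ula_steering(N_ANT, path[2])]  # beam matched to AoA
b_tx = [x / N_ANT for x in ula_steering(N_ANT, path[3])]  # beam matched to AoD

h6 = beamformed_fr(6, [path], b_rx, b_tx, FS_HZ, FFT_SIZE)
h18 = beamformed_fr(18, [path], b_rx, b_tx, FS_HZ, FFT_SIZE)
print("phase difference over 12 subcarriers [rad]:",
      cmath.phase(h18 * h6.conjugate()))
```

For a single path, the magnitude of the effective channel is the same on every subcarrier, and the phase difference between two subcarriers depends only on the ToF and the subcarrier separation, not on the path coefficient or the beamforming gains.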

II-C Time-Domain Multipath Measurements through MRTT and UL-AoA

In time-domain, the radio propagation channel can be modeled as a composition of individual propagation paths with path-specific propagation delay, power gain, phase shift, AoD, and AoA, together with additional distortion and interference, among other channel effects. The antenna-element-wise channel impulse response $\mathbf{H}(\tau)\in\mathbb{C}^{N_{\text{RX}}\times N_{\text{TX}}}$ can be written as a function of propagation delay $\tau$ as [36, 34]

$$\mathbf{H}(\tau)=\sum_{k=0}^{K-1}h_{k}\bm{a}_{\text{RX}}(\theta_{\text{AOA},k})\bm{a}_{\text{TX}}^{H}(\theta_{\text{AOD},k})\delta(\tau-\tau_{k}), \tag{3}$$

where $\delta(\cdot)$ is a Dirac delta function (i.e., a unit impulse). Similar to the frequency-domain representation, while again assuming analog phased-arrays, the effective beamformed channel impulse response can be written as

$$H(\tau)=\bm{b}_{\text{RX}}^{H}\mathbf{H}(\tau)\bm{b}_{\text{TX}}. \tag{4}$$
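As a short worked step, substituting the path-wise impulse response model into the beamformed expression above makes the role of the analog beamformers explicit. The per-path gain symbol $g_{k}$ is introduced here only for illustration and is not part of the original notation:

$$H(\tau)=\sum_{k=0}^{K-1}h_{k}\,g_{k}\,\delta(\tau-\tau_{k}),\qquad g_{k}=\bm{b}_{\text{RX}}^{H}\bm{a}_{\text{RX}}(\theta_{\text{AOA},k})\,\bm{a}_{\text{TX}}^{H}(\theta_{\text{AOD},k})\bm{b}_{\text{TX}}.$$

Under this reading, beamforming leaves the path delays $\tau_{k}$ unchanged and scales each complex path coefficient $h_{k}$ by the scalar $g_{k}$, so a path observed through a given beam pair appears with power $|h_{k}g_{k}|^{2}$.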

In this paper, the measured time-domain CSI includes the path delays $\tau_{k}$, the path powers $|h_{k}|^{2}$, and the gNB side path angles $\theta_{\text{AOA},k}$. For the path delays $\tau_{k}$ and path powers $|h_{k}|^{2}$, the estimation procedure is assumed to exploit 5G NR MRTT measurements [38], as discussed in Section II-A. Furthermore, for estimating the gNB side path angles $\theta_{\text{AOA},k}$, it is also possible to utilize UL-AoA measurements [38] based on UL SRS transmissions, as noted in Section II-A.

The overall data acquisition concept is illustrated in Fig. 1, highlighting the different considered measurements. In general, within the current 5G NR standard, the LMF is responsible for the localization and related signaling management while the positioning calculations can be carried out either at the UE or the network side. Moreover, reporting the MRTT and UL-AoA measurements between the gNB and the LMF is supported by the so-called NR Positioning Protocol A [39]. Different alternative ways to arrange for labeled training data include crowdsourcing, crowdsensing, as well as utilization of synthetic data. These are discussed further in Section IV-G.

III Proposed Methods

This section describes and introduces the novel approach of utilizing frequency-domain CSI with relative phase, while also addressing the time-domain CSI data pre-processing. In addition, the proposed architectures, hyperparameters, and training algorithm of the proposed NN models are presented. Finally, important system-level implementation alternatives and aspects are discussed.

III-A Frequency-Domain CSI Data Preprocessing

III-A1 Proposed Relative Phase Approach

5G mmWave networks operate at high carrier frequencies, at and beyond 24 GHz, with wavelengths approaching the millimeter-scale. In mobile scenarios, utilizing absolute phase responses is highly impractical, as movement of a few millimeters in distance results in a full rotation of the phase. Furthermore, as is well-known, the frequencies and wavelengths relate through

$$\lambda = \delta_s = c/f, \tag{5}$$

where $\delta_s$ is the propagation distance between two points with equal phases, $\lambda$ is the signal wavelength, $c$ is the speed of light, and $f$ is the signal frequency.

In this article, we consider obtaining the frequency-domain CSI in the resolution of 12 subcarriers, which refers to a bandwidth of one resource block (RB) in 5G NR, denoted as $\Delta f_{RB}$. The CSI is interpreted at the $6^{\text{th}}$ subcarrier of each RB, and thus the corresponding subcarrier index for the $m^{\text{th}}$ RB observation is given as $n_{\text{RB},m} = 6 + m\Delta f_{RB}$ with $m = 0, \dots, M-1$, where $M$ is the number of RBs. Moreover, we propose to take advantage of differential phase measurements between neighboring RBs. Based on (1)-(2), and when considering an individual propagation path, the phase difference between subcarriers is equal across the spectrum and completely determined by the ToF through the complex exponential term. Possible phase rotations due to the other terms in (1) are constant over all subcarriers, and thus do not induce a phase difference between the subcarriers. Thus, for an individual path with a ToF of $\tau_0$, the phase difference between two consecutive resource blocks can be given as

$$\Delta\phi = 2\pi\tau_0\Delta f_{RB}. \tag{6}$$

While the above expression builds on a single propagation path, we also utilize this approach in this work in the case of realistic multipath propagation. As elaborated further below, the differential phase approach mitigates the effect of phase periodicity, and thus allows extracting relevant features for the proposed NN-based positioning.

To this end, the linkage between the relative phase and a specific propagation distance is unambiguous only when the relative phase is within one phase cycle ($\Delta\phi < 2\pi$). A distance $d_\phi$, which induces a full $2\pi$ cycle of the relative phase between two neighboring RBs, can be solved based on (6) as

$$d_\phi = \frac{c}{\Delta f_{RB}} \tag{7}$$

by denoting $\tau_\phi \Delta f_{RB} = 1$, where $\tau_\phi = d_\phi / c$ is the corresponding ToF resulting in a full phase cycle. By using the relative phase difference $\Delta\phi$, instead of an absolute phase, as the frequency-domain feature, the positioning performance can be significantly improved, as shown in Section IV. Although the distance ambiguity issue still remains with the phase difference recurrence at every $d_\phi$ meters, it is greatly improved compared to the recurrence with an absolute phase at every $\delta_s$ meters, as $f \gg \Delta f_{RB}$.
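To give a feel for the scale difference between the two ambiguity distances, the following minimal sketch evaluates (5) and (7) numerically. It assumes the 28 GHz carrier and the 120 kHz subcarrier spacing of the evaluation scenario in Section IV-A, so that one RB spans $12 \times 120$ kHz; the function and constant names are illustrative only.

```python
# Ambiguity distances for absolute vs. relative phase (illustrative values).
C = 299_792_458.0          # speed of light [m/s]
F_CARRIER = 28e9           # carrier frequency [Hz], assumed from Section IV-A
SCS = 120e3                # subcarrier spacing [Hz], assumed from Section IV-A
DELTA_F_RB = 12 * SCS      # one RB = 12 subcarriers -> 1.44 MHz


def absolute_phase_period(f_hz: float) -> float:
    """Distance delta_s after which the absolute phase repeats, eq. (5)."""
    return C / f_hz


def relative_phase_period(delta_f_hz: float) -> float:
    """Distance d_phi after which the RB-to-RB phase difference repeats, eq. (7)."""
    return C / delta_f_hz


delta_s = absolute_phase_period(F_CARRIER)   # roughly 10.7 mm
d_phi = relative_phase_period(DELTA_F_RB)    # roughly 208 m
print(f"delta_s = {delta_s * 1e3:.2f} mm, d_phi = {d_phi:.1f} m")
```

Under these assumptions, the relative phase repeats over a distance that is larger than the absolute-phase period by the factor $f / \Delta f_{RB}$, i.e., by more than four orders of magnitude.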

III-A2 Frequency-Domain CSI Features and Visualization

To provide a short illustration, we consider a single representative user path along an urban environment, as shown in Fig. 2a (for further details of the environment, refer to Section IV). Then, Fig. 2b and Fig. 2c demonstrate the utilized frequency-domain CSI feature representations along the path, including the proposed features and the features from the related literature. Specifically, the raw, complex channel response, further referred to as complex frequency response (FR-Complex), is depicted in Fig. 2b, top, and Fig. 2b, center, which show the real and imaginary parts of FR-Complex for 10 consecutive resource blocks along the path. The feature is obtained from (2) as $\text{real}(\mathcal{H}(n_{\text{RB},m}))$ and $\text{imag}(\mathcal{H}(n_{\text{RB},m}))$ for $m = 1, \dots, 10$.

Figure 2: An example UE track in urban environment is shown in (a) where the gNB location is depicted with the red rectangle while the track color represents the mean normalized power across the resource blocks. The frequency-domain features FR-Complex and FR-Power as well as the proposed FR-Phase and FR-Power/Phase are visualized in (b) and (c), respectively, along the UE track shown in (a). In (b) and (c), different colors represent the ten different RB allocations within the full SSB transmission bandwidth.

Furthermore, Fig. 2b, bottom, depicts the channel power response, denoted as the power-domain frequency response (FR-Power). Such a channel feature is utilized, e.g., in [24, 25], and can be expressed via (2) as $10\log_{10}(|\mathcal{H}(n_{\text{RB},m})|^2)$. In 5G NR, the FR-Power corresponds to an RB-wise RSRP measurement, defined as the average power of the resource elements carrying the reference symbols. As an input feature, we also re-scale FR-Power to the normalized range of $[0, 1]$.

Then, the top graph of Fig. 2c visualizes the proposed relative phase difference as the frequency-domain feature, further denoted as the relative phase frequency response (FR-Phase). Specifically, building on the discussion in Section III-A1, the FR-Phase can be obtained and expressed following (2) as

$$\Delta\phi(m) = \arg(\mathcal{H}(n_{\text{RB},m})) - \arg(\mathcal{H}(n_{\text{RB},m-1})) \tag{8}$$

for RB indices $m = 1, \dots, M-1$. The dependency between the signal path lengths and the relative phases, especially in LoS regions, is clearly visible in the figure. Moreover, it can be seen that the feature magnitude recurs with the path propagation distance at every $d_\phi$ meters, as derived in (7).
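The FR-Phase computation in (8) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one complex channel sample per RB, wraps each difference to $(-\pi, \pi]$ by taking the phase of a conjugate product, and uses a synthetic single-path channel with a hypothetical ToF and an assumed sign convention of the complex exponential.

```python
import cmath
import math


def fr_phase(h_rb):
    """Relative phase between neighboring RBs, following eq. (8).

    h_rb: sequence of M complex channel samples, one per RB.
    Returns M-1 phase differences in radians.
    """
    # arg(H_m * conj(H_{m-1})) equals arg(H_m) - arg(H_{m-1}) wrapped to
    # (-pi, pi]; the wrapping is an assumption of this sketch.
    return [cmath.phase(h_rb[m] * h_rb[m - 1].conjugate())
            for m in range(1, len(h_rb))]


# Synthetic single-path channel with a hypothetical ToF of 100 ns.
TAU_0 = 100e-9
DELTA_F_RB = 12 * 120e3  # assumed RB bandwidth [Hz]
h = [cmath.exp(-2j * math.pi * m * DELTA_F_RB * TAU_0) for m in range(10)]
dphi = fr_phase(h)
```

For this single-path example, every element of `dphi` has the same magnitude $2\pi\tau_0\Delta f_{RB}$, in line with (6).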

III-A3 Proposed Combined Feature

To utilize the maximum information enclosed in the measured channel responses, we further propose the so-called power and relative phase frequency response (FR-Power/Phase) approach as the ultimate frequency-domain feature. This approach combines the FR-Phase and FR-Power by transforming the FR-Phase to the complex unit circle, with subsequent element-wise multiplication with the re-scaled FR-Power. This is expressed as

$$\bar{P}_{\text{RB}}(m)\,\exp(j\Delta\phi(m)) \tag{9}$$

where $\bar{P}_{\text{RB}}(m)$ refers to the re-scaled normalized power for RB indices $m = 0, \dots, M-1$. Furthermore, since the FR-Phase has one element less than FR-Power, we extend the FR-Phase array with an additional element for $m = 0$ by defining $\Delta\phi(0) = 0$.

The real and imaginary components of the proposed FR-Power/Phase feature set are visualized in Fig. 2c, center and bottom, respectively. The proposed feature accommodates the advantages of both received power and relative phases in a single complex feature vector, while relaxing the distance ambiguity of the relative phase feature.
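As an illustration of (9), the sketch below builds the combined feature from one complex channel sample per RB. The dataset-wide power limits `p_min_db` and `p_max_db` used for the $[0, 1]$ re-scaling, the clipping, and the helper `to_real_input` are assumptions made for this example.

```python
import cmath
import math


def fr_power_phase(h_rb, p_min_db, p_max_db):
    """Combined feature of eq. (9): rescaled power times exp(j * relative phase).

    h_rb: M complex, non-zero channel samples (one per RB).
    p_min_db, p_max_db: dataset-wide power limits for the [0, 1] rescaling
    (hypothetical parameters of this sketch).
    Returns a list of M complex feature values.
    """
    features = []
    for m, h in enumerate(h_rb):
        power_db = 10.0 * math.log10(abs(h) ** 2)
        p_bar = (power_db - p_min_db) / (p_max_db - p_min_db)
        p_bar = min(max(p_bar, 0.0), 1.0)  # clip to the normalized range
        # Delta phi(0) is defined as 0; otherwise use the wrapped difference.
        dphi = 0.0 if m == 0 else cmath.phase(h * h_rb[m - 1].conjugate())
        features.append(p_bar * cmath.exp(1j * dphi))
    return features


def to_real_input(features):
    """Stack real and imaginary parts, doubling the input length."""
    return [z.real for z in features] + [z.imag for z in features]
```

Stacking the real and imaginary parts doubles the number of inputs relative to FR-Power or FR-Phase alone.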

III-B Time-Domain CSI Data Preprocessing

The time-domain CSI data utilized for localization includes propagation delays $\tau_k$, powers $|h_k|^2$, and gNB side path angles $\theta_{\text{AOA},k}$ for the observed LoS and NLoS paths. To this end, the measured path-wise propagation delays $\tau_k$, referred to as path-wise time-of-flight (path-ToF), are transformed to propagation distances by multiplying them with the speed of light. The path-wise AoAs, $\theta_{\text{AOA},k}$, are obtained at the gNB side, and transformed to directions in Cartesian coordinates in the preprocessing, to avoid the zero-crossing problem with cyclic angular data. This results in a robust AoA feature, called path-wise angle-of-arrival (path-AoA) in the following, which is less susceptible to angular deviation and related uncertainties. The path-wise received powers $|h_k|^2$ are expressed on a decibel scale (dBm) to overcome the extremely low feature magnitudes in linear scale. This feature is the time-domain equivalent of the frequency-domain FR-Power, which accumulates all paths into the same observed frequency-domain measurement. Similar to the propagation delay feature, the path power feature includes information on the path propagation distance, but most importantly, it also provides information on the number and type of channel interactions, such as reflections, diffraction, or scattering, within the radio path.
In general, different combinations of the time-domain CSI data can be adopted. The aggregated path-ToF+AoA and path-ToF+RP+AoA features, proposed in this work, are the most powerful ones, as shown through the numerical results.
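The three time-domain transformations described above can be sketched as below. This is a minimal example under stated assumptions: the linear power is taken to be in watts, the Cartesian direction is represented as a 2-D unit vector, and the function name is hypothetical. How the resulting values are packed into the NN input vector is not specified by this sketch.

```python
import math

C = 299_792_458.0  # speed of light [m/s]


def preprocess_path(tau_s, power_w, aoa_rad):
    """Turn one path (delay, linear power, AoA) into NN-friendly features.

    Returns (distance_m, power_dbm, (dir_x, dir_y)).
    """
    distance_m = C * tau_s                               # path-ToF as distance
    power_dbm = 10.0 * math.log10(power_w / 1e-3)        # path-RP in dBm
    direction = (math.cos(aoa_rad), math.sin(aoa_rad))   # path-AoA as unit vector
    return distance_m, power_dbm, direction


# Angles just below +pi and just above -pi are far apart numerically but map
# to nearly identical directions, which is the point of the transformation.
_, _, d_pos = preprocess_path(1e-6, 1e-12, math.pi - 1e-3)
_, _, d_neg = preprocess_path(1e-6, 1e-12, -math.pi + 1e-3)
```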

III-C NN Model Architectures and Hyperparameters

Among the various alternative data-aided approaches, we restrict ourselves to NN models in this work, which currently dominate the ML area due to their performance, scalability, generalization properties, and dynamic architecture options [40].

III-C1 Activation Function

In this work, we utilize the Gaussian error linear unit (GELU) [41] as the non-linear activation function. Its main advantages over the traditional rectified linear unit (ReLU) include resistance to the “dying ReLU” problem [42] and differentiability at all values. It has also been shown to offer improved performance in a number of applications such as natural language processing [41]. It can be defined as $\text{GELU}(x) = x\Phi(x)$ [43, 44], where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution. The function can also be approximated for faster processing as

$$\text{GELU}(x) \approx \frac{x}{2}\left(1 + \tanh\left(\sqrt{\frac{2}{\pi}}\,\left(x + Cx^{3}\right)\right)\right), \tag{10}$$

where $C = 0.044715$. Compared to ReLU, the higher complexity of GELU is compensated by the faster convergence of the model, as well as the corresponding improved positioning performance, based on our complementary experiments.
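A minimal sketch comparing the exact definition $x\Phi(x)$ with the standard tanh-based approximation using $C = 0.044715$ is given below. It relies only on the Python standard library; the function names are illustrative.

```python
import math

C_GELU = 0.044715


def gelu_exact(x: float) -> float:
    """GELU(x) = x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh-based approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + C_GELU * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Largest absolute gap between the two on a grid over [-5, 5].
max_gap = max(abs(gelu_exact(i / 100) - gelu_tanh(i / 100))
              for i in range(-500, 501))
```

On this grid, the two variants differ by less than $10^{-3}$ in absolute value, which is why the approximation is commonly used as a drop-in replacement.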

III-C2 Utilized NN Architectures

As the functional NN layers, we utilize in this work both densely connected layers and long-short term memory (LSTM) [45] layers, the latter being used only in the sequence-based implementation of the models. Importantly, the LSTM layer is a recurrent layer capable of preserving long-term and short-term trends within the data.

The architecture of the densely-connected model is depicted in Fig. 3a. It consists of 5 densely connected layers with GELU activation functions and a single densely connected layer with linear activation and 2 neurons as the output, estimating the UE position. The architecture of the sequence processing capable model is, in turn, shown in Fig. 3b. It consists of 5 densely connected layers after the input, with a single LSTM layer connected in parallel with the 5th dense layer. The concatenated output of these layers is then fed to an LSTM layer with 5 neurons at the output with linear activation. Specifically, the densely connected layers serve as instantaneous feature extractors, while the intermediate LSTM layer learns the temporal features. Due to the considered parallel architecture, the last functional layer has access to both instantaneous and temporal features. The resulting output is then divided into a positioning output with 2 variables, a velocity output with a single variable, and a heading output with an additional $\tanh(\cdot)$ activation and 2 variables.

III-C3 Data Structures, Normalization and Training

In general, the input dimensions vary based on the selected features and deployment scenario. For frequency-domain features, and when considering the evaluation scenario described in Section IV containing 3 gNBs, 16 beams, and 10 RBs, the input size is either 480 for FR-Power and FR-Phase or 960 for FR-Power/Phase and FR-Complex. When considering the time-domain CSI in the same scenario, the individual path-ToF, path-AoA, and path-RP features each have an input size of 15, the combined path-ToF+AoA and path-RP+AoA features have 30 inputs, and finally the aggregated path-ToF+RP+AoA feature has an input size of 45. Some of the features are also normalized prior to the training, as demonstrated already along Fig. 2b and Fig. 2c. Specifically, all power-related quantities as well as path-wise ToF measurements, when first converted to pseudo-ranges, are normalized to $[0, 1]$ within the overall sets of available measurements. Finally, all angle and phase quantities are, by design, within $[-\pi, \pi]$. We emphasize that each different feature scenario and combination corresponds to an individual NN, trained and deployed on its own. The vast set of numerical results, provided in Section IV, provides the corresponding mutual performance comparisons.
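The frequency-domain input sizes quoted above follow directly from the scenario dimensions, as the short sketch below shows. It also includes a plain min-max scaler as one possible way to realize the $[0, 1]$ normalization; the helper names are hypothetical, and the exact normalization used by the authors is not specified beyond the description above.

```python
N_GNB, N_BEAMS, N_RB = 3, 16, 10


def freq_input_size(complex_valued: bool) -> int:
    """Number of real-valued NN inputs for one frequency-domain feature."""
    per_value = 2 if complex_valued else 1  # real and imaginary parts
    return N_GNB * N_BEAMS * N_RB * per_value


sizes = {
    "FR-Power": freq_input_size(False),
    "FR-Phase": freq_input_size(False),
    "FR-Power/Phase": freq_input_size(True),
    "FR-Complex": freq_input_size(True),
}


def min_max_scale(values):
    """Rescale values to [0, 1] using the min and max of the whole set."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]
```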

Figure 3: Architecture and hyperparameters of the densely connected NN, in (a), and of the sequence processing NN, in (b). Each layer is specified by the number of neurons and an activation function.

All considered NN models are trained using the Adam optimizer [46] with a learning rate of 0.001 for the first 200 epochs, followed by an early-stopping mechanism based on validation performance for an additional 500 epochs, while iteratively reducing the learning rate to 0.0005 and 0.0001 after each stop. The lowered learning rates ensure a fine-tuned performance with a small number of epochs. The mean squared error (MSE) loss was selected for each output, and for the sequence-based NN model, the loss weights were selected as 0.8, 0.1, and 0.1 for positioning loss, velocity loss, and heading loss, respectively. Furthermore, stemming from the deployment area of around $550 \times 370$ m$^2$ (see Fig. 5), the position labels are scaled down by a factor of 300 to accelerate the training.
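The multi-output loss and the label scaling described above can be sketched in plain Python as follows. The dictionary-based interface and all names are assumptions of this example; an actual training setup would express the same weights through the loss-weighting facility of the chosen deep learning framework.

```python
LABEL_SCALE = 300.0                       # position labels divided by this factor
LOSS_WEIGHTS = {"position": 0.8, "velocity": 0.1, "heading": 0.1}
LEARNING_RATES = [0.001, 0.0005, 0.0001]  # lowered after each early stop


def mse(pred, target):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(pred, target):
    """Weighted sum of per-output MSE losses.

    pred and target map each output name to a list of values.
    """
    return sum(w * mse(pred[k], target[k]) for k, w in LOSS_WEIGHTS.items())


def scale_position(xy_m):
    """Scale a position label given in meters for training."""
    return [v / LABEL_SCALE for v in xy_m]
```

With the stated weights, the positioning error dominates the total loss, while velocity and heading act as auxiliary objectives.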

III-D System-Level Implementation Alternatives and Aspects

In general, there are alternative ways to organize and implement the use of the CSI measurements and data for NN training and actual online inference processing for localization. These are discussed below, in relation to the proposed methods and the data acquisition visualized in Fig. 1, while noting also the important role of UE radio resource control (RRC) state.

To this end, the time-domain CSI data, i.e., the MRTT-based ToF measurements and the SRS-based UL-AoA measurements, are by definition obtained at the network side. Thus, in this case, it is natural to also perform both the model training as well as the localization inference processing at the network side. Consequently, there is no need for additional signaling or feedback, and all training data from different UEs is inherently gathered together for training the model. Importantly, since MRTT and UL-AoA require scheduled SRS and PRS transmissions, time-domain measurements are only available in the connected mode when it comes to the UE RRC state.

Frequency-domain RSRP and other CSI measurements are collected from periodic and always available SSB transmissions at the UE side, thus enabling utilization of efficient data crowdsourcing methods. Despite a possible technical capability to perform training at the UE, assuming individually trained models at different UEs can be considered unrealistic. Therefore, UEs are expected to periodically share such measurement data with the network for NN training, for example, through Minimization of Drive Testing (MDT) messaging in the form of raw measurements and location tags, or alternatively as locally pre-trained models following the principle of federated learning (FL). Interestingly, unlike with MRTT and UL-AoA, the DL frequency-domain CSI and RSRP measurements can be collected and obtained also in the RRC idle mode as part of standard mobility management procedures. This can be considered a great asset enabling continuous data collection and localization with very low power consumption. Furthermore, assuming a pre-shared model from the network for the final inference phase, the UE can perform localization independently without supplementary signaling with the gNB.

IV Evaluation Environment and Results

IV-A Evaluation Scenario and Assumptions

The evaluation environment builds on ray-tracing-based channel measurements utilizing the Wireless InSite® software [47]. We employ the map-based METIS Madrid grid [48], recognized as the relevant urban scenario by 3GPP in 5G NR specifications [34]. The Madrid grid layout generally introduces a rich radio propagation environment with different street widths and open areas, empowering generalization and scalability.

The simulated urban scenario illustrated in Fig. 4 contains three 5G mmWave gNBs operating at 28 GHz, such that clear NLoS regions also exist along the streets. Each gNB is equipped with a uniform cylindrical antenna array with 4 elevated layers, each with 16 antenna elements placed at 5 m height. The beam configuration includes 16 beams with uniformly separated azimuth angles and a common down-tilted elevation angle fixed at 10 deg. The AoD and ToF measurements are obtained based on the corresponding characteristics of the radio propagation path with the highest received power, building on the signals and measurement procedures described in Section II. The obtained AoD and ToF measurements are exposed to substantial measurement errors, as discussed further in Section IV-B. The beam-wise frequency-domain CSI measurements are obtained from SSB transmissions as an average of received subcarrier powers per RB, with 120 kHz subcarrier spacing. Measurements with path-loss higher than 160 dB are not considered, while in general the environment shown in Fig. 4 possesses large areas and street segments with severe multi-bounce phenomena.

The combined time-domain and frequency-domain dataset consists of 40 vehicle-like user tracks, where the UE collects measurements at 100 ms intervals. The UE locations are initialized with random locations along the streets, and the UEs move within the area by considering an equal probability to advance in any direction at intersections. The UE velocity varies between 20 km/h and 60 km/h depending on the present street and possible proximity of intersections. When approaching an intersection, the UE decelerates at 3 m/s$^2$ until reaching a fixed velocity of 20 km/h for smooth turning. After the turn, the UE accelerates at 2 m/s$^2$ until reaching a street-specific speed limit. The speed limit is generally defined as 40 km/h, apart from the top horizontal street, which has a speed limit of 20 km/h (see Fig. 4), and the wider street below the pedestrian street, which has a limit of 60 km/h. The exact UE trajectories and associated measurement locations are different for each simulated user track. As this work is heavily focused on NLoS positioning, Fig. 4 visualizes the simulated tracks with the LoS/NLoS indication at each sampled location. We note that in order to efficiently track moving UEs with varying velocities through the sequence processing models, a sufficiently rich training dataset is needed with representative velocity statistics.
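The described speed profile can be illustrated with a simple per-sample update rule. This is a sketch under assumptions: the Boolean flag indicating that the UE is approaching an intersection is hypothetical, since the exact braking distance logic is not given here, and speeds are handled in m/s.

```python
DT = 0.1                     # measurement interval [s]
V_TURN = 20.0 / 3.6          # turning speed of 20 km/h in [m/s]
DECEL, ACCEL = 3.0, 2.0      # deceleration and acceleration [m/s^2]


def next_speed(v, speed_limit, approaching_intersection):
    """Advance the UE speed by one 100 ms step (all speeds in m/s)."""
    if approaching_intersection:
        return max(V_TURN, v - DECEL * DT)
    return min(speed_limit, v + ACCEL * DT)


# Example: braking from 40 km/h down to the turning speed.
v = 40.0 / 3.6
steps = 0
while v > V_TURN:
    v = next_speed(v, 40.0 / 3.6, approaching_intersection=True)
    steps += 1
```

In this example the UE needs 19 samples, i.e., just under two seconds, to slow down from 40 km/h to the turning speed.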

Figure 4: Illustration of the METIS Madrid map-based deployment and evaluation scenario with 40 simulated UE tracks while distinguishing the LoS and NLoS regions. Detailed example paths on one crossing are shown on the right.

The available 40 user tracks are distributed into 32 UE traces for training, 4 for validation, and the remaining 4 for the actual testing. The validation and testing paths are carefully selected, to avoid any area-specific bias in the evaluation. Furthermore, as the work focuses on the NLoS positioning performance, we validated the consistency of the LoS/NLoS split across the datasets. The distribution of the samples in the individual datasets based on the number of LoS gNBs is consistent with approx. 35% NLoS samples, 60% of samples having a single LoS gNB, and only 5% of samples having 2 gNBs in LoS. The distribution suggests that the traditional model-based solutions, such as trilateration, are not applicable in the considered scenario. In total, there are 25 181 samples in the dataset.

IV-B Network Data Uncertainties

In this work, we take into account the important practical aspect of uncertainties in the measurements and thereon in the corresponding features. To this end, the frequency-domain CSI is impaired in its FR-Complex representation with complex additive white Gaussian noise (AWGN) samples with magnitude equal to 30% of the corresponding channel estimate’s root mean square (RMS) magnitude. This represents large practical measurement uncertainties. The other related features such as the FR-Power/Phase are impaired correspondingly, through the transformations from the impaired FR-Complex to amplitude/power and phase domains.
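One way to realize the described FR-Complex impairment is sketched below. It is an interpretation rather than the authors' code: the RMS is computed over the samples passed in, and the noise is circularly symmetric with a total RMS magnitude of 30% of that value, split equally between the real and imaginary parts.

```python
import math
import random


def impair_fr_complex(h, rel_level=0.3, rng=None):
    """Add complex AWGN whose RMS magnitude is rel_level times the RMS of h."""
    rng = rng or random.Random()
    rms = math.sqrt(sum(abs(x) ** 2 for x in h) / len(h))
    sigma = rel_level * rms / math.sqrt(2.0)  # per real/imaginary component
    return [x + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for x in h]


# Usage with a fixed seed for reproducibility.
clean = [1.0 + 0.0j] * 20000
noisy = impair_fr_complex(clean, rng=random.Random(0))
noise_rms = math.sqrt(
    sum(abs(n - c) ** 2 for n, c in zip(noisy, clean)) / len(clean))
```

The derived features, such as FR-Power/Phase, would then be computed from the impaired samples so that the noise propagates through the same transformations.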

To impair the time-domain features, we impose impairments separately to path-ToF, path-RP and path-AoA quantities. The path-ToF feature uncertainty is an AWGN with standard deviation (std) equal to 10 m. We consider the constant uncertainty scale regardless of the ToF magnitude, as the measurement errors are mostly resulting from hardware inaccuracies and timing offsets in the UEs and gNBs. Furthermore, we impair the path-RP feature with an AWGN with 2 dB std, which corresponds to the maximum impairment of $\pm 6$ dB range with 99.7% certainty, defined by 3GPP as the required absolute measurement accuracy for SS-RSRP [49]. The path-AoA is, in turn, impaired with discretized accuracy of $22.5^\circ$ ($360^\circ/16$ beams), rather than with a randomized value, to incorporate the gNB limitations in accurately determining the AoA.
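The three time-domain impairments can be sketched as below. The function names are hypothetical, the ToF noise is applied to the pseudo-range in meters, and rounding to the nearest $22.5^\circ$ step is an assumption about how the discretization is realized.

```python
import random

BEAM_STEP_DEG = 360.0 / 16  # 22.5 degrees


def quantize_aoa(aoa_deg):
    """Snap an angle to the nearest multiple of 22.5 degrees in [0, 360)."""
    return (round(aoa_deg / BEAM_STEP_DEG) * BEAM_STEP_DEG) % 360.0


def impair_tof_distance(d_m, rng, std_m=10.0):
    """Add zero-mean Gaussian noise with 10 m std to a pseudo-range."""
    return d_m + rng.gauss(0.0, std_m)


def impair_power_db(p_db, rng, std_db=2.0):
    """Add zero-mean Gaussian noise with 2 dB std (3 * std = 6 dB)."""
    return p_db + rng.gauss(0.0, std_db)


rng = random.Random(1)
noisy_range = impair_tof_distance(120.0, rng)
noisy_power = impair_power_db(-95.0, rng)
```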

Finally, as reviewed in the Introduction, a large majority of the state-of-the-art works, such as [15, 29, 35, 25, 28, 24, 23, 18, 21], utilize channel amplitude or power response, or even the integrated received power, as the positioning feature. Hence, in the following, the results with FR-Power feature represent essentially the state-of-the-art reference approach when it comes to the frequency-domain features. In the time-domain feature case, the use of the individual dominant path features has been considered in [8, 32, 5, 30, 31, 36], thus serving as the main reference schemes. Additionally, the state-of-the-art schemes build commonly on snap-shot NNs without harnessing the temporal correlation.

Data and codes are openly available at https://doi.org/10.5281/zenodo.12204893.

IV-C Numerical Results with Dense Snap-Shot NNs

We next provide and analyze the results obtained with dense NN based ML models while considering both the frequency-domain and time-domain features as well as the impacts of the feature density or granularity in the two considered domains. To establish an understanding of the baseline or reference performance, we start with the results under perfect measurements (no uncertainties), and then also show the performance under practical measurement uncertainties.

IV-C1 Results with Frequency-Domain Features

First, we analyze and compare the different frequency-domain CSI features introduced in Section III-A and their positioning capabilities with a densely connected snap-shot NN with 5 hidden layers. We thus split all the user tracks into individual samples and compare the performance without considering the temporal dependencies within sequences or additional uncertainties, to focus on the quality of the features themselves.

Fig. 5a visualizes the distributions of the positioning errors on the testing dataset for each feature. Each boxplot marks the median (center) as well as the first and third quartiles ($25^{\text{th}}$ and $75^{\text{th}}$ percentiles) encapsulated in the box, while the whiskers mark the values of the $5^{\text{th}}$ and $95^{\text{th}}$ percentiles. The results show that the proposed FR-Power/Phase feature representation enables the most efficient training in terms of positioning error and that considering the FR-Complex features as the input provides the poorest performance. The FR-Power and FR-Phase features achieve comparable median performance, but in terms of outliers, FR-Power performs better. The $95^{\text{th}}$ percentiles, referring essentially to the presence of outliers, of FR-Complex and FR-Phase are significantly higher than those of the remaining methods, as shown quantitatively in Table II. Furthermore, the feature combination denoted as FR-Power+FR-Phase represents the simple concatenation of the corresponding individual features. The numerical results show that the positioning performance is improved when compared to the individual features, but the novel FR-Power/Phase feature, utilizing the same yet pre-processed inputs, provides superior performance. The table highlights in bold the best performance numbers in different cases.
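The boxplot statistics described above can be computed from the per-sample positioning errors as sketched below. The linear-interpolation percentile is one common convention and an assumption here, as the exact percentile definition used for the figures and tables is not stated; the function names are illustrative.

```python
import math


def positioning_errors(est_xy, true_xy):
    """Euclidean distance between estimated and true 2-D positions."""
    return [math.dist(e, t) for e, t in zip(est_xy, true_xy)]


def percentile(values, q):
    """q-th percentile (0..100), linearly interpolated between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def boxplot_summary(errors):
    """Whiskers (5th, 95th), box (25th, 75th), and median (50th)."""
    return {q: percentile(errors, q) for q in (5, 25, 50, 75, 95)}
```

The same `percentile` helper also yields the $80^{\text{th}}$ percentile reported in the tables.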

Figure 5: Distributions of positioning errors on the testing data when evaluating (a) different frequency-domain features, and (b) different time-domain features, with dense snap-shot NN and with no measurement uncertainties.

We next further investigate the impact of the feature representation by considering the LoS and NLoS data separately, with the results being shown in Table II. We can observe that the proposed FR-Power/Phase feature representation achieves the lowest positioning errors by a considerable margin, when compared to the other methods (3.35 m and 4.70 m mean positioning error in LoS/NLoS, respectively) addressed earlier in the literature. By comparing the performance in LoS and NLoS scenarios, we can observe some increase in the error in NLoS; however, the exact impact is clearly feature-dependent. Furthermore, when considering the $95^{\text{th}}$ percentiles of the error distributions, we can observe that the errors related to the FR-Complex and FR-Phase features are drastically increased, in both LoS and NLoS scenarios, while FR-Power+FR-Phase and FR-Power/Phase features sustain a relatively stable performance across the majority of the testing samples.

TABLE II: Baseline performance results: frequency-domain features, dense snapshot NN, no uncertainties

| Error [m] | FR-Complex LoS | FR-Complex NLoS | FR-Power LoS | FR-Power NLoS | FR-Phase LoS | FR-Phase NLoS | FR-Power+FR-Phase LoS | FR-Power+FR-Phase NLoS | FR-Power/Phase LoS | FR-Power/Phase NLoS |
|---|---|---|---|---|---|---|---|---|---|---|
| Median | 7.88 | 14.01 | 5.57 | 5.01 | 5.26 | 6.97 | 3.35 | 4.29 | **2.42** | **2.84** |
| Mean | 21.27 | 25.63 | 9.69 | 8.89 | 18.01 | 27.99 | 6.53 | 8.67 | **3.35** | **4.70** |
| 80th pc | 21.70 | 33.73 | 12.36 | 11.11 | 9.33 | 17.67 | 5.46 | 8.57 | **4.45** | **5.18** |
| 95th pc | 106.37 | 92.05 | 30.69 | 27.53 | 87.97 | 166.66 | 10.47 | 21.43 | **6.71** | **10.97** |

Overall, the obtained results clearly demonstrate that the novel FR-Power/Phase feature offers the best performance by a large margin, outperforming the earlier state-of-the-art frequency-domain features. Thus, in the further frequency-domain evaluations, we consider only the FR-Power/Phase feature representation.
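As an illustration of how statistics of the kind reported in Table II can be obtained from per-sample position estimates, consider the following minimal sketch. The function names, the toy coordinates, and the linear-interpolation percentile convention are assumptions made for this example and are not details given in the article.

```python
import math

def euclidean_errors(estimates, truths):
    """Per-sample 2D Euclidean positioning error in meters."""
    return [math.hypot(ex - tx, ey - ty)
            for (ex, ey), (tx, ty) in zip(estimates, truths)]

def percentile(values, q):
    """q-th percentile (0..100), linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of empty data")
    rank = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    frac = rank - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac

def summarize(errors):
    """Median, mean, 80th and 95th percentile of the errors."""
    return {
        "median": percentile(errors, 50),
        "mean": sum(errors) / len(errors),
        "p80": percentile(errors, 80),
        "p95": percentile(errors, 95),
    }

# Toy data (hypothetical), split by a per-sample LoS flag
truths = [(0.0, 0.0), (10.0, 0.0), (20.0, 5.0), (30.0, 5.0)]
estimates = [(3.0, 4.0), (10.0, 1.0), (20.0, 5.0), (36.0, 13.0)]
is_los = [True, True, False, False]

errors = euclidean_errors(estimates, truths)
los_stats = summarize([e for e, los in zip(errors, is_los) if los])
nlos_stats = summarize([e for e, los in zip(errors, is_los) if not los])
```

Reporting the mean together with the median and the upper percentiles, as done in the tables, makes heavy-tailed error distributions visible: a few large outliers raise the mean and the 95th percentile while leaving the median nearly unchanged.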

IV-C2 Results with Time-Domain Features

Next, we evaluate the positioning performance when utilizing the different time-domain features (path-ToF, path-RP, and path-AoA) as the input data. We also evaluate combinations of the features, while the model can consider up to 5 dominant multipath components above the 160 dB path-loss threshold.

Fig. 5b visualizes the achieved positioning results, showing that the proposed combinations of path-ToF and AoA or path-ToF, RP and AoA are the two best performing aggregate features. The results also suggest that the path-RP measurement provides less relevant information to the model than the path-ToF, which the model can directly interpret as normalized pseudo-range measurement. This can be seen by comparing the individual features (path-ToF vs. path-RP), as well as the cases where they are combined with path-AoA.

TABLE III: Baseline performance results: time-domain features, dense snapshot NN, no uncertainties
Feature     path-ToF       path-RP        path-AoA      path-ToF+AoA    path-RP+AoA    path-ToF+RP+AoA
Error [m]   LoS    NLoS    LoS    NLoS    LoS   NLoS    LoS    NLoS     LoS   NLoS     LoS    NLoS
Median      2.74   2.19    2.96   4.49    1.10  1.67    0.54*  1.10*    1.09  1.73     0.71   1.21
Mean        11.08  11.14   10.17  13.50   2.79  5.03    1.23*  3.61     1.91  4.29     1.32   2.63*
80th pc     8.59   6.05    6.35   12.51   2.57  3.96    1.35*  2.33*    2.45  4.16     1.67   2.71
95th pc     45.26  40.84   29.91  53.60   8.48  14.92   3.77   6.86     5.31  14.88    3.44*  6.73*
* Best value for the given statistic and scenario

The impacts of the features as well as the standalone performance in LoS/NLoS are summarized in Table III, which also highlights the best-performing features in each scenario. The table shows that the combination of all features (path-ToF+RP+AoA) and path-ToF+AoA offer the best results across all statistics, whereas the corresponding performance of path-RP+AoA already lags behind. When evaluating the individual features, the path-AoA provides high-accuracy positioning with less than 2 m median positioning error in NLoS, as it can effectively capture the propagation patterns within the given deployment. The path-RP and path-ToF provide, in turn, significantly poorer performance as individual features, especially when considering the higher percentile errors. These results thus clearly demonstrate the value of the directional measurements. Additionally, when compared to the results presented in Table II, a relative performance improvement can be observed, which we credit to the stronger interpretability of time-domain measurements as model inputs compared to the frequency-domain CSI features. Notably, meter-scale positioning accuracy can be reached through the time-domain features also in NLoS.

IV-C3 Impact of Feature Granularity

Next, we assess and compare the performance of the snap-shot NN model while varying the granularity or sparsity of the input measurements. We again separate the testing dataset into LoS and NLoS parts, and first evaluate the frequency-domain data as resource block-level (RB-level) features, and their mean values across the RBs as the bandwidth-level (BW-level) features. The RB-level features are obtained per RB, which contains 12 subcarriers with a subcarrier spacing of 120 kHz, thus representing a bandwidth of 1.44 MHz per measurement. The BW-level feature considers the 14.4 MHz bandwidth across 10 RBs. Technically, the full feature (RB-level) dataset consists of 960 features per sample (real and imaginary part, 10 RBs, 16 beams, 3 gNBs), while the BW-level feature dataset contains only 96 features per sample (2 × 16 × 3). The purpose is to investigate whether the sparser feature representations, which enable simpler and thus faster models, are capable of achieving performance competitive with the non-sparse data or full features.
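The bandwidth and feature-count arithmetic above can be checked with a few lines of code. The sketch below uses only the numbers quoted in the text; the constant names and the helper `bw_level_from_rb_level` are hypothetical and merely illustrate the averaging across RBs.

```python
# Numerology quoted in the text
SUBCARRIERS_PER_RB = 12
SUBCARRIER_SPACING_KHZ = 120
NUM_RBS = 10
NUM_BEAMS = 16
NUM_GNBS = 3
VALUES_PER_MEASUREMENT = 2  # real and imaginary part

# Bandwidth covered by one RB and by all RBs together
rb_bandwidth_mhz = SUBCARRIERS_PER_RB * SUBCARRIER_SPACING_KHZ / 1000
total_bandwidth_mhz = NUM_RBS * rb_bandwidth_mhz

# Feature counts per sample
rb_level_features = VALUES_PER_MEASUREMENT * NUM_RBS * NUM_BEAMS * NUM_GNBS
bw_level_features = VALUES_PER_MEASUREMENT * NUM_BEAMS * NUM_GNBS

def bw_level_from_rb_level(rb_values):
    """Average the per-RB values of one beam into a single BW-level value."""
    return sum(rb_values) / len(rb_values)
```

The BW-level representation thus reduces the input dimension by exactly the number of RBs, here a factor of 10.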

Fig. 6a visualizes the empirical cumulative distribution functions of the positioning errors, in the LoS and NLoS regions, when utilizing the RB-level and BW-level data as the FR-Power/Phase features. We can observe that in the LoS scenario, the BW-level features provide somewhat reduced performance compared to the full RB-level features. In the NLoS scenario, the RB-level features as the input again outperform the BW-level features. In general, particularly in NLoS where the channel geometry is more complex, the wider array of inputs can support the model in extracting more relevant positioning information thus leading to an improved performance. On the other hand, one can also conclude that the BW-level features provide a well-working solution with large reductions in the model complexity.
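For reference, an empirical cumulative distribution function of the kind used in these comparisons can be computed as in the following minimal sketch. The error values are hypothetical placeholders and do not correspond to the results of the article.

```python
def ecdf(errors):
    """Return the sorted errors and the fraction of samples at or below each."""
    ordered = sorted(errors)
    n = len(ordered)
    return ordered, [(i + 1) / n for i in range(n)]

def fraction_below(errors, threshold_m):
    """ECDF evaluated at a single error threshold in meters."""
    return sum(e <= threshold_m for e in errors) / len(errors)

# Hypothetical positioning errors in meters for two feature granularities
rb_level_errors = [0.8, 1.5, 2.1, 3.0, 9.5]
bw_level_errors = [1.2, 2.4, 3.3, 4.8, 12.0]

x_rb, y_rb = ecdf(rb_level_errors)
share_rb = fraction_below(rb_level_errors, 3.0)
share_bw = fraction_below(bw_level_errors, 3.0)
```

A curve that lies further to the left, or equivalently a larger share of samples below a given threshold, indicates the better-performing feature.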

Figure 6: ECDFs of the positioning errors on the test data, (a) when evaluating RB-level and BW-level FR-Power/Phase data, and (b) when evaluating path-ToF+AoA and path-ToF+RP+AoA features with either dominant path only or with all detected paths. The figures visualize the results of LoS and NLoS regions separately using a dense snap-shot NN model with no measurement uncertainties.

Similarly, we next evaluate the impact of the time-domain feature granularity. Earlier, we already concluded that the path-ToF+AoA and path-ToF+RP+AoA features provide the best performance, thus these features are utilized also here. In the following, we distinguish and compare between utilizing only the dominant multipath component (in terms of power) and all available multipath components as the input features.
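Selecting the dominant multipath component in terms of power can be sketched as follows. The `Path` container, its field names and units, and the ordering of the values in the feature list are assumptions made for illustration only; the article does not specify the data layout.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Path:
    tof_ns: float   # path-wise time-of-flight
    rp_dbm: float   # path-wise received power
    aoa_deg: float  # path-wise angle-of-arrival

def dominant_only(paths: List[Path]) -> List[Path]:
    """Keep only the strongest detected path, or nothing if none was detected."""
    if not paths:
        return []
    return [max(paths, key=lambda p: p.rp_dbm)]

def as_features(path: Path, use_rp: bool) -> List[float]:
    """Flatten one path into path-ToF+AoA or path-ToF+RP+AoA values."""
    if use_rp:
        return [path.tof_ns, path.rp_dbm, path.aoa_deg]
    return [path.tof_ns, path.aoa_deg]

# Hypothetical paths detected at one gNB
detected = [Path(120.0, -95.0, 30.0), Path(150.0, -88.0, -12.0), Path(300.0, -110.0, 75.0)]
strongest = dominant_only(detected)
features = as_features(strongest[0], use_rp=True)
```

Note that the strongest path is not necessarily the earliest one, which is why the selection is made on power rather than on time-of-flight.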

The achieved performance results are depicted in Fig. 6b, from which we can draw the following observations. The model performance actually improves when only the dominant multipath component is used as the input. This applies to both LoS and NLoS regions, and the impact is particularly clear when the path-ToF+RP+AoA feature case is considered. Additionally, Fig. 6b shows that especially the outlier performance (the highest 10% of errors) is clearly improved, particularly in the LoS regions. Overall, the results in Fig. 6b demonstrate that very high positioning performance can be achieved also in NLoS given that proper path features are utilized.

IV-D Numerical Results with Temporal Sequence Models

In this section, we evaluate the selected frequency-domain and time-domain CSI features, namely RB-level FR-Power/Phase, path-ToF+RP+AoA with the dominant component, and path-ToF+AoA with the dominant component, in the spirit of vehicular user tracking. For this purpose, we utilize the novel temporal sequence-processing NN model proposed and described in Section III-C, while estimating the UE position, UE velocity and UE heading simultaneously. Additionally, and importantly, we now also consider the realistic uncertainties in all considered measurements and the corresponding features, as introduced in Section IV-B.

Fig. 7 visualizes the sequence-based model performance for the different considered features while also explicitly comparing the cases without and with measurement uncertainties, hence providing valuable insight into the NN model generalization capabilities. In the uncertainty-free scenario, time-domain features clearly outperform the frequency-domain ones, similar to the earlier results with snap-shot models. In the practical case where the uncertainties are present in the data, the performance with the time-domain path-ToF+AoA and path-ToF+RP+AoA features deteriorates to a certain extent, while the performance gap between the LoS and NLoS scenarios also interestingly disappears. The frequency-domain FR-Power/Phase features provide essentially the same distributions for the LoS and NLoS positioning errors, while even outperforming the uncertainty-free model in NLoS. These findings highlight the generalization properties of the NN model, as the NLoS scenario itself is a source of additional uncertainties, as can be seen in Fig. 2c already. To this end, Fig. 7 demonstrates that the increased amount of uncertainty in the data does not necessarily lead to the degradation of performance, given that the model is trained on a sufficient amount of data containing such uncertainties.

Figure 7: Distributions of the positioning errors with sequence-processing NN model without (w/o) and with (w/) feature uncertainties along the LoS and NLoS regions. The box center denotes the median value, box edges the 25th and 75th percentiles, while the whiskers mark the 5th and 95th percentiles.

To further study the achievable performance of the proposed sequence-processing model and the channel features, the positioning, speed, and heading estimation performance is assessed next. We also compare the proposed sequence model’s performance against selected benchmark solutions, namely the NN model from our initial work in [14] (denoted hereafter as the SPAWC benchmark) and an EKF-based robust Bayesian tracking algorithm. The SPAWC model’s LSTM architecture, parameters, and training setup are as described in [14], whereas the training and testing data are the same as for all other methods described in this article, such that fair comparisons can be obtained. The SPAWC benchmark utilizes path-ToF+AoA time-domain features as the inputs.

The considered EKF benchmark, in turn, utilizes ToF and AoA measurements from each gNB with LoS connection, while assuming ideal LoS detection such that the EKF performance is the best possible in all LoS locations. In the absence of any LoS link, only the prediction stage of the EKF is conducted. The implementation of the used state-transition model and observation model, together with the related Jacobians and process covariance matrix, follows the descriptions given in [11]. Consequently, the used EKF state vector comprises the UE position in x- and y-coordinates and the UE speed in x-y directions. For each track, the UE state vector is initialized with a perfect state vector estimate in order to provide the best available performance for the EKF benchmark results. However, the performance is evaluated only after the first LoS link is obtained, so that the unrealistic prediction during NLoS condition with the perfect initialization is avoided. After a brief optimization of the EKF parameters, the std of ToF and AoA, included in a diagonal measurement covariance matrix, are defined as 50 ns and 15 deg, respectively. In addition, the std of the velocity noise, used in the process covariance matrix, as defined in [11], is set as 8 m/s.
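To make the structure of such a benchmark concrete, the following is a minimal sketch of one EKF iteration with the state vector and the noise levels quoted above. It is not the implementation of [11]: the constant-velocity motion model, the particular form of the process covariance, the conversion of ToF into a pseudo-range in meters, and the definition of the AoA as the azimuth from the gNB towards the UE are all assumptions made for this example, as are the function names.

```python
import numpy as np

C = 299_792_458.0             # speed of light [m/s]
SIGMA_RANGE = C * 50e-9       # 50 ns ToF std from the text, expressed as range [m]
SIGMA_AOA = np.deg2rad(15.0)  # 15 deg AoA std from the text [rad]
SIGMA_V = 8.0                 # velocity noise std from the text [m/s]

def predict(x, P, dt):
    """Prediction for the state [x, y, vx, vy] (assumed constant-velocity model)."""
    F = np.eye(4)
    F[0, 2] = F[1, 3] = dt
    I2 = np.eye(2)
    # Assumed white-noise-acceleration form of the process covariance
    Q = SIGMA_V**2 * np.block([[dt**3 / 3 * I2, dt**2 / 2 * I2],
                               [dt**2 / 2 * I2, dt * I2]])
    return F @ x, F @ P @ F.T + Q

def update(x, P, tof_s, aoa_rad, gnb_xy):
    """Update with the ToF and AoA measured at one gNB with a LoS connection."""
    dx, dy = x[0] - gnb_xy[0], x[1] - gnb_xy[1]
    d = np.hypot(dx, dy)
    predicted = np.array([d, np.arctan2(dy, dx)])
    # Jacobian of [range, azimuth] with respect to the state
    H = np.array([[dx / d, dy / d, 0.0, 0.0],
                  [-dy / d**2, dx / d**2, 0.0, 0.0]])
    R = np.diag([SIGMA_RANGE**2, SIGMA_AOA**2])
    innovation = np.array([C * tof_s, aoa_rad]) - predicted
    innovation[1] = (innovation[1] + np.pi) % (2 * np.pi) - np.pi  # wrap the angle
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ innovation, (np.eye(4) - K @ H) @ P

def ekf_step(x, P, dt, los_measurements):
    """Predict, then update once per LoS gNB; prediction only if there is none."""
    x, P = predict(x, P, dt)
    for tof_s, aoa_rad, gnb_xy in los_measurements:
        x, P = update(x, P, tof_s, aoa_rad, gnb_xy)
    return x, P
```

Passing an empty `los_measurements` list reproduces the prediction-only behavior described above for the case where no LoS link is available.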

The positioning performance of the different models is visualized and compared in Fig. 8a, showing significantly improved positioning performance through the proposed solution compared to the two benchmarks. While being evaluated with the same data, the proposed time- and frequency-domain CSI features combined with the proposed sequence processing engine achieve around 2 m median positioning error, compared to 5 m of the SPAWC benchmark and 12 m of the EKF benchmark. The achieved performance improvement stems from the combination of the novel feature engineering and the carefully crafted sequence-based NN processing system. Furthermore, besides estimating the UE location, incorporating the estimation of UE heading and velocity leads to reduced estimation variance through stabilization between individual quantities. Compared to the EKF model, which is built on the assumption of LoS geometry and unbiased measurements, the proposed approach can work in both LoS and NLoS conditions and deal with biased measurements, such as the ones encountered with the discretized path-AoA feature.

Figure 8: ECDFs of the positioning errors, in (a), and the split between the LoS and NLoS samples, in (b), with realistic feature uncertainties. The proposed sequence processing NN model with three different channel features is compared against two benchmark solutions.

Fig. 8b provides the corresponding results when distinguishing between the LoS and NLoS regions. The ECDFs show that with the uncertainties within the received signals, the gap between the LoS and NLoS performance with the proposed solution is strongly suppressed. For the NN model, the NLoS scenario itself represents an uncertainty of its own, which only complements the ones in the input data. While time-domain CSI features offer LoS-agnostic results, there is still a small performance gap when utilizing the frequency-domain data. The SPAWC benchmark provides consistent results in the lower parts of the distribution, but the distribution for the NLoS samples contains significantly higher errors and outliers. The positioning performance of the EKF in LoS suffers from the discretization of the path-AoA, leading to an increasing positioning error with the increasing distance from the LoS gNB(s). In NLoS, the EKF follows the direction of the last LoS sample, leading to unreliable estimation. As noted already in the Introduction, there are Bayesian filtering or other model-based methods deliberately crafted for NLoS scenarios, such as [9, 13]. However, such methods possess very high computational complexity and are typically limited to single-bounce scenarios, while in the evaluation environment considered in this article severe multi-bounce phenomena can occur.

Fig. 9a and Fig. 9b visualize the speed estimation and tracking performance in the same scenario as the positioning evaluation above, yet excluding the SPAWC benchmark, which does not facilitate speed or heading estimation. The results show that the speed estimation error of the proposed model and features is in the majority of situations lower than 1 m/s, regardless of the input feature or LoS availability. The frequency-domain feature FR-Power/Phase has a slightly higher estimation error in NLoS. In comparison, the EKF is subject to significantly higher estimation error even in LoS, while in NLoS it loses the tracking capabilities completely.

Figure 9: ECDFs of the speed estimation errors, in (a), and the split between the LoS and NLoS samples, in (b), with realistic feature uncertainties. The proposed sequence processing NN model with three different channel features is compared against the EKF benchmark solution.

Furthermore, as shown in Fig. 10a and Fig. 10b, the UE heading is estimated correctly in the majority of samples, with less than 1° error in more than 75% of the samples. Interestingly, the NLoS heading estimates are even more accurate than the ones with an unobstructed radio link to the gNB. This phenomenon occurs when the UE starts turning and the heading rapidly changes, while the data, such as discretized path-AoA, remain unaffected by the change. The model is not capable of instantaneously reacting due to the uncertainties included within the data.
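When evaluating heading errors, the angular wrap-around has to be handled so that, for example, estimates of 359° and 1° are treated as 2° apart. The sketch below shows one common way of doing this, together with the conversion from x-y velocity components to speed and heading. The heading convention (counter-clockwise from the x-axis) and the function names are assumptions for this example; the article does not state how the error is computed.

```python
import math

def speed_and_heading(vx, vy):
    """Speed [m/s] and heading [deg], measured counter-clockwise from the x-axis."""
    return math.hypot(vx, vy), math.degrees(math.atan2(vy, vx))

def heading_error_deg(estimated_deg, true_deg):
    """Smallest absolute angular difference, within [0, 180] degrees."""
    diff = (estimated_deg - true_deg + 180.0) % 360.0 - 180.0
    return abs(diff)

speed, heading = speed_and_heading(3.0, 4.0)
wrapped_error = heading_error_deg(359.0, 1.0)
```

Without the wrap-around, the same pair of headings would be scored as a 358° error and would dominate the upper tail of the ECDF.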

Figure 10: ECDFs of the heading estimation errors, in (a), and the split between the LoS and NLoS samples, in (b), with realistic feature uncertainties. The proposed sequence processing NN model with three different channel features is compared against the EKF benchmark solution.

For a comprehensive performance assessment, we next briefly address the potential impact of the uncertainties in the training data positioning labels. We model the uncertainty through AWGN and set the corresponding std in the x-y coordinates to 5 m. The obtained results are illustrated in Fig. 11. As can be observed, additional uncertainties in the labels force the model to generalize along the path. Consequently, the model is capable of tracking more effectively in LoS, while in NLoS, the positioning results include an increased number of outliers. These results show that the proposed combination of features and the sequential NN model can diminish the effects of significant uncertainties in both features and labels.
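The label perturbation described above amounts to adding independent zero-mean Gaussian noise to each coordinate of the training labels. A minimal sketch, in which the function name, the seed handling, and the toy labels are assumptions for illustration, could look as follows.

```python
import random

def add_label_noise(labels, std_m=5.0, seed=None):
    """Add independent zero-mean Gaussian noise to the x and y label coordinates."""
    rng = random.Random(seed)
    return [(x + rng.gauss(0.0, std_m), y + rng.gauss(0.0, std_m))
            for x, y in labels]

# Hypothetical ground-truth positions in meters
clean_labels = [(0.0, 0.0), (10.0, 5.0), (20.0, 10.0)]
noisy_labels = add_label_noise(clean_labels, std_m=5.0, seed=0)
```

Only the training labels are perturbed in this way; the evaluation would still be carried out against the unperturbed positions.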

Figure 11: Distributions of the positioning errors with sequence-processing NN model with feature uncertainties along the LoS and NLoS regions, when considering perfect position labels (feat.) and the labels with the uncertainty modeled through AWGN with std of 5 m along x-y coordinates (feat. + labels).

IV-E Deployment Specialization through Transfer Learning

Being able to adapt the NN model to environmental changes is critical in real-world applications, as any given model is essentially restricted to the environment it was trained in. This is especially so if no direct information about the environment, such as gNB locations, beam directions, or buildings, is included in the feature vector. To address this practical challenge, we next explore the capabilities of transfer learning (TL), which relies on re-training the model from one scenario to another. For presentation simplicity, we consider the snapshot model with FR-Power/Phase features and assume no feature or label uncertainty. The models utilize the same training parameters as described in Section III-C2, with the exception of applying the early stopping mechanism also to the first training loop. In addition, for the considered new gNB deployments, we limit the dataset size to only 10% of the original to demonstrate the feasibility of the TL approach also with small data sizes.

IV-E1 Scenario 1

We first consider relocating a single gNB within the deployment, thus altering the signal propagation geometry, as visualized in Fig. 12a. For the evaluation, we consider three models, all sharing the same architecture: an original model, a TL model, and a new model trained from scratch. The original model was trained on a full dataset from the prior deployment, whereas the TL model and the new model were trained on the data from the altered scenario. Moreover, the newly trained model was initialized with random weights, while the TL model was initialized with the weights of the original model before (re-)training. In addition, in the TL model, only the first layer after the input was set to “trainable”, while the rest of the model remains frozen.
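The TL setup described above, i.e., initializing from the original weights and re-training only the first layer, can be sketched with a small dense network. The sketch is framework-free and purely illustrative: the layer sizes, the ReLU activation, the mean-squared-error loss, the plain gradient-descent update, and the random toy data are all assumptions and do not reflect the architecture or training parameters of the article.

```python
import copy
import numpy as np

def init_model(sizes, rng):
    """Random initialization, as used for a model trained from scratch."""
    return [{"W": rng.normal(0.0, 0.1, (m, n)), "b": np.zeros(n)}
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(model, x):
    """Dense network with ReLU on hidden layers; returns all layer activations."""
    acts = [x]
    for i, layer in enumerate(model):
        z = acts[-1] @ layer["W"] + layer["b"]
        acts.append(z if i == len(model) - 1 else np.maximum(z, 0.0))
    return acts

def sgd_step(model, x, y, lr, trainable):
    """One MSE gradient step that updates only the layers flagged as trainable."""
    acts = forward(model, x)
    grad = 2.0 * (acts[-1] - y) / len(x)
    for i in reversed(range(len(model))):
        if i < len(model) - 1:
            grad = grad * (acts[i + 1] > 0)  # ReLU derivative
        gW, gb = acts[i].T @ grad, grad.sum(axis=0)
        grad = grad @ model[i]["W"].T        # back-propagate with pre-update weights
        if trainable[i]:
            model[i]["W"] -= lr * gW
            model[i]["b"] -= lr * gb

rng = np.random.default_rng(0)
original = init_model([8, 16, 16, 2], rng)  # stands in for the trained original model
tl_model = copy.deepcopy(original)          # TL: start from the original weights
trainable = [True, False, False]            # only the first layer is re-trained

x_new = rng.normal(size=(32, 8))            # toy data from the altered deployment
y_new = rng.normal(size=(32, 2))
for _ in range(10):
    sgd_step(tl_model, x_new, y_new, lr=0.01, trainable=trainable)
```

Gradients are still propagated through the frozen layers, since the first layer needs them, but the frozen weights themselves are never modified.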

The positioning error distributions of the considered models are visualized in Fig. 12b. Although the original model is not trained with the data from the altered deployment scenario, its positioning error is below 10 m in approximately 40% of the samples. However, most of the accurately localized samples of the original model are found in the southeast part of the deployment, where the relocated gNB is not detected, and thus the environment seems essentially unchanged. The TL model clearly outperforms the newly trained model, especially when considering errors above the 40th percentile. The model convergence during the training is visualized in Fig. 12c, where both training and validation losses across epochs are shown for the TL and the newly trained models. While the TL model is able to converge in fewer than 80 epochs, the normally trained model requires more than 200 epochs (not explicitly visible in the figure) to obtain the final weights. This highlights the training efficiency of the TL model.

Figure 12: (a) Scenario 1 illustration where a single gNB is relocated. (b) Positioning performance comparison between the model trained on the original data, adapted model via TL, and a new model trained from scratch. (c) The training histories of the TL model and the newly trained model.

IV-E2 Scenario 2

We next consider the more challenging case of relocating all three gNBs, as shown in Fig. 13a. In this case, the changes in the gNB coordinates are up to hundreds of meters, resulting in significantly altered radio propagation characteristics. In Fig. 13b, the positioning errors are shown following similar model cases as in Scenario 1. Due to the considerably changed gNB locations, the performance of the original model clearly collapses. Furthermore, the TL model provides clearly better performance compared to the new model trained from scratch. In Fig. 13c, it can be seen that besides providing the best positioning performance, the TL model significantly accelerates the training, enabling fast model convergence in only some 50 epochs. These results show that TL is an efficient approach to adapt the NN model to a new, previously unseen scenario with greatly reduced effort and relaxed requirements on the availability of data.

Figure 13: (a) Scenario 2 illustration where all gNBs are relocated. (b) Positioning performance comparison between the model trained on the original data, adapted model via TL, and a new model trained from scratch. (c) The training histories of the TL model and the newly trained model.

IV-F Notes on Complexity

Model complexity is an important aspect of any NN solution, and notable efforts are commonly invested to reach beneficial performance-complexity trade-offs. For comparability and presentation simplicity, we focus here primarily on the parameter counts and input sizes as the main complexity-related metrics. To this end, Table IV compares the proposed solutions with the ones from the referred literature noting also the model structures.

TABLE IV: Complexity assessment of related works
Reference    ML model           Input size   Num. parameters
[15]         CNN                18 432       13M
[18]         DNN                224          2 331 650
[23]         DNN                N/A          2.5M
[25]         CNN                2 700        9M
[28]         DNN                18 432       9 105 346
[29]         CNN                15 440       2.6M
Our models   Feature            Input size   Num. parameters
Snapshot     path-ToF+AoA       12           1 058 306
Snapshot     path-ToF+RP+AoA    15           1 067 010
Snapshot     FR-Power/Phase     960          1 543 682
Sequence     path-ToF+AoA       50 × 12*     1 667 192
Sequence     path-ToF+RP+AoA    50 × 15*     1 670 264
Sequence     FR-Power/Phase     50 × 960*    2 637 944

* Processing 50 snapshot samples in a sequence

As can be observed, both the convolutional and fully connected models utilized across the referred works are few-million-parameter models with input sizes of up to 18 432, corresponding to manageable computational complexity when training on high-performance machines. In comparison, the path-ToF+AoA model, which achieved meter-level positioning accuracies, requires only 12 inputs and around one million parameters, while increasing the input size to 960 with the FR-Power/Phase feature results in 1 543 682 trainable parameters. Our sequence-processing models, despite their extra layers, retain relatively low parameter counts, resulting in feasible computational requirements. Overall, we can conclude that while outperforming the prior-art methods in positioning accuracy, the proposed models are also computationally feasible, and even lighter in complexity than many reference models.
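To illustrate how the input size affects the parameter count of a fully connected model, the sketch below counts weights and biases for given layer widths and relates this to two rows of Table IV. The helper `dense_params` and the layer widths passed to it are hypothetical. The observation that the difference between the two quoted parameter counts equals 948 × 512 is plain arithmetic on the table values; that this corresponds to a 512-unit first dense layer is an inference, not something stated in the text.

```python
def dense_params(layer_sizes):
    """Number of weights plus biases in a fully connected network."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Parameter counts quoted in Table IV for two snapshot models
PARAMS_TOF_AOA = 1_058_306         # input size 12
PARAMS_FR_POWER_PHASE = 1_543_682  # input size 960

extra_params = PARAMS_FR_POWER_PHASE - PARAMS_TOF_AOA
extra_inputs = 960 - 12
params_per_extra_input = extra_params / extra_inputs

# Hypothetical two-layer example: only the first layer depends on the input size
small = dense_params([12, 512, 2])
large = dense_params([960, 512, 2])
```

In a dense model, enlarging the input therefore adds parameters only in the first layer, which is why an 80-fold increase in input size raises the total count by less than 50% here.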

IV-G Discussion on Synthetic vs. Real-World Training Data

While we evaluate the performance of the methods and models on ray-tracing data in this article, training the models with artificially created synthetic data can be a feasible option also for true deployments and applications of large-scale NN-operated systems. In general, the acquisition of synthetic data is cheaper and faster than performing exhaustive site surveys and can approximate reality with increasing accuracy and fidelity [50]. Considering the availability of synthetic data, there are many possibilities to organize the model training in practice. For example, initial training can be performed already at the factory based on synthetic data from a specific intended operation environment, and then the model can be fine-tuned on a limited set of real-world measurements. Such real-world labelled data can be obtained through different crowdsourcing or crowdsensing arrangements, as discussed further below. On the other hand, training at the factory can be more generic and cover numerous environments, while specialization to a specific environment can be managed with a TL approach using synthetic data and/or real-world measurements. Similarly, in case of any changes in the environment, it is possible to update the model via TL by utilizing re-generated site-specific synthetic data and/or newly obtained real-world measurements.

Employing crowdsourcing campaigns, where the surveying software is offered to the public to perform the measurements, is one way of arranging real-world labelled data in practice. One recognized challenge is that such an approach is often biased by the human factor, such as inaccurate manual labeling. As another alternative, while crowdsourcing requires the user to perform an action to obtain the data, crowdsensing is fully automated and unsupervised by the user. Consequently, it can yield a massive volume of data, though at the cost of potentially missing labels or other quality-related challenges. Numerous solutions exist to curate and filter such data, although some challenges still remain [51].

In general, the topic of synthetic data is currently explored by IEEE [52] and can become a critical link enabling real-world systems driven by their digital twins [53]. This is a key component in positioning-driven studies, where obtaining realistic performance evaluations requires consideration of whole network deployments with detailed physical-layer processing and measurement capabilities. End-to-end simulated results thus play a crucial role in the positioning system analysis before conclusive validation through experimental field tests.

V Conclusions

This article addressed cellular network-based user localization and tracking in challenging NLoS environments, with specific emphasis on 5G mmWave deployments and urban vehicular systems. We first described the UL/DL measurements available for positioning purposes, together with their acquisition in 5G mmWave networks. We then derived and proposed efficient frequency-domain CSI features, most notably utilizing the relative phases and powers of the received signal across the neighboring resource blocks. As time-domain CSI data, we exploited the multipath components and proposed different aggregate features combining time-of-flight, angle-of-arrival, and received path-wise powers. Deep learning ML architectures were then described, covering not only dense snapshot models but also sequence-processing NN models harnessing the temporal correlations of the features in vehicular systems.

Realistic numerical evaluations in a large-scale LoS-obstructed urban environment with moving vehicles were provided, building on full ray-tracing-based propagation modeling on the METIS Madrid map at 28 GHz. The baseline results without feature uncertainties show that the frequency-domain CSI in the form of RB-level relative phases and powers allows for very good and robust positioning performance, in both LoS and NLoS, while even further enhanced performance can be obtained through the time-domain features when combining multipath times-of-flight and angles-of-arrival. The results also show that dominant multipath feature combinations are sufficient, or even favorable, for robust positioning. Additionally, when considering practical levels of feature measurement uncertainties, together with the sequence-processing NN models, robust positioning in both LoS and NLoS was still shown to be feasible. Finally, the important practical aspect of dealing with gNB deployment differences between the training and inference phases was addressed. It was shown that such environment-related uncertainties can be addressed and alleviated through transfer learning.

Overall, the provided numerical results clearly demonstrate that the proposed methods harnessing the novel feature engineering and sequence processing neural network models outperform the state-of-the-art, being able to facilitate 1-2 m median positioning accuracy even in deep-NLoS regions with feasible parameter counts. Our future work will focus on exploring the opportunities with obtaining and processing positioning measurements from co-existing C-band and mmWave cellular networks, as well as with further improving the positioning accuracy and reliability through sensor fusion.

References

  • [1] H. Holma, A. Toskala, and T. Nakamura, 5G Technology: 3GPP New Radio.   Wiley, 2019.
  • [2] E. Dahlman, S. Parkvall, and J. Skold, 5G NR: The Next Generation Wireless Access Technology, 2nd ed.   Elsevier, 2020.
  • [3] R. Mendrzik, F. Meyer, G. Bauch, and M. Z. Win, “Enabling Situational Awareness in Millimeter Wave Massive MIMO Systems,” IEEE J. Sel. Topics Signal Process., vol. 13, no. 5, pp. 1196–1211, 2019.
  • [4] J. A. del Peral-Rosado, R. Raulefs, J. A. López-Salcedo, and G. Seco-Granados, “Survey of cellular mobile radio localization methods: From 1G to 5G,” IEEE Commun. Surveys Tuts., vol. 20, no. 2, pp. 1124–1148, 2017.
  • [5] M. Koivisto et al., “High-efficiency device positioning and location-aware communications in dense 5G networks,” IEEE Commun. Mag., vol. 55, no. 8, pp. 188–195, 2017.
  • [6] R. Di Taranto et al., “Location-Aware Communications for 5G Networks: How location information can improve scalability, latency, and robustness of 5G,” IEEE Signal Process. Mag., vol. 31, no. 6, pp. 102–112, Nov. 2014.
  • [7] K. Nagai, T. Fasoro, M. Spenko, R. Henderson, and B. Pervan, “Evaluating GNSS navigation availability in 3-D mapped urban environments,” in Proc. IEEE/ION PLANS, 2020, pp. 639–646.
  • [8] Y. Lu et al., “Bayesian filtering for joint multi-user positioning, synchronization and anchor state calibration,” IEEE Trans. Veh. Technol., pp. 1–16, 2023.
  • [9] Y. Ge et al., “A computationally efficient EK-PMBM filter for bistatic mmWave radio SLAM,” IEEE J. Sel. Areas Commun., vol. 40, no. 7, pp. 2179–2192, 2022.
  • [10] J. L. C. Villacrés, Z. Zhao, T. Braun, and Z. Li, “A particle filter-based reinforcement learning approach for reliable wireless indoor positioning,” IEEE J. Sel. Areas Commun., vol. 37, no. 11, pp. 2457–2473, 2019.
  • [11] M. Koivisto et al., “Joint device positioning and clock synchronization in 5G ultra-dense networks,” IEEE Trans. Wireless Commun., vol. 16, no. 5, pp. 2866–2881, 2017.
  • [12] K. Ko, W. Ahn, and W. Shin, “High-speed train positioning using deep kalman filter with 5G NR signals,” IEEE Trans. Intell. Transp. Syst., 2022.
  • [13] J. Talvitie, M. Valkama, G. Destino, and H. Wymeersch, “Novel algorithms for high-accuracy joint position and orientation estimation in 5G mmwave systems,” in Proc. IEEE Globecom Workshops, 2017, pp. 1–7.
  • [14] R. Klus, J. Talvitie, J. Vinogradova, J. Torsner, and M. Valkama, “Machine learning based NLOS radio positioning in beamforming networks,” in Proc. IEEE SPAWC, 2022, pp. 1–5.
  • [15] X. Sun, C. Wu, X. Gao, and G. Y. Li, “Fingerprint-based localization for massive MIMO-OFDM system with deep convolutional neural networks,” IEEE Trans. Veh. Technol., vol. 68, no. 11, 2019.
  • [16] A. Zappone, M. Di Renzo, and M. Debbah, “Wireless networks design in the era of deep learning: Model-based, AI-based, or both?” IEEE Trans. Commun., vol. 67, no. 10, pp. 7331–7376, 2019.
  • [17] G. Revach et al., “KalmanNet: Neural network aided Kalman filtering for partially known dynamics,” IEEE Trans. Signal Process., vol. 70, pp. 1532–1547, 2022.
  • [18] R. Klus, J. Talvitie, and M. Valkama, “Neural network fingerprinting and GNSS data fusion for improved localization in 5G,” in Proc. Int. Conf. Localization and GNSS (ICL-GNSS), 2021, pp. 1–6.
  • [19] J. Torres-Sospedra et al., “A comprehensive and reproducible comparison of clustering and optimization rules in Wi-Fi fingerprinting,” IEEE Trans. Mobile Comput., 2020.
  • [20] J. Rojo et al., “Machine learning applied to Wi-Fi fingerprinting: The experiences of the ubiqum challenge,” in Proc. IPIN, 2019, pp. 1–8.
  • [21] R. Klus et al., “Transfer learning for convolutional indoor positioning systems,” in Proc. IPIN, 2021, pp. 1–8.
  • [22] M. M. Butt, A. Pantelidou, and I. Z. Kovács, “ML-assisted UE positioning: Performance analysis and 5G architecture enhancements,” IEEE Open Journal of Vehicular Technology, vol. 2, pp. 377–388, 2021.
  • [23] E. Gönültaş, E. Lei, J. Langerman, H. Huang, and C. Studer, “CSI-Based Multi-Antenna and Multi-Point Indoor Positioning Using Probability Fusion,” IEEE Trans. Wireless Commun., vol. 21, no. 4, pp. 2162–2176, 2021.
  • [24] X. Wang, L. Gao, S. Mao, and S. Pandey, “CSI-based fingerprinting for indoor localization: A deep learning approach,” IEEE Trans. Veh. Technol., vol. 66, no. 1, pp. 763–776, 2016.
  • [25] H. Chen, Y. Zhang, W. Li, X. Tao, and P. Zhang, “ConFi: Convolutional neural networks based indoor Wi-Fi localization using channel state information,” IEEE Access, vol. 5, pp. 18 066–18 074, 2017.
  • [26] J. Xiao, K. Wu, Y. Yi, and L. M. Ni, “FIFS: Fine-grained indoor fingerprinting system,” in 2012 21st International Conference on Computer Communications and Networks (ICCCN), 2012, pp. 1–7.
  • [27] X. Wang, L. Gao, and S. Mao, “CSI Phase Fingerprinting for Indoor Localization With a Deep Learning Approach,” IEEE Internet Things J., vol. 3, no. 6, pp. 1113–1123, 2016.
  • [28] P. Ferrand, A. Decurninge, and M. Guillaud, “DNN-based localization from channel estimates: Feature design and experimental results,” in Proc. IEEE GLOBECOM, 2020, pp. 1–6.
  • [29] K. Gao, H. Wang, H. Lv, and W. Liu, “Toward 5G NR high-precision indoor positioning via channel frequency response: A new paradigm and dataset generation method,” IEEE J. Sel. Areas Commun., 2022.
  • [30] D. Lynch, L. Ho, M. MacDonald, and M. O’Neill, “Localisation in wireless networks using deep bidirectional recurrent neural networks,” in Proc. Int. Joint Conf. Neural Networks (IJCNN), 2020, pp. 1–8.
  • [31] Z. HajiAkhondi-Meybodi, M. Salimibeni, A. Mohammadi, and K. N. Plataniotis, “Bluetooth low energy and CNN-based angle of arrival localization in presence of Rayleigh fading,” in Proc. IEEE ICASSP, 2021, pp. 7913–7917.
  • [32] Y. Xie, L. Zhou, Y. Zhang, H. Huan, and Z. Zhang, “Simultaneous localization of scatterers and target user based on indoor prior information in NLOS environments,” IEEE Trans. Veh. Technol., vol. 71, no. 11, pp. 11 729–11 740, 2022.
  • [33] T. Feigl, E. Eberlein, S. Kram, and C. Mutschler, “Robust ToA-estimation using convolutional neural networks on randomized channel models,” in Proc. IPIN, 2021, pp. 1–8.
  • [34] 3GPP, “Study on channel model for frequencies from 0.5 to 100 GHz,” 3GPP, Tech. Rep. 38.901, 3 2022, version 17.0.0.
  • [35] M. M. Butt, A. Rao, and D. Yoon, “RF fingerprinting and deep learning assisted UE positioning in 5G,” in Proc. IEEE VTC-Spring, 2020, pp. 1–7.
  • [36] Y. Chen, J. Palacios, N. González-Prelcic, T. Shimizu, and H. Lu, “Joint initial access and localization in millimeter wave vehicular networks: a hybrid model/data driven approach,” in Proc. IEEE SAM, 2022, pp. 355–359.
  • [37] 3GPP, “Physical layer measurements,” 3GPP, Tech. Rep. 38.215, 1 2021, version 16.4.0.
  • [38] 3GPP, “Stage 2 functional specification of User Equipment (UE) positioning in NG-RAN,” 3GPP, Tech. Rep. 38.305, 12 2021, version 16.7.0.
  • [39] 3GPP, “NG Radio Access Network (NG-RAN); Stage 2 functional specification of User Equipment (UE) positioning in NG-RAN,” 3GPP, Tech. Rep. 38.305, 9 2022, version 16.8.0.
  • [40] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning.   MIT press, 2016.
  • [41] D. Hendrycks and K. Gimpel, “Gaussian error linear units (gelus),” arXiv preprint arXiv:1606.08415, 2016.
  • [42] L. Lu, Y. Shin, Y. Su, and G. E. Karniadakis, “Dying relu and initialization: Theory and numerical examples,” arXiv preprint arXiv:1903.06733, 2019.
  • [43] Y. Wang et al., “Transformer-based acoustic modeling for hybrid speech recognition,” in Proc. IEEE ICASSP, 2020, pp. 6874–6878.
  • [44] J. Xiao, X. Fu, A. Liu, F. Wu, and Z.-J. Zha, “Image De-raining Transformer,” IEEE Trans. Pattern Anal. Mach. Intell., 2022.
  • [45] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
  • [46] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [47] Remcom. Wireless InSite - 3D Wireless Prediction Software. Accessed: Jan 27, 2021). [Online]. Available: https://www.remcom.com/wireless-insite-em-propagation-software
  • [48] A. Rauch et al., “Fast algorithm for radio propagation modeling in realistic 3-D urban environment,” Advances in Radio Science, vol. 13, pp. 169–173, 11 2015.
  • [49] 3GPP, “NR; Requirements for support of radio resource management,” 3GPP, Tech. Rep. 38.133, 9 2022, version 16.13.0.
  • [50] Y. Assayag, H. Oliveira, E. Souto, R. Barreto, and R. Pazzi, “Indoor positioning system using synthetic training and data fusion,” IEEE Access, vol. 9, pp. 115 687–115 699, 2021.
  • [51] A. Capponi et al., “A survey on mobile crowdsensing systems: Challenges, solutions, and opportunities,” IEEE Communications Surveys & Tutorials, vol. 21, no. 3, pp. 2419–2465, 2019.
  • [52] “Synthetic data,” IEEE Standards Association, Mar 2023. [Online]. Available: https://standards.ieee.org/industry-connections/synthetic-data/
  • [53] A. Castellani, S. Schmitt, and S. Squartini, “Real-world anomaly detection by using digital twin systems and weakly supervised learning,” IEEE Trans. Ind. Informat., vol. 17, no. 7, pp. 4733–4742, 2020.