3GPP: 3rd Generation Partnership Project
5G: fifth generation
AE-ELM: auto-encoder extreme learning machine
AI: artificial intelligence
AoA: angle-of-arrival
AoD: angle-of-departure
AP: access point
AWGN: additive white Gaussian noise
BLE: Bluetooth Low Energy
BW: bandwidth
BW-level: bandwidth-level
BS: base station
CM: common model
CNN: convolutional neural network
CR: compression ratio
CSI: channel state information
CU: Centralized Unit
DL: downlink
DL-AoD: downlink angle-of-departure
DoA: direction-of-arrival
DNN: dense neural network
DU: Decentralized Unit
ECDF: empirical cumulative distribution function
EEG: electroencephalogram
EKF: extended Kalman filter
ELM: extreme learning machine
EM: Encoder Model
FFT: Fast Fourier Transform
FR: frequency response
FM: full model
FP: fingerprinting
FR-Complex: complex frequency response
FR-Power: power-domain frequency response
FR-Phase: relative phase frequency response
FR-Power/Phase: power and relative phase frequency response
FR1: Frequency Range 1
FR2: Frequency Range 2
HDim: HIDDEN_DIM
GDA: Generalized Discriminant Analysis
GELU: Gaussian error linear unit
gNB: gNodeB
GNSS: Global Navigation Satellite System
ICA: Independent Component Analysis
IoT: Internet-of-Things
IPS: indoor positioning system
KF: Kalman filter
$k$ -NN: $k$ -Nearest Neighbors
LBS: location-based services
LDA: Linear Discriminant Analysis
LMF: Location Management Function
LoS: line-of-sight
LR: learning rate
LSTM: long short-term memory
MAC: Media Access Control
MDT: Minimization of Drive Testing
MEE: mean Euclidean error
MIMO: multiple-input multiple-output
ML: machine learning
mmWave: millimeter-wave
MRTT: multi-round trip time
NLoS: non-line-of-sight
NN: neural network
NR: New Radio
OFDM: orthogonal frequency-division multiplexing
path-AoA: path-wise angle-of-arrival
path-RP: path-wise received power
PRS: positioning reference signal
path-ToF: path-wise time-of-flight
PHY: physical-layer
PCA: principal component analysis
PF: particle filter
PL: path-loss
QoE: quality of experience
QoS: quality of service
RB: resource block
RE: resource element
ReLU: rectified linear unit
RB-level: resource block-level
RF: radio frequency
RFID: radio frequency identification device
RMS: root mean square
RMSEE: root mean squared Euclidean error
RNN: recurrent neural network
RP: reference point
RRC: radio resource control
RS: recommender system
RSRP: reference signal received power
RSRQ: reference signal received quality
RSS: received signal strength
RSSI: received signal strength indicator
RTT: round trip time
RX: receiver
SLAM: simultaneous localization and mapping
SP: scattering point
SS: synchronization signal
SSB: synchronization signal block
SRS: sounding reference signal
SSS: secondary synchronization signal
std: standard deviation
SVM: support vector machine
SVD: singular value decomposition
TDoA: time-difference-of-arrival
TL: transfer learning
TLCNN: Transfer Learning Convolutional Neural Network
ToA: time-of-arrival
ToF: time-of-flight
t-SNE: T-distributed Stochastic Neighbor Embedding
TX: transmitter
UE: user equipment
UL: uplink
UL-AoA: uplink angle-of-arrival
Wi-Fi: IEEE 802.11 network
WLAN: wireless LAN
WSN: wireless sensors networks


Robust NLoS Localization in 5G mmWave Networks: Data-based Methods and Performance

Roman Klus (ORCID 0000-0002-0641-5931), Jukka Talvitie (ORCID 0000-0001-7685-7666), Julia Vinogradova (ORCID 0000-0001-8911-2065), Gabor Fodor (ORCID 0000-0002-2289-3159), Johan Torsner, and Mikko Valkama (ORCID 0000-0003-0361-0800)

Limited subset of early-stage results presented at IEEE SPAWC 2022 [14]. R. Klus, J. Talvitie, and M. Valkama are with Tampere University, Finland. J. Vinogradova and J. Torsner are with Ericsson Research, Helsinki, Finland. G. Fodor is with Ericsson Research and with KTH, Stockholm, Sweden. Data and codes openly available at https://doi.org/10.5281/zenodo.12204893

Abstract

Ensuring smooth mobility management while employing directional beamformed transmissions in 5G millimeter-wave networks calls for robust and accurate user equipment (UE) localization and tracking. In this article, we develop neural network-based positioning models with time- and frequency-domain channel state information (CSI) data in harsh non-line-of-sight (NLoS) conditions. We propose a novel frequency-domain feature extraction, which combines relative phase differences and received powers across resource blocks, and offers robust performance and reliability. Additionally, we exploit the multipath components and propose an aggregate time-domain feature combining time-of-flight, angle-of-arrival and received path-wise powers. Importantly, the temporal correlations are also harnessed in the form of sequence processing neural networks, which prove to be of particular benefit for vehicular UEs. Realistic numerical evaluations in large-scale line-of-sight (LoS)-obstructed urban environment with moving vehicles are provided, building on full ray-tracing based propagation modeling. The results show the robustness of the proposed CSI features in terms of positioning accuracy, and that the proposed models reliably localize UEs even in the absence of a LoS path, clearly outperforming the state-of-the-art with similar or even reduced processing complexity. The proposed sequence-based neural network model is capable of tracking the UE position, speed and heading simultaneously despite the strong uncertainties in the CSI measurements. Finally, it is shown that differences between the training and online inference environments can be efficiently addressed and alleviated through transfer learning.

Index Terms:

5G New Radio, channel state information, deep learning, non-line-of-sight, positioning, tracking, vehicular systems

I Introduction

Expanding to the millimeter-wave (mmWave) frequencies makes it possible to harness large channel bandwidths in the fifth generation (5G) New Radio (NR) mobile communication networks, which improves the network capacity, peak data rates, and latency characteristics compared to legacy systems [1, 2]. In such mmWave networks, beamforming active antenna arrays are a critical technology, allowing directional transmission and reception capabilities, thus improving the link budget while mitigating co-channel interference and providing the basis for angle-based cellular positioning.

In general, accurate real-time knowledge of the locations of the network UE is critical to ensure smooth and seamless mobility management, efficient handover management, and improved reliability of the radio link, while also allowing for location-based services (LBS) [3, 4, 5, 6]. Baseline UE localization builds commonly on Global Navigation Satellite System (GNSS) based approaches. However, terrestrial positioning utilizing 5G and other signals of opportunity is of increasing importance, and is also the main technical scope of this article. This is well motivated, as the availability of GNSS is known to be compromised not only indoors but also in outdoor urban areas [4, 7], while the large channel bandwidths and directional antenna systems deployed in 5G allow for accurate time- and angle-based measurements.

There is generally a wide selection of positioning methods available in the literature [4], covering both model-based Bayesian filtering approaches [8, 9, 10, 11, 12, 13] as well as data-driven machine learning (ML) based methods [14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]. The majority of the works focus on the LoS scenarios, where the channel includes a direct propagation path from the transmitter (TX) to the receiver (RX), and thus the UE location can be estimated geometrically by utilizing pseudo-ranges and/or angular information based on radio measurements. Good examples of such LoS-oriented 5G positioning works include [11, 12], building commonly on Bayesian filtering methods such as different variants of extended Kalman filter (EKF) and particle filter (PF). In case the LoS path is unavailable, geometry-based Bayesian filtering models still exist, e.g., [9, 10, 13]. Such methods are, however, typically limited to single-bounce scenarios with a single path per scattering point, and can thus easily become unreliable in realistic scattering environments while being also computationally heavy and complex [32, 33]. Therefore, UE positioning and tracking in NLoS or LoS-ambiguous scenarios call for novel solutions and models, capable of ensuring fast and reliable operation in realistic scattering environments with feasible real-time computational complexity. This is the main technical focus of this article, with a specific emphasis on vehicular systems in challenging urban environments, where NLoS scenarios commonly occur with realistic network deployments [34].

In this article, building on our initial work in [14], we propose and describe efficient ML based models for NLoS or LoS-obstructed positioning that offer robust performance, low operational complexity, good generalization properties, and wide architectural options. We harness the temporal correlation of the channel features in vehicular systems and focus on sequence processing neural network (NN) methods as the fundamental ML engine. We propose two alternative feature sets capable of describing the radio channel for the subsequent NN-based positioning, namely frequency-domain and time-domain CSI features. Additionally, we include the vehicle speed and heading as additional model outputs to enable efficient tracking using a single model.

ML-based positioning has been addressed in the recent literature, e.g. in [15, 14, 16, 17, 18, 20, 22, 23, 24, 25, 26, 27, 28, 21, 29, 30, 31]. To this end, received signal strength (RSS) measurements were adopted in [19, 20, 21], in indoor IEEE 802.11 network (Wi-Fi) fingerprinting context, while the corresponding 5G deployment was considered in [35]. The work in [24] proposed a CSI-based fingerprinting model that utilizes the amplitude response of the channel. Due to the classifier-like NN, the proposed system is, however, limited to small deployments. The authors of [15] developed an NN-based feature extractor with Wi-Fi CSI measurements considering the amplitude response only, followed by a $k$ -Nearest Neighbors ( $k$ -NN) positioning algorithm. Similar approaches building on channel amplitude response measurements were considered also in [25] and [26]. In [27], a Wi-Fi positioning approach utilizing the channel phase response as a feature was proposed. The method is, however, not suitable for large-scale scenarios due to the classifier-based NN, while the considered phase slope calculation is also subject to ambiguities. The work in [22], in turn, presented a 5G positioning system, however, being limited to the LoS scenario while utilizing only the beam-specific reference signal received power (RSRP) values as the features. In [23], a positioning system that generates probability maps using an NN model was described with a feature representation that transforms the frequency-domain uplink (UL) CSI data to a delay-domain. The probability maps enable efficient sensor fusion, yet their scale directly affects the complexity of the underlying NN model. Furthermore, [29] described a paradigm to produce a high-accuracy 5G localization dataset, building on channel frequency response measurements.

Different hybrid solutions also exist in the literature, either in terms of aiding Bayesian filtering solutions through ML methods or fusing measurements from different sensors [16]. To this end, [17] proposed a series of recurrent neural network (RNN) models to replace the EKF and thus enable implicit and data-driven learning. Furthermore, a sensor fusion approach using reinforcement learning-assisted particle filtering is described in [10]. Neural network based 5G fingerprinting and GNSS data fusion were, in turn, considered in [18]. Recently, in [36], an NN model classifying different propagation paths from time-domain CSI in the form of path parameters was paired with a geometry-based positioning algorithm considering LoS and single-bounce NLoS paths.

ML-based localization with propagation time measurements has also gained interest in recent works [30, 33, 14]. In [30], a bidirectional RNN is employed to track the UE based on the time-of-arrival (ToA) measurements from multiple nodes, while the authors of [33] utilize a convolutional neural network (CNN) to estimate the accurate ToA from the raw channel impulse responses in LoS/NLoS scenarios. However, only LoS measurements are used for the actual localization. Finally, angular information at either the gNodeB (gNB) or the UE can also be used for ML-based localization, as shown in [31].

TABLE I: Summary of related works

Reference	Technology	Features	ML model	Tracking	Uncertainty (Feat.)	Uncertainty (Label)	Uncertainty (Envir.)	Vehicular systems	NLoS
[14]	5G mmWave	AoA and ToF	DNN, LSTM	$\checkmark$	$\checkmark$	$\checkmark$	$\times$	$\checkmark$	$\checkmark$
[15]	Wi-Fi	channel amplitude response	CNN	$\times$	$\times$	$\times$	$\times$	$\times$	$\checkmark$ *
[18]	5G mmWave	RSRP	DNN	$\times$	$\checkmark$	$\times$	$\times$	$\times$	$\times$
[22]	5G mmWave	RSRP	DNN	$\times$	$\times$	$\times$	$\times$	$\times$	$\times$ **
[23]	Wi-Fi	channel frequency response	DNN	$\times$	$\checkmark$	$\times$	$\times$	$\times$	$\checkmark$
[24]	Wi-Fi	channel amplitude response	DNN	$\checkmark$	$\times$	$\times$	$\times$	$\times$	$\times$
[25]	Wi-Fi	channel amplitude response	CNN	$\times$	$\times$	$\times$	$\times$	$\times$	$\times$ *
[26]	Wi-Fi	channel amplitude response	ML	$\checkmark$	$\times$	$\times$	$\times$	$\times$	$\times$ *
[27]	Wi-Fi	channel phase response	DNN	$\checkmark$	$\times$	$\times$	$\times$	$\times$	$\times$
[29]	5G mmWave	channel frequency response	CNN	$\times$	$\checkmark$	$\times$	$\times$	$\times$	$\times$
[30]	Unspecified	ToA	RNN	$\checkmark$	$\times$	$\times$	$\times$	$\times$	$\times$
[31]	BLE	AoA	CNN	$\times$	$\checkmark$	$\times$	$\times$	$\times$	$\checkmark$ ***
[33]	LTE	channel impulse response	CNN	$\times$	$\checkmark$	$\times$	$\times$	$\times$	$\checkmark$ *
[36]	mmWave	path-wise CSI	DNN	$\times$	$\checkmark$	$\times$	$\times$	$\checkmark$	$\checkmark$ **
This Work	5G mmWave	time- and frequency-domain CSI	DNN, LSTM	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$	$\checkmark$

* Some NLoS samples available in evaluation; ** Removing the detected NLoS samples before positioning; *** RX signal subject to Rayleigh fading.

The main related works and their relevant aspects are summarized in Table I. Importantly, the NLoS positioning under rich and realistic scattering is not explicitly addressed, in particular in the context of beamforming 5G mmWave networks. Thus, complementary to the existing literature, this article focuses on ML-based reliable network localization using time- and frequency-domain CSI data with emphasis on challenging NLoS scenarios, while noting also various relevant uncertainty aspects. The application focus is on vehicular systems in urban environments with 5G mmWave deployments and CSI features that can be obtained through 3GPP standardized UL and/or downlink (DL) measurements and corresponding signaling. The contributions and novelty compared to the existing ML-based positioning literature can be stated and summarized as follows:

• We introduce, derive, and evaluate efficient frequency-domain CSI features in the form of sparse power and phase measurements, and their combinations, for ML-based positioning models and compare their robustness with the ones introduced previously in the literature;

• We also introduce and evaluate alternative time-domain path-wise CSI features and demonstrate their effectiveness in wireless positioning scenarios while finding the relevant and best-performing feature combinations;

• We develop a novel hybrid NN processing model in terms of instantaneous and sequence data processing for simultaneous UE location, velocity, and heading tracking using the above channel-based features;

• We evaluate the performance of different features and processing models in a large-scale, realistic, urban scenario with full ray-tracing-based channel measurements under harsh NLoS conditions in the context of 28 GHz mmWave 5G network, while also considering realistic training and measurement uncertainties;

• We show that the proposed time-domain and frequency-domain features outperform the benchmark solutions, especially when combined with the sequence processing ML model, in terms of the positioning accuracy and complexity – in particular, in the challenging multi-bounce scattering environments, where the proposed positioning approach achieves comparable accuracy in both LoS and NLoS conditions, regardless of the number of bounces;

• We also address the important practical issue of environment or gNB deployment differences between the training phase and the actual online inference phase, through transfer learning, and show that specializing to the current deployment is feasible;

• Finally, we address the complexity of the developed methods, in comparison to the prior-art, while also openly sharing the data and codes for research reproducibility and transparency.

For clarity it is stated that selected time-domain features were initially considered in [14], however, the adopted NN models were lacking the advanced sequence processing capabilities. Additionally, no frequency-domain features were considered, while the transfer learning aspects were also fully neglected.

The rest of this article is organized as follows. Section II introduces the network measurements applicable for positioning and their acquisition with 3GPP compatible reference signals and measurement procedures. Section III introduces the proposed frequency-domain and time-domain CSI data features and the related pre-processing. Additionally, the NN processing models and architectures are described incorporating both instantaneous and sequence models. Section IV describes the considered urban vehicular positioning scenario and evaluation environment, together with the practical measurement or data uncertainties. Additionally, the obtained numerical results are presented and analyzed, while also considering the important aspect of specializing to the prevailing gNB deployment through transfer learning. Finally, Section V concludes the work.

II Positioning Measurements and Data Acquisition

This section introduces the signals and standardized measurements, available in 5G NR, to extract positioning data. Specific focus is on synchronization signal block (SSB) based measurements in DL, in terms of frequency-domain data, while in time-domain we harness multi-round trip time (MRTT) and UL angle-of-arrival (AoA) based multipath measurements utilizing UL sounding reference signal (SRS) and DL positioning reference signal (PRS). The relevant measurements and data acquisition methods are illustrated conceptually in Fig. 1, while being described in detail below. For clarity, we state that the frequency- and time-domain measurements are alternative approaches to obtain positioning features and data.

II-A Signals and Measurements for 5G NR Positioning

Measuring the received signal strength is one common approach utilized in wireless positioning. In 5G NR, we distinguish RSRP, reference signal received quality (RSRQ), and received signal strength indicator (RSSI), including their beam-specific and resource-specific alternatives. Their acquisition is defined in [37] building on different reference signals, such as UL SRS and DL synchronization signal (SS) and PRS. The corresponding measurements are called UL-SRS-RSRP, SS-RSRP and DL-PRS-RSRP, respectively. Signal strength measurements are vital for numerous network functions, such as mobility management, and thus regularly collected.

Compared to signal strength-based measurements, propagation delay or time-of-flight (ToF)-based ranging benefits from large transmission bandwidths while being less sensitive to channel effects, such as reflections, diffractions, and scattering. To relax clock synchronization requirements between TX and RX, the 5G NR standard supports MRTT measurements [38] where the gNB measures the round trip time, denoted as "gNB Rx–Tx time difference", based on PRS transmission in DL and SRS transmission in UL. In addition, the UE measures the time between receiving the PRS and sending the SRS, denoted as the "UE Rx–Tx time difference", which is reported to the gNB [37] in order to solve the channel-dependent propagation delay. Alternatively, the ToF can be estimated indirectly at the gNB via time-difference-of-arrival (TDoA) and the related positioning calculations, as defined in [38]. Obtaining the ToF directly at the UE is currently not explicitly standardized.
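As a minimal sketch of the round-trip principle described above, the one-way ToF can be recovered by removing the UE turnaround time from the round trip observed at the gNB and halving the remainder. The function names and the convention that both inputs are non-negative durations are illustrative assumptions, not the standardized report format.

```python
C = 299_792_458.0  # speed of light in m/s


def tof_from_rtt(gnb_elapsed_s: float, ue_turnaround_s: float) -> float:
    """One-way ToF from a single round-trip measurement.

    gnb_elapsed_s:   time from PRS transmission to SRS reception at the gNB.
    ue_turnaround_s: time from PRS reception to SRS transmission at the UE.
    Both are assumed to be non-negative durations (illustrative convention).
    """
    return (gnb_elapsed_s - ue_turnaround_s) / 2.0


def range_from_tof(tof_s: float) -> float:
    """Propagation distance in meters corresponding to a one-way ToF."""
    return C * tof_s


# Synthetic example: a 150 m path and a 1 ms UE turnaround time.
true_tof = 150.0 / C
ue_turnaround = 1e-3
gnb_elapsed = 2.0 * true_tof + ue_turnaround
estimated_range = range_from_tof(tof_from_rtt(gnb_elapsed, ue_turnaround))
print(f"estimated range: {estimated_range:.3f} m")
```

Since only time differences measured locally at each node enter the calculation, no common clock between the gNB and the UE is needed, which is the motivation for MRTT stated above.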

Beamformed radio access provides inherent support for angle estimation and corresponding angle-based positioning schemes. In the current 5G NR standard, angle estimation is directly specified only for the gNB-side angular information either via the uplink angle-of-arrival (UL-AoA) or the downlink angle-of-departure (DL-AoD) [38]. The UL-AoA is defined as the estimated azimuth and vertical angles of a UE, observed at a gNB [37], based on UL SRS. The exact angle estimation method used in the UL-AoA is not specified, which allows for performance optimization. On the other hand, the estimation of the gNB angle at the UE using the DL-AoD is practically restricted to the use of spatial power measurements, i.e., DL-PRS-RSRP, which limits the achievable angle estimation accuracy [38].

Figure 1: Illustration of the network data acquisition scheme with two gNBs, a UE, and a network localization entity represented by the LMF. Different UL and DL reference signals form the physical basis for obtaining positioning measurements and data. Additionally, the baseline neural processing chains from features to UE location, heading and velocity are highlighted, for both time- and frequency-domain feature scenarios.

Importantly, estimating the ranges and angles also for paths beyond the LoS component is feasible [13], offering added value to the positioning task [9]. Since the current 5G NR standard does not specify accurate estimation of path ranges or angles at the UE side, for a DL-based positioning method we consider observing frequency-domain CSI measurements based on SS transmissions by the gNBs. This is beneficial since the SSs are periodically transmitted and thus systematically available. Additionally, for a UL-based positioning method, we consider observing time-domain multipath measurements, including path-wise angles and propagation delays, directly at the gNBs. For obtaining the path-wise angles and propagation delays in practice, it is possible to exploit the above-discussed 5G NR specified UL-AoA and MRTT methods, respectively.

II-B Frequency-Domain Channel Measurements through Beamformed DL SSs

The frequency-domain CSI relates to the frequency response (FR) of the effective channel between the gNB and the UE. Considering orthogonal frequency-division multiplexing (OFDM) based transmission, the antenna-element-wise FR at subcarrier $n$ , denoted as $\bm{\mathcal{H}}(n)\!\in\!\mathbb{C}^{N_{\text{RX}}\times N_{\text{TX}}}$ , can be written as [13, 9]

\bm{\mathcal{H}}(n)=\sum_{k=0}^{K-1}h_{k}e^{-\frac{j2\pi n\tau_{k}F_{s}}{N}}\bm{a}_{\text{RX}}(\theta_{\text{AOA},k})\bm{a}_{\text{TX}}^{H}(\theta_{\text{AOD},k}),

(1)

where $K$ is the number of paths, while $h_{k}$ , $\tau_{k}$ , $\theta_{\text{AOA},k}$ and $\theta_{\text{AOD},k}$ are the complex path coefficient, ToF, AoA and angle-of-departure (AoD) for the $k^{\text{th}}$ path, respectively. Furthermore, $N_{\text{TX}}$ and $N_{\text{RX}}$ are the numbers of transmit and receive antennas in respective order, $F_{s}$ is the sampling frequency, and $N$ is the OFDM Fast Fourier Transform (FFT) size. Finally, $\bm{a}_{\text{TX}}(\cdot)\in\mathbb{C}^{N_{\text{TX}}}$ and $\bm{a}_{\text{RX}}(\cdot)\in\mathbb{C}^{N_{\text{RX}}}$ are the steering vectors, which define the phases per antenna element with respect to the array center, for given AoD and AoA. Considering further the analog phased-arrays in mmWave systems, the TX and RX apply beamforming weights $\bm{b}_{\text{TX}}\in\mathbb{C}^{N_{\text{TX}}}$ and $\bm{b}_{\text{RX}}\in\mathbb{C}^{N_{\text{RX}}}$ , respectively. The corresponding effective beamformed channel at subcarrier $n$ , considered as the frequency-domain CSI, can then be expressed as

\mathcal{H}(n)=\bm{b}_{\text{RX}}^{H}\bm{\mathcal{H}}(n)\bm{b}_{\text{TX}}.

(2)

In practice, besides noise and interference, the CSI estimation can suffer from inaccuracies [28] due to radio frequency (RF) impairments, clock and frequency offsets between the gNB and the UE, and imperfect timing advance information. Furthermore, due to signaling overhead, CSI is often reported per blocks of subcarriers, which reduces the CSI resolution in frequency. In this paper, we consider obtaining the frequency-domain CSI via 5G NR SSs, transmitted periodically in DL by all gNBs.
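The channel model in (1)-(2) can be sketched numerically as follows. This is only an illustration: the uniform linear array with half-wavelength spacing, the path parameters, and the values of $F_{s}$ and $N$ are assumptions made for the example and are not taken from the evaluation setup.

```python
import cmath
import math


def steering_vector(theta_rad, n_elem):
    """Steering vector of a uniform linear array (assumed geometry) with
    half-wavelength spacing and phases referenced to the array center."""
    center = (n_elem - 1) / 2.0
    return [cmath.exp(1j * math.pi * (i - center) * math.sin(theta_rad))
            for i in range(n_elem)]


def beamformed_fr(n, paths, b_rx, b_tx, fs, nfft):
    """Effective beamformed channel at subcarrier n, following (1)-(2).

    paths: list of (h_k, tau_k, aoa_k, aod_k) tuples, one per path.
    """
    total = 0j
    for h, tau, aoa, aod in paths:
        a_rx = steering_vector(aoa, len(b_rx))
        a_tx = steering_vector(aod, len(b_tx))
        rx_gain = sum(b.conjugate() * a for b, a in zip(b_rx, a_rx))  # b_RX^H a_RX
        tx_gain = sum(a.conjugate() * b for a, b in zip(a_tx, b_tx))  # a_TX^H b_TX
        delay_phase = cmath.exp(-2j * math.pi * n * tau * fs / nfft)
        total += h * delay_phase * rx_gain * tx_gain
    return total


# Single path with beams steered exactly towards it (hypothetical values).
h0, tau0, aoa0, aod0 = 0.5 + 0.2j, 1e-7, 0.3, -0.4
b_rx = [a / 4 for a in steering_vector(aoa0, 4)]
b_tx = [a / 8 for a in steering_vector(aod0, 8)]
fs, nfft = 245.76e6, 2048  # assumed values giving 120 kHz subcarrier spacing
H0 = beamformed_fr(0, [(h0, tau0, aoa0, aod0)], b_rx, b_tx, fs, nfft)
H1 = beamformed_fr(1, [(h0, tau0, aoa0, aod0)], b_rx, b_tx, fs, nfft)
print(abs(H0), cmath.phase(H1 / H0))
```

With a single path and matched, normalized beams, the magnitude of the effective channel equals $|h_{0}|$ on every subcarrier, and only the delay-dependent phase term changes with $n$.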

II-C Time-Domain Multipath Measurements through MRTT and UL-AoA

In time-domain, the radio propagation channel can be modeled as a composition of individual propagation paths with path-specific propagation delay, power gain, phase shift, AoD, and AoA, together with additional distortion and interference, among other channel effects. The antenna-element-wise channel impulse response $\mathbf{H}(\tau)\in\mathbb{C}^{N_{\text{RX}}\times N_{\text{TX}}}$ can be written as a function of propagation delay $\tau$ as [36, 34]

\mathbf{H}(\tau)=\sum_{k=0}^{K-1}h_{k}\bm{a}_{\text{RX}}(\theta_{\text{AOA},k})\bm{a}_{\text{TX}}^{H}(\theta_{\text{AOD},k})\delta(\tau-\tau_{k})

(3)

where $\delta(\cdot)$ is a Dirac delta function (i.e., a unit impulse). Similar to the frequency-domain representation, while again assuming analog phased-arrays, the effective beamformed channel impulse response can be written as

H(\tau)=\bm{b}_{\text{RX}}^{H}\mathbf{H}(\tau)\bm{b}_{\text{TX}}.

(4)

In this paper, the measured time-domain CSI includes the path delays $\tau_{k}$ , the path powers $|h_{k}|^{2}$ , and the gNB side path angles $\theta_{\text{AOA},k}$ . For the path delays $\tau_{k}$ and path powers $|h_{k}|^{2}$ , the estimation procedure is assumed to exploit 5G NR MRTT measurements [38], as discussed in Section II-A. Furthermore, for estimating the gNB side path angles $\theta_{\text{AOA},k}$ , it is also possible to utilize UL-AoA measurements [38] based on UL SRS transmissions, as noted in Section II-A.

The overall data acquisition concept is illustrated in Fig. 1, highlighting the different considered measurements. In general, within the current 5G NR standard, the LMF is responsible for the localization and related signaling management while the positioning calculations can be carried out either at the UE or the network side. Moreover, reporting the MRTT and UL-AoA measurements between the gNB and the LMF is supported by the so-called NR Positioning Protocol A [39]. Different alternative ways to arrange for labeled training data include crowdsourcing, crowdsensing, as well as utilization of synthetic data. These are discussed further in Section IV-G.

III Proposed Methods

This section describes and introduces the novel approach of utilizing frequency-domain CSI with relative phase, while also addressing the time-domain CSI data pre-processing. In addition, the proposed architectures, hyperparameters, and training algorithm of the proposed NN models are presented. Finally, important system-level implementation alternatives and aspects are discussed.

III-A Frequency-Domain CSI Data Preprocessing

III-A1 Proposed Relative Phase Approach

5G mmWave networks operate at high carrier frequencies, at and beyond 24 GHz, with wavelengths approaching the millimeter-scale. In mobile scenarios, utilizing absolute phase responses is highly impractical, as movement of a few millimeters in distance results in a full rotation of the phase. Furthermore, as is well-known, the frequencies and wavelengths relate through

\lambda=\delta_{s}={c}/{f},

(5)

where $\delta_{s}$ is the propagation distance between two points with equal phases, $\lambda$ is the signal wavelength, $c$ is the speed of light, and $f$ is the signal frequency.

In this article, we consider obtaining the frequency-domain CSI in the resolution of 12 subcarriers, which refers to a bandwidth of one resource block (RB) in 5G NR, denoted as $\Delta f_{RB}$ . The CSI is interpreted at the $6^{\text{th}}$ subcarrier of each RB, and thus the corresponding subcarrier index for the $m^{\text{th}}$ RB observation is given as $n_{\text{RB,}m}=6+m\Delta f_{RB}$ with $m=0,...,M-1$ , where $M$ is the number of RBs. Moreover, we propose to take advantage of differential phase measurements between neighboring RBs. Based on (1)-(2), and when considering an individual propagation path, the phase difference between subcarriers is equal across the spectrum and completely determined by the ToF through the complex exponential term. Possible phase rotations due to the other terms in (1) are constant over all subcarriers, and thus do not induce phase difference between the subcarriers. Thus, for an individual path with ToF of $\tau_{0}$ , the phase difference between two consecutive resource blocks can be given as

\Delta\phi=2\pi\tau_{0}\Delta f_{RB}.

(6)

While the above expression builds on a single propagation path, we utilize this approach in this work also in the case of realistic multipath propagation. As elaborated further below, the differential phase approach helps mitigate the effect of phase periodicity, and thus extract relevant features for the proposed NN-based positioning.

To this end, the linkage between the relative phase and a specific propagation distance is unambiguous only when the relative phase is within one phase cycle ( $\Delta\phi<2\pi$ ). A distance $d_{\phi}$ , which induces the full $2\pi$ cycle of the relative phase between two neighboring RBs, can be solved based on (6) as

d_{\phi}=\frac{c}{\Delta f_{RB}}

(7)

by denoting $\tau_{\phi}\Delta f_{RB}=1$ , where $\tau_{\phi}=d_{\phi}/c$ is the corresponding ToF resulting in a full phase cycle. By using the relative phase difference $\Delta\phi$ , instead of an absolute phase, as the frequency-domain feature, the positioning performance can be significantly improved, as shown in Section IV. Although the distance ambiguity issue still remains with the phase difference recurrence at every $d_{\phi}$ meters, it is greatly improved compared to the recurrence level with an absolute phase at every $\delta_{s}$ meters, as $f\gg\Delta f_{RB}$ .
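To give a sense of scale, (5) and (7) can be evaluated for the 28 GHz carrier considered in this work. The 120 kHz subcarrier spacing used below is an assumption made for this example only, which gives $\Delta f_{RB}=12\times 120$ kHz.

```python
C = 299_792_458.0  # speed of light in m/s


def wavelength_m(carrier_hz: float) -> float:
    """Recurrence distance of the absolute phase, following (5)."""
    return C / carrier_hz


def phase_cycle_distance_m(rb_bandwidth_hz: float) -> float:
    """Recurrence distance of the RB-wise relative phase, following (7)."""
    return C / rb_bandwidth_hz


carrier_hz = 28e9                # carrier frequency of the considered network
subcarrier_spacing_hz = 120e3    # assumed numerology for this example
rb_bandwidth_hz = 12 * subcarrier_spacing_hz

delta_s = wavelength_m(carrier_hz)
d_phi = phase_cycle_distance_m(rb_bandwidth_hz)
print(f"absolute phase repeats every {delta_s * 1e3:.2f} mm")
print(f"relative phase repeats every {d_phi:.1f} m")
```

Under these assumptions, the absolute phase repeats roughly every 1.07 cm of propagation distance, whereas the relative phase between neighboring RBs repeats roughly every 208 m, i.e., the ambiguity interval grows by the ratio $f/\Delta f_{RB}$.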

III-A2 Frequency-Domain CSI Features and Visualization

To provide a short illustration, we consider a single representative user path along an urban environment, as shown in Fig. 2a (for further details of the environment, refer to Section IV). Then, Fig. 2b and Fig. 2c demonstrate the utilized frequency-domain CSI feature representations along the path, including the proposed features and the features from the related literature. Specifically, the raw, complex channel response, further referred to as complex frequency response (FR-Complex), is depicted in Fig. 2b, top, and Fig. 2b, center, which show the real and imaginary parts of FR-Complex for 10 consecutive resource blocks along the path. The feature is obtained from (2) as $\text{real}(\mathcal{H}(n_{\text{RB,}m}))$ and $\text{imag}(\mathcal{H}(n_{\text{RB,}m}))$ for $m=1,...,10$ .

Furthermore, Fig. 2b, bottom, depicts the channel power response, denoted as the power-domain frequency response (FR-Power). Such a channel feature is utilized, e.g., in [24, 25], and can be expressed via (2) as $10\text{log}_{10}(|\mathcal{H}(n_{\text{RB,}m})|^{2})$ . In 5G NR, the FR-Power corresponds to an RB-wise RSRP measurement, defined as the average power of the resource elements carrying the reference symbols. As an input feature, we also re-scale FR-Power to the normalized range of $[0,1]$ .

Then, the top graph of Fig. 2c visualizes the proposed relative phase difference as the frequency-domain feature, further denoted as the relative phase frequency response (FR-Phase). Specifically, building on the discussion in Section III-A1, the FR-Phase can be obtained and expressed following (2) as

\Delta\phi(m)=\text{arg}(\mathcal{H}(n_{\text{RB,}m}))-\text{arg}(\mathcal{H}(n_{\text{RB,}m-1}))

(8)

for RB indices $m=1,...,M-1$ . The dependency between the signal path lengths and the relative phases, especially in LoS regions, is clearly visible in the figure. Moreover, it can be seen that the feature magnitude is recurring with a path propagation distance at every $d_{\phi}$ meters, as derived in (7).
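The computation in (8) can be sketched in a few lines of NumPy. The function name `fr_phase` and the optional wrapping of the difference back to $[-\pi,\pi)$ are assumptions made for illustration; the source only states that phase quantities lie within $[-\pi,\pi]$. The synthetic single-path channel in the usage lines, with phase $-2\pi f\tau$, is likewise an assumed toy model.

```python
import numpy as np


def fr_phase(h_rb: np.ndarray, wrap: bool = True) -> np.ndarray:
    """Relative phase between neighboring RBs along the last axis, following (8)."""
    phase = np.angle(h_rb)
    dphi = phase[..., 1:] - phase[..., :-1]
    if wrap:
        # Assumption: map the raw difference back to [-pi, pi).
        dphi = (dphi + np.pi) % (2.0 * np.pi) - np.pi
    return dphi


# Toy usage: one 50 m path, 10 RBs spaced by 1.44 MHz (assumed values).
c = 299_792_458.0
m = np.arange(10)
h = np.exp(-2j * np.pi * m * 1.44e6 * (50.0 / c))
print(fr_phase(h))  # constant value, proportional to the path length
```

For a single path, the relative phase is constant over the RBs and scales linearly with the propagation distance until it wraps, which is the recurrence discussed above.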

III-A3 Proposed Combined Feature

To utilize the maximum information enclosed in the measured channel responses, we further propose the so-called power and relative phase frequency response (FR-Power/Phase) approach as the ultimate frequency-domain feature. This approach combines the FR-Phase and FR-Power by transforming the FR-Phase to the complex unit circle, with subsequent element-wise multiplication with the re-scaled FR-Power. This is expressed as

\bar{P}_{\text{RB}}(m)\,\text{exp}({j\Delta\phi(m)})

(9)

where $\bar{P}_{\text{RB}}(m)$ refers to the re-scaled normalized power for RB indices $m=0,...,M-1$ . Furthermore, since the FR-Phase has one element less than FR-Power, we extend the FR-Phase array with an additional element for $m=0$ by defining $\Delta\phi(0)=0$ .

The real and imaginary components of the proposed FR-Power/Phase feature set are visualized in Fig. 2c, center and bottom, respectively. The proposed feature allows to accommodate the advantages of both received power and relative phases in a single complex feature vector, while relaxing the distance ambiguity of the relative phase feature.
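A minimal sketch of the combined feature in (9) is given below. The min-max limits `p_min_db` and `p_max_db` are hypothetical parameters standing in for the dataset-wide power range used in the re-scaling, and the small floor inside the logarithm is an added safeguard that is not part of the source.

```python
import numpy as np


def fr_power_phase(h_rb, p_min_db: float, p_max_db: float) -> np.ndarray:
    """Combine re-scaled RB power and relative phase into one complex feature, cf. (9)."""
    h_rb = np.asarray(h_rb, dtype=complex)
    # FR-Power in dB; the tiny floor avoids log10(0) (assumption, not from the source).
    power_db = 10.0 * np.log10(np.maximum(np.abs(h_rb) ** 2, 1e-30))
    # Assumption: min-max re-scaling to [0, 1] with dataset-wide limits.
    p_bar = np.clip((power_db - p_min_db) / (p_max_db - p_min_db), 0.0, 1.0)
    # FR-Phase along the RB axis, extended with delta_phi(0) = 0.
    phase = np.angle(h_rb)
    dphi = np.zeros(h_rb.shape)
    dphi[..., 1:] = phase[..., 1:] - phase[..., :-1]
    return p_bar * np.exp(1j * dphi)


# Toy usage: 3 gNBs x 16 beams x 10 RBs of random channel coefficients.
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 16, 10)) + 1j * rng.normal(size=(3, 16, 10))
feature = fr_power_phase(h, p_min_db=-60.0, p_max_db=20.0)
print(feature.shape)
```

The magnitude of each element carries the normalized power, while its angle carries the relative phase, so the real and imaginary parts can be fed to the NN as two real-valued inputs per RB.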

III-B Time-Domain CSI Data Preprocessing

The time-domain CSI data utilized for localization includes propagation delays $\tau_{k}$ , powers $|h_{k}|^{2}$ , and gNB side path angles $\theta_{\text{AOA},k}$ for the observed LoS and NLoS paths. To this end, the measured path-wise propagation delays $\tau_{k}$ , referred to as path-wise time-of-flight (path-ToF), are transformed to propagation distances by multiplying them with the speed of light. The path-wise AoAs, $\theta_{\text{AOA},k}$ , are obtained at the gNB side, and transformed to directions in Cartesian coordinates in the preprocessing, to avoid the zero-crossing problem with cyclic angular data. This results in a robust AoA feature, called path-wise angle-of-arrival (path-AoA) in the following, which is less susceptible to angular deviation and related uncertainties. The path-wise received powers $|h_{k}|^{2}$ are expressed in decibel-milliwatts (dBm) to overcome the extremely low feature magnitudes in linear scale. Such a feature is the time-domain equivalent of the frequency-domain FR-Power, which accumulates all paths into the same observed frequency-domain measurement. Similar to the propagation delay feature, the path power feature includes information on the path propagation distance, but most importantly, it also provides information on the number and type of channel interactions, such as reflections, diffraction, or scattering, within the radio path. In general, different combinations of the time-domain CSI data can be adopted. The aggregated path-ToF+AoA and path-ToF+RP+AoA features, proposed in this work, are the most powerful ones, as shown through the numerical results.
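The preprocessing steps described above can be sketched as follows. The function name, the assumption that the linear powers are given in watts, and the use of a $(\cos\theta,\sin\theta)$ pair as the Cartesian direction are illustrative choices, not details confirmed by the source.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # speed of light [m/s]


def preprocess_paths(tau_s, power_w, aoa_rad):
    """Convert path delays, linear powers, and AoAs into the time-domain features."""
    tau_s = np.asarray(tau_s, dtype=float)
    power_w = np.asarray(power_w, dtype=float)
    aoa_rad = np.asarray(aoa_rad, dtype=float)
    dist_m = C_LIGHT * tau_s                     # path-ToF as propagation distance
    power_dbm = 10.0 * np.log10(power_w / 1e-3)  # path-RP in dBm (watts assumed)
    # Assumption: unit direction vector, continuous across the +/-pi boundary.
    direction = np.stack([np.cos(aoa_rad), np.sin(aoa_rad)], axis=-1)
    return dist_m, power_dbm, direction


dist, p_dbm, direction = preprocess_paths([1e-6], [1e-12], [np.pi])
print(dist, p_dbm, direction)
```

Angles just below $\pi$ and just above $-\pi$ map to nearly identical direction vectors, which is the motivation for avoiding the raw cyclic angle as an input.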

III-C NN Model Architectures and Hyperparameters

Among the various alternative data-aided approaches, we restrict ourselves to NN models in this work, which currently dominate the ML area due to their performance, scalability, generalization properties, and dynamic architecture options [40].

III-C1 Activation Function

In this work, we utilize the Gaussian error linear unit (GELU) [41] as the non-linear activation function. Its main advantages over the traditional rectified linear unit (ReLU) include resistance to the “dying ReLU” problem [42] and differentiability at all values. It has also been shown to offer improved performance in a number of applications, such as natural language processing [41]. It can be defined as $\text{GELU}(x)=x\Phi(x)$ [43, 44] where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution. The function can also be approximated for faster processing as

\text{GELU}(x)\approx\frac{x}{2}\left(1+\tanh\left(\sqrt{\frac{2}{\pi}}\left(x+Cx^{3}\right)\right)\right),

(10)

where $C=0.044715$ . Compared to ReLU, the higher complexity of GELU is compensated by the faster convergence of the model, as well as the corresponding improved positioning performance, based on our complementary experiments.
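For reference, the exact definition $x\Phi(x)$ and the widely used tanh-based approximation, $\frac{x}{2}\left(1+\tanh\left(\sqrt{2/\pi}\,(x+Cx^{3})\right)\right)$, can be compared numerically with the short sketch below. The function names and the evaluation grid are illustrative choices.

```python
import math

GELU_C = 0.044715


def gelu_exact(x: float) -> float:
    """GELU(x) = x * Phi(x), with Phi the standard normal CDF written via erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh-based approximation of GELU with C = 0.044715."""
    inner = math.sqrt(2.0 / math.pi) * (x + GELU_C * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Largest absolute deviation on a grid over [-5, 5].
xs = [i / 100.0 for i in range(-500, 501)]
max_abs_diff = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs)
print(f"max |exact - approx| on [-5, 5]: {max_abs_diff:.2e}")
```

On this grid the two forms agree closely, so the approximation can be substituted where evaluating the error function is costly.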

III-C2 Utilized NN Architectures

As the functional NN layers, we utilize in this work both densely connected layers and long-short term memory (LSTM) [45] layers, the latter being used only in the sequence-based implementation of models. Importantly, the LSTM layer is a recurrent layer capable of preserving long-term and short-term trends within the data.

The architecture of the densely-connected model is depicted in Fig. 3a. It consists of 5 densely connected layers with GELU activation functions and a single densely connected layer with linear activation and 2 neurons as the output, estimating the UE position. The architecture of the sequence processing capable model is, in turn, shown in Fig. 3b. It consists of 5 densely connected layers after the input, with a single LSTM layer connected in parallel with the $5^{th}$ dense layer. The concatenated output of these layers is then fed to an LSTM layer with 5 neurons at the output with linear activation. Specifically, the densely connected layers serve as instantaneous feature extractors, while the intermediate LSTM layer learns the temporal features. Due to the considered parallel architecture, the last functional layer has access to both instantaneous and temporal features. The resulting output is then divided into a positioning output with $2$ variables, velocity output with a single variable, and a heading output with an additional $\text{tanh}(\cdot)$ activation and $2$ variables.

III-C3 Data Structures, Normalization and Training

In general, the input dimensions vary based on the selected features and deployment scenario. For frequency domain features, and when considering the evaluation scenario described in Section IV containing $3$ gNBs, $16$ beams, and $10$ RBs, the input size is either $480$ for FR-Power and FR-Phase or $960$ for FR-Power/Phase and FR-Complex. When considering the time-domain CSI in the same scenario, the individual path-ToF, path-AoA, and path-RP features have each an input size of $15$ , the combined path-ToF+AoA and path-RP+AoA features have $30$ inputs, and finally the aggregated path-ToF+RP+AoA feature has an input size of $45$ . Some of the features are also normalized prior to the training, as demonstrated already along Fig. 2b and Fig. 2c. Specifically, all power-related quantities as well as path-wise ToF measurements, when first converted to pseudo-ranges, are all normalized between $[0,1]$ within the overall sets of available measurements. Finally, all angle and phase quantities are, by design, within $[-\pi,\pi]$ . We emphasize that each different feature scenario and combination corresponds to an individual NN, trained and deployed on its own. The vast set of numerical results, provided in Section IV, provides the corresponding mutual performance comparisons.
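The quoted input sizes follow from simple counting, as the sketch below illustrates. The factor of $5$ paths per gNB is an assumption taken from the limit of $5$ dominant multipath components mentioned in Section IV-C2; it reproduces the stated size of $15$ per time-domain feature.

```python
N_GNB, N_BEAMS, N_RB = 3, 16, 10
N_PATHS = 5  # assumption: up to 5 dominant multipath components per gNB

# Frequency-domain features: one value per (gNB, beam, RB).
real_valued = N_GNB * N_BEAMS * N_RB  # FR-Power or FR-Phase
complex_valued = 2 * real_valued      # real + imaginary parts

# Time-domain features: one value per (gNB, path) and feature type.
per_feature = N_GNB * N_PATHS

input_sizes = {
    "FR-Power": real_valued,
    "FR-Phase": real_valued,
    "FR-Power/Phase": complex_valued,
    "FR-Complex": complex_valued,
    "path-ToF": per_feature,
    "path-ToF+AoA": 2 * per_feature,
    "path-ToF+RP+AoA": 3 * per_feature,
}
print(input_sizes)
```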

All considered NN models are trained using the Adam optimizer [46] with a learning rate of $0.001$ for the first $200$ epochs, and then with an early-stopping mechanism based on validation performance for an additional $500$ epochs, while iteratively reducing the learning rate to $0.0005$ and $0.0001$ after each stop. The lowered learning rates ensure a fine-tuned performance with a small number of epochs. The mean squared error (MSE) loss was selected for each output, and for the sequence-based NN model, the loss weights were selected as $0.8$ , $0.1$ , and $0.1$ for positioning loss, velocity loss, and heading loss, respectively. Furthermore, stemming from the deployment area of around $550\times 370$ m² (see Fig. 5), the position labels are reduced by a factor of $300$ to accelerate the training.
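The weighted multi-output loss and the label scaling can be illustrated with a framework-independent NumPy sketch. The dictionary keys, helper names, and toy values are hypothetical; an actual training setup would express the same weighting through the loss configuration of the chosen deep learning library.

```python
import numpy as np

LOSS_WEIGHTS = {"position": 0.8, "velocity": 0.1, "heading": 0.1}
LABEL_SCALE = 300.0  # position labels are divided by this factor


def scale_position(xy_m):
    """Scale position labels given in meters by the fixed factor."""
    return np.asarray(xy_m, dtype=float) / LABEL_SCALE


def mse(pred, target) -> float:
    """Mean squared error over all elements."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.mean(diff ** 2))


def total_loss(pred: dict, target: dict) -> float:
    """Weighted sum of the per-output MSE losses."""
    return sum(w * mse(pred[k], target[k]) for k, w in LOSS_WEIGHTS.items())


# Toy example: a 300 m error in x only, perfect velocity and heading outputs.
target = {"position": scale_position([0.0, 0.0]), "velocity": [10.0], "heading": [1.0, 0.0]}
pred = {"position": scale_position([300.0, 0.0]), "velocity": [10.0], "heading": [1.0, 0.0]}
print(total_loss(pred, target))
```

Predicted positions are mapped back to meters by multiplying the network output by the same factor.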

III-D System-Level Implementation Alternatives and Aspects

In general, there are alternative ways to organize and implement the use of the CSI measurements and data for NN training and actual online inference processing for localization. These are discussed below, in relation to the proposed methods and the data acquisition visualized in Fig. 1, while noting also the important role of UE radio resource control (RRC) state.

To this end, the time-domain CSI data, i.e., the MRTT-based ToF measurements and the SRS-based UL-AoA measurements, are by definition obtained at the network side. Thus, in this case, it is natural to also perform both the model training as well as the localization inference processing at the network side. Consequently, there is no need for additional signaling or feedback, and all training data from different UEs is inherently gathered together for training the model. Importantly, since MRTT and UL-AoA require scheduled SRS and PRS transmissions, time-domain measurements are only available in the connected mode when it comes to the UE RRC state.

Frequency-domain RSRP and other CSI measurements are collected from periodic and always available SSB transmissions at the UE side, thus enabling utilization of efficient data crowdsourcing methods. Despite a possible technical capability to perform training at the UE, assuming individually trained models at different UEs can be considered unrealistic. Therefore, UEs are expected to periodically share such measurement data with the network for NN training, for example, through Minimization of Drive Testing (MDT) messaging in the form of raw measurements and location tags, or alternatively as locally pre-trained models following the principle of federated learning (FL). Interestingly, unlike with MRTT and UL-AoA, the DL frequency-domain CSI and RSRP measurements can be collected and obtained also in the RRC idle mode as part of standard mobility management procedures. This can be considered a great asset enabling continuous data collection and localization with very low power consumption. Furthermore, assuming a pre-shared model from the network for the final inference phase, the UE can perform localization independently without supplementary signaling with the gNB.

IV Evaluation Environment and Results

IV-A Evaluation Scenario and Assumptions

The evaluation environment builds on ray-tracing-based channel measurements utilizing Wireless InSite® software [47]. We employ the map-based METIS Madrid grid [48], recognized as the relevant urban scenario by 3GPP in 5G NR specifications [34]. The Madrid grid layout introduces generally a rich radio propagation environment with different street widths and open areas, empowering generalization and scalability.

The simulated urban scenario illustrated in Fig. 4 contains three 5G mmWave gNBs operating at $28$ GHz, such that clear NLoS regions also exist along the streets. Each gNB is equipped with a uniform cylindrical antenna array with $4$ elevated layers, each with $16$ antenna elements placed at $5$ m height. The beam configuration includes 16 beams with uniformly separated azimuth angles and a common down-tilted elevation angle fixed at $10$ deg. The AoD and ToF measurements are obtained based on the corresponding characteristics of the radio propagation path with the highest received power, building on the signals and measurement procedures described in Section II. The obtained AoD and ToF measurements are exposed to substantial measurement errors, as discussed further in Section IV-B. The beam-wise frequency-domain CSI measurements are obtained from SSB transmissions as an average of received subcarrier powers per RB, with $120$ kHz subcarrier spacing. Measurements with path-loss higher than 160 dB are not considered, while in general the environment shown in Fig. 4 possesses large areas and street segments with severe multi-bounce phenomena.

The combined time-domain and frequency-domain dataset consists of $40$ vehicle-like user tracks, where the UE collects measurements at $100$ ms intervals. The UE locations are initialized with random locations along the streets, and the UEs move within the area by considering an equal probability to advance in any direction at intersections. The UE velocity varies between $20$ km/h and $60$ km/h depending on the present street and the proximity of intersections. When approaching an intersection, the UE decelerates at $3$ m/s² until reaching a fixed velocity of $20$ km/h for smooth turning. After the turn, the UE accelerates at $2$ m/s² until reaching a street-specific speed limit. The speed limit is generally defined as $40$ km/h, apart from the top horizontal street which has the speed limit of $20$ km/h (see Fig. 4) and the wider street below the pedestrian street having a limit of $60$ km/h. The exact UE trajectories and associated measurement locations are different for each simulated user track. As this work is heavily focused on NLoS positioning, Fig. 4 visualizes the simulated tracks with the LoS/NLoS indication at each sampled location. We note that in order to efficiently track moving UEs with varying velocities through the sequence processing models, a sufficiently rich training dataset is needed with representative velocity statistics.

The available $40$ user tracks are distributed into 32 UE traces for training, 4 for validation, and the remaining 4 for the actual testing. The validation and testing paths are carefully selected, to avoid any area-specific bias in the evaluation. Furthermore, as the work focuses on the NLoS positioning performance, we validated the consistency of the LoS/NLoS split across the datasets. The distribution of the samples in the individual datasets based on the number of LoS gNBs is consistent with approx. $35\%$ NLoS samples, $60\%$ of samples having a single LoS gNB, and only $5\%$ samples having $2$ gNBs in LoS. The distribution suggests that the traditional model-based solutions, such as trilateration, are not applicable in the considered scenario. In total, there are $25\,181$ samples in the dataset.

IV-B Network Data Uncertainties

In this work, we take into account the important practical aspect of uncertainties in the measurements and thereon in the corresponding features. To this end, the frequency-domain CSI is impaired in its FR-Complex representation with complex additive white Gaussian noise (AWGN) samples with magnitude equal to $30\%$ of the corresponding channel estimate’s root mean square (RMS) magnitude. This represents large practical measurement uncertainties. The other related features such as the FR-Power/Phase are impaired correspondingly, through the transformations from the impaired FR-Complex to amplitude/power and phase domains.
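One possible reading of this impairment is sketched below: the noise RMS magnitude is set to $30\%$ of the RMS magnitude of the channel estimate, with the noise power split equally between the real and imaginary parts. This interpretation, the function name, and the choice of computing the RMS over the whole input array are assumptions.

```python
import numpy as np


def add_complex_awgn(h, rel_level: float = 0.3, rng=None) -> np.ndarray:
    """Add complex AWGN whose RMS magnitude is rel_level times the RMS of h."""
    rng = np.random.default_rng() if rng is None else rng
    h = np.asarray(h, dtype=complex)
    rms = np.sqrt(np.mean(np.abs(h) ** 2))
    # Assumption: equal noise power in the real and imaginary components.
    sigma = rel_level * rms / np.sqrt(2.0)
    noise = rng.normal(0.0, sigma, h.shape) + 1j * rng.normal(0.0, sigma, h.shape)
    return h + noise


# Toy usage on a unit-magnitude synthetic response.
h_clean = np.exp(1j * np.linspace(0.0, 20.0, 1000))
h_noisy = add_complex_awgn(h_clean, rng=np.random.default_rng(0))
```

The derived features, such as FR-Power/Phase, would then be computed from the noisy response rather than being impaired separately.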

To impair the time-domain features, we impose impairments separately to path-ToF, path-RP and path-AoA quantities. The path-ToF feature uncertainty is an AWGN with standard deviation (std) equal to $10$ m. We consider the constant uncertainty scale regardless of the ToF magnitude, as the measurement errors are mostly resulting from hardware inaccuracies and timing offsets in the UEs and gNBs. Furthermore, we impair the path-RP feature with an AWGN with $2$ dB std, which corresponds to the maximum impairment of $\pm 6$ dB range with $99.7\%$ certainty, defined by 3GPP as the required absolute measurement accuracy for SS-RSRP [49]. The path-AoA is, in turn, impaired with discretized accuracy of $22.5^{\circ}$ ( $360^{\circ}/16$ beams), rather than with a randomized value, to incorporate the gNB limitations in accurately determining the AoA.
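The three time-domain impairments can be sketched as follows. Rounding the AoA to the nearest multiple of $22.5^{\circ}$ is an assumed form of the discretization, and the function name and argument layout are illustrative.

```python
import numpy as np

TOF_STD_M = 10.0         # path-ToF uncertainty, expressed as distance
RP_STD_DB = 2.0          # path-RP uncertainty
AOA_STEP_DEG = 360.0 / 16  # 22.5 deg grid


def impair_time_domain(dist_m, power_db, aoa_deg, rng=None):
    """Apply the ToF/RP noise and AoA discretization to path-wise features."""
    rng = np.random.default_rng() if rng is None else rng
    dist_m = np.asarray(dist_m, dtype=float)
    power_db = np.asarray(power_db, dtype=float)
    aoa_deg = np.asarray(aoa_deg, dtype=float)
    dist_noisy = dist_m + rng.normal(0.0, TOF_STD_M, dist_m.shape)
    power_noisy = power_db + rng.normal(0.0, RP_STD_DB, power_db.shape)
    # Assumption: discretize to the nearest point of the 22.5 deg grid.
    aoa_quantized = np.round(aoa_deg / AOA_STEP_DEG) * AOA_STEP_DEG
    return dist_noisy, power_noisy, aoa_quantized


d, p, a = impair_time_domain([100.0], [-90.0], [30.0], np.random.default_rng(0))
print(a)  # 30 deg falls on the 22.5 deg grid point
```

With a $2$ dB std, three standard deviations span $\pm 6$ dB, which matches the $99.7\%$ range quoted above.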

Finally, as reviewed in the Introduction, a large majority of the state-of-the-art works, such as [15, 29, 35, 25, 28, 24, 23, 18, 21], utilize channel amplitude or power response, or even the integrated received power, as the positioning feature. Hence, in the following, the results with FR-Power feature represent essentially the state-of-the-art reference approach when it comes to the frequency-domain features. In the time-domain feature case, the use of the individual dominant path features has been considered in [8, 32, 5, 30, 31, 36], thus serving as the main reference schemes. Additionally, the state-of-the-art schemes build commonly on snap-shot NNs without harnessing the temporal correlation.

Data and codes are openly available at https://doi.org/10.5281/zenodo.12204893.

IV-C Numerical Results with Dense Snap-Shot NNs

We next provide and analyze the results obtained with dense NN based ML models while considering both the frequency-domain and time-domain features as well as the impacts of the feature density or granularity in the two considered domains. To establish an understanding of the baseline or reference performance, we start with the results under perfect measurements (no uncertainties), and then also show the performance under practical measurement uncertainties.

IV-C1 Results with Frequency-Domain Features

First, we analyze and compare the different frequency-domain CSI features introduced in Section III-A and their positioning capabilities with a densely connected snap-shot NN with 5 hidden layers. We thus split all the user tracks into individual samples and compare the performance without considering the temporal dependencies within sequences or additional uncertainties, to focus on the quality of the features themselves.

Fig. 5a visualizes the distributions of the positioning errors on the testing dataset for each feature. Each boxplot marks the median (center) as well as the first and third quartiles ( $25^{th}$ and $75^{th}$ percentiles) encapsulated in the box, while the whiskers mark the values of $5^{th}$ and $95^{th}$ percentiles. The results show that the proposed FR-Power/Phase feature representation enables the most efficient training in terms of positioning error and that considering the FR-Complex features as the input provides the poorest performance. The FR-Power and FR-Phase features achieve comparable median performance, but in terms of outliers, FR-Power performs better. The $95^{th}$ percentiles, referring essentially to the presence of outliers, of FR-Complex and FR-Phase are significantly higher than those of the remaining methods, as shown quantitatively in Table II. Furthermore, the feature combination denoted as FR-Power+FR-Phase represents the simple concatenation of the corresponding individual features. The numerical results show that the positioning performance is improved when compared to the individual features, but the novel FR-Power/Phase feature – utilizing the same, yet pre-processed inputs – provides superior performance. The table highlights in bold the best performance numbers in different cases.

We next further investigate the impact of the feature representation by considering the LoS and NLoS data separately, with the results being shown in Table II. We can observe that the proposed FR-Power/Phase feature representation achieves the lowest positioning errors by a considerable margin ( $3.35$ m and $4.70$ m mean positioning error in LoS/NLoS, respectively), when compared to the other methods addressed earlier in the literature. By comparing the performance in LoS and NLoS scenarios, we can observe some increase in the error in NLoS; however, the exact impact is clearly feature-dependent. Furthermore, when considering the $95^{th}$ percentiles of the error distributions, we can observe that the errors related to the FR-Complex and FR-Phase features are drastically increased, in both LoS and NLoS scenarios, while FR-Power+FR-Phase and FR-Power/Phase features sustain a relatively stable performance across the majority of the testing samples.

TABLE II: Baseline performance results: frequency-domain features, dense snapshot NN, no uncertainties

Feature	FR-Complex		FR-Power		FR-Phase		FR-Power+FR-Phase		FR-Power/Phase
Error [ $\mathrm{m}$ ]	LoS	NLoS	LoS	NLoS	LoS	NLoS	LoS	NLoS	LoS	NLoS
Median	$7.88$	$14.01$	$5.57$	$5.01$	$5.26$	$6.97$	$3.35$	$4.29$	2.42	2.84
Mean	$21.27$	$25.63$	$9.69$	$8.89$	$18.01$	$27.99$	$6.53$	$8.67$	3.35	4.70
$80^{th}$ pc	$21.70$	$33.73$	$12.36$	$11.11$	$9.33$	$17.67$	$5.46$	$8.57$	4.45	5.18
$95^{th}$ pc	$106.37$	$92.05$	$30.69$	$27.53$	$87.97$	$166.66$	$10.47$	$21.43$	6.71	10.97
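Summary statistics of the kind reported in Table II can be computed from per-sample position estimates as sketched below. The function name and the use of the default linear-interpolation percentile of NumPy are assumptions; the source does not specify the percentile estimator.

```python
import numpy as np


def error_summary(est_xy, true_xy) -> dict:
    """Median, mean, 80th, and 95th percentile of the Euclidean positioning error [m]."""
    est_xy = np.asarray(est_xy, dtype=float)
    true_xy = np.asarray(true_xy, dtype=float)
    err = np.linalg.norm(est_xy - true_xy, axis=-1)
    return {
        "median": float(np.median(err)),
        "mean": float(np.mean(err)),
        "p80": float(np.percentile(err, 80)),
        "p95": float(np.percentile(err, 95)),
    }


# Toy usage with three samples having errors of 0 m, 5 m, and 10 m.
truth = np.zeros((3, 2))
estimates = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
print(error_summary(estimates, truth))
```

Applying the same function separately to the LoS and NLoS subsets of the test samples gives the per-scenario columns.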

Overall, the obtained results demonstrate that utilizing the novel FR-Power/Phase feature offers the best performance by a large margin, clearly outperforming the earlier state-of-the-art in the field of frequency-domain features. Thus, in the further frequency-domain feature related evaluations, we consider only the FR-Power/Phase feature representation.

IV-C2 Results with Time-Domain Features

Next, we evaluate the positioning capabilities and performance when utilizing the different time-domain features (path-ToF, path-RP, and path-AoA) as the input data. We also evaluate the combination of the features, while the model can consider up to $5$ dominant multipath components above the 160 dB path-loss threshold.

Fig. 5b visualizes the achieved positioning results, showing that the proposed combinations of path-ToF and AoA or path-ToF, RP and AoA are the two best performing aggregate features. The results also suggest that the path-RP measurement provides less relevant information to the model than the path-ToF, which the model can directly interpret as normalized pseudo-range measurement. This can be seen by comparing the individual features (path-ToF vs. path-RP), as well as the cases where they are combined with path-AoA.

TABLE III: Baseline performance results: time-domain features, dense snapshot NN, no uncertainties

Feature	path-ToF		path-RP		path-AoA		path-ToF+AoA		path-RP+AoA		path-ToF+RP+AoA
Error [ $\mathrm{m}$ ]	LoS	NLoS	LoS	NLoS	LoS	NLoS	LoS	NLoS	LoS	NLoS	LoS	NLoS
Median	$2.74$	$2.19$	$2.96$	$4.49$	$1.10$	$1.67$	0.54	1.10	$1.09$	$1.73$	$0.71$	$1.21$
Mean	$11.08$	$11.14$	$10.17$	$13.50$	$2.79$	$5.03$	1.23	$3.61$	$1.91$	$4.29$	$1.32$	2.63
$80^{th}$ pc	$8.59$	$6.05$	$6.35$	$12.51$	$2.57$	$3.96$	1.35	2.33	$2.45$	$4.16$	$1.67$	$2.71$
$95^{th}$ pc	$45.26$	$40.84$	$29.91$	$53.60$	$8.48$	$14.92$	$3.77$	$6.86$	$5.31$	$14.88$	3.44	6.73

The impacts of the features as well as the standalone performance in LoS/NLoS are summarized in Table III, while also highlighting the best-performing features in each scenario. The table shows that the combination of all features (path-ToF+RP+AoA) together with path-ToF+AoA offer the best results across all statistics. The corresponding performance of path-RP+AoA lags already behind. When evaluating the individual features, the path-AoA provides high-accuracy positioning capabilities with less than $2$ m median positioning error in NLoS, as it can effectively capture the propagation patterns within the given deployment. The path-RP and path-ToF provide, in turn, significantly poorer performance as individual features, especially when considering the higher percentile errors. These results thus clearly prove the value of the directional measurements. Additionally, when compared to the results presented in Table II, relative performance improvement can be observed, which we credit to stronger interpretability of time-domain measurements as model inputs compared to the frequency-domain CSI features. Notably, meter-scale positioning accuracy can be reached through the time-domain features also in NLoS.

IV-C3 Impact of Feature Granularity

Next, we assess and compare the performance of the snap-shot NN model while varying the granularity or sparsity of the input measurements. We again separate the testing dataset into LoS and NLoS parts, and first evaluate the frequency-domain data as resource block-level (RB-level) features, and their mean values across the RBs as the bandwidth-level (BW-level) features. The RB-level features are obtained per-RB, which contains $12$ subcarriers with a subcarrier spacing of $120$ kHz, thus representing a bandwidth of $1.44$ MHz per measurement. The BW-level feature considers the $14.4$ MHz bandwidth across $10$ RBs. Technically, the full feature (RB-level) dataset consists of $960$ features per sample (real and imaginary part, $10$ RBs, $16$ beams, $3$ gNBs), while the BW-level feature dataset contains only $96$ features per sample ( $2\times 16\times 3$ ). The purpose is to investigate and understand whether the more sparse feature representations, which enable simpler and thus faster models, are capable of achieving competitive performance to the non-sparse data or full features.
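The reduction from RB-level to BW-level features can be sketched as an average over the RB axis. The array layout (gNB, beam, RB) and the helper names are assumptions for illustration.

```python
import numpy as np

N_GNB, N_BEAMS, N_RB = 3, 16, 10


def to_bw_level(feat_rb: np.ndarray) -> np.ndarray:
    """Average a complex RB-level feature over the RB axis (last axis)."""
    return feat_rb.mean(axis=-1)


def as_real_inputs(feat: np.ndarray) -> np.ndarray:
    """Flatten a complex feature array into real and imaginary NN inputs."""
    return np.concatenate([feat.real.ravel(), feat.imag.ravel()])


rng = np.random.default_rng(0)
shape = (N_GNB, N_BEAMS, N_RB)
feat_rb = rng.normal(size=shape) + 1j * rng.normal(size=shape)

x_rb = as_real_inputs(feat_rb)               # 2 x 10 x 16 x 3 inputs
x_bw = as_real_inputs(to_bw_level(feat_rb))  # 2 x 16 x 3 inputs
print(x_rb.size, x_bw.size)

# Bandwidth per RB and across all RBs for 12 subcarriers at 120 kHz.
rb_bw_hz = 12 * 120e3
total_bw_hz = N_RB * rb_bw_hz
```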

Fig. 6a visualizes the empirical cumulative distribution functions of the positioning errors, in the LoS and NLoS regions, when utilizing the RB-level and BW-level data as the FR-Power/Phase features. We can observe that in the LoS scenario, the BW-level features provide somewhat reduced performance compared to the full RB-level features. In the NLoS scenario, the RB-level features as the input again outperform the BW-level features. In general, particularly in NLoS where the channel geometry is more complex, the wider array of inputs can support the model in extracting more relevant positioning information thus leading to an improved performance. On the other hand, one can also conclude that the BW-level features provide a well-working solution with large reductions in the model complexity.
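For completeness, an empirical cumulative distribution function of the positioning errors, as plotted in such figures, can be obtained with the following sketch. The function name and the toy error values are illustrative.

```python
import numpy as np


def ecdf(errors):
    """Return sorted errors and the fraction of samples at or below each value."""
    x = np.sort(np.asarray(errors, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


x, y = ecdf([4.0, 1.0, 3.0, 2.0])
print(x, y)
```

Plotting `y` against `x` separately for the LoS and NLoS samples gives curves that can be compared directly across feature granularities.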

Similarly, we next evaluate the impact of the time-domain feature granularity. Earlier, we already concluded that the path-ToF+AoA and path-ToF+RP+AoA features provide the best performance, thus these features are utilized also here. In the following, we distinguish and compare between utilizing only the dominant multipath component (in terms of power) and all available multipath components as the input features.
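Selecting the dominant multipath component per gNB can be sketched as an argmax over the path powers. The array layout (gNB, path) and the convention of marking missing paths with a power of $-\infty$ are assumptions made for this example.

```python
import numpy as np


def dominant_path(dist_m, power_db, aoa_rad):
    """Keep, for each gNB (row), only the path with the highest received power."""
    dist_m = np.asarray(dist_m, dtype=float)
    power_db = np.asarray(power_db, dtype=float)
    aoa_rad = np.asarray(aoa_rad, dtype=float)
    # Assumption: missing paths carry -inf power and are therefore never selected
    # unless a gNB has no valid path at all.
    idx = np.argmax(power_db, axis=-1)[..., None]

    def pick(a):
        return np.take_along_axis(a, idx, axis=-1)[..., 0]

    return pick(dist_m), pick(power_db), pick(aoa_rad)


dist = np.array([[10.0, 20.0, 30.0], [5.0, 6.0, 7.0]])
power = np.array([[-90.0, -80.0, -np.inf], [-70.0, -95.0, -99.0]])
aoa = np.array([[0.1, 0.2, 0.3], [1.0, 1.1, 1.2]])
print(dominant_path(dist, power, aoa))
```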

The achieved performance results are depicted in Fig. 6b, from which we can draw the following observations. The model performance actually improves when only the dominant multipath component is used as the input. This applies to both LoS and NLoS regions, and the impact is particularly clear when the path-ToF+RP+AoA feature case is considered. Additionally, Fig. 6b shows that especially the outlier performance (the highest $10\%$ of errors) is clearly improved, particularly in the LoS regions. Overall, the results in Fig. 6b demonstrate that very high positioning performance can be achieved also in NLoS given that proper path features are utilized.

IV-D Numerical Results with Temporal Sequence Models

In this section, we evaluate the selected frequency-domain and time-domain CSI features, namely RB-level FR-Power/Phase, path-ToF+RP+AoA with the dominant component, and path-ToF+AoA with the dominant component, in the spirit of vehicular user tracking. For this purpose, we utilize the novel temporal sequence-processing NN model proposed and described in Section III-C, while estimating the UE position, UE velocity and UE heading simultaneously. Additionally, and importantly, we now also consider the realistic uncertainties in all considered measurements and the corresponding features, as introduced in Section IV-B.

Fig. 7 visualizes the sequence-based model performance for the different considered features while also explicitly comparing the cases without and with measurement uncertainties, hence providing valuable insight into the NN model generalization capabilities. In the uncertainty-free scenario, time-domain features clearly outperform the frequency-domain ones, similar to the earlier results with snap-shot models. In the practical case where the uncertainties are present in the data, the performance with the time-domain path-ToF+AoA and path-ToF+RP+AoA features deteriorates to a certain extent, while the performance gap between the LoS and NLoS scenarios also interestingly disappears. The frequency-domain FR-Power/Phase features provide essentially the same distributions for the LoS and NLoS positioning errors, while even outperforming the uncertainty-free model in NLoS. These findings highlight the generalization properties of the NN model, as the NLoS scenario itself is a source of additional uncertainties, as can be seen in Fig. 2c already. To this end, Fig. 7 demonstrates that the increased amount of uncertainty in the data does not necessarily lead to the degradation of performance, given that the model is trained on a sufficient amount of data containing such uncertainties.

To further study the achievable performance of the proposed sequence-processing model and the channel features, the positioning, speed, and heading estimation performance is next assessed and shown. We also compare the proposed sequence model’s performance against the selected benchmark solutions, namely the NN model from our initial work in [14] (denoted in the continuation as SPAWC benchmark), as well as an EKF-based robust Bayesian tracking algorithm. To this end, the SPAWC model’s LSTM architecture, parameters, and training setup are as described in [14], whereas the training and testing data are naturally the same as for all other methods described in this article such that comparative results and fair comparisons can be obtained. The SPAWC benchmark utilizes path-ToF+AoA time-domain features as the inputs.

The considered EKF benchmark, in turn, utilizes ToF and AoA measurements from each gNB with LoS connection, while assuming ideal LoS detection such that the EKF performance is the best possible in all LoS locations. In the absence of any LoS link, only the prediction stage of the EKF is conducted. The implementation of the used state-transition model and observation model, together with the related Jacobians and process covariance matrix, follows the descriptions given in [11]. Consequently, the used EKF state vector comprises the UE position in x- and y-coordinates and the UE speed in x-y directions. For each track, the UE state vector is initialized with a perfect state vector estimate in order to provide the best available performance for the EKF benchmark results. However, the performance is evaluated only after the first LoS link is obtained, so that the unrealistic prediction during NLoS condition with the perfect initialization is avoided. After a brief optimization of the EKF parameters, the std of ToF and AoA, included in a diagonal measurement covariance matrix, are defined as 50 ns and 15 deg, respectively. In addition, the std of the velocity noise, used in the process covariance matrix, as defined in [11], is set as 8 m/s.
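The prediction-only operation of the EKF in NLoS can be illustrated with a generic constant-velocity model. This is only a sketch under stated assumptions: the exact state-transition and process covariance matrices follow [11] and are not reproduced here, so the simple velocity-only process noise below, the function name, and the $100$ ms step are illustrative choices rather than the benchmark implementation.

```python
import numpy as np

DT = 0.1        # measurement interval [s]
SIGMA_V = 8.0   # std of the velocity noise [m/s]

# Measurement covariance from the stated ToF and AoA std values
# (assumption: ToF in seconds, AoA in radians).
R = np.diag([(50e-9) ** 2, np.deg2rad(15.0) ** 2])


def ekf_predict(x, P, dt: float = DT, sigma_v: float = SIGMA_V):
    """Prediction step for the state [x, y, vx, vy] with a constant-velocity model."""
    F = np.eye(4)
    F[0, 2] = dt
    F[1, 3] = dt
    # Assumption: process noise acts on the velocity states only.
    Q = np.diag([0.0, 0.0, sigma_v ** 2 * dt, sigma_v ** 2 * dt])
    return F @ x, F @ P @ F.T + Q


# Without any LoS link, only the prediction is applied at each time step.
x = np.array([0.0, 0.0, 10.0, 0.0])
P = np.eye(4)
for _ in range(10):
    x, P = ekf_predict(x, P)
print(x[:2], np.trace(P))
```

Since no measurement update is applied, the predicted position continues along the last estimated velocity while the state covariance grows with every step.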

The positioning performance of the different models is visualized and compared in Fig. 8a, showing significantly improved positioning performance through the proposed solution compared to the two benchmarks. While being evaluated with the same data, the proposed time- and frequency-domain CSI features combined with the proposed sequence processing engine achieve around $2$ m median positioning error, compared to $5$ m of the SPAWC benchmark and $12$ m of the EKF benchmark. The achieved performance improvement stems from the combination of the novel feature engineering and the carefully crafted sequence based NN processing system. Furthermore, besides estimating the UE location, incorporating the estimation of UE heading and velocity leads to reduced estimation variance through stabilization between individual quantities. Compared to the EKF model, which is built on the assumption of LoS geometry and unbiased measurements, the proposed approach can work in both LoS and NLoS conditions and deal with biased measurements – such as the ones encountered with the discretized path-AoA feature.

Fig. 8b provides the corresponding results when distinguishing between the LoS and NLoS regions. The ECDFs show that with the uncertainties within the received signals, the gap between the LoS and NLoS performance with the proposed solution is strongly suppressed, as for the NN model, the NLoS scenario itself represents an uncertainty on its own, which only complements the ones in the input data. While time-domain CSI features offer LoS-agnostic results, there is still a small performance gap when utilizing the frequency-domain data. The SPAWC benchmark provides consistent results in the lower parts of the distribution, but the distribution for the NLoS samples contains significantly higher errors and outliers. The positioning performance of the EKF in LoS suffers from the discretization of the path-AoA, leading to an increasing positioning error with the increasing distance from the LoS gNB(s). In NLoS, the EKF follows the direction of the last LoS sample, leading to unreliable estimation. As noted already in the Introduction, there are Bayesian filtering or other model-based methods deliberately crafted for NLoS scenarios, such as [9, 13]. However, such methods possess very high computational complexity and are typically limited to single-bounce scenarios, while in the evaluation environment considered in this article severe multi-bounce phenomena can occur.

Fig. 9a and Fig. 9b visualize the speed estimation and tracking performance in the same scenario as the positioning evaluation above, yet excluding the SPAWC benchmark, which does not facilitate speed or heading estimation. The results show that the speed estimation error of the proposed model and features is in the majority of situations lower than $1$ m/s, regardless of the input feature or LoS availability. The frequency-domain feature FR-Power/Phase has a slightly higher estimation error in NLoS. In comparison, the EKF is subject to significantly higher estimation error even in LoS, while in NLoS it loses the tracking capabilities completely.

Furthermore, as shown in Fig. 10a and Fig. 10b, the UE heading is in the majority of samples estimated correctly – with less than $1^{\circ}$ error in more than $75$ % of the samples. Interestingly, the NLoS heading estimates are even more accurate than the ones with an unobstructed radio link to the gNB. This phenomenon occurs when the UE starts turning and the heading rapidly changes, while the data, such as discretized path-AoA, remain unaffected by the change. The model is not capable of instantaneously reacting due to the uncertainties included within the data.
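For illustration, a heading error statistic of this kind could be computed as sketched below. The wrap-around handling (so that $359.5^{\circ}$ versus $0^{\circ}$ counts as $0.5^{\circ}$) is an assumption about how angular errors are measured, and the function names and sample headings are hypothetical.

```python
def heading_error_deg(estimated, true):
    """Smallest absolute angular difference in degrees, in the range [0, 180]."""
    diff = (estimated - true + 180.0) % 360.0 - 180.0
    return abs(diff)

def fraction_below(errors, threshold):
    """Share of samples whose error is strictly below the threshold."""
    return sum(e < threshold for e in errors) / len(errors)

# Hypothetical (estimated, true) headings in degrees.
pairs = [(359.5, 0.0), (90.2, 90.0), (10.0, 350.0), (180.0, 180.5)]
errors = [heading_error_deg(e, t) for e, t in pairs]
share = fraction_below(errors, 1.0)
```

In this toy example three of the four samples fall below the $1^{\circ}$ threshold, so `share` evaluates to $0.75$.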

For a comprehensive performance assessment, we next briefly address the potential impact of the uncertainties in the training data positioning labels. We model the uncertainty through AWGN and set the corresponding standard deviation in the x-y coordinates to $5$ m. The obtained results are illustrated in Fig. 11. As can be observed, additional uncertainties in the labels force the model to generalize along the path. Consequently, the model is capable of tracking more effectively in LoS, while in NLoS, the positioning results include an increased number of outliers. These results show that the proposed combination of features and the sequential NN model can diminish the effects of significant uncertainties in both features and labels.
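The label perturbation described above can be sketched as follows, assuming independent zero-mean Gaussian noise on each coordinate. The function name, the seed, and the synthetic track are hypothetical; only the $5$ m standard deviation is taken from the text.

```python
import random
import statistics

def add_label_noise(labels, std_m, seed=0):
    """Add independent zero-mean Gaussian noise to each (x, y) label coordinate."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    return [(x + rng.gauss(0.0, std_m), y + rng.gauss(0.0, std_m)) for x, y in labels]

# Hypothetical clean labels along a straight track, in meters.
labels = [(float(i), 2.0 * i) for i in range(2000)]
noisy = add_label_noise(labels, std_m=5.0)

# Sanity check: the empirical spread of the injected noise should be close to 5 m.
residuals = [n - c for noisy_xy, clean_xy in zip(noisy, labels)
             for n, c in zip(noisy_xy, clean_xy)]
sample_std = statistics.pstdev(residuals)
```

Only the training labels would be perturbed in this way; the evaluation would still compare estimates against the clean ground truth.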

IV-E Deployment Specialization through Transfer Learning

Being able to adapt the NN model to environmental changes is a critical task in real-world applications, as any given model is essentially restricted to the environment it was trained in. This is especially so if no direct information about the environment, such as gNB locations, beam directions, or buildings, is included in the feature vector. To address this practical challenge, we explore next the capabilities of transfer learning (TL), which relies on re-training the model from one scenario to another. For presentation simplicity, we consider the snapshot model with FR-Power/Phase features and assume no feature or label uncertainty. The models utilize the same training parameters as described in Section III-C2, with the exception of applying the early stopping mechanism also to the first training loop. In addition, for the considered new gNB deployments, we limit the dataset size to only $10\%$ of the original to demonstrate the feasibility of the TL approach also with small data sizes.

IV-E1 Scenario 1

We first consider relocating a single gNB within the deployment, thus altering the signal propagation geometry, as visualized in Fig. 12a. For the evaluation, we consider three models, all sharing the same architecture: an original model, a TL model, and a new model trained from scratch. The original model was trained on a full dataset from the prior deployment, whereas the TL model and the new model were trained on the data from the altered scenario. Moreover, the newly trained model was initialized with random weights, while the TL model was initialized with the weights of the original model before (re-)training. In addition, in the TL model only the first layer after the input was set to “trainable”, while the rest of the model remains frozen.
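The TL initialization and layer freezing described above can be illustrated with the following framework-free sketch. The layer names, the flattened toy weights, the placeholder gradients, and the plain gradient step are all hypothetical simplifications; they do not reproduce the actual architecture or optimizer of this article.

```python
import copy

def init_transfer_model(original_weights, trainable_layers):
    """Copy the original weights and flag which layers may be updated."""
    weights = copy.deepcopy(original_weights)
    trainable = {name: name in trainable_layers for name in weights}
    return weights, trainable

def sgd_step(weights, grads, trainable, lr):
    """Apply one gradient step to the weight dict, skipping frozen layers."""
    for name, values in weights.items():
        if trainable[name]:
            weights[name] = [w - lr * g for w, g in zip(values, grads[name])]

# Hypothetical flattened weights of a model trained on the prior deployment.
original = {"dense_1": [0.5, -0.2], "dense_2": [1.0, 0.3], "output": [0.1, 0.7]}
grads = {name: [1.0, 1.0] for name in original}  # placeholder gradients

# Start from the original weights; only the first layer after the input is trainable.
tl_weights, trainable = init_transfer_model(original, trainable_layers={"dense_1"})
sgd_step(tl_weights, grads, trainable, lr=0.1)
```

After the step, only `dense_1` differs from the original weights, which mirrors the idea that the frozen layers retain what was learned in the prior deployment. In a deep learning framework, the same effect is typically obtained by disabling gradient updates for the frozen layers.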

The positioning error distributions of the considered models are visualized in Fig. 12b. Although the original model is not trained with the data from the altered deployment scenario, its positioning error is below $10$ m in approximately $40\%$ of the samples. However, most of the accurately localized samples of the original model are found in the southeast part of the deployment, where the relocated gNB is not detected, and thus the environment seems essentially unchanged. The TL model clearly outperforms the newly trained model, especially when considering errors above the $40^{th}$ percentile. The model convergence during the training is visualized in Fig. 12c, where both training and validation losses across epochs are shown for the TL and the newly trained models. While the TL model is able to converge within less than $80$ epochs, the model trained from scratch requires more than $200$ epochs (not explicitly visible in the figure) to obtain the final weights. This highlights the training efficiency of the TL model.

IV-E2 Scenario 2

We next consider the more challenging case of relocating all three gNBs, as shown in Fig. 13a. In this case, the changes in the gNB coordinates are up to hundreds of meters, resulting in significantly altered radio propagation characteristics. In Fig. 13b, the positioning errors are shown following similar model cases as in Scenario 1. Due to the considerably changed gNB locations, the performance of the original model clearly collapses. Furthermore, the TL model provides clearly better performance compared to the new model trained from scratch. In Fig. 13c, it can be seen that besides providing the best positioning performance, the TL model significantly accelerates the training, enabling fast model convergence in only around $50$ epochs. These results show that TL is an efficient approach to adapt the NN model to a new, previously unseen scenario with greatly reduced effort and relaxed requirements on the availability of data.

IV-F Notes on Complexity

Model complexity is an important aspect of any NN solution, and notable efforts are commonly invested to reach beneficial performance-complexity trade-offs. For comparability and presentation simplicity, we focus here primarily on the parameter counts and input sizes as the main complexity-related metrics. To this end, Table IV compares the proposed solutions with the ones from the referred literature noting also the model structures.

TABLE IV: Complexity assessment of related works

Reference	ML model	Input size	Num. parameters
[15]	CNN	$18\,432$	$13$M
[18]	DNN	$224$	$2\,331\,650$
[23]	DNN	N/A	$2.5$M
[25]	CNN	$2\,700$	$9$M
[28]	DNN	$18\,432$	$9\,105\,346$
[29]	CNN	$15\,440$	$2.6$M
Our models	Feature	Input size	Num. parameters
Snapshot	path-ToF+AoA	$12$	$1\,058\,306$
Snapshot	path-ToF+RP+AoA	$15$	$1\,067\,010$
Snapshot	FR-Power/Phase	$960$	$1\,543\,682$
Sequence	path-ToF+AoA	$50\times 12$ *	$1\,667\,192$
Sequence	path-ToF+RP+AoA	$50\times 15$ *	$1\,670\,264$
Sequence	FR-Power/Phase	$50\times 960$ *	$2\,637\,944$

* Processing $50$ snapshot samples in a sequence

As can be observed, both the convolutional and fully connected models utilized across the referred works have roughly $2$ to $13$ million parameters and input sizes of up to $18\,432$, corresponding to manageable computational complexity when training on high-performance machines. In comparison, the path-ToF+AoA model, which achieved meter-level positioning accuracies, requires only $12$ inputs and around one million parameters, while increasing the input size to $960$ with the FR-Power/Phase feature results in $1\,543\,682$ trainable parameters. Our sequence-processing models, despite their extra layers, retain relatively low parameter counts, resulting in feasible computational requirements. Overall, we can conclude that while outperforming the prior-art methods in positioning accuracy, the proposed models are also computationally feasible, and even lighter in complexity compared to many reference models.
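To make the relation between input size and parameter count concrete, the following sketch counts the weights and biases of a generic fully connected network. The hidden widths and the two-dimensional output are hypothetical values chosen for illustration; they are not the layer sizes behind Table IV.

```python
def dense_param_count(input_size, hidden_widths, output_size):
    """Number of weights plus biases in a fully connected network."""
    sizes = [input_size, *hidden_widths, output_size]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

# Hypothetical hidden widths and output size, for illustration only.
hidden = (64, 32)
small = dense_param_count(12, hidden, 2)    # e.g. a 12-element time-domain feature
large = dense_param_count(960, hidden, 2)   # e.g. a 960-element frequency-domain feature
extra = large - small                       # equals (960 - 12) * 64
```

In such a network, a larger input affects only the first layer, so the parameter count grows by the input size difference multiplied by the first hidden width, while the deeper layers are unchanged.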

IV-G Discussion on Synthetic vs. Real-World Training Data

While we evaluate the performance of the methods and models on ray-tracing data in this article, training the models with artificially created synthetic data can be a feasible option also for true deployments and applications of large-scale NN-operated systems. In general, the acquisition of synthetic data is cheaper and faster than performing exhaustive site surveys and can approximate reality with increasing accuracy and fidelity [50]. Considering the availability of synthetic data, there are many possibilities to organize the model training in practice. For example, initial training can be performed already at the factory based on synthetic data from a specific intended operation environment, and then the model can be fine-tuned on a limited set of real-world measurements. Such real-world labelled data can be obtained through different crowdsourcing or crowdsensing arrangements, as discussed further below. On the other hand, training at the factory can be more generic and cover numerous environments, while specialization to a specific environment can be managed with a TL approach using synthetic data and/or real-world measurements. Similarly, in case of any changes in the environment, it is possible to update the model via TL by utilizing re-generated site-specific synthetic data and/or newly obtained real-world measurements.

Employing crowdsourcing campaigns, where the surveying software is offered to the public to perform the measurements, is one way of arranging real-world labelled data in practice. One recognized challenge is that such an approach is often biased by the human factor, such as inaccurate manual labeling. As another alternative, while crowdsourcing requires the user to perform an action to obtain the data, crowdsensing is fully automated and unsupervised by the user. Consequently, it can yield a massive volume of data, though at the cost of potentially missing labels or other quality-related challenges. Numerous solutions exist to curate and filter such data, although some challenges still remain [51].

In general, the topic of synthetic data is currently being explored by the IEEE [52] and can become a critical link enabling real-world systems driven by their digital twins [53]. This is a key component in positioning-driven studies, where obtaining realistic performance evaluations requires consideration of whole network deployments with detailed physical-layer processing and measurement capabilities. End-to-end simulated results thus play a crucial role in the positioning system analysis before conclusive validation through experimental field tests.

V Conclusions

This article addressed cellular network-based user localization and tracking in challenging NLoS environments, with specific emphasis on 5G mmWave deployments and urban vehicular systems. We first described the UL/DL measurements available for positioning purposes, together with their acquisition in 5G mmWave networks. We then derived and proposed efficient frequency-domain CSI features, most notably utilizing the relative phases and powers of the received signal across the neighboring resource blocks. As time-domain CSI data, we exploited the multipath components and proposed different aggregate features combining time-of-flight, angle-of-arrival, and received path-wise powers. Deep learning ML architectures were then described, covering not only dense snapshot models but also sequence-processing NN models harnessing the temporal correlations of the features in vehicular systems.

Realistic numerical evaluations in a large-scale LoS-obstructed urban environment with moving vehicles were provided, building on full ray-tracing-based propagation modeling on the METIS Madrid map at 28 GHz. The baseline results without feature uncertainties show that the frequency-domain CSI in the form of RB-level relative phases and powers allows for very good and robust positioning performance, in both LoS and NLoS, while even further enhanced performance can be obtained through the time-domain features when combining multipath times-of-flight and angles-of-arrival. The results also show that dominant multipath feature combinations are sufficient, or even favorable, for robust positioning. Additionally, when considering practical levels of feature measurement uncertainties, together with the sequence-processing NN models, robust positioning in both LoS and NLoS was still shown to be feasible. Finally, the important practical aspect of dealing with gNB deployment differences between the training and inference phases was addressed. It was shown that such environment-related uncertainties can be addressed and alleviated through transfer learning.

Overall, the provided numerical results clearly demonstrate that the proposed methods harnessing the novel feature engineering and sequence processing neural network models outperform the state-of-the-art, being able to facilitate 1-2 m median positioning accuracy even in deep-NLoS regions with feasible parameter counts. Our future work will focus on exploring the opportunities with obtaining and processing positioning measurements from co-existing C-band and mmWave cellular networks, as well as with further improving the positioning accuracy and reliability through sensor fusion.

References

[1] H. Holma, A. Toskala, and T. Nakamura, 5G Technology: 3GPP New Radio. Wiley, 2019.
[2] E. Dahlman, S. Parkvall, and J. Skold, 5G NR: The Next Generation Wireless Access Technology, 2nd ed. Elsevier, 2020.
[3] R. Mendrzik, F. Meyer, G. Bauch, and M. Z. Win, “Enabling Situational Awareness in Millimeter Wave Massive MIMO Systems,” IEEE J. Sel. Topics Signal Process., vol. 13, no. 5, pp. 1196–1211, 2019.
[4] J. A. del Peral-Rosado, R. Raulefs, J. A. López-Salcedo, and G. Seco-Granados, “Survey of cellular mobile radio localization methods: From 1G to 5G,” IEEE Commun. Surveys Tuts., vol. 20, no. 2, pp. 1124–1148, 2017.
[5] M. Koivisto et al., “High-efficiency device positioning and location-aware communications in dense 5G networks,” IEEE Commun. Mag., vol. 55, no. 8, pp. 188–195, 2017.
[6] R. Di Taranto et al., “Location-Aware Communications for 5G Networks: How location information can improve scalability, latency, and robustness of 5G,” IEEE Signal Process. Mag., vol. 31, no. 6, pp. 102–112, Nov. 2014.
[7] K. Nagai, T. Fasoro, M. Spenko, R. Henderson, and B. Pervan, “Evaluating GNSS navigation availability in 3-D mapped urban environments,” in Proc. IEEE/ION PLANS, 2020, pp. 639–646.
[8] Y. Lu et al., “Bayesian filtering for joint multi-user positioning, synchronization and anchor state calibration,” IEEE Trans. Veh. Technol., pp. 1–16, 2023.
[9] Y. Ge et al., “A computationally efficient EK-PMBM filter for bistatic mmWave radio SLAM,” IEEE J. Sel. Areas Commun., vol. 40, no. 7, pp. 2179–2192, 2022.
[10] J. L. C. Villacrés, Z. Zhao, T. Braun, and Z. Li, “A particle filter-based reinforcement learning approach for reliable wireless indoor positioning,” IEEE J. Sel. Areas Commun., vol. 37, no. 11, pp. 2457–2473, 2019.
[11] M. Koivisto et al., “Joint device positioning and clock synchronization in 5G ultra-dense networks,” IEEE Trans. Wireless Commun., vol. 16, no. 5, pp. 2866–2881, 2017.
[12] K. Ko, W. Ahn, and W. Shin, “High-speed train positioning using deep kalman filter with 5G NR signals,” IEEE Trans. Intell. Transp. Syst., 2022.
[13] J. Talvitie, M. Valkama, G. Destino, and H. Wymeersch, “Novel algorithms for high-accuracy joint position and orientation estimation in 5G mmwave systems,” in Proc. IEEE Globecom Workshops, 2017, pp. 1–7.
[14] R. Klus, J. Talvitie, J. Vinogradova, J. Torsner, and M. Valkama, “Machine learning based NLOS radio positioning in beamforming networks,” in Proc. IEEE SPAWC, 2022, pp. 1–5.
[15] X. Sun, C. Wu, X. Gao, and G. Y. Li, “Fingerprint-based localization for massive MIMO-OFDM system with deep convolutional neural networks,” IEEE Trans. Veh. Technol., vol. 68, no. 11, 2019.
[16] A. Zappone, M. Di Renzo, and M. Debbah, “Wireless networks design in the era of deep learning: Model-based, AI-based, or both?” IEEE Trans. Commun., vol. 67, no. 10, pp. 7331–7376, 2019.
[17] G. Revach et al., “KalmanNet: Neural network aided Kalman filtering for partially known dynamics,” IEEE Trans. Signal Process., vol. 70, pp. 1532–1547, 2022.
[18] R. Klus, J. Talvitie, and M. Valkama, “Neural network fingerprinting and GNSS data fusion for improved localization in 5G,” in Proc. Int. Conf. Localization and GNSS (ICL-GNSS), 2021, pp. 1–6.
[19] J. Torres-Sospedra et al., “A comprehensive and reproducible comparison of clustering and optimization rules in Wi-Fi fingerprinting,” IEEE Trans. Mobile Comput., 2020.
[20] J. Rojo et al., “Machine learning applied to Wi-Fi fingerprinting: The experiences of the ubiqum challenge,” in Proc. IPIN, 2019, pp. 1–8.
[21] R. Klus et al., “Transfer learning for convolutional indoor positioning systems,” in Proc. IPIN, 2021, pp. 1–8.
[22] M. M. Butt, A. Pantelidou, and I. Z. Kovács, “ML-assisted UE positioning: Performance analysis and 5G architecture enhancements,” IEEE Open Journal of Vehicular Technology, vol. 2, pp. 377–388, 2021.
[23] E. Gönültaş, E. Lei, J. Langerman, H. Huang, and C. Studer, “CSI-Based Multi-Antenna and Multi-Point Indoor Positioning Using Probability Fusion,” IEEE Trans. Wireless Commun., vol. 21, no. 4, pp. 2162–2176, 2021.
[24] X. Wang, L. Gao, S. Mao, and S. Pandey, “CSI-based fingerprinting for indoor localization: A deep learning approach,” IEEE Trans. Veh. Technol., vol. 66, no. 1, pp. 763–776, 2016.
[25] H. Chen, Y. Zhang, W. Li, X. Tao, and P. Zhang, “ConFi: Convolutional neural networks based indoor Wi-Fi localization using channel state information,” IEEE Access, vol. 5, pp. 18 066–18 074, 2017.
[26] J. Xiao, K. Wu, Y. Yi, and L. M. Ni, “FIFS: Fine-grained indoor fingerprinting system,” in 2012 21st International Conference on Computer Communications and Networks (ICCCN), 2012, pp. 1–7.
[27] X. Wang, L. Gao, and S. Mao, “CSI Phase Fingerprinting for Indoor Localization With a Deep Learning Approach,” IEEE Internet Things J., vol. 3, no. 6, pp. 1113–1123, 2016.
[28] P. Ferrand, A. Decurninge, and M. Guillaud, “DNN-based localization from channel estimates: Feature design and experimental results,” in Proc. IEEE GLOBECOM, 2020, pp. 1–6.
[29] K. Gao, H. Wang, H. Lv, and W. Liu, “Toward 5G NR high-precision indoor positioning via channel frequency response: A new paradigm and dataset generation method,” IEEE J. Sel. Areas Commun., 2022.
[30] D. Lynch, L. Ho, M. MacDonald, and M. O’Neill, “Localisation in wireless networks using deep bidirectional recurrent neural networks,” in Proc. Int. Joint Conf. Neural Networks (IJCNN), 2020, pp. 1–8.
[31] Z. HajiAkhondi-Meybodi, M. Salimibeni, A. Mohammadi, and K. N. Plataniotis, “Bluetooth low energy and CNN-based angle of arrival localization in presence of Rayleigh fading,” in Proc. IEEE ICASSP, 2021, pp. 7913–7917.
[32] Y. Xie, L. Zhou, Y. Zhang, H. Huan, and Z. Zhang, “Simultaneous localization of scatterers and target user based on indoor prior information in NLOS environments,” IEEE Trans. Veh. Technol., vol. 71, no. 11, pp. 11 729–11 740, 2022.
[33] T. Feigl, E. Eberlein, S. Kram, and C. Mutschler, “Robust ToA-estimation using convolutional neural networks on randomized channel models,” in Proc. IPIN, 2021, pp. 1–8.
[34] 3GPP, “Study on channel model for frequencies from 0.5 to 100 GHz,” 3GPP, Tech. Rep. 38.901, 3 2022, version 17.0.0.
[35] M. M. Butt, A. Rao, and D. Yoon, “RF fingerprinting and deep learning assisted UE positioning in 5G,” in Proc. IEEE VTC-Spring, 2020, pp. 1–7.
[36] Y. Chen, J. Palacios, N. González-Prelcic, T. Shimizu, and H. Lu, “Joint initial access and localization in millimeter wave vehicular networks: a hybrid model/data driven approach,” in Proc. IEEE SAM, 2022, pp. 355–359.
[37] 3GPP, “Physical layer measurements,” 3GPP, Tech. Rep. 38.215, 1 2021, version 16.4.0.
[38] 3GPP, “Stage 2 functional specification of User Equipment (UE) positioning in NG-RAN,” 3GPP, Tech. Rep. 38.305, 12 2021, version 16.7.0.
[39] 3GPP, “NG Radio Access Network (NG-RAN); Stage 2 functional specification of User Equipment (UE) positioning in NG-RAN,” 3GPP, Tech. Rep. 38.305, 9 2022, version 16.8.0.
[40] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning. MIT press, 2016.
[41] D. Hendrycks and K. Gimpel, “Gaussian error linear units (gelus),” arXiv preprint arXiv:1606.08415, 2016.
[42] L. Lu, Y. Shin, Y. Su, and G. E. Karniadakis, “Dying relu and initialization: Theory and numerical examples,” arXiv preprint arXiv:1903.06733, 2019.
[43] Y. Wang et al., “Transformer-based acoustic modeling for hybrid speech recognition,” in Proc. IEEE ICASSP, 2020, pp. 6874–6878.
[44] J. Xiao, X. Fu, A. Liu, F. Wu, and Z.-J. Zha, “Image De-raining Transformer,” IEEE Trans. Pattern Anal. Mach. Intell., 2022.
[45] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[46] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[47] Remcom. Wireless InSite - 3D Wireless Prediction Software. (Accessed: Jan 27, 2021). [Online]. Available: https://www.remcom.com/wireless-insite-em-propagation-software
[48] A. Rauch et al., “Fast algorithm for radio propagation modeling in realistic 3-D urban environment,” Advances in Radio Science, vol. 13, pp. 169–173, 11 2015.
[49] 3GPP, “NR; Requirements for support of radio resource management,” 3GPP, Tech. Rep. 38.133, 9 2022, version 16.13.0.
[50] Y. Assayag, H. Oliveira, E. Souto, R. Barreto, and R. Pazzi, “Indoor positioning system using synthetic training and data fusion,” IEEE Access, vol. 9, pp. 115 687–115 699, 2021.
[51] A. Capponi et al., “A survey on mobile crowdsensing systems: Challenges, solutions, and opportunities,” IEEE Communications Surveys & Tutorials, vol. 21, no. 3, pp. 2419–2465, 2019.
[52] “Synthetic data,” IEEE Standards Association, Mar 2023. [Online]. Available: https://standards.ieee.org/industry-connections/synthetic-data/
[53] A. Castellani, S. Schmitt, and S. Squartini, “Real-world anomaly detection by using digital twin systems and weakly supervised learning,” IEEE Trans. Ind. Informat., vol. 17, no. 7, pp. 4733–4742, 2020.