Federated Learning-based Collaborative Wideband Spectrum Sensing and Scheduling for UAVs in UTM Systems

Sravan Reddy Chintareddy1, Keenan Roach2, Kenny Cheung2, Morteza Hashemi1 1Department of Electrical Engineering and Computer Science, University of Kansas
2Universities Space Research Association (USRA)

Abstract

In this paper, we propose a data-driven framework for collaborative wideband spectrum sensing and scheduling for networked unmanned aerial vehicles (UAVs), which act as the secondary users (SUs) to opportunistically utilize detected “spectrum holes”. Our overall framework consists of three main stages. Firstly, in the model training stage, we explore dataset generation in a multi-cell environment and training a machine learning (ML) model using the federated learning (FL) architecture. Unlike the existing studies on FL for wireless that presume datasets are readily available for training, we propose a novel architecture that directly integrates wireless dataset generation, which involves capturing I/Q samples from over-the-air signals in a multi-cell environment, into the FL training process. To this end, we formulate a multi-label classification problem for wideband spectrum sensing to detect multiple spectrum holes simultaneously based on the I/Q samples collected locally by the UAVs. In traditional FL that employs FedAvg as the aggregation method, each UAV is assigned an equal weight during model aggregation. However, due to the disparities in channel conditions in a multi-cell environment, the FedAvg approach may not generalize effectively for all the UAV locations. To address this issue, we propose a proportional weighted federated averaging method (pwFedAvg) in which the aggregating weights incorporate wireless channel conditions and received signal powers at each individual UAV. As such, the proposed method integrates the intrinsic properties of wireless datasets into the FL algorithm. Secondly, in the collaborative spectrum inference stage, we propose a collaborative spectrum fusion strategy that is compatible with the unmanned aircraft system traffic management (UTM) ecosystem. In particular, we improve the accuracy of spectrum sensing results by combining the individual multi-label classification results from the individual UAVs at a central server. Finally, in the spectrum scheduling stage, we leverage reinforcement learning (RL) solutions to dynamically allocate the detected spectrum holes to the secondary users. To evaluate the proposed methods, we establish a comprehensive simulation framework that generates a near-realistic synthetic dataset using MATLAB LTE toolbox by incorporating base-station (BS) locations in a chosen area of interest, performing ray-tracing, and emulating the primary users' channel usage in terms of I/Q samples. This evaluation methodology provides a flexible framework to generate large spectrum datasets that could be used for developing ML/AI-based spectrum management solutions for aerial devices.

Index Terms:

UAV-based Spectrum Sensing, Collaborative Inference, Federated Learning (FL), Reinforcement Learning (RL), UAS traffic management (UTM).

I Introduction

Figure 1: Unmanned Aircraft System Traffic Management (UTM) architecture showing the separation between Federal Aviation Administration (FAA) and industry developments; Flight Information Management System (FIMS).

Unmanned aerial vehicles (UAVs) have attracted significant interest from communications and networking, robotics, and control societies for exploring novel applications such as on-demand connectivity, search-and-rescue operations, situational awareness, to name a few [1]. As of April 2024, there were roughly 800,000 registered UAVs in the US alone, positioning UAVs as one of the fastest-growing sectors in the aviation industry [2]. Traditionally, UAVs that are used for recreational purposes are operated under visual line of sight (VLOS) conditions. However, real-world and commercial deployments will most likely be in the form of beyond visual line-of-sight (BVLOS), which provides easier access to remote or hazardous areas, less human intervention, and reduced cost of operation [3]. For safe operations of multiple UAVs under BVLOS conditions, NASA and FAA are in the process of defining the UTM system [4]. Fig. 1 shows a simplified form of the UTM architecture, highlighting the separation between FAA and industry development and deployment responsibilities for the necessary infrastructure, services, and entities that interact within the UTM ecosystem. In this work, we mainly focus on the hierarchical structure between multiple operators and the UAS service supplier (USS), which assists multiple operators in meeting UTM operational requirements, ensuring safe and efficient utilization of the airspace.

The concept of operations within the UTM architecture [4] highlights the need for spectrum resources to facilitate wireless communications between UAVs, UAV operators, and the USS network. Existing terrestrial mobile networks (for example, 4G LTE and the upcoming 5G-and-beyond) provide significant wireless coverage with relatively low latency, high throughput, and low cost, making the cellular network a good candidate for the operation of UAVs in BVLOS scenarios [5]. However, the proliferation of new wireless services and the demand for higher cellular data rates have significantly exacerbated the spectrum crunch that cellular providers are already experiencing. Therefore, it is essential to develop dynamic spectrum sensing, inference, and sharing solutions for UAV operations in existing licensed and unlicensed spectrum to enable advanced aerial use cases in BVLOS, such as urban air mobility (UAM) and advanced air mobility (AAM) [6, 7].

There exists a multitude of prior works on spectrum management frameworks for ground users [8, 9, 10, 11, 12]. For instance, the authors in [11, 9, 10] propose deep learning-based wideband spectrum sensing to dynamically detect “spectrum holes”. Furthermore, the authors in [12] propose reinforcement learning (RL) techniques for spectrum sharing, assuming that spectrum sensing results are readily available. While these data-driven spectrum management frameworks for ground users are available, they are not directly applicable for UTM-enabled UAV operations, due to several factors, such as the widely different wireless channel models and the overall system architecture [9, 12]. In the context of UAV spectrum sharing systems, the authors in [13, 14] proposed spatial spectral sensing (SSS) to develop efficient spectrum sharing policies for UAV communications aimed at improving the overall spectral efficiency (SE). However, the SSS models do not consider the spectrum usage pattern of users under realistic scenarios (e.g., ignoring the I/Q level samples), and/or they consider only a single primary user (PU) or secondary user (SU). Moreover, the problem of joint multi-channel wideband spectrum sensing and scheduling among several SUs has not been fully investigated. In this paper, we propose a unified and data-driven spectrum sensing and scheduling framework to enable UAVs to effectively share the spectrum with existing primary users. To make our development more concrete and grounded, the problem of joint spectrum sensing and sharing is formulated as an energy efficiency (EE) maximization in a wideband multi-UAV network scenario. Then, we transform the EE optimization problem into a Markov Decision Process (MDP) to maximize the overall throughput of the SUs. At the spectrum sensing stage, we note the inherent hierarchical nature of the UTM architecture with USS (shown in Fig. 1) is a good match for federated learning (FL) based spectrum sensing. 
For the spectrum scheduling stage, we develop RL-based solutions to enable non-manual and automated spectrum resource allocation. For the spectrum sensing stage in particular, we propose an FL-based cooperative wideband spectrum sensing across multiple UAVs. To this end, we develop a multi-label classification framework to identify spectrum holes based on the observed I/Q samples. Each UAV trains its local model using its locally collected dataset and transmits the local model parameters to the central server. Furthermore, we propose a novel proportional weighted federated averaging (pwFedAvg) method that incorporates the power level received at each UAV into the FL aggregation algorithm, thereby integrating the dataset generation plane with the FL model training plane, as shown in Fig. 2. Once the training process is completed, all UAVs have an updated global model that predicts spectrum holes. To further enhance the accuracy of the individual spectrum inference results, the predicted spectrum holes from the multi-label classification at each UAV are fused at a central server within the UTM ecosystem. In the spectrum scheduling stage, we develop and implement several RL algorithms, including standard Q-learning, to dynamically allocate underutilized spectrum sub-channels to multiple UAVs. We further investigate the performance of the “vanilla” deep Q-Network (DQN) and its variations, including double DQN (DDQN) and DDQN with soft-update.

Furthermore, one of the primary challenges of using machine learning (ML) based methods for spectrum sensing and scheduling approaches is the need for large amounts of training data. The lack of available spectral data in many cases is a significant obstacle, especially for UAV networks that introduce an additional level of complexity for large-scale experimental data collection. To address this gap, we have developed a comprehensive framework for generating spectrum datasets. This framework models LTE waveform generation and propagation channel in any environment of interest, particularly suitable for UTM-enabled UAV applications. Using the generated dataset, we provide a comprehensive set of numerical results to demonstrate the efficacy of the joint FL-based spectrum sensing, spectrum fusion and RL-based dynamic spectrum allocation to multiple UAVs. In summary, the main contributions of this paper are as follows:

• We develop a spectrum management framework based on the envisioned UTM deployment architecture. To this end, we propose a joint spectrum sensing and scheduling problem for collaborative networked UAVs that operate according to the UTM rules. The joint optimization problem integrates the spectrum sensing results into the spectrum scheduling stage for scenarios with multiple secondary users (i.e., UAVs) and primary users.

• For spectrum sensing, we propose an FL-based solution to enable collaborative model training across distributed UAVs. We propose the pwFedAvg method that integrates the underlying wireless channel conditions into the FL aggregation step. We also provide the convergence analysis results of the proposed pwFedAvg method. We demonstrate the benefits of collaborative spectrum sensing through a fusion step. For the spectrum scheduling stage, we develop RL-based solutions leveraging DQN-based approaches.

• We outline a methodology for generating large I/Q datasets for UAVs in a wide geographical area, considering the effects of a multi-cell multi-path environment by incorporating the base-station locations and accurately modeling the environment using ray-tracing methods. Based on the established framework, we provide a comprehensive set of numerical results to analyze the performance of pwFedAvg compared with the traditional FedAvg approach, as well as with centralized and local learning. Our results demonstrate the efficacy of the pwFedAvg method for collaborative spectrum sensing, without the need to transfer all I/Q samples to one location as in central learning.

This paper extends our prior work [15], in which we did not investigate the feasibility of model training using FL methods for UAVs. In contrast, this paper mainly focuses on developing FL-based spectrum sensing by incorporating the wireless datasets captured by multiple UAVs into the FL model training plane, as shown in Fig. 2. Furthermore, we have significantly extended our dataset generation by scaling the size of captured I/Q data samples and increasing the number of reflection and diffraction rays, thereby enhancing the fidelity of emulating the propagation environment. The remainder of this paper is organized as follows. In Section II, we review related works. In Section III, we present the overall system model and problem formulation for FL-based wideband spectrum sensing and collaborative spectrum inference and scheduling. In Section IV, we discuss the dataset generation model and the model training aspects of the FL-based solution that incorporates our proposed pwFedAvg method, followed by a discussion of the convergence analysis of pwFedAvg. In Section V, we present dynamic spectrum allocation using RL. Section VI describes our methodology to generate a synthetic spectrum dataset, followed by our numerical results in Section VII. Finally, Section VIII concludes the paper.

II Related Works

II-A Spectrum Sensing and Sharing for UAVs

The authors in [16] address spectrum access and interference management by utilizing SSS for ground based device-to-device (D2D) communications [17, 18]. Furthermore, the authors in [14, 19] extend the usage of SSS to UAVs to opportunistically access the licensed channels that are occupied by the D2D communications of ground users. The UAVs perform SSS to obtain the received signal strength and compare it with a threshold to identify the spectrum occupancy of a particular D2D channel. However, in general, energy-based detection methods would require capturing the entire waveform for a sub-channel to compute the energy and compare it with a predefined threshold. When there are multiple sub-channels, such detection methods repeated for each sub-channel add further time and hardware complexity. Therefore, SSS methods are not directly applicable to wideband spectrum sensing by UAVs to detect multiple spectrum holes simultaneously.

In addition to the SSS methods, data-driven deep learning (DL) methods for spectrum sensing have been considered in prior works [20, 21]. To develop multi-channel spectrum sensing using DL, the authors in [9] developed a fast wideband spectrum sensing based on DL. The DL model is based on a convolutional neural network (CNN) that accepts raw I/Q signals and predicts the spectrum holes. The above works consider a single PU, a single SU only, and the channel between the PU and SU is modeled as a Rayleigh fading channel.

Furthermore, there exists extensive research on spectrum sharing solutions. For example, the authors in [11, 22, 23, 24] propose the use of RL for dynamic spectrum access in multi-channel wireless networks. Furthermore, the authors in [12, 25, 26] propose the use of DQN, where, in each time slot, a single SU decides whether to stay idle or transmit using one of the sub-channels in a multi-channel environment without performing spectrum sensing. While these studies have provided significant insights, they consider one SU only and are well studied for ground based communications.

In this paper, we consider a data-driven approach to predict multiple spectrum holes simultaneously from the raw I/Q signals captured in a multi-cell multi-path fading environment consisting of multiple PUs and SUs. We incorporate ray-tracing methods to effectively model the dynamic UAV environment instead of assuming a statistical channel model. Furthermore, we employ RL for dynamically allocating resources for the UAVs based on predicted spectrum holes.

II-B FL-based Spectrum Sensing

FL for spectrum sensing has lately gained popularity [27, 28, 29, 30, 31]. The authors in [29, 27] discuss the application of FL for spectrum sensing in cognitive radio environments, where an SU detects the spectrum holes in the PU’s spectrum band and utilizes them opportunistically. However, these studies only considered a single PU with multiple SUs within the coverage of the PU. There exists a separate class of research that concentrates on interference management in multi-cell wireless networks and incorporates over-the-air computation in FL [32, 33, 34]. For instance, the authors in [32] study the adverse effects of inter-cell interference on the uplink and downlink local model aggregates and global model updates and propose solutions to mitigate the interference. In contrast to the above-mentioned works, in this paper, we cater for multiple PUs and multiple SUs, incorporate co-channel interference at the dataset generation level, and also consider wideband spectrum sensing.

Additionally, there exist a few research works on FL for wireless systems that investigate how the convergence of the learning process is affected by the noisy transmissions between the clients and the server [35, 36]. However, these investigations often assume that the datasets are readily available at the clients and primarily focus on the model training plane, as illustrated in Fig. 2. Furthermore, these studies consider standard ML datasets, such as CIFAR-10, MNIST, and Shakespeare [36], while using traditional federated averaging algorithms.

Yet, wireless datasets collected by multiple UAVs in a multi-cell environment are significantly more complex than, and different from, those standard datasets. For instance, data collected at one UAV location may encounter distinct wireless channels, varying numbers of propagation paths, and significantly different received signal power levels compared to the data collected at other locations. This variability underscores the need for tailored approaches to model training within FL frameworks, particularly when dealing with datasets from real-world wireless environments. Hence, we propose a weighted averaging algorithm (pwFedAvg) that captures the disparities in the datasets captured at different UAV locations. Moreover, none of the previous works considered training the FL models with I/Q datasets. To address this gap, we propose a novel architecture to capture wireless datasets in a multi-cell environment and to integrate the dataset generation and model training planes, so that the effects of wireless datasets are incorporated into the FL model training, as illustrated in Fig. 2.

III System Model and Problem Formulation

To model collaborative wideband spectrum sensing and scheduling, we consider a multi-cell wireless network that consists of a set of base-stations (BS) denoted by $\mathcal{B}$ ( $|\mathcal{B}|=B$ ), as shown in Fig. 3. In addition, we consider a set of UAVs denoted by $\mathcal{K}$ ( $|\mathcal{K}|=K$ ) in the system. To coordinate the collaborative spectrum sensing, fusion, and scheduling, we assume that each time slot is divided into four consecutive sub-slots: UAV resource request ( $t_{req}$ ), spectrum sensing ( $t_{s}$ ), broadcasting to central server ( $t_{b}$ ), and channel access ( $t_{a}$ ). Specifically, at the beginning of each time slot, the UAVs that require PU resources request resource allocation from the server. In the subsequent sensing sub-slot ( $t_{s}$ ), the UAVs perform spectrum sensing, and they broadcast the sensed channel information in the following sub-slot ( $t_{b}$ ). The central server then applies fusion rules and assigns spectrum holes to the requesting UAVs. The UAVs then transmit on the allocated spectrum holes in the access sub-slot ( $t_{a}$ ). In this paper, we focus on three main stages to develop our proposed framework: (i) FL-based training for wideband spectrum sensing, (ii) collaborative spectrum inference and fusion, and (iii) spectrum scheduling. To coordinate the above three stages, we assume a central server within the UTM ecosystem. Next, we describe these stages.

III-A Model Training, Spectrum Inference, and Scheduling Stages

FL-based model training for spectrum sensing. The UTM system architecture shown in Fig. 1 supports data exchange between multiple UAVs through the USS network. Such a hierarchical architecture makes it feasible to implement FL-based learning algorithms to identify spectrum holes. In this case, we may consider two deployment models within the UTM architecture. One model would be to have a server deployed by each UAV operator where multiple UAVs connected to the operator act as FL clients. The second model would have a server within the USS network that orchestrates multiple UAV operators. Thus, with several UAVs training local models, they exchange model parameters with the central server that is located either at the USS or UAV operator. The central server then aggregates the local model weights according to an aggregation algorithm and transmits the global model weights back to the UAVs to update their local models.

Collaborative spectrum inference and fusion. Due to the highly dynamic environment in which UAVs operate, it may not be feasible for all UAVs to achieve high prediction accuracy across all sub-channels. Therefore, we leverage collaborative spectrum inference by the UAVs, and perform fusion at the fusion module within the central server to increase the reliability of spectrum hole detection. In particular, each individual UAV captures the raw I/Q samples from over-the-air received signals and predicts the availability of spectrum holes across $M$ sub-channels using the FL-trained model. We assume that there is an associated spectrum inference cost for each UAV $k$ involved in sensing at time slot $t$ . The spectrum inference cost is the energy consumed for sensing the spectrum and is proportional to the voltage $V_{CC}$ of the receiver, the system bandwidth ${W}$ , and the duration allotted for sensing ( $t_{s}$ ) [37]. Therefore, it is defined as $SC_{k,m}(t)=t_{s}V_{CC}^{2}{W}_{m}$ , where $W_{m}$ is the $m$ -th sub-channel bandwidth. Upon completion of the spectrum inference phase, UAV $k$ has a predicted spectrum occupancy vector $\bm{\widehat{h}}_{k}(t)=[{\widehat{h}}_{k,1}(t),...,{\widehat{h}}_{k,M}(t)]$ such that ${\widehat{h}}_{k,m}(t)=0$ if the $m$ -th sub-channel is detected as vacant at time $t$ , and ${\widehat{h}}_{k,m}(t)=1$ otherwise. This can be treated as a multi-label classification problem, and we leverage a deep neural network (DNN) at each UAV that accepts raw I/Q samples $\bm{R}_{k}$ as inputs and outputs the predicted spectrum occupancy vector $\bm{\widehat{h}}_{k}(t)$ .

The central server receives multiple copies of the spectrum holes detected by individual UAVs and applies a fusion rule that results in an aggregated prediction. In this paper, we use the $n$ -out-of- ${K}$ fusion rule defined as follows:

z_{m}(t)=\begin{dcases}0,&\text{if }\sum_{k\in\mathcal{K}}\mathds{1}\{\widehat{h}_{k,m}(t)=0\}\geq n;\\ 1,&\text{otherwise},\end{dcases}

(1)

where $\mathds{1}\{.\}$ is an indicator function. In this case, $\bm{z}(t)=[z_{1}(t),...,z_{M}(t)]$ is the fused prediction of all the $M$ sub-channels at the central server. Note that when $n=1$ , the $n$ -out-of- ${K}$ rule is equivalent to the “OR” rule, and $n={K}$ is the same as the “AND” rule.
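The fusion rule in Eq. (1) can be illustrated with a minimal Python sketch. The function name `n_out_of_k_fusion` and the example predictions are hypothetical and chosen only for illustration; the convention that 0 denotes a vacant sub-channel follows the definition of $\widehat{h}_{k,m}(t)$ above.

```python
from typing import List


def n_out_of_k_fusion(predictions: List[List[int]], n: int) -> List[int]:
    """Fuse per-UAV occupancy vectors (0 = vacant, 1 = occupied) as in Eq. (1).

    A sub-channel is declared vacant (0) when at least n UAVs detect it as vacant.
    """
    num_subchannels = len(predictions[0])
    fused = []
    for m in range(num_subchannels):
        vacant_votes = sum(1 for h_k in predictions if h_k[m] == 0)
        fused.append(0 if vacant_votes >= n else 1)
    return fused


# Hypothetical predictions from K = 3 UAVs over M = 4 sub-channels.
h_hat = [
    [0, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]

z_n1 = n_out_of_k_fusion(h_hat, n=1)  # one vacant vote suffices
z_n2 = n_out_of_k_fusion(h_hat, n=2)  # majority of the three UAVs
z_n3 = n_out_of_k_fusion(h_hat, n=3)  # all UAVs must agree
```

With these toy inputs, a small $n$ declares more sub-channels vacant (more transmission opportunities, higher collision risk), while $n=K$ only keeps the sub-channel on which all UAVs agree.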

Spectrum scheduling. Based on the aggregated fusion result provided by the fusion module, the central server then allocates sub-channels to the requesting UAVs. The UAVs then transmit data on the sub-channels allocated to them by the server in the next time step. The access cost, denoted by $AC_{k,m}(t)$ , is the energy consumed for data transmission and is defined as $AC_{k,m}(t)=t_{a}P_{tx}$ , where $P_{tx}$ is the transmit power and $t_{a}$ is the time allotted to transmission. Furthermore, the transmission utility is the amount of data transmitted on the allocated sub-channel, which is defined as follows:

U_{k,m}(t)=t_{a}{W}_{m}\log_{2}\left(1+\text{SNR}_{k,m}(t)\right),

(2)

where $\text{SNR}_{k,m}(t)$ denotes the signal-to-noise ratio for UAV $k$ on sub-channel $m$ .

We highlight that the UAVs transmit on those sub-channels that were detected vacant in the previous time step. Hence, spectrum collision occurs when the previously detected spectrum holes are no longer available at the current time step. We assume that the true state of sub-channel $m$ is denoted by $\bar{z}_{m}(t)$ . To capture this, we define the spectrum access collision indicator $c_{k,m}(t)$ as follows:

c_{k,m}(t)=\begin{dcases}1,&\text{if }\bar{z}_{m}(t)=0~\text{and}~z_{m}(t-1)=0;\\ -1,&\text{if }\bar{z}_{m}(t)\neq 0~\text{and}~z_{m}(t-1)=0;\\ 0,&\text{otherwise}.\end{dcases}

(3)

Next, we formulate a joint spectrum sensing and scheduling optimization problem.

III-B Joint Spectrum Sensing and Scheduling Problem Formulation

Given the presented system model, we now introduce a joint spectrum sensing and scheduling problem to coordinate collaborative spectrum sensing and spectrum scheduling. We cast the problem as an EE optimization for the UAVs that opportunistically use the spectrum resources of the primary network. In particular, let $y_{k,m}(t)=1$ if UAV $k$ is scheduled to use sub-channel $m$ at time $t$ , and $y_{k,m}(t)=0$ otherwise. Given that the spectrum holes are allocated to the requesting SUs based on the sub-channel availability, we incorporate the sensing and access costs to maximize the overall EE of the system. Therefore, we have:

\begin{cases}\max\limits_{\{y_{k,m}(t)\}}&\mathbb{E}\Big\{\sum\limits_{t,k,m}\frac{y_{k,m}(t)\,c_{k,m}(t)\,U_{k,m}(t)}{y_{k,m}(t)\,AC_{k,m}(t)+SC_{k,m}(t)}\Big\}\\ \text{subject to:}&\sum_{m}y_{k,m}(t)\leq 1,\ \forall\ k=1,2,3,\dots,K,\\ &\sum_{k}y_{k,m}(t)\leq 1,\ \forall\ m=1,2,3,\dots,M,\\ &\sum_{k,m}y_{k,m}(t)\leq M-|\bm{z}(t)|,\\ &y_{k,m}(t)\in\{0,1\},\end{cases}

(4)

where $U_{k,m}(t)$ , $SC_{k,m}(t)$ , and $AC_{k,m}(t)$ are, respectively, the amount of data transmitted, the sensing cost, and the transmission cost of SU $k$ on sub-channel $m$ . The constraints guarantee that each UAV is scheduled to use at most one sub-channel and that each sub-channel is assigned to at most one UAV, while the total number of scheduled UAVs is at most equal to the number of detected spectrum holes at time $t$ , which is $M-|\bm{z}(t)|$ . In this paper, we use a DNN at each UAV to detect spectrum holes, and the individual detections are fused to obtain $\bm{z}(t)$ . To train the DNN models, we next present an FL-based approach for distributed training of spectrum sensing models.
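As a rough illustration of how the quantities in Eqs. (2)-(4) fit together, the following Python sketch evaluates the per-slot term of the objective and checks the constraints for a given schedule. All function names and numerical values are hypothetical and unit-less; the sketch only evaluates a candidate schedule and does not solve the optimization problem.

```python
import math
from typing import List


def sensing_cost(t_s: float, v_cc: float, w_m: float) -> float:
    """SC = t_s * V_CC^2 * W_m."""
    return t_s * v_cc ** 2 * w_m


def access_cost(t_a: float, p_tx: float) -> float:
    """AC = t_a * P_tx."""
    return t_a * p_tx


def utility(t_a: float, w_m: float, snr: float) -> float:
    """Eq. (2): U = t_a * W_m * log2(1 + SNR), with SNR in linear scale."""
    return t_a * w_m * math.log2(1.0 + snr)


def collision(true_state_now: int, fused_prev: int) -> int:
    """Eq. (3): +1 if the hole is still vacant, -1 if it became occupied, else 0."""
    if fused_prev != 0:
        return 0
    return 1 if true_state_now == 0 else -1


def is_feasible(y: List[List[int]], z: List[int]) -> bool:
    """Check the constraints of Eq. (4) for schedule y (K x M) and fused vector z."""
    num_uavs, num_channels = len(y), len(z)
    per_uav = all(sum(row) <= 1 for row in y)
    per_channel = all(
        sum(y[k][m] for k in range(num_uavs)) <= 1 for m in range(num_channels)
    )
    within_holes = sum(sum(row) for row in y) <= num_channels - sum(z)
    return per_uav and per_channel and within_holes


def slot_objective(y, c, u, ac, sc) -> float:
    """Per-slot sum over (k, m) of y*c*U / (y*AC + SC)."""
    total = 0.0
    for k in range(len(y)):
        for m in range(len(y[k])):
            total += y[k][m] * c[k][m] * u[k][m] / (y[k][m] * ac[k][m] + sc[k][m])
    return total


# Toy setting (assumed values): K = 2 UAVs, M = 2 sub-channels.
K, M = 2, 2
u = [[utility(1.0, 1.0, 3.0)] * M for _ in range(K)]       # log2(4) = 2
ac = [[access_cost(1.0, 0.5)] * M for _ in range(K)]       # 0.5
sc = [[sensing_cost(0.5, 1.0, 1.0)] * M for _ in range(K)]  # 0.5
y = [[1, 0], [0, 1]]   # UAV 0 -> sub-channel 0, UAV 1 -> sub-channel 1
z_prev = [0, 0]        # both sub-channels were fused as vacant

c_ok = [[collision(s, zp) for s, zp in zip([0, 0], z_prev)] for _ in range(K)]
c_hit = [[collision(s, zp) for s, zp in zip([0, 1], z_prev)] for _ in range(K)]

obj_ok = slot_objective(y, c_ok, u, ac, sc)    # no collision
obj_hit = slot_objective(y, c_hit, u, ac, sc)  # collision on sub-channel 1
```

In this toy case a single collision cancels the gain of the successful transmission, which is the penalty behaviour that the indicator $c_{k,m}(t)$ is meant to capture.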

IV Proposed FL-based Model Training for Spectrum Sensing

IV-A Dataset and DNN Models

We assume that each UAV receives signals from more than one BS because UAVs operate at higher altitudes, which increases the chances of signal reception from multiple BSs. Furthermore, we assume that the cell bandwidth ${W}$ is partitioned into $M$ orthogonal sub-channels. Then the total transmitted signal from a BS $b$ across $M$ orthogonal sub-channels at any time $t$ can be represented by the superposition principle as follows:

\bm{s}_{b}(t)=\sum_{m=1}^{M}I_{b,m}(t)\,\bm{v}_{b,m}(t),\quad\forall b\in\mathcal{B},

(5)

where $I_{b,m}(t)=1$ if the $m$ -th sub-channel of BS $b$ is occupied at time $t$ , and $0$ otherwise. Moreover, $\bm{v}_{b,m}(t)$ represents the waveform on the $m$ -th sub-channel. As a result, ${\bm{s}}_{b}(t)$ is the transmitted baseband waveform in digital domain. Each UAV $k$ then receives a wideband signal from multiple BSs in a multi-path propagation environment, which can be expressed as follows:

\bm{r}_{k}(t)=\sum_{b=1}^{B}\bm{g}_{k,b}(t)*\bm{s}_{b}(t)+\bm{\eta}_{k}(t),\quad\forall k\in\mathcal{K},

(6)

where $\bm{g}_{k,b}(t)$ represents the multi-path channel between BS $b$ and UAV $k$ and $\bm{\eta}_{k}(t)$ denotes the noise signal observed at UAV $k$ . Therefore, the signal-to-noise ratio observed at UAV $k$ can be written as follows:

\text{SNR}_{k}(t)=\frac{\big\|\sum_{b=1}^{B}\bm{g}_{k,b}(t)*\bm{s}_{b}(t)\big\|^{2}}{\sigma_{k}^{2}(t)},\quad\forall k\in\mathcal{K},

(7)

where $\sigma_{k}^{2}(t)$ represents the noise variance observed at UAV $k$ at time $t$ . We use $P_{k}(t)$ to denote the total power received at UAV $k$ at time $t$ , which is directly proportional to the signal generated as defined in Eq. (5). We will use $P_{k}(t)$ in proportional weight scaling for FL training.
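The signal model in Eqs. (5)-(7) can be mimicked with a small, pure-Python toy example. Everything below is an assumption made for illustration: the two short "waveforms", the single-tap channels, and the use of per-sample average power in the SNR numerator are placeholders, not the LTE waveforms or ray-traced channels used in the paper.

```python
import random
from typing import List, Sequence

Signal = List[complex]


def transmitted_signal(occupancy: Sequence[int], waveforms: Sequence[Sequence[complex]]) -> Signal:
    """Eq. (5): superpose the waveforms of the occupied sub-channels of one BS."""
    n = len(waveforms[0])
    s = [0j] * n
    for i_bm, v in zip(occupancy, waveforms):
        if i_bm == 1:
            for t in range(n):
                s[t] += v[t]
    return s


def convolve(g: Sequence[complex], s: Sequence[complex]) -> Signal:
    """Full linear convolution g * s of channel taps with a transmitted signal."""
    out = [0j] * (len(g) + len(s) - 1)
    for i, g_i in enumerate(g):
        for j, s_j in enumerate(s):
            out[i + j] += g_i * s_j
    return out


def noiseless_received(channels, bs_signals) -> Signal:
    """First term of Eq. (6): sum over BSs of g_{k,b} * s_b."""
    parts = [convolve(g, s) for g, s in zip(channels, bs_signals)]
    total = [0j] * max(len(p) for p in parts)
    for p in parts:
        for i, v in enumerate(p):
            total[i] += v
    return total


def complex_noise(n: int, sigma2: float, rng: random.Random) -> Signal:
    """Circular complex Gaussian noise with total variance sigma2 per sample."""
    std = (sigma2 / 2.0) ** 0.5
    return [complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for _ in range(n)]


def mean_power(x: Sequence[complex]) -> float:
    return sum(v.real ** 2 + v.imag ** 2 for v in x) / len(x)


# Toy setup (assumed): B = 2 BSs, M = 2 sub-channels, 4 samples per waveform.
waveforms = [[1, 1, 1, 1], [1, -1, 1, -1]]  # mutually orthogonal toy waveforms
occupancy = [[1, 0], [0, 1]]                # I_{b,m}: BS 0 uses m=0, BS 1 uses m=1
channels = [[1.0], [0.5]]                   # single-tap g_{k,b} for one UAV k
sigma2 = 0.25                               # noise variance at UAV k

bs_signals = [transmitted_signal(i_b, waveforms) for i_b in occupancy]
clean = noiseless_received(channels, bs_signals)
noise = complex_noise(len(clean), sigma2, random.Random(0))
r_k = [c + n for c, n in zip(clean, noise)]  # Eq. (6)

p_k = mean_power(clean)  # average received signal power (normalisation is an assumption)
snr_k = p_k / sigma2     # ratio in the spirit of Eq. (7)
```

The multi-tap case works the same way: passing, for example, `[1.0, 0.5]` as a channel yields a received sequence that is one sample longer than the transmitted one.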

To train the DNN models for predicting spectrum holes using raw I/Q samples, it has been shown that the characteristics of the wireless signal can be captured by observing only a portion of the signal waveform [9, 15]. Hence, from the received baseband signal ${\bm{r}}_{k}(t)$ , we capture $J$ I/Q samples and store them locally. Therefore, the samples from baseband waveform collected at UAV $k$ are represented as $\bm{R}_{k}(t)$ given as follows:

\bm{R}_{k}(t)=\bm{\tilde{R}}_{k}(t)+\bm{\tilde{\eta}}_{k}(t),\quad\forall k\in\mathcal{K},

(8)

where $\bm{\tilde{R}}_{k}(t)$ represents the $J$ I/Q samples from the first term in Eq. (6) and the second term represents $J$ complex Gaussian noise samples.

In addition to the I/Q samples, we also need to store the true labels for channel occupancy at each UAV $k$ at time $t$ . The channel occupancy vector ${\bm{h}}_{k}(t)$ is an $M$ -dimensional vector, with each index indicating if a sub-channel $m$ is occupied or free at time $t$ and can be computed as follows:

h_{k,m}(t)=\begin{dcases}1,&\text{if }\sum_{b=1}^{B}I_{b,m}(t)\geq 1;\\ 0,&\text{otherwise}.\end{dcases}

(9)

Note that ${\bm{h}}_{k}(t)$ observed at time $t$ would be the true label corresponding to the wideband received signal $\bm{r}_{k}(t)$ . The channel occupancy would remain unchanged for the stored $J$ I/Q samples ${\bm{R}}_{k}(t)$ . We store ( ${\bm{R}}_{k}(t)$ , ${\bm{h}}_{k}(t)$ ) as an input-output pair that will be used for the training of the FL model. For the sake of simplicity of notation, we represent the input-output pair as ( ${\bm{R}}_{k}$ , ${\bm{h}}_{k}$ ). Note that for each $M$ -dimensional channel occupancy vector $\bm{h}_{k}$ , the input-output pair is treated as one data sample and the total I/Q dataset collected at UAV $k$ is denoted as follows:

\bm{D}_{k}=\bigl\{(\bm{R}_{k}^{1},\bm{h}_{k}^{1}),(\bm{R}_{k}^{2},\bm{h}_{k}^{2}),\dots,(\bm{R}_{k}^{|\bm{D}_{k}|},\bm{h}_{k}^{|\bm{D}_{k}|})\bigr\},

(10)

where $|\bm{D}_{k}|$ represents the total number of samples at UAV $k$ . These local datasets are used in FL-based training for spectrum hole detection.
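The labelling rule in Eq. (9) and the input-output pairs of Eq. (10) can be sketched as follows. The helper names are hypothetical, and keeping the first $J$ samples of the received waveform is an assumption made here for simplicity; the source only states that $J$ I/Q samples are captured.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[complex], List[int]]


def occupancy_label(indicators: Sequence[Sequence[int]]) -> List[int]:
    """Eq. (9): sub-channel m is occupied (1) if at least one BS transmits on it.

    indicators[b][m] plays the role of I_{b,m}(t).
    """
    num_channels = len(indicators[0])
    return [
        1 if sum(row[m] for row in indicators) >= 1 else 0
        for m in range(num_channels)
    ]


def make_sample(received: Sequence[complex], indicators, num_iq: int) -> Sample:
    """Build one (R_k, h_k) pair from a received waveform and the BS activity."""
    return list(received[:num_iq]), occupancy_label(indicators)


# Toy example (assumed values): B = 2 BSs, M = 3 sub-channels, J = 4 I/Q samples.
indicators = [
    [1, 0, 0],
    [1, 0, 1],
]
received = [0.3 + 0.1j, -0.2 + 0.4j, 0.1 - 0.5j, 0.0 + 0.2j, 0.6 - 0.1j]

dataset_k = [make_sample(received, indicators, num_iq=4)]  # D_k with one sample
iq, label = dataset_k[0]
```

Here sub-channel 1 is the only spectrum hole, since neither BS is active on it.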

In the FL setting, each UAV $k$ trains a local wideband spectrum sensing model whose parameters are denoted by $\bm{{\omega}_{k}}$ . Hence, the primary objective of the local model is to find a mathematical function $f$ ( $\bm{{\omega}}_{k}$ , $\bm{R}_{k}$ ), that maps input I/Q samples $\bm{R}_{k}$ to $\bm{h}_{k}$ , i.e.,

f(\bm{{\omega}}_{k},\bm{R}_{k}):\bm{R}_{k}\rightarrow\bm{h}_{k}.

(11)

To this end, using the raw I/Q samples ( $\bm{R}_{k}$ ), each UAV $k$ trains a local model that detects vacant sub-channels by minimizing the local loss function $L_{k}(\bm{{\omega}})$ , which measures the error between the true labels $\bm{h}_{k}$ and the predicted labels $\bm{\widehat{h}_{k}}$ , as defined below:

L_{k}(\bm{\omega})\triangleq\frac{1}{|\bm{D}_{k}|}\sum_{i=1}^{|\bm{D}_{k}|}l\big(f(\bm{\omega}_{k},\bm{R}_{k}^{i});\bm{h}_{k}^{i}\big),

(12)

where $l(.)$ is the loss function for computing the prediction loss in the supervised machine learning setting. Furthermore, $f(.)$ represents the predicted label for the sample ( $\bm{R}_{k}^{i}$ , $\bm{h}_{k}^{i}$ ) and $\bm{\omega}_{k}$ represents the local model parameters during training.

For each input sequence $\bm{R}_{k}^{i}$ , we intend to obtain an $M$ -dimensional binary vector $\bm{\widehat{h}_{k}}^{i}$ that represents the predicted spectrum holes. This is an instance of a multi-label classification problem, for which we employ a DNN. The DNN architecture considered is shown in Fig. 4. The model accepts raw I/Q samples as input, which are then processed by two 1D convolutional (Conv1D) layers followed by a 1D maximum pooling layer (MaxPool1D). This layer pattern is repeated twice, and a single dense layer followed by a sigmoid layer produces the output.
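To make the multi-label output stage and the local loss of Eq. (12) more concrete, the sketch below applies a sigmoid to $M$ raw scores, thresholds the result into a binary occupancy vector, and averages a per-sample loss over a small batch. Note the assumptions: this passage does not specify $l(.)$ or a decision threshold, so binary cross-entropy and a threshold of 0.5 are used here only as common choices for sigmoid outputs, and the scores stand in for the dense-layer outputs of the DNN.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_occupancy(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map M raw scores to a binary occupancy vector (1 = occupied, 0 = vacant)."""
    return [1 if sigmoid(x) >= threshold else 0 for x in scores]


def bce_loss(scores: Sequence[float], labels: Sequence[int], eps: float = 1e-12) -> float:
    """Binary cross-entropy averaged over the M sub-channels (assumed form of l)."""
    total = 0.0
    for x, h in zip(scores, labels):
        p = min(max(sigmoid(x), eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= h * math.log(p) + (1 - h) * math.log(1.0 - p)
    return total / len(labels)


def local_loss(batch_scores: Sequence[Sequence[float]], batch_labels: Sequence[Sequence[int]]) -> float:
    """Eq. (12): average of the per-sample losses over the local dataset."""
    losses = [bce_loss(s, h) for s, h in zip(batch_scores, batch_labels)]
    return sum(losses) / len(losses)


# Toy scores for M = 4 sub-channels (hypothetical values).
scores = [2.0, -1.0, 0.0, -3.0]
h_hat = predict_occupancy(scores)

good = local_loss([[5.0, -5.0]], [[1, 0]])  # confident and correct
bad = local_loss([[-5.0, 5.0]], [[1, 0]])   # confident and wrong
```

A model whose scores agree with the labels yields a smaller local loss than one whose scores contradict them, which is the signal that local training exploits.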

IV-B Channel-Aware FL Aggregation Method

Given the system model, we now introduce a framework for wideband spectrum sensing where multiple UAVs collaboratively participate in the FL. In such a distributed learning environment, we aim to learn a global statistical model at the central server. Given that each UAV $k$ trains a local model to identify the spectrum holes by minimizing the local loss function $L_{k}(\bm{\omega})$ , in the context of FL, we would like to minimize the aggregated global loss function $L(\bm{\omega})$ , as follows:

\min_{\bm{\omega}}\Bigg\{L(\bm{\omega})\triangleq\sum_{k=1}^{K}\frac{|\bm{D}_{k}|}{D}~L_{k}(\bm{\omega})\Bigg\},

(13)

where ${D}$ = $\sum_{k=1}^{{K}}|\bm{D}_{k}|$ is the total size of data samples across the UAVs.

To solve the optimization problem in Eq. (13), the authors in [38] proposed FedAvg, an iterative aggregation algorithm in which the central server aggregates the local model gradients and redistributes the global model weights to the local models. When the datasets of all UAVs are of equal size, FedAvg assigns an equal scaling factor of $\frac{1}{K}$ to all local gradients. However, in our considered multi-cell environment, the signal received at different UAV locations experiences different channel conditions, and the received signal power varies significantly across locations. Hence, when equal scaling weights are assigned to the local model gradients, the performance metrics at UAV locations with strong signal deteriorate. To compensate for this effect and improve performance at locations that receive better signal power, we propose a proportional weight scaling aggregation method for FL (pwFedAvg) that assigns smaller weights to UAVs with lower received signal power (i.e., poor channel conditions) and larger weights to UAVs with higher received signal power.

Algorithm 1 Channel-Aware FL-Based Training

1: Initialize the global model parameters $\bm{\omega}$ and the local model parameters $\bm{\omega}_{k}$ $\forall k\in\mathcal{K}$; $T$: communication rounds.

2: for each communication round $t$ of the $T$ rounds

3:   for each UAV $k\in\mathcal{K}$

4:     Choose a batch of I/Q samples $\bm{\xi}_{k}^{t}\subseteq\bm{D}_{k}$

5:     Train the local model for $E$ epochs.

6:   end for

7:   Send the gradients $\nabla L_{k}(\bm{\omega}_{k}^{t};\bm{\xi}_{k}^{t})$ to the central server.

8:   Aggregate local gradients at the server per Eq. (14).

9:   Update the global model at the server per Eq. (15).

10:   Update local models using the global model, i.e., $\bm{\omega}_{k}^{t+1}\leftarrow\bm{\omega}^{t+1}$

11: end for

In particular, using pwFedAvg, the central server aggregates the local gradients by assigning each UAV a weight proportional to the square root of its average received signal power, as follows:

\nabla L(\bm{\omega}^{t})=\sum_{k=1}^{K}\frac{\alpha_{k}^{t}}{\alpha^{t}}~\nabla L_{k}(\bm{\omega}_{k}^{t};\bm{\xi}_{k}^{t}),

(14)

where $\alpha_{k}^{t}=\sqrt{\bar{P}_{k}^{t}}$ and $\alpha^{t}=\sum_{k=1}^{K}\sqrt{\bar{P}_{k}^{t}}$. Here, $\bar{P}_{k}^{t}$ represents the average received signal power at UAV $k$ for the batch of samples $\bm{\xi}_{k}^{t}$. Note that during communication round $t$ of the FL training process, $\bm{\omega}_{k}^{t}$ and $\bm{\omega}^{t}$ denote the local and global model weights, respectively. Upon computing the global model gradient based on Eq. (14), the global model weights are updated as follows:

\bm{\omega}^{t+1}=\bm{\omega}^{t}-\gamma^{t}~{}{\nabla L(\bm{\omega}^{t})},

(15)

where $\gamma^{t}$ is the learning rate of the global model. The updated global model weights are sent to the clients to update their local model weights. The overall process of FL-based model training using the pwFedAvg approach is outlined in Algorithm 1.
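The aggregation and update steps in Eqs. (14) and (15) can be sketched in a few lines. The snippet below is only an illustration: model parameters and gradients are flat lists of floats, the received powers are assumed to be in linear units (not dB), and the function names and toy values are hypothetical.

```python
import math

def pw_weights(avg_rx_power):
    """Aggregation weights alpha_k / alpha with alpha_k = sqrt(P_k), as in Eq. (14)."""
    alphas = [math.sqrt(p) for p in avg_rx_power]
    total = sum(alphas)
    return [a / total for a in alphas]

def aggregate_gradients(local_grads, weights):
    """Weighted sum of the local gradients (each a flat list of floats)."""
    dim = len(local_grads[0])
    return [sum(w * g[i] for w, g in zip(weights, local_grads)) for i in range(dim)]

def global_update(omega, grad, lr):
    """Gradient step on the global model, as in Eq. (15)."""
    return [w - lr * g for w, g in zip(omega, grad)]

# Hypothetical linear-scale average received powers for K = 3 UAVs.
powers = [4.0, 1.0, 0.25]
weights = pw_weights(powers)                  # sqrt powers 2, 1, 0.5 -> 4/7, 2/7, 1/7
grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy local gradients
agg = aggregate_gradients(grads, weights)
omega_next = global_update([0.5, 0.5], agg, lr=0.1)
print(weights, agg, omega_next)
```

When all UAVs report the same average power, the weights reduce to $\frac{1}{K}$, which matches the equal scaling of FedAvg for equally sized datasets.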

IV-C Convergence Analysis

In this section, we present the convergence analysis for the pwFedAvg algorithm. To this end, we first introduce the following assumptions.

Assumption 1: The loss function $L_{k}(.)$ is $L$ -smooth, i.e., for all $\bm{u}$ and $\bm{v}$ ,

L_{k}(\bm{u})-L_{k}(\bm{v})~\leq~(\bm{u}-\bm{v})^{T}~\nabla L_{k}(\bm{v})+\frac{L}{2}~||\bm{u}-\bm{v}||_{2}^{2}.

(16)

Assumption 2: For each $k$ , $L_{k}(.)$ is $\beta$ -strongly convex, i.e., for all $\bm{u}$ and $\bm{v}$ ,

L_{k}(\bm{u})-L_{k}(\bm{v})~\geq~(\bm{u}-\bm{v})^{T}~\nabla L_{k}(\bm{v})+\frac{\beta}{2}~||\bm{u}-\bm{v}||_{2}^{2}.

(17)

Assumption 3: Let $\bm{\xi}_{k}$ be the data samples chosen from $\bm{D}_{k}$. The variance of the stochastic gradients for each UAV $k$ is bounded, i.e.,

\mathbb{E}\big[||\nabla L_{k}(\bm{\omega}_{k};\bm{\xi}_{k})-\nabla L_{k}(\bm{\omega}_{k})||_{2}^{2}\big]~\leq\rho_{k}^{2}~~\forall~k\in\mathcal{K}.

(18)

Next, similar to [39, 40], we define two virtual sequences to denote the aggregated full gradient and stochastic gradient respectively, as follows:

\bar{\bm{a}}^{t}=\sum_{k=1}^{K}\frac{\alpha_{k}^{t}}{\alpha^{t}}~\nabla L_{k}(\bm{\omega}_{k}^{t});~~\bm{a}^{t}=\sum_{k=1}^{K}\frac{\alpha_{k}^{t}}{\alpha^{t}}~\nabla L_{k}(\bm{\omega}_{k}^{t};\bm{\xi}_{k}^{t}).

(19)

We also assume that $\mathbb{E}[\bm{a}^{t}]=\bar{\bm{a}}^{t}$ . Given these assumptions, we have the following lemmas.

Lemma 1: Let $\bm{\omega}^{*}$ = [ $\omega_{1}^{*}$ , $\omega_{2}^{*}$ , …, $\omega_{d}^{*}$ ] be the weights of the optimal global model, and $\bm{\omega}_{k}^{*}$ = [ $\omega_{k,1}^{*}$ , $\omega_{k,2}^{*}$ , …, $\omega_{k,d}^{*}$ ] be the weights of the optimal local model of UAV $k$. Here, $d$ represents the dimension of the model weights. Then, for each UAV $k$, the gap between the optimal global and local models is upper bounded as

L_{k}(\bm{\omega}^{*})-L_{k}(\bm{\omega}_{k}^{*})\leq\tau,

(20)

where $\tau=\max\limits_{k}\{\frac{Ld}{2}(\max\limits_{i}\{|\omega_{i}^{*}-\omega_{k,i}^{*}|\})^{2}\}$.

Lemma 2: The deviation of the aggregated stochastic gradient from the aggregated full gradient is upper bounded as follows:

\mathbb{E}\big(||\bm{a}^{t}-\bar{\bm{a}}^{t}||_{2}^{2}\big)\leq\sum_{k=1}^{K}\bigg(\frac{\alpha_{k}^{t}}{\alpha^{t}}\bigg)^{2}\rho_{k}^{2}.

(21)

Lemma 3: Let the constants $\kappa$ and $\gamma^{t}$ satisfy $\frac{1}{\kappa}\leq\gamma^{t}$ . Then, we can show that:

\mathbb{E}\big[||\bm{\omega}^{t+1}-\bm{\omega}^{*}||_{2}^{2}\big]\leq(1-\beta\gamma^{t})||\bm{\omega}^{t}-\bm{\omega}^{*}||_{2}^{2}+(\gamma^{t})^{2}G^{t},

(22)

where $G^{t}=2\kappa\tau+\sum_{k=1}^{K}\bigg(\frac{\alpha_{k}^{t}}{\alpha^{t}}\bigg)^{2}\big[\rho_{k}^{2}-2L\tau\big]$.

Theorem 1: Given that $\frac{1}{\kappa}\leq\gamma^{t}=\frac{1}{\beta t+L}$, as required by Lemma $3$, the optimality gap for the proposed pwFedAvg satisfies the following:

\mathbb{E}[L(\bm{\omega}^{T})]-L^{*}\leq\frac{L}{\beta T+2L}\bigg[\frac{2G}{\beta}+L||\bm{\omega}^{0}-\bm{\omega}^{*}||_{2}^{2}\bigg],

(23)

where $G=\max_{t}\{G^{t}\}$ and $G^{t}$ is as defined in Lemma $3$. Therefore, the convergence rate of our proposed method is $\mathcal{O}(\frac{1}{T})$. All of the proofs are presented in Section IX.
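To make the $\mathcal{O}(\frac{1}{T})$ behavior concrete, the sketch below evaluates the right-hand side of Eq. (23) for a few values of $T$. The constants are hypothetical and chosen only for illustration; they are not taken from the experiments.

```python
def optimality_gap_bound(T, L, beta, G, init_dist_sq):
    """Right-hand side of Eq. (23) for given constants.

    init_dist_sq stands for ||omega^0 - omega^*||_2^2.
    """
    return L / (beta * T + 2.0 * L) * (2.0 * G / beta + L * init_dist_sq)

# Hypothetical constants chosen only to illustrate the 1/T decay.
L_smooth, beta, G, d0 = 10.0, 1.0, 5.0, 4.0
for T in (10, 100, 1000, 10000):
    print(T, optimality_gap_bound(T, L_smooth, beta, G, d0))
```

For large $T$, the product of $T$ and the bound approaches the constant $\frac{L}{\beta}\big[\frac{2G}{\beta}+L||\bm{\omega}^{0}-\bm{\omega}^{*}||_{2}^{2}\big]$, which is what the $\mathcal{O}(\frac{1}{T})$ statement expresses.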

V Dynamic Spectrum Scheduling Using RL

Once the DNN models are trained using our proposed pwFedAvg, they output their spectrum hole predictions, which are then fused at the fusion module, as described in Section III. The identified spectrum holes will be allocated to requesting UAVs. The integrated system model of collaborative spectrum sensing and scheduling is shown in Fig. 5, with the overall algorithm described in Algorithm 2.

For spectrum scheduling, we note that the optimization problem in Eq. (4) is a fractional integer programming problem, which is NP-hard in general. If we consider maximizing the numerator alone, which is the total utility $U(t)$ of the UAVs over all sub-channels, the problem becomes an integer programming problem. In this case, the utility depends on the spectrum usage pattern of the PUs, which is captured by $c_{k,m}(t)$, as well as on the channel conditions between the BSs and UAVs, which determine the amounts of transmitted data $U_{k,m}(t)$. To tackle this utility optimization problem, we model the channel occupancy $\bar{z}_{m}(t)$ as a Markov process, which enables us to use an MDP formulation [41] and to develop a dynamic spectrum scheduling policy for the SUs.

We assume that there are $M$ sub-channels in the system, each of which can be modeled as an independent two-state Markov chain. The transition probability function $\bm{P}$ can then be viewed as a set of transition probability matrices { $\bm{P}_{m}$ }, one per sub-channel, that captures the randomness in the assumed multi-user multi-channel environment. Therefore, we formulate the total utility of the SUs as a traditional MDP governed by the tuple ( $\mathcal{S}$ , $\mathcal{A}$ , { $\bm{P}_{m}$ }, $U$ , $\gamma$ ), consisting of the set of states $\mathcal{S}$, the set of actions $\mathcal{A}$, a transition probability function { $\bm{P}_{m}$ }, a reward function $U$, and a discount factor $\gamma$. To solve an MDP using RL, an agent learns to make decisions in an uncertain environment by maximizing a cumulative reward over a sequence of actions. Specifically, the agent interacts with the environment by taking actions that transition the system from one state to another, and it receives a reward that is commensurate with the merit of each action. The discount factor determines the relative importance of immediate and future rewards.

Algorithm 2 Collaborative Spectrum Sensing and Scheduling

2: Phase 1 – Spectrum Sensing and Broadcasting

4: for each UAV in $\mathcal{K}$

5:   Capture I/Q samples from the over-the-air signal.

6:   Feed the I/Q samples to the pre-trained ML model that predicts the spectrum holes $\bm{\widehat{h}}$

7:   Broadcast the individual spectrum hole observations $\bm{\widehat{h}}(t)\in\{0,1\}^{1\times M}$ to the central server.

8: end for

10: Phase 2 – Spectrum Fusion and Scheduling

12: Apply the fusion rule in Eq. (1) to predict the spectrum holes $\bm{z}(t)$

13: Allocate a single spectrum hole to each requesting UAV using the pre-trained RL algorithm, $y_{k,m}(t)$, such that the constraints in Eq. (4) are satisfied.

14: UAVs are scheduled to transmit on the sub-channel allocated in the previous time slot.

15: Given the spectrum allocation $y_{k,m}(t)$ and the spectrum access collision indicator $c_{k,m}(t)$, the total utility $U(t)$ can be computed using Eq. (4).

DDQN-Based Spectrum Allocation. One of the most popular RL methods is Q-learning [41]. Classical Q-learning is table-based, i.e., the values of the Q-function are stored in a table of size $|\mathcal{S}|\times|\mathcal{A}|$. However, when the state and action spaces are large, tabular Q-learning becomes impractical. For example, with $M=16$ sub-channels, the Q-table will be of size $65,536\times 17$, corresponding to $2^{16}$ binary occupancy states and $M+1$ actions. To address this complexity, we adapt the deep Q-learning approach in [42], which approximates the Q-function by a neural network $Q_{\bm{\theta}}$ called DDQN and trains its weights $\bm{\theta}$ using experience replay. As the name suggests, DDQN uses two networks: $Q_{\bm{\theta}}$ is called the primary network and $Q^{\prime}_{\bm{\theta}}$ is called the target network, whose weights are updated periodically. In the original DDQN, the weights of the target network are directly copied from the primary network every few episodes. In DDQN-soft, the target network is updated using Polyak averaging to smoothly update the weights (“soft-update”) [42].
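The difference between the two target-network update rules can be sketched as follows. Network weights are represented as flat lists of floats, and the mixing coefficient `polyak` and the function names are hypothetical; the text does not state the value used in the experiments.

```python
def hard_update(primary, target):
    """Original DDQN: copy the primary weights into the target network."""
    return list(primary)

def soft_update(primary, target, polyak=0.01):
    """Polyak averaging: target <- polyak * primary + (1 - polyak) * target.

    polyak is an assumed mixing coefficient, not a value from the paper.
    """
    return [polyak * p + (1.0 - polyak) * t for p, t in zip(primary, target)]

# Toy weights (hypothetical values).
primary_weights = [1.0, 2.0]
target_weights = [0.0, 0.0]

print(hard_update(primary_weights, target_weights))
print(soft_update(primary_weights, target_weights, polyak=0.5))

# Repeated soft updates move the target gradually toward the primary network.
for _ in range(2000):
    target_weights = soft_update(primary_weights, target_weights)
print(target_weights)
```

With a small mixing coefficient, the target network tracks the primary network slowly, which is the smoothing effect that the soft update is meant to provide.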

The input to the DDQN agent is a state $\textbf{s}$ of size $1\times M$. The output of the network is a vector of size $1\times(M+1)$ that contains the values of the Q-function with respect to state $\textbf{s}$ and each of the $M+1$ actions. In all hidden layers, we use the rectified linear unit (ReLU) as the activation function. Given the neural network's input-output dimensions, the overall DDQN architecture and its interaction with the environment are shown in Fig. 6. As shown, the major components are the primary network, the target network, experience replay, and the interaction with the environment to select an action.

To train the DDQN agent, experiences are first collected and stored in the memory using an $\epsilon$-greedy policy; that is, for a state $s_{t}$, an action $a_{t}$ is chosen randomly with probability $\epsilon_{t}$, or greedily with probability $1-\epsilon_{t}$ according to the current DDQN network. Then, once the memory holds sufficient samples, a mini-batch $\bm{X}$ of experiences $\{(\textbf{s}_{i},\textbf{a}_{i},r_{i},\textbf{s}_{i}^{\prime})\}_{i}\in\bm{X}_{t}$ is randomly sampled from the memory at every time step $t$ to train the neural networks. Here, $\bm{X}_{t}$ is the set of experiences currently available in the memory. Based on the selected mini-batch, we update the weights $\bm{\theta}$ of the primary network $Q_{\bm{\theta}}$ to minimize the loss function $L_{t}(\bm{\theta})$. Fig. 6 captures the overall DDQN architecture and the interaction of the agent with the environment [41, 42].
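The two training ingredients described above, $\epsilon$-greedy action selection and uniform sampling from an experience memory, can be sketched as follows. The class and function names, the memory capacity, the batch size, and the Q-values are hypothetical placeholders, and the Q-network itself is replaced by a fixed list of values.

```python
import random
from collections import deque

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the argmax action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

class ReplayMemory:
    """Fixed-capacity buffer of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        """Uniformly sample a mini-batch without replacement."""
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

rng = random.Random(0)
memory = ReplayMemory(capacity=1000)
state = (0, 1, 0, 1)                   # toy occupancy state with M = 4
q_values = [0.1, 0.7, 0.2, 0.0, 0.3]   # placeholder Q-values for M + 1 actions
for _ in range(64):
    action = epsilon_greedy(q_values, epsilon=0.1, rng=rng)
    memory.store(state, action, 0.0, state)
batch = memory.sample(32, rng=rng)
print(len(memory), len(batch))
```

In a full agent, the placeholder `q_values` would come from the primary network evaluated at the current state, and each sampled mini-batch would drive one gradient step on the loss.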

VI I/Q Dataset Generation

Utilizing data-driven machine learning techniques for wideband spectrum sensing requires substantial amounts of spectrum data. While obtaining raw I/Q signals over the air using physical hardware is the ideal scenario, coordinating multiple UAVs in a specific environment for collaborative sensing is complex and poses significant challenges. Therefore, we use MATLAB’s LTE Toolbox to create the I/Q samples and employ ray-tracing methods to emulate the channel, generating synthetic datasets that closely mimic experimental data collection. The entire process of generating synthetic datasets is outlined below.

Dataset Generation Methodology. As shown in Fig. 7, we assume a multi-cell environment consisting of three neighboring cells with base-stations at the center of the cells. Without loss of generality, we simulate one specific LTE band in the Kansas City area and obtain the locations of the base-stations from CellMapper [43], an open crowd-sourced cellular tower and coverage mapping service. Furthermore, we assume there are three UAVs in the network operating at an altitude of $90$ meters. In this scenario, the base-stations act as the transmitter sites and the UAV locations act as the receiver sites that collect the I/Q samples for wideband spectrum sensing.

Another important aspect of any wireless network is wireless channel modeling. We use ray-tracing methods to model the channel between the BS and the UAV, and we incorporate both reflection and diffraction settings in ray-tracing to simulate a near real-world environment. This is in contrast to statistical channel models, which describe line-of-sight (LoS) and non-LoS channel conditions probabilistically. Since we use ray-tracing, we have the flexibility to incorporate different aspects of the environment, such as buildings and vegetation and the permittivity and permeability of the materials, which further enhances the channel model.

To mimic a real-world scenario in the ray-tracing experiments, we use OpenStreetMap, which is a free and open geographical database [44]. The evaluation area is a $3$ km $\times$ $3$ km area with buildings and vegetation. We utilize MATLAB’s ray-tracer to emulate the wireless channel between the considered UAV and base-station locations. The ray-tracing simulation setup is outlined in Table I.

Parameter	Value
Location	Kansas City
Area	$3$ km $\times$ $3$ km
Frequency	$1980$ MHz
Number of base-stations	$3$
Number of UAVs	$3$
UAV Altitude	$90$ m
Max. Number of Reflections	$5$
Max. Number of Diffractions	$2$

TABLE I: Ray-tracing simulation setup.

This setup can be readily adapted to accommodate varying numbers of LTE cells and UAVs, as long as the 3D environment can be obtained and loaded into MATLAB. In our considered scenario, we assume the UAVs are stationary and hover at fixed positions. However, the simulation can be extended to incorporate UAV flight trajectories by running additional ray-tracing experiments for each waypoint location in the UAV trajectory.

MATLAB’s ray-tracing functionality emulates the channel. Next, we utilize MATLAB’s LTE Toolbox to generate the LTE waveform from which the I/Q samples are extracted. For generating the LTE waveform, we assume that the entire cell bandwidth of $10$ MHz ( $50$ resource blocks) is split into $16$ orthogonal sub-channels, each of size $3$ resource blocks. Typically, a base-station has the flexibility to assign either a single sub-channel or multiple sub-channels to a PU for transmitting user-specific data on the downlink shared channel. Additionally, various multiple access techniques can be employed to transmit data to different PUs in different time slots. However, during our dataset generation process, we do not consider primary user locations or how the base-station allocates user-specific data to different PUs. At any given point in time, we take a snapshot of the entire cell bandwidth and identify which spectrum bands are occupied. Furthermore, when creating the downlink waveform, we omit the generation of UE-specific reference signals to avoid mixing user-specific data with broadcast channels. Instead, we identify the appropriate indices and embed the LTE data samples into the downlink shared channel to generate the LTE waveform.

Modeling the channel occupancy. In our assumed scenario, each cell bandwidth is divided into $16$ sub-channels such that a binary flag of $1$ indicates that the sub-channel is allocated and $0$ indicates that it is not allocated. Hence, each $16$-bit binary combination serves as a distinct true label for the channel occupancy. As a result, the base-station can generate $2^{16}$ unique labels, spanning from no sub-channel allocation to a fully busy cell site. For instance, in Fig. 9 we show the spectrogram of one channel realization, where $6$ out of $16$ sub-channels are occupied. Furthermore, we model the temporal dynamics of each sub-channel using a binary Markov chain, as shown in Fig. 8. Thus, the channel occupancy for each sub-channel $m$ evolves according to a transition probability matrix $\bm{P}_{m}$. In this paper, we consider different transition probabilities for each sub-channel. Thus, the overall transition probabilities across the $M$ sub-channels are denoted as follows:

\bm{P}=\bigg\{\begin{bmatrix}p_{00}^{1}&p_{01}^{1}\\ p_{10}^{1}&p_{11}^{1}\end{bmatrix},\cdots,\begin{bmatrix}p_{00}^{M}&p_{01}^{M}\\ p_{10}^{M}&p_{11}^{M}\end{bmatrix}\bigg\}.

(24)
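The per-sub-channel Markov model in Eq. (24) can be simulated with a short script to produce a sequence of occupancy labels. The transition matrices, the initial state, and the seed below are hypothetical; the values used for the dataset are not given in the text.

```python
import random

def step_occupancy(state, transition_matrices, rng):
    """Advance each sub-channel one step of its own two-state Markov chain.

    state[m] is 0 or 1; transition_matrices[m][i][j] is the probability of
    moving from state i to state j on sub-channel m, as in Eq. (24).
    """
    next_state = []
    for s, P in zip(state, transition_matrices):
        next_state.append(1 if rng.random() < P[s][1] else 0)
    return next_state

def simulate(initial_state, transition_matrices, num_steps, seed=0):
    """Return the list of occupancy vectors, including the initial one."""
    rng = random.Random(seed)
    trajectory = [list(initial_state)]
    for _ in range(num_steps):
        trajectory.append(step_occupancy(trajectory[-1], transition_matrices, rng))
    return trajectory

# Hypothetical matrices for M = 3 sub-channels (each row sums to 1).
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[1.0, 0.0], [0.0, 1.0]],   # deterministic: the state never changes
]
trajectory = simulate([0, 1, 1], P, num_steps=5)
for labels in trajectory:
    print(labels)
```

Each printed row is one occupancy label vector; with $M=16$ the same procedure yields the $16$-bit labels described above.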

Further, we assume that all SUs are capable of receiving the waveform from all the base-stations, with the channels modeled by ray-tracing. In addition to the reflected paths received from the base-station of the cell in which the UAV is located, the UAV also receives the waveform from the neighboring base-stations, as shown in Fig. 7. The received signal $\bm{r}_{k}(t)$ at each UAV $k$ can be written as a superposition of the wideband signals received from all base-stations, as shown in Eq. (6). Furthermore, we vary the noise variance $\sigma_{k}^{2}(t)$ at UAV $k$ such that the effective SNR varies from $-10~\text{dB}$ to $20~\text{dB}$ in steps of $10~\text{dB}$. For instance, in Fig. 10 we show the power spectrum of the received signal at UAV location $1$. Ideally, we would like to capture the whole LTE frame corresponding to the $10$ MHz LTE waveform. However, we capture only $32$ I/Q samples, which provides a good trade-off between computational complexity and performance. In this context, for each SNR, we collect approximately $6.8$ million I/Q samples. When considering all SNR levels and UAV locations together, the total generated dataset contains more than $80$ million I/Q samples, which will be publicly released along with all source code. Generating large-scale spectrum datasets for dynamic UAV environments enables us to evaluate the proposed data-driven collaborative wideband spectrum sensing and sharing in Section VII.
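One way to realize the SNR sweep is to fix the received signal power and solve for the noise variance at each target SNR. The sketch below assumes that the effective SNR is the ratio of the average received signal power to the noise variance and uses a hypothetical unit signal power; the exact definition used for the dataset is not spelled out in the text.

```python
def noise_variance_for_snr(signal_power, snr_db):
    """Noise variance giving the requested SNR: sigma^2 = P / 10^(SNR_dB / 10)."""
    return signal_power / (10.0 ** (snr_db / 10.0))

snr_levels_db = list(range(-10, 21, 10))   # -10, 0, 10, 20 dB
signal_power = 1.0                         # hypothetical unit received power
for snr_db in snr_levels_db:
    print(snr_db, noise_variance_for_snr(signal_power, snr_db))
```

The sweep covers four SNR levels, so the noise variance spans three orders of magnitude for a fixed received power.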

VII Numerical Results

In this section, we first present our target performance metrics, followed by a discussion of the results on spectrum sensing for different ML configurations. Next, we present the results of collaborative spectrum inference and fusion followed by spectrum access using RL.

Performance metrics. As mentioned earlier, detecting spectrum holes aligns with the framework of a classical multi-label classification problem, where each sub-channel represents a label. We utilize Precision, Recall, and F1-score as metrics to evaluate the classifier’s performance for each sub-channel by constructing a confusion matrix. Although we can calculate these performance metrics for each sub-channel individually, it would be advantageous to have an average performance assessment across all $16$ sub-channels [45]. In this paper, we consider the micro-averages for Precision, Recall and F1-score to concretely capture the sensing performance across the $16$ sub-channels as follows:

\text{Precision}=\frac{\sum_{m=1}^{M}\text{TP}(m)}{\sum_{m=1}^{M}\bigl(\text{TP}(m)+\text{FP}(m)\bigr)},

(25)

\text{Recall}=\frac{\sum_{m=1}^{M}\text{TP}(m)}{\sum_{m=1}^{M}\bigl(\text{TP}(m)+\text{FN}(m)\bigr)},

(26)

\text{F1-score}=\frac{2\,(\text{Precision}\cdot\text{Recall})}{\text{Precision}+\text{Recall}},

(27)

where TP($m$), FN($m$), and FP($m$) denote the numbers of true positives, false negatives, and false positives for sub-channel $m$, respectively.
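The micro-averaged metrics in Eqs. (25)-(27) pool the counts over all sub-channels before forming the ratios. A minimal sketch is given below; it treats label value $1$ as the positive class, which is an assumption, and the toy labels are hypothetical.

```python
def micro_metrics(true_labels, pred_labels):
    """Micro-averaged precision, recall, and F1 over all samples and labels."""
    tp = fp = fn = 0
    for y_true, y_pred in zip(true_labels, pred_labels):
        for t, p in zip(y_true, y_pred):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Two samples with M = 4 sub-channels: 3 true positives, 1 false positive,
# and 1 false negative in total.
y_true = [[1, 0, 1, 0], [0, 1, 1, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 1, 1]]
print(micro_metrics(y_true, y_pred))
```

Because the counts are summed before dividing, sub-channels with more positive samples contribute more to the micro-average than rarely occupied ones.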

VII-A Model Training with Distributed UAVs

As previously stated, we model wideband spectrum sensing, aiming to identify spectrum holes from the given I/Q samples as inputs to the ML model. In this context, we explore three configurations: centralized learning (CL), local learning (LL), and federated learning (FL), for training and testing the wideband spectrum sensing model. In each of these configurations, we use $70$ % of the dataset to train the model and $30$ % for spectrum inference purposes. Next, we present the results obtained for each configuration.

Centralized Learning (CL) is a technique in which all the data collected at different locations are assumed to be aggregated at one central server and readily available to train the ML model. The trained CL model is loaded on the UAVs for testing purposes, and the performance metrics computed at different UAV locations are shown in Fig. 11. As shown in Figs. 11b and 11c, the performance metrics computed at UAV locations $2$ and $3$ improve as the SNR increases and attain near-optimal performance around an SNR of $20$ dB. However, as can be seen in Fig. 11a, at UAV location $1$, the metrics saturate at $96$ %. Since CL accumulates all the datasets, the distributions of datasets at different locations are incorporated into the CL model, and thus it generalizes well to different locations, as shown in Fig. 11. However, the caveat of CL is the need to aggregate all datasets in one location to train a single model. An alternative is local learning.

Local Learning (LL) is an ML technique in which each UAV trains a model with its own local data, without sharing the dataset or model parameters with a central server or other UAVs. Hence, LL characterizes the performance of the model at a particular location. As shown in Fig. 12, the performance metrics improve as the SNR increases. Although LL is a natural solution that provides insights into the performance of local models trained on the local datasets, LL models trained at one particular location do not generalize to other locations. For example, in Fig. 13, when the local model trained at UAV location $1$ is tested at locations $2$ and $3$, the performance metrics are significantly lower than the metrics of the models trained at those locations, as shown in Fig. 12. This is one of the key observations that led us to explore federated learning, which combines the advantages of both LL and CL to obtain a more generalized global model without the need to accumulate all the datasets in one central location.

Federated Learning achieves a trade-off between LL and CL: it does not require aggregating the datasets in a central location; instead, the local model gradients are transferred to the central server for aggregation, and in return, the local models receive the aggregated global weights, as described in Algorithm 1. As such, the training process is similar to LL except that the local model weights are updated with the computed global weights iteratively, and by the end of the training process, all of the UAVs have the same global model. To investigate FL performance, we implement the FedAvg algorithm [38], and the results are presented in Fig. 14. From the results, we note that FedAvg achieves good performance only for UAV locations $2$ and $3$. Given the heterogeneous datasets collected at different UAV locations, the overall performance of FedAvg is limited by the UAV(s) that perform the worst. This is because FedAvg scales the weights of all local models equally. To reduce the impact of UAV locations with poor performance, our proposed pwFedAvg algorithm scales the weights of local models according to the received signal power. As shown in Fig. 15, our proportional weighting scheme improves the performance at locations $2$ and $3$. Furthermore, for a fair comparison, we plot the F1-score for CL, FL-FedAvg, and FL-pwFedAvg in Fig. 16. With our proposed aggregation scheme (pwFedAvg), we improve the performance metrics at UAV locations $2$ and $3$ without significantly affecting the performance at location $1$.

VII-B Collaborative Spectrum Inference results

As shown in Fig. 5, we consider fusing the spectrum hole predictions from multiple UAVs. This is motivated by the fact that individual sensing performance might fluctuate across locations, as we observed in the CL, LL, and FL settings. By applying fusion rules, we can significantly improve the overall performance, as shown by our results in Fig. 17. From the results, we notice that the overall performance of all methods is significantly improved by fusion. Furthermore, the proposed pwFedAvg algorithm outperforms FedAvg, while achieving results comparable to the CL method without the need to transfer all datasets to a central location. The spectrum fusion results at locations $2$ and $3$ are omitted for brevity, as they show similar trends.

VII-C Spectrum Resource Allocation using RL

As mentioned in Section V, we use deep Q-learning methods to allocate spectrum resources to the UAVs. In Fig. 18a, we compare the training performance of three variants of Q-learning methods for allocating a sub-channel to a single UAV whenever the fusion rule detects at least one spectrum hole. We observe that DDQN with soft update performs slightly better and converges earlier than DDQN and vanilla DQN. Next, we extend the model to allocate spectrum holes to two UAVs. In this case, we augment the DDQN algorithm with soft update to generate the two best actions. From the results in Fig. 18b, we observe that the utility with two SUs is slightly less than twice the utility with a single SU. We further note that this paper explores the possibility of integrating spectrum sensing and sharing by making use of existing RL algorithms. Although we explored Q-learning techniques, other, more advanced RL algorithms can be integrated into the proposed framework.

VIII Conclusion

In this paper, we developed a collaborative wideband spectrum sensing and sharing solution for networked UAVs. To train machine learning models for detecting spectrum holes, we explored the applications of FL and developed an architecture that integrates wireless dataset generation into the FL model training and aggregation steps. To this end, we proposed the pwFedAvg algorithm to incorporate wireless channel conditions and received signal powers into the FL aggregation algorithm. To further enhance the accuracy of the spectrum holes predicted by individual UAVs, we considered spectrum fusion at the central server. Additionally, by leveraging deep Q-learning methods, the detected spectrum holes are dynamically allocated to the requesting UAVs. To evaluate the proposed methods, we generated a near-realistic synthetic dataset using MATLAB’s LTE Toolbox by incorporating base-station locations in a chosen area of interest, performing ray-tracing, and emulating the primary users’ channel usage in terms of I/Q samples. Based on the collected I/Q datasets, we investigated the performance of three model training configurations, namely CL, LL, and FL. The numerical results demonstrated that the CL model generalizes well and performs better for all UAV locations, while the LL models showed poor generalization performance. Furthermore, the proposed pwFedAvg algorithm outperforms FedAvg while achieving results comparable to the CL method without the need to share all datasets with a central location. From the fusion results, we noticed that the overall performance improved significantly for all learning configurations, and the implemented DDQN method can provide dynamic spectrum scheduling across requesting UAVs. In future work, we plan to expand the application of our developed solutions to other technologies and spectrum bands (beyond LTE), while incorporating realistic spectrum usage of the incumbent users (i.e., PUs) in those bands.

IX Appendix

Proof of Lemma $1$. Since $\nabla L_{k}(\bm{\omega}_{k}^{*})=0$, Assumption $1$ with $\bm{u}=\bm{\omega}^{*}$ and $\bm{v}=\bm{\omega}_{k}^{*}$ reduces to $L_{k}(\bm{\omega}^{*})-L_{k}(\bm{\omega}_{k}^{*})\leq\frac{L}{2}||\bm{\omega}^{*}-\bm{\omega}_{k}^{*}||_{2}^{2}$. Using the relation between the Euclidean norm and the max-norm of a $d$-dimensional vector $\mathbf{x}$, $||\mathbf{x}||_{2}^{2}\leq d||\mathbf{x}||_{\infty}^{2}=d(\max\limits_{i}|x_{i}|)^{2}$, we have:

\frac{L}{2}||\bm{\omega}^{*}-\bm{\omega}_{k}^{*}||_{2}^{2}\leq\max\limits_{k}\{\frac{Ld}{2}(\max\limits_{i}\{|\omega_{i}^{*}-\omega_{k,i}^{*}|\})^{2}\},

which completes the proof.
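The norm inequality used in the proof of Lemma 1 can be checked numerically. The sketch below computes both sides for hypothetical weight vectors and a hypothetical smoothness constant, and takes $\tau$ as the maximum of the right-hand side over the UAVs; none of the numbers come from the trained models.

```python
def lemma1_terms(omega_global, omega_local, smoothness):
    """Return (L/2)*||diff||_2^2 and (L*d/2)*||diff||_inf^2 for one UAV."""
    diff = [g - l for g, l in zip(omega_global, omega_local)]
    d = len(diff)
    l2_sq = sum(x * x for x in diff)
    linf = max(abs(x) for x in diff)
    return 0.5 * smoothness * l2_sq, 0.5 * smoothness * d * linf ** 2

# Hypothetical optimal global/local weights (d = 3) and smoothness constant L.
omega_star = [0.2, -0.4, 1.0]
local_optima = [[0.1, -0.1, 0.9], [0.5, -0.4, 1.3]]
smoothness = 2.0

pairs = [lemma1_terms(omega_star, w, smoothness) for w in local_optima]
tau = max(upper for _, upper in pairs)
for lhs, _ in pairs:
    assert lhs <= tau   # the smoothness bound never exceeds tau
print(pairs, tau)
```

The bound is tight only when all coordinates of the difference have the same magnitude; otherwise $\tau$ overestimates the gap by up to a factor of $d$.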

Proof of Lemma $2$ . Using the definition of virtual sequences from Eq. (19), we have:
$\mathbb{E}\big(||\bm{a}^{t}-\bar{\bm{a}}^{t}||_{2}^{2}\big)$

\begin{split}&\overset{\mathrm{(a)}}{=}\mathbb{E}\bigg[\sum_{k=1}^{K}\bigg|\bigg|\bigg(\frac{\alpha_{k}^{t}}{\alpha^{t}}\bigg)\big[\nabla L_{k}(\bm{\omega}_{k}^{t};\bm{\xi}_{k}^{t})-\nabla L_{k}(\bm{\omega}_{k}^{t})\big]\bigg|\bigg|_{2}^{2}\bigg]\\ &\overset{\mathrm{(b)}}{\leq}\sum_{k=1}^{K}\bigg(\frac{\alpha_{k}^{t}}{\alpha^{t}}\bigg)^{2}\mathbb{E}\big[||\nabla L_{k}(\bm{\omega}_{k}^{t};\bm{\xi}_{k}^{t})-\nabla L_{k}(\bm{\omega}_{k}^{t})||_{2}^{2}\big]\\ &\overset{\mathrm{(c)}}{\leq}\sum_{k=1}^{K}\bigg(\frac{\alpha_{k}^{t}}{\alpha^{t}}\bigg)^{2}\rho_{k}^{2},\end{split}

(28)

where (a) is from Eq. (19), (b) comes from Jensen’s inequality, and (c) is by applying Assumption $3$ .

Proof of Lemma $3$ . Using Eq. (14) and Eq. (15), we have the following equation:
$||\bm{\omega}^{t+1}-\bm{\omega}^{*}||_{2}^{2}=||\bm{\omega}^{t}-\gamma^{t}\bm{a}^{t}-\bm{\omega}^{*}||_{2}^{2}$

\begin{split}&=||\bm{\omega}^{t}-\gamma^{t}\bar{\bm{a}}^{t}+\gamma^{t}\bar{\bm{a}}^{t}-\gamma^{t}\bm{a}^{t}-\bm{\omega}^{*}||_{2}^{2}\\ &=\underbrace{||\bm{\omega}^{t}-\gamma^{t}\bar{\bm{a}}^{t}-\bm{\omega}^{*}||_{2}^{2}}_{A_{1}}+\underbrace{(\gamma^{t})^{2}||\bm{a}^{t}-\bar{\bm{a}}^{t}||_{2}^{2}}_{A_{2}}\\ &\quad\underbrace{-2\gamma^{t}\langle\bm{\omega}^{t}-\gamma^{t}\bar{\bm{a}}^{t}-\bm{\omega}^{*},\bm{a}^{t}-\bar{\bm{a}}^{t}\rangle}_{A_{3}}.\end{split}

(29)

Since $\mathbb{E}[\bm{a}^{t}]=\bar{\bm{a}}^{t}$, it can be seen that $\mathbb{E}[A_{3}]=0$.
By expanding $\mathbb{E}[A_{1}]$ , we have:

\begin{split}\mathbb{E}[A_{1}]&=\mathbb{E}\big[||\bm{\omega}^{t}-\bm{\omega}^{*}||_{2}^{2}+\underbrace{(\gamma^{t})^{2}||\bar{\bm{a}}^{t}||_{2}^{2}}_{A_{1,1}}\\ &\quad-\underbrace{2\gamma^{t}\langle\bm{\omega}^{t}-\bm{\omega}^{*},\bar{\bm{a}}^{t}\rangle}_{A_{1,2}}\big].\end{split}

(30)

The bound on $A_{1,1}$ term can be derived as follows: $\mathbb{E}[A_{1,1}]$

\begin{split}&\overset{\mathrm{(a)}}{=}(\gamma^{t})^{2}~\mathbb{E}\big[||\sum_{k=1}^{K}\frac{\alpha_{k}^{t}}{\alpha^{t}}\nabla L_{k}(\bm{\omega}_{k}^{t})||_{2}^{2}\big]\\ &\overset{\mathrm{(b)}}{\leq}(\gamma^{t})^{2}\sum_{k=1}^{K}\bigg(\frac{\alpha_{k}^{t}}{\alpha^{t}}\bigg)^{2}||\nabla L_{k}(\bm{\omega}_{k}^{t})||_{2}^{2}\\ &\overset{\mathrm{(c)}}{\leq}2L~(\gamma^{t})^{2}\sum_{k=1}^{K}\bigg(\frac{\alpha_{k}^{t}}{\alpha^{t}}\bigg)^{2}\big(L_{k}(\bm{\omega}_{k}^{t})-L_{k}(\bm{\omega}_{k}^{*})\big),\end{split}

(31)

where (a) is from Eq. (19), (b) comes from Jensen’s inequality, and (c) follows from Assumption $1$ and the $L$ -smoothness property [46]. The bound for the $A_{1,2}$ term can be derived as follows:
$\mathbb{E}[A_{1,2}]$ = $-2~\gamma^{t}~\langle\bm{\omega}^{t}-\bm{\omega}^{*},\sum_{k=1}^{K}\frac{\alpha_{k}^{t}}{\alpha^{t}}~\nabla L_{k}(\bm{\omega}_{k}^{t})\rangle$

	$\displaystyle=-2\gamma^{t}~\sum_{k=1}^{K}\frac{\alpha_{k}^{t}}{\alpha^{t}}~\langle\bm{\omega}^{t}-\bm{\omega}^{*},~\nabla L_{k}(\bm{\omega}_{k}^{t})\rangle$
	$\displaystyle\overset{\mathrm{(a)}}{\leq}-2\gamma^{t}~\sum_{k=1}^{K}\frac{\alpha_{k}^{t}}{\alpha^{t}}\big[L_{k}(\bm{\omega}_{k}^{t})-L_{k}(\bm{\omega}^{*})+\frac{\beta}{2}||\bm{\omega}^{t}-\bm{\omega}^{*}||_{2}^{2}\big]$
	$\displaystyle\overset{\mathrm{(b)}}{\leq}\underbrace{-2\gamma^{t}~\sum_{k=1}^{K}\frac{\alpha_{k}^{t}}{\alpha^{t}}\big(L_{k}(\bm{\omega}_{k}^{t})-L_{k}(\bm{\omega}^{*})\big)}_{A_{1,2,1}}-\beta\gamma^{t}~||\bm{\omega}^{t}-\bm{\omega}^{*}||_{2}^{2},$

where (a) comes from Assumption $2$ and (b) comes from the fact that $\sum_{k=1}^{K}\frac{\alpha_{k}^{t}}{\alpha^{t}}=1.$

Combining $\mathbb{E}[A_{1,1}]$ and $\mathbb{E}[A_{1,2,1}]$ , we have

$\mathbb{E}[A_{1,1}]$ + $\mathbb{E}[A_{1,2,1}]$

\begin{split}&\leq 2L~(\gamma^{t})^{2}\sum_{k=1}^{K}\bigg(\frac{\alpha_{k}^{t}}{\alpha^{t}}\bigg)^{2}\bigg(L_{k}(\bm{\omega}_{k}^{t})-L_{k}(\bm{\omega}^{*})\\ &\qquad\qquad\qquad+L_{k}(\bm{\omega}^{*})-L_{k}(\bm{\omega}_{k}^{*})\bigg)\\ &\quad-2\gamma^{t}~\sum_{k=1}^{K}\frac{\alpha_{k}^{t}}{\alpha^{t}}\big(L_{k}(\bm{\omega}_{k}^{t})-L_{k}(\bm{\omega}^{*})\big)\\ &\overset{\mathrm{(a)}}{\leq}2L\tau(\gamma^{t})^{2}\sum_{k=1}^{K}\bigg(\frac{\alpha_{k}^{t}}{\alpha^{t}}\bigg)^{2}\\ &\quad-2\gamma^{t}~\sum_{k=1}^{K}\frac{\alpha_{k}^{t}}{\alpha^{t}}\big[1-L\gamma^{t}\frac{\alpha_{k}^{t}}{\alpha^{t}}\big]\big(L_{k}(\bm{\omega}_{k}^{t})-L_{k}(\bm{\omega}^{*})\big)\\ &\overset{\mathrm{(b)}}{\leq}2\tau\gamma^{t}\bigg[1-L\gamma^{t}\sum_{k=1}^{K}\bigg(\frac{\alpha_{k}^{t}}{\alpha^{t}}\bigg)^{2}\bigg],\end{split}

(32)

where (a) comes from Lemma $1$ , and (b) comes from Lemma $1$ together with the fact that $L_{k}(\bm{\omega}_{k}^{t})-L_{k}(\bm{\omega}_{k}^{*})\geq 0$ . The bound $\mathbb{E}[A_{2}]\leq(\gamma^{t})^{2}\sum_{k=1}^{K}\bigg(\frac{\alpha_{k}^{t}}{\alpha^{t}}\bigg)^{2}\rho_{k}^{2}$ follows directly from Lemma $2$ . Substituting $\mathbb{E}[A_{1}],~\mathbb{E}[A_{2}],~\mathbb{E}[A_{3}]$ into $\mathbb{E}\big(||\bm{\omega}^{t+1}-\bm{\omega}^{*}||_{2}^{2}\big)$ and using the fact that $\frac{1}{\kappa}\leq\gamma^{t}$ completes the proof.

Proof of Theorem $1$ . Similar to [39, 40], we define $\Delta^{t}=\mathbb{E}[||\bm{\omega}^{t}-\bm{\omega}^{*}||_{2}^{2}]$ . From Lemma $3$ , it follows that $\Delta^{t+1}\leq(1-\beta\gamma^{t})\Delta^{t}+(\gamma^{t})^{2}G^{t}$ . We assume $\gamma^{t}=\frac{\alpha}{t+\mu}$ for some $\alpha>\frac{1}{\beta}$ and $\mu>1$ . Defining $\lambda=\max\{\frac{\alpha^{2}G}{\alpha\beta-1},\mu\Delta^{0}\}$ , we prove $\Delta^{t}\leq\frac{\lambda}{t+\mu}$ by induction. The definition of $\lambda$ ensures that the inequality holds for $t=0$ . For the inductive step, suppose the inequality holds at round $t$ ; then:
$\Delta^{t+1}\leq(1-\beta\gamma^{t})\Delta^{t}+(\gamma^{t})^{2}G^{t}$

\begin{split}&\leq\bigg(1-\frac{\alpha\beta}{t+\mu}\bigg)\frac{\lambda}{t+\mu}+\frac{\alpha^{2}G^{t}}{(t+\mu)^{2}}\\ &\leq\frac{t+\mu-1}{(t+\mu)^{2}}\lambda+\underbrace{\bigg[\frac{\alpha^{2}G^{t}-(\alpha\beta-1)\lambda}{(t+\mu)^{2}}\bigg]}_{\leq 0}\\ &\leq\frac{t+\mu-1}{(t+\mu)^{2}-1}\lambda\leq\frac{\lambda}{(t+1)+\mu}.\end{split}

(33)

Specifically, if we choose $\alpha=\frac{2}{\beta}$ , $\mu=\frac{2L}{\beta}$ , then $\gamma^{t}=\frac{2}{\beta t+2L}$ . Then, we have
$\lambda=\max\big{\{}\frac{\alpha^{2}G}{\alpha\beta-1},\mu\Delta^{0}\big{\}}$

\begin{split}&\leq\frac{\alpha^{2}G}{\alpha\beta-1}+\mu\Delta^{0}=\frac{4G}{\beta^{2}}+\frac{2L}{\beta}||\bm{\omega}^{0}-\bm{\omega}^{*}||_{2}^{2}.\end{split}

(34)
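The induction in Eq. (33) can be sanity-checked numerically. The sketch below is an illustration under stated assumptions, not part of the original analysis: it treats $G^{t}$ as a constant $G$ , runs the recursion from Lemma $3$ with equality (the worst case), and uses the choices $\alpha=\frac{2}{\beta}$ and $\mu=\frac{2L}{\beta}$ . The numeric values of `beta`, `L`, `G`, and `delta0` are hypothetical.

```python
def run_recursion(beta, L, G, delta0, num_rounds):
    """Iterate Delta^{t+1} = (1 - beta*gamma^t) * Delta^t + (gamma^t)^2 * G.

    Returns a list of (Delta^t, lambda / (t + mu)) pairs, one per round.
    Assumptions of this sketch: G^t is a constant G, and the recursion
    holds with equality (worst case of the inequality in Lemma 3).
    """
    alpha = 2.0 / beta            # step-size numerator, alpha > 1/beta
    mu = 2.0 * L / beta           # step-size offset
    lam = max(alpha ** 2 * G / (alpha * beta - 1.0), mu * delta0)

    delta = delta0
    history = []
    for t in range(num_rounds):
        history.append((delta, lam / (t + mu)))
        gamma = alpha / (t + mu)  # gamma^t = 2 / (beta * t + 2 * L)
        delta = (1.0 - beta * gamma) * delta + gamma ** 2 * G
    return history


# Hypothetical constants: strong convexity beta, smoothness L, constant G.
history = run_recursion(beta=0.5, L=2.0, G=3.0, delta0=1.0, num_rounds=200)

# The claim Delta^t <= lambda / (t + mu) should hold at every round.
bound_holds = all(delta <= bound + 1e-12 for delta, bound in history)
print(bound_holds, history[-1])
```

Because the bound decays as $\frac{\lambda}{t+\mu}$ , the tracked distance shrinks at the $\mathcal{O}(\frac{1}{t})$ rate claimed by the theorem.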

Finally, we have:
$\mathbb{E}[L(\bm{\omega}^{t})]-L^{*}$

\begin{split}&\overset{\mathrm{(a)}}{\leq}\frac{L}{2}\mathbb{E}\big[||\bm{\omega}^{t}-\bm{\omega}^{*}||_{2}^{2}\big]=\frac{L}{2}\Delta^{t}\leq\frac{L}{2}\frac{\lambda}{t+\mu}\\ &\overset{\mathrm{(b)}}{\leq}\frac{L}{2\alpha}\frac{\alpha}{t+\mu}\bigg[\frac{4G}{\beta^{2}}+\frac{2L}{\beta}||\bm{\omega}^{0}-\bm{\omega}^{*}||_{2}^{2}\bigg]\\ &\overset{\mathrm{(c)}}{\leq}\frac{L}{\beta t+2L}\bigg[\frac{2G}{\beta}+L||\bm{\omega}^{0}-\bm{\omega}^{*}||_{2}^{2}\bigg],\end{split}

(35)

where (a) comes from the $L$ -smoothness of the loss function and the fact that $\nabla L(\bm{\omega}^{*})=0$ , (b) is computed using Eq. (34), and (c) is computed by substituting the values of $\alpha$ and $\mu$ . Hence, the convergence rate is $\mathcal{O}(\frac{1}{T})$ .
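As a quick check of step (c), the substitution can be written out explicitly. This worked step uses only the symbols already defined above, with $\alpha=\frac{2}{\beta}$ and $\mu=\frac{2L}{\beta}$ :

```latex
\begin{split}
\frac{L}{2\alpha}\cdot\frac{\alpha}{t+\mu}
  &=\frac{L}{2}\cdot\frac{1}{t+\frac{2L}{\beta}}
   =\frac{L\beta}{2(\beta t+2L)},\\
\frac{L\beta}{2(\beta t+2L)}\cdot\frac{4G}{\beta^{2}}
  &=\frac{L}{\beta t+2L}\cdot\frac{2G}{\beta},\\
\frac{L\beta}{2(\beta t+2L)}\cdot\frac{2L}{\beta}
  &=\frac{L}{\beta t+2L}\cdot L.
\end{split}
```

Summing the last two lines recovers the bracketed expression in the final line of Eq. (35), so step (c) in fact holds with equality.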

X Acknowledgement

The material is based upon work supported by NASA under award No(s) 80NSSC20M0261, and NSF grants 1948511, 1955561, and 2212565. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration (NASA) and the National Science Foundation (NSF).

References

[1] H. Menouar, I. Guvenc, K. Akkaya, A. S. Uluagac, A. Kadri, and A. Tuncer, “UAV-enabled intelligent transportation systems for the smart city: Applications and challenges,” IEEE Communications Magazine, vol. 55, no. 3, pp. 22–28, 2017.
[2] FAA, “Drones by the Numbers,” https://www.faa.gov/node/54496.
[3] S. Li, Y. Gu, B. Subedi, C. He, Y. Wan, A. Miyaji, and T. Higashino, “Beyond visual line of sight UAV control for remote monitoring using directional antennas,” in 2019 IEEE Globecom Workshops (GC Wkshps). IEEE, 2019, pp. 1–6.
[4] P. Kopardekar, J. Rios, T. Prevot, M. Johnson, J. Jung, and J. E. Robinson, “Unmanned aircraft system traffic management (UTM) concept of operations,” 2016.
[5] A. S. Abdalla and V. Marojevic, “Communications standards for unmanned aircraft systems: The 3GPP perspective and research drivers,” IEEE Communications Standards Magazine, vol. 5, no. 1, pp. 70–77, 2021.
[6] M. Rimjha and A. Trani, “Urban Air Mobility: Factors Affecting Vertiport Capacity,” in 2021 Integrated Communications Navigation and Surveillance Conference (ICNS). IEEE, 2021, pp. 1–14.
[7] M. Ghazikor, K. Roach, K. Cheung, and M. Hashemi, “Exploring the Interplay of Interference and Queues in Unlicensed Spectrum Bands for UAV Networks,” in 2023 57th Asilomar Conference on Signals, Systems, and Computers, 2023, pp. 729–733.
[8] W. S. H. M. W. Ahmad, N. A. M. Radzi, F. Samidi, A. Ismail, F. Abdullah, M. Z. Jamaludin, and M. Zakaria, “5G technology: Towards dynamic spectrum sharing using cognitive radio networks,” IEEE Access, vol. 8, pp. 14460–14488, 2020.
[9] D. Uvaydov, S. D’Oro, F. Restuccia, and T. Melodia, “DeepSense: Fast wideband spectrum sensing through real-time in-the-loop deep learning,” in IEEE INFOCOM 2021-IEEE Conference on Computer Communications. IEEE, 2021, pp. 1–10.
[10] J. Cui, Y. Liu, and A. Nallanathan, “Multi-agent reinforcement learning-based resource allocation for UAV networks,” IEEE Transactions on Wireless Communications, vol. 19, no. 2, pp. 729–743, 2019.
[11] Y. Li, W. Zhang, C.-X. Wang, J. Sun, and Y. Liu, “Deep reinforcement learning for dynamic spectrum sensing and aggregation in multi-channel wireless networks,” IEEE Transactions on Cognitive Communications and Networking, vol. 6, no. 2, pp. 464–475, 2020.
[12] H. Q. Nguyen, B. T. Nguyen, T. Q. Dong, D. T. Ngo, and T. A. Nguyen, “Deep Q-Learning with Multiband Sensing for Dynamic Spectrum Access,” in 2018 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), 2018, pp. 1–5.
[13] J. Kakar and V. Marojevic, “Waveform and spectrum management for unmanned aerial systems beyond 2025,” in 2017 IEEE 28th Annual international symposium on personal, indoor, and mobile radio communications (PIMRC). IEEE, 2017, pp. 1–5.
[14] B. Shang, V. Marojevic, Y. Yi, A. S. Abdalla, and L. Liu, “Spectrum sharing for UAV communications: Spatial spectrum sensing and open issues,” IEEE Vehicular Technology Magazine, vol. 15, no. 2, pp. 104–112, 2020.
[15] S. R. Chintareddy, K. Roach, K. Cheung, and M. Hashemi, “Collaborative wideband spectrum sensing and scheduling for networked UAVs in UTM systems,” in GLOBECOM 2023-2023 IEEE Global Communications Conference. IEEE, 2023, pp. 3064–3069.
[16] B. Shang, L. Liu, H. Chen, J. Zhang, S. Pudlewski, E. S. Bentley, and J. Ashdown, “Spatial spectrum sensing-based D2D communications in user-centric deployed HetNets,” in 2019 IEEE Global Communications Conference (GLOBECOM). IEEE, 2019, pp. 1–6.
[17] H. Chen, L. Liu, T. Novlan, J. D. Matyjas, B. L. Ng, and J. Zhang, “Spatial spectrum sensing-based device-to-device cellular networks,” IEEE Transactions on Wireless Communications, vol. 15, no. 11, pp. 7299–7313, 2016.
[18] H. Chen, L. Liu, H. S. Dhillon, and Y. Yi, “QoS-aware D2D cellular networks with spatial spectrum sensing: A stochastic geometry view,” IEEE Transactions on Communications, vol. 67, no. 5, pp. 3651–3664, 2018.
[19] B. Shang, L. Liu, R. M. Rao, V. Marojevic, and J. H. Reed, “3D spectrum sharing for hybrid D2D and UAV networks,” IEEE Transactions on Communications, vol. 68, no. 9, pp. 5375–5389, 2020.
[20] C. Liu, J. Wang, X. Liu, and Y.-C. Liang, “Deep CM-CNN for spectrum sensing in cognitive radio,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 10, pp. 2306–2321, 2019.
[21] D. Chew and A. B. Cooper, “Spectrum sensing in interference and noise using deep learning,” in 2020 54th Annual conference on information sciences and systems (CISS). IEEE, 2020, pp. 1–6.
[22] O. Naparstek and K. Cohen, “Deep multi-user reinforcement learning for dynamic spectrum access in multichannel wireless networks,” in GLOBECOM 2017-2017 IEEE Global Communications Conference. IEEE, 2017, pp. 1–7.
[23] ——, “Deep multi-user reinforcement learning for distributed dynamic spectrum access,” IEEE Transactions on Wireless Communications, vol. 18, no. 1, pp. 310–323, 2018.
[24] H. Albinsaid, K. Singh, S. Biswas, and C.-P. Li, “Multi-agent reinforcement learning-based distributed dynamic spectrum access,” IEEE Transactions on Cognitive Communications and Networking, vol. 8, no. 2, pp. 1174–1185, 2021.
[25] Y. Bokobza, R. Dabora, and K. Cohen, “Deep reinforcement learning for simultaneous sensing and channel access in cognitive networks,” IEEE Transactions on Wireless Communications, 2023.
[26] S. Wang, H. Liu, P. H. Gomes, and B. Krishnamachari, “Deep reinforcement learning for dynamic multichannel access in wireless networks,” IEEE Transactions on Cognitive Communications and Networking, vol. 4, no. 2, pp. 257–265, 2018.
[27] Z. Chen, Y.-Q. Xu, H. Wang, and D. Guo, “Federated learning-based cooperative spectrum sensing in cognitive radio,” IEEE Communications Letters, vol. 26, no. 2, pp. 330–334, 2021.
[28] Z. Gao, A. Li, Y. Chen, B. Li, Y. Wang, and Y. Chen, “FedSwap: A Federated Learning based 5G Decentralized Dynamic Spectrum Access System,” in 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD), 2021, pp. 1–6.
[29] M. Wasilewska, H. Bogucka, and H. V. Poor, “Secure Federated Learning for Cognitive Radio Sensing,” IEEE Communications Magazine, vol. 61, no. 3, pp. 68–73, 2023.
[30] N. A. Khalek, D. H. Tashman, and W. Hamouda, “Advances in Machine Learning-Driven Cognitive Radio for Wireless Networks: A Survey,” IEEE Communications Surveys & Tutorials, 2023.
[31] X. Liu, Y. Deng, and T. Mahmoodi, “Wireless distributed learning: a new hybrid split and federated learning approach,” IEEE Transactions on Wireless Communications, vol. 22, no. 4, pp. 2650–2665, 2022.
[32] Z. Wang, Y. Zhou, Y. Shi, and W. Zhuang, “Interference management for over-the-air federated learning in multi-cell wireless networks,” IEEE Journal on Selected Areas in Communications, vol. 40, no. 8, pp. 2361–2377, 2022.
[33] G. Shi, S. Guo, J. Ye, N. Saeed, and S. Dang, “Multiple parallel federated learning via over-the-air computation,” IEEE Open Journal of the Communications Society, vol. 3, pp. 1252–1264, 2022.
[34] B. Xiao, X. Yu, W. Ni, X. Wang, and H. V. Poor, “Over-the-air federated learning: Status quo, open challenges, and future directions,” arXiv preprint arXiv:2307.00974, 2023.
[35] M. M. Amiri, D. Gündüz, S. R. Kulkarni, and H. V. Poor, “Convergence of federated learning over a noisy downlink,” IEEE Transactions on Wireless Communications, vol. 21, no. 3, pp. 1422–1437, 2021.
[36] X. Wei and C. Shen, “Federated learning over noisy channels: Convergence analysis and design examples,” IEEE Transactions on Cognitive Communications and Networking, vol. 8, no. 2, pp. 1253–1268, 2022.
[37] X. Zhang and K. G. Shin, “E-MiLi: Energy-minimizing idle listening in wireless networks,” in Proceedings of the 17th annual international conference on Mobile computing and networking, 2011, pp. 205–216.
[38] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial intelligence and statistics. PMLR, 2017, pp. 1273–1282.
[39] X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang, “On the convergence of FedAvg on non-IID data,” arXiv preprint arXiv:1907.02189, 2019.
[40] N. Yan, K. Wang, C. Pan, and K. K. Chai, “Performance analysis for channel-weighted federated learning in OMA wireless networks,” IEEE Signal Processing Letters, vol. 29, pp. 772–776, 2022.
[41] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT Press, 2018.
[42] H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30, no. 1, 2016.
[43] “CellMapper,” https://www.cellmapper.net/map.
[44] OpenStreetMap contributors, “Planet dump retrieved from https://planet.osm.org,” https://www.openstreetmap.org, 2017.
[45] M. Grandini, E. Bagli, and G. Visani, “Metrics for multi-class classification: an overview,” arXiv preprint arXiv:2008.05756, 2020.
[46] S. P. Boyd and L. Vandenberghe, Convex optimization. Cambridge University Press, 2004.