Federated Learning-based Collaborative Wideband Spectrum Sensing and Scheduling for UAVs in UTM Systems

Sravan Reddy Chintareddy1, Keenan Roach2, Kenny Cheung2, Morteza Hashemi1
1Department of Electrical Engineering and Computer Science, University of Kansas
2Universities Space Research Association (USRA)
Abstract

In this paper, we propose a data-driven framework for collaborative wideband spectrum sensing and scheduling for networked unmanned aerial vehicles (UAVs), which act as secondary users (SUs) that opportunistically utilize detected “spectrum holes”. Our overall framework consists of three main stages. First, in the model training stage, we explore dataset generation in a multi-cell environment and the training of a machine learning (ML) model using the federated learning (FL) architecture. Unlike existing studies on FL for wireless that presume datasets are readily available for training, we propose a novel architecture that directly integrates wireless dataset generation, which involves capturing I/Q samples from over-the-air signals in a multi-cell environment, into the FL training process. To this end, we formulate wideband spectrum sensing as a multi-label classification problem to detect multiple spectrum holes simultaneously based on the I/Q samples collected locally by the UAVs. In traditional FL that employs FedAvg as the aggregation method, each UAV is assigned an equal weight during model aggregation. However, due to the disparities in channel conditions in a multi-cell environment, the FedAvg approach may not generalize effectively to all UAV locations. To address this issue, we propose a proportional weighted federated averaging method (pwFedAvg) in which the aggregation weights incorporate the wireless channel conditions and received signal powers at each individual UAV. As such, the proposed method integrates the intrinsic properties of wireless datasets into the FL algorithm. Second, in the collaborative spectrum inference stage, we propose a collaborative spectrum fusion strategy that is compatible with the unmanned aircraft system traffic management (UTM) ecosystem. In particular, we improve the accuracy of spectrum sensing results by combining the multi-label classification results from the individual UAVs at a central server.
Finally, in the spectrum scheduling stage, we leverage reinforcement learning (RL) solutions to dynamically allocate the detected spectrum holes to the secondary users. To evaluate the proposed methods, we establish a comprehensive simulation framework that generates a near-realistic synthetic dataset using the MATLAB LTE Toolbox by incorporating base-station (BS) locations in a chosen area of interest, performing ray-tracing, and emulating the primary users' channel usage in terms of I/Q samples. This evaluation methodology provides a flexible framework to generate large spectrum datasets that could be used for developing ML/AI-based spectrum management solutions for aerial devices.

Index Terms:
UAV-based Spectrum Sensing, Collaborative Inference, Federated Learning (FL), Reinforcement Learning (RL), UAS traffic management (UTM).

I Introduction

Figure 1: Unmanned Aircraft System Traffic Management (UTM) architecture showing the separation between Federal Aviation Administration (FAA) and industry developments; Flight Information Management System (FIMS).

Unmanned aerial vehicles (UAVs) have attracted significant interest from the communications and networking, robotics, and control communities for exploring novel applications such as on-demand connectivity, search-and-rescue operations, and situational awareness, to name a few [1]. As of April 2024, there were roughly 800,000 registered UAVs in the US alone, positioning UAVs as one of the fastest-growing sectors in the aviation industry [2]. Traditionally, UAVs that are used for recreational purposes are operated under visual line-of-sight (VLOS) conditions. However, real-world and commercial deployments will most likely operate beyond visual line-of-sight (BVLOS), which provides easier access to remote or hazardous areas, less human intervention, and reduced cost of operation [3]. For safe operation of multiple UAVs under BVLOS conditions, NASA and the FAA are in the process of defining the UTM system [4]. Fig. 1 shows a simplified form of the UTM architecture, highlighting the separation between FAA and industry development and deployment responsibilities for the necessary infrastructure, services, and entities that interact within the UTM ecosystem. In this work, we mainly focus on the hierarchical structure between multiple operators and the UAS service supplier (USS), which assists multiple operators in meeting UTM operational requirements, ensuring safe and efficient utilization of the airspace.

The concept of operations within the UTM architecture [4] highlights the need for spectrum resources to facilitate wireless communications between UAVs, UAV operators, and the USS network. Existing terrestrial mobile networks (for example, 4G LTE and the upcoming 5G-and-beyond) provide significant wireless coverage with relatively low latency, high throughput, and low cost, making the cellular network a good candidate for the operation of UAVs in BVLOS scenarios [5]. However, the proliferation of new wireless services and the demand for higher cellular data rates have significantly exacerbated the spectrum crunch that cellular providers are already experiencing. Therefore, it is essential to develop dynamic spectrum sensing, inference, and sharing solutions for UAV operations in existing licensed and unlicensed spectrum to enable advanced aerial use cases in BVLOS, such as urban air mobility (UAM) and advanced air mobility (AAM) [6, 7].

Figure 2: Envisioned FL system model in a Multi-cell wireless network with multiple UAVs.

There exists a multitude of prior works on spectrum management frameworks for ground users [8, 9, 10, 11, 12]. For instance, the authors in [11, 9, 10] propose deep learning-based wideband spectrum sensing to dynamically detect “spectrum holes”. Furthermore, the authors in [12] propose reinforcement learning (RL) techniques for spectrum sharing, assuming that spectrum sensing results are readily available. While these data-driven spectrum management frameworks for ground users are available, they are not directly applicable for UTM-enabled UAV operations, due to several factors, such as the widely different wireless channel models and the overall system architecture [9, 12]. In the context of UAV spectrum sharing systems, the authors in [13, 14] proposed spatial spectral sensing (SSS) to develop efficient spectrum sharing policies for UAV communications aimed at improving the overall spectral efficiency (SE). However, the SSS models do not consider the spectrum usage pattern of users under realistic scenarios (e.g., ignoring the I/Q level samples), and/or they consider only a single primary user (PU) or secondary user (SU). Moreover, the problem of joint multi-channel wideband spectrum sensing and scheduling among several SUs has not been fully investigated. In this paper, we propose a unified and data-driven spectrum sensing and scheduling framework to enable UAVs to effectively share the spectrum with existing primary users. To make our development more concrete and grounded, the problem of joint spectrum sensing and sharing is formulated as an energy efficiency (EE) maximization in a wideband multi-UAV network scenario. Then, we transform the EE optimization problem into a Markov Decision Process (MDP) to maximize the overall throughput of the SUs. At the spectrum sensing stage, we note the inherent hierarchical nature of the UTM architecture with USS (shown in Fig. 1) is a good match for federated learning (FL) based spectrum sensing. 
For the spectrum scheduling stage, we develop RL-based solutions to enable automated spectrum resource allocation. For the spectrum sensing stage in particular, we propose FL-based cooperative wideband spectrum sensing across multiple UAVs. To this end, we develop a multi-label classification framework to identify spectrum holes based on the observed I/Q samples. Each UAV trains its local model using its locally collected dataset and transmits the local model parameters to the central server. Furthermore, we propose a novel proportional weighted federated averaging (pwFedAvg) method that incorporates the power level received at each UAV into the FL aggregation algorithm, thereby integrating the dataset generation plane with the FL model training plane, as shown in Fig. 2. Once the training process is completed, all UAVs have an updated global model that predicts spectrum holes. To further enhance the accuracy of the individual spectrum inference results, the predicted spectrum holes from the multi-label classification at each UAV are fused at a central server within the UTM ecosystem. In the spectrum scheduling stage, we develop and implement several RL algorithms, including standard Q-learning, to dynamically allocate underutilized spectrum sub-channels to multiple UAVs. We further investigate the performance of the “vanilla” deep Q-Network (DQN) and its variations, including double DQN (DDQN) and DDQN with soft-update.

Furthermore, one of the primary challenges of using machine learning (ML) based methods for spectrum sensing and scheduling approaches is the need for large amounts of training data. The lack of available spectral data in many cases is a significant obstacle, especially for UAV networks that introduce an additional level of complexity for large-scale experimental data collection. To address this gap, we have developed a comprehensive framework for generating spectrum datasets. This framework models LTE waveform generation and propagation channel in any environment of interest, particularly suitable for UTM-enabled UAV applications. Using the generated dataset, we provide a comprehensive set of numerical results to demonstrate the efficacy of the joint FL-based spectrum sensing, spectrum fusion and RL-based dynamic spectrum allocation to multiple UAVs. In summary, the main contributions of this paper are as follows:

  • We develop a spectrum management framework based on the envisioned UTM deployment architecture. To this end, we propose a joint spectrum sensing and scheduling problem for collaborative networked UAVs that operate according to the UTM rules. The joint optimization problem integrates the spectrum sensing results into the spectrum scheduling stage for scenarios with multiple secondary users (i.e., UAVs) and primary users.

  • For spectrum sensing, we propose an FL-based solution to enable collaborative model training across distributed UAVs. We propose the pwFedAvg method that integrates the underlying wireless channel conditions into the FL aggregation step. We also provide the convergence analysis results of the proposed pwFedAvg method. We demonstrate the benefits of collaborative spectrum sensing through a fusion step. For the spectrum scheduling stage, we develop RL-based solutions leveraging DQN-based approaches.

  • We outline a methodology for generating large I/Q datasets for UAVs over a wide geographical area, considering the effects of a multi-cell multi-path environment by incorporating the base-station locations and accurately modeling the environment using ray-tracing methods. Based on the established framework, we provide a comprehensive set of numerical results to analyze the performance of pwFedAvg compared with the traditional FedAvg approach, as well as with centralized and local learning. Our results demonstrate the efficacy of the pwFedAvg method for collaborative spectrum sensing, without the need to transfer all I/Q samples to one location as in centralized learning.

This paper extends our prior work [15], in which we did not investigate the feasibility of model training using FL methods for UAVs. In contrast, this paper mainly focuses on developing FL-based spectrum sensing by incorporating the wireless datasets captured by multiple UAVs into the FL model training plane, as shown in Fig. 2. Furthermore, we have significantly extended our dataset generation by scaling the size of captured I/Q data samples and increasing the number of reflection and diffraction rays, thereby enhancing the fidelity of emulating the propagation environment. The remainder of this paper is organized as follows. In Section II, we review related works. In Section III, we present the overall system model and problem formulation for FL-based wideband spectrum sensing and collaborative spectrum inference and scheduling. In Section IV, we discuss the dataset generation model and the model training aspects of the FL-based solution that incorporates our proposed pwFedAvg method, followed by a discussion of the convergence analysis of pwFedAvg. In Section V, we present dynamic spectrum allocation using RL. Section VI describes our methodology to generate the synthetic spectrum dataset, followed by our numerical results in Section VII. Finally, Section VIII concludes the paper.

II Related Works

II-A Spectrum Sensing and Sharing for UAVs

The authors in [16] address spectrum access and interference management by utilizing SSS for ground based device-to-device (D2D) communications [17, 18]. Furthermore, the authors in [14, 19] extend the usage of SSS to UAVs to opportunistically access the licensed channels that are occupied by the D2D communications of ground users. The UAVs perform SSS to obtain the received signal strength and compare it with a threshold to identify the spectrum occupancy of a particular D2D channel. However, in general, energy-based detection methods would require capturing the entire waveform for a sub-channel to compute the energy and compare it with a predefined threshold. When there are multiple sub-channels, such detection methods repeated for each sub-channel add further time and hardware complexity. Therefore, SSS methods are not directly applicable to wideband spectrum sensing by UAVs to detect multiple spectrum holes simultaneously.

In addition to the SSS methods, data-driven deep learning (DL) methods for spectrum sensing have been considered in prior works [20, 21]. For multi-channel spectrum sensing, the authors in [9] developed a fast wideband spectrum sensing method based on DL. The DL model is based on a convolutional neural network (CNN) that accepts raw I/Q signals and predicts the spectrum holes. The above works consider only a single PU and a single SU, and the channel between the PU and SU is modeled as a Rayleigh fading channel.

Furthermore, there exists extensive research on spectrum sharing solutions. For example, the authors in [11, 22, 23, 24] propose the use of RL for dynamic spectrum access in multi-channel wireless networks. Furthermore, the authors in [12, 25, 26] propose the use of DQN, where, in each time slot, a single SU decides whether to stay idle or transmit using one of the sub-channels in a multi-channel environment without performing spectrum sensing. While these studies have provided significant insights, they consider only one SU and focus on ground-based communications.

In this paper, we consider a data-driven approach to predict multiple spectrum holes simultaneously from the raw I/Q signals captured in a multi-cell multi-path fading environment consisting of multiple PUs and SUs. We incorporate ray-tracing methods to effectively model the dynamic UAV environment instead of assuming a statistical channel model. Furthermore, we employ RL for dynamically allocating resources for the UAVs based on predicted spectrum holes.

II-B FL-based Spectrum Sensing

FL for spectrum sensing has recently gained popularity [27, 28, 29, 30, 31]. The authors in [29, 27] discuss the application of FL for spectrum sensing in cognitive radio environments, where an SU detects the spectrum holes in the PU’s spectrum band and utilizes them opportunistically. However, these studies only considered a single PU with multiple SUs within the coverage of the PU. There exists a separate class of research that concentrates on interference management in multi-cell wireless networks and incorporates over-the-air computation in FL [32, 33, 34]. For instance, the authors in [32] study the adverse effects of inter-cell interference on the uplink and downlink local model aggregates and global model updates and propose solutions to mitigate the interference. In contrast to the above-mentioned works, in this paper, we consider multiple PUs and multiple SUs, incorporate co-channel interference at the dataset generation level, and also consider wideband spectrum sensing.

Additionally, there exist a few research works on FL for wireless systems that investigate how the convergence of the learning process is affected by noisy transmissions between the clients and the server [35, 36]. However, these investigations often assume that the datasets are readily available at the clients and primarily focus on the model training plane, as illustrated in Fig. 2. Furthermore, these studies consider standard ML datasets, such as CIFAR-10, MNIST, and Shakespeare [36], while using traditional federated averaging algorithms.

Yet, wireless datasets collected by multiple UAVs in a multi-cell environment are significantly more complex than, and different from, those standard datasets. For instance, data collected at one UAV location may encounter distinct wireless channels, varying numbers of propagation paths, and significantly different received signal power levels compared to the data collected at other locations. This variability underscores the need for tailored approaches to model training within FL frameworks, particularly when dealing with datasets from real-world wireless environments. Hence, we propose a weighted averaging algorithm (pwFedAvg) that captures the disparities in the datasets captured at different UAV locations. Moreover, none of the previous works considered training the FL models with I/Q datasets. To address this gap, we propose a novel architecture to capture wireless datasets in a multi-cell environment and to integrate the dataset generation and the model training planes, thereby incorporating the effects of wireless datasets into the FL model training, as illustrated in Fig. 2.

III System Model and Problem Formulation

Figure 3: A zoomed-in view of the dataset generation plane of a multi-cell wireless network with multiple UAVs.

To model collaborative wideband spectrum sensing and scheduling, we consider a multi-cell wireless network that consists of a set of base-stations (BSs) denoted by $\mathcal{B}$ ($|\mathcal{B}| = B$), as shown in Fig. 3. In addition, we consider a set of UAVs denoted by $\mathcal{K}$ ($|\mathcal{K}| = K$) in the system. To coordinate the collaborative spectrum sensing, fusion, and scheduling, we assume that each time slot is divided into four consecutive sub-slots: UAV resource request ($t_{req}$), spectrum sensing ($t_s$), broadcasting to the central server ($t_b$), and channel access ($t_a$). Specifically, at the beginning of each time slot, the UAVs that require PU resources request resource allocation from the server. In the subsequent sensing sub-slot ($t_s$), the UAVs perform spectrum sensing, and they broadcast the sensed channel information in the following sub-slot ($t_b$). The central server then applies fusion rules and assigns spectrum holes to the requesting UAVs. The UAVs then transmit on the allocated spectrum holes in the access sub-slot ($t_a$). In this paper, we focus on three main stages to develop our proposed framework: (i) FL-based training for wideband spectrum sensing, (ii) collaborative spectrum inference and fusion, and (iii) spectrum scheduling. To coordinate the above three stages, we assume a central server within the UTM ecosystem. Next, we describe these stages.

III-A Model Training, Spectrum Inference, and Scheduling Stages

FL-based model training for spectrum sensing. The UTM system architecture shown in Fig. 1 supports data exchange between multiple UAVs through the USS network. Such a hierarchical architecture makes it feasible to implement FL-based learning algorithms to identify spectrum holes. In this case, we may consider two deployment models within the UTM architecture. One model would be to have a server deployed by each UAV operator where multiple UAVs connected to the operator act as FL clients. The second model would have a server within the USS network that orchestrates multiple UAV operators. Thus, with several UAVs training local models, they exchange model parameters with the central server that is located either at the USS or UAV operator. The central server then aggregates the local model weights according to an aggregation algorithm and transmits the global model weights back to the UAVs to update their local models.
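As a rough illustration of the aggregation step described above, the following Python sketch contrasts an equal-weight average of local model parameters with a generic weighted average. The function names, the flat-list representation of model parameters, and the choice of weights proportional to a per-UAV received power are assumptions made purely for illustration; the actual pwFedAvg weighting is the one defined in Section IV, not the one used here.

```python
from typing import List, Sequence


def aggregate(local_params: Sequence[Sequence[float]],
              weights: Sequence[float]) -> List[float]:
    """Weighted average of per-UAV parameter vectors (weights are normalized)."""
    if not local_params or len(local_params) != len(weights):
        raise ValueError("need one weight per local model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(local_params[0])
    global_params = [0.0] * dim
    for params, w in zip(local_params, weights):
        if len(params) != dim:
            raise ValueError("all local models must have the same size")
        for i, p in enumerate(params):
            global_params[i] += (w / total) * p
    return global_params


def equal_weight_average(local_params: Sequence[Sequence[float]]) -> List[float]:
    """Equal-weight aggregation: every UAV contributes the same share."""
    return aggregate(local_params, [1.0] * len(local_params))


# Hypothetical local parameter vectors reported by K = 3 UAVs.
local_models = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Hypothetical linear-scale received powers, used here only as example weights.
rx_power = [1.0, 1.0, 2.0]

fedavg_global = equal_weight_average(local_models)
weighted_global = aggregate(local_models, rx_power)
```

In this toy case the third UAV, which has the largest example weight, pulls the weighted global parameters toward its local model, whereas the equal-weight average treats all three UAVs identically.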

Collaborative spectrum inference and fusion. Due to the highly dynamic environment in which UAVs operate, it may not be feasible for all UAVs to achieve high prediction accuracy across all sub-channels. Therefore, we leverage collaborative spectrum inference by the UAVs, and perform fusion at the fusion module within the central server to increase the reliability of spectrum hole detection. In particular, each individual UAV captures the raw I/Q samples from over-the-air received signals and predicts the availability of spectrum holes across $M$ sub-channels using the FL-trained model. We assume that there is an associated spectrum inference cost for each UAV $k$ involved in sensing at time slot $t$. The spectrum inference cost is the energy consumed for sensing the spectrum and depends on the voltage $V_{CC}$ of the receiver, the system bandwidth $W$, and the duration allotted for sensing ($t_s$) [37]. Therefore, it is defined as $SC_{k,m}(t) = t_s V_{CC}^2 W_m$, where $W_m$ is the $m$-th sub-channel bandwidth.
Upon completion of the spectrum inference phase, UAV $k$ has a predicted spectrum occupancy vector $\bm{\widehat{h}}_k(t) = [\widehat{h}_{k,1}(t), \dots, \widehat{h}_{k,M}(t)]$ such that $\widehat{h}_{k,m}(t) = 0$ if the $m$-th sub-channel is detected vacant at time $t$, and $\widehat{h}_{k,m}(t) = 1$ otherwise. This problem can be considered as a multi-label classification problem, and we leverage a deep neural network (DNN) at each UAV that accepts raw I/Q samples $\bm{R}_k$ as inputs and outputs the predicted spectrum occupancy vector $\bm{\widehat{h}}_k(t)$.

The central server receives the spectrum hole predictions from the individual UAVs and applies a fusion rule that yields an aggregated prediction. In this paper, we use the $n$-out-of-$K$ fusion rule defined as follows:

$$z_m(t) = \begin{cases} 0, & \text{if } \sum_{k \in \mathcal{K}} \mathbb{1}\{\widehat{h}_{k,m}(t) = 0\} \geq n; \\ 1, & \text{otherwise}, \end{cases} \tag{1}$$

where $\mathbb{1}\{\cdot\}$ is the indicator function. In this case, $\bm{z}(t) = [z_1(t), \dots, z_M(t)]$ is the fused prediction of all $M$ sub-channels at the central server. Note that when $n = 1$, a sub-channel is declared vacant as soon as one UAV detects it as vacant, so the $n$-out-of-$K$ rule is equivalent to the “OR” rule over the vacancy decisions, and $n = K$ is the same as the “AND” rule.
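To make the fusion rule in (1) concrete, the following Python sketch applies it to a small set of per-UAV predictions. The function name `fuse_n_out_of_k` and the example prediction matrix are hypothetical and serve only as an illustration of the rule as written above.

```python
from typing import List, Sequence


def fuse_n_out_of_k(predictions: Sequence[Sequence[int]], n: int) -> List[int]:
    """Apply the n-out-of-K rule of Eq. (1) to each sub-channel.

    predictions[k][m] is 0 if UAV k predicts sub-channel m vacant, 1 otherwise.
    Returns z with z[m] = 0 (vacant) when at least n UAVs report vacancy.
    """
    num_uavs = len(predictions)
    if not 1 <= n <= num_uavs:
        raise ValueError("n must satisfy 1 <= n <= K")
    num_channels = len(predictions[0])
    fused = []
    for m in range(num_channels):
        vacant_votes = sum(1 for k in range(num_uavs) if predictions[k][m] == 0)
        fused.append(0 if vacant_votes >= n else 1)
    return fused


# Hypothetical predictions from K = 3 UAVs over M = 4 sub-channels.
h_hat = [
    [0, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
z_or = fuse_n_out_of_k(h_hat, n=1)        # any single vacancy vote suffices
z_majority = fuse_n_out_of_k(h_hat, n=2)  # at least two UAVs must agree
z_and = fuse_n_out_of_k(h_hat, n=3)       # all UAVs must report vacancy
```

A small $n$ declares more sub-channels vacant and therefore offers more transmission opportunities at a higher risk of collision with PUs, while a large $n$ is more conservative.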

Spectrum scheduling. Based on the aggregated fusion result provided by the fusion module, the central server then allocates sub-channels to the requesting UAVs. The UAVs then transmit data on the sub-channels allocated to them by the server in the next time step. The access cost, denoted by $AC_{k,m}(t)$, is the energy consumed for data transmission and is defined as $AC_{k,m}(t) = t_a P_{tx}$, where $P_{tx}$ is the transmit power and $t_a$ is the time allotted to transmission. Furthermore, the transmission utility is the amount of data transmitted on the allocated sub-channel, which is defined as follows:

$$U_{k,m}(t) = t_a W_m \log_2\left(1 + \text{SNR}_{k,m}(t)\right), \tag{2}$$

where $\text{SNR}_{k,m}(t)$ denotes the signal-to-noise ratio for UAV $k$ on sub-channel $m$.
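The sensing cost, access cost, and transmission utility defined above can be evaluated directly, as in the short Python sketch below. The function names and all numerical values are hypothetical and chosen only for illustration; the SNR is assumed to be given in linear scale rather than in dB.

```python
import math


def sensing_cost(t_s: float, v_cc: float, w_m: float) -> float:
    """SC_{k,m}(t) = t_s * V_CC^2 * W_m."""
    return t_s * v_cc ** 2 * w_m


def access_cost(t_a: float, p_tx: float) -> float:
    """AC_{k,m}(t) = t_a * P_tx."""
    return t_a * p_tx


def transmission_utility(t_a: float, w_m: float, snr_linear: float) -> float:
    """U_{k,m}(t) = t_a * W_m * log2(1 + SNR), Eq. (2), with linear-scale SNR."""
    return t_a * w_m * math.log2(1.0 + snr_linear)


# Hypothetical values: 1 ms sensing, 2 ms access, 1 MHz sub-channel, SNR = 15.
sc = sensing_cost(t_s=1e-3, v_cc=2.0, w_m=1e6)
ac = access_cost(t_a=2e-3, p_tx=0.5)
u = transmission_utility(t_a=2e-3, w_m=1e6, snr_linear=15.0)

# Ratio of utility to total cost for one UAV on one sub-channel.
utility_per_cost = u / (ac + sc)
```

With these example numbers, $\log_2(1 + 15) = 4$, so the utility is $t_a W_m \cdot 4$, and the ratio of utility to total cost is the kind of per-UAV, per-sub-channel term that an energy efficiency objective aggregates.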

We highlight that the UAVs transmit on those sub-channels that were detected vacant in the previous time step. Hence, a spectrum collision occurs when a previously detected spectrum hole is no longer available at the current time step. Let $\bar{z}_m(t)$ denote the true state of sub-channel $m$ at time $t$. To capture collisions, we define the spectrum access collision indicator $c_{k,m}(t)$ as follows:

$$c_{k,m}(t) = \begin{cases} 1, & \text{if } \bar{z}_m(t) = 0 \text{ and } z_m(t-1) = 0; \\ -1, & \text{if } \bar{z}_m(t) \neq 0 \text{ and } z_m(t-1) = 0; \\ 0, & \text{otherwise}. \end{cases} \tag{3}$$
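The three cases of the indicator in (3) can be expressed as a small Python function. The function and argument names are hypothetical, and reading the value $+1$ as a successful access and $-1$ as a collision is our interpretation of the cases, based on the description of collisions above.

```python
def collision_indicator(true_state_now: int, fused_prev: int) -> int:
    """Evaluate Eq. (3) for one sub-channel.

    true_state_now is the true state z_bar_m(t); fused_prev is the fused
    prediction z_m(t-1). In both, 0 means vacant and nonzero means occupied.
    """
    if fused_prev == 0:
        # The sub-channel was offered as a spectrum hole in the previous step.
        return 1 if true_state_now == 0 else -1
    # The sub-channel was predicted occupied, so it is not accessed.
    return 0


still_vacant = collision_indicator(true_state_now=0, fused_prev=0)
pu_returned = collision_indicator(true_state_now=1, fused_prev=0)
not_offered = collision_indicator(true_state_now=0, fused_prev=1)
```

Because the indicator is negative when the primary user has returned to a sub-channel that was predicted vacant, any objective that multiplies the utility by this indicator penalizes such allocations.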

Next, we formulate a joint spectrum sensing and scheduling optimization problem.

III-B Joint Spectrum Sensing and Scheduling Problem Formulation

Given the presented system model, we now introduce a joint spectrum sensing and scheduling problem to coordinate collaborative spectrum sensing and spectrum scheduling. We cast the problem as an EE optimization for the UAVs that opportunistically use the spectrum resources of the primary network. In particular, let $y_{k,m}(t) = 1$ if UAV $k$ is scheduled to use sub-channel $m$ at time $t$, and $y_{k,m}(t) = 0$ otherwise. Given that the spectrum holes are allocated to the requesting SUs based on the sub-channel availability, we incorporate the sensing and access costs to maximize the overall EE of the system. Therefore, we have:

$$\begin{cases} \max\limits_{\{y_{k,m}(t)\}} & \mathbb{E}\Big\{ \sum\limits_{t,k,m} \frac{y_{k,m}(t)\, c_{k,m}(t)\, U_{k,m}(t)}{y_{k,m}(t)\, AC_{k,m}(t) + SC_{k,m}(t)} \Big\} \\ \text{subject to:} & \sum_{m} y_{k,m}(t) \leq 1, \ \forall\, k = 1, 2, 3, \dots, K, \\ & \sum_{k} y_{k,m}(t) \leq 1, \ \forall\, m = 1, 2, 3, \dots, M, \\ & \sum_{k,m} y_{k,m}(t) \leq M - |\bm{z}(t)|, \\ & y_{k,m}(t) \in \{0, 1\}, \end{cases} \tag{4}$$

where $U_{k,m}(t)$, $SC_{k,m}(t)$, and $AC_{k,m}(t)$ are, respectively, the amount of data transmitted, the sensing cost, and the access (transmission) cost of SU $k$ on sub-channel $m$. The constraints guarantee that each UAV is scheduled to use at most one sub-channel and that each sub-channel is assigned to at most one UAV, while the total number of scheduled UAVs is at most the number of detected spectrum holes at time $t$, which is $M-|\bm{z}(t)|$. In this paper, we use a DNN at each UAV to detect spectrum holes, and the individual detections are fused to obtain $\bm{z}(t)$. To train the DNN models, we next present an FL-based approach for distributed training of spectrum sensing models.
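As a minimal illustration of Eq. (4) for a single time slot, the following Python sketch checks the constraints for a candidate schedule, evaluates the EE objective, and finds the best schedule by exhaustive search. The function names, the toy values, and the brute-force search itself are assumptions introduced here for illustration; they are not the scheduling method of this paper, and the search is only practical for very small $K$ and $M$. The sketch also assumes $SC_{k,m}(t)>0$ so that the ratio is well defined when $y_{k,m}(t)=0$.

```python
from itertools import product

def is_feasible(y, num_holes):
    """Check the constraints of Eq. (4) for one time slot.
    y[k][m] is 1 if UAV k is scheduled on sub-channel m, else 0."""
    K, M = len(y), len(y[0])
    one_channel_per_uav = all(sum(y[k]) <= 1 for k in range(K))
    one_uav_per_channel = all(sum(y[k][m] for k in range(K)) <= 1 for m in range(M))
    within_holes = sum(map(sum, y)) <= num_holes  # num_holes = M - |z(t)|
    return one_channel_per_uav and one_uav_per_channel and within_holes

def energy_efficiency(y, c, U, AC, SC):
    """Per-slot objective: sum over (k, m) of y*c*U / (y*AC + SC)."""
    total = 0.0
    for k in range(len(y)):
        for m in range(len(y[0])):
            total += y[k][m] * c[k][m] * U[k][m] / (y[k][m] * AC[k][m] + SC[k][m])
    return total

def best_schedule(c, U, AC, SC, num_holes):
    """Exhaustive search over all binary schedules (illustrative only)."""
    K, M = len(c), len(c[0])
    best_y, best_val = None, float("-inf")
    for bits in product((0, 1), repeat=K * M):
        y = [list(bits[k * M:(k + 1) * M]) for k in range(K)]
        if not is_feasible(y, num_holes):
            continue
        val = energy_efficiency(y, c, U, AC, SC)
        if val > best_val:
            best_y, best_val = y, val
    return best_y, best_val

# Toy example (hypothetical numbers): K = 2 UAVs, M = 3 sub-channels,
# sub-channel 0 occupied, so num_holes = 2.
c = [[0, 1, 1], [0, 1, 1]]
U = [[5, 4, 3], [5, 2, 6]]
AC = [[1] * 3 for _ in range(2)]
SC = [[1] * 3 for _ in range(2)]
schedule, value = best_schedule(c, U, AC, SC, num_holes=2)
```

Because the variables are binary and coupled through the per-UAV and per-sub-channel constraints, the feasible set grows quickly with $K$ and $M$, which is why the exhaustive search above should be read only as a reference point for small instances.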

IV Proposed FL-based Model Training for Spectrum Sensing

IV-A Dataset and DNN Models

We assume that each UAV receives signals from more than one BS, because UAVs operate at higher altitudes, which increases the chance of receiving signals from multiple BSs. Furthermore, we assume that the cell bandwidth $W$ is partitioned into $M$ orthogonal sub-channels. Then, by the superposition principle, the total signal transmitted by BS $b$ across the $M$ orthogonal sub-channels at any time $t$ can be represented as follows:

$$\bm{s}_b(t)=\sum_{m=1}^{M} I_{b,m}(t)\,\bm{v}_{b,m}(t),\quad \forall b\in\mathcal{B}, \tag{5}$$

where $I_{b,m}(t)=1$ if the $m$-th sub-channel of BS $b$ is occupied at time $t$, and $0$ otherwise. Moreover, $\bm{v}_{b,m}(t)$ represents the waveform on the $m$-th sub-channel. As a result, $\bm{s}_b(t)$ is the transmitted baseband waveform in the digital domain. Each UAV $k$ then receives a wideband signal from multiple BSs in a multi-path propagation environment, which can be expressed as follows:

$$\bm{r}_k(t)=\sum_{b=1}^{B}\bm{g}_{k,b}(t)*\bm{s}_b(t)+\bm{\eta}_k(t),\quad \forall k\in\mathcal{K}, \tag{6}$$

where $\bm{g}_{k,b}(t)$ represents the multi-path channel between BS $b$ and UAV $k$, and $\bm{\eta}_k(t)$ denotes the noise signal observed at UAV $k$. Therefore, the signal-to-noise ratio observed at UAV $k$ can be written as follows:

$$\text{SNR}_k(t)=\frac{\big\|\sum_{b=1}^{B}\bm{g}_{k,b}(t)*\bm{s}_b(t)\big\|^2}{\sigma_k^2(t)},\quad \forall k\in\mathcal{K}, \tag{7}$$

where $\sigma_k^2(t)$ represents the noise variance observed at UAV $k$ at time $t$. We use $P_k(t)$ to denote the total power received at UAV $k$ at time $t$, which is directly proportional to the signal generated as defined in Eq. (5). We will use $P_k(t)$ in the proportional weight scaling for FL training.
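The signal model in Eqs. (5)-(7) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the operator $*$ in Eq. (6) is read as linear convolution, the noise is drawn as circularly symmetric complex Gaussian, and Eq. (7) is evaluated as signal energy over noise variance (dividing by the number of samples would give a per-sample power ratio instead). All function names and the toy waveforms are hypothetical.

```python
import random

def convolve(g, s):
    """Full linear convolution of two complex sequences."""
    out = [0j] * (len(g) + len(s) - 1)
    for i, gi in enumerate(g):
        for j, sj in enumerate(s):
            out[i + j] += gi * sj
    return out

def bs_signal(occupancy, waveforms):
    """Eq. (5): superposition of the waveforms on occupied sub-channels of one BS."""
    n = len(waveforms[0])
    return [sum(I * v[i] for I, v in zip(occupancy, waveforms)) for i in range(n)]

def received_signal(channels, bs_signals, noise_std, rng):
    """Eq. (6): sum over BSs of (channel taps convolved with signal) plus noise.
    Returns the noise-free part and the noisy received signal."""
    parts = [convolve(g, s) for g, s in zip(channels, bs_signals)]
    n = max(len(p) for p in parts)
    clean = [sum(p[i] for p in parts if i < len(p)) for i in range(n)]
    sigma = noise_std / 2 ** 0.5  # per-dimension std; total variance is noise_std**2
    noisy = [x + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for x in clean]
    return clean, noisy

def snr(clean, noise_var):
    """Eq. (7), read here as signal energy divided by noise variance."""
    return sum(abs(x) ** 2 for x in clean) / noise_var

# Toy example: one BS, two sub-channels (only the first occupied), single-tap channel.
rng = random.Random(0)
s = bs_signal([1, 0], [[1, 1], [1j, 1j]])
clean, noisy = received_signal([[1]], [s], noise_std=1.0, rng=rng)
ratio = snr(clean, noise_var=0.5)
```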

The DNN models are trained to predict spectrum holes from raw I/Q samples. It has been shown that the characteristics of a wireless signal can be captured by observing only a portion of the signal waveform [9, 15]. Hence, from the received baseband signal $\bm{r}_k(t)$, we capture $J$ I/Q samples and store them locally. The samples of the baseband waveform collected at UAV $k$ are represented as $\bm{R}_k(t)$, given as follows:

$$\bm{R}_k(t)=\tilde{\bm{R}}_k(t)+\tilde{\bm{\eta}}_k(t),\quad \forall k\in\mathcal{K}, \tag{8}$$

where $\tilde{\bm{R}}_k(t)$ represents the $J$ I/Q samples from the first term in Eq. (6), and $\tilde{\bm{\eta}}_k(t)$ represents $J$ complex Gaussian noise samples.

In addition to the I/Q samples, we also need to store the true labels for channel occupancy at each UAV $k$ at time $t$. The channel occupancy vector $\bm{h}_k(t)$ is an $M$-dimensional vector whose $m$-th entry indicates whether sub-channel $m$ is occupied or free at time $t$. It is computed as follows:

$$h_{k,m}(t)=\begin{cases}1, & \sum_{b=1}^{B} I_{b,m}(t)\geq 1;\\ 0, & \text{otherwise.}\end{cases} \tag{9}$$

Note that $\bm{h}_k(t)$ observed at time $t$ is the true label corresponding to the wideband received signal $\bm{r}_k(t)$. The channel occupancy remains unchanged for the stored $J$ I/Q samples $\bm{R}_k(t)$. We store $(\bm{R}_k(t), \bm{h}_k(t))$ as an input-output pair that will be used for the training of the FL model. For simplicity of notation, we represent the input-output pair as $(\bm{R}_k, \bm{h}_k)$. Note that for each $M$-dimensional channel occupancy vector $\bm{h}_k$, the input-output pair is treated as one data sample, and the total I/Q dataset collected at UAV $k$ is denoted as follows:

$$\bm{D}_k=\Big\{(\bm{R}_k^{1},\bm{h}_k^{1}),(\bm{R}_k^{2},\bm{h}_k^{2}),\dots,(\bm{R}_k^{|\bm{D}_k|},\bm{h}_k^{|\bm{D}_k|})\Big\}, \tag{10}$$

where $|\bm{D}_k|$ represents the total number of samples at UAV $k$. These local datasets are used in FL-based training for spectrum hole detection.
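The labeling rule in Eq. (9) and the construction of one data sample can be sketched as follows. This is a minimal illustration; the function names and the list-based data layout are assumptions made here, not part of the original dataset pipeline.

```python
def occupancy_labels(I):
    """Eq. (9): h[m] = 1 if at least one BS occupies sub-channel m, else 0.
    I[b][m] is the occupancy indicator of BS b on sub-channel m."""
    M = len(I[0])
    return [1 if sum(row[m] for row in I) >= 1 else 0 for m in range(M)]

def make_sample(iq_samples, I, J):
    """Pair the first J I/Q samples with the M-dimensional label vector,
    giving one (R, h) entry of the local dataset D_k in Eq. (10)."""
    return (list(iq_samples[:J]), occupancy_labels(I))

# Toy example: B = 2 BSs, M = 4 sub-channels.
I = [[1, 0, 0, 1],
     [0, 0, 1, 1]]
labels = occupancy_labels(I)  # sub-channel 1 is the only spectrum hole
```

Since the indicators $I_{b,m}(t)$ do not depend on $k$, this rule yields the same label vector for every UAV observing the same set of BSs at time $t$; only the I/Q samples differ across UAVs.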

In the FL setting, each UAV $k$ trains a local wideband spectrum sensing model whose parameters are denoted by $\bm{\omega}_k$. Hence, the primary objective of the local model is to find a mathematical function $f(\bm{\omega}_k, \bm{R}_k)$ that maps the input I/Q samples $\bm{R}_k$ to $\bm{h}_k$, i.e.,

$$f(\bm{\omega}_k,\bm{R}_k):\bm{R}_k\rightarrow\bm{h}_k. \tag{11}$$

To this end, using the raw I/Q samples $\bm{R}_k$, each UAV $k$ trains a local model that detects vacant sub-channels by minimizing the local loss function $L_k(\bm{\omega})$, which measures the error between the true labels $\bm{h}_k$ and the predicted labels $\widehat{\bm{h}}_k$, as defined below:

$$L_k(\bm{\omega})\triangleq\frac{1}{|\bm{D}_k|}\sum_{i=1}^{|\bm{D}_k|} l\big(f(\bm{\omega}_k,\bm{R}_k^{i});\bm{h}_k^{i}\big), \tag{12}$$

where $l(\cdot)$ is the loss function for computing the prediction loss in the supervised machine learning setting. Furthermore, $f(\cdot)$ represents the predicted label for the sample $(\bm{R}_k^{i}, \bm{h}_k^{i})$, and $\bm{\omega}_k$ represents the local model parameters during training.
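A minimal Python sketch of the local loss in Eq. (12) is given below. The per-sample loss $l(\cdot)$ is not specified at this point in the text, so the sketch assumes binary cross-entropy averaged over the $M$ sub-channels, a common choice for multi-label classification with sigmoid outputs. The names `bce` and `local_loss` and the stand-in model are hypothetical.

```python
import math

def bce(pred, label, eps=1e-12):
    """Binary cross-entropy averaged over the M sub-channels
    (an assumed choice for the per-sample loss l)."""
    total = 0.0
    for p, h in zip(pred, label):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= h * math.log(p) + (1 - h) * math.log(1.0 - p)
    return total / len(label)

def local_loss(model, dataset, loss=bce):
    """Eq. (12): average per-sample loss over the local dataset D_k.
    `model` maps an I/Q input R to M occupancy probabilities."""
    return sum(loss(model(R), h) for R, h in dataset) / len(dataset)

# Toy example: a stand-in model that always predicts probability 0.5.
uninformed = lambda R: [0.5, 0.5]
dataset = [([0j], [1, 0]), ([0j], [0, 1])]
value = local_loss(uninformed, dataset)  # equals log(2) for this model
```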

For each input sequence $\bm{R}_k^{i}$, we intend to obtain an $M$-dimensional binary vector $\widehat{\bm{h}}_k^{i}$ that represents the predicted spectrum holes. This is an instance of a multi-label classification problem, for which we employ a DNN. The DNN architecture considered is shown in Fig. 4. The model accepts raw I/Q samples as input, which are then processed by two 1D convolutional (Conv1D) layers followed by a 1D maximum pooling (MaxPool1D) layer. This layer pattern is repeated twice, and it is followed by one dense layer and a sigmoid layer at the end.

Figure 4: Multi-label classification using DNN.

IV-B Channel-Aware FL Aggregation Method

Given the system model, we now introduce a framework for wideband spectrum sensing where multiple UAVs collaboratively participate in FL. In such a distributed learning environment, we aim to learn a global statistical model at the central server. Given that each UAV $k$ trains a local model to identify the spectrum holes by minimizing the local loss function $L_k(\bm{\omega})$, in the context of FL, we would like to minimize the aggregated global loss function $L(\bm{\omega})$, as follows:

$$\min_{\bm{\omega}}\Bigg\{L(\bm{\omega})\triangleq\sum_{k=1}^{K}\frac{|\bm{D}_k|}{D}\,L_k(\bm{\omega})\Bigg\}, \tag{13}$$

where $D=\sum_{k=1}^{K}|\bm{D}_k|$ is the total number of data samples across the UAVs.

To minimize the global loss function in Eq. (13), the authors in [38] proposed FedAvg, an iterative aggregation algorithm in which the global model aggregates the local model gradients and redistributes the global model weights to the local models. When the datasets of all UAVs are of equal size, FedAvg assigns an equal scaling factor of $\frac{1}{K}$ to all local gradients. However, in our considered multi-cell environment, the signal received at different UAV locations experiences different channel conditions, and the received signal power varies significantly across locations. Hence, assigning equal scaling weights to the local model gradients degrades the performance metrics at UAV locations with strong signal. To compensate for this effect and improve performance at locations that receive better signal power, we propose a proportional weight scaling aggregation method for FL (pwFedAvg) that assigns smaller weights to UAVs with lower received signal power (i.e., poor channel conditions) and larger weights to UAVs with higher received signal power.

Algorithm 1 Channel-Aware FL-Based Training
1:Initialize the global model parameters $\bm{\omega}$ and local model parameters $\bm{\omega}_k$, $\forall k\in\mathcal{K}$; $T$: communication rounds.
2:for $t$ in $T$ do
3:     for UAV $k$ in $\mathcal{K}$ do
4:         Choose a batch of I/Q samples $\bm{\xi}_k^{t}\subseteq\bm{D}_k$.
5:         Train the local model for $E$ epochs.
6:     end for
7:     Send the gradients $\nabla L_k(\bm{\omega}_k^{t};\bm{\xi}_k^{t})$ to the central server.
8:     Aggregate local gradients at the server per Eq. (14).
9:     Update the global model at the server per Eq. (15).
10:     Update local models using the global model, i.e., $\bm{\omega}_k^{t+1}=\bm{\omega}^{t+1}$.
11:end for

In particular, using pwFedAvg, the central server aggregates the local gradients by assigning each of them a weight determined by the received signal power at the corresponding UAV, as follows:

$$\nabla L(\bm{\omega}^{t})=\sum_{k=1}^{K}\frac{\alpha_k^{t}}{\alpha^{t}}\,\nabla L_k(\bm{\omega}_k^{t};\bm{\xi}_k^{t}), \tag{14}$$

where $\alpha_k^{t}=\sqrt{\bar{P}_k^{t}}$ and $\alpha^{t}=\sum_{k=1}^{K}\sqrt{\bar{P}_k^{t}}$. Here, $\bar{P}_k^{t}$ represents the average received signal power at UAV $k$ for the batch of samples $\bm{\xi}_k^{t}$. Note that during the FL training process at time $t$, $\bm{\omega}_k^{t}$ and $\bm{\omega}^{t}$ denote the local and global model weights, respectively. Upon computing the global model gradient based on Eq. (14), the global model weights are updated as follows:

$$\bm{\omega}^{t+1}=\bm{\omega}^{t}-\gamma^{t}\,\nabla L(\bm{\omega}^{t}), \tag{15}$$

where $\gamma^{t}$ is the learning rate of the global model. The updated global model weights are sent to the clients to update their local model weights. The overall process of FL-based model training using the pwFedAvg approach is outlined in Algorithm 1.
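The server-side steps in Eqs. (14) and (15) can be sketched in Python as follows. This is a minimal sketch, assuming that gradients and weights are flat lists of floats and that the average received powers are given in linear scale (e.g., mW rather than dBm) so that the square root is well defined. The function names are hypothetical.

```python
import math

def pw_weights(avg_powers):
    """Normalized weights alpha_k / alpha with alpha_k = sqrt(average power),
    as defined after Eq. (14). Powers are assumed to be in linear scale."""
    alphas = [math.sqrt(p) for p in avg_powers]
    total = sum(alphas)
    return [a / total for a in alphas]

def aggregate(gradients, weights):
    """Eq. (14): weighted sum of the local gradients."""
    dim = len(gradients[0])
    return [sum(w * g[i] for w, g in zip(weights, gradients)) for i in range(dim)]

def global_step(omega, gradients, avg_powers, lr):
    """One server update: aggregate per Eq. (14), then apply Eq. (15)."""
    grad = aggregate(gradients, pw_weights(avg_powers))
    return [w - lr * g for w, g in zip(omega, grad)]

# Toy example: K = 3 UAVs, 2 model parameters.
powers = [1.0, 4.0, 4.0]                       # weights become 0.2, 0.4, 0.4
grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
omega_next = global_step([1.0, 1.0], grads, powers, lr=0.5)
```

When all UAVs report the same average power, `pw_weights` returns $1/K$ for every UAV, so the update reduces to the equal-weight aggregation described for FedAvg with equal dataset sizes.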

IV-C Convergence Analysis

In this section, we present the convergence analysis for the pwFedAvg algorithm. To this end, we first introduce the following assumptions.

Assumption 1: The loss function $L_k(\cdot)$ is $L$-smooth, i.e., for all $\bm{u}$ and $\bm{v}$,

$$L_k(\bm{u})-L_k(\bm{v})\leq(\bm{u}-\bm{v})^{T}\nabla L_k(\bm{v})+\frac{L}{2}\|\bm{u}-\bm{v}\|_2^2. \tag{16}$$

Assumption 2: For each $k$, $L_k(\cdot)$ is $\beta$-strongly convex, i.e., for all $\bm{u}$ and $\bm{v}$,

$$L_k(\bm{u})-L_k(\bm{v})\geq(\bm{u}-\bm{v})^{T}\nabla L_k(\bm{v})+\frac{\beta}{2}\|\bm{u}-\bm{v}\|_2^2. \tag{17}$$

Assumption 3: Let $\bm{\xi}_k$ be the data samples chosen from $\bm{D}_k$. The variance of the stochastic gradients for each UAV $k$ is bounded, i.e.,

$$\mathbb{E}\big[\|\nabla L_k(\bm{\omega}_k;\bm{\xi}_k)-\nabla L_k(\bm{\omega}_k)\|_2^2\big]\leq\rho_k^2,\quad \forall k\in\mathcal{K}. \tag{18}$$

Next, similar to [39, 40], we define two virtual sequences to denote the aggregated full gradient and stochastic gradient respectively, as follows:

$$\bar{\bm{a}}^{t}=\sum_{k=1}^{K}\frac{\alpha_k^{t}}{\alpha^{t}}\,\nabla L_k(\bm{\omega}_k^{t});\qquad \bm{a}^{t}=\sum_{k=1}^{K}\frac{\alpha_k^{t}}{\alpha^{t}}\,\nabla L_k(\bm{\omega}_k^{t};\bm{\xi}_k^{t}). \tag{19}$$

We also assume that $\mathbb{E}[\bm{a}^{t}]=\bar{\bm{a}}^{t}$. Given these assumptions, we have the following lemmas.

Lemma 1: Let $\bm{\omega}^{*}=[\omega_1^{*},\omega_2^{*},\dots,\omega_d^{*}]$ be the weights of the optimal global model, and let $\bm{\omega}_k^{*}=[\omega_{k,1}^{*},\omega_{k,2}^{*},\dots,\omega_{k,d}^{*}]$ be the weights of the optimal local model of UAV $k$. Here, $d$ represents the dimension of the model weights. Then, for each UAV $k$, the gap between the optimal global and local models is upper bounded as

$$L_k(\bm{\omega}^{*})-L_k(\bm{\omega}_k^{*})\leq\tau, \tag{20}$$

where $\tau=\max\limits_{k}\Big\{\frac{Ld}{2}\big(\max\limits_{i}\{|\omega_i^{*}-\omega_{k,i}^{*}|\}\big)^2\Big\}$.
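One way to see why a bound of the form in Eq. (20) holds is sketched below. The sketch assumes, in addition to Assumption 1, that $\bm{\omega}_k^{*}$ is an unconstrained minimizer of $L_k$, so that $\nabla L_k(\bm{\omega}_k^{*})=\bm{0}$. This extra assumption and the steps themselves are an illustrative reading, not a proof quoted from the source.

```latex
\begin{aligned}
L_k(\bm{\omega}^{*}) - L_k(\bm{\omega}_k^{*})
&\le (\bm{\omega}^{*}-\bm{\omega}_k^{*})^{T}\nabla L_k(\bm{\omega}_k^{*})
   + \frac{L}{2}\,\|\bm{\omega}^{*}-\bm{\omega}_k^{*}\|_2^2
   && \text{(Eq. (16) with } \bm{u}=\bm{\omega}^{*},\ \bm{v}=\bm{\omega}_k^{*}\text{)}\\
&= \frac{L}{2}\sum_{i=1}^{d}\big(\omega_i^{*}-\omega_{k,i}^{*}\big)^2
   && (\nabla L_k(\bm{\omega}_k^{*})=\bm{0})\\
&\le \frac{Ld}{2}\Big(\max_{i}\,|\omega_i^{*}-\omega_{k,i}^{*}|\Big)^2
 \;\le\; \tau .
\end{aligned}
```

The last step bounds each of the $d$ squared coordinate gaps by the largest one and then takes the maximum over all UAVs, which matches the definition of $\tau$.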

Lemma 2: The deviation of the aggregated stochastic gradient from the aggregated full gradient is upper bounded as follows:

$$\mathbb{E}\big(\|\bm{a}^{t}-\bar{\bm{a}}^{t}\|_2^2\big)\leq\sum_{k=1}^{K}\bigg(\frac{\alpha_k^{t}}{\alpha^{t}}\bigg)^2\rho_k^2. \tag{21}$$
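A short sketch of how a bound of the form in Eq. (21) can be obtained is given below. It assumes that the stochastic gradients are unbiased for each UAV and independent across UAVs given the current iterates; these conditions are stated here as assumptions for the illustration and may differ from those used in the original proof. The shorthand $\bm{e}_k^{t}$ is introduced only for this sketch.

```latex
\begin{aligned}
\bm{e}_k^{t} &\triangleq \nabla L_k(\bm{\omega}_k^{t};\bm{\xi}_k^{t}) - \nabla L_k(\bm{\omega}_k^{t}),
\qquad \mathbb{E}[\bm{e}_k^{t}] = \bm{0},\\[4pt]
\mathbb{E}\big(\|\bm{a}^{t}-\bar{\bm{a}}^{t}\|_2^2\big)
&= \mathbb{E}\Big\|\sum_{k=1}^{K}\frac{\alpha_k^{t}}{\alpha^{t}}\,\bm{e}_k^{t}\Big\|_2^2\\
&= \sum_{k=1}^{K}\Big(\frac{\alpha_k^{t}}{\alpha^{t}}\Big)^2\mathbb{E}\|\bm{e}_k^{t}\|_2^2
 + \sum_{j\neq k}\frac{\alpha_j^{t}\alpha_k^{t}}{(\alpha^{t})^2}\,
   \mathbb{E}\big[(\bm{e}_j^{t})^{T}\bm{e}_k^{t}\big]\\
&= \sum_{k=1}^{K}\Big(\frac{\alpha_k^{t}}{\alpha^{t}}\Big)^2\mathbb{E}\|\bm{e}_k^{t}\|_2^2
 \;\le\; \sum_{k=1}^{K}\Big(\frac{\alpha_k^{t}}{\alpha^{t}}\Big)^2\rho_k^2 .
\end{aligned}
```

The cross terms vanish under the independence and zero-mean assumptions, and the final inequality applies Eq. (18) to each UAV.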

Lemma 3: Let the constants $\kappa$ and $\gamma^{t}$ satisfy $\frac{1}{\kappa}\leq\gamma^{t}$. Then, we can show that:

$$\mathbb{E}\big[\|\bm{\omega}^{t+1}-\bm{\omega}^{*}\|_{2}^{2}\big]\leq(1-\beta\gamma^{t})\|\bm{\omega}^{t}-\bm{\omega}^{*}\|_{2}^{2}+(\gamma^{t})^{2}G^{t}, \tag{22}$$

where $G^{t}=2\kappa\tau+\sum_{k=1}^{K}\big(\frac{\alpha_{k}^{t}}{\alpha^{t}}\big)^{2}\big[\rho_{k}^{2}-2L\tau\big]$.

Theorem 1: Given that $\kappa\leq\gamma^{t}=\frac{1}{\beta t+L}$, the optimality gap for the proposed pwFedAvg satisfies the following:

$$\mathbb{E}[L(\bm{\omega})^{T}]-L^{*}\leq\frac{L}{\beta T+2L}\bigg[\frac{2G}{\beta}+L\|\bm{\omega}^{0}-\bm{\omega}^{*}\|_{2}^{2}\bigg], \tag{23}$$

where $G=\max_{t}\{G^{t}\}$ and $G^{t}$ is as defined in Lemma 3. Therefore, the proposed method converges at a rate of $\mathcal{O}(\frac{1}{T})$. All of the proofs are presented in Section IX.
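To make the stated rate explicit, the following short derivation is a sketch under two assumptions that are not spelled out above: $\beta>0$, and the bracketed term in Eq. (23) is non-negative and independent of $T$. Since $\beta T+2L\geq\beta T$ whenever $L\geq 0$, the right-hand side of Eq. (23) satisfies

$$\frac{L}{\beta T+2L}\bigg[\frac{2G}{\beta}+L\|\bm{\omega}^{0}-\bm{\omega}^{*}\|_{2}^{2}\bigg]\leq\frac{1}{T}\cdot\frac{L}{\beta}\bigg[\frac{2G}{\beta}+L\|\bm{\omega}^{0}-\bm{\omega}^{*}\|_{2}^{2}\bigg]=\frac{C}{T},$$

where $C$ (a symbol introduced here only for illustration) collects the constants that do not depend on $T$. Under these assumptions, the optimality gap decays at least as fast as $1/T$.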

V Dynamic Spectrum Scheduling Using RL

Once the DNN models are trained using our proposed pwFedAvg, they output their spectrum hole predictions, which are then fused at the fusion module, as described in Section III. The identified spectrum holes will be allocated to requesting UAVs. The integrated system model of collaborative spectrum sensing and scheduling is shown in Fig. 5, with the overall algorithm described in Algorithm 2.

Figure 5: Joint spectrum inference and spectrum scheduling.

For spectrum scheduling, we note that the optimization problem in Eq. (4) is a fractional integer programming problem, which is NP-hard in general. If we consider maximizing the numerator alone, which is the total utility $U(t)$ of the UAVs over all sub-channels, the problem becomes an integer programming problem. In this case, the utility depends on the spectrum usage pattern of the PUs, which is captured by $c_{k,m}(t)$, as well as on the channel conditions between the BSs and UAVs, which determine the amounts of transmitted data $U_{k,m}(t)$. To tackle this utility optimization problem, we model the channel occupancy $\bar{z}_{m}(t)$ as a Markov process, enabling us to use an MDP formulation to solve this problem [41] and develop a dynamic spectrum scheduling scheme for the SUs.

As we assume that there exist $M$ sub-channels in the system, each sub-channel can be modeled as an independent two-state Markov chain. The transition probability function $\bm{P}$ can then be viewed as a set of transition probability matrices $\{\bm{P}_{m}\}$, one for each sub-channel, that captures the randomness in the assumed multi-user multi-channel environment. Therefore, we formulate the maximization of the total utility of the SUs as a traditional MDP governed by the tuple $(\mathcal{S}, \mathcal{A}, \{\bm{P}_{m}\}, U, \gamma)$, consisting of the set of states $\mathcal{S}$, the set of actions $\mathcal{A}$, a transition probability function $\{\bm{P}_{m}\}$, a reward function $U$, and a discount factor $\gamma$. To solve an MDP using RL, an agent learns to make decisions in an uncertain environment by maximizing a cumulative reward over a sequence of actions. Specifically, the agent interacts with an environment by taking actions that transition the system from one state to another, and the agent receives a reward that is commensurate with the merit of the action. The discount factor determines the relative importance of immediate and future rewards.

Algorithm 2 Collaborative Spectrum Sensing and Scheduling
Phase 1 – Spectrum Sensing and Broadcasting
1: for each UAV in $\mathcal{K}$ do
2:     Capture I/Q samples from the over-the-air signal.
3:     Feed the I/Q samples to the pre-trained ML model, which predicts the spectrum holes $\bm{\widehat{h}}$.
4:     Broadcast the individual spectrum hole observations $\bm{\widehat{h}}(t)\in\{0,1\}^{1\times M}$ to the central server.
5: end for
Phase 2 – Spectrum Fusion and Scheduling
6: Apply the fusion rule in Eq. (1) to predict the spectrum holes $\bm{z}(t)$.
7: Allocate a single spectrum hole to each requesting UAV using the pre-trained RL algorithm, $y_{k,m}(t)$, such that the constraints in Eq. (4) are satisfied.
8: Schedule the UAVs to transmit on the sub-channels allocated in the previous time slot.
9: Given the spectrum allocation $y_{k,m}(t)$ and the spectrum access collision indicator $c_{k,m}(t)$, compute the total utility $U(t)$ using Eq. (4).

DDQN-Based Spectrum Allocation. One of the most popular RL methods is Q-learning [41]. Classical Q-learning is table-based, i.e., the values of the Q-function are stored in a table of size $|\mathcal{S}|\times|\mathcal{A}|$. However, when the state and action spaces are large, tabular Q-learning becomes prohibitively complex. For example, with $M=16$ sub-channels, the Q-table will be of size $65{,}537\times 17$. To address the complexity issue, we adapt the deep Q-learning approach in [42], called DDQN, to approximate the Q-function by a neural network $Q_{\bm{\theta}}$ and train its weights $\bm{\theta}$ using experience replay. As the name suggests, DDQN uses two networks: $Q_{\bm{\theta}}$ is called the primary network and $Q^{\prime}_{\bm{\theta}}$ is called the target network, whose weights are updated periodically. In the original DDQN, the weights of the target network are copied directly from the primary network every few episodes. In DDQN-soft, the target network is updated using Polyak averaging to smoothly update the weights (“soft update”) [42].
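The difference between the two target-network update rules can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper: the weights are plain lists of floats instead of neural network parameters, and the function names and the mixing coefficient `polyak_coef` are hypothetical.

```python
def hard_update(primary_weights, target_weights):
    """Original DDQN style: overwrite the target weights with a copy of the
    primary weights (done only every few episodes)."""
    return list(primary_weights)


def soft_update(primary_weights, target_weights, polyak_coef=0.01):
    """Polyak averaging: move each target weight a small step toward the
    corresponding primary weight (done at every update)."""
    return [polyak_coef * p + (1.0 - polyak_coef) * t
            for p, t in zip(primary_weights, target_weights)]


primary = [1.0, -2.0, 0.5]   # hypothetical primary-network weights
target = [0.0, 0.0, 0.0]     # hypothetical target-network weights

# Three soft updates with a deliberately large coefficient for readability.
for _ in range(3):
    target = soft_update(primary, target, polyak_coef=0.5)

copied = hard_update(primary, target)
```

With a small coefficient the target network changes slowly, which is the smoothing effect that the soft update is meant to provide.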

The input to the DDQN agent is a state $\mathbf{s}$ of size $1\times M$. The output of the network is a vector of size $1\times(M+1)$ that contains the values of the Q-function with respect to state $\mathbf{s}$ and each of the $M+1$ actions. In all hidden layers, we use the rectified linear unit (ReLU) as the activation function. Given the neural network’s input-output dimensions, the overall DDQN architecture and its interaction with the environment are shown in Fig. 6. As shown, the major components are the primary network, the target network, experience replay, and the interaction with the environment to select an action.

To train the DDQN agent, experiences are initially stored in the memory using an $\epsilon$-greedy policy; that is, for a state $s_{t}$, an action $a_{t}$ is taken randomly with probability $\epsilon_{t}$, or taken greedily with probability $1-\epsilon_{t}$ based on the current DDQN network. Then, once there are sufficient samples in the memory, a mini-batch of $\bm{X}$ experiences $\{(\mathbf{s}_{i},\mathbf{a}_{i},r_{i},\mathbf{s}_{i}^{\prime})\}_{i}\in\bm{X}_{t}$ is randomly sampled from the memory at every time step $t$ to train the neural networks. Here, $\bm{X}_{t}$ is the set of experiences currently available in the memory. Based on the selected mini-batch, we update the weights $\bm{\theta}$ of the primary network $Q_{\bm{\theta}}$ to minimize the loss function $L_{t}(\bm{\theta})$. Fig. 6 captures the overall DDQN architecture and the interaction of the agent with the environment [41, 42].
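The training ingredients described above (action selection, replay memory, and mini-batch sampling) can be sketched as follows. This is an illustrative sketch under stated assumptions: the Q-values are hard-coded lists instead of network outputs, the buffer capacity and batch size are arbitrary, and `double_dqn_target` uses the standard double-DQN target (primary network selects the next action, target network evaluates it), which is assumed here because the exact loss $L_{t}(\bm{\theta})$ is not given in this section.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def double_dqn_target(reward, next_q_primary, next_q_target, discount):
    """Assumed standard double-DQN target: the primary network chooses the
    next action and the target network supplies its value."""
    best_action = max(range(len(next_q_primary)),
                      key=lambda a: next_q_primary[a])
    return reward + discount * next_q_target[best_action]


rng = random.Random(0)
memory = deque(maxlen=1000)  # experience replay memory (capacity is arbitrary)

# Store a few hypothetical experiences (state, action, reward, next_state)
# for M = 4 sub-channels, i.e. M + 1 = 5 possible actions.
for step in range(10):
    state = (step % 2, 1, 0, 0)
    action = epsilon_greedy([0.1, 0.9, 0.3, 0.0, 0.2], epsilon=0.1, rng=rng)
    memory.append((state, action, 1.0, state))

# Once enough samples exist, draw a random mini-batch for one training step.
batch = rng.sample(list(memory), 4)
greedy_action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0, rng=rng)
target_value = double_dqn_target(1.0, [0.2, 0.8], [0.5, 0.3], discount=0.9)
```

In a full agent, the targets computed for each experience in `batch` would be compared with the primary network's predictions to form the loss.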

Figure 6: DDQN for spectrum allocation.

VI I/Q Dataset Generation

Utilizing data-driven machine learning techniques for wideband spectrum sensing requires substantial amounts of spectrum data. While obtaining raw I/Q signals over the air using physical hardware is the ideal scenario, the complexity of coordinating multiple UAVs in a specific environment for collaborative sensing makes this difficult to achieve. Therefore, we resort to MATLAB’s LTE Toolbox to create the I/Q samples and employ ray-tracing methods to emulate the channel, generating synthetic datasets that closely mimic the data collection process through experimentation. The entire process of generating synthetic datasets is outlined below.

Dataset Generation Methodology. As shown in Fig. 7, we assume a multi-cell environment consisting of three neighboring cells with base-stations at the center of the cells. Without loss of generality, we simulate one specific LTE band in the Kansas City area and obtain the locations of the base-stations from CellMapper [43], an open crowdsourced cellular tower and coverage mapping service. Furthermore, we assume there are three UAVs in the network operating at an altitude of 90 meters. In this scenario, the base-stations act as the transmitter sites and the UAV locations as the receiver sites that collect the I/Q samples for wideband spectrum sensing.

Figure 7: Ray-tracing simulation setup used for dataset generation. The plot illustrates the received signal paths at UAV location 1 from all three base-stations.

Another important aspect of any wireless network is wireless channel modeling. We use ray-tracing methods to model the channel between the BS and the UAV, and we enable both reflection and diffraction in the ray tracer to simulate a near-real-world environment. This is in contrast to statistical channel models, which describe line-of-sight (LoS) and non-LoS channel conditions probabilistically. Since we use ray-tracing, we have the flexibility to incorporate different aspects of the environment, such as buildings, vegetation, and the permittivity and permeability of materials, which further enhances the channel model.

To mimic a real-world scenario in the ray-tracing experiments, we use OpenStreetMap, which is a free and open geographical database [44]. The evaluation area is a 3 km × 3 km region with buildings and vegetation. We utilize MATLAB’s ray tracer to emulate the wireless channel between the considered UAV and base-station locations. The ray-tracing simulation setup is outlined in Table I.

| Parameter | Description |
| --- | --- |
| Location | Kansas City |
| Area | 3 km × 3 km |
| Frequency | 1980 MHz |
| Number of base-stations | 3 |
| Number of UAVs | 3 |
| UAV altitude | 90 m |
| Max. number of reflections | 5 |
| Max. number of diffractions | 2 |

TABLE I: Ray-tracing simulation setup.

It is worth highlighting that this setup can be readily adapted to accommodate varying numbers of LTE cells and UAVs, as long as the 3D environment can be obtained and loaded into MATLAB. In our considered scenario, we assume the UAVs are stationary, hovering at fixed positions. However, the simulation can be extended to incorporate UAV flight trajectories by running additional ray-tracing experiments for each way-point in the UAV trajectory.

With the channel emulated by MATLAB’s ray tracer, we next utilize MATLAB’s LTE Toolbox to generate the LTE waveform from which the I/Q samples are extracted. For generating the LTE waveform, we assume that the entire cell bandwidth of 10 MHz (50 resource blocks) is split into 16 orthogonal sub-channels, each of size 3 resource blocks. Typically, a base-station has the flexibility to assign either a single sub-channel or multiple sub-channels to a PU for transmitting user-specific data on the downlink shared channel. Additionally, various multiple access techniques can be employed to transmit data to different PUs in different time slots. However, during our dataset generation process, we do not consider the primary user locations or how the base-station allocates user-specific data to different PUs. At any given point in time, we take a snapshot of the entire cell bandwidth and identify which spectrum bands are occupied. Furthermore, when creating the downlink waveform, we omit the generation of UE-specific reference signals to avoid mixing user-specific data with broadcast channels. Instead, we identify the appropriate indices and embed the LTE data samples into the downlink shared channel to generate the LTE waveform.

Figure 8: M independent Binary Markov chains. In our dataset generation, we set M = 16.

Modelling the channel occupancy. In our assumed scenario, each cell bandwidth is divided into 16 sub-channels, where a binary flag of 1 indicates that the sub-channel is allocated and 0 indicates that it is not. Hence, each 16-bit binary combination serves as a distinct true label for the channel occupancy. As a result, the base-station can generate $2^{16}$ unique labels, spanning from no sub-channel allocation to a fully busy cell site. For instance, in Fig. 9 we show the spectrogram of one channel realization, where 6 of the 16 sub-channels are occupied. Furthermore, we model the temporal dynamics of each sub-channel using a binary Markov chain, as shown in Fig. 8. Thus, the channel occupancy for each sub-channel $m$ evolves according to a transition probability matrix $\bm{P}_{m}$. In this paper, we consider different transition probabilities for each sub-channel. The overall transition probabilities across the $M$ sub-channels are denoted as follows:

$$\bm{P}=\bigg\{\begin{bmatrix}p_{00}^{1}&p_{01}^{1}\\ p_{10}^{1}&p_{11}^{1}\end{bmatrix},\cdots,\begin{bmatrix}p_{00}^{M}&p_{01}^{M}\\ p_{10}^{M}&p_{11}^{M}\end{bmatrix}\bigg\}. \tag{24}$$
Figure 9: Spectrogram for a transmission on 6 sub-channels out of 16 sub-channels.
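The occupancy model in Eq. (24) can be illustrated with a short simulation of $M$ independent two-state Markov chains. This is a sketch only: the transition probabilities below are hypothetical placeholders (the values used in the paper are not listed in this section), and the function and variable names are introduced here for illustration. In the notation of Eq. (24), `p01[m]` plays the role of $p_{01}^{m}$ and `p10[m]` of $p_{10}^{m}$, with $p_{00}^{m}=1-p_{01}^{m}$ and $p_{11}^{m}=1-p_{10}^{m}$.

```python
import random


def simulate_occupancy(p01, p10, num_steps, rng, initial=None):
    """Simulate M independent two-state Markov chains.

    p01[m] is the probability that sub-channel m goes from vacant (0) to
    occupied (1); p10[m] is the probability of going from occupied to vacant.
    Returns a list of M-bit occupancy labels, one per time step (including
    the initial state).
    """
    num_channels = len(p01)
    state = list(initial) if initial is not None else [0] * num_channels
    history = [tuple(state)]
    for _ in range(num_steps):
        next_state = []
        for m in range(num_channels):
            u = rng.random()
            if state[m] == 0:
                next_state.append(1 if u < p01[m] else 0)
            else:
                next_state.append(0 if u < p10[m] else 1)
        state = next_state
        history.append(tuple(state))
    return history


M = 16
rng = random.Random(0)
# Hypothetical transition probabilities, different for each sub-channel.
p01 = [0.1 + 0.02 * m for m in range(M)]
p10 = [0.3 + 0.02 * m for m in range(M)]
labels = simulate_occupancy(p01, p10, num_steps=5, rng=rng)
```

Each tuple in `labels` is one 16-bit occupancy label of the kind used as ground truth for the multi-label classifier.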

Further, we assume that all SUs are capable of receiving the waveform from all the base-stations, with each channel modeled by ray-tracing. In addition to the reflected paths received from the base-station of the cell in which the UAV is located, each UAV also receives the waveform from the neighboring base-stations, as shown in Fig. 7. The received signal $\bm{r}_{k}(t)$ at each UAV $k$ can be written as a superposition of the wideband signals received from all base-stations, as shown in Eq. (6). Furthermore, we vary the noise variance $\sigma_{k}^{2}(t)$ at UAV $k$ such that the effective SNR varies from $-10$ dB to $20$ dB in steps of $10$ dB. For instance, in Fig. 10 we show the power spectrum of the received signal at UAV location 1. Ideally, we would like to capture the whole LTE frame corresponding to the 10 MHz LTE waveform. However, we only capture 32 I/Q samples, which provides a good trade-off between computational complexity and performance. In this context, for each SNR, we collect approximately 6.8 million I/Q samples. When considering all SNR levels and UAV locations together, the total generated dataset contains more than 80 million I/Q samples, which will be publicly released along with all source code. The generation of large-scale spectrum datasets for dynamic UAV environments enables us to evaluate the proposed data-driven collaborative wideband spectrum sensing and sharing, as described next.
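One way to set the noise variance for a target SNR is sketched below. It assumes that the SNR is defined as the ratio of the average received signal power to the noise variance, and it uses placeholder unit-power samples instead of ray-traced LTE I/Q samples; the function name `add_awgn` is hypothetical and this is not the MATLAB procedure used to build the dataset.

```python
import math
import random


def add_awgn(iq_samples, snr_db, rng):
    """Add complex Gaussian noise so that the average SNR equals snr_db."""
    signal_power = sum(abs(x) ** 2 for x in iq_samples) / len(iq_samples)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power / 2.0)  # per real/imaginary component
    noisy = [x + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
             for x in iq_samples]
    return noisy, noise_power


rng = random.Random(0)
# 32 placeholder samples with unit magnitude; real inputs would be the
# received LTE I/Q samples at a UAV.
clean = [complex(math.cos(0.3 * n), math.sin(0.3 * n)) for n in range(32)]
for snr_db in (-10, 0, 10, 20):
    noisy, noise_power = add_awgn(clean, snr_db, rng)
    print(snr_db, round(noise_power, 3))
```

If the figure of 6.8 million samples is read as per SNR level and per UAV location, then 4 SNR levels × 3 locations × 6.8 million gives roughly 81.6 million samples, which is consistent with the stated total of more than 80 million.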

Figure 10: Power spectrum observed at UAV 1 for transmission on 6 sub-channels out of 16 sub-channels, with channel impairments and for SNR = -10 dB. Vacant sub-channels are indicated by 0 and occupied sub-channels by 1.

VII Numerical Results

(a) CL: location 1.
(b) CL: location 2.
(c) CL: location 3.
Figure 11: Performance metrics obtained at UAV locations 1, 2, and 3 in CL. The performance metrics improve as the SNR observed by the UAV increases.
(a) LL: location 1.
(b) LL: location 2.
(c) LL: location 3.
Figure 12: Performance metrics obtained at UAV locations 1, 2, and 3 for testing their respective local models. The performance metrics improve as the SNR observed by the UAV increases.

In this section, we first present our target performance metrics, followed by a discussion of the results on spectrum sensing for different ML configurations. Next, we present the results of collaborative spectrum inference and fusion followed by spectrum access using RL.

Performance metrics. As mentioned earlier, detecting spectrum holes aligns with the framework of a classical multi-label classification problem, where each sub-channel represents a label. We utilize Precision, Recall, and F1-score as metrics to evaluate the classifier’s performance for each sub-channel by constructing a confusion matrix. Although we can calculate these performance metrics for each sub-channel individually, it is advantageous to have an average performance assessment across all 16 sub-channels [45]. In this paper, we consider the micro-averages of Precision, Recall, and F1-score to capture the sensing performance across the 16 sub-channels as follows:

$$\text{Precision}=\frac{\sum_{m=1}^{M}\text{TP}(m)}{\sum_{m=1}^{M}\big(\text{TP}(m)+\text{FP}(m)\big)}, \tag{25}$$
$$\text{Recall}=\frac{\sum_{m=1}^{M}\text{TP}(m)}{\sum_{m=1}^{M}\big(\text{TP}(m)+\text{FN}(m)\big)}, \tag{26}$$
$$\text{F1-score}=\frac{2\,(\text{Precision}\cdot\text{Recall})}{\text{Precision}+\text{Recall}}, \tag{27}$$

where TP($m$), FN($m$), and FP($m$) denote the numbers of true positives, false negatives, and false positives for sub-channel $m$, respectively.
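As a small worked example of Eqs. (25)-(27), the sketch below pools the counts over all sub-channels before forming the ratios. It assumes that label 1 is treated as the positive class and uses a toy example with 4 sub-channels and 2 observations; the function name `micro_metrics` is introduced here for illustration.

```python
def micro_metrics(y_true, y_pred):
    """Micro-averaged precision, recall, and F1 for multi-label predictions.

    y_true and y_pred are lists of equal-length 0/1 label vectors, one vector
    per observation and one entry per sub-channel.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example: pooled counts are TP = 3, FP = 1, FN = 1.
y_true = [[1, 0, 1, 0], [0, 1, 1, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 1, 1]]
precision, recall, f1 = micro_metrics(y_true, y_pred)
```

Because the counts are pooled first, sub-channels with many positives contribute more to the micro-average than rarely occupied ones.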

(a) LL model 1 tested at location 1.
(b) LL model 1 tested at location 2.
(c) LL model 1 tested at location 3.
Figure 13: Performance metrics obtained for testing UAV model 1 at UAV locations 1, 2, and 3. The local model trained at UAV 1 does not generalize to locations 2 and 3.
(a) FL-FedAvg: location 1.
(b) FL-FedAvg: location 2.
(c) FL-FedAvg: location 3.
Figure 14: Performance metrics obtained at UAV locations 1, 2, and 3 in FL using FedAvg. The performance metrics for locations 2 and 3 improve, but those for UAV location 1 remain almost the same.
(a) FL-pwFedAvg: location 1.
(b) FL-pwFedAvg: location 2.
(c) FL-pwFedAvg: location 3.
Figure 15: Performance metrics obtained at UAV locations 1, 2, and 3 in FL using pwFedAvg. The performance metrics for locations 2 and 3 improve as the SNR increases and are higher than with FedAvg.

VII-A Model Training with Distributed UAVs

As previously stated, we model wideband spectrum sensing as a multi-label classification problem that aims to identify spectrum holes from the I/Q samples given as inputs to the ML model. In this context, we explore three configurations for training and testing the wideband spectrum sensing model: centralized learning (CL), local learning (LL), and federated learning (FL). In each of these configurations, we use 70% of the dataset to train the model and 30% for spectrum inference. Next, we present the results obtained for each configuration.

Centralized Learning (CL) is a technique in which all the data collected at different locations are assumed to be aggregated at one central server and readily available to train the ML model. The trained CL model is loaded on the UAVs for testing purposes, and the performance metrics computed at different UAV locations are shown in Fig. 11. As shown in Figs. 11b and 11c, the performance metrics computed at UAV locations 2 and 3 improve as the SNR increases and attain near-optimal performance around an SNR of 20 dB. However, as can be seen in Fig. 11a, the metrics at UAV location 1 saturate at 96%. Since CL accumulates all the datasets, the distributions of the datasets at different locations are incorporated into the CL model, and thus it generalizes well to different locations, as shown in Fig. 11. However, the caveat of CL is the need to aggregate all datasets in one location to train a single model. Alternatively, we can consider local learning.

(a) Comparison of F1-score at location 1.
(b) Comparison of F1-score at location 2.
(c) Comparison of F1-score at location 3.
Figure 16: Comparison of F1-score for CL, FL-FedAvg, and FL-pwFedAvg. We improve the performance for locations 2 and 3 by proportionally scaling their weights as shown in Eq. (14).

Local Learning (LL) is an ML technique in which each UAV trains a model with its own local data, without sharing the dataset or model parameters with a central server or other UAVs. Hence, LL characterizes the performance of the model at a particular location. As shown in Fig. 12, the performance metrics improve as the SNR increases. Although LL is a natural solution that provides insights into the performance of local models trained on the local datasets, an LL model trained at one particular location does not generalize to other locations. For example, in Fig. 13, when the local model trained at UAV location 1 is tested at locations 2 and 3, the performance metrics are significantly lower than those of the respective local models shown in Fig. 12. This is one of the key observations that led us to explore federated learning, which combines the advantages of both LL and CL to obtain a more generalized global model without the need to accumulate all the datasets in one central location.

Federated Learning (FL) achieves a trade-off between LL and CL, as it does not require aggregating the datasets in a central location; instead, the local model gradients are transferred to the central server for aggregation, and in return, the local models receive the aggregated global weights, as described in Algorithm 1. As such, the training process is similar to LL, except that the local model weights are iteratively updated with the computed global weights, and by the end of the training process, all of the UAVs have the same global model. To investigate FL performance, we implement the FedAvg algorithm [38], and the results are presented in Fig. 14. From the results, we note that FedAvg achieves good performance only for UAV locations 2 and 3. Given the heterogeneous datasets collected at different UAV locations, the overall performance of FedAvg is limited by the UAV(s) that perform the worst. This is because FedAvg scales the weights of all local models equally. To reduce the impact of UAV locations with poor performance, our proposed pwFedAvg algorithm scales the weights of local models according to the received signal power. As shown in Fig. 15, our proportional weighting scheme improves the performance at locations 2 and 3. Furthermore, to have a fair comparison, we plot the F1-score for CL, FL-FedAvg, and FL-pwFedAvg in Fig. 16. With our proposed aggregation scheme (pwFedAvg), we improve the performance metrics at UAV locations 2 and 3 without significantly affecting the performance at location 1.
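The contrast between equal and proportional weighting can be sketched as follows. This is an illustration under stated assumptions, not the aggregation rule of Eq. (14), whose exact form is not reproduced in this section: each model is a short list of floats, the aggregation coefficient of UAV $k$ is assumed to be its received power divided by the sum over all UAVs (in linear scale), and the function names and numerical values are hypothetical.

```python
def fedavg(local_weights):
    """Equal-weight aggregation: every UAV contributes 1/K."""
    num_uavs = len(local_weights)
    dim = len(local_weights[0])
    return [sum(w[i] for w in local_weights) / num_uavs for i in range(dim)]


def power_weighted_average(local_weights, rx_powers):
    """Proportional aggregation: UAV k contributes rx_powers[k] / sum(rx_powers).

    Assumed form for illustration; rx_powers are linear-scale received powers.
    """
    total_power = sum(rx_powers)
    coeffs = [p / total_power for p in rx_powers]
    dim = len(local_weights[0])
    return [sum(c * w[i] for c, w in zip(coeffs, local_weights))
            for i in range(dim)]


# Hypothetical local models from three UAVs and their received powers.
local_models = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
rx_powers = [1.0, 1.0, 2.0]

equal_model = fedavg(local_models)                               # [3.0, 2.0]
weighted_model = power_weighted_average(local_models, rx_powers)  # [3.5, 2.5]
```

In this toy example the third UAV has twice the received power of the others, so the aggregated model is pulled toward its local weights.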

Figure 17: Comparison of F1-score results with and without fusion for CL, FL-FedAvg, and FL-pwFedAvg at UAV location 1.
Figure 18: Training results for allocating spectrum holes to (a) one UAV and (b) two UAVs.

VII-B Collaborative Spectrum Inference Results

As shown in Fig. 5, we consider fusing the spectrum hole predictions from multiple UAVs. This is motivated by the fact that individual sensing performance might fluctuate across locations, as we observed in the CL, LL, and FL settings. By applying fusion rules, however, we can significantly improve the overall performance, as shown by our results in Fig. 17. From the results, we notice that the overall performance of all methods is significantly improved by fusion. Furthermore, the proposed pwFedAvg algorithm outperforms FedAvg, while achieving results comparable to the CL method without the need to transfer all datasets to a central location. The fusion comparison results at locations 2 and 3 are omitted for brevity, as they show similar trends.
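To illustrate what fusing per-UAV predictions looks like in practice, the sketch below applies a generic vote-counting rule to binary prediction vectors. The fusion rule of Eq. (1) is not reproduced in this section, so the rule here is an assumption chosen for illustration: a sub-channel is flagged when at least `min_votes` UAVs flag it, which covers OR (`min_votes=1`), majority, and AND (`min_votes` equal to the number of UAVs) as special cases.

```python
def fuse_predictions(predictions, min_votes):
    """Fuse binary per-UAV prediction vectors with a vote-counting rule.

    predictions is a list of equal-length 0/1 vectors, one per UAV. A
    sub-channel is set to 1 in the fused vector when at least min_votes
    UAVs predicted 1 for it (assumed rule, for illustration only).
    """
    num_channels = len(predictions[0])
    return [1 if sum(p[m] for p in predictions) >= min_votes else 0
            for m in range(num_channels)]


# Hypothetical predictions from three UAVs over four sub-channels.
uav_predictions = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]

fused_or = fuse_predictions(uav_predictions, min_votes=1)        # [1, 1, 1, 0]
fused_majority = fuse_predictions(uav_predictions, min_votes=2)  # [1, 0, 1, 0]
fused_and = fuse_predictions(uav_predictions, min_votes=3)       # [1, 0, 0, 0]
```

The choice of threshold trades missed detections against false alarms, which is why the fused result can be more reliable than any single UAV's prediction.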

VII-C Spectrum Resource Allocation using RL

As mentioned in Section V, we use deep Q-learning methods for allocating spectrum resources to the UAVs. In Fig. 18a, we compare the training performance of three variants of Q-learning methods for allocating a sub-channel to a single UAV whenever the fusion rule detects at least one spectrum hole. We observe that DDQN with soft update performs slightly better and converges earlier than DDQN and vanilla DQN. Next, we extend the model to allocate spectrum holes to two UAVs. In this case, we augment the DDQN algorithm with soft update to generate the two best actions. From the results in Fig. 18b, we observe that the utility with two SUs is slightly less than twice that with a single SU. We further note that this paper explores the possibility of integrating spectrum sensing and sharing by making use of existing RL algorithms. Although we explored Q-learning techniques, other, more advanced RL algorithms can be integrated into the proposed framework.
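One simple way to obtain two actions from a single Q-value vector is to take the two highest-valued entries, as sketched below. This is an assumed mechanism for illustration; the paper does not detail how the two best actions are generated, and the function name and Q-values are hypothetical.

```python
def top_k_actions(q_values, k=2):
    """Return the indices of the k largest Q-values, best first.

    Ties are broken in favor of the lower action index because Python's
    sort is stable.
    """
    ranked = sorted(range(len(q_values)), key=lambda a: q_values[a],
                    reverse=True)
    return ranked[:k]


# Hypothetical Q-values over M + 1 = 5 actions for the current state.
q_values = [0.1, 0.7, 0.3, 0.9, 0.2]
best_two = top_k_actions(q_values, k=2)  # [3, 1]
```

Selecting distinct indices guarantees that the two UAVs are not assigned the same action in the same time slot.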

VIII Conclusion

In this paper, we developed a collaborative wideband spectrum sensing and sharing solution for networked UAVs. To train machine learning models for detecting spectrum holes, we explored the applications of FL and developed an architecture that integrates wireless dataset generation into the FL model training and aggregation steps. To this end, we proposed the pwFedAvg algorithm, which incorporates wireless channel conditions and received signal powers into the FL aggregation algorithm. To further enhance the accuracy of the spectrum holes predicted by individual UAVs, we considered spectrum fusion at the central server. Additionally, by leveraging deep Q-learning methods, the detected spectrum holes are dynamically allocated to the requesting UAVs. To evaluate the proposed methods, we generated a near-realistic synthetic dataset using the MATLAB LTE toolbox by incorporating base-station locations in a chosen area of interest, performing ray-tracing, and emulating the primary users' channel usage in terms of I/Q samples. Based on the collected I/Q datasets, we investigated the performance of three model training algorithms, namely CL, LL, and FL. The numerical results demonstrated that the CL model generalizes well and performs better for all UAV locations, while the LL models showed poor generalization performance. Furthermore, the proposed pwFedAvg algorithm outperforms FedAvg while achieving results comparable to the CL method, without the need to transfer all datasets to a central location. From the fusion results, we noticed that the overall performance improved significantly for all learning configurations, and that the implemented DDQN method can provide dynamic spectrum scheduling across requesting UAVs. In future work, we plan to expand the application of our developed solutions to other technologies and spectrum bands (beyond LTE), while incorporating realistic spectrum usage of the incumbent users (i.e., PUs) in those bands.

IX APPENDIX

Proof of Lemma 1. Since $\nabla L_{k}(\bm{\omega}_{k}^{*}) = 0$, Assumption 1 reduces to $L_{k}(\bm{\omega}^{*}) - L_{k}(\bm{\omega}_{k}^{*}) \leq \frac{L}{2}\|\bm{\omega}^{*}-\bm{\omega}_{k}^{*}\|_{2}^{2}$. Using the identities of vector norm and max-norm for a vector $\mathbf{x}$, $\|\mathbf{x}\|_{2}^{2} \leq d\,\|\mathbf{x}\|_{\infty}^{2} = d\,(\max_{i}|x_{i}|)^{2}$, we have:

\[
\frac{L}{2}\|\bm{\omega}^{*}-\bm{\omega}_{k}^{*}\|_{2}^{2} \leq \max_{k}\Big\{\frac{Ld}{2}\big(\max_{i}\{|\omega_{i}^{*}-\omega_{k,i}^{*}|\}\big)^{2}\Big\},
\]

which completes the proof.
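The norm inequality used in this proof, $\|\mathbf{x}\|_{2}^{2} \leq d\,\|\mathbf{x}\|_{\infty}^{2}$, can be checked numerically. The sketch below is only a sanity check on a random vector, with `d` taken to be the vector dimension; the helper names and the test vectors are hypothetical.

```python
import random


def l2_squared(x):
    """Squared Euclidean norm of a vector given as a list of floats."""
    return sum(v * v for v in x)


def linf(x):
    """Max-norm: the largest absolute entry."""
    return max(abs(v) for v in x)


random.seed(0)
d = 8
x = [random.uniform(-1.0, 1.0) for _ in range(d)]

lhs = l2_squared(x)        # ||x||_2^2
rhs = d * linf(x) ** 2     # d * ||x||_inf^2

# The bound is tight when all entries have the same magnitude.
x_tight = [0.5, -0.5, 0.5]
tight_lhs = l2_squared(x_tight)
tight_rhs = len(x_tight) * linf(x_tight) ** 2
```

The tight case shows why the factor `d` cannot be removed in general: each of the `d` entries can contribute as much as the largest one.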

Proof of Lemma 2. Using the definition of virtual sequences from Eq. (19), we have:

\begin{equation}
\begin{split}
\mathbb{E}\big(\|\bm{a}^{t}-\bar{\bm{a}}^{t}\|_{2}^{2}\big)
&\overset{\mathrm{(a)}}{=} \mathbb{E}\Big[\sum_{k=1}^{K}\Big\|\Big(\frac{\alpha_{k}^{t}}{\alpha^{t}}\Big)\big[\nabla L_{k}(\bm{\omega}_{k}^{t};\bm{\xi}_{k}^{t})-\nabla L_{k}(\bm{\omega}_{k}^{t})\big]\Big\|_{2}^{2}\Big]\\
&\overset{\mathrm{(b)}}{\leq} \sum_{k=1}^{K}\Big(\frac{\alpha_{k}^{t}}{\alpha^{t}}\Big)^{2}\,\mathbb{E}\big[\|\nabla L_{k}(\bm{\omega}_{k}^{t};\bm{\xi}_{k}^{t})-\nabla L_{k}(\bm{\omega}_{k}^{t})\|_{2}^{2}\big]\\
&\overset{\mathrm{(c)}}{\leq} \sum_{k=1}^{K}\Big(\frac{\alpha_{k}^{t}}{\alpha^{t}}\Big)^{2}\rho_{k}^{2},
\end{split}
\tag{28}
\end{equation}

where (a) is from Eq. (19), (b) comes from Jensen’s inequality, and (c) is by applying Assumption 3.
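To see how the aggregation weights enter the right-hand side of Eq. (28), the sketch below evaluates $\sum_{k}(\alpha_{k}^{t}/\alpha^{t})^{2}\rho_{k}^{2}$ for two hypothetical weight vectors. It assumes that $\alpha^{t}$ is the sum of the $\alpha_{k}^{t}$, so that the normalized weights sum to one; the function name `variance_bound` and all numerical values are made up for illustration.

```python
from typing import Sequence


def variance_bound(alphas: Sequence[float], rhos: Sequence[float]) -> float:
    """Evaluate sum_k (alpha_k / alpha)^2 * rho_k^2 with alpha = sum_k alpha_k."""
    if len(alphas) != len(rhos):
        raise ValueError("alphas and rhos must have the same length")
    total = sum(alphas)
    if total <= 0:
        raise ValueError("aggregation weights must have a positive sum")
    return sum((a / total) ** 2 * r ** 2 for a, r in zip(alphas, rhos))


# Four hypothetical UAVs with identical gradient-noise bounds rho_k = 1.
rhos = [1.0, 1.0, 1.0, 1.0]
equal_weights = variance_bound([1.0, 1.0, 1.0, 1.0], rhos)    # FedAvg-like
skewed_weights = variance_bound([7.0, 1.0, 1.0, 1.0], rhos)   # one dominant UAV
```

With identical noise bounds, equal weights give the smallest value (1/K), and concentrating weight on one UAV increases the bound. Non-uniform weights therefore trade a larger variance term against a better fit to the UAVs with more informative data.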

Proof of Lemma 3. Using Eq. (14) and Eq. (15), we have the following equation:

\begin{equation}
\begin{split}
\|\bm{\omega}^{t+1}-\bm{\omega}^{*}\|_{2}^{2}
&= \|\bm{\omega}^{t}-\gamma^{t}\bm{a}^{t}-\bm{\omega}^{*}\|_{2}^{2}\\
&= \|\bm{\omega}^{t}-\gamma^{t}\bar{\bm{a}}^{t}+\gamma^{t}\bar{\bm{a}}^{t}-\gamma^{t}\bm{a}^{t}-\bm{\omega}^{*}\|_{2}^{2}\\
&= \underbrace{\|\bm{\omega}^{t}-\gamma^{t}\bar{\bm{a}}^{t}-\bm{\omega}^{*}\|_{2}^{2}}_{A_{1}}
+ \underbrace{(\gamma^{t})^{2}\|\bm{a}^{t}-\bar{\bm{a}}^{t}\|_{2}^{2}}_{A_{2}}\\
&\quad \underbrace{-2\gamma^{t}\langle\bm{\omega}^{t}-\gamma^{t}\bar{\bm{a}}^{t}-\bm{\omega}^{*},\,\bm{a}^{t}-\bar{\bm{a}}^{t}\rangle}_{A_{3}}.
\end{split}
\tag{29}
\end{equation}

Since $\mathbb{E}[\bm{a}^{t}]=\bar{\bm{a}}^{t}$, it can be seen that $\mathbb{E}[A_{3}]=0$.
By expanding $\mathbb{E}[A_{1}]$, we have:

\begin{equation}
\begin{split}
\mathbb{E}[A_{1}] &= \mathbb{E}\Big[\|\bm{\omega}^{t}-\bm{\omega}^{*}\|_{2}^{2}
+ \underbrace{(\gamma^{t})^{2}\|\bar{\bm{a}}^{t}\|_{2}^{2}}_{A_{1,1}}\\
&\quad \underbrace{-2\gamma^{t}\langle\bm{\omega}^{t}-\bm{\omega}^{*},\,\bar{\bm{a}}^{t}\rangle}_{A_{1,2}}\Big].
\end{split}
\tag{30}
\end{equation}

The bound on the $A_{1,1}$ term can be derived as follows:

\begin{equation}
\begin{split}
\mathbb{E}[A_{1,1}]
&\overset{\mathrm{(a)}}{=} (\gamma^{t})^{2}\,\mathbb{E}\Big[\Big\|\sum_{k=1}^{K}\frac{\alpha_{k}^{t}}{\alpha^{t}}\nabla L_{k}(\bm{\omega}_{k}^{t})\Big\|_{2}^{2}\Big]\\
&\overset{\mathrm{(b)}}{\leq} (\gamma^{t})^{2}\sum_{k=1}^{K}\Big(\frac{\alpha_{k}^{t}}{\alpha^{t}}\Big)^{2}\|\nabla L_{k}(\bm{\omega}_{k}^{t})\|_{2}^{2}\\
&\overset{\mathrm{(c)}}{\leq} 2L(\gamma^{t})^{2}\sum_{k=1}^{K}\Big(\frac{\alpha_{k}^{t}}{\alpha^{t}}\Big)^{2}\big(L_{k}(\bm{\omega}_{k}^{t})-L_{k}(\bm{\omega}_{k}^{*})\big),
\end{split}
\tag{31}
\end{equation}

where (a) is from Eq. (19), (b) comes from Jensen’s inequality, and (c) follows by applying Assumption 1 and the $L$-smoothness property [46]. The bound for the $A_{1,2}$ term can be derived as follows:

\[
\begin{aligned}
\mathbb{E}[A_{1,2}]
&= -2\gamma^{t}\Big\langle\bm{\omega}^{t}-\bm{\omega}^{*},\,\sum_{k=1}^{K}\frac{\alpha_{k}^{t}}{\alpha^{t}}\nabla L_{k}(\bm{\omega}_{k}^{t})\Big\rangle\\
&= -2\gamma^{t}\sum_{k=1}^{K}\frac{\alpha_{k}^{t}}{\alpha^{t}}\big\langle\bm{\omega}^{t}-\bm{\omega}^{*},\,\nabla L_{k}(\bm{\omega}_{k}^{t})\big\rangle\\
&\overset{\mathrm{(a)}}{\leq} -2\gamma^{t}\sum_{k=1}^{K}\frac{\alpha_{k}^{t}}{\alpha^{t}}\Big[L_{k}(\bm{\omega}_{k}^{t})-L_{k}(\bm{\omega}^{*})+\frac{\beta}{2}\|\bm{\omega}^{t}-\bm{\omega}^{*}\|_{2}^{2}\Big]\\
&\overset{\mathrm{(b)}}{\leq} \underbrace{-2\gamma^{t}\sum_{k=1}^{K}\frac{\alpha_{k}^{t}}{\alpha^{t}}\big(L_{k}(\bm{\omega}_{k}^{t})-L_{k}(\bm{\omega}^{*})\big)}_{A_{1,2,1}} - \beta\gamma^{t}\|\bm{\omega}^{t}-\bm{\omega}^{*}\|_{2}^{2},
\end{aligned}
\]

where (a) comes from Assumption 2 and (b) comes from the fact that $\sum_{k=1}^{K}\frac{\alpha_{k}^{t}}{\alpha^{t}}=1$.

Combining $\mathbb{E}[A_{1,1}]$ and $\mathbb{E}[A_{1,2,1}]$, we have

\begin{equation}
\begin{split}
\mathbb{E}[A_{1,1}] + \mathbb{E}[A_{1,2,1}]
&\leq 2L(\gamma^{t})^{2}\sum_{k=1}^{K}\Big(\frac{\alpha_{k}^{t}}{\alpha^{t}}\Big)^{2}\Big(L_{k}(\bm{\omega}_{k}^{t})-L_{k}(\bm{\omega}^{*})+L_{k}(\bm{\omega}^{*})-L_{k}(\bm{\omega}_{k}^{*})\Big)\\
&\quad -2\gamma^{t}\sum_{k=1}^{K}\frac{\alpha_{k}^{t}}{\alpha^{t}}\big(L_{k}(\bm{\omega}_{k}^{t})-L_{k}(\bm{\omega}^{*})\big)\\
&\overset{\mathrm{(a)}}{\leq} 2L\tau(\gamma^{t})^{2}\sum_{k=1}^{K}\Big(\frac{\alpha_{k}^{t}}{\alpha^{t}}\Big)^{2}\\
&\quad -2\gamma^{t}\sum_{k=1}^{K}\frac{\alpha_{k}^{t}}{\alpha^{t}}\Big[1-L\gamma^{t}\frac{\alpha_{k}^{t}}{\alpha^{t}}\Big]\big(L_{k}(\bm{\omega}_{k}^{t})-L_{k}(\bm{\omega}^{*})\big)\\
&\overset{\mathrm{(b)}}{\leq} 2\tau\gamma^{t}\Big[1-L\gamma^{t}\sum_{k=1}^{K}\Big(\frac{\alpha_{k}^{t}}{\alpha^{t}}\Big)^{2}\Big],
\end{split}
\tag{32}
\end{equation}

where (a) comes from Lemma 1 and (b) comes from the fact that $L_k(\bm{\omega}_k^t) - L_k(\bm{\omega}_k^*) \geq 0$ and Lemma 1. Now, $\mathbb{E}[A_2] = (\gamma^t)^2 \sum_{k=1}^{K} \big(\frac{\alpha_k^t}{\alpha^t}\big)^2 \rho_k^2$ follows directly from Lemma 1.
Substituting $\mathbb{E}[A_1]$, $\mathbb{E}[A_2]$, and $\mathbb{E}[A_3]$ into $\mathbb{E}\big(\|\bm{\omega}^{t+1} - \bm{\omega}^t\|_2^2\big)$ and using the fact that $\frac{1}{\kappa} \leq \gamma^t$ completes the proof.
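
As a rough numerical illustration of the term $\mathbb{E}[A_2]$ above, the following sketch evaluates $(\gamma^t)^2 \sum_{k} (\alpha_k^t/\alpha^t)^2 \rho_k^2$ for two weight profiles. It assumes that $\alpha^t = \sum_k \alpha_k^t$ acts as the normalizer of the aggregation weights. The function name `variance_term` and all numeric values are hypothetical and chosen only for illustration.

```python
def variance_term(gamma_t, alphas, rhos):
    """Evaluate (gamma^t)^2 * sum_k (alpha_k^t / alpha^t)^2 * rho_k^2.

    Assumption: alpha^t is the sum of the per-UAV weights alpha_k^t.
    """
    alpha_total = sum(alphas)
    weighted = sum((a / alpha_total) ** 2 * r ** 2 for a, r in zip(alphas, rhos))
    return gamma_t ** 2 * weighted


# Hypothetical values: four UAVs with identical rho_k.
rhos = [2.0, 2.0, 2.0, 2.0]
equal = variance_term(0.1, [1.0, 1.0, 1.0, 1.0], rhos)
skewed = variance_term(0.1, [4.0, 1.0, 1.0, 1.0], rhos)
print(f"equal weights:  {equal:.6f}")
print(f"skewed weights: {skewed:.6f}")
```

With identical $\rho_k$, the skewed profile gives a larger value than the equal profile in this example, because $\sum_k (\alpha_k^t/\alpha^t)^2$ is smallest when the normalized weights are uniform.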

Proof of Theorem 1. Similar to [39, 40], we define $\Delta^t = \mathbb{E}[\|\bm{\omega}^t - \bm{\omega}^*\|_2^2]$. From Lemma 3, it follows that $\Delta^{t+1} \leq (1 - \beta\gamma^t)\Delta^t + (\gamma^t)^2 G^t$. We assume $\gamma^t = \frac{\alpha}{t+\mu}$ for some $\alpha > \frac{1}{\beta}$ and $\mu > 1$. Defining $\lambda = \max\{\frac{\alpha^2 G}{\alpha\beta - 1}, \mu\Delta^0\}$, we prove $\Delta^t \leq \frac{\lambda}{t+\mu}$ by induction as follows.
The definition of $\lambda$ ensures that the inequality $\Delta^t \leq \frac{\lambda}{t+\mu}$ holds for $t = 0$. For the inductive step, suppose the inequality holds for some $t \geq 0$. Then:

\begin{split}
\Delta^{t+1} &\leq (1-\beta\gamma^t)\Delta^t + (\gamma^t)^2 G^t \\
&\leq \bigg(1-\frac{\alpha\beta}{t+\mu}\bigg)\frac{\lambda}{t+\mu} + \frac{\alpha^2 G^t}{(t+\mu)^2} \\
&= \frac{t+\mu-1}{(t+\mu)^2}\lambda + \underbrace{\bigg[\frac{\alpha^2 G^t - (\alpha\beta-1)\lambda}{(t+\mu)^2}\bigg]}_{\leq 0} \\
&\leq \frac{t+\mu-1}{(t+\mu)^2-1}\lambda = \frac{\lambda}{(t+1)+\mu}.
\end{split} (33)
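
The inductive bound can be sanity-checked numerically. The sketch below iterates the recursion from Lemma 3 with equality (the largest value the inequality allows) and a constant $G^t = G$, both of which are simplifying assumptions, and verifies $\Delta^t \leq \frac{\lambda}{t+\mu}$ at every step. The function name `check_induction` and the parameter values are hypothetical.

```python
def check_induction(beta, alpha, mu, G, delta0, steps=1000, tol=1e-12):
    """Check Delta^t <= lambda / (t + mu) along the worst-case recursion.

    Assumptions: G^t is constant (equal to G) and the recursion
    Delta^{t+1} = (1 - beta*gamma^t) * Delta^t + (gamma^t)^2 * G holds with equality.
    """
    if alpha * beta <= 1:
        raise ValueError("the proof requires alpha > 1/beta")
    lam = max(alpha ** 2 * G / (alpha * beta - 1), mu * delta0)
    delta = delta0
    for t in range(steps):
        if delta > lam / (t + mu) + tol:
            return False, lam
        gamma_t = alpha / (t + mu)  # diminishing step size
        delta = (1 - beta * gamma_t) * delta + gamma_t ** 2 * G
    return True, lam


# Hypothetical constants: beta = 1, L = 2, so alpha = 2/beta = 2 and mu = 2L/beta = 4.
holds, lam = check_induction(beta=1.0, alpha=2.0, mu=4.0, G=3.0, delta0=5.0)
print(f"lambda = {lam}, bound holds for all checked t: {holds}")
```

A check of this kind does not replace the proof; it only confirms that the chosen constants behave as the induction predicts.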

Specifically, if we choose $\alpha = \frac{2}{\beta}$ and $\mu = \frac{2L}{\beta}$, then $\gamma^t = \frac{2}{\beta t + 2L}$. Then, we have

\begin{split}
\lambda &= \max\Big\{\frac{\alpha^2 G}{\alpha\beta-1}, \mu\Delta^0\Big\} \\
&\leq \frac{\alpha^2 G}{\alpha\beta-1} + \mu\Delta^0 = \frac{4G}{\beta^2} + \frac{2L}{\beta}\|\bm{\omega}^0 - \bm{\omega}^*\|_2^2.
\end{split} (34)

Finally, we have:

\begin{split}
\mathbb{E}[L(\bm{\omega}^t)] - L^* &\overset{\mathrm{(a)}}{\leq} \frac{L}{2}\mathbb{E}\big[\|\bm{\omega}^t - \bm{\omega}^*\|_2^2\big] = \frac{L}{2}\Delta^t \leq \frac{L}{2}\frac{\lambda}{t+\mu} \\
&\overset{\mathrm{(b)}}{\leq} \frac{L}{2\alpha}\frac{\alpha}{t+\mu}\bigg[\frac{4G}{\beta^2} + \frac{2L}{\beta}\|\bm{\omega}^0 - \bm{\omega}^*\|_2^2\bigg] \\
&\overset{\mathrm{(c)}}{=} \frac{L}{\beta t + 2L}\bigg[\frac{2G}{\beta} + L\|\bm{\omega}^0 - \bm{\omega}^*\|_2^2\bigg],
\end{split} (35)

where (a) follows from the $L$-smoothness of the loss function and the fact that $\nabla L(\bm{\omega}^*) = 0$, (b) is computed using Eq. (34), and (c) is computed by substituting the values of $\alpha$ and $\mu$. Hence, the convergence rate is $\mathcal{O}(\frac{1}{T})$.
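
To make the $\mathcal{O}(\frac{1}{T})$ rate concrete, the following sketch evaluates the final bound in Eq. (35) for hypothetical constants and checks that it agrees with $\frac{L}{2}\frac{\lambda'}{t+\mu}$, where $\lambda'$ denotes the right-hand side of Eq. (34). The names `loss_gap_bound` and `bound_via_eq34` and all numeric values are assumptions made for illustration only.

```python
def loss_gap_bound(t, L, beta, G, dist0_sq):
    """Final bound of Eq. (35): L / (beta*t + 2L) * (2G/beta + L * ||w^0 - w^*||^2)."""
    return L / (beta * t + 2 * L) * (2 * G / beta + L * dist0_sq)


def bound_via_eq34(t, L, beta, G, dist0_sq):
    """Same quantity written as (L/2) * lambda' / (t + mu), with mu = 2L/beta."""
    mu = 2 * L / beta
    lam_upper = 4 * G / beta ** 2 + mu * dist0_sq  # right-hand side of Eq. (34)
    return 0.5 * L * lam_upper / (t + mu)


# Hypothetical constants: smoothness L, strong-convexity-like constant beta,
# constant G, and squared initial distance ||w^0 - w^*||^2.
L_SMOOTH, BETA, G_CONST, DIST0_SQ = 2.0, 1.0, 3.0, 5.0

for t in (0, 10, 100, 1000):
    b = loss_gap_bound(t, L_SMOOTH, BETA, G_CONST, DIST0_SQ)
    print(f"t={t:5d}  bound={b:.5f}  t*bound={t * b:.3f}")
```

The product of $t$ and the bound stays below a constant as $t$ grows, which is the behavior expected of a $\mathcal{O}(\frac{1}{T})$ rate.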

X Acknowledgement

The material is based upon work supported by NASA under award No(s) 80NSSC20M0261, and NSF grants 1948511, 1955561, and 2212565. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration (NASA) and the National Science Foundation (NSF).

References

  • [1] H. Menouar, I. Guvenc, K. Akkaya, A. S. Uluagac, A. Kadri, and A. Tuncer, “UAV-enabled intelligent transportation systems for the smart city: Applications and challenges,” IEEE Communications Magazine, vol. 55, no. 3, pp. 22–28, 2017.
  • [2] FAA, “Drones by the Numbers,” https://www.faa.gov/node/54496.
  • [3] S. Li, Y. Gu, B. Subedi, C. He, Y. Wan, A. Miyaji, and T. Higashino, “Beyond visual line of sight UAV control for remote monitoring using directional antennas,” in 2019 IEEE Globecom Workshops (GC Wkshps).   IEEE, 2019, pp. 1–6.
  • [4] P. Kopardekar, J. Rios, T. Prevot, M. Johnson, J. Jung, and J. E. Robinson, “Unmanned aircraft system traffic management (UTM) concept of operations,” 2016.
  • [5] A. S. Abdalla and V. Marojevic, “Communications standards for unmanned aircraft systems: The 3GPP perspective and research drivers,” IEEE Communications Standards Magazine, vol. 5, no. 1, pp. 70–77, 2021.
  • [6] M. Rimjha and A. Trani, “Urban Air Mobility: Factors Affecting Vertiport Capacity,” in 2021 Integrated Communications Navigation and Surveillance Conference (ICNS).   IEEE, 2021, pp. 1–14.
  • [7] M. Ghazikor, K. Roach, K. Cheung, and M. Hashemi, “Exploring the Interplay of Interference and Queues in Unlicensed Spectrum Bands for UAV Networks,” in 2023 57th Asilomar Conference on Signals, Systems, and Computers, 2023, pp. 729–733.
  • [8] W. S. H. M. W. Ahmad, N. A. M. Radzi, F. Samidi, A. Ismail, F. Abdullah, M. Z. Jamaludin, and M. Zakaria, “5G technology: Towards dynamic spectrum sharing using cognitive radio networks,” IEEE Access, vol. 8, pp. 14460–14488, 2020.
  • [9] D. Uvaydov, S. D’Oro, F. Restuccia, and T. Melodia, “Deepsense: Fast wideband spectrum sensing through real-time in-the-loop deep learning,” in IEEE INFOCOM 2021-IEEE Conference on Computer Communications.   IEEE, 2021, pp. 1–10.
  • [10] J. Cui, Y. Liu, and A. Nallanathan, “Multi-agent reinforcement learning-based resource allocation for UAV networks,” IEEE Transactions on Wireless Communications, vol. 19, no. 2, pp. 729–743, 2019.
  • [11] Y. Li, W. Zhang, C.-X. Wang, J. Sun, and Y. Liu, “Deep reinforcement learning for dynamic spectrum sensing and aggregation in multi-channel wireless networks,” IEEE Transactions on Cognitive Communications and Networking, vol. 6, no. 2, pp. 464–475, 2020.
  • [12] H. Q. Nguyen, B. T. Nguyen, T. Q. Dong, D. T. Ngo, and T. A. Nguyen, “Deep Q-Learning with Multiband Sensing for Dynamic Spectrum Access,” in 2018 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), 2018, pp. 1–5.
  • [13] J. Kakar and V. Marojevic, “Waveform and spectrum management for unmanned aerial systems beyond 2025,” in 2017 IEEE 28th Annual international symposium on personal, indoor, and mobile radio communications (PIMRC).   IEEE, 2017, pp. 1–5.
  • [14] B. Shang, V. Marojevic, Y. Yi, A. S. Abdalla, and L. Liu, “Spectrum sharing for UAV communications: Spatial spectrum sensing and open issues,” IEEE Vehicular Technology Magazine, vol. 15, no. 2, pp. 104–112, 2020.
  • [15] S. R. Chintareddy, K. Roach, K. Cheung, and M. Hashemi, “Collaborative wideband spectrum sensing and scheduling for networked UAVs in UTM systems,” in GLOBECOM 2023-2023 IEEE Global Communications Conference. IEEE, 2023, pp. 3064–3069.
  • [16] B. Shang, L. Liu, H. Chen, J. Zhang, S. Pudlewski, E. S. Bentley, and J. Ashdown, “Spatial spectrum sensing-based D2D communications in user-centric deployed HetNets,” in 2019 IEEE Global Communications Conference (GLOBECOM).   IEEE, 2019, pp. 1–6.
  • [17] H. Chen, L. Liu, T. Novlan, J. D. Matyjas, B. L. Ng, and J. Zhang, “Spatial spectrum sensing-based device-to-device cellular networks,” IEEE Transactions on Wireless Communications, vol. 15, no. 11, pp. 7299–7313, 2016.
  • [18] H. Chen, L. Liu, H. S. Dhillon, and Y. Yi, “QoS-aware D2D cellular networks with spatial spectrum sensing: A stochastic geometry view,” IEEE Transactions on Communications, vol. 67, no. 5, pp. 3651–3664, 2018.
  • [19] B. Shang, L. Liu, R. M. Rao, V. Marojevic, and J. H. Reed, “3D spectrum sharing for hybrid D2D and UAV networks,” IEEE Transactions on Communications, vol. 68, no. 9, pp. 5375–5389, 2020.
  • [20] C. Liu, J. Wang, X. Liu, and Y.-C. Liang, “Deep CM-CNN for spectrum sensing in cognitive radio,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 10, pp. 2306–2321, 2019.
  • [21] D. Chew and A. B. Cooper, “Spectrum sensing in interference and noise using deep learning,” in 2020 54th Annual conference on information sciences and systems (CISS).   IEEE, 2020, pp. 1–6.
  • [22] O. Naparstek and K. Cohen, “Deep multi-user reinforcement learning for dynamic spectrum access in multichannel wireless networks,” in GLOBECOM 2017-2017 IEEE Global Communications Conference.   IEEE, 2017, pp. 1–7.
  • [23] ——, “Deep multi-user reinforcement learning for distributed dynamic spectrum access,” IEEE transactions on wireless communications, vol. 18, no. 1, pp. 310–323, 2018.
  • [24] H. Albinsaid, K. Singh, S. Biswas, and C.-P. Li, “Multi-agent reinforcement learning-based distributed dynamic spectrum access,” IEEE Transactions on Cognitive Communications and Networking, vol. 8, no. 2, pp. 1174–1185, 2021.
  • [25] Y. Bokobza, R. Dabora, and K. Cohen, “Deep reinforcement learning for simultaneous sensing and channel access in cognitive networks,” IEEE Transactions on Wireless Communications, 2023.
  • [26] S. Wang, H. Liu, P. H. Gomes, and B. Krishnamachari, “Deep reinforcement learning for dynamic multichannel access in wireless networks,” IEEE Transactions on Cognitive Communications and Networking, vol. 4, no. 2, pp. 257–265, 2018.
  • [27] Z. Chen, Y.-Q. Xu, H. Wang, and D. Guo, “Federated learning-based cooperative spectrum sensing in cognitive radio,” IEEE Communications Letters, vol. 26, no. 2, pp. 330–334, 2021.
  • [28] Z. Gao, A. Li, Y. Chen, B. Li, Y. Wang, and Y. Chen, “FedSwap: A Federated Learning based 5G Decentralized Dynamic Spectrum Access System,” in 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD), 2021, pp. 1–6.
  • [29] M. Wasilewska, H. Bogucka, and H. V. Poor, “Secure Federated Learning for Cognitive Radio Sensing,” IEEE Communications Magazine, vol. 61, no. 3, pp. 68–73, 2023.
  • [30] N. A. Khalek, D. H. Tashman, and W. Hamouda, “Advances in Machine Learning-Driven Cognitive Radio for Wireless Networks: A Survey,” IEEE Communications Surveys & Tutorials, 2023.
  • [31] X. Liu, Y. Deng, and T. Mahmoodi, “Wireless distributed learning: a new hybrid split and federated learning approach,” IEEE Transactions on Wireless Communications, vol. 22, no. 4, pp. 2650–2665, 2022.
  • [32] Z. Wang, Y. Zhou, Y. Shi, and W. Zhuang, “Interference management for over-the-air federated learning in multi-cell wireless networks,” IEEE Journal on Selected Areas in Communications, vol. 40, no. 8, pp. 2361–2377, 2022.
  • [33] G. Shi, S. Guo, J. Ye, N. Saeed, and S. Dang, “Multiple parallel federated learning via over-the-air computation,” IEEE Open Journal of the Communications Society, vol. 3, pp. 1252–1264, 2022.
  • [34] B. Xiao, X. Yu, W. Ni, X. Wang, and H. V. Poor, “Over-the-air federated learning: Status quo, open challenges, and future directions,” arXiv preprint arXiv:2307.00974, 2023.
  • [35] M. M. Amiri, D. Gündüz, S. R. Kulkarni, and H. V. Poor, “Convergence of federated learning over a noisy downlink,” IEEE Transactions on Wireless Communications, vol. 21, no. 3, pp. 1422–1437, 2021.
  • [36] X. Wei and C. Shen, “Federated learning over noisy channels: Convergence analysis and design examples,” IEEE Transactions on Cognitive Communications and Networking, vol. 8, no. 2, pp. 1253–1268, 2022.
  • [37] X. Zhang and K. G. Shin, “E-MiLi: Energy-minimizing idle listening in wireless networks,” in Proceedings of the 17th annual international conference on Mobile computing and networking, 2011, pp. 205–216.
  • [38] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial intelligence and statistics.   PMLR, 2017, pp. 1273–1282.
  • [39] X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang, “On the convergence of FedAvg on non-IID data,” arXiv preprint arXiv:1907.02189, 2019.
  • [40] N. Yan, K. Wang, C. Pan, and K. K. Chai, “Performance analysis for channel-weighted federated learning in OMA wireless networks,” IEEE Signal Processing Letters, vol. 29, pp. 772–776, 2022.
  • [41] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction.   MIT press, 2018.
  • [42] H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning,” in Proceedings of the AAAI conference on artificial intelligence, vol. 30, no. 1, 2016.
  • [43] “CellMapper,” https://www.cellmapper.net/map.
  • [44] OpenStreetMap contributors, “Planet dump retrieved from https://planet.osm.org,” https://www.openstreetmap.org, 2017.
  • [45] M. Grandini, E. Bagli, and G. Visani, “Metrics for multi-class classification: an overview,” arXiv preprint arXiv:2008.05756, 2020.
  • [46] S. P. Boyd and L. Vandenberghe, Convex optimization.   Cambridge university press, 2004.