License: CC BY 4.0
arXiv:2310.17491v2 [cs.LG] 28 Feb 2024

FedPEAT: Convergence of 6G Enabled Federated Learning, Parameter-Efficient Fine Tuning, and Emulator Assisted Tuning for AI Foundation Models

Terence Jie Chua Nanyang Technological University, Graduate College, Singapore, 637335, Singapore These authors contributed equally to this work. Wenhan Yu Nanyang Technological University, Graduate College, Singapore, 637335, Singapore These authors contributed equally to this work. Yang Li Nanyang Technological University, Graduate College, Singapore, 637335, Singapore Jun Zhao Nanyang Technological University, School of Computer Science and Engineering, Singapore, 639798, Singapore [email protected]
Abstract

The advent of foundation models like GPT-3 and BERT has revolutionized artificial intelligence, providing unparalleled capabilities across various applications and the potential to transform industries from healthcare to entertainment. Deploying and fine-tuning these models pose unique challenges, making it imperative to address issues like model ownership, collaborative training, and computation and communication limitations for realizing their full potential. We generalize the offsite tuning approach to Emulator-Assisted Tuning (EAT) and combine it with Parameter-Efficient Fine-Tuning (PEFT) to create Parameter-Efficient Emulator-Assisted Tuning (PEAT), expanding its use into 6G-enabled Federated Learning (FL) as Federated Parameter-Efficient Emulator-Assisted Tuning (FedPEAT). The FedPEAT framework proposes a solution using adapters, emulators, and PEFT techniques for federated model fine-tuning. This approach enhances model privacy and streamlines downstream fine-tuning. Our approach, adaptable to diverse neural network architectures, incorporates an adaptive control mechanism utilizing the novel Single-Agent Action Branching Proximal Policy Optimization (SABPPO) algorithm. The proposed SABPPO is tailored for high-dimensional action spaces and features short training delays, which are essential for scalable FedPEAT involving a large number of users and variables to optimize. Our experimental results demonstrate the practicality and efficacy of our proposed framework and algorithm in addressing the complex challenges associated with large foundation model fine-tuning.

In the vibrant landscape of artificial intelligence (AI), colossal foundation models like GPT-3 [1], CLIP [2] and BERT [3] have revolutionized AI, venturing beyond traditional machine learning approaches. These models, trained on massive datasets, possess the uncanny ability to generate images, texts, and audio with unparalleled accuracy. With sizes reaching billions of parameters, they capture intricate linguistic nuances and showcase human-level proficiency across diverse applications. Large foundation models have garnered attention for their capacity to adapt to new tasks and domains through a transfer learning approach called fine-tuning [4, 5]. Fine-tuning these models saves time and resources compared to training models from the ground up, especially for large models like GPT-3 with 175B+ parameters. The dawn of 6G technologies, boasting broad bandwidths (1 THz to 3 THz) and unprecedented data communication speeds [6], potentially reaching terabits per second, opens avenues for federated fine-tuning of these expansive models.

Figure 1: Intersection of Federated Learning (FL), Parameter-Efficient Fine-Tuning (PEFT), and Emulator-Assisted Tuning (EAT). The main contribution of this paper is to introduce Federated Parameter-Efficient Emulator-Assisted Tuning (FedPEAT) as a convergence of EAT, PEFT, and FL. EAT and Parameter-Efficient Emulator-Assisted Tuning (PEAT) are also terms coined in this paper.

One of the significant challenges associated with fine-tuning large language models lies in the distribution of data. Many real-world applications necessitate the utilization of data that resides on user devices, such as smartphones, laptops, and mobile edge devices, rather than on centralized servers. This need for decentralization hinders foundation model fine-tuning. Federated learning (FL) has emerged as a promising solution to this issue. Federated learning is a decentralized machine learning approach that enables privacy-preserving model training without the need to centralize data [7, 8, 9]. Instead of sending raw data to a central server, federated learning trains models directly on the user’s device. These models are then aggregated to create a global model, preserving data privacy while achieving the desired performance.

However, fine-tuning large language models is computationally intensive. Training a model with hundreds of millions or billions of parameters demands substantial computational resources, often beyond the reach of individual users or small organizations. This computational bottleneck can limit the widespread adoption of these models and impede their deployment in resource-constrained environments. Moreover, fine-tuning on local devices, such as smartphones or edge devices, is often not feasible due to their limited computational capabilities. Distributing the model fine-tuning process across devices while ensuring data privacy and model performance adds another layer of complexity. In response to these challenges, various methods have been explored to make fine-tuning of pre-trained models more efficient. Efforts in model tuning have extended to the realm of adapters [10, 11, 12], which encode task-specific representations within intermediate layers while preserving pre-training knowledge. Different Parameter-Efficient Fine-Tuning (PEFT) techniques have been proposed, encompassing approaches such as Low-Rank Adapters (LoRA) [13], prompt tuning [14, 15], prefix-tuning [16], adapters [12], P-tuning V2 [17], tuning embedding layer inputs [18], tuning hidden states [19], and more. These methods aim to update or add only a limited number of model parameters, reducing resource requirements and allowing for the sharing of parameters from the pre-trained model. The authors of [20, 21, 22, 23, 24] recognized the effectiveness of PEFT techniques and proposed Federated-PEFT approaches.

However, large language models are often owned by research institutions or companies that bear the responsibility of maintaining and updating them. These model owners typically cannot directly share the entire model with external devices due to various reasons, including privacy concerns, intellectual property rights, and the potential for misuse. The lack of easy sharing mechanisms hampers the democratization of large language models and their use in applications that require continuous updates and fine-tuning. As a result, there is a need to develop mechanisms that allow model owners to collaborate with external parties or distribute portions of models securely. Federated fine-tuning for downstream tasks on local devices often necessitates knowledge of the entire model’s weights, potentially raising privacy concerns. Furthermore, the process of fine-tuning and deploying foundation models can pose significant resource challenges due to their substantial parameter sizes [25, 26]. Xiao et al. [27] proposed an approach to fine-tune large foundation models using the combination of an emulator, which is a compressed version of a subset of the original large foundation model, and an adapter, which holds the trainable weights to be shared. Nevertheless, these authors do not consider federated, collaborative tuning between devices. Ding et al. [28] introduced an approach that involves model compression and an emulator-adapter-like strategy for collaborative tuning of large vision models in a device-server setting. Kuang et al. [29] proposed FedOT, which is a federated version of offsite tuning. Although Kuang et al. [29] briefly touch upon an architecture similar to our proposed Federated Parameter-Efficient Emulator-Assisted Tuning (FedPEAT), they do not provide detailed discussions.

Figure 2: FedPEAT with Adaptive control overview. This figure shows how the Adaptive control orchestrator makes decisions on important parameters, such as device selection, emulator compression parameter, transmission bandwidth and power to facilitate the FedPEAT process.

Overview of FedPEAT with adaptive control

Proposed EAT structure

In addressing the pressing issues of model and data privacy and ownership, as well as the imperative need for memory- and computation-efficient downstream model fine-tuning, we propose a novel Emulator-Assisted Tuning (EAT) structure which generalizes the offsite tuning approach introduced by Xiao et al. [27] to encompass all possible combinations of adapter and emulator configurations for large foundation model fine-tuning. Our proposed EAT structure offers the flexibility to adapt the adapter and emulator to the specific requirements of a given application. The adapter and emulator can take any form, whether encompassing layers within a transformer architecture, a multi-layer perceptron, or any other neural network structure. The emulator can have a variable number of neural network layers, a variable number of nodes per layer, and even variable arrangements of transformer attention units. This adaptability ensures that the model can be fine-tuned efficiently across a wide spectrum of tasks, from simple to complex.

Expansion to PEAT architecture

In the field of model tuning, various Parameter-Efficient Fine Tuning (PEFT) methods such as Low-rank Adapters (LoRA) [13], prompt tuning [14], and adapters [10, 11, 12] have been explored to make fine-tuning of pre-trained models more efficient. We combine EAT and Parameter-Efficient Fine-Tuning (PEFT) to present Parameter-Efficient Emulator-Assisted Tuning (PEAT).

FedPEAT framework

We extend the use of PEAT into the domain of Federated Learning (FL) and introduce a novel framework, Federated Parameter-Efficient Emulator-Assisted Tuning (FedPEAT). This integration addresses model and data privacy concerns by eliminating the need for the model owner to transmit the entire model to the client and for the client to send its local data to the model owner. It also substantially improves the memory and computational efficiency of collaborative, downstream federated model fine-tuning. We illustrate the intersection of our proposed EAT, PEAT, and FedPEAT in Figure 1.

FedPEAT adaptive control mechanism

To optimize and streamline this adaptive combination of adapters and emulators, we propose coupling them with an adaptive control mechanism. This mechanism employs a deep reinforcement learning orchestrator to control critical hyper-parameters, such as emulator model compression ratios, adapter parameter-efficient fine-tuning parameters, and even device selection for participation in collaborative federated learning during each iteration (shown in Figure 2). This integration facilitates the efficient orchestration of resources, ensuring that the fine-tuning process remains memory, computation, and communication-efficient. This orchestration ensures that participating devices possess the necessary computational resources to carry out fine-tuning effectively. This contribution is essential in guaranteeing the successful application of our model adaptation and fine-tuning technique in real-world, resource-constrained environments.

Server-Device collaborative tuning

The FedPEAT framework is applicable to collaborative FL settings that differ in who contributes data. We note two distinct cases. The first case involves FL where all data resides on mobile edge devices (i.e., clients) and the central server holds no data. In this scenario, model tuning is entirely performed on the clients, while the server’s role is restricted to aggregating adapter module parameters. The second case entails federated learning where data is distributed across both client devices and a central server. Fine-tuning occurs on both client devices and the server, presenting a more complex but realistic setting that highlights the adaptability and versatility of our proposed framework. In our experiments, we consider the special case of the FedPEAT framework in which the server possesses data and partakes in the collaborative federated foundation model fine-tuning process instead of acting purely as an aggregator. Through these experiments, we aim to demonstrate the practical applicability and efficacy of our approach.

Parameter-Efficient Emulator-Assisted Tuning (PEAT) Sub-units

Emulator

The emulator represents a collection of neural network weights meticulously designed to mimic the behavior of the original foundation model. Through the compression of extensive neural network knowledge into a more compact architecture, emulators aim to deliver performance that closely rivals their larger counterparts while dramatically reducing computational and storage requirements. The decision to share emulators with client devices, rather than the original foundation model, serves a dual purpose: firstly, it safeguards the proprietary nature of model ownership by obviating the need to divulge the complete model to local devices; secondly, it empowers local devices to store and undertake model fine-tuning using a significantly smaller-sized emulator. In essence, an emulator serves as a streamlined and resource-efficient rendition of a more extensive model, crafted through techniques such as pruning [30], layer drop [31], or knowledge distillation [32]. Importantly, our approach employs emulators with fixed-parameter values, without fine-tuning, to encapsulate the bulk of knowledge and information derived from pre-trained foundation models.

Adapter

Adapters are modular additions to pre-existing foundation models such as large language models (LLMs), designed to facilitate task-specific adaptations with minimal modifications to the original model [27]. Essentially, an adapter is a smaller set of neural network weights with tunable parameters that encodes information at the user device for downstream task fine-tuning. The smaller adapter size serves two main purposes. Firstly, the adapter is designed to be a plug which can be conveniently placed at the end of the original foundation model at the server and also at the end of the emulator on the local devices. Secondly, the smaller adapter size reduces adapter transmission costs. By only tuning the parameters of these added layers, one can harness the generalized capabilities of large models while efficiently tailoring them for specific tasks.
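As a toy illustration of tuning only the adapter while the upstream weights stay fixed, the sketch below trains a single scalar adapter weight on top of a frozen function standing in for the emulator. The function names, the scalar model, the learning rate, and the data are all hypothetical choices made for illustration and are not taken from the paper.

```python
def emulator(x):
    """Frozen stand-in for the emulator: its weight (2.0) is never updated."""
    return 2.0 * x


def train_adapter(samples, lr=0.01, epochs=50):
    """Fit a scalar adapter weight w so that w * emulator(x) matches y."""
    w = 0.0  # the only trainable parameter in this toy model
    for _ in range(epochs):
        for x, y in samples:
            h = emulator(x)          # forward pass through the frozen part
            err = w * h - y          # error of the adapter output
            w -= lr * 2.0 * err * h  # gradient step on err**2 w.r.t. w only
    return w


# Toy data generated by y = 6 * x, so the ideal adapter weight is 3.
samples = [(1.0, 6.0), (2.0, 12.0), (3.0, 18.0)]
w = train_adapter(samples)
print(round(w, 4))
```

Only `w` would need to be transmitted after such a local update, which is the property that keeps adapter communication costs low.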

PEFT integration

PEFT methods like LoRA [13] and Adapter [12] can significantly reduce the number of trainable parameters, and consequently save memory, while achieving performance comparable to that of a model tuned without PEFT approaches [27]. The integration of PEFT methods is seamless: they can be applied directly to the adapter module in each federated learning iteration.
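To give a sense of the savings involved, the following sketch counts trainable parameters for a single weight matrix under full fine-tuning versus a LoRA-style low-rank update, in which the update is factored into two matrices of rank r. The layer dimensions and rank below are illustrative assumptions, not values reported in the paper.

```python
def dense_params(d_out, d_in):
    """Trainable parameters when the full d_out x d_in matrix is tuned."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters of a rank-`rank` update B @ A,
    with B of shape (d_out, rank) and A of shape (rank, d_in)."""
    if not 1 <= rank <= min(d_out, d_in):
        raise ValueError("rank must be between 1 and min(d_out, d_in)")
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, chosen only for illustration.
d_out = d_in = 4096
rank = 8
full = dense_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(full, lora, full // lora)
```

For these assumed sizes the low-rank update has 256 times fewer trainable parameters than the full matrix, which also shrinks what must be uploaded in each federated round.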

Federated Parameter-Efficient Emulator-Assisted Tuning (FedPEAT) Framework

Figure 3: Emulator-Assisted Tuning generalized to three cases. The figure illustrates how the neural network structures at the server and local devices differ in each case. Case 1 represents our proposed FedPEAT framework. Case 2 represents the integration of Federated Learning and PEFT. Case 3 represents a traditional Federated Learning scenario.

Emulators and Adapters

The server houses a foundation model $M_{\theta_g}$, while each user device (UE), labeled by index $n$, receives and holds:

  • An adapter $A_{\phi_n}$ specifically tuned for the downstream task.

  • An emulator $E_{\theta_n}$, which is a tailored version of the foundation model, represented by $E_{\theta_n} = f(M_{\theta_g} - A_{\phi_n})$.

The adapter, denoted as $A$, aligns with the definition put forth by [27], comprising sets of layers embedded within the foundation model’s architecture. These layers feature tunable parameters, specifically designed to facilitate model fine-tuning by encoding new information from downstream tasks. In contrast, the emulator, denoted by $E$, encapsulates a version of the original foundation model that may have undergone modifications. The emulator is derived after the removal of the adapter layers and serves as a guiding framework for tuning the adapter parameters. The parameter values of the emulator are fixed, and the emulator aims to emulate the large foundation model. The transformation function $f(\cdot)$, in this context, refers to model compression algorithms such as layer dropping [33] and model pruning [30]. Let $\omega$ represent the weights that are collaboratively trainable on both the server and device. $\omega'_s$ refers to the untrainable weights specific to the server, excluding $\omega$. $\omega'_c$ denotes the untrainable weights specific to a device, distinct from $\omega$.

Given this, we can generalize emulator-assisted tuning to three cases:

  • Case 1: $\omega'_s \neq \omega'_c \neq \varnothing$: This is our proposed, more generalized framework, in which we permit various user devices (UEs) to employ distinct emulators, denoted as $E_{\theta_n}$. These emulators correspond to the untrainable weights on a device, $\omega'_c$, which are maintained at fixed values. Similarly, the subset of the model with untrainable parameters, $M'_\theta$, corresponds to $\omega'_s$. This flexibility is particularly important since UEs frequently operate with constrained storage and computational resources. Therefore, emphasizing the efficient decompression and adaptiveness of the foundation model becomes essential.

  • Case 2: $\omega'_s = \omega'_c \neq \varnothing$: This scenario is a special case of Case 1. Here, the emulator designated for UE $n$ aligns with the static parameters of the overarching foundation model (i.e., $\omega'_s = \omega'_c$). In this setup, we combine Federated Learning (FL) training with Parameter-Efficient Fine-Tuning (PEFT) techniques, reflecting strategies showcased in previous research such as [34, 35, 21, 23].

  • Case 3: $\omega'_s = \omega'_c = \varnothing$: This is another special case of Case 1. In this scenario, all participants utilize the adjustable parameters of the global foundation model, $\omega$. Essentially, no weights remain untrainable beyond the collaboratively trainable ones (i.e., $\omega'_s = \omega'_c = \varnothing$). This methodology closely mirrors the conventional federated learning (FL) paradigm, where individual model parameters are aggregated to shape the global model.

The details of the cases are further illustrated in Figure 3.
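The emulator construction $E_{\theta_n} = f(M_{\theta_g} - A_{\phi_n})$ can be sketched with layers represented as plain labels. The sketch assumes that the adapter consists of the last few layers and that $f$ is a uniform layer drop keeping evenly spaced layers. Both are illustrative choices: the framework allows other adapter placements and other compression methods, and the helper names `split_model` and `layer_drop` are hypothetical.

```python
def split_model(layers, num_adapter_layers):
    """Split a layer list into (frozen remainder, trainable adapter).

    Assumes the adapter is formed by the last `num_adapter_layers` layers.
    """
    if not 0 < num_adapter_layers < len(layers):
        raise ValueError("adapter must be a non-empty proper part of the model")
    cut = len(layers) - num_adapter_layers
    return layers[:cut], layers[cut:]


def layer_drop(frozen_layers, keep):
    """Compress the frozen part by keeping `keep` evenly spaced layers."""
    n = len(frozen_layers)
    if not 0 < keep <= n:
        raise ValueError("keep must be between 1 and the number of layers")
    if keep == 1:
        return [frozen_layers[0]]
    indices = [round(i * (n - 1) / (keep - 1)) for i in range(keep)]
    return [frozen_layers[i] for i in indices]


# Hypothetical 12-layer foundation model; the last 2 layers form the adapter.
layers = [f"block{i}" for i in range(12)]
frozen, adapter = split_model(layers, num_adapter_layers=2)

# Distinct emulator sizes per UE (Case 1); UE 2 keeps every frozen layer,
# which corresponds to Case 2.
keep_per_ue = {1: 4, 2: 10}
device_models = {
    n: layer_drop(frozen, keep) + adapter for n, keep in keep_per_ue.items()
}
print(device_models[1])
```

In this sketch every UE shares the same adapter layers, while the frozen part it receives varies with the device's assumed capacity.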

Tuning Process

The server model, denoted as $M_{\theta_g}$, can be decomposed into two primary components: the untrainable subset of weights of the foundation model, $M'_\theta$, and the adapter $A_\phi$. After this decomposition, the server model is expressed as $M'_\theta \circ A_\phi$, with the symbol “$\circ$” signifying the neural network connections between $M'_\theta$ and $A_\phi$. It is important to note that the arrangement of the layers of $M'_\theta$ and $A_\phi$ is flexible and can be configured in various orders. Emulator-assisted tuning (EAT) generalizes all emulator-adapter-based configurations: it includes those proposed by [27] and extends to cases beyond them, such as the “Vertical” splitting of the foundation model [28]. Furthermore, the term “offsite” in [27] covers only single-device tuning and not a multi-device collaborative training scenario. Our proposed EAT approach generalizes the emulator and adapter approach to collaborative tuning between multiple devices. In addition, Xiao et al. [27] do not consider a collaborative fine-tuning scenario in which datasets are stored at the server and the server partakes in collaborative fine-tuning, as proposed by [28]. The emulator-to-be, $M'_\theta$, can be customized to create an emulator $E_{\theta_n}$ specific to UE $n$, taking into account UE $n$’s hardware configuration and the conditions of its environment. Subsequently, this tailored emulator $E_{\theta_n}$ is distributed to the respective UE. In our work, we extend our proposed PEAT approach to a collaborative, federated model fine-tuning context and establish the Federated Parameter-Efficient Emulator-Assisted Tuning (FedPEAT) framework.

We denote $\mathcal{N} = \{1, 2, \ldots, N\}$ and $\mathcal{T} = \{1, 2, \ldots, T\}$ as the UE set and the iteration set for the training. At the start of the first iteration, an adapter $A^0_{\phi_n}$ with randomly initialized parameter values is disseminated to each UE $n$, so that the complete user device model for UE $n$ is $E^0_{\theta_n} \circ A^0_{\phi_n}$. The server orchestrator then determines the user selection $\{U_n^t \mid \forall n \in \mathcal{N}\}$ for participation in the current update, where $U_n^t = 0$ signifies non-participation and $U_n^t = 1$ indicates participation. Each selected UE then carries out emulator-assisted fine-tuning with its local dataset $D^t_n$ and updates the parameter values of its adapter to produce $A^1_{\phi_n}$ with the assistance of the emulator. Each selected UE $n$ then uploads its adapter parameters to the server for adapter parameter aggregation as follows:

$$A^{t+1}_{\phi_g} = \frac{1}{\sum\limits_{n \in \mathcal{N}: U_n^t = 1} |D^t_n|} \cdot \sum_{n=1}^{N} \left( |D^t_n| \cdot A^t_{\phi_n} \right), \tag{1}$$

where $|D^t_n|$ is the size of the data being trained on at UE $n$. The server will then disseminate this global adapter $A^{t+1}_{\phi_g}$. The above-mentioned tuning process proceeds for further iterations until model convergence, or as defined by a specific criterion. This process is summarized in Algorithm 1.
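The aggregation in Eq. (1) is a data-size-weighted average of adapter parameters, which can be sketched in plain Python with each adapter flattened into a list of floats. The sketch assumes that only selected UEs ($U_n^t = 1$) contribute to the numerator, matching the normalization in the denominator. The function name and the example values are hypothetical.

```python
def aggregate_adapters(adapters, data_sizes, selected):
    """Data-size-weighted average of adapter parameters over selected UEs.

    adapters:   list of flattened adapter parameter lists, one per UE
    data_sizes: list of |D_n^t| values, one per UE
    selected:   list of U_n^t flags (1 = participating, 0 = not)
    """
    total = sum(d for d, u in zip(data_sizes, selected) if u == 1)
    if total == 0:
        raise ValueError("no selected UE holds any data")
    aggregated = [0.0] * len(adapters[0])
    for params, d, u in zip(adapters, data_sizes, selected):
        if u != 1:
            continue  # assumption: non-selected UEs contribute nothing
        for i, p in enumerate(params):
            aggregated[i] += d * p / total
    return aggregated


# Three UEs with two adapter parameters each; UE 3 is not selected.
adapters = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
data_sizes = [10, 30, 50]
selected = [1, 1, 0]
global_adapter = aggregate_adapters(adapters, data_sizes, selected)
print(global_adapter)
```

When the server also holds data and takes part in fine-tuning, its adapter and dataset size could be appended to the same lists as one more contributor under this sketch.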

FedPEAT Adaptive Control Mechanism

The FedPEAT framework facilitates the collaborative, federated fine-tuning of models for downstream tasks. However, successful adoption of the framework requires adaptive control of the hyper-parameters related to the FedPEAT framework, the PEFT methods, and the FL process. As there are potentially many variables of concern and diverse scenarios, we designed an adaptive control system that can control multiple variables simultaneously. To illustrate FedPEAT with the adaptive control mechanism, we consider a situation where UEs move within a fixed geographic space, so that the channel gain between each user device and the server changes over time.

Algorithm 1 FedPEAT with Adaptive Control
Input: device set $\mathcal{N}=\{1,2,\ldots,N\}$, initial global adapter $A_{\phi_{g}}^{0}$, foundation model $M_{\theta_{g}}$
1:  for iteration $t\in\{1,2,\ldots,T\}$ do
2:     Adaptive control to decide user selection, emulator sizes, downlink bandwidth, and transmission power resources for every user: $\{(U_{n}^{t},E_{\theta_{n}}^{t},B_{n}^{t},P_{n}^{t}) \mid \forall n\in\mathcal{N}\}$, based on the section "FedPEAT Adaptive Control Mechanism"
3:     Transmit global adapter $A_{\phi_{g}}^{t}$, and if the current emulator on user devices needs to be changed, also transmit changed emulators $\{E_{\theta_{n}}^{t} \mid \forall n\in\mathcal{N}:E_{\theta_{n}}^{t}\neq E_{\theta_{n}}^{t-1}\}$ to devices
4:     for device $n\in\mathcal{N}$ in parallel do
5:        for epoch $\nu=1$ to $V$ do
6:           $A^{t}_{\phi_{n}}[\nu+1]=\text{ModelTuning}(D^{t}_{n},E^{t}_{\theta_{n}},A^{t}_{\phi_{n}}[\nu])$.
7:        end for
8:        $A^{t}_{\phi_{n}}\leftarrow A^{t}_{\phi_{n}}[V]$,
9:        Transmit local $A^{t}_{\phi_{n}}$ to server.
10:     end for
11:     Perform adapter parameter aggregation with equation (1) to obtain $A^{t+1}_{\phi_{g}}$
12:  end for
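The control flow of Algorithm 1 can be sketched as a toy Python loop. This is only an illustration under strong assumptions: the emulator is omitted, the adaptive control step is replaced by a caller-supplied function `control`, and `toy_tune` is a hypothetical stand-in for ModelTuning that pulls a single scalar parameter toward the local data mean. None of these names come from the paper.

```python
from typing import Callable, List, Sequence

Adapter = List[float]


def fedpeat_rounds(
    global_adapter: Adapter,
    datasets: Sequence[Sequence[float]],
    control: Callable[[int], List[int]],
    tune: Callable[[Sequence[float], Adapter], Adapter],
    rounds: int,
    local_epochs: int,
) -> Adapter:
    """Toy version of the Algorithm 1 loop: control, local tuning, aggregation."""
    for t in range(1, rounds + 1):
        selected = control(t)  # stands in for the adaptive control step
        updates, sizes = [], []
        for n, data in enumerate(datasets):
            if selected[n] != 1:
                continue  # assumption: unselected devices skip the round
            local = list(global_adapter)  # device receives the global adapter
            for _ in range(local_epochs):
                local = tune(data, local)
            updates.append(local)
            sizes.append(len(data))
        total = sum(sizes)
        if total == 0:
            continue  # nobody trained this round; keep the previous adapter
        # Data-size-weighted aggregation, as in equation (1).
        global_adapter = [
            sum(s * u[i] for s, u in zip(sizes, updates)) / total
            for i in range(len(global_adapter))
        ]
    return global_adapter


def toy_tune(data: Sequence[float], adapter: Adapter) -> Adapter:
    # Hypothetical stand-in for ModelTuning: one gradient-like step that moves
    # each parameter halfway toward the local data mean.
    mean = sum(data) / len(data)
    return [a - 0.5 * (a - mean) for a in adapter]


final = fedpeat_rounds(
    [0.0], [[1.0, 1.0], [3.0, 3.0]], lambda t: [1, 1], toy_tune,
    rounds=20, local_epochs=2,
)
print(final)
```

With two equally sized toy datasets whose means are 1 and 3, the aggregated parameter settles near their weighted average, which is the behavior the aggregation step is meant to produce.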

Adaptive Control Scenario

In each iteration, the server performs several key orchestration tasks. It begins by selecting users $\{U_{n}^{t} \mid n\in\mathcal{N}\}$ for participation in the FL process. Next, it determines the emulator for each user $\{E_{\theta_{n}}^{t} \mid n\in\mathcal{N}\}$, arranges downlink bandwidth resources $\{B_{n}^{t} \mid n\in\mathcal{N}\}$, and allocates downlink transmission power $\{P_{n}^{t} \mid n\in\mathcal{N}\}$ for effective UE engagement. The Frequency Division Multiple Access (FDMA) communication technique is adopted to mitigate interference between UEs associated with different edge servers. Similar to the setting in [36], we assume the central server allocates its dedicated bandwidth to the UEs associated with it. According to Shannon's formula, the achievable transmission rate between UE $n$ and the central server can be formulated as

$$r_{n}^{t}(B_{n}^{t},P_{n}^{t})=B_{n}^{t}\log_{2}\left(1+\frac{g_{n}^{t}P_{n}^{t}}{B_{n}^{t}\sigma_{0}^{2}}\right), \tag{2}$$

where $r_{n}^{t}(B_{n}^{t},P_{n}^{t})$ indicates that the transmission rate $r_{n}^{t}$ is a function of $B_{n}^{t}$ and $P_{n}^{t}$, $g_{n}^{t}$ is the channel gain between UE $n$ and the central server, with Rician fading as the small-scale fading [37], and $\sigma_{0}$ is the power spectral density of additive white Gaussian noise. Since the total bandwidth the central server can allocate is $B_{\max}$, we have $\sum_{n\in\mathcal{N}:U_{n}^{t}=1}B_{n}^{t}\leq B_{\max}$.
We also optimize the power allocated by the central server for the downlink transmission of the emulator and adapters. Since the total power the central server can allocate is $P_{\max}$, we have $\sum_{n\in\mathcal{N}:U_{n}^{t}=1}P_{n}^{t}\leq P_{\max}$. As the adapter is small and negligible in the context of emulator-assisted tuning, we assume it is transmitted via a dedicated channel and ignore the uplink energy and time overhead for adapters. The updated emulator is transmitted only if the current emulator is designated for modification. We introduce an indicator function $\chi[x]$ that equals 1 when event $x$ occurs and 0 otherwise. The transmission delay from the server to UE $n$ within one iteration is then given by:

$$d_{n,trans}^{t}(U_{n}^{t},E_{\theta_{n}}^{t},B_{n}^{t},P_{n}^{t})=U_{n}^{t}\times\chi[E_{\theta_{n}}^{t}\neq E_{\theta_{n}}^{t-1}]\times\frac{D(E_{\theta_{n}}^{t})}{r_{n}^{t}}, \tag{3}$$

where $D(E_{\theta_{n}}^{t})$ is the allocated emulator size. The time for one round of local training and model transmission for UE $n$ is then $Q_{n}^{t}=d_{n,comp}^{t}+d_{n,trans}^{t}$, where $d_{n,comp}^{t}$ is the model fine-tuning time for iteration $t$ of local training at UE $n$, measured empirically.
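Equations (2) and (3) and the round time $Q_{n}^{t}$ translate directly into a few lines of Python. The sketch below is illustrative only: the function names, the units (Hz, W, bits, seconds), and the numeric values are assumptions chosen so that the signal-to-noise term equals 3, and they are not taken from the paper's experiments.

```python
import math


def transmission_rate(bandwidth: float, power: float, gain: float, sigma0: float) -> float:
    """Achievable downlink rate following Eq. (2)."""
    # The denominator uses sigma0 squared, exactly as written in Eq. (2).
    snr = gain * power / (bandwidth * sigma0 ** 2)
    return bandwidth * math.log2(1.0 + snr)


def transmission_delay(selected: int, emulator_changed: bool,
                       emulator_size: float, rate: float) -> float:
    """Downlink delay following Eq. (3).

    The delay is nonzero only for a selected UE whose emulator changed.
    """
    if selected != 1 or not emulator_changed:
        return 0.0
    return emulator_size / rate


def round_time(compute_delay: float, trans_delay: float) -> float:
    """Q = d_comp + d_trans for one UE in one iteration."""
    return compute_delay + trans_delay


# Hypothetical values: 1 MHz of bandwidth, 2 W of power, an 8 Mbit emulator.
rate = transmission_rate(bandwidth=1e6, power=2.0, gain=1.5e-6, sigma0=1e-6)
delay = transmission_delay(selected=1, emulator_changed=True, emulator_size=8e6, rate=rate)
print(rate, delay, round_time(compute_delay=10.0, trans_delay=delay))
```

Because the indicator and the selection flag multiply the delay, a UE that keeps its previous emulator contributes only its computation time to $Q_{n}^{t}$.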

Therefore, we formulate the problem as follows:

$$\min_{\{(U_{n}^{t},E_{\theta_{n}}^{t},B_{n}^{t},P_{n}^{t}) \mid \forall n\in\mathcal{N},\forall t\in\mathcal{T}\}}\Bigg\{\xi_{p}\cdot\frac{1}{N}\sum_{t=1}^{T}\sum_{n=1}^{N}p^{t}_{n}+\xi_{f}\cdot\sum_{t=1}^{T}\max_{n\in\mathcal{N}}Q^{t}_{n}+\xi_{s}\cdot\frac{1}{N}\cdot\sum_{t=1}^{T}\sum_{n=1}^{N}\chi[E_{\theta_{n}}^{t}\neq E_{\theta_{n}}^{t-1}]\Bigg\}, \tag{Adaptive Control Scenario}$$
Subject to:
$$m^{t}_{n}\leq\frac{1}{q}\cdot m^{t}_{\max,n},\ \forall n\in\mathcal{N}, \tag{4a}$$
$$\sum_{t=2}^{T}\chi[E_{\theta_{n}}^{t}\neq E_{\theta_{n}}^{t-1}]\leq T\cdot\frac{1}{c},\ \forall n\in\mathcal{N},\ \forall t\in\mathcal{T}, \tag{4b}$$
$$\sum_{n\in\mathcal{N}}B_{n}^{t}\leq B_{\max},\ \forall t\in\mathcal{T}, \tag{4c}$$
$$\sum_{n\in\mathcal{N}}P_{n}^{t}\leq P_{\max},\ \forall t\in\mathcal{T}. \tag{4d}$$

In the objective function (Adaptive Control Scenario) above, $p^{t}_{n}$ is the perplexity score achieved by UE $n$ at iteration $t$, a measure of how well a language model predicts a set of data (lower is better). $Q^{t}_{n}$ is the log of the total time taken for a single round of adapter and emulator transmission, and $\chi[E_{\theta_{n}}^{t}\neq E_{\theta_{n}}^{t-1}]$ represents the emulator exchange count. $\xi_{p}$, $\xi_{f}$, and $\xi_{s}$ are the weights that balance these objectives. $m^{t}_{n}$ is the memory space taken by the model assigned to UE $n$, while $m^{t}_{\max,n}$ is the available memory capacity of device $n$ at iteration $t$. $c$ and $q$ are numerical constants.
Constraint (4a) ensures that the total memory consumed on any device in each round does not exceed a predefined fraction of its memory capacity. Constraint (4b) prevents excessive emulator switches to reduce transmission costs. Constraints (4c) and (4d) limit the total bandwidth and power resources available from the server. Essentially, the objective function (Adaptive Control Scenario) aims to minimize three quantities: the sum of perplexity scores across $T$ iterations, which is synonymous with achieving a quicker rate of model tuning convergence; the maximum training time among all devices in the federated fine-tuning process; and the emulator exchange count. It does so by optimizing the emulator compression parameter, device selection vector, bandwidth selection vector, and downlink power selection vector. For simplicity in our demonstration, we use $\frac{1}{N}\sum_{n=1}^{N}p^{t}_{n}$ as an estimate of the global $p^{t}$. The rationale behind this formulation is to expedite model convergence while limiting the maximum total transmission and computation delay among UEs. Additionally, this approach ensures that the sizes of both the emulator and adapters remain within a practical fraction of the local devices' memory capacity.
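To make the formulation concrete, the following Python sketch evaluates the weighted objective and checks constraints (4a) to (4d) for a given schedule. It assumes every quantity is stored as a list of per-iteration rows with one entry per UE; the function names and all numeric values are hypothetical.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]  # rows are iterations t, columns are UEs n


def objective(perplexity: Matrix, round_times: Matrix, switched: Matrix,
              xi_p: float, xi_f: float, xi_s: float) -> float:
    """Weighted sum of perplexity, worst-case round time, and emulator switches."""
    n_ues = len(perplexity[0])
    perf = sum(sum(row) for row in perplexity) / n_ues
    delay = sum(max(row) for row in round_times)  # slowest UE per iteration
    switches = sum(sum(row) for row in switched) / n_ues
    return xi_p * perf + xi_f * delay + xi_s * switches


def feasible(memory: Matrix, memory_cap: Matrix, switched: Matrix,
             bandwidth: Matrix, power: Matrix,
             q: float, c: float, b_max: float, p_max: float) -> bool:
    """Check constraints (4a)-(4d) for a full schedule."""
    n_iters, n_ues = len(switched), len(switched[0])
    for t in range(n_iters):
        # (4a): memory use stays within a 1/q fraction of capacity.
        if any(memory[t][n] > memory_cap[t][n] / q for n in range(n_ues)):
            return False
        # (4c) and (4d): per-iteration bandwidth and power budgets.
        if sum(bandwidth[t]) > b_max or sum(power[t]) > p_max:
            return False
    for n in range(n_ues):
        # (4b): switches counted from the second iteration onward.
        if sum(switched[t][n] for t in range(1, n_iters)) > n_iters / c:
            return False
    return True


# Hypothetical schedule with T = 2 iterations and N = 2 UEs.
ppl = [[10.0, 12.0], [8.0, 10.0]]
times = [[1.0, 2.0], [1.5, 0.5]]
sw = [[1, 1], [0, 1]]
print(objective(ppl, times, sw, xi_p=1.0, xi_f=2.0, xi_s=4.0))
print(feasible([[2, 3], [2, 3]], [[8, 8], [8, 8]], sw,
               [[5, 5], [4, 4]], [[1, 1], [1, 1]],
               q=2.0, c=2.0, b_max=10.0, p_max=2.0))
```

Note that the delay term takes the maximum over UEs in each iteration, so a single slow device dominates that round's cost regardless of how fast the others are.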

Deep reinforcement learning approach

We use a deep reinforcement learning approach to drive our adaptive control mechanism, because the proposed problem is highly sequential and is a mixed-integer non-linear programming problem.

State

To effectively execute the FedPEAT approach, we include the following variables in the state: (1) the user device-server channel gain $g_{n}^{t}$, which is required to compute $r^{t}_{n}$; (2) the user device available memory capacity $m^{t}_{\max,n}$; and (3) the emulator exchange count $\chi[E_{\theta_{n}}^{t}\neq E_{\theta_{n}}^{t-1}]$ of UE $n$, which tracks the number of times UE $n$ has undergone emulator exchange. The output of each actor branch is appended to the state that is fed into the next actor branch, as shown in Figure 4. This additional information includes the user selection $U^{t}_{n}$, bandwidth selection $B^{t}_{n}$, and power selection $P^{t}_{n}$ at the current time step.

Action

In this study, the agent action space comprises four actions: (1) the UE selection vector $\{U_{n}^{t} \mid \forall n\in\mathcal{N}\}$, (2) the downlink bandwidth selection vector $\{B_{n}^{t} \mid \forall n\in\mathcal{N}\}$, (3) the downlink power selection vector $\{P_{n}^{t} \mid \forall n\in\mathcal{N}\}$, and (4) the choice of emulator compression parameter $\{E_{\theta_{n}}^{t} \mid \forall n\in\mathcal{N}\}$ for each device, stored in a vector.
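The state and action description above can be illustrated with a small Python sketch in which each branch receives the state extended by the outputs of the branches before it. The branch order (user selection, bandwidth, power, emulator) is an assumption inferred from the state description, and the hand-written rules below are hypothetical placeholders for the learned actor branches.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Branch = Callable[[List[float]], List[float]]
N = 2  # number of UEs in this toy example


def base_state(gains: Sequence[float], memory_caps: Sequence[float],
               exchange_counts: Sequence[float]) -> List[float]:
    """Concatenate per-UE channel gains, memory capacities, and exchange counts."""
    return [*gains, *memory_caps, *exchange_counts]


def run_branches(state: List[float],
                 branches: Sequence[Tuple[str, Branch]]) -> Dict[str, List[float]]:
    """Run branches in order, appending each action vector to the next input."""
    actions: Dict[str, List[float]] = {}
    current = list(state)
    for name, branch in branches:
        action = branch(current)
        actions[name] = action
        current = current + list(action)  # later branches see earlier decisions
    return actions


def select_users(s: List[float]) -> List[float]:
    return [1.0 if g > 0.5 else 0.0 for g in s[:N]]  # pick UEs with good channels


def split_bandwidth(s: List[float]) -> List[float]:
    sel = s[-N:]  # the selection vector appended by the previous branch
    k = sum(sel)
    return [u / k if k else 0.0 for u in sel]


def split_power(s: List[float]) -> List[float]:
    return list(s[-N:])  # reuse the bandwidth shares as power shares


def pick_emulator(s: List[float]) -> List[float]:
    caps = s[N:2 * N]  # memory capacities sit in the second block of the state
    return [0.0 if cap < 6.0 else 1.0 for cap in caps]  # index of emulator size


state = base_state(gains=[0.9, 0.2], memory_caps=[4.0, 8.0], exchange_counts=[0.0, 1.0])
print(run_branches(state, [("U", select_users), ("B", split_bandwidth),
                           ("P", split_power), ("E", pick_emulator)]))
```

Feeding earlier decisions forward lets, for example, the bandwidth branch avoid allocating resources to a UE that the selection branch has already excluded.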

Reward

We formulate the reward function according to our objective function and assign the reinforcement learning agent the following rewards in each iteration:

$$R^{t}_{d}=-\xi_{f}\cdot\frac{1}{T}\sum_{t=1}^{T}\max_{n\in\mathcal{N}}Q^{t}_{n},\quad R^{t}_{p}=-\xi_{p}\frac{1}{TN}\sum_{t=1}^{T}\sum_{n=1}^{N}p^{t}_{n},\quad R^{t}_{s}=-\xi_{s}\cdot\frac{1}{TN}\cdot\sum_{t=1}^{T}\sum_{n=1}^{N}\chi[E_{\theta_{n}}^{t}\neq E_{\theta_{n}}^{t-1}]. \tag{5}$$

In addition, we assign the agent a very large penalty $\varkappa$ when (1) the combined memory size of emulator $E_{\theta_{n}}^{t}$ and adapter $A_{\phi_{n}}^{t}$ exceeds an allowable fraction of local device $n$'s memory capacity, in accordance with constraint (4a), or (2) the emulator exchange count $\chi[E_{\theta_{n}}^{t}\neq E_{\theta_{n}}^{t-1}]$ exceeds a given fraction of the total number of iterations, in accordance with constraint (4b).
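A minimal Python sketch of the reward terms in equation (5) and the constraint penalties is given below. Summing the three terms into one scalar and subtracting the penalty once per violated constraint are assumptions made for illustration; the paper does not spell out how the terms are combined, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # rows are iterations t, columns are UEs n


def reward_terms(perplexity: Matrix, round_times: Matrix, switched: Matrix,
                 xi_p: float, xi_f: float, xi_s: float) -> Tuple[float, float, float]:
    """Return (R_d, R_p, R_s) following Eq. (5)."""
    n_iters, n_ues = len(perplexity), len(perplexity[0])
    r_d = -xi_f * sum(max(row) for row in round_times) / n_iters
    r_p = -xi_p * sum(sum(row) for row in perplexity) / (n_iters * n_ues)
    r_s = -xi_s * sum(sum(row) for row in switched) / (n_iters * n_ues)
    return r_d, r_p, r_s


def penalized_reward(terms: Tuple[float, float, float],
                     memory_violation: bool, switch_violation: bool,
                     penalty: float) -> float:
    """Sum the reward terms and subtract the penalty for each violated constraint."""
    total = sum(terms)  # assumption: the three terms are simply added
    if memory_violation:  # constraint (4a) broken
        total -= penalty
    if switch_violation:  # constraint (4b) broken
        total -= penalty
    return total


# Hypothetical values with T = 2 iterations and N = 2 UEs.
terms = reward_terms([[10.0, 12.0], [8.0, 10.0]], [[1.0, 2.0], [1.5, 0.5]],
                     [[1, 1], [0, 1]], xi_p=1.0, xi_f=2.0, xi_s=4.0)
print(terms, penalized_reward(terms, memory_violation=True,
                              switch_violation=False, penalty=1000.0))
```

Making the penalty much larger than the magnitude of the regular terms means a single constraint violation outweighs any gain in perplexity or delay, which steers the agent toward feasible actions.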

Reinforcement Learning Algorithm

We adopt the Proximal Policy Optimization (PPO) algorithm developed by OpenAI [38], which is an advancement over traditional policy gradient algorithms. In sequential problems such as reinforcement learning, even minor parameter adjustments can have a profound impact on performance, which makes parameter tuning challenging. PPO addresses delicate and noisy advantage estimates by updating the policy cautiously: it regulates policy adjustments with either a Kullback–Leibler (KL) divergence penalty or the clipped surrogate objective given below. Furthermore, PPO uses importance sampling [39], employing separate policies for training and data collection, which enhances overall efficiency. The loss function for the actor is formally defined as follows [38]:

$$L^{CLIP}(\varphi)=\mathbb{E}^{t}\left[\min\left(\mathfrak{r}^{t}(\varphi)\varpi^{t},\text{clip}\left(\mathfrak{r}^{t}(\varphi),1-\epsilon,1+\epsilon\right)\varpi^{t}\right)\right].$$

In this context, $\varphi$ represents the policy parameters, and $\mathbb{E}^{t}$ signifies the empirical expectation over the trajectory. $\mathfrak{r}^{t}$ represents the probability ratio of the current policy to the old policy, $\varpi^{t}$ denotes the estimated advantage at time $t$, and $\epsilon$ denotes the clip value. This clipping mechanism acts as a safeguard, preventing significant bias and ensuring that the policy remains within a trusted range.
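The clipped surrogate objective can be computed in a few lines of plain Python. The sketch below assumes the probability ratios and advantage estimates for a batch of time steps are already available as lists; the function names and the sample values are illustrative.

```python
from typing import Sequence


def clip(x: float, low: float, high: float) -> float:
    """Clamp x to the interval [low, high]."""
    return max(low, min(high, x))


def clipped_surrogate(ratios: Sequence[float], advantages: Sequence[float],
                      epsilon: float) -> float:
    """Empirical mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    terms = [
        min(r * a, clip(r, 1.0 - epsilon, 1.0 + epsilon) * a)
        for r, a in zip(ratios, advantages)
    ]
    return sum(terms) / len(terms)


# Hypothetical batch: one ratio above the clip range, one below, one inside.
print(clipped_surrogate([1.5, 0.5, 1.0], [1.0, -1.0, 2.0], epsilon=0.2))
```

In the first sample the ratio 1.5 with a positive advantage is capped at 1.2, and in the second the ratio 0.5 with a negative advantage is evaluated at 0.8, so in both cases the objective stops rewarding moves further away from the old policy. The quantity is an objective to be maximized; when training with a gradient-descent optimizer, its negative is typically used as the loss.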

Figure 4: Our proposed SABPPO algorithm and architecture. Figure illustrates the underlying actor and critic architecture, their interaction with the environment and model update process.
Algorithm 2 SABPPO adaptive control algorithm
Initialize: critic parameter $\phi$, critic target parameter $\phi'$, actor $\mathcal{A}$ with parameter $\varphi$ and data-sampling parameter $\varphi'$, initial state $s^{t}_{g}=s^{1}_{g}$;
1:  for iteration $=1,2,\ldots$ do
2:     for action $=1,2,\ldots$ do
3:        $\mathcal{A}_{\text{action}}$ chooses action vector $a^{t}_{\text{action}}$ according to $\pi_{\varphi'}(a^{t}_{\text{action}}\mid s^{t}_{\text{action}})$, based on the SABPPO section.
4:        $s^{t}_{\text{action}+1}=\text{concatenate}\{s^{t}_{\text{action}},a^{t}_{\text{action}}\}$.
5:     end for
6:     Get $R^{t}_{d}$, $R^{t}_{p}$, $R^{t}_{s}$ based on equation (5) and the next state $s^{t+1}_{1}$ from the environment.
7:     Collect trajectories $\tau=\{s^{t}_{g},\mathfrak{a}^{t},s^{t+1}_{g},R^{t}_{d},R^{t}_{p},R^{t}_{s}\}$ iteratively until the end of the episode.
8:     $s^{t}_{g}\leftarrow s^{t+1}_{g}$;
9:     Compute advantage $\varpi^{t}$ based on equation (8)
10:     for $o=1,2,\ldots,O$ do
11:        Group trajectories into batches
12:        for each batch do
13:           Compute gradient for actor: $\nabla\varphi$ based on equation (6)
14:           Apply gradient ascent on $\varphi$ using $\nabla\varphi$
15:           Update critic model through back-propagation of the loss based on equation (7)
16:        end for
17:        Update parameters of critic target network $\phi'$ with parameters of critic network $\phi$ every $C$ iterations, where $C$ denotes the interval for critic parameter update;
18:     end for
19:  end for
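Steps 2 to 5 of Algorithm 2 can be sketched in Python as follows. This is an illustration under stated assumptions: the branch policies are uniform-random placeholders rather than trained network heads, and the function names, the example state, and the branch sizes are hypothetical.

```python
import random


def make_uniform_branch(num_choices):
    """Placeholder policy head (assumption): ignores its input and samples
    one of num_choices discrete actions uniformly at random."""
    def branch(augmented_state, rng):
        return rng.randrange(num_choices)
    return branch


def select_actions(state, branches, rng):
    """Run the branches in order, as in steps 2 to 5 of Algorithm 2.

    Each branch sees the original state concatenated with every action
    chosen by the branches before it.
    """
    augmented = list(state)                  # s_1^t
    actions = []
    for branch in branches:
        action = branch(augmented, rng)      # a_k^t drawn from pi(. | s_k^t)
        actions.append(action)
        augmented = augmented + [action]     # s_{k+1}^t = concat(s_k^t, a_k^t)
    return actions, augmented


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical sizes for the user, bandwidth, power and
    # emulator-compression branches.
    sizes = (10, 5, 5, 4)
    branches = [make_uniform_branch(n) for n in sizes]
    actions, final_state = select_actions([0.1, 0.2], branches, rng)
    print(actions, final_state)
```

Because each branch conditions on the earlier choices, later decisions such as bandwidth or power can depend on which users were selected, while the actor remains a single agent.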

Single-Agent Action Branching Proximal Policy Optimization (SABPPO)

To ensure scalable federated fine-tuning of foundation models, multiple variables require optimization. As the number of optimization variables increases, the number of actions that must be explicitly represented grows exponentially with the action dimensionality: the total number of actions equals $\prod_{\mathfrak{d}=1}^{\mathfrak{D}}|\mathfrak{a}_{\mathfrak{d}}|$, where $\mathfrak{D}$ is the total number of action dimensions and $\mathfrak{a}_{\mathfrak{d}}$ is the action space of action dimension $\mathfrak{d}$. Traditional deep reinforcement learning architectures do not handle this exponential growth well. We propose a novel Single-Agent Action Branching Proximal Policy Optimization (SABPPO) algorithm, inspired by the action branching approach proposed in [40]. The SABPPO architecture builds on the state-of-the-art Proximal Policy Optimization algorithm [38] and distributes the representation of the action controllers across individual network branches, while maintaining a shared decision module among them that encodes a latent representation of the input and helps coordinate the branches. This approach lets the total number of network outputs grow linearly with action dimensionality, as opposed to the combinatorial growth in current discrete-action algorithms. SABPPO extends the PPO architecture with a single critic and a single actor.
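To make the scaling argument concrete, the short Python sketch below compares the number of outputs a single flat policy head would need with the number needed by a branched architecture. The branch sizes are hypothetical placeholders and are not taken from the experiments.

```python
import math


def joint_action_count(branch_sizes):
    """Outputs needed when every action combination is represented
    explicitly: the product of the per-dimension action-space sizes."""
    return math.prod(branch_sizes)


def branched_output_count(branch_sizes):
    """Outputs needed when each action dimension has its own branch:
    the sum of the per-dimension action-space sizes."""
    return sum(branch_sizes)


if __name__ == "__main__":
    sizes = [10, 8, 8, 4]  # hypothetical sizes of four action dimensions
    print(joint_action_count(sizes), branched_output_count(sizes))
```

Adding a fifth dimension with 8 choices would multiply the flat count by 8 but add only 8 outputs to the branched count.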
In each FL iteration, the actor's user-selection branch takes the state $s^{t}_{1}$ as input and produces user-selection information, which is concatenated with the state to form $s^{t}_{2}$. This concatenated state is then input to the actor's bandwidth-selection branch, which generates bandwidth-selection information for concatenation with the state to form $s^{t}_{3}$. The same process occurs with the power-selection branch, which produces power-selection information for concatenation with the state to form $s^{t}_{4}$. Lastly, the concatenated state is fed into the actor's emulator-compression selection branch, yielding emulator-compression selection information (shown in Figure 4). The SABPPO actor is updated as follows:

$$\Delta\varphi=\mathbb{E}^{t}\left[\nabla_{\varphi}\min\left\{\mathfrak{r}^{t}(\varphi)\varpi^{t},\ \text{clip}\left(\mathfrak{r}^{t}(\varphi),1-\epsilon,1+\epsilon\right)\varpi^{t}\right\}\right], \tag{6}$$

while the SABPPO critic is updated as follows:

$$L^{t}(\phi)=\left[V_{\phi}(s^{t}_{g})-\left(\varpi^{t}+V_{\phi'}(s^{t}_{g})\right)\right]^{2}. \tag{7}$$

$s^{t}_{g}$ is the global state, which is the concatenation of all states, $V$ is the state-value function, and $\phi$ and $\phi'$ are the state-value function parameter and target state-value function parameter, respectively. Here, $\mathfrak{r}^{t}(\varphi)$ represents the ratio between the two policies: $\mathfrak{r}^{t}(\varphi)=\frac{\pi_{\varphi}(a^{t}_{1},a^{t}_{2},a^{t}_{3},a^{t}_{4}\mid s^{t}_{g})}{\pi_{\varphi'}(a^{t}_{1},a^{t}_{2},a^{t}_{3},a^{t}_{4}\mid s^{t}_{g})}$. The advantage function $\varpi^{t}$ is calculated via Generalized Advantage Estimation (GAE) [41]:

$$\varpi^{t}=\delta^{t}+(\gamma\lambda)\delta^{t+1}+\ldots+(\gamma\lambda)^{\bar{T}-1}\delta^{t+\bar{T}-1},\quad\text{where}\quad\delta^{t}=R^{t}+\gamma V_{\phi'}(s^{t+1}_{g})-V_{\phi'}(s^{t}_{g}), \tag{8}$$

$\bar{T}$ is the length of the trajectory segment, $\lambda$ is the trace decay parameter, and $\gamma$ is the discount rate.
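The quantities in equations (7) and (8), together with the joint probability ratio, can be sketched in plain Python as below. Several assumptions apply: the three reward terms are already combined into one scalar per step, the target critic's value estimates are passed in as lists, the joint policy factorizes over the sequential branches by the chain rule (so the joint log-probability is the sum of the per-branch log-probabilities), and the default values of `gamma` and `lam` are placeholders rather than the paper's settings. All function names are hypothetical.

```python
import math


def gae_advantages(rewards, values, next_values, gamma=0.99, lam=0.95):
    """Advantages per equation (8), summed to the end of the segment.

    rewards[t]     : scalar reward R^t (assumed already combined)
    values[t]      : target-critic estimate V_phi'(s_g^t)
    next_values[t] : target-critic estimate V_phi'(s_g^{t+1})
    """
    deltas = [r + gamma * nv - v
              for r, v, nv in zip(rewards, values, next_values)]
    advantages = [0.0] * len(deltas)
    running = 0.0
    # Backward recursion: A^t = delta^t + (gamma * lam) * A^{t+1}.
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages


def joint_ratio(new_branch_log_probs, old_branch_log_probs):
    """Probability ratio of the joint action under the current policy
    versus the data-sampling policy, from per-branch log-probabilities."""
    return math.exp(sum(new_branch_log_probs) - sum(old_branch_log_probs))


def critic_loss(value, target_value, advantage):
    """Squared error of equation (7) for a single time step."""
    return (value - (advantage + target_value)) ** 2
```

Working in log-probabilities avoids multiplying several small branch probabilities together before forming the ratio.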

Figure 5 panels: (a) FL vs FedPEAT (delay); (b) FL vs FedPEAT (emulator switch); (c) FL vs FedPEAT (perplexity); (d) algorithm training time; (e) log(delay); (f) emulator exchange count; (g) perplexity; (h) reward.
Figure 5: Comparison between FL and FedPEAT, and comparison between adaptive control algorithms. Figures 5(a), 5(b), and 5(c) illustrate the performance difference between FL and FedPEAT with regard to delay, emulator exchange count, and perplexity, respectively. Figure 5(d) illustrates the time taken for model training for each adaptive control algorithm. Figures 5(e), 5(f), 5(g), and 5(h) illustrate the performance of each adaptive control algorithm across the training process, in terms of log(delay), emulator exchange count, perplexity, and reward.

Numerical Experiments

Experiment configuration

We substantiate our study with several experiments showing that the FedPEAT framework with adaptive control works, and we compare our proposed framework against federated full model fine-tuning (Fed-FT). To simplify our workflow and for ease of demonstration, we used numerical results from [27] to facilitate our experiments. We used the OPT-1.3B [2] large language model as the foundation model, which has 1208 million parameters and occupies approximately 2.63 gigabytes (GB) of storage. We used the layer-drop approach [33] for emulator compression. We adopted the perplexity versus layer-drop retention results from [27] and approximated the relationship by $P=25.2\varrho^{2}-43.1\varrho+31.9$ for $0<\varrho\leq 1$, with an $R^{2}$ score of $0.97$, where $\varrho$ is the layer-drop retention ratio. We also adopted the perplexity improvement from using LoRA reported in [27], and set the change in perplexity upon application of LoRA to $-0.78$. We designed the adapter to have 2 trainable layers at both the top and the bottom of the neural network. We assume that model storage scales linearly with the number of model parameters. We set $T$, the number of federated fine-tuning rounds in an episode, to 100. We consider a scenario with 1 main server and 10 user devices. In each round of federated fine-tuning, our adaptive control orchestrator selects 5 devices for fine-tuning. To facilitate the collaborative training scenario, the server holds $30\%$ of the total training data. We set our large penalty $\varkappa$ to $-50$.
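The perplexity approximation and the linear storage assumption described above can be written as two small Python helpers. This is a sketch under assumptions: the LoRA change of -0.78 is applied additively to the fitted perplexity, storage is taken to be directly proportional to the parameter count (anchored at 1208 million parameters and 2.63 GB), and the function names are hypothetical.

```python
FULL_MODEL_PARAMS = 1208e6   # OPT-1.3B parameter count stated in the text
FULL_MODEL_GB = 2.63         # approximate storage stated in the text


def emulator_perplexity(retention, lora_delta=-0.78):
    """Fitted perplexity P = 25.2 * rho^2 - 43.1 * rho + 31.9 for a
    layer-drop retention ratio rho in (0, 1], plus the LoRA change
    (assumed additive)."""
    if not 0.0 < retention <= 1.0:
        raise ValueError("retention ratio must satisfy 0 < rho <= 1")
    base = 25.2 * retention ** 2 - 43.1 * retention + 31.9
    return base + lora_delta


def storage_gb(num_params):
    """Storage estimate assuming it is proportional to parameter count."""
    if num_params < 0:
        raise ValueError("parameter count must be non-negative")
    return FULL_MODEL_GB * num_params / FULL_MODEL_PARAMS
```

Under these assumptions, full retention without LoRA gives a fitted perplexity of about 14.0, and a retention ratio of 0.5 with LoRA gives about 15.87.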

As we consider communication over 6G networks, we select the bandwidth $B$ from a range between $7$ and $20$ GHz and set the noise $\sigma^{2}$ to $-174$ dBm. We initialize and constrain the main server power output to $(0.0,15.0)$ W. User-device channel gains are calculated based on path loss [42] and user distance from the server. $\xi_{p}$, $\xi_{f}$, and $\xi_{s}$ are set to 5, -10, and 25, respectively; these values were derived empirically with the aim of balancing the variables in the objective function. We adopt the Adam optimizer [43] for the algorithms implemented in our study. The models are trained for 5,000,000 steps and evaluated every 5000 steps.

Experiment results

Our findings reveal that FedPEAT with adaptive control significantly outperforms Fed-FT in terms of both communication and computation delays. Specifically, in a single round of computation and communication, Fed-FT incurs a delay 4.60$\times$ longer than that of FedPEAT with adaptive control, as illustrated in Figure 5(a). FedPEAT with adaptive control achieves this improvement despite the need for emulator exchanges, which occur 2.10 times on average in every 10 iterations (Figure 5(b)) and are not required in Fed-FT. The ability of FedPEAT with adaptive control to mitigate delays underscores its effectiveness in enhancing the efficiency of FL systems, even when accounting for additional emulator exchanges. However, it is essential to note that FedPEAT with adaptive control exhibits a perplexity score 3.49 points higher than that of Fed-FT.

Subsequently, we extend our analysis to compare our proposed SABPPO adaptive control algorithm with baseline algorithms, namely iterative Reinforcement Learning Agents (iterRL) and Heterogeneous Action Proximal Policy Optimization (HAPPO). IterRL employs independent actors and critics, while HAPPO, based on Centralized Training and Decentralized Execution (CTDE) [44], features three separate actors sharing a single critic model. Our experimental results, as depicted in Figure 5(h), showcase the superiority of the SABPPO algorithm in terms of model convergence and reward. Specifically, SABPPO achieves a reward of -168, outperforming HAPPO and iterRL, which attain -225 and -214, respectively, after 5,000,000 training steps. This superior performance is corroborated by lower log(delay) values (Figure 5(e)) of -4.02 for SABPPO compared to -1.33 and -3.27 for HAPPO and iterRL, respectively. Additionally, SABPPO exhibits fewer emulator exchanges (2.10$\times$) compared to HAPPO (4.91$\times$) and iterRL (3.95$\times$) (Figure 5(f)), as well as lower perplexity scores (15.03 compared to 17.86 and 16.60 for HAPPO and iterRL, respectively), as seen in Figure 5(g).

Furthermore, the SABPPO framework demonstrates a significantly shorter model training time (0.308 seconds) compared to HAPPO (0.534 seconds) and iterRL (0.549 seconds), as shown in Figure 5(d). These findings collectively underscore the efficiency gains achieved by the SABPPO algorithm across various performance metrics when compared to baseline algorithms in federated learning scenarios.

Discussion and Conclusion

In summary, the deployment and refinement of large foundation models present multifaceted challenges, necessitating solutions that address collaborative training, model ownership, and computational constraints to fully unlock their potential. In response to these challenges, we extend the offsite tuning paradigm to introduce Emulator-Assisted Tuning (EAT) and integrate it with Parameter-Efficient Fine-Tuning (PEFT), resulting in the creation of Parameter-Efficient Emulator-Assisted Tuning (PEAT). This novel approach is further extended to the FL domain, resulting in Federated Parameter-Efficient Emulator-Assisted Tuning (FedPEAT).

Our proposed FedPEAT framework, featuring adaptive control and a unique fusion of adapters and emulators, represents a pioneering avenue for advancing model privacy and optimizing memory-efficient downstream federated fine-tuning. Adapters, endowed with trainable neural network parameters, tailor models for specific tasks, while emulators offer compressed, fixed-parameter representations. This innovative approach not only addresses concerns regarding model privacy by eliminating the need to transmit complete models to edge devices but also substantially enhances memory and computational efficiency. Its adaptability to diverse neural network architectures is complemented by an adaptive control mechanism, employing deep reinforcement learning to optimize critical hyper-parameters, thus ensuring efficient resource orchestration.

In the broader context, our FedPEAT framework, empowered by the SABPPO adaptive control optimizer, facilitates federated fine-tuning by leveraging Parameter-Efficient Fine-Tuning (PEFT) and Emulator-Assisted Tuning (EAT) methodologies. This framework upholds user data privacy through Federated Learning and protects model owner intellectual property (IP) through EAT. Furthermore, our experimental results demonstrate that FedPEAT with adaptive control significantly outperforms federated full model fine-tuning (Fed-FT) in terms of communication and computation efficiency, because FedPEAT reduces the foundation model memory footprint and the number of parameters to tune. This efficiency gain enables the inclusion of low-resource devices in federated fine-tuning of foundation models. While FedPEAT with adaptive control exhibits a higher perplexity score than Fed-FT (3.49 points in our experiments), this discrepancy is outweighed by the substantial reduction in communication and computation overhead. Notably, the adaptive control mechanism can be tuned to prioritize lower perplexity should specific preferences or requirements dictate such adjustments.

Moreover, our investigation reveals that the training time required for the proposed SABPPO adaptive control optimizer is significantly lower than that achieved by the HAPPO and iterRL algorithms. This efficiency gain is attributed to the streamlined training process of a single actor and a single critic in SABPPO, as opposed to the more resource-intensive training requirements of multiple actors and critics in HAPPO and iterRL. This reduction in the number of neural networks trained concurrently contributes significantly to the observed decrease in training time. In conclusion, our comprehensive framework, FedPEAT with adaptive control, stands as a pioneering solution, providing a nuanced balance between model performance, privacy, and resource efficiency in the complex landscape of federated learning and large model fine-tuning.

References

  • [1] Brown, T. et al. Language models are few-shot learners. Advances in Neural Information Processing Systems 33, 1877–1901 (2020).
  • [2] Radford, A. et al. Language models are unsupervised multitask learners. OpenAI blog 1, 9 (2019).
  • [3] Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
  • [4] Wei, J. et al. Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652 (2021).
  • [5] Muennighoff, N. et al. Crosslingual generalization through multitask finetuning. arXiv preprint arXiv:2211.01786 (2022).
  • [6] Letaief, K. B., Chen, W., Shi, Y., Zhang, J. & Zhang, Y.-J. A. The roadmap to 6g: Ai empowered wireless networks. IEEE communications magazine 57, 84–90 (2019).
  • [7] McMahan, B., Moore, E., Ramage, D., Hampson, S. & y Arcas, B. A. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, 1273–1282 (PMLR, 2017).
  • [8] Konečnỳ, J. et al. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492 (2016).
  • [9] Bonawitz, K. et al. Towards federated learning at scale: System design. Proceedings of Machine Learning and Systems 1, 374–388 (2019).
  • [10] Rebuffi, S.-A., Bilen, H. & Vedaldi, A. Learning multiple visual domains with residual adapters. Advances in Neural Information Processing Systems 30 (2017).
  • [11] He, J., Zhou, C., Ma, X., Berg-Kirkpatrick, T. & Neubig, G. Towards a unified view of parameter-efficient transfer learning. arXiv preprint arXiv:2110.04366 (2021).
  • [12] Houlsby, N. et al. Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning, 2790–2799 (PMLR, 2019).
  • [13] Hu, E. J. et al. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021).
  • [14] Qin, G. & Eisner, J. Learning how to ask: Querying LMs with mixtures of soft prompts. arXiv preprint arXiv:2104.06599 (2021).
  • [15] Lester, B., Al-Rfou, R. & Constant, N. The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691 (2021).
  • [16] Li, X. L. & Liang, P. Prefix-tuning: Optimizing continuous prompts for generation. arXiv preprint arXiv:2101.00190 (2021).
  • [17] Liu, X. et al. P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks. arXiv preprint arXiv:2110.07602 (2021).
  • [18] An, S. et al. Input-tuning: Adapting unfamiliar inputs to frozen pretrained models. arXiv preprint arXiv:2203.03131 (2022).
  • [19] Liu, H. et al. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. Advances in Neural Information Processing Systems 35, 1950–1965 (2022).
  • [20] Zhang, Z. et al. FedPETuning: When federated learning meets the parameter-efficient tuning methods of pre-trained language models. In Annual Meeting of the Association of Computational Linguistics 2023, 9963–9977 (Association for Computational Linguistics (ACL), 2023).
  • [21] Zhao, H., Du, W., Li, F., Li, P. & Liu, G. FedPrompt: Communication-efficient and privacy-preserving prompt tuning in federated learning. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–5 (IEEE, 2023).
  • [22] Zhang, J. et al. Towards building the federated gpt: Federated instruction tuning. arXiv preprint arXiv:2305.05644 (2023).
  • [23] Guo, T., Guo, S., Wang, J., Tang, X. & Xu, W. PromptFL: Let federated participants cooperatively learn prompts instead of models-federated learning in age of foundation model. IEEE Transactions on Mobile Computing (2023).
  • [24] Cai, D., Wu, Y., Wang, S., Lin, F. X. & Xu, M. FedAdapter: Efficient federated learning for modern NLP. In ACM 29th Annual International Conference on Mobile Computing and Networking (MobiCom) (2023).
  • [25] Smith, S. et al. Using deepspeed and megatron to train megatron-turing nlg 530b, a large-scale generative language model. arXiv preprint arXiv:2201.11990 (2022).
  • [26] Xiao, G. et al. SmoothQuant: Accurate and efficient post-training quantization for large language models. In International Conference on Machine Learning, 38087–38099 (PMLR, 2023).
  • [27] Xiao, G., Lin, J. & Han, S. Offsite-tuning: Transfer learning without full model. arXiv preprint arXiv:2302.04870 (2023).
  • [28] Ding, Y. et al. DC-CCL: Device-cloud collaborative controlled learning for large vision models. arXiv preprint arXiv:2303.10361 (2023).
  • [29] Kuang, W. et al. FederatedScope-LLM: A comprehensive package for fine-tuning large language models in federated learning. arXiv preprint arXiv:2309.00363 (2023).
  • [30] Han, S., Mao, H. & Dally, W. J. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149 (2015).
  • [31] Sajjad, H., Dalvi, F., Durrani, N. & Nakov, P. On the effect of dropping layers of pre-trained transformer models. Computer Speech & Language 77, 101429 (2023).
  • [32] Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
  • [33] Zhang, M. & He, Y. Accelerating training of transformer-based language models with progressive layer dropping. Advances in Neural Information Processing Systems 33, 14011–14023 (2020).
  • [34] Zhang, Z. et al. FedPETuning: When federated learning meets the parameter-efficient tuning methods of pre-trained language models. In Findings of the Association for Computational Linguistics: ACL 2023, 9963–9977, DOI: 10.18653/v1/2023.findings-acl.632 (Association for Computational Linguistics, Toronto, Canada, 2023).
  • [35] Zhang, J. et al. Towards building the federated GPT: Federated instruction tuning. arXiv preprint arXiv:2305.05644 (2023).
  • [36] Lim, W. Y. B. et al. Dynamic edge association and resource allocation in self-organizing hierarchical federated learning networks. IEEE Journal on Selected Areas in Communications 39, 3640–3653 (2021).
  • [37] Xiao, C., Zheng, Y. R. & Beaulieu, N. C. Statistical simulation models for rayleigh and rician fading. In IEEE International Conference on Communications, 2003. ICC’03., vol. 5, 3524–3529 (IEEE, 2003).
  • [38] Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017).
  • [39] Kahn, H. & Harris, T. E. Estimation of particle transmission by random sampling. National Bureau of Standards Applied Mathematics Series 12, 27–30 (1951).
  • [40] Tavakoli, A., Pardo, F. & Kormushev, P. Action branching architectures for deep reinforcement learning. In Proceedings of the aaai conference on artificial intelligence, vol. 32 (2018).
  • [41] Schulman, J., Moritz, P., Levine, S., Jordan, M. & Abbeel, P. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438 (2015).
  • [42] Erceg, V. et al. An empirically based path loss model for wireless channels in suburban environments. IEEE Journal on selected areas in communications 17, 1205–1211 (1999).
  • [43] Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
  • [44] Lowe, R. et al. Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in neural information processing systems 30 (2017).

Acknowledgements

This research is supported in part by Nanyang Technological University (NTU), the NTU-Wallenberg AI, Autonomous Systems and Software Program (WASP) Joint Project; NTU Startup Grant; the Singapore Ministry of Education Academic Research Fund under Grant Tier 1 RG97/20, Grant Tier 1 RG24/20 and Grant Tier 2 MOE2019-T2-1-176.

Author contributions statement

T.J.C., WH.Y., Y.L., and J.Z. contributed equally, wrote the main manuscript text, and developed the software.

Competing interests statement

The authors declare no competing interests.

Legends

Figure 1. Intersection of Federated Learning (FL), Parameter-Efficient Fine-Tuning (PEFT), and Emulator-Assisted Tuning (EAT). The main contribution of this paper is Federated Parameter-Efficient Emulator-Assisted Tuning (FedPEAT), a convergence of EAT, PEFT, and FL. EAT and Parameter-Efficient Emulator-Assisted Tuning (PEAT) are also terms coined in this paper.

Figure 2. Overview of FedPEAT with adaptive control. The figure shows how the adaptive control orchestrator decides on key parameters, such as device selection, the emulator compression parameter, and transmission bandwidth and power, to facilitate the FedPEAT process.

Figure 3. Emulator-Assisted Tuning generalized to three cases. The figure illustrates how the neural network structures at the server and local devices differ in each case. Case 1 represents our proposed FedPEAT framework. Case 2 represents the integration of Federated Learning and PEFT. Case 3 represents a traditional Federated Learning scenario.

Figure 4. Our proposed SABPPO algorithm and architecture. The figure illustrates the underlying actor and critic architecture, their interaction with the environment, and the model update process.

Figure 5. Comparison between FL and FedPEAT, and comparison between adaptive control algorithms. Figures 5(a), 5(b), and 5(c) illustrate the performance difference between FL and FedPEAT with regard to delay, emulator exchange count, and perplexity, respectively. Figure 5(d) illustrates the model training time of each adaptive control algorithm. Figures 5(e), 5(f), 5(g), and 5(h) illustrate the performance of each adaptive control algorithm across the training process, in terms of log(delay), emulator exchange count, perplexity, and reward, respectively.

Algorithm 1. FedPEAT with adaptive control mechanism.

Algorithm 2. SABPPO adaptive control algorithm.