License: CC BY 4.0
arXiv:2310.07471v2 [cs.NI] 25 Mar 2024

The Implications of Decentralization in Blockchained Federated Learning: Evaluating the Impact of Model Staleness and Inconsistencies

Francesc Wilhelmi$^{\star}$, Nima Afraz$^{\sharp}$, Elia Guerra$^{\flat}$, and Paolo Dini$^{\flat}$


Corresponding author: [email protected]. This work has been partially funded by the Spanish project PID2020-113832RB-C22(ORIGIN)/MCIN/AEI/10.13039/50110001103 and by FREE6G - TSI-063000-2021-151 from the Ministerio de Asuntos Económicos y Transformación Digital and the European Union – NextGenerationEU under the framework of the “Plan de Recuperación, Transformación y Resiliencia” and the “Mecanismo de Recuperación y Resiliencia”.
$^{\star}$Radio Systems Research, Nokia Bell Labs, Stuttgart, Germany
$^{\sharp}$CONNECT Centre, School of Computer Science, University College Dublin, Dublin, Ireland
$^{\flat}$Sustainable Artificial Intelligence, Centre Tecnològic de Telecomunicacions de Catalunya, Barcelona, Spain
Abstract

Blockchain promises to enhance distributed machine learning (ML) approaches such as federated learning (FL) by providing further decentralization, security, immutability, and trust, which are key properties for enabling collaborative intelligence in next-generation applications. Nonetheless, the intrinsic decentralized operation of peer-to-peer (P2P) blockchain nodes leads to an uncharted setting for FL, whereby the concepts of FL round and global model become meaningless, as devices’ synchronization is lost without the figure of a central orchestrating server. In this paper, we study the practical implications of outsourcing the orchestration of FL to a democratic setting such as in a blockchain. In particular, we focus on the effects that model staleness and inconsistencies, endorsed by blockchains’ modus operandi, have on the training procedure held by FL devices asynchronously. Using simulation, we evaluate the blockchained FL operation by applying two different ML models (ranging from low to high complexity) on the well-known MNIST and CIFAR-10 datasets, respectively, and focus on the accuracy and timeliness of the solutions. Our results show the high impact of model inconsistencies on the accuracy of the models (up to a 35% decrease in prediction accuracy), which underscores the importance of properly designing blockchain systems based on the characteristics of the underlying FL application.

Index Terms:
blockchain, machine learning, decentralized federated learning, model inconsistencies, model staleness

I Introduction

I-A The decentralization of machine learning

Enabled by the advances in edge computing, the decentralization of artificial intelligence (AI) and machine learning (ML) unlocks a prominent paradigm where intelligence is brought closer to end-users. Decentralized AI allows real-time and near real-time applications to meet the ever-increasing latency requirements [1] thanks to the delay reduction achieved by eliminating the communication with a central node performing model inference. Moreover, decentralized AI can potentially save energy by reducing the burden of massive data transfers and by distributing the workload across many sites rather than hosting it in a data center [2].

To enable edge intelligence and address scalability issues in ML, federated learning (FL) emerged in 2016 as a powerful tool to train ML models in a distributed manner [3]. In FL, a set of participants (also referred to as FL clients or FL devices) train an ML model collaboratively by exchanging model parameters, rather than by exchanging training data explicitly. Following this approach, an FL algorithm—see, e.g., Federated Averaging (FedAvg) [4]—can potentially reduce the overheads of ML training and also enhance the privacy of its centralized counterpart. FL was initially defined around the operation of a central server, which is responsible for orchestrating the ML training procedure by iteratively retrieving ML model updates from FL clients, computing global ML model updates, and distributing the outputs back to FL devices. Given the potential weaknesses of the centralized setting [5], including security, bottlenecking, or straggling issues, alternative decentralized architectures for FL have been recently proposed [6].

To enable the decentralization of FL, blockchain technology [7] stands as an appealing approach, as blockchains provide secure, immutable, and trustworthy decentralized storage. The decentralized realization of FL through blockchain, referred to as blockchained FL (sometimes also referred to as FLchain [8]), provides trust via cryptographic proof to federated ecosystems where multiple (often unreliable) parties cooperate to train a shared model. The blockchained FL framework not only addresses centralization issues (e.g., single point of failure) but also provides complementary mechanisms that may boost FL settings, including effective ways of incentivizing FL participants (e.g., through native tokenization) to undertake ML model training [9].

I-B Challenges of blockchained federated learning

Figure 1: Overview of the procedures carried out by FL devices and blockchain nodes in blockchained FL.

Although blockchain enables security and trustworthiness in decentralized FL applications, its inherent decentralization entails some important implications that must not be disregarded. Figure 1 summarizes the blockchained FL operation by graphically showing the following main procedures (highlighted by the numbered circles in the figure):

  1. On-device computation at end devices: FL clients generate local ML model updates by training a model using their local data. The resulting local models are encapsulated in blockchain transactions.

  2. Exchange of transactions: The local model updates are submitted by FL devices to the blockchain. Blockchain nodes (e.g., miners) maintain a shared pool of transactions synchronized by exchanging and broadcasting transactions through peer-to-peer (P2P) messages.

  3. Blockchain writing: Blockchain miners take transactions from the shared pool to generate new blocks that update the status of the ledger. Each block corresponds to a newly computed aggregate model, built from FL client model updates.

  4. Block propagation and consensus: Blockchain nodes exchange blocks and enforce consensus rules to ensure that the ledger is consistent across the entire P2P network. The latest blocks are provided to FL devices to continue training the FL model iteratively.

Under ideal conditions in which information is instantaneously propagated without errors, the blockchain would keep track of the overall FL optimization process as in centralized FL, thus storing a global update in each block (equivalent to performing FL rounds). However, such ideal conditions do not hold in practice. First, the delays associated with information propagation in the blockchain, together with the decentralized nature of consensus, make the ledger inconsistent in different parts of the blockchain network. In addition, FL devices have heterogeneous computation and communication capabilities, which further contributes to the inconsistent usage of models for training.

In this work, we focus our attention on the two following issues derived from the operation of blockchained FL:

  • Issue #1 - Ledger inconsistencies: A ledger inconsistency (e.g., a fork in the main chain) arises as a result of the concurrent and decentralized mining operation. The effects of ledger inconsistencies on the FL operation are exemplified in the right-top part of Fig. 1. In the provided example, Miner #4 generates a valid block at $t_1$, but before such a block is propagated at $t_3$, Miner #5 generates another valid block at $t_2$ (notice that $t_3 > t_2$) with potentially different model updates. Until Miner #5’s chain is correctly updated (i.e., after Miner #4’s block is reaffirmed by another mined block), the forked block (shown in red) is used by Client #5 to generate a new local update at time $t_4$, thus leading to a model inconsistency.

  • Issue #2 - Model staleness: The delays for computing and securing local updates on the blockchain lead FL devices to potentially use outdated models for model training, i.e., models that are trained insufficiently compared to newer client updates but which are still to be processed by the blockchain. Model staleness may raise concerns in specific datasets and scenarios, provided that old model updates can negatively impact the global model’s accuracy. As an example of model staleness, in the right-bottom part of Fig. 1, Client #2 uses the model in Block #1 to generate a new local update at $t_1$. However, before the update from Client #2 is computed and secured on the blockchain (which entails computation, communication, and mining delays), newer blocks with fresher models are generated as a result of the asynchronous operation of other FL clients. Then, the question is whether the stale model update generated by Client #2 is still valid or not.
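As a rough illustration of Issue #2, staleness can be counted as the number of blocks appended between the block a client trained from and the block that finally stores its update. The helper name, the depth-based definition, and the choice of Block #4 as the inclusion block are assumptions made for this sketch; they are not the freshness metric that the paper defines.

```python
def staleness_in_blocks(base_block_depth: int, inclusion_block_depth: int) -> int:
    """Count the blocks mined between a client's base model and its stored update.

    Hypothetical helper: 0 means the update was built on the block that
    immediately precedes the one including it.
    """
    if inclusion_block_depth <= base_block_depth:
        raise ValueError("an update cannot be included before its base block")
    return inclusion_block_depth - base_block_depth - 1


# Client #2 trains on Block #1, but (by assumption) its update is only secured
# in Block #4, so Blocks #2 and #3 carried fresher models in the meantime.
stale = staleness_in_blocks(base_block_depth=1, inclusion_block_depth=4)
print(stale)
```

A miner or aggregation rule could use such a counter to down-weight or discard updates whose staleness exceeds a threshold, which is one possible answer to the validity question raised above.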

As described above, the blockchained FL setting leads to a set of issues that can potentially affect the learning procedure therein. To the best of our knowledge, these issues have been barely studied in the literature.

I-C Our contributions

This paper is an extended version of our previous work presented in [10]. It builds upon the same concepts and methodology to provide further characterization and insights into the blockchained FL setting and its inherent issues (highlighted in Section I-B). The specific contributions of this paper are as follows:

  • We provide a comprehensive and self-contained overview of blockchained FL, including its technological realization and the description of the issues associated with it.

  • We characterize the main implications that arise from blockchained FL, i.e., ledger inconsistencies and model staleness. In this regard, we compute the freshness of blocks and use it as a staleness metric in FL.

  • We provide a simulation tool that is the first of its kind for realistically capturing the blockchained FL operation. This tool, which is named BlockFLsim [11], integrates BlockSim [12] with PyTorch [13], thus allowing the simulation of blockchained FL applications.

  • We evaluate the blockchained FL approach through extensive simulations and assess the impact of ledger inconsistencies and model staleness on FL accuracy. Our evaluation is carried out for automatic image recognition in different scenarios. More specifically, we consider two different sub-problems, characterized by the MNIST [14] and the CIFAR-10 [15] datasets for the evaluation of blockchained FL.

I-D Structure of the document

The remainder of this paper is structured as follows. Section II provides an overview of the state-of-the-art solutions regarding decentralized realizations of FL and the integration of blockchain and FL. Section III describes background concepts on blockchain and FL as independent solutions and then delves into the blockchained FL paradigm. The system model is presented in Section IV, which is followed by the performance evaluation using simulations in Section V. Section VI provides insights on future research directions and Section VII concludes the paper with final remarks.

II Related Work

II-A Decentralized federated learning

The decentralization of ML procedures has been popularized in the past years as a result of the increased computational and storage capabilities of handheld devices and the increasing reluctance to share private data with a server. A proposal to distribute in a P2P fashion the well-known stochastic gradient descent (SGD) algorithm, normally used for training purposes, is presented in [16]. In [17], a fully decentralized mechanism was proposed to train ML models by leveraging one-hop communications between neighbor nodes. Similarly, gossip-based communications [18] were leveraged in [19] to train ML models in a P2P network. As the authors of [19] showed, besides improving robustness, gossip learning significantly reduces energy consumption while leading to the same performance as in centralized learning.

The idea of gossip learning has also been applied in decentralized FL. In particular, the work in [20] leveraged gossip communications to allow the exchange of model segments among workers, which were used to provide ML model updates in a decentralized manner. A decentralized version of FedAvg whereby clients communicate with their neighbors was also proposed and analyzed in [21]. A similar approach was presented in [22], where decentralized FL was realized through device-to-device (D2D) communications. In addition, the authors in [22] defined protocols for both analog and digital transmission modes, which were evaluated through simulations. Another decentralized FL solution was proposed in [23] to accommodate the specific needs of unmanned aerial vehicle (UAV) networks. An alternative mechanism for decentralizing FL can be found in [24], which was inspired by the BitTorrent protocol. More specifically, through the mechanism in [24], FL clients can request model updates from other devices on demand.

Between centralized and fully decentralized FL, other solutions relied on cloud-edge computing hierarchies [25, 26] or clustering capabilities [27] in order to improve the performance of FL. These classes of hybrid methods are useful to boost efficiency and mitigate the single point of failure and scalability issues of centralized FL, and at the same time prevent synchronization issues and model inconsistencies as in fully decentralized FL. For a more comprehensive overview of existing decentralized ML and FL solutions, we refer the interested reader to the surveys in [28, 29, 30].

II-B Blockchained federated learning

Blockchained FL was first introduced in [31, 8] as a prominent solution for decentralizing FL and replacing the figure of the centralized orchestrating server with a blockchain. Different types of solutions have been envisioned to realize blockchained FL [32, 33]. Important design considerations lie in where and how to deploy the blockchain, which largely impacts the cost and performance of the solution [34].

A common approach in the literature relies on mobile edge computing (MEC) servers, co-located within access points (APs) or base stations (BSs) connected to end-users, to perform blockchain operations such as transaction validation or mining. This approach is very convenient for fulfilling typical blockchain requirements, as edge servers are typically equipped with high computation and communication capabilities. In [32], for instance, a generic blockchained FL architecture based on edge computing was defined. Through this approach, FL devices (in charge of performing model training) can submit ML model updates to the closest AP/BS, where an edge server acts as a full blockchain node (it collects transactions and participates in the block mining). Other works following a similar approach can be found in [8, 35, 36, 37], while alternatives including cloud/fog computing solutions were considered in [38, 23, 39, 40].

Apart from the system’s architecture, the choice of blockchain type is critical to the desired performance and capabilities and, therefore, should be influenced by the underlying FL application and the degree of trust among participants. In this regard, public blockchains using consensus mechanisms such as Proof of Work (PoW) have been adopted in [37, 23] to accommodate fully decentralized applications. Conversely, consortium and private blockchains using Algorand or Practical Byzantine Fault Tolerance (PBFT) have been considered in [36, 41] to fit more restricted settings where a limited group of participants maintain the blockchain.

When it comes to the evaluation of blockchained FL applications, analytical models like the one in [31] have been widely adopted to derive the end-to-end latency model of a blockchained FL system. Similarly, end-to-end latency models were proposed in [42, 43] to characterize the communication, computation, and consensus delays in blockchained FL.

Analytical models like the abovementioned ones, while providing a good intuition of the performance of blockchained FL applications, typically assume unrealistic conditions like perfect synchronization or proper FL client scheduling (see, e.g., [6]). In contrast, blockchained FL implementations are better suited to an asynchronous setting in which FL clients operate independently. The asynchronous blockchained FL problem has been studied in [44, 45, 46] but, still, the impact that blockchain decentralization has on the FL operation has been little studied to date. For instance, in the literature, forks have been considered to affect the transaction confirmation time only (see, e.g., [31]), so their implications on the FL training procedure remain unclear. In this paper, we aim to cover this gap in the literature and study the issues associated with model staleness and ledger inconsistencies that naturally arise in practical blockchained FL realizations.

III Blockchain and Federated Learning: Preliminaries

In this section, we first introduce blockchain and FL as separate technologies, and we then delve into their confluence, leading to the blockchained FL setting.

III-A Blockchain

Figure 2: Blockchain’s protocol stack.

Blockchain is a distributed ledger technology where a set of P2P nodes holding their own copy of the ledger must agree on the history of timestamped transactions. Blockchain combines multiple concepts and technologies, including cryptographic primitives, to achieve relevant properties such as security, privacy, and immutability in a decentralized data-sharing framework. To showcase the main components and operations of blockchains, we resort to the blockchain protocol stack illustrated in Fig. 2 (other definitions have also been proposed in [47, 48]), which includes the following layers:

  • Application: The application layer deals with the decentralized applications (e.g., finance, supply chain, IoT [49]) running on the blockchain. This includes all the infrastructure and tools for generating and handling the decentralized application data (e.g., transactions, smart contracts) hosted in the blockchain and may include the support from user graphical interfaces, APIs, wallets, etc.

  • Data: The data layer is related to the way data is structured and stored in the blockchain. This layer applies concepts such as the chaining of blocks or digital signatures to enforce privacy, security, immutability, and transparency properties.

  • Network: The network layer allows blockchain nodes to communicate in order to exchange transactions and blocks, and also to enforce consensus. In this layer, procedures like node discovery or block/transaction exchanges are defined.

  • Consensus: The consensus layer establishes the rules to be applied by blockchain nodes in order to participate in the maintenance and update of the distributed ledger. Consensus mechanisms such as PoW establish the operations of blockchain miners for creating and accepting new blocks, thus being the core of the decentralized and concurrent synchronization of the ledger.

  • Infrastructure: Finally, the infrastructure layer defines the set of physical devices, connections, and operations within the underlying blockchain network. Depending on the type of consensus adopted, varying computational and storage capabilities might be required.

The data and the consensus layers constitute the core of blockchain technology, as they define the way information is structured and validated by participants, which is the key to providing security and immutability. As shown in Fig. 3, transactions (i.e., events updating the ledger) are grouped in blocks, each chained after the previous one, starting from the genesis block (with depth 0). Depending on the blockchain realization (e.g., Bitcoin, Ethereum, Hyperledger Fabric), blocks may carry different types of information, but in general, blocks include the following basic fields:

  • Nonce (number only used once): The number used to demonstrate a cryptographic proof within the PoW mining operation. In Bitcoin, a nonce is a 32-bit number that, when hashed with the rest of the headers of a block, meets the difficulty imposed by the consensus protocol (i.e., the resulting hash contains a predefined number of leading zeros).

  • Hash: Output of the hash function (e.g., SHA-256) when combining the headers of the block. The hash is used as cryptographic proof to authorize a given miner to add a new block to the ledger.

  • Timestamp (TS): The time the block was created.

  • Merkle root (MR): Typically, for building the hash of a block, transactions are organized in a Merkle tree to further guarantee immutability. The root of the Merkle tree is included in the header.

  • Body of transactions: Set of transactions generated by the application layer and which are included in the block by the miner.
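The block fields listed above can be made concrete with a small toy example. The sketch below is only illustrative: the `Block` class, its field names, the JSON header serialization, the single SHA-256 pass, the unbounded nonce, and the difficulty expressed as a number of leading hexadecimal zeros are all simplifying assumptions, and they do not reproduce the header format of Bitcoin or of any other real blockchain.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import List


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def merkle_root(transactions: List[str]) -> str:
    """Hash transactions pairwise, level by level, until one root remains."""
    if not transactions:
        return sha256_hex("")
    level = [sha256_hex(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [sha256_hex(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


@dataclass
class Block:
    prev_hash: str
    transactions: List[str]  # body of transactions (e.g., local model updates)
    timestamp: float = field(default_factory=time.time)
    nonce: int = 0

    def header_hash(self) -> str:
        """Hash of the header fields: previous hash, Merkle root, timestamp, nonce."""
        header = json.dumps(
            {
                "prev_hash": self.prev_hash,
                "merkle_root": merkle_root(self.transactions),
                "timestamp": self.timestamp,
                "nonce": self.nonce,
            },
            sort_keys=True,
        )
        return sha256_hex(header)


def mine(block: Block, difficulty: int) -> str:
    """Toy PoW: increase the nonce until the hash starts with `difficulty` hex zeros."""
    target = "0" * difficulty
    while not block.header_hash().startswith(target):
        block.nonce += 1
    return block.header_hash()


genesis = Block(prev_hash="0" * 64, transactions=["genesis"])
block_1 = Block(
    prev_hash=mine(genesis, difficulty=2),
    transactions=["update-client-1", "update-client-2"],
)
print(mine(block_1, difficulty=2), block_1.nonce)
```

Because the Merkle root and the previous hash both enter the header hash, altering any stored transaction invalidates the nonce of that block and of every block chained after it, which is the immutability property discussed above.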

Figure 3: Blockchain structure and block information.

Blockchain consensus plays a major role in preserving the integrity of the data stored in a blockchain and it is required for blockchain participants to adopt the same history of the ledger. Particularly in permissionless blockchains,$^{1}$ where miners work concurrently, mining can lead to forks, i.e., different (forked) versions of the ledger adopted by different blockchain nodes. In PoW, which has been widely used in public blockchains (e.g., Bitcoin, Ethereum), forks occur when two or more miners come up with a valid nonce before the winning miner shares its block with the rest of the miners. Forks are solved by enforcing consensus rules, such as adopting the longest chain (with more invested power) as the valid one. Following the consensus rules, a blockchain node in a forked chain eventually switches to the main chain, which occurs when the main chain obtains more confirmations than any other forked version.$^{2}$

$^{1}$ In permissionless blockchains, different from permissioned ones, any party can participate in the consensus procedure.

$^{2}$ In Bitcoin, for instance, up to six confirmations are required before a transaction (a payment) is considered to be secure.
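The longest-chain rule described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: chains are plain lists of hypothetical block identifiers, and chain length stands in for accumulated work, whereas real PoW clients compare cumulative difficulty.

```python
from typing import List


def select_main_chain(local_chain: List[str], received_chain: List[str]) -> List[str]:
    """Longest-chain rule: keep the local chain unless a strictly longer one arrives."""
    return received_chain if len(received_chain) > len(local_chain) else local_chain


def stale_blocks(old_chain: List[str], new_chain: List[str]) -> List[str]:
    """Blocks of `old_chain` that are abandoned after switching to `new_chain`."""
    common = 0
    for old, new in zip(old_chain, new_chain):
        if old != new:
            break
        common += 1
    return old_chain[common:]


# Two miners extend the same parent ("b1") concurrently, which creates a fork.
miner_4 = ["genesis", "b1", "b2-miner4"]
miner_5 = ["genesis", "b1", "b2-miner5"]

# Equal length: Miner #5 keeps its own version for now.
assert select_main_chain(miner_5, miner_4) == miner_5

# A new block confirms Miner #4's branch, so Miner #5 switches chains.
miner_4.append("b3")
adopted = select_main_chain(miner_5, miner_4)
print(adopted)
print(stale_blocks(miner_5, adopted))
```

In a blockchained FL setting, any local update that a client trained from an abandoned block such as `b2-miner5` is an instance of the model inconsistency that this paper studies.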

III-B Federated learning

In FL [50], a set of clients collaborate to train a global model in a distributed manner. To that purpose, instead of sharing raw data directly (as done in traditional centralized ML applications), FL devices sequentially exchange model parameters, obtained by training the latest received global model on local data. Broadly speaking, the goal of FL is to minimize a global finite-sum cost function, weighted by the contribution of each node (e.g., in terms of data samples with respect to the total length of the distributed dataset).

FedAvg (illustrated in Fig. 4 and described in Algorithm 1) is one of the most popular algorithms to carry out the FL operation [4]. In FedAvg, the parameter server (or central orchestrating server) selects a subset $S_t \subseteq \mathcal{K}$ of FL clients in each round $t$. The selected clients compute a local update $w_t^{(k)}$ from the current global model $w_t$ by running SGD. Then, the server aggregates the received local models from the selected clients to generate a new global model update $w_{t+1}$, which is sent to the next set of selected clients for further training.

Figure 4: ML model training in Federated Averaging (FedAvg).
Algorithm 1 Federated Averaging (FedAvg)
Initialize: Initial model $w_0$, batch size $B$, number of epochs $E$, learning rate $\eta$
for $t = 0, \ldots, T$ do
    Select $S_t \subseteq \mathcal{K}$
    for $k \in S_t$ do
        Pull $\boldsymbol{w}_t$ from central server: $\boldsymbol{w}^{(k)}_{t,0} = \boldsymbol{w}_t$
        for $e = 1, \ldots, E$ do
            Update local model: $\boldsymbol{w}^{(k)}_{t,e} = \boldsymbol{w}^{(k)}_{t,e-1} - \eta \nabla l^{(k)}_{t,e}$
        end for
        Push $\boldsymbol{w}_{t+1}^{(k)} \leftarrow \boldsymbol{w}_{t,E}^{(k)}$
    end for
    $\boldsymbol{w}_{t+1} = \frac{1}{|S_t|} \sum_{k \in S_t} \boldsymbol{w}_{t+1}^{(k)}$
end for
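To make the control flow of Algorithm 1 tangible, the following minimal sketch runs FedAvg on a one-parameter regression problem using only the Python standard library. Several simplifications are assumed: the model is a scalar $w$ fitted to $y \approx w x$, each local epoch performs one full-batch gradient step (so the batch size $B$ plays no role), and the client datasets, function names, and hyperparameter values are hypothetical.

```python
import random
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held by one client


def local_update(w: float, data: Dataset, epochs: int, lr: float) -> float:
    """Run `epochs` full-batch gradient steps on the mean squared error of y ~ w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fedavg(clients: List[Dataset], rounds: int, clients_per_round: int,
           epochs: int, lr: float, seed: int = 0) -> float:
    rng = random.Random(seed)
    w_global = 0.0  # initial model w_0
    for _ in range(rounds):
        selected = rng.sample(range(len(clients)), clients_per_round)  # S_t
        # Each selected client pulls the global model and trains it locally.
        local_models = [local_update(w_global, clients[k], epochs, lr) for k in selected]
        # The server averages the pushed models (unweighted mean, as in Algorithm 1).
        w_global = sum(local_models) / len(local_models)
    return w_global


# Four hypothetical clients observe noiseless samples of y = 3x on different inputs.
clients = [[(x, 3.0 * x) for x in (0.5 * k, 0.5 * k + 1.0)] for k in range(1, 5)]
w = fedavg(clients, rounds=30, clients_per_round=2, epochs=5, lr=0.05)
print(round(w, 3))
```

All clients share the same optimum here, so the averaged model approaches $w = 3$ regardless of which clients are selected; with heterogeneous local data the local models would drift apart between rounds, which is where synchronization starts to matter.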

III-C Blockchained federated learning

The blockchained FL solution relies on a distributed ledger to securely store ML model updates from FL clients. This solution gets rid of the central orchestrating server and, instead, empowers a decentralized network of miners to maintain the status of the FL global model by updating the ledger based on FL clients’ updates. The ledger can be accessed asynchronously by the FL clients for either reading (e.g., downloading the latest global models) or writing (e.g., submitting fresh local model updates). Under such an asynchronous and uncoordinated setting where each FL client works independently and maintains its own FL model history, the concepts of FL training round and global model lose meaning.

To carry out decentralized FL through blockchain, we identify two types of logical entities, namely blockchain nodes and FL clients, each one with specific functionalities and logical components (illustrated in Fig. 5). Blockchain nodes are responsible for gathering, sharing, and verifying transactions and maintaining the distributed ledger, while FL clients perform a given federated optimization task by contributing with locally trained models. The set of functional blocks from blockchain nodes and FL clients is as follows:

  • Distributed data module: Used to store and keep track of the history of blocks and to maintain the pool of unconfirmed transactions.

  • Consensus engine: Deals with the enforcement of the distributed protocol through mining operations (e.g., PoW, PBFT), validation, and conflict resolution.

  • Communication module: It allows blockchain nodes to share either transactions or blocks among them. The communication module also provides an interface with FL clients for gathering transactions (local models) and exposing the ledger.

  • Blockchain adaptor: It allows clients to interact with the blockchain to either submit transactions or retrieve the latest FL models. An incentive engine can be used to encourage the participation of FL clients in the distributed learning operation.

  • Federated learning engine: Allows training a model using on-device ML libraries (e.g., TensorFlow) and local data (it can be an offline process). In this paper, model aggregation is considered to be done by miners. Nevertheless, the envisioned architecture also allows performing model aggregation on the FL device side, as previously proposed in [31, 32].

Figure 5: Blockchained FL logical components and interactions.

The specific implementation of blockchain nodes and FL clients depends on the underlying FL application and the requirements therein. Based on the type of blockchain adopted (e.g., public vs. private, permissioned vs. permissionless blockchains), blockchain nodes and miners might need different hardware capabilities, thus leading to either specialized or general-purpose devices meeting certain storage and computation requirements. In public permissionless blockchains, blockchain nodes must possess high computational and storage capabilities to support computation-intensive mining and store a large history of transactions, respectively. In Bitcoin, for instance, blockchain nodes are typically specialized devices (e.g., ASIC miners) with tens to hundreds of terahash per second (TH/s) power and hundreds of GBs of memory [51]. When it comes to FL clients, they need enough storage to keep their local dataset (typically in the order of a few GB, depending on the application) and from low to moderate computational power to perform local training. FL clients are typically end devices like smartphones or laptops, but other solutions leveraging MEC-based computation offloading exist [26].

IV System Model

IV-A Blockchained FL model

A set of 𝒦={1,2,,K}𝒦12𝐾\mathcal{K}=\{1,2,...,K\}caligraphic_K = { 1 , 2 , … , italic_K } clients collaborate to train a global model w𝑤witalic_w by sharing locally trained model parameters w(k)dsuperscript𝑤𝑘superscript𝑑w^{(k)}\in\mathbb{R}^{d}italic_w start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT through a blockchain. Starting from a global model provided by block b𝑏bitalic_b, w(b)superscript𝑤𝑏w^{(b)}italic_w start_POSTSUPERSCRIPT ( italic_b ) end_POSTSUPERSCRIPT, each device k𝑘kitalic_k attempts to minimize a local loss function l(k)()superscript𝑙𝑘l^{(k)}(\cdot)italic_l start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT ( ⋅ ) by running E𝐸Eitalic_E epochs of SGD on its local data. Using a dataset 𝒟(k)superscript𝒟𝑘\mathcal{D}^{(k)}caligraphic_D start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT with D(k)=|𝒟(k)|superscript𝐷𝑘superscript𝒟𝑘D^{(k)}=|\mathcal{D}^{(k)}|italic_D start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT = | caligraphic_D start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT | samples, a client k𝑘kitalic_k updates the local model parameters w(k)superscript𝑤𝑘w^{(k)}italic_w start_POSTSUPERSCRIPT ( italic_k ) end_POSTSUPERSCRIPT as:

$$w^{(k)}=w^{(b)}-\eta_{l}\nabla l^{(k)}(w^{(b)},\mathcal{D}^{(k)}), \tag{1}$$

where $\eta_{l}$ is the learning rate and $\nabla l^{(k)}(w^{(b)},\mathcal{D}^{(k)})$ is the average loss gradient of the $k$-th client with respect to $w^{(b)}$. A client $k$ has computational power $\rho^{(k)}$ available for training a local model.
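The update in Eq. (1) can be illustrated with a minimal sketch in plain Python. The squared-error loss, the full-batch gradient steps, and the names `mse_gradient` and `local_update` are assumptions made for illustration only; the experiments in this paper rely on PyTorch, mini-batch SGD, and a cross-entropy loss.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (features, target)


def mse_gradient(w: Sequence[float], data: Sequence[Sample]) -> List[float]:
    """Average gradient of 0.5 * (w.x - y)^2 over the local dataset."""
    grad = [0.0] * len(w)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj / len(data)
    return grad


def local_update(w_b: Sequence[float], data: Sequence[Sample],
                 eta_l: float, epochs: int = 1) -> List[float]:
    """Start from the block model w_b and apply `epochs` gradient steps (Eq. (1))."""
    w = list(w_b)
    for _ in range(epochs):
        g = mse_gradient(w, data)
        w = [wi - eta_l * gi for wi, gi in zip(w, g)]
    return w
```

With `epochs=1` the function performs exactly one step of Eq. (1); larger values mimic the $E$ local epochs, under the simplifying assumption that every epoch uses the whole local dataset at once.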

The local model updates are aggregated and the resulting global model is included in a blockchain block. The aggregation of local updates, assumed to be done by the miner generating block $b$, is computed as

$$w^{(b)}=\sum_{k\in b}\frac{D^{(k)}}{D^{(b)}}w^{(k)}, \tag{2}$$

where $D^{(b)}=\sum_{k\in b}D^{(k)}$. Following this approach, any client can retrieve the latest block $b$ from its closest miner $m$ and use the latest global model to continue training the federated model.
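The aggregation in Eq. (2) is a weighted average in which each local model counts in proportion to its dataset size. A minimal sketch follows, assuming models are flat lists of floats; the function name `aggregate` and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Tuple


def aggregate(updates: Dict[int, Tuple[List[float], int]]) -> List[float]:
    """Weighted average of client models with weights D^(k) / D^(b), as in Eq. (2).

    `updates` maps a client id k to the pair (w_k, D_k).
    """
    d_b = sum(d_k for _, d_k in updates.values())       # D^(b)
    dim = len(next(iter(updates.values()))[0])
    w_b = [0.0] * dim
    for w_k, d_k in updates.values():
        for j in range(dim):
            w_b[j] += (d_k / d_b) * w_k[j]
    return w_b
```

For example, two clients holding 100 and 300 samples contribute with weights 0.25 and 0.75, respectively.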

IV-B Blockchain latency model

We consider a P2P network formed by a set $\mathcal{M}$ of blockchain nodes acting as miners. The blockchain nodes are responsible for gathering transactions from FL devices, mining blocks, and broadcasting new transactions and blocks. The overall process, as described in the literature [31, 52], can be decomposed into different steps, including transaction/block propagation, block mining, and consensus resolution, whose delays are characterized in the following subsections.

IV-B1 Transaction and block propagation

Both transaction and block propagation delays are characterized by an exponential distribution with rate $1/\text{T}$ (i.e., mean $\text{T}$), where the mean delay $\text{T}$ is determined by the size $\text{L}$ of the data to be transmitted and the capacity $\text{C}_{\text{link}}$ of the link used. In particular, the mean transaction propagation latency is computed as

$$\text{T}_{\text{tp}}=\frac{\text{L}_{\text{t}}}{\text{C}_{\text{link}}}, \tag{3}$$

where the transaction length $\text{L}_{\text{t}}$ is defined by the size of the ML model, which is computed as the number of model parameters ($\text{N}_{\text{model}}$) multiplied by 4 bytes (we assume that each model parameter is represented by a float32 variable).

Likewise, the mean block propagation delay is computed as

$$\text{T}_{\text{bp}}=\frac{\text{L}_{\text{b}}}{\text{C}_{\text{link}}}=\frac{\text{L}_{\text{bh}}+\text{N}_{\text{t}}\cdot\text{L}_{\text{t}}}{\text{C}_{\text{link}}}, \tag{4}$$

where $\text{L}_{\text{bh}}$ is the block's header length and $\text{N}_{\text{t}}$ is the number of transactions carried in the block. In this paper, $\text{N}_{\text{t}}$ is fixed to 1, given that miners are assumed to perform model aggregation and include the resulting global model in the block.
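The two mean delays in Eqs. (3) and (4) can be computed as sketched below. All lengths are converted to bits and the link capacity is taken in bits per second, which is our choice of units; the function names and the example figures in the usage note are hypothetical and do not reproduce the simulator's code.

```python
import random

BYTES_PER_PARAM = 4  # float32 parameters, as assumed in the text


def transaction_bits(n_model: int) -> int:
    """Transaction length L_t in bits for a model with n_model parameters."""
    return n_model * BYTES_PER_PARAM * 8


def mean_tx_delay(n_model: int, c_link_bps: float) -> float:
    """Eq. (3): T_tp = L_t / C_link, in seconds."""
    return transaction_bits(n_model) / c_link_bps


def mean_block_delay(n_model: int, c_link_bps: float,
                     header_bits: int, n_t: int = 1) -> float:
    """Eq. (4): T_bp = (L_bh + N_t * L_t) / C_link, in seconds."""
    return (header_bits + n_t * transaction_bits(n_model)) / c_link_bps


def sample_delay(mean_delay: float, rng: random.Random) -> float:
    """Draw one exponentially distributed delay with the given mean."""
    return rng.expovariate(1.0 / mean_delay)
```

As a usage example, a hypothetical model with 250,000 parameters sent over a 1 Mbps link gives `mean_tx_delay(250_000, 1e6)` equal to 8 s, and adding a 20,000-bit header gives a mean block delay of 8.02 s.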

IV-B2 Blockchain mining and consensus

We adopt a PoW-based type of consensus, whereby miners compete to update the blockchain by appending new blocks. In particular, each miner $m\in\mathcal{M}$ employs its computational hash power $\xi^{(m)}$ to generate a new block each time a valid block is received. The time it takes a miner $m$ to generate a block, $\text{T}_{\text{bg}}^{(m)}$, is characterized by an exponential distribution $\text{Exp}(\lambda^{(m)})$, where $\lambda^{(m)}$ is defined as

$$\lambda^{(m)}=\frac{\xi^{(m)}}{\sum_{n\in\mathcal{M}}\xi^{(n)}}\frac{1}{BI}, \tag{5}$$

where $BI$ is the block interval, which, together with the total hash power ($\xi=\sum_{m\in\mathcal{M}}\xi^{(m)}$), determines the mining difficulty.
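The mining model in Eq. (5) can be sketched as follows. The function names are hypothetical, and the race between miners is simplified to independent exponential draws in which the smallest one wins.

```python
import random
from typing import List, Tuple


def mining_rates(hash_powers: List[float], block_interval: float) -> List[float]:
    """Eq. (5): lambda^(m) = xi^(m) / sum_n xi^(n) * 1 / BI."""
    total = sum(hash_powers)
    return [xi / total / block_interval for xi in hash_powers]


def next_block(rates: List[float], rng: random.Random) -> Tuple[int, float]:
    """Sample T_bg^(m) ~ Exp(lambda^(m)) per miner and return (winner, time)."""
    times = [rng.expovariate(lam) for lam in rates]
    winner = min(range(len(times)), key=times.__getitem__)
    return winner, times[winner]
```

Since the per-miner rates add up to $1/BI$, the minimum of the independent draws is itself exponentially distributed with mean $BI$, so the network as a whole produces one block every $BI$ seconds on average regardless of how the hash power is distributed.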

As previously described, the mining operation associated with PoW is done concurrently and in a decentralized manner, which might lead to ledger inconsistencies in the form of forks. Assuming that the time between blocks is characterized by a Poisson inter-arrival process, the fork probability is given by

$$\begin{split}\text{P}_{\text{fork}}&=1-\prod_{\forall i\neq w}\Pr(\text{T}_{\text{bg}}^{(i)}-\text{T}_{\text{bg}}^{(w)}>\text{T}_{\text{bp}}^{(w)})\\ &=1-e^{-\mu(|\mathcal{M}|-1)\text{T}_{\text{bp}}^{(w)}},\end{split} \tag{6}$$

where $\text{T}_{\text{bg}}^{(w)}$ and $\text{T}_{\text{bp}}^{(w)}$ are the winner's block generation and propagation delays, respectively.
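The closed form in Eq. (6) is straightforward to evaluate numerically. The text does not define $\mu$; the sketch below treats it as the block generation rate of each competing miner, which is an assumption on our side, and the function name is hypothetical.

```python
import math


def fork_probability(mu: float, n_miners: int, t_bp: float) -> float:
    """Closed form of Eq. (6): P_fork = 1 - exp(-mu * (|M| - 1) * T_bp).

    mu is assumed to be the per-miner block generation rate (not defined in the text).
    """
    return 1.0 - math.exp(-mu * (n_miners - 1) * t_bp)
```

The expression makes the qualitative trend explicit: the fork probability grows with the block propagation delay and with the number of competing miners, and shrinks when blocks are generated at a lower rate.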

In PoW, forks are eventually resolved by consensus, which states that the longest chain (i.e., the one with the highest invested computational power) is the valid one. However, from the point of view of a miner, consensus is not enforced until the next mined block is received, which allows switching to the version of the ledger accepted by the majority. The consensus resolution process has an impact on the models used by the FL devices for training, as the models used might differ depending on the miner providing its version of the ledger.

IV-C Performance metrics

To evaluate the performance of different blockchained FL realizations, we focus on the blockchain throughput and the model accuracy and staleness, defined in the following subsections.

IV-C1 Blockchain throughput [transactions per second, TPS]

The blockchain throughput is measured as the number of processed transactions per second. In particular, the effective throughput considers the transactions from the main chain only:

$$\Gamma=\frac{\sum_{b\in\mathcal{B}_{\text{main}}}\text{N}_{\text{t}}^{(b)}}{\text{T}_{\text{sim, total}}}, \tag{7}$$

where $\mathcal{B}_{\text{main}}$ is the set of blocks in the main chain and $\text{T}_{\text{sim, total}}$ is the total simulated time.
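As a short illustration of Eq. (7), the effective throughput only counts the transactions of main-chain blocks, so blocks discarded after a fork do not contribute. The function name and the figures in the usage note are hypothetical.

```python
from typing import Iterable


def effective_throughput(tx_per_main_block: Iterable[int], t_sim_total: float) -> float:
    """Eq. (7): transactions in main-chain blocks divided by the total simulated time."""
    return sum(tx_per_main_block) / t_sim_total
```

For instance, 200 main-chain blocks carrying 10 transactions each over 250 simulated seconds would give 8 TPS.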

IV-C2 Model accuracy [percentage, %]

To evaluate the performance of the employed ML models, we use the classification accuracy, defined as

$$\text{A}=\frac{\text{N}_{\text{pred, correct}}}{\text{N}_{\text{pred, total}}}, \tag{8}$$

where $\text{N}_{\text{pred, correct}}$ is the number of correct predictions and $\text{N}_{\text{pred, total}}$ is the total number of predictions made. We differentiate between two types of accuracy metrics, based on the data partition on which they are applied:

  1. Test accuracy, $\text{A}_{\text{test}}$: the accuracy measured on the complete test dataset once the training is finished. The model from the last mined block of the main chain is used for the evaluation.

  2. Block training/validation accuracy, $\text{A}_{\text{train/val}}^{(b)}$: the accuracy achieved by the global model stored in a given block $b$ from the main chain. The training/validation data of the clients contributing to such a block is used for the measurement.

IV-C3 Model staleness [seconds, s]

To measure the degree of staleness of the FL models stored in the blockchain, we resort to the blockchain and communication latency models provided in sections IV-B2 and IV-B1. The freshness of a global model update does not only depend on the blockchain delays (including block mining and block propagation) but also on the delays associated with training local models by FL devices. In particular, the staleness of block b𝑏bitalic_b is

$$\Lambda^{(b)}=\frac{1}{|\mathcal{N}_{\text{t}}^{(b)}|}\sum_{n=1}^{|\mathcal{N}_{\text{t}}^{(b)}|}\big(\text{TS}_{\text{bm}}^{(b)}-\text{TS}_{\text{tg}}^{(b,n)}\big), \tag{9}$$

where $\text{TS}_{\text{tg}}^{(b,n)}$ is the time at which transaction $n$ is generated, $\text{TS}_{\text{bm}}^{(b)}$ is the time at which block $b$ is mined, and $\mathcal{N}_{\text{t}}^{(b)}$ is the set of transactions used to generate the model in block $b$.
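Equation (9) amounts to the mean age, at mining time, of the local updates aggregated in a block. A minimal sketch, with a hypothetical function name and timestamps expressed in seconds:

```python
from typing import Sequence


def block_staleness(ts_block_mined: float, ts_tx_generated: Sequence[float]) -> float:
    """Eq. (9): mean of (TS_bm - TS_tg) over the transactions aggregated in a block."""
    if not ts_tx_generated:
        raise ValueError("a block needs at least one transaction")
    ages = [ts_block_mined - ts for ts in ts_tx_generated]
    return sum(ages) / len(ages)
```

For example, a block mined at $t=100$ s that aggregates updates generated at 90, 95, and 70 s has a staleness of 15 s.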

V Performance Evaluation

To carry out the experiments, we use BlockFLsim [11], an extension of BlockSim [12] that provides event-based simulations of blockchained FL applications. BlockFLsim uses PyTorch libraries [13] to train and evaluate FL models following the scheme previously illustrated in Fig. 1. Through the simulation of blockchained FL, we can characterize the phenomena associated with model staleness and inconsistencies, as discussed in Section I. The simulation parameters used for the considered scenarios are collected in Table I.

TABLE I: Simulation parameters.
| Parameter | Description | Value |
|---|---|---|
| $BI$ | Block interval | {1, 10, 60} s |
| $\text{N}_{\text{t}}$ | Max. local models per block | {1, 5, 10} |
| $\text{L}_{\text{t}}$ | Transaction length (MNIST/CIFAR-10) | 0.796/2.327 MB |
| $\text{L}_{\text{bh}}$ | Block header length | 20 Kb |
| $M$ | Number of miners | 10 |
| $\text{C}_{\text{link}}$ | P2P links' capacity | {1, 100} Mbps |
| $\text{N}_{\text{b}}$ | Total simulated blocks | 200 |
| $K$ | Number of FL devices | {10, 50, 100} |
| $E$ | Number of local epochs | 5 |
| $B$ | Batch size | 64 |
| $\xi_{\text{client}}$ | Devices comp. power | 900 MIPS |

The evaluation is done on the MNIST [14] and CIFAR-10 [15] datasets. MNIST contains 70,000 samples ($28\times 28\times 1$ grayscale images of handwritten digits), split into 60,000 for training and 10,000 for test, and CIFAR-10 contains a total of 60,000 samples ($32\times 32\times 3$ color images of objects), split into 50,000 for training and 10,000 for test. In our case, for each dataset, we have further split the test partitions into test (30%) and validation (70%). The ML model selected to be trained in a federated manner is a Feed-forward Neural Network (FNN) for MNIST and a Convolutional Neural Network (CNN) for CIFAR-10. In both cases, cross-entropy is used as the loss function, which suits the target classification task well. The details of the implemented FNN and CNN are collected in Table II.

TABLE II: Per-layer detailed information of the FNN-MNIST and CNN-CIFAR10 implementations.
| Model | Layer | Activation | Kernel | Stride | Input | Output |
|---|---|---|---|---|---|---|
| FNN$_{\text{MNIST}}$ | Fully-conn. | ReLU | - | - | 784 | 200 |
| | Fully-conn. | ReLU | - | - | 200 | 200 |
| | Fully-conn. | LogSoftMax | - | - | 200 | 10 |
| | Optimizer: SGD, learning rate = $10^{-3}$ | | | | | |
| CNN$_{\text{CIFAR10}}$ | Conv2D | ReLU | 3 | 1 | 3 | 16 |
| | MaxPool | - | 2 | - | - | - |
| | Conv2D | ReLU | 3 | 1 | 16 | 32 |
| | MaxPool | - | 2 | - | - | - |
| | Conv2D | ReLU | 3 | 1 | 32 | 64 |
| | MaxPool | - | 2 | - | - | - |
| | Fully-conn. | ReLU | - | - | 1024 | 512 |
| | Fully-conn. | ReLU | - | - | 512 | 64 |
| | Fully-conn. | - | - | - | 64 | 10 |
| | Optimizer: SGD, learning rate = $10^{-4}$, momentum = 0.9, weight decay = $5\cdot 10^{-4}$ | | | | | |
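The layer dimensions in Table II determine the number of model parameters and, with 4 bytes per float32 parameter, the transaction length. The sketch below counts weights and biases per layer; including a bias term in every layer is our assumption, and the helper names are hypothetical.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv2d_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights plus biases of a square-kernel 2D convolution."""
    return c_in * c_out * kernel * kernel + c_out


FNN_MNIST = [dense_params(784, 200), dense_params(200, 200), dense_params(200, 10)]
CNN_CIFAR10 = [
    conv2d_params(3, 16, 3), conv2d_params(16, 32, 3), conv2d_params(32, 64, 3),
    dense_params(1024, 512), dense_params(512, 64), dense_params(64, 10),
]  # MaxPool layers have no trainable parameters

for name, layers in (("FNN-MNIST", FNN_MNIST), ("CNN-CIFAR10", CNN_CIFAR10)):
    n_model = sum(layers)
    size_mb = n_model * 4 / 1e6  # float32 parameters, decimal megabytes
    print(f"{name}: {n_model} parameters, {size_mb:.3f} MB")
```

Under these assumptions, the FNN has 199,210 parameters (about 0.797 MB) and the CNN has 581,866 parameters (about 2.327 MB), in line with the transaction lengths listed in Table I when these are read as megabytes.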

V-A Blockchain throughput

First, we study the performance of different blockchain settings in Fig. 6, which shows the throughput in TPS achieved for different block interval ($BI$) values and numbers of transactions used per block ($\text{N}_{\text{t}}$). To showcase the performance for the most challenging task considered, Fig. 6 focuses on CIFAR-10, where a heavier CNN model is used.

Figure 6: Blockchain throughput achieved by each considered blockchain setting in TPS.

As shown, the blockchain throughput decreases as the block interval $BI$ increases (e.g., from 7.429 TPS to 0.163 TPS for $BI=1$ s and $BI=60$ s, respectively, when $\text{N}_{\text{t}}=10$), which is mainly due to the bottleneck created by block mining. Both $BI$ and $\text{N}_{\text{t}}$ are blockchain configuration parameters that allow for representing different types of blockchains (e.g., public, consortium, private). The block interval, for instance, is a fundamental aspect of the design of blockchains, given that it determines the speed at which the blockchain validates and secures transactions.³ Typically, the block interval is selected according to the characteristics of the targeted scenario (e.g., type of participation access) and the security requirements of the supported decentralized application. Whereas there are multiple aspects involved in the establishment of security and trust in a blockchain, the block interval captures well the differences among types of blockchain. In particular, low block interval values such as $BI=1$ s are feasible in blockchains where the miners are trusted (e.g., a private blockchain controlled by a consortium of network operators), while higher values like $BI=60$ s are required in blockchains where miners are not trusted (e.g., a public blockchain where any interested party can participate).

As for the set of transactions used to generate each block, we observe that a higher $\text{N}_{\text{t}}$ leads to a higher throughput (e.g., from 0.813 to 7.429 TPS for $\text{N}_{\text{t}}=1$ and $\text{N}_{\text{t}}=10$, respectively, when $BI=1$ s), as blocks can carry more information. However, processing and validating a large volume of user transactions requires enough computational power from blockchain nodes. The blockchain throughput, as shown in more detail later, has an impact on the underlying application's performance, so it is an important metric to be optimized.

³ In PoW mining, miners invest their computational power to prove the validity of the blocks they mine in a decentralized setting. Accordingly, large block intervals are derived from the necessity of enforcing a high mining difficulty in networks with a large computational power.

V-B FL model performance

Figure 7: Temporal evolution of the models' accuracy: a) training accuracy (MNIST), b) validation accuracy (MNIST), c) training accuracy (CIFAR-10), d) validation accuracy (CIFAR-10). The experiments for $\text{C}_{\text{link}}=1$ Mbps are represented by light circles, while $\text{C}_{\text{link}}=100$ Mbps results are represented by solid lines.

We now focus on the performance of the FL models in each blockchained FL setting. First, Fig. 7 shows the temporal evolution of both the training (Fig. 7(a) and Fig. 7(c)) and the validation accuracy (Fig. 7(b) and Fig. 7(d)) achieved by the models carried in the blocks of the main chain, for the two different datasets. The evaluation also includes the comparison of two types of P2P links, namely $\text{C}_{\text{link}}=\{1,100\}$ Mbps. The maximum number of transactions per block, $\text{N}_{\text{t}}$, is set to 10.

Starting with MNIST, we observe in Fig. 7(a) and Fig. 7(b) that the performance is very similar for both training and validation accuracy, which is due to the simplicity of the data employed (on which up to 99.7% accuracy can be achieved [53]). For CIFAR-10 (see Fig. 7(c) and Fig. 7(d)), in contrast, the discrepancies between training and validation accuracy are much more noticeable, with the training accuracy (up to 100%) being superior to the validation accuracy (up to 75%).

Regarding the different evaluation parameters, we find the following. First, the lower the block interval ($BI$), the worse the accuracy, especially in CIFAR-10, for which the model's complexity is higher. Setting $BI=1$ s, while allowing blocks to be generated very fast, leads to a low accuracy (in CIFAR-10, up to 50% training/validation accuracy at the end of the training). As studied later in further detail, a short block generation time leads to a high number of forks, which is detrimental to the learning procedure. Increasing the block interval ($BI=10$ s and $BI=60$ s), instead, improves the accuracy significantly, as newly generated models are properly spread throughout the network, thus leading to a consistent federated training operation. Increasing the block interval, therefore, is one way of enforcing stability in FL training, but it leads to a higher training time.

When it comes to the number of FL participants ($K$), we observe that a higher performance is achieved as $K$ decreases in all the cases. This is directly related to the amount of training data available to each FL participant, given that the entire dataset is split among all the considered participants. Apart from that, we observe that the gap in the accuracy achieved for different values of $K$ becomes bigger as $BI$ increases, as setting a higher $BI$ value allows for collecting more transactions in each block. In settings where $BI$ is small ($BI=1$ s), the probability of collecting enough transactions per block is low if the number of FL participants $K$ is low ($K=10$). In contrast, deployments with more FL clients in place ($K=100$) allow gathering plenty of local models per block, as there are more participants operating concurrently. However, the quality of the provided model updates for $K=100$ is lower than for $K=10$, as data is split among a higher number of users. Therefore, there is a trade-off between the number of local models per block and their quality, and such a trade-off becomes apparent when the block interval is not properly dimensioned according to the underlying FL application.

Finally, regarding the type of links used to transmit transactions and blocks, we observe different behaviors depending on the blockchain setting. For low block interval values, i.e., $BI=\{1,10\}$ s, there is a big gap between the accuracy achieved by $\text{C}_{\text{link}}=100$ Mbps (shown by the solid lines) and $\text{C}_{\text{link}}=1$ Mbps (shown by the light circles). This gap is due in large part to the need for propagating the updated models throughout the blockchain network consistently, thus enabling a robust FL training operation. Otherwise, ledger inconsistencies lead to conflicting and redundant efforts in iterating the global model, thus negatively affecting the overall model accuracy. The gap between the performance achieved by each $\text{C}_{\text{link}}$ value, however, is mitigated as $BI$ increases, thanks to the model consistency achieved in those cases.

V-C Model inconsistencies

As shown before, the type of blockchain adopted and its characteristics have a significant impact on the performance of the ML model, with the block interval ($BI$) and the P2P links' capacity ($\text{C}_{\text{link}}$) being some of the most critical parameters in that regard. Now, to further illustrate the impact of the blockchain characteristics on the federated models' accuracy, in Fig. 8, we show the difference between the test accuracy achieved for $\text{C}_{\text{link}}=1$ Mbps (leading to a higher fork probability) and $\text{C}_{\text{link}}=100$ Mbps (granting more stability), accompanied by the fork probability in each case. The fork probability is a good indicator of the model inconsistencies, as it gives a good intuition of how many blockchain versions (i.e., global models) are concurrently used by different FL clients.

Figure 8: Effect of forks on the test accuracy for both MNIST (left plot) and CIFAR-10 (right plot). The boxplots show the mean difference in the accuracy achieved at various blockchained FL settings when using either $\text{C}_{\text{link}}=1$ Mbps or $\text{C}_{\text{link}}=100$ Mbps. The horizontal dashed lines represent the mean fork probability in each case.

The results in Fig. 8 confirm the observations made above for Fig. 7, as the highest differences in the test accuracy are obtained for low block interval values ($BI=\{1,10\}$ s) and high numbers of FL participants ($K=\{50,100\}$ clients). In particular, a high fork probability ($\text{P}_{\text{fork}}\approx 0.8$ for the worst case in CIFAR-10) can lead to a decrease of up to 30% in the model's accuracy when users' diversity is high, i.e., for $K=100$ clients. In these situations, slowing down the pace at which blocks are generated by increasing $BI$ helps to reduce the fork probability, thus mitigating the effects of model inconsistencies. For instance, for MNIST, $BI=10$ s provides a good trade-off between the block confirmation time and the fork probability. For CIFAR-10, which uses heavier models, setting $BI=60$ s is required to keep the test accuracy difference below 5%.

V-D Model staleness

Next, we focus on model staleness and its impact on ML model accuracy. To that end, Fig. 9 illustrates the temporal evolution of the model staleness and the validation accuracy experienced throughout the first 50 blocks of the main chain in each simulation. To reduce the impact of forks and focus on model staleness, the results are shown only for $\text{C}_{\text{link}}=100$ Mbps, where the fork probability is kept low in all the scenarios. Apart from that, the maximum number of local updates aggregated in a block, $\text{N}_{\text{t}}$, is fixed to 10. In the subplots on the top of Fig. 9, both model staleness and validation accuracy are displayed together in a 3D grid, while the corresponding 2D projection of the validation accuracy is plotted at the bottom of each figure for the sake of clarity.

Figure 9: Effect of model staleness on the validation accuracy: a) MNIST, b) CIFAR-10. The figures on the top show a 3D representation of the temporal evolution of model staleness and validation accuracy, while the plots on the bottom show the evolution of the validation accuracy in 2D.

As shown in both Fig. 9(a) and Fig. 9(b), model staleness remains stable for $BI=1$ s, which indicates that the blockchain is able to process the model updates submitted by clients in time. In contrast, model staleness increases very fast with each block when $BI=60$ s, as more and more local model updates remain unprocessed in the pool of unconfirmed transactions, thus becoming outdated with respect to newer updates (the blockchain cannot process all the local model updates). For $BI=1$ s, the achieved accuracy is significantly lower than in the other cases because FL devices are not fast enough to compute and provide local updates in time (as previously shown in Fig. 7), thus leading to under-trained models at each block depth. As for $BI=10$ s and $BI=60$ s, both solutions provide very similar accuracy, even if model staleness is much higher in the second case due to the slow pace at which the blockchain processes users' transactions (this is the opposite of what occurs with $BI=1$ s). This result suggests that model staleness is not detrimental to the model performance in the evaluated settings and, moreover, that old local updates (i.e., updates computed from old global models) contribute to sustaining the training of the federated models. Nevertheless, this conclusion is tied to the dataset used in this paper (CIFAR-10) and the distribution of the data over time, which in this case remains unchanged. Other applications where data vary over time (which is associated with concept drift [54]) could be severely affected by staleness.

VI Future Research Directions

VI-A Blockchained FL optimization

The analysis performed in Section V has demonstrated that the type of blockchain selected and its configuration have a high impact on the performance of the application running over it. In particular, model staleness and model inconsistencies, which are caused by the mismatch between the application requirements and the blockchain performance, may result in significant performance degradation of FL if they are not kept under control. For that reason, blockchain optimization becomes particularly relevant for the FL use case. Whereas techniques like sharding or off-chain computation have been widely applied to improve current blockchains, other works have focused on optimizing blockchain configurations to better comply with the application requirements. In this regard, [31] focused on the block generation rate to minimize the end-to-end latency of blockchained FL applications, while [52] targeted the optimization of the block size as a function of the users’ activity.

Another important aspect of optimizing blockchained FL applications relates to the exchange of raw ML models between FL devices and blockchain miners. As shown in [2], this approach entails significant communication and storage overheads, especially for complex ML models like VGG-16, whose size is over 500 MB [55]. To alleviate the communication burden, other distributed learning approaches such as Knowledge Distillation (KD) [56], which is much lighter than FL in terms of communication, might be compelling candidates for integration with blockchain.
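The order of magnitude of this overhead can be checked with a short calculation. The sketch below assumes the standard 16-layer VGG configuration for ImageNet (3x3 convolutions, 224x224 inputs, 1000 output classes) and 32-bit floating-point weights; the function name and the number of clients per block are hypothetical and only serve the illustration.

```python
def vgg16_param_count(num_classes=1000):
    """Count weights and biases of VGG-16 (assumed ImageNet configuration)."""
    # (input channels, output channels) of the thirteen 3x3 convolutions.
    conv_channels = [
        (3, 64), (64, 64),
        (64, 128), (128, 128),
        (128, 256), (256, 256), (256, 256),
        (256, 512), (512, 512), (512, 512),
        (512, 512), (512, 512), (512, 512),
    ]
    # (input features, output features) of the three fully connected layers.
    # 512 * 7 * 7 is the flattened feature map for a 224x224 input.
    fc_features = [(512 * 7 * 7, 4096), (4096, 4096), (4096, num_classes)]

    conv_params = sum(c_in * c_out * 3 * 3 + c_out
                      for c_in, c_out in conv_channels)
    fc_params = sum(f_in * f_out + f_out for f_in, f_out in fc_features)
    return conv_params + fc_params


BYTES_PER_WEIGHT = 4      # float32 (assumption)
CLIENTS_PER_BLOCK = 10    # hypothetical number of updates stored in a block

n_params = vgg16_param_count()
model_size_bytes = n_params * BYTES_PER_WEIGHT
block_payload_bytes = model_size_bytes * CLIENTS_PER_BLOCK

print(f"parameters: {n_params:,}")
print(f"model size: {model_size_bytes / 1e6:.0f} MB")
print(f"payload of one block with {CLIENTS_PER_BLOCK} raw models: "
      f"{block_payload_bytes / 1e9:.2f} GB")
```

With these assumptions, a single uncompressed model exceeds 500 MB, so a block carrying the raw updates of only a few clients already reaches several gigabytes.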

VI-B Trust and security enforcement through client selection

The performance of blockchained FL applications can be affected by the quality and legitimacy of the contributions from the FL devices. To address these issues, FL client selection stands out as a prominent solution to speed up FL training, thus relieving the burden on the blockchain. With FL client selection, user heterogeneity (e.g., in terms of computation and/or communication capabilities) can be addressed by selecting the best-performing nodes [57]. Apart from FL client selection, the inherent properties of blockchain can be leveraged and extended to provide enhanced security and trust. Some prominent examples are enhanced authentication mechanisms [58] and trust evaluation and enforcement [59]. Finally, there is the opportunity to adopt mechanisms in the blockchain for selecting the client model updates to be mined based on their degree of staleness (is the model update fresh enough?).
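As an illustration of the last idea, a miner could rank the pending model updates by staleness before filling a block. The following is a minimal sketch, not a mechanism proposed or evaluated in this paper: the `ModelUpdate` record, the `select_updates` function, and the definition of staleness as the difference between the current block height and the height of the global model used for training are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelUpdate:
    """Hypothetical pending transaction carrying a local model update."""
    client_id: str
    base_height: int  # height of the block holding the global model used


def select_updates(pool: List[ModelUpdate], current_height: int,
                   capacity: int,
                   max_staleness: Optional[int] = None) -> List[ModelUpdate]:
    """Pick up to `capacity` updates, freshest first.

    If `max_staleness` is given, updates older than that many blocks are
    skipped; with `None`, no update is discarded and staleness only
    determines the order in which updates are mined.
    """
    def staleness(update: ModelUpdate) -> int:
        return current_height - update.base_height

    eligible = [u for u in pool
                if max_staleness is None or staleness(u) <= max_staleness]
    eligible.sort(key=staleness)  # stable sort: ties keep arrival order
    return eligible[:capacity]


pool = [ModelUpdate("a", 5), ModelUpdate("b", 9),
        ModelUpdate("c", 7), ModelUpdate("d", 10)]
strict = select_updates(pool, current_height=10, capacity=2, max_staleness=3)
lenient = select_updates(pool, current_height=10, capacity=4)
print([u.client_id for u in strict])
print([u.client_id for u in lenient])
```

Leaving `max_staleness` unset only reorders the pool, which may be preferable whenever stale updates still help training; a hard threshold is more relevant for applications affected by concept drift.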

VII Conclusions

The blend of two disruptive technologies such as blockchain and AI can enable the proliferation of breakthrough innovations in the domain of collaborative computation. Blockchain and distributed ML solutions such as FL lead to a very profitable symbiosis in which trust and immutability are provided to decentralized applications, previously lacking security and privacy guarantees. In FL, blockchain allows getting rid of the figure of the central orchestrating server by replacing it with a democratic P2P network, thus contributing to relieving the issues of centralization (e.g., bottlenecking) and granting full control of the data to application participants. The partnership of blockchain and FL, however, poses a series of trade-offs that demand a joint design of the blockchain infrastructure and the underlying FL application.

In this paper, we studied those trade-offs by analyzing the impact that different blockchain realizations have on FL performance in various scenarios. For that, we introduced an extension of BlockSim called BlockFLsim, a blockchain simulator with embedded FL operations, to characterize FL applications running in a blockchain. Using BlockFLsim, we studied model staleness and inconsistencies as a result of the blockchain operation and analyzed their impact on the FL model accuracy. Our results showed that model inconsistencies, resulting from fast blockchains with low communication capabilities, can largely contribute to lowering the FL model accuracy (up to 34% less accuracy) when compared to more stable settings. When it comes to model staleness, we showed that its impact on the model accuracy is much lower than that of inconsistencies. Furthermore, we saw that, for the studied dataset, stale model updates can still contribute to improving the global model, which suggests that stale updates should not be discarded, so that already spent computational power is not wasted. Future work includes the evaluation of staleness and model inconsistencies in experimental blockchain platforms, which would provide further insights into the blockchain’s decentralization-security-cost trilemma and the scalability of blockchained FL applications. In addition, the study of blockchain-native mechanisms for incentivizing the participation of FL devices, which in practice might be reluctant to invest computational power, is left as future work.

References

  • [1] Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang, “Edge intelligence: Paving the last mile of artificial intelligence with edge computing,” Proceedings of the IEEE, vol. 107, no. 8, pp. 1738–1762, 2019.
  • [2] E. Guerra, F. Wilhelmi, M. Miozzo, and P. Dini, “The Cost of Training Machine Learning Models over Distributed Data Sources,” IEEE Open Journal of the Communications Society, 2023.
  • [3] J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon, “Federated learning: Strategies for improving communication efficiency,” arXiv preprint arXiv:1610.05492, 2016.
  • [4] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial intelligence and statistics.   PMLR, 2017, pp. 1273–1282.
  • [5] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated learning: Challenges, methods, and future directions,” IEEE Signal Processing Magazine, vol. 37, no. 3, pp. 50–60, 2020.
  • [6] S. R. Pokhrel and J. Choi, “A decentralized federated learning approach for connected autonomous vehicles,” in 2020 IEEE Wireless Communications and Networking Conference Workshops (WCNCW).   IEEE, 2020, pp. 1–6.
  • [7] S. Nakamoto, “Bitcoin: A Peer-to-Peer electronic cash system,” Tech. Rep., 2008.
  • [8] U. Majeed and C. S. Hong, “FLchain: Federated learning via MEC-enabled blockchain network,” in 2019 20th Asia-Pacific Network Operations and Management Symposium (APNOMS).   IEEE, 2019, pp. 1–4.
  • [9] E. K. Wang, Z. Liang, C.-M. Chen, S. Kumari, and M. K. Khan, “PoRX: A reputation incentive scheme for blockchain consensus of IIoT,” Future generation computer systems, vol. 102, pp. 140–151, 2020.
  • [10] F. Wilhelmi, E. Guerra, and P. Dini, “On the decentralization of blockchain-enabled asynchronous federated learning,” in 2023 IEEE 9th International Conference on Network Softwarization (NetSoft).   IEEE, 2023, pp. 408–413.
  • [11] ——, “BlockFLsim: Blockchain Federated Learning Simulator,” 2022. [Online]. Available: https://gitlab.cttc.es/supercom/blockFLsim/-/tree/BlockFLsim
  • [12] M. Alharby and A. Van Moorsel, “BlockSim: a simulation framework for blockchain systems,” ACM SIGMETRICS Performance Evaluation Review, vol. 46, no. 3, pp. 135–138, 2019.
  • [13] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “PyTorch: An Imperative Style, High-Performance Deep Learning Library,” in Advances in Neural Information Processing Systems 32.   Curran Associates, Inc., 2019, pp. 8024–8035.
  • [14] L. Deng, “The MNIST database of handwritten digit images for machine learning research [best of the web],” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141–142, 2012.
  • [15] A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” 2009.
  • [16] A. Koloskova, N. Loizou, S. Boreiri, M. Jaggi, and S. Stich, “A unified theory of decentralized SGD with changing topology and local updates,” in Proceedings of the 37th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, H. D. III and A. Singh, Eds., vol. 119.   PMLR, 13–18 Jul 2020, pp. 5381–5393. [Online]. Available: https://proceedings.mlr.press/v119/koloskova20a.html
  • [17] A. Lalitha, S. Shekhar, T. Javidi, and F. Koushanfar, “Fully decentralized federated learning,” in Third workshop on bayesian deep learning (NeurIPS), vol. 2, 2018.
  • [18] R. Ormándi, I. Hegedűs, and M. Jelasity, “Gossip learning with linear models on fully distributed data,” Concurrency and Computation: Practice and Experience, vol. 25, no. 4, pp. 556–571, 2013.
  • [19] M. Miozzo, Z. Ali, L. Giupponi, and P. Dini, “Distributed and multi-task learning at the edge for energy efficient radio access networks,” IEEE Access, vol. 9, pp. 12 491–12 505, 2021.
  • [20] C. Hu, J. Jiang, and Z. Wang, “Decentralized federated learning: A segmented gossip approach,” arXiv preprint arXiv:1908.07782, 2019.
  • [21] T. Sun, D. Li, and B. Wang, “Decentralized federated averaging,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 4, pp. 4289–4301, 2022.
  • [22] H. Xing, O. Simeone, and S. Bi, “Decentralized federated learning via SGD over wireless D2D networks,” in 2020 IEEE 21st international workshop on signal processing advances in wireless communications (SPAWC).   IEEE, 2020, pp. 1–5.
  • [23] Y. Qu, H. Dai, Y. Zhuang, J. Chen, C. Dong, F. Wu, and S. Guo, “Decentralized federated learning for UAV networks: Architecture, challenges, and opportunities,” IEEE Network, vol. 35, no. 6, pp. 156–162, 2021.
  • [24] A. G. Roy, S. Siddiqui, S. Pölsterl, N. Navab, and C. Wachinger, “Braintorrent: A peer-to-peer environment for decentralized federated learning,” arXiv preprint arXiv:1905.06731, 2019.
  • [25] S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, and K. Chan, “Adaptive federated learning in resource constrained edge computing systems,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 6, pp. 1205–1221, 2019.
  • [26] L. Liu, J. Zhang, S. Song, and K. B. Letaief, “Client-edge-cloud hierarchical federated learning,” in ICC 2020-2020 IEEE International Conference on Communications (ICC).   IEEE, 2020, pp. 1–6.
  • [27] Q. Chen, Z. Wang, Y. Zhou, J. Chen, D. Xiao, and X. Lin, “CFL: Cluster Federated Learning in Large-Scale Peer-to-Peer Networks,” in International Conference on Information Security.   Springer, 2022, pp. 464–472.
  • [28] E. T. M. Beltrán, M. Q. Pérez, P. M. S. Sánchez, S. L. Bernal, G. Bovet, M. G. Pérez, G. M. Pérez, and A. H. Celdrán, “Decentralized federated learning: Fundamentals, state-of-the-art, frameworks, trends, and challenges,” arXiv preprint arXiv:2211.08413, 2022.
  • [29] Q. W. Khan, A. N. Khan, A. Rizwan, R. Ahmad, S. Khan, and D. H. Kim, “Decentralized machine learning training: a survey on synchronization, consolidation, and topologies,” IEEE Access, 2023.
  • [30] E. Gabrielli, G. Pica, and G. Tolomei, “A Survey on Decentralized Federated Learning,” arXiv preprint arXiv:2308.04604, 2023.
  • [31] H. Kim, J. Park, M. Bennis, and S.-L. Kim, “Blockchained on-device federated learning,” IEEE Communications Letters, vol. 24, no. 6, pp. 1279–1283, 2019.
  • [32] D. C. Nguyen, M. Ding, Q.-V. Pham, P. N. Pathirana, L. B. Le, A. Seneviratne, J. Li, D. Niyato, and H. V. Poor, “Federated learning meets blockchain in edge computing: Opportunities and challenges,” IEEE Internet of Things Journal, vol. 8, no. 16, pp. 12 806–12 825, 2021.
  • [33] D. Hou, J. Zhang, K. L. Man, J. Ma, and Z. Peng, “A systematic literature review of blockchain-based federated learning: Architectures, applications and issues,” in 2021 2nd Information Communication Technologies Conference (ICTC).   IEEE, 2021, pp. 302–307.
  • [34] N. Afraz, F. Wilhelmi, H. Ahmadi, and M. Ruffini, “Blockchain and Smart Contracts for Telecommunications: Requirements vs. Cost Analysis,” IEEE Access, 2023.
  • [35] Y. Lu, X. Huang, K. Zhang, S. Maharjan, and Y. Zhang, “Blockchain empowered asynchronous federated learning for secure data sharing in internet of vehicles,” IEEE Transactions on Vehicular Technology, vol. 69, no. 4, pp. 4298–4311, 2020.
  • [36] Y. Zhao, J. Zhao, L. Jiang, R. Tan, D. Niyato, Z. Li, L. Lyu, and Y. Liu, “Privacy-preserving blockchain-based federated learning for IoT devices,” IEEE Internet of Things Journal, vol. 8, no. 3, pp. 1817–1829, 2020.
  • [37] F. Wilhelmi, L. Giupponi, and P. Dini, “Analysis and evaluation of synchronous and asynchronous FLchain,” Computer Networks, vol. 218, p. 109390, 2022.
  • [38] Y. Qu, L. Gao, T. H. Luan, Y. Xiang, S. Yu, B. Li, and G. Zheng, “Decentralized privacy using blockchain-enabled federated learning in fog computing,” IEEE Internet of Things Journal, vol. 7, no. 6, pp. 5171–5183, 2020.
  • [39] M. Ali, H. Karimipour, and M. Tariq, “Integration of blockchain and federated learning for Internet of Things: Recent advances and future challenges,” Computers & Security, vol. 108, p. 102355, 2021.
  • [40] D. C. Nguyen, P. N. Pathirana, M. Ding, and A. Seneviratne, “Integration of blockchain and cloud of things: Architecture, applications and challenges,” IEEE Communications surveys & tutorials, vol. 22, no. 4, pp. 2521–2549, 2020.
  • [41] Y. Qi, M. S. Hossain, J. Nie, and X. Li, “Privacy-preserving blockchain-based federated learning for traffic flow prediction,” Future Generation Computer Systems, vol. 117, pp. 328–337, 2021.
  • [42] Y. Lu, X. Huang, K. Zhang, S. Maharjan, and Y. Zhang, “Low-latency federated learning and blockchain for edge association in digital twin empowered 6G networks,” IEEE Transactions on Industrial Informatics, vol. 17, no. 7, pp. 5098–5107, 2020.
  • [43] S. R. Pokhrel and J. Choi, “Federated learning with blockchain for autonomous vehicles: Analysis and design challenges,” IEEE Transactions on Communications, vol. 68, no. 8, pp. 4734–4746, 2020.
  • [44] Y. Liu, Y. Qu, C. Xu, Z. Hao, and B. Gu, “Blockchain-enabled asynchronous federated learning in edge computing,” Sensors, vol. 21, no. 10, p. 3335, 2021.
  • [45] R. Wang and W.-T. Tsai, “Asynchronous Federated Learning System Based on Permissioned Blockchains,” Sensors, vol. 22, no. 4, p. 1672, 2022.
  • [46] L. Feng, Y. Zhao, S. Guo, X. Qiu, W. Li, and P. Yu, “BAFL: A Blockchain-Based Asynchronous Federated Learning Framework,” IEEE Transactions on Computers, vol. 71, no. 05, pp. 1092–1103, 2022.
  • [47] L. Kan, Y. Wei, A. H. Muhammad, W. Siyuan, L. C. Gao, and H. Kai, “A multiple blockchains architecture on inter-blockchain communication,” in 2018 IEEE international conference on software quality, reliability and security companion (QRS-C).   IEEE, 2018, pp. 139–145.
  • [48] L. Tseng, L. Wong, S. Otoum, M. Aloqaily, and J. B. Othman, “Blockchain for managing heterogeneous internet of things: A perspective architecture,” IEEE network, vol. 34, no. 1, pp. 16–23, 2020.
  • [49] X. Wang, S. Garg, H. Lin, G. Kaddoum, J. Hu, and M. S. Hossain, “A secure data aggregation strategy in edge computing and blockchain-empowered internet of things,” IEEE Internet of Things Journal, vol. 9, no. 16, pp. 14 237–14 246, 2020.
  • [50] M. Aledhari, R. Razzak, R. M. Parizi, and F. Saeed, “Federated learning: A survey on enabling technologies, protocols, and applications,” IEEE Access, vol. 8, pp. 140 699–140 725, 2020.
  • [51] M. B. Taylor, “The evolution of bitcoin hardware,” Computer, vol. 50, no. 9, pp. 58–66, 2017.
  • [52] F. Wilhelmi, S. Barrachina-Muñoz, and P. Dini, “End-to-End Latency Analysis and Optimal Block Size of Proof-of-Work Blockchain Applications,” IEEE Communications Letters, pp. 1–1, 2022.
  • [53] S. An, M. Lee, S. Park, H. Yang, and J. So, “An ensemble of simple convolutional neural network models for mnist digit recognition,” arXiv preprint arXiv:2008.10400, 2020.
  • [54] A. Tsymbal, “The problem of concept drift: definitions and related work,” Computer Science Department, Trinity College Dublin, vol. 106, no. 2, p. 58, 2004.
  • [55] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  • [56] Z. Zhu, J. Hong, and J. Zhou, “Data-free knowledge distillation for heterogeneous federated learning,” in International conference on machine learning.   PMLR, 2021, pp. 12 878–12 889.
  • [57] T. Nishio and R. Yonetani, “Client selection for federated learning with heterogeneous resources in mobile edge,” in ICC 2019-2019 IEEE international conference on communications (ICC).   IEEE, 2019, pp. 1–7.
  • [58] X. Wang, S. Garg, H. Lin, M. J. Piran, J. Hu, and M. S. Hossain, “Enabling secure authentication in industrial IoT with transfer learning empowered blockchain,” IEEE Transactions on Industrial Informatics, vol. 17, no. 11, pp. 7725–7733, 2021.
  • [59] X. Wang, S. Garg, H. Lin, G. Kaddoum, J. Hu, and M. M. Hassan, “Heterogeneous blockchain and AI-driven hierarchical trust evaluation for 5G-enabled intelligent transportation systems,” IEEE Transactions on Intelligent Transportation Systems, 2021.