Robustness of Decentralised Learning to Nodes and Data Disruption

Luigi Palmieri [email protected] Chiara Boldrini¹¹1C. Boldrini and L. Valerio contributed equally to this work. [email protected] Lorenzo Valerio²²2C. Boldrini and L. Valerio contributed equally to this work. [email protected] Andrea Passarella [email protected] Marco Conti [email protected] János Kertész [email protected]

Abstract

In the vibrant landscape of AI research, decentralised learning is gaining momentum. Decentralised learning allows individual nodes to keep data locally where they are generated and to share knowledge extracted from local data among themselves through an interactive process of collaborative refinement. This paradigm supports scenarios where data cannot leave local nodes due to privacy or sovereignty reasons or real-time constraints imposing proximity of models to locations where inference has to be carried out. The distributed nature of decentralised learning implies significant new research challenges with respect to centralised learning. Among them, in this paper, we focus on robustness issues. Specifically, we study the effect of nodes’ disruption on the collective learning process. Assuming a given percentage of “central” nodes disappear from the network, we focus on different cases, characterised by (i) different distributions of data across nodes and (ii) different times when disruption occurs with respect to the start of the collaborative learning task. Through these configurations, we are able to show the non-trivial interplay between the properties of the network connecting nodes, the persistence of knowledge acquired collectively before disruption or lack thereof, and the effect of data availability pre- and post-disruption. Our results show that decentralised learning processes are remarkably robust to network disruption. As long as even minimum amounts of data remain available somewhere in the network, the learning process is able to recover from disruptions and achieve significant classification accuracy. This clearly varies depending on the remaining connectivity after disruption, but we show that even nodes that remain completely isolated can retain significant knowledge acquired before the disruption.

\affiliation

[inst1]organization=CNR-IIT,addressline=Via G. Moruzzi, 1, city=Pisa, country=Italy \affiliation[inst2]organization=CEU,addressline=Quellenstrasse 51, city=Vienna, country=Austria

1 Introduction

While centralised AI algorithms based on data centres are necessary for many AI tasks, more and more researchers are focusing on decentralised AI [1]. In general, in decentralised AI, data stays local at nodes generating them, local models are trained based on local data, and then an aggregation process happens whereby nodes combine their local models. Normally, this process is iterative, which lets nodes improve their performance over time after repeated rounds of aggregation and local training. Such a decentralised approach addresses scenarios where data cannot leave their location for privacy or sovereignty reasons or where models must work close to the controlled devices due to real-time constraints.

Federated learning has been the first example of “non-centralised” learning [1]. As discussed in Section 2, Federated learning still requires a central node that controls local models’ aggregation and synchronisation. More recently, researchers have focused increasingly on completely decentralised learning schemes, where nodes collaborate based on locally trained models without the need for central coordination. In this paper, we focus on this case and address specific issues related to the robustness of the decentralised learning process in case of disruptions.

The robustness of decentralised learning is a key research question to address. Specifically, centralised learning guarantees a high degree of reliability for many reasons. First, while locally centralised, data centres are quite robust against failures due to standardised resilience solutions. Moreover, as data is centralised, once the resilience of data centre devices is guaranteed, models can be trained on full datasets. All these conditions are not present in decentralised learning, which intuitively may seem much more a fragile paradigm due to the fact that any node in the network may disappear, thus resulting in potential loss of connectivity (which hinders aggregation of models across nodes), data (which may be residing on individual nodes only), or both.

In this paper, we set out to analyse these aspects of robustness. Specifically, we consider a network of collaborating nodes, each hosting a local dataset and training a local model on it. We assume that nodes are connected according to a Barabasi-Albert network model, which has been shown to reproduce many real-world networked systems, both “artificial” (such as computer networks) and “natural” (such as human social networks) [2]. We further assume that nodes exchange local models only with their direct neighbours and aggregate the received models and the local one according to a well-known averaging policy (known in the literature as Decentralised Averaging [3]). Finally, we assume that nodes have to solve an image classification task.

Under these assumptions, we analyse the robustness of decentralised learning under three configurations, which we denote as Case 1, Case 2, and Case 3, respectively (see Section 4 for more details), characterised by increased levels of disruptions. In Case 1, after an initial period where the decentralised process proceeds without disruptions, we assume that a certain percentage of “central” nodes disappear. However, these nodes do not hold any local data, so the effect is a disruption of the network structure only. In Case 2, we assume that the same nodes also hold local data, and therefore, the disruption consists of the loss of both data and connectivity. Finally, in Case 3, we assume that central nodes hold a significantly higher share of data with respect to the other nodes. Therefore, Case 3 represents the loss of connectivity and a particularly extended set of data on which pre-disruption models could have been trained. In all these cases, we analyse the performance in terms of accuracy achieved by surviving nodes after a certain number of training rounds. As the considered disruptions partition the network into smaller disconnected subgraphs, we also analyse the performance inside the different clusters of connected nodes that are generated after disruption, starting from isolated nodes to the largest connected component. In all cases, we compare the accuracy achieved by the surviving nodes after the disruption in comparison to the baseline case where no disruption occurs and the case when disruption occurs at time 0. The latter represents the boundary case where surviving nodes cannot exploit any collaborative learning process before the disruption.

The main take-home messages we obtain from the results presented in this paper are the following.

1.

First and foremost, decentralised learning proves to be remarkably robust to failures. The loss of accuracy with respect to the case when no disruption occurs is negligible in the largest connected component, and it is limited to 10 to 20% depending on specific parameters, even for completely isolated nodes.
2.

Knowledge persists despite disruption. In all cases and for all types of surviving nodes, the accuracy after disruption grows much larger than when disruption happens at time 0. This means that, even for isolated nodes, having been exposed to the decentralised learning process allows them to maintain a significant level of knowledge acquired from other nodes, provided this can be “refreshed” after disruption by a small local dataset.
3.

Significant disruptions of the network structure are not particularly challenging for the learning process. Provided sufficient data is available in the overall network, nodes can achieve very high accuracy even if the most central nodes of the network disappear and the network becomes partitioned.
4.

Decentralised learning can tolerate even large loss of data. Even when disrupted nodes have much larger local datasets than the others, the latter ones are still able to compensate by jointly extracting knowledge from the data available at the surviving nodes, with a very limited reduction of the overall accuracy after a disruption.

The rest of the paper is organised as follows. Section 2 presents the most important work related to this paper. Section 3 describes in detail the properties of the considered network and decentralised learning process. In Section 4, we present in detail the experimental settings. Section 5 discusses the results of our simulations, while Section 6 concludes the paper.

2 Related work

As stated in the introduction, network disruption is a central concern in today’s interconnected world, whether the network is a communication network, a transportation network, or a social network. On the other hand, complex networks serve as the backbone of numerous real-world systems, from biological processes and social interactions to technological infrastructures like the internet and power grids[4]. As a result, the study of complex network models undergoing network disruption has attracted a lot of attention in the field of network science. Understanding the vulnerability and resilience of complex networks is essential for develo** systems that can withstand failures and disruptions. This is particularly important in disaster management, where the goal is to minimise the impact of critical events on infrastructures and services. Different network topologies exhibit varying levels of resilience and vulnerability. For example, a centralised network is highly vulnerable to single points of failure, since all network communication flows through a central node or hub. Therefore, this topology is particularly vulnerable to disruptions that target the central node.

Within the context of complex networks and their applications to real-world scenarios, networks’ structures such as the Barabasi-Albert model, are frequently used as reference models. This is because of their ability to capture the inhomogeneous connectivity distribution characteristic of many real-world networks, thanks to the scale-free property exhibited by this model. In [5], the authors investigated the tolerance of complex networks with scale-free properties to random errors and targeted attacks. They looked at how the diameter of the networks changed when a small number of nodes were removed, disrupting the interconnectedness of the system, in order to address the networks’ fault tolerance. They found that scale-free networks display high resilience against random failures, largely due to their heterogeneous topology. This resilience stems from the vast majority of nodes having few connections, making it statistically unlikely for a random failure to hit a hub, which is crucial for maintaining the network’s connectivity. On the other hand, in a related study [6], the authors examined the vulnerability of different network models, including Barabasi-Albert graphs, to intentional attacks. Their investigation involved an analysis of how scale-free networks respond to various attack strategies that specifically target nodes based on their properties, such as degree and centrality measures. By examining the behaviour and robustness of these networks under different attack scenarios, they provided valuable insights into the dynamics of scale-free networks and their susceptibility to intentional attacks. They found that these network structures are vulnerable to targeted attacks on their hubs.

In the context of FL and its robustness, in [7], the authors provide insights into the challenges and open problems associated with FL, including network disruptions and communication failures. In FL, models are trained across multiple clients holding local data samples without exchanging them. Therefore, the FL process involves several steps, including broadcasting a model to the participating clients, local client computation, and the subsequent reporting of client updates to the central aggregator. However, as the authors highlight, this setup inherently faces various challenges that can hinder task completion due to system factors that can contribute to failures occurring at any of these steps. These failures can range from explicit communication breakdowns due to unreliable network connections to the presence of straggler clients, participants that report their outputs significantly later than their peers within the same communication round. Thus, to optimise efficiency, these straggler clients may be omitted from a communication round, even in the absence of explicit failures. The presence of stragglers is often unavoidable, which significantly impairs algorithm performance. To tackle these issues, asynchronous versions of Federated Learning have been proposed, where the central server does not wait for all the clients to complete their tasks [8, 9]. However, the FL architecture is naturally prone to be vulnerable to a single point of failure, where the failure of the central server impairs the whole learning process.

Decentralised Learning (DL) extends the typical settings of Federated Learning [3] by removing the existence of the central parameter server. This approach is gaining momentum as it merges the privacy benefits inherent in Federated Learning with the capabilities of decentralised and uncoordinated optimisation and learning processes. In [10], a DL model is deployed for a healthcare application, enabling multiple hospitals to collaboratively train a Neural Network model while kee** their data private. Another work [11] introduces a federated consensus algorithm that extends the FedAvg method proposed by [3] for use in decentralised environments, with emphasis on industrial and IoT applications. The work in [12] proposes a Federated Decentralized Average based on SGD in their research, where they incorporate a momentum term to counterbalance potential drift caused by multiple updates, along with a quantisation strategy to reduce communication requirements. In [13], the authors introduce a novel method for collaborative training in DL environments by means of a secret sharing schema and a multi-party average computation, aiming to preserve privacy and security. The authors of [14] explore the influence of the underlying network architecture on the learning efficacy in a fully decentralised learning system, while [15] investigates the impact of network topology on the optimal model initialisation in decentralised settings. However, the issues surrounding system robustness remain unaddressed within the domain of DL. However, the effectiveness and reliability of DL systems are crucial for their success. Our research represents a step towards filling this gap.

This paper extends the work in [16] along several directions. Firstly, we employ a new strategy for simulating disruptions, ensuring fairer comparisons among different learning configurations by imposing disruptions once the average accuracy of the system reaches a specific target threshold. This contrasts with the previous approach, where disruptions occurred at fixed intervals. Furthermore, a novel scenario (Case 3) has been introduced, detailing situations where disrupted nodes exhibit a disproportionate proficiency in classifying specific classes of images. This setup enables evaluating surviving nodes’ ability to retain valuable knowledge provided by disrupted nodes after disconnection. Moreover, the paper now includes an analysis of the impact of different centrality measures for selecting nodes to be switched off, as well as a percolation analysis to characterise changes in network connectivity as nodes are removed, detailed in Section 4.2. Additionally, for Case 1 and Case, we now consider a more challenging (from the learning standpoint) data distribution. Finally, a comparison of the performance of isolated nodes versus the largest cluster has been added to illustrate the robustness of the decentralised learning process in extreme cases following disruption.

3 System model

3.1 Decentralized learning

We represent the network connecting the nodes as a graph $\mathcal{G}(\mathcal{V},\mathcal{E})$ , where $\mathcal{V}$ denotes the set of nodes and $\mathcal{E}$ is the set of edges. The decentralised learning algorithms designed to operate on a network of nodes are typically composed of two main blocks: one for the local training of the model using local data and the other one devoted to the exchange and aggregation of the models’ updates. These operations are executed by each node, atomically, within a single communication round. Each node $i\in\mathcal{V}$ has a local training dataset $\mathcal{D}_{i}$ (containing tuples of features and labels $(\mathbf{x},y)\in\mathcal{X}\times\mathcal{Y}$ ) and a local model $h$ defined by weights $\mathbf{w}_{i}$ , such that $h(\mathbf{x};\mathbf{w}_{i})$ yields the prediction of label $y$ for input $\mathbf{x}$ . Let us denote with $\mathcal{D}=\bigcup_{i}\mathcal{D}_{i}$ and with $\mathcal{P}$ the label distribution in $\mathcal{D}$ . In this paper, we consider both the case where $\mathcal{P}_{i}$ (i.e., the label distribution of the local dataset on node $i$ ) is different from $\mathcal{P}$ , as well as the case where label distributions $\mathcal{P}_{i}$ are IID (independent and identically distributed) across all devices (hence, $\mathcal{P}_{i}\sim\mathcal{P}$ ). We also assume homogeneous initialisation of local models across devices, i.e., we assume that, at time 0, $\mathbf{w}_{i}=\mathbf{w}_{j}$ for any pair of nodes $i,j$ holds true. The effect of heterogenous initialisation on decentralised learning is investigated in [17], while in this paper we focus on the effect of disruption on the learning process, which is an orthogonal problem.

At time 0, the model $h(\cdot;\mathbf{w}_{i})$ is trained on local data by minimising a target loss function $\ell$ :

\widetilde{\mathbf{w}}_{i}=\mathrm{argmin}_{\mathbf{w}}\sum_{k=1}^{|\mathcal{D% }_{i}|}\ell(y_{k},h(\mathbf{x}_{k};\mathbf{w}_{i})),

with $(\mathbf{x}_{k},y_{k})\in\mathcal{D}_{i}$ . At each communication round, i.e., at each step $t$ , a given node $i$ receives the parameters of the local models $\widetilde{\mathbf{w}}_{j}$ from its neighbours $j\in\mathcal{N}(i)$ in the social graph and combines them with its local model through the following aggregation policy:

\mathbf{w}_{i}^{(t)}\leftarrow\frac{\sum_{j\in\mathcal{N}(i)}\alpha_{ij}% \widetilde{\mathbf{w}}_{j}^{(t-1)}}{\sum_{j\in\mathcal{N}(i)}A_{ij}},

(1)

where $\mathcal{N}(i)$ is the neighbourhood of node $i$ including itself, $A_{ij}$ is the $i,j$ element of the adjacency matrix of the network, and $\alpha_{ij}$ is equal to $\frac{|\mathcal{P}_{j}|}{\sum_{j\in\mathcal{N}_{i}}|\mathcal{P}_{j}|}$ (and captures the relative weight of the local dataset of node $j$ in the neighbourhood of node $i$ ). Afterwards, each node begins a new round of local training using the newly aggregated local model. This goes on for a certain number of communication rounds.

This strategy, which we denote as DecAvg, extends FedAvg [3] to a decentralised setting and is a generalisation of similar strategies proposed in [12, 11]. Differently from the standard Federated Learning, in fully decentralised learning settings (as the ones considered in this paper), the whole process cannot rely on the coordination and supervision of a central entity. This means that, from a node point of view and apart from degenerate cases, the number of other models it can directly access to improve its local knowledge is limited to the size of its neighbourhood. However, knowledge extracted by any given node from its own data can percolate beyond its immediate neighbourhood due to successive averaging and exchanges of models across communication rounds. Moreover, the connectivity between nodes is a property of the system, and under no circumstances can it be controlled by a node. Conversely, nodes can disappear according to possible failures.

3.2 Network resilience and disruption strategy

The malfunctioning of nodes within a network can lead to a systemic collapse, where the normal functionality of the system is impaired. Of course, the amount of damage inflicted on the network depends on its characteristics and the characteristics of the nodes that exhibit malfunctioning. The ability of a network to retain its functionality despite the failures of nodes within it is called network’s resilience or robustness [2, 18]. Network resilience is critical in various domains, including telecommunications, transportation, and social networks. A crucial aspect of network resilience is the structure of the underlying network. Network connectivity plays an essential role in a system’s ability to survive random failures or deliberate attacks. Therefore, understanding how networks respond to node removal or failure is essential for designing robust and reliable systems. Of course, the quicker the network breaks apart, the less robust it is. To this end, percolation theory is a widespread technique used to address questions related to network resilience. Percolation analysis is a mathematical and statistical method used to study the behaviour of interconnected systems, such as networks, under various threats, including network disruption. Its goal is to investigate how the removal or failure of nodes or edges affects the overall connectivity of the network. The analysis involves simulating the process of removing nodes or edges from the network based on certain criteria. Normally, the main emphasis in percolation theory is to characterise the properties of the largest connected component (LCC) after disrupting a certain fraction of nodes, which is defined as the largest subset of nodes that remain reachable from each other via a finite-length multi-hop path. Focusing on the LCC is clearly important to characterise how global properties of a network (such as global connectivity) are impacted by disruption. However, the same methodology can also be used to understand the impact of disruptions at a local level, i.e., by characterising their effect not only on the LCC but also on the other connected components of smaller sizes, down to nodes that become isolated.

This is how we use this methodology in this paper. Specifically, as explained in Section 4.2, we carry out a preliminary percolation analysis to identify the percentage of nodes to be disrupted to generate an interesting mix of connected components, from large ones to isolated ones. Then, we study the evolution of decentralised learning when such a percentage of nodes is disrupted by varying the point in time when these disruptions occur (to give more or less time to the underlying decentralised learning process to work on the original network before disruption). Specifically, we characterise the evolution of the learning process in the different components that emerge after disruption to show the interplay between the quality of the model learned before disruption, the residual connectivity, and the presence of data (or lack thereof) in the surviving components.

4 Experimental settings

Below we define the experimental settings of the analyses we carry out in this work.

4.1 Communication network topology

We consider an unweighted Barabasi-Albert (BA) graph $\mathcal{G}$ with 100 nodes (shown in Figure 1) and a preferential attachment parameter $m=2$ . A BA graph is created by initialising it with $m_{0}\geq m$ nodes, and then, at each step, adding a new node $i$ and connecting it with other $m$ existing nodes $j$ with a linking probability that is proportional to the degree of $j$ . Since each newly created node starts off with $m$ links only, $m$ effectively becomes the minimum degree for nodes in the network. We rely on the networkx python library to generate this graph.

The topology of the Barabasi-Albert graph is considered able to capture important features of real-life networks by incorporating two fundamental principles: preferential attachment (new nodes in the network tend to connect to existing nodes that already have a high number of connections) and organic growth (which aligns with the dynamic nature of many real-world systems). By reproducing these principles, a BA graph model successfully emulates the power-law degree distribution observed in numerous networks, such as social networks, the World Wide Web, and collaboration networks. For this reason, it makes sense to consider it a realistic topology for our network graph.

4.2 Selection of “disrupted” nodes

We mimic disruptions by switching off some nodes in the network while the cooperative learning task is ongoing. To maximise the effect of the disruption, these nodes are chosen based on their centrality score. It is well-known that BA graphs are sensitive to targeted attacks towards central nodes, as discussed in Section 2. Thus, starting with graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ , where $\mathcal{V}$ denotes the set of vertices and $\mathcal{E}$ the set of edges, we split $\mathcal{V}$ into $\mathcal{V}=\mathcal{V}_{d}\cup\mathcal{V}_{s}$ , where $\mathcal{V}_{d}$ contains the nodes with the highest centrality score. The graph left after the disruption is $\mathcal{G}_{s}=(\mathcal{V}_{s},\mathcal{E}_{s})$ , where $\mathcal{E}_{s}=\{(i,j):i,j\in\mathcal{V}_{s}\wedge(i,j)\in\mathcal{E}\}$ .

To identify a suitable centrality metric for our study, we carry out an initial percolation analysis of the network. A percolation analysis involves monitoring the network’s connectivity changes while undergoing node removal. This can be done by observing the size of the largest connected component and comparing it to the size of the original network, as follows. Let us denote the network under study with $\mathcal{G}$ , composed of $N=|\mathcal{V}|$ nodes and $E=|\mathcal{E}|$ links. At time $t$ , we remove $m$ nodes according to a predefined criterion. After the removal, we count the number of connected components that survive and take the one with the highest number of nodes as the largest connected component, $\mathcal{G}_{c}$ . Then we calculate the ratio between the size of $\mathcal{G}_{c}$ (denoted as $N_{G_{c}}$ ) and that of the original graph $\mathcal{G}$ (denoted with $N_{G}$ ):

\Phi=\frac{N_{G_{c}}}{N_{G}}.

The smaller $\Phi$ , the higher the network fragmentation and the higher the damage the disruption has produced in the network.

In Figure 2(a), we show the behaviour of the ratio $\Phi$ in our network for different centrality measures: structural hole score [19], betweenness centrality and degree centrality. Structural holes refer to the gaps or “holes” in a network where there are no direct connections between two or more groups of people. These gaps represent opportunities for nodes to act as intermediaries or brokers by connecting otherwise disconnected groups and facilitating the flow of information, resources, or ideas between them. Thus, the structural hole score quantifies the importance of a node in promoting information flow within the network. Nodes with higher structural hole scores are crucial in facilitating communication across different groups. The betweenness centrality quantifies the extent to which a node lies on the shortest paths between pairs of other nodes in the network. Hence, it measures the importance of a node in connecting other nodes. The degree centrality measures the importance of a node within a network based on the number of connections it has to other nodes. We refer the interested reader to [2] for a formal definition of these centrality metrics.

We progressively remove those nodes with the highest value of the selected centrality measure. This is effectively equivalent to analysing the robustness of the Barabasi-Albert graph undergoing targeted attacks towards the most central nodes. As we can see in Figure 2(a), for the first 5% of nodes removed (exactly 5 nodes in our case), the impact of all centrality measures is equal. After removing the initial 5% of nodes, the degree centrality and structural hole score behave similarly, whereas the betweenness centrality curve remains generally higher. This implies that a larger portion of the network survives w.r.t. the other centralities, meaning that the graph is more robust to node removal when a targeted attack considers betweenness centrality. The reason why the curves behave the same for the first 5 nodes is that their centrality values rank the same nodes regardless of the specific metric considered, as it can be seen in Figure 2(c). In general, the figure shows that all the centrality measures have a linear directly proportional relationship. This means that nodes with high degrees also exhibit high betweenness centrality and high structural hole score. While the relationship remains directly proportional, subtle variations appear beyond the top five nodes. Other important factors in a disruption analysis are the number and size of the connected components that survive after the disruption. As we can see in Figure 2(b), after the disruption, there will be many isolated nodes (component size 1) and few connected components: the larger the size, the smaller the number. Note that the effect of disruption under degree centrality and structural hole score results in a higher number of isolated nodes than under the betweenness centrality case, meaning that undergoing such disruption is more detrimental to the network as it isolates more nodes.

Figure 2(a) shows that the biggest drop in the largest connected component size happens at around 10% node removal, with a similar pattern for all centrality measures. Therefore, we decided to remove the 10% most central nodes. Due to its importance in measuring the importance of a node in promoting information flow, we select the structural hole score as our centrality metric. Note, though, that we are effectively also removing exactly the 10% of nodes with the highest degree centrality, as can be seen in Figure 2(a).

4.3 Time of disruption

In disaster scenarios, disruptions present themselves in unpredictable ways. They can happen as soon as the collaborative learning process starts, in the middle, or when it is approaching its end. The loss of nodes is expected to have a different impact depending on the current progress of the learning process: weak when it happens in the final phase, stronger early on. In the latter case, the key question is whether the decentralised learning process is able to recover and catch up over time. As we discuss later on in Section 5.1, different learning configurations might exhibit different temporal dynamics, with one being slower than another one, even if ultimately their accuracy converges to the same value. To account for these different learning speeds and enable fair comparisons among the different scenarios considered, we thus force disruptions to happen once the average accuracy of the system has reached a certain target threshold. For example, in Section 5.2, we consider disruptions happening when the average accuracy in the system has reached 0.7, 0.75, 0.8. This guarantees that disruptions affect systems that are similarly “skilled” in the learning process at the time of disruption.

4.4 Network and data resilience scenarios

A node in the network can contribute to the decentralised learning process in two ways. It can be well positioned in the network facilitating knowledge (i.e., model) circulation and/or can be endowed with training data to be used for model training and update. Thus, a disruption might have a different effect depending on whether the switched-off nodes only facilitate model dissemination or also possess local training data on which the local model can be trained. For this reason, we will consider different scenarios. First (Case 1), we investigate network resilience when the disrupted nodes are not assigned local data. Thus, the highly central nodes (those that are cut off) will act as bridges but will not contribute knowledge to the learning process. They will pass along aggregate models without using their local data for training because they don’t have any. Next, we consider a situation where the highly central nodes are assigned data, train locally on their own data set, and share their updates with their neighbours. Thus, they are no longer just passive elements of the network. In these settings, the characteristics of the data items assigned to the disrupted nodes may play a crucial role in the resulting learning performance. Thus, we consider two different data distributions. We refer to Case 2 as the configuration in which disrupted nodes are assigned data in an IID fashion with respect to surviving nodes. In practice, this means that we assign exactly the same number of data samples per class to all nodes, regardless of whether they survive or don’t survive the disruption. Finally, in Case 3 we focus on a non-IID data distribution as follows. The available class labels are split into two groups, $\mathcal{L}_{1}$ and $\mathcal{L}_{2}$ : nodes that will be disrupted get disproportionately more data in the second label group than the other nodes, while all nodes get the same number of data for the first label group. With this configuration, we are testing what happens when disrupted nodes contribute much more knowledge to the training process than the other nodes. This is because larger local datasets can lead to better-trained models, thus enabling more knowledge extraction.

For what concerns the training dataset, for our experiments, we employ the widely used MNIST image dataset [20]. This dataset contains a set of handwritten digits; thus, data are divided into 10 classes (for digits from 0 to 9). The collaborative task undertaken by the nodes is then a standard image classification problem. This problem is relevant to a disaster scenario, where individuals within the affected community can leverage their smartphones or other devices with built-in cameras to capture images of the affected areas, including collapsed buildings, road blockages, or other hazardous conditions. The MNIST dataset contains 60,000 training images (approximately 6,000 per class) and 10,000 test images (1,000 per class). We evaluate the impact of the disconnection of central nodes by measuring the accuracy obtained by the local models on the standard MNIST test dataset over time. Note that all classes are equally represented in the test set and that the test set is common to all nodes.

Classification on the MNIST dataset is a relatively easy learning task. Thus, in order to stress test our decentralised learning process and to capture a realistic scenario³³3We anticipate that in scenarios where nodes represent user devices collaborating on decentralised learning tasks, the local datasets will be relatively small, as users actively collect data samples themselves., we assign very few images per class to each node. In Case 1, we distribute 7 images per class to non-disrupted nodes, leading to a local training dataset of 70 images and 6,300 images overall in the network. Recall, in fact, that in Case 1 the 10 disrupted nodes do not own any local data, so the remaining 90 nodes are the only ones assigned data. In Case 2, we keep the same IID data distribution, but this time, we involve disrupted nodes. In order to get approximately the same dataset size distributed across the whole network, this time, we assign 6 images per class to each node, leading to a local training dataset of 60 images and, globally, 6,000 images in the network (given that images cannot be fractional, this is the best trade-off to make Case 1 and Case 2 comparable). In Case 3, recall that class labels are split into two sets $\mathcal{L}_{1}$ and $\mathcal{L}_{2}$ . Images belonging to $\mathcal{L}_{1}$ are distributed in an IID fashion across all nodes by simply splitting the approximately 6,000 images among all nodes (hence, each node gets around 60 images per class). Instead, only a small number of images belonging to $\mathcal{L}_{2}$ are assigned to nodes in $\mathcal{V}_{s}$ (we test configurations with 10, 20, and 30 images per class), while all the others are split equally among disrupted nodes (those in $\mathcal{V}_{d}$ ), which end up with approximately 500 images per $\mathcal{L}_{2}$ -class. In Case 3, disrupted nodes are disproportionately better at classifying $\mathcal{L}_{2}$ -images, because they enjoy a rather larger training set with respect to surviving nodes. Thus, with this configuration, we are able to assess how well the surviving nodes can retain the knowledge of disrupted nodes once the latter ones are cut off from the network. The three different scenarios and the corresponding data distributions are summarised in Table 1.

	Disrupted nodes	Surviving nodes
Case 1	No data	All classes, 7 images per class
Case 2	All classes, 6 images per class	All classes, 6 images per class
Case 3	Classes [0-4], $\sim 60$ images per class	Classes [0-4], $\sim 60$ images per class
	Classes [5-9], $\sim 500$ images per class	Classes [5-9], 10/20/30 images per class

Table 1: Summary table for the data distributions in the three considered scenarios.

4.5 Other settings

We use the DecAvg scheme implementation within the SAISim simulator, available on Zenodo [21]. The simulator is developed in Python and leverages state-of-the-art libraries such as pytorch and networkx for deep learning and complex networks, respectively. It also implements primitives to support fully decentralised learning. The decentralised learning process is run for 200 communication rounds. For the learning task, we consider a simple classifier as the learned model. The local models of nodes are Multilayer Perceptrons with three layers (sizes 512, 256, 128) and ReLu activation functions. SGD is used for the optimisation, with a learning rate of 0.01 and momentum of 0.5.

4.6 Evaluation metrics

Here, we introduce the metrics used to evaluate the robustness of decentralised learning to node disruption. A direct measure of classification performance is the accuracy achieved by nodes using their local models⁴⁴4Note that local models here are the result of progressively averaging models obtained from neighbours. Thus they embed global knowledge. and applying it to the test set, which is common to all nodes. Given that disrupted nodes do not participate in the learning any more after disruption, we do not consider the accuracy they achieve in our analyses. To obtain a compact view of the performance, we also consider the average accuracy across non-disrupted nodes, which is computed as follows.

Definition 4.1 (Mean Accuracy).

The mean accuracy of the system is defined as the mean over all the $\mathcal{V}_{s}$ nodes:

\overline{A}(t)=\sum_{i\,\in\,\mathcal{V}_{s}}\frac{1}{|\mathcal{V}_{s}|}A_{i}% (t).

(2)

$\overline{A}(t)$ will be used to measure the overall system’s performance.

When the top 10% central nodes are disrupted, the connectivity in the graph drastically decreases. As illustrated in Figure 2(b), many nodes become completely disconnected, while others form smaller clusters. Reduced connectivity results in fewer opportunities for communication and collaboration, impairing the decentralised learning process. It is anticipated that nodes losing more connections, and particularly those becoming isolated, will be affected most severely. Thus, we also define the mean accuracy for the individual clusters as follows.

Definition 4.2 (Mean accuracy within clusters).

The mean accuracy over clusters of size $c$ is defined as the mean over all $\mathcal{V}_{s}$ nodes belonging to the clusters of the chosen size $c$ .

\overline{A}^{c}(t)=\sum_{i\,\in\,\mathcal{C}_{c}}\frac{1}{|\mathcal{C}_{c}|}A% _{i}(t)

(3)

where $\mathcal{C}_{c}$ denotes the set of nodes belonging to a cluster of size $c$ .

The mean accuracy over clusters will be used to measure a more fine-grained impact of the disruption on the system, accounting for localised effects. Note that we aggregate over all nodes belonging to clusters with size $c$ , regardless of whether they are in the same cluster. In the case of clusters of size 1, for example, this allows us to consider all isolated nodes together.

The metrics defined above will be used to quantify the impact of the disruption. However, in order to measure differences between the various cases under investigation, we will employ a percentage difference metric. This metric will indicate the distance in performance between the different study cases.

Definition 4.3 (Accuracy Difference).

The accuracy difference between two study cases $i$ and $j$ , be them different data distribution or different accuracy thresholds at which the disruption occurs, is defined as:

d_{A}(A_{i}^{k},A_{j}^{k};t)=\frac{A_{i}^{k}(t)}{A_{j}^{k}(t)}-1

(4)

where $A^{k}(t)$ is the accuracy of the $k$ -th node, and the subscript refers to the accuracy measures in the different cases $i$ and $j$ . To denote the average across all nodes, we will use the notation $d_{A}(A_{i},A_{j};t)$ .

This metric serves as a tool for evaluating and comparing various study cases and disruption conditions within our analysis. By examining the sign of the function, we can discern which study case exhibits superior accuracy, whether at a global or local level. Furthermore, the function’s magnitude yields the percentage variation among the many scenarios, facilitating a comparison between their relative efficacies. Note that, whenever we compute accuracy differences, we align communication rounds at the time of disruption. Specifically, we consider the accuracies of the two case studies $i$ and $j$ at $t$ communication rounds after their disruption events (which may happen at different communication rounds in the two cases, as exemplified in Table 2). This way, we are sure to have given the two study cases the same amount of time to recover from disruptions.

Please note that all average results in Section 5 will be reported with their corresponding 95% confidence intervals. In certain instances, the interval may be so narrow that it is not readily apparent.

5 Results

In this section, we discuss the impact of central nodes being cut off from the network due to a disruption, and we focus on the three scenarios, Case 1, Case 2 and Case 3, illustrated in the previous section. Recall that in Case 1 and Case 2, nodes have an IID distribution of local data, with the difference that in Case 1, highly central nodes do not have local data, and thus they contribute only by connecting elements in the graph topology. The data that these nodes hold locally in Case 2 are redistributed in Case 1 among the other nodes, preserving the IID property among them to maintain the total number of data points equal at the overall network level. Case 3 is different from the other two, as we use it to test the impact of disruption when the data distribution across nodes is non-IID, and specifically when the disrupted nodes have much more local data (hence knowledge) than the others. First, in Sections 5.1-5.4, we discuss and compare Case 1 and Case 2, as in both the disrupted nodes do not enjoy more knowledge than the other nodes. Then, Case 3, in which disrupted nodes have much more knowledge than the other nodes, will be discussed in Section 5.5. Due to the significantly different data distributions between Cases 1-2 and Case 3, a direct quantitative comparison would not be appropriate.

5.1 Baseline performance when no disruption occurs

In Figure 3, we show the global system’s performance, as defined in Equation 2, in Case 1 and Case 2 when no disruption occurs. As evident from the plot, the curve representing Case 2 (orange curve) exhibits a sharper increase and overall higher accuracy than its counterpart in Case 1. In Case 2, the central nodes engage in local training in addition to their role as model aggregators and relays across the network. Despite a similar total dataset size between the two cases (6,300 images for Case 1 and 6,000 images for Case 2, with a slight advantage for Case 1) and a slightly larger local training dataset in Case 1, the superior accuracy achieved by Case 2 implies that bridge nodes offer more informative updates when engaged in local training. This enhancement compensates for the smaller training datasets, both locally and overall. With increasing communication rounds, the Case 1 curve tends to the Case 2 curve. In other words, if we allow the system adequate time for information exchange, it can compensate for any informative updates missing initially, ultimately achieving the same accuracy level as its Case 2 counterpart. This equivalence arises from the similarity in the amount of data available at the system level in both cases. From the perspective of disruption, this finding implies that if the disruption were to occur later on, the consequences of connectivity loss and the concurrent loss of connectivity and data would be identical.

Figure 3 showcases what we already anticipated in Section 4.3: when two learning processes have differing speeds, a disruption occurring at a fixed time $t$ will encounter two systems at distinct stages of the learning process. While this scenario realistically captures the notion that faster systems are more likely to have acquired sufficient knowledge when disrupted, comparisons between the two systems for what-if analyses may be less informative. The discrepancy between the two scenarios is depicted in Table 2, which illustrates the communication round at which the mean accuracy of the system reaches the specified accuracy threshold in both cases. As expected, Case 2 reaches the accuracy threshold faster than Case 1. Therefore, as discussed in Section 4.3, in this study, we chose to assess the robustness of the two systems when disruptions occur at the same average accuracy level for both. The objective is to compare the consequences of disruption in systems that had achieved similar skill levels in terms of learning.

	Accuracy 0.7	Accuracy 0.75	Accuracy 0.8
Case 1	33	37	46
Case 2	23	26	34

Table 2: Communication round at which the mean accuracy of the system reaches the selected accuracy threshold, in the Case 1 and Case 2 baselines.

5.2 Impact of the time of disruption

In Figure 4, we show the accuracy of all the surviving (i.e., non-switched-off) nodes in Case 1 and Case 2 when the disruption happens at different accuracy thresholds. The colouring here is defined based on the size of the cluster the nodes belong to after the disruption occurs. The clusters are the different connected components left after the central nodes are switched off due to the disruption, as discussed in Section 4.2.

First, we focus on the effect of different disruption times in Figure 4. The later the disruption happens, i.e., going from left to right, the higher the mean accuracy of the system, as more and more nodes have been able to reach a higher level of accuracy before the cut-off, resulting in a narrower curve beam. This result holds for both scenarios and follows naturally after noticing that the central nodes are able to connect the network and, in Case 2, share their information for more communication rounds before getting switched off as the accuracy threshold increases. In all cases, after the disruption occurs, most curves stop improving and tend to cluster into different groups of similarly performing nodes. As the curve colours show, these groups reflect the cluster size to which the nodes belong. Recall that since the clusters’ sizes are calculated after the removal of the special nodes, nodes having cluster size 1 are isolated nodes. As we can see in Figure 4, isolated nodes have the lowest values of accuracies, and the accuracy generally increases with the increasing value of the cluster size. This comes naturally from the fact that more connected nodes enjoy more information flow due to collaborative learning, thus increasing their performance. Furthermore, by examining the curves for the two largest clusters (size 25 and size 29), we can observe that the larger the cluster size, the more a node’s performance becomes independent of the disruption time. This indicates that the cluster has sufficient data to compensate for the disruption’s effects, no matter when it occurs. Conversely, we notice a strong correlation between the accuracy level and the timing of disruptions for the isolated nodes: the later the disruption occurs, the better the accuracy level that the isolated nodes can achieve. Naturally, this is because they have longer access to more information. Clusters of intermediate sizes fall within these two boundary conditions.

Interestingly, in Figure 4, we note that for later disruption times, isolated nodes, as well as nodes belonging to small clusters, reduce the accuracy they achieve with respect to the one they had achieved at the disruption time. This effect is not present for nodes in larger clusters. The intuition is that large clusters can compensate for the lack of connectivity thanks to the overall set of data residing at the nodes in the cluster. If the cluster is large enough, the effect of disruption can be compensated reasonably well. Otherwise, nodes in smaller clusters are not able to recover knowledge that is lost as a side effect of disruption. We analyse this effect in more detail in the next section.

5.3 Connectivity loss vs connectivity and data loss

In this section, we directly compare Case 1 and Case 2 with respect to the evolution of the decentralised learning process after disruption. Specifically, in Figure 5, we show the difference between the overall performances of the two cases, denoted as $d_{A}(A_{Case1},A_{Case2})$ , for the different accuracy thresholds, where $d_{A}(A_{Case1},A_{Case2})$ is the mean of the accuracy differences over all surviving nodes. The accuracy difference is computed at each communication round following the time of disruption. As we can see, the curves are all positive and stabilise at small distances. Even though, after a large number of communication rounds, the difference tends to be very limited, this indicates that, in general, the configuration of Case 1 is more beneficial than that of Case 2 to limit the damage of disruption.

This is further confirmed at a more granular level when we compute the same difference but focusing on the largest cluster (Figure 6(a)) and the isolated nodes (Figure 6(b)), thus computing the difference between the within-clusters accuracy, as per Equation 3. The behaviour of the large cluster is similar to that observed for the system overall. Specifically, we note that (i) the condition of Case 1 is more favourable than that of Case 2 and that (2) after sufficient time, the accuracy threshold at which disruption occurred does not differentiate the two conditions. Moreover, the case of isolated nodes shows that when disruption occurs at a (relatively) low accuracy, the presence of even a small amount of additional data locally allows the model of Case 1 to reach a higher accuracy than the model trained in Case 2. However, this difference vanishes when disruption occurs at a high accuracy threshold.

All in all, our results suggest that having more data available “somewhere” in the network (as in Case 1) is more beneficial than concentrating the same amount of data on central nodes, if those nodes are at risk of disruption (as in Case 2), while the main advantage of the latter configuration is limited to a speed-up of the decentralised learning process if no disruption occurs (as shown by Figure 3). This confirms that, as long as data are present in the network, decentralised learning is able to exploit them through collaborative training, even in the presence of significant disruptions to the network structure.

5.4 Knowledge persistence

In this section, we analyse in detail the aspect of knowledge persistence, referring to the nodes’ ability to retain the memory of the knowledge transmitted by nodes that were disrupted. Specifically, we analyse, after disruption, the loss of accuracy with respect to the case where no disruption occurs. In the following, “baseline” and “accuracy threshold 0” refer to the cases where the disruption either never happens or has already happened at the start of the simulation, respectively. They represent the upper and lower bounds of achievable accuracy.

First, we analyse the overall performance of the system, considering all nodes altogether. To this end, in Figure 7(a) we show the average accuracy of the entire system, as defined in Equation 2. As we can see from Figure 7(a), there is a noticeable gap in accuracy with respect to the baseline. Furthermore, from the zoom inset, we can observe that, despite the curves being closely grouped together, there is a gap between the system’s performances for the different accuracy thresholds. This can be better analysed by looking at the percentage difference (Figure 7(b)). With increasing accuracy threshold, the distance from the baseline decreases, and overall, it ranges anyway around 4 and 8%. Also, in this case, we notice the slightly lower performance of Case 2 with respect to Case 1.

Next, we focus individually on the two boundary cases of completely isolated nodes, and on the case of the biggest cluster. In 8(a), we show the mean accuracy of the isolated nodes for different accuracy thresholds. We can observe several interesting features. First, we confirm the drop in accuracy after disruption suffered at any accuracy threshold (higher than 0). This means that, for isolated nodes, it is fundamental to get access to knowledge extracted from data residing on other nodes and that they cannot fully compensate for the effect of a disruption using local data only. On the other hand, it is also interesting to observe that isolated nodes, at any accuracy threshold, achieve higher accuracy with respect to the case where they started in isolation (accuracy threshold equal to 0). This means that, despite the observed drop, some knowledge acquired thanks to collaborative learning still persists after disruption and “stays there” for a long time. In other words, this knowledge persists even in the absence of a large representation of labels in data residing locally. The difference metric (8(b) shows that the drop in accuracy ranges between 10 and 20% depending on the accuracy threshold. All in all, Figure 8 shows a significant persistence of knowledge after disruption even at isolated nodes, thanks to the fact that they had the chance of being exposed to the collaborative learning process before disruption, even though some loss of knowledge is unavoidable. The same behaviour can also be observed for small clusters and vanishes only when clusters contain an amount of data sufficient to recover from the loss of global knowledge diffusion implied from the time of disruption.

Figure 9 illustrates the average accuracy attained by nodes within the largest cluster and the percentage difference compared to the baseline. Intuitively, these nodes are less affected by disruptions, which is supported by the results. Specifically, we observe the following main findings. Firstly, large clusters demonstrate resilience to disruptions regardless of the accuracy thresholds, as evidenced by the nearly overlap** accuracy curves. This indicates that the timing of the disruption minimally impacts the performance of clusters. Secondly, there is a slight difference in performance compared to the baseline (approximately 1%), suggesting that large clusters can effectively mitigate the effects of disruptions quite efficiently. One particular aspect to note in this case is that the accuracy in Case 1 is maximised when the disruption occurs at time 0, and this effect is even more evident for earlier communication rounds. In other words, in Case 1, nodes of the largest cluster do not benefit at all from the presence of the other nodes in the network, which are actually detrimental to them. Albeit counter-intuitive, this behaviour can be explained also based on the analysis we have carried out in [14]. Remember that in Case 1 the most central nodes (which undergo disruption) do not have data of their own, but they just average and forward models received from neighbours without any local retraining. Particularly early on in time, these nodes receive very heterogeneous models (as they are central, they typically have many neighbours, see Section 4.2) and therefore, the average model they compute is not very accurate unless it has a chance of being retrained on local data. As a side effect, when disruption occurs after time 0, central nodes inject in the largest (after disruption) cluster “noisy” models, which do not contribute to, yet work against, the overall accuracy achieved by nodes in this cluster.

Considering the results presented in this section altogether, we can observe a significant persistence of knowledge after disruption in all cases. Even in the most unfavourable conditions (i.e., isolated nodes) the accuracy achieved sufficiently long after the disruption is significantly higher than what would be achieved if those nodes were never exposed to the decentralised learning process (i.e., the case “accuracy threshold 0”). There is clearly a loss of accuracy with respect to the case where no disruption occurs, but this is rather limited (in the order of 20% maximum in our experiments). Moreover, if nodes remain grouped in sufficiently large clusters, the decentralised learning process is extremely robust, and, if sufficient time is allowed after disruption, the achieved accuracy is basically indistinguishable from the case when no disruption occurred.

5.5 The effect of non-IID data

In the scenarios considered so far, nodes have (as summarized in Table 1) either the same amount of images per class or none at all (disrupted nodes in Case 1), so the data distribution was IID assuming that data were present. This implied that each node with data would contribute equally to the learning process. In this final set of results, we want to focus, instead, on the effects of an unequal data distribution. As we explained in Section 4.4, in Case 3, data labels are divided into two classes, $\mathcal{L}_{1}$ and $\mathcal{L}_{2}$ . Images belonging to $\mathcal{L}_{1}$ are distributed in an IID fashion across all nodes by simply splitting the approximately 6,000 images among all nodes (hence, each node gets around 60 images per class). Instead, only 10, 20 or 30 images belonging to $\mathcal{L}_{2}$ are assigned to the nodes surviving the disruption, while all the others are split equally among disrupted nodes, which end up with approximately 500 images per $\mathcal{L}_{2}$ -class. Thus, in Case 3, disrupted nodes are disproportionately better at classifying $\mathcal{L}_{2}$ -images hence losing them after the disruption is expected to significantly impact the learning process. Before we continue, note that in Case 3 we had to consider higher accuracy thresholds than in the previous cases (87%, 90%, 92%). This is because in this Case 3 configuration, there are much more data around in the network overall, so accuracies of 80% are basically reached in the very first communication rounds.

Figure 10 displays the accuracy over time for Case 3. Let us begin with the most challenging configuration (top row), where the surviving nodes after the disruption only possess 10 images for each $\mathcal{L}_{2}$ -label (those ranging from 5 to 9 in the MNIST dataset). When the disruption occurs at time zero, the spread of curves is quite broad, indicating that some nodes lack the ability to learn solely based on local data. As the disruption is delayed, the spread narrows, indicating that even the most isolated nodes post-disruption can benefit from collaborative training that occurred before disruption, despite having very few local $\mathcal{L}_{2}$ -images. Note, in the inset of the figure, that more peripheral nodes experience a drop in accuracy after the disruption, as observed in previous cases. However, the accuracy remains significantly higher than when there is no collaboration at all (corresponding to an accuracy threshold of 0%). With an increase in the number of local images (second and third rows), we observe an overall facilitation in the learning process, as expected. In this scenario, the timing of the disruption plays a less significant role, as the local data suffice to compensate for the loss of collaborators.

Zooming in on the differences between isolated nodes and the largest connected component after the disruption (Figure 11), we confirm that nodes that become completely disconnected post-disruption can capitalise on the knowledge received and assimilated before the disruption. Collaborative learning benefits them most when they have very few local training images initially. Large clusters (bottom row of Figure 11) suffer much less from the disruption because they can effectively pool their knowledge thanks to the remaining graph connectivity. However, similar to previous cases, later disruptions and an increase in local images also alleviate the burden on their learning process. These findings are clearly confirmed when looking at the differences in accuracy (Figure 12).

The results for Case 3 confirm the robustness of the decentralised learning process to network disruptions: as long as some knowledge is available and accessible through the communication graph, compensating for the loss of collaborators – even excellent ones like in Case 3 – is feasible and the loss in accuracy with respect to the baseline of no disruption is drastically mitigated. More than other, nodes that lose all collaborators benefit significantly from the accumulation of deep knowledge circulated before the disruption.

6 Conclusions

In this paper we have focused on fully decentralised learning, and we have analysed the robustness of the learning process in case of disruptions. Specifically, after a variable initial period where the network is intact and, therefore, the decentralised process can proceed without disruptions, we have removed a given percentage of central nodes, and we have analysed, over time, the accuracy achieved by the surviving nodes. Specifically, we have considered various types of disruptions involving loss of either connectivity alone or connectivity and data together.

The most important finding we can highlight is that decentralised learning appears to be remarkably robust to all types of disruptions considered, as long as surviving nodes hold a sufficient fraction of representative data to sustain the learning process after disruption. For nodes belonging to large clusters (after disruption) the loss of accuracy is negligible compared to the case when no disruption occurred. For isolated nodes, the loss of accuracy is larger, but we have never observed it exceeding 20% with respect to the case of no disruption. In all cases, we have shown that if surviving nodes are part of the decentralised learning process for some time before disruption occurs, then they can retain most of the knowledge acquired before disruption and achieve a much higher accuracy than in the case when the network starts already in the same configuration as after disruption.

We identify three key reasons for this behaviour. First, knowledge acquired before disruption persists, and is not lost even by isolated nodes, as long as they have even a small local dataset to refresh it through local training. Second, accuracy can be recovered if data is present “somewhere” in the network, even though very distributed across surviving nodes. Third, even modest connectivity supports efficient recovery from failures, as nodes in connected components, thanks to the collaborative nature of decentralised learning, are able to jointly train very accurate models even after the disruption.

Acknowledgments

This work was partially supported by the H2020 HumaneAI Net (952026) and by the CHIST-ERA-19-XAI010 SAI projects. C. Boldrini’s and M. Conti’s work was partly funded by the PNRR - M4C2 - Investimento 1.3, Partenariato Esteso PE00000013 - “FAIR”, A. Passarella’s work was partly funded by the PNRR - M4C2 - Investimento 1.3, Partenariato Esteso PE00000001 - “RESTART”, both funded by the European Commission under the NextGeneration EU programme.

References

[1] P. Bellavista, L. Foschini, A. Mora, Decentralised learning in federated deployment environments: A system-level survey, ACM Comput. Surv. 54 (1) (2022) 15:1–15:38.
[2] A.-L. Barabási, Network science, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371 (1987) (2013) 20120375.
[3] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, B. Agüera y Arcas, Communication-efficient learning of deep networks from decentralized data, in: AISTATS’17, 2017.
[4] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, D.-U. Hwang, Complex networks: Structure and dynamics, Physics reports 424 (4-5) (2006) 175–308.
[5] R. Albert, H. Jeong, A.-L. Barabási, Error and attack tolerance of complex networks, Nature 406 (6794) (2000) 378–382. doi:10.1038/35019019.
URL https://www.nature.com/articles/35019019
[6] P. Holme, B. J. Kim, C. N. Yoon, S. K. Han, Attack vulnerability of complex networks, Physical Review E 65 (5) (2002) 056109. doi:10.1103/PhysRevE.65.056109.
URL https://link.aps.org/doi/10.1103/PhysRevE.65.056109
[7] P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. Nitin Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, R. G. L. D’Oliveira, H. Eichner, S. El Rouayheb, D. Evans, J. Gardner, Z. Garrett, A. Gascón, B. Ghazi, P. B. Gibbons, M. Gruteser, Z. Harchaoui, C. He, L. He, Z. Huo, B. Hutchinson, J. Hsu, M. Jaggi, T. Javidi, G. Joshi, M. Khodak, J. Konecný, A. Korolova, F. Koushanfar, S. Koyejo, T. Lepoint, Y. Liu, P. Mittal, M. Mohri, R. Nock, A. Özgür, R. Pagh, H. Qi, D. Ramage, R. Raskar, M. Raykova, D. Song, W. Song, S. U. Stich, Z. Sun, A. T. Suresh, F. Tramèr, P. Vepakomma, J. Wang, L. Xiong, Z. Xu, Q. Yang, F. X. Yu, H. Yu, S. Zhao, Advances and Open Problems in Federated Learning, Foundations and Trends® in Machine Learning 14 (1–2) (2021) 1–210. doi:10.1561/2200000083.
URL http://www.nowpublishers.com/article/Details/MAL-083
[8] C. Xie, S. Koyejo, I. Gupta, Asynchronous federated optimization, arXiv preprint arXiv:1903.03934 (2019).
[9] Y. Chen, Y. Ning, M. Slawski, H. Rangwala, Asynchronous online federated learning for edge devices with non-iid data, in: 2020 IEEE International Conference on Big Data (Big Data), IEEE, 2020, pp. 15–24.
[10] A. G. Roy, S. Siddiqui, S. Pölsterl, N. Navab, C. Wachinger, BrainTorrent: A Peer-to-Peer Environment for Decentralized Federated Learning, arXiv (2019) 1–9.
URL http://arxiv.longhoe.net/abs/1905.06731
[11] S. Savazzi, M. Nicoli, V. Rampa, Federated Learning With Cooperating Devices: A Consensus Approach for Massive IoT Networks, IEEE Internet of Things Journal 7 (5) (2020) 4641–4654. doi:10.1109/JIOT.2020.2964162.
URL https://ieeexplore.ieee.org/document/8950073/
[12] T. Sun, D. Li, B. Wang, Decentralized federated averaging, IEEE Trans. Pattern Anal. Mach. Intell. 45 (04) (2023) 4289–4301. doi:10.1109/TPAMI.2022.3196503.
[13] T. Wink, Z. Nochta, An approach for peer-to-peer federated learning, in: 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), IEEE, 2021, pp. 150–157.
[14] L. Palmieri, L. Valerio, C. Boldrini, A. Passarella, The effect of network topologies on fully decentralized learning: a preliminary investigation, in: Proceedings of the 1st International Workshop on Networked AI Systems, 2023, pp. 1–6.
[15] A. Badie-Modiri, C. Boldrini, L. Valerio, J. Kertész, M. Karsai, Initialisation and topology effects in decentralised federated learning, arXiv preprint arXiv:2403.15855 (2024).
[16] L. Palmieri, C. Boldrini, L. Valerio, A. Passarella, M. Conti, Exploring the impact of disrupted peer-to-peer communications on fully decentralized learning in disaster scenarios, in: 2023 International Conference on Information and Communication Technologies for Disaster Management (ICT-DM), IEEE, 2023, pp. 1–6.
[17] L. Valerio, C. Boldrini, A. Passarella, J. Kertész, M. Karsai, G. Iñiguez, Coordination-free decentralised federated learning on complex networks: Overcoming heterogeneity (2023). arXiv:2312.04504.
[18] J. Gao, B. Barzel, A.-L. Barabási, Universal resilience patterns in complex networks, Nature 530 (7590) (2016) 307–312. doi:10.1038/nature16948.
URL http://dx.doi.org/10.1038/nature16948
[19] R. S. Burt, Structural holes and good ideas, American journal of sociology 110 (2) (2004) 349–399.
[20] Y. LeCun, The mnist database of handwritten digits, http://yann.lecun.com/exdb/mnist/ (1998).
[21] L. Valerio, C. Boldrini, A. Passarella, SAI Simulator for Social AI Gossi** (Dec. 2021). doi:10.5281/zenodo.5780042.
URL https://doi.org/10.5281/zenodo.5780042