-
DSig: Breaking the Barrier of Signatures in Data Centers
Authors:
Marcos K. Aguilera,
Clément Burgelin,
Rachid Guerraoui,
Antoine Murat,
Athanasios Xygkis,
Igor Zablotchi
Abstract:
Data centers increasingly host mutually distrustful users on shared infrastructure. A powerful tool to safeguard such users are digital signatures. Digital signatures have revolutionized Internet-scale applications, but current signatures are too slow for the growing genre of microsecond-scale systems in modern data centers. We propose DSig, the first digital signature system to achieve single-dig…
▽ More
Data centers increasingly host mutually distrustful users on shared infrastructure. A powerful tool to safeguard such users are digital signatures. Digital signatures have revolutionized Internet-scale applications, but current signatures are too slow for the growing genre of microsecond-scale systems in modern data centers. We propose DSig, the first digital signature system to achieve single-digit microsecond latency to sign, transmit, and verify signatures in data center systems. DSig is based on the observation that, in many data center applications, the signer of a message knows most of the time who will verify its signature. We introduce a new hybrid signature scheme that combines cheap single-use hash-based signatures verified in the foreground with traditional signatures pre-verified in the background. Compared to prior state-of-the-art signatures, DSig reduces signing time from 18.9 to 0.7 us and verification time from 35.6 to 5.1 us, while kee** signature transmission time below 2.5 us. Moreover, DSig achieves 2.5x higher signing throughput and 6.9x higher verification throughput than the state of the art. We use DSig to (a) bring auditability to two key-value stores (HERD and Redis) and a financial trading system (based on Liquibook) for 86% lower added latency than the state of the art, and (b) replace signatures in BFT broadcast and BFT replication, reducing their latency by 73% and 69%, respectively
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
Overcoming the Challenges of Batch Normalization in Federated Learning
Authors:
Rachid Guerraoui,
Rafael Pinot,
Geovani Rizk,
John Stephan,
François Taiani
Abstract:
Batch normalization has proven to be a very beneficial mechanism to accelerate the training and improve the accuracy of deep neural networks in centralized environments. Yet, the scheme faces significant challenges in federated learning, especially under high data heterogeneity. Essentially, the main challenges arise from external covariate shifts and inconsistent statistics across clients. We int…
▽ More
Batch normalization has proven to be a very beneficial mechanism to accelerate the training and improve the accuracy of deep neural networks in centralized environments. Yet, the scheme faces significant challenges in federated learning, especially under high data heterogeneity. Essentially, the main challenges arise from external covariate shifts and inconsistent statistics across clients. We introduce in this paper Federated BatchNorm (FBN), a novel scheme that restores the benefits of batch normalization in federated learning. Essentially, FBN ensures that the batch normalization during training is consistent with what would be achieved in a centralized execution, hence preserving the distribution of the data, and providing running statistics that accurately approximate the global statistics. FBN thereby reduces the external covariate shift and matches the evaluation performance of the centralized setting. We also show that, with a slight increase in complexity, we can robustify FBN to mitigate erroneous statistics and potentially adversarial attacks.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
Boosting Robustness by Clip** Gradients in Distributed Learning
Authors:
Youssef Allouah,
Rachid Guerraoui,
Nirupam Gupta,
Ahmed Jellouli,
Geovani Rizk,
John Stephan
Abstract:
Robust distributed learning consists in achieving good learning performance despite the presence of misbehaving workers. State-of-the-art (SOTA) robust distributed gradient descent (Robust-DGD) methods, relying on robust aggregation, have been proven to be optimal: Their learning error matches the lower bound established under the standard heterogeneity model of $(G, B)$-gradient dissimilarity. Th…
▽ More
Robust distributed learning consists in achieving good learning performance despite the presence of misbehaving workers. State-of-the-art (SOTA) robust distributed gradient descent (Robust-DGD) methods, relying on robust aggregation, have been proven to be optimal: Their learning error matches the lower bound established under the standard heterogeneity model of $(G, B)$-gradient dissimilarity. The learning guarantee of SOTA Robust-DGD cannot be further improved when model initialization is done arbitrarily. However, we show that it is possible to circumvent the lower bound, and improve the learning performance, when the workers' gradients at model initialization are assumed to be bounded. We prove this by proposing pre-aggregation clip** of workers' gradients, using a novel scheme called adaptive robust clip** (ARC). Incorporating ARC in Robust-DGD provably improves the learning, under the aforementioned assumption on model initialization. The factor of improvement is prominent when the tolerable fraction of misbehaving workers approaches the breakdown point. ARC induces this improvement by constricting the search space, while preserving the robustness property of the original aggregation scheme at the same time. We validate this theoretical finding through exhaustive experiments on benchmark image classification tasks.
△ Less
Submitted 27 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
The Privacy Power of Correlated Noise in Decentralized Learning
Authors:
Youssef Allouah,
Anastasia Koloskova,
Aymane El Firdoussi,
Martin Jaggi,
Rachid Guerraoui
Abstract:
Decentralized learning is appealing as it enables the scalable usage of large amounts of distributed data and resources (without resorting to any central entity), while promoting privacy since every user minimizes the direct exposure of their data. Yet, without additional precautions, curious users can still leverage models obtained from their peers to violate privacy. In this paper, we propose De…
▽ More
Decentralized learning is appealing as it enables the scalable usage of large amounts of distributed data and resources (without resorting to any central entity), while promoting privacy since every user minimizes the direct exposure of their data. Yet, without additional precautions, curious users can still leverage models obtained from their peers to violate privacy. In this paper, we propose Decor, a variant of decentralized SGD with differential privacy (DP) guarantees. Essentially, in Decor, users securely exchange randomness seeds in one communication round to generate pairwise-canceling correlated Gaussian noises, which are injected to protect local models at every communication round. We theoretically and empirically show that, for arbitrary connected graphs, Decor matches the central DP optimal privacy-utility trade-off. We do so under SecLDP, our new relaxation of local DP, which protects all user communications against an external eavesdropper and curious users, assuming that every pair of connected users shares a secret, i.e., an information hidden to all others. The main theoretical challenge is to control the accumulation of non-canceling correlated noise due to network sparsity. We also propose a companion SecLDP privacy accountant for public use.
△ Less
Submitted 3 May, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
On the Relevance of Byzantine Robust Optimization Against Data Poisoning
Authors:
Sadegh Farhadkhani,
Rachid Guerraoui,
Nirupam Gupta,
Rafael Pinot
Abstract:
The success of machine learning (ML) has been intimately linked with the availability of large amounts of data, typically collected from heterogeneous sources and processed on vast networks of computing devices (also called {\em workers}). Beyond accuracy, the use of ML in critical domains such as healthcare and autonomous driving calls for robustness against {\em data poisoning}and some {\em faul…
▽ More
The success of machine learning (ML) has been intimately linked with the availability of large amounts of data, typically collected from heterogeneous sources and processed on vast networks of computing devices (also called {\em workers}). Beyond accuracy, the use of ML in critical domains such as healthcare and autonomous driving calls for robustness against {\em data poisoning}and some {\em faulty workers}. The problem of {\em Byzantine ML} formalizes these robustness issues by considering a distributed ML environment in which workers (storing a portion of the global dataset) can deviate arbitrarily from the prescribed algorithm. Although the problem has attracted a lot of attention from a theoretical point of view, its practical importance for addressing realistic faults (where the behavior of any worker is locally constrained) remains unclear. It has been argued that the seemingly weaker threat model where only workers' local datasets get poisoned is more reasonable. We prove that, while tolerating a wider range of faulty behaviors, Byzantine ML yields solutions that are, in a precise sense, optimal even under the weaker data poisoning threat model. Then, we study a generic data poisoning model wherein some workers have {\em fully-poisonous local data}, i.e., their datasets are entirely corruptible, and the remainders have {\em partially-poisonous local data}, i.e., only a fraction of their local datasets is corruptible. We prove that Byzantine-robust schemes yield optimal solutions against both these forms of data poisoning, and that the former is more harmful when workers have {\em heterogeneous} local data.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
Efficient Signature-Free Validated Agreement
Authors:
Pierre Civit,
Muhammad Ayaz Dzulfikar,
Seth Gilbert,
Rachid Guerraoui,
Jovan Komatovic,
Manuel Vidigueira,
Igor Zablotchi
Abstract:
Byzantine agreement enables n processes to agree on a common L-bit value, despite up to t > 0 arbitrary failures. A long line of work has been dedicated to improving the bit complexity of Byzantine agreement in synchrony. This has culminated in COOL, an error-free (deterministically secure against a computationally unbounded adversary) solution that achieves O(nL + n^2 logn) worst-case bit complex…
▽ More
Byzantine agreement enables n processes to agree on a common L-bit value, despite up to t > 0 arbitrary failures. A long line of work has been dedicated to improving the bit complexity of Byzantine agreement in synchrony. This has culminated in COOL, an error-free (deterministically secure against a computationally unbounded adversary) solution that achieves O(nL + n^2 logn) worst-case bit complexity (which is optimal for L > n logn according to the Dolev-Reischuk lower bound). COOL satisfies strong validity: if all correct processes propose the same value, only that value can be decided.
Strong validity is, however, not appropriate for today's state machine replication (SMR) and blockchain protocols. These systems value progress and require a decided value to always be valid, excluding default decisions (such as EMPTY) even in cases where there is no unanimity a priori. Validated Byzantine agreement satisfies this property (called external validity). Yet, the best error-free (or even signature-free) validated agreement solutions achieve only O(n^2L) bit complexity, a far cry from the Omega(nL + n^2) Dolev-Reishcuk lower bound. In this paper, we present two new synchronous algorithms for validated Byzantine agreement, HashExt and ErrorFreeExt, with different trade-offs. Both algorithms are (1) signature-free, (2) optimally resilient (tolerate up to t < n / 3 failures), and (3) early-stop** (terminate in O(f+1) rounds, where f <= t is the actual number of failures). On the one hand, HashExt uses only hashes and achieves O(nL + n^3 kappa) bit complexity, which is optimal for L > n^2 kappa (where kappa is the size of a hash). On the other hand, ErrorFreeExt is error-free, using no cryptography whatsoever, and achieves O( (nL + n^2) logn ) bit complexity, which is near-optimal for any L.
△ Less
Submitted 17 May, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
Byzantine-Robust Federated Learning: Impact of Client Subsampling and Local Updates
Authors:
Youssef Allouah,
Sadegh Farhadkhani,
Rachid GuerraouI,
Nirupam Gupta,
Rafael Pinot,
Geovani Rizk,
Sasha Voitovych
Abstract:
The possibility of adversarial (a.k.a., {\em Byzantine}) clients makes federated learning (FL) prone to arbitrary manipulation. The natural approach to robustify FL against adversarial clients is to replace the simple averaging operation at the server in the standard $\mathsf{FedAvg}$ algorithm by a \emph{robust averaging rule}. While a significant amount of work has been devoted to studying the c…
▽ More
The possibility of adversarial (a.k.a., {\em Byzantine}) clients makes federated learning (FL) prone to arbitrary manipulation. The natural approach to robustify FL against adversarial clients is to replace the simple averaging operation at the server in the standard $\mathsf{FedAvg}$ algorithm by a \emph{robust averaging rule}. While a significant amount of work has been devoted to studying the convergence of federated {\em robust averaging} (which we denote by $\mathsf{FedRo}$), prior work has largely ignored the impact of {\em client subsampling} and {\em local steps}, two fundamental FL characteristics. While client subsampling increases the effective fraction of Byzantine clients, local steps increase the drift between the local updates computed by honest (i.e., non-Byzantine) clients. Consequently, a careless deployment of $\mathsf{FedRo}$ could yield poor performance. We validate this observation by presenting an in-depth analysis of $\mathsf{FedRo}$ tightly analyzing the impact of client subsampling and local steps. Specifically, we present a sufficient condition on client subsampling for nearly-optimal convergence of $\mathsf{FedRo}$ (for smooth non-convex loss). Also, we show that the rate of improvement in learning accuracy {\em diminishes} with respect to the number of clients subsampled, as soon as the sample size exceeds a threshold value. Interestingly, we also observe that under a careful choice of step-sizes, the learning error due to Byzantine clients decreases with the number of local steps. We validate our theory by experiments on the FEMNIST and CIFAR-$10$ image classification tasks.
△ Less
Submitted 10 June, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Partial Synchrony for Free? New Upper Bounds for Byzantine Agreement
Authors:
Pierre Civit,
Muhammad Ayaz Dzulfikar,
Seth Gilbert,
Rachid Guerraoui,
Jovan Komatovic,
Manuel Vidigueira,
Igor Zablotchi
Abstract:
Byzantine agreement allows n processes to decide on a common value, in spite of arbitrary failures. The seminal Dolev-Reischuk bound states that any deterministic solution to Byzantine agreement exchanges Omega(n^2) bits. In synchronous networks, solutions with optimal O(n^2) bit complexity, optimal fault tolerance, and no cryptography have been established for over three decades. However, these s…
▽ More
Byzantine agreement allows n processes to decide on a common value, in spite of arbitrary failures. The seminal Dolev-Reischuk bound states that any deterministic solution to Byzantine agreement exchanges Omega(n^2) bits. In synchronous networks, solutions with optimal O(n^2) bit complexity, optimal fault tolerance, and no cryptography have been established for over three decades. However, these solutions lack robustness under adverse network conditions. Therefore, research has increasingly focused on Byzantine agreement for partially synchronous networks. Numerous solutions have been proposed for the partially synchronous setting. However, these solutions are notoriously hard to prove correct, and the most efficient cryptography-free algorithms still require O(n^3) exchanged bits in the worst case. In this paper, we introduce Oper, the first generic transformation of deterministic Byzantine agreement algorithms from synchrony to partial synchrony. Oper requires no cryptography, is optimally resilient (n >= 3t+1, where t is the maximum number of failures), and preserves the worst-case per-process bit complexity of the transformed synchronous algorithm. Leveraging Oper, we present the first partially synchronous Byzantine agreement algorithm that (1) achieves optimal O(n^2) bit complexity, (2) requires no cryptography, and (3) is optimally resilient (n >= 3t+1), thus showing that the Dolev-Reischuk bound is tight even in partial synchrony. Moreover, we adapt Oper for long values and obtain several new partially synchronous algorithms with improved complexity and weaker (or completely absent) cryptographic assumptions.
△ Less
Submitted 5 April, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
Robustness, Efficiency, or Privacy: Pick Two in Machine Learning
Authors:
Youssef Allouah,
Rachid Guerraoui,
John Stephan
Abstract:
The success of machine learning (ML) applications relies on vast datasets and distributed architectures which, as they grow, present major challenges. In real-world scenarios, where data often contains sensitive information, issues like data poisoning and hardware failures are common. Ensuring privacy and robustness is vital for the broad adoption of ML in public life. This paper examines the cost…
▽ More
The success of machine learning (ML) applications relies on vast datasets and distributed architectures which, as they grow, present major challenges. In real-world scenarios, where data often contains sensitive information, issues like data poisoning and hardware failures are common. Ensuring privacy and robustness is vital for the broad adoption of ML in public life. This paper examines the costs associated with achieving these objectives in distributed ML architectures, from both theoretical and empirical perspectives. We overview the meanings of privacy and robustness in distributed ML, and clarify how they can be achieved efficiently in isolation. However, we contend that the integration of these two objectives entails a notable compromise in computational efficiency. In short, traditional noise injection hurts accuracy by concealing poisoned inputs, while cryptographic methods clash with poisoning defenses due to their non-linear nature. However, we outline future research directions aimed at reconciling this compromise with efficiency by considering weaker threat models.
△ Less
Submitted 11 March, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
All Byzantine Agreement Problems are Expensive
Authors:
Pierre Civit,
Seth Gilbert,
Rachid Guerraoui,
Jovan Komatovic,
Anton Paramonov,
Manuel Vidigueira
Abstract:
Byzantine agreement, arguably the most fundamental problem in distributed computing, operates among n processes, out of which t < n can exhibit arbitrary failures. The problem states that all correct (non-faulty) processes must eventually decide (termination) the same value (agreement) from a set of admissible values defined by the proposals of the processes (validity). Depending on the exact vers…
▽ More
Byzantine agreement, arguably the most fundamental problem in distributed computing, operates among n processes, out of which t < n can exhibit arbitrary failures. The problem states that all correct (non-faulty) processes must eventually decide (termination) the same value (agreement) from a set of admissible values defined by the proposals of the processes (validity). Depending on the exact version of the validity property, Byzantine agreement comes in different forms, from Byzantine broadcast to strong and weak consensus, to modern variants of the problem introduced in today's blockchain systems. Regardless of the specific flavor of the agreement problem, its communication cost is a fundamental metric whose improvement has been the focus of decades of research. The Dolev-Reischuk bound, one of the most celebrated results in distributed computing, proved 40 years ago that, at least for Byzantine broadcast, no deterministic solution can do better than Omega(t^2) exchanged messages in the worst case. Since then, it remained unknown whether the quadratic lower bound extends to seemingly weaker variants of Byzantine agreement. This paper answers the question in the affirmative, closing this long-standing open problem. Namely, we prove that any non-trivial agreement problem requires Omega(t^2) messages to be exchanged in the worst case. To prove the general lower bound, we determine the weakest Byzantine agreement problem and show, via a novel indistinguishability argument, that it incurs Omega(t^2) exchanged messages.
△ Less
Submitted 15 November, 2023; v1 submitted 14 November, 2023;
originally announced November 2023.
-
Epidemic Learning: Boosting Decentralized Learning with Randomized Communication
Authors:
Martijn de Vos,
Sadegh Farhadkhani,
Rachid Guerraoui,
Anne-Marie Kermarrec,
Rafael Pires,
Rishi Sharma
Abstract:
We present Epidemic Learning (EL), a simple yet powerful decentralized learning (DL) algorithm that leverages changing communication topologies to achieve faster model convergence compared to conventional DL approaches. At each round of EL, each node sends its model updates to a random sample of $s$ other nodes (in a system of $n$ nodes). We provide an extensive theoretical analysis of EL, demonst…
▽ More
We present Epidemic Learning (EL), a simple yet powerful decentralized learning (DL) algorithm that leverages changing communication topologies to achieve faster model convergence compared to conventional DL approaches. At each round of EL, each node sends its model updates to a random sample of $s$ other nodes (in a system of $n$ nodes). We provide an extensive theoretical analysis of EL, demonstrating that its changing topology culminates in superior convergence properties compared to the state-of-the-art (static and dynamic) topologies. Considering smooth non-convex loss functions, the number of transient iterations for EL, i.e., the rounds required to achieve asymptotic linear speedup, is in $O(n^3/s^2)$ which outperforms the best-known bound $O(n^3)$ by a factor of $s^2$, indicating the benefit of randomized communication for DL. We empirically evaluate EL in a 96-node network and compare its performance with state-of-the-art DL approaches. Our results illustrate that EL converges up to $ 1.7\times$ quicker than baseline DL algorithms and attains $2.2 $\% higher accuracy for the same communication volume.
△ Less
Submitted 27 October, 2023; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Robust Distributed Learning: Tight Error Bounds and Breakdown Point under Data Heterogeneity
Authors:
Youssef Allouah,
Rachid Guerraoui,
Nirupam Gupta,
Rafaël Pinot,
Geovani Rizk
Abstract:
The theory underlying robust distributed learning algorithms, designed to resist adversarial machines, matches empirical observations when data is homogeneous. Under data heterogeneity however, which is the norm in practical scenarios, established lower bounds on the learning error are essentially vacuous and greatly mismatch empirical observations. This is because the heterogeneity model consider…
▽ More
The theory underlying robust distributed learning algorithms, designed to resist adversarial machines, matches empirical observations when data is homogeneous. Under data heterogeneity however, which is the norm in practical scenarios, established lower bounds on the learning error are essentially vacuous and greatly mismatch empirical observations. This is because the heterogeneity model considered is too restrictive and does not cover basic learning tasks such as least-squares regression. We consider in this paper a more realistic heterogeneity model, namely (G,B)-gradient dissimilarity, and show that it covers a larger class of learning problems than existing theory. Notably, we show that the breakdown point under heterogeneity is lower than the classical fraction 1/2. We also prove a new lower bound on the learning error of any distributed learning algorithm. We derive a matching upper bound for a robust variant of distributed gradient descent, and empirically show that our analysis reduces the gap between theory and practice.
△ Less
Submitted 28 October, 2023; v1 submitted 24 September, 2023;
originally announced September 2023.
-
SABLE: Secure And Byzantine robust LEarning
Authors:
Antoine Choffrut,
Rachid Guerraoui,
Rafael Pinot,
Renaud Sirdey,
John Stephan,
Martin Zuber
Abstract:
Due to the widespread availability of data, machine learning (ML) algorithms are increasingly being implemented in distributed topologies, wherein various nodes collaborate to train ML models via the coordination of a central server. However, distributed learning approaches face significant vulnerabilities, primarily stemming from two potential threats. Firstly, the presence of Byzantine nodes pos…
▽ More
Due to the widespread availability of data, machine learning (ML) algorithms are increasingly being implemented in distributed topologies, wherein various nodes collaborate to train ML models via the coordination of a central server. However, distributed learning approaches face significant vulnerabilities, primarily stemming from two potential threats. Firstly, the presence of Byzantine nodes poses a risk of corrupting the learning process by transmitting inaccurate information to the server. Secondly, a curious server may compromise the privacy of individual nodes, sometimes reconstructing the entirety of the nodes' data. Homomorphic encryption (HE) has emerged as a leading security measure to preserve privacy in distributed learning under non-Byzantine scenarios. However, the extensive computational demands of HE, particularly for high-dimensional ML models, have deterred attempts to design purely homomorphic operators for non-linear robust aggregators. This paper introduces SABLE, the first homomorphic and Byzantine robust distributed learning algorithm. SABLE leverages HTS, a novel and efficient homomorphic operator implementing the prominent coordinate-wise trimmed mean robust aggregator. Designing HTS enables us to implement HMED, a novel homomorphic median aggregator. Extensive experiments on standard ML tasks demonstrate that SABLE achieves practical execution times while maintaining an ML accuracy comparable to its non-private counterpart.
△ Less
Submitted 14 December, 2023; v1 submitted 11 September, 2023;
originally announced September 2023.
-
Strong Byzantine Agreement with Adaptive Word Complexity
Authors:
Pierre Civit,
Seth Gilbert,
Rachid Guerraoui,
Jovan Komatovic,
Manuel Vidigueira
Abstract:
The strong Byzantine agreement (SBA) problem is defined among n processes, out of which t < n can be faulty and behave arbitrarily. SBA allows correct (non-faulty) processes to agree on a common value. Moreover, if all correct processes have proposed the same value, only that value can be agreed upon. It has been known for a long time that any solution to the SBA problem incurs quadratic worst-cas…
▽ More
The strong Byzantine agreement (SBA) problem is defined among n processes, out of which t < n can be faulty and behave arbitrarily. SBA allows correct (non-faulty) processes to agree on a common value. Moreover, if all correct processes have proposed the same value, only that value can be agreed upon. It has been known for a long time that any solution to the SBA problem incurs quadratic worst-case word complexity; additionally, the bound was known to be tight. However, no existing protocol achieves adaptive word complexity, where the number of exchanged words depends on the actual number of faults, and not on the upper bound. Therefore, it is still unknown whether SBA with adaptive word complexity exists. This paper answers the question in the affirmative. Namely, we introduce STRONG, a synchronous protocol that solves SBA among n = (2 + Omega(1))t + 1 processes and achieves adaptive word complexity. We show that the fundamental challenge of adaptive SBA lies in efficiently solving certification, the problem of obtaining a constant-sized, locally-verifiable proof that a value can safely be decided.
△ Less
Submitted 7 August, 2023;
originally announced August 2023.
-
On the Inherent Anonymity of Gossi**
Authors:
Rachid Guerraoui,
Anne-Marie Kermarrec,
Anastasiia Kucherenko,
Rafael Pinot,
Sasha Voitovych
Abstract:
Detecting the source of a gossip is a critical issue, related to identifying patient zero in an epidemic, or the origin of a rumor in a social network. Although it is widely acknowledged that random and local gossip communications make source identification difficult, there exists no general quantification of the level of anonymity provided to the source. This paper presents a principled method ba…
▽ More
Detecting the source of a gossip is a critical issue, related to identifying patient zero in an epidemic, or the origin of a rumor in a social network. Although it is widely acknowledged that random and local gossip communications make source identification difficult, there exists no general quantification of the level of anonymity provided to the source. This paper presents a principled method based on $\varepsilon$-differential privacy to analyze the inherent source anonymity of gossi** for a large class of graphs. First, we quantify the fundamental limit of source anonymity any gossip protocol can guarantee in an arbitrary communication graph. In particular, our result indicates that when the graph has poor connectivity, no gossip protocol can guarantee any meaningful level of differential privacy. This prompted us to further analyze graphs with controlled connectivity. We prove on these graphs that a large class of gossip protocols, namely cobra walks, offers tangible differential privacy guarantees to the source. In doing so, we introduce an original proof technique based on the reduction of a gossip protocol to what we call a random walk with probabilistic die out. This proof technique is of independent interest to the gossip community and readily extends to other protocols inherited from the security community, such as the Dandelion protocol. Interestingly, our tight analysis precisely captures the trade-off between dissemination time of a gossip protocol and its source anonymity.
△ Less
Submitted 4 August, 2023;
originally announced August 2023.
-
Evolutionary Algorithms in the Light of SGD: Limit Equivalence, Minima Flatness, and Transfer Learning
Authors:
Andrei Kucharavy,
Rachid Guerraoui,
Ljiljana Dolamic
Abstract:
Whenever applicable, the Stochastic Gradient Descent (SGD) has shown itself to be unreasonably effective. Instead of underperforming and getting trapped in local minima due to the batch noise, SGD leverages it to learn to generalize better and find minima that are good enough for the entire dataset. This led to numerous theoretical and experimental investigations, especially in the context of Arti…
▽ More
Whenever applicable, the Stochastic Gradient Descent (SGD) has shown itself to be unreasonably effective. Instead of underperforming and getting trapped in local minima due to the batch noise, SGD leverages it to learn to generalize better and find minima that are good enough for the entire dataset. This led to numerous theoretical and experimental investigations, especially in the context of Artificial Neural Networks (ANNs), leading to better machine learning algorithms. However, SGD is not applicable in a non-differentiable setting, leaving all that prior research off the table.
In this paper, we show that a class of evolutionary algorithms (EAs) inspired by the Gillespie-Orr Mutational Landscapes model for natural evolution is formally equivalent to SGD in certain settings and, in practice, is well adapted to large ANNs. We refer to such EAs as Gillespie-Orr EA class (GO-EAs) and empirically show how an insight transfer from SGD can work for them. We then show that for ANNs trained to near-optimality or in the transfer learning setting, the equivalence also allows transferring the insights from the Mutational Landscapes model to SGD.
We then leverage this equivalence to experimentally show how SGD and GO-EAs can provide mutual insight through examples of minima flatness, transfer learning, and mixing of individuals in EAs applied to large models.
△ Less
Submitted 20 May, 2023;
originally announced June 2023.
-
Every Bit Counts in Consensus
Authors:
Pierre Civit,
Seth Gilbert,
Rachid Guerraoui,
Jovan Komatovic,
Matteo Monti,
Manuel Vidigueira
Abstract:
Consensus enables n processes to agree on a common valid L-bit value, despite t < n/3 processes being faulty and acting arbitrarily. A long line of work has been dedicated to improving the worst-case communication complexity of consensus in partial synchrony. This has recently culminated in the worst-case word complexity of O(n^2). However, the worst-case bit complexity of the best solution is sti…
▽ More
Consensus enables n processes to agree on a common valid L-bit value, despite t < n/3 processes being faulty and acting arbitrarily. A long line of work has been dedicated to improving the worst-case communication complexity of consensus in partial synchrony. This has recently culminated in the worst-case word complexity of O(n^2). However, the worst-case bit complexity of the best solution is still O(n^2 L + n^2 kappa) (where kappa is the security parameter), far from the Ω(n L + n^2) lower bound. The gap is significant given the practical use of consensus primitives, where values typically consist of batches of large size (L > n).
This paper shows how to narrow the aforementioned gap while achieving optimal linear latency. Namely, we present a new algorithm, DARE (Disperse, Agree, REtrieve), that improves upon the O(n^2 L) term via a novel dispersal primitive. DARE achieves O(n^{1.5} L + n^{2.5} kappa) bit complexity, an effective sqrt{n}-factor improvement over the state-of-the-art (when L > n kappa). Moreover, we show that employing heavier cryptographic primitives, namely STARK proofs, allows us to devise DARE-Stark, a version of DARE which achieves the near-optimal bit complexity of O(n L + n^2 poly(kappa)). Both DARE and DARE-Stark achieve optimal O(n) latency.
△ Less
Submitted 7 August, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Byzantine-Resilient Learning Beyond Gradients: Distributing Evolutionary Search
Authors:
Andrei Kucharavy,
Matteo Monti,
Rachid Guerraoui,
Ljiljana Dolamic
Abstract:
Modern machine learning (ML) models are capable of impressive performances. However, their prowess is not due only to the improvements in their architecture and training algorithms but also to a drastic increase in computational power used to train them.
Such a drastic increase led to a growing interest in distributed ML, which in turn made worker failures and adversarial attacks an increasingly…
▽ More
Modern machine learning (ML) models are capable of impressive performances. However, their prowess is not due only to the improvements in their architecture and training algorithms but also to a drastic increase in computational power used to train them.
Such a drastic increase led to a growing interest in distributed ML, which in turn made worker failures and adversarial attacks an increasingly pressing concern. While distributed byzantine resilient algorithms have been proposed in a differentiable setting, none exist in a gradient-free setting.
The goal of this work is to address this shortcoming. For that, we introduce a more general definition of byzantine-resilience in ML - the \textit{model-consensus}, that extends the definition of the classical distributed consensus. We then leverage this definition to show that a general class of gradient-free ML algorithms - ($1,λ$)-Evolutionary Search - can be combined with classical distributed consensus algorithms to generate gradient-free byzantine-resilient distributed learning algorithms. We provide proofs and pseudo-code for two specific cases - the Total Order Broadcast and proof-of-work leader election.
△ Less
Submitted 20 April, 2023;
originally announced April 2023.
-
Stochastic Parrots Looking for Stochastic Parrots: LLMs are Easy to Fine-Tune and Hard to Detect with other LLMs
Authors:
Da Silva Gameiro Henrique,
Andrei Kucharavy,
Rachid Guerraoui
Abstract:
The self-attention revolution allowed generative language models to scale and achieve increasingly impressive abilities. Such models - commonly referred to as Large Language Models (LLMs) - have recently gained prominence with the general public, thanks to conversational fine-tuning, putting their behavior in line with public expectations regarding AI. This prominence amplified prior concerns rega…
▽ More
The self-attention revolution allowed generative language models to scale and achieve increasingly impressive abilities. Such models - commonly referred to as Large Language Models (LLMs) - have recently gained prominence with the general public, thanks to conversational fine-tuning, putting their behavior in line with public expectations regarding AI. This prominence amplified prior concerns regarding the misuse of LLMs and led to the emergence of numerous tools to detect LLMs in the wild.
Unfortunately, most such tools are critically flawed. While major publications in the LLM detectability field suggested that LLMs were easy to detect with fine-tuned autoencoders, the limitations of their results are easy to overlook. Specifically, they assumed publicly available generative models without fine-tunes or non-trivial prompts. While the importance of these assumptions has been demonstrated, until now, it remained unclear how well such detection could be countered.
Here, we show that an attacker with access to such detectors' reference human texts and output not only evades detection but can fully frustrate the detector training - with a reasonable budget and all its outputs labeled as such. Achieving it required combining common "reinforcement from critic" loss function modification and AdamW optimizer, which led to surprisingly good fine-tuning generalization. Finally, we warn against the temptation to transpose the conclusions obtained in RNN-driven text GANs to LLMs due to their better representative ability.
These results have critical implications for the detection and prevention of malicious use of generative language models, and we hope they will aid the designers of generative models and detectors.
△ Less
Submitted 18 April, 2023;
originally announced April 2023.
-
Chop Chop: Byzantine Atomic Broadcast to the Network Limit
Authors:
Martina Camaioni,
Rachid Guerraoui,
Matteo Monti,
Pierre-Louis Roman,
Manuel Vidigueira,
Gauthier Voron
Abstract:
At the heart of state machine replication, the celebrated technique enabling decentralized and secure universal computation, lies Atomic Broadcast, a fundamental communication primitive that orders, authenticates, and deduplicates messages. This paper presents Chop Chop, a Byzantine Atomic Broadcast system that amortizes the cost of ordering, authenticating and deduplicating messages, achieving "l…
▽ More
At the heart of state machine replication, the celebrated technique enabling decentralized and secure universal computation, lies Atomic Broadcast, a fundamental communication primitive that orders, authenticates, and deduplicates messages. This paper presents Chop Chop, a Byzantine Atomic Broadcast system that amortizes the cost of ordering, authenticating and deduplicating messages, achieving "line rate" (i.e., closely matching the complexity of a protocol that does not ensure any ordering, authentication or Byzantine resilience) even when processing messages as small as 8 bytes. Chop Chop attains this performance by means of a new form of batching we call distillation. A distilled batch is a set of messages that are fast to authenticate and deduplicate, as well as order. Batches are distilled using a novel interactive mechanism involving brokers, an untrusted layer of facilitating processes between clients and servers. In a geo-distributed deployment of 64 medium-sized servers, with clients situated cross-cloud, Chop Chop processes 43,600,000 messages per second with an average latency of 3.6 seconds. Under the same conditions, state-of-the-art alternatives offer two orders of magnitude less throughput for the same latency. We showcase three simple Chop Chop applications: a Payment system, an Auction house and a "Pixel war" game, respectively achieving 32, 2.3 and 35 million operations per second.
△ Less
Submitted 14 April, 2023;
originally announced April 2023.
-
On the Privacy-Robustness-Utility Trilemma in Distributed Learning
Authors:
Youssef Allouah,
Rachid Guerraoui,
Nirupam Gupta,
Rafael Pinot,
John Stephan
Abstract:
The ubiquity of distributed machine learning (ML) in sensitive public domain applications calls for algorithms that protect data privacy, while being robust to faults and adversarial behaviors. Although privacy and robustness have been extensively studied independently in distributed ML, their synthesis remains poorly understood. We present the first tight analysis of the error incurred by any alg…
▽ More
The ubiquity of distributed machine learning (ML) in sensitive public domain applications calls for algorithms that protect data privacy, while being robust to faults and adversarial behaviors. Although privacy and robustness have been extensively studied independently in distributed ML, their synthesis remains poorly understood. We present the first tight analysis of the error incurred by any algorithm ensuring robustness against a fraction of adversarial machines, as well as differential privacy (DP) for honest machines' data against any other curious entity. Our analysis exhibits a fundamental trade-off between privacy, robustness, and utility. To prove our lower bound, we consider the case of mean estimation, subject to distributed DP and robustness constraints, and devise reductions to centralized estimation of one-way marginals. We prove our matching upper bound by presenting a new distributed ML algorithm using a high-dimensional robust aggregation rule. The latter amortizes the dependence on the dimension in the error (caused by adversarial workers and DP), while being agnostic to the statistical properties of the data.
△ Less
Submitted 29 May, 2023; v1 submitted 9 February, 2023;
originally announced February 2023.
-
Fixing by Mixing: A Recipe for Optimal Byzantine ML under Heterogeneity
Authors:
Youssef Allouah,
Sadegh Farhadkhani,
Rachid Guerraoui,
Nirupam Gupta,
Rafael Pinot,
John Stephan
Abstract:
Byzantine machine learning (ML) aims to ensure the resilience of distributed learning algorithms to misbehaving (or Byzantine) machines. Although this problem received significant attention, prior works often assume the data held by the machines to be homogeneous, which is seldom true in practical settings. Data heterogeneity makes Byzantine ML considerably more challenging, since a Byzantine mach…
▽ More
Byzantine machine learning (ML) aims to ensure the resilience of distributed learning algorithms to misbehaving (or Byzantine) machines. Although this problem received significant attention, prior works often assume the data held by the machines to be homogeneous, which is seldom true in practical settings. Data heterogeneity makes Byzantine ML considerably more challenging, since a Byzantine machine can hardly be distinguished from a non-Byzantine outlier. A few solutions have been proposed to tackle this issue, but these provide suboptimal probabilistic guarantees and fare poorly in practice. This paper closes the theoretical gap, achieving optimality and inducing good empirical results. In fact, we show how to automatically adapt existing solutions for (homogeneous) Byzantine ML to the heterogeneous setting through a powerful mechanism, we call nearest neighbor mixing (NNM), which boosts any standard robust distributed gradient descent variant to yield optimal Byzantine resilience under heterogeneity. We obtain similar guarantees (in expectation) by plugging NNM in the distributed stochastic heavy ball method, a practical substitute to distributed gradient descent. We obtain empirical results that significantly outperform state-of-the-art Byzantine ML solutions.
△ Less
Submitted 3 February, 2023;
originally announced February 2023.
-
On the Validity of Consensus
Authors:
Pierre Civit,
Seth Gilbert,
Rachid Guerraoui,
Jovan Komatovic,
Manuel Vidigueira
Abstract:
The Byzantine consensus problem involves $n$ processes, out of which t < n could be faulty and behave arbitrarily. Three properties characterize consensus: (1) termination, requiring correct (non-faulty) processes to eventually reach a decision, (2) agreement, preventing them from deciding different values, and (3) validity, precluding ``unreasonable'' decisions. But, what is a reasonable decision…
▽ More
The Byzantine consensus problem involves $n$ processes, out of which t < n could be faulty and behave arbitrarily. Three properties characterize consensus: (1) termination, requiring correct (non-faulty) processes to eventually reach a decision, (2) agreement, preventing them from deciding different values, and (3) validity, precluding ``unreasonable'' decisions. But, what is a reasonable decision? Strong validity, a classical property, stipulates that, if all correct processes propose the same value, only that value can be decided. Weak validity, another established property, stipulates that, if all processes are correct and they propose the same value, that value must be decided. The space of possible validity properties is vast. However, their impact on consensus remains unclear.
This paper addresses the question of which validity properties allow Byzantine consensus to be solvable with partial synchrony, and at what cost. First, we determine necessary and sufficient conditions for a validity property to make the consensus problem solvable; we say that such validity properties are solvable. Notably, we prove that, if n <= 3t, all solvable validity properties are trivial (there exists an always-admissible decision). Furthermore, we show that, with any non-trivial (and solvable) validity property, consensus requires Omega(t^2) messages. This extends the seminal Dolev-Reischuk bound, originally proven for strong validity, to all non-trivial validity properties. Lastly, we give a general Byzantine consensus algorithm, we call Universal, for any solvable (and non-trivial) validity property. Importantly, Universal incurs O(n^2) message complexity. Thus, together with our lower bound, Universal implies a fundamental result in partial synchrony: with t \in Omega(n), the message complexity of all (non-trivial) consensus variants is Theta(n^2).
△ Less
Submitted 25 June, 2023; v1 submitted 12 January, 2023;
originally announced January 2023.
-
uBFT: Microsecond-scale BFT using Disaggregated Memory [Extended Version]
Authors:
Marcos K. Aguilera,
Naama Ben-David,
Rachid Guerraoui,
Antoine Murat,
Athanasios Xygkis,
Igor Zablotchi
Abstract:
We propose uBFT, the first State Machine Replication (SMR) system to achieve microsecond-scale latency in data centers, while using only $2f{+}1$ replicas to tolerate $f$ Byzantine failures. The Byzantine Fault Tolerance (BFT) provided by uBFT is essential as pure crashes appear to be a mere illusion with real-life systems reportedly failing in many unexpected ways. uBFT relies on a small non-tail…
▽ More
We propose uBFT, the first State Machine Replication (SMR) system to achieve microsecond-scale latency in data centers, while using only $2f{+}1$ replicas to tolerate $f$ Byzantine failures. The Byzantine Fault Tolerance (BFT) provided by uBFT is essential as pure crashes appear to be a mere illusion with real-life systems reportedly failing in many unexpected ways. uBFT relies on a small non-tailored trusted computing base -- disaggregated memory -- and consumes a practically bounded amount of memory. uBFT is based on a novel abstraction called Consistent Tail Broadcast, which we use to prevent equivocation while bounding memory. We implement uBFT using RDMA-based disaggregated memory and obtain an end-to-end latency of as little as 10us. This is at least 50$\times$ faster than MinBFT , a state of the art $2f{+}1$ BFT SMR based on Intel's SGX. We use uBFT to replicate two KV-stores (Memcached and Redis), as well as a financial order matching engine (Liquibook). These applications have low latency (up to 20us) and become Byzantine tolerant with as little as 10us more. The price for uBFT is a small amount of reliable disaggregated memory (less than 1 MiB), which in our prototype consists of a small number of memory servers connected through RDMA and replicated for fault tolerance.
△ Less
Submitted 16 March, 2023; v1 submitted 31 October, 2022;
originally announced October 2022.
-
SurferMonkey: A Decentralized Anonymous Blockchain Intercommunication System via Zero Knowledge Proofs
Authors:
Miguel Díaz Montiel,
Rachid Guerraoui,
Pierre-Louis Roman
Abstract:
Blockchain intercommunication systems enable the exchanges of messages between blockchains. This interoperability promotes innovation, unlocks liquidity and access to assets. However, blockchains are isolated systems that originally were not designed for interoperability. This makes cross-chain communication, or bridges for short, insecure by nature. More precisely, cross-chain systems face securi…
▽ More
Blockchain intercommunication systems enable the exchanges of messages between blockchains. This interoperability promotes innovation, unlocks liquidity and access to assets. However, blockchains are isolated systems that originally were not designed for interoperability. This makes cross-chain communication, or bridges for short, insecure by nature. More precisely, cross-chain systems face security challenges in terms of selfish rational players such as maximal extractable value (MEV) and censorship.
We propose to solve these challenges using zero knowledge proofs (ZKPs) for cross-chain communication. Securing cross-chain communication is remarkably more complex than securing single-chain events as such a system must preserve user security against both on- and off-chain analysis.
To achieve this goal, we propose the following pair of contributions: the DACT protocol and the SurferMonkey infrastructure that supports the DACT protocol. The DACT protocol is a global solution for the anonymity and security challenges of agnostic blockchain intercommunication. DACT breaks on- and off-chain analysis thanks to the use of ZKPs. SurferMonkey is a decentralized infrastructure that implements DACT in practice. Since SurferMonkey works at the blockchain application layer, any decentralized application (dApp) can use SurferMonkey to send any type of message to a dApp on another blockchain. With SurferMonkey, users can neither be censored nor be exposed to MEV. By applying decentralized proactive security, we obtain resilience against selfish rational players, and raise the security bar against cyberattacks. We have implemented a proof of concept (PoC) of SurferMonkey by reverse engineering Tornado Cash and by applying IDEN3 ZKP circuits. SurferMonkey enables new usecases, ranging from anonymous voting and gaming, to a new phase of anonymous decentralized finance (aDeFi).
△ Less
Submitted 24 October, 2022;
originally announced October 2022.
-
Accelerating Transfer Learning with Near-Data Computation on Cloud Object Stores
Authors:
Arsany Guirguis,
Diana Petrescu,
Florin Dinu,
Do Le Quoc,
Javier Picorel,
Rachid Guerraoui
Abstract:
Near-data computation techniques have been successfully deployed to mitigate the cloud network bottleneck between the storage and compute tiers. At Huawei, we are currently looking to get more value from these techniques by broadening their applicability. Machine learning (ML) applications are an appealing and timely target. This paper describes our experience applying near-data computation techni…
▽ More
Near-data computation techniques have been successfully deployed to mitigate the cloud network bottleneck between the storage and compute tiers. At Huawei, we are currently looking to get more value from these techniques by broadening their applicability. Machine learning (ML) applications are an appealing and timely target. This paper describes our experience applying near-data computation techniques to transfer learning (TL), a widely popular ML technique, in the context of disaggregated cloud object stores. Our techniques benefit both cloud providers and users. They improve our operational efficiency while providing users the performance improvements they demand from us. The main practical challenge to consider is that the storage-side computational resources are limited. Our approach is to split the TL deep neural network (DNN) during the feature extraction phase, before the training phase. This reduces the network transfers to the compute tier and further decouples the batch size of feature extraction from the training batch size. This facilitates our second technique, storage-side batch adaptation, which enables increased concurrency in the storage tier while avoiding out-of-memory errors. Guided by these insights, we present HAPI, our processing system for TL that spans the compute and storage tiers while remaining transparent to the user. Our evaluation with several state-of-the-art DNNs, such as ResNet, VGG, and Transformer, shows up to 11x improvement in application runtime and up to 8.3x reduction in the data transferred from the storage to the compute tier compared to running the computation entirely in the compute tier.
△ Less
Submitted 9 January, 2023; v1 submitted 16 October, 2022;
originally announced October 2022.
-
On the Impossible Safety of Large AI Models
Authors:
El-Mahdi El-Mhamdi,
Sadegh Farhadkhani,
Rachid Guerraoui,
Nirupam Gupta,
Lê-Nguyên Hoang,
Rafael Pinot,
Sébastien Rouault,
John Stephan
Abstract:
Large AI Models (LAIMs), of which large language models are the most prominent recent example, showcase some impressive performance. However they have been empirically found to pose serious security issues. This paper systematizes our knowledge about the fundamental impossibility of building arbitrarily accurate and secure machine learning models. More precisely, we identify key challenging featur…
▽ More
Large AI Models (LAIMs), of which large language models are the most prominent recent example, showcase some impressive performance. However they have been empirically found to pose serious security issues. This paper systematizes our knowledge about the fundamental impossibility of building arbitrarily accurate and secure machine learning models. More precisely, we identify key challenging features of many of today's machine learning settings. Namely, high accuracy seems to require memorizing large training datasets, which are often user-generated and highly heterogeneous, with both sensitive information and fake users. We then survey statistical lower bounds that, we argue, constitute a compelling case against the possibility of designing high-accuracy LAIMs with strong security guarantees.
△ Less
Submitted 9 May, 2023; v1 submitted 30 September, 2022;
originally announced September 2022.
-
Oracular Byzantine Reliable Broadcast [Extended Version]
Authors:
Martina Camaioni,
Rachid Guerraoui,
Matteo Monti,
Manuel Vidigueira
Abstract:
Byzantine Reliable Broadcast (BRB) is a fundamental distributed computing primitive, with applications ranging from notifications to asynchronous payment systems. Motivated by practical consideration, we study Client-Server Byzantine Reliable Broadcast (CSB), a multi-shot variant of BRB whose interface is split between broadcasting clients and delivering servers. We present Draft, an optimally res…
▽ More
Byzantine Reliable Broadcast (BRB) is a fundamental distributed computing primitive, with applications ranging from notifications to asynchronous payment systems. Motivated by practical consideration, we study Client-Server Byzantine Reliable Broadcast (CSB), a multi-shot variant of BRB whose interface is split between broadcasting clients and delivering servers. We present Draft, an optimally resilient implementation of CSB. Like most implementations of BRB, Draft guarantees both liveness and safety in an asynchronous environment. Under good conditions, however, Draft achieves unparalleled efficiency. In a moment of synchrony, free from Byzantine misbehaviour, and at the limit of infinitely many broadcasting clients, a Draft server delivers a $b$-bits payload at an asymptotic amortized cost of $0$ signature verifications, and $\log_2(c) + b$ bits exchanged, where $c$ is the number of clients in the system. This is the information-theoretical minimum number of bits required to convey the payload ($b$ bits, assuming it is compressed), along with an identifier for its sender ($\log_2(c)$ bits, necessary to enumerate any set of $c$ elements, and optimal if broadcasting frequencies are uniform or unknown). These two achievements have profound practical implications. Real-world BRB implementations are often bottlenecked either by expensive signature verifications, or by communication overhead. For Draft, instead, the network is the limit: a server can deliver payloads as quickly as it would receive them from an infallible oracle.
△ Less
Submitted 27 September, 2022;
originally announced September 2022.
-
Robust Collaborative Learning with Linear Gradient Overhead
Authors:
Sadegh Farhadkhani,
Rachid Guerraoui,
Nirupam Gupta,
Lê Nguyên Hoang,
Rafael Pinot,
John Stephan
Abstract:
Collaborative learning algorithms, such as distributed SGD (or D-SGD), are prone to faulty machines that may deviate from their prescribed algorithm because of software or hardware bugs, poisoned data or malicious behaviors. While many solutions have been proposed to enhance the robustness of D-SGD to such machines, previous works either resort to strong assumptions (trusted server, homogeneous da…
▽ More
Collaborative learning algorithms, such as distributed SGD (or D-SGD), are prone to faulty machines that may deviate from their prescribed algorithm because of software or hardware bugs, poisoned data or malicious behaviors. While many solutions have been proposed to enhance the robustness of D-SGD to such machines, previous works either resort to strong assumptions (trusted server, homogeneous data, specific noise model) or impose a gradient computational cost that is several orders of magnitude higher than that of D-SGD. We present MoNNA, a new algorithm that (a) is provably robust under standard assumptions and (b) has a gradient computation overhead that is linear in the fraction of faulty machines, which is conjectured to be tight. Essentially, MoNNA uses Polyak's momentum of local gradients for local updates and nearest-neighbor averaging (NNA) for global mixing, respectively. While MoNNA is rather simple to implement, its analysis has been more challenging and relies on two key elements that may be of independent interest. Specifically, we introduce the mixing criterion of $(α, λ)$-reduction to analyze the non-linear mixing of non-faulty machines, and present a way to control the tension between the momentum and the model drifts. We validate our theory by experiments on image classification and make our code available at https://github.com/LPD-EPFL/robust-collaborative-learning.
△ Less
Submitted 3 June, 2023; v1 submitted 22 September, 2022;
originally announced September 2022.
-
Carbon: An Asynchronous Voting-Based Payment System for a Client-Server Architecture
Authors:
Martina Camaioni,
Rachid Guerraoui,
Jovan Komatovic,
Matteo Monti,
Manuel Vidigueira
Abstract:
We present Carbon, an asynchronous payment system. To the best of our knowledge, Carbon is the first asynchronous payment system designed specifically for a client-server architecture. Namely, besides being able to make payments, clients of Carbon are capable of changing the set of running servers using a novel voting mechanism -- asynchronous, balance-based voting.
We present Carbon, an asynchronous payment system. To the best of our knowledge, Carbon is the first asynchronous payment system designed specifically for a client-server architecture. Namely, besides being able to make payments, clients of Carbon are capable of changing the set of running servers using a novel voting mechanism -- asynchronous, balance-based voting.
△ Less
Submitted 30 September, 2022; v1 submitted 20 September, 2022;
originally announced September 2022.
-
Byzantine Consensus is Θ(n^2): The Dolev-Reischuk Bound is Tight even in Partial Synchrony! [Extended Version]
Authors:
Pierre Civit,
Muhammad Ayaz Dzulfikar,
Seth Gilbert,
Vincent Gramoli,
Rachid Guerraoui,
Jovan Komatovic,
Manuel Vidigueira
Abstract:
The Dolev-Reischuk bound says that any deterministic Byzantine consensus protocol has (at least) quadratic communication complexity in the worst case. While it has been shown that the bound is tight in synchronous environments, it is still unknown whether a consensus protocol with quadratic communication complexity can be obtained in partial synchrony. Until now, the most efficient known solutions…
▽ More
The Dolev-Reischuk bound says that any deterministic Byzantine consensus protocol has (at least) quadratic communication complexity in the worst case. While it has been shown that the bound is tight in synchronous environments, it is still unknown whether a consensus protocol with quadratic communication complexity can be obtained in partial synchrony. Until now, the most efficient known solutions for Byzantine consensus in partially synchronous settings had cubic communication complexity (e.g., HotStuff, binary DBFT).
This paper closes the existing gap by introducing SQuad, a partially synchronous Byzantine consensus protocol with quadratic worst-case communication complexity. In addition, SQuad is optimally-resilient and achieves linear worst-case latency complexity. The key technical contribution underlying SQuad lies in the way we solve view synchronization, the problem of bringing all correct processes to the same view with a correct leader for sufficiently long. Concretely, we present RareSync, a view synchronization protocol with quadratic communication complexity and linear latency complexity, which we utilize in order to obtain SQuad.
△ Less
Submitted 6 September, 2022; v1 submitted 19 August, 2022;
originally announced August 2022.
-
Byzantine Machine Learning Made Easy by Resilient Averaging of Momentums
Authors:
Sadegh Farhadkhani,
Rachid Guerraoui,
Nirupam Gupta,
Rafael Pinot,
John Stephan
Abstract:
Byzantine resilience emerged as a prominent topic within the distributed machine learning community. Essentially, the goal is to enhance distributed optimization algorithms, such as distributed SGD, in a way that guarantees convergence despite the presence of some misbehaving (a.k.a., {\em Byzantine}) workers. Although a myriad of techniques addressing the problem have been proposed, the field arg…
▽ More
Byzantine resilience emerged as a prominent topic within the distributed machine learning community. Essentially, the goal is to enhance distributed optimization algorithms, such as distributed SGD, in a way that guarantees convergence despite the presence of some misbehaving (a.k.a., {\em Byzantine}) workers. Although a myriad of techniques addressing the problem have been proposed, the field arguably rests on fragile foundations. These techniques are hard to prove correct and rely on assumptions that are (a) quite unrealistic, i.e., often violated in practice, and (b) heterogeneous, i.e., making it difficult to compare approaches.
We present \emph{RESAM (RESilient Averaging of Momentums)}, a unified framework that makes it simple to establish optimal Byzantine resilience, relying only on standard machine learning assumptions. Our framework is mainly composed of two operators: \emph{resilient averaging} at the server and \emph{distributed momentum} at the workers. We prove a general theorem stating the convergence of distributed SGD under RESAM. Interestingly, demonstrating and comparing the convergence of many existing techniques become direct corollaries of our theorem, without resorting to stringent assumptions. We also present an empirical evaluation of the practical relevance of RESAM.
△ Less
Submitted 24 May, 2022;
originally announced May 2022.
-
Robust Sparse Voting
Authors:
Youssef Allouah,
Rachid Guerraoui,
Lê-Nguyên Hoang,
Oscar Villemaud
Abstract:
Many applications, such as content moderation and recommendation, require reviewing and scoring a large number of alternatives. Doing so robustly is however very challenging. Indeed, voters' inputs are inevitably sparse: most alternatives are only scored by a small fraction of voters. This sparsity amplifies the effects of biased voters introducing unfairness, and of malicious voters seeking to ha…
▽ More
Many applications, such as content moderation and recommendation, require reviewing and scoring a large number of alternatives. Doing so robustly is however very challenging. Indeed, voters' inputs are inevitably sparse: most alternatives are only scored by a small fraction of voters. This sparsity amplifies the effects of biased voters introducing unfairness, and of malicious voters seeking to hack the voting process by reporting dishonest scores. We give a precise definition of the problem of robust sparse voting, highlight its underlying technical challenges, and present a novel voting mechanism addressing the problem. We prove that, using this mechanism, no voter can have more than a small parameterizable effect on each alternative's score; a property we call Lipschitz resilience. We also identify conditions of voters comparability under which any unanimous preferences can be recovered, even when each voter provides sparse scores, on a scale that is potentially very different from any other voter's score scale. Proving these properties required us to introduce, analyze and carefully compose novel aggregation primitives which could be of independent interest.
△ Less
Submitted 25 January, 2024; v1 submitted 17 February, 2022;
originally announced February 2022.
-
An Equivalence Between Data Poisoning and Byzantine Gradient Attacks
Authors:
Sadegh Farhadkhani,
Rachid Guerraoui,
Lê-Nguyên Hoang,
Oscar Villemaud
Abstract:
To study the resilience of distributed learning, the "Byzantine" literature considers a strong threat model where workers can report arbitrary gradients to the parameter server. Whereas this model helped obtain several fundamental results, it has sometimes been considered unrealistic, when the workers are mostly trustworthy machines. In this paper, we show a surprising equivalence between this mod…
▽ More
To study the resilience of distributed learning, the "Byzantine" literature considers a strong threat model where workers can report arbitrary gradients to the parameter server. Whereas this model helped obtain several fundamental results, it has sometimes been considered unrealistic, when the workers are mostly trustworthy machines. In this paper, we show a surprising equivalence between this model and data poisoning, a threat considered much more realistic. More specifically, we prove that every gradient attack can be reduced to data poisoning, in any personalized federated learning system with PAC guarantees (which we show are both desirable and realistic). This equivalence makes it possible to obtain new impossibility results on the resilience of any "robust" learning algorithm to data poisoning in highly heterogeneous applications, as corollaries of existing impossibility theorems on Byzantine machine learning. Moreover, using our equivalence, we derive a practical attack that we show (theoretically and empirically) can be very effective against classical personalized federated learning models.
△ Less
Submitted 20 July, 2022; v1 submitted 17 February, 2022;
originally announced February 2022.
-
Combining Differential Privacy and Byzantine Resilience in Distributed SGD
Authors:
Rachid Guerraoui,
Nirupam Gupta,
Rafael Pinot,
Sebastien Rouault,
John Stephan
Abstract:
Privacy and Byzantine resilience (BR) are two crucial requirements of modern-day distributed machine learning. The two concepts have been extensively studied individually but the question of how to combine them effectively remains unanswered. This paper contributes to addressing this question by studying the extent to which the distributed SGD algorithm, in the standard parameter-server architectu…
▽ More
Privacy and Byzantine resilience (BR) are two crucial requirements of modern-day distributed machine learning. The two concepts have been extensively studied individually but the question of how to combine them effectively remains unanswered. This paper contributes to addressing this question by studying the extent to which the distributed SGD algorithm, in the standard parameter-server architecture, can learn an accurate model despite (a) a fraction of the workers being malicious (Byzantine), and (b) the other fraction, whilst being honest, providing noisy information to the server to ensure differential privacy (DP). We first observe that the integration of standard practices in DP and BR is not straightforward. In fact, we show that many existing results on the convergence of distributed SGD under Byzantine faults, especially those relying on $(α,f)$-Byzantine resilience, are rendered invalid when honest workers enforce DP. To circumvent this shortcoming, we revisit the theory of $(α,f)$-BR to obtain an approximate convergence guarantee. Our analysis provides key insights on how to improve this guarantee through hyperparameter optimization. Essentially, our theoretical and empirical results show that (1) an imprudent combination of standard approaches to DP and BR might be fruitless, but (2) by carefully re-tuning the learning algorithm, we can obtain reasonable learning accuracy while simultaneously guaranteeing DP and BR.
△ Less
Submitted 5 October, 2023; v1 submitted 8 October, 2021;
originally announced October 2021.
-
Frugal Byzantine Computing
Authors:
M. K. Aguilera,
N. Ben-David,
R. Guerraoui,
D. Papuc,
A. Xygkis,
I. Zablotchi
Abstract:
Traditional techniques for handling Byzantine failures are expensive: digital signatures are too costly, while using $3f{+}1$ replicas is uneconomical ($f$ denotes the maximum number of Byzantine processes). We seek algorithms that reduce the number of replicas to $2f{+}1$ and minimize the number of signatures. While the first goal can be achieved in the message-and-memory model, accomplishing the…
▽ More
Traditional techniques for handling Byzantine failures are expensive: digital signatures are too costly, while using $3f{+}1$ replicas is uneconomical ($f$ denotes the maximum number of Byzantine processes). We seek algorithms that reduce the number of replicas to $2f{+}1$ and minimize the number of signatures. While the first goal can be achieved in the message-and-memory model, accomplishing the second goal simultaneously is challenging. We first address this challenge for the problem of broadcasting messages reliably. We consider two variants of this problem, Consistent Broadcast and Reliable Broadcast, typically considered very close. Perhaps surprisingly, we establish a separation between them in terms of signatures required. In particular, we show that Consistent Broadcast requires at least 1 signature in some execution, while Reliable Broadcast requires $O(n)$ signatures in some execution. We present matching upper bounds for both primitives within constant factors. We then turn to the problem of consensus and argue that this separation matters for solving consensus with Byzantine failures: we present a practical consensus algorithm that uses Consistent Broadcast as its main communication primitive. This algorithm works for $n=2f{+}1$ and avoids signatures in the common-case -- properties that have not been simultaneously achieved previously. Overall, our work approaches Byzantine computing in a frugal manner and motivates the use of Consistent Broadcast -- rather than Reliable Broadcast -- as a key primitive for reaching agreement.
△ Less
Submitted 3 August, 2021;
originally announced August 2021.
-
Velos: One-sided Paxos for RDMA applications
Authors:
Rachid Guerraoui,
Antoine Murat,
Athanasios Xygkis
Abstract:
Modern data centers are becoming increasingly equipped with RDMA-capable NICs. These devices enable distributed systems to rely on algorithms designed for shared memory. RDMA allows consensus to terminate within a few microsecond in failure-free scenarios, yet, RDMA-optimized algorithms still use expensive two-sided operations in case of failure. In this work, we present a new leader-based consens…
▽ More
Modern data centers are becoming increasingly equipped with RDMA-capable NICs. These devices enable distributed systems to rely on algorithms designed for shared memory. RDMA allows consensus to terminate within a few microsecond in failure-free scenarios, yet, RDMA-optimized algorithms still use expensive two-sided operations in case of failure. In this work, we present a new leader-based consensus algorithm that relies solely on one-sided RDMA verbs. Our algorithm is based on Paxos, it decides in a single one-sided RDMA operation in the common case, and changes leader also in a single one-sided RDMA operation in case of failure. We implement our algorithm in the form of an SMR system named Velos, and we evaluated our system against the state-of-the-art competitor Mu. Compared to Mu, our solution adds a small overhead of approximately 0.6 microseconds in failure-free executions and shines during failover periods during which it is 13 times faster in changing leader.
△ Less
Submitted 16 June, 2021;
originally announced June 2021.
-
Strategyproof Learning: Building Trustworthy User-Generated Datasets
Authors:
Sadegh Farhadkhani,
Rachid Guerraoui,
Lê-Nguyên Hoang
Abstract:
We prove in this paper that, perhaps surprisingly, incentivizing data misreporting is not a fatality. By leveraging a careful design of the loss function, we propose Licchavi, a global and personalized learning framework with provable strategyproofness guarantees. Essentially, we prove that no user can gain much by replying to Licchavi's queries with answers that deviate from their true preference…
▽ More
We prove in this paper that, perhaps surprisingly, incentivizing data misreporting is not a fatality. By leveraging a careful design of the loss function, we propose Licchavi, a global and personalized learning framework with provable strategyproofness guarantees. Essentially, we prove that no user can gain much by replying to Licchavi's queries with answers that deviate from their true preferences. Interestingly, Licchavi also promotes the desirable "one person, one unit-force vote" fairness principle. Furthermore, our empirical evaluation of its performance showcases Licchavi's real-world applicability. We believe that our results are critical for the safety of any learning scheme that leverages user-generated data.
△ Less
Submitted 18 February, 2022; v1 submitted 4 June, 2021;
originally announced June 2021.
-
On the Strategyproofness of the Geometric Median
Authors:
El-Mahdi El-Mhamdi,
Sadegh Farhadkhani,
Rachid Guerraoui,
Lê-Nguyên Hoang
Abstract:
The geometric median, an instrumental component of the secure machine learning toolbox, is known to be effective when robustly aggregating models (or gradients), gathered from potentially malicious (or strategic) users. What is less known is the extent to which the geometric median incentivizes dishonest behaviors. This paper addresses this fundamental question by quantifying its strategyproofness…
▽ More
The geometric median, an instrumental component of the secure machine learning toolbox, is known to be effective when robustly aggregating models (or gradients), gathered from potentially malicious (or strategic) users. What is less known is the extent to which the geometric median incentivizes dishonest behaviors. This paper addresses this fundamental question by quantifying its strategyproofness. While we observe that the geometric median is not even approximately strategyproof, we prove that it is asymptotically $α$-strategyproof: when the number of users is large enough, a user that misbehaves can gain at most a multiplicative factor $α$, which we compute as a function of the distribution followed by the users. We then generalize our results to the case where users actually care more about specific dimensions, determining how this impacts $α$. We also show how the skewed geometric medians can be used to improve strategyproofness.
△ Less
Submitted 2 June, 2023; v1 submitted 4 June, 2021;
originally announced June 2021.
-
Differential Privacy and Byzantine Resilience in SGD: Do They Add Up?
Authors:
Rachid Guerraoui,
Nirupam Gupta,
Rafaël Pinot,
Sébastien Rouault,
John Stephan
Abstract:
This paper addresses the problem of combining Byzantine resilience with privacy in machine learning (ML). Specifically, we study if a distributed implementation of the renowned Stochastic Gradient Descent (SGD) learning algorithm is feasible with both differential privacy (DP) and $(α,f)$-Byzantine resilience. To the best of our knowledge, this is the first work to tackle this problem from a theor…
▽ More
This paper addresses the problem of combining Byzantine resilience with privacy in machine learning (ML). Specifically, we study if a distributed implementation of the renowned Stochastic Gradient Descent (SGD) learning algorithm is feasible with both differential privacy (DP) and $(α,f)$-Byzantine resilience. To the best of our knowledge, this is the first work to tackle this problem from a theoretical point of view. A key finding of our analyses is that the classical approaches to these two (seemingly) orthogonal issues are incompatible. More precisely, we show that a direct composition of these techniques makes the guarantees of the resulting SGD algorithm depend unfavourably upon the number of parameters of the ML model, making the training of large models practically infeasible. We validate our theoretical results through numerical experiments on publicly-available datasets; showing that it is impractical to ensure DP and Byzantine resilience simultaneously.
△ Less
Submitted 24 June, 2021; v1 submitted 16 February, 2021;
originally announced February 2021.
-
Microsecond Consensus for Microsecond Applications
Authors:
Marcos K. Aguilera,
Naama Ben-David,
Rachid Guerraoui,
Virendra J. Marathe,
Athanasios Xygkis,
Igor Zablotchi
Abstract:
We consider the problem of making apps fault-tolerant through replication, when apps operate at the microsecond scale, as in finance, embedded computing, and microservices apps. These apps need a replication scheme that also operates at the microsecond scale, otherwise replication becomes a burden. We propose Mu, a system that takes less than 1.3 microseconds to replicate a (small) request in memo…
▽ More
We consider the problem of making apps fault-tolerant through replication, when apps operate at the microsecond scale, as in finance, embedded computing, and microservices apps. These apps need a replication scheme that also operates at the microsecond scale, otherwise replication becomes a burden. We propose Mu, a system that takes less than 1.3 microseconds to replicate a (small) request in memory, and less than a millisecond to fail-over the system - this cuts the replication and fail-over latencies of the prior systems by at least 61% and 90%.
Mu implements bona fide state machine replication/consensus (SMR) with strong consistency for a generic app, but it really shines on microsecond apps, where even the smallest overhead is significant. To provide this performance, Mu introduces a new SMR protocol that carefully leverages RDMA. Roughly, in Mu a leader replicates a request by simply writing it directly to the log of other replicas using RDMA, without any additional communication. Doing so, however, introduces the challenge of handling concurrent leaders, changing leaders, garbage collecting the logs, and more - challenges that we address in this paper through a judicious combination of RDMA permissions and distributed algorithmic design.
We implemented Mu and used it to replicate several systems: a financial exchange app called Liquibook, Redis, Memcached, and HERD. Our evaluation shows that Mu incurs a small replication latency, in some cases being the only viable replication system that incurs an acceptable overhead.
△ Less
Submitted 13 October, 2020;
originally announced October 2020.
-
Garfield: System Support for Byzantine Machine Learning
Authors:
Rachid Guerraoui,
Arsany Guirguis,
Jérémy Max Plassmann,
Anton Alexandre Ragot,
Sébastien Rouault
Abstract:
We present Garfield, a library to transparently make machine learning (ML) applications, initially built with popular (but fragile) frameworks, e.g., TensorFlow and PyTorch, Byzantine-resilient. Garfield relies on a novel object-oriented design, reducing the coding effort, and addressing the vulnerability of the shared-graph architecture followed by classical ML frameworks. Garfield encompasses va…
▽ More
We present Garfield, a library to transparently make machine learning (ML) applications, initially built with popular (but fragile) frameworks, e.g., TensorFlow and PyTorch, Byzantine-resilient. Garfield relies on a novel object-oriented design, reducing the coding effort, and addressing the vulnerability of the shared-graph architecture followed by classical ML frameworks. Garfield encompasses various communication patterns and supports computations on CPUs and GPUs, allowing addressing the general question of the very practical cost of Byzantine resilience in SGD-based ML applications. We report on the usage of Garfield on three main ML architectures: (a) a single server with multiple workers, (b) several servers and workers, and (c) peer-to-peer settings. Using Garfield, we highlight several interesting facts about the cost of Byzantine resilience. In particular, (a) Byzantine resilience, unlike crash resilience, induces an accuracy loss, (b) the throughput overhead comes more from communication than from robust aggregation, and (c) tolerating Byzantine servers costs more than tolerating Byzantine workers.
△ Less
Submitted 31 December, 2020; v1 submitted 12 October, 2020;
originally announced October 2020.
-
Efficient Multi-word Compare and Swap
Authors:
Rachid Guerraoui,
Alex Kogan,
Virendra J. Marathe,
Igor Zablotchi
Abstract:
Atomic lock-free multi-word compare-and-swap (MCAS) is a powerful tool for designing concurrent algorithms. Yet, its widespread usage has been limited because lock-free implementations of MCAS make heavy use of expensive compare-and-swap (CAS) instructions. Existing MCAS implementations indeed use at least 2k+1 CASes per k-CAS. This leads to the natural desire to minimize the number of CASes requi…
▽ More
Atomic lock-free multi-word compare-and-swap (MCAS) is a powerful tool for designing concurrent algorithms. Yet, its widespread usage has been limited because lock-free implementations of MCAS make heavy use of expensive compare-and-swap (CAS) instructions. Existing MCAS implementations indeed use at least 2k+1 CASes per k-CAS. This leads to the natural desire to minimize the number of CASes required to implement MCAS. We first prove in this paper that it is impossible to "pack" the information required to perform a k-word CAS (k-CAS) in less than k locations to be CASed. Then we present the first algorithm that requires k+1 CASes per call to k-CAS in the common uncontended case. We implement our algorithm and show that it outperforms a state-of-the-art baseline in a variety of benchmarks in most considered workloads. We also present a durably linearizable (persistent memory friendly) version of our MCAS algorithm using only 2 persistence fences per call, while still only requiring k+1 CASes per k-CAS.
△ Less
Submitted 6 August, 2020;
originally announced August 2020.
-
Collaborative Learning in the Jungle (Decentralized, Byzantine, Heterogeneous, Asynchronous and Nonconvex Learning)
Authors:
El-Mahdi El-Mhamdi,
Sadegh Farhadkhani,
Rachid Guerraoui,
Arsany Guirguis,
Lê Nguyên Hoang,
Sébastien Rouault
Abstract:
We study Byzantine collaborative learning, where $n$ nodes seek to collectively learn from each others' local data. The data distribution may vary from one node to another. No node is trusted, and $f < n$ nodes can behave arbitrarily. We prove that collaborative learning is equivalent to a new form of agreement, which we call averaging agreement. In this problem, nodes start each with an initial v…
▽ More
We study Byzantine collaborative learning, where $n$ nodes seek to collectively learn from each others' local data. The data distribution may vary from one node to another. No node is trusted, and $f < n$ nodes can behave arbitrarily. We prove that collaborative learning is equivalent to a new form of agreement, which we call averaging agreement. In this problem, nodes start each with an initial vector and seek to approximately agree on a common vector, which is close to the average of honest nodes' initial vectors. We present two asynchronous solutions to averaging agreement, each we prove optimal according to some dimension. The first, based on the minimum-diameter averaging, requires $ n \geq 6f+1$, but achieves asymptotically the best-possible averaging constant up to a multiplicative constant. The second, based on reliable broadcast and coordinate-wise trimmed mean, achieves optimal Byzantine resilience, i.e., $n \geq 3f+1$. Each of these algorithms induces an optimal Byzantine collaborative learning protocol. In particular, our equivalence yields new impossibility theorems on what any collaborative learning algorithm can achieve in adversarial and heterogeneous environments.
△ Less
Submitted 1 December, 2021; v1 submitted 3 August, 2020;
originally announced August 2020.
-
FLeet: Online Federated Learning via Staleness Awareness and Performance Prediction
Authors:
Georgios Damaskinos,
Rachid Guerraoui,
Anne-Marie Kermarrec,
Vlad Nitu,
Rhicheek Patra,
Francois Taiani
Abstract:
Federated Learning (FL) is very appealing for its privacy benefits: essentially, a global model is trained with updates computed on mobile devices while kee** the data of users local. Standard FL infrastructures are however designed to have no energy or performance impact on mobile devices, and are therefore not suitable for applications that require frequent (online) model updates, such as news…
▽ More
Federated Learning (FL) is very appealing for its privacy benefits: essentially, a global model is trained with updates computed on mobile devices while kee** the data of users local. Standard FL infrastructures are however designed to have no energy or performance impact on mobile devices, and are therefore not suitable for applications that require frequent (online) model updates, such as news recommenders.
This paper presents FLeet, the first Online FL system, acting as a middleware between the Android OS and the machine learning application. FLeet combines the privacy of Standard FL with the precision of online learning thanks to two core components: (i) I-Prof, a new lightweight profiler that predicts and controls the impact of learning tasks on mobile devices, and (ii) AdaSGD, a new adaptive learning algorithm that is resilient to delayed updates.
Our extensive evaluation shows that Online FL, as implemented by FLeet, can deliver a 2.3x quality boost compared to Standard FL, while only consuming 0.036% of the battery per day. I-Prof can accurately control the impact of learning tasks by improving the prediction accuracy up to 3.6x (computation time) and up to 19x (energy). AdaSGD outperforms alternative FL approaches by 18.4% in terms of convergence speed on heterogeneous data.
△ Less
Submitted 3 December, 2020; v1 submitted 12 June, 2020;
originally announced June 2020.
-
Differentially Private Stochastic Coordinate Descent
Authors:
Georgios Damaskinos,
Celestine Mendler-Dünner,
Rachid Guerraoui,
Nikolaos Papandreou,
Thomas Parnell
Abstract:
In this paper we tackle the challenge of making the stochastic coordinate descent algorithm differentially private. Compared to the classical gradient descent algorithm where updates operate on a single model vector and controlled noise addition to this vector suffices to hide critical information about individuals, stochastic coordinate descent crucially relies on kee** auxiliary information in…
▽ More
In this paper we tackle the challenge of making the stochastic coordinate descent algorithm differentially private. Compared to the classical gradient descent algorithm where updates operate on a single model vector and controlled noise addition to this vector suffices to hide critical information about individuals, stochastic coordinate descent crucially relies on kee** auxiliary information in memory during training. This auxiliary information provides an additional privacy leak and poses the major challenge addressed in this work. Driven by the insight that under independent noise addition, the consistency of the auxiliary information holds in expectation, we present DP-SCD, the first differentially private stochastic coordinate descent algorithm. We analyze our new method theoretically and argue that decoupling and parallelizing coordinate updates is essential for its utility. On the empirical side we demonstrate competitive performance against the popular stochastic gradient descent alternative (DP-SGD) while requiring significantly less tuning.
△ Less
Submitted 14 March, 2021; v1 submitted 12 June, 2020;
originally announced June 2020.
-
Host-Pathongen Co-evolution Inspired Algorithm Enables Robust GAN Training
Authors:
Andrei Kucharavy,
El Mahdi El Mhamdi,
Rachid Guerraoui
Abstract:
Generative adversarial networks (GANs) are pairs of artificial neural networks that are trained one against each other. The outputs from a generator are mixed with the real-world inputs to the discriminator and both networks are trained until an equilibrium is reached, where the discriminator cannot distinguish generated inputs from real ones. Since their introduction, GANs have allowed for the ge…
▽ More
Generative adversarial networks (GANs) are pairs of artificial neural networks that are trained one against each other. The outputs from a generator are mixed with the real-world inputs to the discriminator and both networks are trained until an equilibrium is reached, where the discriminator cannot distinguish generated inputs from real ones. Since their introduction, GANs have allowed for the generation of impressive imitations of real-life films, images and texts, whose fakeness is barely noticeable to humans. Despite their impressive performance, training GANs remains to this day more of an art than a reliable procedure, in a large part due to training process stability. Generators are susceptible to mode drop** and convergence to random patterns, which have to be mitigated by computationally expensive multiple restarts. Curiously, GANs bear an uncanny similarity to a co-evolution of a pathogen and its host's immune system in biology. In a biological context, the majority of potential pathogens indeed never make it and are kept at bay by the hots' immune system. Yet some are efficient enough to present a risk of a serious condition and recurrent infections. Here, we explore that similarity to propose a more robust algorithm for GANs training. We empirically show the increased stability and a better ability to generate high-quality images while using less computational power.
△ Less
Submitted 9 June, 2020; v1 submitted 22 May, 2020;
originally announced June 2020.
-
Online Payments by Merely Broadcasting Messages (Extended Version)
Authors:
Daniel Collins,
Rachid Guerraoui,
Jovan Komatovic,
Matteo Monti,
Athanasios Xygkis,
Matej Pavlovic,
Petr Kuznetsov,
Yvonne-Anne Pignolet,
Dragos-Adrian Seredinschi,
Andrei Tonkikh
Abstract:
We address the problem of online payments, where users can transfer funds among themselves. We introduce Astro, a system solving this problem efficiently in a decentralized, deterministic, and completely asynchronous manner. Astro builds on the insight that consensus is unnecessary to prevent double-spending. Instead of consensus, Astro relies on a weaker primitive---Byzantine reliable broadcast--…
▽ More
We address the problem of online payments, where users can transfer funds among themselves. We introduce Astro, a system solving this problem efficiently in a decentralized, deterministic, and completely asynchronous manner. Astro builds on the insight that consensus is unnecessary to prevent double-spending. Instead of consensus, Astro relies on a weaker primitive---Byzantine reliable broadcast---enabling a simpler and more efficient implementation than consensus-based payment systems.
In terms of efficiency, Astro executes a payment by merely broadcasting a message. The distinguishing feature of Astro is that it can maintain performance robustly, i.e., remain unaffected by a fraction of replicas being compromised or slowed down by an adversary. Our experiments on a public cloud network show that Astro can achieve near-linear scalability in a sharded setup, going from $10K$ payments/sec (2 shards) to $20K$ payments/sec (4 shards). In a nutshell, Astro can match VISA-level average payment throughput, and achieves a $5\times$ improvement over a state-of-the-art consensus-based solution, while exhibiting sub-second $95^{th}$ percentile latency.
△ Less
Submitted 27 April, 2020;
originally announced April 2020.
-
Distributed Momentum for Byzantine-resilient Learning
Authors:
El-Mahdi El-Mhamdi,
Rachid Guerraoui,
Sébastien Rouault
Abstract:
Momentum is a variant of gradient descent that has been proposed for its benefits on convergence. In a distributed setting, momentum can be implemented either at the server or the worker side. When the aggregation rule used by the server is linear, commutativity with addition makes both deployments equivalent. Robustness and privacy are however among motivations to abandon linear aggregation rules…
▽ More
Momentum is a variant of gradient descent that has been proposed for its benefits on convergence. In a distributed setting, momentum can be implemented either at the server or the worker side. When the aggregation rule used by the server is linear, commutativity with addition makes both deployments equivalent. Robustness and privacy are however among motivations to abandon linear aggregation rules. In this work, we demonstrate the benefits on robustness of using momentum at the worker side. We first prove that computing momentum at the workers reduces the variance-norm ratio of the gradient estimation at the server, strengthening Byzantine resilient aggregation rules. We then provide an extensive experimental demonstration of the robustness effect of worker-side momentum on distributed SGD.
△ Less
Submitted 9 March, 2020; v1 submitted 28 February, 2020;
originally announced March 2020.
-
Dynamic Byzantine Reliable Broadcast [Technical Report]
Authors:
Rachid Guerraoui,
Jovan Komatovic,
Petr Kuznetsov,
Yvonne-Anne Pignolet,
Dragos-Adrian Seredinschi,
Andrei Tonkikh
Abstract:
Reliable broadcast is a communication primitive guaranteeing, intuitively, that all processes in a distributed system deliver the same set of messages. The reason why this primitive is appealing is twofold: (i) we can implement it deterministically in a completely asynchronous environment, unlike stronger primitives like consensus and total-order broadcast, and yet (ii) reliable broadcast is power…
▽ More
Reliable broadcast is a communication primitive guaranteeing, intuitively, that all processes in a distributed system deliver the same set of messages. The reason why this primitive is appealing is twofold: (i) we can implement it deterministically in a completely asynchronous environment, unlike stronger primitives like consensus and total-order broadcast, and yet (ii) reliable broadcast is powerful enough to implement important applications like payment systems.
The problem we tackle in this paper is that of dynamic reliable broadcast, i.e., enabling processes to join or leave the system. This property is desirable for long-lived applications (aiming to be highly available), yet has been precluded in previous asynchronous reliable broadcast protocols. We study this property in a general adversarial (i.e., Byzantine) environment.
We introduce the first specification of a dynamic Byzantine reliable broadcast (DBRB) primitive that is amenable to an asynchronous implementation. We then present an algorithm implementing this specification in an asynchronous network. Our DBRB algorithm ensures that if any correct process in the system broadcasts a message, then every correct process delivers that message unless it leaves the system. Moreover, if a correct process delivers a message, then every correct process that has not expressed its will to leave the system delivers that message. We assume that more than $2/3$ of processes in the system are correct at all times, which is tight in our context.
We also show that if only one process in the system can fail---and it can fail only by crashing---then it is impossible to implement a stronger primitive, ensuring that if any correct process in the system broadcasts or delivers a message, then every correct process in the system delivers that message---including those that leave.
△ Less
Submitted 20 November, 2020; v1 submitted 17 January, 2020;
originally announced January 2020.