Search | arXiv e-print repository

DSig: Breaking the Barrier of Signatures in Data Centers

Authors: Marcos K. Aguilera, Clément Burgelin, Rachid Guerraoui, Antoine Murat, Athanasios Xygkis, Igor Zablotchi

Abstract: Data centers increasingly host mutually distrustful users on shared infrastructure. A powerful tool to safeguard such users are digital signatures. Digital signatures have revolutionized Internet-scale applications, but current signatures are too slow for the growing genre of microsecond-scale systems in modern data centers. We propose DSig, the first digital signature system to achieve single-dig… ▽ More Data centers increasingly host mutually distrustful users on shared infrastructure. A powerful tool to safeguard such users are digital signatures. Digital signatures have revolutionized Internet-scale applications, but current signatures are too slow for the growing genre of microsecond-scale systems in modern data centers. We propose DSig, the first digital signature system to achieve single-digit microsecond latency to sign, transmit, and verify signatures in data center systems. DSig is based on the observation that, in many data center applications, the signer of a message knows most of the time who will verify its signature. We introduce a new hybrid signature scheme that combines cheap single-use hash-based signatures verified in the foreground with traditional signatures pre-verified in the background. Compared to prior state-of-the-art signatures, DSig reduces signing time from 18.9 to 0.7 us and verification time from 35.6 to 5.1 us, while kee** signature transmission time below 2.5 us. Moreover, DSig achieves 2.5x higher signing throughput and 6.9x higher verification throughput than the state of the art. We use DSig to (a) bring auditability to two key-value stores (HERD and Redis) and a financial trading system (based on Liquibook) for 86% lower added latency than the state of the art, and (b) replace signatures in BFT broadcast and BFT replication, reducing their latency by 73% and 69%, respectively △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: To appear in the proceedings of OSDI '24. Authors listed in alphabetical order

arXiv:2405.14670 [pdf, other]

Overcoming the Challenges of Batch Normalization in Federated Learning

Authors: Rachid Guerraoui, Rafael Pinot, Geovani Rizk, John Stephan, François Taiani

Abstract: Batch normalization has proven to be a very beneficial mechanism to accelerate the training and improve the accuracy of deep neural networks in centralized environments. Yet, the scheme faces significant challenges in federated learning, especially under high data heterogeneity. Essentially, the main challenges arise from external covariate shifts and inconsistent statistics across clients. We int… ▽ More Batch normalization has proven to be a very beneficial mechanism to accelerate the training and improve the accuracy of deep neural networks in centralized environments. Yet, the scheme faces significant challenges in federated learning, especially under high data heterogeneity. Essentially, the main challenges arise from external covariate shifts and inconsistent statistics across clients. We introduce in this paper Federated BatchNorm (FBN), a novel scheme that restores the benefits of batch normalization in federated learning. Essentially, FBN ensures that the batch normalization during training is consistent with what would be achieved in a centralized execution, hence preserving the distribution of the data, and providing running statistics that accurately approximate the global statistics. FBN thereby reduces the external covariate shift and matches the evaluation performance of the centralized setting. We also show that, with a slight increase in complexity, we can robustify FBN to mitigate erroneous statistics and potentially adversarial attacks. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14432 [pdf, other]

Boosting Robustness by Clip** Gradients in Distributed Learning

Authors: Youssef Allouah, Rachid Guerraoui, Nirupam Gupta, Ahmed Jellouli, Geovani Rizk, John Stephan

Abstract: Robust distributed learning consists in achieving good learning performance despite the presence of misbehaving workers. State-of-the-art (SOTA) robust distributed gradient descent (Robust-DGD) methods, relying on robust aggregation, have been proven to be optimal: Their learning error matches the lower bound established under the standard heterogeneity model of $(G, B)$-gradient dissimilarity. Th… ▽ More Robust distributed learning consists in achieving good learning performance despite the presence of misbehaving workers. State-of-the-art (SOTA) robust distributed gradient descent (Robust-DGD) methods, relying on robust aggregation, have been proven to be optimal: Their learning error matches the lower bound established under the standard heterogeneity model of $(G, B)$-gradient dissimilarity. The learning guarantee of SOTA Robust-DGD cannot be further improved when model initialization is done arbitrarily. However, we show that it is possible to circumvent the lower bound, and improve the learning performance, when the workers' gradients at model initialization are assumed to be bounded. We prove this by proposing pre-aggregation clip** of workers' gradients, using a novel scheme called adaptive robust clip** (ARC). Incorporating ARC in Robust-DGD provably improves the learning, under the aforementioned assumption on model initialization. The factor of improvement is prominent when the tolerable fraction of misbehaving workers approaches the breakdown point. ARC induces this improvement by constricting the search space, while preserving the robustness property of the original aggregation scheme at the same time. We validate this theoretical finding through exhaustive experiments on benchmark image classification tasks. △ Less

Submitted 27 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.01031 [pdf, other]

The Privacy Power of Correlated Noise in Decentralized Learning

Authors: Youssef Allouah, Anastasia Koloskova, Aymane El Firdoussi, Martin Jaggi, Rachid Guerraoui

Abstract: Decentralized learning is appealing as it enables the scalable usage of large amounts of distributed data and resources (without resorting to any central entity), while promoting privacy since every user minimizes the direct exposure of their data. Yet, without additional precautions, curious users can still leverage models obtained from their peers to violate privacy. In this paper, we propose De… ▽ More Decentralized learning is appealing as it enables the scalable usage of large amounts of distributed data and resources (without resorting to any central entity), while promoting privacy since every user minimizes the direct exposure of their data. Yet, without additional precautions, curious users can still leverage models obtained from their peers to violate privacy. In this paper, we propose Decor, a variant of decentralized SGD with differential privacy (DP) guarantees. Essentially, in Decor, users securely exchange randomness seeds in one communication round to generate pairwise-canceling correlated Gaussian noises, which are injected to protect local models at every communication round. We theoretically and empirically show that, for arbitrary connected graphs, Decor matches the central DP optimal privacy-utility trade-off. We do so under SecLDP, our new relaxation of local DP, which protects all user communications against an external eavesdropper and curious users, assuming that every pair of connected users shares a secret, i.e., an information hidden to all others. The main theoretical challenge is to control the accumulation of non-canceling correlated noise due to network sparsity. We also propose a companion SecLDP privacy accountant for public use. △ Less

Submitted 3 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: Accepted as conference paper at ICML 2024

arXiv:2405.00491 [pdf, ps, other]

On the Relevance of Byzantine Robust Optimization Against Data Poisoning

Authors: Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Rafael Pinot

Abstract: The success of machine learning (ML) has been intimately linked with the availability of large amounts of data, typically collected from heterogeneous sources and processed on vast networks of computing devices (also called {\em workers}). Beyond accuracy, the use of ML in critical domains such as healthcare and autonomous driving calls for robustness against {\em data poisoning}and some {\em faul… ▽ More The success of machine learning (ML) has been intimately linked with the availability of large amounts of data, typically collected from heterogeneous sources and processed on vast networks of computing devices (also called {\em workers}). Beyond accuracy, the use of ML in critical domains such as healthcare and autonomous driving calls for robustness against {\em data poisoning}and some {\em faulty workers}. The problem of {\em Byzantine ML} formalizes these robustness issues by considering a distributed ML environment in which workers (storing a portion of the global dataset) can deviate arbitrarily from the prescribed algorithm. Although the problem has attracted a lot of attention from a theoretical point of view, its practical importance for addressing realistic faults (where the behavior of any worker is locally constrained) remains unclear. It has been argued that the seemingly weaker threat model where only workers' local datasets get poisoned is more reasonable. We prove that, while tolerating a wider range of faulty behaviors, Byzantine ML yields solutions that are, in a precise sense, optimal even under the weaker data poisoning threat model. Then, we study a generic data poisoning model wherein some workers have {\em fully-poisonous local data}, i.e., their datasets are entirely corruptible, and the remainders have {\em partially-poisonous local data}, i.e., only a fraction of their local datasets is corruptible. We prove that Byzantine-robust schemes yield optimal solutions against both these forms of data poisoning, and that the former is more harmful when workers have {\em heterogeneous} local data. △ Less

Submitted 1 May, 2024; originally announced May 2024.

Comments: 38 pages

arXiv:2403.08374 [pdf, other]

Efficient Signature-Free Validated Agreement

Authors: Pierre Civit, Muhammad Ayaz Dzulfikar, Seth Gilbert, Rachid Guerraoui, Jovan Komatovic, Manuel Vidigueira, Igor Zablotchi

Abstract: Byzantine agreement enables n processes to agree on a common L-bit value, despite up to t > 0 arbitrary failures. A long line of work has been dedicated to improving the bit complexity of Byzantine agreement in synchrony. This has culminated in COOL, an error-free (deterministically secure against a computationally unbounded adversary) solution that achieves O(nL + n^2 logn) worst-case bit complex… ▽ More Byzantine agreement enables n processes to agree on a common L-bit value, despite up to t > 0 arbitrary failures. A long line of work has been dedicated to improving the bit complexity of Byzantine agreement in synchrony. This has culminated in COOL, an error-free (deterministically secure against a computationally unbounded adversary) solution that achieves O(nL + n^2 logn) worst-case bit complexity (which is optimal for L > n logn according to the Dolev-Reischuk lower bound). COOL satisfies strong validity: if all correct processes propose the same value, only that value can be decided. Strong validity is, however, not appropriate for today's state machine replication (SMR) and blockchain protocols. These systems value progress and require a decided value to always be valid, excluding default decisions (such as EMPTY) even in cases where there is no unanimity a priori. Validated Byzantine agreement satisfies this property (called external validity). Yet, the best error-free (or even signature-free) validated agreement solutions achieve only O(n^2L) bit complexity, a far cry from the Omega(nL + n^2) Dolev-Reishcuk lower bound. In this paper, we present two new synchronous algorithms for validated Byzantine agreement, HashExt and ErrorFreeExt, with different trade-offs. Both algorithms are (1) signature-free, (2) optimally resilient (tolerate up to t < n / 3 failures), and (3) early-stop** (terminate in O(f+1) rounds, where f <= t is the actual number of failures). On the one hand, HashExt uses only hashes and achieves O(nL + n^3 kappa) bit complexity, which is optimal for L > n^2 kappa (where kappa is the size of a hash). On the other hand, ErrorFreeExt is error-free, using no cryptography whatsoever, and achieves O( (nL + n^2) logn ) bit complexity, which is near-optimal for any L. △ Less

Submitted 17 May, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2402.12780 [pdf, other]

Byzantine-Robust Federated Learning: Impact of Client Subsampling and Local Updates

Authors: Youssef Allouah, Sadegh Farhadkhani, Rachid GuerraouI, Nirupam Gupta, Rafael Pinot, Geovani Rizk, Sasha Voitovych

Abstract: The possibility of adversarial (a.k.a., {\em Byzantine}) clients makes federated learning (FL) prone to arbitrary manipulation. The natural approach to robustify FL against adversarial clients is to replace the simple averaging operation at the server in the standard $\mathsf{FedAvg}$ algorithm by a \emph{robust averaging rule}. While a significant amount of work has been devoted to studying the c… ▽ More The possibility of adversarial (a.k.a., {\em Byzantine}) clients makes federated learning (FL) prone to arbitrary manipulation. The natural approach to robustify FL against adversarial clients is to replace the simple averaging operation at the server in the standard $\mathsf{FedAvg}$ algorithm by a \emph{robust averaging rule}. While a significant amount of work has been devoted to studying the convergence of federated {\em robust averaging} (which we denote by $\mathsf{FedRo}$), prior work has largely ignored the impact of {\em client subsampling} and {\em local steps}, two fundamental FL characteristics. While client subsampling increases the effective fraction of Byzantine clients, local steps increase the drift between the local updates computed by honest (i.e., non-Byzantine) clients. Consequently, a careless deployment of $\mathsf{FedRo}$ could yield poor performance. We validate this observation by presenting an in-depth analysis of $\mathsf{FedRo}$ tightly analyzing the impact of client subsampling and local steps. Specifically, we present a sufficient condition on client subsampling for nearly-optimal convergence of $\mathsf{FedRo}$ (for smooth non-convex loss). Also, we show that the rate of improvement in learning accuracy {\em diminishes} with respect to the number of clients subsampled, as soon as the sample size exceeds a threshold value. Interestingly, we also observe that under a careful choice of step-sizes, the learning error due to Byzantine clients decreases with the number of local steps. We validate our theory by experiments on the FEMNIST and CIFAR-$10$ image classification tasks. △ Less

Submitted 10 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.10059 [pdf, other]

Partial Synchrony for Free? New Upper Bounds for Byzantine Agreement

Authors: Pierre Civit, Muhammad Ayaz Dzulfikar, Seth Gilbert, Rachid Guerraoui, Jovan Komatovic, Manuel Vidigueira, Igor Zablotchi

Abstract: Byzantine agreement allows n processes to decide on a common value, in spite of arbitrary failures. The seminal Dolev-Reischuk bound states that any deterministic solution to Byzantine agreement exchanges Omega(n^2) bits. In synchronous networks, solutions with optimal O(n^2) bit complexity, optimal fault tolerance, and no cryptography have been established for over three decades. However, these s… ▽ More Byzantine agreement allows n processes to decide on a common value, in spite of arbitrary failures. The seminal Dolev-Reischuk bound states that any deterministic solution to Byzantine agreement exchanges Omega(n^2) bits. In synchronous networks, solutions with optimal O(n^2) bit complexity, optimal fault tolerance, and no cryptography have been established for over three decades. However, these solutions lack robustness under adverse network conditions. Therefore, research has increasingly focused on Byzantine agreement for partially synchronous networks. Numerous solutions have been proposed for the partially synchronous setting. However, these solutions are notoriously hard to prove correct, and the most efficient cryptography-free algorithms still require O(n^3) exchanged bits in the worst case. In this paper, we introduce Oper, the first generic transformation of deterministic Byzantine agreement algorithms from synchrony to partial synchrony. Oper requires no cryptography, is optimally resilient (n >= 3t+1, where t is the maximum number of failures), and preserves the worst-case per-process bit complexity of the transformed synchronous algorithm. Leveraging Oper, we present the first partially synchronous Byzantine agreement algorithm that (1) achieves optimal O(n^2) bit complexity, (2) requires no cryptography, and (3) is optimally resilient (n >= 3t+1), thus showing that the Dolev-Reischuk bound is tight even in partial synchrony. Moreover, we adapt Oper for long values and obtain several new partially synchronous algorithms with improved complexity and weaker (or completely absent) cryptographic assumptions. △ Less

Submitted 5 April, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

arXiv:2312.14712 [pdf, other]

Robustness, Efficiency, or Privacy: Pick Two in Machine Learning

Authors: Youssef Allouah, Rachid Guerraoui, John Stephan

Abstract: The success of machine learning (ML) applications relies on vast datasets and distributed architectures which, as they grow, present major challenges. In real-world scenarios, where data often contains sensitive information, issues like data poisoning and hardware failures are common. Ensuring privacy and robustness is vital for the broad adoption of ML in public life. This paper examines the cost… ▽ More The success of machine learning (ML) applications relies on vast datasets and distributed architectures which, as they grow, present major challenges. In real-world scenarios, where data often contains sensitive information, issues like data poisoning and hardware failures are common. Ensuring privacy and robustness is vital for the broad adoption of ML in public life. This paper examines the costs associated with achieving these objectives in distributed ML architectures, from both theoretical and empirical perspectives. We overview the meanings of privacy and robustness in distributed ML, and clarify how they can be achieved efficiently in isolation. However, we contend that the integration of these two objectives entails a notable compromise in computational efficiency. In short, traditional noise injection hurts accuracy by concealing poisoned inputs, while cryptographic methods clash with poisoning defenses due to their non-linear nature. However, we outline future research directions aimed at reconciling this compromise with efficiency by considering weaker threat models. △ Less

Submitted 11 March, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

arXiv:2311.08060 [pdf, other]

All Byzantine Agreement Problems are Expensive

Authors: Pierre Civit, Seth Gilbert, Rachid Guerraoui, Jovan Komatovic, Anton Paramonov, Manuel Vidigueira

Abstract: Byzantine agreement, arguably the most fundamental problem in distributed computing, operates among n processes, out of which t < n can exhibit arbitrary failures. The problem states that all correct (non-faulty) processes must eventually decide (termination) the same value (agreement) from a set of admissible values defined by the proposals of the processes (validity). Depending on the exact vers… ▽ More Byzantine agreement, arguably the most fundamental problem in distributed computing, operates among n processes, out of which t < n can exhibit arbitrary failures. The problem states that all correct (non-faulty) processes must eventually decide (termination) the same value (agreement) from a set of admissible values defined by the proposals of the processes (validity). Depending on the exact version of the validity property, Byzantine agreement comes in different forms, from Byzantine broadcast to strong and weak consensus, to modern variants of the problem introduced in today's blockchain systems. Regardless of the specific flavor of the agreement problem, its communication cost is a fundamental metric whose improvement has been the focus of decades of research. The Dolev-Reischuk bound, one of the most celebrated results in distributed computing, proved 40 years ago that, at least for Byzantine broadcast, no deterministic solution can do better than Omega(t^2) exchanged messages in the worst case. Since then, it remained unknown whether the quadratic lower bound extends to seemingly weaker variants of Byzantine agreement. This paper answers the question in the affirmative, closing this long-standing open problem. Namely, we prove that any non-trivial agreement problem requires Omega(t^2) messages to be exchanged in the worst case. To prove the general lower bound, we determine the weakest Byzantine agreement problem and show, via a novel indistinguishability argument, that it incurs Omega(t^2) exchanged messages. △ Less

Submitted 15 November, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

arXiv:2310.01972 [pdf, other]

Epidemic Learning: Boosting Decentralized Learning with Randomized Communication

Authors: Martijn de Vos, Sadegh Farhadkhani, Rachid Guerraoui, Anne-Marie Kermarrec, Rafael Pires, Rishi Sharma

Abstract: We present Epidemic Learning (EL), a simple yet powerful decentralized learning (DL) algorithm that leverages changing communication topologies to achieve faster model convergence compared to conventional DL approaches. At each round of EL, each node sends its model updates to a random sample of $s$ other nodes (in a system of $n$ nodes). We provide an extensive theoretical analysis of EL, demonst… ▽ More We present Epidemic Learning (EL), a simple yet powerful decentralized learning (DL) algorithm that leverages changing communication topologies to achieve faster model convergence compared to conventional DL approaches. At each round of EL, each node sends its model updates to a random sample of $s$ other nodes (in a system of $n$ nodes). We provide an extensive theoretical analysis of EL, demonstrating that its changing topology culminates in superior convergence properties compared to the state-of-the-art (static and dynamic) topologies. Considering smooth non-convex loss functions, the number of transient iterations for EL, i.e., the rounds required to achieve asymptotic linear speedup, is in $O(n^3/s^2)$ which outperforms the best-known bound $O(n^3)$ by a factor of $s^2$, indicating the benefit of randomized communication for DL. We empirically evaluate EL in a 96-node network and compare its performance with state-of-the-art DL approaches. Our results illustrate that EL converges up to $ 1.7\times$ quicker than baseline DL algorithms and attains $2.2 $\% higher accuracy for the same communication volume. △ Less

Submitted 27 October, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: Accepted paper at NeurIPS 2023

arXiv:2309.13591 [pdf, other]

Robust Distributed Learning: Tight Error Bounds and Breakdown Point under Data Heterogeneity

Authors: Youssef Allouah, Rachid Guerraoui, Nirupam Gupta, Rafaël Pinot, Geovani Rizk

Abstract: The theory underlying robust distributed learning algorithms, designed to resist adversarial machines, matches empirical observations when data is homogeneous. Under data heterogeneity however, which is the norm in practical scenarios, established lower bounds on the learning error are essentially vacuous and greatly mismatch empirical observations. This is because the heterogeneity model consider… ▽ More The theory underlying robust distributed learning algorithms, designed to resist adversarial machines, matches empirical observations when data is homogeneous. Under data heterogeneity however, which is the norm in practical scenarios, established lower bounds on the learning error are essentially vacuous and greatly mismatch empirical observations. This is because the heterogeneity model considered is too restrictive and does not cover basic learning tasks such as least-squares regression. We consider in this paper a more realistic heterogeneity model, namely (G,B)-gradient dissimilarity, and show that it covers a larger class of learning problems than existing theory. Notably, we show that the breakdown point under heterogeneity is lower than the classical fraction 1/2. We also prove a new lower bound on the learning error of any distributed learning algorithm. We derive a matching upper bound for a robust variant of distributed gradient descent, and empirically show that our analysis reduces the gap between theory and practice. △ Less

Submitted 28 October, 2023; v1 submitted 24 September, 2023; originally announced September 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2309.05395 [pdf, other]

SABLE: Secure And Byzantine robust LEarning

Authors: Antoine Choffrut, Rachid Guerraoui, Rafael Pinot, Renaud Sirdey, John Stephan, Martin Zuber

Abstract: Due to the widespread availability of data, machine learning (ML) algorithms are increasingly being implemented in distributed topologies, wherein various nodes collaborate to train ML models via the coordination of a central server. However, distributed learning approaches face significant vulnerabilities, primarily stemming from two potential threats. Firstly, the presence of Byzantine nodes pos… ▽ More Due to the widespread availability of data, machine learning (ML) algorithms are increasingly being implemented in distributed topologies, wherein various nodes collaborate to train ML models via the coordination of a central server. However, distributed learning approaches face significant vulnerabilities, primarily stemming from two potential threats. Firstly, the presence of Byzantine nodes poses a risk of corrupting the learning process by transmitting inaccurate information to the server. Secondly, a curious server may compromise the privacy of individual nodes, sometimes reconstructing the entirety of the nodes' data. Homomorphic encryption (HE) has emerged as a leading security measure to preserve privacy in distributed learning under non-Byzantine scenarios. However, the extensive computational demands of HE, particularly for high-dimensional ML models, have deterred attempts to design purely homomorphic operators for non-linear robust aggregators. This paper introduces SABLE, the first homomorphic and Byzantine robust distributed learning algorithm. SABLE leverages HTS, a novel and efficient homomorphic operator implementing the prominent coordinate-wise trimmed mean robust aggregator. Designing HTS enables us to implement HMED, a novel homomorphic median aggregator. Extensive experiments on standard ML tasks demonstrate that SABLE achieves practical execution times while maintaining an ML accuracy comparable to its non-private counterpart. △ Less

Submitted 14 December, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

arXiv:2308.03524 [pdf, ps, other]

Strong Byzantine Agreement with Adaptive Word Complexity

Authors: Pierre Civit, Seth Gilbert, Rachid Guerraoui, Jovan Komatovic, Manuel Vidigueira

Abstract: The strong Byzantine agreement (SBA) problem is defined among n processes, out of which t < n can be faulty and behave arbitrarily. SBA allows correct (non-faulty) processes to agree on a common value. Moreover, if all correct processes have proposed the same value, only that value can be agreed upon. It has been known for a long time that any solution to the SBA problem incurs quadratic worst-cas… ▽ More The strong Byzantine agreement (SBA) problem is defined among n processes, out of which t < n can be faulty and behave arbitrarily. SBA allows correct (non-faulty) processes to agree on a common value. Moreover, if all correct processes have proposed the same value, only that value can be agreed upon. It has been known for a long time that any solution to the SBA problem incurs quadratic worst-case word complexity; additionally, the bound was known to be tight. However, no existing protocol achieves adaptive word complexity, where the number of exchanged words depends on the actual number of faults, and not on the upper bound. Therefore, it is still unknown whether SBA with adaptive word complexity exists. This paper answers the question in the affirmative. Namely, we introduce STRONG, a synchronous protocol that solves SBA among n = (2 + Omega(1))t + 1 processes and achieves adaptive word complexity. We show that the fundamental challenge of adaptive SBA lies in efficiently solving certification, the problem of obtaining a constant-sized, locally-verifiable proof that a value can safely be decided. △ Less

Submitted 7 August, 2023; originally announced August 2023.

arXiv:2308.02477 [pdf, other]

On the Inherent Anonymity of Gossi**

Authors: Rachid Guerraoui, Anne-Marie Kermarrec, Anastasiia Kucherenko, Rafael Pinot, Sasha Voitovych

Abstract: Detecting the source of a gossip is a critical issue, related to identifying patient zero in an epidemic, or the origin of a rumor in a social network. Although it is widely acknowledged that random and local gossip communications make source identification difficult, there exists no general quantification of the level of anonymity provided to the source. This paper presents a principled method ba… ▽ More Detecting the source of a gossip is a critical issue, related to identifying patient zero in an epidemic, or the origin of a rumor in a social network. Although it is widely acknowledged that random and local gossip communications make source identification difficult, there exists no general quantification of the level of anonymity provided to the source. This paper presents a principled method based on $\varepsilon$-differential privacy to analyze the inherent source anonymity of gossi** for a large class of graphs. First, we quantify the fundamental limit of source anonymity any gossip protocol can guarantee in an arbitrary communication graph. In particular, our result indicates that when the graph has poor connectivity, no gossip protocol can guarantee any meaningful level of differential privacy. This prompted us to further analyze graphs with controlled connectivity. We prove on these graphs that a large class of gossip protocols, namely cobra walks, offers tangible differential privacy guarantees to the source. In doing so, we introduce an original proof technique based on the reduction of a gossip protocol to what we call a random walk with probabilistic die out. This proof technique is of independent interest to the gossip community and readily extends to other protocols inherited from the security community, such as the Dandelion protocol. Interestingly, our tight analysis precisely captures the trade-off between dissemination time of a gossip protocol and its source anonymity. △ Less

Submitted 4 August, 2023; originally announced August 2023.

Comments: Full version of DISC2023 paper

arXiv:2306.09991 [pdf, other]

Evolutionary Algorithms in the Light of SGD: Limit Equivalence, Minima Flatness, and Transfer Learning

Authors: Andrei Kucharavy, Rachid Guerraoui, Ljiljana Dolamic

Abstract: Whenever applicable, the Stochastic Gradient Descent (SGD) has shown itself to be unreasonably effective. Instead of underperforming and getting trapped in local minima due to the batch noise, SGD leverages it to learn to generalize better and find minima that are good enough for the entire dataset. This led to numerous theoretical and experimental investigations, especially in the context of Arti… ▽ More Whenever applicable, the Stochastic Gradient Descent (SGD) has shown itself to be unreasonably effective. Instead of underperforming and getting trapped in local minima due to the batch noise, SGD leverages it to learn to generalize better and find minima that are good enough for the entire dataset. This led to numerous theoretical and experimental investigations, especially in the context of Artificial Neural Networks (ANNs), leading to better machine learning algorithms. However, SGD is not applicable in a non-differentiable setting, leaving all that prior research off the table. In this paper, we show that a class of evolutionary algorithms (EAs) inspired by the Gillespie-Orr Mutational Landscapes model for natural evolution is formally equivalent to SGD in certain settings and, in practice, is well adapted to large ANNs. We refer to such EAs as Gillespie-Orr EA class (GO-EAs) and empirically show how an insight transfer from SGD can work for them. We then show that for ANNs trained to near-optimality or in the transfer learning setting, the equivalence also allows transferring the insights from the Mutational Landscapes model to SGD. We then leverage this equivalence to experimentally show how SGD and GO-EAs can provide mutual insight through examples of minima flatness, transfer learning, and mixing of individuals in EAs applied to large models. △ Less

Submitted 20 May, 2023; originally announced June 2023.

Comments: To be published in ALIFE 2023; 16 pages, 10 figures, 1 listing

ACM Class: I.2.8; G.1.6

arXiv:2306.00431 [pdf, other]

Every Bit Counts in Consensus

Authors: Pierre Civit, Seth Gilbert, Rachid Guerraoui, Jovan Komatovic, Matteo Monti, Manuel Vidigueira

Abstract: Consensus enables n processes to agree on a common valid L-bit value, despite t < n/3 processes being faulty and acting arbitrarily. A long line of work has been dedicated to improving the worst-case communication complexity of consensus in partial synchrony. This has recently culminated in the worst-case word complexity of O(n^2). However, the worst-case bit complexity of the best solution is sti… ▽ More Consensus enables n processes to agree on a common valid L-bit value, despite t < n/3 processes being faulty and acting arbitrarily. A long line of work has been dedicated to improving the worst-case communication complexity of consensus in partial synchrony. This has recently culminated in the worst-case word complexity of O(n^2). However, the worst-case bit complexity of the best solution is still O(n^2 L + n^2 kappa) (where kappa is the security parameter), far from the Ω(n L + n^2) lower bound. The gap is significant given the practical use of consensus primitives, where values typically consist of batches of large size (L > n). This paper shows how to narrow the aforementioned gap while achieving optimal linear latency. Namely, we present a new algorithm, DARE (Disperse, Agree, REtrieve), that improves upon the O(n^2 L) term via a novel dispersal primitive. DARE achieves O(n^{1.5} L + n^{2.5} kappa) bit complexity, an effective sqrt{n}-factor improvement over the state-of-the-art (when L > n kappa). Moreover, we show that employing heavier cryptographic primitives, namely STARK proofs, allows us to devise DARE-Stark, a version of DARE which achieves the near-optimal bit complexity of O(n L + n^2 poly(kappa)). Both DARE and DARE-Stark achieve optimal O(n) latency. △ Less

Submitted 7 August, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

arXiv:2304.13540 [pdf, ps, other]

Byzantine-Resilient Learning Beyond Gradients: Distributing Evolutionary Search

Authors: Andrei Kucharavy, Matteo Monti, Rachid Guerraoui, Ljiljana Dolamic

Abstract: Modern machine learning (ML) models are capable of impressive performances. However, their prowess is not due only to the improvements in their architecture and training algorithms but also to a drastic increase in computational power used to train them. Such a drastic increase led to a growing interest in distributed ML, which in turn made worker failures and adversarial attacks an increasingly… ▽ More Modern machine learning (ML) models are capable of impressive performances. However, their prowess is not due only to the improvements in their architecture and training algorithms but also to a drastic increase in computational power used to train them. Such a drastic increase led to a growing interest in distributed ML, which in turn made worker failures and adversarial attacks an increasingly pressing concern. While distributed byzantine resilient algorithms have been proposed in a differentiable setting, none exist in a gradient-free setting. The goal of this work is to address this shortcoming. For that, we introduce a more general definition of byzantine-resilience in ML - the \textit{model-consensus}, that extends the definition of the classical distributed consensus. We then leverage this definition to show that a general class of gradient-free ML algorithms - ($1,λ$)-Evolutionary Search - can be combined with classical distributed consensus algorithms to generate gradient-free byzantine-resilient distributed learning algorithms. We provide proofs and pseudo-code for two specific cases - the Total Order Broadcast and proof-of-work leader election. △ Less

Submitted 20 April, 2023; originally announced April 2023.

Comments: 10 pages, 4 listings, 2 theorems

ACM Class: I.2.11; D.1.3; F.1.2

arXiv:2304.08968 [pdf, other]

Stochastic Parrots Looking for Stochastic Parrots: LLMs are Easy to Fine-Tune and Hard to Detect with other LLMs

Authors: Da Silva Gameiro Henrique, Andrei Kucharavy, Rachid Guerraoui

Abstract: The self-attention revolution allowed generative language models to scale and achieve increasingly impressive abilities. Such models - commonly referred to as Large Language Models (LLMs) - have recently gained prominence with the general public, thanks to conversational fine-tuning, putting their behavior in line with public expectations regarding AI. This prominence amplified prior concerns rega… ▽ More The self-attention revolution allowed generative language models to scale and achieve increasingly impressive abilities. Such models - commonly referred to as Large Language Models (LLMs) - have recently gained prominence with the general public, thanks to conversational fine-tuning, putting their behavior in line with public expectations regarding AI. This prominence amplified prior concerns regarding the misuse of LLMs and led to the emergence of numerous tools to detect LLMs in the wild. Unfortunately, most such tools are critically flawed. While major publications in the LLM detectability field suggested that LLMs were easy to detect with fine-tuned autoencoders, the limitations of their results are easy to overlook. Specifically, they assumed publicly available generative models without fine-tunes or non-trivial prompts. While the importance of these assumptions has been demonstrated, until now, it remained unclear how well such detection could be countered. Here, we show that an attacker with access to such detectors' reference human texts and output not only evades detection but can fully frustrate the detector training - with a reasonable budget and all its outputs labeled as such. Achieving it required combining common "reinforcement from critic" loss function modification and AdamW optimizer, which led to surprisingly good fine-tuning generalization. Finally, we warn against the temptation to transpose the conclusions obtained in RNN-driven text GANs to LLMs due to their better representative ability. These results have critical implications for the detection and prevention of malicious use of generative language models, and we hope they will aid the designers of generative models and detectors. △ Less

Submitted 18 April, 2023; originally announced April 2023.

Comments: 15 pages, 6 figures; 10 pages, 7 figures Supplementary Materials; under review at ECML 2023

ACM Class: I.2.7; K.6.5

arXiv:2304.07081 [pdf, other]

Chop Chop: Byzantine Atomic Broadcast to the Network Limit

Authors: Martina Camaioni, Rachid Guerraoui, Matteo Monti, Pierre-Louis Roman, Manuel Vidigueira, Gauthier Voron

Abstract: At the heart of state machine replication, the celebrated technique enabling decentralized and secure universal computation, lies Atomic Broadcast, a fundamental communication primitive that orders, authenticates, and deduplicates messages. This paper presents Chop Chop, a Byzantine Atomic Broadcast system that amortizes the cost of ordering, authenticating and deduplicating messages, achieving "l… ▽ More At the heart of state machine replication, the celebrated technique enabling decentralized and secure universal computation, lies Atomic Broadcast, a fundamental communication primitive that orders, authenticates, and deduplicates messages. This paper presents Chop Chop, a Byzantine Atomic Broadcast system that amortizes the cost of ordering, authenticating and deduplicating messages, achieving "line rate" (i.e., closely matching the complexity of a protocol that does not ensure any ordering, authentication or Byzantine resilience) even when processing messages as small as 8 bytes. Chop Chop attains this performance by means of a new form of batching we call distillation. A distilled batch is a set of messages that are fast to authenticate and deduplicate, as well as order. Batches are distilled using a novel interactive mechanism involving brokers, an untrusted layer of facilitating processes between clients and servers. In a geo-distributed deployment of 64 medium-sized servers, with clients situated cross-cloud, Chop Chop processes 43,600,000 messages per second with an average latency of 3.6 seconds. Under the same conditions, state-of-the-art alternatives offer two orders of magnitude less throughput for the same latency. We showcase three simple Chop Chop applications: a Payment system, an Auction house and a "Pixel war" game, respectively achieving 32, 2.3 and 35 million operations per second. △ Less

Submitted 14 April, 2023; originally announced April 2023.

arXiv:2302.04787 [pdf, other]

On the Privacy-Robustness-Utility Trilemma in Distributed Learning

Authors: Youssef Allouah, Rachid Guerraoui, Nirupam Gupta, Rafael Pinot, John Stephan

Abstract: The ubiquity of distributed machine learning (ML) in sensitive public domain applications calls for algorithms that protect data privacy, while being robust to faults and adversarial behaviors. Although privacy and robustness have been extensively studied independently in distributed ML, their synthesis remains poorly understood. We present the first tight analysis of the error incurred by any alg… ▽ More The ubiquity of distributed machine learning (ML) in sensitive public domain applications calls for algorithms that protect data privacy, while being robust to faults and adversarial behaviors. Although privacy and robustness have been extensively studied independently in distributed ML, their synthesis remains poorly understood. We present the first tight analysis of the error incurred by any algorithm ensuring robustness against a fraction of adversarial machines, as well as differential privacy (DP) for honest machines' data against any other curious entity. Our analysis exhibits a fundamental trade-off between privacy, robustness, and utility. To prove our lower bound, we consider the case of mean estimation, subject to distributed DP and robustness constraints, and devise reductions to centralized estimation of one-way marginals. We prove our matching upper bound by presenting a new distributed ML algorithm using a high-dimensional robust aggregation rule. The latter amortizes the dependence on the dimension in the error (caused by adversarial workers and DP), while being agnostic to the statistical properties of the data. △ Less

Submitted 29 May, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

Comments: Accepted paper at ICML

arXiv:2302.01772 [pdf, other]

Fixing by Mixing: A Recipe for Optimal Byzantine ML under Heterogeneity

Authors: Youssef Allouah, Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Rafael Pinot, John Stephan

Abstract: Byzantine machine learning (ML) aims to ensure the resilience of distributed learning algorithms to misbehaving (or Byzantine) machines. Although this problem received significant attention, prior works often assume the data held by the machines to be homogeneous, which is seldom true in practical settings. Data heterogeneity makes Byzantine ML considerably more challenging, since a Byzantine mach… ▽ More Byzantine machine learning (ML) aims to ensure the resilience of distributed learning algorithms to misbehaving (or Byzantine) machines. Although this problem received significant attention, prior works often assume the data held by the machines to be homogeneous, which is seldom true in practical settings. Data heterogeneity makes Byzantine ML considerably more challenging, since a Byzantine machine can hardly be distinguished from a non-Byzantine outlier. A few solutions have been proposed to tackle this issue, but these provide suboptimal probabilistic guarantees and fare poorly in practice. This paper closes the theoretical gap, achieving optimality and inducing good empirical results. In fact, we show how to automatically adapt existing solutions for (homogeneous) Byzantine ML to the heterogeneous setting through a powerful mechanism, we call nearest neighbor mixing (NNM), which boosts any standard robust distributed gradient descent variant to yield optimal Byzantine resilience under heterogeneity. We obtain similar guarantees (in expectation) by plugging NNM in the distributed stochastic heavy ball method, a practical substitute to distributed gradient descent. We obtain empirical results that significantly outperform state-of-the-art Byzantine ML solutions. △ Less

Submitted 3 February, 2023; originally announced February 2023.

Comments: Accepted paper at AISTATS 2023

arXiv:2301.04920 [pdf, other]

On the Validity of Consensus

Authors: Pierre Civit, Seth Gilbert, Rachid Guerraoui, Jovan Komatovic, Manuel Vidigueira

Abstract: The Byzantine consensus problem involves $n$ processes, out of which t < n could be faulty and behave arbitrarily. Three properties characterize consensus: (1) termination, requiring correct (non-faulty) processes to eventually reach a decision, (2) agreement, preventing them from deciding different values, and (3) validity, precluding ``unreasonable'' decisions. But, what is a reasonable decision… ▽ More The Byzantine consensus problem involves $n$ processes, out of which t < n could be faulty and behave arbitrarily. Three properties characterize consensus: (1) termination, requiring correct (non-faulty) processes to eventually reach a decision, (2) agreement, preventing them from deciding different values, and (3) validity, precluding ``unreasonable'' decisions. But, what is a reasonable decision? Strong validity, a classical property, stipulates that, if all correct processes propose the same value, only that value can be decided. Weak validity, another established property, stipulates that, if all processes are correct and they propose the same value, that value must be decided. The space of possible validity properties is vast. However, their impact on consensus remains unclear. This paper addresses the question of which validity properties allow Byzantine consensus to be solvable with partial synchrony, and at what cost. First, we determine necessary and sufficient conditions for a validity property to make the consensus problem solvable; we say that such validity properties are solvable. Notably, we prove that, if n <= 3t, all solvable validity properties are trivial (there exists an always-admissible decision). Furthermore, we show that, with any non-trivial (and solvable) validity property, consensus requires Omega(t^2) messages. This extends the seminal Dolev-Reischuk bound, originally proven for strong validity, to all non-trivial validity properties. Lastly, we give a general Byzantine consensus algorithm, we call Universal, for any solvable (and non-trivial) validity property. Importantly, Universal incurs O(n^2) message complexity. Thus, together with our lower bound, Universal implies a fundamental result in partial synchrony: with t \in Omega(n), the message complexity of all (non-trivial) consensus variants is Theta(n^2). △ Less

Submitted 25 June, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

Comments: The extended version of the PODC 2023 paper

arXiv:2210.17174 [pdf, other]

uBFT: Microsecond-scale BFT using Disaggregated Memory [Extended Version]

Authors: Marcos K. Aguilera, Naama Ben-David, Rachid Guerraoui, Antoine Murat, Athanasios Xygkis, Igor Zablotchi

Abstract: We propose uBFT, the first State Machine Replication (SMR) system to achieve microsecond-scale latency in data centers, while using only $2f{+}1$ replicas to tolerate $f$ Byzantine failures. The Byzantine Fault Tolerance (BFT) provided by uBFT is essential as pure crashes appear to be a mere illusion with real-life systems reportedly failing in many unexpected ways. uBFT relies on a small non-tail… ▽ More We propose uBFT, the first State Machine Replication (SMR) system to achieve microsecond-scale latency in data centers, while using only $2f{+}1$ replicas to tolerate $f$ Byzantine failures. The Byzantine Fault Tolerance (BFT) provided by uBFT is essential as pure crashes appear to be a mere illusion with real-life systems reportedly failing in many unexpected ways. uBFT relies on a small non-tailored trusted computing base -- disaggregated memory -- and consumes a practically bounded amount of memory. uBFT is based on a novel abstraction called Consistent Tail Broadcast, which we use to prevent equivocation while bounding memory. We implement uBFT using RDMA-based disaggregated memory and obtain an end-to-end latency of as little as 10us. This is at least 50$\times$ faster than MinBFT , a state of the art $2f{+}1$ BFT SMR based on Intel's SGX. We use uBFT to replicate two KV-stores (Memcached and Redis), as well as a financial order matching engine (Liquibook). These applications have low latency (up to 20us) and become Byzantine tolerant with as little as 10us more. The price for uBFT is a small amount of reliable disaggregated memory (less than 1 MiB), which in our prototype consists of a small number of memory servers connected through RDMA and replicated for fault tolerance. △ Less

Submitted 16 March, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

arXiv:2210.13242 [pdf, other]

SurferMonkey: A Decentralized Anonymous Blockchain Intercommunication System via Zero Knowledge Proofs

Authors: Miguel Díaz Montiel, Rachid Guerraoui, Pierre-Louis Roman

Abstract: Blockchain intercommunication systems enable the exchanges of messages between blockchains. This interoperability promotes innovation, unlocks liquidity and access to assets. However, blockchains are isolated systems that originally were not designed for interoperability. This makes cross-chain communication, or bridges for short, insecure by nature. More precisely, cross-chain systems face securi… ▽ More Blockchain intercommunication systems enable the exchanges of messages between blockchains. This interoperability promotes innovation, unlocks liquidity and access to assets. However, blockchains are isolated systems that originally were not designed for interoperability. This makes cross-chain communication, or bridges for short, insecure by nature. More precisely, cross-chain systems face security challenges in terms of selfish rational players such as maximal extractable value (MEV) and censorship. We propose to solve these challenges using zero knowledge proofs (ZKPs) for cross-chain communication. Securing cross-chain communication is remarkably more complex than securing single-chain events as such a system must preserve user security against both on- and off-chain analysis. To achieve this goal, we propose the following pair of contributions: the DACT protocol and the SurferMonkey infrastructure that supports the DACT protocol. The DACT protocol is a global solution for the anonymity and security challenges of agnostic blockchain intercommunication. DACT breaks on- and off-chain analysis thanks to the use of ZKPs. SurferMonkey is a decentralized infrastructure that implements DACT in practice. Since SurferMonkey works at the blockchain application layer, any decentralized application (dApp) can use SurferMonkey to send any type of message to a dApp on another blockchain. With SurferMonkey, users can neither be censored nor be exposed to MEV. By applying decentralized proactive security, we obtain resilience against selfish rational players, and raise the security bar against cyberattacks. We have implemented a proof of concept (PoC) of SurferMonkey by reverse engineering Tornado Cash and by applying IDEN3 ZKP circuits. SurferMonkey enables new usecases, ranging from anonymous voting and gaming, to a new phase of anonymous decentralized finance (aDeFi). △ Less

Submitted 24 October, 2022; originally announced October 2022.

arXiv:2210.08650 [pdf, other]

Accelerating Transfer Learning with Near-Data Computation on Cloud Object Stores

Authors: Arsany Guirguis, Diana Petrescu, Florin Dinu, Do Le Quoc, Javier Picorel, Rachid Guerraoui

Abstract: Near-data computation techniques have been successfully deployed to mitigate the cloud network bottleneck between the storage and compute tiers. At Huawei, we are currently looking to get more value from these techniques by broadening their applicability. Machine learning (ML) applications are an appealing and timely target. This paper describes our experience applying near-data computation techni… ▽ More Near-data computation techniques have been successfully deployed to mitigate the cloud network bottleneck between the storage and compute tiers. At Huawei, we are currently looking to get more value from these techniques by broadening their applicability. Machine learning (ML) applications are an appealing and timely target. This paper describes our experience applying near-data computation techniques to transfer learning (TL), a widely popular ML technique, in the context of disaggregated cloud object stores. Our techniques benefit both cloud providers and users. They improve our operational efficiency while providing users the performance improvements they demand from us. The main practical challenge to consider is that the storage-side computational resources are limited. Our approach is to split the TL deep neural network (DNN) during the feature extraction phase, before the training phase. This reduces the network transfers to the compute tier and further decouples the batch size of feature extraction from the training batch size. This facilitates our second technique, storage-side batch adaptation, which enables increased concurrency in the storage tier while avoiding out-of-memory errors. Guided by these insights, we present HAPI, our processing system for TL that spans the compute and storage tiers while remaining transparent to the user. Our evaluation with several state-of-the-art DNNs, such as ResNet, VGG, and Transformer, shows up to 11x improvement in application runtime and up to 8.3x reduction in the data transferred from the storage to the compute tier compared to running the computation entirely in the compute tier. △ Less

Submitted 9 January, 2023; v1 submitted 16 October, 2022; originally announced October 2022.

Comments: 14 pages, 14 figures, 5 tables

arXiv:2209.15259 [pdf, ps, other]

On the Impossible Safety of Large AI Models

Authors: El-Mahdi El-Mhamdi, Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Lê-Nguyên Hoang, Rafael Pinot, Sébastien Rouault, John Stephan

Abstract: Large AI Models (LAIMs), of which large language models are the most prominent recent example, showcase some impressive performance. However they have been empirically found to pose serious security issues. This paper systematizes our knowledge about the fundamental impossibility of building arbitrarily accurate and secure machine learning models. More precisely, we identify key challenging featur… ▽ More Large AI Models (LAIMs), of which large language models are the most prominent recent example, showcase some impressive performance. However they have been empirically found to pose serious security issues. This paper systematizes our knowledge about the fundamental impossibility of building arbitrarily accurate and secure machine learning models. More precisely, we identify key challenging features of many of today's machine learning settings. Namely, high accuracy seems to require memorizing large training datasets, which are often user-generated and highly heterogeneous, with both sensitive information and fake users. We then survey statistical lower bounds that, we argue, constitute a compelling case against the possibility of designing high-accuracy LAIMs with strong security guarantees. △ Less

Submitted 9 May, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

Comments: 40 pages

arXiv:2209.13304 [pdf, other]

Oracular Byzantine Reliable Broadcast [Extended Version]

Authors: Martina Camaioni, Rachid Guerraoui, Matteo Monti, Manuel Vidigueira

Abstract: Byzantine Reliable Broadcast (BRB) is a fundamental distributed computing primitive, with applications ranging from notifications to asynchronous payment systems. Motivated by practical consideration, we study Client-Server Byzantine Reliable Broadcast (CSB), a multi-shot variant of BRB whose interface is split between broadcasting clients and delivering servers. We present Draft, an optimally res… ▽ More Byzantine Reliable Broadcast (BRB) is a fundamental distributed computing primitive, with applications ranging from notifications to asynchronous payment systems. Motivated by practical consideration, we study Client-Server Byzantine Reliable Broadcast (CSB), a multi-shot variant of BRB whose interface is split between broadcasting clients and delivering servers. We present Draft, an optimally resilient implementation of CSB. Like most implementations of BRB, Draft guarantees both liveness and safety in an asynchronous environment. Under good conditions, however, Draft achieves unparalleled efficiency. In a moment of synchrony, free from Byzantine misbehaviour, and at the limit of infinitely many broadcasting clients, a Draft server delivers a $b$-bits payload at an asymptotic amortized cost of $0$ signature verifications, and $\log_2(c) + b$ bits exchanged, where $c$ is the number of clients in the system. This is the information-theoretical minimum number of bits required to convey the payload ($b$ bits, assuming it is compressed), along with an identifier for its sender ($\log_2(c)$ bits, necessary to enumerate any set of $c$ elements, and optimal if broadcasting frequencies are uniform or unknown). These two achievements have profound practical implications. Real-world BRB implementations are often bottlenecked either by expensive signature verifications, or by communication overhead. For Draft, instead, the network is the limit: a server can deliver payloads as quickly as it would receive them from an infallible oracle. △ Less

Submitted 27 September, 2022; originally announced September 2022.

arXiv:2209.10931 [pdf, other]

Robust Collaborative Learning with Linear Gradient Overhead

Authors: Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Lê Nguyên Hoang, Rafael Pinot, John Stephan

Abstract: Collaborative learning algorithms, such as distributed SGD (or D-SGD), are prone to faulty machines that may deviate from their prescribed algorithm because of software or hardware bugs, poisoned data or malicious behaviors. While many solutions have been proposed to enhance the robustness of D-SGD to such machines, previous works either resort to strong assumptions (trusted server, homogeneous da… ▽ More Collaborative learning algorithms, such as distributed SGD (or D-SGD), are prone to faulty machines that may deviate from their prescribed algorithm because of software or hardware bugs, poisoned data or malicious behaviors. While many solutions have been proposed to enhance the robustness of D-SGD to such machines, previous works either resort to strong assumptions (trusted server, homogeneous data, specific noise model) or impose a gradient computational cost that is several orders of magnitude higher than that of D-SGD. We present MoNNA, a new algorithm that (a) is provably robust under standard assumptions and (b) has a gradient computation overhead that is linear in the fraction of faulty machines, which is conjectured to be tight. Essentially, MoNNA uses Polyak's momentum of local gradients for local updates and nearest-neighbor averaging (NNA) for global mixing, respectively. While MoNNA is rather simple to implement, its analysis has been more challenging and relies on two key elements that may be of independent interest. Specifically, we introduce the mixing criterion of $(α, λ)$-reduction to analyze the non-linear mixing of non-faulty machines, and present a way to control the tension between the momentum and the model drifts. We validate our theory by experiments on image classification and make our code available at https://github.com/LPD-EPFL/robust-collaborative-learning. △ Less

Submitted 3 June, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: Accepted paper at ICML 2023

arXiv:2209.09580 [pdf, other]

Carbon: An Asynchronous Voting-Based Payment System for a Client-Server Architecture

Authors: Martina Camaioni, Rachid Guerraoui, Jovan Komatovic, Matteo Monti, Manuel Vidigueira

Abstract: We present Carbon, an asynchronous payment system. To the best of our knowledge, Carbon is the first asynchronous payment system designed specifically for a client-server architecture. Namely, besides being able to make payments, clients of Carbon are capable of changing the set of running servers using a novel voting mechanism -- asynchronous, balance-based voting. We present Carbon, an asynchronous payment system. To the best of our knowledge, Carbon is the first asynchronous payment system designed specifically for a client-server architecture. Namely, besides being able to make payments, clients of Carbon are capable of changing the set of running servers using a novel voting mechanism -- asynchronous, balance-based voting. △ Less

Submitted 30 September, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

arXiv:2208.09262 [pdf, other]

Byzantine Consensus is Θ(n^2): The Dolev-Reischuk Bound is Tight even in Partial Synchrony! [Extended Version]

Authors: Pierre Civit, Muhammad Ayaz Dzulfikar, Seth Gilbert, Vincent Gramoli, Rachid Guerraoui, Jovan Komatovic, Manuel Vidigueira

Abstract: The Dolev-Reischuk bound says that any deterministic Byzantine consensus protocol has (at least) quadratic communication complexity in the worst case. While it has been shown that the bound is tight in synchronous environments, it is still unknown whether a consensus protocol with quadratic communication complexity can be obtained in partial synchrony. Until now, the most efficient known solutions… ▽ More The Dolev-Reischuk bound says that any deterministic Byzantine consensus protocol has (at least) quadratic communication complexity in the worst case. While it has been shown that the bound is tight in synchronous environments, it is still unknown whether a consensus protocol with quadratic communication complexity can be obtained in partial synchrony. Until now, the most efficient known solutions for Byzantine consensus in partially synchronous settings had cubic communication complexity (e.g., HotStuff, binary DBFT). This paper closes the existing gap by introducing SQuad, a partially synchronous Byzantine consensus protocol with quadratic worst-case communication complexity. In addition, SQuad is optimally-resilient and achieves linear worst-case latency complexity. The key technical contribution underlying SQuad lies in the way we solve view synchronization, the problem of bringing all correct processes to the same view with a correct leader for sufficiently long. Concretely, we present RareSync, a view synchronization protocol with quadratic communication complexity and linear latency complexity, which we utilize in order to obtain SQuad. △ Less

Submitted 6 September, 2022; v1 submitted 19 August, 2022; originally announced August 2022.

arXiv:2205.12173 [pdf, other]

Byzantine Machine Learning Made Easy by Resilient Averaging of Momentums

Authors: Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Rafael Pinot, John Stephan

Abstract: Byzantine resilience emerged as a prominent topic within the distributed machine learning community. Essentially, the goal is to enhance distributed optimization algorithms, such as distributed SGD, in a way that guarantees convergence despite the presence of some misbehaving (a.k.a., {\em Byzantine}) workers. Although a myriad of techniques addressing the problem have been proposed, the field arg… ▽ More Byzantine resilience emerged as a prominent topic within the distributed machine learning community. Essentially, the goal is to enhance distributed optimization algorithms, such as distributed SGD, in a way that guarantees convergence despite the presence of some misbehaving (a.k.a., {\em Byzantine}) workers. Although a myriad of techniques addressing the problem have been proposed, the field arguably rests on fragile foundations. These techniques are hard to prove correct and rely on assumptions that are (a) quite unrealistic, i.e., often violated in practice, and (b) heterogeneous, i.e., making it difficult to compare approaches. We present \emph{RESAM (RESilient Averaging of Momentums)}, a unified framework that makes it simple to establish optimal Byzantine resilience, relying only on standard machine learning assumptions. Our framework is mainly composed of two operators: \emph{resilient averaging} at the server and \emph{distributed momentum} at the workers. We prove a general theorem stating the convergence of distributed SGD under RESAM. Interestingly, demonstrating and comparing the convergence of many existing techniques become direct corollaries of our theorem, without resorting to stringent assumptions. We also present an empirical evaluation of the practical relevance of RESAM. △ Less

Submitted 24 May, 2022; originally announced May 2022.

Comments: Accepted at ICML 2022

arXiv:2202.08656 [pdf, other]

Robust Sparse Voting

Authors: Youssef Allouah, Rachid Guerraoui, Lê-Nguyên Hoang, Oscar Villemaud

Abstract: Many applications, such as content moderation and recommendation, require reviewing and scoring a large number of alternatives. Doing so robustly is however very challenging. Indeed, voters' inputs are inevitably sparse: most alternatives are only scored by a small fraction of voters. This sparsity amplifies the effects of biased voters introducing unfairness, and of malicious voters seeking to ha… ▽ More Many applications, such as content moderation and recommendation, require reviewing and scoring a large number of alternatives. Doing so robustly is however very challenging. Indeed, voters' inputs are inevitably sparse: most alternatives are only scored by a small fraction of voters. This sparsity amplifies the effects of biased voters introducing unfairness, and of malicious voters seeking to hack the voting process by reporting dishonest scores. We give a precise definition of the problem of robust sparse voting, highlight its underlying technical challenges, and present a novel voting mechanism addressing the problem. We prove that, using this mechanism, no voter can have more than a small parameterizable effect on each alternative's score; a property we call Lipschitz resilience. We also identify conditions of voters comparability under which any unanimous preferences can be recovered, even when each voter provides sparse scores, on a scale that is potentially very different from any other voter's score scale. Proving these properties required us to introduce, analyze and carefully compose novel aggregation primitives which could be of independent interest. △ Less

Submitted 25 January, 2024; v1 submitted 17 February, 2022; originally announced February 2022.

Comments: Accepted at AISTATS 2024

arXiv:2202.08578 [pdf, other]

An Equivalence Between Data Poisoning and Byzantine Gradient Attacks

Authors: Sadegh Farhadkhani, Rachid Guerraoui, Lê-Nguyên Hoang, Oscar Villemaud

Abstract: To study the resilience of distributed learning, the "Byzantine" literature considers a strong threat model where workers can report arbitrary gradients to the parameter server. Whereas this model helped obtain several fundamental results, it has sometimes been considered unrealistic, when the workers are mostly trustworthy machines. In this paper, we show a surprising equivalence between this mod… ▽ More To study the resilience of distributed learning, the "Byzantine" literature considers a strong threat model where workers can report arbitrary gradients to the parameter server. Whereas this model helped obtain several fundamental results, it has sometimes been considered unrealistic, when the workers are mostly trustworthy machines. In this paper, we show a surprising equivalence between this model and data poisoning, a threat considered much more realistic. More specifically, we prove that every gradient attack can be reduced to data poisoning, in any personalized federated learning system with PAC guarantees (which we show are both desirable and realistic). This equivalence makes it possible to obtain new impossibility results on the resilience of any "robust" learning algorithm to data poisoning in highly heterogeneous applications, as corollaries of existing impossibility theorems on Byzantine machine learning. Moreover, using our equivalence, we derive a practical attack that we show (theoretically and empirically) can be very effective against classical personalized federated learning models. △ Less

Submitted 20 July, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

Comments: arXiv admin note: text overlap with arXiv:2106.02398

Journal ref: ICML 2022

arXiv:2110.03991 [pdf, other]

Combining Differential Privacy and Byzantine Resilience in Distributed SGD

Authors: Rachid Guerraoui, Nirupam Gupta, Rafael Pinot, Sebastien Rouault, John Stephan

Abstract: Privacy and Byzantine resilience (BR) are two crucial requirements of modern-day distributed machine learning. The two concepts have been extensively studied individually but the question of how to combine them effectively remains unanswered. This paper contributes to addressing this question by studying the extent to which the distributed SGD algorithm, in the standard parameter-server architectu… ▽ More Privacy and Byzantine resilience (BR) are two crucial requirements of modern-day distributed machine learning. The two concepts have been extensively studied individually but the question of how to combine them effectively remains unanswered. This paper contributes to addressing this question by studying the extent to which the distributed SGD algorithm, in the standard parameter-server architecture, can learn an accurate model despite (a) a fraction of the workers being malicious (Byzantine), and (b) the other fraction, whilst being honest, providing noisy information to the server to ensure differential privacy (DP). We first observe that the integration of standard practices in DP and BR is not straightforward. In fact, we show that many existing results on the convergence of distributed SGD under Byzantine faults, especially those relying on $(α,f)$-Byzantine resilience, are rendered invalid when honest workers enforce DP. To circumvent this shortcoming, we revisit the theory of $(α,f)$-BR to obtain an approximate convergence guarantee. Our analysis provides key insights on how to improve this guarantee through hyperparameter optimization. Essentially, our theoretical and empirical results show that (1) an imprudent combination of standard approaches to DP and BR might be fruitless, but (2) by carefully re-tuning the learning algorithm, we can obtain reasonable learning accuracy while simultaneously guaranteeing DP and BR. △ Less

Submitted 5 October, 2023; v1 submitted 8 October, 2021; originally announced October 2021.

arXiv:2108.01330 [pdf, other]

Frugal Byzantine Computing

Authors: M. K. Aguilera, N. Ben-David, R. Guerraoui, D. Papuc, A. Xygkis, I. Zablotchi

Abstract: Traditional techniques for handling Byzantine failures are expensive: digital signatures are too costly, while using $3f{+}1$ replicas is uneconomical ($f$ denotes the maximum number of Byzantine processes). We seek algorithms that reduce the number of replicas to $2f{+}1$ and minimize the number of signatures. While the first goal can be achieved in the message-and-memory model, accomplishing the… ▽ More Traditional techniques for handling Byzantine failures are expensive: digital signatures are too costly, while using $3f{+}1$ replicas is uneconomical ($f$ denotes the maximum number of Byzantine processes). We seek algorithms that reduce the number of replicas to $2f{+}1$ and minimize the number of signatures. While the first goal can be achieved in the message-and-memory model, accomplishing the second goal simultaneously is challenging. We first address this challenge for the problem of broadcasting messages reliably. We consider two variants of this problem, Consistent Broadcast and Reliable Broadcast, typically considered very close. Perhaps surprisingly, we establish a separation between them in terms of signatures required. In particular, we show that Consistent Broadcast requires at least 1 signature in some execution, while Reliable Broadcast requires $O(n)$ signatures in some execution. We present matching upper bounds for both primitives within constant factors. We then turn to the problem of consensus and argue that this separation matters for solving consensus with Byzantine failures: we present a practical consensus algorithm that uses Consistent Broadcast as its main communication primitive. This algorithm works for $n=2f{+}1$ and avoids signatures in the common-case -- properties that have not been simultaneously achieved previously. Overall, our work approaches Byzantine computing in a frugal manner and motivates the use of Consistent Broadcast -- rather than Reliable Broadcast -- as a key primitive for reaching agreement. △ Less

Submitted 3 August, 2021; originally announced August 2021.

Comments: This paper is an extended version of the DISC 2021 paper

arXiv:2106.08676 [pdf, ps, other]

Velos: One-sided Paxos for RDMA applications

Authors: Rachid Guerraoui, Antoine Murat, Athanasios Xygkis

Abstract: Modern data centers are becoming increasingly equipped with RDMA-capable NICs. These devices enable distributed systems to rely on algorithms designed for shared memory. RDMA allows consensus to terminate within a few microsecond in failure-free scenarios, yet, RDMA-optimized algorithms still use expensive two-sided operations in case of failure. In this work, we present a new leader-based consens… ▽ More Modern data centers are becoming increasingly equipped with RDMA-capable NICs. These devices enable distributed systems to rely on algorithms designed for shared memory. RDMA allows consensus to terminate within a few microsecond in failure-free scenarios, yet, RDMA-optimized algorithms still use expensive two-sided operations in case of failure. In this work, we present a new leader-based consensus algorithm that relies solely on one-sided RDMA verbs. Our algorithm is based on Paxos, it decides in a single one-sided RDMA operation in the common case, and changes leader also in a single one-sided RDMA operation in case of failure. We implement our algorithm in the form of an SMR system named Velos, and we evaluated our system against the state-of-the-art competitor Mu. Compared to Mu, our solution adds a small overhead of approximately 0.6 microseconds in failure-free executions and shines during failover periods during which it is 13 times faster in changing leader. △ Less

Submitted 16 June, 2021; originally announced June 2021.

arXiv:2106.02398 [pdf, other]

Strategyproof Learning: Building Trustworthy User-Generated Datasets

Authors: Sadegh Farhadkhani, Rachid Guerraoui, Lê-Nguyên Hoang

Abstract: We prove in this paper that, perhaps surprisingly, incentivizing data misreporting is not a fatality. By leveraging a careful design of the loss function, we propose Licchavi, a global and personalized learning framework with provable strategyproofness guarantees. Essentially, we prove that no user can gain much by replying to Licchavi's queries with answers that deviate from their true preference… ▽ More We prove in this paper that, perhaps surprisingly, incentivizing data misreporting is not a fatality. By leveraging a careful design of the loss function, we propose Licchavi, a global and personalized learning framework with provable strategyproofness guarantees. Essentially, we prove that no user can gain much by replying to Licchavi's queries with answers that deviate from their true preferences. Interestingly, Licchavi also promotes the desirable "one person, one unit-force vote" fairness principle. Furthermore, our empirical evaluation of its performance showcases Licchavi's real-world applicability. We believe that our results are critical for the safety of any learning scheme that leverages user-generated data. △ Less

Submitted 18 February, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

Comments: 31 pages

arXiv:2106.02394 [pdf, other]

On the Strategyproofness of the Geometric Median

Authors: El-Mahdi El-Mhamdi, Sadegh Farhadkhani, Rachid Guerraoui, Lê-Nguyên Hoang

Abstract: The geometric median, an instrumental component of the secure machine learning toolbox, is known to be effective when robustly aggregating models (or gradients), gathered from potentially malicious (or strategic) users. What is less known is the extent to which the geometric median incentivizes dishonest behaviors. This paper addresses this fundamental question by quantifying its strategyproofness… ▽ More The geometric median, an instrumental component of the secure machine learning toolbox, is known to be effective when robustly aggregating models (or gradients), gathered from potentially malicious (or strategic) users. What is less known is the extent to which the geometric median incentivizes dishonest behaviors. This paper addresses this fundamental question by quantifying its strategyproofness. While we observe that the geometric median is not even approximately strategyproof, we prove that it is asymptotically $α$-strategyproof: when the number of users is large enough, a user that misbehaves can gain at most a multiplicative factor $α$, which we compute as a function of the distribution followed by the users. We then generalize our results to the case where users actually care more about specific dimensions, determining how this impacts $α$. We also show how the skewed geometric medians can be used to improve strategyproofness. △ Less

Submitted 2 June, 2023; v1 submitted 4 June, 2021; originally announced June 2021.

Comments: Accepted paper at AISTATS 2023

arXiv:2102.08166 [pdf, other]

Differential Privacy and Byzantine Resilience in SGD: Do They Add Up?

Authors: Rachid Guerraoui, Nirupam Gupta, Rafaël Pinot, Sébastien Rouault, John Stephan

Abstract: This paper addresses the problem of combining Byzantine resilience with privacy in machine learning (ML). Specifically, we study if a distributed implementation of the renowned Stochastic Gradient Descent (SGD) learning algorithm is feasible with both differential privacy (DP) and $(α,f)$-Byzantine resilience. To the best of our knowledge, this is the first work to tackle this problem from a theor… ▽ More This paper addresses the problem of combining Byzantine resilience with privacy in machine learning (ML). Specifically, we study if a distributed implementation of the renowned Stochastic Gradient Descent (SGD) learning algorithm is feasible with both differential privacy (DP) and $(α,f)$-Byzantine resilience. To the best of our knowledge, this is the first work to tackle this problem from a theoretical point of view. A key finding of our analyses is that the classical approaches to these two (seemingly) orthogonal issues are incompatible. More precisely, we show that a direct composition of these techniques makes the guarantees of the resulting SGD algorithm depend unfavourably upon the number of parameters of the ML model, making the training of large models practically infeasible. We validate our theoretical results through numerical experiments on publicly-available datasets; showing that it is impractical to ensure DP and Byzantine resilience simultaneously. △ Less

Submitted 24 June, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

arXiv:2010.06288 [pdf, other]

Microsecond Consensus for Microsecond Applications

Authors: Marcos K. Aguilera, Naama Ben-David, Rachid Guerraoui, Virendra J. Marathe, Athanasios Xygkis, Igor Zablotchi

Abstract: We consider the problem of making apps fault-tolerant through replication, when apps operate at the microsecond scale, as in finance, embedded computing, and microservices apps. These apps need a replication scheme that also operates at the microsecond scale, otherwise replication becomes a burden. We propose Mu, a system that takes less than 1.3 microseconds to replicate a (small) request in memo… ▽ More We consider the problem of making apps fault-tolerant through replication, when apps operate at the microsecond scale, as in finance, embedded computing, and microservices apps. These apps need a replication scheme that also operates at the microsecond scale, otherwise replication becomes a burden. We propose Mu, a system that takes less than 1.3 microseconds to replicate a (small) request in memory, and less than a millisecond to fail-over the system - this cuts the replication and fail-over latencies of the prior systems by at least 61% and 90%. Mu implements bona fide state machine replication/consensus (SMR) with strong consistency for a generic app, but it really shines on microsecond apps, where even the smallest overhead is significant. To provide this performance, Mu introduces a new SMR protocol that carefully leverages RDMA. Roughly, in Mu a leader replicates a request by simply writing it directly to the log of other replicas using RDMA, without any additional communication. Doing so, however, introduces the challenge of handling concurrent leaders, changing leaders, garbage collecting the logs, and more - challenges that we address in this paper through a judicious combination of RDMA permissions and distributed algorithmic design. We implemented Mu and used it to replicate several systems: a financial exchange app called Liquibook, Redis, Memcached, and HERD. Our evaluation shows that Mu incurs a small replication latency, in some cases being the only viable replication system that incurs an acceptable overhead. △ Less

Submitted 13 October, 2020; originally announced October 2020.

Comments: Full version of OSDI'20 paper

arXiv:2010.05888 [pdf, other]

Garfield: System Support for Byzantine Machine Learning

Authors: Rachid Guerraoui, Arsany Guirguis, Jérémy Max Plassmann, Anton Alexandre Ragot, Sébastien Rouault

Abstract: We present Garfield, a library to transparently make machine learning (ML) applications, initially built with popular (but fragile) frameworks, e.g., TensorFlow and PyTorch, Byzantine-resilient. Garfield relies on a novel object-oriented design, reducing the coding effort, and addressing the vulnerability of the shared-graph architecture followed by classical ML frameworks. Garfield encompasses va… ▽ More We present Garfield, a library to transparently make machine learning (ML) applications, initially built with popular (but fragile) frameworks, e.g., TensorFlow and PyTorch, Byzantine-resilient. Garfield relies on a novel object-oriented design, reducing the coding effort, and addressing the vulnerability of the shared-graph architecture followed by classical ML frameworks. Garfield encompasses various communication patterns and supports computations on CPUs and GPUs, allowing addressing the general question of the very practical cost of Byzantine resilience in SGD-based ML applications. We report on the usage of Garfield on three main ML architectures: (a) a single server with multiple workers, (b) several servers and workers, and (c) peer-to-peer settings. Using Garfield, we highlight several interesting facts about the cost of Byzantine resilience. In particular, (a) Byzantine resilience, unlike crash resilience, induces an accuracy loss, (b) the throughput overhead comes more from communication than from robust aggregation, and (c) tolerating Byzantine servers costs more than tolerating Byzantine workers. △ Less

Submitted 31 December, 2020; v1 submitted 12 October, 2020; originally announced October 2020.

Comments: 31 pages; 16 figures; 2 tables

arXiv:2008.02527 [pdf, other]

Efficient Multi-word Compare and Swap

Authors: Rachid Guerraoui, Alex Kogan, Virendra J. Marathe, Igor Zablotchi

Abstract: Atomic lock-free multi-word compare-and-swap (MCAS) is a powerful tool for designing concurrent algorithms. Yet, its widespread usage has been limited because lock-free implementations of MCAS make heavy use of expensive compare-and-swap (CAS) instructions. Existing MCAS implementations indeed use at least 2k+1 CASes per k-CAS. This leads to the natural desire to minimize the number of CASes requi… ▽ More Atomic lock-free multi-word compare-and-swap (MCAS) is a powerful tool for designing concurrent algorithms. Yet, its widespread usage has been limited because lock-free implementations of MCAS make heavy use of expensive compare-and-swap (CAS) instructions. Existing MCAS implementations indeed use at least 2k+1 CASes per k-CAS. This leads to the natural desire to minimize the number of CASes required to implement MCAS. We first prove in this paper that it is impossible to "pack" the information required to perform a k-word CAS (k-CAS) in less than k locations to be CASed. Then we present the first algorithm that requires k+1 CASes per call to k-CAS in the common uncontended case. We implement our algorithm and show that it outperforms a state-of-the-art baseline in a variety of benchmarks in most considered workloads. We also present a durably linearizable (persistent memory friendly) version of our MCAS algorithm using only 2 persistence fences per call, while still only requiring k+1 CASes per k-CAS. △ Less

Submitted 6 August, 2020; originally announced August 2020.

Comments: Full version of DISC '20 paper

arXiv:2008.00742 [pdf, other]

Collaborative Learning in the Jungle (Decentralized, Byzantine, Heterogeneous, Asynchronous and Nonconvex Learning)

Authors: El-Mahdi El-Mhamdi, Sadegh Farhadkhani, Rachid Guerraoui, Arsany Guirguis, Lê Nguyên Hoang, Sébastien Rouault

Abstract: We study Byzantine collaborative learning, where $n$ nodes seek to collectively learn from each others' local data. The data distribution may vary from one node to another. No node is trusted, and $f < n$ nodes can behave arbitrarily. We prove that collaborative learning is equivalent to a new form of agreement, which we call averaging agreement. In this problem, nodes start each with an initial v… ▽ More We study Byzantine collaborative learning, where $n$ nodes seek to collectively learn from each others' local data. The data distribution may vary from one node to another. No node is trusted, and $f < n$ nodes can behave arbitrarily. We prove that collaborative learning is equivalent to a new form of agreement, which we call averaging agreement. In this problem, nodes start each with an initial vector and seek to approximately agree on a common vector, which is close to the average of honest nodes' initial vectors. We present two asynchronous solutions to averaging agreement, each we prove optimal according to some dimension. The first, based on the minimum-diameter averaging, requires $ n \geq 6f+1$, but achieves asymptotically the best-possible averaging constant up to a multiplicative constant. The second, based on reliable broadcast and coordinate-wise trimmed mean, achieves optimal Byzantine resilience, i.e., $n \geq 3f+1$. Each of these algorithms induces an optimal Byzantine collaborative learning protocol. In particular, our equivalence yields new impossibility theorems on what any collaborative learning algorithm can achieve in adversarial and heterogeneous environments. △ Less

Submitted 1 December, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

Comments: 34 pages, 1 figure

Journal ref: NeurIPS 2021

arXiv:2006.07273 [pdf, other]

doi 10.1145/3423211.3425685

FLeet: Online Federated Learning via Staleness Awareness and Performance Prediction

Authors: Georgios Damaskinos, Rachid Guerraoui, Anne-Marie Kermarrec, Vlad Nitu, Rhicheek Patra, Francois Taiani

Abstract: Federated Learning (FL) is very appealing for its privacy benefits: essentially, a global model is trained with updates computed on mobile devices while kee** the data of users local. Standard FL infrastructures are however designed to have no energy or performance impact on mobile devices, and are therefore not suitable for applications that require frequent (online) model updates, such as news… ▽ More Federated Learning (FL) is very appealing for its privacy benefits: essentially, a global model is trained with updates computed on mobile devices while kee** the data of users local. Standard FL infrastructures are however designed to have no energy or performance impact on mobile devices, and are therefore not suitable for applications that require frequent (online) model updates, such as news recommenders. This paper presents FLeet, the first Online FL system, acting as a middleware between the Android OS and the machine learning application. FLeet combines the privacy of Standard FL with the precision of online learning thanks to two core components: (i) I-Prof, a new lightweight profiler that predicts and controls the impact of learning tasks on mobile devices, and (ii) AdaSGD, a new adaptive learning algorithm that is resilient to delayed updates. Our extensive evaluation shows that Online FL, as implemented by FLeet, can deliver a 2.3x quality boost compared to Standard FL, while only consuming 0.036% of the battery per day. I-Prof can accurately control the impact of learning tasks by improving the prediction accuracy up to 3.6x (computation time) and up to 19x (energy). AdaSGD outperforms alternative FL approaches by 18.4% in terms of convergence speed on heterogeneous data. △ Less

Submitted 3 December, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

arXiv:2006.07272 [pdf, other]

Differentially Private Stochastic Coordinate Descent

Authors: Georgios Damaskinos, Celestine Mendler-Dünner, Rachid Guerraoui, Nikolaos Papandreou, Thomas Parnell

Abstract: In this paper we tackle the challenge of making the stochastic coordinate descent algorithm differentially private. Compared to the classical gradient descent algorithm where updates operate on a single model vector and controlled noise addition to this vector suffices to hide critical information about individuals, stochastic coordinate descent crucially relies on kee** auxiliary information in… ▽ More In this paper we tackle the challenge of making the stochastic coordinate descent algorithm differentially private. Compared to the classical gradient descent algorithm where updates operate on a single model vector and controlled noise addition to this vector suffices to hide critical information about individuals, stochastic coordinate descent crucially relies on kee** auxiliary information in memory during training. This auxiliary information provides an additional privacy leak and poses the major challenge addressed in this work. Driven by the insight that under independent noise addition, the consistency of the auxiliary information holds in expectation, we present DP-SCD, the first differentially private stochastic coordinate descent algorithm. We analyze our new method theoretically and argue that decoupling and parallelizing coordinate updates is essential for its utility. On the empirical side we demonstrate competitive performance against the popular stochastic gradient descent alternative (DP-SGD) while requiring significantly less tuning. △ Less

Submitted 14 March, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

arXiv:2006.04720 [pdf, other]

Host-Pathongen Co-evolution Inspired Algorithm Enables Robust GAN Training

Authors: Andrei Kucharavy, El Mahdi El Mhamdi, Rachid Guerraoui

Abstract: Generative adversarial networks (GANs) are pairs of artificial neural networks that are trained one against each other. The outputs from a generator are mixed with the real-world inputs to the discriminator and both networks are trained until an equilibrium is reached, where the discriminator cannot distinguish generated inputs from real ones. Since their introduction, GANs have allowed for the ge… ▽ More Generative adversarial networks (GANs) are pairs of artificial neural networks that are trained one against each other. The outputs from a generator are mixed with the real-world inputs to the discriminator and both networks are trained until an equilibrium is reached, where the discriminator cannot distinguish generated inputs from real ones. Since their introduction, GANs have allowed for the generation of impressive imitations of real-life films, images and texts, whose fakeness is barely noticeable to humans. Despite their impressive performance, training GANs remains to this day more of an art than a reliable procedure, in a large part due to training process stability. Generators are susceptible to mode drop** and convergence to random patterns, which have to be mitigated by computationally expensive multiple restarts. Curiously, GANs bear an uncanny similarity to a co-evolution of a pathogen and its host's immune system in biology. In a biological context, the majority of potential pathogens indeed never make it and are kept at bay by the hots' immune system. Yet some are efficient enough to present a risk of a serious condition and recurrent infections. Here, we explore that similarity to propose a more robust algorithm for GANs training. We empirically show the increased stability and a better ability to generate high-quality images while using less computational power. △ Less

Submitted 9 June, 2020; v1 submitted 22 May, 2020; originally announced June 2020.

Comments: 8 pages, 10 figures

MSC Class: 92B20; 68T05; ACM Class: I.5.2

arXiv:2004.13184 [pdf, other]

Online Payments by Merely Broadcasting Messages (Extended Version)

Authors: Daniel Collins, Rachid Guerraoui, Jovan Komatovic, Matteo Monti, Athanasios Xygkis, Matej Pavlovic, Petr Kuznetsov, Yvonne-Anne Pignolet, Dragos-Adrian Seredinschi, Andrei Tonkikh

Abstract: We address the problem of online payments, where users can transfer funds among themselves. We introduce Astro, a system solving this problem efficiently in a decentralized, deterministic, and completely asynchronous manner. Astro builds on the insight that consensus is unnecessary to prevent double-spending. Instead of consensus, Astro relies on a weaker primitive---Byzantine reliable broadcast--… ▽ More We address the problem of online payments, where users can transfer funds among themselves. We introduce Astro, a system solving this problem efficiently in a decentralized, deterministic, and completely asynchronous manner. Astro builds on the insight that consensus is unnecessary to prevent double-spending. Instead of consensus, Astro relies on a weaker primitive---Byzantine reliable broadcast---enabling a simpler and more efficient implementation than consensus-based payment systems. In terms of efficiency, Astro executes a payment by merely broadcasting a message. The distinguishing feature of Astro is that it can maintain performance robustly, i.e., remain unaffected by a fraction of replicas being compromised or slowed down by an adversary. Our experiments on a public cloud network show that Astro can achieve near-linear scalability in a sharded setup, going from $10K$ payments/sec (2 shards) to $20K$ payments/sec (4 shards). In a nutshell, Astro can match VISA-level average payment throughput, and achieves a $5\times$ improvement over a state-of-the-art consensus-based solution, while exhibiting sub-second $95^{th}$ percentile latency. △ Less

Submitted 27 April, 2020; originally announced April 2020.

Comments: This is an extended version of a conference article, appearing in the proceedings of the 50th IEEE/IFIP Int. Conference on Dependable Systems and Networks (DSN 2020). This work has been supported in part by the European grant 862082, AT2 -- ERC-2019-PoC, and in part by a grant from Interchain Foundation

arXiv:2003.00010 [pdf, other]

Distributed Momentum for Byzantine-resilient Learning

Authors: El-Mahdi El-Mhamdi, Rachid Guerraoui, Sébastien Rouault

Abstract: Momentum is a variant of gradient descent that has been proposed for its benefits on convergence. In a distributed setting, momentum can be implemented either at the server or the worker side. When the aggregation rule used by the server is linear, commutativity with addition makes both deployments equivalent. Robustness and privacy are however among motivations to abandon linear aggregation rules… ▽ More Momentum is a variant of gradient descent that has been proposed for its benefits on convergence. In a distributed setting, momentum can be implemented either at the server or the worker side. When the aggregation rule used by the server is linear, commutativity with addition makes both deployments equivalent. Robustness and privacy are however among motivations to abandon linear aggregation rules. In this work, we demonstrate the benefits on robustness of using momentum at the worker side. We first prove that computing momentum at the workers reduces the variance-norm ratio of the gradient estimation at the server, strengthening Byzantine resilient aggregation rules. We then provide an extensive experimental demonstration of the robustness effect of worker-side momentum on distributed SGD. △ Less

Submitted 9 March, 2020; v1 submitted 28 February, 2020; originally announced March 2020.

Comments: Source code (for academic use only): https://github.com/LPD-EPFL/ByzantineMomentum

arXiv:2001.06271 [pdf, other]

Dynamic Byzantine Reliable Broadcast [Technical Report]

Authors: Rachid Guerraoui, Jovan Komatovic, Petr Kuznetsov, Yvonne-Anne Pignolet, Dragos-Adrian Seredinschi, Andrei Tonkikh

Abstract: Reliable broadcast is a communication primitive guaranteeing, intuitively, that all processes in a distributed system deliver the same set of messages. The reason why this primitive is appealing is twofold: (i) we can implement it deterministically in a completely asynchronous environment, unlike stronger primitives like consensus and total-order broadcast, and yet (ii) reliable broadcast is power… ▽ More Reliable broadcast is a communication primitive guaranteeing, intuitively, that all processes in a distributed system deliver the same set of messages. The reason why this primitive is appealing is twofold: (i) we can implement it deterministically in a completely asynchronous environment, unlike stronger primitives like consensus and total-order broadcast, and yet (ii) reliable broadcast is powerful enough to implement important applications like payment systems. The problem we tackle in this paper is that of dynamic reliable broadcast, i.e., enabling processes to join or leave the system. This property is desirable for long-lived applications (aiming to be highly available), yet has been precluded in previous asynchronous reliable broadcast protocols. We study this property in a general adversarial (i.e., Byzantine) environment. We introduce the first specification of a dynamic Byzantine reliable broadcast (DBRB) primitive that is amenable to an asynchronous implementation. We then present an algorithm implementing this specification in an asynchronous network. Our DBRB algorithm ensures that if any correct process in the system broadcasts a message, then every correct process delivers that message unless it leaves the system. Moreover, if a correct process delivers a message, then every correct process that has not expressed its will to leave the system delivers that message. We assume that more than $2/3$ of processes in the system are correct at all times, which is tight in our context. We also show that if only one process in the system can fail---and it can fail only by crashing---then it is impossible to implement a stronger primitive, ensuring that if any correct process in the system broadcasts or delivers a message, then every correct process in the system delivers that message---including those that leave. △ Less

Submitted 20 November, 2020; v1 submitted 17 January, 2020; originally announced January 2020.

Comments: This is an extended version of a conference article, appearingin the proceedings of the 24th Int. Conference on Principles of Distributed Systems (OPODIS 2020). This work has been supported in part by a grant from Interchain Foundation

Showing 1–50 of 86 results for author: GuerraouI, R