-
Neuro-mimetic Task-free Unsupervised Online Learning with Continual Self-Organizing Maps
Authors:
Hitesh Vaidya,
Travis Desell,
Ankur Mali,
Alexander Ororbia
Abstract:
An intelligent system capable of continual learning is one that can process and extract knowledge from potentially infinitely long streams of pattern vectors. The major challenge that makes crafting such a system difficult is known as catastrophic forgetting - an agent, such as one based on artificial neural networks (ANNs), struggles to retain previously acquired knowledge when learning from new…
▽ More
An intelligent system capable of continual learning is one that can process and extract knowledge from potentially infinitely long streams of pattern vectors. The major challenge that makes crafting such a system difficult is known as catastrophic forgetting - an agent, such as one based on artificial neural networks (ANNs), struggles to retain previously acquired knowledge when learning from new samples. Furthermore, ensuring that knowledge is preserved for previous tasks becomes more challenging when input is not supplemented with task boundary information. Although forgetting in the context of ANNs has been studied extensively, there still exists far less work investigating it in terms of unsupervised architectures such as the venerable self-organizing map (SOM), a neural model often used in clustering and dimensionality reduction. While the internal mechanisms of SOMs could, in principle, yield sparse representations that improve memory retention, we observe that, when a fixed-size SOM processes continuous data streams, it experiences concept drift. In light of this, we propose a generalization of the SOM, the continual SOM (CSOM), which is capable of online unsupervised learning under a low memory budget. Our results, on benchmarks including MNIST, Kuzushiji-MNIST, and Fashion-MNIST, show almost a two times increase in accuracy, and CIFAR-10 demonstrates a state-of-the-art result when tested on (online) unsupervised class incremental learning setting.
△ Less
Submitted 19 February, 2024;
originally announced February 2024.
-
Byzantine fault-tolerant distributed set intersection with redundancy
Authors:
Shuo Liu,
Nitin H. Vaidya
Abstract:
In this report, we study the problem of Byzantine fault-tolerant distributed set intersection and the importance of redundancy in solving this problem. Specifically, consider a distributed system with $n$ agents, each of which has a local set. There are up to $f$ agents that are Byzantine faulty. The goal is to find the intersection of the sets of the non-faulty agents.
We derive the Byzantine s…
▽ More
In this report, we study the problem of Byzantine fault-tolerant distributed set intersection and the importance of redundancy in solving this problem. Specifically, consider a distributed system with $n$ agents, each of which has a local set. There are up to $f$ agents that are Byzantine faulty. The goal is to find the intersection of the sets of the non-faulty agents.
We derive the Byzantine set intersection problem from the Byzantine optimization problem. We present the definition of $2f$-redundancy, and identify the necessary and sufficient condition if the Byzantine set intersection problem can be solved if a certain redundancy property is satisfied, and then present an equivalent condition. We further extend our results to arbitrary communication graphs in a decentralized setting. Finally, we present solvability results for the Byzantine optimization problem, inspired by our findings on Byzantine set intersection. The results we provide are for synchronous and asynchronous systems both.
△ Less
Submitted 13 February, 2024;
originally announced February 2024.
-
Impact of Redundancy on Resilience in Distributed Optimization and Learning
Authors:
Shuo Liu,
Nirupam Gupta,
Nitin H. Vaidya
Abstract:
This report considers the problem of resilient distributed optimization and stochastic learning in a server-based architecture. The system comprises a server and multiple agents, where each agent has its own local cost function. The agents collaborate with the server to find a minimum of the aggregate of the local cost functions. In the context of stochastic learning, the local cost of an agent is…
▽ More
This report considers the problem of resilient distributed optimization and stochastic learning in a server-based architecture. The system comprises a server and multiple agents, where each agent has its own local cost function. The agents collaborate with the server to find a minimum of the aggregate of the local cost functions. In the context of stochastic learning, the local cost of an agent is the loss function computed over the data at that agent. In this report, we consider this problem in a system wherein some of the agents may be Byzantine faulty and some of the agents may be slow (also called stragglers). In this setting, we investigate the conditions under which it is possible to obtain an "approximate" solution to the above problem. In particular, we introduce the notion of $(f, r; ε)$-resilience to characterize how well the true solution is approximated in the presence of up to $f$ Byzantine faulty agents, and up to $r$ slow agents (or stragglers) -- smaller $ε$ represents a better approximation. We also introduce a measure named $(f, r; ε)$-redundancy to characterize the redundancy in the cost functions of the agents. Greater redundancy allows for a better approximation when solving the problem of aggregate cost minimization.
In this report, we constructively show (both theoretically and empirically) that $(f, r; \mathcal{O}(ε))$-resilience can indeed be achieved in practice, given that the local cost functions are sufficiently redundant.
△ Less
Submitted 14 December, 2023; v1 submitted 15 November, 2022;
originally announced November 2022.
-
Reducing Catastrophic Forgetting in Self Organizing Maps with Internally-Induced Generative Replay
Authors:
Hitesh Vaidya,
Travis Desell,
Alexander Ororbia
Abstract:
A lifelong learning agent is able to continually learn from potentially infinite streams of pattern sensory data. One major historic difficulty in building agents that adapt in this way is that neural systems struggle to retain previously-acquired knowledge when learning from new samples. This problem is known as catastrophic forgetting (interference) and remains an unsolved problem in the domain…
▽ More
A lifelong learning agent is able to continually learn from potentially infinite streams of pattern sensory data. One major historic difficulty in building agents that adapt in this way is that neural systems struggle to retain previously-acquired knowledge when learning from new samples. This problem is known as catastrophic forgetting (interference) and remains an unsolved problem in the domain of machine learning to this day. While forgetting in the context of feedforward networks has been examined extensively over the decades, far less has been done in the context of alternative architectures such as the venerable self-organizing map (SOM), an unsupervised neural model that is often used in tasks such as clustering and dimensionality reduction. Although the competition among its internal neurons might carry the potential to improve memory retention, we observe that a fixed-sized SOM trained on task incremental data, i.e., it receives data points related to specific classes at certain temporal increments, experiences significant forgetting. In this study, we propose the continual SOM (c-SOM), a model that is capable of reducing its own forgetting when processing information.
△ Less
Submitted 9 December, 2021;
originally announced December 2021.
-
Byzantine Consensus in Directed Hypergraphs
Authors:
Muhammad Samir Khan,
Nitin H. Vaidya
Abstract:
Byzantine consensus is a classical problem in distributed computing. Each node in a synchronous system starts with a binary input. The goal is to reach agreement in the presence of Byzantine faulty nodes. We consider the setting where communication between nodes is modelled via a directed hypergraph. In the classical point-to-point communication model, the communication between nodes is modelled a…
▽ More
Byzantine consensus is a classical problem in distributed computing. Each node in a synchronous system starts with a binary input. The goal is to reach agreement in the presence of Byzantine faulty nodes. We consider the setting where communication between nodes is modelled via a directed hypergraph. In the classical point-to-point communication model, the communication between nodes is modelled as a simple graph where all messages sent on an edge are private between the two endpoints of the edge. This allows a faulty node to equivocate, i.e., lie differently to its different neighbors. Different models have been proposed in the literature that weaken equivocation. In the local broadcast model, every message transmitted by a node is received identically and correctly by all of its neighbors. In the hypergraph model, every message transmitted by a node on a hyperedge is received identically and correctly by all nodes on the hyperedge. Tight network conditions are known for each of the three cases for undirected (hyper)graphs. For the directed models, tight conditions are known for the point-to-point and local broadcast models.
In this work, we consider the directed hypergraph model that encompasses all the models above. Each directed hyperedge consists of a single head (sender) and at least one tail (receiver), This models a local multicast channel where messages transmitted by the sender are received identically by all the receivers in the hyperedge. For this model, we identify tight network conditions for consensus. We observe how the directed hypergraph model reduces to each of the three models above under specific conditions. In each case, we relate our network condition to the corresponding known tight conditions. The directed hypergraph model also encompasses other practical network models of interest that have not been explored previously, as elaborated in the paper.
△ Less
Submitted 2 September, 2021;
originally announced September 2021.
-
Asynchronous Distributed Optimization with Redundancy in Cost Functions
Authors:
Shuo Liu,
Nirupam Gupta,
Nitin H. Vaidya
Abstract:
This paper considers the problem of asynchronous distributed multi-agent optimization on server-based system architecture. In this problem, each agent has a local cost, and the goal for the agents is to collectively find a minimum of their aggregate cost. A standard algorithm to solve this problem is the iterative distributed gradient-descent (DGD) method being implemented collaboratively by the s…
▽ More
This paper considers the problem of asynchronous distributed multi-agent optimization on server-based system architecture. In this problem, each agent has a local cost, and the goal for the agents is to collectively find a minimum of their aggregate cost. A standard algorithm to solve this problem is the iterative distributed gradient-descent (DGD) method being implemented collaboratively by the server and the agents. In the synchronous setting, the algorithm proceeds from one iteration to the next only after all the agents complete their expected communication with the server. However, such synchrony can be expensive and even infeasible in real-world applications. We show that waiting for all the agents is unnecessary in many applications of distributed optimization, including distributed machine learning, due to redundancy in the cost functions (or {\em data}). Specifically, we consider a generic notion of redundancy named $(r,ε)$-redundancy implying solvability of the original multi-agent optimization problem with $ε$ accuracy, despite the removal of up to $r$ (out of total $n$) agents from the system. We present an asynchronous DGD algorithm where in each iteration the server only waits for (any) $n-r$ agents, instead of all the $n$ agents. Assuming $(r,ε)$-redundancy, we show that our asynchronous algorithm converges to an approximate solution with error that is linear in $ε$ and $r$. Moreover, we also present a generalization of our algorithm to tolerate some Byzantine faulty agents in the system. Finally, we demonstrate the improved communication efficiency of our algorithm through experiments on MNIST and Fashion-MNIST using the benchmark neural network LeNet.
△ Less
Submitted 7 June, 2021;
originally announced June 2021.
-
Byzantine Fault-Tolerance in Peer-to-Peer Distributed Gradient-Descent
Authors:
Nirupam Gupta,
Nitin H. Vaidya
Abstract:
We consider the problem of Byzantine fault-tolerance in the peer-to-peer (P2P) distributed gradient-descent method -- a prominent algorithm for distributed optimization in a P2P system. In this problem, the system comprises of multiple agents, and each agent has a local cost function. In the fault-free case, when all the agents are honest, the P2P distributed gradient-descent method allows all the…
▽ More
We consider the problem of Byzantine fault-tolerance in the peer-to-peer (P2P) distributed gradient-descent method -- a prominent algorithm for distributed optimization in a P2P system. In this problem, the system comprises of multiple agents, and each agent has a local cost function. In the fault-free case, when all the agents are honest, the P2P distributed gradient-descent method allows all the agents to reach a consensus on a solution that minimizes their aggregate cost. However, we consider a scenario where a certain number of agents may be Byzantine faulty. Such faulty agents may not follow an algorithm correctly, and may share arbitrary incorrect information to prevent other non-faulty agents from solving the optimization problem. In the presence of Byzantine faulty agents, a more reasonable goal is to allow all the non-faulty agents to reach a consensus on a solution that minimizes the aggregate cost of all the non-faulty agents. We refer to this fault-tolerance goal as $f$-resilience where $f$ is the maximum number of Byzantine faulty agents in a system of $n$ agents, with $f < n$. Most prior work on fault-tolerance in P2P distributed optimization only consider approximate fault-tolerance wherein, unlike $f$-resilience, all the non-faulty agents' compute a minimum point of a non-uniformly weighted aggregate of their cost functions. We propose a fault-tolerance mechanism that confers provable $f$-resilience to the P2P distributed gradient-descent method, provided the non-faulty agents satisfy the necessary condition of $2f$-redundancy, defined later in the paper. Moreover, compared to prior work, our algorithm is applicable to a larger class of high-dimensional convex distributed optimization problems.
△ Less
Submitted 28 January, 2021;
originally announced January 2021.
-
Approximate Byzantine Fault-Tolerance in Distributed Optimization
Authors:
Shuo Liu,
Nirupam Gupta,
Nitin H. Vaidya
Abstract:
This paper considers the problem of Byzantine fault-tolerance in distributed multi-agent optimization. In this problem, each agent has a local cost function, and in the fault-free case, the goal is to design a distributed algorithm that allows all the agents to find a minimum point of all the agents' aggregate cost function. We consider a scenario where some agents might be Byzantine faulty that r…
▽ More
This paper considers the problem of Byzantine fault-tolerance in distributed multi-agent optimization. In this problem, each agent has a local cost function, and in the fault-free case, the goal is to design a distributed algorithm that allows all the agents to find a minimum point of all the agents' aggregate cost function. We consider a scenario where some agents might be Byzantine faulty that renders the original goal of computing a minimum point of all the agents' aggregate cost vacuous. A more reasonable objective for an algorithm in this scenario is to allow all the non-faulty agents to compute the minimum point of only the non-faulty agents' aggregate cost. Prior work shows that if there are up to $f$ (out of $n$) Byzantine agents then a minimum point of the non-faulty agents' aggregate cost can be computed exactly if and only if the non-faulty agents' costs satisfy a certain redundancy property called $2f$-redundancy. However, $2f$-redundancy is an ideal property that can be satisfied only in systems free from noise or uncertainties, which can make the goal of exact fault-tolerance unachievable in some applications. Thus, we introduce the notion of $(f,ε)$-resilience, a generalization of exact fault-tolerance wherein the objective is to find an approximate minimum point of the non-faulty aggregate cost, with $ε$ accuracy. This approximate fault-tolerance can be achieved under a weaker condition that is easier to satisfy in practice, compared to $2f$-redundancy. We obtain necessary and sufficient conditions for achieving $(f,ε)$-resilience characterizing the correlation between relaxation in redundancy and approximation in resilience. In case when the agents' cost functions are differentiable, we obtain conditions for $(f,ε)$-resilience of the distributed gradient-descent method when equipped with robust gradient aggregation.
△ Less
Submitted 21 May, 2024; v1 submitted 22 January, 2021;
originally announced January 2021.
-
Byzantine Fault-Tolerance in Decentralized Optimization under Minimal Redundancy
Authors:
Nirupam Gupta,
Thinh T. Doan,
Nitin H. Vaidya
Abstract:
This paper considers the problem of Byzantine fault-tolerance in multi-agent decentralized optimization. In this problem, each agent has a local cost function. The goal of a decentralized optimization algorithm is to allow the agents to cooperatively compute a common minimum point of their aggregate cost function. We consider the case when a certain number of agents may be Byzantine faulty. Such f…
▽ More
This paper considers the problem of Byzantine fault-tolerance in multi-agent decentralized optimization. In this problem, each agent has a local cost function. The goal of a decentralized optimization algorithm is to allow the agents to cooperatively compute a common minimum point of their aggregate cost function. We consider the case when a certain number of agents may be Byzantine faulty. Such faulty agents may not follow a prescribed algorithm, and they may share arbitrary or incorrect information with other non-faulty agents. Presence of such Byzantine agents renders a typical decentralized optimization algorithm ineffective. We propose a decentralized optimization algorithm with provable exact fault-tolerance against a bounded number of Byzantine agents, provided the non-faulty agents have a minimal redundancy.
△ Less
Submitted 30 September, 2020;
originally announced September 2020.
-
Byzantine Fault-Tolerant Distributed Machine Learning Using Stochastic Gradient Descent (SGD) and Norm-Based Comparative Gradient Elimination (CGE)
Authors:
Nirupam Gupta,
Shuo Liu,
Nitin H. Vaidya
Abstract:
This paper considers the Byzantine fault-tolerance problem in distributed stochastic gradient descent (D-SGD) method - a popular algorithm for distributed multi-agent machine learning. In this problem, each agent samples data points independently from a certain data-generating distribution. In the fault-free case, the D-SGD method allows all the agents to learn a mathematical model best fitting th…
▽ More
This paper considers the Byzantine fault-tolerance problem in distributed stochastic gradient descent (D-SGD) method - a popular algorithm for distributed multi-agent machine learning. In this problem, each agent samples data points independently from a certain data-generating distribution. In the fault-free case, the D-SGD method allows all the agents to learn a mathematical model best fitting the data collectively sampled by all agents. We consider the case when a fraction of agents may be Byzantine faulty. Such faulty agents may not follow a prescribed algorithm correctly, and may render traditional D-SGD method ineffective by sharing arbitrary incorrect stochastic gradients. We propose a norm-based gradient-filter, named comparative gradient elimination (CGE), that robustifies the D-SGD method against Byzantine agents. We show that the CGE gradient-filter guarantees fault-tolerance against a bounded fraction of Byzantine agents under standard stochastic assumptions, and is computationally simpler compared to many existing gradient-filters such as multi-KRUM, geometric median-of-means, and the spectral filters. We empirically show, by simulating distributed learning on neural networks, that the fault-tolerance of CGE is comparable to that of existing gradient-filters. We also empirically show that exponential averaging of stochastic gradients improves the fault-tolerance of a generic gradient-filter.
△ Less
Submitted 17 April, 2021; v1 submitted 11 August, 2020;
originally announced August 2020.
-
Asynchronous Byzantine Approximate Consensus in Directed Networks
Authors:
Dimitris Sakavalas,
Lewis Tseng,
Nitin H. Vaidya
Abstract:
In this work, we study the approximate consensus problem in asynchronous message-passing networks where some nodes may become Byzantine faulty. We answer an open problem raised by Tseng and Vaidya, 2012, proposing the first algorithm of optimal resilience for directed networks. Interestingly, our results show that the tight condition on the underlying communication networks for asynchronous Byzant…
▽ More
In this work, we study the approximate consensus problem in asynchronous message-passing networks where some nodes may become Byzantine faulty. We answer an open problem raised by Tseng and Vaidya, 2012, proposing the first algorithm of optimal resilience for directed networks. Interestingly, our results show that the tight condition on the underlying communication networks for asynchronous Byzantine approximate consensus coincides with the tight condition for synchronous Byzantine exact consensus. Our results can be viewed as a non-trivial generalization of the algorithm by Abraham et al., 2004, which applies to the special case of complete networks. The tight condition and techniques identified in the paper shed light on the fundamental properties for solving approximate consensus in asynchronous directed networks.
△ Less
Submitted 20 April, 2020;
originally announced April 2020.
-
A Private and Finite-Time Algorithm for Solving a Distributed System of Linear Equations
Authors:
Shripad Gade,
Ji Liu,
Nitin H. Vaidya
Abstract:
This paper studies a system of linear equations, denoted as $Ax = b$, which is horizontally partitioned (rows in $A$ and $b$) and stored over a network of $m$ devices connected in a fixed directed graph. We design a fast distributed algorithm for solving such a partitioned system of linear equations, that additionally, protects the privacy of local data against an honest-but-curious adversary that…
▽ More
This paper studies a system of linear equations, denoted as $Ax = b$, which is horizontally partitioned (rows in $A$ and $b$) and stored over a network of $m$ devices connected in a fixed directed graph. We design a fast distributed algorithm for solving such a partitioned system of linear equations, that additionally, protects the privacy of local data against an honest-but-curious adversary that corrupts at most $τ$ nodes in the network. First, we present TITAN, privaTe fInite Time Average coNsensus algorithm, for solving a general average consensus problem over directed graphs, while protecting statistical privacy of private local data against an honest-but-curious adversary. Second, we propose a distributed linear system solver that involves each agent/devices computing an update based on local private data, followed by private aggregation using TITAN. Finally, we show convergence of our solver to the least squares solution in finite rounds along with statistical privacy of local linear equations against an honest-but-curious adversary provided the graph has weak vertex-connectivity of at least $τ+1$. We perform numerical experiments to validate our claims and compare our solution to the state-of-the-art methods by comparing computation, communication and memory costs.
△ Less
Submitted 9 April, 2020;
originally announced April 2020.
-
Preserving Statistical Privacy in Distributed Optimization
Authors:
Nirupam Gupta,
Shripad Gade,
Nikhil Chopra,
Nitin H. Vaidya
Abstract:
We present a distributed optimization protocol that preserves statistical privacy of agents' local cost functions against a passive adversary that corrupts some agents in the network. The protocol is a composition of a distributed ``{\em zero-sum}" obfuscation protocol that obfuscates the agents' local cost functions, and a standard non-private distributed optimization method. We show that our pro…
▽ More
We present a distributed optimization protocol that preserves statistical privacy of agents' local cost functions against a passive adversary that corrupts some agents in the network. The protocol is a composition of a distributed ``{\em zero-sum}" obfuscation protocol that obfuscates the agents' local cost functions, and a standard non-private distributed optimization method. We show that our protocol protects the statistical privacy of the agents' local cost functions against a passive adversary that corrupts up to $t$ arbitrary agents as long as the communication network has $(t+1)$-vertex connectivity. The ``{\em zero-sum}" obfuscation protocol preserves the sum of the agents' local cost functions and therefore ensures accuracy of the computed solution.
△ Less
Submitted 29 December, 2020; v1 submitted 2 April, 2020;
originally announced April 2020.
-
Resilience in Collaborative Optimization: Redundant and Independent Cost Functions
Authors:
Nirupam Gupta,
Nitin H. Vaidya
Abstract:
This report considers the problem of Byzantine fault-tolerance in multi-agent collaborative optimization. In this problem, each agent has a local cost function. The goal of a collaborative optimization algorithm is to compute a minimum of the aggregate of the agents' cost functions. We consider the case when a certain number of agents may be Byzantine faulty. Such faulty agents may not follow a pr…
▽ More
This report considers the problem of Byzantine fault-tolerance in multi-agent collaborative optimization. In this problem, each agent has a local cost function. The goal of a collaborative optimization algorithm is to compute a minimum of the aggregate of the agents' cost functions. We consider the case when a certain number of agents may be Byzantine faulty. Such faulty agents may not follow a prescribed algorithm, and they may send arbitrary or incorrect information regarding their local cost functions. A reasonable goal in presence of such faulty agents is to minimize the aggregate cost of the non-faulty agents. In this report, we show that this goal can be achieved if and only if the cost functions of the non-faulty agents have a minimal redundancy property. We present different algorithms that achieve such tolerance against faulty agents, and demonstrate a trade-off between the complexity of an algorithm and the properties of the agents' cost functions.
Further, we also consider the case when the cost functions are independent or do not satisfy the minimal redundancy property. In that case, we quantify the tolerance against faulty agents by introducing a metric called weak resilience. We present an algorithm that attains weak resilience when the faulty agents are in the minority and the cost functions are non-negative.
△ Less
Submitted 31 March, 2020; v1 submitted 21 March, 2020;
originally announced March 2020.
-
Improved Extension Protocols for Byzantine Broadcast and Agreement
Authors:
Kartik Nayak,
Ling Ren,
Elaine Shi,
Nitin H. Vaidya,
Zhuolun Xiang
Abstract:
Byzantine broadcast (BB) and Byzantine agreement (BA) are two most fundamental problems and essential building blocks in distributed computing, and improving their efficiency is of interest to both theoreticians and practitioners. In this paper, we study extension protocols of BB and BA, i.e., protocols that solve BB/BA with long inputs of $l$ bits using lower costs than $l$ single-bit instances.…
▽ More
Byzantine broadcast (BB) and Byzantine agreement (BA) are two most fundamental problems and essential building blocks in distributed computing, and improving their efficiency is of interest to both theoreticians and practitioners. In this paper, we study extension protocols of BB and BA, i.e., protocols that solve BB/BA with long inputs of $l$ bits using lower costs than $l$ single-bit instances. We present new protocols with improved communication complexity in almost all settings: authenticated BA/BB with $t<n/2$, authenticated BB with $t<(1-ε)n$, unauthenticated BA/BB with $t<n/3$, and asynchronous reliable broadcast and BA with $t<n/3$. The new protocols are advantageous and significant in several aspects. First, they achieve the best-possible communication complexity of $Θ(nl)$ for wider ranges of input sizes compared to prior results. Second, the authenticated extension protocols achieve optimal communication complexity given the current best available BB/BA protocols for short messages. Third, to the best of our knowledge, our asynchronous and authenticated protocols in the setting are the first extension protocols in that setting.
△ Less
Submitted 5 October, 2020; v1 submitted 26 February, 2020;
originally announced February 2020.
-
Randomized Reactive Redundancy for Byzantine Fault-Tolerance in Parallelized Learning
Authors:
Nirupam Gupta,
Nitin H. Vaidya
Abstract:
This report considers the problem of Byzantine fault-tolerance in synchronous parallelized learning that is founded on the parallelized stochastic gradient descent (parallelized-SGD) algorithm. The system comprises a master, and $n$ workers, where up to $f$ of the workers are Byzantine faulty. Byzantine workers need not follow the master's instructions correctly, and might send malicious incorrect…
▽ More
This report considers the problem of Byzantine fault-tolerance in synchronous parallelized learning that is founded on the parallelized stochastic gradient descent (parallelized-SGD) algorithm. The system comprises a master, and $n$ workers, where up to $f$ of the workers are Byzantine faulty. Byzantine workers need not follow the master's instructions correctly, and might send malicious incorrect (or faulty) information. The identity of the Byzantine workers remains fixed throughout the learning process, and is unknown a priori to the master. We propose two coding schemes, a deterministic scheme and a randomized scheme, for guaranteeing exact fault-tolerance if $2f < n$. The coding schemes use the concept of reactive redundancy for isolating Byzantine workers that eventually send faulty information. We note that the computation efficiencies of the schemes compare favorably with other (deterministic or randomized) coding schemes, for exact fault-tolerance.
△ Less
Submitted 19 December, 2019;
originally announced December 2019.
-
Exact Byzantine Consensus on Arbitrary Directed Graphs under Local Broadcast Model
Authors:
Muhammad Samir Khan,
Lewis Tseng,
Nitin H. Vaidya
Abstract:
We consider Byzantine consensus in a synchronous system where nodes are connected by a network modeled as a directed graph, i.e., communication links between neighboring nodes are not necessarily bi-directional. The directed graph model is motivated by wireless networks wherein asymmetric communication links can occur. In the classical point-to-point communication model, a message sent on a commun…
▽ More
We consider Byzantine consensus in a synchronous system where nodes are connected by a network modeled as a directed graph, i.e., communication links between neighboring nodes are not necessarily bi-directional. The directed graph model is motivated by wireless networks wherein asymmetric communication links can occur. In the classical point-to-point communication model, a message sent on a communication link is private between the two nodes on the link. This allows a Byzantine faulty node to equivocate, i.e., send inconsistent information to its neighbors. This paper considers the local broadcast model of communication, wherein transmission by a node is received identically by all of its outgoing neighbors. This allows such neighbors to detect a faulty node's attempt to equivocate, effectively depriving the faulty nodes of the ability to send conflicting information to different neighbors.
Prior work has obtained sufficient and necessary conditions on undirected graphs to be able to achieve Byzantine consensus under the local broadcast model. In this paper, we obtain tight conditions on directed graphs to be able to achieve Byzantine consensus with binary inputs under the local broadcast model. The results obtained in the paper provide insights into the trade-off between directionality of communication and the ability to achieve consensus.
△ Less
Submitted 13 November, 2019;
originally announced November 2019.
-
Exact Byzantine Consensus on Undirected Graphs under Local Broadcast Model
Authors:
Muhammad Samir Khan,
Syed Shalan Naqvi,
Nitin H. Vaidya
Abstract:
This paper considers the Byzantine consensus problem for nodes with binary inputs. The nodes are interconnected by a network represented as an undirected graph, and the system is assumed to be synchronous. Under the classical point-to-point communication model, it is well-known [7] that the following two conditions are both necessary and sufficient to achieve Byzantine consensus among $n$ nodes in…
▽ More
This paper considers the Byzantine consensus problem for nodes with binary inputs. The nodes are interconnected by a network represented as an undirected graph, and the system is assumed to be synchronous. Under the classical point-to-point communication model, it is well-known [7] that the following two conditions are both necessary and sufficient to achieve Byzantine consensus among $n$ nodes in the presence of up to $f$ Byzantine faulty nodes: $n \ge 3f+1$ and vertex connectivity at least $2f+1$. In the classical point-to-point communication model, it is possible for a faulty node to equivocate, i.e., transmit conflicting information to different neighbors. Such equivocation is possible because messages sent by a node to one of its neighbors are not overheard by other neighbors.
This paper considers the local broadcast model. In contrast to the point-to-point communication model, in the local broadcast model, messages sent by a node are received identically by all of its neighbors. Thus, under the local broadcast model, attempts by a node to send conflicting information can be detected by its neighbors. Under this model, we show that the following two conditions are both necessary and sufficient for Byzantine consensus: vertex connectivity at least $\lfloor 3f/2 \rfloor + 1$ and minimum node degree at least $2f$. Observe that the local broadcast model results in a lower requirement for connectivity and the number of nodes $n$, as compared to the point-to-point communication model.
We extend the above results to a hybrid model that allows some of the Byzantine faulty nodes to equivocate. The hybrid model bridges the gap between the point-to-point and local broadcast models, and helps to precisely characterize the trade-off between equivocation and network requirements.
△ Less
Submitted 27 May, 2019; v1 submitted 27 March, 2019;
originally announced March 2019.
-
Byzantine Fault Tolerant Distributed Linear Regression
Authors:
Nirupam Gupta,
Nitin H. Vaidya
Abstract:
This paper considers the problem of Byzantine fault tolerance in distributed linear regression in a multi-agent system. However, the proposed algorithms are given for a more general class of distributed optimization problems, of which distributed linear regression is a special case. The system comprises of a server and multiple agents, where each agent is holding a certain number of data points an…
▽ More
This paper considers the problem of Byzantine fault tolerance in distributed linear regression in a multi-agent system. However, the proposed algorithms are given for a more general class of distributed optimization problems, of which distributed linear regression is a special case. The system comprises of a server and multiple agents, where each agent is holding a certain number of data points and responses that satisfy a linear relationship (could be noisy). The objective of the server is to determine this relationship, given that some of the agents in the system (up to a known number) are Byzantine faulty (aka. actively adversarial). We show that the server can achieve this objective, in a deterministic manner, by robustifying the original distributed gradient descent method using norm based filters, namely 'norm filtering' and 'norm-cap filtering', incurring an additional log-linear computation cost in each iteration. The proposed algorithms improve upon the existing methods on three levels: i) no assumptions are required on the probability distribution of data points, ii) system can be partially asynchronous, and iii) the computational overhead (in order to handle Byzantine faulty agents) is log-linear in number of agents and linear in dimension of data points. The proposed algorithms differ from each other in the assumptions made for their correctness, and the gradient filter they use.
△ Less
Submitted 4 April, 2019; v1 submitted 20 March, 2019;
originally announced March 2019.
-
Byzantine Consensus under Local Broadcast Model: Tight Sufficient Condition
Authors:
Muhammad Samir Khan,
Nitin H. Vaidya
Abstract:
In this work we consider Byzantine Consensus on undirected communication graphs under the local broadcast model. In the classical point-to-point communication model the messages exchanged between two nodes $u, v$ on an edge $uv$ of $G$ are private. This allows a faulty node to send conflicting information to its different neighbours, a property called equivocation. In contrast, in the local broadc…
▽ More
In this work we consider Byzantine Consensus on undirected communication graphs under the local broadcast model. In the classical point-to-point communication model the messages exchanged between two nodes $u, v$ on an edge $uv$ of $G$ are private. This allows a faulty node to send conflicting information to its different neighbours, a property called equivocation. In contrast, in the local broadcast communication model considered here, a message sent by node $u$ is received identically by all of its neighbours. This restriction to broadcast messages provides non-equivocation even for faulty nodes. In prior results [10, 11] it was shown that in the local broadcast model the communication graph must be $(\lfloor 3f/2 \rfloor +1)$-connected and have degree at least $2f$ to achieve Byzantine Consensus. In this work we show that this network condition is tight.
△ Less
Submitted 15 January, 2019; v1 submitted 12 January, 2019;
originally announced January 2019.
-
Distributed Learning with Adversarial Agents Under Relaxed Network Condition
Authors:
Pooja Vyavahare,
Lili Su,
Nitin H. Vaidya
Abstract:
This work studies the problem of non-Bayesian learning over multi-agent network when there are some adversarial (faulty) agents in the network. At each time step, each non-faulty agent collects partial information about an unknown state of the world and tries to estimate true state of the world by iteratively sharing information with its neighbors. Existing algorithms in this setting require that…
▽ More
This work studies the problem of non-Bayesian learning over multi-agent network when there are some adversarial (faulty) agents in the network. At each time step, each non-faulty agent collects partial information about an unknown state of the world and tries to estimate true state of the world by iteratively sharing information with its neighbors. Existing algorithms in this setting require that all non-faulty agents in the network should be able to achieve consensus via local information exchange.
In this work, we present an analysis of a distributed algorithm which does not require the network to achieve consensus. We show that if every non-faulty agent can receive enough information (via iteratively communicating with neighbors) to differentiate the true state of the world from other possible states then it can indeed learn the true state.
△ Less
Submitted 7 January, 2019;
originally announced January 2019.
-
Exact Byzantine Consensus Under Local-Broadcast Model
Authors:
Syed Shalan Naqvi,
Muhammad Samir Khan,
Nitin H. Vaidya
Abstract:
This paper considers the problem of achieving exact Byzantine consensus in a synchronous system under a local-broadcast communication model. The nodes communicate with each other via message-passing. The communication network is modeled as an undirected graph, with each vertex representing a node in the system. Under the local-broadcast communication model, when any node transmits a message, all i…
▽ More
This paper considers the problem of achieving exact Byzantine consensus in a synchronous system under a local-broadcast communication model. The nodes communicate with each other via message-passing. The communication network is modeled as an undirected graph, with each vertex representing a node in the system. Under the local-broadcast communication model, when any node transmits a message, all its neighbors in the communication graph receive the message reliably. This communication model is motivated by wireless networks. In this work, we present necessary and sufficient conditions on the underlying communication graph to achieve exact Byzantine consensus under the local-broadcast communication model.
△ Less
Submitted 20 November, 2018;
originally announced November 2018.
-
Optimal Record and Replay under Causal Consistency
Authors:
Russell L. Jones,
Muhammad S. Khan,
Nitin H. Vaidya
Abstract:
We investigate the minimum record needed to replay executions of processes that share causally consistent memory. For a version of causal consistency, we identify optimal records under both offline and online recording setting. Under the offline setting, a central authority has information about every process' view of the execution and can decide what information to record for each process. Under…
▽ More
We investigate the minimum record needed to replay executions of processes that share causally consistent memory. For a version of causal consistency, we identify optimal records under both offline and online recording setting. Under the offline setting, a central authority has information about every process' view of the execution and can decide what information to record for each process. Under the online setting, each process has to decide on the record at runtime as the operations are observed.
△ Less
Submitted 29 October, 2018; v1 submitted 22 May, 2018;
originally announced May 2018.
-
Global Stabilization for Causally Consistent Partial Replication
Authors:
Zhuolun Xiang,
Nitin H. Vaidya
Abstract:
Causally consistent distributed storage systems have received significant attention recently due to the potential for providing high throughput and causality guarantees. {\em Global stabilization} is a technique established for achieving causal consistency in distributed multi-version key-value store systems, adopted by the previous work such as GentleRain \cite{Du2014GentleRainCA} and Cure \cite{…
▽ More
Causally consistent distributed storage systems have received significant attention recently due to the potential for providing high throughput and causality guarantees. {\em Global stabilization} is a technique established for achieving causal consistency in distributed multi-version key-value store systems, adopted by the previous work such as GentleRain \cite{Du2014GentleRainCA} and Cure \cite{akkoorath2016cure}. Intuitively, this approach serializes all updates by their physical time and computes the ``Global Stable Time'' which is a time point $t$ such that versions with timestamp $\leq t$ can be returned to the client without violating causality. However, all previous designs with global stabilization assume {\em full replication}, where each data center stores a full copy of data, and each client is restricted to access servers within one data center. In this paper, we propose a theoretical framework to support {\em general partial replication} with causal consistency via global stabilization, where each server can store an arbitrary subset of the data, and each client is allowed to communicate with any subset of the servers and migrate among them without extra delays. We propose an algorithm that implements causal consistency for distributed multi-version key-value stores with general partially replication. We prove the optimality of the Global Stable Time computation in our algorithm regarding the remote update visibility latency, i.e. how fast update from a remote server is visible to the client, under general partial replication. We also provide trade-offs to further optimize the remote update visibility by introducing extra delays during client's migration. Simulation results on the performance of our algorithm compared to the previous work are also provided.
△ Less
Submitted 6 May, 2019; v1 submitted 14 March, 2018;
originally announced March 2018.
-
Effects of Topology Knowledge and Relay Depth on Asynchronous Consensus
Authors:
Dimitris Sakavalas,
Lewis Tseng,
Nitin H. Vaidya
Abstract:
Consider a point-to-point message-passing network. We are interested in the asynchronous crash-tolerant consensus problem in incomplete networks. We study the feasibility and efficiency of approximate consensus under different restrictions on topology knowledge and the relay depth, i.e., the maximum number of hops any message can be relayed. These two constraints are common in large-scale networks…
▽ More
Consider a point-to-point message-passing network. We are interested in the asynchronous crash-tolerant consensus problem in incomplete networks. We study the feasibility and efficiency of approximate consensus under different restrictions on topology knowledge and the relay depth, i.e., the maximum number of hops any message can be relayed. These two constraints are common in large-scale networks, and are used to avoid memory overload and network congestion respectively. Specifically, for different values of integers k, k , we consider that each node knows all its neighbors of at most k-hop distance (k-hop topology knowledge), and the relay depth is k . We consider both directed and undirected graphs. More concretely, we answer the following main question in asynchronous systems:
What is a tight condition on the underlying communication graphs for achieving approximate consensus if each node has only a k-hop topology knowledge and relay depth k?
To prove that the necessary conditions presented in the paper are also sufficient, we have developed algorithms that achieve consensus in graphs satisfying those conditions:
-The first class of algorithms requires k-hop topology knowledge and relay depth k. Unlike prior algorithms, these algorithms do not flood the network, and each node does not need the full topology knowledge. We show how the convergence time and the message complexity of those algorithms is affected by k, providing the respective upper bounds.
-The second set of algorithms requires only one-hop neighborhood knowledge, i.e., immediate incoming and outgoing neighbors, but needs to flood the network (i.e., relay depth is n, where n is the number of nodes). One result that may be of independent interest is a topology discovery mechanism to learn and "estimate" the topology in asynchronous directed networks with crash faults.
△ Less
Submitted 21 May, 2018; v1 submitted 12 March, 2018;
originally announced March 2018.
-
Private Learning on Networks: Part II
Authors:
Shripad Gade,
Nitin H. Vaidya
Abstract:
This paper considers a distributed multi-agent optimization problem, with the global objective consisting of the sum of local objective functions of the agents. The agents solve the optimization problem using local computation and communication between adjacent agents in the network. We present two randomized iterative algorithms for distributed optimization. To improve privacy, our algorithms add…
▽ More
This paper considers a distributed multi-agent optimization problem, with the global objective consisting of the sum of local objective functions of the agents. The agents solve the optimization problem using local computation and communication between adjacent agents in the network. We present two randomized iterative algorithms for distributed optimization. To improve privacy, our algorithms add "structured" randomization to the information exchanged between the agents. We prove deterministic correctness (in every execution) of the proposed algorithms despite the information being perturbed by noise with non-zero mean. We prove that a special case of a proposed algorithm (called function sharing) preserves privacy of individual polynomial objective functions under a suitable connectivity condition on the network topology.
△ Less
Submitted 5 November, 2017; v1 submitted 27 March, 2017;
originally announced March 2017.
-
Partially Replicated Causally Consistent Shared Memory: Lower Bounds and An Algorithm
Authors:
Zhuolun Xiang,
Nitin H. Vaidya
Abstract:
The focus of this paper is on causal consistency in a {\em partially replicated} distributed shared memory (DSM) system that provides the abstraction of shared read/write registers. Maintaining causal consistency in distributed shared memory systems has received significant attention in the past, mostly on {\em full replication} wherein each replica stores a copy of all the registers in the shared…
▽ More
The focus of this paper is on causal consistency in a {\em partially replicated} distributed shared memory (DSM) system that provides the abstraction of shared read/write registers. Maintaining causal consistency in distributed shared memory systems has received significant attention in the past, mostly on {\em full replication} wherein each replica stores a copy of all the registers in the shared memory. To ensure causal consistency, all causally preceding updates must be performed before an update is performed at any given replica. Therefore, some mechanism for tracking causal dependencies is required, such as vector timestamps with the number of vector elements being equal to the number of replicas in the context of full replication. In this paper, we investigate causal consistency in {\em partially replicated systems}, wherein each replica may store only a subset of the shared registers. Building on the past work, this paper makes three key contributions: 1. We present a necessary condition on the metadata (which we refer as a {\em timestamp}) that must be maintained by each replica to be able to track causality accurately. The necessary condition identifies a set of directed edges in a {\em share graph} that a replica's timestamp must keep track of. 2. We present an algorithm for achieving causal consistency using a timestamp that matches the above necessary condition, thus showing that the condition is necessary and sufficient. 3. We define a measurement of timestamp space size and present a lower bound (in bits) on the size of the timestamps. The lower bound matches our algorithm in several special cases.
△ Less
Submitted 29 May, 2019; v1 submitted 15 March, 2017;
originally announced March 2017.
-
Private Learning on Networks
Authors:
Shripad Gade,
Nitin H. Vaidya
Abstract:
Continual data collection and widespread deployment of machine learning algorithms, particularly the distributed variants, have raised new privacy challenges. In a distributed machine learning scenario, the dataset is stored among several machines and they solve a distributed optimization problem to collectively learn the underlying model. We present a secure multi-party computation inspired priva…
▽ More
Continual data collection and widespread deployment of machine learning algorithms, particularly the distributed variants, have raised new privacy challenges. In a distributed machine learning scenario, the dataset is stored among several machines and they solve a distributed optimization problem to collectively learn the underlying model. We present a secure multi-party computation inspired privacy preserving distributed algorithm for optimizing a convex function consisting of several possibly non-convex functions. Each individual objective function is privately stored with an agent while the agents communicate model parameters with neighbor machines connected in a network. We show that our algorithm can correctly optimize the overall objective function and learn the underlying model accurately. We further prove that under a vertex connectivity condition on the topology, our algorithm preserves privacy of individual objective functions. We establish limits on the what a coalition of adversaries can learn by observing the messages and states shared over a network.
△ Less
Submitted 15 December, 2016;
originally announced December 2016.
-
Timestamps for Partial Replication
Authors:
Zhuolun Xiang,
Nitin H. Vaidya
Abstract:
Maintaining causal consistency in distributed shared memory systems using vector timestamps has received a lot of attention from both theoretical and practical prospective. However, most of the previous literature focuses on full replication where each data is stored in all replicas, which may not be scalable due to the increasing amount of data. In this report, we investigate how to achieve causa…
▽ More
Maintaining causal consistency in distributed shared memory systems using vector timestamps has received a lot of attention from both theoretical and practical prospective. However, most of the previous literature focuses on full replication where each data is stored in all replicas, which may not be scalable due to the increasing amount of data. In this report, we investigate how to achieve causal consistency in partial replicated systems, where each replica may store different set of data. We propose an algorithm that tracks causal dependencies via vector timestamp in client-server model for partial replication. The cost of our algorithm in terms of timestamps size varies as a function of the manner in which the replicas share data, and the set of replicas accessed by each client. We also establish a connection between our algorithm with the previous work on full replication.
△ Less
Submitted 26 December, 2016; v1 submitted 12 November, 2016;
originally announced November 2016.
-
Multiversion Altruistic Locking
Authors:
Chinmay Chandak,
Hrishikesh Vaidya,
Sathya Peri
Abstract:
This paper builds on altruistic locking which is an extension of 2PL. It allows more relaxed rules as compared to 2PL. But altruistic locking too enforces some rules which disallow some valid schedules (present in VSR and CSR) to be passed by AL. This paper proposes a multiversion variant of AL which solves this problem. The report also discusses the relationship or comparison between different pr…
▽ More
This paper builds on altruistic locking which is an extension of 2PL. It allows more relaxed rules as compared to 2PL. But altruistic locking too enforces some rules which disallow some valid schedules (present in VSR and CSR) to be passed by AL. This paper proposes a multiversion variant of AL which solves this problem. The report also discusses the relationship or comparison between different protocols such as MAL and MV2PL, MAL and AL, MAL and 2PL and so on. This paper also discusses the caveats involved in MAL and where it lies in the Venn diagram of multiversion serializable schedule protocols. Finally, the possible use of MAL in hybrid protocols and the parameters involved in making MAL successful are discussed.
△ Less
Submitted 30 October, 2016; v1 submitted 28 August, 2016;
originally announced August 2016.
-
Distributed Optimization of Convex Sum of Non-Convex Functions
Authors:
Shripad Gade,
Nitin H. Vaidya
Abstract:
We present a distributed solution to optimizing a convex function composed of several non-convex functions. Each non-convex function is privately stored with an agent while the agents communicate with neighbors to form a network. We show that coupled consensus and projected gradient descent algorithm proposed in [1] can optimize convex sum of non-convex functions under an additional assumption on…
▽ More
We present a distributed solution to optimizing a convex function composed of several non-convex functions. Each non-convex function is privately stored with an agent while the agents communicate with neighbors to form a network. We show that coupled consensus and projected gradient descent algorithm proposed in [1] can optimize convex sum of non-convex functions under an additional assumption on gradient Lipschitzness. We further discuss the applications of this analysis in improving privacy in distributed optimization.
△ Less
Submitted 18 August, 2016;
originally announced August 2016.
-
Distributed Optimization for Client-Server Architecture with Negative Gradient Weights
Authors:
Shripad Gade,
Nitin H. Vaidya
Abstract:
Availability of both massive datasets and computing resources have made machine learning and predictive analytics extremely pervasive. In this work we present a synchronous algorithm and architecture for distributed optimization motivated by privacy requirements posed by applications in machine learning. We present an algorithm for the recently proposed multi-parameter-server architecture. We cons…
▽ More
Availability of both massive datasets and computing resources have made machine learning and predictive analytics extremely pervasive. In this work we present a synchronous algorithm and architecture for distributed optimization motivated by privacy requirements posed by applications in machine learning. We present an algorithm for the recently proposed multi-parameter-server architecture. We consider a group of parameter servers that learn a model based on randomized gradients received from clients. Clients are computational entities with private datasets (inducing a private objective function), that evaluate and upload randomized gradients to the parameter servers. The parameter servers perform model updates based on received gradients and share the model parameters with other servers. We prove that the proposed algorithm can optimize the overall objective function for a very general architecture involving $C$ clients connected to $S$ parameter servers in an arbitrary time varying topology and the parameter servers forming a connected network.
△ Less
Submitted 19 December, 2016; v1 submitted 12 August, 2016;
originally announced August 2016.
-
Defending Non-Bayesian Learning against Adversarial Attacks
Authors:
Lili Su,
Nitin H. Vaidya
Abstract:
This paper addresses the problem of non-Bayesian learning over multi-agent networks, where agents repeatedly collect partially informative observations about an unknown state of the world, and try to collaboratively learn the true state. We focus on the impact of the adversarial agents on the performance of consensus-based non-Bayesian learning, where non-faulty agents combine local learning updat…
▽ More
This paper addresses the problem of non-Bayesian learning over multi-agent networks, where agents repeatedly collect partially informative observations about an unknown state of the world, and try to collaboratively learn the true state. We focus on the impact of the adversarial agents on the performance of consensus-based non-Bayesian learning, where non-faulty agents combine local learning updates with consensus primitives. In particular, we consider the scenario where an unknown subset of agents suffer Byzantine faults -- agents suffering Byzantine faults behave arbitrarily. Two different learning rules are proposed.
△ Less
Submitted 28 June, 2016;
originally announced June 2016.
-
Efficient Timestamps for Capturing Causality
Authors:
Nitin H. Vaidya,
Sandeep S. Kulkarni
Abstract:
Consider an asynchronous system consisting of processes that communicate via message-passing. The processes communicate over a potentially {\em incomplete} communication network consisting of reliable bidirectional communication channels. Thus, not every pair of processes is necessarily able to communicate with each other directly. % For instance, when the communication network is a {\em star} gra…
▽ More
Consider an asynchronous system consisting of processes that communicate via message-passing. The processes communicate over a potentially {\em incomplete} communication network consisting of reliable bidirectional communication channels. Thus, not every pair of processes is necessarily able to communicate with each other directly. % For instance, when the communication network is a {\em star} graph, there is a {\em central} process % that can communicate with all the remaining processes (which are called {\em radial} processes), % but the radial processes cannot communicate with each other directly.
The goal of the algorithms discussed in this paper is to assign timestamps to the events at all the processes such that (a) distinct events are assigned distinct timestamps, and (b) the happened-before relationship between the events can be inferred from the timestamps. We consider three types of algorithms for assigning timestamps to events: (i) Online algorithms that must (greedily) assign a timestamp to each event when the event occurs. (ii) Offline algorithms that assign timestamps to event after a finite execution is complete. (iii) Inline algorithms that assign a timestamp to each event when it occurs, but may modify some elements of a timestamp again at a later time.
For specific classes of graphs, particularly {\em star} graphs and graphs with connectivity $\geq 1$, the paper presents bounds on the length of vector timestamps assigned by an {\em online} algorithm. The paper then presents an {\em inline} algorithm, which typically assigns substantially smaller timestamps than the optimal-length {\em online} vector timestamps. In particular, the inline algorithm assigns timestamp in the form of a tuple containing $2c+2$ integer elements, where $c$ is the size of the vertex cover for the underlying communication graph.
△ Less
Submitted 19 June, 2016;
originally announced June 2016.
-
Asynchronous Distributed Hypothesis Testing in the Presence of Crash Failures
Authors:
Lili Su,
Nitin H. Vaidya
Abstract:
This paper addresses the problem of distributed hypothesis testing in multi-agent networks, where agents repeatedly collect local observations about an unknown state of the world, and try to collaboratively detect the true state through information exchange. We focus on the impact of failures and asynchrony (two fundamental factors in distributed systems) on the performance of consensus-based non-…
▽ More
This paper addresses the problem of distributed hypothesis testing in multi-agent networks, where agents repeatedly collect local observations about an unknown state of the world, and try to collaboratively detect the true state through information exchange. We focus on the impact of failures and asynchrony (two fundamental factors in distributed systems) on the performance of consensus-based non-Bayesian learning. In particular, we consider the scenario where the networked agents may suffer crash faults, and messages delay can be arbitrarily long but finite. We identify the minimal global detectability of the network for non-Bayesian rule to succeed. In addition, we obtain a generalization of a celebrated result by Wolfowitz and Hajnal to submatrices, which might be of independent interest.
△ Less
Submitted 4 February, 2016;
originally announced June 2016.
-
Relaxed Byzantine Vector Consensus
Authors:
Zhuolun Xiang,
Nitin H. Vaidya
Abstract:
Exact Byzantine consensus problem requires that non-faulty processes reach agreement on a decision (or output) that is in the convex hull of the inputs at the non-faulty processes. It is well-known that exact consensus is impossible in an asynchronous system in presence of faults, and in a synchronous system, n>=3f+1 is tight on the number of processes to achieve exact Byzantine consensus with sca…
▽ More
Exact Byzantine consensus problem requires that non-faulty processes reach agreement on a decision (or output) that is in the convex hull of the inputs at the non-faulty processes. It is well-known that exact consensus is impossible in an asynchronous system in presence of faults, and in a synchronous system, n>=3f+1 is tight on the number of processes to achieve exact Byzantine consensus with scalar inputs, in presence of up to f Byzantine faulty processes. Recent work has shown that when the inputs are d-dimensional vectors of reals, n>=max(3f+1,(d+1)f+1) is tight to achieve exact Byzantine consensus in synchronous systems, and n>= (d+2)f+1 for approximate Byzantine consensus in asynchronous systems.
Due to the dependence of the lower bound on vector dimension d, the number of processes necessary becomes large when the vector dimension is large. With the hope of reducing the lower bound on n, we consider two relaxed versions of Byzantine vector consensus: k-Relaxed Byzantine vector consensus and (delta,p)-Relaxed Byzantine vector consensus. In k-relaxed consensus, the validity condition requires that the output must be in the convex hull of projection of the inputs onto any subset of k-dimensions of the vectors. For (delta,p)-consensus the validity condition requires that the output must be within distance delta of the convex hull of the inputs of the non-faulty processes, where L_p norm is used as the distance metric. For (delta,p)-consensus, we consider two versions: in one version, delta is a constant, and in the second version, delta is a function of the inputs themselves.
We show that for k-relaxed consensus and (delta,p)-consensus with constant delta>=0, the bound on n is identical to the bound stated above for the original vector consensus problem. On the other hand, when delta depends on the inputs, we show that the bound on n is smaller when d>=3.
△ Less
Submitted 29 January, 2016;
originally announced January 2016.
-
Fault-Tolerant Distributed Optimization (Part IV): Constrained Optimization with Arbitrary Directed Networks
Authors:
Lili Su,
Nitin H. Vaidya
Abstract:
We study the problem of constrained distributed optimization in multi-agent networks when some of the computing agents may be faulty. In this problem, the system goal is to have all the non-faulty agents collectively minimize a global objective given by weighted average of local cost functions, each of which is initially known to a non-faulty agent only. In particular, we are interested in the sce…
▽ More
We study the problem of constrained distributed optimization in multi-agent networks when some of the computing agents may be faulty. In this problem, the system goal is to have all the non-faulty agents collectively minimize a global objective given by weighted average of local cost functions, each of which is initially known to a non-faulty agent only. In particular, we are interested in the scenario when the computing agents are connected by an arbitrary directed communication network, some of the agents may suffer from crash faults or Byzantine faults, and the estimate of each agent is restricted to lie in a common constraint set. This problem finds its applications in social computing and distributed large-scale machine learning.
The fault-tolerant multi-agent optimization problem was first formulated by Su and Vaidya, and is solved when the local functions are defined over the whole real line, and the networks are fully-connected. In this report, we consider arbitrary directed communication networks and focus on the scenario where, local estimates at the non-faulty agents are constrained, and only local communication and minimal memory carried across iterations are allowed. In particular, we generalize our previous results on fully-connected networks and unconstrained optimization to arbitrary directed networks and constrained optimization. As a byproduct, we provide a matrix representation for iterative approximate crash consensus. The matrix representation allows us to characterize the convergence rate for crash iterative consensus.
△ Less
Submitted 5 November, 2015;
originally announced November 2015.
-
Application-Aware Consistency: An Application to Social Network
Authors:
Lewis Tseng,
Alec Benzer,
Nitin H. Vaidya
Abstract:
This work weakens well-known consistency models using graphs that capture applications' characteristics. The weakened models not only respect application semantic, but also yield a performance benefit. We introduce a notion of dependency graphs, which specify relations between events that are important with respect to application semantic, and then weaken traditional consistency models (e.g., caus…
▽ More
This work weakens well-known consistency models using graphs that capture applications' characteristics. The weakened models not only respect application semantic, but also yield a performance benefit. We introduce a notion of dependency graphs, which specify relations between events that are important with respect to application semantic, and then weaken traditional consistency models (e.g., causal consistency) using these graphs. Particularly, we consider two types of graphs: intra-process and inter-process dependency graphs, where intra-process dependency graphs specify how events in a single process are related, and inter-process dependency graphs specify how events across multiple processes are related. Then, based on these two types of graphs, we define new consistency model, namely {\em application-aware} consistency. We also discuss why such application-aware consistency can be useful in social network applications.
This is a work in progress, and we present early ideas regarding application-aware consistency here.
△ Less
Submitted 6 October, 2015; v1 submitted 15 February, 2015;
originally announced February 2015.
-
Iterative Byzantine Vector Consensus in Incomplete Graphs
Authors:
Nitin H. Vaidya
Abstract:
This work addresses Byzantine vector consensus (BVC), wherein the input at each process is a d-dimensional vector of reals, and each process is expected to decide on a decision vector that is in the convex hull of the input vectors at the fault-free processes [3, 8]. The input vector at each process may also be viewed as a point in the d-dimensional Euclidean space R^d, where d > 0 is a finite int…
▽ More
This work addresses Byzantine vector consensus (BVC), wherein the input at each process is a d-dimensional vector of reals, and each process is expected to decide on a decision vector that is in the convex hull of the input vectors at the fault-free processes [3, 8]. The input vector at each process may also be viewed as a point in the d-dimensional Euclidean space R^d, where d > 0 is a finite integer. Recent work [3, 8] has addressed Byzantine vector consensus in systems that can be modeled by a complete graph. This paper considers Byzantine vector consensus in incomplete graphs. In particular, we address a particular class of iterative algorithms in incomplete graphs, and prove a necessary condition, and a sufficient condition, for the graphs to be able to solve the vector consensus problem iteratively. We present an iterative Byzantine vector consensus algorithm, and prove it correct under the sufficient condition. The necessary condition presented in this paper for vector consensus does not match with the sufficient condition for d > 1; thus, a weaker condition may potentially suffice for Byzantine vector consensus.
△ Less
Submitted 9 July, 2013;
originally announced July 2013.
-
Byzantine Vector Consensus in Complete Graphs
Authors:
Nitin H. Vaidya,
Vijay K. Garg
Abstract:
Consider a network of n processes each of which has a d-dimensional vector of reals as its input. Each process can communicate directly with all the processes in the system; thus the communication network is a complete graph. All the communication channels are reliable and FIFO (first-in-first-out). The problem of Byzantine vector consensus (BVC) requires agreement on a d-dimensional vector that i…
▽ More
Consider a network of n processes each of which has a d-dimensional vector of reals as its input. Each process can communicate directly with all the processes in the system; thus the communication network is a complete graph. All the communication channels are reliable and FIFO (first-in-first-out). The problem of Byzantine vector consensus (BVC) requires agreement on a d-dimensional vector that is in the convex hull of the d-dimensional input vectors at the non-faulty processes. We obtain the following results for Byzantine vector consensus in complete graphs while tolerating up to f Byzantine failures:
* We prove that in a synchronous system, n >= max(3f+1, (d+1)f+1) is necessary and sufficient for achieving Byzantine vector consensus.
* In an asynchronous system, it is known that exact consensus is impossible in presence of faulty processes. For an asynchronous system, we prove that n >= (d+2)f+1 is necessary and sufficient to achieve approximate Byzantine vector consensus.
Our sufficiency proofs are constructive. We show sufficiency by providing explicit algorithms that solve exact BVC in synchronous systems, and approximate BVC in asynchronous systems.
We also obtain tight bounds on the number of processes for achieving BVC using algorithms that are restricted to a simpler communication pattern.
△ Less
Submitted 11 February, 2013;
originally announced February 2013.
-
Parameter-independent Iterative Approximate Byzantine Consensus
Authors:
Lewis Tseng,
Nitin H. Vaidya
Abstract:
In this work, we explore iterative approximate Byzantine consensus algorithms that do not make explicit use of the global parameter of the graph, i.e., the upper-bound on the number of faults, f.
In this work, we explore iterative approximate Byzantine consensus algorithms that do not make explicit use of the global parameter of the graph, i.e., the upper-bound on the number of faults, f.
△ Less
Submitted 23 August, 2012;
originally announced August 2012.