-
Stability of Decentralized Gradient Descent in Open Multi-Agent Systems
Authors:
Julien M. Hendrickx,
Michael G. Rabbat
Abstract:
The aim of decentralized gradient descent (DGD) is to minimize a sum of $n$ functions held by interconnected agents. We study the stability of DGD in open contexts where agents can join or leave the system, resulting each time in the addition of their function to, or its removal from, the global objective. Assuming all functions are smooth, strongly convex, and their minimizers all lie in a given ball, we characterize the sensitivity of the global minimizer of the sum of these functions to the removal or addition of a function and provide bounds in $O\left(\min\left(\kappa^{0.5}, \kappa/n^{0.5}, \kappa^{1.5}/n\right)\right)$, where $\kappa$ is the condition number. We also show that the states of all agents can be eventually bounded independently of the sequence of arrivals and departures. The magnitude of the bound scales with the importance of the interconnection, which also determines the accuracy of the final solution in the absence of arrivals and departures, thus exposing a potential trade-off between accuracy and sensitivity. Our analysis relies on the formulation of DGD as gradient descent on an auxiliary function. The tightness of our results is analyzed using the PESTO Toolbox.
Submitted 11 September, 2020;
originally announced September 2020.
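The DGD iteration described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' code. It assumes scalar quadratic local costs $f_i(x) = \frac{1}{2} a_i (x - b_i)^2$, a ring of four agents with lazy mixing weights, and a constant step size; the names `dgd`, `ring_weights`, `a`, `b`, and `alpha` are hypothetical. With a constant step size the agents settle near, but not exactly at, the global minimizer, which is the accuracy limitation the abstract refers to.

```python
def ring_weights(n):
    """Symmetric, doubly stochastic mixing matrix for a ring (assumed topology)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 0.5
        W[i][(i - 1) % n] += 0.25
        W[i][(i + 1) % n] += 0.25
    return W


def dgd(a, b, W, alpha, steps):
    """Run DGD on local costs f_i(x) = 0.5 * a[i] * (x - b[i])**2."""
    n = len(a)
    x = [0.0] * n
    for _ in range(steps):
        # Mixing step: each agent takes a weighted average of neighbour states.
        mixed = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Local gradient step, evaluated at the agent's own previous state.
        x = [mixed[i] - alpha * a[i] * (x[i] - b[i]) for i in range(n)]
    return x


# Hypothetical problem data: curvatures a_i and local minimizers b_i.
a = [1.0, 2.0, 1.0, 2.0]
b = [0.0, 1.0, 2.0, 3.0]

# Minimizer of the global objective sum_i f_i, known in closed form here.
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)

x = dgd(a, b, ring_weights(4), alpha=0.01, steps=3000)
print("global minimizer:", x_star)
print("agent states:    ", x)
```

Removing an entry from `a` and `b` (and rebuilding the weights for the smaller ring) changes `x_star`, which gives a small-scale view of the sensitivity question studied in the paper.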
-
Network Topology and Communication-Computation Tradeoffs in Decentralized Optimization
Authors:
Angelia Nedić,
Alex Olshevsky,
Michael G. Rabbat
Abstract:
In decentralized optimization, nodes cooperate to minimize an overall objective function that is the sum (or average) of per-node private objective functions. Algorithms interleave local computations with communication among all or a subset of the nodes. Motivated by a variety of applications (distributed estimation in sensor networks, fitting models to massive data sets, and distributed control of multi-robot systems, to name a few), significant advances have been made towards the development of robust, practical algorithms with theoretical performance guarantees. This paper presents an overview of recent work in this area. In general, rates of convergence depend not only on the number of nodes involved and the desired level of accuracy, but also on the structure and nature of the network over which nodes communicate (e.g., whether links are directed or undirected, static or time-varying). We survey the state-of-the-art algorithms and their analyses tailored to these different scenarios, highlighting the role of the network topology.
Submitted 15 January, 2018; v1 submitted 25 September, 2017;
originally announced September 2017.
-
On Reconstructability of Quadratic Utility Functions from the Iterations in Gradient Methods
Authors:
Farhad Farokhi,
Iman Shames,
Michael G. Rabbat,
Mikael Johansson
Abstract:
In this paper, we consider a scenario where an eavesdropper can read the content of messages transmitted over a network. The nodes in the network are running a gradient algorithm to optimize a quadratic utility function, where this utility optimization is part of a decision-making process by an administrator. We are interested in understanding the conditions under which the eavesdropper can reconstruct the utility function or a scaled version of it and, as a result, gain insight into the decision-making process. We establish that if the parameter of the gradient algorithm, i.e., the step size, is chosen appropriately, the task of reconstruction becomes practically impossible for a class of Bayesian filters with uniform priors. We establish what step-size rules should be employed to ensure this.
Submitted 17 September, 2015;
originally announced September 2015.
-
On the Convergence of Alternating Direction Lagrangian Methods for Nonconvex Structured Optimization Problems
Authors:
Sindri Magnússon,
Pradeep Chathuranga Weeraddana,
Michael G. Rabbat,
Carlo Fischione
Abstract:
Nonconvex and structured optimization problems arise in many engineering applications that demand scalable and distributed solution methods. The study of the convergence properties of these methods is in general difficult due to the nonconvexity of the problem. In this paper, two distributed solution methods that combine the fast convergence properties of augmented Lagrangian-based methods with the separability properties of alternating optimization are investigated. The first method is adapted from the classic quadratic penalty function method and is called the Alternating Direction Penalty Method (ADPM). Unlike the original quadratic penalty function method, in which single-step optimizations are adopted, ADPM uses an alternating optimization, which in turn makes it scalable. The second method is the well-known Alternating Direction Method of Multipliers (ADMM). It is shown that ADPM for nonconvex problems asymptotically converges to a primal feasible point under mild conditions, and an additional condition ensuring that it asymptotically reaches the standard first-order necessary conditions for local optimality is introduced. In the case of ADMM, novel sufficient conditions under which the algorithm asymptotically reaches the standard first-order necessary conditions are established. Based on this, complete convergence of ADMM for a class of low-dimensional problems is characterized. Finally, the results are illustrated by applying ADPM and ADMM to a nonconvex localization problem in wireless sensor networks.
Submitted 30 April, 2015; v1 submitted 29 September, 2014;
originally announced September 2014.
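For orientation, the textbook form of the ADMM iteration mentioned in this abstract is written out below. The problem template $\min f(x) + g(z)$ subject to $Ax + Bz = c$, the penalty parameter $\rho$, and the multiplier $y$ are assumptions taken from the standard presentation of ADMM; the paper's own formulation for nonconvex problems may differ.

```latex
% Augmented Lagrangian for: minimize f(x) + g(z) subject to Ax + Bz = c
L_\rho(x, z, y) = f(x) + g(z) + y^\top (Ax + Bz - c)
                  + \frac{\rho}{2} \| Ax + Bz - c \|_2^2

% One ADMM iteration: alternate over x and z, then update the multiplier
x^{k+1} = \arg\min_x \; L_\rho(x, z^k, y^k)
z^{k+1} = \arg\min_z \; L_\rho(x^{k+1}, z, y^k)
y^{k+1} = y^k + \rho \, (A x^{k+1} + B z^{k+1} - c)
```

The alternation over $x$ and $z$ is what provides the separability the abstract refers to, while the quadratic term and the multiplier update come from the augmented Lagrangian method.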
-
Efficient Distributed Online Prediction and Stochastic Optimization with Approximate Distributed Averaging
Authors:
Konstantinos I. Tsianos,
Michael G. Rabbat
Abstract:
We study distributed methods for online prediction and stochastic optimization. Our approach is iterative: in each round nodes first perform local computations and then communicate in order to aggregate information and synchronize their decision variables. Synchronization is accomplished through the use of a distributed averaging protocol. When an exact distributed averaging protocol is used, it is known that the optimal regret bound of $\mathcal{O}(\sqrt{m})$ can be achieved using the distributed mini-batch algorithm of Dekel et al. (2012), where $m$ is the total number of samples processed across the network. We focus on methods using approximate distributed averaging protocols and show that the optimal regret bound can also be achieved in this setting. In particular, we propose a gossip-based optimization method which achieves the optimal regret bound. The amount of communication required depends on the network topology through the second largest eigenvalue of the transition matrix of a random walk on the network. In the setting of stochastic optimization, the proposed gossip-based approach achieves nearly-linear scaling: the optimization error is guaranteed to be no more than $\epsilon$ after $\mathcal{O}(\frac{1}{n \epsilon^2})$ rounds, each of which involves $\mathcal{O}(\log n)$ gossip iterations, when nodes communicate over a well-connected graph. This scaling law is also observed in numerical experiments on a cluster.
Submitted 5 March, 2014; v1 submitted 3 March, 2014;
originally announced March 2014.
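The approximate distributed averaging step that this abstract builds on can be illustrated with a small synchronous gossip sketch. It is not the authors' protocol. It assumes a ring of eight nodes and a lazy random-walk weighting (one half on the node's own value, one quarter on each neighbour); `gossip_round` and `gossip_average` are hypothetical names. Each round applies the transition matrix once, so the distance to the true average shrinks at a rate governed by that matrix's second largest eigenvalue.

```python
def gossip_round(values):
    """One synchronous averaging round on a ring (assumed topology and weights)."""
    n = len(values)
    return [
        0.5 * values[i] + 0.25 * values[(i - 1) % n] + 0.25 * values[(i + 1) % n]
        for i in range(n)
    ]


def gossip_average(values, rounds):
    """Apply a fixed number of gossip rounds; the result only approximates the mean."""
    for _ in range(rounds):
        values = gossip_round(values)
    return values


# Hypothetical initial node values.
initial = [float(i) for i in range(8)]
target = sum(initial) / len(initial)

approx = gossip_average(initial, rounds=100)
max_error = max(abs(v - target) for v in approx)
print("true average:", target)
print("largest deviation after 100 rounds:", max_error)
```

Because the weights are doubly stochastic, the sum of the node values is preserved in every round, so only the disagreement between nodes decays. A ring is poorly connected; on a well-connected graph far fewer rounds are needed, which is the regime the abstract describes.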