-
Efficient Distributed Online Prediction and Stochastic Optimization with Approximate Distributed Averaging
Authors:
Konstantinos I. Tsianos,
Michael G. Rabbat
Abstract:
We study distributed methods for online prediction and stochastic optimization. Our approach is iterative: in each round nodes first perform local computations and then communicate in order to aggregate information and synchronize their decision variables. Synchronization is accomplished through the use of a distributed averaging protocol. When an exact distributed averaging protocol is used, it i…
▽ More
We study distributed methods for online prediction and stochastic optimization. Our approach is iterative: in each round nodes first perform local computations and then communicate in order to aggregate information and synchronize their decision variables. Synchronization is accomplished through the use of a distributed averaging protocol. When an exact distributed averaging protocol is used, it is known that the optimal regret bound of $\mathcal{O}(\sqrt{m})$ can be achieved using the distributed mini-batch algorithm of Dekel et al. (2012), where $m$ is the total number of samples processed across the network. We focus on methods using approximate distributed averaging protocols and show that the optimal regret bound can also be achieved in this setting. In particular, we propose a gossip-based optimization method which achieves the optimal regret bound. The amount of communication required depends on the network topology through the second largest eigenvalue of the transition matrix of a random walk on the network. In the setting of stochastic optimization, the proposed gossip-based approach achieves nearly-linear scaling: the optimization error is guaranteed to be no more than $ε$ after $\mathcal{O}(\frac{1}{n ε^2})$ rounds, each of which involves $\mathcal{O}(\log n)$ gossip iterations, when nodes communicate over a well-connected graph. This scaling law is also observed in numerical experiments on a cluster.
△ Less
Submitted 5 March, 2014; v1 submitted 3 March, 2014;
originally announced March 2014.
-
Communication/Computation Tradeoffs in Consensus-Based Distributed Optimization
Authors:
Konstantinos I. Tsianos,
Sean Lawlor,
Michael G. Rabbat
Abstract:
We study the scalability of consensus-based distributed optimization algorithms by considering two questions: How many processors should we use for a given problem, and how often should they communicate when communication is not free? Central to our analysis is a problem-specific value $r$ which quantifies the communication/computation tradeoff. We show that organizing the communication among node…
▽ More
We study the scalability of consensus-based distributed optimization algorithms by considering two questions: How many processors should we use for a given problem, and how often should they communicate when communication is not free? Central to our analysis is a problem-specific value $r$ which quantifies the communication/computation tradeoff. We show that organizing the communication among nodes as a $k$-regular expander graph (Reingold, Vadhan, and Wigderson, 2002) yields speedups, while when all pairs of nodes communicate (as in a complete graph), there is an optimal number of processors that depends on $r$. Surprisingly, a speedup can be obtained, in terms of the time to reach a fixed level of accuracy, by communicating less and less frequently as the computation progresses. Experiments on a real cluster solving metric learning and non-smooth convex minimization tasks demonstrate strong agreement between theory and practice.
△ Less
Submitted 5 September, 2012;
originally announced September 2012.
-
The Impact of Communication Delays on Distributed Consensus Algorithms
Authors:
Konstantinos I. Tsianos,
Michael G. Rabbat
Abstract:
We study the effect of communication delays on distributed consensus algorithms. Two ways to model delays on a network are presented. The first model assumes that each link delivers messages with a fixed (constant) amount of delay, and the second model is more realistic, allowing for i.i.d. time-varying bounded delays. In contrast to previous work studying the effects of delays on consensus algori…
▽ More
We study the effect of communication delays on distributed consensus algorithms. Two ways to model delays on a network are presented. The first model assumes that each link delivers messages with a fixed (constant) amount of delay, and the second model is more realistic, allowing for i.i.d. time-varying bounded delays. In contrast to previous work studying the effects of delays on consensus algorithms, the models studied here allow for a node to receive multiple messages from the same neighbor in one iteration. The analysis of the fixed delay model shows that convergence to a consensus is guaranteed and the rate of convergence is reduced by no more than a factor O(B^2) where B is the maximum delay on any link. For the time-varying delay model we also give a convergence proof which, for row-stochastic consensus protocols, is not a trivial consequence of ergodic matrix products. In both delay models, the consensus value is no longer the average, even if the original protocol was an averaging protocol. For this reason, we propose the use of a different consensus algorithm called Push-Sum [Kempe et al. 2003]. We model delays in the Push-Sum framework and show that convergence to the average consensus is guaranteed. This suggests that Push-Sum might be a better choice from a practical standpoint.
△ Less
Submitted 24 July, 2012;
originally announced July 2012.
-
Distributed Strongly Convex Optimization
Authors:
Konstantinos I. Tsianos,
Michael G. Rabbat
Abstract:
A lot of effort has been invested into characterizing the convergence rates of gradient based algorithms for non-linear convex optimization. Recently, motivated by large datasets and problems in machine learning, the interest has shifted towards distributed optimization. In this work we present a distributed algorithm for strongly convex constrained optimization. Each node in a network of n comput…
▽ More
A lot of effort has been invested into characterizing the convergence rates of gradient based algorithms for non-linear convex optimization. Recently, motivated by large datasets and problems in machine learning, the interest has shifted towards distributed optimization. In this work we present a distributed algorithm for strongly convex constrained optimization. Each node in a network of n computers converges to the optimum of a strongly convex, L-Lipchitz continuous, separable objective at a rate O(log (sqrt(n) T) / T) where T is the number of iterations. This rate is achieved in the online setting where the data is revealed one at a time to the nodes, and in the batch setting where each node has access to its full local dataset from the start. The same convergence rate is achieved in expectation when the subgradients used at each node are corrupted with additive zero-mean noise.
△ Less
Submitted 19 July, 2012; v1 submitted 12 July, 2012;
originally announced July 2012.
-
Multiscale Gossip for Efficient Decentralized Averaging in Wireless Packet Networks
Authors:
Konstantinos I. Tsianos,
Michael G. Rabbat
Abstract:
This paper describes and analyzes a hierarchical gossip algorithm for solving the distributed average consensus problem in wireless sensor networks. The network is recursively partitioned into subnetworks. Initially, nodes at the finest scale gossip to compute local averages. Then, using geographic routing to enable gossip between nodes that are not directly connected, these local averages are pro…
▽ More
This paper describes and analyzes a hierarchical gossip algorithm for solving the distributed average consensus problem in wireless sensor networks. The network is recursively partitioned into subnetworks. Initially, nodes at the finest scale gossip to compute local averages. Then, using geographic routing to enable gossip between nodes that are not directly connected, these local averages are progressively fused up the hierarchy until the global average is computed. We show that the proposed hierarchical scheme with $k$ levels of hierarchy is competitive with state-of-the-art randomized gossip algorithms, in terms of message complexity, achieving $ε$-accuracy with high probability after $O\big(n \log \log n \log \frac{kn}ε \big)$ messages. Key to our analysis is the way in which the network is recursively partitioned. We find that the optimal scaling law is achieved when subnetworks at scale $j$ contain $O(n^{(2/3)^j})$ nodes; then the message complexity at any individual scale is $O(n \log \frac{kn}ε)$, and the total number of scales in the hierarchy grows slowly, as $Θ(\log \log n)$. Another important consequence of hierarchical construction is that the longest distance over which messages are exchanged is $O(n^{1/3})$ hops (at the highest scale), and most messages (at lower scales) travel shorter distances. In networks that use link-level acknowledgements, this results in less congestion and resource usage by reducing message retransmissions. Simulations illustrate that the proposed scheme is more message-efficient than existing state-of-the-art randomized gossip algorithms based on averaging along paths.
△ Less
Submitted 27 February, 2012; v1 submitted 9 November, 2010;
originally announced November 2010.