-
L-DQN: An Asynchronous Limited-Memory Distributed Quasi-Newton Method
Authors:
Bugra Can,
Saeed Soori,
Maryam Mehri Dehnavi,
Mert Gürbüzbalaban
Abstract:
This work proposes a distributed algorithm for solving empirical risk minimization problems, called L-DQN, under the master/worker communication model. L-DQN is a distributed limited-memory quasi-Newton method that supports asynchronous computations among the worker nodes. Our method is efficient both in terms of storage and communication costs, i.e., in every iteration the master node and workers communicate vectors of size $O(d)$, where $d$ is the dimension of the decision variable, and the amount of memory required on each node is $O(md)$, where $m$ is an adjustable parameter. To our knowledge, this is the first distributed quasi-Newton method with provable global linear convergence guarantees in the asynchronous setting where delays between nodes are present. Numerical experiments are provided to illustrate the theory and the practical performance of our method.
Submitted 4 September, 2021; v1 submitted 20 August, 2021;
originally announced August 2021.
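The $O(md)$ memory figure in the abstract matches what limited-memory quasi-Newton methods generally need: the $m$ most recent curvature pairs, each of dimension $d$. As a rough illustration only, the sketch below shows the standard single-machine L-BFGS two-loop recursion; it is not the distributed, asynchronous L-DQN update itself, and the names `two_loop_direction`, `s_hist`, and `y_hist` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def two_loop_direction(grad, s_hist, y_hist):
    """Approximate H^{-1} grad from stored curvature pairs (oldest first).

    s_hist[k] is a step difference x_{k+1} - x_k and y_hist[k] the matching
    gradient difference. Storing m such pairs costs O(m d) memory.
    """
    q = list(grad)
    rhos = [1.0 / dot(y, s) for s, y in zip(s_hist, y_hist)]
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in reversed(list(zip(s_hist, y_hist, rhos))):
        alpha = rho * dot(s, q)
        alphas.append(alpha)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
    # Initial Hessian-inverse scaling taken from the most recent pair.
    if s_hist:
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1])
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]
    # Second loop: oldest pair to newest.
    for s, y, rho, alpha in zip(s_hist, y_hist, rhos, reversed(alphas)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return r


# With Hessian 2*I, a single pair (s, y = 2 s) is stored.
direction = two_loop_direction([4.0, 6.0], [[1.0, 0.0]], [[2.0, 0.0]])
```

The descent step would then be along the negative of the returned vector. In a master/worker setting, each node would hold its own copy of such a history, which is consistent with the per-node $O(md)$ storage the abstract describes.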
-
Randomized Gossiping with Effective Resistance Weights: Performance Guarantees and Applications
Authors:
Bugra Can,
Saeed Soori,
Necdet Serhat Aybat,
Maryam Mehri Dehnavi,
Mert Gurbuzbalaban
Abstract:
The effective resistance between a pair of nodes in a weighted undirected graph is defined as the potential difference induced when a unit current is injected at one node and extracted from the other, treating edge weights as the conductance values of edges. The effective resistance is a key quantity of interest in many applications, e.g., solving linear systems, Markov chains, and continuous-time averaging networks. We consider effective resistances (ER) in the context of designing randomized gossiping methods for the consensus problem, where the aim is to compute the average of node values in a distributed manner through iteratively computing weighted averages among randomly chosen neighbors. We show that employing ER weights improves the averaging time relative to the traditional choice of uniform weights; the amount of improvement depends on the network structure. We illustrate these results through numerical experiments. We also present an application of ER gossiping to distributed optimization: we numerically verify that using ER gossiping within the EXTRA and DPGA-W methods improves their practical performance in terms of communication efficiency.
Submitted 16 October, 2021; v1 submitted 29 July, 2019;
originally announced July 2019.
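The definition of effective resistance in this abstract can be made concrete with a small sketch. It grounds one node, solves the reduced Laplacian system for the potentials induced by a unit current, and then runs pairwise gossip averaging with edges sampled in proportion to their effective resistance. That sampling rule is an assumption made here for illustration, as are all function names; the abstract does not specify how the ER weights enter the method.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def effective_resistance(n, edges, i, j):
    """Potential difference between i and j for a unit current i -> j.

    edges maps (u, v) to a conductance; the graph must be connected.
    """
    lap = [[0.0] * n for _ in range(n)]
    for (u, v), w in edges.items():
        lap[u][u] += w
        lap[v][v] += w
        lap[u][v] -= w
        lap[v][u] -= w
    # Ground node n-1 at potential 0 and drop its row and column.
    reduced = [row[: n - 1] for row in lap[: n - 1]]
    rhs = [0.0] * (n - 1)
    if i < n - 1:
        rhs[i] += 1.0
    if j < n - 1:
        rhs[j] -= 1.0
    potential = solve_linear(reduced, rhs) + [0.0]
    return potential[i] - potential[j]


def gossip(values, pairs, weights, steps, rng):
    """Repeatedly pick an edge at random and average its two endpoints."""
    x = list(values)
    for _ in range(steps):
        u, v = rng.choices(pairs, weights=weights, k=1)[0]
        x[u] = x[v] = 0.5 * (x[u] + x[v])
    return x


# Path graph 0 - 1 - 2 with unit conductances.
edges = {(0, 1): 1.0, (1, 2): 1.0}
pairs = list(edges)
er = [effective_resistance(3, edges, u, v) for u, v in pairs]
result = gossip([3.0, 0.0, 6.0], pairs, er, steps=500, rng=random.Random(0))
```

Each pairwise averaging step leaves the sum of node values unchanged, so the iterates can only converge to the network-wide average. On this path graph every edge has effective resistance 1, so the ER and uniform choices coincide; the two differ only on graphs with cycles or non-uniform conductances.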
-
DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate
Authors:
Saeed Soori,
Konstantin Mischenko,
Aryan Mokhtari,
Maryam Mehri Dehnavi,
Mert Gurbuzbalaban
Abstract:
In this paper, we consider distributed algorithms for solving the empirical risk minimization problem under the master/worker communication model. We develop a distributed asynchronous quasi-Newton algorithm that can achieve superlinear convergence. To our knowledge, this is the first distributed asynchronous algorithm with superlinear convergence guarantees. Our algorithm is communication-efficient in the sense that at every iteration the master node and workers communicate vectors of size $O(p)$, where $p$ is the dimension of the decision variable. The proposed method is based on a distributed asynchronous averaging scheme of decision vectors and gradients that effectively captures the local Hessian information of the objective function. Our convergence theory supports asynchronous computations subject to both bounded delays and unbounded delays with a bounded time-average. Unlike most of the asynchronous optimization literature, we do not require choosing a smaller stepsize when delays are large. We provide numerical experiments that match our theoretical results and show significant improvement compared to state-of-the-art distributed algorithms.
Submitted 10 June, 2019; v1 submitted 2 June, 2019;
originally announced June 2019.
-
Avoiding Communication in Proximal Methods for Convex Optimization Problems
Authors:
Saeed Soori,
Aditya Devarakonda,
James Demmel,
Mert Gurbuzbalaban,
Maryam Mehri Dehnavi
Abstract:
The fast iterative soft thresholding algorithm (FISTA) is used to solve convex regularized optimization problems in machine learning. Distributed implementations of the algorithm have become popular since they enable the analysis of large datasets. However, existing formulations of FISTA communicate data at every iteration, which reduces performance on modern distributed architectures. The communication costs of FISTA, including bandwidth and latency costs, are closely tied to the mathematical formulation of the algorithm. This work reformulates FISTA to communicate data only once every k iterations, reducing data communication when operating on large datasets. We formulate the algorithm for two different optimization methods on the Lasso problem and show that the latency cost is reduced by a factor of k while bandwidth and floating-point operation costs remain the same. The convergence rates and stability properties of the reformulated algorithms are similar to those of the standard formulations. The performance of communication-avoiding FISTA and Proximal Newton methods is evaluated on 1 to 1024 nodes for multiple benchmarks and demonstrates average speedups of 3-10x with scaling properties that outperform the classical algorithms.
Submitted 24 October, 2017;
originally announced October 2017.
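For readers unfamiliar with the baseline being reformulated, the following is a minimal single-machine sketch of standard FISTA applied to the Lasso problem $\min_x \tfrac{1}{2}\|Ax - b\|^2 + \lambda \|x\|_1$. It is not the communication-avoiding variant from the paper; the function names and the assumption that the caller supplies a Lipschitz constant of the gradient (an upper bound on the largest eigenvalue of $A^\top A$) are choices made here for illustration.

```python
import math


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied elementwise."""
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in v]


def matvec(a, x):
    """Compute A x for a row-major list-of-lists matrix."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def rmatvec(a, r):
    """Compute A^T r."""
    return [sum(a[i][j] * r[i] for i in range(len(a))) for j in range(len(a[0]))]


def fista_lasso(a, b, lam, lipschitz, iters=100):
    """Standard FISTA iterations for the Lasso objective."""
    n = len(a[0])
    x = [0.0] * n
    y = list(x)
    t = 1.0
    for _ in range(iters):
        # Gradient of the smooth part at the extrapolated point y.
        resid = [p - q for p, q in zip(matvec(a, y), b)]
        grad = rmatvec(a, resid)
        # Gradient step followed by soft thresholding.
        step = [yi - gi / lipschitz for yi, gi in zip(y, grad)]
        x_new = soft_threshold(step, lam / lipschitz)
        # Momentum update.
        t_new = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
        y = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x


# With A = I the Lasso solution is soft_threshold(b, lam).
solution = fista_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, -0.5], lam=1.0, lipschitz=1.0)
```

In a distributed run, the products with $A$ and $A^\top$ in each iteration are where nodes would need to exchange data, which is the per-iteration communication the abstract refers to.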