-
Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity
Authors:
Dmitry Kovalev,
Aleksandr Beznosikov,
Ekaterina Borodich,
Alexander Gasnikov,
Gesualdo Scutari
Abstract:
We study structured convex optimization problems, with additive objective $r:=p + q$, where $r$ is ($μ$-strongly) convex, $q$ is $L_q$-smooth and convex, and $p$ is $L_p$-smooth, possibly nonconvex. For such a class of problems, we propose an inexact accelerated gradient sliding method that can skip the gradient computation for one of these components while still achieving optimal complexity of gradient calls of $p$ and $q$, that is, $\mathcal{O}(\sqrt{L_p/μ})$ and $\mathcal{O}(\sqrt{L_q/μ})$, respectively. This result is much sharper than the classic black-box complexity $\mathcal{O}(\sqrt{(L_p+L_q)/μ})$, especially when the difference between $L_p$ and $L_q$ is large. We then apply the proposed method to solve distributed optimization problems over master-worker architectures, under agents' function similarity, due to statistical data similarity or otherwise. The distributed algorithm attains, for the first time, the lower complexity bounds on {\it both} communication and local gradient calls, with the former having been a long-standing open problem. Finally, the method is extended to distributed saddle-point problems (under function similarity) by means of solving a class of variational inequalities, achieving the lower communication and computation complexity bounds.
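As a rough illustration of the gap between the two bounds, the following back-of-the-envelope sketch (not code from the paper) evaluates the orders $\sqrt{(L_p+L_q)/μ}$, $\sqrt{L_p/μ}$ and $\sqrt{L_q/μ}$ with constants and the $\log(1/\varepsilon)$ factor dropped. The function name `gradient_call_orders` and the constants used are hypothetical.

```python
import math


def gradient_call_orders(L_p, L_q, mu):
    """Order-of-magnitude gradient-call counts (constants and log(1/eps) dropped).

    Returns (black_box, sliding_p, sliding_q): a black-box accelerated method calls
    both gradients about sqrt((L_p + L_q)/mu) times, whereas the sliding bounds are
    about sqrt(L_p/mu) calls of grad p and sqrt(L_q/mu) calls of grad q.
    """
    black_box = math.sqrt((L_p + L_q) / mu)
    return black_box, math.sqrt(L_p / mu), math.sqrt(L_q / mu)


# Hypothetical constants: p is well conditioned (small L_p), q is ill conditioned.
bb, calls_p, calls_q = gradient_call_orders(L_p=1.0, L_q=1e4, mu=1.0)
print(f"black-box: ~{bb:.1f} calls of each gradient")
print(f"sliding:   ~{calls_p:.1f} calls of grad p, ~{calls_q:.1f} calls of grad q")
```

With these constants the black-box count is of order 100 for both gradients, while the sliding bound for $\nabla p$ drops to order 1 and the one for $\nabla q$ stays of order 100.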
Submitted 30 May, 2022;
originally announced May 2022.
-
High-Dimensional Inference over Networks: Linear Convergence and Statistical Guarantees
Authors:
Ying Sun,
Marie Maros,
Gesualdo Scutari,
Guang Cheng
Abstract:
We study sparse linear regression over a network of agents, modeled as an undirected graph with no server node. The estimation of the $s$-sparse parameter is formulated as a constrained LASSO problem wherein each agent owns a subset of the $N$ total observations. We analyze the convergence rate and statistical guarantees of a distributed projected gradient tracking-based algorithm under high-dimensional scaling, allowing the ambient dimension $d$ to grow with (and possibly exceed) the sample size $N$. Our theory shows that, under standard notions of restricted strong convexity and smoothness of the loss functions, and suitable conditions on the network connectivity and algorithm tuning, the distributed algorithm converges globally at a {\it linear} rate to an estimate that is within the centralized {\it statistical precision} of the model, $\mathcal{O}(s\log d/N)$. When $s\log d/N=o(1)$, a condition necessary for statistical consistency, an $\varepsilon$-optimal solution is attained after $\mathcal{O}(κ\log (1/\varepsilon))$ gradient computations and $\mathcal{O}(κ/(1-ρ) \log (1/\varepsilon))$ communication rounds, where $κ$ is the restricted condition number of the loss function and $ρ$ measures the network connectivity. The computation cost matches that of the centralized projected gradient algorithm despite the data being distributed, whereas the number of communication rounds decreases as the network connectivity improves. Overall, our study reveals interesting connections between statistical efficiency, network connectivity \& topology, and convergence rate in high dimensions.
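The core gradient-tracking recursion can be illustrated on a toy problem. This is a minimal sketch of a generic projected gradient-tracking iteration, not necessarily the exact variant analyzed in the paper: it assumes scalar quadratic local losses $f_i(x)=\tfrac12 a_i(x-b_i)^2$ instead of LASSO losses, a one-dimensional $\ell_1$ ball $|x|\le R$ (so the projection is clipping), and a hypothetical 4-node ring with weights $1/3$. Each agent mixes its neighbours' iterates, steps along its tracker $y_i$ of the network-average gradient, and projects.

```python
def ring_weights(n):
    """Hypothetical doubly stochastic mixing matrix for an undirected ring (n >= 3):
    weight 1/3 on each node itself and on each of its two neighbours."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] = 1.0 / 3.0
    return W


def projected_gradient_tracking(a, b, W, alpha, radius, iters=3000):
    """Agent i holds f_i(x) = 0.5 * a[i] * (x - b[i])**2; the network solves
    min_x sum_i f_i(x) subject to |x| <= radius (a 1-D l1 ball)."""
    n = len(a)

    def grad(i, x):
        return a[i] * (x - b[i])

    def project(v):  # Euclidean projection onto [-radius, radius]
        return max(-radius, min(radius, v))

    x = [project(b[i]) for i in range(n)]
    y = [grad(i, x[i]) for i in range(n)]  # trackers start at the local gradients
    for _ in range(iters):
        # Consensus step on x, then descent along the tracked average gradient.
        x_new = [project(sum(W[i][j] * x[j] for j in range(n)) - alpha * y[i])
                 for i in range(n)]
        # Tracking step: mix trackers, add the change in the local gradient.
        y = [sum(W[i][j] * y[j] for j in range(n)) + grad(i, x_new[i]) - grad(i, x[i])
             for i in range(n)]
        x = x_new
    return x


a, b = [1.0, 2.0, 3.0, 1.5], [1.0, -1.0, 2.0, 0.5]
print(projected_gradient_tracking(a, b, ring_weights(4), alpha=0.05, radius=5.0))
print(projected_gradient_tracking(a, b, ring_weights(4), alpha=0.05, radius=0.5))
```

With `radius=5.0` the constraint is inactive and all agents agree on $\sum_i a_ib_i/\sum_i a_i \approx 0.767$; with `radius=0.5` the constraint is active and the agents agree on the boundary point $0.5$. Because $W$ is doubly stochastic, the trackers preserve $\sum_i y_i=\sum_i \nabla f_i(x_i)$, which is what makes the common limit the centralized solution.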
Submitted 20 January, 2022;
originally announced January 2022.
-
Distributed Sparse Regression via Penalization
Authors:
Yao Ji,
Gesualdo Scutari,
Ying Sun,
Harsha Honnappa
Abstract:
We study sparse linear regression over a network of agents, modeled as an undirected graph (with no centralized node). The estimation problem is formulated as the minimization of the sum of the local LASSO loss functions plus a quadratic penalty of the consensus constraint -- the latter being instrumental to obtain distributed solution methods. While penalty-based consensus methods have been extensively studied in the optimization literature, their statistical and computational guarantees in the high-dimensional setting remain unclear. This work provides an answer to this open problem. Our contribution is two-fold. First, we establish statistical consistency of the estimator: under a suitable choice of the penalty parameter, the optimal solution of the penalized problem achieves the near-optimal minimax rate $\mathcal{O}(s \log d/N)$ in $\ell_2$-loss, where $s$ is the sparsity value, $d$ is the ambient dimension, and $N$ is the total sample size in the network -- this matches centralized sample rates. Second, we show that the proximal-gradient algorithm applied to the penalized problem, which naturally leads to distributed implementations, converges linearly up to a tolerance of the order of the centralized statistical error -- the rate scales as $\mathcal{O}(d)$, revealing an unavoidable speed-accuracy dilemma. Numerical results demonstrate the tightness of the derived sample rate and convergence rate scalings.
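To see why the penalized formulation leads to distributed implementations, consider a toy instance built on assumptions not taken from the paper: scalar quadratic local losses plus $\lambda|x_i|$, the penalty $\frac{c}{2}x^\top(I-W)x$ with a hypothetical symmetric doubly stochastic ring matrix $W$, and step size $1/L$. The gradient of the penalty at agent $i$ is $c\,(x_i-\sum_j W_{ij}x_j)$, so each proximal-gradient step (a gradient step followed by soft-thresholding) needs only neighbouring values. The names `soft_threshold` and `penalized_prox_grad` are illustrative.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |.| (scalar soft-thresholding)."""
    return max(abs(v) - tau, 0.0) * (1.0 if v >= 0 else -1.0)


def penalized_prox_grad(a, b, W, lam, c, iters=10000):
    """Proximal gradient on the hypothetical penalized problem
        sum_i [0.5*a[i]*(x_i - b[i])**2 + lam*|x_i|] + (c/2) * x^T (I - W) x."""
    n = len(a)
    # Step 1/L: local curvature is at most max(a); eigenvalues of I - W lie in [0, 2].
    alpha = 1.0 / (max(a) + 2.0 * c)
    x = [0.0] * n
    for _ in range(iters):
        # Gradient of the smooth part: local loss plus consensus penalty (neighbours only).
        g = [a[i] * (x[i] - b[i]) + c * (x[i] - sum(W[i][j] * x[j] for j in range(n)))
             for i in range(n)]
        # Proximal step on lam * |x_i|, performed independently by each agent.
        x = [soft_threshold(x[i] - alpha * g[i], alpha * lam) for i in range(n)]
    return x


third = 1.0 / 3.0
W = [[third, third, 0.0, third],   # 4-node ring 0-1-2-3-0
     [third, third, third, 0.0],
     [0.0, third, third, third],
     [third, 0.0, third, third]]
a, b = [1.0, 2.0, 3.0, 1.5], [1.0, -1.0, 2.0, 0.5]
for c in (1.0, 100.0):
    x = penalized_prox_grad(a, b, W, lam=0.1, c=c)
    print(f"c = {c:>5}: disagreement max(x) - min(x) = {max(x) - min(x):.4f}")
```

Increasing the penalty weight `c` shrinks the disagreement between agents, at the price of a larger smoothness constant and hence a smaller step size.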
Submitted 21 June, 2023; v1 submitted 11 November, 2021;
originally announced November 2021.
-
Acceleration in Distributed Optimization under Similarity
Authors:
Ye Tian,
Gesualdo Scutari,
Tianyu Cao,
Alexander Gasnikov
Abstract:
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes. The loss functions of the agents are assumed to be \textit{similar}, due to statistical data similarity or otherwise. In order to reduce the number of communications needed to reach a given solution accuracy, we propose a {\it preconditioned, accelerated} distributed method. An $\varepsilon$-solution is achieved in $\tilde{\mathcal{O}}\big(\sqrt{\frac{β/μ}{1-ρ}}\log(1/\varepsilon)\big)$ communication steps, where $β/μ$ is the relative condition number between the global and local loss functions, and $ρ$ characterizes the connectivity of the network. This rate matches (up to poly-log factors) the lower communication complexity bounds of distributed gossip algorithms applied to the class of problems of interest. Numerical results show significant communication savings with respect to existing accelerated distributed schemes, especially when solving ill-conditioned problems.
Submitted 9 April, 2022; v1 submitted 24 October, 2021;
originally announced October 2021.
-
Decentralized Asynchronous Non-convex Stochastic Optimization on Directed Graphs
Authors:
Vyacheslav Kungurtsev,
Mahdi Morafah,
Tara Javidi,
Gesualdo Scutari
Abstract:
Distributed optimization is an increasingly important subject area with the rise of multi-agent control and optimization. We consider a decentralized stochastic optimization problem where the agents on a graph aim to asynchronously optimize a collective (additive) objective function consisting of agents' individual (possibly non-convex) local objective functions. Each agent only has access to a noisy estimate of the gradient of its own function (one component of the sum of objective functions). We propose an asynchronous distributed algorithm for such a class of problems. The algorithm combines stochastic gradients with tracking in an asynchronous push-sum framework and obtains the standard sublinear convergence rate for general non-convex functions, matching the rate of centralized stochastic gradient descent (SGD).
Our experiments on a non-convex image classification task using a convolutional neural network validate the convergence of our proposed algorithm across different numbers of nodes and graph connectivity percentages.
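Push-sum is the consensus primitive that lets such algorithms run over directed graphs, where doubly stochastic weights are generally unavailable. The sketch below shows only the synchronous, deterministic push-sum averaging step on a hypothetical 4-node digraph; it is not the paper's asynchronous algorithm, which additionally tracks stochastic gradients. The function name `push_sum` is illustrative.

```python
def push_sum(values, out_neighbors, iters=500):
    """Synchronous push-sum: node i keeps (s_i, w_i) and, every round, splits both
    equally among itself and its out-neighbours (column-stochastic mixing).
    On a strongly connected digraph, s_i / w_i tends to the average of `values`."""
    n = len(values)
    s, w = [float(v) for v in values], [1.0] * n
    for _ in range(iters):
        s_new, w_new = [0.0] * n, [0.0] * n
        for i in range(n):
            targets = [i] + out_neighbors[i]  # self-loop keeps the mixing aperiodic
            share = 1.0 / len(targets)        # needs only i's own out-degree
            for j in targets:
                s_new[j] += share * s[i]
                w_new[j] += share * w[i]
        s, w = s_new, w_new
    return [s[i] / w[i] for i in range(n)]


# Hypothetical digraph: ring 0 -> 1 -> 2 -> 3 -> 0 plus a chord 0 -> 2.
out_neighbors = {0: [1, 2], 1: [2], 2: [3], 3: [0]}
print(push_sum([4.0, 0.0, 2.0, 10.0], out_neighbors))  # each ratio approaches 4.0
```

Sums of $s$ and $w$ are preserved at every round, so the ratio converges to the true average even though the limiting $s_i$ and $w_i$ individually are not uniform across nodes.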
Submitted 20 October, 2021;
originally announced October 2021.
-
Finite-Bit Quantization For Distributed Algorithms With Linear Convergence
Authors:
Nicolò Michelusi,
Gesualdo Scutari,
Chang-Shen Lee
Abstract:
This paper studies distributed algorithms for (strongly convex) composite optimization problems over mesh networks, subject to quantized communications. Instead of focusing on a specific algorithmic design, a black-box model is proposed, casting linearly convergent distributed algorithms in the form of fixed-point iterates. The algorithmic model is equipped with a novel random or deterministic Biased Compression (BC) rule on the quantizer design, and a new Adaptive encoding Nonuniform Quantizer (ANQ) coupled with a communication-efficient encoding scheme, which implements the BC-rule using a finite number of bits (below machine precision). This fills a gap in most state-of-the-art quantization schemes, such as those based on the popular compression rule, which rely on communication of some scalar signals with negligible quantization error (in practice quantized at the machine precision). A unified communication complexity analysis is developed for the black-box model, determining the average number of bits required to reach a solution of the optimization problem within a target accuracy. It is shown that the proposed BC-rule preserves linear convergence of the unquantized algorithms, and a trade-off between convergence rate and communication cost under ANQ-based quantization is characterized. Numerical results validate our theoretical findings and show that distributed algorithms equipped with the proposed ANQ have a more favorable communication cost than algorithms using state-of-the-art quantization rules.
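For intuition on what finite-bit communication entails, here is a plain uniform $b$-bit quantizer over a known range $[-R, R]$. It is not the ANQ or the BC rule from the paper, only a simple baseline assumed for illustration, and the name `quantize` is hypothetical. Each scalar is sent as an integer index in $\{0,\dots,2^b-1\}$, and the worst-case reconstruction error $R/(2^b-1)$ never vanishes for finite $b$.

```python
def quantize(v, radius, bits):
    """Uniform quantizer: map v in [-radius, radius] to one of 2**bits levels.

    Returns (index, reconstruction); only the integer index would be transmitted,
    i.e. `bits` bits per scalar. Worst-case error: radius / (2**bits - 1)."""
    levels = 2 ** bits
    step = 2.0 * radius / (levels - 1)
    v = max(-radius, min(radius, v))      # saturate out-of-range inputs
    index = round((v + radius) / step)    # nearest level, in {0, ..., levels - 1}
    return index, -radius + index * step


for bits in (2, 4, 8):
    idx, q = quantize(0.3, radius=1.0, bits=bits)
    print(f"{bits} bits: index {idx}, reconstruction {q:.4f}, error {abs(q - 0.3):.4f}")
```

Halving the error bound costs roughly one extra bit per scalar, which is the kind of rate/accuracy trade-off a quantized algorithm has to manage.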
Submitted 17 May, 2022; v1 submitted 23 July, 2021;
originally announced July 2021.
-
Distributed Saddle-Point Problems Under Similarity
Authors:
Aleksandr Beznosikov,
Gesualdo Scutari,
Alexander Rogozin,
Alexander Gasnikov
Abstract:
We study solution methods for (strongly-)convex-(strongly-)concave Saddle-Point Problems (SPPs) over networks of two types: master/workers (thus centralized) architectures and meshed (thus decentralized) networks. The local functions at each node are assumed to be similar, due to statistical data similarity or otherwise. We establish lower complexity bounds for a fairly general class of algorithms solving the SPP. We show that a given suboptimality $\varepsilon>0$ is achieved over master/workers networks in $Ω\big(Δ\cdot δ/μ\cdot \log (1/\varepsilon)\big)$ communication rounds, where $δ>0$ measures the degree of similarity of the local functions, $μ$ is their strong convexity constant, and $Δ$ is the diameter of the network. The lower communication complexity bound over meshed networks reads $Ω\big(1/{\sqrtρ} \cdot δ/μ\cdot\log (1/\varepsilon)\big)$, where $ρ$ is the (normalized) eigengap of the gossip matrix used for the communication between neighbouring nodes. We then propose algorithms matching the lower bounds over either type of network (up to log-factors). We assess the effectiveness of the proposed algorithms on a robust logistic regression problem.
Submitted 22 August, 2022; v1 submitted 22 July, 2021;
originally announced July 2021.
-
An Accelerated Second-Order Method for Distributed Stochastic Optimization
Authors:
Artem Agafonov,
Pavel Dvurechensky,
Gesualdo Scutari,
Alexander Gasnikov,
Dmitry Kamzolov,
Aleksandr Lukashevich,
Amir Daneshmand
Abstract:
We consider distributed stochastic optimization problems that are solved with a master/workers computation architecture. Statistical arguments allow one to exploit statistical similarity and approximate this problem by a finite-sum problem, for which we propose an inexact accelerated cubic-regularized Newton's method that achieves the lower communication complexity bound for this setting and improves upon the existing upper bound. We further exploit this algorithm to obtain convergence rate bounds for the original stochastic optimization problem and compare our bounds with the existing bounds in several regimes, when the goal is to minimize the number of communication rounds and increase the parallelization by increasing the number of workers.
Submitted 26 March, 2021;
originally announced March 2021.
-
Newton Method over Networks is Fast up to the Statistical Precision
Authors:
Amir Daneshmand,
Gesualdo Scutari,
Pavel Dvurechensky,
Alexander Gasnikov
Abstract:
We propose a distributed cubic regularization of the Newton method for solving (constrained) empirical risk minimization problems over a network of agents, modeled as an undirected graph. The algorithm employs an inexact, preconditioned Newton step at each agent's side: the gradient of the centralized loss is iteratively estimated via a gradient-tracking consensus mechanism and the Hessian is subsampled over the local data sets. No Hessian matrices are thus exchanged over the network. We derive global complexity bounds for convex and strongly convex losses. Our analysis reveals an interesting interplay between sample and iteration/communication complexity: statistically accurate solutions are achievable in roughly the same number of iterations as the centralized cubic Newton method, with a communication cost per iteration of the order of $\widetilde{\mathcal{O}}\big(1/\sqrt{1-ρ}\big)$, where $ρ$ characterizes the connectivity of the network. This demonstrates a significant communication saving with respect to existing, statistically oblivious, distributed Newton-based methods over networks.
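The basic building block is the cubic-regularized Newton step, which minimizes the model $m(s)=g\,s+\tfrac12 h\,s^2+\tfrac M6|s|^3$. The sketch below solves this model in closed form for a scalar variable with $h\ge 0$ and runs it on the hypothetical test function $f(x)=x^4/4+x^2/2-x$, using $M=12$ (the Lipschitz constant of $f''$ on $[-2,2]$, where the iterates stay). In the distributed method, $g$ would be the tracked gradient and $h$ a locally subsampled Hessian; this single-machine toy does not model either, and the function names are illustrative.

```python
import math


def cubic_newton_step(g, h, M):
    """Minimiser of the 1-D model m(s) = g*s + 0.5*h*s**2 + (M/6)*|s|**3,
    assuming h >= 0 and M > 0."""
    if g == 0.0:
        return 0.0
    # The minimiser moves against the gradient: s = -sign(g) * t, t >= 0, where
    # m'(s) = 0 reduces to (M/2) t^2 + h t - |g| = 0.
    t = (-h + math.sqrt(h * h + 2.0 * M * abs(g))) / M
    return -math.copysign(t, g)


def cubic_newton(grad, hess, x0, M, iters=50):
    """Repeatedly apply the cubic-regularized Newton step from x0."""
    x = x0
    for _ in range(iters):
        x += cubic_newton_step(grad(x), hess(x), M)
    return x


# f(x) = x**4/4 + x**2/2 - x: its minimiser solves x**3 + x = 1.
x_star = cubic_newton(lambda x: x**3 + x - 1, lambda x: 3 * x**2 + 1, x0=2.0, M=12.0)
print(x_star)  # ~0.6823
```

Far from the solution the cubic term shortens the plain Newton step; near it the step behaves like Newton's method, which is why few iterations are needed.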
Submitted 16 June, 2021; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Asynchronous Optimization over Graphs: Linear Convergence under Error Bound Conditions
Authors:
Loris Cannelli,
Francisco Facchinei,
Gesualdo Scutari,
Vyacheslav Kungurtsev
Abstract:
We consider convex and nonconvex constrained optimization with a partially separable objective function: agents minimize the sum of local objective functions, each of which is known only by the associated agent and depends on the variables of that agent and those of a few others. This partitioned setting arises in several applications of practical interest. We propose what is, to the best of our knowledge, the first distributed, asynchronous algorithm with rate guarantees for this class of problems. When the objective function is nonconvex, the algorithm provably converges to a stationary solution at a sublinear rate, whereas a linear rate is achieved when the objective satisfies the renowned Luo-Tseng error bound condition (which is less stringent than strong convexity). Numerical results on matrix completion and LASSO problems show the effectiveness of our method.
Submitted 18 October, 2020;
originally announced October 2020.
-
Diminishing Stepsize Methods for Nonconvex Composite Problems via Ghost Penalties: from the General to the Convex Regular Constrained Case
Authors:
Francisco Facchinei,
Vyacheslav Kungurtsev,
Lorenzo Lampariello,
Gesualdo Scutari
Abstract:
In this paper, we first extend the diminishing stepsize method for nonconvex constrained problems presented in [4] to deal with equality constraints and a nonsmooth objective function of composite type. We then consider the particular case in which the constraints are convex and satisfy a standard constraint qualification, and show that in this setting the algorithm can be considerably simplified, reducing the computational burden of each iteration.
Submitted 30 July, 2020;
originally announced July 2020.
-
Distributed Algorithms for Composite Optimization: Unified Framework and Convergence Analysis
Authors:
**ming Xu,
Ye Tian,
Ying Sun,
Gesualdo Scutari
Abstract:
We study distributed composite optimization over networks: agents minimize a sum of smooth (strongly) convex functions, the agents' sum-utility, plus a nonsmooth (extended-valued) convex one. We propose a general unified algorithmic framework for such a class of problems and provide a unified convergence analysis leveraging the theory of operator splitting. Distinguishing features of our scheme are: (i) When the agents' functions are strongly convex, the algorithm converges at a linear rate, whose dependence on the agents' functions and network topology is decoupled, matching the typical rates of centralized optimization; the rate expression improves on existing results; (ii) When the objective function is convex (but not strongly convex), a similar separation as in (i) is established for the coefficient of the proved sublinear rate; (iii) The algorithm can adjust the ratio between the number of communications and computations to achieve a rate (in terms of computations) independent of the network connectivity; and (iv) A by-product of our analysis is a tuning recommendation for several existing (non-accelerated) distributed algorithms yielding the fastest provable (worst-case) convergence rate. This is the first time that a general distributed algorithmic framework applicable to composite optimization enjoys all such properties.
Submitted 12 March, 2020; v1 submitted 25 February, 2020;
originally announced February 2020.
-
Accelerated Primal-Dual Algorithms for Distributed Smooth Convex Optimization over Networks
Authors:
**ming Xu,
Ye Tian,
Ying Sun,
Gesualdo Scutari
Abstract:
This paper proposes a novel family of primal-dual-based distributed algorithms for smooth, convex, multi-agent optimization over networks that use only gradient information and gossip communications. The algorithms can also employ acceleration on the computation and communications. We provide a unified analysis of their convergence rate, measured in terms of the Bregman distance associated with the saddle-point reformulation of the distributed optimization problem. When acceleration is employed, the rate is shown to be optimal, in the sense that it matches (under the proposed metric) existing complexity lower bounds of distributed algorithms applicable to such a class of problems and using only gradient information and gossip communications. Preliminary numerical results on distributed least-squares regression problems show that the proposed algorithm compares favorably with existing distributed schemes.
Submitted 2 March, 2020; v1 submitted 23 October, 2019;
originally announced October 2019.
-
A Unified Contraction Analysis of a Class of Distributed Algorithms for Composite Optimization
Authors:
**ming Xu,
Ying Sun,
Ye Tian,
Gesualdo Scutari
Abstract:
We study distributed composite optimization over networks: agents minimize the sum of a smooth (strongly) convex function, the agents' sum-utility, plus a non-smooth (extended-valued) convex one. We propose a general algorithmic framework for such a class of problems and provide a unified convergence analysis leveraging the theory of operator splitting. Our results unify several approaches proposed in the literature of distributed optimization for special instances of our formulation. Distinguishing features of our scheme are: (i) when the agents' functions are strongly convex, the algorithm converges at a linear rate, whose dependencies on the agents' functions and the network topology are decoupled, matching the typical rates of centralized optimization; (ii) the step-size does not depend on the network parameters but only on the optimization ones; and (iii) the algorithm can adjust the ratio between the number of communications and computations to achieve the same rate as the centralized proximal gradient scheme (in terms of computations). This is the first time that a distributed algorithm applicable to composite optimization enjoys such properties.
Submitted 22 October, 2019;
originally announced October 2019.
-
Asynchronous Decentralized Successive Convex Approximation
Authors:
Ye Tian,
Ying Sun,
Gesualdo Scutari
Abstract:
We study decentralized asynchronous multiagent optimization over networks, modeled as static (possibly directed) graphs. The optimization problem consists of minimizing a (possibly nonconvex) smooth function--the sum of the agents' local costs--plus a convex (possibly nonsmooth) regularizer, subject to convex constraints. Agents can perform their local computations as well as communicate with their immediate neighbors at any time, without any form of coordination or centralized scheduling; furthermore, when solving their local subproblems, they can use outdated information from their neighbors. We propose the first distributed asynchronous algorithm, termed ASY-DSCA, that converges at an R-linear rate to the optimal solution of convex problems whose objective function satisfies a general error bound condition; this condition is weaker than the more frequently used strong convexity, and it is satisfied by several empirical risk functions that are not strongly convex; examples include LASSO and logistic regression problems. When the objective function is nonconvex, ASY-DSCA converges to a stationary solution of the problem at a sublinear rate.
Submitted 30 January, 2020; v1 submitted 22 September, 2019;
originally announced September 2019.
-
Distributed Optimization Based on Gradient-tracking Revisited: Enhancing Convergence Rate via Surrogation
Authors:
Ying Sun,
Amir Daneshmand,
Gesualdo Scutari
Abstract:
We study distributed multiagent optimization over (directed, time-varying) graphs. We consider the minimization of $F+G$ subject to convex constraints, where $F$ is the smooth strongly convex sum of the agents' losses and $G$ is a nonsmooth convex function. We build on the SONATA algorithm: the algorithm employs surrogate objective functions in the agents' subproblems (thus going beyond linearization, such as proximal-gradient) coupled with a perturbed (push-sum) consensus mechanism that aims to track locally the gradient of $F$. SONATA achieves precision $ε>0$ on the objective value in $\mathcal{O}(κ_g \log(1/ε))$ gradient computations at each node and $\tilde{\mathcal{O}}\big(κ_g (1-ρ)^{-1/2} \log(1/ε)\big)$ communication steps, where $κ_g$ is the condition number of $F$ and $ρ$ characterizes the connectivity of the network. This is the first linear rate result for distributed composite optimization; it also improves on existing (non-accelerated) schemes just minimizing $F$, whose rate depends on much larger quantities than $κ_g$ (e.g., the worst-case condition number among the agents). When considering in particular empirical risk minimization problems with statistically similar data across the agents, SONATA employing high-order surrogates achieves precision $ε>0$ in $\mathcal{O}\big((β/μ) \log(1/ε)\big)$ iterations and $\tilde{\mathcal{O}}\big((β/μ) (1-ρ)^{-1/2} \log(1/ε)\big)$ communication steps, where $β$ measures the degree of similarity of the agents' losses and $μ$ is the strong convexity constant of $F$. Therefore, when $β/μ< κ_g$, the use of high-order surrogates yields provably faster rates than those achievable by first-order models; this is without exchanging any Hessian matrix over the network.
Submitted 11 October, 2020; v1 submitted 7 May, 2019;
originally announced May 2019.
-
Finite rate distributed weight-balancing and average consensus over digraphs
Authors:
Chang-Shen Lee,
Nicolò Michelusi,
Gesualdo Scutari
Abstract:
This paper proposes the first distributed algorithm that solves the weight-balancing problem using only finite rate and simplex communications among nodes, compliant with the directed nature of the graph edges. It is proved that the algorithm converges to a weight-balanced solution at a sublinear rate. The analysis builds upon a new metric inspired by positional system representations, which characterizes the dynamics of information exchange over the network, and on a novel step-size rule. Building on this result, a novel distributed algorithm is proposed that solves the average consensus problem over digraphs, using, at each timeslot, finite rate simplex communications between adjacent nodes -- some bits for the weight-balancing problem and others for the average consensus. Convergence of the proposed quantized consensus algorithm to the average of the nodes' unquantized initial values is established, both almost surely and in the moment generating function of the error; and a sublinear convergence rate is proved for sufficiently large step-sizes. Numerical results validate our theoretical findings.
Submitted 29 February, 2020; v1 submitted 3 January, 2019;
originally announced January 2019.
-
Second-order Guarantees of Distributed Gradient Algorithms
Authors:
Amir Daneshmand,
Gesualdo Scutari,
Vyacheslav Kungurtsev
Abstract:
We consider distributed smooth nonconvex unconstrained optimization over networks, modeled as a connected graph. We examine the behavior of distributed gradient-based algorithms near strict saddle points. Specifically, we establish that (i) the renowned Distributed Gradient Descent (DGD) algorithm likely converges to a neighborhood of a Second-order Stationary (SoS) solution; and (ii) the more recent class of distributed algorithms based on gradient tracking--implementable also over digraphs--likely converges to exact SoS solutions, thus avoiding (strict) saddle-points. Furthermore, new convergence rate results to first-order critical points are established for the latter class of algorithms.
Submitted 25 May, 2020; v1 submitted 23 September, 2018;
originally announced September 2018.
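The basic DGD update discussed in the abstract above can be sketched in a few lines. This is a minimal illustration under our own assumptions:

- three agents;
- a hand-picked doubly stochastic mixing matrix `W`;
- scalar quadratic local costs $f_i(x) = \tfrac12 (x - c_i)^2$;
- a constant step `alpha`.

It does not reproduce the paper's saddle-point analysis. It only shows the iteration and why a constant step leads to a neighborhood of the solution rather than the solution itself.

```python
# Minimal DGD sketch (illustrative assumptions, not the paper's setup):
# 3 agents, doubly stochastic mixing matrix W, scalar local costs
# f_i(x) = 0.5 * (x - c_i)**2, and a constant step size alpha.


def dgd(c, W, alpha, iters):
    """Run x_i <- sum_j W[i][j] * x_j - alpha * grad f_i(x_i), starting from x = 0."""
    n = len(c)
    x = [0.0] * n
    for _ in range(iters):
        # Each agent mixes its neighbors' current iterates, then takes a local
        # gradient step; the comprehension reads the old x for every agent.
        x = [sum(W[i][j] * x[j] for j in range(n)) - alpha * (x[i] - c[i])
             for i in range(n)]
    return x


W = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]
c = [1.0, 2.0, 6.0]  # the minimizer of sum_i f_i is mean(c) = 3.0
x = dgd(c, W, alpha=0.1, iters=2000)
print([round(v, 4) for v in x])  # → [2.7647, 2.8824, 3.3529]
```

In this run the network average of the iterates equals the true minimizer 3.0. The individual agents, however, stop at different points, since the fixed point solves $(I - W + \alpha I)x = \alpha c$. This is the "neighborhood" behavior attributed to DGD with a constant step.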
-
Limited Rate Distributed Weight-Balancing and Average Consensus Over Digraphs
Authors:
Chang-Shen Lee,
Nicolò Michelusi,
Gesualdo Scutari
Abstract:
Distributed quantized weight-balancing and average consensus over fixed digraphs are considered. A digraph with non-negative weights associated with its edges is weight-balanced if, for each node, the sum of the weights of its outgoing edges is equal to that of its incoming edges. This paper proposes and analyzes the first distributed algorithm that solves the weight-balancing problem using only finite rate and simplex communications among nodes (compliant with the directed nature of the graph edges). Asymptotic convergence of the scheme is proved and a convergence rate analysis is provided. Building on this result, a novel distributed algorithm is proposed that solves the average consensus problem over digraphs, using, at each iteration, finite rate simplex communications between adjacent nodes -- some bits for the weight-balancing problem, others for the average consensus. Convergence of the proposed quantized consensus algorithm to the average of the agents' real (i.e., unquantized) initial values is proved, both almost surely and in $r$th mean for all positive integers $r$. Finally, numerical results validate our theoretical findings.
Submitted 17 September, 2018;
originally announced September 2018.
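The weight-balance condition defined in the abstract above is easy to state in code. The helper below only checks that definition on a given edge list. It is not the paper's quantized, distributed balancing algorithm, and the function name and the `(u, v, w)` edge format are our own choices for illustration.

```python
from collections import defaultdict


def is_weight_balanced(edges, tol=1e-12):
    """Check the weight-balance condition for a weighted digraph.

    `edges` is an iterable of (u, v, w) triples describing a directed edge
    u -> v with non-negative weight w. The digraph is weight-balanced if, at
    every node, the total outgoing weight equals the total incoming weight.
    """
    out_w = defaultdict(float)
    in_w = defaultdict(float)
    for u, v, w in edges:
        if w < 0:
            raise ValueError("edge weights must be non-negative")
        out_w[u] += w
        in_w[v] += w
    # Snapshot of all nodes that appear as a source or a destination.
    nodes = set(out_w) | set(in_w)
    return all(abs(out_w[k] - in_w[k]) <= tol for k in nodes)


cycle = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0)]
print(is_weight_balanced(cycle))                  # True
print(is_weight_balanced(cycle + [(0, 2, 1.0)]))  # False: node 0 sends 2, receives 1
```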
-
Distributed Nonconvex Constrained Optimization over Time-Varying Digraphs
Authors:
Gesualdo Scutari,
Ying Sun
Abstract:
This paper considers nonconvex distributed constrained optimization over networks, modeled as directed (possibly time-varying) graphs. We introduce the first algorithmic framework for the minimization of the sum of a smooth nonconvex (nonseparable) function--the agents' sum-utility--plus a Difference-of-Convex (DC) function (with nonsmooth convex part). This general formulation arises in many applications, from statistical machine learning to engineering. The proposed distributed method combines successive convex approximation techniques with a judiciously designed perturbed push-sum consensus mechanism that aims to track locally the gradient of the (smooth part of the) sum-utility. A sublinear convergence rate is proved when a fixed step-size (possibly different among the agents) is employed, whereas asymptotic convergence to stationary solutions is proved using a diminishing step-size. Numerical results show that our algorithms compare favorably with current schemes on both convex and nonconvex problems.
Submitted 4 September, 2018;
originally announced September 2018.
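The method above relies on push-sum consensus to cope with directed graphs. The sketch below illustrates only the underlying mechanism: plain push-sum averaging, in which each node needs to know just its own out-neighbors. The perturbed, gradient-tracking variant used in the paper adds further terms that are not reproduced here. The graph and the initial values are arbitrary choices for illustration.

```python
def push_sum(values, out_neighbors, iters):
    """Plain push-sum (ratio) averaging over a fixed digraph.

    Node i holds a value z_i and a weight y_i (initially 1). At each round it
    splits both equally among itself and its out-neighbors, which is a
    column-stochastic mixing step that needs only out-degree information.
    The ratio z_i / y_i tends to the average of the initial values when the
    digraph is strongly connected.
    """
    n = len(values)
    z = [float(v) for v in values]
    y = [1.0] * n
    for _ in range(iters):
        new_z = [0.0] * n
        new_y = [0.0] * n
        for i in range(n):
            targets = [i] + list(out_neighbors[i])  # self-loop plus out-neighbors
            share = 1.0 / len(targets)
            for j in targets:
                new_z[j] += share * z[i]
                new_y[j] += share * y[i]
        z, y = new_z, new_y
    return [zi / yi for zi, yi in zip(z, y)]


# Directed ring 0 -> 1 -> 2 -> 3 -> 0 plus the extra edge 2 -> 0.
out_neighbors = [[1], [2], [3, 0], [0]]
estimates = push_sum([1.0, 2.0, 3.0, 10.0], out_neighbors, iters=500)
print(estimates)  # every entry should be close to the true average 4.0
```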
-
Distributed Big-Data Optimization via Block-wise Gradient Tracking
Authors:
Ivano Notarnicola,
Ying Sun,
Gesualdo Scutari,
Giuseppe Notarstefano
Abstract:
We study distributed big-data nonconvex optimization in multi-agent networks. We consider the (constrained) minimization of the sum of a smooth (possibly) nonconvex function, i.e., the agents' sum-utility, plus a convex (possibly) nonsmooth regularizer. Our interest is in big-data problems in which there is a large number of variables to optimize. If treated by means of standard distributed optimization algorithms, these large-scale problems may be intractable due to the prohibitive local computation and communication burden at each node. We propose a novel distributed solution method where, at each iteration, agents update in an uncoordinated fashion only one block of the entire decision vector. To deal with the nonconvexity of the cost function, the novel scheme hinges on Successive Convex Approximation (SCA) techniques combined with a novel block-wise perturbed push-sum consensus protocol, which is instrumental to perform local block-averaging operations and tracking of gradient averages. Asymptotic convergence to stationary solutions of the nonconvex problem is established. Finally, numerical results show the effectiveness of the proposed algorithm and highlight how the block dimension impacts the communication overhead and practical convergence speed.
Submitted 31 August, 2018; v1 submitted 22 August, 2018;
originally announced August 2018.
-
Decentralized Dictionary Learning Over Time-Varying Digraphs
Authors:
Amir Daneshmand,
Ying Sun,
Gesualdo Scutari,
Francisco Facchinei,
Brian M. Sadler
Abstract:
This paper studies Dictionary Learning problems wherein the learning task is distributed over a multi-agent network, modeled as a time-varying directed graph. This formulation is relevant, for instance, in Big Data scenarios where massive amounts of data are collected/stored in different locations (e.g., sensors, clouds) and aggregating and/or processing all data in a fusion center might be inefficient or unfeasible, due to resource limitations, communication overheads or privacy issues. We develop a unified decentralized algorithmic framework for this class of nonconvex problems, which is proved to converge to stationary solutions at a sublinear rate. The new method hinges on Successive Convex Approximation techniques, coupled with a decentralized tracking mechanism aiming at locally estimating the gradient of the smooth part of the sum-utility. To the best of our knowledge, this is the first provably convergent decentralized algorithm for Dictionary Learning and, more generally, bi-convex problems over (time-varying) (di)graphs.
Submitted 5 March, 2019; v1 submitted 17 August, 2018;
originally announced August 2018.
-
Distributed Big-Data Optimization via Block Communications
Authors:
Ivano Notarnicola,
Ying Sun,
Gesualdo Scutari,
Giuseppe Notarstefano
Abstract:
We study distributed multi-agent large-scale optimization problems, wherein the cost function is composed of a smooth possibly nonconvex sum-utility plus a DC (Difference-of-Convex) regularizer. We consider the scenario where the dimension of the optimization variables is so large that optimizing and/or transmitting the entire set of variables could cause unaffordable computation and communication overhead. To address this issue, we propose the first distributed algorithm whereby agents optimize and communicate only a portion of their local variables. The scheme hinges on successive convex approximation (SCA) to handle the nonconvexity of the objective function, coupled with a novel block-signal tracking scheme, aiming at locally estimating the average of the agents' gradients. Asymptotic convergence to stationary solutions of the nonconvex problem is established. Numerical results on a sparse regression problem show the effectiveness of the proposed algorithm and the impact of the block size on its practical convergence speed and communication cost.
Submitted 27 May, 2018;
originally announced May 2018.
-
Parallel and Distributed Successive Convex Approximation Methods for Big-Data Optimization
Authors:
Gesualdo Scutari,
Ying Sun
Abstract:
Recent years have witnessed a surge of interest in parallel and distributed optimization methods for large-scale systems. In particular, nonconvex large-scale optimization problems have found a wide range of applications in several engineering fields. The design and analysis of such complex, large-scale systems pose several challenges and call for the development of new optimization models and algorithms. The major contribution of this paper is to put forth a general, unified, algorithmic framework, based on Successive Convex Approximation (SCA) techniques, for the parallel and distributed solution of a general class of nonconvex constrained (non-separable, networked) problems. The presented framework unifies and generalizes several existing SCA methods, making them appealing for a parallel/distributed implementation while offering a flexible selection of function approximants, step size schedules, and control of the computation/communication efficiency. This paper is organized according to the lectures that one of the authors delivered at the CIME Summer School on Centralized and Distributed Multi-agent Optimization Models and Algorithms, held in Cetraro, Italy, June 23--27, 2014. These lectures are: I) Successive Convex Approximation Methods: Basics; II) Parallel Successive Convex Approximation Methods; and III) Distributed Successive Convex Approximation Methods.
Submitted 17 May, 2018;
originally announced May 2018.
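To make the basic SCA idea concrete, here is a minimal centralized sketch under our own assumptions. It minimizes $F(x) = f(x) + \lambda\|x\|_1$ with the nonconvex smooth term $f(x) = \sum_i (x_i^2 - 1)^2/4$. Each iteration proceeds in two steps:

1. Minimize a strongly convex surrogate: $f$ linearized at $x^k$, plus a proximal term $\tfrac{\tau}{2}\|x - x^k\|^2$, plus the $\ell_1$ term. Its minimizer is a soft-thresholding step.
2. Move a fraction $\gamma^k$ toward that minimizer.

The values $\tau = 12$ and $\lambda = 0.1$ and the diminishing rule $\gamma^{k+1} = \gamma^k(1 - \theta\gamma^k)$ are illustrative choices. The sketch is one simple instance of the idea, not the general framework of the lectures.

```python
def soft_threshold(v, t):
    """Componentwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def objective(x, lam):
    """F(x) = sum_i (x_i^2 - 1)^2 / 4 + lam * ||x||_1 (nonconvex smooth part + l1)."""
    return sum((xi * xi - 1.0) ** 2 / 4.0 for xi in x) + lam * sum(abs(xi) for xi in x)


def grad_f(x):
    """Gradient of the smooth part f."""
    return [xi ** 3 - xi for xi in x]


def sca(x, lam=0.1, tau=12.0, gamma=0.9, theta=0.01, iters=2000):
    history = [objective(x, lam)]
    for _ in range(iters):
        g = grad_f(x)
        # Minimizer of the strongly convex surrogate
        #   f(x^k) + g^T (z - x^k) + tau/2 ||z - x^k||^2 + lam ||z||_1
        x_hat = soft_threshold([xi - gi / tau for xi, gi in zip(x, g)], lam / tau)
        # Relaxation step toward the surrogate minimizer.
        x = [xi + gamma * (hi - xi) for xi, hi in zip(x, x_hat)]
        gamma *= 1.0 - theta * gamma  # diminishing step-size rule (assumed)
        history.append(objective(x, lam))
    return x, history


x, history = sca([0.5, -1.5, 0.05])
print(x, history[0], history[-1])
```

Here $\tau$ exceeds the local Lipschitz constant of $\nabla f$ on the region the iterates visit. As a result, the objective values in `history` should be nonincreasing.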
-
Distributed Big-Data Optimization via Block-Iterative Convexification and Averaging
Authors:
Ivano Notarnicola,
Ying Sun,
Gesualdo Scutari,
Giuseppe Notarstefano
Abstract:
In this paper, we study distributed big-data nonconvex optimization in multi-agent networks. We consider the (constrained) minimization of the sum of a smooth (possibly) nonconvex function, i.e., the agents' sum-utility, plus a convex (possibly) nonsmooth regularizer. Our interest is in big-data problems wherein there is a large number of variables to optimize. If treated by means of standard distributed optimization algorithms, these large-scale problems may be intractable, due to the prohibitive local computation and communication burden at each node. We propose a novel distributed solution method whereby at each iteration agents optimize and then communicate (in an uncoordinated fashion) only a subset of their decision variables. To deal with the nonconvexity of the cost function, the novel scheme hinges on Successive Convex Approximation (SCA) techniques coupled with i) a tracking mechanism instrumental to locally estimate gradient averages; and ii) a novel block-wise consensus-based protocol to perform local block-averaging operations and gradient tracking. Asymptotic convergence to stationary solutions of the nonconvex problem is established. Finally, numerical results show the effectiveness of the proposed algorithm and highlight how the block dimension impacts the communication overhead and practical convergence speed.
Submitted 2 May, 2018;
originally announced May 2018.
-
Achieving Linear Convergence in Distributed Asynchronous Multi-agent Optimization
Authors:
Ye Tian,
Ying Sun,
Gesualdo Scutari
Abstract:
This paper studies multi-agent (convex and \emph{nonconvex}) optimization over static digraphs. We propose a general distributed \emph{asynchronous} algorithmic framework whereby i) agents can update their local variables as well as communicate with their neighbors at any time, without any form of coordination; and ii) they can perform their local computations using (possibly) delayed, out-of-sync information from the other agents. Delays need not be known to the agents or obey any specific profile, and can also be time-varying (but bounded). The algorithm builds on a tracking mechanism that is robust against asynchrony (in the above sense), whose goal is to estimate locally the average of agents' gradients. When applied to strongly convex functions, we prove that it converges at an R-linear (geometric) rate as long as the step-size is sufficiently small. A sublinear convergence rate is proved when nonconvex problems and/or diminishing, \emph{uncoordinated} step-sizes are considered. To the best of our knowledge, this is the first distributed algorithm with provable geometric convergence rate in such a general asynchronous setting. Preliminary numerical results demonstrate the efficacy of the proposed algorithm and validate our theoretical findings.
Submitted 11 September, 2019; v1 submitted 27 March, 2018;
originally announced March 2018.
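The gradient-tracking mechanism at the core of the method above can be illustrated with a deliberately simplified sketch. It is synchronous, uses a doubly stochastic mixing matrix, and drops the asynchrony, delays and digraph setting that are the point of the paper. The quadratic local costs $f_i(x) = \tfrac12 a_i (x - c_i)^2$, the matrix `W` and the step `alpha` are our own illustrative choices.

```python
def gradient_tracking(a, c, W, alpha, iters):
    """Synchronous gradient tracking for f_i(x) = 0.5 * a_i * (x - c_i)**2.

    x_i is agent i's estimate of the minimizer; y_i tracks the network-average
    gradient. Both are mixed with W, and y is corrected by the change in the
    local gradient, so sum_i y_i always equals sum_i grad f_i(x_i).
    """
    n = len(a)

    def grad(x):
        # Local gradients of f_i(x) = 0.5 * a_i * (x - c_i)**2.
        return [a[i] * (x[i] - c[i]) for i in range(n)]

    x = [0.0] * n
    g = grad(x)
    y = list(g)  # each tracker starts at its own local gradient
    for _ in range(iters):
        # Consensus on x plus a step along the tracked average gradient.
        x_new = [sum(W[i][j] * x[j] for j in range(n)) - alpha * y[i]
                 for i in range(n)]
        g_new = grad(x_new)
        # Tracking update: mix y and add the change in the local gradient.
        y = [sum(W[i][j] * y[j] for j in range(n)) + g_new[i] - g[i]
             for i in range(n)]
        x, g = x_new, g_new
    return x


W = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]
a = [1.0, 2.0, 3.0]
c = [1.0, 2.0, 6.0]
# Minimizer of sum_i f_i: sum(a_i * c_i) / sum(a_i) = 23 / 6.
x = gradient_tracking(a, c, W, alpha=0.02, iters=3000)
print(x)
```

With a constant step size, all agents in this toy run agree on the exact minimizer 23/6. Plain DGD with a constant step generally stops in a neighborhood of it instead.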
-
Ghost Penalties in Nonconvex Constrained Optimization: Diminishing Stepsizes and Iteration Complexity
Authors:
Francisco Facchinei,
Vyacheslav Kungurtsev,
Lorenzo Lampariello,
Gesualdo Scutari
Abstract:
We consider nonconvex constrained optimization problems and propose a new approach to the convergence analysis based on penalty functions. We make use of classical penalty functions in an unconventional way, in that penalty functions only enter the theoretical analysis of convergence, while the algorithm itself is penalty-free. Based on this idea, we are able to establish several new results, including the first general analysis for diminishing stepsize methods in nonconvex, constrained optimization, showing convergence to generalized stationary points, and a complexity study for SQP-type algorithms.
Submitted 30 May, 2020; v1 submitted 11 September, 2017;
originally announced September 2017.
-
Asynchronous Parallel Algorithms for Nonconvex Big-Data Optimization. Part II: Complexity and Numerical Results
Authors:
Loris Cannelli,
Francisco Facchinei,
Vyacheslav Kungurtsev,
Gesualdo Scutari
Abstract:
We present complexity and numerical results for a new asynchronous parallel algorithmic method for the minimization of the sum of a smooth nonconvex function and a convex nonsmooth regularizer, subject to both convex and nonconvex constraints. The proposed method hinges on successive convex approximation techniques and a novel probabilistic model that captures key elements of modern computational architectures and asynchronous implementations in a more faithful way than state-of-the-art models. In the companion paper we provided a detailed description of the probabilistic model and gave convergence results for a diminishing stepsize version of our method. Here, we provide theoretical complexity results for a fixed stepsize version of the method and report extensive numerical comparisons on both convex and nonconvex problems demonstrating the efficiency of our approach.
Submitted 19 January, 2017; v1 submitted 17 January, 2017;
originally announced January 2017.
-
Distributed Dictionary Learning
Authors:
Amir Daneshmand,
Gesualdo Scutari,
Francisco Facchinei
Abstract:
The paper studies distributed Dictionary Learning (DL) problems where the learning task is distributed over a multi-agent network with time-varying (nonsymmetric) connectivity. This formulation is relevant, for instance, in big-data scenarios where massive amounts of data are collected/stored in different spatial locations and it is infeasible to aggregate and/or process all the data in a fusion center, due to resource limitations, communication overhead or privacy considerations. We develop a general distributed algorithmic framework for the (nonconvex) DL problem and establish its asymptotic convergence. The new method hinges on Successive Convex Approximation (SCA) techniques coupled with i) a gradient tracking mechanism instrumental to locally estimate the missing global information; and ii) a consensus step, as a mechanism to distribute the computations among the agents. To the best of our knowledge, this is the first distributed algorithm with provable convergence for the DL problem and, more generally, for bi-convex optimization problems over (time-varying) directed graphs.
Submitted 21 December, 2016;
originally announced December 2016.
-
Distributed Nonconvex Optimization for Sparse Representation
Authors:
Ying Sun,
Gesualdo Scutari
Abstract:
We consider a nonconvex constrained Lagrangian formulation of a fundamental bi-criteria optimization problem for variable selection in statistical learning; the two criteria are a smooth (possibly) nonconvex loss function, measuring the fit of the model to the data, and a difference-of-convex (DC) regularizer, employed to promote some extra structure on the solution, like sparsity. This general class of nonconvex problems arises in many big-data applications, from statistical machine learning to physical sciences and engineering. We develop the first unified distributed algorithmic framework for these problems and establish its asymptotic convergence to d-stationary solutions. Two key features of the method are: i) it can be implemented on arbitrary networks (digraphs) with (possibly) time-varying connectivity; and ii) it does not require the restrictive assumption that the (sub)gradient of the objective function is bounded, which significantly enlarges the class of statistical learning problems that can be solved with convergence guarantees.
Submitted 20 November, 2016;
originally announced November 2016.
-
Asynchronous Parallel Algorithms for Nonconvex Optimization
Authors:
Loris Cannelli,
Francisco Facchinei,
Vyacheslav Kungurtsev,
Gesualdo Scutari
Abstract:
We propose a new asynchronous parallel block-descent algorithmic framework for the minimization of the sum of a smooth nonconvex function and a nonsmooth convex one, subject to both convex and nonconvex constraints. The proposed framework hinges on successive convex approximation techniques and a novel probabilistic model that captures key elements of modern computational architectures and asynchronous implementations in a more faithful way than current state-of-the-art models. Other key features of the framework are: i) it covers in a unified way several specific solution methods; ii) it accommodates a variety of possible parallel computing architectures; and iii) it can deal with nonconvex constraints. Almost sure convergence to stationary solutions is proved, and theoretical complexity results are provided, showing nearly ideal linear speedup when the number of workers is not too large.
Submitted 29 March, 2018; v1 submitted 16 July, 2016;
originally announced July 2016.
-
Distributed Nonconvex Multiagent Optimization Over Time-Varying Networks
Authors:
Ying Sun,
Gesualdo Scutari,
Daniel Palomar
Abstract:
We study nonconvex distributed optimization in multiagent networks where the communications between nodes are modeled as a time-varying sequence of arbitrary digraphs. We introduce a novel broadcast-based distributed algorithmic framework for the (constrained) minimization of the sum of a smooth (possibly nonconvex and nonseparable) function, i.e., the agents' sum-utility, plus a convex (possibly nonsmooth and nonseparable) regularizer. The latter is usually employed to enforce some structure in the solution, typically sparsity. The proposed method hinges on Successive Convex Approximation (SCA) techniques coupled with i) a tracking mechanism instrumental to locally estimate the gradients of agents' cost functions; and ii) a novel broadcast protocol to disseminate information and distribute the computation among the agents. Asymptotic convergence to stationary solutions is established. A key feature of the proposed algorithm is that it requires neither double stochasticity of the consensus matrices (only column stochasticity) nor knowledge of the graph sequence. To the best of our knowledge, the proposed framework is the first broadcast-based distributed algorithm for convex and nonconvex constrained optimization over arbitrary, time-varying digraphs. Numerical results show that our algorithm outperforms current schemes on both convex and nonconvex problems.
Submitted 14 December, 2016; v1 submitted 1 July, 2016;
originally announced July 2016.
-
NEXT: In-Network Nonconvex Optimization
Authors:
Paolo Di Lorenzo,
Gesualdo Scutari
Abstract:
We study nonconvex distributed optimization in multi-agent networks with time-varying (nonsymmetric) connectivity. We introduce the first algorithmic framework for the distributed minimization of the sum of a smooth (possibly nonconvex and nonseparable) function - the agents' sum-utility - plus a convex (possibly nonsmooth and nonseparable) regularizer. The latter is usually employed to enforce some structure in the solution, typically sparsity. The proposed method hinges on successive convex approximation techniques while leveraging dynamic consensus as a mechanism to distribute the computation among the agents: each agent first solves (possibly inexactly) a local convex approximation of the nonconvex original problem, and then performs local averaging operations. Asymptotic convergence to (stationary) solutions of the nonconvex problem is established. Our algorithmic framework is then customized to a variety of convex and nonconvex problems in several fields, including signal processing, communications, networking, and machine learning. Numerical results show that the new method compares favorably to existing distributed algorithms on both convex and nonconvex problems.
Submitted 1 February, 2016;
originally announced February 2016.
-
A Parallel Stochastic Approximation Method for Nonconvex Multi-Agent Optimization Problems
Authors:
Yang Yang,
Gesualdo Scutari,
Daniel P. Palomar,
Marius Pesavento
Abstract:
Consider the problem of minimizing the expected value of a (possibly nonconvex) cost function parameterized by a random (vector) variable, when the expectation cannot be computed accurately (e.g., because the statistics of the random variables are unknown and/or the computational complexity is prohibitive). Classical sample stochastic gradient methods for solving this problem may empirically suffer from slow convergence. In this paper, we propose for the first time a stochastic parallel Successive Convex Approximation-based (best-response) algorithmic framework for general nonconvex stochastic sum-utility optimization problems, which arise naturally in the design of multi-agent systems. The proposed novel decomposition enables all users to update their optimization variables in parallel by solving a sequence of strongly convex subproblems, one for each user. Almost sure convergence to stationary points is proved. We then customize our algorithmic framework to solve the stochastic sum rate maximization problem over Single-Input-Single-Output (SISO) frequency-selective interference channels, multiple-input-multiple-output (MIMO) interference channels, and MIMO multiple-access channels. Numerical results show that our algorithms are much faster than state-of-the-art stochastic gradient schemes while achieving the same (or better) sum-rates.
Submitted 21 October, 2014; v1 submitted 19 October, 2014;
originally announced October 2014.
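The following toy one-dimensional sketch illustrates stochastic best-response SCA. The expectation is never computed; only random samples are drawn. Each iteration minimizes a strongly convex surrogate built from a gradient estimate and then takes a diminishing step toward its minimizer.

Several pieces are our own hypothetical choices, not details given in the abstract:

- the recursive averaging of sampled gradients;
- the weights $\rho^t = (t+1)^{-0.6}$ and $\gamma^t = (t+1)^{-0.8}$;
- the test problem $\min_x \mathbb{E}[\tfrac12 (x - \xi)^2]$ with $\xi \sim \mathcal{N}(3, 1)$.

```python
import random


def stochastic_sca(mean=3.0, iters=5000, tau=1.0, seed=0):
    """Toy stochastic SCA for min_x E[0.5 * (x - xi)**2] with xi ~ N(mean, 1).

    Only samples of xi are used. The sampled gradients are smoothed by a
    recursive average g, a strongly convex surrogate is minimized in closed
    form, and x moves a step gamma toward that minimizer.
    """
    rng = random.Random(seed)
    x, g = 0.0, 0.0
    for t in range(iters):
        rho = 1.0 / (t + 1) ** 0.6    # averaging weight (hypothetical schedule)
        gamma = 1.0 / (t + 1) ** 0.8  # step size, decays faster than rho
        xi = rng.gauss(mean, 1.0)
        g = (1.0 - rho) * g + rho * (x - xi)  # running estimate of E[x - xi]
        x_hat = x - g / tau  # argmin_z g * (z - x) + tau / 2 * (z - x)**2
        x = x + gamma * (x_hat - x)
    return x


print(stochastic_sca())  # should be close to 3.0, the true minimizer
```

Letting $\rho^t$ decay more slowly than $\gamma^t$ keeps the gradient estimate adapting faster than the iterate moves. This is a common design choice in two-timescale stochastic schemes.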
-
Parallel and Distributed Methods for Nonconvex Optimization-Part I: Theory
Authors:
Gesualdo Scutari,
Francisco Facchinei,
Lorenzo Lampariello,
Peiran Song
Abstract:
In this two-part paper, we propose a general algorithmic framework for the minimization of a nonconvex smooth function subject to nonconvex smooth constraints. The algorithm solves a sequence of (separable) strongly convex problems and maintains feasibility at each iteration. Convergence to a stationary solution of the original nonconvex optimization is established. Our framework is very general and flexible; it unifies several existing Successive Convex Approximation (SCA)-based algorithms such as (proximal) gradient or Newton type methods, block coordinate (parallel) descent schemes, difference of convex functions methods, and improves on their convergence properties. More importantly, and differently from current SCA approaches, it naturally leads to distributed and parallelizable implementations for a large class of nonconvex problems.
This Part I is devoted to the description of the framework in its generality. In Part II we customize our general methods to several multi-agent optimization problems, mainly in communications and networking; the result is a new class of (distributed) algorithms that compare favorably to existing ad-hoc (centralized) schemes (when they exist).
Submitted 14 January, 2016; v1 submitted 17 October, 2014;
originally announced October 2014.
-
Hybrid Random/Deterministic Parallel Algorithms for Nonconvex Big Data Optimization
Authors:
Amir Daneshmand,
Francisco Facchinei,
Vyacheslav Kungurtsev,
Gesualdo Scutari
Abstract:
We propose a decomposition framework for the parallel optimization of the sum of a differentiable (possibly nonconvex) function and a nonsmooth (possibly nonseparable), convex one. The latter term is usually employed to enforce structure in the solution, typically sparsity. The main contribution of this work is a novel \emph{parallel, hybrid random/deterministic} decomposition scheme wherein, at each iteration, a subset of (block) variables is updated at the same time by minimizing local convex approximations of the original nonconvex function. To tackle huge-scale problems, the (block) variables to be updated are chosen according to a \emph{mixed random and deterministic} procedure, which captures the advantages of both pure deterministic and random update-based schemes. Almost sure convergence of the proposed scheme is established. Numerical results show that on huge-scale problems the proposed hybrid random/deterministic algorithm outperforms both random and deterministic schemes.
Submitted 2 September, 2014; v1 submitted 16 July, 2014;
originally announced July 2014.
-
Parallel Selective Algorithms for Big Data Optimization
Authors:
Francisco Facchinei,
Gesualdo Scutari,
Simone Sagratella
Abstract:
We propose a decomposition framework for the parallel optimization of the sum of a differentiable (possibly nonconvex) function and a (block) separable nonsmooth, convex one. The latter term is usually employed to enforce structure in the solution, typically sparsity. Our framework is very flexible and includes both fully parallel Jacobi schemes and Gauss-Seidel (i.e., sequential) ones, as well as virtually all possibilities "in between" with only a subset of variables updated at each iteration. Our theoretical convergence results improve on existing ones, and numerical results on LASSO, logistic regression, and some nonconvex quadratic problems show that the new method consistently outperforms existing algorithms.
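The contrast between the two extremes of the framework can be sketched on a LASSO instance, assuming scalar blocks and exact per-coordinate minimization. The helper names below are hypothetical, and the Jacobi step size γ = 1/n is an assumption chosen only because it makes the descent argument one line (the new point is an average of n single-coordinate updates, so by convexity the objective cannot increase); it is not the step-size rule analyzed in the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * |.| (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_obj(A, b, lam, x):
    r = A @ x - b
    return 0.5 * r @ r + lam * np.abs(x).sum()

def coord_best_response(A, b, lam, x, i, col_sq):
    """Exact minimizer of the LASSO objective over coordinate i, others fixed."""
    r_minus_i = A @ x - b - A[:, i] * x[i]  # residual with coordinate i removed
    return soft_threshold(-(A[:, i] @ r_minus_i) / col_sq[i], lam / col_sq[i])

def jacobi_sweep(A, b, lam, x, col_sq):
    """Fully parallel: every coordinate responds to the same x, then all move.
    gamma = 1/n averages n single-coordinate updates, so F cannot increase."""
    n = x.size
    x_hat = np.array([coord_best_response(A, b, lam, x, i, col_sq) for i in range(n)])
    return x + (x_hat - x) / n

def gauss_seidel_sweep(A, b, lam, x, col_sq):
    """Sequential: each coordinate responds to the most recent values."""
    x = x.copy()
    for i in range(x.size):
        x[i] = coord_best_response(A, b, lam, x, i, col_sq)
    return x
```

Selective schemes sit between these two: they compute the same best responses in parallel but move only a chosen subset of coordinates.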
Submitted 8 December, 2014; v1 submitted 22 February, 2014;
originally announced February 2014.
-
Flexible Parallel Algorithms for Big Data Optimization
Authors:
Francisco Facchinei,
Simone Sagratella,
Gesualdo Scutari
Abstract:
We propose a decomposition framework for the parallel optimization of the sum of a differentiable function and a (block) separable nonsmooth, convex one. The latter term is typically used to enforce structure in the solution as, for example, in Lasso problems. Our framework is very flexible and includes both fully parallel Jacobi schemes and Gauss-Seidel (Southwell-type) ones, as well as virtually all possibilities in between (e.g., gradient- or Newton-type methods) with only a subset of variables updated at each iteration. Our theoretical convergence results improve on existing ones, and numerical results show that the new method compares favorably to existing algorithms.
Submitted 11 November, 2013;
originally announced November 2013.
-
A New Distributed DC-Programming Method and its Applications
Authors:
Alberth Alvarado,
Gesualdo Scutari,
Jong-Shi Pang
Abstract:
We propose a novel decomposition framework for the distributed optimization of Difference Convex (DC)-type nonseparable sum-utility functions subject to coupling convex constraints. A major contribution of the paper is to develop for the first time a class of (inexact) best-response-like algorithms with provable convergence, where a suitably convexified version of the original DC program is iteratively solved. The main feature of the proposed successive convex approximation method is its decomposability structure across the users, which leads naturally to distributed algorithms in the primal and/or dual domain. The proposed framework is applicable to a variety of multiuser DC problems in different areas, ranging from signal processing to communications and networking. As a case study, in the second part of the paper we focus on two examples, namely: i) a novel resource allocation problem in the emerging area of cooperative physical layer security; and ii) the renowned sum-rate maximization of MIMO Cognitive Radio networks. Our contribution in this context is to devise a class of easy-to-implement distributed algorithms with provable convergence to stationary solutions of such problems. Numerical results show that the proposed distributed schemes reach performance close to (and sometimes better than) that of centralized methods.
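The basic convexification step for a DC program can be illustrated on a toy scalar problem. The sketch assumes f(x) = x⁴ − 2x², split as g − h with g(x) = x⁴ and h(x) = 2x², both convex; this example is illustrative and not taken from the paper. Linearizing h at the current point x_k gives the convex surrogate x⁴ − h(x_k) − h′(x_k)(x − x_k), whose optimality condition 4x³ = 4x_k yields the closed-form update x_{k+1} = ∛x_k. Because the surrogate upper-bounds f and touches it at x_k, the objective cannot increase.

```python
import numpy as np

def dc_iterate(x0, n_iter=50):
    """Successive convex approximation for f(x) = x**4 - 2*x**2 = g - h.

    Each step replaces the concave part -h by its linearization at x_k and
    minimizes the convex surrogate exactly: 4*x**3 = h'(x_k) = 4*x_k.
    """
    x = float(x0)
    history = [x]
    for _ in range(n_iter):
        x = float(np.cbrt(x))  # closed-form minimizer of the convex surrogate
        history.append(x)
    return x, history
```

Starting from x₀ = 0.5 the iterates increase monotonically toward the stationary point x = 1 (and from x₀ = −0.5 toward x = −1), with f nonincreasing along the way.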
Submitted 20 September, 2013; v1 submitted 15 August, 2013;
originally announced August 2013.
-
Decomposition by Partial Linearization: Parallel Optimization of Multi-Agent Systems
Authors:
Gesualdo Scutari,
Francisco Facchinei,
Peiran Song,
Daniel P. Palomar,
Jong-Shi Pang
Abstract:
We propose a novel decomposition framework for the distributed optimization of general nonconvex sum-utility functions arising naturally in the system design of wireless multiuser interfering systems. Our main contributions are: i) the development of the first class of (inexact) Jacobi best-response algorithms with provable convergence, where all the users simultaneously and iteratively solve a suitably convexified version of the original sum-utility optimization problem; ii) the derivation of a general dynamic pricing mechanism that provides a unified view of existing pricing schemes that are based, instead, on heuristics; and iii) a framework that can be easily particularized to well-known applications, giving rise to very efficient practical (Jacobi or Gauss-Seidel) algorithms that outperform existing ad hoc methods proposed for very specific problems. Interestingly, our framework contains as special cases well-known gradient algorithms for nonconvex sum-utility problems, and many block-coordinate descent schemes for convex functions.
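The partial-linearization and pricing idea can be sketched on a scalar power-control problem, under assumptions that are ours rather than the paper's: K single-antenna users, gain matrix `H[i, j]` from transmitter j to receiver i, rates rᵢ = log(1 + Hᵢᵢpᵢ / (σᵢ + Σ_{j≠i} Hᵢⱼpⱼ)), and per-user box constraints 0 ≤ pᵢ ≤ P_max. Each user keeps its own rate (concave in its own power) and linearizes the other users' rates, which produces a nonnegative price πₖ on its power. The resulting best response max log(1 + aₖp) − πₖp has the closed form p = 1/πₖ − 1/aₖ, clipped to the box. All function names and the damping γ are hypothetical.

```python
import numpy as np

def interference_prices(H, p, sigma):
    """total_i = sigma_i + sum_j H[i,j] p_j and interf_i = total_i - H[i,i] p_i.
    price_k = -sum_{i != k} d r_i / d p_k >= 0: the marginal rate loss that
    user k's power causes to all other users."""
    total = sigma + H @ p
    interf = total - np.diag(H) * p
    M = H * (1.0 / interf - 1.0 / total)[:, None]  # M[i, k] = -d r_i / d p_k, i != k
    prices = M.sum(axis=0) - np.diag(M)
    return total, interf, prices

def pricing_best_response(H, p, sigma, p_max):
    """Each user maximizes log(1 + a_k q) - price_k * q over 0 <= q <= p_max."""
    _, interf, prices = interference_prices(H, p, sigma)
    a = np.diag(H) / interf
    safe = np.maximum(prices, 1e-12)
    p_hat = np.where(prices > 1e-12, 1.0 / safe - 1.0 / a, p_max)
    return np.clip(p_hat, 0.0, p_max), prices

def sum_rate_gradient(H, p, sigma):
    """d(sum_i r_i)/d p_k = H[k,k]/total_k - price_k."""
    total, _, prices = interference_prices(H, p, sigma)
    return np.diag(H) / total - prices

def jacobi_pricing(H, sigma, p_max, p0, gamma=0.1, n_iter=200):
    """All users update simultaneously with a damped (convex-combination) step."""
    p = np.array(p0, dtype=float)
    for _ in range(n_iter):
        p_hat, _ = pricing_best_response(H, p, sigma, p_max)
        p = p + gamma * (p_hat - p)
    return p
```

Because each user's surrogate is concave and its derivative at the current point equals the true partial derivative of the sum rate, the best-response direction p̂ − p is always an ascent direction for the sum rate; the damping γ controls how far the Jacobi step moves along it.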
Submitted 19 September, 2013; v1 submitted 4 February, 2013;
originally announced February 2013.
-
Competitive Design of Multiuser MIMO Systems based on Game Theory: A Unified View
Authors:
Gesualdo Scutari,
Daniel P. Palomar,
Sergio Barbarossa
Abstract:
This paper considers the noncooperative maximization of mutual information in the Gaussian interference channel in a fully distributed fashion via game theory. This problem has been studied in a number of papers during the past decade for the case of frequency-selective channels. A variety of conditions guaranteeing the uniqueness of the Nash Equilibrium (NE) and convergence of many different distributed algorithms have been derived. In this paper we provide a unified view of the state-of-the-art results, showing that most of the techniques proposed in the literature to study the game, even though apparently different, can be unified using our recent interpretation of the waterfilling operator as a projection onto a proper polyhedral set. Based on this interpretation, we then provide a mathematical framework, useful to derive a unified set of sufficient conditions guaranteeing the uniqueness of the NE and the global convergence of waterfilling-based asynchronous distributed algorithms.
The proposed mathematical framework is also instrumental to study the extension of the game to the more general MIMO case, for which only a few results are available in the current literature. The resulting algorithm is, similarly to the frequency-selective case, an iterative asynchronous MIMO waterfilling algorithm. The proof of convergence hinges again on the interpretation of the MIMO waterfilling as a matrix projection, which is the natural generalization of our results obtained for the waterfilling mapping in the frequency-selective case.
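The projection view of waterfilling can be checked numerically in the scalar (frequency-selective, single-user) case. The sketch assumes `noise[k]` holds the noise-plus-interference level on carrier k already normalized by the channel gain, and a total power budget `power`; the function names are hypothetical. Classic waterfilling sets p_k = max(0, μ − noise_k) with the water level μ chosen so that the powers sum to the budget, while the Euclidean projection of −noise onto the simplex {p ≥ 0, Σ p = power} has the form max(0, −noise_k − θ). The two coincide with μ = −θ.

```python
import numpy as np

def waterfill(noise, power, n_bisect=200):
    """Classic waterfilling: p_k = max(0, mu - noise_k), sum(p) = power,
    with the water level mu located by bisection."""
    lo, hi = noise.min(), noise.max() + power  # sum(p) is 0 at lo and >= power at hi
    for _ in range(n_bisect):
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - noise, 0.0).sum() > power:
            hi = mu
        else:
            lo = mu
    return np.maximum(0.5 * (lo + hi) - noise, 0.0)

def project_simplex(v, total):
    """Euclidean projection of v onto {p >= 0, sum(p) = total} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - total
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index kept active
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)
```

For noise levels (0.1, 0.5, 1.0, 2.0) and a budget of 1.5, both routines give a water level μ = 3.1/3 and powers (2.8/3, 1.6/3, 0.1/3, 0), with the noisiest carrier switched off.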
Submitted 9 June, 2008;
originally announced June 2008.