
Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity

Authors: Dmitry Kovalev, Aleksandr Beznosikov, Ekaterina Borodich, Alexander Gasnikov, Gesualdo Scutari

Abstract: We study structured convex optimization problems, with additive objective $r:=p + q$, where $r$ is ($μ$-strongly) convex, $q$ is $L_q$-smooth and convex, and $p$ is $L_p$-smooth, possibly nonconvex. For such a class of problems, we propose an inexact accelerated gradient sliding method that can skip the gradient computation for one of these components while still achieving optimal complexity of gradient calls of $p$ and $q$, that is, $\mathcal{O}(\sqrt{L_p/μ})$ and $\mathcal{O}(\sqrt{L_q/μ})$, respectively. This result is much sharper than the classic black-box complexity $\mathcal{O}(\sqrt{(L_p+L_q)/μ})$, especially when the difference between $L_p$ and $L_q$ is large. We then apply the proposed method to solve distributed optimization problems over master-worker architectures, under agents' function similarity, due to statistical data similarity or otherwise. The distributed algorithm achieves for the first time lower complexity bounds on {\it both} communication and local gradient calls, with the former having been a long-standing open problem. Finally, the method is extended to distributed saddle-point problems (under function similarity) by means of solving a class of variational inequalities, achieving lower communication and computation complexity bounds.

Submitted 30 May, 2022; originally announced May 2022.

Comments: 24 pages, 2 new algorithms, 12 theorems, 2 figures
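
As a rough illustration of the gap between these bounds, the sketch below evaluates the three complexity expressions for sample constants, dropping the constants hidden in $\mathcal{O}(\cdot)$. The values of `L_p`, `L_q`, and `mu` and the function name are illustrative assumptions, not taken from the paper. A black-box accelerated method queries both gradients at every iteration, so it pays about $\sqrt{(L_p+L_q)/μ}$ calls of each.

```python
import math


def gradient_call_counts(L_p: float, L_q: float, mu: float) -> dict:
    """Order-of-magnitude gradient-call counts (constants inside O(.) dropped).

    Black-box accelerated gradient: every iteration queries both grad p and
    grad q, so each is called about sqrt((L_p + L_q) / mu) times.
    Gradient sliding (bounds as stated in the abstract): grad p about
    sqrt(L_p / mu) calls and grad q about sqrt(L_q / mu) calls.
    """
    black_box = math.sqrt((L_p + L_q) / mu)
    return {
        "black_box_p": black_box,
        "black_box_q": black_box,
        "sliding_p": math.sqrt(L_p / mu),
        "sliding_q": math.sqrt(L_q / mu),
    }


# Illustrative constants with L_p much smaller than L_q.
counts = gradient_call_counts(L_p=1.0, L_q=1e4, mu=1.0)
for name, value in counts.items():
    print(f"{name}: {value:.1f}")
```

With $L_p = 1$, $L_q = 10^4$, and $μ = 1$, the black-box count is about 100 calls for each gradient, whereas the sliding bounds reduce the number of $\nabla p$ calls to order 1 while keeping about 100 calls of $\nabla q$.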

arXiv:2201.08507 [pdf, ps, other]

High-Dimensional Inference over Networks: Linear Convergence and Statistical Guarantees

Authors: Ying Sun, Marie Maros, Gesualdo Scutari, Guang Cheng

Abstract: We study sparse linear regression over a network of agents, modeled as an undirected graph with no server node. The estimation of the $s$-sparse parameter is formulated as a constrained LASSO problem wherein each agent owns a subset of the $N$ total observations. We analyze the convergence rate and statistical guarantees of a distributed projected gradient tracking-based algorithm under high-dimensional scaling, allowing the ambient dimension $d$ to grow with (and possibly exceed) the sample size $N$. Our theory shows that, under standard notions of restricted strong convexity and smoothness of the loss functions, and suitable conditions on the network connectivity and algorithm tuning, the distributed algorithm converges globally at a {\it linear} rate to an estimate that is within the centralized {\it statistical precision} of the model, $O(s\log d/N)$. When $s\log d/N=o(1)$, a condition necessary for statistical consistency, an $\varepsilon$-optimal solution is attained after $\mathcal{O}(κ\log (1/\varepsilon))$ gradient computations and $O (κ/(1-ρ) \log (1/\varepsilon))$ communication rounds, where $κ$ is the restricted condition number of the loss function and $ρ$ measures the network connectivity. The computation cost matches that of the centralized projected gradient algorithm despite the data being distributed, whereas the number of communication rounds decreases as the network connectivity improves. Overall, our study reveals interesting connections between statistical efficiency, network connectivity \& topology, and convergence rate in high dimensions.

Submitted 20 January, 2022; originally announced January 2022.

Comments: 50 pages, 7 figures
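
Assuming the usual form of the constrained LASSO, $\min_x \tfrac{1}{2N}\|y - Xx\|_2^2$ subject to $\|x\|_1 \le R$, every projected gradient step needs the Euclidean projection onto the $\ell_1$ ball of radius $R$. The sketch below shows one standard sort-based way to compute it; the radius, the function name, and the exact constraint form are illustrative assumptions, and the paper's algorithm additionally interleaves this projection with gradient tracking and neighbor averaging, which are not shown.

```python
def project_l1_ball(v: list[float], radius: float) -> list[float]:
    """Euclidean projection of v onto {x : ||x||_1 <= radius} (sort-based method)."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    if sum(abs(x) for x in v) <= radius:
        return list(v)  # already feasible: the projection is the point itself
    # Sort magnitudes in decreasing order and locate the soft-threshold level theta.
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumsum += u_j
        t = (cumsum - radius) / j
        if u_j - t > 0:
            theta = t  # keep the value for the largest j with u_j > (cumsum_j - radius) / j
    # Soft-threshold every coordinate by theta, preserving its sign.
    return [max(abs(x) - theta, 0.0) * (1 if x >= 0 else -1) for x in v]


print(project_l1_ball([3.0, 1.0], 2.0))   # [2.0, 0.0]
print(project_l1_ball([1.0, -1.0], 1.0))  # [0.5, -0.5]
```

The projection is a soft-thresholding with a data-dependent level, which is why iterates of projected gradient methods for this problem tend to be sparse.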

arXiv:2111.06530 [pdf, other]

Distributed Sparse Regression via Penalization

Authors: Yao Ji, Gesualdo Scutari, Ying Sun, Harsha Honnappa

Abstract: We study sparse linear regression over a network of agents, modeled as an undirected graph (with no centralized node). The estimation problem is formulated as the minimization of the sum of the local LASSO loss functions plus a quadratic penalty of the consensus constraint -- the latter being instrumental to obtain distributed solution methods. While penalty-based consensus methods have been extensively studied in the optimization literature, their statistical and computational guarantees in the high-dimensional setting remain unclear. This work provides an answer to this open problem. Our contribution is two-fold. First, we establish statistical consistency of the estimator: under a suitable choice of the penalty parameter, the optimal solution of the penalized problem achieves the near-optimal minimax rate $\mathcal{O}(s \log d/N)$ in $\ell_2$-loss, where $s$ is the sparsity value, $d$ is the ambient dimension, and $N$ is the total sample size in the network -- this matches centralized sample rates. Second, we show that the proximal-gradient algorithm applied to the penalized problem, which naturally leads to distributed implementations, converges linearly up to a tolerance of the order of the centralized statistical error -- the rate scales as $\mathcal{O}(d)$, revealing an unavoidable speed-accuracy dilemma. Numerical results demonstrate the tightness of the derived sample rate and convergence rate scalings.

Submitted 21 June, 2023; v1 submitted 11 November, 2021; originally announced November 2021.

Comments: 63 pages, journal publication
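
The penalized formulation above leads to a simple distributed proximal-gradient iteration: each agent takes a gradient step on its local least-squares loss plus the gradient of the consensus penalty (which involves only its neighbors' iterates), then soft-thresholds. The sketch below assumes the specific scaling $\sum_i \big[\tfrac{1}{2N}\|y_i - X_i x_i\|^2 + λ\|x_i\|_1\big] + \tfrac{1}{4α}\sum_{i,j} W_{ij}\|x_i - x_j\|^2$ with a symmetric, doubly stochastic weight matrix `W`; this scaling, the toy data, and all parameter values are illustrative assumptions rather than the paper's exact setup.

```python
import random


def soft_threshold(z: float, tau: float) -> float:
    """Proximal operator of tau * |.| (scalar soft-thresholding)."""
    if z > tau:
        return z - tau
    if z < -tau:
        return z + tau
    return 0.0


def penalized_objective(xs, data, W, N, lam, alpha):
    """Local LASSO losses plus the quadratic consensus penalty (assumed scaling)."""
    total = 0.0
    for x, (X, y) in zip(xs, data):
        resid = [yk - sum(r * v for r, v in zip(row, x)) for row, yk in zip(X, y)]
        total += sum(e * e for e in resid) / (2 * N) + lam * sum(abs(v) for v in x)
    m = len(xs)
    for i in range(m):
        for j in range(m):
            gap = sum((u - v) ** 2 for u, v in zip(xs[i], xs[j]))
            total += W[i][j] * gap / (4 * alpha)
    return total


def prox_grad_step(xs, data, W, N, lam, alpha, gamma):
    """One proximal-gradient step; agent i uses only its data and its neighbors' x_j."""
    m, d = len(xs), len(xs[0])
    new_xs = []
    for i, (x, (X, y)) in enumerate(zip(xs, data)):
        resid = [sum(r * v for r, v in zip(row, x)) - yk for row, yk in zip(X, y)]
        grad_ls = [sum(row[k] * e for row, e in zip(X, resid)) / N for k in range(d)]
        # Neighbor average: W[i][j] == 0 for non-neighbors, so no global data is used.
        mix = [sum(W[i][j] * xs[j][k] for j in range(m)) for k in range(d)]
        # Gradient of the penalty w.r.t. x_i is (x_i - sum_j W_ij x_j) / alpha.
        grad = [g + (xk - mk) / alpha for g, xk, mk in zip(grad_ls, x, mix)]
        new_xs.append([soft_threshold(xk - gamma * gk, gamma * lam) for xk, gk in zip(x, grad)])
    return new_xs


# Hypothetical toy data: 3 agents, 4 samples each, 3 features.
rng = random.Random(0)
m, n_i, d = 3, 4, 3
N = m * n_i
x_true = [1.0, 0.0, -2.0]
data = []
for _ in range(m):
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_i)]
    y = [sum(r * v for r, v in zip(row, x_true)) + 0.01 * rng.gauss(0.0, 1.0) for row in X]
    data.append((X, y))
W = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]  # symmetric, doubly stochastic
lam, alpha = 0.01, 0.1
# Step 1/L with L >= max_i ||X_i||_F^2 / N + 2 / alpha (bound on the smooth part's Lipschitz constant).
L = max(sum(v * v for row in X for v in row) for X, _ in data) / N + 2.0 / alpha
gamma = 1.0 / L
xs = [[0.0] * d for _ in range(m)]
history = [penalized_objective(xs, data, W, N, lam, alpha)]
for _ in range(200):
    xs = prox_grad_step(xs, data, W, N, lam, alpha, gamma)
    history.append(penalized_objective(xs, data, W, N, lam, alpha))
print(history[0], history[-1])
```

With the step size $γ = 1/L$, where $L$ upper-bounds the Lipschitz constant of the smooth part, proximal gradient is a descent method, so `history` should be non-increasing. This is a convenient sanity check on the sketch, not a statement about the paper's statistical rates.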

arXiv:2110.12347 [pdf, other]

Acceleration in Distributed Optimization under Similarity

Authors: Ye Tian, Gesualdo Scutari, Tianyu Cao, Alexander Gasnikov

Abstract: We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes. The loss functions of the agents are assumed to be \textit{similar}, due to statistical data similarity or otherwise. In order to reduce the number of communications needed to reach a given solution accuracy, we propose a {\it preconditioned, accelerated} distributed method. An $\varepsilon$-solution is achieved in $\tilde{\mathcal{O}}\big(\sqrt{\frac{β/μ}{1-ρ}}\log(1/\varepsilon)\big)$ communication steps, where $β/μ$ is the relative condition number between the global and local loss functions, and $ρ$ characterizes the connectivity of the network. This rate matches (up to poly-log factors) the lower communication complexity bounds of distributed gossip algorithms applied to the class of problems of interest. Numerical results show significant communication savings with respect to existing accelerated distributed schemes, especially when solving ill-conditioned problems.

Submitted 9 April, 2022; v1 submitted 24 October, 2021; originally announced October 2021.

arXiv:2110.10406 [pdf, other]

Decentralized Asynchronous Non-convex Stochastic Optimization on Directed Graphs

Authors: Vyacheslav Kungurtsev, Mahdi Morafah, Tara Javidi, Gesualdo Scutari

Abstract: Distributed optimization is an increasingly important subject area with the rise of multi-agent control and optimization. We consider a decentralized stochastic optimization problem where the agents on a graph aim to asynchronously optimize a collective (additive) objective function consisting of the agents' individual (possibly non-convex) local objective functions. Each agent only has access to a noisy estimate of the gradient of its own function (one component of the sum of objective functions). We propose an asynchronous distributed algorithm for such a class of problems. The algorithm combines stochastic gradients with tracking in an asynchronous push-sum framework and obtains the standard sublinear convergence rate for general non-convex functions, matching the rate of centralized stochastic gradient descent (SGD). Our experiments on a non-convex image classification task using a convolutional neural network validate the convergence of our proposed algorithm across different numbers of nodes and graph connectivity percentages.

Submitted 20 October, 2021; originally announced October 2021.
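
The push-sum framework mentioned above is what lets agents average over a directed graph without knowing their in-degrees. The sketch below shows only synchronous, noise-free push-sum averaging as an illustrative building block: the asynchrony, stochastic gradients, and tracking of the paper's algorithm are not modeled, and the function name and example graph are hypothetical.

```python
def push_sum_average(values, out_neighbors, iters=200):
    """Synchronous push-sum: every node estimates the average of `values` over a digraph.

    out_neighbors[j] lists the nodes that j sends to (a self-loop is added).
    Each node splits its (s, w) mass equally among its out-neighbors, so the
    implied mixing matrix is column-stochastic and the total mass is conserved.
    """
    n = len(values)
    s = [float(v) for v in values]
    w = [1.0] * n
    for _ in range(iters):
        new_s = [0.0] * n
        new_w = [0.0] * n
        for j in range(n):
            targets = set(out_neighbors[j]) | {j}
            share = 1.0 / len(targets)
            for i in targets:
                new_s[i] += share * s[j]
                new_w[i] += share * w[j]
        s, w = new_s, new_w
    # The ratio s_i / w_i corrects for the non-uniform mass each node receives.
    return [si / wi for si, wi in zip(s, w)]


# Directed ring 0 -> 1 -> 2 -> 3 -> 0 plus a chord 0 -> 2 (strongly connected).
graph = {0: [1, 2], 1: [2], 2: [3], 3: [0]}
estimates = push_sum_average([4.0, 0.0, 8.0, 2.0], graph)
print(estimates)  # each entry should be close to the true average 3.5
```

Neither `s` nor `w` alone converges to the average on a digraph; it is their ratio that does, because both are mixed by the same column-stochastic weights.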

arXiv:2107.11304 [pdf, other]

Finite-Bit Quantization For Distributed Algorithms With Linear Convergence

Authors: Nicolò Michelusi, Gesualdo Scutari, Chang-Shen Lee

Abstract: This paper studies distributed algorithms for (strongly convex) composite optimization problems over mesh networks, subject to quantized communications. Instead of focusing on a specific algorithmic design, a black-box model is proposed, casting linearly convergent distributed algorithms in the form of fixed-point iterates. The algorithmic model is equipped with a novel random or deterministic Biased Compression (BC) rule on the quantizer design, and a new Adaptive encoding Nonuniform Quantizer (ANQ) coupled with a communication-efficient encoding scheme, which implements the BC-rule using a finite number of bits (below machine precision). This fills a gap in most state-of-the-art quantization schemes, such as those based on the popular compression rule, which rely on communicating some scalar signals with negligible quantization error (in practice, quantized at machine precision). A unified communication complexity analysis is developed for the black-box model, determining the average number of bits required to reach a solution of the optimization problem within a target accuracy. It is shown that the proposed BC-rule preserves linear convergence of the unquantized algorithms, and a trade-off between convergence rate and communication cost under ANQ-based quantization is characterized. Numerical results validate our theoretical findings and show that distributed algorithms equipped with the proposed ANQ have a more favorable communication cost than algorithms using state-of-the-art quantization rules.

Submitted 17 May, 2022; v1 submitted 23 July, 2021; originally announced July 2021.

Comments: To appear in the IEEE Transactions on Information Theory

arXiv:2107.10706 [pdf, other]

Distributed Saddle-Point Problems Under Similarity

Authors: Aleksandr Beznosikov, Gesualdo Scutari, Alexander Rogozin, Alexander Gasnikov

Abstract: We study solution methods for (strongly-)convex-(strongly-)concave Saddle-Point Problems (SPPs) over networks of two types: master/workers (thus centralized) architectures and meshed (thus decentralized) networks. The local functions at each node are assumed to be similar, due to statistical data similarity or otherwise. We establish lower complexity bounds for a fairly general class of algorithms solving the SPP. We show that a given suboptimality $ε>0$ is achieved over master/workers networks in $Ω\big(Δ\cdot δ/μ\cdot \log (1/\varepsilon)\big)$ rounds of communications, where $δ>0$ measures the degree of similarity of the local functions, $μ$ is their strong convexity constant, and $Δ$ is the diameter of the network. The lower communication complexity bound over meshed networks reads $Ω\big(1/{\sqrtρ} \cdot δ/μ\cdot\log (1/\varepsilon)\big)$, where $ρ$ is the (normalized) eigengap of the gossip matrix used for the communication between neighbouring nodes. We then propose algorithms matching the lower bounds over both types of networks (up to log-factors). We assess the effectiveness of the proposed algorithms on a robust logistic regression problem.

Submitted 22 August, 2022; v1 submitted 22 July, 2021; originally announced July 2021.

Comments: Appears in: Advances in Neural Information Processing Systems 34 (NeurIPS 2021). Minor modifications with respect to the NeurIPS version. 35 pages, 3 algorithms, 4 figures, 1 table

Journal ref: https://proceedings.neurips.cc/paper/2021/hash/44e65d3e9bc2f88b2b3d566de51a5381-Abstract.html

arXiv:2103.14392 [pdf, ps, other]

An Accelerated Second-Order Method for Distributed Stochastic Optimization

Authors: Artem Agafonov, Pavel Dvurechensky, Gesualdo Scutari, Alexander Gasnikov, Dmitry Kamzolov, Aleksandr Lukashevich, Amir Daneshmand

Abstract: We consider distributed stochastic optimization problems that are solved with a master/workers computation architecture. Statistical arguments allow one to exploit statistical similarity and approximate this problem by a finite-sum problem, for which we propose an inexact accelerated cubic-regularized Newton's method that achieves the lower communication complexity bound for this setting and improves upon the existing upper bound. We further exploit this algorithm to obtain convergence rate bounds for the original stochastic optimization problem and compare our bounds with the existing bounds in several regimes when the goal is to minimize the number of communication rounds and increase the parallelization by increasing the number of workers.

Submitted 26 March, 2021; originally announced March 2021.

arXiv:2102.06780 [pdf, ps, other]

Newton Method over Networks is Fast up to the Statistical Precision

Authors: Amir Daneshmand, Gesualdo Scutari, Pavel Dvurechensky, Alexander Gasnikov

Abstract: We propose a distributed cubic regularization of the Newton method for solving (constrained) empirical risk minimization problems over a network of agents, modeled as an undirected graph. The algorithm employs an inexact, preconditioned Newton step at each agent's side: the gradient of the centralized loss is iteratively estimated via a gradient-tracking consensus mechanism and the Hessian is subsampled over the local data sets. No Hessian matrices are thus exchanged over the network. We derive global complexity bounds for convex and strongly convex losses. Our analysis reveals an interesting interplay between sample and iteration/communication complexity: statistically accurate solutions are achievable in roughly the same number of iterations of the centralized cubic Newton method, with a communication cost per iteration of the order of $\widetilde{\mathcal{O}}\big(1/\sqrt{1-ρ}\big)$, where $ρ$ characterizes the connectivity of the network. This demonstrates a significant communication saving with respect to that of existing, statistically oblivious, distributed Newton-based methods over networks.

Submitted 16 June, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

Comments: In proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021
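
The core computation of a cubic-regularized Newton method is the subproblem $\min_h\, g^\top h + \tfrac{1}{2} h^\top H h + \tfrac{M}{6}\|h\|^3$. The sketch below solves it for a single agent with a positive semidefinite $H$ by bisection on $r = \|h\|$; it is an illustrative centralized building block, not the paper's implementation, and the names `cubic_newton_step`, `solve_linear`, and `M_reg` are hypothetical. In the distributed scheme described above, $g$ would come from gradient tracking and $H$ from the local subsampled data.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def cubic_newton_step(g, H, M_reg, iters=100):
    """Minimize g.h + 0.5 h.H.h + (M_reg / 6) ||h||^3 for a PSD matrix H.

    The minimizer solves (H + (M_reg * r / 2) I) h = -g with r = ||h||; since
    ||h(r)|| decreases in r when H is PSD, bisection on r finds ||h(r)|| = r.
    """
    n = len(g)
    g_norm = math.sqrt(sum(v * v for v in g))
    if g_norm == 0.0:
        return [0.0] * n

    def h_of(r):
        shifted = [[H[i][j] + (M_reg * r / 2 if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
        return solve_linear(shifted, [-v for v in g])

    # ||h(r)|| <= 2 ||g|| / (M_reg r), so r_hi = sqrt(2 ||g|| / M_reg) gives ||h(r_hi)|| <= r_hi.
    lo, hi = 0.0, math.sqrt(2.0 * g_norm / M_reg)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if math.sqrt(sum(v * v for v in h_of(mid))) > mid:
            lo = mid
        else:
            hi = mid
    return h_of(hi)


g = [1.0, -2.0]
H = [[2.0, 0.5], [0.5, 1.0]]
h = cubic_newton_step(g, H, M_reg=1.0)
print(h)
```

A quick correctness check is the first-order condition of the cubic model, $g + Hh + \tfrac{M}{2}\|h\|\,h = 0$, which the returned step satisfies up to bisection tolerance; the cubic term also keeps the step well defined when $H$ is singular.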

arXiv:2010.09057 [pdf, other]

Asynchronous Optimization over Graphs: Linear Convergence under Error Bound Conditions

Authors: Loris Cannelli, Francisco Facchinei, Gesualdo Scutari, Vyacheslav Kungurtsev

Abstract: We consider convex and nonconvex constrained optimization with a partially separable objective function: agents minimize the sum of local objective functions, each of which is known only by the associated agent and depends on the variables of that agent and those of a few others. This partitioned setting arises in several applications of practical interest. We propose what is, to the best of our knowledge, the first distributed, asynchronous algorithm with rate guarantees for this class of problems. When the objective function is nonconvex, the algorithm provably converges to a stationary solution at a sublinear rate, whereas a linear rate is achieved when the objective satisfies the renowned Luo-Tseng error bound condition (which is less stringent than strong convexity). Numerical results on matrix completion and LASSO problems show the effectiveness of our method.

Submitted 18 October, 2020; originally announced October 2020.

Comments: To appear in IEEE Trans. on Automatic Control

arXiv:2007.16024 [pdf, ps, other]

Diminishing Stepsize Methods for Nonconvex Composite Problems via Ghost Penalties: from the General to the Convex Regular Constrained Case

Authors: Francisco Facchinei, Vyacheslav Kungurtsev, Lorenzo Lampariello, Gesualdo Scutari

Abstract: In this paper we first extend the diminishing stepsize method for nonconvex constrained problems presented in [4] to deal with equality constraints and a nonsmooth objective function of composite type. We then consider the particular case in which the constraints are convex and satisfy a standard constraint qualification and show that in this setting the algorithm can be considerably simplified, reducing the computational burden of each iteration.

Submitted 30 July, 2020; originally announced July 2020.

Comments: arXiv admin note: text overlap with arXiv:1709.03384

arXiv:2002.11534 [pdf, ps, other]

doi 10.1109/TSP.2021.3086579

Distributed Algorithms for Composite Optimization: Unified Framework and Convergence Analysis

Authors: **ming Xu, Ye Tian, Ying Sun, Gesualdo Scutari

Abstract: We study distributed composite optimization over networks: agents minimize a sum of smooth (strongly) convex functions, the agents' sum-utility, plus a nonsmooth (extended-valued) convex one. We propose a general unified algorithmic framework for such a class of problems and provide a unified convergence analysis leveraging the theory of operator splitting. Distinguishing features of our scheme are: (i) When the agents' functions are strongly convex, the algorithm converges at a linear rate, whose dependence on the agents' functions and network topology is decoupled, matching the typical rates of centralized optimization; the rate expression improves on existing results; (ii) When the objective function is convex (but not strongly convex), a similar separation as in (i) is established for the coefficient of the proved sublinear rate; (iii) The algorithm can adjust the ratio between the number of communications and computations to achieve a rate (in terms of computations) independent of the network connectivity; and (iv) A by-product of our analysis is a tuning recommendation for several existing (non-accelerated) distributed algorithms yielding the fastest provable (worst-case) convergence rate. This is the first time that a general distributed algorithmic framework applicable to composite optimization enjoys all such properties.

Submitted 12 March, 2020; v1 submitted 25 February, 2020; originally announced February 2020.

Comments: arXiv admin note: text overlap with arXiv:1910.09817

arXiv:1910.10666 [pdf, ps, other]

Accelerated Primal-Dual Algorithms for Distributed Smooth Convex Optimization over Networks

Authors: **ming Xu, Ye Tian, Ying Sun, Gesualdo Scutari

Abstract: This paper proposes a novel family of primal-dual-based distributed algorithms for smooth, convex, multi-agent optimization over networks that use only gradient information and gossip communications. The algorithms can also employ acceleration on the computation and communications. We provide a unified analysis of their convergence rate, measured in terms of the Bregman distance associated with the saddle-point reformulation of the distributed optimization problem. When acceleration is employed, the rate is shown to be optimal, in the sense that it matches (under the proposed metric) existing complexity lower bounds of distributed algorithms applicable to such a class of problems and using only gradient information and gossip communications. Preliminary numerical results on distributed least-squares regression problems show that the proposed algorithm compares favorably with existing distributed schemes.

Submitted 2 March, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

Comments: final version for AISTATS 2020

arXiv:1910.09817 [pdf, ps, other]

A Unified Contraction Analysis of a Class of Distributed Algorithms for Composite Optimization

Authors: **ming Xu, Ying Sun, Ye Tian, Gesualdo Scutari

Abstract: We study distributed composite optimization over networks: agents minimize the sum of a smooth (strongly) convex function, the agents' sum-utility, plus a non-smooth (extended-valued) convex one. We propose a general algorithmic framework for such a class of problems and provide a unified convergence analysis leveraging the theory of operator splitting. Our results unify several approaches proposed in the distributed optimization literature for special instances of our formulation. Distinguishing features of our scheme are: (i) when the agents' functions are strongly convex, the algorithm converges at a linear rate, whose dependencies on the agents' functions and the network topology are decoupled, matching the typical rates of centralized optimization; (ii) the step-size does not depend on the network parameters but only on the optimization ones; and (iii) the algorithm can adjust the ratio between the number of communications and computations to achieve the same rate as the centralized proximal gradient scheme (in terms of computations). This is the first time that a distributed algorithm applicable to composite optimization enjoys such properties.

Submitted 22 October, 2019; originally announced October 2019.

Comments: To appear in the Proc. of the 2019 IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP 19)

arXiv:1909.10144 [pdf, ps, other]

Asynchronous Decentralized Successive Convex Approximation

Authors: Ye Tian, Ying Sun, Gesualdo Scutari

Abstract: We study decentralized asynchronous multiagent optimization over networks, modeled as static (possibly directed) graphs. The optimization problem consists of minimizing a (possibly nonconvex) smooth function--the sum of the agents' local costs--plus a convex (possibly nonsmooth) regularizer, subject to convex constraints. Agents can perform their local computations as well as communicate with their immediate neighbors at any time, without any form of coordination or centralized scheduling; furthermore, when solving their local subproblems, they can use outdated information from their neighbors. We propose the first distributed asynchronous algorithm, termed ASY-DSCA, that converges at an R-linear rate to the optimal solution of convex problems whose objective function satisfies a general error bound condition; this condition is weaker than the more frequently used strong convexity, and it is satisfied by several empirical risk functions that are not strongly convex; examples include LASSO and logistic regression problems. When the objective function is nonconvex, ASY-DSCA converges to a stationary solution of the problem at a sublinear rate.

Submitted 30 January, 2020; v1 submitted 22 September, 2019; originally announced September 2019.

arXiv:1905.02637 [pdf, ps, other]

Distributed Optimization Based on Gradient-tracking Revisited: Enhancing Convergence Rate via Surrogation

Authors: Ying Sun, Amir Daneshmand, Gesualdo Scutari

Abstract: We study distributed multiagent optimization over (directed, time-varying) graphs. We consider the minimization of $F+G$ subject to convex constraints, where $F$ is the smooth strongly convex sum of the agents' losses and $G$ is a nonsmooth convex function. We build on the SONATA algorithm: the algorithm employs surrogate objective functions in the agents' subproblems (thus going beyond linearization, such as proximal-gradient) coupled with a perturbed (push-sum) consensus mechanism that aims to track locally the gradient of $F$. SONATA achieves precision $ε>0$ on the objective value in $\mathcal{O}(κ_g \log(1/ε))$ gradient computations at each node and $\tilde{\mathcal{O}}\big(κ_g (1-ρ)^{-1/2} \log(1/ε)\big)$ communication steps, where $κ_g$ is the condition number of $F$ and $ρ$ characterizes the connectivity of the network. This is the first linear rate result for distributed composite optimization; it also improves on existing (non-accelerated) schemes just minimizing $F$, whose rate depends on much larger quantities than $κ_g$ (e.g., the worst-case condition number among the agents). When considering in particular empirical risk minimization problems with statistically similar data across the agents, SONATA employing high-order surrogates achieves precision $ε>0$ in $\mathcal{O}\big((β/μ) \log(1/ε)\big)$ iterations and $\tilde{\mathcal{O}}\big((β/μ) (1-ρ)^{-1/2} \log(1/ε)\big)$ communication steps, where $β$ measures the degree of similarity of the agents' losses and $μ$ is the strong convexity constant of $F$. Therefore, when $β/μ< κ_g$, the use of high-order surrogates yields provably faster rates than those achievable with first-order models; this is without exchanging any Hessian matrix over the network.

Submitted 11 October, 2020; v1 submitted 7 May, 2019; originally announced May 2019.

Comments: This revised version contains explicit expressions of the convergence rates. Furthermore, new rates are provided for the case in which the agents' data are statistically similar

arXiv:1901.00611 [pdf, other]

Finite rate distributed weight-balancing and average consensus over digraphs

Authors: Chang-Shen Lee, Nicolò Michelusi, Gesualdo Scutari

Abstract: This paper proposes the first distributed algorithm that solves the weight-balancing problem using only finite rate and simplex communications among nodes, compliant with the directed nature of the graph edges. It is proved that the algorithm converges to a weight-balanced solution at sublinear rate. The analysis builds upon a new metric inspired by positional system representations, which characterizes the dynamics of information exchange over the network, and on a novel step-size rule. Building on this result, a novel distributed algorithm is proposed that solves the average consensus problem over digraphs, using, at each timeslot, finite rate simplex communications between adjacent nodes -- some bits for the weight-balancing problem and others for the average consensus. Convergence of the proposed quantized consensus algorithm to the average of the nodes' unquantized initial values is established, both almost surely and in the moment generating function of the error; and a sublinear convergence rate is proved for sufficiently large step-sizes. Numerical results validate our theoretical findings.

Submitted 29 February, 2020; v1 submitted 3 January, 2019; originally announced January 2019.

Comments: A preliminary version arXiv:1809.06440 of this paper appeared at IEEE CDC 2018

arXiv:1809.08694 [pdf, ps, other]

Second-order Guarantees of Distributed Gradient Algorithms

Authors: Amir Daneshmand, Gesualdo Scutari, Vyacheslav Kungurtsev

Abstract: We consider distributed smooth nonconvex unconstrained optimization over networks, modeled as a connected graph. We examine the behavior of distributed gradient-based algorithms near strict saddle points. Specifically, we establish that (i) the renowned Distributed Gradient Descent (DGD) algorithm likely converges to a neighborhood of a Second-order Stationary (SoS) solution; and (ii) the more recent class of distributed algorithms based on gradient tracking, implementable also over digraphs, likely converges to exact SoS solutions, thus avoiding (strict) saddle points. Furthermore, new convergence rate results to first-order critical points are established for the latter class of algorithms.

Submitted 25 May, 2020; v1 submitted 23 September, 2018; originally announced September 2018.

Comments: Final version, to appear in SIAM J. on Optimization
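
The abstract above contrasts plain DGD, which reaches only a neighborhood of a solution, with gradient-tracking methods. As a minimal sketch of the standard DGD update $x_i \leftarrow \sum_j w_{ij} x_j - \alpha \nabla f_i(x_i)$ (not the paper's saddle-point analysis), the snippet below assumes hypothetical scalar quadratic costs $f_i(x) = \tfrac12 (x - b_i)^2$, a 3-node path graph with Metropolis weights, and a constant step size; all of these choices are illustrative.

```python
def dgd(W, b, alpha, iters):
    """Distributed Gradient Descent on hypothetical costs f_i(x) = 0.5 * (x - b_i)**2."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        # Mix neighbors' iterates with the doubly stochastic matrix W ...
        mixed = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        # ... then take a local gradient step (grad f_i(x_i) = x_i - b_i).
        x = [mixed[i] - alpha * (x[i] - b[i]) for i in range(n)]
    return x


# Metropolis weights for the path graph 0 - 1 - 2 (symmetric, doubly stochastic).
W = [[2 / 3, 1 / 3, 0.0],
     [1 / 3, 1 / 3, 1 / 3],
     [0.0, 1 / 3, 2 / 3]]
b = [1.0, 2.0, 6.0]  # the minimizer of sum_i f_i is mean(b) = 3.0
x = dgd(W, b, alpha=0.05, iters=2000)
print(x)  # agents sit near 3.0 but do not agree exactly
```

With a constant step size the agents settle at distinct points around the minimizer 3.0 while their average converges to it, which is the neighborhood behavior the abstract refers to; shrinking alpha tightens the neighborhood.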

arXiv:1809.06440 [pdf, other]

Limited Rate Distributed Weight-Balancing and Average Consensus Over Digraphs

Authors: Chang-Shen Lee, Nicolò Michelusi, Gesualdo Scutari

Abstract: Distributed quantized weight-balancing and average consensus over fixed digraphs are considered. A digraph with non-negative weights associated with its edges is weight-balanced if, for each node, the sum of the weights of its out-going edges is equal to that of its incoming edges. This paper proposes and analyzes the first distributed algorithm that solves the weight-balancing problem using only finite rate and simplex communications among nodes (compliant with the directed nature of the graph edges). Asymptotic convergence of the scheme is proved and a convergence rate analysis is provided. Building on this result, a novel distributed algorithm is proposed that solves the average consensus problem over digraphs, using, at each iteration, finite rate simplex communications between adjacent nodes: some bits for the weight-balancing problem, others for the average consensus. Convergence of the proposed quantized consensus algorithm to the average of the agents' real (i.e., unquantized) initial values is proved, both almost surely and in $r$th mean for every positive integer $r$. Finally, numerical results validate our theoretical findings.

Submitted 17 September, 2018; originally announced September 2018.

Comments: Part of this work will be presented at the 57th IEEE Conference on Decision and Control
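
The weight-balance condition defined in this abstract (for every node, total out-going edge weight equals total incoming edge weight) is easy to check centrally. The sketch below performs only that check; it does not implement the paper's finite-rate distributed algorithm, and the edge-dictionary representation and the tolerance are assumptions.

```python
from collections import defaultdict


def is_weight_balanced(edges, tol=1e-9):
    """Check the weight-balance condition on a digraph.

    `edges` maps a directed edge (u, v) to a non-negative weight; this
    dict-based representation is an illustrative assumption.
    """
    out_w = defaultdict(float)
    in_w = defaultdict(float)
    for (u, v), w in edges.items():
        if w < 0:
            raise ValueError("edge weights must be non-negative")
        out_w[u] += w
        in_w[v] += w
    nodes = set(out_w) | set(in_w)
    # Balanced iff every node's out-going weight equals its incoming weight.
    return all(abs(out_w[n] - in_w[n]) <= tol for n in nodes)


# Directed 3-cycle with equal weights: balanced.
cycle = {(0, 1): 1.0, (1, 2): 1.0, (2, 0): 1.0}
# Adding the chord 0 -> 2 with unit weight unbalances nodes 0 and 2.
chord = {(0, 1): 1.0, (1, 2): 1.0, (2, 0): 1.0, (0, 2): 1.0}
# Doubling the weight of 2 -> 0 restores balance on the chorded graph.
fixed = {(0, 1): 1.0, (1, 2): 1.0, (2, 0): 2.0, (0, 2): 1.0}
print(is_weight_balanced(cycle), is_weight_balanced(chord), is_weight_balanced(fixed))
```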

arXiv:1809.01106 [pdf, other]

Distributed Nonconvex Constrained Optimization over Time-Varying Digraphs

Authors: Gesualdo Scutari, Ying Sun

Abstract: This paper considers nonconvex distributed constrained optimization over networks, modeled as directed (possibly time-varying) graphs. We introduce the first algorithmic framework for the minimization of the sum of a smooth nonconvex (nonseparable) function, the agents' sum-utility, plus a Difference-of-Convex (DC) function (with nonsmooth convex part). This general formulation arises in many applications, from statistical machine learning to engineering. The proposed distributed method combines successive convex approximation techniques with a judiciously designed perturbed push-sum consensus mechanism that aims to track locally the gradient of the (smooth part of the) sum-utility. A sublinear convergence rate is proved when a fixed step-size (possibly different among the agents) is employed, whereas asymptotic convergence to stationary solutions is proved using a diminishing step-size. Numerical results show that our algorithms compare favorably with current schemes on both convex and nonconvex problems.

Submitted 4 September, 2018; originally announced September 2018.

Comments: Submitted June 3, 2017, revised June 5, 2018. Part of this work has been presented at the 2016 Asilomar Conference on Signals, Systems and Computers and the 2017 IEEE ICASSP Conference
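
Push-sum (ratio) consensus is the basic mechanism behind the perturbed push-sum scheme mentioned above: each node splits both a value and a weight among itself and its out-neighbors (a column-stochastic step), and the ratio of the two converges to the network average even though links are directed. The minimal sketch below shows only plain push-sum on a hypothetical fixed 3-node digraph; the paper's perturbation for gradient tracking, time-varying graphs, and SCA step are not modeled.

```python
def push_sum(out_neighbors, values, iters):
    """Plain push-sum (ratio) consensus over a strongly connected digraph.

    Each node sends equal shares of its value and weight to itself and its
    out-neighbors, i.e., a column-stochastic mixing step.
    """
    n = len(values)
    x = [float(v) for v in values]  # numerators ("mass")
    y = [1.0] * n                   # denominators ("weights")
    for _ in range(iters):
        new_x = [0.0] * n
        new_y = [0.0] * n
        for j in range(n):
            recipients = [j] + out_neighbors[j]
            share = 1.0 / len(recipients)
            for i in recipients:
                new_x[i] += share * x[j]
                new_y[i] += share * y[j]
        x, y = new_x, new_y
    # Each ratio converges to the average of the initial values.
    return [x[i] / y[i] for i in range(n)]


# Hypothetical digraph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0 (strongly connected).
out_neighbors = {0: [1, 2], 1: [2], 2: [0]}
z = push_sum(out_neighbors, [3.0, 9.0, 0.0], iters=200)
print(z)
```

Relying only on column stochasticity is what makes this compatible with directed links: a node must know its own out-degree, but not who sends to it.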

arXiv:1808.07252 [pdf, other]

Distributed Big-Data Optimization via Block-wise Gradient Tracking

Authors: Ivano Notarnicola, Ying Sun, Gesualdo Scutari, Giuseppe Notarstefano

Abstract: We study distributed big-data nonconvex optimization in multi-agent networks. We consider the (constrained) minimization of the sum of a smooth (possibly) nonconvex function, i.e., the agents' sum-utility, plus a convex (possibly) nonsmooth regularizer. Our interest is in big-data problems in which there is a large number of variables to optimize. If treated by means of standard distributed optimization algorithms, these large-scale problems may be intractable due to the prohibitive local computation and communication burden at each node. We propose a novel distributed solution method where, at each iteration, agents update in an uncoordinated fashion only one block of the entire decision vector. To deal with the nonconvexity of the cost function, the novel scheme hinges on Successive Convex Approximation (SCA) techniques combined with a novel block-wise perturbed push-sum consensus protocol, which is instrumental to perform local block-averaging operations and tracking of gradient averages. Asymptotic convergence to stationary solutions of the nonconvex problem is established. Finally, numerical results show the effectiveness of the proposed algorithm and highlight how the block dimension impacts the communication overhead and practical convergence speed.

Submitted 31 August, 2018; v1 submitted 22 August, 2018; originally announced August 2018.

arXiv:1808.05933 [pdf, other]

Decentralized Dictionary Learning Over Time-Varying Digraphs

Authors: Amir Daneshmand, Ying Sun, Gesualdo Scutari, Francisco Facchinei, Brian M. Sadler

Abstract: This paper studies Dictionary Learning problems wherein the learning task is distributed over a multi-agent network, modeled as a time-varying directed graph. This formulation is relevant, for instance, in Big Data scenarios where massive amounts of data are collected/stored in different locations (e.g., sensors, clouds) and aggregating and/or processing all data in a fusion center might be inefficient or infeasible, due to resource limitations, communication overheads or privacy issues. We develop a unified decentralized algorithmic framework for this class of nonconvex problems, which is proved to converge to stationary solutions at a sublinear rate. The new method hinges on Successive Convex Approximation techniques, coupled with a decentralized tracking mechanism aiming at locally estimating the gradient of the smooth part of the sum-utility. To the best of our knowledge, this is the first provably convergent decentralized algorithm for Dictionary Learning and, more generally, bi-convex problems over (time-varying) (di)graphs.

Submitted 5 March, 2019; v1 submitted 17 August, 2018; originally announced August 2018.

arXiv:1805.10654 [pdf, other]

Distributed Big-Data Optimization via Block Communications

Authors: Ivano Notarnicola, Ying Sun, Gesualdo Scutari, Giuseppe Notarstefano

Abstract: We study distributed multi-agent large-scale optimization problems, wherein the cost function is composed of a smooth possibly nonconvex sum-utility plus a DC (Difference-of-Convex) regularizer. We consider the scenario where the dimension of the optimization variables is so large that optimizing and/or transmitting the entire set of variables could cause unaffordable computation and communication overhead. To address this issue, we propose the first distributed algorithm whereby agents optimize and communicate only a portion of their local variables. The scheme hinges on successive convex approximation (SCA) to handle the nonconvexity of the objective function, coupled with a novel block-signal tracking scheme, aiming at locally estimating the average of the agents' gradients. Asymptotic convergence to stationary solutions of the nonconvex problem is established. Numerical results on a sparse regression problem show the effectiveness of the proposed algorithm and the impact of the block size on its practical convergence speed and communication cost.

Submitted 27 May, 2018; originally announced May 2018.

arXiv:1805.06963 [pdf, other]

Parallel and Distributed Successive Convex Approximation Methods for Big-Data Optimization

Authors: Gesualdo Scutari, Ying Sun

Abstract: Recent years have witnessed a surge of interest in parallel and distributed optimization methods for large-scale systems. In particular, nonconvex large-scale optimization problems have found a wide range of applications in several engineering fields. The design and the analysis of such complex, large-scale systems pose several challenges and call for the development of new optimization models and algorithms. The major contribution of this paper is to put forth a general, unified, algorithmic framework, based on Successive Convex Approximation (SCA) techniques, for the parallel and distributed solution of a general class of nonconvex constrained (non-separable, networked) problems. The presented framework unifies and generalizes several existing SCA methods, making them appealing for a parallel/distributed implementation while offering a flexible selection of function approximants, step-size schedules, and control of the computation/communication efficiency. This paper is organized according to the lectures that one of the authors delivered at the CIME Summer School on Centralized and Distributed Multi-agent Optimization Models and Algorithms, held in Cetraro, Italy, June 23-27, 2014. These lectures are: I) Successive Convex Approximation Methods: Basics; II) Parallel Successive Convex Approximation Methods; and III) Distributed Successive Convex Approximation Methods.

Submitted 17 May, 2018; originally announced May 2018.

Journal ref: Lecture Notes in Mathematics, C.I.M.E, Springer Verlag series, 2018
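
A common way to instantiate SCA for $\min_x f(x) + \lambda\|x\|_1$ with smooth nonconvex $f$ is the proximal-linear surrogate: linearize $f$ at the current iterate, add a proximal term $\tfrac{\tau}{2}\|x - x^k\|^2$, minimize the resulting strongly convex problem in closed form, and then move a fraction $\gamma$ of the way toward that minimizer. The sketch below is one such instance under assumed choices (a double-well $f$, $\lambda = 0.1$, $\tau = 12$, $\gamma = 0.5$); it is not the framework of the lecture notes in its generality.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| for a scalar v."""
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)


def sca_prox_linear(grad_f, x0, lam, tau, gamma, iters):
    """SCA sketch for min_x f(x) + lam * ||x||_1 with a proximal-linear surrogate.

    At x^k the surrogate is f(x^k) + grad_f(x^k)^T (x - x^k)
    + (tau / 2) * ||x - x^k||^2 + lam * ||x||_1; its unique minimizer
    (the "best response") is a soft-thresholding step.
    """
    def best_response(x):
        g = grad_f(x)
        return [soft_threshold(xi - gi / tau, lam / tau) for xi, gi in zip(x, g)]

    x = list(x0)
    for _ in range(iters):
        x_hat = best_response(x)
        # Move a fraction gamma of the way toward the surrogate minimizer.
        x = [xi + gamma * (hi - xi) for xi, hi in zip(x, x_hat)]
    # x is stationary iff it is a fixed point of the best-response map.
    residual = max(abs(hi - xi) for xi, hi in zip(x, best_response(x)))
    return x, residual


def grad_double_well(x):
    # Hypothetical nonconvex f(x) = sum_i (x_i**2 - 1)**2 / 4, so grad_i = x_i**3 - x_i.
    return [xi ** 3 - xi for xi in x]


x, res = sca_prox_linear(grad_double_well, [2.0, -0.5], lam=0.1, tau=12.0, gamma=0.5, iters=3000)
print(x, res)
```

The returned residual measures how far $x$ is from being a fixed point of the best-response map, a standard stationarity check for schemes of this kind.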

arXiv:1805.00658 [pdf, other]

Distributed Big-Data Optimization via Block-Iterative Convexification and Averaging

Authors: Ivano Notarnicola, Ying Sun, Gesualdo Scutari, Giuseppe Notarstefano

Abstract: In this paper, we study distributed big-data nonconvex optimization in multi-agent networks. We consider the (constrained) minimization of the sum of a smooth (possibly) nonconvex function, i.e., the agents' sum-utility, plus a convex (possibly) nonsmooth regularizer. Our interest is in big-data problems wherein there is a large number of variables to optimize. If treated by means of standard distributed optimization algorithms, these large-scale problems may be intractable, due to the prohibitive local computation and communication burden at each node. We propose a novel distributed solution method whereby at each iteration agents optimize and then communicate (in an uncoordinated fashion) only a subset of their decision variables. To deal with the nonconvexity of the cost function, the novel scheme hinges on Successive Convex Approximation (SCA) techniques coupled with i) a tracking mechanism instrumental to locally estimate gradient averages; and ii) a novel block-wise consensus-based protocol to perform local block-averaging operations and gradient tracking. Asymptotic convergence to stationary solutions of the nonconvex problem is established. Finally, numerical results show the effectiveness of the proposed algorithm and highlight how the block dimension impacts the communication overhead and practical convergence speed.

Submitted 2 May, 2018; originally announced May 2018.

arXiv:1803.10359 [pdf, ps, other]

Achieving Linear Convergence in Distributed Asynchronous Multi-agent Optimization

Authors: Ye Tian, Ying Sun, Gesualdo Scutari

Abstract: This paper studies multi-agent (convex and nonconvex) optimization over static digraphs. We propose a general distributed asynchronous algorithmic framework whereby i) agents can update their local variables as well as communicate with their neighbors at any time, without any form of coordination; and ii) they can perform their local computations using (possibly) delayed, out-of-sync information from the other agents. Delays need not be known to the agents or obey any specific profile, and can also be time-varying (but bounded). The algorithm builds on a tracking mechanism that is robust against asynchrony (in the above sense), whose goal is to estimate locally the average of agents' gradients. When applied to strongly convex functions, we prove that it converges at an R-linear (geometric) rate as long as the step-size is sufficiently small. A sublinear convergence rate is proved when nonconvex problems and/or diminishing, uncoordinated step-sizes are considered. To the best of our knowledge, this is the first distributed algorithm with provable geometric convergence rate in such a general asynchronous setting. Preliminary numerical results demonstrate the efficacy of the proposed algorithm and validate our theoretical findings.

Submitted 11 September, 2019; v1 submitted 27 March, 2018; originally announced March 2018.

Comments: Part of this work was presented at Allerton 2018; first version posted on arXiv in March 2018; revised Nov. 2018. To appear in IEEE Trans. on Automatic Control

arXiv:1709.03384 [pdf, ps, other]

Ghost Penalties in Nonconvex Constrained Optimization: Diminishing Stepsizes and Iteration Complexity

Authors: Francisco Facchinei, Vyacheslav Kungurtsev, Lorenzo Lampariello, Gesualdo Scutari

Abstract: We consider nonconvex constrained optimization problems and propose a new approach to the convergence analysis based on penalty functions. We make use of classical penalty functions in an unconventional way, in that penalty functions only enter in the theoretical analysis of convergence while the algorithm itself is penalty-free. Based on this idea, we are able to establish several new results, including the first general analysis for diminishing stepsize methods in nonconvex, constrained optimization, showing convergence to generalized stationary points, and a complexity study for SQP-type algorithms.

Submitted 30 May, 2020; v1 submitted 11 September, 2017; originally announced September 2017.

Comments: To appear in Mathematics of Operations Research

arXiv:1701.04900 [pdf, ps, other]

Asynchronous Parallel Algorithms for Nonconvex Big-Data Optimization. Part II: Complexity and Numerical Results

Authors: Loris Cannelli, Francisco Facchinei, Vyacheslav Kungurtsev, Gesualdo Scutari

Abstract: We present complexity and numerical results for a new asynchronous parallel algorithmic method for the minimization of the sum of a smooth nonconvex function and a convex nonsmooth regularizer, subject to both convex and nonconvex constraints. The proposed method hinges on successive convex approximation techniques and a novel probabilistic model that captures key elements of modern computational architectures and asynchronous implementations in a more faithful way than state-of-the-art models. In the companion paper we provided a detailed description of the probabilistic model and gave convergence results for a diminishing stepsize version of our method. Here, we provide theoretical complexity results for a fixed stepsize version of the method and report extensive numerical comparisons on both convex and nonconvex problems demonstrating the efficiency of our approach.

Submitted 19 January, 2017; v1 submitted 17 January, 2017; originally announced January 2017.

Comments: This is the second part of a two-paper work. The first part can be found at: arXiv:1607.04818

arXiv:1612.07335 [pdf, ps, other]

Distributed Dictionary Learning

Authors: Amir Daneshmand, Gesualdo Scutari, Francisco Facchinei

Abstract: The paper studies distributed Dictionary Learning (DL) problems where the learning task is distributed over a multi-agent network with time-varying (nonsymmetric) connectivity. This formulation is relevant, for instance, in big-data scenarios where massive amounts of data are collected/stored in different spatial locations and it is infeasible to aggregate and/or process all the data in a fusion center, due to resource limitations, communication overhead or privacy considerations. We develop a general distributed algorithmic framework for the (nonconvex) DL problem and establish its asymptotic convergence. The new method hinges on Successive Convex Approximation (SCA) techniques coupled with i) a gradient tracking mechanism instrumental to locally estimate the missing global information; and ii) a consensus step, as a mechanism to distribute the computations among the agents. To the best of our knowledge, this is the first distributed algorithm with provable convergence for the DL problem and, more generally, bi-convex optimization problems over (time-varying) directed graphs.

Submitted 21 December, 2016; originally announced December 2016.

arXiv:1611.06576 [pdf, ps, other]

Distributed Nonconvex Optimization for Sparse Representation

Authors: Ying Sun, Gesualdo Scutari

Abstract: We consider a nonconvex constrained Lagrangian formulation of a fundamental bi-criteria optimization problem for variable selection in statistical learning; the two criteria are a smooth (possibly) nonconvex loss function, measuring the fitness of the model to data, and a difference-of-convex (DC) regularization, employed to promote some extra structure on the solution, like sparsity. This general class of nonconvex problems arises in many big-data applications, from statistical machine learning to physical sciences and engineering. We develop the first unified distributed algorithmic framework for these problems and establish its asymptotic convergence to d-stationary solutions. Two key features of the method are: i) it can be implemented on arbitrary networks (digraphs) with (possibly) time-varying connectivity; and ii) it does not require the restrictive assumption that the (sub)gradient of the objective function is bounded, which significantly enlarges the class of statistical learning problems that can be solved with convergence guarantees.

Submitted 20 November, 2016; originally announced November 2016.

Comments: Submitted to ICASSP 2017
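
To illustrate how a DC regularizer can be handled by convexification, the sketch below uses the capped-$\ell_1$ penalty $\min(|x|,\theta) = |x| - \max(|x|-\theta, 0)$, an assumed example of a DC regularizer, with a least-squares loss on an identity design. Each step linearizes the subtracted convex term at the current iterate (the classic DCA step), leaving a convex problem that soft-thresholding solves exactly. It is a centralized toy, not the distributed framework described above.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| for a scalar v."""
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)


def capped_l1_dca(b, lam, theta, iters=50):
    """DCA for min_x 0.5 * ||x - b||^2 + lam * sum_i min(|x_i|, theta).

    DC split: min(|x|, theta) = |x| - max(|x| - theta, 0). Linearizing the
    subtracted convex term with subgradient s gives the convex subproblem
    min_x 0.5 * (x - b)^2 + lam * |x| - lam * s * x, solved per coordinate
    by soft_threshold(b + lam * s, lam).
    """
    x = [0.0] * len(b)
    for _ in range(iters):
        # Subgradient of max(|x_i| - theta, 0) at the current iterate.
        s = [(1.0 if xi > 0 else -1.0) if abs(xi) > theta else 0.0 for xi in x]
        x = [soft_threshold(bi + lam * si, lam) for bi, si in zip(b, s)]
    return x


b = [3.0, 0.5]
x_dc = capped_l1_dca(b, lam=1.0, theta=1.0)
x_lasso = [soft_threshold(bi, 1.0) for bi in b]  # plain l1 penalty for comparison
print(x_dc, x_lasso)
```

Unlike the $\ell_1$ penalty, the capped penalty stops shrinking a coefficient once it exceeds $\theta$, so the large coordinate is recovered without the lasso's shrinkage bias while the small one is still set to zero.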

arXiv:1607.04818 [pdf, other]

Asynchronous Parallel Algorithms for Nonconvex Optimization

Authors: Loris Cannelli, Francisco Facchinei, Vyacheslav Kungurtsev, Gesualdo Scutari

Abstract: We propose a new asynchronous parallel block-descent algorithmic framework for the minimization of the sum of a smooth nonconvex function and a nonsmooth convex one, subject to both convex and nonconvex constraints. The proposed framework hinges on successive convex approximation techniques and a novel probabilistic model that captures key elements of modern computational architectures and asynchronous implementations in a more faithful way than current state-of-the-art models. Other key features of the framework are: i) it covers in a unified way several specific solution methods; ii) it accommodates a variety of possible parallel computing architectures; and iii) it can deal with nonconvex constraints. Almost sure convergence to stationary solutions is proved, and theoretical complexity results are provided, showing nearly ideal linear speedup when the number of workers is not too large.

Submitted 29 March, 2018; v1 submitted 16 July, 2016; originally announced July 2016.

Comments: This is the first part of a two-paper work. The second part can be found at: arXiv:1701.04900

arXiv:1607.00249 [pdf, ps, other]

Distributed Nonconvex Multiagent Optimization Over Time-Varying Networks

Authors: Ying Sun, Gesualdo Scutari, Daniel Palomar

Abstract: We study nonconvex distributed optimization in multiagent networks where the communication between nodes is modeled as a time-varying sequence of arbitrary digraphs. We introduce a novel broadcast-based distributed algorithmic framework for the (constrained) minimization of the sum of a smooth (possibly nonconvex and nonseparable) function, i.e., the agents' sum-utility, plus a convex (possibly nonsmooth and nonseparable) regularizer. The latter is usually employed to enforce some structure in the solution, typically sparsity. The proposed method hinges on Successive Convex Approximation (SCA) techniques coupled with i) a tracking mechanism instrumental to locally estimate the gradients of agents' cost functions; and ii) a novel broadcast protocol to disseminate information and distribute the computation among the agents. Asymptotic convergence to stationary solutions is established. A key feature of the proposed algorithm is that it requires neither double stochasticity of the consensus matrices (only column stochasticity) nor knowledge of the graph sequence to be implemented. To the best of our knowledge, the proposed framework is the first broadcast-based distributed algorithm for convex and nonconvex constrained optimization over arbitrary, time-varying digraphs. Numerical results show that our algorithm outperforms current schemes on both convex and nonconvex problems.

Submitted 14 December, 2016; v1 submitted 1 July, 2016; originally announced July 2016.

arXiv:1602.00591 [pdf, ps, other]

NEXT: In-Network Nonconvex Optimization

Authors: Paolo Di Lorenzo, Gesualdo Scutari

Abstract: We study nonconvex distributed optimization in multi-agent networks with time-varying (nonsymmetric) connectivity. We introduce the first algorithmic framework for the distributed minimization of the sum of a smooth (possibly nonconvex and nonseparable) function - the agents' sum-utility - plus a convex (possibly nonsmooth and nonseparable) regularizer. The latter is usually employed to enforce some structure in the solution, typically sparsity. The proposed method hinges on successive convex approximation techniques while leveraging dynamic consensus as a mechanism to distribute the computation among the agents: each agent first solves (possibly inexactly) a local convex approximation of the nonconvex original problem, and then performs local averaging operations. Asymptotic convergence to (stationary) solutions of the nonconvex problem is established. Our algorithmic framework is then customized to a variety of convex and nonconvex problems in several fields, including signal processing, communications, networking, and machine learning. Numerical results show that the new method compares favorably to existing distributed algorithms on both convex and nonconvex problems.

Submitted 1 February, 2016; originally announced February 2016.

Comments: To appear in IEEE Transactions on Signal and Information Processing over Networks
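
The dynamic-consensus idea above lets each agent track the network-average gradient locally. The sketch below shows a plain gradient-tracking recursion in that spirit, $x^{k+1} = W x^k - \alpha y^k$ and $y^{k+1} = W y^k + \nabla f(x^{k+1}) - \nabla f(x^k)$ with $y^0 = \nabla f(x^0)$. It is a simplification (linear surrogate, no regularizer, fixed undirected graph), not the NEXT algorithm itself, and the costs, graph, and step size are hypothetical.

```python
def gradient_tracking(W, b, alpha, iters):
    """Gradient tracking on hypothetical costs f_i(x) = 0.5 * (x - b_i)**2."""
    n = len(b)

    def grad(i, xi):
        return xi - b[i]

    x = [0.0] * n
    # y_i starts at the local gradient and tracks the network-average gradient.
    y = [grad(i, x[i]) for i in range(n)]
    for _ in range(iters):
        x_new = [sum(W[i][j] * x[j] for j in range(n)) - alpha * y[i] for i in range(n)]
        # Consensus on y plus the change in the local gradient (dynamic consensus).
        y = [sum(W[i][j] * y[j] for j in range(n)) + grad(i, x_new[i]) - grad(i, x[i])
             for i in range(n)]
        x = x_new
    return x


# Metropolis weights for the path graph 0 - 1 - 2 (symmetric, doubly stochastic).
W = [[2 / 3, 1 / 3, 0.0],
     [1 / 3, 1 / 3, 1 / 3],
     [0.0, 1 / 3, 2 / 3]]
b = [1.0, 2.0, 6.0]  # minimizer of sum_i f_i is mean(b) = 3.0
x = gradient_tracking(W, b, alpha=0.1, iters=500)
print(x)
```

Unlike constant-step DGD, all agents here reach the exact minimizer mean(b) = 3.0, because each $y_i$ converges to the average gradient, which vanishes at the optimum.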

arXiv:1410.5076 [pdf, ps, other]

doi 10.1109/TSP.2016.2531627

A Parallel Stochastic Approximation Method for Nonconvex Multi-Agent Optimization Problems

Authors: Yang Yang, Gesualdo Scutari, Daniel P. Palomar, Marius Pesavento

Abstract: Consider the problem of minimizing the expected value of a (possibly nonconvex) cost function parameterized by a random (vector) variable, when the expectation cannot be computed accurately (e.g., because the statistics of the random variables are unknown and/or the computational complexity is prohibitive). Classical sample stochastic gradient methods for solving this problem may empirically suffer from slow convergence. In this paper, we propose for the first time a stochastic parallel Successive Convex Approximation-based (best-response) algorithmic framework for general nonconvex stochastic sum-utility optimization problems, which arise naturally in the design of multi-agent systems. The proposed novel decomposition enables all users to update their optimization variables in parallel by solving a sequence of strongly convex subproblems, one for each user. Almost sure convergence to stationary points is proved. We then customize our algorithmic framework to solve the stochastic sum rate maximization problem over Single-Input-Single-Output (SISO) frequency-selective interference channels, multiple-input-multiple-output (MIMO) interference channels, and MIMO multiple-access channels. Numerical results show that our algorithms are much faster than state-of-the-art stochastic gradient schemes while achieving the same (or better) sum-rates.

Submitted 21 October, 2014; v1 submitted 19 October, 2014; originally announced October 2014.

Comments: Part of this work has been presented at IEEE SPAWC 2013
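
The stochastic SCA idea above replaces the unknown expected gradient with a running average of sampled gradients and solves a strongly convex surrogate built from it. The sketch below applies that pattern to a hypothetical one-dimensional cost $\mathbb{E}[\tfrac12 (x - \xi)^2]$ with $\xi = 2 + U(-1,1)$; the step-size exponents and the proximal weight $\tau$ are assumptions chosen so that the averaging weight $\rho_t$ decays more slowly than the step $\gamma_t$, not values taken from the paper.

```python
import random


def stochastic_sca(sample, x0, tau=1.0, iters=5000, seed=0):
    """Stochastic SCA sketch for min_x E[0.5 * (x - xi)**2] (hypothetical cost).

    d averages sampled gradients with weight rho_t; the surrogate
    d * (x - x_t) + (tau / 2) * (x - x_t)**2 is minimized in closed form, and
    the iterate moves toward that minimizer with a smaller step gamma_t.
    """
    rng = random.Random(seed)
    x, d = x0, 0.0
    for t in range(iters):
        rho = (t + 1) ** -0.6    # gradient-averaging weight (assumed schedule)
        gamma = (t + 1) ** -0.8  # step size, decays faster than rho (assumed)
        xi = sample(rng)
        d = (1 - rho) * d + rho * (x - xi)  # sampled gradient of 0.5 * (x - xi)**2
        x_hat = x - d / tau                  # minimizer of the strongly convex surrogate
        x = x + gamma * (x_hat - x)
    return x


# The minimizer of the expected cost is E[xi] = 2.0.
x = stochastic_sca(lambda rng: 2.0 + rng.uniform(-1.0, 1.0), x0=10.0)
print(x)
```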

arXiv:1410.4754 [pdf, ps, other]

Parallel and Distributed Methods for Nonconvex Optimization-Part I: Theory

Authors: Gesualdo Scutari, Francisco Facchinei, Lorenzo Lampariello, Peiran Song

Abstract: In this two-part paper, we propose a general algorithmic framework for the minimization of a nonconvex smooth function subject to nonconvex smooth constraints. The algorithm solves a sequence of (separable) strongly convex problems and maintains feasibility at each iteration. Convergence to a stationary solution of the original nonconvex optimization is established. Our framework is very general and flexible; it unifies several existing Successive Convex Approximation (SCA)-based algorithms such as (proximal) gradient or Newton type methods, block coordinate (parallel) descent schemes, difference of convex functions methods, and improves on their convergence properties. More importantly, and differently from current SCA approaches, it naturally leads to distributed and parallelizable implementations for a large class of nonconvex problems. This Part I is devoted to the description of the framework in its generality. In Part II we customize our general methods to several multi-agent optimization problems, mainly in communications and networking; the result is a new class of (distributed) algorithms that compare favorably to existing ad-hoc (centralized) schemes (when they exist).

Submitted 14 January, 2016; v1 submitted 17 October, 2014; originally announced October 2014.

Comments: Part of this work has been presented at IEEE ICASSP 2014; Part II is available as a separate arXiv submission
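As a rough illustration of the SCA iteration (minimize a strongly convex surrogate, then move toward its minimizer with a step size), the Python sketch below treats an unconstrained, separable toy objective. The convex quadratic part is kept, the nonconvex cosine term is linearized, and a proximal term is added. The objective, parameter values and function name are assumptions for illustration; the paper's framework also handles nonconvex constraints, which this sketch does not.

```python
import math


def sca_minimize(a, beta=2.0, tau=1.0, gamma=0.5, iters=200):
    """SCA sketch for f(x) = sum_i 0.5*(x_i - a_i)**2 - beta*cos(x_i).

    The convex part is kept, the nonconvex part -beta*cos(x_i) is linearized
    at the current iterate, and a proximal term (tau/2)*(y - x_i)**2 makes
    each surrogate strongly convex. The surrogate is separable, so every
    coordinate is minimized independently and in closed form.
    """
    x = [0.0] * len(a)
    for _ in range(iters):
        # argmin_y 0.5*(y - a_i)**2 + beta*sin(x_i)*(y - x_i) + (tau/2)*(y - x_i)**2
        x_hat = [(ai + tau * xi - beta * math.sin(xi)) / (1.0 + tau)
                 for ai, xi in zip(a, x)]
        # Move a fraction gamma of the way toward the surrogate minimizer.
        x = [xi + gamma * (xh - xi) for xi, xh in zip(x, x_hat)]
    return x


a = [0.5, -1.0, 3.0]
x = sca_minimize(a)
# Stationarity residual f'(x_i) = x_i - a_i + beta*sin(x_i) should be close to 0.
residual = max(abs(xi - ai + 2.0 * math.sin(xi)) for xi, ai in zip(x, a))
print(x, residual)
```

Any fixed point of the update satisfies x_hat = x, which here is exactly the stationarity condition of f; this is the general mechanism by which SCA limit points are stationary.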

arXiv:1407.4504

doi 10.1109/TSP.2015.2436357

Hybrid Random/Deterministic Parallel Algorithms for Nonconvex Big Data Optimization

Authors: Amir Daneshmand, Francisco Facchinei, Vyacheslav Kungurtsev, Gesualdo Scutari

Abstract: We propose a decomposition framework for the parallel optimization of the sum of a differentiable (possibly nonconvex) function and a nonsmooth (possibly nonseparable), convex one. The latter term is usually employed to enforce structure in the solution, typically sparsity. The main contribution of this work is a novel parallel, hybrid random/deterministic decomposition scheme wherein, at each iteration, a subset of (block) variables is updated at the same time by minimizing local convex approximations of the original nonconvex function. To tackle huge-scale problems, the (block) variables to be updated are chosen according to a mixed random and deterministic procedure, which captures the advantages of both pure deterministic and random update-based schemes. Almost sure convergence of the proposed scheme is established. Numerical results show that on huge-scale problems the proposed hybrid random/deterministic algorithm outperforms both random and deterministic schemes.

Submitted 2 September, 2014; v1 submitted 16 July, 2014; originally announced July 2014.

Comments: The order of the authors is alphabetical
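One way to picture a hybrid random/deterministic block choice is sketched below in Python: a random subset of blocks is drawn first, and then only the sampled blocks whose error bound is within a factor rho of the largest sampled one are updated. The specific rule, the error values and the name hybrid_select are hypothetical; the paper's exact selection procedure and error-bound functions may differ.

```python
import random


def hybrid_select(errors, sample_size, rho, rng):
    """Hybrid random/deterministic block selection (hypothetical sketch).

    Random step: draw sample_size blocks uniformly without replacement.
    Deterministic step: among the sampled blocks, keep those whose error
    bound is at least rho times the largest sampled error (0 < rho <= 1).
    Returns (chosen blocks, sampled blocks), both sorted.
    """
    sampled = rng.sample(range(len(errors)), sample_size)
    top = max(errors[i] for i in sampled)
    chosen = sorted(i for i in sampled if errors[i] >= rho * top)
    return chosen, sorted(sampled)


# Illustrative per-block error bounds, e.g. |x_hat_i - x_i| at the current iterate.
errors = [0.02, 0.9, 0.15, 0.6, 0.01, 0.45, 0.3, 0.05]
rng = random.Random(1)
chosen, sampled = hybrid_select(errors, sample_size=4, rho=0.5, rng=rng)
print("sampled:", sampled, "updated:", chosen)
```

Setting sample_size to the number of blocks recovers a purely deterministic greedy rule, while rho close to 0 recovers purely random sampling; the hybrid sits between the two.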

arXiv:1402.5521

doi 10.1109/TSP.2015.2399858

Parallel Selective Algorithms for Big Data Optimization

Authors: Francisco Facchinei, Gesualdo Scutari, Simone Sagratella

Abstract: We propose a decomposition framework for the parallel optimization of the sum of a differentiable (possibly nonconvex) function and a (block) separable nonsmooth, convex one. The latter term is usually employed to enforce structure in the solution, typically sparsity. Our framework is very flexible and includes both fully parallel Jacobi schemes and Gauss-Seidel (i.e., sequential) ones, as well as virtually all possibilities "in between" with only a subset of variables updated at each iteration. Our theoretical convergence results improve on existing ones, and numerical results on LASSO, logistic regression, and some nonconvex quadratic problems show that the new method consistently outperforms existing algorithms.

Submitted 8 December, 2014; v1 submitted 22 February, 2014; originally announced February 2014.

Comments: This work is an extended version of the conference paper that has been presented at IEEE ICASSP'14. The first and the second author contributed equally to the paper. This revised version contains new numerical results on nonconvex quadratic problems
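The flavor of a parallel selective (Jacobi-type) scheme on LASSO, min 0.5*||Ax - b||^2 + lam*||x||_1, can be sketched in Python with scalar blocks: every block computes its best response in parallel, and only blocks whose change is at least a fraction rho of the largest one are updated. Choosing the proximal weight tau = ||A||_F^2 so that a unit step is always safe, the fixed iteration count, and the test data are assumptions of this sketch, not the paper's tuning (which relies on step-size rules).

```python
import random


def soft(v, t):
    """Soft-thresholding, the proximal operator of t * |.|."""
    return max(v - t, 0.0) if v > 0 else min(v + t, 0.0)


def lasso_grad(A, b, x):
    """Gradient A^T (A x - b) of 0.5 * ||A x - b||^2."""
    resid = [sum(a_rj * x_j for a_rj, x_j in zip(row, x)) - b_r
             for row, b_r in zip(A, b)]
    return [sum(A[r][j] * resid[r] for r in range(len(A))) for j in range(len(x))]


def selective_jacobi_lasso(A, b, lam, rho=0.5, iters=5000):
    n = len(A[0])
    col_sq = [sum(row[j] ** 2 for row in A) for j in range(n)]
    # Assumed proximal weight: ||A||_F^2 >= lambda_max(A^T A), so each
    # surrogate majorizes the objective and a unit step never increases it.
    tau = sum(col_sq)
    c = [col_sq[j] + tau for j in range(n)]
    x = [0.0] * n
    for _ in range(iters):
        g = lasso_grad(A, b, x)
        # Best response of every scalar block, computed in parallel (Jacobi).
        x_hat = [soft(c[j] * x[j] - g[j], lam) / c[j] for j in range(n)]
        err = [abs(x_hat[j] - x[j]) for j in range(n)]
        top = max(err)
        if top == 0.0:
            break
        # Selective update: only blocks with a large enough change move.
        x = [x_hat[j] if err[j] >= rho * top else x[j] for j in range(n)]
    return x


def kkt_violation(A, b, lam, x):
    """Largest violation of the LASSO optimality conditions at x."""
    worst = 0.0
    for g_j, x_j in zip(lasso_grad(A, b, x), x):
        if x_j != 0.0:
            worst = max(worst, abs(g_j + lam * (1.0 if x_j > 0 else -1.0)))
        else:
            worst = max(worst, abs(g_j) - lam)
    return worst


# Illustrative random instance (assumption): 20 samples, 4 features.
rng = random.Random(0)
A = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(20)]
x_true = [1.5, 0.0, -2.0, 0.0]
b = [sum(a * t for a, t in zip(row, x_true)) + 0.1 * rng.gauss(0.0, 1.0) for row in A]
x = selective_jacobi_lasso(A, b, lam=1.0)
print(x, kkt_violation(A, b, 1.0, x))
```

With rho = 1 only the block(s) with the largest change move (a Southwell-like greedy rule); with rho close to 0 every block moves (fully parallel Jacobi).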

arXiv:1311.2444

Flexible Parallel Algorithms for Big Data Optimization

Authors: Francisco Facchinei, Simone Sagratella, Gesualdo Scutari

Abstract: We propose a decomposition framework for the parallel optimization of the sum of a differentiable function and a (block) separable nonsmooth, convex one. The latter term is typically used to enforce structure in the solution as, for example, in Lasso problems. Our framework is very flexible and includes both fully parallel Jacobi schemes and Gauss-Seidel (Southwell-type) ones, as well as virtually all possibilities in between (e.g., gradient- or Newton-type methods) with only a subset of variables updated at each iteration. Our theoretical convergence results improve on existing ones, and numerical results show that the new method compares favorably to existing algorithms.

Submitted 11 November, 2013; originally announced November 2013.

Comments: submitted to IEEE ICASSP 2014

arXiv:1308.3521

A New Distributed DC-Programming Method and its Applications

Authors: Alberth Alvarado, Gesualdo Scutari, Jong-Shi Pang

Abstract: We propose a novel decomposition framework for the distributed optimization of Difference-of-Convex (DC)-type nonseparable sum-utility functions subject to coupling convex constraints. A major contribution of the paper is to develop for the first time a class of (inexact) best-response-like algorithms with provable convergence, where a suitably convexified version of the original DC program is iteratively solved. The main feature of the proposed successive convex approximation method is its decomposability structure across the users, which leads naturally to distributed algorithms in the primal and/or dual domain. The proposed framework is applicable to a variety of multiuser DC problems in different areas, ranging from signal processing to communications and networking. As a case study, in the second part of the paper we focus on two examples, namely: i) a novel resource allocation problem in the emerging area of cooperative physical layer security; and ii) the renowned sum-rate maximization of MIMO Cognitive Radio networks. Our contribution in this context is to devise a class of easy-to-implement distributed algorithms with provable convergence to stationary solutions of such problems. Numerical results show that the proposed distributed schemes reach performance close to (and sometimes better than) that of centralized methods.

Submitted 20 September, 2013; v1 submitted 15 August, 2013; originally announced August 2013.

Comments: submitted to IEEE Transactions on Signal Processing
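The basic convexification step behind DC programming, replacing the concave part -h by its linearization at the current point, is sketched below in Python on a scalar toy problem. This is the classical DCA iteration, shown only to illustrate the idea; the paper's best-response algorithms, coupling constraints and distributed implementation are not modeled, and the toy functions are assumptions.

```python
import math


def dca(grad_h, solve_convex, x0, iters=100):
    """DC programming sketch: minimize f(x) = g(x) - h(x), g and h convex.

    At each step -h is replaced by its linearization at x_k, giving the
    convex subproblem  min_x g(x) - grad_h(x_k) * x, solved by solve_convex.
    """
    x = x0
    history = [x]
    for _ in range(iters):
        x = solve_convex(grad_h(x))
        history.append(x)
    return x, history


# Toy instance (assumption): g(x) = x**4 / 4 and h(x) = x**2.
# The subproblem min_x x**4/4 - s*x has the closed-form solution x = cbrt(s).
def cbrt(v):
    return math.copysign(abs(v) ** (1.0 / 3.0), v)


def f(x):
    return x ** 4 / 4 - x ** 2


x_star, hist = dca(grad_h=lambda x: 2.0 * x, solve_convex=cbrt, x0=1.0)
print(x_star)  # close to math.sqrt(2), a stationary point of f
```

Because the linearization of a convex h underestimates it, each subproblem's objective upper-bounds f, so the objective values in hist never increase.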

arXiv:1302.0756

doi 10.1109/TSP.2013.2293126

Decomposition by Partial Linearization: Parallel Optimization of Multi-Agent Systems

Authors: Gesualdo Scutari, Francisco Facchinei, Peiran Song, Daniel P. Palomar, Jong-Shi Pang

Abstract: We propose a novel decomposition framework for the distributed optimization of general nonconvex sum-utility functions arising naturally in the system design of wireless multiuser interfering systems. Our main contributions are: i) the development of the first class of (inexact) Jacobi best-response algorithms with provable convergence, where all the users simultaneously and iteratively solve a suitably convexified version of the original sum-utility optimization problem; ii) the derivation of a general dynamic pricing mechanism that provides a unified view of existing pricing schemes that are based, instead, on heuristics; and iii) a framework that can be easily particularized to well-known applications, giving rise to very efficient practical (Jacobi or Gauss-Seidel) algorithms that outperform existing ad hoc methods proposed for very specific problems. Interestingly, our framework contains as special cases well-known gradient algorithms for nonconvex sum-utility problems, and many block-coordinate descent schemes for convex functions.

Submitted 19 September, 2013; v1 submitted 4 February, 2013; originally announced February 2013.

Comments: submitted to IEEE Transactions on Signal Processing
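A minimal Python sketch of partial linearization with pricing on a toy scalar power-control problem: each user keeps its own rate exactly, linearizes the other users' rates in its own power (which yields a price), and all users move simultaneously toward their best responses. The SISO model, the gains, the fixed step gamma and all names are assumptions for illustration; the price below is simply the derivative of the assumed sum-rate, not a formula quoted from the paper.

```python
import math


def rates(p, H, noise):
    """Per-user rates log(1 + SINR_i); H[i][j] is the gain from Tx j to Rx i."""
    n = len(p)
    out = []
    for i in range(n):
        interf = noise + sum(H[i][j] * p[j] for j in range(n) if j != i)
        out.append(math.log(1.0 + H[i][i] * p[i] / interf))
    return out


def pricing_sca(H, noise, p_max, gamma=0.5, iters=200):
    n = len(H)
    p = [p_max / 2.0] * n
    for _ in range(iters):
        interf = [noise + sum(H[i][j] * p[j] for j in range(n) if j != i)
                  for i in range(n)]
        # price_i = -sum_{j != i} d r_j / d p_i: marginal harm of user i on others.
        price = [sum(H[j][i] * H[j][j] * p[j]
                     / (interf[j] * (interf[j] + H[j][j] * p[j]))
                     for j in range(n) if j != i)
                 for i in range(n)]
        p_hat = []
        for i in range(n):
            a = H[i][i] / interf[i]
            # Best response: maximize log(1 + a*x) - price_i * x over [0, p_max].
            x = p_max if price[i] <= 0.0 else 1.0 / price[i] - 1.0 / a
            p_hat.append(min(p_max, max(0.0, x)))
        # Jacobi step: all users move toward their best responses at once.
        p = [pi + gamma * (ph - pi) for pi, ph in zip(p, p_hat)]
    return p


# Illustrative 3-user interference channel (assumed gains and noise level).
H = [[1.0, 0.3, 0.2],
     [0.4, 1.0, 0.1],
     [0.2, 0.5, 1.0]]
p = pricing_sca(H, noise=0.1, p_max=1.0)
print(p, sum(rates(p, H, 0.1)))
```

Each iterate is a convex combination of feasible powers, so feasibility is kept at every step without any projection.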

arXiv:0806.1565

doi 10.1109/JSAC.2008.080907

Competitive Design of Multiuser MIMO Systems based on Game Theory: A Unified View

Authors: Gesualdo Scutari, Daniel P. Palomar, Sergio Barbarossa

Abstract: This paper considers the noncooperative maximization of mutual information in the Gaussian interference channel in a fully distributed fashion via game theory. This problem has been studied in a number of papers during the past decade for the case of frequency-selective channels. A variety of conditions guaranteeing the uniqueness of the Nash Equilibrium (NE) and convergence of many different distributed algorithms have been derived. In this paper we provide a unified view of the state-of-the-art results, showing that most of the techniques proposed in the literature to study the game, even though apparently different, can be unified using our recent interpretation of the waterfilling operator as a projection onto a proper polyhedral set. Based on this interpretation, we then provide a mathematical framework, useful to derive a unified set of sufficient conditions guaranteeing the uniqueness of the NE and the global convergence of waterfilling based asynchronous distributed algorithms. The proposed mathematical framework is also instrumental to study the extension of the game to the more general MIMO case, for which only a few results are available in the current literature. The resulting algorithm is, similarly to the frequency-selective case, an iterative asynchronous MIMO waterfilling algorithm. The proof of convergence hinges again on the interpretation of the MIMO waterfilling as a matrix projection, which is the natural generalization of our results obtained for the waterfilling mapping in the frequency-selective case.

Submitted 9 June, 2008; originally announced June 2008.

Comments: To appear in IEEE Journal on Selected Areas in Communications (JSAC), September 2008
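The projection interpretation of waterfilling can be checked numerically. The Python sketch below computes a single-user waterfilling solution by bisection on the water level and, separately, the Euclidean projection of the negated normalized noise vector onto the simplex {p >= 0, sum(p) = P}; the two coincide. Here noise_k stands for the (interference-plus-noise)/|H_k|^2 level of channel k; the function names and numbers are illustrative assumptions.

```python
def waterfill(noise, total_power, iters=200):
    """Classic waterfilling p_k = max(mu - noise_k, 0) with sum(p) = total_power.

    The water level mu is found by bisection on [min(noise), max(noise) + P].
    """
    lo, hi = min(noise), max(noise) + total_power
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if sum(max(mu - nk, 0.0) for nk in noise) > total_power:
            hi = mu
        else:
            lo = mu
    mu = 0.5 * (lo + hi)
    return [max(mu - nk, 0.0) for nk in noise]


def project_simplex(y, total):
    """Euclidean projection of y onto {p >= 0, sum(p) = total} (sort-based)."""
    u = sorted(y, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - total) / j
        if uj - t > 0:
            theta = t  # keep the threshold of the last index that stays active
    return [max(v - theta, 0.0) for v in y]


noise = [0.2, 1.0, 0.5, 3.0]  # normalized interference-plus-noise levels (illustrative)
P = 2.0
p_wf = waterfill(noise, P)
p_proj = project_simplex([-nk for nk in noise], P)
print(p_wf)
print(max(abs(u - v) for u, v in zip(p_wf, p_proj)))
```

The projection view is convenient because Euclidean projections onto closed convex sets are nonexpansive, a property commonly used in convergence arguments for iterative schemes.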

Showing 1–41 of 41 results for author: Scutari, G