-
High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise
Authors:
Aleksandar Armacki,
Pranay Sharma,
Gauri Joshi,
Dragana Bajovic,
Dusan Jakovetic,
Soummya Kar
Abstract:
We study high-probability convergence guarantees of learning on streaming data in the presence of heavy-tailed noise. In the proposed scenario, the model is updated in an online fashion, as new information is observed, without storing any additional data. To combat the heavy-tailed noise, we consider a general framework of nonlinear stochastic gradient descent (SGD), providing several strong results. First, for non-convex costs and component-wise nonlinearities, we establish a convergence rate arbitrarily close to $\mathcal{O}\left(t^{-\frac{1}{4}}\right)$, whose exponent is independent of noise and problem parameters. Second, for strongly convex costs and component-wise nonlinearities, we establish a rate arbitrarily close to $\mathcal{O}\left(t^{-\frac{1}{2}}\right)$ for the weighted average of iterates, with exponent again independent of noise and problem parameters. Finally, for strongly convex costs and a broader class of nonlinearities, we establish convergence of the last iterate, with a rate $\mathcal{O}\left(t^{-ζ} \right)$, where $ζ\in (0,1)$ depends on problem parameters, noise and nonlinearity. As we show analytically and numerically, $ζ$ can be used to inform the preferred choice of nonlinearity for given problem settings. Compared to state-of-the-art works, which only consider clipping, require bounded noise moments of order $η\in (1,2]$, and establish convergence rates whose exponents go to zero as $η\rightarrow 1$, we provide high-probability guarantees for a much broader class of nonlinearities and symmetric density noise, with convergence rates whose exponents are bounded away from zero, even when the noise has finite first moment only. Moreover, in the case of strongly convex functions, we demonstrate analytically and numerically that clipping is not always the optimal nonlinearity, further underlining the value of our general framework.
Submitted 30 April, 2024; v1 submitted 28 October, 2023;
originally announced October 2023.
-
Smoothed Gradient Clipping and Error Feedback for Distributed Optimization under Heavy-Tailed Noise
Authors:
Shuhua Yu,
Dusan Jakovetic,
Soummya Kar
Abstract:
Motivated by understanding and analysis of large-scale machine learning under heavy-tailed gradient noise, we study distributed optimization with gradient clipping, i.e., in which certain clipping operators are applied to the gradients or gradient estimates computed from local clients prior to further processing. While vanilla gradient clipping has proven effective in mitigating the impact of heavy-tailed gradient noises in non-distributed setups, it incurs bias that causes convergence issues in heterogeneous distributed settings. To address the inherent bias introduced by gradient clipping, we develop a smoothed clipping operator, and propose a distributed gradient method equipped with an error feedback mechanism, i.e., the clipping operator is applied on the difference between some local gradient estimator and local stochastic gradient. We establish that, for the first time in the strongly convex setting with heavy-tailed gradient noises that may not have finite moments of order greater than one, the proposed distributed gradient method's mean square error (MSE) converges to zero at a rate $O(1/t^ι)$, $ι\in (0, 1/2)$, where the exponent $ι$ stays bounded away from zero as a function of the problem condition number and the first absolute moment of the noise and, in particular, is shown to be independent of the existence of higher order gradient noise moments $α> 1$. Numerical experiments validate our theoretical findings.
Submitted 2 February, 2024; v1 submitted 25 October, 2023;
originally announced October 2023.
-
Optimum Production through variational principle with the time quadratic demand, fuzzy time period and fuzzy integrand
Authors:
J. N. Roul,
K. Maity,
S. Kar,
M. Maiti
Abstract:
Here a real-life optimal control problem under a fuzzy time period is formulated and solved using the variational principle. The unit production cost is a function of the production rate and also depends on raw material cost, development cost due to durability, and wear-tear cost. The holding cost is assumed to be non-linear and dependent on time. The profit function, which consists of revenue, production cost and holding cost, is formulated as a Fuzzy-Final Time and Fixed State System optimal control problem with fuzzy time period. Here the production rate is unknown and considered as a control variable, and the stock level is taken as a state variable. The problem is formulated to optimize the production rate so that total profit is maximum. The non-linear optimization technique used is the Generalised Reduced Gradient Method (LINGO 11.0). The optimum results are illustrated both numerically and graphically.
Submitted 16 September, 2023;
originally announced September 2023.
-
An Equivalent Circuit Workflow for Unconstrained Optimization
Authors:
Aayushya Agarwal,
Carmel Fiscko,
Soummya Kar,
Larry Pileggi,
Bruno Sinopoli
Abstract:
We introduce a new workflow for unconstrained optimization whereby objective functions are mapped onto a physical domain to more easily design algorithms that are robust to hyperparameters and achieve fast convergence rates. Specifically, we represent optimization problems as equivalent circuits that are then solved solely as nonlinear circuits using robust solution methods. The equivalent circuit models the trajectory of a component-wise scaled gradient flow problem as the transient response of the circuit for which the steady-state coincides with a critical point of the objective function. The equivalent circuit model leverages circuit domain knowledge to methodically design new optimization algorithms that would likely not be developed without a physical model. We incorporate circuit knowledge into optimization methods by 1) enhancing the underlying circuit model for fast numerical analysis, 2) controlling the optimization trajectory by designing the nonlinear circuit components, and 3) solving for step sizes using well-known methods from circuit simulation. We first establish the necessary conditions that the controls must fulfill for convergence. We show that existing descent algorithms can be re-derived as special cases of this approach and derive new optimization algorithms that are developed with insights from a circuit-based model. The new algorithms can be designed to be robust to hyperparameters, achieve convergence rates comparable to or faster than state-of-the-art methods, and are applicable to optimizing a variety of both convex and nonconvex problems.
Submitted 23 May, 2023;
originally announced May 2023.
-
Nonlinear consensus+innovations under correlated heavy-tailed noises: Mean square convergence rate and asymptotics
Authors:
Manojlo Vukovic,
Dusan Jakovetic,
Dragana Bajovic,
Soummya Kar
Abstract:
We consider distributed recursive estimation of consensus+innovations type in the presence of heavy-tailed sensing and communication noises. We allow that the sensing and communication noises are mutually correlated while independent identically distributed (i.i.d.) in time, and that they may both have infinite moments of order higher than one (hence having infinite variances). Such heavy-tailed, infinite-variance noises are highly relevant in practice and are shown to occur, e.g., in dense internet of things (IoT) deployments. We develop a consensus+innovations distributed estimator that employs a general nonlinearity in both consensus and innovations steps to combat the noise. We establish the estimator's almost sure convergence, asymptotic normality, and mean squared error (MSE) convergence. Moreover, we establish and explicitly quantify for the estimator a sublinear MSE convergence rate. We then quantify through analytical examples the effects of the nonlinearity choices and the noises correlation on the system performance. Finally, numerical examples corroborate our findings and verify that the proposed method works in the simultaneous heavy-tail communication-sensing noise setting, while existing methods fail under the same noise conditions.
Submitted 9 November, 2023; v1 submitted 22 December, 2022;
originally announced December 2022.
-
Peripherally automorphic unital completely positive maps
Authors:
B. V. Rajarama Bhat,
Samir Kar,
Bharat Talwar
Abstract:
We identify and characterize unital completely positive (UCP) maps on finite dimensional $C^*$-algebras for which the Choi-Effros product extended to the space generated by peripheral eigenvectors matches with the original product. We analyze a decomposition of general UCP maps in finite dimensions into persistent and transient parts. It is shown that UCP maps on finite dimensional $C^*$-algebras with spectrum contained in the unit circle are $\ast$-automorphisms.
Submitted 14 December, 2022;
originally announced December 2022.
-
Large deviations rates for stochastic gradient descent with strongly convex functions
Authors:
Dragana Bajovic,
Dusan Jakovetic,
Soummya Kar
Abstract:
Recent works have shown that high probability metrics with stochastic gradient descent (SGD) exhibit informativeness and in some cases advantage over the commonly adopted mean-square error-based ones. In this work we provide a formal framework for the study of general high probability bounds with SGD, based on the theory of large deviations. The framework allows for a generic (not-necessarily bounded) gradient noise satisfying mild technical assumptions, allowing for the dependence of the noise distribution on the current iterate. Under the preceding assumptions, we find an upper large deviations bound for SGD with strongly convex functions. The corresponding rate function captures analytical dependence on the noise distribution and other problem parameters. This is in contrast with conventional mean-square error analysis that captures only the noise dependence through the variance and does not capture the effect of higher order moments nor interplay between the noise geometry and the shape of the cost function. We also derive exact large deviation rates for the case when the objective function is quadratic and show that the obtained function matches the one from the general upper bound hence showing the tightness of the general upper bound. Numerical examples illustrate and corroborate theoretical findings.
Submitted 2 November, 2022;
originally announced November 2022.
-
Secure Distributed Optimization Under Gradient Attacks
Authors:
Shuhua Yu,
Soummya Kar
Abstract:
In this paper, we study secure distributed optimization against arbitrary gradient attack in multi-agent networks. In distributed optimization, there is no central server to coordinate local updates, and each agent can only communicate with its neighbors on a predefined network. We consider the scenario where out of $n$ networked agents, a fixed but unknown fraction $ρ$ of the agents are under arbitrary gradient attack in that their stochastic gradient oracles return arbitrary information to derail the optimization process, and the goal is to minimize the sum of local objective functions on unattacked agents. We propose a distributed stochastic gradient method that combines local variance reduction and clipping (CLIP-VRG). We show that, in a connected network, when unattacked local objective functions are convex and smooth, share a common minimizer, and their sum is strongly convex, CLIP-VRG leads to almost sure convergence of the iterates to the exact sum cost minimizer at all agents. We quantify a tight upper bound on the fraction $ρ$ of attacked agents, in terms of problem parameters such as the condition number of the associated sum cost, that guarantees exact convergence of CLIP-VRG, and characterize its asymptotic convergence rate. Finally, we empirically demonstrate the effectiveness of the proposed method under gradient attacks on both synthetic and image classification datasets.
Submitted 27 October, 2022;
originally announced October 2022.
-
Networked Signal and Information Processing
Authors:
Stefan Vlaski,
Soummya Kar,
Ali H. Sayed,
José M. F. Moura
Abstract:
The article reviews significant advances in networked signal and information processing, which have enabled in the last 25 years extending decision making and inference, optimization, control, and learning to the increasingly ubiquitous environments of distributed agents. As these interacting agents cooperate, new collective behaviors emerge from local decisions and actions. Moreover, and significantly, theory and applications show that networked agents, through cooperation and sharing, are able to match the performance of cloud or federated solutions, while offering the potential for improved privacy, increasing resilience, and saving resources.
Submitted 18 April, 2023; v1 submitted 25 October, 2022;
originally announced October 2022.
-
Peripheral Poisson Boundary
Authors:
B. V. Rajarama Bhat,
Samir Kar,
Bharat Talwar
Abstract:
It is shown that the operator space generated by peripheral eigenvectors of a unital completely positive map on a von Neumann algebra has a $C^*$-algebra structure. This extends the notion of non-commutative Poisson boundary by including the point spectrum of the map contained in the unit circle. The main ingredient is dilation theory. This theory provides a simple formula for the new product. The notion has implications to our understanding of quantum dynamics. For instance, it is shown that the peripheral Poisson boundary remains invariant in discrete quantum dynamics.
Submitted 22 May, 2024; v1 submitted 16 September, 2022;
originally announced September 2022.
-
Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise
Authors:
Dusan Jakovetic,
Dragana Bajovic,
Anit Kumar Sahu,
Soummya Kar,
Nemanja Milosevic,
Dusan Stamenkovic
Abstract:
We introduce a general framework for nonlinear stochastic gradient descent (SGD) for the scenarios when gradient noise exhibits heavy tails. The proposed framework subsumes several popular nonlinearity choices, like clipped, normalized, signed or quantized gradient, but we also consider novel nonlinearity choices. We establish for the considered class of methods strong convergence guarantees assuming a strongly convex cost function with Lipschitz continuous gradients under very general assumptions on the gradient noise. Most notably, we show that, for a nonlinearity with bounded outputs and for the gradient noise that may not have finite moments of order greater than one, the nonlinear SGD's mean squared error (MSE), or equivalently, the expected cost function's optimality gap, converges to zero at rate $O(1/t^ζ)$, $ζ\in (0,1)$. In contrast, for the same noise setting, the linear SGD generates a sequence with unbounded variances. Furthermore, for the nonlinearities that can be decoupled component wise, like, e.g., sign gradient or component-wise clipping, we show that the nonlinear SGD asymptotically (locally) achieves a $O(1/t)$ rate in the weak convergence sense and explicitly quantify the corresponding asymptotic variance. Experiments show that, while our framework is more general than existing studies of SGD under heavy-tail noise, several easy-to-implement nonlinearities from our framework are competitive with state of the art alternatives on real data sets with heavy tail noises.
Submitted 6 April, 2022;
originally announced April 2022.
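The nonlinear SGD idea in the abstract above can be illustrated with a minimal sketch: the noisy gradient is passed through a bounded nonlinearity before the step is taken. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the quadratic test cost, the step-size schedule $a/t^{δ}$, and the use of Student-t noise with 1.5 degrees of freedom (finite mean, infinite variance) as a stand-in for heavy-tailed gradient noise.

```python
import numpy as np


def clip_componentwise(g, m=1.0):
    """Clip every coordinate of g to the interval [-m, m]."""
    return np.clip(g, -m, m)


def sign_gradient(g):
    """Keep only the sign of every coordinate of g."""
    return np.sign(g)


def normalized_gradient(g, eps=1e-12):
    """Rescale g to (approximately) unit Euclidean norm."""
    return g / (np.linalg.norm(g) + eps)


def nonlinear_sgd(grad, x0, nonlinearity, steps=5000, a=1.0, delta=0.75,
                  df=1.5, seed=0):
    """Run x <- x - step_t * nonlinearity(grad(x) + noise).

    Hypothetical choices (not from the source): step_t = a / t**delta and
    i.i.d. Student-t noise with `df` degrees of freedom, which has a finite
    mean but infinite variance when 1 < df <= 2.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for t in range(1, steps + 1):
        noise = rng.standard_t(df, size=x.shape)  # symmetric, heavy-tailed
        step = a / t ** delta
        x = x - step * nonlinearity(grad(x) + noise)
    return x


# Strongly convex toy cost f(x) = 0.5 * ||x - x_star||^2 (an assumption).
x_star = np.array([1.0, 2.0])


def quadratic_grad(x):
    return x - x_star


x_clip = nonlinear_sgd(quadratic_grad, [5.0, -3.0], clip_componentwise)
x_sign = nonlinear_sgd(quadratic_grad, [5.0, -3.0], sign_gradient)
print("clipped SGD error:", np.linalg.norm(x_clip - x_star))
print("sign SGD error:   ", np.linalg.norm(x_sign - x_star))
```

Because each nonlinearity has bounded output, a single extreme noise sample can move the iterate by at most the current step size, which is the intuition behind the robustness to heavy tails. Swapping in `normalized_gradient` or a different schedule only requires changing the arguments.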
-
Variance reduced stochastic optimization over directed graphs with row and column stochastic weights
Authors:
Muhammad I. Qureshi,
Ran Xin,
Soummya Kar,
Usman A. Khan
Abstract:
This paper proposes AB-SAGA, a first-order distributed stochastic optimization method to minimize a finite-sum of smooth and strongly convex functions distributed over an arbitrary directed graph. AB-SAGA removes the uncertainty caused by the stochastic gradients using a node-level variance reduction and subsequently employs network-level gradient tracking to address the data dissimilarity across the nodes. Unlike existing methods that use the nonlinear push-sum correction to cancel the imbalance caused by the directed communication, the consensus updates in AB-SAGA are linear and use both row and column stochastic weights. We show that for a constant step-size, AB-SAGA converges linearly to the global optimum. We quantify the directed nature of the underlying graph using an explicit directivity constant and characterize the regimes in which AB-SAGA achieves a linear speed-up over its centralized counterpart. Numerical experiments illustrate the convergence of AB-SAGA for strongly convex and nonconvex problems.
Submitted 7 February, 2022;
originally announced February 2022.
-
Distributed design of deterministic discrete-time privacy preserving average consensus for multi-agent systems through network augmentation
Authors:
Guilherme Ramos,
A. Pedro Aguiar,
Soummya Kar,
Sérgio Pequito
Abstract:
Average consensus protocols play a central role in distributed systems and decision-making, such as distributed information fusion, distributed optimization, distributed estimation, and control. A key advantage of these protocols is that agents exchange and reveal their state information only to their neighbors. Yet, this can raise privacy concerns in situations where the agents' states contain sensitive information. In this paper, we propose novel (noiseless) privacy-preserving distributed algorithms for multi-agent systems to reach an average consensus. The main idea of the algorithms is that each agent runs a (small) network with a crafted structure and dynamics to form a network of networks (i.e., the connection between the newly created networks and their interconnections respecting the initial network connections). Together with a re-weighting of the dynamic parameters dictating the inter-agent dynamics and the initial states, we show that it is possible to ensure that the value of each node converges to the consensus value of the original network. Furthermore, we show that, under mild assumptions, it is possible to craft the dynamics such that the design can be achieved in a distributed fashion. Finally, we illustrate the proposed algorithms with examples.
Submitted 18 December, 2021;
originally announced December 2021.
-
Dynamic Median Consensus Over Random Networks
Authors:
Shuhua Yu,
Yuan Chen,
Soummya Kar
Abstract:
This paper studies the problem of finding the median of N distinct numbers distributed across networked agents. Each agent updates its estimate for the median from noisy local observations of one of the N numbers and information from neighbors. We consider an undirected random network that is connected on average, and a noisy observation sequence that has finite variance and almost surely decaying bias. We present a consensus+innovations algorithm with clipped innovations. Under some regularity assumptions on the network and observation model, we show that each agent's local estimate converges to the set of median(s) almost surely at an asymptotic sublinear rate. Numerical experiments demonstrate the effectiveness of the presented algorithm.
Submitted 11 October, 2021;
originally announced October 2021.
-
A Stochastic Proximal Gradient Framework for Decentralized Non-Convex Composite Optimization: Topology-Independent Sample Complexity and Communication Efficiency
Authors:
Ran Xin,
Subhro Das,
Usman A. Khan,
Soummya Kar
Abstract:
Decentralized optimization is a promising parallel computation paradigm for large-scale data analytics and machine learning problems defined over a network of nodes. This paper is concerned with decentralized non-convex composite problems with population or empirical risk. In particular, the networked nodes are tasked to find an approximate stationary point of the average of local, smooth, possibly non-convex risk functions plus a possibly non-differentiable extended valued convex regularizer. Under this general formulation, we propose the first provably efficient, stochastic proximal gradient framework, called ProxGT. Specifically, we construct and analyze several instances of ProxGT that are tailored respectively for different problem classes of interest. Remarkably, we show that the sample complexities of these instances are network topology-independent and achieve linear speedups compared to that of the corresponding centralized optimal methods implemented on a single node.
Submitted 4 October, 2021;
originally announced October 2021.
-
Characterizing the Functional Density Power Divergence Class
Authors:
Souvik Ray,
Subrata Pal,
Sumit Kumar Kar,
Ayanendranath Basu
Abstract:
Divergence measures have a long association with statistical inference, machine learning and information theory. The density power divergence and related measures have produced many useful (and popular) statistical procedures, which provide a good balance between model efficiency on one hand and outlier stability or robustness on the other. The logarithmic density power divergence, a particular logarithmic transform of the density power divergence, has also been very successful in producing efficient and stable inference procedures; in addition it has also led to significant demonstrated applications in information theory. The success of the minimum divergence procedures based on the density power divergence and the logarithmic density power divergence (which also go by the names $β$-divergence and $γ$-divergence, respectively) make it imperative and meaningful to look for other, similar divergences which may be obtained as transforms of the density power divergence in the same spirit. With this motivation we search for such transforms of the density power divergence, referred to herein as the functional density power divergence class. The present article characterizes this functional density power divergence class, and thus identifies the available divergence measures within this construct that may be explored further for possible applications in statistical inference, machine learning and information theory.
Submitted 4 September, 2022; v1 submitted 13 May, 2021;
originally announced May 2021.
-
Characterizing Logarithmic Bregman Functions
Authors:
Souvik Ray,
Subrata Pal,
Sumit Kumar Kar,
Ayanendranath Basu
Abstract:
Minimum divergence procedures based on the density power divergence and the logarithmic density power divergence have been extremely popular and successful in generating inference procedures which combine a high degree of model efficiency with strong outlier stability. Such procedures are always preferable in practical situations over procedures which achieve their robustness at a major cost of efficiency or are highly efficient but have poor robustness properties. The density power divergence (DPD) family of Basu et al.(1998) and the logarithmic density power divergence (LDPD) family of Jones et al.(2001) provide flexible classes of divergences where the adjustment between efficiency and robustness is controlled by a single, real, non-negative parameter. The usefulness of these two families of divergences in statistical inference makes it meaningful to search for other related families of divergences in the same spirit. The DPD family is a member of the class of Bregman divergences, and the LDPD family is obtained by log transformations of the different segments of the divergences within the DPD family. Both the DPD and LDPD families lead to the Kullback-Leibler divergence in the limiting case as the tuning parameter $α\rightarrow 0$. In this paper we study this relation in detail, and demonstrate that such log transformations can only be meaningful in the context of the DPD (or the convex generating function of the DPD) within the general fold of Bregman divergences, giving us a limit to the extent to which the search for useful divergences could be successful.
Submitted 9 November, 2022; v1 submitted 12 May, 2021;
originally announced May 2021.
-
On the Accuracy of Deterministic Models for Viral Spread on Networks
Authors:
Anirudh Sridhar,
Soummya Kar
Abstract:
We consider the emergent behavior of viral spread when agents in a large population interact with each other over a contact network. When the number of agents is large and the contact network is a complete graph, it is well known that the population behavior -- that is, the fraction of susceptible, infected and recovered agents -- converges to the solution of an ordinary differential equation (ODE) known as the classical SIR model as the population size approaches infinity. In contrast, we study interactions over contact networks with generic topologies and derive conditions under which the population behavior concentrates around either the classic SIR model or other deterministic models. Specifically, we show that when most vertex degrees in the contact network are sufficiently large, the population behavior concentrates around an ODE known as the network SIR model. We then study the short and intermediate-term evolution of the network SIR model and show that if the contact network has an expander-type property or the initial set of infections is well-mixed in the population, the network SIR model reduces to the classical SIR model. To complement these results, we illustrate through simulations that the two models can yield drastically different predictions, hence use of the classical SIR model can be misleading in certain cases.
Submitted 11 April, 2021;
originally announced April 2021.
-
Finite-Time In-Network Computation of Linear Transforms
Authors:
Soummya Kar,
Markus Püschel,
José M. F. Moura
Abstract:
This paper focuses on finite-time in-network computation of linear transforms of distributed graph data. Finite-time transform computation problems are of interest in graph-based computing and signal processing applications in which the objective is to compute, by means of distributed iterative methods, various (linear) transforms of the data distributed at the agents or nodes of the graph. While finite-time computation of consensus-type or, more generally, rank-one transforms has been studied, systematic approaches toward scalable computing of general linear transforms, specifically in the case of heterogeneous agent objectives in which each agent is interested in obtaining a different linear combination of the network data, are relatively less explored. In this paper, by employing ideas from algebraic geometry, we develop a systematic characterization of linear transforms that are amenable to distributed in-network computation in finite time using linear iterations. Further, we consider the general case of directed inter-agent communication graphs. Specifically, it is shown that \emph{almost all} linear transformations of data distributed on the nodes of a digraph containing a Hamiltonian cycle may be computed using at most $N$ linear distributed iterations. Finally, by studying an associated matrix factorization based reformulation of the transform computation problem, we obtain, as a by-product, certain results and characterizations on sparsity-constrained matrix factorization that are of independent interest.
Submitted 3 April, 2021;
originally announced April 2021.
-
A Hybrid Variance-Reduced Method for Decentralized Stochastic Non-Convex Optimization
Authors:
Ran Xin,
Usman A. Khan,
Soummya Kar
Abstract:
This paper considers decentralized stochastic optimization over a network of $n$ nodes, where each node possesses a smooth non-convex local cost function and the goal of the networked nodes is to find an $ε$-accurate first-order stationary point of the sum of the local costs. We focus on an online setting, where each node accesses its local cost only by means of a stochastic first-order oracle that returns a noisy version of the exact gradient. In this context, we propose a novel single-loop decentralized hybrid variance-reduced stochastic gradient method, called GT-HSGD, that outperforms the existing approaches in terms of both the oracle complexity and practical implementation. The GT-HSGD algorithm implements specialized local hybrid stochastic gradient estimators that are fused over the network to track the global gradient. Remarkably, GT-HSGD achieves a network topology-independent oracle complexity of $O(n^{-1}ε^{-3})$ when the required error tolerance $ε$ is small enough, leading to a linear speedup with respect to the centralized optimal online variance-reduced approaches that operate on a single node. Numerical experiments are provided to illustrate our main technical results.
Submitted 14 June, 2021; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Mean-field Approximations for Stochastic Population Processes with Heterogeneous Interactions
Authors:
Anirudh Sridhar,
Soummya Kar
Abstract:
This paper studies a general class of stochastic population processes in which agents interact with one another over a network. Agents update their behaviors in a random and decentralized manner according to a policy that depends only on the agent's current state and an estimate of the macroscopic population state, given by a weighted average of the neighboring states. When the number of agents is large and the network is a complete graph (has all-to-all information access), the macroscopic behavior of the population can be well-approximated by a set of deterministic differential equations called a {\it mean-field approximation}. For incomplete networks, it was previously unclear whether, in general, a suitable mean-field approximation exists for the macroscopic behavior of the population. The paper addresses this gap by establishing a generic theory describing when various mean-field approximations are accurate for \emph{arbitrary} interaction structures.
Our results are threefold. Letting $W$ be the matrix describing agent interactions, we first show that a simple mean-field approximation that incorrectly assumes a homogeneous interaction structure is accurate provided $W$ has a large spectral gap. Second, we show that a more complex mean-field approximation which takes into account agent interactions is accurate as long as the Frobenius norm of $W$ is small. Finally, we compare the predictions of the two mean-field approximations through simulations, highlighting cases where using mean-field approximations that assume a homogeneous interaction structure can lead to inaccurate qualitative and quantitative predictions.
Submitted 19 July, 2023; v1 submitted 23 January, 2021;
originally announced January 2021.
-
Learning to Solve AC Optimal Power Flow by Differentiating through Holomorphic Embeddings
Authors:
Henning Lange,
Bingqing Chen,
Mario Berges,
Soummya Kar
Abstract:
Alternating current optimal power flow (AC-OPF) is one of the fundamental problems in power systems operation. AC-OPF is traditionally cast as a constrained optimization problem that seeks optimal generation set points whilst fulfilling a set of non-linear equality constraints -- the power flow equations. With increasing penetration of renewable generation, grid operators need to solve larger problems at shorter intervals. This motivates the research interest in learning OPF solutions with neural networks, which have fast inference time and are potentially scalable to large networks. The main difficulty in solving the AC-OPF problem lies in dealing with these equality constraints, which have spurious roots, i.e., there are assignments of voltages that fulfill the power flow equations but are not physically realizable. This property renders any method relying on projected gradients brittle because these non-physical roots can act as attractors. In this paper, we show efficient strategies that circumvent this problem by differentiating through the operations of a power flow solver that embeds the power flow equations into a holomorphic function. The resulting learning-based approach is validated experimentally on a 200-bus system and we show that, after training, the learned agent produces optimized power flow solutions reliably and fast. Specifically, we report a 12x increase in speed and a 40% increase in robustness compared to a traditional solver. To the best of our knowledge, this approach constitutes the first learning-based approach that successfully respects the full non-linear AC-OPF equations.
Submitted 16 December, 2020;
originally announced December 2020.
-
A fast randomized incremental gradient method for decentralized non-convex optimization
Authors:
Ran Xin,
Usman A. Khan,
Soummya Kar
Abstract:
We study decentralized non-convex finite-sum minimization problems described over a network of nodes, where each node possesses a local batch of data samples. In this context, we analyze a single-timescale randomized incremental gradient method, called GT-SAGA. GT-SAGA is computationally efficient as it evaluates one component gradient per node per iteration and achieves provably fast and robust performance by leveraging node-level variance reduction and network-level gradient tracking. For general smooth non-convex problems, we show the almost sure and mean-squared convergence of GT-SAGA to a first-order stationary point and further describe regimes of practical significance where it outperforms the existing approaches and achieves a network topology-independent iteration complexity respectively. When the global function satisfies the Polyak-Łojasiewicz condition, we show that GT-SAGA exhibits linear convergence to an optimal solution in expectation and describe regimes of practical interest where the performance is network topology-independent and improves upon the existing methods. Numerical experiments are included to highlight the main convergence aspects of GT-SAGA in non-convex settings.
Submitted 30 September, 2021; v1 submitted 7 November, 2020;
originally announced November 2020.
-
A multi-objective multi-item solid transportation problem with vehicle cost, volume and weight capacity under fuzzy environment
Authors:
Mouhya B. Kar,
Pradip Kundu,
Samarjit Kar,
Tandra Pal
Abstract:
Generally, in a transportation problem, full vehicles (e.g., light commercial vehicles, medium duty and heavy duty trucks, etc.) are to be booked, and the transportation cost of a vehicle has to be paid irrespective of whether the capacity of the vehicle is filled. Besides the transportation cost, the total time, which includes the travel time of a vehicle and the loading and unloading times of products, is also an important issue. Also, instead of a single item, different types of items may need to be transported from some sources to destinations through different types of conveyances. The optimal transportation policy may be affected by many other issues, such as the volume and weight per unit of product, the unavailability of a sufficient number of certain types of vehicles, etc. In this paper, we formulate a multi-objective multi-item solid transportation problem by addressing all these issues. The problem is formulated with the transportation cost and time parameters as fuzzy variables. Using the credibility theory of fuzzy variables, a chance-constrained programming model is formulated, and is then transformed into the corresponding deterministic form. Finally, a numerical example is provided to illustrate the problem.
Submitted 6 November, 2020;
originally announced November 2020.
-
The economics of utility-scale portable energy storage systems in a high-renewable grid
Authors:
Guannan He,
Jeremy Michalek,
Soummya Kar,
Qixin Chen,
Da Zhang,
Jay F. Whitacre
Abstract:
Battery storage is expected to play a crucial role in the low-carbon transformation of energy systems. The deployment of battery storage in the power grid, however, is currently severely limited by its low economic viability, which results from not only high capital costs but also the lack of flexible and efficient utilization schemes and business models. Making utility-scale battery storage portable through trucking unlocks its capability to provide various on-demand services. We introduce the potential applications of utility-scale portable energy storage and investigate its economics in California using a spatiotemporal decision model that determines the optimal operation and transportation schedules of portable storage. We show that mobilizing energy storage can increase its life-cycle revenues by 70% in some areas and improve renewable energy integration by relieving local transmission congestion. The life-cycle revenue of spatiotemporal arbitrage can fully compensate for the costs of portable energy storage systems in several regions in California, including San Diego and the San Francisco Bay Area.
Submitted 17 August, 2020;
originally announced August 2020.
-
Fast decentralized non-convex finite-sum optimization with recursive variance reduction
Authors:
Ran Xin,
Usman A. Khan,
Soummya Kar
Abstract:
This paper considers decentralized minimization of $N:=nm$ smooth non-convex cost functions equally divided over a directed network of $n$ nodes. Specifically, we describe a stochastic first-order gradient method, called GT-SARAH, that employs a SARAH-type variance reduction technique and gradient tracking (GT) to address the stochastic and decentralized nature of the problem. We show that GT-SARAH, with appropriate algorithmic parameters, finds an $ε$-accurate first-order stationary point with $O\big(\max\big\{N^{\frac{1}{2}},n(1-λ)^{-2},n^{\frac{2}{3}}m^{\frac{1}{3}}(1-λ)^{-1}\big\}Lε^{-2}\big)$ gradient complexity, where ${(1-λ)\in(0,1]}$ is the spectral gap of the network weight matrix and $L$ is the smoothness parameter of the cost functions. This gradient complexity outperforms that of the existing decentralized stochastic gradient methods. In particular, in a big-data regime such that ${n = O(N^{\frac{1}{2}}(1-λ)^{3})}$, this gradient complexity further reduces to ${O(N^{\frac{1}{2}}Lε^{-2})}$, independent of the network topology, and matches that of the centralized near-optimal variance-reduced methods. Moreover, in this regime GT-SARAH achieves a non-asymptotic linear speedup, in that the total number of gradient computations at each node is reduced by a factor of $1/n$ compared to the centralized near-optimal algorithms that perform all gradient computations at a single node. To the best of our knowledge, GT-SARAH is the first algorithm that achieves this property. In addition, we show that appropriate choices of local minibatch size balance the trade-offs between the gradient and communication complexity of GT-SARAH. Over an infinite time horizon, we establish that all nodes in GT-SARAH asymptotically achieve consensus and converge to a first-order stationary point in the almost sure and mean-squared sense.
Submitted 18 September, 2021; v1 submitted 17 August, 2020;
originally announced August 2020.
-
Distributed Gradient Flow: Nonsmoothness, Nonconvexity, and Saddle Point Evasion
Authors:
Brian Swenson,
Ryan Murray,
H. Vincent Poor,
Soummya Kar
Abstract:
The paper considers distributed gradient flow (DGF) for multi-agent nonconvex optimization. DGF is a continuous-time approximation of distributed gradient descent that is often easier to study than its discrete-time counterpart. The paper has two main contributions. First, the paper considers optimization of nonsmooth, nonconvex objective functions. It is shown that DGF converges to critical points in this setting. The paper then considers the problem of avoiding saddle points. It is shown that if agents' objective functions are assumed to be smooth and nonconvex, then DGF can only converge to a saddle point from a zero-measure set of initial conditions. To establish this result, the paper proves a stable manifold theorem for DGF, which is a fundamental contribution of independent interest. In a companion paper, analogous results are derived for discrete-time algorithms.
Submitted 12 August, 2020;
originally announced August 2020.
-
An improved convergence analysis for decentralized online stochastic non-convex optimization
Authors:
Ran Xin,
Usman A. Khan,
Soummya Kar
Abstract:
In this paper, we study decentralized online stochastic non-convex optimization over a network of nodes. Integrating a technique called gradient tracking in decentralized stochastic gradient descent, we show that the resulting algorithm, GT-DSGD, enjoys certain desirable characteristics towards minimizing a sum of smooth non-convex functions. In particular, for general smooth non-convex functions, we establish non-asymptotic characterizations of GT-DSGD and derive the conditions under which it achieves network-independent performance that matches the centralized minibatch SGD. In contrast, the existing results suggest that GT-DSGD is always network-dependent and is therefore strictly worse than the centralized minibatch SGD. When the global non-convex function additionally satisfies the Polyak-Łojasiewicz (PL) condition, we establish the linear convergence of GT-DSGD up to a steady-state error with appropriate constant step-sizes. Moreover, under stochastic approximation step-sizes, we establish, for the first time, the optimal global sublinear convergence rate on almost every sample path, in addition to the asymptotically optimal sublinear rate in expectation. Since strongly convex functions are a special case of the functions satisfying the PL condition, our results are not only immediately applicable but also improve the currently known best convergence rates and their dependence on problem parameters.
Submitted 28 December, 2020; v1 submitted 10 August, 2020;
originally announced August 2020.
-
Optimum Production for a heaped stock dependent breakable item through variational principle
Authors:
J. N. Roul,
K. Maity,
S. Kar,
M. Maiti
Abstract:
The breakability rate of a fragile item depends on the accumulated stress of the heaped stock level, so the breakability rate can be treated as a parameter dependent on the stock variable. The unit production cost is a function of the production rate and also depends on raw material cost, development cost and wear-tear cost. The holding cost is assumed to be non-linear and time-dependent. An optimal control problem for a fragile item over a finite time horizon is considered. The profit function, which consists of revenue, production and holding costs, is formulated as a Fixed-Final Time and Fixed State System (cf. Naidu (2000)) optimal control problem with a finite time horizon. The production rate is unknown and is taken as the control variable, while the stock level is taken as the state variable. The problem is formulated to optimize the production rate so that the total profit is maximized. As particular cases, models are evaluated with and without breakability. The models are solved using the conventional Variational Principle along with a non-linear optimization technique, the Generalised Reduced Gradient Method (LINGO 12.0). The optimum results are illustrated both numerically and graphically. Some sensitivity analyses on the breakability coefficient are presented.
Submitted 2 June, 2020;
originally announced June 2020.
-
S-ADDOPT: Decentralized stochastic first-order optimization over directed graphs
Authors:
Muhammad I. Qureshi,
Ran Xin,
Soummya Kar,
Usman A. Khan
Abstract:
In this report, we study decentralized stochastic optimization to minimize a sum of smooth and strongly convex cost functions when the functions are distributed over a directed network of nodes. In contrast to the existing work, we use gradient tracking to improve certain aspects of the resulting algorithm. In particular, we propose the~\textbf{\texttt{S-ADDOPT}} algorithm that assumes a stochastic first-order oracle at each node and show that for a constant step-size~$α$, each node converges linearly inside an error ball around the optimal solution, the size of which is controlled by~$α$. For decaying step-sizes~$\mathcal{O}(1/k)$, we show that~\textbf{\texttt{S-ADDOPT}} reaches the exact solution sublinearly at~$\mathcal{O}(1/k)$ and its convergence is asymptotically network-independent. Thus the asymptotic behavior of~\textbf{\texttt{S-ADDOPT}} is comparable to the centralized stochastic gradient descent. Numerical experiments over both strongly convex and non-convex problems illustrate the convergence behavior and the performance comparison of the proposed algorithm.
Submitted 22 July, 2020; v1 submitted 15 May, 2020;
originally announced May 2020.
-
Distributed Gradient Methods for Nonconvex Optimization: Local and Global Convergence Guarantees
Authors:
Brian Swenson,
Soummya Kar,
H. Vincent Poor,
José M. F. Moura,
Aaron Jaech
Abstract:
The article discusses distributed gradient-descent algorithms for computing local and global minima in nonconvex optimization. For local optimization, we focus on distributed stochastic gradient descent (D-SGD)--a simple network-based variant of classical SGD. We discuss local minima convergence guarantees and explore the simple but critical role of the stable-manifold theorem in analyzing saddle-point avoidance. For global optimization, we discuss annealing-based methods in which slowly decaying noise is added to D-SGD. Conditions are discussed under which convergence to global minima is guaranteed. Numerical examples illustrate the key concepts in the paper.
Submitted 16 September, 2020; v1 submitted 23 March, 2020;
originally announced March 2020.
-
Distributed Stochastic Gradient Descent: Nonconvexity, Nonsmoothness, and Convergence to Local Minima
Authors:
Brian Swenson,
Ryan Murray,
H. Vincent Poor,
Soummya Kar
Abstract:
In centralized settings, it is well known that stochastic gradient descent (SGD) avoids saddle points and converges to local minima in nonconvex problems. However, similar guarantees are lacking for distributed first-order algorithms. The paper studies distributed stochastic gradient descent (D-SGD)--a simple network-based implementation of SGD. Conditions under which D-SGD avoids saddle points and converges to local minima are studied. First, we consider the problem of computing critical points. Assuming loss functions are nonconvex and possibly nonsmooth, it is shown that, for each fixed initialization, D-SGD converges to critical points of the loss with probability one. Next, we consider the problem of avoiding saddle points. In this case, we again assume that loss functions may be nonconvex and nonsmooth, but are smooth in a neighborhood of a saddle point. It is shown that, for any fixed initialization, D-SGD avoids such saddle points with probability one. Results are proved by studying the underlying (distributed) gradient flow, using the ordinary differential equation (ODE) method of stochastic approximation, and extending classical techniques from dynamical systems theory such as stable manifolds. Results are proved in the general context of subspace-constrained optimization, of which D-SGD is a special case.
Submitted 4 March, 2022; v1 submitted 5 March, 2020;
originally announced March 2020.
-
Gradient tracking and variance reduction for decentralized optimization and machine learning
Authors:
Ran Xin,
Soummya Kar,
Usman A. Khan
Abstract:
Decentralized methods to solve finite-sum minimization problems are important in many signal processing and machine learning tasks where the data is distributed over a network of nodes and raw data sharing is not permitted due to privacy and/or resource constraints. In this article, we review decentralized stochastic first-order methods and provide a unified algorithmic framework that combines variance-reduction with gradient tracking to achieve both robust performance and fast convergence. We provide explicit theoretical guarantees of the corresponding methods when the objective functions are smooth and strongly-convex, and show their applicability to non-convex problems via numerical experiments. Throughout the article, we provide intuitive illustrations of the main technical ideas by casting appropriate tradeoffs and comparisons among the methods of interest and by highlighting applications to decentralized training of machine learning models.
Submitted 13 February, 2020;
originally announced February 2020.
-
Variance-Reduced Decentralized Stochastic Optimization with Accelerated Convergence
Authors:
Ran Xin,
Usman A. Khan,
Soummya Kar
Abstract:
This paper describes a novel algorithmic framework to minimize a finite-sum of functions available over a network of nodes. The proposed framework, which we call GT-VR, is stochastic and decentralized, and thus is particularly suitable for problems where large-scale, potentially private data cannot be collected or processed at a centralized server. The GT-VR framework leads to a family of algorithms with two key ingredients: (i) \textit{local variance reduction}, which enables estimating the local batch gradients from arbitrarily drawn samples of local data; and (ii) \textit{global gradient tracking}, which fuses the gradient information across the nodes. Naturally, combining different variance reduction and gradient tracking techniques leads to different algorithms of interest with valuable practical tradeoffs and design considerations.
Our focus in this paper is on two instantiations of the GT-VR framework, namely~\textbf{\texttt{GT-SAGA}} and~\textbf{\texttt{GT-SVRG}}, that, similar to their centralized counterparts (SAGA and SVRG), exhibit a compromise between space and time. We show that both~\textbf{\texttt{GT-SAGA}} and~\textbf{\texttt{GT-SVRG}} achieve accelerated linear convergence for smooth and strongly convex problems and further describe the regimes in which they achieve non-asymptotic, network-independent linear convergence rates that are faster than those of the existing decentralized first-order schemes. Moreover, we show that both algorithms achieve a linear speedup in such regimes, in that the total number of gradient computations required at each node is reduced by a factor of $1/n$, where $n$ is the number of nodes, compared to their centralized counterparts that process all data at a single node. Extensive simulations illustrate the convergence behavior of the corresponding algorithms.
Submitted 9 October, 2020; v1 submitted 9 December, 2019;
originally announced December 2019.
-
Resilient Distributed Recovery of Large Fields
Authors:
Yuan Chen,
Soummya Kar,
José M. F. Moura
Abstract:
This paper studies the resilient distributed recovery of large fields under measurement attacks, by a team of agents, where each measures a small subset of the components of a large spatially distributed field. An adversary corrupts some of the measurements. The agents collaborate to process their measurements, and each is interested in recovering only a fraction of the field. We present a field recovery consensus+innovations type distributed algorithm that is resilient to measurement attacks, where an agent maintains and updates a local state based on its neighbors' states and its own measurement. Under sufficient conditions on the attacker and the connectivity of the communication network, the state of each agent, even one with compromised measurements, converges to the true value of the field components that it is interested in recovering. Finally, we illustrate the performance of our algorithm through numerical examples.
Submitted 19 October, 2019;
originally announced October 2019.
-
Variance-Reduced Decentralized Stochastic Optimization with Gradient Tracking -- Part II: GT-SVRG
Authors:
Ran Xin,
Usman A. Khan,
Soummya Kar
Abstract:
Decentralized stochastic optimization has recently benefited from gradient tracking methods \cite{DSGT_Pu,DSGT_Xin} providing efficient solutions for large-scale empirical risk minimization problems. In Part I \cite{GT_SAGA} of this work, we develop \textbf{\texttt{GT-SAGA}}, which is based on a decentralized implementation of SAGA \cite{SAGA} using gradient tracking, and discuss regimes of practical interest where \textbf{\texttt{GT-SAGA}} outperforms existing decentralized approaches in terms of the total number of local gradient computations. In this paper, we describe \textbf{\texttt{GT-SVRG}}, a decentralized gradient-tracking-based implementation of SVRG \cite{SVRG}, another well-known variance-reduction technique. We show that the convergence rate of \textbf{\texttt{GT-SVRG}} matches that of \textbf{\texttt{GT-SAGA}} for smooth and strongly-convex functions and highlight different trade-offs between the two algorithms in various settings.
Submitted 10 December, 2019; v1 submitted 7 October, 2019;
originally announced October 2019.
-
Variance-Reduced Decentralized Stochastic Optimization with Gradient Tracking--Part I: GT-SAGA
Authors:
Ran Xin,
Usman A. Khan,
Soummya Kar
Abstract:
In this paper, we study decentralized empirical risk minimization problems, where the goal is to minimize a finite-sum of smooth and strongly-convex functions available over a network of nodes. In this Part I, we propose \textbf{\texttt{GT-SAGA}}, a decentralized stochastic first-order algorithm based on gradient tracking \cite{DSGT_Pu,DSGT_Xin} and a variance-reduction technique called SAGA \cite{SAGA}. We develop the convergence analysis and the iteration complexity of this algorithm. We further demonstrate various trade-offs and discuss scenarios in which \textbf{\texttt{GT-SAGA}} achieves superior performance (in terms of the number of local gradient computations required) with respect to existing decentralized schemes. In Part II \cite{GT_SVRG} of this two-part paper, we develop and analyze \textbf{\texttt{GT-SVRG}}, a decentralized gradient tracking based implementation of SVRG \cite{SVRG}, another well-known variance-reduction technique.
Submitted 10 December, 2019; v1 submitted 25 September, 2019;
originally announced September 2019.
-
Distributed Gradient Descent: Nonconvergence to Saddle Points and the Stable-Manifold Theorem
Authors:
Brian Swenson,
Ryan Murray,
H. Vincent Poor,
Soummya Kar
Abstract:
The paper studies a distributed gradient descent (DGD) process and considers the problem of showing that in nonconvex optimization problems, DGD typically converges to local minima rather than saddle points. The paper considers unconstrained minimization of a smooth objective function. In centralized settings, the problem of demonstrating nonconvergence to saddle points of gradient descent (and variants) is typically handled by way of the stable-manifold theorem from classical dynamical systems theory. However, the classical stable-manifold theorem is not applicable in distributed settings. The paper develops an appropriate stable-manifold theorem for DGD showing that convergence to saddle points may only occur from a low-dimensional stable manifold. Under appropriate assumptions (e.g., coercivity), this result implies that DGD typically converges to local minima and not to saddle points.
Submitted 23 October, 2019; v1 submitted 7 August, 2019;
originally announced August 2019.
-
An introduction to decentralized stochastic optimization with gradient tracking
Authors:
Ran Xin,
Soummya Kar,
Usman A. Khan
Abstract:
Decentralized solutions to finite-sum minimization are of significant importance in many signal processing, control, and machine learning applications. In such settings, the data is distributed over a network of arbitrarily-connected nodes and raw data sharing is often prohibitive due to communication or privacy constraints. In this article, we review decentralized stochastic first-order optimization methods and illustrate some recent improvements based on gradient tracking and variance reduction, focusing particularly on smooth and strongly-convex objective functions. We provide intuitive illustrations of the main technical ideas as well as applications of the algorithms in the context of decentralized training of machine learning models.
Submitted 12 November, 2019; v1 submitted 22 July, 2019;
originally announced July 2019.
-
Distributed Global Optimization by Annealing
Authors:
Brian Swenson,
Soummya Kar,
H. Vincent Poor,
José M. F. Moura
Abstract:
The paper considers a distributed algorithm for global minimization of a nonconvex function. The algorithm is a first-order consensus + innovations type algorithm that incorporates decaying additive Gaussian noise for annealing, converging to the set of global minima under certain technical assumptions. The paper presents simple methods for verifying that the required technical assumptions hold and illustrates them with a distributed target-localization application.
Submitted 20 July, 2019;
originally announced July 2019.
-
MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling
Authors:
Jianyu Wang,
Anit Kumar Sahu,
Zhouyi Yang,
Gauri Joshi,
Soummya Kar
Abstract:
This paper studies the problem of error-runtime trade-off, typically encountered in decentralized training based on stochastic gradient descent (SGD) using a given network. While a denser (sparser) network topology results in faster (slower) error convergence in terms of iterations, it incurs more (less) communication time/delay per iteration. In this paper, we propose MATCHA, an algorithm that can achieve a win-win in this error-runtime trade-off for any arbitrary network topology. The main idea of MATCHA is to parallelize inter-node communication by decomposing the topology into matchings. To preserve fast error convergence speed, it identifies and communicates more frequently over critical links, and saves communication time by using other links less frequently. Experiments on a suite of datasets and deep neural networks validate the theoretical analyses and demonstrate that MATCHA takes up to $5\times$ less time than vanilla decentralized SGD to reach the same training loss.
Submitted 18 November, 2019; v1 submitted 22 May, 2019;
originally announced May 2019.
-
Resilient Distributed Field Estimation
Authors:
Yuan Chen,
Soummya Kar,
José M. F. Moura
Abstract:
We study resilient distributed field estimation under measurement attacks. A network of agents or devices measures a large, spatially distributed physical field parameter. An adversary arbitrarily manipulates the measurements of some of the agents. Each agent's goal is to process its measurements and information received from its neighbors to estimate only a few specific components of the field. We present $\mathbf{SAFE}$, the Saturating Adaptive Field Estimator, a consensus+innovations distributed field estimator that is resilient to measurement attacks. Under sufficient conditions on the compromised measurement streams, the physical coupling between the field and the agents' measurements, and the connectivity of the cyber communication network, $\mathbf{SAFE}$ guarantees that each agent's estimate converges almost surely to the true value of the components of the parameter in which the agent is interested. Finally, we illustrate the performance of $\mathbf{SAFE}$ through numerical examples.
Submitted 26 March, 2020; v1 submitted 18 April, 2019;
originally announced April 2019.
-
Power System Dispatch with Marginal Degradation Cost of Battery Storage
Authors:
Guannan He,
Soummya Kar,
Javad Mohammadi,
Panayiotis Moutis,
Jay F. Whitacre
Abstract:
Battery storage is essential for the future smart grid. The inevitable cell degradation renders the battery lifetime volatile and highly dependent on battery dispatch, and thus incurs an opportunity cost. This paper rigorously derives the marginal degradation cost of batteries for power system dispatch. The derived optimal marginal degradation cost is time-variant to reflect the time value of money and the functionality fade of the battery, and takes the form of a constant value divided by a discount factor plus a term related to battery state of health. In case studies, we demonstrate the evolution of the optimal marginal costs of degradation that correspond to the optimal long-term dispatch outcome. We also show that the optimal marginal cost of degradation depends on the marginal cost of generation in the grid.
Submitted 9 June, 2020; v1 submitted 16 April, 2019;
originally announced April 2019.
-
Annealing for Distributed Global Optimization
Authors:
Brian Swenson,
Soummya Kar,
H. Vincent Poor,
José M. F. Moura
Abstract:
The paper proves convergence to global optima for a class of distributed algorithms for nonconvex optimization in network-based multi-agent settings. Agents are permitted to communicate over a time-varying undirected graph. Each agent is assumed to possess a local objective function (assumed to be smooth, but possibly nonconvex). The paper considers algorithms for optimizing the sum function. A distributed algorithm of the consensus+innovations type is proposed which relies on first-order information at the agent level. Under appropriate conditions on network connectivity and the cost objective, convergence to the set of global optima is achieved by an annealing-type approach, with decaying Gaussian noise independently added into each agent's update step. It is shown that the proposed algorithm converges in probability to the set of global minima of the sum function.
Submitted 18 March, 2019;
originally announced March 2019.
-
Clustering with Distributed Data
Authors:
Soummya Kar,
Brian Swenson
Abstract:
We consider $K$-means clustering in networked environments (e.g., internet of things (IoT) and sensor networks) where data is inherently distributed across nodes and processing power at each node may be limited. We consider a clustering algorithm referred to as networked $K$-means, or $NK$-means, which relies only on local neighborhood information exchange. Information exchange is limited to low-dimensional statistics and not raw data at the agents. The proposed approach develops a parametric family of multi-agent clustering objectives (parameterized by $ρ$) and associated distributed $NK$-means algorithms (also parameterized by $ρ$). The $NK$-means algorithm with parameter $ρ$ converges to a set of fixed points relative to the associated multi-agent objective (designated as `generalized minima'). By appropriate choice of $ρ$, the set of generalized minima may be brought arbitrarily close to the set of Lloyd's minima. Thus, the $NK$-means algorithm may be used to compute Lloyd's minima of the collective dataset up to arbitrary accuracy.
Submitted 1 January, 2019;
originally announced January 2019.
-
Resilient Distributed Parameter Estimation with Heterogeneous Data
Authors:
Yuan Chen,
Soummya Kar,
José M. F. Moura
Abstract:
This paper studies resilient distributed estimation under measurement attacks. A set of agents each makes successive local, linear, noisy measurements of an unknown vector field collected in a vector parameter. The local measurement models are heterogeneous across agents and may be locally unobservable for the unknown parameter. An adversary compromises some of the measurement streams and changes their values arbitrarily. The agents' goal is to cooperate over a peer-to-peer communication network to process their (possibly compromised) local measurements and estimate the value of the unknown vector parameter. We present SAGE, the Saturating Adaptive Gain Estimator, a distributed, recursive, consensus+innovations estimator that is resilient to measurement attacks. We demonstrate that, as long as the number of compromised measurement streams is below a particular bound, SAGE guarantees that all of the agents' local estimates converge almost surely to the value of the parameter. The resilience of the estimator -- i.e., the number of compromised measurement streams it can tolerate -- does not depend on the topology of the inter-agent communication network. Finally, we illustrate the performance of SAGE through numerical examples.
Submitted 30 May, 2019; v1 submitted 20 December, 2018;
originally announced December 2018.
-
Spatiotemporal Arbitrage of Large-Scale Portable Energy Storage for Grid Congestion Relief
Authors:
Guannan He,
Da Zhang,
Xidong Pi,
Qixin Chen,
Soummya Kar,
Jay Whitacre
Abstract:
Energy storage has great potential in grid congestion relief. By making large-scale energy storage portable through trucking, its capability to address grid congestion can be greatly enhanced. This paper explores a business model of large-scale portable energy storage for spatiotemporal arbitrage over nodes with congestion. We propose a spatiotemporal arbitrage model to determine the optimal operation and transportation schedules of portable storage. To validate the business model, we simulate the schedules of a Tesla Semi full of Tesla Powerpacks doing arbitrage over two nodes in California with local transmission congestion. The results indicate that the contributions of portable storage to congestion relief are much greater than those of stationary storage, and that trucking storage can bring net profit in energy arbitrage applications.
Submitted 24 November, 2018;
originally announced November 2018.
-
Towards Gradient Free and Projection Free Stochastic Optimization
Authors:
Anit Kumar Sahu,
Manzil Zaheer,
Soummya Kar
Abstract:
This paper focuses on the problem of \emph{constrained} \emph{stochastic} optimization. A zeroth order Frank-Wolfe algorithm is proposed, which, in addition to retaining the projection-free nature of the vanilla Frank-Wolfe algorithm, is gradient free. Under convexity and smoothness assumptions, we show that the proposed algorithm converges to the optimal objective function value at a rate $O\left(1/T^{1/3}\right)$, where $T$ denotes the iteration count. In particular, the primal sub-optimality gap is shown to have a dimension dependence of $O\left(d^{1/3}\right)$, which is the best known dimension dependence among all zeroth order optimization algorithms with one directional derivative per iteration. For non-convex functions, we obtain the \emph{Frank-Wolfe} gap to be $O\left(d^{1/3}T^{-1/4}\right)$. Experiments on black-box optimization setups demonstrate the efficacy of the proposed algorithm.
Submitted 18 February, 2019; v1 submitted 7 October, 2018;
originally announced October 2018.
-
Communication-Efficient Distributed Strongly Convex Stochastic Optimization: Non-Asymptotic Rates
Authors:
Anit Kumar Sahu,
Dusan Jakovetic,
Dragana Bajovic,
Soummya Kar
Abstract:
We examine fundamental tradeoffs in iterative distributed zeroth and first order stochastic optimization in multi-agent networks in terms of \emph{communication cost} (number of per-node transmissions) and \emph{computational cost}, measured by the number of per-node noisy function (respectively, gradient) evaluations with zeroth order (respectively, first order) methods. Specifically, we develop novel distributed stochastic optimization methods for zeroth and first order strongly convex optimization by utilizing a probabilistic inter-agent communication protocol that increasingly sparsifies communications among agents as time progresses. Under standard assumptions on the cost functions and the noise statistics, we establish with the proposed method the $O(1/(C_{\mathrm{comm}})^{4/3-ζ})$ and $O(1/(C_{\mathrm{comm}})^{8/9-ζ})$ mean square error convergence rates, for the first and zeroth order optimization, respectively, where $C_{\mathrm{comm}}$ is the expected number of network communications and $ζ>0$ is arbitrarily small. The methods are shown to achieve order-optimal convergence rates in terms of computational cost~$C_{\mathrm{comp}}$, $O(1/C_{\mathrm{comp}})$ (first order optimization) and $O(1/(C_{\mathrm{comp}})^{2/3})$ (zeroth order optimization), while achieving the order-optimal convergence rates in terms of iterations. Experiments on real-life datasets illustrate the efficacy of the proposed algorithms.
Submitted 9 September, 2018;
originally announced September 2018.
-
Fully Distributed Cooperative Charging for Plug-in Electric Vehicles in Constrained Power Networks
Authors:
M. Hadi Amini,
Javad Mohammadi,
Soummya Kar
Abstract:
Plug-in Electric Vehicles (PEVs) play a pivotal role in transportation electrification. The flexible nature of PEVs' charging demand can be utilized for reducing charging cost as well as optimizing the operating cost of power and transportation networks. Utilizing the charging flexibilities of geographically spread PEVs requires the design and implementation of efficient optimization algorithms. To this end, we propose a fully distributed algorithm to solve the PEVs' Cooperative Charging with Power constraints (PEV-CCP) problem. Our solution considers the electric power limits that originate from physical characteristics of the charging station, such as the on-site transformer capacity limit, and allows for containing the charging burden of PEVs on the electric distribution network. Our approach is also motivated by the increasing load demand at the distribution level due to additional PEV charging demand. Our proposed approach distributes computation among agents (PEVs) to solve the PEV-CCP problem in a distributed fashion through an iterative interaction between neighboring agents. The structure of each agent's update functions ensures an agreement on a price signal while enforcing individual PEV constraints. In addition to converging towards the globally optimal solution, our algorithm ensures the feasibility of each PEV's decision at each iteration. We have tested the performance of the proposed approach using a fleet of PEVs.
Submitted 28 June, 2018;
originally announced June 2018.