-
Nearly Consistent Finite Particle Estimates in Streaming Importance Sampling
Authors:
Alec Koppel,
Amrit Singh Bedi,
Brian M. Sadler,
Victor Elvira
Abstract:
In Bayesian inference, we seek to compute information about random variables such as moments or quantiles on the basis of available data and prior information. When the distribution of random variables is intractable, Monte Carlo (MC) sampling is usually required. Importance sampling is a standard MC tool that approximates this unavailable distribution with a set of weighted samples. This procedure is asymptotically consistent as the number of MC samples (particles) goes to infinity. However, retaining infinitely many particles is intractable. Thus, we propose a way to keep only a finite representative subset of particles and their augmented importance weights that is nearly consistent. To do so in an online manner, we (1) embed the posterior density estimate in a reproducing kernel Hilbert space (RKHS) through its kernel mean embedding; and (2) sequentially project this RKHS element onto a lower-dimensional subspace of the RKHS using the maximum mean discrepancy, an integral probability metric. Theoretically, we establish that this scheme results in a bias determined by a compression parameter, which yields a tunable tradeoff between consistency and memory. In experiments, we observe that the compressed estimates achieve performance comparable to the dense ones with substantial reductions in representational complexity.
Submitted 5 April, 2021; v1 submitted 23 September, 2019;
originally announced September 2019.
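The abstract above measures compression error with the maximum mean discrepancy (MMD) between kernel mean embeddings. A minimal sketch of that distance for weighted one-dimensional particles follows. The Gaussian kernel, the bandwidth, the function names, and the example numbers are all illustrative assumptions, not the authors' implementation, and the sequential projection step itself is not shown.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on the real line; the kernel choice is an assumption."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def weighted_gram_sum(a, wa, b, wb, bandwidth=1.0):
    """RKHS inner product of two kernel mean embeddings:
    sum_i sum_j wa_i * wb_j * k(a_i, b_j)."""
    return sum(u * v * gaussian_kernel(p, q, bandwidth)
               for p, u in zip(a, wa)
               for q, v in zip(b, wb))


def mmd_squared(xs, wx, ys, wy, bandwidth=1.0):
    """Squared MMD between two weighted particle sets, i.e. the squared RKHS
    distance between their kernel mean embeddings."""
    return (weighted_gram_sum(xs, wx, xs, wx, bandwidth)
            - 2.0 * weighted_gram_sum(xs, wx, ys, wy, bandwidth)
            + weighted_gram_sum(ys, wy, ys, wy, bandwidth))


# Hypothetical example: drop the lightest particle, renormalize the remaining
# weights, and measure how far the compressed estimate moved.
particles = [0.0, 0.1, 2.0]
weights = [0.5, 0.45, 0.05]
kept = particles[:2]
kept_weights = [w / sum(weights[:2]) for w in weights[:2]]
compression_error = mmd_squared(particles, weights, kept, kept_weights)
```

In a scheme of this kind, a particle would be discarded only while the resulting error stays below a chosen compression budget, which is one way to read the tradeoff between consistency and memory described in the abstract.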
-
Nonstationary Nonparametric Online Learning: Balancing Dynamic Regret and Model Parsimony
Authors:
Amrit Singh Bedi,
Alec Koppel,
Ketan Rajawat,
Brian M. Sadler
Abstract:
An open challenge in supervised learning is conceptual drift: a data point begins as classified according to one label, but over time the notion of that label changes. Beyond linear autoregressive models, transfer and meta learning address drift, but require data that is representative of disparate domains at the outset of training. To relax this requirement, we propose a memory-efficient online universal function approximator based on compressed kernel methods. Our approach hinges upon viewing non-stationary learning as online convex optimization with dynamic comparators, for which performance is quantified by dynamic regret.
Prior works control dynamic regret growth only for linear models. In contrast, we hypothesize actions belong to reproducing kernel Hilbert spaces (RKHS). We propose a functional variant of online gradient descent (OGD) operating in tandem with greedy subspace projections. Projections are necessary to surmount the fact that RKHS functions have complexity proportional to time.
For this scheme, we establish sublinear dynamic regret growth in terms of both loss variation and functional path length, and that the memory of the function sequence remains moderate. Experiments demonstrate the usefulness of the proposed technique for online nonlinear regression and classification problems with non-stationary data.
Submitted 11 September, 2019;
originally announced September 2019.
-
GADMM: Fast and Communication Efficient Framework for Distributed Machine Learning
Authors:
Anis Elgabli,
Jihong Park,
Amrit S. Bedi,
Mehdi Bennis,
Vaneet Aggarwal
Abstract:
When the data is distributed across multiple servers, lowering the communication cost between the servers (or workers) while solving the distributed learning problem is an important problem and is the focus of this paper. In particular, we propose a fast and communication-efficient decentralized framework to solve the distributed machine learning (DML) problem. The proposed algorithm, Group Alternating Direction Method of Multipliers (GADMM), is based on the Alternating Direction Method of Multipliers (ADMM) framework. The key novelty in GADMM is that it solves the problem in a decentralized topology where at most half of the workers are competing for the limited communication resources at any given time. Moreover, each worker exchanges the locally trained model only with two neighboring workers, thereby training a global model with a lower amount of communication overhead in each exchange. We prove that GADMM converges to the optimal solution for convex loss functions, and numerically show that it converges faster and is more communication-efficient than state-of-the-art communication-efficient algorithms such as the Lazily Aggregated Gradient (LAG) and dual averaging, in linear and logistic regression tasks on synthetic and real datasets. Furthermore, we propose Dynamic GADMM (D-GADMM), a variant of GADMM, and prove its convergence under a time-varying network topology of the workers.
Submitted 24 March, 2020; v1 submitted 30 August, 2019;
originally announced September 2019.
-
Adaptive Kernel Learning in Heterogeneous Networks
Authors:
Hrusikesha Pradhan,
Amrit Singh Bedi,
Alec Koppel,
Ketan Rajawat
Abstract:
We consider learning in decentralized heterogeneous networks: agents seek to minimize a convex functional that aggregates data across the network, while only having access to their local data streams. We focus on the case where agents seek to estimate a regression function that belongs to a reproducing kernel Hilbert space (RKHS). To incentivize coordination while respecting network heterogeneity, we impose nonlinear proximity constraints. To solve the constrained stochastic program, we propose applying a functional variant of the stochastic primal-dual (Arrow-Hurwicz) method, which yields a decentralized algorithm. To handle the fact that agents' functions have complexity proportional to time (owing to the RKHS parameterization), we project the primal iterates onto subspaces greedily constructed from kernel evaluations of agents' local observations. The resulting scheme, dubbed Heterogeneous Adaptive Learning with Kernels (HALK), when used with constant step-sizes, yields $\mathcal{O}(\sqrt{T})$ attenuation in sub-optimality and exactly satisfies the constraints in the long run, which improves upon the state-of-the-art rates for vector-valued problems.
Submitted 1 June, 2021; v1 submitted 1 August, 2019;
originally announced August 2019.
-
Online Learning over Dynamic Graphs via Distributed Proximal Gradient Algorithm
Authors:
Rishabh Dixit,
Amrit Singh Bedi,
Ketan Rajawat
Abstract:
We consider the problem of tracking the minimum of a time-varying convex optimization problem over a dynamic graph. Motivated by target tracking and parameter estimation problems in intermittently connected robotic and sensor networks, the goal is to design a distributed algorithm capable of handling non-differentiable regularization penalties. The proposed proximal online gradient descent algorithm is built to run in a fully decentralized manner and utilizes consensus updates over possibly disconnected graphs. The performance of the proposed algorithm is analyzed by developing bounds on its dynamic regret in terms of the cumulative path length of the time-varying optimum. It is shown that as compared to the centralized case, the dynamic regret incurred by the proposed algorithm over $T$ time slots is worse by a factor of $\log(T)$ only, despite the disconnected and time-varying network topology. The empirical performance of the proposed algorithm is tested on the distributed dynamic sparse recovery problem, where it is shown to incur a dynamic regret that is close to that of the centralized algorithm.
Submitted 16 May, 2019;
originally announced May 2019.
-
Escaping Saddle Points with the Successive Convex Approximation Algorithm
Authors:
Amrit Singh Bedi,
Ketan Rajawat,
Vaneet Aggarwal
Abstract:
Optimizing non-convex functions is of primary importance in the vast majority of machine learning algorithms. Even though many gradient descent based algorithms have been studied, successive convex approximation based algorithms have been recently empirically shown to converge faster. However, such successive convex approximation based algorithms can get stuck in a first-order stationary point. To avoid that, we propose an algorithm that perturbs the optimization variable slightly at the appropriate iteration. In addition to achieving the same convergence rate results as the non-perturbed version, we show that the proposed algorithm converges to a second order stationary point. Thus, the proposed algorithm escapes the saddle point efficiently and does not get stuck at the first order saddle points.
Submitted 5 March, 2019;
originally announced March 2019.
-
Nonparametric Compositional Stochastic Optimization for Risk-Sensitive Kernel Learning
Authors:
Amrit Singh Bedi,
Alec Koppel,
Ketan Rajawat,
Panchajanya Sanyal
Abstract:
In this work, we address optimization problems where the objective function is a nonlinear function of an expected value, i.e., compositional stochastic strongly convex programs. We consider the case where the decision variable is not vector-valued but instead belongs to a reproducing kernel Hilbert space (RKHS), motivated by risk-aware formulations of supervised learning and Markov Decision Processes defined over continuous spaces.
We develop the first memory-efficient stochastic algorithm for this setting, which we call Compositional Online Learning with Kernels (COLK). COLK, at its core a two-time-scale stochastic approximation method, addresses the fact that (i) compositions of expected value problems cannot be addressed by classical stochastic gradient due to the presence of the inner expectation; and (ii) the RKHS-induced parameterization has complexity proportional to the iteration index, which is mitigated through greedily constructed subspace projections. We establish almost sure convergence of COLK with attenuating step-sizes, and linear convergence in mean to a neighborhood with constant step-sizes, as well as the fact that its complexity is at worst finite. The experiments with robust formulations of supervised learning demonstrate that COLK reliably converges, attains consistent performance across training runs, and thus overcomes overfitting.
Submitted 26 November, 2020; v1 submitted 15 February, 2019;
originally announced February 2019.
-
On Socially Optimal Traffic Flow in the Presence of Random Users
Authors:
Anant Chopra,
Deepak S. Kalhan,
Amrit S. Bedi,
Abhishek K. Gupta,
Ketan Rajawat
Abstract:
Traffic assignment is an integral part of urban planning. Roads and freeways are constructed to cater to the expected demands of the commuters between different origin-destination pairs with the overall objective of minimising the travel cost. Compared to static traffic assignment problems, where the traffic network is fixed over time, a dynamic traffic network is more realistic: the network's cost parameters change over time due to the presence of random congestion. In this paper, we consider a stochastic version of the traffic assignment problem where the central planner is interested in finding an optimal social flow in the presence of random users, who cannot be controlled by any central directives. We propose a stochastic algorithm based on the Frank-Wolfe method to determine the socially optimal flow for the stochastic setting in an online manner. Further, simulation results corroborate the efficacy of the proposed algorithm.
Submitted 18 October, 2018;
originally announced October 2018.
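The abstract above builds on the Frank-Wolfe method for computing a socially optimal flow. The following is a minimal deterministic sketch of the classical Frank-Wolfe iteration on a toy network, not the stochastic online algorithm of the paper. The parallel-route network, the linear latency model (slopes and offsets), the step-size rule 2/(k+2), and all names are assumptions made purely for illustration.

```python
def social_cost_gradient(flow, slopes, offsets):
    """Gradient of the social cost sum_i flow_i * (slopes_i * flow_i + offsets_i)."""
    return [2.0 * a * x + b for x, a, b in zip(flow, slopes, offsets)]


def frank_wolfe_flow(slopes, offsets, demand=1.0, iterations=1000):
    """Classical Frank-Wolfe over the set of flows that split `demand`
    across parallel routes (a scaled simplex)."""
    n = len(slopes)
    flow = [demand] + [0.0] * (n - 1)  # start with all demand on the first route
    for k in range(iterations):
        grad = social_cost_gradient(flow, slopes, offsets)
        # Linear minimization step: send all demand down the route
        # with the smallest marginal social cost.
        best = min(range(n), key=lambda i: grad[i])
        gamma = 2.0 / (k + 2.0)  # standard diminishing step size
        # Convex combination keeps the iterate feasible without a projection.
        flow = [(1.0 - gamma) * x + (gamma * demand if i == best else 0.0)
                for i, x in enumerate(flow)]
    return flow


# Hypothetical two-route example with identical latencies: the socially
# optimal split is even, and the iterates approach it.
example_flow = frank_wolfe_flow(slopes=[1.0, 1.0], offsets=[0.0, 0.0])
```

The appeal of this family of methods for flow problems is that each iteration only needs a cheapest-route computation rather than a projection onto the feasible set.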
-
Online Learning with Inexact Proximal Online Gradient Descent Algorithms
Authors:
Rishabh Dixit,
Amrit Singh Bedi,
Ruchi Tripathi,
Ketan Rajawat
Abstract:
We consider non-differentiable dynamic optimization problems such as those arising in robotics and subspace tracking. Given the computational constraints and the time-varying nature of the problem, a low-complexity algorithm is desirable, while the accuracy of the solution may only increase slowly over time. We put forth the proximal online gradient descent (OGD) algorithm for tracking the optimum of a composite objective function comprising a differentiable loss function and a non-differentiable regularizer. An online learning framework is considered and the gradient of the loss function is allowed to be erroneous. Both the gradient error and the dynamics of the function optimum (the target) are adversarial, and the performance of the inexact proximal OGD is characterized in terms of its dynamic regret, expressed in terms of the cumulative error and path length of the target. The proposed inexact proximal OGD is generalized for application to large-scale problems where the loss function has a finite-sum structure. In such cases, evaluation of the full gradient may not be viable and a variance-reduced version is proposed that allows the component functions to be sub-sampled. The efficacy of the proposed algorithms is tested on the problem of formation control in robotics and on the dynamic foreground-background separation problem in video.
Submitted 1 June, 2018;
originally announced June 2018.
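The proximal OGD update described in the abstract above takes a gradient step on the differentiable loss and then applies the proximal operator of the non-differentiable regularizer. A minimal sketch for the common special case of an l1 regularizer follows, where the proximal operator is soft-thresholding. The choice of regularizer, the parameter names, and the example numbers are assumptions for illustration; the abstract does not fix them.

```python
def soft_threshold(value, tau):
    """Proximal operator of tau * |.|: shrink toward zero by tau."""
    if value > tau:
        return value - tau
    if value < -tau:
        return value + tau
    return 0.0


def proximal_ogd_step(x, grad, step, lam):
    """One proximal online gradient step for loss + lam * ||x||_1.

    `grad` is the (possibly inexact) gradient of the differentiable loss
    evaluated at x; any error in it is simply carried into the update.
    """
    return [soft_threshold(xi - step * gi, step * lam)
            for xi, gi in zip(x, grad)]


# Hypothetical single step: the gradient step gives [3.0, 0.5] and the
# threshold step * lam = 1.0 shrinks this to [2.0, 0.0].
next_x = proximal_ogd_step([0.0, 0.0], [-3.0, -0.5], step=1.0, lam=1.0)
```

The second coordinate is set exactly to zero, which is the sparsity-inducing behavior that a plain gradient step on a smoothed penalty would not provide.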
-
An Online Approach to D2D Trajectory Utility Maximization Problem
Authors:
Amrit S. Bedi,
Ketan Rajawat,
Marceau Coupechoux
Abstract:
This paper considers the problem of designing the user trajectory in a device-to-device communications setting. We consider a pair of pedestrians connected through a D2D link. The pedestrians seek to reach their respective destinations while using the D2D link for data exchange applications such as file transfer, video calling, and online gaming. In order to enable better D2D connectivity, the pedestrians are willing to deviate from their respective shortest paths, at the cost of reaching their destinations slightly late. A generic trajectory optimization problem is formulated and solved for the case when full information about the problem is known in advance.
Motivated by the D2D users' need to keep their destinations private, we also formulate a regularized variant of the problem that can be used to develop a fully online algorithm. The proposed online algorithm is quite efficient, and is shown to achieve a sublinear offline regret while satisfying the required mobility constraints exactly. The theoretical results are backed by detailed numerical tests that establish the efficacy of the proposed algorithms under various settings.
Submitted 13 April, 2018;
originally announced April 2018.
-
Tracking Moving Agents via Inexact Online Gradient Descent Algorithm
Authors:
Amrit Singh Bedi,
Paban Sarma,
Ketan Rajawat
Abstract:
Multi-agent systems are being increasingly deployed in challenging environments for performing complex tasks such as multi-target tracking, search-and-rescue, and intrusion detection. Notwithstanding the computational limitations of individual robots, such systems rely on collaboration to sense and react to the environment. This paper formulates the generic target tracking problem as a time-varying optimization problem and puts forth an inexact online gradient descent method for solving it sequentially. The performance of the proposed algorithm is studied by characterizing its dynamic regret, a notion common to the online learning literature. Building upon the existing results, we provide improved regret rates that not only allow non-strongly convex costs but also explicate the role of the cumulative gradient error. Two distinct classes of problems are considered: one in which the objective function adheres to a quadratic growth condition, and another where the objective function is convex but the variable belongs to a compact domain. For both cases, results are developed while allowing the error to be either adversarial or arising from a white noise process. Further, the generality of the proposed framework is demonstrated by developing online variants of existing stochastic gradient algorithms and interpreting them as special cases of the proposed inexact gradient method. The efficacy of the proposed inexact gradient framework is established on a multi-agent multi-target tracking problem, while its flexibility is exemplified by generating online movie recommendations for the MovieLens 10M dataset.
Submitted 28 November, 2017; v1 submitted 14 October, 2017;
originally announced October 2017.
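The abstract above evaluates inexact online gradient descent through its dynamic regret, which it relates to the cumulative gradient error. A minimal sketch of these quantities on a scalar tracking problem follows. The quadratic cost 0.5 * (x - theta_t)^2, the additive error model, the path-length helper, and the numbers are assumptions chosen so the arithmetic can be checked by hand.

```python
def inexact_ogd_regret(targets, errors, step, x0=0.0):
    """Run inexact OGD on f_t(x) = 0.5 * (x - theta_t)^2 and return the
    dynamic regret sum_t [f_t(x_t) - min_x f_t(x)].

    The per-round minimum is 0 (attained at theta_t), so the regret is
    just the accumulated loss of the iterates.
    """
    x = x0
    regret = 0.0
    for theta, err in zip(targets, errors):
        regret += 0.5 * (x - theta) ** 2
        noisy_grad = (x - theta) + err  # true gradient plus gradient error
        x = x - step * noisy_grad
    return regret


def path_length(targets):
    """Cumulative movement of the per-round optimum, sum_t |theta_{t+1} - theta_t|."""
    return sum(abs(b - a) for a, b in zip(targets, targets[1:]))


# Hypothetical target moving one unit per round.
targets = [1.0, 2.0, 3.0]
exact_regret = inexact_ogd_regret(targets, [0.0, 0.0, 0.0], step=1.0)
noisy_regret = inexact_ogd_regret(targets, [0.5, 0.0, 0.0], step=1.0)
```

With exact gradients the iterate always lands on the previous target, so each round costs 0.5 and the regret is 1.5; a single gradient error of 0.5 in the first round raises it to 2.125.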
-
Asynchronous Decentralized Stochastic Optimization in Heterogeneous Networks
Authors:
Amrit Singh Bedi,
Alec Koppel,
Ketan Rajawat
Abstract:
We consider expected risk minimization in multi-agent systems comprised of distinct subsets of agents operating without a common time-scale. Each individual in the network is charged with minimizing the global objective function, which is an average of the sum of the statistical average loss functions of the agents in the network. Since agents are not assumed to observe data from identical distributions, the hypothesis that all agents seek a common action is violated, and thus the hypothesis upon which consensus constraints are formulated is violated. Thus, we consider nonlinear network proximity constraints which incentivize nearby nodes to make decisions which are close to one another but do not necessarily coincide. Moreover, agents are not assumed to receive their sequentially arriving observations on a common time index, and thus seek to learn in an asynchronous manner. An asynchronous stochastic variant of the Arrow-Hurwicz saddle point method is proposed to solve this problem, which operates by alternating primal stochastic descent steps and Lagrange multiplier updates which penalize the discrepancies between agents. This tool leads to an implementation that allows each agent to operate asynchronously with local information only and message passing with neighbors. Our main result establishes that the proposed method yields convergence in expectation both in terms of the primal sub-optimality and constraint violation to radii of sizes $\mathcal{O}(\sqrt{T})$ and $\mathcal{O}(T^{3/4})$, respectively. Empirical evaluation on an asynchronously operating wireless network that manages user channel interference through an adaptive communications pricing mechanism demonstrates that our theoretical results translate well to practice.
Submitted 9 December, 2017; v1 submitted 18 July, 2017;
originally announced July 2017.
-
Asynchronous Incremental Stochastic Dual Descent Algorithm for Network Resource Allocation
Authors:
Amrit S. Bedi,
Ketan Rajawat
Abstract:
Stochastic network optimization problems entail finding resource allocation policies that are optimal on average but must be designed in an online fashion. Such problems are ubiquitous in communication networks, where resources such as energy and bandwidth are divided among nodes to satisfy certain long-term objectives. This paper proposes an asynchronous incremental dual descent resource allocation algorithm that utilizes delayed stochastic gradients for carrying out its updates. The proposed algorithm is well-suited to heterogeneous networks as it allows the computationally-challenged or energy-starved nodes to, at times, postpone the updates. The asymptotic analysis of the proposed algorithm is carried out, establishing dual convergence under both constant and diminishing step sizes. It is also shown that with constant step size, the proposed resource allocation policy is asymptotically near-optimal. An application involving multi-cell coordinated beamforming is detailed, demonstrating the usefulness of the proposed algorithm.
Submitted 9 December, 2017; v1 submitted 27 February, 2017;
originally announced February 2017.
-
Network Resource Allocation via Stochastic Subgradient Descent: Convergence Rate
Authors:
Amrit Singh Bedi,
Ketan Rajawat
Abstract:
This paper considers a general stochastic resource allocation problem that arises widely in wireless networks, cognitive radio networks, smart-grid communications, and cross-layer design. The problem formulation involves expectations with respect to a collection of random variables with unknown distributions, representing exogenous quantities such as channel gain, user density, or spectrum occupancy. We consider the constant step-size stochastic dual subgradient descent (SDSD) method that has been widely used for online resource allocation in networks. The problem is solved in the dual domain, which results in a primal resource allocation subproblem at each time instant. The goal here is to characterize the non-asymptotic behavior of such stochastic resource allocations in an almost sure sense.
It is well known that with a step size of $ε$, SDSD converges to an $\mathcal{O}(ε)$-sized neighborhood of the optimum. In practice, however, there exists a trade-off between the rate of convergence and the choice of $ε$. This paper establishes a convergence rate result for the SDSD algorithm that precisely characterizes this trade-off. Towards this end, a novel stochastic bound on the gap between the objective function and the optimum is developed. The asymptotic behavior of the stochastic term is characterized in an almost sure sense, thereby generalizing the existing results for stochastic subgradient methods. For the stochastic resource allocation problem at hand, the result explicates the rate with which the allocated resources become near-optimal. As an application, the power and user-allocation problem in device-to-device networks is formulated and solved using the SDSD algorithm. Further intuition on the rate results is obtained from the verification of the regularity conditions and accompanying simulation results.
Submitted 9 December, 2017; v1 submitted 26 February, 2017;
originally announced February 2017.
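The abstract above describes solving the allocation problem in the dual domain, with a primal subproblem at each time instant and a constant step size. A minimal sketch of such a dual subgradient loop follows, for a single link that maximizes the average of log(1 + h_t * p_t) subject to an average power budget. The utility, the water-filling form of the primal subproblem, the power cap `p_max`, the multiplier floor, and all names are assumptions for illustration, not the formulation analyzed in the paper.

```python
def dual_subgradient_power_allocation(gains, budget, step, lam0=1.0, p_max=10.0):
    """Constant step-size dual subgradient iteration for
    maximize avg log(1 + h_t * p_t)  subject to  avg p_t <= budget.

    Returns the final multiplier and the list of allocated powers.
    """
    lam = lam0
    powers = []
    for h in gains:
        # Primal subproblem: maximize log(1 + h*p) - lam*p over 0 <= p <= p_max,
        # whose solution is the water-filling level 1/lam - 1/h, clipped.
        p = min(p_max, max(0.0, 1.0 / lam - 1.0 / h))
        powers.append(p)
        # Dual update: raise the price when the budget is exceeded, lower it
        # otherwise; the small floor keeps 1/lam well defined.
        lam = max(1e-6, lam + step * (p - budget))
    return lam, powers


# Hypothetical constant channel h = 1 with budget 1: the optimal multiplier
# solves 1/lam - 1 = 1, i.e. lam = 0.5.
final_lam, allocated = dual_subgradient_power_allocation(
    gains=[1.0] * 500, budget=1.0, step=0.05)
```

With a constant channel the multiplier settles at the optimal price; with random gains it would instead keep fluctuating around it, and a smaller step size shrinks that fluctuation at the cost of slower convergence, which is the trade-off the abstract refers to.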