Online Learning of Weakly Coupled MDP Policies for Load Balancing and Auto Scaling

Authors: S. R. Eshwar, Lucas Lopes Felipe, Alexandre Reiffers-Masson, Daniel Sadoc Menasché, Gugan Thoppe

Abstract: Load balancing and auto scaling are at the core of scalable, contemporary systems, addressing dynamic resource allocation and service rate adjustments in response to workload changes. This paper introduces a novel model and algorithms for tuning load balancers coupled with auto scalers, considering bursty traffic arriving at finite queues. We begin by presenting the problem as a weakly coupled Markov Decision Process (MDP), solvable via a linear program (LP). However, as the number of control variables of such LP grows combinatorially, we introduce a more tractable relaxed LP formulation, and extend it to tackle the problem of online parameter learning and policy optimization using a two-timescale algorithm based on the LP Lagrangian.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2403.09940 [pdf, ps, other]

Global Convergence Guarantees for Federated Policy Gradient Methods with Adversaries

Authors: Swetha Ganesh, Jiayu Chen, Gugan Thoppe, Vaneet Aggarwal

Abstract: Federated Reinforcement Learning (FRL) allows multiple agents to collaboratively build a decision making policy without sharing raw trajectories. However, if a small fraction of these agents are adversarial, it can lead to catastrophic results. We propose a policy gradient based approach that is robust to adversarial agents which can send arbitrary values to the server. Under this setting, our results form the first global convergence guarantees with general parametrization. These results demonstrate resilience with adversaries, while achieving sample complexity of order $\tilde{\mathcal{O}}\left( \frac{1}{ε^2} \left( \frac{1}{N-f} + \frac{f^2}{(N-f)^2}\right)\right)$, where $N$ is the total number of agents and $f$ is the number of adversarial agents.

Submitted 14 March, 2024; originally announced March 2024.

Comments: 27 pages, 6 figures

arXiv:2310.11389 [pdf, ps, other]

Risk Estimation in a Markov Cost Process: Lower and Upper Bounds

Authors: Gugan Thoppe, L. A. Prashanth, Sanjay Bhat

Abstract: We tackle the problem of estimating risk measures of the infinite-horizon discounted cost within a Markov cost process. The risk measures we study include variance, Value-at-Risk (VaR), and Conditional Value-at-Risk (CVaR). First, we show that estimating any of these risk measures with $ε$-accuracy, either in expected or high-probability sense, requires at least $Ω(1/ε^2)$ samples. Then, using a truncation scheme, we derive an upper bound for the CVaR and variance estimation. This bound matches our lower bound up to logarithmic factors. Finally, we discuss an extension of our estimation scheme that covers more general risk measures satisfying a certain continuity criterion, e.g., spectral risk measures, utility-based shortfall risk. To the best of our knowledge, our work is the first to provide lower and upper bounds for estimating any risk measure beyond the mean within a Markovian setting. Our lower bounds also extend to the infinite-horizon discounted costs' mean. Even in that case, our lower bound of $Ω(1/ε^2)$ improves upon the existing $Ω(1/ε)$ bound [13].

Submitted 11 April, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

arXiv:2304.01525 [pdf, other]

Online Learning with Adversaries: A Differential-Inclusion Analysis

Authors: Swetha Ganesh, Alexandre Reiffers-Masson, Gugan Thoppe

Abstract: We introduce an observation-matrix-based framework for fully asynchronous online Federated Learning (FL) with adversaries. In this work, we demonstrate its effectiveness in estimating the mean of a random vector. Our main result is that the proposed algorithm almost surely converges to the desired mean $μ.$ This makes ours the first asynchronous FL method to have an a.s. convergence guarantee in the presence of adversaries. We derive this convergence using a novel differential-inclusion-based two-timescale analysis. Two other highlights of our proof include (a) the use of a novel Lyapunov function to show that $μ$ is the unique global attractor for our algorithm's limiting dynamics, and (b) the use of martingale and stopping-time theory to show that our algorithm's iterates are almost surely bounded.

Submitted 26 September, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

Comments: 6 pages, 2 figures

arXiv:2301.13236 [pdf, other]

SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search

Authors: Gal Dalal, Assaf Hallak, Gugan Thoppe, Shie Mannor, Gal Chechik

Abstract: Despite the popularity of policy gradient methods, they are known to suffer from large variance and high sample complexity. To mitigate this, we introduce SoftTreeMax -- a generalization of softmax that takes planning into account. In SoftTreeMax, we extend the traditional logits with the multi-step discounted cumulative reward, topped with the logits of future states. We consider two variants of SoftTreeMax, one for cumulative reward and one for exponentiated reward. For both, we analyze the gradient variance and reveal for the first time the role of a tree expansion policy in mitigating this variance. We prove that the resulting variance decays exponentially with the planning horizon as a function of the expansion policy. Specifically, we show that the closer the resulting state transitions are to uniform, the faster the decay. In a practical implementation, we utilize a parallelized GPU-based simulator for fast and efficient tree search. Our differentiable tree-based policy leverages all gradients at the tree leaves in each environment step instead of the traditional single-sample-based gradient. We then show in simulation how the variance of the gradient is reduced by three orders of magnitude, leading to better sample complexity compared to the standard policy gradient. On Atari, SoftTreeMax demonstrates up to 5x better performance in a faster run time compared to distributed PPO. Lastly, we demonstrate that high reward correlates with lower variance.

Submitted 30 January, 2023; originally announced January 2023.

Comments: arXiv admin note: text overlap with arXiv:2209.13966

arXiv:2208.10583 [pdf, other]

Improving Sample Efficiency in Evolutionary RL Using Off-Policy Ranking

Authors: Eshwar S R, Shishir Kolathaya, Gugan Thoppe

Abstract: Evolution Strategy (ES) is a powerful black-box optimization technique based on the idea of natural evolution. In each of its iterations, a key step entails ranking candidate solutions based on some fitness score. For an ES method in Reinforcement Learning (RL), this ranking step requires evaluating multiple policies. This is presently done via on-policy approaches: each policy's score is estimated by interacting several times with the environment using that policy. This leads to a lot of wasteful interactions since, once the ranking is done, only the data associated with the top-ranked policies is used for subsequent learning. To improve sample efficiency, we propose a novel off-policy alternative for ranking, based on a local approximation for the fitness function. We demonstrate our idea in the context of a state-of-the-art ES method called the Augmented Random Search (ARS). Simulations in MuJoCo tasks show that, compared to the original ARS, our off-policy variant has similar running times for reaching reward thresholds but needs only around 70% as much data. It also outperforms the recent Trust Region ES. We believe our ideas should be extendable to other ES methods as well.

Submitted 21 February, 2023; v1 submitted 22 August, 2022; originally announced August 2022.

arXiv:2205.13617 [pdf, other]

Demystifying Approximate Value-based RL with $ε$-greedy Exploration: A Differential Inclusion View

Authors: Aditya Gopalan, Gugan Thoppe

Abstract: Q-learning and SARSA with $ε$-greedy exploration are leading reinforcement learning methods. Their tabular forms converge to the optimal Q-function under reasonable conditions. However, with function approximation, these methods exhibit strange behaviors such as policy oscillation, chattering, and convergence to different attractors (possibly even the worst policy) on different runs, apart from the usual instability. A theory to explain these phenomena has been a long-standing open problem, even for basic linear function approximation (Sutton, 1999). Our work uses differential inclusion to provide the first framework for resolving this problem. We also provide numerical examples to illustrate our framework's prowess in explaining these algorithms' behaviors.

Submitted 10 February, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

Comments: 22 pages, 3 figures

MSC Class: 93E35; 68Q32 ACM Class: I.2.0

arXiv:2110.15547 [pdf, ps, other]

Does Momentum Help? A Sample Complexity Analysis

Authors: Swetha Ganesh, Rohan Deb, Gugan Thoppe, Amarjit Budhiraja

Abstract: Stochastic Heavy Ball (SHB) and Nesterov's Accelerated Stochastic Gradient (ASG) are popular momentum methods in stochastic optimization. While benefits of such acceleration ideas in deterministic settings are well understood, their advantages in stochastic optimization are still unclear. In fact, in some specific instances, it is known that momentum does not help in the sample complexity sense. Our work shows that a similar outcome actually holds for the whole of quadratic optimization. Specifically, we obtain a lower bound on the sample complexity of SHB and ASG for this family and show that the same bound can be achieved by the vanilla SGD. We note that there exist results claiming the superiority of momentum based methods in quadratic optimization, but these are based on one-sided or flawed analyses.

Submitted 11 July, 2022; v1 submitted 29 October, 2021; originally announced October 2021.

arXiv:2110.15092 [pdf, ps, other]

A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning

Authors: Gugan Thoppe, Bhumesh Kumar

Abstract: In Multi-Agent Reinforcement Learning (MARL), multiple agents interact with a common environment, as also with each other, for solving a shared problem in sequential decision-making. It has wide-ranging applications in gaming, robotics, finance, etc. In this work, we derive a novel law of iterated logarithm for a family of distributed nonlinear stochastic approximation schemes that is useful in MARL. In particular, our result describes the convergence rate on almost every sample path where the algorithm converges. This result is the first of its kind in the distributed setup and provides deeper insights than the existing ones, which only discuss convergence rates in the expected or the CLT sense. Importantly, our result holds under significantly weaker assumptions: neither the gossip matrix needs to be doubly stochastic nor the stepsizes square summable. As an application, we show that, for the stepsize $n^{-γ}$ with $γ\in (0, 1),$ the distributed TD(0) algorithm with linear function approximation has a convergence rate of $O(\sqrt{n^{-γ} \ln n })$ a.s.; for the $1/n$ type stepsize, the same is $O(\sqrt{n^{-1} \ln \ln n})$ a.s. These decay rates do not depend on the graph depicting the interactions among the different agents.

Submitted 15 January, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: Some typos corrected; 19 pages

MSC Class: 93E35; 68Q32 ACM Class: I.2.11

arXiv:2012.14122 [pdf, other]

The Shadow knows: Empirical Distributions of Minimum Spanning Acycles and Persistence Diagrams of Random Complexes

Authors: Nicolas Fraiman, Sayan Mukherjee, Gugan Thoppe

Abstract: In 1985, Frieze showed that the expected sum of the edge weights of the minimum spanning tree (MST) in the uniformly weighted graph converges to $ζ(3)$. Recently, Hino and Kanazawa extended this result to a uniformly weighted simplicial complex, where the role of the MST is played by its higher-dimensional analog -- the Minimum Spanning Acycle (MSA). Our work goes beyond and describes the histogram of all the weights in this random MST and random MSA. Specifically, we show that their empirical distributions converge to a measure based on a concept called the shadow. The shadow of a graph is the set of all the missing transitive edges, and, for a simplicial complex, it is a related topological generalization. As a corollary, we obtain a similar claim for the death times in the persistence diagram corresponding to the above-weighted complex, a result of interest in applied topology.

Submitted 29 January, 2024; v1 submitted 28 December, 2020; originally announced December 2020.

Comments: 18 pages, 4 figures

MSC Class: 60C05; 60G57; 05E45

arXiv:2009.08142 [pdf, other]

Online Algorithms for Estimating Change Rates of Web Pages

Authors: Konstantin Avrachenkov, Kishor Patil, Gugan Thoppe

Abstract: A search engine maintains local copies of different web pages to provide quick search results. This local cache is kept up-to-date by a web crawler that frequently visits these different pages to track changes in them. Ideally, the local copy should be updated as soon as a page changes on the web. However, finite bandwidth availability and server restrictions limit how frequently different pages can be crawled. This brings forth the following optimization problem: maximize the freshness of the local cache subject to the crawling frequencies being within prescribed bounds. While tractable algorithms do exist to solve this problem, these either assume the knowledge of exact page change rates or use inefficient methods such as MLE for estimating the same. We address this issue here. We provide three novel schemes for online estimation of page change rates, all of which have extremely low running times per iteration. The first is based on the law of large numbers and the second on stochastic approximation. The third is an extension of the second and includes a heavy-ball momentum term. All these schemes only need partial information about the page change process, i.e., they only need to know if the page has changed or not since the last crawled instance. Our main theoretical results concern asymptotic convergence and convergence rates of these three schemes. In fact, our work is the first to show convergence of the original stochastic heavy-ball method when neither the gradient nor the noise variance is uniformly bounded. We also provide some numerical experiments (based on real and synthetic data) to demonstrate the superiority of our proposed estimators over existing ones such as MLE. We emphasize that our algorithms are also readily applicable to the synchronization of databases and network inventory management.

Submitted 4 November, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

Comments: This is the author version of the paper accepted to International Journal of Performance Evaluation, Elsevier; 25 pages. arXiv admin note: text overlap with arXiv:2004.02167
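
The idea of estimating a page's change rate from binary "changed since last crawl" flags, as described in the abstract above, can be illustrated with a small simulation. The sketch below is not the authors' estimator. It assumes, purely for illustration, that page changes follow a Poisson process with an unknown rate and that the page is crawled at a fixed interval, so that the probability of observing a change between two crawls is 1 - exp(-rate * interval). Inverting the empirical fraction of "changed" flags then gives a simple law-of-large-numbers style estimate. The function names `simulate_change_flags` and `estimate_change_rate` are hypothetical.

```python
import math
import random


def simulate_change_flags(rate, interval, n_crawls, rng):
    """Simulate binary flags: 1 if the page changed at least once between crawls.

    Assumption (not from the abstract): changes form a Poisson process with
    the given rate, and crawls happen every `interval` time units.
    """
    p_change = 1.0 - math.exp(-rate * interval)
    return [1 if rng.random() < p_change else 0 for _ in range(n_crawls)]


def estimate_change_rate(flags, interval):
    """Invert the empirical change fraction to get a rate estimate."""
    frac = sum(flags) / len(flags)
    if frac >= 1.0:
        # Every crawl saw a change; the inverted estimate is unbounded.
        return math.inf
    return -math.log(1.0 - frac) / interval


rng = random.Random(0)
flags = simulate_change_flags(rate=0.5, interval=1.0, n_crawls=20000, rng=rng)
estimate = estimate_change_rate(flags, interval=1.0)
print(f"estimated change rate: {estimate:.3f} (true rate 0.5)")
```

Only the binary flags enter the estimate, which mirrors the partial-information setting the abstract describes. The fixed crawl interval is a simplifying choice made here.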

arXiv:2004.02167 [pdf, other]

Change Rate Estimation and Optimal Freshness in Web Page Crawling

Authors: Konstantin Avrachenkov, Kishor Patil, Gugan Thoppe

Abstract: For providing quick and accurate results, a search engine maintains a local snapshot of the entire web. And, to keep this local cache fresh, it employs a crawler for tracking changes across various web pages. However, finite bandwidth availability and server restrictions impose some constraints on the crawling frequency. Consequently, the ideal crawling rates are the ones that maximise the freshness of the local cache and also respect the above constraints. Azar et al. 2018 recently proposed a tractable algorithm to solve this optimisation problem. However, they assume the knowledge of the exact page change rates, which is unrealistic in practice. We address this issue here. Specifically, we provide two novel schemes for online estimation of page change rates. Both schemes only need partial information about the page change process, i.e., they only need to know if the page has changed or not since the last crawled instance. For both these schemes, we prove convergence and, also, derive their convergence rates. Finally, we provide some numerical experiments to compare the performance of our proposed estimators with the existing ones (e.g., MLE).

Submitted 5 April, 2020; originally announced April 2020.

Comments: This paper has been accepted to the 13th EAI International Conference on Performance Evaluation Methodologies and Tools, VALUETOOLS'20, May 18--20, 2020, Tsukuba, Japan. This is the author version of the paper

arXiv:2001.06860 [pdf, other]

Limit theorems for topological invariants of the dynamic multi-parameter simplicial complex

Authors: Takashi Owada, Gennady Samorodnitsky, Gugan Thoppe

Abstract: Topological study of existing random simplicial complexes is non-trivial and has led to several seminal works. However, the applicability of such studies is limited since the randomness there is usually governed by a single parameter. With this in mind, we focus here on the topology of the recently proposed multi-parameter random simplicial complex and, more importantly, of its dynamic analogue that we introduce here. In this dynamic setup, the temporal evolution of simplices is determined by stationary and possibly non-Markovian processes with a renewal structure. The dynamic versions of the clique complex and the Linial-Meshulam complex are special cases of our setup. Our key result concerns the regime where face-counts of a particular dimension dominate. We show that the Betti numbers corresponding to this dimension and the Euler characteristic satisfy functional strong law of large numbers and functional central limit theorems. Surprisingly, in the latter result, the limiting Gaussian process depends only upon the dynamics in the smallest non-trivial dimension.

Submitted 4 February, 2021; v1 submitted 19 January, 2020; originally announced January 2020.

Comments: 42 pages, 1 figure

MSC Class: 60F17; 55U05; 60C05; 60F15

arXiv:1911.09157 [pdf, ps, other]

A Tale of Two-Timescale Reinforcement Learning with the Tightest Finite-Time Bound

Authors: Gal Dalal, Balazs Szorenyi, Gugan Thoppe

Abstract: Policy evaluation in reinforcement learning is often conducted using two-timescale stochastic approximation, which results in various gradient temporal difference methods such as GTD(0), GTD2, and TDC. Here, we provide convergence rate bounds for this suite of algorithms. Algorithms such as these have two iterates, $θ_n$ and $w_n,$ which are updated using two distinct stepsize sequences, $α_n$ and $β_n,$ respectively. Assuming $α_n = n^{-α}$ and $β_n = n^{-β}$ with $1 > α> β> 0,$ we show that, with high probability, the two iterates converge to their respective solutions $θ^*$ and $w^*$ at rates given by $\|θ_n - θ^*\| = \tilde{O}( n^{-α/2})$ and $\|w_n - w^*\| = \tilde{O}(n^{-β/2});$ here, $\tilde{O}$ hides logarithmic terms. Via comparable lower bounds, we show that these bounds are, in fact, tight. To the best of our knowledge, ours is the first finite-time analysis which achieves these rates. While it was known that the two timescale components decouple asymptotically, our results depict this phenomenon more explicitly by showing that it in fact happens from some finite time onwards. Lastly, compared to existing works, our result applies to a broader family of stepsizes, including non-square summable ones.

Submitted 4 December, 2019; v1 submitted 20 November, 2019; originally announced November 2019.
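
The stepsize structure in the abstract above, with two iterates updated using $α_n = n^{-α}$ and $β_n = n^{-β}$ for $1 > α > β > 0$, can be sketched on a toy problem. The recursion below is not GTD(0), GTD2, or TDC. It is a hypothetical scalar linear system chosen only so that the fixed point is easy to check by hand: the fast iterate `w` tracks the slow iterate `theta`, and the slow iterate is driven towards the point where `theta + w = 2`, so both should approach 1. The drift terms, noise level, and default exponents are all assumptions made for illustration.

```python
import random


def two_timescale_sa(n_steps, alpha=0.9, beta=0.6, noise_std=0.1, seed=0):
    """Run a toy linear two-timescale recursion and return the final (theta, w).

    Requires 1 > alpha > beta > 0, so alpha_n decays faster than beta_n and
    theta is the slow iterate while w is the fast one.
    """
    rng = random.Random(seed)
    theta, w = 0.0, 0.0
    for n in range(1, n_steps + 1):
        alpha_n = n ** (-alpha)  # slow stepsize
        beta_n = n ** (-beta)    # fast stepsize
        # Both updates use the iterates from the previous step.
        theta_next = theta + alpha_n * (2.0 - theta - w + rng.gauss(0.0, noise_std))
        w_next = w + beta_n * (theta - w + rng.gauss(0.0, noise_std))
        theta, w = theta_next, w_next
    return theta, w


theta_final, w_final = two_timescale_sa(100_000)
print(f"theta = {theta_final:.4f}, w = {w_final:.4f} (both should be near 1)")
```

In this toy run the fast iterate typically ends up noisier than the slow one, which is in the spirit of the $n^{-β/2}$ versus $n^{-α/2}$ rates quoted in the abstract, though a single simulation is of course no substitute for the bounds themselves.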

arXiv:1807.11018 [pdf, other]

Betti Numbers of Gaussian Excursions in the Sparse Regime

Authors: Gugan Thoppe, Sunder Ram Krishnan

Abstract: Random field excursions is an increasingly vital topic within data analysis in medicine, cosmology, materials science, etc. This work is the first detailed study of their Betti numbers in the so-called `sparse' regime. Specifically, we consider a piecewise constant Gaussian field whose covariance function is positive and satisfies some local, boundedness, and decay rate conditions. We model its excursion set via a Cech complex. For Betti numbers of this complex, we then prove various limit theorems as the window size and the excursion level together grow to infinity. Our results include asymptotic mean and variance estimates, a vanishing to non-vanishing phase transition with a precise estimate of the transition threshold, and a weak law in the non-vanishing regime. We further obtain a Poisson approximation and a central limit theorem close to the transition threshold. Our proofs combine extreme value theory and combinatorial topology tools.

Submitted 23 August, 2018; v1 submitted 29 July, 2018; originally announced July 2018.

Comments: 66 pages, 4 figures

MSC Class: 60G15; 60F05; 05E45 (Primary) 60G60; 60G70; 60G10; 55U10 (Secondary)

arXiv:1704.01161 [pdf, other]

Finite Sample Analyses for TD(0) with Function Approximation

Authors: Gal Dalal, Balázs Szörényi, Gugan Thoppe, Shie Mannor

Abstract: TD(0) is one of the most commonly used algorithms in reinforcement learning. Despite this, there is no existing finite sample analysis for TD(0) with function approximation, even for the linear case. Our work is the first to provide such results. Existing convergence rates for Temporal Difference (TD) methods apply only to somewhat modified versions, e.g., projected variants or ones where stepsizes depend on unknown problem parameters. Our analyses obviate these artificial alterations by exploiting strong properties of TD(0). We provide convergence rates both in expectation and with high-probability. The two are obtained via different approaches that use relatively unknown, recently developed stochastic approximation techniques.

Submitted 11 December, 2017; v1 submitted 4 April, 2017; originally announced April 2017.

arXiv:1703.05376 [pdf, other]

Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning

Authors: Gal Dalal, Balazs Szorenyi, Gugan Thoppe, Shie Mannor

Abstract: Two-timescale Stochastic Approximation (SA) algorithms are widely used in Reinforcement Learning (RL). Their iterates have two parts that are updated using distinct stepsizes. In this work, we develop a novel recipe for their finite sample analysis. Using this, we provide a concentration bound, which is the first such result for a two-timescale SA. The type of bound we obtain is known as `lock-in probability'. We also introduce a new projection scheme, in which the time between successive projections increases exponentially. This scheme allows one to elegantly transform a lock-in probability into a convergence rate result for projected two-timescale SA. From this latter result, we then extract key insights on stepsize selection. As an application, we finally obtain convergence rates for the projected two-timescale RL algorithms GTD(0), GTD2, and TDC.

Submitted 4 June, 2018; v1 submitted 15 March, 2017; originally announced March 2017.

arXiv:1701.00239 [pdf, other]

Randomly Weighted $d-$complexes: Minimal Spanning Acycles and Persistence Diagrams

Authors: Primoz Skraba, Gugan Thoppe, D. Yogeshwaran

Abstract: A weighted $d-$complex is a simplicial complex of dimension $d$ in which each face is assigned a real-valued weight. We derive three key results here concerning persistence diagrams and minimal spanning acycles (MSAs) of such complexes. First, we establish an equivalence between the MSA face-weights and death times in the persistence diagram. Next, we show a novel stability result for the MSA face-weights which, due to our first result, also holds true for the death and birth times, separately. Our final result concerns a perturbation of a mean-field model of randomly weighted $d-$complexes. The $d-$face weights here are perturbations of some i.i.d. distribution while all the lower-dimensional faces have a weight of $0$. If the perturbations decay sufficiently quickly, we show that suitably scaled extremal nearest face-weights, face-weights of the $d-$MSA, and the associated death times converge to an inhomogeneous Poisson point process. This result completely characterizes the extremal points of persistence diagrams and MSAs. The point process convergence and the asymptotic equivalence of three point processes are new for any weighted random complex model, including even the non-perturbed case. Lastly, as a consequence of our stability result, we show that Frieze's $ζ(3)$ limit for random minimal spanning trees and the recent extension to random MSAs by Hino and Kanazawa also hold in suitable noisy settings.

Submitted 22 March, 2020; v1 submitted 1 January, 2017; originally announced January 2017.

Comments: 42 Pages, 1 Figure. Streamlined introduction, modified Section 3 significantly

MSC Class: 60C05; 05E45 (Primary) 60G70; 60B99; 05C80 (Secondary)

arXiv:1506.08657 [pdf, ps, other]

A Concentration Bound for Stochastic Approximation via Alekseev's Formula

Authors: Gugan Thoppe, Vivek S. Borkar

Abstract: Given an ODE and its perturbation, the Alekseev formula expresses the solutions of the latter in terms related to the former. By exploiting this formula and a new concentration inequality for martingale-differences, we develop a novel approach for analyzing nonlinear Stochastic Approximation (SA). This approach is useful for studying a SA's behaviour close to a Locally Asymptotically Stable Equilibrium (LASE) of its limiting ODE; this LASE need not be the limiting ODE's only attractor. As an application, we obtain a new concentration bound for nonlinear SA. That is, given $ε>0$ and that the current iterate is in a neighbourhood of a LASE, we provide an estimate for i.) the time required to hit the $ε-$ball of this LASE, and ii.) the probability that after this time the iterates are indeed within this $ε-$ball and stay there thereafter. The latter estimate can also be viewed as the `lock-in' probability. Compared to related results, our concentration bound is tighter and holds under significantly weaker assumptions. In particular, our bound applies even when the stepsizes are not square-summable. Despite the weaker hypothesis, we show that the celebrated Kushner-Clark lemma continues to hold.

Submitted 30 March, 2019; v1 submitted 26 June, 2015; originally announced June 2015.

Comments: 44 pages. Mentioned that Dh(x*) needs to be Hurwitz

arXiv:1503.01983 [pdf, ps, other]

On the evolution of topology in dynamic clique complexes

Authors: Gugan Thoppe, D. Yogeshwaran, Robert Adler

Abstract: We consider a time varying analogue of the Erdős-Rényi graph and study the topological variations of its associated clique complex. The dynamics of the graph are stationary and are determined by the edges, which evolve independently as continuous time Markov chains. Our main result is that when the edge inclusion probability is of the form $p = n^α$, where $n$ is the number of vertices and $α\in (-1/k, -1/(k + 1)),$ then the process of the normalized $k-$th Betti number of these dynamic clique complexes converges weakly to the Ornstein-Uhlenbeck process as $n \to \infty.$

Submitted 15 January, 2016; v1 submitted 5 March, 2015; originally announced March 2015.

Comments: Rewrote the introduction

arXiv:1404.6635 [pdf, other]

Greedy Block Coordinate Descent (GBCD) Method for High Dimensional Quadratic Programs

Authors: Gugan Thoppe, Vivek S. Borkar, Dinesh Garg

Abstract: High dimensional unconstrained quadratic programs (UQPs) involving massive datasets are now common in application areas such as web, social networks, etc. Unless computational resources that match up to these datasets are available, solving such problems using classical UQP methods is very difficult. This paper discusses alternatives. We first define high dimensional compliant (HDC) methods for UQPs---methods that can solve high dimensional UQPs by adapting to available computational resources. We then show that the class of block Kaczmarz and block coordinate descent (BCD) are the only existing methods that can be made HDC. As a possible answer to the question of the `best' amongst BCD methods for UQP, we propose a novel greedy BCD (GBCD) method with serial, parallel and distributed variants. Convergence rates and numerical tests confirm that the GBCD is indeed an effective method to solve high dimensional UQPs. In fact, it sometimes beats even the conjugate gradient.

Submitted 12 July, 2014; v1 submitted 26 April, 2014; originally announced April 2014.

Comments: 29 pages, 3 figures, New references added

arXiv:1212.3696 [pdf, other]

doi 10.1016/j.automatica.2013.12.016

A Stochastic Kaczmarz Algorithm for Network Tomography

Authors: Gugan Thoppe, Vivek S. Borkar, D. Manjunath

Abstract: We develop a stochastic approximation version of the classical Kaczmarz algorithm that is incremental in nature and takes as input noisy real time data. Our analysis shows that with probability one it mimics the behavior of the original scheme: starting from the same initial point, our algorithm and the corresponding deterministic Kaczmarz algorithm converge to precisely the same point. The motivation for this work comes from network tomography where network parameters are to be estimated based upon end-to-end measurements. Numerical examples via Matlab based simulations demonstrate the efficacy of the algorithm.

Submitted 18 October, 2013; v1 submitted 15 December, 2012; originally announced December 2012.

Comments: Figures have been improved. Streamlined notation

arXiv:1210.7911 [pdf, other]

Generalized Network Tomography (journal version)

Authors: Gugan Thoppe

Abstract: Generalized network tomography (GNT) deals with estimation of link performance parameters for networks with arbitrary topologies using only end-to-end path measurements of pure unicast probe packets. In this paper, by taking advantage of the properties of generalized hyperexponential distributions and polynomial systems, a novel algorithm to infer the complete link metric distributions under the framework of GNT is developed. The significant advantages of this algorithm are that it does not require: i) the path measurements to be synchronous and ii) any prior knowledge of the link metric distributions. Moreover, if the path-link matrix of the network has the property that every pair of its columns are linearly independent, then it is shown that the algorithm can uniquely identify the link metric distributions up to any desired accuracy. Matlab based simulations have been included to illustrate the potential of the proposed scheme.

Submitted 30 October, 2012; originally announced October 2012.

Comments: 33 pages. Extended version of arXiv:1207.2530

MSC Class: 47A50; 47A52; 62J99;

arXiv:1207.2530 [pdf, other]

Generalized Network Tomography

Authors: Gugan Thoppe

Abstract: For successful estimation, the usual network tomography algorithms crucially require i) end-to-end data generated using multicast probe packets, real or emulated, and ii) the network to be a tree rooted at a single sender with destinations at leaves. These requirements, consequently, limit their scope of application. In this paper, we address successfully a general problem, henceforth called generalized network tomography, wherein the objective is to estimate the link performance parameters for networks with arbitrary topologies using only end-to-end measurements of pure unicast probe packets. Mathematically, given a binary matrix $A,$ we propose a novel algorithm to uniquely estimate the distribution of $X,$ a vector of independent non-negative random variables, given only IID samples of the components of the random vector $Y = AX.$ This algorithm, in fact, does not even require any prior knowledge of the unknown distributions. The idea is to approximate the distribution of each component of $X$ using linear combinations of known exponential bases and estimate the unknown weights. These weights are obtained by solving a set of polynomial systems based on the moment generating function of the components of $Y.$ For unique identifiability, it is only required that every pair of columns of the matrix $A$ be linearly independent, a property that holds true for the routing matrices of all multicast tree networks. Matlab based simulations have been included to illustrate the potential of the proposed scheme.

Submitted 1 November, 2012; v1 submitted 10 July, 2012; originally announced July 2012.

Comments: 8 Pages, Corrected Typos in Lemma 1

Showing 1–24 of 24 results for author: Thoppe, G