
Showing 1–25 of 25 results for author: Borkar, V

Searching in archive eess.
  1. arXiv:2312.10424  [pdf, other]

    cs.LG eess.SY stat.ML

    A Concentration Bound for TD(0) with Function Approximation

    Authors: Siddharth Chandak, Vivek S. Borkar

    Abstract: We derive a concentration bound of the type `for all $n \geq n_0$ for some $n_0$' for TD(0) with linear function approximation. We work with online TD learning with samples from a single sample path of the underlying Markov chain. This makes our analysis significantly different from offline TD learning or TD learning with access to independent samples from the stationary distribution of the Markov…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: Submitted to Stochastic Systems

  2. arXiv:2311.14421  [pdf, other]

    eess.SY cs.LG

    Approximation of Convex Envelope Using Reinforcement Learning

    Authors: Vivek S. Borkar, Adit Akarsh

    Abstract: Oberman gave a stochastic control formulation of the problem of estimating the convex envelope of a non-convex function. Based on this, we develop a reinforcement learning scheme to approximate the convex envelope, using a variant of Q-learning for controlled optimal stopping. It shows very promising results on a standard library of test problems.

    Submitted 24 November, 2023; originally announced November 2023.

  3. arXiv:2311.12613  [pdf, other]

    eess.SY cs.LG cs.MA

    Decentralised Q-Learning for Multi-Agent Markov Decision Processes with a Satisfiability Criterion

    Authors: Keshav P. Keval, Vivek S. Borkar

    Abstract: In this paper, we propose a reinforcement learning algorithm to solve a multi-agent Markov decision process (MMDP). The goal, inspired by Blackwell's Approachability Theorem, is to lower the time average cost of each agent to below a pre-specified agent-specific bound. For the MMDP, we assume the state dynamics to be controlled by the joint actions of agents, but the per-stage costs to only depend…

    Submitted 21 November, 2023; originally announced November 2023.

  4. arXiv:2304.03729  [pdf, other]

    eess.SY cs.LG

    Full Gradient Deep Reinforcement Learning for Average-Reward Criterion

    Authors: Tejas Pagare, Vivek Borkar, Konstantin Avrachenkov

    Abstract: We extend the provably convergent Full Gradient DQN algorithm for discounted reward Markov decision processes from Avrachenkov et al. (2021) to average reward problems. We experimentally compare widely used RVI Q-Learning with recently proposed Differential Q-Learning in the neural function approximation setting with Full Gradient DQN and DQN. We also extend this to learn Whittle indices for Marko…

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: 13 pages, 4 figures; Accepted by 5th Annual Learning for Dynamics & Control Conference (L4DC) 2023

    MSC Class: 93-06

  5. arXiv:2211.01595  [pdf, other]

    eess.SY cs.LG

    Reinforcement Learning in Non-Markovian Environments

    Authors: Siddharth Chandak, Pratik Shah, Vivek S Borkar, Parth Dodhia

    Abstract: Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by non-Markovianity of observations when the Q-learning algorithm is applied on this formulation. Based on this observation, we propose that the criterion for agent design should be to seek g…

    Submitted 13 February, 2024; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: 19 pages, accepted for publication at Systems and Control Letters

  6. arXiv:2111.02644  [pdf, ps, other]

    cs.LG eess.SY

    A Concentration Bound for LSPE($λ$)

    Authors: Siddharth Chandak, Vivek S. Borkar, Harsh Dolhare

    Abstract: The popular LSPE($λ$) algorithm for policy evaluation is revisited to derive a concentration bound that gives high probability performance guarantees from some time on.

    Submitted 30 November, 2022; v1 submitted 4 November, 2021; originally announced November 2021.

    Comments: 17 pages, accepted for publication in Systems and Control Letters

  7. arXiv:2107.09153  [pdf, other]

    cs.IT eess.SY

    User Association in Dense mmWave Networks as Restless Bandits

    Authors: S. K. Singh, V. S. Borkar, G. S. Kasbekar

    Abstract: We study the problem of user association, i.e., determining which base station (BS) a user should associate with, in a dense millimeter wave (mmWave) network. In our system model, in each time slot, a user arrives with some probability in a region with a relatively small geographical area served by a dense mmWave network. Our goal is to devise an association policy under which, in each time slot i…

    Submitted 28 April, 2022; v1 submitted 16 July, 2021; originally announced July 2021.

    Comments: 11 pages, 7 figures

  8. arXiv:2106.14308  [pdf, other]

    cs.LG eess.SY

    Concentration of Contractive Stochastic Approximation and Reinforcement Learning

    Authors: Siddharth Chandak, Vivek S. Borkar, Parth Dodhia

    Abstract: Using a martingale concentration inequality, concentration bounds `from time $n_0$ on' are derived for stochastic approximation algorithms with contractive maps and both martingale difference and Markov noises. These are applied to reinforcement learning algorithms, in particular to asynchronous Q-learning and TD(0).

    Submitted 11 June, 2022; v1 submitted 27 June, 2021; originally announced June 2021.

    Comments: 20 pages, Accepted for publication in Stochastic Systems

  9. Prospect-theoretic Q-learning

    Authors: Vivek S. Borkar, Siddharth Chandak

    Abstract: We consider a prospect theoretic version of the classical Q-learning algorithm for discounted reward Markov decision processes, wherein the controller perceives a distorted and noisy future reward, modeled by a nonlinearity that accentuates gains and underrepresents losses relative to a reference point. We analyze the asymptotic behavior of the scheme by analyzing its limiting differential equatio…

    Submitted 1 September, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: Published in Systems and Control Letters. 16 pages, 8 figures

  10. arXiv:2010.06445  [pdf, ps, other]

    math.OC eess.SY physics.soc-ph

    Revisiting SIR in the age of COVID-19: Explicit Solutions and Control Problems

    Authors: Vivek S. Borkar, D. Manjunath

    Abstract: The non-population conserving SIR (SIR-NC) model to describe the spread of infections in a community is proposed and studied. Unlike the standard SIR model, SIR-NC does not assume population conservation. Although similar in form to the standard SIR, SIR-NC admits a closed form solution while allowing us to model mortality, and also provides different, and arguably a more realistic, interpretation…

    Submitted 4 November, 2020; v1 submitted 13 October, 2020; originally announced October 2020.

    Comments: SIAM Journal of Control and Optimization 2020

  11. arXiv:1910.04402  [pdf, ps, other]

    eess.SP cs.NI

    Scheduling in Wireless Networks with Spatial Reuse of Spectrum as Restless Bandits

    Authors: Vivek S. Borkar, Shantanu Choudhary, Vaibhav Kumar Gupta, Gaurav S. Kasbekar

    Abstract: We study the problem of scheduling packet transmissions with the aim of minimizing the energy consumption and data transmission delay of users in a wireless network in which spatial reuse of spectrum is employed. We approach this problem using the theory of Whittle index for cost minimizing restless bandits, which has been used to effectively solve problems in a variety of applications. We design…

    Submitted 8 June, 2020; v1 submitted 10 October, 2019; originally announced October 2019.

    Comments: Revision

  12. arXiv:1902.01048  [pdf, ps, other]

    math.OC eess.SY

    Average cost optimal control under weak ergodicity hypotheses: Relative value iterations

    Authors: Ari Arapostathis, Vivek S. Borkar

    Abstract: We study Markov decision processes with Polish state and action spaces. The action space is state dependent and is not necessarily compact. We first establish the existence of an optimal ergodic occupation measure using only a near-monotone hypothesis on the running cost. Then we study the well-posedness of Bellman equation, or what is commonly known as the average cost optimality equation, under…

    Submitted 14 August, 2023; v1 submitted 4 February, 2019; originally announced February 2019.

    Comments: 32 pages

    MSC Class: 90C40 (93E20)

  13. Non-asymptotic Error Bounds For Constant Stepsize Stochastic Approximation For Tracking Mobile Agents

    Authors: Bhumesh Kumar, Vivek Borkar, Akhil Shetty

    Abstract: This work revisits the constant stepsize stochastic approximation algorithm for tracking a slowly moving target and obtains a bound for the tracking error that is valid for the entire time axis, using the Alekseev non-linear variation of constants formula. It is the first non-asymptotic bound for the entire time axis in the sense that it is not based on the vanishing stepsize limit and associated…

    Submitted 1 March, 2019; v1 submitted 21 February, 2018; originally announced February 2018.

    Comments: Expanded and revised

    Journal ref: Mathematics of Control, Signals, and Systems (2019)

  14. arXiv:1710.11471  [pdf, other]

    cs.PF eess.SY

    Distributed Server Allocation for Content Delivery Networks

    Authors: Sarath Pattathil, Vivek S. Borkar, Gaurav S. Kasbekar

    Abstract: We propose a dynamic formulation of file-sharing networks in terms of an average cost Markov decision process with constraints. By analyzing a Whittle-like relaxation thereof, we propose an index policy in the spirit of Whittle and compare it by simulations with other natural heuristics.

    Submitted 9 February, 2019; v1 submitted 28 October, 2017; originally announced October 2017.

    Comments: 22 pages, 10 figures

  15. arXiv:1709.03248  [pdf, ps, other]

    cs.RO eess.SY

    Vector Field Guidance for Convoy Monitoring Using Elliptical Orbits

    Authors: Aseem V. Borkar, Vivek S. Borkar, Arpita Sinha

    Abstract: We propose a novel vector field based guidance scheme for tracking and surveillance of a convoy, moving along a possibly nonlinear trajectory on the ground, by an aerial agent. The scheme first computes a time varying ellipse that encompasses all the targets in the convoy using a simple regression based algorithm. It then ensures convergence of the agent to a trajectory that repeatedly traverses t…

    Submitted 13 September, 2017; v1 submitted 11 September, 2017; originally announced September 2017.

  16. arXiv:1708.08246  [pdf, other]

    eess.SY

    Distributed Stochastic Approximation with Local Projections

    Authors: Suhail M. Shah, Vivek S. Borkar

    Abstract: We propose a distributed version of a stochastic approximation scheme constrained to remain in the intersection of a finite family of convex sets. The projection to the intersection of these sets is also computed in a distributed manner and a `nonlinear gossip' mechanism is employed to blend the projection iterations with the stochastic approximation using multiple time scales.

    Submitted 28 August, 2017; originally announced August 2017.

    Comments: 28 pages, 3 figures, submitted to SiOpt

  17. arXiv:1707.02440  [pdf, other]

    eess.SY

    Whittle Indexability in Egalitarian Processor Sharing Systems

    Authors: Vivek S. Borkar, Sarath Pattathil

    Abstract: The egalitarian processor sharing model is viewed as a restless bandit and its Whittle indexability is established. A numerical scheme for computing the Whittle indices is provided, along with supporting numerical experiments.

    Submitted 13 July, 2017; v1 submitted 8 July, 2017; originally announced July 2017.

    Comments: 27 pages, 6 figures

  18. arXiv:1706.09778  [pdf, other]

    eess.SY

    Opportunistic Scheduling as Restless Bandits

    Authors: Vivek S. Borkar, Gaurav S. Kasbekar, Sarath Pattathil, Priyesh Y. Shetty

    Abstract: In this paper we consider energy efficient scheduling in a multiuser setting where each user has a finite sized queue and there is a cost associated with holding packets (jobs) in each queue (modeling the delay constraints). The packets of each user need to be sent over a common channel. The channel qualities seen by the users are time-varying and differ across users; also, the cost incurred, i.e.…

    Submitted 17 October, 2017; v1 submitted 29 June, 2017; originally announced June 2017.

    Comments: 10 pages, 7 figures

  19. arXiv:1503.08558  [pdf, other]

    cs.IR eess.SY math.OC

    Whittle Index Policy for Crawling Ephemeral Content

    Authors: Konstantin Avrachenkov, Vivek Borkar

    Abstract: We consider a task of scheduling a crawler to retrieve content from several sites with ephemeral content. A user typically loses interest in ephemeral content, like news or posts at social network groups, after several days or hours. Thus, development of timely crawling policy for such ephemeral information sources is very important. We first formulate this problem as an optimal control problem wi…

    Submitted 30 March, 2015; originally announced March 2015.

  20. arXiv:1411.0728  [pdf, ps, other]

    cs.LG cs.GT eess.SY math.OC

    Approachability in Stackelberg Stochastic Games with Vector Costs

    Authors: Dileep Kalathil, Vivek Borkar, Rahul Jain

    Abstract: The notion of approachability was introduced by Blackwell [1] in the context of vector-valued repeated games. The famous Blackwell's approachability theorem prescribes a strategy for approachability, i.e., for `steering' the average cost of a given agent towards a given target set, irrespective of the strategies of the other agents. In this paper, motivated by the multi-objective optimization/deci…

    Submitted 20 June, 2016; v1 submitted 3 November, 2014; originally announced November 2014.

    Comments: 18 Pages, Submitted to Dynamic Games and Applications

  21. arXiv:1404.6635  [pdf, other]

    math.OC eess.SY stat.CO

    Greedy Block Coordinate Descent (GBCD) Method for High Dimensional Quadratic Programs

    Authors: Gugan Thoppe, Vivek S. Borkar, Dinesh Garg

    Abstract: High dimensional unconstrained quadratic programs (UQPs) involving massive datasets are now common in application areas such as web, social networks, etc. Unless computational resources that match up to these datasets are available, solving such problems using classical UQP methods is very difficult. This paper discusses alternatives. We first define high dimensional compliant (HDC) methods for UQ…

    Submitted 12 July, 2014; v1 submitted 26 April, 2014; originally announced April 2014.

    Comments: 29 pages, 3 figures, New references added

  22. arXiv:1309.7841  [pdf, ps, other]

    cs.DC cs.IT eess.SY math.OC

    Asynchronous Gossip for Averaging and Spectral Ranking

    Authors: Vivek S. Borkar, Rahul Makhijani, Rajesh Sundaresan

    Abstract: We consider two variants of the classical gossip algorithm. The first variant is a version of asynchronous stochastic approximation. We highlight a fundamental difficulty associated with the classical asynchronous gossip scheme, viz., that it may not converge to a desired average, and suggest an alternative scheme based on reinforcement learning that has guaranteed convergence to the desired avera…

    Submitted 6 January, 2014; v1 submitted 30 September, 2013; originally announced September 2013.

    Comments: 14 pages, 7 figures. Minor revision

  23. arXiv:1303.0618  [pdf, ps, other]

    math.OC eess.SY math.AP

    Convergence of The Relative Value Iteration for the Ergodic Control Problem of Nondegenerate Diffusions under Near-Monotone Costs

    Authors: Ari Arapostathis, Vivek S. Borkar, K. Suresh Kumar

    Abstract: We study the relative value iteration for the ergodic control problem under a near-monotone running cost structure for a nondegenerate diffusion controlled through its drift. This algorithm takes the form of a quasilinear parabolic Cauchy initial value problem in $\mathbb{R}^{d}$. We show that this Cauchy problem stabilizes, or in other words, that the solution of the quasilinear parabolic equation conve…

    Submitted 2 April, 2013; v1 submitted 4 March, 2013; originally announced March 2013.

    MSC Class: 93E15; 93E20

    Journal ref: SIAM Journal of Control and Optimization 52 (2014), no. 1, pp. 1-31

  24. Relative Value Iteration for Stochastic Differential Games

    Authors: Ari Arapostathis, Vivek S. Borkar, K. Suresh Kumar

    Abstract: We study zero-sum stochastic differential games with player dynamics governed by a nondegenerate controlled diffusion process. Under the assumption of uniform stability, we establish the existence of a solution to the Isaacs equation for the ergodic game and characterize the optimal stationary strategies. The data is not assumed to be bounded, nor do we assume geometric ergodicity. Thus our resul…

    Submitted 2 April, 2013; v1 submitted 30 October, 2012; originally announced October 2012.

    MSC Class: 93E15; 93E20 (Primary) 60J25; 60J60; 90C40 (Secondary)

    Journal ref: Advances in dynamic games, 3--27, Ann. Internat. Soc. Dynam. Games, 13, Birkhäuser/Springer, Cham, 2013

  25. arXiv:1107.4142  [pdf, ps, other]

    math.PR cs.IT eess.SY math.OC

    Asymptotics of the Invariant Measure in Mean Field Models with Jumps

    Authors: Vivek S. Borkar, Rajesh Sundaresan

    Abstract: We consider the asymptotics of the invariant measure for the process of the empirical spatial distribution of $N$ coupled Markov chains in the limit of a large number of chains. Each chain reflects the stochastic evolution of one particle. The chains are coupled through the dependence of the transition rates on this spatial distribution of particles in the various states. Our model is a caricature…

    Submitted 23 January, 2013; v1 submitted 20 July, 2011; originally announced July 2011.

    Comments: 58 pages, reorganised to get quickly to the main results on invariant measure; Stochastic Systems, volume 2, 2012