Showing 1–9 of 9 results for author: Meyn, S P

Searching in archive cs.
  1. arXiv:2008.03559

    math.OC cs.LG

    Convex Q-Learning, Part 1: Deterministic Optimal Control

    Authors: Prashant G. Mehta, Sean P. Meyn

    Abstract: It is well known that the extension of Watkins' algorithm to general function approximation settings is challenging: does the projected Bellman equation have a solution? If so, is the solution useful in the sense of generating a good policy? And, if the preceding questions are answered in the affirmative, is the algorithm consistent? These questions are unanswered even in the special case of Q-fun…

    Submitted 8 August, 2020; originally announced August 2020.

    Comments: This preprint is written in a tutorial style so that it is accessible to newcomers. It will be part of a handout for upcoming short courses on RL. A more compact version suitable for journal submission is in preparation

    MSC Class: 68T05 (Primary) 93E35; 49L20 (Secondary)

  2. arXiv:2002.10301

    cs.LG eess.SY stat.ML

    Q-learning with Uniformly Bounded Variance: Large Discounting is Not a Barrier to Fast Learning

    Authors: Adithya M. Devraj, Sean P. Meyn

    Abstract: Sample complexity bounds are a common performance metric in the Reinforcement Learning literature. In the discounted cost, infinite horizon setting, all of the known bounds have a factor that is a polynomial in $1/(1-\gamma)$, where $\gamma < 1$ is the discount factor. For a large discount factor, these bounds seem to imply that a very large number of samples is required to achieve an $\varepsilon$-optimal p…

    Submitted 7 July, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: 33 pages, 4 figures

  3. arXiv:1910.05405

    cs.LG eess.SY stat.ML

    Zap Q-Learning With Nonlinear Function Approximation

    Authors: Shuhang Chen, Adithya M. Devraj, Fan Lu, Ana Bušić, Sean P. Meyn

    Abstract: Zap Q-learning is a recent class of reinforcement learning algorithms, motivated primarily as a means to accelerate convergence. Stability theory has been absent outside of two restrictive classes: the tabular setting, and optimal stopping. This paper introduces a new framework for analysis of a more general class of recursive algorithms known as stochastic approximation. Based on this general the…

    Submitted 15 July, 2020; v1 submitted 11 October, 2019; originally announced October 2019.

  4. arXiv:1904.11538

    eess.SY cs.LG

    Zap Q-Learning for Optimal Stopping Time Problems

    Authors: Shuhang Chen, Adithya M. Devraj, Ana Bušić, Sean P. Meyn

    Abstract: The objective in this paper is to obtain fast converging reinforcement learning algorithms to approximate solutions to the problem of discounted cost optimal stopping in an irreducible, uniformly ergodic Markov chain, evolving on a compact subset of $\mathbb{R}^n$. We build on the dynamic programming approach taken by Tsitsiklis and Van Roy, wherein they propose a Q-learning algorithm to estimate…

    Submitted 30 September, 2019; v1 submitted 25 April, 2019; originally announced April 2019.

  5. arXiv:1812.11137

    cs.LG eess.SY math.OC stat.ML

    Differential Temporal Difference Learning

    Authors: Adithya M. Devraj, Ioannis Kontoyiannis, Sean P. Meyn

    Abstract: Value functions derived from Markov decision processes arise as a central component of algorithms as well as performance metrics in many statistics and engineering applications of machine learning techniques. Computation of the solution to the associated Bellman equations is challenging in most practical cases of interest. A popular class of approximation techniques, known as Temporal Difference (…

    Submitted 27 February, 2020; v1 submitted 28 December, 2018; originally announced December 2018.

    Comments: Preliminary versions of some of the results in this article were submitted as arXiv:1604.01828

    MSC Class: 93E20; 93E35; 60J20

  6. arXiv:1707.03770

    eess.SY cs.LG math.OC

    Fastest Convergence for Q-learning

    Authors: Adithya M. Devraj, Sean P. Meyn

    Abstract: The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins' original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that the transient behavior is a close match to a deterministic Newton-Raphson implementation. This is made possible by a two time-scal…

    Submitted 21 March, 2018; v1 submitted 12 July, 2017; originally announced July 2017.

  7. arXiv:1604.01828

    eess.SY cs.LG math.OC

    Differential TD Learning for Value Function Approximation

    Authors: Adithya M. Devraj, Sean P. Meyn

    Abstract: Value functions arise as a component of algorithms as well as performance metrics in statistics and engineering applications. Computation of the associated Bellman equations is numerically challenging in all but a few special cases. A popular approximation technique is known as Temporal Difference (TD) learning. The algorithm introduced in this paper is intended to resolve two well-known problems…

    Submitted 23 December, 2018; v1 submitted 6 April, 2016; originally announced April 2016.

    MSC Class: 93E20; 93E35; 60J20

  8. arXiv:1502.03762

    math.OC cs.IT

    Rationally inattentive control of Markov processes

    Authors: Ehsan Shafieepoorfard, Maxim Raginsky, Sean P. Meyn

    Abstract: The article poses a general model for optimal control subject to information constraints, motivated in part by recent work of Sims and others on information-constrained decision-making by economic agents. In the average-cost optimal control framework, the general model introduced in this paper reduces to a variant of the linear-programming representation of the average-cost optimal control problem…

    Submitted 23 February, 2016; v1 submitted 12 February, 2015; originally announced February 2015.

    Comments: 30 pages, 2 figures; accepted to SIAM Journal on Control and Optimization

    MSC Class: 94A34; 90C40; 90C47

  9. arXiv:1010.4820

    math.OC cs.IT eess.SY

    Random-Time, State-Dependent Stochastic Drift for Markov Chains and Application to Stochastic Stabilization Over Erasure Channels

    Authors: Serdar Yüksel, Sean P. Meyn

    Abstract: It is known that state-dependent, multi-step Lyapunov bounds lead to greatly simplified verification theorems for stability for large classes of Markov chain models. This is one component of the "fluid model" approach to stability of stochastic networks. In this paper we extend the general theory to randomized multi-step Lyapunov theory to obtain criteria for stability and steady-state performance…

    Submitted 17 May, 2012; v1 submitted 22 October, 2010; originally announced October 2010.

    Comments: To appear in IEEE Transactions on Automatic Control

    MSC Class: 93E03; 94A15; 60J05