Showing 1–2 of 2 results for author: Dodhia, P

Search v0.5.6 released 2020-02-24

arXiv:2211.01595

eess.SY cs.LG

Reinforcement Learning in Non-Markovian Environments

Authors: Siddharth Chandak, Pratik Shah, Vivek S Borkar, Parth Dodhia

Abstract: Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by non-Markovianity of observations when the Q-learning algorithm is applied on this formulation. Based on this observation, we propose that the criterion for agent design should be to seek good approximations for certain conditional laws. Inspired by classical stochastic control, we show that our problem reduces to that of recursive computation of approximate sufficient statistics. This leads to an autoencoder-based scheme for agent design which is then numerically tested on partially observed reinforcement learning environments.

Submitted 13 February, 2024; v1 submitted 3 November, 2022; originally announced November 2022.

Comments: 19 pages, accepted for publication at Systems and Control Letters

arXiv:2106.14308

cs.LG eess.SY

Concentration of Contractive Stochastic Approximation and Reinforcement Learning

Authors: Siddharth Chandak, Vivek S. Borkar, Parth Dodhia

Abstract: Using a martingale concentration inequality, concentration bounds `from time $n_0$ on' are derived for stochastic approximation algorithms with contractive maps and both martingale difference and Markov noises. These are applied to reinforcement learning algorithms, in particular to asynchronous Q-learning and TD(0).

Submitted 11 June, 2022; v1 submitted 27 June, 2021; originally announced June 2021.

Comments: 20 pages, Accepted for publication in Stochastic Systems
