Search | arXiv e-print repository

arXiv:1910.08446

Autonomous exploration for navigating in non-stationary CMPs

Authors: Pratik Gajane, Ronald Ortner, Peter Auer, Csaba Szepesvari

Abstract: We consider a setting in which the objective is to learn to navigate in a controlled Markov process (CMP) where transition probabilities may abruptly change. For this setting, we propose a performance measure called exploration steps which counts the time steps at which the learner lacks sufficient knowledge to navigate its environment efficiently. We devise a learning meta-algorithm, MNM and prove an upper bound on the exploration steps in terms of the number of changes.

Submitted 18 October, 2019; originally announced October 2019.

arXiv:1905.05857

Variational Regret Bounds for Reinforcement Learning

Authors: Pratik Gajane, Ronald Ortner, Peter Auer

Abstract: We consider undiscounted reinforcement learning in Markov decision processes (MDPs) where both the reward functions and the state-transition probabilities may vary (gradually or abruptly) over time. For this problem setting, we propose an algorithm and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. The upper bound on the regret is given in terms of the total variation in the MDP. This is the first variational regret bound for the general reinforcement learning setting.

Submitted 10 September, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

Comments: Presented at UAI 2019

arXiv:1805.10066

A Sliding-Window Algorithm for Markov Decision Processes with Arbitrarily Changing Rewards and Transitions

Authors: Pratik Gajane, Ronald Ortner, Peter Auer

Abstract: We consider reinforcement learning in changing Markov Decision Processes where both the state-transition probabilities and the reward functions may vary over time. For this problem setting, we propose an algorithm using a sliding window approach and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. We also characterize the optimal window size suitable for our algorithm. These results are complemented by a sample complexity bound on the number of sub-optimal steps taken by the algorithm. Finally, we present some experimental results to support our theoretical analysis.

Submitted 25 May, 2018; originally announced May 2018.

arXiv:1305.1318

Meta-Analysis of Gene Level Association Tests

Authors: Dajiang J. Liu, Gina M. Peloso, Xiaowei Zhan, Oddgeir Holmen, Matthew Zawistowski, Shuang Feng, Majid Nikpay, Paul L. Auer, Anuj Goel, He Zhang, Ulrike Peters, Martin Farrall, Marju Orho-Melander, Charles Kooperberg, Ruth McPherson, Hugh Watkins, Cristen J. Willer, Kristian Hveem, Olle Melander, Sekar Kathiresan, Gonçalo R. Abecasis

Abstract: The vast majority of connections between complex disease and common genetic variants were identified through meta-analysis, a powerful approach that enables large sample sizes while protecting against common artifacts due to population structure, repeated small sample analyses, and/or limitations with sharing individual level data. As the focus of genetic association studies shifts to rare variants, genes and other functional units are becoming the unit of analysis. Here, we propose and evaluate new approaches for meta-analysis of rare variant association. We show that our approach retains useful features of single variant meta-analytic approaches and demonstrate its utility in a study of blood lipid levels in ~18,500 individuals genotyped with exome arrays.

Submitted 6 May, 2013; originally announced May 2013.

arXiv:1209.2693

Regret Bounds for Restless Markov Bandits

Authors: Ronald Ortner, Daniil Ryabko, Peter Auer, Rémi Munos

Abstract: We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm that after $T$ steps achieves $\tilde{O}(\sqrt{T})$ regret with respect to the best policy that knows the distributions of all arms. No assumptions on the Markov chains are made except that they are irreducible. In addition, we show that index-based policies are necessarily suboptimal for the considered problem.

Submitted 12 September, 2012; originally announced September 2012.

Comments: In proceedings of The 23rd International Conference on Algorithmic Learning Theory (ALT 2012)

Journal ref: Proceedings of ALT, Lyon, France, LNCS 7568, pp.214-228, 2012

arXiv:1110.6886

PAC-Bayesian Inequalities for Martingales

Authors: Yevgeny Seldin, François Laviolette, Nicolò Cesa-Bianchi, John Shawe-Taylor, Peter Auer

Abstract: We present a set of high-probability inequalities that control the concentration of weighted averages of multiple (possibly uncountably many) simultaneously evolving and interdependent martingales. Our results extend the PAC-Bayesian analysis in learning theory from the i.i.d. setting to martingales opening the way for its application to importance weighted sampling, reinforcement learning, and other interactive learning domains, as well as many other domains in probability theory and statistics, where martingales are encountered. We also present a comparison inequality that bounds the expectation of a convex function of a martingale difference sequence shifted to the [0,1] interval by the expectation of the same function of independent Bernoulli variables. This inequality is applied to derive a tighter analog of Hoeffding-Azuma's inequality.

Submitted 30 July, 2012; v1 submitted 31 October, 2011; originally announced October 2011.

arXiv:1105.4585

PAC-Bayesian Analysis of the Exploration-Exploitation Trade-off

Authors: Yevgeny Seldin, Nicolò Cesa-Bianchi, François Laviolette, Peter Auer, John Shawe-Taylor, Jan Peters

Abstract: We develop a coherent framework for integrative simultaneous analysis of the exploration-exploitation and model order selection trade-offs. We improve over our preceding results on the same subject (Seldin et al., 2011) by combining PAC-Bayesian analysis with Bernstein-type inequality for martingales. Such a combination is also of independent interest for studies of multiple simultaneously evolving martingales.

Submitted 23 May, 2011; originally announced May 2011.

Comments: On-line Trading of Exploration and Exploitation 2 - ICML-2011 workshop. http://explo.cs.ucl.ac.uk/workshop/

arXiv:1105.2416

PAC-Bayesian Analysis of Martingales and Multiarmed Bandits

Authors: Yevgeny Seldin, François Laviolette, John Shawe-Taylor, Jan Peters, Peter Auer

Abstract: We present two alternative ways to apply PAC-Bayesian analysis to sequences of dependent random variables. The first is based on a new lemma that makes it possible to bound expectations of convex functions of certain dependent random variables by expectations of the same functions of independent Bernoulli random variables. This lemma provides an alternative tool to the Hoeffding-Azuma inequality to bound concentration of martingale values. Our second approach is based on integration of the Hoeffding-Azuma inequality with PAC-Bayesian analysis. We also introduce a way to apply PAC-Bayesian analysis in situations of limited feedback. We combine the new tools to derive PAC-Bayesian generalization and regret bounds for the multiarmed bandit problem. Although our regret bound is not yet as tight as state-of-the-art regret bounds based on other well-established techniques, our results significantly expand the range of potential applications of PAC-Bayesian analysis and introduce a new analysis tool to reinforcement learning and many other fields, where martingales and limited feedback are encountered.

Submitted 19 May, 2011; v1 submitted 12 May, 2011; originally announced May 2011.

Showing 1–8 of 8 results for author: Auer, P