-
Autonomous exploration for navigating in non-stationary CMPs
Authors:
Pratik Gajane,
Ronald Ortner,
Peter Auer,
Csaba Szepesvari
Abstract:
We consider a setting in which the objective is to learn to navigate in a controlled Markov process (CMP) where transition probabilities may abruptly change. For this setting, we propose a performance measure called exploration steps, which counts the time steps at which the learner lacks sufficient knowledge to navigate its environment efficiently. We devise a learning meta-algorithm, MNM, and prove an upper bound on the exploration steps in terms of the number of changes.
Submitted 18 October, 2019;
originally announced October 2019.
-
Variational Regret Bounds for Reinforcement Learning
Authors:
Pratik Gajane,
Ronald Ortner,
Peter Auer
Abstract:
We consider undiscounted reinforcement learning in Markov decision processes (MDPs) where both the reward functions and the state-transition probabilities may vary (gradually or abruptly) over time. For this problem setting, we propose an algorithm and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. The upper bound on the regret is given in terms of the total variation in the MDP. This is the first variational regret bound for the general reinforcement learning setting.
Submitted 10 September, 2019; v1 submitted 14 May, 2019;
originally announced May 2019.
-
A Sliding-Window Algorithm for Markov Decision Processes with Arbitrarily Changing Rewards and Transitions
Authors:
Pratik Gajane,
Ronald Ortner,
Peter Auer
Abstract:
We consider reinforcement learning in changing Markov Decision Processes where both the state-transition probabilities and the reward functions may vary over time. For this problem setting, we propose an algorithm using a sliding window approach and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. We also characterize the optimal window size suitable for our algorithm. These results are complemented by a sample complexity bound on the number of sub-optimal steps taken by the algorithm. Finally, we present some experimental results to support our theoretical analysis.
Submitted 25 May, 2018;
originally announced May 2018.
-
Meta-Analysis of Gene Level Association Tests
Authors:
Dajiang J. Liu,
Gina M. Peloso,
Xiaowei Zhan,
Oddgeir Holmen,
Matthew Zawistowski,
Shuang Feng,
Majid Nikpay,
Paul L. Auer,
Anuj Goel,
He Zhang,
Ulrike Peters,
Martin Farrall,
Marju Orho-Melander,
Charles Kooperberg,
Ruth McPherson,
Hugh Watkins,
Cristen J. Willer,
Kristian Hveem,
Olle Melander,
Sekar Kathiresan,
Gonçalo R. Abecasis
Abstract:
The vast majority of connections between complex disease and common genetic variants were identified through meta-analysis, a powerful approach that enables large sample sizes while protecting against common artifacts due to population structure, repeated small-sample analyses, and/or limitations with sharing individual-level data. As the focus of genetic association studies shifts to rare variants, genes and other functional units are becoming the unit of analysis. Here, we propose and evaluate new approaches for meta-analysis of rare variant association. We show that our approach retains useful features of single variant meta-analytic approaches and demonstrate its utility in a study of blood lipid levels in ~18,500 individuals genotyped with exome arrays.
Submitted 6 May, 2013;
originally announced May 2013.
-
Regret Bounds for Restless Markov Bandits
Authors:
Ronald Ortner,
Daniil Ryabko,
Peter Auer,
Rémi Munos
Abstract:
We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm that after $T$ steps achieves $\tilde{O}(\sqrt{T})$ regret with respect to the best policy that knows the distributions of all arms. No assumptions on the Markov chains are made except that they are irreducible. In addition, we show that index-based policies are necessarily suboptimal for the considered problem.
Submitted 12 September, 2012;
originally announced September 2012.
-
PAC-Bayesian Inequalities for Martingales
Authors:
Yevgeny Seldin,
François Laviolette,
Nicolò Cesa-Bianchi,
John Shawe-Taylor,
Peter Auer
Abstract:
We present a set of high-probability inequalities that control the concentration of weighted averages of multiple (possibly uncountably many) simultaneously evolving and interdependent martingales. Our results extend the PAC-Bayesian analysis in learning theory from the i.i.d. setting to martingales, opening the way for its application to importance weighted sampling, reinforcement learning, and other interactive learning domains, as well as many other domains in probability theory and statistics where martingales are encountered.
We also present a comparison inequality that bounds the expectation of a convex function of a martingale difference sequence shifted to the [0,1] interval by the expectation of the same function of independent Bernoulli variables. This inequality is applied to derive a tighter analog of Hoeffding-Azuma's inequality.
Submitted 30 July, 2012; v1 submitted 31 October, 2011;
originally announced October 2011.
-
PAC-Bayesian Analysis of the Exploration-Exploitation Trade-off
Authors:
Yevgeny Seldin,
Nicolò Cesa-Bianchi,
François Laviolette,
Peter Auer,
John Shawe-Taylor,
Jan Peters
Abstract:
We develop a coherent framework for integrative simultaneous analysis of the exploration-exploitation and model order selection trade-offs. We improve over our preceding results on the same subject (Seldin et al., 2011) by combining PAC-Bayesian analysis with a Bernstein-type inequality for martingales. Such a combination is also of independent interest for studies of multiple simultaneously evolving martingales.
Submitted 23 May, 2011;
originally announced May 2011.
-
PAC-Bayesian Analysis of Martingales and Multiarmed Bandits
Authors:
Yevgeny Seldin,
François Laviolette,
John Shawe-Taylor,
Jan Peters,
Peter Auer
Abstract:
We present two alternative ways to apply PAC-Bayesian analysis to sequences of dependent random variables. The first is based on a new lemma that makes it possible to bound expectations of convex functions of certain dependent random variables by expectations of the same functions of independent Bernoulli random variables. This lemma provides an alternative tool to the Hoeffding-Azuma inequality for bounding the concentration of martingale values. Our second approach is based on integration of the Hoeffding-Azuma inequality with PAC-Bayesian analysis. We also introduce a way to apply PAC-Bayesian analysis in situations of limited feedback. We combine the new tools to derive PAC-Bayesian generalization and regret bounds for the multiarmed bandit problem. Although our regret bound is not yet as tight as state-of-the-art regret bounds based on other well-established techniques, our results significantly expand the range of potential applications of PAC-Bayesian analysis and introduce a new analysis tool to reinforcement learning and many other fields where martingales and limited feedback are encountered.
Submitted 19 May, 2011; v1 submitted 12 May, 2011;
originally announced May 2011.