
Showing 1–8 of 8 results for author: Auer, P

Searching in archive stat.
  1. arXiv:1910.08446 [pdf, ps, other]

    cs.LG stat.ML

    Autonomous exploration for navigating in non-stationary CMPs

    Authors: Pratik Gajane, Ronald Ortner, Peter Auer, Csaba Szepesvari

    Abstract: We consider a setting in which the objective is to learn to navigate in a controlled Markov process (CMP) where transition probabilities may abruptly change. For this setting, we propose a performance measure called exploration steps which counts the time steps at which the learner lacks sufficient knowledge to navigate its environment efficiently. We devise a learning meta-algorithm, MNM and prov…

    Submitted 18 October, 2019; originally announced October 2019.

  2. arXiv:1905.05857 [pdf, ps, other]

    cs.LG stat.ML

    Variational Regret Bounds for Reinforcement Learning

    Authors: Pratik Gajane, Ronald Ortner, Peter Auer

    Abstract: We consider undiscounted reinforcement learning in Markov decision processes (MDPs) where both the reward functions and the state-transition probabilities may vary (gradually or abruptly) over time. For this problem setting, we propose an algorithm and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. The upper bound on the regret is given in terms…

    Submitted 10 September, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

    Comments: Presented at UAI 2019

  3. arXiv:1805.10066 [pdf, other]

    cs.LG stat.ML

    A Sliding-Window Algorithm for Markov Decision Processes with Arbitrarily Changing Rewards and Transitions

    Authors: Pratik Gajane, Ronald Ortner, Peter Auer

    Abstract: We consider reinforcement learning in changing Markov Decision Processes where both the state-transition probabilities and the reward functions may vary over time. For this problem setting, we propose an algorithm using a sliding window approach and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. We also characterize the optimal window size suitab…

    Submitted 25 May, 2018; originally announced May 2018.

  4. arXiv:1305.1318 [pdf]

    stat.ME

    Meta-Analysis of Gene Level Association Tests

    Authors: Dajiang J. Liu, Gina M. Peloso, Xiaowei Zhan, Oddgeir Holmen, Matthew Zawistowski, Shuang Feng, Majid Nikpay, Paul L. Auer, Anuj Goel, He Zhang, Ulrike Peters, Martin Farrall, Marju Orho-Melander, Charles Kooperberg, Ruth McPherson, Hugh Watkins, Cristen J. Willer, Kristian Hveem, Olle Melander, Sekar Kathiresan, Gonçalo R. Abecasis

    Abstract: The vast majority of connections between complex disease and common genetic variants were identified through meta-analysis, a powerful approach that enables large sample sizes while protecting against common artifacts due to population structure, repeated small sample analyses, and/or limitations with sharing individual level data. As the focus of genetic association studies shifts to rare varian…

    Submitted 6 May, 2013; originally announced May 2013.

  5. arXiv:1209.2693 [pdf, ps, other]

    cs.LG math.OC stat.ML

    Regret Bounds for Restless Markov Bandits

    Authors: Ronald Ortner, Daniil Ryabko, Peter Auer, Rémi Munos

    Abstract: We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm that after $T$ steps achieves $\tilde{O}(\sqrt{T})$ regret with respect to the best policy that knows the distributions of all arms. No assumptions on the Markov chains are made except that they are irreducible. In addi…

    Submitted 12 September, 2012; originally announced September 2012.

    Comments: In proceedings of The 23rd International Conference on Algorithmic Learning Theory (ALT 2012)

    Journal ref: Proceedings of ALT, Lyon, France, LNCS 7568, pp. 214-228, 2012

  6. arXiv:1110.6886 [pdf, other]

    cs.LG cs.IT stat.ML

    PAC-Bayesian Inequalities for Martingales

    Authors: Yevgeny Seldin, François Laviolette, Nicolò Cesa-Bianchi, John Shawe-Taylor, Peter Auer

    Abstract: We present a set of high-probability inequalities that control the concentration of weighted averages of multiple (possibly uncountably many) simultaneously evolving and interdependent martingales. Our results extend the PAC-Bayesian analysis in learning theory from the i.i.d. setting to martingales, opening the way for its application to importance weighted sampling, reinforcement learning, and ot…

    Submitted 30 July, 2012; v1 submitted 31 October, 2011; originally announced October 2011.

  7. arXiv:1105.4585 [pdf, ps, other]

    cs.LG stat.ML

    PAC-Bayesian Analysis of the Exploration-Exploitation Trade-off

    Authors: Yevgeny Seldin, Nicolò Cesa-Bianchi, François Laviolette, Peter Auer, John Shawe-Taylor, Jan Peters

    Abstract: We develop a coherent framework for integrative simultaneous analysis of the exploration-exploitation and model order selection trade-offs. We improve over our preceding results on the same subject (Seldin et al., 2011) by combining PAC-Bayesian analysis with a Bernstein-type inequality for martingales. Such a combination is also of independent interest for studies of multiple simultaneously evolvin…

    Submitted 23 May, 2011; originally announced May 2011.

    Comments: On-line Trading of Exploration and Exploitation 2 - ICML-2011 workshop. http://explo.cs.ucl.ac.uk/workshop/

  8. arXiv:1105.2416 [pdf, ps, other]

    cs.LG stat.ML

    PAC-Bayesian Analysis of Martingales and Multiarmed Bandits

    Authors: Yevgeny Seldin, François Laviolette, John Shawe-Taylor, Jan Peters, Peter Auer

    Abstract: We present two alternative ways to apply PAC-Bayesian analysis to sequences of dependent random variables. The first is based on a new lemma that enables one to bound expectations of convex functions of certain dependent random variables by expectations of the same functions of independent Bernoulli random variables. This lemma provides an alternative tool to the Hoeffding-Azuma inequality to bound concen…

    Submitted 19 May, 2011; v1 submitted 12 May, 2011; originally announced May 2011.