Showing 1–23 of 23 results for author: Vernade, C

  1. arXiv:2405.18100  [pdf, other]

    cs.LG math.OC

    A Pontryagin Perspective on Reinforcement Learning

    Authors: Onno Eberhard, Claire Vernade, Michael Muehlebach

    Abstract: Reinforcement learning has traditionally focused on learning state-dependent policies to solve optimal control problems in a closed-loop fashion. In this work, we introduce the paradigm of open-loop reinforcement learning where a fixed action sequence is learned instead. We present three new algorithms: one robust model-based method and two sample-efficient model-free methods. Rather than basing o…

    Submitted 28 May, 2024; originally announced May 2024.

  2. arXiv:2402.05878  [pdf, other]

    stat.ML cs.LG

    Prior-Dependent Allocations for Bayesian Fixed-Budget Best-Arm Identification in Structured Bandits

    Authors: Nicolas Nguyen, Imad Aouali, András György, Claire Vernade

    Abstract: We study the problem of Bayesian fixed-budget best-arm identification (BAI) in structured bandits. We propose an algorithm that uses fixed allocations based on the prior information and the structure of the environment. We provide theoretical bounds on its performance across diverse models, including the first prior-dependent upper bounds for linear and hierarchical BAI. Our key contribution is in…

    Submitted 8 February, 2024; originally announced February 2024.

  3. arXiv:2310.20266  [pdf, other]

    cs.AI math.OC math.PR

    Beyond Average Return in Markov Decision Processes

    Authors: Alexandre Marthe, Aurélien Garivier, Claire Vernade

    Abstract: What are the functionals of the reward that can be computed and optimized exactly in Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic Programming (DP) can only handle these operations efficiently for certain classes of statistics. We summarize the characterization of these classes for policy evaluation, and give a new answer for the planning problem. Interestingly, we…

    Submitted 19 February, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023, Dec 2023, New Orleans, United States

  4. arXiv:2212.14530  [pdf, other]

    cs.AI cs.LG

    POMRL: No-Regret Learning-to-Plan with Increasing Horizons

    Authors: Khimya Khetarpal, Claire Vernade, Brendan O'Donoghue, Satinder Singh, Tom Zahavy

    Abstract: We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task. The agent can use its experience in each task and across tasks to estimate both the transition model and the distribution over tasks. We propose an algorithm to meta-learn the underlying struc…

    Submitted 29 December, 2022; originally announced December 2022.

    Comments: 24 pages, 6 figures

  5. arXiv:2202.13001  [pdf, other]

    cs.LG stat.ML

    Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms

    Authors: MohammadJavad Azizi, Thang Duong, Yasin Abbasi-Yadkori, András György, Claire Vernade, Mohammad Ghavamzadeh

    Abstract: We study a sequential decision problem where the learner faces a sequence of $K$-armed bandit tasks. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting). For a given integer $M\le K$, the learner aims to compete with the best subset of arms of size $M$. We design an algorithm based on a reduction to bandit submodular maximizati…

    Submitted 18 October, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

  6. arXiv:2102.04152  [pdf, other]

    stat.ML cs.AI cs.LG

    EigenGame Unloaded: When playing games is better than optimizing

    Authors: Ian Gemp, Brian McWilliams, Claire Vernade, Thore Graepel

    Abstract: We build on the recently proposed EigenGame that views eigendecomposition as a competitive game. EigenGame's updates are biased if computed using minibatches of data, which hinders convergence and more sophisticated parallelism in the stochastic setting. In this work, we propose an unbiased stochastic update that is asymptotically equivalent to EigenGame, enjoys greater parallelism allowing comput…

    Submitted 22 March, 2022; v1 submitted 8 February, 2021; originally announced February 2021.

    Comments: Published in ICLR '22

  7. arXiv:2011.05944  [pdf, other]

    stat.ML cs.LG

    Asymptotically Optimal Information-Directed Sampling

    Authors: Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári

    Abstract: We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist information-directed sampling (IDS) framework, with a surrogate for the information gain that is informed by the optimization problem that defines the asymptotic lower bound. Ou…

    Submitted 2 July, 2021; v1 submitted 11 November, 2020; originally announced November 2020.

    Comments: Accepted at COLT 2021

  8. arXiv:2010.10182  [pdf, ps, other]

    stat.ML cs.LG

    The Elliptical Potential Lemma Revisited

    Authors: Alexandra Carpentier, Claire Vernade, Yasin Abbasi-Yadkori

    Abstract: This note proposes a new proof and new perspectives on the so-called Elliptical Potential Lemma. This result is important in online learning, especially for linear stochastic bandits. The original proof of the result, however short and elegant, does not give much flexibility on the type of potentials considered and we believe that this new interpretation can be of interest for future research in t…

    Submitted 20 October, 2020; originally announced October 2020.

    Comments: 8 pages

  9. arXiv:2010.00554  [pdf, other]

    cs.LG stat.ML

    EigenGame: PCA as a Nash Equilibrium

    Authors: Ian Gemp, Brian McWilliams, Claire Vernade, Thore Graepel

    Abstract: We present a novel view on principal component analysis (PCA) as a competitive game in which each approximate eigenvector is controlled by a player whose goal is to maximize their own utility function. We analyze the properties of this PCA game and the behavior of its gradient based updates. The resulting algorithm -- which combines elements from Oja's rule with a generalized Gram-Schmidt orthogon…

    Submitted 16 March, 2021; v1 submitted 1 October, 2020; originally announced October 2020.

    Comments: Published as a conference paper at International Conference on Learning Representations (ICLR) 2021

  10. arXiv:2006.10460  [pdf, other]

    cs.LG stat.ML

    Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting

    Authors: Ilja Kuzborskij, Claire Vernade, András György, Csaba Szepesvári

    Abstract: We consider off-policy evaluation in the contextual bandit setting for the purpose of obtaining a robust off-policy selection strategy, where the selection strategy is evaluated based on the value of the chosen policy in a set of proposal (target) policies. We propose a new method to compute a lower bound on the value of an arbitrary target policy given some logged data in contextual bandits for a…

    Submitted 21 March, 2022; v1 submitted 18 June, 2020; originally announced June 2020.

  11. arXiv:2006.10459  [pdf, other]

    stat.ML cs.LG

    Stochastic bandits with arm-dependent delays

    Authors: Anne Gael Manegueu, Claire Vernade, Alexandra Carpentier, Michal Valko

    Abstract: Significant work has been recently dedicated to the stochastic delayed bandit setting because of its relevance in applications. The applicability of existing algorithms is however restricted by the fact that strong assumptions are often made on the delay distributions, such as full observability, restrictive shape constraints, or uniformity over arms. In this work, we weaken them significantly and…

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: 19 Pages, 4 figures

    MSC Class: 62L10

  12. arXiv:2006.02119  [pdf, other]

    stat.ML cs.LG

    Non-Stationary Delayed Bandits with Intermediate Observations

    Authors: Claire Vernade, Andras Gyorgy, Timothy Mann

    Abstract: Online recommender systems often face long delays in receiving feedback, especially when optimizing for some long-term metrics. While mitigating the effects of delays in learning is well-understood in stationary environments, the problem becomes much more challenging when the environment changes. In fact, if the timescale of the change is comparable to the delay, it is impossible to learn about th…

    Submitted 11 August, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

    Comments: 18 pages, 17 figures, ICML 2020

  13. arXiv:1912.03074  [pdf, other]

    stat.ML cs.LG

    Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling

    Authors: Cindy Trinh, Emilie Kaufmann, Claire Vernade, Richard Combes

    Abstract: Stochastic Rank-One Bandits (Katariya et al. (2017a,b)) are a simple framework for regret minimization problems over rank-one matrices of arms. The initially proposed algorithms are proved to have logarithmic regret, but do not match the existing lower bound for this problem. We close this gap by first proving that rank-one bandits are a particular instance of unimodal bandits, and then providing a…

    Submitted 6 December, 2019; originally announced December 2019.

  14. arXiv:1909.09146  [pdf, other]

    cs.LG stat.ML

    Weighted Linear Bandits for Non-Stationary Environments

    Authors: Yoan Russac, Claire Vernade, Olivier Cappé

    Abstract: We consider a stochastic linear bandit model in which the available actions correspond to arbitrary context vectors whose associated rewards follow a non-stationary linear regression model. In this setting, the unknown regression parameter is allowed to vary in time. To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential…

    Submitted 20 March, 2020; v1 submitted 19 September, 2019; originally announced September 2019.

    Journal ref: NeurIPS 2019 - 33rd Conference on Neural Information Processing Systems, Dec 2019, Vancouver, Canada

  15. arXiv:1807.02089  [pdf, other]

    stat.ML cs.LG

    Linear Bandits with Stochastic Delayed Feedback

    Authors: Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brueckner

    Abstract: Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation. One of the main challenges faced by practitioners hoping to apply existing algorithms is that usually the feedback is randomly delayed and delays are only partially observable. For example, while a purchase…

    Submitted 2 March, 2020; v1 submitted 5 July, 2018; originally announced July 2018.

  16. arXiv:1707.08820  [pdf, other]

    stat.ML cs.LG

    Max K-armed bandit: On the ExtremeHunter algorithm and beyond

    Authors: Mastane Achab, Stephan Clémençon, Aurélien Garivier, Anne Sabourin, Claire Vernade

    Abstract: This paper is devoted to the study of the max K-armed bandit problem, which consists in sequentially allocating resources in order to detect extreme values. Our contribution is twofold. We first significantly refine the analysis of the ExtremeHunter algorithm carried out in Carpentier and Valko (2014), and next propose an alternative approach, showing that, remarkably, Extreme Bandits can be reduc…

    Submitted 27 July, 2017; originally announced July 2017.

  17. arXiv:1706.09186  [pdf, other]

    cs.LG

    Stochastic Bandit Models for Delayed Conversions

    Authors: Claire Vernade, Olivier Cappé, Vianney Perchet

    Abstract: Online advertising and product recommendation are important domains of applications for multi-armed bandit methods. In these fields, the reward that is immediately available is most often only a proxy for the actual outcome of interest, which we refer to as a conversion. For instance, in web advertising, clicks can be observed within a few seconds after an ad display but the corresponding sale --i…

    Submitted 12 July, 2017; v1 submitted 28 June, 2017; originally announced June 2017.

    Comments: Conference on Uncertainty in Artificial Intelligence, Aug 2017, Sydney, Australia

  18. arXiv:1706.01383  [pdf, other]

    cs.LG

    Sparse Stochastic Bandits

    Authors: Joon Kwon, Vianney Perchet, Claire Vernade

    Abstract: In the classical multi-armed bandit problem, d arms are available to the decision maker who pulls them sequentially in order to maximize his cumulative reward. Guarantees can be obtained on a relative quantity called regret, which scales linearly with d (or with sqrt(d) in the minimax sense). We here consider the sparse case of this classical problem in the sense that only a small number of arms,…

    Submitted 5 June, 2017; originally announced June 2017.

  19. arXiv:1703.06513  [pdf, other]

    cs.LG stat.ML

    Bernoulli Rank-$1$ Bandits for Click Feedback

    Authors: Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen

    Abstract: The probability that a user will click a search result depends both on its relevance and its position on the results page. The position based model explains this behavior by ascribing to every item an attraction probability, and to every position an examination probability. To be clicked, a result must be both attractive and examined. The probabilities of an item-position pair being clicked thus f…

    Submitted 19 March, 2017; originally announced March 2017.

  20. arXiv:1608.03023  [pdf, other]

    cs.LG stat.ML

    Stochastic Rank-1 Bandits

    Authors: Sumeet Katariya, Branislav Kveton, Csaba Szepesvari, Claire Vernade, Zheng Wen

    Abstract: We propose stochastic rank-$1$ bandits, a class of online learning problems where at each step a learning agent chooses a pair of row and column arms, and receives the product of their values as a reward. The main challenge of the problem is that the individual values of the row and column are unobserved. We assume that these values are stochastic and drawn independently. We propose a computationa…

    Submitted 8 March, 2017; v1 submitted 9 August, 2016; originally announced August 2016.

    Comments: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics

  21. arXiv:1606.02448  [pdf, other]

    cs.LG math.ST

    Multiple-Play Bandits in the Position-Based Model

    Authors: Paul Lagrée, Claire Vernade, Olivier Cappé

    Abstract: Sequentially learning to place items in multi-position displays or lists is a task that can be cast into the multiple-play semi-bandit setting. However, a major concern in this context is when the system cannot decide whether the user feedback for each item is actually exploitable. Indeed, much of the content may have been simply ignored by the user. The present work proposes to exploit available…

    Submitted 8 June, 2016; originally announced June 2016.

  22. arXiv:1603.01450   

    cs.DS cs.LG

    Sequential ranking under random semi-bandit feedback

    Authors: Hossein Vahabi, Paul Lagrée, Claire Vernade, Olivier Cappé

    Abstract: In many web applications, a recommendation is not a single item suggested to a user but a list of possibly interesting contents that may be ranked in some contexts. The combinatorial bandit problem has been studied quite extensively these last two years and many theoretical results now exist: lower bounds on the regret or asymptotically optimal algorithms. However, because of the variety of situa…

    Submitted 26 May, 2016; v1 submitted 4 March, 2016; originally announced March 2016.

    Comments: This submission has been withdrawn by arXiv administrators due to irreconcilable authorship dispute

  23. arXiv:1509.09130  [pdf, ps, other]

    stat.ML cs.IR cs.LG cs.SI

    Learning From Missing Data Using Selection Bias in Movie Recommendation

    Authors: Claire Vernade, Olivier Cappé

    Abstract: Recommending items to users is a challenging task due to the large amount of missing information. In many cases, the data solely consist of ratings or tags voluntarily contributed by each user on a very limited subset of the available items, so that most of the data of potential interest is actually missing. Current approaches to recommendation usually assume that the unobserved data is missing at…

    Submitted 30 September, 2015; originally announced September 2015.