Showing 1–14 of 14 results for author: Degenne, R

  1. arXiv:2406.03033  [pdf, other]

    cs.LG stat.ML

    Optimal Multi-Fidelity Best-Arm Identification

    Authors: Riccardo Poiani, Rémy Degenne, Emilie Kaufmann, Alberto Maria Metelli, Marcello Restelli

    Abstract: In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible. We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost. Several methods have been proposed for tackling this problem, but their optimalit…

    Submitted 5 June, 2024; originally announced June 2024.

  2. arXiv:2305.16041  [pdf, other]

    stat.ML cs.LG

    An $\varepsilon$-Best-Arm Identification Algorithm for Fixed-Confidence and Beyond

    Authors: Marc Jourdan, Rémy Degenne, Emilie Kaufmann

    Abstract: We propose EB-TC$\varepsilon$, a novel sampling rule for $\varepsilon$-best arm identification in stochastic bandits. It is the first instance of a Top Two algorithm analyzed for approximate best arm identification. EB-TC$\varepsilon$ is an *anytime* sampling rule that can therefore be employed without modification for fixed confidence or fixed budget identification (without prior knowledge of the b…

    Submitted 6 November, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: 68 pages, 14 figures, 4 tables. To be published in the Thirty-seventh Conference on Neural Information Processing Systems

  3. arXiv:2303.09468  [pdf, ps, other]

    stat.ML cs.LG

    On the Existence of a Complexity in Fixed Budget Bandit Identification

    Authors: Rémy Degenne

    Abstract: In fixed budget bandit identification, an algorithm sequentially observes samples from several distributions up to a given final time. It then answers a query about the set of distributions. A good algorithm will have a small probability of error. While that probability decreases exponentially with the final time, the best attainable rate is not known precisely for most identification tasks. We sh…

    Submitted 30 June, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: 24 pages, 36th Annual Conference on Learning Theory, Proceedings of Machine Learning Research vol 195

  4. arXiv:2210.05431  [pdf, other]

    stat.ML cs.LG

    Non-Asymptotic Analysis of a UCB-based Top Two Algorithm

    Authors: Marc Jourdan, Rémy Degenne

    Abstract: A Top Two sampling rule for bandit identification is a method which selects the next arm to sample from among two candidate arms, a leader and a challenger. Due to their simplicity and good empirical performance, Top Two methods have received increased attention in recent years. However, for fixed-confidence best arm identification, theoretical guarantees for Top Two methods have only been obtained in the as…

    Submitted 6 November, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: 32 pages, 5 figures, 3 tables. To be published in the Thirty-seventh Conference on Neural Information Processing Systems

  5. arXiv:2210.00974  [pdf, other]

    stat.ML cs.LG

    Dealing with Unknown Variances in Best-Arm Identification

    Authors: Marc Jourdan, Rémy Degenne, Emilie Kaufmann

    Abstract: The problem of identifying the best arm among a collection of items having Gaussian reward distributions is well understood when the variances are known. Despite its practical relevance for many applications, few works have studied it for unknown variances. In this paper we introduce and analyze two approaches to deal with unknown variances, either by plugging in the empirical variance or by adapting t…

    Submitted 23 January, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: 73 pages, 5 figures, 3 tables. To be published in the 34th International Conference on Algorithmic Learning Theory, Singapore, 2023

  6. arXiv:2206.05979  [pdf, other]

    stat.ML cs.LG

    Top Two Algorithms Revisited

    Authors: Marc Jourdan, Rémy Degenne, Dorian Baudry, Rianne de Heide, Emilie Kaufmann

    Abstract: Top Two algorithms arose as an adaptation of Thompson sampling to best arm identification in multi-armed bandit models (Russo, 2016), for parametric families of arms. They select the next arm to sample from by randomizing among two candidate arms, a leader and a challenger. Despite their good empirical performance, theoretical guarantees for fixed-confidence best arm identification have only been…

    Submitted 4 October, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

    Comments: 75 pages, 8 figures, 3 tables

  7. arXiv:2206.04456  [pdf, other]

    stat.ML cs.LG

    Choosing Answers in $\varepsilon$-Best-Answer Identification for Linear Bandits

    Authors: Marc Jourdan, Rémy Degenne

    Abstract: In pure-exploration problems, information is gathered sequentially to answer a question on the stochastic environment. While best-arm identification for linear bandits has been extensively studied in recent years, few works have been dedicated to identifying one arm that is $\varepsilon$-close to the best one (and not exactly the best one). In this problem with several correct answers, an identifi…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: 47 pages, 10 figures, 8 tables. To be published in the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022

  8. arXiv:2205.10936  [pdf, other]

    cs.LG stat.ML

    On Elimination Strategies for Bandit Fixed-Confidence Identification

    Authors: Andrea Tirinzoni, Rémy Degenne

    Abstract: Elimination algorithms for bandit identification, which prune the plausible correct answers sequentially until only one remains, are computationally convenient since they reduce the problem size over time. However, existing elimination strategies are often not fully adaptive (they update their sampling rule infrequently) and are not easy to extend to combinatorial settings, where the set of answer…

    Submitted 24 October, 2022; v1 submitted 22 May, 2022; originally announced May 2022.

  9. arXiv:2007.00969  [pdf, other]

    stat.ML cs.LG

    Structure Adaptive Algorithms for Stochastic Bandits

    Authors: Rémy Degenne, Han Shao, Wouter M. Koolen

    Abstract: We study reward maximisation in a wide class of structured stochastic multi-armed bandit problems, where the mean rewards of arms satisfy some given structural constraints, e.g. linear, unimodal, sparse, etc. Our aim is to develop methods that are flexible (in that they easily adapt to different structures), powerful (in that they perform well empirically and/or provably match instance-dependent l…

    Submitted 2 July, 2020; originally announced July 2020.

    Comments: 10+18 pages. To be published in the proceedings of ICML 2020

  10. arXiv:2007.00953  [pdf, other]

    stat.ML cs.LG

    Gamification of Pure Exploration for Linear Bandits

    Authors: Rémy Degenne, Pierre Ménard, Xuedong Shang, Michal Valko

    Abstract: We investigate an active pure-exploration setting, that includes best-arm identification, in the context of linear stochastic bandits. While asymptotically optimal algorithms exist for standard multi-arm bandits, the existence of such algorithms for the best-arm identification in linear bandits has been elusive despite several attempts to address it. First, we provide a thorough comparison and new…

    Submitted 2 July, 2020; originally announced July 2020.

    Comments: 11+25 pages. To be published in the proceedings of ICML 2020

  11. arXiv:1906.10431  [pdf, other]

    stat.ML cs.LG

    Non-Asymptotic Pure Exploration by Solving Games

    Authors: Rémy Degenne, Wouter M. Koolen, Pierre Ménard

    Abstract: Pure exploration (aka active testing) is the fundamental task of sequentially gathering information to answer a query about a stochastic environment. Good algorithms make few mistakes and take few samples. Lower bounds (for multi-armed bandit models with arms in an exponential family) reveal that the sample complexity is determined by the solution to an optimisation problem. The existing state of…

    Submitted 25 June, 2019; originally announced June 2019.

  12. arXiv:1902.03475  [pdf, other]

    cs.LG stat.ML

    Pure Exploration with Multiple Correct Answers

    Authors: Rémy Degenne, Wouter M. Koolen

    Abstract: We determine the sample complexity of pure exploration bandit problems with multiple good answers. We derive a lower bound using a new game equilibrium argument. We show how continuity and convexity properties of single-answer problems ensure that the Track-and-Stop algorithm has asymptotically optimal sample complexity. However, that convexity is lost when going to the multiple-answer setting. W…

    Submitted 9 February, 2019; originally announced February 2019.

  13. arXiv:1810.04088  [pdf, other]

    cs.LG stat.ML

    Bridging the gap between regret minimization and best arm identification, with application to A/B tests

    Authors: Rémy Degenne, Thomas Nedelec, Clément Calauzènes, Vianney Perchet

    Abstract: State-of-the-art online learning procedures focus either on selecting the best alternative ("best arm identification") or on minimizing the cost (the "regret"). We merge these two objectives by providing the theoretical analysis of cost-minimizing algorithms that are also delta-PAC (with a proven guaranteed bound on the decision time), hence fulfilling at the same time regret minimization and best…

    Submitted 26 February, 2019; v1 submitted 9 October, 2018; originally announced October 2018.

    Journal ref: AISTATS 2019 proceedings

  14. arXiv:1807.03558  [pdf, other]

    cs.LG stat.ML

    Bandits with Side Observations: Bounded vs. Logarithmic Regret

    Authors: Rémy Degenne, Evrard Garcelon, Vianney Perchet

    Abstract: We consider the classical stochastic multi-armed bandit but where, from time to time and roughly with frequency $ε$, an extra observation is gathered by the agent for free. We prove that, no matter how small $ε$ is, the agent can ensure a regret uniformly bounded in time. More precisely, we construct an algorithm with a regret smaller than $\sum_i \frac{\log(1/ε)}{Δ_i}$, up to multiplicative cons…

    Submitted 10 July, 2018; originally announced July 2018.

    Comments: Conference on Uncertainty in Artificial Intelligence (UAI) 2018, 21 pages