Showing 1–11 of 11 results for author: Baudry, D

  1. arXiv:2406.06805  [pdf, other]

    cs.DS

    Lookback Prophet Inequalities

    Authors: Ziyad Benomar, Dorian Baudry, Vianney Perchet

    Abstract: Prophet inequalities are fundamental optimal stopping problems, where a decision-maker sequentially observes items with values sampled independently from known distributions, and must decide at each new observation to either stop and gain the current value or reject it irrevocably and move to the next step. This model is often too pessimistic and does not adequately represent real-world online sel…

    Submitted 10 June, 2024; originally announced June 2024.

  2. arXiv:2403.11637  [pdf, other]

    cs.LG stat.ML

    The Value of Reward Lookahead in Reinforcement Learning

    Authors: Nadav Merlis, Dorian Baudry, Vianney Perchet

    Abstract: In reinforcement learning (RL), agents sequentially interact with changing environments while aiming to maximize the obtained rewards. Usually, rewards are observed only after acting, and so the goal is to maximize the expected cumulative reward. Yet, in many practical settings, reward information is observed in advance -- prices are observed before performing transactions; nearby traffic informat…

    Submitted 18 March, 2024; originally announced March 2024.

  3. The Value of Extended Reality Techniques to Improve Remote Collaborative Maintenance Operations: A User Study

    Authors: Corentin Coupry, Paul Richard, David Bigaud, Sylvain Noblecourt, David Baudry

    Abstract: In the Architecture, Engineering and Construction (AEC) sector, data extracted from building information modelling (BIM) can be used to create a digital twin (DT). The algorithms of a BIM-based DT can facilitate the retrieval of information, which can then be used to improve building operation and maintenance procedures. However, with the increased complexity and automation of the building, mainte…

    Submitted 29 February, 2024; originally announced March 2024.

    Journal ref: CONVR2023 - 23rd International Conference on Construction Applications of Virtual Reality "MANAGING THE DIGITAL TRANSFORMATION OF CONSTRUCTION INDUSTRY", University of Florence, Italy, Nov 2023, Florence, Italy. pp. 23-33

  4. arXiv:2303.06058  [pdf, ps, other]

    cs.LG stat.ML

    A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms

    Authors: Dorian Baudry, Kazuya Suzuki, Junya Honda

    Abstract: In this paper we propose a general methodology to derive regret bounds for randomized multi-armed bandit algorithms. It consists of checking a set of sufficient conditions on the sampling probability of each arm and on the family of distributions to prove a logarithmic regret. As a direct application we revisit two famous bandit algorithms, Minimum Empirical Divergence (MED) and Thompson Sampling…

    Submitted 21 December, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

  5. arXiv:2210.04537  [pdf, other]

    cs.AI

    Towards an efficient and risk aware strategy for guiding farmers in identifying best crop management

    Authors: Romain Gautron, Dorian Baudry, Myriam Adam, Gatien N Falconnier, Marc Corbeels

    Abstract: Identification of best-performing fertilizer practices among a set of contrasting practices with field trials is challenging as crop losses are costly for farmers. To identify best management practices, an "intuitive strategy" would be to set multi-year field trials with an equal proportion of each practice to test. Our objective was to provide an identification strategy using a bandit algorithm th…

    Submitted 10 October, 2022; originally announced October 2022.

  6. arXiv:2206.05979  [pdf, other]

    stat.ML cs.LG

    Top Two Algorithms Revisited

    Authors: Marc Jourdan, Rémy Degenne, Dorian Baudry, Rianne de Heide, Emilie Kaufmann

    Abstract: Top Two algorithms arose as an adaptation of Thompson sampling to best arm identification in multi-armed bandit models (Russo, 2016), for parametric families of arms. They select the next arm to sample from by randomizing among two candidate arms, a leader and a challenger. Despite their good empirical performance, theoretical guarantees for fixed-confidence best arm identification have only been…

    Submitted 4 October, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

    Comments: 75 pages, 8 figures, 3 tables

  7. arXiv:2203.10883  [pdf, other]

    cs.LG

    Efficient Algorithms for Extreme Bandits

    Authors: Dorian Baudry, Yoan Russac, Emilie Kaufmann

    Abstract: In this paper, we contribute to the Extreme Bandit problem, a variant of Multi-Armed Bandits in which the learner seeks to collect the largest possible reward. We first study the concentration of the maximum of i.i.d. random variables under mild assumptions on the tail of the reward distributions. This analysis motivates the introduction of Quantile of Maxima (QoMax). The properties of QoMax are s…

    Submitted 21 March, 2022; originally announced March 2022.

    Comments: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS) 2022

  8. arXiv:2111.09724  [pdf, other]

    stat.ML cs.LG

    From Optimality to Robustness: Dirichlet Sampling Strategies in Stochastic Bandits

    Authors: Dorian Baudry, Patrick Saux, Odalric-Ambrym Maillard

    Abstract: The stochastic multi-arm bandit problem has been extensively studied under standard assumptions on the arm's distribution (e.g., bounded with known support, exponential family, etc.). These assumptions are suitable for many real-world problems but sometimes they require knowledge (on tails for instance) that may not be precisely accessible to the practitioner, raising the question of the robustness o…

    Submitted 18 November, 2021; originally announced November 2021.

    Journal ref: NeurIPS 2021, Dec 2021, Sydney, Australia

  9. arXiv:2106.10935  [pdf, other]

    cs.AI cs.LG

    On Limited-Memory Subsampling Strategies for Bandits

    Authors: Dorian Baudry, Yoan Russac, Olivier Cappé

    Abstract: There has been a recent surge of interest in nonparametric bandit algorithms based on subsampling. One drawback of these approaches, however, is the additional complexity required by random subsampling and the storage of the full history of rewards. Our first contribution is to show that a simple deterministic subsampling rule, proposed in the recent work of Baudry et al. (2020) under the name of "…

    Submitted 21 June, 2021; originally announced June 2021.

    Journal ref: ICML 2021 - International Conference on Machine Learning, Jul 2021, Vienna - Virtual, Austria

  10. arXiv:2012.05754  [pdf, other]

    cs.LG

    Optimal Thompson Sampling strategies for support-aware CVaR bandits

    Authors: Dorian Baudry, Romain Gautron, Emilie Kaufmann, Odalric-Ambrym Maillard

    Abstract: In this paper we study a multi-arm bandit problem in which the quality of each arm is measured by the Conditional Value at Risk (CVaR) at some level alpha of the reward distribution. While existing works in this setting mainly focus on Upper Confidence Bound algorithms, we introduce a new Thompson Sampling approach for CVaR bandits on bounded rewards that is flexible enough to solve a variety of p…

    Submitted 21 March, 2022; v1 submitted 10 December, 2020; originally announced December 2020.

    Comments: Presented at the Thirty-eighth International Conference on Machine Learning (ICML 2021). In this version we refine Lemma 2 and correct its proof (does not change the main theorems)

  11. arXiv:2010.14323  [pdf, other]

    stat.ML cs.LG

    Sub-sampling for Efficient Non-Parametric Bandit Exploration

    Authors: Dorian Baudry, Emilie Kaufmann, Odalric-Ambrym Maillard

    Abstract: In this paper we propose the first multi-armed bandit algorithm based on re-sampling that achieves asymptotically optimal regret simultaneously for different families of arms (namely Bernoulli, Gaussian and Poisson distributions). Unlike Thompson Sampling, which requires specifying a different prior to be optimal in each case, our proposal RB-SDA does not need any distribution-dependent tuning. RB-…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020, Dec 2020, Vancouver, Canada