Showing 1–32 of 32 results for author: Garivier, A

  1. arXiv:2310.20266  [pdf, other]

    cs.AI math.OC math.PR

    Beyond Average Return in Markov Decision Processes

    Authors: Alexandre Marthe, Aurélien Garivier, Claire Vernade

    Abstract: What are the functionals of the reward that can be computed and optimized exactly in Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic Programming (DP) can only handle these operations efficiently for certain classes of statistics. We summarize the characterization of these classes for policy evaluation, and give a new answer for the planning problem. Interestingly, we…

    Submitted 19 February, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023, Dec 2023, New Orleans, United States

  2. arXiv:2306.14535  [pdf, ps, other]

    cs.AI math.ST

    About the Cost of Central Privacy in Density Estimation

    Authors: Clément Lalanne, Aurélien Garivier, Rémi Gribonval

    Abstract: We study non-parametric density estimation for densities in Lipschitz and Sobolev spaces, and under central privacy. In particular, we investigate regimes where the privacy budget is not supposed to be constant. We consider the classical definition of central differential privacy, but also the more recent notion of central concentrated differential privacy. We recover the result of Barber & Duchi…

    Submitted 26 December, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

    Journal ref: Transactions on Machine Learning Research Journal, 2023

  3. arXiv:2210.05222  [pdf, other]

    cs.AI math.ST

    Regret Analysis of the Stochastic Direct Search Method for Blind Resource Allocation

    Authors: Juliette Achddou, Olivier Cappé, Aurélien Garivier

    Abstract: Motivated by programmatic advertising optimization, we consider the task of sequentially allocating budget across a set of resources. At every time step, a feasible allocation is chosen and only a corresponding random return is observed. The goal is to maximize the cumulative expected sum of returns. This is a realistic model for budget allocation across subdivisions of marketing campaigns, when t…

    Submitted 11 October, 2022; originally announced October 2022.

  4. arXiv:2210.00895  [pdf, other]

    cs.LG cs.IT math.ST stat.ML

    On Best-Arm Identification with a Fixed Budget in Non-Parametric Multi-Armed Bandits

    Authors: Antoine Barrier, Aurélien Garivier, Gilles Stoltz

    Abstract: We lay the foundations of a non-parametric theory of best-arm identification in multi-armed bandits with a fixed budget T. We consider general, possibly non-parametric, models D for distributions over the arms; an overarching example is the model D = P(0,1) of all probability distributions over [0,1]. We propose upper bounds on the average log-probability of misidentifying the optimal arm based on…

    Submitted 6 February, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

    Journal ref: ALT 2023 - The 34th International Conference on Algorithmic Learning Theory, Feb 2023, Singapore, Singapore

  5. arXiv:2205.06069  [pdf, ps, other]

    cs.DS math.ST

    Sequential algorithms for testing identity and closeness of distributions

    Authors: Omar Fawzi, Nicolas Flammarion, Aurélien Garivier, Aadil Oufkir

    Abstract: What advantage do \emph{sequential} procedures provide over batch algorithms for testing properties of unknown distributions? Focusing on the problem of testing whether two distributions $\mathcal{D}_1$ and $\mathcal{D}_2$ on $\{1,\dots, n\}$ are equal or $ε$-far, we give several answers to this question. We show that for a small alphabet size $n$, there is a sequential algorithm that outperforms…

    Submitted 12 May, 2022; originally announced May 2022.

  6. arXiv:2105.12978  [pdf, other]

    math.ST stat.ML

    A Non-asymptotic Approach to Best-Arm Identification for Gaussian Bandits

    Authors: Antoine Barrier, Aurélien Garivier, Tomáš Kocák

    Abstract: We propose a new strategy for best-arm identification with fixed confidence of Gaussian variables with bounded means and unit variance. This strategy, called Exploration-Biased Sampling, is not only asymptotically optimal: it is to the best of our knowledge the first strategy with non-asymptotic bounds that asymptotically matches the sample complexity. But the main advantage over other algorithms l…

    Submitted 7 March, 2022; v1 submitted 27 May, 2021; originally announced May 2021.

    Journal ref: 25th International Conference on Artificial Intelligence and Statistics (AISTATS) 2022, Mar 2022, Valencia, Spain

  7. arXiv:1905.03495  [pdf, other]

    math.ST

    Non-Asymptotic Sequential Tests for Overlapping Hypotheses and application to near optimal arm identification in bandit models

    Authors: Aurélien Garivier, Emilie Kaufmann

    Abstract: In this paper, we study sequential testing problems with \emph{overlapping} hypotheses. We first focus on the simple problem of assessing if the mean $μ$ of a Gaussian distribution is smaller or larger than a fixed $ε>0$; if $μ\in(-ε,ε)$, both answers are considered to be correct. Then, we consider PAC-best arm identification in a bandit model: given $K$ probability distributions on $\mathbb{R}$ w…

    Submitted 18 November, 2021; v1 submitted 9 May, 2019; originally announced May 2019.

    Journal ref: Sequential Analysis, Taylor & Francis, 2021

  8. arXiv:1805.05071  [pdf, other]

    stat.ML cs.LG math.ST

    KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints

    Authors: Aurélien Garivier, Hédi Hadiji, Pierre Ménard, Gilles Stoltz

    Abstract: We consider $K$-armed stochastic bandits and consider cumulative regret bounds up to time $T$. We are interested in strategies achieving simultaneously a distribution-free regret bound of optimal order $\sqrt{KT}$ and a distribution-dependent regret that is asymptotically optimal, that is, matching the $κ\ln T$ lower bound by Lai and Robbins (1985) and Burnetas and Katehakis (1996), where $κ$ is t…

    Submitted 1 July, 2022; v1 submitted 14 May, 2018; originally announced May 2018.

  9. arXiv:1711.04454  [pdf, other]

    math.ST stat.ML

    Thresholding Bandit for Dose-ranging: The Impact of Monotonicity

    Authors: Aurélien Garivier, Pierre Ménard, Laurent Rossi

    Abstract: We analyze the sample complexity of the thresholding bandit problem, with and without the assumption that the mean values of the arms are increasing. In each case, we provide a lower bound valid for any risk $δ$ and any $δ$-correct algorithm; in addition, we propose an algorithm whose sample complexity is of the same order of magnitude for small risks. This work is motivated by phase 1 clinical tr…

    Submitted 24 July, 2018; v1 submitted 13 November, 2017; originally announced November 2017.

  10. arXiv:1702.07211  [pdf, ps, other]

    stat.ML cs.LG math.ST

    A minimax and asymptotically optimal algorithm for stochastic bandits

    Authors: Pierre Ménard, Aurélien Garivier

    Abstract: We propose the kl-UCB ++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins' lower bound) and minimax optimal. This is the first algorithm proved to enjoy these two properties at the same time. This work thus merges two different lines of research with s…

    Submitted 20 September, 2017; v1 submitted 23 February, 2017; originally announced February 2017.

    Journal ref: Algorithmic Learning Theory, Springer, 2017, 2017 Algorithmic Learning Theory Conference 76

  11. arXiv:1702.00001  [pdf, other]

    cs.LG math.ST stat.ML

    Learning the distribution with largest mean: two bandit frameworks

    Authors: Emilie Kaufmann, Aurélien Garivier

    Abstract: Over the past few years, the multi-armed bandit model has become increasingly popular in the machine learning community, partly because of applications including online content optimization. This paper reviews two different sequential learning tasks that have been considered in the bandit literature; they can be formulated as (sequentially) learning which distribution has the highest mean among a…

    Submitted 7 November, 2017; v1 submitted 31 January, 2017; originally announced February 2017.

    Journal ref: ESAIM: Proceedings and Surveys, EDP Sciences, to appear, 2017, pp.1 - 10

  12. arXiv:1605.08988  [pdf, other]

    math.ST cs.LG

    On Explore-Then-Commit Strategies

    Authors: Aurélien Garivier, Emilie Kaufmann, Tor Lattimore

    Abstract: We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by exploitation are necessarily suboptimal. The results hold regardless of whether or not the difference in means between the two arms is known. Besides the main message…

    Submitted 14 November, 2016; v1 submitted 29 May, 2016; originally announced May 2016.

  13. arXiv:1602.07182  [pdf, other]

    math.ST cs.LG

    Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

    Authors: Aurélien Garivier, Pierre Ménard, Gilles Stoltz

    Abstract: We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide straightforward proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show in particular that in an initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds…

    Submitted 13 October, 2018; v1 submitted 23 February, 2016; originally announced February 2016.

  14. arXiv:1602.04676  [pdf, ps, other]

    math.ST cs.GT stat.ML

    Maximin Action Identification: A New Bandit Framework for Games

    Authors: Aurélien Garivier, Emilie Kaufmann, Wouter Koolen

    Abstract: We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists in identifying the best action in a game, when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower- and upper-confidence bounds; and Maximin-Racing, which ope…

    Submitted 15 February, 2016; originally announced February 2016.

  15. arXiv:1602.04589  [pdf, ps, other]

    math.ST cs.LG stat.ML

    Optimal Best Arm Identification with Fixed Confidence

    Authors: Aurélien Garivier, Emilie Kaufmann

    Abstract: We give a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the `Track-and-Stop' strategy, which we prove to be asymptotically optimal. It consists in a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and in a stopping ru…

    Submitted 1 June, 2016; v1 submitted 15 February, 2016; originally announced February 2016.

    Comments: Conference on Learning Theory (COLT), Jun 2016, New York, United States

  16. arXiv:1508.06505  [pdf, other]

    math.ST

    Conditional quantile sequential estimation for stochastic codes

    Authors: Tatiana Labopin-Richard, Fabrice Gamboa, Aurélien Garivier, Jerome Stenger

    Abstract: We propose and analyze an algorithm for the sequential estimation of a conditional quantile in the context of real stochastic codes with vector-valued inputs. Our algorithm is based on k-nearest neighbors smoothing within a Robbins-Monro estimator. We discuss the convergence of the algorithm under some conditions on the stochastic code. We provide non-asymptotic rates of convergence of the mean squ…

    Submitted 5 August, 2019; v1 submitted 26 August, 2015; originally announced August 2015.

  17. arXiv:1405.6677  [pdf, other]

    math.ST q-fin.RM

    Bregman superquantiles. Estimation methods and applications

    Authors: Tatiana Labopin-Richard, Fabrice Gamboa, Aurélien Garivier, Bertrand Iooss

    Abstract: In this work, we extend some quantities introduced in "Optimization of conditional value-at-risk" of R.T. Rockafellar and S. Uryasev to the case where the proximity between real numbers is measured by using a Bregman divergence. This leads to the definition of the Bregman superquantile. Axioms of a coherent measure of risk discussed in "Coherent approaches to risk in optimization under uncertainty"…

    Submitted 6 January, 2016; v1 submitted 26 May, 2014; originally announced May 2014.

  18. arXiv:1405.3224  [pdf, other]

    math.ST cs.LG stat.ML

    On the Complexity of A/B Testing

    Authors: Emilie Kaufmann, Olivier Cappé, Aurélien Garivier

    Abstract: A/B testing refers to the task of determining the best option among two alternatives that yield random outcomes. We provide distribution-dependent lower bounds for the performance of A/B testing that improve over the results currently available both in the fixed-confidence (or delta-PAC) and fixed-budget settings. When the distributions of the outcomes are Gaussian, we prove that the complexity…

    Submitted 24 February, 2015; v1 submitted 13 May, 2014; originally announced May 2014.

    Journal ref: Conference on Learning Theory, Jun 2014, Barcelona, Spain. JMLR: Workshop and Conference Proceedings, 35, pp.461-481

  19. arXiv:1403.3758  [pdf, other]

    math.ST cs.DB

    Big Data Analytics - Retour vers le Futur 3; De Statisticien à Data Scientist

    Authors: Philippe Besse, Aurélien Garivier, Jean-Michel Loubes

    Abstract: The rapid evolution of information systems managing more and more voluminous data has caused profound paradigm shifts in the job of the statistician, who has successively become data miner, bioinformatician and now data scientist. Without aiming at completeness, and after illustrating these successive mutations, this article briefly introduces the new research issues that quickly arise in Statistics,…

    Submitted 21 May, 2014; v1 submitted 15 March, 2014; originally announced March 2014.

    Comments: in French

  20. Informational Confidence Bounds for Self-Normalized Averages and Applications

    Authors: Aurélien Garivier

    Abstract: We present deviation bounds for self-normalized averages and applications to estimation with a random number of observations. The results rely on a peeling argument in exponential martingale techniques that represents an alternative to the method of mixture. The motivating examples of bandit problems and context tree estimation are detailed.

    Submitted 13 September, 2013; originally announced September 2013.

    ACM Class: G.3

    Journal ref: 2013 IEEE Information Theory Workshop p.489-493

  21. arXiv:1210.1136  [pdf, ps, other]

    math.PR math.ST

    Kullback-Leibler upper confidence bounds for optimal sequential allocation

    Authors: Olivier Cappé, Aurélien Garivier, Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz

    Abstract: We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins [J. R. Stat. Soc. Ser. B Stat. Methodol. 41 (1979) 148-177], based on upper confidence bounds of the arm payoffs computed using the Kullback-Leibler divergence. We consider two classes of distributions for which instances of this…

    Submitted 26 August, 2013; v1 submitted 3 October, 2012; originally announced October 2012.

    Comments: Published at http://dx.doi.org/10.1214/13-AOS1119 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS1119

    Journal ref: Annals of Statistics 2013, Vol. 41, No. 3, 1516-1541

  22. Sequential Monte Carlo smoothing for general state space hidden Markov models

    Authors: Randal Douc, Aurélien Garivier, Eric Moulines, Jimmy Olsson

    Abstract: Computing smoothing distributions, the distributions of one or more states conditional on past, present, and future observations, is a recurring problem when operating on general hidden Markov models. The aim of this paper is to provide a foundation of particle-based approximation of such distributions and to analyze, in a common unifying framework, different schemes producing such approximations.…

    Submitted 14 February, 2012; originally announced February 2012.

    Comments: Published at http://dx.doi.org/10.1214/10-AAP735 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org). arXiv admin note: text overlap with arXiv:1012.4183 by other authors

    Report number: IMS-AAP-AAP735

    Journal ref: Annals of Applied Probability 2011, Vol. 21, No. 6, 2109-2145

  23. arXiv:1111.2191  [pdf, other]

    math.ST

    Oracle approach and slope heuristic in context tree estimation

    Authors: A. Garivier, M. Lerasle

    Abstract: We introduce a general approach to prove oracle properties in context tree selection. The results derive from a concentration condition that is verified, for example, by mixing processes. Moreover, we show the superiority of the oracle approach from a non-asymptotic point of view in simulations where the classical BIC estimator has nice oracle properties even when it does not recover the source.…

    Submitted 9 November, 2011; originally announced November 2011.

    Comments: 51 pages, 10 figures

  24. arXiv:1110.5447  [pdf, ps, other]

    math.OC cs.LG

    Optimal discovery with probabilistic expert advice

    Authors: Sébastien Bubeck, Damien Ernst, Aurélien Garivier

    Abstract: We consider an original problem that arises from the issue of security analysis of a power system and that we name optimal discovery with probabilistic expert advice. We address it with an algorithm based on the optimistic paradigm and the Good-Turing missing mass estimator. We show that this strategy uniformly attains the optimal discovery rate in a macroscopic limit sense, under some assumptions…

    Submitted 25 October, 2011; originally announced October 2011.

    MSC Class: 93E35

  25. arXiv:1106.5971  [pdf, ps, other]

    math.PR cs.DS

    Perfect Simulation Of Processes With Long Memory: A `Coupling Into And From The Past' Algorithm

    Authors: Aurélien Garivier

    Abstract: We describe a new algorithm for the perfect simulation of variable length Markov chains and random systems with perfect connections. This algorithm, which generalizes Propp and Wilson's simulation scheme, is based on the idea of coupling into and from the past. It improves on existing algorithms by relaxing the conditions on the kernel and by accelerating convergence, even in the simple case of fi…

    Submitted 14 October, 2013; v1 submitted 29 June, 2011; originally announced June 2011.

    Comments: 22 pages, 8 figures

    MSC Class: 60J22 ACM Class: G.3

  26. arXiv:1102.2490  [pdf, ps, other]

    math.ST cs.LG eess.SY math.OC

    The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond

    Authors: Aurélien Garivier, Olivier Cappé

    Abstract: This paper presents a finite-time analysis of the KL-UCB algorithm, an online, horizon-free index policy for stochastic bandit problems. We prove two distinct results: first, for arbitrary bounded rewards, the KL-UCB algorithm satisfies a uniformly better regret bound than UCB or UCB2; second, in the special case of Bernoulli rewards, it reaches the lower bound of Lai and Robbins. Furthermore, we…

    Submitted 29 August, 2013; v1 submitted 12 February, 2011; originally announced February 2011.

    Comments: 18 pages, 3 figures; Conf. Comput. Learning Theory (COLT) 2011 in Budapest, Hungary

    MSC Class: 93E35

    Journal ref: Conference On Learning Theory n°24 Jul. 2011 pp.359-376

  27. Joint estimation of intersecting context tree models

    Authors: Antonio Galves, Aurélien Garivier, Elisabeth Gassiat

    Abstract: We study a problem of model selection for data produced by two different context tree sources. Motivated by linguistic questions, we consider the case where the probabilistic context trees corresponding to the two sources are finite and share many of their contexts. In order to understand the differences between the two sources, it is important to identify which contexts and which transition pro…

    Submitted 3 October, 2012; v1 submitted 3 February, 2011; originally announced February 2011.

    ACM Class: G.3

  28. arXiv:1011.2424  [pdf, ps, other]

    math.ST math.PR

    Context Tree Selection: A Unifying View

    Authors: Aurélien Garivier, Florencia Leonardi

    Abstract: The present paper investigates non-asymptotic properties of two popular procedures of context tree (or Variable Length Markov Chains) estimation: Rissanen's algorithm Context and the Penalized Maximum Likelihood criterion. First showing how they are related, we prove finite horizon bounds for the probability of over- and under-estimation. Concerning overestimation, no boundedness or loss-of-memory…

    Submitted 29 June, 2011; v1 submitted 10 November, 2010; originally announced November 2010.

  29. arXiv:1004.5229  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Optimism in Reinforcement Learning and Kullback-Leibler Divergence

    Authors: Sarah Filippi, Olivier Cappé, Aurélien Garivier

    Abstract: We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. In MDPs, optimism can be implemented by carrying out extended value iterations under a constraint of consistency with the estimated model transition probabilities. The UCRL2 algorithm by Auer, Jaksch and Ortner (2009), which follows this strategy, has recen…

    Submitted 13 October, 2010; v1 submitted 29 April, 2010; originally announced April 2010.

    Comments: This work has been accepted and presented at ALLERTON 2010; Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, Monticello (Illinois), United States (2010)

  30. arXiv:0904.0316  [pdf, ps, other]

    math.ST

    On the Forward Filtering Backward Smoothing particle approximations of the smoothing distribution in general state spaces models

    Authors: Randal Douc, Aurélien Garivier, Eric Moulines, Jimmy Olsson

    Abstract: A prevalent problem in general state-space models is the approximation of the smoothing distribution of a state, or a sequence of states, conditional on the observations from the past, the present, and the future. The aim of this paper is to provide a rigorous foundation for the calculation, or approximation, of such smoothed distributions, and to analyse in a common unifying framework different…

    Submitted 2 April, 2009; originally announced April 2009.

    MSC Class: 60G10; 60K35; 60G18

  31. arXiv:0805.3415  [pdf, ps, other]

    math.ST

    On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems

    Authors: Aurélien Garivier, Eric Moulines

    Abstract: Multi-armed bandit problems are considered as a paradigm of the trade-off between exploring the environment to find profitable actions and exploiting what is already known. In the stationary case, where the distributions of the rewards do not change in time, Upper-Confidence Bound (UCB) policies have been shown to be rate optimal. A challenging variant of the MABP is the non-stationary bandit problem…

    Submitted 22 May, 2008; originally announced May 2008.

    Comments: 24 pages

  32. Coding on countably infinite alphabets

    Authors: Stéphane Boucheron, Aurélien Garivier, Elisabeth Gassiat

    Abstract: This paper describes universal lossless coding strategies for compressing sources on countably infinite alphabets. Classes of memoryless sources defined by an envelope condition on the marginal distribution provide benchmarks for coding techniques originating from the theory of universal coding over finite alphabets. We prove general upper-bounds on minimax regret and lower-bounds on minimax red…

    Submitted 16 January, 2008; originally announced January 2008.

    Comments: 33 pages

    MSC Class: 62B10; 68P30; 94A29

    Journal ref: IEEE Transactions on Information Theory, Volume 55, Issue 1, pp. 358-373, Jan. 2009