Search | arXiv e-print repository

arXiv:2310.20266 [pdf, other]

Beyond Average Return in Markov Decision Processes

Authors: Alexandre Marthe, Aurélien Garivier, Claire Vernade

Abstract: What are the functionals of the reward that can be computed and optimized exactly in Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic Programming (DP) can only handle these operations efficiently for certain classes of statistics. We summarize the characterization of these classes for policy evaluation, and give a new answer for the planning problem. Interestingly, we prove that only generalized means can be optimized exactly, even in the more general framework of Distributional Reinforcement Learning (DistRL). DistRL does, however, make it possible to evaluate other functionals approximately. We provide error bounds on the resulting estimators, and discuss the potential of this approach as well as its limitations. These results contribute to advancing the theory of Markov Decision Processes by examining overall characteristics of the return, and particularly risk-conscious strategies.

Submitted 19 February, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023, Dec 2023, New Orleans, United States

arXiv:2306.14535 [pdf, ps, other]

About the Cost of Central Privacy in Density Estimation

Authors: Clément Lalanne, Aurélien Garivier, Rémi Gribonval

Abstract: We study non-parametric density estimation for densities in Lipschitz and Sobolev spaces, and under central privacy. In particular, we investigate regimes where the privacy budget is not supposed to be constant. We consider the classical definition of central differential privacy, but also the more recent notion of central concentrated differential privacy. We recover the result of Barber & Duchi (2014) stating that histogram estimators are optimal against Lipschitz distributions for the L2 risk, and under regular differential privacy, and we extend it to other norms and notions of privacy. Then, we investigate higher degrees of smoothness, drawing two conclusions: First, and contrary to what happens with constant privacy budget (Wasserman & Zhou, 2010), there are regimes where imposing privacy degrades the regular minimax risk of estimation on Sobolev densities. Second, so-called projection estimators are near-optimal against the same classes of densities in this new setup with pure differential privacy, but contrary to the constant privacy budget case, it comes at the cost of relaxation. With zero concentrated differential privacy, there is no need for relaxation, and we prove that the estimation is optimal.

Submitted 26 December, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

Journal ref: Transactions on Machine Learning Research Journal, 2023

arXiv:2210.05222 [pdf, other]

Regret Analysis of the Stochastic Direct Search Method for Blind Resource Allocation

Authors: Juliette Achddou, Olivier Cappé, Aurélien Garivier

Abstract: Motivated by programmatic advertising optimization, we consider the task of sequentially allocating budget across a set of resources. At every time step, a feasible allocation is chosen and only a corresponding random return is observed. The goal is to maximize the cumulative expected sum of returns. This is a realistic model for budget allocation across subdivisions of marketing campaigns, when the objective is to maximize the number of conversions. We study direct search (aka pattern search) methods for linearly constrained and derivative-free optimization in the presence of noise. Those algorithms are easy to implement and particularly suited to constrained optimization. They have not yet been analyzed from the perspective of cumulative regret. We provide a regret upper-bound of the order of $T^{2/3}$ in the general case. Our mathematical analysis also establishes, as a by-product, time-independent regret bounds in the deterministic, unconstrained case. We also propose an improved version of the method relying on sequential tests to accelerate the identification of descent directions.

Submitted 11 October, 2022; originally announced October 2022.

arXiv:2210.00895 [pdf, other]

On Best-Arm Identification with a Fixed Budget in Non-Parametric Multi-Armed Bandits

Authors: Antoine Barrier, Aurélien Garivier, Gilles Stoltz

Abstract: We lay the foundations of a non-parametric theory of best-arm identification in multi-armed bandits with a fixed budget T. We consider general, possibly non-parametric, models D for distributions over the arms; an overarching example is the model D = P(0,1) of all probability distributions over [0,1]. We propose upper bounds on the average log-probability of misidentifying the optimal arm based on information-theoretic quantities that correspond to infima over Kullback-Leibler divergences between some distributions in D and a given distribution. This is made possible by a refined analysis of the successive-rejects strategy of Audibert, Bubeck, and Munos (2010). We finally provide lower bounds on the same average log-probability, also in terms of the same new information-theoretic quantities; these lower bounds are larger when the (natural) assumptions on the considered strategies are stronger. All these new upper and lower bounds generalize existing bounds based, e.g., on gaps between distributions.

Submitted 6 February, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

Journal ref: ALT 2023 - The 34th International Conference on Algorithmic Learning Theory, Feb 2023, Singapore, Singapore

arXiv:2205.06069 [pdf, ps, other]

Sequential algorithms for testing identity and closeness of distributions

Authors: Omar Fawzi, Nicolas Flammarion, Aurélien Garivier, Aadil Oufkir

Abstract: What advantage do sequential procedures provide over batch algorithms for testing properties of unknown distributions? Focusing on the problem of testing whether two distributions $\mathcal{D}_1$ and $\mathcal{D}_2$ on $\{1,\dots, n\}$ are equal or $ε$-far, we give several answers to this question. We show that for a small alphabet size $n$, there is a sequential algorithm that outperforms any batch algorithm by a factor of at least $4$ in terms of sample complexity. For a general alphabet size $n$, we give a sequential algorithm that uses no more samples than its batch counterpart, and possibly fewer if the actual distance $TV(\mathcal{D}_1, \mathcal{D}_2)$ between $\mathcal{D}_1$ and $\mathcal{D}_2$ is larger than $ε$. As a corollary, letting $ε$ go to $0$, we obtain a sequential algorithm for testing closeness when no a priori bound on $TV(\mathcal{D}_1, \mathcal{D}_2)$ is given that has a sample complexity $\tilde{\mathcal{O}}(\frac{n^{2/3}}{TV(\mathcal{D}_1, \mathcal{D}_2)^{4/3}})$: this improves over the $\tilde{\mathcal{O}}(\frac{n/\log n}{TV(\mathcal{D}_1, \mathcal{D}_2)^{2} })$ tester of \cite{daskalakis2017optimal} and is optimal up to multiplicative constants. We also establish limitations of sequential algorithms for the problem of testing identity and closeness: they can improve the worst case number of samples by at most a constant factor.

Submitted 12 May, 2022; originally announced May 2022.

arXiv:2105.12978 [pdf, other]

A Non-asymptotic Approach to Best-Arm Identification for Gaussian Bandits

Authors: Antoine Barrier, Aurélien Garivier, Tomáš Kocák

Abstract: We propose a new strategy for best-arm identification with fixed confidence of Gaussian variables with bounded means and unit variance. This strategy, called Exploration-Biased Sampling, is not only asymptotically optimal: it is to the best of our knowledge the first strategy with non-asymptotic bounds that asymptotically matches the sample complexity. But the main advantage over other algorithms like Track-and-Stop is an improved behavior regarding exploration: Exploration-Biased Sampling is biased towards exploration in a subtle but natural way that makes it more stable and interpretable. These improvements are made possible by a new analysis of the sample complexity optimization problem, which yields a faster numerical resolution scheme and several quantitative regularity results that we believe to be of high independent interest.

Submitted 7 March, 2022; v1 submitted 27 May, 2021; originally announced May 2021.

Journal ref: 25th International Conference on Artificial Intelligence and Statistics (AISTATS) 2022, Mar 2022, Valencia, Spain

arXiv:1905.03495 [pdf, other]

Non-Asymptotic Sequential Tests for Overlapping Hypotheses and application to near optimal arm identification in bandit models

Authors: Aurélien Garivier, Emilie Kaufmann

Abstract: In this paper, we study sequential testing problems with overlapping hypotheses. We first focus on the simple problem of assessing if the mean $μ$ of a Gaussian distribution is smaller or larger than a fixed $ε>0$; if $μ\in(-ε,ε)$, both answers are considered to be correct. Then, we consider PAC-best arm identification in a bandit model: given $K$ probability distributions on $\mathbb{R}$ with means $μ_1,\dots,μ_K$, we derive the asymptotic complexity of identifying, with risk at most $δ$, an index $I\in\{1,\dots,K\}$ such that $μ_I\geq \max_iμ_i -ε$. We provide non-asymptotic bounds on the error of a parallel General Likelihood Ratio Test, which can also be used for more general testing problems. We further propose lower bounds on the number of observations needed to identify a correct hypothesis. Those lower bounds rely on information-theoretic arguments, and specifically on two versions of a change of measure lemma (a high-level form, and a low-level form) whose relative merits are discussed.

Submitted 18 November, 2021; v1 submitted 9 May, 2019; originally announced May 2019.

Journal ref: Sequential Analysis, Taylor & Francis, 2021

arXiv:1805.05071 [pdf, other]

KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints

Authors: Aurélien Garivier, Hédi Hadiji, Pierre Ménard, Gilles Stoltz

Abstract: We consider $K$-armed stochastic bandits and cumulative regret bounds up to time $T$. We are interested in strategies achieving simultaneously a distribution-free regret bound of optimal order $\sqrt{KT}$ and a distribution-dependent regret that is asymptotically optimal, that is, matching the $κ\ln T$ lower bound by Lai and Robbins (1985) and Burnetas and Katehakis (1996), where $κ$ is the optimal problem-dependent constant. This constant $κ$ depends on the model $\mathcal{D}$ considered (the family of possible distributions over the arms). Ménard and Garivier (2017) provided strategies achieving such a bi-optimality in the parametric case of models given by one-dimensional exponential families, while Lattimore (2016, 2018) did so for the family of (sub)Gaussian distributions with variance less than $1$. We extend this result to the non-parametric case of all distributions over $[0,1]$. We do so by combining the MOSS strategy by Audibert and Bubeck (2009), which enjoys a distribution-free regret bound of optimal order $\sqrt{KT}$, and the KL-UCB strategy by Cappé et al. (2013), for which we provide in passing the first analysis of an optimal distribution-dependent $κ\ln T$ regret bound in the model of all distributions over $[0,1]$.
We were able to obtain this non-parametric bi-optimality result while working hard to streamline the proofs (of previously known regret bounds and thus of the new analyses carried out); a second merit of the present contribution is therefore to provide a review of proofs of classical regret bounds for index-based strategies for $K$-armed stochastic bandits.

Submitted 1 July, 2022; v1 submitted 14 May, 2018; originally announced May 2018.

arXiv:1711.04454 [pdf, other]

Thresholding Bandit for Dose-ranging: The Impact of Monotonicity

Authors: Aurélien Garivier, Pierre Ménard, Laurent Rossi

Abstract: We analyze the sample complexity of the thresholding bandit problem, with and without the assumption that the mean values of the arms are increasing. In each case, we provide a lower bound valid for any risk $δ$ and any $δ$-correct algorithm; in addition, we propose an algorithm whose sample complexity is of the same order of magnitude for small risks. This work is motivated by phase 1 clinical trials, a practically important setting where the arm means are increasing by nature, and where no satisfactory solution is available so far.

Submitted 24 July, 2018; v1 submitted 13 November, 2017; originally announced November 2017.

arXiv:1702.07211 [pdf, ps, other]

A minimax and asymptotically optimal algorithm for stochastic bandits

Authors: Pierre Ménard, Aurélien Garivier

Abstract: We propose the kl-UCB++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins' lower bound) and minimax optimal. This is the first algorithm proved to enjoy these two properties at the same time. This work thus merges two different lines of research with simple and clear proofs.

Submitted 20 September, 2017; v1 submitted 23 February, 2017; originally announced February 2017.

Journal ref: Algorithmic Learning Theory, Springer, 2017, 2017 Algorithmic Learning Theory Conference 76

arXiv:1702.00001 [pdf, other]

Learning the distribution with largest mean: two bandit frameworks

Authors: Emilie Kaufmann, Aurélien Garivier

Abstract: Over the past few years, the multi-armed bandit model has become increasingly popular in the machine learning community, partly because of applications including online content optimization. This paper reviews two different sequential learning tasks that have been considered in the bandit literature; they can be formulated as (sequentially) learning which distribution has the highest mean among a set of distributions, with some constraints on the learning process. For both of them (regret minimization and best arm identification) we present recent, asymptotically optimal algorithms. We compare the behaviors of the sampling rule of each algorithm as well as the complexity terms associated with each problem.

Submitted 7 November, 2017; v1 submitted 31 January, 2017; originally announced February 2017.

Journal ref: ESAIM: Proceedings and Surveys, EDP Sciences, to appear, 2017, pp. 1-10

arXiv:1605.08988 [pdf, other]

On Explore-Then-Commit Strategies

Authors: Aurélien Garivier, Emilie Kaufmann, Tor Lattimore

Abstract: We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by exploitation are necessarily suboptimal. The results hold regardless of whether or not the difference in means between the two arms is known. Besides the main message, we also refine existing deviation inequalities, which allow us to design fully sequential strategies with finite-time regret guarantees that are (a) asymptotically optimal as the horizon grows and (b) order-optimal in the minimax sense. Furthermore, we provide empirical evidence that the theory also holds in practice and discuss extensions to non-Gaussian and multiple-armed cases.

Submitted 14 November, 2016; v1 submitted 29 May, 2016; originally announced May 2016.

arXiv:1602.07182 [pdf, other]

Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Authors: Aurélien Garivier, Pierre Ménard, Gilles Stoltz

Abstract: We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide straightforward proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show in particular that in an initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and are stripped of all unnecessary complications.

Submitted 13 October, 2018; v1 submitted 23 February, 2016; originally announced February 2016.

arXiv:1602.04676 [pdf, ps, other]

Maximin Action Identification: A New Bandit Framework for Games

Authors: Aurélien Garivier, Emilie Kaufmann, Wouter Koolen

Abstract: We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists in identifying the best action in a game, when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower- and upper-confidence bounds; and Maximin-Racing, which operates by successively eliminating the sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We sketch a lower bound analysis, and possible connections to an optimal algorithm.

Submitted 15 February, 2016; originally announced February 2016.

arXiv:1602.04589 [pdf, ps, other]

Optimal Best Arm Identification with Fixed Confidence

Authors: Aurélien Garivier, Emilie Kaufmann

Abstract: We give a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the 'Track-and-Stop' strategy, which we prove to be asymptotically optimal. It consists in a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and in a stopping rule named after Chernoff, for which we give a new analysis.

Submitted 1 June, 2016; v1 submitted 15 February, 2016; originally announced February 2016.

Comments: Conference on Learning Theory (COLT), Jun 2016, New York, United States

arXiv:1508.06505 [pdf, other]

Conditional quantile sequential estimation for stochastic codes

Authors: Tatiana Labopin-Richard, Fabrice Gamboa, Aurélien Garivier, Jerome Stenger

Abstract: We propose and analyze an algorithm for the sequential estimation of a conditional quantile in the context of real stochastic codes with vector-valued inputs. Our algorithm is based on k-nearest neighbors smoothing within a Robbins-Monro estimator. We discuss the convergence of the algorithm under some conditions on the stochastic code. We provide non-asymptotic rates of convergence of the mean squared error and we discuss the tuning of the algorithm's parameters.

Submitted 5 August, 2019; v1 submitted 26 August, 2015; originally announced August 2015.

arXiv:1405.6677 [pdf, other]

Bregman superquantiles. Estimation methods and applications

Authors: Tatiana Labopin-Richard, Fabrice Gamboa, Aurélien Garivier, Bertrand Iooss

Abstract: In this work, we extend some quantities introduced in "Optimization of conditional value-at-risk" by R. T. Rockafellar and S. Uryasev to the case where the proximity between real numbers is measured by using a Bregman divergence. This leads to the definition of the Bregman superquantile. Axioms of a coherent measure of risk discussed in "Coherent approaches to risk in optimization under uncertainty" by R. T. Rockafellar are studied in the case of the Bregman superquantile. Furthermore, we deal with asymptotic properties of a Monte Carlo estimator of the Bregman superquantile.

Submitted 6 January, 2016; v1 submitted 26 May, 2014; originally announced May 2014.

arXiv:1405.3224 [pdf, other]

On the Complexity of A/B Testing

Authors: Emilie Kaufmann, Olivier Cappé, Aurélien Garivier

Abstract: A/B testing refers to the task of determining the best option among two alternatives that yield random outcomes. We provide distribution-dependent lower bounds for the performance of A/B testing that improve over the results currently available both in the fixed-confidence (or delta-PAC) and fixed-budget settings. When the distributions of the outcomes are Gaussian, we prove that the complexities of the fixed-confidence and fixed-budget settings are equivalent, and that uniform sampling of both alternatives is optimal only in the case of equal variances. In the common variance case, we also provide a stopping rule that terminates faster than existing fixed-confidence algorithms. In the case of Bernoulli distributions, we show that the complexity of the fixed-budget setting is smaller than that of the fixed-confidence setting and that uniform sampling of both alternatives, though not optimal, is advisable in practice when combined with an appropriate stopping criterion.

Submitted 24 February, 2015; v1 submitted 13 May, 2014; originally announced May 2014.

Journal ref: Conference on Learning Theory, Jun 2014, Barcelona, Spain. JMLR: Workshop and Conference Proceedings, 35, pp.461-481

arXiv:1403.3758 [pdf, other]

Big Data Analytics - Retour vers le Futur 3; De Statisticien à Data Scientist

Authors: Philippe Besse, Aurélien Garivier, Jean-Michel Loubes

Abstract: The rapid evolution of information systems managing more and more voluminous data has caused profound paradigm shifts in the job of statistician, who has become successively data miner, bioinformatician and now data scientist. Without aiming at completeness, and after illustrating these successive mutations, this article briefly introduces the new research issues that are quickly arising in Statistics, and more generally in Mathematics, in order to account for the characteristics of big data: volume, variety and velocity.

Submitted 21 May, 2014; v1 submitted 15 March, 2014; originally announced March 2014.

Comments: in French

arXiv:1309.3376 [pdf, ps, other]

doi 10.1109/ITW.2013.6691311

Informational Confidence Bounds for Self-Normalized Averages and Applications

Authors: Aurélien Garivier

Abstract: We present deviation bounds for self-normalized averages and applications to estimation with a random number of observations. The results rely on a peeling argument in exponential martingale techniques that represents an alternative to the method of mixture. The motivating examples of bandit problems and context tree estimation are detailed.

Submitted 13 September, 2013; originally announced September 2013.

ACM Class: G.3

Journal ref: 2013 IEEE Information Theory Workshop p.489-493

arXiv:1210.1136 [pdf, ps, other]

doi 10.1214/13-AOS1119

Kullback-Leibler upper confidence bounds for optimal sequential allocation

Authors: Olivier Cappé, Aurélien Garivier, Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz

Abstract: We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins [J. R. Stat. Soc. Ser. B Stat. Methodol. 41 (1979) 148-177], based on upper confidence bounds of the arm payoffs computed using the Kullback-Leibler divergence. We consider two classes of distributions for which instances of this general idea are analyzed: the kl-UCB algorithm is designed for one-parameter exponential families and the empirical KL-UCB algorithm for bounded and finitely supported distributions. Our main contribution is a unified finite-time analysis of the regret of these algorithms that asymptotically matches the lower bounds of Lai and Robbins [Adv. in Appl. Math. 6 (1985) 4-22] and Burnetas and Katehakis [Adv. in Appl. Math. 17 (1996) 122-142], respectively. We also investigate the behavior of these algorithms when used with general bounded rewards, showing in particular that they provide significant improvements over the state-of-the-art.

Submitted 26 August, 2013; v1 submitted 3 October, 2012; originally announced October 2012.

Comments: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/13-AOS1119

Report number: IMS-AOS-AOS1119

Journal ref: Annals of Statistics 2013, Vol. 41, No. 3, 1516-1541

arXiv:1202.2945 [pdf, ps, other]

doi 10.1214/10-AAP735

Sequential Monte Carlo smoothing for general state space hidden Markov models

Authors: Randal Douc, Aurélien Garivier, Eric Moulines, Jimmy Olsson

Abstract: Computing smoothing distributions, the distributions of one or more states conditional on past, present, and future observations, is a recurring problem when operating on general hidden Markov models. The aim of this paper is to provide a foundation of particle-based approximation of such distributions and to analyze, in a common unifying framework, different schemes producing such approximations. In this setting, general convergence results, including exponential deviation inequalities and central limit theorems, are established. In particular, time uniform bounds on the marginal smoothing error are obtained under appropriate mixing conditions on the transition kernel of the latent chain. In addition, we propose an algorithm approximating the joint smoothing distribution at a cost that grows only linearly with the number of particles.

Submitted 14 February, 2012; originally announced February 2012.

Comments: Published in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/10-AAP735. arXiv admin note: text overlap with arXiv:1012.4183 by other authors

Report number: IMS-AAP-AAP735

Journal ref: Annals of Applied Probability 2011, Vol. 21, No. 6, 2109-2145

arXiv:1111.2191 [pdf, other]

Oracle approach and slope heuristic in context tree estimation

Authors: A. Garivier, M. Lerasle

Abstract: We introduce a general approach to prove oracle properties in context tree selection. The results derive from a concentration condition that is verified, for example, by mixing processes. Moreover, we show the superiority of the oracle approach from a non-asymptotic point of view in simulations where the classical BIC estimator has nice oracle properties even when it does not recover the source. Our second objective is to extend the slope algorithm of \cite{AM08} to context tree estimation. The algorithm gives a practical way to evaluate the leading constant in front of the penalties. We study the slope heuristic underlying this algorithm and obtain the first results on the slope phenomenon in a discrete, non-i.i.d. framework. We illustrate in simulations the improvement of the oracle properties of BIC estimators by the slope algorithm.

Submitted 9 November, 2011; originally announced November 2011.

Comments: 51 pages, 10 figures

arXiv:1110.5447 [pdf, ps, other]

Optimal discovery with probabilistic expert advice

Authors: Sébastien Bubeck, Damien Ernst, Aurélien Garivier

Abstract: We consider an original problem that arises from the issue of security analysis of a power system and that we name optimal discovery with probabilistic expert advice. We address it with an algorithm based on the optimistic paradigm and the Good-Turing missing mass estimator. We show that this strategy uniformly attains the optimal discovery rate in a macroscopic limit sense, under some assumptions on the probabilistic experts. We also provide numerical experiments suggesting that this optimal behavior may still hold under weaker assumptions.

Submitted 25 October, 2011; originally announced October 2011.

MSC Class: 93E35

arXiv:1106.5971 [pdf, ps, other]

Perfect Simulation Of Processes With Long Memory: A `Coupling Into And From The Past' Algorithm

Authors: Aurélien Garivier

Abstract: We describe a new algorithm for the perfect simulation of variable length Markov chains and random systems with perfect connections. This algorithm, which generalizes Propp and Wilson's simulation scheme, is based on the idea of coupling into and from the past. It improves on existing algorithms by relaxing the conditions on the kernel and by accelerating convergence, even in the simple case of finite order Markov chains. Although chains of variable or infinite order have been widely investigated for decades, their use in applied probability, from information theory to bio-informatics and linguistics, has recently led to considerable renewed interest.

Submitted 14 October, 2013; v1 submitted 29 June, 2011; originally announced June 2011.

Comments: 22 pages, 8 figures

MSC Class: 60J22

ACM Class: G.3

arXiv:1102.2490 [pdf, ps, other]

The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond

Authors: Aurélien Garivier, Olivier Cappé

Abstract: This paper presents a finite-time analysis of the KL-UCB algorithm, an online, horizon-free index policy for stochastic bandit problems. We prove two distinct results: first, for arbitrary bounded rewards, the KL-UCB algorithm satisfies a uniformly better regret bound than UCB or UCB2; second, in the special case of Bernoulli rewards, it reaches the lower bound of Lai and Robbins. Furthermore, we show that simple adaptations of the KL-UCB algorithm are also optimal for specific classes of (possibly unbounded) rewards, including those generated from exponential families of distributions. A large-scale numerical study comparing KL-UCB with its main competitors (UCB, UCB2, UCB-Tuned, UCB-V, DMED) shows that KL-UCB is remarkably efficient and stable, including for short time horizons. KL-UCB is also the only method that always performs better than the basic UCB policy. Our regret bounds rely on deviation results of independent interest, which are stated and proved in the Appendix. As a by-product, we also obtain an improved regret bound for the standard UCB algorithm.

Submitted 29 August, 2013; v1 submitted 12 February, 2011; originally announced February 2011.

Comments: 18 pages, 3 figures; Conf. Comput. Learning Theory (COLT) 2011 in Budapest, Hungary

MSC Class: 93E35

Journal ref: Conference On Learning Theory n°24 Jul. 2011 pp.359-376

arXiv:1102.0673 [pdf, ps, other]

doi 10.1016/j.spa.2011.06.012

Joint estimation of intersecting context tree models

Authors: Antonio Galves, Aurélien Garivier, Elisabeth Gassiat

Abstract: We study a problem of model selection for data produced by two different context tree sources. Motivated by linguistic questions, we consider the case where the probabilistic context trees corresponding to the two sources are finite and share many of their contexts. In order to understand the differences between the two sources, it is important to identify which contexts and which transition probabilities are specific to each source. We consider a class of probabilistic context tree models with three types of contexts: those which appear in one, the other, or both sources. We use a BIC penalized maximum likelihood procedure that jointly estimates the two sources. We propose a new algorithm which efficiently computes the estimated context trees. We prove that the procedure is strongly consistent. We also present a simulation study showing the practical advantage of our procedure over a procedure that works separately on each dataset.

Submitted 3 October, 2012; v1 submitted 3 February, 2011; originally announced February 2011.

ACM Class: G.3

arXiv:1011.2424 [pdf, ps, other]

Context Tree Selection: A Unifying View

Authors: Aurélien Garivier, Florencia Leonardi

Abstract: The present paper investigates non-asymptotic properties of two popular procedures of context tree (or Variable Length Markov Chains) estimation: Rissanen's algorithm Context and the Penalized Maximum Likelihood criterion. First showing how they are related, we prove finite horizon bounds for the probability of over- and under-estimation. Concerning overestimation, no boundedness or loss-of-memory conditions are required: the proof relies on new deviation inequalities for empirical probabilities of independent interest. The underestimation properties rely on loss-of-memory and separation conditions of the process. These results improve and generalize the bounds obtained previously. Context tree models have been introduced by Rissanen as a parsimonious generalization of Markov models. Since then, they have been widely used in applied probability and statistics.

Submitted 29 June, 2011; v1 submitted 10 November, 2010; originally announced November 2010.

arXiv:1004.5229 [pdf, ps, other]

doi 10.1109/ALLERTON.2010.5706896

Optimism in Reinforcement Learning and Kullback-Leibler Divergence

Authors: Sarah Filippi, Olivier Cappé, Aurélien Garivier

Abstract: We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. In MDPs, optimism can be implemented by carrying out extended value iterations under a constraint of consistency with the estimated model transition probabilities. The UCRL2 algorithm by Auer, Jaksch and Ortner (2009), which follows this strategy, has recently been shown to guarantee near-optimal regret bounds. In this paper, we strongly argue in favor of using the Kullback-Leibler (KL) divergence for this purpose. By studying the linear maximization problem under KL constraints, we provide an efficient algorithm, termed KL-UCRL, for solving KL-optimistic extended value iteration. Using recent deviation bounds on the KL divergence, we prove that KL-UCRL provides the same guarantees as UCRL2 in terms of regret. However, numerical experiments on classical benchmarks show a significantly improved behavior, particularly when the MDP has reduced connectivity. To support this observation, we provide elements of comparison between the two algorithms based on geometric considerations.

Submitted 13 October, 2010; v1 submitted 29 April, 2010; originally announced April 2010.

Comments: This work has been accepted and presented at Allerton 2010: 48th Annual Allerton Conference on Communication, Control, and Computing, Monticello, Illinois, United States (2010)

arXiv:0904.0316 [pdf, ps, other]

On the Forward Filtering Backward Smoothing particle approximations of the smoothing distribution in general state spaces models

Authors: Randal Douc, Aurelien Garivier, Eric Moulines, Jimmy Olsson

Abstract: A prevalent problem in general state-space models is the approximation of the smoothing distribution of a state, or a sequence of states, conditional on the observations from the past, the present, and the future. The aim of this paper is to provide a rigorous foundation for the calculation, or approximation, of such smoothed distributions, and to analyse in a common unifying framework different schemes to reach this goal. Through a cohesive and generic exposition of the scientific literature, we offer several novel extensions that make it possible to approximate the joint smoothing distribution in the most general case at a cost growing linearly with the number of particles.

Submitted 2 April, 2009; originally announced April 2009.

MSC Class: 60G10; 60K35; 60G18

arXiv:0805.3415 [pdf, ps, other]

On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems

Authors: Aurélien Garivier, Eric Moulines

Abstract: Multi-armed bandit problems are considered as a paradigm of the trade-off between exploring the environment to find profitable actions and exploiting what is already known. In the stationary case, where the distributions of the rewards do not change in time, Upper-Confidence Bound (UCB) policies have been shown to be rate optimal. A challenging variant of the multi-armed bandit problem (MABP) is the non-stationary bandit problem, where the gambler must decide which arm to play while facing the possibility of a changing environment. In this paper, we consider the situation where the distributions of rewards remain constant over epochs and change at unknown time instants. We analyze two algorithms: the discounted UCB and the sliding-window UCB. We establish for these two algorithms an upper bound for the expected regret by upper-bounding the expectation of the number of times a suboptimal arm is played. For that purpose, we derive a Hoeffding-type inequality for self-normalized deviations with a random number of summands. We establish a lower bound for the regret in the presence of abrupt changes in the arms' reward distributions. We show that the discounted UCB and the sliding-window UCB both match the lower bound up to a logarithmic factor.

Submitted 22 May, 2008; originally announced May 2008.

Comments: 24 pages

arXiv:0801.2456 [pdf, ps, other]

doi 10.1109/TIT.2008.2008150

Coding on countably infinite alphabets

Authors: Stéphane Boucheron, Aurélien Garivier, Elisabeth Gassiat

Abstract: This paper describes universal lossless coding strategies for compressing sources on countably infinite alphabets. Classes of memoryless sources defined by an envelope condition on the marginal distribution provide benchmarks for coding techniques originating from the theory of universal coding over finite alphabets. We prove general upper-bounds on minimax regret and lower-bounds on minimax redundancy for such source classes. The general upper bounds emphasize the role of the Normalized Maximum Likelihood codes with respect to minimax regret in the infinite alphabet context. Lower bounds are derived by tailoring sharp bounds on the redundancy of Krichevsky-Trofimov coders for sources over finite alphabets. Up to logarithmic (resp. constant) factors the bounds are matching for source classes defined by algebraically declining (resp. exponentially vanishing) envelopes. Effective and (almost) adaptive coding techniques are described for the collection of source classes defined by algebraically vanishing envelopes. Those results extend our knowledge concerning universal coding to contexts where the key tools from parametric inference

Submitted 16 January, 2008; originally announced January 2008.

Comments: 33 pages

MSC Class: 62B10; 68P30; 94A29

Journal ref: IEEE Transactions on Information Theory, Vol. 55, No. 1, pp. 358-373, Jan. 2009

Showing 1–32 of 32 results for author: Garivier, A