Search | arXiv e-print repository

Hebbian learning inspired estimation of the linear regression parameters from queries

Authors: Johannes Schmidt-Hieber, Wouter M Koolen

Abstract: Local learning rules in biological neural networks (BNNs) are commonly referred to as Hebbian learning. [26] links a biologically motivated Hebbian learning rule to a specific zeroth-order optimization method. In this work, we study a variation of this Hebbian learning rule to recover the regression vector in the linear regression model. Zeroth-order optimization methods are known to converge with… ▽ More Local learning rules in biological neural networks (BNNs) are commonly referred to as Hebbian learning. [26] links a biologically motivated Hebbian learning rule to a specific zeroth-order optimization method. In this work, we study a variation of this Hebbian learning rule to recover the regression vector in the linear regression model. Zeroth-order optimization methods are known to converge with suboptimal rate for large parameter dimension compared to first-order methods like gradient descent, and are therefore thought to be in general inferior. By establishing upper and lower bounds, we show, however, that such methods achieve near-optimal rates if only queries of the linear regression loss are available. Moreover, we prove that this Hebbian learning rule can achieve considerably faster rates than any non-adaptive method that selects the queries independently of the data. △ Less

Submitted 26 September, 2023; originally announced November 2023.

Comments: 34 pages

MSC Class: Primary: 62L20; secondary: 62J05

arXiv:2304.12768 [pdf, ps, other]

Towards Characterizing the First-order Query Complexity of Learning (Approximate) Nash Equilibria in Zero-sum Matrix Games

Authors: Hédi Hadiji, Sarah Sachs, Tim van Erven, Wouter M. Koolen

Abstract: In the first-order query model for zero-sum $K\times K$ matrix games, players observe the expected pay-offs for all their possible actions under the randomized action played by their opponent. This classical model has received renewed interest after the discovery by Rakhlin and Sridharan that $ε$-approximate Nash equilibria can be computed efficiently from $O(\frac{\ln K}ε)$ instead of… ▽ More In the first-order query model for zero-sum $K\times K$ matrix games, players observe the expected pay-offs for all their possible actions under the randomized action played by their opponent. This classical model has received renewed interest after the discovery by Rakhlin and Sridharan that $ε$-approximate Nash equilibria can be computed efficiently from $O(\frac{\ln K}ε)$ instead of $O(\frac{\ln K}{ε^2})$ queries. Surprisingly, the optimal number of such queries, as a function of both $ε$ and $K$, is not known. We make progress on this question on two fronts. First, we fully characterise the query complexity of learning exact equilibria ($ε=0$), by showing that they require a number of queries that is linear in $K$, which means that it is essentially as hard as querying the whole matrix, which can also be done with $K$ queries. Second, for $ε> 0$, the current query complexity upper bound stands at $O(\min(\frac{\ln(K)}ε , K))$. We argue that, unfortunately, obtaining a matching lower bound is not possible with existing techniques: we prove that no lower bound can be derived by constructing hard matrices whose entries take values in a known countable set, because such matrices can be fully identified by a single query. This rules out, for instance, reducing to an optimization problem over the hypercube by encoding it as a binary payoff matrix. We then introduce a new technique for lower bounds, which allows us to obtain lower bounds of order $\tildeΩ(\log(\frac{1}{Kε})$ for any $ε\leq 1 / (cK^4)$, where $c$ is a constant independent of $K$. We further discuss possible future directions to improve on our techniques in order to close the gap with the upper bounds. △ Less

Submitted 2 November, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

arXiv:2203.04485 [pdf, ps, other]

A composite generalization of Ville's martingale theorem

Authors: Johannes Ruf, Martin Larsson, Wouter M. Koolen, Aaditya Ramdas

Abstract: We provide a composite version of Ville's theorem that an event has zero measure if and only if there exists a nonnegative martingale which explodes to infinity when that event occurs. This is a classic result connecting measure-theoretic probability to the sequence-by-sequence game-theoretic probability, recently developed by Shafer and Vovk. Our extension of Ville's result involves appropriate c… ▽ More We provide a composite version of Ville's theorem that an event has zero measure if and only if there exists a nonnegative martingale which explodes to infinity when that event occurs. This is a classic result connecting measure-theoretic probability to the sequence-by-sequence game-theoretic probability, recently developed by Shafer and Vovk. Our extension of Ville's result involves appropriate composite generalizations of nonnegative martingales and measure-zero events: these are respectively provided by ``e-processes'', and a new inverse capital outer measure. We then develop a novel line-crossing inequality for sums of random variables which are only required to have a finite first moment, which we use to prove a composite version of the strong law of large numbers (SLLN). This allows us to show that violation of the SLLN is an event of outer measure zero and that our e-process explodes to infinity on every such violating sequence, while this is provably not achievable with a nonnegative (super)martingale. △ Less

Submitted 3 May, 2023; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: 21 pages

arXiv:2110.15573 [pdf, other]

A/B/n Testing with Control in the Presence of Subpopulations

Authors: Yoan Russac, Christina Katsimerou, Dennis Bohle, Olivier Cappé, Aurélien Garivier, Wouter Koolen

Abstract: Motivated by A/B/n testing applications, we consider a finite set of distributions (called \emph{arms}), one of which is treated as a \emph{control}. We assume that the population is stratified into homogeneous subpopulations. At every time step, a subpopulation is sampled and an arm is chosen: the resulting observation is an independent draw from the arm conditioned on the subpopulation. The qual… ▽ More Motivated by A/B/n testing applications, we consider a finite set of distributions (called \emph{arms}), one of which is treated as a \emph{control}. We assume that the population is stratified into homogeneous subpopulations. At every time step, a subpopulation is sampled and an arm is chosen: the resulting observation is an independent draw from the arm conditioned on the subpopulation. The quality of each arm is assessed through a weighted combination of its subpopulation means. We propose a strategy for sequentially choosing one arm per time step so as to discover as fast as possible which arms, if any, have higher weighted expectation than the control. This strategy is shown to be asymptotically optimal in the following sense: if $τ_δ$ is the first time when the strategy ensures that it is able to output the correct answer with probability at least $1-δ$, then $\mathbb{E}[τ_δ]$ grows linearly with $\log(1/δ)$ at the exact optimal rate. This rate is identified in the paper in three different settings: (1) when the experimenter does not observe the subpopulation information, (2) when the subpopulation of each sample is observed but not chosen, and (3) when the experimenter can select the subpopulation from which each response is sampled. We illustrate the efficiency of the proposed strategy with numerical simulations on synthetic and real data collected from an A/B/n experiment. △ Less

Submitted 29 October, 2021; originally announced October 2021.

Journal ref: NeurIPS 2021, Dec 2021, Virtual, France

arXiv:2107.01881 [pdf, ps, other]

Robust Online Convex Optimization in the Presence of Outliers

Authors: Tim van Erven, Sarah Sachs, Wouter M. Koolen, Wojciech Kotłowski

Abstract: We consider online convex optimization when a number k of data points are outliers that may be corrupted. We model this by introducing the notion of robust regret, which measures the regret only on rounds that are not outliers. The aim for the learner is to achieve small robust regret, without knowing where the outliers are. If the outliers are chosen adversarially, we show that a simple filtering… ▽ More We consider online convex optimization when a number k of data points are outliers that may be corrupted. We model this by introducing the notion of robust regret, which measures the regret only on rounds that are not outliers. The aim for the learner is to achieve small robust regret, without knowing where the outliers are. If the outliers are chosen adversarially, we show that a simple filtering strategy on extreme gradients incurs O(k) additive overhead compared to the usual regret bounds, and that this is unimprovable, which means that k needs to be sublinear in the number of rounds. We further ask which additional assumptions would allow for a linear number of outliers. It turns out that the usual benign cases of independently, identically distributed (i.i.d.) observations or strongly convex losses are not sufficient. However, combining i.i.d. observations with the assumption that outliers are those observations that are in an extreme quantile of the distribution, does lead to sublinear robust regret, even though the expected number of outliers is linear. △ Less

Submitted 5 July, 2021; originally announced July 2021.

Journal ref: Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:4174-4194, 2021

arXiv:2102.06622 [pdf, other]

MetaGrad: Adaptation using Multiple Learning Rates in Online Learning

Authors: Tim van Erven, Wouter M. Koolen, Dirk van der Hoeven

Abstract: We provide a new adaptive method for online convex optimization, MetaGrad, that is robust to general convex losses but achieves faster rates for a broad class of special functions, including exp-concave and strongly convex functions, but also various types of stochastic and non-stochastic functions without any curvature. We prove this by drawing a connection to the Bernstein condition, which is kn… ▽ More We provide a new adaptive method for online convex optimization, MetaGrad, that is robust to general convex losses but achieves faster rates for a broad class of special functions, including exp-concave and strongly convex functions, but also various types of stochastic and non-stochastic functions without any curvature. We prove this by drawing a connection to the Bernstein condition, which is known to imply fast rates in offline statistical learning. MetaGrad further adapts automatically to the size of the gradients. Its main feature is that it simultaneously considers multiple learning rates, which are weighted directly proportional to their empirical performance on the data using a new meta-algorithm. We provide three versions of MetaGrad. The full matrix version maintains a full covariance matrix and is applicable to learning tasks for which we can afford update time quadratic in the dimension. The other two versions provide speed-ups for high-dimensional learning tasks with an update time that is linear in the dimension: one is based on sketching, the other on running a separate copy of the basic algorithm per coordinate. We evaluate all versions of MetaGrad on benchmark online classification and regression tasks, on which they consistently outperform both online gradient descent and AdaGrad. △ Less

Submitted 30 August, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

Journal ref: Journal of Machine Learning Research 22(161):1-61, 2021

arXiv:2102.03734 [pdf, other]

Regret Minimization in Heavy-Tailed Bandits

Authors: Shubhada Agrawal, Sandeep Juneja, Wouter M. Koolen

Abstract: We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm-distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings of either bounded support reward distributions or distributions that belong to a single parameter exponential family. We work under the much weaker assumption that the moments of orde… ▽ More We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm-distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings of either bounded support reward distributions or distributions that belong to a single parameter exponential family. We work under the much weaker assumption that the moments of order $(1+ε)$ are uniformly bounded by a known constant B, for some given $ε> 0$. We propose an optimal algorithm that matches the lower bound exactly in the first-order term. We also give a finite-time bound on its regret. We show that our index concentrates faster than the well known truncated or trimmed empirical mean estimators for the mean of heavy-tailed distributions. Computing our index can be computationally demanding. To address this, we develop a batch-based algorithm that is optimal up to a multiplicative constant depending on the batch size. We hence provide a controlled trade-off between statistical optimality and computational cost. △ Less

Submitted 7 February, 2021; originally announced February 2021.

Comments: 35 pages, 2 figures

arXiv:2102.00630 [pdf, other]

Testing exchangeability: fork-convexity, supermartingales, and e-processes

Authors: Aaditya Ramdas, Johannes Ruf, Martin Larsson, Wouter Koolen

Abstract: Suppose we observe an infinite series of coin flips $X_1,X_2,\ldots$, and wish to sequentially test the null that these binary random variables are exchangeable. Nonnegative supermartingales (NSMs) are a workhorse of sequential inference, but we prove that they are powerless for this problem. First, utilizing a geometric concept called fork-convexity (a sequential analog of convexity), we show tha… ▽ More Suppose we observe an infinite series of coin flips $X_1,X_2,\ldots$, and wish to sequentially test the null that these binary random variables are exchangeable. Nonnegative supermartingales (NSMs) are a workhorse of sequential inference, but we prove that they are powerless for this problem. First, utilizing a geometric concept called fork-convexity (a sequential analog of convexity), we show that any process that is an NSM under a set of distributions, is also necessarily an NSM under their "fork-convex hull". Second, we demonstrate that the fork-convex hull of the exchangeable null consists of all possible laws over binary sequences; this implies that any NSM under exchangeability is necessarily nonincreasing, hence always yields a powerless test for any alternative. Since testing arbitrary deviations from exchangeability is information theoretically impossible, we focus on Markovian alternatives. We combine ideas from universal inference and the method of mixtures to derive a "safe e-process", which is a nonnegative process with expectation at most one under the null at any stop** time, and is upper bounded by a martingale, but is not itself an NSM. This in turn yields a level $α$ sequential test that is consistent; regret bounds from universal coding also demonstrate rate-optimal power. We present ways to extend these results to any finite alphabet and to Markovian alternatives of any order using a "double mixture" approach. We provide an array of simulations, and give general approaches based on betting for unstructured or ill-specified alternatives. Finally, inspired by Shafer, Vovk, and Ville, we provide game-theoretic interpretations of our e-processes and pathwise results. △ Less

Submitted 23 July, 2021; v1 submitted 31 January, 2021; originally announced February 2021.

Comments: 34 pages, 7 figures, accepted at the International Journal of Approximate Reasoning

arXiv:2008.07606 [pdf, other]

Optimal Best-Arm Identification Methods for Tail-Risk Measures

Authors: Shubhada Agrawal, Wouter M. Koolen, Sandeep Juneja

Abstract: Conditional value-at-risk (CVaR) and value-at-risk (VaR) are popular tail-risk measures in finance and insurance industries as well as in highly reliable, safety-critical uncertain environments where often the underlying probability distributions are heavy-tailed. We use the multi-armed bandit best-arm identification framework and consider the problem of identifying the arm from amongst finitely m… ▽ More Conditional value-at-risk (CVaR) and value-at-risk (VaR) are popular tail-risk measures in finance and insurance industries as well as in highly reliable, safety-critical uncertain environments where often the underlying probability distributions are heavy-tailed. We use the multi-armed bandit best-arm identification framework and consider the problem of identifying the arm from amongst finitely many that has the smallest CVaR, VaR, or weighted sum of CVaR and mean. The latter captures the risk-return trade-off common in finance. Our main contribution is an optimal $δ$-correct algorithm that acts on general arms, including heavy-tailed distributions, and matches the lower bound on the expected number of samples needed, asymptotically (as $δ$ approaches $0$). The algorithm requires solving a non-convex optimization problem in the space of probability measures, that requires delicate analysis. En-route, we develop new non-asymptotic empirical likelihood-based concentration inequalities for tail-risk measures which are tighter than those for popular truncation-based empirical estimators. △ Less

Submitted 21 June, 2021; v1 submitted 17 August, 2020; originally announced August 2020.

Comments: 55 pages, 4 figures

arXiv:2007.00969 [pdf, other]

Structure Adaptive Algorithms for Stochastic Bandits

Authors: Rémy Degenne, Han Shao, Wouter M. Koolen

Abstract: We study reward maximisation in a wide class of structured stochastic multi-armed bandit problems, where the mean rewards of arms satisfy some given structural constraints, e.g. linear, unimodal, sparse, etc. Our aim is to develop methods that are flexible (in that they easily adapt to different structures), powerful (in that they perform well empirically and/or provably match instance-dependent l… ▽ More We study reward maximisation in a wide class of structured stochastic multi-armed bandit problems, where the mean rewards of arms satisfy some given structural constraints, e.g. linear, unimodal, sparse, etc. Our aim is to develop methods that are flexible (in that they easily adapt to different structures), powerful (in that they perform well empirically and/or provably match instance-dependent lower bounds) and efficient in that the per-round computational burden is small. We develop asymptotically optimal algorithms from instance-dependent lower-bounds using iterative saddle-point solvers. Our approach generalises recent iterative methods for pure exploration to reward maximisation, where a major challenge arises from the estimation of the sub-optimality gaps and their reciprocals. Still we manage to achieve all the above desiderata. Notably, our technique avoids the computational cost of the full-blown saddle point oracle employed by previous work, while at the same time enabling finite-time regret bounds. Our experiments reveal that our method successfully leverages the structural assumptions, while its regret is at worst comparable to that of vanilla UCB. △ Less

Submitted 2 July, 2020; originally announced July 2020.

Comments: 10+18 pages. To be published in the proceedings of ICML 2020

arXiv:2002.12242 [pdf, other]

Lipschitz and Comparator-Norm Adaptivity in Online Learning

Authors: Zakaria Mhammedi, Wouter M. Koolen

Abstract: We study Online Convex Optimization in the unbounded setting where neither predictions nor gradient are constrained. The goal is to simultaneously adapt to both the sequence of gradients and the comparator. We first develop parameter-free and scale-free algorithms for a simplified setting with hints. We present two versions: the first adapts to the squared norms of both comparator and gradients se… ▽ More We study Online Convex Optimization in the unbounded setting where neither predictions nor gradient are constrained. The goal is to simultaneously adapt to both the sequence of gradients and the comparator. We first develop parameter-free and scale-free algorithms for a simplified setting with hints. We present two versions: the first adapts to the squared norms of both comparator and gradients separately using $O(d)$ time per round, the second adapts to their squared inner products (which measure variance only in the comparator direction) in time $O(d^3)$ per round. We then generalize two prior reductions to the unbounded setting; one to not need hints, and a second to deal with the range ratio problem (which already arises in prior work). We discuss their optimality in light of prior and new lower bounds. We apply our methods to obtain sharper regret bounds for scale-invariant online prediction with linear models. △ Less

Submitted 8 August, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

Comments: 30 Pages, 1 Figure

arXiv:1906.10431 [pdf, other]

Non-Asymptotic Pure Exploration by Solving Games

Authors: Rémy Degenne, Wouter M. Koolen, Pierre Ménard

Abstract: Pure exploration (aka active testing) is the fundamental task of sequentially gathering information to answer a query about a stochastic environment. Good algorithms make few mistakes and take few samples. Lower bounds (for multi-armed bandit models with arms in an exponential family) reveal that the sample complexity is determined by the solution to an optimisation problem. The existing state of… ▽ More Pure exploration (aka active testing) is the fundamental task of sequentially gathering information to answer a query about a stochastic environment. Good algorithms make few mistakes and take few samples. Lower bounds (for multi-armed bandit models with arms in an exponential family) reveal that the sample complexity is determined by the solution to an optimisation problem. The existing state of the art algorithms achieve asymptotic optimality by solving a plug-in estimate of that optimisation problem at each step. We interpret the optimisation problem as an unknown game, and propose sampling rules based on iterative strategies to estimate and converge to its saddle point. We apply no-regret learners to obtain the first finite confidence guarantees that are adapted to the exponential family and which apply to any pure exploration query and bandit structure. Moreover, our algorithms only use a best response oracle instead of fully solving the optimisation problem. △ Less

Submitted 25 June, 2019; originally announced June 2019.

arXiv:1906.07801 [pdf, other]

Safe Testing

Authors: Peter Grünwald, Rianne de Heide, Wouter Koolen

Abstract: We develop the theory of hypothesis testing based on the e-value, a notion of evidence that, unlike the p-value, allows for effortlessly combining results from several studies in the common scenario where the decision to perform a new study may depend on previous outcomes. Tests based on e-values are safe, i.e. they preserve Type-I error guarantees, under such optional continuation. We define grow… ▽ More We develop the theory of hypothesis testing based on the e-value, a notion of evidence that, unlike the p-value, allows for effortlessly combining results from several studies in the common scenario where the decision to perform a new study may depend on previous outcomes. Tests based on e-values are safe, i.e. they preserve Type-I error guarantees, under such optional continuation. We define growth-rate optimality (GRO) as an analogue of power in an optional continuation context, and we show how to construct GRO e-variables for general testing problems with composite null and alternative, emphasizing models with nuisance parameters. GRO e-values take the form of Bayes factors with special priors. We illustrate the theory using several classic examples including a one-sample safe t-test and the 2 x 2 contingency table. Sharing Fisherian, Neymanian and Jeffreys-Bayesian interpretations, e-values may provide a methodology acceptable to adherents of all three schools. △ Less

Submitted 10 March, 2023; v1 submitted 18 June, 2019; originally announced June 2019.

Comments: Accepted as discussion paper to the Journal of the Royal Statistical Society series B

arXiv:1902.10797 [pdf, ps, other]

Lipschitz Adaptivity with Multiple Learning Rates in Online Learning

Authors: Zakaria Mhammedi, Wouter M. Koolen, Tim van Erven

Abstract: We aim to design adaptive online learning algorithms that take advantage of any special structure that might be present in the learning task at hand, with as little manual tuning by the user as possible. A fundamental obstacle that comes up in the design of such adaptive algorithms is to calibrate a so-called step-size or learning rate hyperparameter depending on variance, gradient norms, etc. A r… ▽ More We aim to design adaptive online learning algorithms that take advantage of any special structure that might be present in the learning task at hand, with as little manual tuning by the user as possible. A fundamental obstacle that comes up in the design of such adaptive algorithms is to calibrate a so-called step-size or learning rate hyperparameter depending on variance, gradient norms, etc. A recent technique promises to overcome this difficulty by maintaining multiple learning rates in parallel. This technique has been applied in the MetaGrad algorithm for online convex optimization and the Squint algorithm for prediction with expert advice. However, in both cases the user still has to provide in advance a Lipschitz hyperparameter that bounds the norm of the gradients. Although this hyperparameter is typically not available in advance, tuning it correctly is crucial: if it is set too small, the methods may fail completely; but if it is taken too large, performance deteriorates significantly. In the present work we remove this Lipschitz hyperparameter by designing new versions of MetaGrad and Squint that adapt to its optimal value automatically. We achieve this by dynamically updating the set of active learning rates. For MetaGrad, we further improve the computational efficiency of handling constraints on the domain of prediction, and we remove the need to specify the number of rounds in advance. △ Less

Submitted 30 May, 2019; v1 submitted 27 February, 2019; originally announced February 2019.

Comments: 22 pages. To appear in COLT 2019

arXiv:1902.03475 [pdf, other]

Pure Exploration with Multiple Correct Answers

Authors: Rémy Degenne, Wouter M. Koolen

Abstract: We determine the sample complexity of pure exploration bandit problems with multiple good answers. We derive a lower bound using a new game equilibrium argument. We show how continuity and convexity properties of single-answer problems ensures that the Track-and-Stop algorithm has asymptotically optimal sample complexity. However, that convexity is lost when going to the multiple-answer setting. W… ▽ More We determine the sample complexity of pure exploration bandit problems with multiple good answers. We derive a lower bound using a new game equilibrium argument. We show how continuity and convexity properties of single-answer problems ensures that the Track-and-Stop algorithm has asymptotically optimal sample complexity. However, that convexity is lost when going to the multiple-answer setting. We present a new algorithm which extends Track-and-Stop to the multiple-answer case and has asymptotic sample complexity matching the lower bound. △ Less

Submitted 9 February, 2019; originally announced February 2019.

arXiv:1811.11419 [pdf, other]

Mixture Martingales Revisited with Applications to Sequential Tests and Confidence Intervals

Authors: Emilie Kaufmann, Wouter Koolen

Abstract: This paper presents new deviation inequalities that are valid uniformly in time under adaptive sampling in a multi-armed bandit model. The deviations are measured using the Kullback-Leibler divergence in a given one-dimensional exponential family, and may take into account several arms at a time. They are obtained by constructing for each arm a mixture martingale based on a hierarchical prior, and… ▽ More This paper presents new deviation inequalities that are valid uniformly in time under adaptive sampling in a multi-armed bandit model. The deviations are measured using the Kullback-Leibler divergence in a given one-dimensional exponential family, and may take into account several arms at a time. They are obtained by constructing for each arm a mixture martingale based on a hierarchical prior, and by multiplying those martingales. Our deviation inequalities allow us to analyze stop** rules based on generalized likelihood ratios for a large class of sequential identification problems, and to construct tight confidence intervals for some functions of the means of the arms. △ Less

Submitted 8 December, 2021; v1 submitted 28 November, 2018; originally announced November 2018.

Journal ref: Journal of Machine Learning Research, Microtome Publishing, 2021

arXiv:1806.00973 [pdf, other]

Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling

Authors: Emilie Kaufmann, Wouter Koolen, Aurelien Garivier

Abstract: Learning the minimum/maximum mean among a finite set of distributions is a fundamental sub-task in planning, game tree search and reinforcement learning. We formalize this learning task as the problem of sequentially testing how the minimum mean among a finite set of distributions compares to a given threshold. We develop refined non-asymptotic lower bounds, which show that optimality mandates ver… ▽ More Learning the minimum/maximum mean among a finite set of distributions is a fundamental sub-task in planning, game tree search and reinforcement learning. We formalize this learning task as the problem of sequentially testing how the minimum mean among a finite set of distributions compares to a given threshold. We develop refined non-asymptotic lower bounds, which show that optimality mandates very different sampling behavior for a low vs high true minimum. We show that Thompson Sampling and the intuitive Lower Confidence Bounds policy each nail only one of these cases. We develop a novel approach that we call Murphy Sampling. Even though it entertains exclusively low true minima, we prove that MS is optimal for both possibilities. We then design advanced self-normalized deviation inequalities, fueling more aggressive stop** rules. We complement our theoretical guarantees by experiments showing that MS works best in practice. △ Less

Submitted 4 June, 2018; originally announced June 2018.

arXiv:1706.02986 [pdf, ps, other]

Monte-Carlo Tree Search by Best Arm Identification

Authors: Emilie Kaufmann, Wouter Koolen

Abstract: Recent advances in bandit tools and techniques for sequential learning are steadily enabling new applications and are promising the resolution of a range of challenging related problems. We study the game tree search problem, where the goal is to quickly identify the optimal move in a given game tree by sequentially sampling its stochastic payoffs. We develop new algorithms for trees of arbitrary… ▽ More Recent advances in bandit tools and techniques for sequential learning are steadily enabling new applications and are promising the resolution of a range of challenging related problems. We study the game tree search problem, where the goal is to quickly identify the optimal move in a given game tree by sequentially sampling its stochastic payoffs. We develop new algorithms for trees of arbitrary depth, that operate by summarizing all deeper levels of the tree into confidence intervals at depth one, and applying a best arm identification procedure at the root. We prove new sample complexity guarantees with a refined dependence on the problem instance. We show experimentally that our algorithms outperform existing elimination-based algorithms and match previous special-purpose methods for depth-two trees. △ Less

Submitted 6 November, 2017; v1 submitted 9 June, 2017; originally announced June 2017.

Comments: Advances in Neural Information Processing Systems (NIPS), Dec 2017, Long Beach, United States

arXiv:1605.06439 [pdf, ps, other]

Combining Adversarial Guarantees and Stochastic Fast Rates in Online Learning

Authors: Wouter M. Koolen, Peter Grünwald, Tim van Erven

Abstract: We consider online learning algorithms that guarantee worst-case regret rates in adversarial environments (so they can be deployed safely and will perform robustly), yet adapt optimally to favorable stochastic environments (so they will perform well in a variety of settings of practical importance). We quantify the friendliness of stochastic environments by means of the well-known Bernstein (a.k.a… ▽ More We consider online learning algorithms that guarantee worst-case regret rates in adversarial environments (so they can be deployed safely and will perform robustly), yet adapt optimally to favorable stochastic environments (so they will perform well in a variety of settings of practical importance). We quantify the friendliness of stochastic environments by means of the well-known Bernstein (a.k.a. generalized Tsybakov margin) condition. For two recent algorithms (Squint for the Hedge setting and MetaGrad for online convex optimization) we show that the particular form of their data-dependent individual-sequence regret guarantees implies that they adapt automatically to the Bernstein parameters of the stochastic environment. We prove that these algorithms attain fast rates in their respective settings both in expectation and with high probability. △ Less

Submitted 20 May, 2016; originally announced May 2016.

Journal ref: Advances in Neural Information Processing Systems 29 (NeurIPS), 4457-4465, 2016

arXiv:1604.08740 [pdf, ps, other]

MetaGrad: Multiple Learning Rates in Online Learning

Authors: Tim van Erven, Wouter M. Koolen

Abstract: In online convex optimization it is well known that certain subclasses of objective functions are much easier than arbitrary convex functions. We are interested in designing adaptive methods that can automatically get fast rates in as many such subclasses as possible, without any manual tuning. Previous adaptive methods are able to interpolate between strongly convex and general convex functions.… ▽ More In online convex optimization it is well known that certain subclasses of objective functions are much easier than arbitrary convex functions. We are interested in designing adaptive methods that can automatically get fast rates in as many such subclasses as possible, without any manual tuning. Previous adaptive methods are able to interpolate between strongly convex and general convex functions. We present a new method, MetaGrad, that adapts to a much broader class of functions, including exp-concave and strongly convex functions, but also various types of stochastic and non-stochastic functions without any curvature. For instance, MetaGrad can achieve logarithmic regret on the unregularized hinge loss, even though it has no curvature, if the data come from a favourable probability distribution. MetaGrad's main feature is that it simultaneously considers multiple learning rates. Unlike previous methods with provable regret guarantees, however, its learning rates are not monotonically decreasing over time and are not tuned based on a theoretically derived bound on the regret. Instead, they are weighted directly proportional to their empirical performance on the data using a tilted exponential weights master algorithm. △ Less

Submitted 1 November, 2016; v1 submitted 29 April, 2016; originally announced April 2016.

Journal ref: Advances in Neural Information Processing Systems 29 (NeurIPS), 3666-3674, 2016

arXiv:1603.04190 [pdf, ps, other]

Online Isotonic Regression

Authors: Wojciech Kotłowski, Wouter M. Koolen, Alan Malek

Abstract: We consider the online version of the isotonic regression problem. Given a set of linearly ordered points (e.g., on the real line), the learner must predict labels sequentially at adversarially chosen positions and is evaluated by her total squared loss compared against the best isotonic (non-decreasing) function in hindsight. We survey several standard online learning algorithms and show that non… ▽ More We consider the online version of the isotonic regression problem. Given a set of linearly ordered points (e.g., on the real line), the learner must predict labels sequentially at adversarially chosen positions and is evaluated by her total squared loss compared against the best isotonic (non-decreasing) function in hindsight. We survey several standard online learning algorithms and show that none of them achieve the optimal regret exponent; in fact, most of them (including Online Gradient Descent, Follow the Leader and Exponential Weights) incur linear regret. We then prove that the Exponential Weights algorithm played over a covering net of isotonic functions has a regret bounded by $O\big(T^{1/3} \log^{2/3}(T)\big)$ and present a matching $Ω(T^{1/3})$ lower bound on regret. We provide a computationally efficient version of this algorithm. We also analyze the noise-free case, in which the revealed labels are isotonic, and show that the bound can be improved to $O(\log T)$ or even to $O(1)$ (when the labels are revealed in isotonic order). Finally, we extend the analysis beyond squared loss and give bounds for entropic loss and absolute loss. △ Less

Submitted 7 October, 2016; v1 submitted 14 March, 2016; originally announced March 2016.

Comments: 25 pages

arXiv:1602.04676 [pdf, ps, other]

Maximin Action Identification: A New Bandit Framework for Games

Authors: Aurélien Garivier, Emilie Kaufmann, Wouter Koolen

Abstract: We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists in identifying the best action in a game, when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower-and upper-confidence bounds; and Maximin-Racing, which ope… ▽ More We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists in identifying the best action in a game, when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower-and upper-confidence bounds; and Maximin-Racing, which operates by successively eliminating the sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We sketch a lower bound analysis, and possible connections to an optimal algorithm. △ Less

Submitted 15 February, 2016; originally announced February 2016.

arXiv:1512.03223 [pdf, other]

doi 10.1016/j.ijar.2016.03.001

Robust Probability Updating

Authors: Thijs van Ommen, Wouter M. Koolen, Thijs E. Feenstra, Peter D. Grünwald

Abstract: This paper discusses an alternative to conditioning that may be used when the probability distribution is not fully specified. It does not require any assumptions (such as CAR: coarsening at random) on the unknown distribution. The well-known Monty Hall problem is the simplest scenario where neither naive conditioning nor the CAR assumption suffice to determine an updated probability distribution.… ▽ More This paper discusses an alternative to conditioning that may be used when the probability distribution is not fully specified. It does not require any assumptions (such as CAR: coarsening at random) on the unknown distribution. The well-known Monty Hall problem is the simplest scenario where neither naive conditioning nor the CAR assumption suffice to determine an updated probability distribution. This paper thus addresses a generalization of that problem to arbitrary distributions on finite outcome spaces, arbitrary sets of `messages', and (almost) arbitrary loss functions, and provides existence and characterization theorems for robust probability updating strategies. We find that for logarithmic loss, optimality is characterized by an elegant condition, which we call RCAR (reverse coarsening at random). Under certain conditions, the same condition also characterizes optimality for a much larger class of loss functions, and we obtain an objective and general answer to how one should update probabilities in the light of new information. △ Less

Submitted 2 May, 2016; v1 submitted 10 December, 2015; originally announced December 2015.

Comments: 47 pages, 4 figures. This second version is the accepted manuscript: it incorporates reviewer comments and has a new title

Journal ref: International Journal of Approximate Reasoning 74 (2016) 30-57

arXiv:1502.08009 [pdf, ps, other]

Second-order Quantile Methods for Experts and Combinatorial Games

Authors: Wouter M. Koolen, Tim van Erven

Abstract: We aim to design strategies for sequential decision making that adjust to the difficulty of the learning problem. We study this question both in the setting of prediction with expert advice, and for more general combinatorial decision tasks. We are not satisfied with just guaranteeing minimax regret rates, but we want our algorithms to perform significantly better on easy data. Two popular ways to… ▽ More We aim to design strategies for sequential decision making that adjust to the difficulty of the learning problem. We study this question both in the setting of prediction with expert advice, and for more general combinatorial decision tasks. We are not satisfied with just guaranteeing minimax regret rates, but we want our algorithms to perform significantly better on easy data. Two popular ways to formalize such adaptivity are second-order regret bounds and quantile bounds. The underlying notions of 'easy data', which may be paraphrased as "the learning problem has small variance" and "multiple decisions are useful", are synergetic. But even though there are sophisticated algorithms that exploit one of the two, no existing algorithm is able to adapt to both. In this paper we outline a new method for obtaining such adaptive algorithms, based on a potential function that aggregates a range of learning rates (which are essential tuning parameters). By choosing the right prior we construct efficient algorithms and show that they reap both benefits by proving the first bounds that are both second-order and incorporate quantiles. △ Less

Submitted 27 February, 2015; originally announced February 2015.

arXiv:1311.6536 [pdf, other]

doi 10.1109/TIT.2013.2273353

Universal Codes from Switching Strategies

Authors: Wouter M. Koolen, Steven de Rooij

Abstract: We discuss algorithms for combining sequential prediction strategies, a task which can be viewed as a natural generalisation of the concept of universal coding. We describe a graphical language based on Hidden Markov Models for defining prediction strategies, and we provide both existing and new models as examples. The models include efficient, parameterless models for switching between the input… ▽ More We discuss algorithms for combining sequential prediction strategies, a task which can be viewed as a natural generalisation of the concept of universal coding. We describe a graphical language based on Hidden Markov Models for defining prediction strategies, and we provide both existing and new models as examples. The models include efficient, parameterless models for switching between the input strategies over time, including a model for the case where switches tend to occur in clusters, and finally a new model for the scenario where the prediction strategies have a known relationship, and where jumps are typically between strongly related ones. This last model is relevant for coding time series data where parameter drift is expected. As theoretical ontributions we introduce an interpolation construction that is useful in the development and analysis of new algorithms, and we establish a new sophisticated lemma for analysing the individual sequence regret of parameterised models. △ Less

Submitted 25 November, 2013; originally announced November 2013.

Journal ref: IEEE Transactions on Information Theory, 59(11):7168-7185, November 2013

arXiv:1301.0534 [pdf, ps, other]

Follow the Leader If You Can, Hedge If You Must

Authors: Steven de Rooij, Tim van Erven, Peter D. Grünwald, Wouter M. Koolen

Abstract: Follow-the-Leader (FTL) is an intuitive sequential prediction strategy that guarantees constant regret in the stochastic setting, but has terrible performance for worst-case data. Other hedging strategies have better worst-case guarantees but may perform much worse than FTL if the data are not maximally adversarial. We introduce the FlipFlop algorithm, which is the first method that provably combi… ▽ More Follow-the-Leader (FTL) is an intuitive sequential prediction strategy that guarantees constant regret in the stochastic setting, but has terrible performance for worst-case data. Other hedging strategies have better worst-case guarantees but may perform much worse than FTL if the data are not maximally adversarial. We introduce the FlipFlop algorithm, which is the first method that provably combines the best of both worlds. As part of our construction, we develop AdaHedge, which is a new way of dynamically tuning the learning rate in Hedge without using the doubling trick. AdaHedge refines a method by Cesa-Bianchi, Mansour and Stoltz (2007), yielding slightly improved worst-case guarantees. By interleaving AdaHedge and FTL, the FlipFlop algorithm achieves regret within a constant factor of the FTL regret, without sacrificing AdaHedge's worst-case guarantees. AdaHedge and FlipFlop do not need to know the range of the losses in advance; moreover, unlike earlier methods, both have the intuitive property that the issued weights are invariant under rescaling and translation of the losses. The losses are also allowed to be negative, in which case they may be interpreted as gains. △ Less

Submitted 17 January, 2013; v1 submitted 3 January, 2013; originally announced January 2013.

Comments: under submission

Journal ref: Journal of Machine Learning Research 15(37):1281-1316, 2014

arXiv:1008.4654 [pdf, other]

Freezing and Slee**: Tracking Experts that Learn by Evolving Past Posteriors

Authors: Wouter M. Koolen, Tim van Erven

Abstract: A problem posed by Freund is how to efficiently track a small pool of experts out of a much larger set. This problem was solved when Bousquet and Warmuth introduced their mixing past posteriors (MPP) algorithm in 2001. In Freund's problem the experts would normally be considered black boxes. However, in this paper we re-examine Freund's problem in case the experts have internal structure that en… ▽ More A problem posed by Freund is how to efficiently track a small pool of experts out of a much larger set. This problem was solved when Bousquet and Warmuth introduced their mixing past posteriors (MPP) algorithm in 2001. In Freund's problem the experts would normally be considered black boxes. However, in this paper we re-examine Freund's problem in case the experts have internal structure that enables them to learn. In this case the problem has two possible interpretations: should the experts learn from all data or only from the subsequence on which they are being tracked? The MPP algorithm solves the first case. Our contribution is to generalise MPP to address the second option. The results we obtain apply to any expert structure that can be formalised using (expert) hidden Markov models. Curiously enough, for our interpretation there are \emph{two} natural reference schemes: freezing and slee**. For each scheme, we provide an efficient prediction strategy and prove the relevant loss bound. △ Less

Submitted 27 August, 2010; originally announced August 2010.

arXiv:1008.4532 [pdf, other]

Switching between Hidden Markov Models using Fixed Share

Authors: Wouter M. Koolen, Tim van Erven

Abstract: In prediction with expert advice the goal is to design online prediction algorithms that achieve small regret (additional loss on the whole data) compared to a reference scheme. In the simplest such scheme one compares to the loss of the best expert in hindsight. A more ambitious goal is to split the data into segments and compare to the best expert on each segment. This is appropriate if the natu… ▽ More In prediction with expert advice the goal is to design online prediction algorithms that achieve small regret (additional loss on the whole data) compared to a reference scheme. In the simplest such scheme one compares to the loss of the best expert in hindsight. A more ambitious goal is to split the data into segments and compare to the best expert on each segment. This is appropriate if the nature of the data changes between segments. The standard fixed-share algorithm is fast and achieves small regret compared to this scheme. Fixed share treats the experts as black boxes: there are no assumptions about how they generate their predictions. But if the experts are learning, the following question arises: should the experts learn from all data or only from data in their own segment? The original algorithm naturally addresses the first case. Here we consider the second option, which is more appropriate exactly when the nature of the data changes between segments. In general extending fixed share to this second case will slow it down by a factor of T on T outcomes. We show, however, that no such slowdown is necessary if the experts are hidden Markov models. △ Less

Submitted 26 August, 2010; originally announced August 2010.

arXiv:0809.2965 [pdf, ps, other]

On Time-Bounded Incompressibility of Compressible Strings and Sequences

Authors: E. G. Daylight, W. M. Koolen, P. M. B. Vitanyi

Abstract: For every total recursive time bound $t$, a constant fraction of all compressible (low Kolmogorov complexity) strings is $t$-bounded incompressible (high time-bounded Kolmogorov complexity); there are uncountably many infinite sequences of which every initial segment of length $n$ is compressible to $\log n$ yet $t$-bounded incompressible below ${1/4}n - \log n$; and there are countable infinite… ▽ More For every total recursive time bound $t$, a constant fraction of all compressible (low Kolmogorov complexity) strings is $t$-bounded incompressible (high time-bounded Kolmogorov complexity); there are uncountably many infinite sequences of which every initial segment of length $n$ is compressible to $\log n$ yet $t$-bounded incompressible below ${1/4}n - \log n$; and there are countable infinitely many recursive infinite sequence of which every initial segment is similarly $t$-bounded incompressible. These results are related to, but different from, Barzdins's lemma. △ Less

Submitted 11 August, 2009; v1 submitted 17 September, 2008; originally announced September 2008.

Comments: 9 pages, LaTeX, no figures, submitted to Information Processing Letters. Changed and added a Barzdins-like lemma for infinite sequences with different quantification oreder, a fixed constant, and uncountably many sequences

arXiv:0802.2027 [pdf, other]

Kolmogorov Complexity Theory over the Reals

Authors: Martin Ziegler, Wouter M. Koolen

Abstract: Kolmogorov Complexity constitutes an integral part of computability theory, information theory, and computational complexity theory -- in the discrete setting of bits and Turing machines. Over real numbers, on the other hand, the BSS-machine (aka real-RAM) has been established as a major model of computation. This real realm has turned out to exhibit natural counterparts to many notions and resu… ▽ More Kolmogorov Complexity constitutes an integral part of computability theory, information theory, and computational complexity theory -- in the discrete setting of bits and Turing machines. Over real numbers, on the other hand, the BSS-machine (aka real-RAM) has been established as a major model of computation. This real realm has turned out to exhibit natural counterparts to many notions and results in classical complexity and recursion theory; although usually with considerably different proofs. The present work investigates similarities and differences between discrete and real Kolmogorov Complexity as introduced by Montana and Pardo (1998). △ Less

Submitted 28 March, 2008; v1 submitted 14 February, 2008; originally announced February 2008.

ACM Class: F.4.1; F.1.1; E.4; I.1.2; I.1.3

arXiv:0802.2015 [pdf, ps, other]

Combining Expert Advice Efficiently

Authors: Wouter Koolen, Steven de Rooij

Abstract: We show how models for prediction with expert advice can be defined concisely and clearly using hidden Markov models (HMMs); standard HMM algorithms can then be used to efficiently calculate, among other things, how the expert predictions should be weighted according to the model. We cast many existing models as HMMs and recover the best known running times in each case. We also describe two new… ▽ More We show how models for prediction with expert advice can be defined concisely and clearly using hidden Markov models (HMMs); standard HMM algorithms can then be used to efficiently calculate, among other things, how the expert predictions should be weighted according to the model. We cast many existing models as HMMs and recover the best known running times in each case. We also describe two new models: the switch distribution, which was recently developed to improve Bayesian/Minimum Description Length model selection, and a new generalisation of the fixed share algorithm based on run-length coding. We give loss bounds for all models and shed new light on their relationships. △ Less

Submitted 15 February, 2008; v1 submitted 14 February, 2008; originally announced February 2008.

Comments: 50 pages

ACM Class: G.3

Showing 1–31 of 31 results for author: Koolen, W