-
Optimal Unimodular Matching
Authors:
Nathanaël Enriquez,
Mike Liu,
Laurent Ménard,
Vianney Perchet
Abstract:
We consider sequences of finite weighted random graphs that converge locally to unimodular i.i.d. weighted random trees. When the weights are atomless, we prove that the matchings of maximal weight converge locally to a matching on the limiting tree. For this purpose, we introduce and study unimodular matchings on weighted unimodular random trees as well as a notion of optimality for these objects…
▽ More
We consider sequences of finite weighted random graphs that converge locally to unimodular i.i.d. weighted random trees. When the weights are atomless, we prove that the matchings of maximal weight converge locally to a matching on the limiting tree. For this purpose, we introduce and study unimodular matchings on weighted unimodular random trees as well as a notion of optimality for these objects. In this context, we prove that, in law, there is a unique optimal unimodular matching for a given unimodular tree. We then prove that this law is the local limit of the sequence of matchings of maximal weight. Along the way, we also show that this law is characterised by an equation derived from a message passing algorithm.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
Stochastic Mirror Descent for Large-Scale Sparse Recovery
Authors:
Sasila Ilandarideva,
Yannis Bekri,
Anatoli Juditsky,
Vianney Perchet
Abstract:
In this paper we discuss an application of Stochastic Approximation to statistical estimation of high-dimensional sparse parameters. The proposed solution reduces to resolving a penalized stochastic optimization problem on each stage of a multistage algorithm; each problem being solved to a prescribed accuracy by the non-Euclidean Composite Stochastic Mirror Descent (CSMD) algorithm. Assuming that…
▽ More
In this paper we discuss an application of Stochastic Approximation to statistical estimation of high-dimensional sparse parameters. The proposed solution reduces to resolving a penalized stochastic optimization problem on each stage of a multistage algorithm; each problem being solved to a prescribed accuracy by the non-Euclidean Composite Stochastic Mirror Descent (CSMD) algorithm. Assuming that the problem objective is smooth and quadratically minorated and stochastic perturbations are sub-Gaussian, our analysis prescribes the method parameters which ensure fast convergence of the estimation error (the radius of a confidence ball of a given norm around the approximate solution). This convergence is linear during the first "preliminary" phase of the routine and is sublinear during the second "asymptotic" phase. We consider an application of the proposed approach to sparse Generalized Linear Regression problem. In this setting, we show that the proposed algorithm attains the optimal convergence of the estimation error under weak assumptions on the regressor distribution. We also present a numerical study illustrating the performance of the algorithm on high-dimensional simulation data.
△ Less
Submitted 23 October, 2022;
originally announced October 2022.
-
Making the most of your day: online learning for optimal allocation of time
Authors:
Etienne Boursier,
Tristan Garrec,
Vianney Perchet,
Marco Scarsini
Abstract:
We study online learning for optimal allocation when the resource to be allocated is time. %Examples of possible applications include job scheduling for a computing server, a driver filling a day with rides, a landlord renting an estate, etc. An agent receives task proposals sequentially according to a Poisson process and can either accept or reject a proposed task. If she accepts the proposal, sh…
▽ More
We study online learning for optimal allocation when the resource to be allocated is time. %Examples of possible applications include job scheduling for a computing server, a driver filling a day with rides, a landlord renting an estate, etc. An agent receives task proposals sequentially according to a Poisson process and can either accept or reject a proposed task. If she accepts the proposal, she is busy for the duration of the task and obtains a reward that depends on the task duration. If she rejects it, she remains on hold until a new task proposal arrives. We study the regret incurred by the agent, first when she knows her reward function but does not know the distribution of the task duration, and then when she does not know her reward function, either. This natural setting bears similarities with contextual (one-armed) bandits, but with the crucial difference that the normalized reward associated to a context depends on the whole distribution of contexts.
△ Less
Submitted 4 November, 2021; v1 submitted 16 February, 2021;
originally announced February 2021.
-
Social Learning in Non-Stationary Environments
Authors:
Etienne Boursier,
Vianney Perchet,
Marco Scarsini
Abstract:
Potential buyers of a product or service, before making their decisions, tend to read reviews written by previous consumers. We consider Bayesian consumers with heterogeneous preferences, who sequentially decide whether to buy an item of unknown quality, based on previous buyers' reviews. The quality is multi-dimensional and may occasionally vary over time; the reviews are also multi-dimensional.…
▽ More
Potential buyers of a product or service, before making their decisions, tend to read reviews written by previous consumers. We consider Bayesian consumers with heterogeneous preferences, who sequentially decide whether to buy an item of unknown quality, based on previous buyers' reviews. The quality is multi-dimensional and may occasionally vary over time; the reviews are also multi-dimensional. In the simple uni-dimensional and static setting, beliefs about the quality are known to converge to its true value. Our paper extends this result in several ways. First, a multi-dimensional quality is considered, second, rates of convergence are provided, third, a dynamical Markovian model with varying quality is studied. In this dynamical setting the cost of learning is shown to be small.
△ Less
Submitted 23 February, 2022; v1 submitted 20 July, 2020;
originally announced July 2020.
-
Online A-Optimal Design and Active Linear Regression
Authors:
Xavier Fontaine,
Pierre Perrault,
Michal Valko,
Vianney Perchet
Abstract:
We consider in this paper the problem of optimal experiment design where a decision maker can choose which points to sample to obtain an estimate $\hatβ$ of the hidden parameter $β^{\star}$ of an underlying linear model. The key challenge of this work lies in the heteroscedasticity assumption that we make, meaning that each covariate has a different and unknown variance. The goal of the decision m…
▽ More
We consider in this paper the problem of optimal experiment design where a decision maker can choose which points to sample to obtain an estimate $\hatβ$ of the hidden parameter $β^{\star}$ of an underlying linear model. The key challenge of this work lies in the heteroscedasticity assumption that we make, meaning that each covariate has a different and unknown variance. The goal of the decision maker is then to figure out on the fly the optimal way to allocate the total budget of $T$ samples between covariates, as sampling several times a specific one will reduce the variance of the estimated model around it (but at the cost of a possible higher variance elsewhere). By trying to minimize the $\ell^2$-loss $\mathbb{E} [\lVert\hatβ-β^{\star}\rVert^2]$ the decision maker is actually minimizing the trace of the covariance matrix of the problem, which corresponds then to online A-optimal design. Combining techniques from bandit and convex optimization we propose a new active sampling algorithm and we compare it with existing ones. We provide theoretical guarantees of this algorithm in different settings, including a $\mathcal{O}(T^{-2})$ regret bound in the case where the covariates form a basis of the feature space, generalizing and improving existing results. Numerical experiments validate our theoretical findings.
△ Less
Submitted 30 December, 2020; v1 submitted 20 June, 2019;
originally announced June 2019.
-
An adaptive stochastic optimization algorithm for resource allocation
Authors:
Xavier Fontaine,
Shie Mannor,
Vianney Perchet
Abstract:
We consider the classical problem of sequential resource allocation where a decision maker must repeatedly divide a budget between several resources, each with diminishing returns. This can be recast as a specific stochastic optimization problem where the objective is to maximize the cumulative reward, or equivalently to minimize the regret. We construct an algorithm that is {\em adaptive} to the…
▽ More
We consider the classical problem of sequential resource allocation where a decision maker must repeatedly divide a budget between several resources, each with diminishing returns. This can be recast as a specific stochastic optimization problem where the objective is to maximize the cumulative reward, or equivalently to minimize the regret. We construct an algorithm that is {\em adaptive} to the complexity of the problem, expressed in term of the regularity of the returns of the resources, measured by the exponent in the Łojasiewicz inequality (or by their universal concavity parameter). Our parameter-independent algorithm recovers the optimal rates for strongly-concave functions and the classical fast rates of multi-armed bandit (for linear reward functions). Moreover, the algorithm improves existing results on stochastic optimization in this regret minimization setting for intermediate cases.
△ Less
Submitted 16 January, 2020; v1 submitted 12 February, 2019;
originally announced February 2019.
-
A differential game on Wasserstein space. Application to weak approachability with partial monitoring
Authors:
Vianney Perchet,
Marc Quincampoix
Abstract:
Studying continuous time counterpart of some discrete time dynamics is now a standard and fruitful technique, as some properties hold in both setups. In game theory, this is usually done by considering differential games on Euclidean spaces. This allows to infer properties on the convergence of values of a repeated game, to deal with the various concepts of approachability, etc. In this paper, we…
▽ More
Studying continuous time counterpart of some discrete time dynamics is now a standard and fruitful technique, as some properties hold in both setups. In game theory, this is usually done by considering differential games on Euclidean spaces. This allows to infer properties on the convergence of values of a repeated game, to deal with the various concepts of approachability, etc. In this paper, we introduce a specific but quite abstract differential game defined on the Wasserstein space of probability distributions and we prove the existence of its value. Going back to the discrete time dynamics, we derive results on weak approachability with partial monitoring: we prove that any set satisfying a suitable compatibility condition is either weakly approachable or weakly excludable. We also obtain that the value for differential games with nonanticipative strategies is the same that those defined with a new concept of strategies very suitable to make links with repeated games.
△ Less
Submitted 12 November, 2018;
originally announced November 2018.
-
Regularized Contextual Bandits
Authors:
Xavier Fontaine,
Quentin Berthet,
Vianney Perchet
Abstract:
We consider the stochastic contextual bandit problem with additional regularization. The motivation comes from problems where the policy of the agent must be close to some baseline policy which is known to perform well on the task. To tackle this problem we use a nonparametric model and propose an algorithm splitting the context space into bins, and solving simultaneously - and independently - reg…
▽ More
We consider the stochastic contextual bandit problem with additional regularization. The motivation comes from problems where the policy of the agent must be close to some baseline policy which is known to perform well on the task. To tackle this problem we use a nonparametric model and propose an algorithm splitting the context space into bins, and solving simultaneously - and independently - regularized multi-armed bandit instances on each bin. We derive slow and fast rates of convergence, depending on the unknown complexity of the problem. We also consider a new relevant margin condition to get problem-independent convergence rates, ending up in intermediate convergence rates interpolating between the aforementioned slow and fast rates.
△ Less
Submitted 5 June, 2019; v1 submitted 11 October, 2018;
originally announced October 2018.
-
Finding the bandit in a graph: Sequential search-and-stop
Authors:
Pierre Perrault,
Vianney Perchet,
Michal Valko
Abstract:
We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution. The agent can only examine vertices whose in-neighbors have already been examined. In this paper, we address a learning setting where we allow the agent to stop before having found the object and resta…
▽ More
We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution. The agent can only examine vertices whose in-neighbors have already been examined. In this paper, we address a learning setting where we allow the agent to stop before having found the object and restart searching on a new independent instance of the same problem. Our goal is to maximize the total number of hidden objects found given a time budget. The agent can thus skip an instance after realizing that it would spend too much time on it. Our contributions are both to the search theory and multi-armed bandits. If the distribution is known, we provide a quasi-optimal and efficient stationary strategy. If the distribution is unknown, we additionally show how to sequentially approximate it and, at the same time, act near-optimally in order to collect as many hidden objects as possible.
△ Less
Submitted 22 April, 2019; v1 submitted 6 June, 2018;
originally announced June 2018.
-
Fast Rates for Bandit Optimization with Upper-Confidence Frank-Wolfe
Authors:
Quentin Berthet,
Vianney Perchet
Abstract:
We consider the problem of bandit optimization, inspired by stochastic optimization and online learning problems with bandit feedback. In this problem, the objective is to minimize a global loss function of all the actions, not necessarily a cumulative loss. This framework allows us to study a very general class of problems, with applications in statistics, machine learning, and other fields. To s…
▽ More
We consider the problem of bandit optimization, inspired by stochastic optimization and online learning problems with bandit feedback. In this problem, the objective is to minimize a global loss function of all the actions, not necessarily a cumulative loss. This framework allows us to study a very general class of problems, with applications in statistics, machine learning, and other fields. To solve this problem, we analyze the Upper-Confidence Frank-Wolfe algorithm, inspired by techniques for bandits and convex optimization. We give theoretical guarantees for the performance of this algorithm over various classes of functions, and discuss the optimality of these results.
△ Less
Submitted 6 September, 2017; v1 submitted 22 February, 2017;
originally announced February 2017.
-
Highly-Smooth Zero-th Order Online Optimization Vianney Perchet
Authors:
Francis Bach,
Vianney Perchet
Abstract:
The minimization of convex functions which are only available through partial and noisy information is a key methodological problem in many disciplines. In this paper we consider convex optimization with noisy zero-th order information, that is noisy function evaluations at any desired point. We focus on problems with high degrees of smoothness, such as logistic regression. We show that as opposed…
▽ More
The minimization of convex functions which are only available through partial and noisy information is a key methodological problem in many disciplines. In this paper we consider convex optimization with noisy zero-th order information, that is noisy function evaluations at any desired point. We focus on problems with high degrees of smoothness, such as logistic regression. We show that as opposed to gradient-based algorithms, high-order smoothness may be used to improve estimation rates, with a precise dependence of our upper-bounds on the degree of smoothness. In particular, we show that for infinitely differentiable functions, we recover the same dependence on sample size as gradient-based algorithms, with an extra dimension-dependent factor. This is done for both convex and strongly-convex functions, with finite horizon and anytime algorithms. Finally, we also recover similar results in the online optimization setting.
△ Less
Submitted 26 May, 2016;
originally announced May 2016.
-
Batched bandit problems
Authors:
Vianney Perchet,
Philippe Rigollet,
Sylvain Chassang,
Erik Snowberg
Abstract:
Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost fo…
▽ More
Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
△ Less
Submitted 29 March, 2016; v1 submitted 2 May, 2015;
originally announced May 2015.
-
Approachability in unknown games: Online learning meets multi-objective optimization
Authors:
Shie Mannor,
Vianney Perchet,
Gilles Stoltz
Abstract:
In the standard setting of approachability there are two players and a target set. The players play repeatedly a known vector-valued game where the first player wants to have the average vector-valued payoff converge to the target set which the other player tries to exclude it from this set. We revisit this setting in the spirit of online learning and do not assume that the first player knows the…
▽ More
In the standard setting of approachability there are two players and a target set. The players play repeatedly a known vector-valued game where the first player wants to have the average vector-valued payoff converge to the target set which the other player tries to exclude it from this set. We revisit this setting in the spirit of online learning and do not assume that the first player knows the game structure: she receives an arbitrary vector-valued reward vector at every round. She wishes to approach the smallest ("best") possible set given the observed average payoffs in hindsight. This extension of the standard setting has implications even when the original target set is not approachable and when it is not obvious which expansion of it should be approached instead. We show that it is impossible, in general, to approach the best target set in hindsight and propose achievable though ambitious alternative goals. We further propose a concrete strategy to approach these goals. Our method does not require projection onto a target set and amounts to switching between scalar regret minimization algorithms that are performed in episodes. Applications to global cost minimization and to approachability under sample path constraints are considered.
△ Less
Submitted 17 June, 2016; v1 submitted 10 February, 2014;
originally announced February 2014.
-
A Primal Condition for Approachability with Partial Monitoring
Authors:
Shie Mannor,
Vianney Perchet,
Gilles Stoltz
Abstract:
In approachability with full monitoring there are two types of conditions that are known to be equivalent for convex sets: a primal and a dual condition. The primal one is of the form: a set C is approachable if and only all containing half-spaces are approachable in the one-shot game; while the dual one is of the form: a convex set C is approachable if and only if it intersects all payoff sets of…
▽ More
In approachability with full monitoring there are two types of conditions that are known to be equivalent for convex sets: a primal and a dual condition. The primal one is of the form: a set C is approachable if and only all containing half-spaces are approachable in the one-shot game; while the dual one is of the form: a convex set C is approachable if and only if it intersects all payoff sets of a certain form. We consider approachability in games with partial monitoring. In previous works (Perchet 2011; Mannor et al. 2011) we provided a dual characterization of approachable convex sets; we also exhibited efficient strategies in the case where C is a polytope. In this paper we provide primal conditions on a convex set to be approachable with partial monitoring. They depend on a modified reward function and lead to approachability strategies, based on modified payoff functions, that proceed by projections similarly to Blackwell's (1956) strategy; this is in contrast with previously studied strategies in this context that relied mostly on the signaling structure and aimed at estimating well the distributions of the signals received. Our results generalize classical results by Kohlberg 1975 (see also Mertens et al. 1994) and apply to games with arbitrary signaling structure as well as to arbitrary convex sets.
△ Less
Submitted 23 May, 2013;
originally announced May 2013.
-
Bounded regret in stochastic multi-armed bandits
Authors:
Sébastien Bubeck,
Vianney Perchet,
Philippe Rigollet
Abstract:
We study the stochastic multi-armed bandit problem when one knows the value $μ^{(\star)}$ of an optimal arm, as a well as a positive lower bound on the smallest positive gap $Δ$. We propose a new randomized policy that attains a regret {\em uniformly bounded over time} in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only know…
▽ More
We study the stochastic multi-armed bandit problem when one knows the value $μ^{(\star)}$ of an optimal arm, as a well as a positive lower bound on the smallest positive gap $Δ$. We propose a new randomized policy that attains a regret {\em uniformly bounded over time} in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows $Δ$, and bounded regret of order $1/Δ$ is not possible if one only knows $μ^{(\star)}$
△ Less
Submitted 12 February, 2013; v1 submitted 6 February, 2013;
originally announced February 2013.
-
On an unified framework for approachability in games with or without signals
Authors:
Vianney Perchet,
Marc Quincampoix
Abstract:
We unify standard frameworks for approachability both in full or partial monitoring by defining a new abstract game, called the "purely informative game", where the outcome at each stage is the maximal information players can obtain, represented as some probability measure. Objectives of players can be rewritten as the convergence (to some given set) of sequences of averages of these probability m…
▽ More
We unify standard frameworks for approachability both in full or partial monitoring by defining a new abstract game, called the "purely informative game", where the outcome at each stage is the maximal information players can obtain, represented as some probability measure. Objectives of players can be rewritten as the convergence (to some given set) of sequences of averages of these probability measures. We obtain new results extending the approachability theory developed by Blackwell moreover this new abstract framework enables us to characterize approachable sets with, as usual, a remarkably simple and clear reformulation for convex sets. Translated into the original games, those results become the first necessary and sufficient condition under which an arbitrary set is approachable and they cover and extend previous known results for convex sets. We also investigate a specific class of games where, thanks to some unusual definition of averages and convexity, we again obtain a complete characterization of approachable sets along with rates of convergence.
△ Less
Submitted 16 January, 2013;
originally announced January 2013.
-
The multi-armed bandit problem with covariates
Authors:
Vianney Perchet,
Philippe Rigollet
Abstract:
We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards that better describe applications where side information is available. We adopt a nonparametric model where the expected rewards…
▽ More
We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards that better describe applications where side information is available. We adopt a nonparametric model where the expected rewards are smooth functions of the covariate and where the hardness of the problem is captured by a margin parameter. To maximize the expected cumulative reward, we introduce a policy called Adaptively Binned Successive Elimination (abse) that adaptively decomposes the global problem into suitably "localized" static bandit problems. This policy constructs an adaptive partition using a variant of the Successive Elimination (se) policy. Our results include sharper regret bounds for the se policy in a static bandit problem and minimax optimal regret bounds for the abse policy in the dynamic problem.
△ Less
Submitted 24 May, 2013; v1 submitted 27 October, 2011;
originally announced October 2011.
-
Robust approachability and regret minimization in games with partial monitoring
Authors:
Shie Mannor,
Vianney Perchet,
Gilles Stoltz
Abstract:
Approachability has become a standard tool in analyzing earning algorithms in the adversarial online learning setup. We develop a variant of approachability for games where there is ambiguity in the obtained reward that belongs to a set, rather than being a single vector. Using this variant we tackle the problem of approachability in games with partial monitoring and develop simple and efficient a…
▽ More
Approachability has become a standard tool in analyzing earning algorithms in the adversarial online learning setup. We develop a variant of approachability for games where there is ambiguity in the obtained reward that belongs to a set, rather than being a single vector. Using this variant we tackle the problem of approachability in games with partial monitoring and develop simple and efficient algorithms (i.e., with constant per-step complexity) for this setup. We finally consider external regret and internal regret in repeated games with partial monitoring and derive regret-minimizing strategies based on approachability theory.
△ Less
Submitted 15 February, 2012; v1 submitted 25 May, 2011;
originally announced May 2011.
-
Internal Regret with Partial Monitoring. Calibration-Based Optimal Algorithms
Authors:
Vianney Perchet
Abstract:
We provide consistent random algorithms for sequential decision under partial monitoring, i.e. when the decision maker does not observe the outcomes but receives instead random feedback signals. Those algorithms have no internal regret in the sense that, on the set of stages where the decision maker chose his action according to a given law, the average payoff could not have been improved in avera…
▽ More
We provide consistent random algorithms for sequential decision under partial monitoring, i.e. when the decision maker does not observe the outcomes but receives instead random feedback signals. Those algorithms have no internal regret in the sense that, on the set of stages where the decision maker chose his action according to a given law, the average payoff could not have been improved in average by using any other fixed law.
They are based on a generalization of calibration, no longer defined in terms of a Voronoi diagram but instead of a Laguerre diagram (a more general concept). This allows us to bound, for the first time in this general framework, the expected average internal -- as well as the usual external -- regret at stage $n$ by $O(n^{-1/3})$, which is known to be optimal.
△ Less
Submitted 22 February, 2011;
originally announced February 2011.