Showing 1–45 of 45 results for author: Perchet, V

  1. arXiv:2406.11316 [pdf, ps, other]

    stat.ML cs.DS cs.GT cs.LG

    Improved Algorithms for Contextual Dynamic Pricing

    Authors: Matilde Tullii, Solenne Gaucher, Nadav Merlis, Vianney Perchet

    Abstract: In contextual dynamic pricing, a seller sequentially prices goods based on contextual information. Buyers will purchase products only if the prices are below their valuations. The goal of the seller is to design a pricing strategy that collects as much revenue as possible. We focus on two different valuation models. The first assumes that valuations linearly depend on the context and are further d…

    Submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2403.11637 [pdf, other]

    cs.LG stat.ML

    The Value of Reward Lookahead in Reinforcement Learning

    Authors: Nadav Merlis, Dorian Baudry, Vianney Perchet

    Abstract: In reinforcement learning (RL), agents sequentially interact with changing environments while aiming to maximize the obtained rewards. Usually, rewards are observed only after acting, and so the goal is to maximize the expected cumulative reward. Yet, in many practical settings, reward information is observed in advance -- prices are observed before performing transactions; nearby traffic informat…

    Submitted 18 March, 2024; originally announced March 2024.

  3. arXiv:2402.13079 [pdf, ps, other]

    stat.ML cs.IR cs.IT cs.LG

    Mode Estimation with Partial Feedback

    Authors: Charles Arnal, Vivien Cabannes, Vianney Perchet

    Abstract: The combination of lightly supervised pre-training and online fine-tuning has played a key role in recent AI developments. These new learning pipelines call for new theoretical frameworks. In this paper, we formalize core aspects of weakly supervised and active learning with a simple problem: the estimation of the mode of a distribution using partial feedback. We show how entropy coding allows for…

    Submitted 20 February, 2024; originally announced February 2024.

    MSC Class: 62L05; 62B86; 62D10; 62B10

  4. arXiv:2309.00656 [pdf, other]

    cs.GT cs.LG stat.ML

    Local and adaptive mirror descents in extensive-form games

    Authors: Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

    Abstract: We study how to learn $ε$-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback. In this setting, players update their policies sequentially based on their observations over a fixed number of episodes, denoted by $T$. Existing procedures suffer from high variance due to the use of importance sampling over sequences of actions (Steinberger et al., 2020; McAleer e…

    Submitted 1 September, 2023; originally announced September 2023.

  5. arXiv:2306.02071 [pdf, other]

    cs.AI cs.GT stat.CO stat.ML

    DU-Shapley: A Shapley Value Proxy for Efficient Dataset Valuation

    Authors: Felipe Garrido-Lucero, Benjamin Heymann, Maxime Vono, Patrick Loiseau, Vianney Perchet

    Abstract: We consider the dataset valuation problem, that is, the problem of quantifying the incremental gain, to some relevant pre-defined utility of a machine learning task, of aggregating an individual dataset to others. The Shapley value is a natural tool to perform dataset valuation due to its formal axiomatic justification, which can be combined with Monte Carlo integration to overcome the computation…

    Submitted 17 June, 2024; v1 submitted 3 June, 2023; originally announced June 2023.

    Comments: 22 pages

  6. arXiv:2305.19691 [pdf, other]

    cs.LG stat.ML

    Constant or logarithmic regret in asynchronous multiplayer bandits

    Authors: Hugo Richard, Etienne Boursier, Vianney Perchet

    Abstract: Multiplayer bandits have recently been extensively studied because of their application to cognitive radio networks. While the literature mostly considers synchronous players, radio networks (e.g. for IoT) tend to have asynchronous devices. This motivates the harder, asynchronous multiplayer bandits problem, which was first tackled with an explore-then-commit (ETC) algorithm (see Dakdouk, 2022),…

    Submitted 31 May, 2023; originally announced May 2023.

  7. arXiv:2212.12567 [pdf, other]

    stat.ML cs.LG

    Adapting to game trees in zero-sum imperfect information games

    Authors: Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

    Abstract: Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $ε$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\widetilde{\mathcal{O}}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/ε^2)$ on the required number of realizations to learn these strategies with hi…

    Submitted 15 February, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

  8. arXiv:2211.16275 [pdf, ps, other]

    stat.ML cs.GT cs.LG

    A survey on multi-player bandits

    Authors: Etienne Boursier, Vianney Perchet

    Abstract: Due mostly to its application to cognitive radio networks, multiplayer bandits gained a lot of interest in the last decade. Considerable progress has been made on its theoretical aspect. However, the current algorithms are far from applicable and many obstacles remain between these theoretical results and a possible implementation of multiplayer bandits algorithms in real cognitive radio network…

    Submitted 3 June, 2024; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: final version, accepted at JMLR

  9. arXiv:2210.12882 [pdf, other]

    stat.ML cs.LG math.OC math.ST

    Stochastic Mirror Descent for Large-Scale Sparse Recovery

    Authors: Sasila Ilandarideva, Yannis Bekri, Anatoli Juditsky, Vianney Perchet

    Abstract: In this paper we discuss an application of Stochastic Approximation to statistical estimation of high-dimensional sparse parameters. The proposed solution reduces to resolving a penalized stochastic optimization problem on each stage of a multistage algorithm; each problem being solved to a prescribed accuracy by the non-Euclidean Composite Stochastic Mirror Descent (CSMD) algorithm. Assuming that…

    Submitted 23 October, 2022; originally announced October 2022.

  10. arXiv:2205.15695 [pdf, other]

    cs.LG stat.ML

    On Preemption and Learning in Stochastic Scheduling

    Authors: Nadav Merlis, Hugo Richard, Flore Sentenac, Corentin Odic, Mathieu Molina, Vianney Perchet

    Abstract: We study single-machine scheduling of jobs, each belonging to a job type that determines its duration distribution. We start by analyzing the scenario where the type characteristics are known and then move to two learning scenarios where the types are unknown: non-preemptive problems, where each started job must be completed before moving to another job; and preemptive problems, where job executio…

    Submitted 1 June, 2023; v1 submitted 31 May, 2022; originally announced May 2022.

    Comments: Accepted to ICML 2023

  11. arXiv:2205.13255 [pdf, other]

    cs.LG cs.AI cs.IR stat.ML

    Active Labeling: Streaming Stochastic Gradients

    Authors: Vivien Cabannes, Francis Bach, Vianney Perchet, Alessandro Rudi

    Abstract: The workhorse of machine learning is stochastic gradient descent. To access stochastic gradients, it is common to consider iteratively input/output pairs of a training dataset. Interestingly, it appears that one does not need full supervision to access stochastic gradients, which is the main motivation of this paper. After formalizing the "active labeling" problem, which focuses on active learning…

    Submitted 7 December, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: 38 pages (9 main pages), 9 figures

    MSC Class: 68T37; ACM Class: G.3

  12. arXiv:2108.00230 [pdf, other]

    stat.ML cs.LG

    Pure Exploration and Regret Minimization in Matching Bandits

    Authors: Flore Sentenac, Jialin Yi, Clément Calauzènes, Vianney Perchet, Milan Vojnovic

    Abstract: Finding an optimal matching in a weighted graph is a standard combinatorial problem. We consider its semi-bandit version where either a pair or a full matching is sampled sequentially. We prove that it is possible to leverage a rank-1 assumption on the adjacency matrix to reduce the sample complexity and the regret of off-the-shelf algorithms up to reaching a linear dependency in the number of ver…

    Submitted 31 July, 2021; originally announced August 2021.

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

  13. arXiv:2107.00995 [pdf, other]

    cs.DS stat.ML

    Online Matching in Sparse Random Graphs: Non-Asymptotic Performances of Greedy Algorithm

    Authors: Nathan Noiry, Flore Sentenac, Vianney Perchet

    Abstract: Motivated by sequential budgeted allocation problems, we investigate online matching problems where connections between vertices are not i.i.d., but they have fixed degree distributions -- the so-called configuration model. We estimate the competitive ratio of the simplest algorithm, GREEDY, by approximating some relevant stochastic discrete processes by their continuous counterparts, that are sol…

    Submitted 2 July, 2021; originally announced July 2021.

  14. arXiv:2106.05061 [pdf, other]

    cs.LG stat.ML

    Quickest change detection with unknown parameters: Constant complexity and near optimality

    Authors: Firas Jarboui, Vianney Perchet

    Abstract: We consider the quickest change detection problem where both the parameters of pre- and post- change distributions are unknown, which prevents the use of classical simple hypothesis testing. Without additional assumptions, optimal solutions are not tractable as they rely on some minimax and robust variant of the objective. As a consequence, change points might be detected too late for practical ap…

    Submitted 9 June, 2021; originally announced June 2021.

  15. arXiv:2106.04228 [pdf, ps, other]

    stat.ML cs.GT cs.LG cs.NI

    Decentralized Learning in Online Queuing Systems

    Authors: Flore Sentenac, Etienne Boursier, Vianney Perchet

    Abstract: Motivated by packet routing in computer networks, online queuing systems are composed of queues receiving packets at different rates. Repeatedly, they send packets to servers, each of them treating only at most one packet at a time. In the centralized case, the number of accumulated packets remains bounded (i.e., the system is \textit{stable}) as long as the ratio between service rates and arrival…

    Submitted 4 November, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 camera ready

  16. arXiv:2102.08087 [pdf, other]

    stat.ML cs.LG math.OC stat.OT

    Making the most of your day: online learning for optimal allocation of time

    Authors: Etienne Boursier, Tristan Garrec, Vianney Perchet, Marco Scarsini

    Abstract: We study online learning for optimal allocation when the resource to be allocated is time. Examples of possible applications include job scheduling for a computing server, a driver filling a day with rides, a landlord renting an estate, etc. An agent receives task proposals sequentially according to a Poisson process and can either accept or reject a proposed task. If she accepts the proposal, sh…

    Submitted 4 November, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: NeurIPS 2021 camera ready

  17. arXiv:2012.14264 [pdf, other]

    cs.LG stat.ML

    Lifelong Learning in Multi-Armed Bandits

    Authors: Matthieu Jedor, Jonathan Louëdec, Vianney Perchet

    Abstract: Continuously learning and leveraging the knowledge accumulated from prior tasks in order to improve future performance is a long standing machine learning problem. In this paper, we study the problem in the multi-armed bandit framework with the objective to minimize the total regret incurred over a series of tasks. While most bandit algorithms are designed to have a low worst-case regret, we exami…

    Submitted 28 December, 2020; originally announced December 2020.

  18. arXiv:2007.09996 [pdf, ps, other]

    math.OC cs.LG stat.OT

    Social Learning in Non-Stationary Environments

    Authors: Etienne Boursier, Vianney Perchet, Marco Scarsini

    Abstract: Potential buyers of a product or service, before making their decisions, tend to read reviews written by previous consumers. We consider Bayesian consumers with heterogeneous preferences, who sequentially decide whether to buy an item of unknown quality, based on previous buyers' reviews. The quality is multi-dimensional and may occasionally vary over time; the reviews are also multi-dimensional.…

    Submitted 23 February, 2022; v1 submitted 20 July, 2020; originally announced July 2020.

  19. arXiv:2006.06613 [pdf, ps, other]

    stat.ML cs.LG

    Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits

    Authors: Pierre Perrault, Etienne Boursier, Vianney Perchet, Michal Valko

    Abstract: We investigate stochastic combinatorial multi-armed bandit with semi-bandit feedback (CMAB). In CMAB, the question of the existence of an efficient policy with an optimal asymptotic regret (up to a factor poly-logarithmic with the action size) is still open for many families of distributions, including mutually independent outcomes, and more generally the multivariate sub-Gaussian family. We propo…

    Submitted 3 January, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: accepted to NeurIPS 2020

  20. arXiv:2005.01656 [pdf, ps, other]

    cs.LG stat.ML

    Categorized Bandits

    Authors: Matthieu Jedor, Jonathan Louëdec, Vianney Perchet

    Abstract: We introduce a new stochastic multi-armed bandit setting where arms are grouped inside ``ordered'' categories. The motivating example comes from e-commerce, where a customer typically has a greater appetence for items of a specific well-identified but unknown category than any other one. We introduce three concepts of ordering between categories, inspired by stochastic dominance between random var…

    Submitted 4 May, 2020; originally announced May 2020.

  21. arXiv:2002.01197 [pdf, ps, other]

    cs.LG stat.ML

    Selfish Robustness and Equilibria in Multi-Player Bandits

    Authors: Etienne Boursier, Vianney Perchet

    Abstract: Motivated by cognitive radios, stochastic multi-player multi-armed bandits gained a lot of interest recently. In this class of problems, several players simultaneously pull arms and encounter a collision - with 0 reward - if some of them pull the same arm at the same time. While the cooperative case where players maximize the collective reward (obediently following some fixed protocol) has been mo…

    Submitted 19 June, 2020; v1 submitted 4 February, 2020; originally announced February 2020.

  22. Markov Decision Process for MOOC users behavioral inference

    Authors: Firas Jarboui, Célya Gruson-daniel, Pierre Chanial, Alain Durmus, Vincent Rocchisani, Sophie-helene Goulet Ebongue, Anneliese Depoux, Wilfried Kirschenmann, Vianney Perchet

    Abstract: Studies on massive open online courses (MOOCs) users discuss the existence of typical profiles and their impact on the learning process of the students. However, defining the typical behaviors as well as classifying the users accordingly is a difficult task. In this paper we suggest two methods to model MOOC users behaviour given their log data. We mold their behavior into a Markov Decision Process…

    Submitted 10 March, 2021; v1 submitted 10 July, 2019; originally announced July 2019.

  23. arXiv:1906.08509 [pdf, other]

    stat.ML cs.LG math.OC

    Online A-Optimal Design and Active Linear Regression

    Authors: Xavier Fontaine, Pierre Perrault, Michal Valko, Vianney Perchet

    Abstract: We consider in this paper the problem of optimal experiment design where a decision maker can choose which points to sample to obtain an estimate $\hatβ$ of the hidden parameter $β^{\star}$ of an underlying linear model. The key challenge of this work lies in the heteroscedasticity assumption that we make, meaning that each covariate has a different and unknown variance. The goal of the decision m…

    Submitted 30 December, 2020; v1 submitted 20 June, 2019; originally announced June 2019.

    Comments: 29 pages, 5 figures

  24. arXiv:1905.11797 [pdf, ps, other]

    cs.LG stat.ML

    ROI Maximization in Stochastic Online Decision-Making

    Authors: Nicolò Cesa-Bianchi, Tommaso Cesari, Yishay Mansour, Vianney Perchet

    Abstract: We introduce a novel theoretical framework for Return On Investment (ROI) maximization in repeated decision-making. Our setting is motivated by the use case of companies that regularly receive proposals for technological innovations and want to quickly decide whether they are worth implementing. We design an algorithm for learning ROI-maximizing decision-making policies over a sequence of innovati…

    Submitted 22 December, 2021; v1 submitted 28 May, 2019; originally announced May 2019.

  25. arXiv:1905.11148 [pdf, other]

    stat.ML cs.LG stat.AP

    Utility/Privacy Trade-off through the lens of Optimal Transport

    Authors: Etienne Boursier, Vianney Perchet

    Abstract: Strategic information is valuable either by remaining private (for instance if it is sensitive) or, on the other hand, by being used publicly to increase some utility. These two objectives are antagonistic and leaking this information might be more rewarding than concealing it. Unlike classical solutions that focus on the first point, we consider instead agents that optimize a natural trade-off be…

    Submitted 2 March, 2020; v1 submitted 27 May, 2019; originally announced May 2019.

    Comments: AISTATS 2020

  26. arXiv:1902.04376 [pdf, ps, other]

    stat.ML cs.LG math.OC

    An adaptive stochastic optimization algorithm for resource allocation

    Authors: Xavier Fontaine, Shie Mannor, Vianney Perchet

    Abstract: We consider the classical problem of sequential resource allocation where a decision maker must repeatedly divide a budget between several resources, each with diminishing returns. This can be recast as a specific stochastic optimization problem where the objective is to maximize the cumulative reward, or equivalently to minimize the regret. We construct an algorithm that is {\em adaptive} to the…

    Submitted 16 January, 2020; v1 submitted 12 February, 2019; originally announced February 2019.

    Comments: ALT2020, 45 pages, 9 figures

    Journal ref: Proceedings of Machine Learning Research (PMLR), volume 117, 2020

  27. arXiv:1902.03794 [pdf, other]

    stat.ML cs.LG

    Exploiting Structure of Uncertainty for Efficient Matroid Semi-Bandits

    Authors: Pierre Perrault, Vianney Perchet, Michal Valko

    Abstract: We improve the efficiency of algorithms for stochastic \emph{combinatorial semi-bandits}. In most interesting problems, state-of-the-art algorithms take advantage of structural properties of rewards, such as \emph{independence}. However, while being optimal in terms of asymptotic regret, these algorithms are inefficient. In our paper, we first reduce their implementation to a specific \emph{submod…

    Submitted 20 June, 2019; v1 submitted 11 February, 2019; originally announced February 2019.

    Comments: Accepted to ICML 2019, Long Beach

  28. arXiv:1902.01239 [pdf, other]

    stat.ML cs.LG

    A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players

    Authors: Etienne Boursier, Emilie Kaufmann, Abbas Mehrabian, Vianney Perchet

    Abstract: We study a multiplayer stochastic multi-armed bandit problem in which players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward. We consider the challenging heterogeneous setting, in which different arms may have different means for different players, and propose a new and efficient algorithm that combines the idea of…

    Submitted 20 March, 2020; v1 submitted 4 February, 2019; originally announced February 2019.

    Comments: AISTATS2020

  29. arXiv:1810.05065 [pdf, ps, other]

    stat.ML cs.LG math.OC

    Regularized Contextual Bandits

    Authors: Xavier Fontaine, Quentin Berthet, Vianney Perchet

    Abstract: We consider the stochastic contextual bandit problem with additional regularization. The motivation comes from problems where the policy of the agent must be close to some baseline policy which is known to perform well on the task. To tackle this problem we use a nonparametric model and propose an algorithm splitting the context space into bins, and solving simultaneously - and independently - reg…

    Submitted 5 June, 2019; v1 submitted 11 October, 2018; originally announced October 2018.

    Comments: AISTATS 2019, 23 pages, 2 figures

    Journal ref: Proceedings of Machine Learning Research, PMLR 89:2144-2153, 2019

  30. arXiv:1810.04088 [pdf, other]

    cs.LG stat.ML

    Bridging the gap between regret minimization and best arm identification, with application to A/B tests

    Authors: Rémy Degenne, Thomas Nedelec, Clément Calauzènes, Vianney Perchet

    Abstract: State of the art online learning procedures focus either on selecting the best alternative ("best arm identification") or on minimizing the cost (the "regret"). We merge these two objectives by providing the theoretical analysis of cost minimizing algorithms that are also delta-PAC (with a proven guaranteed bound on the decision time), hence fulfilling at the same time regret minimization and best…

    Submitted 26 February, 2019; v1 submitted 9 October, 2018; originally announced October 2018.

    Journal ref: AISTATS 2019 proceedings

  31. arXiv:1809.08151 [pdf, other]

    cs.LG stat.ML

    SIC-MMAB: Synchronisation Involves Communication in Multiplayer Multi-Armed Bandits

    Authors: Etienne Boursier, Vianney Perchet

    Abstract: Motivated by cognitive radio networks, we consider the stochastic multiplayer multi-armed bandit problem, where several players pull arms simultaneously and collisions occur if one of them is pulled by several players at the same stage. We present a decentralized algorithm that achieves the same performance as a centralized one, contradicting the existing lower bounds for that problem. This is pos…

    Submitted 19 November, 2019; v1 submitted 21 September, 2018; originally announced September 2018.

    Journal ref: NeurIPS 2019

  32. arXiv:1807.03558 [pdf, other]

    cs.LG stat.ML

    Bandits with Side Observations: Bounded vs. Logarithmic Regret

    Authors: Rémy Degenne, Evrard Garcelon, Vianney Perchet

    Abstract: We consider the classical stochastic multi-armed bandit but where, from time to time and roughly with frequency $ε$, an extra observation is gathered by the agent for free. We prove that, no matter how small $ε$ is, the agent can ensure a regret uniformly bounded in time. More precisely, we construct an algorithm with a regret smaller than $\sum_i \frac{\log(1/ε)}{Δ_i}$, up to multiplicative cons…

    Submitted 10 July, 2018; originally announced July 2018.

    Comments: Conference on Uncertainty in Artificial Intelligence (UAI) 2018, 21 pages

  33. arXiv:1807.03288 [pdf, ps, other]

    cs.LG stat.ML

    Dynamic Pricing with Finitely Many Unknown Valuations

    Authors: Nicolò Cesa-Bianchi, Tommaso Cesari, Vianney Perchet

    Abstract: Motivated by posted price auctions where buyers are grouped in an unknown number of latent types characterized by their private values for the good on sale, we investigate revenue maximization in stochastic dynamic pricing when the distribution of buyers' private values is supported on an unknown set of points in [0,1] of unknown cardinality $K$. This setting can be viewed as an instance of a stoc…

    Submitted 5 March, 2019; v1 submitted 9 July, 2018; originally announced July 2018.

  34. arXiv:1806.02282 [pdf, ps, other]

    stat.ML cs.LG math.OC

    Finding the bandit in a graph: Sequential search-and-stop

    Authors: Pierre Perrault, Vianney Perchet, Michal Valko

    Abstract: We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution. The agent can only examine vertices whose in-neighbors have already been examined. In this paper, we address a learning setting where we allow the agent to stop before having found the object and resta…

    Submitted 22 April, 2019; v1 submitted 6 June, 2018; originally announced June 2018.

    Comments: in International Conference on Artificial Intelligence and Statistics (AISTATS 2019), April 2019, Naha, Okinawa, Japan

  35. arXiv:1704.00773 [pdf, other]

    stat.ML cs.LG

    A comparative study of counterfactual estimators

    Authors: Thomas Nedelec, Nicolas Le Roux, Vianney Perchet

    Abstract: We provide a comparative study of several widely used off-policy estimators (Empirical Average, Basic Importance Sampling and Normalized Importance Sampling), detailing the different regimes where they are individually suboptimal. We then exhibit properties optimal estimators should possess. In the case where examples have been gathered using multiple policies, we show that fused estimators domina…

    Submitted 29 January, 2019; v1 submitted 3 April, 2017; originally announced April 2017.

  36. arXiv:1702.06917 [pdf, ps, other]

    cs.LG math.OC stat.ML

    Fast Rates for Bandit Optimization with Upper-Confidence Frank-Wolfe

    Authors: Quentin Berthet, Vianney Perchet

    Abstract: We consider the problem of bandit optimization, inspired by stochastic optimization and online learning problems with bandit feedback. In this problem, the objective is to minimize a global loss function of all the actions, not necessarily a cumulative loss. This framework allows us to study a very general class of problems, with applications in statistics, machine learning, and other fields. To s…

    Submitted 6 September, 2017; v1 submitted 22 February, 2017; originally announced February 2017.

  37. arXiv:1609.08870 [pdf, ps, other]

    cs.GT stat.ML

    Approachability of convex sets in generalized quitting games

    Authors: János Flesch, Rida Laraki, Vianney Perchet

    Abstract: We consider Blackwell approachability, a very powerful and geometric tool in game theory, used for example to design strategies of the uninformed player in repeated games with incomplete information. We extend this theory to "generalized quitting games", a class of repeated stochastic games in which each player may have quitting actions, such as the Big-Match. We provide three simple geometric an…

    Submitted 28 September, 2016; originally announced September 2016.

  38. arXiv:1511.08405 [pdf, ps, other]

    cs.LG stat.ML

    Gains and Losses are Fundamentally Different in Regret Minimization: The Sparse Case

    Authors: Joon Kwon, Vianney Perchet

    Abstract: We demonstrate that, in the classical non-stochastic regret minimization problem with $d$ decisions, gains and losses to be respectively maximized or minimized are fundamentally different. Indeed, by considering the additional sparsity assumption (at each stage, at most $s$ decisions incur a nonzero outcome), we derive optimal regret bounds of different orders. Specifically, with gains, we obtain…

    Submitted 26 November, 2015; originally announced November 2015.

  39. arXiv:1511.05720 [pdf, other]

    cs.GT cs.LG stat.ML

    Online learning in repeated auctions

    Authors: Jonathan Weed, Vianney Perchet, Philippe Rigollet

    Abstract: Motivated by online advertising auctions, we consider repeated Vickrey auctions where goods of unknown value are sold sequentially and bidders only learn (potentially noisy) information about a good's value once it is purchased. We adopt an online learning approach with bandit feedback to model this problem and derive bidding strategies for two models: stochastic and adversarial. In the stochastic…

    Submitted 18 November, 2015; originally announced November 2015.

    MSC Class: Primary 62L05; secondary 62C20

  40. arXiv:1402.2043 [pdf, other]

    stat.ML cs.LG math.ST

    Approachability in unknown games: Online learning meets multi-objective optimization

    Authors: Shie Mannor, Vianney Perchet, Gilles Stoltz

    Abstract: In the standard setting of approachability there are two players and a target set. The players play repeatedly a known vector-valued game where the first player wants to have the average vector-valued payoff converge to the target set, while the other player tries to exclude it from this set. We revisit this setting in the spirit of online learning and do not assume that the first player knows the…

    Submitted 17 June, 2016; v1 submitted 10 February, 2014; originally announced February 2014.

  41. arXiv:1311.4825 [pdf, other]

    stat.ML cs.LG

    Gaussian Process Optimization with Mutual Information

    Authors: Emile Contal, Vianney Perchet, Nicolas Vayatis

    Abstract: In this paper, we analyze a generic algorithm scheme for sequential global optimization using Gaussian processes. The upper bounds we derive on the cumulative regret for this generic algorithm improve by an exponential factor the previously known bounds for algorithms like GP-UCB. We also introduce the novel Gaussian Process Mutual Information algorithm (GP-MI), which significantly improves furthe…

    Submitted 8 June, 2015; v1 submitted 19 November, 2013; originally announced November 2013.

    Comments: Proceedings of The 31st International Conference on Machine Learning (ICML 2014)

  42. arXiv:1305.5399 [pdf, other]

    math.OC cs.GT cs.LG stat.ML

    A Primal Condition for Approachability with Partial Monitoring

    Authors: Shie Mannor, Vianney Perchet, Gilles Stoltz

    Abstract: In approachability with full monitoring there are two types of conditions that are known to be equivalent for convex sets: a primal and a dual condition. The primal one is of the form: a set C is approachable if and only if all containing half-spaces are approachable in the one-shot game; while the dual one is of the form: a convex set C is approachable if and only if it intersects all payoff sets of…

    Submitted 23 May, 2013; originally announced May 2013.

  43. arXiv:1302.1611 [pdf, ps, other]

    math.ST cs.LG stat.ML

    Bounded regret in stochastic multi-armed bandits

    Authors: Sébastien Bubeck, Vianney Perchet, Philippe Rigollet

    Abstract: We study the stochastic multi-armed bandit problem when one knows the value $μ^{(\star)}$ of an optimal arm, as well as a positive lower bound on the smallest positive gap $Δ$. We propose a new randomized policy that attains a regret {\em uniformly bounded over time} in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only know…

    Submitted 12 February, 2013; v1 submitted 6 February, 2013; originally announced February 2013.

    MSC Class: 62L05

  44. arXiv:1110.6084 [pdf, ps, other]

    math.ST cs.LG stat.ML

    The multi-armed bandit problem with covariates

    Authors: Vianney Perchet, Philippe Rigollet

    Abstract: We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards that better describe applications where side information is available. We adopt a nonparametric model where the expected rewards…

    Submitted 24 May, 2013; v1 submitted 27 October, 2011; originally announced October 2011.

    Comments: Published at http://dx.doi.org/10.1214/13-AOS1101 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS1101

    Journal ref: Annals of Statistics 2013, Vol. 41, No. 2, 693-721

  45. arXiv:1006.1746 [pdf, ps, other]

    cs.GT cs.LG stat.ML

    Calibration and Internal no-Regret with Partial Monitoring

    Authors: Vianney Perchet

    Abstract: Calibrated strategies can be obtained by performing strategies that have no internal regret in some auxiliary game. Such strategies can be constructed explicitly with the use of Blackwell's approachability theorem, in another auxiliary game. We establish the converse: a strategy that approaches a convex $B$-set can be derived from the construction of a calibrated strategy. We develop these tools…

    Submitted 9 June, 2010; originally announced June 2010.