Showing 1–26 of 26 results for author: Abbasi-Yadkori, Y

  1. arXiv:2202.13001  [pdf, other]

    cs.LG stat.ML

    Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms

    Authors: MohammadJavad Azizi, Thang Duong, Yasin Abbasi-Yadkori, András György, Claire Vernade, Mohammad Ghavamzadeh

    Abstract: We study a sequential decision problem where the learner faces a sequence of $K$-armed bandit tasks. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting). For a given integer $M\le K$, the learner aims to compete with the best subset of arms of size $M$. We design an algorithm based on a reduction to bandit submodular maximizati…

    Submitted 18 October, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

  2. arXiv:2201.06532  [pdf, ps, other]

    cs.LG stat.ML

    A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits

    Authors: Yasin Abbasi-Yadkori, Andras Gyorgy, Nevena Lazic

    Abstract: We study the non-stationary stochastic multi-armed bandit problem, where the reward statistics of each arm may change several times during the course of learning. The performance of a learning algorithm is evaluated in terms of their dynamic regret, which is defined as the difference between the expected cumulative reward of an agent choosing the optimal arm in every time step and the cumulative r…

    Submitted 8 March, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

  3. arXiv:2108.05533  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Efficient Local Planning with Linear Function Approximation

    Authors: Dong Yin, Botao Hao, Yasin Abbasi-Yadkori, Nevena Lazić, Csaba Szepesvári

    Abstract: We study query and computationally efficient planning algorithms with linear function approximation and a simulator. We assume that the agent only has local access to the simulator, meaning that the agent can only query the simulator at states that have been visited before. This setting is more practical than many prior works on reinforcement learning with a generative model. We propose two algori…

    Submitted 4 February, 2022; v1 submitted 12 August, 2021; originally announced August 2021.

    Comments: Algorithmic Learning Theory 2022

  4. arXiv:2102.12611  [pdf, other]

    cs.LG stat.ML

    Improved Regret Bound and Experience Replay in Regularized Policy Iteration

    Authors: Nevena Lazic, Dong Yin, Yasin Abbasi-Yadkori, Csaba Szepesvari

    Abstract: In this work, we study algorithms for learning in infinite-horizon undiscounted Markov decision processes (MDPs) with function approximation. We first show that the regret analysis of the Politex algorithm (a version of regularized policy iteration) can be sharpened from $O(T^{3/4})$ to $O(\sqrt{T})$ under nearly identical assumptions, and instantiate the bound with linear function approximation.…

    Submitted 24 February, 2021; originally announced February 2021.

  5. arXiv:2102.06234  [pdf, other]

    cs.LG stat.ML

    Optimization Issues in KL-Constrained Approximate Policy Iteration

    Authors: Nevena Lazić, Botao Hao, Yasin Abbasi-Yadkori, Dale Schuurmans, Csaba Szepesvári

    Abstract: Many reinforcement learning algorithms can be seen as versions of approximate policy iteration (API). While standard API often performs poorly, it has been shown that learning can be stabilized by regularizing each policy update by the KL-divergence to the previous policy. Popular practical algorithms such as TRPO, MPO, and VMPO replace regularization by a constraint on KL-divergence of consecutiv…

    Submitted 11 February, 2021; originally announced February 2021.

  6. arXiv:2102.02049  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function

    Authors: Gellért Weisz, Philip Amortila, Barnabás Janzer, Yasin Abbasi-Yadkori, Nan Jiang, Csaba Szepesvári

    Abstract: We consider local planning in fixed-horizon MDPs with a generative model under the assumption that the optimal value function lies close to the span of a feature map. The generative model provides a local access to the MDP: The planner can ask for random transitions from previously returned states and arbitrary actions, and features are only accessible for states that are encountered in this proce…

    Submitted 9 July, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

  7. arXiv:2010.10182  [pdf, ps, other]

    stat.ML cs.LG

    The Elliptical Potential Lemma Revisited

    Authors: Alexandra Carpentier, Claire Vernade, Yasin Abbasi-Yadkori

    Abstract: This note proposes a new proof and new perspectives on the so-called Elliptical Potential Lemma. This result is important in online learning, especially for linear stochastic bandits. The original proof of the result, however short and elegant, does not give much flexibility on the type of potentials considered and we believe that this new interpretation can be of interest for future research in t…

    Submitted 20 October, 2020; originally announced October 2020.

    Comments: 8 pages

  8. arXiv:2006.05491  [pdf, other]

    cs.LG stat.ML

    Regret Balancing for Bandit and RL Model Selection

    Authors: Yasin Abbasi-Yadkori, Aldo Pacchiano, My Phan

    Abstract: We consider model selection in stochastic bandit and reinforcement learning problems. Given a set of base learning algorithms, an effective model selection strategy adapts to the best learning algorithm in an online fashion. We show that by estimating the regret of each algorithm and playing the algorithms such that all empirical regrets are ensured to be of the same order, the overall regret bala…

    Submitted 9 June, 2020; originally announced June 2020.

    Comments: Submitted to the Thirty-Fourth Annual Conference on Neural Information Processing Systems (NeurIPS 2020)

  9. arXiv:2006.02672  [pdf, other]

    cs.LG stat.ML

    Sample Efficient Graph-Based Optimization with Noisy Observations

    Authors: Tan Nguyen, Ali Shameli, Yasin Abbasi-Yadkori, Anup Rao, Branislav Kveton

    Abstract: We study sample complexity of optimizing "hill-climbing friendly" functions defined on a graph under noisy observations. We define a notion of convexity, and we show that a variant of best-arm identification can find a near-optimal solution after a small number of queries that is independent of the size of the graph. For functions that have local minima and are nearly convex, we show a sample comp…

    Submitted 4 June, 2020; originally announced June 2020.

    Comments: The first version of this paper appeared in AISTATS 2019. Thanks to community feedback, some typos and a minor issue have been identified. Specifically, on page 4, column 2, line 18, the statement $Δ_{1,s} \ge (1+m)^{S-1-s} Δ_1$ is not valid, and in the proof of Theorem 2, "By Lemma 1" should be "By Definition 2". These problems are fixed in this updated version published here on arXiv.

    Journal ref: AISTATS 2019

  10. arXiv:2003.01704  [pdf, other]

    cs.LG stat.ML

    Model Selection in Contextual Stochastic Bandit Problems

    Authors: Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari

    Abstract: We study bandit model selection in stochastic environments. Our approach relies on a meta-algorithm that selects between candidate base algorithms. We develop a meta-algorithm-base algorithm abstraction that can work with general classes of base algorithms and different type of adversarial meta-algorithms. Our methods rely on a novel and generic smoothing transformation for bandit algorithms that…

    Submitted 4 December, 2022; v1 submitted 3 March, 2020; originally announced March 2020.

    Comments: 33 main pages, 15 appendix pages

  11. arXiv:2002.03069  [pdf, other]

    cs.LG stat.ML

    Adaptive Approximate Policy Iteration

    Authors: Botao Hao, Nevena Lazic, Yasin Abbasi-Yadkori, Pooria Joulani, Csaba Szepesvari

    Abstract: Model-free reinforcement learning algorithms combined with value function approximation have recently achieved impressive performance in a variety of application domains. However, the theoretical understanding of such algorithms is limited, and existing results are largely focused on episodic or discounted Markov decision processes (MDPs). In this work, we present adaptive approximate policy itera…

    Submitted 11 February, 2021; v1 submitted 7 February, 2020; originally announced February 2020.

    Comments: Accepted at AISTATS 2021

  12. arXiv:1908.10479  [pdf, other]

    cs.LG stat.ML

    Exploration-Enhanced POLITEX

    Authors: Yasin Abbasi-Yadkori, Nevena Lazic, Csaba Szepesvari, Gellert Weisz

    Abstract: We study algorithms for average-cost reinforcement learning problems with value function approximation. Our starting point is the recently proposed POLITEX algorithm, a version of policy iteration where the policy produced in each iteration is near-optimal in hindsight for the sum of all past value function estimates. POLITEX has sublinear regret guarantees in uniformly-mixing MDPs when the value…

    Submitted 27 August, 2019; originally announced August 2019.

  13. arXiv:1908.04970  [pdf, other]

    cs.LG stat.ML

    Thompson Sampling with Approximate Inference

    Authors: My Phan, Yasin Abbasi-Yadkori, Justin Domke

    Abstract: We study the effects of approximate inference on the performance of Thompson sampling in the $k$-armed bandit problems. Thompson sampling is a successful algorithm for online decision-making but requires posterior inference, which often must be approximated in practice. We show that even small constant inference error (in $α$-divergence) can lead to poor performance (linear regret) due to under-ex…

    Submitted 14 January, 2020; v1 submitted 14 August, 2019; originally announced August 2019.

  14. arXiv:1906.05247  [pdf, other]

    stat.ML cs.LG

    Bootstrapping Upper Confidence Bound

    Authors: Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng

    Abstract: Upper Confidence Bound (UCB) method is arguably the most celebrated one used in online decision making with partial information feedback. Existing techniques for constructing confidence bounds are typically built upon various concentration inequalities, which thus lead to over-exploration. In this paper, we propose a non-parametric and data-dependent UCB algorithm based on the multiplier bootstrap…

    Submitted 30 October, 2019; v1 submitted 12 June, 2019; originally announced June 2019.

    Comments: Accepted by NeurIPS 2019

  15. arXiv:1805.09793  [pdf, other]

    cs.LG stat.ML

    New Insights into Bootstrapping for Bandits

    Authors: Sharan Vaswani, Branislav Kveton, Zheng Wen, Anup Rao, Mark Schmidt, Yasin Abbasi-Yadkori

    Abstract: We investigate the use of bootstrapping in the bandit setting. We first show that the commonly used non-parametric bootstrapping (NPB) procedure can be provably inefficient and establish a near-linear lower bound on the regret incurred by it under the bandit model with Bernoulli rewards. We show that NPB with an appropriate amount of forced exploration can result in sub-linear albeit sub-optimal r…

    Submitted 24 May, 2018; originally announced May 2018.

  16. arXiv:1805.01648  [pdf, other]

    stat.ML cs.LG math.PR stat.CO

    Sharp convergence rates for Langevin dynamics in the nonconvex setting

    Authors: Xiang Cheng, Niladri S. Chatterji, Yasin Abbasi-Yadkori, Peter L. Bartlett, Michael I. Jordan

    Abstract: We study the problem of sampling from a distribution $p^*(x) \propto \exp\left(-U(x)\right)$, where the function $U$ is $L$-smooth everywhere and $m$-strongly convex outside a ball of radius $R$, but potentially nonconvex inside this ball. We study both overdamped and underdamped Langevin MCMC and establish upper bounds on the number of steps required to obtain a sample from a distribution that is…

    Submitted 6 July, 2020; v1 submitted 4 May, 2018; originally announced May 2018.

    Comments: 78 pages, 2 figures

  17. arXiv:1804.10488  [pdf, other]

    cs.LG stat.ML

    Offline Evaluation of Ranking Policies with Click Models

    Authors: Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa Vinay, Zheng Wen

    Abstract: Many web systems rank and present a list of items to users, from recommender systems to search and advertising. An important problem in practice is to evaluate new ranking policies offline and optimize them before they are deployed. We address this problem by proposing evaluation algorithms for estimating the expected number of clicks on ranked lists from historical logged data. The existing algor…

    Submitted 13 June, 2018; v1 submitted 27 April, 2018; originally announced April 2018.

  18. arXiv:1804.06021  [pdf, other]

    cs.LG math.OC stat.ML

    Model-Free Linear Quadratic Control via Reduction to Expert Prediction

    Authors: Yasin Abbasi-Yadkori, Nevena Lazic, Csaba Szepesvari

    Abstract: Model-free approaches for reinforcement learning (RL) and continuous control find policies based only on past states and rewards, without fitting a model of the system dynamics. They are appealing as they are general purpose and easy to implement; however, they also come with fewer theoretical guarantees than model-based RL. In this work, we present a new model-free algorithm for controlling linea…

    Submitted 5 October, 2018; v1 submitted 16 April, 2018; originally announced April 2018.

  19. arXiv:1802.09646  [pdf, other]

    cs.LG stat.ML

    Optimizing over a Restricted Policy Class in Markov Decision Processes

    Authors: Ershad Banijamali, Yasin Abbasi-Yadkori, Mohammad Ghavamzadeh, Nikos Vlassis

    Abstract: We address the problem of finding an optimal policy in a Markov decision process under a restricted policy class defined by the convex hull of a set of base policies. This problem is of great interest in applications in which a number of reasonably good (or safe) policies are already known and we are only interested in optimizing in their convex hull. We show that this problem is NP-hard to solve…

    Submitted 26 February, 2018; originally announced February 2018.

    Comments: 14 pages

  20. arXiv:1712.04644  [pdf, other]

    cs.LG stat.ML

    Stochastic Low-Rank Bandits

    Authors: Branislav Kveton, Csaba Szepesvari, Anup Rao, Zheng Wen, Yasin Abbasi-Yadkori, S. Muthukrishnan

    Abstract: Many problems in computer vision and recommender systems involve low-rank matrices. In this work, we study the problem of finding the maximum entry of a stochastic low-rank matrix from sequential observations. At each step, a learning agent chooses pairs of row and column arms, and receives the noisy product of their latent values as a reward. The main challenge is that the latent values are unobs…

    Submitted 13 December, 2017; originally announced December 2017.

  21. arXiv:1611.09252  [pdf, ps, other]

    stat.CO math.PR

    Fast Mixing Random Walks and Regularity of Incompressible Vector Fields

    Authors: Yasin Abbasi-Yadkori

    Abstract: We show sufficient conditions under which the \textsc{BallWalk} algorithm mixes fast in a bounded connected subset of $\mathbb{R}^n$. In particular, we show fast mixing if the space is the transformation of a convex space under a smooth incompressible flow. Construction of such smooth flows is in turn reduced to the study of the regularity of the solution of the Dirichlet problem for Laplace's equation…

    Submitted 23 November, 2016; originally announced November 2016.

  22. arXiv:1611.06426  [pdf, other]

    stat.ML cs.LG

    Conservative Contextual Linear Bandits

    Authors: Abbas Kazerouni, Mohammad Ghavamzadeh, Yasin Abbasi-Yadkori, Benjamin Van Roy

    Abstract: Safety is a desirable property that can immensely increase the applicability of learning algorithms in real-world decision-making problems. It is much easier for a company to deploy an algorithm that is safe, i.e., guaranteed to perform at least as well as a baseline. In this paper, we study the issue of safety in contextual linear bandits that have application in many different fields including p…

    Submitted 3 March, 2017; v1 submitted 19 November, 2016; originally announced November 2016.

  23. arXiv:1610.08865  [pdf, other]

    stat.CO cs.AI math.CO math.PR

    Hit-and-Run for Sampling and Planning in Non-Convex Spaces

    Authors: Yasin Abbasi-Yadkori, Peter L. Bartlett, Victor Gabillon, Alan Malek

    Abstract: We propose the Hit-and-Run algorithm for planning and sampling problems in non-convex spaces. For sampling, we show the first analysis of the Hit-and-Run algorithm in non-convex spaces and show that it mixes fast as long as certain smoothness conditions are satisfied. In particular, our analysis reveals an intriguing connection between fast mixing and the existence of smooth measure-preserving map…

    Submitted 19 October, 2016; originally announced October 2016.

  24. arXiv:1406.6812  [pdf, other]

    cs.LG stat.ML

    Online learning in MDPs with side information

    Authors: Yasin Abbasi-Yadkori, Gergely Neu

    Abstract: We study online learning of finite Markov decision process (MDP) problems when a side information vector is available. The problem is motivated by applications such as clinical trials, recommendation systems, etc. Such applications have an episodic structure, where each episode corresponds to a patient/customer. Our objective is to compete with the optimal dynamic policy that can take side informa…

    Submitted 26 June, 2014; originally announced June 2014.

  25. arXiv:1406.3926  [pdf, other]

    cs.LG stat.ML

    Bayesian Optimal Control of Smoothly Parameterized Systems: The Lazy Posterior Sampling Algorithm

    Authors: Yasin Abbasi-Yadkori, Csaba Szepesvari

    Abstract: We study Bayesian optimal control of a general class of smoothly parameterized Markov decision problems. Since computing the optimal control is computationally expensive, we design an algorithm that trades off performance for computational efficiency. The algorithm is a lazy posterior sampling method that maintains a distribution over the unknown parameter. The algorithm changes its policy only wh…

    Submitted 16 June, 2014; originally announced June 2014.

  26. arXiv:1303.3055  [pdf, ps, other]

    cs.LG stat.ML

    Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions

    Authors: Yasin Abbasi-Yadkori, Peter L. Bartlett, Csaba Szepesvari

    Abstract: We study the problem of learning Markov decision processes with finite state and action spaces when the transition probability distributions and loss functions are chosen adversarially and are allowed to change with time. We introduce an algorithm whose regret with respect to any policy in a comparison class grows as the square root of the number of rounds of the game, provided the transition prob…

    Submitted 12 March, 2013; originally announced March 2013.