Showing 1–3 of 3 results for author: Gouverneur, A

  1. arXiv:2403.03361  [pdf, ps, other]

    stat.ML cs.LG

    Chained Information-Theoretic bounds and Tight Regret Rate for Linear Bandit Problems

    Authors: Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, Mikael Skoglund

    Abstract: This paper studies the Bayesian regret of a variant of the Thompson-Sampling algorithm for bandit problems. It builds upon the information-theoretic framework of [Russo and Van Roy, 2015] and, more specifically, on the rate-distortion analysis from [Dong and Van Roy, 2020], where they proved a bound with regret rate of $O(d\sqrt{T \log(T)})$ for the $d$-dimensional linear bandit setting. We focus…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 15 pages: 8 of main text and 7 of appendices

  2. arXiv:2304.13593  [pdf, ps, other]

    stat.ML cs.LG

    Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards

    Authors: Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, Mikael Skoglund

    Abstract: In this work, we study the performance of the Thompson Sampling algorithm for Contextual Bandit problems based on the framework introduced by Neu et al. and their concept of lifted information ratio. First, we prove a comprehensive bound on the Thompson Sampling expected cumulative regret that depends on the mutual information of the environment parameters and the history. Then, we introduce new b…

    Submitted 26 April, 2023; originally announced April 2023.

    Comments: 8 pages: 5 of the main text, 1 of references, and 2 of appendices. Accepted to ISIT 2023

  3. arXiv:2207.08735  [pdf, ps, other]

    cs.LG stat.ML

    An Information-Theoretic Analysis of Bayesian Reinforcement Learning

    Authors: Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, Mikael Skoglund

    Abstract: Building on the framework introduced by Xu and Raginsky [1] for supervised learning problems, we study the best achievable performance for model-based Bayesian reinforcement learning problems. With this purpose, we define minimum Bayesian regret (MBR) as the difference between the maximum expected cumulative reward obtainable either by learning from the collected data or by knowing the environment…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: 10 pages: 6 of the main text, 1 of references, and 3 of appendices