Skip to main content

Showing 1–9 of 9 results for author: Perrault, P

Searching in archive cs. Search in all archives.
.
  1. arXiv:2310.18186  [pdf, other

    stat.ML cs.LG

    Model-free Posterior Sampling via Learning Rate Randomization

    Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard

    Abstract: In this paper, we introduce Randomized Q-learning (RandQL), a novel randomized model-free algorithm for regret minimization in episodic Markov Decision Processes (MDPs). To the best of our knowledge, RandQL is the first tractable model-free posterior sampling-based algorithm. We analyze the performance of RandQL in both tabular and non-tabular metric space settings. In tabular MDPs, RandQL achieve… ▽ More

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: NeurIPS-2023

  2. arXiv:2310.17303  [pdf, ps, other

    stat.ML cs.LG

    Demonstration-Regularized RL

    Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard

    Abstract: Incorporating expert demonstrations has empirically helped to improve the sample efficiency of reinforcement learning (RL). This paper quantifies theoretically to what extent this extra information reduces RL's sample complexity. In particular, we study the demonstration-regularized reinforcement learning that leverages the expert demonstrations by KL-regularization for a policy learned by behavio… ▽ More

    Submitted 10 June, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: This revision fixes an error due to use of some incorrect results (Lemma 32, Corollary 11 by Talebi & Maillard, 2018) in the proof of Theorem 8. The condition for the RLHF results have slightly changed

  3. arXiv:2303.08059  [pdf, other

    stat.ML cs.LG

    Fast Rates for Maximum Entropy Exploration

    Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Menard

    Abstract: We address the challenge of exploration in reinforcement learning (RL) when the agent operates in an unknown environment with sparse or no rewards. In this work, we study the maximum entropy exploration problem of two different types. The first type is visitation entropy maximization previously considered by Hazan et al.(2019) in the discounted setting. For this type of exploration, we propose a g… ▽ More

    Submitted 6 June, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: ICML-2023

  4. arXiv:2302.11182  [pdf, ps, other

    stat.ML cs.LG

    When Combinatorial Thompson Sampling meets Approximation Regret

    Authors: Pierre Perrault

    Abstract: We study the Combinatorial Thompson Sampling policy (CTS) for combinatorial multi-armed bandit problems (CMAB), within an approximation regret setting. Although CTS has attracted a lot of interest, it has a drawback that other usual CMAB policies do not have when considering non-exact oracles: for some oracles, CTS has a poor approximation regret (scaling linearly with the time horizon $T$) [Wang… ▽ More

    Submitted 22 February, 2023; originally announced February 2023.

    Comments: Neurips 2022

  5. arXiv:2101.01631  [pdf, other

    cs.DS

    On the Approximation Relationship between Optimizing Ratio of Submodular (RS) and Difference of Submodular (DS) Functions

    Authors: Pierre Perrault, Jennifer Healey, Zheng Wen, Michal Valko

    Abstract: We demonstrate that from an algorithm guaranteeing an approximation factor for the ratio of submodular (RS) optimization problem, we can build another algorithm having a different kind of approximation guarantee -- weaker than the classical one -- for the difference of submodular (DS) optimization problem, and vice versa. We also illustrate the link between these two problems by analyzing a \texts… ▽ More

    Submitted 9 September, 2022; v1 submitted 5 January, 2021; originally announced January 2021.

  6. arXiv:2006.06613  [pdf, ps, other

    stat.ML cs.LG

    Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits

    Authors: Pierre Perrault, Etienne Boursier, Vianney Perchet, Michal Valko

    Abstract: We investigate stochastic combinatorial multi-armed bandit with semi-bandit feedback (CMAB). In CMAB, the question of the existence of an efficient policy with an optimal asymptotic regret (up to a factor poly-logarithmic with the action size) is still open for many families of distributions, including mutually independent outcomes, and more generally the multivariate sub-Gaussian family. We propo… ▽ More

    Submitted 3 January, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: accepted to NeurIPS 2020

  7. arXiv:1906.08509  [pdf, other

    stat.ML cs.LG math.OC

    Online A-Optimal Design and Active Linear Regression

    Authors: Xavier Fontaine, Pierre Perrault, Michal Valko, Vianney Perchet

    Abstract: We consider in this paper the problem of optimal experiment design where a decision maker can choose which points to sample to obtain an estimate $\hatβ$ of the hidden parameter $β^{\star}$ of an underlying linear model. The key challenge of this work lies in the heteroscedasticity assumption that we make, meaning that each covariate has a different and unknown variance. The goal of the decision m… ▽ More

    Submitted 30 December, 2020; v1 submitted 20 June, 2019; originally announced June 2019.

    Comments: 29 pages, 5 figures

  8. arXiv:1902.03794  [pdf, other

    stat.ML cs.LG

    Exploiting Structure of Uncertainty for Efficient Matroid Semi-Bandits

    Authors: Pierre Perrault, Vianney Perchet, Michal Valko

    Abstract: We improve the efficiency of algorithms for stochastic \emph{combinatorial semi-bandits}. In most interesting problems, state-of-the-art algorithms take advantage of structural properties of rewards, such as \emph{independence}. However, while being optimal in terms of asymptotic regret, these algorithms are inefficient. In our paper, we first reduce their implementation to a specific \emph{submod… ▽ More

    Submitted 20 June, 2019; v1 submitted 11 February, 2019; originally announced February 2019.

    Comments: Accepted to ICML 2019, Long Beach

  9. arXiv:1806.02282  [pdf, ps, other

    stat.ML cs.LG math.OC

    Finding the bandit in a graph: Sequential search-and-stop

    Authors: Pierre Perrault, Vianney Perchet, Michal Valko

    Abstract: We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution. The agent can only examine vertices whose in-neighbors have already been examined. In this paper, we address a learning setting where we allow the agent to stop before having found the object and resta… ▽ More

    Submitted 22 April, 2019; v1 submitted 6 June, 2018; originally announced June 2018.

    Comments: in International Conference on Artificial Intelligence and Statistics (AISTATS 2019), April 2019, Naha, Okinawa, Japan