Skip to main content

Showing 1–15 of 15 results for author: Tiapkin, D

.
  1. arXiv:2407.05704  [pdf, ps, other

    cs.LG

    Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization

    Authors: Daniil Tiapkin, Evgenii Chzhen, Gilles Stoltz

    Abstract: In this paper, we consider the problem of learning in adversarial Markov decision processes [MDPs] with an oblivious adversary in a full-information setting. The agent interacts with an environment during $T$ episodes, each of which consists of $H$ stages, and each episode is evaluated with respect to a reward function that will be revealed only at the end of the episode. We propose an algorithm,… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  2. arXiv:2406.13655  [pdf, other

    cs.LG cs.AI

    Improving GFlowNets with Monte Carlo Tree Search

    Authors: Nikita Morozov, Daniil Tiapkin, Sergey Samsonov, Alexey Naumov, Dmitry Vetrov

    Abstract: Generative Flow Networks (GFlowNets) treat sampling from distributions over compositional discrete spaces as a sequential decision-making problem, training a stochastic policy to construct objects step by step. Recent studies have revealed strong connections between GFlowNets and entropy-regularized reinforcement learning. Building on these insights, we propose to enhance planning capabilities of… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: ICML 2024 SPIGM Workshop

  3. arXiv:2403.03811  [pdf, other

    stat.ML cs.GT cs.LG

    Incentivized Learning in Principal-Agent Bandit Games

    Authors: Antoine Scheid, Daniil Tiapkin, Etienne Boursier, Aymeric Capitaine, El Mahdi El Mhamdi, Eric Moulines, Michael I. Jordan, Alain Durmus

    Abstract: This work considers a repeated principal-agent bandit game, where the principal can only interact with her environment through the agent. The principal and the agent have misaligned objectives and the choice of action is only left to the agent. However, the principal can influence the agent's decisions by offering incentives which add up to his rewards. The principal aims to iteratively learn an i… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

  4. arXiv:2310.18186  [pdf, other

    stat.ML cs.LG

    Model-free Posterior Sampling via Learning Rate Randomization

    Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard

    Abstract: In this paper, we introduce Randomized Q-learning (RandQL), a novel randomized model-free algorithm for regret minimization in episodic Markov Decision Processes (MDPs). To the best of our knowledge, RandQL is the first tractable model-free posterior sampling-based algorithm. We analyze the performance of RandQL in both tabular and non-tabular metric space settings. In tabular MDPs, RandQL achieve… ▽ More

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: NeurIPS-2023

  5. arXiv:2310.17303  [pdf, ps, other

    stat.ML cs.LG

    Demonstration-Regularized RL

    Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard

    Abstract: Incorporating expert demonstrations has empirically helped to improve the sample efficiency of reinforcement learning (RL). This paper quantifies theoretically to what extent this extra information reduces RL's sample complexity. In particular, we study the demonstration-regularized reinforcement learning that leverages the expert demonstrations by KL-regularization for a policy learned by behavio… ▽ More

    Submitted 10 June, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: This revision fixes an error due to use of some incorrect results (Lemma 32, Corollary 11 by Talebi & Maillard, 2018) in the proof of Theorem 8. The condition for the RLHF results have slightly changed

  6. arXiv:2310.14286  [pdf, ps, other

    stat.ML cs.LG math.OC

    Improved High-Probability Bounds for the Temporal Difference Learning Algorithm via Exponential Stability

    Authors: Sergey Samsonov, Daniil Tiapkin, Alexey Naumov, Eric Moulines

    Abstract: In this paper we consider the problem of obtaining sharp bounds for the performance of temporal difference (TD) methods with linear function approximation for policy evaluation in discounted Markov decision processes. We show that a simple algorithm with a universal and instance-independent step size together with Polyak-Ruppert tail averaging is sufficient to obtain near-optimal variance and bias… ▽ More

    Submitted 15 June, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: Accepted to COLT-2024

    MSC Class: 62L20; 60J20

  7. arXiv:2310.12934  [pdf, other

    cs.LG stat.ML

    Generative Flow Networks as Entropy-Regularized RL

    Authors: Daniil Tiapkin, Nikita Morozov, Alexey Naumov, Dmitry Vetrov

    Abstract: The recently proposed generative flow networks (GFlowNets) are a method of training a policy to sample compositional discrete objects with probabilities proportional to a given reward via a sequence of actions. GFlowNets exploit the sequential nature of the problem, drawing parallels with reinforcement learning (RL). Our work extends the connection between RL and GFlowNets to a general case. We de… ▽ More

    Submitted 25 February, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: AISTATS 2024 (Oral)

  8. arXiv:2304.03056  [pdf, ps, other

    math.PR math.ST stat.ML

    Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to analysis of Bayesian algorithms

    Authors: Denis Belomestny, Pierre Menard, Alexey Naumov, Daniil Tiapkin, Michal Valko

    Abstract: In this work, we derive sharp non-asymptotic deviation bounds for weighted sums of Dirichlet random variables. These bounds are based on a novel integral representation of the density of a weighted Dirichlet sum. This representation allows us to obtain a Gaussian-like approximation for the sum distribution using geometry and complex analysis methods. Our results generalize similar bounds for the B… ▽ More

    Submitted 6 April, 2023; originally announced April 2023.

  9. arXiv:2303.09261  [pdf, other

    math.OC stat.ML

    Orthogonal Directions Constrained Gradient Method: from non-linear equality constraints to Stiefel manifold

    Authors: Sholom Schechtman, Daniil Tiapkin, Michael Muehlebach, Eric Moulines

    Abstract: We consider the problem of minimizing a non-convex function over a smooth manifold $\mathcal{M}$. We propose a novel algorithm, the Orthogonal Directions Constrained Gradient Method (ODCGM) which only requires computing a projection onto a vector space. ODCGM is infeasible but the iterates are constantly pulled towards the manifold, ensuring the convergence of ODCGM towards $\mathcal{M}$. ODCGM is… ▽ More

    Submitted 16 March, 2023; originally announced March 2023.

  10. arXiv:2303.08059  [pdf, other

    stat.ML cs.LG

    Fast Rates for Maximum Entropy Exploration

    Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Menard

    Abstract: We address the challenge of exploration in reinforcement learning (RL) when the agent operates in an unknown environment with sparse or no rewards. In this work, we study the maximum entropy exploration problem of two different types. The first type is visitation entropy maximization previously considered by Hazan et al.(2019) in the discounted setting. For this type of exploration, we propose a g… ▽ More

    Submitted 6 June, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: ICML-2023

  11. arXiv:2209.14414  [pdf, other

    stat.ML cs.LG

    Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

    Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Menard

    Abstract: We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ actions. The performance of an agent is measured by the regret after interacting with the environment for $T$ episodes. We propose an optimistic posterior sampling algorithm for reinforcement learning (OPSRL), a simple variant of poste… ▽ More

    Submitted 28 September, 2022; originally announced September 2022.

    Comments: arXiv admin note: text overlap with arXiv:2205.07704

  12. arXiv:2205.07704  [pdf, other

    stat.ML cs.LG

    From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses

    Authors: Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Menard

    Abstract: We propose the Bayes-UCBVI algorithm for reinforcement learning in tabular, stage-dependent, episodic Markov decision process: a natural extension of the Bayes-UCB algorithm by Kaufmann et al. (2012) for multi-armed bandits. Our method uses the quantile of a Q-value function posterior as upper confidence bound on the optimal Q-value function. For Bayes-UCBVI, we prove a regret bound of order… ▽ More

    Submitted 22 June, 2022; v1 submitted 16 May, 2022; originally announced May 2022.

  13. arXiv:2103.00299  [pdf, other

    math.OC stat.ML

    Primal-Dual Stochastic Mirror Descent for MDPs

    Authors: Daniil Tiapkin, Alexander Gasnikov

    Abstract: We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). For this purpose, some variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals. An important detail is the ability to use inexact values of functional constraints and compute the value of dual variables. We analyze this algorit… ▽ More

    Submitted 27 February, 2022; v1 submitted 27 February, 2021; originally announced March 2021.

  14. arXiv:2010.04677  [pdf, other

    math.OC

    Improved Complexity Bounds in Wasserstein Barycenter Problem

    Authors: Darina Dvinskikh, Daniil Tiapkin

    Abstract: In this paper, we focus on computational aspects of the Wasserstein barycenter problem. We propose two algorithms to compute Wasserstein barycenters of $m$ discrete measures of size $n$ with accuracy $\e$. The first algorithm, based on mirror prox with a specific norm, meets the complexity of celebrated accelerated iterative Bregman projections (IBP), namely $\widetilde O(mn^2\sqrt n/\e)$, however… ▽ More

    Submitted 24 February, 2021; v1 submitted 9 October, 2020; originally announced October 2020.

    Comments: 23 pages

  15. arXiv:2006.06763  [pdf, other

    math.OC cs.LG stat.ML

    Stochastic Saddle-Point Optimization for Wasserstein Barycenters

    Authors: Daniil Tiapkin, Alexander Gasnikov, Pavel Dvurechensky

    Abstract: We consider the population Wasserstein barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data. This leads to a complicated stochastic optimization problem where the objective is given as an expectation of a function given as a solution to a random optimization problem. We employ the structure of the problem and obtain a conv… ▽ More

    Submitted 2 December, 2021; v1 submitted 11 June, 2020; originally announced June 2020.