Showing 1–22 of 22 results for author: Naumov, A

  1. arXiv:2405.16644  [pdf, other]

    stat.ML cs.LG math.OC math.PR math.ST

    Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Linear Stochastic Approximation with Applications to TD Learning

    Authors: Sergey Samsonov, Eric Moulines, Qi-Man Shao, Zhuo-Song Zhang, Alexey Naumov

    Abstract: In this paper, we obtain the Berry-Esseen bound for multivariate normal approximation for the Polyak-Ruppert averaged iterates of the linear stochastic approximation (LSA) algorithm with decreasing step size. Our findings reveal that the fastest rate of normal approximation is achieved when setting the most aggressive step size $α_{k} \asymp k^{-1/2}$. Moreover, we prove the non-asymptotic validit…

    Submitted 26 May, 2024; originally announced May 2024.

    MSC Class: 60F05; 62L20; 62E20

  2. arXiv:2402.04114  [pdf, other]

    stat.ML cs.LG math.OC

    SCAFFLSA: Taming Heterogeneity in Federated Linear Stochastic Approximation and TD Learning

    Authors: Paul Mangold, Sergey Samsonov, Safwan Labbi, Ilya Levin, Reda Alami, Alexey Naumov, Eric Moulines

    Abstract: In this paper, we analyze the sample and communication complexity of the federated linear stochastic approximation (FedLSA) algorithm. We explicitly quantify the effects of local training with agent heterogeneity. We show that the communication complexity of FedLSA scales polynomially with the inverse of the desired accuracy $ε$. To overcome this, we propose SCAFFLSA, a new variant of FedLSA that u…

    Submitted 27 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: now with linear speed-up!

  3. arXiv:2310.18186  [pdf, other]

    stat.ML cs.LG

    Model-free Posterior Sampling via Learning Rate Randomization

    Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard

    Abstract: In this paper, we introduce Randomized Q-learning (RandQL), a novel randomized model-free algorithm for regret minimization in episodic Markov Decision Processes (MDPs). To the best of our knowledge, RandQL is the first tractable model-free posterior sampling-based algorithm. We analyze the performance of RandQL in both tabular and non-tabular metric space settings. In tabular MDPs, RandQL achieve…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: NeurIPS-2023

  4. arXiv:2310.17303  [pdf, ps, other]

    stat.ML cs.LG

    Demonstration-Regularized RL

    Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard

    Abstract: Incorporating expert demonstrations has empirically helped to improve the sample efficiency of reinforcement learning (RL). This paper quantifies theoretically to what extent this extra information reduces RL's sample complexity. In particular, we study the demonstration-regularized reinforcement learning that leverages the expert demonstrations by KL-regularization for a policy learned by behavio…

    Submitted 10 June, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: This revision fixes an error due to the use of some incorrect results (Lemma 32, Corollary 11 by Talebi & Maillard, 2018) in the proof of Theorem 8. The conditions for the RLHF results have slightly changed.

  5. arXiv:2310.14286  [pdf, ps, other]

    stat.ML cs.LG math.OC

    Improved High-Probability Bounds for the Temporal Difference Learning Algorithm via Exponential Stability

    Authors: Sergey Samsonov, Daniil Tiapkin, Alexey Naumov, Eric Moulines

    Abstract: In this paper we consider the problem of obtaining sharp bounds for the performance of temporal difference (TD) methods with linear function approximation for policy evaluation in discounted Markov decision processes. We show that a simple algorithm with a universal and instance-independent step size together with Polyak-Ruppert tail averaging is sufficient to obtain near-optimal variance and bias…

    Submitted 15 June, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: Accepted to COLT-2024

    MSC Class: 62L20; 60J20

  6. arXiv:2310.12934  [pdf, other]

    cs.LG stat.ML

    Generative Flow Networks as Entropy-Regularized RL

    Authors: Daniil Tiapkin, Nikita Morozov, Alexey Naumov, Dmitry Vetrov

    Abstract: The recently proposed generative flow networks (GFlowNets) are a method of training a policy to sample compositional discrete objects with probabilities proportional to a given reward via a sequence of actions. GFlowNets exploit the sequential nature of the problem, drawing parallels with reinforcement learning (RL). Our work extends the connection between RL and GFlowNets to a general case. We de…

    Submitted 25 February, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: AISTATS 2024 (Oral)

  7. arXiv:2305.15938  [pdf, ps, other]

    math.OC cs.LG stat.ML

    First Order Methods with Markovian Noise: from Acceleration to Variational Inequalities

    Authors: Aleksandr Beznosikov, Sergey Samsonov, Marina Sheshukova, Alexander Gasnikov, Alexey Naumov, Eric Moulines

    Abstract: This paper delves into stochastic optimization problems that involve Markovian noise. We present a unified approach for the theoretical analysis of first-order gradient methods for stochastic optimization and variational inequalities. Our approach covers scenarios for both non-convex and strongly convex minimization problems. To achieve an optimal (linear) dependence on the mixing time of the unde…

    Submitted 30 March, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Appears in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023). 41 pages, 3 algorithms, 2 tables

    Journal ref: https://proceedings.neurips.cc/paper_files/paper/2023/hash/8c3e38ce55a0fa44bc325bc6fdb7f4e5-Abstract-Conference.html

  8. arXiv:2304.03056  [pdf, ps, other]

    math.PR math.ST stat.ML

    Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to analysis of Bayesian algorithms

    Authors: Denis Belomestny, Pierre Menard, Alexey Naumov, Daniil Tiapkin, Michal Valko

    Abstract: In this work, we derive sharp non-asymptotic deviation bounds for weighted sums of Dirichlet random variables. These bounds are based on a novel integral representation of the density of a weighted Dirichlet sum. This representation allows us to obtain a Gaussian-like approximation for the sum distribution using geometry and complex analysis methods. Our results generalize similar bounds for the B…

    Submitted 6 April, 2023; originally announced April 2023.

  9. arXiv:2304.01111  [pdf, ps, other]

    math.ST cs.LG math.PR stat.ME stat.ML

    Theoretical guarantees for neural control variates in MCMC

    Authors: Denis Belomestny, Artur Goldman, Alexey Naumov, Sergey Samsonov

    Abstract: In this paper, we propose a variance reduction approach for Markov chains based on additive control variates and the minimization of an appropriate estimate for the asymptotic variance. We focus on the particular case when control variates are represented as deep neural networks. We derive the optimal convergence rate of the asymptotic variance under various ergodicity assumptions on the underlyin…

    Submitted 3 April, 2023; originally announced April 2023.

    MSC Class: 65C40; 62-08

  10. arXiv:2303.08059  [pdf, other]

    stat.ML cs.LG

    Fast Rates for Maximum Entropy Exploration

    Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Menard

    Abstract: We address the challenge of exploration in reinforcement learning (RL) when the agent operates in an unknown environment with sparse or no rewards. In this work, we study the maximum entropy exploration problem of two different types. The first type is visitation entropy maximization previously considered by Hazan et al. (2019) in the discounted setting. For this type of exploration, we propose a g…

    Submitted 6 June, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: ICML-2023

  11. arXiv:2303.05838  [pdf, ps, other]

    math.PR math.ST stat.ML

    Rosenthal-type inequalities for linear statistics of Markov chains

    Authors: Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov, Marina Sheshukova

    Abstract: In this paper, we establish novel deviation bounds for additive functionals of geometrically ergodic Markov chains similar to Rosenthal and Bernstein inequalities for sums of independent random variables. We pay special attention to the dependence of our bounds on the mixing time of the corresponding chain. More precisely, we establish explicit bounds that are linked to the constants from the mart…

    Submitted 28 June, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

    MSC Class: 60E15; 60J20; 65C40

  12. arXiv:2209.14414  [pdf, other]

    stat.ML cs.LG

    Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

    Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Menard

    Abstract: We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ actions. The performance of an agent is measured by the regret after interacting with the environment for $T$ episodes. We propose an optimistic posterior sampling algorithm for reinforcement learning (OPSRL), a simple variant of poste…

    Submitted 28 September, 2022; originally announced September 2022.

    Comments: arXiv admin note: text overlap with arXiv:2205.07704

  13. arXiv:2207.04475  [pdf, ps, other]

    stat.ML cs.LG math.PR math.ST

    Finite-time High-probability Bounds for Polyak-Ruppert Averaged Iterates of Linear Stochastic Approximation

    Authors: Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov

    Abstract: This paper provides a finite-time analysis of linear stochastic approximation (LSA) algorithms with fixed step size, a core method in statistics and machine learning. LSA is used to compute approximate solutions of a $d$-dimensional linear system $\bar{\mathbf{A}} θ= \bar{\mathbf{b}}$ for which $(\bar{\mathbf{A}}, \bar{\mathbf{b}})$ can only be estimated by (asymptotically) unbiased observations…

    Submitted 29 March, 2023; v1 submitted 10 July, 2022; originally announced July 2022.

    MSC Class: 62L20; 60J20

  14. arXiv:2206.09527  [pdf, other]

    math.NA math.ST stat.ML

    Simultaneous approximation of a smooth function and its derivatives by deep neural networks with piecewise-polynomial activations

    Authors: Denis Belomestny, Alexey Naumov, Nikita Puchkin, Sergey Samsonov

    Abstract: This paper investigates the approximation properties of deep neural networks with piecewise-polynomial activation functions. We derive the required depth, width, and sparsity of a deep neural network to approximate any Hölder smooth function up to a given approximation error in Hölder norms in such a way that all weights of this neural network are bounded by $1$. The latter feature is essential to…

    Submitted 2 December, 2022; v1 submitted 19 June, 2022; originally announced June 2022.

    Comments: 28 pages

    MSC Class: 41A25; 41A15; 41A28; 68T07

  15. arXiv:2205.07704  [pdf, other]

    stat.ML cs.LG

    From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses

    Authors: Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Menard

    Abstract: We propose the Bayes-UCBVI algorithm for reinforcement learning in tabular, stage-dependent, episodic Markov decision process: a natural extension of the Bayes-UCB algorithm by Kaufmann et al. (2012) for multi-armed bandits. Our method uses the quantile of a Q-value function posterior as upper confidence bound on the optimal Q-value function. For Bayes-UCBVI, we prove a regret bound of order…

    Submitted 22 June, 2022; v1 submitted 16 May, 2022; originally announced May 2022.

  16. arXiv:2111.02702  [pdf, other]

    stat.ML cs.LG

    Local-Global MCMC kernels: the best of both worlds

    Authors: Sergey Samsonov, Evgeny Lagutin, Marylou Gabrié, Alain Durmus, Alexey Naumov, Eric Moulines

    Abstract: Recent works leveraging learning to enhance sampling have shown promising results, in particular by designing effective non-local moves and global proposals. However, learning accuracy is inevitably limited in regions where little data is available such as in the tails of distributions as well as in high-dimensional problems. In the present paper we study an Explore-Exploit Markov chain Monte Carl…

    Submitted 4 October, 2022; v1 submitted 4 November, 2021; originally announced November 2021.

    Comments: arXiv admin note: text overlap with arXiv:1111.5421 by other authors

  17. arXiv:2106.01257  [pdf, ps, other]

    stat.ML cs.LG math.PR math.ST

    Tight High Probability Bounds for Linear Stochastic Approximation with Fixed Stepsize

    Authors: Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov, Kevin Scaman, Hoi-To Wai

    Abstract: This paper provides a non-asymptotic analysis of linear stochastic approximation (LSA) algorithms with fixed stepsize. This family of methods arises in many machine learning tasks and is used to obtain approximate solutions of a linear system $\bar{A}θ= \bar{b}$ for which $\bar{A}$ and $\bar{b}$ can only be accessed through random estimates $\{({\bf A}_n, {\bf b}_n): n \in \mathbb{N}^*\}$. Our ana…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: 21 pages

  18. arXiv:2102.00199  [pdf, ps, other]

    math.ST stat.ML

    Rates of convergence for density estimation with generative adversarial networks

    Authors: Nikita Puchkin, Sergey Samsonov, Denis Belomestny, Eric Moulines, Alexey Naumov

    Abstract: In this work we undertake a thorough study of the non-asymptotic properties of the vanilla generative adversarial networks (GANs). We prove an oracle inequality for the Jensen-Shannon (JS) divergence between the underlying density $\mathsf{p}^*$ and the GAN estimate with a significantly better statistical error term compared to the previously known results. The advantage of our bound becomes clear…

    Submitted 25 January, 2024; v1 submitted 30 January, 2021; originally announced February 2021.

    Comments: To appear in Journal of Machine Learning Research

  19. arXiv:2102.00185  [pdf, ps, other]

    stat.ML cs.LG math.PR math.ST

    On the Stability of Random Matrix Product with Markovian Noise: Application to Linear Stochastic Approximation and TD Learning

    Authors: Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov, Hoi-To Wai

    Abstract: This paper studies the exponential stability of random matrix products driven by a general (possibly unbounded) state space Markov chain. It is a cornerstone in the analysis of stochastic algorithms in machine learning (e.g. for parameter tracking in online learning or reinforcement learning). The existing results impose strong conditions such as uniform boundedness of the matrix-valued functions…

    Submitted 30 January, 2021; originally announced February 2021.

  20. arXiv:2008.06858  [pdf, other]

    math.ST stat.CO

    Variance reduction for dependent sequences with applications to Stochastic Gradient MCMC

    Authors: D. Belomestny, L. Iosipoi, E. Moulines, A. Naumov, S. Samsonov

    Abstract: In this paper we propose a novel and practical variance reduction approach for additive functionals of dependent sequences. Our approach combines the use of control variates with the minimisation of an empirical variance estimate. We analyse finite sample properties of the proposed method and derive finite-time bounds of the excess asymptotic variance to zero. We apply our methodology to Stochasti…

    Submitted 16 August, 2020; originally announced August 2020.

    MSC Class: 60J20; 65C40; 65C60

  21. arXiv:2002.01268  [pdf, other]

    stat.ML cs.LG

    Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise

    Authors: Maxim Kaledin, Eric Moulines, Alexey Naumov, Vladislav Tadic, Hoi-To Wai

    Abstract: Linear two-timescale stochastic approximation (SA) scheme is an important class of algorithms which has become popular in reinforcement learning (RL), particularly for the policy evaluation problem. Recently, a number of works have been devoted to establishing the finite time analysis of the scheme, especially under the Markovian (non-i.i.d.) noise settings that are ubiquitous in practice. In this…

    Submitted 4 February, 2020; originally announced February 2020.

  22. arXiv:1910.03643  [pdf, other]

    math.ST cs.LG math.PR stat.CO stat.ML

    Variance reduction for Markov chains with application to MCMC

    Authors: D. Belomestny, L. Iosipoi, E. Moulines, A. Naumov, S. Samsonov

    Abstract: In this paper we propose a novel variance reduction approach for additive functionals of Markov chains based on minimization of an estimate for the asymptotic variance of these functionals over suitable classes of control variates. A distinctive feature of the proposed approach is its ability to significantly reduce the overall finite sample variance. This feature is theoretically demonstrated by…

    Submitted 15 February, 2020; v1 submitted 8 October, 2019; originally announced October 2019.