
Showing 1–25 of 25 results for author: Naumov, A

Searching in archive cs.
  1. arXiv:2406.13655  [pdf, other]

    cs.LG cs.AI

    Improving GFlowNets with Monte Carlo Tree Search

    Authors: Nikita Morozov, Daniil Tiapkin, Sergey Samsonov, Alexey Naumov, Dmitry Vetrov

    Abstract: Generative Flow Networks (GFlowNets) treat sampling from distributions over compositional discrete spaces as a sequential decision-making problem, training a stochastic policy to construct objects step by step. Recent studies have revealed strong connections between GFlowNets and entropy-regularized reinforcement learning. Building on these insights, we propose to enhance planning capabilities of…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: ICML 2024 SPIGM Workshop

  2. arXiv:2406.10019  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV math.NA

    Group and Shuffle: Efficient Structured Orthogonal Parametrization

    Authors: Mikhail Gorbunov, Nikolay Yudin, Vera Soboleva, Aibek Alanov, Alexey Naumov, Maxim Rakhuba

    Abstract: The increasing size of neural networks has led to a growing demand for methods of efficient fine-tuning. Recently, an orthogonal fine-tuning paradigm was introduced that uses orthogonal matrices for adapting the weights of a pretrained model. In this paper, we introduce a new class of structured matrices, which unifies and generalizes structured classes from previous works. We examine properties o…

    Submitted 14 June, 2024; originally announced June 2024.

  3. arXiv:2405.16644  [pdf, other]

    stat.ML cs.LG math.OC math.PR math.ST

    Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Linear Stochastic Approximation with Applications to TD Learning

    Authors: Sergey Samsonov, Eric Moulines, Qi-Man Shao, Zhuo-Song Zhang, Alexey Naumov

    Abstract: In this paper, we obtain the Berry-Esseen bound for multivariate normal approximation for the Polyak-Ruppert averaged iterates of the linear stochastic approximation (LSA) algorithm with decreasing step size. Our findings reveal that the fastest rate of normal approximation is achieved when setting the most aggressive step size $α_{k} \asymp k^{-1/2}$. Moreover, we prove the non-asymptotic validit…

    Submitted 26 May, 2024; originally announced May 2024.

    MSC Class: 60F05; 62L20; 62E20

  4. arXiv:2402.04114  [pdf, other]

    stat.ML cs.LG math.OC

    SCAFFLSA: Taming Heterogeneity in Federated Linear Stochastic Approximation and TD Learning

    Authors: Paul Mangold, Sergey Samsonov, Safwan Labbi, Ilya Levin, Reda Alami, Alexey Naumov, Eric Moulines

    Abstract: In this paper, we analyze the sample and communication complexity of the federated linear stochastic approximation (FedLSA) algorithm. We explicitly quantify the effects of local training with agent heterogeneity. We show that the communication complexity of FedLSA scales polynomially with the inverse of the desired accuracy $ε$. To overcome this, we propose SCAFFLSA, a new variant of FedLSA that u…

    Submitted 27 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: now with linear speed-up!

  5. arXiv:2401.16367  [pdf, other]

    cs.LG cs.AI cs.CL

    TQCompressor: improving tensor decomposition methods in neural networks via permutations

    Authors: V. Abronin, A. Naumov, D. Mazur, D. Bystrov, K. Tsarova, Ar. Melnikov, I. Oseledets, S. Dolgov, R. Brasher, M. Perelshtein

    Abstract: We introduce TQCompressor, a novel method for neural network model compression with improved tensor decompositions. We explore the challenges posed by the computational and storage demands of pre-trained language models in NLP tasks and propose a permutation-based enhancement to Kronecker decomposition. This enhancement makes it possible to reduce loss in model expressivity which is usually associ…

    Submitted 29 January, 2024; originally announced January 2024.

  6. arXiv:2310.18186  [pdf, other]

    stat.ML cs.LG

    Model-free Posterior Sampling via Learning Rate Randomization

    Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard

    Abstract: In this paper, we introduce Randomized Q-learning (RandQL), a novel randomized model-free algorithm for regret minimization in episodic Markov Decision Processes (MDPs). To the best of our knowledge, RandQL is the first tractable model-free posterior sampling-based algorithm. We analyze the performance of RandQL in both tabular and non-tabular metric space settings. In tabular MDPs, RandQL achieve…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: NeurIPS-2023

  7. arXiv:2310.17303  [pdf, ps, other]

    stat.ML cs.LG

    Demonstration-Regularized RL

    Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard

    Abstract: Incorporating expert demonstrations has empirically helped to improve the sample efficiency of reinforcement learning (RL). This paper quantifies theoretically to what extent this extra information reduces RL's sample complexity. In particular, we study the demonstration-regularized reinforcement learning that leverages the expert demonstrations by KL-regularization for a policy learned by behavio…

    Submitted 10 June, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: This revision fixes an error due to the use of some incorrect results (Lemma 32, Corollary 11 by Talebi & Maillard, 2018) in the proof of Theorem 8. The conditions for the RLHF results have slightly changed.

  8. arXiv:2310.14286  [pdf, ps, other]

    stat.ML cs.LG math.OC

    Improved High-Probability Bounds for the Temporal Difference Learning Algorithm via Exponential Stability

    Authors: Sergey Samsonov, Daniil Tiapkin, Alexey Naumov, Eric Moulines

    Abstract: In this paper we consider the problem of obtaining sharp bounds for the performance of temporal difference (TD) methods with linear function approximation for policy evaluation in discounted Markov decision processes. We show that a simple algorithm with a universal and instance-independent step size together with Polyak-Ruppert tail averaging is sufficient to obtain near-optimal variance and bias…

    Submitted 15 June, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: Accepted to COLT-2024

    MSC Class: 62L20; 60J20

  9. arXiv:2310.12934  [pdf, other]

    cs.LG stat.ML

    Generative Flow Networks as Entropy-Regularized RL

    Authors: Daniil Tiapkin, Nikita Morozov, Alexey Naumov, Dmitry Vetrov

    Abstract: The recently proposed generative flow networks (GFlowNets) are a method of training a policy to sample compositional discrete objects with probabilities proportional to a given reward via a sequence of actions. GFlowNets exploit the sequential nature of the problem, drawing parallels with reinforcement learning (RL). Our work extends the connection between RL and GFlowNets to a general case. We de…

    Submitted 25 February, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: AISTATS 2024 (Oral)

  10. arXiv:2308.11562  [pdf, other]

    eess.IV cs.CV

    EndoNet: model for automatic calculation of H-score on histological slides

    Authors: Egor Ushakov, Anton Naumov, Vladislav Fomberg, Polina Vishnyakova, Aleksandra Asaturova, Alina Badlaeva, Anna Tregubova, Evgeny Karpulevich, Gennady Sukhikh, Timur Fatkhudinov

    Abstract: H-score is a semi-quantitative method used to assess the presence and distribution of proteins in tissue samples by combining the intensity of staining and percentage of stained nuclei. It is widely used but time-consuming and can be limited in accuracy and precision. Computer-aided methods may help overcome these limitations and improve the efficiency of pathologists' workflows. In this work, we…

    Submitted 22 August, 2023; originally announced August 2023.

  11. arXiv:2305.15938  [pdf, ps, other]

    math.OC cs.LG stat.ML

    First Order Methods with Markovian Noise: from Acceleration to Variational Inequalities

    Authors: Aleksandr Beznosikov, Sergey Samsonov, Marina Sheshukova, Alexander Gasnikov, Alexey Naumov, Eric Moulines

    Abstract: This paper delves into stochastic optimization problems that involve Markovian noise. We present a unified approach for the theoretical analysis of first-order gradient methods for stochastic optimization and variational inequalities. Our approach covers scenarios for both non-convex and strongly convex minimization problems. To achieve an optimal (linear) dependence on the mixing time of the unde…

    Submitted 30 March, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Appears in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023). 41 pages, 3 algorithms, 2 tables

    Journal ref: https://proceedings.neurips.cc/paper_files/paper/2023/hash/8c3e38ce55a0fa44bc325bc6fdb7f4e5-Abstract-Conference.html

  12. arXiv:2304.01111  [pdf, ps, other]

    math.ST cs.LG math.PR stat.ME stat.ML

    Theoretical guarantees for neural control variates in MCMC

    Authors: Denis Belomestny, Artur Goldman, Alexey Naumov, Sergey Samsonov

    Abstract: In this paper, we propose a variance reduction approach for Markov chains based on additive control variates and the minimization of an appropriate estimate for the asymptotic variance. We focus on the particular case when control variates are represented as deep neural networks. We derive the optimal convergence rate of the asymptotic variance under various ergodicity assumptions on the underlyin…

    Submitted 3 April, 2023; originally announced April 2023.

    MSC Class: 65C40; 62-08

  13. arXiv:2303.16214  [pdf, other]

    cs.LG quant-ph

    Tetra-AML: Automatic Machine Learning via Tensor Networks

    Authors: A. Naumov, Ar. Melnikov, V. Abronin, F. Oxanichenko, K. Izmailov, M. Pflitsch, A. Melnikov, M. Perelshtein

    Abstract: Neural networks have revolutionized many aspects of society but in the era of huge models with billions of parameters, optimizing and deploying them for commercial applications can require significant computational and financial resources. To address these challenges, we introduce the Tetra-AML toolbox, which automates neural architecture search and hyperparameter optimization via a custom-develop…

    Submitted 28 March, 2023; originally announced March 2023.

  14. arXiv:2303.08059  [pdf, other]

    stat.ML cs.LG

    Fast Rates for Maximum Entropy Exploration

    Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Menard

    Abstract: We address the challenge of exploration in reinforcement learning (RL) when the agent operates in an unknown environment with sparse or no rewards. In this work, we study the maximum entropy exploration problem of two different types. The first type is visitation entropy maximization previously considered by Hazan et al. (2019) in the discounted setting. For this type of exploration, we propose a g…

    Submitted 6 June, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: ICML-2023

  15. arXiv:2209.14414  [pdf, other]

    stat.ML cs.LG

    Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

    Authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Menard

    Abstract: We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ actions. The performance of an agent is measured by the regret after interacting with the environment for $T$ episodes. We propose an optimistic posterior sampling algorithm for reinforcement learning (OPSRL), a simple variant of poste…

    Submitted 28 September, 2022; originally announced September 2022.

    Comments: arXiv admin note: text overlap with arXiv:2205.07704

  16. arXiv:2207.07689  [pdf, other]

    cs.LG cs.AI

    Strict baselines for Covid-19 forecasting and ML perspective for USA and Russia

    Authors: Alexander G. Sboev, Nikolay A. Kudryshov, Ivan A. Moloshnikov, Saveliy V. Zavertyaev, Aleksandr V. Naumov, Roman B. Rybka

    Abstract: Currently, the evolution of Covid-19 allows researchers to gather the datasets accumulated over 2 years and to use them in predictive analysis. In turn, this makes it possible to assess the efficiency potential of more complex predictive models, including neural networks with different forecast horizons. In this paper, we present the results of a consistent comparative study of different types of…

    Submitted 15 July, 2022; originally announced July 2022.

  17. arXiv:2207.04475  [pdf, ps, other]

    stat.ML cs.LG math.PR math.ST

    Finite-time High-probability Bounds for Polyak-Ruppert Averaged Iterates of Linear Stochastic Approximation

    Authors: Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov

    Abstract: This paper provides a finite-time analysis of linear stochastic approximation (LSA) algorithms with fixed step size, a core method in statistics and machine learning. LSA is used to compute approximate solutions of a $d$-dimensional linear system $\bar{\mathbf{A}} θ = \bar{\mathbf{b}}$ for which $(\bar{\mathbf{A}}, \bar{\mathbf{b}})$ can only be estimated by (asymptotically) unbiased observations…

    Submitted 29 March, 2023; v1 submitted 10 July, 2022; originally announced July 2022.

    MSC Class: 62L20; 60J20

  18. arXiv:2205.07704  [pdf, other]

    stat.ML cs.LG

    From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses

    Authors: Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Menard

    Abstract: We propose the Bayes-UCBVI algorithm for reinforcement learning in tabular, stage-dependent, episodic Markov decision process: a natural extension of the Bayes-UCB algorithm by Kaufmann et al. (2012) for multi-armed bandits. Our method uses the quantile of a Q-value function posterior as upper confidence bound on the optimal Q-value function. For Bayes-UCBVI, we prove a regret bound of order…

    Submitted 22 June, 2022; v1 submitted 16 May, 2022; originally announced May 2022.

  19. arXiv:2111.02702  [pdf, other]

    stat.ML cs.LG

    Local-Global MCMC kernels: the best of both worlds

    Authors: Sergey Samsonov, Evgeny Lagutin, Marylou Gabrié, Alain Durmus, Alexey Naumov, Eric Moulines

    Abstract: Recent works leveraging learning to enhance sampling have shown promising results, in particular by designing effective non-local moves and global proposals. However, learning accuracy is inevitably limited in regions where little data is available such as in the tails of distributions as well as in high-dimensional problems. In the present paper we study an Explore-Exploit Markov chain Monte Carl…

    Submitted 4 October, 2022; v1 submitted 4 November, 2021; originally announced November 2021.

    Comments: arXiv admin note: text overlap with arXiv:1111.5421 by other authors

  20. arXiv:2106.01257  [pdf, ps, other]

    stat.ML cs.LG math.PR math.ST

    Tight High Probability Bounds for Linear Stochastic Approximation with Fixed Stepsize

    Authors: Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov, Kevin Scaman, Hoi-To Wai

    Abstract: This paper provides a non-asymptotic analysis of linear stochastic approximation (LSA) algorithms with fixed stepsize. This family of methods arises in many machine learning tasks and is used to obtain approximate solutions of a linear system $\bar{A} θ = \bar{b}$ for which $\bar{A}$ and $\bar{b}$ can only be accessed through random estimates $\{({\bf A}_n, {\bf b}_n): n \in \mathbb{N}^*\}$. Our ana…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: 21 pages

  21. arXiv:2105.02135  [pdf, other]

    cs.LG math.OC

    UVIP: Model-Free Approach to Evaluate Reinforcement Learning Algorithms

    Authors: D. Belomestny, I. Levin, E. Moulines, A. Naumov, S. Samsonov, V. Zorina

    Abstract: Policy evaluation is an important instrument for the comparison of different algorithms in Reinforcement Learning (RL). Yet even a precise knowledge of the value function $V^π$ corresponding to a policy $π$ does not provide reliable information on how far the policy $π$ is from the optimal one. We present a novel model-free upper value iteration procedure $({\sf UVIP})$ that allows us to estimate…

    Submitted 3 June, 2021; v1 submitted 5 May, 2021; originally announced May 2021.

  22. arXiv:2105.00059  [pdf, other]

    cs.CL cs.AI cs.LG

    An analysis of full-size Russian complexly NER labelled corpus of Internet user reviews on the drugs based on deep learning and language neural nets

    Authors: Alexander Sboev, Sanna Sboeva, Ivan Moloshnikov, Artem Gryaznov, Roman Rybka, Alexander Naumov, Anton Selivanov, Gleb Rylkov, Viacheslav Ilyin

    Abstract: We present the full-size Russian complexly NER-labeled corpus of Internet user reviews, along with an evaluation of accuracy levels reached on this corpus by a set of advanced deep learning neural networks to extract the pharmacologically meaningful entities from Russian texts. The corpus annotation includes mentions of the following entities: Medication (33005 mentions), Adverse Drug Reaction (17…

    Submitted 30 April, 2021; originally announced May 2021.

  23. arXiv:2102.00185  [pdf, ps, other]

    stat.ML cs.LG math.PR math.ST

    On the Stability of Random Matrix Product with Markovian Noise: Application to Linear Stochastic Approximation and TD Learning

    Authors: Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov, Hoi-To Wai

    Abstract: This paper studies the exponential stability of random matrix products driven by a general (possibly unbounded) state space Markov chain. It is a cornerstone in the analysis of stochastic algorithms in machine learning (e.g. for parameter tracking in online learning or reinforcement learning). The existing results impose strong conditions such as uniform boundedness of the matrix-valued functions…

    Submitted 30 January, 2021; originally announced February 2021.

  24. arXiv:2002.01268  [pdf, other]

    stat.ML cs.LG

    Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise

    Authors: Maxim Kaledin, Eric Moulines, Alexey Naumov, Vladislav Tadic, Hoi-To Wai

    Abstract: Linear two-timescale stochastic approximation (SA) scheme is an important class of algorithms which has become popular in reinforcement learning (RL), particularly for the policy evaluation problem. Recently, a number of works have been devoted to establishing the finite time analysis of the scheme, especially under the Markovian (non-i.i.d.) noise settings that are ubiquitous in practice. In this…

    Submitted 4 February, 2020; originally announced February 2020.

  25. arXiv:1910.03643  [pdf, other]

    math.ST cs.LG math.PR stat.CO stat.ML

    Variance reduction for Markov chains with application to MCMC

    Authors: D. Belomestny, L. Iosipoi, E. Moulines, A. Naumov, S. Samsonov

    Abstract: In this paper we propose a novel variance reduction approach for additive functionals of Markov chains based on minimization of an estimate for the asymptotic variance of these functionals over suitable classes of control variates. A distinctive feature of the proposed approach is its ability to significantly reduce the overall finite sample variance. This feature is theoretically demonstrated by…

    Submitted 15 February, 2020; v1 submitted 8 October, 2019; originally announced October 2019.