Bigger, Regularized, Optimistic: scaling for compute and sample-efficient continuous control

Authors: Michal Nauman, Mateusz Ostaszewski, Krzysztof Jankowski, Piotr Miłoś, Marek Cygan

Abstract: Sample efficiency in Reinforcement Learning (RL) has traditionally been driven by algorithmic enhancements. In this work, we demonstrate that scaling can also lead to substantial improvements. We conduct a thorough investigation into the interplay of scaling model capacity and domain-specific RL enhancements. These empirical findings inform the design choices underlying our proposed BRO (Bigger, Regularized, Optimistic) algorithm. The key innovation behind BRO is that strong regularization allows for effective scaling of the critic networks, which, paired with optimistic exploration, leads to superior performance. BRO achieves state-of-the-art results, significantly outperforming the leading model-based and model-free algorithms across 40 complex tasks from the DeepMind Control, MetaWorld, and MyoSuite benchmarks. BRO is the first model-free algorithm to achieve near-optimal policies in the notoriously challenging Dog and Humanoid tasks.

Submitted 25 May, 2024; originally announced May 2024.

Comments: Preprint

arXiv:2403.01014 [pdf, other]

A Case for Validation Buffer in Pessimistic Actor-Critic

Authors: Michal Nauman, Mateusz Ostaszewski, Marek Cygan

Abstract: In this paper, we investigate the issue of error accumulation in critic networks updated via pessimistic temporal difference objectives. We show that the critic approximation error can be approximated via a recursive fixed-point model similar to that of the Bellman value. We use this recursive definition to retrieve the conditions under which the pessimistic critic is unbiased. Building on these insights, we propose the Validation Pessimism Learning (VPL) algorithm. VPL uses a small validation buffer to adjust the levels of pessimism throughout the agent training, with the pessimism set such that the approximation error of the critic targets is minimized. We investigate the proposed approach on a variety of locomotion and manipulation tasks and report improvements in sample efficiency and performance.

Submitted 1 March, 2024; originally announced March 2024.

Comments: Preprint

arXiv:2403.00514 [pdf, other]

Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning

Authors: Michal Nauman, Michał Bortkiewicz, Piotr Miłoś, Tomasz Trzciński, Mateusz Ostaszewski, Marek Cygan

Abstract: Recent advancements in off-policy Reinforcement Learning (RL) have significantly improved sample efficiency, primarily due to the incorporation of various forms of regularization that enable more gradient update steps than traditional agents. However, many of these techniques have been tested in limited settings, often on tasks from single simulation benchmarks and against well-known algorithms rather than a range of regularization approaches. This limits our understanding of the specific mechanisms driving RL improvements. To address this, we implemented over 60 different off-policy agents, each integrating established regularization techniques from recent state-of-the-art algorithms. We tested these agents across 14 diverse tasks from 2 simulation benchmarks, measuring training metrics related to overestimation, overfitting, and plasticity loss -- issues that motivate the examined regularization techniques. Our findings reveal that while the effectiveness of a specific regularization setup varies with the task, certain combinations consistently demonstrate robust and superior performance. Notably, a simple Soft Actor-Critic agent, appropriately regularized, reliably finds a better-performing policy within the training regime, which previously was achieved mainly through model-based approaches.

Submitted 19 June, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

Comments: ICML 2024

arXiv:2402.03500 [pdf, other]

Curriculum reinforcement learning for quantum architecture search under hardware errors

Authors: Yash J. Patel, Akash Kundu, Mateusz Ostaszewski, Xavier Bonet-Monroig, Vedran Dunjko, Onur Danaci

Abstract: The key challenge in the noisy intermediate-scale quantum era is finding useful circuits compatible with current device limitations. Variational quantum algorithms (VQAs) offer a potential solution by fixing the circuit architecture and optimizing individual gate parameters in an external loop. However, parameter optimization can become intractable, and the overall performance of the algorithm depends heavily on the initially chosen circuit architecture. Several quantum architecture search (QAS) algorithms have been developed to design useful circuit architectures automatically. In the case of parameter optimization alone, noise effects have been observed to dramatically influence the performance of the optimizer and final outcomes, which is a key line of study. However, the effects of noise on the architecture search, which could be just as critical, are poorly understood. This work addresses this gap by introducing a curriculum-based reinforcement learning QAS (CRLQAS) algorithm designed to tackle challenges in realistic VQA deployment. The algorithm incorporates (i) a 3D architecture encoding and restrictions on environment dynamics to explore the search space of possible circuits efficiently, (ii) an episode halting scheme to steer the agent to find shorter circuits, and (iii) a novel variant of simultaneous perturbation stochastic approximation as an optimizer for faster convergence.
To facilitate studies, we developed an optimized simulator for our algorithm, significantly improving computational efficiency in simulating noisy quantum circuits by employing the Pauli-transfer matrix formalism in the Pauli-Liouville basis. Numerical experiments focusing on quantum chemistry tasks demonstrate that CRLQAS outperforms existing QAS algorithms across several metrics in both noiseless and noisy environments.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 32 pages, 11 figures, 6 tables. Accepted at ICLR 2024

arXiv:2402.02868 [pdf, other]

Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem

Authors: Maciej Wołczyk, Bartłomiej Cupiał, Mateusz Ostaszewski, Michał Bortkiewicz, Michał Zając, Razvan Pascanu, Łukasz Kuciński, Piotr Miłoś

Abstract: Fine-tuning is a widespread technique that allows practitioners to transfer pre-trained capabilities, as recently showcased by the successful applications of foundation models. However, fine-tuning reinforcement learning (RL) models remains a challenge. This work conceptualizes one specific cause of poor transfer, accentuated in the RL setting by the interplay between actions and observations: forgetting of pre-trained capabilities. Namely, a model deteriorates on the state subspace of the downstream task not visited in the initial phase of fine-tuning, on which the model behaved well due to pre-training. This way, we lose the anticipated transfer benefits. We identify conditions when this problem occurs, showing that it is common and, in many cases, catastrophic. Through a detailed empirical analysis of the challenging NetHack and Montezuma's Revenge environments, we show that standard knowledge retention techniques mitigate the problem and thus allow us to take full advantage of the pre-trained capabilities. In particular, in NetHack, we achieve a new state-of-the-art for neural models, improving the previous best score from 5K to over 10K points in the Human Monk scenario.

Submitted 12 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: ICML 2024

arXiv:2310.19537 [pdf, other]

On consequences of finetuning on data with highly discriminative features

Authors: Wojciech Masarczyk, Tomasz Trzciński, Mateusz Ostaszewski

Abstract: In the era of transfer learning, training neural networks from scratch is becoming obsolete. Transfer learning leverages prior knowledge for new tasks, conserving computational resources. While its advantages are well-documented, we uncover a notable drawback: networks tend to prioritize basic data patterns, forsaking valuable pre-learned features. We term this behavior "feature erosion" and analyze its impact on network performance and internal representations.

Submitted 15 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023 -- UniReps Workshop

arXiv:2306.11086 [pdf, other]

doi 10.1088/1367-2630/ad1b7f

Enhancing variational quantum state diagonalization using reinforcement learning techniques

Authors: Akash Kundu, Przemysław Bedełek, Mateusz Ostaszewski, Onur Danaci, Yash J. Patel, Vedran Dunjko, Jarosław A. Miszczak

Abstract: Variational quantum algorithms are crucial for the application of NISQ computers. Such algorithms require short quantum circuits, which are more amenable to implementation on near-term hardware, and many such methods have been developed. One of particular interest is the so-called variational quantum state diagonalization method, which constitutes an important algorithmic subroutine and can be used directly to work with data encoded in quantum states. In particular, it can be applied to discern the features of quantum states, such as entanglement properties of a system, or in quantum machine learning algorithms. In this work, we tackle the problem of designing a very shallow quantum circuit, required in the quantum state diagonalization task, by utilizing reinforcement learning (RL). We use a novel encoding method for the RL-state, a dense reward function, and an ε-greedy policy to achieve this. We demonstrate that the circuits proposed by the reinforcement learning methods are shallower than those of the standard variational quantum state diagonalization algorithm and thus can be used in situations where hardware capabilities limit the depth of quantum circuits. The methods we propose in the paper can be readily adapted to address a wide range of variational quantum algorithms.

Submitted 11 January, 2024; v1 submitted 19 June, 2023; originally announced June 2023.

Comments: 24 pages with 13 figures, accepted in the New Journal of Physics, code available at https://github.com/iitis/RL_for_VQSD_ansatz_optimization

Journal ref: New Journal of Physics, 26, 013034 (2024)

arXiv:2305.19753 [pdf, other]

The Tunnel Effect: Building Data Representations in Deep Neural Networks

Authors: Wojciech Masarczyk, Mateusz Ostaszewski, Ehsan Imani, Razvan Pascanu, Piotr Miłoś, Tomasz Trzciński

Abstract: Deep neural networks are widely known for their remarkable effectiveness across various tasks, with the consensus that deeper networks implicitly learn more complex data representations. This paper shows that sufficiently deep networks trained for supervised image classification split into two distinct parts that contribute to the resulting data representations differently. The initial layers create linearly-separable representations, while the subsequent layers, which we refer to as the tunnel, compress these representations and have a minimal impact on the overall performance. We explore the tunnel's behavior through comprehensive empirical studies, highlighting that it emerges early in the training process. Its depth depends on the relation between the network's capacity and task complexity. Furthermore, we show that the tunnel degrades out-of-distribution generalization and discuss its implications for continual learning.

Submitted 30 October, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023

arXiv:2211.15944 [pdf, other]

The Effectiveness of World Models for Continual Reinforcement Learning

Authors: Samuel Kessler, Mateusz Ostaszewski, Michał Bortkiewicz, Mateusz Żarski, Maciej Wołczyk, Jack Parker-Holder, Stephen J. Roberts, Piotr Miłoś

Abstract: World models power some of the most efficient reinforcement learning algorithms. In this work, we showcase that they can be harnessed for continual learning - a situation when the agent faces changing environments. World models typically employ a replay buffer for training, which can be naturally extended to continual learning. We systematically study how different selective experience replay methods affect performance, forgetting, and transfer. We also provide recommendations regarding various modeling options for using world models. The best set of choices is called Continual-Dreamer; it is task-agnostic and utilizes the world model for continual exploration. Continual-Dreamer is sample efficient and outperforms state-of-the-art task-agnostic continual reinforcement learning methods on Minigrid and Minihack benchmarks.

Submitted 12 July, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

Comments: Accepted at CoLLAs 2023, 21 pages, 15 figures

arXiv:2211.06351 [pdf, other]

Emergency action termination for immediate reaction in hierarchical reinforcement learning

Authors: Michał Bortkiewicz, Jakub Łyskawa, Paweł Wawrzyński, Mateusz Ostaszewski, Artur Grudkowski, Tomasz Trzciński

Abstract: Hierarchical decomposition of control is unavoidable in large dynamical systems. In reinforcement learning (RL), it is usually solved with subgoals defined at higher policy levels and achieved at lower policy levels. Reaching these goals can take a substantial amount of time, during which it is not verified whether they are still worth pursuing. However, due to the randomness of the environment, these goals may become obsolete. In this paper, we address this gap in the state-of-the-art approaches and propose a method in which the validity of higher-level actions (thus lower-level goals) is constantly verified at the higher level. If the actions, i.e., lower-level goals, become inadequate, they are replaced by more appropriate ones. This way we combine the advantages of hierarchical RL, which is fast training, and flat RL, which is immediate reactivity. We study our approach experimentally on seven benchmark environments.

Submitted 11 November, 2022; originally announced November 2022.

arXiv:2208.00156 [pdf, other]

Reinforcement learning with experience replay and adaptation of action dispersion

Authors: Paweł Wawrzyński, Wojciech Masarczyk, Mateusz Ostaszewski

Abstract: Effective reinforcement learning requires a proper balance of exploration and exploitation defined by the dispersion of action distribution. However, this balance depends on the task, the current stage of the learning process, and the current environment state. Existing methods that designate the action distribution dispersion require problem-dependent hyperparameters. In this paper, we propose to automatically designate the action distribution dispersion using the following principle: This distribution should have sufficient dispersion to enable the evaluation of future policies. To that end, the dispersion should be tuned to assure a sufficiently high probability (densities) of the actions in the replay buffer and the modes of the distributions that generated them, yet this dispersion should not be higher. This way, a policy can be effectively evaluated based on the actions in the buffer, but exploratory randomness in actions decreases when this policy converges. The above principle is verified here on challenging benchmarks Ant, HalfCheetah, Hopper, and Walker2D, with good results. Our method makes the action standard deviations converge to values similar to those resulting from trial-and-error optimization.

Submitted 30 July, 2022; originally announced August 2022.

ACM Class: I.2.6

arXiv:2103.16089 [pdf, other]

Reinforcement learning for optimization of variational quantum circuit architectures

Authors: Mateusz Ostaszewski, Lea M. Trenkwalder, Wojciech Masarczyk, Eleanor Scerri, Vedran Dunjko

Abstract: The study of Variational Quantum Eigensolvers (VQEs) has been in the spotlight in recent times as they may lead to real-world applications of near-term quantum devices. However, their performance depends on the structure of the used variational ansatz, which requires balancing the depth and expressivity of the corresponding circuit. In recent years, various methods for VQE structure optimization have been introduced, but the capacities of machine learning to aid with this problem have not yet been fully investigated. In this work, we propose a reinforcement learning algorithm that autonomously explores the space of possible ansätze, identifying economic circuits which still yield accurate ground energy estimates. The algorithm is intrinsically motivated, and it incrementally improves the accuracy of the result while minimizing the circuit depth. We showcase the performance of our algorithm on the problem of estimating the ground-state energy of lithium hydride (LiH). In this well-known benchmark problem, we achieve chemical accuracy, as well as state-of-the-art results in terms of circuit depth.

Submitted 30 March, 2021; originally announced March 2021.

arXiv:1909.05507 [pdf, other]

doi 10.3390/rs12162653

Effective training of deep convolutional neural networks for hyperspectral image classification through artificial labeling

Authors: Wojciech Masarczyk, Przemysław Głomb, Bartosz Grabowski, Mateusz Ostaszewski

Abstract: Hyperspectral imaging is a rich source of data, allowing for a multitude of effective applications. However, such imaging remains challenging because of the large data dimension and, typically, a small pool of available training examples. While deep learning approaches have been shown to be successful in providing effective classification solutions, especially for high dimensional problems, unfortunately they work best with a lot of labelled examples available. To alleviate the second requirement for a particular dataset, the transfer learning approach can be used: first the network is pre-trained on some dataset with a large amount of training labels available, then the actual dataset is used to fine-tune the network. This strategy is not straightforward to apply with hyperspectral images, as it is often the case that only one particular image of some type or characteristic is available. In this paper, we propose and investigate a simple and effective strategy of transfer learning that uses an unsupervised pre-training step without label information. This approach can be applied to many of the hyperspectral classification problems. Performed experiments show that it is very effective at improving the classification accuracy without being restricted to a particular image type or neural network architecture. The experiments were carried out on several deep neural network architectures and various sizes of labeled training sets.
The greatest improvement in overall accuracy on the Indian Pines and Pavia University datasets is over 21 and 13 percentage points, respectively. An additional advantage of the proposed approach is the unsupervised nature of the pre-training step, which can be done immediately after image acquisition, without the need for potentially costly expert time.

Submitted 22 October, 2020; v1 submitted 12 September, 2019; originally announced September 2019.

Journal ref: Remote Sens. 2020, 12, 2653

arXiv:1803.05193 [pdf, other]

doi 10.1007/s11128-019-2240-7

Approximation of quantum control correction scheme using deep neural networks

Authors: M. Ostaszewski, J. A. Miszczak, P. Sadowski, L. Banchi

Abstract: We study the functional relationship between quantum control pulses in the idealized case and the pulses in the presence of an unwanted drift. We show that a class of artificial neural networks called LSTM is able to model this functional relationship with high efficiency, and hence the correction scheme required to counterbalance the effect of the drift. Our solution allows studying the mapping from quantum control pulses to system dynamics and then analysing the robustness of the latter against local variations in the control profile.

Submitted 28 March, 2019; v1 submitted 14 March, 2018; originally announced March 2018.

Comments: 6 pages, 3 figures, Python code available upon request. arXiv admin note: text overlap with arXiv:1803.05169

Journal ref: Quantum Inf Process (2019), 18:126

arXiv:1803.05169 [pdf, other]

doi 10.1088/1751-8121/ab8244

Geometrical versus time-series representation of data in quantum control learning

Authors: M. Ostaszewski, J. A. Miszczak, P. Sadowski

Abstract: Recently machine learning techniques have become popular for analysing physical systems and solving problems occurring in quantum computing. In this paper we focus on using such techniques for finding the sequence of physical operations implementing the given quantum logical operation. In this context we analyse the flexibility of the data representation and compare the applicability of two machine learning approaches based on different representations of data. We demonstrate that the utilization of the geometrical structure of control pulses is sufficient for achieving high fidelity of the implemented evolution. We also demonstrate that artificial neural networks, unlike geometrical methods, possess the generalization abilities enabling them to generate control pulses for the systems with variable strength of the disturbance. The presented results suggest that in some quantum control scenarios, geometrical data representation and processing is competitive with more complex methods.

Submitted 29 April, 2020; v1 submitted 14 March, 2018; originally announced March 2018.

Comments: 12 pages, 14 figures, Python code available upon request

Journal ref: Journal of Physics A: Mathematical and Theoretical, Volume 53, Number 19 (2020)

arXiv:1512.02802 [pdf, other]

doi 10.1088/1751-8113/49/37/375302

Lively quantum walks on cycles

Authors: Przemysław Sadowski, Jarosław Adam Miszczak, Mateusz Ostaszewski

Abstract: We introduce a family of quantum walks on cycles parametrized by their liveliness, defined by the ability to execute a long-range move. We investigate the behaviour of the probability distribution and time-averaged probability distribution. We show that the liveliness parameter, controlling the magnitude of the additional long-range move, has a direct impact on the periodicity of the limiting distribution. We also show that the introduced model provides a method for network exploration which is robust against trapping.

Submitted 8 February, 2017; v1 submitted 9 December, 2015; originally announced December 2015.

Comments: 13 pages

MSC Class: 81P45; 94A05; 05C81 ACM Class: C.2.1; I.6.5

Journal ref: J. Phys. A: Math. Theor. 49 375302 (2016)

arXiv:1504.00580 [pdf, ps, other]

doi 10.20904/271001

Quantum image classification using principal component analysis

Authors: Mateusz Ostaszewski, Przemysław Sadowski, Piotr Gawron

Abstract: We present a novel quantum algorithm for classification of images. The algorithm is constructed using principal component analysis and von Neumann quantum measurements. In order to apply the algorithm we present a new quantum representation of grayscale images.

Submitted 2 April, 2015; originally announced April 2015.

Comments: 9 pages

Journal ref: Theoretical and Applied Informatics, Vol. 27, No. 1, pp. 1-12 (2015)

Showing 1–17 of 17 results for author: Ostaszewski, M