Search | arXiv e-print repository

Demystifying the Recency Heuristic in Temporal-Difference Learning

Authors: Brett Daley, Marlos C. Machado, Martha White

Abstract: The recency heuristic in reinforcement learning is the assumption that stimuli that occurred closer in time to an acquired reward should be more heavily reinforced. The recency heuristic is one of the key assumptions made by TD($λ$), which reinforces recent experiences according to an exponentially decaying weighting. In fact, all other widely used return estimators for TD learning, such as $n$-step returns, satisfy a weaker (i.e., non-monotonic) recency heuristic. Why is the recency heuristic effective for temporal credit assignment? What happens when credit is assigned in a way that violates this heuristic? In this paper, we analyze the specific mathematical implications of adopting the recency heuristic in TD learning. We prove that any return estimator satisfying this heuristic: 1) is guaranteed to converge to the correct value function, 2) has a relatively fast contraction rate, and 3) has a long window of effective credit assignment, yet bounded worst-case variance. We also give a counterexample where on-policy, tabular TD methods violating the recency heuristic diverge. Our results offer some of the first theoretical evidence that credit assignment based on the recency heuristic facilitates learning.

Submitted 18 June, 2024; originally announced June 2024.

Comments: RLC 2024. 18 pages, 8 figures, 1 table

arXiv:2406.06811 [pdf, other]

Learning Continually by Spectral Regularization

Authors: Alex Lewandowski, Saurabh Kumar, Dale Schuurmans, András György, Marlos C. Machado

Abstract: Loss of plasticity is a phenomenon where neural networks become more difficult to train during the course of learning. Continual learning algorithms seek to mitigate this effect by sustaining good predictive performance while maintaining network trainability. We develop new techniques for improving continual learning by first reconsidering how initialization can ensure trainability during early phases of learning. From this perspective, we derive new regularization strategies for continual learning that ensure beneficial initialization properties are better maintained throughout training. In particular, we investigate two new regularization techniques for continual learning: (i) Wasserstein regularization toward the initial weight distribution, which is less restrictive than regularizing toward initial weights; and (ii) regularizing weight matrix singular values, which directly ensures gradient diversity is maintained throughout training. We present an experimental analysis that shows these alternative regularizers can improve continual learning performance across a range of supervised learning tasks and model architectures. The alternative regularizers prove to be less sensitive to hyperparameters while demonstrating better training in individual tasks, sustaining trainability as new tasks arrive, and achieving better generalization performance.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2405.01712 [pdf, other]

Multiplicity dependence of the $p_T$-spectra for identified particles and its relationship with partonic entropy

Authors: L. S. Moriggi, G. S. Ramos, M. V. T. Machado

Abstract: We investigate the multiplicity dependence of the transverse momentum $p_T$ spectra of hadrons produced in high-energy collisions. We propose that the partonic distribution be parameterized by its non-extensive entropy and the parton saturation scale $Q_s(x)$. These two variables can be identified from the produced charged hadron distributions and provide important information on the gluon dynamics at the moment of interaction. From this perspective we interpret data from different ALICE multiplicity classes at $\sqrt{s}= 13$ TeV and $\sqrt{s}= 5.02$ TeV. A multiplicity dependent scaling function is presented and the dependence of the interaction area on multiplicity is also investigated.

Submitted 2 May, 2024; originally announced May 2024.

Comments: 10 pages, 6 figures

arXiv:2404.15410 [pdf, ps, other]

Planning the path with Reinforcement Learning: Optimal Robot Motion Planning in RoboCup Small Size League Environments

Authors: Mateus G. Machado, João G. Melo, Cleber Zanchettin, Pedro H. M. Braga, Pedro V. Cunha, Edna N. S. Barros, Hansenclever F. Bassani

Abstract: This work investigates the potential of Reinforcement Learning (RL) to tackle robot motion planning challenges in the dynamic RoboCup Small Size League (SSL). Using a heuristic control approach, we evaluate RL's effectiveness in obstacle-free and single-obstacle path-planning environments. Ablation studies reveal significant performance improvements. Our method achieved a 60% time gain in obstacle-free environments compared to baseline algorithms. Additionally, our findings demonstrated dynamic obstacle avoidance capabilities, adeptly navigating around moving blocks. These findings highlight the potential of RL to enhance robot motion planning in the challenging and unpredictable SSL environment.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 12 pages, 3 figures, 3 tables

arXiv:2403.10304 [pdf, other]

KIF: A Framework for Virtual Integration of Heterogeneous Knowledge Bases using Wikidata

Authors: Guilherme Lima, Marcelo Machado, Elton Soares, Sandro R. Fiorini, Raphael Thiago, Leonardo G. Azevedo, Viviane T. da Silva, Renato Cerqueira

Abstract: We present a knowledge integration framework (called KIF) that uses Wikidata as a lingua franca to integrate heterogeneous knowledge bases. These can be triplestores, relational databases, CSV files, etc., which may or may not use the Wikidata dialect of RDF. KIF leverages Wikidata's data model and vocabulary plus user-defined mappings to expose a unified view of the integrated bases while keeping track of the context and provenance of their statements. The result is a virtual knowledge base which behaves like an "extended Wikidata" and which can be queried either through an efficient filter interface or using SPARQL. We present the design and implementation of KIF, discuss how we have used it to solve a real integration problem in the domain of chemistry (involving Wikidata, PubChem, and IBM CIRCA), and present experimental results on the performance and overhead of KIF.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2402.12458 [pdf, other]

Testing the double-logarithm asymptotic gluon density in ultraperipheral heavy ion collisions at the Large Hadron Collider

Authors: D. A. Fagundes, M. V. T. Machado

Abstract: In this work we analyze the application of the analytical gluon distribution based on the double asymptotic scaling for the photoproduction of vector mesons in coherent $pp$, $pA$ and $AA$ collisions at LHC energies using the color dipole formalism. Predictions for the rapidity distribution are presented for $ρ^0$, $J/ψ$, $ψ(2S)$ and $Υ(1S)$ photoproduction. An analysis of the uncertainties associated with different implementations of the dipole-proton amplitude is performed. The vector meson photoproduction accompanied by electromagnetic dissociation is also analyzed.

Submitted 19 February, 2024; originally announced February 2024.

Comments: 12 pages, 12 figures, 2 tables

arXiv:2402.06619 [pdf, other]

Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning

Authors: Shivalika Singh, Freddie Vargus, Daniel Dsouza, Börje F. Karlsson, Abinaya Mahendiran, Wei-Yin Ko, Herumb Shandilya, Jay Patel, Deividas Mataciunas, Laura OMahony, Mike Zhang, Ramith Hettiarachchi, Joseph Wilson, Marina Machado, Luisa Souza Moura, Dominik Krzemiński, Hakimeh Fadaei, Irem Ergün, Ifeoma Okoh, Aisha Alaagib, Oshan Mudannayake, Zaid Alyafeai, Vu Minh Chien, Sebastian Ruder, Surya Guthikonda, et al. (8 additional authors not shown)

Abstract: Datasets are foundational to many breakthroughs in modern artificial intelligence. Many recent achievements in the space of natural language processing (NLP) can be attributed to the finetuning of pre-trained models on a diverse set of tasks that enables a large language model (LLM) to respond to instructions. Instruction fine-tuning (IFT) requires specifically constructed and annotated datasets. However, existing datasets are almost all in the English language. In this work, our primary goal is to bridge the language gap by building a human-curated instruction-following dataset spanning 65 languages. We worked with fluent speakers of languages from around the world to collect natural instances of instructions and completions. Furthermore, we create the most extensive multilingual collection to date, comprising 513 million instances through templating and translating existing datasets across 114 languages. In total, we contribute four key resources: we develop and open-source the Aya Annotation Platform, the Aya Dataset, the Aya Collection, and the Aya Evaluation Suite. The Aya initiative also serves as a valuable case study in participatory research, involving collaborators from 119 countries. We see this as a valuable framework for future research collaborations that aim to bridge gaps in resources.

Submitted 9 February, 2024; originally announced February 2024.

arXiv:2402.03903 [pdf, other]

Averaging $n$-step Returns Reduces Variance in Reinforcement Learning

Authors: Brett Daley, Martha White, Marlos C. Machado

Abstract: Multistep returns, such as $n$-step returns and $λ$-returns, are commonly used to improve the sample efficiency of reinforcement learning (RL) methods. The variance of the multistep returns becomes the limiting factor in their length; looking too far into the future increases variance and reverses the benefits of multistep learning. In our work, we demonstrate the ability of compound returns -- weighted averages of $n$-step returns -- to reduce variance. We prove for the first time that any compound return with the same contraction modulus as a given $n$-step return has strictly lower variance. We additionally prove that this variance-reduction property improves the finite-sample complexity of temporal-difference learning under linear function approximation. Because general compound returns can be expensive to implement, we introduce two-bootstrap returns which reduce variance while remaining efficient, even when using minibatched experience replay. We conduct experiments showing that compound returns often increase the sample efficiency of $n$-step deep RL agents like DQN and PPO.

Submitted 5 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: ICML 2024. 27 pages, 7 figures, 3 tables

arXiv:2312.01624 [pdf, other]

GVFs in the Real World: Making Predictions Online for Water Treatment

Authors: Muhammad Kamran Janjua, Haseeb Shah, Martha White, Erfan Miahi, Marlos C. Machado, Adam White

Abstract: In this paper we investigate the use of reinforcement-learning based prediction approaches for a real drinking-water treatment plant. Developing such a prediction system is a critical step on the path to optimizing and automating water treatment. Before that, there are many questions to answer about the predictability of the data, suitable neural network architectures, how to overcome partial observability and more. We first describe this dataset, and highlight challenges with seasonality, nonstationarity, partial observability, and heterogeneity across sensors and operation modes of the plant. We then describe General Value Function (GVF) predictions -- discounted cumulative sums of observations -- and highlight why they might be preferable to classical n-step predictions common in time series prediction. We discuss how to use offline data to appropriately pre-train our temporal difference learning (TD) agents that learn these GVF predictions, including how to select hyperparameters for online fine-tuning in deployment. We find that the TD-prediction agent obtains an overall lower normalized mean-squared error than the n-step prediction agent. Finally, we show the importance of learning in deployment, by comparing a TD agent trained purely offline with no online updating to a TD agent that learns online. This final result is one of the first to motivate the importance of adapting predictions in real-time, for non-stationary high-volume systems in the real world.

Submitted 3 December, 2023; originally announced December 2023.

Comments: Published in Machine Learning (2023)

Journal ref: Machine Learning (2023): 1-31

arXiv:2312.01203 [pdf, other]

Harnessing Discrete Representations For Continual Reinforcement Learning

Authors: Edan Meyer, Adam White, Marlos C. Machado

Abstract: Reinforcement learning (RL) agents make decisions using nothing but observations from the environment, and consequently, heavily rely on the representations of those observations. Though some recent breakthroughs have used vector-based categorical representations of observations, often referred to as discrete representations, there is little work explicitly assessing the significance of such a choice. In this work, we provide a thorough empirical investigation of the advantages of representing observations as vectors of categorical values within the context of reinforcement learning. We perform evaluations on world-model learning, model-free RL, and ultimately continual RL problems, where the benefits best align with the needs of the problem setting. We find that, when compared to traditional continuous representations, world models learned over discrete representations accurately model more of the world with less capacity, and that agents trained with discrete representations learn better policies with less data. In the context of continual RL, these benefits translate into faster adapting agents. Additionally, our analysis suggests that the observed performance improvements can be attributed to the information contained within the latent vectors and potentially the encoding of the discrete representation itself.

Submitted 5 December, 2023; v1 submitted 2 December, 2023; originally announced December 2023.

Comments: 23 pages, 16 figures, submitted to ICLR 2024

arXiv:2312.00246 [pdf, other]

Directions of Curvature as an Explanation for Loss of Plasticity

Authors: Alex Lewandowski, Haruto Tanaka, Dale Schuurmans, Marlos C. Machado

Abstract: Loss of plasticity is a phenomenon in which neural networks lose their ability to learn from new experience. Despite being empirically observed in several problem settings, little is understood about the mechanisms that lead to loss of plasticity. In this paper, we offer a consistent explanation for loss of plasticity: neural networks lose directions of curvature during training, and loss of plasticity can be attributed to this reduction in curvature. To support such a claim, we provide a systematic investigation of loss of plasticity across continual learning tasks using MNIST, CIFAR-10 and ImageNet. Our findings illustrate that loss of curvature directions coincides with loss of plasticity, while also showing that previous explanations are insufficient to explain loss of plasticity in all settings. Lastly, we show that regularizers which mitigate loss of plasticity also preserve curvature, motivating a simple distributional regularizer that proves to be effective across the problem settings we considered.

Submitted 27 June, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

arXiv:2310.15719 [pdf, other]

Recurrent Linear Transformers

Authors: Subhojeet Pramanik, Esraa Elelimy, Marlos C. Machado, Adam White

Abstract: The self-attention mechanism in the transformer architecture is capable of capturing long-range dependencies and it is the main reason behind its effectiveness in processing sequential data. Nevertheless, despite their success, transformers have two significant drawbacks that still limit their broader applicability: (1) In order to remember past information, the self-attention mechanism requires access to the whole history to be provided as context. (2) The inference cost in transformers is expensive. In this paper we introduce recurrent alternatives to the transformer self-attention mechanism that offer a context-independent inference cost, leverage long-range dependencies effectively, and perform well in practice. We evaluate our approaches in reinforcement learning problems where the aforementioned computational limitations make the application of transformers nearly infeasible. We quantify the impact of the different components of our architecture in a diagnostic environment and assess performance gains in 2D and 3D pixel-based partially-observable environments. When compared to a state-of-the-art architecture, GTrXL, inference in our approach is at least 40% cheaper while reducing memory use by more than 50%. Our approach performs similarly to or better than GTrXL, improving upon GTrXL performance by more than 37% on harder tasks.

Submitted 24 October, 2023; originally announced October 2023.

Comments: transformers, reinforcement learning, partial observability

arXiv:2310.10833 [pdf, other]

Proper Laplacian Representation Learning

Authors: Diego Gomez, Michael Bowling, Marlos C. Machado

Abstract: The ability to learn good representations of states is essential for solving large reinforcement learning problems, where exploration, generalization, and transfer are particularly challenging. The Laplacian representation is a promising approach to address these problems by inducing informative state encoding and intrinsic rewards for temporally-extended action discovery and reward shaping. To obtain the Laplacian representation one needs to compute the eigensystem of the graph Laplacian, which is often approximated through optimization objectives compatible with deep learning approaches. These approximations, however, depend on hyperparameters that are impossible to tune efficiently, converge to arbitrary rotations of the desired eigenvectors, and are unable to accurately recover the corresponding eigenvalues. In this paper we introduce a theoretically sound objective and corresponding optimization algorithm for approximating the Laplacian representation. Our approach naturally recovers both the true eigenvectors and eigenvalues while eliminating the hyperparameter dependence of previous approximations. We provide theoretical guarantees for our method and we show that those results translate empirically into robust learning across multiple environments.

Submitted 3 April, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.04252 [pdf, ps, other]

Scaling limit of an equilibrium surface under the Random Average Process

Authors: Luiz Renato Fontes, Mariela Pentón Machado, Leonel Zuaznábar

Abstract: We consider the equilibrium surface of the Random Average Process started from an inclined plane, as seen from the height of the origin, obtained in [Ferrari & Fontes, 1998], where its fluctuations were shown to be of order of the square root of the distance to the origin in one dimension, and the square root of the log of that distance in two dimensions (and constant in higher dimensions). Remarkably, even if not pointed out explicitly in [Ferrari & Fontes, 1998], the correlation structure of those fluctuations is given in terms of the Green's function of a certain random walk, and thus corresponds to those of Discrete Gaussian Free Fields. In the present paper we obtain the scaling limit of those fluctuations in one and two dimensions, in terms of Gaussian processes, in the sense of finite dimensional distributions. In one dimension, the limit is given by Brownian Motion; in two dimensions, we get a process with a discontinuous covariance function.

Submitted 6 October, 2023; originally announced October 2023.

Comments: 23 pages

MSC Class: 60K35; 82C41

arXiv:2309.15888 [pdf]

Explainable machine learning identifies multi-omics signatures of muscle response to spaceflight in mice

Authors: Kevin Li, Riya Desai, Ryan T. Scott, Joel Ricky Steele, Meera Machado, Samuel Demharter, Adrienne Hoarfrost, Jessica L. Braun, Val A. Fajardo, Lauren M. Sanders, Sylvain V. Costes

Abstract: The adverse effects of microgravity exposure on mammalian physiology during spaceflight necessitate a deep understanding of the underlying mechanisms to develop effective countermeasures. One such concern is muscle atrophy, which is partly attributed to the dysregulation of calcium levels due to abnormalities in SERCA pump functioning. To identify potential biomarkers for this condition, multi-omics data and physiological data available on the NASA Open Science Data Repository (osdr.nasa.gov) were used, and machine learning methods were employed. Specifically, we used multi-omics (transcriptomic, proteomic, and DNA methylation) data and calcium reuptake data collected from C57BL/6J mouse soleus and tibialis anterior tissues during several 30+ day-long missions on the International Space Station. The QLattice symbolic regression algorithm was introduced to generate highly explainable models that predict either experimental conditions or calcium reuptake levels based on multi-omics features. The list of candidate models established by QLattice was used to identify key features contributing to the predictive capability of these models, with Acyp1 and Rps7 proteins found to be the most predictive biomarkers related to the resilience of the tibialis anterior muscle in space. These findings could serve as targets for future interventions aiming to reduce the extent of muscle atrophy during space travel.

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.07686 [pdf, ps, other]

Double charmed meson production in $pp$ and $pA$ collisions at the LHC within the dipole approach in momentum representation

Authors: G. Sampaio dos Santos, G. Gil da Silveira, M. V. T. Machado

Abstract: A study of double charmed meson production in proton-proton and proton-nucleus collisions at LHC energies is performed. Based on the color dipole formalism developed in the transverse momentum representation and the double parton scattering mechanism, predictions are made for the transverse momentum differential cross section for different pairs of $D$-mesons. The theoretical results consider the center-of-mass energy and forward rapidities associated with the measurements by the LHCb Collaboration. The results considering different unintegrated gluon distributions are presented and compared to data, and predictions for proton-nucleus collisions are provided.

Submitted 14 September, 2023; originally announced September 2023.

Comments: 20 pages, 4 figures

arXiv:2308.10753 [pdf, other]

The Total Variation-Wasserstein Problem

Authors: Antonin Chambolle, Vincent Duval, Joao Miguel Machado

Abstract: In this work we analyze the Total Variation-Wasserstein minimization problem. We propose an alternative form of deriving optimality conditions from the approach of Calier & Poon '18, and as a result obtain further regularity for the quantities involved. In the sequel we propose an algorithm to solve this problem alongside two numerical experiments.

Submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.00181 [pdf, other]

Study of the azimuthal asymmetry in heavy ion collisions combining initial state momentum orientation and final state collective effects

Authors: Lucas Soster Moriggi, Érison dos Santos Rocha, Magno Valério Trindade Machado

Abstract: In the present work we investigate the source of azimuthal asymmetry in nuclear collisions using a model that includes particles produced in the initial hard collisions and the collective effects described by a Blast-Wave like expansion. The latter is described by the relaxation time approximation of the Boltzmann transport equation. The parameters regarding collective flow and asymmetry are fitted to the experimental data on the $p_T$ spectrum and $v_2$ for PbPb and XeXe collisions at different centrality classes. As a by-product, the ratio of final elliptic flow to the initial anisotropy, $v_2/ε_2$, and the average transverse momentum are predicted.

Submitted 22 September, 2023; v1 submitted 31 July, 2023; originally announced August 2023.

Comments: Version to be published in Physical Review D

arXiv:2307.10180 [pdf]

Liquidus temperature nonlinear modeling of silicates $SiO_2-R_2O-RO$

Authors: Patrick dos Anjos, Lucas A. Quaresma, Marcelo L. P. Machado

Abstract: The liquidus temperature is an important parameter in understanding the crystalline behavior of materials and in the operation of blast furnaces. Its modeling can be carried out by linear and nonlinear methods through data, considering the artificial neural network a modeling method with high efficiency because it presents the theorem of universal approximation and with that better performances and possibility of greater oscillations. The best linear model and the best nonlinear model were modeled by structural parameters and presented a good numerical approximation, thus demonstrating that mathematical modeling can be performed using structural arguments and also showing a dimensionality reduction method for modeling a thermophysical property of the materials.

Submitted 21 July, 2023; v1 submitted 19 June, 2023; originally announced July 2023.

Comments: 11 pages, 8 figures, 3 tables

arXiv:2306.10572 [pdf, ps, other]

doi 10.26421/QIC24.3-4-4

Quantum Algorithms for the Shortest Common Superstring and Text Assembling Problems

Authors: Kamil Khadiev, Carlos Manuel Bosch Machado, Zeyu Chen, Junde Wu

Abstract: In this paper, we consider two versions of the Text Assembling problem. We are given a sequence of strings $s^1,\dots,s^n$ of total length $L$ that is a dictionary, and a string $t$ of length $m$ that is a text. The first version of the problem is assembling $t$ from the dictionary. The second version is the ``Shortest Superstring Problem'' (SSP) or the ``Shortest Common Superstring Problem'' (SCS). In this case, $t$ is not given, and we should construct the shortest string (we call it superstring) that contains each string from the given sequence as a substring. These problems are connected with the sequence assembly method for reconstructing a long DNA sequence from small fragments. For both problems, we suggest new quantum algorithms that work better than their classical counterparts. In the first case, we present a quantum algorithm with $O(m+\log m\sqrt{nL})$ running time. In the case of SSP, we present a quantum algorithm with running time $O(n^3 1.728^n +L +\sqrt{L}n^{1.5}+\sqrt{L}n\log^2L\log^2n)$.

Submitted 31 December, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

Comments: arXiv admin note: text overlap with arXiv:2112.13319

Journal ref: In: Quantum Information & Computation, Vol. 24, 2024, pp. 0267-0294

arXiv:2305.14572 [pdf, ps, other]

The case for an EIC Theory Alliance: Theoretical Challenges of the EIC

Authors: Raktim Abir, Igor Akushevich, Tolga Altinoluk, Daniele Paolo Anderle, Fatma P. Aslan, Alessandro Bacchetta, Baha Balantekin, Joao Barata, Marco Battaglieri, Carlos A. Bertulani, Guillaume Beuf, Chiara Bissolotti, Daniël Boer, M. Boglione, Radja Boughezal, Eric Braaten, Nora Brambilla, Vladimir Braun, Duane Byer, Francesco Giovanni Celiberto, Yang-Ting Chien, Ian C. Cloët, Martha Constantinou, Wim Cosyn, Aurore Courtoy , et al. (146 additional authors not shown)

Abstract: We outline the physics opportunities provided by the Electron Ion Collider (EIC). These include the study of the parton structure of the nucleon and nuclei, the onset of gluon saturation, the production of jets and heavy flavor, hadron spectroscopy and tests of fundamental symmetries. We review the present status and future challenges in EIC theory that have to be addressed in order to realize this ambitious and impactful physics program, including how to engage a diverse and inclusive workforce. In order to address these many-fold challenges, we propose that a coordinated effort involving theory groups with differing expertise is needed. We discuss the scientific goals and scope of such an EIC Theory Alliance.

Submitted 23 May, 2023; originally announced May 2023.

Comments: 44 pages, ReVTeX, White Paper on EIC Theory Alliance

arXiv:2305.13519 [pdf]

Development of Non-Linear Equations for Predicting Electrical Conductivity in Silicates

Authors: Patrick dos Anjos, Lucas A. Quaresma, Marcelo L. P. Machado

Abstract: Electrical conductivity is of fundamental importance in electric arc furnaces (EAF) and the interaction of this phenomenon with the process slag results in energy losses and low optimization. Mathematical modeling helps in understanding the behavior of phenomena, and it was used to predict the electrical conductivity of EAF slags through artificial neural networks. The best artificial neural network had 100 neurons in the hidden layer, with 6 predictor variables and the predicted variable, electrical conductivity. Mean absolute error and standard deviation of absolute error were calculated, and sensitivity analysis was performed to correlate the effect of each predictor variable with the predicted variable.

Submitted 28 May, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: 8 pages, 6 figures, 1 table (AISTech 2023 - Presented and Accepted)

arXiv:2304.14781 [pdf, other]

1D approximation of measures in Wasserstein spaces

Authors: Antonin Chambolle, Vincent Duval, Joao Miguel Machado

Abstract: We propose a variational approach to approximate measures with measures uniformly distributed over a 1-dimensional set. The problem consists in minimizing a Wasserstein distance as a data term with a regularization given by the length of the support. As it is challenging to prove existence of solutions to this problem, we propose a relaxed formulation, which always admits a solution. In the sequel we show that if the ambient space is $\mathbb{R}^2$, under technical assumptions, any solution to the relaxed problem is a solution to the original one. Finally we manage to prove that any optimal solution to the relaxed problem, and hence also to the original, is Ahlfors regular.

Submitted 26 June, 2024; v1 submitted 28 April, 2023; originally announced April 2023.

arXiv:2304.01117 [pdf, other]

Interpretable Symbolic Regression for Data Science: Analysis of the 2022 Competition

Authors: F. O. de Franca, M. Virgolin, M. Kommenda, M. S. Majumder, M. Cranmer, G. Espada, L. Ingelse, A. Fonseca, M. Landajuela, B. Petersen, R. Glatt, N. Mundhenk, C. S. Lee, J. D. Hochhalter, D. L. Randall, P. Kamienny, H. Zhang, G. Dick, A. Simon, B. Burlacu, Jaan Kasak, Meera Machado, Casper Wilstrup, W. G. La Cava

Abstract: Symbolic regression searches for analytic expressions that accurately describe studied phenomena. The main attraction of this approach is that it returns an interpretable model that can be insightful to users. Historically, the majority of algorithms for symbolic regression have been based on evolutionary algorithms. However, there has been a recent surge of new proposals that instead utilize approaches such as enumeration algorithms, mixed linear integer programming, neural networks, and Bayesian optimization. In order to assess how well these new approaches behave on a set of common challenges often faced in real-world data, we hosted a competition at the 2022 Genetic and Evolutionary Computation Conference consisting of different synthetic and real-world datasets which were blind to entrants. For the real-world track, we assessed interpretability in a realistic way by using a domain expert to judge the trustworthiness of candidate models. We present an in-depth analysis of the results obtained in this competition, discuss current challenges of symbolic regression algorithms and highlight possible improvements for future competitions.

Submitted 3 July, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

Comments: 13 pages, 13 figures, submitted to IEEE Transactions on Evolutionary Computation

arXiv:2303.07507 [pdf, other]

Loss of Plasticity in Continual Deep Reinforcement Learning

Authors: Zaheer Abbas, Rosie Zhao, Joseph Modayil, Adam White, Marlos C. Machado

Abstract: The ability to learn continually is essential in a complex and changing world. In this paper, we characterize the behavior of canonical value-based deep reinforcement learning (RL) approaches under varying degrees of non-stationarity. In particular, we demonstrate that deep RL agents lose their ability to learn good policies when they cycle through a sequence of Atari 2600 games. This phenomenon is alluded to in prior work under various guises -- e.g., loss of plasticity, implicit under-parameterization, primacy bias, and capacity loss. We investigate this phenomenon closely at scale and analyze how the weights, gradients, and activations change over time in several experiments with varying dimensions (e.g., similarity between games, number of games, number of frames per game), with some experiments spanning 50 days and 2 billion environment interactions. Our analysis shows that the activation footprint of the network becomes sparser, contributing to the diminishing gradients. We investigate a remarkably simple mitigation strategy -- Concatenated ReLUs (CReLUs) activation function -- and demonstrate its effectiveness in facilitating continual learning in a changing environment.

Submitted 13 March, 2023; originally announced March 2023.

arXiv:2302.07873 [pdf, other]

Separating Technological and Clinical Safety Assurance for Medical Devices

Authors: Spencer Deevy, Tiago de Moraes Machado, Amen Modhafar, Wesley O'Beirne, Richard Paige, Alan Wassyng

Abstract: The safety and clinical effectiveness of medical devices are closely associated with their specific use in clinical treatments. Assuring safety and the desired clinical effectiveness is challenging. Different people may react differently to the same treatment due to variability in their physiology and genetics. Thus, we need to consider the outputs and behaviour of the device itself as well as the effect of using the device to treat a wide variety of patients. High-intensity focused ultrasound systems and radiation therapy machines are examples of systems in which this is a primary concern. Conventional monolithic assurance cases are complex, and this complexity affects our ability to address these concerns adequately. Based on the principle of separation of concerns, we propose separating the assurance of the use of these types of systems in clinical treatments into two linked assurance cases. The first assurance case demonstrates the safety of the manufacturer's device independent of the clinical treatment. The second demonstrates the safety and clinical effectiveness of the device when it is used in a specific clinical treatment. We introduce the idea of these separate assurance cases, and describe briefly how they are separated and linked.

Submitted 15 February, 2023; originally announced February 2023.

arXiv:2301.11321 [pdf, other]

Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning

Authors: Brett Daley, Martha White, Christopher Amato, Marlos C. Machado

Abstract: Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning, but counteracting off-policy bias without exacerbating variance is challenging. Classically, off-policy bias is corrected in a per-decision manner: past temporal-difference errors are re-weighted by the instantaneous Importance Sampling (IS) ratio after each action via eligibility traces. Many off-policy algorithms rely on this mechanism, along with differing protocols for cutting the IS ratios to combat the variance of the IS estimator. Unfortunately, once a trace has been fully cut, the effect cannot be reversed. This has led to the development of credit-assignment strategies that account for multiple past experiences at a time. These trajectory-aware methods have not been extensively analyzed, and their theoretical justification remains uncertain. In this paper, we propose a multistep operator that can express both per-decision and trajectory-aware methods. We prove convergence conditions for our operator in the tabular setting, establishing the first guarantees for several existing methods as well as many new ones. Finally, we introduce Recency-Bounded Importance Sampling (RBIS), which leverages trajectory awareness to perform robustly across $λ$-values in an off-policy control task.

Submitted 31 May, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

Comments: ICML 2023. 8 pages, 2 figures. arXiv admin note: text overlap with arXiv:2112.12281

arXiv:2301.11181 [pdf, other]

Deep Laplacian-based Options for Temporally-Extended Exploration

Authors: Martin Klissarov, Marlos C. Machado

Abstract: Selecting exploratory actions that generate a rich stream of experience for better learning is a fundamental challenge in reinforcement learning (RL). An approach to tackle this problem consists in selecting actions according to specific policies for an extended period of time, also known as options. A recent line of work to derive such exploratory options builds upon the eigenfunctions of the graph Laplacian. Importantly, until now these methods have been mostly limited to tabular domains where (1) the graph Laplacian matrix was either given or could be fully estimated, (2) performing eigendecomposition on this matrix was computationally tractable, and (3) value functions could be learned exactly. Additionally, these methods required a separate option discovery phase. These assumptions are fundamentally not scalable. In this paper we address these limitations and show how recent results for directly approximating the eigenfunctions of the Laplacian can be leveraged to truly scale up options-based exploration. To do so, we introduce a fully online deep RL algorithm for discovering Laplacian-based options and evaluate our approach on a variety of pixel-based tasks. We compare to several state-of-the-art exploration methods and show that our approach is effective, general, and especially promising in non-stationary settings.

Submitted 9 June, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

arXiv:2301.05136 [pdf, other]

doi 10.1002/asna.20220117

Light vector meson photoproduction in ultraperipheral heavy ion collisions at the LHC within the Reggeometric Pomeron approach

Authors: László Jenkovszky, Érison S. Rocha, Magno V. T. Machado

Abstract: By using the Reggeometric Pomeron model for vector meson production which successfully describes the high energy lepton-nucleon data, we analyse the light meson production in ultra-peripheral heavy ion collisions at the Large Hadron Collider (LHC). The rapidity distributions for $ρ$ and $φ$ photoproduction in lead-lead, xenon-xenon and oxygen-oxygen collisions are investigated.

Submitted 12 January, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

Comments: Proceedings IWARA 2022

arXiv:2211.11533 [pdf]

Linear Modeling of the Glass Transition Temperature of the system $SiO_2-Na_2O-CaO$

Authors: Patrick dos Anjos, Lucas A. Quaresma, Marcelo L. P. Machado

Abstract: This work aimed to mathematically model the glass transition temperature (Tg), one of the most important parameters regarding the behavior of slag, responsible for the sudden change in thermomechanical properties of non-crystalline materials, by the chemical composition of the SiO2-Na2O-CaO system, widely applicable in the production of glasses and constituent of iron, magnesium and aluminum metallurgy slags. The SciGlass database was used to provide data for mathematical modeling through the Python programming language, using the method of least squares. A new equation was established, called P Model, and it presented a lower mean absolute error and lower standard deviation of absolute errors in relation to 3 equations in the literature. The raised equation provides significant results in the mathematical modeling of Tg by the chemical system SiO2-Na2O-CaO, valid for the limits of the data used in the mathematical modeling.

Submitted 21 July, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

Comments: 5 pages, 3 figures, 2 tables (CONTECC 2022 - Accepted and presented)

arXiv:2211.07805 [pdf, other]

Agent-State Construction with Auxiliary Inputs

Authors: Ruo Yu Tao, Adam White, Marlos C. Machado

Abstract: In many, if not every, realistic sequential decision-making task, the decision-making agent is not able to model the full complexity of the world. The environment is often much larger and more complex than the agent, a setting also known as partial observability. In such settings, the agent must leverage more than just the current sensory inputs; it must construct an agent state that summarizes previous interactions with the world. Currently, a popular approach for tackling this problem is to learn the agent-state function via a recurrent network from the agent's sensory stream as input. Many impressive reinforcement learning applications have instead relied on environment-specific functions to aid the agent's inputs for history summarization. These augmentations are done in multiple ways, from simple approaches like concatenating observations to more complex ones such as uncertainty estimates. Although ubiquitous in the field, these additional inputs, which we term auxiliary inputs, are rarely emphasized, and it is not clear what their role or impact is. In this work we explore this idea further, and relate these auxiliary inputs to prior classic approaches to state construction. We present a series of examples illustrating the different ways of using auxiliary inputs for reinforcement learning. We show that these auxiliary inputs can be used to discriminate between observations that would otherwise be aliased, leading to more expressive features that smoothly interpolate between different states. Finally, we show that this approach is complementary to state-of-the-art methods such as recurrent neural networks and truncated back-propagation through time, and acts as a heuristic that facilitates longer temporal credit assignment, leading to better performance.

Submitted 5 May, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

Comments: Published in Transactions on Machine Learning Research. 13 pages + 2 references + 15 appendix, 12 figures

arXiv:2211.07587 [pdf]

Artificial neural networks for predicting the viscosity of lead-containing glasses

Authors: Patrick dos Anjos, Lucas A. Quaresma, Marcelo L. P. Machado

Abstract: The viscosity of lead-containing glasses is of fundamental importance for the manufacturing process, and can be predicted by algorithms such as artificial neural networks. The SciGlass database was used to provide training, validation and test data of chemical composition, temperature and viscosity for the construction of artificial neural networks with node variation in the hidden layer. The best model built with training data and validation data was compared with 7 other models from the literature, demonstrating better statistical evaluations of mean absolute error and coefficient of determination to the test data, with subsequent sensitivity analysis in agreement with the literature. Skewness and kurtosis were calculated and there is a good correlation between the values predicted by the best neural network built with the test data.

Submitted 20 November, 2022; v1 submitted 10 November, 2022; originally announced November 2022.

Comments: 6 pages, 5 figures, 2 tables

arXiv:2210.15749 [pdf, ps, other]

doi 10.1016/j.physletb.2022.137585

Investigating exclusive $ρ^0$ photoproduction within the Regge phenomenology approach

Authors: László Jenkovszky, Érison dos Santos Rocha, Magno V. T. Machado

Abstract: The elastic differential and integrated total cross section for the exclusive $ρ^0$ photoproduction in electron-proton ($ep$) collisions are evaluated taking into account nonperturbative Pomeron exchange approach. By using three different models based on Regge phenomenology the results are compared to recent measurements by H1 Collaboration in $ep$ collisions and by the CMS collaboration from ultraperipheral proton-lead collisions. The analysis is expanded by calculating the coherent nuclear cross section, $σ(γA\rightarrow ρ^0 A)$, which is applied to $ρ^0$ production in ultraperipheral lead-lead and xenon-xenon collisions. The predictions are compared to the measurements performed by ALICE Collaboration. Aspects of the theoretical uncertainties and limitations of the formalism are scrutinized.

Submitted 12 December, 2022; v1 submitted 27 October, 2022; originally announced October 2022.

arXiv:2209.14374 [pdf]

doi 10.1039/D2TA03641J

Functional thin films as cathode/electrolyte interlayers: a strategy to enhance the performance and durability of solid oxide fuel cells

Authors: Marina Machado, Federico Baiutti, Lucile Bernadet, Alex Morata, Marc Nuñez, Jan Pieter Ouweltjes, Fabio Coral Fonseca, Marc Torrell, Albert Tarancón

Abstract: Electrochemical devices such as solid oxide fuel cells (SOFC) may greatly benefit from the implementation of nanoengineered thin-film multifunctional layers providing, alongside enhanced electrochemical activity, improved mechanical, and long-term stability. In this study, an ultrathin (400 nm) bilayer of samarium-doped ceria and a self-assembled nanocomposite made of Sm0.2Ce0.8O1.9-La0.8Sr0.2MnO3-$δ$ was fabricated by pulsed laser deposition and is employed as a functional oxygen electrode in an anode-supported solid oxide fuel cell. Introducing the functional bilayer in the cell architecture results in a simple processing technique for the fabrication of high-performance fuel cells (power density 1.0 W.cm-2 at 0.7 V and 750 $^\circ$C). Durability tests were carried out for up to 1500 h, showing a small degradation under extreme operating conditions of 1 A.cm-2, while a stable behaviour at 0.5 A.cm-2. Post-test analyses, including scanning and transmission electron microscopy and electrochemical impedance spectroscopy, demonstrate that the nanoengineered thin film layers remain mostly morphologically stable after the operation.

Submitted 28 September, 2022; originally announced September 2022.

Comments: 24 pages, 12 figures

Journal ref: Journal of Materials Chemistry A, 2022, 10, 17317-17325

arXiv:2207.07794 [pdf, other]

Nuclear Modification Factor in Small System Collisions within Perturbative QCD Including Thermal Effects

Authors: L. S. Moriggi, M. V. T. Machado

Abstract: In this paper, dedicated to the memory of the late Prof. Jean Cleymans, the nuclear modification factors, $R_{xA}$, are investigated for pion production in small system collisions, measured by PHENIX experiment at RHIC (Relativistic Heavy Ion Collider). The theoretical framework is the transverse momentum $k_T$-factorization formalism for hard processes at small momentum fraction, $x$. Evidence for collective expansion and thermal effects for pions, produced at equilibrium, is studied based on phenomenological parametrization of blast-wave type in the relaxation time approximation. The dependencies on the centrality and on the projectile species are discussed in terms of the behavior of Cronin peak and the suppression of $R_{xA}$ at large transverse momentum, $p_T$. The multiplicity of produced particles, which is sensitive to the soft sector of the spectra, is also included in the present analysis.

Submitted 15 July, 2022; originally announced July 2022.

Comments: 12 pages, 4 figures. Contribution to MDPI Physics Special Issue "Jean Cleymans: A Life for Physics", dedicated to the memory of Professor Jean Cleymans

arXiv:2206.06987 [pdf, other]

doi 10.1103/PhysRevD.107.014004

Asymptotic gluon density within the color dipole picture in the light of HERA high-precision data

Authors: D. A. Fagundes, M. V. T. Machado

Abstract: We present an analysis of the most precise set of HERA data within the color dipole formalism, by using an analytical gluon density, based on the double-logarithm approximation of the DGLAP equations in the asymptotic limit of the scaling variable, $σ=\log{(1/x)}\log{(\log{(Q^2/Q_0^2)})}\rightarrow \infty$. Fits to data, including charm and bottom quarks are performed and demonstrate the efficiency of the model in describing the reduced cross section, $σ_{r}$, in the wide range $Q^2:(1.5,500)$ GeV$^2$ for two dipole models including parton saturation effects. We also give predictions to $F_{2}^{c\bar{c}}$, $F_{2}^{b\bar{b}}$ and $F_{L}$, all describing the data reasonably well in the range $Q^2:(2.5,120)$ GeV$^2$. Total cross sections of exclusive photoproduction of $J/ψ$ and $ρ$ are also calculated and successfully compared to HERA data and recent measurements at LHCb.

Submitted 5 January, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: 17 pages, 9 figures, 6 tables. Matches published version

Journal ref: Phys.Rev. D 107, 014004 (2023)

arXiv:2205.00925 [pdf, ps, other]

doi 10.1140/epjc/s10052-022-10771-6

$D$-meson production in high energy $pA$ collisions within the QCD color dipole transverse momentum representation

Authors: G. Sampaio dos Santos, G. Gil da Silveira, M. V. T. Machado

Abstract: The $D$-meson production is investigated by considering the unintegrated gluon distribution within the dipole approach in the momentum representation. We analyze the $D$-meson spectrum accounting for the effects of nonlinear behavior of the QCD dynamics, which can be accordingly addressed in the dipole framework. The unintegrated gluon distribution is obtained by using the geometric scaling property and the results are compared to the Glauber-Gribov framework. The absolute transverse momentum spectra and the nuclear modification ratios are investigated. Predictions are compared with the experimental measurements by the ALICE and LHCb Collaborations in $pA$ collisions for different rapidity bins.

Submitted 13 October, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

Comments: 23 pages, 9 figures, 1 table

Journal ref: Eur.Phys.J. C 82 (2022) 795

arXiv:2204.10350 [pdf, other]

doi 10.1103/PhysRevD.106.014002

Exclusive $Z^0$ production in $ep$ and $eA$ collisions at high energies

Authors: G. M. Peccini, L. S. Moriggi, M. V. T. Machado

Abstract: In this work the $k_{\perp}$-factorization formalism is applied to compute the exclusive $Z^0$ boson photoproduction in $ep$ and $eA$ collisions. The study is also extended to $pp$ and $AA$ processes. The nuclear effects are investigated considering heavy and light ions. Analytical models for the unintegrated gluon distribution are taken into account and the corresponding theoretical uncertainty is quantified. The analysis is done for electron-ion collisions at the Large Hadron-Electron Collider (LHeC), its high-energy upgrade (HE-LHeC) and at the Future Circular Collider (FCC) in lepton-hadron mode. Additionally, ultra-peripheral heavy ion collisions at future runs of the Large Hadron Collider (LHC) and at the FCC (hadron-hadron mode) are also considered.

Submitted 26 June, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

Comments: 8 pages, 5 figures

arXiv:2204.04700 [pdf, other]

doi 10.3847/1538-3881/ac6110

Activity and Rotation of Nearby Field M Dwarfs in the TESS Southern Continuous Viewing Zone

Authors: Francys Anthony, Alejandro Núñez, Marcel A. Agüeros, Jason L. Curtis, J. -D. do Nascimento, Jr., João M. Machado, Andrew W. Mann, Elisabeth R. Newton, Rayna Rampalli, Pa Chia Thao, Mackenna L. Wood

Abstract: The evolution of magnetism in late-type dwarfs remains murky, as we can only weakly predict levels of activity for M dwarfs of a given mass and age. We report results from our spectroscopic survey of M dwarfs in the Southern Continuous Viewing Zone (CVZ) of the Transiting Exoplanet Survey Satellite (TESS). As the TESS CVZs overlap with those of the James Webb Space Telescope, our targets constitute a legacy sample for studies of nearby M dwarfs. For 122 stars, we obtained at least one $R\approx 2000$ optical spectrum with which we measure chromospheric $\mathrm{H}α$ emission, a proxy for magnetic field strength. The fraction of active stars is consistent with what is expected for field M dwarfs; as in previous studies, we find that late-type M dwarfs remain active for longer than their early-type counterparts. While the TESS light curves for $\approx$20% of our targets show modulations consistent with rotation, TESS systematics are not well enough understood for confident measurements of rotation periods ($P_{\mathrm{rot}}$) longer than half the length of an observing sector. We report periods for 12 stars for which we measure $P_{\mathrm{rot}} \lesssim 15$ d or find confirmation for the TESS-derived $P_{\mathrm{rot}}$ in the literature. Our sample of 21 $P_{\mathrm{rot}}$, which includes periods from the literature, is consistent with our targets being spun-down field stars. Finally, we examine the $\mathrm{H}α$-to-bolometric luminosity distribution for our sample. Two stars are rotating fast enough to be magnetically saturated, but are not, hinting at the possibility that fast rotators may appear inactive in $\mathrm{H}α$.

Submitted 10 April, 2022; originally announced April 2022.

Comments: Accepted for publication in AJ, 17 pages, 10 figures, 2 tables

arXiv:2203.15955 [pdf, other]

Investigating the Properties of Neural Network Representations in Reinforcement Learning

Authors: Han Wang, Erfan Miahi, Martha White, Marlos C. Machado, Zaheer Abbas, Raksha Kumaraswamy, Vincent Liu, Adam White

Abstract: In this paper we investigate the properties of representations learned by deep reinforcement learning systems. Much of the early work on representations for reinforcement learning focused on designing fixed-basis architectures to achieve properties thought to be desirable, such as orthogonality and sparsity. In contrast, the idea behind deep reinforcement learning methods is that the agent designer should not encode representational properties, but rather that the data stream should determine the properties of the representation -- good representations emerge under appropriate training schemes. In this paper we bring these two perspectives together, empirically investigating the properties of representations that support transfer in reinforcement learning. We introduce and measure six representational properties over more than 25 thousand agent-task settings. We consider Deep Q-learning agents with different auxiliary losses in a pixel-based navigation environment, with source and transfer tasks corresponding to different goal locations. We develop a method to better understand why some representations work better for transfer, through a systematic approach varying task similarity and measuring and correlating representation properties with transfer performance. We demonstrate the generality of the methodology by investigating representations learned by a Rainbow agent that successfully transfer across game modes in Atari 2600.

Submitted 5 May, 2023; v1 submitted 29 March, 2022; originally announced March 2022.

arXiv:2203.11369 [pdf, other]

Temporal Abstractions-Augmented Temporally Contrastive Learning: An Alternative to the Laplacian in RL

Authors: Akram Erraqabi, Marlos C. Machado, Mingde Zhao, Sainbayar Sukhbaatar, Alessandro Lazaric, Ludovic Denoyer, Yoshua Bengio

Abstract: In reinforcement learning, the graph Laplacian has proved to be a valuable tool in the task-agnostic setting, with applications ranging from skill discovery to reward shaping. Recently, learning the Laplacian representation has been framed as the optimization of a temporally-contrastive objective to overcome its computational limitations in large (or continuous) state spaces. However, this approach requires uniform access to all states in the state space, overlooking the exploration problem that emerges during the representation learning process. In this work, we propose an alternative method that is able to recover, in a non-uniform-prior setting, the expressiveness and the desired properties of the Laplacian representation. We do so by combining the representation learning with a skill-based covering policy, which provides a better training distribution to extend and refine the representation. We also show that a simple augmentation of the representation objective with the learned temporal abstractions improves dynamics-awareness and helps exploration. We find that our method succeeds as an alternative to the Laplacian in the non-uniform setting and scales to challenging continuous control environments. Finally, even if our method is not optimized for skill discovery, the learned skills can successfully solve difficult continuous navigation tasks with sparse rewards, where standard skill discovery approaches are not so effective.

Submitted 21 March, 2022; originally announced March 2022.

arXiv:2203.10986 [pdf, ps, other]

doi 10.1103/PhysRevD.105.094009

Investigating the QCD dynamical entropy in high-energy hadronic collisions

Authors: G. S. Ramos, M. V. T. Machado

Abstract: The dynamical entropy of dense gluonic states in proton-proton collisions at high energies is studied by using phenomenological models for the unintegrated gluon distribution. The corresponding transverse momentum probability distributions are evaluated in terms of rapidity. The dynamical entropy density is obtained in the rapidity range relevant for the collisions at the Large Hadron Collider. The total entropy density for the dense system is computed as a function of the rapidity evolution $ΔY = Y-Y_0$ given an initial rapidity $Y_0$. The theoretical uncertainties are investigated and a comparison with related approaches in the literature is made.

Submitted 21 March, 2022; originally announced March 2022.

Comments: 8 pages, 4 figures

arXiv:2202.03466 [pdf, other]

doi 10.1016/j.artint.2023.104001

Reward-Respecting Subtasks for Model-Based Reinforcement Learning

Authors: Richard S. Sutton, Marlos C. Machado, G. Zacharias Holland, David Szepesvari, Finbarr Timbers, Brian Tanner, Adam White

Abstract: To achieve the ambitious goals of artificial intelligence, reinforcement learning must include planning with a model of the world that is abstract in state and time. Deep learning has made progress with state abstraction, but temporal abstraction has rarely been used, despite extensively developed theory based on the options framework. One reason for this is that the space of possible options is immense, and the methods previously proposed for option discovery do not take into account how the option models will be used in planning. Options are typically discovered by posing subsidiary tasks, such as reaching a bottleneck state or maximizing the cumulative sum of a sensory signal other than reward. Each subtask is solved to produce an option, and then a model of the option is learned and made available to the planning process. In most previous work, the subtasks ignore the reward on the original problem, whereas we propose subtasks that use the original reward plus a bonus based on a feature of the state at the time the option terminates. We show that option models obtained from such reward-respecting subtasks are much more likely to be useful in planning than eigenoptions, shortest path options based on bottleneck states, or reward-respecting options generated by the option-critic. Reward-respecting subtasks strongly constrain the space of options and thereby also provide a partial solution to the problem of option discovery. Finally, we show how values, policies, options, and models can all be learned online and off-policy using standard algorithms and general value functions.

Submitted 16 September, 2023; v1 submitted 7 February, 2022; originally announced February 2022.

Journal ref: Artificial Intelligence, first published online September 6, 2023

arXiv:2202.02162 [pdf, ps, other]

doi 10.1016/j.physletb.2022.137004

Regge phenomenology and coherent photoproduction of charmonium in peripheral heavy ion collisions

Authors: Laszlo Jenkovszky, Vladyslav Libov, Magno V. T. Machado

Abstract: By using models based on Regge phenomenology we analyse the coherent photoproduction of charmonium in peripheral heavy-ion collisions at the Large Hadron Collider (LHC). The centrality dependence is investigated and compared to the experimental results for coherent $J/ψ$ production in lead-lead LHC runs at the energies of 2.76 and 5.02 TeV. Theoretical uncertainties and possible limitations of the formalism are also discussed.

Submitted 11 March, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

Comments: 12 pages, 7 figures

Journal ref: Physics Letters B, Volume 827, 10 April 2022, 137004

arXiv:2201.13432 [pdf, other]

doi 10.1088/1361-6471/ac8b27

Nuclear transverse momentum imbalance in the color dipole approach at the LHC regime

Authors: F. G. Ben, A. V. Giannini, M. V. T. Machado

Abstract: Transverse momentum broadening of a parton propagating through a large nucleus is evaluated in the color dipole approach using different models for the dipole cross section or unintegrated gluon distribution, which lead to different values of the coefficient $C_{\mathcal{F}}(0,s)$. Numerical calculations are compared to data extracted from LHCb and ALICE experiments for nuclear broadening of $J/ψ$. We find that different models which describe the small-$x$ data predict values of $Δp_T^2$ that agree reasonably well with experiment, especially for forward rapidity. The centrality dependence was also analysed and the models are consistent with the ALICE measurements.

Submitted 20 March, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

Comments: 6 pages, 2 figures, 2 tables

arXiv:2112.13319 [pdf, ps, other]

Quantum Algorithm for the Shortest Superstring Problem

Authors: Kamil Khadiev, Carlos Manuel Bosch Machado

Abstract: In this paper, we consider the "Shortest Superstring Problem" (SSP) or the "Shortest Common Superstring Problem" (SCS). The problem is as follows. For a positive integer $n$, a sequence of $n$ strings $S=(s^1,\dots,s^n)$ is given. We should construct the shortest string $t$ (we call it superstring) that contains each string from the given sequence as a substring. The problem is connected with the sequence assembly method for reconstructing a long DNA sequence from small fragments. We present a quantum algorithm with running time $O^*(1.728^n)$. Here $O^*$ notation does not consider polynomials of $n$ and the length of $t$.

Submitted 26 December, 2021; originally announced December 2021.

Comments: 11 pages

arXiv:2111.13389 [pdf, ps, other]

doi 10.1016/j.physletb.2021.136836

The reggeometric pomeron and exclusive production of $J/ψ$ and $ψ(2S)$ in ultraperipheral collisions at the LHC

Authors: Laszlo Jenkovszky, Vladyslav Libov, Magno V. T. Machado

Abstract: By using a Regge-pole model for vector meson production (VMP), successfully describing the HERA data, we analyse the correlation between VMP cross sections in photon-induced reactions at HERA and those in ultra-peripheral collisions at the Large Hadron Collider (LHC). The rapidity distributions of proton-proton collisions at 13 TeV and lead-lead collisions at 2.76 and 5.02 TeV are investigated. The transverse momentum distribution in nuclear coherent vector meson production is also addressed. Predictions for future experiments on production of $J/ψ$ and $ψ(2S)$ are presented.

Submitted 22 January, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

Comments: 14 pages, 9 figures. arXiv admin note: text overlap with arXiv:1408.0530

Journal ref: Physics Letters B 824 (2022) 136836

arXiv:2110.05740 [pdf, other]

Temporal Abstraction in Reinforcement Learning with the Successor Representation

Authors: Marlos C. Machado, Andre Barreto, Doina Precup, Michael Bowling

Abstract: Reasoning at multiple levels of temporal abstraction is one of the key attributes of intelligence. In reinforcement learning, this is often modeled through temporally extended courses of actions called options. Options allow agents to make predictions and to operate at different levels of abstraction within an environment. Nevertheless, approaches based on the options framework often start with the assumption that a reasonable set of options is known beforehand. When this is not the case, there are no definitive answers for which options one should consider. In this paper, we argue that the successor representation (SR), which encodes states based on the pattern of state visitation that follows them, can be seen as a natural substrate for the discovery and use of temporal abstractions. To support our claim, we take a big picture view of recent results, showing how the SR can be used to discover options that facilitate either temporally-extended exploration or planning. We cast these results as instantiations of a general framework for option discovery in which the agent's representation is used to identify useful options, which are then used to further improve its representation. This results in a virtuous, never-ending cycle in which both the representation and the options are constantly refined based on each other. Beyond option discovery itself, we also discuss how the SR allows us to augment a set of options into a combinatorially large counterpart without additional learning. This is achieved through the combination of previously learned options. Our empirical evaluation focuses on options discovered for exploration and on the use of the SR to combine them. The results of our experiments shed light on important design decisions involved in the definition of options and demonstrate the synergy of different methods based on the SR, such as eigenoptions and the option keyboard.

Submitted 11 April, 2023; v1 submitted 12 October, 2021; originally announced October 2021.

Comments: This is the final, published JMLR version

Journal ref: Journal of Machine Learning Research (JMLR), 24(80):1-69, 2023

arXiv:2109.11052 [pdf, other]

On Bonus-Based Exploration Methods in the Arcade Learning Environment

Authors: Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare

Abstract: Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration problems such as Montezuma's Revenge (Bellemare et al., 2016). Recently, bonus-based exploration methods, which explore by augmenting the environment reward, have reached above-human average performance on such domains. In this paper we reassess popular bonus-based exploration methods within a common evaluation framework. We combine Rainbow (Hessel et al., 2018) with different exploration bonuses and evaluate its performance on Montezuma's Revenge, Bellemare et al.'s set of hard exploration games with sparse rewards, and the whole Atari 2600 suite. We find that while exploration bonuses lead to a higher score on Montezuma's Revenge, they do not provide meaningful gains over the simpler $ε$-greedy scheme. In fact, we find that methods that perform best on that game often underperform $ε$-greedy on easy exploration Atari 2600 games. We find that our conclusions remain valid even when hyperparameters are tuned for these easy-exploration games. Finally, we find that none of the methods surveyed benefit from additional training samples (1 billion frames, versus Rainbow's 200 million) on Bellemare et al.'s hard exploration games. Our results suggest that recent gains in Montezuma's Revenge may be better attributed to architecture change, rather than better exploration schemes; and that the real pace of progress in exploration research for Atari 2600 games may have been obfuscated by good results on a single domain.

Submitted 22 September, 2021; originally announced September 2021.

Comments: Full version of arXiv:1908.02388

Journal ref: Published as a conference paper at ICLR 2020

arXiv:2108.05828 [pdf, other]

A general class of surrogate functions for stable and efficient reinforcement learning

Authors: Sharan Vaswani, Olivier Bachem, Simone Totaro, Robert Mueller, Shivam Garg, Matthieu Geist, Marlos C. Machado, Pablo Samuel Castro, Nicolas Le Roux

Abstract: Common policy gradient methods rely on the maximization of a sequence of surrogate functions. In recent years, many such surrogate functions have been proposed, most without strong theoretical guarantees, leading to algorithms such as TRPO, PPO or MPO. Rather than design yet another surrogate function, we instead propose a general framework (FMA-PG) based on functional mirror ascent that gives rise to an entire family of surrogate functions. We construct surrogate functions that enable policy improvement guarantees, a property not shared by most existing surrogate functions. Crucially, these guarantees hold regardless of the choice of policy parameterization. Moreover, a particular instantiation of FMA-PG recovers important implementation heuristics (e.g., using forward vs reverse KL divergence) resulting in a variant of TRPO with additional desirable properties. Via experiments on simple bandit problems, we evaluate the algorithms instantiated by FMA-PG. The proposed framework also suggests an improved variant of PPO, whose robustness and efficiency we empirically demonstrate on the MuJoCo suite.

Submitted 30 October, 2023; v1 submitted 12 August, 2021; originally announced August 2021.

Comments: Fixed minor typos

Showing 1–50 of 244 results for author: Machado, M