Mixture of Experts in a Mixture of RL settings

Authors: Timon Willi, Johan Obando-Ceron, Jakob Foerster, Karolina Dziugaite, Pablo Samuel Castro

Abstract: Mixtures of Experts (MoEs) have gained prominence in (self-)supervised learning due to their enhanced inference efficiency, adaptability to distributed training, and modularity. Previous research has illustrated that MoEs can significantly boost Deep Reinforcement Learning (DRL) performance by expanding the network's parameter count while reducing dormant neurons, thereby enhancing the model's learning capacity and ability to deal with non-stationarity. In this work, we shed more light on MoEs' ability to deal with non-stationarity and investigate MoEs in DRL settings with "amplified" non-stationarity via multi-task training, providing further evidence that MoEs improve learning capacity. In contrast to previous work, our multi-task results allow us to better understand the underlying causes for the beneficial effect of MoE in DRL training, the impact of the various MoE components, and insights into how best to incorporate them in actor-critic-based DRL networks. Finally, we also confirm results from previous work.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.17523 [pdf, other]

On the consistency of hyper-parameter selection in value-based deep reinforcement learning

Authors: Johan Obando-Ceron, João G. M. Araújo, Aaron Courville, Pablo Samuel Castro

Abstract: Deep reinforcement learning (deep RL) has achieved tremendous success on various domains through a combination of algorithmic design and careful selection of hyper-parameters. Algorithmic improvements are often the result of iterative enhancements built upon prior approaches, while hyper-parameter choices are typically inherited from previous methods or fine-tuned specifically for the proposed technique. Despite their crucial impact on performance, hyper-parameter choices are frequently overshadowed by algorithmic advancements. This paper conducts an extensive empirical study focusing on the reliability of hyper-parameter selection for value-based deep reinforcement learning agents, including the introduction of a new score to quantify the consistency and reliability of various hyper-parameters. Our findings not only help establish which hyper-parameters are most critical to tune, but also help clarify which tunings remain consistent across different training regimes.

Submitted 2 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

arXiv:2405.11065 [pdf, other]

Enabling mixed-precision with the help of tools: A Nekbone case study

Authors: Yanxiang Chen, Pablo de Oliveira Castro, Paolo Bientinesi, Roman Iakymchuk

Abstract: Mixed-precision computing has the potential to significantly reduce the cost of exascale computations, but determining when and how to implement it in programs can be challenging. In this article, we consider Nekbone, a mini-application for the CFD solver Nek5000, as a case study, and propose a methodology for enabling mixed-precision with the help of computer arithmetic tools and the roofline model. We evaluate the derived mixed-precision program by combining metrics in three dimensions: accuracy, time-to-solution, and energy-to-solution. Notably, the introduction of mixed-precision in Nekbone reduces time-to-solution by 40.7% and energy-to-solution by 47% on 128 MPI ranks.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2403.19260 [pdf, other]

NaijaHate: Evaluating Hate Speech Detection on Nigerian Twitter Using Representative Data

Authors: Manuel Tonneau, Pedro Vitor Quinta de Castro, Karim Lasri, Ibrahim Farouq, Lakshminarayanan Subramanian, Victor Orozco-Olvera, Samuel P. Fraiberger

Abstract: To address the global issue of online hate, hate speech detection (HSD) systems are typically developed on datasets from the United States, thereby failing to generalize to English dialects from the Majority World. Furthermore, HSD models are often evaluated on non-representative samples, raising concerns about overestimating model performance in real-world settings. In this work, we introduce NaijaHate, the first dataset annotated for HSD which contains a representative sample of Nigerian tweets. We demonstrate that HSD evaluated on biased datasets traditionally used in the literature consistently overestimates real-world performance by at least two-fold. We then propose NaijaXLM-T, a pretrained model tailored to the Nigerian Twitter context, and establish the key role played by domain-adaptive pretraining and finetuning in maximizing HSD performance. Finally, owing to the modest performance of HSD systems in real-world conditions, we find that content moderators would need to review about ten thousand Nigerian tweets flagged as hateful daily to moderate 60% of all hateful content, highlighting the challenges of moderating hate speech at scale as social media usage continues to grow globally. Taken together, these results pave the way towards robust HSD systems and a better protection of social media users from hateful content in low-resource settings.

Submitted 24 June, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

Comments: ACL 2024 main conference. Data and models available at https://github.com/worldbank/NaijaHate

arXiv:2403.03950 [pdf, other]

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Authors: Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taïga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, Aviral Kumar, Rishabh Agarwal

Abstract: Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We demonstrate that value functions trained with categorical cross-entropy significantly improve performance and scalability in a variety of domains. These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that the benefits of categorical cross-entropy primarily stem from its ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity. Overall, we argue that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL at little-to-no cost.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2402.12479 [pdf, other]

In value-based deep reinforcement learning, a pruned network is a good network

Authors: Johan Obando-Ceron, Aaron Courville, Pablo Samuel Castro

Abstract: Recent work has shown that deep reinforcement learning agents have difficulty in effectively using their network parameters. We leverage prior insights into the advantages of sparse training techniques and demonstrate that gradual magnitude pruning enables value-based agents to maximize parameter effectiveness. This results in networks that yield dramatic performance improvements over traditional networks, using only a small fraction of the full network parameters.

Submitted 25 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.08609 [pdf, other]

Mixtures of Experts Unlock Parameter Scaling for Deep RL

Authors: Johan Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro

Abstract: The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.

Submitted 26 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.06759 [pdf]

A Methodology for Questionnaire Analysis: Insights through Cluster Analysis of an Investor Competition Data

Authors: Carlos Henrique Q. Forster, Paulo André Lima de Castro, Andrei Ramalho

Abstract: In this paper, we propose a methodology for the analysis of questionnaire data along with its application on discovering insights from investor data motivated by a day trading competition. The questionnaire includes categorical questions, which are reduced to binary questions, 'yes' or 'no'. The methodology reduces dimensionality by grouping questions and participants with similar responses using clustering analysis. Rule discovery was performed by using a conversion rate metric. Innovative visual representations were proposed to validate the cluster analysis and the relation discovery between questions. When crossing with financial data, additional insights were revealed related to the recognized clusters.

Submitted 9 February, 2024; originally announced February 2024.

Comments: 14 pages, 12 figures

arXiv:2312.08891 [pdf, other]

doi 10.2514/6.2024-2012

High-Dimensional Bayesian Optimisation with Large-Scale Constraints -- An Application to Aeroelastic Tailoring

Authors: Hauke Maathuis, Roeland De Breuker, Saullo G. P. Castro

Abstract: Design optimisation potentially leads to lightweight aircraft structures with lower environmental impact. Due to the high number of design variables and constraints, these problems are ordinarily solved using gradient-based optimisation methods, leading to a local solution in the design space while the global space is neglected. Bayesian Optimisation is a promising path towards sample-efficient, global optimisation based on probabilistic surrogate models. While Bayesian optimisation methods have demonstrated their strength for problems with a low number of design variables, the scalability to high-dimensional problems while incorporating large-scale constraints is still lacking. Especially in aeroelastic tailoring where directional stiffness properties are embodied into the structural design of aircraft, to control aeroelastic deformations and to increase the aerodynamic and structural performance, the safe operation of the system needs to be ensured by involving constraints resulting from different analysis disciplines. Hence, a global design space search becomes even more challenging. The present study attempts to tackle the problem by using high-dimensional Bayesian Optimisation in combination with a dimensionality reduction approach to solve the optimisation problem occurring in aeroelastic tailoring, presenting a novel approach for high-dimensional problems with large-scale constraints. Experiments on well-known benchmark cases with black-box constraints show that the proposed approach can incorporate large-scale constraints.

Submitted 14 December, 2023; originally announced December 2023.

Comments: Conference paper submitted to AIAA Scitech 2024 Forum

arXiv:2311.17894 [pdf, other]

Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy

Authors: Max Schwarzer, Jesse Farebrother, Joshua Greaves, Ekin Dogus Cubuk, Rishabh Agarwal, Aaron Courville, Marc G. Bellemare, Sergei Kalinin, Igor Mordatch, Pablo Samuel Castro, Kevin M. Roccapriore

Abstract: We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition probabilities. These learned transition dynamics are then leveraged to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.14115 [pdf, other]

A density estimation perspective on learning from pairwise human preferences

Authors: Vincent Dumoulin, Daniel D. Johnson, Pablo Samuel Castro, Hugo Larochelle, Yann Dauphin

Abstract: Learning from human feedback (LHF) -- and in particular learning from pairwise preferences -- has recently become a crucial ingredient in training large language models (LLMs), and has been the subject of much research. Most recent works frame it as a reinforcement learning problem, where a reward function is learned from pairwise preference data and the LLM is treated as a policy which is adapted to maximize the rewards, often under additional regularization constraints. We propose an alternative interpretation which centers on the generative process for pairwise preferences and treats LHF as a density estimation problem. We provide theoretical and empirical results showing that for a family of generative processes defined via preference behavior distribution equations, training a reward function on pairwise preferences effectively models an annotator's implicit preference distribution. Finally, we discuss and present findings on "annotator misspecification" -- failure cases where wrong modeling assumptions are made about annotator behavior, resulting in poorly-adapted models -- suggesting that approaches that learn from pairwise human preferences could have trouble learning from a population of annotators with diverse viewpoints.

Submitted 10 January, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

arXiv:2310.19804 [pdf, other]

A Kernel Perspective on Behavioural Metrics for Markov Decision Processes

Authors: Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland

Abstract: Behavioural metrics have been shown to be an effective mechanism for constructing representations in reinforcement learning. We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We leverage this new perspective to define a new metric that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). The kernel perspective further enables us to provide new theoretical results, which have so far eluded prior work. These include bounding value function differences by means of our metric, and the demonstration that our metric can be provably embedded into a finite-dimensional Euclidean space with low distortion error. These are two crucial properties when using behavioural metrics for reinforcement learning representations. We complement our theory with strong empirical results that demonstrate the effectiveness of these methods in practice.

Submitted 5 October, 2023; originally announced October 2023.

Comments: Published in TMLR

arXiv:2310.03882 [pdf, other]

Small batch deep reinforcement learning

Authors: Johan Obando-Ceron, Marc G. Bellemare, Pablo Samuel Castro

Abstract: In value-based deep reinforcement learning with replay memories, the batch size parameter specifies how many transitions to sample for each gradient update. Although critical to the learning process, this value is typically not adjusted when proposing new algorithms. In this work we present a broad empirical study that suggests reducing the batch size can result in a number of significant performance gains; this is surprising, as the general tendency when training neural networks is towards larger batch sizes for improved performance. We complement our experimental findings with a set of empirical analyses towards better understanding this phenomenon.

Submitted 5 October, 2023; originally announced October 2023.

Comments: Published at NeurIPS 2023

arXiv:2309.17094 [pdf, ps, other]

doi 10.1007/978-3-031-43619-2_28

How Easy it is to Know How: An Upper Bound for the Satisfiability Problem

Authors: Carlos Areces, Valentin Cassano, Raul Fervari, Pablo Castro, Andres Saravia

Abstract: We investigate the complexity of the satisfiability problem for a modal logic expressing 'knowing how' assertions, related to an agent's abilities to achieve a certain goal. We take one of the most standard semantics for this kind of logic, based on linear plans. Our main result is a proof that checking satisfiability of a 'knowing how' formula can be done in $Σ_2^P$. The algorithm we present relies on eliminating nested modalities in a formula, and then performing multiple calls to a satisfiability checking oracle for propositional logic.

Submitted 29 September, 2023; originally announced September 2023.

arXiv:2309.07309 [pdf, ps, other]

doi 10.4204/EPTCS.387.10

Quantifying Masking Fault-Tolerance via Fair Stochastic Games

Authors: Pablo F. Castro, Pedro R. D'Argenio, Ramiro Demasi, Luciano Putruele

Abstract: We introduce a formal notion of masking fault-tolerance between probabilistic transition systems using stochastic games. These games are inspired by bisimulation games, but they also take into account the possible faulty behavior of systems. When no faults are present, these games boil down to probabilistic bisimulation games. Since these games could be infinite, we propose a symbolic way of representing them so that they can be solved in polynomial time. In particular, we use this notion of masking to quantify the level of masking fault-tolerance exhibited by almost-sure failing systems, i.e., those systems that eventually fail with probability 1. The level of masking fault-tolerance of almost-sure failing systems can be calculated by solving a collection of functional equations. We produce this metric in a setting in which one of the players behaves in a strongly fair way (mimicking the idea of fair environments).

Submitted 13 September, 2023; originally announced September 2023.

Comments: In Proceedings EXPRESS/SOS2023, arXiv:2309.05788. arXiv admin note: substantial text overlap with arXiv:2207.02045

Journal ref: EPTCS 387, 2023, pp. 132-148

arXiv:2307.13824 [pdf, other]

Offline Reinforcement Learning with On-Policy Q-Function Regularization

Authors: Laixi Shi, Robert Dadashi, Yuejie Chi, Pablo Samuel Castro, Matthieu Geist

Abstract: The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy. A large portion of prior work tackles this challenge by implicitly/explicitly regularizing the learning policy towards the behavior policy, which is hard to estimate reliably in practice. In this work, we propose to regularize towards the Q-function of the behavior policy instead of the behavior policy itself, under the premise that the Q-function can be estimated more reliably and easily by a SARSA-style estimate and handles the extrapolation error more straightforwardly. We propose two algorithms taking advantage of the estimated Q-function through regularizations, and demonstrate they exhibit strong performance on the D4RL benchmarks.

Submitted 25 July, 2023; originally announced July 2023.

Comments: Published at European Conference on Machine Learning (ECML), 2023

arXiv:2306.13831 [pdf, other]

Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks

Authors: Maxime Chevalier-Boisvert, Bolun Dai, Mark Towers, Rodrigo de Lazcano, Lucas Willems, Salem Lahlou, Suman Pal, Pablo Samuel Castro, Jordan Terry

Abstract: We present the Minigrid and Miniworld libraries which provide a suite of goal-oriented 2D and 3D environments. The libraries were explicitly created with a minimalistic design paradigm to allow users to rapidly develop new environments for a wide range of research-specific needs. As a result, both have received widescale adoption by the RL community, facilitating research in a wide range of areas. In this paper, we outline the design philosophy, environment details, and their world generation API. We also showcase the additional capabilities brought by the unified API between Minigrid and Miniworld through case studies on transfer learning (for both RL agents and humans) between the different observation spaces. The source code of Minigrid and Miniworld can be found at https://github.com/Farama-Foundation/{Minigrid, Miniworld} along with their documentation at https://{minigrid, miniworld}.farama.org/.

Submitted 23 June, 2023; originally announced June 2023.

arXiv:2305.19452 [pdf, other]

Bigger, Better, Faster: Human-level Atari with human-level efficiency

Authors: Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro

Abstract: We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.

Submitted 13 November, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

Comments: ICML 2023, revised version

arXiv:2305.13447 [pdf, other]

Regularization Through Simultaneous Learning: A Case Study on Plant Classification

Authors: Pedro Henrique Nascimento Castro, Gabriel Cássia Fortuna, Rafael Alves Bonfim de Queiroz, Gladston Juliano Prates Moreira, Eduardo José da Silva Luz

Abstract: In response to the prevalent challenge of overfitting in deep neural networks, this paper introduces Simultaneous Learning, a regularization approach drawing on principles of Transfer Learning and Multi-task Learning. We leverage auxiliary datasets with the target dataset, the UFOP-HVD, to facilitate simultaneous classification guided by a customized loss function featuring an inter-group penalty. This experimental configuration allows for a detailed examination of model performance across similar (PlantNet) and dissimilar (ImageNet) domains, thereby enriching the generalizability of Convolutional Neural Network models. Remarkably, our approach demonstrates superior performance over models without regularization and those applying dropout regularization exclusively, enhancing accuracy by 5 to 22 percentage points. Moreover, when combined with dropout, the proposed approach improves generalization, securing state-of-the-art results for the UFOP-HVD challenge. The method also showcases efficiency with significantly smaller sample sizes, suggesting its broad applicability across a spectrum of related tasks. In addition, an interpretability approach is deployed to evaluate feature quality by analyzing class feature correlations within the network's convolutional layers. The findings of this study provide deeper insights into the efficacy of Simultaneous Learning, particularly concerning its interaction with the auxiliary and target datasets.

Submitted 20 June, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

arXiv:2304.14082 [pdf, other]

JaxPruner: A concise library for sparsity research

Authors: Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Gintare Karolina Dziugaite, Pablo Samuel Castro, Utku Evci

Abstract: This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the popular optimization library Optax, which, in turn, enables easy integration with existing JAX-based libraries. We demonstrate this ease of integration by providing examples in four different codebases (Scenic, t5x, Dopamine, and FedJAX) and provide baseline experiments on popular benchmarks.

Submitted 18 December, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

Comments: Jaxpruner is hosted at http://github.com/google-research/jaxpruner

arXiv:2304.12567 [pdf, other]

Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks

Authors: Jesse Farebrother, Joshua Greaves, Rishabh Agarwal, Charline Le Lan, Ross Goroshin, Pablo Samuel Castro, Marc G. Bellemare

Abstract: Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well understood; in practice, however, their primary use remains in support of a main learning objective, rather than as a method for learning representations. This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treated as an essentially infinite source of information about the environment. Based on this observation, we study the effectiveness of auxiliary tasks for learning rich representations, focusing on the setting where the number of tasks and the size of the agent's network are simultaneously increased. For this purpose, we derive a new family of auxiliary tasks based on the successor measure. These tasks are easy to implement and have appealing theoretical properties. Combined with a suitable off-policy learning rule, the result is a representation learning algorithm that can be understood as extending Mahadevan & Maggioni (2007)'s proto-value functions to deep reinforcement learning -- accordingly, we call the resulting object proto-value networks. Through a series of experiments on the Arcade Learning Environment, we demonstrate that proto-value networks produce rich features that may be used to obtain performance comparable to established algorithms, using only linear approximation and a small number (~4M) of interactions with the environment's reward function.

Submitted 25 April, 2023; originally announced April 2023.

Comments: ICLR 2023. Code and models are available at https://github.com/google-research/google-research/tree/master/pvn . 22 pages, 8 figures

arXiv:2304.01382 [pdf, other]

PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching

Authors: Pedro Castro, Tae-Kyun Kim

Abstract: Estimating the pose of an unseen object is the goal of the challenging one-shot pose estimation task. Previous methods have heavily relied on feature matching with great success. However, these methods are often inefficient and limited by their reliance on pre-trained models that have not been designed specifically for pose estimation. In this paper we propose PoseMatcher, an accurate model-free one-shot object pose estimator that overcomes these limitations. We create a new training pipeline for object-to-image matching based on a three-view system: a query with positive and negative templates. This simple yet effective approach emulates test-time scenarios by cheaply constructing an approximation of the full object point cloud during training. To enable PoseMatcher to attend to distinct input modalities, an image and a point cloud, we introduce IO-Layer, a new attention layer that efficiently accommodates self and cross attention between the inputs. Moreover, we propose a pruning strategy where we iteratively remove redundant regions of the target object to further reduce the complexity and noise of the network while maintaining accuracy. Finally, we redesign commonly used pose refinement strategies, zoom and 2D offset refinements, and adapt them to the one-shot paradigm. We outperform all prior one-shot pose estimation methods on the Linemod and YCB-V datasets and achieve results rivaling recent instance-level methods. The source code and models are available at https://github.com/PedroCastro/PoseMatcher.

Submitted 3 April, 2023; originally announced April 2023.

arXiv:2302.12902 [pdf, other]

The Dormant Neuron Phenomenon in Deep Reinforcement Learning

Authors: Ghada Sokar, Rishabh Agarwal, Pablo Samuel Castro, Utku Evci

Abstract: In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing number of inactive neurons, thereby affecting network expressivity. We demonstrate the presence of this phenomenon across a variety of algorithms and environments, and highlight its effect on learning. To address this issue, we propose a simple and effective method (ReDo) that Recycles Dormant neurons throughout training. Our experiments demonstrate that ReDo maintains the expressive power of networks by reducing the number of dormant neurons and results in improved performance.

Submitted 13 June, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

Comments: Oral at ICML 2023

arXiv:2210.15600 [pdf, other]

doi 10.1080/27660400.2022.2153633

Automatic extraction of materials and properties from superconductors scientific literature

Authors: Luca Foppiano, Pedro Baptista de Castro, Pedro Ortiz Suarez, Kensei Terashima, Yoshihiko Takano, Masashi Ishii

Abstract: The automatic extraction of materials and related properties from the scientific literature is gaining attention in data-driven materials science (Materials Informatics). In this paper, we discuss Grobid-superconductors, our solution for automatically extracting superconductor material names and respective properties from text. Built as a Grobid module, it combines machine learning and heuristic approaches in a multi-step architecture that supports input data as raw text or PDF documents. Using Grobid-superconductors, we built SuperCon2, a database of 40324 materials and properties records from 37700 papers. The material (or sample) information is represented by name, chemical formula, and material class, and is characterized by shape, doping, substitution variables for components, and substrate as adjoined information. The properties include the Tc superconducting critical temperature and, when available, applied pressure with the Tc measurement method.

Submitted 22 November, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

Comments: 20 pages, 11 figures, 8 tables

Journal ref: STAM:M, 2023, VOL. 3, NO. 1, 2153633

arXiv:2210.11718 [pdf, other]

CRT-6D: Fast 6D Object Pose Estimation with Cascaded Refinement Transformers

Authors: Pedro Castro, Tae-Kyun Kim

Abstract: Learning-based 6D object pose estimation methods rely on computing large intermediate pose representations and/or iteratively refining an initial estimation with a slow render-compare pipeline. This paper introduces a novel method we call Cascaded Pose Refinement Transformers, or CRT-6D. We replace the commonly used dense intermediate representation with a sparse set of features sampled from the feature pyramid, which we call OSKFs (Object Surface Keypoint Features), where each element corresponds to an object keypoint. We employ lightweight deformable transformers and chain them together to iteratively refine proposed poses over the sampled OSKFs. We achieve inference runtimes 2x faster than the closest real-time state-of-the-art methods while supporting up to 21 objects on a single model. We demonstrate the effectiveness of CRT-6D by performing extensive experiments on the LM-O and YCB-V datasets. Compared to real-time methods, we achieve state of the art on LM-O and YCB-V, falling slightly behind methods with inference runtimes one order of magnitude higher. The source code is available at: https://github.com/PedroCastro/CRT-6D

Submitted 21 October, 2022; originally announced October 2022.

Comments: Accepted at WACV2023

arXiv:2208.04213 [pdf]

Hybrid Serverless Computing: Opportunities and Challenges

Authors: Paul Castro, Vatche Isahagian, Vinod Muthusamy, Aleksander Slominski

Abstract: In recent years, there has been a surge in the adoption of serverless computing due to the ease of deployment, attractive pay-per-use pricing, and transparent horizontal auto-scaling. At the same time, infrastructure advancements such as the emergence of 5G networks and the explosion of devices connected to the Internet, known as the Internet of Things (IoT), as well as new application requirements that constrain where computation and data can happen, will expand the reach of Cloud computing beyond traditional data centers into Hybrid Cloud. Digital transformation due to the pandemic, which accelerated changes to the workforce and spurred further adoption of AI, is expected to accelerate, and the emergent Hybrid Cloud market could potentially expand to over a trillion dollars. In the Hybrid Cloud environment, driven by the serverless tenants, there will be an increased need to focus on enabling productive work for application builders that are using a distributed platform including public clouds, private clouds, and edge systems. In this chapter we investigate how far serverless computing can be extended to become Hybrid Serverless Computing.

Submitted 14 September, 2022; v1 submitted 8 August, 2022; originally announced August 2022.

arXiv:2207.02045 [pdf, ps, other]

A Stochastic Game Approach to Masking Fault-Tolerance: Bisimulation and Quantification

Authors: Pablo F. Castro, Pedro D'Argenio, Luciano Putruele, Ramiro Demasi

Abstract: We introduce a formal notion of masking fault-tolerance between probabilistic transition systems based on a variant of probabilistic bisimulation (named masking simulation). We also provide the corresponding probabilistic game characterization. Even though these games could be infinite, we propose a symbolic way of representing them, such that it can be decided in polynomial time if there is a masking simulation between two probabilistic transition systems. We use this notion of masking to quantify the level of masking fault-tolerance exhibited by almost-sure failing systems, i.e., those systems that eventually fail with probability 1. The level of masking fault-tolerance of almost-sure failing systems can be calculated by solving a collection of functional equations. We produce this metric in a setting in which the minimizing player behaves in a strong fair way (mimicking the idea of fair environments), and limit our study to memoryless strategies due to the infinite nature of the game. We implemented these ideas in a prototype tool, and performed an experimental evaluation.

Submitted 5 July, 2022; originally announced July 2022.

arXiv:2206.10369 [pdf, other]

The State of Sparse Training in Deep Reinforcement Learning

Authors: Laura Graesser, Utku Evci, Erich Elsen, Pablo Samuel Castro

Abstract: The use of sparse neural networks has seen rapid growth in recent years, particularly in computer vision. Their appeal stems largely from the reduced number of parameters required to train and store, as well as in an increase in learning efficiency. Somewhat surprisingly, there have been very few efforts exploring their use in Deep Reinforcement Learning (DRL). In this work we perform a systematic investigation into applying a number of existing sparse training techniques on a variety of DRL agents and environments. Our results corroborate the findings from sparse training in the computer vision domain - sparse networks perform better than dense networks for the same parameter count - in the DRL domain. We provide detailed analyses on how the various components in DRL are affected by the use of sparse networks and conclude by suggesting promising avenues for improving the effectiveness of sparse training methods, as well as for advancing their use in DRL.

Submitted 17 June, 2022; originally announced June 2022.

Comments: Proceedings of the 39th International Conference on Machine Learning (ICML'22)

arXiv:2206.01626 [pdf, other]

Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress

Authors: Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

Abstract: Learning tabula rasa, that is without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research. However, RL systems, when applied to large-scale settings, rarely operate tabula rasa. Such large-scale systems undergo multiple design or algorithmic changes during their development cycle and use ad hoc approaches for incorporating these changes without re-training from scratch, which would have been prohibitively expensive. Additionally, the inefficiency of deep RL typically excludes researchers without access to industrial-scale resources from tackling computationally-demanding problems. To address these issues, we present reincarnating RL as an alternative workflow or class of problem settings, where prior computational work (e.g., learned policies) is reused or transferred between design iterations of an RL agent, or from one RL agent to another. As a step towards enabling reincarnating RL from any agent to any other agent, we focus on the specific setting of efficiently transferring an existing sub-optimal policy to a standalone value-based RL agent. We find that existing approaches fail in this setting and propose a simple algorithm to address their limitations. Equipped with this algorithm, we demonstrate reincarnating RL's gains over tabula rasa RL on Atari 2600 games, a challenging locomotion task, and the real-world problem of navigating stratospheric balloons.
Overall, this work argues for an alternative approach to RL research, which we believe could significantly improve real-world RL adoption and help democratize it further. Open-sourced code and trained agents at https://agarwl.github.io/reincarnating_rl.

Submitted 4 October, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

Comments: NeurIPS 2022. Code and agents at https://agarwl.github.io/reincarnating_rl

arXiv:2112.09811 [pdf, ps, other]

Playing Against Fair Adversaries in Stochastic Games with Total Rewards

Authors: Pablo F. Castro, Pedro R. D'Argenio, Luciano Putruele, Ramiro Demasi

Abstract: We investigate zero-sum turn-based two-player stochastic games in which the objective of one player is to maximize the amount of rewards obtained during a play, while the other aims at minimizing it. We focus on games in which the minimizer plays in a fair way. We believe that these kinds of games enjoy interesting applications in software verification, where the maximizer plays the role of a system intending to maximize the number of "milestones" achieved, and the minimizer represents the behavior of some uncooperative but yet fair environment. Normally, to study total reward properties, games are requested to be stopping (i.e., they reach a terminal state with probability 1). We relax the property to request that the game is stopping only under a fair minimizing player. We prove that these games are determined, i.e., each state of the game has a value defined. Furthermore, we show that both players have memoryless and deterministic optimal strategies, and the game value can be computed by approximating the greatest-fixed point of a set of functional equations. We implemented our approach in a prototype tool, and evaluated it on an illustrating example and an Unmanned Aerial Vehicle case study.

Submitted 19 May, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

arXiv:2112.09477 [pdf, other]

Learning Reward Machines: A Study in Partially Observable Reinforcement Learning

Authors: Rodrigo Toro Icarte, Ethan Waldie, Toryn Q. Klassen, Richard Valenzano, Margarita P. Castro, Sheila A. McIlraith

Abstract: Reinforcement learning (RL) is a central problem in artificial intelligence. This problem consists of defining artificial agents that can learn optimal behaviour by interacting with an environment -- where the optimal behaviour is defined with respect to a reward signal that the agent seeks to maximize. Reward machines (RMs) provide a structured, automata-based representation of a reward function that enables an RL agent to decompose an RL problem into structured subproblems that can be efficiently learned via off-policy learning. Here we show that RMs can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems. We pose the task of learning RMs as a discrete optimization problem where the objective is to find an RM that decomposes the problem into a set of subproblems such that the combination of their optimal memoryless policies is an optimal policy for the original problem. We show the effectiveness of this approach on three partially observable domains, where it significantly outperforms A3C, PPO, and ACER, and discuss its advantages, limitations, and broader potential.

Submitted 17 December, 2021; originally announced December 2021.

arXiv:2112.02070 [pdf, other]

Malakai: Music That Adapts to the Shape of Emotions

Authors: Zack Harris, Liam Atticus Clarke, Pietro Gagliano, Dante Camarena, Manal Siddiqui, Pablo S. Castro

Abstract: The advent of ML music models such as Google Magenta's MusicVAE now allows us to extract and replicate compositional features from otherwise complex datasets. These models allow computational composers to parameterize abstract variables such as style and mood. By leveraging these models and combining them with procedural algorithms from the last few decades, it is possible to create a dynamic song that composes music in real-time to accompany interactive experiences. Malakai is a tool that helps users of varying skill levels create, listen to, remix and share such dynamic songs. Using Malakai, a Composer can create a dynamic song that can be interacted with by a Listener.

Submitted 3 December, 2021; originally announced December 2021.

arXiv:2111.11562 [pdf, other]

Reliable Actors with Retry Orchestration

Authors: Olivier Tardieu, David Grove, Gheorghe-Teodor Bercea, Paul Castro, Jaroslaw Cwiklik, Edward Epstein

Abstract: Cloud developers have to build applications that are resilient to failures and interruptions. We advocate for a fault-tolerant programming model for the cloud based on actors, retry orchestration, and tail calls. This model builds upon persistent data stores and message queues readily available on the cloud. Retry orchestration not only guarantees that (1) failed actor invocations will be retried but also that (2) completed invocations are never repeated and (3) it preserves a strict happens-before relationship across failures within call stacks. Tail calls can break complex tasks into simple steps to minimize re-execution during recovery. We review key application patterns and failure scenarios. We formalize a process calculus to precisely capture the mechanisms of fault tolerance in this model. We briefly describe our implementation. Using an application inspired by a typical enterprise scenario, we validate the functional correctness of our implementation and assess the impact of fault preparedness and recovery on performance.

Submitted 11 November, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

Comments: 22 pages, 7 figures

arXiv:2111.05128 [pdf, other]

Losses, Dissonances, and Distortions

Authors: Pablo Samuel Castro

Abstract: In this paper I present a study in using the losses and gradients obtained during the training of a simple function approximator as a mechanism for creating musical dissonance and visual distortion in a solo piano performance setting. These dissonances and distortions become part of an artistic performance not just by affecting the visualizations, but also by affecting the artistic musical performance. The system is designed such that the performer can in turn affect the training process itself, thereby creating a closed feedback loop between two processes: the training of a machine learning model and the performance of an improvised piano piece.

Submitted 8 November, 2021; originally announced November 2021.

Comments: In the 5th Machine Learning for Creativity and Design Workshop at NeurIPS 2021

arXiv:2110.14020 [pdf, other]

The Difficulty of Passive Learning in Deep Reinforcement Learning

Authors: Georg Ostrovski, Pablo Samuel Castro, Will Dabney

Abstract: Learning to act from observational data without active environmental interaction is a well-known challenge in Reinforcement Learning (RL). Recent approaches involve constraints on the learned policy or conservative updates, preventing strong deviations from the state-action distribution of the dataset. Although these methods are evaluated using non-linear function approximation, theoretical justifications are mostly limited to the tabular or linear cases. Given the impressive results of deep reinforcement learning, we argue for a need to more clearly understand the challenges in this setting. In the vein of Held & Hein's classic 1963 experiment, we propose the "tandem learning" experimental paradigm which facilitates our empirical analysis of the difficulties in offline reinforcement learning. We identify function approximation in conjunction with fixed data distributions as the strongest factors, thereby extending but also challenging hypotheses stated in past work. Our results provide relevant insights for offline deep reinforcement learning, while also shedding new light on phenomena observed in the online case of learning control.

Submitted 26 October, 2021; originally announced October 2021.

Comments: Accepted paper at NeurIPS 2021

arXiv:2108.13264 [pdf, other]

Deep Reinforcement Learning at the Edge of the Statistical Precipice

Authors: Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

Abstract: Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally-demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field's confidence in reported results with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results.
Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied with an open-source library rliable, to prevent unreliable results from stagnating the field.

Submitted 5 January, 2022; v1 submitted 30 August, 2021; originally announced August 2021.

Comments: Outstanding Paper Award at NeurIPS 2021. Website: https://agarwl.github.io/rliable. 28 Pages, 33 Figures

arXiv:2108.05828 [pdf, other]

A general class of surrogate functions for stable and efficient reinforcement learning

Authors: Sharan Vaswani, Olivier Bachem, Simone Totaro, Robert Mueller, Shivam Garg, Matthieu Geist, Marlos C. Machado, Pablo Samuel Castro, Nicolas Le Roux

Abstract: Common policy gradient methods rely on the maximization of a sequence of surrogate functions. In recent years, many such surrogate functions have been proposed, most without strong theoretical guarantees, leading to algorithms such as TRPO, PPO or MPO. Rather than design yet another surrogate function, we instead propose a general framework (FMA-PG) based on functional mirror ascent that gives rise to an entire family of surrogate functions. We construct surrogate functions that enable policy improvement guarantees, a property not shared by most existing surrogate functions. Crucially, these guarantees hold regardless of the choice of policy parameterization. Moreover, a particular instantiation of FMA-PG recovers important implementation heuristics (e.g., using forward vs reverse KL divergence) resulting in a variant of TRPO with additional desirable properties. Via experiments on simple bandit problems, we evaluate the algorithms instantiated by FMA-PG. The proposed framework also suggests an improved variant of PPO, whose robustness and efficiency we empirically demonstrate on the MuJoCo suite.

Submitted 30 October, 2023; v1 submitted 12 August, 2021; originally announced August 2021.

Comments: Fixed minor typos

arXiv:2106.08229 [pdf, other]

MICo: Improved representations via sampling-based state similarity for Markov decision processes

Authors: Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland

Abstract: We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents. While existing notions of state similarity are typically difficult to learn at scale due to high computational cost and lack of sample-based algorithms, our newly-proposed distance addresses both of these issues. In addition to providing detailed theoretical analysis, we provide empirical evidence that learning this distance alongside the value function yields structured and informative representations, including strong results on the Arcade Learning Environment benchmark.

Submitted 21 January, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

Comments: Published at NeurIPS 2021

arXiv:2105.07530 [pdf, other]

doi 10.1016/j.revip.2021.100063

Advances in Multi-Variate Analysis Methods for New Physics Searches at the Large Hadron Collider

Authors: Anna Stakia, Tommaso Dorigo, Giovanni Banelli, Daniela Bortoletto, Alessandro Casa, Pablo de Castro, Christophe Delaere, Julien Donini, Livio Finos, Michele Gallinaro, Andrea Giammanco, Alexander Held, Fabricio Jiménez Morales, Grzegorz Kotkowski, Seng Pei Liew, Fabio Maltoni, Giovanna Menardi, Ioanna Papavergou, Alessia Saggio, Bruno Scarpa, Giles C. Strong, Cecilia Tosciri, João Varela, Pietro Vischia, Andreas Weiler

Abstract: Between the years 2015 and 2019, members of the Horizon 2020-funded Innovative Training Network named "AMVA4NewPhysics" studied the customization and application of advanced multivariate analysis methods and statistical learning tools to high-energy physics problems, as well as developed entirely new ones. Many of those methods were successfully used to improve the sensitivity of data analyses performed by the ATLAS and CMS experiments at the CERN Large Hadron Collider; several others, still in the testing phase, promise to further improve the precision of measurements of fundamental physics parameters and the reach of searches for new phenomena. In this paper, the most relevant new tools, among those studied and developed, are presented along with the evaluation of their performances.

Submitted 22 November, 2021; v1 submitted 16 May, 2021; originally announced May 2021.

Comments: 101 pages, 21 figures, submitted to Elsevier. [v2]: Updated to published version (in 'Reviews in Physics')

Journal ref: Rev. Phys. 7 (2021) 100063

arXiv:2104.14353 [pdf, other]

A Smartphone based Application for Skin Cancer Classification Using Deep Learning with Clinical Images and Lesion Information

Authors: Breno Krohling, Pedro B. C. Castro, Andre G. C. Pacheco, Renato A. Krohling

Abstract: Over the last decades, the incidence of skin cancer, melanoma and non-melanoma, has increased at a continuous rate. In particular for melanoma, the deadliest type of skin cancer, early detection is important to improve patient prognosis. Recently, deep neural networks (DNNs) have become viable to deal with skin cancer detection. In this work, we present a smartphone-based application to assist in skin cancer detection. This application is based on a Convolutional Neural Network (CNN) trained on clinical images and patient demographics, both collected from smartphones. Also, as skin cancer datasets are imbalanced, we present an approach, based on the mutation operator of the Differential Evolution (DE) algorithm, to balance data. In this sense, beyond providing a flexible tool to assist doctors in the skin cancer screening phase, the method obtains promising results with a balanced accuracy of 85% and a recall of 96%.

Submitted 28 April, 2021; originally announced April 2021.

arXiv:2102.01514 [pdf, other]

Metrics and continuity in reinforcement learning

Authors: Charline Le Lan, Marc G. Bellemare, Pablo Samuel Castro

Abstract: In most practical applications of reinforcement learning, it is untenable to maintain direct estimates for individual states; in continuous-state systems, it is impossible. Instead, researchers often leverage state similarity (whether explicitly or implicitly) to build models that can generalize well from a limited set of samples. The notion of state similarity used, and the neighbourhoods and topologies they induce, is thus of crucial importance, as it will directly affect the performance of the algorithms. Indeed, a number of recent works introduce algorithms assuming the existence of "well-behaved" neighbourhoods, but leave the full specification of such topologies for future work. In this paper we introduce a unified formalism for defining these topologies through the lens of metrics. We establish a hierarchy amongst these metrics and demonstrate their theoretical implications on the Markov Decision Process specifying the reinforcement learning problem. We complement our theoretical results with empirical evaluations showcasing the differences between the metrics considered.

Submitted 2 February, 2021; originally announced February 2021.

Comments: Accepted at AAAI 2021

arXiv:2101.08169 [pdf, other]

mt5se: An Open Source Framework for Building Autonomous Trading Robots

Authors: Paulo André Lima de Castro

Abstract: Autonomous trading robots have been studied in the artificial intelligence area for quite some time. Many AI techniques have been tested for building autonomous agents able to trade financial assets. These initiatives include traditional neural networks, fuzzy logic, and reinforcement learning, but also more recent approaches like deep neural networks and deep reinforcement learning. Many developers claim to be successful in creating robots with great performance when simulating execution with historical price series, so-called backtesting. However, when these robots are used in real markets, they frequently present poor performance in terms of risks and return. In this paper, we propose an open source framework (mt5se) that helps the development, backtesting, live testing and real operation of autonomous traders. We built and tested several traders using mt5se. The results indicate that it may help the development of better traders. Furthermore, we discuss the simple architecture that is used in many studies and propose an alternative multiagent architecture. Such architecture separates two main concerns for a portfolio manager (PM): price prediction and capital allocation. More than achieving high accuracy, a PM should increase profits when it is right and reduce losses when it is wrong. Furthermore, price prediction is highly dependent on the asset's nature and history, while capital allocation is dependent only on the analyst's prediction performance and the assets' correlation. Finally, we discuss some promising technologies in the area.

Submitted 28 June, 2022; v1 submitted 20 January, 2021; originally announced January 2021.

Comments: This paper replaces an old version of the framework, called mt5b3, which is now deprecated

arXiv:2101.07217 [pdf, other]

Is it a great Autonomous FX Trading Strategy or you are just fooling yourself

Authors: Murilo Sibrao Bernardini, Paulo Andre Lima de Castro

Abstract: In this paper, we propose a method for evaluating autonomous trading strategies that provides realistic expectations, regarding the strategy's long-term performance. This method addresses many pitfalls that currently fool even experienced software developers and researchers, not to mention the customers that purchase these products. We present the results of applying our method to several famous autonomous trading strategies, which are used to manage a diverse selection of financial assets. The results show that many of these published strategies are far from being reliable vehicles for financial investment. Our method exposes the difficulties involved in building a reliable, long-term strategy and provides a means to compare potential strategies and select the most promising one by establishing minimal periods and requirements for the test executions. There are many developers that create software to buy and sell financial assets autonomously and some of them present great performance when simulating with historical price series (commonly called backtests). Nevertheless, when these strategies are used in real markets (or data not used in their training or evaluation), quite often they perform very poorly. The proposed method can be used to evaluate potential strategies. In this way, the method helps to tell if you really have a great trading strategy or you are just fooling yourself.

Submitted 19 November, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

Comments: An Implementation of the proposed method: STSE is available at github. The paper includes the link in the reference section

arXiv:2101.05265 [pdf, other]

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

Authors: Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare

Abstract: Reinforcement learning methods trained on few environments rarely learn policies that generalize to unseen environments. To improve generalization, we incorporate the inherent sequential structure in reinforcement learning into the representation learning process. This approach is orthogonal to recent approaches, which rarely exploit this structure explicitly. Specifically, we introduce a theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states. PSM assigns high similarity to states for which the optimal policies in those states as well as in future states are similar. We also present a contrastive representation learning procedure to embed any state similarity metric, which we instantiate with PSM to obtain policy similarity embeddings (PSEs). We demonstrate that PSEs improve generalization on diverse benchmarks, including LQR with spurious correlations, a jumping task from pixels, and Distracting DM Control Suite.

Submitted 18 March, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

Comments: ICLR 2021 (Spotlight). Website: https://agarwl.github.io/pse

arXiv:2011.14826 [pdf, other]

Revisiting Rainbow: Promoting more Insightful and Inclusive Deep Reinforcement Learning Research

Authors: Johan S. Obando-Ceron, Pablo Samuel Castro

Abstract: Since the introduction of DQN, a vast majority of reinforcement learning research has focused on reinforcement learning with deep neural networks as function approximators. New methods are typically evaluated on a set of environments that have now become standard, such as Atari 2600 games. While these benchmarks help standardize evaluation, their computational cost has the unfortunate side effect of widening the gap between those with ample access to computational resources, and those without. In this work we argue that, despite the community's emphasis on large-scale environments, the traditional small-scale environments can still yield valuable scientific insights and can help reduce the barriers to entry for underprivileged communities. To substantiate our claims, we empirically revisit the paper which introduced the Rainbow algorithm [Hessel et al., 2018] and present some new insights into the algorithms used by Rainbow.

Submitted 21 May, 2021; v1 submitted 20 November, 2020; originally announced November 2020.

Comments: Proceedings of the 38th International Conference on Machine Learning (ICML 2021)

arXiv:2011.05158 [pdf, other]

GANterpretations

Authors: Pablo Samuel Castro

Abstract: Since the introduction of Generative Adversarial Networks (GANs) [Goodfellow et al., 2014] there has been a regular stream of both technical advances (e.g., Arjovsky et al. [2017]) and creative uses of these generative models (e.g., [Karras et al., 2019, Zhu et al., 2017, ** et al., 2017]). In this work we propose an approach for using the power of GANs to automatically generate videos to accompany audio recordings by aligning to spectral properties of the recording. This allows musicians to explore new forms of multi-modal creative expression, where musical performance can induce an AI-generated musical video that is guided by said performance, as well as a medium for creating a visual narrative to follow a storyline (similar to what was proposed by Frosst and Kereliuk [2019]).

Submitted 6 November, 2020; originally announced November 2020.

Comments: In 4th Workshop on Machine Learning for Creativity and Design at NeurIPS 2020, Vancouver, Canada

arXiv:2007.09121 [pdf, other]

Dealing with Nuisance Parameters using Machine Learning in High Energy Physics: a Review

Authors: Tommaso Dorigo, Pablo de Castro

Abstract: In this work we discuss the impact of nuisance parameters on the effectiveness of machine learning in high-energy physics problems, and provide a review of techniques that allow to include their effect and reduce their impact in the search for optimal selection criteria and variable transformations. The introduction of nuisance parameters complicates the supervised learning task and its correspondence with the data analysis goal, due to their contribution degrading the model performances in real data, and the necessary addition of uncertainties in the resulting statistical inference. The approaches discussed include nuisance-parameterized models, modified or adversary losses, semi-supervised learning approaches, and inference-aware techniques.

Submitted 17 January, 2021; v1 submitted 17 July, 2020; originally announced July 2020.

Comments: 43 pages, 5 figures. v1: original review manuscript. v2: text improvement/fixes from review process

arXiv:2005.05618 [pdf]

doi 10.1038/s41427-020-0214-y

Machine Learning Guided Discovery of Gigantic Magnetocaloric Effect in HoB$_{2}$ Near Hydrogen Liquefaction Temperature

Authors: Pedro Baptista de Castro, Kensei Terashima, Takafumi D Yamamoto, Zhufeng Hou, Suguru Iwasaki, Ryo Matsumoto, Shintaro Adachi, Yoshito Saito, Peng Song, Hiroyuki Takeya, Yoshihiko Takano

Abstract: Magnetic refrigeration exploits the magnetocaloric effect, which is the entropy change upon application and removal of magnetic fields in materials, providing an alternate path for refrigeration other than the conventional gas cycles. While intensive research has uncovered a vast number of magnetic materials which exhibit a large magnetocaloric effect, these properties for a large number of compounds still remain unknown. To explore new functional materials in this unknown space, machine learning is used as a guide for selecting materials which could exhibit a large magnetocaloric effect. By this approach, HoB$_{2}$ is singled out, synthesized and its magnetocaloric properties are evaluated, leading to the experimental discovery of a gigantic magnetic entropy change of 40.1 J kg$^{-1}$ K$^{-1}$ (0.35 J cm$^{-3}$ K$^{-1}$) for a field change of 5 T in the vicinity of a ferromagnetic second-order phase transition with a Curie temperature of 15 K. This is the highest value reported so far, to our knowledge, near the hydrogen liquefaction temperature; thus it is a highly suitable material for hydrogen liquefaction and low temperature magnetic cooling applications.

Submitted 12 May, 2020; originally announced May 2020.

Comments: 12 pages including 3 figures and 1 table + 11 pages of supplementary information. Published version available at: https://rdcu.be/b36ep

Journal ref: NPG Asia Materials 12:35 (2020)

arXiv:2005.02732 [pdf, ps, other]

Custom-Precision Mathematical Library Explorations for Code Profiling and Optimization

Authors: David Defour, Pablo de Oliveira Castro, Matei Istoan, Eric Petit

Abstract: The typical processors used for scientific computing have fixed-width data-paths. This implies that mathematical libraries were specifically developed to target each of these fixed precisions (binary16, binary32, binary64). However, to address the increasing energy consumption and throughput requirements of scientific applications, library and hardware designers are moving beyond this one-size-fits-all approach. In this article we propose to study the effects and benefits of using user-defined floating-point formats and target accuracies in calculations involving mathematical functions. Our tool collects input-data profiles and iteratively explores lower precisions for each call-site of a mathematical function in user applications. This profiling data will be a valuable asset for specializing and fine-tuning mathematical function implementations for a given application. We demonstrate the tool's capabilities on SGP4, a satellite tracking application. The profile data shows the potential for specialization and provides insight into answering where it is useful to provide variable-precision designs for elementary function evaluation.

Submitted 6 May, 2020; originally announced May 2020.

arXiv:1911.11134 [pdf, other]

Rigging the Lottery: Making All Tickets Winners

Authors: Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen

Abstract: Many applications require sparse neural networks due to space or inference time restrictions. There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods. Our method updates the topology of the sparse network during training by using parameter magnitudes and infrequent gradient calculations. We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. We demonstrate state-of-the-art sparse training results on a variety of networks and datasets, including ResNet-50, MobileNets on Imagenet-2012, and RNNs on WikiText-103. Finally, we provide some insights into why allowing the topology to change during the optimization can overcome local minima encountered when the topology remains static. Code used in our work can be found in github.com/google-research/rigl.

Submitted 23 July, 2021; v1 submitted 25 November, 2019; originally announced November 2019.

Comments: Published in Proceedings of the 37th International Conference on Machine Learning. Code can be found in github.com/google-research/rigl

Journal ref: Proceedings of the 37th International Conference on Machine Learning (2020) 471-481

Showing 1–50 of 75 results for author: Castro, P