Showing 1–50 of 75 results for author: Castro, P

Searching in archive cs.
  1. arXiv:2406.18420  [pdf, other]

    cs.LG cs.AI

    Mixture of Experts in a Mixture of RL settings

    Authors: Timon Willi, Johan Obando-Ceron, Jakob Foerster, Karolina Dziugaite, Pablo Samuel Castro

    Abstract: Mixtures of Experts (MoEs) have gained prominence in (self-)supervised learning due to their enhanced inference efficiency, adaptability to distributed training, and modularity. Previous research has illustrated that MoEs can significantly boost Deep Reinforcement Learning (DRL) performance by expanding the network's parameter count while reducing dormant neurons, thereby enhancing the model's lea…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.17523  [pdf, other]

    cs.LG cs.AI

    On the consistency of hyper-parameter selection in value-based deep reinforcement learning

    Authors: Johan Obando-Ceron, João G. M. Araújo, Aaron Courville, Pablo Samuel Castro

    Abstract: Deep reinforcement learning (deep RL) has achieved tremendous success on various domains through a combination of algorithmic design and careful selection of hyper-parameters. Algorithmic improvements are often the result of iterative enhancements built upon prior approaches, while hyper-parameter choices are typically inherited from previous methods or fine-tuned specifically for the proposed tec…

    Submitted 2 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2405.11065  [pdf, other]

    cs.MS cs.DC cs.SE

    Enabling mixed-precision with the help of tools: A Nekbone case study

    Authors: Yanxiang Chen, Pablo de Oliveira Castro, Paolo Bientinesi, Roman Iakymchuk

    Abstract: Mixed-precision computing has the potential to significantly reduce the cost of exascale computations, but determining when and how to implement it in programs can be challenging. In this article, we consider Nekbone, a mini-application for the CFD solver Nek5000, as a case study, and propose a methodology for enabling mixed-precision with the help of computer arithmetic tools and roofline model.…

    Submitted 17 May, 2024; originally announced May 2024.

  4. arXiv:2403.19260  [pdf, other]

    cs.CL

    NaijaHate: Evaluating Hate Speech Detection on Nigerian Twitter Using Representative Data

    Authors: Manuel Tonneau, Pedro Vitor Quinta de Castro, Karim Lasri, Ibrahim Farouq, Lakshminarayanan Subramanian, Victor Orozco-Olvera, Samuel P. Fraiberger

    Abstract: To address the global issue of online hate, hate speech detection (HSD) systems are typically developed on datasets from the United States, thereby failing to generalize to English dialects from the Majority World. Furthermore, HSD models are often evaluated on non-representative samples, raising concerns about overestimating model performance in real-world settings. In this work, we introduce Nai…

    Submitted 24 June, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: ACL 2024 main conference. Data and models available at https://github.com/worldbank/NaijaHate

  5. arXiv:2403.03950  [pdf, other]

    cs.LG cs.AI stat.ML

    Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

    Authors: Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taïga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, Aviral Kumar, Rishabh Agarwal

    Abstract: Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast…

    Submitted 6 March, 2024; originally announced March 2024.

  6. arXiv:2402.12479  [pdf, other]

    cs.LG cs.AI

    In value-based deep reinforcement learning, a pruned network is a good network

    Authors: Johan Obando-Ceron, Aaron Courville, Pablo Samuel Castro

    Abstract: Recent work has shown that deep reinforcement learning agents have difficulty in effectively using their network parameters. We leverage prior insights into the advantages of sparse training techniques and demonstrate that gradual magnitude pruning enables value-based agents to maximize parameter effectiveness. This results in networks that yield dramatic performance improvements over traditional…

    Submitted 25 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  7. arXiv:2402.08609  [pdf, other]

    cs.LG cs.AI

    Mixtures of Experts Unlock Parameter Scaling for Deep RL

    Authors: Johan Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro

    Abstract: The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-…

    Submitted 26 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  8. arXiv:2402.06759  [pdf]

    cs.HC cs.AI

    A Methodology for Questionnaire Analysis: Insights through Cluster Analysis of an Investor Competition Data

    Authors: Carlos Henrique Q. Forster, Paulo André Lima de Castro, Andrei Ramalho

    Abstract: In this paper, we propose a methodology for the analysis of questionnaire data along with its application on discovering insights from investor data motivated by a day trading competition. The questionnaire includes categorical questions, which are reduced to binary questions, 'yes' or 'no'. The methodology reduces dimensionality by grouping questions and participants with similar responses using…

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: 14 pages, 12 figures

  9. High-Dimensional Bayesian Optimisation with Large-Scale Constraints -- An Application to Aeroelastic Tailoring

    Authors: Hauke Maathuis, Roeland De Breuker, Saullo G. P. Castro

    Abstract: Design optimisation potentially leads to lightweight aircraft structures with lower environmental impact. Due to the high number of design variables and constraints, these problems are ordinarily solved using gradient-based optimisation methods, leading to a local solution in the design space while the global space is neglected. Bayesian Optimisation is a promising path towards sample-efficient, g…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Conference paper submitted to AIAA Scitech 2024 Forum

  10. arXiv:2311.17894  [pdf, other]

    cond-mat.mes-hall cond-mat.mtrl-sci cs.LG

    Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy

    Authors: Max Schwarzer, Jesse Farebrother, Joshua Greaves, Ekin Dogus Cubuk, Rishabh Agarwal, Aaron Courville, Marc G. Bellemare, Sergei Kalinin, Igor Mordatch, Pablo Samuel Castro, Kevin M. Roccapriore

    Abstract: We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural n…

    Submitted 21 November, 2023; originally announced November 2023.

  11. arXiv:2311.14115  [pdf, other]

    cs.LG cs.AI cs.CL

    A density estimation perspective on learning from pairwise human preferences

    Authors: Vincent Dumoulin, Daniel D. Johnson, Pablo Samuel Castro, Hugo Larochelle, Yann Dauphin

    Abstract: Learning from human feedback (LHF) -- and in particular learning from pairwise preferences -- has recently become a crucial ingredient in training large language models (LLMs), and has been the subject of much research. Most recent works frame it as a reinforcement learning problem, where a reward function is learned from pairwise preference data and the LLM is treated as a policy which is adapted…

    Submitted 10 January, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

  12. arXiv:2310.19804  [pdf, other]

    cs.LG cs.AI

    A Kernel Perspective on Behavioural Metrics for Markov Decision Processes

    Authors: Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland

    Abstract: Behavioural metrics have been shown to be an effective mechanism for constructing representations in reinforcement learning. We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We leverage this new perspective to define a new metric that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). T…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: Published in TMLR

  13. arXiv:2310.03882  [pdf, other]

    cs.LG cs.AI

    Small batch deep reinforcement learning

    Authors: Johan Obando-Ceron, Marc G. Bellemare, Pablo Samuel Castro

    Abstract: In value-based deep reinforcement learning with replay memories, the batch size parameter specifies how many transitions to sample for each gradient update. Although critical to the learning process, this value is typically not adjusted when proposing new algorithms. In this work we present a broad empirical study that suggests reducing the batch size can result in a number of significant pe…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: Published at NeurIPS 2023

  14. How Easy it is to Know How: An Upper Bound for the Satisfiability Problem

    Authors: Carlos Areces, Valentin Cassano, Raul Fervari, Pablo Castro, Andres Saravia

    Abstract: We investigate the complexity of the satisfiability problem for a modal logic expressing `knowing how' assertions, related to an agent's abilities to achieve a certain goal. We take one of the most standard semantics for this kind of logics based on linear plans. Our main result is a proof that checking satisfiability of a `knowing how' formula can be done in $Σ_2^P$. The algorithm we present reli…

    Submitted 29 September, 2023; originally announced September 2023.

  15. arXiv:2309.07309  [pdf, ps, other]

    cs.LO cs.FL cs.GT

    Quantifying Masking Fault-Tolerance via Fair Stochastic Games

    Authors: Pablo F. Castro, Pedro R. D'Argenio, Ramiro Demasi, Luciano Putruele

    Abstract: We introduce a formal notion of masking fault-tolerance between probabilistic transition systems using stochastic games. These games are inspired by bisimulation games, but they also take into account the possible faulty behavior of systems. When no faults are present, these games boil down to probabilistic bisimulation games. Since these games could be infinite, we propose a symbolic way of repre…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: In Proceedings EXPRESS/SOS2023, arXiv:2309.05788. arXiv admin note: substantial text overlap with arXiv:2207.02045

    Journal ref: EPTCS 387, 2023, pp. 132-148

  16. arXiv:2307.13824  [pdf, other]

    cs.LG cs.AI

    Offline Reinforcement Learning with On-Policy Q-Function Regularization

    Authors: Laixi Shi, Robert Dadashi, Yuejie Chi, Pablo Samuel Castro, Matthieu Geist

    Abstract: The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy. A large portion of prior work tackles this challenge by implicitly/explicitly regularizing the learning policy towards the behavior policy, which is hard to estimate reliably in practice. I…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: Published at European Conference on Machine Learning (ECML), 2023

  17. arXiv:2306.13831  [pdf, other]

    cs.LG

    Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks

    Authors: Maxime Chevalier-Boisvert, Bolun Dai, Mark Towers, Rodrigo de Lazcano, Lucas Willems, Salem Lahlou, Suman Pal, Pablo Samuel Castro, Jordan Terry

    Abstract: We present the Minigrid and Miniworld libraries which provide a suite of goal-oriented 2D and 3D environments. The libraries were explicitly created with a minimalistic design paradigm to allow users to rapidly develop new environments for a wide range of research-specific needs. As a result, both have received widescale adoption by the RL community, facilitating research in a wide range of areas.…

    Submitted 23 June, 2023; originally announced June 2023.

  18. arXiv:2305.19452  [pdf, other]

    cs.LG cs.AI

    Bigger, Better, Faster: Human-level Atari with human-level efficiency

    Authors: Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro

    Abstract: We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a dis…

    Submitted 13 November, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: ICML 2023, revised version

  19. arXiv:2305.13447  [pdf, other]

    cs.LG cs.CV

    Regularization Through Simultaneous Learning: A Case Study on Plant Classification

    Authors: Pedro Henrique Nascimento Castro, Gabriel Cássia Fortuna, Rafael Alves Bonfim de Queiroz, Gladston Juliano Prates Moreira, Eduardo José da Silva Luz

    Abstract: In response to the prevalent challenge of overfitting in deep neural networks, this paper introduces Simultaneous Learning, a regularization approach drawing on principles of Transfer Learning and Multi-task Learning. We leverage auxiliary datasets with the target dataset, the UFOP-HVD, to facilitate simultaneous classification guided by a customized loss function featuring an inter-group penalty.…

    Submitted 20 June, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  20. arXiv:2304.14082  [pdf, other]

    cs.LG cs.SE

    JaxPruner: A concise library for sparsity research

    Authors: Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Gintare Karolina Dziugaite, Pablo Samuel Castro, Utku Evci

    Abstract: This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the…

    Submitted 18 December, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: Jaxpruner is hosted at http://github.com/google-research/jaxpruner

  21. arXiv:2304.12567  [pdf, other]

    cs.LG cs.AI stat.ML

    Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks

    Authors: Jesse Farebrother, Joshua Greaves, Rishabh Agarwal, Charline Le Lan, Ross Goroshin, Pablo Samuel Castro, Marc G. Bellemare

    Abstract: Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well understood; in practice, however, their primary use remains in support of a main learning objective, rather than as a method for learning representations. This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treate…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: ICLR 2023. Code and models are available at https://github.com/google-research/google-research/tree/master/pvn 22 pages, 8 figures

  22. arXiv:2304.01382  [pdf, other]

    cs.CV

    PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching

    Authors: Pedro Castro, Tae-Kyun Kim

    Abstract: Estimating the pose of an unseen object is the goal of the challenging one-shot pose estimation task. Previous methods have heavily relied on feature matching with great success. However, these methods are often inefficient and limited by their reliance on pre-trained models that have not been designed specifically for pose estimation. In this paper we propose PoseMatcher, an accurate model free one…

    Submitted 3 April, 2023; originally announced April 2023.

  23. arXiv:2302.12902  [pdf, other]

    cs.LG

    The Dormant Neuron Phenomenon in Deep Reinforcement Learning

    Authors: Ghada Sokar, Rishabh Agarwal, Pablo Samuel Castro, Utku Evci

    Abstract: In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing number of inactive neurons, thereby affecting network expressivity. We demonstrate the presence of this phenomenon across a variety of algorithms and environments, and highlight its effect on learning. To address this issue, we propose a simple and effective me…

    Submitted 13 June, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

    Comments: Oral at ICML 2023

  24. arXiv:2210.15600  [pdf, other]

    cs.CL cond-mat.supr-con cs.LG

    Automatic extraction of materials and properties from superconductors scientific literature

    Authors: Luca Foppiano, Pedro Baptista de Castro, Pedro Ortiz Suarez, Kensei Terashima, Yoshihiko Takano, Masashi Ishii

    Abstract: The automatic extraction of materials and related properties from the scientific literature is gaining attention in data-driven materials science (Materials Informatics). In this paper, we discuss Grobid-superconductors, our solution for automatically extracting superconductor material names and respective properties from text. Built as a Grobid module, it combines machine learning and heuristic a…

    Submitted 22 November, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

    Comments: 20 pages, 11 figures, 8 tables

    Journal ref: STAM:M, 2023, VOL. 3, NO. 1, 2153633

  25. arXiv:2210.11718  [pdf, other]

    cs.CV cs.LG

    CRT-6D: Fast 6D Object Pose Estimation with Cascaded Refinement Transformers

    Authors: Pedro Castro, Tae-Kyun Kim

    Abstract: Learning based 6D object pose estimation methods rely on computing large intermediate pose representations and/or iteratively refining an initial estimation with a slow render-compare pipeline. This paper introduces a novel method we call Cascaded Pose Refinement Transformers, or CRT-6D. We replace the commonly used dense intermediate representation with a sparse set of features sampled from the f…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: Accepted at WACV2023

  26. arXiv:2208.04213  [pdf]

    cs.DC cs.SE

    Hybrid Serverless Computing: Opportunities and Challenges

    Authors: Paul Castro, Vatche Isahagian, Vinod Muthusamy, Aleksander Slominski

    Abstract: In recent years, there has been a surge in the adoption of serverless computing due to the ease of deployment, attractive pay-per-use pricing, and transparent horizontal auto-scaling. At the same time, infrastructure advancements such as the emergence of 5G networks and the explosion of devices connected to the Internet, known as the Internet of Things (IoT), as well as new application requirements that co…

    Submitted 14 September, 2022; v1 submitted 8 August, 2022; originally announced August 2022.

  27. arXiv:2207.02045  [pdf, ps, other]

    cs.LO

    A Stochastic Game Approach to Masking Fault-Tolerance: Bisimulation and Quantification

    Authors: Pablo F. Castro, Pedro D'Argenio, Luciano Putruele, Ramiro Demasi

    Abstract: We introduce a formal notion of masking fault-tolerance between probabilistic transition systems based on a variant of probabilistic bisimulation (named masking simulation). We also provide the corresponding probabilistic game characterization. Even though these games could be infinite, we propose a symbolic way of representing them, such that it can be decided in polynomial time if there is a mas…

    Submitted 5 July, 2022; originally announced July 2022.

  28. arXiv:2206.10369  [pdf, other]

    cs.LG cs.AI

    The State of Sparse Training in Deep Reinforcement Learning

    Authors: Laura Graesser, Utku Evci, Erich Elsen, Pablo Samuel Castro

    Abstract: The use of sparse neural networks has seen rapid growth in recent years, particularly in computer vision. Their appeal stems largely from the reduced number of parameters required to train and store, as well as from an increase in learning efficiency. Somewhat surprisingly, there have been very few efforts exploring their use in Deep Reinforcement Learning (DRL). In this work we perform a systematic…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Proceedings of the 39th International Conference on Machine Learning (ICML'22)

  29. arXiv:2206.01626  [pdf, other]

    cs.LG cs.AI stat.ML

    Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress

    Authors: Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

    Abstract: Learning tabula rasa, that is without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research. However, RL systems, when applied to large-scale settings, rarely operate tabula rasa. Such large-scale systems undergo multiple design or algorithmic changes during their development cycle and use ad hoc approaches for incorporating these changes without re-training from s…

    Submitted 4 October, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022. Code and agents at https://agarwl.github.io/reincarnating_rl

  30. arXiv:2112.09811  [pdf, ps, other]

    cs.LO

    Playing Against Fair Adversaries in Stochastic Games with Total Rewards

    Authors: Pablo F. Castro, Pedro R. D'Argenio, Luciano Putruele, Ramiro Demasi

    Abstract: We investigate zero-sum turn-based two-player stochastic games in which the objective of one player is to maximize the amount of rewards obtained during a play, while the other aims at minimizing it. We focus on games in which the minimizer plays in a fair way. We believe that these kinds of games enjoy interesting applications in software verification, where the maximizer plays the role of a syst…

    Submitted 19 May, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

  31. arXiv:2112.09477  [pdf, other]

    cs.LG cs.AI

    Learning Reward Machines: A Study in Partially Observable Reinforcement Learning

    Authors: Rodrigo Toro Icarte, Ethan Waldie, Toryn Q. Klassen, Richard Valenzano, Margarita P. Castro, Sheila A. McIlraith

    Abstract: Reinforcement learning (RL) is a central problem in artificial intelligence. This problem consists of defining artificial agents that can learn optimal behaviour by interacting with an environment -- where the optimal behaviour is defined with respect to a reward signal that the agent seeks to maximize. Reward machines (RMs) provide a structured, automata-based representation of a reward function…

    Submitted 17 December, 2021; originally announced December 2021.

  32. arXiv:2112.02070  [pdf, other]

    cs.MM cs.AI

    Malakai: Music That Adapts to the Shape of Emotions

    Authors: Zack Harris, Liam Atticus Clarke, Pietro Gagliano, Dante Camarena, Manal Siddiqui, Pablo S. Castro

    Abstract: The advent of ML music models such as Google Magenta's MusicVAE now allows us to extract and replicate compositional features from otherwise complex datasets. These models allow computational composers to parameterize abstract variables such as style and mood. By leveraging these models and combining them with procedural algorithms from the last few decades, it is possible to create a dynamic song…

    Submitted 3 December, 2021; originally announced December 2021.

  33. arXiv:2111.11562  [pdf, other]

    cs.DC cs.PL

    Reliable Actors with Retry Orchestration

    Authors: Olivier Tardieu, David Grove, Gheorghe-Teodor Bercea, Paul Castro, Jaroslaw Cwiklik, Edward Epstein

    Abstract: Cloud developers have to build applications that are resilient to failures and interruptions. We advocate for a fault-tolerant programming model for the cloud based on actors, retry orchestration, and tail calls. This model builds upon persistent data stores and messages queues readily available on the cloud. Retry orchestration not only guarantees that (1) failed actor invocations will be retried…

    Submitted 11 November, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

    Comments: 22 pages, 7 figures

  34. arXiv:2111.05128  [pdf, other]

    cs.LG cs.AI cs.HC cs.SD eess.AS

    Losses, Dissonances, and Distortions

    Authors: Pablo Samuel Castro

    Abstract: In this paper I present a study in using the losses and gradients obtained during the training of a simple function approximator as a mechanism for creating musical dissonance and visual distortion in a solo piano performance setting. These dissonances and distortions become part of an artistic performance not just by affecting the visualizations, but also by affecting the artistic musical perform…

    Submitted 8 November, 2021; originally announced November 2021.

    Comments: In the 5th Machine Learning for Creativity and Design Workshop at NeurIPS 2021

  35. arXiv:2110.14020  [pdf, other]

    cs.LG cs.AI

    The Difficulty of Passive Learning in Deep Reinforcement Learning

    Authors: Georg Ostrovski, Pablo Samuel Castro, Will Dabney

    Abstract: Learning to act from observational data without active environmental interaction is a well-known challenge in Reinforcement Learning (RL). Recent approaches involve constraints on the learned policy or conservative updates, preventing strong deviations from the state-action distribution of the dataset. Although these methods are evaluated using non-linear function approximation, theoretical justif…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: Accepted paper at NeurIPS 2021

  36. arXiv:2108.13264  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    Deep Reinforcement Learning at the Edge of the Statistical Precipice

    Authors: Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

    Abstract: Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Lea…

    Submitted 5 January, 2022; v1 submitted 30 August, 2021; originally announced August 2021.

    Comments: Outstanding Paper Award at NeurIPS 2021. Website: https://agarwl.github.io/rliable. 28 Pages, 33 Figures

  37. arXiv:2108.05828  [pdf, other]

    cs.LG cs.AI stat.ML

    A general class of surrogate functions for stable and efficient reinforcement learning

    Authors: Sharan Vaswani, Olivier Bachem, Simone Totaro, Robert Mueller, Shivam Garg, Matthieu Geist, Marlos C. Machado, Pablo Samuel Castro, Nicolas Le Roux

    Abstract: Common policy gradient methods rely on the maximization of a sequence of surrogate functions. In recent years, many such surrogate functions have been proposed, most without strong theoretical guarantees, leading to algorithms such as TRPO, PPO or MPO. Rather than design yet another surrogate function, we instead propose a general framework (FMA-PG) based on functional mirror ascent that gives ris…

    Submitted 30 October, 2023; v1 submitted 12 August, 2021; originally announced August 2021.

    Comments: Fixed minor typos

  38. arXiv:2106.08229  [pdf, other]

    cs.LG cs.AI

    MICo: Improved representations via sampling-based state similarity for Markov decision processes

    Authors: Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland

    Abstract: We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents. While existing notions of state similarity are typically difficult to learn at scale due to high computational cost and lack of sample-based algorithms, our newly-proposed…

    Submitted 21 January, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

    Comments: Published at NeurIPS 2021

  39. arXiv:2105.07530  [pdf, other]

    hep-ex cs.LG hep-ph physics.data-an

    Advances in Multi-Variate Analysis Methods for New Physics Searches at the Large Hadron Collider

    Authors: Anna Stakia, Tommaso Dorigo, Giovanni Banelli, Daniela Bortoletto, Alessandro Casa, Pablo de Castro, Christophe Delaere, Julien Donini, Livio Finos, Michele Gallinaro, Andrea Giammanco, Alexander Held, Fabricio Jiménez Morales, Grzegorz Kotkowski, Seng Pei Liew, Fabio Maltoni, Giovanna Menardi, Ioanna Papavergou, Alessia Saggio, Bruno Scarpa, Giles C. Strong, Cecilia Tosciri, João Varela, Pietro Vischia, Andreas Weiler

    Abstract: Between the years 2015 and 2019, members of the Horizon 2020-funded Innovative Training Network named "AMVA4NewPhysics" studied the customization and application of advanced multivariate analysis methods and statistical learning tools to high-energy physics problems, as well as developed entirely new ones. Many of those methods were successfully used to improve the sensitivity of data analyses per…

    Submitted 22 November, 2021; v1 submitted 16 May, 2021; originally announced May 2021.

    Comments: 101 pages, 21 figures, submitted to Elsevier. [v2]: Updated to published version (in 'Reviews in Physics')

    Journal ref: Rev. Phys. 7 (2021) 100063

  40. arXiv:2104.14353  [pdf, other]

    eess.IV cs.CV

    A Smartphone based Application for Skin Cancer Classification Using Deep Learning with Clinical Images and Lesion Information

    Authors: Breno Krohling, Pedro B. C. Castro, Andre G. C. Pacheco, Renato A. Krohling

    Abstract: Over the last decades, the incidence of skin cancer, melanoma and non-melanoma, has increased at a continuous rate. In particular for melanoma, the deadliest type of skin cancer, early detection is important to improve patient prognosis. Recently, deep neural networks (DNNs) have become viable to deal with skin cancer detection. In this work, we present a smartphone-based application to assist on…

    Submitted 28 April, 2021; originally announced April 2021.

  41. arXiv:2102.01514  [pdf, other]

    cs.LG cs.AI stat.ML

    Metrics and continuity in reinforcement learning

    Authors: Charline Le Lan, Marc G. Bellemare, Pablo Samuel Castro

    Abstract: In most practical applications of reinforcement learning, it is untenable to maintain direct estimates for individual states; in continuous-state systems, it is impossible. Instead, researchers often leverage state similarity (whether explicitly or implicitly) to build models that can generalize well from a limited set of samples. The notion of state similarity used, and the neighbourhoods and top…

    Submitted 2 February, 2021; originally announced February 2021.

    Comments: Accepted at AAAI 2021

  42. arXiv:2101.08169  [pdf, other]

    cs.AI

    mt5se: An Open Source Framework for Building Autonomous Trading Robots

    Authors: Paulo André Lima de Castro

    Abstract: Autonomous trading robots have been studied in artificial intelligence area for quite some time. Many AI techniques have been tested for building autonomous agents able to trade financial assets. These initiatives include traditional neural networks, fuzzy logic, reinforcement learning but also more recent approaches like deep neural networks and deep reinforcement learning. Many developers claim…

    Submitted 28 June, 2022; v1 submitted 20 January, 2021; originally announced January 2021.

    Comments: This paper replaces an old version of the framework, called mt5b3, which is now deprecated

  43. arXiv:2101.07217  [pdf, other]

    cs.SE cs.AI cs.CE

    Is it a great Autonomous FX Trading Strategy or you are just fooling yourself

    Authors: Murilo Sibrao Bernardini, Paulo Andre Lima de Castro

    Abstract: In this paper, we propose a method for evaluating autonomous trading strategies that provides realistic expectations, regarding the strategy's long-term performance. This method addresses many pitfalls that currently fool even experienced software developers and researchers, not to mention the customers that purchase these products. We present the results of applying our meth…

    Submitted 19 November, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

    Comments: An Implementation of the proposed method: STSE is available at github. The paper includes the link in the reference section

  44. arXiv:2101.05265  [pdf, other]

    cs.LG cs.AI stat.ML

    Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

    Authors: Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare

    Abstract: Reinforcement learning methods trained on few environments rarely learn policies that generalize to unseen environments. To improve generalization, we incorporate the inherent sequential structure in reinforcement learning into the representation learning process. This approach is orthogonal to recent approaches, which rarely exploit this structure explicitly. Specifically, we introduce a theoreti…

    Submitted 18 March, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

    Comments: ICLR 2021 (Spotlight). Website: https://agarwl.github.io/pse

  45. arXiv:2011.14826  [pdf, other]

    cs.LG cs.AI

    Revisiting Rainbow: Promoting more Insightful and Inclusive Deep Reinforcement Learning Research

    Authors: Johan S. Obando-Ceron, Pablo Samuel Castro

    Abstract: Since the introduction of DQN, a vast majority of reinforcement learning research has focused on reinforcement learning with deep neural networks as function approximators. New methods are typically evaluated on a set of environments that have now become standard, such as Atari 2600 games. While these benchmarks help standardize evaluation, their computational cost has the unfortunate side effect…

    Submitted 21 May, 2021; v1 submitted 20 November, 2020; originally announced November 2020.

    Comments: Proceedings of the 38th International Conference on Machine Learning (ICML 2021)

  46. arXiv:2011.05158  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    GANterpretations

    Authors: Pablo Samuel Castro

    Abstract: Since the introduction of Generative Adversarial Networks (GANs) [Goodfellow et al., 2014] there has been a regular stream of both technical advances (e.g., Arjovsky et al. [2017]) and creative uses of these generative models (e.g., [Karras et al., 2019, Zhu et al., 2017, ** et al., 2017]). In this work we propose an approach for using the power of GANs to automatically generate videos to accompa…

    Submitted 6 November, 2020; originally announced November 2020.

    Comments: In 4th Workshop on Machine Learning for Creativity and Design at NeurIPS 2020, Vancouver, Canada

  47. arXiv:2007.09121  [pdf, other]

    stat.ML cs.LG hep-ex hep-ph physics.data-an

    Dealing with Nuisance Parameters using Machine Learning in High Energy Physics: a Review

    Authors: Tommaso Dorigo, Pablo de Castro

    Abstract: In this work we discuss the impact of nuisance parameters on the effectiveness of machine learning in high-energy physics problems, and provide a review of techniques that allow to include their effect and reduce their impact in the search for optimal selection criteria and variable transformations. The introduction of nuisance parameters complicates the supervised learning task and its correspond…

    Submitted 17 January, 2021; v1 submitted 17 July, 2020; originally announced July 2020.

    Comments: 43 pages, 5 figures. v1: original review manuscript. v2: text improvement/fixes from review process

  48. arXiv:2005.05618  [pdf]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Machine Learning Guided Discovery of Gigantic Magnetocaloric Effect in HoB$_{2}$ Near Hydrogen Liquefaction Temperature

    Authors: Pedro Baptista de Castro, Kensei Terashima, Takafumi D Yamamoto, Zhufeng Hou, Suguru Iwasaki, Ryo Matsumoto, Shintaro Adachi, Yoshito Saito, Peng Song, Hiroyuki Takeya, Yoshihiko Takano

    Abstract: Magnetic refrigeration exploits the magnetocaloric effect which is the entropy change upon application and removal of magnetic fields in materials, providing an alternate path for refrigeration other than the conventional gas cycles. While intensive research has uncovered a vast number of magnetic materials which exhibits large magnetocaloric effect, these properties for a large number of compound…

    Submitted 12 May, 2020; originally announced May 2020.

    Comments: 12 pages including 3 figures and 1 table + 11 pages of supplementary information. Published version available at: https://rdcu.be/b36ep

    Journal ref: NPG Asia Materials 12:35 (2020)

  49. arXiv:2005.02732  [pdf, ps, other]

    cs.MS

    Custom-Precision Mathematical Library Explorations for Code Profiling and Optimization

    Authors: David Defour, Pablo de Oliveira Castro, Matei Istoan, Eric Petit

    Abstract: The typical processors used for scientific computing have fixed-width data-paths. This implies that mathematical libraries were specifically developed to target each of these fixed precisions (binary16, binary32, binary64). However, to address the increasing energy consumption and throughput requirements of scientific applications, library and hardware designers are moving beyond this one-size-fit…

    Submitted 6 May, 2020; originally announced May 2020.

  50. arXiv:1911.11134  [pdf, other]

    cs.LG cs.CV stat.ML

    Rigging the Lottery: Making All Tickets Winners

    Authors: Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen

    Abstract: Many applications require sparse neural networks due to space or inference time restrictions. There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and…

    Submitted 23 July, 2021; v1 submitted 25 November, 2019; originally announced November 2019.

    Comments: Published in Proceedings of the 37th International Conference on Machine Learning. Code can be found in github.com/google-research/rigl

    Journal ref: Proceedings of the 37th International Conference on Machine Learning (2020) 471-481