-
Mixture of Experts in a Mixture of RL settings
Authors:
Timon Willi,
Johan Obando-Ceron,
Jakob Foerster,
Karolina Dziugaite,
Pablo Samuel Castro
Abstract:
Mixtures of Experts (MoEs) have gained prominence in (self-)supervised learning due to their enhanced inference efficiency, adaptability to distributed training, and modularity. Previous research has illustrated that MoEs can significantly boost Deep Reinforcement Learning (DRL) performance by expanding the network's parameter count while reducing dormant neurons, thereby enhancing the model's learning capacity and ability to deal with non-stationarity. In this work, we shed more light on MoEs' ability to deal with non-stationarity and investigate MoEs in DRL settings with "amplified" non-stationarity via multi-task training, providing further evidence that MoEs improve learning capacity. In contrast to previous work, our multi-task results allow us to better understand the underlying causes for the beneficial effect of MoE in DRL training, the impact of the various MoE components, and insights into how best to incorporate them in actor-critic-based DRL networks. Finally, we also confirm results from previous work.
Submitted 26 June, 2024;
originally announced June 2024.
-
On the consistency of hyper-parameter selection in value-based deep reinforcement learning
Authors:
Johan Obando-Ceron,
João G. M. Araújo,
Aaron Courville,
Pablo Samuel Castro
Abstract:
Deep reinforcement learning (deep RL) has achieved tremendous success on various domains through a combination of algorithmic design and careful selection of hyper-parameters. Algorithmic improvements are often the result of iterative enhancements built upon prior approaches, while hyper-parameter choices are typically inherited from previous methods or fine-tuned specifically for the proposed technique. Despite their crucial impact on performance, hyper-parameter choices are frequently overshadowed by algorithmic advancements. This paper conducts an extensive empirical study focusing on the reliability of hyper-parameter selection for value-based deep reinforcement learning agents, including the introduction of a new score to quantify the consistency and reliability of various hyper-parameters. Our findings not only help establish which hyper-parameters are most critical to tune, but also help clarify which tunings remain consistent across different training regimes.
Submitted 2 July, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
In value-based deep reinforcement learning, a pruned network is a good network
Authors:
Johan Obando-Ceron,
Aaron Courville,
Pablo Samuel Castro
Abstract:
Recent work has shown that deep reinforcement learning agents have difficulty in effectively using their network parameters. We leverage prior insights into the advantages of sparse training techniques and demonstrate that gradual magnitude pruning enables value-based agents to maximize parameter effectiveness. This results in networks that yield dramatic performance improvements over traditional networks, using only a small fraction of the full network parameters.
Submitted 25 June, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Mixtures of Experts Unlock Parameter Scaling for Deep RL
Authors:
Johan Obando-Ceron,
Ghada Sokar,
Timon Willi,
Clare Lyle,
Jesse Farebrother,
Jakob Foerster,
Gintare Karolina Dziugaite,
Doina Precup,
Pablo Samuel Castro
Abstract:
The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
Submitted 26 June, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
Small batch deep reinforcement learning
Authors:
Johan Obando-Ceron,
Marc G. Bellemare,
Pablo Samuel Castro
Abstract:
In value-based deep reinforcement learning with replay memories, the batch size parameter specifies how many transitions to sample for each gradient update. Although critical to the learning process, this value is typically not adjusted when proposing new algorithms. In this work we present a broad empirical study that suggests reducing the batch size can result in a number of significant performance gains; this is surprising, as the general tendency when training neural networks is towards larger batch sizes for improved performance. We complement our experimental findings with a set of empirical analyses towards better understanding this phenomenon.
Submitted 5 October, 2023;
originally announced October 2023.
-
Probabilistic Multimodal Depth Estimation Based on Camera-LiDAR Sensor Fusion
Authors:
Johan S. Obando-Ceron,
Victor Romero-Cano,
Sildomar Monteiro
Abstract:
Multi-modal depth estimation is one of the key challenges for endowing autonomous machines with robust robotic perception capabilities. There have been outstanding advances in the development of uni-modal depth estimation techniques based on either monocular cameras, because of their rich resolution, or LiDAR sensors, due to the precise geometric data they provide. However, each of these suffers from some inherent drawbacks, such as high sensitivity to changes in illumination conditions in the case of cameras and limited resolution for the LiDARs. Sensor fusion can be used to combine the merits and compensate for the downsides of these two kinds of sensors. Nevertheless, current fusion methods work at a high level. They process the sensor data streams independently and combine the high-level estimates obtained for each sensor. In this paper, we tackle the problem at a low level, fusing the raw sensor streams, thus obtaining depth estimates which are both dense and precise, and can be used as a unified multi-modal data source for higher level estimation problems.
This work proposes a Conditional Random Field model with multiple geometry and appearance potentials. It seamlessly represents the problem of estimating dense depth maps from camera and LiDAR data. The model can be optimized efficiently using the Conjugate Gradient Squared algorithm. The proposed method was evaluated and compared with the state-of-the-art using the commonly used KITTI benchmark dataset.
Submitted 19 July, 2023;
originally announced July 2023.
-
Bigger, Better, Faster: Human-level Atari with human-level efficiency
Authors:
Max Schwarzer,
Johan Obando-Ceron,
Aaron Courville,
Marc Bellemare,
Rishabh Agarwal,
Pablo Samuel Castro
Abstract:
We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.
Submitted 13 November, 2023; v1 submitted 30 May, 2023;
originally announced May 2023.
-
JaxPruner: A concise library for sparsity research
Authors:
Joo Hyung Lee,
Wonpyo Park,
Nicole Mitchell,
Jonathan Pilault,
Johan Obando-Ceron,
Han-Byul Kim,
Namhoon Lee,
Elias Frantar,
Yun Long,
Amir Yazdanbakhsh,
Shivani Agrawal,
Suvinay Subramanian,
Xin Wang,
Sheng-Chun Kao,
Xingyao Zhang,
Trevor Gale,
Aart Bik,
Woohyun Han,
Milen Ferev,
Zhonglin Han,
Hong-Seok Kim,
Yann Dauphin,
Gintare Karolina Dziugaite,
Pablo Samuel Castro,
Utku Evci
Abstract:
This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the popular optimization library Optax, which, in turn, enables easy integration with existing JAX based libraries. We demonstrate this ease of integration by providing examples in four different codebases: Scenic, t5x, Dopamine and FedJAX and provide baseline experiments on popular benchmarks.
Submitted 18 December, 2023; v1 submitted 27 April, 2023;
originally announced April 2023.
-
Revisiting Rainbow: Promoting more Insightful and Inclusive Deep Reinforcement Learning Research
Authors:
Johan S. Obando-Ceron,
Pablo Samuel Castro
Abstract:
Since the introduction of DQN, a vast majority of reinforcement learning research has focused on reinforcement learning with deep neural networks as function approximators. New methods are typically evaluated on a set of environments that have now become standard, such as Atari 2600 games. While these benchmarks help standardize evaluation, their computational cost has the unfortunate side effect of widening the gap between those with ample access to computational resources, and those without. In this work we argue that, despite the community's emphasis on large-scale environments, the traditional small-scale environments can still yield valuable scientific insights and can help reduce the barriers to entry for underprivileged communities. To substantiate our claims, we empirically revisit the paper which introduced the Rainbow algorithm [Hessel et al., 2018] and present some new insights into the algorithms used by Rainbow.
Submitted 21 May, 2021; v1 submitted 20 November, 2020;
originally announced November 2020.
-
Exploiting the potential of deep reinforcement learning for classification tasks in high-dimensional and unstructured data
Authors:
Johan S. Obando-Ceron,
Victor Romero Cano,
Walter Mayor Toro
Abstract:
This paper presents a framework for efficiently learning feature selection policies which use fewer features to reach a high classification precision on large unstructured data. It uses a Deep Convolutional Autoencoder (DCAE) for learning compact feature spaces, in combination with recently-proposed Reinforcement Learning (RL) algorithms such as Double DQN and Retrace.
Submitted 19 December, 2019;
originally announced December 2019.