Showing 1–16 of 16 results for author: Brakel, P

  1. arXiv:2402.05546  [pdf, other]

    cs.LG cs.AI cs.RO

    Offline Actor-Critic Reinforcement Learning Scales to Large Models

    Authors: Jost Tobias Springenberg, Abbas Abdolmaleki, Jingwei Zhang, Oliver Groth, Michael Bloesch, Thomas Lampe, Philemon Brakel, Sarah Bechtle, Steven Kapturowski, Roland Hafner, Nicolas Heess, Martin Riedmiller

    Abstract: We show that offline actor-critic reinforcement learning can scale to large models - such as transformers - and follows similar scaling laws as supervised learning. We find that offline actor-critic algorithms can outperform strong, supervised, behavioral cloning baselines for multi-task training on a large dataset containing both sub-optimal and expert behavior on 132 continuous control tasks. We…

    Submitted 8 February, 2024; originally announced February 2024.

  2. arXiv:2203.17138  [pdf, other]

    cs.RO cs.AI cs.LG

    Imitate and Repurpose: Learning Reusable Robot Movement Skills From Human and Animal Behaviors

    Authors: Steven Bohez, Saran Tunyasuvunakool, Philemon Brakel, Fereshteh Sadeghi, Leonard Hasenclever, Yuval Tassa, Emilio Parisotto, Jan Humplik, Tuomas Haarnoja, Roland Hafner, Markus Wulfmeier, Michael Neunert, Ben Moran, Noah Siegel, Andrea Huber, Francesco Romano, Nathan Batchelor, Federico Casarini, Josh Merel, Raia Hadsell, Nicolas Heess

    Abstract: We investigate the use of prior knowledge of human and animal movement to learn reusable locomotion skills for real legged robots. Our approach builds upon previous work on imitating human or dog Motion Capture (MoCap) data to learn a movement skill module. Once learned, this skill module can be reused for complex downstream tasks. Importantly, due to the prior imposed by the MoCap data, our appro…

    Submitted 31 March, 2022; originally announced March 2022.

    Comments: 30 pages, 9 figures, 8 tables, 14 videos at https://bit.ly/robot-npmp , submitted to Science Robotics

  3. arXiv:2111.00262  [pdf, other]

    cs.RO cs.LG

    Learning Coordinated Terrain-Adaptive Locomotion by Imitating a Centroidal Dynamics Planner

    Authors: Philemon Brakel, Steven Bohez, Leonard Hasenclever, Nicolas Heess, Konstantinos Bousmalis

    Abstract: Dynamic quadruped locomotion over challenging terrains with precise foot placements is a hard problem for both optimal control methods and Reinforcement Learning (RL). Non-linear solvers can produce coordinated constraint satisfying motions, but often take too long to converge for online application. RL methods can learn dynamic reactive controllers but require carefully tuned shaping rewards to p…

    Submitted 30 October, 2021; originally announced November 2021.

    Comments: A shorter version without appendix was submitted to ICRA 2022

  4. arXiv:1804.00379  [pdf, other]

    cs.LG stat.ML

    Recall Traces: Backtracking Models for Efficient Reinforcement Learning

    Authors: Anirudh Goyal, Philemon Brakel, William Fedus, Soumye Singhal, Timothy Lillicrap, Sergey Levine, Hugo Larochelle, Yoshua Bengio

    Abstract: In many environments only a tiny subset of all states yield high reward. In these cases, few of the interactions with the environment provide a relevant learning signal. Hence, we may want to preferentially train on those high-reward states and the probable trajectories leading to them. To this end, we advocate for the use of a backtracking model that predicts the preceding states that terminate a…

    Submitted 28 January, 2019; v1 submitted 1 April, 2018; originally announced April 2018.

    Comments: Accepted at ICLR 2019

  5. arXiv:1803.10225  [pdf, other]

    eess.AS cs.NE cs.SD eess.SP

    Light Gated Recurrent Units for Speech Recognition

    Authors: Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

    Abstract: A field that has directly benefited from the recent advances in deep learning is Automatic Speech Recognition (ASR). Despite the great achievements of the past decades, however, a natural and robust human-machine speech interaction still appears to be out of reach, especially in challenging environments characterized by significant noise and reverberation. To improve robustness, modern speech reco…

    Submitted 26 March, 2018; originally announced March 2018.

    Comments: Copyright 2018 IEEE

    Journal ref: IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 2, no. 2, pp. 92-102, April 2018

  6. arXiv:1710.05050  [pdf, other]

    stat.ML

    Learning Independent Features with Adversarial Nets for Non-linear ICA

    Authors: Philemon Brakel, Yoshua Bengio

    Abstract: Reliable measures of statistical dependence could be useful tools for learning independent features and performing tasks like source separation using Independent Component Analysis (ICA). Unfortunately, many of such measures, like the mutual information, are hard to estimate and optimize directly. We propose to learn independent features with adversarial objectives which optimize such measures imp…

    Submitted 13 October, 2017; originally announced October 2017.

    Comments: A preliminary version of this work was presented at the ICML 2017 workshop on implicit models

  7. arXiv:1710.00641  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    Improving speech recognition by revising gated recurrent units

    Authors: Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

    Abstract: Speech recognition is largely taking advantage of deep learning, showing that substantial benefits can be obtained by modern Recurrent Neural Networks (RNNs). The most popular RNNs are Long Short-Term Memory (LSTMs), which typically reach state-of-the-art performance in many tasks thanks to their ability to learn long-term dependencies and robustness to vanishing gradients. Nevertheless, LSTMs hav…

    Submitted 29 September, 2017; originally announced October 2017.

  8. arXiv:1703.08471  [pdf, other]

    cs.CL cs.LG

    Batch-normalized joint training for DNN-based distant speech recognition

    Authors: Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

    Abstract: Improving distant speech recognition is a crucial step towards flexible human-machine interfaces. Current technology, however, still exhibits a lack of robustness, especially when adverse acoustic conditions are met. Despite the significant progress made in the last years on both speech enhancement and speech recognition, one potential limitation of state-of-the-art technology lies in composing mo…

    Submitted 24 March, 2017; originally announced March 2017.

    Comments: arXiv admin note: text overlap with arXiv:1703.08002

  9. arXiv:1703.08002  [pdf, other]

    cs.CL cs.LG

    A network of deep neural networks for distant speech recognition

    Authors: Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio

    Abstract: Despite the remarkable progress recently made in distant speech recognition, state-of-the-art technology still suffers from a lack of robustness, especially when adverse acoustic conditions characterized by non-stationary noises and reverberation are met. A prominent limitation of current systems lies in the lack of matching and communication between the various technologies involved in the distan…

    Submitted 23 March, 2017; originally announced March 2017.

  10. arXiv:1701.02720  [pdf, other]

    cs.CL cs.LG stat.ML

    Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks

    Authors: Ying Zhang, Mohammad Pezeshki, Philemon Brakel, Saizheng Zhang, Cesar Laurent, Yoshua Bengio, Aaron Courville

    Abstract: Convolutional Neural Networks (CNNs) are effective models for reducing spectral variations and modeling spectral correlations in acoustic features for automatic speech recognition (ASR). Hybrid speech recognition systems incorporating CNNs with Hidden Markov Models/Gaussian Mixture Models (HMMs/GMMs) have achieved the state-of-the-art in various benchmarks. Meanwhile, Connectionist Temporal Classi…

    Submitted 10 January, 2017; originally announced January 2017.

  11. arXiv:1612.01928  [pdf, other]

    cs.CL cs.CV cs.LG cs.SD stat.ML

    Invariant Representations for Noisy Speech Recognition

    Authors: Dmitriy Serdyuk, Kartik Audhkhasi, Philémon Brakel, Bhuvana Ramabhadran, Samuel Thomas, Yoshua Bengio

    Abstract: Modern automatic speech recognition (ASR) systems need to be robust under acoustic variability arising from environmental, speaker, channel, and recording conditions. Ensuring such robustness to variability is a challenge in modern day neural network-based ASR systems, especially when all types of variability are not seen during training. We attempt to address this problem by encouraging the neura…

    Submitted 27 November, 2016; originally announced December 2016.

    Comments: 5 pages, 1 figure, 1 table, NIPS workshop on end-to-end speech recognition

  12. arXiv:1607.07086  [pdf, other]

    cs.LG

    An Actor-Critic Algorithm for Sequence Prediction

    Authors: Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, Yoshua Bengio

    Abstract: We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We address this problem by introducing a \texti…

    Submitted 3 March, 2017; v1 submitted 24 July, 2016; originally announced July 2016.

  13. arXiv:1511.06456  [pdf, other]

    cs.LG

    Task Loss Estimation for Sequence Prediction

    Authors: Dzmitry Bahdanau, Dmitriy Serdyuk, Philémon Brakel, Nan Rosemary Ke, Jan Chorowski, Aaron Courville, Yoshua Bengio

    Abstract: Often, the performance on a supervised machine learning task is evaluated with a task loss function that cannot be optimized directly. Examples of such loss functions include the classification error, the edit distance and the BLEU score. A common workaround for this problem is to instead optimize a surrogate loss function, such as for instance cross-entropy or hinge loss. In order for…

    Submitted 19 January, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Submitted to ICLR 2016

  14. arXiv:1511.06430  [pdf, other]

    cs.LG

    Deconstructing the Ladder Network Architecture

    Authors: Mohammad Pezeshki, Linxi Fan, Philemon Brakel, Aaron Courville, Yoshua Bengio

    Abstract: The manual labeling of data is and will remain a costly endeavor. For this reason, semi-supervised learning remains a topic of practical importance. The recently proposed Ladder Network is one such approach that has proven to be very successful. In addition to the supervised objective, the Ladder Network also adds an unsupervised objective corresponding to the reconstruction costs of a stack of de…

    Submitted 24 May, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016

  15. arXiv:1510.01378  [pdf, other]

    stat.ML cs.LG cs.NE

    Batch Normalized Recurrent Neural Networks

    Authors: César Laurent, Gabriel Pereyra, Philémon Brakel, Ying Zhang, Yoshua Bengio

    Abstract: Recurrent Neural Networks (RNNs) are powerful models for sequential data that have the potential to learn long-term dependencies. However, they are computationally expensive to train and difficult to parallelize. Recent work has shown that normalizing intermediate representations of neural networks can significantly improve convergence rates in feedforward neural networks. In particular, batch no…

    Submitted 5 October, 2015; originally announced October 2015.

  16. arXiv:1508.04395  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    End-to-End Attention-based Large Vocabulary Speech Recognition

    Authors: Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, Yoshua Bengio

    Abstract: Many of the current state-of-the-art Large Vocabulary Continuous Speech Recognition Systems (LVCSR) are hybrids of neural networks and Hidden Markov Models (HMMs). Most of these systems contain separate components that deal with the acoustic modelling, language modelling and sequence decoding. We investigate a more direct approach in which the HMM is replaced with a Recurrent Neural Network (RNN)…

    Submitted 14 March, 2016; v1 submitted 18 August, 2015; originally announced August 2015.