Showing 51–82 of 82 results for author: Synnaeve, G

  1. arXiv:2005.07394  [pdf, other]

    cs.CL cs.SD eess.AS

    Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model

    Authors: Da-Rong Liu, Chunxi Liu, Frank Zhang, Gabriel Synnaeve, Yatharth Saraf, Geoffrey Zweig

    Abstract: Videos uploaded on social media are often accompanied with textual descriptions. In building automatic speech recognition (ASR) systems for videos, we can exploit the contextual information provided by such video metadata. In this paper, we explore ASR lattice rescoring by selectively attending to the video descriptions. We first use an attention based method to extract contextual vector represent…

    Submitted 15 May, 2020; originally announced May 2020.

  2. arXiv:2002.10336  [pdf, other]

    cs.CL cs.LG eess.AS

    Semi-Supervised Speech Recognition via Local Prior Matching

    Authors: Wei-Ning Hsu, Ann Lee, Gabriel Synnaeve, Awni Hannun

    Abstract: For sequence transduction tasks like speech recognition, a strong structured prior model encodes rich information about the target space, implicitly ruling out invalid sequences by assigning them low probability. In this work, we propose local prior matching (LPM), a semi-supervised objective that distills knowledge from a strong prior (e.g. a language model) to provide learning signal to a discri…

    Submitted 24 February, 2020; originally announced February 2020.

  3. arXiv:2001.09832  [pdf, other]

    cs.LG stat.ML

    Polygames: Improved Zero Learning

    Authors: Tristan Cazenave, Yen-Chi Chen, Guan-Wei Chen, Shi-Yu Chen, Xian-Dong Chiu, Julien Dehos, Maria Elsa, Qucheng Gong, Hengyuan Hu, Vasil Khalidov, Cheng-Ling Li, Hsin-I Lin, Yu-** Lin, Xavier Martinet, Vegard Mella, Jeremy Rapin, Baptiste Roziere, Gabriel Synnaeve, Fabien Teytaud, Olivier Teytaud, Shi-Cheng Ye, Yi-Jun Ye, Shi-Jim Yen, Sergey Zagoruyko

    Abstract: Since DeepMind's AlphaZero, Zero learning quickly became the state-of-the-art method for many board games. It can be improved using a fully convolutional structure (no fully connected layer). Using such an architecture plus global pooling, we can create bots independent of the board size. The training can be made more robust by keeping track of the best checkpoints during the training and by train…

    Submitted 27 January, 2020; originally announced January 2020.

  4. arXiv:2001.09727  [pdf, other]

    cs.CL cs.SD eess.AS

    Scaling Up Online Speech Recognition Using ConvNets

    Authors: Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

    Abstract: We design an online end-to-end speech recognition system based on Time-Depth Separable (TDS) convolutions and Connectionist Temporal Classification (CTC). We improve the core TDS architecture in order to limit the future context and hence reduce latency while maintaining accuracy. The system has almost three times the throughput of a well tuned hybrid ASR baseline while also having lower latency a…

    Submitted 27 January, 2020; originally announced January 2020.

  5. Libri-Light: A Benchmark for ASR with Limited or No Supervision

    Authors: Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdelrahman Mohamed, Emmanuel Dupoux

    Abstract: We introduce a new collection of spoken English audio suitable for training speech recognition systems under limited or no supervision. It is derived from open-source audio books from the LibriVox project. It contains over 60K hours of audio, which is, to our knowledge, the largest freely-available corpus of speech. The audio has been segmented using voice activity detection and is tagged with SNR…

    Submitted 17 December, 2019; originally announced December 2019.

  6. arXiv:1911.08460  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures

    Authors: Gabriel Synnaeve, Qiantong Xu, Jacob Kahn, Tatiana Likhomanenko, Edouard Grave, Vineel Pratap, Anuroop Sriram, Vitaliy Liptchinsky, Ronan Collobert

    Abstract: We study pseudo-labeling for the semi-supervised training of ResNet, Time-Depth Separable ConvNets, and Transformers for speech recognition, with either CTC or Seq2Seq loss functions. We perform experiments on the standard LibriSpeech dataset, and leverage additional unlabeled data from LibriVox through pseudo-labeling. We show that while Transformer-based acoustic models have superior performance…

    Submitted 14 July, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

    Comments: Published at the workshop on Self-supervision in Audio and Speech (SAS) at the 37th International Conference on Machine Learning (ICML 2020), Vienna, Austria

  7. arXiv:1910.10324  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks

    Authors: Andros Tjandra, Chunxi Liu, Frank Zhang, Xiaohui Zhang, Yongqiang Wang, Gabriel Synnaeve, Satoshi Nakamura, Geoffrey Zweig

    Abstract: Deep acoustic models typically receive features in the first layer of the network, and process increasingly abstract representations in the subsequent layers. Here, we propose to feed the input features at multiple depths in the acoustic model. As our motivation is to allow acoustic models to re-examine their input features in light of partial hypotheses we introduce intermediate model heads and l…

    Submitted 13 February, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: Accepted in IEEE ICASSP 2020

  8. arXiv:1910.08809  [pdf, other]

    cs.LG cs.MA stat.ML

    A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning

    Authors: Nicolas Carion, Gabriel Synnaeve, Alessandro Lazaric, Nicolas Usunier

    Abstract: Effective coordination is crucial to solve multi-agent collaborative (MAC) problems. While centralized reinforcement learning methods can optimally solve small MAC instances, they do not scale to large problems and they fail to generalize to scenarios different from those seen during training. In this paper, we consider MAC problems with some intrinsic notion of locality (e.g., geographic proximit…

    Submitted 19 October, 2019; originally announced October 2019.

    Journal ref: NeurIPS 2019

  9. arXiv:1907.09273  [pdf, other]

    cs.AI cs.CL

    Why Build an Assistant in Minecraft?

    Authors: Arthur Szlam, Jonathan Gray, Kavya Srinet, Yacine Jernite, Armand Joulin, Gabriel Synnaeve, Douwe Kiela, Haonan Yu, Zhuoyuan Chen, Siddharth Goyal, Demi Guo, Danielle Rothermel, C. Lawrence Zitnick, Jason Weston

    Abstract: In this document we describe a rationale for a research program aimed at building an open "assistant" in the game Minecraft, in order to make progress on the problems of natural language understanding and learning from dialogue.

    Submitted 25 July, 2019; v1 submitted 22 July, 2019; originally announced July 2019.

  10. arXiv:1906.12266  [pdf, other]

    cs.LG cs.AI stat.ML

    Growing Action Spaces

    Authors: Gregory Farquhar, Laura Gustafson, Zeming Lin, Shimon Whiteson, Nicolas Usunier, Gabriel Synnaeve

    Abstract: In complex tasks, such as those with large combinatorial action spaces, random exploration may be too inefficient to achieve meaningful learning progress. In this work, we use a curriculum of progressively growing action spaces to accelerate learning. We assume the environment is out of our control, but that the agent may set an internal curriculum by initially restricting its action space. Our ap…

    Submitted 28 June, 2019; originally announced June 2019.

  11. arXiv:1906.04323  [pdf, other]

    cs.CL cs.SD eess.AS

    Word-level Speech Recognition with a Letter to Word Encoder

    Authors: Ronan Collobert, Awni Hannun, Gabriel Synnaeve

    Abstract: We propose a direct-to-word sequence model which uses a word network to learn word embeddings from letters. The word network can be integrated seamlessly with arbitrary sequence models including Connectionist Temporal Classification and encoder-decoder models with attention. We show our direct-to-word model can achieve word error rate gains over sub-word level models for speech recognition. We als…

    Submitted 14 July, 2020; v1 submitted 10 June, 2019; originally announced June 2019.

    Comments: ICML 2020

  12. Who Needs Words? Lexicon-Free Speech Recognition

    Authors: Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert

    Abstract: Lexicon-free speech recognition naturally deals with the problem of out-of-vocabulary (OOV) words. In this paper, we show that character-based language models (LM) can perform as well as word-based LMs for speech recognition, in word error rates (WER), even without restricting the decoding to a lexicon. We study character-based LMs and show that convolutional LMs can effectively leverage large (ch…

    Submitted 13 September, 2019; v1 submitted 9 April, 2019; originally announced April 2019.

    Comments: 8 pages, 1 figure

    Journal ref: Proc. Interspeech 2019

  13. arXiv:1902.06022  [pdf, other]

    cs.CL

    A Fully Differentiable Beam Search Decoder

    Authors: Ronan Collobert, Awni Hannun, Gabriel Synnaeve

    Abstract: We introduce a new beam search decoder that is fully differentiable, making it possible to optimize at training time through the inference procedure. Our decoder allows us to combine models which operate at different granularities (e.g. acoustic and language models). It can be used when target sequences are not aligned to input sequences by considering all possible alignments between the two. We d…

    Submitted 15 February, 2019; originally announced February 2019.

  14. wav2letter++: The Fastest Open-source Speech Recognition System

    Authors: Vineel Pratap, Awni Hannun, Qiantong Xu, Jeff Cai, Jacob Kahn, Gabriel Synnaeve, Vitaliy Liptchinsky, Ronan Collobert

    Abstract: This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for maximum efficiency. Here we explain the architecture and design of the wav2letter++ system and compare it to other major open-source speech recognition systems. In some cases wav2letter++ is more than 2x faster th…

    Submitted 18 December, 2018; originally announced December 2018.

  15. arXiv:1812.06864  [pdf, other]

    cs.CL

    Fully Convolutional Speech Recognition

    Authors: Neil Zeghidour, Qiantong Xu, Vitaliy Liptchinsky, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert

    Abstract: Current state-of-the-art speech recognition systems build on recurrent neural networks for acoustic and/or language modeling, and rely on feature extraction pipelines to extract mel-filterbanks or cepstral coefficients. In this paper we present an alternative approach based solely on convolutional neural networks, leveraging recent advances in acoustic models from the raw waveform and language mod…

    Submitted 9 April, 2019; v1 submitted 17 December, 2018; originally announced December 2018.

  16. arXiv:1812.03483  [pdf, ps, other]

    cs.LG cs.CL cs.SD eess.AS stat.ML

    To Reverse the Gradient or Not: An Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition

    Authors: Yossi Adi, Neil Zeghidour, Ronan Collobert, Nicolas Usunier, Vitaliy Liptchinsky, Gabriel Synnaeve

    Abstract: Transcribed datasets typically contain speaker identity for each instance in the data. We investigate two ways to incorporate this information during training: Multi-Task Learning and Adversarial Learning. In multi-task learning, the goal is speaker prediction; we expect a performance improvement with this joint training if the two tasks of speech recognition and speaker recognition share a common…

    Submitted 14 February, 2019; v1 submitted 9 December, 2018; originally announced December 2018.

  17. arXiv:1812.00054  [pdf, other]

    cs.LG cs.AI

    Forward Modeling for Partial Observation Strategy Games - A StarCraft Defogger

    Authors: Gabriel Synnaeve, Zeming Lin, Jonas Gehring, Dan Gant, Vegard Mella, Vasil Khalidov, Nicolas Carion, Nicolas Usunier

    Abstract: We formulate the problem of defogging as state estimation and future state prediction from previous, partial observations in the context of real-time strategy games. We propose to employ encoder-decoder neural networks for this task, and introduce proxy tasks and baselines for evaluation to assess their ability of capturing basic game rules and high-level dynamics. By combining convolutional neura…

    Submitted 30 November, 2018; originally announced December 2018.

    Journal ref: Advances in Neural Information Processing Systems 31 (2018) 10759-10770

  18. arXiv:1811.08568  [pdf, other]

    cs.LG stat.ML

    High-Level Strategy Selection under Partial Observability in StarCraft: Brood War

    Authors: Jonas Gehring, Da Ju, Vegard Mella, Daniel Gant, Nicolas Usunier, Gabriel Synnaeve

    Abstract: We consider the problem of high-level strategy selection in the adversarial setting of real-time strategy games from a reinforcement learning perspective, where taking an action corresponds to switching to the respective strategy. Here, a good strategy successfully counters the opponent's current and possible future strategies which can only be estimated using partial observations. We investigate…

    Submitted 20 November, 2018; originally announced November 2018.

  19. arXiv:1806.07098  [pdf, other]

    cs.CL cs.SD eess.AS

    End-to-End Speech Recognition From the Raw Waveform

    Authors: Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert, Emmanuel Dupoux

    Abstract: State-of-the-art speech recognition systems rely on fixed, hand-crafted features such as mel-filterbanks to preprocess the waveform before the training pipeline. In this paper, we study end-to-end systems trained directly from the raw waveform, building on two alternatives for trainable replacements of mel-filterbanks that use a convolutional architecture. The first one is inspired by gammatone fi…

    Submitted 21 June, 2018; v1 submitted 19 June, 2018; originally announced June 2018.

    Comments: Accepted for presentation at Interspeech 2018

  20. arXiv:1805.11199  [pdf, other]

    cs.AI cs.LG

    Value Propagation Networks

    Authors: Nantas Nardelli, Gabriel Synnaeve, Zeming Lin, Pushmeet Kohli, Philip H. S. Torr, Nicolas Usunier

    Abstract: We present Value Propagation (VProp), a set of parameter-efficient differentiable planning modules built on Value Iteration which can successfully be trained using reinforcement learning to solve unseen tasks, has the capability to generalize to larger map sizes, and can learn to navigate in dynamic environments. We show that the modules enable learning to plan when the environment also includes s…

    Submitted 25 March, 2019; v1 submitted 28 May, 2018; originally announced May 2018.

    Comments: Updated to match ICLR 2019 OpenReview's version

  21. arXiv:1712.09444  [pdf, other]

    cs.CL cs.AI

    Letter-Based Speech Recognition with Gated ConvNets

    Authors: Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

    Abstract: In the recent literature, "end-to-end" speech systems often refer to letter-based acoustic models trained in a sequence-to-sequence manner, either via a recurrent model or via a structured output learning approach (such as CTC). In contrast to traditional phone (or senone)-based approaches, these "end-to-end" approaches alleviate the need of word pronunciation modeling, and do not require a "forc…

    Submitted 15 February, 2019; v1 submitted 22 December, 2017; originally announced December 2017.

    Comments: 13 pages. arXiv admin note: text overlap with arXiv:1609.03193

  22. arXiv:1711.01161  [pdf, other]

    cs.CL

    Learning Filterbanks from Raw Speech for Phone Recognition

    Authors: Neil Zeghidour, Nicolas Usunier, Iasonas Kokkinos, Thomas Schatz, Gabriel Synnaeve, Emmanuel Dupoux

    Abstract: We train a bank of complex filters that operates on the raw waveform and is fed into a convolutional neural network for end-to-end phone recognition. These time-domain filterbanks (TD-filterbanks) are initialized as an approximation of mel-filterbanks, and then fine-tuned jointly with the remaining convolutional architecture. We perform phone recognition experiments on TIMIT and show that for seve…

    Submitted 4 April, 2018; v1 submitted 3 November, 2017; originally announced November 2017.

    Comments: Accepted at ICASSP 2018

  23. arXiv:1708.02139  [pdf, other]

    cs.AI

    STARDATA: A StarCraft AI Research Dataset

    Authors: Zeming Lin, Jonas Gehring, Vasil Khalidov, Gabriel Synnaeve

    Abstract: We release a dataset of 65646 StarCraft replays that contains 1535 million frames and 496 million player actions. We provide full game state data along with the original replays that can be viewed in StarCraft. The game state data was recorded every 3 frames which ensures suitability for a wide variety of machine learning tasks such as strategy classification, inverse reinforcement learning, imita…

    Submitted 7 August, 2017; originally announced August 2017.

    Comments: To be presented at AIIDE17

  24. arXiv:1703.05407  [pdf, other]

    cs.LG

    Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play

    Authors: Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, Rob Fergus

    Abstract: We describe a simple scheme that allows an agent to learn about its environment in an unsupervised manner. Our scheme pits two versions of the same agent, Alice and Bob, against one another. Alice proposes a task for Bob to complete; and then Bob attempts to complete the task. In this work we will focus on two kinds of environments: (nearly) reversible environments and environments that can be res…

    Submitted 27 April, 2018; v1 submitted 15 March, 2017; originally announced March 2017.

    Comments: Published in ICLR 2018

  25. arXiv:1611.00625  [pdf, other]

    cs.LG cs.AI

    TorchCraft: a Library for Machine Learning Research on Real-Time Strategy Games

    Authors: Gabriel Synnaeve, Nantas Nardelli, Alex Auvolat, Soumith Chintala, Timothée Lacroix, Zeming Lin, Florian Richoux, Nicolas Usunier

    Abstract: We present TorchCraft, a library that enables deep learning research on Real-Time Strategy (RTS) games such as StarCraft: Brood War, by making it easier to control these games from a machine learning framework, here Torch. This white paper argues for using RTS games as a benchmark for AI research, and describes the design and components of TorchCraft.

    Submitted 3 November, 2016; v1 submitted 1 November, 2016; originally announced November 2016.

    ACM Class: I.2.1

  26. arXiv:1609.03193  [pdf, other]

    cs.LG cs.AI cs.CL

    Wav2Letter: an End-to-End ConvNet-based Speech Recognition System

    Authors: Ronan Collobert, Christian Puhrsch, Gabriel Synnaeve

    Abstract: This paper presents a simple end-to-end model for speech recognition, combining a convolutional network based acoustic model and a graph decoding. It is trained to output letters, with transcribed speech, without the need for force alignment of phonemes. We introduce an automatic segmentation criterion for training from sequence annotation without alignment that is on par with CTC while being simp…

    Submitted 12 September, 2016; v1 submitted 11 September, 2016; originally announced September 2016.

    Comments: 8 pages, 4 figures (7 plots/schemas), 2 tables (4 tabulars)

    ACM Class: I.2.6; I.2.7

  27. arXiv:1609.02993  [pdf, other]

    cs.AI cs.LG

    Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks

    Authors: Nicolas Usunier, Gabriel Synnaeve, Zeming Lin, Soumith Chintala

    Abstract: We consider scenarios from the real-time strategy game StarCraft as new benchmarks for reinforcement learning algorithms. We propose micromanagement tasks, which present the problem of the short-term, low-level control of army members during a battle. From a reinforcement learning point of view, these scenarios are challenging because the state-action space is very large, and because there is no o…

    Submitted 26 November, 2016; v1 submitted 9 September, 2016; originally announced September 2016.

    Comments: 18 pages, 1 figure (2 plots), 2 tables

    ACM Class: I.2.1; I.2.6

  28. arXiv:1511.07401  [pdf, other]

    cs.LG cs.AI cs.NE

    MazeBase: A Sandbox for Learning from Games

    Authors: Sainbayar Sukhbaatar, Arthur Szlam, Gabriel Synnaeve, Soumith Chintala, Rob Fergus

    Abstract: This paper introduces MazeBase: an environment for simple 2D games, designed as a sandbox for machine learning approaches to reasoning and planning. Within it, we create 10 simple games embodying a range of algorithmic tasks (e.g. if-then statements or set negation). A variety of neural models (fully connected, convolutional network, memory network) are deployed via reinforcement learning on these…

    Submitted 7 January, 2016; v1 submitted 23 November, 2015; originally announced November 2015.

  29. arXiv:1412.6645  [pdf, other]

    cs.SD cs.CL cs.LG

    Weakly Supervised Multi-Embeddings Learning of Acoustic Models

    Authors: Gabriel Synnaeve, Emmanuel Dupoux

    Abstract: We trained a Siamese network with multi-task same/different information on a speech dataset, and found that it was possible to share a network for both tasks without a loss in performance. The first task was to discriminate between two same or different words, and the second was to discriminate between two same or different talkers.

    Submitted 20 April, 2015; v1 submitted 20 December, 2014; originally announced December 2014.

    Comments: 6 pages, 3 figures

    ACM Class: I.2.6; I.2.7; I.5.1

  30. arXiv:1211.4552  [pdf, other]

    cs.AI

    A Dataset for StarCraft AI & an Example of Armies Clustering

    Authors: Gabriel Synnaeve, Pierre Bessiere

    Abstract: This paper advocates the exploration of the full state of recorded real-time strategy (RTS) games, by human or robotic players, to discover how to reason about tactics and strategy. We present a dataset of StarCraft games encompassing the most of the games' state (not only player's orders). We explain one of the possible usages of this dataset by clustering armies on their compositions. This reduc…

    Submitted 19 November, 2012; originally announced November 2012.

    Comments: Artificial Intelligence in Adversarial Real-Time Games 2012, Palo Alto : United States (2012)

  31. arXiv:1111.3735  [pdf, other]

    cs.LG cs.AI

    A Bayesian Model for Plan Recognition in RTS Games applied to StarCraft

    Authors: Gabriel Synnaeve, Pierre Bessière

    Abstract: The task of keyhole (unobtrusive) plan recognition is central to adaptive game AI. "Tech trees" or "build trees" are the core of real-time strategy (RTS) game strategic (long term) planning. This paper presents a generic and simple Bayesian model for RTS build tree prediction from noisy observations, which parameters are learned from replays (game logs). This unsupervised machine learning approach…

    Submitted 16 November, 2011; originally announced November 2011.

    Comments: 7 pages; Artificial Intelligence and Interactive Digital Entertainment Conference (AIIDE 2011), Palo Alto : United States (2011)

  32. Bayesian Modeling of a Human MMORPG Player

    Authors: Gabriel Synnaeve, Pierre Bessiere

    Abstract: This paper describes an application of Bayesian programming to the control of an autonomous avatar in a multiplayer role-playing game (the example is based on World of Warcraft). We model a particular task, which consists of choosing what to do and to select which target in a situation where allies and foes are present. We explain the model in Bayesian programming and show how we could learn the c…

    Submitted 24 November, 2010; originally announced November 2010.

    Comments: 30th international workshop on Bayesian Inference and Maximum Entropy, Chamonix : France (2010)