Showing 1–4 of 4 results for author: Teboul, O

  1. arXiv:2209.03143  [pdf, other]

    cs.SD cs.LG eess.AS

    AudioLM: a Language Modeling Approach to Audio Generation

    Authors: Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, Neil Zeghidour

    Abstract: We introduce AudioLM, a framework for high-quality audio generation with long-term consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts audio generation as a language modeling task in this representation space. We show how existing audio tokenizers provide different trade-offs between reconstruction quality and long-term structure, and we propose a hybrid tokenizati…

    Submitted 25 July, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

  2. arXiv:2105.13802  [pdf, other]

    cs.SD cs.LG eess.AS

    DIVE: End-to-end Speech Diarization via Iterative Speaker Embedding

    Authors: Neil Zeghidour, Olivier Teboul, David Grangier

    Abstract: We introduce DIVE, an end-to-end speaker diarization algorithm. Our neural algorithm presents the diarization task as an iterative process: it repeatedly builds a representation for each speaker before predicting the voice activity of each speaker conditioned on the extracted representations. This strategy intrinsically resolves the speaker ordering ambiguity without requiring the classical permut…

    Submitted 28 May, 2021; originally announced May 2021.

  3. arXiv:2103.09879  [pdf, other]

    cs.SD cs.AI eess.AS

    Self-Supervised Learning of Audio Representations from Permutations with Differentiable Ranking

    Authors: Andrew N Carr, Quentin Berthet, Mathieu Blondel, Olivier Teboul, Neil Zeghidour

    Abstract: Self-supervised pre-training using so-called "pretext" tasks has recently shown impressive performance across a wide range of modalities. In this work, we advance self-supervised learning from permutations, by pre-training a model to reorder shuffled parts of the spectrogram of an audio signal, to improve downstream classification performance. We make two main contributions. First, we overcome the…

    Submitted 17 March, 2021; originally announced March 2021.

  4. arXiv:2101.08596  [pdf, other]

    cs.SD cs.LG eess.AS

    LEAF: A Learnable Frontend for Audio Classification

    Authors: Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi

    Abstract: Mel-filterbanks are fixed, engineered audio features which emulate human perception and have been used through the history of audio understanding up to today. However, their undeniable qualities are counterbalanced by the fundamental limitations of handmade representations. In this work we show that we can train a single learnable frontend that outperforms mel-filterbanks on a wide range of audio…

    Submitted 21 January, 2021; originally announced January 2021.

    Comments: Accepted at ICLR 2021