Showing 1–25 of 25 results for author: Defossez, A

  1. arXiv:2406.02315  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    An Independence-promoting Loss for Music Generation with Language Models

    Authors: Jean-Marie Lemercier, Simon Rouard, Jade Copet, Yossi Adi, Alexandre Défossez

    Abstract: Music generation schemes using language modeling rely on a vocabulary of audio tokens, generally provided as codes in a discrete latent space learnt by an auto-encoder. Multi-stage quantizers are often employed to produce these tokens, therefore the decoding strategy used for token prediction must be adapted to account for multiple codebooks: either it should model the joint distribution over all…

    Submitted 9 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024

  2. arXiv:2401.17264  [pdf, other]

    cs.SD cs.AI cs.CR

    Proactive Detection of Voice Cloning with Localized Watermarking

    Authors: Robin San Roman, Pierre Fernandez, Alexandre Défossez, Teddy Furon, Tuan Tran, Hady Elsahar

    Abstract: In the rapidly evolving field of speech generative models, there is a pressing need to ensure audio authenticity against the risks of voice cloning. We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech. AudioSeal employs a generator/detector architecture trained jointly with a localization loss to enable localized waterma…

    Submitted 6 June, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: Published at ICML 2024. Code at https://github.com/facebookresearch/audioseal - webpage at https://pierrefdz.github.io/publications/audioseal/

  3. arXiv:2401.04577  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Masked Audio Generation using a Single Non-Autoregressive Transformer

    Authors: Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi

    Abstract: We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens. Unlike prior work, MAGNeT is comprised of a single-stage, non-autoregressive transformer. During training, we predict spans of masked tokens obtained from a masking scheduler, while during inference we gradually construct the output sequence using several decoding steps. T…

    Submitted 5 March, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

  4. arXiv:2308.12950  [pdf, other]

    cs.CL

    Code Llama: Open Foundation Models for Code

    Authors: Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, et al. (1 additional author not shown)

    Abstract: We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama…

    Submitted 31 January, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

  5. arXiv:2308.06979  [pdf, other]

    eess.AS cs.SD

    The Sound Demixing Challenge 2023 – Music Demixing Track

    Authors: Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, Weihsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang, et al. (2 additional authors not shown)

    Abstract: This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce t…

    Submitted 19 April, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: Published in Transactions of the International Society for Music Information Retrieval (https://transactions.ismir.net/articles/10.5334/tismir.171)

    Journal ref: Transactions of the International Society for Music Information Retrieval, 7(1), pp.63-84, 2024

  6. arXiv:2308.02560  [pdf, other]

    cs.SD cs.LG eess.AS

    From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion

    Authors: Robin San Roman, Yossi Adi, Antoine Deleforge, Romain Serizel, Gabriel Synnaeve, Alexandre Défossez

    Abstract: Deep generative models can generate high-fidelity audio conditioned on various types of representations (e.g., mel-spectrograms, Mel-frequency Cepstral Coefficients (MFCC)). Recently, such models have been used to synthesize audio waveforms conditioned on highly compressed representations. Although such methods produce impressive results, they are prone to generate audible artifacts when the condi…

    Submitted 8 November, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: 10 pages

    Journal ref: Thirty-seventh Conference on Neural Information Processing Systems (2023)

  7. arXiv:2306.05284  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Simple and Controllable Music Generation

    Authors: Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez

    Abstract: We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models, e.g., hierarchicall…

    Submitted 29 January, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: Published at NeurIPS 2023

  8. arXiv:2305.13009  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Textually Pretrained Speech Language Models

    Authors: Michael Hassid, Tal Remez, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Defossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz, Yossi Adi

    Abstract: Speech language models (SpeechLMs) process and generate acoustic data only, without textual supervision. In this work, we propose TWIST, a method for training SpeechLMs using a warm-start from a pretrained textual language model. We show using both automatic and human evaluations that TWIST outperforms a cold-start SpeechLM across the board. We empirically analyze the effect of different model de…

    Submitted 30 January, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  9. arXiv:2211.08553  [pdf, other]

    eess.AS cs.SD

    Hybrid Transformers for Music Source Separation

    Authors: Simon Rouard, Francisco Massa, Alexandre Défossez

    Abstract: A natural question arising in Music Source Separation (MSS) is whether long-range contextual information is useful, or whether local acoustic features are sufficient. In other fields, attention-based Transformers have shown their ability to integrate information over long sequences. In this work, we introduce Hybrid Transformer Demucs (HT Demucs), a hybrid temporal/spectral bi-U-Net based on Hybr…

    Submitted 15 November, 2022; originally announced November 2022.

  10. arXiv:2211.01223  [pdf, other]

    cs.SD eess.AS

    Audio Language Modeling using Perceptually-Guided Discrete Representations

    Authors: Felix Kreuk, Yaniv Taigman, Adam Polyak, Jade Copet, Gabriel Synnaeve, Alexandre Défossez, Yossi Adi

    Abstract: In this work, we study the task of Audio Language Modeling, in which we aim at learning probabilistic models for audio that can be used for generation and completion. We use a state-of-the-art perceptually-guided audio compression model to encode audio to discrete representations. Next, we train a transformer-based causal language model using these representations. At inference time, we perform a…

    Submitted 4 November, 2022; v1 submitted 2 November, 2022; originally announced November 2022.

  11. arXiv:2210.13438  [pdf, other]

    eess.AS cs.AI cs.SD stat.ML

    High Fidelity Neural Audio Compression

    Authors: Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi

    Abstract: We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks. It consists of a streaming encoder-decoder architecture with a quantized latent space, trained in an end-to-end fashion. We simplify and speed up training by using a single multiscale spectrogram adversary that efficiently reduces artifacts and produces high-quality samples. We introduce a novel loss…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: Preprint

  12. arXiv:2209.15352  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    AudioGen: Textually Guided Audio Generation

    Authors: Felix Kreuk, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Alexandre Défossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi

    Abstract: We tackle the problem of generating audio samples conditioned on descriptive text captions. In this work, we propose AudioGen, an auto-regressive generative model that generates audio samples conditioned on text inputs. AudioGen operates on a learnt discrete audio representation. The task of text-to-audio generation poses multiple challenges. Due to the way audio travels through a medium, differe…

    Submitted 5 March, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: Accepted to ICLR 2023

  13. arXiv:2208.12266  [pdf, other]

    eess.AS cs.AI cs.LG q-bio.NC

    Decoding speech perception from non-invasive brain recordings

    Authors: Alexandre Défossez, Charlotte Caucheteux, Jérémy Rapin, Ori Kabeli, Jean-Rémi King

    Abstract: Decoding speech from brain activity is a long-awaited goal in both healthcare and neuroscience. Invasive devices have recently led to major milestones in that regard: deep learning algorithms trained on intracranial recordings now start to decode elementary linguistic features (e.g. letters, words, spectrograms). However, extending this approach to natural speech and non-invasive brain recordings…

    Submitted 5 October, 2023; v1 submitted 25 August, 2022; originally announced August 2022.

    Comments: updated version following publication in Nature Machine Intelligence (2023)

  14. arXiv:2206.15423  [pdf, other]

    cs.SD cs.LG eess.AS

    Implicit Neural Spatial Filtering for Multichannel Source Separation in the Waveform Domain

    Authors: Dejan Markovic, Alexandre Defossez, Alexander Richard

    Abstract: We present a single-stage causal waveform-to-waveform multichannel model that can separate moving sound sources based on their broad spatial locations in a dynamic acoustic scene. We divide the scene into two spatial regions containing, respectively, the target and the interfering sound sources. The model is trained end-to-end and performs spatial processing implicitly, without any components base…

    Submitted 30 June, 2022; originally announced June 2022.

    Comments: Interspeech 2022

  15. arXiv:2111.03600  [pdf, other]

    eess.AS cs.SD stat.ML

    Hybrid Spectrogram and Waveform Source Separation

    Authors: Alexandre Défossez

    Abstract: Source separation models work in either the spectrogram or the waveform domain. In this work, we show how to perform end-to-end hybrid source separation, letting the model decide which domain is best suited for each source, and even combining both. The proposed hybrid version of the Demucs architecture won the Music Demixing Challenge 2021 organized by Sony. This architecture also comes with additiona…

    Submitted 30 August, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: ISMIR 2021 MDX Workshop, 11 pages, 2 figures

  16. Music Demixing Challenge 2021

    Authors: Yuki Mitsufuji, Giorgio Fabbro, Stefan Uhlich, Fabian-Robert Stöter, Alexandre Défossez, Minseok Kim, Woosung Choi, Chin-Yun Yu, Kin-Wai Cheuk

    Abstract: Music source separation has been intensively studied in the last decade and tremendous progress with the advent of deep learning could be observed. Evaluation campaigns such as MIREX or SiSEC connected state-of-the-art models and corresponding papers, which can help researchers integrate the best practices into their models. In recent years, the widely used MUSDB18 dataset played an important role…

    Submitted 23 May, 2022; v1 submitted 30 August, 2021; originally announced August 2021.

    Journal ref: Frontiers in Signal Processing, 28 January 2022

  17. arXiv:2104.09987  [pdf, other]

    stat.ML cs.AI cs.LG

    Differentiable Model Compression via Pseudo Quantization Noise

    Authors: Alexandre Défossez, Yossi Adi, Gabriel Synnaeve

    Abstract: We propose DiffQ, a differentiable method for model compression for quantizing model parameters without gradient approximations (e.g., Straight Through Estimator). We suggest adding independent pseudo quantization noise to model parameters during training to approximate the effect of a quantization operator. DiffQ is differentiable both with respect to the unquantized weights and the number of bits…

    Submitted 17 October, 2022; v1 submitted 20 April, 2021; originally announced April 2021.

    Comments: final TMLR version

  18. arXiv:2103.02339  [pdf, other]

    q-bio.NC cs.LG cs.NE

    Deep Recurrent Encoder: A scalable end-to-end network to model brain signals

    Authors: Omar Chehab, Alexandre Defossez, Jean-Christophe Loiseau, Alexandre Gramfort, Jean-Remi King

    Abstract: Understanding how the brain responds to sensory inputs is challenging: brain recordings are partial, noisy, and high dimensional; they vary across sessions and subjects and they capture highly nonlinear dynamics. These challenges have led the community to develop a variety of preprocessing and analytical (almost exclusively linear) methods, each designed to tackle one of these issues. Instead, we…

    Submitted 30 September, 2022; v1 submitted 3 March, 2021; originally announced March 2021.

  19. arXiv:2006.12847  [pdf, ps, other]

    eess.AS cs.LG cs.SD stat.ML

    Real Time Speech Enhancement in the Waveform Domain

    Authors: Alexandre Defossez, Gabriel Synnaeve, Yossi Adi

    Abstract: We present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip-connections. It is optimized on both time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise including stationary and non…

    Submitted 6 September, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: Interspeech 2020 Paper

  20. arXiv:2003.02395  [pdf, other]

    stat.ML cs.LG

    A Simple Convergence Proof of Adam and Adagrad

    Authors: Alexandre Défossez, Léon Bottou, Francis Bach, Nicolas Usunier

    Abstract: We provide a simple proof of convergence covering both the Adam and Adagrad adaptive optimization algorithms when applied to smooth (possibly non-convex) objective functions with bounded gradients. We show that in expectation, the squared norm of the objective gradient averaged over the trajectory has an upper-bound which is explicit in the constants of the problem, parameters of the optimizer, th…

    Submitted 17 October, 2022; v1 submitted 4 March, 2020; originally announced March 2020.

    Comments: final TMLR version

  21. arXiv:1911.13254  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Music Source Separation in the Waveform Domain

    Authors: Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach

    Abstract: Source separation for music is the task of isolating contributions, or stems, from different instruments recorded individually and arranged together to form a song. Such components include voice, bass, drums and any other accompaniments. Contrary to many audio synthesis tasks where the best performances are achieved by models that directly generate the waveform, the state-of-the-art in source…

    Submitted 28 April, 2021; v1 submitted 27 November, 2019; originally announced November 2019.

  22. arXiv:1909.01174  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed

    Authors: Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach

    Abstract: We study the problem of source separation for music using deep learning with four known sources: drums, bass, vocals and other accompaniments. State-of-the-art approaches predict soft masks over mixture spectrograms while methods working on the waveform are lagging behind as measured on the standard MusDB benchmark. Our contribution is twofold. (i) We introduce a simple convolutional and recurren…

    Submitted 3 September, 2019; originally announced September 2019.

  23. arXiv:1810.09785  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    SING: Symbol-to-Instrument Neural Generator

    Authors: Alexandre Défossez, Neil Zeghidour, Nicolas Usunier, Léon Bottou, Francis Bach

    Abstract: Recent progress in deep learning for audio synthesis opens the way to models that directly produce the waveform, shifting away from the traditional paradigm of relying on vocoders or MIDI synthesizers for speech or music generation. Despite their successes, current state-of-the-art neural audio synthesizers such as WaveNet and SampleRNN suffer from prohibitive training and inference times because…

    Submitted 23 October, 2018; originally announced October 2018.

    Journal ref: Conference on Neural Information Processing Systems (NIPS), Dec 2018, Montréal, Canada

  24. arXiv:1711.01761  [pdf, ps, other]

    cs.LG math.OC stat.ML

    AdaBatch: Efficient Gradient Aggregation Rules for Sequential and Parallel Stochastic Gradient Methods

    Authors: Alexandre Défossez, Francis Bach

    Abstract: We study a new aggregation operator for gradients coming from a mini-batch for stochastic gradient (SG) methods that allows a significant speed-up in the case of sparse optimization problems. We call this method AdaBatch and it only requires a few lines of code change compared to regular mini-batch SGD algorithms. We provide a theoretical insight to understand how this new class of algorithms is p…

    Submitted 6 November, 2017; originally announced November 2017.

  25. arXiv:1412.0156  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Constant Step Size Least-Mean-Square: Bias-Variance Trade-offs and Optimal Sampling Distributions

    Authors: Alexandre Défossez, Francis Bach

    Abstract: We consider the least-squares regression problem and provide a detailed asymptotic analysis of the performance of averaged constant-step-size stochastic gradient descent (a.k.a. least-mean-squares). In the strongly-convex case, we provide an asymptotic expansion up to explicit exponentially decaying terms. Our analysis leads to new insights into stochastic approximation algorithms: (a) it gives a…

    Submitted 29 November, 2014; originally announced December 2014.