Skip to main content

Showing 1–8 of 8 results for author: Quitry, F d C

Searching in archive cs. Search in all archives.
.
  1. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2306.12925  [pdf, other

    cs.CL cs.AI cs.SD eess.AS stat.ML

    AudioPaLM: A Large Language Model That Can Speak and Listen

    Authors: Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirović, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats , et al. (5 additional authors not shown)

    Abstract: We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits the… ▽ More

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: Technical report

  3. arXiv:2203.15652  [pdf, other

    eess.AS cs.SD

    CycleGAN-Based Unpaired Speech Dereverberation

    Authors: Hannah Muckenhirn, Aleksandr Safin, Hakan Erdogan, Felix de Chaumont Quitry, Marco Tagliasacchi, Scott Wisdom, John R. Hershey

    Abstract: Typically, neural network-based speech dereverberation models are trained on paired data, composed of a dry utterance and its corresponding reverberant utterance. The main limitation of this approach is that such models can only be trained on large amounts of data and a variety of room impulse responses when the data is synthetically reverberated, since acquiring real paired data is costly. In thi… ▽ More

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: Submitted to Interspeech 2022

  4. arXiv:2203.15578  [pdf, other

    cs.SD cs.LG eess.AS

    Disentangling speech from surroundings with neural embeddings

    Authors: Ahmed Omran, Neil Zeghidour, Zalán Borsos, Félix de Chaumont Quitry, Malcolm Slaney, Marco Tagliasacchi

    Abstract: We present a method to separate speech signals from noisy environments in the embedding space of a neural audio codec. We introduce a new training procedure that allows our model to produce structured encodings of audio waveforms given by embedding vectors, where one part of the embedding vector represents the speech signal, and the rest represent the environment. We achieve this by partitioning t… ▽ More

    Submitted 4 June, 2023; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: Accepted at ICASSP 2023

  5. arXiv:2101.08596  [pdf, other

    cs.SD cs.LG eess.AS

    LEAF: A Learnable Frontend for Audio Classification

    Authors: Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi

    Abstract: Mel-filterbanks are fixed, engineered audio features which emulate human perception and have been used through the history of audio understanding up to today. However, their undeniable qualities are counterbalanced by the fundamental limitations of handmade representations. In this work we show that we can train a single learnable frontend that outperforms mel-filterbanks on a wide range of audio… ▽ More

    Submitted 21 January, 2021; originally announced January 2021.

    Comments: Accepted at ICLR 2021

  6. arXiv:2002.12764  [pdf, other

    eess.AS cs.LG cs.SD stat.ML

    Towards Learning a Universal Non-Semantic Representation of Speech

    Authors: Joel Shor, Aren Jansen, Ronnie Maor, Oran Lang, Omry Tuval, Felix de Chaumont Quitry, Marco Tagliasacchi, Ira Shavitt, Dotan Emanuel, Yinnon Haviv

    Abstract: The ultimate goal of transfer learning is to reduce labeled data requirements by exploiting a pre-existing embedding model trained for different datasets or tasks. The visual and language communities have established benchmarks to compare embeddings, but the speech community has yet to do so. This paper proposes a benchmark for comparing speech representations on non-semantic tasks, and proposes a… ▽ More

    Submitted 6 August, 2020; v1 submitted 25 February, 2020; originally announced February 2020.

    Journal ref: Proceedings of INTERSPEECH 2020

  7. arXiv:1910.11910  [pdf, other

    eess.AS cs.LG cs.SD

    Learning audio representations via phase prediction

    Authors: Félix de Chaumont Quitry, Marco Tagliasacchi, Dominik Roblek

    Abstract: We learn audio representations by solving a novel self-supervised learning task, which consists of predicting the phase of the short-time Fourier transform from its magnitude. A convolutional encoder is used to map the magnitude spectrum of the input waveform to a lower dimensional embedding. A convolutional decoder is then used to predict the instantaneous frequency (i.e., the temporal rate of ch… ▽ More

    Submitted 25 October, 2019; originally announced October 2019.

    Comments: Submitted to ICASSP 2020

  8. arXiv:1905.11796  [pdf, other

    eess.AS cs.LG cs.SD stat.ML

    Self-supervised audio representation learning for mobile devices

    Authors: Marco Tagliasacchi, Beat Gfeller, Félix de Chaumont Quitry, Dominik Roblek

    Abstract: We explore self-supervised models that can be potentially deployed on mobile devices to learn general purpose audio representations. Specifically, we propose methods that exploit the temporal context in the spectrogram domain. One method estimates the temporal gap between two short audio segments extracted at random from the same audio clip. The other methods are inspired by Word2Vec, a popular te… ▽ More

    Submitted 24 May, 2019; originally announced May 2019.