
Showing 1–10 of 10 results for author: Dunbar, E

Searching in archive eess.
  1. arXiv:2312.01515  [pdf, other]

    cs.CL cs.SD eess.AS

    Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training

    Authors: Sean Robertson, Ewan Dunbar

    Abstract: It has been generally assumed in the automatic speech recognition (ASR) literature that it is better for models to have access to wider context windows. Yet, many of the potential reasons this might be true in the supervised setting do not necessarily transfer over to the case of unsupervised learning. We investigate how much context is necessary to achieve high-quality pre-trained acoustic models…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: Repository at https://github.com/sdrobert/scpc. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

    ACM Class: I.2.7

  2. arXiv:2310.03018  [pdf, other]

    eess.AS cs.CL cs.SD

    Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages

    Authors: Kuan-Po Huang, Chih-Kai Yang, Yu-Kuan Fu, Ewan Dunbar, Hung-yi Lee

    Abstract: We introduce a new zero resource code-switched speech benchmark designed to directly assess the code-switching capabilities of self-supervised speech encoders. We showcase a baseline system of language modeling on discrete units to demonstrate how the code-switching abilities of speech encoders can be assessed in a zero-resource manner. Our experiments encompass a variety of well-known speech enco…

    Submitted 18 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted by ICASSP 2024 (v2)

  3. arXiv:2210.15775  [pdf, other]

    cs.CL cs.SD eess.AS

    Evaluating context-invariance in unsupervised speech representations

    Authors: Mark Hallap, Emmanuel Dupoux, Ewan Dunbar

    Abstract: Unsupervised speech representations have taken off, with benchmarks (SUPERB, ZeroSpeech) demonstrating major progress on semi-supervised speech recognition, speech synthesis, and speech-only language modelling. Inspiration comes from the promise of "discovering the phonemes" of a language or a similar low-bitrate encoding. However, one of the critical properties of phoneme transcriptions is cont…

    Submitted 30 May, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: INTERSPEECH 2023

  4. arXiv:2210.15759  [pdf, other]

    cs.CL cs.SD eess.AS

    Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge

    Authors: Ewan Dunbar, Nicolas Hamilakis, Emmanuel Dupoux

    Abstract: Recent progress in self-supervised or unsupervised machine learning has opened the possibility of building a full speech processing system from raw audio without using any textual representations or expert labels such as phonemes, dictionaries or parse trees. The contribution of the Zero Resource Speech Challenge series since 2015 has been to break down this long-term objective into four well-defi…

    Submitted 27 October, 2022; originally announced October 2022.

    Journal ref: IEEE Journal of Selected Topics in Signal Processing, Volume 16, Issue 6, pages 1211-1226, October 2022. Print ISSN: 1932-4553. Online ISSN: 1941-0484. Digital Object Identifier: 10.1109/JSTSP.2022.3206084

  5. arXiv:2205.15823  [pdf, other]

    cs.CL cs.SD eess.AS

    Predicting non-native speech perception using the Perceptual Assimilation Model and state-of-the-art acoustic models

    Authors: Juliette Millet, Ioana Chitoran, Ewan Dunbar

    Abstract: Our native language influences the way we perceive speech sounds, affecting our ability to discriminate non-native sounds. We compare two ideas about the influence of the native language on speech perception: the Perceptual Assimilation Model, which appeals to a mental classification of sounds into native phoneme categories, versus the idea that rich, fine-grained phonetic representations tuned to…

    Submitted 31 May, 2022; originally announced May 2022.

    Journal ref: 2021. In Proceedings of the 25th Conference on Computational Natural Language Learning, pages 661-673, Online. Association for Computational Linguistics

  6. arXiv:2205.15819  [pdf, other]

    cs.CL cs.SD eess.AS

    Do self-supervised speech models develop human-like perception biases?

    Authors: Juliette Millet, Ewan Dunbar

    Abstract: Self-supervised models for speech processing form representational spaces without using any external labels. Increasingly, they appear to be a feasible way of at least partially eliminating costly manual annotations, a problem of particular concern for low-resource languages. But what kind of representational spaces do these models construct? Human perception specializes to the sounds of listeners…

    Submitted 31 May, 2022; originally announced May 2022.

    Journal ref: 2022. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7591-7605, Dublin, Ireland. Association for Computational Linguistics

  7. arXiv:2011.11588  [pdf, other]

    cs.CL cs.SD eess.AS

    The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling

    Authors: Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Evgeny Kharitonov, Alexei Baevski, Ewan Dunbar, Emmanuel Dupoux

    Abstract: We introduce a new unsupervised task, spoken language modeling: the learning of linguistic representations from raw audio signals without any labels, along with the Zero Resource Speech Benchmark 2021: a suite of 4 black-box, zero-shot metrics probing for the quality of the learned models at 4 linguistic levels: phonetics, lexicon, syntax and semantics. We present the results and analyses of a com…

    Submitted 1 December, 2020; v1 submitted 23 November, 2020; originally announced November 2020.

    Comments: 14 pages, including references and supplementary material

  8. arXiv:2005.03418  [pdf, other]

    cs.CL cs.SD eess.AS

    The Perceptimatic English Benchmark for Speech Perception Models

    Authors: Juliette Millet, Ewan Dunbar

    Abstract: We present the Perceptimatic English Benchmark, an open experimental benchmark for evaluating quantitative models of speech perception in English. The benchmark consists of ABX stimuli along with the responses of 91 American English-speaking listeners. The stimuli test discrimination of a large number of English and French phonemic contrasts. They are extracted directly from corpora of read speech…

    Submitted 7 May, 2020; originally announced May 2020.

    Comments: Accepted to CogSci Conference 2020

  9. arXiv:1911.06573  [pdf, other]

    eess.AS cs.CL cs.SD

    Independent and automatic evaluation of acoustic-to-articulatory inversion models

    Authors: Maud Parrot, Juliette Millet, Ewan Dunbar

    Abstract: Reconstruction of articulatory trajectories from the acoustic speech signal has been proposed for improving speech recognition and text-to-speech synthesis. However, to be useful in these settings, articulatory reconstruction must be speaker independent. Furthermore, as most research focuses on single, small datasets with few speakers, robust articulatory reconstruction could profit from combining…

    Submitted 15 November, 2019; originally announced November 2019.

    Comments: 5 pages, 1 figure

  10. arXiv:1904.11469  [pdf, other]

    cs.CL cs.SD eess.AS

    The Zero Resource Speech Challenge 2019: TTS without T

    Authors: Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. Black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux

    Abstract: We present the Zero Resource Speech Challenge 2019, which proposes to build a speech synthesizer without any text or phonetic labels: hence, TTS without T (text-to-speech without text). We provide raw audio for a target voice in an unknown language (the Voice dataset), but no alignment, text or labels. Participants must discover subword units in an unsupervised way (using the Unit Discovery datase…

    Submitted 7 July, 2019; v1 submitted 25 April, 2019; originally announced April 2019.

    Comments: Interspeech 2019