Showing 1–6 of 6 results for author: Saxon, M

  1. arXiv:2403.11092

    cs.CL cs.AI cs.CV cs.CY eess.IV

    Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts

    Authors: Michael Saxon, Yiran Luo, Sharon Levy, Chitta Baral, Yezhou Yang, William Yang Wang

    Abstract: Benchmarks of the multilingual capabilities of text-to-image (T2I) models compare generated images prompted in a test language to an expected image distribution over a concept set. One such benchmark, "Conceptual Coverage Across Languages" (CoCo-CroLa), assesses the tangible noun inventory of T2I models by prompting them to generate pictures from a concept list translated to seven languages and co…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: NAACL 2024 Main Conference

  2. arXiv:2306.01735

    cs.CL cs.AI cs.CV eess.IV

    Multilingual Conceptual Coverage in Text-to-Image Models

    Authors: Michael Saxon, William Yang Wang

    Abstract: We propose "Conceptual Coverage Across Languages" (CoCo-CroLa), a technique for benchmarking the degree to which any generative text-to-image system provides multilingual parity to its training language in terms of tangible nouns. For each model we can assess "conceptual coverage" of a given target language relative to a source language by comparing the population of images generated for a series…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: ACL 2023 main conference; 16 pages, 13 figures

  3. arXiv:2305.10684

    eess.AS cs.SD

    Data Augmentation for Diverse Voice Conversion in Noisy Environments

    Authors: Avani Tanna, Michael Saxon, Amr El Abbadi, William Yang Wang

    Abstract: Voice conversion (VC) models have demonstrated impressive few-shot conversion quality on the clean, native speech populations they're trained on. However, when source or target speech accents, background noise conditions, or microphone characteristics differ from training, quality voice conversion is not guaranteed. These problems are often left unexamined in VC research, giving rise to frustratio…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023 Show and Tell, 2 pp

  4. arXiv:2106.09009

    cs.CL cs.AI cs.SD eess.AS

    End-to-End Spoken Language Understanding for Generalized Voice Assistants

    Authors: Michael Saxon, Samridhi Choudhary, Joseph P. McKenna, Athanasios Mouchtaris

    Abstract: End-to-end (E2E) spoken language understanding (SLU) systems predict utterance semantics directly from speech using a single model. Previous work in this area has focused on targeted tasks in fixed domains, where the output semantic structure is assumed a priori and the input speech is of limited complexity. In this work we present our approach to developing an E2E model for generalized SLU in com…

    Submitted 19 July, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: Accepted to Interspeech 2021; 5 pages, 2 tables, 1 figure

    Journal ref: Proc. Interspeech 2021, 4738-4742

  5. arXiv:2008.02858

    cs.CL cs.SD eess.AS

    Semantic Complexity in End-to-End Spoken Language Understanding

    Authors: Joseph P. McKenna, Samridhi Choudhary, Michael Saxon, Grant P. Strimel, Athanasios Mouchtaris

    Abstract: End-to-end spoken language understanding (SLU) models are a class of model architectures that predict semantics directly from speech. Because of their input and output types, we refer to them as speech-to-interpretation (STI) models. Previous works have successfully applied STI models to targeted use cases, such as recognizing home automation commands, however no study has yet addressed how these…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: Accepted at Interspeech, 2020

  6. arXiv:1911.11360

    eess.AS cs.SD eess.SP

    Robust Estimation of Hypernasality in Dysarthria with Acoustic Model Likelihood Features

    Authors: Michael Saxon, Ayush Tripathi, Yishan Jiao, Julie Liss, Visar Berisha

    Abstract: Hypernasality is a common characteristic symptom across many motor-speech disorders. For voiced sounds, hypernasality introduces an additional resonance in the lower frequencies and, for unvoiced sounds, there is reduced articulatory precision due to air escaping through the nasal cavity. However, the acoustic manifestation of these symptoms is highly variable, making hypernasality estimation very…

    Submitted 5 August, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

    Comments: 12 pages, 9 figures, 2 tables

    Journal ref: IEEE/ACM Trans. on Audio, Speech, and Language Proc. 28 (2020) 2511-2522