Showing 1–4 of 4 results for author: Nandwana, M K

  1. arXiv:2406.10325  [pdf, other]

    cs.CL cs.LG eess.AS

    Enhancing Multilingual Voice Toxicity Detection with Speech-Text Alignment

    Authors: Joseph Liu, Mahesh Kumar Nandwana, Janne Pylkkönen, Hannes Heikinheimo, Morgan McGuire

    Abstract: Toxicity classification for voice heavily relies on the semantic content of speech. We propose a novel framework that utilizes cross-modal learning to integrate the semantic embedding of text into a multilabel speech toxicity classifier during training. This enables us to incorporate textual information during training while still requiring only audio during inference. We evaluate this classifier…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  2. arXiv:2406.10223  [pdf, other]

    cs.LG cs.SD eess.AS

    Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation

    Authors: Nameer Hirschkind, Xiao Yu, Mahesh Kumar Nandwana, Joseph Liu, Eloi DuBois, Dao Le, Nicolas Thiebaut, Colin Sinclair, Kyle Spence, Charles Shang, Zoe Abrams, Morgan McGuire

    Abstract: We introduce DiffuseST, a low-latency, direct speech-to-speech translation system capable of preserving the input speaker's voice zero-shot while translating from multiple source languages into English. We experiment with the synthesizer component of the architecture, comparing a Tacotron-based synthesizer to a novel diffusion-based synthesizer. We find the diffusion-based synthesizer to improve M…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Published in Interspeech 2024

  3. arXiv:1902.10828  [pdf, ps, other]

    eess.AS cs.SD

    The VOiCES from a Distance Challenge 2019 Evaluation Plan

    Authors: Mahesh Kumar Nandwana, Julien van Hout, Mitchell McLaren, Colleen Richey, Aaron Lawson, Maria Alejandra Barrios

    Abstract: The "VOiCES from a Distance Challenge 2019" is designed to foster research in the area of speaker recognition and automatic speech recognition (ASR) with the special focus on single channel distant/far-field audio, under noisy conditions. The main objectives of this challenge are to: (i) benchmark state-of-the-art technology in the area of speaker recognition and automatic speech recognition (ASR)…

    Submitted 27 February, 2019; originally announced February 2019.

    Comments: Special Session for Interspeech 2019

  4. arXiv:1804.05053  [pdf, other]

    cs.SD eess.AS

    Voices Obscured in Complex Environmental Settings (VOICES) corpus

    Authors: Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeff Hetherly, Cory Stephenson, Karl Ni

    Abstract: This paper introduces the Voices Obscured In Complex Environmental Settings (VOICES) corpus, a freely available dataset under Creative Commons BY 4.0. This dataset will promote speech and signal processing research of speech recorded by far-field microphones in noisy room conditions. Publicly available speech corpora are mostly composed of isolated speech at close-range microphony. A typical appro…

    Submitted 15 May, 2018; v1 submitted 13 April, 2018; originally announced April 2018.

    Comments: Submitted to Interspeech 2018