Showing 1–3 of 3 results for author: Shamsian, A

  1. arXiv:2406.02649

    eess.AS cs.LG cs.SD

    Keyword-Guided Adaptation of Automatic Speech Recognition

    Authors: Aviv Shamsian, Aviv Navon, Neta Glazer, Gill Hetz, Joseph Keshet

    Abstract: Automatic Speech Recognition (ASR) technology has made significant progress in recent years, providing accurate transcription across various domains. However, some challenges remain, especially in noisy environments and specialized jargon. In this paper, we propose a novel approach for improved jargon word recognition by contextual biasing Whisper-based models. We employ a keyword spotting model t…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to InterSpeech 2024

  2. arXiv:2309.08561

    eess.AS cs.LG cs.SD

    Open-vocabulary Keyword-spotting with Adaptive Instance Normalization

    Authors: Aviv Navon, Aviv Shamsian, Neta Glazer, Gill Hetz, Joseph Keshet

    Abstract: Open vocabulary keyword spotting is a crucial and challenging task in automatic speech recognition (ASR) that focuses on detecting user-defined keywords within a spoken utterance. Keyword spotting methods commonly map the audio utterance and keyword into a joint embedding space to obtain some affinity score. In this work, we propose AdaKWS, a novel method for keyword spotting in which a text encod…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: Under Review

  3. arXiv:2306.03258

    eess.AS cs.SD

    LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading

    Authors: Yochai Yemini, Aviv Shamsian, Lior Bracha, Sharon Gannot, Ethan Fetaya

    Abstract: Lip-to-speech involves generating a natural-sounding speech synchronized with a soundless video of a person talking. Despite recent advances, current methods still cannot produce high-quality speech with high levels of intelligibility for challenging and realistic datasets such as LRS3. In this work, we present LipVoicer, a novel method that generates high-quality speech, even for in-the-wild and…

    Submitted 28 March, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: ICLR 2024