
Showing 1–4 of 4 results for author: Shafey, L E

  1. arXiv:2402.01828

    cs.CL cs.AI cs.SD eess.AS

    Retrieval Augmented End-to-End Spoken Dialog Models

    Authors: Mingqiu Wang, Izhak Shafran, Hagen Soltau, Wei Han, Yuan Cao, Dian Yu, Laurent El Shafey

    Abstract: We recently developed SLM, a joint speech and language model, which fuses a pretrained foundational speech model and a large language model (LLM), while preserving the in-context learning capability intrinsic to the pretrained LLM. In this paper, we apply SLM to speech dialog applications where the dialog states are inferred directly from the audio signal. Task-oriented dialogs often contain dom…

    Submitted 2 February, 2024; originally announced February 2024.

    Journal ref: Proc. ICASSP 2024

  2. arXiv:2306.07944

    eess.AS cs.AI cs.CL

    Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding

    Authors: Mingqiu Wang, Izhak Shafran, Hagen Soltau, Wei Han, Yuan Cao, Dian Yu, Laurent El Shafey

    Abstract: Large Language Models (LLMs) have been applied in the speech domain, often incurring a performance drop due to misalignment between speech and language representations. To bridge this gap, we propose a joint speech and language model (SLM) using a Speech2Text adapter, which maps speech into text token embedding space without speech information loss. Additionally, using a CTC-based blank-filtering, w…

    Submitted 8 June, 2023; originally announced June 2023.

  3. arXiv:2110.15222

    cs.CL cs.SD eess.AS

    Word-level confidence estimation for RNN transducers

    Authors: Mingqiu Wang, Hagen Soltau, Laurent El Shafey, Izhak Shafran

    Abstract: Confidence estimate is an often requested feature in applications such as medical transcription where errors can impact patient care and the confidence estimate could be used to alert medical professionals to verify potential errors in recognition. In this paper, we present a lightweight neural confidence model tailored for Automatic Speech Recognition (ASR) system with Recurrent Neural Network…

    Submitted 28 September, 2021; originally announced October 2021.

    Journal ref: Proc. ASRU 2021

  4. arXiv:1907.05337

    cs.CL cs.SD eess.AS

    Joint Speech Recognition and Speaker Diarization via Sequence Transduction

    Authors: Laurent El Shafey, Hagen Soltau, Izhak Shafran

    Abstract: Speech applications dealing with conversations require not only recognizing the spoken words, but also determining who spoke when. The task of assigning words to speakers is typically addressed by merging the outputs of two separate systems, namely, an automatic speech recognition (ASR) system and a speaker diarization (SD) system. The two systems are trained independently with different objective…

    Submitted 8 July, 2019; originally announced July 2019.

    Journal ref: Proc. Interspeech 2019