Showing 1–3 of 3 results for author: Dissen, Y

  1. arXiv:2406.18928  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Enhanced ASR Robustness to Packet Loss with a Front-End Adaptation Network

    Authors: Yehoshua Dissen, Shiry Yonash, Israel Cohen, Joseph Keshet

    Abstract: In the realm of automatic speech recognition (ASR), robustness in noisy environments remains a significant challenge. Recent ASR models, such as Whisper, have shown promise, but their efficacy in noisy conditions can be further enhanced. This study is focused on recovering from packet loss to improve the word error rate (WER) of ASR models. We propose using a front-end adaptation network connected…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Accepted for publication at INTERSPEECH 2024

  2. arXiv:2204.04166  [pdf, other]

    cs.SD cs.LG eess.AS

    Self-supervised Speaker Diarization

    Authors: Yehoshua Dissen, Felix Kreuk, Joseph Keshet

    Abstract: Over the last few years, deep learning has grown in popularity for speaker verification, identification, and diarization. Inarguably, a significant part of this success is due to the demonstrated effectiveness of their speaker representations. These, however, are heavily dependent on large amounts of annotated data and can be sensitive to new domains. This study proposes an entirely unsupervised d…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Submitted to Interspeech 2022

  3. arXiv:1611.01783  [pdf, other]

    cs.CL cs.SD

    Domain Adaptation For Formant Estimation Using Deep Learning

    Authors: Yehoshua Dissen, Joseph Keshet, Jacob Goldberger, Cynthia Clopper

    Abstract: In this paper we present a domain adaptation technique for formant estimation using a deep network. We first train a deep learning network on a small read speech dataset. We then freeze the parameters of the trained network and use several different datasets to train an adaptation layer that makes the obtained network universal in the sense that it works well for a variety of speakers and speech d…

    Submitted 6 November, 2016; originally announced November 2016.