
Showing 1–5 of 5 results for author: Rossenbach, N

  1. arXiv:2310.08132

    cs.CL cs.SD eess.AS

    On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition

    Authors: Nick Rossenbach, Benedikt Hilmes, Ralf Schlüter

    Abstract: Synthetic data generated by text-to-speech (TTS) systems can be used to improve automatic speech recognition (ASR) systems in low-resource or domain mismatch tasks. It has been shown that TTS-generated outputs still do not have the same qualities as real data. In this work we focus on the temporal structure of synthetic data and its relation to ASR training. By using a novel oracle setup we show h…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: To appear at ASRU 2023

  2. arXiv:2306.03557

    cs.CL cs.AI

    Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text

    Authors: Parnia Bahar, Mattia Di Gangi, Nick Rossenbach, Mohammad Zeineldeen

    Abstract: Automatic Arabic diacritization is useful in many applications, ranging from reading support for language learners to accurate pronunciation predictor for downstream tasks like speech synthesis. While most of the previous works focused on models that operate on raw non-diacritized text, production systems can gain accuracy by first letting humans partly annotate ambiguous words. In this paper, we…

    Submitted 31 July, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: Keywords: Arabic text diacritization, partially-diacritized text, Arabic natural language processing, Accepted at INTERSPEECH 2023

  3. arXiv:2104.05379

    cs.CL cs.LG

    Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures

    Authors: Nick Rossenbach, Mohammad Zeineldeen, Benedikt Hilmes, Ralf Schlüter, Hermann Ney

    Abstract: Recent publications on automatic-speech-recognition (ASR) have a strong focus on attention encoder-decoder (AED) architectures which tend to suffer from over-fitting in low resource scenarios. One solution to tackle this issue is to generate synthetic data with a trained text-to-speech system (TTS) if additional text is available. This was successfully applied in many publications with AED systems…

    Submitted 13 July, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: Submitted to ASRU 2021

  4. arXiv:1912.09257

    cs.CL cs.LG eess.AS

    Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems

    Authors: Nick Rossenbach, Albert Zeyer, Ralf Schlüter, Hermann Ney

    Abstract: Recent advances in text-to-speech (TTS) led to the development of flexible multi-speaker end-to-end TTS systems. We extend state-of-the-art attention-based automatic speech recognition (ASR) systems with synthetic audio generated by a TTS system trained only on the ASR corpora itself. ASR and TTS systems are built separately to show that text-only data can be used to enhance existing end-to-end AS…

    Submitted 17 February, 2020; v1 submitted 19 December, 2019; originally announced December 2019.

    Comments: Accepted to ICASSP 2020

  5. arXiv:1906.01942

    cs.CL cs.LG

    Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron

    Authors: Yunsu Kim, Hendrik Rosendahl, Nick Rossenbach, Jan Rosendahl, Shahram Khadivi, Hermann Ney

    Abstract: We propose a novel model architecture and training algorithm to learn bilingual sentence embeddings from a combination of parallel and monolingual data. Our method connects autoencoding and neural machine translation to force the source and target sentence embeddings to share the same space without the help of a pivot language or an additional transformation. We train a multilayer perceptron on to…

    Submitted 5 June, 2019; originally announced June 2019.

    Comments: ACL 2019 Repl4NLP camera-ready