Showing 1–15 of 15 results for author: Zeineldeen, M

Searching in archive cs.
  1. arXiv:2402.15594  [pdf, other]

    cs.CL cs.SD eess.AS

    Alternating Weak Triphone/BPE Alignment Supervision from Hybrid Model Improves End-to-End ASR

    Authors: Jintao Jiang, Yingbo Gao, Mohammad Zeineldeen, Zoltan Tuske

    Abstract: In this paper, alternating weak triphone/BPE alignment supervision is proposed to improve end-to-end model training. Towards this end, triphone and BPE alignments are extracted using a pre-existing hybrid ASR system. Then, regularization effect is obtained by cross-entropy based intermediate auxiliary losses computed on such alignments at a mid-layer representation of the encoder for triphone alig…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 5 pages, 1 figure, 3 tables

  2. arXiv:2309.08436  [pdf, other]

    eess.AS cs.SD stat.ML

    Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition

    Authors: Mohammad Zeineldeen, Albert Zeyer, Ralf Schlüter, Hermann Ney

    Abstract: We study a streamable attention-based encoder-decoder model in which either the decoder, or both the encoder and decoder, operate on pre-defined, fixed-size windows called chunks. A special end-of-chunk (EOC) symbol advances from one chunk to the next chunk, effectively replacing the conventional end-of-sequence symbol. This modification, while minor, situates our model as equivalent to a transduc…

    Submitted 17 January, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted at ICASSP 2024

  3. arXiv:2306.05077  [pdf, other]

    cs.CL cs.AI cs.LG

    Improving Language Model Integration for Neural Machine Translation

    Authors: Christian Herold, Yingbo Gao, Mohammad Zeineldeen, Hermann Ney

    Abstract: The integration of language models for neural machine translation has been extensively studied in the past. It has been shown that an external language model, trained on additional target-side monolingual data, can help improve translation quality. However, there has always been the assumption that the translation model also learns an implicit target-side language model during training, which inte…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: accepted at ACL2023 (Findings)

  4. arXiv:2306.03557  [pdf, ps, other]

    cs.CL cs.AI

    Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text

    Authors: Parnia Bahar, Mattia Di Gangi, Nick Rossenbach, Mohammad Zeineldeen

    Abstract: Automatic Arabic diacritization is useful in many applications, ranging from reading support for language learners to accurate pronunciation predictor for downstream tasks like speech synthesis. While most of the previous works focused on models that operate on raw non-diacritized text, production systems can gain accuracy by first letting humans partly annotate ambiguous words. In this paper, we…

    Submitted 31 July, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: Keywords: Arabic text diacritization, partially-diacritized text, Arabic natural language processing, Accepted at INTERSPEECH 2023

  5. arXiv:2303.05958  [pdf, ps, other]

    cs.CL cs.SD eess.AS stat.ML

    Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss

    Authors: Mohammad Zeineldeen, Kartik Audhkhasi, Murali Karthick Baskar, Bhuvana Ramabhadran

    Abstract: This work studies knowledge distillation (KD) and addresses its constraints for recurrent neural network transducer (RNN-T) models. In hard distillation, a teacher model transcribes large amounts of unlabelled speech to train a student model. Soft distillation is another popular KD method that distills the output logits of the teacher model. Due to the nature of RNN-T alignments, applying soft dis…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: Accepted at ICASSP 2023

  6. arXiv:2301.04571  [pdf, other]

    cs.CL eess.AS stat.ML

    Analyzing And Improving Neural Speaker Embeddings for ASR

    Authors: Christoph Lüscher, Jingjing Xu, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney

    Abstract: Neural speaker embeddings encode the speaker's speech characteristics through a DNN model and are prevalent for speaker verification tasks. However, few studies have investigated the usage of neural speaker embeddings for an ASR system. In this work, we present our efforts w.r.t integrating neural speaker embeddings into a conformer based hybrid HMM ASR system. For ASR, our improved embedding extr…

    Submitted 20 September, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

    Comments: Accepted at ITG Speech Communications 2023

  7. arXiv:2211.06369  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Enhancing and Adversarial: Improve ASR with Speaker Labels

    Authors: Wei Zhou, Haotian Wu, Jingjing Xu, Mohammad Zeineldeen, Christoph Lüscher, Ralf Schlüter, Hermann Ney

    Abstract: ASR can be improved by multi-task learning (MTL) with domain enhancing or domain adversarial training, which are two opposite objectives with the aim to increase/decrease domain variance towards domain-aware/agnostic ASR, respectively. In this work, we study how to best apply these two opposite objectives with speaker labels to improve conformer-based ASR. We also propose a novel adaptive gradient…

    Submitted 24 February, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: accepted at ICASSP 2023

  8. arXiv:2210.13397  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech

    Authors: Christoph Lüscher, Mohammad Zeineldeen, Zijian Yang, Tina Raissi, Peter Vieting, Khai Le-Duc, Weiyue Wang, Ralf Schlüter, Hermann Ney

    Abstract: Language barriers present a great challenge in our increasingly connected and global world. Especially within the medical domain, e.g. hospital or emergency room, communication difficulties and delays may lead to malpractice and non-optimal patient care. In the HYKIST project, we consider patient-physician communication, more specifically between a German-speaking physician and an Arabic- or Vietn…

    Submitted 22 September, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: ASR System Paper for HYKIST project

  9. arXiv:2206.12955  [pdf, other]

    cs.CL eess.AS stat.ML

    Improving the Training Recipe for a Robust Conformer-based Hybrid Model

    Authors: Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Ralf Schlüter, Hermann Ney

    Abstract: Speaker adaptation is important to build robust automatic speech recognition (ASR) systems. In this work, we investigate various methods for speaker adaptive training (SAT) based on feature-space approaches for a conformer-based acoustic model (AM) on the Switchboard 300h dataset. We propose a method, called Weighted-Simple-Add, which adds weighted speaker information vectors to the input of the m…

    Submitted 26 June, 2022; originally announced June 2022.

    Comments: Accepted at INTERSPEECH 2022

  10. arXiv:2111.03442  [pdf, other]

    cs.CL eess.AS stat.ML

    Conformer-based Hybrid ASR System for Switchboard Dataset

    Authors: Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Wilfried Michel, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney

    Abstract: The recently proposed conformer architecture has been successfully used for end-to-end automatic speech recognition (ASR) architectures achieving state-of-the-art performance on different datasets. To our best knowledge, the impact of using conformer acoustic model for hybrid ASR is not investigated. In this paper, we present and evaluate a competitive conformer-based hybrid model training recipe.…

    Submitted 19 February, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: Accepted at ICASSP 2022

  11. arXiv:2110.09324  [pdf, other]

    cs.CL cs.SD eess.AS

    Automatic Learning of Subword Dependent Model Scales

    Authors: Felix Meyer, Wilfried Michel, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney

    Abstract: To improve the performance of state-of-the-art automatic speech recognition systems it is common practice to include external knowledge sources such as language models or prior corrections. This is usually done via log-linear model combination using separate scaling parameters for each model. Typically these parameters are manually optimized on some held-out data. In this work we propose to opti…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: submitted to ICASSP 2022

  12. Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition

    Authors: Wei Zhou, Mohammad Zeineldeen, Zuoyun Zheng, Ralf Schlüter, Hermann Ney

    Abstract: Subword units are commonly used for end-to-end automatic speech recognition (ASR), while a fully acoustic-oriented subword modeling approach is somewhat missing. We propose an acoustic data-driven subword modeling (ADSM) approach that adapts the advantages of several text-based and acoustic-based subword methods into one pipeline. With a fully acoustic-oriented label design and learning process, A…

    Submitted 27 August, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

    Comments: accepted at Interspeech2021

  13. arXiv:2104.05544  [pdf, ps, other]

    cs.CL cs.SD eess.AS stat.ML

    Investigating Methods to Improve Language Model Integration for Attention-based Encoder-Decoder ASR Models

    Authors: Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney

    Abstract: Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions. The integration with an external LM trained on much more unpaired text usually leads to better performance. A Bayesian interpretation as in the hybrid autoregressive transducer (HAT) suggests dividing by the prior of the discriminative acoustic model, which corresponds to…

    Submitted 17 June, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: accepted to Interspeech 2021

  14. arXiv:2104.05379  [pdf, other]

    cs.CL cs.LG

    Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures

    Authors: Nick Rossenbach, Mohammad Zeineldeen, Benedikt Hilmes, Ralf Schlüter, Hermann Ney

    Abstract: Recent publications on automatic-speech-recognition (ASR) have a strong focus on attention encoder-decoder (AED) architectures which tend to suffer from over-fitting in low resource scenarios. One solution to tackle this issue is to generate synthetic data with a trained text-to-speech system (TTS) if additional text is available. This was successfully applied in many publications with AED systems…

    Submitted 13 July, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: Submitted to ASRU 2021

  15. arXiv:2005.09336  [pdf, ps, other]

    eess.AS cs.CL cs.LG cs.NE

    A systematic comparison of grapheme-based vs. phoneme-based label units for encoder-decoder-attention models

    Authors: Mohammad Zeineldeen, Albert Zeyer, Wei Zhou, Thomas Ng, Ralf Schlüter, Hermann Ney

    Abstract: Following the rationale of end-to-end modeling, CTC, RNN-T or encoder-decoder-attention models for automatic speech recognition (ASR) use graphemes or grapheme-based subword units based on e.g. byte-pair encoding (BPE). The mapping from pronunciation to spelling is learned completely from data. In contrast to this, classical approaches to ASR employ secondary knowledge sources in the form of phone…

    Submitted 15 April, 2021; v1 submitted 19 May, 2020; originally announced May 2020.

    Comments: 5 pages, 6 tables