Showing 1–13 of 13 results for author: Schlüter, R

Searching in archive stat.

  1. arXiv:2309.08436

    eess.AS cs.SD stat.ML

    Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition

    Authors: Mohammad Zeineldeen, Albert Zeyer, Ralf Schlüter, Hermann Ney

    Abstract: We study a streamable attention-based encoder-decoder model in which either the decoder, or both the encoder and decoder, operate on pre-defined, fixed-size windows called chunks. A special end-of-chunk (EOC) symbol advances from one chunk to the next chunk, effectively replacing the conventional end-of-sequence symbol. This modification, while minor, situates our model as equivalent to a transduc…

    Submitted 17 January, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted at ICASSP 2024

  2. arXiv:2301.04571

    cs.CL eess.AS stat.ML

    Analyzing And Improving Neural Speaker Embeddings for ASR

    Authors: Christoph Lüscher, Jingjing Xu, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney

    Abstract: Neural speaker embeddings encode the speaker's speech characteristics through a DNN model and are prevalent for speaker verification tasks. However, few studies have investigated the usage of neural speaker embeddings for an ASR system. In this work, we present our efforts w.r.t integrating neural speaker embeddings into a conformer based hybrid HMM ASR system. For ASR, our improved embedding extr…

    Submitted 20 September, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

    Comments: Accepted at ITG Speech Communications 2023

  3. arXiv:2206.12955

    cs.CL eess.AS stat.ML

    Improving the Training Recipe for a Robust Conformer-based Hybrid Model

    Authors: Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Ralf Schlüter, Hermann Ney

    Abstract: Speaker adaptation is important to build robust automatic speech recognition (ASR) systems. In this work, we investigate various methods for speaker adaptive training (SAT) based on feature-space approaches for a conformer-based acoustic model (AM) on the Switchboard 300h dataset. We propose a method, called Weighted-Simple-Add, which adds weighted speaker information vectors to the input of the m…

    Submitted 26 June, 2022; originally announced June 2022.

    Comments: Accepted at INTERSPEECH 2022

  4. arXiv:2111.03442

    cs.CL eess.AS stat.ML

    Conformer-based Hybrid ASR System for Switchboard Dataset

    Authors: Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Wilfried Michel, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney

    Abstract: The recently proposed conformer architecture has been successfully used for end-to-end automatic speech recognition (ASR) architectures achieving state-of-the-art performance on different datasets. To our best knowledge, the impact of using conformer acoustic model for hybrid ASR is not investigated. In this paper, we present and evaluate a competitive conformer-based hybrid model training recipe.…

    Submitted 19 February, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: Accepted at ICASSP 2022

  5. arXiv:2104.10507

    cs.CL cs.SD eess.AS stat.ML

    On Sampling-Based Training Criteria for Neural Language Modeling

    Authors: Yingbo Gao, David Thulke, Alexander Gerstenberger, Khoa Viet Tran, Ralf Schlüter, Hermann Ney

    Abstract: As the vocabulary size of modern word-based language models becomes ever larger, many sampling-based training criteria are proposed and investigated. The essence of these sampling methods is that the softmax-related traversal over the entire vocabulary can be simplified, giving speedups compared to the baseline. A problem we notice about the current landscape of such sampling methods is the lack o…

    Submitted 17 June, 2021; v1 submitted 21 April, 2021; originally announced April 2021.

    Comments: Accepted at INTERSPEECH 2021

  6. arXiv:2104.05544

    cs.CL cs.SD eess.AS stat.ML

    Investigating Methods to Improve Language Model Integration for Attention-based Encoder-Decoder ASR Models

    Authors: Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney

    Abstract: Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions. The integration with an external LM trained on much more unpaired text usually leads to better performance. A Bayesian interpretation as in the hybrid autoregressive transducer (HAT) suggests dividing by the prior of the discriminative acoustic model, which corresponds to…

    Submitted 17 June, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: accepted to Interspeech 2021

  7. arXiv:2104.03006

    cs.CL cs.AI stat.ML

    Librispeech Transducer Model with Internal Language Model Prior Correction

    Authors: Albert Zeyer, André Merboldt, Wilfried Michel, Ralf Schlüter, Hermann Ney

    Abstract: We present our transducer model on Librispeech. We study variants to include an external language model (LM) with shallow fusion and subtract an estimated internal LM. This is justified by a Bayesian interpretation where the transducer model prior is given by the estimated internal LM. The subtraction of the internal LM gives us over 14% relative improvement over normal shallow fusion. Our transdu…

    Submitted 12 June, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

    Comments: accepted at Interspeech 2021

  8. arXiv:2005.10049

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Early Stage LM Integration Using Local and Global Log-Linear Combination

    Authors: Wilfried Michel, Ralf Schlüter, Hermann Ney

    Abstract: Sequence-to-sequence models with an implicit alignment mechanism (e.g. attention) are closing the performance gap towards traditional hybrid hidden Markov models (HMM) for the task of automatic speech recognition. One important factor to improve word error rate in both cases is the use of an external language model (LM) trained on large text-only corpora. Language model integration is straightforw…

    Submitted 20 May, 2020; originally announced May 2020.

    Comments: Submitted to Interspeech 2020

  9. arXiv:2005.09319

    eess.AS cs.LG cs.NE stat.ML

    A New Training Pipeline for an Improved Neural Transducer

    Authors: Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney

    Abstract: The RNN transducer is a promising end-to-end model candidate. We compare the original training criterion with the full marginalization over all alignments, to the commonly used maximum approximation, which simplifies, improves and speeds up our training. We also generalize from the original neural network model and study more powerful models, made possible due to the maximum approximation. We furt…

    Submitted 18 November, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

    Comments: published at Interspeech 2020

  10. arXiv:1907.01409

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Comparison of Lattice-Free and Lattice-Based Sequence Discriminative Training Criteria for LVCSR

    Authors: Wilfried Michel, Ralf Schlüter, Hermann Ney

    Abstract: Sequence discriminative training criteria have long been a standard tool in automatic speech recognition for improving the performance of acoustic models over their maximum likelihood / cross entropy trained counterparts. While previously a lattice approximation of the search space has been necessary to reduce computational complexity, recently proposed methods use other approximations to dispense…

    Submitted 1 July, 2019; originally announced July 2019.

    Comments: Submitted to Interspeech 2019

    Journal ref: Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019, pp. 1601-1605

  11. arXiv:1907.01030

    eess.AS cs.LG cs.SD stat.ML

    LSTM Language Models for LVCSR in First-Pass Decoding and Lattice-Rescoring

    Authors: Eugen Beck, Wei Zhou, Ralf Schlüter, Hermann Ney

    Abstract: LSTM based language models are an important part of modern LVCSR systems as they significantly improve performance over traditional backoff language models. Incorporating them efficiently into decoding has been notoriously difficult. In this paper we present an approach based on a combination of one-pass decoding and lattice rescoring. We perform decoding with the LSTM-LM in the first pass but rec…

    Submitted 1 July, 2019; originally announced July 2019.

  12. arXiv:1906.06207

    cs.CL stat.ML

    Cumulative Adaptation for BLSTM Acoustic Models

    Authors: Markus Kitza, Pavel Golik, Ralf Schlüter, Hermann Ney

    Abstract: This paper addresses the robust speech recognition problem as an adaptation task. Specifically, we investigate the cumulative application of adaptation methods. A bidirectional Long Short-Term Memory (BLSTM) based neural network, capable of learning temporal relationships and translation invariant representations, is used for robust acoustic modelling. Further, i-vectors were used as an input to t…

    Submitted 14 June, 2019; originally announced June 2019.

    Comments: Submitted to Interspeech 2019

  13. Improved training of end-to-end attention models for speech recognition

    Authors: Albert Zeyer, Kazuki Irie, Ralf Schlüter, Hermann Ney

    Abstract: Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition. In this work, we show that such models can achieve competitive results on the Switchboard 300h and LibriSpeech 1000h tasks. In particular, we report the state-of-the-art word error rates (WER) of 3.54% on the dev-clean and 3.82% on the test-clean evaluation subsets of LibriSpeec…

    Submitted 8 May, 2018; originally announced May 2018.

    Comments: submitted to Interspeech 2018