Showing 1–50 of 93 results for author: Ney, H

  1. arXiv:2401.09646  [pdf, other]

    cs.LG cs.AI cs.CL

    ClimateGPT: Towards AI Synthesizing Interdisciplinary Research on Climate Change

    Authors: David Thulke, Yingbo Gao, Petrus Pelser, Rein Brune, Rricha Jalota, Floris Fok, Michael Ramos, Ian van Wyk, Abdallah Nasir, Hayden Goldstein, Taylor Tragemann, Katie Nguyen, Ariana Fowler, Andrew Stanco, Jon Gabriel, Jordan Taylor, Dean Moro, Evgenii Tsymbalov, Juliette de Waal, Evgeny Matusov, Mudar Yaghi, Mohammad Shihadah, Hermann Ney, Christian Dugast, Jonathan Dotan, et al. (1 additional author not shown)

    Abstract: This paper introduces ClimateGPT, a model family of domain-specific large language models that synthesize interdisciplinary research on climate change. We trained two 7B models from scratch on a science-oriented dataset of 300B tokens. For the first model, the 4.2B domain-specific tokens were included during pre-training and the second was adapted to the climate domain after pre-training. Addition…

    Submitted 17 January, 2024; originally announced January 2024.

  2. arXiv:2310.12303  [pdf, other]

    cs.CL cs.AI cs.LG

    Document-Level Language Models for Machine Translation

    Authors: Frithjof Petrick, Christian Herold, Pavel Petrushkov, Shahram Khadivi, Hermann Ney

    Abstract: Despite the known limitations, most machine translation systems today still operate on the sentence-level. One reason for this is, that most parallel training data is only sentence-level aligned, without document-level meta information available. In this work, we set out to build context-aware translation systems utilizing document-level monolingual data instead. This can be achieved by combining…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: accepted at WMT 2023

  3. arXiv:2310.07345  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Investigating the Effect of Language Models in Sequence Discriminative Training for Neural Transducers

    Authors: Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

    Abstract: In this work, we investigate the effect of language models (LMs) with different context lengths and label units (phoneme vs. word) used in sequence discriminative training for phoneme-based neural transducers. Both lattice-free and N-best-list approaches are examined. For lattice-free methods with phoneme-level LMs, we propose a method to approximate the context history to employ LMs with full-con…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: accepted at ASRU 2023

  4. arXiv:2310.02724  [pdf, other]

    cs.LG cs.SD eess.AS

    End-to-End Training of a Neural HMM with Label and Transition Probabilities

    Authors: Daniel Mann, Tina Raissi, Wilfried Michel, Ralf Schlüter, Hermann Ney

    Abstract: We investigate a novel modeling approach for end-to-end neural network training using hidden Markov models (HMM) where the transition probabilities between hidden states are modeled and learned explicitly. Most contemporary sequence-to-sequence models allow for from-scratch training by summing over all possible label segmentations in a given topology. In our approach there are explicit, learnable…

    Submitted 9 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted for Presentation at ASRU2023

  5. arXiv:2309.14130  [pdf, ps, other]

    cs.SD cs.CL cs.LG eess.AS

    On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers

    Authors: Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

    Abstract: Internal language model (ILM) subtraction has been widely applied to improve the performance of the RNN-Transducer with external language model (LM) fusion for speech recognition. In this work, we show that sequence discriminative training has a strong correlation with ILM subtraction from both theoretical and empirical points of view. Theoretically, we derive that the global optimum of maximum mu…

    Submitted 13 April, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: accepted at ICASSP 2024

  6. arXiv:2309.08436  [pdf, other]

    eess.AS cs.SD stat.ML

    Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition

    Authors: Mohammad Zeineldeen, Albert Zeyer, Ralf Schlüter, Hermann Ney

    Abstract: We study a streamable attention-based encoder-decoder model in which either the decoder, or both the encoder and decoder, operate on pre-defined, fixed-size windows called chunks. A special end-of-chunk (EOC) symbol advances from one chunk to the next chunk, effectively replacing the conventional end-of-sequence symbol. This modification, while minor, situates our model as equivalent to a transduc…

    Submitted 17 January, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted at ICASSP 2024

  7. arXiv:2308.04286  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Comparative Analysis of the wav2vec 2.0 Feature Extractor

    Authors: Peter Vieting, Ralf Schlüter, Hermann Ney

    Abstract: Automatic speech recognition (ASR) systems typically use handcrafted feature extraction pipelines. To avoid their inherent information loss and to achieve more consistent modeling from speech to transcribed text, neural raw waveform feature extractors (FEs) are an appealing approach. Also the wav2vec 2.0 model, which has recently gained large popularity, uses a convolutional FE which operates dire…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted at ITG 2023

  8. arXiv:2306.09517  [pdf, ps, other]

    cs.SD eess.AS

    Competitive and Resource Efficient Factored Hybrid HMM Systems are Simpler Than You Think

    Authors: Tina Raissi, Christoph Lüscher, Moritz Gunz, Ralf Schlüter, Hermann Ney

    Abstract: Building competitive hybrid hidden Markov model (HMM) systems for automatic speech recognition (ASR) requires a complex multi-stage pipeline consisting of several training criteria. The recent sequence-to-sequence models offer the advantage of having simpler pipelines that can start from-scratch. We propose a purely neural based single-stage from-scratch pipeline for a context-dependent hybrid HMM…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: Accepted for presentation at InterSpeech 2023

  9. arXiv:2306.05183  [pdf, other]

    cs.CL cs.AI cs.LG

    Improving Long Context Document-Level Machine Translation

    Authors: Christian Herold, Hermann Ney

    Abstract: Document-level context for neural machine translation (NMT) is crucial to improve the translation consistency and cohesion, the translation of ambiguous inputs, as well as several other linguistic phenomena. Many works have been published on the topic of document-level NMT, but most restrict the system to only local context, typically including just the one or two preceding sentences as additional…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: accepted at CODI 2023 (ACL workshop)

  10. arXiv:2306.05116  [pdf, other]

    cs.CL cs.AI cs.LG

    On Search Strategies for Document-Level Neural Machine Translation

    Authors: Christian Herold, Hermann Ney

    Abstract: Compared to sentence-level systems, document-level neural machine translation (NMT) models produce a more consistent output across a document and are able to better resolve ambiguities within the input. There are many works on document-level NMT, mostly focusing on modifying the model architecture or training strategy to better accommodate the additional context-input. On the other hand, in most w…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: Accepted to ACL 2023 (Findings)

  11. arXiv:2306.05077  [pdf, other]

    cs.CL cs.AI cs.LG

    Improving Language Model Integration for Neural Machine Translation

    Authors: Christian Herold, Yingbo Gao, Mohammad Zeineldeen, Hermann Ney

    Abstract: The integration of language models for neural machine translation has been extensively studied in the past. It has been shown that an external language model, trained on additional target-side monolingual data, can help improve translation quality. However, there has always been the assumption that the translation model also learns an implicit target-side language model during training, which inte…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: accepted at ACL2023 (Findings)

  12. RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition

    Authors: Wei Zhou, Eugen Beck, Simon Berger, Ralf Schlüter, Hermann Ney

    Abstract: Modern public ASR tools usually provide rich support for training various sequence-to-sequence (S2S) models, but rather simple support for decoding open-vocabulary scenarios only. For closed-vocabulary scenarios, public tools supporting lexical-constrained decoding are usually only for classical ASR, or do not support all S2S models. To eliminate this restriction on research possibilities such as…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: accepted at Interspeech 2023

  13. arXiv:2304.07101  [pdf, other]

    cs.CL cs.AI cs.LG

    Task-oriented Document-Grounded Dialog Systems by HLTPR@RWTH for DSTC9 and DSTC10

    Authors: David Thulke, Nico Daheim, Christian Dugast, Hermann Ney

    Abstract: This paper summarizes our contributions to the document-grounded dialog tasks at the 9th and 10th Dialog System Technology Challenges (DSTC9 and DSTC10). In both iterations the task consists of three subtasks: first detect whether the current turn is knowledge seeking, second select a relevant knowledge document, and third generate a response grounded on the selected document. For DSTC9 we propose…

    Submitted 14 April, 2023; originally announced April 2023.

    Comments: Accepted for publication in IEEE Transactions on Audio, Speech and Language Processing. arXiv admin note: text overlap with arXiv:2112.08844

  14. arXiv:2301.04571  [pdf, other]

    cs.CL eess.AS stat.ML

    Analyzing And Improving Neural Speaker Embeddings for ASR

    Authors: Christoph Lüscher, Jingjing Xu, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney

    Abstract: Neural speaker embeddings encode the speaker's speech characteristics through a DNN model and are prevalent for speaker verification tasks. However, few studies have investigated the usage of neural speaker embeddings for an ASR system. In this work, we present our efforts w.r.t integrating neural speaker embeddings into a conformer based hybrid HMM ASR system. For ASR, our improved embedding extr…

    Submitted 20 September, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

    Comments: Accepted at ITG Speech Communications 2023

  15. arXiv:2212.04325  [pdf, ps, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers

    Authors: Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

    Abstract: Recently, RNN-Transducers have achieved remarkable results on various automatic speech recognition tasks. However, lattice-free sequence discriminative training methods, which obtain superior performance in hybrid models, are rarely investigated in RNN-Transducers. In this work, we propose three lattice-free training objectives, namely lattice-free maximum mutual information, lattice-free segment-…

    Submitted 25 May, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: accepted at ICASSP 2023

  16. arXiv:2211.06369  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Enhancing and Adversarial: Improve ASR with Speaker Labels

    Authors: Wei Zhou, Haotian Wu, Jingjing Xu, Mohammad Zeineldeen, Christoph Lüscher, Ralf Schlüter, Hermann Ney

    Abstract: ASR can be improved by multi-task learning (MTL) with domain enhancing or domain adversarial training, which are two opposite objectives with the aim to increase/decrease domain variance towards domain-aware/agnostic ASR, respectively. In this work, we study how to best apply these two opposite objectives with speaker labels to improve conformer-based ASR. We also propose a novel adaptive gradient…

    Submitted 24 February, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: accepted at ICASSP 2023

  17. arXiv:2211.04898  [pdf, other]

    cs.CL cs.AI

    Mask More and Mask Later: Efficient Pre-training of Masked Language Models by Disentangling the [MASK] Token

    Authors: Baohao Liao, David Thulke, Sanjika Hewavitharana, Hermann Ney, Christof Monz

    Abstract: The pre-training of masked language models (MLMs) consumes massive computation to achieve good results on downstream NLP tasks, resulting in a large carbon footprint. In the vanilla MLM, the virtual tokens, [MASK]s, act as placeholders and gather the contextualized information from unmasked tokens to restore the corrupted information. It raises the question of whether we can append [MASK]s at a la…

    Submitted 15 November, 2022; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: Code available at: https://github.com/BaohaoLiao/3ml

  18. arXiv:2210.17418  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Controllable Factuality in Document-Grounded Dialog Systems Using a Noisy Channel Model

    Authors: Nico Daheim, David Thulke, Christian Dugast, Hermann Ney

    Abstract: In this work, we present a model for document-grounded response generation in dialog that is decomposed into two components according to Bayes theorem. One component is a traditional ungrounded response generation model and the other component models the reconstruction of the grounding document based on the dialog context and generated response. We propose different approximate decoding schemes an…

    Submitted 31 October, 2022; originally announced October 2022.

    Comments: Accepted to Findings of EMNLP 2022

  19. arXiv:2210.15445  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Efficient Utilization of Large Pre-Trained Models for Low Resource ASR

    Authors: Peter Vieting, Christoph Lüscher, Julian Dierkes, Ralf Schlüter, Hermann Ney

    Abstract: Unsupervised representation learning has recently helped automatic speech recognition (ASR) to tackle tasks with limited labeled data. Following this, hardware limitations and applications give rise to the question how to take advantage of large pre-trained models efficiently and reduce their complexity. In this work, we study a challenging low resource conversational telephony speech corpus from…

    Submitted 17 August, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: Accepted at ICASSP SASB 2023

  20. arXiv:2210.14742  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Monotonic segmental attention for automatic speech recognition

    Authors: Albert Zeyer, Robin Schmitt, Wei Zhou, Ralf Schlüter, Hermann Ney

    Abstract: We introduce a novel segmental-attention model for automatic speech recognition. We restrict the decoder attention to segments to avoid quadratic runtime of global attention, better generalize to long sequences, and eventually enable streaming. We directly compare global-attention and different segmental-attention modeling variants. We develop and compare two separate time-synchronous decoders, on…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: accepted at SLT: https://slt2022.org/

  21. arXiv:2210.13700  [pdf, other]

    eess.AS cs.CL cs.LG

    Does Joint Training Really Help Cascaded Speech Translation?

    Authors: Viet Anh Khoa Tran, David Thulke, Yingbo Gao, Christian Herold, Hermann Ney

    Abstract: Currently, in speech translation, the straightforward approach - cascading a recognition system with a translation system - delivers state-of-the-art results. However, fundamental challenges such as error propagation from the automatic speech recognition system still remain. To mitigate these problems, recently, people turn their attention to direct data and propose various joint training methods.…

    Submitted 24 November, 2022; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: Accepted to EMNLP 2022

  22. arXiv:2210.13397  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech

    Authors: Christoph Lüscher, Mohammad Zeineldeen, Zijian Yang, Tina Raissi, Peter Vieting, Khai Le-Duc, Weiyue Wang, Ralf Schlüter, Hermann Ney

    Abstract: Language barriers present a great challenge in our increasingly connected and global world. Especially within the medical domain, e.g. hospital or emergency room, communication difficulties and delays may lead to malpractice and non-optimal patient care. In the HYKIST project, we consider patient-physician communication, more specifically between a German-speaking physician and an Arabic- or Vietn…

    Submitted 22 September, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: ASR System Paper for HYKIST project

  23. arXiv:2210.11807  [pdf, other]

    cs.CL cs.AI cs.LG

    Is Encoder-Decoder Redundant for Neural Machine Translation?

    Authors: Yingbo Gao, Christian Herold, Zijian Yang, Hermann Ney

    Abstract: Encoder-decoder architecture is widely adopted for sequence-to-sequence modeling tasks. For machine translation, despite the evolution from long short-term memory networks to Transformer networks, plus the introduction and development of attention mechanism, encoder-decoder is still the de facto neural network architecture for state-of-the-art models. While the motivation for decoding information…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: accepted at AACL2022

  24. arXiv:2210.11803  [pdf, other]

    cs.CL cs.AI cs.LG

    Revisiting Checkpoint Averaging for Neural Machine Translation

    Authors: Yingbo Gao, Christian Herold, Zijian Yang, Hermann Ney

    Abstract: Checkpoint averaging is a simple and effective method to boost the performance of converged neural machine translation models. The calculation is cheap to perform and the fact that the translation improvement almost comes for free, makes it widely adopted in neural machine translation research. Despite the popularity, the method itself simply takes the mean of the model parameters from several che…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: accepted at AACL2022
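
The abstract of entry 24 describes checkpoint averaging as taking the mean of the model parameters from several checkpoints. A minimal sketch of that operation follows, assuming each checkpoint is a plain dictionary mapping a parameter name to a flat list of floats. The function name `average_checkpoints` and the toy data are hypothetical and are not taken from the paper or from any toolkit.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def average_checkpoints(checkpoints: List[Params]) -> Params:
    """Return the element-wise mean of several parameter dictionaries.

    Assumption: every checkpoint holds the same parameter names with
    vectors of the same length (as for checkpoints of one model).
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    averaged: Params = {}
    for name in checkpoints[0]:
        # Collect the vector stored under this name in every checkpoint.
        vectors = [ckpt[name] for ckpt in checkpoints]
        # zip(*vectors) walks the vectors position by position.
        averaged[name] = [sum(vals) / len(vals) for vals in zip(*vectors)]
    return averaged


# Toy example with three hypothetical checkpoints of a two-parameter model.
ckpts = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
    {"w": [5.0, 6.0], "b": [2.0]},
]
print(average_checkpoints(ckpts))  # {'w': [3.0, 4.0], 'b': [1.0]}
```

In practice the same loop would run over framework tensors rather than Python lists; which checkpoints to average is a separate choice that this sketch does not address.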

  25. arXiv:2210.09951  [pdf, other]

    cs.SD eess.AS

    HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch

    Authors: Tina Raissi, Wei Zhou, Simon Berger, Ralf Schlüter, Hermann Ney

    Abstract: In this work, we compare from-scratch sequence-level cross-entropy (full-sum) training of Hidden Markov Model (HMM) and Connectionist Temporal Classification (CTC) topologies for automatic speech recognition (ASR). Besides accuracy, we further analyze their capability for generating high-quality time alignment between the speech signal and the transcription, which can be crucial for many subsequen…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: Accepted for Presentation at IEEE SLT 2022

  26. arXiv:2206.12955  [pdf, other]

    cs.CL eess.AS stat.ML

    Improving the Training Recipe for a Robust Conformer-based Hybrid Model

    Authors: Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Ralf Schlüter, Hermann Ney

    Abstract: Speaker adaptation is important to build robust automatic speech recognition (ASR) systems. In this work, we investigate various methods for speaker adaptive training (SAT) based on feature-space approaches for a conformer-based acoustic model (AM) on the Switchboard 300h dataset. We propose a method, called Weighted-Simple-Add, which adds weighted speaker information vectors to the input of the m…

    Submitted 26 June, 2022; originally announced June 2022.

    Comments: Accepted at INTERSPEECH 2022

  27. Efficient Training of Neural Transducer for Speech Recognition

    Authors: Wei Zhou, Wilfried Michel, Ralf Schlüter, Hermann Ney

    Abstract: As one of the most popular sequence-to-sequence modeling approaches for speech recognition, the RNN-Transducer has achieved evolving performance with more and more sophisticated neural network models of growing size and increasing training epochs. While strong computation resources seem to be the prerequisite of training superior models, we try to overcome it by carefully designing a more efficien…

    Submitted 8 August, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

    Comments: accepted at Interspeech 2022

  28. arXiv:2201.09692  [pdf, ps, other]

    cs.SD eess.AS

    Improving Factored Hybrid HMM Acoustic Modeling without State Tying

    Authors: Tina Raissi, Eugen Beck, Ralf Schlüter, Hermann Ney

    Abstract: In this work, we show that a factored hybrid hidden Markov model (FH-HMM) which is defined without any phonetic state-tying outperforms a state-of-the-art hybrid HMM. The factored hybrid HMM provides a link to transducer models in the way it models phonetic (label) context while preserving the strict separation of acoustic and language model of the hybrid HMM approach. Furthermore, we show that th…

    Submitted 24 January, 2022; originally announced January 2022.

    Comments: Accepted for presentation at IEEE ICASSP 2022

    MSC Class: 68T10 ACM Class: I.2.7

  29. arXiv:2112.08844  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Adapting Document-Grounded Dialog Systems to Spoken Conversations using Data Augmentation and a Noisy Channel Model

    Authors: David Thulke, Nico Daheim, Christian Dugast, Hermann Ney

    Abstract: This paper summarizes our submission to Task 2 of the second track of the 10th Dialog System Technology Challenge (DSTC10) "Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations". Similar to the previous year's iteration, the task consists of three subtasks: detecting whether a turn is knowledge seeking, selecting the relevant knowledge document and finally generating a ground…

    Submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted to the DSTC10 workshop at AAAI 2022

  30. arXiv:2111.06310  [pdf, other]

    cs.CL cs.SD eess.AS

    Self-Normalized Importance Sampling for Neural Language Modeling

    Authors: Zijian Yang, Yingbo Gao, Alexander Gerstenberger, Jintao Jiang, Ralf Schlüter, Hermann Ney

    Abstract: To mitigate the problem of having to traverse over the full vocabulary in the softmax normalization of a neural language model, sampling-based training criteria are proposed and investigated in the context of large vocabulary word-based neural language models. These training criteria typically enjoy the benefit of faster training and testing, at a cost of slightly degraded performance in terms of…

    Submitted 17 June, 2022; v1 submitted 11 November, 2021; originally announced November 2021.

    Comments: Accepted at INTERSPEECH 2022

  31. arXiv:2111.03442  [pdf, other]

    cs.CL eess.AS stat.ML

    Conformer-based Hybrid ASR System for Switchboard Dataset

    Authors: Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Wilfried Michel, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney

    Abstract: The recently proposed conformer architecture has been successfully used for end-to-end automatic speech recognition (ASR) architectures achieving state-of-the-art performance on different datasets. To our best knowledge, the impact of using conformer acoustic model for hybrid ASR is not investigated. In this paper, we present and evaluate a competitive conformer-based hybrid model training recipe.…

    Submitted 19 February, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: Accepted at ICASSP 2022

  32. arXiv:2110.09324  [pdf, other]

    cs.CL cs.SD eess.AS

    Automatic Learning of Subword Dependent Model Scales

    Authors: Felix Meyer, Wilfried Michel, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney

    Abstract: To improve the performance of state-of-the-art automatic speech recognition systems it is common practice to include external knowledge sources such as language models or prior corrections. This is usually done via log-linear model combination using separate scaling parameters for each model. Typically these parameters are manually optimized on some held-out data. In this work we propose to opti…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: submitted to ICASSP 2022

  33. arXiv:2110.09245  [pdf, other]

    cs.CL cs.SD eess.AS

    Efficient Sequence Training of Attention Models using Approximative Recombination

    Authors: Nils-Philipp Wynands, Wilfried Michel, Jan Rosendahl, Ralf Schlüter, Hermann Ney

    Abstract: Sequence discriminative training is a great tool to improve the performance of an automatic speech recognition system. It does, however, necessitate a sum over all possible word sequences, which is intractable to compute in practice. Current state-of-the-art systems with unlimited label context circumvent this problem by limiting the summation to an n-best list of relevant competing hypotheses obt…

    Submitted 21 April, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

  34. arXiv:2110.06841  [pdf, ps, other]

    cs.CL eess.AS

    On Language Model Integration for RNN Transducer based Speech Recognition

    Authors: Wei Zhou, Zuoyun Zheng, Ralf Schlüter, Hermann Ney

    Abstract: The mismatch between an external language model (LM) and the implicitly learned internal LM (ILM) of RNN-Transducer (RNN-T) can limit the performance of LM integration such as simple shallow fusion. A Bayesian interpretation suggests to remove this sequence prior as ILM correction. In this work, we study various ILM correction-based LM integration methods formulated in a common RNN-T framework. We…

    Submitted 16 February, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: accepted at ICASSP2022
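
Several abstracts in this listing (entries 5, 32, 34 and 43) refer to shallow fusion and ILM correction. As a rough illustration only, the sketch below scores hypotheses with the commonly used log-linear form: model log-probability plus a scaled external LM log-probability, minus a scaled ILM log-probability. The scales, the hypothesis scores and the names `fused_score` and `best_hypothesis` are made-up assumptions, not values or code from any of the listed papers.

```python
from typing import Dict, List

Hyp = Dict[str, float]


def fused_score(log_p_model: float, log_p_ext_lm: float, log_p_ilm: float,
                lm_scale: float, ilm_scale: float) -> float:
    """Log-linear combination of sequence-level log-probabilities.

    With ilm_scale = 0 this reduces to plain shallow fusion; a positive
    ilm_scale subtracts the (estimated) internal LM score.
    """
    return log_p_model + lm_scale * log_p_ext_lm - ilm_scale * log_p_ilm


def best_hypothesis(hyps: List[Hyp], lm_scale: float, ilm_scale: float) -> Hyp:
    """Pick the hypothesis with the highest fused score from an n-best list."""
    return max(
        hyps,
        key=lambda h: fused_score(h["model"], h["lm"], h["ilm"],
                                  lm_scale, ilm_scale),
    )


# Hypothetical two-entry n-best list; all numbers are illustrative log-probabilities.
nbest = [
    {"id": 0.0, "model": -1.0, "lm": -6.0, "ilm": -1.0},
    {"id": 1.0, "model": -1.5, "lm": -3.0, "ilm": -4.0},
]

# Shallow fusion only: hypothesis 0 scores -1.6, hypothesis 1 scores -1.8.
print(best_hypothesis(nbest, lm_scale=0.1, ilm_scale=0.0)["id"])
# With ILM subtraction: hypothesis 0 scores -1.3, hypothesis 1 scores -0.6.
print(best_hypothesis(nbest, lm_scale=0.1, ilm_scale=0.3)["id"])
```

The toy numbers are chosen so that subtracting the ILM score changes which hypothesis wins; how the ILM score is estimated is exactly what the listed papers investigate and is not modeled here.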

  35. Investigation on Data Adaptation Techniques for Neural Named Entity Recognition

    Authors: Evgeniia Tokarchuk, David Thulke, Weiyue Wang, Christian Dugast, Hermann Ney

    Abstract: Data processing is an important step in various natural language processing tasks. As the commonly used datasets in named entity recognition contain only a limited number of samples, it is important to obtain additional labeled data in an efficient and reliable manner. A common practice is to utilize large monolingual unlabeled corpora. Another popular technique is to create synthetic data from th…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: ACL SRW 2021 - camera ready

  36. arXiv:2109.13097  [pdf, other]

    cs.CL

    Towards Reinforcement Learning for Pivot-based Neural Machine Translation with Non-autoregressive Transformer

    Authors: Evgeniia Tokarchuk, Jan Rosendahl, Weiyue Wang, Pavel Petrushkov, Tomer Lancewicki, Shahram Khadivi, Hermann Ney

    Abstract: Pivot-based neural machine translation (NMT) is commonly used in low-resource setups, especially for translation between non-English language pairs. It benefits from using high resource source-pivot and pivot-target language pairs and an individual system is trained for both sub-tasks. However, these models have no connection during training, and the source-pivot model is not optimized to produce…

    Submitted 27 September, 2021; originally announced September 2021.

    Comments: RL4RealLife Workshop 2021 camera-ready

  37. Integrated Training for Sequence-to-Sequence Models Using Non-Autoregressive Transformer

    Authors: Evgeniia Tokarchuk, Jan Rosendahl, Weiyue Wang, Pavel Petrushkov, Tomer Lancewicki, Shahram Khadivi, Hermann Ney

    Abstract: Complex natural language applications such as speech translation or pivot translation traditionally rely on cascaded models. However, cascaded models are known to be prone to error propagation and model discrepancy problems. Furthermore, there is no possibility of using end-to-end training data in conventional cascaded systems, meaning that the training data most suited for the task cannot be used…

    Submitted 27 September, 2021; originally announced September 2021.

    Comments: IWSLT 2021 camera-ready

  38. arXiv:2106.07275  [pdf, other]

    cs.CL cs.AI cs.LG

    Cascaded Span Extraction and Response Generation for Document-Grounded Dialog

    Authors: Nico Daheim, David Thulke, Christian Dugast, Hermann Ney

    Abstract: This paper summarizes our entries to both subtasks of the first DialDoc shared task which focuses on the agent response prediction task in goal-oriented document-grounded dialogs. The task is split into two subtasks: predicting a span in a document that grounds an agent turn and generating an agent response based on a dialog and grounding document. In the first subtask, we restrict the set of vali…

    Submitted 14 June, 2021; originally announced June 2021.

    Comments: Accepted by 1st DialDoc Workshop at ACL-IJCNLP 2021

  39. arXiv:2105.14849  [pdf, other]

    cs.LG cs.AI cs.CL cs.NE cs.SD eess.AS math.ST

    Why does CTC result in peaky behavior?

    Authors: Albert Zeyer, Ralf Schlüter, Hermann Ney

    Abstract: The peaky behavior of CTC models is well known experimentally. However, an understanding about why peaky behavior occurs is missing, and whether this is a good property. We provide a formal analysis of the peaky behavior and gradient descent convergence properties of the CTC loss and related training criteria. Our analysis provides a deep understanding why peaky behavior occurs and when it is subo…

    Submitted 3 June, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

  40. arXiv:2104.10507  [pdf, ps, other]

    cs.CL cs.SD eess.AS stat.ML

    On Sampling-Based Training Criteria for Neural Language Modeling

    Authors: Yingbo Gao, David Thulke, Alexander Gerstenberger, Khoa Viet Tran, Ralf Schlüter, Hermann Ney

    Abstract: As the vocabulary size of modern word-based language models becomes ever larger, many sampling-based training criteria are proposed and investigated. The essence of these sampling methods is that the softmax-related traversal over the entire vocabulary can be simplified, giving speedups compared to the baseline. A problem we notice about the current landscape of such sampling methods is the lack o…

    Submitted 17 June, 2021; v1 submitted 21 April, 2021; originally announced April 2021.

    Comments: Accepted at INTERSPEECH 2021

  41. Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition

    Authors: Wei Zhou, Mohammad Zeineldeen, Zuoyun Zheng, Ralf Schlüter, Hermann Ney

    Abstract: Subword units are commonly used for end-to-end automatic speech recognition (ASR), while a fully acoustic-oriented subword modeling approach is somewhat missing. We propose an acoustic data-driven subword modeling (ADSM) approach that adapts the advantages of several text-based and acoustic-based subword methods into one pipeline. With a fully acoustic-oriented label design and learning process, A…

    Submitted 27 August, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

    Comments: accepted at Interspeech2021

  42. Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept

    Authors: Wei Zhou, Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney

    Abstract: With the advent of direct models in automatic speech recognition (ASR), the formerly prevalent frame-wise acoustic modeling based on hidden Markov models (HMM) diversified into a number of modeling architectures like encoder-decoder attention models, transducer models and segmental models (direct HMM). While transducer models stay with a frame-level model definition, segmental models are defined o…

    Submitted 15 June, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

    Comments: accepted at Interspeech2021

  43. arXiv:2104.05544  [pdf, ps, other]

    cs.CL cs.SD eess.AS stat.ML

    Investigating Methods to Improve Language Model Integration for Attention-based Encoder-Decoder ASR Models

    Authors: Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney

    Abstract: Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions. The integration with an external LM trained on much more unpaired text usually leads to better performance. A Bayesian interpretation as in the hybrid autoregressive transducer (HAT) suggests dividing by the prior of the discriminative acoustic model, which corresponds to…

    Submitted 17 June, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: accepted to Interspeech 2021

  44. arXiv:2104.05379  [pdf, other]

    cs.CL cs.LG

    Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures

    Authors: Nick Rossenbach, Mohammad Zeineldeen, Benedikt Hilmes, Ralf Schlüter, Hermann Ney

    Abstract: Recent publications on automatic-speech-recognition (ASR) have a strong focus on attention encoder-decoder (AED) architectures which tend to suffer from over-fitting in low resource scenarios. One solution to tackle this issue is to generate synthetic data with a trained text-to-speech system (TTS) if additional text is available. This was successfully applied in many publications with AED systems…

    Submitted 13 July, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: Submitted to ASRU 2021

  45. arXiv:2104.04298  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    On Architectures and Training for Raw Waveform Feature Extraction in ASR

    Authors: Peter Vieting, Christoph Lüscher, Wilfried Michel, Ralf Schlüter, Hermann Ney

    Abstract: With the success of neural network based modeling in automatic speech recognition (ASR), many studies investigated acoustic modeling and learning of feature extractors directly based on the raw waveform. Recently, one line of research has focused on unsupervised pre-training of feature extractors on audio-only data to improve downstream ASR performance. In this work, we investigate the usefulness…

    Submitted 5 October, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

    Comments: Accepted for ASRU 2021

  46. arXiv:2104.03006  [pdf, other]

    cs.CL cs.AI stat.ML

    Librispeech Transducer Model with Internal Language Model Prior Correction

    Authors: Albert Zeyer, André Merboldt, Wilfried Michel, Ralf Schlüter, Hermann Ney

    Abstract: We present our transducer model on Librispeech. We study variants to include an external language model (LM) with shallow fusion and subtract an estimated internal LM. This is justified by a Bayesian interpretation where the transducer model prior is given by the estimated internal LM. The subtraction of the internal LM gives us over 14% relative improvement over normal shallow fusion. Our transdu…

    Submitted 12 June, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

    Comments: accepted at Interspeech 2021

  47. arXiv:2104.02387  [pdf, other]

    cs.SD eess.AS

    Towards Consistent Hybrid HMM Acoustic Modeling

    Authors: Tina Raissi, Eugen Beck, Ralf Schlüter, Hermann Ney

    Abstract: High-performance hybrid automatic speech recognition (ASR) systems are often trained with clustered triphone outputs, and thus require a complex training pipeline to generate the clustering. The same complex pipeline is often utilized in order to generate an alignment for use in frame-wise cross-entropy training. In this work, we propose a flat-start factored hybrid model trained by modeling the f…

    Submitted 12 October, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

    MSC Class: 68T10; ACM Class: I.2.7

  48. arXiv:2103.16710  [pdf, other]

    cs.CL cs.AI cs.CV

    A study of latent monotonic attention variants

    Authors: Albert Zeyer, Ralf Schlüter, Hermann Ney

    Abstract: End-to-end models reach state-of-the-art performance for speech recognition, but global soft attention is not monotonic, which might lead to convergence problems, to instability, to bad generalisation, cannot be used for online streaming, and is also inefficient in calculation. Monotonicity can potentially fix all of this. There are several ad-hoc solutions or heuristics to introduce monotonicity,…

    Submitted 30 March, 2021; originally announced March 2021.

  49. arXiv:2102.04643  [pdf, ps, other]

    cs.CL

    Efficient Retrieval Augmented Generation from Unstructured Knowledge for Task-Oriented Dialog

    Authors: David Thulke, Nico Daheim, Christian Dugast, Hermann Ney

    Abstract: This paper summarizes our work on the first track of the ninth Dialog System Technology Challenge (DSTC 9), "Beyond Domain APIs: Task-oriented Conversational Modeling with Unstructured Knowledge Access". The goal of the task is to generate responses to user turns in a task-oriented dialog that require knowledge from unstructured documents. The task is divided into three subtasks: detection, select…

    Submitted 8 February, 2021; originally announced February 2021.

    Comments: Accepted by DSTC9 Workshop at AAAI-2021

  50. arXiv:2011.12167  [pdf, other]

    cs.CL cs.LG

    Tight Integrated End-to-End Training for Cascaded Speech Translation

    Authors: Parnia Bahar, Tobias Bieschke, Ralf Schlüter, Hermann Ney

    Abstract: A cascaded speech translation model relies on discrete and non-differentiable transcription, which provides a supervision signal from the source side and helps the transformation between source speech and target text. Such modeling suffers from error propagation between ASR and MT models. Direct speech translation is an alternative method to avoid error propagation; however, its performance is oft…

    Submitted 24 November, 2020; originally announced November 2020.

    Comments: 8 pages, accepted at SLT2021