Showing 1–50 of 58 results for author: Niehues, J

Searching in archive cs.
  1. arXiv:2406.16777

    cs.CL cs.AI

    Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024

    Authors: Sai Koneru, Thai-Binh Nguyen, Ngoc-Quan Pham, Danni Liu, Zhaolin Li, Alexander Waibel, Jan Niehues

    Abstract: Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST). In this paper, we present KIT's offline submission in the constrained + LLM track by incorporating recently proposed techniques that can be added to any cascaded speech translation. Specifically, we inte…

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2406.10421

    cs.CL

    SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading

    Authors: Tu Anh Dinh, Carlos Mullov, Leonard Bärmann, Zhaolin Li, Danni Liu, Simon Reiß, Jueun Lee, Nathan Lerzer, Fabian Ternava, Jianfeng Gao, Alexander Waibel, Tamim Asfour, Michael Beigl, Rainer Stiefelhagen, Carsten Dachsbacher, Klemens Böhm, Jan Niehues

    Abstract: With the rapid development of Large Language Models (LLMs), it is crucial to have benchmarks which can evaluate the ability of LLMs on different domains. One common use of LLMs is performing tasks on scientific topics, such as writing algorithms, querying databases or giving mathematical proofs. Inspired by the way university students are evaluated on such tasks, in this paper, we propose SciEx -…

    Submitted 14 June, 2024; originally announced June 2024.

    ACM Class: I.2.7

  3. arXiv:2406.03881

    cs.CL

    Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation

    Authors: Matthias Sperber, Ondřej Bojar, Barry Haddow, Dávid Javorský, Xutai Ma, Matteo Negri, Jan Niehues, Peter Polák, Elizabeth Salesky, Katsuhito Sudoh, Marco Turchi

    Abstract: Human evaluation is a critical component in machine translation system development and has received much attention in text translation research. However, little prior work exists on the topic of human evaluation for speech translation, which adds additional challenges such as noisy data and segmentation mismatches. We take first steps to fill this gap by conducting a comprehensive human evaluation…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: LREC-COLING2024 publication (with corrections for Table 3)

    Journal ref: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

  4. arXiv:2404.18031

    cs.CL

    Quality Estimation with $k$-nearest Neighbors and Automatic Evaluation for Model-specific Quality Estimation

    Authors: Tu Anh Dinh, Tobias Palzer, Jan Niehues

    Abstract: Providing quality scores along with Machine Translation (MT) output, so-called reference-free Quality Estimation (QE), is crucial to inform users about the reliability of the translation. We propose a model-specific, unsupervised QE approach, termed $k$NN-QE, that extracts information from the MT model's training data using $k$-nearest neighbors. Measuring the performance of model-specific QE is n…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Accepted to EAMT 2024

    ACM Class: I.2.7

  5. arXiv:2404.05720

    cs.CL cs.AI

    Language-Independent Representations Improve Zero-Shot Summarization

    Authors: Vladimir Solovyev, Danni Liu, Jan Niehues

    Abstract: Finetuning pretrained models on downstream generation tasks often leads to catastrophic forgetting in zero-shot conditions. In this work, we focus on summarization and tackle the problem through the lens of language-independent representations. After training on monolingual summarization, we perform zero-shot transfer to new languages or language pairs. We first show naively finetuned models are h…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: NAACL 2024

  6. arXiv:2310.14855

    cs.CL cs.AI

    Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing

    Authors: Sai Koneru, Miriam Exel, Matthias Huck, Jan Niehues

    Abstract: Large Language Models (LLMs) have demonstrated considerable success in various Natural Language Processing tasks, but they have yet to attain state-of-the-art performance in Neural Machine Translation (NMT). Nevertheless, their significant performance in tasks demanding a broad understanding and contextual processing shows their potential for translation. To exploit these abilities, we investigat…

    Submitted 18 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: NAACL 2024

  7. arXiv:2309.12998

    cs.CL cs.AI

    Audience-specific Explanations for Machine Translation

    Authors: Renhan Lou, Jan Niehues

    Abstract: In machine translation, a common problem is that certain words, even when translated, can cause incomprehension among the target-language audience due to different cultural backgrounds. One solution to this problem is to add explanations for these words. As a first step, we therefore need to identify these words or phrases. In this work, we explore techniques to extract example expl…

    Submitted 22 September, 2023; originally announced September 2023.

  8. arXiv:2309.08565

    cs.CL cs.AI

    How Transferable are Attribute Controllers on Pretrained Multilingual Translation Models?

    Authors: Danni Liu, Jan Niehues

    Abstract: Customizing machine translation models to comply with desired attributes (e.g., formality or grammatical gender) is a well-studied topic. However, most current approaches rely on (semi-)supervised data with attribute annotations. This data scarcity bottlenecks democratizing such customization possibilities to a wider range of languages, particularly lower-resource ones. This gap is out of sync wit…

    Submitted 24 January, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: EACL 2024

  9. arXiv:2309.04316

    cs.RO cs.AI

    Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models

    Authors: Leonard Bärmann, Rainer Kartmann, Fabian Peller-Konrad, Jan Niehues, Alex Waibel, Tamim Asfour

    Abstract: Natural-language dialog is key for intuitive human-robot interaction. It can be used not only to express humans' intents, but also to communicate instructions for improvement if a robot does not understand a command correctly. Of great importance is to endow robots with the ability to learn from such interaction experience in an incremental way to allow them to improve their behaviors or avoid mis…

    Submitted 16 May, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

    Comments: This version (v3) adds further quantitative evaluation and many improvements. v2 was presented at the Workshop on Language and Robot Learning (LangRob) at the Conference on Robot Learning (CoRL) 2023. Supplementary video available at https://youtu.be/y5O2mRGtsLM

  10. arXiv:2308.03415

    cs.CL cs.AI

    End-to-End Evaluation for Low-Latency Simultaneous Speech Translation

    Authors: Christian Huber, Tu Anh Dinh, Carlos Mullov, Ngoc Quan Pham, Thai Binh Nguyen, Fabian Retkowski, Stefan Constantin, Enes Yavuz Ugan, Danni Liu, Zhaolin Li, Sai Koneru, Jan Niehues, Alexander Waibel

    Abstract: The challenge of low-latency speech translation has recently drawn significant interest in the research community as shown by several publications and shared tasks. Therefore, it is essential to evaluate these different approaches in realistic scenarios. However, currently only specific aspects of the systems are evaluated and often it is not possible to compare different approaches. In this work…

    Submitted 23 October, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

  11. arXiv:2306.05320

    cs.CL cs.SD

    KIT's Multilingual Speech Translation System for IWSLT 2023

    Authors: Danni Liu, Thai Binh Nguyen, Sai Koneru, Enes Yavuz Ugan, Ngoc-Quan Pham, Tuan-Nam Nguyen, Tu Anh Dinh, Carlos Mullov, Alexander Waibel, Jan Niehues

    Abstract: Many existing speech translation benchmarks focus on native-English speech in high-quality recording conditions, which often do not match the conditions in real-life use-cases. In this paper, we describe our speech translation system for the multilingual track of IWSLT 2023, which evaluates translation quality on scientific conference talks. The test condition features accented input speech and te…

    Submitted 12 July, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: IWSLT 2023

  12. arXiv:2305.16935

    cs.CL

    Gender Lost In Translation: How Bridging The Gap Between Languages Affects Gender Bias in Zero-Shot Multilingual Translation

    Authors: Lena Cabrera, Jan Niehues

    Abstract: Neural machine translation (NMT) models often suffer from gender biases that harm users and society at large. In this work, we explore how bridging the gap between languages for which parallel data is not available affects gender bias in multilingual NMT, specifically for zero-shot directions. We evaluate translation between grammatical gender languages which requires preserving the inherent gende…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted at EAMT 2023 (Workshop on Gender-Inclusive Translation Technologies (GITT))

  13. arXiv:2305.07457

    cs.CL

    Perturbation-based QE: An Explainable, Unsupervised Word-level Quality Estimation Method for Blackbox Machine Translation

    Authors: Tu Anh Dinh, Jan Niehues

    Abstract: Quality Estimation (QE) is the task of predicting the quality of Machine Translation (MT) system output, without using any gold-standard translation references. State-of-the-art QE models are supervised: they require human-labeled quality of some MT system output on some datasets for training, making them domain-dependent and MT-system-dependent. There has been research on unsupervised QE, which r…

    Submitted 13 July, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: Accepted to MT Summit 2023

    ACM Class: I.2.7

  14. arXiv:2305.03873

    cs.CL

    Train Global, Tailor Local: Minimalist Multilingual Translation into Endangered Languages

    Authors: Zhong Zhou, Jan Niehues, Alex Waibel

    Abstract: In many humanitarian scenarios, translation into severely low resource languages often does not require a universal translation engine, but a dedicated text-specific translation engine. For example, healthcare records, hygienic procedures, government communication, emergency procedures and religious texts are all limited texts. While generic translation engines for all languages do not exist, tran…

    Submitted 5 May, 2023; originally announced May 2023.

    Comments: In Proceedings of the 6th Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT) of the 17th Conference of the European Chapter of the Association for Computational Linguistics in 2023

  15. arXiv:2301.09617

    cs.CV

    Fully transformer-based biomarker prediction from colorectal cancer histology: a large-scale multicentric study

    Authors: Sophia J. Wagner, Daniel Reisenbüchler, Nicholas P. West, Jan Moritz Niehues, Gregory Patrick Veldhuizen, Philip Quirke, Heike I. Grabsch, Piet A. van den Brandt, Gordon G. A. Hutchins, Susan D. Richman, Tanwei Yuan, Rupert Langer, Josien Christina Anna Jenniskens, Kelly Offermans, Wolfram Mueller, Richard Gray, Stephen B. Gruber, Joel K. Greenson, Gad Rennert, Joseph D. Bonner, Daniel Schmolze, Jacqueline A. James, Maurice B. Loughrey, Manuel Salto-Tellez, Hermann Brenner, et al. (6 additional authors not shown)

    Abstract: Background: Deep learning (DL) can extract predictive and prognostic biomarkers from routine pathology slides in colorectal cancer. For example, a DL test for the diagnosis of microsatellite instability (MSI) in CRC was approved in 2022. Current approaches rely on convolutional neural networks (CNNs). Transformer networks are outperforming CNNs and are replacing them in many applications, but…

    Submitted 1 March, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

    Comments: Updated Figure 2 and Table A.5

  16. Diffusion Probabilistic Models beat GANs on Medical Images

    Authors: Gustav Müller-Franzes, Jan Moritz Niehues, Firas Khader, Soroosh Tayebi Arasteh, Christoph Haarburger, Christiane Kuhl, Tianci Wang, Tianyu Han, Sven Nebelung, Jakob Nikolas Kather, Daniel Truhn

    Abstract: The success of Deep Learning applications critically depends on the quality and scale of the underlying training data. Generative adversarial networks (GANs) can generate arbitrarily large datasets, but diversity and fidelity are limited, which has recently been addressed by denoising diffusion probabilistic models (DDPMs) whose superiority has been demonstrated on natural images. In this study, we…

    Submitted 14 December, 2022; originally announced December 2022.

    Journal ref: Sci Rep 13, 12098 (2023)

  17. arXiv:2211.11703   

    cs.CL cs.SD eess.AS

    Towards continually learning new languages

    Authors: Ngoc-Quan Pham, Jan Niehues, Alexander Waibel

    Abstract: Multilingual speech recognition with neural networks is often implemented with batch-learning, when all of the languages are available before training. An ability to add new languages after the prior training sessions can be economically beneficial, but the main challenge is catastrophic forgetting. In this work, we combine the qualities of weight factorization and elastic weight consolidation in…

    Submitted 1 March, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Work in progress

  18. arXiv:2211.04939

    cs.CL cs.SD eess.AS

    Efficient Speech Translation with Pre-trained Models

    Authors: Zhaolin Li, Jan Niehues

    Abstract: When building state-of-the-art speech translation models, the need for large computational resources is a significant obstacle due to the large training data size and complex models. The availability of pre-trained models is a promising opportunity to build strong speech translation systems efficiently. In a first step, we investigate efficient strategies to build cascaded and end-to-end speech tr…

    Submitted 9 November, 2022; originally announced November 2022.

  19. arXiv:2211.01292

    cs.CL cs.AI

    Learning an Artificial Language for Knowledge-Sharing in Multilingual Translation

    Authors: Danni Liu, Jan Niehues

    Abstract: The cornerstone of multilingual neural translation is shared representations across languages. Given the theoretically infinite representation power of neural networks, semantically identical sentences are likely represented differently. While representing sentences in the continuous latent space ensures expressiveness, it introduces the risk of capturing irrelevant features which hinders the l…

    Submitted 18 November, 2022; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: WMT 2022

  20. arXiv:2205.12304

    cs.CL cs.SD eess.AS

    Adaptive multilingual speech recognition with pretrained models

    Authors: Ngoc-Quan Pham, Alex Waibel, Jan Niehues

    Abstract: Multilingual speech recognition with supervised learning has achieved great results as reflected in recent research. With the development of pretraining methods on audio and text data, it is imperative to transfer the knowledge from unsupervised multilingual models to facilitate recognition, especially in many languages with limited data. Our work investigated the effectiveness of using two pretra…

    Submitted 24 May, 2022; originally announced May 2022.

    Comments: Submitted to INTERSPEECH 2022

  21. arXiv:2204.10593

    cs.CL cs.SD eess.AS

    LibriS2S: A German-English Speech-to-Speech Translation Corpus

    Authors: Pedro Jeuris, Jan Niehues

    Abstract: Recently, we have seen an increasing interest in the area of speech-to-text translation. This has led to astonishing improvements in this area. In contrast, the activities in the area of speech-to-speech translation are still limited, although it is essential to overcome the language barrier. We believe that one of the limiting factors is the availability of appropriate training data. We address th…

    Submitted 22 April, 2022; originally announced April 2022.

    Comments: Accepted to LREC 2022

  22. arXiv:2204.06028

    cs.CL

    CUNI-KIT System for Simultaneous Speech Translation Task at IWSLT 2022

    Authors: Peter Polák, Ngoc-Quan Ngoc, Tuan-Nam Nguyen, Danni Liu, Carlos Mullov, Jan Niehues, Ondřej Bojar, Alexander Waibel

    Abstract: In this paper, we describe our submission to the Simultaneous Speech Translation at IWSLT 2022. We explore strategies to utilize an offline model in a simultaneous setting without the need to modify the original model. In our experiments, we show that our onlinization algorithm is almost on par with the offline setting while being $3\times$ faster than offline in terms of latency on the test set.…

    Submitted 11 May, 2022; v1 submitted 12 April, 2022; originally announced April 2022.

    Comments: Accepted to IWSLT22

  23. arXiv:2203.14835

    cs.CL cs.SD eess.AS

    Multilingual Simultaneous Speech Translation

    Authors: Shashank Subramanya, Jan Niehues

    Abstract: Applications designed for simultaneous speech translation during events such as conferences or meetings need to balance quality and lag while displaying translated text to deliver a good user experience. One common approach to building online spoken language translation systems is by leveraging models built for offline speech translation. Based on a technique to adapt end-to-end monolingual models…

    Submitted 29 March, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: Submitted to Interspeech 2022

  24. Tackling data scarcity in speech translation using zero-shot multilingual machine translation techniques

    Authors: Tu Anh Dinh, Danni Liu, Jan Niehues

    Abstract: Recently, end-to-end speech translation (ST) has gained significant attention as it avoids error propagation. However, the approach suffers from data scarcity. It heavily depends on direct ST data and is less efficient in making use of speech transcription and text translation data, which is often more easily available. In the related field of multilingual text translation, several techniques have…

    Submitted 26 January, 2022; originally announced January 2022.

    Comments: 6 pages, 5 figures, accepted to IEEE ICASSP 2022. arXiv admin note: text overlap with arXiv:2107.06010

    ACM Class: I.2.7

    Journal ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 6222-6226

  25. arXiv:2201.05700

    cs.CL cs.AI

    Cost-Effective Training in Low-Resource Neural Machine Translation

    Authors: Sai Koneru, Danni Liu, Jan Niehues

    Abstract: While Active Learning (AL) techniques are explored in Neural Machine Translation (NMT), only a few works focus on tackling low annotation budgets where a limited number of sentences can get translated. Such situations are especially challenging and can occur for endangered languages with few human annotators or having cost constraints to label large amounts of data. Although AL is shown to be help…

    Submitted 14 January, 2022; originally announced January 2022.

  26. arXiv:2103.15877

    cs.CL cs.AI

    Unsupervised Machine Translation On Dravidian Languages

    Authors: Sai Koneru, Danni Liu, Jan Niehues

    Abstract: Unsupervised neural machine translation (UNMT) is beneficial especially for low resource languages such as those from the Dravidian family. However, UNMT systems tend to fail in realistic scenarios involving actual low resource languages. Recent works propose to utilize auxiliary parallel data and have achieved state-of-the-art results. In this work, we focus on unsupervised translation between En…

    Submitted 29 March, 2021; originally announced March 2021.

  27. arXiv:2102.06558

    cs.CL

    Continuous Learning in Neural Machine Translation using Bilingual Dictionaries

    Authors: Jan Niehues

    Abstract: While recent advances in deep learning led to significant improvements in machine translation, neural machine translation is often still not able to continuously adapt to the environment. For humans, as well as for machine translation, bilingual dictionaries are a promising knowledge source to continuously integrate new knowledge. However, their exploitation poses several challenges: The system ne…

    Submitted 12 February, 2021; originally announced February 2021.

    Comments: 9 pages, EACL 2021

  28. arXiv:2012.15127

    cs.CL

    Improving Zero-Shot Translation by Disentangling Positional Information

    Authors: Danni Liu, Jan Niehues, James Cross, Francisco Guzmán, Xian Li

    Abstract: Multilingual neural machine translation has shown the capability of directly translating between language pairs unseen in training, i.e. zero-shot translation. Despite being conceptually attractive, it often suffers from low output quality. The difficulty of generalizing to new translation directions suggests the model representations are highly specific to those language pairs seen in training. W…

    Submitted 30 June, 2021; v1 submitted 30 December, 2020; originally announced December 2020.

    Comments: ACL 2021

  29. arXiv:2005.12143

    cs.CL

    Adapting End-to-End Speech Recognition for Readable Subtitles

    Authors: Danni Liu, Jan Niehues, Gerasimos Spanakis

    Abstract: Automatic speech recognition (ASR) systems are primarily evaluated on transcription accuracy. However, in some use cases such as subtitling, verbatim transcription would reduce output readability given limited screen size and reading time. Therefore, this work focuses on ASR with output compression, a task challenging for supervised approaches due to the scarcity of training data. We first investi…

    Submitted 25 May, 2020; originally announced May 2020.

    Comments: IWSLT 2020

  30. arXiv:2005.11185

    cs.CL cs.SD eess.AS

    Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection

    Authors: Danni Liu, Gerasimos Spanakis, Jan Niehues

    Abstract: Encoder-decoder models provide a generic architecture for sequence-to-sequence tasks such as speech recognition and translation. While offline systems are often evaluated on quality metrics like word error rates (WER) and BLEU, latency is also a crucial factor in many practical use-cases. We propose three latency reduction techniques for chunk-based incremental inference and evaluate their efficie…

    Submitted 13 October, 2020; v1 submitted 22 May, 2020; originally announced May 2020.

    Comments: Interspeech 2020

  31. arXiv:2005.09940

    eess.AS cs.CL cs.SD

    Relative Positional Encoding for Speech Recognition and Direct Translation

    Authors: Ngoc-Quan Pham, Thanh-Le Ha, Tuan-Nam Nguyen, Thai-Son Nguyen, Elizabeth Salesky, Sebastian Stueker, Jan Niehues, Alexander Waibel

    Abstract: Transformer models are powerful sequence-to-sequence architectures that are capable of directly mapping speech inputs to transcriptions or translations. However, the mechanism for modeling positions in this model was tailored for text modeling, and thus is less ideal for acoustic inputs. In this work, we adapt the relative position encoding scheme to the Speech Transformer, where the key addition…

    Submitted 20 May, 2020; originally announced May 2020.

    Comments: Submitted to Interspeech 2020

  32. arXiv:2004.03176

    cs.CL

    Machine Translation with Unsupervised Length-Constraints

    Authors: Jan Niehues

    Abstract: We have seen significant improvements in machine translation due to the usage of deep learning. While the improvements in translation quality are impressive, the encoder-decoder architecture enables many more possibilities. In this paper, we explore one of these, the generation of constraint translation. We focus on length constraints, which are essential if the translation should be displayed in…

    Submitted 7 April, 2020; originally announced April 2020.

    Comments: 8 pages

  33. arXiv:2003.09891

    eess.AS cs.CL cs.SD

    Low Latency ASR for Simultaneous Speech Translation

    Authors: Thai Son Nguyen, Jan Niehues, Eunah Cho, Thanh-Le Ha, Kevin Kilgour, Markus Muller, Matthias Sperber, Sebastian Stueker, Alex Waibel

    Abstract: User studies have shown that reducing the latency of our simultaneous lecture translation system should be the most important goal. We therefore have worked on several techniques for reducing the latency for both components, the automatic speech recognition and the speech translation module. Since the commonly used commitment latency is not appropriate in our case of continuous stream decoding, we…

    Submitted 22 March, 2020; originally announced March 2020.

  34. arXiv:1910.13296

    eess.AS cs.CV cs.LG cs.SD

    Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation

    Authors: Thai-Son Nguyen, Sebastian Stueker, Jan Niehues, Alex Waibel

    Abstract: Sequence-to-Sequence (S2S) models recently started to show state-of-the-art performance for automatic speech recognition (ASR). With these large and deep models, overfitting remains the largest problem, outweighing performance improvements that can be obtained from better architectures. One solution to the overfitting problem is increasing the amount of available training data and the variety exhib…

    Submitted 3 February, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

    Comments: To appear in ICASSP 2020

  35. arXiv:1910.01859

    cs.CL

    Modeling Confidence in Sequence-to-Sequence Models

    Authors: Jan Niehues, Ngoc-Quan Pham

    Abstract: Recently, significant improvements have been achieved in various natural language processing tasks using neural sequence-to-sequence models. While aiming for the best generation quality is important, ultimately it is also necessary to develop models that can assess the quality of their output. In this work, we propose to use the similarity between training and test conditions as a measure for mo…

    Submitted 4 October, 2019; originally announced October 2019.

    Comments: 8 pages; INLG 2019

  36. arXiv:1909.13790

    cs.CL cs.SD eess.AS

    Incremental processing of noisy user utterances in the spoken language understanding task

    Authors: Stefan Constantin, Jan Niehues, Alex Waibel

    Abstract: The state-of-the-art neural network architectures make it possible to create spoken language understanding systems with high quality and fast processing time. One major challenge for real-world applications is the high latency of these systems caused by triggered actions with high execution times. If an action can be separated into subactions, the reaction time of the systems can be improved thro…

    Submitted 30 September, 2019; originally announced September 2019.

    Comments: 10 pages, 3 figures, 7 tables, forthcoming in W-NUT 2019

  37. arXiv:1906.08584

    cs.CL

    Improving Zero-shot Translation with Language-Independent Constraints

    Authors: Ngoc-Quan Pham, Jan Niehues, Thanh-Le Ha, Alex Waibel

    Abstract: An important concern in training multilingual neural machine translation (NMT) is to translate between language pairs unseen during training, i.e., zero-shot translation. Improving this ability kills two birds with one stone by providing an alternative to pivot translation which also allows us to better understand how the model captures information between languages. In this work, we carried out a…

    Submitted 20 June, 2019; originally announced June 2019.

    Comments: 10 pages version accepted in WMT 2019

  38. arXiv:1904.13377

    cs.CL cs.LG cs.SD eess.AS

    Very Deep Self-Attention Networks for End-to-End Speech Recognition

    Authors: Ngoc-Quan Pham, Thai-Son Nguyen, Jan Niehues, Markus Müller, Sebastian Stüker, Alexander Waibel

    Abstract: Recently, end-to-end sequence-to-sequence models for speech recognition have gained significant interest in the research community. While previous architecture choices revolve around time-delay neural networks (TDNN) and long short-term memory (LSTM) recurrent neural networks, we propose to use self-attention via the Transformer architecture as an alternative. Our analysis shows that deep Transfor…

    Submitted 3 May, 2019; v1 submitted 30 April, 2019; originally announced April 2019.

    Comments: Submitted to INTERSPEECH 2019

  39. arXiv:1904.07209

    cs.CL

    Attention-Passing Models for Robust and Data-Efficient End-to-End Speech Translation

    Authors: Matthias Sperber, Graham Neubig, Jan Niehues, Alex Waibel

    Abstract: Speech translation has traditionally been approached through cascaded models consisting of a speech recognizer trained on a corpus of transcribed speech, and a machine translation system trained on parallel texts. Several recent works have shown the feasibility of collapsing the cascade into a single, direct model that can be trained in an end-to-end fashion on a corpus of translated speech. Howev…

    Submitted 15 April, 2019; originally announced April 2019.

    Comments: Authors' final version, accepted at TACL 2019

  40. arXiv:1812.06876

    cs.CL

    Multi-task learning to improve natural language understanding

    Authors: Stefan Constantin, Jan Niehues, Alex Waibel

    Abstract: Recent advancements in sequence-to-sequence neural network architectures have led to an improved natural language understanding. When building a neural network-based Natural Language Understanding component, one main challenge is to collect enough training data. The generation of a synthetic dataset is an inexpensive and quick way to collect data. Since this data often has less variety than real…

    Submitted 15 February, 2019; v1 submitted 17 December, 2018; originally announced December 2018.

    Comments: 11 pages, 4 figures, 2 tables, forthcoming in IWSDS 2019

  41. arXiv:1811.03189

    cs.CL

    Towards Fluent Translations from Disfluent Speech

    Authors: Elizabeth Salesky, Susanne Burger, Jan Niehues, Alex Waibel

    Abstract: When translating from speech, special consideration for conversational speech phenomena such as disfluencies is necessary. Most machine translation training data consists of well-formed written texts, causing issues when translating spontaneous speech. Previous work has introduced an intermediate step between speech recognition (ASR) and machine translation (MT) to remove disfluencies, making the…

    Submitted 7 November, 2018; originally announced November 2018.

    Comments: To appear at SLT 2018

  42. arXiv:1810.08641  [pdf, other]

    cs.CL

    Optimizing Segmentation Granularity for Neural Machine Translation

    Authors: Elizabeth Salesky, Andrew Runge, Alex Coda, Jan Niehues, Graham Neubig

    Abstract: In neural machine translation (NMT), it has become standard to translate using subword units to allow for an open vocabulary and improve accuracy on infrequent words. Byte-pair encoding (BPE) and its variants are the predominant approach to generating these subwords, as they are unsupervised, resource-free, and empirically effective. However, the granularity of these subword units is a hyperpar…

    Submitted 19 October, 2018; originally announced October 2018.

  43. arXiv:1809.03182  [pdf, other]

    cs.CL

    Towards one-shot learning for rare-word translation with external experts

    Authors: Ngoc-Quan Pham, Jan Niehues, Alex Waibel

    Abstract: Neural machine translation (NMT) has significantly improved the quality of automatic translation models. One of the main challenges in current systems is the translation of rare words. We present a generic approach to address this weakness by having external models annotate the training data as Experts, and control the model-expert interaction with a pointer network and reinforcement learning. Our…

    Submitted 10 September, 2018; originally announced September 2018.

    Comments: 2nd Workshop on Neural Machine Translation and Generation, ACL 2018

  44. arXiv:1808.00491  [pdf, ps, other]

    cs.CL

    Low-Latency Neural Speech Translation

    Authors: Jan Niehues, Ngoc-Quan Pham, Thanh-Le Ha, Matthias Sperber, Alex Waibel

    Abstract: Through the development of neural machine translation, the quality of machine translation systems has been improved significantly. By exploiting advancements in deep learning, systems are now able to better approximate the complex mapping from source sentences to target sentences. But with this ability, new challenges also arise. An example is the translation of partial sentences in low-latency sp…

    Submitted 1 August, 2018; originally announced August 2018.

    Comments: 5 Pages; Interspeech

  45. arXiv:1807.11582  [pdf, other]

    cs.CL cs.LG stat.ML

    A Hierarchical Approach to Neural Context-Aware Modeling

    Authors: Patrick Huber, Jan Niehues, Alex Waibel

    Abstract: We present a new recurrent neural network topology to enhance state-of-the-art machine learning systems by incorporating a broader context. Our approach overcomes recent limitations with extended narratives through a multi-layered computational approach to generate an abstract context representation. Therefore, the developed system captures the narrative on word-level, sentence-level, and context-…

    Submitted 6 August, 2018; v1 submitted 27 July, 2018; originally announced July 2018.

    Comments: 8 pages, 2 figures, 1 table

  46. arXiv:1807.02658  [pdf, other]

    cs.CL cs.LG

    Robust and Scalable Differentiable Neural Computer for Question Answering

    Authors: Jörg Franke, Jan Niehues, Alex Waibel

    Abstract: Deep learning models are often not easily adaptable to new tasks and require task-specific adjustments. The differentiable neural computer (DNC), a memory-augmented neural network, is designed as a general problem solver which can be used in a wide range of tasks. But in reality, it is hard to apply this model to new tasks. We analyze the DNC and identify possible improvements within the applicati…

    Submitted 7 July, 2018; originally announced July 2018.

    Comments: Accepted at Workshop on Machine Reading for Question Answering (MRQA), ACL 2018. 14 pages, 5 figures

  47. arXiv:1803.09519  [pdf, other]

    cs.CL

    Self-Attentional Acoustic Models

    Authors: Matthias Sperber, Jan Niehues, Graham Neubig, Sebastian Stüker, Alex Waibel

    Abstract: Self-attention is a method of encoding sequences of vectors by relating these vectors to each other based on pairwise similarities. These models have recently shown promising results for modeling discrete sequences, but they are non-trivial to apply to acoustic modeling due to computational and modeling issues. In this paper, we apply self-attention to acoustic modeling, proposing several improvem…

    Submitted 18 June, 2018; v1 submitted 26 March, 2018; originally announced March 2018.

    Comments: Published at Interspeech 2018

  48. arXiv:1803.08983  [pdf, ps, other]

    cs.CL cs.AI

    Automated Evaluation of Out-of-Context Errors

    Authors: Patrick Huber, Jan Niehues, Alex Waibel

    Abstract: We present a new approach to evaluate computational models for the task of text understanding by the means of out-of-context error detection. Through the novel design of our automated modification process, existing large-scale data sources can be adopted for a vast number of text understanding tasks. The data is thereby altered on a semantic level, allowing models to be tested against a challengin…

    Submitted 23 March, 2018; originally announced March 2018.

    Comments: LREC 2018, 5 pages, Out-of-Context Error Recognition, Automatic Evaluation Dataset, Text Understanding, TEDTalk

  49. arXiv:1803.02279  [pdf, other]

    cs.CL

    An End-to-End Goal-Oriented Dialog System with a Generative Natural Language Response Generation

    Authors: Stefan Constantin, Jan Niehues, Alex Waibel

    Abstract: Recent advancements in deep learning allowed the development of end-to-end trained goal-oriented dialog systems. Although these systems already achieve good performance, some simplifications limit their usage in real-life scenarios. In this work, we address two of these limitations: ignoring positional information and a fixed number of possible response candidates. We propose to use positional…

    Submitted 15 March, 2018; v1 submitted 6 March, 2018; originally announced March 2018.

    Comments: 11 pages, 4 figures, forthcoming in IWSDS 2018; added quantitative analysis of sensitivity to modified user utterances and minor improvements

  50. arXiv:1711.07893  [pdf, ps, other]

    cs.CL

    Effective Strategies in Zero-Shot Neural Machine Translation

    Authors: Thanh-Le Ha, Jan Niehues, Alexander Waibel

    Abstract: In this paper, we proposed two strategies which can be applied to a multilingual neural machine translation system in order to better tackle zero-shot scenarios despite not having any parallel corpus. The experiments show that they are effective in terms of both performance and computing resources, especially in multilingual translation of unbalanced data in real zero-resourced condition when they…

    Submitted 22 November, 2017; v1 submitted 21 November, 2017; originally announced November 2017.

    Comments: submitted to IWSLT17