Showing 1–13 of 13 results for author: Niehues, J

Searching in archive eess.
  1. Diffusion Probabilistic Models beat GANs on Medical Images

    Authors: Gustav Müller-Franzes, Jan Moritz Niehues, Firas Khader, Soroosh Tayebi Arasteh, Christoph Haarburger, Christiane Kuhl, Tianci Wang, Tianyu Han, Sven Nebelung, Jakob Nikolas Kather, Daniel Truhn

    Abstract: The success of Deep Learning applications critically depends on the quality and scale of the underlying training data. Generative adversarial networks (GANs) can generate arbitrary large datasets, but diversity and fidelity are limited, which has recently been addressed by denoising diffusion probabilistic models (DDPMs) whose superiority has been demonstrated on natural images. In this study, we…

    Submitted 14 December, 2022; originally announced December 2022.

    Journal ref: Sci Rep 13, 12098 (2023)

  2. arXiv:2211.11703

    cs.CL cs.SD eess.AS

    Towards continually learning new languages

    Authors: Ngoc-Quan Pham, Jan Niehues, Alexander Waibel

    Abstract: Multilingual speech recognition with neural networks is often implemented with batch-learning, when all of the languages are available before training. An ability to add new languages after the prior training sessions can be economically beneficial, but the main challenge is catastrophic forgetting. In this work, we combine the qualities of weight factorization and elastic weight consolidation in…

    Submitted 1 March, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Work in progress

  3. arXiv:2211.04939  [pdf, other]

    cs.CL cs.SD eess.AS

    Efficient Speech Translation with Pre-trained Models

    Authors: Zhaolin Li, Jan Niehues

    Abstract: When building state-of-the-art speech translation models, the need for large computational resources is a significant obstacle due to the large training data size and complex models. The availability of pre-trained models is a promising opportunity to build strong speech translation systems efficiently. In a first step, we investigate efficient strategies to build cascaded and end-to-end speech tr…

    Submitted 9 November, 2022; originally announced November 2022.

  4. arXiv:2205.12304  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Adaptive multilingual speech recognition with pretrained models

    Authors: Ngoc-Quan Pham, Alex Waibel, Jan Niehues

    Abstract: Multilingual speech recognition with supervised learning has achieved great results as reflected in recent research. With the development of pretraining methods on audio and text data, it is imperative to transfer the knowledge from unsupervised multilingual models to facilitate recognition, especially in many languages with limited data. Our work investigated the effectiveness of using two pretra…

    Submitted 24 May, 2022; originally announced May 2022.

    Comments: Submitted to INTERSPEECH 2022

  5. arXiv:2204.10593  [pdf, other]

    cs.CL cs.SD eess.AS

    LibriS2S: A German-English Speech-to-Speech Translation Corpus

    Authors: Pedro Jeuris, Jan Niehues

    Abstract: Recently, we have seen an increasing interest in the area of speech-to-text translation. This has led to astonishing improvements in this area. In contrast, the activities in the area of speech-to-speech translation is still limited, although it is essential to overcome the language barrier. We believe that one of the limiting factors is the availability of appropriate training data. We address th…

    Submitted 22 April, 2022; originally announced April 2022.

    Comments: Accepted to LREC 2022

  6. arXiv:2203.14835  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Multilingual Simultaneous Speech Translation

    Authors: Shashank Subramanya, Jan Niehues

    Abstract: Applications designed for simultaneous speech translation during events such as conferences or meetings need to balance quality and lag while displaying translated text to deliver a good user experience. One common approach to building online spoken language translation systems is by leveraging models built for offline speech translation. Based on a technique to adapt end-to-end monolingual models…

    Submitted 29 March, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: Submitted to Interspeech 2022

  7. Tackling data scarcity in speech translation using zero-shot multilingual machine translation techniques

    Authors: Tu Anh Dinh, Danni Liu, Jan Niehues

    Abstract: Recently, end-to-end speech translation (ST) has gained significant attention as it avoids error propagation. However, the approach suffers from data scarcity. It heavily depends on direct ST data and is less efficient in making use of speech transcription and text translation data, which is often more easily available. In the related field of multilingual text translation, several techniques have…

    Submitted 26 January, 2022; originally announced January 2022.

    Comments: 6 pages, 5 figures, accepted to IEEE ICASSP 2022. arXiv admin note: text overlap with arXiv:2107.06010

    ACM Class: I.2.7

    Journal ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 6222-6226

  8. arXiv:2005.11185  [pdf, other]

    cs.CL cs.SD eess.AS

    Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection

    Authors: Danni Liu, Gerasimos Spanakis, Jan Niehues

    Abstract: Encoder-decoder models provide a generic architecture for sequence-to-sequence tasks such as speech recognition and translation. While offline systems are often evaluated on quality metrics like word error rates (WER) and BLEU, latency is also a crucial factor in many practical use-cases. We propose three latency reduction techniques for chunk-based incremental inference and evaluate their efficie…

    Submitted 13 October, 2020; v1 submitted 22 May, 2020; originally announced May 2020.

    Comments: Interspeech 2020

  9. arXiv:2005.09940  [pdf, other]

    eess.AS cs.CL cs.SD

    Relative Positional Encoding for Speech Recognition and Direct Translation

    Authors: Ngoc-Quan Pham, Thanh-Le Ha, Tuan-Nam Nguyen, Thai-Son Nguyen, Elizabeth Salesky, Sebastian Stueker, Jan Niehues, Alexander Waibel

    Abstract: Transformer models are powerful sequence-to-sequence architectures that are capable of directly mapping speech inputs to transcriptions or translations. However, the mechanism for modeling positions in this model was tailored for text modeling, and thus is less ideal for acoustic inputs. In this work, we adapt the relative position encoding scheme to the Speech Transformer, where the key addition…

    Submitted 20 May, 2020; originally announced May 2020.

    Comments: Submitted to Interspeech 2020

  10. arXiv:2003.09891  [pdf, other]

    eess.AS cs.CL cs.SD

    Low Latency ASR for Simultaneous Speech Translation

    Authors: Thai Son Nguyen, Jan Niehues, Eunah Cho, Thanh-Le Ha, Kevin Kilgour, Markus Muller, Matthias Sperber, Sebastian Stueker, Alex Waibel

    Abstract: User studies have shown that reducing the latency of our simultaneous lecture translation system should be the most important goal. We therefore have worked on several techniques for reducing the latency for both components, the automatic speech recognition and the speech translation module. Since the commonly used commitment latency is not appropriate in our case of continuous stream decoding, we…

    Submitted 22 March, 2020; originally announced March 2020.

  11. arXiv:1910.13296  [pdf, other]

    eess.AS cs.CV cs.LG cs.SD

    Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation

    Authors: Thai-Son Nguyen, Sebastian Stueker, Jan Niehues, Alex Waibel

    Abstract: Sequence-to-Sequence (S2S) models recently started to show state-of-the-art performance for automatic speech recognition (ASR). With these large and deep models overfitting remains the largest problem, outweighing performance improvements that can be obtained from better architectures. One solution to the overfitting problem is increasing the amount of available training data and the variety exhib…

    Submitted 3 February, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

    Comments: To appear in ICASSP 2020

  12. arXiv:1909.13790  [pdf, other]

    cs.CL cs.SD eess.AS

    Incremental processing of noisy user utterances in the spoken language understanding task

    Authors: Stefan Constantin, Jan Niehues, Alex Waibel

    Abstract: The state-of-the-art neural network architectures make it possible to create spoken language understanding systems with high quality and fast processing time. One major challenge for real-world applications is the high latency of these systems caused by triggered actions with high executions times. If an action can be separated into subactions, the reaction time of the systems can be improved thro…

    Submitted 30 September, 2019; originally announced September 2019.

    Comments: 10 pages, 3 figures, 7 tables, forthcoming in W-NUT 2019

  13. arXiv:1904.13377  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Very Deep Self-Attention Networks for End-to-End Speech Recognition

    Authors: Ngoc-Quan Pham, Thai-Son Nguyen, Jan Niehues, Markus Müller, Sebastian Stüker, Alexander Waibel

    Abstract: Recently, end-to-end sequence-to-sequence models for speech recognition have gained significant interest in the research community. While previous architecture choices revolve around time-delay neural networks (TDNN) and long short-term memory (LSTM) recurrent neural networks, we propose to use self-attention via the Transformer architecture as an alternative. Our analysis shows that deep Transfor…

    Submitted 3 May, 2019; v1 submitted 30 April, 2019; originally announced April 2019.

    Comments: Submitted to INTERSPEECH 2019