Showing 1–8 of 8 results for author: Kürzinger, L

  1. arXiv:2112.09323

    cs.SD eess.AS

    JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification

    Authors: Shinnosuke Takamichi, Ludwig Kürzinger, Takaaki Saeki, Sayaka Shiota, Shinji Watanabe

    Abstract: In this paper, we construct a new Japanese speech corpus called "JTubeSpeech." Although recent end-to-end learning requires large-size speech corpora, open-sourced such corpora for languages other than English have not yet been established. In this paper, we describe the construction of a corpus from YouTube videos and subtitles for speech recognition and speaker verification. Our method can autom…

    Submitted 17 December, 2021; originally announced December 2021.

    Comments: Submitted to ICASSP2022

  2. arXiv:2104.01471

    eess.AS

    Adversarial Joint Training with Self-Attention Mechanism for Robust End-to-End Speech Recognition

    Authors: Lujun Li, Yikai Kang, Yuchen Shi, Ludwig Kürzinger, Tobias Watzel, Gerhard Rigoll

    Abstract: Lately, the self-attention mechanism has marked a new milestone in the field of automatic speech recognition (ASR). Nevertheless, its performance is susceptible to environmental intrusions as the system predicts the next output symbol depending on the full input sequence and the previous predictions. Inspired by the extensive applications of the generative adversarial networks (GANs) in speech enh…

    Submitted 3 April, 2021; originally announced April 2021.

  3. arXiv:2010.07597

    eess.AS cs.SD eess.SP

    Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions

    Authors: Ludwig Kürzinger, Nicolas Lindae, Palle Klewitz, Gerhard Rigoll

    Abstract: Many end-to-end Automatic Speech Recognition (ASR) systems still rely on pre-processed frequency-domain features that are handcrafted to emulate the human hearing. Our work is motivated by recent advances in integrated learnable feature extraction. For this, we propose Lightweight Sinc-Convolutions (LSC) that integrate Sinc-convolutions with depthwise convolutions as a low-parameter machine-learna…

    Submitted 16 October, 2020; v1 submitted 15 October, 2020; originally announced October 2020.

    Comments: Accepted at INTERSPEECH 2020

  4. arXiv:2007.12892

    eess.AS cs.CR cs.SD

    MP3 Compression To Diminish Adversarial Noise in End-to-End Speech Recognition

    Authors: Iustina Andronic, Ludwig Kürzinger, Edgar Ricardo Chavez Rosas, Gerhard Rigoll, Bernhard U. Seeber

    Abstract: Audio Adversarial Examples (AAE) represent specially created inputs meant to trick Automatic Speech Recognition (ASR) systems into misclassification. The present work proposes MP3 compression as a means to decrease the impact of Adversarial Noise (AN) in audio samples transcribed by ASR systems. To this end, we generated AAEs with the Fast Gradient Sign Method for an end-to-end, hybrid CTC-attenti…

    Submitted 25 July, 2020; originally announced July 2020.

    Comments: Submitted and accepted at SPECOM 2020 conference

  5. arXiv:2007.10723

    eess.AS cs.SD

    Audio Adversarial Examples for Robust Hybrid CTC/Attention Speech Recognition

    Authors: Ludwig Kürzinger, Edgar Ricardo Chavez Rosas, Lujun Li, Tobias Watzel, Gerhard Rigoll

    Abstract: Recent advances in Automatic Speech Recognition (ASR) demonstrated how end-to-end systems are able to achieve state-of-the-art performance. There is a trend towards deeper neural networks, however those ASR models are also more complex and prone against specially crafted noisy data. Those Audio Adversarial Examples (AAE) were previously demonstrated on ASR systems that use Connectionist Temporal C…

    Submitted 21 July, 2020; originally announced July 2020.

    Comments: To be published at SPECOM 2020

  6. CTC-Segmentation of Large Corpora for German End-to-end Speech Recognition

    Authors: Ludwig Kürzinger, Dominik Winkelbauer, Lujun Li, Tobias Watzel, Gerhard Rigoll

    Abstract: Recent end-to-end Automatic Speech Recognition (ASR) systems demonstrated the ability to outperform conventional hybrid DNN/HMM ASR. Aside from architectural improvements in those systems, those models grew in terms of depth, parameters and model capacity. However, these models also require more training data to achieve comparable performance. In this work, we combine freely available corpora f…

    Submitted 5 October, 2020; v1 submitted 17 July, 2020; originally announced July 2020.

    Comments: Published at SPECOM 2020

    Journal ref: Speech and Computer (2020)

  7. arXiv:2006.08506

    eess.AS cs.CL

    Regularized Forward-Backward Decoder for Attention Models

    Authors: Tobias Watzel, Ludwig Kürzinger, Lujun Li, Gerhard Rigoll

    Abstract: Nowadays, attention models are one of the popular candidates for speech recognition. So far, many studies mainly focus on the encoder structure or the attention module to enhance the performance of these models. However, mostly ignore the decoder. In this paper, we propose a novel regularization technique incorporating a second decoder during the training phase. This decoder is optimized on time-r…

    Submitted 28 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

  8. arXiv:1911.02086

    eess.AS cs.CL cs.SD

    Small-Footprint Keyword Spotting on Raw Audio Data with Sinc-Convolutions

    Authors: Simon Mittermaier, Ludwig Kürzinger, Bernd Waschneck, Gerhard Rigoll

    Abstract: Keyword Spotting (KWS) enables speech-based user interaction on smart devices. Always-on and battery-powered application scenarios for smart devices put constraints on hardware resources and power consumption, while also demanding high accuracy as well as real-time capability. Previous architectures first extracted acoustic features and then applied a neural network to classify keyword probabiliti…

    Submitted 3 May, 2020; v1 submitted 5 November, 2019; originally announced November 2019.

    Comments: Accepted at ICASSP 2020