Showing 1–12 of 12 results for author: Rigoll, G

  1. arXiv:2106.03932  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild

    Authors: Okan Köpüklü, Maja Taseska, Gerhard Rigoll

    Abstract: Successful active speaker detection requires a three-stage pipeline: (i) audio-visual encoding for all speakers in the clip, (ii) inter-speaker relation modeling between a reference speaker and the background speakers within each frame, and (iii) temporal modeling for the reference speaker. Each stage of this pipeline plays an important role for the final performance of the created architecture. B…

    Submitted 7 September, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: Accepted to ICCV 2021

  2. arXiv:2104.01471  [pdf, other]

    eess.AS

    Adversarial Joint Training with Self-Attention Mechanism for Robust End-to-End Speech Recognition

    Authors: Lujun Li, Yikai Kang, Yuchen Shi, Ludwig Kürzinger, Tobias Watzel, Gerhard Rigoll

    Abstract: Lately, the self-attention mechanism has marked a new milestone in the field of automatic speech recognition (ASR). Nevertheless, its performance is susceptible to environmental intrusions as the system predicts the next output symbol depending on the full input sequence and the previous predictions. Inspired by the extensive applications of the generative adversarial networks (GANs) in speech enh…

    Submitted 3 April, 2021; originally announced April 2021.

  3. arXiv:2010.07597  [pdf, other]

    eess.AS cs.SD eess.SP

    Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions

    Authors: Ludwig Kürzinger, Nicolas Lindae, Palle Klewitz, Gerhard Rigoll

    Abstract: Many end-to-end Automatic Speech Recognition (ASR) systems still rely on pre-processed frequency-domain features that are handcrafted to emulate human hearing. Our work is motivated by recent advances in integrated learnable feature extraction. For this, we propose Lightweight Sinc-Convolutions (LSC) that integrate Sinc-convolutions with depthwise convolutions as a low-parameter machine-learna…

    Submitted 16 October, 2020; v1 submitted 15 October, 2020; originally announced October 2020.

    Comments: Accepted at INTERSPEECH 2020

  4. arXiv:2009.14660  [pdf, other]

    cs.CV cs.LG eess.IV

    Driver Anomaly Detection: A Dataset and Contrastive Learning Approach

    Authors: Okan Köpüklü, Jiapeng Zheng, Hang Xu, Gerhard Rigoll

    Abstract: Distracted drivers are more likely to fail to anticipate hazards, which results in car accidents. Therefore, detecting anomalies in drivers' actions (i.e., any action deviating from normal driving) is of the utmost importance to reduce driver-related accidents. However, there are unboundedly many anomalous actions that a driver can do while driving, which leads to an 'open set recognition' problem…

    Submitted 30 November, 2020; v1 submitted 30 September, 2020; originally announced September 2020.

    Comments: Accepted to IEEE Winter Conference on Applications of Computer Vision (WACV 2021)

  5. arXiv:2009.14639  [pdf, other]

    cs.CV cs.LG eess.IV

    Dissected 3D CNNs: Temporal Skip Connections for Efficient Online Video Processing

    Authors: Okan Köpüklü, Stefan Hörmann, Fabian Herzog, Hakan Cevikalp, Gerhard Rigoll

    Abstract: Convolutional Neural Networks with 3D kernels (3D-CNNs) currently achieve state-of-the-art results in video recognition tasks due to their supremacy in extracting spatiotemporal features within video frames. There have been many successful 3D-CNN architectures surpassing the state-of-the-art results successively. However, nearly all of them are designed to operate offline creating several serious…

    Submitted 18 October, 2021; v1 submitted 30 September, 2020; originally announced September 2020.

  6. arXiv:2007.12892  [pdf, ps, other]

    eess.AS cs.CR cs.SD

    MP3 Compression To Diminish Adversarial Noise in End-to-End Speech Recognition

    Authors: Iustina Andronic, Ludwig Kürzinger, Edgar Ricardo Chavez Rosas, Gerhard Rigoll, Bernhard U. Seeber

    Abstract: Audio Adversarial Examples (AAE) represent specially created inputs meant to trick Automatic Speech Recognition (ASR) systems into misclassification. The present work proposes MP3 compression as a means to decrease the impact of Adversarial Noise (AN) in audio samples transcribed by ASR systems. To this end, we generated AAEs with the Fast Gradient Sign Method for an end-to-end, hybrid CTC-attenti…

    Submitted 25 July, 2020; originally announced July 2020.

    Comments: Submitted and accepted at SPECOM 2020 conference

  7. arXiv:2007.10723  [pdf, ps, other]

    eess.AS cs.SD

    Audio Adversarial Examples for Robust Hybrid CTC/Attention Speech Recognition

    Authors: Ludwig Kürzinger, Edgar Ricardo Chavez Rosas, Lujun Li, Tobias Watzel, Gerhard Rigoll

    Abstract: Recent advances in Automatic Speech Recognition (ASR) demonstrated how end-to-end systems are able to achieve state-of-the-art performance. There is a trend towards deeper neural networks; however, those ASR models are also more complex and more vulnerable to specially crafted noisy data. Those Audio Adversarial Examples (AAE) were previously demonstrated on ASR systems that use Connectionist Temporal C…

    Submitted 21 July, 2020; originally announced July 2020.

    Comments: To be published at SPECOM 2020

  8. CTC-Segmentation of Large Corpora for German End-to-end Speech Recognition

    Authors: Ludwig Kürzinger, Dominik Winkelbauer, Lujun Li, Tobias Watzel, Gerhard Rigoll

    Abstract: Recent end-to-end Automatic Speech Recognition (ASR) systems demonstrated the ability to outperform conventional hybrid DNN/HMM ASR. Aside from architectural improvements in those systems, those models grew in terms of depth, parameters and model capacity. However, these models also require more training data to achieve comparable performance. In this work, we combine freely available corpora f…

    Submitted 5 October, 2020; v1 submitted 17 July, 2020; originally announced July 2020.

    Comments: Published at SPECOM 2020

    Journal ref: Speech and Computer (2020)

  9. arXiv:2006.08506  [pdf, ps, other]

    eess.AS cs.CL

    Regularized Forward-Backward Decoder for Attention Models

    Authors: Tobias Watzel, Ludwig Kürzinger, Lujun Li, Gerhard Rigoll

    Abstract: Nowadays, attention models are one of the popular candidates for speech recognition. So far, many studies mainly focus on the encoder structure or the attention module to enhance the performance of these models. However, they mostly ignore the decoder. In this paper, we propose a novel regularization technique incorporating a second decoder during the training phase. This decoder is optimized on time-r…

    Submitted 28 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

  10. arXiv:1911.02086  [pdf, other]

    eess.AS cs.CL cs.SD

    Small-Footprint Keyword Spotting on Raw Audio Data with Sinc-Convolutions

    Authors: Simon Mittermaier, Ludwig Kürzinger, Bernd Waschneck, Gerhard Rigoll

    Abstract: Keyword Spotting (KWS) enables speech-based user interaction on smart devices. Always-on and battery-powered application scenarios for smart devices put constraints on hardware resources and power consumption, while also demanding high accuracy as well as real-time capability. Previous architectures first extracted acoustic features and then applied a neural network to classify keyword probabiliti…

    Submitted 3 May, 2020; v1 submitted 5 November, 2019; originally announced November 2019.

    Comments: Accepted at ICASSP 2020

  11. arXiv:1909.05165  [pdf, other]

    cs.CV eess.IV

    Comparative Analysis of CNN-based Spatiotemporal Reasoning in Videos

    Authors: Okan Köpüklü, Fabian Herzog, Gerhard Rigoll

    Abstract: Understanding actions and gestures in video streams requires temporal reasoning of the spatial content from different time instants, i.e., spatiotemporal (ST) modeling. In this survey paper, we have made a comparative analysis of different ST modeling techniques for action and gesture recognition tasks. Since Convolutional Neural Networks (CNNs) have proved to be an effective tool as a feature extr…

    Submitted 11 January, 2021; v1 submitted 11 September, 2019; originally announced September 2019.

  12. arXiv:1907.08009  [pdf, other]

    cs.CV cs.LG eess.IV

    Real-Time Driver State Monitoring Using a CNN Based Spatio-Temporal Approach

    Authors: Neslihan Kose, Okan Kopuklu, Alexander Unnervik, Gerhard Rigoll

    Abstract: Many road accidents occur due to distracted drivers. Today, driver monitoring is essential even for the latest autonomous vehicles to alert distracted drivers in order to take over control of the vehicle in case of emergency. In this paper, a spatio-temporal approach is applied to classify drivers' distraction level and movement decisions using convolutional neural networks (CNNs). We approach thi…

    Submitted 18 July, 2019; originally announced July 2019.

    Comments: Accepted for publication by the IEEE Intelligent Transportation Systems Conference (ITSC 2019)