Showing 1–8 of 8 results for author: Zeiler, S

  1. arXiv:2109.04894  [pdf, other]

    eess.AS cs.SD eess.IV

    Large-vocabulary Audio-visual Speech Recognition in Noisy Environments

    Authors: Wentao Yu, Steffen Zeiler, Dorothea Kolossa

    Abstract: Audio-visual speech recognition (AVSR) can effectively and significantly improve the recognition rates of small-vocabulary systems, compared to their audio-only counterparts. For large-vocabulary systems, however, there are still many difficulties, such as unsatisfactory video recognition accuracies, that make it hard to improve over audio-only baselines. In this paper, we specifically consider su…

    Submitted 10 September, 2021; originally announced September 2021.

    Journal ref: The IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP), 2021

  2. arXiv:2104.09482  [pdf, other]

    eess.AS cs.SD

    Fusing information streams in end-to-end audio-visual speech recognition

    Authors: Wentao Yu, Steffen Zeiler, Dorothea Kolossa

    Abstract: End-to-end acoustic speech recognition has quickly gained widespread popularity and shows promising results in many studies. Specifically the joint transformer/CTC model provides very good performance in many tasks. However, under noisy and distorted conditions, the performance still degrades notably. While audio-visual speech recognition can significantly improve the recognition rate of end-to-en…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: 5 pages

    Journal ref: Published in International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021

  3. arXiv:2103.01173  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Unsupervised Classification of Voiced Speech and Pitch Tracking Using Forward-Backward Kalman Filtering

    Authors: Benedikt Boenninghoff, Robert M. Nickel, Steffen Zeiler, Dorothea Kolossa

    Abstract: The detection of voiced speech, the estimation of the fundamental frequency, and the tracking of pitch values over time are crucial subtasks for a variety of speech processing techniques. Many different algorithms have been developed for each of the three subtasks. We present a new algorithm that integrates the three subtasks into a single procedure. The algorithm can be applied to pre-recorded sp…

    Submitted 1 March, 2021; originally announced March 2021.

    Comments: Speech Communication; 12. ITG Symposium, 5-7 Oct. 2016

  4. arXiv:2007.14223  [pdf, other]

    eess.AS cs.SD

    Multimodal Integration for Large-Vocabulary Audio-Visual Speech Recognition

    Authors: Wentao Yu, Steffen Zeiler, Dorothea Kolossa

    Abstract: For many small- and medium-vocabulary tasks, audio-visual speech recognition can significantly improve the recognition rates compared to audio-only systems. However, there is still an ongoing debate regarding the best combination strategy for multi-modal information, which should allow for the translation of these gains to large-vocabulary recognition. While an integration at the level of state-po…

    Submitted 28 July, 2020; originally announced July 2020.

    Comments: 5 pages

    Journal ref: Published in Proceedings of the 28th European Signal Processing Conference (EUSIPCO), 2020

  5. arXiv:2005.13930  [pdf, other]

    cs.LG cs.CL stat.ML

    Variational Autoencoder with Embedded Student-$t$ Mixture Model for Authorship Attribution

    Authors: Benedikt Boenninghoff, Steffen Zeiler, Robert M. Nickel, Dorothea Kolossa

    Abstract: Traditional computational authorship attribution describes a classification task in a closed-set scenario. Given a finite set of candidate authors and corresponding labeled texts, the objective is to determine which of the authors has written another set of anonymous or disputed texts. In this work, we propose a probabilistic autoencoding framework to deal with this supervised classification task.…

    Submitted 28 May, 2020; originally announced May 2020.

    Comments: Preprint

  6. arXiv:1908.07844  [pdf, ps, other]

    cs.CL cs.LG stat.ML

    Similarity Learning for Authorship Verification in Social Media

    Authors: Benedikt Boenninghoff, Robert M. Nickel, Steffen Zeiler, Dorothea Kolossa

    Abstract: Authorship verification tries to answer the question if two documents with unknown authors were written by the same author or not. A range of successful technical approaches has been proposed for this task, many of which are based on traditional linguistic features such as n-grams. These algorithms achieve good results for certain types of written documents like books and novels. Forensic authorsh…

    Submitted 20 August, 2019; originally announced August 2019.

    Comments: 5 pages, 3 figures, 1 table, presented at ICASSP 2019 in Brighton, UK

  7. arXiv:1908.01551  [pdf, other]

    cs.CR cs.LG cs.SD eess.AS

    Imperio: Robust Over-the-Air Adversarial Examples for Automatic Speech Recognition Systems

    Authors: Lea Schönherr, Thorsten Eisenhofer, Steffen Zeiler, Thorsten Holz, Dorothea Kolossa

    Abstract: Automatic speech recognition (ASR) systems can be fooled via targeted adversarial examples, which induce the ASR to produce arbitrary transcriptions in response to altered audio signals. However, state-of-the-art adversarial examples typically have to be fed into the ASR system directly, and are not successful when played in a room. The few published over-the-air adversarial examples fall into one…

    Submitted 24 November, 2020; v1 submitted 5 August, 2019; originally announced August 2019.

  8. arXiv:1808.05665  [pdf, other]

    cs.CR cs.SD eess.AS

    Adversarial Attacks Against Automatic Speech Recognition Systems via Psychoacoustic Hiding

    Authors: Lea Schönherr, Katharina Kohls, Steffen Zeiler, Thorsten Holz, Dorothea Kolossa

    Abstract: Voice interfaces are becoming accepted widely as input methods for a diverse set of devices. This development is driven by rapid improvements in automatic speech recognition (ASR), which now performs on par with human listening in many tasks. These improvements base on an ongoing evolution of DNNs as the computational core of ASR. However, recent research results show that DNNs are vulnerable to a…

    Submitted 30 October, 2018; v1 submitted 16 August, 2018; originally announced August 2018.