Showing 1–10 of 10 results for author: Müller, N M

  1. arXiv:2406.03512  [pdf, other]

    cs.SD cs.AI eess.AS

    Harder or Different? Understanding Generalization of Audio Deepfake Detection

    Authors: Nicolas M. Müller, Nicholas Evans, Hemlata Tak, Philip Sperl, Konstantin Böttinger

    Abstract: Recent research has highlighted a key issue in speech deepfake detection: models trained on one set of deepfakes perform poorly on others. The question arises: is this due to the continuously improving quality of Text-to-Speech (TTS) models, i.e., are newer DeepFakes just 'harder' to detect? Or, is it because deepfakes generated with one model are fundamentally different to those generated using a…

    Submitted 12 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Journal ref: Interspeech 2024

  2. arXiv:2402.06304  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    A New Approach to Voice Authenticity

    Authors: Nicolas M. Müller, Piotr Kawa, Shen Hu, Matthias Neu, Jennifer Williams, Philip Sperl, Konstantin Böttinger

    Abstract: Voice faking, driven primarily by recent advances in text-to-speech (TTS) synthesis technology, poses significant societal challenges. Currently, the prevailing assumption is that unaltered human speech can be considered genuine, while fake speech comes from TTS synthesis. We argue that this binary distinction is oversimplified. For instance, altered playback speeds can be used for malicious purpo…

    Submitted 9 February, 2024; originally announced February 2024.

  3. arXiv:2401.09512  [pdf, other]

    cs.SD eess.AS

    MLAAD: The Multi-Language Audio Anti-Spoofing Dataset

    Authors: Nicolas M. Müller, Piotr Kawa, Wei Herng Choong, Edresson Casanova, Eren Gölge, Thorsten Müller, Piotr Syga, Philip Sperl, Konstantin Böttinger

    Abstract: Text-to-Speech (TTS) technology brings significant advantages, such as giving a voice to those with speech impairments, but also enables audio deepfakes and spoofs. The former mislead individuals and may propagate misinformation, while the latter undermine voice biometric security systems. AI-based detection can help to address these challenges by automatically differentiating between genuine and…

    Submitted 16 April, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: IJCNN 2024

  4. arXiv:2308.11800  [pdf, other]

    cs.SD cs.LG eess.AS

    Complex-valued neural networks for voice anti-spoofing

    Authors: Nicolas M. Müller, Philip Sperl, Konstantin Böttinger

    Abstract: Current anti-spoofing and audio deepfake detection systems use either magnitude spectrogram-based features (such as CQT or Melspectrograms) or raw audio processed through convolution or sinc-layers. Both methods have drawbacks: magnitude spectrograms discard phase information, which affects audio naturalness, and raw-feature-based models cannot use traditional explainable AI methods. This paper pr…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: Interspeech 2023

  5. arXiv:2211.15510  [pdf, other]

    cs.CV cs.LG eess.IV

    Localized Shortcut Removal

    Authors: Nicolas M. Müller, Jochen Jacobs, Jennifer Williams, Konstantin Böttinger

    Abstract: Machine learning is a data-driven field, and the quality of the underlying datasets plays a crucial role in learning success. However, high performance on held-out test data does not necessarily indicate that a model generalizes or learns anything meaningful. This is often due to the existence of machine learning shortcuts - features in the data that are predictive but unrelated to the problem at…

    Submitted 23 May, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

    Comments: Accepted at XAI4CV @ CVPR2023

  6. arXiv:2203.16263  [pdf, other]

    cs.SD cs.LG eess.AS

    Does Audio Deepfake Detection Generalize?

    Authors: Nicolas M. Müller, Pavel Czempin, Franziska Dieckmann, Adam Froghyar, Konstantin Böttinger

    Abstract: Current text-to-speech algorithms produce realistic fakes of human voices, making deepfake detection a much-needed area of research. While researchers have presented various techniques for detecting audio spoofs, it is often unclear exactly why these architectures are successful: Preprocessing steps, hyperparameter settings, and the degree of fine-tuning are not consistent across related work. Whi…

    Submitted 21 April, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: Submitted to Interspeech 2022

  7. arXiv:2107.09667  [pdf, other]

    cs.HC cs.AI cs.SD eess.AS

    Human Perception of Audio Deepfakes

    Authors: Nicolas M. Müller, Karla Pizzi, Jennifer Williams

    Abstract: The recent emergence of deepfakes has brought manipulated and generated content to the forefront of machine learning research. Automatic detection of deepfakes has seen many new machine learning techniques, however, human detection capabilities are far less explored. In this paper, we present results from comparing the abilities of humans and machines for detecting audio deepfakes used to imitate…

    Submitted 6 October, 2022; v1 submitted 20 July, 2021; originally announced July 2021.

    Comments: Published at ACM Multimedia 2022 Workshop DDAM First International Workshop on Deepfake Detection for Audio Multimedia at ACM Multimedia 2022

  8. arXiv:2106.12914  [pdf, other]

    cs.SD eess.AS

    Speech is Silver, Silence is Golden: What do ASVspoof-trained Models Really Learn?

    Authors: Nicolas M. Müller, Franziska Dieckmann, Pavel Czempin, Roman Canals, Konstantin Böttinger, Jennifer Williams

    Abstract: We present our analysis of a significant data artifact in the official 2019/2021 ASVspoof Challenge Dataset. We identify an uneven distribution of silence duration in the training and test splits, which tends to correlate with the target prediction label. Bonafide instances tend to have significantly longer leading and trailing silences than spoofed instances. In this paper, we explore this phenom…

    Submitted 28 September, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

    Journal ref: ASVspoof 2021 Workshop

  9. arXiv:2104.05557  [pdf, other]

    eess.AS cs.SD

    SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

    Authors: Edresson Casanova, Christopher Shulby, Eren Gölge, Nicolas Michael Müller, Frederico Santos de Oliveira, Arnaldo Candido Junior, Anderson da Silva Soares, Sandra Maria Aluisio, Moacir Antonelli Ponti

    Abstract: In this paper, we propose SC-GlowTTS: an efficient zero-shot multi-speaker text-to-speech model that improves similarity for speakers unseen during training. We propose a speaker-conditional architecture that explores a flow-based decoder that works in a zero-shot scenario. As text encoders, we explore a dilated residual convolutional-based encoder, gated convolutional-based encoder, and transform…

    Submitted 15 June, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

    Comments: Accepted on Interspeech 2021

  10. arXiv:2010.07190  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    Towards Resistant Audio Adversarial Examples

    Authors: Tom Dörr, Karla Markert, Nicolas M. Müller, Konstantin Böttinger

    Abstract: Adversarial examples tremendously threaten the availability and integrity of machine learning-based systems. While the feasibility of such attacks has been observed first in the domain of image processing, recent research shows that speech recognition is also susceptible to adversarial attacks. However, reliably bridging the air gap (i.e., making the adversarial examples work when recorded via a m…

    Submitted 14 October, 2020; originally announced October 2020.

    ACM Class: I.2

    Journal ref: SPAI 20: Proceedings of the 1st ACM Workshop on Security and Privacy on Artificial Intelligence, October 2020, Pages 3-10