Showing 1–9 of 9 results for author: Primus, P

  1. arXiv:2406.15897

    eess.AS cs.LG cs.SD

    Fusing Audio and Metadata Embeddings Improves Language-based Audio Retrieval

    Authors: Paul Primus, Gerhard Widmer

    Abstract: Matching raw audio signals with textual descriptions requires understanding the audio's content and the description's semantics and then drawing connections between the two modalities. This paper investigates a hybrid retrieval system that utilizes audio metadata as an additional clue to understand the content of audio signals before matching them with textual queries. We experimented with metadat…

    Submitted 2 July, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

    Comments: In Proceedings of the 32nd European Signal Processing Conference, EUSIPCO 2024

  2. arXiv:2405.10018

    eess.AS cs.SD

    Data-Efficient Low-Complexity Acoustic Scene Classification in the DCASE 2024 Challenge

    Authors: Florian Schmid, Paul Primus, Toni Heittola, Annamaria Mesaros, Irene Martín-Morató, Khaled Koutini, Gerhard Widmer

    Abstract: This article describes the Data-Efficient Low-Complexity Acoustic Scene Classification Task in the DCASE 2024 Challenge and the corresponding baseline system. The task setup is a continuation of previous editions (2022 and 2023), which focused on recording device mismatches and low-complexity constraints. This year's edition introduces an additional real-world problem: participants must develop da…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Task Description Page: https://dcase.community/challenge2024/task-data-efficient-low-complexity-acoustic-scene-classification

  3. arXiv:2308.04258

    eess.AS cs.IR cs.LG cs.SD

    Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets

    Authors: Paul Primus, Khaled Koutini, Gerhard Widmer

    Abstract: This work presents a text-to-audio-retrieval system based on pre-trained text and spectrogram transformers. Our method projects recordings and textual descriptions into a shared audio-caption space in which related examples from different modalities are close. Through a systematic analysis, we examine how each component of the system influences retrieval performance. As a result, we identify two k…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: submitted to DCASE Workshop 2023

  4. arXiv:2208.11460

    cs.SD cs.LG eess.AS eess.SP

    Improving Natural-Language-based Audio Retrieval with Transfer Learning and Audio & Text Augmentations

    Authors: Paul Primus, Gerhard Widmer

    Abstract: The absence of large labeled datasets remains a significant challenge in many application areas of deep learning. Researchers and practitioners typically resort to transfer learning and data augmentation to alleviate this issue. We study these strategies in the context of audio retrieval with natural language queries (Task 6b of the DCASE 2022 Challenge). Our proposed system uses pre-trained embed…

    Submitted 29 October, 2022; v1 submitted 24 August, 2022; originally announced August 2022.

    Comments: accepted at DCASE Workshop 2022

  5. arXiv:2208.11402

    cs.SD cs.LG eess.AS eess.SP

    Improved Zero-Shot Audio Tagging & Classification with Patchout Spectrogram Transformers

    Authors: Paul Primus, Gerhard Widmer

    Abstract: Standard machine learning models for tagging and classifying acoustic signals cannot handle classes that were not seen during training. Zero-Shot (ZS) learning overcomes this restriction by predicting classes based on adaptable class descriptions. This study sets out to investigate the effectiveness of self-attention-based audio embedding architectures for ZS learning. To this end, we compare the…

    Submitted 24 August, 2022; originally announced August 2022.

    Comments: published in EUSIPCO 2022

  6. arXiv:2011.02949

    eess.AS cs.LG cs.SD

    Anomalous Sound Detection as a Simple Binary Classification Problem with Careful Selection of Proxy Outlier Examples

    Authors: Paul Primus, Verena Haunschmid, Patrick Praher, Gerhard Widmer

    Abstract: Unsupervised anomalous sound detection is concerned with identifying sounds that deviate from what is defined as 'normal', without explicitly specifying the types of anomalies. A significant obstacle is the diversity and rareness of outliers, which typically prevent us from collecting a representative set of anomalous sounds. As a consequence, most anomaly detection methods use unsupervised rather…

    Submitted 5 November, 2020; originally announced November 2020.

    Comments: published in DCASE 2020 Workshop

  7. arXiv:2007.13503

    eess.AS cs.LG cs.SD

    Receptive-Field Regularized CNNs for Music Classification and Tagging

    Authors: Khaled Koutini, Hamid Eghbal-Zadeh, Verena Haunschmid, Paul Primus, Shreyan Chowdhury, Gerhard Widmer

    Abstract: Convolutional Neural Networks (CNNs) have been successfully used in various Music Information Retrieval (MIR) tasks, both as end-to-end models and as feature extractors for more complex systems. However, the MIR field is still dominated by the classical VGG-based CNN architecture variants, often in combination with more complex modules such as attention, and/or techniques such as pre-training on l…

    Submitted 27 July, 2020; originally announced July 2020.

  8. arXiv:2007.02650

    cs.LG stat.ML

    On Data Augmentation and Adversarial Risk: An Empirical Analysis

    Authors: Hamid Eghbal-zadeh, Khaled Koutini, Paul Primus, Verena Haunschmid, Michal Lewandowski, Werner Zellinger, Bernhard A. Moser, Gerhard Widmer

    Abstract: Data augmentation techniques have become standard practice in deep learning, as they have been shown to greatly improve the generalisation abilities of models. These techniques rely on different ideas such as invariance-preserving transformations (e.g., expert-defined augmentation), statistical heuristics (e.g., Mixup), and learning the data distribution (e.g., GANs). However, in the adversarial setting…

    Submitted 6 July, 2020; originally announced July 2020.

    Comments: 21 pages, 15 figures, 3 tables

  9. arXiv:1909.02869

    eess.AS cs.LG cs.SD stat.ML

    Exploiting Parallel Audio Recordings to Enforce Device Invariance in CNN-based Acoustic Scene Classification

    Authors: Paul Primus, Hamid Eghbal-zadeh, David Eitelsebner, Khaled Koutini, Andreas Arzt, Gerhard Widmer

    Abstract: Distribution mismatches between the data seen at training and at application time remain a major challenge in all application areas of machine learning. We study this problem in the context of machine listening (Task 1b of the DCASE 2019 Challenge). We propose a novel approach to learn domain-invariant classifiers in an end-to-end fashion by enforcing equal hidden layer representations for domain-…

    Submitted 4 September, 2019; originally announced September 2019.

    Comments: Published at the Workshop on Detection and Classification of Acoustic Scenes and Events, 25-26 October 2019, New York, USA