
Showing 1–6 of 6 results for author: Koudounas, A

Searching in archive eess.
  1. arXiv:2406.14693  [pdf, other]

    eess.AS cs.LG

    Voice Disorder Analysis: a Transformer-based Approach

    Authors: Alkis Koudounas, Gabriele Ciravegna, Marco Fantini, Giovanni Succo, Erika Crosetti, Tania Cerquitelli, Elena Baralis

    Abstract: Voice disorders are pathologies significantly affecting patient quality of life. However, non-invasive automated diagnosis of these pathologies is still under-explored, due to both a shortage of pathological voice data, and diversity of the recording types used for the diagnosis. This paper proposes a novel solution that adopts transformers directly working on raw voice signals and addresses data…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2406.14686  [pdf, other]

    cs.CL cs.LG eess.AS

    A Contrastive Learning Approach to Mitigate Bias in Speech Models

    Authors: Alkis Koudounas, Flavio Giobergia, Eliana Pastor, Elena Baralis

    Abstract: Speech models may be affected by performance imbalance in different population subgroups, raising concerns about fair treatment across these groups. Prior attempts to mitigate unfairness either focus on user-defined subgroups, potentially overlooking other affected subgroups, or do not explicitly improve the internal representation at the subgroup level. This paper proposes the first adoption of c…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  3. arXiv:2405.00934  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    Benchmarking Representations for Speech, Music, and Acoustic Events

    Authors: Moreno La Quatra, Alkis Koudounas, Lorenzo Vaiani, Elena Baralis, Luca Cagliero, Paolo Garza, Sabato Marco Siniscalchi

    Abstract: Limited diversity in standardized benchmarks for evaluating audio representation learning (ARL) methods may hinder systematic comparison of current methods' capabilities. We present ARCH, a comprehensive benchmark for evaluating ARL methods on diverse audio classification domains, covering acoustic events, music, and speech. ARCH comprises 12 datasets, that allow us to thoroughly assess pre-traine…

    Submitted 1 May, 2024; originally announced May 2024.

  4. arXiv:2404.07226  [pdf, other]

    eess.AS cs.LG cs.SD

    Houston we have a Divergence: A Subgroup Performance Analysis of ASR Models

    Authors: Alkis Koudounas, Flavio Giobergia

    Abstract: The Fearless Steps APOLLO Community Resource provides unparalleled opportunities to explore the potential of multi-speaker team communications from NASA Apollo missions. This study focuses on discovering the characteristics that make Apollo recordings more or less intelligible to Automatic Speech Recognition (ASR) methods. We extract, for each audio recording, interpretable metadata on recordings…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 2 pages

  5. arXiv:2309.07733  [pdf, other]

    cs.CL cs.SD eess.AS

    Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features

    Authors: Eliana Pastor, Alkis Koudounas, Giuseppe Attanasio, Dirk Hovy, Elena Baralis

    Abstract: Recent advances in eXplainable AI (XAI) have provided new insights into how models for vision, language, and tabular data operate. However, few approaches exist for understanding speech models. Existing work focuses on a few spoken language understanding (SLU) tasks, and explanations are difficult to interpret for most users. We introduce a new approach to explain speech classification models. We…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: 8 pages

  6. ITALIC: An Italian Intent Classification Dataset

    Authors: Alkis Koudounas, Moreno La Quatra, Lorenzo Vaiani, Luca Colomba, Giuseppe Attanasio, Eliana Pastor, Luca Cagliero, Elena Baralis

    Abstract: Recent large-scale Spoken Language Understanding datasets focus predominantly on English and do not account for language-specific phenomena such as particular phonemes or words in different lects. We introduce ITALIC, the first large-scale speech dataset designed for intent classification in Italian. The dataset comprises 16,521 crowdsourced audio samples recorded by 70 speakers from various Itali…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: Accepted at INTERSPEECH 2023. Data and code at https://github.com/RiTA-nlp/ITALIC