Showing 1–9 of 9 results for author: Tahon, M

  1. arXiv:2406.13385

    eess.AS cs.AI cs.SD

    Explainable by-design Audio Segmentation through Non-Negative Matrix Factorization and Probing

    Authors: Martin Lebourdais, Théo Mariotte, Antonio Almudévar, Marie Tahon, Alfonso Ortega

    Abstract: Audio segmentation is a key task for many speech technologies, most of which are based on neural networks, usually considered as black boxes, with high-level performances. However, in many domains, among which health or forensics, there is not only a need for good performance but also for explanations about the output decision. Explanations derived directly from latent representations need to sati…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024, 5 pages, 2 figures, 3 tables

  2. arXiv:2406.10073

    eess.AS cs.CL cs.HC cs.SD

    Detecting the terminality of speech-turn boundary for spoken interactions in French TV and Radio content

    Authors: Rémi Uro, Marie Tahon, David Doukhan, Antoine Laurent, Albert Rilliard

    Abstract: Transition Relevance Places are defined as the end of an utterance where the interlocutor may take the floor without interrupting the current speaker --i.e., a place where the turn is terminal. Analyzing turn terminality is useful to study the dynamic of turn-taking in spontaneous conversations. This paper presents an automatic classification of spoken utterances as Terminal or Non-Terminal in mul…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Keywords: Spoken interaction, Media, TV, Radio, Transition-Relevance Places, Turn Taking, Interruption. Accepted to Interspeech 2024, Kos Island, Greece

  3. arXiv:2404.17552

    eess.AS cs.CL cs.DL cs.LG cs.SD

    A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification

    Authors: Rémi Uro, David Doukhan, Albert Rilliard, Laëtitia Larcher, Anissa-Claire Adgharouamane, Marie Tahon, Antoine Laurent

    Abstract: This paper presents a semi-automatic approach to create a diachronic corpus of voices balanced for speaker's age, gender, and recording period, according to 32 categories (2 genders, 4 age ranges and 4 recording periods). Corpora were selected at French National Institute of Audiovisual (INA) to obtain at least 30 speakers per category (a total of 960 speakers; only 874 have been found yet). For eac…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Keywords: semi-automatic processing, corpus creation, diarization, speaker identification, gender-balanced, age-balanced, speaker corpus, diachrony

    Journal ref: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pages 3271-3280, Marseille, 20-25 June 2022. European Language Resources Association (ELRA)

  4. arXiv:2401.08268

    eess.AS cs.AI cs.LG cs.SD eess.SP

    An Explainable Proxy Model for Multilabel Audio Segmentation

    Authors: Théo Mariotte, Antonio Almudévar, Marie Tahon, Alfonso Ortega

    Abstract: Audio signal segmentation is a key task for automatic audio indexing. It consists of detecting the boundaries of class-homogeneous segments in the signal. In many applications, explainable AI is a vital process for transparency of decision-making with machine learning. In this paper, we propose an explainable multilabel segmentation model that solves speech activity (SAD), music (MD), noise (ND),…

    Submitted 17 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

    Report number: AA001

  5. arXiv:2310.04481

    eess.AS cs.AI cs.CL cs.LG

    Acoustic and linguistic representations for speech continuous emotion recognition in call center conversations

    Authors: Manon Macary, Marie Tahon, Yannick Estève, Daniel Luzzati

    Abstract: The goal of our research is to automatically retrieve the satisfaction and the frustration in real-life call-center conversations. This study focuses on an industrial application in which the customer satisfaction is continuously tracked down to improve customer services. To compensate for the lack of large annotated emotional databases, we explore the use of pre-trained speech representations as a form…

    Submitted 6 October, 2023; originally announced October 2023.

    ACM Class: I.2.7

  6. arXiv:2307.13012

    cs.SD cs.AI cs.NE eess.AS eess.SP

    Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains

    Authors: Martin Lebourdais, Théo Mariotte, Marie Tahon, Anthony Larcher, Antoine Laurent, Silvio Montresor, Sylvain Meignier, Jean-Hugh Thomas

    Abstract: Voice activity and overlapped speech detection (respectively VAD and OSD) are key pre-processing tasks for speaker diarization. The final segmentation performance highly relies on the robustness of these sub-tasks. Recent studies have shown VAD and OSD can be trained jointly using a multi-class classification model. However, these works are often restricted to a specific speech domain, lacking inf…

    Submitted 24 July, 2023; originally announced July 2023.

  7. arXiv:2305.01759

    eess.AS cs.AI cs.CL

    Evaluation of Speaker Anonymization on Emotional Speech

    Authors: Hubert Nourtel, Pierre Champion, Denis Jouvet, Anthony Larcher, Marie Tahon

    Abstract: Speech data carries a range of personal information, such as the speaker's identity and emotional state. These attributes can be used for malicious purposes. With the development of virtual assistants, a new generation of privacy threats has emerged. Current studies have addressed the topic of preserving speech privacy. One of them, the VoicePrivacy initiative aims to promote the development of pr…

    Submitted 15 April, 2023; originally announced May 2023.

    Journal ref: Proc. 2021 ISCA Symposium on Security and Privacy in Speech Communication (62-66)

  8. arXiv:2210.07642

    cs.SD cs.CL cs.HC cs.LG eess.AS

    Training speech emotion classifier without categorical annotations

    Authors: Meysam Shamsi, Marie Tahon

    Abstract: There are two paradigms of emotion representation, categorical labeling and dimensional description in continuous space. Therefore, the emotion recognition task can be treated as a classification or regression. The main aim of this study is to investigate the relation between these two representations and propose a classification pipeline that uses only dimensional annotation. The proposed approac…

    Submitted 14 October, 2022; originally announced October 2022.

  9. arXiv:2209.04167

    cs.SD cs.AI eess.AS

    Overlapped speech and gender detection with WavLM pre-trained features

    Authors: Martin Lebourdais, Marie Tahon, Antoine Laurent, Sylvain Meignier

    Abstract: This article focuses on overlapped speech and gender detection in order to study interactions between women and men in French audiovisual media (Gender Equality Monitoring project). In this application context, we need to automatically segment the speech signal according to speakers' gender, and to identify when at least two speakers speak at the same time. We propose to use WavLM model which has t…

    Submitted 9 September, 2022; originally announced September 2022.

    Comments: Submitted and accepted to Interspeech 2022