
Showing 1–8 of 8 results for author: Mariotte, T

  1. arXiv:2406.16145

    cs.LG cs.AI

    Predefined Prototypes for Intra-Class Separation and Disentanglement

    Authors: Antonio Almudévar, Théo Mariotte, Alfonso Ortega, Marie Tahon, Luis Vicente, Antonio Miguel, Eduardo Lleida

    Abstract: Prototypical learning is based on the idea that there is a point (which we call a prototype) around which the embeddings of a class are clustered. It has shown promising results in scenarios with little labeled data and in the design of explainable models. Typically, prototypes are either defined as the average of the embeddings of a class or are designed to be trainable. In this work, we propose to predefi…

    Submitted 23 June, 2024; originally announced June 2024.

  2. arXiv:2406.13385

    eess.AS cs.AI cs.SD

    Explainable by-design Audio Segmentation through Non-Negative Matrix Factorization and Probing

    Authors: Martin Lebourdais, Théo Mariotte, Antonio Almudévar, Marie Tahon, Alfonso Ortega

    Abstract: Audio segmentation is a key task for many speech technologies, most of which are based on neural networks that achieve high performance but are usually considered black boxes. However, in many domains, such as health or forensics, there is a need not only for good performance but also for explanations of the output decision. Explanations derived directly from latent representations need to sati…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024, 5 pages, 2 figures, 3 tables

  3. arXiv:2406.03251

    cs.SD cs.AI eess.AS

    ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings

    Authors: Theo Mariotte, Anthony Larcher, Silvio Montresor, Jean-Hugh Thomas

    Abstract: Speaker Diarization (SD) aims at grouping speech segments that belong to the same speaker. This task is required in many speech-processing applications, such as rich meeting transcription. In this context, the audio signal is usually captured by distant microphone arrays. Beamforming, i.e., spatial filtering, is a common practice to process multi-microphone audio data. However, it often requires an expli…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, 2 tables, accepted at Interspeech 2024

  4. arXiv:2402.08312

    eess.AS cs.SD

    Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection

    Authors: Théo Mariotte, Anthony Larcher, Silvio Montrésor, Jean-Hugh Thomas

    Abstract: Voice Activity Detection (VAD) and Overlapped Speech Detection (OSD) are key pre-processing tasks for speaker diarization. In the meeting context, it is often easier to capture speech with a distant device. However, this leads to severe performance degradation. We study a unified supervised learning framework to solve distant multi-microphone joint VAD and OSD (VAD+OSD). This paper in…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: 14 pages, 5 figures, accepted at IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)

  5. arXiv:2401.09180

    cs.LG cs.CV

    Unsupervised Multiple Domain Translation through Controlled Disentanglement in Variational Autoencoder

    Authors: Antonio Almudévar, Théo Mariotte, Alfonso Ortega, Marie Tahon

    Abstract: Unsupervised Multiple Domain Translation is the task of transforming data from one domain to other domains without having paired data to train the systems. Typically, methods based on Generative Adversarial Networks (GANs) are used to address this task. However, our proposal exclusively relies on a modified version of a Variational Autoencoder. This modification consists of the use of two latent v…

    Submitted 18 January, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

  6. arXiv:2401.08268

    eess.AS cs.AI cs.LG cs.SD eess.SP

    An Explainable Proxy Model for Multilabel Audio Segmentation

    Authors: Théo Mariotte, Antonio Almudévar, Marie Tahon, Alfonso Ortega

    Abstract: Audio signal segmentation is a key task for automatic audio indexing. It consists of detecting the boundaries of class-homogeneous segments in the signal. In many applications, explainable AI is vital for transparent decision-making with machine learning. In this paper, we propose an explainable multilabel segmentation model that solves speech activity (SAD), music (MD), noise (ND),…

    Submitted 17 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

    Report number: AA001

  7. arXiv:2307.13012

    cs.SD cs.AI cs.NE eess.AS eess.SP

    Joint speech and overlap detection: a benchmark over multiple audio setups and speech domains

    Authors: Martin Lebourdais, Théo Mariotte, Marie Tahon, Anthony Larcher, Antoine Laurent, Silvio Montresor, Sylvain Meignier, Jean-Hugh Thomas

    Abstract: Voice activity and overlapped speech detection (VAD and OSD, respectively) are key pre-processing tasks for speaker diarization. The final segmentation performance relies heavily on the robustness of these sub-tasks. Recent studies have shown that VAD and OSD can be trained jointly using a multi-class classification model. However, these works are often restricted to a specific speech domain, lacking inf…

    Submitted 24 July, 2023; originally announced July 2023.

  8. arXiv:2306.04268

    cs.SD cs.CL eess.AS

    Multi-microphone Automatic Speech Segmentation in Meetings Based on Circular Harmonics Features

    Authors: Théo Mariotte, Anthony Larcher, Silvio Montrésor, Jean-Hugh Thomas

    Abstract: Speaker diarization is the task of answering "Who spoke and when?" in an audio stream. Pipeline systems rely on speech segmentation to extract speakers' segments and achieve robust speaker diarization. This paper proposes a common framework to solve three segmentation tasks in the distant speech scenario: Voice Activity Detection (VAD), Overlapped Speech Detection (OSD), and Speaker Change Detection…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: Interspeech 2023, International Speech Communication Association (ISCA), Aug 2023, Dublin, Ireland