
Showing 1–36 of 36 results for author: Dutoit, T

Searching in archive eess.
  1. arXiv:2405.02124 [pdf, other]

    eess.AS cs.AI cs.CL cs.LG

    TIPAA-SSL: Text Independent Phone-to-Audio Alignment based on Self-Supervised Learning and Knowledge Transfer

    Authors: Noé Tits, Prernna Bhatnagar, Thierry Dutoit

    Abstract: In this paper, we present a novel approach for text independent phone-to-audio alignment based on phoneme recognition, representation learning and knowledge transfer. Our method leverages a self-supervised model (wav2vec2) fine-tuned for phoneme recognition using a Connectionist Temporal Classification (CTC) loss, a dimension reduction model and a frame-level phoneme classifier trained thanks to f…

    Submitted 3 May, 2024; originally announced May 2024.

  2. arXiv:2210.16984 [pdf, other]

    cs.SD cs.LG eess.AS

    Synthesizer Preset Interpolation using Transformer Auto-Encoders

    Authors: Gwendal Le Vaillant, Thierry Dutoit

    Abstract: Sound synthesizers are widespread in modern music production but they increasingly require expert skills to be mastered. This work focuses on interpolation between presets, i.e., sets of values of all sound synthesis parameters, to enable the intuitive creation of new sounds from existing ones. We introduce a bimodal auto-encoder neural network, which simultaneously processes presets using multi…

    Submitted 9 March, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: Accepted to IEEE ICASSP 2023

  3. arXiv:2209.15085 [pdf, other]

    eess.SP

    Cardiotocography Signal Abnormality Detection based on Deep Unsupervised Models

    Authors: Julien Bertieaux, Mohammadhadi Shateri, Fabrice Labeau, Thierry Dutoit

    Abstract: Cardiotocography (CTG) is a key element when it comes to monitoring fetal well-being. Obstetricians use it to observe the fetal heart rate (FHR) and the uterine contraction (UC). The goal is to determine how the fetus reacts to the contraction and whether it is receiving adequate oxygen. If a problem occurs, the physician can then respond with an intervention. Unfortunately, the interpretation of…

    Submitted 29 September, 2022; originally announced September 2022.

  4. arXiv:2204.07162 [pdf, ps, other]

    q-bio.NC cs.AI cs.LG eess.SP

    Spatio-Temporal Analysis of Transformer based Architecture for Attention Estimation from EEG

    Authors: Victor Delvigne, Hazem Wannous, Jean-Philippe Vandeborre, Laurence Ris, Thierry Dutoit

    Abstract: For many years now, understanding the brain mechanism has been a great research subject in many different fields. Brain signal processing and especially electroencephalogram (EEG) has recently known a growing interest both in academia and industry. One of the main examples is the increasing number of Brain-Computer Interfaces (BCI) aiming to link brains and computers. In this paper, we present a n…

    Submitted 4 April, 2022; originally announced April 2022.

  5. arXiv:2201.03902 [pdf, other]

    cs.CV cs.AI eess.SP q-bio.NC

    Where Is My Mind (looking at)? Predicting Visual Attention from Brain Activity

    Authors: Victor Delvigne, Noé Tits, Luca La Fisca, Nathan Hubens, Antoine Maiorca, Hazem Wannous, Thierry Dutoit, Jean-Philippe Vandeborre

    Abstract: Visual attention estimation is an active field of research at the crossroads of different disciplines: computer vision, artificial intelligence and medicine. One of the most common approaches to estimate a saliency map representing attention is based on the observed images. In this paper, we show that visual attention can be retrieved from EEG acquisition. The results are comparable to traditional…

    Submitted 11 January, 2022; originally announced January 2022.

  6. arXiv:2103.04097 [pdf, other]

    cs.SD cs.AI cs.CL cs.HC eess.AS

    Analysis and Assessment of Controllability of an Expressive Deep Learning-based TTS system

    Authors: Noé Tits, Kevin El Haddad, Thierry Dutoit

    Abstract: In this paper, we study the controllability of an Expressive TTS system trained on a dataset for a continuous control. The dataset is the Blizzard 2013 dataset based on audiobooks read by a female speaker containing a great variability in styles and expressiveness. Controllability is evaluated with both an objective and a subjective experiment. The objective assessment is based on a measure of cor…

    Submitted 6 March, 2021; originally announced March 2021.

  7. arXiv:2008.11045 [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    ICE-Talk: an Interface for a Controllable Expressive Talking Machine

    Authors: Noé Tits, Kevin El Haddad, Thierry Dutoit

    Abstract: ICE-Talk is an open source web-based GUI that allows the use of a TTS system with controllable parameters via a text field and a clickable 2D plot. It enables the study of latent spaces for controllable TTS. Moreover it is implemented as a module that can be used as part of a Human-Agent interaction.

    Submitted 25 August, 2020; originally announced August 2020.

  8. arXiv:2008.09483 [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Laughter Synthesis: Combining Seq2seq modeling with Transfer Learning

    Authors: Noé Tits, Kevin El Haddad, Thierry Dutoit

    Abstract: Despite the growing interest for expressive speech synthesis, synthesis of nonverbal expressions is an under-explored area. In this paper we propose an audio laughter synthesis system based on a sequence-to-sequence TTS synthesis system. We leverage transfer learning by training a deep learning model to learn to generate both speech and laughs from annotations. We evaluate our model with a listeni…

    Submitted 20 August, 2020; originally announced August 2020.

  9. arXiv:2006.04142 [pdf, other]

    eess.AS cs.CL cs.SD

    Parametric Representation for Singing Voice Synthesis: a Comparative Evaluation

    Authors: Onur Babacan, Thomas Drugman, Tuomo Raitio, Daniel Erro, Thierry Dutoit

    Abstract: Various parametric representations have been proposed to model the speech signal. While the performance of such vocoders is well-known in the context of speech processing, their extrapolation to singing voice synthesis might not be straightforward. The goal of this paper is twofold. First, a comparative subjective evaluation is performed across four existing techniques suitable for statistical par…

    Submitted 7 June, 2020; originally announced June 2020.

  10. arXiv:2006.04136 [pdf, ps, other]

    eess.AS cs.CL

    Analysis and Synthesis of Hypo and Hyperarticulated Speech

    Authors: Benjamin Picart, Thomas Drugman, Thierry Dutoit

    Abstract: This paper focuses on the analysis and synthesis of hypo and hyperarticulated speech in the framework of HMM-based speech synthesis. First of all, a new French database matching our needs was created, which contains three identical sets, pronounced with three different degrees of articulation: neutral, hypo and hyperarticulated speech. On that basis, acoustic and phonetic analyses were performed.…

    Submitted 7 June, 2020; originally announced June 2020.

  11. arXiv:2005.11682 [pdf, other]

    eess.AS cs.CL cs.SD

    Glottal source estimation robustness: A comparison of sensitivity of voice source estimation techniques

    Authors: Thomas Drugman, Thomas Dubuisson, Alexis Moinet, Nicolas D'Alessandro, Thierry Dutoit

    Abstract: This paper addresses the problem of estimating the voice source directly from speech waveforms. A novel principle based on Anticausality Dominated Regions (ACDR) is used to estimate the glottal open phase. This technique is compared to two other state-of-the-art well-known methods, namely the Zeros of the Z-Transform (ZZT) and the Iterative Adaptive Inverse Filtering (IAIF) algorithms. Decompositi…

    Submitted 24 May, 2020; originally announced May 2020.

  12. arXiv:2005.07901 [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Oscillating Statistical Moments for Speech Polarity Detection

    Authors: Thomas Drugman, Thierry Dutoit

    Abstract: An inversion of the speech polarity may have a dramatic detrimental effect on the performance of various techniques of speech processing. An automatic method for determining the speech polarity (which is dependent upon the recording setup) is thus required as a preliminary step for ensuring the well-behaviour of such techniques. This paper proposes a new approach of polarity detection relying on o…

    Submitted 16 May, 2020; originally announced May 2020.

  13. arXiv:2005.07897 [pdf, other]

    cs.SD cs.CL eess.AS

    Glottal Source Estimation using an Automatic Chirp Decomposition

    Authors: Thomas Drugman, Baris Bozkurt, Thierry Dutoit

    Abstract: In a previous work, we showed that the glottal source can be estimated from speech signals by computing the Zeros of the Z-Transform (ZZT). Decomposition was achieved by separating the roots inside (causal contribution) and outside (anticausal contribution) the unit circle. In order to guarantee a correct deconvolution, time alignment on the Glottal Closure Instants (GCIs) was shown to be essentia…

    Submitted 16 May, 2020; originally announced May 2020.

  14. arXiv:2005.05313 [pdf, other]

    eess.AS cs.SD

    Audio and Contact Microphones for Cough Detection

    Authors: Thomas Drugman, Jerome Urbain, Nathalie Bauwens, Ricardo Chessini, Anne-Sophie Aubriot, Patrick Lebecque, Thierry Dutoit

    Abstract: In the framework of assessing the pathology severity in chronic cough diseases, medical literature underlines the lack of tools for allowing the automatic, objective and reliable detection of cough events. This paper describes a system based on two microphones which we developed for this purpose. The proposed approach relies on a large variety of audio descriptors, an efficient algorithm of featur…

    Submitted 10 May, 2020; originally announced May 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:2001.00537

  15. arXiv:2005.04724 [pdf, other]

    cs.SD cs.CL eess.AS

    Chirp Complex Cepstrum-based Decomposition for Asynchronous Glottal Analysis

    Authors: Thomas Drugman, Thierry Dutoit

    Abstract: It was recently shown that complex cepstrum can be effectively used for glottal flow estimation by separating the causal and anticausal components of speech. In order to guarantee a correct estimation, some constraints on the window have been derived. Among these, the window has to be synchronized on a Glottal Closure Instant. This paper proposes an extension of the complex cepstrum-based decompos…

    Submitted 10 May, 2020; originally announced May 2020.

  16. arXiv:2001.01000 [pdf, ps, other]

    cs.SD cs.CL eess.AS

    The Deterministic plus Stochastic Model of the Residual Signal and its Applications

    Authors: Thomas Drugman, Thierry Dutoit

    Abstract: The modeling of speech production often relies on a source-filter approach. Although methods parameterizing the filter have nowadays reached a certain maturity, there is still a lot to be gained for several speech processing applications in finding an appropriate excitation model. This manuscript presents a Deterministic plus Stochastic Model (DSM) of the residual signal. The DSM consists of two c…

    Submitted 29 December, 2019; originally announced January 2020.

  17. arXiv:2001.00842 [pdf, other]

    cs.SD cs.CL eess.AS

    A Deterministic plus Stochastic Model of the Residual Signal for Improved Parametric Speech Synthesis

    Authors: Thomas Drugman, Geoffrey Wilfart, Thierry Dutoit

    Abstract: Speech generated by parametric synthesizers generally suffers from a typical buzziness, similar to what was encountered in old LPC-like vocoders. In order to alleviate this problem, a more suited modeling of the excitation should be adopted. For this, we hereby propose an adaptation of the Deterministic plus Stochastic Model (DSM) for the residual. In this model, the excitation is divided into two…

    Submitted 29 December, 2019; originally announced January 2020.

  18. arXiv:2001.00841 [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Glottal Closure and Opening Instant Detection from Speech Signals

    Authors: Thomas Drugman, Thierry Dutoit

    Abstract: This paper proposes a new procedure to detect Glottal Closure and Opening Instants (GCIs and GOIs) directly from speech waveforms. The procedure is divided into two successive steps. First a mean-based signal is computed, and intervals where speech events are expected to occur are extracted from it. Secondly, at each interval a precise position of the speech event is assigned by locating a discont…

    Submitted 28 December, 2019; originally announced January 2020.

  19. arXiv:2001.00840 [pdf, other]

    cs.SD cs.CL eess.AS

    A Comparative Study of Glottal Source Estimation Techniques

    Authors: Thomas Drugman, Baris Bozkurt, Thierry Dutoit

    Abstract: Source-tract decomposition (or glottal flow estimation) is one of the basic problems of speech processing. For this, several techniques have been proposed in the literature. However studies comparing different approaches are almost nonexistent. Besides, experiments have been systematically performed either on synthetic speech or on sustained vowels. In this study we compare three of the main repre…

    Submitted 28 December, 2019; originally announced January 2020.

  20. arXiv:2001.00583 [pdf, other]

    cs.SD cs.CL eess.AS

    On the Mutual Information between Source and Filter Contributions for Voice Pathology Detection

    Authors: Thomas Drugman, Thomas Dubuisson, Thierry Dutoit

    Abstract: This paper addresses the problem of automatic detection of voice pathologies directly from the speech signal. For this, we investigate the use of the glottal source estimation as a means to detect voice disorders. Three sets of features are proposed, depending on whether they are related to the speech or the glottal signal, or to prosody. The relevancy of these features is assessed through mutual…

    Submitted 2 January, 2020; originally announced January 2020.

  21. arXiv:2001.00582 [pdf, other]

    cs.SD cs.CL eess.AS

    Excitation-based Voice Quality Analysis and Modification

    Authors: Thomas Drugman, Thierry Dutoit, Baris Bozkurt

    Abstract: This paper investigates the differences occurring in the excitation for different voice qualities. Its goal is two-fold. First a large corpus containing three voice qualities (modal, soft and loud) uttered by the same speaker is analyzed and significant differences in characteristics extracted from the excitation are observed. Secondly rules of modification derived from the analysis are used to bui…

    Submitted 2 January, 2020; originally announced January 2020.

  22. arXiv:2001.00581 [pdf, other]

    cs.SD cs.CL eess.AS

    Eigenresiduals for improved Parametric Speech Synthesis

    Authors: Thomas Drugman, Geoffrey Wilfart, Thierry Dutoit

    Abstract: Statistical parametric speech synthesizers have recently shown their ability to produce natural-sounding and flexible voices. Unfortunately the delivered quality suffers from a typical buzziness due to the fact that speech is vocoded. This paper proposes a new excitation model in order to reduce this undesirable effect. This model is based on the decomposition of pitch-synchronous residual frames…

    Submitted 2 January, 2020; originally announced January 2020.

  23. arXiv:2001.00580 [pdf, ps, other]

    cs.SD cs.HC eess.AS

    Assessment of Audio Features for Automatic Cough Detection

    Authors: Thomas Drugman, Jerome Urbain, Thierry Dutoit

    Abstract: This paper addresses the issue of cough detection using only audio recordings, with the ultimate goal of quantifying and qualifying the degree of pathology for patients suffering from respiratory diseases, notably mucoviscidosis. A large set of audio features describing various aspects of the audio signal is proposed. These features are assessed in two steps. First, their intrinsic potential and re…

    Submitted 2 January, 2020; originally announced January 2020.

  24. arXiv:2001.00579 [pdf, other]

    cs.SD cs.CL eess.AS

    A Comparative Evaluation of Pitch Modification Techniques

    Authors: Thomas Drugman, Thierry Dutoit

    Abstract: This paper addresses the problem of pitch modification, as an important module for an efficient voice transformation system. The Deterministic plus Stochastic Model of the residual signal we proposed in a previous work is compared to TDPSOLA, HNM and STRAIGHT. The four methods are compared through an important subjective test. The influence of the speaker gender and of the pitch modification ratio…

    Submitted 2 January, 2020; originally announced January 2020.

  25. arXiv:2001.00537 [pdf, other]

    physics.med-ph cs.SD eess.AS

    Objective Study of Sensor Relevance for Automatic Cough Detection

    Authors: Thomas Drugman, Jerome Urbain, Nathalie Bauwens, Ricardo Chessini, Carlos Valderrama, Patrick Lebecque, Thierry Dutoit

    Abstract: The development of a system for the automatic, objective and reliable detection of cough events is a need underlined by the medical literature for years. The benefit of such a tool is clear as it would allow the assessment of pathology severity in chronic cough diseases. Even though some approaches have recently reported solutions achieving this task with a relative success, there is still no stan…

    Submitted 30 December, 2019; originally announced January 2020.

  26. arXiv:2001.00473 [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Detection of Glottal Closure Instants from Speech Signals: a Quantitative Review

    Authors: Thomas Drugman, Mark Thomas, Jon Gudnason, Patrick Naylor, Thierry Dutoit

    Abstract: The pseudo-periodicity of voiced speech can be exploited in several speech processing applications. This requires however that the precise locations of the Glottal Closure Instants (GCIs) are available. The focus of this paper is the evaluation of automatic methods for the detection of GCIs directly from the speech waveform. Five state-of-the-art GCI detection algorithms are compared using six dif…

    Submitted 28 December, 2019; originally announced January 2020.

  27. arXiv:2001.00372 [pdf, other]

    cs.SD cs.CL eess.AS

    Phase-based Information for Voice Pathology Detection

    Authors: Thomas Drugman, Thomas Dubuisson, Thierry Dutoit

    Abstract: In most current approaches of speech processing, information is extracted from the magnitude spectrum. However recent perceptual studies have underlined the importance of the phase component. The goal of this paper is to investigate the potential of using phase-based features for automatically detecting voice disorders. It is shown that group delay functions are appropriate for characterizing irre…

    Submitted 2 January, 2020; originally announced January 2020.

  28. arXiv:1912.12887 [pdf, other]

    cs.SD cs.CL eess.AS

    Using a Pitch-Synchronous Residual Codebook for Hybrid HMM/Frame Selection Speech Synthesis

    Authors: Thomas Drugman, Alexis Moinet, Thierry Dutoit, Geoffrey Wilfart

    Abstract: This paper proposes a method to improve the quality delivered by statistical parametric speech synthesizers. For this, we use a codebook of pitch-synchronous residual frames, so as to construct a more realistic source signal. First a limited codebook of typical excitations is built from some training database. During the synthesis part, HMMs are used to generate filter and source coefficients. The…

    Submitted 30 December, 2019; originally announced December 2019.

  29. arXiv:1912.12843 [pdf, other]

    cs.SD cs.CL eess.AS

    Causal-Anticausal Decomposition of Speech using Complex Cepstrum for Glottal Source Estimation

    Authors: Thomas Drugman, Baris Bozkurt, Thierry Dutoit

    Abstract: Complex cepstrum is known in the literature for linearly separating causal and anticausal components. Relying on advances achieved by the Zeros of the Z-Transform (ZZT) technique, we here investigate the possibility of using complex cepstrum for glottal flow estimation on a large-scale database. Via a systematic study of the windowing effects on the deconvolution quality, we show that the complex…

    Submitted 30 December, 2019; originally announced December 2019.

  30. arXiv:1912.12609 [pdf, other]

    cs.SD eess.AS

    A Comparative Study of Pitch Extraction Algorithms on a Large Variety of Singing Sounds

    Authors: Onur Babacan, Thomas Drugman, Nicolas d'Alessandro, Nathalie Henrich, Thierry Dutoit

    Abstract: The problem of pitch tracking has been extensively studied in the speech research community. The goal of this paper is to investigate how these techniques should be adapted to singing voice analysis, and to provide a comparative evaluation of the most representative state-of-the-art approaches. This study is carried out on a large database of annotated singing sounds with aligned EGG recordings, c…

    Submitted 29 December, 2019; originally announced December 2019.

  31. arXiv:1912.12602 [pdf, other]

    cs.SD cs.CL eess.AS

    Complex Cepstrum-based Decomposition of Speech for Glottal Source Estimation

    Authors: Thomas Drugman, Baris Bozkurt, Thierry Dutoit

    Abstract: Homomorphic analysis is a well-known method for the separation of non-linearly combined signals. More particularly, the use of complex cepstrum for source-tract deconvolution has been discussed in various articles. However there exists no study which proposes a glottal flow estimation methodology based on cepstrum and reports effective results. In this paper, we show that complex cepstrum can be e…

    Submitted 29 December, 2019; originally announced December 2019.

  32. arXiv:1910.06234 [pdf, other]

    eess.AS cs.HC cs.LG cs.SD

    The Theory behind Controllable Expressive Speech Synthesis: a Cross-disciplinary Approach

    Authors: Noé Tits, Kevin El Haddad, Thierry Dutoit

    Abstract: As part of the Human-Computer Interaction field, Expressive speech synthesis is a very rich domain as it requires knowledge in areas such as machine learning, signal processing, sociology, psychology. In this Chapter, we will focus mostly on the technical side. From the recording of expressive speech to its modeling, the reader will have an overview of the main paradigms used in this field, throug…

    Submitted 14 October, 2019; originally announced October 2019.

    Comments: 19 pages, 6 figures. To be published in the book "Human Computer Interaction" edited by Prof. Yves Rybarczyk, published by IntechOpen

  33. arXiv:1903.11570 [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis

    Authors: Noé Tits, Fengna Wang, Kevin El Haddad, Vincent Pagel, Thierry Dutoit

    Abstract: The field of Text-to-Speech has experienced huge improvements last years benefiting from deep learning techniques. Producing realistic speech becomes possible now. As a consequence, the research on the control of the expressiveness, allowing to generate speech in different styles or manners, has attracted increasing attention lately. Systems able to control style have been developed and show impre…

    Submitted 27 March, 2019; originally announced March 2019.

  34. arXiv:1901.04276 [pdf, other]

    cs.SD cs.CL eess.AS

    Exploring Transfer Learning for Low Resource Emotional TTS

    Authors: Noé Tits, Kevin El Haddad, Thierry Dutoit

    Abstract: During the last few years, spoken language technologies have known a big improvement thanks to Deep Learning. However Deep Learning-based algorithms require amounts of data that are often difficult and costly to gather. Particularly, modeling the variability in speech of different speakers, different styles or different emotions with few data remains challenging. In this paper, we investigate how…

    Submitted 14 January, 2019; originally announced January 2019.

    Comments: Accepted at IntelliSys 2019

  35. arXiv:1806.09514 [pdf, ps, other]

    cs.CL cs.AI eess.AS

    The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems

    Authors: Adaeze Adigwe, Noé Tits, Kevin El Haddad, Sarah Ostadabbas, Thierry Dutoit

    Abstract: In this paper, we present a database of emotional speech intended to be open-sourced and used for synthesis and generation purpose. It contains data for male and female actors in English and a male actor in French. The database covers 5 emotion classes so it could be suitable to build synthesis and voice transformation systems with the potential to control the emotional dimension in a continuous w…

    Submitted 25 June, 2018; originally announced June 2018.

    Comments: Submitted to SLSP 2018

  36. arXiv:1805.09197 [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    ASR-based Features for Emotion Recognition: A Transfer Learning Approach

    Authors: Noé Tits, Kevin El Haddad, Thierry Dutoit

    Abstract: During the last decade, the applications of signal processing have drastically improved with deep learning. However areas of affective computing such as emotional speech synthesis or emotion recognition from spoken language remain challenging. In this paper, we investigate the use of a neural Automatic Speech Recognition (ASR) as a feature extractor for emotion recognition. We show that these fea…

    Submitted 1 June, 2018; v1 submitted 23 May, 2018; originally announced May 2018.

    Comments: Accepted to be published in the First Workshop on Computational Modeling of Human Multimodal Language - ACL 2018
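Several glottal-analysis entries above (entry 13, and the ZZT baseline named in entries 11 and 29) rely on the Zeros of the Z-Transform. In that approach, the zeros of a GCI-aligned speech frame are split into those inside the unit circle (causal contribution) and those outside it (anticausal contribution). What follows is a minimal NumPy sketch of that root-splitting step only, not the authors' implementation. It assumes the frame has already been windowed and centred on a glottal closure instant, and it does not show that alignment. The helper names `zzt_split` and `zeros_magnitude_db` are hypothetical.

```python
import numpy as np


def zzt_split(frame):
    """Split the zeros of a frame's z-transform by the unit circle.

    Writes X(z) = g * z^-(N-1) * prod_m (z - Z_m); np.roots applied to the
    sample sequence returns the zeros Z_m. Assumes a real 1-D frame that is
    already windowed and aligned on a glottal closure instant (hypothetical
    preprocessing, not shown). Returns (causal_zeros, anticausal_zeros, gain).
    """
    x = np.asarray(frame, dtype=float)
    zeros = np.roots(x)
    gain = x[np.flatnonzero(x)[0]]       # leading nonzero sample
    inside = np.abs(zeros) < 1.0         # |Z| < 1: causal; |Z| >= 1: anticausal
    return zeros[inside], zeros[~inside], gain


def zeros_magnitude_db(zeros, n_fft=512):
    """Magnitude (dB) of prod_m (e^{jw} - Z_m) on the rfft grid, w in [0, pi].

    Summing log-distances avoids overflow/underflow of the raw product
    when a frame has hundreds of zeros.
    """
    w = np.linspace(0.0, np.pi, n_fft // 2 + 1)
    dist = np.abs(np.exp(1j * w)[:, None] - np.asarray(zeros)[None, :])
    return 20.0 * np.log10(dist).sum(axis=1)


if __name__ == "__main__":
    # Toy frame with zeros at 0.5 (causal) and 2.0 (anticausal) by construction.
    frame = np.convolve([1.0, -0.5], [1.0, -2.0])
    causal, anticausal, gain = zzt_split(frame)
    anticausal_db = zeros_magnitude_db(anticausal, n_fft=16)
    causal_db = zeros_magnitude_db(causal, n_fft=16)
    # Sanity check: both parts plus the gain recover the frame's magnitude spectrum.
    full_db = 20.0 * np.log10(np.abs(np.fft.rfft(frame, 16)))
    print(np.allclose(anticausal_db + causal_db + 20.0 * np.log10(abs(gain)), full_db))  # expected: True
```

Zeros that fall exactly on the unit circle are assigned to the anticausal set here. That tie-break is an arbitrary choice of this sketch and is not taken from the papers.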