
Showing 1–13 of 13 results for author: Tits, N

  1. arXiv:2405.02124

    eess.AS cs.AI cs.CL cs.LG

    TIPAA-SSL: Text Independent Phone-to-Audio Alignment based on Self-Supervised Learning and Knowledge Transfer

    Authors: Noé Tits, Prernna Bhatnagar, Thierry Dutoit

    Abstract: In this paper, we present a novel approach for text-independent phone-to-audio alignment based on phoneme recognition, representation learning and knowledge transfer. Our method leverages a self-supervised model (wav2vec2) fine-tuned for phoneme recognition using a Connectionist Temporal Classification (CTC) loss, a dimension reduction model and a frame-level phoneme classifier trained thanks to f…

    Submitted 3 May, 2024; originally announced May 2024.

  2. arXiv:2310.11541

    cs.CL cs.AI cs.LG eess.AS

    MUST&P-SRL: Multi-lingual and Unified Syllabification in Text and Phonetic Domains for Speech Representation Learning

    Authors: Noé Tits

    Abstract: In this paper, we present a methodology for linguistic feature extraction, focusing particularly on automatically syllabifying words in multiple languages, designed to be compatible with a forced-alignment tool, the Montreal Forced Aligner (MFA). In both the textual and phonetic domains, our method focuses on the extraction of phonetic transcriptions from text, stress marks, and a unified aut…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: Accepted for publication at EMNLP 2023

  3. arXiv:2307.02051

    eess.AS cs.AI cs.CL cs.HC cs.SD

    Flowchase: a Mobile Application for Pronunciation Training

    Authors: Noé Tits, Zoé Broisson

    Abstract: In this paper, we present a solution for providing personalized and instant feedback to English learners through a mobile application, called Flowchase, that is connected to speech technology able to segment and analyze segmental and supra-segmental speech features. The speech processing pipeline receives linguistic information corresponding to an utterance to analyze along with a speech sample.…

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: Interspeech 2023 - SLaTE workshop 2023 (Speech and Language Technology in Education)

  4. arXiv:2201.03902

    cs.CV cs.AI eess.SP q-bio.NC

    Where Is My Mind (looking at)? Predicting Visual Attention from Brain Activity

    Authors: Victor Delvigne, Noé Tits, Luca La Fisca, Nathan Hubens, Antoine Maiorca, Hazem Wannous, Thierry Dutoit, Jean-Philippe Vandeborre

    Abstract: Visual attention estimation is an active field of research at the crossroads of different disciplines: computer vision, artificial intelligence and medicine. One of the most common approaches to estimate a saliency map representing attention is based on the observed images. In this paper, we show that visual attention can be retrieved from EEG acquisition. The results are comparable to traditional…

    Submitted 11 January, 2022; originally announced January 2022.

  5. arXiv:2103.04097

    cs.SD cs.AI cs.CL cs.HC eess.AS

    Analysis and Assessment of Controllability of an Expressive Deep Learning-based TTS system

    Authors: Noé Tits, Kevin El Haddad, Thierry Dutoit

    Abstract: In this paper, we study the controllability of an Expressive TTS system trained on a dataset for continuous control. The dataset is the Blizzard 2013 dataset, based on audiobooks read by a female speaker and containing great variability in styles and expressiveness. Controllability is evaluated with both an objective and a subjective experiment. The objective assessment is based on a measure of cor…

    Submitted 6 March, 2021; originally announced March 2021.

  6. arXiv:2008.11045

    eess.AS cs.CL cs.LG cs.SD

    ICE-Talk: an Interface for a Controllable Expressive Talking Machine

    Authors: Noé Tits, Kevin El Haddad, Thierry Dutoit

    Abstract: ICE-Talk is an open-source web-based GUI that allows the use of a TTS system with controllable parameters via a text field and a clickable 2D plot. It enables the study of latent spaces for controllable TTS. Moreover, it is implemented as a module that can be used as part of a Human-Agent interaction.

    Submitted 25 August, 2020; originally announced August 2020.

  7. arXiv:2008.09483

    eess.AS cs.CL cs.LG cs.SD

    Laughter Synthesis: Combining Seq2seq modeling with Transfer Learning

    Authors: Noé Tits, Kevin El Haddad, Thierry Dutoit

    Abstract: Despite the growing interest in expressive speech synthesis, synthesis of nonverbal expressions is an under-explored area. In this paper, we propose an audio laughter synthesis system based on a sequence-to-sequence TTS synthesis system. We leverage transfer learning by training a deep learning model to learn to generate both speech and laughs from annotations. We evaluate our model with a listeni…

    Submitted 20 August, 2020; originally announced August 2020.

  8. arXiv:1910.06234

    eess.AS cs.HC cs.LG cs.SD

    The Theory behind Controllable Expressive Speech Synthesis: a Cross-disciplinary Approach

    Authors: Noé Tits, Kevin El Haddad, Thierry Dutoit

    Abstract: As part of the Human-Computer Interaction field, expressive speech synthesis is a very rich domain, as it requires knowledge in areas such as machine learning, signal processing, sociology, and psychology. In this chapter, we will focus mostly on the technical side. From the recording of expressive speech to its modeling, the reader will have an overview of the main paradigms used in this field, throug…

    Submitted 14 October, 2019; originally announced October 2019.

    Comments: 19 pages, 6 figures. To be published in the book "Human Computer Interaction" edited by Prof. Yves Rybarczyk, published by IntechOpen

  9. arXiv:1907.02784

    eess.AS cs.CL cs.LG cs.SD

    A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach

    Authors: Noé Tits

    Abstract: In this project, we aim to build a Text-to-Speech system able to produce speech with a controllable emotional expressiveness. We propose a methodology for solving this problem in three main steps. The first is the collection of emotional speech data. We discuss the various formats of existing datasets and their usability in speech generation. The second step is the development of a system to autom…

    Submitted 5 July, 2019; originally announced July 2019.

  10. arXiv:1903.11570

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis

    Authors: Noé Tits, Fengna Wang, Kevin El Haddad, Vincent Pagel, Thierry Dutoit

    Abstract: The field of Text-to-Speech has improved greatly in recent years thanks to deep learning techniques. Producing realistic speech is now possible. As a consequence, research on controlling expressiveness, which allows speech to be generated in different styles or manners, has attracted increasing attention lately. Systems able to control style have been developed and show impre…

    Submitted 27 March, 2019; originally announced March 2019.

  11. arXiv:1901.04276

    cs.SD cs.CL eess.AS

    Exploring Transfer Learning for Low Resource Emotional TTS

    Authors: Noé Tits, Kevin El Haddad, Thierry Dutoit

    Abstract: During the last few years, spoken language technologies have improved considerably thanks to Deep Learning. However, Deep Learning-based algorithms require amounts of data that are often difficult and costly to gather. In particular, modeling the variability in speech of different speakers, different styles or different emotions with little data remains challenging. In this paper, we investigate how…

    Submitted 14 January, 2019; originally announced January 2019.

    Comments: Accepted at IntelliSys 2019

  12. arXiv:1806.09514

    cs.CL cs.AI eess.AS

    The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems

    Authors: Adaeze Adigwe, Noé Tits, Kevin El Haddad, Sarah Ostadabbas, Thierry Dutoit

    Abstract: In this paper, we present a database of emotional speech intended to be open-sourced and used for synthesis and generation purposes. It contains data for male and female actors in English and a male actor in French. The database covers 5 emotion classes, so it could be suitable to build synthesis and voice transformation systems with the potential to control the emotional dimension in a continuous w…

    Submitted 25 June, 2018; originally announced June 2018.

    Comments: Submitted to SLSP 2018

  13. arXiv:1805.09197

    eess.AS cs.AI cs.CL cs.SD

    ASR-based Features for Emotion Recognition: A Transfer Learning Approach

    Authors: Noé Tits, Kevin El Haddad, Thierry Dutoit

    Abstract: During the last decade, the applications of signal processing have drastically improved with deep learning. However, areas of affective computing such as emotional speech synthesis or emotion recognition from spoken language remain challenging. In this paper, we investigate the use of a neural Automatic Speech Recognition (ASR) system as a feature extractor for emotion recognition. We show that these fea…

    Submitted 1 June, 2018; v1 submitted 23 May, 2018; originally announced May 2018.

    Comments: Accepted to be published in the First Workshop on Computational Modeling of Human Multimodal Language - ACL 2018