Skip to main content

Showing 1–27 of 27 results for author: Tomashenko, N

Searching in archive cs. Search in all archives.
.
  1. arXiv:2404.02677  [pdf, other

    eess.AS cs.CL cs.CR

    The VoicePrivacy 2024 Challenge Evaluation Plan

    Authors: Natalia Tomashenko, Xiaoxiao Miao, Pierre Champion, Sarina Meyer, Xin Wang, Emmanuel Vincent, Michele Panariello, Nicholas Evans, Junichi Yamagishi, Massimiliano Todisco

    Abstract: The task of the challenge is to develop a voice anonymization system for speech data which conceals the speaker's voice identity while protecting linguistic content and emotional states. The organizers provide development and evaluation datasets and evaluation scripts, as well as baseline anonymization systems and a list of training resources formed on the basis of the participants' requests. Part… ▽ More

    Submitted 12 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: 19 pages, https://www.voiceprivacychallenge.org/. arXiv admin note: substantial text overlap with arXiv:2203.12468

  2. arXiv:2309.05472  [pdf, other

    cs.CL cs.AI cs.SD eess.AS

    LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

    Authors: Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

    Abstract: Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0 an open-source framework for assessing and building SSL-… ▽ More

    Submitted 18 March, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Published in Computer Science and Language. Preprint allowed

  3. arXiv:2305.18823  [pdf, other

    cs.SD eess.AS

    Speaker anonymization using orthogonal Householder neural network

    Authors: Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko

    Abstract: Speaker anonymization aims to conceal a speaker's identity while preserving content information in speech. Current mainstream neural-network speaker anonymization systems disentangle speech into prosody-related, content, and speaker representations. The speaker representation is then anonymized by a selection-based speaker anonymizer that uses a mean vector over a set of randomly selected speaker… ▽ More

    Submitted 12 September, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

  4. arXiv:2302.10790  [pdf, other

    eess.AS cs.LG cs.SD

    Federated Learning for ASR based on Wav2vec 2.0

    Authors: Tuan Nguyen, Salima Mdhaffar, Natalia Tomashenko, Jean-François Bonastre, Yannick Estève

    Abstract: This paper presents a study on the use of federated learning to train an ASR model based on a wav2vec 2.0 model pre-trained by self supervision. Carried out on the well-known TED-LIUM 3 dataset, our experiments show that such a model can obtain, with no use of a language model, a word error rate of 10.92% on the official TED-LIUM 3 test set, without sharing any data from the different users. We al… ▽ More

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: 5 pages, accepted in ICASSP 2023

  5. arXiv:2205.07123  [pdf, other

    cs.CL cs.CR eess.AS

    The VoicePrivacy 2020 Challenge Evaluation Plan

    Authors: Natalia Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco

    Abstract: The VoicePrivacy Challenge aims to promote the development of privacy preservation tools for speech technology by gathering a new community to define the tasks of interest and the evaluation methodology, and benchmarking solutions through a series of challenges. In this document, we formulate the voice anonymization task selected for the VoicePrivacy 2020 Challenge and describe the datasets used f… ▽ More

    Submitted 14 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: text overlap with arXiv:2203.12468

  6. arXiv:2204.01397  [pdf, ps, other

    cs.CL cs.SD eess.AS

    A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems

    Authors: Marcely Zanon Boito, Laurent Besacier, Natalia Tomashenko, Yannick Estève

    Abstract: Self-supervised models for speech processing emerged recently as popular foundation blocks in speech processing pipelines. These models are pre-trained on unlabeled audio data and then used in speech processing downstream tasks such as automatic speech recognition (ASR) or speech translation (ST). Since these models are now used in research and industrial systems alike, it becomes necessary to und… ▽ More

    Submitted 5 July, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: Accepted to INTERSPEECH 2022 (Special session Inclusive and Fair Speech Technologies)

  7. arXiv:2203.14834  [pdf, other

    cs.SD

    Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions

    Authors: Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko

    Abstract: In our previous work, we proposed a language-independent speaker anonymization system based on self-supervised learning models. Although the system can anonymize speech data of any language, the anonymization was imperfect, and the speech content of the anonymized speech was distorted. This limitation is more severe when the input speech is from a domain unseen in the training data. This study ana… ▽ More

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: Submit to Interspeech2022

  8. arXiv:2203.12468  [pdf, other

    eess.AS cs.CL cs.CR

    The VoicePrivacy 2022 Challenge Evaluation Plan

    Authors: Natalia Tomashenko, Xin Wang, Xiaoxiao Miao, Hubert Nourtel, Pierre Champion, Massimiliano Todisco, Emmanuel Vincent, Nicholas Evans, Junichi Yamagishi, Jean-François Bonastre

    Abstract: For new participants - Executive summary: (1) The task is to develop a voice anonymization system for speech data which conceals the speaker's voice identity while protecting linguistic content, paralinguistic attributes, intelligibility and naturalness. (2) Training, development and evaluation datasets are provided in addition to 3 different baseline anonymization systems, evaluation scripts, and… ▽ More

    Submitted 28 September, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: the file is unchanged; minor correction in metadata

  9. arXiv:2202.13097  [pdf, ps, other

    cs.SD eess.AS

    Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models

    Authors: Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko

    Abstract: Speaker anonymization aims to protect the privacy of speakers while preserving spoken linguistic information from speech. Current mainstream neural network speaker anonymization systems are complicated, containing an F0 extractor, speaker encoder, automatic speech recognition acoustic model (ASR AM), speech synthesis acoustic model and speech waveform generation model. Moreover, as an ASR AM is la… ▽ More

    Submitted 27 April, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

  10. arXiv:2111.04194  [pdf, other

    cs.CL cs.SD eess.AS

    Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition

    Authors: Salima Mdhaffar, Jean-François Bonastre, Marc Tommasi, Natalia Tomashenko, Yannick Estève

    Abstract: The widespread of powerful personal devices capable of collecting voice of their users has opened the opportunity to build speaker adapted speech recognition system (ASR) or to participate to collaborative learning of ASR. In both cases, personalized acoustic models (AM), i.e. fine-tuned AM with specific speaker data, can be built. A question that naturally arises is whether the dissemination of p… ▽ More

    Submitted 7 November, 2021; originally announced November 2021.

  11. arXiv:2111.03777  [pdf, other

    cs.CL cs.CR cs.SD eess.AS

    Privacy attacks for automatic speech recognition acoustic models in a federated learning framework

    Authors: Natalia Tomashenko, Salima Mdhaffar, Marc Tommasi, Yannick Estève, Jean-François Bonastre

    Abstract: This paper investigates methods to effectively retrieve speaker information from the personalized speaker adapted neural network acoustic models (AMs) in automatic speech recognition (ASR). This problem is especially important in the context of federated learning of ASR acoustic models where a global model is learnt on the server based on the updates received from multiple clients. We propose an a… ▽ More

    Submitted 14 January, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: Submitted to ICASSP 2022

    Journal ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 6972-6976

  12. arXiv:2109.00648  [pdf, other

    cs.CL cs.SD eess.AS

    The VoicePrivacy 2020 Challenge: Results and findings

    Authors: Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Jose Patino, Brij Mohan Lal Srivastava, Paul-Gauthier Noé, Andreas Nautsch, Nicholas Evans, Junichi Yamagishi, Benjamin O'Brien, Anaïs Chanclu, Jean-François Bonastre, Massimiliano Todisco, Mohamed Maouche

    Abstract: This paper presents the results and analyses stemming from the first VoicePrivacy 2020 Challenge which focuses on develo** anonymization solutions for speech technology. We provide a systematic overview of the challenge design with an analysis of submitted systems and evaluation results. In particular, we describe the voice anonymization task and datasets used for system development and evaluati… ▽ More

    Submitted 26 September, 2022; v1 submitted 1 September, 2021; originally announced September 2021.

    Comments: Submitted to the Special Issue on Voice Privacy (Computer Speech and Language Journal - Elsevier); under review

  13. arXiv:2109.00281  [pdf, other

    cs.CR cs.SD eess.AS

    Benchmarking and challenges in security and privacy for voice biometrics

    Authors: Jean-Francois Bonastre, Hector Delgado, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Xuechen Liu, Andreas Nautsch, Paul-Gauthier Noe, Jose Patino, Md Sahidullah, Brij Mohan Lal Srivastava, Massimiliano Todisco, Natalia Tomashenko, Emmanuel Vincent, Xin Wang, Junichi Yamagishi

    Abstract: For many decades, research in speech technologies has focused upon improving reliability. With this now meeting user expectations for a range of diverse applications, speech technology is today omni-present. As result, a focus on security and privacy has now come to the fore. Here, the research effort is in its relative infancy and progress calls for greater, multidisciplinary collaboration with s… ▽ More

    Submitted 1 September, 2021; originally announced September 2021.

    Comments: Submitted to the symposium of the ISCA Security & Privacy in Speech Communications (SPSC) special interest group

  14. LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

    Authors: Solene Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Esteve, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

    Abstract: Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image and natural language processing. Recent works also investigated SSL from speech. They were notably successful to improve performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient spee… ▽ More

    Submitted 10 June, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

    Comments: Will be presented at Interspeech 2021

    Journal ref: Proc. Interspeech 2021

  15. arXiv:2011.01130  [pdf, other

    eess.AS cs.CL

    Speaker anonymisation using the McAdams coefficient

    Authors: Jose Patino, Natalia Tomashenko, Massimiliano Todisco, Andreas Nautsch, Nicholas Evans

    Abstract: Anonymisation has the goal of manipulating speech signals in order to degrade the reliability of automatic approaches to speaker recognition, while preserving other aspects of speech, such as those relating to intelligibility and naturalness. This paper reports an approach to anonymisation that, unlike other current approaches, requires no training data, is based upon well-known signal processing… ▽ More

    Submitted 1 September, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: Accepted at INTERSPEECH 2021

  16. arXiv:2008.13144  [pdf, other

    eess.AS cs.CR

    Speech Pseudonymisation Assessment Using Voice Similarity Matrices

    Authors: Paul-Gauthier Noé, Jean-François Bonastre, Driss Matrouf, Natalia Tomashenko, Andreas Nautsch, Nicholas Evans

    Abstract: The proliferation of speech technologies and rising privacy legislation calls for the development of privacy preservation solutions for speech applications. These are essential since speech signals convey a wealth of rich, personal and potentially sensitive information. Anonymisation, the focus of the recent VoicePrivacy initiative, is one strategy to protect speaker identity information. Pseudony… ▽ More

    Submitted 30 August, 2020; originally announced August 2020.

    Comments: Interspeech 2020

  17. arXiv:2005.11861  [pdf, other

    cs.CL eess.AS

    ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020

    Authors: Maha Elbayad, Ha Nguyen, Fethi Bougares, Natalia Tomashenko, Antoine Caubrière, Benjamin Lecouteux, Yannick Estève, Laurent Besacier

    Abstract: This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2020, offline speech translation and simultaneous speech translation. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). Attention… ▽ More

    Submitted 24 May, 2020; originally announced May 2020.

  18. The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment

    Authors: Andreas Nautsch, Jose Patino, Natalia Tomashenko, Junichi Yamagishi, Paul-Gauthier Noe, Jean-Francois Bonastre, Massimiliano Todisco, Nicholas Evans

    Abstract: Mounting privacy legislation calls for the preservation of privacy in speech technology, though solutions are gravely lacking. While evaluation campaigns are long-proven tools to drive progress, the need to consider a privacy adversary implies that traditional approaches to evaluation must be adapted to the assessment of privacy and privacy preservation solutions. This paper presents the first ste… ▽ More

    Submitted 20 May, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

    Comments: submitted to Interspeech 2020

    Journal ref: Proc Interspeech 2020

  19. arXiv:2005.08601  [pdf, other

    eess.AS cs.CL

    Design Choices for X-vector Based Speaker Anonymization

    Authors: Brij Mohan Lal Srivastava, Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Junichi Yamagishi, Mohamed Maouche, Aurélien Bellet, Marc Tommasi

    Abstract: The recently proposed x-vector based anonymization scheme converts any input voice into that of a random pseudo-speaker. In this paper, we present a flexible pseudo-speaker selection technique as a baseline for the first VoicePrivacy Challenge. We explore several design choices for the distance metric between speakers, the region of x-vector space where the pseudo-speaker is picked, and gender sel… ▽ More

    Submitted 18 May, 2020; originally announced May 2020.

  20. Introducing the VoicePrivacy Initiative

    Authors: Natalia Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco

    Abstract: The VoicePrivacy initiative aims to promote the development of privacy preservation tools for speech technology by gathering a new community to define the tasks of interest and the evaluation methodology, and benchmarking solutions through a series of challenges. In this paper, we formulate the voice anonymization task selected for the VoicePrivacy 2020 Challenge and describe the datasets used for… ▽ More

    Submitted 11 August, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

    Comments: Interspeech 2020

  21. arXiv:2003.06894  [pdf, other

    eess.AS cs.CL cs.SD

    Exploring Gaussian mixture model framework for speaker adaptation of deep neural network acoustic models

    Authors: Natalia Tomashenko, Yuri Khokhlov, Yannick Esteve

    Abstract: In this paper we investigate the GMM-derived (GMMD) features for adaptation of deep neural network (DNN) acoustic models. The adaptation of the DNN trained on GMMD features is done through the maximum a posteriori (MAP) adaptation of the auxiliary GMM model used for GMMD feature extraction. We explore fusion of the adapted GMMD features with conventional features, such as bottleneck and MFCC featu… ▽ More

    Submitted 15 March, 2020; originally announced March 2020.

    Comments: 36 pages; originally was submitted to CSL in February 2017

  22. Dialogue history integration into end-to-end signal-to-concept spoken language understanding systems

    Authors: Natalia Tomashenko, Christian Raymond, Antoine Caubriere, Renato De Mori, Yannick Esteve

    Abstract: This work investigates the embeddings for representing dialog history in spoken language understanding (SLU) systems. We focus on the scenario when the semantic information is extracted directly from the speech signal by means of a single end-to-end neural network model. We proposed to integrate dialogue history into an end-to-end signal-to-concept SLU system. The dialog history is represented in… ▽ More

    Submitted 14 February, 2020; originally announced February 2020.

    Comments: Accepted for ICASSP 2020 (Submitted: October 21, 2019)

    Journal ref: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  23. arXiv:1910.13689  [pdf, other

    cs.CL cs.SD eess.AS

    ON-TRAC Consortium End-to-End Speech Translation Systems for the IWSLT 2019 Shared Task

    Authors: Ha Nguyen, Natalia Tomashenko, Marcely Zanon Boito, Antoine Caubriere, Fethi Bougares, Mickael Rouvier, Laurent Besacier, Yannick Esteve

    Abstract: This paper describes the ON-TRAC Consortium translation systems developed for the end-to-end model task of IWSLT Evaluation 2019 for the English-to-Portuguese language pair. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). A single end-to-end model built as a neural encod… ▽ More

    Submitted 30 October, 2019; originally announced October 2019.

    Comments: IWSLT 2019 - First two authors contributed equally to this work

  24. Recent Advances in End-to-End Spoken Language Understanding

    Authors: Natalia Tomashenko, Antoine Caubriere, Yannick Esteve, Antoine Laurent, Emmanuel Morin

    Abstract: This work investigates spoken language understanding (SLU) systems in the scenario when the semantic information is extracted directly from the speech signal by means of a single end-to-end neural network model. Two SLU tasks are considered: named entity recognition (NER) and semantic slot filling (SF). For these tasks, in order to improve the model performance, we explore various techniques inclu… ▽ More

    Submitted 29 September, 2019; originally announced September 2019.

    Journal ref: Statistical Language and Speech Processing. SLSP 2019

  25. arXiv:1906.07601  [pdf, other

    cs.CL cs.SD eess.AS

    Curriculum-based transfer learning for an effective end-to-end spoken language understanding and domain portability

    Authors: Antoine Caubrière, Natalia Tomashenko, Antoine Laurent, Emmanuel Morin, Nathalie Camelin, Yannick Estève

    Abstract: We present an end-to-end approach to extract semantic concepts directly from the speech audio signal. To overcome the lack of data available for this spoken language understanding approach, we investigate the use of a transfer learning strategy based on the principles of curriculum learning. This approach allows us to exploit out-of-domain data that can help to prepare a fully neural architecture.… ▽ More

    Submitted 18 June, 2019; originally announced June 2019.

    Comments: Accepted to the INTERSPEECH 2019 conference. Submitted on March 29, 2019 (Paper submission deadline)

  26. TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation

    Authors: François Hernandez, Vincent Nguyen, Sahar Ghannay, Natalia Tomashenko, Yannick Estève

    Abstract: In this paper, we present TED-LIUM release 3 corpus dedicated to speech recognition in English, that multiplies by more than two the available data to train acoustic models in comparison with TED-LIUM 2. We present the recent development on Automatic Speech Recognition (ASR) systems in comparison with the two previous releases of the TED-LIUM Corpus from 2012 and 2014. We demonstrate that, passing… ▽ More

    Submitted 13 June, 2019; v1 submitted 12 May, 2018; originally announced May 2018.

    Comments: Submitted to SPECOM 2018, 20th International Conference on Speech and Computer; TED-LIUM 3 corpus available on https://lium.univ-lemans.fr/en/ted-lium3/

    ACM Class: I.2.7

    Journal ref: SPECOM 2018. Lecture Notes in Computer Science, vol 11096, pp 198-208

  27. Fast and Accurate OOV Decoder on High-Level Features

    Authors: Yuri Khokhlov, Natalia Tomashenko, Ivan Medennikov, Alexei Romanenko

    Abstract: This work proposes a novel approach to out-of-vocabulary (OOV) keyword search (KWS) task. The proposed approach is based on using high-level features from an automatic speech recognition (ASR) system, so called phoneme posterior based (PPB) features, for decoding. These features are obtained by calculating time-dependent phoneme posterior probabilities from word lattices, followed by their smoothi… ▽ More

    Submitted 19 July, 2017; originally announced July 2017.

    Comments: Interspeech 2017, August 2017, Stockholm, Sweden. 2017