Showing 1–10 of 10 results for author: Stan, A

  1. arXiv:2309.05384

    eess.AS cs.SD

    Towards generalisable and calibrated synthetic speech detection with self-supervised representations

    Authors: Octavian Pascu, Adriana Stan, Dan Oneata, Elisabeta Oneata, Horia Cucu

    Abstract: Generalisation -- the ability of a model to perform well on unseen data -- is crucial for building reliable deepfake detectors. However, recent studies have shown that the current audio deepfake models fall short of this desideratum. In this work we investigate the potential of pretrained self-supervised representations in building general and calibrated audio deepfake detection models. We show th…

    Submitted 12 June, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2307.09898

    eess.AS cs.AI

    An analysis on the effects of speaker embedding choice in non auto-regressive TTS

    Authors: Adriana Stan, Johannah O'Mahony

    Abstract: In this paper we introduce a first attempt on understanding how a non-autoregressive factorised multi-speaker speech synthesis architecture exploits the information present in different speaker embedding sets. We analyse if jointly learning the representations, and initialising them from pretrained models determine any quality improvements for target speaker identities. In a separate analysis, we…

    Submitted 19 July, 2023; originally announced July 2023.

    Comments: Accepted for publication at ISCA Speech Synthesis Workshop 2023

  3. SimpLex: a lexical text simplification architecture

    Authors: Ciprian-Octavian Truică, Andrei-Ionut Stan, Elena-Simona Apostol

    Abstract: Text simplification (TS) is the process of generating easy-to-understand sentences from a given sentence or piece of text. The aim of TS is to reduce both the lexical (which refers to vocabulary complexity and meaning) and syntactic (which refers to the sentence structure) complexity of a given text or sentence without the loss of meaning or nuance. In this paper, we present SimpLex, a no…

    Submitted 14 April, 2023; originally announced April 2023.

    Journal ref: Neural Computing and Applications, 35(8):6265-6280, 2023

  4. arXiv:2302.02742

    eess.AS cs.AI cs.SD

    Residual Information in Deep Speaker Embedding Architectures

    Authors: Adriana Stan

    Abstract: Speaker embeddings represent a means to extract representative vectorial representations from a speech signal such that the representation pertains to the speaker identity alone. The embeddings are commonly used to classify and discriminate between different speakers. However, there is no objective measure to evaluate the ability of a speaker embedding to disentangle the speaker identity from the…

    Submitted 6 February, 2023; originally announced February 2023.

    Journal ref: Mathematics 2022, 10(21), 3927

  5. arXiv:2206.03206

    eess.AS cs.AI eess.IV

    FlexLip: A Controllable Text-to-Lip System

    Authors: Dan Oneata, Beata Lorincz, Adriana Stan, Horia Cucu

    Abstract: The task of converting text input into video content is becoming an important topic for synthetic media generation. Several methods have been proposed with some of them reaching close-to-natural performances in constrained tasks. In this paper, we tackle a subissue of the text-to-video generation problem, by converting the text into lip landmarks. However, we do this using a modular, controllable…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: 16 pages, 4 tables, 4 figures

    Journal ref: Sensors. 2022; 22(11):4104

  6. arXiv:2106.01812

    eess.AS cs.SD

    An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis

    Authors: Beata Lorincz, Adriana Stan, Mircea Giurgiu

    Abstract: Multi-speaker spoken datasets enable the creation of text-to-speech synthesis (TTS) systems which can output several voice identities. The multi-speaker (MSPK) scenario also enables the use of fewer training samples per speaker. However, in the resulting acoustic model, not all speakers exhibit the same synthetic quality, and some of the voice identities cannot be used at all. In this paper we e…

    Submitted 3 June, 2021; originally announced June 2021.

    Comments: Accepted at 25th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES 2021)

  7. arXiv:2106.01789

    eess.AS cs.SD

    Speaker verification-derived loss and data augmentation for DNN-based multispeaker speech synthesis

    Authors: Beata Lorincz, Adriana Stan, Mircea Giurgiu

    Abstract: Building multispeaker neural network-based text-to-speech synthesis systems commonly relies on the availability of large amounts of high quality recordings from each speaker and conditioning the training process on the speaker's identity or on a learned representation of it. However, when little data is available from each speaker, or the number of speakers is limited, the multispeaker TTS can be…

    Submitted 3 June, 2021; originally announced June 2021.

    Comments: Accepted at EUSIPCO 2021

  8. arXiv:2105.09652

    eess.AS cs.SD eess.IV

    Speaker disentanglement in video-to-speech conversion

    Authors: Dan Oneata, Adriana Stan, Horia Cucu

    Abstract: The task of video-to-speech aims to translate silent video of lip movement to its corresponding audio signal. Previous approaches to this task are generally limited to the case of a single speaker, but a method that accounts for multiple speakers is desirable as it allows to i) leverage datasets with multiple speakers or few samples per speaker; and ii) control speaker identity at inference time.…

    Submitted 20 May, 2021; originally announced May 2021.

    Comments: To appear in Proc of EUSIPCO 2021

  9. arXiv:2101.05525

    eess.AS cs.CL cs.SD

    An evaluation of word-level confidence estimation for end-to-end automatic speech recognition

    Authors: Dan Oneata, Alexandru Caranica, Adriana Stan, Horia Cucu

    Abstract: Quantifying the confidence (or conversely the uncertainty) of a prediction is a highly desirable trait of an automatic system, as it improves the robustness and usefulness in downstream tasks. In this paper we investigate confidence estimation for end-to-end automatic speech recognition (ASR). Previous work has addressed confidence measures for lattice-based ASR, while current machine learning res…

    Submitted 14 January, 2021; originally announced January 2021.

    Comments: Accepted at SLT 2021

  10. arXiv:2009.05493

    eess.AS cs.CL cs.SD

    RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications

    Authors: Adriana Stan

    Abstract: Deep learning enables the development of efficient end-to-end speech processing applications while bypassing the need for expert linguistic and signal processing features. Yet, recent studies show that good quality speech resources and phonetic transcription of the training data can enhance the results of these applications. In this paper, the RECOApy tool is introduced. RECOApy streamlines the st…

    Submitted 15 September, 2020; v1 submitted 11 September, 2020; originally announced September 2020.

    Comments: Accepted for publication at Interspeech 2020