Showing 1–12 of 12 results for author: Cucu, H

  1. arXiv:2309.05384  [pdf, other]

    eess.AS cs.SD

    Towards generalisable and calibrated synthetic speech detection with self-supervised representations

    Authors: Octavian Pascu, Adriana Stan, Dan Oneata, Elisabeta Oneata, Horia Cucu

    Abstract: Generalisation -- the ability of a model to perform well on unseen data -- is crucial for building reliable deepfake detectors. However, recent studies have shown that the current audio deepfake models fall short of this desideratum. In this work we investigate the potential of pretrained self-supervised representations in building general and calibrated audio deepfake detection models. We show th…

    Submitted 12 June, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2307.13008  [pdf]

    eess.AS cs.AI

    Adaptation of Whisper models to child speech recognition

    Authors: Rishabh Jain, Andrei Barcovschi, Mariam Yiwere, Peter Corcoran, Horia Cucu

    Abstract: Automatic Speech Recognition (ASR) systems often struggle with transcribing child speech due to the lack of large child speech datasets required to accurately train child-friendly ASR models. However, there are huge amounts of annotated adult speech datasets which were used to create multilingual ASR models, such as Whisper. Our work aims to explore whether such models can be adapted to child spee…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: Accepted in Interspeech 2023

  3. arXiv:2306.13484  [pdf]

    cs.AI cs.LG

    Adaptive Planning Search Algorithm for Analog Circuit Verification

    Authors: Cristian Manolache, Cristina Andronache, Alexandru Caranica, Horia Cucu, Andi Buzo, Cristian Diaconu, Georg Pelz

    Abstract: Integrated circuit verification has gathered considerable interest in recent times. Since these circuits keep growing in complexity year by year, pre-Silicon (pre-SI) verification becomes ever more important, in order to ensure proper functionality. Thus, in order to reduce the time needed for manually verifying ICs, we propose a machine learning (ML) approach, which uses less simulations. This me…

    Submitted 23 June, 2023; originally announced June 2023.

  4. arXiv:2206.03206  [pdf, other]

    eess.AS cs.AI eess.IV

    FlexLip: A Controllable Text-to-Lip System

    Authors: Dan Oneata, Beata Lorincz, Adriana Stan, Horia Cucu

    Abstract: The task of converting text input into video content is becoming an important topic for synthetic media generation. Several methods have been proposed with some of them reaching close-to-natural performances in constrained tasks. In this paper, we tackle a subissue of the text-to-video generation problem, by converting the text into lip landmarks. However, we do this using a modular, controllable…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: 16 pages, 4 tables, 4 figures

    Journal ref: Sensors. 2022; 22(11):4104

  5. arXiv:2206.02391  [pdf, other]

    cs.LG

    Automated Circuit Sizing with Multi-objective Optimization based on Differential Evolution and Bayesian Inference

    Authors: Catalin Visan, Octavian Pascu, Marius Stanescu, Elena-Diana Sandru, Cristian Diaconu, Andi Buzo, Georg Pelz, Horia Cucu

    Abstract: With the ever increasing complexity of specifications, manual sizing for analog circuits recently became very challenging. Especially for innovative, large-scale circuits designs, with tens of design variables, operating conditions and conflicting objectives to be optimized, design engineers spend many weeks, running time-consuming simulations, in their attempt at finding the right configuration.…

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: 48 pages, 13 figures, submitted to Knowledge Based Systems

  6. arXiv:2204.13206  [pdf, other]

    cs.SD eess.AS eess.IV

    Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations

    Authors: Dan Oneata, Horia Cucu

    Abstract: Multimodal speech recognition aims to improve the performance of automatic speech recognition (ASR) systems by leveraging additional visual information that is usually associated to the audio input. While previous approaches make crucial use of strong visual representations, e.g. by finetuning pretrained image recognition networks, significantly less attention has been paid to its counterpart: the…

    Submitted 27 April, 2022; originally announced April 2022.

    Comments: Accepted at the Multimodal Learning and Applications Workshop (MULA) from CVPR 2022

  7. arXiv:2204.05419  [pdf]

    eess.AS cs.SD

    A Wav2vec2-Based Experimental Study on Self-Supervised Learning Methods to Improve Child Speech Recognition

    Authors: Rishabh Jain, Andrei Barcovschi, Mariam Yiwere, Dan Bigioi, Peter Corcoran, Horia Cucu

    Abstract: Despite recent advancements in deep learning technologies, Child Speech Recognition remains a challenging task. Current Automatic Speech Recognition (ASR) models require substantial amounts of annotated data for training, which is scarce. In this work, we explore using the ASR model, wav2vec2, with different pretraining and finetuning configurations for self-supervised learning (SSL) toward improv…

    Submitted 11 February, 2023; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: Preprint, Submitted to IEEE Access

  8. arXiv:2203.11562  [pdf]

    cs.SD cs.CL eess.AS

    A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis

    Authors: Rishabh Jain, Mariam Yiwere, Dan Bigioi, Peter Corcoran, Horia Cucu

    Abstract: Speech synthesis has come a long way as current text-to-speech (TTS) models can now generate natural human-sounding speech. However, most of the TTS research focuses on using adult speech data and there has been very limited work done on child speech synthesis. This study developed and validated a training pipeline for fine-tuning state-of-the-art (SOTA) neural TTS models using child speech datase…

    Submitted 4 April, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: Submitted to IEEE ACCESS

  9. arXiv:2105.09652  [pdf, other]

    eess.AS cs.SD eess.IV

    Speaker disentanglement in video-to-speech conversion

    Authors: Dan Oneata, Adriana Stan, Horia Cucu

    Abstract: The task of video-to-speech aims to translate silent video of lip movement to its corresponding audio signal. Previous approaches to this task are generally limited to the case of a single speaker, but a method that accounts for multiple speakers is desirable as it allows to i) leverage datasets with multiple speakers or few samples per speaker; and ii) control speaker identity at inference time.…

    Submitted 20 May, 2021; originally announced May 2021.

    Comments: To appear in Proc of EUSIPCO 2021

  10. arXiv:2101.05525  [pdf, other]

    eess.AS cs.CL cs.SD

    An evaluation of word-level confidence estimation for end-to-end automatic speech recognition

    Authors: Dan Oneata, Alexandru Caranica, Adriana Stan, Horia Cucu

    Abstract: Quantifying the confidence (or conversely the uncertainty) of a prediction is a highly desirable trait of an automatic system, as it improves the robustness and usefulness in downstream tasks. In this paper we investigate confidence estimation for end-to-end automatic speech recognition (ASR). Previous work has addressed confidence measures for lattice-based ASR, while current machine learning res…

    Submitted 14 January, 2021; originally announced January 2021.

    Comments: Accepted at SLT 2021

  11. arXiv:1910.12363  [pdf, other]

    cs.CV cs.LG stat.ML

    The Quo Vadis submission at Traffic4cast 2019

    Authors: Dan Oneata, Cosmin George Alexandru, Marius Stanescu, Octavian Pascu, Alexandru Magan, Adrian Postelnicu, Horia Cucu

    Abstract: We describe the submission of the Quo Vadis team to the Traffic4cast competition, which was organized as part of the NeurIPS 2019 series of challenges. Our system consists of a temporal regression module, implemented as $1\times1$ 2d convolutions, augmented with spatio-temporal biases. We have found that using biases is a straightforward and efficient way to include seasonal patterns and to improv…

    Submitted 27 October, 2019; originally announced October 2019.

    Comments: Extended abstract for the Traffic4cast competition from NeurIPS 2019

  12. arXiv:1907.01195  [pdf, other]

    cs.SD cs.CV eess.AS

    Kite: Automatic speech recognition for unmanned aerial vehicles

    Authors: Dan Oneata, Horia Cucu

    Abstract: This paper addresses the problem of building a speech recognition system attuned to the control of unmanned aerial vehicles (UAVs). Even though UAVs are becoming widespread, the task of creating voice interfaces for them is largely unaddressed. To this end, we introduce a multi-modal evaluation dataset for UAV control, consisting of spoken commands and associated images, which represent the visual…

    Submitted 2 July, 2019; originally announced July 2019.

    Comments: 5 pages, accepted at Interspeech 2019