Showing 1–8 of 8 results for author: Salvi, G

Searching in archive eess.

  1. arXiv:2406.16128  [pdf, other]

    eess.AS

    Exploiting Foundation Models and Speech Enhancement for Parkinson's Disease Detection from Speech in Real-World Operative Conditions

    Authors: Moreno La Quatra, Maria Francesca Turco, Torbjørn Svendsen, Giampiero Salvi, Juan Rafael Orozco-Arroyave, Sabato Marco Siniscalchi

    Abstract: This work is concerned with devising a robust Parkinson's disease (PD) detector from speech in real-world operating conditions using (i) foundational models, and (ii) speech enhancement (SE) methods. To this end, we first fine-tune several foundational-based models on the standard PC-GITA (s-PC-GITA) clean data. Our results demonstrate superior performance to previously proposed models. Second, we…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  2. arXiv:2404.16547  [pdf, other]

    eess.AS cs.AI cs.SD

    Developing Acoustic Models for Automatic Speech Recognition in Swedish

    Authors: Giampiero Salvi

    Abstract: This paper is concerned with automatic continuous speech recognition using trainable systems. The aim of this work is to build acoustic models for spoken Swedish. This is done employing hidden Markov models and using the SpeechDat database to train their parameters. Acoustic modeling has been worked out at a phonetic level, allowing general speech recognition applications, even though a simplified…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 16 pages, 7 figures

    MSC Class: 68T10

    ACM Class: I.5.0; I.2.0; I.2.7

    Journal ref: European Student Journal of Language and Speech, 1999

  3. arXiv:2401.06588  [pdf, other]

    eess.AS cs.AI cs.CV cs.LG cs.SD

    Dynamic Behaviour of Connectionist Speech Recognition with Strong Latency Constraints

    Authors: Giampiero Salvi

    Abstract: This paper describes the use of connectionist techniques in phonetic speech recognition with strong latency constraints. The constraints are imposed by the task of deriving the lip movements of a synthetic face in real time from the speech signal, by feeding the phonetic string into an articulatory synthesiser. Particular attention has been paid to analysing the interaction between the time evolut…

    Submitted 12 January, 2024; originally announced January 2024.

    ACM Class: I.5.0; I.2.7; E.4

    Journal ref: Speech Communication Volume 48, Issue 7, July 2006, Pages 802-818

  4. arXiv:2401.05717  [pdf, other]

    eess.AS cs.IT cs.LG cs.SD

    Segment Boundary Detection via Class Entropy Measurements in Connectionist Phoneme Recognition

    Authors: Giampiero Salvi

    Abstract: This article investigates the possibility to use the class entropy of the output of a connectionist phoneme recogniser to predict time boundaries between phonetic classes. The rationale is that the value of the entropy should increase in proximity of a transition between two segments that are well modelled (known) by the recognition network since it is a measure of uncertainty. The advantage of th…

    Submitted 12 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    ACM Class: I.5.0; I.2.7; E.4

    Journal ref: Speech Communication Volume 48, Issue 12, December 2006, Pages 1666-1676

  5. arXiv:2106.06147  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    NAAQA: A Neural Architecture for Acoustic Question Answering

    Authors: Jerome Abdelnour, Jean Rouat, Giampiero Salvi

    Abstract: The goal of the Acoustic Question Answering (AQA) task is to answer a free-form text question about the content of an acoustic scene. It was inspired by the Visual Question Answering (VQA) task. In this paper, based on the previously introduced CLEAR dataset, we propose a new benchmark for AQA, namely CLEAR2, that emphasizes the specific challenges of acoustic inputs. These include handling of var…

    Submitted 12 January, 2024; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) in April 2021 (first revision February 2022)

    ACM Class: I.2.7; I.2.10; I.5.0

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, Volume: 45 Issue: 4, Page(s): 4997-5009

  6. arXiv:2009.05184  [pdf, other]

    eess.SP cs.CR eess.SY

    STEP-GAN: A Step-by-Step Training for Multi Generator GANs with application to Cyber Security in Power Systems

    Authors: Mohammad Adiban, Arash Safari, Giampiero Salvi

    Abstract: In this study, we introduce a novel unsupervised countermeasure for smart grid power systems, based on generative adversarial networks (GANs). Given the pivotal role of smart grid systems (SGSs) in urban life, their security is of particular importance. In recent years, however, advances in the field of machine learning have raised concerns about cyber attacks on these systems. Power systems, amo…

    Submitted 10 September, 2020; originally announced September 2020.

  7. arXiv:1902.11280  [pdf, ps, other]

    cs.LG cs.SD eess.AS stat.ML

    From Visual to Acoustic Question Answering

    Authors: Jerome Abdelnour, Giampiero Salvi, Jean Rouat

    Abstract: We introduce the new task of Acoustic Question Answering (AQA) to promote research in acoustic reasoning. The AQA task consists of analyzing an acoustic scene composed of a combination of elementary sounds and answering questions that relate the position and properties of these sounds. The kinds of relational questions asked require that the models perform non-trivial reasoning in order to answer…

    Submitted 28 February, 2019; originally announced February 2019.

  8. arXiv:1811.10561  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS stat.ML

    CLEAR: A Dataset for Compositional Language and Elementary Acoustic Reasoning

    Authors: Jerome Abdelnour, Giampiero Salvi, Jean Rouat

    Abstract: We introduce the task of acoustic question answering (AQA) in the area of acoustic reasoning. In this task an agent learns to answer questions on the basis of acoustic context. In order to promote research in this area, we propose a data generation paradigm adapted from CLEVR (Johnson et al. 2017). We generate acoustic scenes by leveraging a bank of elementary sounds. We also provide a number of func…

    Submitted 26 November, 2018; originally announced November 2018.

    Comments: NeurIPS 2018 Visually Grounded Interaction and Language (ViGIL) Workshop