Showing 1–21 of 21 results for author: García, L

Searching in archive eess.
  1. arXiv:2406.10272  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Connected Speech-Based Cognitive Assessment in Chinese and English

    Authors: Saturnino Luz, Sofia De La Fuente Garcia, Fasih Haider, Davida Fromm, Brian MacWhinney, Alyssa Lanzi, Ya-Ning Chang, Chia-Ju Chou, Yi-Chien Liu

    Abstract: We present a novel benchmark dataset and prediction tasks for investigating approaches to assess cognitive function through analysis of connected speech. The dataset consists of speech samples and clinical information for speakers of Mandarin Chinese and English with different levels of cognitive impairment as well as individuals with normal cognition. These data have been carefully matched by age…

    Submitted 18 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: To appear in Proceedings of Interspeech 2024

    ACM Class: J.3; I.5.4

  2. arXiv:2406.03138  [pdf, other]

    cs.SD eess.AS

    A Frame-based Attention Interpretation Method for Relevant Acoustic Feature Extraction in Long Speech Depression Detection

    Authors: Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia

    Abstract: Speech-based depression detection tools could help early screening of depression. Here, we address two issues that may hinder the clinical practicality of such tools: segment-level labelling noise and a lack of model interpretability. We propose a speech-level Audio Spectrogram Transformer to avoid segment-level labelling. We observe that the proposed model significantly outperforms a segment-leve…

    Submitted 7 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2309.13476

  3. arXiv:2403.05887  [pdf, other]

    eess.AS

    Aligning Speech to Languages to Enhance Code-switching Speech Recognition

    Authors: Hexin Liu, Xiangyu Zhang, Leibny Paola Garcia, Andy W. H. Khong, Eng Siong Chng, Shinji Watanabe

    Abstract: Code-switching (CS) refers to the switching of languages within a speech signal and results in language confusion for automatic speech recognition (ASR). To address language confusion, we propose the language alignment loss that performs frame-level language identification using pseudo language labels learned from the ASR decoder. This eliminates the need for frame-level language annotations. To f…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: Manuscript submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  4. arXiv:2402.10642  [pdf, other]

    eess.AS cs.AI

    Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model

    Authors: Xiangyu Zhang, Daijiao Liu, Hexin Liu, Qiquan Zhang, Hanyu Meng, Leibny Paola Garcia, Eng Siong Chng, Lina Yao

    Abstract: Recently, Denoising Diffusion Probabilistic Models (DDPMs) have attained leading performances across a diverse range of generative tasks. However, in the field of speech synthesis, although DDPMs exhibit impressive performance, their long training duration and substantial inference costs hinder practical deployment. Existing approaches primarily focus on enhancing inference speed, while approaches…

    Submitted 16 February, 2024; originally announced February 2024.

  5. arXiv:2401.08453  [pdf, other]

    eess.SY

    Co-existence of Terrestrial and Non-Terrestrial Networks in S-band

    Authors: Niloofar Okati, Andre Noll Barreto, Luis Uzeda Garcia, Jeroen Wigard

    Abstract: Co-existence of terrestrial and non-terrestrial networks (NTN) is foreseen as an important component to fulfill the global coverage promised for sixth-generation (6G) of cellular networks. Due to ever rising spectrum demand, using dedicated frequency bands for terrestrial network (TN) and NTN may not be feasible. As a result, certain S-band frequency bands allocated by radio regulations to NTN net…

    Submitted 16 January, 2024; originally announced January 2024.

  6. arXiv:2311.15954  [pdf, other]

    cs.CL eess.AS

    A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors

    Authors: Shuyue Stella Li, Beining Xu, Xiangyu Zhang, Hexin Liu, Wenhan Chao, Leibny Paola Garcia

    Abstract: In this work, we study the features extracted by English self-supervised learning (SSL) models in cross-lingual contexts and propose a new metric to predict the quality of feature representations. Using automatic speech recognition (ASR) as a downstream task, we analyze the effect of model size, training objectives, and model architecture on the models' performance as a feature extractor for a set…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 12 pages, 5 figures, 4 tables

  7. arXiv:2309.16953  [pdf, other]

    eess.AS cs.SD

    Enhancing Code-switching Speech Recognition with Interactive Language Biases

    Authors: Hexin Liu, Leibny Paola Garcia, Xiangyu Zhang, Andy W. H. Khong, Sanjeev Khudanpur

    Abstract: Languages usually switch within a multilingual speech signal, especially in a bilingual society. This phenomenon is referred to as code-switching (CS), making automatic speech recognition (ASR) challenging under a multilingual scenario. We propose to improve CS-ASR by biasing the hybrid CTC/attention ASR model with multi-level language information comprising frame- and token-level language posteri…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: Submitted to IEEE ICASSP 2024

  8. arXiv:2309.13476  [pdf, other]

    cs.CL cs.SD eess.AS

    Hierarchical attention interpretation: an interpretable speech-level transformer for bi-modal depression detection

    Authors: Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia

    Abstract: Depression is a common mental disorder. Automatic depression detection tools using speech, enabled by machine learning, help early screening of depression. This paper addresses two limitations that may hinder the clinical implementations of such tools: noise resulting from segment-level labelling and a lack of model interpretability. We propose a bi-modal speech-level transformer to avoid segment-…

    Submitted 6 October, 2023; v1 submitted 23 September, 2023; originally announced September 2023.

    Comments: 5 pages, 3 figures, submitted to IEEE International Conference on Acoustics, Speech, and Signal Processing

    ACM Class: F.2.2; I.2.7

  9. arXiv:2309.12202  [pdf]

    eess.SP cs.LG q-bio.NC

    Empowering Precision Medicine: AI-Driven Schizophrenia Diagnosis via EEG Signals: A Comprehensive Review from 2002-2023

    Authors: Mahboobeh Jafari, Delaram Sadeghi, Afshin Shoeibi, Hamid Alinejad-Rokny, Amin Beheshti, David López García, Zhaolin Chen, U. Rajendra Acharya, Juan M. Gorriz

    Abstract: Schizophrenia (SZ) is a prevalent mental disorder characterized by cognitive, emotional, and behavioral changes. Symptoms of SZ include hallucinations, illusions, delusions, lack of motivation, and difficulties in concentration. Diagnosing SZ involves employing various tools, including clinical interviews, physical examinations, psychological evaluations, the Diagnostic and Statistical Manual of M…

    Submitted 14 September, 2023; originally announced September 2023.

  10. arXiv:2306.01031  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

    Authors: Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola Garcia, Daniel Povey, Sanjeev Khudanpur

    Abstract: This paper presents a novel algorithm for building an automatic speech recognition (ASR) model with imperfect training data. Imperfectly transcribed speech is a prevalent issue in human-annotated speech corpora, which degrades the performance of ASR models. To address this problem, we propose Bypass Temporal Classification (BTC) as an expansion of the Connectionist Temporal Classification (CTC) cr…

    Submitted 1 June, 2023; originally announced June 2023.

  11. arXiv:2304.04356  [pdf]

    cs.CV cs.LG eess.SY

    Eagle: End-to-end Deep Reinforcement Learning based Autonomous Control of PTZ Cameras

    Authors: Sandeep Singh Sandha, Bharathan Balaji, Luis Garcia, Mani Srivastava

    Abstract: Existing approaches for autonomous control of pan-tilt-zoom (PTZ) cameras use multiple stages where object detection and localization are performed separately from the control of the PTZ mechanisms. These approaches require manual labels and suffer from performance bottlenecks due to error propagation across the multi-stage flow of information. The large size of object detection neural networks al…

    Submitted 9 April, 2023; originally announced April 2023.

    Comments: 20 pages, IoTDI

  12. arXiv:2211.17196  [pdf, other]

    cs.CL cs.SD eess.AS

    EURO: ESPnet Unsupervised ASR Open-source Toolkit

    Authors: Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-yi Lee, Shinji Watanabe, Sanjeev Khudanpur

    Abstract: This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO adopts the state-of-the-art UASR learning method introduced by Wav2vec-U, originally implemented in FAIRSEQ, which leverages self-supervised speech representations and adversarial training. In addition to wav2vec2, EURO extend…

    Submitted 20 May, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

  13. arXiv:2210.14567  [pdf, other]

    eess.AS cs.SD

    Reducing Language confusion for Code-switching Speech Recognition with Token-level Language Diarization

    Authors: Hexin Liu, Haihua Xu, Leibny Paola Garcia, Andy W. H. Khong, Yi He, Sanjeev Khudanpur

    Abstract: Code-switching (CS) refers to the phenomenon that languages switch within a speech signal and leads to language confusion for automatic speech recognition (ASR). This paper aims to address language confusion for improving CS-ASR from two perspectives: incorporating and disentangling language information. We incorporate language information in the CS-ASR model by dynamically biasing the model with…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  14. arXiv:2210.11658  [pdf, other]

    eess.SP

    A New Approach to Extract Fetal Electrocardiogram Using Affine Combination of Adaptive Filters

    Authors: Yu Xuan, Xiangyu Zhang, Shuyue Stella Li, Zihan Shen, Xin Xie, Leibny Paola Garcia, Roberto Togneri

    Abstract: The detection of abnormal fetal heartbeats during pregnancy is important for monitoring the health conditions of the fetus. While adult ECG has made several advances in modern medicine, noninvasive fetal electrocardiography (FECG) remains a great challenge. In this paper, we introduce a new method based on affine combinations of adaptive filters to extract FECG signals. The affine combination of m…

    Submitted 26 February, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: 5 pages, 4 figures, 3 tables

  15. arXiv:2209.12702  [pdf, other]

    eess.AS cs.SD

    End-to-End Lyrics Recognition with Self-supervised Learning

    Authors: Xiangyu Zhang, Shuyue Stella Li, Zhanhong He, Roberto Togneri, Leibny Paola Garcia

    Abstract: Lyrics recognition is an important task in music processing. Despite traditional algorithms such as the hybrid HMM-TDNN model achieving good performance, studies on applying end-to-end models and self-supervised learning (SSL) are limited. In this paper, we first establish an end-to-end baseline for lyrics recognition and then explore the performance of SSL models on the lyrics recognition task. We e…

    Submitted 26 October, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

    Comments: 4 pages, 2 figures, 3 tables

  16. arXiv:2207.08581  [pdf, other]

    cs.LG cs.AI cs.CV eess.IV

    Study of the performance and scalability of federated learning for medical imaging with intermittent clients

    Authors: Judith Sáinz-Pardo Díaz, Álvaro López García

    Abstract: Federated learning is a data decentralization privacy-preserving technique used to perform machine or deep learning in a secure way. In this paper we present theoretical aspects about federated learning, such as the presentation of an aggregation operator, different types of federated learning, and issues to be taken into account in relation to the distribution of data from the clients, together w…

    Submitted 3 November, 2022; v1 submitted 18 July, 2022; originally announced July 2022.

  17. arXiv:2204.13597  [pdf, other]

    eess.SP cs.LG

    PhysioGAN: Training High Fidelity Generative Model for Physiological Sensor Readings

    Authors: Moustafa Alzantot, Luis Garcia, Mani Srivastava

    Abstract: Generative models such as the variational autoencoder (VAE) and the generative adversarial networks (GAN) have proven to be incredibly powerful for the generation of synthetic data that preserves statistical properties and utility of real-world datasets, especially in the context of image and natural language text. Nevertheless, until now, there has been no successful demonstration of how to apply eith…

    Submitted 25 April, 2022; originally announced April 2022.

  18. Artificial Intelligence and Dimensionality Reduction: Tools for approaching future communications

    Authors: Alejandro Ramírez-Arroyo, Luz García, Antonio Alex-Amor, Juan F. Valenzuela-Valdés

    Abstract: This article presents a novel application of the t-distributed Stochastic Neighbor Embedding (t-SNE) clustering algorithm to the telecommunication field. t-SNE is a dimensionality reduction (DR) algorithm that allows the visualization of a large dataset in a 2D plot. We present the applicability of this algorithm in a communication channel dataset formed by several scenarios (anechoic, reverberati…

    Submitted 16 March, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: IEEE Open Journal of the Communications Society

    Journal ref: IEEE Open Journal of the Communications Society, vol. 3, pp. 475-492, 2022

  19. arXiv:2110.08090  [pdf, other]

    cs.SD cs.AI eess.AS

    Using DeepProbLog to perform Complex Event Processing on an Audio Stream

    Authors: Marc Roig Vilamala, Tianwei Xing, Harrison Taylor, Luis Garcia, Mani Srivastava, Lance Kaplan, Alun Preece, Angelika Kimmig, Federico Cerutti

    Abstract: In this paper, we present an approach to Complex Event Processing (CEP) that is based on DeepProbLog. This approach has the following objectives: (i) allowing the use of subsymbolic data as an input, (ii) retaining the flexibility and modularity on the definitions of complex event rules, (iii) allowing the system to be trained in an end-to-end manner and (iv) being robust against noisily labelled…

    Submitted 15 October, 2021; originally announced October 2021.

    Comments: 8 pages, 3 figures

  20. Noise Reduction to Compute Tissue Mineral Density and Trabecular Bone Volume Fraction from Low Resolution QCT

    Authors: Felix Thomsen, José M. Fuertes García, Manuel Lucena, Juan Pisula, Rodrigo de Luis García, Jan Broggrefe, Claudio Delrieux

    Abstract: We propose a 3D neural network with specific loss functions for quantitative computed tomography (QCT) noise reduction to compute micro-structural parameters such as tissue mineral density (TMD) and bone volume ratio (BV/TV) with significantly higher accuracy than using no or standard noise reduction filters. The vertebra-phantom study contained high resolution peripheral and clinical CT scans wit…

    Submitted 4 November, 2020; originally announced November 2020.

    Comments: A revised version of this manuscript was accepted for publication in Computerized Medical Imaging and Graphics

  21. arXiv:2010.06047  [pdf, other]

    cs.AI cs.CL eess.AS

    Artificial Intelligence, speech and language processing approaches to monitoring Alzheimer's Disease: a systematic review

    Authors: Sofia de la Fuente Garcia, Craig Ritchie, Saturnino Luz

    Abstract: Language is a valuable source of clinical information in Alzheimer's Disease, as it declines concurrently with neurodegeneration. Consequently, speech and language data have been extensively studied in connection with its diagnosis. This paper summarises current findings on the use of artificial intelligence, speech and language processing to predict cognitive decline in the context of Alzheimer's…

    Submitted 12 October, 2020; originally announced October 2020.

    Comments: Pre-print submitted to the Journal of Alzheimer's Disease

    ACM Class: J.3; I.2.7; I.2.6; I.5.4