Showing 1–16 of 16 results for author: Falk, T

  1. arXiv:2406.18731  [pdf, other]

    eess.AS cs.AI cs.CL

    WavRx: a Disease-Agnostic, Generalizable, and Privacy-Preserving Speech Health Diagnostic Model

    Authors: Yi Zhu, Tiago Falk

    Abstract: Speech is known to carry health-related attributes, which has emerged as a novel venue for remote and long-term health monitoring. However, existing models are usually tailored for a specific type of disease, and have been shown to lack generalizability across datasets. Furthermore, concerns have been raised recently towards the leakage of speaker identity from health embeddings. To mitigate these…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Under review; Model script available at https://github.com/zhu00121/WavRx

  2. arXiv:2406.03657  [pdf, other]

    eess.AS cs.SD

    UrBAN: Urban Beehive Acoustics and PheNotyping Dataset

    Authors: Mahsa Abdollahi, Yi Zhu, Heitor R. Guimarães, Nico Coallier, Ségolène Maucourt, Pierre Giovenazzo, Tiago H. Falk

    Abstract: In this paper, we present a multimodal dataset obtained from a honey bee colony in Montréal, Quebec, Canada, spanning the years of 2021 to 2022. This apiary comprised 10 beehives, with microphones recording more than 2000 hours of high quality raw audio, and also sensors capturing temperature, and humidity. Periodic hive inspections involved monitoring colony honey bee population changes, assessin…

    Submitted 20 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  3. arXiv:2403.08654  [pdf, other]

    eess.AS cs.SD

    An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning

    Authors: Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk

    Abstract: Self-supervised speech representation learning enables the extraction of meaningful features from raw waveforms. These features can then be efficiently used across multiple downstream tasks. However, two significant issues arise when considering the deployment of such methods "in-the-wild": (i) Their large size, which can be prohibitive for edge applications; and (ii) their robustness to detrimen…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Under review on IEEE Transactions on Audio, Speech, and Language Processing (2024)

  4. arXiv:2311.10876  [pdf, other]

    eess.AS cs.SD q-bio.QM

    MSPB: a longitudinal multi-sensor dataset with phenotypic trait measurements from honey bees

    Authors: Yi Zhu, Mahsa Abdollahi, Ségolène Maucourt, Nico Coallier, Heitor R. Guimarães, Pierre Giovenazzo, Tiago H. Falk

    Abstract: We present a longitudinal multi-sensor dataset collected from honey bee colonies (Apis mellifera) with rich phenotypic measurements. Data were continuously collected between May-2020 and April-2021 from 53 hives located at two apiaries in Québec, Canada. The sensor data included audio features, temperature, and relative humidity. The phenotypic measurements contained beehive population, number of…

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: Under review; project webpage: https://zhu00121.github.io/MSPB-webpage/

  5. arXiv:2309.14462  [pdf, ps, other]

    eess.AS cs.SD

    On the Impact of Quantization and Pruning of Self-Supervised Speech Models for Downstream Speech Recognition Tasks "In-the-Wild"

    Authors: Arthur Pimentel, Heitor Guimarães, Anderson R. Avila, Mehdi Rezagholizadeh, Tiago H. Falk

    Abstract: Recent advances with self-supervised learning have allowed speech recognition systems to achieve state-of-the-art (SOTA) word error rates (WER) while requiring only a fraction of the labeled training data needed by its predecessors. Notwithstanding, while such models achieve SOTA performance in matched train/test conditions, their performance degrades substantially when tested in unseen conditions…

    Submitted 25 September, 2023; originally announced September 2023.

  6. arXiv:2309.12914  [pdf, other]

    eess.AS cs.SD

    VIC-KD: Variance-Invariance-Covariance Knowledge Distillation to Make Keyword Spotting More Robust Against Adversarial Attacks

    Authors: Heitor R. Guimarães, Arthur Pimentel, Anderson Avila, Tiago H. Falk

    Abstract: Keyword spotting (KWS) refers to the task of identifying a set of predefined words in audio streams. With the advances seen recently with deep neural networks, it has become a popular technology to activate and control small devices, such as voice assistants. Relying on such models for edge devices, however, can be challenging due to hardware constraints. Moreover, as adversarial attacks have incr…

    Submitted 22 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  7. arXiv:2309.08099  [pdf, other]

    cs.SD cs.CL eess.AS

    Characterizing the temporal dynamics of universal speech representations for generalizable deepfake detection

    Authors: Yi Zhu, Saurabh Powar, Tiago H. Falk

    Abstract: Existing deepfake speech detection systems lack generalizability to unseen attacks (i.e., samples generated by generative algorithms not seen during training). Recent studies have explored the use of universal speech representations to tackle this issue and have obtained inspiring results. These works, however, have focused on innovating downstream classifiers while leaving the representation itse…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  8. arXiv:2305.14546  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    On the Transferability of Whisper-based Representations for "In-the-Wild" Cross-Task Downstream Speech Applications

    Authors: Vamsikrishna Chemudupati, Marzieh Tahaei, Heitor Guimaraes, Arthur Pimentel, Anderson Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago Falk

    Abstract: Large self-supervised pre-trained speech models have achieved remarkable success across various speech-processing tasks. The self-supervised training of these models leads to universal speech representations that can be used for different downstream tasks, ranging from automatic speech recognition (ASR) to speaker identification. Recently, Whisper, a transformer-based model was proposed and traine…

    Submitted 23 May, 2023; originally announced May 2023.

  9. arXiv:2305.05443  [pdf, ps, other]

    eess.AS cs.SD

    An Exploration into the Performance of Unsupervised Cross-Task Speech Representations for "In the Wild" Edge Applications

    Authors: Heitor Guimarães, Arthur Pimentel, Anderson Avila, Mehdi Rezagholizadeh, Tiago H. Falk

    Abstract: Unsupervised speech models are becoming ubiquitous in the speech and machine learning communities. Upstream models are responsible for learning meaningful representations from raw audio. Later, these representations serve as input to downstream models to solve a number of tasks, such as keyword spotting or emotion recognition. As edge speech applications start to emerge, it is important to gauge h…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: Extended Abstract accepted in the Edge Intelligence Workshop (EIW) 2022

  10. arXiv:2304.02181  [pdf, other]

    cs.CL cs.SD eess.AS

    On the Impact of Voice Anonymization on Speech Diagnostic Applications: a Case Study on COVID-19 Detection

    Authors: Yi Zhu, Mohamed Imoussaïne-Aïkous, Carolyn Côté-Lussier, Tiago H. Falk

    Abstract: With advances seen in deep learning, voice-based applications are burgeoning, ranging from personal assistants, affective computing, to remote disease diagnostics. As the voice contains both linguistic and para-linguistic information (e.g., vocal pitch, intonation, speech rate, loudness), there is growing interest in voice anonymization to preserve speaker privacy and identity. Voice privacy chall…

    Submitted 26 June, 2024; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: Updated version; Published at IEEE Transactions on Information Forensics and Security

    Journal ref: IEEE Transactions on Information Forensics and Security, vol. 19, pp. 5151-5165, 2024

  11. arXiv:2302.09437  [pdf, other]

    eess.AS cs.SD

    RobustDistiller: Compressing Universal Speech Representations for Enhanced Environment Robustness

    Authors: Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk

    Abstract: Self-supervised speech pre-training enables deep neural network models to capture meaningful and disentangled factors from raw waveform signals. The learned universal speech representations can then be used across numerous downstream tasks. These representations, however, are sensitive to distribution shifts caused by environmental factors, such as noise and/or room reverberation. Their large size…

    Submitted 22 February, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP 2023

  12. arXiv:2211.06562  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Improving the Robustness of DistilHuBERT to Unseen Noisy Conditions via Data Augmentation, Curriculum Learning, and Multi-Task Enhancement

    Authors: Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Tiago H. Falk

    Abstract: Self-supervised speech representation learning aims to extract meaningful factors from the speech signal that can later be used across different downstream tasks, such as speech and/or emotion recognition. Existing models, such as HuBERT, however, can be fairly large thus may not be suitable for edge speech applications. Moreover, realistic applications typically involve speech corrupted by noise…

    Submitted 11 November, 2022; originally announced November 2022.

    Comments: ENLSP-II NeurIPS Workshop 2022, 6 pages

  13. arXiv:2003.08474  [pdf, other]

    eess.SP cs.CY cs.HC stat.AP

    TILES-2018, a longitudinal physiologic and behavioral data set of hospital workers

    Authors: Karel Mundnich, Brandon M. Booth, Michelle L'Hommedieu, Tiantian Feng, Benjamin Girault, Justin L'Hommedieu, Mackenzie Wildman, Sophia Skaaden, Amrutha Nadarajan, Jennifer L. Villatte, Tiago H. Falk, Kristina Lerman, Emilio Ferrara, Shrikanth Narayanan

    Abstract: We present a novel longitudinal multimodal corpus of physiological and behavioral data collected from direct clinical providers in a hospital workplace. We designed the study to investigate the use of off-the-shelf wearable and environmental sensors to understand individual-specific constructs such as job performance, interpersonal interaction, and well-being of hospital workers over time in their…

    Submitted 18 December, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

    Comments: 57 pages, 9 figures, journal paper

    Journal ref: Sci Data 7, 354 (2020)

  14. arXiv:1906.08823  [pdf, other]

    cs.LG eess.SP stat.ML

    Cross-Subject Statistical Shift Estimation for Generalized Electroencephalography-based Mental Workload Assessment

    Authors: Isabela Albuquerque, João Monteiro, Olivier Rosanne, Abhishek Tiwari, Jean-François Gagnon, Tiago H. Falk

    Abstract: Assessment of mental workload in real-world conditions is key to ensure the performance of workers executing tasks that demand sustained attention. Previous literature has employed electroencephalography (EEG) to this end despite having observed that EEG correlates of mental workload vary across subjects and physical strain, thus making it difficult to devise models capable of simultaneously prese…

    Submitted 22 September, 2021; v1 submitted 20 June, 2019; originally announced June 2019.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  15. arXiv:1901.05498  [pdf, ps, other]

    cs.LG eess.SP stat.ML

    Deep learning-based electroencephalography analysis: a systematic review

    Authors: Yannick Roy, Hubert Banville, Isabela Albuquerque, Alexandre Gramfort, Tiago H. Falk, Jocelyn Faubert

    Abstract: Electroencephalography (EEG) is a complex signal and can require several years of training to be correctly interpreted. Recently, deep learning (DL) has shown great promise in helping make sense of EEG signals due to its capacity to learn good feature representations from raw data. Whether DL truly presents advantages as compared to more traditional EEG processing approaches, however, remains an o…

    Submitted 20 January, 2019; v1 submitted 16 January, 2019; originally announced January 2019.

  16. arXiv:1711.06309  [pdf, other]

    cs.SD eess.AS

    Speech Dereverberation with Context-aware Recurrent Neural Networks

    Authors: Joao Felipe Santos, Tiago H. Falk

    Abstract: In this paper, we propose a model to perform speech dereverberation by estimating its spectral magnitude from the reverberant counterpart. Our models are capable of extracting features that take into account both short and long-term dependencies in the signal through a convolutional encoder (which extracts features from a short, bounded context of frames) and a recurrent neural network for extract…

    Submitted 16 November, 2017; originally announced November 2017.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing