Showing 1–30 of 30 results for author: Grondin, F

  1. arXiv:2406.06310  [pdf, other]

    cs.SD eess.AS

    Unsupervised Improved MVDR Beamforming for Sound Enhancement

    Authors: Jacob Kealey, John Hershey, François Grondin

    Abstract: Neural networks have recently become the dominant approach to sound separation. Their good performance relies on large datasets of isolated recordings. For speech and music, isolated single channel data are readily available; however the same does not hold in the multi-channel case, and with most other sound classes. Multi-channel methods have the potential to outperform single channel approaches…

    Submitted 12 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  2. arXiv:2309.08005  [pdf, ps, other]

    eess.AS cs.SD eess.IV

    Efficient Face Detection with Audio-Based Region Proposals for Human-Robot Interactions

    Authors: William Aris, François Grondin

    Abstract: Efficient face detection is critical to provide natural human-robot interactions. However, computer vision tends to involve a large computational load due to the amount of data (i.e. pixels) that needs to be processed in a short amount of time. This is undesirable on robotics platforms where multiple processes need to run in parallel and where the processing power is limited by portability constra…

    Submitted 15 March, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

  3. arXiv:2309.05057  [pdf, other]

    eess.AS cs.SD

    Gray Jedi MVDR Post-filtering

    Authors: François Grondin, Caleb Rascón

    Abstract: Spatial filters can exploit deep-learning-based speech enhancement models to increase their reliability in scenarios with multiple speech sources. To further improve speech quality, it is common to perform postfiltering on the estimated target speech obtained with spatial filtering. In this work, Minimum Variance Distortionless Response (MVDR) is employed to provide the interference esti…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  4. arXiv:2303.00949  [pdf, other]

    eess.AS

    Real-time Audio Video Enhancement with a Microphone Array and Headphones

    Authors: Jacob Kealey, Anthony Gosselin, Étienne Deshaies-Samson, Francis Cardinal, Félix Ducharme-Turcotte, Olivier Bergeron, Amélie Rioux-Joyal, Jérémy Bélec, François Grondin

    Abstract: This paper presents a complete hardware and software pipeline for real-time speech enhancement in noisy and reverberant conditions. The device consists of a microphone array and a camera mounted on eyeglasses, connected to an embedded system that enhances speech and plays back the audio in headphones, with a latency of maximum 120 msec. The proposed approach relies on face detection, tracking and…

    Submitted 1 March, 2023; originally announced March 2023.

    Comments: Submitted to IROS 2023

  5. arXiv:2303.00829  [pdf, other]

    eess.AS

    Ego-noise reduction of a mobile robot using noise spatial covariance matrix learning and minimum variance distortionless response

    Authors: Pierre-Olivier Lagacé, François Ferland, François Grondin

    Abstract: The performance of speech and events recognition systems significantly improved recently thanks to deep learning methods. However, some of these tasks remain challenging when algorithms are deployed on robots due to the unseen mechanical noise and electrical interference generated by their actuators while training the neural networks. Ego-noise reduction as a preprocessing step therefore can help…

    Submitted 6 March, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: Submitted to IROS 2023

  6. arXiv:2206.09507  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Resource-Efficient Separation Transformer

    Authors: Luca Della Libera, Cem Subakan, Mirco Ravanelli, Samuele Cornell, Frédéric Lepoutre, François Grondin

    Abstract: Transformers have recently achieved state-of-the-art performance in speech separation. These models, however, are computationally demanding and require a lot of learnable parameters. This paper explores Transformer-based speech separation with a reduced computational cost. Our main contribution is the development of the Resource-Efficient Separation Transformer (RE-SepFormer), a self-attention-bas…

    Submitted 15 January, 2024; v1 submitted 19 June, 2022; originally announced June 2022.

    Comments: Accepted to ICASSP 2024

  7. arXiv:2204.13622  [pdf, other]

    eess.SP cs.SD eess.AS

    Fast Cross-Correlation for TDoA Estimation on Small Aperture Microphone Arrays

    Authors: François Grondin, Marc-Antoine Maheux, Jean-Samuel Lauzon, Jonathan Vincent, François Michaud

    Abstract: This paper introduces the Fast Cross-Correlation (FCC) method for Time Difference of Arrival (TDoA) Estimation for pairs of microphones on a small aperture microphone array. FCC relies on low-rank decomposition and exploits symmetry in even and odd bases to speed up computation while preserving TDoA accuracy. FCC reduces the number of flops by a factor of 4.5 and the execution speed by factors bet…

    Submitted 10 March, 2023; v1 submitted 28 April, 2022; originally announced April 2022.

    Comments: Submitted to IEEE ICASSP 2023

  8. arXiv:2203.14409  [pdf, other]

    cs.SD eess.AS

    SMP-PHAT: Lightweight DoA Estimation by Merging Microphone Pairs

    Authors: François Grondin, Marc-Antoine Maheux, Jean-Samuel Lauzon, Jonathan Vincent, François Michaud

    Abstract: This paper introduces SMP-PHAT, which performs direction of arrival (DoA) of sound estimation with a microphone array by merging pairs of microphones that are parallel in space. This approach reduces the number of pairwise cross-correlation computations, and brings down the number of flops and memory lookups when searching for DoA. Experiments on low-cost hardware with commonly used microphone arr…

    Submitted 27 March, 2022; originally announced March 2022.

    Comments: Submitted to Interspeech 2022

  9. arXiv:2202.08947  [pdf, other]

    eess.SP

    Machine Learning for Touch Localization on Ultrasonic Wave Touchscreen

    Authors: Sahar Bahrami, Jérémy Moriot, Patrice Masson, François Grondin

    Abstract: Classification and regression employing a simple Deep Neural Network (DNN) are investigated to perform touch localization on a tactile surface using ultrasonic guided waves. A robotic finger first simulates the touch action and captures the data to train a model. The model is then validated with data from experiments conducted with human fingers. The localization root mean square errors (RMSE) in…

    Submitted 27 April, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

  10. arXiv:2202.02884  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Exploring Self-Attention Mechanisms for Speech Separation

    Authors: Cem Subakan, Mirco Ravanelli, Samuele Cornell, Francois Grondin, Mirko Bronzi

    Abstract: Transformers have enabled impressive improvements in deep learning. They often outperform recurrent and convolutional models in many tasks while taking advantage of parallel processing. Recently, we proposed the SepFormer, which obtains state-of-the-art performance in speech separation with the WSJ0-2/3 Mix datasets. This paper studies in-depth Transformers for speech separation. In particular, we…

    Submitted 27 May, 2023; v1 submitted 6 February, 2022; originally announced February 2022.

    Comments: Accepted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  11. arXiv:2111.04614  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Learning Filterbanks for End-to-End Acoustic Beamforming

    Authors: Samuele Cornell, Manuel Pariente, François Grondin, Stefano Squartini

    Abstract: Recent work on monaural source separation has shown that performance can be increased by using fully learned filterbanks with short windows. On the other hand it is widely known that, for conventional beamforming techniques, performance increases with long analysis windows. This applies also to most hybrid neural beamforming methods which rely on a deep neural network (DNN) to estimate the spatial…

    Submitted 19 February, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

    Comments: Accepted at ICASSP 2022

  12. arXiv:2110.10812  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    REAL-M: Towards Speech Separation on Real Mixtures

    Authors: Cem Subakan, Mirco Ravanelli, Samuele Cornell, François Grondin

    Abstract: In recent years, deep learning based source separation has achieved impressive results. Most studies, however, still evaluate separation models on synthetic datasets, while the performance of state-of-the-art techniques on in-the-wild speech data remains an open question. This paper contributes to fill this gap in two ways. First, we release the REAL-M dataset, a crowd-sourced corpus of real-life…

    Submitted 20 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP 2022

  13. arXiv:2110.03103  [pdf, other]

    eess.AS cs.SD eess.SP

    Lightweight Speech Enhancement in Unseen Noisy and Reverberant Conditions using KISS-GEV Beamforming

    Authors: Thomas Bernard, François Grondin

    Abstract: This paper introduces a new method referred to as KISS-GEV (for Keep It Super Simple Generalized eigenvalue) beamforming. While GEV beamforming usually relies on deep neural network for estimating target and noise time-frequency masks, this method uses a signal processing approach based on the direction of arrival (DoA) of the target. This considerably reduces the amount of computations involved a…

    Submitted 10 October, 2021; v1 submitted 6 October, 2021; originally announced October 2021.

  14. Neural Network Based Lidar Gesture Recognition for Realtime Robot Teleoperation

    Authors: Simon Chamorro, Jack Collier, François Grondin

    Abstract: We propose a novel low-complexity lidar gesture recognition system for mobile robot control robust to gesture variation. Our system uses a modular approach, consisting of a pose estimation module and a gesture classifier. Pose estimates are predicted from lidar scans using a Convolutional Neural Network trained using an existing stereo-based pose estimation system. Gesture classification is accomp…

    Submitted 16 September, 2021; originally announced September 2021.

    Journal ref: 2021 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR)

  15. arXiv:2106.04624  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    SpeechBrain: A General-Purpose Speech Toolkit

    Authors: Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-Chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato De Mori, Yoshua Bengio

    Abstract: SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the core architecture designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel speech processing…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: Preprint

  16. ECAPA-TDNN Embeddings for Speaker Diarization

    Authors: Nauman Dawalatabad, Mirco Ravanelli, François Grondin, Jenthe Thienpondt, Brecht Desplanques, Hwidong Na

    Abstract: Learning robust speaker embeddings is a crucial step in speaker diarization. Deep neural networks can accurately capture speaker discriminative characteristics and popular deep embeddings such as x-vectors are nowadays a fundamental component of modern diarization systems. Recently, some improvements over the standard TDNN architecture used for x-vectors have been proposed. The ECAPA-TDNN model, f…

    Submitted 3 April, 2021; originally announced April 2021.

  17. arXiv:2103.03954  [pdf, other]

    eess.AS cs.RO cs.SD

    ODAS: Open embeddeD Audition System

    Authors: François Grondin, Dominic Létourneau, Cédric Godin, Jean-Samuel Lauzon, Jonathan Vincent, Simon Michaud, Samuel Faucher, François Michaud

    Abstract: Artificial audition aims at providing hearing capabilities to machines, computers and robots. Existing frameworks in robot audition offer interesting sound source localization, tracking and separation performance, although involve a significant amount of computations that limit their use on robots with embedded computing capabilities. This paper presents ODAS, the Open embeddeD Audition System fra…

    Submitted 11 May, 2022; v1 submitted 5 March, 2021; originally announced March 2021.

    Comments: This paper was published in Frontiers Robotics and AI

  18. arXiv:2103.01830  [pdf, other]

    cs.SD cs.LG eess.AS

    Audio scene monitoring using redundant ad-hoc microphone array networks

    Authors: Peter Gerstoft, Yihan Hu, Michael J. Bianco, Chaitanya Patil, Ardel Alegre, Yoav Freund, Francois Grondin

    Abstract: We present a system for localizing sound sources in a room with several ad-hoc microphone arrays. Each circular array performs direction of arrival (DOA) estimation independently using commercial software. The DOAs are fed to a fusion center, concatenated, and used to perform the localization based on two proposed methods, which require only few labeled source locations (anchor points) for trainin…

    Submitted 23 August, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

    Comments: In press, IEEE Internet of Things Journal

  19. arXiv:2010.09930  [pdf, other]

    cs.SD eess.AS

    BIRD: Big Impulse Response Dataset

    Authors: François Grondin, Jean-Samuel Lauzon, Simon Michaud, Mirco Ravanelli, François Michaud

    Abstract: This paper introduces BIRD, the Big Impulse Response Dataset. This open dataset consists of 100,000 multichannel room impulse responses (RIRs) generated from simulations using the Image Method, making it the largest multichannel open dataset currently available. These RIRs can be used to perform efficient online data augmentation for scenarios that involve two microphones and multiple sound sources…

    Submitted 19 October, 2020; originally announced October 2020.

  20. arXiv:2008.00072  [pdf, other]

    cs.CV eess.IV

    Dynamic Object Tracking and Masking for Visual SLAM

    Authors: Jonathan Vincent, Mathieu Labbé, Jean-Samuel Lauzon, François Grondin, Pier-Marc Comtois-Rivet, François Michaud

    Abstract: In dynamic environments, performance of visual SLAM techniques can be impaired by visual features taken from moving objects. One solution is to identify those objects so that their visual features can be removed for localization and mapping. This paper presents a simple and fast pipeline that uses deep neural networks, extended Kalman filters and visual SLAM to improve both localization and mappin…

    Submitted 31 July, 2020; originally announced August 2020.

  21. arXiv:2007.11079  [pdf, other]

    eess.AS cs.SD

    3D Localization of a Sound Source Using Mobile Microphone Arrays Referenced by SLAM

    Authors: Simon Michaud, Samuel Faucher, François Grondin, Jean-Samuel Lauzon, Mathieu Labbé, Dominic Létourneau, François Ferland, François Michaud

    Abstract: A microphone array can provide a mobile robot with the capability of localizing, tracking and separating distant sound sources in 2D, i.e., estimating their relative elevation and azimuth. To combine acoustic data with visual information in real world settings, spatial correlation must be established. The approach explored in this paper consists of having two robots, each equipped with a microphon…

    Submitted 21 July, 2020; originally announced July 2020.

  22. arXiv:2005.09587  [pdf, other]

    eess.AS

    GEV Beamforming Supported by DOA-based Masks Generated on Pairs of Microphones

    Authors: Francois Grondin, Jean-Samuel Lauzon, Jonathan Vincent, Francois Michaud

    Abstract: Distant speech processing is a challenging task, especially when dealing with the cocktail party effect. Sound source separation is thus often required as a preprocessing step prior to speech recognition to improve the signal to distortion ratio (SDR). Recently, a combination of beamforming and speech separation networks have been proposed to improve the target source quality in the direction of a…

    Submitted 5 August, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

  23. arXiv:2002.01440  [pdf, other]

    eess.AS cs.SD eess.SP

    Audio-Visual Calibration with Polynomial Regression for 2-D Projection Using SVD-PHAT

    Authors: Francois Grondin, Hao Tang, James Glass

    Abstract: This paper proposes a straightforward 2-D method to spatially calibrate the visual field of a camera with the auditory field of an array microphone by generating and overlaying an acoustic image over an optical image. Using a low-cost microphone array and an off-the-shelf camera, we show that polynomial regression can deal efficiently with non-linear camera distortion, and that a recently proposed…

    Submitted 12 February, 2020; v1 submitted 4 February, 2020; originally announced February 2020.

  24. arXiv:1910.10049  [pdf, other]

    eess.AS cs.SD

    Sound Event Localization and Detection Using CRNN on Pairs of Microphones

    Authors: Francois Grondin, James Glass, Iwona Sobieraj, Mark D. Plumbley

    Abstract: This paper proposes sound event localization and detection methods from multichannel recording. The proposed system is based on two Convolutional Recurrent Neural Networks (CRNNs) to perform sound event detection (SED) and time difference of arrival (TDOA) estimation on each pair of microphones in a microphone array. In this paper, the system is evaluated with a four-microphone array, and thus com…

    Submitted 22 October, 2019; originally announced October 2019.

  25. arXiv:1907.12621  [pdf, other]

    eess.AS cs.SD

    Fast and Robust 3-D Sound Source Localization with DSVD-PHAT

    Authors: Francois Grondin, James Glass

    Abstract: This paper introduces a variant of the Singular Value Decomposition with Phase Transform (SVD-PHAT), named Difference SVD-PHAT (DSVD-PHAT), to achieve robust Sound Source Localization (SSL) in noisy conditions. Experiments are performed on a Baxter robot with a four-microphone planar array mounted on its head. Results show that this method offers similar robustness to noise as the state-of-the-art…

    Submitted 29 July, 2019; originally announced July 2019.

  26. arXiv:1906.11913  [pdf, ps, other]

    eess.AS eess.SP

    Multiple Sound Source Localization with SVD-PHAT

    Authors: Francois Grondin, James Glass

    Abstract: This paper introduces a modification of phase transform on singular value decomposition (SVD-PHAT) to localize multiple sound sources. This work aims to improve localization accuracy and keeps the algorithm complexity low for real-time applications. This method relies on multiple scans of the search space, with projection of each low-dimensional observation onto orthogonal subspaces. We show that…

    Submitted 27 June, 2019; originally announced June 2019.

  27. arXiv:1812.00115  [pdf, other]

    eess.AS cs.SD

    Lightweight and Optimized Sound Source Localization and Tracking Methods for Open and Closed Microphone Array Configurations

    Authors: Francois Grondin, Francois Michaud

    Abstract: Human-robot interaction in natural settings requires filtering out the different sources of sounds from the environment. Such ability usually involves the use of microphone arrays to localize, track and separate sound sources online. Multi-microphone signal processing techniques can improve robustness to noise but the processing cost increases with the number of microphones used, limiting response…

    Submitted 30 November, 2018; originally announced December 2018.

  28. arXiv:1811.11787  [pdf, ps, other]

    eess.AS cs.SD eess.SP

    A Study of the Complexity and Accuracy of Direction of Arrival Estimation Methods Based on GCC-PHAT for a Pair of Close Microphones

    Authors: Francois Grondin, James Glass

    Abstract: This paper investigates the accuracy of various Generalized Cross-Correlation with Phase Transform (GCC-PHAT) methods for a close pair of microphones. We investigate interpolation-based methods and also propose another approach based on Singular Value Decomposition (SVD). All investigated methods are implemented in C code, and the execution time is measured to determine which approach is the most…

    Submitted 28 November, 2018; originally announced November 2018.

  29. arXiv:1811.11785  [pdf, ps, other]

    eess.AS cs.SD eess.SP

    SVD-PHAT: A Fast Sound Source Localization Method

    Authors: Francois Grondin, James Glass

    Abstract: This paper introduces a new localization method called SVD-PHAT. The SVD-PHAT method relies on Singular Value Decomposition of the SRP-PHAT projection matrix. A k-d tree is also proposed to speed up the search for the most likely direction of arrival of sound. We show that this method performs as accurately as SRP-PHAT, while reducing significantly the amount of computation required.

    Submitted 11 February, 2019; v1 submitted 28 November, 2018; originally announced November 2018.

    Journal ref: Proceedings of the 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing

  30. arXiv:1806.04841  [pdf, ps, other]

    cs.CL cs.LG cs.SD eess.AS

    A Study of Enhancement, Augmentation, and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition

    Authors: Hao Tang, Wei-Ning Hsu, Francois Grondin, James Glass

    Abstract: Speech recognizers trained on close-talking speech do not generalize to distant speech and the word error rate degradation can be as large as 40% absolute. Most studies focus on tackling distant speech recognition as a separate problem, leaving little effort to adapting close-talking speech recognizers to distant speech. In this work, we review several approaches from a domain adaptation perspecti…

    Submitted 13 June, 2018; originally announced June 2018.

    Comments: Interspeech, 2018