Search | arXiv e-print repository

arXiv:2005.10228 [pdf, other]

Sparsity-based audio declipping methods: selected overview, new algorithms, and large-scale evaluation

Authors: Clément Gaultier, Srđan Kitić, Rémi Gribonval, Nancy Bertin

Abstract: Recent advances in audio declipping have substantially improved the state of the art. Yet, practitioners need guidelines to choose a method, and while existing benchmarks have been instrumental in advancing the field, larger-scale experiments are needed to guide such choices. First, we show that the clipping levels in existing small-scale benchmarks are moderate and call for benchmarks with more perceptually significant clipping levels. We then propose a general algorithmic framework for declipping that covers existing and new combinations of variants of state-of-the-art techniques exploiting time-frequency sparsity: synthesis vs. analysis sparsity, with plain or structured sparsity. Finally, we systematically compare these combinations and a selection of state-of-the-art methods. Using a large-scale numerical benchmark and a smaller-scale formal listening test, we provide guidelines for various clipping levels, both for speech and various musical genres. The code is made publicly available for the purpose of reproducible research and benchmarking.

Submitted 30 November, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

arXiv:1910.10661 [pdf, other]

A Comparative Study of Multilateration Methods for Single-Source Localization in Distributed Audio

Authors: Srđan Kitić, Clément Gaultier, Grégory Pallone

Abstract: In this article we analyze the state of the art in multilateration - the family of localization methods enabled by range difference observations. These methods are computationally efficient, signal-independent, and flexible with regard to the number of sensing nodes and their spatial arrangement. However, the multilateration problem does not admit a closed-form solution in the general case, and the localization performance is conditioned on the accuracy of range difference estimates. For that reason, we consider a simplified use case where multiple distributed microphones capture the signal coming from a near-field sound source, and discuss their robustness to the estimation errors. In addition to surveying the relevant bibliography, we present the results of a small-scale benchmark of a few "mainstream" multilateration algorithms, based on an in-house Room Impulse Response dataset.

Submitted 28 July, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

Comments: To appear at IWIS - The 1st International Workshop on the Internet of Sounds

arXiv:1812.05901 [pdf, ps, other]

Evaluation of an open-source implementation of the SRP-PHAT algorithm within the 2018 LOCATA challenge

Authors: Romain Lebarbenchon, Ewen Camberlein, Diego di Carlo, Clément Gaultier, Antoine Deleforge, Nancy Bertin

Abstract: This short paper presents an efficient, flexible implementation of the SRP-PHAT multichannel sound source localization method. The method is evaluated on the single-source tasks of the LOCATA 2018 development dataset, and an associated Matlab toolbox is made available online.

Submitted 14 December, 2018; originally announced December 2018.

Comments: In Proceedings of the LOCATA Challenge Workshop - a satellite event of IWAENC 2018 (arXiv:1811.08482)

Report number: LOCATAchallenge/2018/01

arXiv:1711.11259 [pdf, other]

A modeling and algorithmic framework for (non)social (co)sparse audio restoration

Authors: Clément Gaultier, Nancy Bertin, Srđan Kitić, Rémi Gribonval

Abstract: We propose a unified modeling and algorithmic framework for audio restoration problems. It encompasses analysis sparse priors as well as more classical synthesis sparse priors, and regular sparsity as well as various forms of structured sparsity embodied by shrinkage operators (such as social shrinkage). The versatility of the framework is illustrated on two restoration scenarios: denoising and declipping. Extensive experimental results on these scenarios highlight both the speedups of 20% or even more offered by the analysis sparse prior, and the substantial declipping quality that is achievable with both the social and the plain flavor. While both flavors overall exhibit similar performance, their detailed comparison displays distinct trends depending on whether declipping or denoising is considered.

Submitted 30 November, 2017; originally announced November 2017.

arXiv:1612.06287 [pdf, other]

VAST : The Virtual Acoustic Space Traveler Dataset

Authors: Clément Gaultier, Saurabh Kataria, Antoine Deleforge

Abstract: This paper introduces a new paradigm for sound source localization referred to as virtual acoustic space traveling (VAST) and presents a first dataset designed for this purpose. Existing sound source localization methods are either based on an approximate physical model (physics-driven) or on a specific-purpose calibration set (data-driven). With VAST, the idea is to learn a mapping from audio features to desired audio properties using a massive dataset of simulated room impulse responses. This virtual dataset is designed to be maximally representative of the potential audio scenes that the considered system may be evolving in, while remaining reasonably compact. We show that virtually-learned mappings on this dataset generalize to real data, overcoming some intrinsic limitations of traditional binaural sound localization methods based on time differences of arrival.

Submitted 14 December, 2016; originally announced December 2016.

Comments: International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), Feb 2017, Grenoble, France

arXiv:1609.09747 [pdf, other]

Hearing in a shoe-box : binaural source position and wall absorption estimation using virtually supervised learning

Authors: Saurabh Kataria, Clément Gaultier, Antoine Deleforge

Abstract: This paper introduces a new framework for supervised sound source localization referred to as virtually-supervised learning. An acoustic shoe-box room simulator is used to generate a large number of binaural single-source audio scenes. These scenes are used to build a dataset of spatial binaural features annotated with acoustic properties such as the 3D source position and the walls' absorption coefficients. A probabilistic high- to low-dimensional regression framework is used to learn a mapping from these features to the acoustic properties. Results indicate that this mapping successfully estimates the azimuth and elevation of new sources, but also their range and even the walls' absorption coefficients solely based on binaural signals. Results also reveal that incorporating random-diffusion effects in the data significantly improves the estimation of all parameters.

Submitted 20 March, 2017; v1 submitted 30 September, 2016; originally announced September 2016.

Comments: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Mar 2017, New Orleans, United States

Report number: hal-01372435

Showing 1–6 of 6 results for author: Gaultier, C