
Multi-Channel Automatic Music Transcription Using Tensor Algebra

Authors: Axel Marmoret, Nancy Bertin, Jeremy Cohen

Abstract: Music is an art, perceived in unique ways by every listener, coming from acoustic signals. Meanwhile, standards such as musical scores exist to describe it. Even if humans can make this transcription, it is costly in terms of time and effort, even more so with the explosion of information following the rise of the Internet. In that sense, research is being driven in the direction of Automatic Music Transcription. While this task is considered solved in the case of single notes, it is still open when notes are superposed, forming chords. This report aims at developing some of the existing techniques towards Music Transcription, particularly matrix factorization, and introducing the concept of multi-channel automatic music transcription. This concept will be explored with mathematical objects called tensors.

Submitted 23 July, 2021; originally announced July 2021.

Comments: 40 pages, 14 figures, 5 tables, code can be found at: https://gitlab.inria.fr/amarmore/nonnegative-factorization

ACM Class: H.5.5

arXiv:2104.13168 [pdf, other]

dEchorate: a Calibrated Room Impulse Response Database for Echo-aware Signal Processing

Authors: Diego Di Carlo, Pinchas Tandeitnik, Cédric Foy, Antoine Deleforge, Nancy Bertin, Sharon Gannot

Abstract: This paper presents dEchorate: a new database of measured multichannel Room Impulse Responses (RIRs) including annotations of early echo timings and 3D positions of microphones, real sources and image sources under different wall configurations in a cuboid room. These data provide a tool for benchmarking recent methods in echo-aware speech enhancement, room geometry estimation, RIR estimation, acoustic echo retrieval, microphone calibration, echo labeling and reflectors estimation. The database is accompanied by software utilities to easily access, manipulate and visualize the data as well as baseline methods for echo-related tasks.

Submitted 27 April, 2021; originally announced April 2021.

arXiv:2104.08580 [pdf, other]

Uncovering audio patterns in music with Nonnegative Tucker Decomposition for structural segmentation

Authors: Axel Marmoret, Jérémy E. Cohen, Nancy Bertin, Frédéric Bimbot

Abstract: Recent work has proposed the use of tensor decomposition to model repetitions and to separate tracks in loop-based electronic music. The present work further investigates the ability of Nonnegative Tucker Decomposition (NTD) to uncover musical patterns and structure in pop songs in their audio form. Exploiting the fact that NTD tends to express the content of bars as linear combinations of a few patterns, we illustrate the ability of the decomposition to capture and single out repeated motifs in the corresponding compressed space, which can be interpreted from a musical viewpoint. The resulting features also turn out to be efficient for structural segmentation, leading to experimental results on the RWC Pop data set which potentially challenge state-of-the-art approaches that rely on extensive example-based learning schemes.

Submitted 17 April, 2021; originally announced April 2021.

Comments: 7 pages, 6 figures; Code and experiment details available at https://gitlab.inria.fr/amarmore/musicntd/-/tree/0.1.0; Experiment details available at https://ax-le.github.io/resources/ISMIR2020/Notebooks_mainpage.html

Report number: ISBN: 978-0-9813537-0-8

ACM Class: H.5.5

Journal ref: 21st International Society for Music Information Retrieval Conference (ISMIR), Montréal, Canada, 2020, 788-794

arXiv:2005.10228 [pdf, other]

Sparsity-based audio declipping methods: selected overview, new algorithms, and large-scale evaluation

Authors: Clément Gaultier, Srđan Kitić, Rémi Gribonval, Nancy Bertin

Abstract: Recent advances in audio declipping have substantially improved the state of the art. Yet, practitioners need guidelines to choose a method, and while existing benchmarks have been instrumental in advancing the field, larger-scale experiments are needed to guide such choices. First, we show that the clipping levels in existing small-scale benchmarks are moderate and call for benchmarks with more perceptually significant clipping levels. We then propose a general algorithmic framework for declipping that covers existing and new combinations of variants of state-of-the-art techniques exploiting time-frequency sparsity: synthesis vs. analysis sparsity, with plain or structured sparsity. Finally, we systematically compare these combinations and a selection of state-of-the-art methods. Using a large-scale numerical benchmark and a smaller-scale formal listening test, we provide guidelines for various clipping levels, both for speech and various musical genres. The code is made publicly available for the purpose of reproducible research and benchmarking.

Submitted 30 November, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

arXiv:1906.08968 [pdf, other]

Mirage: 2D Source Localization Using Microphone Pair Augmentation with Echoes

Authors: Diego Di Carlo, Antoine Deleforge, Nancy Bertin

Abstract: It is commonly observed that acoustic echoes hurt performance of sound source localization (SSL) methods. We introduce the concept of microphone array augmentation with echoes (MIRAGE) and show how estimation of early-echo characteristics can in fact benefit SSL. We propose a learning-based scheme for echo estimation combined with a physics-based scheme for echo aggregation. In a simple scenario involving 2 microphones close to a reflective surface and one source, we show using simulated data that the proposed approach performs similarly to a correlation-based method in azimuth estimation while retrieving elevation as well from 2 microphones only, an impossible task in anechoic settings.

Submitted 21 June, 2019; originally announced June 2019.

Journal ref: International Conference on Acoustic, Speech Signal Processing - ICASSP 2019, May 2019, Brighton, United Kingdom

arXiv:1812.05901 [pdf, ps, other]

Evaluation of an open-source implementation of the SRP-PHAT algorithm within the 2018 LOCATA challenge

Authors: Romain Lebarbenchon, Ewen Camberlein, Diego di Carlo, Clément Gaultier, Antoine Deleforge, Nancy Bertin

Abstract: This short paper presents an efficient, flexible implementation of the SRP-PHAT multichannel sound source localization method. The method is evaluated on the single-source tasks of the LOCATA 2018 development dataset, and an associated Matlab toolbox is made available online.

Submitted 14 December, 2018; originally announced December 2018.

Comments: In Proceedings of the LOCATA Challenge Workshop - a satellite event of IWAENC 2018 (arXiv:1811.08482)

Report number: LOCATAchallenge/2018/01

arXiv:1711.11259 [pdf, other]

A modeling and algorithmic framework for (non)social (co)sparse audio restoration

Authors: Clément Gaultier, Nancy Bertin, Srđan Kitić, Rémi Gribonval

Abstract: We propose a unified modeling and algorithmic framework for audio restoration problems. It encompasses analysis sparse priors as well as more classical synthesis sparse priors, and regular sparsity as well as various forms of structured sparsity embodied by shrinkage operators (such as social shrinkage). The versatility of the framework is illustrated on two restoration scenarios: denoising and declipping. Extensive experimental results on these scenarios highlight both the speedups of 20% or even more offered by the analysis sparse prior, and the substantial declipping quality that is achievable with both the social and the plain flavor. While both flavors overall exhibit similar performance, their detailed comparison displays distinct trends depending on whether declipping or denoising is considered.

Submitted 30 November, 2017; originally announced November 2017.

Showing 1–7 of 7 results for author: Bertin, N