-
Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms
Authors:
Iran R. Roman,
Christopher Ick,
Sivan Ding,
Adrian S. Roman,
Brian McFee,
Juan P. Bello
Abstract:
Sound event localization and detection (SELD) is an important task in machine listening. Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels. SELD data is simulated by convolving spatialy-localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape. However, RIRs require manual collection in specific…
▽ More
Sound event localization and detection (SELD) is an important task in machine listening. Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels. SELD data is simulated by convolving spatialy-localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape. However, RIRs require manual collection in specific rooms. We present SpatialScaper, a library for SELD data simulation and augmentation. Compared to existing tools, SpatialScaper emulates virtual rooms via parameters such as size and wall absorption. This allows for parameterized placement (including movement) of foreground and background sound sources. SpatialScaper also includes data augmentation pipelines that can be applied to existing SELD data. As a case study, we use SpatialScaper to add rooms to the DCASE SELD data. Training a model with our data led to progressive performance improves as a direct function of acoustic diversity. These results show that SpatialScaper is valuable to train robust SELD models.
△ Less
Submitted 19 January, 2024;
originally announced January 2024.
-
Robust DOA estimation using deep acoustic imaging
Authors:
Adrian S. Roman,
Iran R. Roman,
Juan P. Bello
Abstract:
Direction of arrival estimation (DoAE) aims at tracking a sound in azimuth and elevation. Recent advancements include data-driven models with inputs derived from ambisonics intensity vectors or correlations between channels in a microphone array. A spherical intensity map (SIM), or acoustic image, is an alternative input representation that remains underexplored. SIMs benefit from high-resolution…
▽ More
Direction of arrival estimation (DoAE) aims at tracking a sound in azimuth and elevation. Recent advancements include data-driven models with inputs derived from ambisonics intensity vectors or correlations between channels in a microphone array. A spherical intensity map (SIM), or acoustic image, is an alternative input representation that remains underexplored. SIMs benefit from high-resolution microphone arrays, yet most DoAE datasets use low-resolution ones. Therefore, we first propose a super-resolution method to upsample low-resolution microphones. Next, we benchmark DoAE models that use SIMs as input. We arrive to a model that uses SIMs for DoAE estimation and outperforms a baseline and a state-of-the-art model. Our study highlights the relevance of acoustic imaging for DoAE tasks.
△ Less
Submitted 15 January, 2024;
originally announced January 2024.
-
Total variation in popular rap vocals from 2009-2023: extension of the analysis by Georgieva, Ripolles & McFee
Authors:
Kelvin L Walls,
Iran R Roman,
Bea Steers,
Elena Georgieva
Abstract:
Pitch variability in rap vocals is overlooked in favor of the genre's uniquely dynamic rhythmic properties. We present an analysis of fundamental frequency (F0) variation in rap vocals over the past 14 years, focusing on song examples that represent the state of modern rap music. Our analysis aims at identifying meaningful trends over time, and is in turn a continuation of the 2023 analysis by Geo…
▽ More
Pitch variability in rap vocals is overlooked in favor of the genre's uniquely dynamic rhythmic properties. We present an analysis of fundamental frequency (F0) variation in rap vocals over the past 14 years, focusing on song examples that represent the state of modern rap music. Our analysis aims at identifying meaningful trends over time, and is in turn a continuation of the 2023 analysis by Georgieva, Ripolles & McFee. They found rap to be an outlier with larger F0 variation compared to other genres, but with a declining trend since the genre's inception. However, they only analyzed data through 2010. Our analysis looks beyond 2010. We once again observe rap's large F0 variation, but with a decelerated decline in recent years.
△ Less
Submitted 21 December, 2023;
originally announced December 2023.
-
F0 analysis of Ghanaian pop singing reveals progressive alignment with equal temperament over the past three decades: a case study
Authors:
Iran R. Roman,
Daniel Faronbi,
Isabelle Burger-Weiser,
Leila Adu-Gilmore
Abstract:
Contemporary Ghanaian popular singing combines European and traditional Ghanaian influences. We hypothesize that access to technology embedded with equal temperament catalyzed a progressive alignment of Ghanaian singing with equal-tempered scales over time. To test this, we study the Ghanaian singer Daddy Lumba, whose work spans from the earliest Ghanaian electronic style in the late 1980s to the…
▽ More
Contemporary Ghanaian popular singing combines European and traditional Ghanaian influences. We hypothesize that access to technology embedded with equal temperament catalyzed a progressive alignment of Ghanaian singing with equal-tempered scales over time. To test this, we study the Ghanaian singer Daddy Lumba, whose work spans from the earliest Ghanaian electronic style in the late 1980s to the present. Studying a singular musician as a case study allows us to refine our analysis without over-interpreting the findings. We curated a collection of his songs, distributed between 1989 and 2016, to extract F0 values from isolated vocals. We used Gaussian mixture modeling (GMM) to approximate each song's scale and found that the pitch variance has been decreasing over time. We also determined whether the GMM components follow the arithmetic relationships observed in equal-tempered scales, and observed that Daddy Lumba's singing better aligns with equal temperament in recent years. Together, results reveal the impact of exposure to equal-tempered scales, resulting in lessened microtonal content in Daddy Lumba's singing. Our study highlights a potential vulnerability of Ghanaian musical scales and implies a need for research that maps and archives singing styles.
△ Less
Submitted 1 October, 2023;
originally announced October 2023.
-
Sound Source Distance Estimation in Diverse and Dynamic Acoustic Conditions
Authors:
Saksham Singh Kushwaha,
Iran R. Roman,
Magdalena Fuentes,
Juan Pablo Bello
Abstract:
Localizing a moving sound source in the real world involves determining its direction-of-arrival (DOA) and distance relative to a microphone. Advancements in DOA estimation have been facilitated by data-driven methods optimized with large open-source datasets with microphone array recordings in diverse environments. In contrast, estimating a sound source's distance remains understudied. Existing a…
▽ More
Localizing a moving sound source in the real world involves determining its direction-of-arrival (DOA) and distance relative to a microphone. Advancements in DOA estimation have been facilitated by data-driven methods optimized with large open-source datasets with microphone array recordings in diverse environments. In contrast, estimating a sound source's distance remains understudied. Existing approaches assume recordings by non-coincident microphones to use methods that are susceptible to differences in room reverberation. We present a CRNN able to estimate the distance of moving sound sources across multiple datasets featuring diverse rooms, outperforming a recently-published approach. We also characterize our model's performance as a function of sound source distance and different training losses. This analysis reveals optimal training using a loss that weighs model errors as an inverse function of the sound source true distance. Our study is the first to demonstrate that sound source distance estimation can be performed across diverse acoustic conditions using deep learning.
△ Less
Submitted 17 September, 2023;
originally announced September 2023.
-
Soundata: A Python library for reproducible use of audio datasets
Authors:
Magdalena Fuentes,
Justin Salamon,
Pablo Zinemanas,
Martín Rocamora,
Genís Paja,
Irán R. Román,
Marius Miron,
Xavier Serra,
Juan Pablo Bello
Abstract:
Soundata is a Python library for loading and working with audio datasets in a standardized way, removing the need for writing custom loaders in every project, and improving reproducibility by providing tools to validate data against a canonical version. It speeds up research pipelines by allowing users to quickly download a dataset, load it into memory in a standardized and reproducible way, valid…
▽ More
Soundata is a Python library for loading and working with audio datasets in a standardized way, removing the need for writing custom loaders in every project, and improving reproducibility by providing tools to validate data against a canonical version. It speeds up research pipelines by allowing users to quickly download a dataset, load it into memory in a standardized and reproducible way, validate that the dataset is complete and correct, and more. Soundata is based and inspired on mirdata and design to complement mirdata by working with environmental sound, bioacoustic and speech datasets, among others. Soundata was created to be easy to use, easy to contribute to, and to increase reproducibility and standardize usage of sound datasets in a flexible way.
△ Less
Submitted 4 October, 2021; v1 submitted 26 September, 2021;
originally announced September 2021.