
Showing 1–6 of 6 results for author: Roman, I R

  1. arXiv:2401.12238  [pdf, other]

    eess.AS cs.LG cs.SD

    Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms

    Authors: Iran R. Roman, Christopher Ick, Sivan Ding, Adrian S. Roman, Brian McFee, Juan P. Bello

    Abstract: Sound event localization and detection (SELD) is an important task in machine listening. Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels. SELD data is simulated by convolving spatially localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape. However, RIRs require manual collection in specific…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: 5 pages, 4 figures, 1 table, to be presented at ICASSP 2024 in Seoul, South Korea

  2. arXiv:2401.08717  [pdf, other]

    cs.SD eess.AS

    Robust DOA estimation using deep acoustic imaging

    Authors: Adrian S. Roman, Iran R. Roman, Juan P. Bello

    Abstract: Direction of arrival estimation (DoAE) aims at tracking a sound in azimuth and elevation. Recent advancements include data-driven models with inputs derived from ambisonics intensity vectors or correlations between channels in a microphone array. A spherical intensity map (SIM), or acoustic image, is an alternative input representation that remains underexplored. SIMs benefit from high-resolution…

    Submitted 15 January, 2024; originally announced January 2024.

  3. arXiv:2312.14036  [pdf, other]

    cs.SD eess.AS

    Total variation in popular rap vocals from 2009-2023: extension of the analysis by Georgieva, Ripolles & McFee

    Authors: Kelvin L Walls, Iran R Roman, Bea Steers, Elena Georgieva

    Abstract: Pitch variability in rap vocals is overlooked in favor of the genre's uniquely dynamic rhythmic properties. We present an analysis of fundamental frequency (F0) variation in rap vocals over the past 14 years, focusing on song examples that represent the state of modern rap music. Our analysis aims at identifying meaningful trends over time, and is in turn a continuation of the 2023 analysis by Geo…

    Submitted 21 December, 2023; originally announced December 2023.

    Journal ref: ISMIR 2023 Hybrid Conference, 2023 Nov 5

  4. arXiv:2310.00870  [pdf, other]

    cs.SD cs.IR eess.AS

    F0 analysis of Ghanaian pop singing reveals progressive alignment with equal temperament over the past three decades: a case study

    Authors: Iran R. Roman, Daniel Faronbi, Isabelle Burger-Weiser, Leila Adu-Gilmore

    Abstract: Contemporary Ghanaian popular singing combines European and traditional Ghanaian influences. We hypothesize that access to technology embedded with equal temperament catalyzed a progressive alignment of Ghanaian singing with equal-tempered scales over time. To test this, we study the Ghanaian singer Daddy Lumba, whose work spans from the earliest Ghanaian electronic style in the late 1980s to the…

    Submitted 1 October, 2023; originally announced October 2023.

    Comments: Pages 27-33

  5. arXiv:2309.09288  [pdf, other]

    cs.SD eess.AS

    Sound Source Distance Estimation in Diverse and Dynamic Acoustic Conditions

    Authors: Saksham Singh Kushwaha, Iran R. Roman, Magdalena Fuentes, Juan Pablo Bello

    Abstract: Localizing a moving sound source in the real world involves determining its direction-of-arrival (DOA) and distance relative to a microphone. Advancements in DOA estimation have been facilitated by data-driven methods optimized with large open-source datasets with microphone array recordings in diverse environments. In contrast, estimating a sound source's distance remains understudied. Existing a…

    Submitted 17 September, 2023; originally announced September 2023.

    Comments: Accepted in WASPAA 2023

  6. arXiv:2109.12690  [pdf, ps, other]

    cs.SD cs.DB cs.LG eess.AS

    Soundata: A Python library for reproducible use of audio datasets

    Authors: Magdalena Fuentes, Justin Salamon, Pablo Zinemanas, Martín Rocamora, Genís Paja, Irán R. Román, Marius Miron, Xavier Serra, Juan Pablo Bello

    Abstract: Soundata is a Python library for loading and working with audio datasets in a standardized way, removing the need for writing custom loaders in every project, and improving reproducibility by providing tools to validate data against a canonical version. It speeds up research pipelines by allowing users to quickly download a dataset, load it into memory in a standardized and reproducible way, valid…

    Submitted 4 October, 2021; v1 submitted 26 September, 2021; originally announced September 2021.