Skip to main content

Showing 1–3 of 3 results for author: Roman, A S

Searching in archive eess. Search in all archives.
.
  1. arXiv:2401.17129  [pdf, other

    cs.SD cs.AI eess.AS

    Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes

    Authors: Adrian S. Roman, Baladithya Balamurugan, Rithik Pothuganti

    Abstract: This technical report details our work towards building an enhanced audio-visual sound event localization and detection (SELD) network. We build on top of the audio-only SELDnet23 model and adapt it to be audio-visual by merging both audio and video information prior to the gated recurrent unit (GRU) of the audio-only network. Our model leverages YOLO and DETIC object detectors. We also build a fr… ▽ More

    Submitted 29 January, 2024; originally announced January 2024.

  2. arXiv:2401.12238  [pdf, other

    eess.AS cs.LG cs.SD

    Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms

    Authors: Iran R. Roman, Christopher Ick, Sivan Ding, Adrian S. Roman, Brian McFee, Juan P. Bello

    Abstract: Sound event localization and detection (SELD) is an important task in machine listening. Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels. SELD data is simulated by convolving spatialy-localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape. However, RIRs require manual collection in specific… ▽ More

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: 5 pages, 4 figures, 1 table, to be presented at ICASSP 2024 in Seoul, South Korea

  3. arXiv:2401.08717  [pdf, other

    cs.SD eess.AS

    Robust DOA estimation using deep acoustic imaging

    Authors: Adrian S. Roman, Iran R. Roman, Juan P. Bello

    Abstract: Direction of arrival estimation (DoAE) aims at tracking a sound in azimuth and elevation. Recent advancements include data-driven models with inputs derived from ambisonics intensity vectors or correlations between channels in a microphone array. A spherical intensity map (SIM), or acoustic image, is an alternative input representation that remains underexplored. SIMs benefit from high-resolution… ▽ More

    Submitted 15 January, 2024; originally announced January 2024.