Showing 1–12 of 12 results for author: McFee, B

Searching in archive eess.
  1. arXiv:2401.12238  [pdf, other]

    eess.AS cs.LG cs.SD

    Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms

    Authors: Iran R. Roman, Christopher Ick, Sivan Ding, Adrian S. Roman, Brian McFee, Juan P. Bello

    Abstract: Sound event localization and detection (SELD) is an important task in machine listening. Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels. SELD data is simulated by convolving spatially-localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape. However, RIRs require manual collection in specific…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: 5 pages, 4 figures, 1 table, to be presented at ICASSP 2024 in Seoul, South Korea

  2. arXiv:2309.03337  [pdf, other]

    eess.AS cs.SD

    Leveraging Geometrical Acoustic Simulations of Spatial Room Impulse Responses for Improved Sound Event Detection and Localization

    Authors: Christopher Ick, Brian McFee

    Abstract: As deeper and more complex models are developed for the task of sound event localization and detection (SELD), the demand for annotated spatial audio data continues to increase. Annotating field recordings with 360$^{\circ}$ video takes many hours from trained annotators, while recording events within motion-tracked laboratories are bounded by cost and expertise. Because of this, localization mode…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: 5 pages, 3 figures, 3 tables, presented in the Proceedings of the 8th Detection and Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023)

  3. arXiv:2307.10834  [pdf, other]

    eess.AS cs.SD

    Transfer Learning and Bias Correction with Pre-trained Audio Embeddings

    Authors: Changhong Wang, Gaël Richard, Brian McFee

    Abstract: Deep neural network models have become the dominant approach to a large variety of tasks within music information retrieval (MIR). These models generally require large amounts of (annotated) training data to achieve high accuracy. Because not all applications in MIR have sufficient quantities of training data, it is becoming increasingly common to transfer models across domains. This approach allo…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: 7 pages, 3 figures, accepted to the conference of the International Society for Music Information Retrieval (ISMIR 2023)

  4. arXiv:2304.12521  [pdf, other]

    cs.SD eess.AS

    Foley Sound Synthesis at the DCASE 2023 Challenge

    Authors: Keunwoo Choi, Jaekwon Im, Laurie Heller, Brian McFee, Keisuke Imoto, Yuki Okamoto, Mathieu Lagrange, Shinosuke Takamichi

    Abstract: The addition of Foley sound effects during post-production is a common technique used to enhance the perceived acoustic properties of multimedia content. Traditionally, Foley sound has been produced by human Foley artists, which involves manual recording and mixing of sound. However, recent advances in sound synthesis and generative models have generated interest in machine-assisted or automatic F…

    Submitted 28 September, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: DCASE 2023 Challenge - Task 7 - Technical Report (Submitted to DCASE 2023 Workshop)

  5. arXiv:2207.10760  [pdf, ps, other]

    cs.SD cs.AI cs.MM eess.AS

    A Proposal for Foley Sound Synthesis Challenge

    Authors: Keunwoo Choi, Sangshin Oh, Minsung Kang, Brian McFee

    Abstract: "Foley" refers to sound effects that are added to multimedia during post-production to enhance its perceived acoustic properties, e.g., by simulating the sounds of footsteps, ambient environmental sounds, or visible objects on the screen. While foley is traditionally produced by foley artists, there is increasing interest in automatic or machine-assisted techniques building upon recent advances in…

    Submitted 21 July, 2022; originally announced July 2022.

  6. arXiv:2104.01304  [pdf, other]

    cs.SD eess.AS

    Diarization of Legal Proceedings. Identifying and Transcribing Judicial Speech from Recorded Court Audio

    Authors: Jeffrey Tumminia, Amanda Kuznecov, Sophia Tsilerides, Ilana Weinstein, Brian McFee, Michael Picheny, Aaron R. Kaufman

    Abstract: United States Courts make audio recordings of oral arguments available as public record, but these recordings rarely include speaker annotations. This paper addresses the Speech Audio Diarization problem, answering the question of "Who spoke when?" in the domain of judicial oral argument proceedings. We present a workflow for diarizing the speech of judges using audio recordings of oral arguments,…

    Submitted 2 April, 2021; originally announced April 2021.

    Comments: Under review for InterSpeech 2021

  7. arXiv:2102.03468  [pdf, other]

    eess.AS cs.LG cs.SD

    Sound Event Detection in Urban Audio With Single and Multi-Rate PCEN

    Authors: Christopher Ick, Brian McFee

    Abstract: Recent literature has demonstrated that the use of per-channel energy normalization (PCEN), has significant performance improvements over traditional log-scaled mel-frequency spectrograms in acoustic sound event detection (SED) in a multi-class setting with overlapping events. However, the configuration of PCEN's parameters is sensitive to the recording environment, the characteristics of the clas…

    Submitted 5 February, 2021; originally announced February 2021.

    Comments: 5 pages, 2 figures, 1 table, accepted for publication in IEEE ICASSP 2021

  8. arXiv:2102.03229  [pdf, other]

    cs.SD cs.LG eess.AS

    Multi-Task Self-Supervised Pre-Training for Music Classification

    Authors: Ho-Hsiang Wu, Chieh-Chi Kao, Qingming Tang, Ming Sun, Brian McFee, Juan Pablo Bello, Chao Wang

    Abstract: Deep learning is very data hungry, and supervised learning especially requires massive labeled data to work well. Machine listening research often suffers from limited labeled data problem, as human annotations are costly to acquire, and annotations for audio are time consuming and less intuitive. Besides, models learned from labeled dataset often embed biases specific to that particular dataset.…

    Submitted 5 February, 2021; originally announced February 2021.

    Comments: Copyright 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  9. arXiv:2009.04172  [pdf, other]

    eess.AS cs.LG cs.SD

    Multiple F0 Estimation in Vocal Ensembles using Convolutional Neural Networks

    Authors: Helena Cuesta, Brian McFee, Emilia Gómez

    Abstract: This paper addresses the extraction of multiple F0 values from polyphonic and a cappella vocal performances using convolutional neural networks (CNNs). We address the major challenges of ensemble singing, i.e., all melodic sources are vocals and singers sing in harmony. We build upon an existing architecture to produce a pitch salience function of the input signal, where the harmonic constant-Q tr…

    Submitted 9 September, 2020; originally announced September 2020.

    Comments: Accepted to the 21st International Society for Music Information Retrieval (ISMIR) Conference (2020)

  10. arXiv:1910.10246  [pdf, other]

    cs.SD cs.LG eess.AS

    Learning the helix topology of musical pitch

    Authors: Vincent Lostanlen, Sripathi Sridhar, Brian McFee, Andrew Farnsworth, Juan Pablo Bello

    Abstract: To explain the consonance of octaves, music psychologists represent pitch as a helix where azimuth and axial coordinate correspond to pitch class and pitch height respectively. This article addresses the problem of discovering this helical structure from unlabeled audio data. We measure Pearson correlations in the constant-Q transform (CQT) domain to build a K-nearest neighbor graph between freque…

    Submitted 4 February, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: 5 pages, 6 figures. To appear in the Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, May 2020

  11. arXiv:1809.00381  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Multitask Learning for Fundamental Frequency Estimation in Music

    Authors: Rachel M. Bittner, Brian McFee, Juan P. Bello

    Abstract: Fundamental frequency (f0) estimation from polyphonic music includes the tasks of multiple-f0, melody, vocal, and bass line estimation. Historically these problems have been approached separately, and only recently, using learning-based approaches. We present a multitask deep learning architecture that jointly estimates outputs for various tasks including multiple-f0, melody, vocal and bass line e…

    Submitted 2 September, 2018; originally announced September 2018.

  12. arXiv:1804.10070  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Adaptive pooling operators for weakly labeled sound event detection

    Authors: Brian McFee, Justin Salamon, Juan Pablo Bello

    Abstract: Sound event detection (SED) methods are tasked with labeling segments of audio recordings by the presence of active sound sources. SED is typically posed as a supervised machine learning problem, requiring strong annotations for the presence or absence of each sound source at every time instant within the recording. However, strong annotations of this type are both labor- and cost-intensive for hu…

    Submitted 10 August, 2018; v1 submitted 26 April, 2018; originally announced April 2018.