Showing 1–18 of 18 results for author: McFee, B

  1. arXiv:2401.12238  [pdf, other]

    eess.AS cs.LG cs.SD

    Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms

    Authors: Iran R. Roman, Christopher Ick, Sivan Ding, Adrian S. Roman, Brian McFee, Juan P. Bello

    Abstract: Sound event localization and detection (SELD) is an important task in machine listening. Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels. SELD data is simulated by convolving spatially-localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape. However, RIRs require manual collection in specific…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: 5 pages, 4 figures, 1 table, to be presented at ICASSP 2024 in Seoul, South Korea

  2. arXiv:2309.03337  [pdf, other]

    eess.AS cs.SD

    Leveraging Geometrical Acoustic Simulations of Spatial Room Impulse Responses for Improved Sound Event Detection and Localization

    Authors: Christopher Ick, Brian McFee

    Abstract: As deeper and more complex models are developed for the task of sound event localization and detection (SELD), the demand for annotated spatial audio data continues to increase. Annotating field recordings with 360° video takes many hours from trained annotators, while recording events within motion-tracked laboratories are bounded by cost and expertise. Because of this, localization mode…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: 5 pages, 3 figures, 3 tables, presented in the Proceedings of the 8th Detection and Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023)

  3. arXiv:2307.10834  [pdf, other]

    eess.AS cs.SD

    Transfer Learning and Bias Correction with Pre-trained Audio Embeddings

    Authors: Changhong Wang, Gaël Richard, Brian McFee

    Abstract: Deep neural network models have become the dominant approach to a large variety of tasks within music information retrieval (MIR). These models generally require large amounts of (annotated) training data to achieve high accuracy. Because not all applications in MIR have sufficient quantities of training data, it is becoming increasingly common to transfer models across domains. This approach allo…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: 7 pages, 3 figures, accepted to the conference of the International Society for Music Information Retrieval (ISMIR 2023)

  4. arXiv:2304.12521  [pdf, other]

    cs.SD eess.AS

    Foley Sound Synthesis at the DCASE 2023 Challenge

    Authors: Keunwoo Choi, Jaekwon Im, Laurie Heller, Brian McFee, Keisuke Imoto, Yuki Okamoto, Mathieu Lagrange, Shinosuke Takamichi

    Abstract: The addition of Foley sound effects during post-production is a common technique used to enhance the perceived acoustic properties of multimedia content. Traditionally, Foley sound has been produced by human Foley artists, which involves manual recording and mixing of sound. However, recent advances in sound synthesis and generative models have generated interest in machine-assisted or automatic F…

    Submitted 28 September, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: DCASE 2023 Challenge - Task 7 - Technical Report (Submitted to DCASE 2023 Workshop)

  5. arXiv:2207.10760  [pdf, ps, other]

    cs.SD cs.AI cs.MM eess.AS

    A Proposal for Foley Sound Synthesis Challenge

    Authors: Keunwoo Choi, Sangshin Oh, Minsung Kang, Brian McFee

    Abstract: "Foley" refers to sound effects that are added to multimedia during post-production to enhance its perceived acoustic properties, e.g., by simulating the sounds of footsteps, ambient environmental sounds, or visible objects on the screen. While foley is traditionally produced by foley artists, there is increasing interest in automatic or machine-assisted techniques building upon recent advances in…

    Submitted 21 July, 2022; originally announced July 2022.

  6. arXiv:2104.01304  [pdf, other]

    cs.SD eess.AS

    Diarization of Legal Proceedings. Identifying and Transcribing Judicial Speech from Recorded Court Audio

    Authors: Jeffrey Tumminia, Amanda Kuznecov, Sophia Tsilerides, Ilana Weinstein, Brian McFee, Michael Picheny, Aaron R. Kaufman

    Abstract: United States Courts make audio recordings of oral arguments available as public record, but these recordings rarely include speaker annotations. This paper addresses the Speech Audio Diarization problem, answering the question of "Who spoke when?" in the domain of judicial oral argument proceedings. We present a workflow for diarizing the speech of judges using audio recordings of oral arguments,…

    Submitted 2 April, 2021; originally announced April 2021.

    Comments: Under review for InterSpeech 2021

  7. arXiv:2102.03468  [pdf, other]

    eess.AS cs.LG cs.SD

    Sound Event Detection in Urban Audio With Single and Multi-Rate PCEN

    Authors: Christopher Ick, Brian McFee

    Abstract: Recent literature has demonstrated that the use of per-channel energy normalization (PCEN) yields significant performance improvements over traditional log-scaled mel-frequency spectrograms in acoustic sound event detection (SED) in a multi-class setting with overlapping events. However, the configuration of PCEN's parameters is sensitive to the recording environment, the characteristics of the clas…

    Submitted 5 February, 2021; originally announced February 2021.

    Comments: 5 pages, 2 figures, 1 table, accepted for publication in IEEE ICASSP 2021

  8. arXiv:2102.03229  [pdf, other]

    cs.SD cs.LG eess.AS

    Multi-Task Self-Supervised Pre-Training for Music Classification

    Authors: Ho-Hsiang Wu, Chieh-Chi Kao, Qingming Tang, Ming Sun, Brian McFee, Juan Pablo Bello, Chao Wang

    Abstract: Deep learning is very data hungry, and supervised learning especially requires massive labeled data to work well. Machine listening research often suffers from the problem of limited labeled data, as human annotations are costly to acquire, and annotations for audio are time consuming and less intuitive. Besides, models learned from a labeled dataset often embed biases specific to that particular dataset.…

    Submitted 5 February, 2021; originally announced February 2021.

    Comments: Copyright 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  9. arXiv:2009.04172  [pdf, other]

    eess.AS cs.LG cs.SD

    Multiple F0 Estimation in Vocal Ensembles using Convolutional Neural Networks

    Authors: Helena Cuesta, Brian McFee, Emilia Gómez

    Abstract: This paper addresses the extraction of multiple F0 values from polyphonic and a cappella vocal performances using convolutional neural networks (CNNs). We address the major challenges of ensemble singing, i.e., all melodic sources are vocals and singers sing in harmony. We build upon an existing architecture to produce a pitch salience function of the input signal, where the harmonic constant-Q tr…

    Submitted 9 September, 2020; originally announced September 2020.

    Comments: Accepted to the 21st International Society for Music Information Retrieval (ISMIR) Conference (2020)

  10. arXiv:1910.10246  [pdf, other]

    cs.SD cs.LG eess.AS

    Learning the helix topology of musical pitch

    Authors: Vincent Lostanlen, Sripathi Sridhar, Brian McFee, Andrew Farnsworth, Juan Pablo Bello

    Abstract: To explain the consonance of octaves, music psychologists represent pitch as a helix where azimuth and axial coordinate correspond to pitch class and pitch height respectively. This article addresses the problem of discovering this helical structure from unlabeled audio data. We measure Pearson correlations in the constant-Q transform (CQT) domain to build a K-nearest neighbor graph between freque…

    Submitted 4 February, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: 5 pages, 6 figures. To appear in the Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, May 2020

  11. arXiv:1905.03314  [pdf, ps, other]

    cs.CY astro-ph.IM physics.ed-ph

    Entrofy Your Cohort: A Data Science Approach to Candidate Selection

    Authors: D. Huppenkothen, B. McFee, L. Norén

    Abstract: Selecting a cohort from a set of candidates is a common task within and beyond academia. Admitting students, awarding grants, choosing speakers for a conference are situations where human biases may affect the make-up of the final cohort. We propose a new algorithm, Entrofy, designed to be part of a larger decision making strategy aimed at making cohort selection as just, quantitative, transparent…

    Submitted 8 May, 2019; originally announced May 2019.

    Comments: 22 pages, 4 figures, submitted to PLOS One. The accompanying software is available at https://github.com/dhuppenkothen/entrofy

  12. arXiv:1902.01023  [pdf, other]

    cs.IR

    Enhanced Hierarchical Music Structure Annotations via Feature Level Similarity Fusion

    Authors: Christopher J. Tralie, Brian McFee

    Abstract: We describe a novel pipeline to automatically discover hierarchies of repeated sections in musical audio. The proposed method uses similarity network fusion (SNF) to combine different frame-level features into clean affinity matrices, which are then used as input to spectral clustering. While prior spectral clustering approaches to music structure analysis have pre-processed affinity matrices with…

    Submitted 3 February, 2019; originally announced February 2019.

    Comments: 5 pages, 3 figures, 1 table

    ACM Class: H.5.5

    Journal ref: IEEE International Conference on Acoustics, Speech, and Signal Processing, 2019

  13. arXiv:1809.00381  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Multitask Learning for Fundamental Frequency Estimation in Music

    Authors: Rachel M. Bittner, Brian McFee, Juan P. Bello

    Abstract: Fundamental frequency (f0) estimation from polyphonic music includes the tasks of multiple-f0, melody, vocal, and bass line estimation. Historically these problems have been approached separately, and only recently, using learning-based approaches. We present a multitask deep learning architecture that jointly estimates outputs for various tasks including multiple-f0, melody, vocal and bass line e…

    Submitted 2 September, 2018; originally announced September 2018.

  14. arXiv:1804.10070  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Adaptive pooling operators for weakly labeled sound event detection

    Authors: Brian McFee, Justin Salamon, Juan Pablo Bello

    Abstract: Sound event detection (SED) methods are tasked with labeling segments of audio recordings by the presence of active sound sources. SED is typically posed as a supervised machine learning problem, requiring strong annotations for the presence or absence of each sound source at every time instant within the recording. However, strong annotations of this type are both labor- and cost-intensive for hu…

    Submitted 10 August, 2018; v1 submitted 26 April, 2018; originally announced April 2018.

  15. arXiv:1608.04868  [pdf, other]

    cs.MM cs.AI cs.CL

    Towards Music Captioning: Generating Music Playlist Descriptions

    Authors: Keunwoo Choi, George Fazekas, Brian McFee, Kyunghyun Cho, Mark Sandler

    Abstract: Descriptions are often provided along with recommendations to help users' discovery. Recommending automatically generated music playlists (e.g. personalised playlists) introduces the problem of generating descriptions. In this paper, we propose a method for generating music playlist descriptions, which is called music captioning. In the proposed method, audio content analysis and natural langua…

    Submitted 15 January, 2017; v1 submitted 17 August, 2016; originally announced August 2016.

    Comments: 2 pages, ISMIR 2016 Late-breaking/session extended abstract

  16. Codebook based Audio Feature Representation for Music Information Retrieval

    Authors: Yonatan Vaizman, Brian McFee, Gert Lanckriet

    Abstract: Digital music has become prolific on the web in recent decades. Automated recommendation systems are essential for users to discover music they love and for artists to reach an appropriate audience. When manual annotations and user preference data are lacking (e.g. for new artists) these systems must rely on content-based methods. Besides powerful machine learning tools for classification and r…

    Submitted 19 December, 2013; originally announced December 2013.

    Comments: Journal paper. Submitted to IEEE transactions on Audio, Speech and Language Processing. Submitted on Dec 18th, 2013

  17. arXiv:1105.2344  [pdf, ps, other]

    cs.MM

    Learning content similarity for music recommendation

    Authors: Brian McFee, Luke Barrington, Gert Lanckriet

    Abstract: Many tasks in music information retrieval, such as recommendation, and playlist generation for online radio, fall naturally into the query-by-example setting, wherein a user queries the system by providing a song, and the system responds with a list of relevant or similar song recommendations. Such applications ultimately depend on the notion of similarity between items to produce high-quality res…

    Submitted 11 May, 2011; originally announced May 2011.

  18. arXiv:1008.5163  [pdf, ps, other]

    cs.AI

    Learning Multi-modal Similarity

    Authors: Brian McFee, Gert Lanckriet

    Abstract: In many applications involving multi-media data, the definition of similarity between items is integral to several key tasks, e.g., nearest-neighbor retrieval, classification, and recommendation. Data in such regimes typically exhibits multiple modalities, such as acoustic and visual content of video. Integrating such heterogeneous data to form a holistic similarity space is therefore a key challe…

    Submitted 30 August, 2010; originally announced August 2010.