Showing 1–33 of 33 results for author: Bello, J P

  1. arXiv:2401.12238  [pdf, other]

    eess.AS cs.LG cs.SD

    Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms

    Authors: Iran R. Roman, Christopher Ick, Sivan Ding, Adrian S. Roman, Brian McFee, Juan P. Bello

    Abstract: Sound event localization and detection (SELD) is an important task in machine listening. Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels. SELD data is simulated by convolving spatially-localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape. However, RIRs require manual collection in specific…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: 5 pages, 4 figures, 1 table, to be presented at ICASSP 2024 in Seoul, South Korea

  2. arXiv:2401.08717  [pdf, other]

    cs.SD eess.AS

    Robust DOA estimation using deep acoustic imaging

    Authors: Adrian S. Roman, Iran R. Roman, Juan P. Bello

    Abstract: Direction of arrival estimation (DoAE) aims at tracking a sound in azimuth and elevation. Recent advancements include data-driven models with inputs derived from ambisonics intensity vectors or correlations between channels in a microphone array. A spherical intensity map (SIM), or acoustic image, is an alternative input representation that remains underexplored. SIMs benefit from high-resolution…

    Submitted 15 January, 2024; originally announced January 2024.

  3. arXiv:2309.13343  [pdf, other]

    cs.SD eess.AS

    Two vs. Four-Channel Sound Event Localization and Detection

    Authors: Julia Wilkins, Magdalena Fuentes, Luca Bondi, Shabnam Ghaffarzadegan, Ali Abavisani, Juan Pablo Bello

    Abstract: Sound event localization and detection (SELD) systems estimate both the direction-of-arrival (DOA) and class of sound sources over time. In the DCASE 2022 SELD Challenge (Task 3), models are designed to operate in a 4-channel setting. While beneficial to further the development of SELD systems using a multichannel recording setup such as first-order Ambisonics (FOA), most consumer electronics devi…

    Submitted 23 September, 2023; originally announced September 2023.

  4. arXiv:2309.09288  [pdf, other]

    cs.SD eess.AS

    Sound Source Distance Estimation in Diverse and Dynamic Acoustic Conditions

    Authors: Saksham Singh Kushwaha, Iran R. Roman, Magdalena Fuentes, Juan Pablo Bello

    Abstract: Localizing a moving sound source in the real world involves determining its direction-of-arrival (DOA) and distance relative to a microphone. Advancements in DOA estimation have been facilitated by data-driven methods optimized with large open-source datasets with microphone array recordings in diverse environments. In contrast, estimating a sound source's distance remains understudied. Existing a…

    Submitted 17 September, 2023; originally announced September 2023.

    Comments: Accepted in WASPAA 2023

  5. arXiv:2308.09089  [pdf, other]

    cs.SD cs.CV cs.IR cs.MM eess.AS

    Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries

    Authors: Julia Wilkins, Justin Salamon, Magdalena Fuentes, Juan Pablo Bello, Oriol Nieto

    Abstract: Finding the right sound effects (SFX) to match moments in a video is a difficult and time-consuming task, and relies heavily on the quality and completeness of text metadata. Retrieving high-quality (HQ) SFX using a video frame directly as the query is an attractive alternative, removing the reliance on text metadata and providing a low barrier to entry for non-experts. Due to the lack of HQ audio…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: WASPAA 2023. Project page: https://juliawilkins.github.io/sound-effects-retrieval-from-video/. 4 pages, 2 figures, 2 tables

  6. arXiv:2303.10667  [pdf, other]

    cs.SD eess.AS

    Audio-Text Models Do Not Yet Leverage Natural Language

    Authors: Ho-Hsiang Wu, Oriol Nieto, Juan Pablo Bello, Justin Salamon

    Abstract: Multi-modal contrastive learning techniques in the audio-text domain have quickly become a highly active area of research. Most works are evaluated with standard audio retrieval and classification benchmarks assuming that (i) these models are capable of leveraging the rich information contained in natural language, and (ii) current benchmarks are able to capture the nuances of such information. In…

    Submitted 19 March, 2023; originally announced March 2023.

    Comments: Copyright 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  7. arXiv:2211.08367  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    FlowGrad: Using Motion for Visual Sound Source Localization

    Authors: Rajsuryan Singh, Pablo Zinemanas, Xavier Serra, Juan Pablo Bello, Magdalena Fuentes

    Abstract: Most recent work in visual sound source localization relies on semantic audio-visual representations learned in a self-supervised manner, and by design excludes temporal information present in videos. While it proves to be effective for widely used benchmark datasets, the method falls short for challenging scenarios like urban traffic. This work introduces temporal context into the state-of-the-ar…

    Submitted 14 April, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: Accepted in ICASSP 2023

  8. arXiv:2205.01273  [pdf, other]

    cs.SD eess.AS

    Few-Shot Musical Source Separation

    Authors: Yu Wang, Daniel Stoller, Rachel M. Bittner, Juan Pablo Bello

    Abstract: Deep learning-based approaches to musical source separation are often limited to the instrument classes that the models are trained on and do not generalize to separate unseen instruments. To address this, we propose a few-shot musical source separation paradigm. We condition a generic U-Net source separation model using a few audio examples of the target instrument. We train a few-shot conditioning…

    Submitted 2 May, 2022; originally announced May 2022.

    Comments: ICASSP 2022

  9. arXiv:2204.05156  [pdf, other]

    cs.SD eess.AS

    How to Listen? Rethinking Visual Sound Localization

    Authors: Ho-Hsiang Wu, Magdalena Fuentes, Prem Seetharaman, Juan Pablo Bello

    Abstract: Localizing visual sounds consists of locating the position of objects that emit sound within an image. It is a growing research area with potential applications in monitoring natural and urban environments, such as wildlife migration and urban traffic. Previous works are usually evaluated with datasets having mostly a single dominant visible object, and proposed models usually require the introduc…

    Submitted 11 April, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022

  10. arXiv:2203.10425  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    A Study on Robustness to Perturbations for Representations of Environmental Sound

    Authors: Sangeeta Srivastava, Ho-Hsiang Wu, Joao Rulff, Magdalena Fuentes, Mark Cartwright, Claudio Silva, Anish Arora, Juan Pablo Bello

    Abstract: Audio applications involving environmental sound analysis increasingly use general-purpose audio representations, also known as embeddings, for transfer learning. Recently, Holistic Evaluation of Audio Representations (HEAR) evaluated twenty-nine embedding models on nineteen diverse tasks. However, the evaluation's effectiveness depends on the variation already captured within a given dataset. The…

    Submitted 6 July, 2022; v1 submitted 19 March, 2022; originally announced March 2022.

    Comments: Accepted in EUSIPCO 2022

  11. arXiv:2203.06220  [pdf, other]

    cs.SD cs.NI eess.AS

    Infrastructure-free, Deep Learned Urban Noise Monitoring at $\sim$100mW

    Authors: Jihoon Yun, Sangeeta Srivastava, Dhrubojyoti Roy, Nathan Stohs, Charlie Mydlarz, Mahin Salman, Bea Steers, Juan Pablo Bello, Anish Arora

    Abstract: The Sounds of New York City (SONYC) wireless sensor network (WSN) has been fielded in Manhattan and Brooklyn over the past five years, as part of a larger human-in-the-loop cyber-physical control system for monitoring, analyzing, and mitigating urban noise pollution. We describe the evolution of the 2-tier SONYC WSN from an acoustic data collection fabric into a 3-tier in situ noise complaint moni…

    Submitted 11 March, 2022; originally announced March 2022.

    Comments: Accepted in ICCPS 2022

  12. arXiv:2110.11499  [pdf, other]

    cs.SD cs.LG eess.AS

    Wav2CLIP: Learning Robust Audio Representations From CLIP

    Authors: Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, Juan Pablo Bello

    Abstract: We propose Wav2CLIP, a robust audio representation learning method by distilling from Contrastive Language-Image Pre-training (CLIP). We systematically evaluate Wav2CLIP on a variety of audio tasks including classification, retrieval, and generation, and show that Wav2CLIP can outperform several publicly available pre-trained audio representation algorithms. Wav2CLIP projects audio into a shared e…

    Submitted 15 February, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

    Comments: Copyright 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  13. arXiv:2110.09600  [pdf, other]

    cs.SD eess.AS

    Who calls the shots? Rethinking Few-Shot Learning for Audio

    Authors: Yu Wang, Nicholas J. Bryan, Justin Salamon, Mark Cartwright, Juan Pablo Bello

    Abstract: Few-shot learning aims to train models that can recognize novel classes given just a handful of labeled examples, known as the support set. While the field has seen notable advances in recent years, they have often focused on multi-class image classification. Audio, in contrast, is often multi-label due to overlapping sounds, resulting in unique properties such as polyphony and signal-to-noise rat…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: WASPAA 2021

  14. arXiv:2109.12690  [pdf, ps, other]

    cs.SD cs.DB cs.LG eess.AS

    Soundata: A Python library for reproducible use of audio datasets

    Authors: Magdalena Fuentes, Justin Salamon, Pablo Zinemanas, Martín Rocamora, Genís Paja, Irán R. Román, Marius Miron, Xavier Serra, Juan Pablo Bello

    Abstract: Soundata is a Python library for loading and working with audio datasets in a standardized way, removing the need for writing custom loaders in every project, and improving reproducibility by providing tools to validate data against a canonical version. It speeds up research pipelines by allowing users to quickly download a dataset, load it into memory in a standardized and reproducible way, valid…

    Submitted 4 October, 2021; v1 submitted 26 September, 2021; originally announced September 2021.

  15. arXiv:2106.01149  [pdf, other]

    cs.SD cs.IR eess.AS

    Exploring modality-agnostic representations for music classification

    Authors: Ho-Hsiang Wu, Magdalena Fuentes, Juan P. Bello

    Abstract: Music information is often conveyed or recorded across multiple data modalities including but not limited to audio, images, text and scores. However, music information retrieval research has almost exclusively focused on single modality recognition, requiring development of separate models for each modality. Some multi-modal works require multiple coexisting modalities given to the model as inputs…

    Submitted 2 June, 2021; originally announced June 2021.

  16. arXiv:2105.02911  [pdf, other]

    eess.AS cs.SD

    Weakly Supervised Source-Specific Sound Level Estimation in Noisy Soundscapes

    Authors: Aurora Cramer, Mark Cartwright, Fatemeh Pishdadian, Juan Pablo Bello

    Abstract: While the estimation of what sound sources are, when they occur, and from where they originate has been well-studied, the estimation of how loud these sound sources are has been often overlooked. Current solutions to this task, which we refer to as source-specific sound level estimation (SSSLE), suffer from challenges due to the impracticality of acquiring realistic data and a lack of robustness t…

    Submitted 29 July, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

    Comments: 5 pages, 3 figures, WASPAA 2021 preprint

  17. arXiv:2102.03229  [pdf, other]

    cs.SD cs.LG eess.AS

    Multi-Task Self-Supervised Pre-Training for Music Classification

    Authors: Ho-Hsiang Wu, Chieh-Chi Kao, Qingming Tang, Ming Sun, Brian McFee, Juan Pablo Bello, Chao Wang

    Abstract: Deep learning is very data hungry, and supervised learning especially requires massive labeled data to work well. Machine listening research often suffers from the limited labeled data problem, as human annotations are costly to acquire, and annotations for audio are time-consuming and less intuitive. Besides, models learned from a labeled dataset often embed biases specific to that particular dataset.…

    Submitted 5 February, 2021; originally announced February 2021.

    Comments: Copyright 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  18. arXiv:2009.05188  [pdf, other]

    cs.SD cs.LG eess.AS

    SONYC-UST-V2: An Urban Sound Tagging Dataset with Spatiotemporal Context

    Authors: Mark Cartwright, Jason Cramer, Ana Elisa Mendez Mendez, Yu Wang, Ho-Hsiang Wu, Vincent Lostanlen, Magdalena Fuentes, Graham Dove, Charlie Mydlarz, Justin Salamon, Oded Nov, Juan Pablo Bello

    Abstract: We present SONYC-UST-V2, a dataset for urban sound tagging with spatiotemporal information. This dataset is aimed at the development and evaluation of machine listening systems for real-world urban noise monitoring. While datasets of urban recordings are available, this dataset provides the opportunity to investigate how spatiotemporal metadata can aid in the prediction of urban sound tags. SONYC…

    Submitted 10 September, 2020; originally announced September 2020.

  19. arXiv:2008.02791  [pdf, other]

    cs.SD eess.AS

    Few-Shot Drum Transcription in Polyphonic Music

    Authors: Yu Wang, Justin Salamon, Mark Cartwright, Nicholas J. Bryan, Juan Pablo Bello

    Abstract: Data-driven approaches to automatic drum transcription (ADT) are often limited to a predefined, small vocabulary of percussion instrument classes. Such models cannot recognize out-of-vocabulary classes nor are they able to adapt to finer-grained vocabularies. In this work, we address open vocabulary ADT by introducing few-shot learning to the task. We train a Prototypical Network on a synthetic da…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: ISMIR 2020 camera-ready

  20. arXiv:2003.01037  [pdf, other]

    cs.SD cs.LG eess.AS

    One or Two Components? The Scattering Transform Answers

    Authors: Vincent Lostanlen, Alice Cohen-Hadria, Juan Pablo Bello

    Abstract: With the aim of constructing a biologically plausible model of machine listening, we study the representation of a multicomponent stationary signal by a wavelet scattering network. First, we show that renormalizing second-order nodes by their first-order parents gives a simple numerical criterion to assess whether two neighboring components will interfere psychoacoustically. Secondly, we run a man…

    Submitted 25 June, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: 5 pages, 4 figures, in English. Proceedings of the European Signal Processing Conference (EUSIPCO 2020)

  21. arXiv:1911.00417  [pdf, other]

    cs.SD cs.LG eess.AS

    Long-distance Detection of Bioacoustic Events with Per-channel Energy Normalization

    Authors: Vincent Lostanlen, Kaitlin Palmer, Elly Knight, Christopher Clark, Holger Klinck, Andrew Farnsworth, Tina Wong, Jason Cramer, Juan Pablo Bello

    Abstract: This paper proposes to perform unsupervised detection of bioacoustic events by pooling the magnitudes of spectrogram frames after per-channel energy normalization (PCEN). Although PCEN was originally developed for speech recognition, it also has beneficial effects in enhancing animal vocalizations, despite the presence of atmospheric absorption and intermittent noise. We prove that PCEN generalize…

    Submitted 1 November, 2019; originally announced November 2019.

    Comments: 5 pages, 3 figures. Presented at the 3rd International Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE). 25--26 October 2019, New York, NY, USA

  22. arXiv:1910.10246  [pdf, other]

    cs.SD cs.LG eess.AS

    Learning the helix topology of musical pitch

    Authors: Vincent Lostanlen, Sripathi Sridhar, Brian McFee, Andrew Farnsworth, Juan Pablo Bello

    Abstract: To explain the consonance of octaves, music psychologists represent pitch as a helix where azimuth and axial coordinate correspond to pitch class and pitch height respectively. This article addresses the problem of discovering this helical structure from unlabeled audio data. We measure Pearson correlations in the constant-Q transform (CQT) domain to build a K-nearest neighbor graph between freque…

    Submitted 4 February, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: 5 pages, 6 figures. To appear in the Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, May 2020

  23. arXiv:1906.08512  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Adversarial Learning for Improved Onsets and Frames Music Transcription

    Authors: Jong Wook Kim, Juan Pablo Bello

    Abstract: Automatic music transcription is considered to be one of the hardest problems in music information retrieval, yet recent deep learning approaches have achieved substantial improvements on transcription performance. These approaches commonly employ supervised learning models that predict various time-frequency representations, by minimizing element-wise losses such as the cross entropy function. Ho…

    Submitted 20 June, 2019; originally announced June 2019.

  24. arXiv:1905.08352  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Robust sound event detection in bioacoustic sensor networks

    Authors: Vincent Lostanlen, Justin Salamon, Andrew Farnsworth, Steve Kelling, Juan Pablo Bello

    Abstract: Bioacoustic sensors, sometimes known as autonomous recording units (ARUs), can record sounds of wildlife over long periods of time in scalable and minimally invasive ways. Deriving per-species abundance estimates from these sensors requires detection, classification, and quantification of animal vocalizations as individual acoustic events. Yet, variability in ambient noise, both over time and acro…

    Submitted 29 October, 2019; v1 submitted 20 May, 2019; originally announced May 2019.

    Comments: 32 pages, in English. Submitted to PLOS ONE journal in February 2019; revised August 2019; published October 2019

  25. arXiv:1903.03195  [pdf, other]

    cs.SD eess.AS

    The life of a New York City noise sensor network

    Authors: Charlie Mydlarz, Mohit Sharma, Yitzchak Lockerman, Ben Steers, Claudio Silva, Juan Pablo Bello

    Abstract: Noise pollution is one of the topmost quality of life issues for urban residents in the United States. Continued exposure to high levels of noise has proven effects on health, including acute effects such as sleep disruption, and long-term effects such as hypertension, heart disease, and hearing loss. To investigate and ultimately aid in the mitigation of urban noise, a network of 55 sensor nodes…

    Submitted 26 March, 2019; v1 submitted 7 March, 2019; originally announced March 2019.

    Comments: This article belongs to the Section Intelligent Sensors, 24 pages, 15 figures, 3 tables, 45 references

    Journal ref: Sensors 2019, 19, 1415

  26. arXiv:1811.00223  [pdf, other]

    cs.SD eess.AS stat.ML

    Neural Music Synthesis for Flexible Timbre Control

    Authors: Jong Wook Kim, Rachel Bittner, Aparna Kumar, Juan Pablo Bello

    Abstract: The recent success of raw audio waveform synthesis models like WaveNet motivates a new approach for music synthesis, in which the entire process --- creating audio samples from a score and instrument information --- is modeled using generative neural networks. This paper describes a neural music synthesis model with flexible timbre controls, which consists of a recurrent neural network conditioned…

    Submitted 1 November, 2018; originally announced November 2018.

  27. arXiv:1809.00381  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Multitask Learning for Fundamental Frequency Estimation in Music

    Authors: Rachel M. Bittner, Brian McFee, Juan P. Bello

    Abstract: Fundamental frequency (f0) estimation from polyphonic music includes the tasks of multiple-f0, melody, vocal, and bass line estimation. Historically these problems have been approached separately, and only recently, using learning-based approaches. We present a multitask deep learning architecture that jointly estimates outputs for various tasks including multiple-f0, melody, vocal and bass line e…

    Submitted 2 September, 2018; originally announced September 2018.

  28. arXiv:1805.00889  [pdf, other]

    cs.SD cs.CY cs.HC eess.AS

    SONYC: A System for the Monitoring, Analysis and Mitigation of Urban Noise Pollution

    Authors: Juan Pablo Bello, Claudio Silva, Oded Nov, R. Luke DuBois, Anish Arora, Justin Salamon, Charles Mydlarz, Harish Doraiswamy

    Abstract: We present the Sounds of New York City (SONYC) project, a smart cities initiative focused on developing a cyber-physical system for the monitoring, analysis and mitigation of urban noise pollution. Noise pollution is one of the topmost quality of life issues for urban residents in the U.S. with proven effects on health, education, the economy, and the environment. Yet, most cities lack the resourc…

    Submitted 18 May, 2018; v1 submitted 2 May, 2018; originally announced May 2018.

    Comments: Accepted May 2018, Communications of the ACM. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record will be published in Communications of the ACM

  29. arXiv:1804.10070  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Adaptive pooling operators for weakly labeled sound event detection

    Authors: Brian McFee, Justin Salamon, Juan Pablo Bello

    Abstract: Sound event detection (SED) methods are tasked with labeling segments of audio recordings by the presence of active sound sources. SED is typically posed as a supervised machine learning problem, requiring strong annotations for the presence or absence of each sound source at every time instant within the recording. However, strong annotations of this type are both labor- and cost-intensive for hu…

    Submitted 10 August, 2018; v1 submitted 26 April, 2018; originally announced April 2018.

  30. arXiv:1802.06182  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    CREPE: A Convolutional Representation for Pitch Estimation

    Authors: Jong Wook Kim, Justin Salamon, Peter Li, Juan Pablo Bello

    Abstract: The task of estimating the fundamental frequency of a monophonic sound recording, also known as pitch tracking, is fundamental to audio processing with multiple applications in speech processing and music information retrieval. To date, the best performing techniques, such as the pYIN algorithm, are based on a combination of DSP pipelines and heuristics. While such techniques perform very well on…

    Submitted 16 February, 2018; originally announced February 2018.

    Comments: ICASSP 2018

  31. arXiv:1608.04363  [pdf, other]

    cs.SD cs.CV cs.LG cs.NE

    Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification

    Authors: Justin Salamon, Juan Pablo Bello

    Abstract: The ability of deep convolutional neural networks (CNN) to learn discriminative spectro-temporal patterns makes them well suited to environmental sound classification. However, the relative scarcity of labeled data has impeded the exploitation of this family of high-capacity models. This study has two primary contributions: first, we propose a deep convolutional neural network architecture for env…

    Submitted 28 November, 2016; v1 submitted 15 August, 2016; originally announced August 2016.

    Comments: Accepted November 2016, IEEE Signal Processing Letters. Copyright IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material, creating new collective works, for resale or redistribution, or reuse of any copyrighted component of this work in other works

  32. arXiv:1605.08450  [pdf, other]

    cs.SD

    The Implementation of Low-cost Urban Acoustic Monitoring Devices

    Authors: Charlie Mydlarz, Justin Salamon, Juan Pablo Bello

    Abstract: The urban sound environment of New York City (NYC) can be, amongst other things: loud, intrusive, exciting and dynamic. As indicated by the large majority of noise complaints registered with the NYC 311 information/complaints line, the urban sound environment has a profound effect on the quality of life of the city's inhabitants. To monitor and ultimately understand these sonic environments, a pro…

    Submitted 26 May, 2016; originally announced May 2016.

    Comments: Accepted into the Journal of Applied Acoustics special issue: Acoustics of Smart Cities. 26 pages, 12 figures

    ACM Class: H.5.5; C.0; C.3; C.4

  33. arXiv:1605.08396  [pdf, other]

    cs.SD cs.NE

    Robust Downbeat Tracking Using an Ensemble of Convolutional Networks

    Authors: S. Durand, J. P. Bello, B. David, G. Richard

    Abstract: In this paper, we present a novel state-of-the-art system for automatic downbeat tracking from music signals. The audio signal is first segmented into frames which are synchronized at the tatum level of the music. We then extract different kinds of features based on harmony, melody, rhythm and bass content to feed convolutional neural networks that are adapted to take advantage of each feature charac…

    Submitted 26 May, 2016; originally announced May 2016.