
Showing 1–50 of 54 results for author: Bello, J

  1. arXiv:2401.12238  [pdf, other]

    eess.AS cs.LG cs.SD

    Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms

    Authors: Iran R. Roman, Christopher Ick, Sivan Ding, Adrian S. Roman, Brian McFee, Juan P. Bello

    Abstract: Sound event localization and detection (SELD) is an important task in machine listening. Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels. SELD data is simulated by convolving spatially localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape. However, RIRs require manual collection in specific…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: 5 pages, 4 figures, 1 table, to be presented at ICASSP 2024 in Seoul, South Korea
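    The simulation step this abstract describes (convolving a spatially localized RIR with a sound waveform, then placing the result in a soundscape) can be sketched with NumPy as follows. This is a minimal illustration of the generic technique under stated assumptions, not the Spatial Scaper API: the function names `spatialize` and `place_event`, the (channels, taps) RIR layout, and the sample-index onset are all hypothetical.

```python
import numpy as np


def spatialize(event: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve a mono event (samples,) with each channel of a multichannel RIR (channels, taps).

    Returns an array of shape (channels, samples + taps - 1).
    """
    # Full linear convolution per microphone channel.
    return np.stack([np.convolve(event, h) for h in rir])


def place_event(soundscape: np.ndarray, event: np.ndarray, rir: np.ndarray, onset: int) -> np.ndarray:
    """Return a copy of a (channels, samples) soundscape with the spatialized event added at `onset`.

    The tail of the event is cut off if it runs past the end of the soundscape.
    """
    wet = spatialize(event, rir)
    out = soundscape.copy()
    end = min(out.shape[1], onset + wet.shape[1])
    out[:, onset:end] += wet[:, : end - onset]
    return out
```

    Moving sources would require changing the RIR over time, which this sketch does not attempt.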

  2. arXiv:2401.08717  [pdf, other]

    cs.SD eess.AS

    Robust DOA estimation using deep acoustic imaging

    Authors: Adrian S. Roman, Iran R. Roman, Juan P. Bello

    Abstract: Direction of arrival estimation (DoAE) aims at tracking a sound in azimuth and elevation. Recent advancements include data-driven models with inputs derived from ambisonics intensity vectors or correlations between channels in a microphone array. A spherical intensity map (SIM), or acoustic image, is an alternative input representation that remains underexplored. SIMs benefit from high-resolution…

    Submitted 15 January, 2024; originally announced January 2024.
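    One input representation the abstract mentions is the ambisonics intensity vector. A minimal sketch of how such a vector yields a direction estimate, assuming an ideal first-order plane wave in (W, X, Y, Z) channel order with no normalization and no reverberation; the helper names are hypothetical and this is not the paper's acoustic-imaging method. Practical systems usually compute the intensity per time-frequency bin as Re{conj(W) * [X, Y, Z]} on STFT coefficients rather than averaging the whole time-domain signal.

```python
import numpy as np


def encode_foa(signal: np.ndarray, azimuth: float, elevation: float) -> np.ndarray:
    """Encode a mono signal as an ideal first-order plane wave, channels ordered (W, X, Y, Z)."""
    w = signal
    x = signal * np.cos(azimuth) * np.cos(elevation)
    y = signal * np.sin(azimuth) * np.cos(elevation)
    z = signal * np.sin(elevation)
    return np.stack([w, x, y, z])


def doa_from_intensity(foa: np.ndarray) -> tuple[float, float]:
    """Estimate (azimuth, elevation) in radians from the time-averaged intensity vector W * [X, Y, Z]."""
    w, x, y, z = foa
    ix, iy, iz = np.mean(w * x), np.mean(w * y), np.mean(w * z)
    azimuth = np.arctan2(iy, ix)
    elevation = np.arctan2(iz, np.hypot(ix, iy))
    return float(azimuth), float(elevation)
```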

  3. arXiv:2312.10118  [pdf, other]

    cs.CV

    From-Ground-To-Objects: Coarse-to-Fine Self-supervised Monocular Depth Estimation of Dynamic Objects with Ground Contact Prior

    Authors: Jaeho Moon, Juan Luis Gonzalez Bello, Byeongjun Kwon, Munchurl Kim

    Abstract: Self-supervised monocular depth estimation (DE) is an approach to learning depth without costly depth ground truths. However, it often struggles with moving objects that violate the static scene assumption during training. To address this issue, we introduce a coarse-to-fine training strategy leveraging the ground contact prior based on the observation that most moving objects in outdoor scenes…

    Submitted 15 December, 2023; originally announced December 2023.

  4. arXiv:2312.08136  [pdf, other]

    cs.CV eess.IV

    ProNeRF: Learning Efficient Projection-Aware Ray Sampling for Fine-Grained Implicit Neural Radiance Fields

    Authors: Juan Luis Gonzalez Bello, Minh-Quan Viet Bui, Munchurl Kim

    Abstract: Recent advances in neural rendering have shown that, albeit slow, implicit compact models can learn a scene's geometries and view-dependent appearances from multiple views. To maintain such a small memory footprint but achieve faster inference times, recent works have adopted 'sampler' networks that adaptively sample a small subset of points along each ray in the implicit neural radiance fields. A…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: Visit our project website at https://kaist-viclab.github.io/pronerf-site/

  5. arXiv:2312.08071  [pdf, other]

    cs.CV eess.IV

    Novel View Synthesis with View-Dependent Effects from a Single Image

    Authors: Juan Luis Gonzalez Bello, Munchurl Kim

    Abstract: In this paper, we are the first to consider view-dependent effects in single-image-based novel view synthesis (NVS). For this, we propose to exploit the camera motion priors in NVS to model view-dependent appearance or effects (VDE) as the negative disparity in the scene. By recognizing that specularities "follow" the camera motion, we infuse VDEs into the input images by aggregating input pixel colo…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: Visit our website https://kaist-viclab.github.io/monovde-site

  6. arXiv:2309.13343  [pdf, other]

    cs.SD eess.AS

    Two vs. Four-Channel Sound Event Localization and Detection

    Authors: Julia Wilkins, Magdalena Fuentes, Luca Bondi, Shabnam Ghaffarzadegan, Ali Abavisani, Juan Pablo Bello

    Abstract: Sound event localization and detection (SELD) systems estimate both the direction-of-arrival (DOA) and class of sound sources over time. In the DCASE 2022 SELD Challenge (Task 3), models are designed to operate in a 4-channel setting. While beneficial to further the development of SELD systems using a multichannel recording setup such as first-order Ambisonics (FOA), most consumer electronics devi…

    Submitted 23 September, 2023; originally announced September 2023.

  7. arXiv:2309.09288  [pdf, other]

    cs.SD eess.AS

    Sound Source Distance Estimation in Diverse and Dynamic Acoustic Conditions

    Authors: Saksham Singh Kushwaha, Iran R. Roman, Magdalena Fuentes, Juan Pablo Bello

    Abstract: Localizing a moving sound source in the real world involves determining its direction-of-arrival (DOA) and distance relative to a microphone. Advancements in DOA estimation have been facilitated by data-driven methods optimized with large open-source datasets with microphone array recordings in diverse environments. In contrast, estimating a sound source's distance remains understudied. Existing a…

    Submitted 17 September, 2023; originally announced September 2023.

    Comments: Accepted in WASPAA 2023

  8. arXiv:2308.09089  [pdf, other]

    cs.SD cs.CV cs.IR cs.MM eess.AS

    Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries

    Authors: Julia Wilkins, Justin Salamon, Magdalena Fuentes, Juan Pablo Bello, Oriol Nieto

    Abstract: Finding the right sound effects (SFX) to match moments in a video is a difficult and time-consuming task, and relies heavily on the quality and completeness of text metadata. Retrieving high-quality (HQ) SFX using a video frame directly as the query is an attractive alternative, removing the reliance on text metadata and providing a low barrier to entry for non-experts. Due to the lack of HQ audio…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: WASPAA 2023. Project page: https://juliawilkins.github.io/sound-effects-retrieval-from-video/. 4 pages, 2 figures, 2 tables

  9. arXiv:2308.06246  [pdf, other]

    cs.HC

    ARGUS: Visualization of AI-Assisted Task Guidance in AR

    Authors: Sonia Castelo, Joao Rulff, Erin McGowan, Bea Steers, Guande Wu, Shaoyu Chen, Iran Roman, Roque Lopez, Ethan Brewer, Chen Zhao, **g Qian, Kyunghyun Cho, He He, Qi Sun, Huy Vo, Juan Bello, Michael Krone, Claudio Silva

    Abstract: The concept of augmented reality (AR) assistants has captured the human imagination for decades, becoming a staple of modern science fiction. To pursue this goal, it is necessary to develop artificial intelligence (AI)-based methods that simultaneously perceive the 3D environment, reason about physical tasks, and model the performer, all in real-time. Within this framework, a wide variety of senso…

    Submitted 11 August, 2023; originally announced August 2023.

    Comments: 11 pages, 8 figures. This is the author's version of the article that has been accepted for publication in IEEE Transactions on Visualization and Computer Graphics

  10. arXiv:2303.10667  [pdf, other]

    cs.SD eess.AS

    Audio-Text Models Do Not Yet Leverage Natural Language

    Authors: Ho-Hsiang Wu, Oriol Nieto, Juan Pablo Bello, Justin Salamon

    Abstract: Multi-modal contrastive learning techniques in the audio-text domain have quickly become a highly active area of research. Most works are evaluated with standard audio retrieval and classification benchmarks assuming that (i) these models are capable of leveraging the rich information contained in natural language, and (ii) current benchmarks are able to capture the nuances of such information. In…

    Submitted 19 March, 2023; originally announced March 2023.

    Comments: Copyright 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  11. arXiv:2301.03085  [pdf, other]

    stat.ME math.ST

    Granger causality test for heteroskedastic and structural-break time series using generalized least squares

    Authors: Hugo J. Bello

    Abstract: This paper proposes a novel method (GLS Granger test) to determine causal relationships between time series based on the estimation of the autocovariance matrix and generalized least squares. We show the effectiveness of the proposed autocovariance matrix estimator (the sliding autocovariance matrix) and we compare the proposed method with the classical Granger F-test via a synthetic dataset and…

    Submitted 8 January, 2023; originally announced January 2023.
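    For reference, the classical Granger F-test that the abstract uses as a comparison point contrasts a restricted autoregression of y on its own p lags with an unrestricted one that also includes p lags of x. The sketch below assumes ordinary least squares with an intercept and homoskedastic errors, which is the assumption the GLS variant is meant to relax; `granger_f_test`, `lagged`, and `residual_ss` are illustrative names, not code from the paper.

```python
import numpy as np
from scipy import stats


def lagged(series: np.ndarray, p: int) -> np.ndarray:
    """Columns 1..p hold `series` lagged by 1..p, aligned with the targets series[p:]."""
    T = len(series)
    return np.column_stack([series[p - k - 1 : T - k - 1] for k in range(p)])


def residual_ss(X: np.ndarray, target: np.ndarray) -> float:
    """Residual sum of squares of an OLS fit of `target` on the columns of X."""
    coef = np.linalg.lstsq(X, target, rcond=None)[0]
    return float(np.sum((target - X @ coef) ** 2))


def granger_f_test(x, y, p: int):
    """Classical Granger F-test of H0: lags of x add no predictive power for y.

    Returns (F statistic, p-value) with (p, n - 2p - 1) degrees of freedom.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[p:]
    n = len(target)
    X_restricted = np.column_stack([np.ones(n), lagged(y, p)])  # intercept + own lags of y
    X_full = np.column_stack([X_restricted, lagged(x, p)])      # plus lags of x
    rss_r = residual_ss(X_restricted, target)
    rss_u = residual_ss(X_full, target)
    df_num, df_den = p, n - X_full.shape[1]
    f_stat = ((rss_r - rss_u) / df_num) / (rss_u / df_den)
    return f_stat, float(stats.f.sf(f_stat, df_num, df_den))
```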

  12. arXiv:2211.08367  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    FlowGrad: Using Motion for Visual Sound Source Localization

    Authors: Rajsuryan Singh, Pablo Zinemanas, Xavier Serra, Juan Pablo Bello, Magdalena Fuentes

    Abstract: Most recent work in visual sound source localization relies on semantic audio-visual representations learned in a self-supervised manner, and by design excludes temporal information present in videos. While it proves to be effective for widely used benchmark datasets, the method falls short for challenging scenarios like urban traffic. This work introduces temporal context into the state-of-the-ar…

    Submitted 14 April, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: Accepted in ICASSP 2023

  13. arXiv:2205.13064  [pdf, other]

    cs.CY cs.HC cs.LG cs.SD eess.AS

    Urban Rhapsody: Large-scale exploration of urban soundscapes

    Authors: Joao Rulff, Fabio Miranda, Maryam Hosseini, Marcos Lage, Mark Cartwright, Graham Dove, Juan Bello, Claudio T. Silva

    Abstract: Noise is one of the primary quality-of-life issues in urban environments. In addition to annoyance, noise negatively impacts public health and educational performance. While low-cost sensors can be deployed to monitor ambient noise levels at high temporal resolutions, the amount of data they produce and the complexity of these data pose significant analytical challenges. One way to address these c…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: Accepted at EuroVis 2022. Source code available at: https://github.com/VIDA-NYU/Urban-Rhapsody

  14. arXiv:2205.08851  [pdf, other]

    cs.CV eess.IV

    Positional Information is All You Need: A Novel Pipeline for Self-Supervised SVDE from Videos

    Authors: Juan Luis Gonzalez Bello, Jaeho Moon, Munchurl Kim

    Abstract: Recently, much attention has been drawn to learning the underlying 3D structures of a scene from monocular videos in a fully self-supervised fashion. One of the most challenging aspects of this task is handling the independently moving objects as they break the rigid-scene assumption. For the first time, we show that pixel positional information can be exploited to learn SVDE (Single View Depth Es…

    Submitted 18 May, 2022; originally announced May 2022.

  15. arXiv:2205.01273  [pdf, other]

    cs.SD eess.AS

    Few-Shot Musical Source Separation

    Authors: Yu Wang, Daniel Stoller, Rachel M. Bittner, Juan Pablo Bello

    Abstract: Deep learning-based approaches to musical source separation are often limited to the instrument classes that the models are trained on and do not generalize to separate unseen instruments. To address this, we propose a few-shot musical source separation paradigm. We condition a generic U-Net source separation model using a few audio examples of the target instrument. We train a few-shot conditioning…

    Submitted 2 May, 2022; originally announced May 2022.

    Comments: ICASSP 2022

  16. arXiv:2204.09776  [pdf]

    cond-mat.mes-hall

    Ferrimagnet GdFeCo characterization for spin-orbitronics: large field-like and damping-like torques

    Authors: Héloïse Damas, Alberto Anadon, David Céspedes-Berrocal, Junior Alegre-Saenz, Jean-Loïs Bello, Aldo Arriola-Córdova, Sylvie Migot, Jaafar Ghanbaja, Olivier Copie, Michel Hehn, Vincent Cros, Sébastien Petit-Watelot, Juan-Carlos Rojas-Sánchez

    Abstract: Spintronics is showing promising results in the search for new materials and effects to reduce energy consumption in information technology. Among these materials, ferrimagnets are of special interest, since they can produce large spin currents that trigger the magnetization dynamics of adjacent layers or even their own magnetization. Here, we present a study of the generation of spin current by G…

    Submitted 20 April, 2022; originally announced April 2022.

    Comments: 20 pages, 4 figures

    Journal ref: Physica Status Solidi - Rapid Research Letters 2022

  17. arXiv:2204.05156  [pdf, other]

    cs.SD eess.AS

    How to Listen? Rethinking Visual Sound Localization

    Authors: Ho-Hsiang Wu, Magdalena Fuentes, Prem Seetharaman, Juan Pablo Bello

    Abstract: Localizing visual sounds consists of locating the position of objects that emit sound within an image. It is a growing research area with potential applications in monitoring natural and urban environments, such as wildlife migration and urban traffic. Previous works are usually evaluated with datasets having mostly a single dominant visible object, and proposed models usually require the introduc…

    Submitted 11 April, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022

  18. arXiv:2203.10425  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    A Study on Robustness to Perturbations for Representations of Environmental Sound

    Authors: Sangeeta Srivastava, Ho-Hsiang Wu, Joao Rulff, Magdalena Fuentes, Mark Cartwright, Claudio Silva, Anish Arora, Juan Pablo Bello

    Abstract: Audio applications involving environmental sound analysis increasingly use general-purpose audio representations, also known as embeddings, for transfer learning. Recently, Holistic Evaluation of Audio Representations (HEAR) evaluated twenty-nine embedding models on nineteen diverse tasks. However, the evaluation's effectiveness depends on the variation already captured within a given dataset. The…

    Submitted 6 July, 2022; v1 submitted 19 March, 2022; originally announced March 2022.

    Comments: Accepted in EUSIPCO 2022

  19. arXiv:2203.06220  [pdf, other]

    cs.SD cs.NI eess.AS

    Infrastructure-free, Deep Learned Urban Noise Monitoring at $\sim$100mW

    Authors: Jihoon Yun, Sangeeta Srivastava, Dhrubojyoti Roy, Nathan Stohs, Charlie Mydlarz, Mahin Salman, Bea Steers, Juan Pablo Bello, Anish Arora

    Abstract: The Sounds of New York City (SONYC) wireless sensor network (WSN) has been fielded in Manhattan and Brooklyn over the past five years, as part of a larger human-in-the-loop cyber-physical control system for monitoring, analyzing, and mitigating urban noise pollution. We describe the evolution of the 2-tier SONYC WSN from an acoustic data collection fabric into a 3-tier in situ noise complaint moni…

    Submitted 11 March, 2022; originally announced March 2022.

    Comments: Accepted in ICCPS 2022

  20. arXiv:2110.11499  [pdf, other]

    cs.SD cs.LG eess.AS

    Wav2CLIP: Learning Robust Audio Representations From CLIP

    Authors: Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, Juan Pablo Bello

    Abstract: We propose Wav2CLIP, a robust audio representation learning method by distilling from Contrastive Language-Image Pre-training (CLIP). We systematically evaluate Wav2CLIP on a variety of audio tasks including classification, retrieval, and generation, and show that Wav2CLIP can outperform several publicly available pre-trained audio representation algorithms. Wav2CLIP projects audio into a shared e…

    Submitted 15 February, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

    Comments: Copyright 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  21. arXiv:2110.09600  [pdf, other]

    cs.SD eess.AS

    Who calls the shots? Rethinking Few-Shot Learning for Audio

    Authors: Yu Wang, Nicholas J. Bryan, Justin Salamon, Mark Cartwright, Juan Pablo Bello

    Abstract: Few-shot learning aims to train models that can recognize novel classes given just a handful of labeled examples, known as the support set. While the field has seen notable advances in recent years, these advances have often focused on multi-class image classification. Audio, in contrast, is often multi-label due to overlapping sounds, resulting in unique properties such as polyphony and signal-to-noise rat…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: WASPAA 2021

  22. arXiv:2109.12690  [pdf, ps, other]

    cs.SD cs.DB cs.LG eess.AS

    Soundata: A Python library for reproducible use of audio datasets

    Authors: Magdalena Fuentes, Justin Salamon, Pablo Zinemanas, Martín Rocamora, Genís Paja, Irán R. Román, Marius Miron, Xavier Serra, Juan Pablo Bello

    Abstract: Soundata is a Python library for loading and working with audio datasets in a standardized way, removing the need for writing custom loaders in every project, and improving reproducibility by providing tools to validate data against a canonical version. It speeds up research pipelines by allowing users to quickly download a dataset, load it into memory in a standardized and reproducible way, valid…

    Submitted 4 October, 2021; v1 submitted 26 September, 2021; originally announced September 2021.

  23. arXiv:2108.06404  [pdf, other]

    math.PR math.CO

    Cyclic Cellular Automata and Greenberg-Hastings Models on Regular Trees

    Authors: Jason Bello, David Sivakoff

    Abstract: We study the cyclic cellular automaton (CCA) and the Greenberg-Hastings model (GHM) with $κ\ge 3$ colors and contact threshold $θ\ge 2$ on the infinite $(d+1)$-regular tree, $T_d$. When the initial state has the uniform product distribution, we show that these dynamical systems exhibit at least two distinct phases. For sufficiently large $d$, we show that if $κ(θ-1) \le d - O(\sqrt{dκ\ln(d)})$, th…

    Submitted 13 August, 2021; originally announced August 2021.

    Comments: 22 pages, 2 figures

    MSC Class: 60K35 (Primary) 05D99 (Secondary)
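    The two dynamics named in the abstract can be simulated on a finite truncation of the (d+1)-regular tree. The sketch below uses the standard threshold rules, stated here as assumptions: in the CCA a vertex of color c advances to c+1 (mod κ) when at least θ neighbors already hold c+1; in the GHM a resting vertex (color 0) becomes excited (color 1) when at least θ neighbors are excited, and every other color advances by one deterministically. Truncation gives the leaves degree 1, so boundary behavior differs from the infinite tree T_d studied in the paper; all function names are illustrative.

```python
import numpy as np


def truncated_regular_tree(d, depth):
    """Adjacency list of the (d+1)-regular tree truncated at `depth`, rooted at node 0.

    The root gets d+1 children and every other internal node gets d children, so
    internal nodes have degree d+1; leaves at the cut-off depth have degree 1.
    """
    adj = [[]]
    frontier = [0]
    for _ in range(depth):
        next_frontier = []
        for v in frontier:
            for _ in range(d + 1 if v == 0 else d):
                child = len(adj)
                adj.append([v])
                adj[v].append(child)
                next_frontier.append(child)
        frontier = next_frontier
    return adj


def cca_step(colors, adj, kappa, theta):
    """One synchronous CCA update: c -> c+1 (mod kappa) if >= theta neighbors hold c+1."""
    nxt = colors.copy()
    for v, nbrs in enumerate(adj):
        successor = (colors[v] + 1) % kappa
        if sum(colors[u] == successor for u in nbrs) >= theta:
            nxt[v] = successor
    return nxt


def ghm_step(colors, adj, kappa, theta):
    """One synchronous GHM update: 0 -> 1 if >= theta neighbors are excited (1);
    excited and refractory colors 1..kappa-1 advance deterministically (mod kappa)."""
    nxt = colors.copy()
    for v, nbrs in enumerate(adj):
        if colors[v] == 0:
            if sum(colors[u] == 1 for u in nbrs) >= theta:
                nxt[v] = 1
        else:
            nxt[v] = (colors[v] + 1) % kappa
    return nxt
```

    Drawing `colors` uniformly from {0, …, κ-1}, for example with `np.random.default_rng().integers(kappa, size=len(adj))`, mimics the uniform product initial state on the finite tree.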

  24. arXiv:2106.01149  [pdf, other]

    cs.SD cs.IR eess.AS

    Exploring modality-agnostic representations for music classification

    Authors: Ho-Hsiang Wu, Magdalena Fuentes, Juan P. Bello

    Abstract: Music information is often conveyed or recorded across multiple data modalities including but not limited to audio, images, text and scores. However, music information retrieval research has almost exclusively focused on single modality recognition, requiring development of separate models for each modality. Some multi-modal works require multiple coexisting modalities given to the model as inputs…

    Submitted 2 June, 2021; originally announced June 2021.

  25. arXiv:2105.02911  [pdf, other]

    eess.AS cs.SD

    Weakly Supervised Source-Specific Sound Level Estimation in Noisy Soundscapes

    Authors: Aurora Cramer, Mark Cartwright, Fatemeh Pishdadian, Juan Pablo Bello

    Abstract: While the estimation of what sound sources are, when they occur, and from where they originate has been well-studied, the estimation of how loud these sound sources are has been often overlooked. Current solutions to this task, which we refer to as source-specific sound level estimation (SSSLE), suffer from challenges due to the impracticality of acquiring realistic data and a lack of robustness t…

    Submitted 29 July, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

    Comments: 5 pages, 3 figures, WASPAA 2021 preprint

  26. arXiv:2103.07362  [pdf, other]

    cs.CV

    PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss

    Authors: Juan Luis Gonzalez Bello, Munchurl Kim

    Abstract: In this paper, we propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net. The PLADE-Net is the first work that shows unprecedented accuracy levels, exceeding 95% in terms of the $δ^1$ metric on the challenging KITTI dataset. Our PLADE-Net is based on a new network architecture with neural positional encoding and a novel loss function that borrows fro…

    Submitted 12 March, 2021; originally announced March 2021.

    Comments: Accepted paper (poster) at CVPR2021
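    In the common KITTI evaluation protocol, the δ accuracy is the fraction of pixels whose ratio max(pred/gt, gt/pred) falls below 1.25; the thresholds 1.25² and 1.25³ give the looser δ2 and δ3 variants. Assuming that is the metric the abstract refers to, a minimal NumPy sketch follows. The function name is hypothetical, pixels with zero ground truth are treated as invalid, and predictions are assumed strictly positive.

```python
import numpy as np


def delta_accuracy(pred, gt, threshold: float = 1.25, power: int = 1) -> float:
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold**power.

    power=1, 2, 3 give the usual delta_1, delta_2, delta_3 accuracies.
    Pixels where gt <= 0 are treated as missing ground truth; pred is assumed > 0.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    valid = gt > 0
    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return float(np.mean(ratio < threshold ** power))
```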

  27. arXiv:2103.03727  [pdf]

    cs.SI

    Suicide Classification for News Media Using Convolutional Neural Network

    Authors: Hugo J. Bello, Nora Palomar-Ciria, Enrique Baca-García, Celia Lozano

    Abstract: Currently, the process of evaluating suicides is highly subjective, which limits the efficacy and accuracy of prevention efforts. Artificial intelligence (AI) has emerged as a means of investigating large datasets to identify patterns within "big data" that can determine the factors behind suicide outcomes. Here, we use AI tools to extract the topic from (press and social) media text. However, news me…

    Submitted 18 February, 2021; originally announced March 2021.

  28. arXiv:2102.13502  [pdf]

    cond-mat.mtrl-sci

    Direct imaging of chiral domain walls and Néel-type skyrmionium in ferrimagnetic alloys

    Authors: Boris Seng, Daniel Schönke, Javier Yeste, Robert M. Reeve, Nico Kerber, Daniel Lacour, Jean-Loïs Bello, Nicolas Bergeard, Fabian Kammerbauer, Mona Bhukta, Tom Ferté, Christine Boeglin, Florin Radu, Radu Abrudan, Torsten Kachel, Stéphane Mangin, Michel Hehn, Mathias Kläui

    Abstract: The evolution of chiral spin structures is studied in ferrimagnet Ta/Ir/Fe/GdFeCo/Pt multilayers as a function of temperature using scanning electron microscopy with polarization analysis (SEMPA). The GdFeCo ferrimagnet exhibits pure right-hand Néel-type domain wall (DW) spin textures over a large temperature range. This indicates the presence of a negative Dzyaloshinskii-Moriya interaction (DMI)…

    Submitted 21 July, 2021; v1 submitted 26 February, 2021; originally announced February 2021.

  29. arXiv:2102.03229  [pdf, other]

    cs.SD cs.LG eess.AS

    Multi-Task Self-Supervised Pre-Training for Music Classification

    Authors: Ho-Hsiang Wu, Chieh-Chi Kao, Qingming Tang, Ming Sun, Brian McFee, Juan Pablo Bello, Chao Wang

    Abstract: Deep learning is very data hungry, and supervised learning especially requires massive labeled data to work well. Machine listening research often suffers from the problem of limited labeled data, as human annotations are costly to acquire, and annotations for audio are time-consuming and less intuitive. Besides, models learned from a labeled dataset often embed biases specific to that particular dataset.…

    Submitted 5 February, 2021; originally announced February 2021.

    Comments: Copyright 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  30. arXiv:2012.07490  [pdf]

    cs.CL cs.SI

    Machine Learning to study the impact of gender-based violence in the news media

    Authors: Hugo J. Bello, Nora Palomar, Elisa Gallego, Lourdes Jiménez Navascués, Celia Lozano

    Abstract: While it remains a taboo topic, gender-based violence (GBV) undermines the health, dignity, security and autonomy of its victims. Many factors that generate or maintain this kind of violence have been studied; however, the influence of the media is still uncertain. Here, we use Machine Learning tools to extrapolate the effect of the news on GBV. By feeding neural networks with news, the topic inform…

    Submitted 27 November, 2020; originally announced December 2020.

  31. arXiv:2010.09137  [pdf]

    cond-mat.mes-hall cond-mat.mtrl-sci

    Current-induced spin torques on single GdFeCo magnetic layers

    Authors: David Céspedes-Berrocal, Heloïse Damas, Sébastien Petit-Watelot, David Maccariello, ** Tang, Aldo Arriola-Córdova, Pierre Vallobra, Yong Xu, Jean-Loïs Bello, Elodie Martin, Sylvie Migot, Jaafar Ghanbaja, Shufeng Zhang, Michel Hehn, Stéphane Mangin, Christos Panagopoulos, Vincent Cros, Albert Fert, Juan-Carlos Rojas-Sánchez

    Abstract: Spintronics exploits spin-orbit coupling (SOC) to generate spin currents, spin torques, and, in the absence of inversion symmetry, Rashba, and Dzyaloshinskii-Moriya interactions (DMI). The widely used magnetic materials, based on 3d metals such as Fe and Co, possess a small SOC. To circumvent this shortcoming, the common practice has been to utilize the large SOC of nonmagnetic layers of 5d heavy…

    Submitted 18 October, 2020; originally announced October 2020.

    Comments: 26 pages, 4 figures plus 5 pages of sup. information

    Journal ref: Advanced Materials 2021

  32. arXiv:2009.05188  [pdf, other]

    cs.SD cs.LG eess.AS

    SONYC-UST-V2: An Urban Sound Tagging Dataset with Spatiotemporal Context

    Authors: Mark Cartwright, Jason Cramer, Ana Elisa Mendez Mendez, Yu Wang, Ho-Hsiang Wu, Vincent Lostanlen, Magdalena Fuentes, Graham Dove, Charlie Mydlarz, Justin Salamon, Oded Nov, Juan Pablo Bello

    Abstract: We present SONYC-UST-V2, a dataset for urban sound tagging with spatiotemporal information. This dataset is aimed at the development and evaluation of machine listening systems for real-world urban noise monitoring. While datasets of urban recordings are available, this dataset provides the opportunity to investigate how spatiotemporal metadata can aid in the prediction of urban sound tags. SONYC…

    Submitted 10 September, 2020; originally announced September 2020.

  33. arXiv:2008.02791  [pdf, other]

    cs.SD eess.AS

    Few-Shot Drum Transcription in Polyphonic Music

    Authors: Yu Wang, Justin Salamon, Mark Cartwright, Nicholas J. Bryan, Juan Pablo Bello

    Abstract: Data-driven approaches to automatic drum transcription (ADT) are often limited to a predefined, small vocabulary of percussion instrument classes. Such models cannot recognize out-of-vocabulary classes nor are they able to adapt to finer-grained vocabularies. In this work, we address open vocabulary ADT by introducing few-shot learning to the task. We train a Prototypical Network on a synthetic da…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: ISMIR 2020 camera-ready
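    The abstract names a Prototypical Network. Its core classification step can be sketched as follows, assuming embeddings have already been produced by some encoder (omitted here): each class prototype is the mean embedding of that class's support examples, and a query is scored by a softmax over negative squared Euclidean distances to the prototypes. This is the generic single-label formulation, not the paper's drum-specific model; the function names are illustrative.

```python
import numpy as np


def prototypes(support_emb: np.ndarray, support_labels: np.ndarray, n_classes: int) -> np.ndarray:
    """Mean embedding per class; support_emb is (n, dim), labels are ints in [0, n_classes)."""
    return np.stack([support_emb[support_labels == c].mean(axis=0) for c in range(n_classes)])


def classify(query_emb: np.ndarray, protos: np.ndarray) -> np.ndarray:
    """Class probabilities (n_query, n_classes) from a softmax over negative squared distances."""
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)
```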

  34. arXiv:2006.16583  [pdf, other]

    eess.IV

    Pan-Sharpening with Color-Aware Perceptual Loss and Guided Re-Colorization

    Authors: Juan Luis Gonzalez Bello, Soomin Seo, Munchurl Kim

    Abstract: We present a novel color-aware perceptual (CAP) loss for learning the task of pan-sharpening. Our CAP loss is designed to focus on the deep features of a pre-trained VGG network that are more sensitive to spatial details and ignore color information to allow the network to extract the structural information from the PAN image while keeping the color from the lower-resolution MS image. Additionally…

    Submitted 30 June, 2020; originally announced June 2020.

  35. arXiv:2003.01037  [pdf, other]

    cs.SD cs.LG eess.AS

    One or Two Components? The Scattering Transform Answers

    Authors: Vincent Lostanlen, Alice Cohen-Hadria, Juan Pablo Bello

    Abstract: With the aim of constructing a biologically plausible model of machine listening, we study the representation of a multicomponent stationary signal by a wavelet scattering network. First, we show that renormalizing second-order nodes by their first-order parents gives a simple numerical criterion to assess whether two neighboring components will interfere psychoacoustically. Secondly, we run a man…

    Submitted 25 June, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: 5 pages, 4 figures, in English. Proceedings of the European Signal Processing Conference (EUSIPCO 2020)

  36. arXiv:1911.00417  [pdf, other]

    cs.SD cs.LG eess.AS

    Long-distance Detection of Bioacoustic Events with Per-channel Energy Normalization

    Authors: Vincent Lostanlen, Kaitlin Palmer, Elly Knight, Christopher Clark, Holger Klinck, Andrew Farnsworth, Tina Wong, Jason Cramer, Juan Pablo Bello

    Abstract: This paper proposes to perform unsupervised detection of bioacoustic events by pooling the magnitudes of spectrogram frames after per-channel energy normalization (PCEN). Although PCEN was originally developed for speech recognition, it also has beneficial effects in enhancing animal vocalizations, despite the presence of atmospheric absorption and intermittent noise. We prove that PCEN generalize…

    Submitted 1 November, 2019; originally announced November 2019.

    Comments: 5 pages, 3 figures. Presented at the 3rd International Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE). 25–26 October 2019, New York, NY, USA
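    PCEN, as usually defined, smooths each frequency channel's energy with a first-order IIR filter, M[t] = (1 - s)·M[t-1] + s·E[t], then applies adaptive gain control and root compression: (E / (ε + M)^α + δ)^r - δ^r. A minimal NumPy sketch follows; the parameter defaults are common choices rather than values taken from this paper, and initializing the smoother with the first frame is an assumption.

```python
import numpy as np


def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a non-negative (freq, time) energy matrix E."""
    E = np.asarray(E, dtype=float)
    M = np.empty_like(E)
    M[:, 0] = E[:, 0]  # assumption: start the smoother at the first frame
    for t in range(1, E.shape[1]):
        # First-order IIR low-pass, run independently in every frequency channel.
        M[:, t] = (1 - s) * M[:, t - 1] + s * E[:, t]
    # Adaptive gain control followed by root compression with bias delta.
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r
```

    With α = 1 and ε = 0 the output is unchanged when E is multiplied by a constant, which illustrates the gain normalization that α close to 1 approximates.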

  37. arXiv:1910.10246  [pdf, other]

    cs.SD cs.LG eess.AS

    Learning the helix topology of musical pitch

    Authors: Vincent Lostanlen, Sripathi Sridhar, Brian McFee, Andrew Farnsworth, Juan Pablo Bello

    Abstract: To explain the consonance of octaves, music psychologists represent pitch as a helix where azimuth and axial coordinate correspond to pitch class and pitch height respectively. This article addresses the problem of discovering this helical structure from unlabeled audio data. We measure Pearson correlations in the constant-Q transform (CQT) domain to build a K-nearest neighbor graph between freque…

    Submitted 4 February, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: 5 pages, 6 figures. To appear in the Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Barcelona, Spain, May 2020
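    The graph-construction step the abstract describes (Pearson correlations between CQT frequency bins, then a K-nearest-neighbor graph) might look like this in NumPy. It assumes a precomputed magnitude CQT of shape (bins, frames); the function name and the choice to symmetrize the graph with a logical OR are assumptions, and the later steps of the paper are not reproduced.

```python
import numpy as np


def knn_graph_from_correlation(features: np.ndarray, k: int) -> np.ndarray:
    """Symmetric boolean k-NN adjacency between rows of `features` (bins, frames) by Pearson correlation."""
    corr = np.corrcoef(features)          # (bins, bins) correlation between frequency bins
    np.fill_diagonal(corr, -np.inf)       # never pick a bin as its own neighbor
    n = corr.shape[0]
    nearest = np.argsort(-corr, axis=1)[:, :k]  # k most correlated bins for each bin
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = True
    return adj | adj.T                    # symmetrize: keep an edge if either end chose it
```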

  38. arXiv:1910.01089  [pdf, other]

    eess.SP cs.CV eess.IV

    Deep 3D Pan via adaptive "t-shaped" convolutions with global and local adaptive dilations

    Authors: Juan Luis Gonzalez Bello, Munchurl Kim

    Abstract: Recent advances in deep learning have shown promising results in many low-level vision tasks. However, solving single-image-based view synthesis is still an open problem. In particular, the generation of new images at parallel camera views given a single input image is of great interest, as it enables 3D visualization of the 2D input scenery. We propose a novel network architecture to perform…

    Submitted 20 October, 2019; v1 submitted 2 October, 2019; originally announced October 2019.

    Comments: Check our video at https://www.youtube.com/watch?v=o0b-e282Rt4

  39. arXiv:1909.09349  [pdf, other]

    eess.IV cs.CV eess.SP

    Deep 3D-Zoom Net: Unsupervised Learning of Photo-Realistic 3D-Zoom

    Authors: Juan Luis Gonzalez Bello, Munchurl Kim

    Abstract: The 3D-zoom operation is the positive translation of the camera in the Z-axis, perpendicular to the image plane. In contrast, the optical zoom changes the focal length and the digital zoom is used to enlarge a certain region of an image to the original image size. In this paper, we are the first to formulate an unsupervised 3D-zoom learning problem where images with an arbitrary zoom factor can be…

    Submitted 2 October, 2019; v1 submitted 20 September, 2019; originally announced September 2019.

    Comments: Check our video at https://www.youtube.com/watch?v=Gz76VYwUzZ8

  40. arXiv:1906.08512

    cs.SD cs.LG eess.AS stat.ML

    Adversarial Learning for Improved Onsets and Frames Music Transcription

    Authors: Jong Wook Kim, Juan Pablo Bello

    Abstract: Automatic music transcription is considered to be one of the hardest problems in music information retrieval, yet recent deep learning approaches have achieved substantial improvements on transcription performance. These approaches commonly employ supervised learning models that predict various time-frequency representations, by minimizing element-wise losses such as the cross entropy function. Ho…

    Submitted 20 June, 2019; originally announced June 2019.

  41. arXiv:1905.08352

    cs.SD cs.AI cs.LG eess.AS

    Robust sound event detection in bioacoustic sensor networks

    Authors: Vincent Lostanlen, Justin Salamon, Andrew Farnsworth, Steve Kelling, Juan Pablo Bello

    Abstract: Bioacoustic sensors, sometimes known as autonomous recording units (ARUs), can record sounds of wildlife over long periods of time in scalable and minimally invasive ways. Deriving per-species abundance estimates from these sensors requires detection, classification, and quantification of animal vocalizations as individual acoustic events. Yet, variability in ambient noise, both over time and acro…

    Submitted 29 October, 2019; v1 submitted 20 May, 2019; originally announced May 2019.

    Comments: 32 pages, in English. Submitted to PLOS ONE journal in February 2019; revised August 2019; published October 2019

  42. arXiv:1903.08514

    eess.IV cs.LG

    A Novel Monocular Disparity Estimation Network with Domain Transformation and Ambiguity Learning

    Authors: Juan Luis Gonzalez Bello, Munchurl Kim

    Abstract: Convolutional neural networks (CNN) have shown state-of-the-art results for low-level computer vision problems such as stereo and monocular disparity estimations, but still, have much room to further improve their performance in terms of accuracy, numbers of parameters, etc. Recent works have uncovered the advantages of using an unsupervised scheme to train CNN's to estimate monocular disparity, w…

    Submitted 20 March, 2019; originally announced March 2019.

  43. arXiv:1903.03195

    cs.SD eess.AS

    The life of a New York City noise sensor network

    Authors: Charlie Mydlarz, Mohit Sharma, Yitzchak Lockerman, Ben Steers, Claudio Silva, Juan Pablo Bello

    Abstract: Noise pollution is one of the topmost quality of life issues for urban residents in the United States. Continued exposure to high levels of noise has proven effects on health, including acute effects such as sleep disruption, and long-term effects such as hypertension, heart disease, and hearing loss. To investigate and ultimately aid in the mitigation of urban noise, a network of 55 sensor nodes…

    Submitted 26 March, 2019; v1 submitted 7 March, 2019; originally announced March 2019.

    Comments: This article belongs to the Section Intelligent Sensors, 24 pages, 15 figures, 3 tables, 45 references

    Journal ref: Sensors 2019, 19, 1415

  44. arXiv:1811.00223

    cs.SD eess.AS stat.ML

    Neural Music Synthesis for Flexible Timbre Control

    Authors: Jong Wook Kim, Rachel Bittner, Aparna Kumar, Juan Pablo Bello

    Abstract: The recent success of raw audio waveform synthesis models like WaveNet motivates a new approach for music synthesis, in which the entire process --- creating audio samples from a score and instrument information --- is modeled using generative neural networks. This paper describes a neural music synthesis model with flexible timbre controls, which consists of a recurrent neural network conditioned…

    Submitted 1 November, 2018; originally announced November 2018.

  45. arXiv:1809.00381

    cs.SD cs.LG eess.AS stat.ML

    Multitask Learning for Fundamental Frequency Estimation in Music

    Authors: Rachel M. Bittner, Brian McFee, Juan P. Bello

    Abstract: Fundamental frequency (f0) estimation from polyphonic music includes the tasks of multiple-f0, melody, vocal, and bass line estimation. Historically these problems have been approached separately, and only recently, using learning-based approaches. We present a multitask deep learning architecture that jointly estimates outputs for various tasks including multiple-f0, melody, vocal and bass line e…

    Submitted 2 September, 2018; originally announced September 2018.

  46. arXiv:1805.00889

    cs.SD cs.CY cs.HC eess.AS

    SONYC: A System for the Monitoring, Analysis and Mitigation of Urban Noise Pollution

    Authors: Juan Pablo Bello, Claudio Silva, Oded Nov, R. Luke DuBois, Anish Arora, Justin Salamon, Charles Mydlarz, Harish Doraiswamy

    Abstract: We present the Sounds of New York City (SONYC) project, a smart cities initiative focused on developing a cyber-physical system for the monitoring, analysis and mitigation of urban noise pollution. Noise pollution is one of the topmost quality of life issues for urban residents in the U.S. with proven effects on health, education, the economy, and the environment. Yet, most cities lack the resourc…

    Submitted 18 May, 2018; v1 submitted 2 May, 2018; originally announced May 2018.

    Comments: Accepted May 2018, Communications of the ACM. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record will be published in Communications of the ACM

  47. arXiv:1804.10070

    cs.SD cs.LG eess.AS stat.ML

    Adaptive pooling operators for weakly labeled sound event detection

    Authors: Brian McFee, Justin Salamon, Juan Pablo Bello

    Abstract: Sound event detection (SED) methods are tasked with labeling segments of audio recordings by the presence of active sound sources. SED is typically posed as a supervised machine learning problem, requiring strong annotations for the presence or absence of each sound source at every time instant within the recording. However, strong annotations of this type are both labor- and cost-intensive for hu…

    Submitted 10 August, 2018; v1 submitted 26 April, 2018; originally announced April 2018.

  48. arXiv:1802.06182

    eess.AS cs.LG cs.SD stat.ML

    CREPE: A Convolutional Representation for Pitch Estimation

    Authors: Jong Wook Kim, Justin Salamon, Peter Li, Juan Pablo Bello

    Abstract: The task of estimating the fundamental frequency of a monophonic sound recording, also known as pitch tracking, is fundamental to audio processing with multiple applications in speech processing and music information retrieval. To date, the best performing techniques, such as the pYIN algorithm, are based on a combination of DSP pipelines and heuristics. While such techniques perform very well on…

    Submitted 16 February, 2018; originally announced February 2018.

    Comments: ICASSP 2018

  49. Confidence regions for neutrino oscillation parameters from double-Chooz data

    Authors: B. Vargas Perez, J. García-Ravelo, Dionisio Tun, Jorge Garcia Bello, Jesús Escamilla Roa

    Abstract: In this work, an independent and detailed statistical analysis of the double-Chooz experiment is performed. To have a thorough understanding of the implications of the double-Chooz data on both oscillation parameters $\sin^{2}(2θ_{13})$ and $Δm^2_{31}$, we decided to analyze the data corresponding to the Far detector, with no additional restriction. By doing this, confidence regions and best fit v…

    Submitted 1 August, 2018; v1 submitted 14 December, 2017; originally announced December 2017.

    Comments: 11 pages, 2 black-and-white figures, 5 color figures, 6 tables

    Journal ref: Phys. Rev. D 97, 093005 (2018)

  50. arXiv:1608.04363

    cs.SD cs.CV cs.LG cs.NE

    Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification

    Authors: Justin Salamon, Juan Pablo Bello

    Abstract: The ability of deep convolutional neural networks (CNN) to learn discriminative spectro-temporal patterns makes them well suited to environmental sound classification. However, the relative scarcity of labeled data has impeded the exploitation of this family of high-capacity models. This study has two primary contributions: first, we propose a deep convolutional neural network architecture for env…

    Submitted 28 November, 2016; v1 submitted 15 August, 2016; originally announced August 2016.

    Comments: Accepted November 2016, IEEE Signal Processing Letters. Copyright IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material, creating new collective works, for resale or redistribution, or reuse of any copyrighted component of this work in other works