Showing 1–20 of 20 results for author: Ellis, P W

  1. arXiv:2405.00079

    physics.soc-ph

    A global evidence map of human well-being and biodiversity co-benefits and trade-offs of natural climate solutions

    Authors: Charlotte H. Chang, James T. Erbaugh, Paola Fajardo, Luci Lu, István Molnár, Dávid Papp, Brian E. Robinson, Kemen Austin, Susan Cook-Patton, Timm Kroeger, Lindsey Smart, Miguel Castro, Samantha H. Cheng, Peter W. Ellis, Rob I. McDonald, Teevrat Garg, Erin E. Poor, Preston Welker, Andrew R. Tilman, Stephen A. Wood, Yuta J. Masuda

    Abstract: Natural climate solutions (NCS) are critical for mitigating climate change through ecosystem-based carbon removal and emissions reductions. NCS implementation can also generate biodiversity and human well-being co-benefits and trade-offs ("NCS co-impacts"), but the volume of evidence on NCS co-impacts has grown rapidly across disciplines, is poorly understood, and remains to be systematically coll…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: 28 pages, 5 figures

  2. Dataset balancing can hurt model performance

    Authors: R. Channing Moore, Daniel P. W. Ellis, Eduardo Fonseca, Shawn Hershey, Aren Jansen, Manoj Plakal

    Abstract: Machine learning from training data with a skewed distribution of examples per class can lead to models that favor performance on common classes at the expense of performance on rare ones. AudioSet has a very wide range of priors over its 527 sound event classes. Classification performance on AudioSet is usually evaluated by a simple average over per-class metrics, meaning that performance on rare…

    Submitted 30 June, 2023; originally announced July 2023.

    Comments: 5 pages, 3 figures, ICASSP 2023

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5

  3. arXiv:2210.07856

    eess.AS cs.SD

    Description and analysis of novelties introduced in DCASE Task 4 2022 on the baseline system

    Authors: Francesca Ronchini, Samuele Cornell, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Daniel P. W. Ellis

    Abstract: The aim of the Detection and Classification of Acoustic Scenes and Events Challenge Task 4 is to evaluate systems for the detection of sound events in domestic environments using a heterogeneous dataset. The systems need to be able to correctly detect the sound events present in a recorded audio clip, as well as localize the events in time. This year's task is a follow-up of DCASE 2021 Task 4, wi…

    Submitted 14 October, 2022; originally announced October 2022.

    Journal ref: Proceedings of the 7th Detection and Classification of Acoustic Scenes and Events 2022 Workshop (DCASE2022)

  4. arXiv:2208.12415

    eess.AS cs.CL cs.SD stat.ML

    MuLan: A Joint Embedding of Music Audio and Natural Language

    Authors: Qingqing Huang, Aren Jansen, Joonseok Lee, Ravi Ganti, Judith Yue Li, Daniel P. W. Ellis

    Abstract: Music tagging and content-based retrieval systems have traditionally been constructed using pre-defined ontologies covering a rigid set of music attributes or text queries. This paper presents MuLan: a first attempt at a new generation of acoustic models that link music audio directly to unconstrained natural language music descriptions. MuLan takes the form of a two-tower, joint audio-text embedd…

    Submitted 25 August, 2022; originally announced August 2022.

    Comments: To appear in ISMIR 2022

  5. arXiv:2105.07031

    cs.SD eess.AS

    The Benefit Of Temporally-Strong Labels In Audio Event Classification

    Authors: Shawn Hershey, Daniel P W Ellis, Eduardo Fonseca, Aren Jansen, Caroline Liu, R Channing Moore, Manoj Plakal

    Abstract: To reveal the importance of temporal precision in ground truth audio event labels, we collected precise (~0.1 sec resolution) "strong" labels for a portion of the AudioSet dataset. We devised a temporally strong evaluation set (including explicit negatives of varying difficulty) and a small strong-labeled training subset of 67k clips (compared to the original dataset's 1.8M clips labeled at 10 sec…

    Submitted 14 May, 2021; originally announced May 2021.

    Comments: Accepted for publication at ICASSP 2021

  6. arXiv:2105.02132

    cs.SD cs.LG eess.AS

    Self-Supervised Learning from Automatically Separated Sound Scenes

    Authors: Eduardo Fonseca, Aren Jansen, Daniel P. W. Ellis, Scott Wisdom, Marco Tagliasacchi, John R. Hershey, Manoj Plakal, Shawn Hershey, R. Channing Moore, Xavier Serra

    Abstract: Real-world sound scenes consist of time-varying collections of sound sources, each generating characteristic sound events that are mixed together in audio recordings. The association of these constituent sound events with their mixture and each other is semantically constrained: the sound scene contains the union of source classes and not all classes naturally co-occur. With this motivation, this…

    Submitted 14 September, 2021; v1 submitted 5 May, 2021; originally announced May 2021.

  7. arXiv:2011.01143

    cs.SD cs.CV eess.AS

    Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds

    Authors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey

    Abstract: Recent progress in deep learning has enabled many advances in sound separation and visual scene understanding. However, extracting sound sources which are apparent in natural videos remains an open problem. In this work, we present AudioScope, a novel audio-visual sound separation framework that can be trained without supervision to isolate on-screen sound sources from real in-the-wild videos. Pri…

    Submitted 29 May, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: ICLR 2021, 27 pages

  8. arXiv:2005.00878

    cs.SD cs.LG eess.AS

    Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework With Loss Masking

    Authors: Eduardo Fonseca, Shawn Hershey, Manoj Plakal, Daniel P. W. Ellis, Aren Jansen, R. Channing Moore, Xavier Serra

    Abstract: The study of label noise in sound event recognition has recently gained attention with the advent of larger and noisier datasets. This work addresses the problem of missing labels, one of the big weaknesses of large audio datasets, and one of the most conspicuous issues for AudioSet. We propose a simple and model-agnostic method based on a teacher-student framework with loss masking to first ident…

    Submitted 25 July, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: Accepted in IEEE Signal Processing Letters, openly accessible at https://ieeexplore.ieee.org/document/9130823

    Journal ref: IEEE Signal Processing Letters, Vol. 27, 2020, pages 1235-1239

  9. Orientational correlations in active and passive nematic defects

    Authors: D. J. G. Pearce, J. Nambisan, P. W. Ellis, A. Fernandez-Nieves, L. Giomi

    Abstract: We investigate the emergence of orientational order among +1/2 disclinations in active nematic liquid crystals. Using a combination of theoretical and experimental methods, we show that +1/2 disclinations have short-range antiferromagnetic alignment, as a consequence of the elastic torques originating from their polar structure. The presence of intermediate -1/2 disclinations, however, turns this…

    Submitted 5 November, 2021; v1 submitted 28 April, 2020; originally announced April 2020.

    Comments: 6 pages, 4 figures

  10. arXiv:1911.07951

    cs.SD cs.LG eess.AS stat.ML

    Improving Universal Sound Separation Using Sound Classification

    Authors: Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, Daniel P. W. Ellis

    Abstract: Deep learning approaches have recently achieved impressive performance on both audio source separation and sound classification. Most audio source separation approaches focus only on separating sources belonging to a restricted domain of source classes, such as speech and music. However, recent work has demonstrated the possibility of "universal sound separation", which aims to separate acoustic s…

    Submitted 18 November, 2019; originally announced November 2019.

    Journal ref: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  11. arXiv:1911.05894

    cs.SD eess.AS stat.ML

    Coincidence, Categorization, and Consolidation: Learning to Recognize Sounds with Minimal Supervision

    Authors: Aren Jansen, Daniel P. W. Ellis, Shawn Hershey, R. Channing Moore, Manoj Plakal, Ashok C. Popat, Rif A. Saurous

    Abstract: Humans do not acquire perceptual abilities in the way we train machines. While machine learning algorithms typically operate on large collections of randomly-chosen, explicitly-labeled examples, human acquisition relies more heavily on multimodal unsupervised learning (as infants) and active learning (as children). With this motivation, we present a learning framework for sound representation and…

    Submitted 13 November, 2019; originally announced November 2019.

    Comments: This extended version of an ICASSP 2020 submission under the same title has an added figure and additional discussion for easier consumption

  12. arXiv:1906.02975

    cs.SD cs.LG eess.AS stat.ML

    Audio tagging with noisy labels and minimal supervision

    Authors: Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Serra

    Abstract: This paper introduces Task 2 of the DCASE2019 Challenge, titled "Audio tagging with noisy labels and minimal supervision". This task was hosted on the Kaggle platform as "Freesound Audio Tagging 2019". The task evaluates systems for multi-label audio tagging using a large set of noisy-labeled data, and a much smaller set of manually-labeled data, under a large vocabulary setting of 80 everyday sou…

    Submitted 19 January, 2020; v1 submitted 7 June, 2019; originally announced June 2019.

    Comments: DCASE2019 Workshop

  13. arXiv:1901.01189

    cs.SD cs.LG eess.AS stat.ML

    Learning Sound Event Classifiers from Web Audio with Noisy Labels

    Authors: Eduardo Fonseca, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, Xavier Serra

    Abstract: As sound event classification moves towards larger datasets, issues of label noise become inevitable. Web sites can supply large volumes of user-contributed audio and metadata, but inferring labels from this metadata introduces errors due to unreliable inputs, and limitations in the mapping. There is, however, little research into the impact of these errors. To foster the investigation of label no…

    Submitted 7 March, 2019; v1 submitted 4 January, 2019; originally announced January 2019.

    Comments: International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2019)

  14. arXiv:1808.00606

    cs.SD eess.AS

    AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies

    Authors: Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson, Zhonghua Xi

    Abstract: Speech activity detection (or endpointing) is an important processing step for applications such as speech recognition, language identification and speaker diarization. Both audio- and vision-based approaches have been used for this task in various settings, often tailored toward end applications. However, much of the prior work reports results in synthetic settings, on task-specific datasets, or…

    Submitted 23 August, 2018; v1 submitted 1 August, 2018; originally announced August 2018.

    Comments: Interspeech, 2018

  15. arXiv:1807.09902

    cs.SD cs.LG eess.AS stat.ML

    General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline

    Authors: Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra

    Abstract: This paper describes Task 2 of the DCASE 2018 Challenge, titled "General-purpose audio tagging of Freesound content with AudioSet labels". This task was hosted on the Kaggle platform as "Freesound General-Purpose Audio Tagging Challenge". The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the A…

    Submitted 6 October, 2018; v1 submitted 25 July, 2018; originally announced July 2018.

    Comments: Camera ready for DCASE Workshop 2018

  16. Geometrical control of active turbulence in curved topographies

    Authors: D. J. G. Pearce, Perry W. Ellis, Alberto Fernandez-Nieves, L. Giomi

    Abstract: We investigate the turbulent dynamics of a two-dimensional active nematic liquid crystal constrained on a curved surface. Using a combination of hydrodynamic and particle-based simulations, we demonstrate that the fundamental structural features of the fluid, such as the topological charge density, the defect number density, the nematic order parameter and defect creation and annihilation rates,…

    Submitted 3 May, 2018; originally announced May 2018.

    Comments: 6 pages, 4 figures

    Journal ref: Phys. Rev. Lett. 122, 168002 (2019)

  17. arXiv:1711.02209

    cs.SD eess.AS stat.ML

    Unsupervised Learning of Semantic Audio Representations

    Authors: Aren Jansen, Manoj Plakal, Ratheet Pandya, Daniel P. W. Ellis, Shawn Hershey, Jiayang Liu, R. Channing Moore, Rif A. Saurous

    Abstract: Even in the absence of any explicit semantic annotation, vast collections of audio recordings provide valuable information for learning the categorical structure of sounds. We consider several class-agnostic semantic constraints that apply to unlabeled nonspeech audio: (i) noise and translations in time do not change the underlying sound category, (ii) a mixture of two sound events inherits the ca…

    Submitted 6 November, 2017; originally announced November 2017.

    Comments: Submitted to ICASSP 2018

  18. arXiv:1609.09430

    cs.SD cs.LG stat.ML

    CNN Architectures for Large-Scale Audio Classification

    Authors: Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, R. Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron J. Weiss, Kevin Wilson

    Abstract: Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying th…

    Submitted 10 January, 2017; v1 submitted 29 September, 2016; originally announced September 2016.

    Comments: Accepted for publication at ICASSP 2017. Changes: Added definitions of mAP, AUC, and d-prime. Updated mAP/AUC/d-prime numbers for Audio Set based on changes in the latest Audio Set revision. Changed wording to fit the 4-page limit with the new additions.

  19. arXiv:1512.08756

    cs.LG cs.NE

    Feed-Forward Networks with Attention Can Solve Some Long-Term Memory Problems

    Authors: Colin Raffel, Daniel P. W. Ellis

    Abstract: We propose a simplified model of attention which is applicable to feed-forward neural networks and demonstrate that the resulting model can solve the synthetic "addition" and "multiplication" long-term memory problems for sequence lengths which are both longer and more widely varying than the best published results for these tasks.

    Submitted 20 September, 2016; v1 submitted 29 December, 2015; originally announced December 2015.

  20. Stable nematic droplets with handles

    Authors: E. Pairam, J. Vallamkondu, V. Koning, B. C. van Zuiden, P. W. Ellis, M. A. Bates, V. Vitelli, A. Fernandez Nieves

    Abstract: We stabilize nematic droplets with handles against surface-tension-driven instabilities using a yield-stress material as outer fluid and study the complex nematic textures and defect structures that result from the competition between topological constraints and the elasticity of the nematic liquid crystal. We uncover a surprisingly persistent twisted configuration of the nematic director inside t…

    Submitted 25 April, 2013; v1 submitted 8 December, 2012; originally announced December 2012.

    Comments: 23 pages, 4 figures, PNAS (2013)