Showing 1–9 of 9 results for author: Hershey, S

Searching in archive cs.
  1. Dataset balancing can hurt model performance

    Authors: R. Channing Moore, Daniel P. W. Ellis, Eduardo Fonseca, Shawn Hershey, Aren Jansen, Manoj Plakal

    Abstract: Machine learning from training data with a skewed distribution of examples per class can lead to models that favor performance on common classes at the expense of performance on rare ones. AudioSet has a very wide range of priors over its 527 sound event classes. Classification performance on AudioSet is usually evaluated by a simple average over per-class metrics, meaning that performance on rare…

    Submitted 30 June, 2023; originally announced July 2023.

    Comments: 5 pages, 3 figures, ICASSP 2023

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5

  2. arXiv:2105.07031  [pdf, other]

    cs.SD eess.AS

    The Benefit Of Temporally-Strong Labels In Audio Event Classification

    Authors: Shawn Hershey, Daniel P W Ellis, Eduardo Fonseca, Aren Jansen, Caroline Liu, R Channing Moore, Manoj Plakal

    Abstract: To reveal the importance of temporal precision in ground truth audio event labels, we collected precise (~0.1 sec resolution) "strong" labels for a portion of the AudioSet dataset. We devised a temporally strong evaluation set (including explicit negatives of varying difficulty) and a small strong-labeled training subset of 67k clips (compared to the original dataset's 1.8M clips labeled at 10 sec…

    Submitted 14 May, 2021; originally announced May 2021.

    Comments: Accepted for publication at ICASSP 2021

  3. arXiv:2105.02132  [pdf, other]

    cs.SD cs.LG eess.AS

    Self-Supervised Learning from Automatically Separated Sound Scenes

    Authors: Eduardo Fonseca, Aren Jansen, Daniel P. W. Ellis, Scott Wisdom, Marco Tagliasacchi, John R. Hershey, Manoj Plakal, Shawn Hershey, R. Channing Moore, Xavier Serra

    Abstract: Real-world sound scenes consist of time-varying collections of sound sources, each generating characteristic sound events that are mixed together in audio recordings. The association of these constituent sound events with their mixture and each other is semantically constrained: the sound scene contains the union of source classes and not all classes naturally co-occur. With this motivation, this…

    Submitted 14 September, 2021; v1 submitted 5 May, 2021; originally announced May 2021.

  4. arXiv:2011.01143  [pdf, other]

    cs.SD cs.CV eess.AS

    Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds

    Authors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey

    Abstract: Recent progress in deep learning has enabled many advances in sound separation and visual scene understanding. However, extracting sound sources which are apparent in natural videos remains an open problem. In this work, we present AudioScope, a novel audio-visual sound separation framework that can be trained without supervision to isolate on-screen sound sources from real in-the-wild videos. Pri…

    Submitted 29 May, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: ICLR 2021, 27 pages

  5. arXiv:2005.00878  [pdf, other]

    cs.SD cs.LG eess.AS

    Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework With Loss Masking

    Authors: Eduardo Fonseca, Shawn Hershey, Manoj Plakal, Daniel P. W. Ellis, Aren Jansen, R. Channing Moore, Xavier Serra

    Abstract: The study of label noise in sound event recognition has recently gained attention with the advent of larger and noisier datasets. This work addresses the problem of missing labels, one of the big weaknesses of large audio datasets, and one of the most conspicuous issues for AudioSet. We propose a simple and model-agnostic method based on a teacher-student framework with loss masking to first ident…

    Submitted 25 July, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: Accepted in IEEE Signal Processing Letters, openly accessible at https://ieeexplore.ieee.org/document/9130823

    Journal ref: IEEE Signal Processing Letters, Vol. 27, 2020, pages 1235-1239

  6. arXiv:1911.05894  [pdf, other]

    cs.SD eess.AS stat.ML

    Coincidence, Categorization, and Consolidation: Learning to Recognize Sounds with Minimal Supervision

    Authors: Aren Jansen, Daniel P. W. Ellis, Shawn Hershey, R. Channing Moore, Manoj Plakal, Ashok C. Popat, Rif A. Saurous

    Abstract: Humans do not acquire perceptual abilities in the way we train machines. While machine learning algorithms typically operate on large collections of randomly-chosen, explicitly-labeled examples, human acquisition relies more heavily on multimodal unsupervised learning (as infants) and active learning (as children). With this motivation, we present a learning framework for sound representation and…

    Submitted 13 November, 2019; originally announced November 2019.

    Comments: This extended version of an ICASSP 2020 submission under the same title has an added figure and additional discussion for easier consumption

  7. arXiv:1711.02209  [pdf, ps, other]

    cs.SD eess.AS stat.ML

    Unsupervised Learning of Semantic Audio Representations

    Authors: Aren Jansen, Manoj Plakal, Ratheet Pandya, Daniel P. W. Ellis, Shawn Hershey, Jiayang Liu, R. Channing Moore, Rif A. Saurous

    Abstract: Even in the absence of any explicit semantic annotation, vast collections of audio recordings provide valuable information for learning the categorical structure of sounds. We consider several class-agnostic semantic constraints that apply to unlabeled nonspeech audio: (i) noise and translations in time do not change the underlying sound category, (ii) a mixture of two sound events inherits the ca…

    Submitted 6 November, 2017; originally announced November 2017.

    Comments: Submitted to ICASSP 2018

  8. arXiv:1609.09430  [pdf, other]

    cs.SD cs.LG stat.ML

    CNN Architectures for Large-Scale Audio Classification

    Authors: Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, R. Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron J. Weiss, Kevin Wilson

    Abstract: Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying th…

    Submitted 10 January, 2017; v1 submitted 29 September, 2016; originally announced September 2016.

    Comments: Accepted for publication at ICASSP 2017. Changes: Added definitions of mAP, AUC, and d-prime. Updated mAP/AUC/d-prime numbers for Audio Set based on changes of latest Audio Set revision. Changed wording to fit 4 page limit with new additions.

  9. arXiv:1212.2991  [pdf, other]

    cs.SE cs.AI stat.ML

    Accelerating Inference: towards a full Language, Compiler and Hardware stack

    Authors: Shawn Hershey, Jeff Bernstein, Bill Bradley, Andrew Schweitzer, Noah Stein, Theo Weber, Ben Vigoda

    Abstract: We introduce Dimple, a fully open-source API for probabilistic modeling. Dimple allows the user to specify probabilistic models in the form of graphical models, Bayesian networks, or factor graphs, and performs inference (by automatically deriving an inference engine from a variety of algorithms) on the model. Dimple also serves as a compiler for GP5, a hardware accelerator for inference.

    Submitted 12 December, 2012; originally announced December 2012.