
Showing 1–22 of 22 results for author: Seetharaman, P

  1. arXiv:2307.04686

    cs.SD cs.AI eess.AS

    VampNet: Music Generation via Masked Acoustic Token Modeling

    Authors: Hugo Flores Garcia, Prem Seetharaman, Rithesh Kumar, Bryan Pardo

    Abstract: We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation. We use a variable masking schedule during training which allows us to sample coherent music from the model by applying a variety of masking approaches (called prompts) during inference. VampNet is non-autoregressive, leveraging a bidirectional transformer architecture that at…

    Submitted 12 July, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

  2. arXiv:2306.06546

    cs.SD cs.LG eess.AS

    High-Fidelity Audio Compression with Improved RVQGAN

    Authors: Rithesh Kumar, Prem Seetharaman, Alejandro Luebs, Ishaan Kumar, Kundan Kumar

    Abstract: Language models have been successfully used to model natural signals, such as images, speech, and music. A key component of these models is a high quality neural compression model that can compress high-dimensional natural signals into lower dimensional discrete tokens. To that end, we introduce a high-fidelity universal neural audio compression algorithm that achieves ~90x compression of 44.1 KHz…

    Submitted 26 October, 2023; v1 submitted 10 June, 2023; originally announced June 2023.

    Comments: Accepted at NeurIPS 2023 (spotlight)

  3. arXiv:2208.12387

    cs.SD cs.LG eess.AS

    Music Separation Enhancement with Generative Modeling

    Authors: Noah Schaffer, Boaz Cogan, Ethan Manilow, Max Morrison, Prem Seetharaman, Bryan Pardo

    Abstract: Despite phenomenal progress in recent years, state-of-the-art music separation systems produce source estimates with significant perceptual shortcomings, such as adding extraneous noise or removing harmonics. We propose a post-processing model (the Make it Sound Good (MSG) post-processor) to enhance the output of music source separation systems. We apply our post-processing model to state-of-the-a…

    Submitted 25 August, 2022; originally announced August 2022.

    Comments: Accepted to ISMIR 2022

  4. arXiv:2204.05156

    cs.SD eess.AS

    How to Listen? Rethinking Visual Sound Localization

    Authors: Ho-Hsiang Wu, Magdalena Fuentes, Prem Seetharaman, Juan Pablo Bello

    Abstract: Localizing visual sounds consists of locating the position of objects that emit sound within an image. It is a growing research area with potential applications in monitoring natural and urban environments, such as wildlife migration and urban traffic. Previous works are usually evaluated with datasets having mostly a single dominant visible object, and proposed models usually require the introduc…

    Submitted 11 April, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022

  5. arXiv:2110.13071

    cs.SD cs.LG eess.AS

    Unsupervised Source Separation By Steering Pretrained Music Models

    Authors: Ethan Manilow, Patrick O'Reilly, Prem Seetharaman, Bryan Pardo

    Abstract: We showcase an unsupervised method that repurposes deep models trained for music generation and music tagging for audio source separation, without any retraining. An audio generation model is conditioned on an input mixture, producing a latent encoding of the audio used to generate audio. This generated audio is fed to a pretrained music tagger that creates source labels. The cross-entropy loss be…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP 2022

  6. arXiv:2110.11499

    cs.SD cs.LG eess.AS

    Wav2CLIP: Learning Robust Audio Representations From CLIP

    Authors: Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, Juan Pablo Bello

    Abstract: We propose Wav2CLIP, a robust audio representation learning method by distilling from Contrastive Language-Image Pre-training (CLIP). We systematically evaluate Wav2CLIP on a variety of audio tasks including classification, retrieval, and generation, and show that Wav2CLIP can outperform several publicly available pre-trained audio representation algorithms. Wav2CLIP projects audio into a shared e…

    Submitted 15 February, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

    Comments: Copyright 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  7. arXiv:2110.10139

    eess.AS cs.SD

    Chunked Autoregressive GAN for Conditional Waveform Synthesis

    Authors: Max Morrison, Rithesh Kumar, Kundan Kumar, Prem Seetharaman, Aaron Courville, Yoshua Bengio

    Abstract: Conditional waveform synthesis models learn a distribution of audio waveforms given conditioning such as text, mel-spectrograms, or MIDI. These systems employ deep generative models that model the waveform via either sequential (autoregressive) or parallel (non-autoregressive) sampling. Generative adversarial networks (GANs) have become a common choice for non-autoregressive waveform synthesis. Ho…

    Submitted 3 March, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: Published as a conference paper at ICLR 2022

  8. arXiv:2011.00803

    cs.SD eess.AS

    What's All the FUSS About Free Universal Sound Separation Data?

    Authors: Scott Wisdom, Hakan Erdogan, Daniel Ellis, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Justin Salamon, Prem Seetharaman, John Hershey

    Abstract: We introduce the Free Universal Sound Separation (FUSS) dataset, a new corpus for experiments in separating mixtures of an unknown number of sounds from an open domain of sound types. The dataset consists of 23 hours of single-source audio data drawn from 357 classes, which are used to create mixtures of one to four sources. To simulate reverberation, an acoustic room simulator is used to generate…

    Submitted 2 November, 2020; originally announced November 2020.

  9. arXiv:2011.00801

    cs.SD eess.AS

    Sound Event Detection and Separation: a Benchmark on Desed Synthetic Soundscapes

    Authors: Nicolas Turpault, Romain Serizel, Scott Wisdom, Hakan Erdogan, John Hershey, Eduardo Fonseca, Prem Seetharaman, Justin Salamon

    Abstract: We propose a benchmark of state-of-the-art sound event detection systems (SED). We designed synthetic evaluation sets to focus on specific sound event detection challenges. We analyze the performance of the submissions to DCASE 2021 task 4 depending on time related modifications (time position of an event and length of clips) and we study the impact of non-target sound events and reverberation. We…

    Submitted 2 November, 2020; originally announced November 2020.

  10. arXiv:2010.12650

    cs.SD cs.LG

    A Study of Transfer Learning in Music Source Separation

    Authors: Andreas Bugler, Bryan Pardo, Prem Seetharaman

    Abstract: Supervised deep learning methods for performing audio source separation can be very effective in domains where there is a large amount of training data. While some music domains have enough data suitable for training a separation system, such as rock and pop genres, many musical domains do not, such as classical music, choral music, and non-Western music traditions. It is well known that transferr…

    Submitted 23 October, 2020; originally announced October 2020.

    Comments: 4 pages + 1 reference page. 3 figures. Submitted to ICASSP

    ACM Class: I.5.4

  11. arXiv:2007.14469

    eess.AS cs.LG cs.SD stat.ML

    AutoClip: Adaptive Gradient Clipping for Source Separation Networks

    Authors: Prem Seetharaman, Gordon Wichern, Bryan Pardo, Jonathan Le Roux

    Abstract: Clipping the gradient is a known approach to improving gradient descent, but requires hand selection of a clipping threshold hyperparameter. We present AutoClip, a simple method for automatically and adaptively choosing a gradient clipping threshold, based on the history of gradient norms observed during training. Experimental results show that applying AutoClip results in improved generalization…

    Submitted 25 July, 2020; originally announced July 2020.

    Comments: Accepted at 2020 IEEE International Workshop on Machine Learning for Signal Processing, Sept. 21–24, 2020, Espoo, Finland

  12. arXiv:2007.06123

    cs.SD cs.LG eess.AS stat.ML

    OtoWorld: Towards Learning to Separate by Learning to Move

    Authors: Omkar Ranadive, Grant Gasser, David Terpay, Prem Seetharaman

    Abstract: We present OtoWorld, an interactive environment in which agents must learn to listen in order to solve navigational tasks. The purpose of OtoWorld is to facilitate reinforcement learning research in computer audition, where agents must learn to listen to the world around them to navigate. OtoWorld is built on three open source libraries: OpenAI Gym for environment and agent interaction, PyRoomAcou…

    Submitted 12 July, 2020; originally announced July 2020.

    Comments: Published in Self Supervision in Audio and Speech Workshop, 37th International Conference on Machine Learning, Vienna, Austria (ICML 2020)

  13. arXiv:2007.03932

    cs.SD eess.AS eess.SP

    Improving Sound Event Detection In Domestic Environments Using Sound Separation

    Authors: Nicolas Turpault, Scott Wisdom, Hakan Erdogan, John Hershey, Romain Serizel, Eduardo Fonseca, Prem Seetharaman, Justin Salamon

    Abstract: Performing sound event detection on real-world recordings often implies dealing with overlapping target sound events and non-target sounds, also referred to as interference or noise. Until now these problems were mainly tackled at the classifier level. We propose to use sound separation as a pre-processing for sound event detection. In this paper we start from a sound separation model trained on t…

    Submitted 8 July, 2020; originally announced July 2020.

  14. arXiv:2006.13331

    cs.SD cs.LG eess.AS stat.ML

    Incorporating Music Knowledge in Continual Dataset Augmentation for Music Generation

    Authors: Alisa Liu, Alexander Fang, Gaëtan Hadjeres, Prem Seetharaman, Bryan Pardo

    Abstract: Deep learning has rapidly become the state-of-the-art approach for music generation. However, training a deep model typically requires a large training set, which is often not available for specific musical styles. In this paper, we present augmentative generation (Aug-Gen), a method of dataset augmentation for any music generation system trained on a resource-constrained domain. The key intuition…

    Submitted 20 July, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: 2 pages, 2 figures, Machine Learning for Media Discovery (ML4MD) Workshop at ICML 2020

  15. arXiv:2006.13329

    cs.SD cs.LG eess.AS stat.ML

    Bach or Mock? A Grading Function for Chorales in the Style of J.S. Bach

    Authors: Alexander Fang, Alisa Liu, Prem Seetharaman, Bryan Pardo

    Abstract: Deep generative systems that learn probabilistic models from a corpus of existing music do not explicitly encode knowledge of a musical style, compared to traditional rule-based systems. Thus, it can be difficult to determine whether deep models generate stylistically correct output without expert evaluation, but this is expensive and time-consuming. Therefore, there is a need for automatic, inter…

    Submitted 17 July, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: 2 pages, 3 figures, Machine Learning for Media Discovery (ML4MD) Workshop at ICML 2020

  16. arXiv:1910.12626

    eess.AS cs.LG cs.SD stat.ML

    Model selection for deep audio source separation via clustering analysis

    Authors: Alisa Liu, Prem Seetharaman, Bryan Pardo

    Abstract: Audio source separation is the process of separating a mixture (e.g. a pop band recording) into isolated sounds from individual sources (e.g. just the lead vocals). Deep learning models are the state-of-the-art in source separation, given that the mixture to be separated is similar to the mixtures the deep model was trained on. This requires the end user to know enough about each model's training…

    Submitted 26 July, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

  17. arXiv:1910.12621

    eess.AS cs.LG cs.SD

    Simultaneous Separation and Transcription of Mixtures with Multiple Polyphonic and Percussive Instruments

    Authors: Ethan Manilow, Prem Seetharaman, Bryan Pardo

    Abstract: We present a single deep learning architecture that can both separate an audio recording of a musical mixture into constituent single-instrument recordings and transcribe these instruments into a human-readable format at the same time, learning a shared musical representation for both tasks. This novel architecture, which we call Cerberus, builds on the Chimera network for source separation by add…

    Submitted 12 February, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: Accepted to ICASSP 2020

  18. arXiv:1910.11133

    cs.SD cs.LG eess.AS stat.ML

    Bootstrapping deep music separation from primitive auditory grouping principles

    Authors: Prem Seetharaman, Gordon Wichern, Jonathan Le Roux, Bryan Pardo

    Abstract: Separating an audio scene such as a cocktail party into constituent, meaningful components is a core task in computer audition. Deep networks are the state-of-the-art approach. They are trained on synthetic mixtures of audio made from isolated sound source recordings so that ground truth for the separation is known. However, the vast majority of available audio is not isolated. The brain uses prim…

    Submitted 23 October, 2019; originally announced October 2019.

  19. arXiv:1909.08494

    cs.SD cs.LG eess.AS

    Cutting Music Source Separation Some Slakh: A Dataset to Study the Impact of Training Data Quality and Quantity

    Authors: Ethan Manilow, Gordon Wichern, Prem Seetharaman, Jonathan Le Roux

    Abstract: Music source separation performance has greatly improved in recent years with the advent of approaches based on deep learning. Such methods typically require large amounts of labelled training data, which in the case of music consist of mixtures and corresponding instrument stems. However, stems are unavailable for most commercial music, and only limited datasets have so far been released to the p…

    Submitted 18 September, 2019; originally announced September 2019.

    Comments: Accepted for publication at WASPAA 2019

  20. arXiv:1811.03076

    cs.SD cs.LG eess.AS stat.ML

    Class-conditional embeddings for music source separation

    Authors: Prem Seetharaman, Gordon Wichern, Shrikant Venkataramani, Jonathan Le Roux

    Abstract: Isolating individual instruments in a musical mixture has a myriad of potential applications, and seems imminently achievable given the levels of performance reached by recent deep learning methods. While most musical source separation techniques learn an independent model for each instrument, we propose using a common embedding space for the time-frequency bins of all instruments in a mixture ins…

    Submitted 7 November, 2018; originally announced November 2018.

    Comments: 5 pages

  21. arXiv:1811.02130

    cs.SD cs.AI cs.LG eess.AS stat.ML

    Bootstrapping single-channel source separation via unsupervised spatial clustering on stereo mixtures

    Authors: Prem Seetharaman, Gordon Wichern, Jonathan Le Roux, Bryan Pardo

    Abstract: Separating an audio scene into isolated sources is a fundamental problem in computer audition, analogous to image segmentation in visual scene analysis. Source separation systems based on deep learning are currently the most successful approaches for solving the underdetermined separation problem, where there are more sources than channels. Traditionally, such systems are trained on sound mixtures…

    Submitted 5 November, 2018; originally announced November 2018.

    Comments: 5 pages, 2 figures

  22. arXiv:1606.03539

    cs.CY

    Firm Growth and Innovation in the ERP Industry: A Systems Thinking Approach

    Authors: Srujana Pinjala, Rahul Roy, Priya Seetharaman

    Abstract: Achievement and sustenance of growth are essential themes in organizational literature. In our paper, we develop models using systems thinking approach to understand how firms achieve and sustain growth in a technology-intensive product domain. We augment these to explain the possible impact of a disruptive technological innovation. We use enterprise software industry as the context where SAP has…

    Submitted 10 June, 2016; originally announced June 2016.

    Comments: Research-in-progress ISBN# 978-0-646-95337-3 Presented at the Australasian Conference on Information Systems 2015 (arXiv:1605.01032)

    Report number: ACIS/2015/219