Showing 1–10 of 10 results for author: Bittner, R

  1. arXiv:2310.07160  [pdf, other]

    cs.SD cs.LG eess.AS

    LLark: A Multimodal Instruction-Following Language Model for Music

    Authors: Josh Gardner, Simon Durand, Daniel Stoller, Rachel M. Bittner

    Abstract: Music has a unique and complex structure which is challenging for both expert humans and existing AI systems to understand, and presents unique challenges relative to other forms of audio. We present LLark, an instruction-tuned multimodal model for \emph{music} understanding. We detail our process for dataset creation, which involves augmenting the annotations of diverse open-source music datasets…

    Submitted 2 June, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: ICML camera-ready version

  2. arXiv:2205.01273  [pdf, other]

    cs.SD eess.AS

    Few-Shot Musical Source Separation

    Authors: Yu Wang, Daniel Stoller, Rachel M. Bittner, Juan Pablo Bello

    Abstract: Deep learning-based approaches to musical source separation are often limited to the instrument classes that the models are trained on and do not generalize to separate unseen instruments. To address this, we propose a few-shot musical source separation paradigm. We condition a generic U-Net source separation model using few audio examples of the target instrument. We train a few-shot conditioning…

    Submitted 2 May, 2022; originally announced May 2022.

    Comments: ICASSP 2022

  3. arXiv:2203.09893  [pdf, other]

    cs.SD cs.LG eess.AS

    A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation

    Authors: Rachel M. Bittner, Juan José Bosch, David Rubinstein, Gabriel Meseguer-Brocal, Sebastian Ewert

    Abstract: Automatic Music Transcription (AMT) has been recognized as a key enabling technology with a wide range of applications. Given the task's complexity, best results have typically been reported for systems focusing on specific settings, e.g. instrument-specific systems tend to yield improved results over instrument-agnostic methods. Similarly, higher accuracy can be obtained when only estimating fram…

    Submitted 12 May, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

  4. arXiv:2110.05580  [pdf, other]

    cs.SD eess.AS

    vocadito: A dataset of solo vocals with $f_0$, note, and lyric annotations

    Authors: Rachel M. Bittner, Katherine Pasalo, Juan José Bosch, Gabriel Meseguer-Brocal, David Rubinstein

    Abstract: To complement the existing set of datasets, we present a small dataset entitled vocadito, consisting of 40 short excerpts of monophonic singing, sung in 7 different languages by singers with varying levels of training, and recorded on a variety of devices. We provide several types of annotations, including $f_0$, lyrics, and two different note annotations. All annotations were created by musici…

    Submitted 29 October, 2021; v1 submitted 11 October, 2021; originally announced October 2021.

  5. Audio-based Musical Version Identification: Elements and Challenges

    Authors: Furkan Yesiler, Guillaume Doras, Rachel M. Bittner, Christopher J. Tralie, Joan Serrà

    Abstract: In this article, we aim to provide a review of the key ideas and approaches proposed in 20 years of scientific literature around musical version identification (VI) research and connect them to current practice. For more than a decade, VI systems suffered from the accuracy-scalability trade-off, with attempts to increase accuracy that typically resulted in cumbersome, non-scalable systems. Recent…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: Accepted to be published in IEEE Signal Processing Magazine

  6. arXiv:2103.12864  [pdf, other]

    cs.SD eess.AS

    Learned complex masks for multi-instrument source separation

    Authors: Andreas Jansson, Rachel M. Bittner, Nicola Montecchio, Tillman Weyde

    Abstract: Music source separation in the time-frequency domain is commonly achieved by applying a soft or binary mask to the magnitude component of (complex) spectrograms. The phase component is usually not estimated, but instead copied from the mixture and applied to the magnitudes of the estimated isolated sources. While this method has several practical advantages, it imposes an upper bound on the perfor…

    Submitted 23 March, 2021; originally announced March 2021.

  7. arXiv:2008.02069  [pdf, other]

    cs.LG cs.IR cs.SD eess.AS stat.ML

    Data Cleansing with Contrastive Learning for Vocal Note Event Annotations

    Authors: Gabriel Meseguer-Brocal, Rachel Bittner, Simon Durand, Brian Brost

    Abstract: Data cleansing is a well studied strategy for cleaning erroneous labels in datasets, which has not yet been widely adopted in Music Information Retrieval. Previously proposed data cleansing models do not consider structured (e.g. time varying) labels, such as those common to music data. We propose a novel data cleansing model for time-varying, structured labels which exploits the local structure o…

    Submitted 27 April, 2021; v1 submitted 5 August, 2020; originally announced August 2020.

    Comments: 21st International Society for Music Information Retrieval Conference 11-15 October 2020, Montreal, Canada

  8. arXiv:1903.07515  [pdf, other]

    stat.ML cs.LG

    Approximating exponential family models (not single distributions) with a two-network architecture

    Authors: Sean R. Bittner, John P. Cunningham

    Abstract: Recently much attention has been paid to deep generative models, since they have been used to great success for variational inference, generation of complex data types, and more. In most all of these settings, the goal has been to find a particular member of that model family: optimized parameters index a distribution that is close (via a divergence or classification metric) to a target distributi…

    Submitted 18 March, 2019; originally announced March 2019.

  9. arXiv:1811.00223  [pdf, other]

    cs.SD eess.AS stat.ML

    Neural Music Synthesis for Flexible Timbre Control

    Authors: Jong Wook Kim, Rachel Bittner, Aparna Kumar, Juan Pablo Bello

    Abstract: The recent success of raw audio waveform synthesis models like WaveNet motivates a new approach for music synthesis, in which the entire process --- creating audio samples from a score and instrument information --- is modeled using generative neural networks. This paper describes a neural music synthesis model with flexible timbre controls, which consists of a recurrent neural network conditioned…

    Submitted 1 November, 2018; originally announced November 2018.

  10. arXiv:1809.00381  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Multitask Learning for Fundamental Frequency Estimation in Music

    Authors: Rachel M. Bittner, Brian McFee, Juan P. Bello

    Abstract: Fundamental frequency (f0) estimation from polyphonic music includes the tasks of multiple-f0, melody, vocal, and bass line estimation. Historically these problems have been approached separately, and only recently, using learning-based approaches. We present a multitask deep learning architecture that jointly estimates outputs for various tasks including multiple-f0, melody, vocal and bass line e…

    Submitted 2 September, 2018; originally announced September 2018.