
Showing 1–14 of 14 results for author: Donahue, C

Searching in archive eess.
  1. arXiv:2311.07069  [pdf, other]

    cs.SD eess.AS

    Music ControlNet: Multiple Time-varying Controls for Music Generation

    Authors: Shih-Lun Wu, Chris Donahue, Shinji Watanabe, Nicholas J. Bryan

    Abstract: Text-to-music generation models are now capable of generating high-quality music audio in broad styles. However, text control is primarily suitable for the manipulation of global musical attributes like genre, mood, and tempo, and is less suitable for precise control over time-varying attributes such as the positions of beats in time or the changing dynamics of the music. We propose Music ControlN…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: 11 pages, 4 figures, 5 tables, Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

  2. arXiv:2306.08620  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Anticipatory Music Transformer

    Authors: John Thickstun, David Hall, Chris Donahue, Percy Liang

    Abstract: We introduce anticipation: a method for constructing a controllable generative model of a temporal point process (the event process) conditioned asynchronously on realizations of a second, correlated process (the control process). We achieve this by interleaving sequences of events and controls, such that controls appear following stopping times in the event sequence. This work is motivated by pro…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: 33 pages, 6 figures

  3. arXiv:2305.06594  [pdf, other]

    cs.SD cs.CV cs.LG cs.MM eess.AS

    V2Meow: Meowing to the Visual Beat via Video-to-Music Generation

    Authors: Kun Su, Judith Yue Li, Qingqing Huang, Dima Kuzmin, Joonseok Lee, Chris Donahue, Fei Sha, Aren Jansen, Yu Wang, Mauro Verzetti, Timo I. Denk

    Abstract: Video-to-music generation demands both a temporally localized high-quality listening experience and globally aligned video-acoustic signatures. While recent music generation models excel at the former through advanced audio codecs, the exploration of video-acoustic signatures has been confined to specific visual scenarios. In contrast, our research confronts the challenge of learning globally alig…

    Submitted 22 February, 2024; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: accepted at AAAI 2024, music samples available at https://tinyurl.com/v2meow

  4. arXiv:2301.12662  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    SingSong: Generating musical accompaniments from singing

    Authors: Chris Donahue, Antoine Caillon, Adam Roberts, Ethan Manilow, Philippe Esling, Andrea Agostinelli, Mauro Verzetti, Ian Simon, Olivier Pietquin, Neil Zeghidour, Jesse Engel

    Abstract: We present SingSong, a system that generates instrumental music to accompany input vocals, potentially offering musicians and non-musicians alike an intuitive new way to create music featuring their own voice. To accomplish this, we build on recent developments in musical source separation and audio generation. Specifically, we apply a state-of-the-art source separation algorithm to a large corpus…

    Submitted 29 January, 2023; originally announced January 2023.

  5. arXiv:2212.01884  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Melody transcription via generative pre-training

    Authors: Chris Donahue, John Thickstun, Percy Liang

    Abstract: Despite the central role that melody plays in music perception, it remains an open challenge in music information retrieval to reliably detect the notes of the melody present in an arbitrary music recording. A key challenge in melody transcription is building methods which can handle broad audio containing any number of instrument ensembles and musical styles - existing strategies work well for so…

    Submitted 4 December, 2022; originally announced December 2022.

    Comments: Published as a conference paper at ISMIR 2022

  6. arXiv:2202.09729  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    It's Raw! Audio Generation with State-Space Models

    Authors: Karan Goel, Albert Gu, Chris Donahue, Christopher Ré

    Abstract: Developing architectures suitable for modeling raw audio is a challenging problem due to the high sampling rates of audio waveforms. Standard sequence modeling approaches like RNNs and CNNs have previously been tailored to fit the demands of audio, but the resultant architectures make undesirable computational tradeoffs and struggle to model waveforms effectively. We propose SaShiMi, a new multi-s…

    Submitted 19 February, 2022; originally announced February 2022.

    Comments: 23 pages, 7 figures, 7 tables

  7. arXiv:2107.05916  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Towards Automatic Instrumentation by Learning to Separate Parts in Symbolic Multitrack Music

    Authors: Hao-Wen Dong, Chris Donahue, Taylor Berg-Kirkpatrick, Julian McAuley

    Abstract: Modern keyboards allow a musician to play multiple instruments at the same time by assigning zones -- fixed pitch ranges of the keyboard -- to different instruments. In this paper, we aim to further extend this idea and examine the feasibility of automatic instrumentation -- dynamically assigning instruments to notes in solo music during performance. In addition to the online, real-time-capable se…

    Submitted 21 October, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

    Comments: ISMIR 2021 camera ready

  8. arXiv:2107.05677  [pdf, other]

    cs.SD cs.IR cs.LG cs.MM eess.AS

    Codified audio language modeling learns useful representations for music information retrieval

    Authors: Rodrigo Castellon, Chris Donahue, Percy Liang

    Abstract: We demonstrate that language models pre-trained on codified (discretely-encoded) music audio learn representations that are useful for downstream MIR tasks. Specifically, we explore representations from Jukebox (Dhariwal et al. 2020): a music generation system containing a language model trained on codified audio from 1M songs. To determine if Jukebox's representations contain useful information f…

    Submitted 12 July, 2021; originally announced July 2021.

    Comments: To appear in the proceedings of ISMIR 2021

  9. arXiv:1907.04868  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS stat.ML

    LakhNES: Improving multi-instrumental music generation with cross-domain pre-training

    Authors: Chris Donahue, Huanru Henry Mao, Yiting Ethan Li, Garrison W. Cottrell, Julian McAuley

    Abstract: We are interested in the task of generating multi-instrumental music scores. The Transformer architecture has recently shown great promise for the task of piano score generation; here we adapt it to the multi-instrumental setting. Transformers are complex, high-dimensional language models which are capable of capturing long-term structure in sequence data, but require large amounts of data to fit.…

    Submitted 10 July, 2019; originally announced July 2019.

    Comments: Published as a conference paper at ISMIR 2019

  10. arXiv:1904.07944  [pdf, other]

    cs.SD cs.LG eess.AS

    Expediting TTS Synthesis with Adversarial Vocoding

    Authors: Paarth Neekhara, Chris Donahue, Miller Puckette, Shlomo Dubnov, Julian McAuley

    Abstract: Recent approaches in text-to-speech (TTS) synthesis employ neural network strategies to vocode perceptually-informed spectrogram representations directly into listenable waveforms. Such vocoding procedures create a computational bottleneck in modern TTS pipelines. We propose an alternative approach which utilizes generative adversarial networks (GANs) to learn mappings from perceptually-informed s…

    Submitted 25 July, 2019; v1 submitted 16 April, 2019; originally announced April 2019.

    Comments: Published as a conference paper at INTERSPEECH 2019

  11. arXiv:1902.08710  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    GANSynth: Adversarial Neural Audio Synthesis

    Authors: Jesse Engel, Kumar Krishna Agrawal, Shuo Chen, Ishaan Gulrajani, Chris Donahue, Adam Roberts

    Abstract: Efficient audio synthesis is an inherently difficult machine learning task, as human perception is sensitive to both global structure and fine-scale waveform coherence. Autoregressive models, such as WaveNet, model local structure at the expense of global latent structure and slow iterative sampling, while Generative Adversarial Networks (GANs), have global latent conditioning and efficient parall…

    Submitted 14 April, 2019; v1 submitted 22 February, 2019; originally announced February 2019.

    Comments: Colab Notebook: http://goo.gl/magenta/gansynth-demo

  12. arXiv:1810.05246  [pdf, other]

    cs.LG cs.HC cs.SD eess.AS stat.ML

    Piano Genie

    Authors: Chris Donahue, Ian Simon, Sander Dieleman

    Abstract: We present Piano Genie, an intelligent controller which allows non-musicians to improvise on the piano. With Piano Genie, a user performs on a simple interface with eight buttons, and their performance is decoded into the space of plausible piano music in real time. To learn a suitable mapping procedure for this problem, we train recurrent neural network autoencoders with discrete bottlenecks: an…

    Submitted 22 March, 2019; v1 submitted 11 October, 2018; originally announced October 2018.

    Comments: Published as a conference paper at ACM IUI 2019

  13. arXiv:1806.04278  [pdf, other]

    cs.SD cs.LG cs.NE eess.AS

    The NES Music Database: A multi-instrumental dataset with expressive performance attributes

    Authors: Chris Donahue, Huanru Henry Mao, Julian McAuley

    Abstract: Existing research on music generation focuses on composition, but often ignores the expressive performance characteristics required for plausible renditions of resultant pieces. In this paper, we introduce the Nintendo Entertainment System Music Database (NES-MDB), a large corpus allowing for separate examination of the tasks of composition and performance. NES-MDB contains thousands of multi-inst…

    Submitted 11 June, 2018; originally announced June 2018.

    Comments: Published as a conference paper at ISMIR 2018

  14. arXiv:1711.05747  [pdf, other]

    cs.SD cs.LG cs.NE eess.AS

    Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition

    Authors: Chris Donahue, Bo Li, Rohit Prabhavalkar

    Abstract: We investigate the effectiveness of generative adversarial networks (GANs) for speech enhancement, in the context of improving noise robustness of automatic speech recognition (ASR) systems. Prior work demonstrates that GANs can effectively suppress additive noise in raw waveform speech signals, improving perceptual quality metrics; however this technique was not justified in the context of ASR. I…

    Submitted 30 October, 2018; v1 submitted 15 November, 2017; originally announced November 2017.

    Comments: Published as a conference paper at ICASSP 2018