
Showing 1–11 of 11 results for author: Dieleman, S

Searching in archive eess.
  1. arXiv:2403.14048  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    The NeurIPS 2023 Machine Learning for Audio Workshop: Affective Audio Benchmarks and Novel Data

    Authors: Alice Baird, Rachel Manzelli, Panagiotis Tzirakis, Chris Gagne, Haoqi Li, Sadie Allen, Sander Dieleman, Brian Kulis, Shrikanth S. Narayanan, Alan Cowen

    Abstract: The NeurIPS 2023 Machine Learning for Audio Workshop brings together machine learning (ML) experts from various audio domains. There are several valuable audio-driven ML tasks, from speech emotion recognition to audio event detection, but the community is sparse compared to other ML areas, e.g., computer vision or natural language processing. A major limitation with audio is the available data; wi…

    Submitted 20 March, 2024; originally announced March 2024.

  2. arXiv:2202.07765  [pdf, other]

    cs.LG cs.AI cs.CV cs.SD eess.AS

    General-purpose, long-context autoregressive modeling with Perceiver AR

    Authors: Curtis Hawthorne, Andrew Jaegle, Cătălina Cangea, Sebastian Borgeaud, Charlie Nash, Mateusz Malinowski, Sander Dieleman, Oriol Vinyals, Matthew Botvinick, Ian Simon, Hannah Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, João Carreira, Jesse Engel

    Abstract: Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. However, the most commonly used autoregressive models, Transformers, are prohibitively expensive to scale to the number of inputs and layers needed to capture this long-range structure. We develop Perceiver AR, an autoregressive, modality-agnostic…

    Submitted 14 June, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

    Comments: ICML 2022

  3. arXiv:2111.12124  [pdf, ps, other]

    cs.SD eess.AS

    Towards Learning Universal Audio Representations

    Authors: Luyu Wang, Pauline Luc, Yan Wu, Adria Recasens, Lucas Smaira, Andrew Brock, Andrew Jaegle, Jean-Baptiste Alayrac, Sander Dieleman, Joao Carreira, Aaron van den Oord

    Abstract: The ability to learn universal audio representations that can solve diverse speech, music, and environment tasks can spur many applications that require general sound content understanding. In this work, we introduce a holistic audio representation evaluation suite (HARES) spanning 12 downstream tasks across audio domains and provide a thorough empirical study of recent sound representation learni…

    Submitted 23 June, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

  4. arXiv:2103.06089  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS

    Variable-rate discrete representation learning

    Authors: Sander Dieleman, Charlie Nash, Jesse Engel, Karen Simonyan

    Abstract: Semantically meaningful information content in perceptual signals is usually unevenly distributed. In speech signals for example, there are often many silences, and the speed of pronunciation can vary considerably. In this work, we propose slow autoencoders (SlowAEs) for unsupervised learning of high-level variable-rate discrete representations of sequences, and apply them to speech. We show that…

    Submitted 10 March, 2021; originally announced March 2021.

    Comments: 26 pages, 15 figures, samples can be found at https://vdrl.github.io/

  5. arXiv:2006.03575  [pdf, other]

    cs.SD cs.LG eess.AS

    End-to-End Adversarial Text-to-Speech

    Authors: Jeff Donahue, Sander Dieleman, Mikołaj Bińkowski, Erich Elsen, Karen Simonyan

    Abstract: Modern text-to-speech synthesis pipelines typically involve multiple processing stages, each of which is designed or learnt independently from the rest. In this work, we take on the challenging task of learning to synthesise speech from normalised text or phonemes in an end-to-end manner, resulting in models which operate directly on character or phoneme input sequences and produce raw speech audi…

    Submitted 17 March, 2021; v1 submitted 5 June, 2020; originally announced June 2020.

    Comments: 23 pages. In proceedings of ICLR 2021

  6. arXiv:1909.11646  [pdf, other]

    cs.SD cs.LG eess.AS

    High Fidelity Speech Synthesis with Adversarial Networks

    Authors: Mikołaj Bińkowski, Jeff Donahue, Sander Dieleman, Aidan Clark, Erich Elsen, Norman Casagrande, Luis C. Cobo, Karen Simonyan

    Abstract: Generative adversarial networks have seen rapid development in recent years and have led to remarkable improvements in generative modelling of images. However, their application in the audio domain has received limited attention, and autoregressive models, such as WaveNet, remain the state of the art in generative modelling of audio signals such as human speech. To address this paucity, we introdu…

    Submitted 26 September, 2019; v1 submitted 25 September, 2019; originally announced September 2019.

  7. arXiv:1810.12247  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset

    Authors: Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, Douglas Eck

    Abstract: Generating musical audio directly with neural networks is notoriously difficult because it requires coherently modeling structure at many different timescales. Fortunately, most music is also highly structured and can be represented as discrete note events played on musical instruments. Herein, we show that by using notes as an intermediate representation, we can train a suite of models capable of…

    Submitted 17 January, 2019; v1 submitted 29 October, 2018; originally announced October 2018.

    Comments: Examples available at https://goo.gl/magenta/maestro-examples

  8. arXiv:1810.05246  [pdf, other]

    cs.LG cs.HC cs.SD eess.AS stat.ML

    Piano Genie

    Authors: Chris Donahue, Ian Simon, Sander Dieleman

    Abstract: We present Piano Genie, an intelligent controller which allows non-musicians to improvise on the piano. With Piano Genie, a user performs on a simple interface with eight buttons, and their performance is decoded into the space of plausible piano music in real time. To learn a suitable mapping procedure for this problem, we train recurrent neural network autoencoders with discrete bottlenecks: an…

    Submitted 22 March, 2019; v1 submitted 11 October, 2018; originally announced October 2018.

    Comments: Published as a conference paper at ACM IUI 2019

  9. arXiv:1808.03715  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    This Time with Feeling: Learning Expressive Musical Performance

    Authors: Sageev Oore, Ian Simon, Sander Dieleman, Douglas Eck, Karen Simonyan

    Abstract: Music generation has generally been focused on either creating scores or interpreting them. We discuss differences between these two problems and propose that, in fact, it may be valuable to work in the space of direct performance generation: jointly predicting the notes and also their expressive timing and dynamics. We consider the significance and qualities of the data set need…

    Submitted 10 August, 2018; originally announced August 2018.

    Comments: Includes links to urls for audio samples

  10. arXiv:1806.10474  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    The challenge of realistic music generation: modelling raw audio at scale

    Authors: Sander Dieleman, Aäron van den Oord, Karen Simonyan

    Abstract: Realistic music generation is a challenging task. When building generative models of music that are learnt from data, typically high-level representations such as scores or MIDI are used that abstract away the idiosyncrasies of a particular performance. But these nuances are very important for our perception of musicality and realism, so in this work we embark on modelling music in the raw audio d…

    Submitted 26 June, 2018; originally announced June 2018.

    Comments: 13 pages, 2 figures, submitted to NIPS 2018

  11. arXiv:1802.08435  [pdf, other]

    cs.SD cs.LG eess.AS

    Efficient Neural Audio Synthesis

    Authors: Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, Aaron van den Oord, Sander Dieleman, Koray Kavukcuoglu

    Abstract: Sequential models achieve state-of-the-art results in audio, visual and textual domains with respect to both estimating the data distribution and generating high-quality samples. Efficient sampling for this class of models has however remained an elusive problem. With a focus on text-to-speech synthesis, we describe a set of general techniques for reducing sampling time while maintaining high outp…

    Submitted 25 June, 2018; v1 submitted 23 February, 2018; originally announced February 2018.

    Comments: 10 pages