Showing 1–21 of 21 results for author: Lattner, S

  1. arXiv:2406.08384

    cs.SD cs.AI eess.AS

    Diff-A-Riff: Musical Accompaniment Co-creation via Latent Diffusion Models

    Authors: Javier Nistal, Marco Pasini, Cyran Aouameur, Maarten Grachten, Stefan Lattner

    Abstract: Recent advancements in deep generative models present new opportunities for music production but also pose challenges, such as high computational demands and limited audio quality. Moreover, current systems frequently rely solely on text input and typically focus on producing complete musical pieces, which is incompatible with existing workflows in music production. To address these issues, we int…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 8 pages, 2 figures, 3 tables

  2. arXiv:2405.08679

    cs.SD cs.AI cs.LG eess.AS

    Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning

    Authors: Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Geoffroy Peeters

    Abstract: This paper addresses the problem of self-supervised general-purpose audio representation learning. We explore the use of Joint-Embedding Predictive Architectures (JEPA) for this task, which consists of splitting an input mel-spectrogram into two parts (context and target), computing neural representations for each, and training the neural network to predict the target representations from the cont…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Self-supervision in Audio, Speech and Beyond workshop, IEEE International Conference on Acoustics, Speech, and Signal Processing, 2024

  3. arXiv:2402.01412

    cs.SD cs.LG eess.AS

    Bass Accompaniment Generation via Latent Diffusion

    Authors: Marco Pasini, Maarten Grachten, Stefan Lattner

    Abstract: The ability to automatically generate music that appropriately matches an arbitrary input track is a challenging task. We present a novel controllable system for generating single stems to accompany musical mixes of arbitrary length. At the core of our method are audio autoencoders that efficiently compress audio waveform samples into invertible latent representations, and a conditional latent dif…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: ICASSP 2024

  4. arXiv:2401.05064

    cs.SD cs.LG eess.AS

    Singer Identity Representation Learning using Self-Supervised Techniques

    Authors: Bernardo Torres, Stefan Lattner, Gaël Richard

    Abstract: Significant strides have been made in creating voice identity representations using speech data. However, the same level of progress has not been achieved for singing voices. To bridge this gap, we suggest a framework for training singer identity encoders to extract representations suitable for various singing-related tasks, such as singing voice similarity and synthesis. We explore different self…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: Accepted at the ISMIR conference, Milan, Italy, 2023

    Journal ref: Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR 2023), Milan, Italy

  5. arXiv:2311.13058

    cs.SD eess.AS

    Self-Supervised Music Source Separation Using Vector-Quantized Source Category Estimates

    Authors: Marco Pasini, Stefan Lattner, George Fazekas

    Abstract: Music source separation is focused on extracting distinct sonic elements from composite tracks. Historically, many methods have been grounded in supervised learning, necessitating labeled data, which is occasionally constrained in its diversity. More recent methods have delved into N-shot techniques that utilize one or more audio samples to aid in the separation. However, a challenge with some of…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 4 pages, 2 figures, 1 table; Accepted at the 37th Conference on Neural Information Processing Systems (2023), Machine Learning for Audio Workshop

  6. arXiv:2309.02265

    eess.AS cs.SD

    PESTO: Pitch Estimation with Self-supervised Transposition-equivariant Objective

    Authors: Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Geoffroy Peeters

    Abstract: In this paper, we address the problem of pitch estimation using Self Supervised Learning (SSL). The SSL paradigm we use is equivariance to pitch transposition, which enables our model to accurately perform pitch estimation on monophonic audio after being trained only on a small unlabeled dataset. We use a lightweight (< 30k parameters) Siamese neural network that takes as inputs two different pi…

    Submitted 5 September, 2023; originally announced September 2023.

  7. arXiv:2308.09454

    cs.SD cs.CL eess.AS

    Exploring Sampling Techniques for Generating Melodies with a Transformer Language Model

    Authors: Mathias Rose Bjare, Stefan Lattner, Gerhard Widmer

    Abstract: Research in natural language processing has demonstrated that the quality of generations from trained autoregressive language models is significantly influenced by the used sampling strategy. In this study, we investigate the impact of different sampling techniques on musical qualities such as diversity and structure. To accomplish this, we train a high-capacity transformer model on a vast collect…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: 7 pages, 5 figures, 1 table, accepted at the 24th Int. Society for Music Information Retrieval Conf., Milan, Italy, 2023

  8. arXiv:2211.13016

    cs.SD cs.AI eess.AS

    On the Typicality of Musical Sequences

    Authors: Mathias Rose Bjare, Stefan Lattner

    Abstract: It has been shown in a recent publication that words in human-produced English language tend to have an information content close to the conditional entropy. In this paper, we show that the same is true for events in human-produced monophonic musical sequences. We also show how "typical sampling" influences the distribution of information around the entropy for single events and sequences.

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: 2 pages, 1 figure, Accepted at the Extended Abstracts for the Late-Breaking Demo Session of the 23rd Int. Society for Music Information Retrieval Conf., Bengaluru, India, 2022

  9. arXiv:2208.01141

    cs.SD cs.LG eess.AS

    SampleMatch: Drum Sample Retrieval by Musical Context

    Authors: Stefan Lattner

    Abstract: Modern digital music production typically involves combining numerous acoustic elements to compile a piece of music. Important types of such elements are drum samples, which determine the characteristics of the percussive components of the piece. Artists must use their aesthetic judgement to assess whether a given drum sample fits the current musical context. However, selecting drum samples from a…

    Submitted 1 August, 2022; originally announced August 2022.

    Comments: 8 pages, 3 figures, 1 table; Accepted at the ISMIR conference, Bengaluru, India, 2022

  10. Stochastic Restoration of Heavily Compressed Musical Audio using Generative Adversarial Networks

    Authors: Stefan Lattner, Javier Nistal

    Abstract: Lossy audio codecs compress (and decompress) digital audio streams by removing information that tends to be inaudible in human perception. Under high compression rates, such codecs may introduce a variety of impairments in the audio signal. Many works have tackled the problem of audio enhancement and compression artifact removal using deep learning techniques. However, only a few works tackle the…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: 21 pages, 5 figures, published in MDPI Electronics Special Issue "Machine Learning Applied to Music/Audio Signal Processing"

    Journal ref: MDPI Electronics 2021, 10, 1349

  11. arXiv:2206.14723

    cs.SD cs.LG eess.AS

    DrumGAN VST: A Plugin for Drum Sound Analysis/Synthesis With Autoencoding Generative Adversarial Networks

    Authors: Javier Nistal, Cyran Aouameur, Ithan Velarde, Stefan Lattner

    Abstract: In contemporary popular music production, drum sound design is commonly performed by cumbersome browsing and processing of pre-recorded samples in sound libraries. One can also use specialized synthesis hardware, typically controlled through low-level, musically meaningless parameters. Today, the field of Deep Learning offers methods to control the synthesis process via learned high-level features…

    Submitted 29 June, 2022; originally announced June 2022.

    Comments: 7 pages, 2 figures, 3 tables, ICML2022 Machine Learning for Audio Synthesis (MLAS) Workshop, for sound examples visit https://cslmusicteam.sony.fr/drumgan-vst/

  12. arXiv:2108.01216

    cs.SD eess.AS

    DarkGAN: Exploiting Knowledge Distillation for Comprehensible Audio Synthesis with GANs

    Authors: Javier Nistal, Stefan Lattner, Gaël Richard

    Abstract: Generative Adversarial Networks (GANs) have achieved excellent audio synthesis quality in the last years. However, making them operable with semantically meaningful controls remains an open challenge. An obvious approach is to control the GAN by conditioning it on metadata contained in audio datasets. Unfortunately, audio datasets often lack the desired annotations, especially in the musical domai…

    Submitted 2 August, 2021; originally announced August 2021.

    Comments: 9 pages, 3 figures, 2 tables, accepted to ISMIR2021

    Journal ref: 22nd International Society for Music Information Retrieval (ISMIR 2021)

  13. arXiv:2105.01531

    cs.SD cs.AI cs.LG eess.AS

    VQCPC-GAN: Variable-Length Adversarial Audio Synthesis Using Vector-Quantized Contrastive Predictive Coding

    Authors: Javier Nistal, Cyran Aouameur, Stefan Lattner, Gaël Richard

    Abstract: Influenced by the field of Computer Vision, Generative Adversarial Networks (GANs) are often adopted for the audio domain using fixed-size two-dimensional spectrogram representations as the "image data". However, in the (musical) audio domain, it is often desired to generate output of variable duration. This paper presents VQCPC-GAN, an adversarial framework for synthesizing variable-length audio…

    Submitted 30 July, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

    Comments: 5 pages, 1 figure, 1 table; accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

    Journal ref: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021

  14. arXiv:2008.12073

    eess.AS cs.SD

    DrumGAN: Synthesis of Drum Sounds With Timbral Feature Conditioning Using Generative Adversarial Networks

    Authors: J. Nistal, S. Lattner, G. Richard

    Abstract: Synthetic creation of drum sounds (e.g., in drum machines) is commonly performed using analog or digital synthesis, allowing a musician to sculpt the desired timbre modifying various parameters. Typically, such parameters control low-level features of the sound and often have no musical meaning or perceptual correspondence. With the rise of Deep Learning, data-driven processing of audio emerges as…

    Submitted 28 June, 2022; v1 submitted 27 August, 2020; originally announced August 2020.

    Comments: 8 pages, 1 figure, 3 tables, accepted in Proc. of the 21st International Society for Music Information Retrieval (ISMIR2020)

  15. arXiv:2006.09266

    eess.AS cs.SD

    Comparing Representations for Audio Synthesis Using Generative Adversarial Networks

    Authors: Javier Nistal, Stefan Lattner, Gaël Richard

    Abstract: In this paper, we compare different audio signal representations, including the raw audio waveform and a variety of time-frequency representations, for the task of audio synthesis with Generative Adversarial Networks (GANs). We conduct the experiments on a subset of the NSynth dataset. The architecture follows the benchmark Progressive Growing Wasserstein GAN. We perform experiments both in a full…

    Submitted 17 June, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: 5 pages, 1 figure, 5 tables, to be published in European Signal Processing Conference (EUSIPCO)

  16. arXiv:2001.01720

    cs.SD cs.LG cs.MM eess.AS

    Modeling Musical Structure with Artificial Neural Networks

    Authors: Stefan Lattner

    Abstract: In recent years, artificial neural networks (ANNs) have become a universal tool for tackling real-world problems. ANNs have also shown great success in music-related tasks including music summarization and classification, similarity estimation, computer-aided or autonomous composition, and automatic music analysis. As structure is a fundamental characteristic of Western music, it plays a role in a…

    Submitted 6 January, 2020; originally announced January 2020.

    Comments: 152 pages, 28 figures, 10 tables. PhD thesis, Johannes Kepler University Linz, October 2019. Includes results from https://www.ijcai.org/Proceedings/15/Papers/348.pdf, arXiv:1612.04742, arXiv:1708.05325, arXiv:1806.08236, and arXiv:1806.08686 (see Section 1.2 for detailed information)

  17. arXiv:1908.00948

    cs.SD cs.HC cs.LG eess.AS

    High-Level Control of Drum Track Generation Using Learned Patterns of Rhythmic Interaction

    Authors: Stefan Lattner, Maarten Grachten

    Abstract: Spurred by the potential of deep learning, computational music generation has gained renewed academic interest. A crucial issue in music generation is that of user control, especially in scenarios where the music generation process is conditioned on existing musical material. Here we propose a model for conditional kick drum track generation that takes existing musical material as input, in additi…

    Submitted 2 August, 2019; originally announced August 2019.

    Comments: Paper accepted at the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2019), New Paltz, New York, U.S.A., October 20-23; 6 pages, 3 figures, 1 table

  18. arXiv:1907.05982

    cs.SD cs.CV cs.LG eess.AS

    Learning Complex Basis Functions for Invariant Representations of Audio

    Authors: Stefan Lattner, Monika Dörfler, Andreas Arzt

    Abstract: Learning features from data has been shown to be more successful than using hand-crafted features for many machine learning tasks. In music information retrieval (MIR), features learned from windowed spectrograms are highly variant to transformations like transposition or time-shift. Such variances are undesirable when they are irrelevant for the respective MIR task. We propose an architecture called C…

    Submitted 12 July, 2019; originally announced July 2019.

    Comments: Paper accepted at the 20th International Society for Music Information Retrieval Conference, ISMIR 2019, Delft, The Netherlands, November 4-8; 8 pages, 4 figures, 4 tables

  19. arXiv:1807.07278

    cs.SD cs.MM eess.AS

    Audio-to-Score Alignment using Transposition-invariant Features

    Authors: Andreas Arzt, Stefan Lattner

    Abstract: Audio-to-score alignment is an important pre-processing step for in-depth analysis of classical music. In this paper, we apply novel transposition-invariant audio features to this task. These low-dimensional features represent local pitch intervals and are learned in an unsupervised fashion by a gated autoencoder. Our results show that the proposed features are indeed fully transposition-invariant…

    Submitted 19 July, 2018; originally announced July 2018.

    Comments: 19th International Society for Music Information Retrieval Conference, Paris, France, 2018

  20. arXiv:1806.08686

    cs.SD cs.AI eess.AS

    A Predictive Model for Music Based on Learned Interval Representations

    Authors: Stefan Lattner, Maarten Grachten, Gerhard Widmer

    Abstract: Connectionist sequence models (e.g., RNNs) applied to musical sequences suffer from two known problems: First, they have strictly "absolute pitch perception". Therefore, they fail to generalize over musical concepts which are commonly perceived in terms of relative distances between pitches (e.g., melodies, scale types, modes, cadences, or chord types). Second, they fall short of capturing the con…

    Submitted 22 June, 2018; originally announced June 2018.

    Comments: Paper accepted at the 19th International Society for Music Information Retrieval Conference, ISMIR 2018, Paris, France, September 23-27; 8 pages, 3 figures

  21. arXiv:1806.08236

    cs.SD cs.LG eess.AS

    Learning Transposition-Invariant Interval Features from Symbolic Music and Audio

    Authors: Stefan Lattner, Maarten Grachten, Gerhard Widmer

    Abstract: Many music theoretical constructs (such as scale types, modes, cadences, and chord types) are defined in terms of pitch intervals, that is, relative distances between pitches. Therefore, when computer models are employed in music tasks, it can be useful to operate on interval representations rather than on the raw musical surface. Moreover, interval representations are transposition-invariant, valuable fo…

    Submitted 4 February, 2019; v1 submitted 21 June, 2018; originally announced June 2018.

    Comments: Paper accepted at the 19th International Society for Music Information Retrieval Conference, ISMIR 2018, Paris, France, September 23-27; 8 pages, 5 figures