Showing 1–5 of 5 results for author: Mancusi, M

  1. arXiv:2404.16969  [pdf, other]

    cs.SD cs.LG eess.AS

    COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations

    Authors: Ruben Ciranni, Emilian Postolache, Giorgio Mariani, Michele Mancusi, Luca Cosmo, Emanuele Rodolà

    Abstract: We present COCOLA (Coherence-Oriented Contrastive Learning for Audio), a contrastive learning method for musical audio representations that captures the harmonic and rhythmic coherence between samples. Our method operates at the level of stems (or their combinations) composing music tracks and allows the objective evaluation of compositional models for music in the task of accompaniment generation…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Demo page: https://github.com/gladia-research-group/cocola

  2. arXiv:2302.02257  [pdf, other]

    cs.SD cs.LG eess.AS

    Multi-Source Diffusion Models for Simultaneous Music Generation and Separation

    Authors: Giorgio Mariani, Irene Tallini, Emilian Postolache, Michele Mancusi, Luca Cosmo, Emanuele Rodolà

    Abstract: In this work, we define a diffusion-based generative model capable of both music synthesis and source separation by learning the score of the joint probability density of sources sharing a context. Alongside the classic total inference tasks (i.e., generating a mixture, separating the sources), we also introduce and experiment on the partial generation task of source imputation, where we generate…

    Submitted 18 March, 2024; v1 submitted 4 February, 2023; originally announced February 2023.

    Comments: ICLR 2024 oral presentation. Demo page: https://gladia-research-group.github.io/multi-source-diffusion-models/

  3. arXiv:2301.08562  [pdf, other]

    cs.LG cs.SD eess.AS

    Latent Autoregressive Source Separation

    Authors: Emilian Postolache, Giorgio Mariani, Michele Mancusi, Andrea Santilli, Luca Cosmo, Emanuele Rodolà

    Abstract: Autoregressive models have achieved impressive results over a wide range of domains in terms of generation quality and downstream task performance. In the continuous domain, a key factor behind this success is the usage of quantized latent spaces (e.g., obtained via VQ-VAE autoencoders), which allow for dimensionality reduction and faster inference times. However, using existing pre-trained models…

    Submitted 9 January, 2023; originally announced January 2023.

    Comments: Accepted to AAAI 2023

  4. arXiv:2201.05013  [pdf, other]

    cs.SD cs.LG eess.AS

    Fish sounds: towards the evaluation of marine acoustic biodiversity through data-driven audio source separation

    Authors: Michele Mancusi, Nicola Zonca, Emanuele Rodolà, Silvia Zuffi

    Abstract: The marine ecosystem is changing at an alarming rate, exhibiting biodiversity loss and the migration of tropical species to temperate basins. Monitoring the underwater environments and their inhabitants is of fundamental importance to understand the evolution of these systems and implement safeguard policies. However, assessing and tracking biodiversity is often a complex task, especially in large…

    Submitted 14 January, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

  5. arXiv:2110.05313  [pdf, other]

    cs.LG cs.SD eess.AS

    Unsupervised Source Separation via Bayesian Inference in the Latent Domain

    Authors: Michele Mancusi, Emilian Postolache, Giorgio Mariani, Marco Fumero, Andrea Santilli, Luca Cosmo, Emanuele Rodolà

    Abstract: State of the art audio source separation models rely on supervised data-driven approaches, which can be expensive in terms of labeling resources. On the other hand, approaches for training these models without any direct supervision are typically high-demanding in terms of memory and time requirements, and remain impractical to be used at inference time. We aim to tackle these limitations by propo…

    Submitted 30 March, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: 5 pages, 2 figures, submitted to Interspeech 2022