
Showing 1–3 of 3 results for author: Zinemanas, P

Searching in archive eess.
  1. arXiv:2402.09318

    cs.SD cs.AI cs.MM eess.AS

    Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio

    Authors: Pablo Alonso-Jiménez, Leonardo Pepino, Roser Batlle-Roca, Pablo Zinemanas, Dmitry Bogdanov, Xavier Serra, Martín Rocamora

    Abstract: We present PECMAE, an interpretable model for music audio classification based on prototype learning. Our model is based on a previous method, APNet, which jointly learns an autoencoder and a prototypical network. Instead, we propose to decouple both training processes. This enables us to leverage existing self-supervised autoencoders pre-trained on much larger data (EnCodecMAE), providing represe…

    Submitted 14 February, 2024; originally announced February 2024.

  2. arXiv:2211.08367

    cs.SD cs.CV cs.MM eess.AS

    FlowGrad: Using Motion for Visual Sound Source Localization

    Authors: Rajsuryan Singh, Pablo Zinemanas, Xavier Serra, Juan Pablo Bello, Magdalena Fuentes

    Abstract: Most recent work in visual sound source localization relies on semantic audio-visual representations learned in a self-supervised manner, and by design excludes temporal information present in videos. While it proves to be effective for widely used benchmark datasets, the method falls short for challenging scenarios like urban traffic. This work introduces temporal context into the state-of-the-ar…

    Submitted 14 April, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: Accepted in ICASSP 2023

  3. arXiv:2109.12690

    cs.SD cs.DB cs.LG eess.AS

    Soundata: A Python library for reproducible use of audio datasets

    Authors: Magdalena Fuentes, Justin Salamon, Pablo Zinemanas, Martín Rocamora, Genís Paja, Irán R. Román, Marius Miron, Xavier Serra, Juan Pablo Bello

    Abstract: Soundata is a Python library for loading and working with audio datasets in a standardized way, removing the need for writing custom loaders in every project, and improving reproducibility by providing tools to validate data against a canonical version. It speeds up research pipelines by allowing users to quickly download a dataset, load it into memory in a standardized and reproducible way, valid…

    Submitted 4 October, 2021; v1 submitted 26 September, 2021; originally announced September 2021.