Showing 1–12 of 12 results for author: Favory, X

Searching in archive cs.
  1. arXiv:2304.12257  [pdf, other]

    cs.SD eess.AS

    Pre-Training Strategies Using Contrastive Learning and Playlist Information for Music Classification and Similarity

    Authors: Pablo Alonso-Jiménez, Xavier Favory, Hadrien Foroughmand, Grigoris Bourdalas, Xavier Serra, Thomas Lidy, Dmitry Bogdanov

    Abstract: In this work, we investigate an approach that relies on contrastive learning and music metadata as a weak source of supervision to train music representation models. Recent studies show that contrastive learning can be used with editorial metadata (e.g., artist or album name) to learn audio representations that are useful for different classification tasks. In this paper, we extend this idea to us…

    Submitted 24 April, 2023; originally announced April 2023.

    Comments: Accepted at the 2023 International Conference on Acoustics, Speech, and Signal Processing (ICASSP'23)

  2. arXiv:2110.07410  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS

    Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning

    Authors: Benno Weck, Xavier Favory, Konstantinos Drossos, Xavier Serra

    Abstract: Automated audio captioning (AAC) is the task of automatically generating textual descriptions for general audio signals. A captioning system has to identify various information from the input signal and express it with natural language. Existing works mainly focus on investigating new methods and try to improve their performance measured on existing datasets. Having attracted attention only recent…

    Submitted 14 October, 2021; originally announced October 2021.

    Comments: 5 pages, 4 figures. Accepted at Detection and Classification of Acoustic Scenes and Events 2021 (DCASE2021)

  3. arXiv:2104.00437  [pdf, other]

    cs.SD cs.IR cs.MM eess.AS

    Enriched Music Representations with Multiple Cross-modal Contrastive Learning

    Authors: Andres Ferraro, Xavier Favory, Konstantinos Drossos, Yuntae Kim, Dmitry Bogdanov

    Abstract: Modeling various aspects that make a music piece unique is a challenging task, requiring the combination of multiple sources of information. Deep learning is commonly used to obtain representations using various sources of information, such as the audio, interactions between users and songs, or associated genre metadata. Recently, contrastive learning has led to representations that generalize bet…

    Submitted 1 April, 2021; originally announced April 2021.

    Comments: Accepted for publication in IEEE Signal Processing Letters

    Report number: SPL-30069-2021

  4. arXiv:2010.14171  [pdf, other]

    cs.SD cs.IR cs.LG eess.AS stat.ML

    Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags

    Authors: Xavier Favory, Konstantinos Drossos, Tuomas Virtanen, Xavier Serra

    Abstract: Self-supervised audio representation learning offers an attractive alternative for obtaining generic audio embeddings, capable of being employed in various downstream tasks. Published approaches that consider both audio and words/tags associated with audio do not employ text processing models that are capable of generalizing to tags unknown during training. In this work we propose a method for learni…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: 5 pages, 1 figure

  5. arXiv:2010.00475  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    FSD50K: An Open Dataset of Human-Labeled Sound Events

    Authors: Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra

    Abstract: Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific, with the exception of AudioSet, based on over 2M tracks from YouTube videos and encompassing over 500 sound classes. However, AudioSet is not an open dataset as its official release consists of pre-computed audio features. Downloading the original audio tracks can be problematic due to YouTube vid…

    Submitted 23 April, 2022; v1 submitted 1 October, 2020; originally announced October 2020.

    Comments: Accepted version in TASLP. Main updates include: estimation of the amount of label noise in FSD50K, SNR comparison between FSD50K and AudioSet, improved description of evaluation metrics including equations, clarification of experimental methodology and some results, some content moved to Appendix for readability. https://ieeexplore.ieee.org/document/9645159

  6. arXiv:2006.08386  [pdf, other]

    cs.LG cs.IR eess.AS stat.ML

    COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations

    Authors: Xavier Favory, Konstantinos Drossos, Tuomas Virtanen, Xavier Serra

    Abstract: Audio representation learning based on deep neural networks (DNNs) emerged as an alternative approach to hand-crafted features. For achieving high performance, DNNs often need a large amount of annotated data which can be difficult and costly to obtain. In this paper, we propose a method for learning audio representations, aligning the learned latent representations of audio and associated tags. A…

    Submitted 8 July, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: 8 pages, 1 figure, Workshop on Self-supervision in Audio and Speech at the 37th International Conference on Machine Learning (ICML), 2020, Vienna, Austria

  7. arXiv:2004.03985  [pdf, other]

    cs.IR cs.HC cs.LG cs.SD

    Search Result Clustering in Collaborative Sound Collections

    Authors: Xavier Favory, Frederic Font, Xavier Serra

    Abstract: The large size of today's online multimedia databases makes retrieving their content a difficult and time-consuming task. Users of online sound collections typically submit search queries that express a broad intent, often making the system return large and unmanageable result sets. Search Result Clustering is a technique that organises search-result content into coherent groups, which allows us…

    Submitted 8 April, 2020; originally announced April 2020.

    Comments: 8 pages, 4 figures, Proceedings of the 2020 International Conference on Multimedia Retrieval (ICMR '20), June 8-11, 2020, Dublin, Ireland. ACM, New York, NY, USA

    ACM Class: H.3.3

  8. arXiv:1911.11853  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Neural Percussive Synthesis Parameterised by High-Level Timbral Features

    Authors: António Ramires, Pritish Chandna, Xavier Favory, Emilia Gómez, Xavier Serra

    Abstract: We present a deep neural network-based methodology for synthesising percussive sounds with control over high-level timbral characteristics of the sounds. This approach allows for intuitive control of a synthesizer, enabling the user to shape sounds without extensive knowledge of signal processing. We use a feedforward convolutional neural network-based architecture, which is able to map input para…

    Submitted 3 April, 2020; v1 submitted 25 November, 2019; originally announced November 2019.

  9. arXiv:1905.06717  [pdf, other]

    cs.SD cs.HC eess.AS

    Multi Web Audio Sequencer: Collaborative Music Making

    Authors: Xavier Favory, Xavier Serra

    Abstract: Recent advancements in web-based audio systems have enabled sufficiently accurate timing control and real-time sound processing capabilities. Numerous specialized music tools, as well as digital audio workstations, are now accessible from browsers. Features such as the large accessibility of data and real-time communication between clients make the web attractive for collaborative data manipulatio…

    Submitted 16 May, 2019; originally announced May 2019.

    Comments: 4 pages, 4 figures, short paper at the Web Audio Conference 2018

  10. arXiv:1901.01189  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Learning Sound Event Classifiers from Web Audio with Noisy Labels

    Authors: Eduardo Fonseca, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, Xavier Serra

    Abstract: As sound event classification moves towards larger datasets, issues of label noise become inevitable. Web sites can supply large volumes of user-contributed audio and metadata, but inferring labels from this metadata introduces errors due to unreliable inputs, and limitations in the mapping. There is, however, little research into the impact of these errors. To foster the investigation of label no…

    Submitted 7 March, 2019; v1 submitted 4 January, 2019; originally announced January 2019.

    Comments: International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2019)

  11. arXiv:1811.10988  [pdf, other]

    cs.IR cs.HC cs.LG cs.SD eess.AS

    Facilitating the Manual Annotation of Sounds When Using Large Taxonomies

    Authors: Xavier Favory, Eduardo Fonseca, Frederic Font, Xavier Serra

    Abstract: Properly annotated multimedia content is crucial for supporting advances in many Information Retrieval applications. It enables, for instance, the development of automatic tools for the annotation of large and diverse multimedia collections. In the context of everyday sounds and online collections, the content to describe is very diverse and involves many different types of concepts, often organis…

    Submitted 21 November, 2018; originally announced November 2018.

    Comments: 5 pages, 5 figures, IEEE FRUCT International Workshop on Semantic Audio and the Internet of Things

    Journal ref: Proceedings of the 23rd Conference of Open Innovations Association FRUCT, Bologna, Italy. 2018. ISSN 2305-7254, ISBN 978-952-68653-6-2, FRUCT Oy, e-ISSN 2343-0737 (license CC BY-ND)

  12. arXiv:1807.09902  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline

    Authors: Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra

    Abstract: This paper describes Task 2 of the DCASE 2018 Challenge, titled "General-purpose audio tagging of Freesound content with AudioSet labels". This task was hosted on the Kaggle platform as "Freesound General-Purpose Audio Tagging Challenge". The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the A…

    Submitted 6 October, 2018; v1 submitted 25 July, 2018; originally announced July 2018.

    Comments: Camera ready for DCASE Workshop 2018