Search | arXiv e-print repository

Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio

Authors: Pablo Alonso-Jiménez, Leonardo Pepino, Roser Batlle-Roca, Pablo Zinemanas, Dmitry Bogdanov, Xavier Serra, Martín Rocamora

Abstract: We present PECMAE, an interpretable model for music audio classification based on prototype learning. Our model is based on a previous method, APNet, which jointly learns an autoencoder and a prototypical network. Instead, we propose to decouple both training processes. This enables us to leverage existing self-supervised autoencoders pre-trained on much larger data (EnCodecMAE), providing representations with better generalization. APNet allows prototypes' reconstruction to waveforms for interpretability relying on the nearest training data samples. In contrast, we explore using a diffusion decoder that allows reconstruction without such dependency. We evaluate our method on datasets for music instrument classification (Medley-Solos-DB) and genre recognition (GTZAN and a larger in-house dataset), the latter being a more challenging task not addressed with prototypical networks before. We find that the prototype-based models preserve most of the performance achieved with the autoencoder embeddings, while the sonification of prototypes benefits understanding the behavior of the classifier.

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2312.05994 [pdf, other]

mir_ref: A Representation Evaluation Framework for Music Information Retrieval Tasks

Authors: Christos Plachouras, Pablo Alonso-Jiménez, Dmitry Bogdanov

Abstract: Music Information Retrieval (MIR) research is increasingly leveraging representation learning to obtain more compact, powerful music audio representations for various downstream MIR tasks. However, current representation evaluation methods are fragmented due to discrepancies in audio and label preprocessing, downstream model and metric implementations, data availability, and computational resources, often leading to inconsistent and limited results. In this work, we introduce mir_ref, an MIR Representation Evaluation Framework focused on seamless, transparent, local-first experiment orchestration to support representation development. It features implementations of a variety of components such as MIR datasets, tasks, embedding models, and tools for result analysis and visualization, while facilitating the implementation of custom components. To demonstrate its utility, we use it to conduct an extensive evaluation of several embedding models across various tasks and datasets, including evaluating their robustness to various audio perturbations and the ease of extracting relevant information from them.

Submitted 12 December, 2023; v1 submitted 10 December, 2023; originally announced December 2023.

Comments: Machine Learning for Audio Workshop, Neural Information Processing Systems (NeurIPS) 2023, New Orleans, LA

arXiv:2311.10057 [pdf, other]

The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation

Authors: Ilaria Manco, Benno Weck, SeungHeon Doh, Minz Won, Yixiao Zhang, Dmitry Bogdanov, Yusong Wu, Ke Chen, Philip Tovstogan, Emmanouil Benetos, Elio Quinton, György Fazekas, Juhan Nam

Abstract: We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs, designed for the evaluation of music-and-language models. The dataset consists of 1.1k human-written natural language descriptions of 706 music recordings, all publicly accessible and released under Creative Commons licenses. To showcase the use of our dataset, we benchmark popular models on three key music-and-language tasks (music captioning, text-to-music generation and music-language retrieval). Our experiments highlight the importance of cross-dataset evaluation and offer insights into how researchers can use SDD to gain a broader understanding of model performance.

Submitted 22 November, 2023; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: Accepted to NeurIPS 2023 Workshop on Machine Learning for Audio

arXiv:2309.16418 [pdf, other]

Efficient Supervised Training of Audio Transformers for Music Representation Learning

Authors: Pablo Alonso-Jiménez, Xavier Serra, Dmitry Bogdanov

Abstract: In this work, we address music representation learning using convolution-free transformers. We build on top of existing spectrogram-based audio transformers such as AST and train our models on a supervised task using patchout training similar to PaSST. In contrast to previous works, we study how specific design decisions affect downstream music tagging tasks instead of focusing on the training task. We assess the impact of initializing the models with different pre-trained weights, using various input audio segment lengths, using learned representations from different blocks and tokens of the transformer for downstream tasks, and applying patchout at inference to speed up feature extraction. We find that 1) initializing the model from ImageNet or AudioSet weights and using longer input segments are beneficial both for the training and downstream tasks, 2) the best representations for the considered downstream tasks are located in the middle blocks of the transformer, and 3) using patchout at inference allows faster processing than our convolutional baselines while maintaining superior performance. The resulting models, MAEST, are publicly available and obtain the best performance among open models in music tagging tasks.

Submitted 28 September, 2023; originally announced September 2023.

Comments: Accepted at the 2023 International Society for Music Information Retrieval Conference (ISMIR'23)

arXiv:2304.12257 [pdf, other]

Pre-Training Strategies Using Contrastive Learning and Playlist Information for Music Classification and Similarity

Authors: Pablo Alonso-Jiménez, Xavier Favory, Hadrien Foroughmand, Grigoris Bourdalas, Xavier Serra, Thomas Lidy, Dmitry Bogdanov

Abstract: In this work, we investigate an approach that relies on contrastive learning and music metadata as a weak source of supervision to train music representation models. Recent studies show that contrastive learning can be used with editorial metadata (e.g., artist or album name) to learn audio representations that are useful for different classification tasks. In this paper, we extend this idea to using playlist data as a source of music similarity information and investigate three approaches to generate anchor and positive track pairs. We evaluate these approaches by fine-tuning the pre-trained models for music multi-label classification tasks (genre, mood, and instrument tagging) and music similarity. We find that creating anchor and positive track pairs by relying on co-occurrences in playlists provides better music similarity and competitive classification results compared to choosing tracks from the same artist as in previous works. Additionally, our best pre-training approach based on playlists provides superior classification performance for most datasets.

Submitted 24 April, 2023; originally announced April 2023.

Comments: Accepted at the 2023 International Conference on Acoustics, Speech, and Signal Processing (ICASSP'23)

arXiv:2301.06167 [pdf]

UN Handbook on Privacy-Preserving Computation Techniques

Authors: David W. Archer, Borja de Balle Pigem, Dan Bogdanov, Mark Craddock, Adria Gascon, Ronald Jansen, Matjaž Jug, Kim Laine, Robert McLellan, Olga Ohrimenko, Mariana Raykova, Andrew Trask, Simon Wardley

Abstract: This paper describes privacy-preserving approaches for statistical analysis. It describes motivations for privacy-preserving approaches for the statistical analysis of sensitive data, presents examples of use cases where such methods may apply and describes relevant technical capabilities to assure privacy preservation while still allowing analysis of sensitive data. Our focus is on methods that enable protecting privacy of data while it is being processed, not only while it is at rest on a system or in transit between systems. The information in this document is intended for use by statisticians and data scientists, data curators and architects, IT specialists, and security and information assurance specialists, so we explicitly avoid cryptographic technical details of the technologies we describe.

Submitted 15 January, 2023; originally announced January 2023.

Comments: 50 pages

arXiv:2203.15448 [pdf, other]

ZK-SecreC: a Domain-Specific Language for Zero Knowledge Proofs

Authors: Dan Bogdanov, Joosep Jääger, Peeter Laud, Härmel Nestra, Martin Pettai, Jaak Randmets, Ville Sokk, Kert Tali, Sandhra-Mirella Valdma

Abstract: We present ZK-SecreC, a domain-specific language for zero-knowledge proofs. We present the rationale for its design, its syntax and semantics, and demonstrate its usefulness on the basis of a number of non-trivial examples. The design features a type system, where each piece of data is assigned both a confidentiality and an integrity type, which are not orthogonal to each other. We perform an empiric evaluation of the statements produced by its compiler in terms of their size. We also show the integration of the compiler with the implementation of a zero-knowledge proof technique, and evaluate the running time of both Prover and Verifier.

Submitted 26 August, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

Comments: 75 pp

arXiv:2104.00437 [pdf, other]

doi 10.1109/LSP.2021.3071082

Enriched Music Representations with Multiple Cross-modal Contrastive Learning

Authors: Andres Ferraro, Xavier Favory, Konstantinos Drossos, Yuntae Kim, Dmitry Bogdanov

Abstract: Modeling various aspects that make a music piece unique is a challenging task, requiring the combination of multiple sources of information. Deep learning is commonly used to obtain representations using various sources of information, such as the audio, interactions between users and songs, or associated genre metadata. Recently, contrastive learning has led to representations that generalize better compared to traditional supervised methods. In this paper, we present a novel approach that combines multiple types of information related to music using cross-modal contrastive learning, allowing us to learn an audio feature from heterogeneous data simultaneously. We align the latent representations obtained from playlists-track interactions, genre metadata, and the tracks' audio, by maximizing the agreement between these modality representations using a contrastive loss. We evaluate our approach in three tasks, namely, genre classification, playlist continuation and automatic tagging. We compare the performances with a baseline audio-based CNN trained to predict these modalities. We also study the importance of including multiple sources of information when training our embedding model. The results suggest that the proposed method outperforms the baseline in all the three downstream tasks and achieves comparable performance to the state-of-the-art.

Submitted 1 April, 2021; originally announced April 2021.

Comments: Accepted for publication to IEEE Signal Processing Letters

Report number: SPL-30069-2021

arXiv:2102.00201 [pdf, other]

Melon Playlist Dataset: a public dataset for audio-based playlist generation and music tagging

Authors: Andres Ferraro, Yuntae Kim, Soohyeon Lee, Biho Kim, Namjun Jo, Semi Lim, Suyon Lim, Jungtaek Jang, Sehwan Kim, Xavier Serra, Dmitry Bogdanov

Abstract: One of the main limitations in the field of audio signal processing is the lack of large public datasets with audio representations and high-quality annotations due to restrictions of copyrighted commercial music. We present Melon Playlist Dataset, a public dataset of mel-spectrograms for 649,091 tracks and 148,826 associated playlists annotated by 30,652 different tags. All the data is gathered from Melon, a popular Korean streaming service. The dataset is suitable for music information retrieval tasks, in particular, auto-tagging and automatic playlist continuation. Even though the latter can be addressed by collaborative filtering approaches, audio provides opportunities for research on track suggestions and building systems resistant to the cold-start problem, for which we provide a baseline. Moreover, the playlists and the annotations included in the Melon Playlist Dataset make it suitable for metric learning and representation learning.

Submitted 30 January, 2021; originally announced February 2021.

Comments: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing

arXiv:2012.12927 [pdf]

Towards a common performance and effectiveness terminology for digital proximity tracing applications

Authors: Justus Benzler, Dan Bogdanov, Göran Kirchner, Wouter Lueks, Raquel Lucas, Rui Oliveira, Bart Preneel, Marcel Salathe, Carmela Troncoso, Viktor von Wyl

Abstract: Digital proximity tracing (DPT) for Sars-CoV-2 pandemic mitigation is a complex intervention with the primary goal to notify app users about possible risk exposures to infected persons. Policymakers and DPT operators need to know whether their system works as expected in terms of speed or yield (performance) and whether DPT is making an effective contribution to pandemic mitigation (also in comparison to and beyond established mitigation measures, particularly manual contact tracing). Thereby, performance and effectiveness are not to be confused. Not only are there conceptual differences but also diverse data requirements. This article describes differences between performance and effectiveness measures and attempts to develop a terminology and classification system for DPT evaluation. We discuss key aspects for critical assessments of whether the integration of additional data measurements into DPT apps - beyond what is required to fulfill its primary notification role - may facilitate an understanding of performance and effectiveness of planned and deployed DPT apps. Therefore, the terminology and a classification matrix may offer some guidance to DPT system operators regarding which measurements to prioritize. DPT developers and operators may also make conscious decisions to integrate measures for epidemic monitoring but should be aware that this introduces a secondary purpose to DPT that is not part of the original DPT design.
Ultimately, the integration of further information for epidemic monitoring into DPT involves a trade-off between data granularity and linkage on the one hand, and privacy on the other. Decision-makers should be aware of the trade-off and take it into account when planning and developing DPT notification and monitoring systems or intending to assess the added value of DPT relative to existing contact tracing systems.

Submitted 23 December, 2020; originally announced December 2020.

arXiv:2008.11507 [pdf, other]

The Freesound Loop Dataset and Annotation Tool

Authors: Antonio Ramires, Frederic Font, Dmitry Bogdanov, Jordan B. L. Smith, Yi-Hsuan Yang, Joann Ching, Bo-Yu Chen, Yueh-Kao Wu, Hsu Wei-Han, Xavier Serra

Abstract: Music loops are essential ingredients in electronic music production, and there is a high demand for pre-recorded loops in a variety of styles. Several commercial and community databases have been created to meet this demand, but most are not suitable for research due to their strict licensing. We present the Freesound Loop Dataset (FSLD), a new large-scale dataset of music loops annotated by experts. The loops originate from Freesound, a community database of audio recordings released under Creative Commons licenses, so the audio in our dataset may be redistributed. The annotations include instrument, tempo, meter, key and genre tags. We describe the methodology used to assemble and annotate the data, and report on the distribution of tags in the data and inter-annotator agreement. We also present to the community an online loop annotator tool that we developed. To illustrate the usefulness of FSLD, we present short case studies on using it to estimate tempo and key, generate music tracks, and evaluate a loop separation algorithm. We anticipate that the community will find yet more uses for the data, in applications from automatic loop characterisation to algorithmic composition.

Submitted 23 September, 2020; v1 submitted 26 August, 2020; originally announced August 2020.

Comments: This work will be presented in the 21st International Society for Music Information Retrieval (ISMIR2020). Annotator website: http://mtg.upf.edu/fslannotator Dataset: https://zenodo.org/record/3967852

arXiv:2006.00751 [pdf, other]

Evaluation of CNN-based Automatic Music Tagging Models

Authors: Minz Won, Andres Ferraro, Dmitry Bogdanov, Xavier Serra

Abstract: Recent advances in deep learning accelerated the development of content-based automatic music tagging systems. Music information retrieval (MIR) researchers proposed various architecture designs, mainly based on convolutional neural networks (CNNs), that achieve state-of-the-art results in this multi-label binary classification task. However, due to the differences in experimental setups followed by researchers, such as using different dataset splits and software versions for evaluation, it is difficult to compare the proposed architectures directly with each other. To facilitate further research, in this paper we conduct a consistent evaluation of different music tagging models on three datasets (MagnaTagATune, Million Song Dataset, and MTG-Jamendo) and provide reference results using common evaluation metrics (ROC-AUC and PR-AUC). Furthermore, all the models are evaluated with perturbed inputs to investigate the generalization capabilities concerning time stretch, pitch shift, dynamic range compression, and addition of white noise. For reproducibility, we provide the PyTorch implementations with the pre-trained models.

Submitted 1 June, 2020; originally announced June 2020.

Comments: 7 pages, 2 figures, Sound and Music Computing 2020 (SMC 2020)

arXiv:2003.07393 [pdf, ps, other]

TensorFlow Audio Models in Essentia

Authors: Pablo Alonso-Jiménez, Dmitry Bogdanov, Jordi Pons, Xavier Serra

Abstract: Essentia is a reference open-source C++/Python library for audio and music analysis. In this work, we present a set of algorithms that employ TensorFlow in Essentia, allow predictions with pre-trained deep learning models, and are designed to offer flexibility of use, easy extensibility, and real-time inference. To show the potential of this new interface with TensorFlow, we provide a number of pre-trained state-of-the-art music tagging and classification CNN models. We run an extensive evaluation of the developed models. In particular, we assess the generalization capabilities in a cross-collection evaluation utilizing both external tag datasets as well as manual annotations tailored to the taxonomies of our models.

Submitted 16 March, 2020; originally announced March 2020.

arXiv:1911.04827 [pdf, other]

Artist and style exposure bias in collaborative filtering based music recommendations

Authors: Andres Ferraro, Dmitry Bogdanov, Xavier Serra, Jason Yoon

Abstract: Algorithms have an increasing influence on the music that we consume and understanding their behavior is fundamental to make sure they give a fair exposure to all artists across different styles. In this on-going work we contribute to this research direction analyzing the impact of collaborative filtering recommendations from the perspective of artist and music style exposure given by the system. We first analyze the distribution of the recommendations considering the exposure of different styles or genres and compare it to the users' listening behavior. This comparison suggests that the system is reinforcing the popularity of the items. Then, we simulate the effect of the system in the long term with a feedback loop. From this simulation we can see how the system gives less opportunity to the majority of artists, concentrating the users on fewer items. The results of our analysis demonstrate the need for a better evaluation methodology for current music recommendation algorithms, not only limited to user-focused relevance metrics.

Submitted 12 November, 2019; originally announced November 2019.

Comments: Presented at Workshop on Designing Human-Centric MIR Systems, ISMIR 2019

arXiv:1911.04824 [pdf, other]

How Low Can You Go? Reducing Frequency and Time Resolution in Current CNN Architectures for Music Auto-tagging

Authors: Andres Ferraro, Dmitry Bogdanov, Xavier Serra, Jay Ho Jeon, Jason Yoon

Abstract: Automatic tagging of music is an important research topic in Music Information Retrieval and audio analysis algorithms proposed for this task have achieved improvements with advances in deep learning. In particular, many state-of-the-art systems use Convolutional Neural Networks and operate on mel-spectrogram representations of the audio. In this paper, we compare commonly used mel-spectrogram representations and evaluate model performances that can be achieved by reducing the input size in terms of both lesser amount of frequency bands and larger frame rates. We use the MagnaTagaTune dataset for comprehensive performance comparisons and then compare selected configurations on the larger Million Song Dataset. The results of this study can serve researchers and practitioners in their trade-off decision between accuracy of the models, data storage size and training and inference times.

Submitted 28 June, 2020; v1 submitted 12 November, 2019; originally announced November 2019.

Comments: The 28th European Signal Processing Conference (EUSIPCO)

arXiv:1903.11833 [pdf, ps, other]

Skip prediction using boosting trees based on acoustic features of tracks in sessions

Authors: Andrés Ferraro, Dmitry Bogdanov, Xavier Serra

Abstract: The Spotify Sequential Skip Prediction Challenge focuses on predicting if a track in a session will be skipped by the user or not. In this paper, we describe our approach to this problem and the final system that was submitted to the challenge by our team from the Music Technology Group (MTG) under the name "aferraro". This system consists in combining the predictions of multiple boosting trees models trained with features extracted from the sessions and the tracks. The proposed approach achieves good overall performance (MAA of 0.554), with our model ranked 14th out of more than 600 submissions in the final leaderboard.

Submitted 28 March, 2019; originally announced March 2019.

Journal ref: WSDM Cup 2019 Workshop on the 12th ACM International Conference on Web Search and Data Mining

arXiv:1901.02296 [pdf, other]

Using offline metrics and user behavior analysis to combine multiple systems for music recommendation

Authors: Andres Ferraro, Dmitry Bogdanov, Kyumin Choi, Xavier Serra

Abstract: There are many offline metrics that can be used as a reference for evaluation and optimization of the performance of recommender systems. Hybrid recommendation approaches are commonly used to improve some of those metrics by combining different systems. In this work we focus on music recommendation and propose a new way to improve recommendations, with respect to a desired metric of choice, by combining multiple systems for each user individually based on their expected performance. Essentially, our approach consists in predicting an expected error that each system will produce for each user based on their previous activity. To this end, we propose to train regression models for different metrics predicting the performance of each system based on a number of features characterizing previous user behavior in the system. We then use different fusion strategies to combine recommendations generated by each system. Following this approach one can optimize the final hybrid system with respect to the desired metric of choice. As a proof of concept, we conduct experiments combining two recommendation systems, a Matrix Factorization model and a popularity-based recommender. We use the data provided by Melon, a Korean music streaming service, to train and evaluate the performance of the systems.

Submitted 8 January, 2019; originally announced January 2019.

Journal ref: Conference on Recommender Systems (RecSys) 2018, REVEAL Workshop

arXiv:1901.00450 [pdf, ps, other]

doi 10.1145/3267471.3267473

Automatic playlist continuation using a hybrid recommender system combining features from text and audio

Authors: Andres Ferraro, Dmitry Bogdanov, Jisang Yoon, KwangSeob Kim, Xavier Serra

Abstract: The ACM RecSys Challenge 2018 focuses on music recommendation in the context of automatic playlist continuation. In this paper, we describe our approach to the problem and the final hybrid system that was submitted to the challenge by our team Cocoplaya. This system consists in combining the recommendations produced by two different models using ranking fusion. The first model is based on Matrix Factorization and it incorporates information from tracks' audio and playlist titles. The second model generates recommendations based on typical track co-occurrences considering their proximity in the playlists. The proposed approach is efficient and achieves a good overall performance, with our model ranked 4th on the creative track of the challenge leaderboard.

Submitted 2 January, 2019; originally announced January 2019.

Comments: 5 pages

Journal ref: Proceeding RecSys Challenge '18 Proceedings of the ACM Recommender Systems Challenge 2018

arXiv:1604.03603 [pdf, other]

Algorithmic computation of polynomial amoebas

Authors: D. V. Bogdanov, A. A. Kytmanov, T. M. Sadykov

Abstract: We present algorithms for computation and visualization of amoebas, their contours, compactified amoebas and sections of three-dimensional amoebas by two-dimensional planes. We also provide a method and an algorithm for the computation of polynomials whose amoebas exhibit the most complicated topology among all polynomials with a fixed Newton polytope. The presented algorithms are implemented in computer algebra systems Matlab 8 and Mathematica 9.

Submitted 12 April, 2016; originally announced April 2016.

MSC Class: 14Q10; 14Q05; 14Q15 ACM Class: G.4

Showing 1–19 of 19 results for author: Bogdanov, D