-
Resource-constrained stereo singing voice cancellation
Authors:
Clara Borrelli,
James Rae,
Dogac Basaran,
Matt McVicar,
Mehrez Souden,
Matthias Mauch
Abstract:
We study the problem of stereo singing voice cancellation, a subtask of music source separation, whose goal is to estimate an instrumental background from a stereo mix. We explore how to achieve performance similar to large state-of-the-art source separation networks starting from a small, efficient model for real-time speech separation. Such a model is useful when memory and compute are limited and singing voice processing has to run with limited look-ahead. In practice, this is realised by adapting an existing mono model to handle stereo input. Improvements in quality are obtained by tuning model parameters and expanding the training set. Moreover, we highlight the benefits a stereo model brings by introducing a new metric which detects attenuation inconsistencies between channels. Our approach is evaluated using objective offline metrics and a large-scale MUSHRA trial, confirming the effectiveness of our techniques in stringent listening tests.
Submitted 22 January, 2024;
originally announced January 2024.
-
Lyric document embeddings for music tagging
Authors:
Matt McVicar,
Bruno Di Giorgi,
Baris Dundar,
Matthias Mauch
Abstract:
We present an empirical study on embedding the lyrics of a song into a fixed-dimensional feature for the purpose of music tagging. Five methods of computing token-level and four methods of computing document-level representations are trained on an industrial-scale dataset of tens of millions of songs. We compare simple averaging of pretrained embeddings to modern recurrent and attention-based neural architectures. Evaluating on a wide range of tagging tasks such as genre classification, explicit content identification and era detection, we find that averaging word embeddings outperforms more complex architectures in many downstream metrics.
Submitted 29 November, 2021;
originally announced December 2021.
-
Meta-song evaluation for chord recognition
Authors:
Yizhao Ni,
Matt Mcvicar,
Raul Santos-Rodriguez,
Tijl De Bie
Abstract:
We present a new approach to evaluating chord recognition systems on songs that do not have full annotations. The principle is to use online chord databases to generate highly accurate "pseudo annotations" for these songs and compute "pseudo accuracies" of test systems. Statistical models of the relationship between "pseudo accuracy" and real performance are then applied to estimate the test systems' performance. The approach goes beyond existing evaluation metrics, allowing us to carry out extensive analysis of chord recognition systems, such as their generalization to different genres. In our experiments we applied this method to evaluate three state-of-the-art chord recognition systems, and the results verified its reliability.
Submitted 2 September, 2011;
originally announced September 2011.
-
An end-to-end machine learning system for harmonic analysis of music
Authors:
Yizhao Ni,
Matt Mcvicar,
Raul Santos-Rodriguez,
Tijl De Bie
Abstract:
We present a new system for simultaneous estimation of keys, chords, and bass notes from music audio. It makes use of a novel chromagram representation of audio that takes perception of loudness into account. Furthermore, it is based entirely on machine learning (instead of expert knowledge), so it is potentially applicable to a wider range of genres as long as training data is available. Compared to other models, the proposed system is fast and memory efficient, while achieving state-of-the-art performance.
Submitted 25 July, 2011;
originally announced July 2011.