-
Resource-constrained stereo singing voice cancellation
Authors:
Clara Borrelli,
James Rae,
Dogac Basaran,
Matt McVicar,
Mehrez Souden,
Matthias Mauch
Abstract:
We study the problem of stereo singing voice cancellation, a subtask of music source separation, whose goal is to estimate an instrumental background from a stereo mix. We explore how to achieve performance similar to large state-of-the-art source separation networks starting from a small, efficient model for real-time speech separation. Such a model is useful when memory and compute are limited and singing voice processing has to run with limited look-ahead. In practice, this is realised by adapting an existing mono model to handle stereo input. Improvements in quality are obtained by tuning model parameters and expanding the training set. Moreover, we highlight the benefits a stereo model brings by introducing a new metric which detects attenuation inconsistencies between channels. Our approach is evaluated using objective offline metrics and a large-scale MUSHRA trial, confirming the effectiveness of our techniques in stringent listening tests.
Submitted 22 January, 2024;
originally announced January 2024.
-
Low-$Q^2$ elastic electron-proton scattering using a gas jet target
Authors:
Y. Wang,
J. C. Bernauer,
B. S. Schlimme,
P. Achenbach,
S. Aulenbacher,
M. Ball,
M. Biroth,
D. Bonaventura,
D. Bosnar,
P. Brand,
S. Caiazza,
M. Christmann,
E. Cline,
A. Denig,
M. O. Distler,
L. Doria,
P. Eckert,
A. Esser,
I. Friscic,
S. Gagneur,
J. Geimer,
S. Grieser,
P. Gulker,
P. Herrmann,
M. Hoek
, et al. (32 additional authors not shown)
Abstract:
In this paper, we describe an experiment measuring low-$Q^2$ elastic electron-proton scattering using a newly developed cryogenic supersonic gas jet target in the A1 three-spectrometer facility at the Mainz Microtron. We measured the proton electric form factor within the four-momentum transfer range of $0.01 \le Q^2 \le 0.045\,(\text{GeV}/c)^2$. The results are consistent with existing measurements. The data we collected demonstrate the feasibility of the gas jet target and the potential of future scattering experiments that combine it with high-resolution spectrometers.
Submitted 29 August, 2022;
originally announced August 2022.
-
Multi-objective Hyper-parameter Optimization of Behavioral Song Embeddings
Authors:
Massimo Quadrana,
Antoine Larreche-Mouly,
Matthias Mauch
Abstract:
Song embeddings are a key component of most music recommendation engines. In this work, we study the hyper-parameter optimization of behavioral song embeddings based on Word2Vec on a selection of downstream tasks, namely next-song recommendation, false neighbor rejection, and artist and genre clustering. We present new optimization objectives and metrics to monitor the effects of hyper-parameter optimization. We show that single-objective optimization can cause side effects on the non-optimized metrics and propose a simple multi-objective optimization to mitigate these effects. We find that next-song recommendation quality of Word2Vec is anti-correlated with song popularity, and we show how song embedding optimization can balance performance across different popularity levels. We then show potential positive downstream effects on the task of play prediction. Finally, we provide useful insights on the effects of training dataset scale by testing hyper-parameter optimization on an industry-scale dataset.
Submitted 26 August, 2022;
originally announced August 2022.
-
Lyric document embeddings for music tagging
Authors:
Matt McVicar,
Bruno Di Giorgi,
Baris Dundar,
Matthias Mauch
Abstract:
We present an empirical study on embedding the lyrics of a song into a fixed-dimensional feature for the purpose of music tagging. Five methods of computing token-level and four methods of computing document-level representations are trained on an industrial-scale dataset of tens of millions of songs. We compare simple averaging of pretrained embeddings to modern recurrent and attention-based neural architectures. Evaluating on a wide range of tagging tasks such as genre classification, explicit content identification and era detection, we find that averaging word embeddings outperforms more complex architectures in many downstream metrics.
Submitted 29 November, 2021;
originally announced December 2021.
-
Operation and characterization of a windowless gas jet target in high-intensity electron beams
Authors:
B. S. Schlimme,
S. Aulenbacher,
P. Brand,
M. Littich,
Y. Wang,
P. Achenbach,
M. Ball,
J. C. Bernauer,
M. Biroth,
D. Bonaventura,
D. Bosnar,
S. Caiazza,
M. Christmann,
E. Cline,
A. Denig,
M. O. Distler,
L. Doria,
P. Eckert,
A. Esser,
I. Friščić,
S. Gagneur,
J. Geimer,
S. Grieser,
P. Gülker,
P. Herrmann
, et al. (32 additional authors not shown)
Abstract:
A cryogenic supersonic gas jet target was developed for the MAGIX experiment at the high-intensity electron accelerator MESA. It will be operated as an internal, windowless target in the energy-recovering recirculation arc of the accelerator with different target gases, e.g., hydrogen, deuterium, helium, oxygen, argon, or xenon. Detailed studies have been carried out at the existing A1 multi-spectrometer facility at the electron accelerator MAMI. This paper focuses on the developed handling procedures and diagnostic tools, and on the performance of the gas jet target under beam conditions. Considering the special features of this type of target, it proves to be well suited for a new generation of high-precision electron scattering experiments at high-intensity electron accelerators.
Submitted 16 July, 2021; v1 submitted 27 April, 2021;
originally announced April 2021.
-
Downbeat Tracking with Tempo-Invariant Convolutional Neural Networks
Authors:
Bruno Di Giorgi,
Matthias Mauch,
Mark Levy
Abstract:
The human ability to track musical downbeats is robust to changes in tempo, and it extends to tempi never previously encountered. We propose a deterministic time-warping operation that enables this skill in a convolutional neural network (CNN) by allowing the network to learn rhythmic patterns independently of tempo. Unlike conventional deep learning approaches, which learn rhythmic patterns at the tempi present in the training dataset, the patterns learned in our model are tempo-invariant, leading to better tempo generalisation and more efficient usage of the network capacity. We test the generalisation property on a synthetic dataset created by rendering the Groove MIDI Dataset using FluidSynth, split into a training set containing the original performances and a test set containing tempo-scaled versions rendered with different SoundFonts (test-time augmentation). The proposed model generalises nearly perfectly to unseen tempi (F-measure of 0.89 on both training and test sets), whereas a comparable conventional CNN achieves similar accuracy only for the training set (0.89) and drops to 0.54 on the test set. The generalisation advantage of the proposed model extends to real music, as shown by results on the GTZAN and Ballroom datasets.
Submitted 3 February, 2021;
originally announced February 2021.
-
Electron beam studies of light collection in a scintillating counter with embedded fibers
Authors:
M. Lauß,
P. Achenbach,
S. Aulenbacher,
M. Ball,
I. Beltschikow,
M. Biroth,
P. Brand,
S. Caiazza,
M. Christmann,
O. Corell,
A. Denig,
L. Doria,
P. Drexler,
J. Geimer,
P. Gülker,
T. Kolar,
W. Lauth,
M. Littich,
M. Lupberger,
S. Lunkenheimer,
D. Markus,
M. Mauch,
H. Merkel,
M. Mihovilovič,
J. Müller
, et al. (6 additional authors not shown)
Abstract:
The light collection of several fiber configurations embedded in a box-shaped plastic scintillating counter was studied by scanning with minimum ionizing electrons. The light was read out by silicon photomultipliers at both ends. The light yield produced by the 855-MeV beam of the Mainz Microtron showed a strong dependence on the transverse distance from the beam position to the fibers. The observations were modeled by attributing the collection of indirect light inside of the counter and of direct light reaching a fiber to the total light yield. The light collection with fibers was compared to that of a scintillating counter without fibers. These studies were carried out within the development of plastic scintillating detectors as an active veto system for the DarkMESA electron beam-dump experiment that will search for light dark matter particles in the MeV mass range.
Submitted 2 July, 2021; v1 submitted 15 January, 2021;
originally announced January 2021.
-
Development of large area focal plane detectors for MAGIX
Authors:
P. Gülker,
P. Achenbach,
S. Aulenbacher,
J. Bernauer,
S. Caiazza,
M. Christmann,
A. Denig,
S. Grieser,
A. -K. Hergemöller,
B. Hetz,
A. Khoukaz,
M. Klein,
T. Kolar,
M. Littich,
S. Lunkenheimer,
M. Mauch,
H. Merkel,
M. Mihovilovic,
J. Muller,
J. Rausch,
Y. Schelhaas,
S. Schlimme,
S. Sirca
Abstract:
MAGIX is a planned experiment that will be implemented at the upcoming accelerator MESA in Mainz. Due to its location in the energy-recovering lane of the accelerator, beam currents up to 1 mA with a maximum energy of 105 MeV will be available for precision experiments. MAGIX itself consists of a jet target and two magnetic spectrometers. Inside the spectrometers, GEM-based detectors will be used in the focal plane for track reconstruction. The design goals for the detector modules are a spatial resolution of 50 µm, a size of 1.20 m × 0.3 m and a minimal material budget. To accomplish these goals, we started developing several GEM prototypes to study different behaviors and techniques to optimize the final detector design. The GEM foils used are provided by CERN and are trained, stretched and framed in our laboratory. The readout is done with an SRS-based system. In this contribution the requirements, achievements and the ongoing developments are presented.
Submitted 2 August, 2019; v1 submitted 13 June, 2019;
originally announced June 2019.
-
The Evolution of Popular Music: USA 1960-2010
Authors:
Matthias Mauch,
Robert M. MacCallum,
Mark Levy,
Armand M. Leroi
Abstract:
In modern societies, cultural change seems ceaseless. The flux of fashion is especially obvious for popular music. While much has been written about the origin and evolution of pop, most claims about its history are anecdotal rather than scientific in nature. To rectify this we investigate the US Billboard Hot 100 between 1960 and 2010. Using Music Information Retrieval (MIR) and text-mining tools we analyse the musical properties of ~17,000 recordings that appeared in the charts and demonstrate quantitative trends in their harmonic and timbral properties. We then use these properties to produce an audio-based classification of musical styles and study the evolution of musical diversity and disparity, testing, and rejecting, several classical theories of cultural change. Finally, we investigate whether pop musical evolution has been gradual or punctuated. We show that, although pop music has evolved continuously, it did so with particular rapidity during three stylistic "revolutions" around 1964, 1983 and 1991. We conclude by discussing how our study points the way to a quantitative science of cultural change.
Submitted 17 February, 2015;
originally announced February 2015.
-
Sequential Complexity as a Descriptor for Musical Similarity
Authors:
Peter Foster,
Matthias Mauch,
Simon Dixon
Abstract:
We propose string compressibility as a descriptor of temporal structure in audio, for the purpose of determining musical similarity. Our descriptors are based on computing track-wise compression rates of quantised audio features, using multiple temporal resolutions and quantisation granularities. To verify that our descriptors capture musically relevant information, we incorporate our descriptors into similarity rating prediction and song year prediction tasks. We base our evaluation on a dataset of 15500 track excerpts of Western popular music, for which we obtain 7800 web-sourced pairwise similarity ratings. To assess the agreement among similarity ratings, we perform an evaluation under controlled conditions, obtaining a rank correlation of 0.33 between intersected sets of ratings. Combined with bag-of-features descriptors, we obtain performance gains of 31.1% and 10.9% for similarity rating prediction and song year prediction. For both tasks, analysis of selected descriptors reveals that representing features at multiple time scales benefits prediction accuracy.
Submitted 28 September, 2014; v1 submitted 27 February, 2014;
originally announced February 2014.
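The compressibility descriptor in the abstract above can be illustrated with a minimal sketch. It assumes a one-dimensional feature sequence, equal-width quantisation, and zlib as the compressor; none of these choices are confirmed by the abstract, and the function names `quantise`, `compression_rate` and `descriptor` are hypothetical.

```python
import zlib


def quantise(values, n_levels):
    """Map a non-empty sequence of floats to n_levels equal-width symbols (n_levels <= 256)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant sequence maps to a single symbol.
        return bytes(len(values))
    scale = n_levels / (hi - lo)
    return bytes(min(int((v - lo) * scale), n_levels - 1) for v in values)


def compression_rate(values, n_levels=8, hop=1):
    """Compressed size divided by raw size; lower means more regular structure."""
    symbols = quantise(values[::hop], n_levels)
    return len(zlib.compress(symbols, 9)) / len(symbols)


def descriptor(values, levels=(4, 8, 16), hops=(1, 2, 4)):
    """One compression rate per (quantisation granularity, temporal resolution) pair."""
    return [compression_rate(values, n, h) for n in levels for h in hops]
```

A strictly repeating sequence should yield a much lower rate than noise, which is the intuition behind using compressibility as a proxy for temporal structure. Subsampling with `hop` is only a stand-in for the multiple temporal resolutions the abstract mentions.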