Showing 1–19 of 19 results for author: Sturm, B

Searching in archive cs.
  1. arXiv:2405.12666  [pdf, other]

    cs.SD cs.LG eess.AS

    SYMPLEX: Controllable Symbolic Music Generation using Simplex Diffusion with Vocabulary Priors

    Authors: Nicolas Jonason, Luca Casini, Bob L. T. Sturm

    Abstract: We present a new approach for fast and controllable generation of symbolic music based on the simplex diffusion, which is essentially a diffusion process operating on probabilities rather than the signal space. This objective has been applied in domains such as natural language processing but here we apply it to generating 4-bar multi-instrument music loops using an orderless representation. We sh…

    Submitted 21 May, 2024; originally announced May 2024.

  2. arXiv:2311.10384  [pdf, other]

    cs.SD eess.AS

    Retrieval Augmented Generation of Symbolic Music with LLMs

    Authors: Nicolas Jonason, Luca Casini, Carl Thomé, Bob L. T. Sturm

    Abstract: We explore the use of large language models (LLMs) for music generation using a retrieval system to select relevant examples. We find promising initial results for music generation in a dialogue with the user, especially considering the ease with which such a system can be implemented. The code is available online.

    Submitted 28 December, 2023; v1 submitted 17 November, 2023; originally announced November 2023.

    Comments: LBD @ ISMIR 2023

  3. arXiv:2309.07658  [pdf, other]

    cs.SD eess.AS

    DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input

    Authors: Nicolas Jonason, Xin Wang, Erica Cooper, Lauri Juvela, Bob L. T. Sturm, Junichi Yamagishi

    Abstract: We explore the use of neural synthesis for acoustic guitar from string-wise MIDI input. We propose four different systems and compare them with both objective metrics and subjective evaluation against natural audio and a sample-based baseline. We iteratively develop these four systems by making various considerations on the architecture and intermediate tasks, such as predicting pitch and loudness…

    Submitted 14 September, 2023; originally announced September 2023.

  4. arXiv:2305.03530  [pdf, other]

    cs.SD cs.LG eess.AS

    Exploring Softly Masked Language Modelling for Controllable Symbolic Music Generation

    Authors: Nicolas Jonason, Bob L. T. Sturm

    Abstract: This document presents some early explorations of applying Softly Masked Language Modelling (SMLM) to symbolic music generation. SMLM can be seen as a generalisation of masked language modelling (MLM), where instead of each element of the input set being either known or unknown, each element can be known, unknown or partly known. We demonstrate some results of applying SMLM to constrained symbolic…

    Submitted 11 May, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: Version 1.1

  5. arXiv:2301.01578  [pdf, other]

    cs.SD eess.AS

    Validity in Music Information Research Experiments

    Authors: Bob L. T. Sturm, Arthur Flexer

    Abstract: Validity is the truth of an inference made from evidence, such as data collected in an experiment, and is central to working scientifically. Given the maturity of the domain of music information research (MIR), validity in our opinion should be discussed and considered much more than it has been so far. Considering validity in one's work can improve its scientific and engineering value. Puzzling M…

    Submitted 4 January, 2023; originally announced January 2023.

  6. arXiv:2212.02610  [pdf, other]

    cs.SD cs.LG eess.AS

    Audio Latent Space Cartography

    Authors: Nicolas Jonason, Bob L. T. Sturm

    Abstract: We explore the generation of visualisations of audio latent spaces using an audio-to-image generation pipeline. We believe this can help with the interpretability of audio latent spaces. We demonstrate a variety of results on the NSynth dataset. A web demo is available.

    Submitted 7 December, 2022; v1 submitted 5 December, 2022; originally announced December 2022.

    Comments: Late Breaking / Demo, ISMIR 2022 (https://ismir2022program.ismir.net/lbd_413.html)

    ACM Class: J.5

  7. arXiv:2211.11225  [pdf, other]

    cs.SD cs.LG eess.AS

    TimbreCLIP: Connecting Timbre to Text and Images

    Authors: Nicolas Jonason, Bob L. T. Sturm

    Abstract: We present work in progress on TimbreCLIP, an audio-text cross modal embedding trained on single instrument notes. We evaluate the models with a cross-modal retrieval task on synth patches. Finally, we demonstrate the application of TimbreCLIP on two tasks: text-driven audio equalization and timbre to image generation.

    Submitted 21 November, 2022; originally announced November 2022.

    Comments: Submitted to AAAI workshop on creative AI across modalities

    ACM Class: J.5

  8. arXiv:2206.03224  [pdf]

    cs.CY cs.AI cs.HC

    The Beyond the Fence Musical and Computer Says Show Documentary

    Authors: Simon Colton, Maria Teresa Llano, Rose Hepworth, John Charnley, Catherine V. Gale, Archie Baron, Francois Pachet, Pierre Roy, Pablo Gervas, Nick Collins, Bob Sturm, Tillman Weyde, Daniel Wolff, James Robert Lloyd

    Abstract: During 2015 and early 2016, the cultural application of Computational Creativity research and practice took a big leap forward, with a project where multiple computational systems were used to provide advice and material for a new musical theatre production. Billed as the world's first 'computer musical... conceived by computer and substantially crafted by computer', Beyond The Fence was staged in…

    Submitted 11 May, 2022; originally announced June 2022.

    Journal ref: The Seventh International Conference on Computational Creativity, ICCC 2016

  9. arXiv:2010.07913  [pdf, other]

    eess.AS cs.SD

    Dataset artefacts in anti-spoofing systems: a case study on the ASVspoof 2017 benchmark

    Authors: Bhusan Chettri, Emmanouil Benetos, Bob L. T. Sturm

    Abstract: The Automatic Speaker Verification Spoofing and Countermeasures Challenges motivate research in protecting speech biometric systems against a variety of different access attacks. The 2017 edition focused on replay spoofing attacks, and involved participants building and training systems on a provided dataset (ASVspoof 2017). More than 60 research papers have so far been published with this dataset…

    Submitted 15 October, 2020; originally announced October 2020.

    Comments: Accepted to IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020

  10. arXiv:2005.07788  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Reliable Local Explanations for Machine Listening

    Authors: Saumitra Mishra, Emmanouil Benetos, Bob L. Sturm, Simon Dixon

    Abstract: One way to analyse the behaviour of machine learning models is through local explanations that highlight input features that maximally influence model predictions. Sensitivity analysis, which involves analysing the effect of input perturbations on model predictions, is one of the methods to generate local explanations. Meaningful input perturbations are essential for generating reliable explanatio…

    Submitted 15 May, 2020; originally announced May 2020.

    Comments: 8 pages plus references. Accepted at the IJCNN 2020 Special Session on Explainable Computational/Artificial Intelligence. Camera-ready version

  11. arXiv:1904.09533  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    GAN-based Generation and Automatic Selection of Explanations for Neural Networks

    Authors: Saumitra Mishra, Daniel Stoller, Emmanouil Benetos, Bob L. Sturm, Simon Dixon

    Abstract: One way to interpret trained deep neural networks (DNNs) is by inspecting characteristics that neurons in the model respond to, such as by iteratively optimising the model input (e.g., an image) to maximally activate specific neurons. However, this requires a careful selection of hyper-parameters to generate interpretable examples for each neuron of interest, and current methods rely on a manual,…

    Submitted 27 April, 2019; v1 submitted 20 April, 2019; originally announced April 2019.

    Comments: 8 pages plus references and appendix. Accepted at the ICLR 2019 Workshop "Safe Machine Learning: Specification, Robustness and Assurance". Camera-ready version. v2: Corrected page header

    Journal ref: SafeML Workshop at the International Conference on Learning Representations (ICLR) 2019

  12. arXiv:1904.04589  [pdf, other]

    eess.AS cs.SD

    Ensemble Models for Spoofing Detection in Automatic Speaker Verification

    Authors: Bhusan Chettri, Daniel Stoller, Veronica Morfi, Marco A. Martínez Ramírez, Emmanouil Benetos, Bob L. Sturm

    Abstract: Detecting spoofing attempts of automatic speaker verification (ASV) systems is challenging, especially when using only one modeling approach. For robustness, we use both deep neural networks and traditional machine learning models and combine them as ensemble models through logistic regression. They are trained to detect logical access (LA) and physical access (PA) attacks on the dataset released…

    Submitted 4 July, 2019; v1 submitted 9 April, 2019; originally announced April 2019.

    Comments: Accepted at Interspeech 2019, Graz, Austria

  13. arXiv:1805.09164  [pdf, other]

    eess.AS cs.SD

    A Study On Convolutional Neural Network Based End-To-End Replay Anti-Spoofing

    Authors: Bhusan Chettri, Saumitra Mishra, Bob L. Sturm, Emmanouil Benetos

    Abstract: The second Automatic Speaker Verification Spoofing and Countermeasures challenge (ASVspoof 2017) focused on "replay attack" detection. The best deep-learning systems to compete in ASVspoof 2017 used Convolutional Neural Networks (CNNs) as a feature extractor. In this paper, we study their performance in an end-to-end setting. We find that these architectures show poor generalization in the evaluat…

    Submitted 22 May, 2018; originally announced May 2018.

    Comments: 6 pages

  14. arXiv:1606.03044  [pdf, ps, other]

    cs.SD cs.AI cs.IR

    The "Horse" Inside: Seeking Causes Behind the Behaviours of Music Content Analysis Systems

    Authors: Bob L. Sturm

    Abstract: Building systems that possess the sensitivity and intelligence to identify and describe high-level attributes in music audio signals continues to be an elusive goal, but one that surely has broad and deep implications for a wide variety of applications. Hundreds of papers have so far been published toward this goal, and great progress appears to have been made. Some systems produce remarkable accu…

    Submitted 9 June, 2016; originally announced June 2016.

    Comments: 32 pages, 17 figures, this work was accepted for publication in a journal special issue in Apr. 2015

    ACM Class: H.3.1; I.2.6; J.5

  15. arXiv:1604.08723  [pdf, other]

    cs.SD cs.LG

    Music transcription modelling and composition using deep learning

    Authors: Bob L. Sturm, João Felipe Santos, Oded Ben-Tal, Iryna Korshunova

    Abstract: We apply deep learning methods, specifically long short-term memory (LSTM) networks, to music transcription modelling and composition. We build and train LSTM networks using approximately 23,000 music transcriptions expressed with a high-level vocabulary (ABC notation), and use them to generate new transcriptions. Our practical aim is to create music transcription models useful in particular conte…

    Submitted 29 April, 2016; originally announced April 2016.

    Comments: 16 pages, 4 figures, contribution to 1st Conference on Computer Simulation of Musical Creativity

  16. arXiv:1507.04761  [pdf, other]

    cs.LG cs.NE cs.SD

    Deep Learning and Music Adversaries

    Authors: Corey Kereliuk, Bob L. Sturm, Jan Larsen

    Abstract: An adversary is essentially an algorithm intent on making a classification system perform in some particular way given an input, e.g., increase the probability of a false negative. Recent work builds adversaries for deep learning systems applied to image object recognition, which exploits the parameters of the system to find the minimal perturbation of the input image such that the network misclas…

    Submitted 16 July, 2015; originally announced July 2015.

    Comments: 13 pages, 6 figures, 3 tables, 6 sections

  17. arXiv:1410.0001  [pdf, other]

    cs.IR cs.SD

    On Evaluation Validity in Music Autotagging

    Authors: Fabien Gouyon, Bob L. Sturm, Joao Lobato Oliveira, Nuno Hespanhol, Thibault Langlois

    Abstract: Music autotagging, an established problem in Music Information Retrieval, aims to alleviate the human cost required to manually annotate collections of recorded music with textual labels by automating the process. Many autotagging systems have been proposed and evaluated by procedures and datasets that are now standard (used in MIREX, for instance). Very little work, however, has been dedicated to…

    Submitted 30 September, 2014; originally announced October 2014.

    Comments: Submitted for journal publication in September 2014

  18. The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use

    Authors: Bob L. Sturm

    Abstract: The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge the interpretability of any result derived using it. In this article, we disprove the claims that all MGR…

    Submitted 7 June, 2013; v1 submitted 6 June, 2013; originally announced June 2013.

    Comments: 29 pages, 7 figures, 6 tables, 128 references

  19. arXiv:1103.6246  [pdf, other]

    cs.DS

    Sparse Vector Distributions and Recovery from Compressed Sensing

    Authors: Bob L. Sturm

    Abstract: It is well known that the performance of sparse vector recovery algorithms from compressive measurements can depend on the distribution underlying the non-zero elements of a sparse vector. However, the extent of these effects has yet to be explored, and formally presented. In this paper, I empirically investigate this dependence for seven distributions and fifteen recovery algorithms. The two mora…

    Submitted 15 July, 2011; v1 submitted 31 March, 2011; originally announced March 2011.

    Comments: Originally submitted to IEEE Signal Processing Letters in March 2011, but rejected June 2011. Revised, expanded, and submitted July 2011 to EURASIP Journal special issue on sparse signal processing

    ACM Class: F.2.1; G.1.2; G.1.3; G.1.6