Showing 1–32 of 32 results for author: Dixon, S

Searching in archive eess.
  1. arXiv:2406.14850  [pdf, other]

    eess.AS

    DExter: Learning and Controlling Performance Expression with Diffusion Models

    Authors: Huan Zhang, Shreyan Chowdhury, Carlos Eduardo Cancino-Chacón, Jinhua Liang, Simon Dixon, Gerhard Widmer

    Abstract: In the pursuit of developing expressive music performance models using artificial intelligence, this paper introduces DExter, a new approach leveraging diffusion probabilistic models to render Western classical piano performances. In this approach, performance parameters are represented in a continuous expression space and a diffusion model is trained to predict these continuous parameters while b…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: in submission to appsci special session

  2. arXiv:2405.18386  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

    Authors: Yixiao Zhang, Yukara Ikemiya, Woosung Choi, Naoki Murata, Marco A. Martínez-Ramírez, Liwei Lin, Gus Xia, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon

    Abstract: Recent advances in text-to-music editing, which employ text queries to modify music (e.g. by changing its style or adjusting instrumental components), present unique challenges and opportunities for AI-assisted music creation. Previous approaches in this domain have been constrained by the necessity to train specific editing models from scratch, which is both resource-intensive and inefficient; o…

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Code and demo are available at: https://github.com/ldzhangyx/instruct-musicgen

  3. arXiv:2405.16687  [pdf, other]

    cs.SD eess.AS

    Reconstructing the Charlie Parker Omnibook using an audio-to-score automatic transcription pipeline

    Authors: Xavier Riley, Simon Dixon

    Abstract: The Charlie Parker Omnibook is a cornerstone of jazz music education, described by pianist Ethan Iverson as "the most important jazz education text ever published". In this work we propose a new transcription pipeline and explore the extent to which state of the art music technology is able to reconstruct these scores directly from the audio without human intervention. Our pipeline includes: a new…

    Submitted 26 May, 2024; originally announced May 2024.

  4. arXiv:2402.15258  [pdf, other]

    eess.AS cs.LG cs.SD

    High Resolution Guitar Transcription via Domain Adaptation

    Authors: Xavier Riley, Drew Edwards, Simon Dixon

    Abstract: Automatic music transcription (AMT) has achieved high accuracy for piano due to the availability of large, high-quality datasets such as MAESTRO and MAPS, but comparable datasets are not yet available for other instruments. In recent work, however, it has been demonstrated that aligning scores to transcription model activations can produce high quality AMT training data for instruments other than…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: Accepted to ICASSP 2024

  5. arXiv:2402.06178  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models

    Authors: Yixiao Zhang, Yukara Ikemiya, Gus Xia, Naoki Murata, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon

    Abstract: Recent advances in text-to-music generation models have opened new avenues in musical creativity. However, music generation usually involves iterative refinements, and how to edit the generated music remains a significant challenge. This paper introduces a novel approach to the editing of music generated by such models, enabling the modification of specific attributes, such as genre, mood and inst…

    Submitted 28 May, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted to IJCAI 2024

  6. arXiv:2402.01424  [pdf, other]

    cs.SD cs.LG eess.AS

    A Data-Driven Analysis of Robust Automatic Piano Transcription

    Authors: Drew Edwards, Simon Dixon, Emmanouil Benetos, Akira Maezawa, Yuta Kusaka

    Abstract: Algorithms for automatic piano transcription have improved dramatically in recent years due to new datasets and modeling techniques. Recent developments have focused primarily on adapting new neural network architectures, such as the Transformer and Perceiver, in order to yield more accurate systems. In this work, we study transcription systems from the perspective of their training data. By measu…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted for publication in IEEE Signal Processing Letters on 31 January, 2024

  7. arXiv:2311.08884  [pdf, other]

    cs.SD cs.MM eess.AS

    CREPE Notes: A new method for segmenting pitch contours into discrete notes

    Authors: Xavier Riley, Simon Dixon

    Abstract: Tracking the fundamental frequency (f0) of a monophonic instrumental performance is effectively a solved problem with several solutions achieving 99% accuracy. However, the related task of automatic music transcription requires a further processing step to segment an f0 contour into discrete notes. This sub-task of note segmentation is necessary to enable a range of applications including musicolo…

    Submitted 15 November, 2023; originally announced November 2023.

    Journal ref: Proceedings of the 20th Sound and Music Computing Conference. June 15-17, 2023. Stockholm, Sweden

  8. arXiv:2311.02023  [pdf, other]

    cs.SD cs.MM eess.AS

    FiloBass: A Dataset and Corpus Based Study of Jazz Basslines

    Authors: Xavier Riley, Simon Dixon

    Abstract: We present FiloBass: a novel corpus of music scores and annotations which focuses on the important but often overlooked role of the double bass in jazz accompaniment. Inspired by recent work that sheds light on the role of the soloist, we offer a collection of 48 manually verified transcriptions of professional jazz bassists, comprising over 50,000 note events, which are based on the backing track…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: ISMIR 2023

    Journal ref: Proceedings of the 24th International Society for Music Information Retrieval Conference, ISMIR 2023, Milan, Italy

  9. arXiv:2310.12404  [pdf, other]

    cs.SD cs.CL cs.HC cs.LG eess.AS

    Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing

    Authors: Yixiao Zhang, Akira Maezawa, Gus Xia, Kazuhiko Yamamoto, Simon Dixon

    Abstract: Creating music is iterative, requiring varied methods at each stage. However, existing AI music systems fall short in orchestrating multiple subsystems for diverse needs. To address this gap, we introduce Loop Copilot, a novel system that enables users to generate and iteratively refine music through an interactive, multi-round dialogue interface. The system uses a large language model to interpre…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: Source code and demo video are available at https://sites.google.com/view/loop-copilot

  10. arXiv:2309.02567  [pdf, other]

    eess.AS cs.MM cs.SD

    Symbolic Music Representations for Classification Tasks: A Systematic Evaluation

    Authors: Huan Zhang, Emmanouil Karystinaios, Simon Dixon, Gerhard Widmer, Carlos Eduardo Cancino-Chacón

    Abstract: Music Information Retrieval (MIR) has seen a recent surge in deep learning-based approaches, which often involve encoding symbolic music (i.e., music represented in terms of discrete note events) in an image-like or language-like fashion. However, symbolic music is neither an image nor a sentence, and research in the symbolic domain lacks a comprehensive overview of the different available represe…

    Submitted 10 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: To be published in the Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR 2023), Milan, Italy

    Journal ref: Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR 2023), Milan, Italy

  11. arXiv:2302.13678  [pdf, other]

    cs.SD cs.AI eess.AS

    A Comparative Analysis Of Latent Regressor Losses For Singing Voice Conversion

    Authors: Brendan O'Connor, Simon Dixon

    Abstract: Previous research has shown that established techniques for spoken voice conversion (VC) do not perform as well when applied to singing voice conversion (SVC). We propose an alternative loss component in a loss function that is otherwise well-established among VC tasks, which has been shown to improve our model's SVC performance. We first trained a singer identity embedding (SIE) network on mel-sp…

    Submitted 27 February, 2023; originally announced February 2023.

    Comments: Submitted to the Sound and Music Computing Conference 2023

  12. arXiv:2208.11671  [pdf, other]

    cs.SD cs.CL eess.AS

    Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model

    Authors: Yixiao Zhang, Junyan Jiang, Gus Xia, Simon Dixon

    Abstract: Lyric interpretations can help people understand songs and their lyrics quickly, and can also make it easier to manage, retrieve and discover songs efficiently from the growing mass of music archives. In this paper we propose BART-fusion, a novel model for generating lyric interpretations from lyrics and music audio that combines a large-scale pre-trained language model with an audio encoder. We e…

    Submitted 24 August, 2022; originally announced August 2022.

    Comments: Accepted to ISMIR 2022

  13. arXiv:2205.05871  [pdf, other]

    cs.SD cs.LG eess.AS

    Towards Robust Unsupervised Disentanglement of Sequential Data -- A Case Study Using Music Audio

    Authors: Yin-Jyun Luo, Sebastian Ewert, Simon Dixon

    Abstract: Disentangled sequential autoencoders (DSAEs) represent a class of probabilistic graphical models that describes an observed sequence with dynamic latent variables and a static latent variable. The former encode information at a frame rate identical to the observation, while the latter globally governs the entire sequence. This introduces an inductive bias and facilitates unsupervised disentangleme…

    Submitted 14 June, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

    Comments: The paper is accepted to IJCAI 2022

  14. arXiv:2204.08822  [pdf, other]

    cs.SD cs.AI eess.AS

    A Convolutional-Attentional Neural Framework for Structure-Aware Performance-Score Synchronization

    Authors: Ruchit Agrawal, Daniel Wolff, Simon Dixon

    Abstract: Performance-score synchronization is an integral task in signal processing, which entails generating an accurate mapping between an audio recording of a performance and the corresponding musical score. Traditional synchronization methods compute alignment using knowledge-driven and stochastic approaches, and are typically unable to generalize well to different domains and modalities. We present a…

    Submitted 19 April, 2022; originally announced April 2022.

    Comments: Published in IEEE Signal Processing Letters, Volume 29, December 2021

  15. arXiv:2111.08839  [pdf, other]

    cs.SD eess.AS

    Zero-shot Singing Technique Conversion

    Authors: Brendan O'Connor, Simon Dixon, George Fazekas

    Abstract: In this paper we propose modifications to the neural network framework, AutoVC, for the task of singing technique conversion. This includes utilising a pretrained singing technique encoder which extracts technique information, upon which a decoder is conditioned during training. By swapping out a source singer's technique information for that of the target's during conversion, the input spectrogram…

    Submitted 16 November, 2021; originally announced November 2021.

    Comments: In Proceedings of the 15th International Symposium on Computer Music Multidisciplinary Research (CMMR 2021), Tokyo, Japan, November 15-16, 2021

  16. An Exploratory Study on Perceptual Spaces of the Singing Voice

    Authors: Brendan O'Connor, Simon Dixon, George Fazekas

    Abstract: Sixty participants provided dissimilarity ratings between various singing techniques. Multidimensional scaling, class averaging and clustering techniques were used to analyse timbral spaces and how they change between different singers, genders and registers. Clustering analysis showed that ground-truth similarity and silhouette scores that were not significantly different between gender or regist…

    Submitted 15 November, 2021; originally announced November 2021.

    Comments: In Proceedings of the 2020 Joint Conference on AI Music Creativity (CSMC-MuMe 2020), Stockholm, Sweden, October 15-19, 2020

  17. arXiv:2108.02625  [pdf, other]

    cs.SD cs.CL cs.IR eess.AS

    MSTRE-Net: Multistreaming Acoustic Modeling for Automatic Lyrics Transcription

    Authors: Emir Demirel, Sven Ahlbäck, Simon Dixon

    Abstract: This paper makes several contributions to automatic lyrics transcription (ALT) research. Our main contribution is a novel variant of the Multistreaming Time-Delay Neural Network (MTDNN) architecture, called MSTRE-Net, which processes the temporal information using multiple streams in parallel with varying resolutions keeping the network more compact, and thus with a faster inference and an improve…

    Submitted 5 August, 2021; originally announced August 2021.

  18. arXiv:2107.13617  [pdf, other]

    cs.SD cs.IR cs.LG cs.NE eess.AS

    Pitch-Informed Instrument Assignment Using a Deep Convolutional Network with Multiple Kernel Shapes

    Authors: Carlos Lordelo, Emmanouil Benetos, Simon Dixon, Sven Ahlbäck

    Abstract: This paper proposes a deep convolutional neural network for performing note-level instrument assignment. Given a polyphonic multi-instrumental music signal along with its ground truth or predicted notes, the objective is to assign an instrumental source for each note. This problem is addressed as a pitch-informed classification task where each note is analysed individually. We also propose to util…

    Submitted 28 July, 2021; originally announced July 2021.

    Comments: 4 figures, 4 tables and 7 pages. Accepted for publication at ISMIR Conference 2021

  19. arXiv:2106.10977  [pdf, other]

    cs.IR cs.SD eess.AS

    Computational Pronunciation Analysis in Sung Utterances

    Authors: Emir Demirel, Sven Ahlback, Simon Dixon

    Abstract: Recent automatic lyrics transcription (ALT) approaches focus on building stronger acoustic models or in-domain language models, while the pronunciation aspect is seldom touched upon. This paper applies a novel computational analysis on the pronunciation variances in sung utterances and further proposes a new pronunciation model adapted for singing. The singing-adapted model is tested on multiple p…

    Submitted 21 June, 2021; originally announced June 2021.

  20. arXiv:2102.09202  [pdf, other]

    cs.SD eess.AS

    Low Resource Audio-to-Lyrics Alignment From Polyphonic Music Recordings

    Authors: Emir Demirel, Sven Ahlbäck, Simon Dixon

    Abstract: Lyrics alignment in long music recordings can be memory exhaustive when performed in a single pass. In this study, we present a novel method that performs audio-to-lyrics alignment with a low memory consumption footprint regardless of the duration of the music recording. The proposed system first spots the anchoring words within the audio signal. With respect to these anchors, the recording is the…

    Submitted 18 February, 2021; originally announced February 2021.

  21. arXiv:2102.00382  [pdf, other]

    cs.SD cs.LG eess.AS

    Structure-Aware Audio-to-Score Alignment using Progressively Dilated Convolutional Neural Networks

    Authors: Ruchit Agrawal, Daniel Wolff, Simon Dixon

    Abstract: The identification of structural differences between a music performance and the score is a challenging yet integral step of audio-to-score alignment, an important subtask of music information retrieval. We present a novel method to detect such differences between the score and performance for a given piece of music using progressively dilated convolutional neural networks. Our method incorporates…

    Submitted 13 February, 2021; v1 submitted 31 January, 2021; originally announced February 2021.

    Comments: ICASSP 2021 camera-ready version. Copyrights belong to IEEE

  22. Adversarial Unsupervised Domain Adaptation for Harmonic-Percussive Source Separation

    Authors: Carlos Lordelo, Emmanouil Benetos, Simon Dixon, Sven Ahlbäck, Patrik Ohlsson

    Abstract: This paper addresses the problem of domain adaptation for the task of music source separation. Using datasets from two different domains, we compare the performance of a deep learning-based harmonic-percussive source separation model under different training scenarios, including supervised joint training using data from both domains and pre-training in one domain with fine-tuning in another. We pr…

    Submitted 3 January, 2021; originally announced January 2021.

    Comments: 5 pages, 2 figures and 1 table. Accepted for publication in IEEE Signal Processing Letters

  23. arXiv:2011.07546  [pdf, other]

    cs.SD cs.IR cs.LG eess.AS

    Learning Frame Similarity using Siamese networks for Audio-to-Score Alignment

    Authors: Ruchit Agrawal, Simon Dixon

    Abstract: Audio-to-score alignment aims at generating an accurate mapping between a performance audio and the score of a given piece. Standard alignment methods are based on Dynamic Time Warping (DTW) and employ handcrafted features, which cannot be adapted to different acoustic conditions. We propose a method to overcome this limitation using learned frame similarity for audio-to-score alignment. We focus…

    Submitted 15 November, 2020; originally announced November 2020.

    Comments: Accepted at EUSIPCO 2020

  24. arXiv:2007.14333  [pdf, other]

    eess.AS cs.LG cs.SD

    A Hybrid Approach to Audio-to-Score Alignment

    Authors: Ruchit Agrawal, Simon Dixon

    Abstract: Audio-to-score alignment aims at generating an accurate mapping between a performance audio and the score of a given piece. Standard alignment methods are based on Dynamic Time Warping (DTW) and employ handcrafted features. We explore the usage of neural networks as a preprocessing step for DTW-based automatic alignment methods. Experiments on music data from different acoustic conditions demonstr…

    Submitted 28 July, 2020; originally announced July 2020.

    Comments: ML4MD at ICML 2019

  25. arXiv:2007.06486  [pdf, other]

    eess.AS cs.CL cs.LG

    Automatic Lyrics Transcription using Dilated Convolutional Neural Networks with Self-Attention

    Authors: Emir Demirel, Sven Ahlback, Simon Dixon

    Abstract: Speech recognition is a well-developed research field, so that current state-of-the-art systems are being used in many applications in the software industry, yet as of today there still does not exist such a robust system for the recognition of words and sentences from singing voice. This paper proposes a complete pipeline for this task which may commonly be referred to as automatic lyrics transcri…

    Submitted 24 July, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

  26. arXiv:2005.07788  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Reliable Local Explanations for Machine Listening

    Authors: Saumitra Mishra, Emmanouil Benetos, Bob L. Sturm, Simon Dixon

    Abstract: One way to analyse the behaviour of machine learning models is through local explanations that highlight input features that maximally influence model predictions. Sensitivity analysis, which involves analysing the effect of input perturbations on model predictions, is one of the methods to generate local explanations. Meaningful input perturbations are essential for generating reliable explanatio…

    Submitted 15 May, 2020; originally announced May 2020.

    Comments: 8 pages plus references. Accepted at the IJCNN 2020 Special Session on Explainable Computational/Artificial Intelligence. Camera-ready version

  27. arXiv:1911.06393  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling

    Authors: Daniel Stoller, Mi Tian, Sebastian Ewert, Simon Dixon

    Abstract: Convolutional neural networks (CNNs) with dilated filters such as the Wavenet or the Temporal Convolutional Network (TCN) have shown good results in a variety of sequence modelling tasks. However, efficiently modelling long-term dependencies in these sequences is still challenging. Although the receptive field of these models grows exponentially with the number of layers, computing the convolution…

    Submitted 14 November, 2019; originally announced November 2019.

    Comments: Code available at https://github.com/f90/Seq-U-Net

  28. arXiv:1905.01899  [pdf, other]

    cs.SD eess.AS

    Investigating kernel shapes and skip connections for deep learning-based harmonic-percussive separation

    Authors: Carlos Lordelo, Emmanouil Benetos, Simon Dixon, Sven Ahlbäck

    Abstract: In this paper we propose an efficient deep learning encoder-decoder network for performing Harmonic-Percussive Source Separation (HPSS). It is shown that we are able to greatly reduce the number of model trainable parameters by using a dense arrangement of skip connections between the model layers. We also explore the utilisation of different kernel sizes for the 2D filters of the convolutional la…

    Submitted 30 July, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

    Comments: Accepted for publication at WASPAA 2019, 5 pages, 5 figures

  29. arXiv:1904.09533  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    GAN-based Generation and Automatic Selection of Explanations for Neural Networks

    Authors: Saumitra Mishra, Daniel Stoller, Emmanouil Benetos, Bob L. Sturm, Simon Dixon

    Abstract: One way to interpret trained deep neural networks (DNNs) is by inspecting characteristics that neurons in the model respond to, such as by iteratively optimising the model input (e.g., an image) to maximally activate specific neurons. However, this requires a careful selection of hyper-parameters to generate interpretable examples for each neuron of interest, and current methods rely on a manual,…

    Submitted 27 April, 2019; v1 submitted 20 April, 2019; originally announced April 2019.

    Comments: 8 pages plus references and appendix. Accepted at the ICLR 2019 Workshop "Safe Machine Learning: Specification, Robustness and Assurance". Camera-ready version. v2: Corrected page header

    Journal ref: SafeML Workshop at the International Conference on Learning Representations (ICLR) 2019

  30. arXiv:1806.03185  [pdf, other]

    cs.SD eess.AS stat.ML

    Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation

    Authors: Daniel Stoller, Sebastian Ewert, Simon Dixon

    Abstract: Models for audio source separation usually operate on the magnitude spectrum, which ignores phase information and makes separation performance dependent on hyper-parameters for the spectral front-end. Therefore, we investigate end-to-end source separation in the time-domain, which allows modelling phase information and avoids fixed spectral transformations. Due to high sampling rates for audio, em…

    Submitted 8 June, 2018; originally announced June 2018.

    Comments: 7 pages (1 for references), 4 figures, 3 tables. Appearing in the proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR 2018) (camera-ready version). Implementation available at https://github.com/f90/Wave-U-Net

    Journal ref: 19th International Society for Music Information Retrieval Conference (ISMIR 2018)

  31. arXiv:1804.01650  [pdf, other]

    cs.SD cs.LG eess.AS

    Jointly Detecting and Separating Singing Voice: A Multi-Task Approach

    Authors: Daniel Stoller, Sebastian Ewert, Simon Dixon

    Abstract: A main challenge in applying deep learning to music processing is the availability of training data. One potential solution is Multi-task Learning, in which the model also learns to solve related auxiliary tasks on additional datasets to exploit their correlation. While intuitive in principle, it can be challenging to identify related tasks and construct the model to optimally share information be…

    Submitted 4 April, 2018; originally announced April 2018.

    Comments: 10 pages, 2 figures, accepted for the 14th International Conference on Latent Variable Analysis and Signal Separation

  32. arXiv:1802.05178  [pdf, other]

    cs.MM cs.SD eess.AS

    Similarity measures for vocal-based drum sample retrieval using deep convolutional auto-encoders

    Authors: Adib Mehrabi, Keunwoo Choi, Simon Dixon, Mark Sandler

    Abstract: The expressive nature of the voice provides a powerful medium for communicating sonic ideas, motivating recent research on methods for query by vocalisation. Meanwhile, deep learning methods have demonstrated state-of-the-art results for matching vocal imitations to imitated sounds, yet little is known about how well learned features represent the perceptual similarity between vocalisations and qu…

    Submitted 14 February, 2018; originally announced February 2018.

    Comments: ICASSP 2018 camera-ready