Showing 1–41 of 41 results for author: Herremans, D

Searching in archive eess.
  1. arXiv:2406.08820  [pdf, other]

    eess.AS cs.CL

    DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage

    Authors: Kyra Wang, Dorien Herremans

    Abstract: Laughing, sighing, stuttering, and other forms of paralanguage do not contribute any direct lexical meaning to speech, but they provide crucial propositional context that aids semantic and pragmatic processes such as irony. It is thus important for artificial social agents to both understand and be able to generate speech with semantically-important paralanguage. Most speech datasets do not includ…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 4 pages, 1 figure, submitted to IEEE TENCON 2024

  2. arXiv:2406.08809  [pdf, other]

    cs.SD cs.AI eess.AS

    Are we there yet? A brief survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges

    Authors: Jaeyong Kang, Dorien Herremans

    Abstract: Deep learning models for music have advanced drastically in the last few years. But how good are machine learning models at capturing emotion these days and what challenges are researchers facing? In this paper, we provide a comprehensive overview of the available music-emotion datasets and discuss evaluation standards as well as competitions in the field. We also provide a brief overview of vario…

    Submitted 13 June, 2024; originally announced June 2024.

  3. arXiv:2406.02255  [pdf, other]

    eess.AS cs.LG cs.MM cs.SD

    MidiCaps -- A large-scale MIDI dataset with text captions

    Authors: Jan Melechovsky, Abhinaba Roy, Dorien Herremans

    Abstract: Generative models guided by text prompts are increasingly becoming more popular. However, no text-to-MIDI models currently exist, mostly due to the lack of a captioned MIDI dataset. This work aims to enable research that combines LLMs with symbolic music by presenting the first large-scale MIDI dataset with text captions that is openly available: MidiCaps. MIDI (Musical Instrument Digital Interfac…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Under review

  4. arXiv:2406.01018  [pdf, other]

    eess.AS cs.LG cs.SD

    Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training

    Authors: Jan Melechovsky, Ambuj Mehrish, Berrak Sisman, Dorien Herremans

    Abstract: With rapid globalization, the need to build inclusive and representative speech technology cannot be overstated. Accent is an important aspect of speech that needs to be taken into consideration while building inclusive speech synthesizers. Inclusive speech technology aims to erase any biases towards specific groups, such as people of certain accent. We note that state-of-the-art Text-to-Speech (T…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Under review

  5. arXiv:2402.17467  [pdf, other]

    cs.IR cs.AI cs.SD eess.AS

    Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey

    Authors: Dinh-Viet-Toan Le, Louis Bigo, Mikaela Keller, Dorien Herremans

    Abstract: Several adaptations of Transformers models have been developed in various domains since its breakthrough in Natural Language Processing (NLP). This trend has spread into the field of Music Information Retrieval (MIR), including studies processing music data. However, the practice of leveraging NLP tools for symbolic music data is not novel in MIR. Music has been frequently compared to language, as…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 36 pages, 5 figures, 4 tables

  6. arXiv:2311.08355  [pdf, other]

    eess.AS

    Mustango: Toward Controllable Text-to-Music Generation

    Authors: Jan Melechovsky, Zixun Guo, Deepanway Ghosal, Navonil Majumder, Dorien Herremans, Soujanya Poria

    Abstract: The quality of the text-to-music models has reached new heights due to recent advancements in diffusion models. The controllability of various musical aspects, however, has barely been explored. In this paper, we propose Mustango: a music-domain-knowledge-inspired text-to-music system based on diffusion. Mustango aims to control the generated music, not only with general text captions, but with mo…

    Submitted 3 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  7. arXiv:2311.00968  [pdf, other]

    cs.SD cs.AI eess.AS

    Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model

    Authors: Jaeyong Kang, Soujanya Poria, Dorien Herremans

    Abstract: Numerous studies in the field of music generation have demonstrated impressive performance, yet virtually no models are able to directly generate music to match accompanying videos. In this work, we develop a generative music AI framework, Video2Music, that can match a provided video. We first curated a unique collection of music videos. Then, we analysed the music videos to obtain semantic, scene…

    Submitted 4 March, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Journal ref: Expert Systems with Applications 249 (2024): 123640

  8. arXiv:2302.00286  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training

    Authors: Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Ju-Chiang Wang, Yun-Ning Hung, Dorien Herremans

    Abstract: In this paper, we introduce Jointist, an instrument-aware multi-instrument framework that is capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip. Jointist consists of an instrument recognition module that conditions the other two modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilize…

    Submitted 1 February, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: arXiv admin note: text overlap with arXiv:2206.10805

  9. arXiv:2212.00973  [pdf, other]

    cs.SD cs.AI eess.AS eess.SP

    A Domain-Knowledge-Inspired Music Embedding Space and a Novel Attention Mechanism for Symbolic Music Modeling

    Authors: Z. Guo, J. Kang, D. Herremans

    Abstract: Following the success of the transformer architecture in the natural language domain, transformer-like architectures have been widely applied to the domain of symbolic music recently. Symbolic music and text, however, are two different modalities. Symbolic music contains multiple attributes, both absolute attributes (e.g., pitch) and relative attributes (e.g., pitch interval). These relative attri…

    Submitted 2 December, 2022; originally announced December 2022.

    Comments: This paper is accepted at AAAI 2023

  10. arXiv:2211.07283  [pdf, other]

    eess.AS cs.SD

    SNIPER Training: Single-Shot Sparse Training for Text-to-Speech

    Authors: Perry Lam, Huayun Zhang, Nancy F. Chen, Berrak Sisman, Dorien Herremans

    Abstract: Text-to-speech (TTS) models have achieved remarkable naturalness in recent years, yet like most deep neural models, they have more parameters than necessary. Sparse TTS models can improve on dense models via pruning and extra retraining, or converge faster than dense models with some performance loss. Thus, we propose training TTS models using decaying sparsity, i.e. a high initial sparsity to acc…

    Submitted 1 June, 2024; v1 submitted 14 November, 2022; originally announced November 2022.

  11. arXiv:2211.03316  [pdf, other]

    eess.AS cs.LG cs.SD

    Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder

    Authors: Jan Melechovsky, Ambuj Mehrish, Berrak Sisman, Dorien Herremans

    Abstract: Accent plays a significant role in speech communication, influencing one's capability to understand as well as conveying a person's identity. This paper introduces a novel and efficient framework for accented Text-to-Speech (TTS) synthesis based on a Conditional Variational Autoencoder. It has the ability to synthesize a selected speaker's voice, which is converted to any desired target accent. Ou…

    Submitted 3 June, 2024; v1 submitted 7 November, 2022; originally announced November 2022.

    Comments: preprint submitted to a conference, under review

  12. arXiv:2210.05148  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability

    Authors: Kin Wai Cheuk, Ryosuke Sawata, Toshimitsu Uesaka, Naoki Murata, Naoya Takahashi, Shusuke Takahashi, Dorien Herremans, Yuki Mitsufuji

    Abstract: In this paper we propose a novel generative approach, DiffRoll, to tackle automatic music transcription (AMT). Instead of treating AMT as a discriminative task in which the model is trained to convert spectrograms into piano rolls, we think of it as a conditional generative task where we train our model to generate realistic looking piano rolls from pure Gaussian noise conditioned on spectrograms.…

    Submitted 20 October, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

    Journal ref: Proceedings of ICASSP - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5. IEEE, 2023

  13. arXiv:2206.10805  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Jointist: Joint Learning for Multi-instrument Transcription and Its Applications

    Authors: Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Amy Hung, Ju-Chiang Wang, Dorien Herremans

    Abstract: In this paper, we introduce Jointist, an instrument-aware multi-instrument framework that is capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip. Jointist consists of the instrument recognition module that conditions the other modules: the transcription module that outputs instrument-specific piano rolls, and the source separation module that utiliz…

    Submitted 28 June, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: Submitted to ISMIR

  14. arXiv:2204.11437  [pdf, other]

    cs.SD eess.AS eess.SP

    Understanding Audio Features via Trainable Basis Functions

    Authors: Kwan Yee Heung, Kin Wai Cheuk, Dorien Herremans

    Abstract: In this paper we explore the possibility of maximizing the information represented in spectrograms by making the spectrogram basis functions trainable. We experiment with two different tasks, namely keyword spotting (KWS) and automatic speech recognition (ASR). For most neural network models, the architecture and hyperparameters are typically fine-tuned and optimized in experiments. Input features…

    Submitted 25 April, 2022; originally announced April 2022.

    Comments: under review in Interspeech 2022

  15. arXiv:2203.03022  [pdf, ps, other]

    cs.SD cs.AI cs.LG eess.AS stat.ML

    HEAR: Holistic Evaluation of Audio Representations

    Authors: Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk

    Abstract: What audio embedding approach generalizes best to a wide range of downstream tasks across a variety of everyday domains without fine-tuning? The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios. HEAR evaluates audio representations using a benchmark suite across a variety of domains, in…

    Submitted 29 May, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

    Comments: to appear in Proceedings of Machine Learning Research (PMLR): NeurIPS 2021 Competition Track

  16. arXiv:2202.10453  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Predicting emotion from music videos: exploring the relative contribution of visual and auditory information to affective responses

    Authors: Phoebe Chua, Dimos Makris, Dorien Herremans, Gemma Roig, Kat Agres

    Abstract: Although media content is increasingly produced, distributed, and consumed in multiple combinations of modalities, how individual modalities contribute to the perceived emotion of a media item remains poorly understood. In this paper we present MusicVideos (MuVi), a novel dataset for affective multimedia content analysis to study how the auditory and visual modalities contribute to the perceived e…

    Submitted 19 February, 2022; originally announced February 2022.

    Comments: 16 pages with 9 figures

  17. arXiv:2202.04464  [pdf, other]

    cs.SD cs.LG eess.AS

    Conditional Drums Generation using Compound Word Representations

    Authors: Dimos Makris, Guo Zixun, Maximos Kaliakatsos-Papakostas, Dorien Herremans

    Abstract: The field of automatic music composition has seen great progress in recent years, specifically with the invention of transformer-based architectures. When using any deep learning model which considers music as a sequence of events with multiple complex dependencies, the selection of a proper data representation is crucial. In this paper, we tackle the task of conditional drums generation using a n…

    Submitted 21 February, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

    Comments: Accepted for the 11th International Conference on Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART), 2022

  18. arXiv:2107.04954  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    ReconVAT: A Semi-Supervised Automatic Music Transcription Framework for Low-Resource Real-World Data

    Authors: Kin Wai Cheuk, Dorien Herremans, Li Su

    Abstract: Most of the current supervised automatic music transcription (AMT) models lack the ability to generalize. This means that they have trouble transcribing real-world music recordings from diverse musical genres that are not presented in the labelled training data. In this paper, we propose a semi-supervised framework, ReconVAT, which solves this issue by leveraging the huge amount of available unlab…

    Submitted 29 July, 2021; v1 submitted 10 July, 2021; originally announced July 2021.

    Comments: Accepted in ACMMM 21. Camera ready version

  19. arXiv:2106.12174  [pdf, other]

    cs.LG cs.MM cs.SD eess.AS

    Deep Neural Network Based Respiratory Pathology Classification Using Cough Sounds

    Authors: Balamurali B T, Hwan Ing Hee, Saumitra Kapoor, Oon Hoe Teoh, Sung Shin Teng, Khai Pin Lee, Dorien Herremans, Jer Ming Chen

    Abstract: Intelligent systems are transforming the world, as well as our healthcare system. We propose a deep learning-based cough sound classification model that can distinguish between children with healthy versus pathological coughs such as asthma, upper respiratory tract infection (URTI), and lower respiratory tract infection (LRTI). In order to train a deep neural network model, we collected a new data…

    Submitted 23 June, 2021; originally announced June 2021.

    MSC Class: 62-XX; 92-XX; 68Txx; ACM Class: J.3; I.2

  20. arXiv:2104.13056  [pdf, other]

    cs.SD cs.LG eess.AS

    Generating Lead Sheets with Affect: A Novel Conditional seq2seq Framework

    Authors: Dimos Makris, Kat R. Agres, Dorien Herremans

    Abstract: The field of automatic music composition has seen great progress in the last few years, much of which can be attributed to advances in deep neural networks. There are numerous studies that present different strategies for generating sheet music from scratch. The inclusion of high-level musical characteristics (e.g., perceived emotional qualities), however, as conditions for controlling the generat…

    Submitted 27 April, 2021; originally announced April 2021.

    Comments: Accepted for the International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18-22 July 2021 (virtual)

  21. arXiv:2104.06607  [pdf, other]

    cs.SD eess.AS

    Revisiting the Onsets and Frames Model with Additive Attention

    Authors: Kin Wai Cheuk, Yin-Jyun Luo, Emmanouil Benetos, Dorien Herremans

    Abstract: Recent advances in automatic music transcription (AMT) have achieved highly accurate polyphonic piano transcription results by incorporating onset and offset detection. The existing literature, however, focuses mainly on the leverage of deep and complex models to achieve state-of-the-art (SOTA) accuracy, without understanding model behaviour. In this paper, we conduct a comprehensive examination o…

    Submitted 13 April, 2021; originally announced April 2021.

    Comments: Accepted in IJCNN 2021 Special Session S04. https://dr-costas.github.io/rlasmp2021-website/

  22. arXiv:2102.13397  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Underwater Acoustic Communication Receiver Using Deep Belief Network

    Authors: Abigail Lee-Leon, Chau Yuen, Dorien Herremans

    Abstract: Underwater environments create a challenging channel for communications. In this paper, we design a novel receiver system by exploring the machine learning technique--Deep Belief Network (DBN)-- to combat the signal distortion caused by the Doppler effect and multi-path propagation. We evaluate the performance of the proposed receiver system in both simulation experiments and sea trials. Our propo…

    Submitted 26 February, 2021; originally announced February 2021.

  23. arXiv:2010.11188  [pdf]

    cs.SD cs.CV eess.AS

    AttendAffectNet: Self-Attention based Networks for Predicting Affective Responses from Movies

    Authors: Ha Thi Phuong Thao, Balamurali B. T., Dorien Herremans, Gemma Roig

    Abstract: In this work, we propose different variants of the self-attention based network for emotion prediction from movies, which we call AttendAffectNet. We take both audio and video into account and incorporate the relation among multiple modalities by applying self-attention mechanism in a novel manner into the extracted features for emotion prediction. We compare it to the typically temporal integrati…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: 8 pages, 6 figures

    Journal ref: Proceedings of the International Conference on Pattern Recognition (ICPR2020)

  24. arXiv:2010.09969  [pdf, other]

    cs.SD cs.LG eess.AS

    The Effect of Spectrogram Reconstruction on Automatic Music Transcription: An Alternative Approach to Improve Transcription Accuracy

    Authors: Kin Wai Cheuk, Yin-Jyun Luo, Emmanouil Benetos, Dorien Herremans

    Abstract: Most of the state-of-the-art automatic music transcription (AMT) models break down the main transcription task into sub-tasks such as onset prediction and offset prediction and train them with onset and offset labels. These predictions are then concatenated together and used as the input to train another model with the pitch labels to obtain the final transcription. We attempt to use only the pitc…

    Submitted 19 October, 2020; originally announced October 2020.

    Comments: Accepted in ICPR

  25. arXiv:2010.06230  [pdf, ps, other]

    cs.SD cs.SC eess.AS

    A variational autoencoder for music generation controlled by tonal tension

    Authors: Rui Guo, Ivor Simpson, Thor Magnusson, Chris Kiefer, Dorien Herremans

    Abstract: Many of the music generation systems based on neural networks are fully autonomous and do not offer control over the generation process. In this research, we present a controllable music generation system in terms of tonal tension. We incorporate two tonal tension measures based on the Spiral Array Tension theory into a variational autoencoder model. This allows us to control the direction of the…

    Submitted 14 October, 2020; v1 submitted 13 October, 2020; originally announced October 2020.

    Comments: 2020 Joint Conference on AI Music Creativity

  26. arXiv:2009.04459  [pdf, other]

    cs.SD cs.LG eess.AS

    A dataset and classification model for Malay, Hindi, Tamil and Chinese music

    Authors: Fajilatun Nahar, Kat Agres, Balamurali BT, Dorien Herremans

    Abstract: In this paper we present a new dataset, with musical excerpts from the three main ethnic groups in Singapore: Chinese, Malay and Indian (both Hindi and Tamil). We use this new dataset to train different classification models to distinguish the origin of the music in terms of these ethnic groups. The classification models were optimized by exploring the use of different musical features as the input…

    Submitted 15 September, 2020; v1 submitted 9 September, 2020; originally announced September 2020.

    Comments: 4 pages

  27. arXiv:2007.15474  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Music FaderNets: Controllable Music Generation Based On High-Level Features via Low-Level Feature Modelling

    Authors: Hao Hao Tan, Dorien Herremans

    Abstract: High-level musical qualities (such as emotion) are often abstract, subjective, and hard to quantify. Given these difficulties, it is not easy to learn good feature representations with supervised learning techniques, either because of the insufficiency of labels, or the subjectiveness (and hence large variance) in human-annotated labels. In this paper, we present a framework that can learn high-le…

    Submitted 29 July, 2020; originally announced July 2020.

    Journal ref: Proc. of 21st International Society of Music Information Retrieval Conference, ISMIR 2020

  28. arXiv:2006.09833  [pdf, other]

    eess.AS cs.LG cs.MM cs.SD

    Generative Modelling for Controllable Audio Synthesis of Expressive Piano Performance

    Authors: Hao Hao Tan, Yin-Jyun Luo, Dorien Herremans

    Abstract: We present a controllable neural audio synthesizer based on Gaussian Mixture Variational Autoencoders (GM-VAE), which can generate realistic piano performances in the audio domain that closely follows temporal conditions of two essential style features for piano performances: articulation and dynamics. We demonstrate how the model is able to apply fine-grained style morphing over the course of syn…

    Submitted 12 July, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

    Journal ref: Published at ICML Workshop on Machine Learning for Media Discovery Workshop (ML4MD) 2020

  29. arXiv:2001.09989  [pdf, other]

    cs.SD eess.AS

    The impact of Audio input representations on neural network based music transcription

    Authors: Kin Wai Cheuk, Kat Agres, Dorien Herremans

    Abstract: This paper thoroughly analyses the effect of different input representations on polyphonic multi-instrument music transcription. We use our own GPU based spectrogram extraction tool, nnAudio, to investigate the influence of using a linear-frequency spectrogram, log-frequency spectrogram, Mel spectrogram, and constant-Q transform (CQT). Our results show that an $8.33$% increase in transcription accu…

    Submitted 21 July, 2020; v1 submitted 24 January, 2020; originally announced January 2020.

    Comments: Paper accepted in IJCNN 2020

    Journal ref: IJCNN 2020

  30. arXiv:2001.09988  [pdf, other]

    cs.SD eess.AS

    Regression-based music emotion prediction using triplet neural networks

    Authors: Kin Wai Cheuk, Yin-Jyun Luo, Balamurali B. T., Gemma Roig, Dorien Herremans

    Abstract: In this paper, we adapt triplet neural networks (TNNs) to a regression task, music emotion prediction. Since TNNs were initially introduced for classification, and not for regression, we propose a mechanism that allows them to provide meaningful low dimensional representations for regression tasks. We then use these new representations as the input for regression algorithms such as support vector…

    Submitted 21 July, 2020; v1 submitted 24 January, 2020; originally announced January 2020.

    Comments: Paper accepted in IJCNN 2020

    Journal ref: IJCNN 2020

  31. arXiv:1912.12055  [pdf, other]

    cs.SD cs.LG eess.AS

    nnAudio: An on-the-fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolution Neural Networks

    Authors: Kin Wai Cheuk, Hans Anderson, Kat Agres, Dorien Herremans

    Abstract: Converting time domain waveforms to frequency domain spectrograms is typically considered to be a preprocessing step done before model training. This approach, however, has several drawbacks. First, it takes a lot of hard disk space to store different frequency domain representations. This is especially true during the model development and tuning process, when exploring various types of spectrogr…

    Submitted 21 August, 2020; v1 submitted 27 December, 2019; originally announced December 2019.

    Comments: Accepted In IEEE Access

  32. arXiv:1912.02613  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders

    Authors: Yin-Jyun Luo, Chin-Chen Hsu, Kat Agres, Dorien Herremans

    Abstract: We propose a flexible framework that deals with both singer conversion and singers' vocal technique conversion. The proposed model is trained on non-parallel corpora, accommodates many-to-many conversion, and leverages recent advances of variational autoencoders. It employs separate encoders to learn disentangled latent representations of singer identity and vocal technique separately, with a joint…

    Submitted 24 February, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

    Comments: Accepted to ICASSP 2020

  33. arXiv:1910.02049  [pdf, ps, other]

    cs.SD cs.IR cs.LG eess.AS

    Midi Miner -- A Python library for tonal tension and track classification

    Authors: Rui Guo, Dorien Herremans, Thor Magnusson

    Abstract: We present a Python library, called Midi Miner, that can calculate tonal tension and classify different tracks. MIDI (Music Instrument Digital Interface) is a hardware and software standard for communicating musical events between digital music devices. It is often used for tasks such as music representation, communication between devices, and even music generation [5]. Tension is an essential ele…

    Submitted 26 May, 2020; v1 submitted 3 October, 2019; originally announced October 2019.

    Comments: 2 pages. ISMIR - Late Breaking Demo, Delft, The Netherlands. November 2019

  34. arXiv:1910.01463  [pdf, other]

    cs.SD cs.LG eess.AS

    Latent space representation for multi-target speaker detection and identification with a sparse dataset using Triplet neural networks

    Authors: Kin Wai Cheuk, Balamurali B. T., Gemma Roig, Dorien Herremans

    Abstract: We present an approach to tackle the speaker recognition problem using Triplet Neural Networks. Currently, the $i$-vector representation with probabilistic linear discriminant analysis (PLDA) is the most commonly used technique to solve this problem, due to high classification accuracy with a relatively short computation time. In this paper, we explore a neural network approach, namely Triplet Neu…

    Submitted 3 October, 2019; v1 submitted 1 October, 2019; originally announced October 2019.

    Comments: Accepted for ASRU 2019

    MSC Class: 68T10; 68Txx

    Journal ref: Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019). Singapore. 2019

  35. arXiv:1909.02850  [pdf, other]

    eess.SP cs.LG cs.SD stat.ML

    Doppler Invariant Demodulation for Shallow Water Acoustic Communications Using Deep Belief Networks

    Authors: Abigail Lee-Leon, Chau Yuen, Dorien Herremans

    Abstract: Shallow water environments create a challenging channel for communications. In this paper, we focus on the challenges posed by the frequency-selective signal distortion called the Doppler effect. We explore the design and performance of machine learning (ML) based demodulation methods --- (1) Deep Belief Network-feed forward Neural Network (DBN-NN) and (2) Deep Belief Network-Convolutional Neural…

    Submitted 5 September, 2019; originally announced September 2019.

    Journal ref: Proceedings of 16th IEEE Asia Pacific Wireless Communications Symposium (APWCS). 2019. Singapore

  36. arXiv:1906.08152  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders

    Authors: Yin-Jyun Luo, Kat Agres, Dorien Herremans

    Abstract: In this paper, we learn disentangled representations of timbre and pitch for musical instrument sounds. We adapt a framework based on variational autoencoders with Gaussian mixture latent distributions. Specifically, we use two separate encoders to learn distinct latent spaces for timbre and pitch, which form Gaussian mixture components representing instrument identity and pitch, respectively. For…

    Submitted 29 June, 2019; v1 submitted 19 June, 2019; originally announced June 2019.

    Comments: 20th Conference of the International Society for Music Information Retrieval

  37. arXiv:1905.08076  [pdf, other]

    cs.SD cs.IR cs.LG eess.AS stat.ML

    Dance Hit Song Prediction

    Authors: Dorien Herremans, David Martens, Kenneth Sörensen

    Abstract: Record companies invest billions of dollars in new talent around the globe each year. Gaining insight into what actually makes a hit song would provide tremendous benefits for the music industry. In this research we tackle this question by focussing on the dance hit song classification problem. A database of dance hit songs from 1985 until 2013 is built, including basic musical features, as well a…

    Submitted 17 May, 2019; originally announced May 2019.

    Journal ref: Journal of New Music Research. 43:302 (2014)

  38. MorpheuS: generating structured music with constrained patterns and tension

    Authors: Dorien Herremans, Elaine Chew

    Abstract: Automatic music generation systems have gained in popularity and sophistication as advances in cloud computing have enabled large-scale complex computations such as deep models and optimization algorithms on personal devices. Yet, they still face an important challenge, that of long-term structure, which is key to conveying a sense of musical coherence. We present the MorpheuS music generation sys…

    Submitted 12 December, 2018; originally announced December 2018.

    Comments: IEEE Transactions on Affective Computing. PP(99)

  39. arXiv:1812.04186  [pdf, other]

    cs.SD cs.LG eess.AS

    A Functional Taxonomy of Music Generation Systems

    Authors: Dorien Herremans, Ching-Hua Chuan, Elaine Chew

    Abstract: Digital advances have transformed the face of automatic music generation since its beginnings at the dawn of computing. Despite the many breakthroughs, issues such as the musical tasks targeted by different machines and the degree to which they succeed remain open questions. We present a functional taxonomy for music generation systems with reference to existing systems. The taxonomy organizes sys…

    Submitted 10 December, 2018; originally announced December 2018.

    Comments: survey, music generation, taxonomy, functional survey, survey, automatic composition, algorithmic composition

    MSC Class: 68Txx; 68-XX

    Journal ref: ACM Computing Surveys (CSUR), 50(5), 69. https://dl.acm.org/citation.cfm?id=3145473.3108242

  40. arXiv:1812.01278  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS stat.ML

    Singing Voice Separation Using a Deep Convolutional Neural Network Trained by Ideal Binary Mask and Cross Entropy

    Authors: Kin Wah Edward Lin, Balamurali B. T., Enyan Koh, Simon Lui, Dorien Herremans

    Abstract: Separating a singing voice from its music accompaniment remains an important challenge in the field of music information retrieval. We present a unique neural network approach inspired by a technique that has revolutionized the field of vision: pixel-wise image classification, which we combine with cross entropy loss and pretraining of the CNN as an autoencoder on singing voice spectrograms. The p…

    Submitted 4 December, 2018; originally announced December 2018.

    Comments: In Press, Neural Computing and Applications, Springer. 2019

    MSC Class: 68-XX; 68Txx

  41. arXiv:1811.12408  [pdf, other]

    cs.SD cs.IR cs.LG eess.AS stat.ML

    From Context to Concept: Exploring Semantic Relationships in Music with Word2Vec

    Authors: Ching-Hua Chuan, Kat Agres, Dorien Herremans

    Abstract: We explore the potential of a popular distributional semantics vector space model, word2vec, for capturing meaningful relationships in ecological (complex polyphonic) music. More precisely, the skip-gram version of word2vec is used to model slices of music from a large corpus spanning eight musical genres. In this newly learned vector space, a metric based on cosine distance is able to distinguish…

    Submitted 29 November, 2018; originally announced November 2018.

    Comments: Accepted for publication in Neural Computing and Applications, Springer. In Press

    MSC Class: 68Txx; 68Wxx

    Journal ref: Neural Computing and Applications, Springer. 2019