
Is Disentanglement enough? On Latent Representations for Controllable Music Generation

Abstract: Improving controllability, or the ability to manipulate one or more attributes of the generated data, has become a topic of interest in the context of deep generative models of music. Recent attempts in this direction have relied on learning disentangled representations from data such that the underlying factors of variation are well separated. In this paper, we focus on the relationship between disentanglement and controllability by conducting a systematic study using different supervised disentanglement learning algorithms based on the Variational Auto-Encoder (VAE) architecture. Our experiments show that a high degree of disentanglement can be achieved by using different forms of supervision to train a strong discriminative encoder. However, in the absence of a strong generative decoder, disentanglement does not necessarily imply controllability. The structure of the latent space with respect to the VAE-decoder plays an important role in boosting the ability of a generative model to manipulate different attributes. To this end, we also propose methods and metrics to help evaluate the quality of a latent space with respect to the afforded degree of controllability.

Submitted 1 August, 2021; originally announced August 2021.

Comments: To be published in: Proceedings of the 22nd International Society for Music Information Retrieval Conference (ISMIR), Online, 2021

arXiv:2104.09018

doi 10.5334/tismir.53

An Interdisciplinary Review of Music Performance Analysis

Authors: Alexander Lerch, Claire Arthur, Ashis Pati, Siddharth Gururani

Abstract: A musical performance renders an acoustic realization of a musical score or other representation of a composition. Different performances of the same composition may vary in terms of performance parameters such as timing or dynamics, and these variations may have a major impact on how a listener perceives the music. The analysis of music performance has traditionally been a peripheral topic for the MIR research community, where often a single audio recording is used as representative of a musical work. This paper surveys the field of Music Performance Analysis (MPA) from several perspectives including the measurement of performance parameters, the relation of those parameters to the actions and intentions of a performer or perceptual effects on a listener, and finally the assessment of musical performance. This paper also discusses MPA as it relates to MIR, pointing out opportunities for collaboration and future research in both areas.

Submitted 18 April, 2021; originally announced April 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:1907.00178

ACM Class: A.1

Journal ref: Transactions of the International Society for Music Information Retrieval, 3(1), pp.221-245, 2020

arXiv:2008.00203

Score-informed Networks for Music Performance Assessment

Authors: Jiawen Huang, Yun-Ning Hung, Ashis Pati, Siddharth Kumar Gururani, Alexander Lerch

Abstract: The assessment of music performances in most cases takes into account the underlying musical score being performed. While there have been several automatic approaches for objective music performance assessment (MPA) based on extracted features from both the performance audio and the score, deep neural network-based methods incorporating score information into MPA models have not yet been investigated. In this paper, we introduce three different models capable of score-informed performance assessment. These are (i) a convolutional neural network that utilizes a simple time-series input comprising aligned pitch contours and score, (ii) a joint embedding model which learns a joint latent space for pitch contours and scores, and (iii) a distance matrix-based convolutional neural network which utilizes patterns in the distance matrix between pitch contours and musical score to predict assessment ratings. Our results provide insights into the suitability of different architectures and input representations and demonstrate the benefits of score-informed models as compared to score-independent models.

Submitted 1 August, 2020; originally announced August 2020.

Comments: To appear at the 21st International Society for Music Information Retrieval Conference (ISMIR), Montréal, Canada, 2020
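Model (iii) above reads patterns from a distance matrix between a pitch contour and the score. A minimal sketch of building such a matrix follows, assuming both sequences are already expressed in MIDI pitch units and using absolute pitch difference as the distance; the listing does not specify the metric or the alignment, and `pitch_distance_matrix` is a hypothetical helper.

```python
import numpy as np

def pitch_distance_matrix(contour, score):
    """Pairwise |contour[i] - score[j]| in semitones (assumed distance metric).

    contour: 1-D sequence of performed pitches (MIDI numbers, may be fractional).
    score:   1-D sequence of notated pitches (MIDI numbers).
    Returns an array of shape (len(contour), len(score)).
    """
    contour = np.asarray(contour, dtype=float)
    score = np.asarray(score, dtype=float)
    # Broadcasting a column against a row yields every pairwise difference.
    return np.abs(contour[:, None] - score[None, :])

# A slightly flat performance of a three-note phrase.
performed = [60.0, 61.9, 63.8]
notated = [60, 62, 64]
D = pitch_distance_matrix(performed, notated)
print(D.shape)  # (3, 3)
```

A convolutional network could then treat `D` as a one-channel image; small values along the diagonal indicate a performance that stays close to the notated pitches.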

arXiv:2007.15067

dMelodies: A Music Dataset for Disentanglement Learning

Authors: Ashis Pati, Siddharth Gururani, Alexander Lerch

Abstract: Representation learning focused on disentangling the underlying factors of variation in given data has become an important area of research in machine learning. However, most of the studies in this area have relied on datasets from the computer vision domain and thus have not been readily extended to music. In this paper, we present a new symbolic music dataset that will help researchers working on disentanglement problems demonstrate the efficacy of their algorithms on diverse domains. This will also provide a means for evaluating algorithms specifically designed for music. To this end, we create a dataset comprising 2-bar monophonic melodies where each melody is the result of a unique combination of nine latent factors that span ordinal, categorical, and binary types. The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning. In addition, we present benchmarking experiments using popular unsupervised disentanglement algorithms on this dataset and compare the results with those obtained on an image-based dataset.

Submitted 29 July, 2020; originally announced July 2020.

Comments: To be published in: Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR), Montréal, Canada, 2020
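Because every melody in this dataset is a unique combination of the nine factors, the dataset size is the product of the factor cardinalities, and each melody can be addressed by a mixed-radix index. The sketch below assumes specific cardinalities (12 tonics, 3 octaves, 3 scales, 28 rhythms for each of the two bars, and a binary arpeggiation direction for each of four chords). These values are not stated in the listing; they are chosen to be consistent with its "approx. 1.3 million" figure and should be checked against the released dataset. All helper names are hypothetical.

```python
from math import prod

# Assumed factor cardinalities (not given in the abstract above).
FACTOR_SIZES = {
    "tonic": 12,
    "octave": 3,
    "scale": 3,
    "rhythm_bar1": 28,
    "rhythm_bar2": 28,
    "arp_chord1": 2,
    "arp_chord2": 2,
    "arp_chord3": 2,
    "arp_chord4": 2,
}

def num_melodies(sizes):
    """Total number of unique factor combinations."""
    return prod(sizes.values())

def factors_to_index(values, sizes):
    """Map a tuple of factor values to a flat index (mixed radix, first factor most significant)."""
    if len(values) != len(sizes):
        raise ValueError("expected one value per factor")
    index = 0
    for value, size in zip(values, sizes.values()):
        if not 0 <= value < size:
            raise ValueError(f"factor value {value} out of range [0, {size})")
        index = index * size + value
    return index

def index_to_factors(index, sizes):
    """Inverse of factors_to_index."""
    values = []
    for size in reversed(list(sizes.values())):
        index, value = divmod(index, size)
        values.append(value)
    return tuple(reversed(values))

print(num_melodies(FACTOR_SIZES))  # 1354752
```

An index scheme like this makes it straightforward to sample melodies that differ in exactly one factor, which is what many disentanglement metrics require.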

arXiv:2004.05485

Attribute-based Regularization of Latent Spaces for Variational Auto-Encoders

Authors: Ashis Pati, Alexander Lerch

Abstract: Selective manipulation of data attributes using deep generative models is an active area of research. In this paper, we present a novel method to structure the latent space of a Variational Auto-Encoder (VAE) to encode different continuous-valued attributes explicitly. This is accomplished by using an attribute regularization loss which enforces a monotonic relationship between the attribute values and the latent code of the dimension along which the attribute is to be encoded. Consequently, post-training, the model can be used to manipulate the attribute by simply changing the latent code of the corresponding regularized dimension. The results obtained from several quantitative and qualitative experiments show that the proposed method leads to disentangled and interpretable latent spaces that can be used to effectively manipulate a wide range of data attributes spanning image and symbolic music domains.

Submitted 28 July, 2020; v1 submitted 11 April, 2020; originally announced April 2020.
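One way to encourage a monotonic relationship between an attribute and a single latent dimension is to compare, over a mini-batch, the pairwise ordering of the latent codes with the pairwise ordering of the attribute values. The NumPy sketch below assumes that formulation (a tanh-squashed pairwise latent difference matched against the sign of the pairwise attribute difference); the listing does not give the exact loss, and `attribute_regularization_loss` and the sharpness parameter `delta` are illustrative names. A training implementation would use an autodiff framework rather than NumPy.

```python
import numpy as np

def attribute_regularization_loss(z_r, attr, delta=10.0):
    """Penalize disagreement between the ordering of latent dimension r and an attribute.

    z_r:   shape (batch,), latent codes along the regularized dimension r.
    attr:  shape (batch,), attribute values for the same batch items.
    delta: assumed scale controlling how sharply tanh approximates sign.
    """
    z_r = np.asarray(z_r, dtype=float)
    attr = np.asarray(attr, dtype=float)
    dz = z_r[:, None] - z_r[None, :]    # pairwise latent differences
    da = attr[:, None] - attr[None, :]  # pairwise attribute differences
    # Mean absolute error between a smooth sign of dz and the exact sign of da.
    return np.mean(np.abs(np.tanh(delta * dz) - np.sign(da)))

attr = np.array([0.1, 0.5, 0.9])
ordered = np.array([-1.0, 0.0, 1.0])    # same ordering as attr
reversed_ = np.array([1.0, 0.0, -1.0])  # opposite ordering
print(attribute_regularization_loss(ordered, attr) < attribute_regularization_loss(reversed_, attr))  # True
```

If a loss of this kind is driven low during training, increasing the code along dimension r should increase the attribute, which is what makes manipulation "by simply changing the latent code" possible.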

arXiv:1907.05208

Explicitly Conditioned Melody Generation: A Case Study with Interdependent RNNs

Authors: Benjamin Genchel, Ashis Pati, Alexander Lerch

Abstract: Deep generative models for symbolic music are typically designed to model temporal dependencies in music so as to predict the next musical event given previous events. In many cases, such models are expected to learn abstract concepts such as harmony, meter, and rhythm from raw musical data without any additional information. In this study, we investigate the effects of explicitly conditioning deep generative models with musically relevant information. Specifically, we study the effects of four different conditioning inputs on the performance of a recurrent monophonic melody generation model. Several combinations of these conditioning inputs are used to train different model variants which are then evaluated using three objective evaluation paradigms across two genres of music. The results indicate that musically relevant conditioning significantly improves learning and performance, and reveal how this information affects learning of musical features related to pitch and rhythm. An informal subjective evaluation suggests a corresponding improvement in the aesthetic quality of generations.

Submitted 9 July, 2019; originally announced July 2019.

Comments: In Proceedings of the 7th International Workshop on Musical Meta-creation (MUME), Charlotte, North Carolina, 2019
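A common way to condition a recurrent melody model is to concatenate conditioning features with the note encoding at every time step. The four conditioning inputs are not named in this listing, so the sketch below uses a hypothetical bar-position feature; the pitch and conditioning vocabulary sizes are likewise assumptions.

```python
import numpy as np

def one_hot(index, size):
    vec = np.zeros(size, dtype=np.float32)
    vec[index] = 1.0
    return vec

def build_rnn_inputs(pitches, conditions, pitch_vocab, cond_vocab):
    """Stack per-step [pitch one-hot | condition one-hot] input vectors.

    pitches:    sequence of pitch-token indices.
    conditions: sequence of conditioning indices aligned with pitches
                (here a hypothetical bar-position feature).
    Returns an array of shape (len(pitches), pitch_vocab + cond_vocab).
    """
    if len(pitches) != len(conditions):
        raise ValueError("pitches and conditions must be aligned")
    steps = [np.concatenate([one_hot(p, pitch_vocab), one_hot(c, cond_vocab)])
             for p, c in zip(pitches, conditions)]
    return np.stack(steps)

# Four notes, conditioned on their (assumed) position within a 4-step bar.
x = build_rnn_inputs([60, 62, 64, 65], [0, 1, 2, 3], pitch_vocab=128, cond_vocab=4)
print(x.shape)  # (4, 132)
```

Under this scheme, an unconditioned variant uses only the pitch block, so comparing model variants amounts to toggling which feature blocks are concatenated.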

arXiv:1907.01164

Learning to Traverse Latent Spaces for Musical Score Inpainting

Authors: Ashis Pati, Alexander Lerch, Gaëtan Hadjeres

Abstract: Music Inpainting is the task of filling in missing or lost information in a piece of music. We investigate this task from an interactive music creation perspective. To this end, a novel deep learning-based approach for musical score inpainting is proposed. The designed model takes both past and future musical context into account and is capable of suggesting ways to connect them in a musically meaningful manner. To achieve this, we leverage the representational power of the latent space of a Variational Auto-Encoder and train a Recurrent Neural Network which learns to traverse this latent space conditioned on the past and future musical contexts. Consequently, the designed model is capable of generating several measures of music to connect two musical excerpts. The capabilities and performance of the model are showcased by comparison with competitive baselines using several objective and subjective evaluation methods. The results show that the model generates meaningful inpaintings and can be used in interactive music creation applications. Overall, the method demonstrates the merit of learning complex trajectories in the latent spaces of deep generative models.

Submitted 2 July, 2019; originally announced July 2019.

Comments: 20th International Society for Music Information Retrieval Conference (ISMIR), 2019, Delft, The Netherlands; 6 pages, 8 figures

Journal ref: 20th International Society for Music Information Retrieval Conference (ISMIR), 2019, Delft, The Netherlands
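The model above learns a trajectory through a VAE latent space between the encodings of the past and future contexts. For intuition about what a learned traversal replaces, the simplest hypothetical alternative is to interpolate linearly between the two latent codes and decode each intermediate point as one measure. The sketch shows only that naive interpolation; it is not the learned RNN traversal the paper proposes, and the listing does not say whether linear interpolation was among its baselines. `linear_latent_path` is an illustrative name, and the decoder is omitted.

```python
import numpy as np

def linear_latent_path(z_past, z_future, num_measures):
    """Return num_measures latent codes strictly between z_past and z_future.

    Each row would be passed to a VAE decoder (not shown) to produce one measure.
    """
    z_past = np.asarray(z_past, dtype=float)
    z_future = np.asarray(z_future, dtype=float)
    # Interior interpolation weights, excluding the two endpoints themselves.
    weights = np.linspace(0.0, 1.0, num_measures + 2)[1:-1]
    return np.stack([(1 - w) * z_past + w * z_future for w in weights])

# Two in-between measures for a toy 2-D latent space.
path = linear_latent_path([0.0, 0.0], [3.0, 6.0], num_measures=2)
```

A straight line ignores the shape of the latent space; learning "complex trajectories", as the abstract puts it, lets the path bend through regions that decode to musically plausible material.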

arXiv:1907.00178

Music Performance Analysis: A Survey

Authors: Alexander Lerch, Claire Arthur, Ashis Pati, Siddharth Gururani

Abstract: Music Information Retrieval (MIR) tends to focus on the analysis of audio signals. Often, a single music recording is used as representative of a "song" even though different performances of the same song may reveal different properties. A performance is distinct in many ways from a (arguably more abstract) representation of a "song," "piece," or musical score. The characteristics of the (recorded) performance -- as opposed to the score or musical idea -- can have a major impact on how a listener perceives music. The analysis of music performance, however, has been traditionally only a peripheral topic for the MIR research community. This paper surveys the field of Music Performance Analysis (MPA) from various perspectives, discusses its significance to the field of MIR, and points out opportunities for future research in this field.

Submitted 29 June, 2019; originally announced July 2019.

Comments: To be published in: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Delft, 2019

arXiv:1803.06982

doi 10.1088/2633-1357/abb2af

Quantifying coherence with quantum addition

Authors: Chiranjib Mukhopadhyay, Arun Kumar Pati, Sk Sazim

Abstract: Quantum addition channels have been recently introduced in the context of deriving entropic power inequalities for finite-dimensional quantum systems. We prove a reverse entropy power equality which can be used to analytically prove an inequality conjectured recently for arbitrary dimension and arbitrary addition weight. We show that the relative entropic difference between the output of such a quantum addition channel and the corresponding classical mixture quantitatively captures the amount of coherence present in a quantum system. This new coherence measure admits an upper bound in terms of the relative entropy of coherence and is utilized to formulate a state-dependent uncertainty relation for two observables. Our results may provide deep insights into the origin of quantum coherence for mixed states that truly come from the discrepancy between quantum addition and the classical mixture.

Submitted 20 March, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

Comments: 6 pages + references, comments/suggestions most welcome
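Quantum addition channels for finite-dimensional systems are commonly written in a partial-swap form, ρ ⊞_λ σ = Tr₂[U_λ (ρ ⊗ σ) U_λ†] with U_λ = √λ I + i√(1−λ) S, where S swaps the two subsystems. The listing does not restate this definition, so treat it as an assumption about which channel is meant. The NumPy sketch computes the channel output and the corresponding classical mixture λρ + (1−λ)σ; how the two are compared to obtain the coherence measure is left to the paper. All function names are illustrative.

```python
import numpy as np

def swap_operator(d):
    """Permutation matrix that swaps two d-dimensional subsystems: S|i,j> = |j,i>."""
    S = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            S[j * d + i, i * d + j] = 1.0
    return S

def partial_trace_second(rho_ab, d):
    """Trace out the second of two d-dimensional subsystems."""
    return np.trace(rho_ab.reshape(d, d, d, d), axis1=1, axis2=3)

def quantum_addition(rho, sigma, lam):
    """Assumed partial-swap addition: Tr_2[U (rho ⊗ sigma) U^†], U = sqrt(lam) I + i sqrt(1 - lam) S."""
    d = rho.shape[0]
    U = np.sqrt(lam) * np.eye(d * d) + 1j * np.sqrt(1 - lam) * swap_operator(d)
    return partial_trace_second(U @ np.kron(rho, sigma) @ U.conj().T, d)

def classical_mixture(rho, sigma, lam):
    return lam * rho + (1 - lam) * sigma

plus = np.array([[0.5, 0.5], [0.5, 0.5]])  # |+><+|, a coherent qubit state
zero = np.array([[1.0, 0.0], [0.0, 0.0]])  # |0><0|, an incoherent qubit state
out = quantum_addition(plus, zero, 0.5)
mix = classical_mixture(plus, zero, 0.5)
```

Expanding the product gives ρ ⊞_λ σ = λρ + (1−λ)σ − i√(λ(1−λ))[ρ, σ], so under this definition the channel output differs from the classical mixture only by a commutator term, which vanishes when ρ and σ commute (for example, when both are diagonal in the same basis).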

arXiv:1705.01208

A Rule-Based Computational Model of Cognitive Arithmetic

Authors: Ashis Pati, Kantwon Rogers, Hanqing Zhu

Abstract: Cognitive arithmetic studies the mental processes used in solving math problems. This area of research explores the retrieval mechanisms and strategies used by people during a common cognitive task. Past research has shown that human performance in arithmetic operations is correlated with the numerical size of the problem. Past research on cognitive arithmetic has attributed this trend to either retrieval strength, error checking, or strategy-based approaches when solving equations. This paper describes a rule-based computational model that performs the four major arithmetic operations (addition, subtraction, multiplication and division) on two operands. We then evaluated our model to probe its validity in representing the prevailing concepts observed in psychology experiments from the related works. The experiments specifically explore the problem size effect, an activation-based model for fact retrieval, backup strategies when retrieval fails, and finally optimization strategies when faced with large operands. From our experimental results, we concluded that our model's response times were comparable to results observed when people performed similar tasks during psychology experiments. The fit of our model in reproducing these results and incorporating accuracy into our model are discussed.

Submitted 2 May, 2017; originally announced May 2017.
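The abstract mentions an activation-based model of fact retrieval without giving its equations. A common choice, borrowed here from the ACT-R cognitive architecture as an assumption rather than taken from the paper, is base-level activation B = ln(Σⱼ tⱼ^(−d)) over the times tⱼ since each past use of a fact, with retrieval latency F·e^(−B). The sketch uses hypothetical parameter values (decay d = 0.5, latency factor F = 1.0) and hypothetical practice histories to show how a frequently practised small-operand fact is retrieved faster than a rarely practised large one, which is one frequency-based account of the problem size effect.

```python
import math

def base_level_activation(ages, decay=0.5):
    """ACT-R-style base-level activation from ages (time since each past use, all > 0)."""
    if not ages or any(t <= 0 for t in ages):
        raise ValueError("ages must be a non-empty list of positive numbers")
    return math.log(sum(t ** -decay for t in ages))

def retrieval_time(activation, latency_factor=1.0):
    """Retrieval latency that falls exponentially as activation rises."""
    return latency_factor * math.exp(-activation)

# Hypothetical histories: a small fact (2 + 3) seen often, a large fact (8 + 7) seen rarely.
small_fact = [1.0, 5.0, 10.0, 20.0, 40.0, 80.0]
large_fact = [30.0, 90.0]
print(retrieval_time(base_level_activation(small_fact))
      < retrieval_time(base_level_activation(large_fact)))  # True
```

In a rule-based model of this kind, a retrieval that is too slow or falls below a threshold would hand control to a backup strategy such as counting, matching the "backup strategies when retrieval fails" mentioned above.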

arXiv:1503.05085

doi 10.1209/0295-5075/113/50002

Stronger Error Disturbance Relations for Incompatible Quantum Measurements

Authors: Chiranjib Mukhopadhyay, Namrata Shukla, Arun Kumar Pati

Abstract: We formulate a new error-disturbance relation, which is free from explicit dependence upon variances in observables. This error-disturbance relation shows improvement over the one provided by the Branciard inequality and the Ozawa inequality for some initial states and for a particular class of joint measurements under consideration. We also prove a modified form of Ozawa's error-disturbance relation. The latter relation provides a tighter bound compared to the Ozawa and the Branciard inequalities for a small number of states.

Submitted 13 December, 2016; v1 submitted 17 March, 2015; originally announced March 2015.

Comments: 5+ pages, 3 figures

Journal ref: Europhysics Letters 113 50002 (2016)
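For reference, the two baseline relations named in this abstract are usually written as follows. Here ε(A) is the error in measuring A, η(B) is the disturbance caused to B, σ(A) and σ(B) are the standard deviations in the initial state ψ, and C_AB = ½|⟨[A, B]⟩_ψ|. These standard forms come from the general literature, not from the listing. The variance terms σ are the explicit dependence that the new relation is said to avoid.

```latex
\begin{align*}
  \text{Ozawa:}\quad
    & \varepsilon(A)\,\eta(B) + \varepsilon(A)\,\sigma(B) + \sigma(A)\,\eta(B) \;\ge\; C_{AB},\\
  \text{Branciard:}\quad
    & \varepsilon(A)^2\,\sigma(B)^2 + \sigma(A)^2\,\eta(B)^2
      + 2\,\varepsilon(A)\,\eta(B)\sqrt{\sigma(A)^2\,\sigma(B)^2 - C_{AB}^2} \;\ge\; C_{AB}^2,\\
  & C_{AB} = \tfrac{1}{2}\,\bigl\lvert \langle \psi \rvert [A, B] \lvert \psi \rangle \bigr\rvert .
\end{align*}
```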

arXiv:1502.01272

doi 10.1103/PhysRevA.91.042323

Monogamy, polygamy, and other properties of entanglement of purification

Authors: Shrobona Bagchi, Arun Kumar Pati

Abstract: For bipartite pure and mixed quantum states, in addition to the quantum mutual information, there is another measure of total correlation, namely, the entanglement of purification. We study the monogamy, polygamy, and additivity properties of the entanglement of purification for pure and mixed states. In this paper, we show that, in contrast to the quantum mutual information which is strictly monogamous for any tripartite pure state, the entanglement of purification is polygamous for the same. This shows that there can be genuinely two types of total correlation across any bipartite cross in a pure tripartite state. Furthermore, we find the lower bound and actual values of the entanglement of purification for different classes of tripartite and higher-dimensional bipartite mixed states. Thereafter, we show that if entanglement of purification is not additive on tensor product states, it is actually subadditive. Using these results, we identify some states which are additive on tensor products for entanglement of purification. The implications of these findings on the quantum advantage of dense coding are briefly discussed, whereby we show that for tripartite pure states, it is strictly monogamous and if it is nonadditive, then it is superadditive on tensor product states.

Submitted 15 November, 2015; v1 submitted 4 February, 2015; originally announced February 2015.

Comments: 12 pages, 2 figures, Published version

Journal ref: Phys. Rev. A 91, 042323 (2015)
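The quantities in this abstract have standard definitions, restated here for reference because the listing does not spell them out. For a state ρ_AB, the entanglement of purification minimizes the von Neumann entropy S of one enlarged side over all purifications. A bipartite correlation measure E is monogamous on a tripartite state when the A:BC correlation bounds the sum of the pairwise ones, and polygamous when that inequality is reversed. Subadditivity on tensor products is the analogous inequality across product states.

```latex
\begin{align*}
  E_P(\rho_{AB}) &= \min_{\lvert\psi\rangle_{AA'BB'}} S(\rho_{AA'}),
  \qquad \operatorname{Tr}_{A'B'} \lvert\psi\rangle\langle\psi\rvert = \rho_{AB},\\
  \text{monogamy:}\quad E(A{:}BC) &\ge E(A{:}B) + E(A{:}C),\\
  \text{polygamy:}\quad E(A{:}BC) &\le E(A{:}B) + E(A{:}C),\\
  \text{subadditivity:}\quad E(\rho \otimes \sigma) &\le E(\rho) + E(\sigma).
\end{align*}
```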

Showing 1–12 of 12 results for author: Pati, A