-
Structure-informed Positional Encoding for Music Generation
Authors:
Manvi Agarwal,
Changhong Wang,
Gaël Richard
Abstract:
Music generated by deep learning methods often suffers from a lack of coherence and long-term organization. Yet, multi-scale hierarchical structure is a distinctive feature of music signals. To leverage this information, we propose a structure-informed positional encoding framework for music generation with Transformers. We design three variants in terms of absolute, relative and non-stationary positional information. We comprehensively test them on two symbolic music generation tasks: next-timestep prediction and accompaniment generation. As a comparison, we choose multiple baselines from the literature and demonstrate the merits of our methods using several musically-motivated evaluation metrics. In particular, our methods improve the melodic and structural consistency of the generated pieces.
Submitted 28 February, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Markov Decision Processes with Long-Term Average Constraints
Authors:
Mridul Agarwal,
Qinbo Bai,
Vaneet Aggarwal
Abstract:
We consider the problem of constrained Markov Decision Process (CMDP) where an agent interacts with a unichain Markov Decision Process. At every interaction, the agent obtains a reward. Further, there are $K$ cost functions. The agent aims to maximize the long-term average reward while simultaneously keeping the $K$ long-term average costs lower than a certain threshold. In this paper, we propose CMDP-PSRL, a posterior sampling based algorithm using which the agent can learn optimal policies to interact with the CMDP. Further, for an MDP with $S$ states, $A$ actions, and diameter $D$, we prove that by following the CMDP-PSRL algorithm, the agent can bound the regret of not accumulating rewards from the optimal policy by $\tilde{O}(\mathrm{poly}(DSA)\sqrt{T})$. Further, we show that the violation of any of the $K$ constraints is also bounded by $\tilde{O}(\mathrm{poly}(DSA)\sqrt{T})$. To the best of our knowledge, this is the first work which obtains $\tilde{O}(\sqrt{T})$ regret bounds for ergodic MDPs with long-term average constraints.
Submitted 20 June, 2022; v1 submitted 11 June, 2021;
originally announced June 2021.
-
Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm
Authors:
Qinbo Bai,
Mridul Agarwal,
Vaneet Aggarwal
Abstract:
Many engineering problems have multiple objectives, and the overall aim is to optimize a non-linear function of these objectives. In this paper, we formulate the problem of maximizing a non-linear concave function of multiple long-term objectives. A policy-gradient based model-free algorithm is proposed for the problem. To compute an estimate of the gradient, a biased estimator is proposed. The proposed algorithm is shown to achieve convergence to within $\epsilon$ of the global optimum after sampling $\mathcal{O}\left(\frac{M^4\sigma^2}{(1-\gamma)^8\epsilon^4}\right)$ trajectories, where $\gamma$ is the discount factor and $M$ is the number of agents, thus achieving the same dependence on $\epsilon$ as the policy gradient algorithm for standard reinforcement learning.
Submitted 28 May, 2021;
originally announced May 2021.
-
COV-ELM classifier: An Extreme Learning Machine based identification of COVID-19 using Chest X-Ray Images
Authors:
Sheetal Rajpal,
Manoj Agarwal,
Ankit Rajpal,
Navin Lakhyani,
Arpita Saggar,
Naveen Kumar
Abstract:
Coronaviruses constitute a family of viruses that gives rise to respiratory diseases. As COVID-19 is highly contagious, early diagnosis of COVID-19 is crucial for an effective treatment strategy. However, the RT-PCR test, which is considered to be a gold standard in the diagnosis of COVID-19, suffers from a high false-negative rate. Chest X-ray (CXR) image analysis has emerged as a feasible and effective diagnostic technique towards this objective. In this work, we formulate the COVID-19 classification problem as a three-class classification problem to distinguish between COVID-19, normal, and pneumonia classes. We propose a three-stage framework, named COV-ELM. Stage one deals with preprocessing and transformation while stage two deals with feature extraction. These extracted features are passed as an input to the ELM at the third stage, resulting in the identification of COVID-19. The choice of ELM in this work has been motivated by its faster convergence, better generalization capability, and shorter training time in comparison to the conventional gradient-based learning algorithms. As bigger and more diverse datasets become available, ELM can be quickly retrained as compared to its gradient-based competitor models. The proposed model achieved a macro average F1-score of 0.95 and an overall sensitivity of $0.94 \pm 0.02$ at a 95% confidence interval. When compared to state-of-the-art machine learning algorithms, the COV-ELM is found to outperform its competitors in this three-class classification scenario. Further, LIME has been integrated with the proposed COV-ELM model to generate annotated CXR images. The annotations are based on the superpixels that have contributed to distinguishing between the different classes. It was observed that the superpixels correspond to the regions of the human lungs that are clinically observed in COVID-19 and pneumonia cases.
Submitted 28 September, 2021; v1 submitted 16 July, 2020;
originally announced July 2020.
-
Accelerating Reinforcement Learning Agent with EEG-based Implicit Human Feedback
Authors:
Duo Xu,
Mohit Agarwal,
Ekansh Gupta,
Faramarz Fekri,
Raghupathy Sivakumar
Abstract:
Providing Reinforcement Learning (RL) agents with human feedback can dramatically improve various aspects of learning. However, previous methods require a human observer to give inputs explicitly (e.g., press buttons, voice interface), burdening the human in the loop of the RL agent's learning process. Further, it is sometimes difficult or impossible to obtain explicit human advice (feedback), e.g., in autonomous driving, disabled rehabilitation, etc. In this work, we investigate capturing humans' intrinsic reactions as implicit (and natural) feedback through EEG in the form of error-related potentials (ErrP), providing a natural and direct way for humans to improve the RL agent's learning. As such, human intelligence can be integrated via implicit feedback with RL algorithms to accelerate the learning of the RL agent. We develop three reasonably complex 2D discrete navigational games to experimentally evaluate the overall performance of the proposed work. Major contributions of our work are as follows:
(i) we propose and experimentally validate the zero-shot learning of ErrPs, where the ErrPs can be learned for one game and transferred to other unseen games, (ii) we propose a novel RL framework for integrating implicit human feedback via ErrPs with the RL agent, improving the label efficiency and robustness to human mistakes, and (iii) compared to prior works, we scale the application of ErrPs to reasonably complex environments, and demonstrate the significance of our approach for accelerated learning through real user experiments.
Submitted 14 October, 2020; v1 submitted 29 June, 2020;
originally announced June 2020.