arXiv:2405.19373

Multi-modal Mood Reader: Pre-trained Model Empowers Cross-Subject Emotion Recognition

Authors: Yihang Dong, Xuhang Chen, Yanyan Shen, Michael Kwok-Po Ng, Tao Qian, Shuqiang Wang

Abstract: Emotion recognition based on electroencephalography (EEG) has gained significant attention and diversified development in fields such as neural signal processing and affective computing. However, the unique brain anatomy of each individual leads to non-negligible natural differences in EEG signals across subjects, posing challenges for cross-subject emotion recognition. While recent studies have attempted to address these issues, they still face limitations in practical effectiveness and model framework unity. Current methods often struggle to capture the complex spatial-temporal dynamics of EEG signals and fail to effectively integrate multimodal information, resulting in suboptimal performance and limited generalizability across subjects. To overcome these limitations, we develop a pre-trained-model-based Multimodal Mood Reader for cross-subject emotion recognition that uses masked brain signal modeling and an interlinked spatial-temporal attention mechanism. The model learns universal latent representations of EEG signals through pre-training on a large-scale dataset, and employs the interlinked spatial-temporal attention mechanism to process Differential Entropy (DE) features extracted from EEG data. Subsequently, a multi-level fusion layer is proposed to integrate the discriminative features, maximizing the advantages of features across different dimensions and modalities. Extensive experiments on public datasets demonstrate Mood Reader's superior performance in cross-subject emotion recognition tasks, outperforming state-of-the-art methods. Additionally, the model is dissected from an attention perspective, providing a qualitative analysis of emotion-related brain areas and offering valuable insights for affective research in neural signal processing.

Submitted 28 May, 2024; originally announced May 2024.

Comments: Accepted by International Conference on Neural Computing for Advanced Applications, 2024
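The abstract above says the attention module consumes Differential Entropy (DE) features extracted from EEG. A common convention in EEG emotion work assumes each band-filtered segment is approximately Gaussian, in which case DE reduces to 0.5 * ln(2 * pi * e * sigma^2). The sketch below is a minimal illustration under that assumption; it takes segments that have already been band-pass filtered, and the function names and the {band: [segment per channel]} layout are hypothetical, not the Mood Reader code.

```python
import math
import statistics
from typing import Dict, List, Sequence


def differential_entropy(segment: Sequence[float]) -> float:
    """DE of one band-filtered EEG segment under a Gaussian assumption.

    For x ~ N(mu, sigma^2), h(x) = 0.5 * ln(2 * pi * e * sigma^2).
    """
    var = statistics.pvariance(segment)
    if var <= 0.0:
        raise ValueError("segment must have non-zero variance")
    return 0.5 * math.log(2.0 * math.pi * math.e * var)


def de_feature_map(bands: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Map {band name: [segment per channel]} to {band name: [DE per channel]}.

    The resulting channels-by-bands grid is one hypothetical layout that a
    spatial-temporal attention model could consume.
    """
    return {band: [differential_entropy(seg) for seg in channels]
            for band, channels in bands.items()}


if __name__ == "__main__":
    # Two toy channels in one band; the second has 4x the variance,
    # so its DE is larger by 0.5 * ln(4) = ln(2).
    alpha = [[1.0, -1.0, 1.0, -1.0], [2.0, -2.0, 2.0, -2.0]]
    feats = de_feature_map({"alpha": alpha})
    print(feats["alpha"][1] - feats["alpha"][0])  # approximately 0.6931
```

Because DE depends only on the variance under the Gaussian assumption, it behaves like a log band power; the real pipeline's filtering, windowing, and normalization are not specified in the abstract.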

arXiv:2312.11896

Stable Relay Learning Optimization Approach for Fast Power System Production Cost Minimization Simulation

Authors: Zishan Guo, Qinran Hu, Tao Qian, Xin Fang, Renjie Hu, Zaijun Wu

Abstract: Production cost minimization (PCM) simulation is commonly employed to assess the operational efficiency, economic viability, and reliability of power systems, providing valuable insights for power system planning and operations. However, solving a PCM problem is time-consuming because it involves numerous binary variables when the simulation horizon extends over months or years. This hinders the rapid assessment of modern energy systems with diverse planning requirements. Existing methods for accelerating PCM tend to sacrifice accuracy for speed. In this paper, we propose a stable relay learning optimization (s-RLO) approach within the Branch and Bound (B&B) algorithm. The proposed approach offers rapid and stable performance and ensures optimal solutions. The two-stage s-RLO involves an imitation learning (IL) phase for accurate policy initialization and a reinforcement learning (RL) phase for time-efficient fine-tuning. When implemented on the popular SCIP solver, s-RLO returns the optimal solution up to 2 times faster than the default relpscost rule and 1.4 times faster than IL, or exhibits a smaller gap at the predefined time limit. The proposed approach also shows stable performance, reducing fluctuations by approximately 50% compared with IL. Numerical results support the efficacy of the proposed s-RLO approach.

Submitted 19 December, 2023; originally announced December 2023.

Comments: Submitted to IEEE Transactions on Power Systems on December 15, 2023
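s-RLO works inside Branch and Bound by learning which variable to branch on, the role played in SCIP by rules such as relpscost. To show where such a policy plugs in, here is a minimal sketch of depth-first B&B on a toy 0/1 knapsack (an assumption standing in for the PCM problem) with the branching decision exposed as a `choose` callback. The knapsack model, the greedy LP relaxation, and the `most_fractional` / `first_fractional` rules are illustrative assumptions; this is neither the paper's formulation nor SCIP's API.

```python
from typing import Callable, Dict, List, Optional, Tuple

Solution = Dict[int, float]
BranchRule = Callable[[Solution], Optional[int]]
EPS = 1e-9


def lp_relaxation(values: List[float], weights: List[float], cap: float,
                  fixed: Dict[int, int]) -> Optional[Tuple[float, Solution]]:
    """Greedy LP relaxation of 0/1 knapsack with some variables fixed.

    Returns (upper bound, fractional solution), or None if infeasible.
    Assumes all weights are positive.
    """
    used = sum(weights[i] for i, v in fixed.items() if v == 1)
    if used > cap + EPS:
        return None
    bound = sum(values[i] for i, v in fixed.items() if v == 1)
    sol: Solution = {i: float(v) for i, v in fixed.items()}
    remaining = cap - used
    free = sorted((i for i in range(len(values)) if i not in fixed),
                  key=lambda i: values[i] / weights[i], reverse=True)
    for i in free:
        take = max(0.0, min(1.0, remaining / weights[i]))  # 1, a fraction, or 0
        sol[i] = take
        bound += take * values[i]
        remaining -= take * weights[i]
    return bound, sol


def most_fractional(sol: Solution) -> Optional[int]:
    """Hand-written rule: branch on the variable closest to 0.5."""
    frac = [i for i, x in sol.items() if EPS < x < 1.0 - EPS]
    return min(frac, key=lambda i: abs(sol[i] - 0.5)) if frac else None


def first_fractional(sol: Solution) -> Optional[int]:
    """Alternative rule: branch on the lowest-index fractional variable."""
    frac = sorted(i for i, x in sol.items() if EPS < x < 1.0 - EPS)
    return frac[0] if frac else None


def branch_and_bound(values: List[float], weights: List[float], cap: float,
                     choose: BranchRule = most_fractional):
    """Depth-first B&B; `choose` is the hook a learned policy would replace."""
    n = len(values)
    best_val, best_sol = 0.0, {i: 0.0 for i in range(n)}
    stack: List[Dict[int, int]] = [{}]
    nodes = 0
    while stack:
        fixed = stack.pop()
        nodes += 1
        relaxed = lp_relaxation(values, weights, cap, fixed)
        if relaxed is None:
            continue  # infeasible node
        bound, sol = relaxed
        if bound <= best_val + EPS:
            continue  # pruned: cannot beat the incumbent
        j = choose(sol)
        if j is None:  # LP solution is integral, so it is a new incumbent
            best_val, best_sol = bound, sol
            continue
        stack.append({**fixed, j: 0})
        stack.append({**fixed, j: 1})  # popped first (depth-first)
    x = [int(round(best_sol.get(i, 0.0))) for i in range(n)]
    return best_val, x, nodes


if __name__ == "__main__":
    values, weights = [60.0, 100.0, 120.0], [10.0, 20.0, 30.0]
    print(branch_and_bound(values, weights, 50.0))  # best value 220.0 with x = [0, 1, 1]
```

Any branching rule yields the same optimum here; what a rule changes is the number of nodes explored, which is the quantity a learned policy such as s-RLO aims to reduce.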

arXiv:2308.02867

A Systematic Exploration of Joint-training for Singing Voice Synthesis

Authors: Yuning Wu, Yifeng Yu, Jiatong Shi, Tao Qian, Qin **

Abstract: There has been growing interest in using end-to-end acoustic models for singing voice synthesis (SVS). Typically, these models require an additional vocoder to transform the generated acoustic features into the final waveform. However, because the acoustic model and the vocoder are not jointly optimized, a gap can exist between the two models, leading to suboptimal performance. Although a similar problem has been addressed in text-to-speech (TTS) systems by joint training or by replacing acoustic features with a latent representation, adapting these approaches to SVS is not straightforward, and how to improve the joint training of SVS systems has not been well explored. In this paper, we conduct a systematic investigation of how to better perform joint training of an acoustic model and a vocoder for SVS. Extensive experiments demonstrate that our joint-training strategy outperforms baselines, achieving more stable performance across different datasets while also increasing the interpretability of the entire framework.

Submitted 5 August, 2023; originally announced August 2023.
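The core idea of joint training is that the waveform loss is backpropagated through the vocoder into the acoustic model, usually alongside an intermediate acoustic-feature loss. The toy below illustrates only that objective, lam * L_feat + L_wave, using scalar linear stand-ins for both networks with hand-derived gradients. The linear "models", the weight `lam`, the learning rate, and the data are all assumptions for illustration; the paper's actual models and loss weighting are not described in the abstract.

```python
from typing import List, Tuple

# Toy cascade (hypothetical): the "acoustic model" f_hat = a * s maps a score
# feature s to an acoustic feature; the "vocoder" y_hat = b * f_hat maps that
# feature to a waveform value.
Sample = Tuple[float, float, float]  # (score feature s, target feature f, target wave y)


def joint_loss(a: float, b: float, data: List[Sample], lam: float) -> float:
    """lam * feature MSE + waveform MSE, computed through the cascade."""
    total = 0.0
    for s, f, y in data:
        f_hat = a * s
        total += lam * (f_hat - f) ** 2 + (b * f_hat - y) ** 2
    return total / len(data)


def joint_step(a: float, b: float, data: List[Sample], lam: float,
               lr: float) -> Tuple[float, float]:
    """One gradient step on both stages; the waveform error also updates `a`."""
    ga = gb = 0.0
    for s, f, y in data:
        f_hat = a * s
        wave_err = b * f_hat - y
        ga += 2 * lam * (f_hat - f) * s + 2 * wave_err * b * s  # feature + wave paths
        gb += 2 * wave_err * f_hat
    n = len(data)
    return a - lr * ga / n, b - lr * gb / n


if __name__ == "__main__":
    data = [(1.0, 2.0, 6.0), (2.0, 4.0, 12.0), (3.0, 6.0, 18.0)]
    a, b = 0.5, 0.5
    for _ in range(2000):
        a, b = joint_step(a, b, data, lam=1.0, lr=0.005)
    print(a, b)  # should approach 2.0 and 3.0
```

With separate training, `a` would only ever see the feature loss; the `2 * wave_err * b * s` term is what couples the two stages.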

arXiv:2303.08607

PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor

Authors: Yuning Wu, Jiatong Shi, Tao Qian, Dongji Gao, Qin **

Abstract: Singing voice synthesis (SVS), the task of generating a vocal singing voice from a music score, has drawn much attention in recent years. SVS faces the challenge that singing exhibits considerable pronunciation flexibility for the same music score. Most previous SVS works cannot handle the misalignment between the music score and the actual singing well. In this paper, we propose an acoustic feature processing strategy named PHONEix, with a phoneme distribution predictor, to narrow the gap between the music score and the singing voice; the strategy can be easily adopted in different SVS systems. Extensive experiments in various settings demonstrate the effectiveness of PHONEix in both objective and subjective evaluations.

Submitted 15 March, 2023; originally announced March 2023.

Comments: Accepted by ICASSP 2023
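One plausible way to use a predicted phoneme distribution is to split a note's frames among its phonemes in proportion to the predicted shares, rather than following a fixed score-based split. The sketch below shows only that allocation step, using largest-remainder rounding so the integer frame counts always sum to the note length. The helper name, the rounding scheme, and this reading of how the distribution is consumed are assumptions, not PHONEix's published procedure.

```python
import math
from typing import List, Sequence


def allocate_frames(note_frames: int, phoneme_dist: Sequence[float]) -> List[int]:
    """Split a note's frames across its phonemes by predicted proportions.

    Hypothetical helper: uses largest-remainder rounding so that the
    returned counts sum exactly to note_frames.
    """
    total = sum(phoneme_dist)
    if note_frames < 0 or total <= 0:
        raise ValueError("need non-negative frames and a positive distribution")
    shares = [note_frames * p / total for p in phoneme_dist]
    counts = [math.floor(s) for s in shares]
    leftover = note_frames - sum(counts)
    # Give the remaining frames to the phonemes with the largest remainders
    # (ties keep their original order because sorted() is stable).
    order = sorted(range(len(shares)), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # A 10-frame note with a consonant-vowel-consonant split of 1:2:1.
    print(allocate_frames(10, [1, 2, 1]))  # [3, 5, 2]
```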

arXiv:2208.10059

Sampling Gaussian Stationary Random Fields: A Stochastic Realization Approach

Authors: Bin Zhu, Jiahao Liu, Zhengshou Lai, Tao Qian

Abstract: Generating large-scale samples of stationary random fields is of great importance in fields such as geomaterial modeling and uncertainty quantification. Traditional methods based on covariance matrix decomposition are computationally expensive, and the cost becomes even more severe when the dimension of the random field is large. This paper proposes an efficient stochastic realization approach for sampling Gaussian stationary random fields from a systems and control point of view. Specifically, we take the exponential and Gaussian covariance functions as examples and make a decoupling assumption when there are multiple dimensions. A rational spectral density is then constructed in each dimension using techniques from covariance extension, and the corresponding autoregressive moving-average (ARMA) model is obtained via spectral factorization. As a result, samples of the random field with a specific covariance function can be generated very efficiently in the space domain by running the ARMA recursion with a white-noise input. This procedure is computationally cheap because the constructed ARMA model has a low order. Furthermore, the same method is integrated into multiscale simulations, where the generated samples are interpolated as one zooms into finer scales. Both theoretical analysis and simulation results show that our approach performs favorably compared with covariance matrix decomposition methods.

Submitted 22 August, 2022; originally announced August 2022.

Comments: 17 pages, 9 figures
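The simplest instance of "ARMA recursion driven by white noise" is the one-dimensional exponential covariance C(h) = sigma^2 * exp(-|h| / ell): on a grid with spacing dx it is exactly the covariance of an AR(1) process with coefficient a = exp(-dx / ell). The sketch below covers only that special case; it does not reproduce the paper's covariance extension or spectral factorization steps, the Gaussian-covariance case, or the multidimensional decoupling. Function names and parameters are illustrative assumptions.

```python
import math
import random
from typing import List


def sample_exponential_field_1d(n: int, dx: float, sigma: float, ell: float,
                                seed: int = 0) -> List[float]:
    """Zero-mean stationary Gaussian samples on a 1-D grid with covariance
    C(h) = sigma^2 * exp(-|h| / ell), via the AR(1) recursion
    x_k = a * x_{k-1} + sigma * sqrt(1 - a^2) * w_k, a = exp(-dx / ell),
    where w_k is i.i.d. N(0, 1) white noise.
    """
    rng = random.Random(seed)
    a = math.exp(-dx / ell)
    gain = sigma * math.sqrt(1.0 - a * a)  # keeps the variance at sigma^2
    x = [rng.gauss(0.0, sigma)]  # start in the stationary distribution
    for _ in range(n - 1):
        x.append(a * x[-1] + gain * rng.gauss(0.0, 1.0))
    return x


def empirical_autocov(x: List[float], lag: int) -> float:
    """Biased sample autocovariance at a given lag."""
    m = sum(x) / len(x)
    return sum((x[i] - m) * (x[i + lag] - m) for i in range(len(x) - lag)) / len(x)


if __name__ == "__main__":
    x = sample_exponential_field_1d(100_000, dx=0.1, sigma=2.0, ell=1.0)
    # Expect roughly 4.0 at lag 0 and 4 * exp(-1), about 1.47, at lag 10 (h = 1.0).
    print(empirical_autocov(x, 0), empirical_autocov(x, 10))
```

Each sample costs O(1) work, compared with factorizing an n-by-n covariance matrix, which is the kind of saving the abstract attributes to low-order ARMA models.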

arXiv:2205.07319

cMelGAN: An Efficient Conditional Generative Model Based on Mel Spectrograms

Authors: Tracy Qian, Jackson Kaunismaa, Tony Chung

Abstract: Analysing music with machine learning is a very difficult problem with numerous constraints to consider. The nature of audio data, with its very high dimensionality and widely varying scales of structure, is one of the primary reasons why it is so difficult to model. There are many applications of machine learning in music, such as classifying the mood of a piece of music, conditional music generation, and popularity prediction. The goal of this project was to develop a genre-conditional generative model of music based on Mel spectrograms and to evaluate its performance by comparing it with existing generative music models that use note-based representations. We initially implemented an autoregressive, RNN-based generative model called MelNet. However, due to its slow speed and low-fidelity output, we decided to create a new, fully convolutional architecture based on the MelGAN [4] and conditional GAN architectures, called cMelGAN.

Submitted 15 May, 2022; originally announced May 2022.
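Mel spectrograms, the representation cMelGAN models, are built by warping frequency onto the mel scale and pooling spectrum bins with triangular filters spaced evenly in mels. The sketch below shows that frequency mapping using the HTK-style formula m = 2595 * log10(1 + f / 700); this is one common convention (the Slaney variant differs), and the helper names are illustrative, not taken from the paper.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_mels: int, f_min: float, f_max: float) -> List[float]:
    """n_mels + 2 frequencies (Hz) equally spaced in mels; each consecutive
    triple (left, center, right) defines one triangular filter."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + k * step) for k in range(n_mels + 2)]


def triangular_weight(f: float, left: float, center: float, right: float) -> float:
    """Weight of frequency f in the triangular filter (left, center, right)."""
    if left < f <= center:
        return (f - left) / (center - left)
    if center < f < right:
        return (right - f) / (right - center)
    return 0.0


if __name__ == "__main__":
    # Filters get wider in Hz as frequency increases.
    print([round(e, 1) for e in mel_band_edges(4, 0.0, 8000.0)])
```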

arXiv:2205.04029

Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis

Authors: Jiatong Shi, Shuai Guo, Tao Qian, Nan Huo, Tomoki Hayashi, Yuning Wu, Frank Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, Qin **

Abstract: This paper introduces a new open-source platform named Muskits for end-to-end music processing, which mainly focuses on end-to-end singing voice synthesis (E2E-SVS). Muskits supports state-of-the-art SVS models, including RNN SVS, transformer SVS, and XiaoiceSing. The design of Muskits follows the style of the widely used speech processing toolkits ESPnet and Kaldi for data preprocessing, training, and recipe pipelines. To the best of our knowledge, this toolkit is the first platform that allows a fair and highly reproducible comparison between several published works in SVS. We also demonstrate several advanced usages based on the toolkit's functionalities, including multilingual training and transfer learning. This paper describes the major framework of Muskits, its functionalities, and experimental results in single-singer, multi-singer, multilingual, and transfer learning scenarios. The toolkit is publicly available at https://github.com/SJTMusicTeam/Muskits.

Submitted 2 July, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

Comments: Accepted by Interspeech

arXiv:2203.17001

SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy

Authors: Shuai Guo, Jiatong Shi, Tao Qian, Shinji Watanabe, Qin **

Abstract: Deep-learning-based singing voice synthesis (SVS) systems have been shown to flexibly generate singing of better quality than conventional statistical parametric methods. However, neural systems are generally data-hungry and have difficulty reaching reasonable singing quality with the limited publicly available training data. In this work, we explore different data augmentation methods to boost the training of SVS systems, including several strategies customized to SVS based on pitch augmentation and mix-up augmentation. To further stabilize training, we introduce a cycle-consistent training strategy. Extensive experiments on two public singing databases demonstrate that the proposed augmentation methods and the stabilizing training strategy can significantly improve performance in both objective and subjective evaluations.

Submitted 6 July, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

Comments: Accepted by INTERSPEECH 2022
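The abstract names two augmentation families, pitch augmentation and mix-up. The sketch below shows their generic textbook forms: transposing a MIDI score by k semitones (which scales F0 by 2^(k/12)) and mix-up with a weight drawn from Beta(alpha, alpha). The SVS-customized variants in SingAug are not detailed in the abstract, so these helpers, their names, and the choice of alpha are assumptions rather than the paper's implementation.

```python
import random
from typing import List, Sequence, Tuple


def shift_pitch(midi_notes: Sequence[int], semitones: int) -> Tuple[List[int], float]:
    """Transpose a score by `semitones`; also return the factor by which the
    F0 contour must be scaled to match (2 ** (semitones / 12))."""
    return [n + semitones for n in midi_notes], 2.0 ** (semitones / 12.0)


def mixup(x1: Sequence[float], x2: Sequence[float], alpha: float,
          rng: random.Random) -> Tuple[List[float], float]:
    """Generic mix-up: lam ~ Beta(alpha, alpha), x = lam * x1 + (1 - lam) * x2.

    lam is returned so the caller can mix targets (or losses) with the
    same weight.
    """
    if len(x1) != len(x2):
        raise ValueError("features must have the same length")
    lam = rng.betavariate(alpha, alpha)
    return [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)], lam


if __name__ == "__main__":
    print(shift_pitch([60, 62, 64], 12))  # ([72, 74, 76], 2.0): up one octave
    mixed, lam = mixup([0.0, 1.0], [1.0, 0.0], alpha=0.4, rng=random.Random(0))
    print(mixed, lam)
```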

arXiv:2203.05571

doi 10.1117/12.2544074

Deep Convolutional Neural Networks for Molecular Subtyping of Gliomas Using Magnetic Resonance Imaging

Authors: Dong Wei, Yiming Li, Yinyan Wang, Tianyi Qian, Yefeng Zheng

Abstract: Knowledge of the molecular subtypes of gliomas can provide valuable information for tailored therapies. This study aimed to investigate the use of deep convolutional neural networks (DCNNs) for noninvasive glioma subtyping with radiological imaging data, according to the new taxonomy announced by the World Health Organization in 2016. Methods: A DCNN model was developed to predict the five glioma subtypes based on a hierarchical classification paradigm. This model used three parallel, weight-sharing, deep residual learning networks to process 2.5-dimensional input of trimodal MRI data, including T1-weighted, contrast-enhanced T1-weighted, and T2-weighted images. A data set comprising 1,016 real patients was collected to evaluate the developed DCNN model. Predictive performance was evaluated via the area under the curve (AUC) from receiver operating characteristic analysis. For comparison, the performance of a radiomics-based approach was also evaluated. Results: The AUCs of the DCNN model for the four classification tasks in the hierarchical classification paradigm were 0.89, 0.89, 0.85, and 0.66, respectively, compared with 0.85, 0.75, 0.67, and 0.59 for the radiomics approach. Conclusion: The results show that the developed DCNN model can predict glioma subtypes with promising performance, given sufficient training data that is not severely imbalanced.

Submitted 10 March, 2022; originally announced March 2022.

Comments: Proc. SPIE 11314, Medical Imaging 2020: Computer-Aided Diagnosis
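Each of the four AUC values reported above summarizes one binary task in the hierarchy. As a reminder of what that number measures, the sketch below computes ROC AUC directly as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half (the normalized Mann-Whitney U statistic). The function name and toy scores are illustrative; this is not the study's evaluation code.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC as P(positive score > negative score), ties counted as 1/2.

    O(len(pos) * len(neg)) pairwise version; fine for a few thousand cases.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # 3 of the 4 positive-negative pairs are ranked correctly.
    print(roc_auc([0.9, 0.8], [0.7, 0.85]))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking, which puts the weakest reported task (0.66 for the DCNN, 0.59 for radiomics) in context.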

arXiv:1906.07361

A Novel Feature Representation for Single-Channel Heartbeat Classification based on Adaptive Fourier Decomposition

Authors: Chunyu Tan, Liming Zhang, Hau-tieng Wu, Tao Qian

Abstract: This paper proposes a novel approach for heartbeat classification from single-lead electrocardiogram (ECG) signals based on adaptive Fourier decomposition (AFD). AFD is a recently developed signal processing tool that provides useful morphological features, referred to as AFD-derived instantaneous frequency (IF) features, which differ from those provided by traditional tools. A support vector machine (SVM) classifier is trained with the AFD-derived IF features, ECG landmark features, and RR interval features. To evaluate the performance of the trained classifier, the Association for the Advancement of Medical Instrumentation (AAMI) standard is applied to publicly available benchmark databases, including the MIT-BIH arrhythmia database and the MIT-BIH supraventricular arrhythmia database, to classify heartbeats from single-lead ECG. The overall performance, in terms of sensitivities and positive predictive values, is comparable to that of state-of-the-art automatic heartbeat classification algorithms based on two-lead ECG.

Submitted 17 June, 2019; originally announced June 2019.
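Of the three feature groups fed to the SVM, the RR interval features are the simplest to illustrate. The sketch below derives per-beat pre-RR, post-RR, a local average RR, and the pre/post ratio from R-peak times; this particular feature set and the `window` parameter are common conventions assumed for illustration, since the abstract does not list the exact RR features. AFD-derived IF features and landmark features are not covered.

```python
from typing import Dict, List, Sequence


def rr_features(r_peaks_s: Sequence[float], window: int = 10) -> List[Dict[str, float]]:
    """Per-beat RR interval features from R-peak times in seconds.

    Only beats with both a previous and a next R peak get features. The
    local average uses up to `window` intervals ending at the current beat.
    """
    rr = [b - a for a, b in zip(r_peaks_s, r_peaks_s[1:])]
    feats = []
    for i in range(1, len(r_peaks_s) - 1):
        pre, post = rr[i - 1], rr[i]  # interval before and after beat i
        recent = rr[max(0, i - window):i]  # always contains `pre`
        feats.append({
            "pre_rr": pre,
            "post_rr": post,
            "local_avg_rr": sum(recent) / len(recent),
            "ratio": pre / post,  # < 1 for a premature beat followed by a pause
        })
    return feats


if __name__ == "__main__":
    # The peak at 2.5 s arrives early: short pre-RR, normal post-RR.
    for f in rr_features([0.0, 1.0, 2.0, 2.5, 3.5]):
        print(f)
```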

Showing 1–10 of 10 results for author: Qian, T