Showing 1–8 of 8 results for author: Lian, H

  1. arXiv:2403.01494

    eess.AS cs.SD eess.SP

    PAVITS: Exploring Prosody-aware VITS for End-to-End Emotional Voice Conversion

    Authors: Tianhua Qi, Wenming Zheng, Cheng Lu, Yuan Zong, Hailun Lian

    Abstract: In this paper, we propose Prosody-aware VITS (PAVITS) for emotional voice conversion (EVC), aiming to achieve two major objectives of EVC: high content naturalness and high emotional naturalness, which are crucial for meeting the demands of human perception. To improve the content naturalness of converted audio, we have developed an end-to-end EVC architecture inspired by the high audio quality of…

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: Accepted to ICASSP2024

  2. arXiv:2401.09752

    cs.SD cs.LG eess.AS

    Improving Speaker-independent Speech Emotion Recognition Using Dynamic Joint Distribution Adaptation

    Authors: Cheng Lu, Yuan Zong, Hailun Lian, Yan Zhao, Björn Schuller, Wenming Zheng

    Abstract: In speaker-independent speech emotion recognition, the training and testing samples are collected from diverse speakers, leading to a multi-domain shift challenge across the feature distributions of data from different speakers. Consequently, when the trained model is confronted with data from new speakers, its performance tends to degrade. To address the issue, we propose a Dynamic Joint Distribu…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  3. arXiv:2312.06466

    cs.SD eess.AS

    Towards Domain-Specific Cross-Corpus Speech Emotion Recognition Approach

    Authors: Yan Zhao, Yuan Zong, Hailun Lian, Cheng Lu, Jingang Shi, Wenming Zheng

    Abstract: Cross-corpus speech emotion recognition (SER) poses a challenge due to feature distribution mismatch, potentially degrading the performance of established SER methods. In this paper, we tackle this challenge by proposing a novel transfer subspace learning method called acoustic knowledge-guided transfer linear regression (AKTLR). Unlike existing approaches, which often overlook domain-specific know…

    Submitted 11 December, 2023; originally announced December 2023.

  4. arXiv:2310.03992

    cs.SD eess.AS

    Layer-Adapted Implicit Distribution Alignment Networks for Cross-Corpus Speech Emotion Recognition

    Authors: Yan Zhao, Yuan Zong, Jincen Wang, Hailun Lian, Cheng Lu, Li Zhao, Wenming Zheng

    Abstract: In this paper, we propose a new unsupervised domain adaptation (DA) method called layer-adapted implicit distribution alignment networks (LIDAN) to address the challenge of cross-corpus speech emotion recognition (SER). LIDAN extends our previous ICASSP work, deep implicit distribution alignment networks (DIDAN), whose key contribution lies in the introduction of a novel regularization term called…

    Submitted 5 October, 2023; originally announced October 2023.

  5. arXiv:2308.14568

    cs.SD eess.AS

    Time-Frequency Transformer: A Novel Time Frequency Joint Learning Method for Speech Emotion Recognition

    Authors: Yong Wang, Cheng Lu, Yuan Zong, Hailun Lian, Yan Zhao, Sunan Li

    Abstract: In this paper, we propose a novel time-frequency joint learning method for speech emotion recognition, called Time-Frequency Transformer. Its advantage is that the Time-Frequency Transformer can excavate global emotion patterns in the time-frequency domain of speech signal while modeling the local emotional correlations in the time domain and frequency domain respectively. For the purpose, we firs…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted by International Conference on Neural Information Processing (ICONIP2023)

  6. arXiv:2302.08921

    cs.SD cs.CL eess.AS

    Deep Implicit Distribution Alignment Networks for Cross-Corpus Speech Emotion Recognition

    Authors: Yan Zhao, Jincen Wang, Yuan Zong, Wenming Zheng, Hailun Lian, Li Zhao

    Abstract: In this paper, we propose a novel deep transfer learning method called deep implicit distribution alignment networks (DIDAN) to deal with cross-corpus speech emotion recognition (SER) problem, in which the labeled training (source) and unlabeled testing (target) speech signals come from different corpora. Specifically, DIDAN first adopts a simple deep regression network consisting of a set of conv…

    Submitted 17 February, 2023; originally announced February 2023.

  7. arXiv:2210.12430

    cs.SD cs.LG cs.MM eess.AS

    Speech Emotion Recognition via an Attentive Time-Frequency Neural Network

    Authors: Cheng Lu, Wenming Zheng, Hailun Lian, Yuan Zong, Chuangao Tang, Sunan Li, Yan Zhao

    Abstract: Spectrogram is commonly used as the input feature of deep neural networks to learn the high(er)-level time-frequency pattern of speech signal for speech emotion recognition (SER). Generally, different emotions correspond to specific energy activations both within frequency bands and time frames on spectrogram, which indicates the frequency and time domains are both essential to r…

    Submitted 22 October, 2022; originally announced October 2022.

    Comments: This paper has been accepted as a regular paper in IEEE Transactions on Computational Social Systems

  8. arXiv:2210.03460

    eess.IV cs.CV

    Flexible Alignment Super-Resolution Network for Multi-Contrast MRI

    Authors: Yiming Liu, Mengxi Zhang, Weiqin Zhang, Bo Jiang, Bo Hou, Dan Liu, Jie Chen, Heqing Lian

    Abstract: Magnetic resonance imaging plays an essential role in clinical diagnosis by acquiring the structural information of biological tissue. Recently, many multi-contrast MRI super-resolution networks achieve good effects. However, most studies ignore the impact of the inappropriate foreground scale and patch size of multi-contrast MRI, which probably leads to inappropriate feature alignment. To tackle…

    Submitted 8 January, 2023; v1 submitted 7 October, 2022; originally announced October 2022.