Showing 1–13 of 13 results for author: Lian, Z

  1. arXiv:2401.05698  [pdf, other]

    cs.CV cs.HC cs.MM cs.SD eess.AS

    HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition

    Authors: Licai Sun, Zheng Lian, Bin Liu, Jianhua Tao

    Abstract: Audio-Visual Emotion Recognition (AVER) has garnered increasing attention in recent years for its critical role in creating emotion-aware intelligent machines. Previous efforts in this area are dominated by the supervised learning paradigm. Despite significant progress, supervised learning is meeting its bottleneck due to the longstanding data scarcity issue in AVER. Motivated by recent advances in…

    Submitted 1 April, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: Accepted by Information Fusion. The code is available at https://github.com/sunlicai/HiCMAE

    Journal ref: Information Fusion, 2024

  2. arXiv:2311.15339  [pdf, other]

    cs.CV cs.CR cs.LG eess.IV

    Adversarial Purification of Information Masking

    Authors: Sitong Liu, Zhichao Lian, Shuangquan Zhang, Liang Xiao

    Abstract: Adversarial attacks meticulously generate minuscule, imperceptible perturbations to images to deceive neural networks. Counteracting these, adversarial purification methods seek to transform adversarial input samples into clean output images to defend against adversarial attacks. Nonetheless, extant generative models fail to effectively eliminate adversarial perturbations, yielding less-than-ideal…

    Submitted 26 November, 2023; originally announced November 2023.

  3. arXiv:2306.09361  [pdf, other]

    eess.AS cs.CL cs.SD

    MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge in Speech Emotion Recognition

    Authors: Haiyang Sun, Fulin Zhang, Yingying Gao, Zheng Lian, Shilei Zhang, Junlan Feng

    Abstract: Speech Emotion Recognition (SER) is an important research topic in human-computer interaction. Many recent works focus on directly extracting emotional cues through pre-trained knowledge, frequently overlooking considerations of appropriateness and comprehensiveness. Therefore, we propose a novel framework for pre-training knowledge in SER, called Multi-perspective Fusion Search Network (MFSN). Co…

    Submitted 26 June, 2024; v1 submitted 12 June, 2023; originally announced June 2023.

  4. arXiv:2305.13774  [pdf, other]

    cs.SD eess.AS

    ADD 2023: the Second Audio Deepfake Detection Challenge

    Authors: Jiangyan Yi, Jianhua Tao, Ruibo Fu, Xinrui Yan, Chenglong Wang, Tao Wang, Chu Yuan Zhang, Xiaohui Zhang, Yan Zhao, Yong Ren, Le Xu, Junzuo Zhou, Hao Gu, Zhengqi Wen, Shan Liang, Zheng Lian, Shuai Nie, Haizhou Li

    Abstract: Audio deepfake detection is an emerging topic in the artificial intelligence community. The second Audio Deepfake Detection Challenge (ADD 2023) aims to spur researchers around the world to build innovative technologies that can further accelerate and foster research on detecting and analyzing deepfake speech utterances. Unlike previous challenges (e.g. ADD 2022), ADD 2023 focuses on s…

    Submitted 23 May, 2023; originally announced May 2023.

  5. arXiv:2203.13617  [pdf, other]

    eess.AS cs.LG cs.SD

    EmotionNAS: Two-stream Neural Architecture Search for Speech Emotion Recognition

    Authors: Haiyang Sun, Zheng Lian, Bin Liu, Ying Li, Licai Sun, Cong Cai, Jianhua Tao, Meng Wang, Yuan Cheng

    Abstract: Speech emotion recognition (SER) is an important research topic in human-computer interaction. Existing works mainly rely on human expertise to design models. Despite their success, different datasets often require distinct structures and hyperparameters. Searching for an optimal model for each dataset is time-consuming and labor-intensive. To address this problem, we propose a two-stream neural a…

    Submitted 9 June, 2023; v1 submitted 25 March, 2022; originally announced March 2022.

    Comments: Accepted to Interspeech 2023

  6. arXiv:2202.08433  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    ADD 2022: the First Audio Deep Synthesis Detection Challenge

    Authors: Jiangyan Yi, Ruibo Fu, Jianhua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li, Zheng Lian, Bin Liu

    Abstract: Audio deepfake detection is an emerging topic, which was included in ASVspoof 2021. However, the recent shared tasks have not covered many real-life and challenging scenarios. The first Audio Deep synthesis Detection challenge (ADD) was motivated to fill this gap. ADD 2022 includes three tracks: low-quality fake audio detection (LF), partially fake audio detection (PF) and audio fake gam…

    Submitted 26 February, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

    Comments: Accepted by ICASSP 2022

  7. arXiv:1911.02163  [pdf, ps, other]

    cs.CV eess.IV

    SRINet: Learning Strictly Rotation-Invariant Representations for Point Cloud Classification and Segmentation

    Authors: Xiao Sun, Zhouhui Lian, Jianguo Xiao

    Abstract: Point cloud analysis has drawn broad attention due to increasing demand in various fields. Although impressive performance has been achieved on several databases, researchers neglect the fact that the orientation of those point cloud data is aligned. Varying the orientation of a point cloud may degrade performance, restricting the capacity of generalizing to real applic…

    Submitted 5 November, 2019; originally announced November 2019.

    Comments: 8 pages, 7 figures

  8. arXiv:1910.13807  [pdf, other]

    eess.AS cs.LG cs.SD

    Domain adversarial learning for emotion recognition

    Authors: Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang

    Abstract: In practical applications of emotion recognition, users do not always appear in the training corpus. The mismatch between training speakers and testing speakers affects the performance of the trained model. To deal with this problem, we need our model to focus on emotion-related information, while ignoring the difference between speaker identities. In this paper, we look into the use of the domain…

    Submitted 24 October, 2019; originally announced October 2019.

    Comments: submitted to ICASSP2020

  9. Unsupervised Representation Learning with Future Observation Prediction for Speech Emotion Recognition

    Authors: Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang

    Abstract: Prior works on speech emotion recognition utilize various unsupervised learning approaches to deal with low-resource samples. However, these methods pay less attention to modeling the long-term dynamic dependency, which is important for speech emotion recognition. To deal with this problem, this paper combines the unsupervised representation learning strategy -- Future Observation Prediction (FOP)…

    Submitted 24 October, 2019; originally announced October 2019.

    Journal ref: Proc. Interspeech 2019, 3840-3844

  10. arXiv:1910.11269  [pdf, other]

    cs.SD eess.AS

    Towards Fine-Grained Prosody Control for Voice Conversion

    Authors: Zheng Lian, Zhengqi Wen

    Abstract: In a typical voice conversion system, prior works utilize various acoustic features (e.g., the pitch, voiced/unvoiced flag, aperiodicity) of the source speech to control the prosody of the generated waveform. However, prosody is related to many factors, such as intonation, stress and rhythm. It is a challenging task to perfectly describe the prosody through acoustic features. To deal with th…

    Submitted 27 May, 2020; v1 submitted 24 October, 2019; originally announced October 2019.

  11. Conversational Emotion Analysis via Attention Mechanisms

    Authors: Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang

    Abstract: Unlike emotion recognition in individual utterances, we propose a multimodal learning framework that uses the relations and dependencies among utterances for conversational emotion analysis. The attention mechanism is applied to the fusion of the acoustic and lexical features. Then these fusion representations are fed into the self-attention based bi-directional gated recurrent unit (GRU) l…

    Submitted 24 October, 2019; originally announced October 2019.

    Journal ref: Proc. Interspeech 2019, 1936-1940

  12. arXiv:1910.11174  [pdf]

    cs.CV cs.LG eess.AS

    Speech Emotion Recognition via Contrastive Loss under Siamese Networks

    Authors: Zheng Lian, Ya Li, Jianhua Tao, Jian Huang

    Abstract: Speech emotion recognition is an important aspect of human-computer interaction. Prior work proposes various end-to-end models to improve the classification performance. However, most of them rely on the cross-entropy loss together with softmax as the supervision component, which does not explicitly encourage discriminative learning of features. In this paper, we introduce the contrastive loss fun…

    Submitted 23 October, 2019; originally announced October 2019.

    Comments: ASMMC-MMAC 2018 Proceedings of the Joint Workshop of the 4th Workshop on Affective Social Multimedia Computing and first Multi-Modal Affective Computing of Large-Scale Multimedia Data

  13. arXiv:1801.01237   

    cs.HC cs.SD eess.AS

    A pairwise discriminative task for speech emotion recognition

    Authors: Zheng Lian, Ya Li, Jianhua Tao, Jian Huang

    Abstract: I have submitted a new version as arXiv:1910.11174. I forgot to replace the old version and submitted a new one instead. It's my mistake.

    Submitted 30 October, 2019; v1 submitted 3 January, 2018; originally announced January 2018.