Showing 1–50 of 106 results for author: Xie, X

Searching in archive eess.
  1. arXiv:2406.19749  [pdf, other]

    eess.IV cs.CV

    SPIRONet: Spatial-Frequency Learning and Topological Channel Interaction Network for Vessel Segmentation

    Authors: De-Xing Huang, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Zhen-Qiu Feng, Mei-Jiang Gui, Hao Li, Tian-Yu Xiang, Bo-Xian Yao, Zeng-Guang Hou

    Abstract: Automatic vessel segmentation is paramount for developing next-generation interventional navigation systems. However, current approaches suffer from suboptimal segmentation performances due to significant challenges in intraoperative images (i.e., low signal-to-noise ratio, small or slender vessels, and strong interference). In this paper, a novel spatial-frequency learning and topological channel…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.14976  [pdf, other]

    eess.IV cs.CV

    CoCPF: Coordinate-based Continuous Projection Field for Ill-Posed Inverse Problem in Imaging

    Authors: Zixuan Chen, Lingxiao Yang, Jian-Huang Lai, Xiaohua Xie

    Abstract: Sparse-view computed tomography (SVCT) reconstruction aims to acquire CT images based on sparsely-sampled measurements. It allows the subjects exposed to less ionizing radiation, reducing the lifetime risk of developing cancers. Recent researches employ implicit neural representation (INR) techniques to reconstruct CT images from a single SV sinogram. However, due to ill-posedness, these INR-based…

    Submitted 21 June, 2024; originally announced June 2024.

  3. arXiv:2406.10152  [pdf, other]

    cs.SD eess.AS

    Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition

    Authors: Guinan Li, Jiajun Deng, Youjun Chen, Mengzhe Geng, Shujie Hu, Zhe Li, Zengrui Jin, Tianzi Wang, Xurong Xie, Helen Meng, Xunying Liu

    Abstract: This paper proposes joint speaker feature learning methods for zero-shot adaptation of audio-visual multichannel speech separation and recognition systems. xVector and ECAPA-TDNN speaker encoders are connected using purpose-built fusion blocks and tightly integrated with the complete system training. Experiments conducted on LRS3-TED data simulated multichannel overlapped speech suggest that joint…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  4. arXiv:2406.10034  [pdf, other]

    cs.SD cs.AI eess.AS

    Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

    Authors: Tianzi Wang, Xurong Xie, Zhaoqing Li, Shoukang Hu, Zengrui Jing, Jiajun Deng, Mingyu Cui, Shujie Hu, Mengzhe Geng, Guinan Li, Helen Meng, Xunying Liu

    Abstract: This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output labels that are concealed using attention masks, while conducting left-to-right AR prediction and history context amalgamation between blocks. A beam s…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, 2 tables, Interspeech24 conference

  5. arXiv:2406.09873  [pdf, other]

    eess.AS cs.AI cs.SD

    Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition

    Authors: Yicong Jiang, Tianzi Wang, Xurong Xie, Juan Liu, Wei Sun, Nan Yan, Hui Chen, Lan Wang, Xunying Liu, Feng Tian

    Abstract: Disordered speech recognition has profound implications for improving the quality of life for individuals afflicted with, for example, dysarthria. Dysarthric speech recognition encounters challenges including limited data, substantial dissimilarities between dysarthric and non-dysarthric speakers, and significant speaker variations stemming from the disorder. This paper introduces Perceiver-Prompt, a…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  6. arXiv:2406.07952  [pdf, other]

    eess.IV cs.CV

    Spatial-Frequency Dual Progressive Attention Network For Medical Image Segmentation

    Authors: Zhenhuan Zhou, Along He, Yanlin Wu, Rui Yao, Xueshuo Xie, Tao Li

    Abstract: In medical images, various types of lesions often manifest significant differences in their shape and texture. Accurate medical image segmentation demands deep learning models with robust capabilities in multi-scale and boundary feature learning. However, previous networks still have limitations in addressing the above issues. Firstly, previous networks simultaneously fuse multi-level features or…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 8 pages

  7. arXiv:2406.03912  [pdf, other]

    cs.AI cs.LG cs.RO eess.SY

    GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model

    Authors: Zhehua Zhou, Xuan Xie, Jiayang Song, Zhan Shu, Lei Ma

    Abstract: Although deep reinforcement learning has demonstrated impressive achievements in controlling various autonomous systems, e.g., autonomous vehicles or humanoid robots, its inherent reliance on random exploration raises safety concerns in their real-world applications. To improve system safety during the learning process, a variety of Safe Reinforcement Learning (SRL) algorithms have been proposed,…

    Submitted 6 June, 2024; originally announced June 2024.

  8. arXiv:2402.11632  [pdf, other]

    eess.SP

    Reliable long timescale decision-directed channel estimation for OFDM system

    Authors: Xun Wang, Xin Xie, Cunqing Hua, Jianan Hong, Pengwenlong Gu

    Abstract: Decision-directed channel estimation (DDCE) is one kind of blind channel estimation method that tracks the channel blindly by an iterative algorithm without relying on the pilots, which can increase the utilization of wireless resource. However, one major problem of DDCE is the performance degradation caused by error accumulation during the tracking process. In this paper, we propose a reliable D…

    Submitted 18 February, 2024; originally announced February 2024.

  9. arXiv:2401.14008  [pdf, other]

    cs.IT eess.SP

    Massive Unsourced Random Access for Near-Field Communications

    Authors: Xinyu Xie, Yongpeng Wu, Jianping An, Derrick Wing Kwan Ng, Chengwen Xing, Wenjun Zhang

    Abstract: This paper investigates the unsourced random access (URA) problem with a massive multiple-input multiple-output receiver that serves wireless devices in the near-field of radiation. We employ an uncoupled transmission protocol without appending redundancies to the slot-wise encoded messages. To exploit the channel sparsity for block length reduction while facing the collapsed sparse structure in t…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Accepted by IEEE Transactions on Communications

  10. LightSleepNet: Design of a Personalized Portable Sleep Staging System Based on Single-Channel EEG

    Authors: Yiqiao Liao, Chao Zhang, Milin Zhang, Zhihua Wang, Xiang Xie

    Abstract: This paper proposed LightSleepNet - a light-weight, 1-d Convolutional Neural Network (CNN) based personalized architecture for real-time sleep staging, which can be implemented on various mobile platforms with limited hardware resources. The proposed architecture only requires an input of 30s single-channel EEG signal for the classification. Two residual blocks consisting of group 1-d convolution…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 5 pages, 3 figures, published by IEEE TCAS-II

    Journal ref: IEEE Transactions on Circuits and Systems II: Express Briefs, 2021, 69(1): 224-228

  11. arXiv:2401.11856  [pdf, other]

    eess.IV cs.CV

    MOSformer: Momentum encoder-based inter-slice fusion transformer for medical image segmentation

    Authors: De-Xing Huang, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Zhen-Qiu Feng, Mei-Jiang Gui, Hao Li, Tian-Yu Xiang, Xiu-Ling Liu, Zeng-Guang Hou

    Abstract: Medical image segmentation takes an important position in various clinical applications. Deep learning has emerged as the predominant solution for automated segmentation of volumetric medical images. 2.5D-based segmentation models bridge computational efficiency of 2D-based models and spatial perception capabilities of 3D-based models. However, prevailing 2.5D-based models often treat each slice e…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Under Review

  12. Multi-Channel Multi-Domain based Knowledge Distillation Algorithm for Sleep Staging with Single-Channel EEG

    Authors: Chao Zhang, Yiqiao Liao, Siqi Han, Milin Zhang, Zhihua Wang, Xiang Xie

    Abstract: This paper proposed a Multi-Channel Multi-Domain (MCMD) based knowledge distillation algorithm for sleep staging using single-channel EEG. Both knowledge from different domains and different channels are learnt in the proposed algorithm, simultaneously. A multi-channel pre-training and single-channel fine-tuning scheme is used in the proposed work. The knowledge from different channels in the sour…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: 5 pages, 2 figures, published by IEEE TCAS-II

    Journal ref: IEEE Transactions on Circuits and Systems II: Express Briefs, 2022, 69(11): 4608-4612

  13. arXiv:2312.17508  [pdf, ps, other]

    eess.AS cs.AI cs.SD

    Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion

    Authors: Yun Chen, Lingxiao Yang, Qi Chen, Jian-Huang Lai, Xiaohua Xie

    Abstract: Emotional Voice Conversion aims to manipulate a speech according to a given emotion while preserving non-emotion components. Existing approaches cannot well express fine-grained emotional attributes. In this paper, we propose an Attention-based Interactive diseNtangling Network (AINN) that leverages instance-wise emotional knowledge for voice conversion. We introduce a two-stage pipeline to effect…

    Submitted 29 December, 2023; originally announced December 2023.

    Comments: Accepted by INTERSPEECH 2023

  14. arXiv:2312.08641  [pdf, other]

    eess.AS cs.SD

    Towards Automatic Data Augmentation for Disordered Speech Recognition

    Authors: Zengrui Jin, Xurong Xie, Tianzi Wang, Mengzhe Geng, Jiajun Deng, Guinan Li, Shujie Hu, Xunying Liu

    Abstract: Automatic recognition of disordered speech remains a highly challenging task to date due to data scarcity. This paper presents a reinforcement learning (RL) based on-the-fly data augmentation approach for training state-of-the-art PyChain TDNN and end-to-end Conformer ASR systems on such data. The handcrafted temporal and spectral mask operations in the standard SpecAugment method that are task an…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: To appear at IEEE ICASSP 2024

  15. arXiv:2311.06829  [pdf, ps, other]

    eess.SP

    Joint Design of Coding and Modulation for Digital Over-the-Air Computation

    Authors: Xin Xie, Cunqing Hua, Jianan Hong, Yuejun Wei

    Abstract: Due to its high communication efficiency, over-the-air computation (AirComp) has been expected to carry out various computing tasks in the next-generation wireless networks. However, up to now, most applications of AirComp are explored in the analog domain, which limits the capability of AirComp in resisting the complex wireless environment, not to mention to integrate the AirComp technique to the…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: This paper has been submitted to IEEE ICC 2024

  16. arXiv:2309.17056  [pdf, other]

    cs.SD eess.AS

    ReFlow-TTS: A Rectified Flow Model for High-fidelity Text-to-Speech

    Authors: Wenhao Guan, Qi Su, Haodong Zhou, Shiyu Miao, Xingjia Xie, Lin Li, Qingyang Hong

    Abstract: The diffusion models including Denoising Diffusion Probabilistic Models (DDPM) and score-based generative models have demonstrated excellent performance in speech synthesis tasks. However, its effectiveness comes at the cost of numerous sampling steps, resulting in prolonged sampling time required to synthesize high-quality speech. This drawback hinders its practical applicability in real-world sc…

    Submitted 31 January, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: Accepted at ICASSP2024

  17. arXiv:2308.02282  [pdf, other]

    cs.LG cs.AI eess.SP

    DIVERSIFY: A General Framework for Time Series Out-of-distribution Detection and Generalization

    Authors: Wang Lu, Jindong Wang, Xinwei Sun, Yiqiang Chen, Xiangyang Ji, Qiang Yang, Xing Xie

    Abstract: Time series remains one of the most challenging modalities in machine learning research. The out-of-distribution (OOD) detection and generalization on time series tend to suffer due to its non-stationary property, i.e., the distribution changes over time. The dynamic distributions inside time series pose great challenges to existing algorithms to identify invariant distributions since they mainly…

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: Journal version of arXiv:2209.07027; 17 pages

  18. arXiv:2307.05270  [pdf, other]

    eess.IV cs.CV

    APRF: Anti-Aliasing Projection Representation Field for Inverse Problem in Imaging

    Authors: Zixuan Chen, Lingxiao Yang, Jianhuang Lai, Xiaohua Xie

    Abstract: Sparse-view Computed Tomography (SVCT) reconstruction is an ill-posed inverse problem in imaging that aims to acquire high-quality CT images based on sparsely-sampled measurements. Recent works use Implicit Neural Representations (INRs) to build the coordinate-based mapping between sinograms and CT images. However, these methods have not considered the correlation between adjacent projection views…

    Submitted 11 July, 2023; originally announced July 2023.

  19. arXiv:2306.14608  [pdf, other]

    eess.AS cs.CL

    Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems

    Authors: Jiajun Deng, Guinan Li, Xurong Xie, Zengrui Jin, Mingyu Cui, Tianzi Wang, Shujie Hu, Mengzhe Geng, Xunying Liu

    Abstract: Rich sources of variability in natural speech present significant challenges to current data intensive speech recognition technologies. To model both speaker and environment level diversity, this paper proposes a novel Bayesian factorised speaker-environment adaptive training and test time adaptation approach for Conformer ASR models. Speaker and environment level characteristics are separately mo…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: Accepted by INTERSPEECH 2023

  20. arXiv:2305.11049  [pdf, other]

    eess.IV cs.CV cs.LG

    NODE-ImgNet: a PDE-informed effective and robust model for image denoising

    Authors: Xinheng Xie, Yue Wu, Hao Ni, Cuiyu He

    Abstract: Inspired by the traditional partial differential equation (PDE) approach for image denoising, we propose a novel neural network architecture, referred to as NODE-ImgNet, that combines neural ordinary differential equations (NODEs) with convolutional neural network (CNN) blocks. NODE-ImgNet is intrinsically a PDE model, where the dynamic system is learned implicitly without the explicit specification…

    Submitted 6 November, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

  21. arXiv:2305.10659  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    Use of Speech Impairment Severity for Dysarthric Speech Recognition

    Authors: Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Jiajun Deng, Mingyu Cui, Guinan Li, Jianwei Yu, Xurong Xie, Xunying Liu

    Abstract: A key challenge in dysarthric speech recognition is the speaker-level diversity attributed to both speaker-identity associated factors such as gender, and speech impairment severity. Most prior researches on addressing this issue focused on using speaker-identity only. To this end, this paper proposes a novel set of techniques to use both severity and speaker-identity in dysarthric speech recognit…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted to INTERSPEECH 2023

  22. arXiv:2304.12184  [pdf, other]

    eess.SP cs.AI cs.IT cs.LG

    Active RIS-aided EH-NOMA Networks: A Deep Reinforcement Learning Approach

    Authors: Zhaoyuan Shi, Huabing Lu, Xianzhong Xie, Helin Yang, Chongwen Huang, Jun Cai, Zhiguo Ding

    Abstract: An active reconfigurable intelligent surface (RIS)-aided multi-user downlink communication system is investigated, where non-orthogonal multiple access (NOMA) is employed to improve spectral efficiency, and the active RIS is powered by energy harvesting (EH). The problem of joint control of the RIS's amplification matrix and phase shift matrix is formulated to maximize the communication success ra…

    Submitted 11 April, 2023; originally announced April 2023.

  23. arXiv:2304.11670  [pdf, other]

    cs.CV eess.IV

    Evading DeepFake Detectors via Adversarial Statistical Consistency

    Authors: Yang Hou, Qing Guo, Yihao Huang, Xiaofei Xie, Lei Ma, Jianjun Zhao

    Abstract: In recent years, as various realistic face forgery techniques known as DeepFake improve by leaps and bounds, more and more DeepFake detection techniques have been proposed. These methods typically rely on detecting statistical differences between natural (i.e., real) and DeepFake-generated images in both spatial and frequency domains. In this work, we propose to explicitly minimize the statistical…

    Submitted 23 April, 2023; originally announced April 2023.

    Comments: Accepted by CVPR 2023

  24. arXiv:2304.08708  [pdf]

    cs.SD eess.AS

    A Voice Disease Detection Method Based on MFCCs and Shallow CNN

    Authors: Hao Cai, Can Li, Fei Ding

    Abstract: The incidence rate of voice diseases is increasing year by year. The use of software for remote diagnosis is a technical development trend and has important practical value. Among voice diseases, common diseases that cause hoarseness include spasmodic dysphonia, vocal cord paralysis, vocal nodule, and vocal cord polyp. This paper presents a voice disease detection method that can be applied in a w…

    Submitted 17 April, 2023; originally announced April 2023.

  25. arXiv:2304.04106  [pdf, other]

    eess.IV cs.CV

    MedGen3D: A Deep Generative Framework for Paired 3D Image and Mask Generation

    Authors: Kun Han, Yifeng Xiong, Chenyu You, Pooya Khosravi, Shanlin Sun, Xiangyi Yan, James Duncan, Xiaohui Xie

    Abstract: Acquiring and annotating sufficient labeled data is crucial in developing accurate and robust learning-based models, but obtaining such data can be challenging in many medical image segmentation tasks. One promising solution is to synthesize realistic data with ground-truth mask annotations. However, no prior studies have explored generating complete 3D volumetric images with masks. In this paper,…

    Submitted 4 July, 2023; v1 submitted 8 April, 2023; originally announced April 2023.

    Comments: Accepted by MICCAI 2023. Project Page: https://krishan999.github.io/MedGen3D/

  26. arXiv:2303.16242  [pdf, other]

    eess.IV cs.CV

    CuNeRF: Cube-Based Neural Radiance Field for Zero-Shot Medical Image Arbitrary-Scale Super Resolution

    Authors: Zixuan Chen, Jian-Huang Lai, Lingxiao Yang, Xiaohua Xie

    Abstract: Medical image arbitrary-scale super-resolution (MIASSR) has recently gained widespread attention, aiming to super sample medical volumes at arbitrary scales via a single model. However, existing MIASSR methods face two major limitations: (i) reliance on high-resolution (HR) volumes and (ii) limited generalization ability, which restricts their application in various scenarios. To overcome these li…

    Submitted 16 April, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

    Comments: This paper is accepted by the International Conference on Computer Vision (ICCV) 2023

  27. arXiv:2303.14701  [pdf, ps, other]

    eess.SP

    Mathematical Characterization of Signal Semantics and Rethinking of the Mathematical Theory of Information

    Authors: Guangming Shi, Dahua Gao, Shuai Ma, Minxi Yang, Yong Xiao, Xuemei Xie

    Abstract: Shannon information theory is established based on probability and bits, and the communication technology based on this theory realizes the information age. The original goal of Shannon's information theory is to describe and transmit information content. However, because information is related to cognition, and cognition is considered to be subjective, Shannon information theory is to describe and…

    Submitted 26 March, 2023; originally announced March 2023.

  28. arXiv:2303.14133  [pdf, other]

    eess.IV cs.CR cs.CV

    Adversarial Attack and Defense for Medical Image Analysis: Methods and Applications

    Authors: Junhao Dong, Junxi Chen, Xiaohua Xie, Jianhuang Lai, Hao Chen

    Abstract: Deep learning techniques have achieved superior performance in computer-aided medical image analysis, yet they are still vulnerable to imperceptible adversarial attacks, resulting in potential misdiagnosis in clinical practice. Oppositely, recent years have also witnessed remarkable progress in defense against these tailored adversarial examples in deep medical diagnosis systems. In this expositio…

    Submitted 24 March, 2023; originally announced March 2023.

  29. arXiv:2302.14570   

    math.OC eess.SY

    Byzantine-Resilient Multi-Agent Distributed Exact Optimization with Less Data

    Authors: Yang Zhai, Zhi-Wei Liu, Dong Yue, Songlin Hu, Xiangpeng Xie

    Abstract: This paper studies the distributed multi-agent resilient optimization problem under the f-total Byzantine attacks. Compared with the previous work on Byzantine-resilient multi-agent exact optimization problems, we do not require the communication topology to be fully connected. Under the redundancy of cost functions, we propose the distributed comparative gradient elimination resilient optimization…

    Submitted 28 March, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: There are some errors in the proof of this paper

  30. arXiv:2302.14564  [pdf, other]

    cs.SD cs.AI eess.AS

    Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition

    Authors: Shujie Hu, Xurong Xie, Zengrui Jin, Mengzhe Geng, Yi Wang, Mingyu Cui, Jiajun Deng, Xunying Liu, Helen Meng

    Abstract: Automatic recognition of disordered and elderly speech remains a highly challenging task to date due to the difficulty in collecting such data in large quantities. This paper explores a series of approaches to integrate domain adapted SSL pre-trained models into TDNN and Conformer ASR systems for dysarthric and elderly speech recognition: a) input feature fusion between standard acoustic frontends…

    Submitted 22 June, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: accepted by ICASSP 2023

  31. arXiv:2302.07521  [pdf, other]

    eess.AS cs.SD

    Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems

    Authors: Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Guinan Li, Shujie Hu, Xunying Liu

    Abstract: Speaker adaptation techniques provide a powerful solution to customise automatic speech recognition (ASR) systems for individual users. Practical application of unsupervised model-based speaker adaptation techniques to data intensive end-to-end ASR systems is hindered by the scarcity of speaker-level data and performance sensitivity to transcription errors. To address these issues, a set of compac…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing

  32. arXiv:2211.16700  [pdf, ps, other]

    eess.SP cs.NI

    AirCon: Over-the-Air Consensus for Wireless Blockchain Networks

    Authors: Xin Xie, Cunqing Hua, Pengwenlong Gu, Wenchao Xu

    Abstract: Blockchain has been deemed as a promising solution for providing security and privacy protection in the next-generation wireless networks. Large-scale concurrent access for massive wireless devices to accomplish the consensus procedure may consume prohibitive communication and computing resources, and thus may limit the application of blockchain in wireless conditions. As most existing consensus p…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: 13 pages, 22 figures

  33. arXiv:2211.09313  [pdf, ps, other]

    eess.AS cs.SD

    Unsupervised Model-based speaker adaptation of end-to-end lattice-free MMI model for speech recognition

    Authors: Xurong Xie, Xunying Liu, Hui Chen, Hongan Wang

    Abstract: Modeling the speaker variability is a key challenge for automatic speech recognition (ASR) systems. In this paper, the learning hidden unit contributions (LHUC) based adaptation techniques with compact speaker dependent (SD) parameters are used to facilitate both speaker adaptive training (SAT) and unsupervised test-time speaker adaptation for end-to-end (E2E) lattice-free MMI (LF-MMI) models. An…

    Submitted 6 January, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: 6 pages, 2 figures, submitted to ICASSP 2023

  34. arXiv:2211.01646  [pdf, other]

    eess.AS cs.SD

    Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition

    Authors: Zengrui Jin, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shujie Hu, Jiajun Deng, Guinan Li, Xunying Liu

    Abstract: Automatic recognition of disordered speech remains a highly challenging task to date. The underlying neuro-motor conditions, often compounded with co-occurring physical disabilities, lead to the difficulty in collecting large quantities of impaired speech required for ASR system development. This paper presents novel variational auto-encoder generative adversarial network (VAE-GAN) based personali…

    Submitted 19 March, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  35. arXiv:2210.11658  [pdf, other]

    eess.SP

    A New Approach to Extract Fetal Electrocardiogram Using Affine Combination of Adaptive Filters

    Authors: Yu Xuan, Xiangyu Zhang, Shuyue Stella Li, Zihan Shen, Xin Xie, Leibny Paola Garcia, Roberto Togneri

    Abstract: The detection of abnormal fetal heartbeats during pregnancy is important for monitoring the health conditions of the fetus. While adult ECG has made several advances in modern medicine, noninvasive fetal electrocardiography (FECG) remains a great challenge. In this paper, we introduce a new method based on affine combinations of adaptive filters to extract FECG signals. The affine combination of m…

    Submitted 26 February, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: 5 pages, 4 figures, 3 tables

  36. arXiv:2207.14631  [pdf, other]

    cs.IT eess.SP

    Phase Code Discovery for Pulse Compression Radar: A Genetic Algorithm Approach

    Authors: Xinyan Xie, Runxin Zhang, Yulin Shao, Lu Lu

    Abstract: Discovering sequences with desired properties has long been an interesting intellectual pursuit. In pulse compression radar (PCR), discovering phase codes with low aperiodic autocorrelations is essential for a good estimation performance. The design of phase code, however, is mathematically non-trivial as the aperiodic autocorrelation properties of a sequence are intractable to characterize. In th…

    Submitted 29 July, 2022; originally announced July 2022.

    Comments: Keywords: Genetic algorithm, pulse compression radar, phase code, mismatched receiver, signal-to-clutter ratio

  37. arXiv:2206.13801  [pdf, other]

    cs.IT eess.SP

    Joint Precoding for Active Intelligent Transmitting Surface Empowered Outdoor-to-Indoor Communication in mmWave Cellular Networks

    Authors: Xie Xie, Chen He, Feifei Gao, Zhu Han, Z. Jane Wang

    Abstract: Outdoor-to-indoor communications in millimeter-wave (mmWave) cellular networks have been one challenging research problem due to the severe attenuation and the high penetration loss caused by the propagation characteristics of mmWave signals. We propose a viable solution to implement the outdoor-to-indoor mmWave communication system with the aid of an active intelligent transmitting surface (activ…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: 30 pages, 8 figures

  38. arXiv:2206.12045  [pdf, other]

    eess.AS cs.SD

    Confidence Score Based Conformer Speaker Adaptation for Speech Recognition

    Authors: Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Mengzhe Geng, Guinan Li, Xunying Liu, Helen Meng

    Abstract: A key challenge for automatic speech recognition (ASR) systems is to model the speaker level variability. In this paper, compact speaker dependent learning hidden unit contributions (LHUC) are used to facilitate both speaker adaptive training (SAT) and test time unsupervised speaker adaptation for state-of-the-art Conformer based end-to-end ASR systems. The sensitivity during adaptation to supervi…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: It's accepted to INTERSPEECH 2022. arXiv admin note: text overlap with arXiv:2206.11596

  39. Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems

    Authors: Mingyu Cui, Jiajun Deng, Shoukang Hu, Xurong Xie, Tianzi Wang, Shujie Hu, Mengzhe Geng, Boyang Xue, Xunying Liu, Helen Meng

    Abstract: Fundamental modelling differences between hybrid and end-to-end (E2E) automatic speech recognition (ASR) systems create large diversity and complementarity among them. This paper investigates multi-pass rescoring and cross adaptation based system combination approaches for hybrid TDNN and Conformer E2E ASR systems. In multi-pass rescoring, state-of-the-art hybrid LF-MMI trained CNN-TDNN system fea…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: It's accepted to ISCA 2022

  40. arXiv:2206.07327  [pdf, other]

    eess.AS cs.AI

    Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition

    Authors: Shujie Hu, Xurong Xie, Mengzhe Geng, Mingyu Cui, Jiajun Deng, Guinan Li, Tianzi Wang, Xunying Liu, Helen Meng

    Abstract: Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems designed for normal speech. Their practical application to atypical task domains such as elderly and disordered speech across languages is often limited by the difficulty in collecting such specialist data from target speakers. This pa…

    Submitted 22 June, 2023; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: accepted by INTERSPEECH 2023

  41. arXiv:2205.14294  [pdf, other]

    eess.AS

    Deep Representation Decomposition for Rate-Invariant Speaker Verification

    Authors: Fuchuan Tong, Siqi Zheng, Haodong Zhou, Xingjia Xie, Qingyang Hong, Lin Li

    Abstract: While promising performance for speaker verification has been achieved by deep speaker embeddings, the advantage would reduce in the case of speaking-style variability. Speaking rate mismatch is often observed in practical speaker verification systems, which may actually degrade the system performance. To reduce intra-class discrepancy caused by speaking rate, we propose a deep representation deco…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: Accepted by Odyssey 2022

  42. arXiv:2205.08738  [pdf, other]

    cs.MM cs.CV eess.IV

    3D-VFD: A Victim-free Detector against 3D Adversarial Point Clouds

    Authors: Jiahao Zhu, Huajun Zhou, Zixuan Chen, Yi Zhou, Xiaohua Xie

    Abstract: 3D deep models consuming point clouds have achieved sound application effects in computer vision. However, recent studies have shown they are vulnerable to 3D adversarial point clouds. In this paper, we regard these malicious point clouds as 3D steganography examples and present a new perspective, 3D steganalysis, to counter such examples. Specifically, we propose 3D-VFD, a victim-free detector ag…

    Submitted 15 February, 2023; v1 submitted 18 May, 2022; originally announced May 2022.

    Comments: 6 pages, 13pages

  43. arXiv:2205.05675  [pdf, other]

    cs.CV eess.IV

    NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

    Authors: Yawei Li, Kai Zhang, Radu Timofte, Luc Van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, et al. (86 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of $\times$4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of e…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: Validation code of the baseline model is available at https://github.com/ofsoundof/IMDN. Validation of all submitted models is available at https://github.com/ofsoundof/NTIRE2022_ESR

  44. arXiv:2204.14154  [pdf, other]

    cs.IT eess.SP

    Outage Performance of Uplink Rate Splitting Multiple Access with Randomly Deployed Users

    Authors: Huabing Lu, Xianzhong Xie, Zhaoyuan Shi, Hongjian Lei, Nan Zhao, Jun Cai

    Abstract: With the rapid proliferation of smart devices in wireless networks, more powerful technologies are expected to fulfill the network requirements of high throughput, massive connectivity, and diversify quality of service. To this end, rate splitting multiple access (RSMA) is proposed as a promising solution to improve spectral efficiency and provide better fairness for the next-generation mobile net…

    Submitted 10 April, 2023; v1 submitted 29 April, 2022; originally announced April 2022.

    Comments: 38 pages, 8 figures

  45. arXiv:2203.14593  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition

    Authors: Mengzhe Geng, Xurong Xie, Rongfeng Su, Jianwei Yu, Zengrui Jin, Tianzi Wang, Shujie Hu, Zi Ye, Helen Meng, Xunying Liu

    Abstract: Accurate recognition of dysarthric and elderly speech remain challenging tasks to date. Speaker-level heterogeneity attributed to accent or gender, when aggregated with age and speech impairment, create large diversity among these speakers. Scarcity of speaker-level data limits the practical use of data-intensive model based speaker adaptation methods. To this end, this paper proposes two novel fo…

    Submitted 28 May, 2023; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: Accepted to INTERSPEECH 2023

  46. arXiv:2203.10274  [pdf, other]

    eess.AS cs.AI

    Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition

    Authors: Shujie Hu, Shansong Liu, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shoukang Hu, Mingyu Cui, Xunying Liu, Helen Meng

    Abstract: Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems for normal speech. Their practical application to disordered speech recognition is often limited by the difficulty in collecting such specialist data from impaired speakers. This paper presents a cross-domain acoustic-to-articulatory (…

    Submitted 19 March, 2022; originally announced March 2022.

    Comments: accepted by ICASSP 2022

  47. arXiv:2203.10095  [pdf, other]

    eess.IV cs.CV

    AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation

    Authors: Di You, Fenglin Liu, Shen Ge, Xiaoxia Xie, Jing Zhang, Xian Wu

    Abstract: Recently, medical report generation, which aims to automatically generate a long and coherent descriptive paragraph of a given medical image, has received growing research interests. Different from the general image captioning tasks, medical report generation is more challenging for data-driven neural models. This is mainly due to 1) the serious data bias: the normal visual regions dominate the da…

    Submitted 18 March, 2022; originally announced March 2022.

    Comments: Accepted by MICCAI 2021 (the 24th International Conference on Medical Image Computing and Computer Assisted Intervention)

  48. arXiv:2202.10290  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD q-bio.QM

    Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition

    Authors: Mengzhe Geng, Xurong Xie, Zi Ye, Tianzi Wang, Guinan Li, Shujie Hu, Xunying Liu, Helen Meng

    Abstract: Despite the rapid progress of automatic speech recognition (ASR) technologies targeting normal speech in recent decades, accurate recognition of dysarthric and elderly speech remains highly challenging tasks to date. Sources of heterogeneity commonly found in normal speech including accent or gender, when further compounded with the variability over age and speech pathology severity level, create…

    Submitted 17 March, 2022; v1 submitted 21 February, 2022; originally announced February 2022.

    Comments: In submission to IEEE/ACM Transactions on Audio Speech and Language Processing

  49. arXiv:2201.09685  [pdf, other]

    cs.IT eess.SP

    Robust Joint Design for Intelligent Reflecting Surfaces Assisted Cell-Free Networks

    Authors: Xie Xie, Chen He, Xiaoya Li, Zhu Han, Z. Jane Wang

    Abstract: Intelligent reflecting surfaces (IRSs) have emerged as a promising economical solution to implement cell-free networks. However, the performance gains achieved by IRSs critically depend on smartly tuned passive beamforming based on the assumption that the accurate channel state information (CSI) knowledge is available, which is practically impossible. Thus, in this paper, we investigate the impact…

    Submitted 20 February, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

    Comments: 30 pages

  50. arXiv:2201.09432  [pdf]

    eess.AS cs.SD

    Investigation of Deep Neural Network Acoustic Modelling Approaches for Low Resource Accented Mandarin Speech Recognition

    Authors: Xurong Xie, Xiang Sui, Xunying Liu, Lan Wang

    Abstract: The Mandarin Chinese language is known to be strongly influenced by a rich set of regional accents, while Mandarin speech with each accent is quite low resource. Hence, an important task in Mandarin speech recognition is to appropriately model the acoustic variabilities imposed by accents. In this paper, an investigation of implicit and explicit use of accent information on a range of deep neural…

    Submitted 14 June, 2024; v1 submitted 23 January, 2022; originally announced January 2022.

    Comments: Published in JOURNAL OF INTEGRATION TECHNOLOGY CNKI:SUN:JCJI.0.2015-06-003

    Journal ref: JOURNAL OF INTEGRATION TECHNOLOGY, Vol. 4, No. 6, Nov. 2015