Showing 1–50 of 203 results for author: Xu, H

Searching in archive eess.
  1. arXiv:2406.13413  [pdf, other]

    eess.IV cs.CV

    Recurrent Inference Machine for Medical Image Registration

    Authors: Yi Zhang, Yidong Zhao, Hui Xue, Peter Kellman, Stefan Klein, Qian Tao

    Abstract: Image registration is essential for medical image applications where alignment of voxels across multiple images is needed for qualitative or quantitative analysis. With recent advancements in deep neural networks and parallel computing, deep learning-based medical image registration methods become competitive with their flexible modelling and fast inference capabilities. However, compared to tradi…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Preprint

  2. arXiv:2406.11653  [pdf, other]

    eess.SY

    Communication-Efficient MARL for Platoon Stability and Energy-efficiency Co-optimization in Cooperative Adaptive Cruise Control of CAVs

    Authors: Min Hua, Dong Chen, Kun Jiang, Fanggang Zhang, Jinhai Wang, Bo Wang, Quan Zhou, Hongming Xu

    Abstract: Cooperative adaptive cruise control (CACC) has been recognized as a fundamental function of autonomous driving, in which platoon stability and energy efficiency are outstanding challenges that are difficult to accommodate in real-world operations. This paper studied the CACC of connected and autonomous vehicles (CAVs) based on the multi-agent reinforcement learning algorithm (MARL) to optimize pla…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2406.11446  [pdf, other]

    eess.SP

    Approximate Angular Domain Expression for Near-Field XL-MIMO Channel

    Authors: Hongbo Xing, Yuxiang Zhang, Jianhua Zhang, Huixin Xu, Guangyi Liu, Qixing Wang

    Abstract: As Extremely Large-Scale Multiple-Input-Multiple-Output (XL-MIMO) technology advances and frequency band rises, the near-field effects in communication are intensifying. A concise and accurate near-field XL-MIMO channel model serves as the cornerstone for investigating the near-field effects. However, existing angular domain XL-MIMO channel models under near-field conditions require non-closed-for…

    Submitted 17 June, 2024; originally announced June 2024.

  4. arXiv:2406.11265  [pdf, ps, other]

    eess.SY

    Balancing Performance and Cost for Two-Hop Cooperative Communications: Stackelberg Game and Distributed Multi-Agent Reinforcement Learning

    Authors: Yuanzhe Geng, Erwu Liu, Wei Ni, Rui Wang, Yan Liu, Hao Xu, Chen Cai, Abbas Jamalipour

    Abstract: This paper aims to balance performance and cost in a two-hop wireless cooperative communication network where the source and relays have contradictory optimization goals and make decisions in a distributed manner. This differs from most existing works that have typically assumed that source and relay nodes follow a schedule created implicitly by a central controller. We propose that the relays for…

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.10160  [pdf, other]

    cs.SD cs.AI eess.AS

    One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model

    Authors: Zhaoqing Li, Haoning Xu, Tianzi Wang, Shoukang Hu, Zengrui Jin, Shujie Hu, Jiajun Deng, Mingyu Cui, Mengzhe Geng, Xunying Liu

    Abstract: We propose a novel one-pass multiple ASR systems joint compression and quantization approach using an all-in-one neural model. A single compression cycle allows multiple nested systems with varying Encoder depths, widths, and quantization precision settings to be simultaneously constructed without the need to train and store individual target systems separately. Experiments consistently demonstrat…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  6. arXiv:2406.09656  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    RSEND: Retinex-based Squeeze and Excitation Network with Dark Region Detection for Efficient Low Light Image Enhancement

    Authors: Jingcheng Li, Ye Qiao, Haocheng Xu, Sitao Huang

    Abstract: Images captured under low-light scenarios often suffer from low quality. Previous CNN-based deep learning methods often involve using Retinex theory. Nevertheless, most of them cannot perform well in more complicated datasets like LOL-v2 while consuming too much computational resources. Besides, some of these methods require sophisticated training at different stages, making the procedure even mor…

    Submitted 13 June, 2024; originally announced June 2024.

  7. arXiv:2406.07256  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection

    Authors: Rong Gong, Hongfei Xue, Lezhi Wang, Xin Xu, Qisheng Li, Lei Xie, Hui Bu, Shaomei Wu, Jiaming Zhou, Yong Qin, Binbin Zhang, Jun Du, Jia Bin, Ming Li

    Abstract: The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the large…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  8. arXiv:2406.06643  [pdf]

    eess.IV

    Transforming Heart Chamber Imaging: Self-Supervised Learning for Whole Heart Reconstruction and Segmentation

    Authors: Abdul Qayyum, Hao Xu, Brian P. Halliday, Cristobal Rodero, Christopher W. Lanyon, Richard D. Wilkinson, Steven Alexander Niederer

    Abstract: Automated segmentation of Cardiac Magnetic Resonance (CMR) plays a pivotal role in efficiently assessing cardiac function, offering rapid clinical evaluations that benefit both healthcare practitioners and patients. While recent research has primarily focused on delineating structures in the short-axis orientation, less attention has been given to long-axis representations, mainly due to the compl…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2206.07349 by other authors

  9. arXiv:2406.06220  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Label-Looping: Highly Efficient Decoding for Transducers

    Authors: Vladimir Bataev, Hainan Xu, Daniel Galvez, Vitaly Lavrukhin, Boris Ginsburg

    Abstract: This paper introduces a highly efficient greedy decoding algorithm for Transducer inference. We propose a novel data structure using CUDA tensors to represent partial hypotheses in a batch that supports parallelized hypothesis manipulations. During decoding, our algorithm maximizes GPU parallelism by adopting a nested-loop design, where the inner loop consumes all blank predictions, while non-blan…

    Submitted 10 June, 2024; originally announced June 2024.

  10. arXiv:2405.17100  [pdf, other]

    cs.CR cs.SD eess.AS

    Sok: Comprehensive Security Overview, Challenges, and Future Directions of Voice-Controlled Systems

    Authors: Haozhe Xu, Cong Wu, Yangyang Gu, Xingcan Shang, Jing Chen, Kun He, Ruiying Du

    Abstract: The integration of Voice Control Systems (VCS) into smart devices and their growing presence in daily life accentuate the importance of their security. Current research has uncovered numerous vulnerabilities in VCS, presenting significant risks to user privacy and security. However, a cohesive and systematic examination of these vulnerabilities and the corresponding solutions is still absent. This…

    Submitted 27 May, 2024; originally announced May 2024.

  11. arXiv:2405.15831  [pdf, other]

    eess.SY cs.AI cs.LG

    Transmission Interface Power Flow Adjustment: A Deep Reinforcement Learning Approach based on Multi-task Attribution Map

    Authors: Shunyu Liu, Wei Luo, Yanzhen Zhou, Kaixuan Chen, Quan Zhang, Huating Xu, Qinglai Guo, Mingli Song

    Abstract: Transmission interface power flow adjustment is a critical measure to ensure the secure and economical operation of power systems. However, conventional model-based adjustment schemes are limited by the increasing variations and uncertainties occurring in power systems, where the adjustment problems of different transmission interfaces are often treated as several independent tasks, ignoring their coup…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Transactions on Power Systems

  12. arXiv:2405.15607  [pdf, other]

    eess.SP

    Channel Estimation and Reconstruction in Fluid Antenna System: Oversampling is Essential

    Authors: Wee Kiat New, Kai-Kit Wong, Hao Xu, Farshad Rostami Ghadi, Ross Murch, Chan-Byoung Chae

    Abstract: Fluid antenna system (FAS) has recently surfaced as a promising technology for the upcoming sixth generation (6G) wireless networks. Unlike traditional antenna system (TAS) with fixed antenna location, FAS introduces a flexible component where the radiating element can switch its position within a predefined space. This capability allows FAS to achieve additional diversity and multiplexing gains.…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 12 pages, 14 figures - including subfigures. Submitted for potential publication

  13. arXiv:2405.05715  [pdf, other]

    eess.SP

    Shifting the ISAC Trade-Off with Fluid Antenna Systems

    Authors: Jiaqi Zou, Hao Xu, Chao Wang, Lvxin Xu, Songlin Sun, Kaitao Meng, Christos Masouros, Kai-Kit Wong

    Abstract: As an emerging antenna technology, a fluid antenna system (FAS) enhances spatial diversity to improve both sensing and communication performance by shifting the active antennas among available ports. In this letter, we study the potential of shifting the integrated sensing and communication (ISAC) trade-off with FAS. We propose the model for FAS-enabled ISAC and jointly optimize the transmit beamf…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 5 pages, 5 figures

  14. arXiv:2405.02361  [pdf, other]

    eess.IV

    Technical report on target classification in SAR track

    Authors: Haonan Xu, Han Yinan, Haotian Si, Yang Yang

    Abstract: This report proposes a robust method for classifying oceanic and atmospheric phenomena using synthetic aperture radar (SAR) imagery. Our proposed method leverages the powerful pre-trained model Swin Transformer v2 Large as the backbone and employs carefully designed data augmentation and exponential moving average during training to enhance the model's generalization capability and stability. In t…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.06221, arXiv:2111.12797 by other authors

  15. arXiv:2405.02132  [pdf, other]

    cs.SD cs.CL eess.AS

    Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets

    Authors: Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie

    Abstract: Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks, and integrating LLMs with automatic speech recognition (ASR) is becoming a mainstream paradigm. Building upon this momentum, our research delves into an in-depth examination of this paradigm on a large open-source Chinese dataset. Specifically, our research aims to evaluate the impact of various configu…

    Submitted 6 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  16. arXiv:2404.19167  [pdf]

    eess.IV physics.med-ph

    Advancing low-field MRI with a universal denoising imaging transformer: Towards fast and high-quality imaging

    Authors: Zheren Zhu, Azaan Rehman, Xiaozhi Cao, Congyu Liao, Yoo Jin Lee, Michael Ohliger, Hui Xue, Yang Yang

    Abstract: Recent developments in low-field (LF) magnetic resonance imaging (MRI) systems present remarkable opportunities for affordable and widespread MRI access. A robust denoising method to overcome the intrinsic low signal-noise-ratio (SNR) barrier is critical to the success of LF MRI. However, current data-driven MRI denoising methods predominantly handle magnitude images and rely on customized models…

    Submitted 29 April, 2024; originally announced April 2024.

  17. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  18. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  19. arXiv:2404.04295  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition

    Authors: Hainan Xu, Zhehuai Chen, Fei Jia, Boris Ginsburg

    Abstract: This paper proposes Transducers with Pronunciation-aware Embeddings (PET). Unlike conventional Transducers where the decoder embeddings for different tokens are trained independently, the PET model's decoder embedding incorporates shared components for text tokens with the same or similar pronunciations. With experiments conducted in multiple datasets in Mandarin Chinese and Korean, we show that P…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: accepted at the ICASSP 2024 conference

  20. arXiv:2404.02384  [pdf]

    eess.IV

    Inline AI: Open-source Deep Learning Inference for Cardiac MR

    Authors: Hui Xue, Rhodri H Davies, James Howard, Hunain Shiwani, Azaan Rehman, Iain Pierce, Henry Procter, Marianna Fontana, James C Moon, Eylem Levelt, Peter Kellman

    Abstract: Cardiac Magnetic Resonance (CMR) is established as a non-invasive imaging technique for evaluation of heart function, anatomy, and myocardial tissue characterization. Quantitative biomarkers are central for diagnosis and management of heart disease. Deep learning (DL) is playing an ever more important role in extracting these quantitative measures from CMR images. While many researchers have repor…

    Submitted 2 April, 2024; originally announced April 2024.

  21. arXiv:2404.02382  [pdf]

    eess.IV

    Imaging transformer for MRI denoising with the SNR unit training: enabling generalization across field-strengths, imaging contrasts, and anatomy

    Authors: Hui Xue, Sarah Hooper, Azaan Rehman, Iain Pierce, Thomas Treibel, Rhodri Davies, W Patricia Bandettini, Rajiv Ramasawmy, Ahsan Javed, Zheren Zhu, Yang Yang, James Moon, Adrienne Campbell, Peter Kellman

    Abstract: The ability to recover MRI signal from noise is key to achieve fast acquisition, accurate quantification, and high image quality. Past work has shown convolutional neural networks can be used with abundant and paired low and high-SNR images for training. However, for applications where high-SNR data is difficult to produce at scale (e.g. with aggressive acceleration, high resolution, or low field…

    Submitted 2 April, 2024; originally announced April 2024.

  22. arXiv:2403.19983  [pdf, other]

    eess.IV cs.CV

    A multi-stage semi-supervised learning for ankle fracture classification on CT images

    Authors: Hongzhi Liu, Guicheng Li, Jiacheng Nie, Hui Tang, Chunfeng Yang, Qianjin Feng, Hailin Xu, Yang Chen

    Abstract: Because of the complicated mechanism of ankle injury, it is very difficult to diagnose ankle fracture in clinic. In order to simplify the process of fracture diagnosis, an automatic diagnosis model of ankle fracture was proposed. Firstly, a tibia-fibula segmentation network is proposed for the joint tibiofibular region of the ankle joint, and the corresponding segmentation dataset is established o…

    Submitted 29 March, 2024; originally announced March 2024.

  23. arXiv:2403.16665  [pdf, other]

    cs.DS cs.DM eess.SP stat.CO

    Adaptive Frequency Bin Interval in FFT via Dense Sampling Factor $α$

    Authors: Haichao Xu

    Abstract: The Fast Fourier Transform (FFT) is a fundamental tool for signal analysis, widely used across various fields. However, traditional FFT methods encounter challenges in adjusting the frequency bin interval, which may impede accurate spectral analysis. In this study, we propose a method for adjusting the frequency bin interval in FFT by introducing a parameter $α$. We elucidate the underlying princi…

    Submitted 26 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  24. arXiv:2403.16643  [pdf, other]

    eess.IV cs.CV

    Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution

    Authors: Qingping Zheng, Ling Zheng, Yuanfan Guo, Ying Li, Songcen Xu, Jiankang Deng, Hang Xu

    Abstract: Artifact-free super-resolution (SR) aims to translate low-resolution images into their high-resolution counterparts with a strict integrity of the original content, eliminating any distortions or synthetic details. While traditional diffusion-based SR techniques have demonstrated remarkable abilities to enhance image detail, they are prone to artifact introduction during iterative procedures. Such…

    Submitted 25 March, 2024; originally announced March 2024.

  25. arXiv:2403.16397  [pdf, other]

    eess.SP cs.AI

    RadioGAT: A Joint Model-based and Data-driven Framework for Multi-band Radiomap Reconstruction via Graph Attention Networks

    Authors: Xiaojie Li, Songyang Zhang, Hang Li, Xiaoyang Li, Lexi Xu, Haigao Xu, Hui Mei, Guangxu Zhu, Nan Qi, Ming Xiao

    Abstract: Multi-band radiomap reconstruction (MB-RMR) is a key component in wireless communications for tasks such as spectrum management and network planning. However, traditional machine-learning-based MB-RMR methods, which rely heavily on simulated data or complete structured ground truth, face significant deployment challenges. These challenges stem from the differences between simulated and actual data…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: submitted to IEEE journal for possible publication

  26. arXiv:2403.13332  [pdf, other]

    eess.AS cs.SD

    TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer

    Authors: Yu Xi, Hao Li, Baochen Yang, Haoyu Li, Hainan Xu, Kai Yu

    Abstract: Designing an efficient keyword spotting (KWS) system that delivers exceptional performance on resource-constrained edge devices has long been a subject of significant attention. Existing KWS search algorithms typically follow a frame-synchronous approach, where search decisions are made repeatedly at each frame despite the fact that most frames are keyword-irrelevant. In this paper, we propose TDT…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted by ICASSP2024

  27. arXiv:2402.16116  [pdf, other]

    cs.IT eess.SP

    On Performance of RIS-Aided Fluid Antenna Systems

    Authors: Farshad Rostami Ghadi, Kai-Kit Wong, Wee Kiat New, Hao Xu, Ross Murch, Yangyang Zhang

    Abstract: This letter studies the performance of reconfigurable intelligent surface (RIS)-aided communications for a fluid antenna system (FAS) enabled receiver. Specifically, a fixed single-antenna base station (BS) transmits information through a RIS to a mobile user (MU) which is equipped with a planar fluid antenna in the absence of a direct link. We first analyze the spatial correlation structures among…

    Submitted 25 February, 2024; originally announced February 2024.

  28. arXiv:2402.06903  [pdf, other]

    eess.SY math.DS

    High-Performance Distributed Control for Large-Scale Linear Systems: A Partitioned Distributed Observer Approach

    Authors: Haotian Xu, Shuai Liu, Ling Shi

    Abstract: In recent years, the distributed-observer-based distributed control law has shown powerful ability to arbitrarily approximate the centralized control performance. However, the traditional distributed observer requires each local observer to reconstruct the state information of the whole system, which is unrealistic for large-scale scenarios. To fill this gap, this paper develops a greedy-idea-base…

    Submitted 10 February, 2024; originally announced February 2024.

  29. arXiv:2402.05722  [pdf, other]

    cs.IT eess.SP

    Physical Layer Security over Fluid Antenna Systems

    Authors: Farshad Rostami Ghadi, Kai-Kit Wong, F. Javier Lopez-Martinez, Wee Kiat New, Hao Xu, Chan-Byoung Chae

    Abstract: This paper investigates the performance of physical layer security (PLS) in fluid antenna-aided communication systems under arbitrary correlated fading channels. In particular, it is considered that a single fixed-antenna transmitter aims to send confidential information to a legitimate receiver equipped with a planar fluid antenna system (FAS), while an eavesdropper, also taking advantage of a pl…

    Submitted 8 February, 2024; originally announced February 2024.

  30. arXiv:2402.01172  [pdf, other]

    cs.CL cs.SD eess.AS

    Streaming Sequence Transduction through Dynamic Compression

    Authors: Weiting Tan, Yunmo Chen, Tongfei Chen, Guanghui Qin, Haoran Xu, Heidi C. Zhang, Benjamin Van Durme, Philipp Koehn

    Abstract: We introduce STAR (Stream Transduction with Anchor Representations), a novel Transformer-based model designed for efficient sequence-to-sequence transduction over streams. STAR dynamically segments input streams to create compressed anchor representations, achieving nearly lossless compression (12x) in Automatic Speech Recognition (ASR) and outperforming existing methods. Moreover, STAR demonstrat…

    Submitted 2 February, 2024; originally announced February 2024.

  31. arXiv:2401.06798  [pdf]

    q-bio.NC eess.IV

    Evaluation of Mean Shift, ComBat, and CycleGAN for Harmonizing Brain Connectivity Matrices Across Sites

    Authors: Hanliang Xu, Nancy R. Newlin, Michael E. Kim, Chenyu Gao, Praitayini Kanakaraj, Aravind R. Krishnan, Lucas W. Remedios, Nazirah Mohd Khairi, Kimberly Pechman, Derek Archer, Timothy J. Hohman, Angela L. Jefferson, The BIOCARD Study Team, Ivana Isgum, Yuankai Huo, Daniel Moyer, Kurt G. Schilling, Bennett A. Landman

    Abstract: Connectivity matrices derived from diffusion MRI (dMRI) provide an interpretable and generalizable way of understanding the human brain connectome. However, dMRI suffers from inter-site and between-scanner variation, which impedes analysis across datasets to improve robustness and reproducibility of results. To evaluate different harmonization approaches on connectivity matrices, we compared graph…

    Submitted 24 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: 11 pages, 5 figures, to be published in SPIE Medical Imaging 2024: Image Processing

  32. arXiv:2401.03476  [pdf, other]

    cs.MM cs.AI cs.HC cs.SD eess.AS

    Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness

    Authors: Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, Mingming Gong, Zhiyong Wu

    Abstract: Current talking avatars mostly generate co-speech gestures based on audio and text of the utterance, without considering the non-speaking motion of the speaker. Furthermore, previous works on co-speech gesture generation have designed network structures based on individual gesture datasets, which results in limited data volume, compromised generalizability, and restricted speaker movements. To tac…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: 6 pages, 3 figures, ICASSP 2024

  33. arXiv:2401.00662  [pdf, other]

    cs.SD eess.AS

    Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition using Adversarial Data Augmentation

    Authors: Huimeng Wang, Zengrui Jin, Mengzhe Geng, Shujie Hu, Guinan Li, Tianzi Wang, Haoning Xu, Xunying Liu

    Abstract: Automatic recognition of dysarthric speech remains a highly challenging task to date. Neuro-motor conditions and co-occurring physical disabilities create difficulty in large-scale data collection for ASR system development. Adapting SSL pre-trained ASR models to limited dysarthric speech via data-intensive parameter fine-tuning leads to poor generalization. To this end, this paper presents an ext…

    Submitted 31 December, 2023; originally announced January 2024.

    Comments: To appear at IEEE ICASSP 2024

  34. arXiv:2401.00475  [pdf, other]

    cs.SD eess.AS

    E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models

    Authors: Hongfei Xue, Yuhao Liang, Bingshen Mu, Shiliang Zhang, Mengzhe Chen, Qian Chen, Lei Xie

    Abstract: This study focuses on emotion-sensitive spoken dialogue in human-machine speech interaction. With the advancement of Large Language Models (LLMs), dialogue systems can handle multimodal data, including audio. Recent models have enhanced the understanding of complex audio signals through the integration of various audio events. However, they are unable to generate appropriate responses based on emo…

    Submitted 6 January, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    Comments: 6 pages, 3 figures

  35. arXiv:2312.17444  [pdf, other]

    cs.ET eess.SP

    Reconfigurable Frequency Multipliers Based on Complementary Ferroelectric Transistors

    Authors: Haotian Xu, Jianyi Yang, Cheng Zhuo, Thomas Kämpfe, Kai Ni, Xunzhao Yin

    Abstract: Frequency multipliers, a class of essential electronic components, play a pivotal role in contemporary signal processing and communication systems. They serve as crucial building blocks for generating high-frequency signals by multiplying the frequency of an input signal. However, traditional frequency multipliers that rely on nonlinear devices often require energy- and area-consuming filtering an…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: 6 pages, 8 figures, 1 table. Accepted by Design Automation and Test in Europe (DATE) 2024

  36. arXiv:2312.15267  [pdf, other]

    eess.SP math.NA

    A new type of window functions constructed with exponential function

    Authors: Haichao Xu, Xingpao Suo

    Abstract: The Discrete Fourier Transform (DFT) is widely utilized for signal analysis but is plagued by spectral leakage, leading to inaccuracies in signal approximation. Window functions play a crucial role in mitigating spectral leakage by providing weighting mechanisms for discrete signals. In this paper, we introduce a novel window type based on exponential function, allowing for adjustable parameters a…

    Submitted 23 December, 2023; originally announced December 2023.

  37. arXiv:2312.12644  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Rotational Augmented Noise2Inverse for Low-dose Computed Tomography Reconstruction

    Authors: Hang Xu, Alessandro Perelli

    Abstract: In this work, we present a novel self-supervised method for Low Dose Computed Tomography (LDCT) reconstruction. Reducing the radiation dose to patients during a CT scan is a crucial challenge since the quality of the reconstruction highly degrades because of low photons or limited measurements. Supervised deep learning methods have shown the ability to remove noise in images but require accurate g…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 14 pages, 12 figures, accepted manuscript in IEEE Transactions on Radiation and Plasma Medical Sciences

    MSC Class: 92C55; 94A08 ACM Class: I.4.5; J.3

  38. Fluid Antenna-Assisted MIMO Transmission Exploiting Statistical CSI

    Authors: Yuqi Ye, Li You, Jue Wang, Hao Xu, Kai-Kit Wong, Xiqi Gao

    Abstract: In conventional multiple-input multiple-output (MIMO) communication systems, the positions of antennas are fixed. To take full advantage of spatial degrees of freedom, a new technology called fluid antenna (FA) is proposed to obtain higher achievable rate and diversity gain. Most existing works on FA exploit instantaneous channel state information (CSI). However, in FA-assisted systems, it is diff…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: to appear in IEEE Communications Letters

    Journal ref: IEEE Communications Letters, vol. 28, no. 1, pp. 223-227, Jan. 2024

  39. arXiv:2312.04371  [pdf, other]

    math.OC cs.LG cs.MA eess.SY

    A Scalable Network-Aware Multi-Agent Reinforcement Learning Framework for Decentralized Inverter-based Voltage Control

    Authors: Han Xu, Jialin Zheng, Guannan Qu

    Abstract: This paper addresses the challenges associated with decentralized voltage control in power grids due to an increase in distributed generations (DGs). Traditional model-based voltage control methods struggle with the rapid energy fluctuations and uncertainties of these DGs. While multi-agent reinforcement learning (MARL) has shown potential for decentralized secondary control, scalability issues ar…

    Submitted 7 December, 2023; originally announced December 2023.

  40. arXiv:2311.11041  [pdf, other]

    cs.IT eess.SP

    Channel Estimation for FAS-assisted Multiuser mmWave Systems

    Authors: Hao Xu, Gui Zhou, Kai-Kit Wong, Wee Kiat New, Chao Wang, Chan-Byoung Chae, Ross Murch, Shi Jin, Yangyang Zhang

    Abstract: This letter investigates the challenge of channel estimation in a multiuser millimeter-wave (mmWave) time-division duplexing (TDD) system. In this system, the base station (BS) employs a multi-antenna uniform linear array (ULA), while each mobile user is equipped with a fluid antenna system (FAS). Accurate channel state information (CSI) plays a crucial role in the precise placement of antennas in…

    Submitted 3 January, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

    Comments: 6 pages, 4 figures

  41. arXiv:2311.07036  [pdf]

    eess.SY

    An Event-Based Synchronization Framework for Controller Hardware-in-the-loop Simulation of Electric Railway Power Electronics Systems

    Authors: Jialin Zheng, Yangbin Zeng, Han Xu, Weicheng Liu, Di Mou, Zhengming Zhao

    Abstract: The Controller Hardware-in-the-loop (CHIL) simulation is gaining popularity as a cost-effective, efficient, and reliable tool in the design and development process of fast-growing electrified transportation power converters. However, it is challenging to implement the conventional CHIL simulations on the railway power converters with complex topologies and high switching frequencies due to strict…

    Submitted 12 November, 2023; originally announced November 2023.

  42. arXiv:2311.07029  [pdf]

    eess.SY

    Accurate Time-segmented Loss Model for SiC MOSFETs in Electro-thermal Multi-Rate Simulation

    Authors: Jialin Zheng, Zhengming Zhao, Han Xu, Weicheng Liu, Yangbin Zeng

    Abstract: Compared with silicon (Si) power devices, silicon carbide (SiC) devices have the advantages of fast switching speed and low on-resistance. However, the effects of non-ideal characteristics of SiC MOSFETs and stray parameters (especially parasitic inductance) on switching losses need to be further evaluated. In this paper, a transient loss model based on SiC MOSFET and SiC Schottky barrier diode (S…

    Submitted 12 November, 2023; originally announced November 2023.

  43. arXiv:2311.06501  [pdf, ps, other]

    eess.SP

    Sum-Rate Optimization for RIS-Aided Multiuser Communications with Movable Antenna

    Authors: Yunan Sun, Hao Xu, Chongjun Ouyang, Hongwen Yang

    Abstract: Reconfigurable intelligent surface (RIS) is known as a promising technology to improve the performance of wireless communication networks, which has been extensively studied. Movable antenna (MA) is a novel technology that fully exploits the antenna position for enhancing the channel capacity. In this paper, we propose a new RIS-aided multiuser communication system with MAs. The sum-rate is maximi…

    Submitted 11 November, 2023; originally announced November 2023.

    Comments: 5 pages

  44. FPGA-Based Implicit-Explicit Real-time Simulation Solver for Railway Wireless Power Transfer with Nonlinear Magnetic Coupling Components

    Authors: Han Xu, Yangbin Zeng, Jialin Zheng, Kainan Chen, Weicheng Liu, Zhengming Zhao

    Abstract: Railway Wireless Power Transfer (WPT) is a promising non-contact power supply solution, but constructing prototypes for controller testing can be both costly and unsafe. Real-time hardware-in-the-loop simulation is an effective and secure testing tool, but simulating the dynamic charging process of railway WPT systems is challenging due to the continuous changes in the nonlinear magnetic coupling…

    Submitted 27 October, 2023; originally announced October 2023.

  45. Numerical Derivative-based Flexible Integration Algorithm for Power Electronic Systems Simulation Considering Nonlinear Components

    Authors: Han Xu, Bochen Shi, Zhujun Yu, Jialin Zheng, Zhengming Zhao

    Abstract: Simulation is an efficient tool in the design and control of power electronic systems. However, quick and accurate simulation of them is still challenging, especially when the system contains a large number of switches and state variables. Conventional general-purpose integration algorithms assume nonlinearity within systems but face inefficiency in handling the piecewise characteristics of power…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: 10 pages, 8 figures

  46. arXiv:2310.08981  [pdf, other]

    cs.SD cs.MM eess.AS

    Low-latency Speech Enhancement via Speech Token Generation

    Authors: Huaying Xue, Xiulian Peng, Yan Lu

    Abstract: Existing deep learning based speech enhancement methods mainly employ a data-driven approach, which leverages large amounts of data with a variety of noise types to remove noise from noisy signals. However, the high dependence on the data limits their generalization to unseen complex noises in real-life environments. In this paper, we focus on the low-latency scenario and regard speech enhancement…

    Submitted 23 January, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: 5 pages, ICASSP 2024 (accepted)

  47. arXiv:2310.02629  [pdf, other]

    cs.SD eess.AS

    BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition

    Authors: Peikun Chen, Fan Yu, Yuhao Lian, Hongfei Xue, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie

    Abstract: Mixture-of-experts based models, which use language experts to extract language-specific representations effectively, have been well applied in code-switching automatic speech recognition. However, there is still substantial room for improvement, as similar pronunciation across languages may result in ineffective multi-language modeling and inaccurate language boundary estimation. To eliminate these dr…

    Submitted 7 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted by ASRU2023

  48. arXiv:2309.16937  [pdf, other]

    cs.CL cs.SD eess.AS

    SSHR: Leveraging Self-supervised Hierarchical Representations for Multilingual Automatic Speech Recognition

    Authors: Hongfei Xue, Qijie Shao, Kaixun Huang, Peikun Chen, Jie Liu, Lei Xie

    Abstract: Multilingual automatic speech recognition (ASR) systems have garnered attention for their potential to extend language coverage globally. While self-supervised learning (SSL) models, like MMS, have demonstrated their effectiveness in multilingual ASR, it is worth noting that various layers' representations potentially contain distinct information that has not been fully leveraged. In this study, w…

    Submitted 27 April, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: 5 pages, 2 figures. Accepted by ICME 2024

  49. arXiv:2309.15796  [pdf, other]

    eess.AS cs.CL cs.LG

    Learning from Flawed Data: Weakly Supervised Automatic Speech Recognition

    Authors: Dongji Gao, Hainan Xu, Desh Raj, Leibny Paola Garcia Perera, Daniel Povey, Sanjeev Khudanpur

    Abstract: Training automatic speech recognition (ASR) systems requires large amounts of well-curated paired data. However, human annotators usually perform "non-verbatim" transcription, which can result in poorly trained models. In this paper, we propose Omni-temporal Classification (OTC), a novel training criterion that explicitly incorporates label uncertainties originating from such weak supervision. Thi…

    Submitted 26 September, 2023; originally announced September 2023.

  50. arXiv:2309.10716  [pdf, other]

    cs.RO eess.SY

    Learning Model Predictive Control with Error Dynamics Regression for Autonomous Racing

    Authors: Haoru Xue, Edward L. Zhu, John M. Dolan, Francesco Borrelli

    Abstract: This work presents a novel Learning Model Predictive Control (LMPC) strategy for autonomous racing at the handling limit that can iteratively explore and learn unknown dynamics in high-speed operational domains. We start from existing LMPC formulations and modify the system dynamics learning method. In particular, our approach uses a nominal, global, nonlinear, physics-based model with a local, li…

    Submitted 7 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Accepted by ICRA 2024