Showing 1–50 of 167 results for author: Zhu, H

  1. arXiv:2406.11401  [pdf, other]

    eess.AS

    An Exploration of Length Generalization in Transformer-Based Speech Enhancement

    Authors: Qiquan Zhang, Hongxu Zhu, Xinyuan Qian, Eliathamby Ambikairajah, Haizhou Li

    Abstract: The use of Transformer architectures has facilitated remarkable progress in speech enhancement. Training Transformers using substantially long speech utterances is often infeasible as self-attention suffers from quadratic complexity. It is a critical and unexplored challenge for a Transformer-based speech enhancement model to learn from short speech utterances and generalize to longer ones. In thi…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  2. arXiv:2406.11336  [pdf, other]

    eess.SY

    LFPLM: A General and Flexible Load Forecasting Framework based on Pre-trained Language Model

    Authors: Mingyang Gao, Suyang Zhou, Wei Gu, Zhi Wu, Zijian Hu, Hong Zhu, Haiquan Liu

    Abstract: Accurate load forecasting is essential for maintaining the power balance between generators and consumers, especially with the increasing integration of renewable energy sources, which introduce significant intermittent volatility. With the development of data-driven methods, machine learning and deep learning-based models have become the predominant approach for load forecasting tasks. In recent…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures and 5 tables

  3. arXiv:2406.10897  [pdf, ps, other]

    eess.SP

    When NOMA Meets AIGC: Enhanced Wireless Federated Learning

    Authors: Ding Xu, Lingjie Duan, Hongbo Zhu

    Abstract: Wireless federated learning (WFL) enables devices to collaboratively train a global model via local model training, uploading and aggregating. However, WFL faces the data scarcity/heterogeneity problem (i.e., data are limited and unevenly distributed among devices) that degrades the learning performance. In this regard, artificial intelligence generated content (AIGC) can synthesize various types…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 13 pages, submitted to IEEE TWC for possible publication

  4. arXiv:2406.10895  [pdf, ps, other]

    eess.SP

    Fair Computation Offloading for RSMA-Assisted Mobile Edge Computing Networks

    Authors: Ding Xu, Lingjie Duan, Haitao Zhao, Hongbo Zhu

    Abstract: Rate splitting multiple access (RSMA) provides a flexible transmission framework that can be applied in mobile edge computing (MEC) systems. However, the research work on RSMA-assisted MEC systems is still in its infancy and many design issues remain unsolved, such as the MEC server and channel allocation problem in general multi-server and multi-channel scenarios as well as the user fairness issu…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 13 pages, submitted to IEEE TWC for possible publication

  5. arXiv:2405.19298  [pdf, other]

    cs.CV eess.IV

    Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare

    Authors: Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, Shiqi Wang

    Abstract: While recent advancements in large multimodal models (LMMs) have significantly improved their abilities in image quality assessment (IQA) relying on absolute quality rating, how to transfer reliable relative quality comparison outputs to continuous perceptual quality scores remains largely unexplored. To address this gap, we introduce Compare2Score-an all-around LMM-based no-reference IQA (NR-IQA)…

    Submitted 29 May, 2024; originally announced May 2024.

  6. arXiv:2405.17496  [pdf, other]

    eess.IV

    UU-Mamba: Uncertainty-aware U-Mamba for Cardiac Image Segmentation

    Authors: Ting Yu Tsai, Li Lin, Shu Hu, Ming-Ching Chang, Hongtu Zhu, Xin Wang

    Abstract: Biomedical image segmentation is critical for accurate identification and analysis of anatomical structures in medical imaging, particularly in cardiac MRI. Manual segmentation is labor-intensive, time-consuming, and prone to errors, highlighting the need for automated methods. However, current machine learning approaches face challenges like overfitting and data demands. To tackle these issues, w…

    Submitted 4 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  7. arXiv:2405.05521  [pdf, other]

    cs.LG eess.SY

    Machine Learning for Scalable and Optimal Load Shedding Under Power System Contingency

    Authors: Yuqi Zhou, Hao Zhu

    Abstract: Prompt and effective corrective actions in response to unexpected contingencies are crucial for improving power system resilience and preventing cascading blackouts. The optimal load shedding (OLS) accounting for network limits has the potential to address the diverse system-wide impacts of contingency scenarios as compared to traditional local schemes. However, due to the fast cascading propagati…

    Submitted 8 May, 2024; originally announced May 2024.

  8. Optimal Structure of Receive Beamforming for Over-the-Air Computation

    Authors: Hongbin Zhu, Hua Qian

    Abstract: We investigate fast data aggregation via over-the-air computation (AirComp) over wireless networks. In this scenario, an access point (AP) with multiple antennas aims to recover the arithmetic mean of sensory data from multiple wireless devices. To minimize estimation distortion, we formulate a mean-squared-error (MSE) minimization problem that considers joint optimization of transmit scalars at w…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Published on IEEE ICASSP 2024

  9. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Ya**g Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo , et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  10. arXiv:2403.16150  [pdf, other]

    eess.SP

    Fusion of Active and Passive Measurements for Robust and Scalable Positioning

    Authors: Hong Zhu, Alexander Venus, Erik Leitinger, Stefan Tertinek, Klaus Witrisal

    Abstract: This paper addresses the challenge of achieving reliable and robust positioning of a mobile agent, such as a radio device carried by a person, in scenarios where direct line-of-sight (LOS) links are obstructed or unavailable. The human body is considered as an extended object that scatters, attenuates and blocks the radio signals. We propose a novel particle-based sum-product algorithm (SPA) that…

    Submitted 24 March, 2024; originally announced March 2024.

  11. arXiv:2403.10815  [pdf, other]

    eess.IV cs.CV

    MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections

    Authors: Mude Hui, Zihao Wei, Hongru Zhu, Fei Xia, Yuyin Zhou

    Abstract: Volumetric optical microscopy using non-diffracting beams enables rapid imaging of 3D volumes by projecting them axially to 2D images but lacks crucial depth information. Addressing this, we introduce MicroDiffusion, a pioneering tool facilitating high-quality, depth-resolved 3D volume reconstruction from limited 2D projections. While existing Implicit Neural Representation (INR) models often yiel…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  12. arXiv:2403.08247  [pdf, other]

    eess.IV cs.CV

    A Dual-domain Regularization Method for Ring Artifact Removal of X-ray CT

    Authors: Hongyang Zhu, Xin Lu, Yanwei Qin, Xinran Yu, Tianjiao Sun, Yunsong Zhao

    Abstract: Ring artifacts in computed tomography images, arising from the undesirable responses of detector units, significantly degrade image quality and diagnostic reliability. To address this challenge, we propose a dual-domain regularization model to effectively remove ring artifacts, while maintaining the integrity of the original CT image. The proposed model corrects the vertical stripe artifacts on th…

    Submitted 14 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  13. arXiv:2403.05912  [pdf, other]

    eess.IV cs.CV

    Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation

    Authors: Hairong Shi, Songhao Han, Shaofei Huang, Yue Liao, Guanbin Li, Xiangxing Kong, Hua Zhu, Xiaomu Wang, Si Liu

    Abstract: Tumor lesion segmentation on CT or MRI images plays a critical role in cancer diagnosis and treatment planning. Considering the inherent differences in tumor lesion segmentation data across various medical imaging modalities and equipment, integrating medical knowledge into the Segment Anything Model (SAM) presents promising capability due to its versatility and generalization potential. Recent st…

    Submitted 9 March, 2024; originally announced March 2024.

  14. arXiv:2402.19387  [pdf, other]

    eess.IV cs.CV

    SeD: Semantic-Aware Discriminator for Image Super-Resolution

    Authors: Bingchen Li, Xin Li, Hanxin Zhu, Yeying **, Ruoyu Feng, Zhizheng Zhang, Zhibo Chen

    Abstract: Generative Adversarial Networks (GANs) have been widely used to recover vivid textures in image super-resolution (SR) tasks. In particular, one discriminator is utilized to enable the SR network to learn the distribution of real-world high-quality images in an adversarial training manner. However, the distribution learning is overly coarse-grained, which is susceptible to virtual textures and caus…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: CVPR2024

  15. arXiv:2402.17797  [pdf, other]

    eess.IV cs.CV

    Neural Radiance Fields in Medical Imaging: Challenges and Next Steps

    Authors: Xin Wang, Shu Hu, Heng Fan, Hongtu Zhu, Xin Li

    Abstract: Neural Radiance Fields (NeRF), as a pioneering technique in computer vision, offer great potential to revolutionize medical imaging by synthesizing three-dimensional representations from the projected two-dimensional image data. However, they face unique challenges when applied to medical applications. This paper presents a comprehensive examination of applications of NeRFs in medical imaging, hig…

    Submitted 21 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  16. arXiv:2402.14025  [pdf, other]

    eess.SP

    Spectral Efficiency Maximization for Active RIS-aided Cell-Free Massive MIMO Systems with Imperfect CSI

    Authors: Mahdi Eskandari, Huiling Zhu, Jiangzhou Wang

    Abstract: A cell-free network merged with active reconfigurable reflecting surfaces (RIS) is investigated in this paper. Based on the imperfect channel state information (CSI), the aggregated channel from the user to the access point (AP) is initially estimated using the linear minimum mean square error (LMMSE) technique. The central processing unit (CPU) then detects uplink data from individual users throu…

    Submitted 11 February, 2024; originally announced February 2024.

  17. arXiv:2402.11294  [pdf, other]

    cs.IT eess.SP

    Power Optimization for Integrated Active and Passive Sensing in DFRC Systems

    Authors: Xingliang Lou, Wenchao Xia, Kai-Kit Wong, Haitao Zhao, Tony Q. S. Quek, Hongbo Zhu

    Abstract: Most existing works on dual-function radar-communication (DFRC) systems mainly focus on active sensing, but ignore passive sensing. To leverage multi-static sensing capability, we explore integrated active and passive sensing (IAPS) in DFRC systems to remedy sensing performance. The multi-antenna base station (BS) is responsible for communication and active sensing by transmitting signals to user…

    Submitted 17 February, 2024; originally announced February 2024.

  18. arXiv:2402.06875  [pdf, other]

    eess.IV cs.CV

    Disentangled Latent Energy-Based Style Translation: An Image-Level Structural MRI Harmonization Framework

    Authors: Mengqi Wu, Lintao Zhang, Pew-Thian Yap, Hongtu Zhu, Mingxia Liu

    Abstract: Brain magnetic resonance imaging (MRI) has been extensively employed across clinical and research fields, but often exhibits sensitivity to site effects arising from non-biological variations such as differences in field strength and scanner vendors. Numerous retrospective MRI harmonization techniques have demonstrated encouraging outcomes in reducing the site effects at the image level. However,…

    Submitted 29 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

  19. arXiv:2402.03394  [pdf, other]

    eess.IV

    Artificial Intelligence in Image-based Cardiovascular Disease Analysis: A Comprehensive Survey and Future Outlook

    Authors: Xin Wang, Hongtu Zhu

    Abstract: Recent advancements in Artificial Intelligence (AI) have significantly influenced the field of Cardiovascular Disease (CVD) analysis, particularly in image-based diagnostics. Our paper presents an extensive review of AI applications in image-based CVD analysis, offering insights into its current state and future potential. We systematically categorize the literature based on the primary anatomical…

    Submitted 22 June, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  20. arXiv:2402.02735  [pdf, other]

    eess.SY

    Timed-Elastic-Band Based Variable Splitting for Autonomous Trajectory Planning

    Authors: Hao Zhu, Kefan **, Rui Gao, Jialin Wang, C. -J. Richard Shi

    Abstract: Existing trajectory planning methods are struggling to handle the issue of autonomous track swinging during navigation, resulting in significant errors when reaching the destination. In this article, we address autonomous trajectory planning problems, which aims at developing innovative solutions to enhance the adaptability and robustness of unmanned systems in navigating complex and dynamic envir…

    Submitted 5 February, 2024; originally announced February 2024.

  21. arXiv:2401.09686  [pdf, other]

    eess.AS cs.SD

    An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement

    Authors: Qiquan Zhang, Meng Ge, Hongxu Zhu, Eliathamby Ambikairajah, Qi Song, Zhaoheng Ni, Haizhou Li

    Abstract: Transformer architecture has enabled recent progress in speech enhancement. Since Transformers are position-agnostic, positional encoding is the de facto standard component used to enable Transformers to distinguish the order of elements in a sequence. However, it remains unclear how positional encoding exactly impacts speech enhancement based on Transformer architectures. In this paper, we perform…

    Submitted 13 February, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  22. arXiv:2401.04965  [pdf]

    eess.SP cs.LG

    ConvConcatNet: a deep convolutional neural network to reconstruct mel spectrogram from the EEG

    Authors: Xiran Xu, Bo Wang, Yujie Yan, Haolin Zhu, Zechen Zhang, Xihong Wu, **g Chen

    Abstract: To investigate the processing of speech in the brain, simple linear models are commonly used to establish a relationship between brain signals and speech features. However, these linear models are ill-equipped to model a highly dynamic and complex non-linear system like the brain. Although non-linear methods with neural networks have been developed recently, reconstructing unseen stimuli from unse…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: 2 pages, 1 figure, 2 tables

  23. arXiv:2401.04964  [pdf]

    eess.SP cs.SD eess.AS

    Self-supervised speech representation and contextual text embedding for match-mismatch classification with EEG recording

    Authors: Bo Wang, Xiran Xu, Zechen Zhang, Haolin Zhu, YuJie Yan, Xihong Wu, **g Chen

    Abstract: Relating speech to EEG holds considerable importance but is challenging. In this study, a deep convolutional network was employed to extract spatiotemporal features from EEG data. Self-supervised speech representation and contextual text embedding were used as speech features. Contrastive learning was used to relate EEG features to speech features. The experimental results demonstrate the benefits…

    Submitted 31 January, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

    Comments: 2 pages, 2 figures, accepted by ICASSP 2024

  24. arXiv:2311.09857  [pdf]

    physics.optics eess.SP physics.app-ph

    Integrated lithium niobate photonic millimeter-wave radar

    Authors: Sha Zhu, Yiwen Zhang, Jiaxue Feng, Yongji Wang, Kunpeng Zhai, Hanke Feng, Edwin Yue Bun Pun, Ning Hua Zhu, Cheng Wang

    Abstract: Millimeter-wave (mmWave, >30 GHz) radars are the key enabler in the coming 6G era for high-resolution sensing and detection of targets. Photonic radar provides an effective approach to overcome the limitations of electronic radars thanks to the high frequency, broad bandwidth, and excellent reconfigurability of photonic systems. However, conventional photonic radars are mostly realized in tabletop…

    Submitted 16 November, 2023; originally announced November 2023.

  25. arXiv:2310.09843  [pdf]

    cs.SD cs.AI eess.AS

    CoCoFormer: A controllable feature-rich polyphonic music generation method

    Authors: Jiuyang Zhou, Tengfei Niu, Hong Zhu, ** Wang

    Abstract: This paper explores the modeling method of polyphonic music sequence. Due to the great potential of Transformer models in music generation, controllable music generation is receiving more attention. In the task of polyphonic music, current controllable generation research focuses on controlling the generation of chords, but lacks precise adjustment for the controllable generation of choral music t…

    Submitted 27 November, 2023; v1 submitted 15 October, 2023; originally announced October 2023.

  26. arXiv:2310.08960  [pdf, other]

    eess.SP

    A unified framework for STAR-RIS coefficients optimization

    Authors: Hancheng Zhu, Yuanwei Liu, Yik Chung Wu, Vincent K. N. Lau

    Abstract: Simultaneously transmitting and reflecting (STAR) reconfigurable intelligent surface (RIS), which serves users located on both sides of the surface, has recently emerged as a promising enhancement to the traditional reflective only RIS. Due to the lack of a unified comparison of communication systems equipped with different modes of STAR-RIS and the performance degradation caused by the constraint…

    Submitted 13 October, 2023; originally announced October 2023.

  27. arXiv:2310.06328  [pdf, other]

    cs.LG eess.SP

    Antenna Response Consistency Driven Self-supervised Learning for WIFI-based Human Activity Recognition

    Authors: Ke Xu, Jiangtao Wang, Hongyuan Zhu, Dingchang Zheng

    Abstract: Self-supervised learning (SSL) for WiFi-based human activity recognition (HAR) holds great promise due to its ability to address the challenge of insufficient labeled data. However, directly transplanting SSL algorithms, especially contrastive learning, originally designed for other domains to CSI data, often fails to achieve the expected performance. We attribute this issue to the inappropriate a…

    Submitted 28 November, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

  28. Continuous 3D Myocardial Motion Tracking via Echocardiography

    Authors: Chengkang Shen, Hao Zhu, You Zhou, Yu Liu, Si Yi, Lili Dong, Weipeng Zhao, David J. Brady, Xun Cao, Zhan Ma, Yi Lin

    Abstract: Myocardial motion tracking stands as an essential clinical tool in the prevention and detection of cardiovascular diseases (CVDs), the foremost cause of death globally. However, current techniques suffer from incomplete and inaccurate motion estimation of the myocardium in both spatial and temporal dimensions, hindering the early identification of myocardial dysfunction. To address these challenge…

    Submitted 27 June, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: 18 pages, 11 figures

    Journal ref: IEEE Transactions on Medical Imaging, June 2024

  29. arXiv:2310.01656  [pdf, other]

    eess.SY eess.SP

    Data-driven Forced Oscillation Localization using Inferred Impulse Responses

    Authors: Shaohui Liu, Hao Zhu, Vassilis Kekatos

    Abstract: Poorly damped oscillations pose threats to the stability and reliability of interconnected power systems. In this work, we propose a comprehensive data-driven framework for inferring the sources of forced oscillation (FO) using solely synchrophasor measurements. During normal grid operations, fast-rate ambient data are collected to recover the impulse responses in the small-signal regime, without…

    Submitted 15 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

  30. Symbol Detection for Coarsely Quantized OTFS

    Authors: Junwei He, Haochuan Zhang, Chao Dong, Huimin Zhu

    Abstract: This paper explicitly models a coarse and noisy quantization in a communication system empowered by orthogonal time frequency space (OTFS) for cost and power efficiency. We first point out, with coarse quantization, the effective channel is imbalanced and thus no longer able to circularly shift the transmitted symbols along the delay-Doppler domain. Meanwhile, the effective channel is non-isotropi…

    Submitted 20 January, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

  31. arXiv:2309.07140  [pdf]

    eess.SP cs.LG eess.SY

    Short-term power load forecasting method based on CNN-SAEDN-Res

    Authors: Yang Cui, Han Zhu, Yijian Wang, Lu Zhang, Yang Li

    Abstract: In deep learning, the load data with non-temporal factors are difficult to process by sequence models. This problem results in insufficient precision of the prediction. Therefore, a short-term load forecasting method based on convolutional neural network (CNN), self-attention encoder-decoder network (SAEDN) and residual-refinement (Res) is proposed. In this method, feature extraction module is com…

    Submitted 2 September, 2023; originally announced September 2023.

    Comments: in Chinese language, Accepted by Electric Power Automation Equipment

    Journal ref: Electric Power Automation Equipment 44 (2024) 164-170

  32. arXiv:2309.03472  [pdf, other]

    cs.CV eess.IV

    Perceptual Quality Assessment of 360$^\circ$ Images Based on Generative Scanpath Representation

    Authors: Xiangjie Sui, Hanwei Zhu, Xuelin Liu, Yuming Fang, Shiqi Wang, Zhou Wang

    Abstract: Despite substantial efforts dedicated to the design of heuristic models for omnidirectional (i.e., 360$^\circ$) image quality assessment (OIQA), a conspicuous gap remains due to the lack of consideration for the diversity of viewing behaviors that leads to the varying perceptual quality of 360$^\circ$ images. Two critical aspects underline this oversight: the neglect of viewing conditions that sig…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: 12 pages, 5 figures

  33. arXiv:2309.00831  [pdf, other]

    eess.IV cs.CV

    Multi-scale, Data-driven and Anatomically Constrained Deep Learning Image Registration for Adult and Fetal Echocardiography

    Authors: Md. Kamrul Hasan, Haobo Zhu, Guang Yang, Choon Hwai Yap

    Abstract: Temporal echocardiography image registration is a basis for clinical quantifications such as cardiac motion estimation, myocardial strain assessments, and stroke volume quantifications. In past studies, deep learning image registration (DLIR) has shown promising results and is consistently accurate and precise, requiring less computational time. We propose that a greater focus on the warped moving…

    Submitted 11 September, 2023; v1 submitted 2 September, 2023; originally announced September 2023.

    Comments: Our data-driven and anatomically constrained DLIR method's source code will be publicly available at https://github.com/kamruleee51/DdC-AC-DLIR

  34. arXiv:2308.14774  [pdf, other]

    eess.AS cs.SD eess.SP q-bio.QM

    EEG-Derived Voice Signature for Attended Speaker Detection

    Authors: Hongxu Zhu, Siqi Cai, Yidi Jiang, Qiquan Zhang, Haizhou Li

    Abstract: Objective: Conventional EEG-based auditory attention detection (AAD) is achieved by comparing the time-varying speech stimuli and the elicited EEG signals. However, in order to obtain reliable correlation values, these methods necessitate a long decision window, resulting in a long detection latency. Humans have a remarkable ability to recognize and follow a known speaker, regardless of t…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: 8 pages, 2 figures

  35. arXiv:2308.06547  [pdf, other]

    eess.AS cs.CL cs.SD

    Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition

    Authors: Han Zhu, Dongji Gao, Gaofeng Cheng, Daniel Povey, Pengyuan Zhang, Yonghong Yan

    Abstract: When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition. However, pseudo-labels are often noisy, containing numerous incorrect tokens. Taking noisy labels as ground-truth in the loss function results in suboptimal performance. Previous works attempted to mitigate this issue by either fi…

    Submitted 12 August, 2023; originally announced August 2023.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2023

  36. arXiv:2308.02531  [pdf]

    eess.AS cs.AI cs.SD

    Choir Transformer: Generating Polyphonic Music with Relative Attention on Transformer

    Authors: Jiuyang Zhou, Hong Zhu, ** Wang

    Abstract: Polyphonic music generation is still a challenge direction due to its correct between generating melody and harmony. Most of the previous studies used RNN-based models. However, the RNN-based models are hard to establish the relationship between long-distance notes. In this paper, we propose a polyphonic music generation neural network named Choir Transformer[ https://github.com/Zjy0401/choir-tran…

    Submitted 1 August, 2023; originally announced August 2023.

  37. arXiv:2308.02412  [pdf, other]

    eess.SP cs.AI cs.HC cs.LG

    Self-Supervised Learning for WiFi CSI-Based Human Activity Recognition: A Systematic Study

    Authors: Ke Xu, Jiangtao Wang, Hongyuan Zhu, Dingchang Zheng

    Abstract: Recently, with the advancement of the Internet of Things (IoT), WiFi CSI-based HAR has gained increasing attention from academic and industry communities. By integrating the deep learning technology with CSI-based HAR, researchers achieve state-of-the-art performance without the need of expert knowledge. However, the scarcity of labeled CSI data remains the most prominent challenge when applying d…

    Submitted 19 July, 2023; originally announced August 2023.

  38. arXiv:2307.12871  [pdf, other]

    eess.SY

    Topology-aware Piecewise Linearization of the AC Power Flow through Generative Modeling

    Authors: Young-ho Cho, Hao Zhu

    Abstract: Effective power flow modeling critically affects the ability to efficiently solve large-scale grid optimization problems, especially those with topology-related decision variables. In this work, we put forth a generative modeling approach to obtain a piecewise linear (PWL) approximation of AC power flow by training a simple neural network model from actual data samples. By using the ReLU activatio…

    Submitted 24 July, 2023; originally announced July 2023.

  39. arXiv:2306.15433  [pdf, other]

    eess.SP

    Recursive LMMSE-Based Iterative Soft Interference Cancellation for MIMO Systems to Save Computations and Memories

    Authors: Hufei Zhu, Fuqin Deng, Yikui Zhai, Jiaming Zhong, Yanyang Liang

    Abstract: Firstly, a reordered description is given for the linear minimum mean square error (LMMSE)-based iterative soft interference cancellation (ISIC) detection process for multiple-input multiple-output (MIMO) wireless communication systems, which is based on the equivalent channel matrix. Then the above reordered description is applied to compare the detection process for LMMSE-ISIC with that for the ha…

    Submitted 5 December, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

  40. arXiv:2305.08541  [pdf, other]

    cs.SD eess.AS

    Ripple sparse self-attention for monaural speech enhancement

    Authors: Qiquan Zhang, Hongxu Zhu, Qi Song, Xinyuan Qian, Zhaoheng Ni, Haizhou Li

    Abstract: The use of Transformer represents a recent success in speech enhancement. However, as its core component, self-attention suffers from quadratic complexity, which is computationally prohibited for long speech recordings. Moreover, it allows each time frame to attend to all time frames, neglecting the strong local correlations of speech signals. This study presents a simple yet effective sparse self…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: 5 pages, ICASSP 2023 published

  41. arXiv:2305.05152  [pdf, other]

    cs.SD cs.MM eess.AS

    Who is Speaking Actually? Robust and Versatile Speaker Traceability for Voice Conversion

    Authors: Yanzhen Ren, Hongcheng Zhu, Liming Zhai, Zongkun Sun, Rubing Shen, Lina Wang

    Abstract: Voice conversion (VC), as a voice style transfer technology, is becoming increasingly prevalent while raising serious concerns about its illegal use. Proactively tracing the origins of VC-generated speeches, i.e., speaker traceability, can prevent the misuse of VC, but unfortunately has not been extensively studied. In this paper, we are the first to investigate the speaker traceability for VC and…

    Submitted 26 July, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

    Comments: has been accepted by ACM MM 2023

  42. arXiv:2305.04294   

    eess.IV cs.CV

    PELE scores: Pelvic X-ray Landmark Detection by Pelvis Extraction and Enhancement

    Authors: Zhen Huang, Han Li, Shitong Shao, Heqin Zhu, Huijie Hu, Zhiwei Cheng, Jianji Wang, S. Kevin Zhou

    Abstract: The pelvis, the lower part of the trunk, supports and balances the trunk. Landmark detection from a pelvic X-ray (PXR) facilitates downstream analysis and computer-assisted diagnosis and treatment of pelvic diseases. Although PXRs have the advantages of low radiation and reduced cost compared to computed tomography (CT) images, their 2D pelvis-tissue superposition of 3D structures confuses clinica…

    Submitted 7 June, 2023; v1 submitted 7 May, 2023; originally announced May 2023.

    Comments: will revise it and resubmit it again later

  43. arXiv:2304.11039  [pdf, other]

    cs.IT eess.SP

    An Optimization Framework For Anomaly Detection Scores Refinement With Side Information

    Authors: Ali Maatouk, Fadhel Ayed, Wenjie Li, Yu Wang, Hong Zhu, Jiantao Ye

    Abstract: This paper considers an anomaly detection problem in which a detection algorithm assigns anomaly scores to multi-dimensional data points, such as cellular networks' Key Performance Indicators (KPIs). We propose an optimization framework to refine these anomaly scores by leveraging side information in the form of a causality graph between the various features of the data points. The refinement bloc…

    Submitted 30 August, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

  44. arXiv:2304.02606  [pdf, other]

    eess.SP

    Two-Timescale Design for RIS-aided Cell-free Massive MIMO Systems with Imperfect CSI

    Authors: Mahdi Eskandari, Kangda Zhi, Huiling Zhu, Cunhua Pan, Jiangzhou Wang

    Abstract: The objective of this paper is to evaluate the effectiveness of a two-timescale transmission design in cell-free massive multi-input multiple-output (MIMO) systems incorporating reconfigurable intelligent surfaces (RISs) under the assumption of imperfect channel state information (CSI). We examine the Rician channel model and formulate the passive beamforming for the RISs based on statistical chan…

    Submitted 14 March, 2023; originally announced April 2023.

  45. arXiv:2304.00837  [pdf, other]

    cs.CV eess.SP

    Disorder-invariant Implicit Neural Representation

    Authors: Hao Zhu, Shaowen Xie, Zhen Liu, Fengyi Liu, Qi Zhang, You Zhou, Yi Lin, Zhan Ma, Xun Cao

    Abstract: Implicit neural representation (INR) characterizes the attributes of a signal as a function of corresponding coordinates which emerges as a sharp weapon for solving inverse problems. However, the expressive power of INR is limited by the spectral bias in the network training. In this paper, we find that such a frequency-related problem could be greatly solved by re-arranging the coordinates of the…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: Journal extension of the CVPR'23 highlight paper "DINER: Disorder-invariant Implicit Neural Representation". In the extension, we model the expressive power of the DINER using parametric functions in the attribute space. As a result, better results are achieved than the conference version. arXiv admin note: substantial text overlap with arXiv:2211.07871

  46. arXiv:2303.07558  [pdf, other]

    eess.SY math.OC

    Optimal Power System Topology Control Under Uncertain Wildfire Risk

    Authors: Yuqi Zhou, Kaarthik Sundar, Deepjyoti Deka, Hao Zhu

    Abstract: Wildfires pose a significant threat to the safe and reliable operation of electric power systems. They can quickly spread and cause severe damage to power infrastructure. To reduce the risk, public safety power shutoffs are often used to restore power balance and prevent widespread blackouts. However, the unpredictability of wildfires makes it challenging to implement effective counter-measures in…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: 10 pages, 6 figures

  47. arXiv:2303.03575  [pdf, other]

    stat.ME eess.SP stat.CO

    Adaptive Importance Sampling and Quasi-Monte Carlo Methods for 6G URLLC Systems

    Authors: Xiongwen Ke, Houying Zhu, Kai Yi, Gaoning He, Ganghua Yang, Yu Guang Wang

    Abstract: In this paper, we propose an efficient simulation method based on adaptive importance sampling, which can automatically find the optimal proposal within the Gaussian family based on previous samples, to evaluate the probability of bit error rate (BER) or word error rate (WER). These two measures, which involve high-dimensional black-box integration and rare-event sampling, can characterize the per…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: importance sampling for system model

  48. Gap-closing Matters: Perceptual Quality Evaluation and Optimization of Low-Light Image Enhancement

    Authors: Baoliang Chen, Lingyu Zhu, Hanwei Zhu, Wenhan Yang, Linqi Song, Shiqi Wang

    Abstract: There is a growing consensus in the research community that the optimization of low-light image enhancement approaches should be guided by the visual quality perceived by end users. Despite the substantial efforts invested in the design of low-light enhancement algorithms, there has been comparatively limited focus on assessing subjective and objective quality systematically. To mitigate this gap…

    Submitted 20 June, 2024; v1 submitted 22 February, 2023; originally announced February 2023.

    Comments: Basis Angle Consistency in Sec.3.2 will be revised

  49. arXiv:2302.08660  [pdf, ps, other]

    eess.SP

    Improved Recursive Algorithms for V-BLAST to Save Computations and Memories

    Authors: Hufei Zhu, Yanyang Liang, Fuqin Deng, Genquan Chen, Jiaming Zhong

    Abstract: For vertical Bell Laboratories layered space-time architecture (V-BLAST), the original fast recursive algorithm was proposed, and then Improvements I-IV were introduced to further reduce the complexity. The existing recursive algorithm with speed advantage and that with memory saving incorporate Improvements I-IV and only Improvements III-IV into the original algorithm, respectively. This paper pr…

    Submitted 5 December, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

  50. arXiv:2302.05812  [pdf, other]

    eess.SP cs.AR

    Software-Defined MIMO OFDM Joint Radar-Communication Platform with Fully Digital mmWave Architecture

    Authors: Ceyhun D. Ozkaptan, Haocheng Zhu, Eylem Ekici, Onur Altintas

    Abstract: Large-scale deployment of connected vehicles with cooperative sensing and maneuvering technologies increases the demand for vehicle-to-everything communication (V2X) band in 5.9 GHz. Besides the V2X spectrum, the under-utilized millimeter-wave (mmWave) bands at 24 and 77 GHz can be leveraged to supplement V2X communication and support high data rates for emerging broadband applications. For this p…

    Submitted 11 February, 2023; originally announced February 2023.

    Comments: To appear at 3rd IEEE International Symposium on Joint Communications & Sensing (JC&S 2023)

    ACM Class: B.4.1; B.4.5