Search | arXiv e-print repository

An Exploration of Length Generalization in Transformer-Based Speech Enhancement

Authors: Qiquan Zhang, Hongxu Zhu, Xinyuan Qian, Eliathamby Ambikairajah, Haizhou Li

Abstract: The use of Transformer architectures has facilitated remarkable progress in speech enhancement. Training Transformers using substantially long speech utterances is often infeasible as self-attention suffers from quadratic complexity. It is a critical and unexplored challenge for a Transformer-based speech enhancement model to learn from short speech utterances and generalize to longer ones. In this paper, we conduct comprehensive experiments to explore the length generalization problem in speech enhancement with Transformers. Our findings first establish that position embedding provides an effective instrument to alleviate the impact of utterance length on Transformer-based speech enhancement. Specifically, we explore four different position embedding schemes to enable length generalization. The results confirm the superiority of relative position embeddings (RPEs) over absolute position embeddings (APEs) in length generalization.

Submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted by INTERSPEECH 2024

arXiv:2406.11336 [pdf, other]

LFPLM: A General and Flexible Load Forecasting Framework based on Pre-trained Language Model

Authors: Mingyang Gao, Suyang Zhou, Wei Gu, Zhi Wu, Zijian Hu, Hong Zhu, Haiquan Liu

Abstract: Accurate load forecasting is essential for maintaining the power balance between generators and consumers, especially with the increasing integration of renewable energy sources, which introduce significant intermittent volatility. With the development of data-driven methods, machine learning and deep learning-based models have become the predominant approach for load forecasting tasks. In recent years, pre-trained language models (PLMs) have made significant advancements, demonstrating superior performance in various fields. This paper proposes a load forecasting method based on PLMs, which offers not only accurate predictive ability but also general and flexible applicability. Additionally, a data modeling method is proposed to effectively transform load sequence data into natural language for PLM training. Furthermore, we introduce a data enhancement strategy that eliminates the impact of PLM hallucinations on forecasting results. The effectiveness of the proposed method has been validated on two real-world datasets. Compared with existing methods, our approach shows state-of-the-art performance across all validation metrics.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 7 pages, 5 figures and 5 tables

arXiv:2406.10897 [pdf, ps, other]

When NOMA Meets AIGC: Enhanced Wireless Federated Learning

Authors: Ding Xu, Lingjie Duan, Hongbo Zhu

Abstract: Wireless federated learning (WFL) enables devices to collaboratively train a global model via local model training, uploading and aggregating. However, WFL faces the data scarcity/heterogeneity problem (i.e., data are limited and unevenly distributed among devices) that degrades the learning performance. In this regard, artificial intelligence generated content (AIGC) can synthesize various types of data to compensate for the insufficient local data. Nevertheless, downloading synthetic data or uploading local models iteratively takes a lot of time, especially for a large number of devices. To address this issue, we propose to leverage non-orthogonal multiple access (NOMA) to achieve efficient synthetic data and local model transmission. This paper is the first to combine AIGC and NOMA with WFL to maximally enhance the learning performance. For the proposed NOMA+AIGC-enhanced WFL, the problem of jointly optimizing the synthetic data distribution, two-way communication and computation resource allocation to minimize the global learning error is investigated. The problem belongs to NP-hard mixed integer nonlinear programming, whose optimal solution is intractable to find. We first employ the block coordinate descent method to decouple the complicated-coupled variables, and then resort to our analytical method to derive an efficient low-complexity local optimal solution with partial closed-form results. Extensive simulations validate the superiority of the proposed scheme compared to the existing and benchmark schemes such as the frequency/time division multiple access based AIGC-enhanced schemes.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 13 pages, submitted to IEEE TWC for possible publication

arXiv:2406.10895 [pdf, ps, other]

Fair Computation Offloading for RSMA-Assisted Mobile Edge Computing Networks

Authors: Ding Xu, Lingjie Duan, Haitao Zhao, Hongbo Zhu

Abstract: Rate splitting multiple access (RSMA) provides a flexible transmission framework that can be applied in mobile edge computing (MEC) systems. However, the research work on RSMA-assisted MEC systems is still in its infancy and many design issues remain unsolved, such as the MEC server and channel allocation problem in general multi-server and multi-channel scenarios as well as the user fairness issues. In this regard, we study an RSMA-assisted MEC system with multiple MEC servers, channels and devices, and consider the fairness among devices. A max-min fairness computation offloading problem to maximize the minimum computation offloading rate is investigated. Since the problem is difficult to solve optimally, we develop an efficient algorithm to obtain a suboptimal solution. Particularly, the time allocation and the computing frequency allocation are derived as closed-form functions of the transmit power allocation and the successive interference cancellation (SIC) decoding order, while the SIC decoding order is obtained heuristically, and the bisection search and the successive convex approximation methods are employed to optimize the transmit power allocation. For the MEC server and channel allocation problem, we transform it into a hypergraph matching problem and solve it by matching theory. Simulation results demonstrate that the proposed RSMA-assisted MEC system outperforms current MEC systems under various system setups.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 13 pages, submitted to IEEE TWC for possible publication

arXiv:2405.19298 [pdf, other]

Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare

Authors: Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, Shiqi Wang

Abstract: While recent advancements in large multimodal models (LMMs) have significantly improved their abilities in image quality assessment (IQA) relying on absolute quality rating, how to transfer reliable relative quality comparison outputs to continuous perceptual quality scores remains largely unexplored. To address this gap, we introduce Compare2Score, an all-around LMM-based no-reference IQA (NR-IQA) model that is capable of producing qualitatively comparative responses and effectively translating these discrete comparative levels into a continuous quality score. Specifically, during training, we propose generating scaled-up comparative instructions by comparing images from the same IQA dataset, allowing for more flexible integration of diverse IQA datasets. Utilizing the established large-scale training corpus, we develop a human-like visual quality comparator. During inference, moving beyond binary choices, we propose a soft comparison method that calculates the likelihood of the test image being preferred over multiple predefined anchor images. The quality score is further optimized by maximum a posteriori estimation with the resulting probability matrix. Extensive experiments on nine IQA datasets validate that Compare2Score effectively bridges text-defined comparative levels during training with the converted single-image quality score for inference, surpassing state-of-the-art IQA models across diverse scenarios. Moreover, we verify that the probability-matrix-based inference conversion improves the rating accuracy not only of Compare2Score but also of zero-shot general-purpose LMMs, suggesting its intrinsic effectiveness.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.17496 [pdf, other]

UU-Mamba: Uncertainty-aware U-Mamba for Cardiac Image Segmentation

Authors: Ting Yu Tsai, Li Lin, Shu Hu, Ming-Ching Chang, Hongtu Zhu, Xin Wang

Abstract: Biomedical image segmentation is critical for accurate identification and analysis of anatomical structures in medical imaging, particularly in cardiac MRI. Manual segmentation is labor-intensive, time-consuming, and prone to errors, highlighting the need for automated methods. However, current machine learning approaches face challenges like overfitting and data demands. To tackle these issues, we propose a new UU-Mamba model, integrating the U-Mamba model with the Sharpness-Aware Minimization (SAM) optimizer and an uncertainty-aware loss function. SAM enhances generalization by locating flat minima in the loss landscape, thus reducing overfitting. The uncertainty-aware loss combines region-based, distribution-based, and pixel-based loss designs to improve segmentation accuracy and robustness. Evaluation of our method is performed on the ACDC cardiac dataset, outperforming state-of-the-art models including TransUNet, Swin-Unet, nnUNet, and nnFormer. Our approach achieves Dice Similarity Coefficient (DSC) and Mean Squared Error (MSE) scores, demonstrating its effectiveness in cardiac MRI segmentation.

Submitted 4 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.05521 [pdf, other]

Machine Learning for Scalable and Optimal Load Shedding Under Power System Contingency

Authors: Yuqi Zhou, Hao Zhu

Abstract: Prompt and effective corrective actions in response to unexpected contingencies are crucial for improving power system resilience and preventing cascading blackouts. The optimal load shedding (OLS) accounting for network limits has the potential to address the diverse system-wide impacts of contingency scenarios as compared to traditional local schemes. However, due to the fast cascading propagation of initial contingencies, real-time OLS solutions are challenging to attain in large systems with high computation and communication needs. In this paper, we propose a decentralized design that leverages offline training of a neural network (NN) model for individual load centers to autonomously construct the OLS solutions from locally available measurements. Our learning-for-OLS approach can greatly reduce the computation and communication needs during online emergency responses, thus preventing the cascading propagation of contingencies for enhanced power grid resilience. Numerical studies on both the IEEE 118-bus system and a synthetic Texas 2000-bus system have demonstrated the efficiency and effectiveness of our scalable OLS learning design for timely power system emergency operations.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2404.14036 [pdf, ps, other]

doi 10.1109/ICASSP48485.2024.10446858

Optimal Structure of Receive Beamforming for Over-the-Air Computation

Authors: Hongbin Zhu, Hua Qian

Abstract: We investigate fast data aggregation via over-the-air computation (AirComp) over wireless networks. In this scenario, an access point (AP) with multiple antennas aims to recover the arithmetic mean of sensory data from multiple wireless devices. To minimize estimation distortion, we formulate a mean-squared-error (MSE) minimization problem that considers joint optimization of transmit scalars at wireless devices, denoising factor, and receive beamforming vector at the AP. We derive closed-form expressions for the transmit scalars and denoising factor, resulting in a non-convex quadratically constrained quadratic programming (QCQP) problem concerning the receive beamforming vector. To tackle the computational complexity of the beamforming design, particularly relevant in massive multiple-input multiple-output (MIMO) AirComp systems, we explore the optimal structure of receive beamforming using successive convex approximation (SCA) and Lagrange duality. By leveraging the proposed optimal beamforming structure, we develop two efficient algorithms based on SCA and semi-definite relaxation (SDR). These algorithms enable fast wireless aggregation with low computational complexity and yield almost identical MSE performance compared to baseline algorithms. Simulation results validate the effectiveness of our proposed methods.

Submitted 22 April, 2024; originally announced April 2024.

Comments: Published in IEEE ICASSP 2024

arXiv:2404.11313 [pdf, other]

NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

Abstract: This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from a popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The purpose is to build new benchmarks and advance the development of S-UGC VQA. The competition had 200 participants and 13 teams submitted valid solutions for the final testing phase. The proposed solutions achieved state-of-the-art performances for S-UGC VQA. The project can be found at https://github.com/lixinustc/KVQChallenge-CVPR-NTIRE2024.

Submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

arXiv:2403.16150 [pdf, other]

Fusion of Active and Passive Measurements for Robust and Scalable Positioning

Authors: Hong Zhu, Alexander Venus, Erik Leitinger, Stefan Tertinek, Klaus Witrisal

Abstract: This paper addresses the challenge of achieving reliable and robust positioning of a mobile agent, such as a radio device carried by a person, in scenarios where direct line-of-sight (LOS) links are obstructed or unavailable. The human body is considered as an extended object that scatters, attenuates and blocks the radio signals. We propose a novel particle-based sum-product algorithm (SPA) that fuses active measurements between the agent and anchors with passive measurements from pairs of anchors reflected off the body. We first formulate radio signal models for both active and passive measurements. Then, a joint tracking algorithm that utilizes both active and passive measurements is developed for the extended object. The algorithm exploits the probabilistic data association (PDA) for multiple object-related measurements. The results demonstrate superior accuracy during and after the obstructed line-of-sight (OLOS) situation, outperforming conventional methods that solely rely on active measurements. The proposed joint estimation approach significantly enhances the localization robustness via radio sensing.

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.10815 [pdf, other]

MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections

Authors: Mude Hui, Zihao Wei, Hongru Zhu, Fei Xia, Yuyin Zhou

Abstract: Volumetric optical microscopy using non-diffracting beams enables rapid imaging of 3D volumes by projecting them axially to 2D images but lacks crucial depth information. Addressing this, we introduce MicroDiffusion, a pioneering tool facilitating high-quality, depth-resolved 3D volume reconstruction from limited 2D projections. While existing Implicit Neural Representation (INR) models often yield incomplete outputs and Denoising Diffusion Probabilistic Models (DDPM) excel at capturing details, our method integrates INR's structural coherence with DDPM's fine-detail enhancement capabilities. We pretrain an INR model to transform 2D axially-projected images into a preliminary 3D volume. This pretrained INR acts as a global prior guiding DDPM's generative process through a linear interpolation between INR outputs and noise inputs. This strategy enriches the diffusion process with structured 3D information, enhancing detail and reducing noise in localized 2D images. By conditioning the diffusion model on the closest 2D projection, MicroDiffusion substantially enhances fidelity in resulting 3D reconstructions, surpassing INR and standard DDPM outputs with unparalleled image quality and structural fidelity. Our code and dataset are available at https://github.com/UCSC-VLAA/MicroDiffusion.

Submitted 16 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR2024

arXiv:2403.08247 [pdf, other]

A Dual-domain Regularization Method for Ring Artifact Removal of X-ray CT

Authors: Hongyang Zhu, Xin Lu, Yanwei Qin, Xinran Yu, Tianjiao Sun, Yunsong Zhao

Abstract: Ring artifacts in computed tomography images, arising from the undesirable responses of detector units, significantly degrade image quality and diagnostic reliability. To address this challenge, we propose a dual-domain regularization model to effectively remove ring artifacts, while maintaining the integrity of the original CT image. The proposed model corrects the vertical stripe artifacts on the sinogram by innovatively updating the response inconsistency compensation coefficients of detector units, which is achieved by employing the group sparse constraint and the projection-view direction sparse constraint on the stripe artifacts. Simultaneously, we apply the sparse constraint on the reconstructed image to further rectify ring artifacts in the image domain. The key advantage of the proposed method lies in considering the relationship between the response inconsistency compensation coefficients of the detector units and the projection views, which enables a more accurate correction of the response of the detector units. An alternating minimization method is designed to solve the model. Comparative experiments on real photon counting detector data demonstrate that the proposed method not only surpasses existing methods in removing ring artifacts but also excels in preserving structural details and image fidelity.

Submitted 14 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.05912 [pdf, other]

Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation

Authors: Hairong Shi, Songhao Han, Shaofei Huang, Yue Liao, Guanbin Li, Xiangxing Kong, Hua Zhu, Xiaomu Wang, Si Liu

Abstract: Tumor lesion segmentation on CT or MRI images plays a critical role in cancer diagnosis and treatment planning. Considering the inherent differences in tumor lesion segmentation data across various medical imaging modalities and equipment, integrating medical knowledge into the Segment Anything Model (SAM) presents promising capability due to its versatility and generalization potential. Recent studies have attempted to enhance SAM with medical expertise by pre-training on large-scale medical segmentation datasets. However, challenges still exist in 3D tumor lesion segmentation owing to tumor complexity and the imbalance in foreground and background regions. Therefore, we introduce Mask-Enhanced SAM (M-SAM), an innovative architecture tailored for 3D tumor lesion segmentation. We propose a novel Mask-Enhanced Adapter (MEA) within M-SAM that enriches the semantic information of medical images with positional data from coarse segmentation masks, facilitating the generation of more precise segmentation masks. Furthermore, an iterative refinement scheme is implemented in M-SAM to refine the segmentation masks progressively, leading to improved performance. Extensive experiments on seven tumor lesion segmentation datasets indicate that our M-SAM not only achieves high segmentation accuracy but also exhibits robust generalization.

Submitted 9 March, 2024; originally announced March 2024.

arXiv:2402.19387 [pdf, other]

SeD: Semantic-Aware Discriminator for Image Super-Resolution

Authors: Bingchen Li, Xin Li, Hanxin Zhu, Yeying Jin, Ruoyu Feng, Zhizheng Zhang, Zhibo Chen

Abstract: Generative Adversarial Networks (GANs) have been widely used to recover vivid textures in image super-resolution (SR) tasks. In particular, one discriminator is utilized to enable the SR network to learn the distribution of real-world high-quality images in an adversarial training manner. However, the distribution learning is overly coarse-grained, which is susceptible to virtual textures and causes counter-intuitive generation results. To mitigate this, we propose the simple and effective Semantic-aware Discriminator (denoted as SeD), which encourages the SR network to learn the fine-grained distributions by introducing the semantics of images as a condition. Concretely, we aim to excavate the semantics of images from a well-trained semantic extractor. Under different semantics, the discriminator is able to distinguish the real-fake images individually and adaptively, which guides the SR network to learn the more fine-grained semantic-aware textures. To obtain accurate and abundant semantics, we take full advantage of recently popular pretrained vision models (PVMs) with extensive datasets, and then incorporate their semantic features into the discriminator through a well-designed spatial cross-attention module. In this way, our proposed semantic-aware discriminator empowers the SR network to produce more photo-realistic and pleasing images. Extensive experiments on two typical tasks, i.e., SR and Real SR, have demonstrated the effectiveness of our proposed methods.

Submitted 29 February, 2024; originally announced February 2024.

Comments: CVPR2024

arXiv:2402.17797 [pdf, other]

Neural Radiance Fields in Medical Imaging: Challenges and Next Steps

Authors: Xin Wang, Shu Hu, Heng Fan, Hongtu Zhu, Xin Li

Abstract: Neural Radiance Fields (NeRF), as a pioneering technique in computer vision, offer great potential to revolutionize medical imaging by synthesizing three-dimensional representations from the projected two-dimensional image data. However, they face unique challenges when applied to medical applications. This paper presents a comprehensive examination of applications of NeRFs in medical imaging, highlighting four imminent challenges, including fundamental imaging principles, inner structure requirement, object boundary definition, and color density significance. We discuss current methods on different organs and discuss related limitations. We also review several datasets and evaluation metrics and propose several promising directions for future research.

Submitted 21 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.14025 [pdf, other]

Spectral Efficiency Maximization for Active RIS-aided Cell-Free Massive MIMO Systems with Imperfect CSI

Authors: Mahdi Eskandari, Huiling Zhu, Jiangzhou Wang

Abstract: A cell-free network merged with active reconfigurable reflecting surfaces (RIS) is investigated in this paper. Based on the imperfect channel state information (CSI), the aggregated channel from the user to the access point (AP) is initially estimated using the linear minimum mean square error (LMMSE) technique. The central processing unit (CPU) then detects uplink data from individual users through the utilization of the maximum ratio combining (MRC) approach, relying on the estimated channel. Then, a closed-form expression for uplink spectral efficiency (SE) is derived which demonstrates its reliance on statistical CSI (S-CSI) alone. The amplitude gain of each active RIS element is derived in a closed-form expression as a function of the number of active RIS elements, the number of users, and the size of each reflecting element. A soft actor-critic (SAC) algorithm is utilized to design the phase shift of the active RIS to maximize the uplink SE. Simulation results emphasize the robustness of the proposed SAC algorithm, showcasing its effectiveness in cell-free networks under the influence of imperfect CSI.

Submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.11294 [pdf, other]

Power Optimization for Integrated Active and Passive Sensing in DFRC Systems

Authors: Xingliang Lou, Wenchao Xia, Kai-Kit Wong, Haitao Zhao, Tony Q. S. Quek, Hongbo Zhu

Abstract: Most existing works on dual-function radar-communication (DFRC) systems mainly focus on active sensing, but ignore passive sensing. To leverage multi-static sensing capability, we explore integrated active and passive sensing (IAPS) in DFRC systems to remedy sensing performance. The multi-antenna base station (BS) is responsible for communication and active sensing by transmitting signals to user equipments while detecting a target according to echo signals. In contrast, passive sensing is performed at the receive access points (RAPs). We consider both the cases where the capacity of the backhaul links between the RAPs and BS is unlimited or limited and adopt different fusion strategies. Specifically, when the backhaul capacity is unlimited, the BS and RAPs transfer sensing signals they have received to the central controller (CC) for signal fusion. The CC processes the signals and leverages the generalized likelihood ratio test detector to determine the presence of a target. However, when the backhaul capacity is limited, each RAP, as well as the BS, makes decisions independently and sends its binary inference results to the CC for result fusion via voting aggregation. Then, aiming to maximize the target detection probability under communication quality of service constraints, two power optimization algorithms are proposed. Finally, numerical simulations demonstrate that the sensing performance in the case of unlimited backhaul capacity is much better than that in the case of limited backhaul capacity. Moreover, the results imply that the proposed IAPS scheme outperforms only-passive and only-active sensing schemes, especially in the unlimited capacity case.

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.06875 [pdf, other]

Disentangled Latent Energy-Based Style Translation: An Image-Level Structural MRI Harmonization Framework

Authors: Mengqi Wu, Lintao Zhang, Pew-Thian Yap, Hongtu Zhu, Mingxia Liu

Abstract: Brain magnetic resonance imaging (MRI) has been extensively employed across clinical and research fields, but often exhibits sensitivity to site effects arising from non-biological variations such as differences in field strength and scanner vendors. Numerous retrospective MRI harmonization techniques have demonstrated encouraging outcomes in reducing the site effects at the image level. However, existing methods generally suffer from high computational requirements and limited generalizability, restricting their applicability to unseen MRIs. In this paper, we design a novel disentangled latent energy-based style translation (DLEST) framework for unpaired image-level MRI harmonization, consisting of (a) site-invariant image generation (SIG), (b) site-specific style translation (SST), and (c) site-specific MRI synthesis (SMS). Specifically, the SIG employs a latent autoencoder to encode MRIs into a low-dimensional latent space and reconstruct MRIs from latent codes. The SST utilizes an energy-based model to comprehend the global latent distribution of a target domain and translate source latent codes toward the target domain, while SMS enables MRI synthesis with a target-specific style. By disentangling image generation and style translation in latent space, the DLEST can achieve efficient style translation. Our model was trained on T1-weighted MRIs from a public dataset (with 3,984 subjects across 58 acquisition sites/settings) and validated on an independent dataset (with 9 traveling subjects scanned in 11 sites/settings) in four tasks: histogram and feature visualization, site classification, brain tissue segmentation, and site-specific structural MRI synthesis. Qualitative and quantitative results demonstrate the superiority of our method over several state-of-the-art methods.

Submitted 29 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

arXiv:2402.03394 [pdf, other]

Artificial Intelligence in Image-based Cardiovascular Disease Analysis: A Comprehensive Survey and Future Outlook

Authors: Xin Wang, Hongtu Zhu

Abstract: Recent advancements in Artificial Intelligence (AI) have significantly influenced the field of Cardiovascular Disease (CVD) analysis, particularly in image-based diagnostics. Our paper presents an extensive review of AI applications in image-based CVD analysis, offering insights into its current state and future potential. We systematically categorize the literature based on the primary anatomical structures related to CVD, dividing them into non-vessel structures (such as ventricles and atria) and vessel structures (including the aorta and coronary arteries). This categorization provides a structured approach to explore various imaging modalities like Magnetic Resonance Imaging (MRI), which are commonly used in CVD research. Our review encompasses these modalities, giving a broad perspective on the diverse imaging techniques integrated with AI for CVD analysis. Additionally, we compile a list of publicly accessible cardiac image datasets and code repositories, intending to support research reproducibility and facilitate data and algorithm sharing within the community. We conclude with an examination of the challenges and limitations inherent in current AI-based CVD analysis methods and suggest directions for future research to overcome these hurdles.

Submitted 22 June, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

arXiv:2402.02735 [pdf, other]

Timed-Elastic-Band Based Variable Splitting for Autonomous Trajectory Planning

Authors: Hao Zhu, Kefan **, Rui Gao, Jialin Wang, C. -J. Richard Shi

Abstract: Existing trajectory planning methods struggle to handle the issue of autonomous track swinging during navigation, resulting in significant errors when reaching the destination. In this article, we address autonomous trajectory planning problems, aiming to develop innovative solutions that enhance the adaptability and robustness of unmanned systems in navigating complex and dynamic environments. We first introduce the variable splitting (VS) method as a constrained optimization method to reimagine the renowned Timed-Elastic-Band (TEB) algorithm, resulting in a novel collision avoidance approach named Timed-Elastic-Band based variable splitting (TEB-VS). The proposed TEB-VS demonstrates superior navigation stability, while maintaining nearly identical resource consumption to TEB. We then analyze the convergence of the proposed TEB-VS method. To evaluate the effectiveness and efficiency of TEB-VS, extensive experiments have been conducted using TurtleBot2 in both simulated environments and real-world datasets.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2401.09686 [pdf, other]

An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement

Authors: Qiquan Zhang, Meng Ge, Hongxu Zhu, Eliathamby Ambikairajah, Qi Song, Zhaoheng Ni, Haizhou Li

Abstract: Transformer architecture has enabled recent progress in speech enhancement. Since Transformers are position-agnostic, positional encoding is the de facto standard component used to enable Transformers to distinguish the order of elements in a sequence. However, it remains unclear how exactly positional encoding impacts speech enhancement based on Transformer architectures. In this paper, we perform a comprehensive empirical study evaluating five positional encoding methods, i.e., Sinusoidal and learned absolute position embedding (APE), T5-RPE, KERPLE, as well as the Transformer without positional encoding (No-Pos), across both causal and noncausal configurations. We conduct extensive speech enhancement experiments, involving spectral mapping and masking methods. Our findings establish that positional encoding is not very helpful for the models in a causal configuration, which indicates that causal attention may implicitly incorporate position information. In a noncausal configuration, the models significantly benefit from the use of positional encoding. In addition, we find that among the four position embeddings, relative position embeddings outperform APEs.

Submitted 13 February, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: Accepted by ICASSP 2024

arXiv:2401.04965 [pdf]

ConvConcatNet: a deep convolutional neural network to reconstruct mel spectrogram from the EEG

Authors: Xiran Xu, Bo Wang, Yujie Yan, Haolin Zhu, Zechen Zhang, Xihong Wu, **g Chen

Abstract: To investigate the processing of speech in the brain, simple linear models are commonly used to establish a relationship between brain signals and speech features. However, these linear models are ill-equipped to model a highly dynamic and complex non-linear system like the brain. Although non-linear methods with neural networks have been developed recently, reconstructing unseen stimuli from unseen subjects' EEG is still a highly challenging task. This work presents a novel method, ConvConcatNet, to reconstruct mel-spectrograms from EEG, in which a deep convolutional neural network and extensive concatenation operations are combined. With our ConvConcatNet model, the Pearson correlation between the reconstructed and the target mel-spectrogram reaches 0.0420, which ranked No. 1 in Task 2 of the Auditory EEG Challenge. The code and models to implement our work will be available on GitHub: https://github.com/xuxiran/ConvConcatNet

Submitted 10 January, 2024; originally announced January 2024.

Comments: 2 pages, 1 figure, 2 tables

arXiv:2401.04964 [pdf]

Self-supervised speech representation and contextual text embedding for match-mismatch classification with EEG recording

Authors: Bo Wang, Xiran Xu, Zechen Zhang, Haolin Zhu, YuJie Yan, Xihong Wu, **g Chen

Abstract: Relating speech to EEG holds considerable importance but is challenging. In this study, a deep convolutional network was employed to extract spatiotemporal features from EEG data. Self-supervised speech representation and contextual text embedding were used as speech features. Contrastive learning was used to relate EEG features to speech features. The experimental results demonstrate the benefits of using self-supervised speech representation and contextual text embedding. Through feature fusion and model ensemble, an accuracy of 60.29% was achieved, and the performance ranked No. 2 in Task 1 of the Auditory EEG Challenge (ICASSP 2024). The code to implement our work is available on GitHub: https://github.com/bobwangPKU/EEG-Stimulus-Match-Mismatch.

Submitted 31 January, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: 2 pages, 2 figures, accepted by ICASSP 2024

arXiv:2311.09857 [pdf]

Integrated lithium niobate photonic millimeter-wave radar

Authors: Sha Zhu, Yiwen Zhang, Jiaxue Feng, Yongji Wang, Kunpeng Zhai, Hanke Feng, Edwin Yue Bun Pun, Ning Hua Zhu, Cheng Wang

Abstract: Millimeter-wave (mmWave, >30 GHz) radars are the key enabler in the coming 6G era for high-resolution sensing and detection of targets. Photonic radar provides an effective approach to overcome the limitations of electronic radars thanks to the high frequency, broad bandwidth, and excellent reconfigurability of photonic systems. However, conventional photonic radars are mostly realized in tabletop systems composed of bulky discrete components, whereas the more compact integrated photonic radars struggle to reach the mmWave bands due to the unsatisfactory bandwidths and signal integrity of the underlying electro-optic modulators. Here, we overcome these challenges and demonstrate a centimeter-resolution integrated photonic radar operating in the mmWave V band (40-50 GHz) based on a 4-inch wafer-scale thin-film lithium niobate (TFLN) technology. The fabricated TFLN mmWave photonic integrated circuit consists of a first electro-optic modulator capable of generating a broadband linear frequency modulated mmWave radar waveform through optical frequency multiplication of a low-frequency input signal, and a second electro-optic modulator responsible for frequency de-chirp of the received reflected echo wave, thereby greatly relieving the bandwidth requirements for the analog-to-digital converter in the receiver. Thanks to the absence of optical and electrical filters in the system, our integrated photonic mmWave radar features continuous on-demand tunability of the center frequency and bandwidth, currently only limited by the bandwidths of electrical amplifiers. We achieve multi-target ranging with a resolution of 1.50 cm and velocity measurement with a resolution of 0.067 m/s. Furthermore, we construct an inverse synthetic aperture radar (ISAR) and successfully demonstrate the imaging of targets with various shapes and postures with a two-dimensional resolution of 1.50 cm × 1.06 cm.

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2310.09843 [pdf]

CoCoFormer: A controllable feature-rich polyphonic music generation method

Authors: Jiuyang Zhou, Tengfei Niu, Hong Zhu, ** Wang

Abstract: This paper explores the modeling of polyphonic music sequences. Due to the great potential of Transformer models in music generation, controllable music generation is receiving more attention. In the task of polyphonic music, current controllable generation research focuses on controlling the generation of chords, but lacks precise adjustment for the controllable generation of choral music textures. This paper proposes the Condition Choir Transformer (CoCoFormer), which controls the output of the model by controlling the chord and rhythm inputs at a fine-grained level. In this paper, the self-supervised method improves the loss function and performs joint training through conditional control input and unconditional input training. To alleviate the lack of diversity in generated samples caused by teacher-forcing training, this paper adds an adversarial training method. CoCoFormer enhances model performance with explicit and implicit inputs to chords and rhythms. The experiments show that CoCoFormer performs better than current models. Given a specified polyphonic music texture, the same melody can also be generated in a variety of ways.

Submitted 27 November, 2023; v1 submitted 15 October, 2023; originally announced October 2023.

arXiv:2310.08960 [pdf, other]

A unified framework for STAR-RIS coefficients optimization

Authors: Hancheng Zhu, Yuanwei Liu, Yik Chung Wu, Vincent K. N. Lau

Abstract: Simultaneously transmitting and reflecting (STAR) reconfigurable intelligent surface (RIS), which serves users located on both sides of the surface, has recently emerged as a promising enhancement to the traditional reflective-only RIS. Due to the lack of a unified comparison of communication systems equipped with different modes of STAR-RIS and the performance degradation caused by the constraints involving discrete selection, this paper proposes a unified optimization framework for handling the STAR-RIS operating mode and discrete phase constraints. With a judiciously introduced penalty term, this framework transforms the original problem into two iterative subproblems, with one containing the selection-type constraints, and the other handling the remaining wireless resources. The convergent point of the whole algorithm is found to be at least a stationary point under mild conditions. As an illustrative example, the proposed framework is applied to a sum-rate maximization problem in the downlink transmission. Simulation results show that the algorithms from the proposed framework outperform other existing algorithms tailored for different STAR-RIS scenarios. Furthermore, it is found that a STAR-RIS with 4 or even 2 discrete phases could achieve almost the same sum-rate performance as the continuous phase setting, showing for the first time that discrete phase is not necessarily a cause of significant performance degradation.

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2310.06328 [pdf, other]

Antenna Response Consistency Driven Self-supervised Learning for WIFI-based Human Activity Recognition

Authors: Ke Xu, Jiangtao Wang, Hongyuan Zhu, Dingchang Zheng

Abstract: Self-supervised learning (SSL) for WiFi-based human activity recognition (HAR) holds great promise due to its ability to address the challenge of insufficient labeled data. However, directly transplanting SSL algorithms, especially contrastive learning, originally designed for other domains to CSI data, often fails to achieve the expected performance. We attribute this issue to the inappropriate alignment criteria, which disrupt the semantic distance consistency between the feature space and the input space. To address this challenge, we introduce Antenna Response Consistency (ARC) as a solution to define proper alignment criteria. ARC is designed to retain semantic information from the input space while introducing robustness to real-world noise. Moreover, we substantiate the effectiveness of ARC through a comprehensive set of experiments, demonstrating its capability to enhance the performance of self-supervised learning for WiFi-based HAR by achieving an increase of over 5% in accuracy in most cases and achieving a best accuracy of 94.97%.

Submitted 28 November, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

arXiv:2310.02792 [pdf, other]

doi 10.1109/TMI.2024.3419780

Continuous 3D Myocardial Motion Tracking via Echocardiography

Authors: Chengkang Shen, Hao Zhu, You Zhou, Yu Liu, Si Yi, Lili Dong, Weipeng Zhao, David J. Brady, Xun Cao, Zhan Ma, Yi Lin

Abstract: Myocardial motion tracking stands as an essential clinical tool in the prevention and detection of cardiovascular diseases (CVDs), the foremost cause of death globally. However, current techniques suffer from incomplete and inaccurate motion estimation of the myocardium in both spatial and temporal dimensions, hindering the early identification of myocardial dysfunction. To address these challenges, this paper introduces the Neural Cardiac Motion Field (NeuralCMF). NeuralCMF leverages implicit neural representation (INR) to model the 3D structure and the comprehensive 6D forward/backward motion of the heart. This method surpasses pixel-wise limitations by offering the capability to continuously query the precise shape and motion of the myocardium at any specific point throughout the cardiac cycle, enhancing the detailed analysis of cardiac dynamics beyond traditional speckle tracking. Notably, NeuralCMF operates without the need for paired datasets, and its optimization is self-supervised through the physics knowledge priors in both space and time dimensions, ensuring compatibility with both 2D and 3D echocardiogram video inputs. Experimental validations across three representative datasets support the robustness and innovative nature of the NeuralCMF, marking significant advantages over existing state-of-the-art methods in cardiac imaging and motion tracking.

Submitted 27 June, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: 18 pages, 11 figures

Journal ref: IEEE Transactions on Medical Imaging, June 2024

arXiv:2310.01656 [pdf, other]

Data-driven Forced Oscillation Localization using Inferred Impulse Responses

Authors: Shaohui Liu, Hao Zhu, Vassilis Kekatos

Abstract: Poorly damped oscillations pose threats to the stability and reliability of interconnected power systems. In this work, we propose a comprehensive data-driven framework for inferring the sources of forced oscillation (FO) using solely synchrophasor measurements. During normal grid operations, fast-rate ambient data are collected to recover the impulse responses in the small-signal regime, without requiring the system model. When FO events occur, the source is estimated based on the frequency domain analysis by fitting the least-squares (LS) error for the FO data using the impulse responses recovered previously. Although the proposed framework is purely data-driven, the result has been established theoretically via model-based analysis of linearized dynamics under a few realistic assumptions. Numerical validations demonstrate its applicability to realistic power systems including nonlinear, higher-order dynamics with control effects using the IEEE 68-bus system, and the 240-bus system from the IEEE-NASPI FO source location contest. The generalizability of the proposed methodology has been validated using different types of measurements and partial sensor coverage conditions.

Submitted 15 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

arXiv:2309.11759 [pdf, other]

doi 10.1109/LSP.2024.3356915

Symbol Detection for Coarsely Quantized OTFS

Authors: Junwei He, Haochuan Zhang, Chao Dong, Huimin Zhu

Abstract: This paper explicitly models a coarse and noisy quantization in a communication system empowered by orthogonal time frequency space (OTFS) for cost and power efficiency. We first point out that, with coarse quantization, the effective channel is imbalanced and thus no longer able to circularly shift the transmitted symbols along the delay-Doppler domain. Meanwhile, the effective channel is non-isotropic, which imposes a significant loss on symbol detection algorithms like the original approximate message passing (AMP). Although the algorithm of generalized expectation consistent for signal recovery (GEC-SR) can mitigate this loss, the computational complexity is prohibitively high, mainly due to a dramatic increase in the matrix size of OTFS. In this context, we propose a low-complexity algorithm that incorporates into the GEC-SR a quick inversion of quasi-banded matrices, reducing the complexity from a cubic order to a linear order while keeping the performance at the same level.

Submitted 20 January, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

arXiv:2309.07140 [pdf]

doi 10.16081/j.epae.202308018

Short-term power load forecasting method based on CNN-SAEDN-Res

Authors: Yang Cui, Han Zhu, Yijian Wang, Lu Zhang, Yang Li

Abstract: In deep learning, load data with non-temporal factors are difficult for sequence models to process. This problem results in insufficient prediction precision. Therefore, a short-term load forecasting method based on convolutional neural network (CNN), self-attention encoder-decoder network (SAEDN) and residual-refinement (Res) is proposed. In this method, the feature extraction module is composed of a two-dimensional convolutional neural network, which is used to mine the local correlation between data and obtain high-dimensional data features. The initial load forecasting module consists of a self-attention encoder-decoder network and a feedforward neural network (FFN). The module utilizes self-attention mechanisms to encode high-dimensional features. This operation can obtain the global correlation between data. Therefore, the model is able to retain important information based on the coupling relationship between the data in data mixed with non-time series factors. Then, self-attention decoding is performed and the feedforward neural network is used to regress the initial load. This paper introduces the residual mechanism to build the load optimization module. The module generates residual load values to optimize the initial load. The simulation results show that the proposed load forecasting method has advantages in terms of prediction accuracy and prediction stability.

Submitted 2 September, 2023; originally announced September 2023.

Comments: in Chinese language, Accepted by Electric Power Automation Equipment

Journal ref: Electric Power Automation Equipment 44 (2024) 164-170

arXiv:2309.03472 [pdf, other]

Perceptual Quality Assessment of 360$^\circ$ Images Based on Generative Scanpath Representation

Authors: Xiangjie Sui, Hanwei Zhu, Xuelin Liu, Yuming Fang, Shiqi Wang, Zhou Wang

Abstract: Despite substantial efforts dedicated to the design of heuristic models for omnidirectional (i.e., 360$^\circ$) image quality assessment (OIQA), a conspicuous gap remains due to the lack of consideration for the diversity of viewing behaviors that leads to the varying perceptual quality of 360$^\circ$ images. Two critical aspects underline this oversight: the neglect of viewing conditions that significantly sway user gaze patterns and the overreliance on a single viewport sequence from the 360$^\circ$ image for quality inference. To address these issues, we introduce a unique generative scanpath representation (GSR) for effective quality inference of 360$^\circ$ images, which aggregates varied perceptual experiences of multi-hypothesis users under a predefined viewing condition. More specifically, given a viewing condition characterized by the starting point of viewing and exploration time, a set of scanpaths consisting of dynamic visual fixations can be produced using an apt scanpath generator. Following this vein, we use the scanpaths to convert the 360$^\circ$ image into the unique GSR, which provides a global overview of gazed-focused contents derived from scanpaths. As such, the quality inference of the 360$^\circ$ image is swiftly transformed to that of GSR. We then propose an efficient OIQA computational framework by learning the quality maps of GSR. Comprehensive experimental results validate that the predictions of the proposed framework are highly consistent with human perception in the spatiotemporal domain, especially in the challenging context of locally distorted 360$^\circ$ images under varied viewing conditions. The code will be released at https://github.com/xiangjieSui/GSR

Submitted 7 September, 2023; originally announced September 2023.

Comments: 12 pages, 5 figures

arXiv:2309.00831 [pdf, other]

Multi-scale, Data-driven and Anatomically Constrained Deep Learning Image Registration for Adult and Fetal Echocardiography

Authors: Md. Kamrul Hasan, Haobo Zhu, Guang Yang, Choon Hwai Yap

Abstract: Temporal echocardiography image registration is a basis for clinical quantifications such as cardiac motion estimation, myocardial strain assessments, and stroke volume quantifications. In past studies, deep learning image registration (DLIR) has shown promising results and is consistently accurate and precise, requiring less computational time. We propose that a greater focus on the warped moving image's anatomic plausibility and image quality can support robust DLIR performance. Further, past implementations have focused on adult echocardiography, and there is an absence of DLIR implementations for fetal echocardiography. We propose a framework that combines three strategies for DLIR in both fetal and adult echo: (1) an anatomic shape-encoded loss to preserve physiological myocardial and left ventricular anatomical topologies in warped images; (2) a data-driven loss that is trained adversarially to preserve good image texture features in warped images; and (3) a multi-scale training scheme of a data-driven and anatomically constrained algorithm to improve accuracy. Our tests show that good anatomical topology and image textures are strongly linked to shape-encoded and data-driven adversarial losses. They improve different aspects of registration performance in a non-overlapping way, justifying their combination. Despite fundamental distinctions between adult and fetal echo images, we show that these strategies can provide excellent registration results in both adult and fetal echocardiography using the publicly available CAMUS adult echo dataset and our private multi-demographic fetal echo dataset. Our approach outperforms traditional non-DL gold standard registration approaches, including Optical Flow and Elastix. Registration improvements could be translated to more accurate and precise clinical quantification of cardiac ejection fraction, demonstrating a potential for translation.

Submitted 11 September, 2023; v1 submitted 2 September, 2023; originally announced September 2023.

Comments: Our data-driven and anatomically constrained DLIR method's source code will be publicly available at https://github.com/kamruleee51/DdC-AC-DLIR

arXiv:2308.14774 [pdf, other]

EEG-Derived Voice Signature for Attended Speaker Detection

Authors: Hongxu Zhu, Siqi Cai, Yidi Jiang, Qiquan Zhang, Haizhou Li

Abstract: Objective: Conventional EEG-based auditory attention detection (AAD) is achieved by comparing the time-varying speech stimuli and the elicited EEG signals. However, in order to obtain reliable correlation values, these methods necessitate a long decision window, resulting in a long detection latency. Humans have a remarkable ability to recognize and follow a known speaker, regardless of the spoken content. In this paper, we seek to detect the attended speaker among the pre-enrolled speakers from the elicited EEG signals. In this manner, we avoid relying on the speech stimuli for AAD at run-time. In doing so, we propose a novel EEG-based attended speaker detection (E-ASD) task. Methods: We encode a speaker's voice with a fixed dimensional vector, known as speaker embedding, and project it to an audio-derived voice signature, which characterizes the speaker's unique voice regardless of the spoken content. We hypothesize that such a voice signature also exists in the listener's brain that can be decoded from the elicited EEG signals, referred to as EEG-derived voice signature. By comparing the audio-derived voice signature and the EEG-derived voice signature, we are able to effectively detect the attended speaker in the listening brain. Results: Experiments show that E-ASD can effectively detect the attended speaker from the 0.5s EEG decision windows, achieving 99.78% AAD accuracy, 99.94% AUC, and 0.27% EER. Conclusion: We conclude that it is possible to derive the attended speaker's voice signature from the EEG signals so as to detect the attended speaker in a listening brain. Significance: We present the first proof of concept for detecting the attended speaker from the elicited EEG signals in a cocktail party environment. The successful implementation of E-ASD marks a non-trivial, but crucial step towards smart hearing aids.

Submitted 28 August, 2023; originally announced August 2023.

Comments: 8 pages, 2 figures

arXiv:2308.06547 [pdf, other]

Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition

Authors: Han Zhu, Dongji Gao, Gaofeng Cheng, Daniel Povey, Pengyuan Zhang, Yonghong Yan

Abstract: When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition. However, pseudo-labels are often noisy, containing numerous incorrect tokens. Taking noisy labels as ground-truth in the loss function results in suboptimal performance. Previous works attempted to mitigate this issue by either filtering out the noisiest pseudo-labels or improving the overall quality of pseudo-labels. While these methods are effective to some extent, it is unrealistic to entirely eliminate incorrect tokens in pseudo-labels. In this work, we propose a novel framework named alternative pseudo-labeling to tackle the issue of noisy pseudo-labels from the perspective of the training objective. The framework comprises several components. Firstly, a generalized CTC loss function is introduced to handle noisy pseudo-labels by accepting alternative tokens in the positions of incorrect tokens. Applying this loss function in pseudo-labeling requires detecting incorrect tokens in the predicted pseudo-labels. In this work, we adopt a confidence-based error detection method that identifies the incorrect tokens by comparing their confidence scores with a given threshold, thus necessitating the confidence score to be discriminative. Hence, the second proposed technique is the contrastive CTC loss function that widens the confidence gap between the correctly and incorrectly predicted tokens, thereby improving the error detection ability. Additionally, obtaining satisfactory performance with confidence-based error detection typically requires extensive threshold tuning. Instead, we propose an automatic thresholding method that uses labeled data as a proxy for determining the threshold, thus saving the pain of manual tuning.

Submitted 12 August, 2023; originally announced August 2023.

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2023

arXiv:2308.02531 [pdf]

Choir Transformer: Generating Polyphonic Music with Relative Attention on Transformer

Authors: Jiuyang Zhou, Hong Zhu, ** Wang

Abstract: Polyphonic music generation is still a challenging direction due to its correct between generating melody and harmony. Most of the previous studies used RNN-based models. However, it is hard for RNN-based models to establish the relationship between long-distance notes. In this paper, we propose a polyphonic music generation neural network named Choir Transformer (https://github.com/Zjy0401/choir-transformer), with relative positional attention to better model the structure of music. We also propose a music representation suitable for polyphonic music generation. The performance of Choir Transformer surpasses the previous state-of-the-art accuracy of 4.06%. We also measure the harmony metrics of polyphonic music. Experiments show that the harmony metrics are close to the music of Bach. In practical application, the generated melody and rhythm can be adjusted according to the specified input, with different styles of music such as folk music or pop music.

Submitted 1 August, 2023; originally announced August 2023.

arXiv:2308.02412 [pdf, other]

Self-Supervised Learning for WiFi CSI-Based Human Activity Recognition: A Systematic Study

Authors: Ke Xu, Jiangtao Wang, Hongyuan Zhu, Dingchang Zheng

Abstract: Recently, with the advancement of the Internet of Things (IoT), WiFi CSI-based HAR has gained increasing attention from academic and industry communities. By integrating the deep learning technology with CSI-based HAR, researchers achieve state-of-the-art performance without the need of expert knowledge. However, the scarcity of labeled CSI data remains the most prominent challenge when applying deep learning models in the context of CSI-based HAR due to the privacy and incomprehensibility of CSI-based HAR data. On the other hand, SSL has emerged as a promising approach for learning meaningful representations from data without heavy reliance on labeled examples. Therefore, considerable efforts have been made to address the challenge of insufficient data in deep learning by leveraging SSL algorithms. In this paper, we undertake a comprehensive inventory and analysis of the potential held by different categories of SSL algorithms, including those that have been previously studied and those that have not yet been explored, within the field. We provide an in-depth investigation of SSL algorithms in the context of WiFi CSI-based HAR. We evaluate four categories of SSL algorithms using three publicly available CSI HAR datasets, each encompassing different tasks and environmental settings. To ensure relevance to real-world applications, we design performance metrics that align with specific requirements. Furthermore, our experimental findings uncover several limitations and blind spots in existing work, highlighting the barriers that need to be addressed before SSL can be effectively deployed in real-world WiFi-based HAR applications. Our results also serve as a practical guideline for industry practitioners and provide valuable insights for future research endeavors in this field.

Submitted 19 July, 2023; originally announced August 2023.

arXiv:2307.12871 [pdf, other]

Topology-aware Piecewise Linearization of the AC Power Flow through Generative Modeling

Authors: Young-ho Cho, Hao Zhu

Abstract: Effective power flow modeling critically affects the ability to efficiently solve large-scale grid optimization problems, especially those with topology-related decision variables. In this work, we put forth a generative modeling approach to obtain a piecewise linear (PWL) approximation of AC power flow by training a simple neural network model from actual data samples. By using the ReLU activation, the NN models can produce a PWL mapping from the input voltage magnitudes and angles to the output power flow and injection. Our proposed generative PWL model uniquely accounts for the nonlinear and topology-related couplings of power flow models, and thus it can greatly improve the accuracy and consistency of output power variables. Most importantly, it enables reformulating the nonlinear power flow and line status-related constraints into mixed-integer linear ones, such that one can efficiently solve grid topology optimization tasks like the AC optimal transmission switching (OTS) problem. Numerical tests using the IEEE 14- and 118-bus test systems have demonstrated the modeling accuracy of the proposed PWL approximation using a generative approach, as well as its ability in enabling competitive OTS solutions at very low computation order.

Submitted 24 July, 2023; originally announced July 2023.

arXiv:2306.15433 [pdf, other]

Recursive LMMSE-Based Iterative Soft Interference Cancellation for MIMO Systems to Save Computations and Memories

Authors: Hufei Zhu, Fuqin Deng, Yikui Zhai, Jiaming Zhong, Yanyang Liang

Abstract: Firstly, a reordered description is given for the linear minimum mean square error (LMMSE)-based iterative soft interference cancellation (ISIC) detection process for multiple-input multiple-output (MIMO) wireless communication systems, which is based on the equivalent channel matrix. Then the above reordered description is applied to compare the detection process for LMMSE-ISIC with that for the hard decision (HD)-based ordered successive interference cancellation (OSIC) scheme, to draw the conclusion that the former is the extension of the latter. Finally, the recursive scheme for HD-OSIC with reduced complexity and memory saving is extended to propose the recursive scheme for LMMSE-ISIC, where the required computations and memories are reduced by computing the filtering bias and the estimate from the Hermitian inverse matrix and the symbol estimate vector, and updating the Hermitian inverse matrix and the symbol estimate vector efficiently. Assume N transmitters and M (no less than N) receivers in the MIMO system. Compared to the existing low-complexity LMMSE-ISIC scheme, the proposed recursive LMMSE-ISIC scheme requires no more than 1/6 computations and no more than 1/5 memory units.

Submitted 5 December, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

arXiv:2305.08541 [pdf, other]

Ripple sparse self-attention for monaural speech enhancement

Authors: Qiquan Zhang, Hongxu Zhu, Qi Song, Xinyuan Qian, Zhaoheng Ni, Haizhou Li

Abstract: The use of Transformer represents a recent success in speech enhancement. However, as its core component, self-attention suffers from quadratic complexity, which is computationally prohibitive for long speech recordings. Moreover, it allows each time frame to attend to all time frames, neglecting the strong local correlations of speech signals. This study presents a simple yet effective sparse self-attention for speech enhancement, called ripple attention, which simultaneously performs fine- and coarse-grained modeling for local and global dependencies, respectively. Specifically, we employ local band attention to enable each frame to attend to its closest neighbor frames in a window at fine granularity, while employing dilated attention outside the window to model the global dependencies at a coarse granularity. We evaluate the efficacy of our ripple attention for speech enhancement on two commonly used training objectives. Extensive experimental results consistently confirm the superior performance of the ripple attention design over standard full self-attention, blockwise attention, and dual-path attention (Sep-Former) in terms of speech quality and intelligibility.

Submitted 15 May, 2023; originally announced May 2023.

Comments: 5 pages, ICASSP 2023 published

arXiv:2305.05152 [pdf, other]

Who is Speaking Actually? Robust and Versatile Speaker Traceability for Voice Conversion

Authors: Yanzhen Ren, Hongcheng Zhu, Liming Zhai, Zongkun Sun, Rubing Shen, Lina Wang

Abstract: Voice conversion (VC), as a voice style transfer technology, is becoming increasingly prevalent while raising serious concerns about its illegal use. Proactively tracing the origins of VC-generated speeches, i.e., speaker traceability, can prevent the misuse of VC, but unfortunately has not been extensively studied. In this paper, we are the first to investigate the speaker traceability for VC and propose a traceable VC framework named VoxTracer. Our VoxTracer is similar to but beyond the paradigm of audio watermarking. We first use unique speaker embedding to represent speaker identity. Then we design a VAE-Glow structure, in which the hiding process imperceptibly integrates the source speaker identity into the VC, and the tracing process accurately recovers the source speaker identity and even the source speech in spite of severe speech quality degradation. To address the speech mismatch between the hiding and tracing processes affected by different distortions, we also adopt an asynchronous training strategy to optimize the VAE-Glow models. The VoxTracer is versatile enough to be applied to arbitrary VC methods and popular audio coding standards. Extensive experiments demonstrate that the VoxTracer achieves not only high imperceptibility in hiding, but also nearly 100% tracing accuracy against various types of audio lossy compressions (AAC, MP3, Opus and SILK) with a broad range of bitrates (16 kbps - 128 kbps) even in a very short time duration (0.74s). Our speech demo is available at https://anonymous.4open.science/w/DEMOofVoxTracer.

Submitted 26 July, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

Comments: has been accepted by ACM MM 2023

arXiv:2305.04294

PELE scores: Pelvic X-ray Landmark Detection by Pelvis Extraction and Enhancement

Authors: Zhen Huang, Han Li, Shitong Shao, Heqin Zhu, Huijie Hu, Zhiwei Cheng, Jianji Wang, S. Kevin Zhou

Abstract: The pelvis, the lower part of the trunk, supports and balances the trunk. Landmark detection from a pelvic X-ray (PXR) facilitates downstream analysis and computer-assisted diagnosis and treatment of pelvic diseases. Although PXRs have the advantages of low radiation and reduced cost compared to computed tomography (CT) images, their 2D pelvis-tissue superposition of 3D structures confuses clinical decision-making. In this paper, we propose a PELvis Extraction (PELE) module that utilizes 3D prior anatomical knowledge in CT to guide and well isolate the pelvis from PXRs, thereby eliminating the influence of soft tissue. We conduct an extensive evaluation based on two public datasets and one private dataset, totaling 850 PXRs. The experimental results show that the proposed PELE module significantly improves the accuracy of PXRs landmark detection and achieves state-of-the-art performances in several benchmark metrics, thus better serving downstream tasks.

Submitted 7 June, 2023; v1 submitted 7 May, 2023; originally announced May 2023.

Comments: will revise it and resubmit it again later

arXiv:2304.11039 [pdf, other]

An Optimization Framework For Anomaly Detection Scores Refinement With Side Information

Authors: Ali Maatouk, Fadhel Ayed, Wenjie Li, Yu Wang, Hong Zhu, Jiantao Ye

Abstract: This paper considers an anomaly detection problem in which a detection algorithm assigns anomaly scores to multi-dimensional data points, such as cellular networks' Key Performance Indicators (KPIs). We propose an optimization framework to refine these anomaly scores by leveraging side information in the form of a causality graph between the various features of the data points. The refinement block builds on causality theory and a proposed notion of confidence scores. After motivating our framework, smoothness properties are proved for the ensuing mathematical expressions. Next, equipped with these results, a gradient descent algorithm is proposed, and a proof of its convergence to a stationary point is provided. Our results hold (i) for any causal anomaly detection algorithm and (ii) for any side information in the form of a directed acyclic graph. Numerical results are provided to illustrate the advantage of our proposed framework in dealing with False Positives (FPs) and False Negatives (FNs). Additionally, the effect of the graph's structure on the expected performance advantage and the various trade-offs that take place are analyzed.

Submitted 30 August, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

arXiv:2304.02606 [pdf, other]

Two-Timescale Design for RIS-aided Cell-free Massive MIMO Systems with Imperfect CSI

Authors: Mahdi Eskandari, Kangda Zhi, Huiling Zhu, Cunhua Pan, Jiangzhou Wang

Abstract: The objective of this paper is to evaluate the effectiveness of a two-timescale transmission design in cell-free massive multi-input multiple-output (MIMO) systems incorporating reconfigurable intelligent surfaces (RISs) under the assumption of imperfect channel state information (CSI). We examine the Rician channel model and formulate the passive beamforming for the RISs based on statistical channel state information (S-CSI). To that end, we put forth a linear minimum mean square error (LMMSE) estimator with the aim of estimating the aggregation of channels from the users to the APs within each channel coherence interval. Meanwhile, the active beamforming for the radio units (APs) is executed using the maximum ratio combining (MRC) approach, which utilizes the instantaneous aggregated channels that result from the combination of the direct and reflected channels from the RISs. Subsequently, we derive the closed-form expressions of the achievable uplink spectral efficiency (SE), which is a function of S-CSI elements such as distance-dependent path loss, Rician factors as well as the number of RIS elements and AP antennas. We then optimize the phase shifts of the RISs to maximize the sum SE of the users, utilizing the soft actor-critic (SAC) which is a deep reinforcement learning (RL) method, and relying on the derived closed-form expressions. Numerical evaluations affirm that, despite the presence of imperfect CSI, the deployment of RIS in cell-free systems can lead to significant performance improvement.

Submitted 14 March, 2023; originally announced April 2023.

arXiv:2304.00837 [pdf, other]

Disorder-invariant Implicit Neural Representation

Authors: Hao Zhu, Shaowen Xie, Zhen Liu, Fengyi Liu, Qi Zhang, You Zhou, Yi Lin, Zhan Ma, Xun Cao

Abstract: Implicit neural representation (INR) characterizes the attributes of a signal as a function of corresponding coordinates which emerges as a sharp weapon for solving inverse problems. However, the expressive power of INR is limited by the spectral bias in the network training. In this paper, we find that such a frequency-related problem could be greatly solved by re-arranging the coordinates of the input signal, for which we propose the disorder-invariant implicit neural representation (DINER) by augmenting a hash-table to a traditional INR backbone. Given discrete signals sharing the same histogram of attributes and different arrangement orders, the hash-table could project the coordinates into the same distribution for which the mapped signal can be better modeled using the subsequent INR network, leading to significantly alleviated spectral bias. Furthermore, the expressive power of the DINER is determined by the width of the hash-table. Different width corresponds to different geometrical elements in the attribute space, e.g., 1D curve, 2D curved-plane and 3D curved-volume when the width is set as 1, 2 and 3, respectively. More covered areas of the geometrical elements result in stronger expressive power. Experiments not only reveal the generalization of the DINER for different INR backbones (MLP vs. SIREN) and various tasks (image/video representation, phase retrieval, refractive index recovery, and neural radiance field optimization) but also show the superiority over the state-of-the-art algorithms both in quality and speed. Project page: https://ezio77.github.io/DINER-website/

Submitted 3 April, 2023; originally announced April 2023.

Comments: Journal extension of the CVPR'23 highlight paper "DINER: Disorder-invariant Implicit Neural Representation". In the extension, we model the expressive power of the DINER using parametric functions in the attribute space. As a result, better results are achieved than the conference version. arXiv admin note: substantial text overlap with arXiv:2211.07871

arXiv:2303.07558 [pdf, other]

Optimal Power System Topology Control Under Uncertain Wildfire Risk

Authors: Yuqi Zhou, Kaarthik Sundar, Deepjyoti Deka, Hao Zhu

Abstract: Wildfires pose a significant threat to the safe and reliable operation of electric power systems. They can quickly spread and cause severe damage to power infrastructure. To reduce the risk, public safety power shutoffs are often used to restore power balance and prevent widespread blackouts. However, the unpredictability of wildfires makes it challenging to implement effective counter-measures in a timely manner. This paper proposes an optimization-based topology control problem as a solution to mitigate the impact of wildfires. The goal is to find the optimal network topology that minimizes total operating costs and prevents load shedding during power shutoffs under uncertain line shutoff scenarios caused by uncertain spreading wildfires. The solution involves solving two-stage stochastic mixed-integer linear programs, with preventive and corrective control actions taken when the risk of a wildfire and corresponding outage line scenarios are known with low and high confidence, respectively. The Progressive Hedging algorithm is used to solve both problems. The effectiveness of the proposed approach is demonstrated using data from the RTS-GMLC system that is artificially geo-located in Southern California, including actual wildfire incidents and risk maps. Our work provides a crucial study of the comparative benefits due to accurate risk forecast and corresponding preventive control over real-time corrective control that may not be realistic.

Submitted 13 March, 2023; originally announced March 2023.

Comments: 10 pages, 6 figures

arXiv:2303.03575 [pdf, other]

Adaptive Importance Sampling and Quasi-Monte Carlo Methods for 6G URLLC Systems

Authors: Xiongwen Ke, Houying Zhu, Kai Yi, Gaoning He, Ganghua Yang, Yu Guang Wang

Abstract: In this paper, we propose an efficient simulation method based on adaptive importance sampling, which can automatically find the optimal proposal within the Gaussian family based on previous samples, to evaluate the probability of bit error rate (BER) or word error rate (WER). These two measures, which involve high-dimensional black-box integration and rare-event sampling, can characterize the performance of coded modulation. We further integrate the quasi-Monte Carlo method within our framework to improve the convergence speed. The proposed importance sampling algorithm is demonstrated to have much higher efficiency than the standard Monte Carlo method in the AWGN scenario.

Submitted 6 March, 2023; originally announced March 2023.

Comments: importance sampling for system model

arXiv:2302.11464 [pdf, other]

doi 10.1109/TMM.2023.3312851

Gap-closing Matters: Perceptual Quality Evaluation and Optimization of Low-Light Image Enhancement

Authors: Baoliang Chen, Lingyu Zhu, Hanwei Zhu, Wenhan Yang, Linqi Song, Shiqi Wang

Abstract: There is a growing consensus in the research community that the optimization of low-light image enhancement approaches should be guided by the visual quality perceived by end users. Despite the substantial efforts invested in the design of low-light enhancement algorithms, there has been comparatively limited focus on assessing subjective and objective quality systematically. To mitigate this gap and provide a clear path towards optimizing low-light image enhancement for better visual quality, we propose a gap-closing framework. In particular, our gap-closing framework starts with the creation of a large-scale dataset for Subjective QUality Assessment of REconstructed LOw-Light Images (SQUARE-LOL). This database serves as the foundation for studying the quality of enhanced images and conducting a comprehensive subjective user study. Subsequently, we propose an objective quality assessment measure that plays a critical role in bridging the gap between visual quality and enhancement. Finally, we demonstrate that our proposed objective quality measure can be incorporated into the process of optimizing the learning of the enhancement model toward perceptual optimality. We validate the effectiveness of our proposed framework through both the accuracy of quality prediction and the perceptual quality of image enhancement. Our database and codes are publicly available at https://github.com/Baoliang93/IACA_For_Lowlight_IQA.

Submitted 20 June, 2024; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: Basis Angle Consistency in Sec.3.2 will be revised

arXiv:2302.08660 [pdf, ps, other]

Improved Recursive Algorithms for V-BLAST to Save Computations and Memories

Authors: Hufei Zhu, Yanyang Liang, Fuqin Deng, Genquan Chen, Jiaming Zhong

Abstract: For vertical Bell Laboratories layered space-time architecture (V-BLAST), the original fast recursive algorithm was proposed, and then Improvements I-IV were introduced to further reduce the complexity. The existing recursive algorithm with speed advantage and that with memory saving incorporate Improvements I-IV and only Improvements III-IV into the original algorithm, respectively. This paper proposes Improvements V and VI to replace Improvements I and II, respectively. Instead of the lemma for inversion of partitioned matrix applied in Improvement I, Improvement V uses another lemma to speed up the matrix inversion step by 167%. Then the formulas adopted in our Improvement V are applied to deduce Improvement VI for interference cancellation, which saves memories without sacrificing speed compared to Improvement II. In the existing algorithm with speed advantage, the proposed algorithm I with speed advantage replaces Improvement I with Improvement V, while the proposed algorithm II with both speed advantage and memory saving replaces Improvements I and II with Improvements V and VI, respectively. Both proposed algorithms speed up the existing algorithm with speed advantage by 130%, while the proposed algorithm II achieves the speedup of 186% and saves about half memories, compared to the existing algorithm with memory saving.

Submitted 5 December, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

arXiv:2302.05812 [pdf, other]

Software-Defined MIMO OFDM Joint Radar-Communication Platform with Fully Digital mmWave Architecture

Authors: Ceyhun D. Ozkaptan, Haocheng Zhu, Eylem Ekici, Onur Altintas

Abstract: Large-scale deployment of connected vehicles with cooperative sensing and maneuvering technologies increases the demand for vehicle-to-everything communication (V2X) band in 5.9 GHz. Besides the V2X spectrum, the under-utilized millimeter-wave (mmWave) bands at 24 and 77 GHz can be leveraged to supplement V2X communication and support high data rates for emerging broadband applications. For this purpose, joint radar-communication (JRC) systems have been proposed in the literature to perform both functions using the same waveform and hardware. In this work, we present a software-defined multiple-input and multiple-output (MIMO) JRC with orthogonal frequency division multiplexing (OFDM) for the 24 GHz mmWave band. We implement a real-time operating full-duplex JRC platform using commercially available software-defined radios and custom-built mmWave front-ends. With fully digital MIMO architecture, we demonstrate simultaneous data transmission and high-resolution radar imaging capabilities of MIMO OFDM JRC in the mmWave band.

Submitted 11 February, 2023; originally announced February 2023.

Comments: To appear at 3rd IEEE International Symposium on Joint Communications & Sensing (JC&S 2023)

ACM Class: B.4.1; B.4.5
