Search | arXiv e-print repository

Exploiting Structured Sparsity in Near Field: From the Perspective of Decomposition

Authors: Xufeng Guo, Yuanbin Chen, Ying Wang, Chau Yuen

Abstract: The structured sparsity can be leveraged in traditional far-field channels, greatly facilitating efficient sparse channel recovery by compressing the complexity of overheads to the level of the scatterer number. However, when experiencing a fundamental shift from planar-wave-based far-field modeling to spherical-wave-based near-field modeling, whether these benefits persist in the near-field regime remains an open issue. To answer this question, this article delves into structured sparsity in the near-field realm, examining its peculiarities and challenges. In particular, we present the key features of near-field structured sparsity in contrast to the far-field counterpart, drawing from both physical and mathematical perspectives. Upon unmasking the theoretical bottlenecks, we resort to bypassing them by decoupling the geometric parameters of the scatterers, termed the triple parametric decomposition (TPD) framework. It is demonstrated that our novel TPD framework can achieve robust recovery of near-field sparse channels by applying the potential structured sparsity and avoiding the curse of complexity and overhead.

Submitted 27 June, 2024; originally announced June 2024.

Comments: This article has been accepted for publication in IEEE Commag

arXiv:2406.14064 [pdf, other]

PAPR Reduction with Pre-chirp Selection for Affine Frequency Division Multiple

Authors: Haozhi Yuan, Yin Xu, Xinghao Guo, Tianyao Ma, Haoyang Li, Dazhi He, Wenjun Zhang

Abstract: Affine frequency division multiplexing (AFDM) is a promising new multicarrier technique based on discrete affine Fourier transform (DAFT). By properly tuning pre-chirp parameter and post-chirp parameter in the DAFT, the effective channel in the DAFT domain can completely avoid overlap of different paths, thus constitutes a full representation of delay-Doppler profile, which significantly improves the system performance in high mobility scenarios. However, AFDM has the crucial problem of high peak-to-average power ratio (PAPR) caused by phase randomness of modulated symbols. In this letter, an algorithm named grouped pre-chirp selection (GPS) is proposed to reduce the PAPR by changing the value of pre-chirp parameter on sub-carriers group by group. Specifically, it is demonstrated first that the important properties of AFDM system are maintained when implementing GPS. Secondly, we elaborate the operation steps of GPS algorithm, illustrating its effect on PAPR reduction and its advantage in terms of computational complexity compared with the ungrouped approach. Finally, simulation results of PAPR reduction in the form of complementary cumulative distribution function (CCDF) show the effectiveness of the proposed GPS algorithm.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.08374 [pdf, other]

2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction

Authors: Tianqi Chen, Jun Hou, Yinchi Zhou, Huidong Xie, Xiongchao Chen, Qiong Liu, Xueqi Guo, Menghua Xia, James S. Duncan, Chi Liu, Bo Zhou

Abstract: Positron Emission Tomography (PET) is an important clinical imaging tool but inevitably introduces radiation hazards to patients and healthcare providers. Reducing the tracer injection dose and eliminating the CT acquisition for attenuation correction can reduce the overall radiation dose, but often results in PET with high noise and bias. Thus, it is desirable to develop 3D methods to translate the non-attenuation-corrected low-dose PET (NAC-LDPET) into attenuation-corrected standard-dose PET (AC-SDPET). Recently, diffusion models have emerged as a new state-of-the-art deep learning method for image-to-image translation, better than traditional CNN-based methods. However, due to the high computation cost and memory burden, it is largely limited to 2D applications. To address these challenges, we developed a novel 2.5D Multi-view Averaging Diffusion Model (MADM) for 3D image-to-image translation with application on NAC-LDPET to AC-SDPET translation. Specifically, MADM employs separate diffusion models for axial, coronal, and sagittal views, whose outputs are averaged in each sampling step to ensure the 3D generation quality from multiple views. To accelerate the 3D sampling process, we also proposed a strategy to use the CNN-based 3D generation as a prior for the diffusion model. Our experimental results on human patient studies suggested that MADM can generate high-quality 3D translation images, outperforming previous CNN-based and Diffusion-based baseline methods.

Submitted 15 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 15 pages, 7 figures

arXiv:2406.02422 [pdf, other]

IterMask2: Iterative Unsupervised Anomaly Segmentation via Spatial and Frequency Masking for Brain Lesions in MRI

Authors: Ziyun Liang, Xiaoqing Guo, J. Alison Noble, Konstantinos Kamnitsas

Abstract: Unsupervised anomaly segmentation approaches to pathology segmentation train a model on images of healthy subjects, that they define as the 'normal' data distribution. At inference, they aim to segment any pathologies in new images as 'anomalies', as they exhibit patterns that deviate from those in 'normal' training data. Prevailing methods follow the 'corrupt-and-reconstruct' paradigm. They intentionally corrupt an input image, reconstruct it to follow the learned 'normal' distribution, and subsequently segment anomalies based on reconstruction error. Corrupting an input image, however, inevitably leads to suboptimal reconstruction even of normal regions, causing false positives. To alleviate this, we propose a novel iterative spatial mask-refining strategy IterMask2. We iteratively mask areas of the image, reconstruct them, and update the mask based on reconstruction error. This iterative process progressively adds information about areas that are confidently normal as per the model. The increasing content guides reconstruction of nearby masked areas, improving reconstruction of normal tissue under these areas, reducing false positives. We also use high-frequency image content as an auxiliary input to provide additional structural information for masked areas. This further improves reconstruction error of normal in comparison to anomalous areas, facilitating segmentation of the latter. We conduct experiments on several brain lesion datasets and demonstrate effectiveness of our method. Code is available at: https://github.com/ZiyunLiang/IterMask2

Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.02164 [pdf, other]

Sparse Recovery for Holographic MIMO Channels: Leveraging the Clustered Sparsity

Authors: Yuqing Guo, Xufeng Guo, Yuanbin Chen, Ying Wang

Abstract: Envisioned as the next-generation transceiver technology, the holographic multiple-input-multiple-output (HMIMO) garners attention for its superior capabilities of fabricating electromagnetic (EM) waves. However, the densely packed antenna elements significantly increase the dimension of the HMIMO channel matrix, rendering traditional channel estimation methods inefficient. While the dimension curse can be relieved to avoid the proportional increase with the antenna density using the state-of-the-art wavenumber-domain sparse representation, the sparse recovery complexity remains tied to the order of non-zero elements in the sparse channel, which still considerably exceeds the number of scatterers. By modeling the inherent clustered sparsity using a Gaussian mixed model (GMM)-based von Mises-Fisher (vMF) distribution, the to-be-estimated channel characteristics can be compressed to the scatterer level. Upon the sparsity extraction, a novel wavenumber-domain expectation-maximization (WD-EM) algorithm is proposed to implement the cluster-by-cluster variational inference, thus significantly reducing the computational complexity. Simulation results verify the robustness of the proposed scheme across overheads and signal-to-noise ratio (SNR).

Submitted 4 June, 2024; originally announced June 2024.

Comments: This manuscript has been submitted to IEEE journal, 5 pages, 3 figures

arXiv:2405.19659 [pdf, other]

CSANet: Channel Spatial Attention Network for Robust 3D Face Alignment and Reconstruction

Authors: Yilin Liu, Xuezhou Guo, Xinqi Wang, Fangzhou Du

Abstract: Our project proposes an end-to-end 3D face alignment and reconstruction network. The backbone of our model is built by Bottle-Neck structure via Depth-wise Separable Convolution. We integrate Coordinate Attention mechanism and Spatial Group-wise Enhancement to extract more representative features. For more stable training process and better convergence, we jointly use Wing loss and the Weighted Parameter Distance Cost to learn parameters for 3D Morphable model and 3D vertices. Our proposed model outperforms all baseline models both quantitatively and qualitatively.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 10 pages, 6 figures

arXiv:2405.12996 [pdf, other]

Dose-aware Diffusion Model for 3D Low-dose PET: Multi-institutional Validation with Reader Study and Real Low-dose Data

Authors: Huidong Xie, Weijie Gan, Bo Zhou, Ming-Kai Chen, Michal Kulon, Annemarie Boustani, Benjamin A. Spencer, Reimund Bayerlein, Xiongchao Chen, Qiong Liu, Xueqi Guo, Menghua Xia, Yinchi Zhou, Hui Liu, Liang Guo, Hongyu An, Ulugbek S. Kamilov, Hanzhong Wang, Biao Li, Axel Rominger, Kuangyu Shi, Ge Wang, Ramsey D. Badawi, Chi Liu

Abstract: As PET imaging is accompanied by radiation exposure and potentially increased cancer risk, reducing radiation dose in PET scans without compromising the image quality is an important topic. Deep learning (DL) techniques have been investigated for low-dose PET imaging. However, existing models have often resulted in compromised image quality when achieving low-dose PET and have limited generalizability to different image noise-levels, acquisition protocols, patient populations, and hospitals. Recently, diffusion models have emerged as the new state-of-the-art generative model to generate high-quality samples and have demonstrated strong potential for medical imaging tasks. However, for low-dose PET imaging, existing diffusion models failed to generate consistent 3D reconstructions, unable to generalize across varying noise-levels, often produced visually-appealing but distorted image details, and produced images with biased tracer uptake. Here, we develop DDPET-3D, a dose-aware diffusion model for 3D low-dose PET imaging to address these challenges. Collected from 4 medical centers globally with different scanners and clinical protocols, we extensively evaluated the proposed model using a total of 9,783 18F-FDG studies (1,596 patients) with low-dose/low-count levels ranging from 1% to 50%. With a cross-center, cross-scanner validation, the proposed DDPET-3D demonstrated its potential to generalize to different low-dose levels, different scanners, and different clinical protocols.
As confirmed with reader studies performed by nuclear medicine physicians, the proposed method produced superior denoised results that are comparable to or even better than the 100% full-count images as well as previous DL baselines. The presented results show the potential of achieving low-dose PET while maintaining image quality. Lastly, a group of real low-dose scans was also included for evaluation.

Submitted 2 May, 2024; originally announced May 2024.

Comments: 16 Pages, 15 Figures, 4 Tables. Paper under review. arXiv admin note: substantial text overlap with arXiv:2311.04248

arXiv:2405.06995 [pdf, other]

Benchmarking Cross-Domain Audio-Visual Deception Detection

Authors: Xiaobao Guo, Zitong Yu, Nithish Muthuchamy Selvaraj, Bingquan Shen, Adams Wai-Kin Kong, Alex C. Kot

Abstract: Automated deception detection is crucial for assisting humans in accurately assessing truthfulness and identifying deceptive behavior. Conventional contact-based techniques, like polygraph devices, rely on physiological signals to determine the authenticity of an individual's statements. Nevertheless, recent developments in automated deception detection have demonstrated that multimodal features derived from both audio and video modalities may outperform human observers on publicly available datasets. Despite these positive findings, the generalizability of existing audio-visual deception detection approaches across different scenarios remains largely unexplored. To close this gap, we present the first cross-domain audio-visual deception detection benchmark, that enables us to assess how well these methods generalize for use in real-world scenarios. We used widely adopted audio and visual features and different architectures for benchmarking, comparing single-to-single and multi-to-single domain generalization performance. To further exploit the impacts using data from multiple source domains for training, we investigate three types of domain sampling strategies, including domain-simultaneous, domain-alternating, and domain-by-domain for multi-to-single domain generalization evaluation. Furthermore, we proposed the Attention-Mixer fusion method to improve performance, and we believe that this new cross-domain benchmark will facilitate future research in audio-visual deception detection.
Protocols and source code are available at https://github.com/Redaimao/cross_domain_DD.

Submitted 11 May, 2024; originally announced May 2024.

Comments: 10 pages

arXiv:2404.09131 [pdf, other]

Design of Artificial Interference Signals for Covert Communication Aided by Multiple Friendly Nodes

Authors: Xuyang Zhao, Wei Guo, Yongchao Wang

Abstract: In this paper, we consider a scenario of covert communication aided by multiple friendly interference nodes. The objective is to conceal the legitimate communication link under the surveillance of a warden. The main content is as follows: first, we propose a novel strategy for generating artificial noise signals in the considered covert scenario. Then, we leverage the statistical information of channel coefficients to optimize the basis matrix of the artificial noise signals space in the absence of accurate channel fading information between the friendly interference nodes and the legitimate receiver. The optimization problem aims to design artificial noise signals within the space to facilitate covert communication while minimizing the impact on the performance of legitimate communication. Second, a customized Riemannian Stochastic Variance Reduced Gradient (R-SVRG) algorithm is proposed to solve the non-convex problem. In the algorithm, we employ the Riemannian optimization framework to analyze the geometric structure of the basis matrix constraints and transform the original non-convex optimization problem into an unconstrained problem on the complex Stiefel manifold for solution. Third, we theoretically prove the convergence of the proposed algorithm to a stationary point. In the end, we evaluate the performance of the proposed strategy for generating artificial noise signals through numerical simulations. The results demonstrate that our approach significantly outperforms the Gaussian artificial noise strategy without optimization.

Submitted 9 May, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.03869 [pdf, other]

Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration

Authors: Xudong Guo, Daming Shi, Junjie Yu, Wenhui Fan

Abstract: The rise of multi-agent systems, especially the success of multi-agent reinforcement learning (MARL), is reshaping our future across diverse domains like autonomous vehicle networks. However, MARL still faces significant challenges, particularly in achieving zero-shot scalability, which allows trained MARL models to be directly applied to unseen tasks with varying numbers of agents. In addition, real-world multi-agent systems usually contain agents with different functions and strategies, while the existing scalable MARL methods only have limited heterogeneity. To address this, we propose a novel MARL framework named Scalable and Heterogeneous Proximal Policy Optimization (SHPPO), integrating heterogeneity into parameter-shared PPO-based MARL networks. We first leverage a latent network to adaptively learn strategy patterns for each agent. Second, we introduce a heterogeneous layer for decision-making, whose parameters are specifically generated by the learned latent variables. Our approach is scalable as all the parameters are shared except for the heterogeneous layer, and gains both inter-individual and temporal heterogeneity at the same time. We implement our approach based on the state-of-the-art backbone PPO-based algorithm as SHPPO, while our approach is agnostic to the backbone and can be seamlessly plugged into any parameter-shared MARL method.
SHPPO exhibits superior performance over the baselines such as MAPPO and HAPPO in classic MARL environments like Starcraft Multi-Agent Challenge (SMAC) and Google Research Football (GRF), showcasing enhanced zero-shot scalability and offering insights into the learned latent representation's impact on team performance by visualization.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2403.19002 [pdf, other]

Robust Active Speaker Detection in Noisy Environments

Authors: Siva Sai Nagender Vasireddy, Chenxu Zhang, Xiaohu Guo, Yapeng Tian

Abstract: This paper addresses the issue of active speaker detection (ASD) in noisy environments and formulates a robust active speaker detection (rASD) problem. Existing ASD approaches leverage both audio and visual modalities, but non-speech sounds in the surrounding environment can negatively impact performance. To overcome this, we propose a novel framework that utilizes audio-visual speech separation as guidance to learn noise-free audio features. These features are then utilized in an ASD model, and both tasks are jointly optimized in an end-to-end framework. Our proposed framework mitigates residual noise and audio quality reduction issues that can occur in a naive cascaded two-stage framework that directly uses separated speech for ASD, and enables the two tasks to be optimized simultaneously. To further enhance the robustness of the audio features and handle inherent speech noises, we propose a dynamic weighted loss approach to train the speech separator. We also collected a real-world noise audio dataset to facilitate investigations. Experiments demonstrate that non-speech audio noises significantly impact ASD models, and our proposed approach improves ASD performance in noisy environments. The framework is general and can be applied to different ASD approaches to improve their robustness. Our code, models, and data will be released.

Submitted 30 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: 15 pages, 5 figures

arXiv:2403.11071 [pdf, other]

Wavenumber Domain Sparse Channel Estimation in Holographic MIMO

Authors: Xufeng Guo, Yuanbin Chen, Ying Wang, Zhaocheng Wang, Zhu Han

Abstract: In this paper, we investigate the sparse channel estimation in holographic multiple-input multiple-output (HMIMO) systems. The conventional angular-domain representation fails to capture the continuous angular power spectrum characterized by the spatially-stationary electromagnetic random field, thus leading to the ambiguous detection of the significant angular power, which is referred to as the power leakage. To tackle this challenge, the HMIMO channel is represented in the wavenumber domain for exploring its cluster-dominated sparsity. Specifically, a finite set of Fourier harmonics acts as a series of sampling probes to encapsulate the integral of the power spectrum over specific angular regions. This technique effectively eliminates power leakage resulting from power mismatches induced by the use of discrete angular-domain probes. Next, the channel estimation problem is recast as a sparse recovery of the significant angular power spectrum over the continuous integration region. We then propose an accompanying graph-cut-based swap expansion (GCSE) algorithm to extract beneficial sparsity inherent in HMIMO channels. Numerical results demonstrate that this wavenumber-domain-based GCSE approach achieves robust performance with rapid convergence.

Submitted 16 March, 2024; originally announced March 2024.

Comments: This paper has been accepted in 2024 ICC

arXiv:2402.09567 [pdf, other]

TAI-GAN: A Temporally and Anatomically Informed Generative Adversarial Network for early-to-late frame conversion in dynamic cardiac PET inter-frame motion correction

Authors: Xueqi Guo, Luyao Shi, Xiongchao Chen, Qiong Liu, Bo Zhou, Huidong Xie, Yi-Hwa Liu, Richard Palyo, Edward J. Miller, Albert J. Sinusas, Lawrence H. Staib, Bruce Spottiswoode, Chi Liu, Nicha C. Dvornek

Abstract: Inter-frame motion in dynamic cardiac positron emission tomography (PET) using rubidium-82 (82-Rb) myocardial perfusion imaging impacts myocardial blood flow (MBF) quantification and the diagnosis accuracy of coronary artery diseases. However, the high cross-frame distribution variation due to rapid tracer kinetics poses a considerable challenge for inter-frame motion correction, especially for early frames where intensity-based image registration techniques often fail. To address this issue, we propose a novel method called Temporally and Anatomically Informed Generative Adversarial Network (TAI-GAN) that utilizes an all-to-one mapping to convert early frames into those with tracer distribution similar to the last reference frame. The TAI-GAN consists of a feature-wise linear modulation layer that encodes channel-wise parameters generated from temporal information and rough cardiac segmentation masks with local shifts that serve as anatomical information. Our proposed method was evaluated on a clinical 82-Rb PET dataset, and the results show that our TAI-GAN can produce converted early frames with high image quality, comparable to the real reference frames. After TAI-GAN conversion, the motion estimation accuracy and subsequent myocardial blood flow (MBF) quantification with both conventional and deep learning-based motion correction methods were improved compared to using the original frames.

Submitted 14 February, 2024; originally announced February 2024.

Comments: Under revision at Medical Image Analysis

arXiv:2401.14285 [pdf, other]

POUR-Net: A Population-Prior-Aided Over-Under-Representation Network for Low-Count PET Attenuation Map Generation

Authors: Bo Zhou, Jun Hou, Tianqi Chen, Yinchi Zhou, Xiongchao Chen, Huidong Xie, Qiong Liu, Xueqi Guo, Yu-Jung Tsai, Vladimir Y. Panin, Takuya Toyonaga, James S. Duncan, Chi Liu

Abstract: Low-dose PET offers a valuable means of minimizing radiation exposure in PET imaging. However, the prevalent practice of employing additional CT scans for generating attenuation maps ($μ$-maps) for PET attenuation correction significantly elevates radiation doses. To address this concern and further mitigate radiation exposure in low-dose PET exams, we propose POUR-Net - an innovative population-prior-aided over-under-representation network that aims for high-quality attenuation map generation from low-dose PET. First, POUR-Net incorporates an over-under-representation network (OUR-Net) to facilitate efficient feature extraction, encompassing both low-resolution abstracted and fine-detail features, for assisting deep generation on the full-resolution level. Second, complementing OUR-Net, a population prior generation machine (PPGM) utilizing a comprehensive CT-derived $μ$-map dataset, provides additional prior information to aid OUR-Net generation. The integration of OUR-Net and PPGM within a cascade framework enables iterative refinement of $μ$-map generation, resulting in the production of high-quality $μ$-maps. Experimental results underscore the effectiveness of POUR-Net, showing it as a promising solution for accurate CT-free low-count PET attenuation correction, which also surpasses the performance of previous baseline methods.

Submitted 25 January, 2024; originally announced January 2024.

Comments: 10 pages, 5 figures

arXiv:2401.13140 [pdf, other]

Dual-Domain Coarse-to-Fine Progressive Estimation Network for Simultaneous Denoising, Limited-View Reconstruction, and Attenuation Correction of Cardiac SPECT

Authors: Xiongchao Chen, Bo Zhou, Xueqi Guo, Huidong Xie, Qiong Liu, James S. Duncan, Albert J. Sinusas, Chi Liu

Abstract: Single-Photon Emission Computed Tomography (SPECT) is widely applied for the diagnosis of coronary artery diseases. Low-dose (LD) SPECT aims to minimize radiation exposure but leads to increased image noise. Limited-view (LV) SPECT, such as the latest GE MyoSPECT ES system, enables accelerated scanning and reduces hardware expenses but degrades reconstruction accuracy. Additionally, Computed Tomography (CT) is commonly used to derive attenuation maps ($μ$-maps) for attenuation correction (AC) of cardiac SPECT, but it will introduce additional radiation exposure and SPECT-CT misalignments. Although various methods have been developed to solely focus on LD denoising, LV reconstruction, or CT-free AC in SPECT, the solution for simultaneously addressing these tasks remains challenging and under-explored. Furthermore, it is essential to explore the potential of fusing cross-domain and cross-modality information across these interrelated tasks to further enhance the accuracy of each task. Thus, we propose a Dual-Domain Coarse-to-Fine Progressive Network (DuDoCFNet), a multi-task learning method for simultaneous LD denoising, LV reconstruction, and CT-free $μ$-map generation of cardiac SPECT. Paired dual-domain networks in DuDoCFNet are cascaded using a multi-layer fusion mechanism for cross-domain and cross-modality feature fusion. Two-stage progressive learning strategies are applied in both projection and image domains to achieve coarse-to-fine estimations of SPECT projections and CT-derived $μ$-maps.
Our experiments demonstrate DuDoCFNet's superior accuracy in estimating projections, generating $μ$-maps, and AC reconstructions compared to existing single- or multi-task learning methods, under various iterations and LD levels. The source code of this work is available at https://github.com/XiongchaoChen/DuDoCFNet-MultiTask.

Submitted 23 January, 2024; originally announced January 2024.

Comments: 11 Pages, 10 figures, 4 tables

arXiv:2401.08120 [pdf]

Operation Scheme Optimizations to Achieve Ultra-high Endurance ($10^{10}$) in Flash Memory with Robust Reliabilities

Authors: Yang Feng, Zhaohui Sun, Chengcheng Wang, Xinyi Guo, Junyao Mei, Yueran Qi, **g Liu, Junyu Zhang, Jixuan Wu, Xuepeng Zhan, Jiezhi Chen

Abstract: Flash memory has been widely adopted as stand-alone memory and embedded memory due to its robust reliability. However, its limited endurance hinders further applications in storage class memory (SCM) and in endurance-required computing-in-memory (CIM) tasks. In this work, optimization strategies have been studied to tackle this concern. It is shown that by adopting channel hot electron injection (CHEI) and hot hole injection (HHI) to implement program/erase (PE) cycling together with a balanced memory window (MW) at the high-Vth (HV) mode, the endurance can be greatly extended to $10^{10}$ PE cycles, which is a record-high value in flash memory. Moreover, by using the proposed electric-field-assisted relaxation (EAR) scheme, the degradation of flash cells can be well suppressed with better subthreshold swings (SS) and lower leakage currents (sub-10 pA after $10^{10}$ PE cycles). Our results shed light on the optimization strategy of flash memory to serve as SCM and implement endurance-required CIM tasks.

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2312.13013 [pdf, ps, other]

User-Assisted Networked Sensing in OFDM Cellular Network with Erroneous Anchor Position Information

Authors: Xianzhen Guo, Qin Shi, Liang Liu, Shuowen Zhang

Abstract: In the sixth-generation (6G) integrated sensing and communication (ISAC) cellular network, base stations (BSs) can collaborate with each other to reap not only the cooperative communication gain, but also the networked sensing gain. In contrast to cooperative communication where both line-of-sight (LOS) paths and non-line-of-sight (NLOS) paths are useful, networked sensing mainly relies on the LOS paths. However, in practice, the number of BSs possessing LOS paths to a target can be small, leading to marginal networked sensing gain. Because the density of user equipments (UEs) is much larger than that of the BSs, this paper considers a UE-assisted networked sensing architecture, where a BS transmits communication signals in the downlink, while the UEs that receive the echo signals scattered by a target can cooperate with the BS to localize it. Under this scheme, however, the positions of the UEs are estimated by Global Positioning System (GPS) and subject to unknown errors. If some UEs with significantly erroneous position information are used as anchors, the localization performance can be severely degraded. Based on the outlier detection technique, this paper proposes an efficient method to select a subset of UEs with accurate position information as anchors for localizing the target. Numerical results show that our scheme can select good UEs as anchors with very high probability, indicating that networked sensing can be realized in practice with the aid of UEs.

Submitted 20 December, 2023; originally announced December 2023.

Comments: accepted by ICASSP 2024

arXiv:2311.06009 [pdf, other]

doi 10.1007/978-3-031-43990-2_57

Polar-Net: A Clinical-Friendly Model for Alzheimer's Disease Detection in OCTA Images

Authors: Shouyue Liu, **kui Hao, Yanwu Xu, Huazhu Fu, Xinyu Guo, Jiang Liu, Yalin Zheng, Yonghuai Liu, Jiong Zhang, Yitian Zhao

Abstract: Optical Coherence Tomography Angiography (OCTA) is a promising tool for detecting Alzheimer's disease (AD) by imaging the retinal microvasculature. Ophthalmologists commonly use region-based analysis, such as the ETDRS grid, to study OCTA image biomarkers and understand the correlation with AD. However, existing studies have used general deep computer vision methods, which present challenges in providing interpretable results and leveraging clinical prior knowledge. To address these challenges, we propose a novel deep-learning framework called Polar-Net. Our approach involves mapping OCTA images from Cartesian coordinates to polar coordinates, which allows for the use of approximate sector convolution and enables the implementation of the ETDRS grid-based regional analysis method commonly used in clinical practice. Furthermore, Polar-Net incorporates clinical prior information of each sector region into the training process, which further enhances its performance. Additionally, our framework adapts to acquire the importance of the corresponding retinal region, which helps researchers and clinicians understand the model's decision-making process in detecting AD and assess its conformity to clinical observations. Through evaluations on private and public datasets, we have demonstrated that Polar-Net outperforms existing state-of-the-art methods and provides more valuable pathological evidence for the association between retinal vascular changes and AD. In addition, we also show that the two innovative modules introduced in our framework have a significant impact on improving overall performance.

Submitted 10 November, 2023; originally announced November 2023.

Comments: Accepted by MICCAI2023

arXiv:2311.04248 [pdf, other]

DDPET-3D: Dose-aware Diffusion Model for 3D Ultra Low-dose PET Imaging

Authors: Huidong Xie, Weijie Gan, Bo Zhou, Xiongchao Chen, Qiong Liu, Xueqi Guo, Liang Guo, Hongyu An, Ulugbek S. Kamilov, Ge Wang, Chi Liu

Abstract: As PET imaging is accompanied by substantial radiation exposure and cancer risk, reducing radiation dose in PET scans is an important topic. Recently, diffusion models have emerged as the new state-of-the-art generative model to generate high-quality samples and have demonstrated strong potential for various tasks in medical imaging. However, it is difficult to extend diffusion models for 3D image reconstructions due to the memory burden. Directly stacking 2D slices together to create 3D image volumes would result in severe inconsistencies between slices. Previous works tried to either apply a penalty term along the z-axis to remove inconsistencies or reconstruct the 3D image volumes with 2 pre-trained perpendicular 2D diffusion models. Nonetheless, these previous methods failed to produce satisfactory results in challenging cases for PET image denoising. In addition to administered dose, the noise levels in PET images are affected by several other factors in clinical settings, e.g., scan time, medical history, patient size, and weight. Therefore, a method to simultaneously denoise PET images with different noise levels is needed. Here, we propose a Dose-aware Diffusion model for 3D low-dose PET imaging (DDPET-3D) to address these challenges. We extensively evaluated DDPET-3D on 100 patients with 6 different low-dose levels (a total of 600 testing studies), and demonstrated superior performance over previous diffusion models for 3D imaging problems as well as previous noise-aware medical image denoising models. The code is available at: https://github.com/xxx/xxx.

Submitted 28 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: Paper under review. 16 pages, 11 figures, 4 tables

arXiv:2310.15566 [pdf, other]

RIS-Aided Receive Generalized Spatial Modulation Design with Reflecting Modulation

Authors: Xinghao Guo, Yin Xu, Hanjiang Hong, De Mi, Ruiqi Liu, Dazhi He, Wenjun Zhang, Yi-yan Wu

Abstract: Spatial modulation (SM) transmits additional information bits by the selection of antennas. Generalized spatial modulation (GSM), as an advanced type of SM, can be divided into diversity and multiplexing (MUX) schemes according to whether the symbols carried on the selected antennas are identical or different. Recently, reconfigurable intelligent surface (RIS) assisted SM has exhibited better reception performance compared to conventional SM. To overcome the limitations of SM, this paper combines GSM with RIS and proposes the RIS-aided receive generalized spatial modulation (RIS-RGSM) scheme. The RIS-RGSM diversity scheme is realized via a simple improvement based on the state-of-the-art scheme. To further increase the transmission rate, a novel RIS-RGSM MUX scheme is proposed, where the reflection phase shifts and on/off states of RIS elements are configured to achieve bit mapping. The theoretical bit error rate (BER) of the proposed scheme is derived and agrees well with the simulation results. Numerical simulations show that the RIS-RGSM MUX scheme has better BER performance than the diversity scheme. The proposed scheme can significantly increase the transmission rate and maintain good performance compared to the existing scheme under a limited number of antennas.

Submitted 15 April, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: 6 pages, submitted to Conference

arXiv:2310.15565 [pdf, other]

Capacity-based Spatial Modulation Constellation and Pre-scaling Design

Authors: Xinghao Guo, Hanjiang Hong, Yin Xu, Yi-yan Wu, Dazhi He, Wenjun Zhang

Abstract: Spatial Modulation (SM) can utilize the index of the transmit antenna (TA) to transmit additional information. In this paper, to improve the performance of SM, a non-uniform constellation (NUC) and pre-scaling coefficients optimization design scheme is proposed. The bit-interleaved coded modulation (BICM) capacity calculation formula of the SM system is first derived. The constellation and pre-scaling coefficients are optimized by maximizing the BICM capacity without channel state information (CSI) feedback. Optimization results are given for the multiple-input-single-output (MISO) system with a Rayleigh channel. Simulation results show that the proposed scheme provides a meaningful performance gain compared to a conventional SM system without CSI feedback. The proposed optimization design scheme can be a promising technology for future 6G to achieve high efficiency.

Submitted 24 October, 2023; originally announced October 2023.

Comments: 6 pages, conference

arXiv:2309.02780 [pdf, other]

GRASS: Unified Generation Model for Speech-to-Semantic Tasks

Authors: Aobo Xia, Shuyu Lei, Yushu Yang, Xiang Guo, Hua Chai

Abstract: This paper explores the instruction fine-tuning technique for speech-to-semantic tasks by introducing a unified end-to-end (E2E) framework that generates target text conditioned on a task-related prompt for audio data. We pre-train the model using large and diverse data, where instruction-speech pairs are constructed via a text-to-speech (TTS) system. Extensive experiments demonstrate that our proposed model achieves state-of-the-art (SOTA) results on many benchmarks covering speech named entity recognition, speech sentiment analysis, speech question answering, and more, after fine-tuning. Furthermore, the proposed model achieves competitive performance in zero-shot and few-shot scenarios. To facilitate future work on instruction fine-tuning for speech-to-semantic tasks, we release our instruction dataset and code.

Submitted 11 September, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

arXiv:2308.12443 [pdf, other]

TAI-GAN: Temporally and Anatomically Informed GAN for early-to-late frame conversion in dynamic cardiac PET motion correction

Authors: Xueqi Guo, Luyao Shi, Xiongchao Chen, Bo Zhou, Qiong Liu, Huidong Xie, Yi-Hwa Liu, Richard Palyo, Edward J. Miller, Albert J. Sinusas, Bruce Spottiswoode, Chi Liu, Nicha C. Dvornek

Abstract: The rapid tracer kinetics of rubidium-82 ($^{82}$Rb) and high variation of cross-frame distribution in dynamic cardiac positron emission tomography (PET) raise significant challenges for inter-frame motion correction, particularly for the early frames where conventional intensity-based image registration techniques are not applicable. Alternatively, a promising approach utilizes generative methods to handle the tracer distribution changes to assist existing registration methods. To improve frame-wise registration and parametric quantification, we propose a Temporally and Anatomically Informed Generative Adversarial Network (TAI-GAN) to transform the early frames into the late reference frame using an all-to-one mapping. Specifically, a feature-wise linear modulation layer encodes channel-wise parameters generated from temporal tracer kinetics information, and rough cardiac segmentations with local shifts serve as the anatomical information. We validated our proposed method on a clinical $^{82}$Rb PET dataset and found that our TAI-GAN can produce converted early frames with high image quality, comparable to the real reference frames. After TAI-GAN conversion, motion estimation accuracy and clinical myocardial blood flow (MBF) quantification were improved compared to using the original frames. Our code is published at https://github.com/gxq1998/TAI-GAN.

Submitted 23 August, 2023; originally announced August 2023.

Comments: Accepted by Simulation and Synthesis in Medical Imaging (SASHIMI 2023, MICCAI workshop), preprint version

arXiv:2307.09624 [pdf, other]

Transformer-based Dual-domain Network for Few-view Dedicated Cardiac SPECT Image Reconstructions

Authors: Huidong Xie, Bo Zhou, Xiongchao Chen, Xueqi Guo, Stephanie Thorn, Yi-Hwa Liu, Ge Wang, Albert Sinusas, Chi Liu

Abstract: Cardiovascular disease (CVD) is the leading cause of death worldwide, and myocardial perfusion imaging using SPECT has been widely used in the diagnosis of CVDs. The GE 530/570c dedicated cardiac SPECT scanners adopt a stationary geometry to simultaneously acquire 19 projections to increase sensitivity and achieve dynamic imaging. However, the limited amount of angular sampling negatively affects image quality. Deep learning methods can be implemented to produce higher-quality images from stationary data. This is essentially a few-view imaging problem. In this work, we propose a novel 3D transformer-based dual-domain network, called TIP-Net, for high-quality 3D cardiac SPECT image reconstructions. Our method aims to first reconstruct 3D cardiac SPECT images directly from projection data without the iterative reconstruction process by proposing a customized projection-to-image domain transformer. Then, given its reconstruction output and the original few-view reconstruction, we further refine the reconstruction using an image-domain reconstruction network. Validated by cardiac catheterization images, diagnostic interpretations from nuclear cardiologists, and defect size quantified by an FDA 510(k)-cleared clinical software, our method produced images with higher cardiac defect contrast on human studies compared with previous baseline methods, potentially enabling high-quality defect visualization using stationary few-view dedicated cardiac SPECT scanners.

Submitted 23 July, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

Comments: Early accepted by MICCAI 2023 in Vancouver, Canada

arXiv:2305.15911 [pdf, other]

NexToU: Efficient Topology-Aware U-Net for Medical Image Segmentation

Authors: Pengcheng Shi, Xutao Guo, Yanwu Yang, Chenfei Ye, Ting Ma

Abstract: Convolutional neural networks (CNN) and Transformer variants have emerged as the leading medical image segmentation backbones. Nonetheless, due to their limitations in either preserving global image context or efficiently processing irregular shapes in visual objects, these backbones struggle to effectively integrate information from diverse anatomical regions and reduce inter-individual variability, particularly for the vasculature. Motivated by the successful breakthroughs of graph neural networks (GNN) in capturing topological properties and non-Euclidean relationships across various fields, we propose NexToU, a novel hybrid architecture for medical image segmentation. NexToU comprises improved Pool GNN and Swin GNN modules from Vision GNN (ViG) for learning both global and local topological representations while minimizing computational costs. To address the containment and exclusion relationships among various anatomical structures, we reformulate the topological interaction (TI) module based on the nature of binary trees, rapidly encoding the topological constraints into NexToU. Extensive experiments conducted on three datasets (including distinct imaging dimensions, disease types, and imaging modalities) demonstrate that our method consistently outperforms other state-of-the-art (SOTA) architectures. All the code is publicly available at https://github.com/PengchengShi1220/NexToU.

Submitted 25 May, 2023; originally announced May 2023.

Comments: 13 pages, 6 figures

arXiv:2305.12553 [pdf, other]

Markov $α$-Potential Games

Authors: Xin Guo, Xinyu Li, Chinmay Maheshwari, Shankar Sastry, Manxi Wu

Abstract: This paper proposes a new framework of Markov $α$-potential games to study Markov games. In this new framework, Markov games are shown to be Markov $α$-potential games, and the existence of an associated $α$-potential function is established. Any optimizer of an $α$-potential function is shown to be an $α$-stationary NE. Two important classes of practically significant Markov games, Markov congestion games and the perturbed Markov team games, are studied via this framework of Markov $α$-potential games, with explicit characterization of an upper bound for $α$ and its relation to game parameters. Additionally, a semi-infinite linear programming based formulation is presented to obtain an upper bound for $α$ for any Markov game. Furthermore, two equilibrium approximation algorithms, namely the projected gradient-ascent algorithm and the sequential maximum improvement algorithm, are presented along with their Nash regret analysis, and corroborated by numerical experiments.

Submitted 9 March, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

Comments: 32 pages, 3 figures

MSC Class: 91A68; 91A50; 91A15; 91A14; 91A10

arXiv:2305.08051 [pdf, other]

Prox-DBRO-VR: A Unified Analysis on Decentralized Byzantine-Resilient Composite Stochastic Optimization with Variance Reduction and Non-Asymptotic Convergence Rates

Authors: **hui Hu, Guo Chen, Huaqing Li, Xiaoyu Guo, Tingwen Huang

Abstract: Decentralized stochastic gradient algorithms efficiently resolve large-scale finite-sum optimization problems when all agents over networks are reliable. However, most of these algorithms are not resilient to adverse conditions, such as malfunctioning agents, software bugs, and cyber attacks. This paper aims to handle a class of general composite finite-sum optimization problems over multi-agent cyber-physical systems (CPSs) in the presence of an unknown number of Byzantine agents. Based on the proximal mapping method, variance-reduced (VR) techniques, and a norm-penalized approximation strategy, we propose a decentralized Byzantine-resilient and proximal-gradient algorithmic framework, dubbed Prox-DBRO-VR, which achieves an optimization and control goal using only local computations and communications. To asymptotically reduce the variance generated by evaluating the local noisy stochastic gradients, we incorporate two localized VR techniques (SAGA and LSVRG) into Prox-DBRO-VR to design Prox-DBRO-SAGA and Prox-DBRO-LSVRG. By analyzing the contraction relationships among the gradient-learning error, robust consensus condition, and optimality gap in a unified theoretical framework, it is demonstrated that both Prox-DBRO-SAGA and Prox-DBRO-LSVRG, with a well-designed constant (resp., decaying) step-size, converge linearly (resp., sublinearly) inside an error ball around the optimal solution to the original problem under standard assumptions. The trade-off between convergence accuracy and the number of Byzantine agents in both linear and sublinear cases is also characterized. In simulation, the effectiveness and practicability of the proposed algorithms are demonstrated by resolving a decentralized sparse machine-learning problem over multi-agent CPSs under various Byzantine attacks.

Submitted 29 April, 2024; v1 submitted 13 May, 2023; originally announced May 2023.

Comments: 17 pages, 13 figures

arXiv:2305.04293 [pdf, other]

Location Tracking for Reconfigurable Intelligent Surfaces Aided Vehicle Platoons: Diverse Sparsities Inspired Approaches

Authors: Yuanbin Chen, Ying Wang, Xufeng Guo, Zhu Han, ** Zhang

Abstract: In this paper, we investigate the employment of reconfigurable intelligent surfaces (RISs) in vehicle platoons, functioning in tandem with a base station (BS) in support of high-precision location tracking. In particular, the use of a RIS imposes additional structured sparsity that, when paired with the initial sparse line-of-sight (LoS) channels of the BS, facilitates beneficial group sparsity. The resultant group sparsity significantly enriches the energies of the original direct-only channel, enabling a greater concentration of the LoS channel energies emanated from the same vehicle location index. Furthermore, the burst sparsity is exposed by representing the non-line-of-sight (NLoS) channels as their sparse copies. This thus constitutes the philosophy of the diverse sparsities of interest. Then, a diverse dynamic layered structured sparsity (DiLuS) framework is customized for capturing different priors for this pair of sparsities, based upon which the location tracking problem is formulated as a maximum a posteriori (MAP) estimate of the location. Nevertheless, the tracking issue is highly intractable due to the ill-conditioned sensing matrix, intricately coupled latent variables associated with the BS and RIS, and the spatial-temporal correlations among the vehicle platoon. To circumvent these hurdles, we propose an efficient algorithm, namely DiLuS enabled spatial-temporal platoon localization (DiLuS-STPL), which incorporates both variational Bayesian inference (VBI) and message passing techniques for recursively achieving parameter updates in a turbo-like way. Finally, we demonstrate through extensive simulation results that the localization relying exclusively upon a BS and a RIS may achieve precision comparable to that obtained by two individual BSs, along with the robustness and superiority of our proposed algorithm as compared to various benchmark schemes.

Submitted 7 May, 2023; originally announced May 2023.

Comments: This manuscript has been accepted for publication in IEEE JSAC

arXiv:2305.01899 [pdf, other]

Revolutionizing Agrifood Systems with Artificial Intelligence: A Survey

Authors: Tao Chen, Liang Lv, Di Wang, **g Zhang, Yue Yang, Zeyang Zhao, Chen Wang, Xiaowei Guo, Hao Chen, Qingye Wang, Yufei Xu, Qiming Zhang, Bo Du, Liangpei Zhang, Dacheng Tao

Abstract: With the world population rapidly increasing, transforming our agrifood systems to be more productive, efficient, safe, and sustainable is crucial to mitigate potential food shortages. Recently, artificial intelligence (AI) techniques such as deep learning (DL) have demonstrated their strong abilities in various areas, including language, vision, remote sensing (RS), and agrifood systems applications. However, the overall impact of AI on agrifood systems remains unclear. In this paper, we thoroughly review how AI techniques can transform agrifood systems and contribute to the modern agrifood industry. Firstly, we summarize the data acquisition methods in agrifood systems, including acquisition, storage, and processing techniques. Secondly, we present a progress review of AI methods in agrifood systems, specifically in agriculture, animal husbandry, and fishery, covering topics such as agrifood classification, growth monitoring, yield prediction, and quality assessment. Furthermore, we highlight potential challenges and promising research opportunities for transforming modern agrifood systems with AI. We hope this survey could offer an overall picture to newcomers in the field and serve as a starting point for their further research.

Submitted 3 May, 2023; originally announced May 2023.

Comments: Submitted to ACM

arXiv:2304.14900 [pdf, other]

Unified Noise-aware Network for Low-count PET Denoising

Authors: Huidong Xie, Qiong Liu, Bo Zhou, Xiongchao Chen, Xueqi Guo, Chi Liu

Abstract: As PET imaging is accompanied by substantial radiation exposure and cancer risk, reducing radiation dose in PET scans is an important topic. However, low-count PET scans often suffer from high image noise, which can negatively impact image quality and diagnostic performance. Recent advances in deep learning have shown great potential for recovering underlying signal from noisy counterparts. However, neural networks trained on a specific noise level cannot be easily generalized to other noise levels due to different noise amplitudes and variances. To obtain optimal denoised results, we may need to train multiple networks using data with different noise levels. But this approach may be infeasible in reality due to limited data availability. Denoising dynamic PET images presents an additional challenge due to tracer decay and continuously changing noise levels across dynamic frames. To address these issues, we propose a Unified Noise-aware Network (UNN) that combines multiple sub-networks with varying denoising power to generate optimal denoised results regardless of the input noise levels. Evaluated using large-scale data from two medical centers with different vendors, the presented results show that the UNN can consistently produce promising denoised results regardless of input noise levels, and demonstrate superior performance over networks trained on single noise level data, especially for extremely low-count data.

Submitted 28 April, 2023; originally announced April 2023.

Comments: 10 Pages, 6 Figures, 1 table. Paper under review

arXiv:2304.08380 [pdf, other]

Physics-inspired Neuroacoustic Computing Based on Tunable Nonlinear Multiple-scattering

Authors: Ali Momeni, Xinxin Guo, Herve Lissek, Romain Fleury

Abstract: Waves, such as light and sound, inherently bounce and mix due to multiple scattering induced by the complex material objects that surround us. This scattering process severely scrambles the information carried by waves, challenging conventional communication systems, sensing paradigms, and wave-based computing schemes. Here, we show that instead of being a hindrance, multiple scattering can be beneficial to enable and enhance analog nonlinear information mapping, allowing for the direct physical implementation of computational paradigms such as reservoir computing and extreme learning machines. We propose a physics-inspired version of such computational architectures for speech and vowel recognition that operate directly in the native domain of the input signal, namely on real sounds, without any digital pre-processing or encoding conversion and backpropagation training computation. We first implement it in a proof-of-concept prototype, a nonlinear chaotic acoustic cavity containing multiple tunable and power-efficient nonlinear meta-scatterers. We prove the efficiency of the acoustic-based computing system for vowel recognition tasks with high testing classification accuracy (91.4%). Finally, we demonstrate the high performance of vowel recognition in the natural environment of a reverberation room. Our results open the way for efficient acoustic learning machines that operate directly on the input sound, and leverage physics to enable Natural Language Processing (NLP).

Submitted 17 April, 2023; originally announced April 2023.

Comments: 28 pages

arXiv:2304.00570 [pdf, other]

FedFTN: Personalized Federated Learning with Deep Feature Transformation Network for Multi-institutional Low-count PET Denoising

Authors: Bo Zhou, Huidong Xie, Qiong Liu, Xiongchao Chen, Xueqi Guo, Zhicheng Feng, Jun Hou, S. Kevin Zhou, Biao Li, Axel Rominger, Kuangyu Shi, James S. Duncan, Chi Liu

Abstract: Low-count PET is an efficient way to reduce radiation exposure and acquisition time, but the reconstructed images often suffer from low signal-to-noise ratio (SNR), thus affecting diagnosis and other downstream tasks. Recent advances in deep learning have shown great potential in improving low-count PET image quality, but acquiring a large, centralized, and diverse dataset from multiple institutions for training a robust model is difficult due to privacy and security concerns of patient data. Moreover, low-count PET data at different institutions may have different data distribution, thus requiring personalized models. While previous federated learning (FL) algorithms enable multi-institution collaborative training without the need of aggregating local data, addressing the large domain shift in the application of multi-institutional low-count PET denoising remains a challenge and is still highly under-explored. In this work, we propose FedFTN, a personalized federated learning strategy that addresses these challenges. FedFTN uses a local deep feature transformation network (FTN) to modulate the feature outputs of a globally shared denoising network, enabling personalized low-count PET denoising for each institution. During the federated learning process, only the denoising network's weights are communicated and aggregated, while the FTN remains at the local institutions for feature transformation. We evaluated our method using a large-scale dataset of multi-institutional low-count PET imaging data from three medical centers located across three continents, and showed that FedFTN provides high-quality low-count PET images, outperforming previous baseline FL reconstruction methods across all low-count levels at all three institutions.

Submitted 6 October, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

Comments: 13 pages, 6 figures, Accepted at Medical Image Analysis Journal (MedIA)

arXiv:2303.15124 [pdf, other]

Blind Inpainting with Object-aware Discrimination for Artificial Marker Removal

Authors: Xuechen Guo, Wenhao Hu, Chiming Ni, Wenhao Chai, Shiyan Li, Gaoang Wang

Abstract: Medical images often contain artificial markers added by doctors, which can negatively affect the accuracy of AI-based diagnosis. To address this issue and recover the missing visual contents, inpainting techniques are highly needed. However, existing inpainting methods require manual mask input, limiting their application scenarios. In this paper, we introduce a novel blind inpainting method that automatically completes visual contents without specifying masks for target areas in an image. Our proposed model includes a mask-free reconstruction network and an object-aware discriminator. The reconstruction network consists of two branches that predict the corrupted regions with artificial markers and simultaneously recover the missing visual contents. The object-aware discriminator relies on the powerful recognition capabilities of the dense object detector to ensure that the markers of reconstructed images cannot be detected in any local regions. As a result, the reconstructed image can be as close as possible to the clean one. Our proposed method is evaluated on different medical image datasets, covering multiple imaging modalities such as ultrasound (US), magnetic resonance imaging (MRI), and electron microscopy (EM), demonstrating that our method is effective and robust against various unknown missing region patterns.

Submitted 27 March, 2023; originally announced March 2023.

arXiv:2303.14044 [pdf, other]

MusicFace: Music-driven Expressive Singing Face Synthesis

Authors: Pengfei Liu, Wen** Deng, Hengda Li, **tai Wang, Yinglin Zheng, Yiwei Ding, Xiaohu Guo, Ming Zeng

Abstract: It is still an interesting and challenging problem to synthesize a vivid and realistic singing face driven by a music signal. In this paper, we present a method for this task with natural motions of the lip, facial expression, head pose, and eye states. Due to the coupling of the mixed information of human voice and background music in common signals of music audio, we design a decouple-and-fuse strategy to tackle the challenge. We first decompose the input music audio into a human voice stream and a background music stream. Due to the implicit and complicated correlation between the two-stream input signals and the dynamics of the facial expressions, head motions and eye states, we model their relationship with an attention scheme, where the effects of the two streams are fused seamlessly. Furthermore, to improve the expressiveness of the generated results, we propose to decompose head movements generation into speed generation and direction generation, and decompose eye states generation into the short-time eye blinking generation and the long-time eye closing generation to model them separately. We also build a novel SingingFace Dataset to support the training and evaluation of this task, and to facilitate future works on this topic. Extensive experiments and a user study show that our proposed method is capable of synthesizing a vivid singing face, which is better than state-of-the-art methods qualitatively and quantitatively.

Submitted 24 March, 2023; originally announced March 2023.

Comments: Accepted to CVMJ

arXiv:2302.14314 [pdf, other]

Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers

Authors: Nithish Muthuchamy Selvaraj, Xiaobao Guo, Adams Kong, Bingquan Shen, Alex Kot

Abstract: Continual learning involves training neural networks incrementally for new tasks while retaining the knowledge of previous tasks. However, efficiently fine-tuning the model for sequential tasks with minimal computational resources remains a challenge. In this paper, we propose Task Incremental Continual Learning (TI-CL) of audio classifiers with both parameter-efficient and compute-efficient Audio Spectrogram Transformers (AST). To reduce the trainable parameters without performance degradation for TI-CL, we compare several Parameter Efficient Transfer (PET) methods and propose AST with Convolutional Adapters for TI-CL, which has less than 5% of trainable parameters of the fully fine-tuned counterparts. To reduce the computational complexity, we introduce a novel Frequency-Time factorized Attention (FTA) method that replaces the traditional self-attention in transformers for audio spectrograms. FTA achieves competitive performance with only a factor of the computations required by Global Self-Attention (GSA). Finally, we formulate our method for TI-CL, called Adapter Incremental Continual Learning (AI-CL), as a combination of the "parameter-efficient" Convolutional Adapter and the "compute-efficient" FTA. Experiments on ESC-50, SpeechCommandsV2 (SCv2), and Audio-Visual Event (AVE) benchmarks show that our proposed method prevents catastrophic forgetting in TI-CL while maintaining a lower computational budget.

Submitted 2 January, 2024; v1 submitted 28 February, 2023; originally announced February 2023.

arXiv:2302.14277 [pdf, other]

DECOR-NET: A COVID-19 Lung Infection Segmentation Network Improved by Emphasizing Low-level Features and Decorrelating Features

Authors: Jiesi Hu, Yanwu Yang, Xutao Guo, Ting Ma

Abstract: Since 2019, Coronavirus Disease 2019 (COVID-19) has spread widely and posed a serious threat to public health. Chest Computed Tomography (CT) holds great potential for screening and diagnosis of this disease. The segmentation of COVID-19 CT imaging can achieve quantitative evaluation of infections and track disease progression. COVID-19 infections are characterized by high heterogeneity and unclear boundaries, so capturing low-level features such as texture and intensity is critical for segmentation. However, segmentation networks that emphasize low-level features are still lacking. In this work, we propose a DECOR-Net capable of capturing more decorrelated low-level features. The channel re-weighting strategy is applied to obtain plenty of low-level features and the dependencies between channels are reduced by the proposed decorrelation loss. Experiments show that DECOR-Net outperforms other cutting-edge methods and surpasses the baseline by 5.1% and 4.9% in terms of Dice coefficient and intersection over union. Moreover, the proposed decorrelation loss can improve the performance consistently under different settings. The code is available at https://github.com/jiesihu/DECOR-Net.git.

Submitted 27 February, 2023; originally announced February 2023.
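
The two metrics quoted in the DECOR-Net abstract above have standard definitions: Dice = 2|A∩B| / (|A| + |B|) and IoU = |A∩B| / |A∪B|. A minimal sketch for flat binary masks follows; the function name `dice_and_iou` and the convention for two empty masks are illustrative assumptions, not taken from the paper or its repository.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equal-length binary masks given as 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count pixels that are foreground in both masks, and in each mask separately.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - intersection
    if union == 0:
        # Assumed convention: two empty masks are treated as a perfect match.
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_sum + target_sum)
    iou = intersection / union
    return dice, iou


if __name__ == "__main__":
    # One overlapping pixel, two foreground pixels in each mask.
    print(dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0]))
```

For any pair of masks Dice is at least as large as IoU, which is why the same model typically reports a higher Dice score than IoU.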

arXiv:2302.12662 [pdf, other]

FedDBL: Communication and Data Efficient Federated Deep-Broad Learning for Histopathological Tissue Classification

Authors: Tianpeng Deng, Yanqi Huang, Guoqiang Han, Zhenwei Shi, Jiatai Lin, Qi Dou, Zaiyi Liu, Xiao-**g Guo, C. L. Philip Chen, Chu Han

Abstract: Histopathological tissue classification is a fundamental task in computational pathology. Deep learning-based models have achieved superior performance but centralized training with data centralization suffers from the privacy leakage problem. Federated learning (FL) can safeguard privacy by keeping training samples locally, but existing FL-based frameworks require a large number of well-annotated training samples and numerous rounds of communication which hinder their practicability in the real-world clinical scenario. In this paper, we propose a universal and lightweight federated learning framework, named Federated Deep-Broad Learning (FedDBL), to achieve superior classification performance with limited training samples and only one-round communication. By simply associating a pre-trained deep learning feature extractor, a fast and lightweight broad learning inference system and a classical federated aggregation approach, FedDBL can dramatically reduce data dependency and improve communication efficiency. Five-fold cross-validation demonstrates that FedDBL greatly outperforms the competitors with only one-round communication and limited training samples, while it even achieves comparable performance with the ones under multiple-round communications. Furthermore, due to the lightweight design and one-round communication, FedDBL reduces the communication burden from 4.6GB to only 276.5KB per client using the ResNet-50 backbone at 50-round training. Since no data or deep models are shared across different clients, the privacy issue is well-solved and the model security is guaranteed with no model inversion attack risk. Code is available at https://github.com/tianpeng-deng/FedDBL.

Submitted 17 December, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

arXiv:2302.07135 [pdf, other]

Fast-MC-PET: A Novel Deep Learning-aided Motion Correction and Reconstruction Framework for Accelerated PET

Authors: Bo Zhou, Yu-Jung Tsai, Jiazhen Zhang, Xueqi Guo, Huidong Xie, Xiongchao Chen, Tianshun Miao, Yihuan Lu, James S. Duncan, Chi Liu

Abstract: Patient motion during PET is inevitable. Its long acquisition time not only increases the motion and the associated artifacts but also the patient's discomfort, thus PET acceleration is desirable. However, accelerating PET acquisition will result in reconstructed images with low SNR, and the image quality will still be degraded by motion-induced artifacts. Most of the previous PET motion correction methods are motion type specific and require motion modeling, and thus may fail when multiple types of motion are present together. Also, those methods are customized for standard long acquisition and could not be directly applied to accelerated PET. To this end, modeling-free universal motion correction reconstruction for accelerated PET is still highly under-explored. In this work, we propose a novel deep learning-aided motion correction and reconstruction framework for accelerated PET, called Fast-MC-PET. Our framework consists of a universal motion correction (UMC) and a short-to-long acquisition reconstruction (SL-Recon) module. The UMC enables modeling-free motion correction by estimating quasi-continuous motion from ultra-short frame reconstructions and using this information for motion-compensated reconstruction. Then, the SL-Recon converts the accelerated UMC image with low counts to a high-quality image with high counts for our final reconstruction output. Our experimental results on human studies show that our Fast-MC-PET can enable 7-fold acceleration and use only 2 minutes acquisition to generate high-quality reconstruction images that outperform/match previous motion correction reconstruction methods using standard 15 minutes long acquisition data.

Submitted 14 February, 2023; originally announced February 2023.

Comments: Accepted at Information Processing in Medical Imaging (IPMI 2023)

arXiv:2301.12325 [pdf, ps, other]

Data-Driven Load-Current Sharing Control for Multi-Stack Fuel Cell System with Circulating Current Mitigation

Authors: Yiqiao Xu, Xiaoyu Guo, Zhen Dong, Zhengtao Ding, Alessandra Parisio

Abstract: The global trend toward renewable power generation has drawn great attention to hydrogen Fuel Cells (FCs), which have a wide variety of applications, from utility power stations to laptops. The Multi-stack Fuel Cell System (MFCS), which is an assembly of FC stacks, can be a remedy for obstacles in high-power applications. However, the output voltage of FC stacks varies dramatically under variable load conditions; hence, in order for MFCS to be efficiently operated and guarantee an appropriate load-current sharing among the FC stacks, advanced converter controllers for power conditioning need to be designed. An accurate circuit model is essential for controller design, which accounts for the fact that the parameters of some converter components may change due to aging and repetitive stress in long-term operations. Existing control frameworks and parametric and non-parametric system identification techniques do not consider the aforementioned challenges. Thus, this paper investigates the potential of a data-driven method that, without system identification, directly implements control on paralleled converters using raw data. Based on pre-collected input/output trajectories, a non-parametric representation of the overall circuit is produced for implementing predictive control. While approaching equal current sharing within the MFCS, the proposed method considers the minimization of load-following error and mitigation of circulating current between the converters. Simulation results verify the effectiveness of the proposed method.

Submitted 28 January, 2023; originally announced January 2023.

arXiv:2211.12080 [pdf, other]

Robust Training for Speaker Verification against Noisy Labels

Authors: Zhihua Fang, Liang He, Hanhan Ma, Xiaochen Guo, Lin Li

Abstract: The deep learning models used for speaker verification rely heavily on large amounts of data and correct labeling. However, noisy (incorrect) labels often occur, which degrades the performance of the system. In this paper, we propose a novel two-stage learning method to filter out noisy labels from speaker datasets. Since a DNN will first fit data with clean labels, we first train the model with all data for several epochs. Then, based on this model, the model predictions are compared with the labels using our proposed OR-Gate with top-k mechanism to select the data with clean labels, and the selected data is used to train the model. This process is iterated until the training is completed. We have demonstrated the effectiveness of this method in filtering noisy labels through extensive experiments and have achieved excellent performance on VoxCeleb (1 and 2) with different added noise rates.

Submitted 25 May, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

Comments: Accepted by INTERSPEECH 2023

arXiv:2210.17408 [pdf, ps, other]

Accelerating Diffusion Models via Pre-segmentation Diffusion Sampling for Medical Image Segmentation

Authors: Xutao Guo, Yanwu Yang, Chenfei Ye, Shang Lu, Yang Xiang, Ting Ma

Abstract: Based on the Denoising Diffusion Probabilistic Model (DDPM), medical image segmentation can be described as a conditional image generation task, which allows computing pixel-wise uncertainty maps of the segmentation and allows an implicit ensemble of segmentations to boost the segmentation performance. However, DDPM requires many iterative denoising steps to generate segmentations from Gaussian noise, resulting in extremely inefficient inference. To mitigate the issue, we propose a principled acceleration strategy, called pre-segmentation diffusion sampling DDPM (PD-DDPM), which is specially used for medical image segmentation. The key idea is to obtain pre-segmentation results based on a separately trained segmentation network, and construct noisy predictions (non-Gaussian distribution) according to the forward diffusion rule. We can then start with noisy predictions and use fewer reverse steps to generate segmentation results. Experiments show that PD-DDPM yields better segmentation results over representative baseline methods even if the number of reverse steps is significantly reduced. Moreover, PD-DDPM is orthogonal to existing advanced segmentation models, which can be combined to further improve the segmentation performance.

Submitted 26 October, 2022; originally announced October 2022.
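
The "forward diffusion rule" mentioned in the PD-DDPM abstract above is, in the standard DDPM formulation, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with ε ~ N(0, I). The sketch below noises a pre-segmentation up to an intermediate step, from which a shortened reverse process could start. The linear beta schedule, the step counts, and the function names are assumptions for illustration and are not taken from the paper.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar[t-1] for t = 1..num_steps (assumed linear beta schedule)."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def diffuse(x0, t, alpha_bar, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I); t is 1-indexed."""
    a = alpha_bar[t - 1]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bar = linear_alpha_bar(1000)
    pre_seg = [1.0, -1.0, 1.0, 1.0]  # hypothetical pre-segmentation, labels mapped to {-1, +1}
    x_start = diffuse(pre_seg, 300, alpha_bar, rng)  # start reverse sampling from t = 300
    print(x_start)
```

Because ᾱ_t is still well above zero at an intermediate step, the noised sample keeps part of the pre-segmentation signal, which is the intuition behind needing fewer reverse steps than when starting from pure Gaussian noise.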

arXiv:2210.13721 [pdf, other]

Multi-modal Dynamic Graph Network: Coupling Structural and Functional Connectome for Disease Diagnosis and Classification

Authors: Yanwu Yang, Xutao Guo, Zhikai Chang, Chenfei Ye, Yang Xiang, Ting Ma

Abstract: Multi-modal neuroimaging technology has greatly facilitated the efficiency and diagnosis accuracy, which provides complementary information in discovering objective disease biomarkers. Conventional deep learning methods, e.g. convolutional neural networks, overlook relationships between nodes and fail to capture topological properties in graphs. Graph neural networks have been proven to be of great importance in modeling brain connectome networks and relating disease-specific patterns. However, most existing graph methods explicitly require known graph structures, which are not available in the sophisticated brain system. Especially in heterogeneous multi-modal brain networks, there exists a great challenge to model interactions among brain regions in consideration of inter-modal dependencies. In this study, we propose a Multi-modal Dynamic Graph Convolution Network (MDGCN) for structural and functional brain network learning. Our method benefits from modeling inter-modal representations and relating attentive multi-model associations into dynamic graphs with a compositional correspondence matrix. Moreover, a bilateral graph convolution layer is proposed to aggregate multi-modal representations in terms of multi-modal associations. Extensive experiments on three datasets demonstrate the superiority of our proposed method in terms of disease classification, with accuracies of 90.4%, 85.9% and 98.3% in predicting Mild Cognitive Impairment (MCI), Parkinson's disease (PD), and schizophrenia (SCHZ) respectively. Furthermore, our statistical evaluations on the correspondence matrix exhibit a high correspondence with previous evidence of biomarkers.

Submitted 24 October, 2022; originally announced October 2022.

arXiv:2210.12621 [pdf, other]

Development of a Hybrid Simulation and Experiment Test Platform for Dynamic Positioning Vessels

Authors: Changjun Hu, Quan Shi, Xin Li, Xiaoxian Guo

Abstract: The harsh ocean environment and complex operating conditions require high dynamic positioning (DP) capability of offshore vessels. The design, development and performance evaluation of DP systems are generally carried out by numerical simulations or scale model experiments. Compared with time-consuming and laborious experiments, simulation is convenient and low-cost, but its results lack practical reference due to oversimplification of the model. Therefore, this paper presents a hybrid simulation and experiment test platform for DP vessels. Its characteristics are: realistic calculation of environmental loads and motion response; consistency of algorithms and parameters between simulation and experiment, which greatly shortens experiment adjustment time; and a switchable, online-renewable controller that facilitates algorithm testing. The test platform can test the performance of a DP system and determine the operational time window. In the hydrodynamic simulation, the six degree-of-freedom model is used to describe the dynamic response of the DP vessel, considering the fluid memory effect and frequency-dependent hydrodynamic parameters. In the experiment, the similarity theory based on the same Froude number is used to ensure the consistency of control parameters with simulation. Finally, a case study of a DP shuttle tanker is used to verify the credibility of the test platform.

Submitted 23 October, 2022; originally announced October 2022.
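
The Froude-number similarity mentioned in the abstract above can be made concrete. With Fr = V / sqrt(g·L) held equal between model and full scale, and a geometric scale λ = L_full / L_model, speeds and times scale by sqrt(λ) and forces by (density ratio)·λ³. The sketch below uses these textbook relations; the function names and the example vessel numbers are hypothetical and do not come from the paper.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def froude_number(speed, length, g=G):
    """Fr = V / sqrt(g * L)."""
    return speed / math.sqrt(g * length)


def froude_scale_factors(scale, density_ratio=1.0):
    """Model-to-full-scale multipliers under equal Froude number.

    scale is L_full / L_model; density_ratio is rho_full / rho_model.
    """
    return {
        "length": scale,
        "speed": math.sqrt(scale),
        "time": math.sqrt(scale),
        "force": density_ratio * scale ** 3,
    }


if __name__ == "__main__":
    factors = froude_scale_factors(50.0)  # hypothetical 1:50 scale model
    model_length, model_speed = 2.0, 1.0  # hypothetical model values in m and m/s
    full_length = model_length * factors["length"]
    full_speed = model_speed * factors["speed"]
    # Both Froude numbers agree, confirming the scaling is consistent.
    print(froude_number(model_speed, model_length), froude_number(full_speed, full_length))
```

Since time scales by sqrt(λ), controller parameters with time units (for example, time constants) tuned at model scale generally need the same conversion before they are compared with full-scale simulation values.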

arXiv:2210.11577 [pdf, other]

Global Convergence of Direct Policy Search for State-Feedback $\mathcal{H}_\infty$ Robust Control: A Revisit of Nonsmooth Synthesis with Goldstein Subdifferential

Authors: Xingang Guo, Bin Hu

Abstract: Direct policy search has been widely applied in modern reinforcement learning and continuous control. However, the theoretical properties of direct policy search on nonsmooth robust control synthesis have not been fully understood. The optimal $\mathcal{H}_\infty$ control framework aims at designing a policy to minimize the closed-loop $\mathcal{H}_\infty$ norm, and is arguably the most fundamental robust control paradigm. In this work, we show that direct policy search is guaranteed to find the global solution of the robust $\mathcal{H}_\infty$ state-feedback control design problem. Notice that policy search for optimal $\mathcal{H}_\infty$ control leads to a constrained nonconvex nonsmooth optimization problem, where the nonconvex feasible set consists of all the policies stabilizing the closed-loop dynamics. We show that for this nonsmooth optimization problem, all Clarke stationary points are global minima. Next, we identify the coerciveness of the closed-loop $\mathcal{H}_\infty$ objective function, and prove that all the sublevel sets of the resultant policy search problem are compact. Based on these properties, we show that Goldstein's subgradient method and its implementable variants can be guaranteed to stay in the nonconvex feasible set and eventually find the global optimal solution of the $\mathcal{H}_\infty$ state-feedback synthesis problem. Our work builds a new connection between nonconvex nonsmooth optimization theory and robust control, leading to an interesting global convergence result for direct policy search on optimal $\mathcal{H}_\infty$ synthesis.

Submitted 20 October, 2022; originally announced October 2022.

Comments: Accepted to NeurIPS 2022

arXiv:2210.01125 [pdf]

Spectral2Spectral: Image-spectral Similarity Assisted Spectral CT Deep Reconstruction without Reference

Authors: Xiaodong Guo, Longhui Li, Dingyue Chang, Peng He, Peng Feng, Hengyong Yu, Weiwen Wu

Abstract: Spectral computed tomography based on a photon-counting detector (PCD) attracts more and more attention since it has the capability to provide more accurate identification and quantitative analysis for biomedical materials. The limited number of photons within narrow energy bins leads to imaging results with a low signal-to-noise ratio. Existing supervised deep reconstruction networks for CT reconstruction struggle to address these challenges because it is usually impossible to acquire noise-free clinical images with clear structures as references. In this paper, we propose an iterative deep reconstruction network to synergize unsupervised method and data priors into a unified framework, named Spectral2Spectral. Our Spectral2Spectral employs an unsupervised deep training strategy to obtain high-quality images from noisy data in an end-to-end fashion. The structural similarity prior within the image-spectral domain is refined as a regularization term to further constrain the network training. The weights of the neural network are automatically updated to capture image features and structures within the iterative process. Experiments on three large-scale preclinical datasets demonstrate that Spectral2Spectral reconstructs better image quality than other state-of-the-art methods.

Submitted 16 November, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

Comments: Accepted by IEEE TCI

arXiv:2210.00902 [pdf]

AdaComm: Tracing Channel Dynamics for Reliable Cross-Technology Communication

Authors: Weiguo Wang, Xiaolong Zheng, Yuan He, Xiuzhen Guo

Abstract: Cross-Technology Communication (CTC) is an emerging technology to support direct communication between wireless devices that follow different standards. Despite the many proposals from the community to enable CTC, the performance of CTC is an equally important problem that has seldom been studied. We find this problem extremely challenging for the following reasons. On one hand, a CTC link is essentially different from a conventional wireless link. Conventional link indicators like RSSI (received signal strength indicator) and SNR (signal-to-noise ratio) cannot be used to directly characterize a CTC link. On the other hand, indirect indicators like PER (packet error rate), which is adopted by many existing CTC proposals, cannot capture short-term link behavior. As a result, the existing CTC proposals fail to maintain reliable performance under dynamic channel conditions. To address this challenge, in this paper we propose AdaComm, a generic framework to achieve self-adaptive CTC in dynamic channels. Instead of reactively adjusting the CTC sender, AdaComm adopts an online learning mechanism to adaptively adjust the decoding model at the CTC receiver. The self-adaptive decoding model automatically learns the effective features directly from the raw received signals, which are embedded with the current channel state. With the lossless channel information, AdaComm further adopts fine-tuning and full-training modes to cope with continuous and abrupt channel dynamics.
We implement AdaComm and integrate it with two existing CTC approaches that respectively employ CSI (channel state information) and RSSI as the information carrier. The evaluation results demonstrate that AdaComm can significantly reduce the SER (symbol error rate) by 72.9% and 49.2%, respectively, compared with the existing approaches.

Submitted 30 September, 2022; originally announced October 2022.

arXiv:2209.15351 [pdf]

Efficient Ambient LoRa Backscatter with On-Off Keying Modulation

Authors: Xiuzhen Guo, Longfei Shangguan, Yuan He, Jia Zhang, Haotian Jiang, Awais Ahmad Siddiqi, Yunhao Liu

Abstract: Backscatter communication holds potential for ubiquitous and low-cost connectivity among low-power IoT devices. To avoid interference between the carrier signal and the backscatter signal, recent works propose a frequency-shifting technique to separate these two signals in the frequency domain. Such proposals, however, have to occupy the precious wireless spectrum that is already overcrowded, and increase the power, cost, and complexity of the backscatter tag. In this paper, we revisit the classic ON-OFF Keying (OOK) modulation and propose Aloba, a backscatter system that takes the ambient LoRa transmissions as the excitation and piggybacks the in-band OOK modulated signals over the LoRa transmissions. Our design enables the backscatter signal to work in the same frequency band as the carrier signal, while achieving a flexible data rate at different transmission ranges. The key contributions of Aloba include: (1) the design of a low-power backscatter tag that can pick out the ambient LoRa signals from other signals; (2) a novel decoding algorithm to demodulate both the carrier signal and the backscatter signal from their superposition. We further adopt a link coding mechanism and an interleaving operation to enhance the reliability of backscatter signal decoding. We implement Aloba and conduct a head-to-head comparison with the state-of-the-art LoRa backscatter system PLoRa in various settings. The experimental results show that Aloba can achieve a 199.4 Kbps data rate at various distances, 52.4 times higher than PLoRa.

Submitted 30 September, 2022; originally announced September 2022.

arXiv:2209.15348 [pdf]

Saiyan: Design and Implementation of a Low-power Demodulator for LoRa Backscatter Systems

Authors: Xiuzhen Guo, Longfei Shangguan, Yuan He, Nan Jing, Jiacheng Zhang, Haotian Jiang, Yunhao Liu

Abstract: The radio range of backscatter systems continues growing as new wireless communication primitives are continuously invented. Nevertheless, both the bit error rate and the packet loss rate of backscatter signals increase rapidly with the radio range, thereby necessitating cooperation between the access point and the backscatter tags through a feedback loop. Unfortunately, the low-power nature of backscatter tags limits their ability to demodulate feedback signals from a remote access point and scales down to such circumstances. This paper presents Saiyan, an ultra-low-power demodulator for long-range LoRa backscatter systems. With Saiyan, a backscatter tag can demodulate feedback signals from a remote access point with moderate power consumption and then perform an immediate packet retransmission in the presence of packet loss. Moreover, Saiyan enables rate adaptation and channel hopping, two PHY-layer operations that are important to channel efficiency yet unavailable on long-range backscatter systems. We prototype Saiyan on a two-layer PCB board and evaluate its performance in different environments. Results show that Saiyan achieves 5 gain on the demodulation range, compared with state-of-the-art systems. Our ASIC simulation shows that the power consumption of Saiyan is around 93.2 uW. Code and hardware schematics can be found at: https://github.com/ZangJac/Saiyan.

Submitted 30 September, 2022; originally announced September 2022.

arXiv:2209.15195 [pdf]

RF-Transformer: A Unified Backscatter Radio Hardware Abstraction

Authors: Xiuzhen Guo, Yuan He, Zihao Yu, Jiacheng Zhang, Yunhao Liu, Longfei Shangguan

Abstract: This paper presents RF-Transformer, a unified backscatter radio hardware abstraction that allows a low-power IoT device to directly communicate with heterogeneous wireless receivers at the minimum power consumption. Unlike existing backscatter systems that are tailored to a specific wireless communication protocol, RF-Transformer provides a programmable interface to the micro-controller, allowing IoT devices to synthesize different types of protocol-compliant backscatter signals sharing radically different PHY-layer designs. To show the efficacy of our design, we implement a PCB prototype of RF-Transformer on the 2.4 GHz ISM band and showcase its capability of generating standard ZigBee, Bluetooth, LoRa, and Wi-Fi 802.11b/g/n/ac packets. Our extensive field studies show that RF-Transformer achieves 23.8 Mbps, 247.1 Kbps, 986.5 Kbps, and 27.3 Kbps throughput when generating standard Wi-Fi, ZigBee, Bluetooth, and LoRa signals while consuming 7.6-74.2× less power than their active counterparts. Our ASIC simulation based on the 65-nm CMOS process shows that the power gain of RF-Transformer can further grow to 92-678×. We further integrate RF-Transformer with pressure sensors and present a case study on detecting foot traffic density in hallways. Our 7-day case studies demonstrate that RF-Transformer can reliably transmit sensor data to a commodity gateway by synthesizing LoRa packets on top of Wi-Fi signals. Our experimental results also verify the compatibility of RF-Transformer with commodity receivers.
Code and hardware schematics can be found at: https://github.com/LeFsCC/RF-Transformer.

Submitted 29 September, 2022; originally announced September 2022.

arXiv:2209.08933 [pdf, ps, other]

Estimating Brain Age with Global and Local Dependencies

Authors: Yanwu Yang, Xutao Guo, Zhikai Chang, Chenfei Ye, Yang Xiang, Haiyan Lv, Ting Ma

Abstract: Brain age has been proven to be a phenotype of relevance to cognitive performance and brain disease. Achieving accurate brain age prediction is an essential prerequisite for optimizing the predicted brain-age difference as a biomarker. As a comprehensive biological characteristic, brain age is hard to exploit accurately with models that use feature engineering and local processing, such as local convolution and recurrent operations that process one local neighborhood at a time. In contrast, Vision Transformers learn global attentive interaction of patch tokens, introducing less inductive bias and modeling long-range dependencies. Building on this, we propose a novel network for learning brain age with global and local dependencies, where the corresponding representations are captured by a Successive Permuted Transformer (SPT) and convolution blocks. The SPT brings computational efficiency and locates the 3D spatial information indirectly by continuously encoding 2D slices from different views. Finally, we collect a large cohort of 22645 subjects with ages ranging from 14 to 97, and our network performs the best among a series of deep learning methods, yielding a mean absolute error (MAE) of 2.855 on the validation set and 2.911 on an independent test set.

Submitted 19 September, 2022; originally announced September 2022.

Showing 1–50 of 95 results for author: Guo, X