MCAD: Multi-modal Conditioned Adversarial Diffusion Model for High-Quality PET Image Reconstruction

Authors: Jiaqi Cui, Xinyi Zeng, Pinxian Zeng, Bo Liu, Xi Wu, Jiliu Zhou, Yan Wang

Abstract: Radiation hazards associated with standard-dose positron emission tomography (SPET) images remain a concern, whereas the quality of low-dose PET (LPET) images fails to meet clinical requirements. Therefore, there is great interest in reconstructing SPET images from LPET images. However, prior studies focus solely on image data, neglecting vital complementary information from other modalities, e.g., patients' clinical tabular data, resulting in compromised reconstruction with limited diagnostic utility. Moreover, they often overlook the semantic consistency between real SPET and reconstructed images, leading to distorted semantic contexts. To tackle these problems, we propose a novel Multi-modal Conditioned Adversarial Diffusion model (MCAD) to reconstruct SPET images from multi-modal inputs, including LPET images and clinical tabular data. Specifically, our MCAD incorporates a Multi-modal conditional Encoder (Mc-Encoder) to extract multi-modal features, followed by a conditional diffusion process to blend noise with multi-modal features and gradually map blended features to the target SPET images. To balance multi-modal inputs, the Mc-Encoder embeds Optimal Multi-modal Transport co-Attention (OMTA) to narrow the heterogeneity gap between image and tabular data while capturing their interactions, providing sufficient guidance for reconstruction. In addition, to mitigate semantic distortions, we introduce the Multi-Modal Masked Text Reconstruction (M3TRec), which leverages semantic knowledge extracted from denoised PET images to restore the masked clinical tabular data, thereby compelling the network to maintain accurate semantics during reconstruction. To expedite the diffusion process, we further introduce an adversarial diffusive network with a reduced number of diffusion steps. Experiments show that our method achieves state-of-the-art performance both qualitatively and quantitatively.

Submitted 18 June, 2024; originally announced June 2024.

Comments: Early accepted by MICCAI 2024

arXiv:2402.00376 [pdf]

doi 10.1109/ICASSP48485.2024.10446360

Image2Points: A 3D Point-based Context Clusters GAN for High-Quality PET Image Reconstruction

Authors: Jiaqi Cui, Yan Wang, Lu Wen, Pinxian Zeng, Xi Wu, Jiliu Zhou, Dinggang Shen

Abstract: To obtain high-quality positron emission tomography (PET) images while minimizing radiation exposure, numerous methods have been proposed to reconstruct standard-dose PET (SPET) images from the corresponding low-dose PET (LPET) images. However, these methods heavily rely on voxel-based representations, which fall short of adequately accounting for the precise structure and fine-grained context, leading to compromised reconstruction. In this paper, we propose a 3D point-based context clusters GAN, namely PCC-GAN, to reconstruct high-quality SPET images from LPET. Specifically, inspired by the geometric representation power of points, we resort to a point-based representation to enhance the explicit expression of the image structure, thus facilitating the reconstruction with finer details. Moreover, a context clustering strategy is applied to explore the contextual relationships among points, which mitigates the ambiguities of small structures in the reconstructed images. Experiments on both clinical and phantom datasets demonstrate that our PCC-GAN outperforms the state-of-the-art reconstruction methods qualitatively and quantitatively. Code is available at https://github.com/gluucose/PCCGAN.

Submitted 1 February, 2024; originally announced February 2024.

Comments: Accepted by ICASSP 2024

arXiv:2308.05365 [pdf]

TriDo-Former: A Triple-Domain Transformer for Direct PET Reconstruction from Low-Dose Sinograms

Authors: Jiaqi Cui, Pinxian Zeng, Xinyi Zeng, Peng Wang, Xi Wu, Jiliu Zhou, Yan Wang, Dinggang Shen

Abstract: To obtain high-quality positron emission tomography (PET) images while minimizing radiation exposure, various methods have been proposed for reconstructing standard-dose PET (SPET) images from low-dose PET (LPET) sinograms directly. However, current methods often neglect boundaries during sinogram-to-image reconstruction, resulting in high-frequency distortion in the frequency domain and diminished or fuzzy edges in the reconstructed images. Furthermore, the convolutional architectures, which are commonly used, lack the ability to model long-range non-local interactions, potentially leading to inaccurate representations of global structures. To alleviate these problems, we propose a transformer-based model that unites triple domains of sinogram, image, and frequency for direct PET reconstruction, namely TriDo-Former. Specifically, the TriDo-Former consists of two cascaded networks, i.e., a sinogram enhancement transformer (SE-Former) for denoising the input LPET sinograms and a spatial-spectral reconstruction transformer (SSR-Former) for reconstructing SPET images from the denoised sinograms. Different from the vanilla transformer that splits an image into 2D patches, based specifically on the PET imaging mechanism, our SE-Former divides the sinogram into 1D projection view angles to maintain its inner structure while denoising, preventing the noise in the sinogram from propagating into the image domain. Moreover, to mitigate high-frequency distortion and improve reconstruction details, we integrate global frequency parsers (GFPs) into SSR-Former. The GFP serves as a learnable frequency filter that globally adjusts the frequency components in the frequency domain, forcing the network to restore high-frequency details resembling real SPET images. Validations on a clinical dataset demonstrate that our TriDo-Former outperforms the state-of-the-art methods qualitatively and quantitatively.

Submitted 10 August, 2023; originally announced August 2023.

arXiv:2212.01209 [pdf, other]

FECAM: Frequency Enhanced Channel Attention Mechanism for Time Series Forecasting

Authors: Maowei Jiang, Pengyu Zeng, Kai Wang, Huan Liu, Wenbo Chen, Haoran Liu

Abstract: Time series forecasting is a long-standing challenge because real-world information arises in diverse scenarios (e.g., energy, weather, traffic, economics, earthquake warning). However, the forecasts of some mainstream models deviate dramatically from the ground truth. We believe the reason is that these models lack the ability to capture the frequency information that is abundant in real-world datasets. At present, the mainstream frequency information extraction methods are based on the Fourier transform (FT). However, the use of FT is problematic due to the Gibbs phenomenon: if the values at the two ends of a sequence differ significantly, oscillatory approximations are observed around both ends and high-frequency noise is introduced. Therefore, we propose a novel frequency enhanced channel attention mechanism that adaptively models frequency interdependencies between channels based on the Discrete Cosine Transform, which intrinsically avoids the high-frequency noise caused by problematic periodicity during the Fourier transform (the Gibbs phenomenon). We show that this network generalizes extremely effectively across six real-world datasets and achieves state-of-the-art performance. We further demonstrate that the frequency enhanced channel attention mechanism module can be flexibly applied to different networks. This module can improve the prediction ability of existing mainstream networks, reducing MSE by 35.99% on LSTM, 10.01% on Reformer, 8.71% on Informer, 8.29% on Autoformer, 8.06% on Transformer, etc., at a slight computational cost and with just a few lines of code. Our code and data are available at https://github.com/Zero-coder/FECAM.

Submitted 2 December, 2022; originally announced December 2022.

Comments: 11 pages, 10 figures, conference. arXiv admin note: text overlap with arXiv:2205.14415 by other authors

arXiv:2211.14017 [pdf, other]

Learnable Blur Kernel for Single-Image Defocus Deblurring in the Wild

Authors: Jucai Zhai, Pengcheng Zeng, Chihao Ma, Yong Zhao, Jie Chen

Abstract: Recent research showed that the dual-pixel sensor has made great progress in defocus map estimation and image defocus deblurring. However, extracting real-time dual-pixel views is troublesome and complex in algorithm deployment. Moreover, the deblurred image generated by the defocus deblurring network lacks high-frequency details, which is unsatisfactory in human perception. To overcome this issue, we propose a novel defocus deblurring method that uses the guidance of the defocus map to implement image deblurring. The proposed method consists of a learnable blur kernel to estimate the defocus map, which is an unsupervised method, and, for the first time, a single-image defocus deblurring generative adversarial network (DefocusGAN). The proposed network can learn the deblurring of different regions and recover realistic details. We propose a defocus adversarial loss to guide this training process. Competitive experimental results confirm that with a learnable blur kernel, the generated defocus map can achieve results comparable to supervised methods. In the single-image defocus deblurring task, the proposed method achieves state-of-the-art results, with especially significant improvements in perceptual quality, where PSNR reaches 25.56 dB and LPIPS reaches 0.111.

Submitted 25 November, 2022; originally announced November 2022.

Comments: 9 pages, 7 figures

arXiv:2206.09302 [pdf, other]

Delay-aware Multiple Access Design for Intelligent Reflecting Surface Aided Uplink Transmission

Authors: Piao Zeng, Guangji Chen, Qingqing Wu, Deli Qiao, Abbas Jamalipour

Abstract: In this paper, we develop a hybrid multiple access (MA) protocol for an intelligent reflecting surface (IRS) aided uplink transmission network by incorporating the IRS-aided time-division MA (I-TDMA) protocol and the IRS-aided non-orthogonal MA (I-NOMA) protocol as special cases. Two typical communication scenarios, namely the transmit power limited case and the transmit energy limited case, are considered, where the devices' rearranged order, time and power allocation, as well as dynamic IRS beamforming patterns over time are jointly optimized to minimize the sum transmission delay. To shed light on the superiority of the proposed IRS-aided hybrid MA (I-HMA) protocol over conventional protocols, the conditions under which I-HMA outperforms I-TDMA and I-NOMA are revealed by characterizing their corresponding optimal solutions. Then, a computationally efficient algorithm is proposed to obtain high-quality solutions to the corresponding optimization problems. Simulation results validate our theoretical findings, demonstrate the superiority of the proposed design, and draw some useful insights. Specifically, it is found that the proposed protocol can significantly reduce the sum transmission delay by combining the additional gain of dynamic IRS beamforming with the high spectral efficiency of NOMA, which thus reveals that integrating IRS into the proposed HMA protocol is an effective solution for delay-aware optimization. Furthermore, it reveals that the proposed design reduces the time consumption not only from the system-centric view, but also from the device-centric view.

Submitted 26 June, 2023; v1 submitted 18 June, 2022; originally announced June 2022.

Comments: Submitted to TWC

arXiv:2111.11600 [pdf, other]

Throughput Maximization for Active Intelligent Reflecting Surface Aided Wireless Powered Communications

Authors: Piao Zeng, Deli Qiao, Qingqing Wu, Yuan Wu

Abstract: This paper considers an active intelligent reflecting surface (IRS)-aided wireless powered communication network (WPCN), where devices first harvest energy and then transmit information to a hybrid access point (HAP). Different from the existing works on passive IRS-aided WPCNs, this is the first work that introduces the active IRS in WPCNs. To guarantee fairness, the problem is formulated as an amplifying power-limited weighted sum throughput (WST) maximization problem, which is solved by alternately applying the successive convex approximation technique and fractional programming. To balance the performance and complexity tradeoff, three beamforming setups are considered at the active IRS, namely user-adaptive IRS beamforming, uplink-adaptive IRS beamforming, and static IRS beamforming. Numerical results demonstrate the significant superiority of employing active IRS in WPCNs and the benefits of dynamic IRS beamforming. Specifically, it is found that compared to the passive IRS, the active IRS not only improves the WST greatly, but also is more energy-efficient and can significantly extend the transmission coverage. Moreover, different from the symmetric deployment strategy of passive IRS, it is preferable to deploy the active IRS near the devices.

Submitted 11 January, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

Comments: Submitted to Wireless Communications Letters

arXiv:2109.03233 [pdf, other]

Contrastive Learning with Temporal Correlated Medical Images: A Case Study using Lung Segmentation in Chest X-Rays

Authors: Dewen Zeng, John N. Kheir, Peng Zeng, Yiyu Shi

Abstract: Contrastive learning has been proved to be a promising technique for image-level representation learning from unlabeled data. Many existing works have demonstrated improved results by applying contrastive learning in classification and object detection tasks for either natural images or medical images. However, its application to medical image segmentation tasks has been limited. In this work, we use lung segmentation in chest X-rays as a case study and propose a contrastive learning framework with temporal correlated medical images, named CL-TCI, to learn superior encoders for initializing the segmentation network. We adapt CL-TCI from two state-of-the-art contrastive learning methods, MoCo and SimCLR. Experimental results on three chest X-ray datasets show that under two different segmentation backbones, U-Net and Deeplab-V3, CL-TCI can outperform all baselines that do not incorporate any temporal correlation in both the semi-supervised learning setting and the transfer learning setting with limited annotation. This suggests that information among temporal correlated medical images can indeed improve contrastive learning performance. Between the two variations of CL-TCI, CL-TCI adapted from MoCo outperforms CL-TCI adapted from SimCLR in most settings, indicating that more contrastive samples can benefit the learning process and help the network learn high-quality representations.

Submitted 16 September, 2021; v1 submitted 6 September, 2021; originally announced September 2021.

Comments: 7 pages, submitted to ICCAD'21 special session

arXiv:2108.13603 [pdf, ps, other]

Energy Minimization for IRS-aided WPCNs with Non-linear Energy Harvesting Model

Authors: Piao Zeng, Qingqing Wu, Deli Qiao

Abstract: This paper considers an intelligent reflecting surface (IRS)-aided wireless powered communication network (WPCN), where devices first harvest energy from a power station (PS) in the downlink (DL) and then transmit information using non-orthogonal multiple access (NOMA) to a data sink in the uplink (UL). However, most existing works on WPCNs adopted the simplified linear energy-harvesting model and also cannot guarantee strict user quality-of-service requirements. To address these issues, we aim to minimize the total transmit energy consumption at the PS by jointly optimizing the resource allocation and IRS phase shifts over time, subject to the minimum throughput requirements of all devices. The formulated problem is decomposed into two subproblems, and solved iteratively in an alternating manner by employing difference of convex functions programming, successive convex approximation, and a penalty-based algorithm. Numerical results demonstrate the significant performance gains achieved by the proposed algorithm over benchmark schemes and reveal the benefits of integrating IRS into WPCNs. In particular, employing different IRS phase shifts over UL and DL outperforms the case with static IRS beamforming.

Submitted 1 September, 2021; v1 submitted 30 August, 2021; originally announced August 2021.

Comments: Accepted by IEEE WCL

arXiv:2007.06859 [pdf, ps, other]

Joint Beamforming Design for IRS-Aided Communications with Channel Estimation Errors

Authors: Piao Zeng, Deli Qiao, Haifeng Qian

Abstract: This paper investigates the joint design of the beamforming scheme in intelligent reflecting surface (IRS) assisted multiuser (MU) multiple-input multiple-output (MIMO) downlink transmissions. Channel estimation errors associated with the minimum mean square error (MMSE) estimation are assumed and the weighted sum rate (WSR) is adopted as the performance metric. Low-resolution phase shifters (PSs) in practical implementations are taken into account as well. Under the constraints of the transmit power and discrete PSs, an optimization problem is formulated to maximize the WSR of all users. To obtain the optimal beamforming matrices at the IRS, two solutions based on the majorization-minimization (MM) and successive convex approximation (SCA) methods, respectively, are proposed. Simulation results show that both proposed schemes achieve a significant improvement in WSR. Furthermore, the superiority of the SCA-based solution is demonstrated. Overall, two viable solutions to the joint beamforming design in IRS-aided MU-MIMO downlink communication systems with channel estimation errors are provided.

Submitted 14 July, 2020; originally announced July 2020.

Showing 1–10 of 10 results for author: Zeng, P