-
Towards unlocking the mystery of adversarial fragility of neural networks
Authors:
Jingchao Gao,
Raghu Mudumbai,
Xiaodong Wu,
Jirong Yi,
Catherine Xu,
Hui Xie,
Weiyu Xu
Abstract:
In this paper, we study the adversarial robustness of deep neural networks for classification tasks. We look at the smallest magnitude of possible additive perturbations that can change the output of a classification algorithm. We provide a matrix-theoretic explanation of the adversarial fragility of deep neural networks for classification. In particular, our theoretical results show that a neural network's adversarial robustness can degrade as the input dimension $d$ increases. Analytically, we show that a neural network's adversarial robustness can be only $1/\sqrt{d}$ of the best possible adversarial robustness. Our matrix-theoretic explanation is consistent with an earlier information-theoretic feature-compression-based explanation for the adversarial fragility of neural networks.
Submitted 23 June, 2024;
originally announced June 2024.
-
2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction
Authors:
Tianqi Chen,
Jun Hou,
Yinchi Zhou,
Huidong Xie,
Xiongchao Chen,
Qiong Liu,
Xueqi Guo,
Menghua Xia,
James S. Duncan,
Chi Liu,
Bo Zhou
Abstract:
Positron Emission Tomography (PET) is an important clinical imaging tool but inevitably introduces radiation hazards to patients and healthcare providers. Reducing the tracer injection dose and eliminating the CT acquisition for attenuation correction can reduce the overall radiation dose, but often results in PET with high noise and bias. Thus, it is desirable to develop 3D methods to translate the non-attenuation-corrected low-dose PET (NAC-LDPET) into attenuation-corrected standard-dose PET (AC-SDPET). Recently, diffusion models have emerged as a new state-of-the-art deep learning method for image-to-image translation, better than traditional CNN-based methods. However, due to their high computation cost and memory burden, they are largely limited to 2D applications. To address these challenges, we developed a novel 2.5D Multi-view Averaging Diffusion Model (MADM) for 3D image-to-image translation with application to NAC-LDPET to AC-SDPET translation. Specifically, MADM employs separate diffusion models for axial, coronal, and sagittal views, whose outputs are averaged in each sampling step to ensure the 3D generation quality from multiple views. To accelerate the 3D sampling process, we also proposed a strategy to use the CNN-based 3D generation as a prior for the diffusion model. Our experimental results on human patient studies suggested that MADM can generate high-quality 3D translation images, outperforming previous CNN-based and diffusion-based baseline methods.
Submitted 15 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
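The multi-view averaging idea in the abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`apply_slicewise`, `multiview_average_step`), the axis-to-view assignment, and the stand-in 2D "models" are all hypothetical, and a real sampler would call trained diffusion networks at every step.

```python
import numpy as np

def apply_slicewise(volume, model_2d, axis):
    """Apply a 2D callable to every slice of a 3D volume taken along `axis`."""
    moved = np.moveaxis(volume, axis, 0)            # bring the slicing axis to the front
    out = np.stack([model_2d(s) for s in moved])    # process each 2D slice independently
    return np.moveaxis(out, 0, axis)                # restore the original axis order

def multiview_average_step(volume, models_2d):
    """Average the outputs of three per-view 2D models (one per axis).

    Assumption: axis 0/1/2 stand in for the axial/coronal/sagittal views.
    """
    views = [apply_slicewise(volume, m, axis) for axis, m in enumerate(models_2d)]
    return np.mean(views, axis=0)

# Hypothetical stand-ins for the three per-view models.
halve = lambda s: 0.5 * s
identity = lambda s: s

vol = np.ones((4, 5, 6))
out = multiview_average_step(vol, [halve, halve, identity])
```

In this toy setup every voxel becomes the mean of the three per-view outputs, which is the only property the sketch is meant to show.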
-
Large-scale Outdoor Cell-free mMIMO Channel Measurement in an Urban Scenario at 3.5 GHz
Authors:
Yuning Zhang,
Thomas Choi,
Zihang Cheng,
Issei Kanno,
Masaaki Ito,
Jorge Gomez-Ponce,
Hussein Hammoud,
Bowei Wu,
Ashwani Pradhan,
Kelvin Arana,
Pramod Krishna,
Tianyi Yang,
Tyler Chen,
Ishita Vasishtha,
Haoyu Xie,
Linyu Sun,
Andreas F. Molisch
Abstract:
The design of cell-free massive MIMO (CF-mMIMO) systems requires accurate, measurement-based channel models. This paper provides the first results from by far the most extensive outdoor measurement campaign for CF-mMIMO channels in an urban environment. We measured impulse responses between over 20,000 potential access point (AP) locations and 80 user equipments (UEs) at 3.5 GHz with 350 MHz bandwidth (BW). Measurements use a "virtual array" approach at the AP and a hybrid switched/virtual approach at the UE. This paper describes the sounder design, measurement environment, data processing, and sample results, particularly the evolution of the power-delay profiles (PDPs) as a function of the AP locations, and its relation to the propagation environment.
Submitted 6 June, 2024; v1 submitted 31 May, 2024;
originally announced May 2024.
-
Dose-aware Diffusion Model for 3D Low-dose PET: Multi-institutional Validation with Reader Study and Real Low-dose Data
Authors:
Huidong Xie,
Weijie Gan,
Bo Zhou,
Ming-Kai Chen,
Michal Kulon,
Annemarie Boustani,
Benjamin A. Spencer,
Reimund Bayerlein,
Xiongchao Chen,
Qiong Liu,
Xueqi Guo,
Menghua Xia,
Yinchi Zhou,
Hui Liu,
Liang Guo,
Hongyu An,
Ulugbek S. Kamilov,
Hanzhong Wang,
Biao Li,
Axel Rominger,
Kuangyu Shi,
Ge Wang,
Ramsey D. Badawi,
Chi Liu
Abstract:
As PET imaging is accompanied by radiation exposure and potentially increased cancer risk, reducing radiation dose in PET scans without compromising the image quality is an important topic. Deep learning (DL) techniques have been investigated for low-dose PET imaging. However, existing models have often resulted in compromised image quality when achieving low-dose PET and have limited generalizability to different image noise levels, acquisition protocols, patient populations, and hospitals. Recently, diffusion models have emerged as the new state-of-the-art generative model to generate high-quality samples and have demonstrated strong potential for medical imaging tasks. However, for low-dose PET imaging, existing diffusion models failed to generate consistent 3D reconstructions, were unable to generalize across varying noise levels, often produced visually appealing but distorted image details, and produced images with biased tracer uptake. Here, we develop DDPET-3D, a dose-aware diffusion model for 3D low-dose PET imaging to address these challenges. Using data collected from 4 medical centers globally with different scanners and clinical protocols, we extensively evaluated the proposed model on a total of 9,783 18F-FDG studies (1,596 patients) with low-dose/low-count levels ranging from 1% to 50%. With a cross-center, cross-scanner validation, the proposed DDPET-3D demonstrated its potential to generalize to different low-dose levels, different scanners, and different clinical protocols. As confirmed with reader studies performed by nuclear medicine physicians, the proposed method produced superior denoised results that are comparable to or even better than the 100% full-count images as well as previous DL baselines. The presented results show the potential of achieving low-dose PET while maintaining image quality. Lastly, a group of real low-dose scans was also included for evaluation.
Submitted 2 May, 2024;
originally announced May 2024.
-
Hybrid Digital-Analog Semantic Communications
Authors:
Huiqiang Xie,
Zhijin Qin,
Zhu Han,
Khaled B. Letaief
Abstract:
Digital and analog semantic communications (SemCom) face inherent limitations such as data security concerns in analog SemCom, as well as leveling-off and cliff-edge effects in digital SemCom. In order to overcome these challenges, we propose a novel SemCom framework and a corresponding system called HDA-DeepSC, which leverages a hybrid digital-analog approach for multimedia transmission. This is achieved through the introduction of digital-analog allocation and fusion modules. To strike a balance between data rate and distortion, we design new loss functions that take into account long-distance dependencies in the semantic distortion constraint, essential information recovery in the channel distortion constraint, and optimal bit stream generation in the rate constraint. Additionally, we propose denoising diffusion-based signal detection techniques, which involve carefully designed variance schedules and sampling algorithms to refine transmitted signals. Through extensive numerical experiments, we will demonstrate that HDA-DeepSC exhibits robustness to channel variations and is capable of supporting various communication scenarios. Our proposed framework outperforms existing benchmarks in terms of peak signal-to-noise ratio and multi-scale structural similarity, showcasing its superiority in semantic communication quality.
Submitted 27 May, 2024; v1 submitted 21 May, 2024;
originally announced May 2024.
-
Cascaded Multi-path Shortcut Diffusion Model for Medical Image Translation
Authors:
Yinchi Zhou,
Tianqi Chen,
Jun Hou,
Huidong Xie,
Nicha C. Dvornek,
S. Kevin Zhou,
David L. Wilson,
James S. Duncan,
Chi Liu,
Bo Zhou
Abstract:
Image-to-image translation is a vital component in medical image processing, with many uses in a wide range of imaging modalities and clinical scenarios. Previous methods include Generative Adversarial Networks (GANs) and Diffusion Models (DMs), which offer realism but suffer from instability and lack uncertainty estimation. Even though both GAN and DM methods have individually exhibited their capability in medical image translation tasks, the potential of combining a GAN and DM to further improve translation performance and to enable uncertainty estimation remains largely unexplored. In this work, we address these challenges by proposing a Cascaded Multi-path Shortcut Diffusion Model (CMDM) for high-quality medical image translation and uncertainty estimation. To reduce the required number of iterations and ensure robust performance, our method first obtains a conditional GAN-generated prior image that will be used for the efficient reverse translation with a DM in the subsequent step. Additionally, a multi-path shortcut diffusion strategy is employed to refine translation results and estimate uncertainty. A cascaded pipeline further enhances translation quality, incorporating residual averaging between cascades. We collected three different medical image datasets with two sub-tasks for each dataset to test the generalizability of our approach. Our experimental results show that CMDM can produce high-quality translations comparable to state-of-the-art methods while providing reasonable uncertainty estimations that correlate well with the translation error.
Submitted 5 April, 2024;
originally announced May 2024.
-
Semantic MIMO Systems for Speech-to-Text Transmission
Authors:
Zhenzi Weng,
Zhijin Qin,
Huiqiang Xie,
Xiaoming Tao,
Khaled B. Letaief
Abstract:
Semantic communications have been utilized to execute numerous intelligent tasks by transmitting task-related semantic information instead of bits. In this article, we propose a semantic-aware speech-to-text transmission system for the single-user multiple-input multiple-output (MIMO) and multi-user MIMO communication scenarios, named SAC-ST. Particularly, a semantic communication system to serve the speech-to-text task at the receiver is first designed, which compresses the semantic information and generates the low-dimensional semantic features by leveraging the transformer module. In addition, a novel semantic-aware network is proposed to facilitate the transmission with high semantic fidelity to identify the critical semantic information and guarantee it is recovered accurately. Furthermore, we extend the SAC-ST with a neural network-enabled channel estimation network to mitigate the dependence on accurate channel state information and validate the feasibility of SAC-ST in practical communication environments. Simulation results will show that the proposed SAC-ST outperforms the communication framework without the semantic-aware network for speech-to-text transmission over the MIMO channels in terms of the speech-to-text metrics, especially in the low signal-to-noise regime. Moreover, the SAC-ST with the developed channel estimation network is comparable to the SAC-ST with perfect channel state information.
Submitted 13 May, 2024;
originally announced May 2024.
-
LpQcM: Adaptable Lesion-Quantification-Consistent Modulation for Deep Learning Low-Count PET Image Denoising
Authors:
Menghua Xia,
Huidong Xie,
Qiong Liu,
Bo Zhou,
Hanzhong Wang,
Biao Li,
Axel Rominger,
Kuangyu Shi,
Georges El Fakhri,
Chi Liu
Abstract:
Deep learning-based positron emission tomography (PET) image denoising offers the potential to reduce radiation exposure and scanning time by transforming low-count images into high-count equivalents. However, existing methods typically blur crucial details, leading to inaccurate lesion quantification. This paper proposes a lesion-perceived and quantification-consistent modulation (LpQcM) strategy for enhanced PET image denoising, via employing downstream lesion quantification analysis as auxiliary tools. The LpQcM is a plug-and-play design adaptable to a wide range of model architectures, modulating the sampling and optimization procedures of model training without adding any computational burden to the inference phase. Specifically, the LpQcM consists of two components, the lesion-perceived modulation (LpM) and the multiscale quantification-consistent modulation (QcM). The LpM enhances lesion contrast and visibility by allocating higher sampling weights and stricter loss criteria to lesion-present samples determined by an auxiliary segmentation network than lesion-absent ones. The QcM further emphasizes accuracy of quantification for both the mean and maximum standardized uptake value (SUVmean and SUVmax) across multiscale sub-regions throughout the entire image, thereby enhancing the overall image quality. Experiments conducted on large PET datasets from multiple centers and vendors, and varying noise levels demonstrated the LpQcM efficacy across various denoising frameworks. Compared to frameworks without LpQcM, the integration of LpQcM reduces the lesion SUVmean bias by 2.92% on average and increases the peak signal-to-noise ratio (PSNR) by 0.34 on average, for denoising images of extremely low-count levels below 10%.
Submitted 27 April, 2024;
originally announced April 2024.
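The two figures of merit quoted in the abstract above, PSNR and lesion SUVmean bias, can be sketched with their common textbook definitions. This is an illustration under assumptions, not the paper's evaluation code: the function names, the choice of `data_range`, and the absolute-relative form of the bias are hypothetical and may differ from what the authors used.

```python
import numpy as np

def psnr(reference, estimate, data_range=None):
    """Peak signal-to-noise ratio in dB, using the common 10*log10(range^2 / MSE) form."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    if data_range is None:
        # Assumption: use the dynamic range of the reference image.
        data_range = reference.max() - reference.min()
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

def suv_mean_bias_percent(reference, estimate, lesion_mask):
    """Absolute relative error (in %) of the mean uptake inside a lesion mask."""
    ref_mean = reference[lesion_mask].mean()
    return 100.0 * abs(estimate[lesion_mask].mean() - ref_mean) / ref_mean

# Toy data: a constant offset of 0.4 on a 2x2 "image" with dynamic range 4.
ref = np.array([[0.0, 1.0], [2.0, 4.0]])
est = ref + 0.4
mask = ref > 1.0                      # hypothetical lesion region

psnr_db = psnr(ref, est)
bias_pct = suv_mean_bias_percent(ref, est, mask)
```

Under these definitions a lower bias and a higher PSNR both indicate a denoised image closer to the full-count reference.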
-
DE-CGAN: Boosting rTMS Treatment Prediction with Diversity Enhancing Conditional Generative Adversarial Networks
Authors:
Matthew Squires,
Xiaohui Tao,
Soman Elangovan,
Raj Gururajan,
Haoran Xie,
Xujuan Zhou,
Yuefeng Li,
U Rajendra Acharya
Abstract:
Repetitive Transcranial Magnetic Stimulation (rTMS) is a well-supported, evidence-based treatment for depression. However, patterns of response to this treatment are inconsistent. Emerging evidence suggests that artificial intelligence can predict rTMS treatment outcomes for most patients using fMRI connectivity features. While these models can reliably predict treatment outcomes for many patients, for some underrepresented fMRI connectivity measures DNN models are unable to reliably predict treatment outcomes. As such, we propose a novel method, Diversity Enhancing Conditional Generative Adversarial Network (DE-CGAN), for oversampling these underrepresented examples. DE-CGAN creates synthetic examples in difficult-to-classify regions by first identifying these data points and then creating conditioned synthetic examples to enhance data diversity. Through empirical experiments, we show that a classification model trained using a diversity-enhanced training set outperforms traditional data augmentation techniques and existing benchmark results. This work shows that increasing the diversity of a training dataset can improve classification model performance. Furthermore, this work provides evidence for the utility of synthetic patients in providing larger, more robust datasets for both AI researchers and psychiatrists to explore variable relationships.
Submitted 25 April, 2024;
originally announced April 2024.
-
Pseudo-MRI-Guided PET Image Reconstruction Method Based on a Diffusion Probabilistic Model
Authors:
Weijie Gan,
Huidong Xie,
Carl von Gall,
Günther Platsch,
Michael T. Jurkiewicz,
Andrea Andrade,
Udunna C. Anazodo,
Ulugbek S. Kamilov,
Hongyu An,
Jorge Cabello
Abstract:
Anatomically guided PET reconstruction using MRI information has been shown to have the potential to improve PET image quality. However, these improvements are limited to PET scans with paired MRI information. In this work we employed a diffusion probabilistic model (DPM) to infer T1-weighted-MRI (deep-MRI) images from FDG-PET brain images. We then use the DPM-generated T1w-MRI to guide the PET reconstruction. The model was trained with brain FDG scans, and tested in datasets containing multiple levels of counts. Deep-MRI images appeared somewhat degraded compared to the acquired MRI images. Regarding PET image quality, volume of interest analysis in different brain regions showed that PET images reconstructed using either the acquired or the deep-MRI images improved image quality compared to OSEM. The same conclusions were reached when analysing the decimated datasets. A subjective evaluation performed by two physicians confirmed that OSEM scored consistently worse than the MRI-guided PET images and no significant differences were observed between the MRI-guided PET images. This proof of concept shows that it is possible to infer DPM-based MRI imagery to guide the PET reconstruction, enabling the possibility of changing reconstruction parameters such as the strength of the prior on anatomically guided PET reconstruction in the absence of MRI.
Submitted 26 March, 2024;
originally announced March 2024.
-
Towards Intelligent Communications: Large Model Empowered Semantic Communications
Authors:
Huiqiang Xie,
Zhijin Qin,
Xiaoming Tao,
Zhu Han
Abstract:
Deep learning enabled semantic communications have shown great potential to significantly improve transmission efficiency and alleviate spectrum scarcity, by effectively exchanging the semantics behind the data. Recently, the emergence of large models, boasting billions of parameters, has unveiled remarkable human-like intelligence, offering a promising avenue for advancing semantic communication by enhancing semantic understanding and contextual understanding. This article systematically investigates the large model-empowered semantic communication systems from potential applications to system design. First, we propose a new semantic communication architecture that seamlessly integrates large models into semantic communication through the introduction of a memory module. Then, the typical applications are illustrated to show the benefits of the new architecture. Besides, we discuss the key designs in implementing the new semantic communication systems from module design to system training. Finally, the potential research directions are identified to boost the large model-empowered semantic communications.
Submitted 19 March, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
TAI-GAN: A Temporally and Anatomically Informed Generative Adversarial Network for early-to-late frame conversion in dynamic cardiac PET inter-frame motion correction
Authors:
Xueqi Guo,
Luyao Shi,
Xiongchao Chen,
Qiong Liu,
Bo Zhou,
Huidong Xie,
Yi-Hwa Liu,
Richard Palyo,
Edward J. Miller,
Albert J. Sinusas,
Lawrence H. Staib,
Bruce Spottiswoode,
Chi Liu,
Nicha C. Dvornek
Abstract:
Inter-frame motion in dynamic cardiac positron emission tomography (PET) using rubidium-82 (82-Rb) myocardial perfusion imaging impacts myocardial blood flow (MBF) quantification and the diagnosis accuracy of coronary artery diseases. However, the high cross-frame distribution variation due to rapid tracer kinetics poses a considerable challenge for inter-frame motion correction, especially for early frames where intensity-based image registration techniques often fail. To address this issue, we propose a novel method called Temporally and Anatomically Informed Generative Adversarial Network (TAI-GAN) that utilizes an all-to-one mapping to convert early frames into those with tracer distribution similar to the last reference frame. The TAI-GAN consists of a feature-wise linear modulation layer that encodes channel-wise parameters generated from temporal information and rough cardiac segmentation masks with local shifts that serve as anatomical information. Our proposed method was evaluated on a clinical 82-Rb PET dataset, and the results show that our TAI-GAN can produce converted early frames with high image quality, comparable to the real reference frames. After TAI-GAN conversion, the motion estimation accuracy and subsequent myocardial blood flow (MBF) quantification with both conventional and deep learning-based motion correction methods were improved compared to using the original frames.
Submitted 14 February, 2024;
originally announced February 2024.
-
POUR-Net: A Population-Prior-Aided Over-Under-Representation Network for Low-Count PET Attenuation Map Generation
Authors:
Bo Zhou,
Jun Hou,
Tianqi Chen,
Yinchi Zhou,
Xiongchao Chen,
Huidong Xie,
Qiong Liu,
Xueqi Guo,
Yu-Jung Tsai,
Vladimir Y. Panin,
Takuya Toyonaga,
James S. Duncan,
Chi Liu
Abstract:
Low-dose PET offers a valuable means of minimizing radiation exposure in PET imaging. However, the prevalent practice of employing additional CT scans for generating attenuation maps ($μ$-map) for PET attenuation correction significantly elevates radiation doses. To address this concern and further mitigate radiation exposure in low-dose PET exams, we propose POUR-Net - an innovative population-prior-aided over-under-representation network that aims for high-quality attenuation map generation from low-dose PET. First, POUR-Net incorporates an over-under-representation network (OUR-Net) to facilitate efficient feature extraction, encompassing both low-resolution abstracted and fine-detail features, for assisting deep generation on the full-resolution level. Second, complementing OUR-Net, a population prior generation machine (PPGM) utilizing a comprehensive CT-derived $μ$-map dataset, provides additional prior information to aid OUR-Net generation. The integration of OUR-Net and PPGM within a cascade framework enables iterative refinement of $μ$-map generation, resulting in the production of high-quality $μ$-maps. Experimental results underscore the effectiveness of POUR-Net, showing it as a promising solution for accurate CT-free low-count PET attenuation correction, which also surpasses the performance of previous baseline methods.
Submitted 25 January, 2024;
originally announced January 2024.
-
Dual-Domain Coarse-to-Fine Progressive Estimation Network for Simultaneous Denoising, Limited-View Reconstruction, and Attenuation Correction of Cardiac SPECT
Authors:
Xiongchao Chen,
Bo Zhou,
Xueqi Guo,
Huidong Xie,
Qiong Liu,
James S. Duncan,
Albert J. Sinusas,
Chi Liu
Abstract:
Single-Photon Emission Computed Tomography (SPECT) is widely applied for the diagnosis of coronary artery diseases. Low-dose (LD) SPECT aims to minimize radiation exposure but leads to increased image noise. Limited-view (LV) SPECT, such as the latest GE MyoSPECT ES system, enables accelerated scanning and reduces hardware expenses but degrades reconstruction accuracy. Additionally, Computed Tomography (CT) is commonly used to derive attenuation maps ($μ$-maps) for attenuation correction (AC) of cardiac SPECT, but it will introduce additional radiation exposure and SPECT-CT misalignments. Although various methods have been developed to solely focus on LD denoising, LV reconstruction, or CT-free AC in SPECT, the solution for simultaneously addressing these tasks remains challenging and under-explored. Furthermore, it is essential to explore the potential of fusing cross-domain and cross-modality information across these interrelated tasks to further enhance the accuracy of each task. Thus, we propose a Dual-Domain Coarse-to-Fine Progressive Network (DuDoCFNet), a multi-task learning method for simultaneous LD denoising, LV reconstruction, and CT-free $μ$-map generation of cardiac SPECT. Paired dual-domain networks in DuDoCFNet are cascaded using a multi-layer fusion mechanism for cross-domain and cross-modality feature fusion. Two-stage progressive learning strategies are applied in both projection and image domains to achieve coarse-to-fine estimations of SPECT projections and CT-derived $μ$-maps. Our experiments demonstrate DuDoCFNet's superior accuracy in estimating projections, generating $μ$-maps, and AC reconstructions compared to existing single- or multi-task learning methods, under various iterations and LD levels. The source code of this work is available at https://github.com/XiongchaoChen/DuDoCFNet-MultiTask.
Submitted 23 January, 2024;
originally announced January 2024.
-
Distance Guided Generative Adversarial Network for Explainable Binary Classifications
Authors:
Xiangyu Xiong,
Yue Sun,
Xiaohong Liu,
Wei Ke,
Chan-Tong Lam,
Jiangang Chen,
Mingfeng Jiang,
Mingwei Wang,
Hui Xie,
Tong Tong,
Qinquan Gao,
Hao Chen,
Tao Tan
Abstract:
Despite the potential benefits of data augmentation for mitigating data insufficiency, traditional augmentation methods primarily rely on prior intra-domain knowledge. On the other hand, advanced generative adversarial networks (GANs) generate inter-domain samples with limited variety. These previous methods make limited contributions to describing the decision boundaries for binary classification. In this paper, we propose a distance guided GAN (DisGAN) which controls the variation degrees of generated samples in the hyperplane space. Specifically, we instantiate the idea of DisGAN in two ways. The first is vertical distance GAN (VerDisGAN), where the inter-domain generation is conditioned on the vertical distances. The second is horizontal distance GAN (HorDisGAN), where the intra-domain generation is conditioned on the horizontal distances. Furthermore, VerDisGAN can produce the class-specific regions by mapping the source images to the hyperplane. Experimental results show that DisGAN consistently outperforms the GAN-based augmentation methods with explainable binary classification. The proposed method can apply to different classification architectures and has potential to extend to multi-class classification.
Submitted 29 December, 2023;
originally announced December 2023.
-
MaskCRT: Masked Conditional Residual Transformer for Learned Video Compression
Authors:
Yi-Hsin Chen,
Hong-Sheng Xie,
Cheng-Wei Chen,
Zong-Lin Gao,
Wen-Hsiao Peng,
Martin Benjak,
Jörn Ostermann
Abstract:
Conditional coding has lately emerged as the mainstream approach to learned video compression. However, a recent study shows that it may perform worse than residual coding when the information bottleneck arises. Conditional residual coding was thus proposed, creating a new school of thought to improve on conditional coding. Notably, conditional residual coding relies heavily on the assumption that the residual frame has a lower entropy rate than that of the intra frame. Recognizing that this assumption is not always true due to dis-occlusion phenomena or unreliable motion estimates, we propose a masked conditional residual coding scheme. It learns a soft mask to form a hybrid of conditional coding and conditional residual coding in a pixel adaptive manner. We introduce a Transformer-based conditional autoencoder. Several strategies are investigated with regard to how to condition a Transformer-based autoencoder for inter-frame coding, a topic that is largely under-explored. Additionally, we propose a channel transform module (CTM) to decorrelate the image latents along the channel dimension, with the aim of using the simple hyperprior to approach similar compression performance to the channel-wise autoregressive model. Experimental results confirm the superiority of our masked conditional residual transformer (termed MaskCRT) to both conditional coding and conditional residual coding. On commonly used datasets, MaskCRT shows comparable BD-rate results to VTM-17.0 under the low delay P configuration in terms of PSNR-RGB. It also opens up a new research direction for advancing learned video compression.
Submitted 25 December, 2023;
originally announced December 2023.
-
gcDLSeg: Integrating Graph-cut into Deep Learning for Binary Semantic Segmentation
Authors:
Hui Xie,
Weiyu Xu,
Ya Xing Wang,
John Buatti,
Xiaodong Wu
Abstract:
Binary semantic segmentation in computer vision is a fundamental problem. As a model-based segmentation method, the graph-cut approach was one of the most successful binary segmentation methods thanks to its global optimality guarantee of the solutions and its practical polynomial-time complexity. Recently, many deep learning (DL) based methods have been developed for this task and yielded remarkable performance, resulting in a paradigm shift in this field. To combine the strengths of both approaches, we propose in this study to integrate the graph-cut approach into a deep learning network for end-to-end learning. Unfortunately, backward propagation through the graph-cut module in the DL network is challenging due to the combinatorial nature of the graph-cut algorithm. To tackle this challenge, we propose a novel residual graph-cut loss and a quasi-residual connection, enabling the backward propagation of the gradients of the residual graph-cut loss for effective feature learning guided by the graph-cut segmentation model. In the inference phase, globally optimal segmentation is achieved with respect to the graph-cut energy defined on the optimized image features learned from DL networks. Experiments on the public AZH chronic wound data set and the pancreas cancer data set from the medical segmentation decathlon (MSD) demonstrated promising segmentation accuracy, and improved robustness against adversarial attacks.
Submitted 7 December, 2023;
originally announced December 2023.
-
Image-Domain Material Decomposition for Dual-energy CT using Unsupervised Learning with Data-fidelity Loss
Authors:
Junbo Peng,
Chih-Wei Chang,
Huiqiao Xie,
Richard L. J. Qiu,
Justin Roper,
Tonghe Wang,
Beth Bradshaw,
Xiangyang Tang,
Xiaofeng Yang
Abstract:
Background: Dual-energy CT (DECT) and material decomposition play vital roles in quantitative medical imaging. However, the decomposition process may suffer from significant noise amplification, leading to severely degraded image signal-to-noise ratios (SNRs). While existing iterative algorithms perform noise suppression using different image priors, these heuristic image priors cannot accurately represent the features of the target image manifold. Although deep learning-based decomposition methods have been reported, these methods are in the supervised-learning framework requiring paired data for training, which is not readily available in clinical settings.
Purpose: This work aims to develop an unsupervised-learning framework with data-measurement consistency for image-domain material decomposition in DECT.
Submitted 17 November, 2023;
originally announced November 2023.
-
DDPET-3D: Dose-aware Diffusion Model for 3D Ultra Low-dose PET Imaging
Authors:
Huidong Xie,
Weijie Gan,
Bo Zhou,
Xiongchao Chen,
Qiong Liu,
Xueqi Guo,
Liang Guo,
Hongyu An,
Ulugbek S. Kamilov,
Ge Wang,
Chi Liu
Abstract:
As PET imaging is accompanied by substantial radiation exposure and cancer risk, reducing radiation dose in PET scans is an important topic. Recently, diffusion models have emerged as the new state-of-the-art generative model to generate high-quality samples and have demonstrated strong potential for various tasks in medical imaging. However, it is difficult to extend diffusion models to 3D image reconstruction due to the memory burden. Directly stacking 2D slices together to create 3D image volumes would result in severe inconsistencies between slices. Previous works tried to either apply a penalty term along the z-axis to remove inconsistencies or reconstruct the 3D image volumes with two pre-trained perpendicular 2D diffusion models. Nonetheless, these previous methods failed to produce satisfactory results in challenging cases for PET image denoising. In addition to administered dose, the noise levels in PET images are affected by several other factors in clinical settings, e.g., scan time, medical history, patient size, and weight. Therefore, a method to simultaneously denoise PET images with different noise levels is needed. Here, we proposed a Dose-aware Diffusion model for 3D low-dose PET imaging (DDPET-3D) to address these challenges. We extensively evaluated DDPET-3D on 100 patients with 6 different low-dose levels (a total of 600 testing studies), and demonstrated superior performance over previous diffusion models for 3D imaging problems as well as previous noise-aware medical image denoising models. The code is available at: https://github.com/xxx/xxx.
Submitted 28 November, 2023; v1 submitted 7 November, 2023;
originally announced November 2023.
-
An Overview on IEEE 802.11bf: WLAN Sensing
Authors:
Rui Du,
Haocheng Hua,
Hailiang Xie,
Xianxin Song,
Zhonghao Lyu,
Mengshi Hu,
Narengerile,
Yan Xin,
Stephen McCann,
Michael Montemurro,
Tony Xiao Han,
Jie Xu
Abstract:
With recent advancements, the wireless local area network (WLAN) or wireless fidelity (Wi-Fi) technology has been successfully utilized to realize sensing functionalities such as detection, localization, and recognition. However, the WLAN standards are developed mainly for the purpose of communication, and thus may not be able to meet the stringent requirements for emerging sensing applications. To resolve this issue, a new Task Group (TG), namely IEEE 802.11bf, has been established by the IEEE 802.11 working group, with the objective of creating a new amendment to the WLAN standard to meet advanced sensing requirements while minimizing the effect on communications. This paper provides a comprehensive overview of the up-to-date efforts in the IEEE 802.11bf TG. First, we introduce the definition of the 802.11bf amendment and its formation and standardization timeline. Next, we discuss the WLAN sensing use cases with the corresponding key performance indicator (KPI) requirements. After reviewing previous WLAN sensing research based on communication-oriented WLAN standards, we identify their limitations and underscore the practical need for the new sensing-oriented amendment in 802.11bf. Furthermore, we discuss the WLAN sensing framework and procedure used for measurement acquisition, considering both sensing at sub-7GHz and directional multi-gigabit (DMG) sensing at 60 GHz, and address their shared features, similarities, and differences. In addition, we present various candidate technical features for IEEE 802.11bf, including waveform/sequence design, feedback types, as well as quantization and compression techniques. We also describe the methodologies and the channel modeling used by the IEEE 802.11bf TG for evaluation. Finally, we discuss in detail the challenges and future research directions to motivate more research endeavors in this field.
Submitted 20 October, 2023;
originally announced October 2023.
-
A global product of fine-scale urban building height based on spaceborne lidar
Authors:
Xiao Ma,
Guang Zheng,
Chi Xu,
L. Monika Moskal,
Peng Gong,
Qinghua Guo,
Huabing Huang,
Xuecao Li,
Yong Pang,
Cheng Wang,
Huan Xie,
Bailang Yu,
Bo Zhao,
Yuyu Zhou
Abstract:
Characterizing urban environments with broad coverage and high precision is more important than ever for achieving the UN's Sustainable Development Goals (SDGs), as half of the world's population lives in cities. Urban building height, as a fundamental 3D urban structural feature, has far-reaching applications. However, so far, producing readily available datasets of recent urban building heights with fine spatial resolution and global coverage remains a challenging task. Here, we provide an up-to-date global product of urban building heights based on a fine grid size of 150 m around 2020 by combining the spaceborne lidar instrument of GEDI and multi-sourced data including remotely sensed images (i.e., Landsat-8, Sentinel-2, and Sentinel-1) and topographic data. Our results revealed that the estimation method for building height samples based on the GEDI data was effective, with a Pearson's r of 0.78 and an RMSE of 3.67 m in comparison to the reference data. The mapping product also demonstrated good performance, as indicated by its strong correlation with the reference data (i.e., Pearson's r = 0.71, RMSE = 4.60 m). Compared with the currently existing products, our global urban building height map provides a higher spatial resolution (i.e., 150 m) with a great level of inherent detail about the spatial heterogeneity, and the flexibility of updating using the GEDI samples as inputs. This work will boost future urban studies across many fields including climate, environmental, ecological, and social sciences.
Submitted 22 October, 2023;
originally announced October 2023.
-
TAI-GAN: Temporally and Anatomically Informed GAN for early-to-late frame conversion in dynamic cardiac PET motion correction
Authors:
Xueqi Guo,
Luyao Shi,
Xiongchao Chen,
Bo Zhou,
Qiong Liu,
Huidong Xie,
Yi-Hwa Liu,
Richard Palyo,
Edward J. Miller,
Albert J. Sinusas,
Bruce Spottiswoode,
Chi Liu,
Nicha C. Dvornek
Abstract:
The rapid tracer kinetics of rubidium-82 ($^{82}$Rb) and high variation of cross-frame distribution in dynamic cardiac positron emission tomography (PET) raise significant challenges for inter-frame motion correction, particularly for the early frames where conventional intensity-based image registration techniques are not applicable. Alternatively, a promising approach utilizes generative methods to handle the tracer distribution changes to assist existing registration methods. To improve frame-wise registration and parametric quantification, we propose a Temporally and Anatomically Informed Generative Adversarial Network (TAI-GAN) to transform the early frames into the late reference frame using an all-to-one mapping. Specifically, a feature-wise linear modulation layer encodes channel-wise parameters generated from temporal tracer kinetics information, and rough cardiac segmentations with local shifts serve as the anatomical information. We validated our proposed method on a clinical $^{82}$Rb PET dataset and found that our TAI-GAN can produce converted early frames with high image quality, comparable to the real reference frames. After TAI-GAN conversion, motion estimation accuracy and clinical myocardial blood flow (MBF) quantification were improved compared to using the original frames. Our code is published at https://github.com/gxq1998/TAI-GAN.
Submitted 23 August, 2023;
originally announced August 2023.
-
Transformer-based Dual-domain Network for Few-view Dedicated Cardiac SPECT Image Reconstructions
Authors:
Huidong Xie,
Bo Zhou,
Xiongchao Chen,
Xueqi Guo,
Stephanie Thorn,
Yi-Hwa Liu,
Ge Wang,
Albert Sinusas,
Chi Liu
Abstract:
Cardiovascular disease (CVD) is the leading cause of death worldwide, and myocardial perfusion imaging using SPECT has been widely used in the diagnosis of CVDs. The GE 530/570c dedicated cardiac SPECT scanners adopt a stationary geometry to simultaneously acquire 19 projections to increase sensitivity and achieve dynamic imaging. However, the limited amount of angular sampling negatively affects image quality. Deep learning methods can be implemented to produce higher-quality images from stationary data. This is essentially a few-view imaging problem. In this work, we propose a novel 3D transformer-based dual-domain network, called TIP-Net, for high-quality 3D cardiac SPECT image reconstructions. Our method aims to first reconstruct 3D cardiac SPECT images directly from projection data without the iterative reconstruction process by proposing a customized projection-to-image domain transformer. Then, given its reconstruction output and the original few-view reconstruction, we further refine the reconstruction using an image-domain reconstruction network. Validated by cardiac catheterization images, diagnostic interpretations from nuclear cardiologists, and defect size quantified by an FDA 510(k)-cleared clinical software, our method produced images with higher cardiac defect contrast on human studies compared with previous baseline methods, potentially enabling high-quality defect visualization using stationary few-view dedicated cardiac SPECT scanners.
Submitted 23 July, 2023; v1 submitted 18 July, 2023;
originally announced July 2023.
-
Enhancing Spectrum Sensing via Reconfigurable Intelligent Surfaces: Passive or Active Sensing and How Many Reflecting Elements are Needed?
Authors:
Hao Xie,
Dong Li,
Bowen Gu
Abstract:
Cognitive radio has been proposed to alleviate the scarcity of available spectrum caused by the significant demand for wideband services and the fragmentation of spectrum resources. However, sensing performance is quite poor due to the low sensing signal-to-noise ratio, especially in complex environments with severe channel fading. Fortunately, reconfigurable intelligent surface (RIS)-aided spectrum sensing can effectively tackle the above challenge due to its high array gain. Nevertheless, the traditional passive RIS may suffer from the ``double fading'' effect, which severely limits the performance of passive RIS-aided spectrum sensing. Thus, a crucial challenge is how to fully exploit the potential advantages of the RIS and further improve the sensing performance. To this end, we introduce the active RIS into spectrum sensing and respectively formulate two optimization problems for the passive RIS and the active RIS to maximize the detection probability. In light of the intractability of the formulated problems, we develop a one-stage optimization algorithm with inner approximation and a two-stage optimization algorithm with a bisection method to obtain sub-optimal solutions, and apply the Rayleigh quotient to obtain the upper and lower bounds of the detection probability. Furthermore, in order to gain more insight into the impact of the RIS on spectrum sensing, we respectively investigate the number configuration for passive RIS and active RIS and analyze how many reflecting elements are needed to achieve the detection probability close to 1. Simulation results verify that the proposed algorithms outperform existing algorithms under the same parameter configuration, and achieve a detection probability close to 1 with even fewer reflecting elements or antennas than existing schemes.
Submitted 21 October, 2023; v1 submitted 24 June, 2023;
originally announced June 2023.
-
Crowdsourcing and Evaluating Text-Based Audio Retrieval Relevances
Authors:
Huang Xie,
Khazar Khorrami,
Okko Räsänen,
Tuomas Virtanen
Abstract:
This paper explores grading text-based audio retrieval relevances with crowdsourcing assessments. Given a free-form text (e.g., a caption) as a query, crowdworkers are asked to grade audio clips using numeric scores (between 0 and 100) to indicate their judgements of how much the sound content of an audio clip matches the text, where 0 indicates no content match at all and 100 indicates perfect content match. We integrate the crowdsourced relevances into training and evaluating text-based audio retrieval systems, and evaluate the effect of using them together with binary relevances from audio captioning. Conventionally, these binary relevances are defined by captioning-based audio-caption pairs, where being positive indicates that the caption describes the paired audio, and being negative applies to all other pairs. Experimental results indicate that there is no clear benefit from incorporating crowdsourced relevances alongside binary relevances when the crowdsourced relevances are binarized for contrastive learning. Conversely, the results suggest that using only binary relevances defined by captioning-based audio-caption pairs is sufficient for contrastive learning.
Submitted 15 August, 2023; v1 submitted 16 June, 2023;
originally announced June 2023.
-
Implementation of Multiple-Step Quantized STDP Based on Novel Memristive Synapses
Authors:
Y. Liu,
D. Wang,
Z. Dong,
H. Xie,
W. Zhao
Abstract:
Memristors have been widely studied as artificial synapses in neuromorphic circuits, due to their functional similarity with biological synapses, low operating power, and high integration density. In this work, a memristive synapse for spiking neural networks (SNNs), composed of four memristors and two resistors, is designed and utilized in a neuron circuit implementing robust spike-timing dependent plasticity (STDP) learning. The synapse can be either excitatory or inhibitory by rationally arranging the resistors in the circuit. This is the first of its kind, enabling Hebbian and anti-Hebbian training without requiring additional processing of neural signals. Then, a neuron circuit is designed based on the proposed synapses. The robustness and compatibility of this neuron circuit are greatly enhanced by employing clock-based square-wave pulses to transmit spikes and modulate the synaptic weight. To study the performance of the proposed synapses and circuit, simulations based on behavior models are carried out in MATLAB Simulink and Simscape. Specifically, a memristor model with balanced flexibility, efficiency, convergence, and emulation performance is developed by including the nonlinear Joule effect. Using this memristor model in pattern learning, the influence of weak signal-induced weight variation on circuit performance can be rigorously assessed. The proposed circuit could give some inspiration for combining the analog memristive synapse and leaky integrate-and-fire neuron with digital control units, prompting their development as edge computing devices.
Submitted 27 August, 2023; v1 submitted 10 June, 2023;
originally announced June 2023.
-
Two-Bit RIS-Aided Communications at 3.5GHz: Some Insights from the Measurement Results Under Multiple Practical Scenes
Authors:
Shun Zhang,
Haoran Sun,
Runze Yu,
Hongshenyuan Cui,
Jian Ren,
Feifei Gao,
Shi Jin,
Hongxiang Xie,
Hao Wang
Abstract:
In this paper, we propose a two-bit reconfigurable intelligent surface (RIS)-aided communication system, which mainly consists of a two-bit RIS, a transmitter and a receiver. A corresponding prototype verification system is designed to perform experimental tests in practical environments. The carrier frequency is set as 3.5GHz, and the RIS array possesses 256 units, each of which adopts two-bit phase quantization. In particular, we adopt a self-developed broadband intelligent communication system 40MHz-Net (BICT-40N) terminal in order to fully acquire the channel information. The terminal mainly includes a baseband board and a radio frequency (RF) front-end board, where the latter can achieve 26 dB transmitting link gain and 33 dB receiving link gain. The orthogonal frequency division multiplexing (OFDM) signal is used for the terminal, where the bandwidth is 40MHz and the subcarrier spacing is 625 kHz. Also, the terminal supports a series of modulation modes, including QPSK, QAM, etc. Through experimental tests, we validate a few functions and properties of the RIS as follows. First, we validate a novel RIS power consumption model, which considers both the static and the dynamic power consumption. Besides, we demonstrate the existence of the imaging interference and find that the two-bit RIS can lower the imaging interference by about 10 dBm. Moreover, we verify that the RIS can outperform the metal plate in terms of the beam focusing performance. In addition, we find that the RIS has the ability to improve the channel stationarity. Then, we realize the multi-beam reflection of the RIS utilizing the pattern addition (PA) algorithm. Lastly, we validate the existence of the mutual coupling between different RIS units.
Submitted 19 May, 2023;
originally announced May 2023.
-
Computation-Efficient Backscatter-Blessed MEC with User Reciprocity
Authors:
Bowen Gu,
Hao Xie,
Dong Li
Abstract:
This letter proposes a new user cooperative offloading protocol, called user reciprocity, for computation-efficient backscatter communication (BackCom)-aided mobile edge computing systems. Its quintessence is that each user can switch alternately between the active and the BackCom modes in different slots, with one user working in the active mode and the other user working in the BackCom mode in each time slot. In particular, the user in the BackCom mode can always use the signal transmitted by the user in the active mode for more data transmission in a spectrum-sharing manner. To evaluate the proposed protocol, a computation efficiency (CE) maximization-based optimization problem is formulated by jointly optimizing power control, time scheduling, reflection coefficient adjustment, and computing frequency allocation, while satisfying various physical constraints on the maximum energy budget, the computing frequency threshold, the minimum computed bits, and the harvested energy threshold. To solve this non-convex problem, Dinkelbach's method and quadratic transform are first employed to transform the complex fractional forms into linear ones. Then, an iterative algorithm is designed by decomposing the resulting problem to obtain the suboptimal solution. The closed-form solutions for the transmit power, the reflection coefficient (RC), and the local computing frequency are provided for more insights. Besides, the analytical performance gain with the reciprocal mode is also derived. Simulation results demonstrate that the proposed scheme outperforms benchmark schemes regarding the CE.
Submitted 10 May, 2023;
originally announced May 2023.
-
Edge Learning for Large-Scale Internet of Things With Task-Oriented Efficient Communication
Authors:
Haihui Xie,
Minghua Xia,
Peiran Wu,
Shuai Wang,
H. Vincent Poor
Abstract:
In the Internet of Things (IoT) networks, edge learning for data-driven tasks provides intelligent applications and services. As the network size becomes large, different users may generate distinct datasets. Thus, to suit multiple edge learning tasks for large-scale IoT networks, this paper performs efficient communication under the task-oriented principle by using the collaborative design of wireless resource allocation and edge learning error prediction. In particular, we start with multi-user scheduling to alleviate co-channel interference in dense networks. Then, we perform optimal power allocation in parallel for different learning tasks. Thanks to the high parallelization of the designed algorithm, extensive experimental results corroborate that the multi-user scheduling and task-oriented power allocation improve the performance of distinct edge learning tasks efficiently compared with the state-of-the-art benchmark algorithms.
Submitted 30 April, 2023;
originally announced May 2023.
-
Unified Noise-aware Network for Low-count PET Denoising
Authors:
Huidong Xie,
Qiong Liu,
Bo Zhou,
Xiongchao Chen,
Xueqi Guo,
Chi Liu
Abstract:
As PET imaging is accompanied by substantial radiation exposure and cancer risk, reducing radiation dose in PET scans is an important topic. However, low-count PET scans often suffer from high image noise, which can negatively impact image quality and diagnostic performance. Recent advances in deep learning have shown great potential for recovering the underlying signal from noisy counterparts. However, neural networks trained on a specific noise level cannot be easily generalized to other noise levels due to different noise amplitudes and variances. To obtain optimal denoised results, we may need to train multiple networks using data with different noise levels, but this approach may be infeasible in reality due to limited data availability. Denoising dynamic PET images presents an additional challenge due to tracer decay and continuously changing noise levels across dynamic frames. To address these issues, we propose a Unified Noise-aware Network (UNN) that combines multiple sub-networks with varying denoising power to generate optimal denoised results regardless of the input noise levels. Evaluated using large-scale data from two medical centers with different vendors, the presented results showed that the UNN can consistently produce promising denoised results regardless of input noise levels, and demonstrated superior performance over networks trained on single noise level data, especially for extremely low-count data.
Submitted 28 April, 2023;
originally announced April 2023.
-
Optimizing Energy Efficiency in Metro Systems Under Uncertainty Disturbances Using Reinforcement Learning
Authors:
Haiqin Xie,
Cheng Wang,
Shicheng Li,
Yue Zhang,
Shanshan Wang
Abstract:
In the realm of urban transportation, metro systems serve as crucial and sustainable means of public transit. However, their substantial energy consumption poses a challenge to the goal of sustainability. Disturbances such as delays and passenger flow changes can further exacerbate this issue by negatively affecting energy efficiency in metro systems. To tackle this problem, we propose a policy-based reinforcement learning approach that reschedules the metro timetable and optimizes energy efficiency in metro systems under disturbances by adjusting the dwell time and cruise speed of trains. Our experiments conducted in a simulation environment demonstrate the superiority of our method over baseline methods, achieving a traction energy consumption reduction of up to 10.9% and an increase in regenerative braking energy utilization of up to 47.9%. This study provides an effective solution to the energy-saving problem of urban rail transit.
Submitted 17 May, 2023; v1 submitted 26 April, 2023;
originally announced April 2023.
-
Real-Time Ground Fault Detection for Inverter-Based Microgrid Systems
Authors:
Jingwei Dong,
Yucheng Liao,
Haiwei Xie,
Jochen Cremer,
Peyman Mohajerin Esfahani
Abstract:
Ground fault detection in inverter-based microgrid (IBM) systems is challenging, particularly in a real-time setting, as the fault current deviates slightly from the nominal value. This difficulty is reinforced when there are partially decoupled disturbances and modeling uncertainties. The conventional solution of installing more relays to obtain additional measurements is costly and also increases the complexity of the system. In this paper, we propose a data-assisted diagnosis scheme based on an optimization-based fault detection filter with the output current as the only measurement. Modeling the microgrid dynamics and the diagnosis filter, we formulate the filter design as a quadratic programming (QP) problem that accounts for decoupling partial disturbances, robustness to non-decoupled disturbances and modeling uncertainties by training with data, and ensuring fault sensitivity simultaneously. To ease the computational effort, we also provide an approximate but analytical solution to this QP. Additionally, we use classical statistical results to provide a thresholding mechanism that enjoys probabilistic false-alarm guarantees. Finally, we implement the IBM system with Simulink and Real Time Digital Simulator (RTDS) to verify the effectiveness of the proposed method through simulations.
Submitted 3 April, 2024; v1 submitted 24 April, 2023;
originally announced April 2023.
-
To Reflect or Not To Reflect: On-Off Control and Number Configuration for Reflecting Elements in RIS-Aided Wireless Systems
Authors:
Hao Xie,
Dong Li
Abstract:
Reconfigurable intelligent surface (RIS) has been regarded as a promising technique due to its high array gain and low power. However, the traditional passive RIS suffers from the "double fading" effect, which has restricted the performance of passive RIS-aided communications. Fortunately, active RIS can alleviate this problem since it can adjust the phase shift and amplify the received signal simultaneously. Nevertheless, a high beamforming gain often requires a number of reflecting elements, which leads to non-negligible power consumption, especially for the active RIS. Thus, one challenge is how to improve the scalability of the RIS and the energy efficiency. Different from the existing works where all reflecting elements are activated, we propose a novel element on-off mechanism where reflecting elements can be flexibly activated and deactivated. Two different optimization problems for passive RIS and active RIS are formulated by maximizing the total energy efficiency. We develop two different alternating optimization-based iterative algorithms to obtain sub-optimal solutions. Furthermore, we consider special cases involving rate maximization problems given the same total power budget, and respectively analyze the number configuration for passive RIS and active RIS. Simulation results verify that reflecting elements under the proposed algorithms can be flexibly activated and deactivated.
Submitted 26 April, 2023; v1 submitted 20 April, 2023;
originally announced April 2023.
-
Regularised Learning with Selected Physics for Power System Dynamics
Authors:
Haiwei Xie,
Federica Bellizio,
Jochen L. Cremer,
Goran Strbac
Abstract:
Due to the increasing system stability issues caused by the technological revolutions of power system equipment, the assessment of the dynamic security of the systems for changing operating conditions (OCs) is nowadays crucial. To address the computational time problem of conventional dynamic security assessment tools, many machine learning (ML) approaches have been proposed and well-studied in this context. However, these learned models only rely on data, and thus miss resourceful information offered by the physical system. To this end, this paper focuses on combining the power system dynamical model together with the conventional ML. Going beyond the classic Physics Informed Neural Networks (PINNs), this paper proposes Selected Physics Informed Neural Networks (SPINNs) to predict the system dynamics for varying OCs. A two-level structure of feed-forward NNs is proposed, where the first NN predicts the generator bus rotor angles (system states) and the second NN learns to adapt to varying OCs. We show a case study on an IEEE-9 bus system that considering selected physics in model training reduces the amount of needed training data. Moreover, the trained model effectively predicted long-term dynamics that were beyond the time scale of the collected training dataset (extrapolation).
Submitted 8 April, 2023;
originally announced April 2023.
-
FedFTN: Personalized Federated Learning with Deep Feature Transformation Network for Multi-institutional Low-count PET Denoising
Authors:
Bo Zhou,
Huidong Xie,
Qiong Liu,
Xiongchao Chen,
Xueqi Guo,
Zhicheng Feng,
Jun Hou,
S. Kevin Zhou,
Biao Li,
Axel Rominger,
Kuangyu Shi,
James S. Duncan,
Chi Liu
Abstract:
Low-count PET is an efficient way to reduce radiation exposure and acquisition time, but the reconstructed images often suffer from low signal-to-noise ratio (SNR), thus affecting diagnosis and other downstream tasks. Recent advances in deep learning have shown great potential in improving low-count PET image quality, but acquiring a large, centralized, and diverse dataset from multiple institutions for training a robust model is difficult due to privacy and security concerns of patient data. Moreover, low-count PET data at different institutions may have different data distribution, thus requiring personalized models. While previous federated learning (FL) algorithms enable multi-institution collaborative training without the need of aggregating local data, addressing the large domain shift in the application of multi-institutional low-count PET denoising remains a challenge and is still highly under-explored. In this work, we propose FedFTN, a personalized federated learning strategy that addresses these challenges. FedFTN uses a local deep feature transformation network (FTN) to modulate the feature outputs of a globally shared denoising network, enabling personalized low-count PET denoising for each institution. During the federated learning process, only the denoising network's weights are communicated and aggregated, while the FTN remains at the local institutions for feature transformation. We evaluated our method using a large-scale dataset of multi-institutional low-count PET imaging data from three medical centers located across three continents, and showed that FedFTN provides high-quality low-count PET images, outperforming previous baseline FL reconstruction methods across all low-count levels at all three institutions.
Submitted 6 October, 2023; v1 submitted 2 April, 2023;
originally announced April 2023.
-
Toward Polar Sea-Ice Classification using Color-based Segmentation and Auto-labeling of Sentinel-2 Imagery to Train an Efficient Deep Learning Model
Authors:
Jurdana Masuma Iqrah,
Younghyun Koo,
Wei Wang,
Hongjie Xie,
Sushil Prasad
Abstract:
Global warming is an urgent issue that is generating catastrophic environmental changes, such as the melting of sea ice and glaciers, particularly in the polar regions. The melting pattern and retreat of polar sea ice cover is an essential indicator of global warming. The Sentinel-2 satellite (S2) captures high-resolution optical imagery over the polar regions. This research aims at developing a robust and effective system for classifying polar sea ice as thick or snow-covered, young or thin, or open water using S2 images. A key challenge is the lack of labeled S2 training data to serve as the ground truth. We demonstrate a method with high precision to segment and automatically label the S2 images based on suitably determined color thresholds and employ these auto-labeled data to train a U-Net machine model (a fully convolutional neural network), yielding good classification accuracy. Evaluation results over S2 data from the polar summer season in the Ross Sea region of the Antarctic show that the U-Net model trained on auto-labeled data has an accuracy of 90.18% over the original S2 images, whereas the U-Net model trained on manually labeled data has an accuracy of 91.39%. Filtering out the thin clouds and shadows from the S2 images further improves U-Net's accuracy, respectively, to 98.97% for auto-labeled and 98.40% for manually labeled training datasets.
Submitted 8 March, 2023;
originally announced March 2023.
-
Semantic Communication with Memory
Authors:
Huiqiang Xie,
Zhijin Qin,
Geoffrey Ye Li
Abstract:
While semantic communication succeeds in efficiently transmitting due to the strong capability to extract the essential semantic information, it is still far from the intelligent or human-like communications. In this paper, we introduce an essential component, memory, into semantic communications to mimic human communications. Particularly, we investigate a deep learning (DL) based semantic communication system with memory, named Mem-DeepSC, by considering the scenario question answer task. We exploit the universal Transformer based transceiver to extract the semantic information and introduce the memory module to process the context information. Moreover, we derive the relationship between the length of semantic signal and the channel noise to validate the possibility of dynamic transmission. Specially, we propose two dynamic transmission methods to enhance the transmission reliability as well as to reduce the communication overhead by masking some unessential elements, which are recognized through training the model with mutual information. Numerical results show that the proposed Mem-DeepSC is superior to benchmarks in terms of answer accuracy and transmission efficiency, i.e., number of transmitted symbols.
Submitted 22 March, 2023;
originally announced March 2023.
-
Fast-MC-PET: A Novel Deep Learning-aided Motion Correction and Reconstruction Framework for Accelerated PET
Authors:
Bo Zhou,
Yu-Jung Tsai,
Jiazhen Zhang,
Xueqi Guo,
Huidong Xie,
Xiongchao Chen,
Tianshun Miao,
Yihuan Lu,
James S. Duncan,
Chi Liu
Abstract:
Patient motion during PET is inevitable. Its long acquisition time not only increases the motion and the associated artifacts but also the patient's discomfort, thus PET acceleration is desirable. However, accelerating PET acquisition will result in reconstructed images with low SNR, and the image quality will still be degraded by motion-induced artifacts. Most of the previous PET motion correction methods are motion type specific that require motion modeling, thus may fail when multiple types of motion are present together. Also, those methods are customized for standard long acquisition and could not be directly applied to accelerated PET. To this end, modeling-free universal motion correction reconstruction for accelerated PET is still highly under-explored. In this work, we propose a novel deep learning-aided motion correction and reconstruction framework for accelerated PET, called Fast-MC-PET. Our framework consists of a universal motion correction (UMC) and a short-to-long acquisition reconstruction (SL-Recon) module. The UMC enables modeling-free motion correction by estimating quasi-continuous motion from ultra-short frame reconstructions and using this information for motion-compensated reconstruction. Then, the SL-Recon converts the accelerated UMC image with low counts to a high-quality image with high counts for our final reconstruction output. Our experimental results on human studies show that our Fast-MC-PET can enable 7-fold acceleration and use only 2 minutes acquisition to generate high-quality reconstruction images that outperform/match previous motion correction reconstruction methods using standard 15 minutes long acquisition data.
Submitted 14 February, 2023;
originally announced February 2023.
-
A variational autoencoder-based nonnegative matrix factorisation model for deep dictionary learning
Authors:
Hong-Bo Xie,
Caoyuan Li,
Shuliang Wang,
Richard Yi Da Xu,
Kerrie Mengersen
Abstract:
Construction of dictionaries using nonnegative matrix factorisation (NMF) has extensive applications in signal processing and machine learning. With the advances in deep learning, training compact and robust dictionaries using deep neural networks, i.e., dictionaries of deep features, has been proposed. In this study, we propose a probabilistic generative model which employs a variational autoencoder (VAE) to perform nonnegative dictionary learning. In contrast to the existing VAE models, we cast the model under a statistical framework with latent variables obeying a Gamma distribution and design a new loss function to guarantee the nonnegative dictionaries. We adopt an acceptance-rejection sampling reparameterization trick to update the latent variables iteratively. We apply the dictionaries learned from VAE-NMF to two signal processing tasks, i.e., enhancement of speech and extraction of muscle synergies. Experimental results demonstrate that VAE-NMF performs better in learning the latent nonnegative dictionaries in comparison with state-of-the-art methods.
Submitted 17 January, 2023;
originally announced January 2023.
-
Exploring Hybrid Active-Passive RIS-Aided MEC Systems: From the Mode-Switching Perspective
Authors:
Hao Xie,
Dong Li,
Bowen Gu
Abstract:
Mobile edge computing (MEC) has been regarded as a promising technique to support latency-sensitive and computation-intensive services. However, the low offloading rate caused by the random channel fading characteristic becomes a major bottleneck in restricting the performance of the MEC. Fortunately, reconfigurable intelligent surface (RIS) can alleviate this problem since it can boost both the spectrum and energy efficiency. Different from the existing works adopting either fully active or fully passive RIS, we propose a novel hybrid RIS in which reflecting units can flexibly switch between active and passive modes. To achieve a tradeoff between the latency and energy consumption, an optimization problem is formulated by minimizing the total cost. In light of the intractability of the problem, we develop an alternating optimization-based iterative algorithm by combining the successive convex approximation method, the variable substitution, and the singular value decomposition (SVD) to obtain sub-optimal solutions. Furthermore, in order to gain more insight into the problem, we consider two special cases involving a latency minimization problem and an energy consumption minimization problem, and respectively analyze the tradeoff between the number of active and passive units. Simulation results verify that the proposed algorithm can achieve flexible mode switching and significantly outperforms existing algorithms.
Submitted 21 March, 2024; v1 submitted 16 December, 2022;
originally announced December 2022.
-
Rethinking Generative Methods for Image Restoration in Physics-based Vision: A Theoretical Analysis from the Perspective of Information
Authors:
Xudong Kang,
Haoran Xie,
Man-Leung Wong,
Jing Qin
Abstract:
End-to-end generative methods are considered a more promising solution for image restoration in physics-based vision compared with the traditional deconstructive methods based on handcrafted composition models. However, existing generative methods still have plenty of room for improvement in quantitative performance. More crucially, these methods are considered black boxes due to weak interpretability and there is rarely a theory trying to explain their mechanism and learning process. In this study, we try to re-interpret these generative methods for image restoration tasks using information theory. Different from conventional understanding, we analyzed the information flow of these methods and identified three sources of information (extracted high-level information, retained low-level information, and external information that is absent from the source inputs) are involved and optimized respectively in generating the restoration results. We further derived their learning behaviors, optimization objectives, and the corresponding information boundaries by extending the information bottleneck principle. Based on this theoretic framework, we found that many existing generative methods tend to be direct applications of the general models designed for conventional generation tasks, which may suffer from problems including over-invested abstraction processes, inherent details loss, and vanishing gradients or imbalance in training. We analyzed these issues with both intuitive and theoretical explanations and proved them with empirical evidence respectively. Ultimately, we proposed general solutions or ideas to address the above issue and validated these approaches with performance boosts on six datasets of three different image restoration tasks.
Submitted 8 December, 2022; v1 submitted 5 December, 2022;
originally announced December 2022.
-
On Negative Sampling for Contrastive Audio-Text Retrieval
Authors:
Huang Xie,
Okko Räsänen,
Tuomas Virtanen
Abstract:
This paper investigates negative sampling for contrastive learning in the context of audio-text retrieval. The strategy for negative sampling refers to selecting negatives (either audio clips or textual descriptions) from a pool of candidates for a positive audio-text pair. We explore sampling strategies via model-estimated within-modality and cross-modality relevance scores for audio and text samples. With a constant training setting on the retrieval system from [1], we study eight sampling strategies, including hard and semi-hard negative sampling. Experimental results show that retrieval performance varies dramatically among different strategies. Particularly, by selecting semi-hard negatives with cross-modality scores, the retrieval system gains improved performance in both text-to-audio and audio-to-text retrieval. Besides, we show that feature collapse occurs while sampling hard negatives with cross-modality scores.
Submitted 17 February, 2023; v1 submitted 8 November, 2022;
originally announced November 2022.
-
Hybrid mmWave MIMO Systems under Hardware Impairments and Beam Squint: Channel Model and Dictionary Learning-aided Configuration
Authors:
Hongxiang Xie,
Joan Palacios,
Nuria González-Prelcic
Abstract:
Low overhead channel estimation based on compressive sensing (CS) has been widely investigated for hybrid wideband millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems. The channel sparsifying dictionaries used in prior work are built from ideal array response vectors evaluated on discrete angles of arrival/departure. In addition, these dictionaries are assumed to be the same for all subcarriers, without considering the impacts of hardware impairments and beam squint. In this manuscript, we derive a general channel and signal model that explicitly incorporates the impacts of hardware impairments, practical pulse shaping functions, and beam squint, overcoming the limitations of mmWave MIMO channel and signal models commonly used in previous work. Then, we propose a dictionary learning (DL) algorithm to obtain the sparsifying dictionaries embedding hardware impairments, by considering the effect of beam squint without introducing it into the learning process. We also design a novel CS channel estimation algorithm under beam squint and hardware impairments, where the channel structures at different subcarriers are exploited to enable channel parameter estimation with low complexity and high accuracy. Numerical results demonstrate the effectiveness of the proposed DL and channel estimation strategy when applied to realistic mmWave channels.
Submitted 17 October, 2022;
originally announced October 2022.
-
A deep learning network with differentiable dynamic programming for retina OCT surface segmentation
Authors:
Hui Xie,
Weiyu Xu,
Xiaodong Wu
Abstract:
Multiple-surface segmentation in Optical Coherence Tomography (OCT) images is a challenging problem, further complicated by the frequent presence of weak image boundaries. Recently, many deep learning (DL) based methods have been developed for this task and yield remarkable performance. Unfortunately, due to the scarcity of training data in medical imaging, it is challenging for DL networks to learn the global structure of the target surfaces, including surface smoothness. To bridge this gap, this study proposes to seamlessly unify a U-Net for feature learning with a constrained differentiable dynamic programming module to achieve an end-to-end learning for retina OCT surface segmentation to explicitly enforce surface smoothness. It effectively utilizes the feedback from the downstream model optimization module to guide feature learning, yielding a better enforcement of global structures of the target surfaces. Experiments on Duke AMD (age-related macular degeneration) and JHU MS (multiple sclerosis) OCT datasets for retinal layer segmentation demonstrated very promising segmentation accuracy.
Submitted 8 October, 2022;
originally announced October 2022.
-
ASTF: Visual Abstractions of Time-Varying Patterns in Radio Signals
Authors:
Ying Zhao,
Luhao Ge,
Huixuan Xie,
Genghuai Bai,
Zhao Zhang,
Qiang Wei,
Yun Lin,
Yuchao Liu,
Fangfang Zhou
Abstract:
A time-frequency diagram is a commonly used visualization for observing the time-frequency distribution of radio signals and analyzing their time-varying patterns of communication states in radio monitoring and management. While it excels when performing short-term signal analyses, it becomes inadaptable for long-term signal analyses because it cannot adequately depict signal time-varying patterns in a large time span on a space-limited screen. This research thus presents an abstract signal time-frequency (ASTF) diagram to address this problem. In the diagram design, a visual abstraction method is proposed to visually encode signal communication state changes in time slices. A time segmentation algorithm is proposed to divide a large time span into time slices. Three new quantified metrics and a loss function are defined to ensure the preservation of important time-varying information in the time segmentation. An algorithm performance experiment and a user study are conducted to evaluate the effectiveness of the diagram for long-term signal analyses.
Submitted 30 September, 2022;
originally announced September 2022.
-
Gain without Pain: Recycling Reflected Energy from Wireless Powered RIS-aided Communications
Authors:
Hao Xie,
Bowen Gu,
Dong Li,
Zhi Lin,
Yongjun Xu
Abstract:
In this paper, we investigate and analyze energy recycling for a reconfigurable intelligent surface (RIS)-aided wireless-powered communication network. As opposed to the existing works where the energy harvested by Internet of things (IoT) devices only comes from the power station, IoT devices are also allowed to recycle energy from other IoT devices. In particular, we propose group switching- and user switching-based protocols with time-division multiple access to evaluate the impact of energy recycling on system performance. Two different optimization problems are respectively formulated for maximizing the sum throughput by jointly optimizing the energy beamforming vectors, the transmit power, the transmission time, the receive beamforming vectors, the grouping factors, and the phase-shift matrices, where the constraints of the minimum throughput, the harvested energy, the maximum transmit power, the phase shift, the grouping, and the time allocation are taken into account. In light of the intractability of the above problems, we respectively develop two alternating optimization-based iterative algorithms by combining the successive convex approximation method and the penalty-based method to obtain corresponding sub-optimal solutions. Simulation results verify that the energy recycling-based mechanism can assist in enhancing the performance of IoT devices in terms of energy harvesting and information transmission. Besides, we also verify that the group switching-based algorithm can improve more sum throughput of IoT devices, and the user switching-based algorithm can harvest more energy.
Submitted 26 September, 2022;
originally announced September 2022.
-
Vector Quantized Semantic Communication System
Authors:
Qifan Fu,
Huiqiang Xie,
Zhijin Qin,
Gregory Slabaugh,
Xiaoming Tao
Abstract:
Although analog semantic communication systems have received considerable attention in the literature, there is less work on digital semantic communication systems. In this paper, we develop a deep learning (DL)-enabled vector quantized (VQ) semantic communication system for image transmission, named VQ-DeepSC. Specifically, we propose a convolutional neural network (CNN)-based transceiver to extract multi-scale semantic features of images and introduce multi-scale semantic embedding spaces to perform semantic feature quantization, rendering the data compatible with digital communication systems. Furthermore, we employ adversarial training to improve the quality of received images by introducing a PatchGAN discriminator. Experimental results demonstrate that the proposed VQ-DeepSC is more robust than BPG in digital communication systems and has MS-SSIM performance comparable to the DeepJSCC method.
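As a rough illustration of the feature quantization step mentioned in the abstract, the sketch below maps each feature vector to its nearest entry in a codebook, so that only integer indices would need to be transmitted. This is a generic nearest-neighbor vector quantizer, not the VQ-DeepSC implementation; the function names `quantize` and `dequantize`, the toy codebook, and the use of squared Euclidean distance are all assumptions made for illustration.

```python
def quantize(features, codebook):
    """Return, for each feature vector, the index of its nearest codeword.

    Nearest is measured by squared Euclidean distance (an assumption;
    the abstract does not state the distance used).
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
        for f in features
    ]


def dequantize(indices, codebook):
    """Recover the quantized feature vectors from their codebook indices."""
    return [codebook[i] for i in indices]


# Hypothetical 2-D codebook and features, for illustration only.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
features = [[0.1, -0.2], [0.9, 1.2], [0.2, 1.7]]

indices = quantize(features, codebook)        # discrete symbols to transmit
reconstructed = dequantize(indices, codebook)  # receiver-side lookup
```

In a learned system the codebook would be trained jointly with the encoder and decoder; here it is fixed by hand to keep the example self-contained.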
Submitted 12 April, 2023; v1 submitted 23 September, 2022;
originally announced September 2022.
-
Language-based Audio Retrieval Task in DCASE 2022 Challenge
Authors:
Huang Xie,
Samuel Lipping,
Tuomas Virtanen
Abstract:
Language-based audio retrieval is a task in which natural language textual captions are used as queries to retrieve audio signals from a dataset. It was first introduced in the DCASE 2022 Challenge as Subtask 6B of Task 6, which aims at developing computational systems to model relationships between audio signals and free-form textual descriptions. Compared with audio captioning (Subtask 6A), which is about generating audio captions for audio signals, language-based audio retrieval (Subtask 6B) focuses on ranking audio signals according to their relevance to natural language textual captions. In the DCASE 2022 Challenge, the provided baseline system for Subtask 6B was significantly outperformed, with the top performance being 0.276 in mAP@10. This paper presents the outcome of Subtask 6B in terms of the submitted systems' performance and analysis.
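For readers unfamiliar with the mAP@10 figure quoted above, the following sketch shows one common way to compute mean average precision over the top 10 retrieved items. It is not the challenge's official evaluation code; the function names, the toy query and audio identifiers, and the normalization by `min(number of relevant items, k)` are assumptions, and other normalization conventions exist.

```python
def average_precision_at_k(ranked_ids, relevant_ids, k=10):
    """Average precision over the top-k ranked items for one query."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    # Normalization convention is an assumption of this sketch.
    return precision_sum / min(len(relevant), k)


def mean_average_precision_at_k(rankings, ground_truth, k=10):
    """Mean of the per-query average precision at k."""
    scores = [
        average_precision_at_k(rankings[q], ground_truth[q], k)
        for q in rankings
    ]
    return sum(scores) / len(scores)


# Hypothetical caption queries and audio file ids.
rankings = {
    "caption_1": ["a.wav", "b.wav", "c.wav"],
    "caption_2": ["d.wav", "e.wav", "f.wav"],
}
ground_truth = {
    "caption_1": ["a.wav"],  # relevant item ranked first  -> AP = 1.0
    "caption_2": ["e.wav"],  # relevant item ranked second -> AP = 0.5
}

score = mean_average_precision_at_k(rankings, ground_truth, k=10)
```

When each caption has exactly one relevant audio clip, as in the toy data, the per-query value reduces to the reciprocal rank of that clip if it appears in the top 10 and zero otherwise.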
Submitted 4 October, 2022; v1 submitted 20 September, 2022;
originally announced September 2022.
-
Deformable Image Registration using Unsupervised Deep Learning for CBCT-guided Abdominal Radiotherapy
Authors:
Huiqiao Xie,
Yang Lei,
Yabo Fu,
Tonghe Wang,
Justin Roper,
Jeffrey D. Bradley,
Pretesh Patel,
Tian Liu,
Xiaofeng Yang
Abstract:
CBCTs in image-guided radiotherapy provide crucial anatomy information for patient setup and plan evaluation. Longitudinal CBCT image registration could quantify the inter-fractional anatomic changes. The purpose of this study is to propose an unsupervised deep learning-based CBCT-CBCT deformable image registration method. The proposed deformable registration workflow consists of training and inference stages that share the same feed-forward path through a spatial transformation-based network (STN). The STN consists of a global generative adversarial network (GlobalGAN) and a local GAN (LocalGAN) to predict the coarse- and fine-scale motions, respectively. The network was trained by minimizing the image similarity loss and the deformable vector field (DVF) regularization loss without the supervision of ground truth DVFs. During the inference stage, patches of local DVF were predicted by the trained LocalGAN and fused to form a whole-image DVF. The local whole-image DVF was subsequently combined with the GlobalGAN-generated DVF to obtain the final DVF. The proposed method was evaluated using 100 fractional CBCTs from 20 abdominal cancer patients in the experiments and 105 fractional CBCTs from a cohort of 21 different abdominal cancer patients in a holdout test. Qualitatively, the registration results show good alignment between the deformed CBCT images and the target CBCT image. Quantitatively, the average target registration error (TRE) calculated on the fiducial markers and manually identified landmarks was 1.91 ± 1.11 mm. The average mean absolute error (MAE) and normalized cross correlation (NCC) between the deformed CBCT and target CBCT were 33.42 ± 7.48 HU and 0.94 ± 0.04, respectively. This promising registration method could provide fast and accurate longitudinal CBCT alignment to facilitate the analysis and prediction of inter-fractional anatomic changes.
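The three evaluation metrics named in the abstract (TRE, MAE, NCC) can be sketched as follows. These are textbook definitions, not the authors' evaluation code; the function names, the flattened-list image representation, and the zero-mean form of NCC are assumptions of this sketch.

```python
import math


def target_registration_error(moved_points, target_points):
    """Mean Euclidean distance between corresponding landmark positions."""
    dists = [math.dist(p, q) for p, q in zip(moved_points, target_points)]
    return sum(dists) / len(dists)


def mean_absolute_error(image_a, image_b):
    """Mean absolute intensity difference between two flattened images."""
    return sum(abs(x - y) for x, y in zip(image_a, image_b)) / len(image_a)


def normalized_cross_correlation(image_a, image_b):
    """Zero-mean normalized cross correlation of two flattened images."""
    mean_a = sum(image_a) / len(image_a)
    mean_b = sum(image_b) / len(image_b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(image_a, image_b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in image_a)
        * sum((y - mean_b) ** 2 for y in image_b)
    )
    return num / den


# Hypothetical landmark coordinates (mm) and voxel intensities (HU).
tre = target_registration_error([(0, 0, 0), (1, 1, 1)], [(3, 4, 0), (1, 1, 1)])
mae = mean_absolute_error([1.0, 2.0, 3.0], [2.0, 2.0, 5.0])
ncc = normalized_cross_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

In the toy data the two landmark pairs are 5 mm and 0 mm apart, and the second image is a scaled copy of the first, so the NCC is at its maximum of 1.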
Submitted 29 August, 2022;
originally announced August 2022.
-
An Overview on IEEE 802.11bf: WLAN Sensing
Authors:
Rui Du,
Hailiang Xie,
Mengshi Hu,
Narengerile,
Yan Xin,
Stephen McCann,
Michael Montemurro,
Tony Xiao Han,
Jie Xu
Abstract:
With recent advancements, wireless local area network (WLAN) or wireless fidelity (Wi-Fi) technology has been successfully utilized to realize sensing functionalities such as detection, localization, and recognition. However, WLAN standards are developed mainly for communication and thus may not be able to meet the stringent sensing requirements of emerging applications. To resolve this issue, a new Task Group (TG), namely IEEE 802.11bf, has been established by the IEEE 802.11 working group, with the objective of creating a new amendment to the WLAN standard to support advanced sensing requirements while minimizing the effect on communications. This paper provides a comprehensive overview of the up-to-date efforts in the IEEE 802.11bf TG. First, we introduce the definition of the 802.11bf amendment and its standardization timeline. Then, we discuss the WLAN sensing procedure and framework used for measurement acquisition, considering both conventional sensing at sub-7 GHz and directional multi-gigabit (DMG) sensing at 60 GHz. Next, we present various candidate technical features for IEEE 802.11bf, including waveform/sequence design, feedback types, quantization, as well as security and privacy. Finally, we describe the methodologies used by the IEEE 802.11bf TG to evaluate the performance of alternatives. We hope that this overview paper provides useful insights on IEEE 802.11 WLAN sensing to interested readers and promotes wide deployment of the IEEE 802.11bf standard.
Submitted 11 July, 2022;
originally announced July 2022.