Search | arXiv e-print repository

Enhancing Medical Imaging with GANs Synthesizing Realistic Images from Limited Data

Authors: Yinqiu Feng, Bo Zhang, Lingxi Xiao, Yutian Yang, Tana Gegen, Zexi Chen

Abstract: In this research, we introduce an innovative method for synthesizing medical images using generative adversarial networks (GANs). Our proposed GANs method demonstrates the capability to produce realistic synthetic images even when trained on a limited quantity of real medical image data, showcasing commendable generalization prowess. To achieve this, we devised a generator and discriminator networ… ▽ More In this research, we introduce an innovative method for synthesizing medical images using generative adversarial networks (GANs). Our proposed GANs method demonstrates the capability to produce realistic synthetic images even when trained on a limited quantity of real medical image data, showcasing commendable generalization prowess. To achieve this, we devised a generator and discriminator network architecture founded on deep convolutional neural networks (CNNs), leveraging the adversarial training paradigm for model optimization. Through extensive experimentation across diverse medical image datasets, our method exhibits robust performance, consistently generating synthetic images that closely emulate the structural and textural attributes of authentic medical images. △ Less

Submitted 22 May, 2024; originally announced June 2024.

arXiv:2406.16981 [pdf]

Research on Feature Extraction Data Processing System For MRI of Brain Diseases Based on Computer Deep Learning

Authors: Lingxi Xiao, **xin Hu, Yutian Yang, Yinqiu Feng, Zichao Li, Zexi Chen

Abstract: Most of the existing wavelet image processing techniques are carried out in the form of single-scale reconstruction and multiple iterations. However, processing high-quality fMRI data presents problems such as mixed noise and excessive computation time. This project proposes the use of matrix operations by combining mixed noise elimination methods with wavelet analysis to replace traditional itera… ▽ More Most of the existing wavelet image processing techniques are carried out in the form of single-scale reconstruction and multiple iterations. However, processing high-quality fMRI data presents problems such as mixed noise and excessive computation time. This project proposes the use of matrix operations by combining mixed noise elimination methods with wavelet analysis to replace traditional iterative algorithms. Functional magnetic resonance imaging (fMRI) of the auditory cortex of a single subject is analyzed and compared to the wavelet domain signal processing technology based on repeated times and the world's most influential SPM8. Experiments show that this algorithm is the fastest in computing time, and its detection effect is comparable to the traditional iterative algorithm. However, this has a higher practical value for the processing of FMRI data. In addition, the wavelet analysis method proposed signal processing to speed up the calculation rate. △ Less

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.14485

Proceedings of The second international workshop on eXplainable AI for the Arts (XAIxArts)

Authors: Nick Bryan-Kinns, Corey Ford, Shuoyang Zheng, Helen Kennedy, Alan Chamberlain, Makayla Lewis, Drew Hemment, Zi** Li, Qiong Wu, Lanxi Xiao, Gus Xia, Jeba Rezwana, Michael Clemens, Gabriel Vigliensoni

Abstract: This second international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts to explore the role of XAI for the Arts. Workshop held at the 16th ACM Conference on Creativity and Cognition (C&C 2024), Chicago, USA. This second international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts to explore the role of XAI for the Arts. Workshop held at the 16th ACM Conference on Creativity and Cognition (C&C 2024), Chicago, USA. △ Less

Submitted 1 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.12456 [pdf, other]

Deep-learning-based groupwise registration for motion correction of cardiac $T_1$ map**

Authors: Yi Zhang, Yidong Zhao, Lu Huang, Liming Xia, Qian Tao

Abstract: Quantitative $T_1$ map** by MRI is an increasingly important tool for clinical assessment of cardiovascular diseases. The cardiac $T_1$ map is derived by fitting a known signal model to a series of baseline images, while the quality of this map can be deteriorated by involuntary respiratory and cardiac motion. To correct motion, a template image is often needed to register all baseline images, b… ▽ More Quantitative $T_1$ map** by MRI is an increasingly important tool for clinical assessment of cardiovascular diseases. The cardiac $T_1$ map is derived by fitting a known signal model to a series of baseline images, while the quality of this map can be deteriorated by involuntary respiratory and cardiac motion. To correct motion, a template image is often needed to register all baseline images, but the choice of template is nontrivial, leading to inconsistent performance sensitive to image contrast. In this work, we propose a novel deep-learning-based groupwise registration framework, which omits the need for a template, and registers all baseline images simultaneously. We design two groupwise losses for this registration framework: the first is a linear principal component analysis (PCA) loss that enforces alignment of baseline images irrespective of the intensity variation, and the second is an auxiliary relaxometry loss that enforces adherence of intensity profile to the signal model. We extensively evaluated our method, termed ``PCA-Relax'', and other baseline methods on an in-house cardiac MRI dataset including both pre- and post-contrast $T_1$ sequences. All methods were evaluated under three distinct training-and-evaluation strategies, namely, standard, one-shot, and test-time-adaptation. The proposed PCA-Relax showed further improved performance of registration and map** over well-established baselines. The proposed groupwise framework is generic and can be adapted to applications involving multiple images. △ Less

Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

Comments: MICCAI 2024. Contents may slightly differ from the camera-ready version

arXiv:2406.00690 [pdf, other]

Electromagnetic Wave Property Inspired Radio Environment Knowledge Construction and AI-based Verification for 6G Digital Twin Channel

Authors: Jialin Wang, Jianhua Zhang, Yutong Sun, Yuxiang Zhang, Tao Jiang, Liang Xia

Abstract: As the underlying foundation of a digital twin network (DTN), a digital twin channel (DTC) can accurately depict the process of radio propagation in the air interface to support the DTN-based 6G wireless network. Since radio propagation is affected by the environment, constructing the relationship between the environment and radio wave propagation is the key to improving the accuracy of DTC, and t… ▽ More As the underlying foundation of a digital twin network (DTN), a digital twin channel (DTC) can accurately depict the process of radio propagation in the air interface to support the DTN-based 6G wireless network. Since radio propagation is affected by the environment, constructing the relationship between the environment and radio wave propagation is the key to improving the accuracy of DTC, and the construction method based on artificial intelligence (AI) is the most concentrated. However, in the existing methods, the environment information input into the neural network (NN) has many dimensions, and the correlation between the environment and the channel relationship is unclear, resulting in a highly complex relationship construction process. To solve this issue, in this paper, we propose a construction method of radio environment knowledge (REK) inspired by the electromagnetic wave property to quantify the contribution of radio propagation. Specifically, a range selection scheme for effective environment information based on random geometry is proposed to reduce the redundancy of environment information. We quantify the contribution of radio propagation reflection, diffraction and scatterer blockage using environment information and propose a flow chart of REK construction to replace the feature extraction process partially based on NN. To validate REK's effectiveness, we conduct a path loss prediction task based on a lightweight convolutional neural network (CNN) employing a simple two-layer convolutional structure. The results show that the accuracy of the range selection method reaches 90\%; the constructed REK maintains the prediction error of 0.3 and only needs 0.04 seconds of testing time, effectively reducing the network complexity. △ Less

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.03119 [pdf, ps, other]

DAFT-Spread Affine Frequency Division Multiple Access for Downlink Transmission

Authors: Yiwei Tao, Miaowen Wen, Yao Ge, Tianqi Mao, Lixia Xiao, Jun Li

Abstract: Affine frequency division multiplexing (AFDM) and orthogonal AFDM access (O-AFDMA) are promising techniques based on chirp signals, which are able to suppress the performance deterioration caused by Doppler shifts in high-mobility scenarios. However, the high peak-to-average power ratio (PAPR) in AFDM or O-AFDMA is still a crucial problem, which severely limits their practical applications. In thi… ▽ More Affine frequency division multiplexing (AFDM) and orthogonal AFDM access (O-AFDMA) are promising techniques based on chirp signals, which are able to suppress the performance deterioration caused by Doppler shifts in high-mobility scenarios. However, the high peak-to-average power ratio (PAPR) in AFDM or O-AFDMA is still a crucial problem, which severely limits their practical applications. In this paper, we propose a discrete affine Fourier transform (DAFT)-spread AFDMA scheme based on the properties of the AFDM systems, named DAFT-s-AFDMA to significantly reduce the PAPR by resorting to the DAFT. We formulate the transmitted time-domain signals of the proposed DAFT-s-AFDMA schemes with localized and interleaved chirp subcarrier allocation strategies. Accordingly, we derive the guidelines for setting the DAFT parameters, revealing the insights of PAPR reduction. Finally, simulation results of PAPR comparison in terms of the complementary cumulative distribution function (CCDF) show that the proposed DAFT-s-AFDMA schemes with localized and interleaved strategies can both attain better PAPR performances than the conventional O-AFDMA scheme. △ Less

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2404.10232 [pdf, other]

Channel Estimation for AFDM With Superimposed Pilots

Authors: Kai Zheng, Miaowen Wen, Tianqi Mao, Lixia Xiao, Zhaocheng Wang

Abstract: The recent proposed affine frequency division multiplexing (AFDM) employing a multi-chirp waveform has shown its reliability and robustness in doubly selective fading channels. In the existing embedded pilot-aided channel estimation methods, the presence of guard symbols in the discrete affine Fourier transform (DAFT) domain causes inevitable degradation of the spectral efficiency (SE). To improve… ▽ More The recent proposed affine frequency division multiplexing (AFDM) employing a multi-chirp waveform has shown its reliability and robustness in doubly selective fading channels. In the existing embedded pilot-aided channel estimation methods, the presence of guard symbols in the discrete affine Fourier transform (DAFT) domain causes inevitable degradation of the spectral efficiency (SE). To improve the SE, we propose a novel AFDM channel estimation scheme by introducing the superimposed pilots in the DAFT domain. An effective pilot placement method that minimizes the channel estimation error is also developed with a rigorous proof. To mitigate the pilot-data interference, we further propose an iterative channel estimator and signal detector. Simulation results demonstrate that both channel estimation and data detection performances can be improved by the proposed scheme as the number of superimposed pilots increases. △ Less

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.04162 [pdf, other]

Wireless Resource Optimization in Hybrid Semantic/Bit Communication Networks

Authors: Le Xia, Yao Sun, Dusit Niyato, Lan Zhang, Muhammad Ali Imran

Abstract: Recently, semantic communication (SemCom) has shown great potential in significant resource savings and efficient information exchanges, thus naturally introducing a novel and practical cellular network paradigm where two modes of SemCom and conventional bit communication (BitCom) coexist. Nevertheless, the involved wireless resource management becomes rather complicated and challenging, given the… ▽ More Recently, semantic communication (SemCom) has shown great potential in significant resource savings and efficient information exchanges, thus naturally introducing a novel and practical cellular network paradigm where two modes of SemCom and conventional bit communication (BitCom) coexist. Nevertheless, the involved wireless resource management becomes rather complicated and challenging, given the unique background knowledge matching and time-consuming semantic coding requirements in SemCom. To this end, this paper jointly investigates user association (UA), mode selection (MS), and bandwidth allocation (BA) problems in a hybrid semantic/bit communication network (HSB-Net). Concretely, we first identify a unified performance metric of message throughput for both SemCom and BitCom links. Next, we specially develop a knowledge matching-aware two-stage tandem packet queuing model and theoretically derive the average packet loss ratio and queuing latency. Combined with practical constraints, we then formulate a joint optimization problem for UA, MS, and BA to maximize the overall message throughput of HSB-Net. Afterward, we propose an optimal resource management strategy by utilizing a Lagrange primal-dual transformation method and a preference list-based heuristic algorithm with polynomial-time complexity. Numerical results not only demonstrate the accuracy of our analytical queuing model, but also validate the performance superiority of our proposed strategy compared with different benchmarks. △ Less

Submitted 13 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

Comments: This paper has been submitted to IEEE Transactions on Communications for peer review

arXiv:2403.09355 [pdf, other]

Mitigating Data Consistency Induced Discrepancy in Cascaded Diffusion Models for Sparse-view CT Reconstruction

Authors: Hanyu Chen, Zhixiu Hao, Lin Guo, Liying Xiao

Abstract: Sparse-view Computed Tomography (CT) image reconstruction is a promising approach to reduce radiation exposure, but it inevitably leads to image degradation. Although diffusion model-based approaches are computationally expensive and suffer from the training-sampling discrepancy, they provide a potential solution to the problem. This study introduces a novel Cascaded Diffusion with Discrepancy Mit… ▽ More Sparse-view Computed Tomography (CT) image reconstruction is a promising approach to reduce radiation exposure, but it inevitably leads to image degradation. Although diffusion model-based approaches are computationally expensive and suffer from the training-sampling discrepancy, they provide a potential solution to the problem. This study introduces a novel Cascaded Diffusion with Discrepancy Mitigation (CDDM) framework, including the low-quality image generation in latent space and the high-quality image generation in pixel space which contains data consistency and discrepancy mitigation in a one-step reconstruction process. The cascaded framework minimizes computational costs by moving some inference steps from pixel space to latent space. The discrepancy mitigation technique addresses the training-sampling gap induced by data consistency, ensuring the data distribution is close to the original manifold. A specialized Alternating Direction Method of Multipliers (ADMM) is employed to process image gradients in separate directions, offering a more targeted approach to regularization. Experimental results across two datasets demonstrate CDDM's superior performance in high-quality image generation with clearer boundaries compared to existing methods, highlighting the framework's computational efficiency. △ Less

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.07951 [pdf, other]

SAMDA: Leveraging SAM on Few-Shot Domain Adaptation for Electronic Microscopy Segmentation

Authors: Yiran Wang, Li Xiao

Abstract: It has been shown that traditional deep learning methods for electronic microscopy segmentation usually suffer from low transferability when samples and annotations are limited, while large-scale vision foundation models are more robust when transferring between different domains but facing sub-optimal improvement under fine-tuning. In this work, we present a new few-shot domain adaptation framewo… ▽ More It has been shown that traditional deep learning methods for electronic microscopy segmentation usually suffer from low transferability when samples and annotations are limited, while large-scale vision foundation models are more robust when transferring between different domains but facing sub-optimal improvement under fine-tuning. In this work, we present a new few-shot domain adaptation framework SAMDA, which combines the Segment Anything Model(SAM) with nnUNet in the embedding space to achieve high transferability and accuracy. Specifically, we choose the Unet-based network as the "expert" component to learn segmentation features efficiently and design a SAM-based adaptation module as the "generic" component for domain transfer. By amalgamating the "generic" and "expert" components, we mitigate the modality imbalance in the complex pre-training knowledge inherent to large-scale Vision Foundation models and the challenge of transferability inherent to traditional neural networks. The effectiveness of our model is evaluated on two electron microscopic image datasets with different modalities for mitochondria segmentation, which improves the dice coefficient on the target domain by 6.7%. Also, the SAM-based adaptor performs significantly better with only a single annotated image than the 10-shot domain adaptation on nnUNet. We further verify our model on four MRI datasets from different sources to prove its generalization ability. △ Less

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2401.12468 [pdf, ps, other]

Minimum observability of probabilistic Boolean networks

Authors: Jiayi Xu, Shihua Fu, Liyuan Xia, Jianjun Wang

Abstract: This paper studies the minimum observability of probabilistic Boolean networks (PBNs), the main objective of which is to add the fewest measurements to make an unobservable PBN become observable. First of all, the algebraic form of a PBN is established with the help of semi-tensor product (STP) of matrices. By combining the algebraic forms of two identical PBNs into a parallel system, a method to… ▽ More This paper studies the minimum observability of probabilistic Boolean networks (PBNs), the main objective of which is to add the fewest measurements to make an unobservable PBN become observable. First of all, the algebraic form of a PBN is established with the help of semi-tensor product (STP) of matrices. By combining the algebraic forms of two identical PBNs into a parallel system, a method to search the states that need to be H-distinguishable is proposed based on the robust set reachability technique. Secondly, a necessary and sufficient condition is given to find the minimum measurements such that a given set can be H-distinguishable. Moreover, by comparing the numbers of measurements for all the feasible H-distinguishable state sets, the least measurements that make the system observable are gained. Finally, an example is given to verify the validity of the obtained results. △ Less

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2312.01586 [pdf, ps, other]

On the Maximization of Long-Run Reward CVaR for Markov Decision Processes

Authors: Li Xia, Zhihui Yu, Peter W. Glynn

Abstract: This paper studies the optimization of Markov decision processes (MDPs) from a risk-seeking perspective, where the risk is measured by conditional value-at-risk (CVaR). The objective is to find a policy that maximizes the long-run CVaR of instantaneous rewards over an infinite horizon across all history-dependent randomized policies. By establishing two optimality inequalities of opposing directio… ▽ More This paper studies the optimization of Markov decision processes (MDPs) from a risk-seeking perspective, where the risk is measured by conditional value-at-risk (CVaR). The objective is to find a policy that maximizes the long-run CVaR of instantaneous rewards over an infinite horizon across all history-dependent randomized policies. By establishing two optimality inequalities of opposing directions, we prove that the maximum of long-run CVaR of MDPs over the set of history-dependent randomized policies can be found within the class of stationary randomized policies. In contrast to classical MDPs, we find that there may not exist an optimal stationary deterministic policy for maximizing CVaR. Instead, we prove the existence of an optimal stationary randomized policy that requires randomizing over at most two actions. Via a convex optimization representation of CVaR, we convert the long-run CVaR maximization MDP into a minimax problem, where we prove the interchangeability of minimum and maximum and the related existence of saddle point solutions. Furthermore, we propose an algorithm that finds the saddle point solution by solving two linear programs. These results are then extended to objectives that involve maximizing some combination of mean and CVaR of rewards simultaneously. Finally, we conduct numerical experiments to demonstrate the main results. △ Less

Submitted 3 December, 2023; originally announced December 2023.

Comments: Risk-seeking optimization of CVaR in MDP

arXiv:2312.01125 [pdf, other]

Design and Performance Analysis of Index Modulation Empowered AFDM System

Authors: **g Zhu, Qu Luo, Gaojie Chen, Pei Xiao, Lixia Xiao

Abstract: In this letter, we incorporate index modulation (IM) into affine frequency division multiplexing (AFDM), called AFDM-IM, to enhance the bit error rate (BER) and energy efficiency (EE) performance. In this scheme, the information bits are conveyed not only by $M$-ary constellation symbols, but also by the activation of the chirp subcarriers (SCs) indices, which are determined based on the incoming… ▽ More In this letter, we incorporate index modulation (IM) into affine frequency division multiplexing (AFDM), called AFDM-IM, to enhance the bit error rate (BER) and energy efficiency (EE) performance. In this scheme, the information bits are conveyed not only by $M$-ary constellation symbols, but also by the activation of the chirp subcarriers (SCs) indices, which are determined based on the incoming bit streams. Then, two power allocation strategies, namely power reallocation (PR) strategy and power saving (PS) strategy, are proposed to enhance BER and EE performance, respectively. Furthermore, the average bit error probability (ABEP) is theoretically analyzed. Simulation results demonstrate that the proposed AFDM-IM scheme achieves better BER performance than the conventional AFDM scheme. △ Less

Submitted 2 December, 2023; originally announced December 2023.

arXiv:2311.15339 [pdf, other]

Adversarial Purification of Information Masking

Authors: Sitong Liu, Zhichao Lian, Shuangquan Zhang, Liang Xiao

Abstract: Adversarial attacks meticulously generate minuscule, imperceptible perturbations to images to deceive neural networks. Counteracting these, adversarial purification methods seek to transform adversarial input samples into clean output images to defend against adversarial attacks. Nonetheless, extent generative models fail to effectively eliminate adversarial perturbations, yielding less-than-ideal… ▽ More Adversarial attacks meticulously generate minuscule, imperceptible perturbations to images to deceive neural networks. Counteracting these, adversarial purification methods seek to transform adversarial input samples into clean output images to defend against adversarial attacks. Nonetheless, extent generative models fail to effectively eliminate adversarial perturbations, yielding less-than-ideal purification results. We emphasize the potential threat of residual adversarial perturbations to target models, quantitatively establishing a relationship between perturbation scale and attack capability. Notably, the residual perturbations on the purified image primarily stem from the same-position patch and similar patches of the adversarial sample. We propose a novel adversarial purification approach named Information Mask Purification (IMPure), aims to extensively eliminate adversarial perturbations. To obtain an adversarial sample, we first mask part of the patches information, then reconstruct the patches to resist adversarial perturbations from the patches. We reconstruct all patches in parallel to obtain a cohesive image. Then, in order to protect the purified samples against potential similar regional perturbations, we simulate this risk by randomly mixing the purified samples with the input samples before inputting them into the feature extraction network. Finally, we establish a combined constraint of pixel loss and perceptual loss to augment the model's reconstruction adaptability. Extensive experiments on the ImageNet dataset with three classifier models demonstrate that our approach achieves state-of-the-art results against nine adversarial attack methods. Implementation code and pre-trained weights can be accessed at \textcolor{blue}{https://github.com/NoWindButRain/IMPure}. △ Less

Submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.11804 [pdf, ps, other]

Robust Multidimentional Chinese Remainder Theorem for Integer Vector Reconstruction

Authors: Li Xiao, Haiye Huo, Xiang-Gen Xia

Abstract: The problem of robustly reconstructing an integer vector from its erroneous remainders appears in many applications in the field of multidimensional (MD) signal processing. To address this problem, a robust MD Chinese remainder theorem (CRT) was recently proposed for a special class of moduli, where the remaining integer matrices left-divided by a greatest common left divisor (gcld) of all the mod… ▽ More The problem of robustly reconstructing an integer vector from its erroneous remainders appears in many applications in the field of multidimensional (MD) signal processing. To address this problem, a robust MD Chinese remainder theorem (CRT) was recently proposed for a special class of moduli, where the remaining integer matrices left-divided by a greatest common left divisor (gcld) of all the moduli are pairwise commutative and coprime. The strict constraint on the moduli limits the usefulness of the robust MD-CRT in practice. In this paper, we investigate the robust MD-CRT for a general set of moduli. We first introduce a necessary and sufficient condition on the difference between paired remainder errors, followed by a simple sufficient condition on the remainder error bound, for the robust MD-CRT for general moduli, where the conditions are associated with (the minimum distances of) these lattices generated by gcld's of paired moduli, and a closed-form reconstruction algorithm is presented. We then generalize the above results of the robust MD-CRT from integer vectors/matrices to real ones. Finally, we validate the robust MD-CRT for general moduli by employing numerical simulations, and apply it to MD sinusoidal frequency estimation based on multiple sub-Nyquist samplers. △ Less

Submitted 20 November, 2023; originally announced November 2023.

Comments: 12 pages, 5 figure

arXiv:2311.08955 [pdf, ps, other]

A Spectral Diffusion Prior for Hyperspectral Image Super-Resolution

Authors: Jianjun Liu, Zebin Wu, Liang Xiao

Abstract: Fusion-based hyperspectral image (HSI) super-resolution aims to produce a high-spatial-resolution HSI by fusing a low-spatial-resolution HSI and a high-spatial-resolution multispectral image. Such a HSI super-resolution process can be modeled as an inverse problem, where the prior knowledge is essential for obtaining the desired solution. Motivated by the success of diffusion models, we propose a… ▽ More Fusion-based hyperspectral image (HSI) super-resolution aims to produce a high-spatial-resolution HSI by fusing a low-spatial-resolution HSI and a high-spatial-resolution multispectral image. Such a HSI super-resolution process can be modeled as an inverse problem, where the prior knowledge is essential for obtaining the desired solution. Motivated by the success of diffusion models, we propose a novel spectral diffusion prior for fusion-based HSI super-resolution. Specifically, we first investigate the spectrum generation problem and design a spectral diffusion model to model the spectral data distribution. Then, in the framework of maximum a posteriori, we keep the transition information between every two neighboring states during the reverse generative process, and thereby embed the knowledge of trained spectral diffusion model into the fusion problem in the form of a regularization term. At last, we treat each generation step of the final optimization problem as its subproblem, and employ the Adam to solve these subproblems in a reverse sequence. Experimental results conducted on both synthetic and real datasets demonstrate the effectiveness of the proposed approach. The code of the proposed approach will be available on https://github.com/liuofficial/SDP. △ Less

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2310.16869 [pdf]

Single-pixel imaging based on deep learning

Authors: Kai Song, Yaoxing Bian, Ku Wu, Hongrui Liu, Shuang** Han, Jiaming Li, Jiazhao Tian, Chengbin Qin, Jianyong Hu, Liantuan Xiao

Abstract: Single-pixel imaging can collect images at the wavelengths outside the reach of conventional focal plane array detectors. However, the limited image quality and lengthy computational times for iterative reconstruction still impede the practical application of single-pixel imaging. Recently, deep learning has been introduced into single-pixel imaging, which has attracted a lot of attention due to i… ▽ More Single-pixel imaging can collect images at the wavelengths outside the reach of conventional focal plane array detectors. However, the limited image quality and lengthy computational times for iterative reconstruction still impede the practical application of single-pixel imaging. Recently, deep learning has been introduced into single-pixel imaging, which has attracted a lot of attention due to its exceptional reconstruction quality, fast reconstruction speed, and the potential to complete advanced sensing tasks without reconstructing images. Here, this advance is discussed and some opinions are offered. Firstly, based on the fundamental principles of single-pixel imaging and deep learning, the principles and algorithms of single-pixel imaging based on deep learning are described and analyzed. Subsequently, the implementation technologies of single-pixel imaging based on deep learning are reviewed. They are divided into super-resolution single-pixel imaging, single-pixel imaging through scattering media, photon-level single-pixel imaging, optical encryption based on single-pixel imaging, color single-pixel imaging, and image-free sensing according to diverse application fields. Finally, major challenges and corresponding feasible approaches are discussed, as well as more possible applications in the future. △ Less

Submitted 16 November, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

arXiv:2310.05858 [pdf, other]

DSAC-T: Distributional Soft Actor-Critic with Three Refinements

Authors: **gliang Duan, Wenxuan Wang, Liming Xiao, Jiaxin Gao, Shengbo Eben Li

Abstract: Reinforcement learning (RL) has proven to be highly effective in tackling complex decision-making and control tasks. However, prevalent model-free RL methods often face severe performance degradation due to the well-known overestimation issue. In response to this problem, we recently introduced an off-policy RL algorithm, called distributional soft actor-critic (DSAC or DSAC-v1), which can effecti… ▽ More Reinforcement learning (RL) has proven to be highly effective in tackling complex decision-making and control tasks. However, prevalent model-free RL methods often face severe performance degradation due to the well-known overestimation issue. In response to this problem, we recently introduced an off-policy RL algorithm, called distributional soft actor-critic (DSAC or DSAC-v1), which can effectively improve the value estimation accuracy by learning a continuous Gaussian value distribution. Nonetheless, standard DSAC has its own shortcomings, including occasionally unstable learning processes and the necessity for task-specific reward scaling, which may hinder its overall performance and adaptability in some special tasks. This paper further introduces three important refinements to standard DSAC in order to address these shortcomings. These refinements consist of expected value substituting, twin value distribution learning, and variance-based critic gradient adjusting. The modified RL algorithm is named as DSAC with three refinements (DSAC-T or DSAC-v2), and its performances are systematically evaluated on a diverse set of benchmark tasks. Without any task-specific hyperparameter tuning, DSAC-T surpasses or matches a lot of mainstream model-free RL algorithms, including SAC, TD3, DDPG, TRPO, and PPO, in all tested environments. Additionally, DSAC-T, unlike its standard version, ensures a highly stable learning process and delivers similar performance across varying reward scales. △ Less

Submitted 28 December, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

arXiv:2309.12461 [pdf, other]

Knowledge Base Aware Semantic Communication in Vehicular Networks

Authors: Le Xia, Yao Sun, Dusit Niyato, Kairong Ma, Jiawen Kang, Muhammad Ali Imran

Abstract: Semantic communication (SemCom) has recently been considered a promising solution for the inevitable crisis of scarce communication resources. This trend stimulates us to explore the potential of applying SemCom to vehicular networks, which normally consume a tremendous amount of resources to achieve stringent requirements on high reliability and low latency. Unfortunately, the unique background k… ▽ More Semantic communication (SemCom) has recently been considered a promising solution for the inevitable crisis of scarce communication resources. This trend stimulates us to explore the potential of applying SemCom to vehicular networks, which normally consume a tremendous amount of resources to achieve stringent requirements on high reliability and low latency. Unfortunately, the unique background knowledge matching mechanism in SemCom makes it challenging to realize efficient vehicle-to-vehicle service provisioning for multiple users at the same time. To this end, this paper identifies and jointly addresses two fundamental problems of knowledge base construction (KBC) and vehicle service pairing (VSP) inherently existing in SemCom-enabled vehicular networks. Concretely, we first derive the knowledge matching based queuing latency specific for semantic data packets, and then formulate a latency-minimization problem subject to several KBC and VSP related reliability constraints. Afterward, a SemCom-empowered Service Supplying Solution (S$^{\text{4}}$) is proposed along with the theoretical analysis of its optimality guarantee. Simulation results demonstrate the superiority of S$^{\text{4}}$ in terms of average queuing latency, semantic data packet throughput, and user knowledge preference satisfaction compared with two different benchmarks. △ Less

Submitted 21 September, 2023; originally announced September 2023.

Comments: This paper has been accepted for publication by 2023 IEEE International Conference on Communications (ICC 2023). Copyright may be transferred without notice, after which this version may no longer be accessible. arXiv admin note: substantial text overlap with arXiv:2302.11993

arXiv:2309.06981 [pdf, other]

MASTERKEY: Practical Backdoor Attack Against Speaker Verification Systems

Authors: Hanqing Guo, Xun Chen, Junfeng Guo, Li Xiao, Qiben Yan

Abstract: Speaker Verification (SV) is widely deployed in mobile systems to authenticate legitimate users by using their voice traits. In this work, we propose a backdoor attack MASTERKEY, to compromise the SV models. Different from previous attacks, we focus on a real-world practical setting where the attacker possesses no knowledge of the intended victim. To design MASTERKEY, we investigate the limitation… ▽ More Speaker Verification (SV) is widely deployed in mobile systems to authenticate legitimate users by using their voice traits. In this work, we propose a backdoor attack MASTERKEY, to compromise the SV models. Different from previous attacks, we focus on a real-world practical setting where the attacker possesses no knowledge of the intended victim. To design MASTERKEY, we investigate the limitation of existing poisoning attacks against unseen targets. Then, we optimize a universal backdoor that is capable of attacking arbitrary targets. Next, we embed the speaker's characteristics and semantics information into the backdoor, making it imperceptible. Finally, we estimate the channel distortion and integrate it into the backdoor. We validate our attack on 6 popular SV models. Specifically, we poison a total of 53 models and use our trigger to attack 16,430 enrolled speakers, composed of 310 target speakers enrolled in 53 poisoned models. Our attack achieves 100% attack success rate with a 15% poison rate. By decreasing the poison rate to 3%, the attack success rate remains around 50%. We validate our attack in 3 real-world scenarios and successfully demonstrate the attack through both over-the-air and over-the-telephony-line scenarios. △ Less

Submitted 13 September, 2023; originally announced September 2023.

Comments: Accepted by Mobicom 2023

arXiv:2308.15483 [pdf, other]

Generative AI for Semantic Communication: Architecture, Challenges, and Outlook

Authors: Le Xia, Yao Sun, Chengsi Liang, Lei Zhang, Muhammad Ali Imran, Dusit Niyato

Abstract: Semantic communication (SemCom) is expected to be a core paradigm in future communication networks, yielding significant benefits in terms of spectrum resource saving and information interaction efficiency. However, the existing SemCom structure is limited by the lack of context-reasoning ability and background knowledge provisioning, which, therefore, motivates us to seek the potential of incorpo… ▽ More Semantic communication (SemCom) is expected to be a core paradigm in future communication networks, yielding significant benefits in terms of spectrum resource saving and information interaction efficiency. However, the existing SemCom structure is limited by the lack of context-reasoning ability and background knowledge provisioning, which, therefore, motivates us to seek the potential of incorporating generative artificial intelligence (GAI) technologies with SemCom. Recognizing GAI's powerful capability in automating and creating valuable, diverse, and personalized multimodal content, this article first highlights the principal characteristics of the combination of GAI and SemCom along with their pertinent benefits and challenges. To tackle these challenges, we further propose a novel GAI-assisted SemCom network (GAI-SCN) framework in a cloud-edge-mobile design. Specifically, by employing global and local GAI models, our GAI-SCN enables multimodal semantic content provisioning, semantic-level joint-source-channel coding, and AIGC acquisition to maximize the efficiency and reliability of semantic reasoning and resource utilization. Afterward, we present a detailed implementation workflow of GAI-SCN, followed by corresponding initial simulations for performance evaluation in comparison with two benchmarks. Finally, we discuss several open issues and offer feasible solutions to unlock the full potential of GAI-SCN. △ Less

Submitted 18 January, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: This article has been submitted to IEEE Wireless Communications Magazine for the second round of peer review after completing the major revision

arXiv:2307.13346 [pdf, other]

A Snoring Sound Dataset for Body Position Recognition: Collection, Annotation, and Analysis

Authors: Li Xiao, ** Yang, Xinhong Li, Wei** Tu, Xiong Chen, Weiyan Yi, Jie Lin, Yuhong Yang, Yanzhen Ren

Abstract: Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is a chronic breathing disorder caused by a blockage in the upper airways. Snoring is a prominent symptom of OSAHS, and previous studies have attempted to identify the obstruction site of the upper airways by snoring sounds. Despite some progress, the classification of the obstruction site remains challenging in real-world clinical settings due to… ▽ More Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is a chronic breathing disorder caused by a blockage in the upper airways. Snoring is a prominent symptom of OSAHS, and previous studies have attempted to identify the obstruction site of the upper airways by snoring sounds. Despite some progress, the classification of the obstruction site remains challenging in real-world clinical settings due to the influence of sleep body position on upper airways. To address this challenge, this paper proposes a snore-based sleep body position recognition dataset (SSBPR) consisting of 7570 snoring recordings, which comprises six distinct labels for sleep body position: supine, supine but left lateral head, supine but right lateral head, left-side lying, right-side lying and prone. Experimental results show that snoring sounds exhibit certain acoustic features that enable their effective utilization for identifying body posture during sleep in real-world scenarios. △ Less

Submitted 25 July, 2023; originally announced July 2023.

Comments: Accepted to INTERSPEECH 2023

arXiv:2307.13295 [pdf, other]

CQNV: A combination of coarsely quantized bitstream and neural vocoder for low rate speech coding

Authors: Youqiang Zheng, Li Xiao, Wei** Tu, Yuhong Yang, Xinmeng Xu

Abstract: Recently, speech codecs based on neural networks have proven to perform better than traditional methods. However, redundancy in traditional parameter quantization is visible within the codec architecture of combining the traditional codec with the neural vocoder. In this paper, we propose a novel framework named CQNV, which combines the coarsely quantized parameters of a traditional parametric cod… ▽ More Recently, speech codecs based on neural networks have proven to perform better than traditional methods. However, redundancy in traditional parameter quantization is visible within the codec architecture of combining the traditional codec with the neural vocoder. In this paper, we propose a novel framework named CQNV, which combines the coarsely quantized parameters of a traditional parametric codec to reduce the bitrate with a neural vocoder to improve the quality of the decoded speech. Furthermore, we introduce a parameters processing module into the neural vocoder to enhance the application of the bitstream of traditional speech coding parameters to the neural vocoder, further improving the reconstructed speech's quality. In the experiments, both subjective and objective evaluations demonstrate the effectiveness of the proposed CQNV framework. Specifically, our proposed method can achieve higher quality reconstructed speech at 1.1 kbps than Lyra and Encodec at 3 kbps. △ Less

Submitted 25 July, 2023; originally announced July 2023.

Comments: Accepted by INTERSPEECH 2023

arXiv:2307.09729 [pdf, other]

NTIRE 2023 Quality Assessment of Video Enhancement Challenge

Authors: Xiaohong Liu, Xiongkuo Min, Wei Sun, Yulun Zhang, Kai Zhang, Radu Timofte, Guangtao Zhai, Yixuan Gao, Yuqin Cao, Tengchuan Kou, Yunlong Dong, Ziheng Jia, Yilin Li, Wei Wu, Shuming Hu, Sibin Deng, Pengxiang Xiao, Ying Chen, Kai Li, Kai Zhao, Kun Yuan, Ming Sun, Heng Cong, Hao Wang, Lingzhi Fu , et al. (47 additional authors not shown)

Abstract: This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. This challenge is to address a major challenge in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual… ▽ More This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. This challenge is to address a major challenge in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual Video Enhancement (VDPVE), which has a total of 1211 enhanced videos, including 600 videos with color, brightness, and contrast enhancements, 310 videos with deblurring, and 301 deshaked videos. The challenge has a total of 167 registered participants. 61 participating teams submitted their prediction results during the development phase, with a total of 3168 submissions. A total of 176 submissions were submitted by 37 participating teams during the final testing phase. Finally, 19 participating teams submitted their models and fact sheets, and detailed the methods they used. Some methods have achieved better results than baseline methods, and the winning methods have demonstrated superior prediction performance. △ Less

Submitted 18 July, 2023; originally announced July 2023.

arXiv:2305.16616 [pdf, other]

Channel Measurement, Modeling, and Simulation for 6G: A Survey and Tutorial

Authors: Jianhua Zhang, Jiaxin Lin, Pan Tang, Yuxiang Zhang, Huixin Xu, Tianyang Gao, Haiyang Miao, Zeyong Chai, Zhengfu Zhou, Yi Li, Huiwen Gong, Yameng Liu, Zhiqiang Yuan, Lei Tian, Shaoshi Yang, Liang Xia, Guangyi Liu, ** Zhang

Abstract: The sixth generation (6G) mobile communications have attracted substantial attention in the global research community of information and communication technologies (ICT). 6G systems are expected to support not only extended 5G usage scenarios, but also new usage scenarios, such as integrated sensing and communication (ISAC), integrated artificial intelligence (AI) and communication, and communicat… ▽ More The sixth generation (6G) mobile communications have attracted substantial attention in the global research community of information and communication technologies (ICT). 6G systems are expected to support not only extended 5G usage scenarios, but also new usage scenarios, such as integrated sensing and communication (ISAC), integrated artificial intelligence (AI) and communication, and communication and ubiquitous connectivity. To realize this goal, channel characteristics must be comprehensively studied and properly exploited, so as to promote the design, standardization, and optimization of 6G systems. In this paper, we first summarize the requirements and challenges in 6G channel research. Our focus is on channels for five promising technologies enabling 6G, including terahertz (THz), extreme MIMO (E-MIMO), ISAC, reconfigurable intelligent surface (RIS), and space-air-ground integrated network (SAGIN). Then, a survey of the progress of the 6G channel research regarding the above five promising technologies is presented in terms of the latest measurement campaigns, new characteristics, modeling methods, and research prospects. Moreover, a tutorial on the 6G channel simulations is presented. We introduce the BUPTCMG- 6G, a 6G link-level channel simulator, developed based on the ITU/3GPP 3D geometry-based stochastic model (GBSM) methodology. The simulator supports the channel simulation of the aforementioned 6G potential technologies. To facilitate the use of the simulator, the tutorial encompasses the design framework, user guidelines, and application examples. This paper offers in-depth, hands-on insights into the best practices of channel measurements, modeling, and simulations for the evaluation of 6G technologies, the development of 6G standards, and the implementation and optimization of 6G systems. △ Less

Submitted 28 March, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: 41 pages,52 figures

arXiv:2304.12704 [pdf, other]

GTN-Bailando: Genre Consistent Long-Term 3D Dance Generation based on Pre-trained Genre Token Network

Authors: Haolin Zhuang, Shun Lei, Long Xiao, Weiqin Li, Liyang Chen, Sicheng Yang, Zhiyong Wu, Shiyin Kang, Helen Meng

Abstract: Music-driven 3D dance generation has become an intensive research topic in recent years with great potential for real-world applications. Most existing methods lack the consideration of genre, which results in genre inconsistency in the generated dance movements. In addition, the correlation between the dance genre and the music has not been investigated. To address these issues, we propose a genr… ▽ More Music-driven 3D dance generation has become an intensive research topic in recent years with great potential for real-world applications. Most existing methods lack the consideration of genre, which results in genre inconsistency in the generated dance movements. In addition, the correlation between the dance genre and the music has not been investigated. To address these issues, we propose a genre-consistent dance generation framework, GTN-Bailando. First, we propose the Genre Token Network (GTN), which infers the genre from music to enhance the genre consistency of long-term dance generation. Second, to improve the generalization capability of the model, the strategy of pre-training and fine-tuning is adopted.Experimental results on the AIST++ dataset show that the proposed dance generation framework outperforms state-of-the-art methods in terms of motion quality and genre consistency. △ Less

Submitted 25 April, 2023; originally announced April 2023.

Comments: Accepted by ICASSP2023.Demo page: https://im1eon.github.io/ICASSP23-GTNB-DG/

arXiv:2302.11993 [pdf, other]

doi 10.1109/TWC.2023.3319442

xURLLC-Aware Service Provisioning in Vehicular Networks: A Semantic Communication Perspective

Authors: Le Xia, Yao Sun, Dusit Niyato, Daquan Feng, Lei Feng, Muhammad Ali Imran

Abstract: Semantic communication (SemCom), as an emerging paradigm focusing on meaning delivery, has recently been considered a promising solution for the inevitable crisis of scarce communication resources. This trend stimulates us to explore the potential of applying SemCom to wireless vehicular networks, which normally consume a tremendous amount of resources to meet stringent reliability and latency req… ▽ More Semantic communication (SemCom), as an emerging paradigm focusing on meaning delivery, has recently been considered a promising solution for the inevitable crisis of scarce communication resources. This trend stimulates us to explore the potential of applying SemCom to wireless vehicular networks, which normally consume a tremendous amount of resources to meet stringent reliability and latency requirements. Unfortunately, the unique background knowledge matching mechanism in SemCom makes it challenging to simultaneously realize efficient service provisioning for multiple users in vehicle-to-vehicle networks. To this end, this paper identifies and jointly addresses two fundamental problems of knowledge base construction (KBC) and vehicle service pairing (VSP) inherently existing in SemCom-enabled vehicular networks in alignment with the next-generation ultra-reliable and low-latency communication (xURLLC) requirements. Concretely, we first derive the knowledge matching based queuing latency specific for semantic data packets, and then formulate a latency-minimization problem subject to several KBC and VSP related reliability constraints. Afterward, a SemCom-empowered Service Supplying Solution (S$^{\text{4}}$) is proposed along with the theoretical analysis of its optimality guarantee and computational complexity. Numerical results demonstrate the superiority of S$^{\text{4}}$ in terms of average queuing latency, semantic data packet throughput, user knowledge matching degree and knowledge preference satisfaction compared with two benchmarks. △ Less

Submitted 23 September, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: This paper has been accepted for publication by IEEE Transactions on Wireless Communications. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2212.14142 [pdf, other]

doi 10.1109/TVT.2023.3318510

Joint User Association and Bandwidth Allocation in Semantic Communication Networks

Authors: Le Xia, Yao Sun, Dusit Niyato, Xiaoqian Li, Muhammad Ali Imran

Abstract: Semantic communication (SemCom) has recently been considered a promising solution to guarantee high resource utilization and transmission reliability for future wireless networks. Nevertheless, the unique demand for background knowledge matching makes it challenging to achieve efficient wireless resource management for multiple users in SemCom-enabled networks (SC-Nets). To this end, this paper in… ▽ More Semantic communication (SemCom) has recently been considered a promising solution to guarantee high resource utilization and transmission reliability for future wireless networks. Nevertheless, the unique demand for background knowledge matching makes it challenging to achieve efficient wireless resource management for multiple users in SemCom-enabled networks (SC-Nets). To this end, this paper investigates SemCom from a networking perspective, where two fundamental problems of user association (UA) and bandwidth allocation (BA) are systematically addressed in the SC-Net. First, considering varying knowledge matching states between mobile users and associated base stations, we identify two general SC-Net scenarios, namely perfect knowledge matching-based SC-Net and imperfect knowledge matching-based SC-Net. Afterward, for each SC-Net scenario, we describe its distinctive semantic channel model from the semantic information theory perspective, whereby a concept of bit-rate-to-message-rate transformation is developed along with a new semantics-level metric, namely system throughput in message (STM), to measure the overall network performance. In this way, we then formulate a joint STM-maximization problem of UA and BA for each SC-Net scenario, followed by a corresponding optimal solution proposed. Numerical results in both scenarios demonstrate significant superiority and reliability of our solutions in the STM performance compared with two benchmarks. △ Less

Submitted 23 September, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

Comments: This paper has been accepted for publication by IEEE Transactions on Vehicular Technology. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2212.12134 [pdf, other]

AMDET: Attention based Multiple Dimensions EEG Transformer for Emotion Recognition

Authors: Yongling Xu, Yang Du, **g Zou, Tianying Zhou, Lushan Xiao, Li Liu, Pengcheng

Abstract: Affective computing is an important branch of artificial intelligence, and with the rapid development of brain computer interface technology, emotion recognition based on EEG signals has received broad attention. It is still a great challenge to effectively explore the multi-dimensional information in the EEG data in spite of a large number of deep learning methods. In this paper, we propose a dee… ▽ More Affective computing is an important branch of artificial intelligence, and with the rapid development of brain computer interface technology, emotion recognition based on EEG signals has received broad attention. It is still a great challenge to effectively explore the multi-dimensional information in the EEG data in spite of a large number of deep learning methods. In this paper, we propose a deep model called Attention-based Multiple Dimensions EEG Transformer (AMDET), which can exploit the complementarity among the spectral-spatial-temporal features of EEG data by employing the multi-dimensional global attention mechanism. We transformed the original EEG data into 3D temporal-spectral-spatial representations and then the AMDET would use spectral-spatial transformer encoder layer to extract effective features in the EEG signal and concentrate on the critical time frame with a temporal attention layer. We conduct extensive experiments on the DEAP, SEED, and SEED-IV datasets to evaluate the performance of AMDET and the results outperform the state-of-the-art baseline on three datasets. Accuracy rates of 97.48%, 96.85%, 97.17%, 87.32% were achieved in the DEAP-Arousal, DEAP-Valence, SEED, and SEED-IV datasets, respectively. We also conduct extensive experiments to explore the possible brain regions that influence emotions and the coupling of EEG signals. AMDET can perform as well even with few channels which are identified by visualizing what learned model focus on. The accuracy could achieve over 90% even with only eight channels and it is of great use and benefit for practical applications. △ Less

Submitted 22 December, 2022; originally announced December 2022.

arXiv:2212.00687 [pdf]

3D-EPI Blip-Up/Down Acquisition (BUDA) with CAIPI and Joint Hankel Structured Low-Rank Reconstruction for Rapid Distortion-Free High-Resolution T2* Map**

Authors: Zhifeng Chen, Congyu Liao, Xiaozhi Cao, Benedikt A. Poser, Zhongbiao Xu, Wei-Ching Lo, Manyi Wen, Jae** Cho, Qiyuan Tian, Yaohui Wang, Yanqiu Feng, Ling Xia, Wufan Chen, Feng Liu, Berkin Bilgic

Abstract: Purpose: This work aims to develop a novel distortion-free 3D-EPI acquisition and image reconstruction technique for fast and robust, high-resolution, whole-brain imaging as well as quantitative T2* map**. Methods: 3D-Blip-Up and -Down Acquisition (3D-BUDA) sequence is designed for both single- and multi-echo 3D GRE-EPI imaging using multiple shots with blip-up and -down readouts to encode B0 fi… ▽ More Purpose: This work aims to develop a novel distortion-free 3D-EPI acquisition and image reconstruction technique for fast and robust, high-resolution, whole-brain imaging as well as quantitative T2* map**. Methods: 3D-Blip-Up and -Down Acquisition (3D-BUDA) sequence is designed for both single- and multi-echo 3D GRE-EPI imaging using multiple shots with blip-up and -down readouts to encode B0 field map information. Complementary k-space coverage is achieved using controlled aliasing in parallel imaging (CAIPI) sampling across the shots. For image reconstruction, an iterative hard-thresholding algorithm is employed to minimize the cost function that combines field map information informed parallel imaging with the structured low-rank constraint for multi-shot 3D-BUDA data. Extending 3D-BUDA to multi-echo imaging permits T2* map**. For this, we propose constructing a joint Hankel matrix along both echo and shot dimensions to improve the reconstruction. Results: Experimental results on in vivo multi-echo data demonstrate that, by performing joint reconstruction along with both echo and shot dimensions, reconstruction accuracy is improved compared to standard 3D-BUDA reconstruction. CAIPI sampling is further shown to enhance the image quality. For T2* map**, T2* values from 3D-Joint-CAIPI-BUDA and reference multi-echo GRE are within limits of agreement as quantified by Bland-Altman analysis. Conclusions: The proposed technique enables rapid 3D distortion-free high-resolution imaging and T2* map**. Specifically, 3D-BUDA enables 1-mm isotropic whole-brain imaging in 22 s at 3 T and 9 s on a 7 T scanner. The combination of multi-echo 3D-BUDA with CAIPI acquisition and joint reconstruction enables distortion-free whole-brain T2* map** in 47 s at 1.1x1.1x1.0 mm3 resolution. △ Less

Submitted 1 December, 2022; originally announced December 2022.

arXiv:2211.13229 [pdf, other]

DeltaNet:Conditional Medical Report Generation for COVID-19 Diagnosis

Authors: Xian Wu, Shuxin Yang, Zhaopeng Qiu, Shen Ge, Yangtian Yan, Xingwang Wu, Yefeng Zheng, S. Kevin Zhou, Li Xiao

Abstract: Fast screening and diagnosis are critical in COVID-19 patient treatment. In addition to the gold standard RT-PCR, radiological imaging like X-ray and CT also works as an important means in patient screening and follow-up. However, due to the excessive number of patients, writing reports becomes a heavy burden for radiologists. To reduce the workload of radiologists, we propose DeltaNet to generate… ▽ More Fast screening and diagnosis are critical in COVID-19 patient treatment. In addition to the gold standard RT-PCR, radiological imaging like X-ray and CT also works as an important means in patient screening and follow-up. However, due to the excessive number of patients, writing reports becomes a heavy burden for radiologists. To reduce the workload of radiologists, we propose DeltaNet to generate medical reports automatically. Different from typical image captioning approaches that generate reports with an encoder and a decoder, DeltaNet applies a conditional generation process. In particular, given a medical image, DeltaNet employs three steps to generate a report: 1) first retrieving related medical reports, i.e., the historical reports from the same or similar patients; 2) then comparing retrieved images and current image to find the differences; 3) finally generating a new report to accommodate identified differences based on the conditional report. We evaluate DeltaNet on a COVID-19 dataset, where DeltaNet outperforms state-of-the-art approaches. Besides COVID-19, the proposed DeltaNet can be applied to other diseases as well. We validate its generalization capabilities on the public IU-Xray and MIMIC-CXR datasets for chest-related diseases. Code is available at \url{https://github.com/LX-doctorAI1/DeltaNet}. △ Less

Submitted 12 November, 2022; originally announced November 2022.

arXiv:2211.01241 [pdf, other]

doi 10.1109/MWC.004.2200393

WiserVR: Semantic Communication Enabled Wireless Virtual Reality Delivery

Authors: Le Xia, Yao Sun, Chengsi Liang, Daquan Feng, Runze Cheng, Yang Yang, Muhammad Ali Imran

Abstract: Virtual reality (VR) over wireless is expected to be one of the killer applications in next-generation communication networks. Nevertheless, the huge data volume along with stringent requirements on latency and reliability under limited bandwidth resources makes untethered wireless VR delivery increasingly challenging. Such bottlenecks, therefore, motivate this work to seek the potential of using… ▽ More Virtual reality (VR) over wireless is expected to be one of the killer applications in next-generation communication networks. Nevertheless, the huge data volume along with stringent requirements on latency and reliability under limited bandwidth resources makes untethered wireless VR delivery increasingly challenging. Such bottlenecks, therefore, motivate this work to seek the potential of using semantic communication, a new paradigm that promises to significantly ease the resource pressure, for efficient VR delivery. To this end, we propose a novel framework, namely WIreless SEmantic deliveRy for VR (WiserVR), for delivering consecutive 360° video frames to VR users. Specifically, deep learning-based multiple modules are well-devised for the transceiver in WiserVR to realize high-performance feature extraction and semantic recovery. Among them, we dedicatedly develop a concept of semantic location graph and leverage the joint-semantic-channel-coding method with knowledge sharing to not only substantially reduce communication latency, but also to guarantee adequate transmission reliability and resilience under various channel states. Moreover, implementation of WiserVR is presented, followed by corresponding initial simulations for performance evaluation compared with benchmarks. Finally, we discuss several open issues and offer feasible solutions to unlock the full potential of WiserVR. △ Less

Submitted 13 March, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: This magazine article has been accepted for publication by IEEE Wireless Communications. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2211.00648 [pdf]

doi 10.1038/s41467-023-38898-4

Non-line-of-sight imaging with arbitrary illumination and detection pattern

Authors: Xintong Liu, Jianyu Wang, Zuoqiang Shi, Xing Fu, Lingyun Qiu

Abstract: Non-line-of-sight (NLOS) imaging aims at reconstructing targets obscured from the direct line of sight. Existing NLOS imaging algorithms require dense measurements at rectangular grid points in a large area of the relay surface, which severely hinders their availability to variable relay scenarios in practical applications such as robotic vision, autonomous driving, rescue operations and remote se… ▽ More Non-line-of-sight (NLOS) imaging aims at reconstructing targets obscured from the direct line of sight. Existing NLOS imaging algorithms require dense measurements at rectangular grid points in a large area of the relay surface, which severely hinders their availability to variable relay scenarios in practical applications such as robotic vision, autonomous driving, rescue operations and remote sensing. In this work, we propose a Bayesian framework for NLOS imaging with no specific requirements on the spatial pattern of illumination and detection points. By introducing virtual confocal signals, we design a confocal complemented signal-object collaborative regularization (CC-SOCR) algorithm for high quality reconstructions. Our approach is capable of reconstructing both albedo and surface normal of the hidden objects with fine details under the most general relay setting. Moreover, with a regular relay surface, coarse rather than dense measurements are enough for our approach such that the acquisition time can be reduced significantly. As demonstrated in multiple experiments, the new framework substantially enhances the applicability of NLOS imaging. △ Less

Submitted 1 November, 2022; originally announced November 2022.

Comments: main article: 32 pages with 8 figures; supplementary information: 49 pages with 26 figures

arXiv:2210.15418 [pdf, other]

FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion

Authors: **gyi li, Wei** tu, Li xiao

Abstract: Voice conversion (VC) can be achieved by first extracting source content information and target speaker information, and then reconstructing waveform with these information. However, current approaches normally either extract dirty content information with speaker information leaked in, or demand a large amount of annotated data for training. Besides, the quality of reconstructed waveform can be d… ▽ More Voice conversion (VC) can be achieved by first extracting source content information and target speaker information, and then reconstructing waveform with these information. However, current approaches normally either extract dirty content information with speaker information leaked in, or demand a large amount of annotated data for training. Besides, the quality of reconstructed waveform can be degraded by the mismatch between conversion model and vocoder. In this paper, we adopt the end-to-end framework of VITS for high-quality waveform reconstruction, and propose strategies for clean content information extraction without text annotation. We disentangle content information by imposing an information bottleneck to WavLM features, and propose the spectrogram-resize based data augmentation to improve the purity of extracted content information. Experimental results show that the proposed method outperforms the latest VC models trained with annotated data and has greater robustness. △ Less

Submitted 27 October, 2022; originally announced October 2022.

arXiv:2210.14481 [pdf]

Calibrationless Reconstruction of Uniformly-Undersampled Multi-Channel MR Data with Deep Learning Estimated ESPIRiT Maps

Authors: Junhao Zhang, Zheyuan Yi, Yujiao Zhao, Linfang Xiao, Jiahao Hu, Christopher Man, Vick Lau, Shi Su, Fei Chen, Alex T. L. Leong, Ed X. Wu

Abstract: Purpose: To develop a truly calibrationless reconstruction method that derives ESPIRiT maps from uniformly-undersampled multi-channel MR data by deep learning. Methods: ESPIRiT, one commonly used parallel imaging reconstruction technique, forms the images from undersampled MR k-space data using ESPIRiT maps that effectively represents coil sensitivity information. Accurate ESPIRiT map estimation r… ▽ More Purpose: To develop a truly calibrationless reconstruction method that derives ESPIRiT maps from uniformly-undersampled multi-channel MR data by deep learning. Methods: ESPIRiT, one commonly used parallel imaging reconstruction technique, forms the images from undersampled MR k-space data using ESPIRiT maps that effectively represents coil sensitivity information. Accurate ESPIRiT map estimation requires quality coil sensitivity calibration or autocalibration data. We present a U-Net based deep learning model to estimate the multi-channel ESPIRiT maps directly from uniformly-undersampled multi-channel multi-slice MR data. The model is trained using fully-sampled multi-slice axial brain datasets from the same MR receiving coil system. To utilize subject-coil geometric parameters available for each dataset, the training imposes a hybrid loss on ESPIRiT maps at the original locations as well as their corresponding locations within the standard reference multi-slice axial stack. The performance of the approach was evaluated using publicly available T1-weighed brain and cardiac data. Results: The proposed model robustly predicted multi-channel ESPIRiT maps from uniformly-undersampled k-space data. They were highly comparable to the reference ESPIRiT maps directly computed from 24 consecutive central k-space lines. Further, they led to excellent ESPIRiT reconstruction performance even at high acceleration, exhibiting a similar level of errors and artifacts to that by using reference ESPIRiT maps. Conclusion: A new deep learning approach is developed to estimate ESPIRiT maps directly from uniformly-undersampled MR data. It presents a general strategy for calibrationless parallel imaging reconstruction through learning from coil and protocol specific data. △ Less

Submitted 27 October, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

arXiv:2210.06730 [pdf]

Robust Electromagnetic Interference (EMI) Elimination via Simultaneous Sensing and Deep Learning Prediction for RF Shielding-free MRI

Authors: Yujiao Zhao, Linfang Xiao, Vick Lau, Yilong Liu, Alex T. Leong, Ed X. Wu

Abstract: At present, MRI scans are performed inside a fully-enclosed RF shielding room, posing stringent installation requirement and unnecessary patient discomfort. We aim to develop an electromagnetic interference (EMI) cancellation strategy for MRI with no or incomplete RF shielding. In this study, a simultaneous sensing and deep learning driven EMI cancellation strategy is presented to model, predict a… ▽ More At present, MRI scans are performed inside a fully-enclosed RF shielding room, posing stringent installation requirement and unnecessary patient discomfort. We aim to develop an electromagnetic interference (EMI) cancellation strategy for MRI with no or incomplete RF shielding. In this study, a simultaneous sensing and deep learning driven EMI cancellation strategy is presented to model, predict and remove EMI signals from acquired MRI signals. Specifically, during each MRI scan, separate EMI sensing coils placed in various spatial locations are utilized to simultaneously sample environmental and internal EMI signals within two windows (for both conventional MRI signal acquisition and EMI characterization acquisition). Then a CNN model is trained using the EMI characterization data to relate EMI signals detected by EMI sensing coils to EMI signals in MRI receive coil. This model is utilized to retrospectively predict and remove EMI signals components detected by MRI receive coil during the MRI signal acquisition window. We implemented and demonstrated this strategy for various EMI sources on a mobile ultra-low-field 0.055 T permanent magnet MRI scanner and a 1.5 T superconducting magnet MRI scanner with no or incomplete RF shielding. Our experimental results demonstrate that the method is highly effective and robust in predicting and removing various EMI sources from both external environments and internal scanner electronics at both 0.055 T (2.3 MHz) and 1.5 T (64 MHz), producing final image signal-to-noise ratios that are comparable to those obtained using a fully enclosed RF shielding. Our proposed strategy enables MRI operation with no or incomplete RF shielding, alleviating MRI installation and operational requirements. It is also potentially applicable to other scenarios of accurate RF signal detection or discrimination in presence of external and internal EMI or RF sources. △ Less

Submitted 5 January, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: Submitted to NMR in Biomedicine

arXiv:2209.10425 [pdf, other]

Consecutive Knowledge Meta-Adaptation Learning for Unsupervised Medical Diagnosis

Authors: Yumin Zhang, Yawen Hou, Xiuyi Chen, Hongyuan Yu, Long Xia

Abstract: Deep learning-based Computer-Aided Diagnosis (CAD) has attracted appealing attention in academic researches and clinical applications. Nevertheless, the Convolutional Neural Networks (CNNs) diagnosis system heavily relies on the well-labeled lesion dataset, and the sensitivity to the variation of data distribution also restricts the potential application of CNNs in CAD. Unsupervised Domain Adaptat… ▽ More Deep learning-based Computer-Aided Diagnosis (CAD) has attracted appealing attention in academic researches and clinical applications. Nevertheless, the Convolutional Neural Networks (CNNs) diagnosis system heavily relies on the well-labeled lesion dataset, and the sensitivity to the variation of data distribution also restricts the potential application of CNNs in CAD. Unsupervised Domain Adaptation (UDA) methods are developed to solve the expensive annotation and domain gaps problem and have achieved remarkable success in medical image analysis. Yet existing UDA approaches only adapt knowledge learned from the source lesion domain to a single target lesion domain, which is against the clinical scenario: the new unlabeled target domains to be diagnosed always arrive in an online and continual manner. Moreover, the performance of existing approaches degrades dramatically on previously learned target lesion domains, due to the newly learned knowledge overwriting the previously learned knowledge (i.e., catastrophic forgetting). To deal with the above issues, we develop a meta-adaptation framework named Consecutive Lesion Knowledge Meta-Adaptation (CLKM), which mainly consists of Semantic Adaptation Phase (SAP) and Representation Adaptation Phase (RAP) to learn the diagnosis model in an online and continual manner. In the SAP, the semantic knowledge learned from the source lesion domain is transferred to consecutive target lesion domains. In the RAP, the feature-extractor is optimized to align the transferable representation knowledge across the source and multiple target lesion domains. △ Less

Submitted 21 September, 2022; originally announced September 2022.

arXiv:2208.08370 [pdf, other]

Energy-grade double pricing mechanism for a combined heat and power system using the asynchronous dispatch method

Authors: Xinyi Yi, Ye Guo, Hongbin Sun, Qiuwei Wu, Li Xiao

Abstract: The problem of heat and electricity pricing in combined heat and power systems regarding the time scales of electricity and heat, as well as thermal energy quality, is studied. Based on the asynchronous coordinated dispatch of the combined heat and power system, an energy-grade double pricing mechanism is proposed. Under the pricing mechanism, the resulting merchandise surplus of the heat system o… ▽ More The problem of heat and electricity pricing in combined heat and power systems regarding the time scales of electricity and heat, as well as thermal energy quality, is studied. Based on the asynchronous coordinated dispatch of the combined heat and power system, an energy-grade double pricing mechanism is proposed. Under the pricing mechanism, the resulting merchandise surplus of the heat system operator at each heat dispatch interval can be decomposed into interpretable parts and its revenue adequacy can be guaranteed for all heat dispatch intervals. And the electric power system operator's resulting merchandise surplus is composed of non-negative components at each electricity dispatch interval, also ensuring its revenue adequacy. In addition, the effects of different time scales and cogeneration are analyzed in different kinds of combined heat and power units' pricing. △ Less

Submitted 17 August, 2022; originally announced August 2022.

arXiv:2208.03952 [pdf, other]

A Multiple Market Trading Mechanism for Electricity, Renewable Energy Certificate and Carbon Emission Right of Virtual Power Plants

Authors: Zhihong Huang, Ye Guo, Qiuwei Wu, Li Xiao, Hongbin Sun

Abstract: A multiple market trading mechanism for the VPP to participate in electricity, renewable energy certificate (REC) and carbon emission right (CER) markets is proposed. With the introduction of the inventory mechanism of REC and CER, the profit of the VPP increases and better trading decisions with multiple markets are made under the requirements of renewable portfolio standard (RPS) and carbon emis… ▽ More A multiple market trading mechanism for the VPP to participate in electricity, renewable energy certificate (REC) and carbon emission right (CER) markets is proposed. With the introduction of the inventory mechanism of REC and CER, the profit of the VPP increases and better trading decisions with multiple markets are made under the requirements of renewable portfolio standard (RPS) and carbon emission (CE) quota requirements. According to the Karush-Kuhn-Tucker (KKT) conditions of the proposed model, properties of the multiple market trading mechanism are discussed. Results from case studies verify the effectiveness of the proposed model. △ Less

Submitted 8 August, 2022; originally announced August 2022.

arXiv:2207.05848 [pdf, other]

NEC: Speaker Selective Cancellation via Neural Enhanced Ultrasound Shadowing

Authors: Hanqing Guo, Chenning Li, Lingkun Li, Zhichao Cao, Qiben Yan, Li Xiao

Abstract: In this paper, we propose NEC (Neural Enhanced Cancellation), a defense mechanism, which prevents unauthorized microphones from capturing a target speaker's voice. Compared with the existing scrambling-based audio cancellation approaches, NEC can selectively remove a target speaker's voice from a mixed speech without causing interference to others. Specifically, for a target speaker, we design a D… ▽ More In this paper, we propose NEC (Neural Enhanced Cancellation), a defense mechanism, which prevents unauthorized microphones from capturing a target speaker's voice. Compared with the existing scrambling-based audio cancellation approaches, NEC can selectively remove a target speaker's voice from a mixed speech without causing interference to others. Specifically, for a target speaker, we design a Deep Neural Network (DNN) model to extract high-level speaker-specific but utterance-independent vocal features from his/her reference audios. When the microphone is recording, the DNN generates a shadow sound to cancel the target voice in real-time. Moreover, we modulate the audible shadow sound onto an ultrasound frequency, making it inaudible for humans. By leveraging the non-linearity of the microphone circuit, the microphone can accurately decode the shadow sound for target voice cancellation. We implement and evaluate NEC comprehensively with 8 smartphone microphones in different settings. The results show that NEC effectively mutes the target speaker at a microphone without interfering with other users' normal conversations. △ Less

Submitted 12 July, 2022; originally announced July 2022.

Comments: 12 pages

arXiv:2207.04596 [pdf, other]

Frequency-Angle Two-Dimensional Reflection Coefficient Modeling Based on Terahertz Channel Measurement

Authors: Zhaowei Chang, Jianhua Zhang, Pan Tang, Lei Tian, Li Yu, Guangyi Liu, Liang Xia

Abstract: Terahertz (THz) channel propagation characteristics are vital for the design, evaluation, and optimization for THz communication systems. Moreover, reflection plays a significant role in channel propagation. In this letter, the reflection coefficient of the THz channel is researched based on extensive measurement campaigns. Firstly, we set up the THz channel sounder from 220 to 320 GHz with the in… ▽ More Terahertz (THz) channel propagation characteristics are vital for the design, evaluation, and optimization for THz communication systems. Moreover, reflection plays a significant role in channel propagation. In this letter, the reflection coefficient of the THz channel is researched based on extensive measurement campaigns. Firstly, we set up the THz channel sounder from 220 to 320 GHz with the incident angle ranging from 10° to 80°. Based on the measured propagation loss, the reflection coefficients of five building materials, i.e., glass, tile, aluminium alloy, board, and plasterboard, are calculated separately for frequencies and incident angles. It is found that the lack of THz relative parameters leads to the Fresnel model of non-metallic materials can not fit the measured data well. Thus, we propose a frequency-angle two-dimensional reflection coefficient model by modifying the Fresnel model with the Lorenz and Drude model. The proposed model characterizes the frequency and incident angle for reflection coefficients and shows low root-mean-square error with the measured data. Generally, these results are useful for modeling THz channels. △ Less

Submitted 10 July, 2022; originally announced July 2022.

arXiv:2205.14496 [pdf, other]

SuperVoice: Text-Independent Speaker Verification Using Ultrasound Energy in Human Speech

Authors: Hanqing Guo, Qiben Yan, Nikolay Ivanov, Ying Zhu, Li Xiao, Eric J. Hunter

Abstract: Voice-activated systems are integrated into a variety of desktop, mobile, and Internet-of-Things (IoT) devices. However, voice spoofing attacks, such as impersonation and replay attacks, in which malicious attackers synthesize the voice of a victim or simply replay it, have brought growing security concerns. Existing speaker verification techniques distinguish individual speakers via the spectrogr… ▽ More Voice-activated systems are integrated into a variety of desktop, mobile, and Internet-of-Things (IoT) devices. However, voice spoofing attacks, such as impersonation and replay attacks, in which malicious attackers synthesize the voice of a victim or simply replay it, have brought growing security concerns. Existing speaker verification techniques distinguish individual speakers via the spectrographic features extracted from an audible frequency range of voice commands. However, they often have high error rates and/or long delays. In this paper, we explore a new direction of human voice research by scrutinizing the unique characteristics of human speech at the ultrasound frequency band. Our research indicates that the high-frequency ultrasound components (e.g. speech fricatives) from 20 to 48 kHz can significantly enhance the security and accuracy of speaker verification. We propose a speaker verification system, SUPERVOICE that uses a two-stream DNN architecture with a feature fusion mechanism to generate distinctive speaker models. To test the system, we create a speech dataset with 12 hours of audio (8,950 voice samples) from 127 participants. In addition, we create a second spoofed voice dataset to evaluate its security. In order to balance between controlled recordings and real-world applications, the audio recordings are collected from two quiet rooms by 8 different recording devices, including 7 smartphones and an ultrasound microphone. Our evaluation shows that SUPERVOICE achieves 0.58% equal error rate in the speaker verification task, it only takes 120 ms for testing an incoming utterance, outperforming all existing speaker verification systems. Moreover, within 91 ms processing time, SUPERVOICE achieves 0% equal error rate in detecting replay attacks launched by 5 different loudspeakers. △ Less

Submitted 28 May, 2022; originally announced May 2022.

arXiv:2204.02839 [pdf]

CCAT-NET: A Novel Transformer Based Semi-supervised Framework for Covid-19 Lung Lesion Segmentation

Authors: Mingyang Liu, Li Xiao, Huiqin Jiang, Qing He

Abstract: The spread of the novel coronavirus disease 2019 (COVID-19) has claimed millions of lives. Automatic segmentation of lesions from CT images can assist doctors with screening, treatment, and monitoring. However, accurate segmentation of lesions from CT images can be very challenging due to data and model limitations. Recently, Transformer-based networks have attracted a lot of attention in the area… ▽ More The spread of the novel coronavirus disease 2019 (COVID-19) has claimed millions of lives. Automatic segmentation of lesions from CT images can assist doctors with screening, treatment, and monitoring. However, accurate segmentation of lesions from CT images can be very challenging due to data and model limitations. Recently, Transformer-based networks have attracted a lot of attention in the area of computer vision, as Transformer outperforms CNN at a bunch of tasks. In this work, we propose a novel network structure that combines CNN and Transformer for the segmentation of COVID-19 lesions. We further propose an efficient semi-supervised learning framework to address the shortage of labeled data. Extensive experiments showed that our proposed network outperforms most existing networks and the semi-supervised learning framework can outperform the base network by 3.0% and 8.2% in terms of Dice coefficient and sensitivity. △ Less

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2203.02100 [pdf, other]

Learning Incrementally to Segment Multiple Organs in a CT Image

Authors: Pengbo Liu, Xia Wang, Mengsi Fan, Hongli Pan, Minmin Yin, Xiaohong Zhu, Dandan Du, Xiaoying Zhao, Li Xiao, Lian Ding, Xingwang Wu, S. Kevin Zhou

Abstract: There exists a large number of datasets for organ segmentation, which are partially annotated and sequentially constructed. A typical dataset is constructed at a certain time by curating medical images and annotating the organs of interest. In other words, new datasets with annotations of new organ categories are built over time. To unleash the potential behind these partially labeled, sequentiall… ▽ More There exists a large number of datasets for organ segmentation, which are partially annotated and sequentially constructed. A typical dataset is constructed at a certain time by curating medical images and annotating the organs of interest. In other words, new datasets with annotations of new organ categories are built over time. To unleash the potential behind these partially labeled, sequentially-constructed datasets, we propose to incrementally learn a multi-organ segmentation model. In each incremental learning (IL) stage, we lose the access to previous data and annotations, whose knowledge is assumingly captured by the current model, and gain the access to a new dataset with annotations of new organ categories, from which we learn to update the organ segmentation model to include the new organs. While IL is notorious for its `catastrophic forgetting' weakness in the context of natural image analysis, we experimentally discover that such a weakness mostly disappears for CT multi-organ segmentation. To further stabilize the model performance across the IL stages, we introduce a light memory module and some loss functions to restrain the representation of different categories in feature space, aggregating feature representation of the same class and separating feature representation of different classes. Extensive experiments on five open-sourced datasets are conducted to illustrate the effectiveness of our method. △ Less

Submitted 3 March, 2022; originally announced March 2022.

Comments: arXiv admin note: text overlap with arXiv:2103.04526

arXiv:2202.08195 [pdf, other]

doi 10.1016/j.media.2023.102933

Nuclei Segmentation with Point Annotations from Pathology Images via Self-Supervised Learning and Co-Training

Authors: Yi Lin, Zhiyong Qu, Hao Chen, Zhongke Gao, Yuexiang Li, Lili Xia, Kai Ma, Yefeng Zheng, Kwang-Ting Cheng

Abstract: Nuclei segmentation is a crucial task for whole slide image analysis in digital pathology. Generally, the segmentation performance of fully-supervised learning heavily depends on the amount and quality of the annotated data. However, it is time-consuming and expensive for professional pathologists to provide accurate pixel-level ground truth, while it is much easier to get coarse labels such as po… ▽ More Nuclei segmentation is a crucial task for whole slide image analysis in digital pathology. Generally, the segmentation performance of fully-supervised learning heavily depends on the amount and quality of the annotated data. However, it is time-consuming and expensive for professional pathologists to provide accurate pixel-level ground truth, while it is much easier to get coarse labels such as point annotations. In this paper, we propose a weakly-supervised learning method for nuclei segmentation that only requires point annotations for training. First, coarse pixel-level labels are derived from the point annotations based on the Voronoi diagram and the k-means clustering method to avoid overfitting. Second, a co-training strategy with an exponential moving average method is designed to refine the incomplete supervision of the coarse labels. Third, a self-supervised visual representation learning method is tailored for nuclei segmentation of pathology images that transforms the hematoxylin component images into the H&E stained images to gain better understanding of the relationship between the nuclei and cytoplasm. We comprehensively evaluate the proposed method using two public datasets. Both visual and quantitative results demonstrate the superiority of our method to the state-of-the-art methods, and its competitive performance compared to the fully-supervised methods. Code: https://github.com/hust-linyi/SC-Net △ Less

Submitted 17 August, 2023; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: Accepted by MedIA

arXiv:2202.07632 [pdf, other]

doi 10.1109/INFOCOMWKSHPS54753.2022.9797984

Wireless Resource Management in Intelligent Semantic Communication Networks

Authors: Le Xia, Yao Sun, Xiaoqian Li, Gang Feng, Muhammad Ali Imran

Abstract: The prosperity of artificial intelligence (AI) has laid a promising paradigm of communication system, i.e., intelligent semantic communication (ISC), where semantic contents, instead of traditional bit sequences, are coded by AI models for efficient communication. Due to the unique demand of background knowledge for semantic recovery, wireless resource management faces new challenges in ISC. In th… ▽ More The prosperity of artificial intelligence (AI) has laid a promising paradigm of communication system, i.e., intelligent semantic communication (ISC), where semantic contents, instead of traditional bit sequences, are coded by AI models for efficient communication. Due to the unique demand of background knowledge for semantic recovery, wireless resource management faces new challenges in ISC. In this paper, we address the user association (UA) and bandwidth allocation (BA) problems in an ISC-enabled heterogeneous network (ISC-HetNet). We first introduce the auxiliary knowledge base (KB) into the system model, and develop a new performance metric for the ISC-HetNet, named system throughput in message (STM). Joint optimization of UA and BA is then formulated with the aim of STM maximization subject to KB matching and wireless bandwidth constraints. To this end, we propose a two-stage solution, including a stochastic programming method in the first stage to obtain a deterministic objective with semantic confidence, and a heuristic algorithm in the second stage to reach the optimality of UA and BA. Numerical results show great superiority and reliability of our proposed solution on the STM performance when compared with two baseline algorithms. △ Less

Submitted 15 February, 2022; originally announced February 2022.

Comments: This work has been accepted for publication by 2022 IEEE International Conference on Computer Communications (INFOCOM). Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2112.15011 [pdf, other]

Radiology Report Generation with a Learned Knowledge Base and Multi-modal Alignment

Authors: Shuxin Yang, Xian Wu, Shen Ge, S. Kevin Zhou, Li Xiao

Abstract: In clinics, a radiology report is crucial for guiding a patient's treatment. However, writing radiology reports is a heavy burden for radiologists. To this end, we present an automatic, multi-modal approach for report generation from a chest x-ray. Our approach, motivated by the observation that the descriptions in radiology reports are highly correlated with specific information of the x-ray imag… ▽ More In clinics, a radiology report is crucial for guiding a patient's treatment. However, writing radiology reports is a heavy burden for radiologists. To this end, we present an automatic, multi-modal approach for report generation from a chest x-ray. Our approach, motivated by the observation that the descriptions in radiology reports are highly correlated with specific information of the x-ray images, features two distinct modules: (i) Learned knowledge base: To absorb the knowledge embedded in the radiology reports, we build a knowledge base that can automatically distil and restore medical knowledge from textual embedding without manual labour; (ii) Multi-modal alignment: to promote the semantic alignment among reports, disease labels, and images, we explicitly utilize textual embedding to guide the learning of the visual feature space. We evaluate the performance of the proposed model using metrics from both natural language generation and clinic efficacy on the public IU-Xray and MIMIC-CXR datasets. Our ablation study shows that each module contributes to improving the quality of generated reports. Furthermore, with the assistance of both modules, our approach outperforms state-of-the-art methods over almost all the metrics. △ Less

Submitted 1 June, 2022; v1 submitted 30 December, 2021; originally announced December 2021.

arXiv:2112.15009 [pdf, ps, other]

doi 10.1016/j.media.2022.102510

Knowledge Matters: Radiology Report Generation with General and Specific Knowledge

Authors: Shuxin Yang, Xian Wu, Shen Ge, Shaohua Kevin Zhou, Li Xiao

Abstract: Automatic radiology report generation is critical in clinics which can relieve experienced radiologists from the heavy workload and remind inexperienced radiologists of misdiagnosis or missed diagnose. Existing approaches mainly formulate radiology report generation as an image captioning task and adopt the encoder-decoder framework. However, in the medical domain, such pure data-driven approaches… ▽ More Automatic radiology report generation is critical in clinics which can relieve experienced radiologists from the heavy workload and remind inexperienced radiologists of misdiagnosis or missed diagnose. Existing approaches mainly formulate radiology report generation as an image captioning task and adopt the encoder-decoder framework. However, in the medical domain, such pure data-driven approaches suffer from the following problems: 1) visual and textual bias problem; 2) lack of expert knowledge. In this paper, we propose a knowledge-enhanced radiology report generation approach introduces two types of medical knowledge: 1) General knowledge, which is input independent and provides the broad knowledge for report generation; 2) Specific knowledge, which is input dependent and provides the fine-grained knowledge for report generation. To fully utilize both the general and specific knowledge, we also propose a knowledge-enhanced multi-head attention mechanism. By merging the visual features of the radiology image with general knowledge and specific knowledge, the proposed model can improve the quality of generated reports. Experimental results on two publicly available datasets IU-Xray and MIMIC-CXR show that the proposed knowledge enhanced approach outperforms state-of-the-art image captioning based methods. Ablation studies also demonstrate that both general and specific knowledge can help to improve the performance of radiology report generation. △ Less

Submitted 6 November, 2022; v1 submitted 30 December, 2021; originally announced December 2021.

Comments: Medical Image Analysis

arXiv:2110.11591 [pdf, ps, other]

doi 10.1109/TGRS.2022.3143156

Model Inspired Autoencoder for Unsupervised Hyperspectral Image Super-Resolution

Authors: Jianjun Liu, Zebin Wu, Liang Xiao, Xiao-Jun Wu

Abstract: This paper focuses on hyperspectral image (HSI) super-resolution that aims to fuse a low-spatial-resolution HSI and a high-spatial-resolution multispectral image to form a high-spatial-resolution HSI (HR-HSI). Existing deep learning-based approaches are mostly supervised that rely on a large number of labeled training samples, which is unrealistic. The commonly used model-based approaches are unsu… ▽ More This paper focuses on hyperspectral image (HSI) super-resolution that aims to fuse a low-spatial-resolution HSI and a high-spatial-resolution multispectral image to form a high-spatial-resolution HSI (HR-HSI). Existing deep learning-based approaches are mostly supervised that rely on a large number of labeled training samples, which is unrealistic. The commonly used model-based approaches are unsupervised and flexible but rely on hand-craft priors. Inspired by the specific properties of model, we make the first attempt to design a model inspired deep network for HSI super-resolution in an unsupervised manner. This approach consists of an implicit autoencoder network built on the target HR-HSI that treats each pixel as an individual sample. The nonnegative matrix factorization (NMF) of the target HR-HSI is integrated into the autoencoder network, where the two NMF parts, spectral and spatial matrices, are treated as decoder parameters and hidden outputs respectively. In the encoding stage, we present a pixel-wise fusion model to estimate hidden outputs directly, and then reformulate and unfold the model's algorithm to form the encoder network. With the specific architecture, the proposed network is similar to a manifold prior-based model, and can be trained patch by patch rather than the entire image. Moreover, we propose an additional unsupervised network to estimate the point spread function and spectral response function. Experimental results conducted on both synthetic and real datasets demonstrate the effectiveness of the proposed approach. △ Less

Submitted 22 October, 2021; originally announced October 2021.

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 2022, vol 60

arXiv:2107.05464 [pdf, other]

IGrow: A Smart Agriculture Solution to Autonomous Greenhouse Control

Authors: Xiaoyan Cao, Yao Yao, Lanqing Li, Wanpeng Zhang, Zhicheng An, Zhong Zhang, Li Xiao, Shihui Guo, Xiaoyu Cao, Meihong Wu, Dijun Luo

Abstract: Agriculture is the foundation of human civilization. However, the rapid increase of the global population poses a challenge on this cornerstone by demanding more food. Modern autonomous greenhouses, equipped with sensors and actuators, provide a promising solution to the problem by empowering precise control for high-efficient food production. However, the optimal control of autonomous greenhouses… ▽ More Agriculture is the foundation of human civilization. However, the rapid increase of the global population poses a challenge on this cornerstone by demanding more food. Modern autonomous greenhouses, equipped with sensors and actuators, provide a promising solution to the problem by empowering precise control for high-efficient food production. However, the optimal control of autonomous greenhouses is challenging, requiring decision-making based on high-dimensional sensory data, and the scaling of production is limited by the scarcity of labor capable of handling this task. With the advances of artificial intelligence (AI), the internet of things (IoT), and cloud computing technologies, we are hopeful to provide a solution to automate and smarten greenhouse control to address the above challenges. In this paper, we propose a smart agriculture solution named iGrow, for autonomous greenhouse control (AGC): (1) for the first time, we formulate the AGC problem as a Markov decision process (MDP) optimization problem; (2) we design a neural network-based simulator incorporated with the incremental mechanism to simulate the complete planting process of an autonomous greenhouse, which provides a testbed for the optimization of control strategies; (3) we propose a closed-loop bi-level optimization algorithm, which can dynamically re-optimize the greenhouse control strategy with newly observed data during real-world production. We not only conduct simulation experiments but also deploy iGrow in real scenarios, and experimental results demonstrate the effectiveness and superiority of iGrow in autonomous greenhouse simulation and optimal control. Particularly, compelling results from the tomato pilot project in real autonomous greenhouses show that our solution significantly increases crop yield (+10.15\%) and net profit (+92.70\%) with statistical significance compared to planting experts. △ Less

Submitted 14 March, 2022; v1 submitted 6 July, 2021; originally announced July 2021.

Comments: 9 pages, 5 figures, 2 tables, accepted by AAAI 2022

Showing 1–50 of 79 results for author: Xiao, L