-
Adaptive Self-Supervised Consistency-Guided Diffusion Model for Accelerated MRI Reconstruction
Authors:
Mojtaba Safari,
Zach Eidex,
Shaoyan Pan,
Richard L. J. Qiu,
Xiaofeng Yang
Abstract:
Purpose: To propose a self-supervised deep learning-based compressed sensing MRI (DL-based CS-MRI) method named "Adaptive Self-Supervised Consistency Guided Diffusion Model (ASSCGD)" to accelerate data acquisition without requiring fully sampled datasets. Materials and Methods: We used the fastMRI multi-coil brain axial T2-weighted (T2-w) dataset from 1,376 cases and single-coil brain quantitative magnetization prepared 2 rapid acquisition gradient echoes (MP2RAGE) T1 maps from 318 cases to train and test our model. Robustness against domain shift was evaluated using two out-of-distribution (OOD) datasets: a multi-coil brain axial postcontrast T1-weighted (T1c) dataset from 50 cases and an axial T1-weighted (T1-w) dataset from 50 patients. Data were retrospectively subsampled at acceleration rates R in {2x, 4x, 8x}. ASSCGD partitions a random sampling pattern into two disjoint sets, ensuring data consistency during training. We compared our method with ReconFormer Transformer and SS-MRI, assessing performance using normalized mean squared error (NMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM). Statistical tests included one-way analysis of variance (ANOVA) and multi-comparison Tukey's Honestly Significant Difference (HSD) tests. Results: ASSCGD preserved fine structures and brain abnormalities visually better than comparative methods at R = 8x for both multi-coil and single-coil datasets. It achieved the lowest NMSE at R in {4x, 8x}, and the highest PSNR and SSIM values at all acceleration rates for the multi-coil dataset. Similar trends were observed for the single-coil dataset, though SSIM values were comparable to ReconFormer at R in {2x, 8x}. These results were further confirmed by the voxel-wise correlation scatter plots. OOD results showed significant (p << 10^-5) improvements in undersampled image quality after reconstruction.
Submitted 21 June, 2024;
originally announced June 2024.
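The ASSCGD abstract above mentions partitioning a random sampling pattern into two disjoint sets. A minimal sketch of that idea follows, under stated assumptions: the mask is one-dimensional (one flag per phase-encoding line), the split is uniformly random, and the function name `partition_mask`, the `fraction` parameter, and the example pattern are hypothetical and not taken from the paper.

```python
import random

def partition_mask(mask, fraction=0.5, seed=0):
    """Split the acquired positions of a 0/1 sampling mask into two disjoint masks.

    mask: list of 0/1 flags, one per k-space line (1 = acquired).
    fraction: share of acquired lines assigned to the first subset (assumed value).
    """
    rng = random.Random(seed)
    sampled = [i for i, m in enumerate(mask) if m]
    rng.shuffle(sampled)
    cut = int(round(fraction * len(sampled)))
    first = set(sampled[:cut])
    # Lines in `first` go to mask_a; every other acquired line goes to mask_b.
    mask_a = [1 if i in first else 0 for i in range(len(mask))]
    mask_b = [1 if (m and i not in first) else 0 for i, m in enumerate(mask)]
    return mask_a, mask_b

# Hypothetical sparse pattern over 16 phase-encoding lines.
mask = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0]
mask_a, mask_b = partition_mask(mask)
```

In self-supervised reconstruction schemes of this general kind, one subset is typically fed to the network while the other is held out for the loss; whether ASSCGD uses the two sets in exactly this way is not stated in the abstract.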
-
Mammo-CLIP: Leveraging Contrastive Language-Image Pre-training (CLIP) for Enhanced Breast Cancer Diagnosis with Multi-view Mammography
Authors:
Xuxin Chen,
Yuheng Li,
Mingzhe Hu,
Ella Salari,
Xiaoqian Chen,
Richard L. J. Qiu,
Bin Zheng,
Xiaofeng Yang
Abstract:
Although fusion of information from multiple views of mammograms plays an important role in increasing the accuracy of breast cancer detection, developing multi-view mammogram-based computer-aided diagnosis (CAD) schemes still faces challenges, and no such CAD schemes have been used in clinical practice. To overcome the challenges, we investigate a new approach based on Contrastive Language-Image Pre-training (CLIP), which has sparked interest across various medical imaging tasks. By solving the challenges in (1) effectively adapting the single-view CLIP for multi-view feature fusion and (2) efficiently fine-tuning this parameter-dense model with limited samples and computational resources, we introduce Mammo-CLIP, the first multi-modal framework to process multi-view mammograms and corresponding simple texts. Mammo-CLIP uses an early feature fusion strategy to learn multi-view relationships in four mammograms acquired from the CC and MLO views of the left and right breasts. To enhance learning efficiency, plug-and-play adapters are added into the CLIP image and text encoders for fine-tuning, limiting updates to about 1% of the parameters. For framework evaluation, we assembled two datasets retrospectively. The first dataset, comprising 470 malignant and 479 benign cases, was used for few-shot fine-tuning and internal evaluation of the proposed Mammo-CLIP via 5-fold cross-validation. The second dataset, including 60 malignant and 294 benign cases, was used to test the generalizability of Mammo-CLIP. Study results show that Mammo-CLIP outperforms the state-of-the-art cross-view transformer in AUC (0.841 vs. 0.817, 0.837 vs. 0.807) on both datasets. It also surpasses two previous CLIP-based methods by 20.3% and 14.3%. This study highlights the potential of applying fine-tuned vision-language models for developing next-generation, image-text-based CAD schemes of breast cancer.
Submitted 24 April, 2024;
originally announced April 2024.
-
Ray-driven Spectral CT Reconstruction Based on Neural Base-Material Fields
Authors:
Ligen Shi,
Chang Liu,
** Yang,
Jun Qiu,
Xing Zhao
Abstract:
In spectral CT reconstruction, the basis materials decomposition involves solving a large-scale nonlinear system of integral equations, which is highly ill-posed mathematically. This paper proposes a model that parameterizes the attenuation coefficients of the object using a neural field representation, thereby avoiding the complex calculations of pixel-driven projection coefficient matrices during the discretization process of line integrals. It introduces a lightweight discretization method for line integrals based on a ray-driven neural field, enhancing the accuracy of the integral approximation during the discretization process. The basis materials are represented as continuous vector-valued implicit functions to establish a neural field parameterization model for the basis materials. The auto-differentiation framework of deep learning is then used to solve the implicit continuous function of the neural base-material fields. This method is not limited by the spatial resolution of reconstructed images, and the network has compact and regular properties. Experimental validation shows that our method performs exceptionally well in addressing the spectral CT reconstruction. Additionally, it fulfils the requirements for the generation of high-resolution reconstruction images.
Submitted 10 April, 2024;
originally announced April 2024.
-
Applying Large Language Models to Power Systems: Potential Security Threats
Authors:
Jiaqi Ruan,
Gaoqi Liang,
Huan Zhao,
Guolong Liu,
Xianzhuo Sun,
**g Qiu,
Zhao Xu,
Fushuan Wen,
Zhao Yang Dong
Abstract:
Applying large language models (LLMs) to modern power systems presents a promising avenue for enhancing decision-making and operational efficiency. However, this action may also incur potential security threats, which have not been fully recognized so far. To this end, this article analyzes potential threats incurred by applying LLMs to power systems, emphasizing the need for urgent research and development of countermeasures.
Submitted 24 January, 2024; v1 submitted 22 November, 2023;
originally announced November 2023.
-
Image-Domain Material Decomposition for Dual-energy CT using Unsupervised Learning with Data-fidelity Loss
Authors:
Junbo Peng,
Chih-Wei Chang,
Huiqiao Xie,
Richard L. J. Qiu,
Justin Roper,
Tonghe Wang,
Beth Bradshaw,
Xiangyang Tang,
Xiaofeng Yang
Abstract:
Background: Dual-energy CT (DECT) and material decomposition play vital roles in quantitative medical imaging. However, the decomposition process may suffer from significant noise amplification, leading to severely degraded image signal-to-noise ratios (SNRs). While existing iterative algorithms perform noise suppression using different image priors, these heuristic image priors cannot accurately represent the features of the target image manifold. Although deep learning-based decomposition methods have been reported, these methods are in the supervised-learning framework requiring paired data for training, which is not readily available in clinical settings.
Purpose: This work aims to develop an unsupervised-learning framework with data-measurement consistency for image-domain material decomposition in DECT.
Submitted 17 November, 2023;
originally announced November 2023.
-
Improving Large-scale Deep Biasing with Phoneme Features and Text-only Data in Streaming Transducer
Authors:
** Qiu,
Lu Huang,
Boyu Li,
Jun Zhang,
Lu Lu,
Zejun Ma
Abstract:
Deep biasing for the Transducer can improve the recognition performance of rare words or contextual entities, which is essential in practical applications, especially for streaming Automatic Speech Recognition (ASR). However, deep biasing with large-scale rare words remains challenging, as the performance drops significantly when more distractors exist and there are words with similar grapheme sequences in the bias list. In this paper, we combine the phoneme and textual information of rare words in Transducers to distinguish words with similar pronunciation or spelling. Moreover, the introduction of training with text-only data containing more rare words benefits large-scale deep biasing. The experiments on the LibriSpeech corpus demonstrate that the proposed method achieves state-of-the-art performance on rare word error rate for different scales and levels of bias lists.
Submitted 15 November, 2023;
originally announced November 2023.
-
VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence
Authors:
Jianing Qiu,
Jian Wu,
Hao Wei,
Peilun Shi,
Minqing Zhang,
Yunyun Sun,
Lin Li,
Hanruo Liu,
Hongyi Liu,
Simeng Hou,
Yuyang Zhao,
Xuehui Shi,
Junfang Xian,
Xiaoxia Qu,
Sirui Zhu,
Lijie Pan,
Xiaoniao Chen,
Xiaojia Zhang,
Shuai Jiang,
Kebing Wang,
Chenlong Yang,
Mingqiang Chen,
Sujie Fan,
Jianhua Hu,
Aiguo Lv
, et al. (17 additional authors not shown)
Abstract:
We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassification of disease phenotype, and systemic biomarker and disease prediction, with each application enhanced with expert-level intelligence and accuracy. The generalist intelligence of VisionFM outperformed ophthalmologists with basic and intermediate levels in jointly diagnosing 12 common ophthalmic diseases. Evaluated on a new large-scale ophthalmic disease diagnosis benchmark database, as well as a new large-scale segmentation and detection benchmark database, VisionFM outperformed strong baseline deep neural networks. The ophthalmic image representations learned by VisionFM exhibited noteworthy explainability, and demonstrated strong generalizability to new ophthalmic modalities, disease spectrum, and imaging devices. As a foundation model, VisionFM has a large capacity to learn from diverse ophthalmic imaging data and disparate datasets. To be commensurate with this capacity, in addition to the real data used for pre-training, we also generated and leveraged synthetic ophthalmic imaging data. Experimental results revealed that synthetic data that passed visual Turing tests, can also enhance the representation learning capability of VisionFM, leading to substantial performance gains on downstream ophthalmic AI tasks. Beyond the ophthalmic AI applications developed, validated, and demonstrated in this work, substantial further applications can be achieved in an efficient and cost-effective manner using VisionFM as the foundation.
Submitted 7 October, 2023;
originally announced October 2023.
-
Multi-feature concatenation and multi-classifier stacking: an interpretable and generalizable machine learning method for MDD discrimination with rsfMRI
Authors:
Yunsong Luo,
Wenyu Chen,
Ling Zhan,
Jiang Qiu,
Tao Jia
Abstract:
Major depressive disorder is a serious and heterogeneous psychiatric disorder that needs accurate diagnosis. Resting-state functional MRI (rsfMRI), which captures multiple perspectives on brain structure, function, and connectivity, is increasingly applied in the diagnosis and pathological research of mental diseases. Different machine learning algorithms are then developed to exploit the rich information in rsfMRI and discriminate MDD patients from normal controls. Despite recent advances reported, the discrimination accuracy has room for further improvement. The generalizability and interpretability of the method are not sufficiently addressed either. Here, we propose a machine learning method (MFMC) for MDD discrimination by concatenating multiple features and stacking multiple classifiers. MFMC is tested on the REST-meta-MDD data set that contains 2428 subjects collected from 25 different sites. MFMC yields 96.9% MDD discrimination accuracy, demonstrating a significant improvement over existing methods. In addition, the generalizability of MFMC is validated by the good performance when the training and testing subjects are from independent sites. The use of XGBoost as the meta classifier allows us to probe the decision process of MFMC. We identify 13 feature values related to 9 brain regions including the posterior cingulate gyrus, superior frontal gyrus orbital part, and angular gyrus, which contribute most to the classification and also demonstrate significant differences at the group level. The use of these 13 feature values alone can reach 87% of MFMC's full performance when taking all feature values. These features may serve as clinically useful diagnostic and prognostic biomarkers for mental disorders in the future.
Submitted 18 August, 2023;
originally announced August 2023.
-
Synthetic white balancing for intra-operative hyperspectral imaging
Authors:
Anisha Bahl,
Conor C. Horgan,
Mirek Janatka,
Oscar J. MacCormac,
Philip Noonan,
Yi**g Xie,
Jianrong Qiu,
Nicola Cavalcanti,
Philipp Fürnstahl,
Michael Ebner,
Mads S. Bergholt,
Jonathan Shapey,
Tom Vercauteren
Abstract:
Hyperspectral imaging shows promise for surgical applications to non-invasively provide spatially-resolved, spectral information. For calibration purposes, a white reference image of a highly-reflective Lambertian surface should be obtained under the same imaging conditions. Standard white references are not sterilizable, and so are unsuitable for surgical environments. We demonstrate the necessity for in situ white references and address this by proposing a novel, sterile, synthetic reference construction algorithm. The use of references obtained at different distances and lighting conditions to the subject was examined. Spectral and color reconstructions were compared with standard measurements qualitatively and quantitatively, using $ΔE$ and normalised RMSE respectively. The algorithm forms a composite image from a video of a standard sterile ruler, whose imperfect reflectivity is compensated for. The reference is modelled as the product of independent spatial and spectral components, and a scalar factor accounting for gain, exposure, and light intensity. Evaluation of synthetic references against ideal but non-sterile references is performed using the same metrics alongside pixel-by-pixel errors. Finally, intraoperative integration is assessed through cadaveric experiments. Improper white balancing leads to increases in all quantitative and qualitative errors. Synthetic references achieve median pixel-by-pixel errors lower than 6.5% and produce similar reconstructions and errors to an ideal reference. The algorithm integrated well into surgical workflow, achieving median pixel-by-pixel errors of 4.77%, while maintaining good spectral and color reconstruction.
Submitted 24 July, 2023;
originally announced July 2023.
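The synthetic white balancing abstract above describes the reference as a product of independent spatial and spectral components and a scalar factor. The sketch below illustrates only that separable form and the usual division of raw data by a white reference. It assumes no dark-frame subtraction, and the function names and all numbers are hypothetical; it is not the authors' construction algorithm.

```python
def synthetic_white(spatial, spectral, scale):
    """Build W[p][b] = scale * spatial[p] * spectral[b] (separable reference model)."""
    return [[scale * s * r for r in spectral] for s in spatial]

def white_balance(raw, white, eps=1e-12):
    """Divide each raw sample by the matching white-reference sample."""
    return [[x / max(w, eps) for x, w in zip(raw_row, white_row)]
            for raw_row, white_row in zip(raw, white)]

# Hypothetical values: 2 pixels, 3 spectral bands.
spatial = [1.0, 0.5]          # relative illumination per pixel
spectral = [0.8, 1.0, 0.6]    # relative response per band
scale = 2.0                   # stands in for gain, exposure, and light intensity
white = synthetic_white(spatial, spectral, scale)

# A surface with 50% reflectance everywhere, seen under the same conditions.
raw = [[0.5 * w for w in row] for row in white]
reflectance = white_balance(raw, white)
```

With a separable model, only one spatial profile, one spectral profile, and one scalar need to be estimated, rather than a full value per pixel and band.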
-
Deep Reinforcement Learning Based System for Intraoperative Hyperspectral Video Autofocusing
Authors:
Charlie Budd,
Jianrong Qiu,
Oscar MacCormac,
Martin Huber,
Christopher Mower,
Mirek Janatka,
Théo Trotouin,
Jonathan Shapey,
Mads S. Bergholt,
Tom Vercauteren
Abstract:
Hyperspectral imaging (HSI) captures a greater level of spectral detail than traditional optical imaging, making it a potentially valuable intraoperative tool when precise tissue differentiation is essential. Hardware limitations of current optical systems used for handheld real-time video HSI result in a limited focal depth, thereby posing usability issues for integration of the technology into the operating room. This work integrates a focus-tunable liquid lens into a video HSI exoscope, and proposes novel video autofocusing methods based on deep reinforcement learning. A first-of-its-kind robotic focal-time scan was performed to create a realistic and reproducible testing dataset. We benchmarked our proposed autofocus algorithm against traditional policies, and found our novel approach to perform significantly ($p<0.05$) better than traditional techniques ($0.070\pm.098$ mean absolute focal error compared to $0.146\pm.148$). In addition, we performed a blinded usability trial by having two neurosurgeons compare the system with different autofocus policies, and found our novel approach to be the most favourable, making our system a desirable addition for intraoperative HSI.
Submitted 21 July, 2023;
originally announced July 2023.
-
Synthetic CT Generation from MRI using 3D Transformer-based Denoising Diffusion Model
Authors:
Shaoyan Pan,
Elham Abouei,
Jacob Wynne,
Tonghe Wang,
Richard L. J. Qiu,
Yuheng Li,
Chih-Wei Chang,
Junbo Peng,
Justin Roper,
Pretesh Patel,
David S. Yu,
Hui Mao,
Xiaofeng Yang
Abstract:
Magnetic resonance imaging (MRI)-based synthetic computed tomography (sCT) simplifies radiation therapy treatment planning by eliminating the need for CT simulation and error-prone image registration, ultimately reducing patient radiation dose and setup uncertainty. We propose an MRI-to-CT transformer-based denoising diffusion probabilistic model (MC-DDPM) to transform MRI into high-quality sCT to facilitate radiation treatment planning. MC-DDPM implements diffusion processes with a shifted-window transformer network to generate sCT from MRI. The proposed model consists of two processes: a forward process, which adds Gaussian noise to real CT scans, and a reverse process, in which a shifted-window transformer V-net (Swin-Vnet) denoises the noisy CT scans conditioned on the MRI from the same patient to produce noise-free CT scans. With an optimally trained Swin-Vnet, the reverse diffusion process was used to generate sCT scans matching MRI anatomy. We evaluated the proposed method by generating sCT from MRI on a brain dataset and a prostate dataset. Quantitative evaluation was performed using the mean absolute error (MAE) of Hounsfield unit (HU), peak signal-to-noise ratio (PSNR), multi-scale structural similarity index (MS-SSIM), and normalized cross correlation (NCC) between ground truth CTs and sCTs. MC-DDPM generated brain sCTs with state-of-the-art quantitative results with MAE 43.317 HU, PSNR 27.046 dB, SSIM 0.965, and NCC 0.983. For the prostate dataset, MC-DDPM achieved MAE 59.953 HU, PSNR 26.920 dB, SSIM 0.849, and NCC 0.948. In conclusion, we have developed and validated a novel approach for generating CT images from routine MRIs using a transformer-based DDPM. This model effectively captures the complex relationship between CT and MRI images, allowing for robust and high-quality synthetic CT (sCT) images to be generated in minutes.
Submitted 30 May, 2023;
originally announced May 2023.
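The MC-DDPM abstract above describes a forward process that adds Gaussian noise to real CT scans. The sketch below shows the standard closed-form DDPM forward step, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, on a flat list of intensities. The linear beta schedule, the 1000-step horizon, the function names, and the sample values are assumptions for illustration; the abstract does not specify the schedule MC-DDPM uses.

```python
import math
import random

def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t under a linear schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod

def forward_noise(x0, t, rng, num_steps=1000):
    """Draw x_t from q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    abar = alpha_bar(t, num_steps)
    return [math.sqrt(abar) * v + math.sqrt(1.0 - abar) * rng.gauss(0.0, 1.0)
            for v in x0]

# Hypothetical CT intensities, already normalized to [-1, 1].
x0 = [-1.0, -0.2, 0.1, 0.7]
rng = random.Random(0)
x_mid = forward_noise(x0, 500, rng)    # partially noised sample
x_end = forward_noise(x0, 1000, rng)   # close to pure Gaussian noise
```

The reverse process, in which a network such as the Swin-Vnet predicts and removes the noise conditioned on MRI, is not shown here.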
-
Cross-Shaped Windows Transformer with Self-supervised Pretraining for Clinically Significant Prostate Cancer Detection in Bi-parametric MRI
Authors:
Yuheng Li,
Jacob Wynne,
**g Wang,
Richard L. J. Qiu,
Justin Roper,
Shaoyan Pan,
Ashesh B. Jani,
Tian Liu,
Pretesh R. Patel,
Hui Mao,
Xiaofeng Yang
Abstract:
Biparametric magnetic resonance imaging (bpMRI) has demonstrated promising results in prostate cancer (PCa) detection using convolutional neural networks (CNNs). Recently, transformers have achieved competitive performance compared to CNNs in computer vision. Large scale transformers need abundant annotated data for training, which are difficult to obtain in medical imaging. Self-supervised learning (SSL) utilizes unlabeled data to generate meaningful semantic representations without the need for costly annotations, enhancing model performance on tasks with limited labeled data. We introduce a novel end-to-end Cross-Shaped windows (CSwin) transformer UNet model, CSwin UNet, to detect clinically significant prostate cancer (csPCa) in prostate bi-parametric MR imaging (bpMRI) and demonstrate the effectiveness of our proposed self-supervised pre-training framework. Using a large prostate bpMRI dataset with 1500 patients, we first pretrain CSwin transformer using multi-task self-supervised learning to improve data-efficiency and network generalizability. We then finetune using lesion annotations to perform csPCa detection. Five-fold cross validation shows that self-supervised CSwin UNet achieves 0.888 AUC and 0.545 Average Precision (AP), significantly outperforming four comparable models (Swin UNETR, DynUNet, Attention UNet, UNet). Using a separate bpMRI dataset with 158 patients, we evaluate our method robustness to external hold-out data. Self-supervised CSwin UNet achieves 0.79 AUC and 0.45 AP, still outperforming all other comparable methods and demonstrating good generalization to external data.
Submitted 17 March, 2024; v1 submitted 30 April, 2023;
originally announced May 2023.
-
Cycle-guided Denoising Diffusion Probability Model for 3D Cross-modality MRI Synthesis
Authors:
Shaoyan Pan,
Chih-Wei Chang,
Junbo Peng,
Jiahan Zhang,
Richard L. J. Qiu,
Tonghe Wang,
Justin Roper,
Tian Liu,
Hui Mao,
Xiaofeng Yang
Abstract:
This study aims to develop a novel Cycle-guided Denoising Diffusion Probability Model (CG-DDPM) for cross-modality MRI synthesis. The CG-DDPM deploys two DDPMs that condition each other to generate synthetic images from two different MRI pulse sequences. The two DDPMs exchange random latent noise in the reverse processes, which helps to regularize both DDPMs and generate matching images in two modalities. This improves image-to-image translation accuracy. We evaluated the CG-DDPM quantitatively using mean absolute error (MAE), multi-scale structural similarity index measure (MSSIM), and peak signal-to-noise ratio (PSNR), as well as the network synthesis consistency, on the BraTS2020 dataset. Our proposed method showed high accuracy and reliable consistency for MRI synthesis. In addition, we compared the CG-DDPM with several other state-of-the-art networks and demonstrated statistically significant improvements in the image quality of synthetic MRIs. The proposed method enhances the capability of current multimodal MRI synthesis approaches, which could contribute to more accurate diagnosis and better treatment planning for patients by synthesizing additional MRI modalities.
Submitted 28 April, 2023;
originally announced May 2023.
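Two of the metrics named in the CG-DDPM abstract above, MAE and PSNR, have simple textbook definitions. The sketch below computes them for flat lists of intensities, assuming a known intensity range (`data_range`); the sample values are hypothetical, and the paper's exact normalization is not given in the abstract.

```python
import math

def mae(ref, est):
    """Mean absolute error between reference and estimated intensities."""
    return sum(abs(r - e) for r, e in zip(ref, est)) / len(ref)

def psnr(ref, est, data_range):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)

# Hypothetical intensities scaled to [0, 1].
ref = [0.0, 0.5, 1.0, 0.25]
est = [0.1, 0.5, 0.9, 0.25]
error = mae(ref, est)
quality = psnr(ref, est, data_range=1.0)
```

Lower MAE and higher PSNR both indicate a synthetic image closer to the reference; MSSIM additionally compares local structure and is not sketched here.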
-
Automated Cardiovascular Record Retrieval by Multimodal Learning between Electrocardiogram and Clinical Report
Authors:
Jielin Qiu,
Jiacheng Zhu,
Shiqi Liu,
William Han,
**gqi Zhang,
Chao**g Duan,
Michael Rosenberg,
Emerson Liu,
Douglas Weber,
Ding Zhao
Abstract:
Automated interpretation of electrocardiograms (ECG) has garnered significant attention with the advancements in machine learning methodologies. Despite the growing interest, most current studies focus solely on classification or regression tasks, which overlook a crucial aspect of clinical cardio-disease diagnosis: the diagnostic report generated by experienced human clinicians. In this paper, we introduce a novel approach to ECG interpretation, leveraging recent breakthroughs in Large Language Models (LLMs) and Vision-Transformer (ViT) models. Rather than treating ECG diagnosis as a classification or regression task, we propose an alternative method of automatically identifying the most similar clinical cases based on the input ECG data. Also, since interpreting ECG as images is more affordable and accessible, we process ECG as encoded images and adopt a vision-language learning paradigm to jointly learn vision-language alignment between encoded ECG images and ECG diagnosis reports. Encoding ECG into images can result in an efficient ECG retrieval system, which will be highly practical and useful in clinical applications. More importantly, our findings could serve as a crucial resource for providing diagnostic services in underdeveloped regions.
Submitted 6 November, 2023; v1 submitted 13 April, 2023;
originally announced April 2023.
-
A Persistent-Excitation-Free Method for System Disturbance Estimation Using Concurrent Learning
Authors:
Zengjie Zhang,
Fangzhou Liu,
Tong Liu,
Jianbin Qiu,
Martin Buss
Abstract:
Observer-based methods are widely used to estimate the disturbances of different dynamic systems. However, a drawback of the conventional disturbance observers is that they all assume persistent excitation (PE) of the systems. As a result, they may lead to poor estimation precision when PE is not ensured, for instance, when the disturbance gain of the system is close to the singularity. In this paper, we propose a novel disturbance observer based on concurrent learning (CL) with time-variant history stacks, which ensures high estimation precision even in PE-free cases. The disturbance observer is designed in both continuous and discrete time. The estimation errors of the proposed method are proved to converge to a bounded set using the Lyapunov method. A history-sample-selection procedure is proposed to reduce the estimation error caused by the accumulation of old history samples. A simulation study on epidemic control shows that the proposed method produces higher estimation precision than the conventional disturbance observer when PE is not satisfied. This justifies the correctness of the proposed CL-based disturbance observer and verifies its applicability to solving practical problems.
Submitted 5 June, 2023; v1 submitted 12 April, 2023;
originally announced April 2023.
-
Aligning Multi-Sequence CMR Towards Fully Automated Myocardial Pathology Segmentation
Authors:
Wangbin Ding,
Lei Li,
Junyi Qiu,
Sihan Wang,
Liqin Huang,
Yinyin Chen,
Shan Yang,
Xiahai Zhuang
Abstract:
Myocardial pathology segmentation (MyoPS) is critical for the risk stratification and treatment planning of myocardial infarction (MI). Multi-sequence cardiac magnetic resonance (MS-CMR) images can provide valuable information. For instance, balanced steady-state free precession cine sequences present clear anatomical boundaries, while late gadolinium enhancement and T2-weighted CMR sequences visualize myocardial scar and edema of MI, respectively. Existing methods usually fuse anatomical and pathological information from different CMR sequences for MyoPS, but assume that these images have been spatially aligned. However, MS-CMR images are usually unaligned due to respiratory motion in clinical practice, which poses additional challenges for MyoPS. This work presents an automatic MyoPS framework for unaligned MS-CMR images. Specifically, we design a combined computing model for simultaneous image registration and information fusion, which aggregates multi-sequence features into a common space to extract anatomical structures (i.e., myocardium). Consequently, we can highlight the informative regions in the common space via the extracted myocardium to improve MyoPS performance, considering the spatial relationship between myocardial pathologies and myocardium. Experiments on a private MS-CMR dataset and a public dataset from the MYOPS2020 challenge show that our framework could achieve promising performance for fully automatic MyoPS.
Submitted 7 February, 2023;
originally announced February 2023.
-
Multi-Scanner Canine Cutaneous Squamous Cell Carcinoma Histopathology Dataset
Authors:
Frauke Wilm,
Marco Fragoso,
Christof A. Bertram,
Nikolas Stathonikos,
Mathias Öttl,
Jingna Qiu,
Robert Klopfleisch,
Andreas Maier,
Katharina Breininger,
Marc Aubreville
Abstract:
In histopathology, scanner-induced domain shifts are known to impede the performance of trained neural networks when tested on unseen data. Multi-domain pre-training or dedicated domain-generalization techniques can help to develop domain-agnostic algorithms. For this, multi-scanner datasets with a high variety of slide scanning systems are highly desirable. We present a publicly available multi-scanner dataset of canine cutaneous squamous cell carcinoma histopathology images, composed of 44 samples digitized with five slide scanners. This dataset provides local correspondences between images and thereby isolates the scanner-induced domain shift from other inherent, e.g. morphology-induced domain shifts. To highlight scanner differences, we present a detailed evaluation of color distributions, sharpness, and contrast of the individual scanner subsets. Additionally, to quantify the inherent scanner-induced domain shift, we train a tumor segmentation network on each scanner subset and evaluate the performance both in- and cross-domain. We achieve a class-averaged in-domain intersection over union coefficient of up to 0.86 and observe a cross-domain performance decrease of up to 0.38, which confirms the inherent domain shift of the presented dataset and its negative impact on the performance of deep neural networks.
Submitted 27 February, 2023; v1 submitted 11 January, 2023;
originally announced January 2023.
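Note: the abstract above reports a class-averaged intersection over union (IoU, also called the Jaccard coefficient). As a rough illustration of that metric only, and not the authors' evaluation code, the sketch below computes per-class IoU on flat label sequences and averages over classes that occur in either the prediction or the reference. The function name `class_averaged_iou` and the choice to skip classes that are absent from both inputs are assumptions made for this example.

```python
def class_averaged_iou(pred, target, num_classes):
    """Average per-class IoU over classes present in pred or target.

    pred, target: equal-length sequences of integer class labels
    (for example, a flattened segmentation mask).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and reference are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where at least one of them is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: ignore classes absent from both inputs
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes on six pixels.
example_pred = [0, 0, 1, 1, 2, 2]
example_target = [0, 1, 1, 1, 2, 0]
example_iou = class_averaged_iou(example_pred, example_target, num_classes=3)
```

In the toy example the per-class values are 1/3, 2/3, and 1/2, so the class average is 0.5. Evaluating in-domain and cross-domain, as the abstract describes, would amount to calling the same function on predictions from the matching and non-matching scanner subsets.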
-
Mind the Gap: Scanner-induced domain shifts pose challenges for representation learning in histopathology
Authors:
Frauke Wilm,
Marco Fragoso,
Christof A. Bertram,
Nikolas Stathonikos,
Mathias Öttl,
Jingna Qiu,
Robert Klopfleisch,
Andreas Maier,
Marc Aubreville,
Katharina Breininger
Abstract:
Computer-aided systems in histopathology are often challenged by various sources of domain shift that impact the performance of these algorithms considerably. We investigated the potential of using self-supervised pre-training to overcome scanner-induced domain shifts for the downstream task of tumor segmentation. For this, we present the Barlow Triplets to learn scanner-invariant representations from a multi-scanner dataset with local image correspondences. We show that self-supervised pre-training successfully aligned different scanner representations, which, interestingly, only results in a limited benefit for our downstream task. We thereby provide insights into the influence of scanner characteristics for downstream applications and contribute to a better understanding of why established self-supervised methods have not yet shown the same success on histopathology data as they have for natural images.
Submitted 29 November, 2022;
originally announced November 2022.
-
Improved HER2 Tumor Segmentation with Subtype Balancing using Deep Generative Networks
Authors:
Mathias Öttl,
Jana Mönius,
Matthias Rübner,
Carol I. Geppert,
Jingna Qiu,
Frauke Wilm,
Arndt Hartmann,
Matthias W. Beckmann,
Peter A. Fasching,
Andreas Maier,
Ramona Erber,
Katharina Breininger
Abstract:
Tumor segmentation in histopathology images is often complicated by its composition of different histological subtypes and class imbalance. Oversampling subtypes with low prevalence features is not a satisfactory solution since it eventually leads to overfitting. We propose to create synthetic images with semantically-conditioned deep generative networks and to combine subtype-balanced synthetic images with the original dataset to achieve better segmentation performance. We show the suitability of Generative Adversarial Networks (GANs) and especially diffusion models to create realistic images based on subtype-conditioning for the use case of HER2-stained histopathology. Additionally, we show the capability of diffusion models to conditionally inpaint HER2 tumor areas with modified subtypes. Combining the original dataset with the same amount of diffusion-generated images increased the tumor Dice score from 0.833 to 0.854 and almost halved the variance between the HER2 subtype recalls. These results create the basis for more reliable automatic HER2 analysis with lower performance variance between individual HER2 subtypes.
Submitted 11 November, 2022;
originally announced November 2022.
-
MyoPS-Net: Myocardial Pathology Segmentation with Flexible Combination of Multi-Sequence CMR Images
Authors:
Junyi Qiu,
Lei Li,
Sihan Wang,
Ke Zhang,
Yinyin Chen,
Shan Yang,
Xiahai Zhuang
Abstract:
Myocardial pathology segmentation (MyoPS) can be a prerequisite for the accurate diagnosis and treatment planning of myocardial infarction. However, achieving this segmentation is challenging, mainly due to the inadequate and indistinct information from an image. In this work, we develop an end-to-end deep neural network, referred to as MyoPS-Net, to flexibly combine five-sequence cardiac magnetic resonance (CMR) images for MyoPS. To extract precise and adequate information, we design an effective yet flexible architecture to extract and fuse cross-modal features. This architecture can tackle different numbers of CMR images and complex combinations of modalities, with output branches targeting specific pathologies. To impose anatomical knowledge on the segmentation results, we first propose a module to regularize myocardium consistency and localize the pathologies, and then introduce an inclusiveness loss to utilize relations between myocardial scars and edema. We evaluated the proposed MyoPS-Net on two datasets, i.e., a private one consisting of 50 paired multi-sequence CMR images and a public one from MICCAI2020 MyoPS Challenge. Experimental results showed that MyoPS-Net could achieve state-of-the-art performance in various scenarios. Note that in clinical practice, subjects may not have full sequences, such as missing LGE CMR or mapping CMR scans. We therefore conducted extensive experiments to investigate the performance of the proposed method in dealing with such complex combinations of different CMR sequences. Results proved the superiority and generalizability of MyoPS-Net, and more importantly, indicated a practical clinical application.
Submitted 6 November, 2022;
originally announced November 2022.
-
Internal Language Model Estimation based Adaptive Language Model Fusion for Domain Adaptation
Authors:
Rao Ma,
Xiaobo Wu,
Jin Qiu,
Yanan Qin,
Haihua Xu,
Peihao Wu,
Zejun Ma
Abstract:
The ASR model deployment environment is ever-changing, and the incoming speech can switch across different domains during a session. This poses a challenge for effective domain adaptation when only target-domain text data is available; our objective is to obtain clearly improved performance on the target domain while minimally degrading performance on the general domain. In this paper, we propose an adaptive LM fusion approach called internal language model estimation based adaptive domain adaptation (ILME-ADA). To realize such an ILME-ADA, an interpolated log-likelihood score is calculated based on the maximum of the scores from the internal LM and the external LM (ELM), respectively. We demonstrate the efficacy of the proposed ILME-ADA method with both RNN-T and LAS modeling frameworks, employing neural network and n-gram LMs as ELMs, respectively, on two domain-specific (target) test sets. The proposed method achieves significantly better performance on the target test sets with minimal performance degradation on the general test set, compared with both shallow and ILME-based LM fusion methods.
Submitted 2 March, 2023; v1 submitted 2 November, 2022;
originally announced November 2022.
-
GeoECG: Data Augmentation via Wasserstein Geodesic Perturbation for Robust Electrocardiogram Prediction
Authors:
Jiacheng Zhu,
Jielin Qiu,
Zhuolin Yang,
Douglas Weber,
Michael A. Rosenberg,
Emerson Liu,
Bo Li,
Ding Zhao
Abstract:
There has been an increased interest in applying deep neural networks to automatically interpret and analyze the 12-lead electrocardiogram (ECG). The current paradigms with machine learning methods are often limited by the amount of labeled data. This phenomenon is particularly problematic for clinically-relevant data, where labeling at scale can be time-consuming and costly in terms of the specialized expertise and human effort required. Moreover, deep learning classifiers may be vulnerable to adversarial examples and perturbations, which could have catastrophic consequences, for example, when applied in the context of medical treatment, clinical trials, or insurance claims. In this paper, we propose a physiologically-inspired data augmentation method to improve performance and increase the robustness of heart disease detection based on ECG signals. We obtain augmented samples by perturbing the data distribution towards other classes along the geodesic in Wasserstein space. To better utilize domain-specific knowledge, we design a ground metric that recognizes the difference between ECG signals based on physiologically determined features. Learning from 12-lead ECG signals, our model is able to distinguish five categories of cardiac conditions. Our results demonstrate improvements in accuracy and robustness, reflecting the effectiveness of our data augmentation method.
Submitted 10 August, 2022; v1 submitted 1 August, 2022;
originally announced August 2022.
-
Cardiac Disease Diagnosis on Imbalanced Electrocardiography Data Through Optimal Transport Augmentation
Authors:
Jielin Qiu,
Jiacheng Zhu,
Mengdi Xu,
Peide Huang,
Michael Rosenberg,
Douglas Weber,
Emerson Liu,
Ding Zhao
Abstract:
In this paper, we focus on a new method of data augmentation to solve the data imbalance problem within imbalanced ECG datasets to improve the robustness and accuracy of heart disease detection. By using Optimal Transport, we augment the ECG disease data from normal ECG beats to balance the data among different categories. We build a Multi-Feature Transformer (MF-Transformer) as our classification model, where different features are extracted from both time and frequency domains to diagnose various heart conditions. Learning from 12-lead ECG signals, our model is able to distinguish five categories of cardiac conditions. Our results demonstrate 1) the classification models' ability to make competitive predictions on five ECG categories; 2) improvements in accuracy and robustness reflecting the effectiveness of our data augmentation method.
Submitted 16 February, 2023; v1 submitted 24 January, 2022;
originally announced February 2022.
-
Pan-tumor CAnine cuTaneous Cancer Histology (CATCH) dataset
Authors:
Frauke Wilm,
Marco Fragoso,
Christian Marzahl,
Jingna Qiu,
Chloé Puget,
Laura Diehl,
Christof A. Bertram,
Robert Klopfleisch,
Andreas Maier,
Katharina Breininger,
Marc Aubreville
Abstract:
Due to morphological similarities, the differentiation of histologic sections of cutaneous tumors into individual subtypes can be challenging. Recently, deep learning-based approaches have proven their potential for supporting pathologists in this regard. However, many of these supervised algorithms require a large amount of annotated data for robust development. We present a publicly available dataset of 350 whole slide images of seven different canine cutaneous tumors complemented by 12,424 polygon annotations for 13 histologic classes, including seven cutaneous tumor subtypes. In inter-rater experiments, we show a high consistency of the provided labels, especially for tumor annotations. We further validate the dataset by training a deep neural network for the task of tissue segmentation and tumor subtype classification. We achieve a class-averaged Jaccard coefficient of 0.7047, and 0.9044 for tumor in particular. For classification, we achieve a slide-level accuracy of 0.9857. Since canine cutaneous tumors possess various histologic homologies to human tumors, the added value of this dataset is not limited to veterinary pathology but extends to more general fields of application.
Submitted 26 August, 2022; v1 submitted 27 January, 2022;
originally announced January 2022.
-
A Novel Multi-Task Learning Method for Symbolic Music Emotion Recognition
Authors:
Jibao Qiu,
C. L. Philip Chen,
Tong Zhang
Abstract:
Symbolic Music Emotion Recognition (SMER) aims to predict music emotion from symbolic data, such as MIDI and MusicXML. Previous work mainly focused on learning better representations via (masked) language model pre-training but ignored the intrinsic structure of the music, which is extremely important to the emotional expression of music. In this paper, we present a simple multi-task framework for SMER, which incorporates the emotion recognition task with other emotion-related auxiliary tasks derived from the intrinsic structure of the music. The results show that our multi-task framework can be adapted to different models. Moreover, the labels of the auxiliary tasks are easy to obtain, which means our multi-task methods do not require manually annotated labels other than emotion. Experiments on two publicly available datasets (EMOPIA and VGMIDI) show that our methods perform better on the SMER task. Specifically, accuracy increased by 4.17 absolute points to 67.58 on the EMOPIA dataset, and by 1.97 absolute points to 55.85 on the VGMIDI dataset. Ablation studies also show the effectiveness of the multi-task methods designed in this paper.
Submitted 15 January, 2022;
originally announced January 2022.
-
Relative Distributed Formation and Obstacle Avoidance with Multi-agent Reinforcement Learning
Authors:
Yuzi Yan,
Xiaoxiang Li,
Xinyou Qiu,
Jiantao Qiu,
Jian Wang,
Yu Wang,
Yuan Shen
Abstract:
Multi-agent formation as well as obstacle avoidance is one of the most actively studied topics in the field of multi-agent systems. Although some classic controllers like model predictive control (MPC) and fuzzy control achieve a certain measure of success, most of them require precise global information, which is not accessible in harsh environments. On the other hand, some reinforcement learning (RL) based approaches adopt the leader-follower structure to organize different agents' behaviors, which sacrifices the collaboration between agents and thus suffers from bottlenecks in maneuverability and robustness. In this paper, we propose a distributed formation and obstacle avoidance method based on multi-agent reinforcement learning (MARL). Agents in our system only utilize local and relative information to make decisions and control themselves distributively. Agents in the multi-agent system quickly reorganize themselves into a new topology if any of them is disconnected. Our method achieves better performance regarding formation error and formation convergence rate, and an on-par success rate of obstacle avoidance, compared with baselines (both classic control methods and another RL-based method). The feasibility of our method is verified by both simulation and hardware implementation with Ackermann-steering vehicles.
Submitted 14 November, 2021;
originally announced November 2021.
-
Online Learning Koopman operator for closed-loop electrical neurostimulation in epilepsy
Authors:
Zhichao Liang,
Zixiang Luo,
Keyin Liu,
Jingwei Qiu,
Quanying Liu
Abstract:
Electrical neuromodulation as a palliative treatment has been increasingly used in the control of epilepsy. However, current neuromodulations commonly implement predetermined actuation strategies and lack the capability of self-adaptively adjusting stimulation inputs. In this work, rooted in optimal control theory, we propose a Koopman-MPC framework for real-time closed-loop electrical neuromodulation in epilepsy, which integrates i) a deep Koopman operator based dynamical model to predict the temporal evolution of epileptic EEG with approximate finite-dimensional linear dynamics and ii) a model predictive control (MPC) module to design optimal seizure suppression strategies. The Koopman operator based linear dynamical model is embedded in the latent state space of the autoencoder neural network, in which we can approximate and update the Koopman operator online. The linear dynamical property of the Koopman operator ensures the convexity of the optimization problem for subsequent MPC control. The proposed deep Koopman operator model shows greater predictive capability than the baseline models (e.g., vector autoregressive model, kernel based method and recurrent neural network (RNN)) in both synthetic and real epileptic EEG data. Moreover, compared with the RNN-MPC framework, our Koopman-MPC framework can suppress seizure dynamics with better computational efficiency in both the Jansen-Rit model and the Epileptor model. The Koopman-MPC framework opens a new window for model-based closed-loop neuromodulation and sheds light on nonlinear neurodynamics and feedback control policies.
Submitted 20 August, 2022; v1 submitted 26 March, 2021;
originally announced March 2021.
-
An integral sliding-mode parallel control approach for general nonlinear systems via piecewise affine linear models
Authors:
Chunyang Zhang,
Qing Gao,
Yue Deng,
Jianbin Qiu
Abstract:
The fundamental problem of stabilizing a general nonaffine continuous-time nonlinear system is investigated via piecewise affine linear models (PALMs) in this article. A novel integral sliding-mode parallel control (ISMPC) approach is developed, where an uncertain piecewise affine (PWA) system is constructed to model a nonaffine continuous-time nonlinear system equivalently on a compact region containing the origin. A piecewise sliding-mode parallel controller is designed to globally stabilize the PALM and, consequently, to semiglobally stabilize the original nonlinear system. The proposed scheme enjoys three favorable features: (i) some restrictions on the system input channel are eliminated, so the developed method is less restrictive than published approaches; (ii) it conveniently handles both matched and unmatched uncertainties of the system; and (iii) the proposed piecewise parallel controller generates smooth control signals even around the boundaries between different subspaces, which makes the developed control strategy more implementable and reliable. Moreover, we discuss the universality of the developed control strategy for two kinds of typical nonlinear systems. Simulation results from two numerical examples further demonstrate the performance of the developed control approach.
Submitted 15 February, 2023; v1 submitted 13 January, 2021;
originally announced January 2021.
-
Generative Adversarial Network for Image Synthesis
Authors:
Yang Lei,
Richard L. J. Qiu,
Tonghe Wang,
Walter J. Curran,
Tian Liu,
Xiaofeng Yang
Abstract:
This chapter reviews recent developments of generative adversarial networks (GAN)-based methods for medical and biomedical image synthesis tasks. These methods are classified into conditional GAN and Cycle-GAN according to the network architecture designs. For each category, a literature survey is given, which covers discussions of the network architecture designs, highlights important contributions and identifies specific challenges.
Submitted 30 December, 2020;
originally announced December 2020.
-
Reconstruction Condition of Quantized Signals in Unlimited Sampling Framework
Authors:
Yan He,
Jifang Qiu,
Chang Liu,
Yue Liu,
Jian Wu
Abstract:
The latest theoretical advances in the field of unlimited sampling framework (USF) show the potential to avoid clipping problems of analog-to-digital converters (ADC). To date, most of the related works have focused on real-valued modulo samples, but little has been reported about the impact of quantization. In this paper, we study a more practical USF system where modulo samples are quantized to a finite number of bits. In particular, we present a new requirement on the lower bound of the sampling rate to ensure exact recovery of original signals from quantized modulo samples. The minimum sampling rate is jointly determined by signal bandwidth and quantization bits. Numerical results show that in the presence of quantization noise, signals with different waveforms and bandwidths are recovered perfectly at the new minimum sampling rate, while the recovery fails at the minimum sampling rate before modification, which also verifies the correctness of the theory. The trade-offs among sampling rates, quantization bits and computational complexity of the recovery algorithm are also given for practitioners to weigh.
Submitted 28 November, 2020;
originally announced November 2020.
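Note: the abstract above concerns modulo samples that are quantized to a finite number of bits. As a rough illustration of the acquisition side only, the sketch below folds a value into the range [-lam, lam) with a centered modulo and then applies a uniform quantizer. The names `fold` and `quantize`, the symbol `lam` for the folding threshold, and the mid-rise quantizer are assumptions for this example; the abstract does not specify the quantizer, and the paper's sampling-rate condition and recovery algorithm are not reproduced here.

```python
import math


def fold(x, lam):
    """Centered modulo: map x into the interval [-lam, lam)."""
    return (x + lam) % (2.0 * lam) - lam


def quantize(y, lam, bits):
    """Uniform mid-rise quantizer with 2**bits levels on [-lam, lam).

    Returns the center of the quantization cell that contains y.
    """
    levels = 2 ** bits
    step = 2.0 * lam / levels
    # Cell index, clamped so that boundary values stay in range.
    index = min(levels - 1, max(0, math.floor((y + lam) / step)))
    return -lam + (index + 0.5) * step


# A value outside the ADC range is folded back instead of being clipped.
folded = fold(1.5, 1.0)
quantized = quantize(folded, 1.0, bits=2)
```

With this quantizer the error between a folded sample and its quantized value is at most half a step, that is `lam / 2**bits`. The abstract states that the minimum sampling rate depends jointly on the signal bandwidth and the number of quantization bits, and this error term is the natural place where the bit depth would enter such an analysis.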
-
Federated Learning via Intelligent Reflecting Surface
Authors:
Zhibin Wang,
Jiahang Qiu,
Yong Zhou,
Yuanming Shi,
Liqun Fu,
Wei Chen,
Khaled B. Lataief
Abstract:
Over-the-air computation (AirComp) based federated learning (FL) is capable of achieving fast model aggregation by exploiting the waveform superposition property of multiple access channels. However, the model aggregation performance is severely limited by the unfavorable wireless propagation channels. In this paper, we propose to leverage intelligent reflecting surface (IRS) to achieve fast yet reliable model aggregation for AirComp-based FL. To optimize the learning performance, we formulate an optimization problem that jointly optimizes the device selection, the aggregation beamformer at the base station (BS), and the phase shifts at the IRS to maximize the number of devices participating in the model aggregation of each communication round under certain mean-squared-error (MSE) requirements. To tackle the formulated highly-intractable problem, we propose a two-step optimization framework. Specifically, we induce the sparsity of device selection in the first step, followed by solving a series of MSE minimization problems to find the maximum feasible device set in the second step. We then propose an alternating optimization framework, supported by the difference-of-convex-functions programming algorithm for low-rank optimization, to efficiently design the aggregation beamformers at the BS and phase shifts at the IRS. Simulation results demonstrate that our proposed algorithm and the deployment of an IRS can achieve a lower training loss and higher FL prediction accuracy than the baseline algorithms.
Submitted 11 November, 2020; v1 submitted 10 November, 2020;
originally announced November 2020.
-
Symbolic Semantic Segmentation and Interpretation of COVID-19 Lung Infections in Chest CT volumes based on Emergent Languages
Authors:
Aritra Chowdhury,
Alberto Santamaria-Pang,
James R. Kubricht,
Jianwei Qiu,
Peter Tu
Abstract:
The coronavirus disease (COVID-19) has resulted in a pandemic crippling a breadth of services critical to daily life. Segmentation of lung infections in computerized tomography (CT) slices could be used to improve diagnosis and understanding of COVID-19 in patients. Deep learning systems lack interpretability because of their black-box nature. Inspired by human communication of complex ideas through language, we propose a symbolic framework based on emergent languages for the segmentation of COVID-19 infections in CT scans of lungs. We model the cooperation between two artificial agents - a Sender and a Receiver. These agents synergistically cooperate using emergent symbolic language to solve the task of semantic segmentation. Our game theoretic approach is to model the cooperation between agents, unlike Generative Adversarial Networks (GANs). The Sender retrieves information from one of the higher layers of the deep network and generates a symbolic sentence sampled from a categorical distribution of vocabularies. The Receiver ingests the stream of symbols and cogenerates the segmentation mask. A private emergent language is developed that forms the communication channel used to describe the task of segmentation of COVID infections. We augment existing state-of-the-art semantic segmentation architectures with our symbolic generator to form symbolic segmentation models. Our symbolic segmentation framework achieves state-of-the-art performance for segmentation of lung infections caused by COVID-19. Our results show direct interpretation of symbolic sentences to discriminate between normal and infected regions, infection morphology and image characteristics. We show state-of-the-art results for segmentation of COVID-19 lung infections in CT.
Submitted 22 August, 2020;
originally announced August 2020.
-
Reducing the X-ray radiation exposure frequency in cardio-angiography via deep-learning based video interpolation
Authors:
Xiao-Lei Yin,
Dong-Xue Liang,
Lu Wang,
Jing Qiu,
Zhi-Yun Yang,
Jun-Hui Xing,
Jian-Zeng Dong,
Zhao-Yuan Ma
Abstract:
Cardiac coronary angiography is a major technology to assist doctors during cardiac interventional surgeries. Under the exposure of X-ray radiation, doctors inject contrast agents through catheters to determine the position and status of coronary vessels in real time. To get a coronary angiography video with a high frame rate, the doctor needs to increase the exposure frequency and intensity of the X-ray. This inevitably increases the X-ray harm to both patients and surgeons. In this work, we use a deep-learning based video interpolation algorithm to interpolate coronary angiography videos. Moreover, we establish a new coronary angiography image dataset, which contains 95,039 image triplets, to retrain the video interpolation network model. Using the retrained network, we synthesize high frame rate coronary angiography video from low frame rate coronary angiography video. The average peak signal-to-noise ratio (PSNR) of the synthesized video frames reaches 34 dB. Extensive experimental results demonstrate the feasibility of using the video frame interpolation algorithm to synthesize continuous and clear high frame rate coronary angiography video. With the help of this technology, doctors can significantly reduce the exposure frequency and intensity of the X-ray during coronary angiography.
Submitted 1 June, 2020;
originally announced June 2020.
-
Coronary Artery Segmentation in Angiographic Videos Using A 3D-2D CE-Net
Authors:
Lu Wang,
Dong-xue Liang,
Xiao-lei Yin,
Jing Qiu,
Zhi-yun Yang,
Jun-hui Xing,
Jian-zeng Dong,
Zhao-yuan Ma
Abstract:
Coronary angiography is an indispensable assistive technique for cardiac interventional surgery. Segmentation and extraction of blood vessels from coronary angiography videos are essential prerequisites for physicians to locate, assess and diagnose the plaques and stenosis in blood vessels. This article proposes a new video segmentation framework that can extract the clearest and most comprehensive coronary angiography images from a video sequence, thereby helping physicians to better observe the condition of blood vessels. This framework combines a 3D convolutional layer to extract spatial-temporal information from a video sequence and a 2D CE-Net to accomplish the segmentation task of an image sequence. The input is a few continuous frames of angiographic video, and the output is a segmentation mask. The segmentation and extraction results show that good segmentation can be obtained despite the poor quality of coronary angiography video sequences.
Submitted 17 May, 2020; v1 submitted 26 March, 2020;
originally announced March 2020.
-
Weakly-supervised 3D coronary artery reconstruction from two-view angiographic images
Authors:
Lu Wang,
Dong-xue Liang,
Xiao-lei Yin,
Jing Qiu,
Zhi-yun Yang,
Jun-hui Xing,
Jian-zeng Dong,
Zhao-yuan Ma
Abstract:
The reconstruction of three-dimensional models of coronary arteries is of great significance for the localization, evaluation and diagnosis of stenosis and plaque in the arteries, as well as for the assisted navigation of interventional surgery. In clinical practice, physicians use a few angles of coronary angiography to capture arterial images, so it is of great practical value to perform 3D reconstruction directly from coronary angiography images. However, this is a very difficult computer vision task due to the complex shape of coronary blood vessels, as well as the lack of datasets and key point labeling. With the rise of deep learning, more and more work is being done to reconstruct 3D models of human organs from medical images using deep neural networks. We propose an adversarial and generative way to reconstruct three-dimensional coronary artery models from two different views of angiographic images of coronary arteries. With 3D fully supervised learning and 2D weakly supervised learning schemes, we obtained reconstruction accuracies that outperform state-of-the-art techniques.
Submitted 26 March, 2020;
originally announced March 2020.
-
SlimConv: Reducing Channel Redundancy in Convolutional Neural Networks by Weights Flipping
Authors:
Jiaxiong Qiu,
Cai Chen,
Shuaicheng Liu,
Bing Zeng
Abstract:
The channel redundancy in feature maps of convolutional neural networks (CNNs) results in a large consumption of memory and computational resources. In this work, we design a novel Slim Convolution (SlimConv) module to boost the performance of CNNs by reducing channel redundancies. Our SlimConv consists of three main steps: Reconstruct, Transform and Fuse, through which the features are split and reorganized in a more efficient way, such that the learned weights can be compressed effectively. In particular, the core of our model is a weight flipping operation, which can largely improve the feature diversity and contributes crucially to the performance. Our SlimConv is a plug-and-play architectural unit which can be used to replace convolutional layers in CNNs directly. We validate the effectiveness of SlimConv by conducting comprehensive experiments on ImageNet, MS COCO2014, Pascal VOC2012 segmentation, and Pascal VOC2007 detection datasets. The experiments show that SlimConv-equipped models consistently achieve better performance with lower consumption of memory and computation resources than their non-equipped counterparts. For example, the ResNet-101 fitted with SlimConv achieves 77.84% top-1 classification accuracy with 4.87 GFLOPs and 27.96M parameters on ImageNet, an improvement of almost 0.5% with a reduction of about 3 GFLOPs and 38% of the parameters.
Submitted 16 March, 2020;
originally announced March 2020.
-
Deep Learning in Multi-organ Segmentation
Authors:
Yang Lei,
Yabo Fu,
Tonghe Wang,
Richard L. J. Qiu,
Walter J. Curran,
Tian Liu,
Xiaofeng Yang
Abstract:
This paper presents a review of deep learning (DL) in multi-organ segmentation. We summarized the latest DL-based methods for medical image segmentation and applications. These methods were classified into six categories according to their network design. For each category, we listed the surveyed works, highlighted important contributions and identified specific challenges. Following the detailed review of each category, we briefly discussed its achievements, shortcomings and future potential. We provided a comprehensive comparison among DL-based methods for thoracic and head & neck multi-organ segmentation using benchmark datasets, including the 2017 AAPM Thoracic Auto-segmentation Challenge datasets and 2015 MICCAI Head Neck Auto-Segmentation Challenge datasets.
Submitted 28 January, 2020;
originally announced January 2020.
-
DeepBlindness: Fast Blindness Map Estimation and Blindness Type Classification for Outdoor Scene from Single Color Image
Authors:
Jiaxiong Qiu,
Xinyuan Yu,
Guoqiang Yang,
Shuaicheng Liu
Abstract:
Outdoor vision robotic systems and autonomous cars suffer from many image-quality issues, particularly haze, defocus blur, and motion blur, which we will define generically as "blindness issues". These blindness issues may seriously affect the performance of robotic systems and could lead to unsafe decisions being made. However, existing solutions either focus on one type of blindness only or lack the ability to estimate the degree of blindness accurately. Moreover, they require heavy computation, so they cannot run in real time on practical systems. In this paper, we provide a method which can simultaneously detect the type of blindness and provide a blindness map indicating to what degree the vision is limited on a pixel-by-pixel basis. Both the blindness type and the estimate of per-pixel blindness are essential for tasks like deblurring, dehazing, or the fail-safe functioning of robotic systems. We demonstrate the effectiveness of our approach on the KITTI and CUHK datasets, where experiments show that our method outperforms other state-of-the-art approaches, achieving speeds of about 130 frames per second (fps).
Submitted 2 November, 2019;
originally announced November 2019.