-
Joint chest X-ray diagnosis and clinical visual attention prediction with multi-stage cooperative learning: enhancing interpretability
Authors:
Zirui Qiu,
Hassan Rivaz,
Yiming Xiao
Abstract:
As deep learning has become the state-of-the-art for computer-assisted diagnosis, interpretability of the automatic decisions is crucial for clinical deployment. While various methods were proposed in this domain, visual attention maps of clinicians during radiological screening offer a unique asset to provide important insights and can potentially enhance the quality of computer-assisted diagnosis. In this paper, we introduce a novel deep-learning framework for joint disease diagnosis and prediction of corresponding visual saliency maps for chest X-ray scans. Specifically, we designed a novel dual-encoder multi-task UNet, which leverages both a DenseNet201 backbone and a Residual and Squeeze-and-Excitation block-based encoder to extract diverse features for saliency map prediction, and a multi-scale feature-fusion classifier to perform disease classification. To tackle the issue of asynchronous training schedules of individual tasks in multi-task learning, we proposed a multi-stage cooperative learning strategy, with contrastive learning for feature encoder pretraining to boost performance. Experiments show that our proposed method outperformed existing techniques in both chest X-ray diagnosis and the quality of visual saliency map prediction.
Submitted 29 March, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Unrestricted Global Phase Bias-Aware Single-channel Speech Enhancement with Conformer-based Metric GAN
Authors:
Shiqi Zhang,
Zheng Qiu,
Daiki Takeuchi,
Noboru Harada,
Shoji Makino
Abstract:
With the rapid development of neural networks in recent years, the ability of various networks to enhance the magnitude spectrum of noisy speech in the single-channel speech enhancement domain has become exceptionally outstanding. However, enhancing the phase spectrum using neural networks is often ineffective, which remains a challenging problem. In this paper, we found that the human ear cannot sensitively perceive the difference between a precise phase spectrum and a biased phase (BP) spectrum. Therefore, we propose an optimization method of phase reconstruction, allowing freedom on the global-phase bias instead of reconstructing the precise phase spectrum. We applied it to a Conformer-based Metric Generative Adversarial Networks (CMGAN) baseline model, which relaxes the existing constraints of precise phase and gives the neural network a broader learning space. Results show that this method achieves a new state-of-the-art performance without incurring additional computational overhead.
Submitted 4 June, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
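The idea of "allowing freedom on the global-phase bias" can be illustrated with a distance that ignores one shared phase rotation. The sketch below is not the loss used in the paper (the abstract does not give it); it is a minimal, assumed example showing that the squared error minimised over a single global phase has a closed form. The function names and the toy spectrum are hypothetical.

```python
import cmath

def squared_error(est, ref):
    """Plain squared error between two complex spectra (lists of complex bins)."""
    return sum(abs(x - s) ** 2 for x, s in zip(est, ref))

def global_phase_invariant_error(est, ref):
    """Squared error after rotating `ref` by the single best global phase.

    min over theta of sum |x - e^{j*theta} * s|^2
      = sum |x|^2 + sum |s|^2 - 2 * |sum x * conj(s)|,
    attained at theta = arg(sum x * conj(s)).
    """
    corr = sum(x * s.conjugate() for x, s in zip(est, ref))
    energy = sum(abs(x) ** 2 + abs(s) ** 2 for x, s in zip(est, ref))
    best_theta = cmath.phase(corr)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(energy - 2.0 * abs(corr), 0.0), best_theta

# Toy spectrum (hypothetical values) and a copy shifted by a global phase of 0.7 rad.
ref = [1 + 0j, 0 + 1j, -0.5 + 0.5j]
est = [cmath.exp(0.7j) * s for s in ref]

plain = squared_error(est, ref)
invariant, theta = global_phase_invariant_error(est, ref)
print(plain, invariant, theta)
```

Under the plain error the phase-shifted copy is penalised, while the invariant error treats it as a perfect match and recovers the shift as `theta`.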
-
$μ$-Net: ConvNext-Based U-Nets for Cosmic Muon Tomography
Authors:
Li Xin Jed Lim,
Ziming Qiu
Abstract:
Muon scattering tomography utilises muons, typically originating from cosmic rays, to image the interiors of dense objects. However, due to the low flux of cosmic ray muons at sea-level and the highly complex interactions that muons display when travelling through matter, existing reconstruction algorithms often suffer from low resolution and high noise. In this work, we develop a novel two-stage deep learning algorithm, $μ$-Net, consisting of an MLP to predict the muon trajectory and a ConvNeXt-based U-Net to convert the scattering points into voxels. $μ$-Net achieves a state-of-the-art performance of 17.14 PSNR at the dosage of 1024 muons, outperforming traditional reconstruction algorithms such as the point of closest approach algorithm and the maximum likelihood and expectation maximisation algorithm. Furthermore, we find that our method is robust to various corruptions such as inaccuracies in the muon momentum or a limited detector resolution. We also generate and publicly release the first large-scale dataset that maps muon detections to voxels. We hope that our research will spark further investigations into the potential of deep learning to revolutionise this field.
Submitted 25 December, 2023;
originally announced December 2023.
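For readers unfamiliar with the PSNR figure quoted above, the following is a minimal sketch of the standard peak signal-to-noise ratio in decibels. The peak value of 1.0, the flat-list voxel layout, and the toy volumes are assumptions made for illustration; the abstract does not state how the authors normalise their volumes.

```python
import math

def psnr(reference, reconstruction, peak=1.0):
    """Peak signal-to-noise ratio in dB for two equally sized flat voxel lists."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of voxels")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical volumes
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical 4-voxel volumes, values in [0, 1].
reference = [0.0, 1.0, 1.0, 0.0]
reconstruction = [0.1, 0.9, 1.0, 0.0]
score = psnr(reference, reconstruction)
print(round(score, 2))
```

Higher values mean the reconstruction is closer to the reference; the score depends on the assumed peak, so PSNR values are only comparable under the same normalisation.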
-
Enhancing and Adapting in the Clinic: Source-free Unsupervised Domain Adaptation for Medical Image Enhancement
Authors:
Heng Li,
Ziqin Lin,
Zhongxi Qiu,
Zinan Li,
Huazhu Fu,
Yan Hu,
Jiang Liu
Abstract:
Medical imaging provides many valuable clues involving anatomical structure and pathological characteristics. However, image degradation is a common issue in clinical practice, which can adversely impact the observation and diagnosis by physicians and algorithms. Although many enhancement models have been developed, these models require thorough pre-training before deployment, while failing to take advantage of the potential value of inference data after deployment. In this paper, we propose an algorithm for source-free unsupervised domain adaptive medical image enhancement (SAME), which adapts and optimizes enhancement models using test data in the inference phase. A structure-preserving enhancement network is first constructed to learn a robust source model from synthesized training data. Then a teacher-student model is initialized with the source model and conducts source-free unsupervised domain adaptation (SFUDA) by knowledge distillation with the test data. Additionally, a pseudo-label picker is developed to boost the knowledge distillation of enhancement tasks. Experiments were implemented on ten datasets from three medical image modalities to validate the advantage of the proposed algorithm, and setting analysis and ablation studies were also carried out to interpret the effectiveness of SAME. The remarkable enhancement performance and benefits for downstream tasks demonstrate the potential and generalizability of SAME. The code is available at https://github.com/liamheng/Annotation-free-Medical-Image-Enhancement.
Submitted 3 December, 2023;
originally announced December 2023.
-
MMA-Net: Multiple Morphology-Aware Network for Automated Cobb Angle Measurement
Authors:
Zhengxuan Qiu,
Jie Yang,
Jiankun Wang
Abstract:
Scoliosis diagnosis and assessment depend largely on the measurement of the Cobb angle in spine X-ray images. With the emergence of deep learning techniques that employ landmark detection, tilt prediction, and spine segmentation, automated Cobb angle measurement has become increasingly popular. However, these methods encounter difficulties such as high noise sensitivity, intricate computational procedures, and exclusive reliance on a single type of morphological information. In this paper, we introduce the Multiple Morphology-Aware Network (MMA-Net), a novel framework that improves Cobb angle measurement accuracy by integrating multiple types of spine morphology as attention information. In the MMA-Net, we first feed spine X-ray images into the segmentation network to produce multiple types of morphological information (spine region, centerline, and boundary) and then concatenate the original X-ray image with the resulting segmentation maps as input for the regression module to perform precise Cobb angle measurement. Furthermore, we devise separate joint loss functions for training our segmentation and regression networks. We evaluate our method on the AASCE challenge dataset and achieve superior performance with a SMAPE of 7.28% and an MAE of 3.18°, indicating strong competitiveness compared to other outstanding methods. Consequently, we can offer clinicians automated, efficient, and reliable Cobb angle measurement.
Submitted 24 September, 2023;
originally announced September 2023.
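The SMAPE and MAE figures reported above can be made concrete with a small sketch. The abstract does not spell out the formulas, so the per-image SMAPE below (sum of absolute errors divided by sum of predicted plus target angles, averaged over images) is an assumed, commonly used definition; the function names and the angle values are hypothetical.

```python
def smape_percent(predicted, target):
    """Mean over images of sum|p - t| / sum(p + t), expressed in percent.

    Each image carries a list of Cobb angles in degrees.
    """
    ratios = []
    for p_angles, t_angles in zip(predicted, target):
        abs_err = sum(abs(p - t) for p, t in zip(p_angles, t_angles))
        total = sum(p + t for p, t in zip(p_angles, t_angles))
        ratios.append(abs_err / total if total else 0.0)
    return 100.0 * sum(ratios) / len(ratios)

def mae_degrees(predicted, target):
    """Mean absolute error over every individual angle, in degrees."""
    errors = [abs(p - t)
              for p_angles, t_angles in zip(predicted, target)
              for p, t in zip(p_angles, t_angles)]
    return sum(errors) / len(errors)

# Two hypothetical images with three Cobb angles each.
predicted = [[10.0, 20.0, 30.0], [40.0, 10.0, 5.0]]
target = [[12.0, 18.0, 30.0], [38.0, 12.0, 5.0]]

smape = smape_percent(predicted, target)
mae = mae_degrees(predicted, target)
print(round(smape, 2), round(mae, 2))
```

SMAPE is scale-relative, so the same absolute error weighs more on images with small angles, whereas MAE reports the error directly in degrees.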
-
Is visual explanation with Grad-CAM more reliable for deeper neural networks? a case study with automatic pneumothorax diagnosis
Authors:
Zirui Qiu,
Hassan Rivaz,
Yiming Xiao
Abstract:
While deep learning techniques have provided the state-of-the-art performance in various clinical tasks, explainability regarding their decision-making process can greatly enhance the credence of these methods for safer and quicker clinical adoption. With high flexibility, Gradient-weighted Class Activation Mapping (Grad-CAM) has been widely adopted to offer intuitive visual interpretation of various deep learning models' reasoning processes in computer-assisted diagnosis. However, despite the popularity of the technique, there is still a lack of systematic study on Grad-CAM's performance on different deep learning architectures. In this study, we investigate its robustness and effectiveness across different popular deep learning models, with a focus on the impact of the networks' depths and architecture types, by using a case study of automatic pneumothorax diagnosis in X-ray scans. Our results show that deeper neural networks do not necessarily contribute to a strong improvement of pneumothorax diagnosis accuracy, and the effectiveness of Grad-CAM also varies among different network architectures.
Submitted 29 August, 2023;
originally announced August 2023.
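As background for the study above, the core Grad-CAM computation can be sketched in a few lines: each feature map is weighted by the spatial mean of the class-score gradient with respect to that map, the weighted maps are summed, and a ReLU keeps only positive evidence. The sketch below works on plain nested lists and assumes the activations and gradients have already been extracted from a network; the function name and toy values are hypothetical, and upsampling to the input resolution is omitted.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap.

    activations, gradients: lists of K feature maps, each an H x W nested list.
    gradients[k][i][j] is d(class score) / d(activations[k][i][j]).
    """
    # Channel weights: global average pooling of the gradients.
    weights = []
    for grad_map in gradients:
        values = [g for row in grad_map for g in row]
        weights.append(sum(values) / len(values))

    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for w, act_map in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w * act_map[i][j]

    # ReLU: keep only locations that support the class.
    return [[max(value, 0.0) for value in row] for row in cam]

# Two hypothetical 2x2 feature maps and their gradients.
activations = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-2.0, -2.0], [-2.0, -2.0]]]
heatmap = grad_cam(activations, gradients)
print(heatmap)
```

Because the weights depend on the chosen layer's gradients, the resulting heatmap can differ markedly between architectures and depths, which is the kind of variation the study examines.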
-
DiffuseExpand: Expanding dataset for 2D medical image segmentation using diffusion models
Authors:
Shitong Shao,
Xiaohan Yuan,
Zhen Huang,
Ziming Qiu,
Shuai Wang,
Kevin Zhou
Abstract:
Dataset expansion can effectively alleviate the problem of data scarcity in medical image segmentation, which arises from privacy concerns and labeling difficulties. However, existing expansion algorithms still face great challenges due to their inability to guarantee the diversity of synthesized images with paired segmentation masks. In recent years, Diffusion Probabilistic Models (DPMs) have shown powerful image synthesis performance, even better than Generative Adversarial Networks. Based on this insight, we propose an approach called DiffuseExpand for expanding datasets for 2D medical image segmentation using DPM, which first samples a variety of masks from Gaussian noise to ensure the diversity, and then synthesizes images to ensure the alignment of images and masks. After that, DiffuseExpand chooses high-quality samples to further enhance the effectiveness of data expansion. Our comparison and ablation experiments on COVID-19 and CGMH Pelvis datasets demonstrate the effectiveness of DiffuseExpand. Our code is released at https://github.com/shaoshitong/DiffuseExpand.
Submitted 6 June, 2023; v1 submitted 26 April, 2023;
originally announced April 2023.
-
Efficient automatic segmentation for multi-level pulmonary arteries: The PARSE challenge
Authors:
Gongning Luo,
Kuanquan Wang,
Jun Liu,
Shuo Li,
Xinjie Liang,
Xiangyu Li,
Shaowei Gan,
Wei Wang,
Suyu Dong,
Wenyi Wang,
Pengxin Yu,
Enyou Liu,
Hongrong Wei,
Na Wang,
Jia Guo,
Huiqi Li,
Zhao Zhang,
Ziwei Zhao,
Na Gao,
Nan An,
Ashkan Pakzad,
Bojidar Rangelov,
Jiaqi Dou,
Song Tian,
Zeyu Liu
, et al. (5 additional authors not shown)
Abstract:
Efficient automatic segmentation of multi-level (i.e. main and branch) pulmonary arteries (PA) in CTPA images plays a significant role in clinical applications. However, most existing methods concentrate only on main PA or branch PA segmentation separately and ignore segmentation efficiency. Besides, there is no public large-scale dataset focused on PA segmentation, which makes it highly challenging to compare the different methods. To benchmark multi-level PA segmentation algorithms, we organized the first Pulmonary ARtery SEgmentation (PARSE) challenge. On the one hand, we focus on both the main PA and the branch PA segmentation. On the other hand, for better clinical application, we assign the same score weight to segmentation efficiency (mainly running time and GPU memory consumption during inference) while ensuring PA segmentation accuracy. We present a summary of the top algorithms and offer some suggestions for efficient and accurate multi-level PA automatic segmentation. We provide the PARSE challenge as open-access for the community to benchmark future algorithm developments at https://parse2022.grand-challenge.org/Parse2022/.
Submitted 7 April, 2023;
originally announced April 2023.
-
Hierarchical Fuel-Cell Airpath Control: an Efficiency-Aware MIMO Control Approach Combined with a Novel Constraint-Enforcing Reference Governor
Authors:
Eli Bacher-Chong,
Mostafa Ali Ayubirad,
Zeng Qiu,
Hao Wang,
Alireza Goshtasbi,
Hamid R. Ossareh
Abstract:
This paper presents a hierarchical multivariable control and constraint management approach for an air supply system for a proton exchange membrane fuel cell (PEMFC) system. The control objectives are to track desired compressor mass airflow and cathode inlet pressure, maintain a minimum oxygen excess ratio (OER), and run the system at maximum net efficiency. A multi-input multi-output (MIMO) internal model controller (IMC) is designed and simulated to track flow and pressure set-points, which showed high performance despite strongly coupled plant dynamics. A new set-point map is generated to compute the most efficient cathode inlet pressure from the stack current load. To enforce OER constraints, a novel reference governor (RG) with the ability to govern multiple references (the cascade RG) and the ability to speed up as well as slow down a reference signal (the cross-section RG) is developed and tested. Compared with a single-input single-output (SISO) air-flow control approach, the proposed MIMO control approach shows up to 7.36 percent lower hydrogen fuel consumption. Compared to a traditional load governor, the novel cascaded cross-section RG (CC-RG) shows up to 3.68 percent less mean absolute percent error (MAPE) on net power tracking and greatly improved worst-case OER on realistic drive-cycle simulations. Control development and validations were conducted on two fuel cell system (FCS) models, a nonlinear open-source model and a proprietary Ford high-fidelity model
Submitted 25 February, 2023;
originally announced February 2023.
-
Learning Spatiotemporal Frequency-Transformer for Low-Quality Video Super-Resolution
Authors:
Zhongwei Qiu,
Huan Yang,
Jianlong Fu,
Daochang Liu,
Chang Xu,
Dongmei Fu
Abstract:
Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos. Existing VSR techniques usually recover HR frames by extracting pertinent textures from nearby frames with known degradation processes. Despite significant progress, grand challenges remain in effectively extracting and transmitting high-quality textures from highly degraded low-quality sequences affected by, for example, blur, additive noise, and compression artifacts. In this work, a novel Frequency-Transformer (FTVSR) is proposed for handling low-quality videos, which carries out self-attention in a combined space-time-frequency domain. First, video frames are split into patches and each patch is transformed into spectral maps in which each channel represents a frequency band. This permits fine-grained self-attention on each frequency band, so that real visual texture can be distinguished from artifacts. Second, a novel dual frequency attention (DFA) mechanism is proposed to capture the global frequency relations and local frequency relations, which can handle different complicated degradation processes in real-world scenarios. Third, we explore different self-attention schemes for video processing in the frequency domain and discover that a "divided attention", which conducts a joint space-frequency attention before applying temporal-frequency attention, leads to the best video enhancement quality. Extensive experiments on three widely-used VSR datasets show that FTVSR outperforms state-of-the-art methods on different low-quality videos with clear visual margins. Code and pre-trained models are available at https://github.com/researchmm/FTVSR.
Submitted 27 December, 2022;
originally announced December 2022.
-
DeltaNet: Conditional Medical Report Generation for COVID-19 Diagnosis
Authors:
Xian Wu,
Shuxin Yang,
Zhaopeng Qiu,
Shen Ge,
Yangtian Yan,
Xingwang Wu,
Yefeng Zheng,
S. Kevin Zhou,
Li Xiao
Abstract:
Fast screening and diagnosis are critical in COVID-19 patient treatment. In addition to the gold standard RT-PCR, radiological imaging like X-ray and CT also works as an important means in patient screening and follow-up. However, due to the excessive number of patients, writing reports becomes a heavy burden for radiologists. To reduce the workload of radiologists, we propose DeltaNet to generate medical reports automatically. Different from typical image captioning approaches that generate reports with an encoder and a decoder, DeltaNet applies a conditional generation process. In particular, given a medical image, DeltaNet employs three steps to generate a report: 1) first retrieving related medical reports, i.e., the historical reports from the same or similar patients; 2) then comparing retrieved images and current image to find the differences; 3) finally generating a new report to accommodate identified differences based on the conditional report. We evaluate DeltaNet on a COVID-19 dataset, where DeltaNet outperforms state-of-the-art approaches. Besides COVID-19, the proposed DeltaNet can be applied to other diseases as well. We validate its generalization capabilities on the public IU-Xray and MIMIC-CXR datasets for chest-related diseases. Code is available at https://github.com/LX-doctorAI1/DeltaNet.
Submitted 12 November, 2022;
originally announced November 2022.
-
Hard Exudate Segmentation Supplemented by Super-Resolution with Multi-scale Attention Fusion Module
Authors:
Jiayi Zhang,
Xiaoshan Chen,
Zhongxi Qiu,
Mingming Yang,
Yan Hu,
Jiang Liu
Abstract:
Hard exudates (HE) are the most specific biomarker for retinal edema. Precise HE segmentation is vital for disease diagnosis and treatment, but automatic segmentation is challenged by their large variation in characteristics including size, shape and position, which makes it difficult to detect tiny lesions and lesion boundaries. Considering the complementary features between segmentation and super-resolution tasks, this paper proposes a novel hard exudates segmentation method named SS-MAF with an auxiliary super-resolution task, which brings in helpful detailed features for tiny lesion and boundary detection. Specifically, we propose a fusion module named Multi-scale Attention Fusion (MAF) module for our dual-stream framework to effectively integrate features of the two tasks. MAF first adopts a split spatial convolutional (SSC) layer for multi-scale feature extraction and then utilizes an attention mechanism for feature fusion of the two tasks. Considering pixel dependency, we introduce region mutual information (RMI) loss to optimize the MAF module for tiny lesion and boundary detection. We evaluate our method on two public lesion datasets, IDRiD and E-Ophtha. Our method shows competitive performance with low-resolution inputs, both quantitatively and qualitatively. On the E-Ophtha dataset, the method can achieve $\geq3\%$ higher dice and recall compared with the state-of-the-art methods.
Submitted 17 November, 2022;
originally announced November 2022.
-
A Long-term Dependent and Trustworthy Approach to Reactor Accident Prognosis based on Temporal Fusion Transformer
Authors:
Chengyuan Li,
Zhifang Qiu,
Yugao Ma,
Meifu Li
Abstract:
Prognosis of reactor accidents is a crucial way to ensure appropriate strategies are adopted to avoid radioactive releases. However, there is very limited research on this topic in the nuclear industry. In this paper, we propose a method for accident prognosis based on the Temporal Fusion Transformer (TFT) model with multi-headed self-attention and gating mechanisms. The method utilizes multiple covariates to improve prediction accuracy on the one hand, and quantile regression methods for uncertainty assessment on the other. The method proposed in this paper is applied to the prognosis after loss of coolant accidents (LOCAs) in the HPR1000 reactor. Extensive experimental results show that the method surpasses novel deep learning-based prediction methods in terms of prediction accuracy and confidence. Furthermore, the interference experiments with different signal-to-noise ratios and the ablation experiments for static covariates further illustrate that the robustness comes from the ability to extract the features of static and historical covariates. In summary, this work for the first time applies the novel composite deep learning model TFT to the prognosis of key parameters after a reactor accident, and makes a positive contribution to the establishment of a more intelligent and staff-light maintenance method for reactor systems.
Submitted 28 October, 2022;
originally announced October 2022.
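The quantile regression mentioned above is typically trained with the quantile (pinball) loss, which penalises under- and over-prediction asymmetrically so that each output tracks a chosen quantile. The sketch below is a minimal illustration of that standard loss, not the authors' training code; the function names and the numbers in the usage lines are hypothetical.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Quantile (pinball) loss for one prediction at the given quantile in (0, 1)."""
    diff = y_true - y_pred
    # Under-prediction (diff > 0) costs quantile * diff;
    # over-prediction (diff < 0) costs (1 - quantile) * |diff|.
    return max(quantile * diff, (quantile - 1.0) * diff)

def mean_quantile_loss(y_true, preds_by_quantile):
    """Average pinball loss over all samples and all predicted quantiles.

    preds_by_quantile maps a quantile level to a list of predictions,
    one per entry of y_true.
    """
    total, count = 0.0, 0
    for quantile, preds in preds_by_quantile.items():
        for y, p in zip(y_true, preds):
            total += pinball_loss(y, p, quantile)
            count += 1
    return total / count

# Hypothetical targets with 10th, 50th and 90th percentile forecasts.
y_true = [10.0, 12.0]
forecasts = {0.1: [8.0, 11.0], 0.5: [10.0, 12.5], 0.9: [12.0, 14.0]}
loss = mean_quantile_loss(y_true, forecasts)
print(round(loss, 4))
```

The spread between the low and high quantile forecasts then serves as a prediction interval, which is one way to express the uncertainty the abstract refers to.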
-
SRTNet: Time Domain Speech Enhancement Via Stochastic Refinement
Authors:
Zhibin Qiu,
Mengfan Fu,
Yinfeng Yu,
LiLi Yin,
Fuchun Sun,
Hao Huang
Abstract:
The diffusion model, a new generative model that is very popular in image generation and audio synthesis, is rarely used in speech enhancement. In this paper, we use the diffusion model as a module for stochastic refinement. We propose SRTNet, a novel method for speech enhancement via Stochastic Refinement entirely in the Time domain. Specifically, we design a joint network consisting of a deterministic module and a stochastic module, which makes up the "enhance-and-refine" paradigm. We theoretically demonstrate the feasibility of our method and experimentally show that our method achieves faster training, faster sampling and higher quality. Our code and enhanced samples are available at https://github.com/zhibinQiu/SRTNet.git.
Submitted 30 October, 2022;
originally announced October 2022.
-
Case Studies for Computing Density of Reachable States for Safe Autonomous Motion Planning
Authors:
Yue Meng,
Zeng Qiu,
Md Tawhid Bin Waez,
Chuchu Fan
Abstract:
Density of the reachable states can help understand the risk of safety-critical systems, especially in situations when worst-case reachability is too conservative. Recent work provides a data-driven approach to compute the density distribution of autonomous systems' forward reachable states online. In this paper, we study the use of such an approach in combination with model predictive control for verifiable safe path planning under uncertainties. We first use the learned density distribution to compute the risk of collision online. If such risk exceeds the acceptable threshold, our method will plan for a new path around the previous trajectory, with the risk of collision below the threshold. Our method is well-suited to handle systems with uncertainties and complicated dynamics as our data-driven approach does not need an analytical form of the systems' dynamics and can estimate forward state density with an arbitrary initial distribution of uncertainties. We design two challenging scenarios (autonomous driving and hovercraft control) for safe motion planning in environments with obstacles under system uncertainties. We first show that our density estimation approach can reach a similar accuracy as the Monte-Carlo-based method while using only 0.01X training samples. By leveraging the estimated risk, our algorithm achieves the highest success rate in goal reaching when enforcing the safety rate above 0.99.
Submitted 16 September, 2022;
originally announced September 2022.
-
Representation Learning based and Interpretable Reactor System Diagnosis Using Denoising Padded Autoencoder
Authors:
Chengyuan Li,
Zhifang Qiu,
Zhangrui Yan,
Meifu Li
Abstract:
With the mass construction of Gen III nuclear reactors, it is a popular trend to use deep learning (DL) techniques for fast and effective diagnosis of possible accidents. To overcome the common problems of previous work in diagnosing reactor accidents using deep learning theory, this paper proposes a diagnostic process that ensures robustness to noisy and crippled data and is interpretable. First, a novel Denoising Padded Autoencoder (DPAE) is proposed for representation extraction of monitoring data, with the representation extractor still effective on disturbed data with signal-to-noise ratios up to 25.0 and monitoring data missing up to 40.0%. Secondly, a diagnostic framework using the DPAE encoder for extraction of representations followed by shallow statistical learning algorithms is proposed, and such a stepwise diagnostic approach is tested on disturbed datasets with 41.8% and 80.8% higher classification and regression task evaluation metrics, in comparison with the end-to-end diagnostic approaches. Finally, a hierarchical interpretation algorithm using SHAP and feature ablation is presented to analyze the importance of the input monitoring parameters and validate the effectiveness of the high importance parameters. The outcomes of this study provide a referential method for building robust and interpretable intelligent reactor anomaly diagnosis systems in scenarios with high safety requirements.
Submitted 23 September, 2022; v1 submitted 30 August, 2022;
originally announced August 2022.
-
An Unsupervised Learning-based Framework for Effective Representation Extraction of Reactor Accidents
Authors:
Chengyuan Li,
Meifu Li,
Zhifang Qiu
Abstract:
With the increasing use of high-precision system analysis programs in nuclear engineering, the volume of high-fidelity computational data for accident simulation is exploding. Therefore, an algorithm that can both automatically extract low-dimensional features from the data and guarantee the validity of those features is needed to improve the performance and confidence of the accident diagnosis system. This study proposes an autoencoder-based autonomous learning framework, namely Padded Auto-Encoder (PAE), which automatically encodes noisy and partially missing accident monitoring data into low-dimensional feature vectors via a Vision Transformer-based encoder, and decodes the feature vectors into noise-free and complete reconstructed monitoring data. Thus, the encoder part of the framework is able to automatically infer valid representations from partially missing and noisy monitoring data that reflect the complete and noise-free original data, and the representation vectors can be used for downstream tasks such as accident diagnosis. In this paper, LOCA of HPR1000 was used as the study object, and the PAE was trained in an unsupervised manner using cases with different break locations and sizes as the dataset. The encoder part of the pre-trained PAE was subsequently used as the feature extractor for the monitoring data, and several basic statistical learning algorithms were used to predict the break locations and sizes. The results show that the two-stage pre-trained diagnostic model performs better in diagnosing break location and size, with improvements of 41.62% and 80.86% in the respective metrics, compared with the end-to-end diagnostic model.
Submitted 28 August, 2022;
originally announced August 2022.
-
U-Net vs Transformer: Is U-Net Outdated in Medical Image Registration?
Authors:
Xi Jia,
Joseph Bartlett,
Tianyang Zhang,
Wenqi Lu,
Zhaowen Qiu,
Jinming Duan
Abstract:
Due to their extreme long-range modeling capability, vision transformer-based networks have become increasingly popular in deformable image registration. We believe, however, that the receptive field of a 5-layer convolutional U-Net is sufficient to capture accurate deformations without needing long-range dependencies. The purpose of this study is therefore to investigate whether U-Net-based methods are outdated compared to modern transformer-based approaches when applied to medical image registration. For this, we propose a large kernel U-Net (LKU-Net) by embedding a parallel convolutional block to a vanilla U-Net in order to enhance the effective receptive field. On the public 3D IXI brain dataset for atlas-based registration, we show that the performance of the vanilla U-Net is already comparable with that of state-of-the-art transformer-based networks (such as TransMorph), and that the proposed LKU-Net outperforms TransMorph by using only 1.12% of its parameters and 10.8% of its mult-adds operations. We further evaluate LKU-Net on a MICCAI Learn2Reg 2021 challenge dataset for inter-subject registration, our LKU-Net also outperforms TransMorph on this dataset and ranks first on the public leaderboard as of the submission of this work. With only modest modifications to the vanilla U-Net, we show that U-Net can outperform transformer-based architectures on inter-subject and atlas-based 3D medical image registration. Code is available at https://github.com/xi-jia/LKU-Net.
Submitted 13 August, 2022; v1 submitted 7 August, 2022;
originally announced August 2022.
-
Post-hoc Interpretability based Parameter Selection for Data Oriented Nuclear Reactor Accident Diagnosis System
Authors:
Chengyuan Li,
Meifu Li,
Zhifang Qiu
Abstract:
When applying data-oriented diagnosis systems to distinguish the type and evaluate the severity of nuclear power plant initial events, it is vitally important to decide which parameters should be used as the system input. However, although several diagnosis systems have already achieved acceptable diagnostic precision and speed, researchers have hardly discussed how monitoring points should be chosen and laid out. As a result, redundant measurement data are used to train the diagnostic model, leading to high classification uncertainty, extra training time, and a higher probability of overfitting. In this study, a method for choosing the thermal-hydraulic parameters of a nuclear power plant is proposed, based on post-hoc interpretability theory in deep learning. First, a novel Time-sequential Residual Convolutional Neural Network (TRES-CNN) diagnosis model is introduced to identify the position and hydrodynamic diameter of breaks in LOCA, using 38 parameters of HPR1000 chosen manually and empirically. Afterwards, post-hoc interpretability methods are applied to evaluate the attributions of the diagnosis model's outputs and to identify the 15 parameters that are most decisive in diagnosing LOCA details. The results show that the TRES-CNN based diagnostic model successfully predicts the position and size of breaks in LOCA using the 15 selected parameters of HPR1000, with 25% of the training time required when using all 38 parameters. In addition, the relative diagnostic accuracy error is within 1.5 percent compared with the model using the empirically chosen parameters, which can be regarded as an equivalent level of diagnostic reliability.
Submitted 27 August, 2022; v1 submitted 2 August, 2022;
originally announced August 2022.
-
SuperVessel: Segmenting High-resolution Vessel from Low-resolution Retinal Image
Authors:
Yan Hu,
Zhongxi Qiu,
Dan Zeng,
Li Jiang,
Chen Lin,
Jiang Liu
Abstract:
Vascular segmentation extracts blood vessels from images and serves as the basis for diagnosing various diseases, such as ophthalmic diseases. Ophthalmologists often require high-resolution segmentation results for analysis, which imposes a heavy computational load on most existing methods. If given low-resolution input, these methods easily miss tiny vessels or produce discontinuous vessel segments. To solve these problems, the paper proposes an algorithm named SuperVessel, which produces accurate high-resolution vessel segmentation from low-resolution input images. We first use super-resolution as an auxiliary branch to provide potential high-resolution detail features; this branch can be removed in the test phase. Secondly, we propose two modules to enhance the features of the segmentation region of interest: an upsampling with feature decomposition (UFD) module and a feature interaction module (FIM) with a constraining loss to focus on the features of interest. Extensive experiments on three publicly available datasets demonstrate that the proposed SuperVessel segments more tiny vessels, with segmentation accuracy (IoU) more than 6% higher than that of other state-of-the-art algorithms. In addition, SuperVessel is more stable than the other algorithms. We will release the code after the paper is published.
Submitted 28 July, 2022;
originally announced July 2022.
-
Structure Unbiased Adversarial Model for Medical Image Segmentation
Authors:
Tianyang Zhang,
Shaoming Zheng,
Jun Cheng,
Xi Jia,
Joseph Bartlett,
Xinxing Cheng,
Huazhu Fu,
Zhaowen Qiu,
Jiang Liu,
Jinming Duan
Abstract:
Generative models have been widely proposed in image recognition to generate more images where the distribution is similar to that of the real ones. It often introduces a discriminator network to differentiate the real data from the generated ones. Such models utilise a discriminator network tasked with differentiating style transferred data from data contained in the target dataset. However in doing so the network focuses on discrepancies in the intensity distribution and may overlook structural differences between the datasets. In this paper we formulate a new image-to-image translation problem to ensure that the structure of the generated images is similar to that in the target dataset. We propose a simple, yet powerful Structure-Unbiased Adversarial (SUA) network which accounts for both intensity and structural differences between the training and test sets when performing image segmentation. It consists of a spatial transformation block followed by an intensity distribution rendering module. The spatial transformation block is proposed to reduce the structure gap between the two images, and also produce an inverse deformation field to warp the final segmented image back. The intensity distribution rendering module then renders the deformed structure to an image with the target intensity distribution. Experimental results show that the proposed SUA method has the capability to transfer both intensity distribution and structural content between multiple datasets.
Submitted 11 August, 2022; v1 submitted 25 May, 2022;
originally announced May 2022.
-
BDG-Net: Boundary Distribution Guided Network for Accurate Polyp Segmentation
Authors:
Zihuan Qiu,
Zhichuan Wang,
Miaomiao Zhang,
Ziyong Xu,
Jie Fan,
Linfeng Xu
Abstract:
Colorectal cancer (CRC) is one of the most common fatal cancers in the world. Polypectomy can effectively interrupt the progression of adenoma to adenocarcinoma, thus reducing the risk of CRC development. Colonoscopy is the primary method to find colonic polyps. However, due to the different sizes of polyps and the unclear boundary between polyps and their surrounding mucosa, it is challenging to segment polyps accurately. To address this problem, we design a Boundary Distribution Guided Network (BDG-Net) for accurate polyp segmentation. Specifically, under the supervision of the ideal Boundary Distribution Map (BDM), we use the Boundary Distribution Generate Module (BDGM) to aggregate high-level features and generate the BDM. Then, the BDM is sent to the Boundary Distribution Guided Decoder (BDGD) as complementary spatial information to guide the polyp segmentation. Moreover, a multi-scale feature interaction strategy is adopted in the BDGD to improve the segmentation accuracy of polyps with different sizes. Extensive quantitative and qualitative evaluations demonstrate the effectiveness of our model, which outperforms state-of-the-art models remarkably on five public polyp datasets while maintaining low computational complexity. Code: https://github.com/zihuanqiu/BDG-Net
Submitted 17 April, 2022; v1 submitted 3 January, 2022;
originally announced January 2022.
-
Efficient Fourier single-pixel imaging with Gaussian random sampling
Authors:
Ziheng Qiu,
Xinyi Guo,
Tianao Lu,
Pan Qi,
Zibang Zhang,
Jingang Zhong
Abstract:
Fourier single-pixel imaging (FSI) is a branch of single-pixel imaging techniques. It uses Fourier basis patterns as structured patterns for spatial information acquisition in the Fourier domain. However, the spatial resolution of the image reconstructed by FSI mainly depends on the number of Fourier coefficients sampled. The reconstruction of a high-resolution image typically requires a number of Fourier coefficients to be sampled, and therefore takes a long data acquisition time. Here we propose a new sampling strategy for FSI. It allows FSI to reconstruct a clear and sharp image with a reduced number of measurements. The core of the proposed sampling strategy is to perform a variable density sampling in the Fourier space and, more importantly, the density with respect to the importance of Fourier coefficients is subject to a one-dimensional Gaussian function. Combined with compressive sensing, the proposed sampling strategy enables better reconstruction quality than conventional sampling strategies, especially when the sampling ratio is low. We experimentally demonstrate compressive FSI combined with the proposed sampling strategy is able to reconstruct a sharp and clear image of 256-by-256 pixels with a sampling ratio of 10%. The proposed method enables fast single-pixel imaging and provides a new approach for efficient spatial information acquisition.
Submitted 28 June, 2021;
originally announced August 2021.
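The variable-density sampling described in the abstract above can be illustrated with a minimal sketch. It assumes, hypothetically, that a coefficient's importance falls off with its distance from the zero-frequency term and that the sampling probability follows a one-dimensional Gaussian of that distance. The function name, `sigma`, and the default values are illustrative choices, not the authors' implementation.

```python
import numpy as np

def gaussian_sampling_mask(n=256, ratio=0.10, sigma=0.15, seed=0):
    """Boolean n-by-n mask selecting round(ratio * n * n) Fourier coefficients.

    Assumption: sampling probability is a Gaussian of the radial distance
    from the zero-frequency coefficient (sigma in normalized frequency units).
    """
    rng = np.random.default_rng(seed)
    # Normalized frequencies in [-0.5, 0.5), with zero frequency at the centre.
    f = np.fft.fftshift(np.fft.fftfreq(n))
    fx, fy = np.meshgrid(f, f, indexing="ij")
    radius = np.hypot(fx, fy)
    weights = np.exp(-0.5 * (radius / sigma) ** 2)
    prob = (weights / weights.sum()).ravel()
    # Draw distinct coefficient indices, favouring low frequencies.
    k = int(round(ratio * n * n))
    chosen = rng.choice(n * n, size=k, replace=False, p=prob)
    mask = np.zeros(n * n, dtype=bool)
    mask[chosen] = True
    return mask.reshape(n, n)
```

Calling `gaussian_sampling_mask(256, 0.10)` selects 10% of the coefficients of a 256-by-256 spectrum, concentrated around the centre. A compressive-sensing solver, which this sketch does not include, would still be needed to reconstruct an image from the selected coefficients.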
-
Deep Low-rank plus Sparse Network for Dynamic MR Imaging
Authors:
Wenqi Huang,
Ziwen Ke,
Zhuo-Xu Cui,
Jing Cheng,
Zhilang Qiu,
Sen Jia,
Leslie Ying,
Yanjie Zhu,
Dong Liang
Abstract:
In dynamic magnetic resonance (MR) imaging, low-rank plus sparse (L+S) decomposition, or robust principal component analysis (PCA), has achieved stunning performance. However, the selection of the parameters of L+S is empirical, and the acceleration rate is limited, which are common failings of iterative compressed sensing MR imaging (CS-MRI) reconstruction methods. Many deep learning approaches have been proposed to address these issues, but few of them use a low-rank prior. In this paper, a model-based low-rank plus sparse network, dubbed L+S-Net, is proposed for dynamic MR reconstruction. In particular, we use an alternating linearized minimization method to solve the optimization problem with low-rank and sparse regularization. Learned soft singular value thresholding is introduced to ensure the clear separation of the L component and S component. Then, the iterative steps are unrolled into a network in which the regularization parameters are learnable. We prove that the proposed L+S-Net achieves global convergence under two standard assumptions. Experiments on retrospective and prospective cardiac cine datasets show that the proposed model outperforms state-of-the-art CS and existing deep learning methods and has great potential for extremely high acceleration factors (up to 24x).
Submitted 20 July, 2021; v1 submitted 26 October, 2020;
originally announced October 2020.
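As a rough illustration of the two shrinkage steps that a low-rank plus sparse decomposition relies on, the sketch below applies soft thresholding to singular values (for the L component) and to individual entries (for the S component). It assumes the dynamic series has been arranged as a 2-D space-by-time matrix and uses fixed thresholds `tau` and `lam`, whereas the abstract describes learned thresholds inside an unrolled network; the function names are hypothetical.

```python
import numpy as np

def singular_value_threshold(x, tau):
    """Soft-threshold the singular values of a 2-D array by tau."""
    u, s, vh = np.linalg.svd(x, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of u by the shrunken singular values and recompose.
    return (u * s_shrunk) @ vh

def soft_threshold(x, lam):
    """Shrink each entry's magnitude by lam (works for real or complex x)."""
    magnitude = np.abs(x)
    scale = np.maximum(1.0 - lam / np.maximum(magnitude, 1e-12), 0.0)
    return x * scale
```

In an iterative scheme these two operators would be applied alternately with a data-consistency step; that loop is omitted here.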
-
Positive Contrast Susceptibility MR Imaging Using GPU-based Primal-Dual Algorithm
Authors:
Haifeng Wang,
Fang Cai,
Caiyun Shi,
Jing Cheng,
Shi Su,
Zhilang Qiu,
Guoxi Xie,
Hanwei Chen,
Xin Liu,
Dong Liang
Abstract:
The susceptibility-based positive contrast MR technique was applied to estimate arbitrary magnetic susceptibility distributions of metallic devices using a kernel deconvolution algorithm with regularized L-1 minimization. Previously, the first-order primal-dual (PD) algorithm was shown to solve the L-1 minimization with a faster reconstruction time than other methods. Here, we propose to accelerate the PD algorithm for positive contrast imaging using the multi-core, multi-thread capability of graphics processing units (GPUs). The experimental results showed that the GPU-based PD algorithm achieved comparable accuracy for metallic interventional devices in positive contrast imaging with less computational time. The GPU-based PD approach was 4 to 15 times faster than the previous CPU-based scheme.
Submitted 17 June, 2020;
originally announced June 2020.
-
Linear Quadratic Gaussian Mean-Field Controls of Social Optima
Authors:
Zhenghong Qiu,
Jianhui Huang,
Tinghan Xie
Abstract:
This paper investigates a class of unified stochastic linear quadratic Gaussian (LQG) social optima problems involving a large number of weakly-coupled interactive agents under a generalized setting. For each individual agent, the control and state process enters both diffusion and drift terms in its linear dynamics, and the control weight might be indefinite in cost functional. This setup is innovative and has great theoretical and realistic significance as its applications in mathematical finance (e.g., portfolio selection in mean-variation model). Using some fully-coupled variational analysis under person-by-person optimality principle, and mean-field approximation method, the decentralized social control is derived by a class of new type consistency condition (CC) system for typical representative agent. Such CC system is some mean-field forward-backward stochastic differential equation (MF-FBSDE) combined with embedding representation. The well-posedness of such forward-backward stochastic differential equation (FBSDE) system is carefully examined. The related social asymptotic optimality is related to the convergence of the average of a series of weakly-coupled backward stochastic differential equation (BSDE). They are verified through some Lyapunov equations.
Submitted 14 May, 2020;
originally announced May 2020.
-
Multi-Phase Cross-modal Learning for Noninvasive Gene Mutation Prediction in Hepatocellular Carcinoma
Authors:
Jiapan Gu,
Ziyuan Zhao,
Zeng Zeng,
Yuzhe Wang,
Zhengyiren Qiu,
Bharadwaj Veeravalli,
Brian Kim Poh Goh,
Glenn Kunnath Bonney,
Krishnakumar Madhavan,
Chan Wan Ying,
Lim Kheng Choon,
Thng Choon Hua,
Pierce KH Chow
Abstract:
Hepatocellular carcinoma (HCC) is the most common type of primary liver cancer and the fourth most common cause of cancer-related death worldwide. Understanding the underlying gene mutations in HCC provides great prognostic value for treatment planning and targeted therapy. Radiogenomics has revealed an association between non-invasive imaging features and molecular genomics. However, imaging feature identification is laborious and error-prone. In this paper, we propose an end-to-end deep learning framework for mutation prediction in APOB, COL11A1 and ATRX genes using multiphasic CT scans. Considering intra-tumour heterogeneity (ITH) in HCC, multi-region sampling technology is implemented to generate the dataset for experiments. Experimental results demonstrate the effectiveness of the proposed model.
Submitted 8 May, 2020;
originally announced May 2020.
-
COVID-DA: Deep Domain Adaptation from Typical Pneumonia to COVID-19
Authors:
Yifan Zhang,
Shuaicheng Niu,
Zhen Qiu,
Ying Wei,
Peilin Zhao,
Jianhua Yao,
Junzhou Huang,
Qingyao Wu,
Mingkui Tan
Abstract:
The outbreak of novel coronavirus disease 2019 (COVID-19) has already infected millions of people and is still rapidly spreading all over the globe. Most COVID-19 patients suffer from lung infection, so one important diagnostic method is to screen chest radiography images, e.g., X-Ray or CT images. However, such examinations are time-consuming and labor-intensive, leading to limited diagnostic efficiency. To solve this issue, AI-based technologies, such as deep learning, have been used recently as effective computer-aided means to improve diagnostic efficiency. However, one practical and critical difficulty is the limited availability of annotated COVID-19 data, due to the prohibitive annotation costs and urgent work of doctors to fight against the pandemic. This makes the learning of deep diagnosis models very challenging. To address this, motivated by that typical pneumonia has similar characteristics with COVID-19 and many pneumonia datasets are publicly available, we propose to conduct domain knowledge adaptation from typical pneumonia to COVID-19. There are two main challenges: 1) the discrepancy of data distributions between domains; 2) the task difference between the diagnosis of typical pneumonia and COVID-19. To address them, we propose a new deep domain adaptation method for COVID-19 diagnosis, namely COVID-DA. Specifically, we alleviate the domain discrepancy via feature adversarial adaptation and handle the task difference issue via a novel classifier separation scheme. In this way, COVID-DA is able to diagnose COVID-19 effectively with only a small number of COVID-19 annotations. Extensive experiments verify the effectiveness of COVID-DA and its great potential for real-world applications.
Submitted 29 April, 2020;
originally announced May 2020.
-
Engine and Aftertreatment Co-Optimization of Connected HEVs via Multi-Range Vehicle Speed Planning and Prediction
Authors:
Qiuhao Hu,
Mohammad Reza Amini,
Yiheng Feng,
Zhen Yang,
Hao Wang,
Ilya Kolmanovsky,
Jing Sun,
Ashley Wiese,
Zeng Qiu,
Julia Buckland Seeds
Abstract:
Connected vehicles (CVs) have situational awareness that can be exploited for control and optimization of the powertrain system. While extensive studies have been carried out for energy efficiency improvement of CVs via eco-driving and planning, the implication of such technologies on the thermal responses of CVs has not been fully investigated. One of the key challenges in leveraging connectivity for optimization-based thermal management of CVs is the relatively slow thermal dynamics, which necessitate the use of a long prediction horizon to achieve the best performance. Long-term prediction of the CV speed, unlike the V2V/V2I-based short-range prediction, is difficult and error-prone. The multiple timescales inherent to power and thermal systems call for a variable timescale optimization framework with access to short- and long-term vehicle speed preview. To this end, a model predictive controller (MPC) with a multi-range speed preview for integrated power and thermal management (iPTM) of connected hybrid electric vehicles (HEVs) is presented in this paper. The MPC is formulated to manage the power-split between the engine and the battery while enforcing the power and thermal (engine coolant and catalytic converter temperatures) constraints. The MPC exploits prediction and optimization over a shorter receding horizon and longer shrinking horizon. Over the longer shrinking horizon, the vehicle speed estimation is based on the data collected from the connected vehicles traveling on the same route as the ego-vehicle. Simulation results of applying the MPC over real-world urban driving cycles in Ann Arbor, MI are presented to demonstrate the effectiveness and fuel-saving potentials of the proposed iPTM strategy under the uncertainty associated with long-term predictions of the CV's speed.
Submitted 20 March, 2020;
originally announced March 2020.
-
Improving Uyghur ASR systems with decoders using morpheme-based language models
Authors:
Zicheng Qiu,
Wei Jiang,
Turghunjan Mamut
Abstract:
Uyghur is a minority language, and its resources for Automatic Speech Recognition (ASR) research have always been insufficient. THUYG-20 is currently the only open-sourced dataset of Uyghur speech. State-of-the-art results on its clean and noiseless speech test task haven't been updated since the first release, which shows a big gap in the development of ASR between mainstream languages and Uyghur. In this paper, we try to bridge the gap by ultimately optimizing the ASR systems, and by developing a morpheme-based decoder, MLDG-Decoder (Morpheme Lattice Dynamically Generating Decoder for Uyghur DNN-HMM systems), which has long been missing. We have open-sourced the decoder. The MLDG-Decoder employs an algorithm, named "on-the-fly composition with FEBABOS", that allows the back-off states and transitions to play the role of a relay station in on-the-fly composition. The algorithm empowers the dynamically generated graph to constrain the morpheme sequences in the lattices as effectively as the static and fully composed graph does when a 4-Gram morpheme-based Language Model (LM) is used. We have trained deeper and wider neural network acoustic models, and experimented with three kinds of decoding schemes. The experimental results show that the decoding based on the static and fully composed graph reduces state-of-the-art Word Error Rate (WER) on the clean and noiseless speech test task in THUYG-20 to 14.24%. The MLDG-Decoder reduces the WER to 14.54% while keeping the memory consumption reasonable. Based on the open-sourced MLDG-Decoder, readers can easily reproduce the experimental results in this paper.
Submitted 4 March, 2020; v1 submitted 3 March, 2020;
originally announced March 2020.
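For readers unfamiliar with the Word Error Rate (WER) quoted in the abstract above, the following is a minimal sketch of how the metric is commonly computed: the word-level edit distance between a reference and a hypothesis transcript, divided by the number of reference words. Whitespace tokenization and the function name are assumptions made for illustration; the paper's own scoring setup is not described in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.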
-
Deep Mouse: An End-to-end Auto-context Refinement Framework for Brain Ventricle and Body Segmentation in Embryonic Mice Ultrasound Volumes
Authors:
Tongda Xu,
Ziming Qiu,
William Das,
Chuiyu Wang,
Jack Langerman,
Nitin Nair,
Orlando Aristizabal,
Jonathan Mamou,
Daniel H. Turnbull,
Jeffrey A. Ketterling,
Yao Wang
Abstract:
High-frequency ultrasound (HFU) is well suited for imaging embryonic mice due to its noninvasive and real-time characteristics. However, manual segmentation of the brain ventricles (BVs) and body requires substantial time and expertise. This work proposes a novel deep learning based end-to-end auto-context refinement framework, consisting of two stages. The first stage produces a low resolution segmentation of the BV and body simultaneously. The resulting probability map for each object (BV or body) is then used to crop a region of interest (ROI) around the target object in both the original image and the probability map to provide context to the refinement segmentation network. Joint training of the two stages provides significant improvement in Dice Similarity Coefficient (DSC) over using only the first stage (0.818 to 0.906 for the BV, and 0.919 to 0.934 for the body). The proposed method significantly reduces the inference time (from 102.36 to 0.09 s/volume, around 1000x faster) while slightly improving the segmentation accuracy over previous methods that use sliding-window approaches.
Submitted 29 October, 2019; v1 submitted 20 October, 2019;
originally announced October 2019.
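The Dice Similarity Coefficient (DSC) reported in the abstract above is twice the overlap between predicted and reference masks divided by the sum of their sizes. The sketch below is a minimal version for binary masks of any dimensionality; the function name and the convention of returning 1.0 when both masks are empty are assumptions, not details taken from the paper.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice Similarity Coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    overlap = np.logical_and(pred, target).sum()
    return float(2.0 * overlap / total)
```

For a 3-D ultrasound volume, `pred` would typically be the network's probability map thresholded to a binary mask before calling the function.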
-
Automatic Mouse Embryo Brain Ventricle & Body Segmentation and Mutant Classification From Ultrasound Data Using Deep Learning
Authors:
Ziming Qiu,
Nitin Nair,
Jack Langerman,
Orlando Aristizabal,
Jonathan Mamou,
Daniel H. Turnbull,
Jeffrey A. Ketterling,
Yao Wang
Abstract:
High-frequency ultrasound (HFU) is well suited for imaging embryonic mice in vivo because it is non-invasive and real-time. Manual segmentation of the brain ventricles (BVs) and whole body from 3D HFU images is time-consuming and requires specialized training. This paper presents a deep-learning-based segmentation pipeline which automates several time-consuming, repetitive tasks currently performed to study genetic mutations in developing mouse embryos. Namely, the pipeline accurately segments the BV and body regions in 3D HFU images of mouse embryos, despite significant challenges due to position and shape variation of the embryos, as well as imaging artifacts. Based on the BV segmentation, a 3D convolutional neural network (CNN) is further trained to detect embryos with the Engrailed-1 (En1) mutation. The algorithms achieve 0.896 and 0.925 Dice Similarity Coefficient (DSC) for BV and body segmentation, respectively, and 95.8% accuracy on mutant classification. Through gradient based interrogation and visualization of the trained classifier, it is demonstrated that the model focuses on the morphological structures known to be affected by the En1 mutation.
Submitted 23 September, 2019;
originally announced September 2019.
-
Accelerating MR Imaging via Deep Chambolle-Pock Network
Authors:
Haifeng Wang,
Jing Cheng,
Sen Jia,
Zhilang Qiu,
Caiyun Shi,
Lixian Zou,
Shi Su,
Yuchou Chang,
Yanjie Zhu,
Leslie Ying,
Dong Liang
Abstract:
Compressed sensing (CS) has been introduced to accelerate data acquisition in MR imaging. However, CS-MRI methods suffer from detail loss at large acceleration factors and from complicated parameter selection. To address these limitations, a model-driven MR reconstruction is proposed that trains a deep network, named CP-net, derived from the Chambolle-Pock algorithm, to reconstruct in vivo MR images of human brains from highly undersampled complex k-space data acquired on different types of MR scanners. The proposed network learns the proximal operator and the parameters of the Chambolle-Pock algorithm. All experiments show that the proposed CP-net achieves more accurate MR reconstruction results, outperforming state-of-the-art methods across various quantitative metrics.
Submitted 23 May, 2019;
originally announced May 2019.
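For context on the abstract above, the classical Chambolle-Pock primal-dual iteration is commonly written as follows. The symbols $F$, $G$, $K$, $\tau$, $\sigma$, and $\theta$ come from the standard formulation and are not taken from the abstract; how CP-net unrolls these steps and which parts it learns is not reproduced here.

```latex
% Problem: convex F, G and a linear operator K
\min_{x} \; F(Kx) + G(x)

% One iteration, with step sizes \tau, \sigma > 0 and relaxation \theta \in [0, 1]
\begin{aligned}
y^{n+1} &= \operatorname{prox}_{\sigma F^{*}}\!\left(y^{n} + \sigma K \bar{x}^{n}\right),\\
x^{n+1} &= \operatorname{prox}_{\tau G}\!\left(x^{n} - \tau K^{*} y^{n+1}\right),\\
\bar{x}^{n+1} &= x^{n+1} + \theta\left(x^{n+1} - x^{n}\right).
\end{aligned}
```

Here $F^{*}$ is the convex conjugate of $F$ and $K^{*}$ is the adjoint of $K$. The usual convergence result takes $\theta = 1$ and $\tau \sigma \lVert K \rVert^{2} < 1$. The proximal operators and the step parameters are the quantities the abstract says the network learns.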
-
Fast Calculation Method of Average g-Factor for Wave-CAIPI Imaging
Authors:
Haifeng Wang,
Zhilang Qiu,
Shi Su,
Leslie Ying,
Dong Liang
Abstract:
Wave-CAIPI MR imaging is a 3D imaging technique that makes the g-factor maps more uniform and significantly reduces the g-factor penalty at high acceleration factors. However, calculating the average g-factor penalty in order to optimize the parameters of Wave-CAIPI is time-consuming. In this paper, we propose a novel fast method to calculate the average g-factor in Wave-CAIPI imaging. Specifically, the g-factor value at an arbitrary (e.g., the central) position is calculated separately and then used to approximate the average g-factor via a Taylor linear approximation. Verification experiments demonstrate that the average g-factors of Wave-CAIPI imaging calculated by the proposed method are consistent with those of the previous time-consuming theoretical calculation method and the conventional pseudo multiple replica method. Comparison experiments show that the proposed method is, on average, about 1000 times faster than the previous theoretical calculation method and about 1700 times faster than the conventional pseudo multiple replica method.
Submitted 14 March, 2019; v1 submitted 5 March, 2019;
originally announced March 2019.
-
Deep BV: A Fully Automated System for Brain Ventricle Localization and Segmentation in 3D Ultrasound Images of Embryonic Mice
Authors:
Ziming Qiu,
Jack Langerman,
Nitin Nair,
Orlando Aristizabal,
Jonathan Mamou,
Daniel H. Turnbull,
Jeffrey Ketterling,
Yao Wang
Abstract:
Volumetric analysis of brain ventricle (BV) structure is a key tool in the study of central nervous system development in embryonic mice. High-frequency ultrasound (HFU) is the only non-invasive, real-time modality available for rapid volumetric imaging of embryos in utero. However, manual segmentation of the BV from HFU volumes is tedious, time-consuming, and requires specialized expertise. In this paper, we propose a novel deep-learning-based BV segmentation system for whole-body HFU images of mouse embryos. Our fully automated system consists of two modules: localization and segmentation. It first applies a volumetric convolutional neural network on a 3D sliding window over the entire volume to identify a 3D bounding box containing the entire BV. It then employs a fully convolutional network to segment the detected bounding box into BV and background. The system achieves a Dice Similarity Coefficient (DSC) of 0.8956 for BV segmentation on an unseen test set of 111 HFU volumes, surpassing the previous state-of-the-art method (DSC of 0.7119) by a margin of 25%.
Submitted 5 November, 2018;
originally announced November 2018.
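As a quick check on the numbers reported in the abstract above, the sketch below computes the Dice Similarity Coefficient for binary masks and the relative gain between the two reported DSC values. The abstract does not state whether the 25% margin is relative or absolute. The relative reading, $(0.8956 - 0.7119) / 0.7119$, gives roughly 25.8%, which appears to match. The function names are illustrative.

```python
import numpy as np

def dice(pred, target):
    """Dice Similarity Coefficient, 2|A ∩ B| / (|A| + |B|), for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, target).sum() / denom

def relative_gain(new, old):
    """Relative improvement of `new` over `old`."""
    return (new - old) / old

# Toy masks: one overlapping voxel, two foreground voxels in each mask.
print(dice([1, 1, 0, 0], [1, 0, 1, 0]))

# DSC values quoted in the abstract.
print(round(relative_gain(0.8956, 0.7119), 3))
```

The absolute difference between the two scores is about 0.18, so a reader comparing methods should note which kind of margin a paper quotes.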
-
An integrated localization-navigation scheme for distance-based docking of UAVs
Authors:
Thien-Minh Nguyen,
Zhirong Qiu,
Muqing Cao,
Thien Hoang Nguyen,
Lihua Xie
Abstract:
In this paper we study the distance-based docking problem of unmanned aerial vehicles (UAVs) using a single landmark placed at an arbitrary, unknown position. To solve the problem, we propose an integrated estimation-control scheme that simultaneously achieves the relative localization and navigation tasks for discrete-time integrators under bounded velocity: a nonlinear adaptive estimation scheme to estimate the relative position to the landmark, and a carefully designed control scheme to ensure both the convergence of the estimation and asymptotic docking at the given landmark. A rigorous proof of convergence is provided by invoking the discrete-time LaSalle's invariance principle, and we also validate our theoretical findings on quadcopters equipped with ultra-wideband ranging sensors and optical flow sensors in a GPS-less environment.
Submitted 5 July, 2018;
originally announced July 2018.
-
Data Rate for Distributed Consensus of Multi-agent Systems with High Order Oscillator Dynamics
Authors:
Zhirong Qiu,
Lihua Xie,
Yiguang Hong
Abstract:
Distributed consensus under data rate constraints is an important research topic in multi-agent systems. Some results have been obtained for consensus of multi-agent systems with integrator dynamics, but the problem remains challenging for general high-order systems, especially in the presence of unmeasurable states. In this paper, we study the quantized consensus problem for a special class of high-order systems and investigate the corresponding data rate required for achieving consensus. The state matrix of each agent is a $2m$-th order real Jordan block admitting $m$ identical pairs of conjugate poles on the unit circle; each agent has a single input, and only the first state variable can be measured. The case of harmonic oscillators, corresponding to $m = 1$, is first investigated under a directed communication topology which contains a spanning tree, while the general case of $m \geq 2$ is considered for a connected and undirected network. In both cases it is concluded that the sufficient number of communication bits to guarantee consensus at an exponential convergence rate is an integer between $m$ and $2m$, depending on the location of the poles.
Submitted 29 September, 2016;
originally announced September 2016.
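To make the agent dynamics in the abstract above concrete, the sketch below builds a $2m$-th order real Jordan block whose poles are $m$ identical conjugate pairs $e^{\pm j\theta}$ on the unit circle. The block ordering, the sign convention of the rotation block, and the names `real_jordan_block` and `theta` are assumptions (one common convention), not details taken from the paper.

```python
import numpy as np

def real_jordan_block(m, theta):
    """2m x 2m real Jordan block with m identical pole pairs exp(+/- j*theta).

    Assumed convention: 2x2 rotation blocks R on the diagonal and
    2x2 identity blocks on the block superdiagonal.
    """
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, s], [-s, c]])  # eigenvalues cos(theta) +/- j*sin(theta)
    diag = np.kron(np.eye(m), R)                 # R repeated along the diagonal
    coupling = np.kron(np.eye(m, k=1), np.eye(2))  # I_2 on the block superdiagonal
    return diag + coupling

# Output matrix for "only the first state variable can be measured".
def first_state_output(m):
    C = np.zeros((1, 2 * m))
    C[0, 0] = 1.0
    return C

A = real_jordan_block(2, theta=0.7)
C = first_state_output(2)
print(A.shape, C.shape)
print(np.abs(np.linalg.eigvals(A)))  # moduli close to 1, up to numerical error
```

With $m = 1$ the matrix reduces to a single rotation block, which is the discrete-time harmonic oscillator case the abstract treats first.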