-
Towards Precision Cardiovascular Analysis in Zebrafish: The ZACAF Paradigm
Authors:
Amir Mohammad Naderi,
Jennifer G. Casey,
Mao-Hsiang Huang,
Rachelle Victorio,
David Y. Chiang,
Calum MacRae,
Hung Cao,
Vandana A. Gupta
Abstract:
Quantifying cardiovascular parameters such as ejection fraction in zebrafish, a host for many biological investigations, has been studied extensively. Since current manual monitoring techniques are time-consuming and fallible, several image processing frameworks have been proposed to automate the process. Most of these works rely on supervised deep-learning architectures. However, supervised methods tend to overfit their training dataset. This means that applying the same framework to new data with different imaging setups and mutant types can severely decrease performance. We have developed a Zebrafish Automatic Cardiovascular Assessment Framework (ZACAF) to quantify cardiac function in zebrafish. In this work, we further applied data augmentation, Transfer Learning (TL), and Test Time Augmentation (TTA) to ZACAF to improve its performance in quantifying cardiovascular function in zebrafish. This strategy can be integrated with the available frameworks to aid other researchers. We demonstrate that using TL, even with a constrained dataset, the model can be refined to accommodate a novel microscope setup, diverse mutant types, and various video recording protocols. Additionally, as users engage in successive rounds of TL, the model is anticipated to undergo substantial enhancements in both generalizability and accuracy. Finally, we applied this approach to assess the cardiovascular function in nrap mutant zebrafish, a model of cardiomyopathy.
Submitted 14 February, 2024;
originally announced February 2024.
-
MRAnnotator: A Multi-Anatomy Deep Learning Model for MRI Segmentation
Authors:
Alexander Zhou,
Zelong Liu,
Andrew Tieu,
Nikhil Patel,
Sean Sun,
Anthony Yang,
Peter Choi,
Valentin Fauveau,
George Soultanidis,
Mingqian Huang,
Amish Doshi,
Zahi A. Fayad,
Timothy Deyer,
Xueyan Mei
Abstract:
Purpose: To develop a deep learning model for multi-anatomy and many-class segmentation of diverse anatomic structures on MRI.
Materials and Methods: In this retrospective study, two datasets were curated and annotated for model development and evaluation. An internal dataset of 1022 MRI sequences from various clinical sites within a health system and an external dataset of 264 MRI sequences from an independent imaging center were collected. In both datasets, 49 anatomic structures were annotated as the ground truth. The internal dataset was divided into training, validation, and test sets and used to train and evaluate an nnU-Net model. The external dataset was used to evaluate nnU-Net model generalizability and performance in all classes on independent imaging data. Dice scores were calculated to evaluate model segmentation performance.
Results: The model achieved an average Dice score of 0.801 on the internal test set, and an average score of 0.814 on the complete external dataset across 49 classes.
Conclusion: The developed model achieves robust and generalizable segmentation of 49 anatomic structures on MRI. A future direction is the incorporation of additional anatomic regions and structures into the datasets and model.
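For readers unfamiliar with the metric, the Dice score mentioned in this abstract can be sketched as below. This is a minimal plain-Python illustration, not the authors' evaluation code: the names `dice_score` and `mean_dice`, the flat label sequences, and the convention of returning 1.0 when a class is absent from both masks are all assumptions made for the example.

```python
def dice_score(pred, truth, label):
    """Dice coefficient for one class over two flat label sequences."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    # Count voxels assigned to the class in each mask and in both masks.
    pred_count = sum(1 for p in pred if p == label)
    truth_count = sum(1 for t in truth if t == label)
    overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    if pred_count + truth_count == 0:
        # Assumed convention: a class absent from both masks counts as a perfect match.
        return 1.0
    return 2.0 * overlap / (pred_count + truth_count)


def mean_dice(pred, truth, labels):
    """Unweighted average of per-class Dice scores (hypothetical helper)."""
    return sum(dice_score(pred, truth, c) for c in labels) / len(labels)
```

Averaging per class, as `mean_dice` does, weights small and large structures equally; whether the reported averages were computed this way is not stated in the abstract.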
Submitted 1 February, 2024;
originally announced February 2024.
-
Deep Learning-based Multi-Organ CT Segmentation with Adversarial Data Augmentation
Authors:
Shaoyan Pan,
Shao-Yuan Lo,
Min Huang,
Chaoqiong Ma,
Jacob Wynne,
Tonghe Wang,
Tian Liu,
Xiaofeng Yang
Abstract:
In this work, we propose an adversarial attack-based data augmentation method to improve the deep-learning-based segmentation algorithm for the delineation of Organs-At-Risk (OAR) in abdominal Computed Tomography (CT) to facilitate radiation therapy. We introduce Adversarial Feature Attack for Medical Image (AFA-MI) augmentation, which forces the segmentation network to learn out-of-distribution statistics and improves generalization and robustness to noise. AFA-MI augmentation consists of three steps: 1) generate adversarial noise by the Fast Gradient Sign Method (FGSM) on the intermediate features of the segmentation network's encoder; 2) inject the generated adversarial noise into the network, intentionally compromising performance; 3) optimize the network with both clean and adversarial features. Experiments are conducted on segmenting the heart, left and right kidney, liver, left and right lung, spinal cord, and stomach. We first evaluate the AFA-MI augmentation using nnUnet and TT-Vnet on the test data from a public abdominal dataset and an institutional dataset. In addition, we validate how AFA-MI affects the networks' robustness to noisy data by evaluating the networks on the institutional dataset with added Gaussian noise of varying magnitudes. Network performance is quantitatively evaluated using the Dice Similarity Coefficient (DSC) for volume-based accuracy and the Hausdorff Distance (HD) for surface-based accuracy. On the public dataset, nnUnet with AFA-MI achieves DSC = 0.85 and HD = 6.16 millimeters (mm), and TT-Vnet achieves DSC = 0.86 and HD = 5.62 mm. AFA-MI augmentation further improves all contour accuracies by up to 0.217 DSC when tested on images with Gaussian noise. AFA-MI augmentation is therefore demonstrated to improve segmentation performance and robustness in CT multi-organ segmentation.
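The three steps listed in this abstract can be illustrated on a toy problem. The sketch below is not AFA-MI itself: it replaces the segmentation network with a hypothetical linear head on a small feature vector and uses a binary logistic loss, so that the gradient with respect to the features can be written analytically. All function names, the loss, and the equal weighting of clean and adversarial loss are assumptions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logistic_loss(features, weights, label):
    """Binary logistic loss of a linear head; label is +1 or -1."""
    margin = label * sum(w * f for w, f in zip(weights, features))
    # log(1 + exp(-margin)), written in a stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def feature_gradient(features, weights, label):
    """Gradient of the loss with respect to the intermediate features."""
    margin = label * sum(w * f for w, f in zip(weights, features))
    return [-label * sigmoid(-margin) * w for w in weights]


def fgsm_features(features, weights, label, epsilon):
    """Step 1 and 2: build sign-of-gradient noise and inject it into the features."""
    grad = feature_gradient(features, weights, label)
    signs = [(g > 0) - (g < 0) for g in grad]
    return [f + epsilon * s for f, s in zip(features, signs)]


def clean_plus_adversarial_loss(features, weights, label, epsilon):
    """Step 3: objective that averages the clean and adversarial losses (assumed weighting)."""
    adversarial = fgsm_features(features, weights, label, epsilon)
    clean_loss = logistic_loss(features, weights, label)
    adversarial_loss = logistic_loss(adversarial, weights, label)
    return 0.5 * (clean_loss + adversarial_loss)
```

In a real network the gradient would come from automatic differentiation at the chosen encoder layer rather than from a closed form; the sign-and-scale step is the part that carries over.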
Submitted 25 February, 2023;
originally announced February 2023.
-
Two Efficient Beamforming Methods for Hybrid IRS-aided AF Relay Wireless Networks
Authors:
Xuehui Wang,
Feng Shu,
Mengxing Huang,
Fuhui Zhou,
Riqing Chen,
Cunhua Pan,
Yongpeng Wu,
Jiangzhou Wang
Abstract:
Due to the double fading effect caused by a conventional passive intelligent reflecting surface (IRS), the signal via the reflection link is weak. To enhance the received signal, active elements with the ability to amplify the reflected signal are introduced to the passive IRS, forming a hybrid IRS. In this paper, we propose a hybrid IRS-aided amplify-and-forward (AF) relay wireless network and formulate an optimization problem subject to the transmit power budgets at the source, AF relay, and hybrid IRS, and to the unit-modulus constraint on the passive IRS elements. By alternately designing the beamforming matrix at the AF relay and the reflecting coefficient matrices at the IRS, the signal-to-noise ratio can be maximized. To achieve high rate performance and extend the coverage range, a high-performance method based on semidefinite relaxation and fractional programming (HP-SDR-FP) is presented. Because HP-SDR-FP has extremely high complexity, a low-complexity method based on whitening filter, general power iterative, and generalized Rayleigh-Ritz (WF-GPI-GRR) is also proposed. In this method, the amplifying coefficient of each active IRS element is assumed to be equal, and the corresponding analytical solution for the amplifying coefficient can be obtained from the transmit powers at the AF relay and hybrid IRS. Simulation results show that the two proposed methods can greatly improve the rate performance compared to existing networks, such as the passive IRS-aided AF relay network and the AF-relay-only network. In particular, a rate gain of approximately 50.0% over the existing networks is achieved in the high power budget region of the hybrid IRS. Moreover, it is verified that the proposed HP-SDR-FP method performs better than the WF-GPI-GRR method in terms of rate performance.
Submitted 23 November, 2023; v1 submitted 7 January, 2023;
originally announced January 2023.
-
Linear Convergent Distributed Nash Equilibrium Seeking with Compression
Authors:
Xiaomeng Chen,
Yuchi Wu,
Xinlei Yi,
Minyi Huang,
Ling Shi
Abstract:
Information compression techniques are mainly employed to reduce communication cost over peer-to-peer links. In this paper, we investigate distributed Nash equilibrium (NE) seeking problems in a class of non-cooperative games over directed graphs with information compression. To improve communication efficiency, a compressed distributed NE seeking (C-DNES) algorithm is proposed to obtain an NE for games, where the differences between decision vectors and their estimates are compressed. The proposed algorithm is compatible with a general class of compression operators, including both unbiased and biased compressors. Moreover, our approach only requires the adjacency matrix of the directed graph to be row-stochastic, in contrast to past works that relied on balancedness or specific global network parameters. It is shown that C-DNES not only inherits the advantages of conventional distributed NE algorithms, achieving a linear convergence rate for games with restricted strongly monotone mappings, but also saves communication costs in terms of transmitted bits. Finally, numerical simulations illustrate the advantages of C-DNES in saving communication cost by an order of magnitude under different compressors.
Submitted 21 September, 2023; v1 submitted 14 November, 2022;
originally announced November 2022.
-
ImageCAS: A Large-Scale Dataset and Benchmark for Coronary Artery Segmentation based on Computed Tomography Angiography Images
Authors:
An Zeng,
Chunbiao Wu,
Meiping Huang,
Jian Zhuang,
Shanshan Bi,
Dan Pan,
Najeeb Ullah,
Kaleem Nawaz Khan,
Tianchen Wang,
Yiyu Shi,
Xiaomeng Li,
Guisen Lin,
Xiaowei Xu
Abstract:
Cardiovascular disease (CVD) accounts for about half of non-communicable diseases. Vessel stenosis in the coronary artery is considered to be the major risk of CVD. Computed tomography angiography (CTA) is one of the widely used noninvasive imaging modalities in coronary artery diagnosis due to its superior image resolution. Clinically, segmentation of coronary arteries is essential for the diagnosis and quantification of coronary artery disease. Recently, a variety of works have been proposed to address this problem. However, on the one hand, most works rely on in-house datasets, and the few works that released their datasets to the public provide only tens of images. On the other hand, their source code has not been published, and most follow-up works have not made comparisons with existing works, which makes it difficult to judge the effectiveness of the methods and hinders further exploration of this challenging yet critical problem in the community. In this paper, we propose a large-scale dataset for coronary artery segmentation on CTA images. In addition, we have implemented a benchmark in which we have tried our best to implement several typical existing methods. Furthermore, we propose a strong baseline method which combines multi-scale patch fusion and two-stage processing to extract the details of vessels. Comprehensive experiments show that the proposed method achieves better performance than existing works on the proposed large-scale dataset. The benchmark and the dataset are published at https://github.com/XiaoweiXu/ImageCAS-A-Large-Scale-Dataset-and-Benchmark-for-Coronary-Artery-Segmentation-based-on-CT.
Submitted 17 October, 2023; v1 submitted 3 November, 2022;
originally announced November 2022.
-
Interpretable CNN-Multilevel Attention Transformer for Rapid Recognition of Pneumonia from Chest X-Ray Images
Authors:
Shengchao Chen,
Sufen Ren,
Guanjun Wang,
Mengxing Huang,
Chenyang Xue
Abstract:
Chest imaging plays an essential role in diagnosing and predicting patients with COVID-19 with evidence of worsening respiratory status. Many deep learning-based approaches for pneumonia recognition have been developed to enable computer-aided diagnosis. However, their long training and inference times make them inflexible, and the lack of interpretability reduces their credibility in clinical medical practice. This paper aims to develop a pneumonia recognition framework with interpretability, which can understand the complex relationship between lung features and related diseases in chest X-ray (CXR) images to provide high-speed analytics support for medical practice. To reduce computational complexity and speed up recognition, a novel multi-level self-attention mechanism within the Transformer is proposed to accelerate convergence and emphasize the task-related feature regions. Moreover, a practical CXR image data augmentation is adopted to address the scarcity of medical image data and boost the model's performance. The effectiveness of the proposed method is demonstrated on the classic COVID-19 recognition task using a widely used pneumonia CXR image dataset. In addition, extensive ablation experiments validate the effectiveness and necessity of all components of the proposed method.
Submitted 13 January, 2024; v1 submitted 29 October, 2022;
originally announced October 2022.
-
Multi-View Imputation and Cross-Attention Network Based on Incomplete Longitudinal and Multimodal Data for Conversion Prediction of Mild Cognitive Impairment
Authors:
Tao Wang,
Xiumei Chen,
Xiaoling Zhang,
Shuoling Zhou,
Qianjin Feng,
Meiyan Huang
Abstract:
Predicting whether subjects with mild cognitive impairment (MCI) will convert to Alzheimer's disease is a significant clinical challenge. Longitudinal variations and complementary information inherent in longitudinal and multimodal data are crucial for MCI conversion prediction, but the persistent issue of missing values in these data may hinder their effective application. Additionally, conversion prediction should be achieved in the early stages of disease progression in clinical practice, specifically at the baseline visit (BL). Therefore, longitudinal data should only be incorporated during training to capture disease progression information. To address these challenges, a multi-view imputation and cross-attention network (MCNet) was proposed to integrate data imputation and MCI conversion prediction in a unified framework. First, a multi-view imputation method combined with adversarial learning was presented to handle various missing data scenarios and reduce imputation errors. Second, two cross-attention blocks were introduced to exploit the potential associations in longitudinal and multimodal data. Finally, a multi-task learning model was established for data imputation, longitudinal classification, and conversion prediction tasks. Once the model is appropriately trained, the disease progression information learned from longitudinal data can be leveraged by BL data to improve MCI conversion prediction at BL. MCNet was tested on two independent testing sets and single-modal BL data to verify its effectiveness and flexibility in MCI conversion prediction. Results showed that MCNet outperformed several competitive methods. Moreover, the interpretability of MCNet was demonstrated. Thus, our MCNet may be a valuable tool in longitudinal and multimodal data analysis for MCI conversion prediction. Codes are available at https://github.com/Meiyan88/MCNET.
Submitted 25 May, 2023; v1 submitted 16 June, 2022;
originally announced June 2022.
-
Performance Analysis of Wireless Network Aided by Discrete-Phase-Shifter IRS
Authors:
Rongen Dong,
Yin Teng,
Zhongwen Sun,
Jun Zou,
Mengxing Huang,
Jun Li,
Feng Shu,
Jiangzhou Wang
Abstract:
Discrete phase shifters of an intelligent reflecting surface (IRS) generate phase quantization error (QE) and degrade the performance at the receiver. To analyze the performance loss caused by an IRS with phase QE, based on the law of large numbers, the closed-form expressions of signal-to-noise ratio (SNR) performance loss (PL), achievable rate (AR), and bit error rate (BER) are successively derived under line-of-sight (LoS) channels and Rayleigh channels. Moreover, based on the Taylor series expansion, an approximate simple closed form of the PL of an IRS with approximate QE is also given. The simulation results show that the performance losses of SNR and AR decrease as the number of quantization bits increases, while they gradually increase as the number of IRS phase shifter elements increases. Regardless of LoS channels or Rayleigh channels, when the number of quantization bits is larger than or equal to 3, the performance losses of SNR and AR are less than 0.23 dB and 0.08 bits/s/Hz, respectively, and the BER performance degradation is trivial. In particular, the performance loss difference between an IRS with QE and an IRS with approximate QE is negligible when the number of quantization bits is not less than 2.
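To give a feel for how SNR loss scales with the number of quantization bits, here is a small sketch of a common textbook-style approximation. It is not the closed-form expression derived in the paper. It assumes that the phase error of a k-bit shifter is uniform on [-π/2^k, π/2^k], that the number of elements is large, and that the channel is LoS, so the mean amplitude factor is sin(a)/a with a = π/2^k and the SNR loss is its square. The function name `snr_loss_db` is hypothetical.

```python
import math


def snr_loss_db(bits):
    """Approximate SNR loss in dB for k-bit phase shifters (uniform-error assumption)."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    half_width = math.pi / (2 ** bits)  # maximum absolute phase error
    amplitude_factor = math.sin(half_width) / half_width  # E[cos(error)]
    # Power scales with the square of the amplitude, hence 20*log10.
    return -20.0 * math.log10(amplitude_factor)


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, round(snr_loss_db(k), 3))
```

Under these assumptions the loss for 3 bits comes out at roughly 0.22 dB, which is in line with the bound of 0.23 dB quoted in the abstract, and it shrinks quickly as bits are added.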
Submitted 13 April, 2022;
originally announced April 2022.
-
Automated Sleep Staging via Parallel Frequency-Cut Attention
Authors:
Zheng Chen,
Ziwei Yang,
Lingwei Zhu,
Wei Chen,
Toshiyo Tamura,
Naoaki Ono,
MD Altaf-Ul-Amin,
Shigehiko Kanaya,
Ming Huang
Abstract:
This paper proposes a novel framework for automatically capturing the time-frequency nature of electroencephalogram (EEG) signals of human sleep based on the authoritative sleep medicine guidance. The framework consists of two parts: the first part extracts informative features by partitioning the input EEG spectrograms into a sequence of time-frequency patches. The second part is constituted by an attention-based architecture to efficiently search for the correlation between partitioned time-frequency patches and defining factors of sleep stages in parallel. The proposed pipeline is validated on the Sleep Heart Health Study dataset with new state-of-the-art results for the stages wake, N2, and N3, obtaining respective F1 scores of 0.93, 0.88, and 0.87, with only EEG signals used. The proposed method also has a high inter-rater reliability of 0.80 kappa. We also visualize the correspondence between sleep staging decisions and features extracted by the proposed method, providing strong interpretability for our model.
Submitted 12 January, 2023; v1 submitted 6 April, 2022;
originally announced April 2022.
-
Semi-Supervised Hybrid Spine Network for Segmentation of Spine MR Images
Authors:
Meiyan Huang,
Shuoling Zhou,
Xiumei Chen,
Haoran Lai,
Qianjin Feng
Abstract:
Automatic segmentation of vertebral bodies (VBs) and intervertebral discs (IVDs) in 3D magnetic resonance (MR) images is vital in diagnosing and treating spinal diseases. However, segmenting the VBs and IVDs simultaneously is not trivial. Moreover, problems exist, including blurry segmentation caused by anisotropic resolution, high computational cost, inter-class similarity and intra-class variability, and data imbalances. We proposed a two-stage algorithm, named semi-supervised hybrid spine network (SSHSNet), to address these problems by achieving accurate simultaneous VB and IVD segmentation. In the first stage, we constructed a 2D semi-supervised DeepLabv3+ by using cross pseudo supervision to obtain intra-slice features and coarse segmentation. In the second stage, a 3D full-resolution patch-based DeepLabv3+ was built. This model can be used to extract inter-slice information and combine the coarse segmentation and intra-slice features provided by the first stage. Moreover, a cross tri-attention module was applied to compensate for the loss of inter-slice and intra-slice information separately generated from 2D and 3D networks, thereby improving feature representation ability and achieving satisfactory segmentation results. The proposed SSHSNet was validated on a publicly available spine MR image dataset, and remarkable segmentation performance was achieved. Moreover, results show that the proposed method has great potential in dealing with the data imbalance problem. Based on previous reports, few studies have incorporated a semi-supervised learning strategy with a cross attention mechanism for spine segmentation. Therefore, the proposed method may provide a useful tool for spine segmentation and aid clinically in spinal disease diagnoses and treatments. Codes are publicly available at: https://github.com/Meiyan88/SSHSNet.
Submitted 22 March, 2022;
originally announced March 2022.
-
Language Adaptive Cross-lingual Speech Representation Learning with Sparse Sharing Sub-networks
Authors:
Yizhou Lu,
Mingkun Huang,
Xinghua Qu,
Pengfei Wei,
Zejun Ma
Abstract:
Unsupervised cross-lingual speech representation learning (XLSR) has recently shown promising results in speech recognition by leveraging vast amounts of unlabeled data across multiple languages. However, the standard XLSR model suffers from a language interference problem due to its lack of language-specific modeling ability. In this work, we investigate language adaptive training on XLSR models. More importantly, we propose a novel language adaptive pre-training approach based on sparse sharing sub-networks. It makes room for language-specific modeling by pruning out unimportant parameters for each language, without requiring any manually designed language-specific component. After pruning, each language only maintains a sparse sub-network, while the sub-networks are partially shared with each other. Experimental results on a downstream multilingual speech recognition task show that our proposed method significantly outperforms baseline XLSR models on both high-resource and low-resource languages. In addition, our proposed method consistently outperforms other adaptation methods and requires fewer parameters.
Submitted 9 March, 2022;
originally announced March 2022.
-
Robust Dynamic State Estimator of Integrated Energy Systems based on Natural Gas Partial Differential Equations
Authors:
Liang Chen,
Yang Li,
Manyun Huang,
Xinxin Hui,
Songlin Gu
Abstract:
The reliability and precision of the dynamic database are vital for the optimal operation and global control of integrated energy systems. One effective way to obtain accurate states is state estimation. A novel robust dynamic state estimation methodology for integrated natural gas and electric power systems is proposed based on the Kalman filter. To take full advantage of measurement redundancies and predictions for enhancing the estimation accuracy, a dynamic state estimation model that couples the gas and power systems through gas turbine units is established. The exponential smoothing technique and the gas physical model are integrated into the Kalman filter. Additionally, a time-varying scalar matrix is proposed to handle bad data in the Kalman filter algorithm. The proposed method is applied to an integrated gas and power system formed by GasLib-40 and the IEEE 39-bus system with five gas turbine units. The simulation results show that the method can obtain accurate dynamic states under three different measurement error conditions, and that its filtering performance is better than that of separate estimation methods. Additionally, the proposed method is robust when the measurements contain bad data.
Submitted 3 February, 2022;
originally announced February 2022.
-
Machine-learning-aided Massive Hybrid Analog and Digital MIMO DOA Estimation for Future Wireless Networks
Authors:
Feng Shu,
Yiwen Chen,
Xichao Zhan,
Wenlong Cai,
Mengxing Huang,
Qijuan Jie,
Yifang Li,
Baihua Shi,
Jiangzhou Wang,
Xiaohu You
Abstract:
Owing to its high spatial angle resolution and low circuit cost, massive hybrid analog and digital (HAD) multiple-input multiple-output (MIMO) is viewed as a valuable green communication technology for future wireless networks. Combining massive HAD-MIMO with direction of arrival (DOA) estimation can provide high-precision, even ultra-high-precision, DOA measurement performance approaching that of fully-digital (FD) MIMO. However, phase ambiguity is a challenging issue for massive HAD-MIMO DOA estimation. In this paper, we review three aspects: detection, estimation, and the Cramer-Rao lower bound (CRLB) with low-resolution ADCs at the receiver. First, a multi-layer-neural-network (MLNN) detector is proposed to infer the existence of passive emitters. Then, a two-layer HAD (TLHAD) MIMO structure is proposed to eliminate phase ambiguity using only one snapshot. Simulation results show that the proposed MLNN detector is much better than both the existing generalized likelihood ratio test (GLRT) and the ratio of maximum eigen-value (Max-EV) to minimum eigen-value (R-MaxEV-MinEV) in terms of detection probability. Additionally, the proposed TLHAD structure can achieve the corresponding CRLB using a single snapshot.
Submitted 5 August, 2023; v1 submitted 12 January, 2022;
originally announced January 2022.
-
"One-Shot" Reduction of Additive Artifacts in Medical Images
Authors:
Yu-Jen Chen,
Yen-Jung Chang,
Shao-Cheng Wen,
Yiyu Shi,
Xiaowei Xu,
Tsung-Yi Ho,
Meiping Huang,
Haiyun Yuan,
Jian Zhuang
Abstract:
Medical images may contain various types of artifacts with different patterns and mixtures, which depend on many factors such as scan setting, machine condition, patients' characteristics, surrounding environment, etc. However, existing deep-learning-based artifact reduction methods are restricted by their training set with specific predetermined artifact types and patterns. As such, they have limited clinical adoption. In this paper, we introduce One-Shot medical image Artifact Reduction (OSAR), which exploits the power of deep learning but without using pre-trained general networks. Specifically, we train a light-weight image-specific artifact reduction network using data synthesized from the input image at test time. Without requiring any prior large training data set, OSAR can work with almost any medical images that contain varying additive artifacts which are not in any existing data sets. In addition, experiments on Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) show that the proposed method can reduce artifacts better than state-of-the-art methods, both qualitatively and quantitatively, with shorter test time.
Submitted 23 October, 2021;
originally announced October 2021.
-
Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning
Authors:
Rui Li,
Dong Pu,
Minnie Huang,
Bill Huang
Abstract:
One-shot voice cloning aims to transform speaker voice and speaking style in speech synthesized from a text-to-speech (TTS) system, where only a shot recording from the target reference speech can be used. Out-of-domain transfer is still a challenging task, and one important aspect that impacts the accuracy and similarity of synthetic speech is the conditional representations carrying speaker or style cues extracted from the limited references. In this paper, we present a novel one-shot voice cloning algorithm called Unet-TTS that has good generalization ability for unseen speakers and styles. Based on a skip-connected U-net structure, the new model can efficiently discover speaker-level and utterance-level spectral feature details from the reference audio, enabling accurate inference of complex acoustic characteristics as well as imitation of speaking styles into the synthetic speech. According to both subjective and objective evaluations of similarity, the new model outperforms both speaker embedding and unsupervised style modeling (GST) approaches on an unseen emotional corpus.
Submitted 24 February, 2022; v1 submitted 22 September, 2021;
originally announced September 2021.
-
Hardware-aware Real-time Myocardial Segmentation Quality Control in Contrast Echocardiography
Authors:
Dewen Zeng,
Yukun Ding,
Haiyun Yuan,
Meiping Huang,
Xiaowei Xu,
Jian Zhuang,
Jingtong Hu,
Yiyu Shi
Abstract:
Automatic myocardial segmentation of contrast echocardiography has shown great potential in the quantification of myocardial perfusion parameters. Segmentation quality control is an important step to ensure the accuracy of segmentation results for quality research as well as its clinical application. Usually, segmentation quality control happens after data acquisition. At data acquisition time, the operator cannot know the quality of the segmentation results. On-the-fly segmentation quality control could help the operator adjust the ultrasound probe or retake data if the quality is unsatisfactory, which can greatly reduce the effort of time-consuming manual correction. However, it is infeasible to deploy state-of-the-art DNN-based models because the segmentation module and quality control module must fit in the limited hardware resources on the ultrasound machine while satisfying strict latency constraints. In this paper, we propose a hardware-aware neural architecture search framework for automatic myocardial segmentation and quality control of contrast echocardiography. We explicitly incorporate the hardware latency as a regularization term into the loss function during training. The proposed method searches for the best neural network architecture for the segmentation module and quality prediction module under strict latency constraints.
Submitted 14 September, 2021;
originally announced September 2021.
-
ImageTBAD: A 3D Computed Tomography Angiography Image Dataset for Automatic Segmentation of Type-B Aortic Dissection
Authors:
Zeyang Yao,
Jiawei Zhang,
Hailong Qiu,
Tianchen Wang,
Yiyu Shi,
Jian Zhuang,
Yuhao Dong,
Meiping Huang,
Xiaowei Xu
Abstract:
Type-B Aortic Dissection (TBAD) is one of the most serious cardiovascular events, characterized by a growing yearly incidence and the severity of disease prognosis. Currently, computed tomography angiography (CTA) has been widely adopted for the diagnosis and prognosis of TBAD. Accurate segmentation of true lumen (TL), false lumen (FL), and false lumen thrombus (FLT) in CTA is crucial for the precise quantification of anatomical features. However, existing works focus only on TL and FL without considering FLT. In this paper, we propose ImageTBAD, the first 3D computed tomography angiography (CTA) image dataset of TBAD with annotation of TL, FL, and FLT. The proposed dataset contains 100 TBAD CTA images, which is of decent size compared with existing medical imaging datasets. As FLT can appear almost anywhere along the aorta with irregular shapes, segmentation of FLT presents a wide class of segmentation problems where targets exist in a variety of positions with irregular shapes. We further propose a baseline method for automatic segmentation of TBAD. Results show that the baseline method can achieve comparable results with existing works on aorta and TL segmentation. However, the segmentation accuracy of FLT is only 52%, which leaves large room for improvement and also shows the challenge of our dataset. To facilitate further research on this challenging problem, our dataset and code are released to the public.
Submitted 1 September, 2021;
originally announced September 2021.
-
Dynamic State Estimation for Integrated Natural Gas and Electric Power Systems
Authors:
Liang Chen,
Xinxin Hui,
Songlin Gu,
Manyun Huang,
Yang Li
Abstract:
A dynamic state estimation (DSE) method for integrated natural gas and electric power systems (IGESs) is proposed. Firstly, the coupling model of gas pipeline networks and power systems by gas turbine units (GTUs) is established. Secondly, the Kalman-filter-based linear DSE model for the IGES is built. The gas density and mass flow rate, as well as the real and imaginary parts of bus voltages, are taken as states, which are predicted by the linearized fluid dynamic equations of gases and exponential smoothing techniques. Boundary conditions of pipeline networks are used as supplementary constraints in the system model. Finally, the proposed method is applied to an IGES including a 30-node pipeline network and the IEEE 39-bus system coupled by two GTUs. Two indexes are used to evaluate the DSE performance under three measurement error conditions, and the results show that the DSE can obtain accurate dynamic states under different conditions.
Submitted 13 July, 2021;
originally announced July 2021.
-
Beamforming and Transmit Power Design for Intelligent Reconfigurable Surface-aided Secure Spatial Modulation
Authors:
Feng Shu,
Xinyi Jiang,
Wenlong Cai,
Weiping Shi,
Mengxing Huang,
Jiangzhou Wang,
Xiaohu You
Abstract:
Intelligent reflecting surface (IRS) is a promising solution to build a programmable wireless environment for future communication systems, in which the reflector elements steer the incident signal in fully customizable ways by passive beamforming. In this paper, an IRS-aided secure spatial modulation (SM) is proposed, where the IRS performs passive beamforming and information transfer simultaneously by adjusting the on-off states of the reflecting elements. We formulate an optimization problem to maximize the average secrecy rate (SR) by jointly optimizing the passive beamforming at the IRS and the transmit power at the transmitter, under the consideration that the direct-path channels from the transmitter to the receivers are obstructed by obstacles. As the expression of SR is complex, we derive a new fitting expression (NASR) for the expression of traditional approximate SR (TASR), which has a simpler closed form and is more convenient for subsequent optimization. Based on the above two fitting expressions, three beamforming methods, called maximizing NASR via successive convex approximation (Max-NASR-SCA), maximizing NASR via dual ascent (Max-NASR-DA) and maximizing TASR via semi-definite relaxation (Max-TASR-SDR), are proposed to improve the SR performance. Additionally, two transmit power design (TPD) methods are proposed based on the above two approximate SR expressions, called Max-NASR-TPD and Max-TASR-TPD. Simulation results show that the proposed Max-NASR-DA and Max-NASR-SCA IRS beamformers harvest substantial SR performance gains over Max-TASR-SDR. For TPD, the proposed Max-NASR-TPD performs better than Max-TASR-TPD. In particular, Max-NASR-TPD has a closed-form solution.
Submitted 21 October, 2021; v1 submitted 7 June, 2021;
originally announced June 2021.
-
EchoCP: An Echocardiography Dataset in Contrast Transthoracic Echocardiography for Patent Foramen Ovale Diagnosis
Authors:
Tianchen Wang,
Zhihe Li,
Meiping Huang,
Jian Zhuang,
Shanshan Bi,
Jiawei Zhang,
Yiyu Shi,
Hongwen Fei,
Xiaowei Xu
Abstract:
Patent foramen ovale (PFO) is a potential separation between the septum primum and septum secundum located in the anterosuperior portion of the atrial septum. PFO is one of the main factors causing cryptogenic stroke, which is the fifth leading cause of death in the United States. For PFO diagnosis, contrast transthoracic echocardiography (cTTE) is preferred as being a more robust method compared with others. However, the current PFO diagnosis through cTTE is extremely slow, as it is performed manually by sonographers on echocardiography videos. Currently there is no publicly available dataset for this important topic in the community. In this paper, we present EchoCP, the first echocardiography dataset in cTTE targeting PFO diagnosis.
EchoCP consists of 30 patients with both rest and Valsalva maneuver videos, which cover various PFO grades. We further establish an automated baseline method for PFO diagnosis based on the state-of-the-art cardiac chamber segmentation technique, which achieves 0.89 average mean Dice score, but only 0.60/0.67 mean accuracies for PFO diagnosis, leaving large room for improvement. We hope that the challenging EchoCP dataset can stimulate further research and lead to innovative and generic solutions that would have an impact in multiple domains. Our dataset is released.
Submitted 15 September, 2021; v1 submitted 18 May, 2021;
originally announced May 2021.
-
Multi-Cycle-Consistent Adversarial Networks for Edge Denoising of Computed Tomography Images
Authors:
Xiaowei Xu,
Jiawei Zhang,
Jinglan Liu,
Yukun Ding,
Tianchen Wang,
Hailong Qiu,
Haiyun Yuan,
Jian Zhuang,
Wen Xie,
Yuhao Dong,
Qianjun Jia,
Meiping Huang,
Yiyu Shi
Abstract:
As one of the most commonly ordered imaging tests, computed tomography (CT) scan comes with inevitable radiation exposure that increases the cancer risk to patients. However, CT image quality is directly related to radiation dose, thus it is desirable to obtain high-quality CT images with as little dose as possible. CT image denoising tries to obtain high-dose-like, high-quality CT images (domain Y) from low-dose, low-quality CT images (domain X), which can be treated as an image-to-image translation task where the goal is to learn the transform between a source domain X (noisy images) and a target domain Y (clean images). In this paper, we propose a multi-cycle-consistent adversarial network (MCCAN) that builds intermediate domains and enforces both local and global cycle-consistency for edge denoising of CT images. The global cycle-consistency couples all generators together to model the whole denoising process, while the local cycle-consistency imposes effective supervision on the process between adjacent domains. Experiments show that both local and global cycle-consistency are important for the success of MCCAN, which outperforms CCADN in terms of denoising quality with slightly less computation resource consumption.
Submitted 24 April, 2021;
originally announced April 2021.
-
Earnings-21: A Practical Benchmark for ASR in the Wild
Authors:
Miguel Del Rio,
Natalie Delworth,
Ryan Westerman,
Michelle Huang,
Nishchal Bhandari,
Joseph Palakapilly,
Quinten McNamara,
Joshua Dong,
Piotr Zelasko,
Miguel Jette
Abstract:
Commonly used speech corpora inadequately challenge academic and commercial ASR systems. In particular, speech corpora lack metadata needed for detailed analysis and WER measurement. In response, we present Earnings-21, a 39-hour corpus of earnings calls containing entity-dense speech from nine different financial sectors. This corpus is intended to benchmark ASR systems in the wild with special attention towards named entity recognition. We benchmark four commercial ASR models, two internal models built with open-source tools, and an open-source LibriSpeech model and discuss their differences in performance on Earnings-21. Using our recently released fstalign tool, we provide a candid analysis of each model's recognition capabilities under different partitions. Our analysis finds that ASR accuracy for certain NER categories is poor, presenting a significant impediment to transcript comprehension and usage. Earnings-21 bridges academic and commercial ASR system evaluation and enables further research on entity modeling and WER on real world audio.
Submitted 15 June, 2021; v1 submitted 22 April, 2021;
originally announced April 2021.
-
Accented Speech Recognition: A Survey
Authors:
Arthur Hinsvark,
Natalie Delworth,
Miguel Del Rio,
Quinten McNamara,
Joshua Dong,
Ryan Westerman,
Michelle Huang,
Joseph Palakapilly,
Jennifer Drexler,
Ilya Pirkin,
Nishchal Bhandari,
Miguel Jette
Abstract:
Automatic Speech Recognition (ASR) systems generalize poorly on accented speech. The phonetic and linguistic variability of accents presents hard challenges for ASR systems today in both data collection and modeling strategies. The resulting bias in ASR performance across accents comes at a cost to both users and providers of ASR.
We present a survey of current promising approaches to accented speech recognition and highlight the key challenges in the space. Approaches mostly focus on single model generalization and accent feature engineering. Among the challenges, lack of a standard benchmark makes research and comparison especially difficult.
Submitted 2 June, 2021; v1 submitted 21 April, 2021;
originally announced April 2021.
-
Dopamine Transporter SPECT Image Classification for Neurodegenerative Parkinsonism via Diffusion Maps and Machine Learning Classifiers
Authors:
Jun-En Ding,
Chi-Hsiang Chu,
Mong-Na Lo Huang,
Chien-Ching Hsu
Abstract:
Neurodegenerative parkinsonism can be assessed by dopamine transporter single photon emission computed tomography (DaT-SPECT). Generating these images is time consuming, and to date they have been interpreted visually by nuclear medicine physicians, which can lead to interobserver variability. Accordingly, this study aims to provide an automatic and robust method based on Diffusion Maps and machine learning classifiers to classify the SPECT images into two types, namely Normal and Abnormal DaT-SPECT image groups. In the proposed method, the 3D images of N patients are mapped to an N by N pairwise distance matrix and are visualized in Diffusion Maps coordinates. The images of the training set are embedded into a low-dimensional space by using diffusion maps. Moreover, we use Nyström's out-of-sample extension, which embeds new sample points as the testing set in the reduced space. Testing samples in the embedded space are then classified into two types through an ensemble classifier with Linear Discriminant Analysis (LDA) and a voting procedure over twenty-five-fold cross-validation results. The feasibility of the method is demonstrated via the Parkinson's Progression Markers Initiative (PPMI) dataset of 1097 subjects and a clinical cohort from Kaohsiung Chang Gung Memorial Hospital (KCGMH-TW) of 630 patients. We compare performances using Diffusion Maps with those of three alternative manifold methods for dimension reduction, namely Locally Linear Embedding (LLE), Isomorphic Mapping Algorithm (Isomap), and Kernel Principal Component Analysis (Kernel PCA). We also compare results using 2D and 3D CNN methods. The diffusion maps method has an average accuracy of 98% for the PPMI and 90% for the KCGMH-TW dataset with twenty-five-fold cross-validation results. It outperforms the other three methods in overall accuracy and in robustness across the training and testing samples.
Submitted 7 May, 2021; v1 submitted 6 April, 2021;
originally announced April 2021.
-
The QXS-SAROPT Dataset for Deep Learning in SAR-Optical Data Fusion
Authors:
Meiyu Huang,
Yao Xu,
Lixin Qian,
Weili Shi,
Yaqin Zhang,
Wei Bao,
Nan Wang,
Xuejiao Liu,
Xueshuang Xiang
Abstract:
Deep learning techniques have made an increasing impact on the field of remote sensing. However, deep-neural-network-based fusion of multimodal data from different remote sensors with heterogeneous characteristics has not been fully explored, due to the lack of large amounts of perfectly aligned, high-resolution multi-sensor image data with diverse scenes, especially for synthetic aperture radar (SAR) data and optical imagery. To promote the development of deep-learning-based SAR-optical fusion approaches, we release the QXS-SAROPT dataset, which contains 20,000 pairs of SAR-optical image patches. We obtain the SAR patches from SAR satellite GaoFen-3 images and the optical patches from Google Earth images. These images cover three port cities: San Diego, Shanghai and Qingdao. Here, we present a detailed introduction to the construction of the dataset, and show two representative applications, namely SAR-optical image matching and SAR ship detection boosted by cross-modal information from optical images. As a large open SAR-optical dataset with multiple high-resolution scenes, we believe QXS-SAROPT will be of potential value for further research in SAR-optical data fusion technology based on deep learning.
Submitted 25 April, 2021; v1 submitted 15 March, 2021;
originally announced March 2021.
-
Deep learning-based framework for cardiac function assessment in embryonic zebrafish from heart beating videos
Authors:
Amir Mohammad Naderi,
Haisong Bu,
Jingcheng Su,
Mao-Hsiang Huang,
Khuong Vo,
Ramses Seferino Trigo Torres,
J.-C. Chiao,
Juhyun Lee,
Michael P. H. Lau,
Xiaolei Xu,
Hung Cao
Abstract:
Zebrafish is a powerful and widely-used model system for a host of biological investigations including cardiovascular studies and genetic screening. Zebrafish are readily assessable during developmental stages; however, the current methods for quantification and monitoring of cardiac functions mostly involve tedious manual work and inconsistent estimations. In this paper, we developed and validated a Zebrafish Automatic Cardiovascular Assessment Framework (ZACAF) based on a U-net deep learning model for automated assessment of cardiovascular indices, such as ejection fraction (EF) and fractional shortening (FS) from microscopic videos of wildtype and cardiomyopathy mutant zebrafish embryos. Our approach yielded favorable performance with accuracy above 90% compared with manual processing. We used only black and white regular microscopic recordings with frame rates of 5-20 frames per second (fps); thus, the framework could be widely applicable with any laboratory resources and infrastructure. Most importantly, the automatic feature holds promise to enable efficient, consistent and reliable processing and analysis capacity for large amounts of videos, which can be generated by diverse collaborating teams.
Submitted 24 February, 2021;
originally announced February 2021.
-
ImageCHD: A 3D Computed Tomography Image Dataset for Classification of Congenital Heart Disease
Authors:
Xiaowei Xu,
Tianchen Wang,
Jian Zhuang,
Haiyun Yuan,
Meiping Huang,
Jianzheng Cen,
Qianjun Jia,
Yuhao Dong,
Yiyu Shi
Abstract:
Congenital heart disease (CHD) is the most common type of birth defect, which occurs in 1 of every 110 births in the United States. CHD usually comes with severe variations in heart structure and great artery connections that can be classified into many types. Thus highly specialized domain knowledge and a time-consuming human process are needed to analyze the associated medical images. On the other hand, due to the complexity of CHD and the lack of datasets, little has been explored on the automatic diagnosis (classification) of CHDs. In this paper, we present ImageCHD, the first medical image dataset for CHD classification. ImageCHD contains 110 3D Computed Tomography (CT) images covering most types of CHD, which is of decent size compared with existing medical imaging datasets. Classification of CHDs requires the identification of large structural changes without any local tissue changes, with limited data. It is an example of a larger class of problems that are quite difficult for current machine-learning-based vision methods to solve. To demonstrate this, we further present a baseline framework for the automatic classification of CHD, based on a state-of-the-art CHD segmentation method. Experimental results show that the baseline framework can only achieve a classification accuracy of 82.0% under a selective prediction scheme with 88.4% coverage, leaving big room for further improvement. We hope that ImageCHD can stimulate further research and lead to innovative and generic solutions that would have an impact in multiple domains. Our dataset is released to the public.
Submitted 11 May, 2021; v1 submitted 26 January, 2021;
originally announced January 2021.
-
Myocardial Segmentation of Cardiac MRI Sequences with Temporal Consistency for Coronary Artery Disease Diagnosis
Authors:
Yutian Chen,
Xiaowei Xu,
Dewen Zeng,
Yiyu Shi,
Haiyun Yuan,
Jian Zhuang,
Yuhao Dong,
Qianjun Jia,
Meiping Huang
Abstract:
Coronary artery disease (CAD) is the most common cause of death globally, and its diagnosis is usually based on manual myocardial segmentation of Magnetic Resonance Imaging (MRI) sequences. As manual segmentation is tedious, time-consuming and of low applicability, automatic myocardial segmentation using machine learning techniques has been widely explored recently. However, almost all the existing methods treat the input MRI sequences independently, which fails to capture the temporal information between sequences, e.g., the shape and location information of the myocardium in sequences along time. In this paper, we propose a myocardial segmentation framework for sequences of cardiac MRI (CMR) scanning images of the left ventricular cavity, right ventricular cavity, and myocardium. Specifically, we propose to combine conventional networks and recurrent networks to incorporate temporal information between sequences and ensure temporal consistency. We evaluated our framework on the Automated Cardiac Diagnosis Challenge (ACDC) dataset. Experimental results demonstrate that our framework can improve the segmentation accuracy by up to 2% in Dice coefficient.
Submitted 28 December, 2020;
originally announced December 2020.
-
Distributed Robust State Estimation for Hybrid AC/DC Distribution Systems using Multi-Source Data
Authors:
Manyun Huang,
Junbo Zhao,
Zhinong Wei,
Marco Pau,
Guoqiang Sun
Abstract:
Hybrid AC/DC distribution systems are becoming a popular means to accommodate the increasing penetration of distributed energy resources and flexible loads. This paper proposes a distributed and robust state estimation (DRSE) method for hybrid AC/DC distribution systems using multiple sources of data. In the proposed distributed implementation framework, a unified robust linear state estimation model is derived for each AC and DC region, where the regions are connected via AC/DC converters and only limited information exchange is needed. To enhance the estimation accuracy of the areas with low measurement coverage, a deep neural network (DNN) is used to extract hidden system statistical information and to derive nodal power injections that keep up with the real-time measurement update rate. This provides a way of integrating smart meter data, SCADA measurements and zero injections together for state estimation. Simulations on two hybrid AC/DC distribution systems show that the proposed DRSE incurs only slight accuracy loss from the linearized formulation, while offering robustness by suppressing bad data automatically as well as improved computational efficiency.
Submitted 20 November, 2020;
originally announced November 2020.
-
Do Noises Bother Human and Neural Networks In the Same Way? A Medical Image Analysis Perspective
Authors:
Shao-Cheng Wen,
Yu-Jen Chen,
Zihao Liu,
Wujie Wen,
Xiaowei Xu,
Yiyu Shi,
Tsung-Yi Ho,
Qianjun Jia,
Meiping Huang,
Jian Zhuang
Abstract:
Deep learning has already demonstrated its power in medical imaging, including denoising, classification, segmentation, etc. All these applications are proposed to automatically analyze medical images beforehand, which brings more information to radiologists during clinical assessment for accuracy improvement. Recently, many medical denoising methods have shown significant artifact reduction and noise removal, both quantitatively and qualitatively. However, those existing methods are developed around human vision, i.e., they are designed to minimize the noise effect that can be perceived by human eyes. In this paper, we introduce an application-guided denoising framework, which focuses on denoising for the neural networks that follow. In our experiments, we apply the proposed framework to different datasets, models, and use cases. Experimental results show that our proposed framework can achieve better results than a human-vision denoising network.
Submitted 4 November, 2020;
originally announced November 2020.
-
Improving RNN transducer with normalized jointer network
Authors:
Mingkun Huang,
Jun Zhang,
Meng Cai,
Yang Zhang,
Jiali Yao,
Yongbin You,
Yi He,
Zejun Ma
Abstract:
Recurrent neural transducer (RNN-T) is a promising end-to-end (E2E) model in automatic speech recognition (ASR). It has shown superior performance compared to traditional hybrid ASR systems. However, training RNN-T from scratch is still challenging. We observe a huge gradient variance during RNN-T training and suspect it hurts the performance. In this work, we analyze the cause of the huge gradient variance in RNN-T training and propose a new normalized jointer network to overcome it. We also propose to enhance the RNN-T network with a modified conformer encoder network and transformer-XL predictor networks to achieve the best performance. Experiments are conducted on the open 170-hour AISHELL-1 dataset and an industrial-level 30000-hour Mandarin speech dataset. On AISHELL-1, our RNN-T system achieves state-of-the-art results on the streaming and non-streaming benchmarks, with CERs of 6.15% and 5.37% respectively. We further compare our RNN-T system with our well-trained commercial hybrid system on the 30000-hour industrial audio data and obtain a 9% relative improvement without pre-training or an external language model.
Submitted 3 November, 2020;
originally announced November 2020.
-
Dynamic latency speech recognition with asynchronous revision
Authors:
Mingkun Huang,
Meng Cai,
Jun Zhang,
Yang Zhang,
Yongbin You,
Yi He,
Zejun Ma
Abstract:
In this work we propose an inference technique, asynchronous revision, to unify streaming and non-streaming speech recognition models. Specifically, we achieve dynamic latency with only one model by using arbitrary right context during inference. The model is composed of a stack of convolutional layers for audio encoding. In the inference stage, the history states of the encoder and decoder can be asynchronously revised to trade off between the latency and the accuracy of the model. To alleviate the mismatch between training and inference, we propose a training technique, segment cropping, which randomly splits input utterances into several segments with forward connections. This allows us to obtain dynamic latency speech recognition results with large improvements in accuracy. Experiments show that our dynamic latency model with asynchronous revision gives 8%-14% relative improvements over the streaming models.
Submitted 3 November, 2020;
originally announced November 2020.
-
Towards Cardiac Intervention Assistance: Hardware-aware Neural Architecture Exploration for Real-Time 3D Cardiac Cine MRI Segmentation
Authors:
Dewen Zeng,
Weiwen Jiang,
Tianchen Wang,
Xiaowei Xu,
Haiyun Yuan,
Meiping Huang,
Jian Zhuang,
Jingtong Hu,
Yiyu Shi
Abstract:
Real-time cardiac magnetic resonance imaging (MRI) plays an increasingly important role in guiding various cardiac interventions. In order to provide better visual assistance, the cine MRI frames need to be segmented on-the-fly to avoid noticeable visual lag. In addition, considering reliability and patient data privacy, the computation is preferably done on local hardware. State-of-the-art MRI segmentation methods focus mostly on accuracy, and can hardly be adopted for real-time applications or on local hardware. In this work, we present the first hardware-aware multi-scale neural architecture search (NAS) framework for real-time 3D cardiac cine MRI segmentation. The proposed framework incorporates a latency regularization term into the loss function to handle real-time constraints, taking the underlying hardware into account. In addition, the formulation is fully differentiable with respect to the architecture parameters, so that stochastic gradient descent (SGD) can be used for optimization to reduce the computation cost while maintaining optimization quality. Experimental results on the ACDC MICCAI 2017 dataset demonstrate that our hardware-aware multi-scale NAS framework can reduce the latency by up to 3.5 times and satisfy the real-time constraints, while still achieving competitive segmentation accuracy, compared with the state-of-the-art NAS segmentation framework.
Submitted 13 December, 2020; v1 submitted 16 August, 2020;
originally announced August 2020.
-
Enhanced Secrecy Rate Maximization for Directional Modulation Networks via IRS
Authors:
Feng Shu,
Jiayu Li,
Mengxing Huang,
Weiping Shi,
Yin Teng,
Jun Li,
Yongpeng Wu,
Jiangzhou Wang
Abstract:
Intelligent reflecting surface (IRS) is low-cost and energy-efficient, and is a promising technology for future wireless communications such as the sixth generation. Conventional directional modulation (DM) has the limitation that Alice, with multiple antennas, can transmit only a single confidential bit stream (CBS) to Bob in a line-of-sight channel. To address this problem, IRS is proposed to create friendly multipaths for DM such that two CBSs can be transmitted from Alice to Bob. This will significantly enhance the secrecy rate (SR) of DM. To maximize the SR (Max-SR), a general non-convex optimization problem is formulated with the unit-modulus constraint of the IRS phase-shift matrix (PSM), and the general alternating iterative (GAI) algorithm is proposed to jointly obtain the transmit beamforming vectors (TBVs) and PSM by alternately optimizing one and fixing the other. To reduce its high complexity, a low-complexity iterative algorithm for Max-SR, called NS projection (NSP), is proposed by placing a null-space (NS) constraint on the TBVs. Here, each CBS is transmitted separately in the NSs of the other CBS and AN channels. Simulation results show that the SRs of the proposed GAI and NSP are approximately double that of IRS-based DM with a single CBS for massive IRS in the high signal-to-noise ratio region.
Submitted 11 August, 2020;
originally announced August 2020.
-
Modular End-to-end Automatic Speech Recognition Framework for Acoustic-to-word Model
Authors:
Qi Liu,
Zhehuai Chen,
Hao Li,
Mingkun Huang,
Yizhou Lu,
Kai Yu
Abstract:
End-to-end (E2E) systems have played an increasingly important role in automatic speech recognition (ASR) and achieved great performance. However, E2E systems recognize output word sequences directly from the input acoustic features, so they can only be trained on limited acoustic data. Extra text data is widely used to improve the results of traditional artificial neural network-hidden Markov model (ANN-HMM) hybrid systems. Incorporating extra text data into standard E2E ASR systems may break the E2E property during decoding. In this paper, a novel modular E2E ASR system is proposed. The modular E2E ASR system consists of two parts: an acoustic-to-phoneme (A2P) model and a phoneme-to-word (P2W) model. The A2P model is trained on acoustic data, while extra data, including large-scale text data, can be used to train the P2W model. This additional data enables the modular E2E ASR system to model not only the acoustic part but also the language part. During the decoding phase, the two models are integrated and act as a standard acoustic-to-word (A2W) model. In other words, the proposed modular E2E ASR system can be easily trained with extra text data and decoded in the same way as a standard E2E ASR system. Experimental results on the Switchboard corpus show that the modular E2E model achieves a better word error rate (WER) than standard A2W models.
Submitted 31 July, 2020;
originally announced August 2020.
-
TSIT: A Simple and Versatile Framework for Image-to-Image Translation
Authors:
Liming Jiang,
Changxu Zhang,
Mingyang Huang,
Chunxiao Liu,
Jianping Shi,
Chen Change Loy
Abstract:
We introduce a simple and versatile framework for image-to-image translation. We unearth the importance of normalization layers, and provide a carefully designed two-stream generative model with newly proposed feature transformations in a coarse-to-fine fashion. This allows multi-scale semantic structure information and style representation to be effectively captured and fused by the network, permitting our method to scale to various tasks in both unsupervised and supervised settings. No additional constraints (e.g., cycle consistency) are needed, contributing to a very clean and simple method. Multi-modal image synthesis with arbitrary style control is made possible. A systematic study compares the proposed method with several state-of-the-art task-specific baselines, verifying its effectiveness in both perceptual quality and quantitative evaluations.
Submitted 25 July, 2020; v1 submitted 23 July, 2020;
originally announced July 2020.
-
ICA-UNet: ICA Inspired Statistical UNet for Real-time 3D Cardiac Cine MRI Segmentation
Authors:
Tianchen Wang,
Xiaowei Xu,
Jinjun Xiong,
Qianjun Jia,
Haiyun Yuan,
Meiping Huang,
Jian Zhuang,
Yiyu Shi
Abstract:
Real-time cine magnetic resonance imaging (MRI) plays an increasingly important role in various cardiac interventions. In order to enable fast and accurate visual assistance, the temporal frames need to be segmented on-the-fly. However, state-of-the-art MRI segmentation methods are used either offline because of their high computational complexity, or in real-time but with significant accuracy loss and latency increase (causing visually noticeable lag). As such, they can hardly be adopted to assist visual guidance. In this work, inspired by a new interpretation of Independent Component Analysis (ICA) for learning, we propose a novel ICA-UNet for real-time 3D cardiac cine MRI segmentation. Experiments using the MICCAI ACDC 2017 dataset show that, compared with state-of-the-art methods, ICA-UNet not only achieves higher Dice scores, but also meets the real-time requirements for both throughput and latency (up to 12.6X reduction), enabling real-time guidance for cardiac interventions without visual lag.
Submitted 18 July, 2020;
originally announced July 2020.
-
I/Q Imbalance Aware Nonlinear Wireless-Powered Relaying of B5G Networks: Security and Reliability Analysis
Authors:
Xingwang Li,
Mengyan Huang,
Yuanwei Liu,
Varun G Menon,
Anand Paul,
Zhiguo Ding
Abstract:
Physical layer security is known as a promising paradigm to ensure security for beyond 5G (B5G) networks in the presence of eavesdroppers. In this paper, we elaborate on a tractable analysis framework to evaluate the reliability and security of wireless-powered decode-and-forward (DF) multi-relay networks. Nonlinear energy harvesters, in-phase and quadrature-phase imbalance (IQI), and channel estimation errors (CEEs) are taken into account in the considered system. To further improve the secure performance, two relay selection strategies are presented: 1) suboptimal relay selection (SRS); 2) optimal relay selection (ORS). Specifically, exact analytical expressions for the outage probability (OP) and the intercept probability (IP) are derived in closed form. For the IP, we consider that the eavesdropper can wiretap the signal from the source or the relay. In order to obtain more useful insights, we carry out the asymptotic analysis and obtain the diversity orders for the OP in the high signal-to-noise ratio (SNR) regime under non-ideal and ideal conditions. Numerical results show that: 1) Although the amplitude/phase mismatches of the transmitter (TX)/receiver (RX) limit the OP performance, they can enhance the IP performance; 2) A large number of relays yields better OP performance; 3) There are error floors for the OP because of the CEEs; 4) There is a trade-off between the OP and the IP that balances reliability and security.
Submitted 6 June, 2020;
originally announced June 2020.
-
LQG Graphon Mean Field Games: Analysis via Graphon Invariant Subspaces
Authors:
Shuang Gao,
Peter E. Caines,
Minyi Huang
Abstract:
This paper studies approximate solutions to large-scale linear quadratic stochastic games with homogeneous nodal dynamics parameters and heterogeneous network couplings within the graphon mean field game framework in [2]-[4]. A graphon time-varying dynamical system model is first formulated to study the finite and then the limit problems of linear quadratic Gaussian graphon mean field games (LQG-GMFG). The Nash equilibrium of the limit problem is then characterized by two coupled graphon time-varying dynamical systems. Sufficient conditions are established for the existence of a unique solution to the limit LQG-GMFG problem. For the computation of LQG-GMFG solutions, two methods are established and employed: one is based on fixed point iterations and the other on a decoupling operator Riccati equation. Furthermore, two corresponding sets of solutions are established based on spectral decompositions. Finally, a set of numerical simulations on networks associated with different types of graphons is presented.
Submitted 21 October, 2021; v1 submitted 1 April, 2020;
originally announced April 2020.
-
Multi-Cycle-Consistent Adversarial Networks for CT Image Denoising
Authors:
Jinglan Liu,
Yukun Ding,
Jinjun Xiong,
Qianjun Jia,
Meiping Huang,
Jian Zhuang,
Bike Xie,
Chun-Chen Liu,
Yiyu Shi
Abstract:
CT image denoising can be treated as an image-to-image translation task where the goal is to learn the transform between a source domain $X$ (noisy images) and a target domain $Y$ (clean images). Recently, cycle-consistent adversarial denoising network (CCADN) has achieved state-of-the-art results by enforcing cycle-consistent loss without the need of paired training data. Our detailed analysis of CCADN raises a number of interesting questions. For example, if the noise is large leading to significant difference between domain $X$ and domain $Y$, can we bridge $X$ and $Y$ with an intermediate domain $Z$ such that both the denoising process between $X$ and $Z$ and that between $Z$ and $Y$ are easier to learn? As such intermediate domains lead to multiple cycles, how do we best enforce cycle-consistency? Driven by these questions, we propose a multi-cycle-consistent adversarial network (MCCAN) that builds intermediate domains and enforces both local and global cycle-consistency. The global cycle-consistency couples all generators together to model the whole denoising process, while the local cycle-consistency imposes effective supervision on the process between adjacent domains. Experiments show that both local and global cycle-consistency are important for the success of MCCAN, which outperforms the state-of-the-art.
Submitted 27 February, 2020;
originally announced February 2020.
-
Mean-Field Transmission Power Control in Dense Networks, Part II -- Social Welfare Evaluation
Authors:
Yuchi Wu,
Junfeng Wu,
Minyi Huang,
Ling Shi
Abstract:
We consider uplink power control in wireless communication when a massive number of users compete over the channel resources. In Part I, we formulated the massive transmission power control contest in a mean-field game framework. In this part, our goal is to investigate whether the power-domain non-orthogonal multiple access (NOMA) protocol can regulate the non-cooperative channel access behaviors, i.e., steer the competition among the non-cooperative users in a direction with improved efficiency and fairness. It is compared with the CDMA protocol, which drives each user to compete fiercely against the population, so that the efficiency of channel usage is sacrificed. The existence and uniqueness of an equilibrium strategy under CDMA and NOMA have already been characterized in Part I. In this paper, we adopt the social welfare of the population as the performance metric, which is defined as the expectation of utility over the distribution of different types of channel users. It is shown that under the corresponding equilibrium strategies, NOMA outperforms CDMA in the social welfare achieved, which is illustrated through simulation with different unit prices for power consumption. Moreover, it can be observed from numerical results that NOMA can improve the fairness of the achieved data rates among different users.
Submitted 8 May, 2020; v1 submitted 13 November, 2019;
originally announced November 2019.
-
Mean-Field Transmission Power Control in Dense Networks
Authors:
Yuchi Wu,
Junfeng Wu,
Minyi Huang,
Ling Shi
Abstract:
We consider uplink power control in wireless communication when a large number of users compete over the channel resources. The CDMA protocol, as a supporting technology of 3G networks accommodating signals from different sources over the code domain, represents the orthogonal multiple access (OMA) techniques. With the development of 5G wireless networks, non-orthogonal multiple access (NOMA) is introduced to improve the efficiency of channel allocation. Our goal is to investigate whether the power-domain NOMA protocol can improve performance when the users interact with each other in a non-cooperative manner. It is compared with the CDMA protocol, where the fierce competition among users jeopardizes the efficiency of channel usage. In this work, we conduct the analysis with an aggregative game model, and show the existence and uniqueness of an equilibrium strategy. Next, we adopt the social welfare of the population as the performance metric, which is the average utility achieved by the user population. It is shown that under the corresponding equilibrium strategies, NOMA outperforms CDMA through higher efficiency of channel access for uplink communications.
Submitted 28 November, 2020; v1 submitted 13 November, 2019;
originally announced November 2019.
-
MSU-Net: Multiscale Statistical U-Net for Real-time 3D Cardiac MRI Video Segmentation
Authors:
Tianchen Wang,
Jinjun Xiong,
Xiaowei Xu,
Meng Jiang,
Yiyu Shi,
Haiyun Yuan,
Meiping Huang,
Jian Zhuang
Abstract:
Cardiac magnetic resonance imaging (MRI) is an essential tool for MRI-guided surgery and real-time intervention. In practice, the MRI videos are expected to be segmented on-the-fly. However, existing segmentation methods suffer from drastic accuracy loss when modified for speedup. In this work, we propose Multiscale Statistical U-Net (MSU-Net) for real-time 3D MRI video segmentation in cardiac surgical guidance. Our idea is to model the input samples as multiscale canonical form distributions for speedup, while the spatio-temporal correlation is still fully utilized. A parallel statistical U-Net is then designed to efficiently process these distributions. The fast data sampling and efficient parallel structure of MSU-Net enable fast and accurate inference. Compared with vanilla U-Net and a modified state-of-the-art method, GridNet, our method achieves up to 268% and 237% speedup with 1.6% and 3.6% increased Dice scores.
Submitted 14 September, 2019;
originally announced September 2019.
-
Accurate Congenital Heart Disease Model Generation for 3D Printing
Authors:
Xiaowei Xu,
Tianchen Wang,
Dewen Zeng,
Yiyu Shi,
Qianjun Jia,
Haiyun Yuan,
Meiping Huang,
Jian Zhuang
Abstract:
3D printing has been widely adopted for clinical decision making and interventional planning of congenital heart disease (CHD), while whole heart and great vessel segmentation is the most significant but time-consuming step in the model generation for 3D printing. While various automatic whole heart and great vessel segmentation frameworks have been developed in the literature, they are ineffective when applied to medical images in CHD, which have significant variations in heart structure and great vessel connections. To address the challenge, we leverage the power of deep learning in processing regular structures and that of graph algorithms in dealing with large variations, and propose a framework that combines both for whole heart and great vessel segmentation in CHD. In particular, we first use deep learning to segment the four chambers and myocardium, followed by the blood pool, where variations are usually small. We then extract the connection information and apply graph matching to determine the categories of all the vessels. Experimental results using 68 3D CT images covering 14 types of CHD show that our method can increase the Dice score by 11.9% on average compared with the state-of-the-art whole heart and great vessel segmentation method in normal anatomy. The segmentation results are also printed out using 3D printers for validation.
Submitted 11 July, 2019; v1 submitted 6 July, 2019;
originally announced July 2019.
-
Deep Transfer Learning for Cross-domain Activity Recognition
Authors:
Jindong Wang,
Vincent W. Zheng,
Yiqiang Chen,
Meiyu Huang
Abstract:
Human activity recognition plays an important role in people's daily life. However, it is often expensive and time-consuming to acquire sufficient labeled activity data. To solve this problem, transfer learning leverages the labeled samples from the source domain to annotate the target domain, which has few or no labels. Unfortunately, when there are several source domains available, it is difficult to select the right source domains for transfer. The right source domain is the one whose properties are most similar to those of the target domain, and this higher similarity facilitates transfer learning. Choosing the right source domain helps the algorithm perform well and prevents negative transfer. In this paper, we propose an effective Unsupervised Source Selection algorithm for Activity Recognition (USSAR). USSAR is able to select the most similar $K$ source domains from a list of available domains. After this, we propose an effective Transfer Neural Network to perform knowledge transfer for Activity Recognition (TNNAR). TNNAR can capture both the temporal and spatial relationships between activities while transferring knowledge. Experiments on three public activity recognition datasets demonstrate that: 1) The USSAR algorithm is effective in selecting the best source domains. 2) The TNNAR method can reach high accuracy when performing activity knowledge transfer.
Submitted 19 August, 2018; v1 submitted 20 July, 2018;
originally announced July 2018.