Search | arXiv e-print repository

LFPLM: A General and Flexible Load Forecasting Framework based on Pre-trained Language Model

Authors: Mingyang Gao, Suyang Zhou, Wei Gu, Zhi Wu, Zijian Hu, Hong Zhu, Haiquan Liu

Abstract: Accurate load forecasting is essential for maintaining the power balance between generators and consumers, especially with the increasing integration of renewable energy sources, which introduce significant intermittent volatility. With the development of data-driven methods, machine learning and deep learning-based models have become the predominant approach for load forecasting tasks. In recent years, pre-trained language models (PLMs) have made significant advancements, demonstrating superior performance in various fields. This paper proposes a load forecasting method based on PLMs, which offers not only accurate predictive ability but also general and flexible applicability. Additionally, a data modeling method is proposed to effectively transform load sequence data into natural language for PLM training. Furthermore, we introduce a data enhancement strategy that eliminates the impact of PLM hallucinations on forecasting results. The effectiveness of the proposed method has been validated on two real-world datasets. Compared with existing methods, our approach shows state-of-the-art performance across all validation metrics.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 7 pages, 5 figures and 5 tables

arXiv:2405.04177 [pdf]

doi 10.1109/CICC60959.2024.10528987

A 49.8mm² Fully Integrated, 1.5m Transmission-Range, High-Data-Rate IR-UWB Transmitter for Brain Implants

Authors: Cong Ding, Mingxiang Gao, Anja K. Skrivervik, Mahsa Shoaran

Abstract: To address the challenge of extending the transmission range of implantable TXs while also minimizing their size and power consumption, this paper introduces a transcutaneous, high data-rate, fully integrated IR-UWB transmitter that employs a novel co-designed power amplifier (PA) and antenna interface for enhanced performance. With the co-designed interface, we achieved the smallest footprint of 49.8mm² and the longest transmission range of 1.5m compared to the state-of-the-art IR-UWB TXs.

Submitted 7 May, 2024; originally announced May 2024.

Journal ref: 2024 IEEE Custom Integrated Circuits Conference (CICC)

arXiv:2404.13941 [pdf, other]

Autoencoder-assisted Feature Ensemble Net for Incipient Faults

Authors: Mingxuan Gao, Min Wang, Maoyin Chen

Abstract: Deep learning has shown great power in the field of fault detection. However, for incipient faults with tiny amplitude, the detection performance of the current deep learning networks (DLNs) is not satisfactory. Even if prior information about the faults is utilized, DLNs can't successfully detect faults 3, 9 and 15 in Tennessee Eastman process (TEP). These faults are notoriously difficult to detect, lacking effective detection technologies in the field of fault detection. In this work, we propose Autoencoder-assisted Feature Ensemble Net (AE-FENet): a deep feature ensemble framework that uses the unsupervised autoencoder to conduct the feature transformation. Compared with the principal component analysis (PCA) technique adopted in the original Feature Ensemble Net (FENet), the autoencoder can mine more exact features on incipient faults, which results in better detection performance of AE-FENet. With the same kinds of basic detectors, AE-FENet achieves a state-of-the-art average accuracy over 96% on faults 3, 9 and 15 in TEP, which represents a significant enhancement in performance compared to other methods. Plenty of experiments have been done to extend our framework, proving that DLNs can be utilized efficiently within this architecture.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.09436 [pdf]

Image Reconstruction with B0 Inhomogeneity using an Interpretable Deep Unrolled Network on an Open-bore MRI-Linac

Authors: Shanshan Shan, Yang Gao, David E. J. Waddington, Hongli Chen, Brendan Whelan, Paul Z. Y. Liu, Yaohui Wang, Chunyi Liu, Hongping Gan, Mingyuan Gao, Feng Liu

Abstract: MRI-Linac systems require fast image reconstruction with high geometric fidelity to localize and track tumours for radiotherapy treatments. However, B0 field inhomogeneity distortions and slow MR acquisition potentially limit the quality of the image guidance and tumour treatments. In this study, we develop an interpretable unrolled network, referred to as RebinNet, to reconstruct distortion-free images from B0 inhomogeneity-corrupted k-space for fast MRI-guided radiotherapy applications. RebinNet includes convolutional neural network (CNN) blocks to perform image regularizations and nonuniform fast Fourier Transform (NUFFT) modules to incorporate B0 inhomogeneity information. The RebinNet was trained on a publicly available MR dataset from eleven healthy volunteers for both fully sampled and subsampled acquisitions. Grid phantom and human brain images acquired from an open-bore 1T MRI-Linac scanner were used to evaluate the performance of the proposed network. The RebinNet was compared with the conventional regularization algorithm and our recently developed UnUNet method in terms of root mean squared error (RMSE), structural similarity (SSIM), residual distortions, and computation time. Imaging results demonstrated that the RebinNet reconstructed images with the lowest RMSE (<0.05) and highest SSIM (>0.92) at four-fold acceleration for simulated brain images. The RebinNet could better preserve structural details and substantially improve the computational efficiency (ten-fold faster) compared to the conventional regularization methods, and had better generalization ability than the UnUNet method. The proposed RebinNet can achieve rapid image reconstruction and overcome the B0 inhomogeneity distortions simultaneously, which would facilitate accurate and fast image guidance in radiotherapy treatments.

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.09226 [pdf, other]

Breast Cancer Image Classification Method Based on Deep Transfer Learning

Authors: Weimin Wang, Min Gao, Mingxuan Xiao, Xu Yan, Yufeng Li

Abstract: To address the issues of limited samples, time-consuming feature design, and low accuracy in detection and classification of breast cancer pathological images, a breast cancer image classification model algorithm combining deep learning and transfer learning is proposed. This algorithm is based on the DenseNet structure of deep neural networks, and constructs a network model by introducing attention mechanisms, and trains the enhanced dataset using multi-level transfer learning. Experimental results demonstrate that the algorithm achieves an efficiency of over 84.0% in the test set, with a significantly improved classification accuracy compared to previous models, making it applicable to medical breast cancer detection tasks.

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.08713 [pdf, other]

Survival Prediction Across Diverse Cancer Types Using Neural Networks

Authors: Xu Yan, Weimin Wang, MingXuan Xiao, Yufeng Li, Min Gao

Abstract: Gastric cancer and colon adenocarcinoma represent widespread and challenging malignancies with high mortality rates and complex treatment landscapes. In response to the critical need for accurate prognosis in cancer patients, the medical community has embraced the 5-year survival rate as a vital metric for estimating patient outcomes. This study introduces a pioneering approach to enhance survival prediction models for gastric cancer and colon adenocarcinoma patients. Leveraging advanced image analysis techniques, we sliced whole slide images (WSI) of these cancers, extracting comprehensive features to capture nuanced tumor characteristics. Subsequently, we constructed patient-level graphs, encapsulating intricate spatial relationships within tumor tissues. These graphs served as inputs for a sophisticated 4-layer graph convolutional neural network (GCN), designed to exploit the inherent connectivity of the data for comprehensive analysis and prediction. By integrating patients' total survival time and survival status, we computed C-index values for gastric cancer and colon adenocarcinoma, yielding 0.57 and 0.64, respectively. Significantly surpassing previous convolutional neural network models, these results underscore the efficacy of our approach in accurately predicting patient survival outcomes. This research holds profound implications for both the medical and AI communities, offering insights into cancer biology and progression while advancing personalized treatment strategies. Ultimately, our study represents a significant stride in leveraging AI-driven methodologies to revolutionize cancer prognosis and improve patient outcomes on a global scale.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.08279 [pdf, other]

Convolutional neural network classification of cancer cytopathology images: taking breast cancer as an example

Authors: MingXuan Xiao, Yufeng Li, Xu Yan, Min Gao, Weimin Wang

Abstract: Breast cancer is a relatively common cancer among gynecological cancers. Its diagnosis often relies on the pathology of cells in the lesion. The pathological diagnosis of breast cancer not only requires professionals and time, but also sometimes involves subjective judgment. To address the challenges of dependence on pathologists' expertise and the time-consuming nature of achieving accurate breast pathological image classification, this paper introduces an approach utilizing convolutional neural networks (CNNs) for the rapid categorization of pathological images, aiming to enhance the efficiency of breast pathological image detection. The approach enables the rapid and automatic classification of pathological images into benign and malignant groups. The methodology involves utilizing a convolutional neural network (CNN) model leveraging the Inceptionv3 architecture and transfer learning algorithm for extracting features from pathological images. Classification is then performed by a neural network with fully connected layers and the SoftMax function. Additionally, the concept of image partitioning is introduced to handle high-resolution images. To achieve the ultimate classification outcome, the classification probabilities of each image block are aggregated using three algorithms: summation, product, and maximum. Experimental validation was conducted on the BreaKHis public dataset, resulting in accuracy rates surpassing 0.92 across all four magnification coefficients (40X, 100X, 200X, and 400X). It demonstrates that the proposed method effectively enhances the accuracy in classifying pathological images of breast cancer.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2401.12004 [pdf]

NLCG-Net: A Model-Based Zero-Shot Learning Framework for Undersampled Quantitative MRI Reconstruction

Authors: Xinrui Jiang, Yohan Jun, Jaejin Cho, Mengze Gao, Xingwang Yong, Berkin Bilgic

Abstract: Typical quantitative MRI (qMRI) methods estimate parameter maps after image reconstruction, which is prone to biases and error propagation. We propose a Nonlinear Conjugate Gradient (NLCG) optimizer for model-based T2/T1 estimation, which incorporates U-Net regularization trained in a scan-specific manner. This end-to-end method directly estimates qMRI maps from undersampled k-space data using mono-exponential signal modeling with zero-shot scan-specific neural network regularization to enable high fidelity T1 and T2 mapping. T2 and T1 mapping results demonstrate the ability of the proposed NLCG-Net to improve estimation quality compared to subspace reconstruction at high accelerations.

Submitted 22 January, 2024; originally announced January 2024.

Comments: 8 pages, 5 figures, submitted to International Society for Magnetic Resonance in Medicine 2024

arXiv:2312.09488 [pdf]

Sequence adaptive field-imperfection estimation (SAFE): retrospective estimation and correction of $B_1^+$ and $B_0$ inhomogeneities for enhanced MRF quantification

Authors: Mengze Gao, Xiaozhi Cao, Daniel Abraham, Zihan Zhou, Kawin Setsompop

Abstract: $B_1^+$ and $B_0$ field-inhomogeneities can significantly reduce accuracy and robustness of MRF's quantitative parameter estimates. Additional $B_1^+$ and $B_0$ calibration scans can mitigate this but add scan time and cannot be applied retrospectively to previously collected data. Here, we propose a calibration-free sequence-adaptive deep-learning framework to estimate and correct for $B_1^+$ and $B_0$ effects of any MRF sequence. We demonstrate its capability on arbitrary MRF sequences at 3T, where no training data were previously obtained. Such an approach can be applied to any previously-acquired and future MRF-scans. The flexibility in directly applying this framework to other quantitative sequences is also highlighted.

Submitted 14 December, 2023; originally announced December 2023.

Comments: 12 pages, 5 figures, submitted to International Society for Magnetic Resonance in Medicine 31th Scientific Meeting, 2024

arXiv:2311.16652 [pdf, other]

Augmenting x-ray single particle imaging reconstruction with self-supervised machine learning

Authors: Zhantao Chen, Cong Wang, Mingye Gao, Chun Hong Yoon, Jana B. Thayer, Joshua J. Turner

Abstract: The development of X-ray Free Electron Lasers (XFELs) has opened numerous opportunities to probe atomic structure and ultrafast dynamics of various materials. Single Particle Imaging (SPI) with XFELs enables the investigation of biological particles in their natural physiological states with unparalleled temporal resolution, while circumventing the need for cryogenic conditions or crystallization. However, reconstructing real-space structures from reciprocal-space x-ray diffraction data is highly challenging due to the absence of phase and orientation information, which is further complicated by weak scattering signals and considerable fluctuations in the number of photons per pulse. In this work, we present an end-to-end, self-supervised machine learning approach to recover particle orientations and estimate reciprocal space intensities from diffraction images only. Our method demonstrates great robustness under demanding experimental conditions with significantly enhanced reconstruction capabilities compared with conventional algorithms, and signifies a paradigm shift in SPI as currently practiced at XFELs.

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.08075 [pdf, ps, other]

GlanceSeg: Real-time microaneurysm lesion segmentation with gaze-map-guided foundation model for early detection of diabetic retinopathy

Authors: Hongyang Jiang, Mengdi Gao, Zirong Liu, Chen Tang, Xiaoqing Zhang, Shuai Jiang, Wu Yuan, Jiang Liu

Abstract: Early-stage diabetic retinopathy (DR) presents challenges in clinical diagnosis due to inconspicuous and minute microangioma lesions, resulting in limited research in this area. Additionally, the potential of emerging foundation models, such as the segment anything model (SAM), in medical scenarios remains rarely explored. In this work, we propose a human-in-the-loop, label-free early DR diagnosis framework called GlanceSeg, based on SAM. GlanceSeg enables real-time segmentation of microangioma lesions as ophthalmologists review fundus images. Our human-in-the-loop framework integrates the ophthalmologist's gaze map, allowing for rough localization of minute lesions in fundus images. Subsequently, a saliency map is generated based on the located region of interest, which provides prompt points to assist the foundation model in efficiently segmenting microangioma lesions. Finally, a domain knowledge filter refines the segmentation of minute lesions. We conducted experiments on two newly-built public datasets, i.e., IDRiD and Retinal-Lesions, and validated the feasibility and superiority of GlanceSeg through visualized illustrations and quantitative measures. Additionally, we demonstrated that GlanceSeg improves annotation efficiency for clinicians and enhances segmentation performance through fine-tuning using annotations. This study highlights the potential of GlanceSeg-based annotations for self-model optimization, leading to enduring performance advancements through continual learning.

Submitted 14 November, 2023; originally announced November 2023.

Comments: 12 pages, 10 figures

arXiv:2310.15930 [pdf, other]

CDSD: Chinese Dysarthria Speech Database

Authors: Mengyi Sun, Ming Gao, Xinchen Kang, Shiru Wang, Jun Du, Dengfeng Yao, Su-Jing Wang

Abstract: We present the Chinese Dysarthria Speech Database (CDSD) as a valuable resource for dysarthria research. This database comprises speech data from 24 participants with dysarthria. Each participant recorded one hour of speech, and one of them recorded an additional 10 hours, resulting in 34 hours of speech material. To accommodate participants with varying cognitive levels, our text pool primarily consists of content from the AISHELL-1 dataset and speeches by primary and secondary school students. When participants read these texts, they must use a mobile device or the ZOOM F8n multi-track field recorder to record their speeches. In this paper, we elucidate the data collection and annotation processes and present an approach for establishing a baseline for dysarthric speech recognition. Furthermore, we conducted a speaker-dependent dysarthric speech recognition experiment using an additional 10 hours of speech data from one of our participants. Our research findings indicate that, through extensive data-driven model training, fine-tuning limited quantities of specific individual data yields commendable results in speaker-dependent dysarthric speech recognition. However, we observe significant variations in recognition results among different dysarthric speakers. These insights provide valuable reference points for speaker-dependent dysarthric speech recognition.

Submitted 24 October, 2023; originally announced October 2023.

Comments: 9 pages, 3 figures

arXiv:2310.12429 [pdf, other]

Reconfigurable Intelligent Surface Assisted High-Speed Train Communications: Coverage Performance Analysis and Placement Optimization

Authors: Changzhu Liu, Ruisi He, Yong Niu, Zhu Han, Bo Ai, Meilin Gao, Zhangfeng Ma, Gongpu Wang, Zhangdui Zhong

Abstract: Reconfigurable intelligent surface (RIS) emerges as an efficient and promising technology for next-generation wireless networks and has attracted a lot of attention owing to the capability of extending wireless coverage by reflecting signals toward targeted receivers. In this paper, we consider a RIS-assisted high-speed train (HST) communication system to enhance wireless coverage and improve coverage probability. First, coverage performance of the downlink single-input-single-output system is investigated, and the closed-form expression of coverage probability is derived. Moreover, a travel distance maximization problem, subject to a coverage probability constraint, is formulated to facilitate RIS discrete phase design and RIS placement optimization. Simulation results validate that better coverage performance and higher travel distance can be achieved with deployment of RIS. The impacts of some key system parameters including transmission power, signal-to-noise ratio threshold, number of RIS elements, number of RIS quantization bits, horizontal distance between base station and RIS, and speed of HST on system performance are investigated. In addition, it is found that RIS can well improve coverage probability with limited power consumption for HST communications.

Submitted 18 October, 2023; originally announced October 2023.

Comments: 14 figures, accepted by IEEE Transactions on Vehicular Technology

arXiv:2305.12311 [pdf, other]

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

Authors: Ziyi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Mei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang

Abstract: The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence; however, the current Vision-Language-Speech landscape is dominated by encoder-only models which lack generative abilities. We propose closing this gap with i-Code V2, the first model capable of generating natural language from any combination of Vision, Language, and Speech data. i-Code V2 is an integrative system that leverages state-of-the-art single-modality encoders, combining their outputs with a new modality-fusing encoder in order to flexibly project combinations of modalities into a shared representational space. Next, language tokens are generated from these representations via an autoregressive decoder. The whole framework is pretrained end-to-end on a large collection of dual- and single-modality datasets using a novel text completion objective that can be generalized across arbitrary combinations of modalities. i-Code V2 matches or outperforms state-of-the-art single- and dual-modality baselines on 7 multimodal tasks, demonstrating the power of generative multimodal pretraining across a diversity of tasks and signals.

Submitted 20 May, 2023; originally announced May 2023.

arXiv:2304.12719 [pdf, ps, other]

Eye tracking guided deep multiple instance learning with dual cross-attention for fundus disease detection

Authors: Hongyang Jiang, Jingqi Huang, Chen Tang, Xiaoqing Zhang, Mengdi Gao, Jiang Liu

Abstract: Deep neural networks (DNNs) have promoted the development of computer aided diagnosis (CAD) systems for fundus diseases, helping ophthalmologists reduce missed diagnosis and misdiagnosis rates. However, the majority of CAD systems are data-driven but lack medical prior knowledge, which can be performance-friendly. In this regard, we innovatively proposed a human-in-the-loop (HITL) CAD system by leveraging ophthalmologists' eye-tracking information, which is more efficient and accurate. Concretely, the HITL CAD system was implemented on multiple instance learning (MIL), where eye-tracking gaze maps were beneficial to cherry-pick diagnosis-related instances. Furthermore, the dual-cross-attention MIL (DCAMIL) network was utilized to curb the adverse effects of noisy instances. Meanwhile, both a sequence augmentation module and a domain adversarial module were introduced to enrich and standardize instances in the training bag, respectively, thereby enhancing the robustness of our method. We conduct comparative experiments on our newly constructed datasets (namely, AMD-Gaze and DR-Gaze), respectively for AMD and early DR detection. Rigorous experiments demonstrate the feasibility of our HITL CAD system and the superiority of the proposed DCAMIL, fully exploring the ophthalmologists' eye-tracking information. These investigations indicate that physicians' gaze maps, as medical prior knowledge, have the potential to contribute to the CAD systems of clinical diseases.

Submitted 25 April, 2023; originally announced April 2023.

Comments: 10 pages, 9 figures

MSC Class: none

arXiv:2212.00532 [pdf, other]

EBHI-Seg: A Novel Enteroscope Biopsy Histopathological Haematoxylin and Eosin Image Dataset for Image Segmentation Tasks

Authors: Liyu Shi, Xiaoyan Li, Weiming Hu, Haoyuan Chen, Jing Chen, Zizhen Fan, Minghe Gao, Yujie Jing, Guotao Lu, Deguo Ma, Zhiyu Ma, Qingtao Meng, Dechao Tang, Hongzan Sun, Marcin Grzegorzek, Shouliang Qi, Yueyang Teng, Chen Li

Abstract: Background and Purpose: Colorectal cancer is a common fatal malignancy, the fourth most common cancer in men, and the third most common cancer in women worldwide. Timely detection of cancer in its early stages is essential for treating the disease. Currently, there is a lack of datasets for histopathological image segmentation of rectal cancer, which often hampers the assessment accuracy when computer technology is used to aid in diagnosis. Methods: This present study provided a new publicly available Enteroscope Biopsy Histopathological Hematoxylin and Eosin Image Dataset for Image Segmentation Tasks (EBHI-Seg). To demonstrate the validity and extensiveness of EBHI-Seg, the experimental results for EBHI-Seg are evaluated using classical machine learning methods and deep learning methods. Results: The experimental results showed that deep learning methods had a better image segmentation performance when utilizing EBHI-Seg. The maximum accuracy of the Dice evaluation metric for the classical machine learning method is 0.948, while the Dice evaluation metric for the deep learning method is 0.965. Conclusion: This publicly available dataset contained 5,170 images of six types of tumor differentiation stages and the corresponding ground truth images. The dataset can provide researchers with new segmentation algorithms for medical diagnosis of colorectal cancer, which can be used in the clinical setting to help doctors and patients.

Submitted 6 December, 2022; v1 submitted 1 December, 2022; originally announced December 2022.

arXiv:2212.00337 [pdf, other]

Fault Models in Superconducting quantum circuits

Authors: Qifan Huang, Boxi Li, Minbo Gao, Mingsheng Ying

Abstract: Fault models are indispensable for many EDA tasks, as well as for the design and implementation of quantum hardware. In this article, we propose a fault model for superconducting quantum systems. Our fault model reflects the real fault behavior in control signals and structure of quantum systems. Based on it, we conduct fault simulation on the controlled-Z gate and quantum circuits using QuTiP. We provide fidelity benchmarks for incoherent faults and test patterns of minimal test repetitions for coherent faults. Results show that with 34 test repetitions a 10% control noise can be detected, which helps to save test time and memory.

Submitted 1 December, 2022; originally announced December 2022.

Comments: 7 pages, 10 figures

arXiv:2205.13133 [pdf, other]

Coverage Probability Analysis of RIS-Assisted High-Speed Train Communications

Authors: Changzhu Liu, Ruisi He, Yong Niu, Bo Ai, Zhu Han, Zhangfeng Ma, Meilin Gao, Zhangdui Zhong, Ning Wang

Abstract: Reconfigurable intelligent surface (RIS) has received increasing attention due to its capability of extending cell coverage by reflecting signals toward receivers. This paper considers a RIS-assisted high-speed train (HST) communication system to improve the coverage probability. We derive the closed-form expression of coverage probability. Moreover, we analyze impacts of some key system parameters, including transmission power, signal-to-noise ratio threshold, and horizontal distance between base station and RIS. Simulation results verify the efficiency of RIS-assisted HST communications in terms of coverage probability.

Submitted 25 May, 2022; originally announced May 2022.

Comments: 6 pages, 6 figures, submitted to GlobeCom 2022

arXiv:2205.11743 [pdf]

doi 10.1016/j.engappai.2023.106060

Demand Response Method Considering Multiple Types of Flexible Loads in Industrial Parks

Authors: Jia Cui, Mingze Gao, Xiaoming Zhou, Yang Li, Wei Liu, Jiazheng Tian, Ximing Zhang

Abstract: With the rapid development of the energy internet, the proportion of flexible loads in the smart grid is getting much higher than before. It is highly important to model flexible loads based on demand response. Therefore, a new demand response method considering multiple flexible loads is proposed in this paper to characterize the integrated demand response (IDR) resources. Firstly, a physical process analytical deduction (PPAD) model is proposed to improve the classification of flexible loads in industrial parks. Scenario generation, data point augmentation, and smooth curves under various operating conditions are considered to enhance the applicability of the model. Secondly, in view of the strong volatility and poor modeling effect of Wasserstein-generative adversarial networks (WGAN), an improved WGAN-gradient penalty (IWGAN-GP) model is developed to achieve faster convergence than traditional WGAN and generate higher-quality samples. Finally, the PPAD and IWGAN-GP models are jointly implemented to reveal the degree of correlation between flexible loads. Meanwhile, an intelligent offline database is built to deal with the impact of nonlinear factors in different response scenarios. Numerical examples have been performed with the results proving that the proposed method is significantly better than the existing technologies in reducing load modeling deviation and improving the responsiveness of park loads.

Submitted 23 May, 2022; originally announced May 2022.

Comments: Submitted to Expert Systems with Applications

Journal ref: Engineering Applications of Artificial Intelligence 122 (2023) 106060

arXiv:2205.01818 [pdf, other]

i-Code: An Integrative and Composable Multimodal Learning Framework

Authors: Ziyi Yang, Yuwei Fang, Chenguang Zhu, Reid Pryzant, Dongdong Chen, Yu Shi, Yichong Xu, Yao Qian, Mei Gao, Yi-Ling Chen, Liyang Lu, Yujia Xie, Robert Gmyr, Noel Codella, Naoyuki Kanda, Bin Xiao, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang

Abstract: Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview. Most current pretraining methods, however, are limited to one or two modalities. We present i-Code, a self-supervised pretraining framework where users may flexibly combine the modalities of vision, speech, and language into unified and general-purpose vector representations. In this framework, data from each modality are first given to pretrained single-modality encoders. The encoder outputs are then integrated with a multimodal fusion network, which uses novel attention mechanisms and other architectural innovations to effectively combine information from the different modalities. The entire system is pretrained end-to-end with new objectives including masked modality unit modeling and cross-modality contrastive learning. Unlike previous research using only video for pretraining, the i-Code framework can dynamically process single, dual, and triple-modality data during training and inference, flexibly projecting different combinations of modalities into a single representation space. Experimental results demonstrate how i-Code can outperform state-of-the-art techniques on five video understanding tasks and the GLUE NLP benchmark, improving by as much as 11% and demonstrating the power of integrative multimodal pretraining.

Submitted 5 May, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

arXiv:2203.03737 [pdf, other]

Battery Cloud with Advanced Algorithms

Authors: Xiaojun Li, David Jauernig, Mengzhu Gao, Trevor Jones

Abstract: A Battery Cloud or cloud battery management system leverages the cloud computational power and data storage to improve battery safety, performance, and economy. This work will present the Battery Cloud that collects measured battery data from electric vehicles and energy storage systems. Advanced algorithms are applied to improve battery performance. Using remote vehicle data, we train and validate an artificial neural network to estimate pack SOC during vehicle charging. The strategy is then tested on vehicles. Furthermore, high accuracy and onboard battery state of health estimation methods for electric vehicles are developed based on the differential voltage (DVA) and incremental capacity analysis (ICA). Using cycling data from battery cells at various temperatures, we extract the charging cycles and calculate the DVA and ICA curves, from which multiple features are extracted, analyzed, and eventually used to estimate the state of health. For battery safety, a data-driven thermal anomaly detection method is developed. The method can detect unforeseen anomalies such as thermal runaways at the very early stage. With the further development of the internet of things, more and more battery data will be available. Potential applications of battery cloud also include areas such as battery manufacture, recycling, and electric vehicle battery swap.

Submitted 12 May, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

arXiv:2112.03815 [pdf]

Accurate parameter estimation using scan-specific unsupervised deep learning for relaxometry and MR fingerprinting

Authors: Mengze Gao, Huihui Ye, Tae Hyung Kim, Zi**g Zhang, Seohee So, Berkin Bilgic

Abstract: We propose an unsupervised convolutional neural network (CNN) for relaxation parameter estimation. This network incorporates signal relaxation and Bloch simulations while taking advantage of residual learning and spatial relations across neighboring voxels. Quantification accuracy and robustness to noise are shown to be significantly improved compared to standard parameter estimation methods in numerical simulations and in vivo data for multi-echo T2 and T2* mapping. The combination of the proposed network with subspace modeling and MR fingerprinting (MRF) from highly undersampled data permits high quality T1 and T2 mapping.

Submitted 12 December, 2021; v1 submitted 7 December, 2021; originally announced December 2021.

Comments: 7 pages, 5 figures, submitted to International Society for Magnetic Resonance in Medicine 2022

arXiv:2111.15636 [pdf]

Generating gapless land surface temperature with a high spatio-temporal resolution by fusing multi-source satellite-observed and model-simulated data

Authors: Jun Ma, Huanfeng Shen, Penghai Wu, **gan Wu, Meiling Gao, Chunlei Meng

Abstract: Land surface temperature (LST) is a key parameter when monitoring land surface processes. However, cloud contamination and the tradeoff between the spatial and temporal resolutions greatly impede the access to high-quality thermal infrared (TIR) remote sensing data. Despite the massive efforts made to solve these dilemmas, it is still difficult to generate LST estimates with concurrent spatial completeness and a high spatio-temporal resolution. Land surface models (LSMs) can be used to simulate gapless LST with a high temporal resolution, but this usually comes with a low spatial resolution. In this paper, we present an integrated temperature fusion framework for satellite-observed and LSM-simulated LST data to map gapless LST at a 60-m spatial resolution and half-hourly temporal resolution. The global linear model (GloLM) and the diurnal land surface temperature cycle (DTC) model are respectively performed as preprocessing steps for sensor and temporal normalization between the different LST data. The Landsat LST, Moderate Resolution Imaging Spectroradiometer (MODIS) LST, and Community Land Model Version 5.0 (CLM 5.0)-simulated LST are then fused using a filter-based spatio-temporal integrated fusion model. Evaluations were implemented in an urban-dominated region (the city of Wuhan in China) and a natural-dominated region (the Heihe River Basin in China), in terms of accuracy, spatial variability, and diurnal temporal dynamics. Results indicate that the fused LST is highly consistent with actual Landsat LST data (in situ LST measurements), in terms of a Pearson correlation coefficient of 0.94 (0.97-0.99), a mean absolute error of 0.71-0.98 K (0.82-3.17 K), and a root-mean-square error of 0.97-1.26 K (1.09-3.97 K).

Submitted 28 November, 2021; originally announced November 2021.

arXiv:2109.01949 [pdf, other]

Improving Joint Learning of Chest X-Ray and Radiology Report by Word Region Alignment

Authors: Zhanghexuan Ji, Mohammad Abuzar Shaikh, Dana Moukheiber, Sargur Srihari, Yifan Peng, Mingchen Gao

Abstract: Self-supervised learning provides an opportunity to explore unlabeled chest X-rays and their associated free-text reports accumulated in clinical routine without manual supervision. This paper proposes a Joint Image Text Representation Learning Network (JoImTeRNet) for pre-training on chest X-ray images and their radiology reports. The model was pre-trained on both the global image-sentence level and the local image region-word level for visual-textual matching. Both are bidirectionally constrained on Cross-Entropy based and ranking-based Triplet Matching Losses. The region-word matching is calculated using the attention mechanism without direct supervision about their mapping. The pre-trained multi-modal representation learning paves the way for downstream tasks concerning image and/or text encoding. We demonstrate the representation learning quality by cross-modality retrievals and multi-label classifications on two datasets: OpenI-IU and MIMIC-CXR.

Submitted 4 September, 2021; originally announced September 2021.

Comments: 10 Pages, 1 Figure, 3 Tables, Accepted in 12th Machine Learning in Medical Imaging (MLMI 2021) workshop

arXiv:2108.03026 [pdf]

The Influence of Age and Gender Information on the Diagnosis of Diabetic Retinopathy: Based on Neural Networks

Authors: Long Bai, Sihang Chen, Mingyang Gao, Leila Abdelrahman, Manal Al Ghamdi, Mohamed Abdel-Mottaleb

Abstract: This paper examines the importance of age and gender information in the diagnosis of diabetic retinopathy. We utilized Deep Residual Neural Networks (ResNet) and Densely Connected Convolutional Networks (DenseNet), which are proven effective on image classification problems and the diagnosis of diabetic retinopathy using retinal fundus images. We used the ensemble of several classical networks and decentralized the training so that the network was simple and avoided overfitting. To observe whether the age and gender information could help enhance the performance, we added the information before the dense layer and compared the results with those obtained without age and gender information. We found that the test accuracy of the network with age and gender information was 2.67% higher than that of the network without age and gender information. Meanwhile, age information contributed more to the results than gender information.

Submitted 6 August, 2021; originally announced August 2021.

Comments: 4 pages, 4 figures, Accepted in 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE EMBC 2021

arXiv:2105.03358 [pdf, other]

Soft-Attention Improves Skin Cancer Classification Performance

Authors: Soumyya Kanti Datta, Mohammad Abuzar Shaikh, Sargur N. Srihari, Mingchen Gao

Abstract: In clinical applications, neural networks must focus on and highlight the most important parts of an input image. The Soft-Attention mechanism enables a neural network to achieve this goal. This paper investigates the effectiveness of Soft-Attention in deep neural architectures. The central aim of Soft-Attention is to boost the value of important features and suppress the noise-inducing features. We compare the performance of VGG, ResNet, InceptionResNetv2 and DenseNet architectures with and without the Soft-Attention mechanism, while classifying skin lesions. The original network when coupled with Soft-Attention outperforms the baseline [16] by 4.7% while achieving a precision of 93.7% on the HAM10000 dataset [25]. Additionally, Soft-Attention coupling improves the sensitivity score by 3.8% compared to the baseline [31] and achieves 91.6% on the ISIC-2017 dataset [2]. The code is publicly available at github.

Submitted 4 June, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

Comments: 8 pages, 9 figures, 4 tables

arXiv:2104.00086 [pdf, other]

An Online Survey on the Perception of Mediated Social Touch Interaction and Device Design

Authors: Carine Rognon, Taylor Bunge, Meiyuzi Gao, Chip Connor, Benjamin Stephens-Fripp, Casey Brown, Ali Israr

Abstract: Social touch is essential for our social interactions, communication, and well-being. It has been shown to reduce anxiety and loneliness; and is a key channel to transmit emotions for which words are not sufficient, such as love, sympathy, reassurance, etc. However, direct physical contact is not always possible due to being remotely located, interacting in a virtual environment, or as a result of a health issue. Mediated social touch enables physical interactions, despite the distance, by transmitting the haptic cues that constitute social touch through devices. As this technology is fairly new, the users' needs and their expectations on a device design and its features are unclear, as well as who would use this technology, and in which conditions. To better understand these aspects of the mediated interaction, we conducted an online survey on 258 respondents located in the USA. Results give insights on the type of interactions and device features that the US population would like to use.

Submitted 31 March, 2021; originally announced April 2021.

arXiv:2010.04049 [pdf, other]

doi 10.1007/978-3-030-59725-2_48

Hierarchical Classification of Pulmonary Lesions: A Large-Scale Radio-Pathomics Study

Authors: Jiancheng Yang, Mingze Gao, Kaiming Kuang, Bingbing Ni, Yunlang She, Dong Xie, Chang Chen

Abstract: Diagnosis of pulmonary lesions from computed tomography (CT) is important but challenging for clinical decision making in lung cancer related diseases. Deep learning has achieved great success in computer aided diagnosis (CADx) area for lung cancer, whereas it suffers from label ambiguity due to the difficulty in the radiological diagnosis. Considering that invasive pathological analysis serves as the clinical golden standard of lung cancer diagnosis, in this study, we solve the label ambiguity issue via a large-scale radio-pathomics dataset containing 5,134 radiological CT images with pathologically confirmed labels, including cancers (e.g., invasive/non-invasive adenocarcinoma, squamous carcinoma) and non-cancer diseases (e.g., tuberculosis, hamartoma). This retrospective dataset, named Pulmonary-RadPath, enables development and validation of accurate deep learning systems to predict invasive pathological labels with a non-invasive procedure, i.e., radiological CT scans. A three-level hierarchical classification system for pulmonary lesions is developed, which covers most diseases in cancer-related diagnosis. We explore several techniques for hierarchical classification on this dataset, and propose a Leaky Dense Hierarchy approach with proven effectiveness in experiments. Our study significantly outperforms prior arts in terms of data scales (6x larger), disease comprehensiveness and hierarchies. The promising results suggest the potentials to facilitate precision medicine.

Submitted 8 October, 2020; originally announced October 2020.

Comments: MICCAI 2020 (Early Accepted)

arXiv:2008.07263 [pdf, other]

MLBF-Net: A Multi-Lead-Branch Fusion Network for Multi-Class Arrhythmia Classification Using 12-Lead ECG

Authors: **g Zhang, Deng Liang, Ai** Liu, Min Gao, Xiang Chen, Xu Zhang, Xun Chen

Abstract: Automatic arrhythmia detection using 12-lead electrocardiogram (ECG) signal plays a critical role in early prevention and diagnosis of cardiovascular diseases. In the previous studies on automatic arrhythmia detection, most methods concatenated 12 leads of ECG into a matrix, and then input the matrix to a variety of feature extractors or deep neural networks for extracting useful information. Under such frameworks, these methods had the ability to extract comprehensive features (known as integrity) of 12-lead ECG since the information of each lead interacts with each other during training. However, the diverse lead-specific features (known as diversity) among 12 leads were neglected, causing inadequate information learning for 12-lead ECG. To maximize the information learning of multi-lead ECG, the information fusion of comprehensive features with integrity and lead-specific features with diversity should be taken into account. In this paper, we propose a novel Multi-Lead-Branch Fusion Network (MLBF-Net) architecture for arrhythmia classification by integrating multi-loss optimization to jointly learn the diversity and integrity of multi-lead ECG. MLBF-Net is composed of three components: 1) multiple lead-specific branches for learning the diversity of multi-lead ECG; 2) cross-lead features fusion by concatenating the output feature maps of all branches for learning the integrity of multi-lead ECG; 3) multi-loss co-optimization for all the individual branches and the concatenated network. We demonstrate our MLBF-Net on China Physiological Signal Challenge 2018, which is an open 12-lead ECG dataset. The experimental results show that MLBF-Net obtains an average $F_1$ score of 0.855, reaching the highest arrhythmia classification performance. The proposed method provides a promising solution for multi-lead ECG analysis from an information fusion perspective.

Submitted 17 August, 2020; originally announced August 2020.

arXiv:2007.00857 [pdf, other]

doi 10.1109/TVT.2020.3000757

Efficient Hybrid Beamforming with Anti-Blockage Design for High-Speed Railway Communications

Authors: Meilin Gao, Bo Ai, Yong Niu, Wen Wu, Peng Yang, Feng Lyu, Xuemin Shen

Abstract: Future railway is expected to accommodate both train operation services and passenger broadband services. The millimeter wave (mmWave) communication is a promising technology in providing multi-gigabit data rates to onboard users. However, mmWave communications suffer from severe propagation attenuation and vulnerability to blockage, which can be very challenging in high-speed railway (HSR) scenarios. In this paper, we investigate efficient hybrid beamforming (HBF) design for train-to-ground communications. First, we develop a two-stage HBF algorithm in blockage-free scenarios. In the first stage, the minimum mean square error method is adopted for optimal hybrid beamformer design with low complexity and fast convergence; in the second stage, the orthogonal matching pursuit method is utilized to approximately recover the analog and digital beamformers. Second, in blocked scenarios, we design an anti-blockage scheme by adaptively invoking the proposed HBF algorithm, which can efficiently deal with random blockages. Extensive simulation results are presented to show the sum rate performance of the proposed algorithms under various configurations, including transmission power, velocity of the train, blockage probability, etc. It is demonstrated that the proposed anti-blockage algorithm can improve the effective rate by 20% in severely-blocked scenarios while maintaining low outage probability.

Submitted 1 July, 2020; originally announced July 2020.

Comments: 11 Pages, 9 Figures

Journal ref: IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2020

arXiv:2004.08957 [pdf, other]

Reconstruction of high-resolution 6x6-mm OCT angiograms using deep learning

Authors: Min Gao, Yukun Guo, Tristan T. Hormel, Jiande Sun, Thomas Hwang, Yali Jia

Abstract: Typical optical coherence tomographic angiography (OCTA) acquisition areas on commercial devices are 3x3- or 6x6-mm. Compared to 3x3-mm angiograms with proper sampling density, 6x6-mm angiograms have significantly lower scan quality, with reduced signal-to-noise ratio and worse shadow artifacts due to undersampling. Here, we propose a deep-learning-based high-resolution angiogram reconstruction network (HARNet) to generate enhanced 6x6-mm superficial vascular complex (SVC) angiograms. The network was trained on data from 3x3-mm and 6x6-mm angiograms from the same eyes. The reconstructed 6x6-mm angiograms have significantly lower noise intensity and better vascular connectivity than the original images. The algorithm did not generate false flow signal at the noise level presented by the original angiograms. The image enhancement produced by our algorithm may improve biomarker measurements and qualitative clinical assessment of 6x6-mm OCTA.

Submitted 9 June, 2020; v1 submitted 19 April, 2020; originally announced April 2020.

arXiv:1911.02014 [pdf, other]

Scribble-based Hierarchical Weakly Supervised Learning for Brain Tumor Segmentation

Authors: Zhanghexuan Ji, Yan Shen, Chunwei Ma, Mingchen Gao

Abstract: The recent state-of-the-art deep learning methods have significantly improved brain tumor segmentation. However, fully supervised training requires a large amount of manually labeled masks, which is highly time-consuming and needs domain expertise. Weakly supervised learning with scribbles provides a good trade-off between model accuracy and the effort of manual labeling. However, for segmenting the hierarchical brain tumor structures, manually labeling scribbles for each substructure could still be demanding. In this paper, we use only two kinds of weak labels, i.e., scribbles on whole tumor and healthy brain tissue, and global labels for the presence of each substructure, to train a deep learning model to segment all the sub-regions. Specifically, we train two networks in two phases: first, we only use whole tumor scribbles to train a whole tumor (WT) segmentation network, which roughly recovers the WT mask of training data; then we cluster the WT region with the guide of global labels. The rough substructure segmentation from clustering is used as weak labels to train the second network. The dense CRF loss is used to refine the weakly supervised segmentation. We evaluate our approach on the BraTS2017 dataset and achieve competitive WT dice score as well as comparable scores on substructure segmentation compared to an upper bound when trained with fully annotated masks.

Submitted 5 November, 2019; originally announced November 2019.

Comments: Accepted at the 22nd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2019)

arXiv:1905.04711 [pdf]

doi 10.1038/s41524-020-00392-6

Data augmentation in microscopic images for material data mining

Authors: Boyuan Ma, Xiaoyan Wei, Chuni Liu, Xiaojuan Ban, Haiyou Huang, Hao Wang, Weihua Xue, Stephen Wu, Mingfei Gao, Qing Shen, Adnan Omer Abuassba, Haokai Shen, Yan**g Su

Abstract: Recent progress in material data mining has been driven by high-capacity models trained on large datasets. However, collecting experimental data (real data) has been extremely costly because of the amount of human effort and expertise required. Here, we develop a novel transfer learning strategy to address the problem of small or insufficient data. This strategy realizes the fusion of real and simulated data, and the augmentation of training data in the data mining procedure. For a specific task of image segmentation, this strategy can generate synthetic images by fusing the physical mechanism of simulated images and the "image style" of real images. The result shows that the model trained with the acquired synthetic images and 35% of the real images outperforms the model trained on all real images. As the time required to generate synthetic data is almost negligible, this strategy is able to reduce the time cost of real data preparation by roughly 65%.

Submitted 28 October, 2019; v1 submitted 12 May, 2019; originally announced May 2019.

Comments: 17 pages, technical report

Journal ref: npj computational materials 2020

arXiv:1810.09757 [pdf]

Estimation of Spatial-Temporal Gait Parameters based on the Fusion of Inertial and Film-Pressure Signals

Authors: Cheng Wang, Xiangdong Wang, Zhou Long, Tian Tian, Mingming Gao, ** Yun, Yueliang Qian, **tao Li

Abstract: Gait analysis (GA) has been widely used in physical activity monitoring and clinical contexts, and the estimation of the spatial-temporal gait parameters is of primary importance for GA. With the quick development of smart tiny sensors, GA methods based on wearable devices have become more popular recently. However, most existing wearable GA methods focus on data analysis from inertial sensors. In this paper, we first present a two-foot-worn in-shoe system (Gaitboter) based on low-cost, wearable and multimodal sensors' fusion for GA, comprising an inertial sensor and eight film-pressure sensors for each foot to collect raw gait data while walking. Second, a GA algorithm for estimating the spatial-temporal parameters of gait is proposed. The algorithm fully uses the fusion of two kinds of sensor signals, from the inertial sensor and the film-pressure sensors, to estimate the spatial-temporal gait parameters, such as stance phase, swing phase, double stance phase, cadence, step time, stride time, stride length, and velocity. Finally, to verify the effectiveness of the system and algorithm, an experiment is performed with 27 stroke patients from a local hospital, in which the spatial-temporal gait parameters obtained with the system and the algorithm are compared with those from a GA tool used in medical laboratories. The experimental results show very good consistency and correlation between the proposed system and the compared GA tool.

Submitted 23 October, 2018; originally announced October 2018.

Comments: 8 pages

arXiv:1807.03348 [pdf, ps, other]

doi 10.1109/TWC.2019.2914687

Blind Identification of SFBC-OFDM Signals Based on the Central Limit Theorem

Authors: Mingjun Gao, Yongzhao Li, Octavia A. Dobre, Naofal Al-Dhahir

Abstract: Previous approaches for blind identification of space-frequency block codes (SFBC) do not perform well for short observation periods due to their inefficient utilization of frequency-domain redundancy. This paper proposes a hypothesis test (HT)-based algorithm and a support vector machine (SVM)-based algorithm for SFBC signal identification over frequency-selective fading channels to exploit two-dimensional space-frequency domain redundancy. Based on the central limit theorem, space-domain redundancy is exploited to construct the cross-correlation function of the estimator and frequency-domain redundancy is incorporated in the construction of the statistics. The difference between the two proposed algorithms is that the HT-based algorithm constructs a chi-square statistic and employs an HT to make the decision, while the SVM-based algorithm constructs a non-central chi-square statistic with unknown mean as a strongly-distinguishable statistical feature and uses an SVM to make the decision. Neither algorithm requires knowledge of the channel coefficients, modulation type or noise power, and the SVM-based algorithm does not require timing synchronization. Simulation results verify the superior performance of the proposed algorithms for short observation periods with comparable computational complexity to conventional algorithms, as well as their acceptable identification performance in the presence of transmission impairments.

Submitted 14 August, 2019; v1 submitted 9 July, 2018; originally announced July 2018.

Journal ref: IEEE Trans. Wireless Commun. 18 (2019) 3500-3514

arXiv:1803.10849 [pdf, ps, other]

doi 10.1109/TWC.2018.2879941

Joint Blind Identification of the Number of Transmit Antennas and MIMO Schemes Using Gerschgorin Radii and FNN

Authors: Mingjun Gao, Yongzhao Li, Octavia A. Dobre, Naofal Al-Dhahir

Abstract: Blind enumeration of the number of transmit antennas and blind identification of multiple-input multiple-output (MIMO) schemes are two pivotal steps in MIMO signal identification for both military and commercial applications. Conventional approaches treat them as two independent problems, namely the source number enumeration and the presence detection of space-time redundancy, respectively. In this paper, we develop a joint blind identification algorithm to determine the number of transmit antennas and MIMO scheme simultaneously. By restructuring the received signals, we derive three subspace-rank features based on the signal subspace-rank to determine the number of transmit antennas and identify space-time redundancy. Then, a Gerschgorin radii-based method and a feed-forward neural network are employed to calculate these three features, and a minimal weighted norm-1 distance metric is utilized for decision making. In particular, our approach can identify additional MIMO schemes, which most previous works have not considered, and is compatible with both single-carrier and orthogonal frequency division multiplexing (OFDM) systems. Simulation results verify the viability of our proposed approach for single-carrier and OFDM systems and demonstrate its favorable identification performance for a short observation period with acceptable complexity.

Submitted 14 August, 2019; v1 submitted 28 March, 2018; originally announced March 2018.

Journal ref: IEEE Trans. Wireless Commun. 18 (2019) 373-387

arXiv:1803.05053 [pdf, ps, other]

doi 10.1109/TVT.2018.2859761

Blind Identification of SFBC-OFDM Signals Using Subspace Decompositions and Random Matrix Theory

Authors: Mingjun Gao, Yongzhao Li, Octavia A. Dobre, Naofal Al-Dhahir

Abstract: Blind signal identification has important applications in both civilian and military communications. Previous investigations on blind identification of space-frequency block codes (SFBCs) only considered identifying Alamouti and spatial multiplexing transmission schemes. In this paper, we propose a novel algorithm to identify SFBCs by analyzing discriminating features for different SFBCs, calculated by separating the signal subspace and noise subspace of the received signals at different adjacent OFDM subcarriers. Relying on random matrix theory, this algorithm utilizes a serial hypothesis test to determine the decision boundary according to the maximum eigenvalue in the noise subspace. Then, a decision tree of a special distance metric is employed for decision making. The proposed algorithm does not require prior knowledge of the signal parameters such as the number of transmit antennas, channel coefficients, modulation mode and noise power. Simulation results verify the viability of the proposed algorithm for a reduced observation period with an acceptable computational complexity.

Submitted 14 August, 2019; v1 submitted 13 March, 2018; originally announced March 2018.

Journal ref: IEEE Trans. Veh. Technol. 67 (2018) 9619-9630

Showing 1–37 of 37 results for author: Gao, M