Search | arXiv e-print repository

Trustworthy Enhanced Multi-view Multi-modal Alzheimer's Disease Prediction with Brain-wide Imaging Transcriptomics Data

Authors: Shan Cong, Zhoujie Fan, Hongwei Liu, Yinghan Zhang, Xin Wang, Haoran Luo, Xiaohui Yao

Abstract: Brain transcriptomics provides insights into the molecular mechanisms by which the brain coordinates its functions and processes. However, existing multimodal methods for predicting Alzheimer's disease (AD) primarily rely on imaging and sometimes genetic data, often neglecting the transcriptomic basis of the brain. Furthermore, while striving to integrate complementary information between modalities, most studies overlook the informativeness disparities between modalities. Here, we propose TMM, a trusted multiview multimodal graph attention framework for AD diagnosis, using extensive brain-wide transcriptomics and imaging data. First, we construct view-specific brain regional co-function networks (RRIs) from transcriptomics and multimodal radiomics data to incorporate interaction information from both biomolecular and imaging perspectives. Next, we apply graph attention (GAT) processing to each RRI network to produce graph embeddings and employ cross-modal attention to fuse the transcriptomics-derived embedding with each imaging-derived embedding. Finally, a novel true-false-harmonized class probability (TFCP) strategy is designed to assess and adaptively adjust the prediction confidence of each modality for AD diagnosis. We evaluate TMM using the AHBA database with brain-wide transcriptomics data and the ADNI database with three imaging modalities (AV45-PET, FDG-PET, and VBM-MRI). The results demonstrate the superiority of our method in identifying AD, EMCI, and LMCI compared to state-of-the-art methods. Code and data are available at https://github.com/Yaolab-fantastic/TMM.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.09317 [pdf, other]

Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, **ming Guo, Xiaolin Chen, **gcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits superior performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top5 accuracy scores of 0.8430 for 15 fundus diseases and 0.7561 for 52 fundus diseases. For image retrieval, it achieves Top5 scores of 0.9500 and 0.8860 for the same disease sets, respectively. Clinical evaluations show that RetiZero's Top3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China and the United States. Furthermore, RetiZero significantly enhances clinicians' accuracy in diagnosing fundus disease. These findings underscore the value of integrating the RetiZero foundation model into clinical settings, where a variety of fundus diseases are encountered.

Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.
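
The Top5 and Top3 figures quoted in the abstract above are top-k accuracies. As a reading aid, here is a minimal sketch of how such a metric is commonly computed; the function name `top_k_accuracy` and the toy scores are illustrative assumptions and are not taken from the RetiZero code.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example (hypothetical data): 4 samples, 3 classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.1, 0.3],
]
labels = [1, 2, 1, 0]
top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

With a larger k the metric can only stay the same or increase, which is why Top5 scores are reported separately from Top1 or Top3.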

arXiv:2405.07739 [pdf, ps, other]

A Low-rank Projected Proximal Gradient Method for Spectral Compressed Sensing

Authors: Xi Yao, Wei Dai

Abstract: This paper presents a new approach to the recovery of a spectrally sparse signal (SSS) from partially observed entries, focusing on challenges posed by large-scale data and heavy noise environments. The SSS reconstruction can be formulated as a non-convex low-rank Hankel recovery problem. Traditional formulations for SSS recovery often suffer from reconstruction inaccuracies due to unequally weighted norms and over-relaxation of the Hankel structure in noisy conditions. Moreover, a critical limitation of standard proximal gradient (PG) methods for solving the optimization problem is their slow convergence. We overcome this by introducing a more accurate formulation and a Low-rank Projected Proximal Gradient (LPPG) method, designed to efficiently converge to stationary points through a two-step process. The first step involves a modified PG approach, allowing for a constant step size independent of signal size, which significantly accelerates the gradient descent phase. The second step employs a subspace projection strategy, optimizing within a low-rank matrix space to further decrease the objective function. Both steps of the LPPG method are meticulously tailored to exploit the intrinsic low-rank and Hankel structures of the problem, thereby enhancing computational efficiency. Our numerical simulations reveal a substantial improvement in both the efficiency and recovery accuracy of the LPPG method compared to existing benchmark algorithms. This performance gain is particularly pronounced in scenarios with significant noise, demonstrating the method's robustness and applicability to large-scale SSS recovery tasks.

Submitted 13 May, 2024; originally announced May 2024.
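
The abstract above relies on the fact that a spectrally sparse signal yields a low-rank Hankel matrix. The sketch below only illustrates that structural property, not the LPPG method itself: a noiseless sum of r complex exponentials produces a Hankel matrix of rank r. The helper names, the toy frequencies, and the elimination-based rank routine are assumptions chosen to keep the example dependency-free.

```python
import cmath


def hankel(x, rows):
    """Build a Hankel matrix H[i][j] = x[i + j] with the given number of rows."""
    cols = len(x) - rows + 1
    return [[x[i + j] for j in range(cols)] for i in range(rows)]


def numerical_rank(matrix, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting (fine for small matrices)."""
    a = [row[:] for row in matrix]
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < tol:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            for c in range(col, n_cols):
                a[r][c] -= factor * a[rank][c]
        rank += 1
    return rank


# Hypothetical spectrally sparse signal: two complex sinusoids, 15 samples.
frequencies = [0.1, 0.27]
amplitudes = [1.0, 0.5]
signal = [
    sum(a * cmath.exp(2j * cmath.pi * f * t) for a, f in zip(amplitudes, frequencies))
    for t in range(15)
]
H = hankel(signal, rows=8)   # 8 x 8 Hankel matrix
rank_H = numerical_rank(H)   # equals the number of sinusoids
```

With noise or missing entries the matrix is only approximately low-rank, which is the regime the recovery formulation in the abstract targets.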

arXiv:2405.02801 [pdf, other]

Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models

Authors: Tianze Xu, Jiajun Li, Xuesong Chen, Xinrui Yao, Shuchang Liu

Abstract: In recent years, AI-Generated Content (AIGC) has witnessed rapid advancements, facilitating the generation of music, images, and other forms of artistic expression across various industries. However, research on general multi-modal music generation models remains scarce. To fill this gap, we propose a multi-modal music generation framework, Mozart's Touch. It can generate music aligned with cross-modality inputs, such as images, videos and text. Mozart's Touch is composed of three main components: Multi-modal Captioning Module, Large Language Model (LLM) Understanding & Bridging Module, and Music Generation Module. Unlike traditional approaches, Mozart's Touch requires no training or fine-tuning of pre-trained models, offering efficiency and transparency through clear, interpretable prompts. We also introduce the "LLM-Bridge" method to resolve the heterogeneous representation problems between descriptive texts of different modalities. We conduct a series of objective and subjective evaluations on the proposed model, and results indicate that our model surpasses the performance of current state-of-the-art models. Our code and examples are available at: https://github.com/WangTooNaive/MozartsTouch

Submitted 7 May, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

Comments: 7 pages, 2 figures, submitted to ACM MM 2024

arXiv:2404.13673 [pdf]

doi 10.1016/j.jmsy.2024.04.013

In-situ process monitoring and adaptive quality enhancement in laser additive manufacturing: a critical review

Authors: Lequn Chen, Guijun Bi, Xiling Yao, **long Su, Chaolin Tan, Wenhe Feng, Michalis Benakis, Youxiang Chew, Seung Ki Moon

Abstract: Laser Additive Manufacturing (LAM) presents unparalleled opportunities for fabricating complex, high-performance structures and components with unique material properties. Despite these advancements, achieving consistent part quality and process repeatability remains challenging. This paper provides a comprehensive review of various state-of-the-art in-situ process monitoring techniques, including optical-based monitoring, acoustic-based sensing, laser line scanning, and operando X-ray monitoring. These techniques are evaluated for their capabilities and limitations in detecting defects within Laser Powder Bed Fusion (LPBF) and Laser Directed Energy Deposition (LDED) processes. Furthermore, the review discusses emerging multisensor monitoring and machine learning (ML)-assisted defect detection methods, benchmarking ML models tailored for in-situ defect detection. The paper also discusses in-situ adaptive defect remediation strategies that advance LAM towards zero-defect autonomous operations, focusing on real-time closed-loop feedback control and defect correction methods. Research gaps such as the need for standardization, improved reliability and sensitivity, and decision-making strategies beyond early stopping are highlighted. Future directions are proposed, with an emphasis on multimodal sensor fusion for multiscale defect prediction and fault diagnosis, ultimately enabling self-adaptation in LAM processes. This paper aims to equip researchers and industry professionals with a holistic understanding of the current capabilities, limitations, and future directions in in-situ process monitoring and adaptive quality enhancement in LAM.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 107 Pages, 29 Figures. Paper Accepted At Journal of Manufacturing Systems

arXiv:2403.05870 [pdf, ps, other]

doi 10.1109/LWC.2024.3369874

Channel Estimation for Stacked Intelligent Metasurface-Assisted Wireless Networks

Authors: Xianghao Yao, Jiancheng An, Lu Gan, Marco Di Renzo, Chau Yuen

Abstract: Emerging technologies, such as holographic multiple-input multiple-output (HMIMO) and stacked intelligent metasurface (SIM), are driving the development of wireless communication systems. Specifically, the SIM is physically constructed by stacking multiple layers of metasurfaces and has an architecture similar to an artificial neural network (ANN), which can flexibly manipulate the electromagnetic waves that propagate through it at the speed of light. This architecture enables the SIM to achieve HMIMO precoding and combining in the wave domain, thus significantly reducing the hardware cost and energy consumption. In this letter, we investigate the channel estimation problem in SIM-assisted multi-user HMIMO communication systems. Since the number of antennas at the base station (BS) is much smaller than the number of meta-atoms per layer of the SIM, it is challenging to acquire the channel state information (CSI) in SIM-assisted multi-user systems. To address this issue, we collect multiple copies of the uplink pilot signals that propagate through the SIM. Furthermore, we leverage the array geometry to identify the subspace that spans arbitrary spatial correlation matrices. Based on partial CSI about the channel statistics, a pair of subspace-based channel estimators are proposed. Additionally, we compute the mean square error (MSE) of the proposed channel estimators and optimize the phase shifts of the SIM to minimize the MSE. Numerical results are illustrated to analyze the effectiveness of the proposed channel estimation schemes.

Submitted 9 March, 2024; originally announced March 2024.

Comments: 13 pages, 3 figures, accepted by IEEE WCL

arXiv:2402.18946 [pdf, other]

Real-Time Adaptive Safety-Critical Control with Gaussian Processes in High-Order Uncertain Models

Authors: Yu Zhang, Long Wen, Xiangtong Yao, Zhenshan Bing, Linghuan Kong, Wei He, Alois Knoll

Abstract: This paper presents an adaptive online learning framework for systems with uncertain parameters to ensure safety-critical control in non-stationary environments. Our approach consists of two phases. The initial phase is centered on a novel sparse Gaussian process (GP) framework. We first integrate a forgetting factor to refine a variational sparse GP algorithm, thus enhancing its adaptability. Subsequently, the hyperparameters of the Gaussian model are trained with a special compound kernel, and the Gaussian model's online inferential capability and computational efficiency are strengthened by updating a solitary inducing point derived from new samples, in conjunction with the learned hyperparameters. In the second phase, we propose a safety filter based on high-order control barrier functions (HOCBFs), synergized with the previously trained learning model. By leveraging the compound kernel from the first phase, we effectively address the inherent limitations of GPs in handling high-dimensional problems for real-time applications. The derived controller ensures a rigorous lower bound on the probability of satisfying the safety specification. Finally, the efficacy of our proposed algorithm is demonstrated through real-time obstacle avoidance experiments executed using both a simulation platform and a real-world 7-DOF robot.

Submitted 5 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.
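
To make the idea of a barrier-function safety filter in the abstract above concrete, here is a deliberately simplified sketch. It is not the paper's HOCBF controller and uses no Gaussian process: it assumes a known single-integrator model x' = u, a first-order barrier h(x) = x_max - x, and hypothetical names and gains (`cbf_safety_filter`, `alpha`, `u_nominal`).

```python
def cbf_safety_filter(u_nominal, x, x_max, alpha=2.0):
    """Minimally modify u_nominal so that h(x) = x_max - x stays non-negative.

    For x' = u the barrier condition dh/dt + alpha * h >= 0 reads
    -u + alpha * (x_max - x) >= 0, i.e. u <= alpha * (x_max - x).
    The closest admissible input to u_nominal is therefore a simple clamp.
    """
    u_bound = alpha * (x_max - x)
    return min(u_nominal, u_bound)


def simulate(x0=0.0, x_max=1.0, u_nominal=1.5, dt=0.01, steps=500):
    """Forward-Euler rollout with the filter applied at every step."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        u = cbf_safety_filter(u_nominal, x, x_max)
        x += dt * u
        trajectory.append(x)
    return trajectory


trajectory = simulate()
```

Without the filter, the constant nominal input would drive the state past x_max; with it, the state approaches the boundary and stays inside the safe set. High-order barriers extend the same idea to constraints whose first derivative does not depend on the input.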

arXiv:2311.13028 [pdf, other]

DMLR: Data-centric Machine Learning Research -- Past, Present and Future

Authors: Luis Oala, Manil Maskey, Lilith Bat-Leah, Alicia Parrish, Nezihe Merve Gürel, Tzu-Sheng Kuo, Yang Liu, Rotem Dror, Danilo Brajovic, Xiaozhe Yao, Max Bartolo, William A Gaviria Rojas, Ryan Hileman, Rainier Aliment, Michael W. Mahoney, Meg Risdal, Matthew Lease, Wojciech Samek, Debojyoti Dutta, Curtis G Northcutt, Cody Coleman, Braden Hancock, Bernard Koch, Girmaw Abebe Tadesse, Bojan Karlaš, et al. (13 additional authors not shown)

Abstract: Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings prior, in this report we outline the relevance of community engagement and infrastructure development for the creation of next-generation public datasets that will advance machine learning science. We chart a path forward as a collective effort to sustain the creation and maintenance of these datasets and methods towards positive scientific, societal and business impact.

Submitted 1 June, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: Published in the Journal of Data-centric Machine Learning Research (DMLR) at https://data.mlr.press/assets/pdf/v01-5.pdf

arXiv:2310.18732 [pdf, ps, other]

Tracking and fast imaging of a translational object via Fourier modulation

Authors: Shijian Li, Xu-ri Yao, Wei Zhang, Yeliang Wang, Qing Zhao

Abstract: The tracking and imaging of high-speed moving objects hold significant promise for application in various fields. Single-pixel imaging enables the progressive capture of a fast-moving translational object through motion compensation. However, achieving a balance between a short reconstruction time and a good image quality is challenging. In this study, we present an approach that simultaneously incorporates position encoding and spatial information encoding through the Fourier patterns. The utilization of Fourier patterns with specific spatial frequencies ensures robust and accurate object localization. By exploiting the properties of the Fourier transform, our method achieves a remarkable reduction in time complexity and memory consumption while significantly enhancing image quality. Furthermore, we introduce an optimized sampling strategy specifically tailored for small moving objects, significantly reducing the required dwell time for imaging. The proposed method provides a practical solution for the real-time tracking, imaging and edge detection of translational objects, underscoring its considerable potential for diverse applications.

Submitted 28 October, 2023; originally announced October 2023.

Comments: 6 figures

arXiv:2310.14965 [pdf, ps, other]

Parallel compressive super-resolution imaging with wide field-of-view based on physics enhanced network

Authors: Xiao-Peng **, An-Dong Xiong, Wei Zhang, Xiao-Qing Wang, Fan Liu, Chang-Heng Li, Xu-Ri Yao, Xue-Feng Liu, Qing Zhao

Abstract: Achieving both high-performance and wide field-of-view (FOV) super-resolution imaging has been attracting increasing attention in recent years. However, such a goal suffers from long reconstruction time and huge storage space. Parallel compressive imaging (PCI) provides an efficient solution, but the super-resolution quality and imaging speed are strongly dependent on a precise optical transfer function (OTF), modulation masks and reconstruction algorithm. In this work, we propose a wide FOV parallel compressive super-resolution imaging approach based on a physics-enhanced network. By training the network with the prior OTF of an arbitrary 128x128-pixel region and fine-tuning the network with other OTFs within the remaining regions of the FOV, we realize both mask optimization and super-resolution imaging with up to 1020x1500 wide FOV. Numerical simulations and practical experiments demonstrate the effectiveness and superiority of the proposed approach. We achieve high-quality reconstruction with 4x4 times super-resolution enhancement using only three designed masks to reach real-time imaging speed. The proposed approach promotes the technology of rapid imaging for super-resolution and wide FOV, ranging from infrared to Terahertz.

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2309.15697 [pdf, other]

Physics Inspired Hybrid Attention for SAR Target Recognition

Authors: Zhongling Huang, Chong Wu, Xiwen Yao, Zhicheng Zhao, Xiankai Huang, Junwei Han

Abstract: There has been a recent emphasis on integrating physical models and deep neural networks (DNNs) for SAR target recognition, to improve performance and achieve a higher level of physical interpretability. The attributed scattering center (ASC) parameters garnered the most interest, being considered as additional input data or features for fusion in most methods. However, the performance greatly depends on the ASC optimization result, and the fusion strategy is not adaptable to different types of physical information. Meanwhile, the current evaluation scheme is inadequate to assess the model's robustness and generalizability. Thus, we propose a physics inspired hybrid attention (PIHA) mechanism and the once-for-all (OFA) evaluation protocol to address the above issues. PIHA leverages the high-level semantics of physical information to activate and guide the feature group aware of local semantics of the target, so as to re-weight the feature importance based on knowledge prior. It is flexible and generally applicable to various physical models, and can be integrated into arbitrary DNNs without modifying the original architecture. The experiments involve a rigorous assessment using the proposed OFA, which entails training and validating a model on either sufficient or limited data and evaluating on multiple test sets with different data distributions. Our method outperforms other state-of-the-art approaches in 12 test scenarios with the same ASC parameters. Moreover, we analyze the working mechanism of PIHA and evaluate various PIHA enabled DNNs. The experiments also show PIHA is effective for different physical information. The source code together with the adopted physical information is available at https://github.com/XAI4SAR.

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2308.13906 [pdf, other]

A Two-Dimensional Deep Network for RF-based Drone Detection and Identification Towards Secure Coverage Extension

Authors: Zixiao Zhao, Qinghe Du, Xiang Yao, Lei Lu, Shijiao Zhang

Abstract: As drones become increasingly prevalent in human life, they also raise security concerns such as unauthorized access and control, as well as collisions and interference with manned aircraft. Therefore, ensuring the ability to accurately detect and identify different drones holds significant implications for coverage extension. Assisted by machine learning, radio frequency (RF) detection can recognize the type and flight mode of drones based on the sampled drone signals. In this paper, we first utilize the Short-Time Fourier Transform (STFT) to extract two-dimensional features from the raw signals, which contain both time-domain and frequency-domain information. Then, we employ a Convolutional Neural Network (CNN) built with ResNet structure to achieve multi-class classifications. Our experimental results show that the proposed ResNet-STFT can achieve higher accuracy and faster convergence on the extended dataset. Additionally, it exhibits balanced performance compared to other baselines on the raw dataset.

Submitted 26 August, 2023; originally announced August 2023.
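
The abstract above uses the STFT to turn a one-dimensional RF signal into a two-dimensional time-frequency feature map. The following is a minimal, dependency-free sketch of that transformation on a synthetic tone; the frame length, hop size, Hann window, and test signal are illustrative assumptions rather than the paper's settings, and a production pipeline would more likely use an FFT-based library routine.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a list of frames, each a list of magnitudes for bins 0..frame_len/2."""
    # Periodic Hann window to reduce spectral leakage between bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT of the windowed frame (O(N^2), acceptable for a small demo).
        spectrum = [
            abs(sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(n_bins)
        ]
        frames.append(spectrum)
    return frames


# Hypothetical input: a 128 Hz tone sampled at 1024 Hz for 512 samples.
sample_rate = 1024
tone_hz = 128
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]
spectrogram = stft_magnitude(signal)
# Strongest frequency bin in each frame; bin spacing is sample_rate / frame_len = 16 Hz.
peak_bins = [max(range(len(frame)), key=lambda k: frame[k]) for frame in spectrogram]
```

The resulting frames-by-bins array is the kind of two-dimensional input that an image-style CNN can consume.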

arXiv:2308.06377 [pdf, other]

CATS v2: Hybrid encoders for robust medical segmentation

Authors: Hao Li, Han Liu, Dewei Hu, Xing Yao, Jiacheng Wang, Ipek Oguz

Abstract: Convolutional Neural Networks (CNNs) have exhibited strong performance in medical image segmentation tasks by capturing high-level (local) information, such as edges and textures. However, due to the limited field of view of the convolution kernel, it is hard for CNNs to fully represent global information. Recently, transformers have shown good performance for medical image segmentation due to their ability to better model long-range dependencies. Nevertheless, transformers struggle to capture high-level spatial features as effectively as CNNs. A good segmentation model should learn a better representation from local and global features to be both precise and semantically accurate. In our previous work, we proposed CATS, which is a U-shaped segmentation network augmented with a transformer encoder. In this work, we further extend this model and propose CATS v2 with hybrid encoders. Specifically, the hybrid encoders consist of a CNN-based encoder path paralleled to a transformer path with a shifted window, which better leverages both local and global information to produce robust 3D medical image segmentation. We fuse the information from the convolutional encoder and the transformer at the skip connections of different resolutions to form the final segmentation. The proposed method is evaluated on three public challenge datasets: Beyond the Cranial Vault (BTCV), Cross-Modality Domain Adaptation (CrossMoDA) and task 5 of Medical Segmentation Decathlon (MSD-5), to segment abdominal organs, vestibular schwannoma (VS) and prostate, respectively. Compared with the state-of-the-art methods, our approach demonstrates superior performance in terms of higher Dice scores. Our code is publicly available at https://github.com/MedICL-VU/CATS.

Submitted 31 January, 2024; v1 submitted 11 August, 2023; originally announced August 2023.
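
The abstract above reports results as Dice scores. For readers unfamiliar with the metric, this is a minimal sketch of the standard definition, Dice = 2|A ∩ B| / (|A| + |B|), on flattened binary masks; the function name, the toy masks, and the convention of returning 1.0 for two empty masks are assumptions of this example, not details from the CATS code.

```python
def dice_score(prediction, target):
    """Dice coefficient between two flattened binary masks (sequences of 0/1)."""
    intersection = sum(p * t for p, t in zip(prediction, target))
    total = sum(prediction) + sum(target)
    # Two empty masks agree perfectly; treat that case as a score of 1.0.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks (hypothetical): 3 predicted voxels, 3 true voxels, 2 overlapping.
prediction = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 1, 0]
score = dice_score(prediction, target)
```

A score of 1.0 means perfect overlap and 0.0 means none; for multi-organ tasks the metric is typically computed per structure and then averaged.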

arXiv:2307.12377 [pdf, other]

4D Feet: Registering Walking Foot Shapes Using Attention Enhanced Dynamic-Synchronized Graph Convolutional LSTM Network

Authors: Farzam Tajdari, Toon Huysmans, Xinhe Yao, Jun Xu, Yu Song

Abstract: 4D scans of dynamic deformable human body parts help researchers have a better understanding of spatiotemporal features. However, reconstructing 4D scans based on multiple asynchronous cameras encounters two main challenges: 1) finding the dynamic correspondences among different frames captured by each camera at the timestamps of the camera in terms of dynamic feature recognition, and 2) reconstructing 3D shapes from the combined point clouds captured by different cameras at asynchronous timestamps in terms of multi-view fusion. In this paper, we introduce a generic framework that is able to 1) find and align dynamic features in the 3D scans captured by each camera using the nonrigid iterative closest-farthest points algorithm; 2) synchronize scans captured by asynchronous cameras through a novel ADGC-LSTM-based network, which is capable of aligning 3D scans captured by different cameras to the timeline of a specific camera; and 3) register a high-quality template to synchronized scans at each timestamp to form a high-quality 3D mesh model using a non-rigid registration method. With a newly developed 4D foot scanner, we validate the framework and create the first open-access data-set, namely the 4D feet. It includes 4D shapes (15 fps) of the right and left feet of 58 participants (116 feet in total, including 5147 3D frames), covering significant phases of the gait cycle. The results demonstrate the effectiveness of the proposed framework, especially in synchronizing asynchronous 4D scans using the proposed ADGC-LSTM network.

Submitted 23 July, 2023; originally announced July 2023.

arXiv:2307.00245 [pdf, other]

Deep Angiogram: Trivializing Retinal Vessel Segmentation

Authors: Dewei Hu, Xing Yao, Jiacheng Wang, Yuankai K. Tao, Ipek Oguz

Abstract: Among the research efforts to segment the retinal vasculature from fundus images, deep learning models consistently achieve superior performance. However, this data-driven approach is very sensitive to domain shifts. For fundus images, such data distribution changes can easily be caused by variations in illumination conditions as well as the presence of disease-related features such as hemorrhages and drusen. Since the source domain may not include all possible types of pathological cases, a model that can robustly recognize vessels on unseen domains is desirable but remains elusive, despite many proposed segmentation networks of ever-increasing complexity. In this work, we propose a contrastive variational auto-encoder that can filter out irrelevant features and synthesize a latent image, named deep angiogram, representing only the retinal vessels. Then segmentation can be readily accomplished by thresholding the deep angiogram. The generalizability of the synthetic network is improved by the contrastive loss that makes the model less sensitive to variations of image contrast and noisy features. Compared to baseline deep segmentation networks, our model achieves higher segmentation performance via simple thresholding. Our experiments show that the model can generate stable angiograms on different target domains, providing excellent visualization of vessels and a non-invasive, safe alternative to fluorescein angiography.

Submitted 1 July, 2023; originally announced July 2023.

Comments: 5 pages, 4 figures, SPIE 2023

Journal ref: In Medical Imaging 2023: Image Processing, vol. 12464, pp. 656-660. SPIE, 2023

arXiv:2305.13596 [pdf]

doi 10.1115/DETC2023-110284

Multimodal sensor fusion for real-time location-dependent defect detection in laser-directed energy deposition

Authors: Lequn Chen, Xiling Yao, Wenhe Feng, Youxiang Chew, Seung Ki Moon

Abstract: Real-time defect detection is crucial in laser-directed energy deposition (L-DED) additive manufacturing (AM). The traditional in-situ monitoring approach utilizes a single sensor (i.e., an acoustic, visual, or thermal sensor) to capture the complex dynamic behaviors of the process, which is insufficient for defect detection with high accuracy and robustness. This paper proposes a novel multimodal sensor fusion method for real-time location-dependent defect detection in the robotic L-DED process. The multimodal fusion sources include a microphone sensor capturing the laser-material interaction sound and a visible spectrum CCD camera capturing the coaxial melt pool images. A hybrid convolutional neural network (CNN) is proposed to fuse acoustic and visual data. The key novelty in this study is that the traditional manual feature extraction procedures are no longer required, and the raw melt pool images and acoustic signals are fused directly by the hybrid CNN model, which achieved the highest defect prediction accuracy (98.5%) without the thermal sensing modality. Moreover, unlike previous region-based quality prediction, the proposed hybrid CNN can detect the onset of defect occurrences. The defect prediction outcomes are synchronized and registered with in-situ acquired robot tool-center-point (TCP) data, which enables localized defect identification. The proposed multimodal sensor fusion method offers a robust solution for in-situ defect detection.

Submitted 22 May, 2023; originally announced May 2023.

Comments: 8 pages, 10 figures. This paper has been accepted to be published in the proceedings of IDETC-CIE 2023

arXiv:2304.05685 [pdf]

Multisensor fusion-based digital twin in additive manufacturing for in-situ quality monitoring and defect correction

Authors: Lequn Chen, Xiling Yao, Kui Liu, Chaolin Tan, Seung Ki Moon

Abstract: Early detection and correction of defects are critical in additive manufacturing (AM) to avoid build failures. In this paper, we present a multisensor fusion-based digital twin for in-situ quality monitoring and defect correction in a robotic laser direct energy deposition process. Multisensor fusion sources consist of an acoustic sensor, an infrared thermal camera, a coaxial vision camera, and a laser line scanner. The key novelty and contribution of this work is the development of a spatiotemporal data fusion method that synchronizes and registers the multisensor features within the part's 3D volume. The fused dataset can be used to predict location-specific quality using machine learning. On-the-fly identification of regions requiring material addition or removal is feasible. Robot toolpath and auto-tuned process parameters are generated for defect correction. In contrast to traditional single-sensor-based monitoring, multisensor fusion allows for a more in-depth understanding of underlying process physics, such as pore formation and laser-material interactions. The proposed methods pave the way for self-adaptation AM with higher efficiency, less waste, and cleaner production.

Submitted 12 April, 2023; originally announced April 2023.

Comments: 11 pages, 9 figures. Accepted at 24th International Conference on Engineering Design (ICED23)

arXiv:2304.04598 [pdf]

doi 10.1016/j.addma.2023.103547

In-situ crack and keyhole pore detection in laser directed energy deposition through acoustic signal and deep learning

Authors: Lequn Chen, Xiling Yao, Chaolin Tan, Weiyang He, Jinlong Su, Fei Weng, Youxiang Chew, Nicholas Poh Huat Ng, Seung Ki Moon

Abstract: Cracks and keyhole pores are detrimental defects in alloys produced by laser directed energy deposition (LDED). Laser-material interaction sound may hold information about underlying complex physical events such as crack propagation and pore formation. However, due to the noisy environment and intricate signal content, acoustic-based monitoring in LDED has received little attention. This paper proposes a novel acoustic-based in-situ defect detection strategy in LDED. The key contribution of this study is to develop an in-situ acoustic signal denoising, feature extraction, and sound classification pipeline that incorporates convolutional neural networks (CNN) for online defect prediction. Microscope images are used to identify locations of the cracks and keyhole pores within a part. The defect locations are spatiotemporally registered with the acoustic signal. Various acoustic features corresponding to defect-free regions, cracks, and keyhole pores are extracted and analysed in time-domain, frequency-domain, and time-frequency representations. The CNN model is trained to predict defect occurrences using the Mel-Frequency Cepstral Coefficients (MFCCs) of the laser-material interaction sound. The CNN model is compared to various classic machine learning models trained on the denoised acoustic dataset and raw acoustic dataset. The validation results show that the CNN model trained on the denoised dataset outperforms others with the highest overall accuracy (89%), keyhole pore prediction accuracy (93%), and AUC-ROC score (98%). Furthermore, the trained CNN model can be deployed into an in-house developed software platform for online quality monitoring. The proposed strategy is the first study to use acoustic signals with deep learning for in-situ defect detection in the LDED process.

Submitted 10 April, 2023; originally announced April 2023.

Comments: 36 Pages, 16 Figures, accepted at journal Additive Manufacturing

arXiv:2212.14840 [pdf]

Normalized Blood Flow Index in Optical Coherence Tomography Angiography Provides a Sensitive Biomarker of Early Diabetic Retinopathy

Authors: Albert K. Dadzie, David Le, Mansour Abtahi, Behrouz Ebrahimi, Taeyoon Son, Jennifer I. Lim, Xincheng Yao

Abstract: Purpose: To evaluate the sensitivity of normalized blood flow index (NBFI) for detecting early diabetic retinopathy (DR). Methods: Optical coherence tomography angiography (OCTA) images of 30 eyes from 20 healthy controls, 21 eyes of diabetic patients with no DR (NoDR) and 26 eyes from 22 patients with mild non-proliferative DR (NPDR) were analyzed in this study. The OCTA images were centered on the fovea and covered a 6 mm x 6 mm area. Enface projections of the superficial vascular plexus (SVP) and the deep capillary plexus (DCP) were obtained for the quantitative OCTA feature analysis. Three quantitative OCTA features were examined: blood vessel density (BVD), blood flow flux (BFF), and normalized blood flow index (NBFI). Each feature was calculated from both the SVP and DCP, and its sensitivity in distinguishing the three cohorts of the study was evaluated. Results: The only quantitative feature that was capable of distinguishing between all three cohorts was NBFI in the DCP image. Comparative study revealed that both BVD and BFF were able to distinguish the controls from NoDR and mild NPDR. However, neither BVD nor BFF was sensitive enough to separate NoDR from the healthy controls. Conclusion: The NBFI has been demonstrated as a sensitive biomarker of early DR, revealing retinal blood flow abnormality better than traditional BVD and BFF. The NBFI in the DCP was verified as the most sensitive biomarker, supporting that diabetes affects the DCP earlier than SVP in DR.

Submitted 22 December, 2022; originally announced December 2022.

arXiv:2212.13257 [pdf]

A portable widefield fundus camera with high dynamic range imaging capability

Authors: Alfa Rossi, Mojtaba Rahimi, David Le, Taeyoon Son, Michael J. Heiferman, R. V. Paul Chan, Xincheng Yao

Abstract: Fundus photography is indispensable for clinical detection and management of eye diseases. Limited image contrast and field of view (FOV) are common limitations of conventional fundus cameras, making it difficult to detect subtle abnormalities at the early stages of eye diseases. Further improvements of image contrast and FOV coverage are important to improve early disease detection and reliable treatment assessment. We report here a portable fundus camera, with a wide FOV and high dynamic range (HDR) imaging capabilities. Miniaturized indirect ophthalmoscopy illumination was employed to achieve the portable design for nonmydriatic, widefield fundus photography. Orthogonal polarization control was used to eliminate illumination reflectance artifact. With independent power controls, three fundus images were sequentially acquired and fused to achieve HDR function for local image contrast enhancement. A 101° eye-angle (67° visual-angle) snapshot FOV was achieved for nonmydriatic fundus photography. The effective FOV can be readily expanded up to 190° eye-angle (134° visual-angle) with the aid of a fixation target, without the need of pharmacologic pupillary dilation. The effectiveness of HDR imaging was validated with both normal healthy and pathologic eyes, compared to a conventional fundus camera.

Submitted 20 December, 2022; originally announced December 2022.

Comments: 12 pages, 8 figures

arXiv:2212.00027 [pdf, other]

Imaging across multiple spatial scales with the multi-camera array microscope

Authors: Mark Harfouche, Kanghyun Kim, Kevin C. Zhou, Pavan Chandra Konda, Sunanda Sharma, Eric E. Thomson, Colin Cooke, Shiqi Xu, Lucas Kreiss, Amey Chaware, Xi Yang, Xing Yao, Vinayak Pathak, Martin Bohlen, Ron Appel, Aurélien Bègue, Clare Cook, Jed Doman, John Efromson, Gregor Horstmeyer, Jaehee Park, Paul Reamey, Veton Saliu, Eva Naumann, Roarke Horstmeyer

Abstract: This article experimentally examines different configurations of a novel multi-camera array microscope (MCAM) imaging technology. The MCAM is based upon a densely packed array of "micro-cameras" to jointly image across a large field-of-view at high resolution. Each micro-camera within the array images a unique area of a sample of interest, and then all acquired data with 54 micro-cameras are digitally combined into composite frames, whose total pixel counts significantly exceed the pixel counts of standard microscope systems. We present results from three unique MCAM configurations for different use cases. First, we demonstrate a configuration that simultaneously images and estimates the 3D object depth across a 100 x 135 mm^2 field-of-view (FOV) at approximately 20 um resolution, which results in 0.15 gigapixels (GP) per snapshot. Second, we demonstrate an MCAM configuration that records video across a continuous 83 x 123 mm^2 FOV with two-fold increased resolution (0.48 GP per frame). Finally, we report a third high-resolution configuration (2 um resolution) that can rapidly produce 9.8 GP composites of large histopathology specimens.

Submitted 28 February, 2023; v1 submitted 30 November, 2022; originally announced December 2022.

arXiv:2211.14467 [pdf]

Self-Supervised Surgical Instrument 3D Reconstruction from a Single Camera Image

Authors: Ange Lou, Xing Yao, Ziteng Liu, Jintong Han, Jack Noble

Abstract: Surgical instrument tracking is an active research area that can provide surgeons feedback about the location of their tools relative to anatomy. Recent tracking methods are mainly divided into two parts: segmentation and object detection. However, both can only predict 2D information, which is limiting for application to real-world surgery. An accurate 3D surgical instrument model is a prerequisite for precise predictions of the pose and depth of the instrument. Recent single-view 3D reconstruction methods are only used in natural object reconstruction and do not achieve satisfying reconstruction accuracy without 3D attribute-level supervision. Further, those methods are not suitable for the surgical instruments because of their elongated shapes. In this paper, we firstly propose an end-to-end surgical instrument reconstruction system -- Self-supervised Surgical Instrument Reconstruction (SSIR). With SSIR, we propose a multi-cycle-consistency strategy to help capture the texture information from a slim instrument while only requiring a binary instrument label map. Experiments demonstrate that our approach improves the reconstruction quality of surgical instruments compared to other self-supervised methods and achieves promising results.

Submitted 25 November, 2022; originally announced November 2022.

Comments: Accepted by SPIE Medical Imaging 2023

arXiv:2209.04449 [pdf, other]

doi 10.3788/COL202321.071101

A detail-enhanced sampling strategy in Hadamard single-pixel imaging

Authors: Yan Cai, Shijian Li, Wei Zhang, Hao Wu, Xu-ri Yao, Qing Zhao

Abstract: Hadamard single-pixel imaging (HSI) is an appealing imaging technique due to its features of low hardware complexity and industrial cost. To improve imaging efficiency, many studies have focused on sorting Hadamard patterns to obtain reliable reconstructed images with very few samples. In this study, we present an efficient HSI imaging method that employs an exponential probability function to sample Hadamard spectra along a direction with better energy concentration for obtaining Hadamard patterns. We also propose an XY order to further optimize the pattern-selection method with extremely fast Hadamard order generation while retaining the original performance. We used the compressed sensing algorithm for image reconstruction. The simulation and experimental results show that this pattern-selection method reliably reconstructs objects and preserves the edges and details of images.

Submitted 9 September, 2022; originally announced September 2022.

Comments: 14 pages, 12 figures, 1 table
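
As a rough illustration of the Hadamard single-pixel measurement model that the entry above builds on, the following minimal Python sketch simulates measurements with a full Hadamard basis and inverts them. It is not the authors' method: the function names (`hadamard`, `measure`, `reconstruct`), the 8 x 8 random test image, and the "keep the largest coefficients" subsampling rule are all assumptions made for illustration. The paper's exponential-probability sampling, XY ordering, and compressed sensing reconstruction are not reproduced here.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

def measure(image, h):
    """Simulate single-pixel detection: one inner product per Hadamard pattern (row of h)."""
    return h @ image.ravel()

def reconstruct(measurements, h, shape, keep=None):
    """Invert the Hadamard transform; coefficients not listed in `keep` are set to zero."""
    y = np.asarray(measurements, dtype=float).copy()
    if keep is not None:
        mask = np.zeros(y.size, dtype=bool)
        mask[keep] = True
        y[~mask] = 0.0
    # For a Sylvester Hadamard matrix, H^T H = n I, so the inverse is H^T / n.
    return (h.T @ y / h.shape[0]).reshape(shape)

rng = np.random.default_rng(0)
image = rng.random((8, 8))            # hypothetical test scene
H = hadamard(image.size)              # 64 patterns of 64 pixels each
y = measure(image, H)

full = reconstruct(y, H, image.shape)                 # all 64 measurements
top16 = np.argsort(-np.abs(y))[:16]                   # oracle choice, illustration only
partial = reconstruct(y, H, image.shape, keep=top16)  # 25% of the coefficients
```

With all patterns retained the reconstruction is exact up to floating-point error. The subsampled case only shows where a pattern-selection strategy would plug in, since a real system cannot pick coefficients by magnitude before measuring them.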

arXiv:2209.01554 [pdf, ps, other]

doi 10.1364/OE.481881

Single-pixel imaging of a translational object

Authors: Shijian Li, Yan Cai, Yeliang Wang, Xu-ri Yao, Qing Zhao

Abstract: Image-free tracking methods based on single-pixel detectors (SPDs) can track a moving object at a very high frame rate, but they rarely can achieve simultaneous imaging of such an object. In this study, we propose a method for simultaneously obtaining the relative displacements and images of a translational object. Four binary Fourier patterns and two differential Hadamard patterns are used to modulate one frame of the object and then modulated light signals are obtained by SPD. The relative displacements and image of the moving object can be gradually obtained along with the detection. The proposed method does not require any prior knowledge of the object and its motion. The method has been verified by simulations and experiments, achieving a frame rate of 3332 Hz to acquire relative displacements of a translational object at a spatial resolution of $128 \times 128$ pixels using a 20000-Hz digital micro-mirror device. This proposed method can broaden the application of image-free tracking methods and obtain spatial information about moving objects.

Submitted 31 January, 2023; v1 submitted 4 September, 2022; originally announced September 2022.

Comments: 15 pages, 9 figures, 1 table

arXiv:2207.04324 [pdf, other]

Video Coding Using Learned Latent GAN Compression

Authors: Mustafa Shukor, Bharath Bhushan Damodaran, Xu Yao, Pierre Hellier

Abstract: We propose in this paper a new paradigm for facial video compression. We leverage the generative capacity of GANs such as StyleGAN to represent and compress a video, including intra and inter compression. Each frame is inverted in the latent space of StyleGAN, from which the optimal compression is learned. To do so, a diffeomorphic latent representation is learned using a normalizing flows model, where an entropy model can be optimized for image coding. In addition, we propose a new perceptual loss that is more efficient than other counterparts. Finally, an entropy model for video inter coding with residual is also learned in the previously constructed latent representation. Our method (SGANC) is simple, faster to train, and achieves better results for image and video coding compared to state-of-the-art codecs such as VTM, AV1, and recent deep learning techniques. In particular, it drastically minimizes perceptual distortion at low bit rates.

Submitted 12 July, 2022; v1 submitted 9 July, 2022; originally announced July 2022.

Comments: Accepted at ACM Multimedia 2022

arXiv:2204.11769 [pdf, ps, other]

Multi-scale reconstruction of undersampled spectral-spatial OCT data for coronary imaging using deep learning

Authors: Xueshen Li, Shengting Cao, Hongshan Liu, Xinwen Yao, Brigitta C. Brott, Silvio H. Litovsky, Xiaoyu Song, Yuye Ling, Yu Gan

Abstract: Coronary artery disease (CAD) is a cardiovascular condition with high morbidity and mortality. Intravascular optical coherence tomography (IVOCT) has been considered as an optimal imaging system for the diagnosis and treatment of CAD. Constrained by the Nyquist theorem, dense sampling in IVOCT attains high resolving power to delineate cellular structures/features. There is a trade-off between high spatial resolution and fast scanning rate for coronary imaging. In this paper, we propose a viable spectral-spatial acquisition method that down-scales the sampling process in both spectral and spatial domain while maintaining high quality in image reconstruction. The down-scaling schedule boosts data acquisition speed without any hardware modifications. Additionally, we propose a unified multi-scale reconstruction framework, namely Multiscale-Spectral-Spatial-Magnification Network (MSSMN), to resolve highly down-scaled (compressed) OCT images with flexible magnification factors. We incorporate the proposed methods into Spectral Domain OCT (SD-OCT) imaging of human coronary samples with clinical features such as stent and calcified lesions. Our experimental results demonstrate that spectral-spatial downscaled data can be better reconstructed than data that is downscaled solely in either spectral or spatial domain. Moreover, we observe better reconstruction performance using MSSMN than using existing reconstruction methods. Our acquisition method and multi-scale reconstruction framework, in combination, may allow faster SD-OCT inspection with high resolution during coronary intervention.

Submitted 25 April, 2022; originally announced April 2022.

Comments: 11 pages, 8 figures, reviewed by IEEE trans BME

arXiv:2203.15177 [pdf]

Min-Max Similarity: A Contrastive Semi-Supervised Deep Learning Network for Surgical Tools Segmentation

Authors: Ange Lou, Kareem Tawfik, Xing Yao, Ziteng Liu, Jack Noble

Abstract: A common problem with segmentation of medical images using neural networks is the difficulty of obtaining a significant amount of pixel-level annotated data for training. To address this issue, we propose a semi-supervised segmentation network based on contrastive learning. In contrast to the previous state-of-the-art, we introduce Min-Max Similarity (MMS), a contrastive learning form of dual-view training by employing classifiers and projectors to build all-negative, and positive and negative feature pairs, respectively, to formulate the learning as solving an MMS problem. The all-negative pairs are used to supervise the networks learning from different views and to capture general features, and the consistency of unlabeled predictions is measured by pixel-wise contrastive loss between positive and negative pairs. To quantitatively and qualitatively evaluate our proposed method, we test it on four public endoscopy surgical tool segmentation datasets and one cochlear implant surgery dataset, which we manually annotated. Results indicate that our proposed method consistently outperforms state-of-the-art semi-supervised and fully supervised segmentation algorithms. Our semi-supervised segmentation algorithm can also successfully recognize unknown surgical tools and provide good predictions. Moreover, our MMS approach achieves inference speeds of about 40 frames per second (fps) and is suitable for real-time video segmentation.

Submitted 22 February, 2023; v1 submitted 28 March, 2022; originally announced March 2022.

arXiv:2202.09414 [pdf]

Functional Optical Coherence Tomography for Intrinsic Signal Optoretinography: Recent Developments and Deployment Challenges

Authors: Tae-Hoon Kim, Guangying Ma, Taeyoon Son, Xincheng Yao

Abstract: Intrinsic optical signal (IOS) imaging of the retina, also termed as optoretinography (ORG), promises a noninvasive method for objective assessment of retinal function. By providing unparalleled capability to differentiate individual layers of the retina, functional optical coherence tomography (OCT) has been actively investigated for intrinsic signal ORG measurements. However, clinical deployment of functional OCT for quantitative ORG is still challenging due to the lack of a standardized imaging protocol and the complication of IOS sources and mechanisms. This article aims to summarize recent developments of functional OCT for ORG measurement, OCT intensity- and phase-based IOS processing. Technical challenges and perspectives of quantitative IOS analysis and ORG interpretations are discussed.

Submitted 18 February, 2022; originally announced February 2022.

arXiv:2201.12625 [pdf]

ADC-Net: An Open-Source Deep Learning Network for Automated Dispersion Compensation in Optical Coherence Tomography

Authors: Shaiban Ahmed, David Le, Taeyoon Son, Tobiloba Adejumo, Xincheng Yao, Department of Biomedical Engineering, University of Illinois at Chicago, Department of Ophthalmology, Visual Science, University of Illinois at Chicago

Abstract: Chromatic dispersion is a common problem that degrades the system resolution in optical coherence tomography (OCT). This study aims to develop a deep learning network for automated dispersion compensation (ADC-Net) in OCT. The ADC-Net is based on a redesigned UNet architecture which employs an encoder-decoder pipeline. The input section encompasses partially compensated OCT B-scans with individual retinal layers optimized. The corresponding output is a fully compensated OCT B-scan with all retinal layers optimized. Two numeric parameters, i.e., peak signal to noise ratio (PSNR) and structural similarity index metric computed at multiple scales (MS-SSIM), were used for objective assessment of the ADC-Net performance. Comparative analyses of training models with single, three, five, seven and nine input channels were implemented. The five-input-channel implementation was observed to be the optimal mode for ADC-Net training to achieve robust dispersion compensation in OCT.

Submitted 29 January, 2022; originally announced January 2022.

Comments: 18 pages, 5 figures
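
For readers unfamiliar with the PSNR metric named in the entry above, a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE), is given below. The function name `psnr`, the default peak value of 255 (8-bit images), and the toy arrays are assumptions for illustration; this is not code from ADC-Net, and MS-SSIM is not covered.

```python
import numpy as np

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of identical shape.

    max_value is the assumed peak intensity (255 for 8-bit data).
    """
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    if reference.shape != test.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((reference - test) ** 2)  # mean squared error
    if mse == 0:
        return float("inf")                 # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# Toy usage: a flat image versus the same image offset by one gray level (MSE = 1).
clean = np.zeros((4, 4))
noisy = clean + 1.0
score = psnr(clean, noisy)
```

Higher values indicate a closer match to the reference; identical images give an infinite score under this definition.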

arXiv:2201.00466 [pdf, other]

RFormer: Transformer-based Generative Adversarial Network for Real Fundus Image Restoration on A New Clinical Benchmark

Authors: Zhuo Deng, Yuanhao Cai, Lu Chen, Zheng Gong, Qiqi Bao, Xue Yao, Dong Fang, Shaochong Zhang, Lan Ma

Abstract: Ophthalmologists have used fundus images to screen and diagnose eye diseases. However, different equipment and ophthalmologists introduce large variations in the quality of fundus images. Low-quality (LQ) degraded fundus images easily lead to uncertainty in clinical screening and generally increase the risk of misdiagnosis. Thus, real fundus image restoration is worth studying. Unfortunately, a real clinical benchmark has not been explored for this task so far. In this paper, we investigate the real clinical fundus image restoration problem. First, we establish a clinical dataset, Real Fundus (RF), including 120 low- and high-quality (HQ) image pairs. Then we propose a novel Transformer-based Generative Adversarial Network (RFormer) to restore the real degradation of clinical fundus images. The key component in our network is the Window-based Self-Attention Block (WSAB) which captures non-local self-similarity and long-range dependencies. To produce more visually pleasant results, a Transformer-based discriminator is introduced. Extensive experiments on our clinical benchmark show that the proposed RFormer significantly outperforms the state-of-the-art (SOTA) methods. In addition, experiments of downstream tasks such as vessel segmentation and optic disc/cup detection demonstrate that our proposed RFormer benefits clinical fundus image analysis and applications. The dataset, code, and models are publicly available at https://github.com/dengzhuo-AI/Real-Fundus

Submitted 3 August, 2022; v1 submitted 2 January, 2022; originally announced January 2022.

Comments: IEEE J-BHI 2022; The First Benchmark and First Transformer-based Method for Real Clinical Fundus Image Restoration

arXiv:2112.07775 [pdf]

doi 10.1364/BOE.450913

Depth-resolved vascular profile features for artery-vein classification in OCT and OCT angiography of human retina

Authors: Tobiloba Adejumo, Tae-Hoon Kim, David Le, Taeyoon Son, Guangying Ma, Xincheng Yao

Abstract: This study is to characterize reflectance profiles of retinal blood vessels in optical coherence tomography (OCT), and to validate these vascular features to guide artery-vein classification in OCT angiography (OCTA) of human retina. Depth-resolved OCT reveals unique features of retinal arteries and veins. Retinal arteries show hyper-reflective boundaries at both upper (inner side towards the vitreous) and lower (outer side towards the choroid) walls. In contrast, retinal veins reveal hyper-reflectivity at the upper boundary only. Uniform lumen intensity was observed in both small and large arteries. However, the vein lumen intensity was dependent on the vessel size. Small veins exhibit a hyper-reflective zone at the bottom half of the lumen, while large veins show a hypo-reflective zone at the bottom half of the lumen.

Submitted 6 February, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

Comments: 11 pages, 4 figures

arXiv:2111.10635 [pdf, other]

HeterPS: Distributed Deep Learning With Reinforcement Learning Based Scheduling in Heterogeneous Environments

Authors: Ji Liu, Zhihua Wu, Dianhai Yu, Yanjun Ma, Danlei Feng, Minxu Zhang, Xinxuan Wu, Xuefeng Yao, Dejing Dou

Abstract: Deep neural networks (DNNs) exploit many layers and a large number of parameters to achieve excellent performance. The training process of DNN models generally handles large-scale input data with many sparse features, which incurs high Input/Output (IO) cost, while some layers are compute-intensive. The training process generally exploits distributed computing resources to reduce training time. In addition, heterogeneous computing resources, e.g., CPUs, GPUs of multiple types, are available for the distributed training process. Thus, the scheduling of multiple layers to diverse computing resources is critical for the training process. To efficiently train a DNN model using the heterogeneous computing resources, we propose a distributed framework, i.e., Paddle-Heterogeneous Parameter Server (Paddle-HeterPS), composed of a distributed architecture and a Reinforcement Learning (RL)-based scheduling method. The advantages of Paddle-HeterPS are three-fold compared with existing frameworks. First, Paddle-HeterPS enables efficient training process of diverse workloads with heterogeneous computing resources. Second, Paddle-HeterPS exploits an RL-based method to efficiently schedule the workload of each layer to appropriate computing resources to minimize the cost while satisfying throughput constraints. Third, Paddle-HeterPS manages data storage and data communication among distributed computing resources.
We carry out extensive experiments to show that Paddle-HeterPS significantly outperforms state-of-the-art approaches in terms of throughput (14.5 times higher) and monetary cost (312.3% smaller). The codes of the framework are publicly available at: https://github.com/PaddlePaddle/Paddle.

Submitted 7 June, 2023; v1 submitted 20 November, 2021; originally announced November 2021.

Comments: 14 pages, 11 figures, 2 tables; To appear in Future Generation Computer Systems (FGCS)

arXiv:2110.14144 [pdf, other]

doi 10.1016/j.isprsjprs.2022.05.008

Physically Explainable CNN for SAR Image Classification

Authors: Zhongling Huang, Xiwen Yao, Ying Liu, Corneliu Octavian Dumitru, Mihai Datcu, Junwei Han

Abstract: Integrating the special electromagnetic characteristics of Synthetic Aperture Radar (SAR) in deep neural networks is essential in order to enhance the explainability and physics awareness of deep learning. In this paper, we first propose a novel physically explainable convolutional neural network for SAR image classification, namely physics guided and injected learning (PGIL). It comprises three parts: (1) explainable models (XM) to provide prior physics knowledge, (2) physics guided network (PGN) to encode the knowledge into physics-aware features, and (3) physics injected network (PIN) to adaptively introduce the physics-aware features into the classification pipeline for label prediction. A hybrid Image-Physics SAR dataset format is proposed for evaluation, and experiments are conducted on both Sentinel-1 and Gaofen-3 SAR data. The results show that the proposed PGIL substantially improves the classification performance in the case of limited labeled data compared with the counterpart data-driven CNN and other pre-training methods. Additionally, the physics explanations are discussed to indicate the interpretability and the physical consistency preserved in the predictions. We expect the proposed method to promote the development of physically explainable deep learning in the SAR image interpretation field.

Submitted 2 June, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

Journal ref: ISPRS Journal of Photogrammetry and Remote Sensing Volume 190, August 2022, Pages 25-37

arXiv:2110.04921 [pdf, other]

doi 10.1364/OE.445001

Increasing a microscope's effective field of view via overlapped imaging and machine learning

Authors: Xing Yao, Vinayak Pathak, Haoran Xi, Amey Chaware, Colin Cooke, Kanghyun Kim, Shiqi Xu, Yuting Li, Timothy Dunn, Pavan Chandra Konda, Kevin C. Zhou, Roarke Horstmeyer

Abstract: This work demonstrates a multi-lens microscopic imaging system that overlaps multiple independent fields of view on a single sensor for high-efficiency automated specimen analysis. Automatic detection, classification and counting of various morphological features of interest is now a crucial component of both biomedical research and disease diagnosis. While convolutional neural networks (CNNs) have dramatically improved the accuracy of counting cells and sub-cellular features from acquired digital image data, the overall throughput is still typically hindered by the limited space-bandwidth product (SBP) of conventional microscopes. Here, we show both in simulation and experiment that overlapped imaging and co-designed analysis software can achieve accurate detection of diagnostically-relevant features for several applications, including counting of white blood cells and the malaria parasite, leading to multi-fold increase in detection and processing throughput with minimal reduction in accuracy.

Submitted 10 October, 2021; originally announced October 2021.

arXiv:2106.07564 [pdf]

An optimized Capsule-LSTM model for facial expression recognition with video sequences

Authors: Siwei Liu, Yuanpeng Long, Gao Xu, Lijia Yang, Shimei Xu, Xiaoming Yao, Kunxian Shu

Abstract: To overcome the limitations of convolutional neural networks in facial expression recognition, a facial expression recognition model, Capsule-LSTM, based on video frame sequences is proposed. This model is composed of three networks: capsule encoders, capsule decoders and an LSTM network. The capsule encoder extracts the spatial information of facial expressions in video frames. The capsule decoder reconstructs the images to optimize the network. The LSTM extracts the temporal information between video frames and analyzes the differences in expression changes between frames. The experimental results on the MMI dataset show that the Capsule-LSTM model proposed in this paper can effectively improve the accuracy of video expression recognition.

Submitted 27 May, 2021; originally announced June 2021.

Comments: 14 pages, 4 figures

arXiv:2106.07563 [pdf]

BPLF: A Bi-Parallel Linear Flow Model for Facial Expression Generation from Emotion Set Images

Authors: Gao Xu, Yuanpeng Long, Siwei Liu, Lijia Yang, Shimei Xu, Xiaoming Yao, Kunxian Shu

Abstract: The flow-based generative model is a deep learning generative model, which obtains the ability to generate data by explicitly learning the data distribution. Theoretically its ability to restore data is stronger than other generative models. However, its implementation has many limitations, including limited model design, too many model parameters and tedious calculation. In this paper, a bi-parallel linear flow model for facial emotion generation from emotion set images is constructed, and a series of improvements have been made in terms of the expression ability of the model and the convergence speed in training. The model is mainly composed of several coupling layers superimposed to form a multi-scale structure, in which each coupling layer contains 1*1 reversible convolution and linear operation modules. Furthermore, this paper sorts out the current public data sets of facial emotion images, builds a new emotion data set, and verifies the model on this data set. The experimental results show that, under the traditional convolutional neural network, the 3-layer 3*3 convolution kernel is more conducive to extracting the features of the face images. The introduction of principal component decomposition can improve the convergence speed of the model.

Submitted 27 May, 2021; originally announced June 2021.

Comments: 20 pages, 10 figures

arXiv:2106.06909 [pdf, other]

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

Authors: Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan

Abstract: This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training. Around 40,000 hours of transcribed audio is first collected from audiobooks, podcasts and YouTube, covering both read and spontaneous speaking styles, and a variety of topics, such as arts, science, sports, etc. A new forced alignment and segmentation pipeline is proposed to create sentence segments suitable for speech recognition training, and to filter out segments with low-quality transcription. For system training, GigaSpeech provides five subsets of different sizes, 10h, 250h, 1000h, 2500h, and 10000h. For our 10,000-hour XL training subset, we cap the word error rate at 4% during the filtering/validation stage, and for all our other smaller training subsets, we cap it at 0%. The DEV and TEST evaluation sets, on the other hand, are re-processed by professional human transcribers to ensure high transcription quality. Baseline systems are provided for popular speech recognition toolkits, namely Athena, ESPnet, Kaldi and Pika.

Submitted 13 June, 2021; originally announced June 2021.

arXiv:2105.04340 [pdf]

Interaction Theory of Hazard-Target System

Authors: Ji Ge, Yu-Yuan Zhang, Kai-Li Xu, Ji-Shuo Li, Xi-Wen Yao, Chun-Ying Wu, Shuang-Yuan Li, Fang Yan, Jin-Jia Zhang, Qing-Wei Xu

Abstract: Major accidents (e.g., the Space Shuttle Challenger disaster in the USA, the Bhopal Disaster in India, the Fukushima nuclear accident in Japan, the Tianjin Port fire and explosion accident in China) have occurred all over the world. Safety scientists are always trying to understand why these accidents happened and how to prevent them. Accident models and theories form the basis for many safety research fields and practices, such as the investigation of accidents, the design of safer systems and decision making in safety-related fields. Although many accident causation models exist, there is no universally accepted model with useful elements for understanding accident causation. Based on STAMP and RMF, we proposed a new theory named the Interaction Theory of Hazard-Target System (ITHTS) that incorporates human, organisational and technological characteristics in the same framework. Accident analysis methods provide the necessary information to analyze an accident in a specific setting. In order to solve the issues that current accident analysis methods still face, we proposed a new systemic accident analysis method based on ITHTS and STPA. We choose the Tianjin Port fire and explosion accident in China as a case study to demonstrate the viability of the Interaction Theory of Hazard-Target System and the applicability of the new accident analysis method. It is concluded that ITHTS can explain the phenomena in safety practice and that the new accident analysis method can be applied to the explanation and analysis of major accidents.

Submitted 10 May, 2021; originally announced May 2021.

Comments: 28 pages, 9 figures, 3 tables

arXiv:2009.05103 [pdf, other]

Emotion-Based End-to-End Matching Between Image and Music in Valence-Arousal Space

Authors: Sicheng Zhao, Yaxian Li, Xingxu Yao, Weizhi Nie, Pengfei Xu, Jufeng Yang, Kurt Keutzer

Abstract: Both images and music can convey rich semantics and are widely used to induce specific emotions. Matching images and music with similar emotions might help to make emotion perceptions more vivid and stronger. Existing emotion-based image and music matching methods either employ limited categorical emotion states which cannot well reflect the complexity and subtlety of emotions, or train the matching model using an impractical multi-stage pipeline. In this paper, we study end-to-end matching between image and music based on emotions in the continuous valence-arousal (VA) space. First, we construct a large-scale dataset, termed Image-Music-Emotion-Matching-Net (IMEMNet), with over 140K image-music pairs. Second, we propose cross-modal deep continuous metric learning (CDCML) to learn a shared latent embedding space which preserves the cross-modal similarity relationship in the continuous matching space. Finally, we refine the embedding space by further preserving the single-modal emotion relationship in the VA spaces of both images and music. The metric learning in the embedding space and task regression in the label space are jointly optimized for both cross-modal matching and single-modal VA prediction. The extensive experiments conducted on IMEMNet demonstrate the superiority of CDCML for emotion-based image and music matching as compared to the state-of-the-art approaches.

Submitted 22 August, 2020; originally announced September 2020.

Comments: Accepted by ACM Multimedia 2020

arXiv:2006.03742 [pdf]

AV-Net: Deep learning for fully automated artery-vein classification in optical coherence tomography angiography

Authors: Minhaj Alam, David Le, Taeyoon Son, Jennifer I. Lim, Xincheng Yao

Abstract: This study demonstrates deep learning for automated artery-vein (AV) classification in optical coherence tomography angiography (OCTA). The AV-Net, a fully convolutional network (FCN) based on a modified U-shaped CNN architecture, incorporates enface OCT and OCTA to differentiate arteries and veins. For the multi-modal training process, the enface OCT works as a near infrared fundus image to provide vessel intensity profiles, and the OCTA contains blood flow strength and vessel geometry features. A transfer learning process is also integrated to compensate for the limited size of available OCTA datasets, as OCTA is a relatively new imaging modality. By providing an average accuracy of 86.75%, the AV-Net promises a fully automated platform to foster clinical deployment of differential AV analysis in OCTA.

Submitted 5 June, 2020; originally announced June 2020.

arXiv:2005.07036 [pdf, ps, other]

Infant Crying Detection in Real-World Environments

Authors: Xuewen Yao, Megan Micheletti, Mckensey Johnson, Edison Thomaz, Kaya de Barbaro

Abstract: Most existing cry detection models have been tested with data collected in controlled settings. Thus, the extent to which they generalize to noisy and lived environments is unclear. In this paper, we evaluate several established machine learning approaches including a model leveraging both deep spectrum and acoustic features. This model was able to recognize crying events with F1 score 0.613 (Precision: 0.672, Recall: 0.552), showing improved external validity over existing methods at cry detection in everyday real-world settings. As part of our evaluation, we collect and annotate a novel dataset of infant crying compiled from over 780 hours of labeled real-world audio data, captured via recorders worn by infants in their homes, which we make publicly available. Our findings confirm that a cry detection model trained on in-lab data underperforms when presented with real-world data (in-lab test F1: 0.656, real-world test F1: 0.236), highlighting the value of our new dataset and model.

Submitted 16 February, 2022; v1 submitted 12 May, 2020; originally announced May 2020.

arXiv:2002.07699 [pdf, other]

Cognitive Biomarker Prioritization in Alzheimer's Disease using Brain Morphometric Data

Authors: Bo Peng, Xiaohui Yao, Shannon L. Risacher, Andrew J. Saykin, Li Shen, Xia Ning

Abstract: Background: Cognitive assessments represent the most common clinical routine for the diagnosis of Alzheimer's Disease (AD). Given a large number of cognitive assessment tools and time-limited office visits, it is important to determine a proper set of cognitive tests for different subjects. Most current studies create guidelines of cognitive test selection for a targeted population, but they are not customized for each individual subject. In this manuscript, we develop a machine learning paradigm enabling personalized cognitive assessments prioritization. Method: We adapt a newly developed learning-to-rank approach PLTR to implement our paradigm. This method learns the latent scoring function that pushes the most effective cognitive assessments onto the top of the prioritization list. We also extend PLTR to better separate the most effective cognitive assessments and the less effective ones. Results: Our empirical study on the ADNI data shows that the proposed paradigm outperforms the state-of-the-art baselines on identifying and prioritizing individual-specific cognitive biomarkers. We conduct experiments in cross validation and level-out validation settings. In the two settings, our paradigm significantly outperforms the best baselines with improvement as much as 22.1% and 19.7%, respectively, on prioritizing cognitive features. Conclusions: The proposed paradigm achieves superior performance on prioritizing cognitive biomarkers.
The cognitive biomarkers prioritized on top have great potential to facilitate personalized diagnosis, disease subtyping, and ultimately precision medicine in AD.

Submitted 12 November, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

Comments: This paper has been accepted by BMC MIDM

arXiv:2001.03129 [pdf, other]

Beyond Fourier transform: super-resolving optical coherence tomography

Authors: Yuye Ling, Mengyuan Wang, Yu Gan, Xinwen Yao, Leopold Schmetterer, Chuanqing Zhou, Yikai Su

Abstract: Optical coherence tomography (OCT) is a volumetric imaging modality that empowers clinicians and scientists to noninvasively visualize the cross-sections of biological samples. As the latest generation of its kind, Fourier-domain OCT (FD-OCT) offers a micrometer-scale axial resolution by taking advantage of coherence gating. Based on the current theory, it is believed the only way to obtain a higher-axial-resolution OCT image is to physically extend the system's spectral bandwidth given a certain central wavelength. Here, we showed the belief is wrong. We proposed a novel reconstruction framework, which integrates prior knowledge and exploits the shift-variance, to retrospectively super-resolve OCT images without altering the system configurations. Both numerical and experimental results confirmed the processed image manifested an axial resolution beyond the previous theoretical prediction. We believe this result not only opens new horizons for future research directions in OCT reconstruction but also promises an immediate upgrade to tens of thousands of legacy OCT units currently deployed.

Submitted 19 May, 2020; v1 submitted 9 January, 2020; originally announced January 2020.

Comments: 30 pages, 9 figures

arXiv:1912.05920 [pdf, other]

Measuring Mother-Infant Emotions By Audio Sensing

Authors: Xuewen Yao, Dong He, Tiancheng Jing, Kaya de Barbaro

Abstract: It has been suggested in developmental psychology literature that the communication of affect between mothers and their infants correlates with the socioemotional and cognitive development of infants. In this study, we obtained day-long audio recordings of 10 mother-infant pairs in order to study their affect communication in speech, with a focus on the mother's speech. In order to build a model for speech emotion detection, we used the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and trained a convolutional neural network model that is able to classify 6 different emotions at 70% accuracy. We applied our model to the mothers' speech and found that the predicted dominant emotions were angry and sad, which was not the case. Based on our own observations, we concluded that emotional speech databases made with the help of actors cannot generalize well to real-life settings, suggesting an active learning or unsupervised approach in the future.

Submitted 10 December, 2019; originally announced December 2019.

arXiv:1910.07002 [pdf]

Trans-pars-planar illumination enables 200° ultra-wide field pediatric fundus photography for easy examination of the retina

Authors: Devrim Toslak, Felix Chau, Muhammet Kazim Erol, Changgeng Liu, R. V. Paul Chan, Taeyoon Son, Xincheng Yao

Abstract: This study tests the feasibility of using trans-pars-planar illumination for ultra-wide field pediatric fundus photography. Fundus examination of the peripheral retina is essential for clinical management of pediatric eye diseases. However, current pediatric fundus cameras with traditional trans-pupillary illumination provide a limited field of view (FOV), making it difficult to access the peripheral retina adequately for a comprehensive assessment of eye conditions. Here, we report the first demonstration of trans-pars-planar illumination in ultra-wide field pediatric fundus photography. For proof-of-concept validation, all off-the-shelf optical components were selected to construct a lab prototype pediatric camera (PedCam). By freeing the entire pupil for imaging purpose only, the trans-pars-planar illumination enables a 200° FOV in a snapshot fundus image, allowing easy visualization of both the central and peripheral retina up to the ora serrata. A low-cost, easy-to-use ultra-wide field PedCam provides a unique opportunity to foster affordable telemedicine in rural and underserved areas.

Submitted 15 October, 2019; originally announced October 2019.

Comments: 9 pages, and 3 figures

arXiv:1910.01796 [pdf]

doi 10.1167/tvst.9.2.35

Transfer Learning for Automated OCTA Detection of Diabetic Retinopathy

Authors: David Le, Minhaj Alam, Cham Yao, Jennifer I. Lim, R. V. P. Chan, Devrim Toslak, Xincheng Yao

Abstract: Purpose: To test the feasibility of using deep learning for optical coherence tomography angiography (OCTA) detection of diabetic retinopathy (DR). Methods: A deep learning convolutional neural network (CNN) architecture VGG16 was employed for this study. A transfer learning process was implemented to re-train the CNN for robust OCTA classification. In order to demonstrate the feasibility of using this method for artificial intelligence (AI) screening of DR in clinical environments, the re-trained CNN was incorporated into a custom developed GUI platform which can be readily operated by ophthalmic personnel. Results: With the last nine layers re-trained, the CNN architecture achieved the best performance for automated OCTA classification. The overall accuracy of the re-trained classifier for differentiating healthy, NoDR, and NPDR was 87.27%, with 83.76% sensitivity and 90.82% specificity. The AUC metrics for binary classification of healthy, NoDR and DR were 0.97, 0.98 and 0.97, respectively. The GUI platform enabled easy validation of the method for AI screening of DR in a clinical environment. Conclusion: With a transfer learning process to adopt the early layers for simple feature analysis and to re-train the upper layers for fine feature analysis, the CNN architecture VGG16 can be used for robust OCTA classification of healthy, NoDR, and NPDR eyes. Translational Relevance: OCTA can capture microvascular changes in early DR. A transfer learning process enables robust implementation of convolutional neural network (CNN) for automated OCTA classification of DR.

Submitted 4 October, 2019; originally announced October 2019.

Comments: 20 pages, 4 figures, 6 tables

arXiv:1906.02745 [pdf, other]

Automated Classification of Seizures against Nonseizures: A Deep Learning Approach

Authors: Xinghua Yao, Qiang Cheng, Guo-Qiang Zhang

Abstract: In current clinical practice, electroencephalograms (EEG) are reviewed and analyzed by well-trained neurologists to support therapeutic decisions. Manual review is labor-intensive and error prone. Automatic and accurate seizure/nonseizure classification methods are needed. One major problem is that the EEG signals for seizure state and nonseizure state exhibit considerable variations. In order to capture essential seizure features, this paper integrates an emerging deep learning model, the independently recurrent neural network (IndRNN), with a dense structure and an attention mechanism to exploit temporal and spatial discriminating features and overcome seizure variabilities. The dense structure is to ensure maximum information flow between layers. The attention mechanism is to capture spatial features. Evaluations are performed in cross-validation experiments over the noisy CHB-MIT data set. The obtained average sensitivity, specificity and precision of 88.80%, 88.60% and 88.69% are better than those of the current state-of-the-art methods. In addition, we explore how the segment length affects the classification performance. Thirteen different segment lengths are assessed, showing that the classification performance varies over the segment lengths, and the maximal fluctuating margin is more than 4%. Thus, the segment length is an important factor influencing the classification performance.

Submitted 5 June, 2019; originally announced June 2019.

Comments: 12 pages, 8 figures. arXiv admin note: text overlap with arXiv:1903.09326

arXiv:1905.04224 [pdf]

doi 10.3390/jcm8060872

Supervised machine learning based multi-task artificial intelligence classification of retinopathies

Authors: Minhaj Alam, David Le, Jennifer I. Lim, R. V. P. Chan, Xincheng Yao

Abstract: Artificial intelligence (AI) classification holds promise as a novel and affordable screening tool for clinical management of ocular diseases. Rural and underserved areas, which suffer from lack of access to experienced ophthalmologists may particularly benefit from this technology. Quantitative optical coherence tomography angiography (OCTA) imaging provides excellent capability to identify subtle vascular distortions, which are useful for classifying retinovascular diseases. However, application of AI for differentiation and classification of multiple eye diseases is not yet established. In this study, we demonstrate supervised machine learning based multi-task OCTA classification. We sought 1) to differentiate normal from diseased ocular conditions, 2) to differentiate different ocular disease conditions from each other, and 3) to stage the severity of each ocular condition. Quantitative OCTA features, including blood vessel tortuosity (BVT), blood vascular caliber (BVC), vessel perimeter index (VPI), blood vessel density (BVD), foveal avascular zone (FAZ) area (FAZ-A), and FAZ contour irregularity (FAZ-CI) were fully automatically extracted from the OCTA images. A stepwise backward elimination approach was employed to identify sensitive OCTA features and optimal-feature-combinations for the multi-task classification. For proof-of-concept demonstration, diabetic retinopathy (DR) and sickle cell retinopathy (SCR) were used to validate the supervised machine learning classifier.
The presented AI classification methodology is applicable and can be readily extended to other ocular diseases, holding promise to enable a mass-screening platform for clinical deployment and telemedicine.

Submitted 10 May, 2019; originally announced May 2019.

Comments: Supplemental material attached at the end

Journal ref: https://www.mdpi.com/2077-0383/8/6/872

Showing 1–48 of 48 results for author: Yao, X