-
iMD4GC: Incomplete Multimodal Data Integration to Advance Precise Treatment Response Prediction and Survival Analysis for Gastric Cancer
Authors:
Fengtao Zhou,
Yingxue Xu,
Yanfen Cui,
Shenyan Zhang,
Yun Zhu,
Weiyang He,
Jiguang Wang,
Xin Wang,
Ronald Chan,
Louis Ho Shing Lau,
Chu Han,
Dafu Zhang,
Zhenhui Li,
Hao Chen
Abstract:
Gastric cancer (GC) is a prevalent malignancy worldwide, ranking as the fifth most common cancer with over 1 million new cases and 700 thousand deaths in 2020. Locally advanced gastric cancer (LAGC) accounts for approximately two-thirds of GC diagnoses, and neoadjuvant chemotherapy (NACT) has emerged as the standard treatment for LAGC. However, the effectiveness of NACT varies significantly among patients, with a considerable subset displaying treatment resistance. Ineffective NACT not only leads to adverse effects but also misses the optimal therapeutic window, resulting in a lower survival rate. Moreover, existing multimodal learning methods assume the availability of all modalities for each patient, which does not align with the reality of clinical practice. The limited availability of modalities for each patient would cause information loss, adversely affecting predictive accuracy. In this study, we propose an incomplete multimodal data integration framework for GC (iMD4GC) to address the challenges posed by incomplete multimodal data, enabling precise response prediction and survival analysis. Specifically, iMD4GC incorporates unimodal attention layers for each modality to capture intra-modal information. Subsequently, the cross-modal interaction layers explore potential inter-modal interactions and capture complementary information across modalities, thereby enabling information compensation for missing modalities. To evaluate iMD4GC, we collected three multimodal datasets for GC study: GastricRes (698 cases) for response prediction, GastricSur (801 cases) for survival analysis, and TCGA-STAD (400 cases) for survival analysis. The scale of our datasets is significantly larger than that of previous studies. The iMD4GC achieved impressive performance with an 80.2% AUC on GastricRes, 71.4% C-index on GastricSur, and 66.1% C-index on TCGA-STAD, significantly surpassing other compared methods.
Submitted 1 April, 2024;
originally announced April 2024.
-
Single-Shot Plug-and-Play Methods for Inverse Problems
Authors:
Yanqi Cheng,
Lipei Zhang,
Zhenda Shen,
Shujun Wang,
Lequan Yu,
Raymond H. Chan,
Carola-Bibiane Schönlieb,
Angelica I Aviles-Rivero
Abstract:
The utilisation of Plug-and-Play (PnP) priors in inverse problems has become increasingly prominent in recent years. This preference is based on the mathematical equivalence between the general proximal operator and the regularised denoiser, facilitating the adaptation of various off-the-shelf denoiser priors to a wide range of inverse problems. However, existing PnP models predominantly rely on pre-trained denoisers using large datasets. In this work, we introduce Single-Shot PnP methods (SS-PnP), shifting the focus to solving inverse problems with minimal data. First, we integrate Single-Shot proximal denoisers into iterative methods, enabling training with single instances. Second, we propose implicit neural priors based on a novel function that preserves relevant frequencies to capture fine details while avoiding the issue of vanishing gradients. We demonstrate, through extensive numerical and visual experiments, that our method leads to better approximations.
Submitted 22 November, 2023;
originally announced November 2023.
-
TRIDENT: The Nonlinear Trilogy for Implicit Neural Representations
Authors:
Zhenda Shen,
Yanqi Cheng,
Raymond H. Chan,
Pietro Liò,
Carola-Bibiane Schönlieb,
Angelica I Aviles-Rivero
Abstract:
Implicit neural representations (INRs) have garnered significant interest recently for their ability to model complex, high-dimensional data without explicit parameterisation. In this work, we introduce TRIDENT, a novel function for implicit neural representations characterised by a trilogy of nonlinearities. Firstly, it is designed to represent high-order features through order compactness. Secondly, TRIDENT efficiently captures frequency information, a feature called frequency compactness. Thirdly, it has the capability to represent signals or images such that most of its energy is concentrated in a limited spatial region, denoting spatial compactness. We demonstrated through extensive experiments on various inverse problems that our proposed function outperforms existing implicit neural representation functions.
Submitted 21 November, 2023;
originally announced November 2023.
-
Deep Learning in Breast Cancer Imaging: A Decade of Progress and Future Directions
Authors:
Luyang Luo,
Xi Wang,
Yi Lin,
Xiaoqi Ma,
Andong Tan,
Ronald Chan,
Varut Vardhanabhuti,
Winnie CW Chu,
Kwang-Ting Cheng,
Hao Chen
Abstract:
Breast cancer has reached the highest incidence rate worldwide among all malignancies since 2020. Breast imaging plays a significant role in early diagnosis and intervention to improve the outcome of breast cancer patients. In the past decade, deep learning has shown remarkable progress in breast cancer imaging analysis, holding great promise in interpreting the rich information and complex context of breast imaging modalities. Considering the rapid improvement in deep learning technology and the increasing severity of breast cancer, it is critical to summarize past progress and identify future challenges to be addressed. This paper provides an extensive review of deep learning-based breast cancer imaging research, covering studies on mammogram, ultrasound, magnetic resonance imaging, and digital pathology images over the past decade. The major deep learning methods and applications on imaging-based screening, diagnosis, treatment response prediction, and prognosis are elaborated and discussed. Drawn from the findings of this survey, we present a comprehensive discussion of the challenges and potential avenues for future research in deep learning-based breast cancer imaging.
Submitted 20 January, 2024; v1 submitted 13 April, 2023;
originally announced April 2023.
-
Diffusion in the Dark: A Diffusion Model for Low-Light Text Recognition
Authors:
Cindy M. Nguyen,
Eric R. Chan,
Alexander W. Bergman,
Gordon Wetzstein
Abstract:
Capturing images is a key part of automation for high-level tasks such as scene text recognition. Low-light conditions pose a challenge for high-level perception stacks, which are often optimized on well-lit, artifact-free images. Reconstruction methods for low-light images can produce well-lit counterparts, but typically at the cost of high-frequency details critical for downstream tasks. We propose Diffusion in the Dark (DiD), a diffusion model for low-light image reconstruction for text recognition. DiD provides reconstructions that are qualitatively competitive with the state of the art (SOTA), while preserving high-frequency details even in extremely noisy, dark conditions. We demonstrate that DiD, without any task-specific optimization, can outperform SOTA low-light methods in low-light text recognition on real images, bolstering the potential of diffusion models to solve ill-posed inverse problems.
Submitted 30 October, 2023; v1 submitted 7 March, 2023;
originally announced March 2023.
-
A Global and Patch-wise Contrastive Loss for Accurate Automated Exudate Detection
Authors:
Wei Tang,
Kangning Cui,
Raymond H. Chan
Abstract:
Diabetic retinopathy (DR) is a leading global cause of blindness. Early detection of hard exudates plays a crucial role in identifying DR, which aids in treating diabetes and preventing vision loss. However, the unique characteristics of hard exudates, ranging from their inconsistent shapes to indistinct boundaries, pose significant challenges to existing segmentation techniques. To address these issues, we present a novel supervised contrastive learning framework to optimize hard exudate segmentation. Specifically, we introduce a patch-wise density contrasting scheme to distinguish between areas with varying lesion concentrations, and therefore improve the model's proficiency in segmenting small lesions. To handle the ambiguous boundaries, we develop a discriminative edge inspection module to dynamically analyze the pixels that lie around the boundaries and accurately delineate the exudates. Upon evaluation using the IDRiD dataset and comparison with state-of-the-art frameworks, our method exhibits its effectiveness and shows potential for computer-assisted hard exudate detection. The code to replicate experiments is available at github.com/wetang7/HECL/.
Submitted 2 March, 2024; v1 submitted 22 February, 2023;
originally announced February 2023.
-
Continuous U-Net: Faster, Greater and Noiseless
Authors:
Chun-Wun Cheng,
Christina Runkel,
Lihao Liu,
Raymond H Chan,
Carola-Bibiane Schönlieb,
Angelica I Aviles-Rivero
Abstract:
Image segmentation is a fundamental task in image analysis and clinical practice. The current state-of-the-art techniques are based on U-shape type encoder-decoder networks with skip connections, called U-Net. Despite the powerful performance reported by existing U-Net type networks, they suffer from several major limitations. Issues include the hard coding of the receptive field size, compromising the performance and computational cost, as well as the fact that they do not account for inherent noise in the data. They have problems associated with discrete layers, and do not offer any theoretical underpinning. In this work we introduce continuous U-Net, a novel family of networks for image segmentation. Firstly, continuous U-Net is a continuous deep neural network that introduces new dynamic blocks modelled by second order ordinary differential equations. Secondly, we provide theoretical guarantees for our network demonstrating faster convergence, higher robustness and less sensitivity to noise. Thirdly, we derive qualitative measures for tailor-made segmentation tasks. We demonstrate, through extensive numerical and visual results, that our model outperforms existing U-Net blocks for several medical image segmentation benchmarking datasets.
Submitted 1 February, 2023;
originally announced February 2023.
-
A portable widefield fundus camera with high dynamic range imaging capability
Authors:
Alfa Rossi,
Mojtaba Rahimi,
David Le,
Taeyoon Son,
Michael J. Heiferman,
R. V. Paul Chan,
Xincheng Yao
Abstract:
Fundus photography is indispensable for clinical detection and management of eye diseases. Limited image contrast and field of view (FOV) are common limitations of conventional fundus cameras, making it difficult to detect subtle abnormalities at the early stages of eye diseases. Further improvements of image contrast and FOV coverage are important to improve early disease detection and reliable treatment assessment. We report here a portable fundus camera with a wide FOV and high dynamic range (HDR) imaging capabilities. Miniaturized indirect ophthalmoscopy illumination was employed to achieve the portable design for nonmydriatic, widefield fundus photography. Orthogonal polarization control was used to eliminate illumination reflectance artifacts. With independent power controls, three fundus images were sequentially acquired and fused to achieve HDR function for local image contrast enhancement. A 101° eye-angle (67° visual-angle) snapshot FOV was achieved for nonmydriatic fundus photography. The effective FOV can be readily expanded up to 190° eye-angle (134° visual-angle) with the aid of a fixation target, without the need for pharmacologic pupillary dilation. The effectiveness of HDR imaging was validated with both normal healthy and pathologic eyes, compared to a conventional fundus camera.
Submitted 20 December, 2022;
originally announced December 2022.
-
Spherical Image Inpainting with Frame Transformation and Data-driven Prior Deep Networks
Authors:
Jianfei Li,
Chaoyan Huang,
Raymond Chan,
Han Feng,
Micheal Ng,
Tieyong Zeng
Abstract:
Spherical image processing has been widely applied in many important fields, such as omnidirectional vision for autonomous cars, global climate modelling, and medical imaging. It is non-trivial to extend an algorithm developed for flat images to spherical ones. In this work, we focus on the challenging task of spherical image inpainting with a deep learning-based regularizer. Instead of a naive application of existing models for planar images, we employ a fast directional spherical Haar framelet transform and develop a novel optimization framework based on a sparsity assumption of the framelet transform. Furthermore, by employing a progressive encoder-decoder architecture, a new and better-performing deep CNN denoiser is carefully designed and works as an implicit regularizer. Finally, we use a plug-and-play method to handle the proposed optimization model, which can be implemented efficiently by training the CNN denoiser prior. Numerical experiments are conducted and show that the proposed algorithms can greatly recover damaged spherical images and achieve the best performance compared with purely using a deep learning denoiser or a plug-and-play model.
Submitted 29 September, 2022;
originally announced September 2022.
-
Contrastive learning-based pretraining improves representation and transferability of diabetic retinopathy classification models
Authors:
Minhaj Nur Alam,
Rikiya Yamashita,
Vignav Ramesh,
Tejas Prabhune,
Jennifer I. Lim,
R. V. P. Chan,
Joelle Hallak,
Theodore Leng,
Daniel Rubin
Abstract:
Self-supervised contrastive learning (CL) based pretraining allows development of robust and generalized deep learning (DL) models with small, labeled datasets, reducing the burden of label generation. This paper aims to evaluate the effect of CL based pretraining on the performance of referrable vs non-referrable diabetic retinopathy (DR) classification. We have developed a CL based framework with neural style transfer (NST) augmentation to produce models with better representations and initializations for the detection of DR in color fundus images. We compare our CL pretrained model performance with two state-of-the-art baseline models pretrained with ImageNet weights. We further investigate the model performance with reduced labeled training data (down to 10 percent) to test the robustness of the model when trained with small, labeled datasets. The model is trained and validated on the EyePACS dataset and tested independently on clinical data from the University of Illinois, Chicago (UIC). Compared to baseline models, our CL pretrained FundusNet model had higher AUC (CI) values (0.91 (0.898 to 0.930) vs 0.80 (0.783 to 0.820) and 0.83 (0.801 to 0.853) on UIC data). At 10 percent labeled training data, the FundusNet AUC was 0.81 (0.78 to 0.84) vs 0.58 (0.56 to 0.64) and 0.63 (0.60 to 0.66) in baseline models, when tested on the UIC dataset. CL based pretraining with NST significantly improves DL classification performance, helps the model generalize well (transferable from EyePACS to UIC data), and allows training with small, annotated datasets, therefore reducing the ground truth annotation burden on clinicians.
Submitted 24 August, 2022;
originally announced August 2022.
-
Deep Learning for Computational Cytology: A Survey
Authors:
Hao Jiang,
Yanning Zhou,
Yi Lin,
Ronald CK Chan,
Jiang Liu,
Hao Chen
Abstract:
Computational cytology is a critical, rapidly developing, yet challenging topic in the field of medical image computing, which analyzes digitized cytology images with computer-aided technologies for cancer screening. Recently, an increasing number of deep learning (DL) algorithms have made significant progress in medical image analysis, leading to a surge in publications on cytological studies. To investigate the advanced methods and comprehensive applications, we survey more than 120 publications of DL-based cytology image analysis in this article. We first introduce various deep learning methods, including fully supervised, weakly supervised, unsupervised, and transfer learning. Then, we systematically summarize the public datasets, evaluation metrics, and versatile cytology image analysis applications including classification, detection, segmentation, and other related tasks. Finally, we discuss current challenges and potential research directions of computational cytology.
Submitted 16 February, 2022; v1 submitted 10 February, 2022;
originally announced February 2022.
-
The CORSMAL benchmark for the prediction of the properties of containers
Authors:
Alessio Xompero,
Santiago Donaher,
Vladimir Iashin,
Francesca Palermo,
Gökhan Solak,
Claudio Coppola,
Reina Ishikawa,
Yuichi Nagao,
Ryo Hachiuma,
Qi Liu,
Fan Feng,
Chuanlin Lan,
Rosa H. M. Chan,
Guilherme Christmann,
Jyun-Ting Song,
Gonuguntla Neeharika,
Chinnakotla Krishna Teja Reddy,
Dinesh Jain,
Bakhtawar Ur Rehman,
Andrea Cavallaro
Abstract:
The contactless estimation of the weight of a container and the amount of its content manipulated by a person are key pre-requisites for safe human-to-robot handovers. However, opaqueness and transparencies of the container and the content, and variability of materials, shapes, and sizes, make this estimation difficult. In this paper, we present a range of methods and an open framework to benchmark acoustic and visual perception for the estimation of the capacity of a container, and the type, mass, and amount of its content. The framework includes a dataset, specific tasks and performance measures. We conduct an in-depth comparative analysis of methods that used this framework and audio-only or vision-only baselines designed from related works. Based on this analysis, we can conclude that audio-only and audio-visual classifiers are suitable for the estimation of the type and amount of the content using different types of convolutional neural networks, combined with either recurrent neural networks or a majority voting strategy, whereas computer vision methods are suitable to determine the capacity of the container using regression and geometric approaches. Classifying the content type and level using only audio achieves a weighted average F1-score up to 81% and 97%, respectively. Estimating the container capacity with vision-only approaches and estimating the filling mass with audio-visual multi-stage approaches reach up to 65% weighted average capacity and mass scores. These results show that there is still room for improvement on the design of new methods. These new methods can be ranked and compared on the individual leaderboards provided by our open framework.
Submitted 21 April, 2022; v1 submitted 27 July, 2021;
originally announced July 2021.
-
I-ODA, Real-World Multi-modal Longitudinal Data for Ophthalmic Applications
Authors:
Nooshin Mojab,
Vahid Noroozi,
Abdullah Aleem,
Manoj P. Nallabothula,
Joseph Baker,
Dimitri T. Azar,
Mark Rosenblatt,
RV Paul Chan,
Darvin Yi,
Philip S. Yu,
Joelle A. Hallak
Abstract:
Data from clinical real-world settings is characterized by variability in quality, machine-type, setting, and source. One of the primary goals of medical computer vision is to develop and validate artificial intelligence (AI) based algorithms on real-world data enabling clinical translations. However, despite the exponential growth in AI based applications in healthcare, specifically in ophthalmology, translations to clinical settings remain challenging. Limited access to adequate and diverse real-world data inhibits the development and validation of translatable algorithms. In this paper, we present a new multi-modal longitudinal ophthalmic imaging dataset, the Illinois Ophthalmic Database Atlas (I-ODA), with the goal of advancing state-of-the-art computer vision applications in ophthalmology, and improving upon the translatable capacity of AI based applications across different clinical settings. We present the infrastructure employed to collect, annotate, and anonymize images from multiple sources, demonstrating the complexity of real-world retrospective data and its limitations. I-ODA includes 12 imaging modalities with a total of 3,668,649 ophthalmic images of 33,876 individuals from the Department of Ophthalmology and Visual Sciences at the Illinois Eye and Ear Infirmary of the University of Illinois Chicago (UIC) over the course of 12 years.
Submitted 29 March, 2021;
originally announced April 2021.
-
Point Spread Function Engineering for 3D Imaging of Space Debris using a Continuous Exact l0 Penalty (CEL0) Based Algorithm
Authors:
Chao Wang,
Raymond H. Chan,
Robert J. Plemmons,
Sudhakar Prasad
Abstract:
We consider three-dimensional (3D) localization and imaging of space debris from only one two-dimensional (2D) snapshot image. The technique involves an optical imager that exploits off-center image rotation to encode both the lateral and depth coordinates of point sources, with the latter being encoded in the angle of rotation of the PSF. We formulate 3D localization into a large-scale sparse 3D inverse problem in the discretized form. A recently developed penalty called continuous exact l0 (CEL0) is applied in this problem for the Gaussian noise model. Numerical experiments and comparisons illustrate the efficiency of the algorithm.
Submitted 2 June, 2020;
originally announced June 2020.
-
IROS 2019 Lifelong Robotic Vision Challenge -- Lifelong Object Recognition Report
Authors:
Qi She,
Fan Feng,
Qi Liu,
Rosa H. M. Chan,
Xinyue Hao,
Chuanlin Lan,
Qihan Yang,
Vincenzo Lomonaco,
German I. Parisi,
Heechul Bae,
Eoin Brophy,
Baoquan Chen,
Gabriele Graffieti,
Vidit Goel,
Hyonyoung Han,
Sathursan Kanagarajah,
Somesh Kumar,
Siew-Kei Lam,
Tin Lun Lam,
Liang Ma,
Davide Maltoni,
Lorenzo Pellegrini,
Duvindu Piyasena,
Shiliang Pu,
Debdoot Sheet
, et al. (11 additional authors not shown)
Abstract:
This report summarizes IROS 2019-Lifelong Robotic Vision Competition (Lifelong Object Recognition Challenge) with methods and results from the top 8 finalists (out of over 150 teams). The competition dataset (L)ifel(O)ng (R)obotic V(IS)ion (OpenLORIS) - Object Recognition (OpenLORIS-object) is designed for driving lifelong/continual learning research and application in robotic vision domain, with everyday objects in home, office, campus, and mall scenarios. The dataset explicitly quantifies the variants of illumination, object occlusion, object size, camera-object distance/angles, and clutter information. Rules are designed to quantify the learning capability of the robotic vision system when faced with the objects appearing in the dynamic environments in the contest. Individual reports, dataset information, rules, and released source code can be found at the project homepage: "https://lifelong-robotic-vision.github.io/competition/".
Submitted 26 April, 2020;
originally announced April 2020.
-
Trans-pars-planar illumination enables 200° ultra-wide field pediatric fundus photography for easy examination of the retina
Authors:
Devrim Toslak,
Felix Chau,
Muhammet Kazim Erol,
Changgeng Liu,
R. V. Paul Chan,
Taeyoon Son,
Xincheng Yao
Abstract:
This study is to test the feasibility of using trans-pars-planar illumination for ultrawide field pediatric fundus photography. Fundus examination of the peripheral retina is essential for clinical management of pediatric eye diseases. However, current pediatric fundus cameras with traditional trans-pupillary illumination provide a limited field of view (FOV), making it difficult to access the peripheral retina adequately for a comprehensive assessment of eye conditions. Here, we report the first demonstration of trans-pars-planar illumination in ultra-wide field pediatric fundus photography. For proof-of-concept validation, all off-the-shelf optical components were selected to construct a lab prototype pediatric camera (PedCam). By freeing the entire pupil for imaging purpose only, the trans-pars-planar illumination enables a 200° FOV in a snapshot fundus image, allowing easy visualization of both the central and peripheral retina up to the ora serrata. A low-cost, easy-to-use ultra-wide field PedCam provides a unique opportunity to foster affordable telemedicine in rural and underserved areas.
Submitted 15 October, 2019;
originally announced October 2019.
-
Transfer Learning for Automated OCTA Detection of Diabetic Retinopathy
Authors:
David Le,
Minhaj Alam,
Cham Yao,
Jennifer I. Lim,
R. V. P. Chan,
Devrim Toslak,
Xincheng Yao
Abstract:
Purpose: To test the feasibility of using deep learning for optical coherence tomography angiography (OCTA) detection of diabetic retinopathy (DR). Methods: A deep learning convolutional neural network (CNN) architecture VGG16 was employed for this study. A transfer learning process was implemented to re-train the CNN for robust OCTA classification. In order to demonstrate the feasibility of using this method for artificial intelligence (AI) screening of DR in clinical environments, the re-trained CNN was incorporated into a custom developed GUI platform which can be readily operated by ophthalmic personnel. Results: With the last nine layers re-trained, the CNN architecture achieved the best performance for automated OCTA classification. The overall accuracy of the re-trained classifier for differentiating healthy, NoDR, and NPDR was 87.27%, with 83.76% sensitivity and 90.82% specificity. The AUC metrics for binary classification of healthy, NoDR and DR were 0.97, 0.98 and 0.97, respectively. The GUI platform enabled easy validation of the method for AI screening of DR in a clinical environment. Conclusion: With a transfer learning process to adopt the early layers for simple feature analysis and to re-train the upper layers for fine feature analysis, the CNN architecture VGG16 can be used for robust OCTA classification of healthy, NoDR, and NPDR eyes. Translational Relevance: OCTA can capture microvascular changes in early DR. A transfer learning process enables robust implementation of convolutional neural network (CNN) for automated OCTA classification of DR.
Submitted 4 October, 2019;
originally announced October 2019.
-
Supervised machine learning based multi-task artificial intelligence classification of retinopathies
Authors:
Minhaj Alam,
David Le,
Jennifer I. Lim,
R. V. P. Chan,
Xincheng Yao
Abstract:
Artificial intelligence (AI) classification holds promise as a novel and affordable screening tool for clinical management of ocular diseases. Rural and underserved areas, which suffer from lack of access to experienced ophthalmologists, may particularly benefit from this technology. Quantitative optical coherence tomography angiography (OCTA) imaging provides excellent capability to identify subtle vascular distortions, which are useful for classifying retinovascular diseases. However, application of AI for differentiation and classification of multiple eye diseases is not yet established. In this study, we demonstrate supervised machine learning based multi-task OCTA classification. We sought 1) to differentiate normal from diseased ocular conditions, 2) to differentiate different ocular disease conditions from each other, and 3) to stage the severity of each ocular condition. Quantitative OCTA features, including blood vessel tortuosity (BVT), blood vascular caliber (BVC), vessel perimeter index (VPI), blood vessel density (BVD), foveal avascular zone (FAZ) area (FAZ-A), and FAZ contour irregularity (FAZ-CI) were fully automatically extracted from the OCTA images. A stepwise backward elimination approach was employed to identify sensitive OCTA features and optimal-feature-combinations for the multi-task classification. For proof-of-concept demonstration, diabetic retinopathy (DR) and sickle cell retinopathy (SCR) were used to validate the supervised machine learning classifier. The presented AI classification methodology is applicable and can be readily extended to other ocular diseases, holding promise to enable a mass-screening platform for clinical deployment and telemedicine.
Submitted 10 May, 2019;
originally announced May 2019.
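Illustrative note on the entry above: the abstract mentions a stepwise backward elimination approach for selecting OCTA features. The following is a minimal sketch of generic greedy backward elimination, not the authors' implementation. The function name `backward_elimination`, the scoring function `toy_score`, and the integer gains in `TOY_GAIN` are all hypothetical and carry no clinical meaning; in practice the score would be something like cross-validated classifier accuracy.

```python
from typing import Callable, List, Sequence, Tuple


def backward_elimination(
    features: Sequence[str],
    score: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Greedy backward elimination.

    Start from the full feature set and repeatedly drop the single feature
    whose removal yields the highest score. Stop when every possible removal
    would lower the score. Ties are resolved in favour of the smaller set.
    """
    current = list(features)
    best_score = score(current)
    while len(current) > 1:
        # Score every subset obtained by removing exactly one feature.
        candidates = [
            (score([f for f in current if f != drop]), drop) for drop in current
        ]
        cand_score, drop = max(candidates, key=lambda c: c[0])
        if cand_score < best_score:
            break  # any further removal hurts
        best_score = cand_score
        current.remove(drop)
    return current, best_score


# Hypothetical, made-up per-feature gains used only to exercise the loop.
TOY_GAIN = {"BVT": 30, "BVC": -2, "VPI": -5, "BVD": 20, "FAZ-A": 10, "FAZ-CI": 0}


def toy_score(subset: List[str]) -> float:
    """Stand-in for a real validation metric (assumption, not from the paper)."""
    return sum(TOY_GAIN[f] for f in subset)


FEATURES = ["BVT", "BVC", "VPI", "BVD", "FAZ-A", "FAZ-CI"]
selected, best = backward_elimination(FEATURES, toy_score)
print(selected, best)
```

With a real classifier the score is noisy, so a tolerance or a statistical test is commonly used in place of the strict comparison shown here.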
-
Total Variation and Tight Frame Image Segmentation with Intensity Inhomogeneity
Authors:
Raymond Chan,
Hongfei Yang,
Tieyong Zeng
Abstract:
Image segmentation is an important task in the domain of computer vision and medical imaging. In natural and medical images, intensity inhomogeneity, i.e. the varying image intensity, occurs often and it poses considerable challenges for image segmentation. In this paper, we propose an efficient variational method for segmenting images with intensity inhomogeneity. The method is inspired by previous works on two-stage segmentation and variational Retinex. Our method consists of two stages. In the first stage, we decouple the image into reflection and illumination parts by solving a convex energy minimization model with either total variation or tight-frame regularisation. In the second stage, we segment the original image by thresholding on the reflection part, and the inhomogeneous intensity is estimated by the smoothly varying illumination part. We adopt a primal dual algorithm to solve the convex model in the first stage, and the convergence is guaranteed. Numerical experiments clearly show that our method is robust and efficient to segment both natural and medical images.
Submitted 3 April, 2019;
originally announced April 2019.
-
A Convex Model for Edge-Histogram Specification with Applications to Edge-preserving Smoothing
Authors:
Kelvin C. K. Chan,
Raymond H. Chan,
Mila Nikolova
Abstract:
The goal of edge-histogram specification is to find an image whose edge image has a histogram that matches a given edge-histogram as much as possible. Mignotte has proposed a non-convex model for the problem [M. Mignotte. An energy-based model for the image edge-histogram specification problem. IEEE Transactions on Image Processing, 21(1):379–386, 2012]. In his work, edge magnitudes of an input image are first modified by histogram specification to match the given edge-histogram. Then, a non-convex model is minimized to find an output image whose edge-histogram matches the modified edge-histogram. The non-convexity of the model hinders the computations and the inclusion of useful constraints such as the dynamic range constraint. In this paper, instead of considering edge magnitudes, we directly consider the image gradients and propose a convex model based on them. Furthermore, we include additional constraints in our model based on different applications. The convexity of our model allows us to compute the output image efficiently using either Alternating Direction Method of Multipliers or Fast Iterative Shrinkage-Thresholding Algorithm. We consider several applications in edge-preserving smoothing including image abstraction, edge extraction, details exaggeration, and documents scan-through removal. Numerical results are given to illustrate that our method successfully produces decent results efficiently.
Submitted 21 June, 2018;
originally announced June 2018.
-
A two-stage method for spectral-spatial classification of hyperspectral images
Authors:
Raymond H. Chan,
Kelvin K. Kan,
Mila Nikolova,
Robert J. Plemmons
Abstract:
This paper proposes a novel two-stage method for the classification of hyperspectral images. Pixel-wise classifiers, such as the classical support vector machine (SVM), consider spectral information only; therefore they would generate noisy classification results as spatial information is not utilized. Many existing methods, such as morphological profiles, superpixel segmentation, and composite kernels, exploit the spatial information too. In this paper, we propose a two-stage approach to incorporate the spatial information. In the first stage, an SVM is used to estimate the class probability for each pixel. The resulting probability map for each class will be noisy. In the second stage, a variational denoising method is used to restore these noisy probability maps to get a good classification map. Our proposed method effectively utilizes both spectral and spatial information of the hyperspectral data sets. Experimental results on three widely used real hyperspectral data sets indicate that our method is very competitive when compared with current state-of-the-art methods, especially when the inter-class spectra are similar or the percentage of the training pixels is high.
Submitted 3 June, 2018;
originally announced June 2018.
-
Non-convex optimization for 3D point source localization using a rotating point spread function
Authors:
Chao Wang,
Raymond Chan,
Mila Nikolova,
Robert Plemmons,
Sudhakar Prasad
Abstract:
We consider the high-resolution imaging problem of 3D point source image recovery from 2D data using a method based on point spread function (PSF) engineering. The method involves a new technique, recently proposed by S. Prasad, based on the use of a rotating PSF with a single lobe to obtain depth from defocus. The amount of rotation of the PSF encodes the depth position of the point source. Applications include high-resolution single molecule localization microscopy as well as the problem addressed in this paper on localization of space debris using a space-based telescope. The localization problem is discretized on a cubical lattice where the coordinates of nonzero entries represent the 3D locations and the values of these entries the fluxes of the point sources. Finding the locations and fluxes of the point sources is a large-scale sparse 3D inverse problem. A new nonconvex regularization method with a data-fitting term based on Kullback-Leibler (KL) divergence is proposed for 3D localization for the Poisson noise model. In addition, we propose a new scheme of estimation of the source fluxes from the KL data-fitting term. Numerical experiments illustrate the efficiency and stability of the algorithms that are trained on a random subset of image data before being applied to other images. Our 3D localization algorithms can be readily applied to other kinds of depth-encoding PSFs as well.
Submitted 27 September, 2018; v1 submitted 10 April, 2018;
originally announced April 2018.
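Illustrative note on the entry above: the abstract refers to a KL-divergence data-fitting term for the Poisson noise model. The sketch below shows the standard generalized KL divergence between observed counts and a predicted mean image. As a function of the prediction it differs from the Poisson negative log-likelihood only by a constant. The function name `kl_data_fit` is hypothetical, and the paper's exact term (for example, any background offset or forward operator) is not reproduced here.

```python
import numpy as np


def kl_data_fit(observed, predicted):
    """Generalized KL divergence  sum_i [ b_i log(b_i / m_i) - b_i + m_i ].

    `observed` (b) holds non-negative counts and `predicted` (m) holds strictly
    positive model means. The convention 0 * log(0) = 0 is applied.
    """
    b = np.asarray(observed, dtype=float)
    m = np.asarray(predicted, dtype=float)
    if np.any(b < 0) or np.any(m <= 0):
        raise ValueError("observed must be >= 0 and predicted must be > 0")
    # Replace zeros inside the log by 1 so the masked branch stays finite.
    safe_b = np.where(b > 0, b, 1.0)
    log_term = np.where(b > 0, b * np.log(safe_b / m), 0.0)
    return float(np.sum(log_term - b + m))


# The term vanishes when the prediction matches the data and is positive otherwise.
print(kl_data_fit([1, 2, 3], [1, 2, 3]))
print(kl_data_fit([0, 2], [1, 2]))
print(kl_data_fit([4, 0, 1], [2, 1, 1]))
```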
-
An Efficient and Flexible Spike Train Model via Empirical Bayes
Authors:
Qi She,
Xiaoli Wu,
Beth Jelfs,
Adam S. Charles,
Rosa H. M. Chan
Abstract:
Accurate statistical models of neural spike responses can characterize the information carried by neural populations. But the limited samples of spike counts during recording usually result in model overfitting. Besides, current models assume spike counts to be Poisson-distributed, which ignores the fact that many neurons demonstrate over-dispersed spiking behaviour. Although the Negative Binomial Generalized Linear Model (NB-GLM) provides a powerful tool for modeling over-dispersed spike counts, the maximum likelihood-based standard NB-GLM leads to highly variable and inaccurate parameter estimates. Thus, we propose a hierarchical parametric empirical Bayes method to estimate the neural spike responses among neuronal populations. Our method integrates both Generalized Linear Models (GLMs) and empirical Bayes theory, which aims to (1) improve the accuracy and reliability of parameter estimation, compared to the maximum likelihood-based method for NB-GLM and Poisson-GLM; (2) effectively capture the over-dispersion nature of spike counts from both simulated data and experimental data; and (3) provide insight into both neural interactions and spiking behaviours of the neuronal populations. We apply our approach to study both simulated data and experimental neural data. The estimation of simulation data indicates that the new framework can accurately predict mean spike counts simulated from different models and recover the connectivity weights among neural populations. The estimation based on retinal neurons demonstrates that the proposed method outperforms both NB-GLM and Poisson-GLM in terms of the predictive log-likelihood of held-out data. Code is available at https://doi.org/10.5281/zenodo.4704423
Submitted 27 April, 2021; v1 submitted 10 May, 2016;
originally announced May 2016.
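Illustrative note on the entry above: the abstract contrasts Poisson-distributed spike counts with over-dispersed counts modelled by a negative binomial. The sketch below only illustrates that distinction and is not the authors' empirical Bayes estimator. A Poisson count has variance equal to its mean (Fano factor 1), whereas a negative binomial with mean mu and shape r has variance mu + mu^2 / r. The parameter values and helper names are arbitrary choices made for the demonstration.

```python
import numpy as np


def nb_variance(mean, r):
    """Variance of a negative binomial with the given mean and shape r."""
    return mean + mean**2 / r


def fano_factor(counts):
    """Sample variance divided by sample mean (equals 1 for Poisson data)."""
    counts = np.asarray(counts, dtype=float)
    return counts.var() / counts.mean()


rng = np.random.default_rng(0)
mean, r = 5.0, 2.0  # arbitrary illustration values

poisson_counts = rng.poisson(mean, size=200_000)
# NumPy parameterises the negative binomial by (n, p); p = r / (r + mean)
# gives the requested mean.
nb_counts = rng.negative_binomial(r, r / (r + mean), size=200_000)

print("Poisson Fano factor:", fano_factor(poisson_counts))
print("NB Fano factor:", fano_factor(nb_counts))
print("NB theoretical Fano factor:", nb_variance(mean, r) / mean)
```

A sample Fano factor well above 1 is the usual sign that a Poisson model understates the variability of the recorded counts.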
-
Characterization and Control of Diffusion Processes in Multi-Agent Networks
Authors:
Wai Hong Ronald Chan,
Matthias Wildemeersch,
Tony Q. S. Quek
Abstract:
Diffusion processes are instrumental to describe the movement of a continuous quantity in a generic network of interacting agents. Here, we present a probabilistic framework for diffusion in networks and propose to classify agent interactions according to two protocols where the total network quantity is conserved or variable. For both protocols, our focus is on asymmetric interactions between agents involving directed graphs. Specifically, we define how the dynamics of conservative and non-conservative networks relate to the weighted in-degree Laplacian and the weighted out-degree Laplacian. Our framework allows the addition and subtraction of the considered quantity to and from a set of nodes. This enables the modeling of stubborn agents with time-invariant quantities, and the process of dynamic learning. We highlight several stability and convergence characteristics of our framework, and define the conditions under which asymptotic convergence is guaranteed when the network topology is variable. In addition, we indicate how our framework accommodates external network control and targeted network design. We show how network diffusion can be externally manipulated by applying time-varying input functions at individual nodes. Desirable network structures can also be constructed by adjusting the dominant diffusion modes. To this purpose, we propose a Markov decision process that learns these network adjustments through a reinforcement learning algorithm, suitable for large networks. The presented network control and design schemes enable flow modifications that allow the alteration of the dynamic and stationary behavior of the network in conservative and non-conservative networks.
Submitted 27 August, 2015;
originally announced August 2015.
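Illustrative note on the entry above: the abstract relates conservative and non-conservative diffusion on directed graphs to the weighted in-degree and out-degree Laplacians. The sketch below is a small numerical illustration under an assumed convention, not the paper's framework: `A[i, j]` is the weight of the edge from node j to node i, the dynamics are x' = -L x integrated with forward Euler, and the 3-node graph is made up. Under this convention the Laplacian built from column sums conserves the total quantity, while the one built from row sums drives all nodes to a common value. Which of the two is called "in-degree" or "out-degree" depends on the edge-direction convention.

```python
import numpy as np

# Hypothetical weighted digraph: A[i, j] = weight of edge j -> i.
A = np.array(
    [
        [0.0, 2.0, 0.0],
        [1.0, 0.0, 3.0],
        [0.0, 1.0, 0.0],
    ]
)

# Row sums on the diagonal: every row of L_in sums to zero (L_in @ 1 = 0).
L_in = np.diag(A.sum(axis=1)) - A
# Column sums on the diagonal: every column of L_out sums to zero (1^T L_out = 0).
L_out = np.diag(A.sum(axis=0)) - A


def simulate(L, x0, dt=0.01, steps=2000):
    """Forward-Euler integration of x' = -L x."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        x = x - dt * (L @ x)
    return x


x0 = np.array([6.0, 0.0, 0.0])
x_conservative = simulate(L_out, x0)  # total quantity is preserved
x_consensus = simulate(L_in, x0)      # nodes agree, total generally changes

print("conservative:", x_conservative, "sum =", x_conservative.sum())
print("consensus:   ", x_consensus, "sum =", x_consensus.sum())
```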