Search | arXiv e-print repository

arXiv:2406.14988 [pdf]

Introducing the Biomechanics-Function Relationship in Glaucoma: Improved Visual Field Loss Predictions from intraocular pressure-induced Neural Tissue Strains

Authors: Thanadet Chuangsuwanich, Monisha E. Nongpiur, Fabian A. Braeu, Tin A. Tun, Alexandre Thiery, Shamira Perera, Ching Lin Ho, Martin Buist, George Barbastathis, Tin Aung, Michaël J. A. Girard

Abstract: Objective. (1) To assess whether neural tissue structure and biomechanics could predict functional loss in glaucoma; (2) To evaluate the importance of biomechanics in making such predictions. Design, Setting and Participants. We recruited 238 glaucoma subjects. For one eye of each subject, we imaged the optic nerve head (ONH) using spectral-domain OCT under the following conditions: (1) primary ga… ▽ More Objective. (1) To assess whether neural tissue structure and biomechanics could predict functional loss in glaucoma; (2) To evaluate the importance of biomechanics in making such predictions. Design, Setting and Participants. We recruited 238 glaucoma subjects. For one eye of each subject, we imaged the optic nerve head (ONH) using spectral-domain OCT under the following conditions: (1) primary gaze and (2) primary gaze with acute IOP elevation. Main Outcomes: We utilized automatic segmentation of optic nerve head (ONH) tissues and digital volume correlation (DVC) analysis to compute intraocular pressure (IOP)-induced neural tissue strains. A robust geometric deep learning approach, known as Point-Net, was employed to predict the full Humphrey 24-2 pattern standard deviation (PSD) maps from ONH structural and biomechanical information. For each point in each PSD map, we predicted whether it exhibited no defect or a PSD value of less than 5%. Predictive performance was evaluated using 5-fold cross-validation and the F1-score. We compared the model's performance with and without the inclusion of IOP-induced strains to assess the impact of biomechanics on prediction accuracy. Results: Integrating biomechanical (IOP-induced neural tissue strains) and structural (tissue morphology and neural tissues thickness) information yielded a significantly better predictive model (F1-score: 0.76+-0.02) across validation subjects, as opposed to relying only on structural information, which resulted in a significantly lower F1-score of 0.71+-0.02 (p < 0.05). Conclusion: Our study has shown that the integration of biomechanical data can significantly improve the accuracy of visual field loss predictions. This highlights the importance of the biomechanics-function relationship in glaucoma, and suggests that biomechanics may serve as a crucial indicator for the development and progression of glaucoma. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: 19 pages, 2 figures

arXiv:2405.11881 [pdf, other]

Out-of-Distribution Detection with a Single Unconditional Diffusion Model

Authors: Alvin Heng, Alexandre H. Thiery, Harold Soh

Abstract: Out-of-distribution (OOD) detection is a critical task in machine learning that seeks to identify abnormal samples. Traditionally, unsupervised methods utilize a deep generative model for OOD detection. However, such approaches necessitate a different model when evaluating abnormality against a new distribution. With the emergence of foundational generative models, this paper explores whether a si… ▽ More Out-of-distribution (OOD) detection is a critical task in machine learning that seeks to identify abnormal samples. Traditionally, unsupervised methods utilize a deep generative model for OOD detection. However, such approaches necessitate a different model when evaluating abnormality against a new distribution. With the emergence of foundational generative models, this paper explores whether a single generalist model can also perform OOD detection across diverse tasks. To that end, we introduce our method, Diffusion Paths, (DiffPath) in this work. DiffPath proposes to utilize a single diffusion model originally trained to perform unconditional generation for OOD detection. Specifically, we introduce a novel technique of measuring the rate-of-change and curvature of the diffusion paths connecting samples to the standard normal. Extensive experiments show that with a single model, DiffPath outperforms prior work on a variety of OOD tasks involving different distributions. Our code is publicly available at https://github.com/clear-nus/diffpath. △ Less

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2404.18556 [pdf, other]

Doubly Adaptive Importance Sampling

Authors: Willem van den Boom, Andrea Cremaschi, Alexandre H. Thiery

Abstract: We propose an adaptive importance sampling scheme for Gaussian approximations of intractable posteriors. Optimization-based approximations like variational inference can be too inaccurate while existing Monte Carlo methods can be too slow. Therefore, we propose a hybrid where, at each iteration, the Monte Carlo effective sample size can be guaranteed at a fixed computational cost by interpolating… ▽ More We propose an adaptive importance sampling scheme for Gaussian approximations of intractable posteriors. Optimization-based approximations like variational inference can be too inaccurate while existing Monte Carlo methods can be too slow. Therefore, we propose a hybrid where, at each iteration, the Monte Carlo effective sample size can be guaranteed at a fixed computational cost by interpolating between natural-gradient variational inference and importance sampling. The amount of dam** in the updates adapts to the posterior and guarantees the effective sample size. Gaussianity enables the use of Stein's lemma to obtain gradient-based optimization in the highly damped variational inference regime and a reduction of Monte Carlo error for undamped adaptive importance sampling. The result is a generic, embarrassingly parallel and adaptive posterior approximation method. Numerical studies on simulated and real data show its competitiveness with other, less general methods. △ Less

Submitted 29 April, 2024; originally announced April 2024.

Comments: 30 pages, 8 figures

arXiv:2312.05910 [pdf, other]

Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference

Authors: Zhidi Lin, Yiyong Sun, Feng Yin, Alexandre Hoang Thiéry

Abstract: The Gaussian process state-space models (GPSSMs) represent a versatile class of data-driven nonlinear dynamical system models. However, the presence of numerous latent variables in GPSSM incurs unresolved issues for existing variational inference approaches, particularly under the more realistic non-mean-field (NMF) assumption, including extensive training effort, compromised inference accuracy, a… ▽ More The Gaussian process state-space models (GPSSMs) represent a versatile class of data-driven nonlinear dynamical system models. However, the presence of numerous latent variables in GPSSM incurs unresolved issues for existing variational inference approaches, particularly under the more realistic non-mean-field (NMF) assumption, including extensive training effort, compromised inference accuracy, and infeasibility for online applications, among others. In this paper, we tackle these challenges by incorporating the ensemble Kalman filter (EnKF), a well-established model-based filtering technique, into the NMF variational inference framework to approximate the posterior distribution of the latent states. This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO). Moreover, owing to the streamlined parameterization via the EnKF, the new GPSSM model can be easily accommodated in online learning applications. We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting. We also provide detailed analysis and fresh insights for the proposed algorithms. Comprehensive evaluation across diverse real and synthetic datasets corroborates the superior learning and inference performance of our EnKF-aided variational inference algorithms compared to existing methods. △ Less

Submitted 14 January, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

Comments: Gaussian process, state-space model, ensemble Kalman filter, online learning, variational inference. (19 pages, 10 figures)

arXiv:2312.02450 [pdf, other]

GIT-Net: Generalized Integral Transform for Operator Learning

Authors: Chao Wang, Alexandre Hoang Thiery

Abstract: This article introduces GIT-Net, a deep neural network architecture for approximating Partial Differential Equation (PDE) operators, inspired by integral transform operators. GIT-NET harnesses the fact that differential operators commonly used for defining PDEs can often be represented parsimoniously when expressed in specialized functional bases (e.g., Fourier basis). Unlike rigid integral transf… ▽ More This article introduces GIT-Net, a deep neural network architecture for approximating Partial Differential Equation (PDE) operators, inspired by integral transform operators. GIT-NET harnesses the fact that differential operators commonly used for defining PDEs can often be represented parsimoniously when expressed in specialized functional bases (e.g., Fourier basis). Unlike rigid integral transforms, GIT-Net parametrizes adaptive generalized integral transforms with deep neural networks. When compared to several recently proposed alternatives, GIT-Net's computational and memory requirements scale gracefully with mesh discretizations, facilitating its application to PDE problems on complex geometries. Numerical experiments demonstrate that GIT-Net is a competitive neural network operator, exhibiting small test errors and low evaluations across a range of PDE problems. This stands in contrast to existing neural network operators, which typically excel in just one of these areas. △ Less

Submitted 4 December, 2023; originally announced December 2023.

Journal ref: Transactions on Machine Learning Research (05 Dec 2023)

arXiv:2301.02837 [pdf]

The 3D Structural Phenotype of the Glaucomatous Optic Nerve Head and its Relationship with The Severity of Visual Field Damage

Authors: Fabian A. Braeu, Thanadet Chuangsuwanich, Tin A. Tun, Shamira A. Perera, Rahat Husain, Aiste Kadziauskiene, Leopold Schmetterer, Alexandre H. Thiéry, George Barbastathis, Tin Aung, Michaël J. A. Girard

Abstract: $\bf{Purpose}$: To describe the 3D structural changes in both connective and neural tissues of the optic nerve head (ONH) that occur concurrently at different stages of glaucoma using traditional and AI-driven approaches. $\bf{Methods}$: We included 213 normal, 204 mild glaucoma (mean deviation [MD] $\ge… ▽ More $\bf{Purpose}$: To describe the 3D structural changes in both connective and neural tissues of the optic nerve head (ONH) that occur concurrently at different stages of glaucoma using traditional and AI-driven approaches. $\bf{Methods}$: We included 213 normal, 204 mild glaucoma (mean deviation [MD] $\ge$ -6.00 dB), 118 moderate glaucoma (MD of -6.01 to -12.00 dB), and 118 advanced glaucoma patients (MD < -12.00 dB). All subjects had their ONHs imaged in 3D with Spectralis optical coherence tomography. To describe the 3D structural phenotype of glaucoma as a function of severity, we used two different approaches: (1) We extracted human-defined 3D structural parameters of the ONH including retinal nerve fiber layer (RNFL) thickness, lamina cribrosa (LC) shape and depth at different stages of glaucoma; (2) we also employed a geometric deep learning method (i.e. PointNet) to identify the most important 3D structural features that differentiate ONHs from different glaucoma severity groups without any human input. $\bf{Results}$: We observed that the majority of ONH structural changes occurred in the early glaucoma stage, followed by a plateau effect in the later stages. Using PointNet, we also found that 3D ONH structural changes were present in both neural and connective tissues. In both approaches, we observed that structural changes were more prominent in the superior and inferior quadrant of the ONH, particularly in the RNFL, the prelamina, and the LC. As the severity of glaucoma increased, these changes became more diffuse (i.e. widespread), particularly in the LC. $\bf{Conclusions}$: In this study, we were able to uncover complex 3D structural changes of the ONH in both neural and connective tissues as a function of glaucoma severity. We hope to provide new insights into the complex pathophysiology of glaucoma that might help clinicians in their daily clinical care. △ Less

Submitted 7 January, 2023; originally announced January 2023.

arXiv:2210.06664 [pdf]

Are Macula or Optic Nerve Head Structures better at Diagnosing Glaucoma? An Answer using AI and Wide-Field Optical Coherence Tomography

Authors: Charis Y. N. Chiang, Fabian Braeu, Thanadet Chuangsuwanich, Royston K. Y. Tan, Jacqueline Chua, Leopold Schmetterer, Alexandre Thiery, Martin Buist, Michaël J. A. Girard

Abstract: Purpose: (1) To develop a deep learning algorithm to automatically segment structures of the optic nerve head (ONH) and macula in 3D wide-field optical coherence tomography (OCT) scans; (2) To assess whether 3D macula or ONH structures (or the combination of both) provide the best diagnostic power for glaucoma. Methods: A cross-sectional comparative study was performed which included wide-field sw… ▽ More Purpose: (1) To develop a deep learning algorithm to automatically segment structures of the optic nerve head (ONH) and macula in 3D wide-field optical coherence tomography (OCT) scans; (2) To assess whether 3D macula or ONH structures (or the combination of both) provide the best diagnostic power for glaucoma. Methods: A cross-sectional comparative study was performed which included wide-field swept-source OCT scans from 319 glaucoma subjects and 298 non-glaucoma subjects. All scans were compensated to improve deep-tissue visibility. We developed a deep learning algorithm to automatically label all major ONH tissue structures by using 270 manually annotated B-scans for training. The performance of our algorithm was assessed using the Dice coefficient (DC). A glaucoma classification algorithm (3D CNN) was then designed using a combination of 500 OCT volumes and their corresponding automatically segmented masks. This algorithm was trained and tested on 3 datasets: OCT scans cropped to contain the macular tissues only, those to contain the ONH tissues only, and the full wide-field OCT scans. The classification performance for each dataset was reported using the AUC. Results: Our segmentation algorithm was able to segment ONH and macular tissues with a DC of 0.94 $\pm$ 0.003. The classification algorithm was best able to diagnose glaucoma using wide-field 3D-OCT volumes with an AUC of 0.99 $\pm$ 0.01, followed by ONH volumes with an AUC of 0.93 $\pm$ 0.06, and finally macular volumes with an AUC of 0.91 $\pm$ 0.11. Conclusions: this study showed that using wide-field OCT as compared to the typical OCT images containing just the ONH or macular may allow for a significantly improved glaucoma diagnosis. This may encourage the mainstream adoption of 3D wide-field OCT scans. For clinical AI studies that use traditional machines, we would recommend the use of ONH scans as opposed to macula scans. △ Less

Submitted 12 October, 2022; originally announced October 2022.

Comments: 23 pages, 5 figures

arXiv:2207.10137 [pdf, other]

A Generalized & Robust Framework For Timestamp Supervision in Temporal Action Segmentation

Authors: Rahul Rahaman, Dipika Singhania, Alexandre Thiery, Angela Yao

Abstract: In temporal action segmentation, Timestamp supervision requires only a handful of labelled frames per video sequence. For unlabelled frames, previous works rely on assigning hard labels, and performance rapidly collapses under subtle violations of the annotation assumptions. We propose a novel Expectation-Maximization (EM) based approach that leverages the label uncertainty of unlabelled frames an… ▽ More In temporal action segmentation, Timestamp supervision requires only a handful of labelled frames per video sequence. For unlabelled frames, previous works rely on assigning hard labels, and performance rapidly collapses under subtle violations of the annotation assumptions. We propose a novel Expectation-Maximization (EM) based approach that leverages the label uncertainty of unlabelled frames and is robust enough to accommodate possible annotation errors. With accurate timestamp annotations, our proposed method produces SOTA results and even exceeds the fully-supervised setup in several metrics and datasets. When applied to timestamp annotations with missing action segments, our method presents stable performance. To further test our formulation's robustness, we introduce the new challenging annotation setup of Skip-tag supervision. This setup relaxes constraints and requires annotations of any fixed number of random frames in a video, making it more flexible than Timestamp supervision while remaining competitive. △ Less

Submitted 20 July, 2022; originally announced July 2022.

arXiv:2206.04689 [pdf]

AI-based Clinical Assessment of Optic Nerve Head Robustness Superseding Biomechanical Testing

Authors: Fabian A. Braeu, Thanadet Chuangsuwanich, Tin A. Tun, Alexandre H. Thiery, Tin Aung, George Barbastathis, Michaël J. A. Girard

Abstract: $\mathbf{Purpose}$: To use artificial intelligence (AI) to: (1) exploit biomechanical knowledge of the optic nerve head (ONH) from a relatively large population; (2) assess ONH robustness from a single optical coherence tomography (OCT) scan of the ONH; (3) identify what critical three-dimensional (3D) structural features make a given ONH robust. $\mathbf{Design}… ▽ More $\mathbf{Purpose}$: To use artificial intelligence (AI) to: (1) exploit biomechanical knowledge of the optic nerve head (ONH) from a relatively large population; (2) assess ONH robustness from a single optical coherence tomography (OCT) scan of the ONH; (3) identify what critical three-dimensional (3D) structural features make a given ONH robust. $\mathbf{Design}$: Retrospective cross-sectional study. $\mathbf{Methods}$: 316 subjects had their ONHs imaged with OCT before and after acute intraocular pressure (IOP) elevation through ophthalmo-dynamometry. IOP-induced lamina-cribrosa deformations were then mapped in 3D and used to classify ONHs. Those with LC deformations superior to 4% were considered fragile, while those with deformations inferior to 4% robust. Learning from these data, we compared three AI algorithms to predict ONH robustness strictly from a baseline (undeformed) OCT volume: (1) a random forest classifier; (2) an autoencoder; and (3) a dynamic graph CNN (DGCNN). The latter algorithm also allowed us to identify what critical 3D structural features make a given ONH robust. $\mathbf{Results}$: All 3 methods were able to predict ONH robustness from 3D structural information alone and without the need to perform biomechanical testing. The DGCNN (area under the receiver operating curve [AUC]: 0.76 $\pm$ 0.08) outperformed the autoencoder (AUC: 0.70 $\pm$ 0.07) and the random forest classifier (AUC: 0.69 $\pm$ 0.05). Interestingly, to assess ONH robustness, the DGCNN mainly used information from the scleral canal and the LC insertion sites. $\mathbf{Conclusions}$: We propose an AI-driven approach that can assess the robustness of a given ONH solely from a single OCT scan of the ONH, and without the need to perform biomechanical testing. Longitudinal studies should establish whether ONH robustness could help us identify fast visual field loss progressors. △ Less

Submitted 9 June, 2022; originally announced June 2022.

arXiv:2206.03369 [pdf, other]

Computational Doob's h-transforms for Online Filtering of Discretely Observed Diffusions

Authors: Nicolas Chopin, Andras Fulop, Jeremy Heng, Alexandre H. Thiery

Abstract: This paper is concerned with online filtering of discretely observed nonlinear diffusion processes. Our approach is based on the fully adapted auxiliary particle filter, which involves Doob's $h$-transforms that are typically intractable. We propose a computational framework to approximate these $h$-transforms by solving the underlying backward Kolmogorov equations using nonlinear Feynman-Kac form… ▽ More This paper is concerned with online filtering of discretely observed nonlinear diffusion processes. Our approach is based on the fully adapted auxiliary particle filter, which involves Doob's $h$-transforms that are typically intractable. We propose a computational framework to approximate these $h$-transforms by solving the underlying backward Kolmogorov equations using nonlinear Feynman-Kac formulas and neural networks. The methodology allows one to train a locally optimal particle filter prior to the data-assimilation procedure. Numerical experiments illustrate that the proposed approach can be orders of magnitude more efficient than state-of-the-art particle filters in the regime of highly informative observations, when the observations are extreme under the model, or if the state dimension is large. △ Less

Submitted 30 May, 2023; v1 submitted 7 June, 2022; originally announced June 2022.

Comments: 20 pages

Journal ref: ICML 2023

arXiv:2204.07004 [pdf, other]

Medical Application of Geometric Deep Learning for the Diagnosis of Glaucoma

Authors: Alexandre H. Thiery, Fabian Braeu, Tin A. Tun, Tin Aung, Michael J. A. Girard

Abstract: Purpose: (1) To assess the performance of geometric deep learning (PointNet) in diagnosing glaucoma from a single optical coherence tomography (OCT) 3D scan of the optic nerve head (ONH); (2) To compare its performance to that obtained with a standard 3D convolutional neural network (CNN), and with a gold-standard glaucoma parameter, i.e. retinal nerve fiber layer (RNFL) thickness. Methods: 3D r… ▽ More Purpose: (1) To assess the performance of geometric deep learning (PointNet) in diagnosing glaucoma from a single optical coherence tomography (OCT) 3D scan of the optic nerve head (ONH); (2) To compare its performance to that obtained with a standard 3D convolutional neural network (CNN), and with a gold-standard glaucoma parameter, i.e. retinal nerve fiber layer (RNFL) thickness. Methods: 3D raster scans of the ONH were acquired with Spectralis OCT for 477 glaucoma and 2,296 non-glaucoma subjects at the Singapore National Eye Centre. All volumes were automatically segmented using deep learning to identify 7 major neural and connective tissues including the RNFL, the prelamina, and the lamina cribrosa (LC). Each ONH was then represented as a 3D point cloud with 1,000 points chosen randomly from all tissue boundaries. To simplify the problem, all ONH point clouds were aligned with respect to the plane and center of Bruch's membrane opening. Geometric deep learning (PointNet) was then used to provide a glaucoma diagnosis from a single OCT point cloud. The performance of our approach was compared to that obtained with a 3D CNN, and with RNFL thickness. Results: PointNet was able to provide a robust glaucoma diagnosis solely from the ONH represented as a 3D point cloud (AUC=95%). The performance of PointNet was superior to that obtained with a standard 3D CNN (AUC=87%) and with that obtained from RNFL thickness alone (AUC=80%). Discussion: We provide a proof-of-principle for the application of geometric deep learning in the field of glaucoma. Our technique requires significantly less information as input to perform better than a 3D CNN, and with an AUC superior to that obtained from RNFL thickness alone. Geometric deep learning may have wide applicability in the field of Ophthalmology. △ Less

Submitted 14 April, 2022; originally announced April 2022.

arXiv:2204.06931 [pdf]

Geometric Deep Learning to Identify the Critical 3D Structural Features of the Optic Nerve Head for Glaucoma Diagnosis

Authors: Fabian A. Braeu, Alexandre H. Thiéry, Tin A. Tun, Aiste Kadziauskiene, George Barbastathis, Tin Aung, Michaël J. A. Girard

Abstract: Purpose: The optic nerve head (ONH) undergoes complex and deep 3D morphological changes during the development and progression of glaucoma. Optical coherence tomography (OCT) is the current gold standard to visualize and quantify these changes, however the resulting 3D deep-tissue information has not yet been fully exploited for the diagnosis and prognosis of glaucoma. To this end, we aimed: (1) T… ▽ More Purpose: The optic nerve head (ONH) undergoes complex and deep 3D morphological changes during the development and progression of glaucoma. Optical coherence tomography (OCT) is the current gold standard to visualize and quantify these changes, however the resulting 3D deep-tissue information has not yet been fully exploited for the diagnosis and prognosis of glaucoma. To this end, we aimed: (1) To compare the performance of two relatively recent geometric deep learning techniques in diagnosing glaucoma from a single OCT scan of the ONH; and (2) To identify the 3D structural features of the ONH that are critical for the diagnosis of glaucoma. Methods: In this study, we included a total of 2,247 non-glaucoma and 2,259 glaucoma scans from 1,725 subjects. All subjects had their ONHs imaged in 3D with Spectralis OCT. All OCT scans were automatically segmented using deep learning to identify major neural and connective tissues. Each ONH was then represented as a 3D point cloud. We used PointNet and dynamic graph convolutional neural network (DGCNN) to diagnose glaucoma from such 3D ONH point clouds and to identify the critical 3D structural features of the ONH for glaucoma diagnosis. Results: Both the DGCNN (AUC: 0.97$\pm$0.01) and PointNet (AUC: 0.95$\pm$0.02) were able to accurately detect glaucoma from 3D ONH point clouds. The critical points formed an hourglass pattern with most of them located in the inferior and superior quadrant of the ONH. Discussion: The diagnostic accuracy of both geometric deep learning approaches was excellent. Moreover, we were able to identify the critical 3D structural features of the ONH for glaucoma diagnosis that tremendously improved the transparency and interpretability of our method. Consequently, our approach may have strong potential to be used in clinical applications for the diagnosis and prognosis of a wide range of ophthalmic disorders. △ Less

Submitted 20 April, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

arXiv:2204.00624 [pdf, ps, other]

Explainable and Interpretable Diabetic Retinopathy Classification Based on Neural-Symbolic Learning

Authors: Se-In Jang, Michael J. A. Girard, Alexandre H. Thiery

Abstract: In this paper, we propose an explainable and interpretable diabetic retinopathy (ExplainDR) classification model based on neural-symbolic learning. To gain explainability, a highlevel symbolic representation should be considered in decision making. Specifically, we introduce a human-readable symbolic representation, which follows a taxonomy style of diabetic retinopathy characteristics related to… ▽ More In this paper, we propose an explainable and interpretable diabetic retinopathy (ExplainDR) classification model based on neural-symbolic learning. To gain explainability, a highlevel symbolic representation should be considered in decision making. Specifically, we introduce a human-readable symbolic representation, which follows a taxonomy style of diabetic retinopathy characteristics related to eye health conditions to achieve explainability. We then include humanreadable features obtained from the symbolic representation in the disease prediction. Experimental results on a diabetic retinopathy classification dataset show that our proposed ExplainDR method exhibits promising performance when compared to that from state-of-the-art methods applied to the IDRiD dataset, while also providing interpretability and explainability. △ Less

Submitted 31 March, 2022; originally announced April 2022.

Comments: Published in AAAI-22 Workshop

arXiv:2112.09970 [pdf]

3D Structural Analysis of the Optic Nerve Head to Robustly Discriminate Between Papilledema and Optic Disc Drusen

Authors: Michaël J. A. Girard, Satish K. Panda, Tin Aung Tun, Elisabeth A. Wibroe, Raymond P. Najjar, Aung Tin, Alexandre H. Thiéry, Steffen Hamann, Clare Fraser, Dan Milea

Abstract: Purpose: (1) To develop a deep learning algorithm to identify major tissue structures of the optic nerve head (ONH) in 3D optical coherence tomography (OCT) scans; (2) to exploit such information to robustly differentiate among healthy, optic disc drusen (ODD), and papilledema ONHs. It was a cross-sectional comparative study with confirmed ODD (105 eyes), papilledema due to high intracranial pre… ▽ More Purpose: (1) To develop a deep learning algorithm to identify major tissue structures of the optic nerve head (ONH) in 3D optical coherence tomography (OCT) scans; (2) to exploit such information to robustly differentiate among healthy, optic disc drusen (ODD), and papilledema ONHs. It was a cross-sectional comparative study with confirmed ODD (105 eyes), papilledema due to high intracranial pressure (51 eyes), and healthy controls (100 eyes). 3D scans of the ONHs were acquired using OCT, then processed to improve deep-tissue visibility. At first, a deep learning algorithm was developed using 984 B-scans (from 130 eyes) in order to identify: major neural/connective tissues, and ODD regions. The performance of our algorithm was assessed using the Dice coefficient (DC). In a 2nd step, a classification algorithm (random forest) was designed using 150 OCT volumes to perform 3-class classifications (1: ODD, 2: papilledema, 3: healthy) strictly from their drusen and prelamina swelling scores (derived from the segmentations). To assess performance, we reported the area under the receiver operating characteristic curves (AUCs) for each class. Our segmentation algorithm was able to isolate neural and connective tissues, and ODD regions whenever present. This was confirmed by an average DC of 0.93$\pm$0.03 on the test set, corresponding to good performance. Classification was achieved with high AUCs, i.e. 0.99$\pm$0.01 for the detection of ODD, 0.99 $\pm$ 0.01 for the detection of papilledema, and 0.98$\pm$0.02 for the detection of healthy ONHs. Our AI approach accurately discriminated ODD from papilledema, using a single OCT scan. Our classification performance was excellent, with the caveat that validation in a much larger population is warranted. Our approach may have the potential to establish OCT as the mainstay of diagnostic imaging in neuro-ophthalmology. △ Less

Submitted 18 December, 2021; originally announced December 2021.

arXiv:2111.03997 [pdf]

The Three-Dimensional Structural Configuration of the Central Retinal Vessel Trunk and Branches as a Glaucoma Biomarker

Authors: Satish K. Panda, Haris Cheong, Tin A. Tun, Thanadet Chuangsuwanich, Aiste Kadziauskiene, Vijayalakshmi Senthil, Ramaswami Krishnadas, Martin L. Buist, Shamira Perera, Ching-Yu Cheng, Tin Aung, Alexandre H. Thiery, Michael J. A. Girard

Abstract: Purpose: To assess whether the three-dimensional (3D) structural configuration of the central retinal vessel trunk and its branches (CRVT&B) could be used as a diagnostic marker for glaucoma. Method: We trained a deep learning network to automatically segment the CRVT&B from the B-scans of the optical coherence tomography (OCT) volume of the optic nerve head (ONH). Subsequently, two different appr… ▽ More Purpose: To assess whether the three-dimensional (3D) structural configuration of the central retinal vessel trunk and its branches (CRVT&B) could be used as a diagnostic marker for glaucoma. Method: We trained a deep learning network to automatically segment the CRVT&B from the B-scans of the optical coherence tomography (OCT) volume of the optic nerve head (ONH). Subsequently, two different approaches were used for glaucoma diagnosis using the structural configuration of the CRVT&B as extracted from the OCT volumes. In the first approach, we aimed to provide a diagnosis using only 3D CNN and the 3D structure of the CRVT&B. For the second approach, we projected the 3D structure of the CRVT&B orthographically onto three planes to obtain 2D images, and then a 2D CNN was used for diagnosis. The segmentation accuracy was evaluated using the Dice coefficient, whereas the diagnostic accuracy was assessed using the area under the receiver operating characteristic curves (AUC). The diagnostic performance of the CRVT&B was also compared with that of retinal nerve fiber layer (RNFL) thickness. Results: Our segmentation network was able to efficiently segment retinal blood vessels from OCT scans. On a test set, we achieved a Dice coefficient of 0.81\pm0.07. The 3D and 2D diagnostic networks were able to differentiate glaucoma from non-glaucoma subjects with accuracies of 82.7% and 83.3%, respectively. The corresponding AUCs for CRVT&B were 0.89 and 0.90, higher than those obtained with RNFL thickness alone. Conclusions: Our work demonstrated that the diagnostic power of the CRVT&B is superior to that of a gold-standard glaucoma parameter, i.e., RNFL thickness. Our work also suggested that the major retinal blood vessels form a skeleton -- the configuration of which may be representative of major ONH structural changes as typically observed with the development and progression of glaucoma. △ Less

Submitted 8 November, 2021; v1 submitted 7 November, 2021; originally announced November 2021.

arXiv:2108.10277 [pdf, other]

Conditional sequential Monte Carlo in high dimensions

Authors: Axel Finke, Alexandre H. Thiery

Abstract: The iterated conditional sequential Monte Carlo (i-CSMC) algorithm from Andrieu, Doucet and Holenstein (2010) is an MCMC approach for efficiently sampling from the joint posterior distribution of the $T$ latent states in challenging time-series models, e.g. in non-linear or non-Gaussian state-space models. It is also the main ingredient in particle Gibbs samplers which infer unknown model paramete… ▽ More The iterated conditional sequential Monte Carlo (i-CSMC) algorithm from Andrieu, Doucet and Holenstein (2010) is an MCMC approach for efficiently sampling from the joint posterior distribution of the $T$ latent states in challenging time-series models, e.g. in non-linear or non-Gaussian state-space models. It is also the main ingredient in particle Gibbs samplers which infer unknown model parameters alongside the latent states. In this work, we first prove that the i-CSMC algorithm suffers from a curse of dimension in the dimension of the states, $D$: it breaks down unless the number of samples ("particles"), $N$, proposed by the algorithm grows exponentially with $D$. Then, we present a novel "local" version of the algorithm which proposes particles using Gaussian random-walk moves that are suitably scaled with $D$. We prove that this iterated random-walk conditional sequential Monte Carlo (i-RW-CSMC) algorithm avoids the curse of dimension: for arbitrary $N$, its acceptance rates and expected squared jum** distance converge to non-trivial limits as $D \to \infty$. If $T = N = 1$, our proposed algorithm reduces to a Metropolis--Hastings or Barker's algorithm with Gaussian random-walk moves and we recover the well known scaling limits for such algorithms. △ Less

Submitted 23 August, 2021; originally announced August 2021.

Comments: 47 pages, 5 figures

arXiv:2104.02925 [pdf, other]

Pretrained equivariant features improve unsupervised landmark discovery

Authors: Rahul Rahaman, Atin Ghosh, Alexandre H. Thiery

Abstract: Locating semantically meaningful landmark points is a crucial component of a large number of computer vision pipelines. Because of the small number of available datasets with ground truth landmark annotations, it is important to design robust unsupervised and semi-supervised methods for landmark detection. Many of the recent unsupervised learning methods rely on the equivariance properties of la… ▽ More Locating semantically meaningful landmark points is a crucial component of a large number of computer vision pipelines. Because of the small number of available datasets with ground truth landmark annotations, it is important to design robust unsupervised and semi-supervised methods for landmark detection. Many of the recent unsupervised learning methods rely on the equivariance properties of landmarks to synthetic image deformations. Our work focuses on such widely used methods and sheds light on its core problem, its inability to produce equivariant intermediate convolutional features. This finding leads us to formulate a two-step unsupervised approach that overcomes this challenge by first learning powerful pixel-based features and then use the pre-trained features to learn a landmark detector by the traditional equivariance method. Our method produces state-of-the-art results in several challenging landmark detection datasets such as the BBC Pose dataset and the Cat-Head dataset. It performs comparably on a range of other benchmarks. △ Less

Submitted 7 April, 2021; originally announced April 2021.

arXiv:2101.06967 [pdf, other]

On Data-Augmentation and Consistency-Based Semi-Supervised Learning

Authors: Atin Ghosh, Alexandre H. Thiery

Abstract: Recently proposed consistency-based Semi-Supervised Learning (SSL) methods such as the $Π$-model, temporal ensembling, the mean teacher, or the virtual adversarial training, have advanced the state of the art in several SSL tasks. These methods can typically reach performances that are comparable to their fully supervised counterparts while using only a fraction of labelled examples. Despite these… ▽ More Recently proposed consistency-based Semi-Supervised Learning (SSL) methods such as the $Π$-model, temporal ensembling, the mean teacher, or the virtual adversarial training, have advanced the state of the art in several SSL tasks. These methods can typically reach performances that are comparable to their fully supervised counterparts while using only a fraction of labelled examples. Despite these methodological advances, the understanding of these methods is still relatively limited. In this text, we analyse (variations of) the $Π$-model in settings where analytically tractable results can be obtained. We establish links with Manifold Tangent Classifiers and demonstrate that the quality of the perturbations is key to obtaining reasonable SSL performances. Importantly, we propose a simple extension of the Hidden Manifold Model that naturally incorporates data-augmentation schemes and offers a framework for understanding and experimenting with SSL methods. △ Less

Submitted 18 January, 2021; originally announced January 2021.

Journal ref: ICLR 2021

arXiv:2012.09755 [pdf, other]

Describing the Structural Phenotype of the Glaucomatous Optic Nerve Head Using Artificial Intelligence

Authors: Satish K. Panda, Haris Cheong, Tin A. Tun, Sripad K. Devella, Ramaswami Krishnadas, Martin L. Buist, Shamira Perera, Ching-Yu Cheng, Tin Aung, Alexandre H. Thiéry, Michaël J. A. Girard

Abstract: The optic nerve head (ONH) typically experiences complex neural- and connective-tissue structural changes with the development and progression of glaucoma, and monitoring these changes could be critical for improved diagnosis and prognosis in the glaucoma clinic. The gold-standard technique to assess structural changes of the ONH clinically is optical coherence tomography (OCT). However, OCT is li… ▽ More The optic nerve head (ONH) typically experiences complex neural- and connective-tissue structural changes with the development and progression of glaucoma, and monitoring these changes could be critical for improved diagnosis and prognosis in the glaucoma clinic. The gold-standard technique to assess structural changes of the ONH clinically is optical coherence tomography (OCT). However, OCT is limited to the measurement of a few hand-engineered parameters, such as the thickness of the retinal nerve fiber layer (RNFL), and has not yet been qualified as a stand-alone device for glaucoma diagnosis and prognosis applications. We argue this is because the vast amount of information available in a 3D OCT scan of the ONH has not been fully exploited. In this study we propose a deep learning approach that can: \textbf{(1)} fully exploit information from an OCT scan of the ONH; \textbf{(2)} describe the structural phenotype of the glaucomatous ONH; and that can \textbf{(3)} be used as a robust glaucoma diagnosis tool. Specifically, the structural features identified by our algorithm were found to be related to clinical observations of glaucoma. The diagnostic accuracy from these structural features was $92.0 \pm 2.3 \%$ with a sensitivity of $90.0 \pm 2.4 \% $ (at $95 \%$ specificity). By changing their magnitudes in steps, we were able to reveal how the morphology of the ONH changes as one transitions from a `non-glaucoma' to a `glaucoma' condition. We believe our work may have strong clinical implication for our understanding of glaucoma pathogenesis, and could be improved in the future to also predict future loss of vision. △ Less

Submitted 17 December, 2020; originally announced December 2020.

arXiv:2010.11698 [pdf, other]

OCT-GAN: Single Step Shadow and Noise Removal from Optical Coherence Tomography Images of the Human Optic Nerve Head

Authors: Haris Cheong, Sripad Krishna Devalla, Thanadet Chuangsuwanich, Tin A. Tun, Xiaofei Wang, Tin Aung, Leopold Schmetterer, Martin L. Buist, Craig Boote, Alexandre H. Thiéry, Michaël J. A. Girard

Abstract: Speckle noise and retinal shadows within OCT B-scans occlude important edges, fine textures and deep tissues, preventing accurate and robust diagnosis by algorithms and clinicians. We developed a single process that successfully removed both noise and retinal shadows from unseen single-frame B-scans within 10.4ms. Mean average gradient magnitude (AGM) for the proposed algorithm was 57.2% higher th… ▽ More Speckle noise and retinal shadows within OCT B-scans occlude important edges, fine textures and deep tissues, preventing accurate and robust diagnosis by algorithms and clinicians. We developed a single process that successfully removed both noise and retinal shadows from unseen single-frame B-scans within 10.4ms. Mean average gradient magnitude (AGM) for the proposed algorithm was 57.2% higher than current state-of-the-art, while mean peak signal to noise ratio (PSNR), contrast to noise ratio (CNR), and structural similarity index metric (SSIM) increased by 11.1%, 154% and 187% respectively compared to single-frame B-scans. Mean intralayer contrast (ILC) improvement for the retinal nerve fiber layer (RNFL), photoreceptor layer (PR) and retinal pigment epithelium (RPE) layers decreased from 0.362 \pm 0.133 to 0.142 \pm 0.102, 0.449 \pm 0.116 to 0.0904 \pm 0.0769, 0.381 \pm 0.100 to 0.0590 \pm 0.0451 respectively. The proposed algorithm reduces the necessity for long image acquisition times, minimizes expensive hardware requirements and reduces motion artifacts in OCT images. △ Less

Submitted 6 October, 2020; originally announced October 2020.

Comments: 20 pages, 7 figures

arXiv:2007.08792 [pdf, other]

Uncertainty Quantification and Deep Ensembles

Authors: Rahul Rahaman, Alexandre H. Thiery

Abstract: Deep Learning methods are known to suffer from calibration issues: they typically produce over-confident estimates. These problems are exacerbated in the low data regime. Although the calibration of probabilistic models is well studied, calibrating extremely over-parametrized models in the low-data regime presents unique challenges. We show that deep-ensembles do not necessarily lead to improved c… ▽ More Deep Learning methods are known to suffer from calibration issues: they typically produce over-confident estimates. These problems are exacerbated in the low data regime. Although the calibration of probabilistic models is well studied, calibrating extremely over-parametrized models in the low-data regime presents unique challenges. We show that deep-ensembles do not necessarily lead to improved calibration properties. In fact, we show that standard ensembling methods, when used in conjunction with modern techniques such as mixup regularization, can lead to less calibrated models. This text examines the interplay between three of the most simple and commonly used approaches to leverage deep learning when data is scarce: data-augmentation, ensembling, and post-processing calibration methods. Although standard ensembling techniques certainly help boost accuracy, we demonstrate that the calibration of deep ensembles relies on subtle trade-offs. We also find that calibration methods such as temperature scaling need to be slightly tweaked when used with deep-ensembles and, crucially, need to be executed after the averaging process. Our simulations indicate that this simple strategy can halve the Expected Calibration Error (ECE) on a range of benchmark classification problems compared to standard deep-ensembles in the low data regime. △ Less

Submitted 2 November, 2021; v1 submitted 17 July, 2020; originally announced July 2020.

Journal ref: Thirty-Fifth Conference on Neural Information Processing Systems (2021)

arXiv:2003.03950 [pdf, other]

Manifold lifting: scaling MCMC to the vanishing noise regime

Authors: Khai Xiang Au, Matthew M. Graham, Alexandre H. Thiery

Abstract: Standard Markov chain Monte Carlo methods struggle to explore distributions that are concentrated in the neighbourhood of low-dimensional structures. These pathologies naturally occur in a number of situations. For example, they are common to Bayesian inverse problem modelling and Bayesian neural networks, when observational data are highly informative, or when a subset of the statistical paramete… ▽ More Standard Markov chain Monte Carlo methods struggle to explore distributions that are concentrated in the neighbourhood of low-dimensional structures. These pathologies naturally occur in a number of situations. For example, they are common to Bayesian inverse problem modelling and Bayesian neural networks, when observational data are highly informative, or when a subset of the statistical parameters of interest are non-identifiable. In this paper, we propose a strategy that transforms the original sampling problem into the task of exploring a distribution supported on a manifold embedded in a higher dimensional space; in contrast to the original posterior this lifted distribution remains diffuse in the vanishing noise limit. We employ a constrained Hamiltonian Monte Carlo method which exploits the manifold geometry of this lifted distribution, to perform efficient approximate inference. We demonstrate in several numerical experiments that, contrarily to competing approaches, the sampling efficiency of our proposed methodology does not degenerate as the target distribution to be explored concentrates near low dimensional structures. △ Less

Submitted 1 December, 2021; v1 submitted 9 March, 2020; originally announced March 2020.

Comments: 56 pages, 25 figures

arXiv:2002.09635 [pdf, other]

Towards Label-Free 3D Segmentation of Optical Coherence Tomography Images of the Optic Nerve Head Using Deep Learning

Authors: Sripad Krishna Devalla, Tan Hung Pham, Satish Kumar Panda, Liang Zhang, Giridhar Subramanian, Anirudh Swaminathan, Chin Zhi Yun, Mohan Rajan, Sujatha Mohan, Ramaswami Krishnadas, Vijayalakshmi Senthil, John Mark S. de Leon, Tin A. Tun, Ching-Yu Cheng, Leopold Schmetterer, Shamira Perera, Tin Aung, Alexandre H. Thiery, Michael J. A. Girard

Abstract: Since the introduction of optical coherence tomography (OCT), it has been possible to study the complex 3D morphological changes of the optic nerve head (ONH) tissues that occur along with the progression of glaucoma. Although several deep learning (DL) techniques have been recently proposed for the automated extraction (segmentation) and quantification of these morphological changes, the device s… ▽ More Since the introduction of optical coherence tomography (OCT), it has been possible to study the complex 3D morphological changes of the optic nerve head (ONH) tissues that occur along with the progression of glaucoma. Although several deep learning (DL) techniques have been recently proposed for the automated extraction (segmentation) and quantification of these morphological changes, the device specific nature and the difficulty in preparing manual segmentations (training data) limit their clinical adoption. With several new manufacturers and next-generation OCT devices entering the market, the complexity in deploying DL algorithms clinically is only increasing. To address this, we propose a DL based 3D segmentation framework that is easily translatable across OCT devices in a label-free manner (i.e. without the need to manually re-segment data for each device). Specifically, we developed 2 sets of DL networks. The first (referred to as the enhancer) was able to enhance OCT image quality from 3 OCT devices, and harmonized image-characteristics across these devices. The second performed 3D segmentation of 6 important ONH tissue layers. We found that the use of the enhancer was critical for our segmentation network to achieve device independency. In other words, our 3D segmentation network trained on any of 3 devices successfully segmented ONH tissue layers from the other two devices with high performance (Dice coefficients > 0.92). With such an approach, we could automatically segment images from new OCT devices without ever needing manual segmentation data from such devices. △ Less

Submitted 22 February, 2020; originally announced February 2020.

arXiv:1912.02982 [pdf, other]

Manifold Markov chain Monte Carlo methods for Bayesian inference in diffusion models

Authors: Matthew M. Graham, Alexandre H. Thiery, Alexandros Beskos

Abstract: Bayesian inference for nonlinear diffusions, observed at discrete times, is a challenging task that has prompted the development of a number of algorithms, mainly within the computational statistics community. We propose a new direction, and accompanying methodology, borrowing ideas from statistical physics and computational chemistry, for inferring the posterior distribution of latent diffusion p… ▽ More Bayesian inference for nonlinear diffusions, observed at discrete times, is a challenging task that has prompted the development of a number of algorithms, mainly within the computational statistics community. We propose a new direction, and accompanying methodology, borrowing ideas from statistical physics and computational chemistry, for inferring the posterior distribution of latent diffusion paths and model parameters, given observations of the process. Joint configurations of the underlying process noise and of parameters, map** onto diffusion paths consistent with observations, form an implicitly defined manifold. Then, by making use of a constrained Hamiltonian Monte Carlo algorithm on the embedded manifold, we are able to perform computationally efficient inference for a class of discretely observed diffusion models. Critically, in contrast with other approaches proposed in the literature, our methodology is highly automated, requiring minimal user intervention and applying alike in a range of settings, including: elliptic or hypo-elliptic systems; observations with or without noise; linear or non-linear observation operators. Exploiting Markovianity, we propose a variant of the method with complexity that scales linearly in the resolution of path discretisation and the number of observation times. Python code reproducing the results is available at https://doi.org/10.5281/zenodo.5796148 △ Less

Submitted 10 January, 2022; v1 submitted 6 December, 2019; originally announced December 2019.

Comments: Title change and updated with additional details of numerical experiments. 54 pages, 6 figures

MSC Class: 65C40 (Primary) 65C05; 65C40 (Secondary) ACM Class: G.3

arXiv:1910.02844 [pdf, other]

DeshadowGAN: A Deep Learning Approach to Remove Shadows from Optical Coherence Tomography Images

Authors: Haris Cheong, Sripad Krishna Devalla, Tan Hung Pham, Zhang Liang, Tin Aung Tun, Xiaofei Wang, Shamira Perera, Leopold Schmetterer, Aung Tin, Craig Boote, Alexandre H. Thiery, Michael J. A. Girard

Abstract: Purpose: To remove retinal shadows from optical coherence tomography (OCT) images of the optic nerve head(ONH). Methods:2328 OCT images acquired through the center of the ONH using a Spectralis OCT machine for both eyes of 13 subjects were used to train a generative adversarial network (GAN) using a custom loss function. Image quality was assessed qualitatively (for artifacts) and quantitatively… ▽ More Purpose: To remove retinal shadows from optical coherence tomography (OCT) images of the optic nerve head(ONH). Methods:2328 OCT images acquired through the center of the ONH using a Spectralis OCT machine for both eyes of 13 subjects were used to train a generative adversarial network (GAN) using a custom loss function. Image quality was assessed qualitatively (for artifacts) and quantitatively using the intralayer contrast: a measure of shadow visibility ranging from 0 (shadow-free) to 1 (strong shadow) and compared to compensated images. This was computed in the Retinal Nerve Fiber Layer (RNFL), the Inner Plexiform Layer (IPL), the Photoreceptor layer (PR) and the Retinal Pigment Epithelium (RPE) layers. Results: Output images had improved intralayer contrast in all ONH tissue layers. On average the intralayer contrast decreased by 33.7$\pm$6.81%, 28.8$\pm$10.4%, 35.9$\pm$13.0%, and43.0$\pm$19.5%for the RNFL, IPL, PR, and RPE layers respectively, indicating successful shadow removal across all depths. This compared to 70.3$\pm$22.7%, 33.9$\pm$11.5%, 47.0$\pm$11.2%, 26.7$\pm$19.0%for compensation. Output images were also free from artifacts commonly observed with compensation. Conclusions: DeshadowGAN significantly corrected blood vessel shadows in OCT images of the ONH. Our algorithm may be considered as a pre-processing step to improve the performance of a wide range of algorithms including those currently being used for OCT image segmentation, denoising, and classification. Translational Relevance: DeshadowGAN could be integrated to existing OCT devices to improve the diagnosis and prognosis of ocular pathologies. △ Less

Submitted 7 October, 2019; originally announced October 2019.

arXiv:1909.09591 [pdf, other]

Sequential Ensemble Transform for Bayesian Inverse Problems

Authors: Aaron Myers, Alexandre H. Thiery, Kainan Wang, Tan Bui-Thanh

Abstract: We present the Sequential Ensemble Transform (SET) method, an approach for generating approximate samples from a Bayesian posterior distribution. The method explores the posterior distribution by solving a sequence of discrete optimal transport problems to produce a series of transport plans which map prior samples to posterior samples. We prove that the sequence of Dirac mixture distributions pro… ▽ More We present the Sequential Ensemble Transform (SET) method, an approach for generating approximate samples from a Bayesian posterior distribution. The method explores the posterior distribution by solving a sequence of discrete optimal transport problems to produce a series of transport plans which map prior samples to posterior samples. We prove that the sequence of Dirac mixture distributions produced by the SET method converges weakly to the true posterior as the sample size approaches infinity. Furthermore, our numerical results indicate that, when compared to standard Sequential Monte Carlo (SMC) methods, the SET approach is more robust to the choice of Markov mutation kernels and requires less computational efforts to reach a similar accuracy when used to explore complex posterior distributions. Finally, we describe adaptive schemes that allow to completely automate the use of the SET method. △ Less

Submitted 20 September, 2020; v1 submitted 20 September, 2019; originally announced September 2019.

arXiv:1909.00331 [pdf, other]

Deep Learning Algorithms to Isolate and Quantify the Structures of the Anterior Segment in Optical Coherence Tomography Images

Authors: Tan Hung Pham, Sripad Krishna Devalla, Aloysius Ang, Soh Zhi Da, Alexandre H. Thiery, Craig Boote, Ching-Yu Cheng, Victor Koh, Michael J. A. Girard

Abstract: Accurate isolation and quantification of intraocular dimensions in the anterior segment (AS) of the eye using optical coherence tomography (OCT) images is important in the diagnosis and treatment of many eye diseases, especially angle closure glaucoma. In this study, we developed a deep convolutional neural network (DCNN) for the localization of the scleral spur, and the segmentation of anterior s… ▽ More Accurate isolation and quantification of intraocular dimensions in the anterior segment (AS) of the eye using optical coherence tomography (OCT) images is important in the diagnosis and treatment of many eye diseases, especially angle closure glaucoma. In this study, we developed a deep convolutional neural network (DCNN) for the localization of the scleral spur, and the segmentation of anterior segment structures (iris, corneo-sclera shell, anterior chamber). With limited training data, the DCNN was able to detect the scleral spur on unseen ASOCT images as accurately as an experienced ophthalmologist; and simultaneously isolated the anterior segment structures with a Dice coefficient of 95.7%. We then automatically extracted eight clinically relevant ASOCT parameters and proposed an automated quality check process that asserts the reliability of these parameters. When combined with an OCT machine capable of imaging multiple radial sections, the algorithms can provide a more complete objective assessment. This is an essential step toward providing a robust automated framework for reliable quantification of ASOCT scans, for applications in the diagnosis and management of angle closure glaucoma. △ Less

Submitted 1 September, 2019; originally announced September 2019.

arXiv:1907.10477 [pdf, other]

On importance-weighted autoencoders

Authors: Axel Finke, Alexandre H. Thiery

Abstract: The importance weighted autoencoder (IWAE) (Burda et al., 2016) is a popular variational-inference method which achieves a tighter evidence bound (and hence a lower bias) than standard variational autoencoders by optimising a multi-sample objective, i.e. an objective that is expressible as an integral over $K > 1$ Monte Carlo samples. Unfortunately, IWAE crucially relies on the availability of rep… ▽ More The importance weighted autoencoder (IWAE) (Burda et al., 2016) is a popular variational-inference method which achieves a tighter evidence bound (and hence a lower bias) than standard variational autoencoders by optimising a multi-sample objective, i.e. an objective that is expressible as an integral over $K > 1$ Monte Carlo samples. Unfortunately, IWAE crucially relies on the availability of reparametrisations and even if these exist, the multi-sample objective leads to inference-network gradients which break down as $K$ is increased (Rainforth et al., 2018). This breakdown can only be circumvented by removing high-variance score-function terms, either by heuristically ignoring them (which yields the 'sticking-the-landing' IWAE (IWAE-STL) gradient from Roeder et al. (2017)) or through an identity from Tucker et al. (2019) (which yields the 'doubly-reparametrised' IWAE (IWAE-DREG) gradient). In this work, we argue that directly optimising the proposal distribution in importance sampling as in the reweighted wake-sleep (RWS) algorithm from Bornschein & Bengio (2015) is preferable to optimising IWAE-type multi-sample objectives. To formalise this argument, we introduce an adaptive-importance sampling framework termed adaptive importance sampling for learning (AISLE) which slightly generalises the RWS algorithm. We then show that AISLE admits IWAE-STL and IWAE-DREG (i.e. the IWAE-gradients which avoid breakdown) as special cases. △ Less

Submitted 19 September, 2019; v1 submitted 24 July, 2019; originally announced July 2019.

arXiv:1906.00507 [pdf, other]

A scalable optimal-transport based local particle filter

Authors: Matthew M. Graham, Alexandre H. Thiery

Abstract: Filtering in spatially-extended dynamical systems is a challenging problem with significant practical applications such as numerical weather prediction. Particle filters allow asymptotically consistent inference but require infeasibly large ensemble sizes for accurate estimates in complex spatial models. Localisation approaches, which perform local state updates by exploiting low dependence betwee… ▽ More Filtering in spatially-extended dynamical systems is a challenging problem with significant practical applications such as numerical weather prediction. Particle filters allow asymptotically consistent inference but require infeasibly large ensemble sizes for accurate estimates in complex spatial models. Localisation approaches, which perform local state updates by exploiting low dependence between variables at distant points, have been suggested as a potential resolution to this issue. Naively applying the resampling step of the particle filter locally however produces implausible spatially discontinuous states. The ensemble transform particle filter replaces resampling with an optimal-transport map and can be localised by computing maps for every spatial mesh node. The resulting local ensemble transport particle filter is however computationally intensive for dense meshes. We propose a new optimal-transport based local particle filter which computes a fixed number of maps independent of the mesh resolution and interpolates these maps across space, reducing the computation required and allowing it to be ensured particles remain spatially smooth. We numerically illustrate that, at a reduced computational cost, we are able to achieve the same accuracy as the local ensemble transport particle filter, and retain its improved robustness to non-Gaussianity and ability to quantify uncertainty when compared to local ensemble Kalman filters. △ Less

Submitted 2 June, 2019; originally announced June 2019.

Comments: 55 pages, 20 figures

arXiv:1809.10589 [pdf, other]

A Deep Learning Approach to Denoise Optical Coherence Tomography Images of the Optic Nerve Head

Authors: Sripad Krishna Devalla, Giridhar Subramanian, Tan Hung Pham, Xiaofei Wang, Shamira Perera, Tin A. Tun, Tin Aung, Leopold Schmetterer, Alexandre H. Thiery, Michael J. A. Girard

Abstract: Purpose: To develop a deep learning approach to de-noise optical coherence tomography (OCT) B-scans of the optic nerve head (ONH). Methods: Volume scans consisting of 97 horizontal B-scans were acquired through the center of the ONH using a commercial OCT device (Spectralis) for both eyes of 20 subjects. For each eye, single-frame (without signal averaging), and multi-frame (75x signal averaging… ▽ More Purpose: To develop a deep learning approach to de-noise optical coherence tomography (OCT) B-scans of the optic nerve head (ONH). Methods: Volume scans consisting of 97 horizontal B-scans were acquired through the center of the ONH using a commercial OCT device (Spectralis) for both eyes of 20 subjects. For each eye, single-frame (without signal averaging), and multi-frame (75x signal averaging) volume scans were obtained. A custom deep learning network was then designed and trained with 2,328 "clean B-scans" (multi-frame B-scans), and their corresponding "noisy B-scans" (clean B-scans + gaussian noise) to de-noise the single-frame B-scans. The performance of the de-noising algorithm was assessed qualitatively, and quantitatively on 1,552 B-scans using the signal to noise ratio (SNR), contrast to noise ratio (CNR), and mean structural similarity index metrics (MSSIM). Results: The proposed algorithm successfully denoised unseen single-frame OCT B-scans. The denoised B-scans were qualitatively similar to their corresponding multi-frame B-scans, with enhanced visibility of the ONH tissues. The mean SNR increased from $4.02 \pm 0.68$ dB (single-frame) to $8.14 \pm 1.03$ dB (denoised). For all the ONH tissues, the mean CNR increased from $3.50 \pm 0.56$ (single-frame) to $7.63 \pm 1.81$ (denoised). The MSSIM increased from $0.13 \pm 0.02$ (single frame) to $0.65 \pm 0.03$ (denoised) when compared with the corresponding multi-frame B-scans. Conclusions: Our deep learning algorithm can denoise a single-frame OCT B-scan of the ONH in under 20 ms, thus offering a framework to obtain superior quality OCT B-scans with reduced scanning times and minimal patient discomfort. △ Less

Submitted 27 September, 2018; originally announced September 2018.

arXiv:1803.00232 [pdf, other]

DRUNET: A Dilated-Residual U-Net Deep Learning Network to Digitally Stain Optic Nerve Head Tissues in Optical Coherence Tomography Images

Authors: Sripad Krishna Devalla, Prajwal K. Renukanand, Bharathwaj K. Sreedhar, Shamira Perera, Jean-Martial Mari, Khai Sing Chin, Tin A. Tun, Nicholas G. Strouthidis, Tin Aung, Alexandre H. Thiery, Michael J. A. Girard

Abstract: Given that the neural and connective tissues of the optic nerve head (ONH) exhibit complex morphological changes with the development and progression of glaucoma, their simultaneous isolation from optical coherence tomography (OCT) images may be of great interest for the clinical diagnosis and management of this pathology. A deep learning algorithm was designed and trained to digitally stain (i.e.… ▽ More Given that the neural and connective tissues of the optic nerve head (ONH) exhibit complex morphological changes with the development and progression of glaucoma, their simultaneous isolation from optical coherence tomography (OCT) images may be of great interest for the clinical diagnosis and management of this pathology. A deep learning algorithm was designed and trained to digitally stain (i.e. highlight) 6 ONH tissue layers by capturing both the local (tissue texture) and contextual information (spatial arrangement of tissues). The overall dice coefficient (mean of all tissues) was $0.91 \pm 0.05$ when assessed against manual segmentations performed by an expert observer. We offer here a robust segmentation framework that could be extended for the automated parametric study of the ONH tissues. △ Less

Submitted 1 March, 2018; originally announced March 2018.

arXiv:1707.07609 [pdf, other]

A Deep Learning Approach to Digitally Stain Optical Coherence Tomography Images of the Optic Nerve Head

Authors: Sripad Krishna Devalla, Jean-Martial Mari, Tin A. Tun, Nicholas G. Strouthidis, Tin Aung, Alexandre H. Thiery, Michael J. A. Girard

Abstract: Purpose: To develop a deep learning approach to digitally-stain optical coherence tomography (OCT) images of the optic nerve head (ONH). Methods: A horizontal B-scan was acquired through the center of the ONH using OCT (Spectralis) for 1 eye of each of 100 subjects (40 normal & 60 glaucoma). All images were enhanced using adaptive compensation. A custom deep learning network was then designed an… ▽ More Purpose: To develop a deep learning approach to digitally-stain optical coherence tomography (OCT) images of the optic nerve head (ONH). Methods: A horizontal B-scan was acquired through the center of the ONH using OCT (Spectralis) for 1 eye of each of 100 subjects (40 normal & 60 glaucoma). All images were enhanced using adaptive compensation. A custom deep learning network was then designed and trained with the compensated images to digitally stain (i.e. highlight) 6 tissue layers of the ONH. The accuracy of our algorithm was assessed (against manual segmentations) using the Dice coefficient, sensitivity, and specificity. We further studied how compensation and the number of training images affected the performance of our algorithm. Results: For images it had not yet assessed, our algorithm was able to digitally stain the retinal nerve fiber layer + prelamina, the retinal pigment epithelium, all other retinal layers, the choroid, and the peripapillary sclera and lamina cribrosa. For all tissues, the mean dice coefficient was $0.84 \pm 0.03$, the mean sensitivity $0.92 \pm 0.03$, and the mean specificity $0.99 \pm 0.00$. Our algorithm performed significantly better when compensated images were used for training. Increasing the number of images (from 10 to 40) to train our algorithm did not significantly improve performance, except for the RPE. Conclusion. Our deep learning algorithm can simultaneously stain neural and connective tissues in ONH images. Our approach offers a framework to automatically measure multiple key structural parameters of the ONH that may be critical to improve glaucoma management. △ Less

Submitted 24 July, 2017; originally announced July 2017.

arXiv:1707.05200 [pdf, other]

A Discrete Bouncy Particle Sampler

Authors: Chris Sherlock, Alexandre H. Thiery

Abstract: Most Markov chain Monte Carlo methods operate in discrete time and are reversible with respect to the target probability. Nevertheless, it is now understood that the use of non-reversible Markov chains can be beneficial in many contexts. In particular, the recently-proposed Bouncy Particle Sampler leverages a continuous-time and non-reversible Markov process and empirically shows state-of-the-art… ▽ More Most Markov chain Monte Carlo methods operate in discrete time and are reversible with respect to the target probability. Nevertheless, it is now understood that the use of non-reversible Markov chains can be beneficial in many contexts. In particular, the recently-proposed Bouncy Particle Sampler leverages a continuous-time and non-reversible Markov process and empirically shows state-of-the-art performances when used to explore certain probability densities; however, its implementation typically requires the computation of local upper bounds on the gradient of the log target density. We present the Discrete Bouncy Particle Sampler, a general algorithm based upon a guided random walk, a partial refreshment of direction, and a delayed-rejection step. We show that the Bouncy Particle Sampler can be understood as a scaling limit of a special case of our algorithm. In contrast to the Bouncy Particle Sampler, implementing the Discrete Bouncy Particle Sampler only requires point-wise evaluation of the target density and its gradient. We propose extensions of the basic algorithm for situations when the exact gradient of the target density is not available. In a Gaussian setting, we establish a scaling limit for the radial process as dimension increases to infinity. We leverage this result to obtain the theoretical efficiency of the Discrete Bouncy Particle Sampler as a function of the partial-refreshment parameter, which leads to a simple and robust tuning criterion. A further analysis in a more general setting suggests that this tuning criterion applies more generally. Theoretical and empirical efficiency curves are then compared for different targets and algorithm variations. △ Less

Submitted 22 February, 2021; v1 submitted 17 July, 2017; originally announced July 2017.

Comments: New theory for anisotropic targets

MSC Class: 65C40

arXiv:1610.09788 [pdf, other]

Pseudo-marginal Metropolis--Hastings using averages of unbiased estimators

Authors: Chris Sherlock, Alexandre Thiery, Anthony Lee

Abstract: We consider a pseudo-marginal Metropolis--Hastings kernel $P_m$ that is constructed using an average of $m$ exchangeable random variables, as well as an analogous kernel $P_s$ that averages $s<m$ of these same random variables. Using an embedding technique to facilitate comparisons, we show that the asymptotic variances of ergodic averages associated with $P_m$ are lower bounded in terms of those… ▽ More We consider a pseudo-marginal Metropolis--Hastings kernel $P_m$ that is constructed using an average of $m$ exchangeable random variables, as well as an analogous kernel $P_s$ that averages $s<m$ of these same random variables. Using an embedding technique to facilitate comparisons, we show that the asymptotic variances of ergodic averages associated with $P_m$ are lower bounded in terms of those associated with $P_s$. We show that the bound provided is tight and disprove a conjecture that when the random variables to be averaged are independent, the asymptotic variance under $P_m$ is never less than $s/m$ times the variance under $P_s$. The conjecture does, however, hold when considering continuous-time Markov chains. These results imply that if the computational cost of the algorithm is proportional to $m$, it is often better to set $m=1$. We provide intuition as to why these findings differ so markedly from recent results for pseudo-marginal kernels employing particle filter approximations. Our results are exemplified through two simulation studies; in the first the computational cost is effectively proportional to $m$ and in the second there is a considerable start-up cost at each iteration. △ Less

Submitted 31 October, 2016; originally announced October 2016.

arXiv:1607.04419 [pdf, ps, other]

doi 10.1103/PhysRevA.95.053418

Lévy statistics of interacting Rydberg gases

Authors: Thibault Vogt, **gshan Han, Alexandre Thiery, Wenhui Li

Abstract: A statistical analysis of the laser excitation of cold and randomly distributed atoms to Rydberg states is developed. We first demonstrate with a hard ball model that the distribution of energy level shifts in an interacting gas obeys Lévy statistics, in any dimension $d$ and for any interaction $-C_p/R^p$ under the condition $d/p<1$. This result is confirmed with a Monte Carlo rate equations simu… ▽ More A statistical analysis of the laser excitation of cold and randomly distributed atoms to Rydberg states is developed. We first demonstrate with a hard ball model that the distribution of energy level shifts in an interacting gas obeys Lévy statistics, in any dimension $d$ and for any interaction $-C_p/R^p$ under the condition $d/p<1$. This result is confirmed with a Monte Carlo rate equations simulation of the actual laser excitation in the particular case $p=6$ and $d=3$. With this finding, we develop a statistical approach for the modeling of probe light transmission through a cold atom gas driven under conditions of electromagnetically induced transparency involving a Rydberg state. The simulated results are in good agreement with experiment. △ Less

Submitted 31 January, 2018; v1 submitted 15 July, 2016; originally announced July 2016.

Journal ref: Phys. Rev. A 95, 053418 (2017)

arXiv:1606.01016 [pdf, other]

On Coupling Particle Filter Trajectories

Authors: Deborshee Sen, Alexandre Thiery, Ajay Jasra

Abstract: Particle filters are a powerful and flexible tool for performing inference on state-space models. They involve a collection of samples evolving over time through a combination of sampling and re-sampling steps. The re-sampling step is necessary to ensure that weight degeneracy is avoided. In several situations of statistical interest, it is important to be able to compare the estimates produced by… ▽ More Particle filters are a powerful and flexible tool for performing inference on state-space models. They involve a collection of samples evolving over time through a combination of sampling and re-sampling steps. The re-sampling step is necessary to ensure that weight degeneracy is avoided. In several situations of statistical interest, it is important to be able to compare the estimates produced by two different particle filters; consequently, being able to efficiently couple two particle filter trajectories is often of paramount importance. In this text, we propose several ways to do so. In particular, we leverage ideas from the optimal transportation literature. In general, though, computing the optimal transport map is extremely computationally expensive; to deal with this, we introduce computationally tractable approximations to optimal transport couplings. We demonstrate that our resulting algorithms for coupling two particle filter trajectories often perform orders of magnitude more efficiently than more standard approaches. △ Less

Submitted 16 March, 2017; v1 submitted 3 June, 2016; originally announced June 2016.

arXiv:1510.02577 [pdf, other]

Asymptotic Analysis of the Random-Walk Metropolis Algorithm on Ridged Densities

Authors: Alexandros Beskos, Gareth Roberts, Alexandre Thiery, Natesh Pillai

Abstract: In this paper we study the asymptotic behavior of the Random-Walk Metropolis algorithm on probability densities with two different `scales', where most of the probability mass is distributed along certain key directions with the `orthogonal' directions containing relatively less mass. Such class of probability measures arise in various applied contexts including Bayesian inverse problems where the… ▽ More In this paper we study the asymptotic behavior of the Random-Walk Metropolis algorithm on probability densities with two different `scales', where most of the probability mass is distributed along certain key directions with the `orthogonal' directions containing relatively less mass. Such class of probability measures arise in various applied contexts including Bayesian inverse problems where the posterior measure concentrates on a sub-manifold when the noise variance goes to zero. When the target measure concentrates on a linear sub-manifold, we derive analytically a diffusion limit for the Random-Walk Metropolis Markov chain as the scale parameter goes to zero. In contrast to the existing works on scaling limits, our limiting Stochastic Differential Equation does not in general have a constant diffusion coefficient. Our results show that in some cases, the usual practice of adapting the step-size to control the acceptance probability might be sub-optimal as the optimal acceptance probability is zero (in the limit). △ Less

Submitted 9 October, 2015; originally announced October 2015.

arXiv:1509.08775 [pdf, other]

Error Bounds for Sequential Monte Carlo Samplers for Multimodal Distributions

Authors: Daniel Paulin, Ajay Jasra, Alexandre Thiery

Abstract: In this paper, we provide bounds on the asymptotic variance for a class of sequential Monte Carlo (SMC) samplers designed for approximating multimodal distributions. Such methods combine standard SMC methods and Markov chain Monte Carlo (MCMC) kernels. Our bounds improve upon previous results, and unlike some earlier work, they also apply in the case when the MCMC kernels can move between the mode… ▽ More In this paper, we provide bounds on the asymptotic variance for a class of sequential Monte Carlo (SMC) samplers designed for approximating multimodal distributions. Such methods combine standard SMC methods and Markov chain Monte Carlo (MCMC) kernels. Our bounds improve upon previous results, and unlike some earlier work, they also apply in the case when the MCMC kernels can move between the modes. We apply our results to the Potts model from statistical physics. In this case, the problem of sharp peaks is encountered. Earlier methods, such as parallel tempering, are only able to sample from it at an exponential (in an important parameter of the model) cost. We propose a sequence of interpolating distributions called interpolation to independence, and show that the SMC sampler based on it is able to sample from this target distribution at a polynomial cost. We believe that our method is generally applicable to many other distributions as well. △ Less

Submitted 24 January, 2018; v1 submitted 29 September, 2015; originally announced September 2015.

Comments: 42 pages, 6 figures. Some minor corrections in this version

MSC Class: 60J22; 65C05; 65Z05

arXiv:1506.08155 [pdf, other]

Efficiency of delayed-acceptance random walk Metropolis algorithms

Authors: Chris Sherlock, Alexandre Thiery, Andrew Golightly

Abstract: Delayed-acceptance Metropolis-Hastings and delayed-acceptance pseudo-marginal Metropolis-Hastings algorithms can be applied when it is computationally expensive to calculate the true posterior or an unbiased stochastic approximation thereof, but a computationally cheap deterministic approximation is available. An initial accept-reject stage uses the cheap approximation for computing the Metropolis… ▽ More Delayed-acceptance Metropolis-Hastings and delayed-acceptance pseudo-marginal Metropolis-Hastings algorithms can be applied when it is computationally expensive to calculate the true posterior or an unbiased stochastic approximation thereof, but a computationally cheap deterministic approximation is available. An initial accept-reject stage uses the cheap approximation for computing the Metropolis-Hastings ratio; proposals which are accepted at this stage are then subjected to a further accept-reject step which corrects for the error in the approximation. Since the expensive posterior, or the approximation thereof, is only evaluated for proposals which are accepted at the first stage, the cost of the algorithm is reduced and larger scalings may be used. We focus on the random walk Metropolis (RWM) and consider the delayed-acceptance RWM and the delayed-acceptance pseudo-marginal RWM. We provide a framework for incorporating relatively general deterministic approximations into the theoretical analysis of high-dimensional targets. Justified by diffusion approximation arguments, we derive expressions for the limiting efficiency and acceptance rates in high-dimensional settings. These theoretical insights are finally leveraged to formulate practical guidelines for the efficient tuning of the algorithms. The robustness of these guidelines and predicted properties are verified against simulation studies, all of which are strictly outside of the domain of validity of our limit results. △ Less

Submitted 23 February, 2021; v1 submitted 26 June, 2015; originally announced June 2015.

MSC Class: 65C05; 65C40; 60F05

arXiv:1409.5281 [pdf, ps, other]

$q$-Varieties and Drinfeld Modules

Authors: Alain Thiéry

Abstract: Let $\mathbb{F}_q$ be the finite field with $q$ elements, $K$ be an algebraically closed field containing $\mathbb{F}_q$, $K\{τ\}$ be the Ore ring of $\mathbb{F}_q$-linear polynomials and $Λ_n$ be a free $K\{τ\}$-module of rank $n$. In a first part, we prove that there is a bijection between the set of Zariski closed subsets of $K^n$ which are also $\mathbb{F}_q$-vector spaces, the so-called… ▽ More Let $\mathbb{F}_q$ be the finite field with $q$ elements, $K$ be an algebraically closed field containing $\mathbb{F}_q$, $K\{τ\}$ be the Ore ring of $\mathbb{F}_q$-linear polynomials and $Λ_n$ be a free $K\{τ\}$-module of rank $n$. In a first part, we prove that there is a bijection between the set of Zariski closed subsets of $K^n$ which are also $\mathbb{F}_q$-vector spaces, the so-called $q$-varities, and the set of radical $K\{τ\}$-submodules of $Λ_n$. We also study the dimension of $q$-varieties and their tangent spaces. Let $F$ be a $q$-variety, $K\{F\} := Mor(F,K)$ be the set of $\mathbb{F}_q$-linear polynomial maps from $F$ to $K$. Let $A=\mathbb{F}_q[T]$ and choose $δ: A \longrightarrow K$ a ring morphism. By definition, an $A$-module structure on $F$ is a ring morphism $Φ: A \longrightarrow End(F)$ such that, for all $a\in A$, $$d(Φ_a) = δ(a) Id_{T(F)}$$ where $T(F)$ is the tangent space of $F$ and $d(Φ_a)$ the differential map. We prove that $K(F) := K(T)\otimes_{K[T]}K\{F\}$ has finite dimension over $K(T)$. This dimension is called the rank of the $A$-module and is denoted by $r(F)$. We then prove that there exists $c \in A\setminus \{0\}$ such that for all $a\in A$, prime to $c$, $$Tor(a,F) := \{x\in F \mid Φ_a(x) = 0\} = (A/aA)^{r(F)}.$$ △ Less

Submitted 18 September, 2014; originally announced September 2014.

arXiv:1409.0578 [pdf, other]

Consistency and fluctuations for stochastic gradient Langevin dynamics

Authors: Yee Whye Teh, Alexandre Thiéry, Sebastian Vollmer

Abstract: Applying standard Markov chain Monte Carlo (MCMC) algorithms to large data sets is computationally expensive. Both the calculation of the acceptance probability and the creation of informed proposals usually require an iteration through the whole data set. The recently proposed stochastic gradient Langevin dynamics (SGLD) method circumvents this problem by generating proposals which are only based… ▽ More Applying standard Markov chain Monte Carlo (MCMC) algorithms to large data sets is computationally expensive. Both the calculation of the acceptance probability and the creation of informed proposals usually require an iteration through the whole data set. The recently proposed stochastic gradient Langevin dynamics (SGLD) method circumvents this problem by generating proposals which are only based on a subset of the data, by skip** the accept-reject step and by using decreasing step-sizes sequence $(δ_m)_{m \geq 0}$. %Under appropriate Lyapunov conditions, We provide in this article a rigorous mathematical framework for analysing this algorithm. We prove that, under verifiable assumptions, the algorithm is consistent, satisfies a central limit theorem (CLT) and its asymptotic bias-variance decomposition can be characterized by an explicit functional of the step-sizes sequence $(δ_m)_{m \geq 0}$. We leverage this analysis to give practical recommendations for the notoriously difficult tuning of this algorithm: it is asymptotically optimal to use a step-size sequence of the type $δ_m \asymp m^{-1/3}$, leading to an algorithm whose mean squared error (MSE) decreases at rate $\mathcal{O}(m^{-1/3})$ △ Less

Submitted 12 June, 2015; v1 submitted 1 September, 2014; originally announced September 2014.

Comments: 35 pages, 5 figures

MSC Class: 60J22; 65C40

arXiv:1401.6140 [pdf, other]

The density of sets avoiding distance 1 in Euclidean space

Authors: Christine Bachoc, Alberto Passuello, Alain Thiery

Abstract: We improve by an exponential factor the best known asymptotic upper bound for the density of sets avoiding 1 in Euclidean space. This result is obtained by a combination of an analytic bound that is an analogue of Lovasz theta number and of a combinatorial argument involving finite subgraphs of the unit distance graph. In turn, we straightforwardly obtain an asymptotic improvement for the measurab… ▽ More We improve by an exponential factor the best known asymptotic upper bound for the density of sets avoiding 1 in Euclidean space. This result is obtained by a combination of an analytic bound that is an analogue of Lovasz theta number and of a combinatorial argument involving finite subgraphs of the unit distance graph. In turn, we straightforwardly obtain an asymptotic improvement for the measurable chromatic number of Euclidean space. We also tighten previous results for the dimensions between 4 and 24. △ Less

Submitted 29 January, 2015; v1 submitted 23 January, 2014; originally announced January 2014.

Comments: Revised version, to appear in Discrete and Computational Geometry

arXiv:1309.7209 [pdf, ps, other]

doi 10.1214/14-AOS1278

On the efficiency of pseudo-marginal random walk Metropolis algorithms

Authors: Chris Sherlock, Alexandre H. Thiery, Gareth O. Roberts, Jeffrey S. Rosenthal

Abstract: We examine the behaviour of the pseudo-marginal random walk Metropolis algorithm, where evaluations of the target density for the accept/reject probability are estimated rather than computed precisely. Under relatively general conditions on the target distribution, we obtain limiting formulae for the acceptance rate and for the expected squared jump distance, as the dimension of the target approac… ▽ More We examine the behaviour of the pseudo-marginal random walk Metropolis algorithm, where evaluations of the target density for the accept/reject probability are estimated rather than computed precisely. Under relatively general conditions on the target distribution, we obtain limiting formulae for the acceptance rate and for the expected squared jump distance, as the dimension of the target approaches infinity, under the assumption that the noise in the estimate of the log-target is additive and is independent of the position. For targets with independent and identically distributed components, we also obtain a limiting diffusion for the first component. We then consider the overall efficiency of the algorithm, in terms of both speed of mixing and computational time. Assuming the additive noise is Gaussian and is inversely proportional to the number of unbiased estimates that are used, we prove that the algorithm is optimally efficient when the variance of the noise is approximately 3.283 and the acceptance rate is approximately 7.001%. We also find that the optimal scaling is insensitive to the noise and that the optimal variance of the noise is insensitive to the scaling. The theory is illustrated with a simulation study using the particle marginal random walk Metropolis. △ Less

Submitted 30 December, 2014; v1 submitted 27 September, 2013; originally announced September 2013.

Comments: Published in at http://dx.doi.org/10.1214/14-AOS1278 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1278

Journal ref: Annals of Statistics 2015, Vol. 43, No. 1, 238-275

arXiv:1309.6473 [pdf, ps, other]

doi 10.1214/15-AOS1311

On nonnegative unbiased estimators

Authors: Pierre E. Jacob, Alexandre H. Thiery

Abstract: We study the existence of algorithms generating almost surely nonnegative unbiased estimators. We show that given a nonconstant real-valued function $f$ and a sequence of unbiased estimators of $λ\in\mathbb{R}$, there is no algorithm yielding almost surely nonnegative unbiased estimators of $f(λ)\in\mathbb{R}^+$. The study is motivated by pseudo-marginal Monte Carlo algorithms that rely on such no… ▽ More We study the existence of algorithms generating almost surely nonnegative unbiased estimators. We show that given a nonconstant real-valued function $f$ and a sequence of unbiased estimators of $λ\in\mathbb{R}$, there is no algorithm yielding almost surely nonnegative unbiased estimators of $f(λ)\in\mathbb{R}^+$. The study is motivated by pseudo-marginal Monte Carlo algorithms that rely on such nonnegative unbiased estimators. These methods allow "exact inference" in intractable models, in the sense that integrals with respect to a target distribution can be estimated without any systematic error, even though the associated probability density function cannot be evaluated pointwise. We discuss the consequences of our results on the applicability of pseudo-marginal algorithms and thus on the possibility of exact inference in intractable models. We illustrate our study with particular choices of functions $f$ corresponding to known challenges in statistics, such as exact simulation of diffusions, inference in large datasets and doubly intractable distributions. △ Less

Submitted 1 April, 2015; v1 submitted 25 September, 2013; originally announced September 2013.

Comments: Published at http://dx.doi.org/10.1214/15-AOS1311 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1311

Journal ref: Annals of Statistics 2015, Vol. 43, No. 2, 769-784

arXiv:1306.6462 [pdf, other]

On the Convergence of Adaptive Sequential Monte Carlo Methods

Authors: Alexandros Beskos, Ajay Jasra, Nikolas Kantas, Alexandre Thiery

Abstract: In several implementations of Sequential Monte Carlo (SMC) methods it is natural, and important in terms of algorithmic efficiency, to exploit the information of the history of the samples to optimally tune their subsequent propagations. In this article we provide a carefully formulated asymptotic theory for a class of such \emph{adaptive} SMC methods. The theoretical framework developed here will… ▽ More In several implementations of Sequential Monte Carlo (SMC) methods it is natural, and important in terms of algorithmic efficiency, to exploit the information of the history of the samples to optimally tune their subsequent propagations. In this article we provide a carefully formulated asymptotic theory for a class of such \emph{adaptive} SMC methods. The theoretical framework developed here will cover, under assumptions, several commonly used SMC algorithms. There are only limited results about the theoretical underpinning of such adaptive methods: we will bridge this gap by providing a weak law of large numbers (WLLN) and a central limit theorem (CLT) for some of these algorithms. The latter seems to be the first result of its kind in the literature and provides a formal justification of algorithms used in many real data context. We establish that for a general class of adaptive SMC algorithms the asymptotic variance of the estimators from the adaptive SMC method is \emph{identical} to a so-called `perfect' SMC algorithm which uses ideal proposal kernels. Our results are supported by application on a complex high-dimensional posterior distribution associated with the Navier-Stokes model, where adapting high-dimensional parameters of the proposal kernels is critical for the efficiency of the algorithm. △ Less

Submitted 5 February, 2014; v1 submitted 27 June, 2013; originally announced June 2013.

arXiv:1108.1494 [pdf, other]

Gradient Flow from a Random Walk in Hilbert Space

Authors: Natesh S. Pillai, Andrew M. Stuart, Alexandre H. Thiery

Abstract: Consider a probability measure on a Hilbert space defined via its density with respect to a Gaussian. The purpose of this paper is to demonstrate that an appropriately defined Markov chain, which is reversible with respect to the measure in question, exhibits a diffusion limit to a noisy gradient flow, also reversible with respect to the same measure. The Markov chain is defined by applying a Metr… ▽ More Consider a probability measure on a Hilbert space defined via its density with respect to a Gaussian. The purpose of this paper is to demonstrate that an appropriately defined Markov chain, which is reversible with respect to the measure in question, exhibits a diffusion limit to a noisy gradient flow, also reversible with respect to the same measure. The Markov chain is defined by applying a Metropolis-Hastings accept-reject mechanism to an Ornstein-Uhlenbeck proposal which is itself reversible with respect to the underlying Gaussian measure. The resulting noisy gradient flow is a stochastic partial differential equation driven by a Wiener process with spatial correlation given by the underlying Gaussian structure. △ Less

Submitted 18 April, 2014; v1 submitted 6 August, 2011; originally announced August 2011.

Comments: Major revision of the original version

arXiv:1103.0542 [pdf, ps, other]

doi 10.1214/11-AAP828

Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions

Authors: Natesh S. Pillai, Andrew M. Stuart, Alexandre H. Thiéry

Abstract: The Metropolis-adjusted Langevin (MALA) algorithm is a sampling algorithm which makes local moves by incorporating information about the gradient of the logarithm of the target density. In this paper we study the efficiency of MALA on a natural class of target measures supported on an infinite dimensional Hilbert space. These natural measures have density with respect to a Gaussian random field me… ▽ More The Metropolis-adjusted Langevin (MALA) algorithm is a sampling algorithm which makes local moves by incorporating information about the gradient of the logarithm of the target density. In this paper we study the efficiency of MALA on a natural class of target measures supported on an infinite dimensional Hilbert space. These natural measures have density with respect to a Gaussian random field measure and arise in many applications such as Bayesian nonparametric statistics and the theory of conditioned diffusions. We prove that, started in stationarity, a suitably interpolated and scaled version of the Markov chain corresponding to MALA converges to an infinite dimensional diffusion process. Our results imply that, in stationarity, the MALA algorithm applied to an N-dimensional approximation of the target will take $\mathcal{O}(N^{1/3})$ steps to explore the invariant measure, comparing favorably with the Random Walk Metropolis which was recently shown to require $\mathcal{O}(N)$ steps when applied to the same class of problems. △ Less

Submitted 28 November, 2012; v1 submitted 2 March, 2011; originally announced March 2011.

Comments: Published in at http://dx.doi.org/10.1214/11-AAP828 the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AAP-AAP828

Journal ref: Annals of Applied Probability 2012, Vol. 22, No. 6, 2320-2356

arXiv:1103.0444 [pdf, ps, other]

On the theta number of powers of cycle graphs

Authors: Christine Bachoc, Arnaud Pêcher, Alain Thiéry

Abstract: We give a closed formula for Lovasz theta number of the powers of cycle graphs and of their complements, the circular complete graphs. As a consequence, we establish that the circular chromatic number of a circular perfect graph is computable in polynomial time. We also derive an asymptotic estimate for this theta number. We give a closed formula for Lovasz theta number of the powers of cycle graphs and of their complements, the circular complete graphs. As a consequence, we establish that the circular chromatic number of a circular perfect graph is computable in polynomial time. We also derive an asymptotic estimate for this theta number. △ Less

Submitted 2 March, 2011; originally announced March 2011.

Comments: 17 pages

Showing 1–48 of 48 results for author: Thiery, A