Search | arXiv e-print repository

Unsupervised Training of Convex Regularizers using Maximum Likelihood Estimation

Authors: Hong Ye Tan, Ziruo Cai, Marcelo Pereyra, Subhadip Mukherjee, Junqi Tang, Carola-Bibiane Schönlieb

Abstract: Unsupervised learning is a training approach in the situation where ground truth data is unavailable, such as inverse imaging problems. We present an unsupervised Bayesian training approach to learning convex neural network regularizers using a fixed noisy dataset, based on a dual Markov chain estimation method. Compared to classical supervised adversarial regularization methods, where there is ac… ▽ More Unsupervised learning is a training approach in the situation where ground truth data is unavailable, such as inverse imaging problems. We present an unsupervised Bayesian training approach to learning convex neural network regularizers using a fixed noisy dataset, based on a dual Markov chain estimation method. Compared to classical supervised adversarial regularization methods, where there is access to both clean images as well as unlimited to noisy copies, we demonstrate close performance on natural image Gaussian deconvolution and Poisson denoising tasks. △ Less

Submitted 8 April, 2024; originally announced April 2024.

MSC Class: 62C12; 62F15; 65C40; 65J22

arXiv:2402.01052 [pdf, other]

Weakly Convex Regularisers for Inverse Problems: Convergence of Critical Points and Primal-Dual Optimisation

Authors: Zakhar Shumaylov, Jeremy Budd, Subhadip Mukherjee, Carola-Bibiane Schönlieb

Abstract: Variational regularisation is the primary method for solving inverse problems, and recently there has been considerable work leveraging deeply learned regularisation for enhanced performance. However, few results exist addressing the convergence of such regularisation, particularly within the context of critical points as opposed to global minimisers. In this paper, we present a generalised formul… ▽ More Variational regularisation is the primary method for solving inverse problems, and recently there has been considerable work leveraging deeply learned regularisation for enhanced performance. However, few results exist addressing the convergence of such regularisation, particularly within the context of critical points as opposed to global minimisers. In this paper, we present a generalised formulation of convergent regularisation in terms of critical points, and show that this is achieved by a class of weakly convex regularisers. We prove convergence of the primal-dual hybrid gradient method for the associated variational problem, and, given a Kurdyka-Lojasiewicz condition, an $\mathcal{O}(\log{k}/k)$ ergodic convergence rate. Finally, applying this theory to learned regularisation, we prove universal approximation for input weakly convex neural networks (IWCNN), and show empirically that IWCNNs can lead to improved performance of learned adversarial regularisers for computed tomography (CT) reconstruction. △ Less

Submitted 15 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: 26 pages, 4 figures; https://openreview.net/forum?id=E8FpcUyPuS

arXiv:2312.16188 [pdf, other]

The curious case of the test set AUROC

Authors: Michael Roberts, Alon Hazan, Sören Dittmer, James H. F. Rudd, Carola-Bibiane Schönlieb

Abstract: Whilst the size and complexity of ML models have rapidly and significantly increased over the past decade, the methods for assessing their performance have not kept pace. In particular, among the many potential performance metrics, the ML community stubbornly continues to use (a) the area under the receiver operating characteristic curve (AUROC) for a validation and test cohort (distinct from trai… ▽ More Whilst the size and complexity of ML models have rapidly and significantly increased over the past decade, the methods for assessing their performance have not kept pace. In particular, among the many potential performance metrics, the ML community stubbornly continues to use (a) the area under the receiver operating characteristic curve (AUROC) for a validation and test cohort (distinct from training data) or (b) the sensitivity and specificity for the test data at an optimal threshold determined from the validation ROC. However, we argue that considering scores derived from the test ROC curve alone gives only a narrow insight into how a model performs and its ability to generalise. △ Less

Submitted 19 December, 2023; originally announced December 2023.

Comments: 3 pages, 4 figures

arXiv:2311.15996 [pdf, other]

Closing the ODE-SDE gap in score-based diffusion models through the Fokker-Planck equation

Authors: Teo Deveney, Jan Stanczuk, Lisa Maria Kreusser, Chris Budd, Carola-Bibiane Schönlieb

Abstract: Score-based diffusion models have emerged as one of the most promising frameworks for deep generative modelling, due to their state-of-the art performance in many generation tasks while relying on mathematical foundations such as stochastic differential equations (SDEs) and ordinary differential equations (ODEs). Empirically, it has been reported that ODE based samples are inferior to SDE based sa… ▽ More Score-based diffusion models have emerged as one of the most promising frameworks for deep generative modelling, due to their state-of-the art performance in many generation tasks while relying on mathematical foundations such as stochastic differential equations (SDEs) and ordinary differential equations (ODEs). Empirically, it has been reported that ODE based samples are inferior to SDE based samples. In this paper we rigorously describe the range of dynamics and approximations that arise when training score-based diffusion models, including the true SDE dynamics, the neural approximations, the various approximate particle dynamics that result, as well as their associated Fokker--Planck equations and the neural network approximations of these Fokker--Planck equations. We systematically analyse the difference between the ODE and SDE dynamics of score-based diffusion models, and link it to an associated Fokker--Planck equation. We derive a theoretical upper bound on the Wasserstein 2-distance between the ODE- and SDE-induced distributions in terms of a Fokker--Planck residual. We also show numerically that conventional score-based diffusion models can exhibit significant differences between ODE- and SDE-induced distributions which we demonstrate using explicit comparisons. Moreover, we show numerically that reducing the Fokker--Planck residual by adding it as an additional regularisation term leads to closing the gap between ODE- and SDE-induced distributions. Our experiments suggest that this regularisation can improve the distribution generated by the ODE, however that this can come at the cost of degraded SDE sample quality. △ Less

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2310.05812 [pdf, other]

Provably Convergent Data-Driven Convex-Nonconvex Regularization

Authors: Zakhar Shumaylov, Jeremy Budd, Subhadip Mukherjee, Carola-Bibiane Schönlieb

Abstract: An emerging new paradigm for solving inverse problems is via the use of deep learning to learn a regularizer from data. This leads to high-quality results, but often at the cost of provable guarantees. In this work, we show how well-posedness and convergent regularization arises within the convex-nonconvex (CNC) framework for inverse problems. We introduce a novel input weakly convex neural networ… ▽ More An emerging new paradigm for solving inverse problems is via the use of deep learning to learn a regularizer from data. This leads to high-quality results, but often at the cost of provable guarantees. In this work, we show how well-posedness and convergent regularization arises within the convex-nonconvex (CNC) framework for inverse problems. We introduce a novel input weakly convex neural network (IWCNN) construction to adapt the method of learned adversarial regularization to the CNC framework. Empirically we show that our method overcomes numerical issues of previous adversarial methods. △ Less

Submitted 2 November, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

Comments: Accepted to NeurIPS 2023 Workshop on Deep Learning and Inverse Problems

arXiv:2306.17737 [pdf, other]

Proximal Langevin Sampling With Inexact Proximal Map**

Authors: Matthias J. Ehrhardt, Lorenz Kuger, Carola-Bibiane Schönlieb

Abstract: In order to solve tasks like uncertainty quantification or hypothesis tests in Bayesian imaging inverse problems, we often have to draw samples from the arising posterior distribution. For the usually log-concave but high-dimensional posteriors, Markov chain Monte Carlo methods based on time discretizations of Langevin diffusion are a popular tool. If the potential defining the distribution is non… ▽ More In order to solve tasks like uncertainty quantification or hypothesis tests in Bayesian imaging inverse problems, we often have to draw samples from the arising posterior distribution. For the usually log-concave but high-dimensional posteriors, Markov chain Monte Carlo methods based on time discretizations of Langevin diffusion are a popular tool. If the potential defining the distribution is non-smooth, these discretizations are usually of an implicit form leading to Langevin sampling algorithms that require the evaluation of proximal operators. For some of the potentials relevant in imaging problems this is only possible approximately using an iterative scheme. We investigate the behaviour of a proximal Langevin algorithm under the presence of errors in the evaluation of proximal map**s. We generalize existing non-asymptotic and asymptotic convergence results of the exact algorithm to our inexact setting and quantify the bias between the target and the algorithm's stationary distribution due to the errors. We show that the additional bias stays bounded for bounded errors and converges to zero for decaying errors in a strongly convex setting. We apply the inexact algorithm to sample numerically from the posterior of typical imaging inverse problems in which we can only approximate the proximal operator by an iterative scheme and validate our theoretical convergence results. △ Less

Submitted 13 May, 2024; v1 submitted 30 June, 2023; originally announced June 2023.

Comments: 29 pages, 11 figures

arXiv:2304.12141 [pdf, other]

Variational Diffusion Auto-encoder: Latent Space Extraction from Pre-trained Diffusion Models

Authors: Georgios Batzolis, Jan Stanczuk, Carola-Bibiane Schönlieb

Abstract: As a widely recognized approach to deep generative modeling, Variational Auto-Encoders (VAEs) still face challenges with the quality of generated images, often presenting noticeable blurriness. This issue stems from the unrealistic assumption that approximates the conditional data distribution, $p(\textbf{x} | \textbf{z})$, as an isotropic Gaussian. In this paper, we propose a novel solution to ad… ▽ More As a widely recognized approach to deep generative modeling, Variational Auto-Encoders (VAEs) still face challenges with the quality of generated images, often presenting noticeable blurriness. This issue stems from the unrealistic assumption that approximates the conditional data distribution, $p(\textbf{x} | \textbf{z})$, as an isotropic Gaussian. In this paper, we propose a novel solution to address these issues. We illustrate how one can extract a latent space from a pre-existing diffusion model by optimizing an encoder to maximize the marginal data log-likelihood. Furthermore, we demonstrate that a decoder can be analytically derived post encoder-training, employing the Bayes rule for scores. This leads to a VAE-esque deep latent variable model, which discards the need for Gaussian assumptions on $p(\textbf{x} | \textbf{z})$ or the training of a separate decoder network. Our method, which capitalizes on the strengths of pre-trained diffusion models and equips them with latent spaces, results in a significant enhancement to the performance of VAEs. △ Less

Submitted 18 May, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

Comments: arXiv admin note: text overlap with arXiv:2212.12611, arXiv:2207.09786

arXiv:2304.08342 [pdf, other]

NF-ULA: Langevin Monte Carlo with Normalizing Flow Prior for Imaging Inverse Problems

Authors: Ziruo Cai, Junqi Tang, Subhadip Mukherjee, **glai Li, Carola Bibiane Schönlieb, Xiaoqun Zhang

Abstract: Bayesian methods for solving inverse problems are a powerful alternative to classical methods since the Bayesian approach offers the ability to quantify the uncertainty in the solution. In recent years, data-driven techniques for solving inverse problems have also been remarkably successful, due to their superior representation ability. In this work, we incorporate data-based models into a class o… ▽ More Bayesian methods for solving inverse problems are a powerful alternative to classical methods since the Bayesian approach offers the ability to quantify the uncertainty in the solution. In recent years, data-driven techniques for solving inverse problems have also been remarkably successful, due to their superior representation ability. In this work, we incorporate data-based models into a class of Langevin-based sampling algorithms for Bayesian inference in imaging inverse problems. In particular, we introduce NF-ULA (Normalizing Flow-based Unadjusted Langevin algorithm), which involves learning a normalizing flow (NF) as the image prior. We use NF to learn the prior because a tractable closed-form expression for the log prior enables the differentiation of it using autograd libraries. Our algorithm only requires a normalizing flow-based generative network, which can be pre-trained independently of the considered inverse problem and the forward operator. We perform theoretical analysis by investigating the well-posedness and non-asymptotic convergence of the resulting NF-ULA algorithm. The efficacy of the proposed NF-ULA algorithm is demonstrated in various image restoration problems such as image deblurring, image inpainting, and limited-angle X-ray computed tomography (CT) reconstruction. NF-ULA is found to perform better than competing methods for severely ill-posed inverse problems. △ Less

Submitted 14 October, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

arXiv:2303.07271 [pdf, other]

Provably Convergent Plug-and-Play Quasi-Newton Methods

Authors: Hong Ye Tan, Subhadip Mukherjee, Junqi Tang, Carola-Bibiane Schönlieb

Abstract: Plug-and-Play (PnP) methods are a class of efficient iterative methods that aim to combine data fidelity terms and deep denoisers using classical optimization algorithms, such as ISTA or ADMM, with applications in inverse problems and imaging. Provable PnP methods are a subclass of PnP methods with convergence guarantees, such as fixed point convergence or convergence to critical points of some en… ▽ More Plug-and-Play (PnP) methods are a class of efficient iterative methods that aim to combine data fidelity terms and deep denoisers using classical optimization algorithms, such as ISTA or ADMM, with applications in inverse problems and imaging. Provable PnP methods are a subclass of PnP methods with convergence guarantees, such as fixed point convergence or convergence to critical points of some energy function. Many existing provable PnP methods impose heavy restrictions on the denoiser or fidelity function, such as non-expansiveness or strict convexity, respectively. In this work, we propose a novel algorithmic approach incorporating quasi-Newton steps into a provable PnP framework based on proximal denoisers, resulting in greatly accelerated convergence while retaining light assumptions on the denoiser. By characterizing the denoiser as the proximal operator of a weakly convex function, we show that the fixed points of the proposed quasi-Newton PnP algorithm are critical points of a weakly convex function. Numerical experiments on image deblurring and super-resolution demonstrate 2--8x faster convergence as compared to other provable PnP methods with similar reconstruction quality. △ Less

Submitted 13 November, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

MSC Class: 49M15; 49J52; 65K15

arXiv:2212.12611 [pdf, other]

Your diffusion model secretly knows the dimension of the data manifold

Authors: Jan Stanczuk, Georgios Batzolis, Teo Deveney, Carola-Bibiane Schönlieb

Abstract: In this work, we propose a novel framework for estimating the dimension of the data manifold using a trained diffusion model. A diffusion model approximates the score function i.e. the gradient of the log density of a noise-corrupted version of the target distribution for varying levels of corruption. We prove that, if the data concentrates around a manifold embedded in the high-dimensional ambien… ▽ More In this work, we propose a novel framework for estimating the dimension of the data manifold using a trained diffusion model. A diffusion model approximates the score function i.e. the gradient of the log density of a noise-corrupted version of the target distribution for varying levels of corruption. We prove that, if the data concentrates around a manifold embedded in the high-dimensional ambient space, then as the level of corruption decreases, the score function points towards the manifold, as this direction becomes the direction of maximal likelihood increase. Therefore, for small levels of corruption, the diffusion model provides us with access to an approximation of the normal bundle of the data manifold. This allows us to estimate the dimension of the tangent space, thus, the intrinsic dimension of the data manifold. To the best of our knowledge, our method is the first estimator of the data manifold dimension based on diffusion models and it outperforms well established statistical estimators in controlled experiments on both Euclidean and image data. △ Less

Submitted 25 May, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

Comments: arXiv admin note: text overlap with arXiv:2207.09786

arXiv:2209.05546 [pdf, other]

doi 10.1088/1361-6420/acb2ba

Spectral decomposition of atomic structures in heterogeneous cryo-EM

Authors: Carlos Esteve-Yagüe, Willem Diepeveen, Ozan Öktem, Carola-Bibiane Schönlieb

Abstract: We consider the problem of recovering the three-dimensional atomic structure of a flexible macromolecule from a heterogeneous cryo-EM dataset. The dataset contains noisy tomographic projections of the electrostatic potential of the macromolecule, taken from different viewing directions, and in the heterogeneous case, each image corresponds to a different conformation of the macromolecule. Under th… ▽ More We consider the problem of recovering the three-dimensional atomic structure of a flexible macromolecule from a heterogeneous cryo-EM dataset. The dataset contains noisy tomographic projections of the electrostatic potential of the macromolecule, taken from different viewing directions, and in the heterogeneous case, each image corresponds to a different conformation of the macromolecule. Under the assumption that the macromolecule can be modelled as a chain, or discrete curve (as it is for instance the case for a protein backbone with a single chain of amino-acids), we introduce a method to estimate the deformation of the atomic model with respect to a given conformation, which is assumed to be known a priori. Our method consists on estimating the torsion and bond angles of the atomic model in each conformation as a linear combination of the eigenfunctions of the Laplace operator in the manifold of conformations. These eigenfunctions can be approximated by means of a well-known technique in manifold learning, based on the construction of a graph Laplacian using the cryo-EM dataset. Finally, we test our approach with synthetic datasets, for which we recover the atomic model of two-dimensional and three-dimensional flexible structures from noisy tomographic projections. △ Less

Submitted 27 December, 2022; v1 submitted 12 September, 2022; originally announced September 2022.

Comments: 35 pages,20 figures

arXiv:2111.13606 [pdf, other]

Conditional Image Generation with Score-Based Diffusion Models

Authors: Georgios Batzolis, Jan Stanczuk, Carola-Bibiane Schönlieb, Christian Etmann

Abstract: Score-based diffusion models have emerged as one of the most promising frameworks for deep generative modelling. In this work we conduct a systematic comparison and theoretical analysis of different approaches to learning conditional probability distributions with score-based diffusion models. In particular, we prove results which provide a theoretical justification for one of the most successful… ▽ More Score-based diffusion models have emerged as one of the most promising frameworks for deep generative modelling. In this work we conduct a systematic comparison and theoretical analysis of different approaches to learning conditional probability distributions with score-based diffusion models. In particular, we prove results which provide a theoretical justification for one of the most successful estimators of the conditional score. Moreover, we introduce a multi-speed diffusion framework, which leads to a new estimator for the conditional score, performing on par with previous state-of-the-art approaches. Our theoretical and experimental findings are accompanied by an open source library MSDiff which allows for application and further research of multi-speed diffusion models. △ Less

Submitted 26 November, 2021; originally announced November 2021.

arXiv:2106.03755 [pdf, other]

HERS Superpixels: Deep Affinity Learning for Hierarchical Entropy Rate Segmentation

Authors: Hankui Peng, Angelica I. Aviles-Rivero, Carola-Bibiane Schönlieb

Abstract: Superpixels serve as a powerful preprocessing tool in numerous computer vision tasks. By using superpixel representation, the number of image primitives can be largely reduced by orders of magnitudes. With the rise of deep learning in recent years, a few works have attempted to feed deeply learned features / graphs into existing classical superpixel techniques. However, none of them are able to pr… ▽ More Superpixels serve as a powerful preprocessing tool in numerous computer vision tasks. By using superpixel representation, the number of image primitives can be largely reduced by orders of magnitudes. With the rise of deep learning in recent years, a few works have attempted to feed deeply learned features / graphs into existing classical superpixel techniques. However, none of them are able to produce superpixels in near real-time, which is crucial to the applicability of superpixels in practice. In this work, we propose a two-stage graph-based framework for superpixel segmentation. In the first stage, we introduce an efficient Deep Affinity Learning (DAL) network that learns pairwise pixel affinities by aggregating multi-scale information. In the second stage, we propose a highly efficient superpixel method called Hierarchical Entropy Rate Segmentation (HERS). Using the learned affinities from the first stage, HERS builds a hierarchical tree structure that can produce any number of highly adaptive superpixels instantaneously. We demonstrate, through visual and numerical experiments, the effectiveness and efficiency of our method compared to various state-of-the-art superpixel methods. △ Less

Submitted 18 November, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

ACM Class: I.4; I.5

arXiv:2106.02531 [pdf, other]

CAFLOW: Conditional Autoregressive Flows

Authors: Georgios Batzolis, Marcello Carioni, Christian Etmann, Soroosh Afyouni, Zoe Kourtzi, Carola Bibiane Schönlieb

Abstract: We introduce CAFLOW, a new diverse image-to-image translation model that simultaneously leverages the power of auto-regressive modeling and the modeling efficiency of conditional normalizing flows. We transform the conditioning image into a sequence of latent encodings using a multi-scale normalizing flow and repeat the process for the conditioned image. We model the conditional distribution of th… ▽ More We introduce CAFLOW, a new diverse image-to-image translation model that simultaneously leverages the power of auto-regressive modeling and the modeling efficiency of conditional normalizing flows. We transform the conditioning image into a sequence of latent encodings using a multi-scale normalizing flow and repeat the process for the conditioned image. We model the conditional distribution of the latent encodings by modeling the auto-regressive distributions with an efficient multi-scale normalizing flow, where each conditioning factor affects image synthesis at its respective resolution scale. Our proposed framework performs well on a range of image-to-image translation tasks. It outperforms former designs of conditional flows because of its expressive auto-regressive structure. △ Less

Submitted 4 June, 2021; originally announced June 2021.

arXiv:2103.01678 [pdf, other]

Wasserstein GANs Work Because They Fail (to Approximate the Wasserstein Distance)

Authors: Jan Stanczuk, Christian Etmann, Lisa Maria Kreusser, Carola-Bibiane Schönlieb

Abstract: Wasserstein GANs are based on the idea of minimising the Wasserstein distance between a real and a generated distribution. We provide an in-depth mathematical analysis of differences between the theoretical setup and the reality of training Wasserstein GANs. In this work, we gather both theoretical and empirical evidence that the WGAN loss is not a meaningful approximation of the Wasserstein dista… ▽ More Wasserstein GANs are based on the idea of minimising the Wasserstein distance between a real and a generated distribution. We provide an in-depth mathematical analysis of differences between the theoretical setup and the reality of training Wasserstein GANs. In this work, we gather both theoretical and empirical evidence that the WGAN loss is not a meaningful approximation of the Wasserstein distance. Moreover, we argue that the Wasserstein distance is not even a desirable loss function for deep generative models, and conclude that the success of Wasserstein GANs can in truth be attributed to a failure to approximate the Wasserstein distance. △ Less

Submitted 5 October, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

arXiv:2102.06496 [pdf, other]

Depthwise Separable Convolutions Allow for Fast and Memory-Efficient Spectral Normalization

Authors: Christina Runkel, Christian Etmann, Michael Möller, Carola-Bibiane Schönlieb

Abstract: An increasing number of models require the control of the spectral norm of convolutional layers of a neural network. While there is an abundance of methods for estimating and enforcing upper bounds on those during training, they are typically costly in either memory or time. In this work, we introduce a very simple method for spectral normalization of depthwise separable convolutions, which introd… ▽ More An increasing number of models require the control of the spectral norm of convolutional layers of a neural network. While there is an abundance of methods for estimating and enforcing upper bounds on those during training, they are typically costly in either memory or time. In this work, we introduce a very simple method for spectral normalization of depthwise separable convolutions, which introduces negligible computational and memory overhead. We demonstrate the effectiveness of our method on image classification tasks using standard architectures like MobileNetV2. △ Less

Submitted 12 February, 2021; originally announced February 2021.

arXiv:2010.00378 [pdf, other]

GraphXCOVID: Explainable Deep Graph Diffusion Pseudo-Labelling for Identifying COVID-19 on Chest X-rays

Authors: Angelica I Aviles-Rivero, Philip Sellars, Carola-Bibiane Schönlieb, Nicolas Papadakis

Abstract: Can one learn to diagnose COVID-19 under extreme minimal supervision? Since the outbreak of the novel COVID-19 there has been a rush for develo** Artificial Intelligence techniques for expert-level disease identification on Chest X-ray data. In particular, the use of deep supervised learning has become the go-to paradigm. However, the performance of such models is heavily dependent on the availa… ▽ More Can one learn to diagnose COVID-19 under extreme minimal supervision? Since the outbreak of the novel COVID-19 there has been a rush for develo** Artificial Intelligence techniques for expert-level disease identification on Chest X-ray data. In particular, the use of deep supervised learning has become the go-to paradigm. However, the performance of such models is heavily dependent on the availability of a large and representative labelled dataset. The creation of which is a heavily expensive and time consuming task, and especially imposes a great challenge for a novel disease. Semi-supervised learning has shown the ability to match the incredible performance of supervised models whilst requiring a small fraction of the labelled examples. This makes the semi-supervised paradigm an attractive option for identifying COVID-19. In this work, we introduce a graph based deep semi-supervised framework for classifying COVID-19 from chest X-rays. Our framework introduces an optimisation model for graph diffusion that reinforces the natural relation among the tiny labelled set and the vast unlabelled data. We then connect the diffusion prediction output as pseudo-labels that are used in an iterative scheme in a deep net. We demonstrate, through our experiments, that our model is able to outperform the current leading supervised model with a tiny fraction of the labelled examples. Finally, we provide attention maps to accommodate the radiologist's mental model, better fitting their perceptual and cognitive abilities. These visualisation aims to assist the radiologist in judging whether the diagnostic is correct or not, and in consequence to accelerate the decision. △ Less

Submitted 4 July, 2021; v1 submitted 30 September, 2020; originally announced October 2020.

arXiv:2008.06388 [pdf]

doi 10.1038/s42256-021-00307-0

Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans

Authors: Michael Roberts, Derek Driggs, Matthew Thorpe, Julian Gilbey, Michael Yeung, Stephan Ursprung, Angelica I. Aviles-Rivero, Christian Etmann, Cathal McCague, Lucian Beer, Jonathan R. Weir-McCall, Zhongzhao Teng, Effrossyni Gkrania-Klotsas, James H. F. Rudd, Evis Sala, Carola-Bibiane Schönlieb

Abstract: Machine learning methods offer great promise for fast and accurate detection and prognostication of COVID-19 from standard-of-care chest radiographs (CXR) and computed tomography (CT) images. Many articles have been published in 2020 describing new machine learning-based models for both of these tasks, but it is unclear which are of potential clinical utility. In this systematic review, we search… ▽ More Machine learning methods offer great promise for fast and accurate detection and prognostication of COVID-19 from standard-of-care chest radiographs (CXR) and computed tomography (CT) images. Many articles have been published in 2020 describing new machine learning-based models for both of these tasks, but it is unclear which are of potential clinical utility. In this systematic review, we search EMBASE via OVID, MEDLINE via PubMed, bioRxiv, medRxiv and arXiv for published papers and preprints uploaded from January 1, 2020 to October 3, 2020 which describe new machine learning models for the diagnosis or prognosis of COVID-19 from CXR or CT images. Our search identified 2,212 studies, of which 415 were included after initial screening and, after quality screening, 61 studies were included in this systematic review. Our review finds that none of the models identified are of potential clinical use due to methodological flaws and/or underlying biases. This is a major weakness, given the urgency with which validated COVID-19 models are needed. To address this, we give many recommendations which, if followed, will solve these issues and lead to higher quality model development and well documented manuscripts. △ Less

Submitted 5 January, 2021; v1 submitted 14 August, 2020; originally announced August 2020.

Comments: 35 pages, 3 figures, 2 tables, updated to the period 1 January 2020 - 3 October 2020

Journal ref: Nature Machine Intelligence 3, 199-217 (2021)

arXiv:2008.02839 [pdf, other]

Learned convex regularizers for inverse problems

Authors: Subhadip Mukherjee, Sören Dittmer, Zakhar Shumaylov, Sebastian Lunz, Ozan Öktem, Carola-Bibiane Schönlieb

Abstract: We consider the variational reconstruction framework for inverse problems and propose to learn a data-adaptive input-convex neural network (ICNN) as the regularization functional. The ICNN-based convex regularizer is trained adversarially to discern ground-truth images from unregularized reconstructions. Convexity of the regularizer is desirable since (i) one can establish analytical convergence g… ▽ More We consider the variational reconstruction framework for inverse problems and propose to learn a data-adaptive input-convex neural network (ICNN) as the regularization functional. The ICNN-based convex regularizer is trained adversarially to discern ground-truth images from unregularized reconstructions. Convexity of the regularizer is desirable since (i) one can establish analytical convergence guarantees for the corresponding variational reconstruction problem and (ii) devise efficient and provable algorithms for reconstruction. In particular, we show that the optimal solution to the variational problem converges to the ground-truth if the penalty parameter decays sub-linearly with respect to the norm of the noise. Further, we prove the existence of a sub-gradient-based algorithm that leads to a monotonically decreasing error in the parameter space with iterations. To demonstrate the performance of our approach for solving inverse problems, we consider the tasks of deblurring natural images and reconstructing images in computed tomography (CT), and show that the proposed convex regularizer is at least competitive with and sometimes superior to state-of-the-art data-driven techniques for inverse problems. △ Less

Submitted 1 March, 2021; v1 submitted 6 August, 2020; originally announced August 2020.

arXiv:2006.06624 [pdf, other]

SLIC-UAV: A Method for monitoring recovery in tropical restoration projects through identification of signature species using UAVs

Authors: Jonathan Williams, Carola-Bibiane Schönlieb, Tom Swinfield, Bambang Irawan, Eva Achmad, Muhammad Zudhi, Habibi, Elva Gemita, David A. Coomes

Abstract: Logged forests cover four million square kilometres of the tropics and restoring these forests is essential if we are to avoid the worst impacts of climate change, yet monitoring recovery is challenging. Tracking the abundance of visually identifiable, early-successional species enables successional status and thereby restoration progress to be evaluated. Here we present a new pipeline, SLIC-UAV,… ▽ More Logged forests cover four million square kilometres of the tropics and restoring these forests is essential if we are to avoid the worst impacts of climate change, yet monitoring recovery is challenging. Tracking the abundance of visually identifiable, early-successional species enables successional status and thereby restoration progress to be evaluated. Here we present a new pipeline, SLIC-UAV, for processing Unmanned Aerial Vehicle (UAV) imagery to map early-successional species in tropical forests. The pipeline is novel because it comprises: (a) a time-efficient approach for labelling crowns from UAV imagery; (b) machine learning of species based on spectral and textural features within individual tree crowns, and (c) automatic segmentation of orthomosaiced UAV imagery into 'superpixels', using Simple Linear Iterative Clustering (SLIC). Creating superpixels reduces the dataset's dimensionality and focuses prediction onto clusters of pixels, greatly improving accuracy. To demonstrate SLIC-UAV, support vector machines and random forests were used to predict the species of hand-labelled crowns in a restoration concession in Indonesia. Random forests were most accurate at discriminating species for whole crowns, with accuracy ranging from 79.3% when map** five common species, to 90.5% when map** the three most visually-distinctive species. In contrast, support vector machines proved better for labelling automatically segmented superpixels, with accuracy ranging from 74.3% to 91.7% for the same species. Models were extended to map species across 100 hectares of forest. The study demonstrates the power of SLIC-UAV for map** characteristic early-successional tree species as an indicator of successional stage within tropical forest restoration areas. Continued effort is needed to develop easy-to-implement and low-cost technology to improve the affordability of project management. △ Less

Submitted 11 June, 2020; originally announced June 2020.

arXiv:2006.03364 [pdf, other]

Structure preserving deep learning

Authors: Elena Celledoni, Matthias J. Ehrhardt, Christian Etmann, Robert I McLachlan, Brynjulf Owren, Carola-Bibiane Schönlieb, Ferdia Sherry

Abstract: Over the past few years, deep learning has risen to the foreground as a topic of massive interest, mainly as a result of successes obtained in solving large-scale image processing tasks. There are multiple challenging mathematical problems involved in applying deep learning: most deep learning methods require the solution of hard optimisation problems, and a good understanding of the tradeoff betw… ▽ More Over the past few years, deep learning has risen to the foreground as a topic of massive interest, mainly as a result of successes obtained in solving large-scale image processing tasks. There are multiple challenging mathematical problems involved in applying deep learning: most deep learning methods require the solution of hard optimisation problems, and a good understanding of the tradeoff between computational effort, amount of data and model complexity is required to successfully design a deep learning approach for a given problem. A large amount of progress made in deep learning has been based on heuristic explorations, but there is a growing effort to mathematically understand the structure in existing deep learning methods and to systematically design new deep learning methods to preserve certain types of structure in deep learning. In this article, we review a number of these directions: some deep neural networks can be understood as discretisations of dynamical systems, neural networks can be designed to have desirable properties such as invertibility or group equivariance, and new algorithmic frameworks based on conformal Hamiltonian systems and Riemannian manifolds to solve the optimisation problems have been proposed. We conclude our review of each of these topics by discussing some open problems that we consider to be interesting directions for future research. △ Less

Submitted 5 June, 2020; originally announced June 2020.

arXiv:2001.05317 [pdf, other]

CycleCluster: Modernising Clustering Regularisation for Deep Semi-Supervised Classification

Authors: Philip Sellars, Angelica Aviles-Rivero, Carola Bibiane Schönlieb

Abstract: Given the potential difficulties in obtaining large quantities of labelled data, many works have explored the use of deep semi-supervised learning, which uses both labelled and unlabelled data to train a neural network architecture. The vast majority of SSL approaches focus on implementing the low-density separation assumption or consistency assumption, the idea that decision boundaries should lie… ▽ More Given the potential difficulties in obtaining large quantities of labelled data, many works have explored the use of deep semi-supervised learning, which uses both labelled and unlabelled data to train a neural network architecture. The vast majority of SSL approaches focus on implementing the low-density separation assumption or consistency assumption, the idea that decision boundaries should lie in low density regions. However, they have implemented this assumption by making local changes to the decision boundary at each data point, ignoring the global structure of the data. In this work, we explore an alternative approach using the global information present in the clustered data to update our decision boundaries. We propose a novel framework, CycleCluster, for deep semi-supervised classification. Our core optimisation is driven by a new clustering based regularisation along with a graph based pseudo-labels and a shared deep network. Demonstrating that direct implementation of the cluster assumption is a viable alternative to the popular consistency based regularisation. We demonstrate the predictive capability of our technique through a careful set of numerical results. △ Less

Submitted 1 September, 2021; v1 submitted 15 January, 2020; originally announced January 2020.

arXiv:1909.10221 [pdf, other]

PDE-Inspired Algorithms for Semi-Supervised Learning on Point Clouds

Authors: Oliver M. Crook, Tim Hurst, Carola-Bibiane Schönlieb, Matthew Thorpe, Konstantinos C. Zygalakis

Abstract: Given a data set and a subset of labels the problem of semi-supervised learning on point clouds is to extend the labels to the entire data set. In this paper we extend the labels by minimising the constrained discrete $p$-Dirichlet energy. Under suitable conditions the discrete problem can be connected, in the large data limit, with the minimiser of a weighted continuum $p$-Dirichlet energy with t… ▽ More Given a data set and a subset of labels the problem of semi-supervised learning on point clouds is to extend the labels to the entire data set. In this paper we extend the labels by minimising the constrained discrete $p$-Dirichlet energy. Under suitable conditions the discrete problem can be connected, in the large data limit, with the minimiser of a weighted continuum $p$-Dirichlet energy with the same constraints. We take advantage of this connection by designing numerical schemes that first estimate the density of the data and then apply PDE methods, such as pseudo-spectral methods, to solve the corresponding Euler-Lagrange equation. We prove that our scheme is consistent in the large data limit for two methods of density estimation: kernel density estimation and spline kernel density estimation. △ Less

Submitted 23 September, 2019; originally announced September 2019.

arXiv:1907.10085 [pdf, other]

GraphX$^{NET}-$ Chest X-Ray Classification Under Extreme Minimal Supervision

Authors: Angelica I. Aviles-Rivero, Nicolas Papadakis, Ruoteng Li, Philip Sellars, Qingnan Fan, Robby T. Tan, Carola-Bibiane Schönlieb

Abstract: The task of classifying X-ray data is a problem of both theoretical and clinical interest. Whilst supervised deep learning methods rely upon huge amounts of labelled data, the critical problem of achieving a good classification accuracy when an extremely small amount of labelled data is available has yet to be tackled. In this work, we introduce a novel semi-supervised framework for X-ray classifi… ▽ More The task of classifying X-ray data is a problem of both theoretical and clinical interest. Whilst supervised deep learning methods rely upon huge amounts of labelled data, the critical problem of achieving a good classification accuracy when an extremely small amount of labelled data is available has yet to be tackled. In this work, we introduce a novel semi-supervised framework for X-ray classification which is based on a graph-based optimisation model. To the best of our knowledge, this is the first method that exploits graph-based semi-supervised learning for X-ray data classification. Furthermore, we introduce a new multi-class classification functional with carefully selected class priors which allows for a smooth solution that strengthens the synergy between the limited number of labels and the huge amount of unlabelled data. We demonstrate, through a set of numerical and visual experiments, that our method produces highly competitive results on the ChestX-ray14 data set whilst drastically reducing the need for annotated data. △ Less

Submitted 3 July, 2020; v1 submitted 23 July, 2019; originally announced July 2019.

Comments: MICCAI 2019

arXiv:1906.12177 [pdf, other]

Learning to segment microscopy images with lazy labels

Authors: Rihuan Ke, Aurélie Bugeau, Nicolas Papadakis, Peter Schuetz, Carola-Bibiane Schönlieb

Abstract: The need for labour intensive pixel-wise annotation is a major limitation of many fully supervised learning methods for segmenting bioimages that can contain numerous object instances with thin separations. In this paper, we introduce a deep convolutional neural network for microscopy image segmentation. Annotation issues are circumvented by letting the network being trainable on coarse labels com… ▽ More The need for labour intensive pixel-wise annotation is a major limitation of many fully supervised learning methods for segmenting bioimages that can contain numerous object instances with thin separations. In this paper, we introduce a deep convolutional neural network for microscopy image segmentation. Annotation issues are circumvented by letting the network being trainable on coarse labels combined with only a very small number of images with pixel-wise annotations. We call this new labelling strategy `lazy' labels. Image segmentation is stratified into three connected tasks: rough inner region detection, object separation and pixel-wise segmentation. These tasks are learned in an end-to-end multi-task learning framework. The method is demonstrated on two microscopy datasets, where we show that the model gives accurate segmentation results even if exact boundary labels are missing for a majority of annotated data. It brings more flexibility and efficiency for training deep neural networks that are data hungry and is applicable to biomedical images with poor contrast at the object boundaries or with diverse textures and repeated patterns. △ Less

Submitted 10 September, 2020; v1 submitted 20 June, 2019; originally announced June 2019.

arXiv:1906.08635 [pdf, other]

Energy Models for Better Pseudo-Labels: Improving Semi-Supervised Classification with the 1-Laplacian Graph Energy

Authors: Angelica I. Aviles-Rivero, Nicolas Papadakis, Ruoteng Li, Philip Sellars, Samar M Alsaleh, Robby T Tan, Carola-Bibiane Schönlieb

Abstract: Semi-supervised classification is a great focus of interest, as in real-world scenarios obtaining labels is expensive, time-consuming and might require expert knowledge. This has motivated the fast development of semi-supervised techniques, whose performance is on a par with or better than supervised approaches. A current major challenge for semi-supervised techniques is how to better handle the n… ▽ More Semi-supervised classification is a great focus of interest, as in real-world scenarios obtaining labels is expensive, time-consuming and might require expert knowledge. This has motivated the fast development of semi-supervised techniques, whose performance is on a par with or better than supervised approaches. A current major challenge for semi-supervised techniques is how to better handle the network calibration and confirmation bias problems for improving performance. In this work, we argue that energy models are an effective alternative to such problems. With this motivation in mind, we propose a hybrid framework for semi-supervised classification called CREPE model (1-Lapla$\mathbf{C}$ian g$\mathbf{R}$aph $\mathbf{E}$nergy for $\mathbf{P}$seudo-lab$\mathbf{E}$ls). Firstly, we introduce a new energy model based on the non-smooth $\ell_1$ norm of the normalised graph 1-Laplacian. Our functional enforces a sufficiently smooth solution and strengthens the intrinsic relation between the labelled and unlabelled data. Secondly, we provide a theoretical analysis for our proposed scheme and show that the solution trajectory does converge to a non-constant steady point. Thirdly, we derive the connection of our energy model for pseudo-labelling. We show that our energy model produces more meaningful pseudo-labels than the ones generated directly by a deep network. We extensively evaluate our framework, through numerical and visual experiments, using six benchmarking datasets for natural and medical images. We demonstrate that our technique reports state-of-the-art results for semi-supervised classification. △ Less

Submitted 25 October, 2021; v1 submitted 20 June, 2019; originally announced June 2019.

arXiv:1905.04172 [pdf, other]

On the Connection Between Adversarial Robustness and Saliency Map Interpretability

Authors: Christian Etmann, Sebastian Lunz, Peter Maass, Carola-Bibiane Schönlieb

Abstract: Recent studies on the adversarial vulnerability of neural networks have shown that models trained to be more robust to adversarial attacks exhibit more interpretable saliency maps than their non-robust counterparts. We aim to quantify this behavior by considering the alignment between input image and saliency map. We hypothesize that as the distance to the decision boundary grows,so does the align… ▽ More Recent studies on the adversarial vulnerability of neural networks have shown that models trained to be more robust to adversarial attacks exhibit more interpretable saliency maps than their non-robust counterparts. We aim to quantify this behavior by considering the alignment between input image and saliency map. We hypothesize that as the distance to the decision boundary grows,so does the alignment. This connection is strictly true in the case of linear models. We confirm these theoretical findings with experiments based on models trained with a local Lipschitz regularization and identify where the non-linear nature of neural networks weakens the relation. △ Less

Submitted 10 May, 2019; originally announced May 2019.

Comments: 12 pages, accepted for publication at the 36th International Conference on Machine Learning 2019

arXiv:1903.06548 [pdf, other]

doi 10.1109/TGRS.2019.2961599

Superpixel Contracted Graph-Based Learning for Hyperspectral Image Classification

Authors: Philip Sellars, Angelica Aviles-Rivero, Carola-Bibiane Schönlieb

Abstract: A central problem in hyperspectral image classification is obtaining high classification accuracy when using a limited amount of labelled data. In this paper we present a novel graph-based framework, which aims to tackle this problem in the presence of large scale data input. Our approach utilises a novel superpixel method, specifically designed for hyperspectral data, to define meaningful local r… ▽ More A central problem in hyperspectral image classification is obtaining high classification accuracy when using a limited amount of labelled data. In this paper we present a novel graph-based framework, which aims to tackle this problem in the presence of large scale data input. Our approach utilises a novel superpixel method, specifically designed for hyperspectral data, to define meaningful local regions in an image, which with high probability share the same classification label. We then extract spectral and spatial features from these regions and use these to produce a contracted weighted graph-representation, where each node represents a region rather than a pixel. Our graph is then fed into a graph-based semi-supervised classifier which gives the final classification. We show that using superpixels in a graph representation is an effective tool for speeding up graphical classifiers applied to hyperspectral images. We demonstrate through exhaustive quantitative and qualitative results that our proposed method produces accurate classifications when an incredibly small amount of labelled data is used. We show that our approach mitigates the major drawbacks of existing approaches, resulting in our approach outperforming several comparative state-of-the-art techniques. △ Less

Submitted 19 March, 2019; v1 submitted 14 March, 2019; originally announced March 2019.

Comments: 11 pages

arXiv:1901.04240 [pdf, other]

Semi-supervised Learning with Graphs: Covariance Based Superpixels For Hyperspectral Image Classification

Authors: Philip Sellars, Angelica Aviles-Rivero, Nicolas Papadakis, David Coomes, Anita Faul, Carola-Bibane Schönlieb

Abstract: In this paper, we present a graph-based semi-supervised framework for hyperspectral image classification. We first introduce a novel superpixel algorithm based on the spectral covariance matrix representation of pixels to provide a better representation of our data. We then construct a superpixel graph, based on carefully considered feature vectors, before performing classification. We demonstrate… ▽ More In this paper, we present a graph-based semi-supervised framework for hyperspectral image classification. We first introduce a novel superpixel algorithm based on the spectral covariance matrix representation of pixels to provide a better representation of our data. We then construct a superpixel graph, based on carefully considered feature vectors, before performing classification. We demonstrate, through a set of experimental results using two benchmarking datasets, that our approach outperforms three state-of-the-art classification frameworks, especially when an extremely small amount of labelled data is used. △ Less

Submitted 14 May, 2019; v1 submitted 14 January, 2019; originally announced January 2019.

Comments: Four pages with two figures

arXiv:1805.11572 [pdf, other]

Adversarial Regularizers in Inverse Problems

Authors: Sebastian Lunz, Ozan Öktem, Carola-Bibiane Schönlieb

Abstract: Inverse Problems in medical imaging and computer vision are traditionally solved using purely model-based methods. Among those variational regularization models are one of the most popular approaches. We propose a new framework for applying data-driven approaches to inverse problems, using a neural network as a regularization functional. The network learns to discriminate between the distribution… ▽ More Inverse Problems in medical imaging and computer vision are traditionally solved using purely model-based methods. Among those variational regularization models are one of the most popular approaches. We propose a new framework for applying data-driven approaches to inverse problems, using a neural network as a regularization functional. The network learns to discriminate between the distribution of ground truth images and the distribution of unregularized reconstructions. Once trained, the network is applied to the inverse problem by solving the corresponding variational problem. Unlike other data-based approaches for inverse problems, the algorithm can be applied even if only unsupervised training data is available. Experiments demonstrate the potential of the framework for denoising on the BSDS dataset and for computed tomography reconstruction on the LIDC dataset. △ Less

Submitted 11 January, 2019; v1 submitted 29 May, 2018; originally announced May 2018.

Comments: published at NeurIPS 2018

Showing 1–30 of 30 results for author: Schönlieb, C