-
On the mean-field limit for Stein variational gradient descent: stability and multilevel approximation
Authors:
Simon Weissmann,
Jakob Zech
Abstract:
In this paper we propose and analyze a novel multilevel version of Stein variational gradient descent (SVGD). SVGD is a recent particle-based variational inference method. For Bayesian inverse problems with computationally expensive likelihood evaluations, the method can become prohibitive, as it requires evolving a discrete dynamical system over many time steps, each of which requires likelihood evaluations at all particle locations. To address this, we introduce a multilevel variant that involves running several interacting particle dynamics in parallel corresponding to different approximation levels of the likelihood. By carefully tuning the number of particles at each level, we prove that a significant reduction in computational complexity can be achieved. As an application we provide a numerical experiment for a PDE-driven inverse problem, which confirms the speedup suggested by our theoretical results.
Submitted 2 February, 2024;
originally announced February 2024.
-
Metropolis-adjusted interacting particle sampling
Authors:
Björn Sprungk,
Simon Weissmann,
Jakob Zech
Abstract:
In recent years, various interacting particle samplers have been developed to sample from complex target distributions, such as those found in Bayesian inverse problems. These samplers are motivated by the mean-field limit perspective and implemented as ensembles of particles that move in the product state space according to coupled stochastic differential equations. The ensemble approximation and numerical time stepping used to simulate these systems can introduce bias and affect the invariance of the particle system with respect to the target distribution. To correct for this, we investigate the use of a Metropolization step, similar to the Metropolis-adjusted Langevin algorithm. We examine Metropolization of either the whole ensemble or smaller subsets of the ensemble, and prove basic convergence of the resulting ensemble Markov chain to the target distribution. Our numerical results demonstrate the benefits of this correction in numerical examples for popular interacting particle samplers such as ALDI, CBS, and stochastic SVGD.
Submitted 21 December, 2023;
originally announced December 2023.
-
Measure transport via polynomial density surrogates
Authors:
Josephine Westermann,
Jakob Zech
Abstract:
We discuss an algorithm to compute transport maps that couple the uniform measure on $[0,1]^d$ with a specified target distribution $π$ on $[0,1]^d$. The primary objectives are either to sample from or to compute expectations w.r.t. $π$. The method is based on leveraging a polynomial surrogate of the target density, which is obtained by a least-squares or interpolation approximation. We discuss the design and construction of suitable sparse approximation spaces, and provide a complete error and cost analysis for target densities belonging to certain smoothness classes. Further, we explore the relation between our proposed algorithm and related approaches that aim to find suitable transports via optimization over a class of parametrized transports. Finally, we discuss the efficient implementation of our algorithm and report on numerical experiments which confirm our theory.
Submitted 7 November, 2023;
originally announced November 2023.
-
Distribution learning via neural differential equations: a nonparametric statistical perspective
Authors:
Youssef Marzouk,
Zhi Ren,
Sven Wang,
Jakob Zech
Abstract:
Ordinary differential equations (ODEs), via their induced flow maps, provide a powerful framework to parameterize invertible transformations for the purpose of representing complex probability distributions. While such models have achieved enormous success in machine learning, particularly for generative modeling and density estimation, little is known about their statistical properties. This work establishes the first general nonparametric statistical convergence analysis for distribution learning via ODE models trained through likelihood maximization. We first prove a convergence theorem applicable to arbitrary velocity field classes $\mathcal{F}$ satisfying certain simple boundary constraints. This general result captures the trade-off between approximation error (`bias') and the complexity of the ODE model (`variance'). We show that the latter can be quantified via the $C^1$-metric entropy of the class $\mathcal F$. We then apply this general framework to the setting of $C^k$-smooth target densities, and establish nearly minimax-optimal convergence rates for two relevant velocity field classes $\mathcal F$: $C^k$ functions and neural networks. The latter is the practically important case of neural ODEs.
Our proof techniques require a careful synthesis of (i) analytical stability results for ODEs, (ii) classical theory for sieved M-estimators, and (iii) recent results on approximation rates and metric entropies of neural network classes. The results also provide theoretical insight on how the choice of velocity field class, and the dependence of this choice on sample size $n$ (e.g., the scaling of width, depth, and sparsity of neural network classes), impacts statistical performance.
Submitted 2 September, 2023;
originally announced September 2023.
-
Deep Operator Network Approximation Rates for Lipschitz Operators
Authors:
Christoph Schwab,
Andreas Stein,
Jakob Zech
Abstract:
We establish universality and expression rate bounds for a class of neural Deep Operator Networks (DON) emulating Lipschitz (or Hölder) continuous maps $\mathcal G:\mathcal X\to\mathcal Y$ between (subsets of) separable Hilbert spaces $\mathcal X$, $\mathcal Y$. The DON architecture considered uses linear encoders $\mathcal E$ and decoders $\mathcal D$ via (biorthogonal) Riesz bases of $\mathcal X$, $\mathcal Y$, and an approximator network of an infinite-dimensional, parametric coordinate map that is Lipschitz continuous on the sequence space $\ell^2(\mathbb N)$. Unlike previous works ([Herrmann, Schwab and Zech: Neural and Spectral operator surrogates: construction and expression rate bounds, SAM Report, 2022], [Marcati and Schwab: Exponential Convergence of Deep Operator Networks for Elliptic Partial Differential Equations, SAM Report, 2022]), which required for example $\mathcal G$ to be holomorphic, the present expression rate results require mere Lipschitz (or Hölder) continuity of $\mathcal G$. Key in the proof of the present expression rate bounds is the use of either super-expressive activations (e.g. [Yarotsky: Elementary superexpressive activations, Int. Conf. on ML, 2021], [Shen, Yang and Zhang: Neural network approximation: Three hidden layers are enough, Neural Networks, 2021], and the references there) which are inspired by the Kolmogorov superposition theorem, or of nonstandard NN architectures with standard (ReLU) activations as recently proposed in [Zhang, Shen and Yang: Neural Network Architecture Beyond Width and Depth, Adv. in Neural Inf. Proc. Sys., 2022]. We illustrate the abstract results by approximation rate bounds for emulation of a) solution operators for parametric elliptic variational inequalities, and b) Lipschitz maps of Hilbert-Schmidt operators.
Submitted 19 July, 2023;
originally announced July 2023.
-
Multilevel Domain Uncertainty Quantification in Computational Electromagnetics
Authors:
Rubén Aylwin,
Carlos Jerez-Hanckes,
Christoph Schwab,
Jakob Zech
Abstract:
We continue our study [Domain Uncertainty Quantification in Computational Electromagnetics, JUQ (2020), 8:301--341] of the numerical approximation of time-harmonic electromagnetic fields for the Maxwell lossy cavity problem for uncertain geometries. We adopt the same affine-parametric shape parametrization framework, mapping the physical domains to a nominal polygonal domain with piecewise smooth maps. The regularity of the pullback solutions on the nominal domain is characterized in piecewise Sobolev spaces. We prove error convergence rates and optimize the algorithmic steering of parameters for edge-element discretizations in the nominal domain combined with: (a) multilevel Monte Carlo sampling, and (b) multilevel, sparse-grid quadrature for computing the expectation of the solutions with respect to uncertain domain ensembles. In addition, we analyze sparse-grid interpolation to compute surrogates of the domain-to-solution mappings. All calculations are performed on the polyhedral nominal domain, which enables the use of standard simplicial finite element meshes. We provide a rigorous fully discrete error analysis and show, in all cases, that dimension-independent algebraic convergence is achieved. For the multilevel sparse-grid quadrature methods, we prove higher order convergence rates which are free from the so-called curse of dimensionality, i.e. independent of the number of parameters used to parametrize the admissible shapes. Numerical experiments confirm our theoretical results and verify the superiority of the sparse-grid methods.
Submitted 14 December, 2022;
originally announced December 2022.
-
Neural and spectral operator surrogates: unified construction and expression rate bounds
Authors:
Lukas Herrmann,
Christoph Schwab,
Jakob Zech
Abstract:
Approximation rates are analyzed for deep surrogates of maps between infinite-dimensional function spaces, arising e.g. as data-to-solution maps of linear and nonlinear partial differential equations. Specifically, we study approximation rates for Deep Neural Operator and Generalized Polynomial Chaos (gpc) Operator surrogates for nonlinear, holomorphic maps between infinite-dimensional, separable Hilbert spaces. Operator in- and outputs from function spaces are assumed to be parametrized by stable, affine representation systems. Admissible representation systems comprise orthonormal bases, Riesz bases or suitable tight frames of the spaces under consideration. Algebraic expression rate bounds are established for both deep neural and spectral operator surrogates acting in scales of separable Hilbert spaces containing domain and range of the map to be expressed, with finite Sobolev or Besov regularity. We illustrate the abstract concepts by expression rate bounds for the coefficient-to-solution map for a linear elliptic PDE on the torus.
Submitted 8 February, 2024; v1 submitted 11 July, 2022;
originally announced July 2022.
-
Multilevel Optimization for Inverse Problems
Authors:
Simon Weissmann,
Ashia Wilson,
Jakob Zech
Abstract:
Inverse problems occur in a variety of parameter identification tasks in engineering. Such problems are challenging in practice, as they require repeated evaluation of computationally expensive forward models. We introduce a unifying framework of multilevel optimization that can be applied to a wide range of optimization-based solvers. Our framework provably reduces the computational cost associated with evaluating the expensive forward maps stemming from various physical models. To demonstrate the versatility of our analysis, we discuss its implications for various methodologies including multilevel (accelerated, stochastic) gradient descent, a multilevel ensemble Kalman inversion and a multilevel Langevin sampler. We also provide numerical experiments to verify our theoretical findings.
Submitted 28 April, 2022;
originally announced April 2022.
-
De Rham compatible Deep Neural Network FEM
Authors:
Marcello Longo,
Joost A. A. Opschoor,
Nico Disch,
Christoph Schwab,
Jakob Zech
Abstract:
On general regular simplicial partitions $\mathcal{T}$ of bounded polytopal domains $Ω\subset \mathbb{R}^d$, $d\in\{2,3\}$, we construct \emph{exact neural network (NN) emulations} of all lowest order finite element spaces in the discrete de Rham complex. These include the spaces of piecewise constant functions, continuous piecewise linear (CPwL) functions, the classical ``Raviart-Thomas element'', and the ``Nédélec edge element''. For all but the CPwL case, our network architectures employ both ReLU (rectified linear unit) and BiSU (binary step unit) activations to capture discontinuities. In the important case of CPwL functions, we prove that it suffices to work with pure ReLU nets. Our construction and DNN architecture generalizes previous results in that no geometric restrictions on the regular simplicial partitions $\mathcal{T}$ of $Ω$ are required for DNN emulation. In addition, for CPwL functions our DNN construction is valid in any dimension $d\geq 2$. Our ``FE-Nets'' are required in the variationally correct, structure-preserving approximation of boundary value problems of electromagnetism in nonconvex polyhedra $Ω\subset \mathbb{R}^3$. They are thus an essential ingredient in the application of e.g., the methodology of ``physics-informed NNs'' or ``deep Ritz methods'' to electromagnetic field simulation via deep learning techniques. We indicate generalizations of our constructions to higher-order compatible spaces and other, non-compatible classes of discretizations, in particular the ``Crouzeix-Raviart'' elements and Hybridized, Higher Order (HHO) methods.
Submitted 2 June, 2023; v1 submitted 14 January, 2022;
originally announced January 2022.
-
Analyticity and sparsity in uncertainty quantification for PDEs with Gaussian random field inputs
Authors:
Dinh Dũng,
Van Kien Nguyen,
Christoph Schwab,
Jakob Zech
Abstract:
We establish sparsity and summability results for coefficient sequences of Wiener-Hermite polynomial chaos expansions of countably-parametric solutions of linear elliptic and parabolic divergence-form partial differential equations with Gaussian random field inputs.
The novel proof technique developed here is based on analytic continuation of parametric solutions into the complex domain. It differs from previous works that used bootstrap arguments and induction on the differentiation order of solution derivatives with respect to the parameters. The present holomorphy-based argument allows a unified, ``differentiation-free'' proof of sparsity (expressed in terms of $\ell^p$-summability or weighted $\ell^2$-summability) of sequences of Wiener-Hermite coefficients in polynomial chaos expansions in various scales of function spaces. The analysis also implies corresponding analyticity and sparsity results for posterior densities in Bayesian inverse problems subject to Gaussian priors on uncertain inputs from function spaces.
Our results furthermore yield dimension-independent convergence rates of various \emph{constructive} high-dimensional deterministic numerical approximation schemes such as single-level and multi-level versions of Hermite-Smolyak anisotropic sparse-grid interpolation and quadrature in both forward and inverse computational uncertainty quantification.
Submitted 16 June, 2023; v1 submitted 5 January, 2022;
originally announced January 2022.
-
Deep Learning in High Dimension: Neural Network Approximation of Analytic Functions in $L^2(\mathbb{R}^d,γ_d)$
Authors:
Christoph Schwab,
Jakob Zech
Abstract:
For artificial deep neural networks, we prove expression rates for analytic functions $f:\mathbb{R}^d\to\mathbb{R}$ in the norm of $L^2(\mathbb{R}^d,γ_d)$ where $d\in {\mathbb{N}}\cup\{ \infty \}$. Here $γ_d$ denotes the Gaussian product probability measure on $\mathbb{R}^d$. We consider in particular ReLU and ReLU${}^k$ activations for integer $k\geq 2$. For $d\in\mathbb{N}$, we show exponential convergence rates in $L^2(\mathbb{R}^d,γ_d)$. In case $d=\infty$, under suitable smoothness and sparsity assumptions on $f:\mathbb{R}^{\mathbb{N}}\to\mathbb{R}$, with $γ_\infty$ denoting an infinite (Gaussian) product measure on $\mathbb{R}^{\mathbb{N}}$, we prove dimension-independent expression rate bounds in the norm of $L^2(\mathbb{R}^{\mathbb{N}},γ_\infty)$. The rates only depend on quantified holomorphy of (an analytic continuation of) the map $f$ to a product of strips in $\mathbb{C}^d$. As an application, we prove expression rate bounds of deep ReLU-NNs for response surfaces of elliptic PDEs with log-Gaussian random field inputs.
Submitted 13 November, 2021;
originally announced November 2021.
-
Sparse approximation of triangular transports. Part II: the infinite dimensional case
Authors:
Jakob Zech,
Youssef Marzouk
Abstract:
For two probability measures $ρ$ and $π$ on $[-1,1]^{\mathbb{N}}$ we investigate the approximation of the triangular Knothe-Rosenblatt transport $T:[-1,1]^{\mathbb{N}}\to [-1,1]^{\mathbb{N}}$ that pushes forward $ρ$ to $π$. Under suitable assumptions, we show that $T$ can be approximated by rational functions without suffering from the curse of dimension. Our results are applicable to posterior measures arising in certain inference problems where the unknown belongs to an (infinite dimensional) Banach space. In particular, we show that it is possible to efficiently approximately sample from certain high-dimensional measures by transforming a lower-dimensional latent variable.
Submitted 28 July, 2021;
originally announced July 2021.
-
Sparse approximation of triangular transports. Part I: the finite dimensional case
Authors:
Jakob Zech,
Youssef Marzouk
Abstract:
For two probability measures $ρ$ and $π$ with analytic densities on the $d$-dimensional cube $[-1,1]^d$, we investigate the approximation of the unique triangular monotone Knothe-Rosenblatt transport $T:[-1,1]^d\to [-1,1]^d$, such that the pushforward $T_\sharpρ$ equals $π$. It is shown that for $d\in\mathbb{N}$ there exist approximations $\tilde T$ of $T$, based on either sparse polynomial expansions or deep ReLU neural networks, such that the distance between $\tilde T_\sharpρ$ and $π$ decreases exponentially. More precisely, we prove error bounds of the type $\exp(-βN^{1/d})$ (or $\exp(-βN^{1/(d+1)})$ for neural networks), where $N$ refers to the dimension of the ansatz space (or the size of the network) containing $\tilde T$; the notion of distance comprises the Hellinger distance, the total variation distance, the Wasserstein distance and the Kullback-Leibler divergence. Our construction guarantees $\tilde T$ to be a monotone triangular bijective transport on the hypercube $[-1,1]^d$. Analogous results hold for the inverse transport $S=T^{-1}$. The proofs are constructive, and we give an explicit a priori description of the ansatz space, which can be used for numerical implementations.
Submitted 28 July, 2021; v1 submitted 12 June, 2020;
originally announced June 2020.
-
Individual predictions matter: Assessing the effect of data ordering in training fine-tuned CNNs for medical imaging
Authors:
John R. Zech,
Jessica Zosa Forde,
Michael L. Littman
Abstract:
We reproduced the results of CheXNet with fixed hyperparameters and 50 different random seeds to identify 14 findings in chest radiographs (x-rays). Because CheXNet fine-tunes a pre-trained DenseNet, the random seed affects the ordering of the batches of training data but not the initialized model weights. We found substantial variability in predictions for the same radiograph across model runs (mean ln[(maximum probability)/(minimum probability)] 2.45, coefficient of variation 0.543). This individual radiograph-level variability was not fully reflected in the variability of AUC on a large test set. Averaging predictions from 10 models reduced variability by nearly 70% (mean coefficient of variation from 0.543 to 0.169, t-test 15.96, p-value < 0.0001). We encourage researchers to be aware of the potential variability of CNNs and ensemble predictions from multiple models to minimize the effect this variability may have on the care of individual patients when these models are deployed clinically.
Submitted 7 December, 2019;
originally announced December 2019.
-
Deep Learning Predicts Hip Fracture using Confounding Patient and Healthcare Variables
Authors:
Marcus A. Badgeley,
John R. Zech,
Luke Oakden-Rayner,
Benjamin S. Glicksberg,
Manway Liu,
William Gale,
Michael V. McConnell,
Beth Percha,
Thomas M. Snyder,
Joel T. Dudley
Abstract:
Hip fractures are a leading cause of death and disability among older adults. Hip fractures are also the most commonly missed diagnosis on pelvic radiographs. Computer-Aided Diagnosis (CAD) algorithms have shown promise for helping radiologists detect fractures, but the image features underpinning their predictions are notoriously difficult to understand. In this study, we trained deep learning models on 17,587 radiographs to classify fracture, five patient traits, and 14 hospital process variables. All 20 variables could be predicted from a radiograph (p < 0.05), with the best performances on scanner model (AUC=1.00), scanner brand (AUC=0.98), and whether the order was marked "priority" (AUC=0.79). Fracture was predicted moderately well from the image (AUC=0.78) and better when combining image features with patient data (AUC=0.86, p=2e-9) or patient data plus hospital process features (AUC=0.91, p=1e-21). The model performance on a test set with matched patient variables was significantly lower than a random test set (AUC=0.67, p=0.003); and when the test set was matched on patient and image acquisition variables, the model performed randomly (AUC=0.52, 95% CI 0.46-0.58), indicating that these variables were the main source of the model's predictive ability overall. We also used Naive Bayes to combine evidence from image models with patient and hospital data and found their inclusion improved performance, but that this approach was nevertheless inferior to directly modeling all variables. If CAD algorithms are inexplicably leveraging patient and process variables in their predictions, it is unclear how radiologists should interpret their predictions in the context of other known patient data. Further research is needed to illuminate deep learning decision processes so that computers and clinicians can effectively cooperate.
Submitted 8 November, 2018;
originally announced November 2018.
-
Confounding variables can degrade generalization performance of radiological deep learning models
Authors:
John R. Zech,
Marcus A. Badgeley,
Manway Liu,
Anthony B. Costa,
Joseph J. Titano,
Eric K. Oermann
Abstract:
Early results in using convolutional neural networks (CNNs) on x-rays to diagnose disease have been promising, but it has not yet been shown that models trained on x-rays from one hospital or one group of hospitals will work equally well at different hospitals. Before these tools are used for computer-aided diagnosis in real-world clinical settings, we must verify their ability to generalize across a variety of hospital systems. A cross-sectional design was used to train and evaluate pneumonia screening CNNs on 158,323 chest x-rays from NIH (n=112,120 from 30,805 patients), Mount Sinai (42,396 from 12,904 patients), and Indiana (n=3,807 from 3,683 patients). In 3 / 5 natural comparisons, performance on chest x-rays from outside hospitals was significantly lower than on held-out x-rays from the original hospital systems. CNNs were able to detect where an x-ray was acquired (hospital system, hospital department) with extremely high accuracy and calibrate predictions accordingly. The performance of CNNs in diagnosing diseases on x-rays may reflect not only their ability to identify disease-specific imaging findings on x-rays, but also their ability to exploit confounding information. Estimates of CNN performance based on test data from hospital systems used for model training may overstate their likely real-world performance.
Submitted 12 July, 2018; v1 submitted 1 July, 2018;
originally announced July 2018.
-
A Posteriori Error Estimation of hp-dG Finite Element Methods for Highly Indefinite Helmholtz Problems (extended version)
Authors:
Stefan Sauter,
Jakob Zech
Abstract:
In this paper, we will consider an $hp$-finite element discretization of a highly indefinite Helmholtz problem by some dG formulation which is based on the ultra-weak variational formulation by Cessenat and Després. We will introduce an a posteriori error estimator and derive reliability and efficiency estimates which are explicit with respect to the wavenumber and the discretization parameters $h$ and $p$. In contrast to the conventional conforming finite element method for indefinite problems, the dG formulation is unconditionally stable and the adaptive discretization process may start from a very coarse initial mesh. Numerical experiments will illustrate the efficiency and robustness of the method.
Submitted 14 March, 2015; v1 submitted 5 July, 2014;
originally announced July 2014.
-
Inferring Team Strengths Using a Discrete Markov Random Field
Authors:
John Zech,
Frank Wood
Abstract:
We propose an original model for inferring team strengths using a Markov Random Field, which can be used to generate historical estimates of the offensive and defensive strengths of a team over time. This model was designed to be applied to sports such as soccer or hockey, in which contest outcomes take values in a limited discrete space. We perform inference using a combination of Expectation Maximization and Loopy Belief Propagation. The challenges of working with a non-convex optimization problem and a high-dimensional parameter space are discussed. The performance of the model is demonstrated on professional soccer data from the English Premier League.
Submitted 8 May, 2013;
originally announced May 2013.