
On the mean-field limit for Stein variational gradient descent: stability and multilevel approximation

Abstract: In this paper we propose and analyze a novel multilevel version of Stein variational gradient descent (SVGD). SVGD is a recent particle-based variational inference method. For Bayesian inverse problems with computationally expensive likelihood evaluations, the method can become prohibitive, as it requires evolving a discrete dynamical system over many time steps, each of which requires likelihood evaluations at all particle locations. To address this, we introduce a multilevel variant that involves running several interacting particle dynamics in parallel corresponding to different approximation levels of the likelihood. By carefully tuning the number of particles at each level, we prove that a significant reduction in computational complexity can be achieved. As an application we provide a numerical experiment for a PDE-driven inverse problem, which confirms the speedup suggested by our theoretical results.

Submitted 2 February, 2024; originally announced February 2024.
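For orientation, the standard single-level SVGD update that the abstract above builds on can be sketched as follows. This is a minimal sketch under stated assumptions, not the multilevel method of the paper: one-dimensional particles, an RBF kernel with a fixed bandwidth `h`, a standard normal target, and hypothetical helper names `svgd_step` and `run_svgd`. The score `grad_log_p` is evaluated at every particle in every step, which is where an expensive likelihood would enter.

```python
import math


def svgd_step(xs, grad_log_p, h, eps):
    """One SVGD update for 1-D particles with RBF kernel k(x, y) = exp(-(x - y)**2 / h)."""
    n = len(xs)
    # one score evaluation per particle per step: with an expensive
    # likelihood, this is the cost that would typically dominate
    scores = [grad_log_p(x) for x in xs]
    new_xs = []
    for xi in xs:
        phi = 0.0
        for xj, sj in zip(xs, scores):
            k = math.exp(-((xj - xi) ** 2) / h)
            # driving term: kernel-weighted score moves particles toward high density
            phi += k * sj
            # repulsive term: gradient of k(xj, xi) w.r.t. xj keeps particles apart
            phi += -2.0 * (xj - xi) / h * k
        new_xs.append(xi + eps * phi / n)
    return new_xs


def run_svgd(xs, grad_log_p, h=1.0, eps=0.05, steps=1500):
    """Hypothetical driver: iterate the explicit SVGD update a fixed number of times."""
    for _ in range(steps):
        xs = svgd_step(xs, grad_log_p, h, eps)
    return xs


# assumed target: standard normal, whose score is grad log p(x) = -x
init = [2.0 + 2.0 * i / 29 for i in range(30)]  # 30 particles clustered in [2, 4]
particles = run_svgd(init, lambda x: -x)
mean = sum(particles) / len(particles)
var = sum((x - mean) ** 2 for x in particles) / len(particles)
print(f"mean={mean:.3f}, var={var:.3f}")
```

Each step costs one score evaluation per particle plus O(n²) kernel interactions; with a PDE-based likelihood, the score evaluations would typically dominate.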

arXiv:2312.13889

Metropolis-adjusted interacting particle sampling

Authors: Björn Sprungk, Simon Weissmann, Jakob Zech

Abstract: In recent years, various interacting particle samplers have been developed to sample from complex target distributions, such as those found in Bayesian inverse problems. These samplers are motivated by the mean-field limit perspective and implemented as ensembles of particles that move in the product state space according to coupled stochastic differential equations. The ensemble approximation and numerical time stepping used to simulate these systems can introduce bias and affect the invariance of the particle system with respect to the target distribution. To correct for this, we investigate the use of a Metropolization step, similar to the Metropolis-adjusted Langevin algorithm. We examine Metropolization of either the whole ensemble or smaller subsets of the ensemble, and prove basic convergence of the resulting ensemble Markov chain to the target distribution. Our numerical results demonstrate the benefits of this correction in numerical examples for popular interacting particle samplers such as ALDI, CBS, and stochastic SVGD.

Submitted 21 December, 2023; originally announced December 2023.
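The Metropolization in the abstract above is compared to the Metropolis-adjusted Langevin algorithm (MALA). A minimal single-chain MALA sketch for a one-dimensional target follows; it assumes a standard normal target, a step size `tau` and a hypothetical function name, and it does not implement the paper's ensemble or subset-wise Metropolization of interacting particle samplers.

```python
import math
import random


def mala_chain(log_pi, grad_log_pi, x0, tau, n_steps, seed=0):
    """Metropolis-adjusted Langevin algorithm for a 1-D target (single chain)."""
    rng = random.Random(seed)

    def log_q(to, frm):
        # log proposal density (up to a constant) of moving from `frm` to `to`
        mu = frm + tau * grad_log_pi(frm)
        return -((to - mu) ** 2) / (4.0 * tau)

    x = x0
    samples, accepted = [], 0
    for _ in range(n_steps):
        # Euler-Maruyama step of the Langevin SDE, used as the proposal
        y = x + tau * grad_log_pi(x) + math.sqrt(2.0 * tau) * rng.gauss(0.0, 1.0)
        # Metropolis-Hastings correction removes the time-discretization bias
        log_alpha = log_pi(y) + log_q(x, y) - log_pi(x) - log_q(y, x)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = y
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


# assumed target: standard normal
samples, acc_rate = mala_chain(lambda x: -0.5 * x * x, lambda x: -x, x0=3.0, tau=0.5, n_steps=20000)
m = sum(samples) / len(samples)
v = sum((s - m) ** 2 for s in samples) / len(samples)
```

Without the accept/reject step, the chain would be the unadjusted Euler-Maruyama discretization, whose invariant distribution is biased by the step size.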

arXiv:2311.04172

Measure transport via polynomial density surrogates

Authors: Josephine Westermann, Jakob Zech

Abstract: We discuss an algorithm to compute transport maps that couple the uniform measure on $[0,1]^d$ with a specified target distribution $π$ on $[0,1]^d$. The primary objectives are either to sample from or to compute expectations w.r.t. $π$. The method is based on leveraging a polynomial surrogate of the target density, which is obtained by a least-squares or interpolation approximation. We discuss the design and construction of suitable sparse approximation spaces, and provide a complete error and cost analysis for target densities belonging to certain smoothness classes. Further, we explore the relation between our proposed algorithm and related approaches that aim to find suitable transports via optimization over a class of parametrized transports. Finally, we discuss the efficient implementation of our algorithm and report on numerical experiments which confirm our theory.

Submitted 7 November, 2023; originally announced November 2023.

Comments: 51 pages

MSC Class: 65C10; 62F15; 65C05; 65D40; 41A10; 41A25; 41A63
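In one dimension, the monotone map pushing the uniform measure on $[0,1]$ forward to a target with density $π$ is the inverse CDF $T = F^{-1}$; in $d$ dimensions a triangular (Knothe-Rosenblatt) map applies the same idea to successive conditional CDFs. The sketch below assumes the polynomial surrogate is already given (here the hypothetical density $π(x) \propto 1 + x$) and inverts its exactly integrated CDF by bisection; it does not reproduce the paper's sparse approximation spaces, least-squares fitting or cost analysis.

```python
def polynomial_cdf(coeffs):
    """Normalized CDF on [0, 1] of the density proportional to sum_k coeffs[k] * x**k.

    Assumes the polynomial is nonnegative on [0, 1]."""
    # coefficients of the antiderivative P with P(0) = 0
    anti = [c / (k + 1) for k, c in enumerate(coeffs)]

    def P(x):
        return sum(a * x ** (k + 1) for k, a in enumerate(anti))

    Z = P(1.0)  # normalizing constant
    return lambda x: P(x) / Z


def transport_map(coeffs, tol=1e-12):
    """Monotone map T: [0, 1] -> [0, 1] pushing U(0, 1) to the polynomial density (T = F^{-1})."""
    F = polynomial_cdf(coeffs)

    def T(u):
        lo, hi = 0.0, 1.0
        while hi - lo > tol:  # bisection works because F is nondecreasing
            mid = 0.5 * (lo + hi)
            if F(mid) < u:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    return T


# assumed surrogate density: pi(x) proportional to 1 + x on [0, 1]
T = transport_map([1.0, 1.0])
t_half = T(0.5)  # pushes the uniform median to the target median
```

For this density the map has the closed form $T(u) = -1 + \sqrt{1 + 3u}$, which the bisection reproduces to the requested tolerance.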

arXiv:2309.01043

Distribution learning via neural differential equations: a nonparametric statistical perspective

Authors: Youssef Marzouk, Zhi Ren, Sven Wang, Jakob Zech

Abstract: Ordinary differential equations (ODEs), via their induced flow maps, provide a powerful framework to parameterize invertible transformations for the purpose of representing complex probability distributions. While such models have achieved enormous success in machine learning, particularly for generative modeling and density estimation, little is known about their statistical properties. This work establishes the first general nonparametric statistical convergence analysis for distribution learning via ODE models trained through likelihood maximization. We first prove a convergence theorem applicable to arbitrary velocity field classes $\mathcal{F}$ satisfying certain simple boundary constraints. This general result captures the trade-off between approximation error (`bias') and the complexity of the ODE model (`variance'). We show that the latter can be quantified via the $C^1$-metric entropy of the class $\mathcal F$. We then apply this general framework to the setting of $C^k$-smooth target densities, and establish nearly minimax-optimal convergence rates for two relevant velocity field classes $\mathcal F$: $C^k$ functions and neural networks. The latter is the practically important case of neural ODEs. Our proof techniques require a careful synthesis of (i) analytical stability results for ODEs, (ii) classical theory for sieved M-estimators, and (iii) recent results on approximation rates and metric entropies of neural network classes.
The results also provide theoretical insight into how the choice of velocity field class, and the dependence of this choice on sample size $n$ (e.g., the scaling of width, depth, and sparsity of neural network classes), impacts statistical performance.

Submitted 2 September, 2023; originally announced September 2023.
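Likelihood training of ODE flows, as in the abstract above, rests on the instantaneous change of variables $\frac{d}{dt}\log p_t(x(t)) = -\operatorname{div} f(x(t))$. The sketch below assumes a standard normal base distribution, a unit time horizon and a one-parameter linear velocity field $f(x) = a x$ standing in for the classes $\mathcal F$ studied in the paper ($C^k$ functions, neural networks); the function names are hypothetical, and none of the statistical analysis is reproduced.

```python
import math


def flow_log_likelihood(x, f, div_f, n_steps=20):
    """Log-density at x of the pushforward of N(0, 1) under the time-1 flow of dx/dt = f(x).

    Integrates backward from t = 1 to t = 0 with RK4 and accumulates the divergence."""
    h = -1.0 / n_steps  # negative step: backward in time
    y, acc = x, 0.0
    for _ in range(n_steps):
        k1 = f(y)
        k2 = f(y + 0.5 * h * k1)
        k3 = f(y + 0.5 * h * k2)
        k4 = f(y + h * k3)
        # divergence integrated with the same RK4 stages (before y is updated)
        acc += h * (div_f(y) + 2 * div_f(y + 0.5 * h * k1)
                    + 2 * div_f(y + 0.5 * h * k2) + div_f(y + h * k3)) / 6
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    # y approximates the base point x(0); acc approximates -integral_0^1 div f dt
    return -0.5 * math.log(2 * math.pi) - 0.5 * y * y + acc


def fit_linear_flow(data, grid):
    """Maximum likelihood over a grid for the hypothetical field f(x) = a * x."""
    def total_ll(a):
        return sum(flow_log_likelihood(x, lambda z: a * z, lambda z: a) for x in data)
    return max(grid, key=total_ll)


data = [-2.0, 2.0]  # toy data with empirical second moment 4
a_hat = fit_linear_flow(data, [i / 1000 for i in range(1501)])
```

For this field $X(1) = e^{a} Z$, so the maximum likelihood estimate solves $e^{2a} = \frac{1}{n}\sum_i x_i^2$; with the toy data that gives $a = \log 2$, and the grid search lands within one grid step of it.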

arXiv:2307.09835

Deep Operator Network Approximation Rates for Lipschitz Operators

Authors: Christoph Schwab, Andreas Stein, Jakob Zech

Abstract: We establish universality and expression rate bounds for a class of neural Deep Operator Networks (DON) emulating Lipschitz (or Hölder) continuous maps $\mathcal G:\mathcal X\to\mathcal Y$ between (subsets of) separable Hilbert spaces $\mathcal X$, $\mathcal Y$. The DON architecture considered uses linear encoders $\mathcal E$ and decoders $\mathcal D$ via (biorthogonal) Riesz bases of $\mathcal X$, $\mathcal Y$, and an approximator network of an infinite-dimensional, parametric coordinate map that is Lipschitz continuous on the sequence space $\ell^2(\mathbb N)$. Unlike previous works ([Herrmann, Schwab and Zech: Neural and Spectral operator surrogates: construction and expression rate bounds, SAM Report, 2022], [Marcati and Schwab: Exponential Convergence of Deep Operator Networks for Elliptic Partial Differential Equations, SAM Report, 2022]), which required, for example, $\mathcal G$ to be holomorphic, the present expression rate results require only Lipschitz (or Hölder) continuity of $\mathcal G$. Key in the proof of the present expression rate bounds is the use of either super-expressive activations (e.g. [Yarotsky: Elementary superexpressive activations, Int. Conf. on ML, 2021], [Shen, Yang and Zhang: Neural network approximation: Three hidden layers are enough, Neural Networks, 2021], and the references there), which are inspired by the Kolmogorov superposition theorem, or of nonstandard NN architectures with standard (ReLU) activations as recently proposed in [Zhang, Shen and Yang: Neural Network Architecture Beyond Width and Depth, Adv. in Neural Inf. Proc. Sys., 2022]. We illustrate the abstract results by approximation rate bounds for emulation of a) solution operators for parametric elliptic variational inequalities, and b) Lipschitz maps of Hilbert-Schmidt operators.

Submitted 19 July, 2023; originally announced July 2023.

Comments: 31 pages

MSC Class: 41A65; 68T15; 68Q32

arXiv:2212.07240

Multilevel Domain Uncertainty Quantification in Computational Electromagnetics

Authors: Rubén Aylwin, Carlos Jerez-Hanckes, Christoph Schwab, Jakob Zech

Abstract: We continue our study [Domain Uncertainty Quantification in Computational Electromagnetics, JUQ (2020), 8:301--341] of the numerical approximation of time-harmonic electromagnetic fields for the Maxwell lossy cavity problem for uncertain geometries. We adopt the same affine-parametric shape parametrization framework, mapping the physical domains to a nominal polygonal domain with piecewise smooth maps. The regularity of the pullback solutions on the nominal domain is characterized in piecewise Sobolev spaces. We prove error convergence rates and optimize the algorithmic steering of parameters for edge-element discretizations in the nominal domain combined with: (a) multilevel Monte Carlo sampling, and (b) multilevel, sparse-grid quadrature for computing the expectation of the solutions with respect to uncertain domain ensembles. In addition, we analyze sparse-grid interpolation to compute surrogates of the domain-to-solution mappings. All calculations are performed on the polyhedral nominal domain, which enables the use of standard simplicial finite element meshes. We provide a rigorous fully discrete error analysis and show, in all cases, that dimension-independent algebraic convergence is achieved. For the multilevel sparse-grid quadrature methods, we prove higher order convergence rates which are free from the so-called curse of dimensionality, i.e. independent of the number of parameters used to parametrize the admissible shapes. Numerical experiments confirm our theoretical results and verify the superiority of the sparse-grid methods.

Submitted 14 December, 2022; originally announced December 2022.
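The multilevel Monte Carlo sampling mentioned in the abstract above rests on the telescoping identity $\mathbb E[P_L] = \mathbb E[P_0] + \sum_{l=1}^{L}\mathbb E[P_l - P_{l-1}]$, with many cheap samples on coarse levels and few on fine ones. The generic sketch below assumes hypothetical toy level models (Taylor truncations of $e^{\omega}$, $\omega \sim U(0,1)$) in place of edge-element discretizations on refined meshes, and a hand-picked sample allocation rather than the optimized one derived in the paper.

```python
import math
import random


def mlmc_estimate(sampler, n_per_level, seed=0):
    """Multilevel Monte Carlo: E[P_L] = E[P_0] + sum_{l=1}^{L} E[P_l - P_{l-1}].

    `sampler(l, omega)` returns P_l(omega); the same omega is used for P_l and P_{l-1}
    so that the level corrections have small variance."""
    rng = random.Random(seed)
    total = 0.0
    for level, n in enumerate(n_per_level):
        acc = 0.0
        for _ in range(n):
            omega = rng.random()
            fine = sampler(level, omega)
            coarse = sampler(level - 1, omega) if level > 0 else 0.0
            acc += fine - coarse
        total += acc / n  # sample mean of this level's correction
    return total


def taylor_exp(level, omega):
    """Hypothetical level-l model: Taylor polynomial of exp(omega) up to degree level + 1."""
    return sum(omega ** k / math.factorial(k) for k in range(level + 2))


# many cheap coarse samples, few expensive fine ones
estimate = mlmc_estimate(taylor_exp, [40000, 10000, 4000, 1000, 400, 100])
# target: E[exp(U)] = e - 1 for U ~ Uniform(0, 1), up to the degree-6 truncation bias
```

Coupling $P_l$ and $P_{l-1}$ through the same $\omega$ is what keeps the variance of the corrections small, so the fine levels need only a few samples.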

arXiv:2207.04950

Neural and spectral operator surrogates: unified construction and expression rate bounds

Authors: Lukas Herrmann, Christoph Schwab, Jakob Zech

Abstract: Approximation rates are analyzed for deep surrogates of maps between infinite-dimensional function spaces, arising e.g. as data-to-solution maps of linear and nonlinear partial differential equations. Specifically, we study approximation rates for Deep Neural Operator and Generalized Polynomial Chaos (gpc) Operator surrogates for nonlinear, holomorphic maps between infinite-dimensional, separable Hilbert spaces. Operator in- and outputs from function spaces are assumed to be parametrized by stable, affine representation systems. Admissible representation systems comprise orthonormal bases, Riesz bases or suitable tight frames of the spaces under consideration. Algebraic expression rate bounds are established for both deep neural and spectral operator surrogates acting in scales of separable Hilbert spaces containing domain and range of the map to be expressed, with finite Sobolev or Besov regularity. We illustrate the abstract concepts by expression rate bounds for the coefficient-to-solution map for a linear elliptic PDE on the torus.

Submitted 8 February, 2024; v1 submitted 11 July, 2022; originally announced July 2022.

arXiv:2204.13732

Multilevel Optimization for Inverse Problems

Authors: Simon Weissmann, Ashia Wilson, Jakob Zech

Abstract: Inverse problems occur in a variety of parameter identification tasks in engineering. Such problems are challenging in practice, as they require repeated evaluation of computationally expensive forward models. We introduce a unifying framework of multilevel optimization that can be applied to a wide range of optimization-based solvers. Our framework provably reduces the computational cost associated with evaluating the expensive forward maps stemming from various physical models. To demonstrate the versatility of our analysis, we discuss its implications for various methodologies including multilevel (accelerated, stochastic) gradient descent, a multilevel ensemble Kalman inversion and a multilevel Langevin sampler. We also provide numerical experiments to verify our theoretical findings.

Submitted 28 April, 2022; originally announced April 2022.

MSC Class: 65N21; 65N75; 65K10

arXiv:2201.05395

doi 10.1016/j.neunet.2023.06.008

De Rham compatible Deep Neural Network FEM

Authors: Marcello Longo, Joost A. A. Opschoor, Nico Disch, Christoph Schwab, Jakob Zech

Abstract: On general regular simplicial partitions $\mathcal{T}$ of bounded polytopal domains $Ω\subset \mathbb{R}^d$, $d\in\{2,3\}$, we construct \emph{exact neural network (NN) emulations} of all lowest order finite element spaces in the discrete de Rham complex. These include the spaces of piecewise constant functions, continuous piecewise linear (CPwL) functions, the classical ``Raviart-Thomas element'', and the ``Nédélec edge element''. For all but the CPwL case, our network architectures employ both ReLU (rectified linear unit) and BiSU (binary step unit) activations to capture discontinuities. In the important case of CPwL functions, we prove that it suffices to work with pure ReLU nets. Our construction and DNN architecture generalizes previous results in that no geometric restrictions on the regular simplicial partitions $\mathcal{T}$ of $Ω$ are required for DNN emulation. In addition, for CPwL functions our DNN construction is valid in any dimension $d\geq 2$. Our ``FE-Nets'' are required in the variationally correct, structure-preserving approximation of boundary value problems of electromagnetism in nonconvex polyhedra $Ω\subset \mathbb{R}^3$. They are thus an essential ingredient in the application of, e.g., the methodology of ``physics-informed NNs'' or ``deep Ritz methods'' to electromagnetic field simulation via deep learning techniques.
We indicate generalizations of our constructions to higher-order compatible spaces and other, non-compatible classes of discretizations, in particular the ``Crouzeix-Raviart'' elements and Hybrid High-Order (HHO) methods.

Submitted 2 June, 2023; v1 submitted 14 January, 2022; originally announced January 2022.

MSC Class: 41A05; 68Q32; 26B40; 65N30
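The paper's constructions work on simplicial partitions in $d \in \{2,3\}$; the one-dimensional analogue below only illustrates the two mechanisms named in the abstract above: a continuous piecewise linear function written exactly as a sum of ReLU units, and a piecewise constant function whose jumps are produced by binary step units. The function names and the ghost-node treatment at the boundary are illustrative assumptions, not the FE-Nets of the paper.

```python
def relu(x):
    return max(x, 0.0)


def bisu(x):
    """Binary step unit (Heaviside), here with the convention bisu(0) = 1."""
    return 1.0 if x >= 0.0 else 0.0


def hat_relu(x, left, mid, right):
    """Exact one-hidden-layer ReLU form of the 1-D hat function on nodes left < mid < right."""
    h1, h2 = mid - left, right - mid
    return relu(x - left) / h1 - (1.0 / h1 + 1.0 / h2) * relu(x - mid) + relu(x - right) / h2


def cpwl_relu(x, nodes, values):
    """Continuous piecewise linear interpolant on sorted nodes, as a sum of ReLU hat functions.

    Boundary hats use ghost nodes, so the function is reproduced on [nodes[0], nodes[-1]]."""
    ext = [2 * nodes[0] - nodes[1]] + list(nodes) + [2 * nodes[-1] - nodes[-2]]
    return sum(v * hat_relu(x, ext[i], ext[i + 1], ext[i + 2]) for i, v in enumerate(values))


def piecewise_constant_bisu(x, nodes, values):
    """Piecewise constant function with value values[i] on [nodes[i], nodes[i+1]), from step units."""
    return sum(v * (bisu(x - nodes[i]) - bisu(x - nodes[i + 1])) for i, v in enumerate(values))
```

The ReLU form of the hat function needs no step units because it is continuous; the jumps of the piecewise constant function are exactly where the step units are required.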

arXiv:2201.01912

Analyticity and sparsity in uncertainty quantification for PDEs with Gaussian random field inputs

Authors: Dinh Dũng, Van Kien Nguyen, Christoph Schwab, Jakob Zech

Abstract: We establish sparsity and summability results for coefficient sequences of Wiener-Hermite polynomial chaos expansions of countably-parametric solutions of linear elliptic and parabolic divergence-form partial differential equations with Gaussian random field inputs. The novel proof technique developed here is based on analytic continuation of parametric solutions into the complex domain. It differs from previous works that used bootstrap arguments and induction on the differentiation order of solution derivatives with respect to the parameters. The present holomorphy-based argument allows a unified, ``differentiation-free'' proof of sparsity (expressed in terms of $\ell^p$-summability or weighted $\ell^2$-summability) of sequences of Wiener-Hermite coefficients in polynomial chaos expansions in various scales of function spaces. The analysis also implies corresponding analyticity and sparsity results for posterior densities in Bayesian inverse problems subject to Gaussian priors on uncertain inputs from function spaces. Our results furthermore yield dimension-independent convergence rates of various \emph{constructive} high-dimensional deterministic numerical approximation schemes such as single-level and multi-level versions of Hermite-Smolyak anisotropic sparse-grid interpolation and quadrature in both forward and inverse computational uncertainty quantification.

Submitted 16 June, 2023; v1 submitted 5 January, 2022; originally announced January 2022.

Comments: 165 pages
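As a one-dimensional illustration of the Wiener-Hermite coefficients discussed in the abstract above, the sketch below computes $c_n = \mathbb E[f(Y)\,\mathrm{He}_n(Y)]/n!$ for $Y \sim N(0,1)$ by quadrature, assuming the entire function $f = \exp$ as a toy example and hypothetical function names. For this $f$ the coefficients are known in closed form, $c_n = e^{1/2}/n!$, so their factorial decay can be checked directly; the countably-parametric PDE setting of the paper is not reproduced.

```python
import math


def hermite_he(n, y):
    """Probabilists' Hermite polynomial He_n(y) via He_{k+1} = y He_k - k He_{k-1}."""
    prev, cur = 1.0, y
    if n == 0:
        return prev
    for k in range(1, n):
        prev, cur = cur, y * cur - k * prev
    return cur


def wiener_hermite_coeff(f, n, half_width=12.0, m=4801):
    """c_n = E[f(Y) He_n(Y)] / n! for Y ~ N(0, 1), by the trapezoidal rule on [-half_width, half_width]."""
    step = 2.0 * half_width / (m - 1)
    total = 0.0
    for i in range(m):
        y = -half_width + i * step
        w = 0.5 if i in (0, m - 1) else 1.0  # trapezoidal end-point weights
        total += w * f(y) * hermite_he(n, y) * math.exp(-0.5 * y * y)
    return total * step / math.sqrt(2.0 * math.pi) / math.factorial(n)


coeffs = [wiener_hermite_coeff(math.exp, n) for n in range(8)]
```

The division by $n!$ uses the normalization $\mathbb E[\mathrm{He}_n(Y)\mathrm{He}_m(Y)] = n!\,\delta_{nm}$ of the probabilists' Hermite polynomials.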

arXiv:2111.07080

Deep Learning in High Dimension: Neural Network Approximation of Analytic Functions in $L^2(\mathbb{R}^d,γ_d)$

Authors: Christoph Schwab, Jakob Zech

Abstract: For artificial deep neural networks, we prove expression rates for analytic functions $f:\mathbb{R}^d\to\mathbb{R}$ in the norm of $L^2(\mathbb{R}^d,γ_d)$ where $d\in {\mathbb{N}}\cup\{ \infty \}$. Here $γ_d$ denotes the Gaussian product probability measure on $\mathbb{R}^d$. We consider in particular ReLU and ReLU${}^k$ activations for integer $k\geq 2$. For $d\in\mathbb{N}$, we show exponential convergence rates in $L^2(\mathbb{R}^d,γ_d)$. In case $d=\infty$, under suitable smoothness and sparsity assumptions on $f:\mathbb{R}^{\mathbb{N}}\to\mathbb{R}$, with $γ_\infty$ denoting an infinite (Gaussian) product measure on $\mathbb{R}^{\mathbb{N}}$, we prove dimension-independent expression rate bounds in the norm of $L^2(\mathbb{R}^{\mathbb{N}},γ_\infty)$. The rates only depend on quantified holomorphy of (an analytic continuation of) the map $f$ to a product of strips in $\mathbb{C}^d$. As an application, we prove expression rate bounds of deep ReLU-NNs for response surfaces of elliptic PDEs with log-Gaussian random field inputs.

Submitted 13 November, 2021; originally announced November 2021.

arXiv:2107.13422

Sparse approximation of triangular transports. Part II: the infinite dimensional case

Authors: Jakob Zech, Youssef Marzouk

Abstract: For two probability measures $ρ$ and $π$ on $[-1,1]^{\mathbb{N}}$ we investigate the approximation of the triangular Knothe-Rosenblatt transport $T:[-1,1]^{\mathbb{N}}\to [-1,1]^{\mathbb{N}}$ that pushes forward $ρ$ to $π$. Under suitable assumptions, we show that $T$ can be approximated by rational functions without suffering from the curse of dimension. Our results are applicable to posterior measures arising in certain inference problems where the unknown belongs to an (infinite dimensional) Banach space. In particular, we show that it is possible to efficiently approximately sample from certain high-dimensional measures by transforming a lower-dimensional latent variable.

Submitted 28 July, 2021; originally announced July 2021.

Comments: The original manuscript arXiv:2006.06994v1 has been split into two parts; the present paper is the second part

arXiv:2006.06994

Sparse approximation of triangular transports. Part I: the finite dimensional case

Authors: Jakob Zech, Youssef Marzouk

Abstract: For two probability measures $ρ$ and $π$ with analytic densities on the $d$-dimensional cube $[-1,1]^d$, we investigate the approximation of the unique triangular monotone Knothe-Rosenblatt transport $T:[-1,1]^d\to [-1,1]^d$, such that the pushforward $T_\sharpρ$ equals $π$. It is shown that for $d\in\mathbb{N}$ there exist approximations $\tilde T$ of $T$, based on either sparse polynomial expansions or deep ReLU neural networks, such that the distance between $\tilde T_\sharpρ$ and $π$ decreases exponentially. More precisely, we prove error bounds of the type $\exp(-βN^{1/d})$ (or $\exp(-βN^{1/(d+1)})$ for neural networks), where $N$ refers to the dimension of the ansatz space (or the size of the network) containing $\tilde T$; the notion of distance comprises the Hellinger distance, the total variation distance, the Wasserstein distance and the Kullback-Leibler divergence. Our construction guarantees $\tilde T$ to be a monotone triangular bijective transport on the hypercube $[-1,1]^d$. Analogous results hold for the inverse transport $S=T^{-1}$. The proofs are constructive, and we give an explicit a priori description of the ansatz space, which can be used for numerical implementations.

Submitted 28 July, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

Comments: The original manuscript arXiv:2006.06994v1 has been split into two parts; the present paper is the first part

MSC Class: 32D05; 41A10; 41A25; 41A46; 62D99; 65D15
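The triangular, monotone structure of the Knothe-Rosenblatt map described in the abstract above can be seen in a case where the map is explicit, though this lies outside the paper's setting (Gaussian measures on $\mathbb R^d$ rather than analytic densities on $[-1,1]^d$): the KR map from $N(0, I)$ to $N(0, \Sigma)$ is $T(z) = Lz$, with $L$ the lower-triangular Cholesky factor of $\Sigma$ with positive diagonal. The covariance matrix and function names below are assumptions for illustration.

```python
import math


def cholesky(sigma):
    """Lower-triangular L with positive diagonal and L L^T = sigma (sigma symmetric positive definite)."""
    d = len(sigma)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def kr_map(L):
    """Triangular map T(z) = L z: component k uses only z[0..k] and is increasing in z[k]."""
    def T(z):
        return [sum(L[k][j] * z[j] for j in range(k + 1)) for k in range(len(z))]
    return T


sigma = [[4.0, 2.0, 0.4], [2.0, 2.0, 0.5], [0.4, 0.5, 1.0]]  # assumed example covariance
T = kr_map(cholesky(sigma))
```

Monotonicity in the last active variable holds because the diagonal entries of $L$ are positive, which is the Gaussian counterpart of the monotone triangular structure the paper approximates.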

arXiv:1912.03606

Individual predictions matter: Assessing the effect of data ordering in training fine-tuned CNNs for medical imaging

Authors: John R. Zech, Jessica Zosa Forde, Michael L. Littman

Abstract: We reproduced the results of CheXNet with fixed hyperparameters and 50 different random seeds to identify 14 findings in chest radiographs (x-rays). Because CheXNet fine-tunes a pre-trained DenseNet, the random seed affects the ordering of the batches of training data but not the initialized model weights. We found substantial variability in predictions for the same radiograph across model runs (mean ln[(maximum probability)/(minimum probability)] 2.45, coefficient of variation 0.543). This individual radiograph-level variability was not fully reflected in the variability of AUC on a large test set. Averaging predictions from 10 models reduced variability by nearly 70% (mean coefficient of variation from 0.543 to 0.169, t-test 15.96, p-value < 0.0001). We encourage researchers to be aware of the potential variability of CNNs and ensemble predictions from multiple models to minimize the effect this variability may have on the care of individual patients when these models are deployed clinically.

Submitted 7 December, 2019; originally announced December 2019.

Comments: J.Z. and J.F. contributed equally to this work
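The variability measures quoted in the abstract above can be computed per radiograph as sketched below. The abstract does not state which standard deviation enters the coefficient of variation, so the population version used here is an assumption, as are the helper names. Applied to the reported mean coefficients of variation, the relative reduction is $(0.543 - 0.169)/0.543 \approx 0.689$, matching "nearly 70%".

```python
import math


def coefficient_of_variation(preds):
    """Population standard deviation divided by the mean (assumed definition)."""
    m = sum(preds) / len(preds)
    sd = math.sqrt(sum((p - m) ** 2 for p in preds) / len(preds))
    return sd / m


def log_max_min_ratio(preds):
    """ln(max / min) of one radiograph's predicted probabilities across model runs."""
    return math.log(max(preds) / min(preds))


def relative_reduction(before, after):
    return (before - after) / before


reduction = relative_reduction(0.543, 0.169)  # mean CV values quoted in the abstract
```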

arXiv:1811.03695

Deep Learning Predicts Hip Fracture using Confounding Patient and Healthcare Variables

Authors: Marcus A. Badgeley, John R. Zech, Luke Oakden-Rayner, Benjamin S. Glicksberg, Manway Liu, William Gale, Michael V. McConnell, Beth Percha, Thomas M. Snyder, Joel T. Dudley

Abstract: Hip fractures are a leading cause of death and disability among older adults. Hip fractures are also the most commonly missed diagnosis on pelvic radiographs. Computer-Aided Diagnosis (CAD) algorithms have shown promise for helping radiologists detect fractures, but the image features underpinning their predictions are notoriously difficult to understand. In this study, we trained deep learning models on 17,587 radiographs to classify fracture, five patient traits, and 14 hospital process variables. All 20 variables could be predicted from a radiograph (p < 0.05), with the best performances on scanner model (AUC=1.00), scanner brand (AUC=0.98), and whether the order was marked "priority" (AUC=0.79). Fracture was predicted moderately well from the image (AUC=0.78) and better when combining image features with patient data (AUC=0.86, p=2e-9) or patient data plus hospital process features (AUC=0.91, p=1e-21). The model performance on a test set with matched patient variables was significantly lower than a random test set (AUC=0.67, p=0.003); and when the test set was matched on patient and image acquisition variables, the model performed randomly (AUC=0.52, 95% CI 0.46-0.58), indicating that these variables were the main source of the model's predictive ability overall. We also used Naive Bayes to combine evidence from image models with patient and hospital data and found their inclusion improved performance, but that this approach was nevertheless inferior to directly modeling all variables.
If CAD algorithms are inexplicably leveraging patient and process variables in their predictions, it is unclear how radiologists should interpret their predictions in the context of other known patient data. Further research is needed to illuminate deep learning decision processes so that computers and clinicians can effectively cooperate.

Submitted 8 November, 2018; originally announced November 2018.
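The AUC values in the abstract above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, so 0.5 is chance level, and the matched-test-set interval 0.46-0.58 contains it. A minimal pairwise sketch of that definition follows; the helper name is hypothetical, and the abstract does not say how AUC was computed.

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2).

    O(n_pos * n_neg) pairwise version, fine for small illustrations."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# e.g. one of the four positive/negative pairs is misordered, giving 3/4
example = auc([0.9, 0.3], [0.5, 0.1])
```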

arXiv:1807.00431

doi 10.1371/journal.pmed.1002683

Confounding variables can degrade generalization performance of radiological deep learning models

Authors: John R. Zech, Marcus A. Badgeley, Manway Liu, Anthony B. Costa, Joseph J. Titano, Eric K. Oermann

Abstract: Early results in using convolutional neural networks (CNNs) on x-rays to diagnose disease have been promising, but it has not yet been shown that models trained on x-rays from one hospital or one group of hospitals will work equally well at different hospitals. Before these tools are used for computer-aided diagnosis in real-world clinical settings, we must verify their ability to generalize across a variety of hospital systems. A cross-sectional design was used to train and evaluate pneumonia screening CNNs on 158,323 chest x-rays from NIH (n=112,120 from 30,805 patients), Mount Sinai (n=42,396 from 12,904 patients), and Indiana (n=3,807 from 3,683 patients). In 3 of 5 natural comparisons, performance on chest x-rays from outside hospitals was significantly lower than on held-out x-rays from the original hospital systems. CNNs were able to detect where an x-ray was acquired (hospital system, hospital department) with extremely high accuracy and calibrate predictions accordingly. The performance of CNNs in diagnosing diseases on x-rays may reflect not only their ability to identify disease-specific imaging findings on x-rays, but also their ability to exploit confounding information. Estimates of CNN performance based on test data from hospital systems used for model training may overstate their likely real-world performance.

Submitted 12 July, 2018; v1 submitted 1 July, 2018; originally announced July 2018.

Journal ref: PLoS Med 15(11):e1002683 (2018)
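The per-site counts in the abstract above add up to the stated total of 158,323 chest x-rays; a quick check, which also shows each site's share of the pooled data:

```python
# per-site radiograph counts quoted in the abstract
counts = {"NIH": 112_120, "Mount Sinai": 42_396, "Indiana": 3_807}
total = sum(counts.values())
shares = {site: n / total for site, n in counts.items()}
```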

arXiv:1407.1430

A Posteriori Error Estimation of hp-dG Finite Element Methods for Highly Indefinite Helmholtz Problems (extended version)

Authors: Stefan Sauter, Jakob Zech

Abstract: In this paper, we will consider an $hp$-finite element discretization of a highly indefinite Helmholtz problem by a dG formulation which is based on the ultra-weak variational formulation by Cessenat and Deprés. We will introduce an a posteriori error estimator and derive reliability and efficiency estimates which are explicit with respect to the wavenumber and the discretization parameters $h$ and $p$. In contrast to the conventional conforming finite element method for indefinite problems, the dG formulation is unconditionally stable and the adaptive discretization process may start from a very coarse initial mesh. Numerical experiments will illustrate the efficiency and robustness of the method.

Submitted 14 March, 2015; v1 submitted 5 July, 2014; originally announced July 2014.

Comments: 39 pages, 10 figures

arXiv:1305.1998

Inferring Team Strengths Using a Discrete Markov Random Field

Authors: John Zech, Frank Wood

Abstract: We propose an original model for inferring team strengths using a Markov Random Field, which can be used to generate historical estimates of the offensive and defensive strengths of a team over time. This model was designed to be applied to sports such as soccer or hockey, in which contest outcomes take value in a limited discrete space. We perform inference using a combination of Expectation Maximization and Loopy Belief Propagation. The challenges of working with a non-convex optimization problem and a high-dimensional parameter space are discussed. The performance of the model is demonstrated on professional soccer data from the English Premier League.

Submitted 8 May, 2013; originally announced May 2013.

Showing 1–18 of 18 results for author: Zech, J