Search | arXiv e-print repository

doi 10.1093/biomet/asaa092

Bagging cross-validated bandwidths with application to Big Data

Authors: Daniel Barreiro-Ures, Ricardo Cao, Mario Francisco Fernández, Jeffrey D. Hart

Abstract: Hall and Robinson (2009) proposed and analyzed the use of bagged cross-validation to choose the bandwidth of a kernel density estimator. They established that bagging greatly reduces the noise inherent in ordinary cross-validation, and hence leads to a more efficient bandwidth selector. The asymptotic theory of Hall and Robinson (2009) assumes that $N$, the number of bagged subsamples, is $\infty$. We expand upon their theoretical results by allowing $N$ to be finite, as it is in practice. Our results indicate an important difference in the rate of convergence of the bagged cross-validation bandwidth for the cases $N=\infty$ and $N<\infty$. Simulations quantify the improvement in statistical efficiency and computational speed that can result from using bagged cross-validation as opposed to a binned implementation of ordinary cross-validation. The performance of the bagged bandwidth is also illustrated on a real, very large, data set. Finally, a byproduct of our study is the correction of errors appearing in the Hall and Robinson (2009) expression for the asymptotic mean squared error of the bagging selector.

Submitted 31 January, 2024; originally announced January 2024.

Comments: 37 pages, 9 figures

MSC Class: 62G07 (Primary); 62G20 (Secondary)

Journal ref: Bagging cross-validated bandwidths with application to Big Data. Biometrika (2021), 108(4), 981-988

arXiv:2303.11379 [pdf, other]

Solving High-Dimensional Inverse Problems with Auxiliary Uncertainty via Operator Learning with Limited Data

Authors: Joseph Hart, Mamikon Gulian, Indu Manickam, Laura Swiler

Abstract: In complex large-scale systems such as climate, important effects are caused by a combination of confounding processes that are not fully observable. The identification of sources from observations of system state is vital for attribution and prediction, which inform critical policy decisions. The difficulty of these types of inverse problems lies in the inability to isolate sources and the cost of simulating computational models. Surrogate models may enable the many-query algorithms required for source identification, but data challenges arise from high dimensionality of the state and source, limited ensembles of costly model simulations to train a surrogate model, and few and potentially noisy state observations for inversion due to measurement limitations. The influence of auxiliary processes adds an additional layer of uncertainty that further confounds source identification. We introduce a framework based on (1) calibrating deep neural network surrogates to the flow maps provided by an ensemble of simulations obtained by varying sources, and (2) using these surrogates in a Bayesian framework to identify sources from observations via optimization. Focusing on an atmospheric dispersion exemplar, we find that the expressive and computationally efficient nature of the deep neural network operator surrogates in appropriately reduced dimension allows for source identification with uncertainty quantification using limited data. Introducing a variable wind field as an auxiliary process, we find that a Bayesian approximation error approach is essential for reliable source inversion when uncertainty due to wind stresses the algorithm.

Submitted 20 March, 2023; originally announced March 2023.

Comments: 29 pages, 10 figures

arXiv:2301.02283 [pdf, other]

Screening Methods for Classification Based on Non-parametric Bayesian Tests

Authors: Naveed Merchant, Jeffrey D. Hart

Abstract: Feature or variable selection is a problem inherent to large data sets. While many methods have been proposed to deal with this problem, some can scale poorly with the number of predictors in a data set. Screening methods scale linearly with the number of predictors by checking each predictor one at a time, and are a tool used to decrease the number of variables to consider before further analysis or variable selection. For classification, there is a variety of techniques. There are parametric based screening tests, such as t-test or SIS based screening, and non-parametric based screening tests, such as Kolmogorov distance based screening, and MV-SIS. We propose a method for variable screening that uses Bayesian-motivated tests, compare it to SIS based screening, and provide example applications of the method on simulated and real data. It is shown that our screening method can lead to improvements in classification rate. This is so even when our method is used in conjunction with a classifier, such as DART, which is designed to select a sparse subset of variables. Finally, we propose a classifier based on kernel density estimates that in some cases can produce dramatic improvements in classification rates relative to DART.

Submitted 5 January, 2023; originally announced January 2023.

arXiv:2212.12386 [pdf, ps, other]

Hyper-differential sensitivity analysis in the context of Bayesian inference applied to ice-sheet problems

Authors: William Reese, Joseph Hart, Bart van Bloemen Waanders, Mauro Perego, John Jakeman, Arvind Saibaba

Abstract: Inverse problems constrained by partial differential equations (PDEs) play a critical role in model development and calibration. In many applications, there are multiple uncertain parameters in a model which must be estimated. Although the Bayesian formulation is attractive for such problems, computational cost and high dimensionality frequently prohibit a thorough exploration of the parametric uncertainty. A common approach is to reduce the dimension by fixing some parameters (which we will call auxiliary parameters) to a best estimate and use techniques from PDE-constrained optimization to approximate properties of the Bayesian posterior distribution. For instance, the maximum a posteriori probability (MAP) and the Laplace approximation of the posterior covariance can be computed. In this article, we propose using hyper-differential sensitivity analysis (HDSA) to assess the sensitivity of the MAP point to changes in the auxiliary parameters. We establish an interpretation of HDSA as correlations in the posterior distribution. Our proposed framework is demonstrated on the inversion of bedrock topography for the Greenland ice sheet with uncertainties arising from the basal friction coefficient and climate forcing (ice accumulation rate).

Submitted 23 December, 2022; originally announced December 2022.

arXiv:2007.13171 [pdf, other]

Train Like a (Var)Pro: Efficient Training of Neural Networks with Variable Projection

Authors: Elizabeth Newman, Lars Ruthotto, Joseph Hart, Bart van Bloemen Waanders

Abstract: Deep neural networks (DNNs) have achieved state-of-the-art performance across a variety of traditional machine learning tasks, e.g., speech recognition, image classification, and segmentation. The ability of DNNs to efficiently approximate high-dimensional functions has also motivated their use in scientific applications, e.g., to solve partial differential equations (PDE) and to generate surrogate models. In this paper, we consider the supervised training of DNNs, which arises in many of the above applications. We focus on the central problem of optimizing the weights of the given DNN such that it accurately approximates the relation between observed input and target data. Devising effective solvers for this optimization problem is notoriously challenging due to the large number of weights, non-convexity, data-sparsity, and non-trivial choice of hyperparameters. To solve the optimization problem more efficiently, we propose the use of variable projection (VarPro), a method originally designed for separable nonlinear least-squares problems. Our main contribution is the Gauss-Newton VarPro method (GNvpro) that extends the reach of the VarPro idea to non-quadratic objective functions, most notably, cross-entropy loss functions arising in classification. These extensions make GNvpro applicable to all training problems that involve a DNN whose last layer is an affine mapping, which is common in many state-of-the-art architectures. In our four numerical experiments from surrogate modeling, segmentation, and classification GNvpro solves the optimization problem more efficiently than commonly-used stochastic gradient descent (SGD) schemes. Also, GNvpro finds solutions that generalize well, and in all but one example better than well-tuned SGD methods, to unseen data points.

Submitted 19 April, 2021; v1 submitted 26 July, 2020; originally announced July 2020.

Comments: 33 pages, 14 figures, 3 tables

MSC Class: 68T05; 49M15

ACM Class: I.2.6

arXiv:2006.00589 [pdf, other]

Deep R-Learning for Continual Area Sweeping

Authors: Rishi Shah, Yuqian Jiang, Justin Hart, Peter Stone

Abstract: Coverage path planning is a well-studied problem in robotics in which a robot must plan a path that passes through every point in a given area repeatedly, usually with a uniform frequency. To address the scenario in which some points need to be visited more frequently than others, this problem has been extended to non-uniform coverage planning. This paper considers the variant of non-uniform coverage in which the robot does not know the distribution of relevant events beforehand and must nevertheless learn to maximize the rate of detecting events of interest. This continual area sweeping problem has been previously formalized in a way that makes strong assumptions about the environment, and to date only a greedy approach has been proposed. We generalize the continual area sweeping formulation to include fewer environmental constraints, and propose a novel approach based on reinforcement learning in a Semi-Markov Decision Process. This approach is evaluated in an abstract simulation and in a high fidelity Gazebo simulation. These evaluations show significant improvement upon the existing approach in general settings, which is especially relevant in the growing area of service robotics.

Submitted 31 May, 2020; originally announced June 2020.

arXiv:2003.06368 [pdf, other]

Use of Cross-validation Bayes Factors to Test Equality of Two Densities

Authors: Jeffery Hart, Taeryon Choi, Naveed Merchant

Abstract: We propose a non-parametric, two-sample Bayesian test for checking whether or not two data sets share a common distribution. The test makes use of data splitting ideas and does not require priors for high-dimensional parameter vectors as do other nonparametric Bayesian procedures. We provide evidence that the new procedure provides more stable Bayes factors than do methods based on Pólya trees. Somewhat surprisingly, the behavior of the proposed Bayes factors when the two distributions are the same is usually superior to that of Pólya tree Bayes factors. We showcase the effectiveness of the test by proving its consistency, conducting a simulation study and applying the test to Higgs boson data.

Submitted 13 March, 2020; originally announced March 2020.

arXiv:1812.07042 [pdf, other]

Robustness of the Sobol' indices to marginal distribution uncertainty

Authors: Joseph Hart, Pierre Gremaud

Abstract: Global sensitivity analysis (GSA) quantifies the influence of uncertain variables in a mathematical model. The Sobol' indices, a commonly used tool in GSA, seek to do this by attributing to each variable its relative contribution to the variance of the model output. In order to compute Sobol' indices, the user must specify a probability distribution for the uncertain variables. This distribution is typically unknown and must be chosen using limited data and/or knowledge. The usefulness of the Sobol' indices depends on their robustness to this distributional uncertainty. This article presents a novel method which uses "optimal perturbations" of the marginal probability density functions to analyze the robustness of the Sobol' indices. The method is illustrated through synthetic examples and a model for contaminant transport.

Submitted 17 December, 2018; originally announced December 2018.

Comments: 20 pages

arXiv:1811.06601 [pdf, other]

doi 10.1016/j.csda.2019.04.006

Estimating the Mean and Variance of a High-dimensional Normal Distribution Using a Mixture Prior

Authors: Shyamalendu Sinha, Jeffrey D. Hart

Abstract: This paper provides a framework for estimating the mean and variance of a high-dimensional normal density. The main setting considered is a fixed number of vectors following a high-dimensional normal distribution with unknown mean and diagonal covariance matrix. The diagonal covariance matrix can be known or unknown. If the covariance matrix is unknown, the sample size can be as small as $2$. The proposed estimator is based on the idea that the unobserved pairs of mean and variance for each dimension are drawn from an unknown bivariate distribution, which we model as a mixture of normal-inverse gammas. The mixture of normal-inverse gamma distributions provides advantages over more traditional empirical Bayes methods, which are based on a normal-normal model. When fitting a mixture model, we are essentially clustering the unobserved mean and variance pairs for each dimension into different groups, with each group having a different normal-inverse gamma distribution. The proposed estimator of each mean is the posterior mean of shrinkage estimates, each of which shrinks a sample mean towards a different component of the mixture distribution. Similarly, the proposed estimator of variance has an analogous interpretation in terms of sample variances and components of the mixture distribution. If the diagonal covariance matrix is known, then the sample size can be as small as $1$, and we treat the pairs of known variance and unknown mean for each dimension as random observations coming from a flexible mixture of normal-inverse gamma distributions.

Submitted 15 November, 2018; originally announced November 2018.

Journal ref: Computational Statistics and Data Analysis 138 (2019) 201-221

arXiv:1708.07441 [pdf, other]

Global sensitivity analysis for statistical model parameters

Authors: Joseph Hart, Julie Bessac, Emil Constantinescu

Abstract: Global sensitivity analysis (GSA) is frequently used to analyze the influence of uncertain parameters in mathematical models and simulations. In principle, tools from GSA may be extended to analyze the influence of parameters in statistical models. Such analyses may enable reduced or parsimonious modeling and greater predictive capability. However, difficulties such as parameter correlation, model stochasticity, multivariate model output, and unknown parameter distributions prohibit a direct application of GSA tools to statistical models. By leveraging a loss function associated with the statistical model, we introduce a novel framework to address these difficulties and enable efficient GSA for statistical model parameters. Theoretical and computational properties are considered and illustrated on a synthetic example. The framework is applied to a Gaussian process model from the literature, which depends on 95 parameters. Non-influential parameters are discovered through GSA and a reduced model with equal or stronger predictive capability is constructed by using only 79 parameters.

Submitted 28 June, 2018; v1 submitted 24 August, 2017; originally announced August 2017.

Comments: revisions

arXiv:1609.00065 [pdf, other]

Partitioned Cross-Validation for Divide-and-Conquer Density Estimation

Authors: Anirban Bhattacharya, Jeffrey D. Hart

Abstract: We present an efficient method to estimate cross-validation bandwidth parameters for kernel density estimation in very large datasets where ordinary cross-validation is rendered highly inefficient, both statistically and computationally. Our approach relies on calculating multiple cross-validation bandwidths on partitions of the data, followed by suitable scaling and averaging to return a partitioned cross-validation bandwidth for the entire dataset. The partitioned cross-validation approach produces substantial computational gains over ordinary cross-validation. We additionally show that partitioned cross-validation can be statistically efficient compared to ordinary cross-validation. We derive analytic expressions for the asymptotically optimal number of partitions and study its finite sample accuracy through a detailed simulation study. We additionally propose a permuted version of partitioned cross-validation which attains even higher efficiency. Theoretical properties of the estimators are studied and the methodology is applied to the Higgs Boson dataset with 11 million observations.

Submitted 31 August, 2016; originally announced September 2016.

arXiv:1602.08521 [pdf, ps, other]

Theoretical Properties and Practical Performance of Fully Robust One-Sided Cross-Validation

Authors: Olga Y. Savchuk, Jeffrey D. Hart

Abstract: Fully robust OSCV is a modification of the OSCV method that produces consistent bandwidth in the cases of smooth and nonsmooth regression functions. The current implementation of the method uses the kernel $H_I$ that is almost indistinguishable from the Gaussian kernel on the interval $[-4,4]$, but has negative tails. The theoretical properties and practical performances of the $H_I$- and $φ$-based OSCV versions are compared. The kernel $H_I$ tends to produce too low bandwidths in the smooth case. The $H_I$-based OSCV curves are shown to have wiggles appearing in the neighborhood of zero. The kernel $H_I$ uncovers sensitivity of the OSCV method to a tiny modification of the kernel used for the cross-validation purposes. The recently found robust bimodal kernels tend to produce OSCV curves with multiple local minima. The problem of finding a robust unimodal nonnegative kernel remains open.

Submitted 26 February, 2016; originally announced February 2016.

Comments: 9 figures, 2 tables

arXiv:1602.06218 [pdf, other]

Efficient computation of Sobol' indices for stochastic models

Authors: Joseph L. Hart, Alen Alexanderian, Pierre A. Gremaud

Abstract: Stochastic models are necessary for the realistic description of an increasing number of applications. The ability to identify influential parameters and variables is critical to a thorough analysis and understanding of the underlying phenomena. We present a new global sensitivity analysis approach for stochastic models, i.e., models with both uncertain parameters and intrinsic stochasticity. Our method relies on an analysis of variance through a generalization of Sobol' indices and on the use of surrogate models. We show how to efficiently compute the statistical properties of the resulting indices and illustrate the effectiveness of our approach by computing first order Sobol' indices for two stochastic models.

Submitted 28 November, 2016; v1 submitted 19 February, 2016; originally announced February 2016.

Comments: Minor revisions

MSC Class: 60G99; 65C05; 65C20; 62H99; 62J02

arXiv:0812.0052 [pdf, ps, other]

Empirical study of indirect cross-validation

Authors: Olga Y. Savchuk, Jeffrey D. Hart, Simon J. Sheather

Abstract: In this paper we provide insight into the empirical properties of indirect cross-validation (ICV), a new method of bandwidth selection for kernel density estimators. First, we describe the method and report on the theoretical results used to develop a practical-purpose model for certain ICV parameters. Next, we provide a detailed description of a numerical study which shows that the ICV method usually outperforms least squares cross-validation (LSCV) in finite samples. One of the major advantages of ICV is its increased stability compared to LSCV. Two real data examples show the benefit of using both ICV and a local version of ICV.

Submitted 29 November, 2008; originally announced December 2008.

Comments: 22 pages, 21 figures

arXiv:0812.0051 [pdf, ps, other]

Indirect Cross-validation for Density Estimation

Authors: Olga Y. Savchuk, Jeffrey D. Hart, Simon J. Sheather

Abstract: A new method of bandwidth selection for kernel density estimators is proposed. The method, termed indirect cross-validation, or ICV, makes use of so-called selection kernels. Least squares cross-validation (LSCV) is used to select the bandwidth of a selection-kernel estimator, and this bandwidth is appropriately rescaled for use in a Gaussian kernel estimator. The proposed selection kernels are linear combinations of two Gaussian kernels, and need not be unimodal or positive. Theory is developed showing that the relative error of ICV bandwidths can converge to 0 at a rate of $n^{-1/4}$, which is substantially better than the $n^{-1/10}$ rate of LSCV. Interestingly, the selection kernels that are best for purposes of bandwidth selection are very poor if used to actually estimate the density function. This property appears to be part of the larger and well-documented paradox to the effect that "the harder the estimation problem, the better cross-validation performs." The ICV method uniformly outperforms LSCV in a simulation study, a real data example, and a simulated example in which bandwidths are chosen locally.

Submitted 29 November, 2008; originally announced December 2008.

Comments: 26 pages, 10 figures

Showing 1–15 of 15 results for author: Hart, J