-
Accelerated Inference for Partially Observed Markov Processes using Automatic Differentiation
Authors:
Kevin Tan,
Giles Hooker,
Edward L. Ionides
Abstract:
Automatic differentiation (AD) has driven recent advances in machine learning, including deep neural networks and Hamiltonian Markov Chain Monte Carlo methods. Partially observed nonlinear stochastic dynamical systems have proved resistant to AD techniques because widely used particle filter algorithms yield an estimated likelihood function that is discontinuous as a function of the model parameters. We show how to embed two existing AD particle filter methods in a theoretical framework that provides an extension to a new class of algorithms. This new class permits a bias/variance tradeoff and hence a mean squared error substantially lower than the existing algorithms. We develop likelihood maximization algorithms suited to the Monte Carlo properties of the AD gradient estimate. Our algorithms require only a differentiable simulator for the latent dynamic system; by contrast, most previous approaches to AD likelihood maximization for particle filters require access to the system's transition probabilities. Numerical results indicate that a hybrid algorithm that uses AD to refine a coarse solution from an iterated filtering algorithm shows substantial improvement on current state-of-the-art methods for a challenging scientific benchmark problem.
Submitted 3 July, 2024;
originally announced July 2024.
-
An Understanding of Principal Differential Analysis
Authors:
Edward Gunning,
Giles Hooker
Abstract:
In functional data analysis, replicate observations of a smooth functional process and its derivatives offer a unique opportunity to flexibly estimate continuous-time ordinary differential equation models. Ramsay (1996) first proposed to estimate a linear ordinary differential equation from functional data in a technique called Principal Differential Analysis, by formulating a functional regression in which the highest-order derivative of a function is modelled as a time-varying linear combination of its lower-order derivatives. Principal Differential Analysis was introduced as a technique for data reduction and representation, using solutions of the estimated differential equation as a basis to represent the functional data. In this work, we re-formulate PDA as a generative statistical model in which functional observations arise as solutions of a deterministic ODE that is forced by a smooth random error process. This viewpoint defines a flexible class of functional models based on differential equations and leads to an improved understanding and characterisation of the sources of variability in Principal Differential Analysis. It does, however, result in parameter estimates that can be heavily biased under the standard estimation approach of PDA. Therefore, we introduce an iterative bias-reduction algorithm that can be applied to improve parameter estimates. We also examine the utility of our approach when the form of the deterministic part of the differential equation is unknown and possibly non-linear, where Principal Differential Analysis is treated as an approximate model based on time-varying linearisation. We demonstrate our approach on simulated data from linear and non-linear differential equations and on real data from human movement biomechanics. Supplementary R code for this manuscript is available at https://github.com/edwardgunning/UnderstandingOfPDAManuscript.
Submitted 26 June, 2024;
originally announced June 2024.
-
Differentiable Programming for Differential Equations: A Review
Authors:
Facundo Sapienza,
Jordi Bolibar,
Frank Schäfer,
Brian Groenke,
Avik Pal,
Victor Boussange,
Patrick Heimbach,
Giles Hooker,
Fernando Pérez,
Per-Olof Persson,
Christopher Rackauckas
Abstract:
The differentiable programming paradigm is a cornerstone of modern scientific computing. It refers to numerical methods for computing the gradient of a numerical model's output. Many scientific models are based on differential equations, where differentiable programming plays a crucial role in calculating model sensitivities, inverting model parameters, and training hybrid models that combine differential equations with data-driven approaches. Furthermore, recognizing the strong synergies between inverse methods and machine learning offers the opportunity to establish a coherent framework applicable to both fields. Differentiating functions based on the numerical solution of differential equations is non-trivial. Numerous methods based on a wide variety of paradigms have been proposed in the literature, each with pros and cons specific to the type of problem investigated. Here, we provide a comprehensive review of existing techniques to compute derivatives of numerical solutions of differential equations. We first discuss the importance of gradients of solutions of differential equations in a variety of scientific domains. Second, we lay out the mathematical foundations of the various approaches and compare them with each other. Third, we cover the computational considerations and explore the solutions available in modern scientific software. Last but not least, we provide best-practices and recommendations for practitioners. We hope that this work accelerates the fusion of scientific models and data, and fosters a modern approach to scientific modelling.
Submitted 13 June, 2024;
originally announced June 2024.
-
Why You Should Not Trust Interpretations in Machine Learning: Adversarial Attacks on Partial Dependence Plots
Authors:
Xi Xin,
Giles Hooker,
Fei Huang
Abstract:
The adoption of artificial intelligence (AI) across industries has led to the widespread use of complex black-box models and interpretation tools for decision making. This paper proposes an adversarial framework to uncover the vulnerability of permutation-based interpretation methods for machine learning tasks, with a particular focus on partial dependence (PD) plots. This adversarial framework modifies the original black box model to manipulate its predictions for instances in the extrapolation domain. As a result, it produces deceptive PD plots that can conceal discriminatory behaviors while preserving most of the original model's predictions. This framework can produce multiple fooled PD plots via a single model. By using real-world datasets including an auto insurance claims dataset and COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) dataset, our results show that it is possible to intentionally hide the discriminatory behavior of a predictor and make the black-box model appear neutral through interpretation tools like PD plots while retaining almost all the predictions of the original black-box model. Managerial insights for regulators and practitioners are provided based on the findings.
Submitted 1 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
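The abstract above concerns partial dependence (PD) plots. As a rough illustration of how a PD curve is usually computed, and why it can probe the extrapolation domain, here is a minimal sketch. The function `partial_dependence`, the `toy_model`, and the three data rows are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from statistics import fmean

def partial_dependence(model, rows, feature, grid):
    """Average model output with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)
            modified[feature] = value   # overwrite one feature, keep the rest
            preds.append(model(modified))
        curve.append(fmean(preds))
    return curve

# Hypothetical toy model and data, chosen only for illustration.
def toy_model(x):
    return x[0] + 2.0 * x[1]

rows = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # second feature depends on the first
pd_curve = partial_dependence(toy_model, rows, feature=0, grid=[0.0, 1.0, 2.0])
print(pd_curve)  # [6.0, 7.0, 8.0]
```

Note that the loop evaluates the model at combinations such as (0.0, 5.0) that never occur in the data. Predictions at such points can be changed without affecting predictions on observed instances, which is the kind of region the abstract describes as the extrapolation domain.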
-
Longitudinal Counterfactuals: Constraints and Opportunities
Authors:
Alexander Asemota,
Giles Hooker
Abstract:
Counterfactual explanations are a common approach to providing recourse to data subjects. However, current methodology can produce counterfactuals that cannot be achieved by the subject, making the use of counterfactuals for recourse difficult to justify in practice. Though there is agreement that plausibility is an important quality when using counterfactuals for algorithmic recourse, ground truth plausibility continues to be difficult to quantify. In this paper, we propose using longitudinal data to assess and improve plausibility in counterfactuals. In particular, we develop a metric that compares longitudinal differences to counterfactual differences, allowing us to evaluate how similar a counterfactual is to prior observed changes. Furthermore, we use this metric to generate plausible counterfactuals. Finally, we discuss some of the inherent difficulties of using counterfactuals for recourse.
Submitted 29 February, 2024;
originally announced March 2024.
-
Provably Stable Feature Rankings with SHAP and LIME
Authors:
Jeremy Goldwasser,
Giles Hooker
Abstract:
Feature attributions are ubiquitous tools for understanding the predictions of machine learning models. However, the calculation of popular methods for scoring input variables such as SHAP and LIME suffers from high instability due to random sampling. Leveraging ideas from multiple hypothesis testing, we devise attribution methods that ensure the most important features are ranked correctly with high probability. Given SHAP estimates from KernelSHAP or Shapley Sampling, we demonstrate how to retrospectively verify the number of stable rankings. Further, we introduce efficient sampling algorithms for SHAP and LIME that guarantee the $K$ highest-ranked features have the proper ordering. Finally, we show how to adapt these local feature attribution methods for the global importance setting.
Submitted 2 June, 2024; v1 submitted 28 January, 2024;
originally announced January 2024.
-
Stabilizing Estimates of Shapley Values with Control Variates
Authors:
Jeremy Goldwasser,
Giles Hooker
Abstract:
Shapley values are among the most popular tools for explaining predictions of blackbox machine learning models. However, their high computational cost motivates the use of sampling approximations, inducing a considerable degree of uncertainty. To stabilize these model explanations, we propose ControlSHAP, an approach based on the Monte Carlo technique of control variates. Our methodology is applicable to any machine learning model and requires virtually no extra computation or modeling effort. On several high-dimensional datasets, we find it can produce dramatic reductions in the Monte Carlo variability of Shapley estimates.
Submitted 9 April, 2024; v1 submitted 11 October, 2023;
originally announced October 2023.
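The abstract above builds on the Monte Carlo technique of control variates. The sketch below shows that generic technique on a toy integral; it is not ControlSHAP itself. The target E[exp(U)] with U uniform on (0, 1), the control variate U (known mean 0.5), and all function names are assumptions made for illustration.

```python
import math
import random
import statistics

def control_variate_estimate(n, rng):
    """Estimate E[exp(U)], U ~ Uniform(0, 1), with and without a control variate."""
    u = [rng.random() for _ in range(n)]
    f = [math.exp(x) for x in u]   # samples of the target quantity
    g = u                          # control variate with known mean 0.5
    f_bar = statistics.fmean(f)
    g_bar = statistics.fmean(g)
    # Estimated coefficient Cov(f, g) / Var(g), which minimises the variance.
    cov = sum((a - f_bar) * (b - g_bar) for a, b in zip(f, g)) / (n - 1)
    beta = cov / statistics.variance(g)
    plain = f_bar
    adjusted = f_bar - beta * (g_bar - 0.5)
    return plain, adjusted

rng = random.Random(0)
results = [control_variate_estimate(200, rng) for _ in range(500)]
plain_var = statistics.variance([p for p, _ in results])
cv_var = statistics.variance([a for _, a in results])
print(plain_var, cv_var)  # the adjusted estimator should vary far less
```

The correction term has mean zero up to the estimation of `beta`, so the adjusted estimator stays close to the true value e - 1 while its spread across repetitions shrinks, because exp(U) and U are strongly correlated.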
-
A Generic Approach for Reproducible Model Distillation
Authors:
Yunzhe Zhou,
Peiru Xu,
Giles Hooker
Abstract:
Model distillation has been a popular method for producing interpretable machine learning. It uses an interpretable "student" model to mimic the predictions made by the black box "teacher" model. However, when the student model is sensitive to the variability of the data sets used for training even when keeping the teacher fixed, the corresponding interpretation is not reliable. Existing strategies stabilize model distillation by checking whether a large enough corpus of pseudo-data is generated to reliably reproduce student models, but methods to do so have so far been developed for a specific student model. In this paper, we develop a generic approach for stable model distillation based on central limit theorem for the average loss. We start with a collection of candidate student models and search for candidates that reasonably agree with the teacher. Then we construct a multiple testing framework to select a corpus size such that the consistent student model would be selected under different pseudo samples. We demonstrate the application of our proposed approach on three commonly used intelligible models: decision trees, falling rule lists and symbolic regression. Finally, we conduct simulation experiments on Mammographic Mass and Breast Cancer datasets and illustrate the testing procedure throughout a theoretical analysis with Markov process. The code is publicly available at https://github.com/yunzhe-zhou/GenericDistillation.
Submitted 27 April, 2023; v1 submitted 22 November, 2022;
originally announced November 2022.
-
The Infinitesimal Jackknife and Combinations of Models
Authors:
Indrayudh Ghosal,
Yunzhe Zhou,
Giles Hooker
Abstract:
The Infinitesimal Jackknife is a general method for estimating variances of parametric models, and more recently also for some ensemble methods. In this paper we extend the Infinitesimal Jackknife to estimate the covariance between any two models. This can be used to quantify uncertainty for combinations of models, or to construct test statistics for comparing different models or ensembles of models fitted using the same training dataset. Specific examples in this paper use boosted combinations of models like random forests and M-estimators. We also investigate its application on neural networks and ensembles of XGBoost models. We illustrate the efficacy of variance estimates through extensive simulations and its application to the Beijing Housing data, and demonstrate the theoretical consistency of the Infinitesimal Jackknife covariance estimate.
Submitted 31 August, 2022;
originally announced September 2022.
-
S-LIME: Stabilized-LIME for Model Explanation
Authors:
Zhengze Zhou,
Giles Hooker,
Fei Wang
Abstract:
An increasing number of machine learning models have been deployed in domains with high stakes such as finance and healthcare. Despite their superior performances, many models are black boxes in nature which are hard to explain. There are growing efforts for researchers to develop methods to interpret these black-box models. Post hoc explanations based on perturbations, such as LIME, are widely used approaches to interpret a machine learning model after it has been built. This class of methods has been shown to exhibit large instability, posing serious challenges to the effectiveness of the method itself and harming user trust. In this paper, we propose S-LIME, which utilizes a hypothesis testing framework based on the central limit theorem for determining the number of perturbation points needed to guarantee stability of the resulting explanation. Experiments on both simulated and real world data sets are provided to demonstrate the effectiveness of our method.
Submitted 15 June, 2021;
originally announced June 2021.
-
Generalised Boosted Forests
Authors:
Indrayudh Ghosal,
Giles Hooker
Abstract:
This paper extends recent work on boosting random forests to model non-Gaussian responses. Given an exponential family $\mathbb{E}[Y|X] = g^{-1}(f(X))$ our goal is to obtain an estimate for $f$. We start with an MLE-type estimate in the link space and then define generalised residuals from it. We use these residuals and some corresponding weights to fit a base random forest and then repeat the same to obtain a boost random forest. We call the sum of these three estimators a \textit{generalised boosted forest}. We show with simulated and real data that both the random forest steps reduce test-set log-likelihood, which we treat as our primary metric. We also provide a variance estimator, which we can obtain with the same computational cost as the original estimate itself. Empirical experiments on real-world data and simulations demonstrate that the methods can effectively reduce bias, and that confidence interval coverage is conservative in the bulk of the covariate distribution.
Submitted 2 March, 2021; v1 submitted 24 February, 2021;
originally announced February 2021.
-
Bridging Breiman's Brook: From Algorithmic Modeling to Statistical Learning
Authors:
Lucas Mentch,
Giles Hooker
Abstract:
In 2001, Leo Breiman wrote of a divide between "data modeling" and "algorithmic modeling" cultures. Twenty years later this division feels far more ephemeral, both in terms of assigning individuals to camps, and in terms of intellectual boundaries. We argue that this is largely due to the "data modelers" incorporating algorithmic methods into their toolbox, particularly driven by recent developments in the statistical understanding of Breiman's own Random Forest methods. While this can be simplistically described as "Breiman won", these same developments also expose the limitations of the prediction-first philosophy that he espoused, making careful statistical analysis all the more important. This paper outlines these exciting recent developments in the random forest literature which, in our view, occurred as a result of a necessary blending of the two ways of thinking Breiman originally described. We also ask what areas statistics and statisticians might currently overlook.
Submitted 22 February, 2021;
originally announced February 2021.
-
Generalized Single Index Models and Jensen Effects on Reproduction and Survival
Authors:
Zi Ye,
Giles Hooker,
Stephen P. Ellner
Abstract:
Environmental variability often has substantial impacts on natural populations and communities through its effects on the performance of individuals. Because organisms' responses to environmental conditions are often nonlinear (e.g., decreasing performance on both sides of an optimal temperature), the mean response is often different from the response in the mean environment. Ye et al. (2020) proposed testing for the presence of such variance effects on individual or population growth rates by estimating the "Jensen Effect", the difference in average growth rates under varying versus fixed environments, in functional single index models for environmental effects on growth. In this paper, we extend this analysis to the effect of environmental variance on reproduction and survival, which have count and binary outcomes. In the standard generalized linear models used to analyze such data the direction of the Jensen Effect is tacitly assumed a priori by the model's link function. Here we extend the methods of Ye et al. (2020) using a generalized single index model to test whether this assumed direction is contradicted by the data. We show that our test has reasonable power under mild alternatives, but requires sample sizes that are larger than are often available. We demonstrate our methods on a long-term time series of plant ground cover on the Idaho steppe.
Submitted 25 November, 2020;
originally announced November 2020.
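The "Jensen Effect" described in the abstract above is the gap between the average response under a varying environment and the response at the mean environment. The sketch below computes that gap for a toy case. The quadratic `response` curve with an optimum at 20, the Gaussian temperature distribution, and the function names are hypothetical choices for illustration, not the single index model of the paper.

```python
import random
from statistics import fmean

def response(temp, optimum=20.0):
    """Hypothetical concave performance curve peaking at `optimum`."""
    return -(temp - optimum) ** 2

def jensen_effect(f, environments):
    """Mean response under variability minus response at the mean environment."""
    return fmean(f(e) for e in environments) - f(fmean(environments))

rng = random.Random(1)
temps = [rng.gauss(20.0, 2.0) for _ in range(10_000)]
effect = jensen_effect(response, temps)
print(effect)  # negative for this concave curve, roughly minus the variance (about -4)
```

For this particular quadratic curve the effect equals minus the sample variance of the environment, so variability is harmful here. A convex response would flip the sign.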
-
Selecting the Derivative of a Functional Covariate in Scalar-on-Function Regression
Authors:
Giles Hooker,
Hanlin Shang
Abstract:
This paper presents tests to formally choose between regression models using different derivatives of a functional covariate in scalar-on-function regression. We demonstrate that for linear regression, models using different derivatives can be nested within a model that includes point-impact effects at the end-points of the observed functions. Contrasts can then be employed to test the specification of different derivatives. When nonlinear regression models are defined, we apply a $J$ test to determine the statistical significance of the nonlinear structure between a functional covariate and a scalar response. The finite-sample performance of these methods is verified in simulation, and their practical application is demonstrated using a chemometric data set.
Submitted 18 August, 2020;
originally announced August 2020.
-
$V$-statistics and Variance Estimation
Authors:
Zhengze Zhou,
Lucas Mentch,
Giles Hooker
Abstract:
This paper develops a general framework for analyzing asymptotics of $V$-statistics. Previous literature on limiting distribution mainly focuses on the cases when $n \to \infty$ with fixed kernel size $k$. Under some regularity conditions, we demonstrate asymptotic normality when $k$ grows with $n$ by utilizing existing results for $U$-statistics. The key in our approach lies in a mathematical reduction to $U$-statistics by designing an equivalent kernel for $V$-statistics. We also provide a unified treatment on variance estimation for both $U$- and $V$-statistics by observing connections to existing methods and proposing an empirically more accurate estimator. Ensemble methods such as random forests, where multiple base learners are trained and aggregated for prediction purposes, serve as a running example throughout the paper because they are a natural and flexible application of $V$-statistics.
Submitted 6 May, 2020; v1 submitted 2 December, 2019;
originally announced December 2019.
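To make the distinction between $U$- and $V$-statistics in the abstract above concrete, the following sketch evaluates both for a fixed kernel of size $k = 2$. The kernel $h(x, y) = (x - y)^2 / 2$, whose expectation is the variance, and the four data points are illustrative assumptions. The growing-$k$ regime studied in the paper is not shown.

```python
from itertools import permutations, product

def kernel(x, y):
    """Symmetric kernel whose expectation is the variance."""
    return 0.5 * (x - y) ** 2

def u_statistic(data):
    """Average the kernel over ordered pairs of distinct indices."""
    n = len(data)
    pairs = permutations(range(n), 2)
    return sum(kernel(data[i], data[j]) for i, j in pairs) / (n * (n - 1))

def v_statistic(data):
    """Average the kernel over all index pairs, including i == j."""
    n = len(data)
    pairs = product(range(n), repeat=2)
    return sum(kernel(data[i], data[j]) for i, j in pairs) / n ** 2

data = [1.0, 2.0, 3.0, 4.0]
u_value = u_statistic(data)   # unbiased sample variance, 5/3
v_value = v_statistic(data)   # divide-by-n variance, 1.25
print(u_value, v_value)
```

The $V$-statistic includes the diagonal terms where both kernel arguments are the same observation, which corresponds to sampling with replacement. With this kernel it reproduces the divide-by-$n$ variance, while the $U$-statistic gives the divide-by-$(n-1)$ version.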
-
Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models
Authors:
Benjamin Lengerich,
Sarah Tan,
Chun-Hao Chang,
Giles Hooker,
Rich Caruana
Abstract:
Models which estimate main effects of individual variables alongside interaction effects have an identifiability challenge: effects can be freely moved between main effects and interaction effects without changing the model prediction. This is a critical problem for interpretability because it permits "contradictory" models to represent the same function. To solve this problem, we propose pure interaction effects: variance in the outcome which cannot be represented by any smaller subset of features. This definition has an equivalence with the Functional ANOVA decomposition. To compute this decomposition, we present a fast, exact algorithm that transforms any piecewise-constant function (such as a tree-based model) into a purified, canonical representation. We apply this algorithm to Generalized Additive Models with interactions trained on several datasets and show large disparity, including contradictions, between the effects before and after purification. These results underscore the need to specify data distributions and ensure identifiability before interpreting model parameters.
Submitted 1 May, 2020; v1 submitted 12 November, 2019;
originally announced November 2019.
-
Unrestricted Permutation forces Extrapolation: Variable Importance Requires at least One More Model, or There Is No Free Variable Importance
Authors:
Giles Hooker,
Lucas Mentch,
Siyu Zhou
Abstract:
This paper reviews and advocates against the use of permute-and-predict (PaP) methods for interpreting black box functions. Methods such as the variable importance measures proposed for random forests, partial dependence plots, and individual conditional expectation plots remain popular because they are both model-agnostic and depend only on the pre-trained model output, making them computationally efficient and widely available in software. However, numerous studies have found that these tools can produce diagnostics that are highly misleading, particularly when there is strong dependence among features. The purpose of our work here is to (i) review this growing body of literature, (ii) provide further demonstrations of these drawbacks along with a detailed explanation as to why they occur, and (iii) advocate for alternative measures that involve additional modeling. In particular, we describe how breaking dependencies between features in hold-out data places undue emphasis on sparse regions of the feature space by forcing the original model to extrapolate to regions where there is little to no data. We explore these effects across various model setups and find support for previous claims in the literature that PaP metrics can vastly over-emphasize correlated features in both variable importance measures and partial dependence plots. As an alternative, we discuss and recommend more direct approaches that involve measuring the change in model performance after muting the effects of the features under investigation.
Submitted 7 October, 2021; v1 submitted 1 May, 2019;
originally announced May 2019.
-
Tree Boosted Varying Coefficient Models
Authors:
Yichen Zhou,
Giles Hooker
Abstract:
This paper investigates the integration of gradient boosted decision trees and varying coefficient models. We introduce the tree boosted varying coefficient framework which justifies the implementation of decision tree boosting as the nonparametric effect modifiers in varying coefficient models. This framework requires no structural assumptions in the space containing the varying coefficient covariates, is easy to implement, and keeps a balance between model complexity and interpretability. To provide statistical guarantees, we prove the asymptotic consistency of the proposed method under the regression settings with $L^2$ loss. We further conduct a thorough empirical study to show that the proposed method is capable of providing accurate predictions as well as intelligible visual explanations.
Submitted 1 April, 2019;
originally announced April 2019.
-
Unbiased Measurement of Feature Importance in Tree-Based Methods
Authors:
Zhengze Zhou,
Giles Hooker
Abstract:
We propose a modification that corrects for split-improvement variable importance measures in Random Forests and other tree-based methods. These methods have been shown to be biased towards increasing the importance of features with more potential splits. We show that by appropriately incorporating split-improvement as measured on out of sample data, this bias can be corrected yielding better summaries and screening tools.
Submitted 23 March, 2020; v1 submitted 12 March, 2019;
originally announced March 2019.
-
The Jensen Effect and Functional Single Index Models: Estimating the Ecological Implications of Nonlinear Reaction Norms
Authors:
Zi Ye,
Giles Hooker,
Stephen Ellner
Abstract:
This paper develops tools to characterize how species are affected by environmental variability, based on a functional single index model relating a response such as growth rate or survival to environmental conditions. In ecology, the curvature of such responses is used, via Jensen's inequality, to determine whether environmental variability is harmful or beneficial, and differing nonlinear responses to environmental variability can contribute to the coexistence of competing species.
Here, we address estimation and inference for these models with observational data on individual responses to environmental conditions. Because nonparametric estimation of the curvature (second derivative) in a nonparametric functional single index model requires unrealistic sample sizes, we instead focus on directly estimating the effect of the nonlinearity, by comparing the average response to a variable environment with the response at the expected environment, which we call the Jensen Effect. We develop a test statistic to assess whether this effect is significantly different from zero. In doing so we re-interpret the SiZer method of Chaudhuri and Marron (1995) by maximizing a test statistic over smoothing parameters. We show that our proposed method works well both in simulations and on real ecological data from the long-term data set described in Drake (2005).
Submitted 16 December, 2019; v1 submitted 7 January, 2019;
originally announced January 2019.
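As a rough illustration of the quantity named in the abstract above, the Jensen Effect compares the average response over a variable environment with the response at the mean environment. The sketch below is a simplification, not the paper's estimator: it assumes a known scalar response function `g` and a scalar environment, whereas the paper works with an estimated link in a functional single index model. The names `jensen_effect`, `convex`, `concave`, and `temps` are hypothetical.

```python
from statistics import fmean


def jensen_effect(g, environments):
    """Mean response over the observed environments minus the response at their mean.

    Illustrative only: g is assumed known here, whereas in practice it
    would have to be estimated from data.
    """
    mean_env = fmean(environments)
    mean_response = fmean([g(x) for x in environments])
    return mean_response - g(mean_env)


# Hypothetical response curves and environmental values.
convex = lambda x: x ** 2        # variability helps: positive effect
concave = lambda x: -(x ** 2)    # variability hurts: negative effect
temps = [18.0, 20.0, 22.0, 24.0]

effect_convex = jensen_effect(convex, temps)
effect_concave = jensen_effect(concave, temps)
print(effect_convex, effect_concave)  # 5.0 -5.0
```

By Jensen's inequality the sign of the result follows the curvature of `g`, and a linear response gives exactly zero, which is why the abstract's test asks whether the effect differs significantly from zero.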
-
Asymptotic Properties for Methods Combining Minimum Hellinger Distance Estimates and Bayesian Nonparametric Density Estimates
Authors:
Yuefeng Wu,
Giles Hooker
Abstract:
In frequentist inference, minimizing the Hellinger distance between a kernel density estimate and a parametric family produces estimators that are both robust to outliers and statistically efficient when the parametric model is correct. This paper seeks to extend these results to the use of nonparametric Bayesian density estimators within disparity methods. We propose two estimators: one replaces the kernel density estimator with the expected posterior density from a random histogram prior; the other induces a posterior over parameters through the posterior for the random histogram. We show that it is possible to adapt the mathematical machinery of efficient influence functions from semiparametric models to demonstrate that both our estimators are efficient in the sense of achieving the Cramér-Rao lower bound. We further demonstrate a Bernstein-von-Mises result for our second estimator indicating that its posterior is asymptotically Gaussian. In addition, the robustness properties of classical minimum Hellinger distance estimators continue to hold.
Submitted 11 December, 2018; v1 submitted 18 October, 2018;
originally announced October 2018.
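For orientation, the classical estimator that the abstract above takes as its starting point can be written as follows. The notation is an assumption made for this note, not taken from the paper: $\hat f_n$ is a kernel density estimate with kernel $K$ and bandwidth $h_n$, and $\{f_\theta\}$ is the parametric family.

$$
\hat\theta_n = \arg\min_{\theta} \int \left( \sqrt{\hat f_n(x)} - \sqrt{f_\theta(x)} \right)^2 dx,
\qquad
\hat f_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} K\!\left( \frac{x - X_i}{h_n} \right).
$$

Some authors include a factor of $1/2$ in the squared Hellinger distance; this does not change the minimizer. The first estimator described in the abstract replaces $\hat f_n$ with an expected posterior density under a random histogram prior.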
-
Approximation Trees: Statistical Stability in Model Distillation
Authors:
Yichen Zhou,
Zhengze Zhou,
Giles Hooker
Abstract:
This paper examines the stability of learned explanations for black-box predictions via model distillation with decision trees. One approach to intelligibility in machine learning is to use an understandable `student' model to mimic the output of an accurate `teacher'. Here, we consider the use of regression trees as a student model, in which nodes of the tree can be used as `explanations' for particular predictions, and the whole structure of the tree can be used as a global representation of the resulting function. However, individual trees are sensitive to the particular data sets used to train them, and an interpretation of a student model may be suspect if small changes in the training data have a large effect on it. In this context, access to outcomes from a teacher helps to stabilize the greedy splitting strategy by generating a much larger corpus of training examples than was originally available. We develop tests to ensure that enough examples are generated at each split so that the same splitting rule would be chosen with high probability were the tree to be retrained. Further, we develop a stopping rule to indicate how deep the tree should be built based on recent results on the variability of Random Forests when these are used as the teacher. We provide concrete examples of these procedures on the CAD-MDD and COMPAS data sets.
Submitted 22 August, 2018;
originally announced August 2018.
-
Boulevard: Regularized Stochastic Gradient Boosted Trees and Their Limiting Distribution
Authors:
Yichen Zhou,
Giles Hooker
Abstract:
This paper examines a novel gradient boosting framework for regression. We regularize gradient boosted trees by introducing subsampling and employ a modified shrinkage algorithm so that at every boosting stage the estimate is given by an average of trees. The resulting algorithm, titled Boulevard, is shown to converge as the number of trees grows. We also demonstrate a central limit theorem for this limit, allowing a characterization of uncertainty for predictions. A simulation study and real world examples provide support for both the predictive accuracy of the model and its limiting behavior.
Submitted 13 September, 2019; v1 submitted 25 June, 2018;
originally announced June 2018.
-
Local Quadratic Estimation of the Curvature in a Functional Single Index Model
Authors:
Zi Ye,
Giles Hooker
Abstract:
The nonlinear effects of environmental variability on species abundance play an important role in the maintenance of ecological diversity. Nonetheless, many common models use parametric nonlinear terms pre-determining ecological conclusions. Motivated by this concern, we study the estimate of the second derivative (curvature) of the link function g in a functional single index model. Since the coefficient function and the link function are both unknown, the estimate is expressed as a nested optimization. For a fixed and unknown coefficient function, the link function and its second derivative are estimated by local quadratic approximation, then the coefficient function is estimated by minimizing the MSE of the model. In this paper, we derive the rate of convergence of the estimation. In addition, we prove that the argument of g can be estimated root-n consistently. However, practical implementation of the method requires solving a nonlinear optimization problem, and our results show that the estimates of the link function and the coefficient function are quite sensitive to the choices of starting values.
Submitted 25 March, 2018;
originally announced March 2018.
-
Boosting Random Forests to Reduce Bias; One-Step Boosted Forest and its Variance Estimate
Authors:
Indrayudh Ghosal,
Giles Hooker
Abstract:
In this paper we propose using the principle of boosting to reduce the bias of a random forest prediction in the regression setting. From the original random forest fit we extract the residuals and then fit another random forest to these residuals. We call the sum of these two random forests a \textit{one-step boosted forest}. We show with simulated and real data that the one-step boosted forest has a reduced bias compared to the original random forest. The paper also provides a variance estimate of the one-step boosted forest by an extension of the infinitesimal Jackknife estimator. Using this variance estimate we can construct prediction intervals for the boosted forest and we show that they have good coverage probabilities. Combining the bias reduction and the variance estimate we show that the one-step boosted forest has a significant reduction in predictive mean squared error and thus an improvement in predictive performance. When applied on datasets from the UCI database, one-step boosted forest performs better than random forest and gradient boosting machine algorithms. Theoretically we can also extend such a boosting process to more than one step and the same principles outlined in this paper can be used to find variance estimates for such predictors. Such boosting will reduce bias even further but it risks over-fitting and also increases the computational burden.
Submitted 22 April, 2020; v1 submitted 21 March, 2018;
originally announced March 2018.
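The two-stage recipe in the abstract above (fit a model, fit a second model to its residuals, sum the two) can be sketched in a few lines. This is a minimal illustration under a loud assumption: a Gaussian kernel smoother stands in for the random forest so that the example needs only the standard library, and it does not reproduce the paper's variance estimate. All names (`kernel_smoother`, `one_step_boost`, `mse`, the bandwidth, and the test function) are hypothetical.

```python
import math


def kernel_smoother(xs, ys, bandwidth):
    """Fit a Nadaraya-Watson smoother (a stand-in for a random forest)."""
    def predict(x0):
        weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
        return sum(w * y for w, y in zip(weights, ys)) / sum(weights)
    return predict


def one_step_boost(xs, ys, fit):
    """Fit a base model, fit a second model to its residuals, and sum them."""
    base = fit(xs, ys)
    residuals = [y - base(x) for x, y in zip(xs, ys)]
    correction = fit(xs, residuals)
    boosted = lambda x0: base(x0) + correction(x0)
    return base, boosted


def mse(predict, xs, targets):
    """Mean squared error of a predictor against target values."""
    return sum((predict(x) - t) ** 2 for x, t in zip(xs, targets)) / len(xs)


# Noise-free targets, so the error measured below is purely bias.
xs = [i * 0.1 for i in range(31)]
ys = [math.sin(2 * x) for x in xs]
fit = lambda a, b: kernel_smoother(a, b, bandwidth=0.3)

base, boosted = one_step_boost(xs, ys, fit)
base_mse = mse(base, xs, ys)
boosted_mse = mse(boosted, xs, ys)
print(base_mse, boosted_mse)
```

Because the smoother shrinks peaks and is biased at the boundaries, the residual fit recovers part of what the first fit missed. With noisy data the second stage also adds variance, which is the over-fitting risk the abstract mentions for further boosting steps.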
-
Considerations When Learning Additive Explanations for Black-Box Models
Authors:
Sarah Tan,
Giles Hooker,
Paul Koch,
Albert Gordo,
Rich Caruana
Abstract:
Many methods to explain black-box models, whether local or global, are additive. In this paper, we study global additive explanations for non-additive models, focusing on four explanation methods: partial dependence, Shapley explanations adapted to a global setting, distilled additive explanations, and gradient-based explanations. We show that different explanation methods characterize non-additive components in a black-box model's prediction function in different ways. We use the concepts of main and total effects to anchor additive explanations, and quantitatively evaluate additive and non-additive explanations. Even though distilled explanations are generally the most accurate additive explanations, non-additive explanations such as tree explanations that explicitly model non-additive components tend to be even more accurate. Despite this, our user study showed that machine learning practitioners were better able to leverage additive explanations for various tasks. These considerations should be taken into account when considering which explanation to trust and use to explain black-box models.
Submitted 31 July, 2023; v1 submitted 25 January, 2018;
originally announced January 2018.
-
A Double Parametric Bootstrap Test for Topic Models
Authors:
Skyler Seto,
Sarah Tan,
Giles Hooker,
Martin T. Wells
Abstract:
Non-negative matrix factorization (NMF) is a technique for finding latent representations of data. The method has been applied to corpora to construct topic models. However, NMF has likelihood assumptions which are often violated by real document corpora. We present a double parametric bootstrap test for evaluating the fit of an NMF-based topic model based on the duality of the KL divergence and Poisson maximum likelihood estimation. The test correctly identifies whether a topic model based on an NMF approach yields reliable results in simulated and real data.
Submitted 20 November, 2017; v1 submitted 19 November, 2017;
originally announced November 2017.
-
Statistical Inference on Tree Swallow Migrations with Random Forests
Authors:
Tim Coleman,
Lucas Mentch,
Daniel Fink,
Frank La Sorte,
Giles Hooker,
Wesley Hochachka,
David Winkler
Abstract:
Bird species' migratory patterns have typically been studied through individual observations and historical records. In recent years, however, the eBird citizen science project, which solicits observations from thousands of bird watchers around the world, has opened the door for a data-driven approach to understanding the large-scale geographical movements. Here, we focus on the North American Tree Swallow (\textit{Tachycineta bicolor}) occurrence patterns throughout the eastern United States. Migratory departure dates for this species are widely believed by both ornithologists and casual observers to vary substantially across years, but the reasons for this are largely unknown. In this work, we present evidence that maximum daily temperature is a major factor influencing Tree Swallow occurrence. Because it is generally understood that species occurrence is a function of many complex, high-order interactions between ecological covariates, we utilize the flexible modeling approach offered by random forests. Making use of recent asymptotic results, we provide formal hypothesis tests for the predictive significance of various covariates and also develop and implement a permutation-based approach for formally assessing interannual variations by treating the prediction surfaces generated by random forests as functional data. Each of these tests suggests that maximum daily temperature has a significant effect on migration patterns.
Submitted 8 November, 2019; v1 submitted 26 October, 2017;
originally announced October 2017.
-
Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation
Authors:
Sarah Tan,
Rich Caruana,
Giles Hooker,
Yin Lou
Abstract:
Black-box risk scoring models permeate our lives, yet are typically proprietary or opaque. We propose Distill-and-Compare, a model distillation and comparison approach to audit such models. To gain insight into black-box models, we treat them as teachers, training transparent student models to mimic the risk scores assigned by black-box models. We compare the student model trained with distillation to a second un-distilled transparent model trained on ground-truth outcomes, and use differences between the two models to gain insight into the black-box model. Our approach can be applied in a realistic setting, without probing the black-box model API. We demonstrate the approach on four public data sets: COMPAS, Stop-and-Frisk, Chicago Police, and Lending Club. We also propose a statistical test to determine if a data set is missing key features used to train the black-box model. Our test finds that the ProPublica data is likely missing key feature(s) used in COMPAS.
Submitted 11 October, 2018; v1 submitted 17 October, 2017;
originally announced October 2017.
-
Timing Observations of Diffusions
Authors:
Aurya Javeed,
Giles Hooker
Abstract:
This paper addresses a problem in experimental design: We consider Itô diffusions specified by some $θ\in \mathbb{R}$ and assume that we are allowed to observe their sample paths only $n$ times before a terminal time $τ< \infty$. We propose a policy for timing these observations to optimally estimate $θ$. Our policy is adaptive (meaning it leverages earlier observations), and it maximizes the expected Fisher information for $θ$ carried by the observations. In numerical studies, this design reduces the variation of estimated parameters by as much as 75% relative to observations spaced uniformly in time. The policy depends on the value of the parameter being estimated, so we also discuss strategies for incorporating Bayesian priors over $θ$.
Submitted 3 September, 2017;
originally announced September 2017.
-
Sparse Inverse Covariance Estimation for High-throughput microRNA Sequencing Data in the Poisson Log-Normal Graphical Model
Authors:
David Sinclair,
Giles Hooker
Abstract:
We introduce the Poisson Log-Normal Graphical Model for count data, and present a normality transformation for data arising from this distribution. The model and transformation are feasible for high-throughput microRNA (miRNA) sequencing data and directly account for known overdispersion relationships present in this data set. The model allows for network dependencies to be modeled, and we provide an algorithm which utilizes a one-step EM based result in order to allow for a provable increase in performance in determining the network structure. The model is shown to provide an increase in performance in simulation settings over a range of network structures. The model is applied to high-throughput miRNA sequencing data from patients with breast cancer from The Cancer Genome Atlas (TCGA). By selecting the most highly connected miRNA molecules in the fitted network we find that nearly all of them are known to be involved in the regulation of breast cancer.
Submitted 15 August, 2017;
originally announced August 2017.
-
An Expectation Maximization Algorithm for High-Dimensional Model Selection for the Ising Model with Misclassified States
Authors:
David G. Sinclair,
Giles Hooker
Abstract:
We propose the misclassified Ising Model: a framework for analyzing dependent binary data where the binary state is susceptible to error. We extend the theoretical results of the model selection method presented in Ravikumar et al. (2010) to show that the method will still correctly identify edges in the underlying graphical model under suitable misclassification settings. With knowledge of the misclassification process, an expectation maximization algorithm is developed that accounts for misclassification during model selection. We illustrate the increase of performance of the proposed expectation maximization algorithm with simulated data, and using data from a functional magnetic resonance imaging analysis.
Submitted 19 April, 2017;
originally announced April 2017.
-
Machine Learning and the Future of Realism
Authors:
Giles Hooker,
Cliff Hooker
Abstract:
The preceding three decades have seen the emergence, rise, and proliferation of machine learning (ML). From half-recognised beginnings in perceptrons, neural nets, and decision trees, algorithms that extract correlations (that is, patterns) from a set of data points have broken free from their origin in computational cognition to embrace all forms of problem solving, from voice recognition to medical diagnosis to automated scientific research and driverless cars, and it is now widely opined that the real industrial revolution lies less in mobile phones and the like than in the maturation and universal application of ML. Among the consequences just might be the triumph of anti-realism over realism.
Submitted 15 April, 2017;
originally announced April 2017.
-
Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable
Authors:
Sarah Tan,
Matvey Soloviev,
Giles Hooker,
Martin T. Wells
Abstract:
Ensembles of decision trees perform well on many problems, but are not interpretable. In contrast to existing approaches in interpretability that focus on explaining relationships between features and predictions, we propose an alternative approach to interpret tree ensemble classifiers by surfacing representative points for each class -- prototypes. We introduce a new distance for Gradient Boosted Tree models, and propose new, adaptive prototype selection methods with theoretical guarantees, with the flexibility to choose a different number of prototypes in each class. We demonstrate our methods on random forests and gradient boosted trees, showing that the prototypes can perform as well as or even better than the original tree ensemble when used as a nearest-prototype classifier. In a user study, humans were better at predicting the output of a tree ensemble classifier when using prototypes than when using Shapley values, a popular feature attribution method. Hence, prototypes present a viable alternative to feature-based explanations for tree ensembles.
Submitted 25 August, 2020; v1 submitted 21 November, 2016;
originally announced November 2016.
-
Interpreting Models via Single Tree Approximation
Authors:
Yichen Zhou,
Giles Hooker
Abstract:
We propose a procedure to build a decision tree which approximates the performance of complex machine learning models. This single approximation tree can be used to interpret and simplify the predicting pattern of random forests (RFs) and other models. The use of a tree structure is particularly relevant in medical questionnaires where it enables an adaptive shortening of the questionnaire, reducing response burden. We study the asymptotic behavior of splits and introduce an improved splitting method designed to stabilize tree structure. Empirical studies on both simulation and real data sets illustrate that our method can simultaneously achieve high approximation power and stability.
Submitted 27 October, 2016;
originally announced October 2016.
-
Bootstrap Bias Corrections for Ensemble Methods
Authors:
Giles Hooker,
Lucas Mentch
Abstract:
This paper examines the use of a residual bootstrap for bias correction in machine learning regression methods. Accounting for bias is an important obstacle in recent efforts to develop statistical inference for machine learning methods. We demonstrate empirically that the proposed bootstrap bias correction can lead to substantial improvements in both bias and predictive accuracy. In the context of ensembles of trees, we show that this correction can be approximated at only double the cost of training the original ensemble without introducing additional variance. Our method is shown to improve test-set accuracy over random forests by up to 70\% on example problems from the UCI repository.
Submitted 1 June, 2015;
originally announced June 2015.
-
Combining Functional Data Registration and Factor Analysis
Authors:
Cecilia Earls,
Giles Hooker
Abstract:
We extend the definition of functional data registration to encompass a larger class of registered functions. In contrast to traditional registration models, we allow for registered functions that have more than one primary direction of variation. The proposed Bayesian hierarchical model simultaneously registers the observed functions and estimates the two primary factors that characterize variation in the registered functions. Each registered function is assumed to be predominantly composed of a linear combination of these two primary factors, and the function-specific weights for each observation are estimated within the registration model. We show how these estimated weights can easily be used to classify functions after registration using both simulated data and a juggling data set.
Submitted 4 June, 2015; v1 submitted 2 February, 2015;
originally announced February 2015.
-
Adapted Variational Bayes for Functional Data Registration, Smoothing, and Prediction
Authors:
Cecilia Earls,
Giles Hooker
Abstract:
We propose a model for functional data registration that compares favorably to the best methods of functional data registration currently available. It also extends current inferential capabilities for unregistered data by providing a flexible probabilistic framework that 1) allows for functional prediction in the context of registration and 2) can be adapted to include smoothing and registration in one model. The proposed inferential framework is a Bayesian hierarchical model where the registered functions are modeled as Gaussian processes. To address the computational demands of inference in high-dimensional Bayesian models, we propose an adapted form of the variational Bayes algorithm for approximate inference that performs similarly to MCMC sampling methods for well-defined problems. The efficiency of the adapted variational Bayes (AVB) algorithm allows variability in a predicted registered, warping, and unregistered function to be depicted separately via bootstrapping. Temperature data related to the El Niño phenomenon is used to demonstrate the unique inferential capabilities for prediction provided by this model.
Submitted 3 June, 2016; v1 submitted 2 February, 2015;
originally announced February 2015.
-
Functional Principal Components Analysis of Spatially Correlated Data
Authors:
Chong Liu,
Surajit Ray,
Giles Hooker
Abstract:
This paper focuses on the analysis of spatially correlated functional data. The between-curve correlation is modeled by correlating functional principal component scores of the functional data. We propose a Spatial Principal Analysis by Conditional Expectation framework to explicitly estimate spatial correlations and reconstruct individual curves. This approach works even when the observed data per curve are sparse. Assuming spatial stationarity, empirical spatial correlations are calculated as the ratio of eigenvalues of the smoothed covariance surface $Cov(X_i(s),X_i(t))$ and cross-covariance surface $Cov(X_i(s), X_j(t))$ at locations indexed by $i$ and $j$. Then an anisotropic Matérn spatial correlation model is fit to empirical correlations. Finally, principal component scores are estimated to reconstruct the sparsely observed curves. This framework can naturally accommodate arbitrary covariance structures, but there is an enormous reduction in computation if one can assume the separability of temporal and spatial components. We propose hypothesis tests to examine the separability as well as the isotropy effect of spatial correlation. Simulation studies and applications of empirical data show improvements in the curve reconstruction using our framework over the method where curves are assumed to be independent. In addition, we show that the asymptotic properties of estimates in the uncorrelated case still hold in our case if 'mild' spatial correlation is assumed.
Submitted 17 November, 2014;
originally announced November 2014.
-
Maximal Autocorrelation Functions in Functional Data Analysis
Authors:
Giles Hooker,
Steven Roberts
Abstract:
This paper proposes a new factor rotation for the context of functional principal components analysis. This rotation seeks to re-represent a functional subspace in terms of directions of decreasing smoothness as represented by a generalized smoothing metric. The rotation can be implemented simply and we show on two examples that this rotation can improve the interpretability of the leading components.
Submitted 17 July, 2014;
originally announced July 2014.
-
Truncated Linear Models for Functional Data
Authors:
Peter Hall,
Giles Hooker
Abstract:
A conventional linear model for functional data involves expressing a response variable $Y$ in terms of the explanatory function $X(t)$, via the model: $Y=a+\int_I b(t)X(t)dt+\hbox{error}$, where $a$ is a scalar, $b$ is an unknown function and $I=[0, α]$ is a compact interval. However, in some problems the support of $b$ or $X$, $I_1$ say, is a proper and unknown subset of $I$, and is a quantity of particular practical interest. In this paper, motivated by a real-data example involving particulate emissions, we develop methods for estimating $I_1$. We give particular emphasis to the case $I_1=[0,θ]$, where $θ\in(0,α]$, and suggest two methods for estimating $a$, $b$ and $θ$ jointly; we introduce techniques for selecting tuning parameters; and we explore properties of our methodology using both simulation and the real-data example mentioned above. Additionally, we derive theoretical properties of the methodology, and discuss implications of the theory. Our theoretical arguments give particular emphasis to the problem of identifiability.
Submitted 30 June, 2014;
originally announced June 2014.
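As a rough illustration of the model in the abstract above, the sketch below evaluates $Y=a+\int_0^θ b(t)X(t)dt$ on a discrete grid with a Riemann sum. It is not the authors' estimation method: the function name `truncated_linear_response`, the equally spaced grid, the choice $b=\cos$, and the simulated curves are all assumptions made for the example.

```python
import numpy as np

def truncated_linear_response(X, t, a, b, theta):
    """Riemann-sum approximation of a + integral_0^theta b(t) X(t) dt, one value per curve.

    X     : array of shape (n_curves, n_grid), curves sampled on the grid t
    t     : equally spaced grid on I = [0, alpha] (assumption of this sketch)
    b     : callable coefficient function, applied elementwise to the grid
    theta : truncation point, so only grid points with t <= theta contribute
    """
    dt = t[1] - t[0]            # constant grid spacing
    mask = t <= theta           # restrict the integral to I_1 = [0, theta]
    return a + (X[:, mask] * b(t[mask])).sum(axis=1) * dt

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)          # grid on I = [0, 1], i.e. alpha = 1
X = rng.normal(size=(50, t.size))       # 50 hypothetical covariate curves

y_full = truncated_linear_response(X, t, a=1.0, b=np.cos, theta=1.0)
y_trunc = truncated_linear_response(X, t, a=1.0, b=np.cos, theta=0.4)
```

Comparing `y_full` with `y_trunc` shows how the noise-free response changes when the part of $X$ beyond $θ$ is ignored, which is the quantity $I_1$ the paper sets out to estimate.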
-
Formal Hypothesis Tests for Additive Structure in Random Forests
Authors:
Lucas Mentch,
Giles Hooker
Abstract:
While statistical learning methods have proved powerful tools for predictive modeling, the black-box nature of the models they produce can severely limit their interpretability and the ability to conduct formal inference. However, the natural structure of ensemble learners like bagged trees and random forests has been shown to admit desirable asymptotic properties when base learners are built with proper subsamples. In this work, we demonstrate that by defining an appropriate grid structure on the covariate space, we may carry out formal hypothesis tests for both variable importance and underlying additive model structure. To our knowledge, these tests represent the first statistical tools for investigating the underlying regression structure in a context such as random forests. We develop notions of total and partial additivity and further demonstrate that testing can be carried out at no additional computational cost by estimating the variance within the process of constructing the ensemble. Furthermore, we propose a novel extension of these testing procedures utilizing random projections in order to allow for computationally efficient testing procedures that retain high power even when the grid size is much larger than that of the training set.
Submitted 26 August, 2016; v1 submitted 6 June, 2014;
originally announced June 2014.
-
Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests
Authors:
Lucas Mentch,
Giles Hooker
Abstract:
This work develops formal statistical inference procedures for machine learning ensemble methods. Ensemble methods based on bootstrapping, such as bagging and random forests, have improved the predictive accuracy of individual trees, but fail to provide a framework in which distributional results can be easily determined. Instead of aggregating full bootstrap samples, we consider predicting by averaging over trees built on subsamples of the training set and demonstrate that the resulting estimator takes the form of a U-statistic. As such, predictions for individual feature vectors are asymptotically normal, allowing for confidence intervals to accompany predictions. In practice, a subset of subsamples is used for computational speed; here our estimators take the form of incomplete U-statistics and equivalent results are derived. We further demonstrate that this setup provides a framework for testing the significance of features. Moreover, the internal estimation method we develop allows us to estimate the variance parameters and perform these inference procedures at no additional computational cost. Simulations and illustrations on a real dataset are provided.
Submitted 10 September, 2015; v1 submitted 25 April, 2014;
originally announced April 2014.
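The subsample-averaging idea in the abstract above can be sketched as follows. This is only a toy version under stated assumptions: a straight-line least-squares fit stands in for the tree base learner, the function name `subsample_ensemble_predict` is hypothetical, and the paper's variance estimation and confidence intervals are not reproduced. Averaging the base learner over a random subset of size-`k` subsamples drawn without replacement gives an estimator with the form of an incomplete U-statistic.

```python
import numpy as np

def subsample_ensemble_predict(x, y, x0, k, n_subsamples, rng):
    """Average a base learner's prediction at x0 over random subsamples of size k.

    Each subsample is drawn without replacement, so the average over a random
    subset of all size-k subsamples has the form of an incomplete U-statistic.
    The base learner here is a least-squares line (a stand-in for a tree).
    """
    preds = np.empty(n_subsamples)
    for i in range(n_subsamples):
        idx = rng.choice(x.size, size=k, replace=False)
        slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
        preds[i] = slope * x0 + intercept
    return preds.mean()

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)   # simulated training data

prediction = subsample_ensemble_predict(x, y, x0=0.5, k=20, n_subsamples=100, rng=rng)
```

Using subsamples much smaller than the training set, rather than full bootstrap samples, is the design choice that the abstract credits for the asymptotic normality of the predictions.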
-
Goodness of fit in nonlinear dynamics: Misspecified rates or misspecified states?
Authors:
Giles Hooker,
Stephen P. Ellner
Abstract:
This paper introduces diagnostic tests for the nature of lack of fit in ordinary differential equation models (ODEs) proposed for data. We present a hierarchy of three possible sources of lack of fit: unaccounted-for stochastic variation, misspecification of functional forms in rate equations, and omission of dynamic variables in the description of the system. We represent lack of fit by allowing a parameter vector to vary over time, and propose generic testing procedures that do not rely on specific alternative models. Instead, different sources for lack of fit are characterized in terms of nonparametric relationships among latent variables. The tests are carried out through a combination of residual bootstrap and permutation methods. We demonstrate the effectiveness of these tests on simulated data and on real data from laboratory ecological experiments and electro-cardiogram data.
Submitted 17 September, 2015; v1 submitted 1 December, 2013;
originally announced December 2013.
-
Restricted Likelihood Ratio Tests for Linearity in Scalar-on-Function Regression
Authors:
Mathew W. McLean,
Giles Hooker,
David Ruppert
Abstract:
We propose a procedure for testing the linearity of a scalar-on-function regression relationship. To do so, we use the functional generalized additive model (FGAM), a recently developed extension of the functional linear model. For a functional covariate X(t), the FGAM models the mean response as the integral with respect to t of F{X(t),t} where F is an unknown bivariate function. The FGAM can be viewed as the natural functional extension of generalized additive models. We show how the functional linear model can be represented as a simple mixed model nested within the FGAM. Using this representation, we then consider restricted likelihood ratio tests for zero variance components in mixed models to test the null hypothesis that the functional linear model holds. The methods are general and can also be applied to testing for interactions in a multivariate additive model or for testing for no effect in the functional linear model. The performance of the proposed tests is assessed on simulated data and in an application to measuring diesel truck emissions, where strong evidence of nonlinearities in the relationship between the functional predictor and the response is found.
Submitted 22 October, 2013;
originally announced October 2013.
-
Hellinger Distance and Bayesian Non-Parametrics: Hierarchical Models for Robust and Efficient Bayesian Inference
Authors:
Yuefeng Wu,
Giles Hooker
Abstract:
This paper introduces a hierarchical framework to incorporate Hellinger distance methods into Bayesian analysis. We propose to modify a prior over non-parametric densities with the exponential of twice the Hellinger distance between a candidate and a parametric density. By incorporating a prior over the parameters of the second density, we arrive at a hierarchical model in which a non-parametric model is placed between parameters and the data. The parameters of the family can then be estimated as hyperparameters in the model. In frequentist estimation, minimizing the Hellinger distance between a kernel density estimate and a parametric family has been shown to produce estimators that are both robust to outliers and statistically efficient when the parametric model is correct. In this paper, we demonstrate that the same results are applicable when a non-parametric Bayes density estimate replaces the kernel density estimate. We then demonstrate that robustness and efficiency also hold for the proposed hierarchical model. The finite-sample behavior of the resulting estimates is investigated by simulation and on real world data.
Submitted 26 September, 2013;
originally announced September 2013.
-
On the Identifiability of the Functional Convolution Model
Authors:
Giles Hooker
Abstract:
This report details conditions under which the Functional Convolution Model described in \citet{AHG13} can be identified from Ordinary Least Squares estimates without either dimension reduction or smoothing penalties. We demonstrate that if the covariate functions are not spanned by the space of solutions to linear differential equations, the functional coefficients in the model are uniquely determined in the Sobolev space of functions with absolutely continuous second derivatives.
Submitted 9 September, 2013;
originally announced September 2013.
-
Consistency, efficiency and robustness of conditional disparity methods
Authors:
Giles Hooker
Abstract:
This paper considers extensions of minimum-disparity estimators to the problem of estimating parameters in a regression model that is conditionally specified; that is, where a parametric model describes the distribution of a response $y$ conditional on covariates $x$ but does not specify the distribution of $x$. We define these estimators by constructing a non-parametric conditional density estimate and minimizing a disparity between this estimate and the parametric model averaged over values of $x$. The consistency and asymptotic normality of such estimators are demonstrated for a broad class of models in which response and covariate vectors can take both discrete and continuous values and which incorporates a wide set of choices for kernel-based conditional density estimation. We also establish the robustness of these estimators for a broad class of disparities. As has been observed in Tamura and Boos (J. Amer. Statist. Assoc. 81 (1986) 223--229), minimum disparity estimators incorporating kernel density estimates of more than one dimension can result in an asymptotic bias that is larger than $n^{-1/2}$. We characterize a similar bias in our results and show that in specialized cases it can be eliminated by appropriately centering the kernel density estimate. We also demonstrate empirically that bootstrap methods can be employed to reduce this bias and to provide robust confidence intervals. In order to demonstrate these results, we establish a set of $L_1$-consistency results for kernel-based estimates of centered conditional densities.
Submitted 9 February, 2016; v1 submitted 14 July, 2013;
originally announced July 2013.
-
Bayesian Functional Generalized Additive Models with Sparsely Observed Covariates
Authors:
Mathew W. McLean,
Fabian Scheipl,
Giles Hooker,
Sonja Greven,
David Ruppert
Abstract:
The functional generalized additive model (FGAM) was recently proposed in McLean et al. (2013) as a more flexible alternative to the common functional linear model (FLM) for regressing a scalar on functional covariates. In this paper, we develop a Bayesian version of FGAM for the case of Gaussian errors with identity link function. Our approach allows the functional covariates to be sparsely observed and measured with error, whereas the estimation procedure of McLean et al. (2013) required that they be noiselessly observed on a regular grid. We consider both Monte Carlo and variational Bayes methods for fitting the FGAM with sparsely observed covariates. Due to the complicated form of the model posterior distribution and full conditional distributions, standard Monte Carlo and variational Bayes algorithms cannot be used. The strategies we use to handle the updating of parameters without closed-form full conditionals should be of independent interest to applied Bayesian statisticians working with nonconjugate models. Our numerical studies demonstrate the benefits of our algorithms over a two-step approach of first recovering the complete trajectories using standard techniques and then fitting a functional regression model. In a real data analysis, our methods are applied to forecasting closing price for items up for auction on the online auction website eBay.
Submitted 26 May, 2017; v1 submitted 15 May, 2013;
originally announced May 2013.
-
Control Theory and Experimental Design in Diffusion Processes
Authors:
Giles Hooker,
Kevin K. Lin,
Bruce Rogers
Abstract:
This paper considers the problem of designing time-dependent, real-time control policies for controllable nonlinear diffusion processes, with the goal of obtaining maximally-informative observations about parameters of interest. More precisely, we maximize the expected Fisher information for the parameter obtained over the duration of the experiment, conditional on observations made up to that time. We propose to accomplish this with a two-step strategy: when the full state vector of the diffusion process is observable continuously, we formulate this as an optimal control problem and apply numerical techniques from stochastic optimal control to solve it. When observations are incomplete, infrequent, or noisy, we propose using standard filtering techniques to first estimate the state of the system, then apply the optimal control policy using the posterior expectation of the state. We assess the effectiveness of these methods in 3 situations: a paradigmatic bistable model from statistical physics, a model of action potential generation in neurons, and a model of a simple ecological system.
Submitted 15 October, 2014; v1 submitted 13 October, 2012;
originally announced October 2012.