Showing 1–50 of 53 results for author: Hooker, G

  1. arXiv:2407.03085  [pdf, other]

    stat.ME stat.CO stat.ML

    Accelerated Inference for Partially Observed Markov Processes using Automatic Differentiation

    Authors: Kevin Tan, Giles Hooker, Edward L. Ionides

    Abstract: Automatic differentiation (AD) has driven recent advances in machine learning, including deep neural networks and Hamiltonian Markov Chain Monte Carlo methods. Partially observed nonlinear stochastic dynamical systems have proved resistant to AD techniques because widely used particle filter algorithms yield an estimated likelihood function that is discontinuous as a function of the model paramete…

    Submitted 3 July, 2024; originally announced July 2024.

  2. arXiv:2406.18484  [pdf, other]

    stat.ME

    An Understanding of Principal Differential Analysis

    Authors: Edward Gunning, Giles Hooker

    Abstract: In functional data analysis, replicate observations of a smooth functional process and its derivatives offer a unique opportunity to flexibly estimate continuous-time ordinary differential equation models. Ramsay (1996) first proposed to estimate a linear ordinary differential equation from functional data in a technique called Principal Differential Analysis, by formulating a functional regressio…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Main text: 34 pages, 16 figures. Appendix: 34 pages, 18 figures. References: 3 pages

  3. arXiv:2406.09699  [pdf, other]

    math.NA math.DS physics.comp-ph stat.ML

    Differentiable Programming for Differential Equations: A Review

    Authors: Facundo Sapienza, Jordi Bolibar, Frank Schäfer, Brian Groenke, Avik Pal, Victor Boussange, Patrick Heimbach, Giles Hooker, Fernando Pérez, Per-Olof Persson, Christopher Rackauckas

    Abstract: The differentiable programming paradigm is a cornerstone of modern scientific computing. It refers to numerical methods for computing the gradient of a numerical model's output. Many scientific models are based on differential equations, where differentiable programming plays a crucial role in calculating model sensitivities, inverting model parameters, and training hybrid models that combine diff…

    Submitted 13 June, 2024; originally announced June 2024.

    MSC Class: 34-04; 49K40; 65D25; 65L09; 65M32; 86A22; 90C31

  4. arXiv:2404.18702  [pdf, other]

    cs.LG cs.CR stat.AP stat.ML

    Why You Should Not Trust Interpretations in Machine Learning: Adversarial Attacks on Partial Dependence Plots

    Authors: Xi Xin, Giles Hooker, Fei Huang

    Abstract: The adoption of artificial intelligence (AI) across industries has led to the widespread use of complex black-box models and interpretation tools for decision making. This paper proposes an adversarial framework to uncover the vulnerability of permutation-based interpretation methods for machine learning tasks, with a particular focus on partial dependence (PD) plots. This adversarial framework mo…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  5. arXiv:2403.00105  [pdf, other]

    cs.LG cs.CY

    Longitudinal Counterfactuals: Constraints and Opportunities

    Authors: Alexander Asemota, Giles Hooker

    Abstract: Counterfactual explanations are a common approach to providing recourse to data subjects. However, current methodology can produce counterfactuals that cannot be achieved by the subject, making the use of counterfactuals for recourse difficult to justify in practice. Though there is agreement that plausibility is an important quality when using counterfactuals for algorithmic recourse, ground trut…

    Submitted 29 February, 2024; originally announced March 2024.

  6. arXiv:2401.15800  [pdf, other]

    stat.ML cs.LG

    Provably Stable Feature Rankings with SHAP and LIME

    Authors: Jeremy Goldwasser, Giles Hooker

    Abstract: Feature attributions are ubiquitous tools for understanding the predictions of machine learning models. However, the calculation of popular methods for scoring input variables such as SHAP and LIME suffers from high instability due to random sampling. Leveraging ideas from multiple hypothesis testing, we devise attribution methods that ensure the most important features are ranked correctly with h…

    Submitted 2 June, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  7. arXiv:2310.07672  [pdf, other]

    stat.ML cs.LG

    Stabilizing Estimates of Shapley Values with Control Variates

    Authors: Jeremy Goldwasser, Giles Hooker

    Abstract: Shapley values are among the most popular tools for explaining predictions of blackbox machine learning models. However, their high computational cost motivates the use of sampling approximations, inducing a considerable degree of uncertainty. To stabilize these model explanations, we propose ControlSHAP, an approach based on the Monte Carlo technique of control variates. Our methodology is applic…

    Submitted 9 April, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

  8. arXiv:2211.12631  [pdf, other]

    stat.ML cs.LG

    A Generic Approach for Reproducible Model Distillation

    Authors: Yunzhe Zhou, Peiru Xu, Giles Hooker

    Abstract: Model distillation has been a popular method for producing interpretable machine learning. It uses an interpretable "student" model to mimic the predictions made by the black box "teacher" model. However, when the student model is sensitive to the variability of the data sets used for training even when keeping the teacher fixed, the corresponding interpretation is not reliable. Existing strategies…

    Submitted 27 April, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: 31 pages, 8 figures

  9. arXiv:2209.00147  [pdf, other]

    stat.ML cs.LG stat.ME

    The Infinitesimal Jackknife and Combinations of Models

    Authors: Indrayudh Ghosal, Yunzhe Zhou, Giles Hooker

    Abstract: The Infinitesimal Jackknife is a general method for estimating variances of parametric models, and more recently also for some ensemble methods. In this paper we extend the Infinitesimal Jackknife to estimate the covariance between any two models. This can be used to quantify uncertainty for combinations of models, or to construct test statistics for comparing different models or ensembles of mode…

    Submitted 31 August, 2022; originally announced September 2022.

    Comments: 47 pages, 11 figures

  10. S-LIME: Stabilized-LIME for Model Explanation

    Authors: Zhengze Zhou, Giles Hooker, Fei Wang

    Abstract: An increasing number of machine learning models have been deployed in domains with high stakes such as finance and healthcare. Despite their superior performances, many models are black boxes in nature which are hard to explain. There are growing efforts for researchers to develop methods to interpret these black-box models. Post hoc explanations based on perturbations, such as LIME, are widely us…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21), August 14–18, 2021, Virtual Event, Singapore

  11. arXiv:2102.12561  [pdf, other]

    stat.ME stat.ML

    Generalised Boosted Forests

    Authors: Indrayudh Ghosal, Giles Hooker

    Abstract: This paper extends recent work on boosting random forests to model non-Gaussian responses. Given an exponential family $\mathbb{E}[Y|X] = g^{-1}(f(X))$ our goal is to obtain an estimate for $f$. We start with an MLE-type estimate in the link space and then define generalised residuals from it. We use these residuals and some corresponding weights to fit a base random forest and then repeat the sam…

    Submitted 2 March, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

    Comments: Paper: 14 pages, 4 figures, 3 tables; Appendix: 34 pages, 28 figures, 1 table

  12. arXiv:2102.12328  [pdf, ps, other]

    stat.OT cs.LG

    Bridging Breiman's Brook: From Algorithmic Modeling to Statistical Learning

    Authors: Lucas Mentch, Giles Hooker

    Abstract: In 2001, Leo Breiman wrote of a divide between "data modeling" and "algorithmic modeling" cultures. Twenty years later this division feels far more ephemeral, both in terms of assigning individuals to camps, and in terms of intellectual boundaries. We argue that this is largely due to the "data modelers" incorporating algorithmic methods into their toolbox, particularly driven by recent developmen…

    Submitted 22 February, 2021; originally announced February 2021.

    Comments: In response to the Journal of Observational Studies reprinting Leo Breiman's paper "Statistical Modeling: The Two Cultures" on its 20th anniversary

  13. arXiv:2011.13057  [pdf, ps, other]

    stat.ME stat.CO

    Generalized Single Index Models and Jensen Effects on Reproduction and Survival

    Authors: Zi Ye, Giles Hooker, Stephen P. Ellner

    Abstract: Environmental variability often has substantial impacts on natural populations and communities through its effects on the performance of individuals. Because organisms' responses to environmental conditions are often nonlinear (e.g., decreasing performance on both sides of an optimal temperature), the mean response is often different from the response in the mean environment. Ye et al. 2020, prop…

    Submitted 25 November, 2020; originally announced November 2020.

    MSC Class: 62G08

  14. arXiv:2008.07859  [pdf, other]

    stat.ME stat.CO

    Selecting the Derivative of a Functional Covariate in Scalar-on-Function Regression

    Authors: Giles Hooker, Hanlin Shang

    Abstract: This paper presents tests to formally choose between regression models using different derivatives of a functional covariate in scalar-on-function regression. We demonstrate that for linear regression, models using different derivatives can be nested within a model that includes point-impact effects at the end-points of the observed functions. Contrasts can then be employed to test the specificati…

    Submitted 18 August, 2020; originally announced August 2020.

  15. arXiv:1912.01089  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    $V$-statistics and Variance Estimation

    Authors: Zhengze Zhou, Lucas Mentch, Giles Hooker

    Abstract: This paper develops a general framework for analyzing asymptotics of $V$-statistics. Previous literature on limiting distribution mainly focuses on the cases when $n \to \infty$ with fixed kernel size $k$. Under some regularity conditions, we demonstrate asymptotic normality when $k$ grows with $n$ by utilizing existing results for $U$-statistics. The key in our approach lies in a mathematical red…

    Submitted 6 May, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

    Comments: This version supersedes the previous technical report titled "Asymptotic Normality and Variance Estimation For Supervised Ensembles". Extensive simulations are added and we also provide a more detailed discussion on the bias phenomenon in variance estimation

  16. arXiv:1911.04974  [pdf, other]

    stat.ML cs.AI cs.LG stat.ME

    Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models

    Authors: Benjamin Lengerich, Sarah Tan, Chun-Hao Chang, Giles Hooker, Rich Caruana

    Abstract: Models which estimate main effects of individual variables alongside interaction effects have an identifiability challenge: effects can be freely moved between main effects and interaction effects without changing the model prediction. This is a critical problem for interpretability because it permits "contradictory" models to represent the same function. To solve this problem, we propose pure int…

    Submitted 1 May, 2020; v1 submitted 12 November, 2019; originally announced November 2019.

    Comments: AISTATS 2020

  17. arXiv:1905.03151  [pdf, other]

    stat.ME cs.LG stat.ML

    Unrestricted Permutation forces Extrapolation: Variable Importance Requires at least One More Model, or There Is No Free Variable Importance

    Authors: Giles Hooker, Lucas Mentch, Siyu Zhou

    Abstract: This paper reviews and advocates against the use of permute-and-predict (PaP) methods for interpreting black box functions. Methods such as the variable importance measures proposed for random forests, partial dependence plots, and individual conditional expectation plots remain popular because they are both model-agnostic and depend only on the pre-trained model output, making them computationall…

    Submitted 7 October, 2021; v1 submitted 1 May, 2019; originally announced May 2019.

    MSC Class: 62G08; ACM Class: I.5.1

  18. arXiv:1904.01058  [pdf, other]

    stat.ME

    Tree Boosted Varying Coefficient Models

    Authors: Yichen Zhou, Giles Hooker

    Abstract: This paper investigates the integration of gradient boosted decision trees and varying coefficient models. We introduce the tree boosted varying coefficient framework which justifies the implementation of decision tree boosting as the nonparametric effect modifiers in varying coefficient models. This framework requires no structural assumptions in the space containing the varying coefficient covar…

    Submitted 1 April, 2019; originally announced April 2019.

  19. arXiv:1903.05179  [pdf, other]

    stat.ML cs.LG

    Unbiased Measurement of Feature Importance in Tree-Based Methods

    Authors: Zhengze Zhou, Giles Hooker

    Abstract: We propose a modification that corrects for split-improvement variable importance measures in Random Forests and other tree-based methods. These methods have been shown to be biased towards increasing the importance of features with more potential splits. We show that by appropriately incorporating split-improvement as measured on out of sample data, this bias can be corrected yielding better summ…

    Submitted 23 March, 2020; v1 submitted 12 March, 2019; originally announced March 2019.

    Comments: add Section 3.4 to compare with other methods for dealing with similar bias; add more simulation results in Section 5; add link to Github repository for code access

  20. arXiv:1901.01864  [pdf, other]

    stat.ME

    The Jensen Effect and Functional Single Index Models: Estimating the Ecological Implications of Nonlinear Reaction Norms

    Authors: Zi Ye, Giles Hooker, Stephen Ellner

    Abstract: This paper develops tools to characterize how species are affected by environmental variability, based on a functional single index model relating a response such as growth rate or survival to environmental conditions. In ecology, the curvature of such responses is used, via Jensen's inequality, to determine whether environmental variability is harmful or beneficial, and differing nonlinear respo…

    Submitted 16 December, 2019; v1 submitted 7 January, 2019; originally announced January 2019.

  21. Asymptotic Properties for Methods Combining Minimum Hellinger Distance Estimates and Bayesian Nonparametric Density Estimates

    Authors: Yuefeng Wu, Giles Hooker

    Abstract: In frequentist inference, minimizing the Hellinger distance between a kernel density estimate and a parametric family produces estimators that are both robust to outliers and statistically efficient when the parametric model is correct. This paper seeks to extend these results to the use of nonparametric Bayesian density estimators within disparity methods. We propose two estimators: one replaces…

    Submitted 11 December, 2018; v1 submitted 18 October, 2018; originally announced October 2018.

  22. arXiv:1808.07573  [pdf, other]

    stat.ML cs.LG

    Approximation Trees: Statistical Stability in Model Distillation

    Authors: Yichen Zhou, Zhengze Zhou, Giles Hooker

    Abstract: This paper examines the stability of learned explanations for black-box predictions via model distillation with decision trees. One approach to intelligibility in machine learning is to use an understandable "student" model to mimic the output of an accurate "teacher". Here, we consider the use of regression trees as a student model, in which nodes of the tree can be used as "explanations" for par…

    Submitted 22 August, 2018; originally announced August 2018.

    Comments: This paper supersedes arXiv:1610.09036

  23. arXiv:1806.09762  [pdf, other]

    stat.ME math.ST stat.ML

    Boulevard: Regularized Stochastic Gradient Boosted Trees and Their Limiting Distribution

    Authors: Yichen Zhou, Giles Hooker

    Abstract: This paper examines a novel gradient boosting framework for regression. We regularize gradient boosted trees by introducing subsampling and employ a modified shrinkage algorithm so that at every boosting stage the estimate is given by an average of trees. The resulting algorithm, titled Boulevard, is shown to converge as the number of trees grows. We also demonstrate a central limit theorem for th…

    Submitted 13 September, 2019; v1 submitted 25 June, 2018; originally announced June 2018.

    Comments: 45 pages, 7 figures

  24. arXiv:1803.09321  [pdf, ps, other]

    math.ST stat.ME

    Local Quadratic Estimation of the Curvature in a Functional Single Index Model

    Authors: Zi Ye, Giles Hooker

    Abstract: The nonlinear effects of environmental variability on species abundance play an important role in the maintenance of ecological diversity. Nonetheless, many common models use parametric nonlinear terms pre-determining ecological conclusions. Motivated by this concern, we study the estimate of the second derivative (curvature) of the link function g in a functional single index model. Since the co…

    Submitted 25 March, 2018; originally announced March 2018.

  25. arXiv:1803.08000  [pdf, other]

    stat.ML cs.LG stat.ME

    Boosting Random Forests to Reduce Bias; One-Step Boosted Forest and its Variance Estimate

    Authors: Indrayudh Ghosal, Giles Hooker

    Abstract: In this paper we propose using the principle of boosting to reduce the bias of a random forest prediction in the regression setting. From the original random forest fit we extract the residuals and then fit another random forest to these residuals. We call the sum of these two random forests a one-step boosted forest. We show with simulated and real data that the one-step boosted forest h…

    Submitted 22 April, 2020; v1 submitted 21 March, 2018; originally announced March 2018.

    Comments: 39 pages, 7 tables, 3 figures

  26. Considerations When Learning Additive Explanations for Black-Box Models

    Authors: Sarah Tan, Giles Hooker, Paul Koch, Albert Gordo, Rich Caruana

    Abstract: Many methods to explain black-box models, whether local or global, are additive. In this paper, we study global additive explanations for non-additive models, focusing on four explanation methods: partial dependence, Shapley explanations adapted to a global setting, distilled additive explanations, and gradient-based explanations. We show that different explanation methods characterize non-additiv…

    Submitted 31 July, 2023; v1 submitted 25 January, 2018; originally announced January 2018.

    Comments: Published at Machine Learning (2023). Previously titled "Learning Global Additive Explanations for Neural Nets Using Model Distillation". A short version was presented at NeurIPS 2018 Machine Learning for Health Workshop

  27. arXiv:1711.07104  [pdf, other]

    stat.ML

    A Double Parametric Bootstrap Test for Topic Models

    Authors: Skyler Seto, Sarah Tan, Giles Hooker, Martin T. Wells

    Abstract: Non-negative matrix factorization (NMF) is a technique for finding latent representations of data. The method has been applied to corpora to construct topic models. However, NMF has likelihood assumptions which are often violated by real document corpora. We present a double parametric bootstrap test for evaluating the fit of an NMF-based topic model based on the duality of the KL divergence and P…

    Submitted 20 November, 2017; v1 submitted 19 November, 2017; originally announced November 2017.

    Comments: Presented at NIPS 2017 Symposium on Interpretable Machine Learning

  28. arXiv:1710.09793  [pdf, other]

    q-bio.PE stat.AP

    Statistical Inference on Tree Swallow Migrations with Random Forests

    Authors: Tim Coleman, Lucas Mentch, Daniel Fink, Frank La Sorte, Giles Hooker, Wesley Hochachka, David Winkler

    Abstract: Bird species' migratory patterns have typically been studied through individual observations and historical records. In recent years however, the eBird citizen science project, which solicits observations from thousands of bird watchers around the world, has opened the door for a data-driven approach to understanding the large-scale geographical movements. Here, we focus on the North American Tree…

    Submitted 8 November, 2019; v1 submitted 26 October, 2017; originally announced October 2017.

    Comments: 23 pages, 7 figures. Work between Cornell Lab of Ornithology and University of Pittsburgh Department of Statistics

  29. arXiv:1710.06169  [pdf, other]

    stat.ML cs.AI cs.LG

    Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation

    Authors: Sarah Tan, Rich Caruana, Giles Hooker, Yin Lou

    Abstract: Black-box risk scoring models permeate our lives, yet are typically proprietary or opaque. We propose Distill-and-Compare, a model distillation and comparison approach to audit such models. To gain insight into black-box models, we treat them as teachers, training transparent student models to mimic the risk scores assigned by black-box models. We compare the student model trained with distillatio…

    Submitted 11 October, 2018; v1 submitted 17 October, 2017; originally announced October 2017.

    Comments: Camera-ready version for AAAI/ACM AIES 2018. Data and pseudocode at https://github.com/shftan/auditblackbox. Previously titled "Detecting Bias in Black-Box Models Using Transparent Model Distillation". A short version was presented at NIPS 2017 Symposium on Interpretable Machine Learning

  30. arXiv:1709.00771  [pdf, ps, other]

    stat.ME

    Timing Observations of Diffusions

    Authors: Aurya Javeed, Giles Hooker

    Abstract: This paper addresses a problem in experimental design: We consider Itô diffusions specified by some $\theta \in \mathbb{R}$ and assume that we are allowed to observe their sample paths only $n$ times before a terminal time $\tau < \infty$. We propose a policy for timing these observations to optimally estimate $\theta$. Our policy is adaptive (meaning it leverages earlier observations), and it maximizes the expe…

    Submitted 3 September, 2017; originally announced September 2017.

  31. arXiv:1708.04490  [pdf, other]

    stat.CO stat.AP stat.ME

    Sparse Inverse Covariance Estimation for High-throughput microRNA Sequencing Data in the Poisson Log-Normal Graphical Model

    Authors: David Sinclair, Giles Hooker

    Abstract: We introduce the Poisson Log-Normal Graphical Model for count data, and present a normality transformation for data arising from this distribution. The model and transformation are feasible for high-throughput microRNA (miRNA) sequencing data and directly account for known overdispersion relationships present in this data set. The model allows for network dependencies to be modeled, and we provide…

    Submitted 15 August, 2017; originally announced August 2017.

    Comments: 21 pages, 3 figures

  32. arXiv:1704.05995  [pdf, other]

    stat.ME

    An Expectation Maximization Algorithm for High-Dimensional Model Selection for the Ising Model with Misclassified States

    Authors: David G. Sinclair, Giles Hooker

    Abstract: We propose the misclassified Ising Model, a framework for analyzing dependent binary data where the binary state is susceptible to error. We extend the theoretical results of the model selection method presented in Ravikumar et al. (2010) to show that the method will still correctly identify edges in the underlying graphical model under suitable misclassification settings. With knowledge of the m…

    Submitted 19 April, 2017; originally announced April 2017.

  33. arXiv:1704.04688  [pdf]

    stat.ML cs.LG

    Machine Learning and the Future of Realism

    Authors: Giles Hooker, Cliff Hooker

    Abstract: The preceding three decades have seen the emergence, rise, and proliferation of machine learning (ML). From half-recognised beginnings in perceptrons, neural nets, and decision trees, algorithms that extract correlations (that is, patterns) from a set of data points have broken free from their origin in computational cognition to embrace all forms of problem solving, from voice recognition to medi…

    Submitted 15 April, 2017; originally announced April 2017.

  34. arXiv:1611.07115  [pdf, other]

    stat.ML cs.LG

    Tree Space Prototypes: Another Look at Making Tree Ensembles Interpretable

    Authors: Sarah Tan, Matvey Soloviev, Giles Hooker, Martin T. Wells

    Abstract: Ensembles of decision trees perform well on many problems, but are not interpretable. In contrast to existing approaches in interpretability that focus on explaining relationships between features and predictions, we propose an alternative approach to interpret tree ensemble classifiers by surfacing representative points for each class -- prototypes. We introduce a new distance for Gradient Booste…

    Submitted 25 August, 2020; v1 submitted 21 November, 2016; originally announced November 2016.

    Comments: Camera-ready version for ACM-IMS FODS 2020. A short version was presented at NIPS 2016 Workshop on Interpretable Machine Learning for Complex Systems

  35. arXiv:1610.09036  [pdf, other]

    stat.ME

    Interpreting Models via Single Tree Approximation

    Authors: Yichen Zhou, Giles Hooker

    Abstract: We propose a procedure to build a decision tree which approximates the performance of complex machine learning models. This single approximation tree can be used to interpret and simplify the predicting pattern of random forests (RFs) and other models. The use of a tree structure is particularly relevant in medical questionnaires where it enables an adaptive shortening of the questionnaire, reduci…

    Submitted 27 October, 2016; originally announced October 2016.

  36. arXiv:1506.00553  [pdf, other]

    stat.ML

    Bootstrap Bias Corrections for Ensemble Methods

    Authors: Giles Hooker, Lucas Mentch

    Abstract: This paper examines the use of a residual bootstrap for bias correction in machine learning regression methods. Accounting for bias is an important obstacle in recent efforts to develop statistical inference for machine learning methods. We demonstrate empirically that the proposed bootstrap bias correction can lead to substantial improvements in both bias and predictive accuracy. In the context o…

    Submitted 1 June, 2015; originally announced June 2015.

  37. arXiv:1502.00587  [pdf, other]

    stat.ME

    Combining Functional Data Registration and Factor Analysis

    Authors: Cecilia Earls, Giles Hooker

    Abstract: We extend the definition of functional data registration to encompass a larger class of registered functions. In contrast to traditional registration models, we allow for registered functions that have more than one primary direction of variation. The proposed Bayesian hierarchical model simultaneously registers the observed functions and estimates the two primary factors that characterize variati…

    Submitted 4 June, 2015; v1 submitted 2 February, 2015; originally announced February 2015.

    Comments: The paper was updated with a better real data example

  38. arXiv:1502.00552  [pdf, other]

    stat.ME

    Adapted Variational Bayes for Functional Data Registration, Smoothing, and Prediction

    Authors: Cecilia Earls, Giles Hooker

    Abstract: We propose a model for functional data registration that compares favorably to the best methods of functional data registration currently available. It also extends current inferential capabilities for unregistered data by providing a flexible probabilistic framework that 1) allows for functional prediction in the context of registration and 2) can be adapted to include smoothing and registration…

    Submitted 3 June, 2016; v1 submitted 2 February, 2015; originally announced February 2015.

    Comments: Additional details are included in this version in response to reviewer comments. All main results are unchanged

  39. arXiv:1411.4681  [pdf, other]

    math.ST

    Functional Principal Components Analysis of Spatially Correlated Data

    Authors: Chong Liu, Surajit Ray, Giles Hooker

    Abstract: This paper focuses on the analysis of spatially correlated functional data. The between-curve correlation is modeled by correlating functional principal component scores of the functional data. We propose a Spatial Principal Analysis by Conditional Expectation framework to explicitly estimate spatial correlations and reconstruct individual curves. This approach works even when the observed data pe…

    Submitted 17 November, 2014; originally announced November 2014.

  40. arXiv:1407.4578  [pdf, other]

    stat.ME

    Maximal Autocorrelation Functions in Functional Data Analysis

    Authors: Giles Hooker, Steven Roberts

    Abstract: This paper proposes a new factor rotation for the context of functional principal components analysis. This rotation seeks to re-represent a functional subspace in terms of directions of decreasing smoothness as represented by a generalized smoothing metric. The rotation can be implemented simply and we show on two examples that this rotation can improve the interpretability of the leading compone…

    Submitted 17 July, 2014; originally announced July 2014.

    Comments: 10 pages, 2 figures

  41. arXiv:1406.7732  [pdf, ps, other]

    stat.ME

    Truncated Linear Models for Functional Data

    Authors: Peter Hall, Giles Hooker

    Abstract: A conventional linear model for functional data involves expressing a response variable $Y$ in terms of the explanatory function $X(t)$, via the model: $Y=a+\int_I b(t)X(t)dt+\hbox{error}$, where $a$ is a scalar, $b$ is an unknown function and $I=[0, \alpha]$ is a compact interval. However, in some problems the support of $b$ or $X$, $I_1$ say, is a proper and unknown subset of $I$, and is a quantity o…

    Submitted 30 June, 2014; originally announced June 2014.

  42. arXiv:1406.1845  [pdf, other]

    stat.ML stat.AP

    Formal Hypothesis Tests for Additive Structure in Random Forests

    Authors: Lucas Mentch, Giles Hooker

    Abstract: While statistical learning methods have proved powerful tools for predictive modeling, the black-box nature of the models they produce can severely limit their interpretability and the ability to conduct formal inference. However, the natural structure of ensemble learners like bagged trees and random forests has been shown to admit desirable asymptotic properties when base learners are built with…

    Submitted 26 August, 2016; v1 submitted 6 June, 2014; originally announced June 2014.

  43. arXiv:1404.6473  [pdf, other]

    stat.ML stat.AP stat.CO stat.ME

    Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests

    Authors: Lucas Mentch, Giles Hooker

    Abstract: This work develops formal statistical inference procedures for machine learning ensemble methods. Ensemble methods based on bootstrapping, such as bagging and random forests, have improved the predictive accuracy of individual trees, but fail to provide a framework in which distributional results can be easily determined. Instead of aggregating full bootstrap samples, we consider predicting by ave…

    Submitted 10 September, 2015; v1 submitted 25 April, 2014; originally announced April 2014.

    Comments: To appear in The Journal of Machine Learning Research

  44. Goodness of fit in nonlinear dynamics: Misspecified rates or misspecified states?

    Authors: Giles Hooker, Stephen P. Ellner

    Abstract: This paper introduces diagnostic tests for the nature of lack of fit in ordinary differential equation models (ODEs) proposed for data. We present a hierarchy of three possible sources of lack of fit: unaccounted-for stochastic variation, misspecification of functional forms in rate equations, and omission of dynamic variables in the description of the system. We represent lack of fit by allowing…

    Submitted 17 September, 2015; v1 submitted 1 December, 2013; originally announced December 2013.

    Comments: Published at http://dx.doi.org/10.1214/15-AOAS828 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS828

    Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 2, 754-776

  45. Restricted Likelihood Ratio Tests for Linearity in Scalar-on-Function Regression

    Authors: Mathew W. McLean, Giles Hooker, David Ruppert

    Abstract: We propose a procedure for testing the linearity of a scalar-on-function regression relationship. To do so, we use the functional generalized additive model (FGAM), a recently developed extension of the functional linear model. For a functional covariate X(t), the FGAM models the mean response as the integral with respect to t of F{X(t),t} where F is an unknown bivariate function. The FGAM can be…

    Submitted 22 October, 2013; originally announced October 2013.

  46. arXiv:1309.6906  [pdf, ps, other

    stat.ME

    Hellinger Distance and Bayesian Non-Parametrics: Hierarchical Models for Robust and Efficient Bayesian Inference

    Authors: Yuefeng Wu, Giles Hooker

    Abstract: This paper introduces a hierarchical framework to incorporate Hellinger distance methods into Bayesian analysis. We propose to modify a prior over non-parametric densities with the exponential of twice the Hellinger distance between a candidate and a parametric density. By incorporating a prior over the parameters of the second density, we arrive at a hierarchical model in which a non-parametric m…

    Submitted 26 September, 2013; originally announced September 2013.

    MSC Class: 62F35; 62F12; 62G07

  47. arXiv:1309.2178  [pdf, ps, other

    math.ST

    On the Identifiability of the Functional Convolution Model

    Authors: Giles Hooker

    Abstract: This report details conditions under which the Functional Convolution Model described in \citet{AHG13} can be identified from Ordinary Least Squares estimates without either dimension reduction or smoothing penalties. We demonstrate that if the covariate functions are not spanned by the space of solutions to linear differential equations, the functional coefficients in the model are uniquely deter…

    Submitted 9 September, 2013; originally announced September 2013.

  48. Consistency, efficiency and robustness of conditional disparity methods

    Authors: Giles Hooker

    Abstract: This paper considers extensions of minimum-disparity estimators to the problem of estimating parameters in a regression model that is conditionally specified; that is where a parametric model describes the distribution of a response $y$ conditional on covariates $x$ but does not specify the distribution of $x$. We define these estimators by estimating a non-parametric conditional density estimates…

    Submitted 9 February, 2016; v1 submitted 14 July, 2013; originally announced July 2013.

    Comments: Published at http://dx.doi.org/10.3150/14-BEJ678 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)

    Report number: IMS-BEJ-BEJ678

    Journal ref: Bernoulli 2016, Vol. 22, No. 2, 857-900

  49. arXiv:1305.3585  [pdf, other

    stat.ME stat.CO

    Bayesian Functional Generalized Additive Models with Sparsely Observed Covariates

    Authors: Mathew W. McLean, Fabian Scheipl, Giles Hooker, Sonja Greven, David Ruppert

    Abstract: The functional generalized additive model (FGAM) was recently proposed in McLean et al. (2013) as a more flexible alternative to the common functional linear model (FLM) for regressing a scalar on functional covariates. In this paper, we develop a Bayesian version of FGAM for the case of Gaussian errors with identity link function. Our approach allows the functional covariates to be sparsely obser…

    Submitted 26 May, 2017; v1 submitted 15 May, 2013; originally announced May 2013.

    Comments: substantial updates based on referee comments

  50. arXiv:1210.3739  [pdf, ps, other

    stat.ME

    Control Theory and Experimental Design in Diffusion Processes

    Authors: Giles Hooker, Kevin K. Lin, Bruce Rogers

    Abstract: This paper considers the problem of designing time-dependent, real-time control policies for controllable nonlinear diffusion processes, with the goal of obtaining maximally-informative observations about parameters of interest. More precisely, we maximize the expected Fisher information for the parameter obtained over the duration of the experiment, conditional on observations made up to that tim…

    Submitted 15 October, 2014; v1 submitted 13 October, 2012; originally announced October 2012.