Showing 1–50 of 59 results for author: Cook, D

  1. arXiv:2402.11029

    stat.AP math.ST

    Model-assisted estimation of domain totals, areas, and densities in two-stage sample survey designs

    Authors: Hans-Erik Andersen, Göran Ståhl, Bruce D. Cook, Douglas C. Morton, Andrew O. Finley

    Abstract: Model-assisted, two-stage forest survey sampling designs provide a means to combine airborne remote sensing data, collected in a sampling mode, with field plot data to increase the precision of national forest inventory estimates, while maintaining important properties of design-based inventories, such as unbiased estimation and quantification of uncertainty. In this study, we present a comprehens…

    Submitted 16 February, 2024; originally announced February 2024.

  2. arXiv:2401.05812

    stat.CO

    A Tidy Framework and Infrastructure to Systematically Assemble Spatio-temporal Indexes from Multivariate Data

    Authors: H. Sherry Zhang, Dianne Cook, Ursula Laa, Nicolas Langrené, Patricia Menéndez

    Abstract: Indexes are useful for summarizing multivariate information into single metrics for monitoring, communicating, and decision-making. While most work has focused on defining new indexes for specific purposes, more attention needs to be directed towards making it possible to understand index behavior in different data conditions, and to determine how their structure affects their values and variation…

    Submitted 13 May, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

  3. arXiv:2311.08181

    stat.ME

    Frame to frame interpolation for high-dimensional data visualisation using the woylier package

    Authors: Zoljargal Batsaikhan, Dianne Cook, Ursula Laa

    Abstract: The woylier package implements tour interpolation paths between frames using Givens rotations. This provides an alternative to the geodesic interpolation between planes currently available in the tourr package. Tours are used to visualise high-dimensional data and models, to detect clustering, anomalies and non-linear relationships. Frame-to-frame interpolation can be useful for projection pursuit…

    Submitted 14 November, 2023; originally announced November 2023.

  4. arXiv:2308.10505

    cs.LG stat.AP stat.CO stat.ML

    A Clustering Algorithm to Organize Satellite Hotspot Data for the Purpose of Tracking Bushfires Remotely

    Authors: Weihao Li, Emily Dodwell, Dianne Cook

    Abstract: This paper proposes a spatiotemporal clustering algorithm and its implementation in the R package spotoroo. This work is motivated by the catastrophic bushfires in Australia throughout the summer of 2019-2020 and made possible by the availability of satellite hotspot data. The algorithm is inspired by two existing spatiotemporal clustering algorithms but makes enhancements to cluster points spatia…

    Submitted 21 August, 2023; originally announced August 2023.

  5. arXiv:2308.05964

    stat.AP

    A Plot is Worth a Thousand Tests: Assessing Residual Diagnostics with the Lineup Protocol

    Authors: Weihao Li, Dianne Cook, Emi Tanaka, Susan VanderPlas

    Abstract: Regression experts consistently recommend plotting residuals for model diagnosis, despite the availability of many numerical hypothesis test procedures designed to use residuals to assess problems with a model fit. Here we provide evidence for why this is good advice using data from a visual inference experiment. We show how conventional tests are too sensitive, which means that too often the conc…

    Submitted 24 March, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

  6. arXiv:2303.17331

    stat.ME stat.AP

    Multiple Imputation Approaches for Epoch-level Accelerometer data in Trials

    Authors: Mia S. Tackney, Elizabeth Williamson, Derek G. Cook, Elizabeth Limb, Tess Harris, James Carpenter

    Abstract: Clinical trials that investigate interventions on physical activity often use accelerometers to measure step count at a very granular level, often in 5-second epochs. Participants typically wear the accelerometer for a week-long period at baseline, and for one or more week-long follow-up periods after the intervention. The data is usually aggregated to provide daily or weekly step counts for the p…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: 32 pages, 16 figures, 2 tables

  7. arXiv:2302.13356

    stat.ML cs.LG stat.AP

    Performance is not enough: the story told by a Rashomon quartet

    Authors: Przemyslaw Biecek, Hubert Baniecki, Mateusz Krzyzinski, Dianne Cook

    Abstract: The usual goal of supervised learning is to find the best model, the one that optimizes a particular performance measure. However, what if the explanation provided by this model is completely different from another model and different again from another model despite all having similarly good fit statistics? Is it possible that the equally effective models put the spotlight on different relationsh…

    Submitted 11 April, 2024; v1 submitted 26 February, 2023; originally announced February 2023.

  8. arXiv:2302.06410

    stat.AP

    Models to support forest inventory and small area estimation using sparsely sampled LiDAR: A case study involving G-LiHT LiDAR in Tanana, Alaska

    Authors: Andrew O. Finley, Hans-Erik Andersen, Chad Babcock, Bruce D. Cook, Douglas C. Morton, Sudipto Banerjee

    Abstract: A two-stage hierarchical Bayesian model is developed and implemented to estimate forest biomass density and total given sparsely sampled LiDAR and georeferenced forest inventory plot measurements. The model is motivated by the United States Department of Agriculture (USDA) Forest Service Forest Inventory and Analysis (FIA) objective to provide biomass estimates for the remote Tanana Inventory Unit…

    Submitted 31 January, 2024; v1 submitted 13 February, 2023; originally announced February 2023.

  9. arXiv:2301.00077

    stat.AP

    A Study on a User-Controlled Radial Tour for Variable Importance in High-Dimensional Data

    Authors: Nicholas Spyrison, Dianne Cook, Kim Marriott

    Abstract: Principal component analysis is a long-standing go-to method for exploring multivariate data. The principal components are linear combinations of the original variables, ordered by descending variance. The first few components typically provide a good visual summary of the data. Tours also make linear projections of the original variables but offer many different views, like examining the data fro…

    Submitted 30 December, 2022; originally announced January 2023.

    Comments: 9 pages, 8 figures, 2 tables

  10. arXiv:2210.05228

    stat.CO

    New and simplified manual controls for projection and slice tours, with application to exploring classification boundaries in high dimensions

    Authors: Ursula Laa, Alex Aumann, Dianne Cook, German Valencia

    Abstract: This paper describes new user controls for examining high-dimensional data using low-dimensional linear projections and slices. A user can interactively change the contribution of a given variable to a low-dimensional projection, which is useful for exploring the sensitivity of structure to particular variables. The user can also interactively shift the center of a slice, for example, to explore h…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: 16 pages, 9 figures

  11. arXiv:2209.11797

    stat.AP

    Quantifying and correcting geolocation error in spaceborne LiDAR forest canopy observations using high spatial accuracy ALS: A Bayesian model approach

    Authors: Elliot S. Shannon, Andrew O. Finley, Daniel J. Hayes, Sylvia N. Noralez, Aaron R. Weiskittel, Bruce D. Cook, Chad Babcock

    Abstract: Geolocation error in spaceborne sampling light detection and ranging (LiDAR) measurements of forest structure can compromise forest attribute estimates and degrade integration with georeferenced field measurements or other remotely sensed data. Data integration is especially problematic when geolocation error is not well quantified. We propose a general model that uses airborne laser scanning (ALS…

    Submitted 23 August, 2023; v1 submitted 23 September, 2022; originally announced September 2022.

  12. arXiv:2205.06417

    stat.OT

    A Journey from Wild to Textbook Data to Reproducibly Refresh the Wages Data from the National Longitudinal Survey of Youth Database

    Authors: Dewi Amaliah, Dianne Cook, Emi Tanaka, Kate Hyde, Nicholas Tierney

    Abstract: Textbook data is essential for teaching statistics and data science methods because they are clean, allowing the instructor to focus on methodology. Ideally textbook data sets are refreshed regularly, especially when they are subsets taken from an on-going data collection. It is also important to use contemporary data for teaching, to imbue the sense that the methodology is relevant today. This pa…

    Submitted 12 May, 2022; originally announced May 2022.

  13. arXiv:2205.05359

    stat.ML cs.AI cs.LG

    Exploring Local Explanations of Nonlinear Models Using Animated Linear Projections

    Authors: Nicholas Spyrison, Dianne Cook, Przemyslaw Biecek

    Abstract: The increased predictive power of machine learning models comes at the cost of increased complexity and loss of interpretability, particularly in comparison to parametric statistical models. This trade-off has led to the emergence of eXplainable AI (XAI) which provides methods, such as local explanations (LEs) and local variable attributions (LVAs), to shed light on how a model uses predictors to a…

    Submitted 18 January, 2024; v1 submitted 11 May, 2022; originally announced May 2022.

  14. arXiv:2205.00259

    stat.CO stat.ME

    cubble: An R Package for Organizing and Wrangling Multivariate Spatio-temporal Data

    Authors: H. Sherry Zhang, Dianne Cook, Ursula Laa, Nicolas Langrené, Patricia Menéndez

    Abstract: Multivariate spatio-temporal data refers to multiple measurements taken across space and time. For many analyses, spatial and time components can be separately studied: for example, to explore the temporal trend of one variable for a single spatial location, or to model the spatial distribution of one variable at a given time. However for some studies, it is important to analyse different aspects…

    Submitted 10 January, 2024; v1 submitted 30 April, 2022; originally announced May 2022.

  15. arXiv:2111.06941

    stat.ME econ.EM

    Absolute and Relative Bias in Eight Common Observational Study Designs: Evidence from a Meta-analysis

    Authors: Jelena Zurovac, Thomas D. Cook, John Deke, Mariel M. Finucane, Duncan Chaplin, Jared S. Coopersmith, Michael Barna, Lauren Vollmer Forrow

    Abstract: Observational studies are needed when experiments are not possible. Within study comparisons (WSC) compare observational and experimental estimates that test the same hypothesis using the same treatment group, outcome, and estimand. Meta-analyzing 39 of them, we compare mean bias and its variance for the eight observational designs that result from combining whether there is a pretest measure of t…

    Submitted 15 November, 2021; v1 submitted 12 November, 2021; originally announced November 2021.

    Comments: 39 pages, 2 tables, 1 figure. Working paper

  16. arXiv:2104.08016

    cs.GR stat.OT

    A Review of the State-of-the-Art on Tours for Dynamic Visualization of High-dimensional Data

    Authors: Stuart Lee, Dianne Cook, Natalia da Silva, Ursula Laa, Earo Wang, Nick Spyrison, H. Sherry Zhang

    Abstract: This article discusses a high-dimensional visualization technique called the tour, which can be used to view data in more than three dimensions. We review the theory and history behind the technique, as well as modern software developments and applications of the tour that are being found across the sciences and machine learning.

    Submitted 19 April, 2021; v1 submitted 16 April, 2021; originally announced April 2021.

  17. Visual Diagnostics for Constrained Optimisation with Application to Guided Tours

    Authors: H. Sherry Zhang, Dianne Cook, Ursula Laa, Nicolas Langrené, Patricia Menéndez

    Abstract: A guided tour helps to visualise high-dimensional data by showing low-dimensional projections along a projection pursuit optimisation path. Projection pursuit is a generalisation of principal component analysis, in the sense that different indexes are used to define the interestingness of the projected data. While much work has been done in developing new indexes in the literature, less has been d…

    Submitted 7 April, 2021; originally announced April 2021.

    Journal ref: R Journal 13(2) 624-641 (2021)

  18. arXiv:2101.00514

    stat.ME math.ST

    Envelopes for multivariate linear regression with linearly constrained coefficients

    Authors: Dennis Cook, Liliana Forzani, Lan Liu

    Abstract: A constrained multivariate linear model is a multivariate linear model with the columns of its coefficient matrix constrained to lie in a known subspace. This class of models includes those typically used to study growth curves and longitudinal data. Envelope methods have been proposed to improve estimation efficiency in the class of unconstrained multivariate linear models, but have not yet been…

    Submitted 2 January, 2021; originally announced January 2021.

  19. arXiv:2012.06077

    stat.OT stat.AP

    Casting Multiple Shadows: High-Dimensional Interactive Data Visualisation with Tours and Embeddings

    Authors: Stuart Lee, Ursula Laa, Dianne Cook

    Abstract: Non-linear dimensionality reduction (NLDR) methods such as t-distributed stochastic neighbour embedding (t-SNE) are ubiquitous in the natural sciences, however, the appropriate use of these methods is difficult because of their complex parameterisations; analysts must make trade-offs in order to identify structure in the visualisation of an NLDR technique. We present visual diagnostics for the pra…

    Submitted 10 December, 2020; originally announced December 2020.

    Comments: 25 pages, 7 figures, submitted to JDSSV

  20. arXiv:2012.01619

    stat.AP stat.ME

    brolgar: An R package to BRowse Over Longitudinal Data Graphically and Analytically in R

    Authors: Nicholas J Tierney, Dianne Cook, Tania Prvan

    Abstract: Longitudinal (panel) data provide the opportunity to examine temporal patterns of individuals, because measurements are collected on the same person at different, and often irregular, time points. The data is typically visualised using a "spaghetti plot", where a line plot is drawn for each individual. When overlaid in one plot, it can have the appearance of a bowl of spaghetti. With even a small…

    Submitted 2 December, 2020; originally announced December 2020.

    Comments: 19 pages, 14 figures

  21. arXiv:2011.06436

    stat.ME math.ST physics.soc-ph

    Fundamentals of path analysis in the social sciences

    Authors: R. Dennis Cook, Liliana Forzani

    Abstract: Motivated by a recent series of diametrically opposed articles on the relative value of statistical methods for the analysis of path diagrams in the social sciences, we discuss from a primarily theoretical perspective selected fundamental aspects of path modeling and analysis based on a common reflexive setting. Since there is a paucity of technical support evident in the debate, our aim is to…

    Submitted 12 November, 2020; originally announced November 2020.

  22. arXiv:2011.00119

    stat.ME

    Enveloped Huber Regression

    Authors: Le Zhou, R. Dennis Cook, Hui Zou

    Abstract: Huber regression (HR) is a popular robust alternative to the least squares regression when the error follows a heavy-tailed distribution. We propose a new method called the enveloped Huber regression (EHR) by considering the envelope assumption that there exists some subspace of the predictors that has no association with the response, which is referred to as the immaterial part. More efficient es…

    Submitted 30 October, 2020; originally announced November 2020.

  23. arXiv:2010.00794

    stat.AP stat.CO stat.ME

    Visualizing probability distributions across bivariate cyclic temporal granularities

    Authors: Sayani Gupta, Rob J Hyndman, Dianne Cook, Antony Unwin

    Abstract: Deconstructing a time index into time granularities can assist in exploration and automated analysis of large temporal data sets. This paper describes classes of time deconstructions using linear and cyclic time granularities. Linear granularities respect the linear progression of time such as hours, days, weeks and months. Cyclic granularities can be circular such as hour-of-the-day, quasi-circul…

    Submitted 2 October, 2020; originally announced October 2020.

    Comments: 32 pages, 6 figures, 7 tables

  24. arXiv:2009.10979

    stat.CO

    Burning sage: Reversing the curse of dimensionality in the visualization of high-dimensional data

    Authors: Ursula Laa, Dianne Cook, Stuart Lee

    Abstract: In high-dimensional data analysis the curse of dimensionality reasons that points tend to be far away from the center of the distribution and on the edge of high-dimensional space. Contrary to this, is that projected data tends to clump at the center. This gives a sense that any structure near the center of the projection is obscured, whether this is true or not. A transformation to reverse the cu…

    Submitted 23 September, 2020; originally announced September 2020.

  25. arXiv:2007.06062

    cs.LG cs.HC stat.ML

    Transfer Learning for Activity Recognition in Mobile Health

    Authors: Yuchao Ma, Andrew T. Campbell, Diane J. Cook, John Lach, Shwetak N. Patel, Thomas Ploetz, Majid Sarrafzadeh, Donna Spruijt-Metz, Hassan Ghasemzadeh

    Abstract: While activity recognition from inertial sensors holds potential for mobile health, differences in sensing platforms and user movement patterns cause performance degradation. Aiming to address these challenges, we propose a transfer learning framework, TransFall, for sensor-based activity recognition. TransFall's design contains a two-tier data transformation, a label estimation layer, and a model…

    Submitted 12 July, 2020; originally announced July 2020.

  26. arXiv:2005.10996

    cs.LG stat.ML

    Multi-Source Deep Domain Adaptation with Weak Supervision for Time-Series Sensor Data

    Authors: Garrett Wilson, Janardhan Rao Doppa, Diane J. Cook

    Abstract: Domain adaptation (DA) offers a valuable means to reuse data and models for new problem domains. However, robust techniques have not yet been considered for time series data with varying amounts of data availability. In this paper, we make three main contributions to fill this gap. First, we propose a novel Convolutional deep Domain Adaptation model for Time Series data (CoDATS) that significantly…

    Submitted 22 May, 2020; originally announced May 2020.

    Comments: Accepted at KDD 2020

  27. Hole or grain? A Section Pursuit Index for Finding Hidden Structure in Multiple Dimensions

    Authors: Ursula Laa, Dianne Cook, Andreas Buja, German Valencia

    Abstract: Multivariate data is often visualized using linear projections, produced by techniques such as principal component analysis, linear discriminant analysis, and projection pursuit. A problem with projections is that they obscure low and high density regions near the center of the distribution. Sections, or slices, can help to reveal them. This paper develops a section pursuit method, building on the…

    Submitted 10 March, 2022; v1 submitted 28 April, 2020; originally announced April 2020.

    Comments: v3 is accepted for publication in JCGS and contains the appendix

    Journal ref: Journal of Computational and Graphical Statistics, 2022

  28. arXiv:1910.10854

    stat.CO cs.HC hep-ex

    A slice tour for finding hollowness in high-dimensional data

    Authors: Ursula Laa, Dianne Cook, German Valencia

    Abstract: Taking projections of high-dimensional data is a common analytical and visualisation technique in statistics for working with high-dimensional problems. Sectioning, or slicing, through high dimensions is less common, but can be useful for visualising data with concavities, or non-linear structure. It is associated with conditional distributions in statistics, and also linked brushing between plots…

    Submitted 23 October, 2019; originally announced October 2019.

    Comments: 13 pages, 6 figures

    Journal ref: Journal of Computational and Graphical Statistics 29 (2020) 681-687

  29. arXiv:1909.04791

    cs.LG stat.ML

    A Survey of Techniques All Classifiers Can Learn from Deep Networks: Models, Optimizations, and Regularization

    Authors: Alireza Ghods, Diane J Cook

    Abstract: Deep neural networks have introduced novel and useful tools to the machine learning community. Other types of classifiers can potentially make use of these tools as well to improve their performance and generality. This paper reviews the current state of the art for deep learning classifier technologies that are being used outside of deep neural networks. Non-network classifiers can employ many co…

    Submitted 27 September, 2019; v1 submitted 10 September, 2019; originally announced September 2019.

  30. arXiv:1907.10109

    stat.ME stat.AP stat.CO

    Conjugate Nearest Neighbor Gaussian Process Models for Efficient Statistical Interpolation of Large Spatial Data

    Authors: Shinichiro Shirota, Andrew O. Finley, Bruce D. Cook, Sudipto Banerjee

    Abstract: A key challenge in spatial statistics is the analysis for massive spatially-referenced data sets. Such analyses often proceed from Gaussian process specifications that can produce rich and robust inference, but involve dense covariance matrices that lack computationally exploitable structures. The matrix computations required for fitting such models involve floating point operations in cubic order…

    Submitted 23 July, 2019; originally announced July 2019.

  31. arXiv:1907.07802

    cs.LG stat.ML

    Multi-Purposing Domain Adaptation Discriminators for Pseudo Labeling Confidence

    Authors: Garrett Wilson, Diane J. Cook

    Abstract: Often domain adaptation is performed using a discriminator (domain classifier) to learn domain-invariant feature representations so that a classifier trained on labeled source data will generalize well to unlabeled target data. A line of research stemming from semi-supervised learning uses pseudo labeling to directly generate "pseudo labels" for the unlabeled target data and trains a classifier on…

    Submitted 17 July, 2019; originally announced July 2019.

  32. arXiv:1902.00181

    stat.ME astro-ph.IM hep-ex hep-ph physics.data-an

    Using tours to visually investigate properties of new projection pursuit indexes with application to problems in physics

    Authors: Ursula Laa, Dianne Cook

    Abstract: Projection pursuit is used to find interesting low-dimensional projections of high-dimensional data by optimizing an index over all possible projections. Most indexes have been developed to detect departure from known distributions, such as normality, or to find separations between known groups. Here, we are interested in finding projections revealing potentially complex bivariate patterns, using…

    Submitted 13 January, 2020; v1 submitted 31 January, 2019; originally announced February 2019.

    Comments: 39 pages, 13 figures

  33. arXiv:1901.10257

    stat.AP stat.CO

    A new tidy data structure to support exploration and modeling of temporal data

    Authors: Earo Wang, Dianne Cook, Rob J Hyndman

    Abstract: Mining temporal data for information is often inhibited by a multitude of formats: irregular or multiple time intervals, point events that need aggregating, multiple observational units or repeated measurements on multiple individuals, and heterogeneous data types. On the other hand, the software supporting time series modeling and forecasting, makes strict assumptions on the data to be provided,…

    Submitted 13 February, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

    Comments: Revision on Section 4 and 5

  34. arXiv:1812.02849

    cs.LG stat.ML

    A Survey of Unsupervised Deep Domain Adaptation

    Authors: Garrett Wilson, Diane J. Cook

    Abstract: Deep learning has produced state-of-the-art results for a variety of tasks. While such approaches for supervised learning have performed well, they assume that training and testing data are drawn from the same distribution, which may not always be the case. As a complement to this challenge, single-source unsupervised domain adaptation can handle situations where a network is trained on labeled da…

    Submitted 6 February, 2020; v1 submitted 6 December, 2018; originally announced December 2018.

  35. arXiv:1810.09624

    stat.AP stat.CO

    Calendar-based graphics for visualizing people's daily schedules

    Authors: Earo Wang, Dianne Cook, Rob J Hyndman

    Abstract: Calendars are broadly used in society to display temporal information, and events. This paper describes a new R package with functionality to organize and display temporal data, collected on sub-daily resolution, into a calendar layout. The function `frame_calendar` uses linear algebra on the date variable to restructure data into a format lending itself to calendar layouts. The user can apply the…

    Submitted 22 October, 2018; originally announced October 2018.

    Comments: 31 pages, 19 figures

  36. arXiv:1809.06024

    stat.ML cs.LG

    A convex formulation for high-dimensional sparse sliced inverse regression

    Authors: Kean Ming Tan, Zhaoran Wang, Tong Zhang, Han Liu, R. Dennis Cook

    Abstract: Sliced inverse regression is a popular tool for sufficient dimension reduction, which replaces covariates with a minimal set of their linear combinations without loss of information on the conditional distribution of the response given the covariates. The estimated linear combinations include all covariates, making results difficult to interpret and perhaps unnecessarily variable, particularly whe…

    Submitted 17 September, 2018; originally announced September 2018.

  37. arXiv:1809.02264

    stat.CO

    Expanding tidy data principles to facilitate missing data exploration, visualization and assessment of imputations

    Authors: Nicholas J Tierney, Dianne H Cook

    Abstract: Despite the large body of research on missing value distributions and imputation, there is comparatively little literature with a focus on how to make it easy to handle, explore, and impute missing values in data. This paper addresses this gap. The new methodology builds upon tidy data principles, with the goal of integrating missing value handling as a key part of data analysis workflows. We defi…

    Submitted 14 May, 2020; v1 submitted 6 September, 2018; originally announced September 2018.

    Comments: 30 pages, 16 figures, 7 tables, package available at github.com/njtierney/naniar

  38. A Projection Pursuit Forest Algorithm for Supervised Classification

    Authors: Natalia da Silva, Dianne Cook, Eun-Kyung Lee

    Abstract: This paper presents a new ensemble learning method for classification problems called projection pursuit random forest (PPF). PPF uses the PPtree algorithm introduced in Lee et al. (2013). In PPF, trees are constructed by splitting on linear combinations of randomly chosen variables. Projection pursuit is used to choose a projection of the variables that best separates the classes. Utilizing linea…

    Submitted 25 July, 2018; v1 submitted 18 July, 2018; originally announced July 2018.

    Journal ref: Journal of Computational and Graphical Statistics, (2021), 1-13

  39. arXiv:1801.02078

    stat.AP

    Spatial Factor Models for High-Dimensional and Large Spatial Data: An Application in Forest Variable Mapping

    Authors: Daniel Taylor-Rodriguez, Andrew O. Finley, Abhirup Datta, Chad Babcock, Hans-Erik Andersen, Bruce D. Cook, Douglas C. Morton, Sudipto Banerjee

    Abstract: Gathering information about forest variables is an expensive and arduous activity. As such, directly collecting the data required to produce high-resolution maps over large spatial domains is infeasible. Next generation collection initiatives of remotely sensed Light Detection and Ranging (LiDAR) data are specifically aimed at producing complete-coverage maps over large spatial domains. Given that…

    Submitted 8 November, 2018; v1 submitted 6 January, 2018; originally announced January 2018.

  40. arXiv:1708.01481

    stat.AP stat.ME

    Multivariate Design of Experiments for Engineering Dimensional Analysis

    Authors: Daniel J. Eck, Christopher J. Nachtsheim, R. Dennis Cook, Thomas A. Albrecht

    Abstract: We consider the design of dimensional analysis experiments when there is more than a single response. We first give a brief overview of dimensional analysis experiments and the dimensional analysis (DA) procedure. The validity of the DA method for univariate responses was established by the Buckingham $Π$-Theorem in the early 20th century. We extend the theorem to the multivariate case, develop ba…

    Submitted 7 August, 2018; v1 submitted 4 August, 2017; originally announced August 2017.

  41. arXiv:1705.03534

    stat.AP

    Geostatistical estimation of forest biomass in interior Alaska combining Landsat-derived tree cover, sampled airborne lidar and field observations

    Authors: Chad Babcock, Andrew O. Finley, Hans-Erik Andersen, Robert Pattison, Bruce D. Cook, Douglas C. Morton, Michael Alonzo, Ross Nelson, Timothy Gregoire, Liviu Ene, Terje Gobakken, Erik Næsset

    Abstract: The goal of this research was to develop and examine the performance of a geostatistical coregionalization modeling approach for combining field inventory measurements, strip samples of airborne lidar and Landsat-based remote sensing data products to predict aboveground biomass (AGB) in interior Alaska's Tanana Valley. The proposed modeling strategy facilitates pixel-level mapping of AGB density p…

    Submitted 20 December, 2017; v1 submitted 9 May, 2017; originally announced May 2017.

  42. arXiv:1704.02502

    stat.ML

    Interactive Graphics for Visually Diagnosing Forest Classifiers in R

    Authors: Natalia da Silva, Dianne Cook, Eun-Kyung Lee

    Abstract: This paper describes structuring data and constructing plots to explore forest classification models interactively. A forest classifier is an example of an ensemble, produced by bagging multiple trees. The process of bagging and combining results from multiple trees, produces numerous diagnostics which, with interactive graphics, can provide a lot of insight into class structure in high dimensions…

    Submitted 8 April, 2017; originally announced April 2017.

  43. arXiv:1701.07910

    stat.AP stat.ME

    Combining Envelope Methodology and Aster Models for Variance Reduction in Life History Analyses

    Authors: Daniel J. Eck, Charles J. Geyer, R. Dennis Cook

    Abstract: Precise estimation of expected Darwinian fitness, the expected lifetime number of offspring of an organism, is a central component of life history analysis. The aster model serves as a defensible statistical model for distributions of Darwinian fitness. The aster model is equipped to incorporate the major life stages an organism travels through which separately may affect Darwinian fitness. Envelope…

    Submitted 27 February, 2018; v1 submitted 26 January, 2017; originally announced January 2017.

    Comments: Title changed from "An Application of Envelope Methodology and Aster Models" to "Combining Envelope Methodology and Aster Models for Variance Reduction in Life History Analyses"

  44. arXiv:1701.00856  [pdf, ps, other]

    stat.ME

    Weighted envelope estimation to handle variability in model selection

    Authors: Daniel J. Eck, R. Dennis Cook

    Abstract: Envelope methodology can provide substantial efficiency gains in multivariate statistical problems, but in some applications the estimation of the envelope dimension can induce selection volatility that may mitigate those gains. Current envelope methodology does not account for the added variance that can result from this selection. In this article, we circumvent dimension selection volatility thr…

    Submitted 14 April, 2017; v1 submitted 3 January, 2017; originally announced January 2017.

  45. arXiv:1605.01485  [pdf, other]

    stat.ME math.ST

    Matrix-Variate Regressions and Envelope Models

    Authors: Shanshan Ding, R. Dennis Cook

    Abstract: Modern technology often generates data with complex structures in which both response and explanatory variables are matrix-valued. Existing methods in the literature are able to tackle matrix-valued predictors but are rather limited for matrix-valued responses. In this article, we study matrix-variate regressions for such data, where the response Y on each experimental unit is a random matrix and…

    Submitted 30 July, 2017; v1 submitted 5 May, 2016; originally announced May 2016.

    Comments: 28 pages, 4 figures

  46. arXiv:1603.07409  [pdf, other]

    stat.AP

    Joint hierarchical models for sparsely sampled high-dimensional LiDAR and forest variables

    Authors: Andrew O. Finley, Sudipto Banerjee, Yuzhen Zhou, Bruce D. Cook, Chad Babcock

    Abstract: Recent advancements in remote sensing technology, specifically Light Detection and Ranging (LiDAR) sensors, provide the data needed to quantify forest characteristics at a fine spatial resolution over large geographic domains. From an inferential standpoint, there is interest in prediction and interpolation of the often sparsely sampled and spatially misaligned LiDAR signals and forest variables.…

    Submitted 5 December, 2016; v1 submitted 23 March, 2016; originally announced March 2016.

  47. arXiv:1509.03767  [pdf, other]

    stat.ME

    Algorithms for Envelope Estimation II

    Authors: Dennis Cook, Liliana Forzani, Zhihua Su

    Abstract: We propose a new algorithm for envelope estimation, along with a new root-n consistent method for computing starting values. The new algorithm, which does not require optimization over a Grassmannian, is shown by simulation to be much faster and typically more accurate than the best existing algorithm proposed by Cook and Zhang (2015c).

    Submitted 12 September, 2015; originally announced September 2015.

    Comments: 38 pages, 2 figures, 7 tables

  48. arXiv:1502.06988  [pdf, other]

    stat.ME

    Model Choice and Diagnostics for Linear Mixed-Effects Models Using Statistics on Street Corners

    Authors: Adam Loy, Heike Hofmann, Dianne Cook

    Abstract: The complexity of linear mixed-effects (LME) models means that traditional diagnostics are rendered less effective. This is due to a breakdown of asymptotic results, boundary issues, and visible patterns in residual plots that are introduced by the model fitting process. Some of these issues are well known and adjustments have been proposed. Working with LME models typically requires that the anal…

    Submitted 6 December, 2016; v1 submitted 24 February, 2015; originally announced February 2015.

    Comments: 52 pages, 15 figures, 3 tables

  49. arXiv:1412.6675  [pdf, other]

    stat.CO

    Enabling Interactivity on Displays of Multivariate Time Series and Longitudinal Data

    Authors: Xiaoyue Cheng, Dianne Cook, Heike Hofmann

    Abstract: Temporal data is information measured in the context of time. This contextual structure provides components that need to be explored to understand the data and that can form the basis of interactions applied to the plots. In multivariate time series we expect to see temporal dependence, long term and seasonal trends and cross-correlations. In longitudinal data we also expect within and between sub…

    Submitted 20 December, 2014; originally announced December 2014.

    Comments: 28 pages, 12 figures, 5 tables. Submitted for journal publication

  50. arXiv:1411.0599  [pdf, other]

    stat.AP

    Dynamic spatial regression models for space-varying forest stand tables

    Authors: Andrew O. Finley, Sudipto Banerjee, Aaron R. Weiskittel, Chad Babcock, Bruce D. Cook

    Abstract: Many forest management planning decisions are based on information about the number of trees by species and diameter per unit area. This information is commonly summarized in a stand table, where a stand is defined as a group of forest trees of sufficiently uniform species composition, age, condition, or productivity to be considered a homogeneous unit for planning purposes. Typically information…

    Submitted 3 November, 2014; originally announced November 2014.