Showing 1–12 of 12 results for author: Peng, R D

  1. arXiv:2406.18681

    stat.ME

    Data Sketching and Stacking: A Confluence of Two Strategies for Predictive Inference in Gaussian Process Regressions with High-Dimensional Features

    Authors: Samuel Gailliot, Rajarshi Guhaniyogi, Roger D. Peng

    Abstract: This article focuses on drawing computationally efficient predictive inference from Gaussian process (GP) regressions with a large number of features when the response is conditionally independent of the features given the projection to a noisy low dimensional manifold. Bayesian estimation of the regression relationship using Markov Chain Monte Carlo and subsequent predictive inference is computat…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 32 Pages, 10 Figures

  2. arXiv:2312.07616

    stat.ME math.ST stat.AP

    Evaluating the Alignment of a Data Analysis between Analyst and Audience

    Authors: Lucy D'Agostino McGowan, Roger D. Peng, Stephanie C. Hicks

    Abstract: A challenge that data analysts face is building a data analysis that is useful for a given consumer. Previously, we defined a set of principles for describing data analyses that can be used to create a data analysis and to characterize the variation between analyses. Here, we introduce a concept that we call the alignment of a data analysis between the data analyst and a consumer. We define a succ…

    Submitted 11 December, 2023; originally announced December 2023.

  3. arXiv:2310.17506

    stat.AP

    Predicting Patient No-Shows in Community Health Clinics: A Case Study in Designing a Data Analytic Product

    Authors: Roger D. Peng

    Abstract: The data science revolution has highlighted the varying roles that data analytic products can play in different industries and applications. There has been particular interest in using analytic products coupled with algorithmic prediction models to aid in human decision-making. However, detailed descriptions of the decision-making process that leads to the design and development of analytic prod…

    Submitted 26 October, 2023; originally announced October 2023.

  4. arXiv:2309.08494

    stat.ME

    Modeling Data Analytic Iteration With Probabilistic Outcome Sets

    Authors: Roger D. Peng, Stephanie C. Hicks

    Abstract: In 1977 John Tukey described how in exploratory data analysis, data analysts use tools, such as data visualizations, to separate their expectations from what they observe. In contrast to statistical theory, an underappreciated aspect of data analysis is that a data analyst must make decisions by comparing the observed data or output from a statistical tool to what the analyst previously expected f…

    Submitted 1 February, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: 30 pages

  5. arXiv:2203.13982

    q-bio.QM

    Implications of Mortality Displacement for Effect Modification and Selection Bias

    Authors: Honghyok Kim, Jong-Tae Lee, Roger D. Peng, Kelvin C. Fong, Michelle L. Bell

    Abstract: Mortality displacement is the concept that deaths are moved forward in time (e.g., a few days, several months, and years) by exposure from when they would occur without the exposure, which is common in environmental time-series studies. Using concepts of a frail population and loss of life expectancy, it is understood that mortality displacement may decrease rate ratio (RR). Such decreases are tho…

    Submitted 25 March, 2022; originally announced March 2022.

    Comments: This is an epidemiological theory paper

  6. arXiv:2105.06324

    stat.OT stat.AP

    Perspective on Data Science

    Authors: Roger D. Peng, Hilary S. Parker

    Abstract: The field of data science currently enjoys a broad definition that includes a wide array of activities which borrow from many other established fields of study. Having such a vague characterization of a field in the early stages might be natural, but over time maintaining such a broad definition becomes unwieldy and impedes progress. In particular, the teaching of data science is hampered by the s…

    Submitted 13 May, 2021; originally announced May 2021.

  7. arXiv:2103.05689

    stat.ME stat.AP stat.OT

    Design Principles for Data Analysis

    Authors: Lucy D'Agostino McGowan, Roger D. Peng, Stephanie C. Hicks

    Abstract: The data science revolution has led to an increased interest in the practice of data analysis. While much has been written about statistical thinking, a complementary form of thinking that appears in the practice of data analysis is design thinking -- the problem-solving process to understand the people for whom a product is being designed. For a given problem, there can be significant or subtle d…

    Submitted 9 March, 2021; originally announced March 2021.

    Comments: arXiv admin note: text overlap with arXiv:1903.07639

  8. arXiv:2007.12210

    stat.OT

    Reproducible Research: A Retrospective

    Authors: Roger D. Peng, Stephanie C. Hicks

    Abstract: Rapid advances in computing technology over the past few decades have spurred two extraordinary phenomena in science: large-scale and high-throughput data collection coupled with the creation and implementation of complex statistical algorithms for data analysis. Together, these two phenomena have brought about tremendous advances in scientific discovery but have also raised two serious concerns,…

    Submitted 23 July, 2020; originally announced July 2020.

  9. arXiv:1904.11907

    stat.OT stat.AP

    Evaluating the Success of a Data Analysis

    Authors: Stephanie C. Hicks, Roger D. Peng

    Abstract: A fundamental problem in the practice and teaching of data science is how to evaluate the quality of a given data analysis, which is different than the evaluation of the science or question underlying the data analysis. Previously, we defined a set of principles for describing data analyses that can be used to create a data analysis and to characterize the variation between data analyses. Here, we…

    Submitted 26 April, 2019; originally announced April 2019.

    Comments: 16 pages

  10. arXiv:1903.07639

    stat.AP

    Elements and Principles for Characterizing Variation between Data Analyses

    Authors: Stephanie C. Hicks, Roger D. Peng

    Abstract: The data revolution has led to an increased interest in the practice of data analysis. For a given problem, there can be significant or subtle differences in how a data analyst constructs or creates a data analysis, including differences in the choice of methods, tooling, and workflow. In addition, data analysts can prioritize (or not) certain objective characteristics in a data analysis, leading…

    Submitted 25 July, 2019; v1 submitted 18 March, 2019; originally announced March 2019.

    Comments: 14 pages, 7 figures, 1 table

  11. arXiv:1509.08968

    stat.AP

    A glass half full interpretation of the replicability of psychological science

    Authors: Jeffrey T. Leek, Prasad Patil, Roger D. Peng

    Abstract: A recent study of the replicability of key psychological findings is a major contribution toward understanding the human side of the scientific process. Despite the careful and nuanced analysis reported in the paper, mass and social media adhered to the simple narrative that only 36% of the studies replicated their original results. Here we show that 77% of the replication effect sizes reported we…

    Submitted 29 September, 2015; originally announced September 2015.

    Comments: 6 pages, 3 figures

  12. Reproducible Research Can Still Be Wrong: Adopting a Prevention Approach

    Authors: Jeffrey T. Leek, Roger D. Peng

    Abstract: Reproducibility, the ability to recompute results, and replicability, the chances other experimenters will achieve a consistent result, are two foundational characteristics of successful scientific research. Consistent findings from independent investigators are the primary means by which scientific evidence accumulates for or against an hypothesis. And yet, of late there has been a crisis of conf…

    Submitted 10 February, 2015; originally announced February 2015.

    Comments: 3 pages, 1 figure

    Journal ref: PNAS 112 (6) 1645-1645, 2015