
Showing 1–18 of 18 results for author: Shaw, P A

  1. arXiv:2405.10925

    stat.ME cs.AI cs.LG

    High-dimensional multiple imputation (HDMI) for partially observed confounders including natural language processing-derived auxiliary covariates

    Authors: Janick Weberpals, Pamela A. Shaw, Kueiyu Joshua Lin, Richard Wyss, Joseph M Plasek, Li Zhou, Kerry Ngan, Thomas DeRamus, Sudha R. Raman, Bradley G. Hammill, Hana Lee, Sengwee Toh, John G. Connolly, Kimberly J. Dandreo, Fang Tian, Wei Liu, Jie Li, José J. Hernández-Muñoz, Sebastian Schneeweiss, Rishi J. Desai

    Abstract: Multiple imputation (MI) models can be improved by including auxiliary covariates (AC), but their performance in high-dimensional data is not well understood. We aimed to develop and compare high-dimensional MI (HDMI) approaches using structured and natural language processing (NLP)-derived AC in studies with partially observed confounders. We conducted a plasmode simulation study using data from…

    Submitted 17 May, 2024; originally announced May 2024.
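Multiple imputation analyses, whatever covariates enter the imputation model, typically finish by pooling the per-imputation estimates with Rubin's rules. The following is a minimal Python sketch of that pooling step for a single scalar parameter; it is not the HDMI procedure from the paper, and the function name `rubin_pool` and the numbers are hypothetical.

```python
import math
from statistics import mean, variance

def rubin_pool(estimates, variances):
    """Pool one scalar parameter across m imputed datasets using Rubin's rules.

    estimates: the m point estimates; variances: their m squared standard errors.
    Returns (pooled estimate, total variance, within variance, between variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b  # Rubin's total variance
    return q_bar, total, u_bar, b

# Hypothetical results from m = 5 imputed datasets.
q, t, u, b = rubin_pool([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)
pooled_se = math.sqrt(t)
```

The between-imputation variance is inflated by the factor (1 + 1/m) to reflect the use of a finite number of imputations.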

  2. Analysis of the 24-Hour Activity Cycle: An illustration examining the association with cognitive function in the Adult Changes in Thought (ACT) Study

    Authors: Yinxiang Wu, Dori E. Rosenberg, Mikael Anne Greenwood-Hickman, Susan M. McCurry, Cecile Proust-Lima, Jennifer C. Nelson, Paul K. Crane, Andrea Z. LaCroix, Eric B. Larson, Pamela A. Shaw

    Abstract: The 24-hour activity cycle (24HAC) is a new paradigm for studying activity behaviors in relation to health outcomes. This approach captures the interrelatedness of the daily time spent in physical activity (PA), sedentary behavior (SB), and sleep. We illustrate and compare the use of three popular approaches, namely isotemporal substitution model (ISM), compositional data analysis (CoDA), and late…

    Submitted 19 January, 2023; originally announced January 2023.

    Comments: 51 pages, 11 tables, 8 figures
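In CoDA, a day is treated as a composition of time spent in PA, SB, and sleep that always sums to 24 hours, so models are usually fit to log-ratio coordinates rather than to raw minutes. A minimal Python sketch, assuming a three-part composition and pivot (isometric log-ratio) coordinates as one common choice; `pivot_ilr` and the example minutes are hypothetical and not taken from the paper.

```python
import math

def closure(parts):
    """Rescale strictly positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]

def pivot_ilr(parts):
    """Pivot (isometric log-ratio) coordinates of a D-part composition.

    Coordinate i (1-based) is sqrt((D - i) / (D - i + 1)) * ln(x_i / g(x_{i+1}, ..., x_D)),
    where g is the geometric mean, for i = 1, ..., D - 1.
    """
    x = closure(parts)
    coords = []
    for i in range(len(x) - 1):
        rest = x[i + 1:]
        g = math.exp(sum(math.log(v) for v in rest) / len(rest))
        k = len(rest)  # equals D - i in the 1-based formula
        coords.append(math.sqrt(k / (k + 1)) * math.log(x[i] / g))
    return coords

# Hypothetical day: 120 min PA, 780 min SB, 540 min sleep (sums to 1,440 min).
z_pa_vs_rest, z_sb_vs_sleep = pivot_ilr([120, 780, 540])
```

Because only ratios between parts enter, minutes and proportions of the day give the same coordinates; the first coordinate contrasts PA with the geometric mean of SB and sleep.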

  3. arXiv:2209.12304

    stat.ME

    Issues in Implementing Regression Calibration Analyses

    Authors: Lillian Boe, Pamela A. Shaw, Douglas Midthune, Paul Gustafson, Victor Kipnis, Eunyoung Park, Daniela Sotres-Alvarez, Laurence Freedman

    Abstract: Regression calibration is a popular approach for correcting biases in estimated regression parameters when exposure variables are measured with error. This approach involves building a calibration equation to estimate the value of the unknown true exposure given the error-prone measurement and other confounding covariates. The estimated, or calibrated, exposure is then substituted for the true exp…

    Submitted 25 September, 2022; originally announced September 2022.
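The two steps described in this abstract (fit a calibration equation for the true exposure given the error-prone measurement and covariates, then substitute the calibrated exposure into the outcome model) can be sketched in Python for a linear outcome model with classical additive error. This is an illustration under those simplifying assumptions, not the paper's implementation; all names and simulated values are hypothetical.

```python
import numpy as np

def ols(design, y):
    """Least-squares coefficients of y on the columns of `design`."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def regression_calibration(y, w, z, x_val, val_idx):
    """Two-step regression calibration for a linear outcome model.

    y, w, z: outcome, error-prone exposure, and confounder for all n subjects.
    x_val: true exposure, observed only for the subjects indexed by val_idx.
    Returns [intercept, exposure effect, confounder effect].
    """
    ones = np.ones(len(y))
    # Step 1: calibration equation for E[X | W, Z], fit on the validated subsample.
    d_cal = np.column_stack([ones, w, z])
    alpha = ols(d_cal[val_idx], x_val)
    x_hat = d_cal @ alpha
    # Step 2: substitute the calibrated exposure for the true exposure.
    return ols(np.column_stack([ones, x_hat, z]), y)

# Simulated data with classical additive error (all values hypothetical).
rng = np.random.default_rng(2022)
n = 20_000
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)        # true exposure
w = x + rng.normal(size=n)              # error-prone measurement
y = 1.0 + 0.8 * x + 0.3 * z + rng.normal(size=n)
val_idx = rng.choice(n, size=2_000, replace=False)

naive = ols(np.column_stack([np.ones(n), w, z]), y)[1]         # attenuated toward 0
rc = regression_calibration(y, w, z, x[val_idx], val_idx)[1]   # close to 0.8
```

The stage-2 standard errors from this fit treat the calibrated exposure as known and therefore understate uncertainty; a bootstrap or sandwich estimator is needed for valid inference.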

  4. arXiv:2209.10061

    stat.ME stat.AP

    Practical considerations for sandwich variance estimation in two-stage regression settings

    Authors: Lillian A. Boe, Thomas Lumley, Pamela A. Shaw

    Abstract: We present a practical approach for computing the sandwich variance estimator in two-stage regression model settings. As a motivating example for two-stage regression, we consider regression calibration, a popular approach for addressing covariate measurement error. The sandwich variance approach has been rarely applied in regression calibration, despite the fact that it requires less computation time than…

    Submitted 20 September, 2022; originally announced September 2022.

    Comments: 18 pages of main manuscript including 2 figures and 4 tables; 14 pages of supplementary materials and references (including 2 tables)
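For a two-stage fit such as regression calibration, one way to form a sandwich variance is to stack the calibration and outcome estimating equations and compute A^{-1} B A^{-T} / n, where A is the average Jacobian of the stacked estimating functions and B is their average outer product. This Python sketch assumes a linear calibration model fit on a validated subsample and a linear outcome model, and it approximates A by central finite differences rather than analytic derivatives; it is not the authors' code, and all names and simulated values are hypothetical.

```python
import numpy as np

def stacked_psi(theta, y, w, z, x, v):
    """Per-subject estimating functions for both stages, shape (n, 6).

    theta = (alpha_0, alpha_w, alpha_z, beta_0, beta_x, beta_z).
    v is 1 for validated subjects (x observed) and 0 otherwise (x may hold any filler).
    """
    alpha, beta = theta[:3], theta[3:]
    ones = np.ones_like(w)
    d = np.column_stack([ones, w, z])
    psi_cal = (v * (x - d @ alpha))[:, None] * d      # stage 1: calibration model
    e = np.column_stack([ones, d @ alpha, z])         # stage 2 design uses calibrated exposure
    psi_out = (y - e @ beta)[:, None] * e             # stage 2: outcome model
    return np.hstack([psi_cal, psi_out])

def sandwich_cov(theta, data, h=1e-6):
    """Sandwich covariance A^{-1} B A^{-T} / n, with A from central finite differences."""
    psi = stacked_psi(theta, *data)
    n, p = psi.shape
    bread = np.empty((p, p))
    for j in range(p):
        step = np.zeros(p)
        step[j] = h
        bread[:, j] = (stacked_psi(theta + step, *data).mean(axis=0)
                       - stacked_psi(theta - step, *data).mean(axis=0)) / (2 * h)
    meat = psi.T @ psi / n
    bread_inv = np.linalg.inv(bread)
    return bread_inv @ meat @ bread_inv.T / n

# Hypothetical simulated data: 2,000 of 20,000 subjects have the true exposure.
rng = np.random.default_rng(2022)
n = 20_000
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)
w = x + rng.normal(size=n)
y = 1.0 + 0.8 * x + 0.3 * z + rng.normal(size=n)
v = np.zeros(n)
v[rng.choice(n, size=2_000, replace=False)] = 1.0
x_obs = np.where(v == 1.0, x, 0.0)   # unvalidated x never enters the estimating equations

# Point estimates: the two-stage OLS fit solves the stacked equations.
d = np.column_stack([np.ones(n), w, z])
val = v == 1.0
alpha = np.linalg.lstsq(d[val], x[val], rcond=None)[0]
e = np.column_stack([np.ones(n), d @ alpha, z])
beta = np.linalg.lstsq(e, y, rcond=None)[0]
theta = np.concatenate([alpha, beta])

cov = sandwich_cov(theta, (y, w, z, x_obs, v))
se_beta_x = np.sqrt(cov[4, 4])

# For comparison: the stage-2 OLS SE that treats the calibrated exposure as known.
resid = y - e @ beta
naive_cov = np.linalg.inv(e.T @ e) * (resid @ resid / (n - 3))
se_naive = np.sqrt(naive_cov[1, 1])
```

In this setup the sandwich SE for the exposure effect is expected to exceed the stage-2 OLS SE, because the sandwich also carries the uncertainty in the estimated calibration coefficients.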

  5. arXiv:2205.01743

    stat.ME stat.AP

    Three-phase generalized raking and multiple imputation estimators to address error-prone data

    Authors: Gustavo Amorim, Ran Tao, Sarah Lotspeich, Pamela A. Shaw, Thomas Lumley, Rena C. Patel, Bryan E. Shepherd

    Abstract: Validation studies are often used to obtain more reliable information in settings with error-prone data. Validated data on a subsample of subjects can be used together with error-prone data on all subjects to improve estimation. In practice, more than one round of data validation may be required, and direct application of standard approaches for combining validation data into analyses may lead to…

    Submitted 3 May, 2022; originally announced May 2022.

  6. arXiv:2201.03111

    stat.ME math.OC

    A Model-Free and Finite-Population-Exact Framework for Randomized Experiments Subject to Outcome Misclassification via Integer Programming

    Authors: Siyu Heng, Pamela A. Shaw

    Abstract: Results from randomized experiments (trials) can be severely distorted by outcome misclassification, such as from measurement error or reporting bias in binary outcomes. All existing approaches to outcome misclassification rely on some data-generating (super-population) model and therefore may not be applicable to randomized experiments without additional assumptions. We propose a model-free and f…

    Submitted 27 April, 2022; v1 submitted 9 January, 2022; originally announced January 2022.

    Comments: 60 pages, 5 tables

  7. arXiv:2112.12207

    stat.AP

    Nutritional blood concentration biomarkers in the Hispanic Community Health Study/Study of Latinos: Measurement characteristics and power

    Authors: Lillian A. Boe, Yasmin Mossavar-Rahmani, Daniela Sotres-Alvarez, Martha L. Daviglus, Ramon A. Durazo-Arvizu, Bharat Thyagarajan, Robert C. Kaplan, Pamela A. Shaw

    Abstract: Measurement error is a major issue in self-reported diet that can distort diet-disease relationships. Use of blood concentration biomarkers has the potential to mitigate the subjective bias inherent in self-report. As part of the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) baseline visit (2008-2011), self-reported diet was collected on all participants (N=16,415). Blood concentrati…

    Submitted 20 September, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

    Comments: 20 pages in main manuscript including 5 tables and 2 figures; 14 pages of supplement including 5 tables and 1 figure

  8. arXiv:2111.12760

    stat.ME

    An Augmented Likelihood Approach for the Discrete Proportional Hazards Model Using Auxiliary and Validated Outcome Data -- with Application to the HCHS/SOL Study

    Authors: Lillian A. Boe, Pamela A. Shaw

    Abstract: In large epidemiologic studies, it is typical for an inexpensive, non-invasive procedure to be used to record disease status during regular follow-up visits, with less frequent assessment by a gold standard test. Inexpensive outcome measures like self-reported disease status are practical to obtain, but can be error-prone. Association analysis reliant on error-prone outcomes may lead to biased res…

    Submitted 20 September, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

    Comments: Main manuscript: 31 pages including 6 pages of figures; supplementary materials: 27 pages including 5 pages of figures and references

  9. arXiv:2109.14001

    stat.AP stat.ME

    Analysis of Error-prone Electronic Health Records with Multi-wave Validation Sampling: Association of Maternal Weight Gain during Pregnancy with Childhood Outcomes

    Authors: Bryan E. Shepherd, Kyunghee Han, Tong Chen, Aihua Bian, Shannon Pugh, Stephany N. Duda, Thomas Lumley, William J. Heerman, Pamela A. Shaw

    Abstract: Electronic health record (EHR) data are increasingly used for biomedical research, but these data have recognized data quality challenges. Data validation is necessary to use EHR data with confidence, but limited resources typically make complete data validation impossible. Using EHR data, we illustrate prospective, multi-wave, two-phase validation sampling to estimate the association between mate…

    Submitted 28 September, 2021; originally announced September 2021.

  10. Optimal Multi-Wave Validation of Secondary Use Data with Outcome and Exposure Misclassification

    Authors: Sarah C. Lotspeich, Gustavo G. C. Amorim, Pamela A. Shaw, Ran Tao, Bryan E. Shepherd

    Abstract: The growing availability of observational databases like electronic health records (EHR) provides unprecedented opportunities for secondary use of such data in biomedical research. However, these data can be error-prone and need to be validated before use. It is usually unrealistic to validate the whole database due to resource constraints. A cost-effective alternative is to implement a two-phase…

    Submitted 12 September, 2022; v1 submitted 30 August, 2021; originally announced August 2021.

    Comments: Main text (29 pages), followed by Supplementary Materials (19 pages)

    MSC Class: 62P10

  11. arXiv:2106.09494

    stat.ME

    Optimum Allocation for Adaptive Multi-Wave Sampling in R: The R Package optimall

    Authors: Jasper B. Yang, Bryan E. Shepherd, Thomas Lumley, Pamela A. Shaw

    Abstract: The R package optimall offers a collection of functions that efficiently streamline the design process of sampling in surveys ranging from simple to complex. The package's main functions allow users to interactively define and adjust strata cut points based on values or quantiles of auxiliary covariates, adaptively calculate the optimum number of samples to allocate to each stratum using Neyman or…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: 31 pages, 7 figures
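Neyman allocation assigns stratum h a share of the sample proportional to N_h S_h, the stratum size times the stratum standard deviation of the variable of interest. A minimal Python sketch of that textbook rule, with largest-remainder rounding so the integer sizes sum to the budget; it does not reproduce the optimall interface, and the function name and inputs are hypothetical.

```python
import math

def neyman_allocation(stratum_sizes, stratum_sds, n_total):
    """Allocate n_total samples across strata in proportion to N_h * S_h.

    Fractional allocations are rounded with the largest-remainder method so the
    integer sizes sum exactly to n_total. Stratum-size caps are not enforced.
    """
    weights = [n_h * s_h for n_h, s_h in zip(stratum_sizes, stratum_sds)]
    total_weight = sum(weights)
    exact = [n_total * wt / total_weight for wt in weights]
    alloc = [math.floor(a) for a in exact]
    # Give the leftover units to the strata with the largest fractional parts.
    remainder = n_total - sum(alloc)
    order = sorted(range(len(exact)), key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in order[:remainder]:
        alloc[h] += 1
    return alloc

# Hypothetical strata: sizes 5,000 / 3,000 / 2,000 with SDs 1, 2, 4 and a budget of 400.
sizes = neyman_allocation([5000, 3000, 2000], [1.0, 2.0, 4.0], 400)  # [105, 126, 169]
```

Neyman allocation can request more units than a small, high-variance stratum contains; a fuller version would cap such strata and reallocate the surplus.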

  12. arXiv:2006.07480

    stat.ME

    Improved Generalized Raking Estimators to Address Dependent Covariate and Failure-Time Outcome Error

    Authors: Eric J. Oh, Bryan E. Shepherd, Thomas Lumley, Pamela A. Shaw

    Abstract: Biomedical studies that use electronic health records (EHR) data for inference are often subject to bias due to measurement error. The measurement error present in EHR data is typically complex, consisting of errors of unknown functional form in covariates and the outcome, which can be dependent. To address the bias resulting from such errors, generalized raking has recently been proposed as a rob…

    Submitted 12 June, 2020; originally announced June 2020.
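Generalized raking changes the design weights of the validated subsample as little as possible, in the raking (exponential) distance, so that weighted totals of auxiliary variables match totals known for the whole cohort. A minimal Python sketch of that calibration step using Newton's method; choosing good auxiliary variables is the substantive part of the methods above and is not attempted here, so the error-prone variable itself stands in as a placeholder. All names and simulated values are hypothetical.

```python
import numpy as np

def rake_weights(design_w, aux, pop_totals, tol=1e-10, max_iter=50):
    """Calibrate design weights with the raking (exponential) distance function.

    Solves sum_i design_w[i] * exp(aux[i] @ lam) * aux[i] = pop_totals for lam by
    Newton's method and returns the calibrated weights design_w * exp(aux @ lam).
    """
    lam = np.zeros(aux.shape[1])
    scale = np.max(np.abs(pop_totals))
    for _ in range(max_iter):
        w = design_w * np.exp(aux @ lam)
        gap = pop_totals - aux.T @ w
        if np.max(np.abs(gap)) < tol * scale:
            return w
        jac = aux.T @ (w[:, None] * aux)   # Jacobian of the weighted totals in lam
        lam = lam + np.linalg.solve(jac, gap)
    raise RuntimeError("raking did not converge")

# Hypothetical two-phase design: an error-prone variable is known for all N
# subjects (phase 1); a simple random subsample of n is validated (phase 2).
rng = np.random.default_rng(7)
N, n = 10_000, 500
w_err = rng.normal(size=N)
aux_all = np.column_stack([np.ones(N), w_err])
totals = aux_all.sum(axis=0)                 # phase-1 totals to be reproduced
sub = rng.choice(N, size=n, replace=False)
g = rake_weights(np.full(n, N / n), aux_all[sub], totals)
```

Because the weights take the form d_i exp(x_i'λ), they stay positive, which is one practical reason raking is preferred over linear calibration in this setting.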

  13. arXiv:2005.05511

    stat.ME

    Two-phase analysis and study design for survival models with error-prone exposures

    Authors: Kyunghee Han, Thomas Lumley, Bryan E. Shepherd, Pamela A. Shaw

    Abstract: Increasingly, medical research is dependent on data collected for non-research purposes, such as electronic health records data (EHR). EHR data and other large databases can be prone to measurement error in key exposures, and unadjusted analyses of error-prone data can bias study results. Validating a subset of records is a cost-effective way of gaining information on the error structure, which in…

    Submitted 11 May, 2020; originally announced May 2020.

    Comments: 22 pages, 2 figures, 3 tables, supplementary material

  14. arXiv:2004.01112

    stat.ME

    An Approximate Quasi-Likelihood Approach for Error-Prone Failure Time Outcomes and Exposures

    Authors: Lillian A. Boe, Lesley F. Tinker, Pamela A. Shaw

    Abstract: Measurement error arises commonly in clinical research settings that rely on data from electronic health records or large observational cohorts. In particular, self-reported outcomes are typical in cohort studies for chronic diseases such as diabetes in order to avoid the burden of expensive diagnostic tests. Dietary intake, which is also commonly collected by self-report and subject to measuremen…

    Submitted 4 February, 2021; v1 submitted 2 April, 2020; originally announced April 2020.

    Comments: 61 pages, 1 figure, 14 tables in total. Main manuscript: first 38 pages including references and 6 tables, followed by supplementary materials with remaining 23 pages including 1 figure and 8 tables

  15. arXiv:1910.01162

    stat.ME

    Combining multiple imputation with raking of weights: An efficient and robust approach in the setting of nearly-true models

    Authors: Kyunghee Han, Pamela A. Shaw, Thomas Lumley

    Abstract: Multiple imputation provides us with efficient estimators in model-based methods for handling missing data under the true model. It is also well-understood that design-based estimators are robust methods that do not require accurately modeling the missing data; however, they can be inefficient. In any applied setting, it is difficult to know whether a missing data model may be good enough to win t…

    Submitted 9 June, 2020; v1 submitted 2 October, 2019; originally announced October 2019.

    Comments: 24 pages, 3 figures

  16. arXiv:1909.04706

    stat.ME econ.EM

    Regression to the Mean's Impact on the Synthetic Control Method: Bias and Sensitivity Analysis

    Authors: Nicholas Illenberger, Dylan S. Small, Pamela A. Shaw

    Abstract: To make informed policy recommendations from observational data, we must be able to discern true treatment effects from random noise and effects due to confounding. Difference-in-Differences techniques that match treated units to control units based on pre-treatment outcomes, such as the synthetic control approach, have been presented as principled methods to account for confounding. However, we sh…

    Submitted 10 September, 2019; originally announced September 2019.

    Comments: 15 pages, 4 figures

    MSC Class: 62K99

  17. arXiv:1905.08330

    stat.ME

    Raking and Regression Calibration: Methods to Address Bias from Correlated Covariate and Time-to-Event Error

    Authors: Eric J. Oh, Bryan E. Shepherd, Thomas Lumley, Pamela A. Shaw

    Abstract: Medical studies that depend on electronic health records (EHR) data are often subject to measurement error, as the data are not collected to support research questions under study. These data errors, if not accounted for in study analyses, can obscure or cause spurious associations between patient exposures and disease risk. Methodology to address covariate measurement error has been well develope…

    Submitted 9 March, 2020; v1 submitted 20 May, 2019; originally announced May 2019.

  18. Epidemiologic analyses with error-prone exposures: Review of current practice and recommendations

    Authors: Pamela A. Shaw, Veronika Deffner, Ruth H. Keogh, Janet A. Tooze, Kevin W. Dodd, Helmut Küchenhoff, Victor Kipnis, Laurence S. Freedman

    Abstract: Background: Variables in epidemiological observational studies are commonly subject to measurement error and misclassification, but the impact of such errors is frequently not appreciated or ignored. As part of the STRengthening Analytical Thinking for Observational Studies (STRATOS) Initiative, a Task Group on measurement error and misclassification (TG4) seeks to describe the scope of this probl…

    Submitted 28 February, 2018; originally announced February 2018.

    Comments: 41 pages, including 4 tables and supplementary material

    Journal ref: Annals of Epidemiology, 2018, 28(11), 821-828