-
Accounting for survey design in Bayesian disaggregation of survey-based areal estimates of proportions: an application to the American Community Survey
Authors:
Marco H. Benedetti,
Veronica J. Berrocal,
Roderick J. Little
Abstract:
Understanding the effects of social determinants of health on health outcomes requires data on characteristics of the neighborhoods in which subjects live. However, estimates of these characteristics are often aggregated over space and time in a fashion that diminishes their utility. Take, for example, estimates from the American Community Survey (ACS), a multi-year nationwide survey administered by the U.S. Census Bureau: estimates for small municipal areas are aggregated over 5-year periods, whereas 1-year estimates are only available for municipal areas with populations $>$65,000. Researchers may wish to use ACS estimates in studies of population health to characterize neighborhood-level exposures. However, 5-year estimates may not properly characterize temporal changes or align temporally with other data in the study, while the coarse spatial resolution of the 1-year estimates diminishes their utility in characterizing neighborhood exposure. To circumvent this issue, in this paper we propose a modeling framework to disaggregate estimates of proportions derived from sampling surveys which explicitly accounts for the survey design effect. We illustrate the utility of our model by applying it to the ACS data, generating estimates of poverty for the state of Michigan at fine spatio-temporal resolution.
Submitted 14 December, 2021; v1 submitted 13 December, 2021;
originally announced December 2021.
-
A Case Study of Nonresponse Bias Analysis In Educational Assessment Surveys
Authors:
Yajuan Si,
Roderick J. A. Little,
Ya Mo,
Nell Sedransk
Abstract:
Nonresponse bias is a widely prevalent problem for data on education. We develop a ten-step exemplar to guide nonresponse bias analysis (NRBA) in cross-sectional studies and apply these steps to the Early Childhood Longitudinal Study, Kindergarten Class of 2010-11. A key step is the construction of indices of nonresponse bias based on proxy pattern-mixture models for survey variables of interest. A novel feature is to characterize the strength of evidence about nonresponse bias contained in these indices, based on the strength of the relationship between the characteristics in the nonresponse adjustment and the key survey variables. Our NRBA improves existing methods by incorporating both missing at random and missing not at random mechanisms, and all analyses can be done straightforwardly with standard statistical software.
Submitted 25 July, 2022; v1 submitted 9 April, 2021;
originally announced April 2021.
-
Framework for the Treatment And Reporting of Missing data in Observational Studies: The TARMOS framework
Authors:
Katherine J Lee,
Kate Tilling,
Rosie P Cornish,
Roderick JA Little,
Melanie L Bell,
Els Goetghebeur,
Joseph W Hogan,
James R Carpenter
Abstract:
Missing data are ubiquitous in medical research. Although there is increasing guidance on how to handle missing data, practice is changing slowly and misapprehensions abound, particularly in observational research. We present a practical framework for handling and reporting the analysis of incomplete data in observational studies, which we illustrate using a case study from the Avon Longitudinal Study of Parents and Children. The framework consists of three steps: 1) Develop an analysis plan specifying the analysis model and how missing data are going to be addressed. An important consideration is whether a complete records analysis is likely to be valid, whether multiple imputation or an alternative approach is likely to offer benefits, and whether a sensitivity analysis regarding the missingness mechanism is required. 2) Explore the data, checking the methods outlined in the analysis plan are appropriate, and conduct the pre-planned analysis. 3) Report the results, including a description of the missing data, details on how the missing data were addressed, and the results from all analyses, interpreted in light of the missing data and the clinical relevance. This framework seeks to support researchers in thinking systematically about missing data, and transparently reporting the potential effect on the study results.
Submitted 29 April, 2020;
originally announced April 2020.
-
Assessing Selection Bias in Regression Coefficients Estimated from Non-Probability Samples, with Applications to Genetics and Demographic Surveys
Authors:
Brady T. West,
Roderick J. A. Little,
Rebecca R. Andridge,
Philip S. Boonstra,
Erin B. Ware,
Anita Pandit,
Fernanda Alvarado-Leiton
Abstract:
Selection bias is a serious potential problem for inference about relationships of scientific interest based on samples without well-defined probability sampling mechanisms. Motivated by the potential for selection bias in (a) estimated relationships of polygenic scores (PGSs) with phenotypes in genetic studies of volunteers, and (b) estimated differences in subgroup means in surveys of smartphone users, we derive novel measures of selection bias for estimates of the coefficients in linear and probit regression models fitted to non-probability samples, when aggregate-level auxiliary data are available for the selected sample and the target population. The measures arise from normal pattern-mixture models that allow analysts to examine the sensitivity of their inferences to assumptions about non-ignorable selection in these samples. We examine the effectiveness of the proposed measures in a simulation study, and then use them to quantify the selection bias in (a) estimated PGS-phenotype relationships in a large study of volunteers recruited via Facebook, and (b) estimated subgroup differences in mean past-year employment duration in a non-probability sample of low-educated smartphone users. We evaluate the performance of the measures in these applications using benchmark estimates from large probability samples.
Submitted 8 March, 2021; v1 submitted 13 April, 2020;
originally announced April 2020.
-
Block-Conditional Missing at Random Models for Missing Data
Authors:
Yan Zhou,
Roderick J. A. Little,
John D. Kalbfleisch
Abstract:
Two major ideas in the analysis of missing data are (a) the EM algorithm [Dempster, Laird and Rubin, J. Roy. Statist. Soc. Ser. B 39 (1977) 1--38] for maximum likelihood (ML) estimation, and (b) the formulation of models for the joint distribution of the data ${Z}$ and missing data indicators ${M}$, and associated "missing at random" (MAR) condition under which a model for ${M}$ is unnecessary [Rubin, Biometrika 63 (1976) 581--592]. Most previous work has treated ${Z}$ and ${M}$ as single blocks, yielding selection or pattern-mixture models depending on how their joint distribution is factorized. This paper explores "block-sequential" models that interleave subsets of the variables and their missing data indicators, and then make parameter restrictions based on assumptions in each block. These include models that are not MAR. We examine a subclass of block-sequential models we call block-conditional MAR (BCMAR) models, and an associated block-monotone reduced likelihood strategy that typically yields consistent estimates by selectively discarding some data. Alternatively, full ML estimation can often be achieved via the EM algorithm. We examine in some detail BCMAR models for the case of two multinomially distributed categorical variables, and a two block structure where the first block is categorical and the second block arises from a (possibly multivariate) exponential family distribution.
Submitted 13 April, 2011;
originally announced April 2011.
-
Quantitative magnetic resonance image analysis via the EM algorithm with stochastic variation
Authors:
Xiaoxi Zhang,
Timothy D. Johnson,
Roderick J. A. Little,
Yue Cao
Abstract:
Quantitative Magnetic Resonance Imaging (qMRI) provides researchers insight into pathological and physiological alterations of living tissue, with the help of which researchers hope to predict (local) therapeutic efficacy early and determine optimal treatment schedule. However, the analysis of qMRI has been limited to ad-hoc heuristic methods. Our research provides a powerful statistical framework for image analysis and sheds light on future localized adaptive treatment regimes tailored to the individual's response. We assume in an imperfect world we only observe a blurred and noisy version of the underlying pathological/physiological changes via qMRI, due to measurement errors or unpredictable influences. We use a hidden Markov random field to model the spatial dependence in the data and develop a maximum likelihood approach via the Expectation--Maximization algorithm with stochastic variation. An important improvement over previous work is the assessment of variability in parameter estimation, which is the valid basis for statistical inference. More importantly, we focus on the expected changes rather than image segmentation. Our research has shown that the approach is powerful in both simulation studies and on a real dataset, while quite robust in the presence of some model assumption violations.
Submitted 29 July, 2008;
originally announced July 2008.
-
Comment: Struggles with Survey Weighting and Regression Modeling
Authors:
Roderick J. Little
Abstract:
Comment: Struggles with Survey Weighting and Regression Modeling [arXiv:0710.5005]
Submitted 26 October, 2007;
originally announced October 2007.