Search | arXiv e-print repository

arXiv:2112.06802 [pdf, other]

Accounting for survey design in Bayesian disaggregation of survey-based areal estimates of proportions: an application to the American Community Survey

Authors: Marco H. Benedetti, Veronica J. Berrocal, Roderick J. Little

Abstract: Understanding the effects of social determinants of health on health outcomes requires data on characteristics of the neighborhoods in which subjects live. However, estimates of these characteristics are often aggregated over space and time in a fashion that diminishes their utility. Take, for example, estimates from the American Community Survey (ACS), a multi-year nationwide survey administered by the U.S. Census Bureau: estimates for small municipal areas are aggregated over 5-year periods, whereas 1-year estimates are only available for municipal areas with populations $>$65,000. Researchers may wish to use ACS estimates in studies of population health to characterize neighborhood-level exposures. However, 5-year estimates may not properly characterize temporal changes or align temporally with other data in the study, while the coarse spatial resolution of the 1-year estimates diminishes their utility in characterizing neighborhood exposure. To circumvent this issue, in this paper we propose a modeling framework to disaggregate estimates of proportions derived from sampling surveys which explicitly accounts for the survey design effect. We illustrate the utility of our model by applying it to the ACS data, generating estimates of poverty for the state of Michigan at fine spatio-temporal resolution.

Submitted 14 December, 2021; v1 submitted 13 December, 2021; originally announced December 2021.

arXiv:2104.04432 [pdf]

A Case Study of Nonresponse Bias Analysis In Educational Assessment Surveys

Authors: Yajuan Si, Roderick J. A. Little, Ya Mo, Nell Sedransk

Abstract: Nonresponse bias is a widely prevalent problem for data on education. We develop a ten-step exemplar to guide nonresponse bias analysis (NRBA) in cross-sectional studies and apply these steps to the Early Childhood Longitudinal Study, Kindergarten Class of 2010-11. A key step is the construction of indices of nonresponse bias based on proxy pattern-mixture models for survey variables of interest. A novel feature is to characterize the strength of evidence about nonresponse bias contained in these indices, based on the strength of the relationship between the characteristics in the nonresponse adjustment and the key survey variables. Our NRBA improves existing methods by incorporating both missing at random and missing not at random mechanisms, and all analyses can be done straightforwardly with standard statistical software.

Submitted 25 July, 2022; v1 submitted 9 April, 2021; originally announced April 2021.

arXiv:2004.14066 [pdf]

Framework for the Treatment And Reporting of Missing data in Observational Studies: The TARMOS framework

Authors: Katherine J Lee, Kate Tilling, Rosie P Cornish, Roderick JA Little, Melanie L Bell, Els Goetghebeur, Joseph W Hogan, James R Carpenter

Abstract: Missing data are ubiquitous in medical research. Although there is increasing guidance on how to handle missing data, practice is changing slowly and misapprehensions abound, particularly in observational research. We present a practical framework for handling and reporting the analysis of incomplete data in observational studies, which we illustrate using a case study from the Avon Longitudinal Study of Parents and Children. The framework consists of three steps: 1) Develop an analysis plan specifying the analysis model and how missing data are going to be addressed. An important consideration is whether a complete records analysis is likely to be valid, whether multiple imputation or an alternative approach is likely to offer benefits, and whether a sensitivity analysis regarding the missingness mechanism is required. 2) Explore the data, checking the methods outlined in the analysis plan are appropriate, and conduct the pre-planned analysis. 3) Report the results, including a description of the missing data, details on how the missing data were addressed, and the results from all analyses, interpreted in light of the missing data and the clinical relevance. This framework seeks to support researchers in thinking systematically about missing data, and transparently reporting the potential effect on the study results.

Submitted 29 April, 2020; originally announced April 2020.

Comments: 37 pages, including 3 Figures, 1 table and supplementary material

MSC Class: 6207

arXiv:2004.06139 [pdf]

Assessing Selection Bias in Regression Coefficients Estimated from Non-Probability Samples, with Applications to Genetics and Demographic Surveys

Authors: Brady T. West, Roderick J. A. Little, Rebecca R. Andridge, Philip S. Boonstra, Erin B. Ware, Anita Pandit, Fernanda Alvarado-Leiton

Abstract: Selection bias is a serious potential problem for inference about relationships of scientific interest based on samples without well-defined probability sampling mechanisms. Motivated by the potential for selection bias in (a) estimated relationships of polygenic scores (PGSs) with phenotypes in genetic studies of volunteers, and (b) estimated differences in subgroup means in surveys of smartphone users, we derive novel measures of selection bias for estimates of the coefficients in linear and probit regression models fitted to non-probability samples, when aggregate-level auxiliary data are available for the selected sample and the target population. The measures arise from normal pattern-mixture models that allow analysts to examine the sensitivity of their inferences to assumptions about non-ignorable selection in these samples. We examine the effectiveness of the proposed measures in a simulation study, and then use them to quantify the selection bias in (a) estimated PGS-phenotype relationships in a large study of volunteers recruited via Facebook, and (b) estimated subgroup differences in mean past-year employment duration in a non-probability sample of low-educated smartphone users. We evaluate the performance of the measures in these applications using benchmark estimates from large probability samples.

Submitted 8 March, 2021; v1 submitted 13 April, 2020; originally announced April 2020.

Comments: 29 pages, 4 figures, 2 tables, supplementary material

arXiv:1104.2400 [pdf, ps, other]

doi 10.1214/10-STS344

Block-Conditional Missing at Random Models for Missing Data

Authors: Yan Zhou, Roderick J. A. Little, John D. Kalbfleisch

Abstract: Two major ideas in the analysis of missing data are (a) the EM algorithm [Dempster, Laird and Rubin, J. Roy. Statist. Soc. Ser. B 39 (1977) 1--38] for maximum likelihood (ML) estimation, and (b) the formulation of models for the joint distribution of the data ${Z}$ and missing data indicators ${M}$, and associated "missing at random" (MAR) condition under which a model for ${M}$ is unnecessary [Rubin, Biometrika 63 (1976) 581--592]. Most previous work has treated ${Z}$ and ${M}$ as single blocks, yielding selection or pattern-mixture models depending on how their joint distribution is factorized. This paper explores "block-sequential" models that interleave subsets of the variables and their missing data indicators, and then make parameter restrictions based on assumptions in each block. These include models that are not MAR. We examine a subclass of block-sequential models we call block-conditional MAR (BCMAR) models, and an associated block-monotone reduced likelihood strategy that typically yields consistent estimates by selectively discarding some data. Alternatively, full ML estimation can often be achieved via the EM algorithm. We examine in some detail BCMAR models for the case of two multinomially distributed categorical variables, and a two block structure where the first block is categorical and the second block arises from a (possibly multivariate) exponential family distribution.

Submitted 13 April, 2011; originally announced April 2011.

Comments: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/10-STS344

Report number: IMS-STS-STS344

Journal ref: Statistical Science 2010, Vol. 25, No. 4, 517-532

arXiv:0807.4672 [pdf, ps, other]

doi 10.1214/07-AOAS157

Quantitative magnetic resonance image analysis via the EM algorithm with stochastic variation

Authors: Xiaoxi Zhang, Timothy D. Johnson, Roderick J. A. Little, Yue Cao

Abstract: Quantitative Magnetic Resonance Imaging (qMRI) provides researchers insight into pathological and physiological alterations of living tissue, with the help of which researchers hope to predict (local) therapeutic efficacy early and determine optimal treatment schedule. However, the analysis of qMRI has been limited to ad-hoc heuristic methods. Our research provides a powerful statistical framework for image analysis and sheds light on future localized adaptive treatment regimes tailored to the individual's response. We assume in an imperfect world we only observe a blurred and noisy version of the underlying pathological/physiological changes via qMRI, due to measurement errors or unpredictable influences. We use a hidden Markov random field to model the spatial dependence in the data and develop a maximum likelihood approach via the Expectation--Maximization algorithm with stochastic variation. An important improvement over previous work is the assessment of variability in parameter estimation, which is the valid basis for statistical inference. More importantly, we focus on the expected changes rather than image segmentation. Our research has shown that the approach is powerful in both simulation studies and on a real dataset, while quite robust in the presence of some model assumption violations.

Submitted 29 July, 2008; originally announced July 2008.

Comments: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/07-AOAS157

Report number: IMS-AOAS-AOAS157

Journal ref: Annals of Applied Statistics 2008, Vol. 2, No. 2, 736-755

arXiv:0710.5013 [pdf, ps, other]

doi 10.1214/088342307000000186

Comment: Struggles with Survey Weighting and Regression Modeling

Authors: Roderick J. Little

Abstract: Comment: Struggles with Survey Weighting and Regression Modeling [arXiv:0710.5005]

Submitted 26 October, 2007; originally announced October 2007.

Comments: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/088342307000000186

Report number: IMS-STS-STS226C

Journal ref: Statistical Science 2007, Vol. 22, No. 2, 171-174

Showing 1–7 of 7 results for author: Little, R J