Showing 1–29 of 29 results for author: Savitsky, T D

  1. arXiv:2312.05383  [pdf, other]

    stat.AP

    Review of Quasi-Randomization Approaches for Estimation from Non-probability Samples

    Authors: Vladislav Beresovsky, Julie Gershunskaya, Terrance D. Savitsky

    Abstract: The recent proliferation of computers and the internet has opened new opportunities for collecting and processing data. However, such data are often obtained without a well-planned probability survey design. Such non-probability based samples cannot be automatically regarded as representative of the population of interest. Several classes of methods for estimation and inferences from non-probabil…

    Submitted 26 June, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: 38 pages, 12 figures

  2. arXiv:2310.01575  [pdf, other]

    stat.ME stat.AP

    Derivation of outcome-dependent dietary patterns for low-income women obtained from survey data using a Supervised Weighted Overfitted Latent Class Analysis

    Authors: Stephanie M. Wu, Matthew R. Williams, Terrance D. Savitsky, Briana J. K. Stephenson

    Abstract: Poor diet quality is a key modifiable risk factor for hypertension and disproportionately impacts low-income women. Analyzing diet-driven hypertensive outcomes in this demographic is challenging due to the complexity of dietary data and selection bias when the data come from surveys, a main data source for understanding diet-disease relationships in understudied populations. Supervised Bayesia…

    Submitted 28 June, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: 16 pages, 8 tables, 7 figures

  3. arXiv:2308.06845  [pdf, other]

    stat.CO stat.AP

    csSampling: An R Package for Bayesian Models for Complex Survey Data

    Authors: Ryan Hornby, Matthew R. Williams, Terrance D. Savitsky, Mahmoud Elkasabi

    Abstract: We present csSampling, an R package for estimation of Bayesian models for data collected from complex survey samples. csSampling combines functionality from the probabilistic programming language Stan (via the rstan and brms R packages) and the handling of complex survey data from the survey R package. Under this approach, the user creates a survey-weighted model in brms or provides a custom weigh…

    Submitted 13 August, 2023; originally announced August 2023.

    Comments: 22 pages, 5 figures

  4. arXiv:2210.14366  [pdf, other]

    stat.ME

    Joint Point and Variance Estimation under a Hierarchical Bayesian model for Survey Count Data

    Authors: Terrance D. Savitsky, Julie Gershunskaya, Mark Crankshaw

    Abstract: We propose a novel Bayesian framework for the joint modeling of survey point and variance estimates for count data. The approach incorporates an induced prior distribution on the modeled true variance that sets it equal to the generating variance of the point estimate, a key property more readily achieved for continuous data response type models. Our count data model formulation allows the input o…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: 5 figures, 3 tables

  5. arXiv:2208.14541  [pdf, other]

    stat.ME stat.CO

    Methods for Combining Probability and Nonprobability Samples Under Unknown Overlaps

    Authors: Terrance D. Savitsky, Matthew R. Williams, Julie Gershunskaya, Vladislav Beresovsky, Nels G. Johnson

    Abstract: Nonprobability (convenience) samples are increasingly sought to reduce the estimation variance for one or more population variables of interest that are estimated using a randomized survey (reference) sample by increasing the effective sample size. Estimation of a population quantity derived from a convenience sample will typically result in bias since the distribution of variables of interest in…

    Submitted 9 June, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

    Comments: 37 pages, 11 figures. arXiv admin note: substantial text overlap with arXiv:2204.02271

  6. arXiv:2205.05003  [pdf, other]

    stat.ME

    Mechanisms for Global Differential Privacy under Bayesian Data Synthesis

    Authors: Jingchen Hu, Matthew R. Williams, Terrance D. Savitsky

    Abstract: This paper introduces a new method that embeds any Bayesian model used to generate synthetic data and converts it into a differentially private (DP) mechanism. We propose an alteration of the model synthesizer to utilize a censored likelihood that induces upper and lower bounds of $[\exp(-ε/2), \exp(ε/2)]$, where $ε$ denotes the level of the DP guarantee. This censoring mechanism equipped with a…

    Submitted 3 August, 2023; v1 submitted 10 May, 2022; originally announced May 2022.

  7. arXiv:2204.02271   

    stat.ME

    Methods for Combining Probability and Nonprobability Samples Under Unknown Overlaps

    Authors: Terrance D. Savitsky, Matthew R. Williams, Julie Gershunskaya, Vladislav Beresovsky, Nels G. Johnson

    Abstract: Nonprobability (convenience) samples are increasingly sought to stabilize estimations for one or more population variables of interest that are performed using a randomized survey (reference) sample by increasing the effective sample size. Estimation of a population quantity derived from a convenience sample will typically result in bias since the distribution of variables of interest in the conve…

    Submitted 9 June, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: Duplicate of arXiv:2208.14541

  8. arXiv:2101.06237  [pdf, other]

    stat.ME

    Fully Bayesian Estimation under Dependent and Informative Cluster Sampling

    Authors: Luis G. Leon-Novelo, Terrance D. Savitsky

    Abstract: Survey data are often collected under multistage sampling designs where units are binned to clusters that are sampled in a first stage. The unit-indexed population variables of interest are typically dependent within cluster. We propose a Fully Bayesian method that constructs an exact likelihood for the observed sample to incorporate unit-level marginal sampling weights for performing unbiased inf…

    Submitted 24 August, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

    Comments: Total of 22 pages including 3 figures and 4 tables

  9. arXiv:2101.06188  [pdf, other]

    stat.ME stat.AP

    Private Tabular Survey Data Products through Synthetic Microdata Generation

    Authors: Jingchen Hu, Terrance D. Savitsky, Matthew R. Williams

    Abstract: We propose two synthetic microdata approaches to generate private tabular survey data products for public release. We adapt a pseudo posterior mechanism that downweights by-record likelihood contributions with weights $\in [0,1]$ based on their identification disclosure risks to producing tabular products for survey data. Our method applied to an observed survey database achieves an asymptotic glo…

    Submitted 3 March, 2022; v1 submitted 15 January, 2021; originally announced January 2021.

  10. arXiv:2006.01230  [pdf, other]

    stat.ME stat.AP

    Re-weighting of Vector-weighted Mechanisms for Utility Maximization under Differential Privacy

    Authors: Terrance D. Savitsky, Jingchen Hu, Matthew R. Williams

    Abstract: We address practical implementation of a risk-weighted pseudo posterior synthesizer for microdata dissemination with a new re-weighting strategy that maximizes utility of released synthetic data at any level of formal privacy guarantee. Our re-weighting strategy applies to any vector-weighted pseudo posterior mechanism under which a vector of observation-indexed weights are used to downweigh…

    Submitted 28 April, 2022; v1 submitted 1 June, 2020; originally announced June 2020.

  11. arXiv:2006.00783  [pdf, other]

    stat.ME stat.CO

    Distributed Bayesian Varying Coefficient Modeling Using a Gaussian Process Prior

    Authors: Rajarshi Guhaniyogi, Cheng Li, Terrance D. Savitsky, Sanvesh Srivastava

    Abstract: Varying coefficient models (VCMs) are widely used for estimating nonlinear regression functions for functional data. Their Bayesian variants using Gaussian process priors on the functional coefficients, however, have received limited attention in massive data applications, mainly due to the prohibitively slow posterior computations using Markov chain Monte Carlo (MCMC) algorithms. We address this…

    Submitted 25 February, 2022; v1 submitted 1 June, 2020; originally announced June 2020.

  12. arXiv:2004.06191  [pdf, other]

    stat.ME

    Pseudo Bayesian Estimation of One-way ANOVA Model in Complex Surveys

    Authors: Terrance D. Savitsky, Matthew R. Williams, Sanvesh Srivastava

    Abstract: We devise survey-weighted pseudo posterior distribution estimators under two-stage informative sampling of both primary clusters and secondary nested units for a one-way analysis of variance (ANOVA) population generating model as a simple canonical case where population model random effects are defined to be coincident with the primary clusters, for example student performance based on a survey of…

    Submitted 12 May, 2023; v1 submitted 13 April, 2020; originally announced April 2020.

    Comments: 45 pages, 12 figures

    MSC Class: 62D05; 62F15; 62J05

  13. arXiv:1909.11796  [pdf, other]

    stat.ME

    Bayesian Pseudo Posterior Mechanism under Asymptotic Differential Privacy

    Authors: Terrance D. Savitsky, Matthew R. Williams, Jingchen Hu

    Abstract: We propose a Bayesian pseudo posterior mechanism to generate record-level synthetic databases equipped with an $(ε,δ)$-probabilistic differential privacy (pDP) guarantee, where $δ$ denotes the probability that any observed database exceeds $ε$. The pseudo posterior mechanism employs a data record-indexed, risk-based weight vector with weight values $\in [0, 1]$ that surgically downweight the like…

    Submitted 13 August, 2021; v1 submitted 25 September, 2019; originally announced September 2019.

    Comments: 35 pages, 7 figures, 2 tables

  14. arXiv:1908.07639  [pdf, other]

    stat.ME stat.AP

    Risk-Efficient Bayesian Data Synthesis for Privacy Protection

    Authors: Jingchen Hu, Terrance D. Savitsky, Matthew R. Williams

    Abstract: Statistical agencies utilize models to synthesize respondent-level data for release to the public for privacy protection. In this work, we efficiently induce privacy protection into any Bayesian synthesis model by employing a pseudo likelihood that exponentiates each likelihood contribution by an observation record-indexed weight in [0, 1], defined to be inversely proportional to the identificatio…

    Submitted 8 February, 2021; v1 submitted 20 August, 2019; originally announced August 2019.

    Journal ref: Journal of Survey Statistics and Methodology, 2021

  15. arXiv:1904.07680  [pdf, other]

    stat.ME

    Pseudo Bayesian Mixed Models under Informative Sampling

    Authors: Terrance D. Savitsky, Matthew R. Williams

    Abstract: When random effects are correlated with sample design variables, the usual approach of employing individual survey weights (constructed to be inversely proportional to the unit survey inclusion probabilities) to form a pseudo-likelihood no longer produces asymptotically unbiased inference. We construct a weight-exponentiated formulation for the random effects distribution that achieves unbiased in…

    Submitted 24 August, 2021; v1 submitted 16 April, 2019; originally announced April 2019.

    Comments: 31 pages, 6 figures, 2 tables

    MSC Class: 62F15; 62D05

  16. arXiv:1901.06462   

    math.ST stat.AP stat.ME

    Bayesian Pseudo Posterior Synthesis for Data Privacy Protection

    Authors: Jingchen Hu, Terrance D. Savitsky

    Abstract: Statistical agencies utilize models to synthesize respondent-level data for release to the general public as an alternative to the actual data records. A Bayesian model synthesizer encodes privacy protection by employing a hierarchical prior construction that induces smoothing of the real data distribution. Synthetic respondent-level data records are often preferred to summary data tables due to t…

    Submitted 15 May, 2020; v1 submitted 18 January, 2019; originally announced January 2019.

    Comments: This is to replace arXiv:1908.07639

  17. arXiv:1901.03791  [pdf, other]

    stat.CO stat.ME

    Optimization of Survey Weights under a Large Number of Conflicting Constraints

    Authors: Matthew R. Williams, Terrance D. Savitsky

    Abstract: In the analysis of survey data, sampling weights are needed for consistent estimation of the population. However, the original inverse probability weights from the survey sample design are typically modified to account for non-response, to increase efficiency by incorporating auxiliary population information, and to reduce the variability in estimates due to extreme weights. It is often the case t…

    Submitted 11 January, 2019; originally announced January 2019.

    Comments: 23 pages, 2 figures, 3 tables

  18. arXiv:1809.10074  [pdf, other]

    stat.AP

    Bayesian Data Synthesis and Disclosure Risk Quantification: An Application to the Consumer Expenditure Surveys

    Authors: Jingchen Hu, Terrance D. Savitsky

    Abstract: The release of synthetic data generated from a model estimated on the data helps statistical agencies disseminate respondent-level data with high utility and privacy protection. Motivated by the challenge of disseminating sensitive variables containing geographic information in the Consumer Expenditure Surveys (CE) at the U.S. Bureau of Labor Statistics, we propose two non-parametric Bayesian mode…

    Submitted 2 February, 2021; v1 submitted 26 September, 2018; originally announced September 2018.

  19. Bayesian Uncertainty Estimation Under Complex Sampling

    Authors: Matthew R. Williams, Terrance D. Savitsky

    Abstract: Social and economic studies are often implemented as complex survey designs. For example, multistage, unequal probability sampling designs utilized by federal statistical agencies are typically constructed to maximize the efficiency of the target domain level estimator (e.g., indexed by geographic area) within cost constraints for survey administration. Such designs may induce dependence between t…

    Submitted 29 July, 2019; v1 submitted 31 July, 2018; originally announced July 2018.

    Comments: 45 pages, 4 figures, 1 table

    MSC Class: 62D05; 62F15; 62F12

    Journal ref: International Statistical Review 2020

  20. Bayesian Estimation Under Informative Sampling with Unattenuated Dependence

    Authors: Matthew R. Williams, Terrance D. Savitsky

    Abstract: An informative sampling design leads to unit inclusion probabilities that are correlated with the response variable of interest. However, multistage sampling designs may also induce higher order dependencies, which are typically ignored in the literature when establishing consistency of estimators for survey data under a condition requiring asymptotic independence among the unit inclusion probabil…

    Submitted 12 July, 2018; originally announced July 2018.

    Comments: 35 pages, 5 figures. arXiv admin note: text overlap with arXiv:1710.10102

    Journal ref: Bayesian Anal., advance publication, 4 January 2019

  21. arXiv:1712.09767  [pdf, other]

    stat.ME

    A Divide-and-Conquer Bayesian Approach to Large-Scale Kriging

    Authors: Rajarshi Guhaniyogi, Cheng Li, Terrance D. Savitsky, Sanvesh Srivastava

    Abstract: We propose a three-step divide-and-conquer strategy within the Bayesian paradigm that delivers massive scalability for any spatial process model. We partition the data into a large number of subsets, apply a readily available Bayesian spatial process model on every subset, in parallel, and optimally combine the posterior distributions estimated across all the subsets into a pseudo-posterior distri…

    Submitted 12 June, 2019; v1 submitted 28 December, 2017; originally announced December 2017.

    Comments: 29 pages, including 4 figures and 5 tables

  22. Bayesian Pairwise Estimation Under Dependent Informative Sampling

    Authors: Matthew R. Williams, Terrance D. Savitsky

    Abstract: An informative sampling design leads to the selection of units whose inclusion probabilities are correlated with the response variable of interest. Model inference performed on the resulting observed sample will be biased for the population generative model. One approach that produces asymptotically unbiased inference employs marginal inclusion probabilities to form sampling weights used to expone…

    Submitted 27 October, 2017; originally announced October 2017.

    Comments: 35 pages, 9 figures

    MSC Class: 62D05; 62G20

    Journal ref: Electron. J. Statist. Volume 12, Number 1 (2018), 1631-1661

  23. arXiv:1710.00019  [pdf, other]

    stat.ME math.ST

    Fully Bayesian Estimation Under Informative Sampling

    Authors: Luis G. Leon-Novelo, Terrance D. Savitsky

    Abstract: Bayesian estimation is increasingly popular for performing model based inference to support policymaking. These data are often collected from surveys under informative sampling designs where subject inclusion probabilities are designed to be correlated with the response variable of interest. Sampling weights constructed from marginal inclusion probabilities are typically used to form an exponentia…

    Submitted 11 July, 2018; v1 submitted 29 September, 2017; originally announced October 2017.

    Comments: Pages 1-29 constitute the main paper and include seven figures and three tables. Pages 30-36 contain Supplementary Material and pages 36-37 contain references

  24. arXiv:1606.07488  [pdf, other]

    stat.ME

    Scalable Bayes under Informative Sampling

    Authors: Terrance D. Savitsky, Sanvesh Srivastava

    Abstract: The United States Bureau of Labor Statistics collects data using survey instruments under informative sampling designs that assign probabilities of inclusion to be correlated with the response. The bureau extensively uses Bayesian hierarchical models and posterior sampling to impute missing items in respondent-level data and to infer population parameters. Posterior sampling for survey data collec…

    Submitted 24 October, 2017; v1 submitted 23 June, 2016; originally announced June 2016.

    Comments: 34 pages, 6 figures, 2 tables

  25. Inferring constructs of effective teaching from classroom observations: An application of Bayesian exploratory factor analysis without restrictions

    Authors: J. R. Lockwood, Terrance D. Savitsky, Daniel F. McCaffrey

    Abstract: Ratings of teachers' instructional practices using standardized classroom observation instruments are increasingly being used for both research and teacher accountability. There are multiple instruments in use, each attempting to evaluate many dimensions of teaching and classroom activities, and little is known about what underlying teaching quality attributes are being measured. We use data from…

    Submitted 17 November, 2015; originally announced November 2015.

    Comments: Published at http://dx.doi.org/10.1214/15-AOAS833 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS833

    Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 3, 1484-1509

  26. arXiv:1508.00615  [pdf, other]

    stat.ME

    Bayesian Nonparametric Functional Mixture Estimation for Time-Series Data, With Application to Estimation of State Employment Totals

    Authors: Terrance D. Savitsky

    Abstract: The U.S. Bureau of Labor Statistics uses monthly, by-state employment totals from the Current Population Survey (CPS) as a key input to develop employment estimates for counties within the states. The monthly CPS by-state totals, however, express high levels of volatility that compromise the accuracy of resulting estimates composed for the counties. Typically-employed models for small area estimati…

    Submitted 3 August, 2015; originally announced August 2015.

    Comments: 30 pages, 9 figures

  27. arXiv:1508.00604  [pdf, other]

    stat.ME

    Bayesian Nonparametric Multiresolution Estimation for the American Community Survey

    Authors: Terrance D. Savitsky

    Abstract: Bayesian hierarchical methods implemented for small area estimation focus on reducing the noise variation in published government official statistics by borrowing information among dependent response values. Even the most flexible models confine parameters defined at the finest scale to link to each data observation in a one-to-one construction. We propose a Bayesian multiresolution formulation th…

    Submitted 3 August, 2015; originally announced August 2015.

    Comments: 35 pages, 11 figures

  28. arXiv:1507.07050  [pdf, other]

    math.ST stat.ME

    Bayesian Estimation Under Informative Sampling

    Authors: Terrance D. Savitsky, Daniell Toth

    Abstract: Bayesian analysis is increasingly popular for use in social science and other application areas where the data are observations from an informative sample. An informative sampling design leads to inclusion probabilities that are correlated with the response variable of interest. Model inference performed on the observed sample taken from the population will be biased for the population generative…

    Submitted 3 June, 2016; v1 submitted 24 July, 2015; originally announced July 2015.

    Comments: 24 pages, 3 figures

  29. Bayesian nonparametric hierarchical modeling for multiple membership data in grouped attendance interventions

    Authors: Terrance D. Savitsky, Susan M. Paddock

    Abstract: We develop a dependent Dirichlet process (DDP) model for repeated measures multiple membership (MM) data. This data structure arises in studies under which an intervention is delivered to each client through a sequence of elements which overlap with those of other clients on different occasions. Our interest concentrates on study designs for which the overlaps of sequences occur for clients who re…

    Submitted 6 December, 2013; originally announced December 2013.

    Comments: Published at http://dx.doi.org/10.1214/12-AOAS620 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS620

    Journal ref: Annals of Applied Statistics 2013, Vol. 7, No. 2, 1074-1094