-
Method G: Uncertainty Quantification for Distributed Data Problems using Generalized Fiducial Inference
Authors:
Randy C. S. Lai,
J. Hannig,
Thomas C. M. Lee
Abstract:
It is not unusual for a data analyst to encounter data sets distributed across several computers. This can happen for reasons such as privacy concerns, efficiency of likelihood evaluations, or just the sheer size of the whole data set. This presents new challenges to statisticians, as even computing simple summary statistics such as the median becomes computationally demanding. Furthermore, if other advanced statistical methods are desired, novel computational strategies are needed. In this paper we propose a new approach for distributed analysis of massive data that is suitable for generalized fiducial inference and is based on a careful implementation of a "divide and conquer" strategy combined with importance sampling. The proposed approach requires only a small amount of communication between nodes and is shown to be asymptotically equivalent to using the whole data set. Unlike most existing methods, the proposed approach produces uncertainty measures (such as confidence intervals) in addition to point estimates for parameters of interest. The proposed approach is also applied to the analysis of a large set of solar images.
Submitted 18 May, 2018;
originally announced May 2018.
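As a rough illustration of combining a "divide and conquer" split with importance sampling, the following Python sketch uses a toy model that is an assumption made here, not taken from the paper: normal data with known standard deviation sigma and unknown mean mu, for which the fiducial distribution from n observations is N(x̄, sigma²/n). Each node sends only its sample size and local mean; the combiner draws candidate values of mu from a deliberately widened proposal and weights each draw by the product of the nodes' local fiducial densities divided by the proposal density. The names `local_summary`, `log_local_fiducial`, `combine` and the `inflate` parameter are hypothetical, and this is not the algorithm proposed by the authors.

```python
import math
import random


def local_summary(chunk):
    """Runs on one node (toy model): only (n_k, mean_k) is sent to the combiner."""
    n_k = len(chunk)
    return n_k, sum(chunk) / n_k


def log_local_fiducial(mu, n_k, mean_k, sigma):
    """Log density of N(mean_k, sigma^2 / n_k) at mu, up to an additive constant."""
    return -n_k * (mu - mean_k) ** 2 / (2 * sigma ** 2)


def combine(summaries, sigma, n_draws=20000, inflate=3.0, seed=0):
    """Importance-sampling combination of node-level fiducial densities.

    Hypothetical sketch; returns the weighted mean and standard deviation
    of the proposal draws.
    """
    rng = random.Random(seed)
    # Proposal: normal centred at the average node mean, widened by `inflate`
    # relative to the least precise node so that it covers the target.
    center = sum(m for _, m in summaries) / len(summaries)
    scale = inflate * sigma / math.sqrt(min(n for n, _ in summaries))
    draws, log_w = [], []
    for _ in range(n_draws):
        mu = rng.gauss(center, scale)
        # Target: product of local densities (a sum on the log scale).
        log_target = sum(log_local_fiducial(mu, n, m, sigma) for n, m in summaries)
        # Proposal log density; its normalising constant cancels below.
        log_prop = -((mu - center) ** 2) / (2 * scale ** 2)
        draws.append(mu)
        log_w.append(log_target - log_prop)
    # Normalise the weights on the log scale for numerical stability.
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]
    mean = sum(wi * d for wi, d in zip(w, draws))
    sd = math.sqrt(sum(wi * (d - mean) ** 2 for wi, d in zip(w, draws)))
    return mean, sd
```

In this toy case the product of the local densities is proportional to N(x̄, sigma²/n) for the pooled data, so the weighted draws should reproduce the full-data fiducial distribution up to Monte Carlo error; widening the proposal keeps the importance weights from degenerating.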
-
Uncertainty Quantification for High Dimensional Sparse Nonparametric Additive Models
Authors:
Qi Gao,
Randy C. S. Lai,
Thomas C. M. Lee,
Yao Li
Abstract:
Statistical inference in high dimensional settings has recently attracted enormous attention within the literature. However, most published work focuses on the parametric linear regression problem. This paper considers an important extension of this problem: statistical inference for high dimensional sparse nonparametric additive models. More precisely, this paper develops a methodology for constructing a probability density function on the set of all candidate models. This methodology can also be applied to construct confidence intervals for various quantities of interest (such as the noise variance) and confidence bands for the additive functions. The methodology is derived using the generalized fiducial inference framework, and it is shown that the results it produces enjoy correct asymptotic frequentist properties. Empirical results obtained from numerical experiments verify this theoretical claim. Lastly, the methodology is applied to a gene expression data set and makes new findings that most existing methods based on parametric linear modeling failed to detect.
Submitted 13 November, 2019; v1 submitted 23 September, 2017;
originally announced September 2017.
-
Covariance Estimation via Fiducial Inference
Authors:
W. Jenny Shi,
Jan Hannig,
Randy C. S. Lai,
Thomas C. M. Lee
Abstract:
As a classical problem, covariance estimation has drawn much attention from the statistical community for decades. Much work has been done under the frequentist and the Bayesian frameworks. Aiming to quantify the uncertainty of the estimators without having to choose a prior, we have developed a fiducial approach to the estimation of the covariance matrix. Building upon the Fiducial Bernstein-von Mises Theorem (Sonderegger and Hannig 2014), we show that the fiducial distribution of the covariance matrix is consistent under our framework. Consequently, the samples generated from this fiducial distribution are good estimators of the true covariance matrix, which enables us to define a meaningful confidence region for the covariance matrix. Lastly, we also show that the fiducial approach can be a powerful tool for identifying clique structures in covariance matrices.
Submitted 16 August, 2017;
originally announced August 2017.
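The paper works with full covariance matrices. As a much simpler scalar analogue, assumed here purely for illustration, take x_i = sigma Z_i with Z_i standard normal and a known zero mean. Fisher's classical fiducial argument gives sigma² = S / chi²_n with S = Σ x_i², so drawing chi²_n values and inverting yields fiducial samples of the variance, and their quantiles give a confidence interval. The helper names `fiducial_variance_draws` and `equal_tailed_interval` are hypothetical.

```python
import random


def fiducial_variance_draws(x, n_draws=20000, seed=1):
    """Fiducial draws of sigma^2 for x_i = sigma * Z_i with known zero mean.

    Scalar analogue only; not the matrix construction of the paper.
    """
    rng = random.Random(seed)
    n = len(x)
    s = sum(v * v for v in x)
    # A chi-square with n degrees of freedom is Gamma(shape=n/2, scale=2).
    return [s / rng.gammavariate(n / 2, 2.0) for _ in range(n_draws)]


def equal_tailed_interval(draws, level=0.95):
    """Equal-tailed interval from the empirical quantiles of the draws."""
    d = sorted(draws)
    lo = d[int((1 - level) / 2 * len(d))]
    hi = d[int((1 + level) / 2 * len(d)) - 1]
    return lo, hi
```

In this scalar case the resulting interval agrees, up to Monte Carlo error, with the classical chi-square interval [S / chi²_{n,0.975}, S / chi²_{n,0.025}] for a normal variance with known mean.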
-
Generalized Fiducial Inference for Ultrahigh Dimensional Regression
Authors:
Randy C. S. Lai,
Jan Hannig,
Thomas C. M. Lee
Abstract:
In recent years the ultrahigh dimensional linear regression problem has attracted enormous attention from the research community. Under the sparsity assumption, most of the published work is devoted to the selection and estimation of the significant predictor variables. This paper studies a different but fundamentally important aspect of this problem: uncertainty quantification for parameter estimates and model choices. To be more specific, this paper proposes methods for deriving a probability density function on the set of all possible models, and also for constructing confidence intervals for the corresponding parameters. These proposed methods are developed using the generalized fiducial methodology, which is a variant of Fisher's controversial fiducial idea. Theoretical properties of the proposed methods are studied; in particular, it is shown that statistical inference based on the proposed methods has exact asymptotic frequentist properties. In terms of empirical performance, the proposed methods are tested by simulation experiments and an application to a real data set. Lastly, this work can also be seen as an interesting and successful application of Fisher's fiducial idea to an important and contemporary problem. To the best of the authors' knowledge, this is the first time that the fiducial idea has been applied to a so-called "large p small n" problem.
Submitted 29 April, 2013;
originally announced April 2013.
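The idea of placing a probability distribution on the set of candidate models can be sketched in Python for a tiny problem. This is not the generalized fiducial model density derived in the paper: as a plainly labeled stand-in, each subset of predictors is weighted by exp(-BIC/2), and the weights are normalized over all 2^p subsets. Exhaustive enumeration is feasible only for very small p, unlike the "large p small n" setting the paper targets. The function name `model_weights` is hypothetical, and NumPy is assumed to be available.

```python
import itertools

import numpy as np


def model_weights(X, y):
    """Toy probability distribution over all subsets of the columns of X.

    Stand-in weights exp(-BIC/2); NOT the generalized fiducial model density.
    Returns a dict mapping each subset (tuple of column indices) to a probability.
    """
    n, p = X.shape
    models, log_w = [], []
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            cols = list(subset)
            # Design matrix: intercept plus the selected predictors.
            Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = float(np.sum((y - Xs @ beta) ** 2))
            bic = n * np.log(rss / n) + Xs.shape[1] * np.log(n)
            models.append(tuple(cols))
            log_w.append(-0.5 * bic)
    # Normalise on the log scale for numerical stability.
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())
    return dict(zip(models, w / w.sum()))
```

Applied to simulated data in which only the first and third predictors matter, the most probable subset should contain both of them, while the remaining probability mass indicates how uncertain the model choice is.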