Search | arXiv e-print repository

A Calibrated Sensitivity Analysis for Weighted Causal Decompositions

Authors: Andy Shen, Elina Visoki, Ran Barzilay, Samuel D. Pimentel

Abstract: Disparities in health or well-being experienced by minority groups can be difficult to study using the traditional exposure-outcome paradigm in causal inference, since potential outcomes in variables such as race or sexual minority status are challenging to interpret. Causal decomposition analysis addresses this gap by positing causal effects on disparities under interventions to other, intervenable exposures that may play a mediating role in the disparity. While invoking weaker assumptions than causal mediation approaches, decomposition analyses are often conducted in observational settings and require uncheckable assumptions that eliminate unmeasured confounders. Leveraging the marginal sensitivity model, we develop a sensitivity analysis for weighted causal decomposition estimators and use the percentile bootstrap to construct valid confidence intervals for causal effects on disparities. We also propose a two-parameter amplification that enhances interpretability and facilitates an intuitive understanding of the plausibility of unmeasured confounders and their effects. We illustrate our framework on a study examining the effect of parental acceptance on disparities in suicidal ideation among sexual minority youth. We find that the effect is small and sensitive to unmeasured confounding, suggesting that further screening studies are needed to identify mitigating interventions in this vulnerable population.

Submitted 28 June, 2024; originally announced July 2024.
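The weighting-based sensitivity analysis summarized above can be illustrated on a single weighted mean. The sketch below is not the authors' decomposition estimator: it assumes a Hajek-type inverse-propensity-weighted mean with estimated propensities `e` held fixed, and the function names (`msm_weight_bounds`, `extremal_hajek_mean`, `percentile_bootstrap_bounds`) are hypothetical. Under a marginal sensitivity model with parameter Λ ≥ 1, the true odds of treatment may differ from the estimated odds by at most a factor of Λ, which confines each weight 1/π to an interval; the extreme weighted means over those intervals are found by scanning threshold assignments, and a percentile bootstrap turns the bounds into an interval.

```python
import numpy as np


def msm_weight_bounds(e, lam):
    """Interval for each inverse-propensity weight 1/pi when the true odds of
    treatment differ from the estimated odds e/(1-e) by at most a factor lam."""
    inv_odds = 1.0 / e - 1.0  # (1 - e) / e, estimated odds of not being treated
    return 1.0 + inv_odds / lam, 1.0 + inv_odds * lam


def extremal_hajek_mean(y, lo, hi, upper=True):
    """Largest (upper=True) or smallest weighted mean sum(w*y)/sum(w) over
    lo <= w <= hi. The optimum gives the high weights to the most extreme
    outcomes in the target direction, so scanning every cut point suffices."""
    order = np.argsort(-y) if upper else np.argsort(y)
    y, lo, hi = y[order], lo[order], hi[order]
    zero = np.zeros(1)
    # Prefix sums: cut k assigns hi to the first k sorted units, lo to the rest
    hy = np.concatenate([zero, np.cumsum(hi * y)])
    ly = np.concatenate([zero, np.cumsum(lo * y)])
    hw = np.concatenate([zero, np.cumsum(hi)])
    lw = np.concatenate([zero, np.cumsum(lo)])
    means = (hy + ly[-1] - ly) / (hw + lw[-1] - lw)
    return means.max() if upper else means.min()


def percentile_bootstrap_bounds(y, e, lam, n_boot=1000, alpha=0.05, seed=0):
    """Resample units, recompute both extremal means, and return the alpha/2
    quantile of the minima and the 1 - alpha/2 quantile of the maxima."""
    rng = np.random.default_rng(seed)
    n = len(y)
    mins, maxs = np.empty(n_boot), np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        lo, hi = msm_weight_bounds(e[idx], lam)
        mins[b] = extremal_hajek_mean(y[idx], lo, hi, upper=False)
        maxs[b] = extremal_hajek_mean(y[idx], lo, hi, upper=True)
    return np.quantile(mins, alpha / 2), np.quantile(maxs, 1 - alpha / 2)
```

With `lam=1` the weight intervals collapse to points, so both extremal means equal the ordinary Hajek estimate; because the intervals are nested in `lam`, the bounds can only widen as `lam` grows.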

arXiv:2402.05330

Classification under Nuisance Parameters and Generalized Label Shift in Likelihood-Free Inference

Authors: Luca Masserano, Alex Shen, Michele Doro, Tommaso Dorigo, Rafael Izbicki, Ann B. Lee

Abstract: An open scientific challenge is how to classify events with reliable measures of uncertainty, when we have a mechanistic model of the data-generating process but the distribution over both labels and latent nuisance parameters is different between train and target data. We refer to this type of distributional shift as generalized label shift (GLS). Direct classification using observed data $\mathbf{X}$ as covariates leads to biased predictions and invalid uncertainty estimates of labels $Y$. We overcome these biases by proposing a new method for robust uncertainty quantification that casts classification as a hypothesis testing problem under nuisance parameters. The key idea is to estimate the classifier's receiver operating characteristic (ROC) across the entire nuisance parameter space, which allows us to devise cutoffs that are invariant under GLS. Our method effectively endows a pre-trained classifier with domain adaptation capabilities and returns valid prediction sets while maintaining high power. We demonstrate its performance on two challenging scientific problems in biology and astroparticle physics with data from realistic mechanistic models.

Submitted 1 July, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: 26 pages, 19 figures, code available at https://github.com/lee-group-cmu/lf2i
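One way to read "cutoffs that are invariant under GLS" is as a threshold whose false-positive rate stays below a level α whatever the value of the nuisance parameter. The sketch below is a simplified, brute-force stand-in for that idea and not the method of the paper, which estimates the ROC across the nuisance space rather than simulating it pointwise: it assumes we can simulate classifier scores for the null class at any nuisance value on a finite grid, takes the (1 - α) quantile at each grid point, and keeps the largest (least favorable) one. The simulator interface and the name `least_favorable_cutoff` are hypothetical.

```python
import numpy as np


def least_favorable_cutoff(simulate_null_scores, nuisance_grid, alpha=0.05,
                           n_sim=20_000, seed=0):
    """Smallest cutoff c (up to Monte Carlo error) such that
    P(score > c | null class, nu) <= alpha for every nu on the grid.

    simulate_null_scores(nu, n, rng) must return n classifier scores for
    null-class events generated with nuisance value nu (hypothetical simulator).
    """
    rng = np.random.default_rng(seed)
    per_nu = [np.quantile(simulate_null_scores(nu, n_sim, rng), 1 - alpha)
              for nu in nuisance_grid]
    # The worst-case (largest) quantile bounds the false-positive rate for every
    # grid value, hence for any mixture of them in the target data
    return max(per_nu)


if __name__ == "__main__":
    # Toy null model: the score distribution shifts with the nuisance value
    def toy_null(nu, n, rng):
        return rng.normal(loc=nu, scale=1.0, size=n)

    c = least_favorable_cutoff(toy_null, nuisance_grid=[0.0, 1.0, 2.0])
    print(round(c, 2))  # close to 2 + 1.645 under this toy model
```

Taking the maximum trades power for validity: the cutoff is set by the hardest nuisance value, which is the price of a guarantee that does not depend on how the nuisance parameter is distributed.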

arXiv:2306.01911

Generalized Bayesian MARS: Tools for Emulating Stochastic Computer Models

Authors: Kellin Rumsey, Devin Francom, Andy Shen

Abstract: The multivariate adaptive regression spline (MARS) approach of Friedman (1991) and its Bayesian counterpart (Francom et al. 2018) are effective approaches for the emulation of computer models. The traditional assumption of Gaussian errors limits the usefulness of MARS, and many popular alternatives, when dealing with stochastic computer models. We propose a generalized Bayesian MARS (GBMARS) framework which admits the broad class of generalized hyperbolic distributions as the induced likelihood function. This allows us to develop tools for the emulation of stochastic simulators which are parsimonious, scalable, interpretable and require minimal tuning, while providing powerful predictive and uncertainty quantification capabilities. GBMARS is capable of robust regression with t distributions, quantile regression with asymmetric Laplace distributions and a general form of "Normal-Wald" regression in which the shape of the error distribution and the structure of the mean function are learned simultaneously. We demonstrate the effectiveness of GBMARS on various stochastic computer models and we show that it compares favorably to several popular alternatives.

Submitted 2 June, 2023; originally announced June 2023.
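The link between asymmetric Laplace errors and quantile regression mentioned above can be checked numerically: the asymmetric Laplace negative log-likelihood is, up to a constant, the check (pinball) loss, so maximizing it over a location parameter picks out a sample quantile. The sketch below fits only a constant location by searching over the observed values, not a MARS basis expansion, and the function names are illustrative.

```python
import numpy as np


def check_loss(u, tau):
    """Pinball loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))


def ald_neg_log_lik(y, mu, tau, sigma=1.0):
    """Negative log-likelihood of an asymmetric Laplace distribution with
    location mu, scale sigma and skewness tau, whose density is
    tau * (1 - tau) / sigma * exp(-rho_tau((y - mu) / sigma))."""
    u = (y - mu) / sigma
    return -np.sum(np.log(tau * (1 - tau) / sigma) - check_loss(u, tau))


def ald_location_mle(y, tau, sigma=1.0):
    """Maximize the likelihood over mu. The objective is piecewise linear and
    convex in mu with kinks at the data, so searching the data points suffices."""
    candidates = np.sort(y)
    nll = [ald_neg_log_lik(y, mu, tau, sigma) for mu in candidates]
    return candidates[int(np.argmin(nll))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.standard_normal(1001)
    # With n = 1001 and tau = 0.9 the MLE is the 901st order statistic,
    # so the two printed numbers coincide
    print(ald_location_mle(y, tau=0.9), np.sort(y)[900])
```

Replacing the constant location with a flexible mean function is what turns this into quantile regression; the likelihood itself is unchanged.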

arXiv:1905.07790

Correlation Coefficients and Semantic Textual Similarity

Authors: Vitalii Zhelezniak, Aleksandar Savkov, April Shen, Nils Y. Hammerla

Abstract: A large body of research into semantic textual similarity has focused on constructing state-of-the-art embeddings using sophisticated modelling, careful choice of learning signals and many clever tricks. By contrast, little attention has been devoted to similarity measures between these embeddings, with cosine similarity being used unquestioningly in the majority of cases. In this work, we illustrate that for all common word vectors, cosine similarity is essentially equivalent to the Pearson correlation coefficient, which provides some justification for its use. We thoroughly characterise cases where Pearson correlation (and thus cosine similarity) is unfit as a similarity measure. Importantly, we show that Pearson correlation is appropriate for some word vectors but not others. When it is not appropriate, we illustrate how common non-parametric rank correlation coefficients can be used instead to significantly improve performance. We support our analysis with a series of evaluations on word-level and sentence-level semantic textual similarity benchmarks. On the latter, we show that even the simplest averaged word vectors compared by rank correlation easily rival the strongest deep representations compared by cosine similarity.

Submitted 19 May, 2019; originally announced May 2019.

Comments: Accepted as a long paper at NAACL-HLT 2019
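The relationship between cosine similarity and Pearson correlation described in this abstract can be made concrete: Pearson correlation between two vectors, computed across their dimensions, is the cosine similarity of the mean-centered vectors, so the two agree when each vector's coordinates average close to zero. Spearman's rank correlation is Pearson correlation applied to ranks. The minimal sketch below uses plain NumPy; the function names are illustrative and ties are not handled.

```python
import numpy as np


def cosine(x, y):
    """Cosine similarity between two vectors."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


def pearson(x, y):
    """Pearson correlation across dimensions: cosine of the centered vectors."""
    return cosine(x - x.mean(), y - y.mean())


def ranks(x):
    """Ranks 0..d-1 of the coordinates (assumes no ties)."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=300)
    b = 0.5 * a + rng.normal(size=300)
    # Near-zero coordinate means: cosine and Pearson nearly agree
    print(cosine(a, b), pearson(a, b))
    # A shared offset changes cosine but leaves Pearson unchanged
    print(cosine(a + 3, b + 3), pearson(a + 3, b + 3))
```

Because Spearman's coefficient depends only on the ordering of coordinates, it is unaffected by any strictly increasing transformation of either vector, which is one reason rank correlations can be more robust when coordinate distributions are skewed or heavy-tailed.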

arXiv:1204.0307

Elections and statistics: the case of "United Russia", 2009-2020

Authors: Alexander Shen

Abstract: This survey collects statistics on elections in Russia that have been published in various places and are available online. The data are discussed from the viewpoint of statistical model selection. The current version has been updated to include materials up to the July 2020 vote on constitutional changes, the 2020 Belarus elections, and papers that appeared in 2020; most of the data are not consistent with the assumption of fair elections.

Submitted 7 September, 2020; v1 submitted 1 April, 2012; originally announced April 2012.

Comments: in Russian

MSC Class: 91F10

arXiv:0912.4269

doi 10.1214/10-STS347

Test Martingales, Bayes Factors and $p$-Values

Authors: Glenn Shafer, Alexander Shen, Nikolai Vereshchagin, Vladimir Vovk

Abstract: A nonnegative martingale with initial value equal to one measures evidence against a probabilistic hypothesis. The inverse of its value at some stopping time can be interpreted as a Bayes factor. If we exaggerate the evidence by considering the largest value attained so far by such a martingale, the exaggeration will be limited, and there are systematic ways to eliminate it. The inverse of the exaggerated value at some stopping time can be interpreted as a $p$-value. We give a simple characterization of all increasing functions that eliminate the exaggeration.

Submitted 16 June, 2011; v1 submitted 21 December, 2009; originally announced December 2009.

Comments: Published at http://dx.doi.org/10.1214/10-STS347 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-STS-STS347

Journal ref: Statistical Science 2011, Vol. 26, No. 1, 84-101
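The mechanism in this abstract can be illustrated with the simplest test martingale, a running product of likelihood ratios. Under the null hypothesis the product is a nonnegative martingale starting at one, and by Ville's inequality the probability that it ever reaches 1/α is at most α, which is why the inverse of its running maximum can serve as a p-value. The sketch below assumes Bernoulli data with null success probability `p0` and a fixed alternative `p1`; the names are illustrative, and it does not implement the paper's characterization of the functions that remove the exaggeration.

```python
import numpy as np


def likelihood_ratio_martingale(x, p0=0.5, p1=0.7):
    """Running product of Bernoulli likelihood ratios: p1/p0 for a success and
    (1-p1)/(1-p0) for a failure. Under H0: P(x=1) = p0 this is a nonnegative
    martingale with initial value one. Operates along the last axis."""
    x = np.asarray(x)
    lr = np.where(x == 1, p1 / p0, (1 - p1) / (1 - p0))
    return np.cumprod(lr, axis=-1)


def running_max_p_values(martingale):
    """Inverse of the running maximum, capped at one (equivalently, the maximum
    includes the initial value one). By Ville's inequality,
    P(sup_t X_t >= 1/alpha) <= alpha under H0."""
    running_max = np.maximum.accumulate(martingale, axis=-1)
    return np.minimum(1.0, 1.0 / running_max)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 2000 sequences of 200 fair coin flips, so H0 holds
    flips = rng.integers(0, 2, size=(2000, 200))
    p = running_max_p_values(likelihood_ratio_martingale(flips))
    # Fraction of sequences whose p-value ever drops to 0.05; should be at most about 0.05
    print(np.mean(p.min(axis=-1) <= 0.05))
```

The reciprocal of the martingale's current value plays the role of the Bayes factor in the abstract; using the running maximum instead is the "exaggeration", and the simulation shows that, taken as a p-value, it still keeps the false-rejection rate near or below the nominal level.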

Showing 1–6 of 6 results for author: Shen, A