-
Reducing False Discoveries in Statistically-Significant Regional-Colocation Mining: A Summary of Results
Authors:
Subhankar Ghosh,
Jayant Gupta,
Arun Sharma,
Shuai An,
Shashi Shekhar
Abstract:
Given a set \emph{S} of spatial feature types, its feature instances, a study area, and a neighbor relationship, the goal is to find pairs $<$a region ($r_{g}$), a subset \emph{C} of \emph{S}$>$ such that \emph{C} is a statistically significant regional-colocation pattern in $r_{g}$. This problem is important for applications in various domains including ecology, economics, and sociology. The problem is computationally challenging due to the exponential number of regional colocation patterns and candidate regions. Previously, we proposed a miner \cite{10.1145/3557989.3566158} that finds statistically significant regional colocation patterns. However, the numerous simultaneous statistical inferences raise the risk of false discoveries (also known as the multiple comparisons problem) and carry a high computational cost. We propose a novel algorithm, namely, multiple comparisons regional colocation miner (MultComp-RCM) which uses a Bonferroni correction. Theoretical analysis, experimental evaluation, and case study results show that the proposed method reduces both the false discovery rate and computational cost.
Submitted 1 July, 2024;
originally announced July 2024.
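The abstract above names a Bonferroni correction as the remedy for the multiple comparisons problem. The following is a minimal sketch of that correction in general, not the MultComp-RCM algorithm itself: the function name `bonferroni_reject` and the example p-values are hypothetical, chosen only to illustrate how dividing the significance level by the number of tests tightens the rejection threshold.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one boolean per test: True if the null is rejected
    after a Bonferroni correction at family-wise level alpha."""
    m = len(p_values)
    if m == 0:
        return []
    # Each individual test is held to alpha / m instead of alpha.
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical p-values for four candidate (region, pattern) pairs.
p_values = [0.001, 0.012, 0.03, 0.2]

# Without correction, every p-value at or below 0.05 counts as significant.
uncorrected = [p <= 0.05 for p in p_values]

# With correction, the per-test threshold becomes 0.05 / 4 = 0.0125.
corrected = bonferroni_reject(p_values, alpha=0.05)

print("uncorrected:", uncorrected)
print("corrected:  ", corrected)
```

In this illustration the third candidate passes the uncorrected test but fails the corrected one, which is the kind of discovery a stricter threshold is meant to filter out.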
-
Informality, Education-Occupation Mismatch, and Wages: Evidence from India
Authors:
Shweta Bahl,
Ajay Sharma
Abstract:
This article examines the intertwined relationship between informality and education-occupation mismatch (EOM) and the consequent impact on wages. In particular, we discuss two issues: first, the relative importance of informality and EOM in determining wages, and second, the relevance of EOM for formal and informal workers. The analysis reveals that although both informality and EOM are significant determinants of wages, the former is more crucial for a developing country like India. Further, we find that EOM is one of the crucial determinants of wages for formal workers, but it is not critical for informal workers. The study highlights the need for considering the bifurcation of formal-informal workers to understand the complete dynamics of EOM, especially for developing countries where informality is predominant.
Submitted 25 February, 2023;
originally announced February 2023.
-
Strategic Environmental Corporate Social Responsibility (ECSR) Certification and Endogenous Market Structure
Authors:
Ajay Sharma,
Siddhartha Rastogi
Abstract:
This paper extends the findings of Liu et al. (2015, Strategic environmental corporate social responsibility in a differentiated duopoly market, Economics Letters) along two dimensions. First, we consider the case of endogenous market structure a la Vives and Singh (1984, Price and quantity competition in a differentiated duopoly, The Rand Journal of Economics). Second, we refine the ECSR certification standards in differentiated duopoly with rankings. We find that the optimal ECSR certification standards set by the NGO are highest in Bertrand competition, followed by mixed markets, and lowest in Cournot competition. Next, the NGO certifier will set the ECSR standards below the optimal level. Also, we show that given the ECSR certification standards, there is a possibility of both price and quantity contract choices by the firms in endogenous market structure.
Submitted 9 January, 2023;
originally announced January 2023.
-
Long Story Short: Omitted Variable Bias in Causal Machine Learning
Authors:
Victor Chernozhukov,
Carlos Cinelli,
Whitney Newey,
Amit Sharma,
Vasilis Syrgkanis
Abstract:
We develop a general theory of omitted variable bias for a wide range of common causal parameters, including (but not limited to) averages of potential outcomes, average treatment effects, average causal derivatives, and policy effects from covariate shifts. Our theory applies to nonparametric models, while naturally allowing for (semi-)parametric restrictions (such as partial linearity) when such assumptions are made. We show how simple plausibility judgments on the maximum explanatory power of omitted variables are sufficient to bound the magnitude of the bias, thus facilitating sensitivity analysis in otherwise complex, nonlinear models. Finally, we provide flexible and efficient statistical inference methods for the bounds, which can leverage modern machine learning algorithms for estimation. These results allow empirical researchers to perform sensitivity analyses in a flexible class of machine-learned causal models using very simple, and interpretable, tools. We demonstrate the utility of our approach with two empirical examples.
Submitted 26 May, 2024; v1 submitted 26 December, 2021;
originally announced December 2021.
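For orientation, the textbook linear special case of omitted variable bias can be checked numerically. This sketch is not the nonparametric theory or the bounds described in the abstract above; it only illustrates the classical identity that the paper generalizes. It assumes a linear outcome `y = tau * d + gamma * u` with no noise, and all names and data values are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def ols_slope(y, x):
    """Slope of a simple least-squares regression of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)


# Hypothetical structural coefficients: tau is the effect of interest,
# gamma is the effect of the omitted variable u on the outcome.
tau, gamma = 2.0, 3.0

d = [0.0, 1.0, 2.0, 3.0, 4.0]   # treatment
u = [1.0, 0.0, 2.0, 1.0, 3.0]   # omitted variable, correlated with d
y = [tau * di + gamma * ui for di, ui in zip(d, u)]

# "Short" regression: y on d alone, leaving u out.
short_coef = ols_slope(y, d)

# Association between the omitted variable and the treatment.
delta = ols_slope(u, d)

# Classical identity in the linear case: short_coef - tau == gamma * delta.
bias = short_coef - tau
print("short coefficient:", short_coef)
print("bias:", bias, "gamma * delta:", gamma * delta)
```

The bias vanishes when either `gamma` or `delta` is zero, which is why limits on the explanatory power of the omitted variable translate into limits on the bias.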
-
DoWhy: An End-to-End Library for Causal Inference
Authors:
Amit Sharma,
Emre Kiciman
Abstract:
In addition to efficient statistical estimators of a treatment's effect, successful application of causal inference requires specifying assumptions about the mechanisms underlying observed data and testing whether they are valid, and to what extent. However, most libraries for causal inference focus only on the task of providing powerful statistical estimators. We describe DoWhy, an open-source Python library that is built with causal assumptions as its first-class citizens, based on the formal framework of causal graphs to specify and test causal assumptions. DoWhy presents an API for the four steps common to any causal analysis---1) modeling the data using a causal graph and structural assumptions, 2) identifying whether the desired effect is estimable under the causal model, 3) estimating the effect using statistical estimators, and finally 4) refuting the obtained estimate through robustness checks and sensitivity analyses. In particular, DoWhy implements a number of robustness checks including placebo tests, bootstrap tests, and tests for unobserved confounding. DoWhy is an extensible library that supports interoperability with other implementations, such as EconML and CausalML for the estimation step. The library is available at https://github.com/microsoft/dowhy
Submitted 9 November, 2020;
originally announced November 2020.
-
Necessary and Probably Sufficient Test for Finding Valid Instrumental Variables
Authors:
Amit Sharma
Abstract:
Can instrumental variables be found from data? While instrumental variable (IV) methods are widely used to identify causal effect, testing their validity from observed data remains a challenge. This is because validity of an IV depends on two assumptions, exclusion and as-if-random, that are largely believed to be untestable from data. In this paper, we show that under certain conditions, testing for instrumental variables is possible. We build upon prior work on necessary tests to derive a test that characterizes the odds of being a valid instrument, thus yielding the name "necessary and probably sufficient". The test works by defining the class of invalid-IV and valid-IV causal models as Bayesian generative models and comparing their marginal likelihood based on observed data. When all variables are discrete, we also provide a method to efficiently compute these marginal likelihoods.
We evaluate the test on an extensive set of simulations for binary data, inspired by an open problem for IV testing proposed in past work. We find that the test is most powerful when an instrument follows monotonicity---effect on treatment is either non-decreasing or non-increasing---and has moderate-to-weak strength; incidentally, such instruments are commonly used in observational studies. Among as-if-random and exclusion, it detects exclusion violations with higher power. Applying the test to IVs from two seminal studies on instrumental variables and five recent studies from the American Economic Review shows that many of the instruments may be flawed, at least when all variables are discretized. The proposed test opens the possibility of data-driven validation and search for instrumental variables.
Submitted 4 December, 2018;
originally announced December 2018.