-
Estimating Causal Effects of Discrete and Continuous Treatments with Binary Instruments
Authors:
Victor Chernozhukov,
Iván Fernández-Val,
Sukjin Han,
Kaspar Wüthrich
Abstract:
We propose an instrumental variable framework for identifying and estimating average and quantile effects of discrete and continuous treatments with binary instruments. The basis of our approach is a local copula representation of the joint distribution of the potential outcomes and unobservables determining treatment assignment. This representation allows us to introduce an identifying assumption, so-called copula invariance, that restricts the local dependence of the copula with respect to the treatment propensity. We show that copula invariance identifies treatment effects for the entire population and other subpopulations such as the treated. The identification results are constructive and lead to straightforward semiparametric estimation procedures based on distribution regression. An application to the effect of sleep on well-being uncovers interesting patterns of heterogeneity.
Submitted 9 March, 2024;
originally announced March 2024.
-
Arellano-Bond LASSO Estimator for Dynamic Linear Panel Models
Authors:
Victor Chernozhukov,
Iván Fernández-Val,
Chen Huang,
Weining Wang
Abstract:
The Arellano-Bond estimator is a fundamental method for dynamic panel data models, widely used in practice. However, the estimator is severely biased when the data's time series dimension $T$ is long due to the large degree of overidentification. We show that weak dependence along the panel's time series dimension naturally implies approximate sparsity of the most informative moment conditions, motivating the following approach to remove the bias: First, apply LASSO to the cross-section data at each time period to construct most informative (and cross-fitted) instruments, using lagged values of suitable covariates. This step relies on approximate sparsity to select the most informative instruments. Second, apply a linear instrumental variable estimator after first differencing the dynamic structural equation using the constructed instruments. Under weak time series dependence, we show the new estimator is consistent and asymptotically normal under much weaker conditions on $T$'s growth than the Arellano-Bond estimator. Our theory covers models with high dimensional covariates, including multiple lags of the dependent variable, common in modern applications. We illustrate our approach by applying it to weekly county-level panel data from the United States to study opening K-12 schools and other mitigation policies' short and long-term effects on COVID-19's spread.
Submitted 29 April, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Marital Sorting, Household Inequality and Selection
Authors:
Iván Fernández-Val,
Aico van Vuuren,
Francis Vella
Abstract:
Using CPS data for 1976 to 2022, we explore how wage inequality has evolved for married couples with both spouses working full time full year, and its impact on household income inequality. We also investigate how marriage sorting patterns have changed over this period. To determine the factors driving income inequality, we estimate a model explaining the joint distribution of wages which accounts for the spouses' employment decisions. We find that income inequality has increased for these households and that increased assortative matching of wages has exacerbated the inequality resulting from individual wage growth. We find that positive sorting partially reflects the correlation across unobservables influencing the wages of both members of the marriage. We decompose the changes in sorting patterns over the 47 years comprising our sample into structural, composition and selection effects and find that the increase in positive sorting primarily reflects the increased skill premia for both observed and unobserved characteristics.
Submitted 11 October, 2023;
originally announced October 2023.
-
Dynamic Heterogeneous Distribution Regression Panel Models, with an Application to Labor Income Processes
Authors:
Iván Fernández-Val,
Wayne Yuan Gao,
Yuan Liao,
Francis Vella
Abstract:
We consider the estimation of a dynamic distribution regression panel data model with heterogeneous coefficients across units. The objects of primary interest are specific functionals of these coefficients. These include predicted actual and stationary distributions of the outcome variable and quantile treatment effects. Coefficients and their functionals are estimated via fixed effect methods. We investigate how these functionals vary in response to changes in initial conditions or covariate values. We also identify a uniformity issue related to the robustness of inference to the unknown degree of heterogeneity, and propose a cross-sectional bootstrap method for uniformly valid inference on function-valued objects. Employing PSID annual labor income data, we illustrate some important empirical issues we can address. We first quantify the impact of a negative labor income shock on the distribution of future labor income. We also examine the impact on the distribution of labor income from increasing the education level of a chosen group of workers. Finally, we demonstrate the existence of heterogeneity in income mobility, and how this leads to substantial variation in the incidence of individuals being trapped in poverty. We also provide simulation evidence confirming that our procedures work well.
Submitted 14 January, 2023; v1 submitted 8 February, 2022;
originally announced February 2022.
-
Low-Rank Approximations of Nonseparable Panel Models
Authors:
Iván Fernández-Val,
Hugo Freeman,
Martin Weidner
Abstract:
We provide estimation methods for nonseparable panel models based on low-rank factor structure approximations. The factor structures are estimated by matrix-completion methods to deal with the computational challenges of principal component analysis in the presence of missing data. We show that the resulting estimators are consistent in large panels, but suffer from approximation and shrinkage biases. We correct these biases using matching and difference-in-differences approaches. Numerical examples and an empirical application to the effect of election day registration on voter turnout in the U.S. illustrate the properties and usefulness of our methods.
Submitted 3 March, 2021; v1 submitted 23 October, 2020;
originally announced October 2020.
-
Parametric Modeling of Quantile Regression Coefficient Functions with Longitudinal Data
Authors:
Paolo Frumento,
Matteo Bottai,
Iván Fernández-Val
Abstract:
In ordinary quantile regression, quantiles of different order are estimated one at a time. An alternative approach, which is referred to as quantile regression coefficients modeling (QRCM), is to model quantile regression coefficients as parametric functions of the order of the quantile. In this paper, we describe how the QRCM paradigm can be applied to longitudinal data. We introduce a two-level quantile function, in which two different quantile regression models are used to describe the (conditional) distribution of the within-subject response and that of the individual effects. We propose a novel type of penalized fixed-effects estimator, and discuss its advantages over standard methods based on $\ell_1$ and $\ell_2$ penalization. We provide model identifiability conditions, derive asymptotic properties, describe goodness-of-fit measures and model selection criteria, present simulation results, and discuss an application. The proposed method has been implemented in the R package qrcm.
Submitted 29 May, 2020;
originally announced June 2020.
-
Hours Worked and the U.S. Distribution of Real Annual Earnings 1976-2019
Authors:
Iván Fernández-Val,
Franco Peracchi,
Aico van Vuuren,
Francis Vella
Abstract:
We examine the impact of annual hours worked on annual earnings by decomposing changes in the real annual earnings distribution into composition, structural and hours effects. We do so via a nonseparable simultaneous model of hours, wages and earnings. Using the Current Population Survey for the survey years 1976--2019, we find that changes in the female distribution of annual hours of work are important in explaining movements in inequality in female annual earnings. This captures the substantial changes in their employment behavior over this period. Movements in the male hours distribution only affect the lower part of their earnings distribution and reflect the sensitivity of these workers' annual hours of work to cyclical factors.
Submitted 18 November, 2021; v1 submitted 25 February, 2020;
originally announced February 2020.
-
Fast Algorithms for the Quantile Regression Process
Authors:
Victor Chernozhukov,
Iván Fernández-Val,
Blaise Melly
Abstract:
The widespread use of quantile regression methods depends crucially on the existence of fast algorithms. Despite numerous algorithmic improvements, the computation time is still non-negligible because researchers often estimate many quantile regressions and use the bootstrap for inference. We suggest two new fast algorithms for the estimation of a sequence of quantile regressions at many quantile indexes. The first algorithm applies the preprocessing idea of Portnoy and Koenker (1997) but exploits a previously estimated quantile regression to guess the sign of the residuals. This step allows for a reduction of the effective sample size. The second algorithm starts from a previously estimated quantile regression at a similar quantile index and updates it using a single Newton-Raphson iteration. The first algorithm is exact, while the second is only asymptotically equivalent to the traditional quantile regression estimator. We also apply the preprocessing idea to the bootstrap by using the sample estimates to guess the sign of the residuals in the bootstrap sample. Simulations show that our new algorithms provide very large improvements in computation time without significant (if any) cost in the quality of the estimates. For instance, we divide by 100 the time required to estimate 99 quantile regressions with 20 regressors and 50,000 observations.
Submitted 6 April, 2020; v1 submitted 12 September, 2019;
originally announced September 2019.
-
SortedEffects: Sorted Causal Effects in R
Authors:
Shuowen Chen,
Victor Chernozhukov,
Iván Fernández-Val,
Ye Luo
Abstract:
Chernozhukov et al. (2018) proposed the sorted effect method for nonlinear regression models. This method consists of reporting percentiles of the partial effects in addition to the average commonly used to summarize the heterogeneity in the partial effects. They also proposed to use the sorted effects to carry out classification analysis where the observational units are classified as most and least affected if their causal effects are above or below some tail sorted effects. The R package SortedEffects implements the estimation and inference methods therein and provides tools to visualize the results. This vignette serves as an introduction to the package and displays basic functionality of the functions within.
Submitted 6 November, 2019; v1 submitted 2 September, 2019;
originally announced September 2019.
-
Mastering Panel 'Metrics: Causal Impact of Democracy on Growth
Authors:
Shuowen Chen,
Victor Chernozhukov,
Iván Fernández-Val
Abstract:
The relationship between democracy and economic growth is of long-standing interest. We revisit the panel data analysis of this relationship by Acemoglu, Naidu, Restrepo and Robinson (forthcoming) using state of the art econometric methods. We argue that this and many other panel data settings in economics are in fact high-dimensional, which leaves the principal estimators -- the fixed effects (FE) and Arellano-Bond (AB) estimators -- biased to a degree that invalidates statistical inference. We can, however, remove these biases by using simple analytical and sample-splitting methods, and thereby restore valid statistical inference. We find that the debiased FE and AB estimators produce substantially higher estimates of the long-run effect of democracy on growth, providing even stronger support for the key hypothesis in Acemoglu, Naidu, Restrepo and Robinson (forthcoming). Given the ubiquitous nature of panel data, we conclude that the use of debiased panel data estimators should substantially improve the quality of empirical inference in economics.
Submitted 12 January, 2019;
originally announced January 2019.
-
Selection and the Distribution of Female Hourly Wages in the U.S
Authors:
Iván Fernández-Val,
Franco Peracchi,
Aico van Vuuren,
Francis Vella
Abstract:
We analyze the role of selection bias in generating the changes in the observed distribution of female hourly wages in the United States using CPS data for the years 1975 to 2020. We account for the selection bias from the employment decision by modeling the distribution of the number of working hours and estimating a nonseparable model of wages. We decompose changes in the wage distribution into composition, structural and selection effects. Composition effects have increased wages at all quantiles while the impact of the structural effects varies by time period and quantile. Changes in the role of selection only appear at the lower quantiles of the wage distribution. The evidence suggests that there is positive selection in the 1970s which diminishes until the later 1990s. This reduces wages at lower quantiles and increases wage inequality. Post 2000 there appears to be an increase in positive sorting which reduces the selection effects on wage inequality.
Submitted 27 January, 2022; v1 submitted 21 December, 2018;
originally announced January 2019.
-
Distribution Regression with Sample Selection, with an Application to Wage Decompositions in the UK
Authors:
Victor Chernozhukov,
Iván Fernández-Val,
Siyi Luo
Abstract:
We develop a distribution regression model under endogenous sample selection. This model is a semi-parametric generalization of the Heckman selection model. It accommodates much richer effects of the covariates on the outcome distribution and patterns of heterogeneity in the selection process, and allows for drastic departures from the Gaussian error structure, while maintaining the same level of tractability as the classical model. The model applies to continuous, discrete and mixed outcomes. We provide identification, estimation, and inference methods, and apply them to obtain a wage decomposition for the UK. Here we decompose the difference between the male and female wage distributions into composition, wage structure, selection structure, and selection sorting effects. After controlling for endogenous employment selection, we still find a substantial gender wage gap -- ranging from 21% to 40% throughout the (latent) offered wage distribution -- that is not explained by composition. We also uncover positive sorting for single men and negative sorting for married women that accounts for a substantive fraction of the gender wage gap at the top of the distribution.
Submitted 18 December, 2023; v1 submitted 28 November, 2018;
originally announced November 2018.
-
Shape-Enforcing Operators for Point and Interval Estimators
Authors:
Xi Chen,
Victor Chernozhukov,
Iván Fernández-Val,
Scott Kostyshak,
Ye Luo
Abstract:
A common problem in econometrics, statistics, and machine learning is to estimate and make inference on functions that satisfy shape restrictions. For example, distribution functions are nondecreasing and range between zero and one, height growth charts are nondecreasing in age, and production functions are nondecreasing and quasi-concave in input quantities. We propose a method to enforce these restrictions ex post on point and interval estimates of the target function by applying functional operators. If an operator satisfies certain properties that we make precise, the shape-enforced point estimates are closer to the target function than the original point estimates and the shape-enforced interval estimates have greater coverage and shorter length than the original interval estimates. We show that these properties hold for six different operators that cover commonly used shape restrictions in practice: range, convexity, monotonicity, monotone convexity, quasi-convexity, and monotone quasi-convexity. We illustrate the results with two empirical applications to the estimation of a height growth chart for infants in India and a production function for chemical firms in China.
Submitted 12 February, 2021; v1 submitted 4 September, 2018;
originally announced September 2018.
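As a rough illustration of enforcing shape restrictions ex post, the sketch below applies a range operator and a monotonicity operator to point and interval estimates on an increasing grid. The function names are hypothetical, and sorting is only one simple way to impose monotonicity; it is an assumption here, not necessarily the operator the paper analyzes.

```python
from typing import List, Tuple


def enforce_range(values: List[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Range operator: clip every value into [lo, hi]."""
    return [min(max(v, lo), hi) for v in values]


def enforce_monotone(values: List[float]) -> List[float]:
    """Monotonicity by sorting the values (assumed operator).

    The values are taken to be estimates on an increasing grid,
    so sorting them yields a nondecreasing function on that grid.
    """
    return sorted(values)


def enforce_band(lower: List[float], upper: List[float]) -> Tuple[List[float], List[float]]:
    """Apply both operators to each end-point function of an interval estimate."""
    return (
        enforce_monotone(enforce_range(lower)),
        enforce_monotone(enforce_range(upper)),
    )


# Hypothetical distribution-function estimates on a 5-point grid.
point = [-0.05, 0.30, 0.25, 0.80, 1.10]
shaped_point = enforce_monotone(enforce_range(point))
shaped_lower, shaped_upper = enforce_band(
    [-0.15, 0.20, 0.15, 0.70, 0.95],
    [0.05, 0.40, 0.35, 0.90, 1.20],
)
```

If the lower end-point lies below the upper end-point at every grid point, clipping and sorting both preserve that ordering, so the shaped band remains a valid interval.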
-
Network and Panel Quantile Effects Via Distribution Regression
Authors:
Victor Chernozhukov,
Iván Fernández-Val,
Martin Weidner
Abstract:
This paper provides a method to construct simultaneous confidence bands for quantile functions and quantile effects in nonlinear network and panel models with unobserved two-way effects, strictly exogenous covariates, and possibly discrete outcome variables. The method is based upon projection of simultaneous confidence bands for distribution functions constructed from fixed effects distribution regression estimators. These fixed effects estimators are debiased to deal with the incidental parameter problem. Under asymptotic sequences where both dimensions of the data set grow at the same rate, the confidence bands for the quantile functions and effects have correct joint coverage in large samples. An empirical application to gravity models of trade illustrates the applicability of the methods to network data.
Submitted 8 June, 2020; v1 submitted 21 March, 2018;
originally announced March 2018.
-
Nonseparable Sample Selection Models with Censored Selection Rules
Authors:
Iván Fernández-Val,
Aico van Vuuren,
Francis Vella
Abstract:
We consider identification and estimation of nonseparable sample selection models with censored selection rules. We employ a control function approach and discuss different objects of interest based on (1) local effects conditional on the control function, and (2) global effects obtained from integration over ranges of values of the control function. We derive the conditions for the identification of these different objects and suggest strategies for estimation. Moreover, we provide the associated asymptotic theory. These strategies are illustrated in an empirical investigation of the determinants of female wages in the United Kingdom.
Submitted 29 September, 2020; v1 submitted 26 January, 2018;
originally announced January 2018.
-
Censored Quantile Instrumental Variable Estimation with Stata
Authors:
Victor Chernozhukov,
Iván Fernández-Val,
Sukjin Han,
Amanda Kowalski
Abstract:
Many applications involve a censored dependent variable and an endogenous independent variable. Chernozhukov et al. (2015) introduced a censored quantile instrumental variable estimator (CQIV) for use in those applications, which has been applied by Kowalski (2016), among others. In this article, we introduce a Stata command, cqiv, that simplifies application of the CQIV estimator in Stata. We summarize the CQIV estimator and algorithm, we describe the use of the cqiv command, and we provide empirical examples.
Submitted 24 September, 2019; v1 submitted 13 January, 2018;
originally announced January 2018.
-
Fisher-Schultz Lecture: Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments, with an Application to Immunization in India
Authors:
Victor Chernozhukov,
Mert Demirer,
Esther Duflo,
Iván Fernández-Val
Abstract:
We propose strategies to estimate and make inference on key features of heterogeneous effects in randomized experiments. These key features include best linear predictors of the effects using machine learning proxies, average effects sorted by impact groups, and average characteristics of most and least impacted units. The approach is valid in high dimensional settings, where the effects are proxied (but not necessarily consistently estimated) by predictive and causal machine learning methods. We post-process these proxies into estimates of the key features. Our approach is generic: it can be used in conjunction with penalized methods, neural networks, random forests, boosted trees, and ensemble methods, both predictive and causal. Estimation and inference are based on repeated data splitting to avoid overfitting and achieve validity. We use quantile aggregation of the results across many potential splits, in particular taking medians of p-values and medians and other quantiles of confidence intervals. We show that quantile aggregation lowers estimation risks over a single split procedure, and establish its principal inferential properties. Finally, our analysis reveals ways to build provably better machine learning proxies through causal learning: we can use the objective functions that we develop to construct the best linear predictors of the effects, to obtain better machine learning proxies in the initial step. We illustrate the use of both inferential tools and causal learners with a randomized field experiment that evaluates a combination of nudges to stimulate demand for immunization in India.
Submitted 23 October, 2023; v1 submitted 13 December, 2017;
originally announced December 2017.
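The quantile aggregation step mentioned in the abstract above (medians of p-values and of confidence interval end-points across data splits) can be sketched as follows. The function name and the example numbers are hypothetical. The abstract does not spell out any adjustment of nominal levels that may accompany the aggregation, so this sketch applies none.

```python
from statistics import median
from typing import List, Tuple


def aggregate_splits(
    p_values: List[float],
    intervals: List[Tuple[float, float]],
) -> Tuple[float, Tuple[float, float]]:
    """Aggregate per-split results by taking medians.

    p_values  : one p-value per data split
    intervals : one (lower, upper) confidence interval per data split
    """
    p_med = median(p_values)
    lower_med = median(lo for lo, _ in intervals)
    upper_med = median(hi for _, hi in intervals)
    return p_med, (lower_med, upper_med)


# Hypothetical results from five random splits of the same sample.
p_med, ci_med = aggregate_splits(
    [0.01, 0.20, 0.04, 0.03, 0.08],
    [(0.1, 0.9), (0.2, 1.1), (-0.1, 0.7), (0.3, 1.0), (0.0, 0.8)],
)
```

Taking the median makes the reported result depend on the typical split, not on one arbitrary split.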
-
Semiparametric Estimation of Structural Functions in Nonseparable Triangular Models
Authors:
Victor Chernozhukov,
Iván Fernández-Val,
Whitney Newey,
Sami Stouli,
Francis Vella
Abstract:
Triangular systems with nonadditively separable unobserved heterogeneity provide a theoretically appealing framework for the modelling of complex structural relationships. However, they are not commonly used in practice due to the need for exogenous variables with large support for identification, the curse of dimensionality in estimation, and the lack of inferential tools. This paper introduces two classes of semiparametric nonseparable triangular models that address these limitations. They are based on distribution and quantile regression modelling of the reduced form conditional distributions of the endogenous variables. We show that average, distribution and quantile structural functions are identified in these systems through a control function approach that does not require a large support condition. We propose a computationally attractive three-stage procedure to estimate the structural functions where the first two stages consist of quantile or distribution regressions. We provide asymptotic theory and uniform inference methods for each stage. In particular, we derive functional central limit theorems and bootstrap functional central limit theorems for the distribution regression estimators of the structural functions. These results establish the validity of the bootstrap for three-stage estimators of structural functions, and lead to simple inference algorithms. We illustrate the implementation and applicability of all our methods with numerical simulations and an empirical application to demand analysis.
Submitted 5 October, 2019; v1 submitted 6 November, 2017;
originally announced November 2017.
-
Fixed Effect Estimation of Large T Panel Data Models
Authors:
Iván Fernández-Val,
Martin Weidner
Abstract:
This article reviews recent advances in fixed effect estimation of panel data models for long panels, where the number of time periods is relatively large. We focus on semiparametric models with unobserved individual and time effects, where the distribution of the outcome variable conditional on covariates and unobserved effects is specified parametrically, while the distribution of the unobserved effects is left unrestricted. Compared to existing reviews on long panels (Arellano and Hahn 2007; a section in Arellano and Bonhomme 2011) we discuss models with both individual and time effects, split-panel Jackknife bias corrections, unbalanced panels, distribution and quantile effects, and other extensions. Understanding and correcting the incidental parameter bias caused by the estimation of many fixed effects is our main focus, and the unifying theme is that the order of this bias is given by the simple formula p/n for all models discussed, with p the number of estimated parameters and n the total sample size.
Submitted 27 March, 2018; v1 submitted 26 September, 2017;
originally announced September 2017.
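
The p/n bias order and the split-panel jackknife mentioned in the abstract above can be illustrated on a textbook case. The sketch below is not taken from the article: it assumes the simple Neyman-Scott model y_it = alpha_i + e_it with individual effects only, where the fixed effect estimator of the error variance has relative bias -1/T (here p/n = N/(NT) = 1/T). The function names `sigma2_fe` and `split_panel_jackknife` are hypothetical, and the half-panel correction shown is one common form of the jackknife, assuming an even number of periods.

```python
import numpy as np

def sigma2_fe(y):
    """Fixed effect MLE of the error variance in y_it = alpha_i + e_it."""
    resid = y - y.mean(axis=1, keepdims=True)  # subtract each unit's estimated effect
    return float(np.mean(resid ** 2))

def split_panel_jackknife(y, estimator):
    """Half-panel jackknife in the time dimension (assumes an even number of periods)."""
    half = y.shape[1] // 2
    full = estimator(y)
    # Average of the estimates computed on the first and second half of the periods.
    halves = 0.5 * (estimator(y[:, :half]) + estimator(y[:, half:]))
    return 2.0 * full - halves

# Simulated panel (hypothetical design): N units, T periods, true error variance 1.
rng = np.random.default_rng(0)
N, T = 2000, 8
alpha = rng.normal(size=(N, 1))
y = alpha + rng.normal(size=(N, T))

naive = sigma2_fe(y)                             # biased towards (T - 1) / T
corrected = split_panel_jackknife(y, sigma2_fe)  # leading 1/T bias term removed
print(naive, corrected)
```

In this particular model the correction removes the bias exactly in expectation, since 2(1 - 1/T) - (1 - 2/T) = 1; in nonlinear models it would only remove the leading term.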
-
Nonseparable Multinomial Choice Models in Cross-Section and Panel Data
Authors:
Victor Chernozhukov,
Iván Fernández-Val,
Whitney Newey
Abstract:
Multinomial choice models are fundamental for empirical modeling of economic choices among discrete alternatives. We analyze identification of binary and multinomial choice models when the choice utilities are nonseparable in observed attributes and multidimensional unobserved heterogeneity with cross-section and panel data. We show that derivatives of choice probabilities with respect to continuous attributes are weighted averages of utility derivatives in cross-section models with exogenous heterogeneity. In the special case of random coefficient models with an independent additive effect, we further characterize that the probability derivative at zero is proportional to the population mean of the coefficients. We extend the identification results to models with endogenous heterogeneity using either a control function or panel data. In time stationary panel models with two periods, we find that differences over time of derivatives of choice probabilities identify utility derivatives "on the diagonal," i.e. when the observed attributes take the same values in the two periods. We also show that time stationarity does not identify structural derivatives "off the diagonal" both in continuous and multinomial choice panel models.
Submitted 9 May, 2018; v1 submitted 26 June, 2017;
originally announced June 2017.
-
Extremal Quantile Regression: An Overview
Authors:
Victor Chernozhukov,
Iván Fernández-Val,
Tetsuya Kaji
Abstract:
Extremal quantile regression, i.e. quantile regression applied to the tails of the conditional distribution, has a growing number of economic and financial applications such as value-at-risk, production frontiers, determinants of low infant birth weights, and auction models. This chapter provides an overview of recent developments in the theory and empirics of extremal quantile regression. The advances in the theory have relied on the use of extreme value approximations to the law of the Koenker and Bassett (1978) quantile regression estimator. Extreme value laws have not only been shown to provide more accurate approximations than Gaussian laws at the tails, but have also served as the basis to develop bias corrected estimators and inference methods using simulation and suitable variations of bootstrap and subsampling. The applicability of these methods is illustrated with two empirical examples on conditional value-at-risk and financial contagion.
Submitted 8 February, 2017; v1 submitted 20 December, 2016;
originally announced December 2016.
-
quantreg.nonpar: An R Package for Performing Nonparametric Series Quantile Regression
Authors:
Michael Lipsitz,
Alexandre Belloni,
Victor Chernozhukov,
Iván Fernández-Val
Abstract:
The R package quantreg.nonpar implements nonparametric quantile regression methods to estimate and make inference on partially linear quantile models. quantreg.nonpar obtains point estimates of the conditional quantile function and its derivatives based on series approximations to the nonparametric part of the model. It also provides pointwise and uniform confidence intervals over a region of covariate values and/or quantile indices for the same functions using analytical and resampling methods. This paper serves as an introduction to the package and displays basic functionality of the functions contained within.
Submitted 26 October, 2016;
originally announced October 2016.
-
Counterfactual: An R Package for Counterfactual Analysis
Authors:
Mingli Chen,
Victor Chernozhukov,
Iván Fernández-Val,
Blaise Melly
Abstract:
The Counterfactual package implements the estimation and inference methods of Chernozhukov, Fernández-Val and Melly (2013) for counterfactual analysis. The counterfactual distributions considered are the result of changing either the marginal distribution of covariates related to the outcome variable of interest, or the conditional distribution of the outcome given the covariates. They can be applied to estimate quantile treatment effects and wage decompositions. This paper serves as an introduction to the package and displays basic functionality of the commands contained within.
Submitted 25 October, 2016;
originally announced October 2016.
-
probitfe and logitfe: Bias corrections for probit and logit models with two-way fixed effects
Authors:
Mario Cruz-Gonzalez,
Ivan Fernandez-Val,
Martin Weidner
Abstract:
We present the Stata commands probitfe and logitfe, which estimate probit and logit panel data models with individual and/or time unobserved effects. Fixed effect panel data methods that estimate the unobserved effects can be severely biased because of the incidental parameter problem (Neyman and Scott, 1948). We tackle this problem by using the analytical and jackknife bias corrections derived in Fernandez-Val and Weidner (2016) for panels where the two dimensions ($N$ and $T$) are moderately large. We illustrate the commands with an empirical application to international trade and a Monte Carlo simulation calibrated to this application.
Submitted 26 February, 2017; v1 submitted 24 October, 2016;
originally announced October 2016.
-
Generic Inference on Quantile and Quantile Effect Functions for Discrete Outcomes
Authors:
Victor Chernozhukov,
Iván Fernández-Val,
Blaise Melly,
Kaspar Wüthrich
Abstract:
Quantile and quantile effect functions are important tools for descriptive and causal analyses due to their natural and intuitive interpretation. Existing inference methods for these functions do not apply to discrete random variables. This paper offers a simple, practical construction of simultaneous confidence bands for quantile and quantile effect functions of possibly discrete random variables. It is based on a natural transformation of simultaneous confidence bands for distribution functions, which are readily available for many problems. The construction is generic and does not depend on the nature of the underlying problem. It works in conjunction with parametric, semiparametric, and nonparametric modeling methods for observed and counterfactual distributions, and does not depend on the sampling scheme. We apply our method to characterize the distributional impact of insurance coverage on health care utilization and obtain the distributional decomposition of the racial test score gap. We find that universal insurance coverage increases the number of doctor visits across the entire distribution, and that the racial test score gap is small at early ages but grows with age due to socioeconomic factors affecting child development, especially at the top of the distribution. These are new, interesting empirical findings that complement previous analyses that focused on mean effects only. In both applications, the outcomes of interest are discrete, rendering existing inference methods invalid for obtaining uniform confidence bands for observed and counterfactual quantile functions and for their difference -- the quantile effect functions.
Submitted 30 August, 2018; v1 submitted 17 August, 2016;
originally announced August 2016.
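
The abstract above describes turning a simultaneous confidence band for a distribution function into a band for the quantile function. A minimal sketch of one way to read that transformation follows, assuming it amounts to generalized (left) inversion of the band end-points on a finite support: if L(y) <= F(y) <= U(y) for all y, then the left-inverse of U lies below the quantile function and the left-inverse of L lies above it. The function names and the numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def left_inverse(support, cdf_values, tau):
    """Smallest support point y with cdf_values(y) >= tau, or +inf if none exists."""
    # cdf_values must be nondecreasing along the sorted support.
    idx = np.searchsorted(cdf_values, tau, side="left")
    return support[idx] if idx < len(support) else np.inf

def quantile_band(support, cdf_lower, cdf_upper, tau):
    """Band for the tau-quantile implied by a band [cdf_lower, cdf_upper] for the CDF."""
    # The upper CDF end-point gives the lower quantile end-point, and vice versa.
    return (left_inverse(support, cdf_upper, tau),
            left_inverse(support, cdf_lower, tau))

# Hypothetical band for the distribution of a count outcome with support {0, 1, 2, 3}.
support = np.array([0.0, 1.0, 2.0, 3.0])
cdf_lower = np.array([0.1, 0.3, 0.6, 0.9])
cdf_upper = np.array([0.3, 0.6, 0.9, 1.0])

band_median = quantile_band(support, cdf_lower, cdf_upper, 0.5)
print(band_median)
```

Because the inversion only ever returns support points (or an unbounded end-point), the resulting band respects the discreteness of the outcome, which is the case the abstract emphasizes.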
-
The Sorted Effects Method: Discovering Heterogeneous Effects Beyond Their Averages
Authors:
Victor Chernozhukov,
Ivan Fernandez-Val,
Ye Luo
Abstract:
The partial (ceteris paribus) effects of interest in nonlinear and interactive linear models are heterogeneous as they can vary dramatically with the underlying observed or unobserved covariates. Despite the apparent importance of heterogeneity, a common practice in modern empirical work is to largely ignore it by reporting average partial effects (or, at best, average effects for some groups). While average effects provide very convenient scalar summaries of typical effects, by definition they fail to reflect the entire variety of the heterogeneous effects. In order to discover these effects much more fully, we propose to estimate and report sorted effects -- a collection of estimated partial effects sorted in increasing order and indexed by percentiles. By construction the sorted effect curves completely represent and help visualize the range of the heterogeneous effects in one plot. They are as convenient and easy to report in practice as the conventional average partial effects. They also serve as a basis for classification analysis, where we divide the observational units into most or least affected groups and summarize their characteristics. We provide a quantification of uncertainty (standard errors and confidence bands) for the estimated sorted effects and related classification analysis, and provide confidence sets for the most and least affected groups. The derived statistical results rely on establishing key, new mathematical results on Hadamard differentiability of a multivariate sorting operator and a related classification operator, which are of independent interest. We apply the sorted effects method and classification analysis to demonstrate several striking patterns in the gender wage gap.
Submitted 25 May, 2018; v1 submitted 17 December, 2015;
originally announced December 2015.
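
As a rough illustration of the sorted effects described above, the sketch below computes the partial effect of one covariate for every observation in a logit model and reports percentiles of those effects next to their average. The coefficient vector, the simulated covariates, and the function names are hypothetical; in practice the coefficients would be estimated, and the paper's inference procedures (confidence bands, classification analysis) are not reproduced here.

```python
import numpy as np

def logit_partial_effects(X, beta, j):
    """Partial effect of covariate j on P(Y = 1 | X) for each row of X in a logit model."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return beta[j] * p * (1.0 - p)

def sorted_effects(effects, percentiles):
    """Partial effects sorted in increasing order and indexed by percentile."""
    return np.quantile(effects, np.asarray(percentiles) / 100.0)

# Simulated covariates and assumed (not estimated) coefficients.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
beta = np.array([-0.5, 1.0, 0.3])

effects = logit_partial_effects(X, beta, j=1)
percentiles = [5, 25, 50, 75, 95]
spe = sorted_effects(effects, percentiles)  # sorted partial effects
ape = effects.mean()                        # conventional average partial effect
print(dict(zip(percentiles, spe.round(3))), round(float(ape), 3))
```

The single number `ape` hides the spread that the percentile curve `spe` makes visible, which is the contrast the abstract draws.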
-
Nonlinear Factor Models for Network and Panel Data
Authors:
Mingli Chen,
Iván Fernández-Val,
Martin Weidner
Abstract:
Factor structures or interactive effects are convenient devices to incorporate latent variables in panel data models. We consider fixed effect estimation of nonlinear panel single-index models with factor structures in the unobservables, which include logit, probit, ordered probit and Poisson specifications. We establish that fixed effect estimators of model parameters and average partial effects have normal distributions when the two dimensions of the panel grow large, but might suffer from incidental parameter bias. We show how models with factor structures can also be applied to capture important features of network data such as reciprocity, degree heterogeneity, homophily in latent variables and clustering. We illustrate this applicability with an empirical application to the estimation of a gravity equation of international trade between countries using a Poisson model with multiple factors.
Submitted 15 October, 2019; v1 submitted 17 December, 2014;
originally announced December 2014.
-
Nonparametric Identification in Panels using Quantiles
Authors:
Victor Chernozhukov,
Ivan Fernandez-Val,
Stefan Hoderlein,
Hajo Holzmann,
Whitney Newey
Abstract:
This paper considers identification and estimation of ceteris paribus effects of continuous regressors in nonseparable panel models with time homogeneity. The effects of interest are derivatives of the average and quantile structural functions of the model. We find that these derivatives are identified with two time periods for "stayers", i.e. for individuals with the same regressor values in two time periods. We show that the identification results carry over to models that allow location and scale time effects. We propose nonparametric series methods and a weighted bootstrap scheme to estimate and make inference on the identified effects. The bootstrap proposed allows uniform inference for function-valued parameters such as quantile effects uniformly over a region of quantile indices and/or regressor values. An empirical application to Engel curve estimation with panel data illustrates the results.
Submitted 5 August, 2014; v1 submitted 14 December, 2013;
originally announced December 2013.
-
Individual and Time Effects in Nonlinear Panel Models with Large N, T
Authors:
Ivan Fernandez-Val,
Martin Weidner
Abstract:
We derive fixed effects estimators of parameters and average partial effects in (possibly dynamic) nonlinear panel data models with individual and time effects. They cover logit, probit, ordered probit, Poisson and Tobit models that are important for many empirical applications in micro and macroeconomics. Our estimators use analytical and jackknife bias corrections to deal with the incidental parameter problem, and are asymptotically unbiased under asymptotic sequences where $N/T$ converges to a constant. We develop inference methods and show that they perform well in numerical examples.
Submitted 18 December, 2018; v1 submitted 27 November, 2013;
originally announced November 2013.
-
Program Evaluation and Causal Inference with High-Dimensional Data
Authors:
Alexandre Belloni,
Victor Chernozhukov,
Ivan Fernández-Val,
Christian Hansen
Abstract:
In this paper, we provide efficient estimators and honest confidence bands for a variety of treatment effects including local average (LATE) and local quantile treatment effects (LQTE) in data-rich environments. We can handle very many control variables, endogenous receipt of treatment, heterogeneous treatment effects, and function-valued outcomes. Our framework covers the special case of exogenous receipt of treatment, either conditional on controls or unconditionally as in randomized control trials. In the latter case, our approach produces efficient estimators and honest bands for (functional) average treatment effects (ATE) and quantile treatment effects (QTE). To make informative inference possible, we assume that key reduced form predictive relationships are approximately sparse. This assumption allows the use of regularization and selection methods to estimate those relations, and we provide methods for post-regularization and post-selection inference that are uniformly valid (honest) across a wide range of models. We show that a key ingredient enabling honest inference is the use of orthogonal or doubly robust moment conditions in estimating certain reduced form functional parameters. We illustrate the use of the proposed methods with an application to estimating the effect of 401(k) eligibility and participation on accumulated assets.
Submitted 5 January, 2018; v1 submitted 11 November, 2013;
originally announced November 2013.
-
Panel Data Models with Nonadditive Unobserved Heterogeneity: Estimation and Inference
Authors:
Ivan Fernandez-Val,
Joonhwah Lee
Abstract:
This paper considers fixed effects estimation and inference in linear and nonlinear panel data models with random coefficients and endogenous regressors. The quantities of interest -- means, variances, and other moments of the random coefficients -- are estimated by cross sectional sample moments of GMM estimators applied separately to the time series of each individual. To deal with the incidental parameter problem introduced by the noise of the within-individual estimators in short panels, we develop bias corrections. These corrections are based on higher-order asymptotic expansions of the GMM estimators and produce improved point and interval estimates in moderately long panels. Under asymptotic sequences where the cross sectional and time series dimensions of the panel pass to infinity at the same rate, the uncorrected estimator has an asymptotic bias of the same order as the asymptotic variance. The bias corrections remove the bias without increasing variance. An empirical example on cigarette demand based on Becker, Grossman and Murphy (1994) shows significant heterogeneity in the price effect across U.S. states.
Submitted 11 October, 2013; v1 submitted 13 June, 2012;
originally announced June 2012.
-
Conditional Quantile Processes based on Series or Many Regressors
Authors:
Alexandre Belloni,
Victor Chernozhukov,
Denis Chetverikov,
Iván Fernández-Val
Abstract:
Quantile regression (QR) is a principal regression method for analyzing the impact of covariates on outcomes. The impact is described by the conditional quantile function and its functionals. In this paper we develop the nonparametric QR-series framework, covering many regressors as a special case, for performing inference on the entire conditional quantile function and its linear functionals. In this framework, we approximate the entire conditional quantile function by a linear combination of series terms with quantile-specific coefficients and estimate the function-valued coefficients from the data. We develop large sample theory for the QR-series coefficient process, namely we obtain uniform strong approximations to the QR-series coefficient process by conditionally pivotal and Gaussian processes. Based on these strong approximations, or couplings, we develop four resampling methods (pivotal, gradient bootstrap, Gaussian, and weighted bootstrap) that can be used for inference on the entire QR-series coefficient function.
We apply these results to obtain estimation and inference methods for linear functionals of the conditional quantile function, such as the conditional quantile function itself, its partial derivatives, average partial derivatives, and conditional average partial derivatives. Specifically, we obtain uniform rates of convergence and show how to use the four resampling methods mentioned above for inference on the functionals. All of the above results are for function-valued parameters, holding uniformly in both the quantile index and the covariate value, and covering the pointwise case as a by-product. We demonstrate the practical utility of these results with an example, where we estimate the price elasticity function and test the Slutsky condition of the individual demand for gasoline, as indexed by the individual unobserved propensity for gasoline consumption.
Submitted 9 August, 2018; v1 submitted 30 May, 2011;
originally announced May 2011.
-
Quantile Regression with Censoring and Endogeneity
Authors:
Victor Chernozhukov,
Ivan Fernandez-Val,
Amanda Kowalski
Abstract:
In this paper, we develop a new censored quantile instrumental variable (CQIV) estimator and describe its properties and computation. The CQIV estimator combines Powell (1986) censored quantile regression (CQR) to deal with censoring, with a control variable approach to incorporate endogenous regressors. The CQIV estimator is obtained in two stages that are non-additive in the unobservables. The first stage estimates a non-additive model with infinite dimensional parameters for the control variable, such as a quantile or distribution regression model. The second stage estimates a non-additive censored quantile regression model for the response variable of interest, including the estimated control variable to deal with endogeneity. For computation, we extend the algorithm for CQR developed by Chernozhukov and Hong (2002) to incorporate the estimation of the control variable. We give generic regularity conditions for asymptotic normality of the CQIV estimator and for the validity of resampling methods to approximate its asymptotic distribution. We verify these conditions for quantile and distribution regression estimation of the control variable. Our analysis covers two-stage (uncensored) quantile regression with non-additive first stage as an important special case. We illustrate the computation and applicability of the CQIV estimator with a Monte-Carlo numerical example and an empirical application on estimation of Engel curves for alcohol.
Submitted 13 March, 2014; v1 submitted 23 April, 2011;
originally announced April 2011.
-
Inference for Extremal Conditional Quantile Models, with an Application to Market and Birthweight Risks
Authors:
Victor Chernozhukov,
Ivan Fernandez-Val
Abstract:
Quantile regression is an increasingly important empirical tool in economics and other sciences for analyzing the impact of a set of regressors on the conditional distribution of an outcome. Extremal quantile regression, or quantile regression applied to the tails, is of interest in many economic and financial applications, such as conditional value-at-risk, production efficiency, and adjustment bands in (S,s) models. In this paper we provide feasible inference tools for extremal conditional quantile models that rely upon extreme value approximations to the distribution of self-normalized quantile regression statistics. The methods are simple to implement and can be of independent interest even in the non-regression case. We illustrate the results with two empirical examples analyzing extreme fluctuations of a stock return and extremely low percentiles of live infants' birthweights in the range between 250 and 1500 grams.
Submitted 26 December, 2009;
originally announced December 2009.
-
Average and Quantile Effects in Nonseparable Panel Models
Authors:
Victor Chernozhukov,
Ivan Fernandez-Val,
Jinyong Hahn,
Whitney Newey
Abstract:
Nonseparable panel models are important in a variety of economic settings, including discrete choice. This paper gives identification and estimation results for nonseparable models under time homogeneity conditions that are like "time is randomly assigned" or "time is an instrument." Partial identification results for average and quantile effects are given for discrete regressors, under static or dynamic conditions, in fully nonparametric and in semiparametric models, with time effects. It is shown that the usual, linear, fixed-effects estimator is not a consistent estimator of the identified average effect, and a consistent estimator is given. A simple estimator of identified quantile treatment effects is given, providing a solution to the important problem of estimating quantile treatment effects from panel data. Bounds for overall effects in static and dynamic models are given. The dynamic bounds provide a partial identification solution to the important problem of estimating the effect of state dependence in the presence of unobserved heterogeneity. The impact of $T$, the number of time periods, is shown by deriving shrinkage rates for the identified set as $T$ grows. We also consider semiparametric, discrete-choice models and find that semiparametric panel bounds can be much tighter than nonparametric bounds. Computationally-convenient methods for semiparametric models are presented. We propose a novel inference method that applies in panel data and other settings and show that it produces uniformly valid confidence regions in large samples. We give empirical illustrations.
Submitted 26 March, 2013; v1 submitted 13 April, 2009;
originally announced April 2009.
-
Inference on Counterfactual Distributions
Authors:
Victor Chernozhukov,
Ivan Fernandez-Val,
Blaise Melly
Abstract:
Counterfactual distributions are important ingredients for policy analysis and decomposition analysis in empirical economics. In this article we develop modeling and inference tools for counterfactual distributions based on regression methods. The counterfactual scenarios that we consider consist of ceteris paribus changes in either the distribution of covariates related to the outcome of interest or the conditional distribution of the outcome given covariates. For either of these scenarios we derive joint functional central limit theorems and bootstrap validity results for regression-based estimators of the status quo and counterfactual outcome distributions. These results allow us to construct simultaneous confidence sets for function-valued effects of the counterfactual changes, including the effects on the entire distribution and quantile functions of the outcome as well as on related functionals. These confidence sets can be used to test functional hypotheses such as no-effect, positive effect, or stochastic dominance. Our theory applies to general counterfactual changes and covers the main regression methods including classical, quantile, duration, and distribution regressions. We illustrate the results with an empirical application to wage decompositions using data for the United States.
As a part of developing the main results, we introduce distribution regression as a comprehensive and flexible tool for modeling and estimating the \textit{entire} conditional distribution. We show that distribution regression encompasses the Cox duration regression and represents a useful alternative to quantile regression. We establish functional central limit theorems and bootstrap validity results for the empirical distribution regression process and various related functionals.
Submitted 18 September, 2013; v1 submitted 6 April, 2009;
originally announced April 2009.
-
Improving Point and Interval Estimates of Monotone Functions by Rearrangement
Authors:
Victor Chernozhukov,
Ivan Fernandez-Val,
Alfred Galichon
Abstract:
Suppose that a target function is monotonic, namely, weakly increasing, and an available original estimate of this target function is not weakly increasing. Rearrangements, univariate and multivariate, transform the original estimate to a monotonic estimate that always lies closer in common metrics to the target function. Furthermore, suppose an original simultaneous confidence interval, which covers the target function with probability at least $1-α$, is defined by upper and lower end-point functions that are not weakly increasing. Then the rearranged confidence interval, defined by the rearranged upper and lower end-point functions, is shorter in length in common norms than the original interval and also covers the target function with probability at least $1-α$. We demonstrate the utility of the improved point and interval estimates with an age-height growth chart example.
Submitted 7 November, 2008; v1 submitted 28 June, 2008;
originally announced June 2008.
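The interval claim in the abstract above can be illustrated with a minimal sketch. It assumes the band is evaluated on an equally spaced grid, so that the univariate increasing rearrangement reduces to sorting each end-point function. The target curve, noise level, band widths, and the names `rearrange_band`, `covers`, and `l2_length` are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def rearrange_band(lower, upper):
    """Sort each end-point function separately (increasing rearrangement on a grid)."""
    return np.sort(np.asarray(lower, dtype=float)), np.sort(np.asarray(upper, dtype=float))


def covers(lower, upper, target):
    """True if the band contains the target at every grid point."""
    return bool(np.all(lower <= target) and np.all(target <= upper))


def l2_length(lower, upper):
    """L2 norm of the band width over the grid."""
    return float(np.sqrt(np.mean((upper - lower) ** 2)))


# Hypothetical setup: an increasing target and a noisy band around it.
grid = np.linspace(0.0, 1.0, 51)
target = np.sqrt(grid)
rng = np.random.default_rng(0)
center = target + rng.uniform(-0.1, 0.1, size=grid.size)  # non-monotone center
half_width = rng.uniform(0.15, 0.3, size=grid.size)       # always exceeds the noise
lower, upper = center - half_width, center + half_width

new_lower, new_upper = rearrange_band(lower, upper)

print("original covers:  ", covers(lower, upper, target))
print("rearranged covers:", covers(new_lower, new_upper, target))
print("L2 length before: ", l2_length(lower, upper))
print("L2 length after:  ", l2_length(new_lower, new_upper))
```

In this setup the original band covers the target by construction, so the sketch only checks that sorting the end-points keeps coverage and does not increase the L2 length; it says nothing about the probabilistic statement in the abstract.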
-
Rearranging Edgeworth-Cornish-Fisher Expansions
Authors:
Victor Chernozhukov,
Ivan Fernandez-Val,
Alfred Galichon
Abstract:
This paper applies a regularization procedure called increasing rearrangement to monotonize Edgeworth and Cornish-Fisher expansions and any other related approximations of distribution and quantile functions of sample statistics. Besides satisfying the logical monotonicity, required of distribution and quantile functions, the procedure often delivers strikingly better approximations to the distribution and quantile functions of the sample mean than the original Edgeworth-Cornish-Fisher expansions.
Submitted 30 May, 2013; v1 submitted 12 August, 2007;
originally announced August 2007.
-
Improving Estimates of Monotone Functions by Rearrangement
Authors:
Victor Chernozhukov,
Ivan Fernandez-Val,
Alfred Galichon
Abstract:
Suppose that a target function is monotonic, namely, weakly increasing, and an original estimate of the target function is available, which is not weakly increasing. Many common estimation methods used in statistics produce such estimates. We show that these estimates can always be improved with no harm using rearrangement techniques: The rearrangement methods, univariate and multivariate, transform the original estimate to a monotonic estimate, and the resulting estimate is closer to the true curve in common metrics than the original estimate. We illustrate the results with a computational example and an empirical example dealing with age-height growth charts.
Submitted 3 November, 2010; v1 submitted 27 April, 2007;
originally announced April 2007.
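As a rough illustration of the univariate case described in the abstract above, the following sketch sorts the values of a non-monotone estimate on an equally spaced grid and compares its L2 distance to the target before and after. The quadratic target, the noise scale, and the helper names `rearrange` and `l2_error` are assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def rearrange(values):
    """Increasing rearrangement of a curve on an equally spaced grid: sort its values."""
    return np.sort(np.asarray(values, dtype=float))


def l2_error(curve, target):
    """Root mean squared distance between a curve and the target on the grid."""
    return float(np.sqrt(np.mean((curve - target) ** 2)))


# Hypothetical weakly increasing target and a noisy, non-monotone estimate of it.
grid = np.linspace(0.0, 1.0, 101)
target = grid ** 2
rng = np.random.default_rng(0)
estimate = target + rng.normal(scale=0.1, size=grid.size)

rearranged = rearrange(estimate)

print("monotone after rearrangement:", bool(np.all(np.diff(rearranged) >= 0)))
print("L2 error before:", l2_error(estimate, target))
print("L2 error after: ", l2_error(rearranged, target))
```

Because the target is increasing on the grid, pairing it with the sorted estimate cannot increase the sum of squared differences (a consequence of the rearrangement inequality), which is the finite-grid analogue of the "no harm" claim.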
-
Quantile and Probability Curves Without Crossing
Authors:
Victor Chernozhukov,
Ivan Fernandez-Val,
Alfred Galichon
Abstract:
This paper proposes a method to address the longstanding problem of lack of monotonicity in estimation of conditional and structural quantile functions, also known as the quantile crossing problem. The method consists in sorting or monotone rearranging the original estimated non-monotone curve into a monotone rearranged curve. We show that the rearranged curve is closer to the true quantile curve in finite samples than the original curve, establish a functional delta method for rearrangement-related operators, and derive functional limit theory for the entire rearranged curve and its functionals. We also establish validity of the bootstrap for estimating the limit law of the entire rearranged curve and its functionals. Our limit results are generic in that they apply to every estimator of a monotone econometric function, provided that the estimator satisfies a functional central limit theorem and the function satisfies some smoothness conditions. Consequently, our results apply to estimation of other econometric functions with monotonicity restrictions, such as demand, production, distribution, and structural distribution functions. We illustrate the results with an application to estimation of structural quantile functions using data on Vietnam veteran status and earnings.
Submitted 14 July, 2014; v1 submitted 27 April, 2007;
originally announced April 2007.