-
Double Robustness of Local Projections and Some Unpleasant VARithmetic
Authors:
José Luis Montiel Olea,
Mikkel Plagborg-Møller,
Eric Qian,
Christian K. Wolf
Abstract:
We consider impulse response inference in a locally misspecified stationary vector autoregression (VAR) model. The conventional local projection (LP) confidence interval has correct coverage even when the misspecification is so large that it can be detected with probability approaching 1. This follows from a "double robustness" property analogous to that of modern estimators for partially linear regressions. In contrast, VAR confidence intervals dramatically undercover even for misspecification so small that it is difficult to detect statistically and cannot be ruled out based on economic theory. This is because of a "no free lunch" result for VARs: the worst-case bias and coverage distortion are small if, and only if, the variance is close to that of LP. While VAR coverage can be restored by using a bias-aware critical value or a large lag length, the resulting confidence interval tends to be at least as wide as the LP interval.
Submitted 15 May, 2024;
originally announced May 2024.
-
Decision Theory for Treatment Choice Problems with Partial Identification
Authors:
José Luis Montiel Olea,
Chen Qiu,
Jörg Stoye
Abstract:
We apply classical statistical decision theory to a large class of treatment choice problems with partial identification, revealing important theoretical and practical challenges but also interesting research opportunities. The challenges are: In a general class of problems with Gaussian likelihood, all decision rules are admissible; it is maximin-welfare optimal to ignore all data; and, for severe enough partial identification, there are infinitely many minimax-regret optimal decision rules, all of which sometimes randomize the policy recommendation. The opportunities are: We introduce a profiled regret criterion that can reveal important differences between rules and render some of them inadmissible; and we uniquely characterize the minimax-regret optimal rule that least frequently randomizes. We apply our results to aggregation of experimental estimates for policy adoption, to extrapolation of Local Average Treatment Effects, and to policy making in the presence of omitted variable bias.
Submitted 29 December, 2023;
originally announced December 2023.
-
The out-of-sample prediction error of the square-root-LASSO and related estimators
Authors:
José Luis Montiel Olea,
Cynthia Rush,
Amilcar Velez,
Johannes Wiesel
Abstract:
We study the classical problem of predicting an outcome variable, $Y$, using a linear combination of a $d$-dimensional covariate vector, $\mathbf{X}$. We are interested in linear predictors whose coefficients solve: \begin{align*} \inf_{\boldsymbol{β} \in \mathbb{R}^d} \left( \mathbb{E}_{\mathbb{P}_n} \left[ \left(Y-\mathbf{X}^{\top}\boldsymbol{β}\right)^r \right] \right)^{1/r} + δ\, ρ\left(\boldsymbol{β}\right), \end{align*} where $δ>0$ is a regularization parameter, $ρ:\mathbb{R}^d\to \mathbb{R}_+$ is a convex penalty function, $\mathbb{P}_n$ is the empirical distribution of the data, and $r\geq 1$. We present three sets of new results. First, we provide conditions under which linear predictors based on these estimators solve a \emph{distributionally robust optimization} problem: they minimize the worst-case prediction error over distributions that are close to each other in a type of \emph{max-sliced Wasserstein metric}. Second, we provide a detailed finite-sample and asymptotic analysis of the statistical properties of the balls of distributions over which the worst-case prediction error is analyzed. Third, we use the distributionally robust optimality and our statistical analysis to present i) an oracle recommendation for the choice of regularization parameter, $δ$, that guarantees good out-of-sample prediction error; and ii) a test statistic to rank the out-of-sample performance of two different linear estimators. None of our results rely on sparsity assumptions about the true data generating process; thus, they broaden the scope of use of the square-root lasso and related estimators in prediction problems.
Submitted 8 April, 2024; v1 submitted 14 November, 2022;
originally announced November 2022.
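As a rough illustration of the penalized objective in the abstract above, the following sketch evaluates it on sample data. It is not the authors' code: the function name `penalized_objective` is hypothetical, the penalty $ρ$ is assumed to be the $\ell_1$ norm (which, with $r = 2$, gives the square-root LASSO objective), and the residual is taken in absolute value so the $r$-th root is well defined for any $r \geq 1$.

```python
import numpy as np


def penalized_objective(beta, X, y, delta, r=2):
    """Evaluate (E_{P_n}[|y - X'beta|^r])^{1/r} + delta * ||beta||_1.

    Assumptions (not from the source): rho is the l1 norm and the
    residual enters in absolute value.
    """
    residuals = y - X @ beta
    # Empirical r-th moment of the residuals, then the r-th root.
    loss = np.mean(np.abs(residuals) ** r) ** (1.0 / r)
    # Convex penalty scaled by the regularization parameter delta.
    return loss + delta * np.sum(np.abs(beta))


# Toy data in which beta = (1, 2) fits y exactly.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
y = np.array([1.0, 2.0, 3.0, 2.0])

value_at_exact_fit = penalized_objective(np.array([1.0, 2.0]), X, y, delta=0.1)
value_at_zero = penalized_objective(np.zeros(2), X, y, delta=0.1)
```

At the exact fit only the penalty term remains, while at $\boldsymbol{β} = 0$ only the root-mean-square of $Y$ remains, which shows the trade-off that $δ$ governs.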
-
On the Robustness to Misspecification of $α$-Posteriors and Their Variational Approximations
Authors:
Marco Avella Medina,
José Luis Montiel Olea,
Cynthia Rush,
Amilcar Velez
Abstract:
$α$-posteriors and their variational approximations distort standard posterior inference by downweighting the likelihood and introducing variational approximation errors. We show that such distortions, if tuned appropriately, reduce the Kullback-Leibler (KL) divergence from the true, but perhaps infeasible, posterior distribution when there is potential parametric model misspecification. To make this point, we derive a Bernstein-von Mises theorem showing convergence in total variation distance of $α$-posteriors and their variational approximations to limiting Gaussian distributions. We use these distributions to evaluate the KL divergence between true and reported posteriors. We show this divergence is minimized by choosing $α$ strictly smaller than one, assuming there is a vanishingly small probability of model misspecification. The optimized value becomes smaller as the misspecification becomes more severe. The optimized KL divergence increases logarithmically in the degree of misspecification and not linearly as with the usual posterior.
Submitted 16 April, 2021;
originally announced April 2021.
-
Machine Learning's Dropout Training is Distributionally Robust Optimal
Authors:
Jose Blanchet,
Yang Kang,
Jose Luis Montiel Olea,
Viet Anh Nguyen,
Xuhui Zhang
Abstract:
This paper shows that dropout training in Generalized Linear Models is the minimax solution of a two-player, zero-sum game where an adversarial nature corrupts a statistician's covariates using a multiplicative nonparametric errors-in-variables model. In this game, nature's least favorable distribution is dropout noise, where nature independently deletes entries of the covariate vector with some fixed probability $δ$. This result implies that dropout training indeed provides out-of-sample expected loss guarantees for distributions that arise from multiplicative perturbations of in-sample data. In addition to the decision-theoretic analysis, the paper makes two more contributions. First, there is a concrete recommendation on how to select the tuning parameter $δ$ to guarantee that, as the sample size grows large, the in-sample loss after dropout training exceeds the true population loss with some pre-specified probability. Second, the paper provides a novel, parallelizable, Unbiased Multi-Level Monte Carlo algorithm to speed up the implementation of dropout training. Our algorithm has a much smaller computational cost than the naive implementation of dropout, provided the number of data points is much smaller than the dimension of the covariate vector.
Submitted 14 April, 2021; v1 submitted 13 September, 2020;
originally announced September 2020.
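The dropout noise described in the abstract above can be sketched as a multiplicative perturbation of the covariates. This is only an illustration under stated assumptions, not the paper's algorithm: the helper name `apply_dropout` is hypothetical, and the rescaling of surviving entries by $1/(1-δ)$ (so that the noise has mean one) is the common convention and is assumed here rather than taken from the abstract.

```python
import numpy as np


def apply_dropout(X, delta, rng):
    """Independently delete each covariate entry with probability delta.

    Surviving entries are rescaled by 1 / (1 - delta), an assumed
    convention that keeps the multiplicative noise at mean one.
    """
    keep = rng.random(X.shape) >= delta  # True with probability 1 - delta
    return X * keep / (1.0 - delta)


rng = np.random.default_rng(0)
X = np.ones((50_000, 4))
X_noisy = apply_dropout(X, delta=0.2, rng=rng)

# Share of deleted entries and the average of the perturbed covariates.
deleted_share = np.mean(X_noisy == 0.0)
noisy_mean = X_noisy.mean()
```

With this convention each entry is either zero or inflated, and the perturbed covariates match the originals on average.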
-
Local Projection Inference is Simpler and More Robust Than You Think
Authors:
José Luis Montiel Olea,
Mikkel Plagborg-Møller
Abstract:
Applied macroeconomists often compute confidence intervals for impulse responses using local projections, i.e., direct linear regressions of future outcomes on current covariates. This paper proves that local projection inference robustly handles two issues that commonly arise in applications: highly persistent data and the estimation of impulse responses at long horizons. We consider local projections that control for lags of the variables in the regression. We show that lag-augmented local projections with normal critical values are asymptotically valid uniformly over (i) both stationary and non-stationary data, and also over (ii) a wide range of response horizons. Moreover, lag augmentation obviates the need to correct standard errors for serial correlation in the regression residuals. Hence, local projection inference is arguably both simpler than previously thought and more robust than standard autoregressive inference, whose validity is known to depend sensitively on the persistence of the data and on the length of the horizon.
Submitted 21 December, 2022; v1 submitted 27 July, 2020;
originally announced July 2020.
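A local projection, as described in the abstract above, is a direct regression of a future outcome on current covariates, here augmented with lags. The following sketch computes only the point estimate for a simulated univariate AR(1) process; it is not the authors' implementation, it omits the standard errors and confidence intervals that the paper is about, and the function name `lag_augmented_lp` and the simulation settings are hypothetical.

```python
import numpy as np


def lag_augmented_lp(y, horizon, n_lags=1):
    """OLS of y[t + horizon] on y[t], n_lags further lags, and an intercept.

    Returns the coefficient on y[t], the local projection estimate of the
    impulse response at the given horizon.
    """
    T = len(y)
    t_idx = np.arange(n_lags, T - horizon)
    # Columns: y[t], y[t-1], ..., y[t-n_lags], then a constant.
    regressors = [y[t_idx - j] for j in range(n_lags + 1)]
    Z = np.column_stack(regressors + [np.ones(len(t_idx))])
    coef, *_ = np.linalg.lstsq(Z, y[t_idx + horizon], rcond=None)
    return coef[0]


# Simulate a stationary AR(1) with autoregressive coefficient 0.5.
rng = np.random.default_rng(0)
rho, T = 0.5, 20_000
shocks = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = rho * y[t - 1] + shocks[t]

# For an AR(1), the impulse response at horizon h is rho ** h.
irf_h2 = lag_augmented_lp(y, horizon=2)
```

In this simulated example the estimate at horizon 2 should lie close to `rho ** 2`, since the extra lag is a control and its population coefficient is zero.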
-
Competing Models
Authors:
Jose Luis Montiel Olea,
Pietro Ortoleva,
Mallesh M Pai,
Andrea Prat
Abstract:
Different agents need to make a prediction. They observe identical data, but have different models: they predict using different explanatory variables. We study which agent believes they have the best predictive ability -- as measured by the smallest subjective posterior mean squared prediction error -- and show how it depends on the sample size. With small samples, we present results suggesting it is an agent using a low-dimensional model. With large samples, it is generally an agent with a high-dimensional model, possibly including irrelevant variables, but never excluding relevant ones. We apply our results to characterize the winning model in an auction of productive assets, to argue that entrepreneurs and investors with simple models will be over-represented in new sectors, and to understand the proliferation of "factors" that explain the cross-sectional variation of expected stock returns in the asset-pricing literature.
Submitted 11 November, 2021; v1 submitted 8 July, 2019;
originally announced July 2019.