-
Wisdom of the Crowds or Ignorance of the Masses? A data-driven guide to WSB
Authors:
Valentina Semenova,
Dragos Gorduza,
William Wildi,
Xiaowen Dong,
Stefan Zohren
Abstract:
A trite yet fundamental question in economics is: What causes large asset price fluctuations? A tenfold rise in the price of GameStop equity, between the 22nd and 28th of January 2021, demonstrated that herding behaviour among retail investors is an important contributing factor. This paper presents a data-driven guide to the forum that started the hype -- WallStreetBets (WSB). Our initial experiments decompose the forum using a large language topic model and network tools. The topic model describes the evolution of the forum over time and shows the persistence of certain topics (such as the market / S&P 500 discussion), and the sporadic interest in others, such as COVID or crude oil. Network analysis allows us to decompose the landscape of retail investors into clusters based on their posting and discussion habits; several large, correlated asset discussion clusters emerge, surrounded by smaller, niche ones. A second set of experiments assesses the impact that WSB discussions have had on the market. We show that forum activity has a Granger-causal relationship with the returns of several assets, some of which are now commonly classified as 'meme stocks', while others have gone under the radar. The paper extracts a set of short-term trade signals from posts and long-term (monthly and weekly) trade signals from forum dynamics, and considers their predictive power at different time horizons. In addition to the analysis, the paper presents the dataset, as well as an interactive dashboard, in order to promote further research.
Submitted 18 August, 2023;
originally announced August 2023.
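Illustrative note (not from the paper): the abstract above refers to a Granger-causal relationship between forum activity and returns. A minimal sketch of the standard one-directional Granger F-test is given here, assuming a simple OLS setup with a fixed number of lags. The function names (`lagged`, `rss`, `granger_f_stat`) and the simulated `activity` and `returns` series are hypothetical; no WSB data is used, and the paper's actual specification may differ.

```python
import numpy as np


def lagged(series, lags):
    """Columns [s_{t-1}, ..., s_{t-lags}] aligned with t = lags, ..., n-1."""
    n = len(series)
    return np.column_stack([series[lags - k : n - k] for k in range(1, lags + 1)])


def rss(design, target):
    """Residual sum of squares from an OLS fit of target on design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def granger_f_stat(cause, effect, lags=1):
    """F-statistic for H0: lags of `cause` do not help predict `effect`."""
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    target = effect[lags:]
    const = np.ones((len(target), 1))
    # Restricted model: effect on its own lags only.
    restricted = np.hstack([const, lagged(effect, lags)])
    # Unrestricted model: additionally includes lags of the candidate cause.
    unrestricted = np.hstack([restricted, lagged(cause, lags)])
    rss_r = rss(restricted, target)
    rss_u = rss(unrestricted, target)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


# Simulated (made-up) series: yesterday's activity drives today's returns.
rng = np.random.default_rng(0)
n = 500
activity = rng.normal(size=n)
returns = np.zeros(n)
returns[1:] = 0.8 * activity[:-1] + 0.5 * rng.normal(size=n - 1)

f_forward = granger_f_stat(activity, returns)  # activity -> returns
f_reverse = granger_f_stat(returns, activity)  # returns -> activity
print(f_forward, f_reverse)
```

In this simulation the forward statistic should be far larger than the reverse one, since only lagged activity carries predictive information. In practice the statistic would be compared against an F distribution with `lags` and `dof` degrees of freedom.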
-
Aggregated Intersection Bounds and Aggregated Minimax Values
Authors:
Vira Semenova
Abstract:
This paper proposes a novel framework of aggregated intersection bounds, where the target parameter is obtained by averaging the minimum (or maximum) of a collection of regression functions over the covariate space. Examples of such quantities include the lower and upper bounds on distributional effects (Fréchet-Hoeffding, Makarov) as well as the optimal welfare in statistical treatment choice problems. The proposed estimator -- the envelope score estimator -- is shown to have an oracle property, where the oracle knows the identity of the minimizer for each covariate value. Next, the result is extended to the aggregated minimax values of a collection of regression functions, covering optimal distributional welfare in worst-case and best-case, respectively. This proposed estimator -- the envelope saddle value estimator -- is shown to have an oracle property, where the oracle knows the identity of the saddle point.
Submitted 11 June, 2024; v1 submitted 2 March, 2023;
originally announced March 2023.
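Illustrative note (not from the paper): in symbols, the target parameter described in the abstract above can be sketched as follows. The notation ($\theta_0$, $\mu_j$, $Y_j$, $X$, $J$, $j^*$) is assumed here for illustration and may not match the paper's.

```latex
\theta_0 = \mathbb{E}\Big[\min_{1 \le j \le J} \mu_j(X)\Big],
\qquad
\mu_j(x) = \mathbb{E}[Y_j \mid X = x],
\qquad
j^*(x) \in \arg\min_{1 \le j \le J} \mu_j(x).
```

Under this notation, the oracle mentioned in the abstract is one that knows $j^*(x)$ for each covariate value $x$, so that $\theta_0 = \mathbb{E}[\mu_{j^*(X)}(X)]$.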
-
Social contagion and asset prices: Reddit's self-organised bull runs
Authors:
Valentina Semenova,
Julian Winkler
Abstract:
Can unstructured text data from social media help explain the drivers of large asset price fluctuations? This paper investigates how social forces affect asset prices, by using machine learning tools to extract beliefs and positions of 'hype' traders active on Reddit's WallStreetBets (WSB) forum. Our stylized model shows that peer effects help explain return predictability and reversals, as well as bubble dynamics. We empirically document that sentiments expressed by WSB users about assets' future performances (bullish or bearish) are in part due to sentiments of their peers and past asset returns. The paper directly estimates the effect of WSB activity on asset prices. We document four findings: retail trader demand follows WSB discussions, as shown with Trade and Quote (TAQ) data; prices are predictable from retail trader discourse; viral content online amplifies the market impact of idiosyncratic investor sentiment; and hype investors are more exposed to bubbles in the markets.
Submitted 8 August, 2023; v1 submitted 5 April, 2021;
originally announced April 2021.
-
Generalized Lee Bounds
Authors:
Vira Semenova
Abstract:
Lee bounds (Lee, 2009) are a common approach to bounding the average causal effect in the presence of selection bias, assuming the treatment effect on selection has the same sign for all subjects. This paper generalizes Lee bounds to allow the sign of this effect to be identified by pretreatment covariates, relaxing the standard (unconditional) monotonicity to its conditional analog. Asymptotic theory for generalized Lee bounds is proposed in low-dimensional smooth and high-dimensional sparse designs. The paper also generalizes Lee bounds to accommodate multiple outcomes. It characterizes the sharp identified set for the causal parameter and proposes uniform Gaussian inference on the support function. The estimated bounds achieve near point-identification in the JobCorps job training program (Lee, 2009), where unconditional monotonicity is unlikely to hold.
Submitted 28 February, 2023; v1 submitted 28 August, 2020;
originally announced August 2020.
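Illustrative note (not from the paper): the abstract above builds on the basic, unconditional Lee (2009) trimming bounds. A minimal sketch of that baseline is given here, assuming treatment can only increase selection (the standard monotonicity assumption). The function name `lee_bounds`, the use of `None` for unobserved outcomes, and the toy numbers are all hypothetical. The covariate-conditional generalization proposed in the paper is not implemented.

```python
from statistics import mean


def lee_bounds(treated, control):
    """Basic trimming bounds on the effect for always-selected units.

    `treated` and `control` are lists of outcomes, with None marking units
    whose outcome is unobserved (not selected). Assumes treatment weakly
    increases selection, so the treated group is the one that gets trimmed.
    """
    y1 = sorted(y for y in treated if y is not None)
    y0 = [y for y in control if y is not None]
    q1 = len(y1) / len(treated)  # selection rate among treated
    q0 = len(y0) / len(control)  # selection rate among controls
    if q1 < q0:
        raise ValueError("sketch assumes a higher selection rate under treatment")
    trim_share = (q1 - q0) / q1  # share of selected treated units to trim
    n_trim = int(round(trim_share * len(y1)))
    keep = len(y1) - n_trim
    lower = mean(y1[:keep]) - mean(y0)    # drop the largest treated outcomes
    upper = mean(y1[n_trim:]) - mean(y0)  # drop the smallest treated outcomes
    return lower, upper


# Toy data (made up): 4 of 5 treated and 2 of 5 control outcomes observed.
treated = [10.0, 12.0, 14.0, 16.0, None]
control = [9.0, 11.0, None, None, None]
print(lee_bounds(treated, control))  # (1.0, 5.0)
```

Here the selection rates are 0.8 and 0.4, so half of the observed treated outcomes are trimmed from either the top or the bottom before comparing means with the control group.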
-
Inference on weighted average value function in high-dimensional state space
Authors:
Victor Chernozhukov,
Whitney Newey,
Vira Semenova
Abstract:
This paper gives a consistent, asymptotically normal estimator of the expected value function when the state space is high-dimensional and the first-stage nuisance functions are estimated by modern machine learning tools. First, we show that the value function is orthogonal to the conditional choice probability; therefore, this nuisance function needs to be estimated only at the $n^{-1/4}$ rate. Second, we give a correction term for the transition density of the state variable. The resulting orthogonal moment is robust to misspecification of the transition density and does not require this nuisance function to be consistently estimated. Third, we generalize this result by considering the weighted expected value. In this case, the orthogonal moment is doubly robust in the transition density and additional second-stage nuisance functions entering the correction term. We complete the asymptotic theory by providing bounds on second-order asymptotic terms.
Submitted 24 August, 2019;
originally announced August 2019.
-
Machine Learning for Dynamic Discrete Choice
Authors:
Vira Semenova
Abstract:
Dynamic discrete choice models often discretize the state vector and restrict its dimension in order to achieve valid inference. I propose a novel two-stage estimator for the set-identified structural parameter that incorporates a high-dimensional state space into the dynamic model of imperfect competition. In the first stage, I estimate the state variable's law of motion and the equilibrium policy function using machine learning tools. In the second stage, I plug the first-stage estimates into a moment inequality and solve for the structural parameter. The moment function is presented as the sum of two components, where the first one expresses the equilibrium assumption and the second one is a bias correction term that makes the sum insensitive (i.e., orthogonal) to first-stage bias. The proposed estimator uniformly converges at the root-N rate and I use it to construct confidence regions. The results developed here can be used to incorporate high-dimensional state space into classic dynamic discrete choice models, for example, those considered in Rust (1987), Bajari et al. (2007), and Scott (2013).
Submitted 5 November, 2018; v1 submitted 7 August, 2018;
originally announced August 2018.
-
Regularized Orthogonal Machine Learning for Nonlinear Semiparametric Models
Authors:
Denis Nekipelov,
Vira Semenova,
Vasilis Syrgkanis
Abstract:
This paper proposes a Lasso-type estimator for a high-dimensional sparse parameter identified by a single index conditional moment restriction (CMR). In addition to this parameter, the moment function can also depend on a nuisance function, such as the propensity score or the conditional choice probability, which we estimate by modern machine learning tools. We first adjust the moment function so that the gradient of the future loss function is insensitive (formally, Neyman-orthogonal) with respect to the first-stage regularization bias, preserving the single index property. We then take the loss function to be an indefinite integral of the adjusted moment function with respect to the single index. The proposed Lasso estimator converges at the oracle rate, where the oracle knows the nuisance function and solves only the parametric problem. We demonstrate our method by estimating the short-term heterogeneous impact of Connecticut's Jobs First welfare reform experiment on women's welfare participation decision.
Submitted 10 September, 2021; v1 submitted 12 June, 2018;
originally announced June 2018.
-
Debiased Machine Learning of Set-Identified Linear Models
Authors:
Vira Semenova
Abstract:
This paper provides estimation and inference methods for an identified set's boundary (i.e., support function) where the selection among a very large number of covariates is based on modern regularized tools. I characterize the boundary using a semiparametric moment equation. Combining Neyman-orthogonality and sample splitting ideas, I construct a root-N consistent, uniformly asymptotically Gaussian estimator of the boundary and propose a multiplier bootstrap procedure to conduct inference. I apply this result to the partially linear model, the partially linear IV model and the average partial derivative with an interval-valued outcome.
Submitted 13 December, 2022; v1 submitted 28 December, 2017;
originally announced December 2017.
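Illustrative note (not from the paper): the abstract above combines Neyman-orthogonality with sample splitting. A minimal sketch of that combination is given here for the simpler, point-identified partially linear model, which stands in for the set-identified setting the paper actually treats. A cubic polynomial fit is used as a placeholder for the regularized first-stage learner. The names `poly_fit_predict` and `cross_fit_plm` and the simulated data are hypothetical.

```python
import numpy as np


def poly_fit_predict(x_train, t_train, x_test, degree=3):
    """Placeholder nuisance learner: polynomial regression of t on x."""
    coefs = np.polyfit(x_train, t_train, degree)
    return np.polyval(coefs, x_test)


def cross_fit_plm(y, d, x, n_folds=2, seed=0):
    """Cross-fitted estimate of theta in y = theta * d + g(x) + noise.

    Nuisance regressions E[y|x] and E[d|x] are fit on the other folds and
    evaluated on the held-out fold; theta is then estimated by regressing
    the y-residuals on the d-residuals (an orthogonal moment).
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        y_res[test_idx] = y[test_idx] - poly_fit_predict(x[train], y[train], x[test_idx])
        d_res[test_idx] = d[test_idx] - poly_fit_predict(x[train], d[train], x[test_idx])
    return float(d_res @ y_res / (d_res @ d_res))


# Simulated (made-up) data with true theta = 1.5.
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-1.0, 1.0, size=n)
d = x**2 + rng.normal(size=n)
y = 1.5 * d + np.sin(2.0 * x) + rng.normal(size=n)

theta_hat = cross_fit_plm(y, d, x)
print(theta_hat)
```

Because both residuals are computed out of fold, errors in the fitted nuisance functions enter the estimate only through their product, which is the practical benefit of pairing an orthogonal moment with sample splitting.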