
Showing 1–22 of 22 results for author: Taddy, M

  1. The Geometry of Culture: Analyzing Meaning through Word Embeddings

    Authors: Austin C. Kozlowski, Matt Taddy, James A. Evans

    Abstract: We demonstrate the utility of a new methodological tool, neural-network word embedding models, for large-scale text analysis, revealing how these models produce richer insights into cultural associations and categories than possible with prior methods. Word embeddings represent semantic relations between words as geometric relationships between vectors in a high-dimensional space, operationalizing…

    Submitted 25 March, 2018; originally announced March 2018.

    Journal ref: American Sociological Review 2019, Vol. 84(5) 905-949

  2. arXiv:1801.10242

    cs.LG stat.ML

    Low-Rank Bandit Methods for High-Dimensional Dynamic Pricing

    Authors: Jonas Mueller, Vasilis Syrgkanis, Matt Taddy

    Abstract: We consider dynamic pricing with many products under an evolving but low-dimensional demand model. Assuming the temporal variation in cross-elasticities exhibits low-rank structure based on fixed (latent) features of the products, we show that the revenue maximization problem reduces to an online bandit convex optimization with side information given by the observed demands. We design dynamic pric…

    Submitted 10 September, 2019; v1 submitted 30 January, 2018; originally announced January 2018.

    Comments: NeurIPS 2019

  3. arXiv:1712.09988

    stat.ML

    Estimation and Inference on Heterogeneous Treatment Effects in High-Dimensional Dynamic Panels under Weak Dependence

    Authors: Vira Semenova, Matt Goldman, Victor Chernozhukov, Matt Taddy

    Abstract: This paper provides estimation and inference methods for conditional average treatment effects (CATE) characterized by a high-dimensional parameter in both homogeneous cross-sectional and unit-heterogeneous dynamic panel data settings. In our leading example, we model CATE by interacting the base treatment variable with explanatory variables. The first step of our procedure is orthogonalization,…

    Submitted 10 December, 2022; v1 submitted 28 December, 2017; originally announced December 2017.

  4. arXiv:1712.06695

    stat.ML cs.LG

    Accurate Inference for Adaptive Linear Models

    Authors: Yash Deshpande, Lester Mackey, Vasilis Syrgkanis, Matt Taddy

    Abstract: Estimators computed from adaptively collected data do not behave like their non-adaptive brethren. Rather, the sequential dependence of the collection policy can lead to severe distributional biases that persist even in the infinite data limit. We develop a general method -- $\mathbf{W}$-decorrelation -- for transforming the bias of adaptive linear regression estimators into variance. The method u…

    Submitted 2 January, 2020; v1 submitted 18 December, 2017; originally announced December 2017.

    Comments: Typos fixed for clarification

  5. arXiv:1706.08160

    cs.CL cs.LG

    Beyond Bilingual: Multi-sense Word Embeddings using Multilingual Context

    Authors: Shyam Upadhyay, Kai-Wei Chang, Matt Taddy, Adam Kalai, James Zou

    Abstract: Word embeddings, which represent a word as a point in a vector space, have become ubiquitous to several NLP tasks. A recent line of work uses bilingual (two languages) corpora to learn a different vector for each sense of a word, by exploiting crosslingual signals to aid sense identification. We present a multi-view Bayesian non-parametric algorithm which improves multi-sense word embeddings by (a…

    Submitted 25 June, 2017; originally announced June 2017.

    Comments: ACL 2017 Repl4NLP workshop

  6. arXiv:1612.09596

    stat.AP cs.LG stat.ML

    Counterfactual Prediction with Deep Instrumental Variables Networks

    Authors: Jason Hartford, Greg Lewis, Kevin Leyton-Brown, Matt Taddy

    Abstract: We are in the middle of a remarkable rise in the use and capability of artificial intelligence. Much of this growth has been fueled by the success of deep learning architectures: models that map from observables to outputs via multiple layers of latent representations. These deep learning algorithms are effective tools for unstructured prediction, and they can be combined in AI systems to solve co…

    Submitted 30 December, 2016; originally announced December 2016.

  7. arXiv:1602.08066

    stat.AP

    Scalable semiparametric inference for the means of heavy-tailed distributions

    Authors: Matt Taddy, Hedibert Freitas Lopes, Matt Gardner

    Abstract: Heavy tailed distributions present a tough setting for inference. They are also common in industrial applications, particularly with Internet transaction datasets, and machine learners often analyze such data without considering the biases and risks associated with the misuse of standard tools. This paper outlines a procedure for inference about the mean of a (possibly conditional) heavy tailed di…

    Submitted 13 October, 2016; v1 submitted 25 February, 2016; originally announced February 2016.

  8. arXiv:1510.02172

    stat.AP

    Hockey Player Performance via Regularized Logistic Regression

    Authors: Robert B. Gramacy, Matt Taddy, Sen Tian

    Abstract: A hockey player's plus-minus measures the difference between goals scored by and against that player's team while the player was on the ice. This measures only a marginal effect, failing to account for the influence of the others he is playing with and against. A better approach would be to jointly model the effects of all players, and any other confounding information, in order to infer a partial…

    Submitted 25 January, 2016; v1 submitted 7 October, 2015; originally announced October 2015.

  9. arXiv:1509.03940

    stat.AP

    Causal Inference in Repeated Observational Studies: A Case Study of eBay Product Releases

    Authors: Vadim von Brzeski, Matt Taddy, David Draper

    Abstract: Causal inference in observational studies is notoriously difficult, due to the fact that the experimenter is not in charge of the treatment assignment mechanism. Many potential confounding factors (PCFs) exist in such a scenario, and if one seeks to estimate the causal effect of the treatment on a response, one needs to control for such factors. Identifying all relevant PCFs may be difficult (or…

    Submitted 13 September, 2015; originally announced September 2015.

  10. arXiv:1504.07295

    cs.CL cs.IR stat.AP

    Document Classification by Inversion of Distributed Language Representations

    Authors: Matt Taddy

    Abstract: There have been many recent advances in the structure and measurement of distributed language models: those that map from words to a vector-space that is rich in information about word choice and composition. This vector-space is the distributed language representation. The goal of this note is to point out that any distributed representation can be turned into a classifier through inversion via B…

    Submitted 24 July, 2015; v1 submitted 27 April, 2015; originally announced April 2015.

  11. arXiv:1502.02312

    stat.AP

    Bayesian and empirical Bayesian forests

    Authors: Matt Taddy, Chun-Sheng Chen, Jun Yu, Mitch Wyle

    Abstract: We derive ensembles of decision trees through a nonparametric Bayesian model, allowing us to view random forests as samples from a posterior distribution. This insight provides large gains in interpretability, and motivates a class of Bayesian forest (BF) algorithms that yield small but reliable performance gains. Based on the BF framework, we are able to show that high-level tree hierarchy is sta…

    Submitted 15 May, 2015; v1 submitted 8 February, 2015; originally announced February 2015.

  12. arXiv:1412.8563

    stat.AP

    A nonparametric Bayesian analysis of heterogeneous treatment effects in digital experimentation

    Authors: Matt Taddy, Matt Gardner, Liyun Chen, David Draper

    Abstract: Randomized controlled trials play an important role in how Internet companies predict the impact of policy decisions and product changes. In these `digital experiments', different units (people, devices, products) respond differently to the treatment. This article presents a fast and scalable Bayesian nonparametric analysis of such heterogeneous treatment effects and their measurement in relation…

    Submitted 18 December, 2015; v1 submitted 29 December, 2014; originally announced December 2014.

  13. Distributed multinomial regression

    Authors: Matt Taddy

    Abstract: This article introduces a model-based approach to distributed computing for multinomial logistic (softmax) regression. We treat counts for each response category as independent Poisson regressions via plug-in estimates for fixed effects shared across categories. The work is driven by the high-dimensional-response multinomial models that are used in analysis of a large number of random counts. Our…

    Submitted 5 November, 2015; v1 submitted 24 November, 2013; originally announced November 2013.

    Comments: Published at http://dx.doi.org/10.1214/15-AOAS831 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS831

    Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 3, 1394-1414

  14. arXiv:1308.5623

    stat.AP

    One-step estimator paths for concave regularization

    Authors: Matt Taddy

    Abstract: The statistics literature of the past 15 years has established many favorable properties for sparse diminishing-bias regularization: techniques which can roughly be understood as providing estimation under penalty functions spanning the range of concavity between $L_0$ and $L_1$ norms. However, lasso $L_1$-regularized estimation remains the standard tool for industrial `Big Data' applications beca…

    Submitted 1 May, 2016; v1 submitted 26 August, 2013; originally announced August 2013.

    Comments: Data and code are in the gamlr package for R. Supplemental appendix is at https://github.com/TaddyLab/pose/raw/master/paper/supplemental.pdf

  15. arXiv:1304.4200

    stat.AP

    Efficiency and Structure in Multinomial Inverse Regression

    Authors: Matt Taddy

    Abstract: This is the rejoinder for discussion of "Multinomial Inverse Regression for Text Analysis", Journal of the American Statistical Association 108, 2013.

    Submitted 8 August, 2013; v1 submitted 15 April, 2013; originally announced April 2013.

    Comments: The main article is here: http://arxiv.longhoe.net/abs/1012.2098

  16. arXiv:1209.5026

    stat.AP

    Estimating Player Contribution in Hockey with Regularized Logistic Regression

    Authors: Robert B. Gramacy, Matthew A. Taddy, Shane T. Jensen

    Abstract: We present a regularized logistic regression model for evaluating player contributions in hockey. The traditional metric for this purpose is the plus-minus statistic, which allocates a single unit of credit (for or against) to each player on the ice for a goal. However, plus-minus scores measure only the marginal effect of players, do not account for sample size, and provide a very noisy estimate…

    Submitted 12 January, 2013; v1 submitted 22 September, 2012; originally announced September 2012.

    Comments: 23 pages, 10 figures

  17. arXiv:1206.3776

    stat.AP

    Measuring political sentiment on Twitter: factor-optimal design for multinomial inverse regression

    Authors: Matt Taddy

    Abstract: This article presents a short case study in text analysis: the scoring of Twitter posts for positive, negative, or neutral sentiment directed towards particular US politicians. The study requires selection of a sub-sample of representative posts for sentiment scoring, a common and costly aspect of sentiment mining. As a general contribution, our application is preceded by a proposed algorithm for…

    Submitted 1 March, 2013; v1 submitted 17 June, 2012; originally announced June 2012.

    Comments: To appear in Technometrics. Code is available in the textir package for R

  18. arXiv:1109.4518

    stat.AP

    On Estimation and Selection for Topic Models

    Authors: Matthew A. Taddy

    Abstract: This article describes posterior maximization for topic models, identifying computational and conceptual gains from inference under a non-standard parametrization. We then show that fitted parameters can be used as the basis for a novel approach to marginal likelihood estimation, via block-diagonal approximation to the information matrix, that facilitates choosing the number of latent topics. This…

    Submitted 27 December, 2011; v1 submitted 21 September, 2011; originally announced September 2011.

    Comments: Scheduled to appear in the proceedings of AISTATS 2012

  19. Variable selection and sensitivity analysis using dynamic trees, with an application to computer code performance tuning

    Authors: Robert B. Gramacy, Matt Taddy, Stefan M. Wild

    Abstract: We investigate an application in the automatic tuning of computer codes, an area of research that has come to prominence alongside the recent rise of distributed scientific processing and heterogeneity in high-performance computing environments. Here, the response function is nonlinear and noisy and may not be smooth or stationary. Clearly needed are variable selection, decomposition of influence,…

    Submitted 16 April, 2013; v1 submitted 23 August, 2011; originally announced August 2011.

    Comments: Published at http://dx.doi.org/10.1214/12-AOAS590 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS590

    Journal ref: Annals of Applied Statistics 2013, Vol. 7, No. 1, 51-80

  20. arXiv:1012.2105

    stat.ME

    Mixture Modeling for Marked Poisson Processes

    Authors: Matthew A. Taddy, Athanasios Kottas

    Abstract: We propose a general modeling framework for marked Poisson processes observed over time or space. The modeling approach exploits the connection of the nonhomogeneous Poisson process intensity with a density function. Nonparametric Dirichlet process mixtures for this density, combined with nonparametric or semiparametric modeling for the mark distribution, yield flexible prior models for the marked…

    Submitted 1 November, 2011; v1 submitted 9 December, 2010; originally announced December 2010.

  21. arXiv:1012.2098

    stat.ME

    Multinomial Inverse Regression for Text Analysis

    Authors: Matt Taddy

    Abstract: Text data, including speeches, stories, and other document forms, are often connected to sentiment variables that are of interest for research in marketing, economics, and elsewhere. It is also very high dimensional and difficult to incorporate into statistical analyses. This article introduces a straightforward framework of sentiment-preserving dimension reduction for text data. Multinomial inver…

    Submitted 8 August, 2013; v1 submitted 9 December, 2010; originally announced December 2010.

    Comments: Published in the Journal of the American Statistical Association 108, 2013, with discussion (rejoinder is here: http://arxiv.longhoe.net/abs/1304.4200). Software is available in the textir package for R

  22. arXiv:0912.1586

    stat.ME stat.CO

    Dynamic Trees for Learning and Design

    Authors: Matthew A. Taddy, Robert B. Gramacy, Nicholas G. Polson

    Abstract: Dynamic regression trees are an attractive option for automatic regression and classification with complicated response surfaces in on-line application settings. We create a sequential tree model whose state changes in time with the accumulation of new data, and provide particle learning algorithms that allow for the efficient on-line posterior filtering of tree-states. A major advantage of tree r…

    Submitted 21 November, 2010; v1 submitted 8 December, 2009; originally announced December 2009.

    Comments: 37 pages, 8 figures, 3 tables; accepted at JASA