doi 10.1177/0003122419877135

The Geometry of Culture: Analyzing Meaning through Word Embeddings

Authors: Austin C. Kozlowski, Matt Taddy, James A. Evans

Abstract: We demonstrate the utility of a new methodological tool, neural-network word embedding models, for large-scale text analysis, revealing how these models produce richer insights into cultural associations and categories than possible with prior methods. Word embeddings represent semantic relations between words as geometric relationships between vectors in a high-dimensional space, operationalizing a relational model of meaning consistent with contemporary theories of identity and culture. We show that dimensions induced by word differences (e.g. man - woman, rich - poor, black - white, liberal - conservative) in these vector spaces closely correspond to dimensions of cultural meaning, and the projection of words onto these dimensions reflects widely shared cultural connotations when compared to surveyed responses and labeled historical data. We pilot a method for testing the stability of these associations, then demonstrate applications of word embeddings for macro-cultural investigation with a longitudinal analysis of the coevolution of gender and class associations in the United States over the 20th century and a comparative analysis of historic distinctions between markers of gender and class in the U.S. and Britain. We argue that the success of these high-dimensional models motivates a move towards "high-dimensional theorizing" of meanings, identities and cultural processes.

Submitted 25 March, 2018; originally announced March 2018.

Journal ref: American Sociological Review 2019, Vol. 84(5) 905-949
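
The idea of a dimension "induced by word differences" and the projection of words onto it can be illustrated with a small sketch. The vectors below are made-up toy values, not trained embeddings. Averaging several difference vectors and scoring words by cosine similarity are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import math

def subtract(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(sum(a * a for a in u))

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

def cultural_dimension(pairs, vectors):
    """Average the difference vectors of word pairs such as (rich, poor).

    Assumption: averaging several pairs is one simple way to build a
    dimension; a single pair would also work.
    """
    diffs = [subtract(vectors[a], vectors[b]) for a, b in pairs]
    size = len(diffs[0])
    return [sum(d[i] for d in diffs) / len(diffs) for i in range(size)]

# Toy 3-dimensional vectors, invented purely for this example.
vectors = {
    "rich": [1.0, 0.2, 0.0],
    "poor": [-1.0, 0.2, 0.0],
    "affluent": [0.8, 0.1, 0.1],
    "impoverished": [-0.9, 0.3, 0.1],
    "yacht": [0.7, 0.5, 0.3],
    "shack": [-0.6, 0.4, 0.2],
}

affluence = cultural_dimension(
    [("rich", "poor"), ("affluent", "impoverished")], vectors
)

# Positive scores lean toward the "rich" pole, negative toward "poor".
yacht_score = cosine(vectors["yacht"], affluence)
shack_score = cosine(vectors["shack"], affluence)
print(yacht_score, shack_score)
```

With real embeddings, the same scoring would be applied to vectors loaded from a trained model in place of the toy dictionary.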

arXiv:1801.10242 [pdf, other]

Low-Rank Bandit Methods for High-Dimensional Dynamic Pricing

Authors: Jonas Mueller, Vasilis Syrgkanis, Matt Taddy

Abstract: We consider dynamic pricing with many products under an evolving but low-dimensional demand model. Assuming the temporal variation in cross-elasticities exhibits low-rank structure based on fixed (latent) features of the products, we show that the revenue maximization problem reduces to an online bandit convex optimization with side information given by the observed demands. We design dynamic pricing algorithms whose revenue approaches that of the best fixed price vector in hindsight, at a rate that only depends on the intrinsic rank of the demand model and not the number of products. Our approach applies a bandit convex optimization algorithm in a projected low-dimensional space spanned by the latent product features, while simultaneously learning this span via online singular value decomposition of a carefully-crafted matrix containing the observed demands.

Submitted 10 September, 2019; v1 submitted 30 January, 2018; originally announced January 2018.

Comments: NeurIPS 2019

arXiv:1712.09988 [pdf, other]

Estimation and Inference on Heterogeneous Treatment Effects in High-Dimensional Dynamic Panels under Weak Dependence

Authors: Vira Semenova, Matt Goldman, Victor Chernozhukov, Matt Taddy

Abstract: This paper provides estimation and inference methods for conditional average treatment effects (CATE) characterized by a high-dimensional parameter in both homogeneous cross-sectional and unit-heterogeneous dynamic panel data settings. In our leading example, we model CATE by interacting the base treatment variable with explanatory variables. The first step of our procedure is orthogonalization, where we partial out the controls and unit effects from the outcome and the base treatment and take the cross-fitted residuals. This step uses a novel generic cross-fitting method we design for weakly dependent time series and panel data. This method "leaves out the neighbors" when fitting nuisance components, and we theoretically power it by using Strassen's coupling. As a result, we can rely on any modern machine learning method in the first step, provided it learns the residuals well enough. Second, we construct an orthogonal (or residual) learner of CATE -- the Lasso CATE -- that regresses the outcome residual on the vector of interactions of the residualized treatment with explanatory variables. If the CATE function is simpler than the first-stage regression, the orthogonal learner converges faster than the single-stage regression-based learner. Third, we perform simultaneous inference on parameters of the CATE function using debiasing. We can also use ordinary least squares in the last two steps when CATE is low-dimensional.
In heterogeneous panel data settings, we model the unobserved unit heterogeneity as a weakly sparse deviation from Mundlak (1978)'s model of correlated unit effects as a linear function of time-invariant covariates and make use of L1-penalization to estimate these models. We demonstrate our methods by estimating price elasticities of groceries based on scanner data. We note that our results are new even for the cross-sectional (i.i.d.) case.

Submitted 10 December, 2022; v1 submitted 28 December, 2017; originally announced December 2017.

arXiv:1712.06695 [pdf, other]

Accurate Inference for Adaptive Linear Models

Authors: Yash Deshpande, Lester Mackey, Vasilis Syrgkanis, Matt Taddy

Abstract: Estimators computed from adaptively collected data do not behave like their non-adaptive brethren. Rather, the sequential dependence of the collection policy can lead to severe distributional biases that persist even in the infinite data limit. We develop a general method -- $\mathbf{W}$-decorrelation -- for transforming the bias of adaptive linear regression estimators into variance. The method uses only coarse-grained information about the data collection policy and does not need access to propensity scores or exact knowledge of the policy. We bound the finite-sample bias and variance of the $\mathbf{W}$-estimator and develop asymptotically correct confidence intervals based on a novel martingale central limit theorem. We then demonstrate the empirical benefits of the generic $\mathbf{W}$-decorrelation procedure in two different adaptive data settings: the multi-armed bandit and the autoregressive time series.

Submitted 2 January, 2020; v1 submitted 18 December, 2017; originally announced December 2017.

Comments: Typos fixed for clarification

arXiv:1706.08160 [pdf, other]

Beyond Bilingual: Multi-sense Word Embeddings using Multilingual Context

Authors: Shyam Upadhyay, Kai-Wei Chang, Matt Taddy, Adam Kalai, James Zou

Abstract: Word embeddings, which represent a word as a point in a vector space, have become ubiquitous across several NLP tasks. A recent line of work uses bilingual (two languages) corpora to learn a different vector for each sense of a word, by exploiting crosslingual signals to aid sense identification. We present a multi-view Bayesian non-parametric algorithm which improves multi-sense word embeddings by (a) using multilingual (i.e., more than two languages) corpora to significantly improve sense embeddings beyond what one achieves with bilingual information, and (b) using a principled approach to learn a variable number of senses per word, in a data-driven manner. Ours is the first approach with the ability to leverage multilingual corpora efficiently for multi-sense representation learning. Experiments show that multilingual training significantly improves performance over monolingual and bilingual training, by allowing us to combine different parallel corpora to leverage multilingual context. Multilingual training yields comparable performance to a state-of-the-art monolingual model trained on five times more training data.

Submitted 25 June, 2017; originally announced June 2017.

Comments: ACL 2017 Repl4NLP workshop

arXiv:1612.09596 [pdf, other]

Counterfactual Prediction with Deep Instrumental Variables Networks

Authors: Jason Hartford, Greg Lewis, Kevin Leyton-Brown, Matt Taddy

Abstract: We are in the middle of a remarkable rise in the use and capability of artificial intelligence. Much of this growth has been fueled by the success of deep learning architectures: models that map from observables to outputs via multiple layers of latent representations. These deep learning algorithms are effective tools for unstructured prediction, and they can be combined in AI systems to solve complex automated reasoning problems. This paper provides a recipe for combining ML algorithms to solve for causal effects in the presence of instrumental variables -- sources of treatment randomization that are conditionally independent from the response. We show that a flexible IV specification resolves into two prediction tasks that can be solved with deep neural nets: a first-stage network for treatment prediction and a second-stage network whose loss function involves integration over the conditional treatment distribution. This Deep IV framework imposes some specific structure on the stochastic gradient descent routine used for training, but it is general enough that we can take advantage of off-the-shelf ML capabilities and avoid extensive algorithm customization. We outline how to obtain out-of-sample causal validation in order to avoid over-fit. We also introduce schemes for both Bayesian and frequentist inference: the former via a novel adaptation of dropout training, and the latter via a data splitting routine.

Submitted 30 December, 2016; originally announced December 2016.

arXiv:1602.08066 [pdf, other]

Scalable semiparametric inference for the means of heavy-tailed distributions

Authors: Matt Taddy, Hedibert Freitas Lopes, Matt Gardner

Abstract: Heavy tailed distributions present a tough setting for inference. They are also common in industrial applications, particularly with Internet transaction datasets, and machine learners often analyze such data without considering the biases and risks associated with the misuse of standard tools. This paper outlines a procedure for inference about the mean of a (possibly conditional) heavy tailed distribution that combines nonparametric analysis for the bulk of the support with Bayesian parametric modeling -- motivated from extreme value theory -- for the heavy tail. The procedure is fast and massively scalable. The resulting point estimators attain lowest-possible error rates and, unique among alternatives, we are able to provide accurate uncertainty quantification for these estimators. The work should find application in settings wherever correct inference is important and reward tails are heavy; we illustrate the framework in causal inference for A/B experiments involving hundreds of millions of users of eBay.com.

Submitted 13 October, 2016; v1 submitted 25 February, 2016; originally announced February 2016.

arXiv:1510.02172 [pdf, other]

Hockey Player Performance via Regularized Logistic Regression

Authors: Robert B. Gramacy, Matt Taddy, Sen Tian

Abstract: A hockey player's plus-minus measures the difference between goals scored by and against that player's team while the player was on the ice. This measures only a marginal effect, failing to account for the influence of the others he is playing with and against. A better approach would be to jointly model the effects of all players, and any other confounding information, in order to infer a partial effect for this individual: his influence on the box score regardless of who else is on the ice. This chapter describes and illustrates a simple algorithm for recovering such partial effects. There are two main ingredients. First, we provide a logistic regression model that can predict which team has scored a given goal as a function of who was on the ice, what teams were playing, and details of the game situation (e.g. full-strength or power-play). Since the resulting model is so high dimensional that standard maximum likelihood estimation techniques fail, our second ingredient is a scheme for regularized estimation. This adds a penalty to the objective that favors parsimonious models and stabilizes estimation. Such techniques have proven useful in fields from genetics to finance over the past two decades, and have demonstrated an impressive ability to gracefully handle large and highly imbalanced data sets.
The latest software packages accompanying this new methodology -- which exploit parallel computing environments, sparse matrices, and other features of modern data structures -- are widely available and make it straightforward for interested analysts to explore their own models of player contribution.

Submitted 25 January, 2016; v1 submitted 7 October, 2015; originally announced October 2015.
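
As a point of reference for the plus-minus statistic defined in the abstract above, here is a minimal sketch of how it could be tallied from goal records. The record layout (`scoring_on_ice`, `conceding_on_ice`) and the player labels are hypothetical, invented for illustration; this computes only the marginal statistic the abstract criticizes, not the regularized regression it proposes.

```python
from collections import defaultdict

def plus_minus(goals):
    """Tally plus-minus: +1 for each goal scored by a player's team while
    the player is on the ice, -1 for each goal scored against it."""
    totals = defaultdict(int)
    for goal in goals:
        for player in goal["scoring_on_ice"]:
            totals[player] += 1
        for player in goal["conceding_on_ice"]:
            totals[player] -= 1
    return dict(totals)

# Hypothetical goal log: each record lists who was on the ice for each side.
goals = [
    {"scoring_on_ice": ["A", "B"], "conceding_on_ice": ["X", "Y"]},
    {"scoring_on_ice": ["A", "C"], "conceding_on_ice": ["X", "Z"]},
    {"scoring_on_ice": ["X", "Y"], "conceding_on_ice": ["A", "B"]},
]

scores = plus_minus(goals)
print(scores)
```

Player B ends at zero here although B only ever appeared alongside A, which illustrates why a marginal tally cannot separate one player's effect from that of linemates.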

arXiv:1509.03940 [pdf, other]

Causal Inference in Repeated Observational Studies: A Case Study of eBay Product Releases

Authors: Vadim von Brzeski, Matt Taddy, David Draper

Abstract: Causal inference in observational studies is notoriously difficult, due to the fact that the experimenter is not in charge of the treatment assignment mechanism. Many potential confounding factors (PCFs) exist in such a scenario, and if one seeks to estimate the causal effect of the treatment on a response, one needs to control for such factors. Identifying all relevant PCFs may be difficult (or impossible) given a single observational study. Instead, we argue that if one can observe a sequence of similar treatments over the course of a lengthy time period, one can identify patterns of behavior in the experimental subjects that are correlated with the response of interest and control for those patterns directly. Specifically, in our case-study we find and control for an early-adopter effect: the scenario in which the magnitude of the response is highly correlated with how quickly one adopts a treatment after its release. We provide a flexible hierarchical Bayesian framework that controls for such early-adopter effects in the analysis of the effects of multiple sequential treatments. The methods are presented and evaluated in the context of a detailed case-study involving product updates (newer versions of the same product) from eBay, Inc. The users in our study upgrade (or not) to a new version of the product at their own volition and timing. Our response variable is a measure of user actions, and we study the behavior of a large set of users (n = 10.5 million) in a targeted subset of eBay categories over a period of one year.
We find that (a) naive causal estimates are hugely misleading and (b) our method, which is relatively insensitive to modeling assumptions and exhibits good out-of-sample predictive validation, yields sensible causal estimates that offer eBay a stable basis for decision-making.

Submitted 13 September, 2015; originally announced September 2015.

arXiv:1504.07295 [pdf, other]

Document Classification by Inversion of Distributed Language Representations

Authors: Matt Taddy

Abstract: There have been many recent advances in the structure and measurement of distributed language models: those that map from words to a vector-space that is rich in information about word choice and composition. This vector-space is the distributed language representation. The goal of this note is to point out that any distributed representation can be turned into a classifier through inversion via Bayes rule. The approach is simple and modular, in that it will work with any language representation whose training can be formulated as optimizing a probability model. In our application to 2 million sentences from Yelp reviews, we also find that it performs as well as or better than complex purpose-built algorithms.

Submitted 24 July, 2015; v1 submitted 27 April, 2015; originally announced April 2015.
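
The "inversion via Bayes rule" step mentioned in the abstract above can be sketched generically: if a separate language model for each class assigns a log-likelihood to a document, the class posterior follows from combining those likelihoods with class priors. The function name, class labels, and numeric values below are hypothetical, and the per-class log-likelihoods are assumed to come from some already-fitted model.

```python
import math

def class_posterior(log_likelihoods, priors):
    """Bayes rule: p(class | doc) is proportional to p(doc | class) * p(class).

    log_likelihoods maps class -> log p(doc | class), assumed to be produced
    by a language model fitted on that class's documents.
    """
    log_joint = {
        c: log_likelihoods[c] + math.log(priors[c]) for c in log_likelihoods
    }
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_joint.values())
    unnormalized = {c: math.exp(v - peak) for c, v in log_joint.items()}
    total = sum(unnormalized.values())
    return {c: u / total for c, u in unnormalized.items()}

# Made-up scores for one document under two class-specific models.
posterior = class_posterior(
    {"positive": -10.0, "negative": -12.0},
    {"positive": 0.5, "negative": 0.5},
)
print(posterior)
```

The document would then be assigned to the class with the largest posterior probability.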

arXiv:1502.02312 [pdf, other]

Bayesian and empirical Bayesian forests

Authors: Matt Taddy, Chun-Sheng Chen, Jun Yu, Mitch Wyle

Abstract: We derive ensembles of decision trees through a nonparametric Bayesian model, allowing us to view random forests as samples from a posterior distribution. This insight provides large gains in interpretability, and motivates a class of Bayesian forest (BF) algorithms that yield small but reliable performance gains. Based on the BF framework, we are able to show that high-level tree hierarchy is stable in large samples. This leads to an empirical Bayesian forest (EBF) algorithm for building approximate BFs on massive distributed datasets and we show that EBFs outperform sub-sampling based alternatives by a large margin.

Submitted 15 May, 2015; v1 submitted 8 February, 2015; originally announced February 2015.

arXiv:1412.8563 [pdf, other]

A nonparametric Bayesian analysis of heterogeneous treatment effects in digital experimentation

Authors: Matt Taddy, Matt Gardner, Liyun Chen, David Draper

Abstract: Randomized controlled trials play an important role in how Internet companies predict the impact of policy decisions and product changes. In these `digital experiments', different units (people, devices, products) respond differently to the treatment. This article presents a fast and scalable Bayesian nonparametric analysis of such heterogeneous treatment effects and their measurement in relation to observable covariates. New results and algorithms are provided for quantifying the uncertainty associated with treatment effect measurement via both linear projections and nonlinear regression trees (CART and Random Forests). For linear projections, our inference strategy leads to results that are mostly in agreement with those from the frequentist literature. We find that linear regression adjustment of treatment effect averages (i.e., post-stratification) can provide some variance reduction, but that this reduction will be vanishingly small in the low-signal and large-sample setting of digital experiments. For regression trees, we provide uncertainty quantification for the machine learning algorithms that are commonly applied in tree-fitting. We argue that practitioners should look to ensembles of trees (forests) rather than individual trees in their analysis. The ideas are applied on and illustrated through an example experiment involving 21 million unique users of eBay.com.

Submitted 18 December, 2015; v1 submitted 29 December, 2014; originally announced December 2014.

arXiv:1311.6139 [pdf, ps, other]

doi 10.1214/15-AOAS831

Distributed multinomial regression

Authors: Matt Taddy

Abstract: This article introduces a model-based approach to distributed computing for multinomial logistic (softmax) regression. We treat counts for each response category as independent Poisson regressions via plug-in estimates for fixed effects shared across categories. The work is driven by the high-dimensional-response multinomial models that are used in analysis of a large number of random counts. Our motivating applications are in text analysis, where documents are tokenized and the token counts are modeled as arising from a multinomial dependent upon document attributes. We estimate such models for a publicly available data set of reviews from Yelp, with text regressed onto a large set of explanatory variables (user, business, and rating information). The fitted models serve as a basis for exploring the connection between words and variables of interest, for reducing dimension into supervised factor scores, and for prediction. We argue that the approach herein provides an attractive option for social scientists and other text analysts who wish to bring familiar regression tools to bear on text data.

Submitted 5 November, 2015; v1 submitted 24 November, 2013; originally announced November 2013.

Comments: Published at http://dx.doi.org/10.1214/15-AOAS831 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS831

Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 3, 1394-1414

arXiv:1308.5623 [pdf, other]

One-step estimator paths for concave regularization

Authors: Matt Taddy

Abstract: The statistics literature of the past 15 years has established many favorable properties for sparse diminishing-bias regularization: techniques which can roughly be understood as providing estimation under penalty functions spanning the range of concavity between $L_0$ and $L_1$ norms. However, lasso $L_1$-regularized estimation remains the standard tool for industrial `Big Data' applications because of its minimal computational cost and the presence of easy-to-apply rules for penalty selection. In response, this article proposes a simple new algorithm framework that requires no more computation than a lasso path: the path of one-step estimators (POSE) does $L_1$ penalized regression estimation on a grid of decreasing penalties, but adapts coefficient-specific weights to decrease as a function of the coefficient estimated in the previous path step. This provides sparse diminishing-bias regularization at no extra cost over the fastest lasso algorithms. Moreover, our `gamma lasso' implementation of POSE is accompanied by a reliable heuristic for the fit degrees of freedom, so that standard information criteria can be applied in penalty selection. We also provide novel results on the distance between weighted-$L_1$ and $L_0$ penalized predictors; this allows us to build intuition about POSE and other diminishing-bias regularization schemes. The methods and results are illustrated in extensive simulations and in application of logistic regression to evaluating the performance of hockey players.

Submitted 1 May, 2016; v1 submitted 26 August, 2013; originally announced August 2013.

Comments: Data and code are in the gamlr package for R. Supplemental appendix is at https://github.com/TaddyLab/pose/raw/master/paper/supplemental.pdf
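
The path idea described in the abstract above (weighted $L_1$ estimation on a decreasing penalty grid, with each coefficient's weight shrinking as its previous estimate grows) can be sketched in a deliberately simplified setting. This sketch assumes an orthonormal design, so each weighted-lasso step reduces to soft-thresholding of the least-squares coefficients `z`, and it assumes the weight rule 1 / (1 + gamma * |beta|). Both are illustrative assumptions rather than details taken from the listing, and the function names are hypothetical; the gamlr package named in the comments is the place to look for the actual implementation.

```python
def soft_threshold(z, t):
    """Closed-form solution of a one-dimensional L1-penalized least squares."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def one_step_path(z, lambdas, gamma):
    """Path of weighted-L1 estimates over a decreasing penalty grid.

    z: least-squares coefficients under an (assumed) orthonormal design.
    gamma: controls how fast weights fall; gamma = 0 gives a plain lasso path.
    """
    weights = [1.0] * len(z)
    path = []
    for lam in lambdas:
        beta = [soft_threshold(zj, lam * wj) for zj, wj in zip(z, weights)]
        path.append(beta)
        # Assumed weight rule: larger previous estimates get smaller penalties.
        weights = [1.0 / (1.0 + gamma * abs(bj)) for bj in beta]
    return path

z = [3.0, 0.5]
lambdas = [2.0, 1.0]
adaptive = one_step_path(z, lambdas, gamma=1.0)
lasso = one_step_path(z, lambdas, gamma=0.0)
print(adaptive[-1], lasso[-1])
```

In this toy case the large coefficient is shrunk less under the adaptive weights than under the plain lasso, while the small coefficient stays at zero in both paths, which is the diminishing-bias behavior the abstract refers to.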

arXiv:1304.4200 [pdf, ps, other]

Efficiency and Structure in Multinomial Inverse Regression

Authors: Matt Taddy

Abstract: This is the rejoinder for discussion of "Multinomial Inverse Regression for Text Analysis", Journal of the American Statistical Association 108, 2013.

Submitted 8 August, 2013; v1 submitted 15 April, 2013; originally announced April 2013.

Comments: The main article is here: http://arxiv.longhoe.net/abs/1012.2098

arXiv:1209.5026 [pdf, other]

Estimating Player Contribution in Hockey with Regularized Logistic Regression

Authors: Robert B. Gramacy, Matthew A. Taddy, Shane T. Jensen

Abstract: We present a regularized logistic regression model for evaluating player contributions in hockey. The traditional metric for this purpose is the plus-minus statistic, which allocates a single unit of credit (for or against) to each player on the ice for a goal. However, plus-minus scores measure only the marginal effect of players, do not account for sample size, and provide a very noisy estimate of performance. We investigate a related regression problem: what does each player on the ice contribute, beyond aggregate team performance and other factors, to the odds that a given goal was scored by their team? Due to the large-p (number of players) and imbalanced design setting of hockey analysis, a major part of our contribution is a careful treatment of prior shrinkage in model estimation. We showcase two recently developed techniques -- for posterior maximization or simulation -- that make such analysis feasible. Each approach is accompanied with publicly available software and we include the simple commands used in our analysis. Our results show that most players do not stand out as measurably strong (positive or negative) contributors. This allows the stars to really shine, reveals diamonds in the rough overlooked by earlier analyses, and argues that some of the highest paid players in the league are not making contributions worth their expense.

Submitted 12 January, 2013; v1 submitted 22 September, 2012; originally announced September 2012.

Comments: 23 pages, 10 figures

arXiv:1206.3776 [pdf, other]

Measuring political sentiment on Twitter: factor-optimal design for multinomial inverse regression

Authors: Matt Taddy

Abstract: This article presents a short case study in text analysis: the scoring of Twitter posts for positive, negative, or neutral sentiment directed towards particular US politicians. The study requires selection of a sub-sample of representative posts for sentiment scoring, a common and costly aspect of sentiment mining. As a general contribution, our application is preceded by a proposed algorithm for maximizing sampling efficiency. In particular, we outline and illustrate greedy selection of documents to build designs that are D-optimal in a topic-factor decomposition of the original text. The strategy is applied to our motivating dataset of political posts, and we outline a new technique for predicting both generic and subject-specific document sentiment through use of variable interactions in multinomial inverse regression. Results are presented for analysis of 2.1 million Twitter posts around February 2012.

Submitted 1 March, 2013; v1 submitted 17 June, 2012; originally announced June 2012.

Comments: To appear in Technometrics. Code is available in the textir package for R

arXiv:1109.4518 [pdf, other]

On Estimation and Selection for Topic Models

Authors: Matthew A. Taddy

Abstract: This article describes posterior maximization for topic models, identifying computational and conceptual gains from inference under a non-standard parametrization. We then show that fitted parameters can be used as the basis for a novel approach to marginal likelihood estimation, via block-diagonal approximation to the information matrix, that facilitates choosing the number of latent topics. This likelihood-based model selection is complemented with a goodness-of-fit analysis built around estimated residual dispersion. Examples are provided to illustrate model selection as well as to compare our estimation against standard alternative techniques.

Submitted 27 December, 2011; v1 submitted 21 September, 2011; originally announced September 2011.

Comments: Scheduled to appear in the proceedings of AISTATS 2012

arXiv:1108.4739 [pdf, ps, other]

doi 10.1214/12-AOAS590

Variable selection and sensitivity analysis using dynamic trees, with an application to computer code performance tuning

Authors: Robert B. Gramacy, Matt Taddy, Stefan M. Wild

Abstract: We investigate an application in the automatic tuning of computer codes, an area of research that has come to prominence alongside the recent rise of distributed scientific processing and heterogeneity in high-performance computing environments. Here, the response function is nonlinear and noisy and may not be smooth or stationary. Clearly needed are variable selection, decomposition of influence, and analysis of main and secondary effects for both real-valued and binary inputs and outputs. Our contribution is a novel set of tools for variable selection and sensitivity analysis based on the recently proposed dynamic tree model. We argue that this approach is uniquely well suited to the demands of our motivating example. In illustrations on benchmark data sets, we show that the new techniques are faster and offer richer feature sets than do similar approaches in the static tree and computer experiment literature. We apply the methods in code-tuning optimization, examination of a cold-cache effect, and detection of transformation errors.

Submitted 16 April, 2013; v1 submitted 23 August, 2011; originally announced August 2011.

Comments: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/12-AOAS590

Report number: IMS-AOAS-AOAS590

Journal ref: Annals of Applied Statistics 2013, Vol. 7, No. 1, 51-80

arXiv:1012.2105 [pdf, other]

Mixture Modeling for Marked Poisson Processes

Authors: Matthew A. Taddy, Athanasios Kottas

Abstract: We propose a general modeling framework for marked Poisson processes observed over time or space. The modeling approach exploits the connection of the nonhomogeneous Poisson process intensity with a density function. Nonparametric Dirichlet process mixtures for this density, combined with nonparametric or semiparametric modeling for the mark distribution, yield flexible prior models for the marked Poisson process. In particular, we focus on fully nonparametric model formulations that build the mark density and intensity function from a joint nonparametric mixture, and provide guidelines for straightforward application of these techniques. A key feature of such models is that they can yield flexible inference about the conditional distribution for multivariate marks without requiring specification of a complicated dependence scheme. We address issues relating to choice of the Dirichlet process mixture kernels, and develop methods for prior specification and posterior simulation for full inference about functionals of the marked Poisson process. Moreover, we discuss a method for model checking that can be used to assess and compare goodness of fit of different model specifications under the proposed framework. The methodology is illustrated with simulated and real data sets.

Submitted 1 November, 2011; v1 submitted 9 December, 2010; originally announced December 2010.

arXiv:1012.2098 [pdf, other]

Multinomial Inverse Regression for Text Analysis

Authors: Matt Taddy

Abstract: Text data, including speeches, stories, and other document forms, are often connected to sentiment variables that are of interest for research in marketing, economics, and elsewhere. It is also very high dimensional and difficult to incorporate into statistical analyses. This article introduces a straightforward framework of sentiment-preserving dimension reduction for text data. Multinomial inverse regression is introduced as a general tool for simplifying predictor sets that can be represented as draws from a multinomial distribution, and we show that logistic regression of phrase counts onto document annotations can be used to obtain low dimension document representations that are rich in sentiment information. To facilitate this modeling, a novel estimation technique is developed for multinomial logistic regression with very high-dimension response. In particular, independent Laplace priors with unknown variance are assigned to each regression coefficient, and we detail an efficient routine for maximization of the joint posterior over coefficients and their prior scale. This "gamma-lasso" scheme yields stable and effective estimation for general high-dimension logistic regression, and we argue that it will be superior to current methods in many settings. Guidelines for prior specification are provided, algorithm convergence is detailed, and estimator properties are outlined from the perspective of the literature on non-concave likelihood penalization. Related work on sentiment analysis from statistics, econometrics, and machine learning is surveyed and connected. Finally, the methods are applied in two detailed examples and we provide out-of-sample prediction studies to illustrate their effectiveness.

Submitted 8 August, 2013; v1 submitted 9 December, 2010; originally announced December 2010.

Comments: Published in the Journal of the American Statistical Association 108, 2013, with discussion (rejoinder is here: http://arxiv.longhoe.net/abs/1304.4200). Software is available in the textir package for R

arXiv:0912.1586 [pdf, other]

Dynamic Trees for Learning and Design

Authors: Matthew A. Taddy, Robert B. Gramacy, Nicholas G. Polson

Abstract: Dynamic regression trees are an attractive option for automatic regression and classification with complicated response surfaces in on-line application settings. We create a sequential tree model whose state changes in time with the accumulation of new data, and provide particle learning algorithms that allow for the efficient on-line posterior filtering of tree-states. A major advantage of tree regression is that it allows for the use of very simple models within each partition. The model also facilitates a natural division of labor in our sequential particle-based inference: tree dynamics are defined through a few potential changes that are local to each newly arrived observation, while global uncertainty is captured by the ensemble of particles. We consider both constant and linear mean functions at the tree leaves, along with multinomial leaves for classification problems, and propose default prior specifications that allow for prediction to be integrated over all model parameters conditional on a given tree. Inference is illustrated in some standard nonparametric regression examples, as well as in the setting of sequential experiment design, including both active learning and optimization applications, and in on-line classification. We detail implementation guidelines and problem specific methodology for each of these motivating applications. Throughout, it is demonstrated that our practical approach is able to provide better results compared to commonly used methods at a fraction of the cost.

Submitted 21 November, 2010; v1 submitted 8 December, 2009; originally announced December 2009.

Comments: 37 pages, 8 figures, 3 tables; accepted at JASA

Showing 1–22 of 22 results for author: Taddy, M