Drawing Causal Inferences About Performance Effects in NLP

Abstract: This article emphasizes that NLP as a science seeks to make inferences about the performance effects that result from applying one method (compared to another method) in the processing of natural language. Yet NLP research in practice usually does not achieve this goal: In NLP research articles, typically only a few models are compared. Each model results from a specific procedural pipeline (here named processing system) that is composed of a specific collection of methods that are used in preprocessing, pretraining, hyperparameter tuning, and training on the target task. To make generalizing inferences about the performance effect that is caused by applying some method A vs. another method B, it is not sufficient to compare a few specific models that are produced by a few specific (probably incomparable) processing systems. Rather, the following procedure would allow drawing inferences about methods' performance effects: (1) A population of processing systems that researchers seek to infer to has to be defined. (2) A random sample of processing systems from this population is drawn. (The drawn processing systems in the sample will vary with regard to the methods they apply along their procedural pipelines and also will vary regarding the compositions of their training and test data sets used for training and evaluation.) (3) Each processing system is applied once with method A and once with method B. (4) Based on the sample of applied processing systems, the expected generalization errors of method A and method B are approximated. (5) The difference between the expected generalization errors of method A and method B is the estimated average treatment effect due to applying method A compared to method B in the population of processing systems.
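
The five-step procedure in this abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names `estimate_ate` and `toy_error`, the integer-coded population, and the error values are all hypothetical, and the expected generalization errors are approximated here by plain sample means.

```python
import random
from statistics import mean


def toy_error(system, method):
    """Hypothetical stand-in for running one processing system with a method.

    Returns a made-up generalization error; a real study would train and
    evaluate a model here.
    """
    base = 0.30 if method == "A" else 0.35  # assumed method-level error
    return base + 0.001 * system            # assumed system-level variation


def estimate_ate(population, error_fn, sample_size, seed=0):
    """Estimate the average treatment effect of method A vs. method B."""
    rng = random.Random(seed)
    # Step 2: draw a random sample of processing systems from the population.
    sample = rng.sample(population, sample_size)
    # Step 3: apply each sampled system once with A and once with B.
    errors_a = [error_fn(system, "A") for system in sample]
    errors_b = [error_fn(system, "B") for system in sample]
    # Step 4: approximate the expected generalization errors by sample means.
    mean_a, mean_b = mean(errors_a), mean(errors_b)
    # Step 5: the difference is the estimated average treatment effect.
    return mean_a - mean_b


# Step 1: define the population (here, 100 hypothetical systems coded 0..99).
population = list(range(100))
ate = estimate_ate(population, toy_error, sample_size=20)
```

Because every sampled system is run with both methods, the system-level variation cancels in the difference, and in this toy setup the estimate comes out close to the assumed gap between the two methods.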

Submitted 14 September, 2022; originally announced September 2022.

Comments: 15 pages

arXiv:2205.01600

A Comparison of Approaches for Imbalanced Classification Problems in the Context of Retrieving Relevant Documents for an Analysis

Authors: Sandra Wankmüller

Abstract: One of the first steps in many text-based social science studies is to retrieve documents that are relevant for the analysis from large corpora of otherwise irrelevant documents. The conventional approach in social science to address this retrieval task is to apply a set of keywords and to consider those documents to be relevant that contain at least one of the keywords. But the application of incomplete keyword lists risks drawing biased inferences. More complex and costly methods such as query expansion techniques, topic model-based classification rules, and active as well as passive supervised learning could have the potential to more accurately separate relevant from irrelevant documents and thereby reduce the potential size of bias. Yet, whether applying these more expensive approaches increases retrieval performance compared to keyword lists at all, and if so, by how much, is unclear as a comparison of these approaches is lacking. This study closes this gap by comparing these methods across three retrieval tasks associated with a data set of German tweets (Linder, 2017), the Social Bias Inference Corpus (SBIC) (Sap et al., 2020), and the Reuters-21578 corpus (Lewis, 1997). Results show that query expansion techniques and topic model-based classification rules in most studied settings tend to decrease rather than increase retrieval performance. Active supervised learning, however, if applied on a not too small set of labeled training instances (e.g. 1,000 documents), reaches a substantially higher retrieval performance than keyword lists.

Submitted 3 May, 2022; originally announced May 2022.

Comments: 78 pages, 17 figures, 9 tables

ACM Class: I.2.7; H.3.3

arXiv:2104.02496

Exploring Topic-Metadata Relationships with the STM: A Bayesian Approach

Authors: P. Schulze, S. Wiegrebe, P. W. Thurner, C. Heumann, M. Aßenmacher, S. Wankmüller

Abstract: Topic models such as the Structural Topic Model (STM) estimate latent topical clusters within text. An important step in many topic modeling applications is to explore relationships between the discovered topical structure and metadata associated with the text documents. Methods used to estimate such relationships must take into account that the topical structure is not directly observed, but is instead itself estimated. The authors of the STM, for instance, perform repeated OLS regressions of sampled topic proportions on metadata covariates by using a Monte Carlo sampling technique known as the method of composition. In this paper, we propose two improvements: first, we replace OLS with more appropriate Beta regression. Second, we suggest a fully Bayesian approach instead of the current blending of frequentist and Bayesian methods. We demonstrate our improved methodology by exploring relationships between Twitter posts by German members of parliament (MPs) and different metadata covariates.

Submitted 6 April, 2021; originally announced April 2021.

Comments: 8 pages, 4 figures

arXiv:2102.02111

Introduction to Neural Transfer Learning with Transformers for Social Science Text Analysis

Authors: Sandra Wankmüller

Abstract: Transformer-based models for transfer learning have the potential to achieve high prediction accuracies on text-based supervised learning tasks with relatively few training data instances. These models are thus likely to benefit social scientists that seek to have as accurate as possible text-based measures but only have limited resources for annotating training data. To enable social scientists to leverage these potential benefits for their research, this paper explains how these methods work, why they might be advantageous, and what their limitations are. Additionally, three Transformer-based models for transfer learning, BERT (Devlin et al. 2019), RoBERTa (Liu et al. 2019), and the Longformer (Beltagy et al. 2020), are compared to conventional machine learning algorithms on three applications. Across all evaluated tasks, textual styles, and training data set sizes, the conventional models are consistently outperformed by transfer learning with Transformers, thereby demonstrating the benefits these models can bring to text-based social science research.

Submitted 31 August, 2022; v1 submitted 3 February, 2021; originally announced February 2021.

Comments: 80 pages, 12 figures; changed the title; more focused presentation of contents; moved contents to the appendix; created a new Figure 9; discussion of additional aspects (zero-shot learning, cross-lingual learning, interpretability, foundation models); removed old Figures 4 and 5; made non-essential changes to Figures 1, 2, 4, 6, 7, 8 and 10; changed notation. The original results are unchanged

ACM Class: I.2.7

Showing 1–4 of 4 results for author: Wankmüller, S