Showing 1–13 of 13 results for author: Keith, K

Searching in archive cs.
  1. arXiv:2401.06687

    cs.CL cs.LG stat.ME

    Proximal Causal Inference With Text Data

    Authors: Jacob M. Chen, Rohit Bhattacharya, Katherine A. Keith

    Abstract: Recent text-based causal methods attempt to mitigate confounding bias by estimating proxies of confounding variables that are partially or imperfectly measured from unstructured text data. These approaches, however, assume analysts have supervised labels of the confounders given text for a subset of instances, a constraint that is sometimes infeasible due to data privacy or annotation costs. In th…

    Submitted 21 May, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: 26 pages

  2. arXiv:2307.15176

    cs.AI cs.CL cs.LG stat.ME

    RCT Rejection Sampling for Causal Estimation Evaluation

    Authors: Katherine A. Keith, Sergey Feldman, David Jurgens, Jonathan Bragg, Rohit Bhattacharya

    Abstract: Confounding is a significant obstacle to unbiased estimation of causal effects from observational data. For settings with high-dimensional covariates -- such as text data, genomics, or the behavioral social sciences -- researchers have proposed methods to adjust for confounding by adapting machine learning methods to the goal of causal estimation. However, empirical evaluation of these adjustment…

    Submitted 31 January, 2024; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: Code and data at https://github.com/kakeith/rct_rejection_sampling

    Journal ref: Transactions on Machine Learning Research (TMLR) 2023

  3. arXiv:2212.09676

    cs.CL cs.DL

    Words as Gatekeepers: Measuring Discipline-specific Terms and Meanings in Scholarly Publications

    Authors: Li Lucy, Jesse Dodge, David Bamman, Katherine A. Keith

    Abstract: Scholarly text is often laden with jargon, or specialized language that can facilitate efficient in-group communication within fields but hinder understanding for out-groups. In this work, we develop and validate an interpretable approach for measuring scholarly jargon from text. Expanding the scope of prior work which focuses on word types, we use word sense induction to also identify words that…

    Submitted 22 May, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: 17 pages, 11 figures, to appear in Findings of the Association for Computational Linguistics 2023

  4. arXiv:2211.15971

    cs.CL

    Democratizing Machine Learning for Interdisciplinary Scholars: Report on Organizing the NLP+CSS Online Tutorial Series

    Authors: Ian Stewart, Katherine Keith

    Abstract: Many scientific fields -- including biology, health, education, and the social sciences -- use machine learning (ML) to help them analyze data at an unprecedented scale. However, ML researchers who develop advanced methods rarely provide detailed tutorials showing how to apply these methods. Existing tutorials are often costly to participants, presume extensive programming knowledge, and are not t…

    Submitted 29 November, 2022; originally announced November 2022.

  5. arXiv:2109.07542

    cs.CL

    Text as Causal Mediators: Research Design for Causal Estimates of Differential Treatment of Social Groups via Language Aspects

    Authors: Katherine A. Keith, Douglas Rice, Brendan O'Connor

    Abstract: Using observed language to understand interpersonal interactions is important in high-stakes decision making. We propose a causal research design for observational (non-experimental) data to estimate the natural direct and indirect effects of social group signals (e.g. race or gender) on speakers' responses with separate aspects of language as causal mediators. We illustrate the promises and chall…

    Submitted 15 September, 2021; originally announced September 2021.

    Comments: Accepted to Causal Inference and NLP (CI+NLP) Workshop at EMNLP 2021

    Journal ref: Causal Inference and NLP (CI+NLP) Workshop at EMNLP 2021

  6. arXiv:2109.00725

    cs.CL cs.LG

    Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond

    Authors: Amir Feder, Katherine A. Keith, Emaad Manzoor, Reid Pryzant, Dhanya Sridhar, Zach Wood-Doughty, Jacob Eisenstein, Justin Grimmer, Roi Reichart, Margaret E. Roberts, Brandon M. Stewart, Victor Veitch, Diyi Yang

    Abstract: A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the conver…

    Submitted 30 July, 2022; v1 submitted 2 September, 2021; originally announced September 2021.

    Comments: Accepted to Transactions of the Association for Computational Linguistics (TACL)

  7. arXiv:2105.12936

    cs.CL

    Corpus-Level Evaluation for Event QA: The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence

    Authors: Andrew Halterman, Katherine A. Keith, Sheikh Muhammad Sarwar, Brendan O'Connor

    Abstract: Automated event extraction in social science applications often requires corpus-level evaluations: for example, aggregating text predictions across metadata and unbiased estimates of recall. We combine corpus-level evaluation requirements with a real-world, social science setting and introduce the IndiaPoliceEvents corpus--all 21,391 sentences from 1,257 English-language Times of India articles ab…

    Submitted 27 May, 2021; originally announced May 2021.

    Comments: To appear in Findings of ACL 2021

    Journal ref: Findings of ACL 2021

  8. arXiv:2012.09951

    cs.LG cs.HC

    Fairkit, Fairkit, on the Wall, Who's the Fairest of Them All? Supporting Data Scientists in Training Fair Models

    Authors: Brittany Johnson, Jesse Bartola, Rico Angell, Katherine Keith, Sam Witty, Stephen J. Giguere, Yuriy Brun

    Abstract: Modern software relies heavily on data and machine learning, and affects decisions that shape our world. Unfortunately, recent studies have shown that because of biases in data, software systems frequently inject bias into their decisions, from producing better closed caption transcriptions of men's voices than of women's voices to overcharging people of color for financial loans. To address bias…

    Submitted 17 December, 2020; originally announced December 2020.

  9. arXiv:2010.04706

    cs.CL

    Uncertainty over Uncertainty: Investigating the Assumptions, Annotations, and Text Measurements of Economic Policy Uncertainty

    Authors: Katherine A. Keith, Christoph Teichmann, Brendan O'Connor, Edgar Meij

    Abstract: Methods and applications are inextricably linked in science, and in particular in the domain of text-as-data. In this paper, we examine one such text-as-data application, an established economic index that measures economic policy uncertainty from keyword occurrences in news. This index, which is shown to correlate with firm investment, employment, and excess market returns, has had substantive im…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: Accepted to the 2020 Natural Language Processing + Computational Social Science Workshop (NLP+CSS) at EMNLP

    Journal ref: 2020 Natural Language Processing + Computational Social Science Workshop (NLP+CSS) at EMNLP

  10. arXiv:2005.00649

    cs.CL

    Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates

    Authors: Katherine A. Keith, David Jensen, Brendan O'Connor

    Abstract: Many applications of computational social science aim to infer causal conclusions from non-experimental data. Such observational data often contains confounders, variables that influence both potential causes and potential effects. Unmeasured or latent confounders can bias causal estimates, and this has motivated interest in measuring potential confounders from observed text. For example, an indiv…

    Submitted 1 May, 2020; originally announced May 2020.

    Comments: Accepted to ACL 2020

    Journal ref: ACL 2020

  11. arXiv:1906.02868

    cs.CL

    Modeling financial analysts' decision making via the pragmatics and semantics of earnings calls

    Authors: Katherine A. Keith, Amanda Stent

    Abstract: Every fiscal quarter, companies hold earnings calls in which company executives respond to questions from analysts. After these calls, analysts often change their price target recommendations, which are used in equity research reports to help investors make decisions. In this paper, we examine analysts' decision making behavior as it pertains to the language content of earnings calls. We identify…

    Submitted 24 June, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: Accepted at ACL 2019. Revised version includes appendix and NSF funding acknowledgment

  12. arXiv:1804.06004

    cs.CL

    Monte Carlo Syntax Marginals for Exploring and Using Dependency Parses

    Authors: Katherine A. Keith, Su Lin Blodgett, Brendan O'Connor

    Abstract: Dependency parsing research, which has made significant gains in recent years, typically focuses on improving the accuracy of single-tree predictions. However, ambiguity is inherent to natural language syntax, and communicating such ambiguity is important for error analysis and better-informed downstream applications. In this work, we propose a transition sampling algorithm to sample from the full…

    Submitted 16 April, 2018; originally announced April 2018.

    Comments: To appear in Proceedings of NAACL 2018

  13. arXiv:1707.07086

    cs.CL

    Identifying civilians killed by police with distantly supervised entity-event extraction

    Authors: Katherine A. Keith, Abram Handler, Michael Pinkham, Cara Magliozzi, Joshua McDuffie, Brendan O'Connor

    Abstract: We propose a new, socially-impactful task for natural language processing: from a news corpus, extract names of persons who have been killed by police. We present a newly collected police fatality corpus, which we release publicly, and present a model to solve this problem that uses EM-based distant supervision with logistic regression and convolutional neural network classifiers. Our model outper…

    Submitted 21 July, 2017; originally announced July 2017.

    ACM Class: I.2.7

    Journal ref: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing