Showing 1–21 of 21 results for author: Blodgett, S L

  1. arXiv:2406.08723  [pdf, other]

    cs.CL

    ECBD: Evidence-Centered Benchmark Design for NLP

    Authors: Yu Lu Liu, Su Lin Blodgett, Jackie Chi Kit Cheung, Q. Vera Liao, Alexandra Olteanu, Ziang Xiao

    Abstract: Benchmarking is seen as critical to assessing progress in NLP. However, creating a benchmark involves many design decisions (e.g., which datasets to include, which metrics to use) that often rely on tacit, untested assumptions about what the benchmark is intended to measure or is actually measuring. There is currently no principled way of analyzing these decisions and how they impact the validity…

    Submitted 12 June, 2024; originally announced June 2024.

  2. arXiv:2405.05860  [pdf, other]

    cs.LG cs.CL cs.CY

    The Perspectivist Paradigm Shift: Assumptions and Challenges of Capturing Human Labels

    Authors: Eve Fleisig, Su Lin Blodgett, Dan Klein, Zeerak Talat

    Abstract: Longstanding data labeling practices in machine learning involve collecting and aggregating labels from multiple annotators. But what should we do when annotators disagree? Though annotator disagreement has long been seen as a problem to minimize, new perspectivist approaches challenge this assumption by treating disagreement as a valuable source of information. In this position paper, we examine…

    Submitted 9 May, 2024; originally announced May 2024.

  3. arXiv:2402.04420  [pdf, other]

    cs.CY cs.AI

    Measuring machine learning harms from stereotypes: requires understanding who is being harmed by which errors in what ways

    Authors: Angelina Wang, Xuechunzi Bai, Solon Barocas, Su Lin Blodgett

    Abstract: As machine learning applications proliferate, we need an understanding of their potential for harm. However, current fairness metrics are rarely grounded in human psychological experiences of harm. Drawing on the social psychology of stereotypes, we use a case study of gender stereotypes in image search to examine how people react to machine learning errors. First, we use survey studies to show th…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: earlier draft non-archival at EAAMO 2023

  4. arXiv:2311.11103  [pdf, other]

    cs.CL

    Responsible AI Considerations in Text Summarization Research: A Review of Current Practices

    Authors: Yu Lu Liu, Meng Cao, Su Lin Blodgett, Jackie Chi Kit Cheung, Alexandra Olteanu, Adam Trischler

    Abstract: AI and NLP publication venues have increasingly encouraged researchers to reflect on possible ethical considerations, adverse impacts, and other responsible AI issues their work might engender. However, for specific NLP tasks our understanding of how prevalent such issues are, or when and why these issues are likely to arise, remains limited. Focusing on text summarization -- a common NLP task lar…

    Submitted 18 November, 2023; originally announced November 2023.

  5. arXiv:2310.15398  [pdf, other]

    cs.CL cs.HC

    "One-Size-Fits-All"? Examining Expectations around What Constitute "Fair" or "Good" NLG System Behaviors

    Authors: Li Lucy, Su Lin Blodgett, Milad Shokouhi, Hanna Wallach, Alexandra Olteanu

    Abstract: Fairness-related assumptions about what constitute appropriate NLG system behaviors range from invariance, where systems are expected to behave identically for social groups, to adaptation, where behaviors should instead vary across them. To illuminate tensions around invariance and adaptation, we conduct five case studies, in which we perturb different types of identity-related language features…

    Submitted 3 April, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: 36 pages, 24 figures, NAACL 2024

  6. arXiv:2306.05949  [pdf, other]

    cs.CY cs.AI

    Evaluating the Social Impact of Generative AI Systems in Systems and Society

    Authors: Irene Solaiman, Zeerak Talat, William Agnew, Lama Ahmad, Dylan Baker, Su Lin Blodgett, Canyu Chen, Hal Daumé III, Jesse Dodge, Isabella Duan, Ellie Evans, Felix Friedrich, Avijit Ghosh, Usman Gohar, Sara Hooker, Yacine Jernite, Ria Kalluri, Alberto Lusoli, Alina Leidinger, Michelle Lin, Xiuzhu Lin, Sasha Luccioni, Jennifer Mickel, Margaret Mitchell, Jessica Newman, et al. (6 additional authors not shown)

    Abstract: Generative AI systems across modalities, ranging from text (including code), image, audio, and video, have broad social impacts, but there is no official standard for means of evaluating those impacts or for which impacts should be evaluated. In this paper, we present a guide that moves toward a standard approach in evaluating a base generative AI system for any modality in two overarching categor…

    Submitted 28 June, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: Forthcoming in Hacker, Engel, Hammer, Mittelstadt (eds), Oxford Handbook on the Foundations and Regulation of Generative AI. Oxford University Press

  7. arXiv:2305.12757  [pdf, other]

    cs.CL

    This Prompt is Measuring <MASK>: Evaluating Bias Evaluation in Language Models

    Authors: Seraphina Goldfarb-Tarrant, Eddie Ungless, Esma Balkir, Su Lin Blodgett

    Abstract: Bias research in NLP seeks to analyse models for social biases, thus helping NLP practitioners uncover, measure, and mitigate social harms. We analyse the body of work that uses prompts and templates to assess bias in language models. We draw on a measurement modelling framework to create a taxonomy of attributes that capture what a bias test aims to measure and how that measurement is carried out…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL Findings 2023

  8. arXiv:2305.09022  [pdf, other]

    cs.CL

    It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance

    Authors: Arjun Subramonian, Xingdi Yuan, Hal Daumé III, Su Lin Blodgett

    Abstract: Progress in NLP is increasingly measured through benchmarks; hence, contextualizing progress requires understanding when and why practitioners may disagree about the validity of benchmarks. We develop a taxonomy of disagreement, drawing on tools from measurement modeling, and distinguish between two types of disagreement: 1) how tasks are conceptualized and 2) how measurements of model performance…

    Submitted 15 May, 2023; originally announced May 2023.

    Journal ref: Findings of the Association for Computational Linguistics: ACL 2023

  9. arXiv:2305.01776  [pdf, other]

    cs.CY

    Taxonomizing and Measuring Representational Harms: A Look at Image Tagging

    Authors: Jared Katzman, Angelina Wang, Morgan Scheuerman, Su Lin Blodgett, Kristen Laird, Hanna Wallach, Solon Barocas

    Abstract: In this paper, we examine computational approaches for measuring the "fairness" of image tagging systems, finding that they cluster into five distinct categories, each with its own analytic foundation. We also identify a range of normative concerns that are often collapsed under the terms "unfairness," "bias," or even "discrimination" when discussing problematic cases of image tagging. Specificall…

    Submitted 2 May, 2023; originally announced May 2023.

    Comments: AAAI-23 Special Track on AI for Social Impact

    Journal ref: Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023)

  10. arXiv:2301.05753  [pdf, ps, other]

    cs.CY cs.AI cs.LG

    Fairness and Sequential Decision Making: Limits, Lessons, and Opportunities

    Authors: Samer B. Nashed, Justin Svegliato, Su Lin Blodgett

    Abstract: As automated decision making and decision assistance systems become common in everyday life, research on the prevention or mitigation of potential harms that arise from decisions made by these systems has proliferated. However, various research communities have independently conceptualized these harms, envisioned potential applications, and proposed interventions. The result is a somewhat fracture…

    Submitted 13 January, 2023; originally announced January 2023.

    Comments: 10 pages

  11. arXiv:2212.14486  [pdf, other]

    cs.CL

    Examining Political Rhetoric with Epistemic Stance Detection

    Authors: Ankita Gupta, Su Lin Blodgett, Justin H Gross, Brendan O'Connor

    Abstract: Participants in political discourse employ rhetorical strategies -- such as hedging, attributions, or denials -- to display varying degrees of belief commitments to claims proposed by themselves or others. Traditionally, political scientists have studied these epistemic phenomena through labor-intensive manual content analysis. We propose to help automate such work through epistemic stance predict…

    Submitted 5 January, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

    Comments: Forthcoming in Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS) at EMNLP 2022

  12. arXiv:2205.06828  [pdf, other]

    cs.CL cs.AI

    Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications

    Authors: Kaitlyn Zhou, Su Lin Blodgett, Adam Trischler, Hal Daumé III, Kaheer Suleman, Alexandra Olteanu

    Abstract: There are many ways to express similar things in text, which makes evaluating natural language generation (NLG) systems difficult. Compounding this difficulty is the need to assess varying quality criteria depending on the deployment setting. While the landscape of NLG evaluation has been well-mapped, practitioners' goals, assumptions, and constraints -- which inform decisions about what, when, an…

    Submitted 13 May, 2022; originally announced May 2022.

    Comments: Camera Ready for NAACL 2022 (Main Conference)

  13. arXiv:2110.10024  [pdf, other]

    cs.CY cs.AI

    Risks of AI Foundation Models in Education

    Authors: Su Lin Blodgett, Michael Madaio

    Abstract: If the authors of a recent Stanford report (Bommasani et al., 2021) on the opportunities and risks of "foundation models" are to be believed, these models represent a paradigm shift for AI and for the domains in which they will supposedly be used, including education. Although the name is new (and contested (Field, 2021)), the term describes existing types of algorithmic models that are "trained o…

    Submitted 19 October, 2021; originally announced October 2021.

  14. arXiv:2106.11410  [pdf, other]

    cs.CL

    A Survey of Race, Racism, and Anti-Racism in NLP

    Authors: Anjalie Field, Su Lin Blodgett, Zeerak Waseem, Yulia Tsvetkov

    Abstract: Despite inextricable ties between race and language, little work has considered race in NLP research and development. In this work, we survey 79 papers from the ACL anthology that mention race. These papers reveal various types of race-related bias in all stages of NLP model development, highlighting the need for proactive consideration of how NLP systems can uphold racial hierarchies. However, pe…

    Submitted 15 July, 2021; v1 submitted 21 June, 2021; originally announced June 2021.

    Comments: Accepted to ACL 2021

  15. arXiv:2105.08847  [pdf, ps, other]

    cs.CY cs.AI cs.HC

    Beyond "Fairness:" Structural (In)justice Lenses on AI for Education

    Authors: Michael Madaio, Su Lin Blodgett, Elijah Mayfield, Ezekiel Dixon-Román

    Abstract: Educational technologies, and the systems of schooling in which they are deployed, enact particular ideologies about what is important to know and how learners should learn. As artificial intelligence technologies -- in education and beyond -- may contribute to inequitable outcomes for marginalized communities, various approaches have been developed to evaluate and mitigate the harmful impacts of…

    Submitted 1 November, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

    Comments: To be published in: The Ethics of Artificial Intelligence in Education: Current Challenges, Practices and Debates, W. Holmes and K. Porayska-Pomsta (Eds.), Routledge. This revision incorporates reviewer feedback and updates the title to reflect the current book chapter title

    ACM Class: K.3; K.4; I.2

  16. arXiv:2104.03026  [pdf, ps, other]

    cs.CL

    How to Write a Bias Statement: Recommendations for Submissions to the Workshop on Gender Bias in NLP

    Authors: Christian Hardmeier, Marta R. Costa-jussà, Kellie Webster, Will Radford, Su Lin Blodgett

    Abstract: At the Workshop on Gender Bias in NLP (GeBNLP), we'd like to encourage authors to give explicit consideration to the wider aspects of bias and its social implications. For the 2020 edition of the workshop, we therefore requested that all authors include an explicit bias statement in their work to clarify how their work relates to the social context in which NLP systems are used. The programme co…

    Submitted 7 April, 2021; originally announced April 2021.

    Comments: This document was originally published as a blog post on the web site of GeBNLP 2020

  17. arXiv:2005.14050  [pdf, other]

    cs.CL cs.CY

    Language (Technology) is Power: A Critical Survey of "Bias" in NLP

    Authors: Su Lin Blodgett, Solon Barocas, Hal Daumé III, Hanna Wallach

    Abstract: We survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are often vague, inconsistent, and lacking in normative reasoning, despite the fact that analyzing "bias" is an inherently normative process. We further find that these papers' proposed quantitative techniques for measuring or mitigating "bias" are poorly matched to their motivations and do not engage with the rel…

    Submitted 29 May, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

  18. arXiv:1804.06004  [pdf, other]

    cs.CL

    Monte Carlo Syntax Marginals for Exploring and Using Dependency Parses

    Authors: Katherine A. Keith, Su Lin Blodgett, Brendan O'Connor

    Abstract: Dependency parsing research, which has made significant gains in recent years, typically focuses on improving the accuracy of single-tree predictions. However, ambiguity is inherent to natural language syntax, and communicating such ambiguity is important for error analysis and better-informed downstream applications. In this work, we propose a transition sampling algorithm to sample from the full…

    Submitted 16 April, 2018; originally announced April 2018.

    Comments: To appear in Proceedings of NAACL 2018

  19. arXiv:1707.00061  [pdf, other]

    cs.CY cs.CL

    Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English

    Authors: Su Lin Blodgett, Brendan O'Connor

    Abstract: We highlight an important frontier in algorithmic fairness: disparity in the quality of natural language processing algorithms when applied to language from authors of different social groups. For example, current systems sometimes analyze the language of females and minorities more poorly than they do of whites and males. We conduct an empirical analysis of racial disparity in language identifica…

    Submitted 30 June, 2017; originally announced July 2017.

    Comments: Presented as a talk at the 2017 Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2017)

  20. arXiv:1608.08868  [pdf, other]

    cs.CL

    Demographic Dialectal Variation in Social Media: A Case Study of African-American English

    Authors: Su Lin Blodgett, Lisa Green, Brendan O'Connor

    Abstract: Though dialectal language is increasingly abundant on social media, few resources exist for developing NLP tools to handle such language. We conduct a case study of dialectal language in online conversational text by investigating African-American English (AAE) on Twitter. We propose a distantly supervised model to identify AAE-like language from demographics associated with geo-located messages,…

    Submitted 31 August, 2016; originally announced August 2016.

    Comments: To be published in EMNLP 2016, 15 pages

  21. arXiv:1606.06352  [pdf, other]

    stat.ML cs.CL cs.LG

    Visualizing textual models with in-text and word-as-pixel highlighting

    Authors: Abram Handler, Su Lin Blodgett, Brendan O'Connor

    Abstract: We explore two techniques which use color to make sense of statistical text models. One method uses in-text annotations to illustrate a model's view of particular tokens in particular documents. Another uses a high-level, "words-as-pixels" graphic to display an entire corpus. Together, these methods offer both zoomed-in and zoomed-out perspectives into a model's understanding of text. We show how…

    Submitted 20 June, 2016; originally announced June 2016.

    Comments: Presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY