Showing 1–11 of 11 results for author: Antoniak, M

  1. arXiv:2406.18906  [pdf, other]

    cs.CL

    Sonnet or Not, Bot? Poetry Evaluation for Large Models and Datasets

    Authors: Melanie Walsh, Anna Preus, Maria Antoniak

    Abstract: Large language models (LLMs) can now generate and recognize text in a wide range of styles and genres, including highly specialized, creative genres like poetry. But what do LLMs really know about poetry? What can they know about poetry? We develop a task to evaluate how well LLMs recognize a specific aspect of poetry, poetic form, for more than 20 forms and formal elements in the English language…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2404.06664  [pdf, other]

    cs.CL cs.AI cs.HC

    CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge

    Authors: Yu Ying Chiu, Liwei Jiang, Maria Antoniak, Chan Young Park, Shuyue Stella Li, Mehar Bhatia, Sahithya Ravi, Yulia Tsvetkov, Vered Shwartz, Yejin Choi

    Abstract: Frontier large language models (LLMs) are developed by researchers and practitioners with skewed cultural backgrounds and on datasets with skewed sources. However, LLMs' (lack of) multicultural knowledge cannot be effectively assessed with current methods for developing benchmarks. Existing multicultural evaluations primarily rely on expensive and restricted human annotations or potentially outdat…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Preprint (under review)

  3. arXiv:2401.07340  [pdf]

    cs.CL

    The Afterlives of Shakespeare and Company in Online Social Readership

    Authors: Maria Antoniak, David Mimno, Rosamond Thalken, Melanie Walsh, Matthew Wilkens, Gregory Yauney

    Abstract: The growth of social reading platforms such as Goodreads and LibraryThing enables us to analyze reading activity at very large scale and in remarkable detail. But twenty-first century systems give us a perspective only on contemporary readers. Meanwhile, the digitization of the lending library records of Shakespeare and Company provides a window into the reading activity of an earlier, smaller com…

    Submitted 14 January, 2024; originally announced January 2024.

  4. arXiv:2312.11803  [pdf, other]

    cs.CL

    NLP for Maternal Healthcare: Perspectives and Guiding Principles in the Age of LLMs

    Authors: Maria Antoniak, Aakanksha Naik, Carla S. Alvarado, Lucy Lu Wang, Irene Y. Chen

    Abstract: Ethical frameworks for the use of natural language processing (NLP) are urgently needed to shape how large language models (LLMs) and similar tools are used for healthcare applications. Healthcare faces existing challenges including the balance of power in clinician-patient relationships, systemic health disparities, historical injustices, and economic constraints. Drawing directly from the voices…

    Submitted 23 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  5. Riveter: Measuring Power and Social Dynamics Between Entities

    Authors: Maria Antoniak, Anjalie Field, Jimin Mun, Melanie Walsh, Lauren F. Klein, Maarten Sap

    Abstract: Riveter provides a complete easy-to-use pipeline for analyzing verb connotations associated with entities in text corpora. We prepopulate the package with connotation frames of sentiment, power, and agency, which have demonstrated usefulness for capturing social phenomena, such as gender bias, in a broad range of corpora. For decades, lexical frameworks have been foundational tools in computationa…

    Submitted 15 December, 2023; originally announced December 2023.

    Journal ref: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Volume 3: System Demonstrations, 2023, pages 377-388

  6. arXiv:2311.09675  [pdf, other]

    cs.CL

    Where Do People Tell Stories Online? Story Detection Across Online Communities

    Authors: Maria Antoniak, Joel Mire, Maarten Sap, Elliott Ash, Andrew Piper

    Abstract: Story detection in online communities is a challenging task as stories are scattered across communities and interwoven with non-storytelling spans within a single text. We address this challenge by building and releasing the StorySeeker toolkit, including a richly annotated dataset of 502 Reddit posts and comments, a detailed codebook adapted to the social media context, and models to predict stor…

    Submitted 26 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  7. arXiv:2311.09481  [pdf, other]

    cs.CL

    Personalized Jargon Identification for Enhanced Interdisciplinary Communication

    Authors: Yue Guo, Joseph Chee Chang, Maria Antoniak, Erin Bransom, Trevor Cohen, Lucy Lu Wang, Tal August

    Abstract: Scientific jargon can impede researchers when they read materials from other domains. Current methods of jargon identification mainly use corpus-level familiarity indicators (e.g., Simple Wikipedia represents plain language). However, researchers' familiarity with a term can vary greatly based on their own background. We collect a dataset of over 10K term familiarity annotations from 11 computer sci…

    Submitted 15 November, 2023; originally announced November 2023.

  8. arXiv:2301.09295  [pdf, other]

    cs.CL

    Sensemaking About Contraceptive Methods Across Online Platforms

    Authors: LeAnn McDowall, Maria Antoniak, David Mimno

    Abstract: Selecting a birth control method is a complex healthcare decision. While birth control methods provide important benefits, they can also cause unpredictable side effects and be stigmatized, leading many people to seek additional information online, where they can find reviews, advice, hypotheses, and experiences of other birth control users. However, the relationships between their healthcare conc…

    Submitted 23 January, 2023; originally announced January 2023.

  9. arXiv:2205.07557  [pdf, ps, other]

    cs.CL

    Heroes, Villains, and Victims, and GPT-3: Automated Extraction of Character Roles Without Training Data

    Authors: Dominik Stammbach, Maria Antoniak, Elliott Ash

    Abstract: This paper shows how to use large-scale pre-trained language models to extract character roles from narrative texts without training data. Queried with a zero-shot question-answering prompt, GPT-3 can identify the hero, villain, and victim in diverse domains: newspaper articles, movie plot summaries, and political speeches.

    Submitted 17 May, 2022; v1 submitted 16 May, 2022; originally announced May 2022.

  10. Tecnologica cosa: Modeling Storyteller Personalities in Boccaccio's Decameron

    Authors: A. Feder Cooper, Maria Antoniak, Christopher De Sa, Marilyn Migiel, David Mimno

    Abstract: We explore Boccaccio's Decameron to see how digital humanities tools can be used for tasks that have limited data in a language no longer in contemporary use: medieval Italian. We focus our analysis on the question: Do the different storytellers in the text exhibit distinct personalities? To answer this question, we curate and release a dataset based on the authoritative edition of the text. We us…

    Submitted 21 September, 2021; originally announced September 2021.

    Comments: The 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (co-located with EMNLP 2021)

  11. Unsupervised Segmentation of Hyperspectral Images Using 3D Convolutional Autoencoders

    Authors: Jakub Nalepa, Michal Myller, Yasuteru Imai, Ken-ichi Honda, Tomomi Takeda, Marek Antoniak

    Abstract: Hyperspectral image analysis has become an important topic widely researched by the remote sensing community. Classification and segmentation of such imagery help understand the underlying materials within a scanned scene, since hyperspectral images convey detailed information captured in a number of spectral bands. Although deep learning has established the state of the art in the field, it sti…

    Submitted 20 July, 2019; originally announced July 2019.

    Comments: Submitted to IEEE Geoscience and Remote Sensing Letters