Showing 1–47 of 47 results for author: Hovy, D

  1. arXiv:2405.09482  [pdf, other]

    cs.CL

    Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts

    Authors: Donya Rooein, Paul Röttger, Anastassia Shaitarova, Dirk Hovy

    Abstract: Using large language models (LLMs) for educational applications like dialogue-based teaching is a hot topic. Effective teaching, however, requires teachers to adapt the difficulty of content and explanations to the education level of their students. Even the best LLMs today struggle to do this well. If we want to improve LLMs on this adaptation task, we need to be able to measure adaptation succes…

    Submitted 6 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

  2. arXiv:2405.06563  [pdf, other]

    cs.CL

    What Can Natural Language Processing Do for Peer Review?

    Authors: Ilia Kuznetsov, Osama Mohammed Afzal, Koen Dercksen, Nils Dycke, Alexander Goldberg, Tom Hope, Dirk Hovy, Jonathan K. Kummerfeld, Anne Lauscher, Kevin Leyton-Brown, Sheng Lu, Mausam, Margot Mieskes, Aurélie Névéol, Danish Pruthi, Lizhen Qu, Roy Schwartz, Noah A. Smith, Thamar Solorio, Jingyan Wang, Xiaodan Zhu, Anna Rogers, Nihar B. Shah, Iryna Gurevych

    Abstract: The number of scientific articles produced every year is growing rapidly. Providing quality control over them is crucial for scientists and, ultimately, for the public good. In modern science, this process is largely delegated to peer review -- a distributed procedure in which each submission is evaluated by several independent experts in the field. Peer review is widely used, yet it is hard, time…

    Submitted 10 May, 2024; originally announced May 2024.

  3. arXiv:2405.02411  [pdf, other]

    cs.CL

    The Call for Socially Aware Language Technologies

    Authors: Diyi Yang, Dirk Hovy, David Jurgens, Barbara Plank

    Abstract: Language technologies have made enormous progress, especially with the introduction of large language models (LLMs). On traditional tasks such as machine translation and sentiment analysis, these models perform at near-human level. These advances can, however, exacerbate a variety of issues that models have traditionally struggled with, such as bias, evaluation, and risks. In this position paper,…

    Submitted 3 May, 2024; originally announced May 2024.

  4. arXiv:2404.10475  [pdf, other]

    cs.CL

    Conversations as a Source for Teaching Scientific Concepts at Different Education Levels

    Authors: Donya Rooein, Dirk Hovy

    Abstract: Open conversations are one of the most engaging forms of teaching. However, creating those conversations in educational software is a complex endeavor, especially if we want to address the needs of different audiences. While language models hold great promise for educational applications, there are substantial challenges in training them to engage in meaningful and effective conversational teachin…

    Submitted 16 April, 2024; originally announced April 2024.

  5. arXiv:2404.05399  [pdf, other]

    cs.CL cs.AI

    SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

    Authors: Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy

    Abstract: The last two years have seen a rapid growth in concerns around the safety of large language models (LLMs). Researchers and practitioners have met these concerns by introducing an abundance of new datasets for evaluating and improving LLM safety. However, much of this work has happened in parallel, and with very different goals in mind, ranging from the mitigation of near-term risks around bias and…

    Submitted 8 April, 2024; originally announced April 2024.

  6. arXiv:2403.05700  [pdf, other]

    cs.CL

    DADIT: A Dataset for Demographic Classification of Italian Twitter Users and a Comparison of Prediction Methods

    Authors: Lorenzo Lupo, Paul Bose, Mahyar Habibi, Dirk Hovy, Carlo Schwarz

    Abstract: Social scientists increasingly use demographically stratified social media data to study the attitudes, beliefs, and behavior of the general public. To facilitate such analyses, we construct, validate, and release publicly the representative DADIT dataset of 30M tweets of 20k Italian Twitter users, along with their bios and profile pictures. We enrich the user data with high-quality labels for gen…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024

  7. arXiv:2403.04445  [pdf, other]

    cs.CL

    Classist Tools: Social Class Correlates with Performance in NLP

    Authors: Amanda Cercas Curry, Giuseppe Attanasio, Zeerak Talat, Dirk Hovy

    Abstract: Since the foundational work of William Labov on the social stratification of language (Labov, 1964), linguistics has made concentrated efforts to explore the links between sociodemographic characteristics and language production and perception. But while there is strong evidence for socio-demographic characteristics in language, they are infrequently used in Natural Language Processing (NLP). Age…

    Submitted 7 March, 2024; originally announced March 2024.

  8. arXiv:2403.03874  [pdf, ps, other]

    cs.CL cs.AI cs.CY

    Impoverished Language Technology: The Lack of (Social) Class in NLP

    Authors: Amanda Cercas Curry, Zeerak Talat, Dirk Hovy

    Abstract: Since Labov's (1964) foundational work on the social stratification of language, linguistics has dedicated concerted efforts towards understanding the relationships between socio-demographic factors and language production and perception. Despite the large body of evidence identifying significant relationships between socio-demographic factors and language production, relatively few of these facto…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024

  9. arXiv:2403.03121  [pdf, other]

    cs.CL

    Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution

    Authors: Flor Miriam Plaza-del-Arco, Amanda Cercas Curry, Alba Curry, Gavin Abercrombie, Dirk Hovy

    Abstract: Large language models (LLMs) reflect societal norms and biases, especially about gender. While societal biases and stereotypes have been extensively researched in various NLP applications, there is a surprising gap for emotion analysis. However, emotion and gender are closely linked in societal discourse. E.g., women are often thought of as more empathetic, while men's anger is more socially accep…

    Submitted 28 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted to ACL 2024

  10. arXiv:2403.01222  [pdf, other]

    cs.CL

    Emotion Analysis in NLP: Trends, Gaps and Roadmap for Future Directions

    Authors: Flor Miriam Plaza-del-Arco, Alba Curry, Amanda Cercas Curry, Dirk Hovy

    Abstract: Emotions are a central aspect of communication. Consequently, emotion analysis (EA) is a rapidly growing field in natural language processing (NLP). However, there is no consensus on scope, direction, or methods. In this paper, we conduct a thorough review of 154 relevant NLP publications from the last decade. Based on this review, we address four different questions: (1) How are EA tasks defined…

    Submitted 18 March, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024

  11. arXiv:2402.17954  [pdf, other]

    cs.CL

    Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps

    Authors: Giuseppe Attanasio, Beatrice Savoldi, Dennis Fucci, Dirk Hovy

    Abstract: Current automatic speech recognition (ASR) models are designed to be used across many languages and tasks without substantial changes. However, this broad language coverage hides performance gaps within languages, for example, across genders. Our study systematically evaluates the performance of two widely used multilingual ASR models on three datasets, encompassing 19 languages from eight languag…

    Submitted 19 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: 23 pages. Code and artifacts at https://github.com/g8a9/multilingual-asr-gender-gap

  12. arXiv:2402.16786  [pdf, other]

    cs.CL cs.AI

    Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models

    Authors: Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, Dirk Hovy

    Abstract: Much recent work seeks to evaluate values and opinions in large language models (LLMs) using multiple-choice surveys and questionnaires. Most of this work is motivated by concerns around real-world LLM applications. For example, politically-biased LLMs may subtly influence society when they are used by millions of people. Such real-world concerns, however, stand in stark contrast to the artificial…

    Submitted 5 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024 (Main Conference)

  13. arXiv:2402.14499  [pdf, other]

    cs.CL

    "My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models

    Authors: Xinpeng Wang, Bolei Ma, Chengzhi Hu, Leon Weber-Genzel, Paul Röttger, Frauke Kreuter, Dirk Hovy, Barbara Plank

    Abstract: The open-ended nature of language generation makes the evaluation of autoregressive large language models (LLMs) challenging. One common evaluation approach uses multiple-choice questions (MCQ) to limit the response space. The model is then evaluated by ranking the candidate answers by the log probability of the first token prediction. However, first-tokens may not consistently reflect the final r…

    Submitted 22 February, 2024; originally announced February 2024.

  14. arXiv:2401.12492  [pdf, other]

    cs.CL cs.AI cs.LG

    Comparing Pre-trained Human Language Models: Is it Better with Human Context as Groups, Individual Traits, or Both?

    Authors: Nikita Soni, Niranjan Balasubramanian, H. Andrew Schwartz, Dirk Hovy

    Abstract: Incorporating human context into language models is the next frontier for human-centered natural language processing. Currently, two pre-training methods exist: group-wise attributes (e.g., over-45-year-olds) or individual traits. Group attributes are coarse -- not all 45-year-olds write the same way -- while modeling individual traits allows for a more personalized representation, but requires mo…

    Submitted 26 March, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  15. arXiv:2312.02065  [pdf, other]

    cs.CL cs.AI

    Know Your Audience: Do LLMs Adapt to Different Age and Education Levels?

    Authors: Donya Rooein, Amanda Cercas Curry, Dirk Hovy

    Abstract: Large language models (LLMs) offer a range of new possibilities, including adapting the text to different audiences and their reading needs. But how well do they adapt? We evaluate the readability of answers generated by four state-of-the-art LLMs (commercial and open-source) to science questions when prompted to target different age groups and education levels. To assess the adaptability of LLMs…

    Submitted 4 December, 2023; originally announced December 2023.

  16. arXiv:2311.11844  [pdf, other]

    cs.CL cs.CY

    How to Use Large Language Models for Text Coding: The Case of Fatherhood Roles in Public Policy Documents

    Authors: Lorenzo Lupo, Oscar Magnusson, Dirk Hovy, Elin Naurin, Lena Wängnerud

    Abstract: Recent advances in large language models (LLMs) like GPT-3 and GPT-4 have opened up new opportunities for text analysis in political science. They promise automation with better results and less programming. In this study, we evaluate LLMs on three original coding tasks of non-English political science texts, and we provide a detailed description of a general workflow for using LLMs for text codin…

    Submitted 15 December, 2023; v1 submitted 20 November, 2023; originally announced November 2023.

    ACM Class: J.4; I.2

  17. arXiv:2309.07733  [pdf, other]

    cs.CL cs.SD eess.AS

    Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features

    Authors: Eliana Pastor, Alkis Koudounas, Giuseppe Attanasio, Dirk Hovy, Elena Baralis

    Abstract: Recent advances in eXplainable AI (XAI) have provided new insights into how models for vision, language, and tabular data operate. However, few approaches exist for understanding speech models. Existing work focuses on a few spoken language understanding (SLU) tasks, and explanations are difficult to interpret for most users. We introduce a new approach to explain speech classification models. We…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: 8 pages

  18. arXiv:2308.01263  [pdf, other]

    cs.CL cs.AI

    XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

    Authors: Paul Röttger, Hannah Rose Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, Dirk Hovy

    Abstract: Without proper safeguards, large language models will readily follow malicious instructions and generate toxic content. This risk motivates safety efforts such as red-teaming and large-scale feedback learning, which aim to make models both helpful and harmless. However, there is a tension between these two objectives, since harmlessness requires models to refuse to comply with unsafe prompts, and…

    Submitted 1 April, 2024; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: Accepted at NAACL 2024 (Main Conference)

  19. arXiv:2307.12973  [pdf, other]

    cs.CL

    Wisdom of Instruction-Tuned Language Model Crowds. Exploring Model Label Variation

    Authors: Flor Miriam Plaza-del-Arco, Debora Nozza, Dirk Hovy

    Abstract: Large Language Models (LLMs) exhibit remarkable text classification capabilities, excelling in zero- and few-shot learning (ZSL and FSL) scenarios. However, since they are trained on different datasets, performance varies widely across tasks between those models. Recent studies emphasize the importance of considering human label variation in data annotation. However, how this human label variation…

    Submitted 15 April, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: Accepted to the 3rd Workshop on Perspectivist Approaches to NLP at LREC-COLING 2024

  20. arXiv:2306.11559  [pdf, other]

    cs.CL

    The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics

    Authors: Matthias Orlikowski, Paul Röttger, Philipp Cimiano, Dirk Hovy

    Abstract: Many NLP tasks exhibit human label variation, where different annotators give different labels to the same texts. This variation is known to depend, at least in part, on the sociodemographics of annotators. Recent research aims to model individual annotator behaviour rather than predicting aggregated labels, and we would expect that sociodemographic information is useful for these models. On the o…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: ACL2023 Camera-Ready

  21. arXiv:2305.16051  [pdf, other]

    cs.CL cs.AI cs.CY

    What about em? How Commercial Machine Translation Fails to Handle (Neo-)Pronouns

    Authors: Anne Lauscher, Debora Nozza, Archie Crowley, Ehm Miltersen, Dirk Hovy

    Abstract: As 3rd-person pronoun usage shifts to include novel forms, e.g., neopronouns, we need more research on identity-inclusive NLP. Exclusion is particularly harmful in one of the most popular NLP applications, machine translation (MT). Wrong pronoun translations can discriminate against marginalized groups, e.g., non-binary individuals (Dev et al., 2021). In this "reality check", we study how three…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL

  22. arXiv:2305.01633  [pdf, other]

    cs.CL

    Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

    Authors: Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, Jose M. Alonso-Moral, Mohammad Arvan, Anouck Braggaar, Mark Cieliebak, Elizabeth Clark, Kees van Deemter, Tanvi Dinkar, Ondřej Dušek, Steffen Eger, Qixiang Fang, Mingqi Gao, Albert Gatt, Dimitra Gkatzia, Javier González-Corbelle, Dirk Hovy, Manuela Hürlimann, Takumi Ito, John D. Kelleher, Filip Klubicka, Emiel Krahmer, Huiyuan Lai, et al. (17 additional authors not shown)

    Abstract: We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, a…

    Submitted 7 August, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: 5 pages plus appendix, 4 tables, 1 figure. To appear at "Workshop on Insights from Negative Results in NLP" (co-located with EACL2023). Updated author list and acknowledgements

    MSC Class: 68 ACM Class: I.2.7

  23. arXiv:2304.02983  [pdf, other]

    cs.CL cs.SI

    Leveraging Social Interactions to Detect Misinformation on Social Media

    Authors: Tommaso Fornaciari, Luca Luceri, Emilio Ferrara, Dirk Hovy

    Abstract: Detecting misinformation threads is crucial to guarantee a healthy environment on social media. We address the problem using the data set created during the COVID-19 pandemic. It contains cascades of tweets discussing information weakly labeled as reliable or unreliable, based on a previous evaluation of the information source. The models identifying unreliable threads usually rely on textual feat…

    Submitted 6 April, 2023; originally announced April 2023.

  24. arXiv:2301.10684  [pdf, other]

    cs.CL

    Consistency is Key: Disentangling Label Variation in Natural Language Processing with Intra-Annotator Agreement

    Authors: Gavin Abercrombie, Verena Rieser, Dirk Hovy

    Abstract: We commonly use agreement measures to assess the utility of judgements made by human annotators in Natural Language Processing (NLP) tasks. While inter-annotator agreement is frequently used as an indication of label reliability by measuring consistency between annotators, we argue for the additional use of intra-annotator agreement to measure label stability over time. However, in a systematic re…

    Submitted 25 January, 2023; originally announced January 2023.

  25. arXiv:2212.09056  [pdf, other]

    cs.CL

    Beyond Digital "Echo Chambers": The Role of Viewpoint Diversity in Political Discussion

    Authors: Rishav Hada, Amir Ebrahimi Fard, Sarah Shugars, Federico Bianchi, Patricia Rossini, Dirk Hovy, Rebekah Tromble, Nava Tintarev

    Abstract: Increasingly taking place in online spaces, modern political conversations are typically perceived to be unproductively affirming -- siloed in so-called "echo chambers" of exclusively like-minded discussants. Yet, to date we lack sufficient means to measure viewpoint diversity in conversations. To this end, in this paper, we operationalize two viewpoint metrics proposed for recommender systems a…

    Submitted 18 December, 2022; originally announced December 2022.

    Comments: Camera-ready version in WSDM 2023

  26. arXiv:2211.04281  [pdf, other]

    cs.CL

    SocioProbe: What, When, and Where Language Models Learn about Sociodemographics

    Authors: Anne Lauscher, Federico Bianchi, Samuel Bowman, Dirk Hovy

    Abstract: Pre-trained language models (PLMs) have outperformed other NLP models on a wide range of tasks. Opting for a more thorough understanding of their capabilities and inner workings, researchers have established the extent to which they capture lower-level knowledge like grammaticality, and mid-level semantic knowledge like factual understanding. However, there is still little understanding of their k…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: Accepted for publication at EMNLP 2022

  27. arXiv:2211.04256  [pdf, other]

    cs.CL

    Bridging Fairness and Environmental Sustainability in Natural Language Processing

    Authors: Marius Hessenthaler, Emma Strubell, Dirk Hovy, Anne Lauscher

    Abstract: Fairness and environmental impact are important research directions for the sustainable development of artificial intelligence. However, while each topic is an active research area in natural language processing (NLP), there is a surprising lack of research on the interplay between the two fields. This lacuna is highly problematic, since there is increasing evidence that an exclusive focus on fair…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: Accepted for publication at EMNLP 2022

  28. arXiv:2210.15870  [pdf, other]

    cs.CL

    "It's Not Just Hate'': A Multi-Dimensional Perspective on Detecting Harmful Speech Online

    Authors: Federico Bianchi, Stefanie Anja Hills, Patricia Rossini, Dirk Hovy, Rebekah Tromble, Nava Tintarev

    Abstract: Well-annotated data is a prerequisite for good Natural Language Processing models. Too often, though, annotation decisions are governed by optimizing time or annotator agreement. We make a case for nuanced efforts in an interdisciplinary setting for annotating offensive online speech. Detecting offensive content is rapidly becoming one of the most important real-world NLP tasks. However, most data…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  29. arXiv:2210.14763  [pdf, other]

    cs.CL

    ProSiT! Latent Variable Discovery with PROgressive SImilarity Thresholds

    Authors: Tommaso Fornaciari, Dirk Hovy, Federico Bianchi

    Abstract: The most common ways to explore latent document dimensions are topic models and clustering methods. However, topic models have several drawbacks: e.g., they require us to choose the number of latent dimensions a priori, and the results are stochastic. Most clustering methods have the same issues and lack flexibility in various ways, such as not accounting for the influence of different topics on s…

    Submitted 26 October, 2022; originally announced October 2022.

  30. arXiv:2210.11359  [pdf, other]

    cs.CL

    Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages

    Authors: Paul Röttger, Debora Nozza, Federico Bianchi, Dirk Hovy

    Abstract: Hate speech is a global phenomenon, but most hate speech datasets so far focus on English-language content. This hinders the development of more effective hate speech detection models in hundreds of languages spoken by billions across the world. More data is needed, but annotating hateful content is expensive, time-consuming and potentially harmful to annotators. To mitigate these issues, we explo…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: Accepted at EMNLP 2022 (Main Conference)

  31. arXiv:2210.07595  [pdf, other]

    cs.CL

    The State of Profanity Obfuscation in Natural Language Processing

    Authors: Debora Nozza, Dirk Hovy

    Abstract: Work on hate speech has made the consideration of rude and harmful examples in scientific publications inevitable. This raises various problems, such as whether or not to obscure profanities. While science must accurately disclose what it does, the unwarranted spread of hate speech is harmful to readers, and increases its internet frequency. While maintaining publications' professional appearance,…

    Submitted 14 October, 2022; originally announced October 2022.

  32. arXiv:2210.07365  [pdf, other]

    cs.CL

    Is It Worth the (Environmental) Cost? Limited Evidence for Temporal Adaptation via Continuous Training

    Authors: Giuseppe Attanasio, Debora Nozza, Federico Bianchi, Dirk Hovy

    Abstract: Language is constantly changing and evolving, leaving language models to become quickly outdated. Consequently, we should continuously update our models with new data to expose them to new events and facts. However, that requires additional computing, which means new carbon emissions. Do any measurable benefits justify this cost? This paper looks for empirical evidence to support continuous traini…

    Submitted 4 May, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: 8 pages

  33. arXiv:2210.07362  [pdf, other]

    cs.CL

    Can Demographic Factors Improve Text Classification? Revisiting Demographic Adaptation in the Age of Transformers

    Authors: Chia-Chien Hung, Anne Lauscher, Dirk Hovy, Simone Paolo Ponzetto, Goran Glavaš

    Abstract: Demographic factors (e.g., gender or age) shape our language. Previous work showed that incorporating demographic factors can consistently improve performance for various NLP tasks with traditional NLP models. In this work, we investigate whether these previous findings still hold with state-of-the-art pretrained Transformer-based language models (PLMs). We use three common specialization methods…

    Submitted 9 May, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Findings of EACL 2023. arXiv admin note: text overlap with arXiv:2208.01029

  34. arXiv:2208.01029  [pdf, other]

    cs.CL

    On the Limitations of Sociodemographic Adaptation with Transformers

    Authors: Chia-Chien Hung, Anne Lauscher, Dirk Hovy, Simone Paolo Ponzetto, Goran Glavaš

    Abstract: Sociodemographic factors (e.g., gender or age) shape our language. Previous work showed that incorporating specific sociodemographic factors can consistently improve performance for various NLP tasks in traditional NLP models. We investigate whether these previous findings still hold with state-of-the-art pretrained Transformers. We use three common specialization methods proven effective for inco…

    Submitted 1 August, 2022; originally announced August 2022.

  35. arXiv:2203.09192  [pdf, other]

    cs.CL

    Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists

    Authors: Giuseppe Attanasio, Debora Nozza, Dirk Hovy, Elena Baralis

    Abstract: Natural Language Processing (NLP) models risk overfitting to specific terms in the training data, thereby reducing their performance, fairness, and generalizability. E.g., neural hate speech detection models are strongly influenced by identity terms like gay, or women, resulting in false positives, severe unintended bias, and lower performance. Most mitigation techniques use lists of identity term…

    Submitted 17 March, 2022; originally announced March 2022.

    Comments: Accepted to Findings of ACL 2022

  36. arXiv:2202.11923  [pdf, other]

    cs.CL

    Welcome to the Modern World of Pronouns: Identity-Inclusive Natural Language Processing beyond Gender

    Authors: Anne Lauscher, Archie Crowley, Dirk Hovy

    Abstract: The world of pronouns is changing. From a closed class of words with few members to a much more open set of terms to reflect identities. However, Natural Language Processing (NLP) is barely reflecting this linguistic shift, even though recent work outlined the harms of gender-exclusive language technology. Particularly problematic is the current modeling of 3rd person pronouns, as it largely ignores…

    Submitted 24 February, 2022; originally announced February 2022.

  37. arXiv:2201.10986  [pdf, other]

    cs.CL

    Twitter-Demographer: A Flow-based Tool to Enrich Twitter Data

    Authors: Federico Bianchi, Vincenzo Cutrona, Dirk Hovy

    Abstract: Twitter data have become essential to Natural Language Processing (NLP) and social science research, driving various scientific discoveries in recent years. However, the textual data alone are often not enough to conduct studies: especially social scientists need more variables to perform their analysis and control for various factors. How we augment this information, such as users' location, age,…

    Submitted 26 January, 2022; originally announced January 2022.

  38. arXiv:2201.07670  [pdf, other]

    cs.CL cs.LG

    Top-Down Influence? Predicting CEO Personality and Risk Impact from Speech Transcripts

    Authors: Kilian Theil, Dirk Hovy, Heiner Stuckenschmidt

    Abstract: How much does a CEO's personality impact the performance of their company? Management theory posits a great influence, but it is difficult to show empirically -- there is a lack of publicly available self-reported personality data of top managers. Instead, we propose a text-based personality regressor using crowd-sourced Myers-Briggs Type Indicator (MBTI) assessments. The ratings have a high inte…

    Submitted 19 January, 2022; originally announced January 2022.

  39. arXiv:2112.07475  [pdf, other]

    cs.CL

    Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks

    Authors: Paul Röttger, Bertie Vidgen, Dirk Hovy, Janet B. Pierrehumbert

    Abstract: Labelled data is the foundation of most natural language processing tasks. However, labelling data is difficult and there often are diverse valid beliefs about what the correct data labels should be. So far, dataset creators have acknowledged annotator subjectivity, but rarely actively managed it in the annotation process. This has led to partly-subjective datasets that fail to serve a clear downs…

    Submitted 29 April, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: Accepted at NAACL 2022 (Main Conference)

  40. arXiv:2109.13037  [pdf, other]

    cs.CL

    Language Invariant Properties in Natural Language Processing

    Authors: Federico Bianchi, Debora Nozza, Dirk Hovy

    Abstract: Meaning is context-dependent, but many properties of language (should) remain the same even if we transform the context. For example, sentiment, entailment, or speaker properties should be the same in a translation and original of a text. We introduce language invariant properties: i.e., properties that should not change when we transform text, and how they can be used to quantitatively evaluate t…

    Submitted 1 October, 2021; v1 submitted 27 September, 2021; originally announced September 2021.

  41. arXiv:2107.03451  [pdf, other]

    cs.CL cs.AI

    Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling

    Authors: Emily Dinan, Gavin Abercrombie, A. Stevie Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau, Verena Rieser

    Abstract: Over the last several years, end-to-end neural conversational agents have vastly improved in their ability to carry a chit-chat conversation with humans. However, these models are often trained on large datasets from the internet, and as a result, may learn undesirable behaviors from this data, such as toxic or otherwise harmful language. Researchers must thus wrestle with the issue of how and whe…

    Submitted 23 July, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

  42. arXiv:2004.07737  [pdf, other]

    cs.CL

    Cross-lingual Contextualized Topic Models with Zero-shot Learning

    Authors: Federico Bianchi, Silvia Terragni, Dirk Hovy, Debora Nozza, Elisabetta Fersini

    Abstract: Many data sets (e.g., reviews, forums, news, etc.) exist in parallel in multiple languages. They all cover the same content, but the linguistic differences make it impossible to use traditional, bag-of-word-based topic models. Models have to be either single-language or suffer from a huge, but extremely sparse vocabulary. Both issues can be addressed by transfer learning. In this paper, we introduc…

    Submitted 4 February, 2021; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: Updated version. Published as a conference paper at EACL2021

  43. arXiv:2004.03974  [pdf, other]

    cs.CL

    Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence

    Authors: Federico Bianchi, Silvia Terragni, Dirk Hovy

    Abstract: Topic models extract groups of words from documents, whose interpretation as a topic hopefully allows for a better understanding of the data. However, the resulting word groups are often not coherent, making them harder to interpret. Recently, neural topic models have shown improvements in overall coherence. Concurrently, contextual embeddings have advanced the state of the art of neural models in…

    Submitted 17 June, 2021; v1 submitted 8 April, 2020; originally announced April 2020.

    Comments: Updated version. Published as a conference paper at ACL-IJCNLP 2021

  44. arXiv:2003.02912  [pdf, other]

    cs.CL

    What the [MASK]? Making Sense of Language-Specific BERT Models

    Authors: Debora Nozza, Federico Bianchi, Dirk Hovy

    Abstract: Recently, Natural Language Processing (NLP) has witnessed impressive progress in many areas, due to the advent of novel, pretrained contextual representation models. In particular, Devlin et al. (2019) proposed a model, called BERT (Bidirectional Encoder Representations from Transformers), which enables researchers to obtain state-of-the-art performance on numerous NLP tasks by fine-tuning the…

    Submitted 5 March, 2020; originally announced March 2020.

  45. Predictive Biases in Natural Language Processing Models: A Conceptual Framework and Overview

    Authors: Deven Shah, H. Andrew Schwartz, Dirk Hovy

    Abstract: An increasing number of works in natural language processing have addressed the effect of bias on the predicted outcomes, introducing mitigation techniques that act on different parts of the standard NLP pipeline (data and models). However, these works have been conducted in isolation, without a unifying framework to organize efforts within the field. This leads to repetitive approaches, and puts…

    Submitted 12 September, 2020; v1 submitted 9 November, 2019; originally announced December 2019.

    Comments: 9 pages excluding references, 1 figure, 3 pages for appendix

    Journal ref: Association for Computational Linguistics. (2020) 5248--5264

  46. arXiv:1712.03538  [pdf, other]

    cs.CL

    Multi-Task Learning for Mental Health using Social Media Text

    Authors: Adrian Benton, Margaret Mitchell, Dirk Hovy

    Abstract: We introduce initial groundwork for estimating suicide risk and mental health in a deep learning framework. By modeling multiple conditions, the system learns to make predictions about suicide risk and mental health at a low false positive rate. Conditions are modeled as tasks in a multi-task learning (MTL) framework, with gender prediction as an additional auxiliary task. We demonstrate the effec…

    Submitted 10 December, 2017; originally announced December 2017.

    ACM Class: I.2.7

    Journal ref: Proceedings of the 15th Conference of the EACL (2017) 152-162

  47. arXiv:1707.04913  [pdf, other]

    cs.CL

    End-to-End Information Extraction without Token-Level Supervision

    Authors: Rasmus Berg Palm, Dirk Hovy, Florian Laws, Ole Winther

    Abstract: Most state-of-the-art information extraction approaches rely on token-level labels to find the areas of interest in text. Unfortunately, these labels are time-consuming and costly to create, and consequently, not available for many real-life IE tasks. To make matters worse, token-level labels are usually not the desired output, but just an intermediary step. End-to-end (E2E) models, which take raw…

    Submitted 16 July, 2017; originally announced July 2017.

    Comments: http://speechnlp.github.io/2017 @ EMNLP 2017