Showing 1–50 of 110 results for author: Mihalcea, R

  1. arXiv:2407.02623  [pdf, other]

    cs.CY cs.AI cs.CL cs.CV

    Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Vision-Language Models

    Authors: Joan Nwatu, Oana Ignat, Rada Mihalcea

    Abstract: To address this issue, we formulate translated non-English, geographic, and socioeconomic integrated prompts and evaluate their impact on VL model performance for data from different countries and income groups. Our findings show that geographic and socioeconomic integrated prompts improve VL performance on lower-income data and favor the retrieval of topic appearances commonly found in data from…

    Submitted 2 July, 2024; originally announced July 2024.

    ACM Class: K.4; I.2.7; I.2.8

  2. arXiv:2407.02273  [pdf, other]

    cs.CL

    Multilingual Trolley Problems for Language Models

    Authors: Zhi**g **, Sydney Levine, Max Kleiman-Weiner, Giorgio Piatti, Jiarui Liu, Fernando Gonzalez Adauto, Francesco Ortu, András Strausz, Mrinmaya Sachan, Rada Mihalcea, Ye** Choi, Bernhard Schölkopf

    Abstract: As large language models (LLMs) are deployed in more and more real-world situations, it is crucial to understand their decision-making when faced with moral dilemmas. Inspired by a large-scale cross-cultural study of human moral preferences, "The Moral Machine Experiment", we set up the same set of moral choices for LLMs. We translate 1K vignettes of moral dilemmas, parametrically varied across ke…

    Submitted 2 July, 2024; originally announced July 2024.

  3. arXiv:2406.16152  [pdf, other]

    cs.CL

    Towards Region-aware Bias Evaluation Metrics

    Authors: Angana Borah, Aparna Garimella, Rada Mihalcea

    Abstract: When exposed to human-generated data, language models are known to learn and amplify societal biases. While previous works introduced benchmarks that can be used to assess the bias in these models, they rely on assumptions that may not be universally true. For instance, a gender bias dimension commonly used by these metrics is that of family–career, but this may not be the only common bias in cer…

    Submitted 23 June, 2024; originally announced June 2024.

  4. arXiv:2406.09264  [pdf, other]

    cs.HC cs.AI cs.CL

    Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

    Authors: Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens

    Abstract: Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve th…

    Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 56 pages

  5. arXiv:2406.05967  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

    Authors: David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, Atnafu Lambebo Tonja, Bontu Fufa Balcha, Chenxi Whitehouse, Christian Salamea, Dan John Velasco, David Ifeoluwa Adelani, David Le Meur, Emilio Villa-Cueva, Fajri Koto, Fauzan Farooqui, Frederico Belcavello, Ganzorig Batnasan, Gisela Vallejo, Grainne Caulfield, Guido Ivetta, Haiyue Song, et al. (50 additional authors not shown)

    Abstract: Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recen…

    Submitted 9 June, 2024; originally announced June 2024.

  6. arXiv:2405.20318  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    CausalQuest: Collecting Natural Causal Questions for AI Agents

    Authors: Roberto Ceraolo, Dmitrii Kharlapenko, Amélie Reymond, Rada Mihalcea, Mrinmaya Sachan, Bernhard Schölkopf, Zhi**g **

    Abstract: Humans have an innate drive to seek out causality. Whether fuelled by curiosity or specific goals, we constantly question why things happen, how they are interconnected, and many other related phenomena. To develop AI agents capable of addressing this natural human quest for causality, we urgently need a comprehensive dataset of natural causal questions. Unfortunately, existing datasets either con…

    Submitted 30 May, 2024; originally announced May 2024.

  7. arXiv:2405.14808  [pdf, other]

    cs.CL cs.AI cs.CY cs.HC cs.LG

    Implicit Personalization in Language Models: A Systematic Study

    Authors: Zhi**g **, Nils Heil, Jiarui Liu, Shehzaad Dhuliawala, Yahang Qi, Bernhard Schölkopf, Rada Mihalcea, Mrinmaya Sachan

    Abstract: Implicit Personalization (IP) is a phenomenon of language models inferring a user's background from the implicit cues in the input prompts and tailoring the response based on this inference. While previous work has touched upon various instances of this problem, there lacks a unified framework to study this behavior. This work systematically studies IP through a rigorous mathematical formulation,…

    Submitted 23 May, 2024; originally announced May 2024.

  8. arXiv:2405.04655  [pdf, other]

    cs.CL

    Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense

    Authors: Siqi Shen, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Soujanya Poria, Rada Mihalcea

    Abstract: Large language models (LLMs) have demonstrated substantial commonsense understanding through numerous benchmark evaluations. However, their understanding of cultural commonsense remains largely unexamined. In this paper, we conduct a comprehensive examination of the capabilities and limitations of several state-of-the-art LLMs in the context of cultural commonsense tasks. Using several general and…

    Submitted 7 May, 2024; originally announced May 2024.

  9. arXiv:2404.18739  [pdf, other]

    cs.CL

    Towards Dog Bark Decoding: Leveraging Human Speech Processing for Automated Bark Classification

    Authors: Artem Abzaliev, Humberto Pérez Espinosa, Rada Mihalcea

    Abstract: Similar to humans, animals make extensive use of verbal and non-verbal forms of communication, including a large range of audio signals. In this paper, we address dog vocalizations and explore the use of self-supervised speech representation models pre-trained on human speech to address dog bark classification tasks that find parallels in human-centered tasks in speech recognition. We specifically…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: to be published in LREC-COLING 2024

  10. arXiv:2404.16698  [pdf, other]

    cs.CL

    Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents

    Authors: Giorgio Piatti, Zhi**g **, Max Kleiman-Weiner, Bernhard Schölkopf, Mrinmaya Sachan, Rada Mihalcea

    Abstract: As AI systems pervade human life, ensuring that large language models (LLMs) make safe decisions is a significant challenge. This paper introduces the Governance of the Commons Simulation (GovSim), a generative simulation platform designed to study strategic interactions and cooperative decision-making in LLMs. Using GovSim, we investigate the dynamics of sustainable resource sharing in a society…

    Submitted 31 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Revised version

  11. arXiv:2404.12938  [pdf, other]

    cs.CL cs.AI

    MAiDE-up: Multilingual Deception Detection of GPT-generated Hotel Reviews

    Authors: Oana Ignat, Xiaomeng Xu, Rada Mihalcea

    Abstract: Deceptive reviews are becoming increasingly common, especially given the increase in performance and the prevalence of LLMs. While work to date has addressed the development of models to differentiate between truthful and deceptive human reviews, much less is known about the distinction between real reviews and AI-authored fake reviews. Moreover, most of the research so far has focused primarily o…

    Submitted 18 June, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  12. arXiv:2404.12933  [pdf, other]

    cs.CL cs.AI

    Cross-cultural Inspiration Detection and Analysis in Real and LLM-generated Social Media Data

    Authors: Oana Ignat, Gayathri Ganesh Lakshmy, Rada Mihalcea

    Abstract: Inspiration is linked to various positive outcomes, such as increased creativity, productivity, and happiness. Although inspiration has great potential, there has been limited effort toward identifying content that is inspiring, as opposed to just engaging or positive. Additionally, most research has concentrated on Western data, with little attention paid to other cultures. This work is the first…

    Submitted 18 June, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  13. arXiv:2404.11055  [pdf, other]

    cs.CL

    On the Causal Nature of Sentiment Analysis

    Authors: Zhiheng Lyu, Zhi**g **, Fernando Gonzalez, Rada Mihalcea, Bernhard Schoelkopf, Mrinmaya Sachan

    Abstract: Sentiment analysis (SA) aims to identify the sentiment expressed in a text, such as a product review. Given a review and the sentiment associated with it, this paper formulates SA as a combination of two tasks: (1) a causal discovery task that distinguishes whether a review "primes" the sentiment (Causal Hypothesis C1), or the sentiment "primes" the review (Causal Hypothesis C2); and (2) the tradi…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: An enhanced version of our previous exploration in arXiv:2305.01764

  14. arXiv:2404.09956  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

    Authors: Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria

    Abstract: Generative multimodal content is increasingly prevalent in much of the content creation arena, as it has the potential to allow artists and media personnel to create pre-production mockups by quickly bringing their ideas to life. The generation of audio from text prompts is an important aspect of such processes in the music and film industry. Many of the recent diffusion-based text-to-audio models…

    Submitted 16 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: https://github.com/declare-lab/tango

  15. arXiv:2404.08760  [pdf, other]

    cs.CL cs.AI

    The Generation Gap: Exploring Age Bias in the Underlying Value Systems of Large Language Models

    Authors: Siyang Liu, Trish Maturi, Bowen Yi, Siqi Shen, Rada Mihalcea

    Abstract: In this paper, we explore the alignment of values in Large Language Models (LLMs) with specific age groups, leveraging data from the World Value Survey across thirteen categories. Through a diverse set of prompts tailored to ensure response robustness, we find a general inclination of LLM values towards younger demographics. Additionally, we explore the impact of incorporating age identity informa…

    Submitted 13 May, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: 4 pages

  16. arXiv:2403.16909  [pdf, other]

    cs.AI cs.CL cs.CY

    Towards Algorithmic Fidelity: Mental Health Representation across Demographics in Synthetic vs. Human-generated Data

    Authors: Shinka Mori, Oana Ignat, Andrew Lee, Rada Mihalcea

    Abstract: Synthetic data generation has the potential to impact applications and domains with scarce data. However, before such data is used for sensitive tasks such as mental health, we need an understanding of how different demographics are represented in it. In our paper, we analyze the potential of producing synthetic data using GPT-3 by exploring the various stressors it attributes to different race an…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 14 pages, 16 figures

  17. arXiv:2403.13578  [pdf, other]

    cs.CL cs.LG

    Dynamic Reward Adjustment in Multi-Reward Reinforcement Learning for Counselor Reflection Generation

    Authors: Do June Min, Veronica Perez-Rosas, Kenneth Resnicow, Rada Mihalcea

    Abstract: In this paper, we study the problem of multi-reward reinforcement learning to jointly optimize for multiple text qualities for natural language generation. We focus on the task of counselor reflection generation, where we optimize the generators to simultaneously improve the fluency, coherence, and reflection quality of generated counselor responses. We introduce two novel bandit methods, DynaOpt…

    Submitted 20 March, 2024; originally announced March 2024.

  18. arXiv:2403.07687  [pdf, other]

    cs.CV cs.AI cs.CL

    Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost

    Authors: Oana Ignat, Longju Bai, Joan Nwatu, Rada Mihalcea

    Abstract: Current foundation models have shown impressive performance across various tasks. However, several studies have revealed that these models are not effective for everyone due to the imbalanced geographical and economic representation of the data used in the training process. Most of this data comes from Western countries, leading to poor results for underrepresented countries. To address this issue…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: accepted at COLING 2024

  19. arXiv:2403.00096  [pdf]

    cs.CY

    Future of Pandemic Prevention and Response CCC Workshop Report

    Authors: David Danks, Rada Mihalcea, Katie Siek, Mona Singh, Brian Dixon, Haley Griffin

    Abstract: This report summarizes the discussions and conclusions of a 2-day multidisciplinary workshop that brought together researchers and practitioners in healthcare, computer science, and social sciences to explore what lessons were learned and what actions, primarily in research, could be taken. One consistent observation was that there is significant merit in thinking not only about pandemic situation…

    Submitted 29 February, 2024; originally announced March 2024.

  20. arXiv:2402.15021  [pdf, other]

    cs.CV cs.CL

    CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models

    Authors: Santiago Castro, Amir Ziai, Avneesh Saluja, Zhuoning Yuan, Rada Mihalcea

    Abstract: Recent years have witnessed a significant increase in the performance of Vision and Language tasks. Foundational Vision-Language Models (VLMs), such as CLIP, have been leveraged in multiple settings and demonstrated remarkable performance across several tasks. Such models excel at object-centric recognition yet learn text representations that seem invariant to word order, failing to compose known…

    Submitted 29 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  21. arXiv:2402.14851  [pdf, other]

    cs.CL cs.AI cs.DB

    SQL-CRAFT: Text-to-SQL through Interactive Refinement and Enhanced Reasoning

    Authors: Hanchen Xia, Feng Jiang, Naihao Deng, Cunxiang Wang, Guojiang Zhao, Rada Mihalcea, Yue Zhang

    Abstract: Modern LLMs have become increasingly powerful, but they are still facing challenges in specialized tasks such as Text-to-SQL. We propose SQL-CRAFT, a framework to advance LLMs' SQL generation Capabilities through inteRActive reFinemenT and enhanced reasoning. We leverage an Interactive Correction Loop (IC-Loop) for LLMs to interact with databases automatically, as well as Python-enhanced reasoning…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 11 pages, 3 figures, 6 tables

  22. arXiv:2402.12424  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs

    Authors: Naihao Deng, Zhenjie Sun, Ruiqi He, Aman Sikka, Yulong Chen, Lin Ma, Yue Zhang, Rada Mihalcea

    Abstract: In this paper, we investigate the effectiveness of various LLMs in interpreting tabular data through different prompting strategies and data formats. Our analyses extend across six benchmarks for table-related tasks such as question-answering and fact-checking. We introduce for the first time the assessment of LLMs' performance on image-based table representations. Specifically, we compare five te…

    Submitted 5 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024 Findings

  23. arXiv:2402.12071  [pdf, other]

    cs.CL cs.AI

    EmoBench: Evaluating the Emotional Intelligence of Large Language Models

    Authors: Sahand Sabour, Siyang Liu, Zheyuan Zhang, June M. Liu, **feng Zhou, Alvionna S. Sunaryo, Juanzi Li, Tatia M. C. Lee, Rada Mihalcea, Minlie Huang

    Abstract: Recent advances in Large Language Models (LLMs) have highlighted the need for robust, comprehensive, and challenging benchmarks. Yet, research on evaluating their Emotional Intelligence (EI) is considerably limited. Existing benchmarks have two major shortcomings: first, they mainly focus on emotion recognition, neglecting essential EI capabilities such as emotion regulation and thought facilitati…

    Submitted 7 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: ACL 2024 Main Conference

  24. arXiv:2401.09395  [pdf, other]

    cs.CL

    Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions

    Authors: Pengfei Hong, Navonil Majumder, Deepanway Ghosal, Somak Aditya, Rada Mihalcea, Soujanya Poria

    Abstract: Recent advancements in Large Language Models (LLMs) have showcased striking results on existing logical reasoning benchmarks, with some models even surpassing human performance. However, the true depth of their competencies and robustness in reasoning tasks remains an open question. To this end, in this paper, we focus on two popular reasoning tasks: arithmetic reasoning and code generation. Parti…

    Submitted 27 June, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

  25. arXiv:2401.04972  [pdf, other]

    cs.CL

    Whose wife is it anyway? Assessing bias against same-gender relationships in machine translation

    Authors: Ian Stewart, Rada Mihalcea

    Abstract: Machine translation often suffers from biased data and algorithms that can lead to unacceptable errors in system output. While bias in gender norms has been investigated, less is known about whether MT systems encode bias about social relationships, e.g. sentences such as "the lawyer kissed her wife." We investigate the degree of bias against same-gender relationships in MT systems, using generate…

    Submitted 10 January, 2024; originally announced January 2024.

  26. arXiv:2401.01967  [pdf, other]

    cs.CL cs.AI

    A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity

    Authors: Andrew Lee, Xiaoyan Bai, Itamar Pres, Martin Wattenberg, Jonathan K. Kummerfeld, Rada Mihalcea

    Abstract: While alignment algorithms are now commonly used to tune pre-trained language models towards a user's preferences, we lack explanations for the underlying mechanisms in which models become "aligned", thus making it difficult to explain phenomena like jailbreaks. In this work we study a popular algorithm, direct preference optimization (DPO), and the mechanisms by which it reduces toxicity. Namel…

    Submitted 3 January, 2024; originally announced January 2024.

  27. arXiv:2311.10944  [pdf, other]

    cs.CL

    Deception Detection from Linguistic and Physiological Data Streams Using Bimodal Convolutional Neural Networks

    Authors: Panfeng Li, Mohamed Abouelenien, Rada Mihalcea, Zhicheng Ding, Qikai Yang, Yiming Zhou

    Abstract: Deception detection is gaining increasing interest due to ethical and security concerns. This paper explores the application of convolutional neural networks for the purpose of multimodal deception detection. We use a dataset built by interviewing 104 subjects about two topics, with one truthful and one falsified response from each subject about each topic. In particular, we make three main contri…

    Submitted 26 June, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

    Comments: Accepted by 2024 5th International Conference on Information Science, Parallel and Distributed Systems

  28. arXiv:2311.08299  [pdf, other]

    cs.CL cs.AI

    VERVE: Template-based ReflectiVE Rewriting for MotiVational IntErviewing

    Authors: Do June Min, Verónica Pérez-Rosas, Kenneth Resnicow, Rada Mihalcea

    Abstract: Reflective listening is a fundamental skill that counselors must acquire to achieve proficiency in motivational interviewing (MI). It involves responding in a manner that acknowledges and explores the meaning of what the client has expressed in the conversation. In this work, we introduce the task of counseling response rewriting, which transforms non-reflective statements into reflective response…

    Submitted 8 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  29. arXiv:2311.05746  [pdf, other]

    cs.CY cs.AI cs.CL cs.CV

    Bridging the Digital Divide: Performance Variation across Socio-Economic Factors in Vision-Language Models

    Authors: Joan Nwatu, Oana Ignat, Rada Mihalcea

    Abstract: Despite the impressive performance of current AI models reported across various tasks, performance reports often do not include evaluations of how these models perform on the specific groups that will be impacted by these technologies. Among the minority groups under-represented in AI, data from low-income households are often overlooked in data collection and model evaluation. We evaluate the per…

    Submitted 9 November, 2023; originally announced November 2023.

    Journal ref: EMNLP 2023

  30. arXiv:2310.20159  [pdf, other]

    cs.CV cs.AI

    Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts

    Authors: Deepanway Ghosal, Navonil Majumder, Roy Ka-Wei Lee, Rada Mihalcea, Soujanya Poria

    Abstract: Visual question answering (VQA) is the task of answering questions about an image. The task assumes an understanding of both the image and the question to provide a natural language answer. VQA has gained popularity in recent years due to its potential applications in a wide range of fields, including robotics, education, and healthcare. In this paper, we focus on knowledge-augmented VQA, where an…

    Submitted 30 October, 2023; originally announced October 2023.

  31. arXiv:2310.16755  [pdf, other]

    cs.CL cs.AI

    HI-TOM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models

    Authors: Yinghui He, Yufan Wu, Yilin Jia, Rada Mihalcea, Yulong Chen, Naihao Deng

    Abstract: Theory of Mind (ToM) is the ability to reason about one's own and others' mental states. ToM plays a critical role in the development of intelligence, language understanding, and cognitive processes. While previous work has primarily focused on first and second-order ToM, we explore higher-order ToM, which involves recursive reasoning on others' beliefs. We introduce HI-TOM, a Higher Order Theory…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: Accepted at Findings of EMNLP 2023

    Journal ref: Findings of EMNLP 2023

  32. arXiv:2310.05317  [pdf, other]

    cs.CL cs.AI

    Task-Adaptive Tokenization: Enhancing Long-Form Text Generation Efficacy in Mental Health and Beyond

    Authors: Siyang Liu, Naihao Deng, Sahand Sabour, Yilin Jia, Minlie Huang, Rada Mihalcea

    Abstract: We propose task-adaptive tokenization as a way to adapt the generation pipeline to the specifics of a downstream task and enhance long-form generation in mental health. Inspired by insights from cognitive science, our task-adaptive tokenizer samples variable segmentations from multiple outcomes, with sampling probabilities optimized based on task-specific data. We introduce a strategy for building…

    Submitted 13 November, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted at the main conference of The 2023 Conference on Empirical Methods in Natural Language Processing; 8 pages

    MSC Class: 68

    ACM Class: I.2.7

    Journal ref: The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)

  33. arXiv:2309.06219  [pdf, other]

    cs.CV cs.CL cs.CY cs.IR

    Human Action Co-occurrence in Lifestyle Vlogs using Graph Link Prediction

    Authors: Oana Ignat, Santiago Castro, Weiji Li, Rada Mihalcea

    Abstract: We introduce the task of automatic human action co-occurrence identification, i.e., determine whether two human actions can co-occur in the same interval of time. We create and make publicly available the ACE (Action Co-occurrencE) dataset, consisting of a large graph of ~12k co-occurring pairs of visual actions and their corresponding video clips. We describe graph link prediction models that lev…

    Submitted 18 June, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

  34. arXiv:2306.12466  [pdf, ps, other]

    cs.SI cs.CL

    Misinformation as Information Pollution

    Authors: Ashkan Kazemi, Rada Mihalcea

    Abstract: Social media feed algorithms are designed to optimize online social engagements for the purpose of maximizing advertising profits, and therefore have an incentive to promote controversial posts including misinformation. By thinking about misinformation as information pollution, we can draw parallels with environmental policy for countering pollution such as carbon taxes. Similar to pollution, a Pi…

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: 9 pages

  35. arXiv:2306.05836  [pdf, other]

    cs.CL cs.AI cs.LG

    Can Large Language Models Infer Causation from Correlation?

    Authors: Zhi**g **, Jiarui Liu, Zhiheng Lyu, Spencer Poff, Mrinmaya Sachan, Rada Mihalcea, Mona Diab, Bernhard Schölkopf

    Abstract: Causal inference is one of the hallmarks of human intelligence. While the field of CausalNLP has attracted much interest in the recent years, existing causal inference datasets in NLP primarily rely on discovering causality from empirical knowledge (e.g., commonsense knowledge). In this work, we propose the first benchmark dataset to test the pure causal inference skills of large language models (…

    Submitted 17 April, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: ICLR 2024

  36. arXiv:2305.18786  [pdf, other]

    cs.CV cs.CL

    Scalable Performance Analysis for Vision-Language Models

    Authors: Santiago Castro, Oana Ignat, Rada Mihalcea

    Abstract: Joint vision-language models have shown great performance over a diverse set of tasks. However, little is known about their limitations, as the high dimensional space learned by these models makes it difficult to identify semantic errors. Recent work has addressed this problem by designing highly controlled probing task benchmarks. Our paper introduces a more scalable solution that relies on alrea…

    Submitted 31 May, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Camera-ready version for *SEM 2023

  37. arXiv:2305.14663  [pdf, other]

    cs.CL

    You Are What You Annotate: Towards Better Models through Annotator Representations

    Authors: Naihao Deng, Xinliang Frederick Zhang, Siyang Liu, Winston Wu, Lu Wang, Rada Mihalcea

    Abstract: Annotator disagreement is ubiquitous in natural language processing (NLP) tasks. There are multiple reasons for such disagreements, including the subjectivity of the task, difficult cases, unclear guidelines, and so on. Rather than simply aggregating labels to obtain data annotations, we instead try to directly model the diverse perspectives of the annotators, and explicitly account for annotators…

    Submitted 22 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of EMNLP 2023

  38. arXiv:2305.14597  [pdf, other]

    cs.CL cs.AI cs.LG

    Voices of Her: Analyzing Gender Differences in the AI Publication World

    Authors: Yiwen Ding, Jiarui Liu, Zhiheng Lyu, Kun Zhang, Bernhard Schoelkopf, Zhi**g **, Rada Mihalcea

    Abstract: While several previous studies have analyzed gender bias in research, we are still missing a comprehensive analysis of gender differences in the AI community, covering diverse topics and different development trends. Using the AI Scholar dataset of 78K researchers in the field of AI, we identify several gender differences: (1) Although female researchers tend to have fewer overall citations than m…

    Submitted 23 May, 2023; originally announced May 2023.

  39. arXiv:2305.14169  [pdf, other]

    cs.HC cs.CL

    EASE: An Easily-Customized Annotation System Powered by Efficiency Enhancement Mechanisms

    Authors: Naihao Deng, Yikai Liu, Mingye Chen, Winston Wu, Siyang Liu, Yulong Chen, Yue Zhang, Rada Mihalcea

    Abstract: The performance of current supervised AI systems is tightly connected to the availability of annotated datasets. Annotations are usually collected through annotation tools, which are often designed for specific tasks and are difficult to customize. Moreover, existing annotation tools with an active learning mechanism often only support limited use cases. To address these limitations, we present EA…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: 20 pages

  40. arXiv:2305.12544  [pdf, other]

    cs.CL cs.AI

    Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models

    Authors: Oana Ignat, Zhi**g **, Artem Abzaliev, Laura Biester, Santiago Castro, Naihao Deng, Xinyi Gao, Aylin Gunal, Jacky He, Ashkan Kazemi, Muhammad Khalifa, Namho Koh, Andrew Lee, Siyang Liu, Do June Min, Shinka Mori, Joan Nwatu, Veronica Perez-Rosas, Siqi Shen, Zekun Wang, Winston Wu, Rada Mihalcea

    Abstract: Recent progress in large language models (LLMs) has enabled the deployment of many generative NLP applications. At the same time, it has also led to a misleading public discourse that "it's all been solved." Not surprisingly, this has, in turn, made many NLP researchers -- especially those at the beginning of their careers -- worry about what NLP research area they should focus on. Has it all be…

    Submitted 15 March, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted at COLING 2024

  41. arXiv:2305.05471  [pdf, other]

    cs.CL

    Beyond Good Intentions: Reporting the Research Landscape of NLP for Social Good

    Authors: Fernando Gonzalez, Zhi**g **, Bernhard Schölkopf, Tom Hope, Mrinmaya Sachan, Rada Mihalcea

    Abstract: With the recent advances in natural language processing (NLP), a vast number of applications have emerged across various use cases. Among the plethora of NLP applications, many academic researchers are motivated to do work that has a positive social impact, in line with the recent initiatives of NLP for Social Good (NLP4SG). However, it is not always obvious to researchers how their research effor…

    Submitted 21 October, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 Findings

  42. arXiv:2305.01764  [pdf, other]

    cs.CL cs.AI cs.LG stat.ME

    Psychologically-Inspired Causal Prompts

    Authors: Zhiheng Lyu, Zhijing Jin, Justus Mattern, Rada Mihalcea, Mrinmaya Sachan, Bernhard Schoelkopf

    Abstract: NLP datasets are richer than just input-output pairs; rather, they carry causal relations between the input and output variables. In this work, we take sentiment classification as an example and look into the causal relations between the review (X) and sentiment (Y). As psychology studies show that language can affect emotion, different psychological processes are evoked when a person first makes…

    Submitted 2 May, 2023; originally announced May 2023.

  43. arXiv:2305.00359  [pdf, other]

    eess.AS

    A Review of Deep Learning Techniques for Speech Processing

    Authors: Ambuj Mehrish, Navonil Majumder, Rishabh Bhardwaj, Rada Mihalcea, Soujanya Poria

    Abstract: The field of speech processing has undergone a transformative shift with the advent of deep learning. The use of multiple processing layers has enabled the creation of models capable of extracting intricate features from speech data. This development has paved the way for unparalleled advancements in speech recognition, text-to-speech synthesis, automatic speech recognition, and emotion recognitio…

    Submitted 30 May, 2023; v1 submitted 29 April, 2023; originally announced May 2023.

  44. arXiv:2303.03267  [pdf, other]

    cs.CL cs.SD eess.AS

    Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding

    Authors: Yingting Li, Ambuj Mehrish, Shuai Zhao, Rishabh Bhardwaj, Amir Zadeh, Navonil Majumder, Rada Mihalcea, Soujanya Poria

    Abstract: Fine-tuning is widely used as the default algorithm for transfer learning from pre-trained models. Parameter inefficiency can however arise when, during transfer learning, all the parameters of a large pre-trained model need to be updated for individual downstream tasks. As the number of parameters grows, fine-tuning is prone to overfitting and catastrophic forgetting. In addition, full fine-tunin…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  45. arXiv:2302.03490  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Natural Language Processing for Policymaking

    Authors: Zhijing Jin, Rada Mihalcea

    Abstract: Language is the medium for many political activities, from campaigns to news reports. Natural language processing (NLP) uses computational tools to parse text into key information that is needed for policymaking. In this chapter, we introduce common methods of NLP, including text classification, topic modeling, event extraction, and text scaling. We then overview how these methods can be used for…

    Submitted 7 February, 2023; originally announced February 2023.

    Comments: Handbook of Computational Social Science for Policy (2023), Chapter 7 (pages 141-162). Open Access on Springer: https://doi.org/10.1007/978-3-031-16624-2

  46. arXiv:2212.10678  [pdf, other]

    cs.CL cs.LG

    Understanding Stereotypes in Language Models: Towards Robust Measurement and Zero-Shot Debiasing

    Authors: Justus Mattern, Zhijing Jin, Mrinmaya Sachan, Rada Mihalcea, Bernhard Schölkopf

    Abstract: Generated texts from large pretrained language models have been shown to exhibit a variety of harmful, human-like biases about various demographics. These findings prompted large efforts aiming to understand and measure such effects, with the goal of providing benchmarks that can guide the development of techniques mitigating these stereotypical associations. However, as recent research has pointe…

    Submitted 20 December, 2022; originally announced December 2022.

  47. arXiv:2212.02581  [pdf, other]

    econ.GN cs.DL physics.soc-ph

    Editing a Woman's Voice

    Authors: Anna Costello, Ekaterina Fedorova, Zhijing Jin, Rada Mihalcea

    Abstract: Prior work shows that men and women speak with different levels of confidence, though it's often assumed that these differences are innate or are learned in early childhood. Using academic publishing as a setting, we find that language differences across male and female authors are initially negligible: in first drafts of academic manuscripts, men and women write with similar levels of uncertainty…

    Submitted 11 May, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

  48. arXiv:2210.16495  [pdf, other]

    cs.CL cs.AI cs.LG

    Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering

    Authors: Deepanway Ghosal, Navonil Majumder, Rada Mihalcea, Soujanya Poria

    Abstract: We propose a simple refactoring of multi-choice question answering (MCQA) tasks as a series of binary classifications. The MCQA task is generally performed by scoring each (question, answer) pair normalized over all the pairs, and then selecting the answer from the pair that yields the highest score. For n answer choices, this is equivalent to an n-class classification setup where only one class (t…

    Submitted 29 October, 2022; originally announced October 2022.

  49. arXiv:2210.07467  [pdf, other]

    cs.CL

    Query Rewriting for Effective Misinformation Discovery

    Authors: Ashkan Kazemi, Artem Abzaliev, Naihao Deng, Rui Hou, Scott A. Hale, Verónica Pérez-Rosas, Rada Mihalcea

    Abstract: We propose a novel system to help fact-checkers formulate search queries for known misinformation claims and effectively search across multiple social media platforms. We introduce an adaptable rewriting strategy, where editing actions for queries containing claims (e.g., swap a word with its synonym; change verb tense into present simple) are automatically learned through offline reinforcement le…

    Submitted 2 October, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: AACL 2023 (long paper)

  50. arXiv:2210.02890  [pdf, other]

    cs.CL

    Multiview Contextual Commonsense Inference: A New Dataset and Task

    Authors: Siqi Shen, Deepanway Ghosal, Navonil Majumder, Henry Lim, Rada Mihalcea, Soujanya Poria

    Abstract: Contextual commonsense inference is the task of generating various types of explanations around the events in a dyadic dialogue, including cause, motivation, emotional reaction, and others. Producing a coherent and non-trivial explanation requires awareness of the dialogue's structure and of how an event is grounded in the context. In this work, we create CICEROv2, a dataset consisting of 8,351 in…

    Submitted 2 November, 2022; v1 submitted 6 October, 2022; originally announced October 2022.