Showing 1–10 of 10 results for author: Menghini, C

  1. arXiv:2403.16442

    cs.CL cs.CV cs.LG

    If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions

    Authors: Reza Esfandiarpoor, Cristina Menghini, Stephen H. Bach

    Abstract: Recent works often assume that Vision-Language Model (VLM) representations are based on visual attributes like shape. However, it is unclear to what extent VLMs prioritize this information to represent concepts. We propose Extract and Explore (EX2), a novel approach to characterize important textual features for VLMs. EX2 uses reinforcement learning to align a large language model with VLM prefere…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Code: https://github.com/BatsResearch/ex2

  2. arXiv:2402.14086

    cs.CL cs.AI cs.LG

    LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons

    Authors: Zheng-Xin Yong, Cristina Menghini, Stephen H. Bach

    Abstract: Data scarcity in low-resource languages can be addressed with word-to-word translations from labeled task data in high-resource languages using bilingual lexicons. However, bilingual lexicons often have limited lexical overlap with task data, which results in poor translation coverage and lexicon utilization. We propose lexicon-conditioned data generation (LexC-Gen), a method that generates low-re…

    Submitted 21 February, 2024; originally announced February 2024.

  3. arXiv:2310.02446

    cs.CL cs.AI cs.CR cs.LG

    Low-Resource Languages Jailbreak GPT-4

    Authors: Zheng-Xin Yong, Cristina Menghini, Stephen H. Bach

    Abstract: AI safety training and red-teaming of large language models (LLMs) are measures to mitigate the generation of unsafe content. Our work exposes the inherent cross-lingual vulnerability of these safety mechanisms, resulting from the linguistic inequality of safety training data, by successfully circumventing GPT-4's safeguard through translating unsafe English inputs into low-resource languages. On…

    Submitted 27 January, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: NeurIPS Workshop on Socially Responsible Language Modelling Research (SoLaR) 2023. Best Paper Award

  4. arXiv:2306.01669

    cs.CV cs.LG

    Enhancing CLIP with CLIP: Exploring Pseudolabeling for Limited-Label Prompt Tuning

    Authors: Cristina Menghini, Andrew Delworth, Stephen H. Bach

    Abstract: Fine-tuning vision-language models (VLMs) like CLIP to downstream tasks is often necessary to optimize their performance. However, a major obstacle is the limited availability of labeled data. We study the use of pseudolabels, i.e., heuristic labels for unlabeled data, to enhance CLIP via prompt tuning. Conventional pseudolabeling trains a model on labeled data and then generates labels for unlabe…

    Submitted 7 March, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

  5. arXiv:2205.13068

    cs.LG

    Tight Lower Bounds on Worst-Case Guarantees for Zero-Shot Learning with Attributes

    Authors: Alessio Mazzetto, Cristina Menghini, Andrew Yuan, Eli Upfal, Stephen H. Bach

    Abstract: We develop a rigorous mathematical analysis of zero-shot learning with attributes. In this setting, the goal is to label novel classes with no training data, only detectors for attributes and a description of how those attributes are correlated with the target classes, called the class-attribute matrix. We develop the first non-trivial lower bound on the worst-case error of the best map from attri…

    Submitted 28 November, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

  6. The Drift of #MyBodyMyChoice Discourse on Twitter

    Authors: Cristina Menghini, Justin Uhr, Shahrzad Haddadan, Ashley Champagne, Bjorn Sandstede, Sohini Ramachandran

    Abstract: #MyBodyMyChoice is a well-known hashtag originally created to advocate for women's rights, often used in discourse about abortion and bodily autonomy. The Covid-19 outbreak prompted governments to take containment measures such as vaccination campaigns and mask mandates. Population groups opposed to such measures started to use the slogan "My Body My Choice" to claim their bodily autonomy. In this…

    Submitted 10 May, 2022; originally announced May 2022.

    Comments: Accepted at WebSci'22

  7. arXiv:2111.04798

    cs.LG cs.CV

    TAGLETS: A System for Automatic Semi-Supervised Learning with Auxiliary Data

    Authors: Wasu Piriyakulkij, Cristina Menghini, Ross Briden, Nihal V. Nayak, Jeffrey Zhu, Elaheh Raisi, Stephen H. Bach

    Abstract: Machine learning practitioners often have access to a spectrum of data: labeled data for the target task (which is often limited), unlabeled data, and auxiliary data, the many available labeled datasets for other tasks. We describe TAGLETS, a system built to study techniques for automatically exploiting all three types of data and creating high-quality, servable classifiers. The key components of…

    Submitted 5 May, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

    Comments: Paper published at MLSys 2022. It passed the artifact evaluation earning two ACM badges: (1) Artifacts Evaluated Functional v1.1 and (2) Artifacts Available v1.1

  8. RePBubLik: Reducing the Polarized Bubble Radius with Link Insertions

    Authors: Shahrzad Haddadan, Cristina Menghini, Matteo Riondato, Eli Upfal

    Abstract: The topology of the hyperlink graph among pages expressing different opinions may influence the exposure of readers to diverse content. Structural bias may trap a reader in a polarized bubble with no access to other opinions. We model readers' behavior as random walks. A node is in a polarized bubble if the expected length of a random walk from it to a page of different opinion is large. The struc…

    Submitted 12 January, 2021; originally announced January 2021.

  9. How Inclusive Are Wikipedia's Hyperlinks in Articles Covering Polarizing Topics?

    Authors: Cristina Menghini, Aris Anagnostopoulos, Eli Upfal

    Abstract: Wikipedia relies on an extensive review process to verify that the content of each individual page is unbiased and presents a neutral point of view. Less attention has been paid to possible biases in the hyperlink structure of Wikipedia, which has a significant influence on the user's exploration process when visiting more than one page. The evaluation of hyperlink bias is challenging because it d…

    Submitted 31 March, 2022; v1 submitted 16 July, 2020; originally announced July 2020.

  10. arXiv:1905.13651

    cs.DS cs.LG

    Principal Fairness: Removing Bias via Projections

    Authors: Aris Anagnostopoulos, Luca Becchetti, Adriano Fazzone, Cristina Menghini, Chris Schwiegelshohn

    Abstract: Reducing hidden bias in the data and ensuring fairness in algorithmic data analysis has recently received significant attention. We complement several recent papers in this line of research by introducing a general method to reduce bias in the data through random projections in a "fair" subspace. We apply this method to the densest subgraph problem. For densest subgraph, our approach based on fair p…

    Submitted 5 March, 2021; v1 submitted 31 May, 2019; originally announced May 2019.

    Comments: Partially supported by the ERC Advanced Grant 788893 AMDROMA "Algorithmic and Mechanism Design Research in Online Markets" and MIUR PRIN project ALGADIMAR "Algorithms, Games, and Digital Markets"