Showing 1–16 of 16 results for author: Yüksekgönül, M

  1. arXiv:2406.07496  [pdf, other]

    cs.CL cs.AI cs.LG

    TextGrad: Automatic "Differentiation" via Text

    Authors: Mert Yuksekgonul, Federico Bianchi, Joseph Boen, Sheng Liu, Zhi Huang, Carlos Guestrin, James Zou

    Abstract: AI is undergoing a paradigm shift, with breakthroughs achieved by systems orchestrating multiple large language models (LLMs) and other complex components. As a result, developing principled and automated optimization methods for compound AI systems is one of the most important new challenges. Neural networks faced a similar challenge in its early days until backpropagation and automatic different…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 41 pages, 6 figures

  2. arXiv:2402.05863  [pdf, other]

    cs.AI cs.CL cs.GT

    How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis

    Authors: Federico Bianchi, Patrick John Chia, Mert Yuksekgonul, Jacopo Tagliabue, Dan Jurafsky, James Zou

    Abstract: Negotiation is the basis of social interactions; humans negotiate everything from the price of cars to how to share common resources. With rapidly growing interest in using large language models (LLMs) to act as agents on behalf of human users, such LLM agents would also need to be able to negotiate. In this paper, we study how well LLMs can negotiate with each other. We develop NegotiationArena:…

    Submitted 8 February, 2024; originally announced February 2024.

  3. arXiv:2311.14703  [pdf]

    cs.CY cs.CL

    ChatGPT Exhibits Gender and Racial Biases in Acute Coronary Syndrome Management

    Authors: Angela Zhang, Mert Yuksekgonul, Joshua Guild, James Zou, Joseph C. Wu

    Abstract: Recent breakthroughs in large language models (LLMs) have led to their rapid dissemination and widespread use. One early application has been to medicine, where LLMs have been investigated to streamline clinical workflows and facilitate clinical analysis and decision-making. However, a leading barrier to the deployment of Artificial Intelligence (AI) and in particular LLMs has been concern for emb…

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: 19 pages, 2 tables, 2 figures

  4. arXiv:2310.15511  [pdf, other]

    cs.LG cs.AI cs.CL cs.IR

    KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval

    Authors: Marah I Abdin, Suriya Gunasekar, Varun Chandrasekaran, Jerry Li, Mert Yuksekgonul, Rahee Ghosh Peshawaria, Ranjita Naik, Besmira Nushi

    Abstract: We study the ability of state-of-the-art models to answer constraint satisfaction queries for information retrieval (e.g., 'a list of ice cream shops in San Diego'). In the past, such queries were considered to be tasks that could only be solved via web-search or knowledge bases. More recently, large language models (LLMs) have demonstrated initial emergent abilities in this task. However, many cu…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: 23 pages

    ACM Class: I.2.7

  5. arXiv:2310.07088  [pdf, other]

    cs.CL cs.AI

    Diversity of Thought Improves Reasoning Abilities of LLMs

    Authors: Ranjita Naik, Varun Chandrasekaran, Mert Yuksekgonul, Hamid Palangi, Besmira Nushi

    Abstract: Large language models (LLMs) are documented to struggle in settings that require complex reasoning. Nevertheless, instructing the model to break down the problem into smaller reasoning steps, or ensembling various generations through modifying decoding steps boosts performance. However, these methods assume that the input prompt is fixed and expect the decoding strategies to introduce the diversit…

    Submitted 23 February, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

  6. arXiv:2309.15098  [pdf, other]

    cs.CL cs.AI cs.LG

    Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

    Authors: Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi

    Abstract: We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text. We propose modeling factual queries as constraint satisfaction problems and use this framework to investigate how the LLM interacts internally with factual constraints. We find a strong positive relationship between the LLM's attention to constraint tokens and the fac…

    Submitted 17 April, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: Published at ICLR 2024

  7. arXiv:2305.18262  [pdf, other]

    cs.LG cs.AI

    Beyond Confidence: Reliable Models Should Also Consider Atypicality

    Authors: Mert Yuksekgonul, Linjun Zhang, James Zou, Carlos Guestrin

    Abstract: While most machine learning models can provide confidence in their predictions, confidence is insufficient to understand a prediction's reliability. For instance, the model may have a low confidence prediction if the input is not well-represented in the training dataset or if the input is inherently ambiguous. In this work, we investigate the relationship between how atypical (rare) a sample or a c…

    Submitted 30 October, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: Published at NeurIPS 2023

  8. arXiv:2305.00650  [pdf, other]

    cs.LG cs.CV eess.IV

    Discover and Cure: Concept-aware Mitigation of Spurious Correlation

    Authors: Shirley Wu, Mert Yuksekgonul, Linjun Zhang, James Zou

    Abstract: Deep neural networks often rely on spurious correlations to make predictions, which hinders generalization beyond training environments. For instance, models that associate cats with bed backgrounds can fail to predict the existence of cats in other environments without beds. Mitigating spurious correlations is crucial in building trustworthy models. However, the existing works lack transparency t…

    Submitted 5 June, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

    Comments: ICML 2023

  9. arXiv:2304.02819  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG

    GPT detectors are biased against non-native English writers

    Authors: Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, James Zou

    Abstract: The rapid adoption of generative language models has brought about substantial advancements in digital communication, while simultaneously raising concerns regarding the potential misuse of AI-generated content. Although numerous detection methods have been proposed to differentiate between AI and human-generated content, the fairness and robustness of these detectors remain underexplored. In this…

    Submitted 10 July, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

  10. arXiv:2302.00785  [pdf, other]

    cs.CV cs.AI

    SkinCon: A skin disease dataset densely annotated by domain experts for fine-grained model debugging and analysis

    Authors: Roxana Daneshjou, Mert Yuksekgonul, Zhuo Ran Cai, Roberto Novoa, James Zou

    Abstract: For the deployment of artificial intelligence (AI) in high-risk settings, such as healthcare, methods that provide interpretability/explainability or allow fine-grained error analysis are critical. Many recent methods for interpretability/explainability and fine-grained error analysis use concepts, which are meta-labels that are semantically meaningful to humans. However, there are only a few data…

    Submitted 1 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2022 Datasets and Benchmarks Track

  11. arXiv:2211.09110  [pdf, other]

    cs.CL cs.AI cs.LG

    Holistic Evaluation of Language Models

    Authors: Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, et al. (25 additional authors not shown)

    Abstract: Language models (LMs) are becoming the foundation for almost all major language technologies, but their capabilities, limitations, and risks are not well understood. We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models. First, we taxonomize the vast space of potential scenarios (i.e. use cases) and metrics (i.e. desiderata) that are of interest fo…

    Submitted 1 October, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Project page: https://crfm.stanford.edu/helm/v1.0

    Journal ref: Published in Transactions on Machine Learning Research (TMLR), 2023

  12. arXiv:2210.01936  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    When and why vision-language models behave like bags-of-words, and what to do about it?

    Authors: Mert Yuksekgonul, Federico Bianchi, Pratyusha Kalluri, Dan Jurafsky, James Zou

    Abstract: Despite the success of large vision and language models (VLMs) in many downstream applications, it is unclear how well they encode compositional information. Here, we create the Attribution, Relation, and Order (ARO) benchmark to systematically evaluate the ability of VLMs to understand different types of relationships, attributes, and order. ARO consists of Visual Genome Attribution, to test the…

    Submitted 23 March, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: ICLR 2023 Oral (notable-top-5%)

  13. arXiv:2205.15480  [pdf, other]

    cs.LG cs.AI stat.ML

    Post-hoc Concept Bottleneck Models

    Authors: Mert Yuksekgonul, Maggie Wang, James Zou

    Abstract: Concept Bottleneck Models (CBMs) map the inputs onto a set of interpretable concepts ("the bottleneck") and use the concepts to make predictions. A concept bottleneck enhances interpretability since it can be investigated to understand what concepts the model "sees" in an input and which of these concepts are deemed important. However, CBMs are restrictive in practice as they require dense conce…

    Submitted 1 February, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

    Comments: ICLR 2023 Spotlight (notable-top-25%)

  14. arXiv:2106.12723  [pdf, other]

    cs.LG

    Meaningfully Debugging Model Mistakes using Conceptual Counterfactual Explanations

    Authors: Abubakar Abid, Mert Yuksekgonul, James Zou

    Abstract: Understanding and explaining the mistakes made by trained models is critical to many machine learning objectives, such as improving robustness, addressing concept drift, and mitigating biases. However, this is often an ad hoc process that involves manually looking at the model's mistakes on many test samples and guessing at the underlying reasons for those incorrect predictions. In this paper, we…

    Submitted 14 June, 2022; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: ICML 2022

  15. arXiv:1910.00965  [pdf, other]

    cs.LG stat.ML

    Learning Maximally Predictive Prototypes in Multiple Instance Learning

    Authors: Mert Yuksekgonul, Ozgur Emre Sivrikaya, Mustafa Gokce Baydogan

    Abstract: In this work, we propose a simple model that provides permutation invariant maximally predictive prototype generator from a given dataset, which leads to interpretability of the solution and concrete insights to the nature and the solution of a problem. Our aim is to find out prototypes in the feature space to map the collection of instances (i.e. bags) to a distance feature space and simultaneous…

    Submitted 22 January, 2021; v1 submitted 2 October, 2019; originally announced October 2019.

    Comments: Sets & Partitions Workshop at NeurIPS 2019

  16. arXiv:1909.11229  [pdf, other]

    cs.CV cs.LG

    Pretraining boosts out-of-domain robustness for pose estimation

    Authors: Alexander Mathis, Thomas Biasi, Steffen Schneider, Mert Yüksekgönül, Byron Rogers, Matthias Bethge, Mackenzie W. Mathis

    Abstract: Neural networks are highly effective tools for pose estimation. However, as in other computer vision tasks, robustness to out-of-domain data remains a challenge, especially for small training sets that are common for real-world applications. Here, we probe the generalization ability with three architecture classes (MobileNetV2s, ResNets, and EfficientNets) for pose estimation. We developed a datas…

    Submitted 12 November, 2020; v1 submitted 24 September, 2019; originally announced September 2019.

    Comments: A.M. and T.B. co-first authors. Dataset available at http://horse10.deeplabcut.org. WACV 2021 conference

    Journal ref: https://openaccess.thecvf.com/content/WACV2021/html/Mathis_Pretraining_Boosts_Out-of-Domain_Robustness_for_Pose_Estimation_WACV_2021_paper.html