Showing 1–32 of 32 results for author: Yatskar, M

  1. arXiv:2405.18672  [pdf, other]

    cs.CV cs.CL

    LLM-based Hierarchical Concept Decomposition for Interpretable Fine-Grained Image Classification

    Authors: Renyi Qu, Mark Yatskar

    Abstract: (Renyi Qu's Master's Thesis) Recent advancements in interpretable models for vision-language tasks have achieved competitive performance; however, their interpretability often suffers due to the reliance on unstructured text outputs from large language models (LLMs). This introduces randomness and compromises both transparency and reliability, which are essential for addressing safety issues in AI…

    Submitted 2 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  2. arXiv:2405.14839  [pdf, other]

    cs.CV cs.CL

    A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis

    Authors: Yue Yang, Mona Gandhi, Yufei Wang, Yifan Wu, Michael S. Yao, Chris Callison-Burch, James C. Gee, Mark Yatskar

    Abstract: While deep networks have achieved broad success in analyzing natural images, when applied to medical scans, they often fail in unexpected situations. We investigate this challenge and focus on model sensitivity to domain shifts, such as data sampled from different hospitals or data confounded by demographic variables such as sex, race, etc., in the context of chest X-rays and skin lesion images. A…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 23 pages, 9 figures, 12 tables, project page: https://yueyang1996.github.io/knobo/

  3. arXiv:2405.05938  [pdf, other]

    cs.CL

    DOLOMITES: Domain-Specific Long-Form Methodical Tasks

    Authors: Chaitanya Malaviya, Priyanka Agrawal, Kuzman Ganchev, Pranesh Srinivasan, Fantine Huot, Jonathan Berant, Mark Yatskar, Dipanjan Das, Mirella Lapata, Chris Alberti

    Abstract: Experts in various fields routinely perform methodical writing tasks to plan, organize, and report their work. From a clinician writing a differential diagnosis for a patient, to a teacher writing a lesson plan for students, these tasks are pervasive, requiring to methodically generate structured long-form output for a given input. We develop a typology of methodical tasks structured in the form o…

    Submitted 28 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Dataset now available at https://dolomites-benchmark.github.io

  4. arXiv:2403.13900  [pdf, other]

    cs.CV

    CoMo: Controllable Motion Generation through Language Guided Pose Code Editing

    Authors: Yiming Huang, Weilin Wan, Yue Yang, Chris Callison-Burch, Mark Yatskar, Lingjie Liu

    Abstract: Text-to-motion models excel at efficient human motion generation, but existing approaches lack fine-grained controllability over the generation process. Consequently, modifying subtle postures within a motion or inserting new actions at specific moments remains a challenge, limiting the applicability of these methods in diverse scenarios. In light of these challenges, we introduce CoMo, a Controll…

    Submitted 20 March, 2024; originally announced March 2024.

  5. arXiv:2312.09067  [pdf, other]

    cs.CV cs.AI cs.CL cs.RO

    Holodeck: Language Guided Generation of 3D Embodied AI Environments

    Authors: Yue Yang, Fan-Yun Sun, Luca Weihs, Eli VanderBilt, Alvaro Herrasti, Winson Han, Jiajun Wu, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, Christopher Clark

    Abstract: 3D simulated environments play a critical role in Embodied AI, but their creation requires expertise and extensive manual effort, restricting their diversity and scope. To mitigate this limitation, we present Holodeck, a system that generates 3D environments to match a user-supplied prompt fully automatedly. Holodeck can generate diverse scenes, e.g., arcades, spas, and museums, adjust the designs…

    Submitted 22 April, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Published in CVPR 2024, 21 pages, 27 figures, 2 tables

  6. arXiv:2311.09558  [pdf, other]

    cs.CL

    What if you said that differently?: How Explanation Formats Affect Human Feedback Efficacy and User Perception

    Authors: Chaitanya Malaviya, Subin Lee, Dan Roth, Mark Yatskar

    Abstract: Eliciting feedback from end users of NLP models can be beneficial for improving models. However, how should we present model responses to users so they are most amenable to be corrected from user feedback? Further, what properties do users value to understand and trust responses? We answer these questions by analyzing the effect of rationales (or explanations) generated by QA models to support the…

    Submitted 1 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL 2024. Code & data available at https://github.com/chaitanyamalaviya/rationale_formats

  7. arXiv:2310.19660  [pdf, other]

    cs.CL

    Interpretable-by-Design Text Understanding with Iteratively Generated Concept Bottleneck

    Authors: Josh Magnus Ludan, Qing Lyu, Yue Yang, Liam Dugan, Mark Yatskar, Chris Callison-Burch

    Abstract: Black-box deep neural networks excel in text classification, yet their application in high-stakes domains is hindered by their lack of interpretability. To address this, we propose Text Bottleneck Models (TBM), an intrinsically interpretable text classification framework that offers both global and local explanations. Rather than directly predicting the output label, TBM predicts categorical value…

    Submitted 3 April, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

  8. arXiv:2309.07852  [pdf, other]

    cs.CL cs.AI

    ExpertQA: Expert-Curated Questions and Attributed Answers

    Authors: Chaitanya Malaviya, Subin Lee, Sihao Chen, Elizabeth Sieber, Mark Yatskar, Dan Roth

    Abstract: As language models are adopted by a more sophisticated and diverse set of users, the importance of guaranteeing that they provide factually correct information supported by verifiable sources is critical across fields of study. This is especially the case for high-stakes fields, such as medicine and law, where the risk of propagating false information is high and can lead to undesirable societal c…

    Submitted 1 April, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted to NAACL 2024. Dataset & code is available at https://github.com/chaitanyamalaviya/expertqa

  9. arXiv:2305.14882  [pdf, other]

    cs.CL cs.AI cs.CV

    Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering

    Authors: Xingyu Fu, Ben Zhou, Sihao Chen, Mark Yatskar, Dan Roth

    Abstract: Recent advances in multimodal large language models (LLMs) have shown extreme effectiveness in visual question answering (VQA). However, the design nature of these end-to-end models prevents them from being interpretable to humans, undermining trust and applicability in critical domains. While post-hoc rationales offer certain insight into understanding model behavior, these explanations are not g…

    Submitted 13 April, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Multimodal, Visual Question Answering, Vision and Language

  10. arXiv:2302.00762  [pdf, other]

    cs.CL

    AmbiCoref: Evaluating Human and Model Sensitivity to Ambiguous Coreference

    Authors: Yuewei Yuan, Chaitanya Malaviya, Mark Yatskar

    Abstract: Given a sentence "Abby told Brittney that she upset Courtney", one would struggle to understand who "she" refers to, and ask for clarification. However, if the word "upset" were replaced with "hugged", "she" unambiguously refers to Abby. We study if modern coreference resolution models are sensitive to such pronominal ambiguity. To this end, we construct AmbiCoref, a diagnostic corpus of minimal s…

    Submitted 3 February, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: EACL 2023 Findings

  11. arXiv:2211.11158  [pdf, other]

    cs.CV cs.CL

    Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification

    Authors: Yue Yang, Artemis Panagopoulou, Shenghao Zhou, Daniel Jin, Chris Callison-Burch, Mark Yatskar

    Abstract: Concept Bottleneck Models (CBM) are inherently interpretable models that factor model decisions into human-readable concepts. They allow people to easily understand why a model is failing, a critical feature for high-stakes applications. CBMs require manually specified concepts and often under-perform their black box counterparts, preventing their broad adoption. We address these shortcomings and…

    Submitted 25 April, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

    Comments: Published in CVPR 2023, 18 pages, 12 figures, 16 tables

  12. arXiv:2210.13439  [pdf, other]

    cs.CL

    Cascading Biases: Investigating the Effect of Heuristic Annotation Strategies on Data and Models

    Authors: Chaitanya Malaviya, Sudeep Bhatia, Mark Yatskar

    Abstract: Cognitive psychologists have documented that humans use cognitive heuristics, or mental shortcuts, to make quick decisions while expending less effort. While performing annotation work on crowdsourcing platforms, we hypothesize that such heuristic use among annotators cascades on to data quality and model robustness. In this work, we study cognitive heuristic use in the context of annotating multi…

    Submitted 23 January, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  13. arXiv:2210.12905  [pdf, other]

    cs.CL

    Visualizing the Obvious: A Concreteness-based Ensemble Model for Noun Property Prediction

    Authors: Yue Yang, Artemis Panagopoulou, Marianna Apidianaki, Mark Yatskar, Chris Callison-Burch

    Abstract: Neural language models encode rich knowledge about entities and their relationships which can be extracted from their representations using probing. Common properties of nouns (e.g., red strawberries, small ant) are, however, more challenging to extract compared to other types of knowledge because they are rarely explicitly stated in texts. We hypothesize this to mainly be the case for perceptual…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022; The first two authors contributed equally

    Journal ref: Findings of EMNLP 2022

  14. arXiv:2112.00800  [pdf, other]

    cs.CL cs.AI

    Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text

    Authors: Christopher Clark, Jordi Salvador, Dustin Schwenk, Derrick Bonafilia, Mark Yatskar, Eric Kolve, Alvaro Herrasti, Jonghyun Choi, Sachin Mehta, Sam Skjonsberg, Carissa Schoenick, Aaron Sarnat, Hannaneh Hajishirzi, Aniruddha Kembhavi, Oren Etzioni, Ali Farhadi

    Abstract: Communicating with humans is challenging for AIs because it requires a shared understanding of the world, complex semantics (e.g., metaphors or analogies), and at times multi-modal gestures (e.g., pointing with a finger, or an arrow in a diagram). We investigate these challenges in the context of Iconary, a collaborative game of drawing and guessing based on Pictionary, that poses a novel challeng…

    Submitted 1 December, 2021; originally announced December 2021.

    Comments: In EMNLP 2021

  15. arXiv:2111.09276  [pdf, other]

    cs.CV cs.CL

    Induce, Edit, Retrieve: Language Grounded Multimodal Schema for Instructional Video Retrieval

    Authors: Yue Yang, Joongwon Kim, Artemis Panagopoulou, Mark Yatskar, Chris Callison-Burch

    Abstract: Schemata are structured representations of complex tasks that can aid artificial intelligence by allowing models to break down complex tasks into intermediate steps. We propose a novel system that induces schemata from web videos and generalizes them to capture unseen tasks with the goal of improving video retrieval performance. Our system proceeds in three major phases: (1) Given a task with rela…

    Submitted 17 November, 2021; originally announced November 2021.

  16. arXiv:2104.05845  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    Visual Goal-Step Inference using wikiHow

    Authors: Yue Yang, Artemis Panagopoulou, Qing Lyu, Li Zhang, Mark Yatskar, Chris Callison-Burch

    Abstract: Understanding what sequence of steps are needed to complete a goal can help artificial intelligence systems reason about human activities. Past work in NLP has examined the task of goal-step inference for text. We introduce the visual analogue. We propose the Visual Goal-Step Inference (VGSI) task, where a model is given a textual goal and must choose which of four images represents a plausible st…

    Submitted 9 September, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

  17. arXiv:2104.00990  [pdf, other]

    cs.CV cs.CL

    Visual Semantic Role Labeling for Video Understanding

    Authors: Arka Sadhu, Tanmay Gupta, Mark Yatskar, Ram Nevatia, Aniruddha Kembhavi

    Abstract: We propose a new framework for understanding and representing related salient events in a video using visual semantic role labeling. We represent videos as a set of related events, wherein each event consists of a verb and multiple entities that fulfill various roles relevant to that event. To study the challenging task of semantic role labeling in videos or VidSRL, we introduce the VidSitu benchm…

    Submitted 2 April, 2021; originally announced April 2021.

    Comments: CVPR21 camera-ready including appendix. Project Page at https://vidsitu.org/

  18. arXiv:2011.03856  [pdf, other]

    cs.LG cs.CL cs.CV

    Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles

    Authors: Christopher Clark, Mark Yatskar, Luke Zettlemoyer

    Abstract: Many datasets have been shown to contain incidental correlations created by idiosyncrasies in the data collection process. For example, sentence entailment datasets can have spurious word-class correlations if nearly all contradiction sentences contain the word "not", and image recognition datasets can have tell-tale object-background correlations if dogs are always indoors. In this paper, we prop…

    Submitted 7 November, 2020; originally announced November 2020.

    Comments: In EMNLP Findings

  19. arXiv:2004.06799  [pdf, other]

    cs.CV cs.RO

    RoboTHOR: An Open Simulation-to-Real Embodied AI Platform

    Authors: Matt Deitke, Winson Han, Alvaro Herrasti, Aniruddha Kembhavi, Eric Kolve, Roozbeh Mottaghi, Jordi Salvador, Dustin Schwenk, Eli VanderBilt, Matthew Wallingford, Luca Weihs, Mark Yatskar, Ali Farhadi

    Abstract: Visual recognition ecosystems (e.g. ImageNet, Pascal, COCO) have undeniably played a prevailing role in the evolution of modern computer vision. We argue that interactive and embodied visual AI has reached a stage of development similar to visual recognition prior to the advent of these ecosystems. Recently, various synthetic environments have been introduced to facilitate research in embodied AI.…

    Submitted 14 April, 2020; originally announced April 2020.

    Comments: CVPR 2020

  20. arXiv:2003.12058  [pdf, other]

    cs.CV

    Grounded Situation Recognition

    Authors: Sarah Pratt, Mark Yatskar, Luca Weihs, Ali Farhadi, Aniruddha Kembhavi

    Abstract: We introduce Grounded Situation Recognition (GSR), a task that requires producing structured semantic summaries of images describing: the primary activity, entities engaged in the activity with their roles (e.g. agent, tool), and bounding-box groundings of entities. GSR presents important technical challenges: identifying semantic saliency, categorizing and localizing a large and diverse set of en…

    Submitted 26 March, 2020; originally announced March 2020.

  21. arXiv:1909.03683  [pdf, other]

    cs.CL cs.CV cs.LG

    Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases

    Authors: Christopher Clark, Mark Yatskar, Luke Zettlemoyer

    Abstract: State-of-the-art models often make use of superficial patterns in the data that do not generalize well to out-of-domain or adversarial settings. For example, textual entailment models often learn that particular key words imply entailment, irrespective of context, and visual question answering models learn to predict prototypical answers, without considering evidence in the image. In this paper, w…

    Submitted 9 September, 2019; originally announced September 2019.

    Comments: In EMNLP 2019

  22. arXiv:1908.03557  [pdf, other]

    cs.CV cs.CL cs.LG

    VisualBERT: A Simple and Performant Baseline for Vision and Language

    Authors: Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang

    Abstract: We propose VisualBERT, a simple and flexible framework for modeling a broad range of vision-and-language tasks. VisualBERT consists of a stack of Transformer layers that implicitly align elements of an input text and regions in an associated input image with self-attention. We further propose two visually-grounded language model objectives for pre-training VisualBERT on image caption data. Experim…

    Submitted 9 August, 2019; originally announced August 2019.

    Comments: Work in Progress

  23. arXiv:1904.03310  [pdf, other]

    cs.CL

    Gender Bias in Contextualized Word Embeddings

    Authors: Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, Kai-Wei Chang

    Abstract: In this paper, we quantify, analyze and mitigate gender bias exhibited in ELMo's contextualized word vectors. First, we conduct several intrinsic analyses and find that (1) training data for ELMo contains significantly more male than female entities, (2) the trained ELMo embeddings systematically encode gender information and (3) ELMo unequally encodes gender information about male and female enti…

    Submitted 5 April, 2019; originally announced April 2019.

  24. arXiv:1811.08489  [pdf, other]

    cs.CV

    Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations

    Authors: Tianlu Wang, Jieyu Zhao, Mark Yatskar, Kai-Wei Chang, Vicente Ordonez

    Abstract: In this work, we present a framework to measure and mitigate intrinsic biases with respect to protected variables --such as gender-- in visual recognition tasks. We show that trained models significantly amplify the association of target labels with gender beyond what one would expect from biased datasets. Surprisingly, we show that even when datasets are balanced such that each label co-occurs eq…

    Submitted 10 October, 2019; v1 submitted 20 November, 2018; originally announced November 2018.

    Comments: 10 pages, 7 figures, ICCV 2019

  25. arXiv:1809.10735  [pdf, other]

    cs.CL

    A Qualitative Comparison of CoQA, SQuAD 2.0 and QuAC

    Authors: Mark Yatskar

    Abstract: We compare three new datasets for question answering: SQuAD 2.0, QuAC, and CoQA, along several of their new features: (1) unanswerable questions, (2) multi-turn interactions, and (3) abstractive answers. We show that the datasets provide complementary coverage of the first two aspects, but weak coverage of the third. Because of the datasets' structural similarity, a single extractive model can be…

    Submitted 1 July, 2019; v1 submitted 27 September, 2018; originally announced September 2018.

    Comments: Camera Ready Presented at NAACL '19

  26. arXiv:1808.07036  [pdf, other]

    cs.CL cs.AI cs.LG

    QuAC: Question Answering in Context

    Authors: Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, Luke Zettlemoyer

    Abstract: We present QuAC, a dataset for Question Answering in Context that contains 14K information-seeking QA dialogs (100K questions in total). The dialogs involve two crowd workers: (1) a student who poses a sequence of freeform questions to learn as much as possible about a hidden Wikipedia text, and (2) a teacher who answers the questions by providing short excerpts from the text. QuAC introduces chal…

    Submitted 27 August, 2018; v1 submitted 21 August, 2018; originally announced August 2018.

    Comments: EMNLP Camera Ready

  27. arXiv:1804.06876  [pdf, other]

    cs.CL cs.AI

    Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods

    Authors: Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, Kai-Wei Chang

    Abstract: We introduce a new benchmark, WinoBias, for coreference resolution focused on gender bias. Our corpus contains Winograd-schema style sentences with entities corresponding to people referred by their occupation (e.g. the nurse, the doctor, the carpenter). We demonstrate that a rule-based, a feature-rich, and a neural coreference system all link gendered pronouns to pro-stereotypical entities with h…

    Submitted 18 April, 2018; originally announced April 2018.

    Comments: NAACL '18 Camera Ready

  28. arXiv:1711.06640  [pdf, other]

    cs.CV

    Neural Motifs: Scene Graph Parsing with Global Context

    Authors: Rowan Zellers, Mark Yatskar, Sam Thomson, Yejin Choi

    Abstract: We investigate the problem of producing structured graph representations of visual scenes. Our work analyzes the role of motifs: regularly appearing substructures in scene graphs. We present new quantitative insights on such repeated structures in the Visual Genome dataset. Our analysis shows that object labels are highly predictive of relation labels but not vice-versa. We also find that there ar…

    Submitted 29 March, 2018; v1 submitted 17 November, 2017; originally announced November 2017.

    Comments: CVPR 2018 camera ready

  29. arXiv:1707.09457  [pdf, other]

    cs.AI cs.CL cs.CV stat.ML

    Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints

    Authors: Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, Kai-Wei Chang

    Abstract: Language is increasingly being used to define rich visual recognition problems with supporting image collections sourced from the web. Structured prediction models are used in these tasks to take advantage of correlations between co-occurring labels and visual input but risk inadvertently encoding social biases found in web corpora. In this work, we study data and models associated with multilabel…

    Submitted 28 July, 2017; originally announced July 2017.

    Comments: 11 pages, published in EMNLP 2017

  30. arXiv:1704.08381  [pdf, other]

    cs.CL

    Neural AMR: Sequence-to-Sequence Models for Parsing and Generation

    Authors: Ioannis Konstas, Srinivasan Iyer, Mark Yatskar, Yejin Choi, Luke Zettlemoyer

    Abstract: Sequence-to-sequence models have shown strong performance across a broad range of applications. However, their application to parsing and generating text using Abstract Meaning Representation (AMR) has been limited, due to the relatively limited amount of labeled data and the non-sequential nature of the AMR graphs. We present a novel training procedure that can lift this limitation using millions o…

    Submitted 18 August, 2017; v1 submitted 26 April, 2017; originally announced April 2017.

    Comments: Accepted in ACL 2017

  31. arXiv:1612.00901  [pdf, other]

    cs.CV cs.AI

    Commonly Uncommon: Semantic Sparsity in Situation Recognition

    Authors: Mark Yatskar, Vicente Ordonez, Luke Zettlemoyer, Ali Farhadi

    Abstract: Semantic sparsity is a common challenge in structured visual classification problems; when the output space is complex, the vast majority of the possible predictions are rarely, if ever, seen in the training set. This paper studies semantic sparsity in situation recognition, the task of producing structured summaries of what is happening in images, including activities, objects and the roles objec…

    Submitted 2 December, 2016; originally announced December 2016.

  32. arXiv:1008.1986  [pdf, ps, other]

    cs.CL

    For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia

    Authors: Mark Yatskar, Bo Pang, Cristian Danescu-Niculescu-Mizil, Lillian Lee

    Abstract: We report on work in progress on extracting lexical simplifications (e.g., "collaborate" -> "work together"), focusing on utilizing edit histories in Simple English Wikipedia for this task. We consider two main approaches: (1) deriving simplification probabilities via an edit model that accounts for a mixture of different operations, and (2) using metadata to focus on edits that are more likely to…

    Submitted 11 August, 2010; originally announced August 2010.

    Comments: 4 pp; data available at http://www.cs.cornell.edu/home/llee/data/simple/

    ACM Class: I.2.7

    Journal ref: Proceedings of the NAACL, pp. 365-368, 2010. Short paper