Showing 1–13 of 13 results for author: Whitehead, S

Searching in archive cs.
  1. arXiv:2306.08751

    cs.CV

    Improving Selective Visual Question Answering by Learning from Your Peers

    Authors: Corentin Dancette, Spencer Whitehead, Rishabh Maheshwary, Ramakrishna Vedantam, Stefan Scherer, Xinlei Chen, Matthieu Cord, Marcus Rohrbach

    Abstract: Despite advances in Visual Question Answering (VQA), the ability of models to assess their own correctness remains underexplored. Recent work has shown that VQA models, out-of-the-box, can have difficulties abstaining from answering when they are wrong. The option to abstain, also called Selective Prediction, is highly relevant when deploying systems to users who must trust the system's output (e.…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: CVPR 2023. Code available here: https://github.com/facebookresearch/selective-vqa_ood

  2. arXiv:2305.07021

    cs.CV

    Simple Token-Level Confidence Improves Caption Correctness

    Authors: Suzanne Petryk, Spencer Whitehead, Joseph E. Gonzalez, Trevor Darrell, Anna Rohrbach, Marcus Rohrbach

    Abstract: The ability to judge whether a caption correctly describes an image is a critical part of vision-language understanding. However, state-of-the-art models often misinterpret the correctness of fine-grained details, leading to errors in outputs such as hallucinating objects in generated captions or poor compositional reasoning. In this work, we explore Token-Level Confidence, or TLC, as a simple yet…

    Submitted 11 May, 2023; originally announced May 2023.

  3. arXiv:2304.02643

    cs.CV cs.AI cs.LG

    Segment Anything

    Authors: Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick

    Abstract: We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: Project web-page: https://segment-anything.com

  4. arXiv:2204.13631

    cs.CV

    Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly

    Authors: Spencer Whitehead, Suzanne Petryk, Vedaad Shakib, Joseph Gonzalez, Trevor Darrell, Anna Rohrbach, Marcus Rohrbach

    Abstract: Machine learning has advanced dramatically, narrowing the accuracy gap to humans in multimodal tasks like visual question answering (VQA). However, while humans can say "I don't know" when they are uncertain (i.e., abstain from answering a question), such ability has been largely neglected in multimodal research, despite the importance of this problem to the usage of VQA in real settings. In this…

    Submitted 20 October, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

    Comments: ECCV 2022. Code and models are available here: https://github.com/facebookresearch/reliable_vqa

  5. arXiv:2107.09106

    cs.CV cs.CL cs.LG

    Separating Skills and Concepts for Novel Visual Question Answering

    Authors: Spencer Whitehead, Hui Wu, Heng Ji, Rogerio Feris, Kate Saenko

    Abstract: Generalization to out-of-distribution data has been a problem for Visual Question Answering (VQA) models. To measure generalization to novel questions, we propose to separate them into "skills" and "concepts". "Skills" are visual tasks, such as counting or attribute recognition, and are applied to "concepts" mentioned in the question, such as objects and people. VQA methods should be able to compo…

    Submitted 19 July, 2021; originally announced July 2021.

    Comments: Paper at CVPR 2021. 14 pages, 7 figures

  6. arXiv:2011.13406

    cs.CV

    Learning from Lexical Perturbations for Consistent Visual Question Answering

    Authors: Spencer Whitehead, Hui Wu, Yi Ren Fung, Heng Ji, Rogerio Feris, Kate Saenko

    Abstract: Existing Visual Question Answering (VQA) models are often fragile and sensitive to input variations. In this paper, we propose a novel approach to address this issue based on modular networks, which creates two questions related by linguistic perturbations and regularizes the visual reasoning process between them to be consistent during training. We show that our framework markedly improves consis…

    Submitted 22 December, 2020; v1 submitted 26 November, 2020; originally announced November 2020.

    Comments: 14 pages, 8 figures

  7. arXiv:2010.09270

    cs.CL cs.AI

    Global Attention for Name Tagging

    Authors: Boliang Zhang, Spencer Whitehead, Lifu Huang, Heng Ji

    Abstract: Many name tagging approaches use local contextual information with much success, but fail when the local context is ambiguous or limited. We present a new framework to improve name tagging by utilizing local, document-level, and corpus-level contextual information. We retrieve document-level context from other sentences within the same document and corpus-level context from sentences in other topi…

    Submitted 19 October, 2020; originally announced October 2020.

  8. arXiv:2005.02472

    cs.MM cs.CL cs.CV cs.LG

    Cross-media Structured Common Space for Multimedia Event Extraction

    Authors: Manling Li, Alireza Zareian, Qi Zeng, Spencer Whitehead, Di Lu, Heng Ji, Shih-Fu Chang

    Abstract: We introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents. We develop the first benchmark and collect a dataset of 245 multimedia news articles with extensively annotated events and arguments. We propose a novel method, Weakly Aligned Structured Embedding (WASE), that encodes structured representations of semantic inform…

    Submitted 5 May, 2020; originally announced May 2020.

    Comments: Accepted as an oral paper at ACL 2020

  9. arXiv:1911.02065

    cs.AI cs.LG cs.LO

    A Deep Reinforcement Learning Approach to First-Order Logic Theorem Proving

    Authors: Maxwell Crouse, Ibrahim Abdelaziz, Bassem Makni, Spencer Whitehead, Cristina Cornelio, Pavan Kapanipathi, Kavitha Srinivas, Veronika Thost, Michael Witbrock, Achille Fokoue

    Abstract: Automated theorem provers have traditionally relied on manually tuned heuristics to guide how they perform proof search. Deep reinforcement learning has been proposed as a way to obviate the need for such heuristics, however, its deployment in automated theorem proving remains a challenge. In this paper we introduce TRAIL, a system that applies deep reinforcement learning to saturation-based theor…

    Submitted 15 September, 2020; v1 submitted 5 November, 2019; originally announced November 2019.

  10. arXiv:1911.02060

    cs.CL cs.AI

    Infusing Knowledge into the Textual Entailment Task Using Graph Convolutional Networks

    Authors: Pavan Kapanipathi, Veronika Thost, Siva Sankalp Patel, Spencer Whitehead, Ibrahim Abdelaziz, Avinash Balakrishnan, Maria Chang, Kshitij Fadnis, Chulaka Gunasekara, Bassem Makni, Nicholas Mattei, Kartik Talamadupula, Achille Fokoue

    Abstract: Textual entailment is a fundamental task in natural language processing. Most approaches for solving the problem use only the textual content present in training data. A few approaches have shown that information from external knowledge sources like knowledge graphs (KGs) can add value, in addition to the textual content, by providing background knowledge that may be critical for a task. However,…

    Submitted 21 November, 2019; v1 submitted 5 November, 2019; originally announced November 2019.

  11. Studying Wythoff and Zometool Constructions using Maple

    Authors: Benoit Charbonneau, Spencer Whitehead

    Abstract: We describe a Maple package that serves at least four purposes. First, one can use it to compute whether or not a given polyhedral structure is Zometool constructible. Second, one can use it to manipulate Zometool objects, for example to determine how to best build a given structure. Third, the package allows for an easy computation of the polytopes obtained by the kaleidoscopic construction call…

    Submitted 16 August, 2019; originally announced August 2019.

    Comments: 11 pages, 11 figures

    Journal ref: In: Gerhard J., Kotsireas I. (eds) Maple in Mathematics Education and Research. MC 2019. Communications in Computer and Information Science, vol 1125. Springer

  12. Paper Abstract Writing through Editing Mechanism

    Authors: Qingyun Wang, Zhihao Zhou, Lifu Huang, Spencer Whitehead, Boliang Zhang, Heng Ji, Kevin Knight

    Abstract: We present a paper abstract writing system based on an attentive neural sequence-to-sequence model that can take a title as input and automatically generate an abstract. We design a novel Writing-editing Network that can attend to both the title and the previously generated abstract drafts and then iteratively revise and polish the abstract. With two series of Turing tests, where the human judges…

    Submitted 15 May, 2018; originally announced May 2018.

    Comments: * Equal contribution. 6 pages. Accepted by ACL 2018; The code and dataset are available at https://github.com/EagleW/Writing-editing-Network

  13. arXiv:1804.07889

    cs.CL

    Entity-aware Image Caption Generation

    Authors: Di Lu, Spencer Whitehead, Lifu Huang, Heng Ji, Shih-Fu Chang

    Abstract: Current image captioning approaches generate descriptions which lack specific information, such as named entities that are involved in the images. In this paper we propose a new task which aims to generate informative image captions, given images and hashtags as input. We propose a simple but effective approach to tackle this problem. We first train a convolutional neural networks - long short ter…

    Submitted 6 November, 2018; v1 submitted 21 April, 2018; originally announced April 2018.

    Comments: In proceedings of EMNLP 2018