Showing 1–5 of 5 results for author: Krojer, B

  1. arXiv:2310.02567  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Improving Automatic VQA Evaluation Using Large Language Models

    Authors: Oscar Mañas, Benno Krojer, Aishwarya Agrawal

    Abstract: 8 years after the visual question answering (VQA) task was proposed, accuracy remains the primary metric for automatic evaluation. VQA Accuracy has been effective so far in the IID evaluation setting. However, our community is undergoing a shift towards open-ended generative models and OOD evaluation. In this new paradigm, the existing VQA Accuracy metric is overly stringent and underestimates the…

    Submitted 10 January, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted at AAAI 2024 (main track)

  2. arXiv:2306.08818  [pdf, other]

    cs.CL

    Pragmatic Inference with a CLIP Listener for Contrastive Captioning

    Authors: Jiefu Ou, Benno Krojer, Daniel Fried

    Abstract: We propose a simple yet effective and robust method for contrastive captioning: generating discriminative captions that distinguish target images from very similar alternative distractor images. Our approach is built on a pragmatic inference procedure that formulates captioning as a reference game between a speaker, which produces possible captions describing the target, and a listener, which sele…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: Findings of ACL 2023, fixed some references

  3. arXiv:2305.16397  [pdf, other]

    cs.CV cs.AI cs.CL

    Are Diffusion Models Vision-And-Language Reasoners?

    Authors: Benno Krojer, Elinor Poole-Dayan, Vikram Voleti, Christopher Pal, Siva Reddy

    Abstract: Text-conditioned image generation models have recently shown immense qualitative success using denoising diffusion processes. However, unlike discriminative vision-and-language models, it is a non-trivial task to subject these diffusion-based generative models to automatic fine-grained quantitative evaluation of high-level phenomena such as compositionality. Towards this goal, we perform two innov…

    Submitted 2 November, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted to NeurIPS 2023

  4. Image Retrieval from Contextual Descriptions

    Authors: Benno Krojer, Vaibhav Adlakha, Vibhav Vineet, Yash Goyal, Edoardo Ponti, Siva Reddy

    Abstract: The ability to integrate context, including perceptual and temporal cues, plays a pivotal role in grounding the meaning of a linguistic utterance. In order to measure to what extent current vision-and-language models master this ability, we devise a new multimodal challenge, Image Retrieval from Contextual Descriptions (ImageCoDe). In particular, models are tasked with retrieving the correct image…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: Accepted to ACL 2022

  5. arXiv:2006.10413  [pdf, other]

    cs.CL

    Are Pretrained Language Models Symbolic Reasoners Over Knowledge?

    Authors: Nora Kassner, Benno Krojer, Hinrich Schütze

    Abstract: How can pretrained language models (PLMs) learn factual knowledge from the training set? We investigate the two most important mechanisms: reasoning and memorization. Prior work has attempted to quantify the number of facts PLMs learn, but we present, using synthetic data, the first study that investigates the causal relation between facts present in training and facts learned by the PLM. For reas…

    Submitted 10 October, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted to CoNLL 2020