Showing 1–5 of 5 results for author: Bravo, M A

  1. arXiv:2402.07270

    cs.CV cs.CL cs.LG

    Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy

    Authors: Simon Ging, María A. Bravo, Thomas Brox

    Abstract: The evaluation of text-generative vision-language models is a challenging yet crucial endeavor. By addressing the limitations of existing Visual Question Answering (VQA) benchmarks and proposing innovative evaluation methodologies, our research seeks to advance our understanding of these models' capabilities. We propose a novel VQA benchmark based on well-known visual classification datasets which…

    Submitted 5 May, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: Accepted as Spotlight Paper for ICLR 2024. The first two authors contributed equally to this work

  2. arXiv:2211.12914

    cs.CV cs.LG

    Open-vocabulary Attribute Detection

    Authors: María A. Bravo, Sudhanshu Mittal, Simon Ging, Thomas Brox

    Abstract: Vision-language modeling has enabled open-vocabulary tasks where predictions can be queried using any text prompt in a zero-shot manner. Existing open-vocabulary tasks focus on object classes, whereas research on object attributes is limited due to the lack of a reliable attribute-focused evaluation benchmark. This paper introduces the Open-Vocabulary Attribute Detection (OVAD) task and the corres…

    Submitted 8 March, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

    Comments: Accepted at CVPR 2023. https://ovad-benchmark.github.io

  3. arXiv:2208.14195

    cs.CV

    Probing Contextual Diversity for Dense Out-of-Distribution Detection

    Authors: Silvio Galesso, Maria Alejandra Bravo, Mehdi Naouar, Thomas Brox

    Abstract: Detection of out-of-distribution (OoD) samples in the context of image classification has recently become an area of interest and active study, along with the topic of uncertainty estimation, to which it is closely related. In this paper we explore the task of OoD segmentation, which has been studied less than its classification counterpart and presents additional challenges. Segmentation is a den…

    Submitted 30 August, 2022; originally announced August 2022.

    Comments: Safe Artificial Intelligence for Automated Driving Workshop, ECCV 2022

  4. arXiv:2205.06160

    cs.CV cs.LG

    Localized Vision-Language Matching for Open-vocabulary Object Detection

    Authors: Maria A. Bravo, Sudhanshu Mittal, Thomas Brox

    Abstract: In this work, we propose an open-vocabulary object detection method that, based on image-caption pairs, learns to detect novel object classes along with a given set of known classes. It is a two-stage training approach that first uses a location-guided image-caption matching technique to learn class labels for both novel and known classes in a weakly-supervised manner and second specializes the mo…

    Submitted 28 July, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

    Comments: Accepted at DAGM German Conference on Pattern Recognition (GCPR 2022)

  5. arXiv:1904.05847

    cs.CV

    MAIN: Multi-Attention Instance Network for Video Segmentation

    Authors: Juan Leon Alcazar, Maria A. Bravo, Ali K. Thabet, Guillaume Jeanneret, Thomas Brox, Pablo Arbelaez, Bernard Ghanem

    Abstract: Instance-level video segmentation requires a solid integration of spatial and temporal information. However, current methods rely mostly on domain-specific information (online learning) to produce accurate instance-level segmentations. We propose a novel approach that relies exclusively on the integration of generic spatio-temporal attention cues. Our strategy, named Multi-Attention Instance Netwo…

    Submitted 11 April, 2019; originally announced April 2019.