
Showing 1–26 of 26 results for author: Uijlings, J

  1. arXiv:2405.19773  [pdf, other]

    cs.CV

    VQA Training Sets are Self-play Environments for Generating Few-shot Pools

    Authors: Tautvydas Misiunas, Hassan Mansoor, Jasper Uijlings, Oriana Riva, Victor Carbune

    Abstract: Large-language models and large-vision models are increasingly capable of solving compositional reasoning tasks, as measured by breakthroughs in visual-question answering benchmarks. However, state-of-the-art solutions often involve careful construction of large pre-training and fine-tuning datasets, which can be expensive. The use of external tools, whether other ML models, search engines, or API…

    Submitted 30 May, 2024; originally announced May 2024.

  2. arXiv:2404.05465  [pdf, other]

    cs.CV cs.LG

    HAMMR: HierArchical MultiModal React agents for generic VQA

    Authors: Lluis Castrejon, Thomas Mensink, Howard Zhou, Vittorio Ferrari, Andre Araujo, Jasper Uijlings

    Abstract: Combining Large Language Models (LLMs) with external specialized tools (LLMs+tools) is a recent paradigm to solve multimodal tasks such as Visual Question Answering (VQA). While this approach was demonstrated to work well when optimized and evaluated for each individual benchmark, in practice it is crucial for the next generation of real-world AI systems to handle a broad range of multimodal probl…

    Submitted 8 April, 2024; originally announced April 2024.

  3. arXiv:2310.06641  [pdf, other]

    cs.CV

    How (not) to ensemble LVLMs for VQA

    Authors: Lisa Alazraki, Lluis Castrejon, Mostafa Dehghani, Fantine Huot, Jasper Uijlings, Thomas Mensink

    Abstract: This paper studies ensembling in the era of Large Vision-Language Models (LVLMs). Ensembling is a classical method to combine different models to get increased performance. In the recent work on Encyclopedic-VQA the authors examine a wide variety of models to solve their task: from vanilla LVLMs, to models including the caption as extra context, to models augmented with Lens-based retrieval of Wik…

    Submitted 7 December, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: 4th I Can't Believe It's Not Better Workshop (co-located with NeurIPS 2023)

  4. arXiv:2306.09224  [pdf, other]

    cs.CV

    Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories

    Authors: Thomas Mensink, Jasper Uijlings, Lluis Castrejon, Arushi Goel, Felipe Cadar, Howard Zhou, Fei Sha, André Araujo, Vittorio Ferrari

    Abstract: We propose Encyclopedic-VQA, a large scale visual question answering (VQA) dataset featuring visual questions about detailed properties of fine-grained categories and instances. It contains 221k unique question+answer pairs each matched with (up to) 5 images, resulting in a total of 1M VQA samples. Moreover, our dataset comes with a controlled knowledge base derived from Wikipedia, marking the evi…

    Submitted 24 July, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: ICCV'23

  5. arXiv:2206.04453  [pdf, other]

    cs.CV

    The Missing Link: Finding label relations across datasets

    Authors: Jasper Uijlings, Thomas Mensink, Vittorio Ferrari

    Abstract: Computer vision is driven by the many datasets available for training or evaluating novel methods. However, each dataset has a different set of class labels, visual definition of classes, images following a specific distribution, annotation protocols, etc. In this paper we explore the automatic discovery of visual-semantic relations between labels across datasets. We aim to understand how instance…

    Submitted 9 August, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: ECCV 2022

  6. arXiv:2204.01403  [pdf, other]

    cs.CV

    How stable are Transferability Metrics evaluations?

    Authors: Andrea Agostinelli, Michal Pándy, Jasper Uijlings, Thomas Mensink, Vittorio Ferrari

    Abstract: Transferability metrics is a maturing field with increasing interest, which aims at providing heuristics for selecting the most suitable source models to transfer to a given target dataset, without fine-tuning them all. However, existing works rely on custom experimental setups which differ across papers, leading to inconsistent conclusions about which transferability metrics work best. In this pa…

    Submitted 20 October, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: ECCV 2022

  7. arXiv:2111.13011  [pdf, other]

    cs.CV

    Transferability Metrics for Selecting Source Model Ensembles

    Authors: Andrea Agostinelli, Jasper Uijlings, Thomas Mensink, Vittorio Ferrari

    Abstract: We address the problem of ensemble selection in transfer learning: Given a large pool of source models we want to select an ensemble of models which, after fine-tuning on the target training set, yields the best performance on the target test set. Since fine-tuning all possible ensembles is computationally prohibitive, we aim at predicting performance on the target dataset using a computationally…

    Submitted 31 March, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

  8. arXiv:2111.12780  [pdf, other]

    cs.CV

    Transferability Estimation using Bhattacharyya Class Separability

    Authors: Michal Pándy, Andrea Agostinelli, Jasper Uijlings, Vittorio Ferrari, Thomas Mensink

    Abstract: Transfer learning has become a popular method for leveraging pre-trained models in computer vision. However, without performing computationally expensive fine-tuning, it is difficult to quantify which pre-trained source models are suitable for a specific target task, or, conversely, to which tasks a pre-trained source model can be easily adapted to. In this work, we propose Gaussian Bhattacharyya…

    Submitted 11 April, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

    Comments: Accepted for CVPR 2022

  9. arXiv:2103.13318  [pdf, other]

    cs.CV

    Factors of Influence for Transfer Learning across Diverse Appearance Domains and Task Types

    Authors: Thomas Mensink, Jasper Uijlings, Alina Kuznetsova, Michael Gygli, Vittorio Ferrari

    Abstract: Transfer learning enables to re-use knowledge learned on a source task to help learning a target task. A simple form of transfer learning is common in current state-of-the-art computer vision models, i.e. pre-training a model for image classification on the ILSVRC dataset, and then fine-tune on any target task. However, previous systematic studies of transfer learning have been limited and the cir…

    Submitted 20 November, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: Accepted for future publication in TPAMI

  10. arXiv:2004.03898  [pdf, other]

    cs.LG cs.CV stat.ML

    Towards Reusable Network Components by Learning Compatible Representations

    Authors: Michael Gygli, Jasper Uijlings, Vittorio Ferrari

    Abstract: This paper proposes to make a first step towards compatible and hence reusable network components. Rather than training networks for different tasks independently, we adapt the training process to produce network components that are compatible across tasks. In particular, we split a network into two components, a features extractor and a target task head, and propose various approaches to accompli…

    Submitted 16 December, 2020; v1 submitted 8 April, 2020; originally announced April 2020.

    Comments: Preprint; To be presented at AAAI 2021

  11. arXiv:1912.03098  [pdf, other]

    cs.CV

    Connecting Vision and Language with Localized Narratives

    Authors: Jordi Pont-Tuset, Jasper Uijlings, Soravit Changpinyo, Radu Soricut, Vittorio Ferrari

    Abstract: We propose Localized Narratives, a new form of multimodal image annotations connecting vision and language. We ask annotators to describe an image with their voice while simultaneously hovering their mouse over the region they are describing. Since the voice and the mouse pointer are synchronized, we can localize every single word in the description. This dense visual grounding takes the form of a…

    Submitted 20 July, 2020; v1 submitted 6 December, 2019; originally announced December 2019.

    Comments: ECCV 2020 Camera Ready

  12. arXiv:1911.12709  [pdf, other]

    cs.CV

    Continuous Adaptation for Interactive Object Segmentation by Learning from Corrections

    Authors: Theodora Kontogianni, Michael Gygli, Jasper Uijlings, Vittorio Ferrari

    Abstract: In interactive object segmentation a user collaborates with a computer vision model to segment an object. Recent works employ convolutional neural networks for this task: Given an image and a set of corrections made by the user as input, they output a segmentation mask. These approaches achieve strong performance by training on large datasets but they keep the model parameters unchanged at test ti…

    Submitted 8 November, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

    Comments: ECCV 2020 Camera Ready

  13. arXiv:1906.06798  [pdf, other]

    cs.CV

    Panoptic Image Annotation with a Collaborative Assistant

    Authors: Jasper R. R. Uijlings, Mykhaylo Andriluka, Vittorio Ferrari

    Abstract: This paper aims to reduce the time to annotate images for panoptic segmentation, which requires annotating segmentation masks and class labels for all object instances and stuff regions. We formulate our approach as a collaborative process between an annotator and an automated assistant who take turns to jointly annotate an image using a predefined pool of segments. Actions performed by the annota…

    Submitted 15 December, 2020; v1 submitted 16 June, 2019; originally announced June 2019.

  14. arXiv:1812.01888  [pdf, other]

    cs.CV

    Interactive Full Image Segmentation by Considering All Regions Jointly

    Authors: Eirikur Agustsson, Jasper R. R. Uijlings, Vittorio Ferrari

    Abstract: We address interactive full image annotation, where the goal is to accurately segment all object and stuff regions in an image. We propose an interactive, scribble-based annotation framework which operates on the whole image to produce segmentations for all regions. This enables sharing scribble corrections across regions, and allows the annotator to focus on the largest errors made by the machine…

    Submitted 10 April, 2019; v1 submitted 5 December, 2018; originally announced December 2018.

    Comments: Accepted to CVPR 2019

  15. The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale

    Authors: Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Alexander Kolesnikov, Tom Duerig, Vittorio Ferrari

    Abstract: We present Open Images V4, a dataset of 9.2M images with unified annotations for image classification, object detection and visual relationship detection. The images have a Creative Commons Attribution license that allows to share and adapt the material, and they have been collected from Flickr without a predefined list of class names or tags, leading to natural class statistics and avoiding an in…

    Submitted 21 February, 2020; v1 submitted 2 November, 2018; originally announced November 2018.

    Comments: Accepted to International Journal of Computer Vision, 2020

  16. Fluid Annotation: A Human-Machine Collaboration Interface for Full Image Annotation

    Authors: Mykhaylo Andriluka, Jasper R. R. Uijlings, Vittorio Ferrari

    Abstract: We introduce Fluid Annotation, an intuitive human-machine collaboration interface for annotating the class label and outline of every object and background region in an image. Fluid annotation is based on three principles: (I) Strong Machine-Learning aid. We start from the output of a strong neural network model, which the annotator can edit by correcting the labels of existing regions, adding new…

    Submitted 20 December, 2018; v1 submitted 19 June, 2018; originally announced June 2018.

    Comments: ACM MultiMedia 2018. Live demo is available at fluidann.appspot.com

  17. arXiv:1712.08087  [pdf, other]

    cs.CV

    Learning Intelligent Dialogs for Bounding Box Annotation

    Authors: Ksenia Konyushkova, Jasper Uijlings, Christoph Lampert, Vittorio Ferrari

    Abstract: We introduce Intelligent Annotation Dialogs for bounding box annotation. We train an agent to automatically choose a sequence of actions for a human annotator to produce a bounding box in a minimal amount of time. Specifically, we consider two actions: box verification, where the annotator verifies a box generated by an object detector, and manual box drawing. We explore two kinds of agents, one b…

    Submitted 20 November, 2018; v1 submitted 21 December, 2017; originally announced December 2017.

    Comments: This paper appeared at CVPR 2018

  18. arXiv:1708.06128  [pdf, other]

    cs.CV

    Revisiting knowledge transfer for training object class detectors

    Authors: Jasper Uijlings, Stefan Popov, Vittorio Ferrari

    Abstract: We propose to revisit knowledge transfer for training object detectors on target classes from weakly supervised training images, helped by a set of source classes with bounding-box annotations. We present a unified knowledge transfer framework based on training a single neural network multi-class object detector over all source classes, organized in a semantic hierarchy. This generates proposals w…

    Submitted 28 March, 2018; v1 submitted 21 August, 2017; originally announced August 2017.

    Comments: CVPR 18

  19. arXiv:1708.02750  [pdf, other]

    cs.CV

    Extreme clicking for efficient object annotation

    Authors: Dim P. Papadopoulos, Jasper R. R. Uijlings, Frank Keller, Vittorio Ferrari

    Abstract: Manually annotating object bounding boxes is central to building computer vision datasets, and it is very time consuming (annotating ILSVRC [53] took 35s for one high-quality box [62]). It involves clicking on imaginary corners of a tight box around the object. This is difficult as these corners are often outside the actual object and several adjustments are required to obtain a tight box. We prop…

    Submitted 9 August, 2017; originally announced August 2017.

    Comments: ICCV 2017

  20. arXiv:1707.05847  [pdf, other]

    cs.CV

    The Devil is in the Decoder: Classification, Regression and GANs

    Authors: Zbigniew Wojna, Vittorio Ferrari, Sergio Guadarrama, Nathan Silberman, Liang-Chieh Chen, Alireza Fathi, Jasper Uijlings

    Abstract: Many machine vision applications, such as semantic segmentation and depth prediction, require predictions for every pixel of the input image. Models for such problems usually consist of encoders which decrease spatial resolution while learning a high-dimensional representation, followed by decoders who recover the original input resolution and result in low-dimensional predictions. While encoders…

    Submitted 19 February, 2019; v1 submitted 18 July, 2017; originally announced July 2017.

  21. arXiv:1704.06189  [pdf, other]

    cs.CV

    Training object class detectors with click supervision

    Authors: Dim P. Papadopoulos, Jasper R. R. Uijlings, Frank Keller, Vittorio Ferrari

    Abstract: Training object class detectors typically requires a large set of images with objects annotated by bounding boxes. However, manually drawing bounding boxes is very time consuming. In this paper we greatly reduce annotation time by proposing center-click annotations: we ask annotators to click on the center of an imaginary bounding box which tightly encloses the object instance. We then incorporate…

    Submitted 19 May, 2017; v1 submitted 20 April, 2017; originally announced April 2017.

    Comments: CVPR 2017

  22. arXiv:1612.03716  [pdf, other]

    cs.CV

    COCO-Stuff: Thing and Stuff Classes in Context

    Authors: Holger Caesar, Jasper Uijlings, Vittorio Ferrari

    Abstract: Semantic classes can be either things (objects with a well-defined shape, e.g. car, person) or stuff (amorphous background regions, e.g. grass, sky). While lots of classification and detection works focus on thing classes, less attention has been given to stuff classes. Nonetheless, stuff classes are important as they allow to explain important aspects of an image, including (1) scene type; (2) wh…

    Submitted 28 March, 2018; v1 submitted 12 December, 2016; originally announced December 2016.

    Comments: CVPR 2018 camera-ready

  23. arXiv:1607.07671  [pdf, other]

    cs.CV

    Region-based semantic segmentation with end-to-end training

    Authors: Holger Caesar, Jasper Uijlings, Vittorio Ferrari

    Abstract: We propose a novel method for semantic segmentation, the task of labeling each pixel in an image with a semantic class. Our method combines the advantages of the two main competing paradigms. Methods based on region classification offer proper spatial support for appearance measurements, but typically operate in two separate stages, none of which targets pixel labeling performance at the end of th…

    Submitted 26 July, 2016; originally announced July 2016.

    Comments: ECCV 2016 camera-ready

  24. arXiv:1602.08405  [pdf, other]

    cs.CV

    We don't need no bounding-boxes: Training object class detectors using only human verification

    Authors: Dim P. Papadopoulos, Jasper R. R. Uijlings, Frank Keller, Vittorio Ferrari

    Abstract: Training object class detectors typically requires a large set of images in which objects are annotated by bounding-boxes. However, manually drawing bounding-boxes is very time consuming. We propose a new scheme for training object detectors which only requires annotators to verify bounding-boxes produced automatically by the learning algorithm. Our scheme iterates between re-training the detector…

    Submitted 24 April, 2017; v1 submitted 26 February, 2016; originally announced February 2016.

    Comments: CVPR 2016, pp. 854-863. Las Vegas, NV

  25. arXiv:1507.01581  [pdf, other]

    cs.CV

    Joint Calibration for Semantic Segmentation

    Authors: Holger Caesar, Jasper Uijlings, Vittorio Ferrari

    Abstract: Semantic segmentation is the task of assigning a class-label to each pixel in an image. We propose a region-based semantic segmentation framework which handles both full and weak supervision, and addresses three common problems: (1) Objects occur at multiple scales and therefore we should use regions at multiple scales. However, these regions are overlapping which creates conflicting class predict…

    Submitted 12 August, 2015; v1 submitted 6 July, 2015; originally announced July 2015.

    Comments: Includes improved results based on VGG16 CNN

    MSC Class: 68T45

    Journal ref: BMVC 2015

  26. arXiv:1504.06434  [pdf, other]

    cs.CV

    Situational Object Boundary Detection

    Authors: Jasper Uijlings, Vittorio Ferrari

    Abstract: Intuitively, the appearance of true object boundaries varies from image to image. Hence the usual monolithic approach of training a single boundary predictor and applying it to all images regardless of their content is bound to be suboptimal. In this paper we therefore propose situational object boundary detection: We first define a variety of situations and train a specialized object boundary det…

    Submitted 24 April, 2015; originally announced April 2015.