Showing 1–34 of 34 results for author: Brostow, G

  1. arXiv:2407.05921  [pdf, other]

    cs.CV cs.AI cs.LG

    TAPVid-3D: A Benchmark for Tracking Any Point in 3D

    Authors: Skanda Koppula, Ignacio Rocco, Yi Yang, Joe Heyward, João Carreira, Andrew Zisserman, Gabriel Brostow, Carl Doersch

    Abstract: We introduce a new benchmark, TAPVid-3D, for evaluating the task of long-range Tracking Any Point in 3D (TAP-3D). While point tracking in two dimensions (TAP) has many benchmarks measuring performance on real-world videos, such as TAPVid-DAVIS, three-dimensional point tracking has none. To this end, leveraging existing footage, we build a new benchmark for 3D point tracking featuring 4,000+ real-w…

    Submitted 8 July, 2024; originally announced July 2024.

  2. arXiv:2406.18387  [pdf, other]

    cs.CV cs.LG

    DoubleTake: Geometry Guided Depth Estimation

    Authors: Mohamed Sayed, Filippo Aleotti, Jamie Watson, Zawar Qureshi, Guillermo Garcia-Hernando, Gabriel Brostow, Sara Vicente, Michael Firman

    Abstract: Estimating depth from a sequence of posed RGB images is a fundamental computer vision task, with applications in augmented reality, path planning etc. Prior work typically makes use of previous frames in a multi view stereo framework, relying on matching textures in a local neighborhood. In contrast, our model leverages historical predictions by giving the latest 3D geometry data as an extra input…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.08960  [pdf, other]

    cs.CV

    AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings

    Authors: Jamie Watson, Filippo Aleotti, Mohamed Sayed, Zawar Qureshi, Oisin Mac Aodha, Gabriel Brostow, Michael Firman, Sara Vicente

    Abstract: Extracting planes from a 3D scene is useful for downstream tasks in robotics and augmented reality. In this paper we tackle the problem of estimating the planar surfaces in a scene from posed images. Our first finding is that a surprisingly competitive baseline results from combining popular clustering algorithms with recent improvements in 3D geometry estimation. However, such purely geometric me…

    Submitted 13 June, 2024; originally announced June 2024.

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  4. arXiv:2306.01596  [pdf, other]

    cs.CV

    Two-View Geometry Scoring Without Correspondences

    Authors: Axel Barroso-Laguna, Eric Brachmann, Victor Adrian Prisacariu, Gabriel J. Brostow, Daniyar Turmukhambetov

    Abstract: Camera pose estimation for two-view geometry traditionally relies on RANSAC. Normally, a multitude of image correspondences leads to a pool of proposed hypotheses, which are then scored to find a winning model. The inlier count is generally regarded as a reliable indicator of "consensus". We examine this scoring heuristic, and find that it favors disappointing models under certain circumstances. A…

    Submitted 2 June, 2023; originally announced June 2023.

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

  5. arXiv:2305.07014  [pdf, other]

    cs.CV

    Virtual Occlusions Through Implicit Depth

    Authors: Jamie Watson, Mohamed Sayed, Zawar Qureshi, Gabriel J. Brostow, Sara Vicente, Oisin Mac Aodha, Michael Firman

    Abstract: For augmented reality (AR), it is important that virtual assets appear to 'sit among' real world objects. The virtual element should variously occlude and be occluded by real matter, based on a plausible depth ordering. This occlusion should be consistent over time as the viewer's camera moves. Unfortunately, small mistakes in the estimated scene depth can ruin the downstream occlusion mask, and t…

    Submitted 11 May, 2023; originally announced May 2023.

    Comments: Accepted to CVPR 2023

  6. Automatic Joint Parameter Estimation from Magnetic Motion Capture Data

    Authors: James F. O'Brien, Robert E. Bodenheimer, Gabriel J. Brostow, Jessica K. Hodgins

    Abstract: This paper describes a technique for using magnetic motion capture data to determine the joint parameters of an articulated hierarchy. This technique makes it possible to determine limb lengths, joint locations, and sensor placement for a human subject without external measurements. Instead, the joint parameters are inferred with high accuracy from the motion data acquired during the capture sessi…

    Submitted 18 March, 2023; originally announced March 2023.

    Comments: 8 pages, 8 figures, 4 tables

    ACM Class: I.3.5

    Journal ref: In Proceedings of Graphics Interface 2000, pages 53-60, May 2000

  7. arXiv:2212.11966  [pdf, other]

    cs.CV

    Removing Objects From Neural Radiance Fields

    Authors: Silvan Weder, Guillermo Garcia-Hernando, Aron Monszpart, Marc Pollefeys, Gabriel Brostow, Michael Firman, Sara Vicente

    Abstract: Neural Radiance Fields (NeRFs) are emerging as a ubiquitous scene representation that allows for novel view synthesis. Increasingly, NeRFs will be shareable with other people. Before sharing a NeRF, though, it might be desirable to remove personal information or unsightly objects. Such removal is not easily achieved with the current NeRF editing frameworks. We propose a framework to remove objects…

    Submitted 22 December, 2022; originally announced December 2022.

  8. arXiv:2212.07098  [pdf, other]

    cs.CV cs.GR

    Interactive Sketching of Mannequin Poses

    Authors: Gizem Unlu, Mohamed Sayed, Gabriel Brostow

    Abstract: It can be easy and even fun to sketch humans in different poses. In contrast, creating those same poses on a 3D graphics "mannequin" is comparatively tedious. Yet 3D body poses are necessary for various downstream applications. We seek to preserve the convenience of 2D sketching while giving users of different skill levels the flexibility to accurately and more quickly pose/refine a 3D manne…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: accepted and published at 3DV 2022

  9. arXiv:2210.03794  [pdf, other]

    cs.CV

    SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models

    Authors: Omiros Pantazis, Gabriel Brostow, Kate Jones, Oisin Mac Aodha

    Abstract: Vision-language models such as CLIP are pretrained on large volumes of internet sourced image and text pairs, and have been shown to sometimes exhibit impressive zero- and low-shot image classification performance. However, due to their size, fine-tuning these models on new datasets can be prohibitively expensive, both in terms of the supervision and compute required. To combat this, a series of l…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: BMVC 2022

  10. arXiv:2108.06435  [pdf, other]

    cs.CV cs.LG

    Focus on the Positives: Self-Supervised Learning for Biodiversity Monitoring

    Authors: Omiros Pantazis, Gabriel Brostow, Kate Jones, Oisin Mac Aodha

    Abstract: We address the problem of learning self-supervised representations from unlabeled image collections. Unlike existing approaches that attempt to learn useful features by maximizing similarity between augmented versions of each input image or by speculatively picking negative samples, we instead also make use of the natural variation that occurs in image collections that are captured using static mo…

    Submitted 13 August, 2021; originally announced August 2021.

    Comments: ICCV 2021

  11. arXiv:2104.14540  [pdf, other]

    cs.CV

    The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth

    Authors: Jamie Watson, Oisin Mac Aodha, Victor Prisacariu, Gabriel Brostow, Michael Firman

    Abstract: Self-supervised monocular depth estimation networks are trained to predict scene depth using nearby frames as a supervision signal during training. However, for many applications, sequence information in the form of video frames is also available at test time. The vast majority of monocular networks do not make use of this extra signal, thus ignoring valuable information that could be used to impr…

    Submitted 14 July, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: CVPR 2021

  12. arXiv:2104.03962  [pdf, other]

    cs.CV

    Panoptic Segmentation Forecasting

    Authors: Colin Graber, Grace Tsai, Michael Firman, Gabriel Brostow, Alexander Schwing

    Abstract: Our goal is to forecast the near future given a set of recent observations. We think this ability to forecast, i.e., to anticipate, is integral for the success of autonomous agents which need not only passively analyze an observation but also must react to it in real-time. Importantly, accurate forecasting hinges upon the chosen scene decomposition. We think that superior forecasting can be achiev…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: CVPR 2021

  13. arXiv:2104.02538  [pdf, other]

    cs.CV

    Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision

    Authors: Mehmet Ozgur Turkoglu, Eric Brachmann, Konrad Schindler, Gabriel Brostow, Aron Monszpart

    Abstract: Visual re-localization means using a single image as input to estimate the camera's location and orientation relative to a pre-recorded environment. The highest-scoring methods are "structure based," and need the query camera's intrinsics as an input to the model, with careful geometric optimization. When intrinsics are absent, methods vie for accuracy by making various other assumptions. This yie…

    Submitted 12 April, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

  14. arXiv:2012.02087  [pdf, other]

    cs.GR cs.HC cs.RO

    LookOut! Interactive Camera Gimbal Controller for Filming Long Takes

    Authors: Mohamed Sayed, Robert Cinca, Enrico Costanza, Gabriel Brostow

    Abstract: The job of a camera operator is challenging, and potentially dangerous, when filming long moving camera shots. Broadly, the operator must keep the actors in-frame while safely navigating around obstacles, and while fulfilling an artistic vision. We propose a unified hardware and software system that distributes some of the camera operator's burden, freeing them up to focus on safety and aesthetics…

    Submitted 30 April, 2022; v1 submitted 3 December, 2020; originally announced December 2020.

    Comments: V3: - ToG version with final edits

    Journal ref: ACM Trans. Graph. 41, 3, Article 30 (March 2022), 22 pages

  15. arXiv:2011.14448  [pdf, other]

    cs.CV

    Improved Handling of Motion Blur in Online Object Detection

    Authors: Mohamed Sayed, Gabriel Brostow

    Abstract: We wish to detect specific categories of objects, for online vision systems that will run in the real world. Object detection is already very challenging. It is even harder when the images are blurred, from the camera being in a car or a hand-held phone. Most existing efforts either focused on sharp images, with easy to label ground truth, or they have treated motion blur as one of many generic co…

    Submitted 30 March, 2021; v1 submitted 29 November, 2020; originally announced November 2020.

    Comments: Mirroring accepted CVPR paper. Added results for other real-world blur datasets. Main paper: 8 pages + 3 references. Supplemental: 9 pages

  16. arXiv:2008.10634  [pdf, other]

    cs.CV

    DiverseNet: When One Right Answer is not Enough

    Authors: Michael Firman, Neill D. F. Campbell, Lourdes Agapito, Gabriel J. Brostow

    Abstract: Many structured prediction tasks in machine vision have a collection of acceptable answers, instead of one definitive ground truth answer. Segmentation of images, for example, is subject to human labeling bias. Similarly, there are multiple possible pixel values that could plausibly complete occluded image regions. State-of-the-art supervised learning methods are typically optimized to make a sing…

    Submitted 24 August, 2020; originally announced August 2020.

    Comments: Presented at CVPR 2018

  17. arXiv:2008.09497  [pdf, other]

    cs.CV

    Single-Image Depth Prediction Makes Feature Matching Easier

    Authors: Carl Toft, Daniyar Turmukhambetov, Torsten Sattler, Fredrik Kahl, Gabriel Brostow

    Abstract: Good local features improve the robustness of many 3D re-localization and multi-view reconstruction pipelines. The problem is that viewing angle and distance severely impact the recognizability of a local feature. Attempts to improve appearance invariance by choosing better local feature points or by leveraging outside information, have come with pre-requisites that made some of them impractical.…

    Submitted 21 August, 2020; originally announced August 2020.

    Comments: 14 pages, 7 figures, accepted for publication at the European conference on computer vision (ECCV) 2020

    ACM Class: I.4

  18. arXiv:2008.06959  [pdf, other]

    cs.CV

    Image Stylization for Robust Features

    Authors: Iaroslav Melekhov, Gabriel J. Brostow, Juho Kannala, Daniyar Turmukhambetov

    Abstract: Local features that are robust to both viewpoint and appearance changes are crucial for many computer vision tasks. In this work we investigate if photorealistic image stylization improves robustness of local features to not only day-night, but also weather and season variations. We show that image stylization in addition to color augmentation is a powerful method of learning robust features. We e…

    Submitted 16 August, 2020; originally announced August 2020.

    Comments: v1.1

  19. arXiv:2008.05785  [pdf, other]

    cs.CV cs.LG

    Predicting Visual Overlap of Images Through Interpretable Non-Metric Box Embeddings

    Authors: Anita Rau, Guillermo Garcia-Hernando, Danail Stoyanov, Gabriel J. Brostow, Daniyar Turmukhambetov

    Abstract: To what extent are two images picturing the same 3D surfaces? Even when this is a known scene, the answer typically requires an expensive search across scale space, with matching and geometric verification of large sets of local features. This expense is further multiplied when a query image is evaluated against a gallery, e.g. in visual relocalization. While we don't obviate the need for geometri…

    Submitted 13 August, 2020; originally announced August 2020.

    Comments: ECCV 2020

  20. arXiv:2008.01484  [pdf, other]

    cs.CV

    Learning Stereo from Single Images

    Authors: Jamie Watson, Oisin Mac Aodha, Daniyar Turmukhambetov, Gabriel J. Brostow, Michael Firman

    Abstract: Supervised deep networks are among the best methods for finding correspondences in stereo image pairs. Like all supervised approaches, these networks require ground truth data during training. However, collecting large quantities of accurate dense correspondence data is very challenging. We propose that it is unnecessary to have such a high reliance on ground truth depths or even corresponding ste…

    Submitted 20 August, 2020; v1 submitted 4 August, 2020; originally announced August 2020.

    Comments: Accepted as an oral presentation at ECCV 2020

  21. arXiv:2004.06376  [pdf, other]

    cs.CV

    Footprints and Free Space from a Single Color Image

    Authors: Jamie Watson, Michael Firman, Aron Monszpart, Gabriel J. Brostow

    Abstract: Understanding the shape of a scene from a single color image is a formidable computer vision task. However, most methods aim to predict the geometry of surfaces that are visible to the camera, which is of limited use when planning paths for robots or augmented reality agents. Such agents can only move when grounded on a traversable surface, which we define as the set of classes which humans can al…

    Submitted 14 April, 2020; originally announced April 2020.

    Comments: Accepted to CVPR 2020 as an oral presentation

  22. arXiv:1909.09051  [pdf, other]

    cs.CV

    Self-Supervised Monocular Depth Hints

    Authors: Jamie Watson, Michael Firman, Gabriel J. Brostow, Daniyar Turmukhambetov

    Abstract: Monocular depth estimators can be trained with various forms of self-supervision from binocular-stereo data to circumvent the need for high-quality laser scans or other ground-truth data. The disadvantage, however, is that the photometric reprojection losses used with self-supervised learning typically have multiple local minima. These plausible-looking alternatives to ground truth can restrict wh…

    Submitted 19 September, 2019; originally announced September 2019.

    Comments: Accepted to ICCV 2019

  23. arXiv:1806.01260  [pdf, other]

    cs.CV stat.ML

    Digging Into Self-Supervised Monocular Depth Estimation

    Authors: Clément Godard, Oisin Mac Aodha, Michael Firman, Gabriel Brostow

    Abstract: Per-pixel ground-truth depth data is challenging to acquire at scale. To overcome this limitation, self-supervised learning has emerged as a promising alternative for training models to perform monocular depth estimation. In this paper, we propose a set of improvements, which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods.…

    Submitted 17 August, 2019; v1 submitted 4 June, 2018; originally announced June 2018.

    Comments: ICCV 19

  24. arXiv:1804.04458  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    CubeNet: Equivariance to 3D Rotation and Translation

    Authors: Daniel Worrall, Gabriel Brostow

    Abstract: 3D Convolutional Neural Networks are sensitive to transformations applied to their input. This is a problem because a voxelized version of a 3D object, and its rotated clone, will look unrelated to each other after passing through to the last layer of a network. Instead, an idealized model would preserve a meaningful representation of the voxelized object, while explaining the pose-difference betw…

    Submitted 12 April, 2018; originally announced April 2018.

    Comments: Preprint

  25. arXiv:1711.07476  [pdf, other]

    cs.LG cs.CV stat.ML

    Virtual Adversarial Ladder Networks For Semi-supervised Learning

    Authors: Saki Shinoda, Daniel E. Worrall, Gabriel J. Brostow

    Abstract: Semi-supervised learning (SSL) partially circumvents the high cost of labeling data by augmenting a small labeled dataset with a large and relatively cheap unlabeled dataset drawn from the same distribution. This paper offers a novel interpretation of two deep learning-based SSL approaches, ladder networks and virtual adversarial training (VAT), as applying distributional smoothing to their respec…

    Submitted 12 December, 2017; v1 submitted 20 November, 2017; originally announced November 2017.

    Comments: Camera-ready version for NIPS 2017 workshop Learning with Limited Labeled Data

  26. arXiv:1710.07307  [pdf, other]

    cs.CV

    Interpretable Transformations with Encoder-Decoder Networks

    Authors: Daniel E. Worrall, Stephan J. Garbin, Daniyar Turmukhambetov, Gabriel J. Brostow

    Abstract: Deep feature spaces have the capacity to encode complex transformations of their input data. However, understanding the relative feature-space relationship between two transformed encoded images is difficult. For instance, what is the relative feature space relationship between two rotated images? What is decoded when we interpolate in feature space? Ideally, we want to disentangle confounding fac…

    Submitted 19 October, 2017; originally announced October 2017.

    Comments: Accepted at ICCV 2017

  27. Responsive Action-based Video Synthesis

    Authors: Corneliu Ilisescu, Halil Aytac Kanaci, Matteo Romagnoli, Neill D. F. Campbell, Gabriel J. Brostow

    Abstract: We propose technology to enable a new medium of expression, where video elements can be looped, merged, and triggered, interactively. Like audio, video is easy to sample from the real world but hard to segment into clean reusable elements. Reusing a video clip means non-linear editing and compositing with novel footage. The new context dictates how carefully a clip must be prepared, so our end-to-…

    Submitted 20 May, 2017; originally announced May 2017.

    Comments: 10 pages, 12 figures, 1 table, accepted and published in Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems

    ACM Class: H.5.2

  28. arXiv:1612.04642  [pdf, other]

    cs.CV cs.LG stat.ML

    Harmonic Networks: Deep Translation and Rotation Equivariance

    Authors: Daniel E. Worrall, Stephan J. Garbin, Daniyar Turmukhambetov, Gabriel J. Brostow

    Abstract: Translating or rotating an input image should not affect the results of many computer vision tasks. Convolutional neural networks (CNNs) are already translation equivariant: input image translations produce proportionate feature map translations. This is not the case for rotations. Global rotation equivariance is typically sought through data augmentation, but patch-wise equivariance is more diffi…

    Submitted 11 April, 2017; v1 submitted 14 December, 2016; originally announced December 2016.

    Comments: Submitted to CVPR 2017

  29. arXiv:1611.03906  [pdf, other]

    cs.HC

    Help, It Looks Confusing: GUI Task Automation Through Demonstration and Follow-up Questions

    Authors: Thanapong Intharah, Daniyar Turmukhambetov, Gabriel J. Brostow

    Abstract: Non-programming users should be able to create their own customized scripts to perform computer-based tasks for them, just by demonstrating to the machine how it's done. To that end, we develop a system prototype which learns-by-demonstration called HILC (Help, It Looks Confusing). Users train HILC to synthesize a task script by demonstrating the task, which produces the needed screenshots and the…

    Submitted 13 January, 2017; v1 submitted 11 November, 2016; originally announced November 2016.

    Comments: Camera Ready version. Accepted to be presented at the ACM IUI 2017

  30. arXiv:1609.08080  [pdf, other]

    cs.CV

    Swipe Mosaics from Video

    Authors: Malcolm Reynolds, Tom S. F. Haines, Gabriel J. Brostow

    Abstract: A panoramic image mosaic is an attractive visualization for viewing many overlapping photos, but its images must be both captured and processed correctly to produce an acceptable composite. We propose Swipe Mosaics, an interactive visualization that places the individual video frames on a 2D planar map that represents the layout of the physical scene. Compared to traditional panoramic mosaics, our…

    Submitted 26 September, 2016; originally announced September 2016.

  31. arXiv:1609.03677  [pdf, other]

    cs.CV cs.LG stat.ML

    Unsupervised Monocular Depth Estimation with Left-Right Consistency

    Authors: Clément Godard, Oisin Mac Aodha, Gabriel J. Brostow

    Abstract: Learning based methods have shown very promising results for the task of depth estimation in single images. However, most existing approaches treat depth prediction as a supervised regression problem and as a result, require vast quantities of corresponding ground truth depth data for training. Just recording quality depth data in a range of environments is a challenging problem. In this paper, we…

    Submitted 12 April, 2017; v1 submitted 13 September, 2016; originally announced September 2016.

    Comments: CVPR 2017 oral

  32. arXiv:1504.08219  [pdf, other]

    cs.CV cs.LG stat.ML

    Hierarchical Subquery Evaluation for Active Learning on a Graph

    Authors: Oisin Mac Aodha, Neill D. F. Campbell, Jan Kautz, Gabriel J. Brostow

    Abstract: To train good supervised and semi-supervised object classifiers, it is critical that we not waste the time of the human experts who are providing the training labels. Existing active learning strategies can have uneven performance, being efficient on some datasets but wasteful on others, or inconsistent just between runs on the same dataset. We propose perplexity based graph construction and a new…

    Submitted 30 April, 2015; originally announced April 2015.

    Comments: CVPR 2014

  33. arXiv:1504.07575  [pdf, other]

    cs.CV cs.LG stat.ML

    Becoming the Expert - Interactive Multi-Class Machine Teaching

    Authors: Edward Johns, Oisin Mac Aodha, Gabriel J. Brostow

    Abstract: Compared to machines, humans are extremely good at classifying images into categories, especially when they possess prior knowledge of the categories at hand. If this prior information is not available, supervision in the form of teaching images is required. To learn categories more quickly, people should see important and representative images first, followed by less important images later - or n…

    Submitted 28 April, 2015; originally announced April 2015.

    Comments: CVPR 2015

  34. arXiv:1502.04983  [pdf, other]

    cs.CV

    Context Tricks for Cheap Semantic Segmentation

    Authors: Thanapong Intharah, Gabriel J. Brostow

    Abstract: Accurate semantic labeling of image pixels is difficult because intra-class variability is often greater than inter-class variability. In turn, fast semantic segmentation is hard because accurate models are usually too complicated to also run quickly at test-time. Our experience with building and running semantic segmentation systems has also shown a reasonably obvious bottleneck on model complexi…

    Submitted 17 February, 2015; originally announced February 2015.

    Comments: Supplementary material can be found at http://www0.cs.ucl.ac.uk/staff/T.Intharah/research.html