Showing 1–50 of 60 results for author: Cipolla, R

  1. arXiv:2405.12057  [pdf, other]

    cs.CV

    NPLMV-PS: Neural Point-Light Multi-View Photometric Stereo

    Authors: Fotios Logothetis, Ignas Budvytis, Roberto Cipolla

    Abstract: In this work we present a novel multi-view photometric stereo (PS) method. Like many works in 3D reconstruction, we leverage neural shape representations and learnt renderers. However, our work differs from state-of-the-art multi-view PS methods such as PS-NeRF or SuperNormal in that we explicitly leverage per-pixel intensity renderings rather than relying mainly on estimated normals. We model p… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

  2. arXiv:2404.09271  [pdf, other]

    cs.CV cs.RO

    VRS-NeRF: Visual Relocalization with Sparse Neural Radiance Field

    Authors: Fei Xue, Ignas Budvytis, Daniel Olmeda Reino, Roberto Cipolla

    Abstract: Visual relocalization is a key technique for autonomous driving, robotics, and virtual/augmented reality. After decades of exploration, absolute pose regression (APR), scene coordinate regression (SCR), and hierarchical methods (HMs) have become the most popular frameworks. However, in spite of high efficiency, APRs and SCRs have limited accuracy, especially in large-scale outdoor scenes; HMs are a… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: source code https://github.com/feixue94/vrs-nerf

  3. arXiv:2404.07785  [pdf, other]

    cs.CV cs.RO

    PRAM: Place Recognition Anywhere Model for Efficient Visual Localization

    Authors: Fei Xue, Ignas Budvytis, Roberto Cipolla

    Abstract: Humans localize themselves efficiently in known environments by first recognizing landmarks defined on certain objects and their spatial relationships, and then verifying the location by aligning detailed structures of recognized objects with those in the memory. Inspired by this, we propose the place recognition anywhere model (PRAM) to perform visual localization as efficiently as humans do. PRA… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: project page: https://feixue94.github.io/pram-project/

  4. arXiv:2312.09056  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO stat.ML

    ReCoRe: Regularized Contrastive Representation Learning of World Model

    Authors: Rudra P. K. Poudel, Harit Pandya, Stephan Liwicki, Roberto Cipolla

    Abstract: While recent model-free Reinforcement Learning (RL) methods have demonstrated human-level effectiveness in gaming environments, their success in everyday tasks like visual navigation has been limited, particularly under significant appearance variations. This limitation arises from (i) poor sample efficiency and (ii) over-fitting to training scenarios. To address these challenges, we present a wor… ▽ More

    Submitted 3 April, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted at CVPR 2024. arXiv admin note: text overlap with arXiv:2209.14932

  5. arXiv:2311.17593  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV cs.RO

    LanGWM: Language Grounded World Model

    Authors: Rudra P. K. Poudel, Harit Pandya, Chao Zhang, Roberto Cipolla

    Abstract: Recent advances in deep reinforcement learning have showcased its potential in tackling complex tasks. However, experiments on visual control tasks have revealed that state-of-the-art reinforcement learning models struggle with out-of-distribution generalization. Conversely, expressing higher-level concepts and global contexts is relatively easy using language. Building upon recent success of th… ▽ More

    Submitted 29 November, 2023; originally announced November 2023.

  6. arXiv:2311.05958  [pdf, other]

    cs.CV

    A Neural Height-Map Approach for the Binocular Photometric Stereo Problem

    Authors: Fotios Logothetis, Ignas Budvytis, Roberto Cipolla

    Abstract: In this work we propose a novel, highly practical, binocular photometric stereo (PS) framework, which has the same acquisition speed as single-view PS but significantly improves the quality of the estimated geometry. As in recent neural multi-view shape estimation frameworks such as NeRF, SIREN and inverse graphics approaches to multi-view photometric stereo (e.g. PS-NeRF) we formulate shape es… ▽ More

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: WACV 2024

  7. arXiv:2310.18279  [pdf, other]

    cs.CV

    FOUND: Foot Optimization with Uncertain Normals for Surface Deformation Using Synthetic Data

    Authors: Oliver Boyne, Gwangbin Bae, James Charles, Roberto Cipolla

    Abstract: Surface reconstruction from multi-view images is a challenging task, with solutions often requiring a large number of sampled images with high overlap. We seek to develop a method for few-view reconstruction, for the case of the human foot. To solve this task, we must extract rich geometric cues from RGB images, before carefully fusing them into a final 3D object. Our FOUND approach tackles this,… ▽ More

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: 14 pages, 15 figures

  8. arXiv:2310.11184  [pdf, other]

    cs.CV

    Sparse Multi-Object Render-and-Compare

    Authors: Florian Langer, Ignas Budvytis, Roberto Cipolla

    Abstract: Reconstructing 3D shape and pose of static objects from a single image is an essential task for various industries, including robotics, augmented reality, and digital content creation. This can be done by directly predicting 3D shape in various representations or by retrieving CAD models from a database and predicting their alignments. Directly predicting 3D shapes often produces unrealistic, over… ▽ More

    Submitted 17 October, 2023; originally announced October 2023.

  9. arXiv:2305.06968  [pdf, other]

    cs.CV

    HuManiFlow: Ancestor-Conditioned Normalising Flows on SO(3) Manifolds for Human Pose and Shape Distribution Estimation

    Authors: Akash Sengupta, Ignas Budvytis, Roberto Cipolla

    Abstract: Monocular 3D human pose and shape estimation is an ill-posed problem since multiple 3D solutions can explain a 2D image of a subject. Recent approaches predict a probability distribution over plausible 3D pose and shape parameters conditioned on the image. We show that these approaches exhibit a trade-off between three key properties: (i) accuracy - the likelihood of the ground-truth 3D solution u… ▽ More

    Submitted 11 May, 2023; originally announced May 2023.

    Comments: CVPR 2023

  10. arXiv:2304.14845  [pdf, other]

    cs.CV

    SFD2: Semantic-guided Feature Detection and Description

    Authors: Fei Xue, Ignas Budvytis, Roberto Cipolla

    Abstract: Visual localization is a fundamental task for various applications including autonomous driving and robotics. Prior methods focus on extracting large amounts of often redundant locally reliable features, resulting in limited efficiency and accuracy, especially in large-scale environments under challenging conditions. Instead, we propose to extract globally reliable features by implicitly embedding… ▽ More

    Submitted 11 June, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

    Comments: CVPR 2023. code is available at https://github.com/feixue94/sfd2

  11. arXiv:2304.14837  [pdf, other]

    cs.CV

    IMP: Iterative Matching and Pose Estimation with Adaptive Pooling

    Authors: Fei Xue, Ignas Budvytis, Roberto Cipolla

    Abstract: Previous methods solve feature matching and pose estimation using a two-stage process by first finding matches and then estimating the pose. As they ignore the geometric relationships between the two tasks, they focus on either improving the quality of matches or filtering potential outliers, leading to limited efficiency or accuracy. In contrast, we propose an iterative matching and pose estimati… ▽ More

    Submitted 11 June, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

    Comments: CVPR 2023. code available at https://github.com/feixue94/imp-release

  12. arXiv:2210.12241  [pdf, other]

    cs.CV

    FIND: An Unsupervised Implicit 3D Model of Articulated Human Feet

    Authors: Oliver Boyne, James Charles, Roberto Cipolla

    Abstract: In this paper we present a high fidelity and articulated 3D human foot model. The model is parameterised by a disentangled latent code in terms of shape, texture and articulated pose. While high fidelity models are typically created with strong supervision such as 3D keypoint correspondences or pre-registration, we focus on the difficult case of little to no annotation. To this end, we make the fo… ▽ More

    Submitted 21 November, 2022; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: BMVC 2022

  13. arXiv:2210.07729  [pdf, other]

    cs.CV cs.AI cs.RO

    Model-Based Imitation Learning for Urban Driving

    Authors: Anthony Hu, Gianluca Corrado, Nicolas Griffiths, Zak Murez, Corina Gurau, Hudson Yeo, Alex Kendall, Roberto Cipolla, Jamie Shotton

    Abstract: An accurate model of the environment and the dynamic agents acting in it offers great potential for improving motion planning. We present MILE: a Model-based Imitation LEarning approach to jointly learn a model of the world and a policy for autonomous driving. Our method leverages 3D geometry as an inductive bias and learns a highly compact latent space directly from high-resolution videos of expe… ▽ More

    Submitted 3 November, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022

  14. A CNN Based Approach for the Point-Light Photometric Stereo Problem

    Authors: Fotios Logothetis, Roberto Mecca, Ignas Budvytis, Roberto Cipolla

    Abstract: Reconstructing the 3D shape of an object using several images under different light sources is a very challenging task, especially when realistic assumptions such as light propagation and attenuation, perspective viewing geometry and specular light reflection are considered. Many works tackling Photometric Stereo (PS) problems often relax most of the aforementioned assumptions. Especially they… ▽ More

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: arXiv admin note: text overlap with arXiv:2009.05792

  15. arXiv:2210.03676  [pdf, other]

    cs.CV

    IronDepth: Iterative Refinement of Single-View Depth using Surface Normal and its Uncertainty

    Authors: Gwangbin Bae, Ignas Budvytis, Roberto Cipolla

    Abstract: Single image surface normal estimation and depth estimation are closely related problems as the former can be calculated from the latter. However, the surface normals computed from the output of depth estimation methods are significantly less accurate than the surface normals directly estimated by networks. To reduce such discrepancy, we introduce a novel framework that uses surface normal and its… ▽ More

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: BMVC 2022

  16. arXiv:2210.02579  [pdf, other]

    cs.CV

    DigiFace-1M: 1 Million Digital Face Images for Face Recognition

    Authors: Gwangbin Bae, Martin de La Gorce, Tadas Baltrusaitis, Charlie Hewitt, Dong Chen, Julien Valentin, Roberto Cipolla, Jingjing Shen

    Abstract: State-of-the-art face recognition models show impressive accuracy, achieving over 99.8% on Labeled Faces in the Wild (LFW) dataset. Such models are trained on large-scale datasets that contain millions of real human face images collected from the internet. Web-crawled face images are severely biased (in terms of race, lighting, make-up, etc) and often contain label noise. More importantly, the fac… ▽ More

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: WACV 2023

  17. arXiv:2210.01044  [pdf, other]

    cs.CV

    SPARC: Sparse Render-and-Compare for CAD model alignment in a single RGB image

    Authors: Florian Langer, Gwangbin Bae, Ignas Budvytis, Roberto Cipolla

    Abstract: Estimating 3D shapes and poses of static objects from a single image has important applications for robotics, augmented reality and digital content creation. Often this is done through direct mesh predictions, which produce unrealistic, overly tessellated shapes, or by formulating shape prediction as a retrieval task followed by CAD model alignment. Directly predicting CAD model poses from 2D image… ▽ More

    Submitted 3 October, 2022; originally announced October 2022.

  18. arXiv:2209.14932  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO stat.ML

    Contrastive Unsupervised Learning of World Model with Invariant Causal Features

    Authors: Rudra P. K. Poudel, Harit Pandya, Roberto Cipolla

    Abstract: In this paper we present a world model, which learns causal features using the invariance principle. In particular, we use contrastive unsupervised learning to learn the invariant causal features, which enforces invariance across augmentations of irrelevant parts or styles of the observation. The world-model-based reinforcement learning methods independently optimize representation learning and th… ▽ More

    Submitted 29 September, 2022; originally announced September 2022.

  19. arXiv:2112.08177  [pdf, other]

    cs.CV

    Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry

    Authors: Gwangbin Bae, Ignas Budvytis, Roberto Cipolla

    Abstract: Multi-view depth estimation methods typically require the computation of a multi-view cost-volume, which leads to huge memory consumption and slow inference. Furthermore, multi-view matching can fail for texture-less surfaces, reflective surfaces and moving objects. For such failure modes, single-view depth estimation methods are often more reliable. To this end, we propose MaGNet, a novel framewo… ▽ More

    Submitted 29 March, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: CVPR 2022 (oral)

  20. arXiv:2112.05585  [pdf, other]

    cs.CV

    Discrete neural representations for explainable anomaly detection

    Authors: Stanislaw Szymanowicz, James Charles, Roberto Cipolla

    Abstract: The aim of this work is to detect and automatically generate high-level explanations of anomalous events in video. Understanding the cause of an anomalous event is crucial as the required response is dependent on its nature and severity. Recent works typically use object or action classifiers to detect and provide labels for anomalous events. However, this constrains detection systems to a finite s… ▽ More

    Submitted 10 December, 2021; originally announced December 2021.

    Journal ref: Winter Conference on Applications of Computer Vision 2022

  21. arXiv:2111.15404  [pdf, other]

    cs.CV

    Probabilistic Estimation of 3D Human Shape and Pose with a Semantic Local Parametric Model

    Authors: Akash Sengupta, Ignas Budvytis, Roberto Cipolla

    Abstract: This paper addresses the problem of 3D human body shape and pose estimation from RGB images. Some recent approaches to this task predict probability distributions over human body model parameters conditioned on the input images. This is motivated by the ill-posed nature of the problem wherein multiple 3D reconstructions may match the image evidence, particularly when some parts of the body are loc… ▽ More

    Submitted 30 November, 2021; originally announced November 2021.

    Comments: BMVC 2021

  22. arXiv:2111.05615  [pdf, other]

    cs.CV

    Leveraging Geometry for Shape Estimation from a Single RGB Image

    Authors: Florian Langer, Ignas Budvytis, Roberto Cipolla

    Abstract: Predicting 3D shapes and poses of static objects from a single RGB image is an important research area in modern computer vision. Its applications range from augmented reality to robotics and digital content creation. Typically this task is performed through direct object shape and pose predictions, which are inaccurate. A promising research direction ensures meaningful shape predictions by retrievi… ▽ More

    Submitted 10 November, 2021; originally announced November 2021.

  23. arXiv:2110.00990  [pdf, other]

    cs.CV

    Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild

    Authors: Akash Sengupta, Ignas Budvytis, Roberto Cipolla

    Abstract: This paper addresses the problem of 3D human body shape and pose estimation from an RGB image. This is often an ill-posed problem, since multiple plausible 3D bodies may match the visual evidence present in the input - particularly when the subject is occluded. Thus, it is desirable to estimate a distribution over 3D body shape and pose conditioned on the input image instead of a single 3D reconst… ▽ More

    Submitted 23 November, 2022; v1 submitted 3 October, 2021; originally announced October 2021.

    Comments: ICCV 2021 (Edited to reduce file size)

  24. arXiv:2109.09881  [pdf, other]

    cs.CV

    Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation

    Authors: Gwangbin Bae, Ignas Budvytis, Roberto Cipolla

    Abstract: Surface normal estimation from a single image is an important task in 3D scene understanding. In this paper, we address two limitations shared by the existing methods: the inability to estimate the aleatoric uncertainty and lack of detail in the prediction. The proposed network estimates the per-pixel surface normal probability distribution. We introduce a new parameterization for the distribution… ▽ More

    Submitted 20 September, 2021; originally announced September 2021.

    Comments: ICCV 2021 (oral)

  25. arXiv:2106.08856  [pdf, other]

    cs.CV

    X-MAN: Explaining multiple sources of anomalies in video

    Authors: Stanislaw Szymanowicz, James Charles, Roberto Cipolla

    Abstract: Our objective is to detect anomalies in video while also automatically explaining the reason behind the detector's response. In a practical sense, explainability is crucial for this task as the required response to an anomaly depends on its nature and severity. However, most leading methods (based on deep neural networks) are not interpretable and hide the decision making process in uninterpretabl… ▽ More

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2021

  26. arXiv:2104.13135  [pdf, other]

    cs.CV

    LUCES: A Dataset for Near-Field Point Light Source Photometric Stereo

    Authors: Roberto Mecca, Fotios Logothetis, Ignas Budvytis, Roberto Cipolla

    Abstract: Three-dimensional reconstruction of objects from shading information is a challenging task in computer vision. As most of the approaches facing the Photometric Stereo problem use simplified far-field assumptions, real-world scenarios have essentially more complex physical effects that need to be handled for accurately reconstructing the 3D shape. An increasing number of methods have been proposed… ▽ More

    Submitted 12 October, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

  27. arXiv:2104.10490  [pdf, other]

    cs.CV cs.RO

    FIERY: Future Instance Prediction in Bird's-Eye View from Surround Monocular Cameras

    Authors: Anthony Hu, Zak Murez, Nikhil Mohan, Sofía Dudas, Jeffrey Hawke, Vijay Badrinarayanan, Roberto Cipolla, Alex Kendall

    Abstract: Driving requires interacting with road agents and predicting their future behaviour in order to navigate safely. We present FIERY: a probabilistic future prediction model in bird's-eye view from monocular cameras. Our model predicts future instance segmentation and motion of dynamic agents that can be transformed into non-parametric future trajectories. Our approach combines the perception, sensor… ▽ More

    Submitted 18 October, 2021; v1 submitted 21 April, 2021; originally announced April 2021.

    Comments: ICCV 2021

  28. arXiv:2103.10978  [pdf, other]

    cs.CV

    Probabilistic 3D Human Shape and Pose Estimation from Multiple Unconstrained Images in the Wild

    Authors: Akash Sengupta, Ignas Budvytis, Roberto Cipolla

    Abstract: This paper addresses the problem of 3D human body shape and pose estimation from RGB images. Recent progress in this field has focused on single images, video or multi-view images as inputs. In contrast, we propose a new task: shape and pose estimation from a group of multiple images of a human subject, without constraints on subject pose, camera viewpoint or background conditions between images i… ▽ More

    Submitted 30 March, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

    Comments: Accepted at CVPR 2021, 16 pages, 8 figures

  29. arXiv:2009.10013  [pdf, other]

    cs.CV

    Synthetic Training for Accurate 3D Human Pose and Shape Estimation in the Wild

    Authors: Akash Sengupta, Ignas Budvytis, Roberto Cipolla

    Abstract: This paper addresses the problem of monocular 3D human shape and pose estimation from an RGB image. Despite great progress in this field in terms of pose prediction accuracy, state-of-the-art methods often predict inaccurate body shapes. We suggest that this is primarily due to the scarcity of in-the-wild training data with diverse and accurate body shape labels. Thus, we propose STRAPS (Synthetic… ▽ More

    Submitted 22 September, 2020; v1 submitted 21 September, 2020; originally announced September 2020.

    Comments: 14 pages, 7 figures, BMVC 2020, Fixed abstract typos

  30. arXiv:2009.05792  [pdf, other]

    cs.CV

    A CNN Based Approach for the Near-Field Photometric Stereo Problem

    Authors: Fotios Logothetis, Ignas Budvytis, Roberto Mecca, Roberto Cipolla

    Abstract: Reconstructing the 3D shape of an object using several images under different light sources is a very challenging task, especially when realistic assumptions such as light propagation and attenuation, perspective viewing geometry and specular light reflection are considered. Many works tackling Photometric Stereo (PS) problems often relax most of the aforementioned assumptions. Especially they… ▽ More

    Submitted 12 September, 2020; originally announced September 2020.

  31. arXiv:2009.05429  [pdf, other]

    cs.RO cs.AI cs.CV

    Embodied Visual Navigation with Automatic Curriculum Learning in Real Environments

    Authors: Steven D. Morad, Roberto Mecca, Rudra P. K. Poudel, Stephan Liwicki, Roberto Cipolla

    Abstract: We present NavACL, a method of automatic curriculum learning tailored to the navigation task. NavACL is simple to train and efficiently selects relevant tasks using geometric features. In our experiments, deep reinforcement learning agents trained using NavACL significantly outperform state-of-the-art agents trained with uniform sampling -- the current standard. Furthermore, our agents can navigat… ▽ More

    Submitted 6 January, 2021; v1 submitted 11 September, 2020; originally announced September 2020.

  32. arXiv:2008.04933  [pdf, other]

    cs.CV

    PX-NET: Simple and Efficient Pixel-Wise Training of Photometric Stereo Networks

    Authors: Fotios Logothetis, Ignas Budvytis, Roberto Mecca, Roberto Cipolla

    Abstract: Retrieving accurate 3D reconstructions of objects from the way they reflect light is a very challenging task in computer vision. Despite more than four decades since the definition of the Photometric Stereo problem, most of the literature has had limited success when global illumination effects such as cast shadows, self-reflections and ambient light come into play, especially for specular surface… ▽ More

    Submitted 12 October, 2021; v1 submitted 11 August, 2020; originally announced August 2020.

  33. Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop

    Authors: Benjamin Biggs, Oliver Boyne, James Charles, Andrew Fitzgibbon, Roberto Cipolla

    Abstract: We introduce an automatic, end-to-end method for recovering the 3D pose and shape of dogs from monocular internet images. The large variation in shape between dog breeds, significant occlusion and low quality of internet images make this a challenging problem. We learn a richer prior over shapes than previous work, which helps regularize parameter estimation. We demonstrate results on the Stanfor… ▽ More

    Submitted 11 February, 2021; v1 submitted 21 July, 2020; originally announced July 2020.

    Comments: Accepted at ECCV 2020

    Journal ref: 16th European Conference Glasgow UK August 23 to 28 2020 Proceedings Part XI

  34. arXiv:2003.13402  [pdf, other]

    cs.CV

    Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks

    Authors: Thomas Roddick, Roberto Cipolla

    Abstract: Autonomous vehicles commonly rely on highly detailed birds-eye-view maps of their environment, which capture both static elements of the scene such as road layout as well as dynamic elements such as other cars and pedestrians. Generating these map representations on the fly is a complex multi-stage process which incorporates many important vision-based elements, including ground plane estimation,… ▽ More

    Submitted 30 March, 2020; originally announced March 2020.

  35. arXiv:1912.08969  [pdf, other]

    cs.CV

    Learning a Spatio-Temporal Embedding for Video Instance Segmentation

    Authors: Anthony Hu, Alex Kendall, Roberto Cipolla

    Abstract: We present a novel embedding approach for video instance segmentation. Our method learns a spatio-temporal embedding integrating cues from appearance, motion, and geometry; a 3D causal convolutional network models motion, and a monocular self-supervised depth loss models geometry. In this embedding space, video-pixels of the same instance are clustered together while being separated from other ins… ▽ More

    Submitted 18 December, 2019; originally announced December 2019.

  36. arXiv:1909.10239  [pdf, other]

    cs.CV

    Large Scale Joint Semantic Re-Localisation and Scene Understanding via Globally Unique Instance Coordinate Regression

    Authors: Ignas Budvytis, Marvin Teichmann, Tomas Vojir, Roberto Cipolla

    Abstract: In this work we present a novel approach to joint semantic localisation and scene understanding. Our work is motivated by the need for localisation algorithms which not only predict 6-DoF camera pose but also simultaneously recognise surrounding objects and estimate 3D geometry. Such capabilities are crucial for computer vision guided systems which interact with the environment: autonomous driving… ▽ More

    Submitted 23 September, 2019; originally announced September 2019.

    Comments: BMVC 2019

  37. arXiv:1907.12849  [pdf, other]

    cs.CV

    Orientation-aware Semantic Segmentation on Icosahedron Spheres

    Authors: Chao Zhang, Stephan Liwicki, William Smith, Roberto Cipolla

    Abstract: We address semantic segmentation on omnidirectional images, to leverage a holistic understanding of the surrounding scene for applications like autonomous driving systems. For the spherical domain, several methods recently adopt an icosahedron mesh, but systems are typically rotation invariant or require significant memory and parameters, thus enabling execution only at very low resolutions. In ou… ▽ More

    Submitted 30 July, 2019; originally announced July 2019.

    Comments: 9 pages, accepted to ICCV 2019

  38. arXiv:1902.04502  [pdf, other]

    cs.CV

    Fast-SCNN: Fast Semantic Segmentation Network

    Authors: Rudra P K Poudel, Stephan Liwicki, Roberto Cipolla

    Abstract: The encoder-decoder framework is state-of-the-art for offline semantic image segmentation. Since the rise in autonomous systems, real-time computation is increasingly desirable. In this paper, we introduce fast segmentation convolutional neural network (Fast-SCNN), an above real-time semantic segmentation model on high resolution image data (1024x2048px) suited to efficient computation on embedded… ▽ More

    Submitted 12 February, 2019; originally announced February 2019.

  39. arXiv:1811.08188  [pdf, other]

    cs.CV

    Orthographic Feature Transform for Monocular 3D Object Detection

    Authors: Thomas Roddick, Alex Kendall, Roberto Cipolla

    Abstract: 3D object detection from monocular images has proven to be an enormously challenging task, with the performance of leading systems not yet achieving even 10% of that of LiDAR-based counterparts. One explanation for this performance gap is that existing systems are entirely at the mercy of the perspective image-based representation, in which the appearance and scale of objects vary drastically w… ▽ More

    Submitted 20 November, 2018; originally announced November 2018.

  40. arXiv:1811.05804  [pdf, other]

    cs.CV

    Creatures great and SMAL: Recovering the shape and motion of animals from video

    Authors: Benjamin Biggs, Thomas Roddick, Andrew Fitzgibbon, Roberto Cipolla

    Abstract: We present a system to recover the 3D shape and motion of a wide variety of quadrupeds from video. The system comprises a machine learning front-end which predicts candidate 2D joint positions, a discrete optimization which finds kinematically plausible joint correspondences, and an energy minimization stage which fits a detailed 3D model to the image. In order to overcome the limited availability… ▽ More

    Submitted 14 November, 2018; originally announced November 2018.

    Comments: 17 pages, ACCV 2018 oral paper

  41. arXiv:1811.01984  [pdf, other]

    cs.CV

    A Differential Volumetric Approach to Multi-View Photometric Stereo

    Authors: Fotios Logothetis, Roberto Mecca, Roberto Cipolla

    Abstract: Highly accurate 3D volumetric reconstruction is still an open research topic where the main difficulty is usually related to merging some rough estimations with high frequency details. One of the most promising methods is the fusion between multi-view stereo and photometric stereo images. Beside the intrinsic difficulties that multi-view stereo and photometric stereo in order to work reliably, sup… ▽ More

    Submitted 2 August, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

  42. arXiv:1805.04777  [pdf, other]

    cs.CV

    Convolutional CRFs for Semantic Segmentation

    Authors: Marvin T. T. Teichmann, Roberto Cipolla

    Abstract: For the challenging semantic image segmentation task the most efficient models have traditionally combined the structured modelling capabilities of Conditional Random Fields (CRFs) with the feature extraction power of CNNs. In more recent works however, CRF post-processing has fallen out of favour. We argue that this is mainly due to the slow training and inference speeds of CRFs, as well as the d… ▽ More

    Submitted 15 May, 2018; v1 submitted 12 May, 2018; originally announced May 2018.

    Comments: 8 Pages + Appendix, references. Code can be found under: https://github.com/MarvinTeichmann/ConvCRF

  43. arXiv:1705.07115  [pdf, other]

    cs.CV

    Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics

    Authors: Alex Kendall, Yarin Gal, Roberto Cipolla

    Abstract: Numerous deep learning applications benefit from multi-task learning with multiple regression and classification objectives. In this paper we make the observation that the performance of such systems is strongly dependent on the relative weighting between each task's loss. Tuning these weights by hand is a difficult and expensive process, making multi-task learning prohibitive in practice. We prop… ▽ More

    Submitted 24 April, 2018; v1 submitted 19 May, 2017; originally announced May 2017.

    Comments: CVPR 2018

  44. arXiv:1704.00390  [pdf, other]

    cs.CV

    Geometric Loss Functions for Camera Pose Regression with Deep Learning

    Authors: Alex Kendall, Roberto Cipolla

    Abstract: Deep learning has shown to be effective for robust and real-time monocular image relocalisation. In particular, PoseNet is a deep convolutional neural network which learns to regress the 6-DOF camera pose from a single image. It learns to localize using high level features and is robust to difficult lighting, motion blur and unknown camera intrinsics, where point based SIFT registration fails. How…

    Submitted 23 May, 2017; v1 submitted 2 April, 2017; originally announced April 2017.

    Comments: CVPR 2017

  45. arXiv:1612.07695  [pdf, other]

    cs.CV cs.RO

    MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving

    Authors: Marvin Teichmann, Michael Weber, Marius Zoellner, Roberto Cipolla, Raquel Urtasun

    Abstract: While most approaches to semantic reasoning have focused on improving performance, in this paper we argue that computational times are very important in order to enable real time applications such as autonomous driving. Towards this goal, we present an approach to joint classification, detection and semantic segmentation via a unified architecture where the encoder is shared amongst the three task…

    Submitted 8 May, 2018; v1 submitted 22 December, 2016; originally announced December 2016.

    Comments: 9 pages, 7 tables and 9 figures; first place on Kitti Road Segmentation; Code on GitHub (https://github.com/MarvinTeichmann/MultiNet)

  46. arXiv:1605.06489  [pdf, other]

    cs.NE cs.CV cs.LG

    Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups

    Authors: Yani Ioannou, Duncan Robertson, Roberto Cipolla, Antonio Criminisi

    Abstract: We propose a new method for creating computationally efficient and compact convolutional neural networks (CNNs) using a novel sparse connection structure that resembles a tree root. This allows a significant reduction in computational cost and number of parameters compared to state-of-the-art deep CNNs, without compromising accuracy, by exploiting the sparsity of inter-layer filter dependencies. W…

    Submitted 30 November, 2016; v1 submitted 20 May, 2016; originally announced May 2016.

    Comments: Updated full version of paper, in full letter paper two-column paper. Includes many textual changes, updated CIFAR10 results, and new analysis of inter/intra-layer correlation

  47. arXiv:1604.06832  [pdf, ps, other]

    cs.CV

    Refining Architectures of Deep Convolutional Neural Networks

    Authors: Sukrit Shankar, Duncan Robertson, Yani Ioannou, Antonio Criminisi, Roberto Cipolla

    Abstract: Deep Convolutional Neural Networks (CNNs) have recently evinced immense success for various image recognition tasks. However, a question of paramount importance is somewhat unanswered in deep learning research - is the selected CNN optimal for the dataset in terms of accuracy and model size? In this paper, we intend to answer this question and introduce a novel strategy that alters the architectur…

    Submitted 22 April, 2016; originally announced April 2016.

    Comments: 9 pages, 6 figures, CVPR 2016

  48. arXiv:1511.07041  [pdf, other]

    cs.CV

    SceneNet: Understanding Real World Indoor Scenes With Synthetic Data

    Authors: Ankur Handa, Viorica Patraucean, Vijay Badrinarayanan, Simon Stent, Roberto Cipolla

    Abstract: Scene understanding is a prerequisite to many high level tasks for any automated intelligent machine operating in real world environments. Recent attempts with supervised learning have shown promise in this direction but also highlighted the need for enormous quantity of supervised data --- performance increases in proportion to the amount of data used. However, this quickly becomes prohibitive wh…

    Submitted 26 November, 2015; v1 submitted 22 November, 2015; originally announced November 2015.

  49. arXiv:1511.06744  [pdf, other]

    cs.CV cs.LG cs.NE

    Training CNNs with Low-Rank Filters for Efficient Image Classification

    Authors: Yani Ioannou, Duncan Robertson, Jamie Shotton, Roberto Cipolla, Antonio Criminisi

    Abstract: We propose a new method for creating computationally efficient convolutional neural networks (CNNs) by using low-rank representations of convolutional filters. Rather than approximating filters in previously-trained networks with more efficient versions, we learn a set of small basis filters from scratch; during training, the network learns to combine these basis filters into more complex filters…

    Submitted 7 February, 2016; v1 submitted 20 November, 2015; originally announced November 2015.

    Comments: Published as a conference paper at ICLR 2016. v3: updated ICLR status. v2: Incorporated reviewer's feedback including: Amend Fig. 2 and 5 descriptions to explain that there are no ReLUs within the figures. Fix headings of Table 5 - Fix typo in the sentence at bottom of page 6. Add ref. to Predicting Parameters in Deep Learning. Fix Table 6, GMP-LR and GMP-LR-2x had incorrect numbers of filters

    Journal ref: International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, 2-4 May 2016

  50. arXiv:1511.06309  [pdf, other]

    cs.LG cs.CV

    Spatio-temporal video autoencoder with differentiable memory

    Authors: Viorica Patraucean, Ankur Handa, Roberto Cipolla

    Abstract: We describe a new spatio-temporal video autoencoder, based on a classic spatial image autoencoder and a novel nested temporal autoencoder. The temporal encoder is represented by a differentiable visual memory composed of convolutional long short-term memory (LSTM) cells that integrate changes over time. Here we target motion changes and use as temporal decoder a robust optical flow prediction modu…

    Submitted 1 September, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: The experiments section has been extended and a direct application to weakly-supervised video segmentation through label propagation has been included