Showing 1–8 of 8 results for author: Howard-Jenkins, H

  1. arXiv:2403.13064  [pdf, other]

    cs.CV

    SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model

    Authors: Armen Avetisyan, Christopher Xie, Henry Howard-Jenkins, Tsun-Yi Yang, Samir Aroudj, Suvam Patra, Fuyang Zhang, Duncan Frost, Luke Holland, Campbell Orme, Jakob Engel, Edward Miller, Richard Newcombe, Vasileios Balntas

    Abstract: We introduce SceneScript, a method that directly produces full scene models as a sequence of structured language commands using an autoregressive, token-based approach. Our proposed scene representation is inspired by recent successes in transformers & LLMs, and departs from more traditional methods which commonly describe scenes as meshes, voxel grids, point clouds or radiance fields. Our method…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: see project page, https://projectaria.com/scenescript

  2. arXiv:2308.13561  [pdf, other]

    cs.HC cs.CV

    Project Aria: A New Tool for Egocentric Multi-Modal AI Research

    Authors: Jakob Engel, Kiran Somasundaram, Michael Goesele, Albert Sun, Alexander Gamino, Andrew Turner, Arjang Talattof, Arnie Yuan, Bilal Souti, Brighid Meredith, Cheng Peng, Chris Sweeney, Cole Wilson, Dan Barnes, Daniel DeTone, David Caruso, Derek Valleroy, Dinesh Ginjupalli, Duncan Frost, Edward Miller, Elias Mueggler, Evgeniy Oleinik, Fan Zhang, Guruprasad Somasundaram, Gustavo Solaira, et al. (49 additional authors not shown)

    Abstract: Egocentric, multi-modal data as available on future augmented reality (AR) devices provides unique challenges and opportunities for machine perception. These future devices will need to be all-day wearable in a socially acceptable form-factor to support always available, context-aware and personalized AI applications. Our team at Meta Reality Labs Research built the Aria device, an egocentric, mul…

    Submitted 1 October, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

  3. arXiv:2307.13485  [pdf, other]

    cs.CV

    Cos R-CNN for Online Few-shot Object Detection

    Authors: Gratianus Wesley Putra Data, Henry Howard-Jenkins, David Murray, Victor Prisacariu

    Abstract: We propose Cos R-CNN, a simple exemplar-based R-CNN formulation that is designed for online few-shot object detection. That is, it is able to localise and classify novel object categories in images with few examples without fine-tuning. Cos R-CNN frames detection as a learning-to-compare task: unseen classes are represented as exemplar images, and objects are detected based on their similarity to…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: Unpublished tech report from 2020

  4. arXiv:2104.09169  [pdf, other]

    cs.CV

    LaLaLoc: Latent Layout Localisation in Dynamic, Unvisited Environments

    Authors: Henry Howard-Jenkins, Jose-Raul Ruiz-Sarmiento, Victor Adrian Prisacariu

    Abstract: We present LaLaLoc to localise in environments without the need for prior visitation, and in a manner that is robust to large changes in scene appearance, such as a full rearrangement of furniture. Specifically, LaLaLoc performs localisation through latent representations of room layout. LaLaLoc learns a rich embedding space shared between RGB panoramas and layouts inferred from a known floor plan…

    Submitted 12 October, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

    Comments: As presented at the International Conference on Computer Vision (ICCV) 2021

  5. arXiv:2003.12059  [pdf, other]

    cs.CV

    Correspondence Networks with Adaptive Neighbourhood Consensus

    Authors: Shuda Li, Kai Han, Theo W. Costain, Henry Howard-Jenkins, Victor Prisacariu

    Abstract: In this paper, we tackle the task of establishing dense visual correspondences between images containing objects of the same category. This is a challenging task due to large intra-class variations and a lack of dense pixel level annotations. We propose a convolutional neural network architecture, called adaptive neighbourhood consensus network (ANC-Net), that can be trained end-to-end with sparse…

    Submitted 26 March, 2020; originally announced March 2020.

    Comments: CVPR 2020. Project page: https://ancnet.avlcode.org/

  6. arXiv:1912.01438  [pdf, other]

    cs.CV cs.RO

    FlowNet3D++: Geometric Losses For Deep Scene Flow Estimation

    Authors: Zirui Wang, Shuda Li, Henry Howard-Jenkins, Victor Adrian Prisacariu, Min Chen

    Abstract: We present FlowNet3D++, a deep scene flow estimation network. Inspired by classical methods, FlowNet3D++ incorporates geometric constraints in the form of point-to-plane distance and angular alignment between individual vectors in the flow field, into FlowNet3D. We demonstrate that the addition of these geometric loss terms improves the previous state-of-art FlowNet3D accuracy from 57.85% to 63.43…

    Submitted 26 April, 2021; v1 submitted 3 December, 2019; originally announced December 2019.

    Comments: WACV 2020

  7. arXiv:1912.00673  [pdf, other]

    cs.LG stat.ML

    GroSS: Group-Size Series Decomposition for Grouped Architecture Search

    Authors: Henry Howard-Jenkins, Yiwen Li, Victor A. Prisacariu

    Abstract: We present a novel approach which is able to explore the configuration of grouped convolutions within neural networks. Group-size Series (GroSS) decomposition is a mathematical formulation of tensor factorisation into a series of approximations of increasing rank terms. GroSS allows for dynamic and differentiable selection of factorisation rank, which is analogous to a grouped convolution. Therefo…

    Submitted 16 July, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

    Comments: Accepted for publication at ECCV 2020

  8. arXiv:1905.03105  [pdf, other]

    cs.CV

    Thinking Outside the Box: Generation of Unconstrained 3D Room Layouts

    Authors: Henry Howard-Jenkins, Shuda Li, Victor Prisacariu

    Abstract: We propose a method for room layout estimation that does not rely on the typical box approximation or Manhattan world assumption. Instead, we reformulate the geometry inference problem as an instance detection task, which we solve by directly regressing 3D planes using an R-CNN. We then use a variant of probabilistic clustering to combine the 3D planes regressed at each frame in a video sequence,…

    Submitted 8 May, 2019; originally announced May 2019.

    Comments: Asian Conference on Computer Vision (ACCV), 2018