Showing 1–33 of 33 results for author: Graham, B

Searching in archive cs.

  1. arXiv:2407.02599  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Meta 3D Gen

    Authors: Raphael Bensadoun, Tom Monnier, Yanir Kleiman, Filippos Kokkinos, Yawar Siddiqui, Mahendra Kariya, Omri Harosh, Roman Shapovalov, Benjamin Graham, Emilien Garreau, Animesh Karnewar, Ang Cao, Idan Azuri, Iurii Makarov, Eric-Tuan Le, Antoine Toisoul, David Novotny, Oran Gafni, Natalia Neverova, Andrea Vedaldi

    Abstract: We introduce Meta 3D Gen (3DGen), a new state-of-the-art, fast pipeline for text-to-3D asset generation. 3DGen offers 3D asset creation with high prompt fidelity and high-quality 3D shapes and textures in under a minute. It supports physically-based rendering (PBR), necessary for 3D asset relighting in real-world applications. Additionally, 3DGen supports generative retexturing of previously gener…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2307.12067  [pdf, other]

    cs.CV

    Replay: Multi-modal Multi-view Acted Videos for Casual Holography

    Authors: Roman Shapovalov, Yanir Kleiman, Ignacio Rocco, David Novotny, Andrea Vedaldi, Changan Chen, Filippos Kokkinos, Ben Graham, Natalia Neverova

    Abstract: We introduce Replay, a collection of multi-view, multi-modal videos of humans interacting socially. Each scene is filmed in high production quality, from different viewpoints with several static cameras, as well as wearable action cameras, and recorded with a large array of microphones at different positions in the room. Overall, the dataset contains over 4000 minutes of footage and over 7 million…

    Submitted 22 July, 2023; originally announced July 2023.

    Comments: Accepted for ICCV 2023. Roman, Yanir, and Ignacio contributed equally

  3. arXiv:2307.07635  [pdf, other]

    cs.CV

    CoTracker: It is Better to Track Together

    Authors: Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht

    Abstract: We introduce CoTracker, a transformer-based model that tracks dense points in a frame jointly across a video sequence. This differs from most existing state-of-the-art approaches that track points independently, ignoring their correlation. We show that joint tracking results in a significantly higher tracking accuracy and robustness. We also provide several technical innovations, including the con…

    Submitted 26 December, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: Code and model weights are available at: https://co-tracker.github.io/

  4. arXiv:2305.02296  [pdf, other]

    cs.CV cs.AI

    DynamicStereo: Consistent Dynamic Depth from Stereo Videos

    Authors: Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht

    Abstract: We consider the problem of reconstructing a dynamic scene observed from a stereo camera. Most existing methods for depth from stereo treat different stereo frames independently, leading to temporally inconsistent depth predictions. Temporal consistency is especially important for immersive AR or VR scenarios, where flickering greatly diminishes the user experience. We propose DynamicStereo, a nove…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: CVPR 2023; project page available at https://dynamic-stereo.github.io/

  5. arXiv:2303.11898  [pdf, other]

    cs.CV cs.GR

    Real-time volumetric rendering of dynamic humans

    Authors: Ignacio Rocco, Iurii Makarov, Filippos Kokkinos, David Novotny, Benjamin Graham, Natalia Neverova, Andrea Vedaldi

    Abstract: We present a method for fast 3D reconstruction and real-time rendering of dynamic humans from monocular videos with accompanying parametric body fits. Our method can reconstruct a dynamic human in less than 3h using a single GPU, compared to recent state-of-the-art alternatives that take up to 72h. These speedups are obtained by using a lightweight deformation model solely based on linear blend sk…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: Project page: https://real-time-humans.github.io/

  6. arXiv:2212.03236  [pdf, other]

    cs.CV

    Self-Supervised Correspondence Estimation via Multiview Registration

    Authors: Mohamed El Banani, Ignacio Rocco, David Novotny, Andrea Vedaldi, Natalia Neverova, Justin Johnson, Benjamin Graham

    Abstract: Video provides us with the spatio-temporal consistency needed for visual learning. Recent approaches have utilized this signal to learn correspondence estimation from close-by frame pairs. However, by only relying on close-by frame pairs, those approaches miss out on the richer long-range consistency between distant overlapping frames. To address this, we propose a self-supervised approach for cor…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: Accepted to WACV 2023. Project page: https://mbanani.github.io/syncmatch/

  7. arXiv:2209.11882  [pdf, ps, other]

    math.CO cs.DM cs.IT

    Logarithmically larger deletion codes of all distances

    Authors: Noga Alon, Gabriela Bourla, Ben Graham, Xiaoyu He, Noah Kravitz

    Abstract: The deletion distance between two binary words $u,v \in \{0,1\}^n$ is the smallest $k$ such that $u$ and $v$ share a common subsequence of length $n-k$. A set $C$ of binary words of length $n$ is called a $k$-deletion code if every pair of distinct words in $C$ has deletion distance greater than $k$. In 1965, Levenshtein initiated the study of deletion codes by showing that, for $k\ge 1$ fixed and…

    Submitted 17 October, 2023; v1 submitted 23 September, 2022; originally announced September 2022.

  8. arXiv:2109.00033  [pdf, other]

    cs.CV

    DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension

    Authors: Roman Shapovalov, David Novotny, Benjamin Graham, Patrick Labatut, Andrea Vedaldi

    Abstract: We tackle the problem of monocular 3D reconstruction of articulated objects like humans and animals. We contribute DensePose 3D, a method that can learn such reconstructions in a weakly supervised fashion from 2D image annotations only. This is in stark contrast with previous deformable reconstruction methods that use parametric models such as SMPL pre-trained on a large dataset of 3D object scans…

    Submitted 31 August, 2021; originally announced September 2021.

    Comments: Accepted for ICCV 2021

    ACM Class: I.4.5

  9. arXiv:2104.11225  [pdf, other]

    cs.CV

    Pri3D: Can 3D Priors Help 2D Representation Learning?

    Authors: Ji Hou, Saining Xie, Benjamin Graham, Angela Dai, Matthias Nießner

    Abstract: Recent advances in 3D perception have shown impressive progress in understanding geometric structures of 3D shapes and even scenes. Inspired by these advances in geometric understanding, we aim to imbue image-based perception with representations learned under geometric constraints. We introduce an approach to learn view-invariant, geometry-aware representations for network pre-training, based on mu…

    Submitted 18 December, 2021; v1 submitted 22 April, 2021; originally announced April 2021.

    Comments: ICCV 2021

  10. arXiv:2104.01136  [pdf, other]

    cs.CV

    LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference

    Authors: Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze

    Abstract: We design a family of image classification architectures that optimize the trade-off between accuracy and efficiency in a high-speed regime. Our work exploits recent findings in attention-based architectures, which are competitive on highly parallel processing hardware. We revisit principles from the extensive literature on convolutional neural networks to apply them to transformers, in particular…

    Submitted 6 May, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

  11. arXiv:2012.09165  [pdf, other]

    cs.CV

    Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts

    Authors: Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie

    Abstract: The rapid progress in 3D scene understanding has come with growing demand for data; however, collecting and annotating 3D scenes (e.g. point clouds) are notoriously hard. For example, the number of scenes (e.g. indoor rooms) that can be accessed and scanned might be limited; even given sufficient data, acquiring 3D labels (e.g. instance masks) requires intensive human labor. In this paper, we expl…

    Submitted 25 June, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

    Comments: CVPR 2021

  12. arXiv:2011.10359  [pdf, other]

    cs.CV cs.LG eess.IV

    RidgeSfM: Structure from Motion via Robust Pairwise Matching Under Depth Uncertainty

    Authors: Benjamin Graham, David Novotny

    Abstract: We consider the problem of simultaneously estimating a dense depth map and camera pose for a large set of images of an indoor scene. While classical SfM pipelines rely on a two-step approach where cameras are first estimated using a bundle adjustment in order to ground the ensuing multi-view stereo stage, both our poses and dense reconstructions are a direct output of an altered bundle adjuster. T…

    Submitted 20 November, 2020; originally announced November 2020.

    Comments: Presenting at 3DV 2020. Source code released at https://github.com/facebookresearch/RidgeSfM

  13. arXiv:2011.00980  [pdf, other]

    cs.CV

    3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data

    Authors: Benjamin Biggs, Sébastien Ehrhadt, Hanbyul Joo, Benjamin Graham, Andrea Vedaldi, David Novotny

    Abstract: We consider the problem of obtaining dense 3D reconstructions of humans from single and partially occluded views. In such cases, the visual evidence is usually insufficient to identify a 3D reconstruction uniquely, so we aim at recovering several plausible reconstructions compatible with the input data. We suggest that ambiguities can be modelled more effectively by parametrizing the possible body…

    Submitted 2 November, 2020; originally announced November 2020.

    Comments: NeurIPS 2020 Spotlight; 14 pages including supplementary

  14. arXiv:2004.07320  [pdf, other]

    cs.LG stat.ML

    Training with Quantization Noise for Extreme Model Compression

    Authors: Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Herve Jegou, Armand Joulin

    Abstract: We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator. In this paper, we extend this approach to work beyond int8 fixed-point quantization with extreme compression…

    Submitted 28 February, 2021; v1 submitted 15 April, 2020; originally announced April 2020.

  15. arXiv:2003.03524  [pdf, other]

    cs.LG cs.IT stat.ML

    The Variational InfoMax Learning Objective

    Authors: Vincenzo Crescimanna, Bruce Graham

    Abstract: Bayesian Inference and Information Bottleneck are the two most popular objectives for neural networks, but they can be optimised only via a variational lower bound: the Variational Information Bottleneck (VIB). In this manuscript we show that the two objectives are actually equivalent to the InfoMax: maximise the information between the data and the labels. The InfoMax representation of the two ob…

    Submitted 7 March, 2020; originally announced March 2020.

  16. arXiv:1909.02533  [pdf, other]

    cs.CV cs.GR

    C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion

    Authors: David Novotny, Nikhila Ravi, Benjamin Graham, Natalia Neverova, Andrea Vedaldi

    Abstract: We propose C3DPO, a method for extracting 3D models of deformable objects from 2D keypoint annotations in unconstrained images. We do so by learning a deep network that reconstructs a 3D object from a single view at a time, accounting for partial occlusions, and explicitly factoring the effects of viewpoint changes and object deformations. In order to achieve this factorization, we introduce a nov…

    Submitted 15 October, 2019; v1 submitted 5 September, 2019; originally announced September 2019.

    Comments: Added a link to the source code into the abstract

    Journal ref: IEEE/CVF International Conference on Computer Vision 2019

  17. arXiv:1907.05686  [pdf, other]

    cs.CV

    And the Bit Goes Down: Revisiting the Quantization of Neural Networks

    Authors: Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou

    Abstract: In this paper, we address the problem of reducing the memory footprint of convolutional network architectures. We introduce a vector quantization method that aims at preserving the quality of the reconstruction of the network outputs rather than its weights. The principle of our approach is that it minimizes the loss reconstruction error for in-domain inputs. Our method only requires a set of unla…

    Submitted 9 November, 2020; v1 submitted 12 July, 2019; originally announced July 2019.

    Comments: ICLR 2020 camera-ready

  18. arXiv:1905.10549  [pdf, other]

    cs.LG stat.ML

    The Variational InfoMax AutoEncoder

    Authors: Vincenzo Crescimanna, Bruce Graham

    Abstract: The Variational AutoEncoder (VAE) simultaneously learns an inference model and a generative model, but only one of these models can be learned at optimum. This behaviour is associated with the ELBO learning objective, which is optimised by a non-informative generator. In order to solve such an issue, we provide a learning objective, learning a maximal informative generator while maintaining bounded the net…

    Submitted 8 November, 2020; v1 submitted 25 May, 2019; originally announced May 2019.

  19. arXiv:1902.10416  [pdf, other]

    cs.CV cs.LG

    Equi-normalization of Neural Networks

    Authors: Pierre Stock, Benjamin Graham, Rémi Gribonval, Hervé Jégou

    Abstract: Modern neural networks are over-parametrized. In particular, each rectified linear hidden unit can be modified by a multiplicative factor by adjusting input and output weights, without changing the rest of the network. Inspired by the Sinkhorn-Knopp algorithm, we introduce a fast iterative method for minimizing the L2 norm of the weights, equivalently the weight decay regularizer. It provably conv…

    Submitted 27 February, 2019; originally announced February 2019.

    Comments: ICLR 2019 camera-ready

  20. arXiv:1901.08019  [pdf, other]

    cs.LG stat.ML

    An information theoretic approach to the autoencoder

    Authors: Vincenzo Crescimanna, Bruce Graham

    Abstract: We present a variation of the Autoencoder (AE) that explicitly maximizes the mutual information between the input data and the hidden representation. The proposed model, the InfoMax Autoencoder (IMAE), by construction is able to learn a robust representation and good prototypes of the data. IMAE is compared both theoretically and then computationally with the state of the art models: the Denoising…

    Submitted 23 January, 2019; originally announced January 2019.

    Comments: 10 pages

  21. arXiv:1811.10355  [pdf, other]

    cs.CV cs.AI

    Unsupervised learning with sparse space-and-time autoencoders

    Authors: Benjamin Graham

    Abstract: We use spatially-sparse two, three and four dimensional convolutional autoencoder networks to model sparse structures in 2D space, 3D space, and 3+1=4 dimensional space-time. We evaluate the resulting latent spaces by testing their usefulness for downstream tasks. Applications are to handwriting recognition in 2D, segmentation for parts in 3D objects, segmentation for objects in 3D scenes, and bod…

    Submitted 26 November, 2018; originally announced November 2018.

  22. arXiv:1802.08252  [pdf, other]

    cs.DS cs.MS math.RA

    The iisignature library: efficient calculation of iterated-integral signatures and log signatures

    Authors: Jeremy Reizenstein, Benjamin Graham

    Abstract: Iterated-integral signatures and log signatures are vectors calculated from a path that characterise its shape. They come from the theory of differential equations driven by rough paths, and also have applications in statistics and machine learning. We present algorithms for efficiently calculating these signatures, and benchmark their performance. We release the methods as a Python package.

    Submitted 22 February, 2018; originally announced February 2018.

    Comments: 18 pages

  23. arXiv:1711.10275  [pdf, other]

    cs.CV

    3D Semantic Segmentation with Submanifold Sparse Convolutional Networks

    Authors: Benjamin Graham, Martin Engelcke, Laurens van der Maaten

    Abstract: Convolutional networks are the de-facto standard for analyzing spatio-temporal data such as images, videos, and 3D shapes. Whilst some of this data is naturally dense (e.g., photos), many other data sources are inherently sparse. Examples include 3D point clouds that were obtained using a LiDAR scanner or RGB-D camera. Standard "dense" implementations of convolutional networks are very inefficient…

    Submitted 28 November, 2017; originally announced November 2017.

    Comments: arXiv admin note: text overlap with arXiv:1706.01307

  24. arXiv:1710.06104  [pdf, other]

    cs.CV

    Large-Scale 3D Shape Reconstruction and Segmentation from ShapeNet Core55

    Authors: Li Yi, Lin Shao, Manolis Savva, Haibin Huang, Yang Zhou, Qirui Wang, Benjamin Graham, Martin Engelcke, Roman Klokov, Victor Lempitsky, Yuan Gan, Pengyu Wang, Kun Liu, Fenggen Yu, Panpan Shui, Bingyang Hu, Yan Zhang, Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Minki Jeong, Jaehoon Choi, Changick Kim, Angom Geetchandra, et al. (25 additional authors not shown)

    Abstract: We introduce a large-scale 3D shape understanding benchmark using data and annotation from ShapeNet 3D object database. The benchmark consists of two tasks: part-level segmentation of 3D shapes and 3D reconstruction from single view images. Ten teams have participated in the challenge and the best performing teams have outperformed state-of-the-art approaches on both tasks. A few novel deep learni…

    Submitted 27 October, 2017; v1 submitted 17 October, 2017; originally announced October 2017.

  25. arXiv:1706.01307  [pdf, other]

    cs.NE cs.CV

    Submanifold Sparse Convolutional Networks

    Authors: Benjamin Graham, Laurens van der Maaten

    Abstract: Convolutional networks are the de-facto standard for analysing spatio-temporal data such as images, videos, 3D shapes, etc. Whilst some of this data is naturally dense (for instance, photos), many other data sources are inherently sparse. Examples include pen-strokes forming on a piece of paper, or (colored) 3D point clouds that were obtained using a LiDAR scanner or RGB-D camera. Standard "dense"…

    Submitted 5 June, 2017; originally announced June 2017.

    Comments: 10 pages

  26. arXiv:1702.08231  [pdf, other]

    cs.NE cs.CV

    Low-Precision Batch-Normalized Activations

    Authors: Benjamin Graham

    Abstract: Artificial neural networks can be trained with relatively low-precision floating-point and fixed-point arithmetic, using between one and 16 bits. Previous works have focused on relatively wide-but-shallow, feed-forward networks. We introduce a quantization scheme that is compatible with training very deep neural networks. Quantizing the network activations in the middle of each batch-normalization…

    Submitted 27 February, 2017; originally announced February 2017.

    Comments: 16 pages

  27. arXiv:1510.06925  [pdf, other]

    cs.CV cs.NE

    Confusing Deep Convolution Networks by Relabelling

    Authors: Leigh Robinson, Benjamin Graham

    Abstract: Deep convolutional neural networks have become the gold standard for image recognition tasks, demonstrating many current state-of-the-art results and even achieving near-human level performance on some tasks. Despite this fact it has been shown that their strong generalisation qualities can be fooled to misclassify previously correctly classified natural images and give erroneous high confidence c…

    Submitted 3 December, 2015; v1 submitted 23 October, 2015; originally announced October 2015.

    Comments: Submitted to BMVC 2015

  28. arXiv:1505.02890  [pdf, other]

    cs.CV

    Sparse 3D convolutional neural networks

    Authors: Ben Graham

    Abstract: We have implemented a convolutional neural network designed for processing sparse three-dimensional input data. The world we live in is three dimensional so there are a large number of potential applications including 3D object recognition and analysis of space-time objects. In the quest for efficiency, we experiment with CNNs on the 2D triangular-lattice and 3D tetrahedral-lattice.

    Submitted 25 August, 2015; v1 submitted 12 May, 2015; originally announced May 2015.

    Comments: BMVC 2015

  29. arXiv:1502.02478  [pdf, other]

    cs.NE cs.CV

    Efficient batchwise dropout training using submatrices

    Authors: Ben Graham, Jeremy Reizenstein, Leigh Robinson

    Abstract: Dropout is a popular technique for regularizing artificial neural networks. Dropout networks are generally trained by minibatch gradient descent with a dropout mask turning off some of the units---a different pattern of dropout is applied to every sample in the minibatch. We explore a very simple alternative to the dropout mask. Instead of masking dropped out units by setting them to zero, we perf…

    Submitted 9 February, 2015; originally announced February 2015.

  30. arXiv:1412.6071  [pdf, other]

    cs.CV

    Fractional Max-Pooling

    Authors: Benjamin Graham

    Abstract: Convolutional networks almost always incorporate some form of spatial pooling, and very often it is alpha times alpha max-pooling with alpha=2. Max-pooling acts on the hidden layers of the network, reducing their size by an integer multiplicative factor alpha. The amazing by-product of discarding 75% of your data is that you build into the network a degree of invariance with respect to translations…

    Submitted 12 May, 2015; v1 submitted 18 December, 2014; originally announced December 2014.

  31. arXiv:1409.6070  [pdf, other]

    cs.CV cs.NE

    Spatially-sparse convolutional neural networks

    Authors: Benjamin Graham

    Abstract: Convolutional neural networks (CNNs) perform well on problems such as handwriting recognition and image classification. However, the performance of the networks is often limited by budget and time constraints, particularly when trying to train deep networks. Motivated by the problem of online handwriting recognition, we developed a CNN for processing spatially-sparse inputs; a character drawn wi…

    Submitted 21 September, 2014; originally announced September 2014.

    Comments: 13 pages

  32. arXiv:1309.1410  [pdf, ps, other]

    math.PR cs.IT math.CO

    A binary deletion channel with a fixed number of deletions

    Authors: Benjamin Graham

    Abstract: Suppose a binary string x = x_1...x_n is being broadcast repeatedly over a faulty communication channel. Each time, the channel delivers a fixed number m of the digits (m<n) with the lost digits chosen uniformly at random, and the order of the surviving digits preserved. How large does m have to be to reconstruct the message?

    Submitted 5 September, 2013; originally announced September 2013.

    Comments: 4 pages

    MSC Class: 60C05; 68P30

    Journal ref: Combinator. Probab. Comp. 24 (2015) 486-489

  33. arXiv:1308.0371  [pdf, other]

    cs.CV cs.NE

    Sparse arrays of signatures for online character recognition

    Authors: Benjamin Graham

    Abstract: In mathematics the signature of a path is a collection of iterated integrals, commonly used for solving differential equations. We show that the path signature, used as a set of features for consumption by a convolutional neural network (CNN), improves the accuracy of online character recognition---that is the task of reading characters represented as a collection of paths. Using datasets of lette…

    Submitted 1 December, 2013; v1 submitted 1 August, 2013; originally announced August 2013.

    Comments: 10 pages, 2 figures