Showing 1–50 of 61 results for author: Litany, O

Searching in archive cs.

  1. arXiv:2405.18677  [pdf, other]

    cs.CV

    Zero-to-Hero: Enhancing Zero-Shot Novel View Synthesis via Attention Map Filtering

    Authors: Ido Sobol, Chenfeng Xu, Or Litany

    Abstract: Generating realistic images from arbitrary views based on a single source image remains a significant challenge in computer vision, with broad applications ranging from e-commerce to immersive virtual experiences. Recent advancements in diffusion models, particularly the Zero-1-to-3 model, have been widely adopted for generating plausible views, videos, and 3D models. However, these models still s…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Project page: https://zero2hero-nvs.github.io

  2. arXiv:2403.09577  [pdf, other]

    cs.CV

    The NeRFect Match: Exploring NeRF Features for Visual Localization

    Authors: Qunjie Zhou, Maxim Maximov, Or Litany, Laura Leal-Taixé

    Abstract: In this work, we propose the use of Neural Radiance Fields (NeRF) as a scene representation for visual localization. Recently, NeRF has been employed to enhance pose regression and scene coordinate regression models by augmenting the training database, providing auxiliary supervision through rendered images, or serving as an iterative refinement module. We extend its recognized advantages -- its a…

    Submitted 14 March, 2024; originally announced March 2024.

  3. arXiv:2402.15321  [pdf, other]

    cs.CV cs.AI cs.LG

    OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding

    Authors: Francis Engelmann, Ayca Takmaz, Jonas Schult, Elisabetta Fedele, Johanna Wald, Songyou Peng, Xi Wang, Or Litany, Siyu Tang, Federico Tombari, Marc Pollefeys, Leonidas Guibas, Hongbo Tian, Chunjie Wang, Xiaosheng Yan, Bingwen Wang, Xuanyang Zhang, Xiao Liu, Phuc Nguyen, Khoi Nguyen, Anh Tran, Cuong Pham, Zhening Huang, Xiaoyang Wu, Xi Chen, et al. (3 additional authors not shown)

    Abstract: This report provides an overview of the challenge hosted at the OpenSUN3D Workshop on Open-Vocabulary 3D Scene Understanding held in conjunction with ICCV 2023. The goal of this workshop series is to provide a platform for exploration and discussion of open-vocabulary 3D scene understanding tasks, including but not limited to segmentation, detection and mapping. We provide an overview of the chall…

    Submitted 17 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: Our OpenSUN3D workshop website for ICCV 2023: https://opensun3d.github.io/index_iccv23.html

  4. arXiv:2402.08529  [pdf, other]

    cs.LG cs.CV

    Approximately Piecewise E(3) Equivariant Point Networks

    Authors: Matan Atzmon, Jiahui Huang, Francis Williams, Or Litany

    Abstract: Integrating a notion of symmetry into point cloud neural networks is a provably effective way to improve their generalization capability. Of particular interest are $E(3)$ equivariant point cloud networks where Euclidean transformations applied to the inputs are preserved in the outputs. Recent efforts aim to extend networks that are $E(3)$ equivariant, to accommodate inputs made of multiple parts…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: ICLR 2024

  5. arXiv:2401.08559  [pdf, other]

    cs.CV cs.GR cs.LG

    Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation

    Authors: Mathis Petrovich, Or Litany, Umar Iqbal, Michael J. Black, Gül Varol, Xue Bin Peng, Davis Rempe

    Abstract: Recent advances in generative modeling have led to promising progress on synthesizing 3D human motion from text, with methods that can generate character animations from short prompts and specified durations. However, using a single text prompt as input lacks the fine-grained control needed by animators, such as composing multiple actions and defining precise durations for parts of the motion. To…

    Submitted 24 May, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: CVPR 2024, HuMoGen Workshop

  6. arXiv:2312.05247  [pdf, other]

    cs.CV

    Dynamic LiDAR Re-simulation using Compositional Neural Fields

    Authors: Hanfeng Wu, Xingxing Zuo, Stefan Leutenegger, Or Litany, Konrad Schindler, Shengyu Huang

    Abstract: We introduce DyNFL, a novel neural field-based approach for high-fidelity re-simulation of LiDAR scans in dynamic driving scenes. DyNFL processes LiDAR measurements from dynamic environments, accompanied by bounding boxes of moving objects, to construct an editable neural field. This field, comprising separately reconstructed static background and dynamic objects, allows users to modify viewpoints…

    Submitted 3 April, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: Project page: https://shengyuh.github.io/dynfl

  7. arXiv:2311.04391  [pdf, other]

    cs.CV

    3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features

    Authors: Chenfeng Xu, Huan Ling, Sanja Fidler, Or Litany

    Abstract: We present 3DiffTection, a state-of-the-art method for 3D object detection from single images, leveraging features from a 3D-aware diffusion model. Annotating large-scale image data for 3D detection is resource-intensive and time-consuming. Recently, pretrained large image diffusion models have become prominent as effective feature extractors for 2D perception tasks. However, these features are in…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: Project page: https://research.nvidia.com/labs/toronto-ai/3difftection/

  8. arXiv:2311.02077  [pdf, other]

    cs.CV

    EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision

    Authors: Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, Yue Wang

    Abstract: We present EmerNeRF, a simple yet powerful approach for learning spatial-temporal representations of dynamic driving scenes. Grounded in neural fields, EmerNeRF simultaneously captures scene geometry, appearance, motion, and semantics via self-bootstrapping. EmerNeRF hinges upon two core components: First, it stratifies scenes into static and dynamic fields. This decomposition emerges purely from…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: See the project page for code, data, and request pre-trained models: https://emernerf.github.io

  9. arXiv:2309.05192  [pdf, other]

    cs.CV

    Towards Viewpoint Robustness in Bird's Eye View Segmentation

    Authors: Tzofi Klinghoffer, Jonah Philion, Wenzheng Chen, Or Litany, Zan Gojcic, Jungseock Joo, Ramesh Raskar, Sanja Fidler, Jose M. Alvarez

    Abstract: Autonomous vehicles (AV) require that neural networks used for perception be robust to different viewpoints if they are to be deployed across many types of vehicles without the repeated cost of data collection and labeling for each. AV companies typically focus on collecting data from diverse scenarios and locations, but not camera rig configurations, due to cost. As a result, only a small number…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: ICCV 2023. Project Page: https://nvlabs.github.io/viewpoint-robustness

  10. arXiv:2308.13900  [pdf, other]

    cs.CV cs.LG

    Semi-Supervised Semantic Segmentation via Marginal Contextual Information

    Authors: Moshe Kimhi, Shai Kimhi, Evgenii Zheltonozhskii, Or Litany, Chaim Baskin

    Abstract: We present a novel confidence refinement scheme that enhances pseudo labels in semi-supervised semantic segmentation. Unlike existing methods, which filter pixels with low-confidence predictions in isolation, our approach leverages the spatial correlation of labels in segmentation maps by grouping neighboring pixels and considering their pseudo labels collectively. With this contextual information…

    Submitted 3 July, 2024; v1 submitted 26 August, 2023; originally announced August 2023.

    Comments: Published at TMLR

  11. arXiv:2305.19590  [pdf, other]

    cs.CV

    Neural Kernel Surface Reconstruction

    Authors: Jiahui Huang, Zan Gojcic, Matan Atzmon, Or Litany, Sanja Fidler, Francis Williams

    Abstract: We present a novel method for reconstructing a 3D implicit surface from a large-scale, sparse, and noisy point cloud. Our approach builds upon the recently introduced Neural Kernel Fields (NKF) representation. It enjoys similar generalization capabilities to NKF, while simultaneously addressing its main limitations: (a) We can scale to large scenes through compactly supported kernel functions, whi…

    Submitted 9 June, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: CVPR 2023

  12. arXiv:2305.13220  [pdf, other]

    cs.CV

    Fast Monocular Scene Reconstruction with Global-Sparse Local-Dense Grids

    Authors: Wei Dong, Chris Choy, Charles Loop, Or Litany, Yuke Zhu, Anima Anandkumar

    Abstract: Indoor scene reconstruction from monocular images has long been sought after by augmented reality and robotics developers. Recent advances in neural field representations and monocular priors have led to remarkable results in scene-level surface reconstructions. The reliance on Multilayer Perceptrons (MLP), however, significantly limits speed in training and rendering. In this work, we propose to…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: CVPR 2023

  13. arXiv:2305.01643  [pdf, other]

    cs.CV

    Neural LiDAR Fields for Novel View Synthesis

    Authors: Shengyu Huang, Zan Gojcic, Zian Wang, Francis Williams, Yoni Kasten, Sanja Fidler, Konrad Schindler, Or Litany

    Abstract: We present Neural Fields for LiDAR (NFL), a method to optimise a neural field scene representation from LiDAR measurements, with the goal of synthesizing realistic LiDAR scans from novel viewpoints. NFL combines the rendering power of neural fields with a detailed, physically motivated model of the LiDAR sensing process, thus enabling it to accurately reproduce key sensor behaviors like beam diver…

    Submitted 13 August, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: ICCV 2023 - camera ready. Project page: https://research.nvidia.com/labs/toronto-ai/nfl/

  14. arXiv:2304.01893  [pdf, other]

    cs.CV cs.GR cs.LG

    Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion

    Authors: Davis Rempe, Zhengyi Luo, Xue Bin Peng, Ye Yuan, Kris Kitani, Karsten Kreis, Sanja Fidler, Or Litany

    Abstract: We introduce a method for generating realistic pedestrian trajectories and full-body animations that can be controlled to meet user-defined goals. We draw on recent advances in guided diffusion modeling to achieve test-time controllability of trajectories, which is normally only associated with rule-based systems. Our guided diffusion model allows users to constrain trajectories through target way…

    Submitted 4 April, 2023; originally announced April 2023.

    Comments: Conference on Computer Vision and Pattern Recognition (CVPR) 2023

  15. arXiv:2303.14541  [pdf, other]

    cs.CV

    UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes

    Authors: David Rozenberszki, Or Litany, Angela Dai

    Abstract: 3D instance segmentation is fundamental to geometric understanding of the world around us. Existing methods for instance segmentation of 3D scenes rely on supervision from expensive, manual 3D annotations. We propose UnScene3D, the first fully unsupervised 3D learning approach for class-agnostic 3D instance segmentation of indoor scans. UnScene3D first generates pseudo masks by leveraging self-sup…

    Submitted 30 April, 2024; v1 submitted 25 March, 2023; originally announced March 2023.

    Comments: Project page: https://rozdavid.github.io/unscene3d, paper updated according to CVPR24 camera ready version

  16. arXiv:2210.06978  [pdf, other]

    cs.CV cs.LG stat.ML

    LION: Latent Point Diffusion Models for 3D Shape Generation

    Authors: Xiaohui Zeng, Arash Vahdat, Francis Williams, Zan Gojcic, Or Litany, Sanja Fidler, Karsten Kreis

    Abstract: Denoising diffusion models (DDMs) have shown promising results in 3D point cloud synthesis. To advance 3D DDMs and make them useful for digital artists, we require (i) high generation quality, (ii) flexibility for manipulation and applications such as conditional synthesis and shape interpolation, and (iii) the ability to output smooth surfaces or meshes. To this end, we introduce the hierarchical…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022

  17. arXiv:2210.03105  [pdf, other]

    cs.CV

    Mask3D: Mask Transformer for 3D Semantic Instance Segmentation

    Authors: Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, Bastian Leibe

    Abstract: Modern 3D semantic instance segmentation approaches predominantly rely on specialized voting mechanisms followed by carefully designed geometric clustering techniques. Building on the successes of recent Transformer-based methods for object detection and image segmentation, we propose the first Transformer-based approach for 3D semantic instance segmentation. We show that we can leverage generic T…

    Submitted 12 April, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: ICRA 2023 camera-ready version

  18. arXiv:2209.11163  [pdf, other]

    cs.CV

    GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images

    Authors: Jun Gao, Tianchang Shen, Zian Wang, Wenzheng Chen, Kangxue Yin, Daiqing Li, Or Litany, Zan Gojcic, Sanja Fidler

    Abstract: As several industries are moving towards modeling massive 3D virtual worlds, the need for content creation tools that can scale in terms of the quantity, quality, and diversity of 3D content is becoming evident. In our work, we aim to train performant 3D generative models that synthesize textured meshes which can be directly consumed by 3D rendering engines, thus immediately usable in downstream a…

    Submitted 22 September, 2022; originally announced September 2022.

    Comments: NeurIPS 2022, Project Page: https://nv-tlabs.github.io/GET3D/

  19. arXiv:2208.08580  [pdf, other]

    cs.CV cs.AI cs.GR

    MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation

    Authors: Gopal Sharma, Kangxue Yin, Subhransu Maji, Evangelos Kalogerakis, Or Litany, Sanja Fidler

    Abstract: We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks. This is inspired by the observation that view-based surface representations are more effective at modeling high-resolution surface details and texture than their 3D counterparts based on point clouds or voxel occupancy. Specifically, given a 3D shape, we render it from multiple views, an…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: project page: https://nv-tlabs.github.io/MvDeCor/

  20. arXiv:2204.07761  [pdf, other]

    cs.CV

    Language-Grounded Indoor 3D Semantic Segmentation in the Wild

    Authors: David Rozenberszki, Or Litany, Angela Dai

    Abstract: Recent advances in 3D semantic segmentation with deep neural networks have shown remarkable success, with rapid performance increase on available datasets. However, current 3D semantic segmentation benchmarks contain only a small number of categories -- less than 30 for ScanNet and SemanticKITTI, for instance, which are not enough to reflect the diversity of real environments (e.g., semantic image…

    Submitted 28 July, 2022; v1 submitted 16 April, 2022; originally announced April 2022.

    Comments: Project page: https://rozdavid.github.io/scannet200, GitHub: https://github.com/RozDavid/LanguageGroundedSemseg, Video: https://www.youtube.com/watch?v=Cu-zW1oXrvU

  21. arXiv:2202.08345  [pdf, other]

    cs.CV cs.GR

    Learning Smooth Neural Functions via Lipschitz Regularization

    Authors: Hsueh-Ti Derek Liu, Francis Williams, Alec Jacobson, Sanja Fidler, Or Litany

    Abstract: Neural implicit fields have recently emerged as a useful representation for 3D shapes. These fields are commonly represented as neural networks which map latent descriptors and 3D coordinates to implicit function values. The latent descriptor of a neural field acts as a deformation handle for the 3D shape it represents. Thus, smoothness with respect to this descriptor is paramount for performing s…

    Submitted 10 May, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

  22. arXiv:2202.03651  [pdf, other]

    cs.CV

    Causal Scene BERT: Improving object detection by searching for challenging groups of data

    Authors: Cinjon Resnick, Or Litany, Amlan Kar, Karsten Kreis, James Lucas, Kyunghyun Cho, Sanja Fidler

    Abstract: Modern computer vision applications rely on learning-based perception modules parameterized with neural networks for tasks like object detection. These modules frequently have low expected error overall but high error on atypical groups of data due to biases inherent in the training process. In building autonomous vehicles (AV), this problem is an especially important challenge because their perce…

    Submitted 21 April, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

    Comments: In submission at JMLR; 0xe5110eA3B5014cd9a585Dc76c74Ee509F504Be14

  23. arXiv:2201.08459  [pdf, other]

    cs.LG cs.AI

    Federated Learning with Heterogeneous Architectures using Graph HyperNetworks

    Authors: Or Litany, Haggai Maron, David Acuna, Jan Kautz, Gal Chechik, Sanja Fidler

    Abstract: Standard Federated Learning (FL) techniques are limited to clients with identical network architectures. This restricts potential use-cases like cross-platform training or inter-organizational collaboration when both data privacy and architectural proprietary are required. We propose a new FL framework that accommodates heterogeneous client architecture by adopting a graph hypernetwork for paramet…

    Submitted 20 January, 2022; originally announced January 2022.

  24. arXiv:2112.05077  [pdf, other]

    cs.CV cs.LG cs.RO

    Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior

    Authors: Davis Rempe, Jonah Philion, Leonidas J. Guibas, Sanja Fidler, Or Litany

    Abstract: Evaluating and improving planning for autonomous vehicles requires scalable generation of long-tail traffic scenarios. To be useful, these scenarios must be realistic and challenging, but not impossible to drive through safely. In this work, we introduce STRIVE, a method to automatically generate challenging scenarios that cause a given planner to produce undesirable behavior, like collisions. To…

    Submitted 28 March, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

    Comments: CVPR 2022 camera-ready

  25. arXiv:2111.13674  [pdf, other]

    cs.CV cs.GR cs.LG

    Neural Fields as Learnable Kernels for 3D Reconstruction

    Authors: Francis Williams, Zan Gojcic, Sameh Khamis, Denis Zorin, Joan Bruna, Sanja Fidler, Or Litany

    Abstract: We present Neural Kernel Fields: a novel method for reconstructing implicit 3D shapes based on a learned kernel ridge regression. Our technique achieves state-of-the-art results when reconstructing 3D objects and large scenes from sparse oriented points, and can reconstruct shape categories outside the training set with almost no drop in accuracy. The core insight of our approach is that kernel me…

    Submitted 26 November, 2021; originally announced November 2021.

  26. arXiv:2111.11426  [pdf, other]

    cs.CV cs.GR cs.LG

    Neural Fields in Visual Computing and Beyond

    Authors: Yiheng Xie, Towaki Takikawa, Shunsuke Saito, Or Litany, Shiqin Yan, Numair Khan, Federico Tombari, James Tompkin, Vincent Sitzmann, Srinath Sridhar

    Abstract: Recent advances in machine learning have created increasing interest in solving visual computing problems using a class of coordinate-based neural networks that parametrize physical properties of scenes or objects across space and time. These methods, which we call neural fields, have seen successful application in the synthesis of 3D shapes and image, animation of human bodies, 3D reconstruction,…

    Submitted 5 April, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

    Comments: Equal advising: Vincent Sitzmann and Srinath Sridhar

  27. arXiv:2111.00140  [pdf, other]

    cs.CV cs.GR

    DIB-R++: Learning to Predict Lighting and Material with a Hybrid Differentiable Renderer

    Authors: Wenzheng Chen, Joey Litalien, Jun Gao, Zian Wang, Clement Fuji Tsang, Sameh Khamis, Or Litany, Sanja Fidler

    Abstract: We consider the challenging problem of predicting intrinsic object properties from a single image by exploiting differentiable renderers. Many previous learning-based approaches for inverse graphics adopt rasterization-based renderers and assume naive lighting and material models, which often fail to account for non-Lambertian, specular reflections commonly observed in the wild. In this work, we p…

    Submitted 29 October, 2021; originally announced November 2021.

  28. arXiv:2110.02210  [pdf, other]

    cs.CV

    Mix3D: Out-of-Context Data Augmentation for 3D Scenes

    Authors: Alexey Nekrasov, Jonas Schult, Or Litany, Bastian Leibe, Francis Engelmann

    Abstract: We present Mix3D, a data augmentation technique for segmenting large-scale 3D scenes. Since scene context helps reasoning about object semantics, current works focus on models with large capacity and receptive fields that can fully capture the global context of an input 3D scene. However, strong contextual priors can have detrimental implications like mistaking a pedestrian crossing the street for…

    Submitted 29 November, 2021; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: Accepted for publication at 3DV 2021. Camera-ready submission. Link to code: https://github.com/kumuji/mix3d - Project page: https://nekrasov.dev/mix3d/

  29. arXiv:2105.08016  [pdf, other]

    cs.CV cs.CG

    StrobeNet: Category-Level Multiview Reconstruction of Articulated Objects

    Authors: Ge Zhang, Or Litany, Srinath Sridhar, Leonidas Guibas

    Abstract: We present StrobeNet, a method for category-level 3D reconstruction of articulating objects from one or more unposed RGB images. Reconstructing general articulating object categories has important applications, but is challenging since objects can have wide variation in shape, articulation, appearance and topology. We address this by building on the idea of category-level articulation canonicali…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: preprint

  30. arXiv:2104.12229  [pdf, other]

    cs.CV

    Vector Neurons: A General Framework for SO(3)-Equivariant Networks

    Authors: Congyue Deng, Or Litany, Yueqi Duan, Adrien Poulenard, Andrea Tagliasacchi, Leonidas Guibas

    Abstract: Invariance and equivariance to the rotation group have been widely discussed in the 3D deep learning community for pointclouds. Yet most proposed methods either use complex mathematical tools that may limit their accessibility, or are tied to specific input data types and network architectures. In this paper, we introduce a general framework built on top of what we call Vector Neuron representatio…

    Submitted 25 April, 2021; originally announced April 2021.

  31. arXiv:2104.00514  [pdf, other]

    cs.GR cs.CG cs.LG

    Learning Spectral Unions of Partial Deformable 3D Shapes

    Authors: Luca Moschella, Simone Melzi, Luca Cosmo, Filippo Maggioli, Or Litany, Maks Ovsjanikov, Leonidas Guibas, Emanuele Rodolà

    Abstract: Spectral geometric methods have brought revolutionary changes to the field of geometry processing. Of particular interest is the study of the Laplacian spectrum as a compact, isometry and permutation-invariant representation of a shape. Some recent works show how the intrinsic geometry of a full shape can be recovered from its spectrum, but there are approaches that consider the more challenging p…

    Submitted 21 December, 2022; v1 submitted 31 March, 2021; originally announced April 2021.

    Comments: 18 pages, 20 figures

  32. Contrast to Divide: Self-Supervised Pre-Training for Learning with Noisy Labels

    Authors: Evgenii Zheltonozhskii, Chaim Baskin, Avi Mendelson, Alex M. Bronstein, Or Litany

    Abstract: The success of learning with noisy labels (LNL) methods relies heavily on the success of a warm-up stage where standard supervised training is performed using the full (noisy) training set. In this paper, we identify a "warm-up obstacle": the inability of standard warm-up stages to train high quality feature extractors and avert memorization of noisy labels. We propose "Contrast to Divide" (C2D),…

    Submitted 20 October, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

  33. arXiv:2102.08945  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Weakly Supervised Learning of Rigid 3D Scene Flow

    Authors: Zan Gojcic, Or Litany, Andreas Wieser, Leonidas J. Guibas, Tolga Birdal

    Abstract: We propose a data-driven scene flow estimation algorithm exploiting the observation that many 3D scenes can be explained by a collection of agents moving as rigid bodies. At the core of our method lies a deep architecture able to reason at the object-level by considering 3D scene flow in conjunction with other 3D tasks. This object-level abstraction enables us to relax the requirement fo…

    Submitted 17 February, 2021; originally announced February 2021.

  34. arXiv:2102.00863  [pdf, other]

    cs.CV

    Self-Supervised Equivariant Scene Synthesis from Video

    Authors: Cinjon Resnick, Or Litany, Cosmas Heiß, Hugo Larochelle, Joan Bruna, Kyunghyun Cho

    Abstract: We propose a self-supervised framework to learn scene representations from video that are automatically delineated into background, characters, and their animations. Our method capitalizes on moving characters being equivariant with respect to their transformation across frames and the background being constant with respect to that same transformation. After training, we can manipulate image encod…

    Submitted 1 February, 2021; originally announced February 2021.

    Comments: arXiv admin note: text overlap with arXiv:2011.05787

  35. arXiv:2012.10518  [pdf, other]

    cs.CV

    Human 3D keypoints via spatial uncertainty modeling

    Authors: Francis Williams, Or Litany, Avneesh Sud, Kevin Swersky, Andrea Tagliasacchi

    Abstract: We introduce a technique for 3D human keypoint estimation that directly models the notion of spatial uncertainty of a keypoint. Our technique employs a principled approach to modelling spatial uncertainty inspired from techniques in robust statistics. Furthermore, our pipeline requires no 3D ground truth labels, relying instead on (possibly noisy) 2D image-level keypoints. Our method achieves near…

    Submitted 18 December, 2020; originally announced December 2020.

  36. arXiv:2012.04355  [pdf, other]

    cs.CV

    3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection

    Authors: He Wang, Yezhen Cong, Or Litany, Yue Gao, Leonidas J. Guibas

    Abstract: 3D object detection is an important yet demanding task that heavily relies on difficult to obtain 3D annotations. To reduce the required amount of supervision, we propose 3DIoUMatch, a novel semi-supervised method for 3D object detection applicable to both indoor and outdoor scenes. We leverage a teacher-student mutual learning framework to propagate information from the labeled to the unlabeled t…

    Submitted 6 July, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: CVPR 2021

  37. Non-Rigid Puzzles

    Authors: Or Litany, Emanuele Rodolà, Alex Bronstein, Michael Bronstein, Daniel Cremers

    Abstract: Shape correspondence is a fundamental problem in computer graphics and vision, with applications in various problems including animation, texture mapping, robotic vision, medical imaging, archaeology and many more. In settings where the shapes are allowed to undergo non-rigid deformations and only partial views are available, the problem becomes very challenging. To this end, we present a non-rigi…

    Submitted 25 November, 2020; originally announced November 2020.

    Journal ref: Computer Graphics Forum, Volume 35, Issue 5, August 2016

  38. arXiv:2011.05787  [pdf, other]

    cs.CV

    Learned Equivariant Rendering without Transformation Supervision

    Authors: Cinjon Resnick, Or Litany, Hugo Larochelle, Joan Bruna, Kyunghyun Cho

    Abstract: We propose a self-supervised framework to learn scene representations from video that are automatically delineated into objects and background. Our method relies on moving objects being equivariant with respect to their transformation across frames and the background being constant. After training, we can manipulate and render the scenes in real time to create unseen combinations of objects, trans…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: Workshop on Differentiable Vision, Graphics, and Physics in Machine Learning at NeurIPS 2020

  39. arXiv:2008.07792  [pdf, other]

    cs.AI cs.CV cs.LG cs.RO

    ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation

    Authors: Fei Xia, Chengshu Li, Roberto Martín-Martín, Or Litany, Alexander Toshev, Silvio Savarese

    Abstract: Many Reinforcement Learning (RL) approaches use joint control signals (positions, velocities, torques) as action space for continuous control tasks. We propose to lift the action space to a higher level in the form of subgoals for a motion generator (a combination of motion planner and trajectory executor). We argue that, by lifting the action space and by leveraging sampling-based motion planners…

    Submitted 26 March, 2021; v1 submitted 18 August, 2020; originally announced August 2020.

    Comments: First two authors contributed equally. Access project website at http://svl.stanford.edu/projects/relmogen

  40. arXiv:2007.10985  [pdf, other]

    cs.CV

    PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding

    Authors: Saining Xie, Jiatao Gu, Demi Guo, Charles R. Qi, Leonidas J. Guibas, Or Litany

    Abstract: Arguably one of the top success stories of deep learning is transfer learning. The finding that pre-training a network on a rich source set (e.g., ImageNet) can help boost performance once fine-tuned on a usually much smaller target set has been instrumental to many applications in language and vision. Yet, very little is known about its usefulness in 3D point cloud understanding. We see this as a…

    Submitted 20 November, 2020; v1 submitted 21 July, 2020; originally announced July 2020.

    Comments: ECCV 2020 (Spotlight); code available at https://github.com/facebookresearch/PointContrast

  41. arXiv:2007.10300  [pdf, other]

    cs.CV

    Object-Centric Multi-View Aggregation

    Authors: Shubham Tulsiani, Or Litany, Charles R. Qi, He Wang, Leonidas J. Guibas

    Abstract: We present an approach for aggregating a sparse set of views of an object in order to compute a semi-implicit 3D representation in the form of a volumetric feature grid. Key to our approach is an object-centric canonical 3D coordinate system into which views can be lifted, without explicit camera pose estimation, and then combined -- in a manner that can accommodate a variable number of views and…

    Submitted 21 July, 2020; v1 submitted 20 July, 2020; originally announced July 2020.

  42. arXiv:2002.11829  [pdf, other]

    cs.LG stat.ML

    Representation Learning Through Latent Canonicalizations

    Authors: Or Litany, Ari Morcos, Srinath Sridhar, Leonidas Guibas, Judy Hoffman

    Abstract: We seek to learn a representation on a large annotated data source that generalizes to a target domain using limited new supervision. Many prior approaches to this problem have focused on learning "disentangled" representations so that as individual factors vary in a new domain, only a portion of the representation need be updated. In this work, we seek the generalization power of disentangled rep…

    Submitted 26 February, 2020; originally announced February 2020.

  43. arXiv:2002.08599  [pdf, other]

    cs.LG stat.ML

    On Learning Sets of Symmetric Elements

    Authors: Haggai Maron, Or Litany, Gal Chechik, Ethan Fetaya

    Abstract: Learning from unordered sets is a fundamental learning setup, recently attracting increasing attention. Research in this area has focused on the case where elements of the set are represented by feature vectors, and far less emphasis has been given to the common case where set elements themselves adhere to their own symmetries. That case is relevant to numerous applications, from deblurring image…

    Submitted 29 November, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

    Comments: 37th International Conference on Machine Learning, Vienna, 2020. Outstanding paper award

  44. arXiv:2002.02506  [pdf, other]

    cs.CV cs.CG

    Continuous Geodesic Convolutions for Learning on 3D Shapes

    Authors: Zhangsihao Yang, Or Litany, Tolga Birdal, Srinath Sridhar, Leonidas Guibas

    Abstract: The majority of descriptor-based methods for geometric processing of non-rigid shapes rely on hand-crafted descriptors. Recently, learning-based techniques have been shown to be effective, achieving state-of-the-art results in a variety of tasks. Yet, even though these methods can in principle work directly on raw data, most methods still rely on hand-crafted descriptors at the input layer. In this work,…

    Submitted 6 February, 2020; originally announced February 2020.

  45. arXiv:2001.10692  [pdf, other]

    cs.CV

    ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes

    Authors: Charles R. Qi, Xinlei Chen, Or Litany, Leonidas J. Guibas

    Abstract: 3D object detection has seen quick progress thanks to advances in deep learning on point clouds. A few recent works have even shown state-of-the-art performance with just point cloud input (e.g. VoteNet). However, point cloud data have inherent limitations. They are sparse, lack color information and often suffer from sensor noise. Images, on the other hand, have high resolution and rich texture.…

    Submitted 29 January, 2020; originally announced January 2020.

  46. arXiv:2001.09650  [pdf, other]

    cs.CV cs.CG cs.LG

    The Whole Is Greater Than the Sum of Its Nonrigid Parts

    Authors: Oshri Halimi, Ido Imanuel, Or Litany, Giovanni Trappolini, Emanuele Rodolà, Leonidas Guibas, Ron Kimmel

    Abstract: According to Aristotle, a philosopher in Ancient Greece, "the whole is greater than the sum of its parts". This observation was adopted to explain human perception by the Gestalt psychology school of thought in the twentieth century. Here, we claim that by observing part of an object which was previously acquired as a whole, one could deal with both partial matching and shape completion in a holistic…

    Submitted 27 January, 2020; originally announced January 2020.

    ACM Class: I.4.5

  47. arXiv:1904.09664  [pdf, other]

    cs.CV

    Deep Hough Voting for 3D Object Detection in Point Clouds

    Authors: Charles R. Qi, Or Litany, Kaiming He, Leonidas J. Guibas

    Abstract: Current 3D object detection methods are heavily influenced by 2D detectors. In order to leverage architectures in 2D detectors, they often convert 3D point clouds to regular grids (i.e., to voxel grids or to bird's eye view images), or rely on detection in 2D images to propose 3D boxes. Few works have attempted to directly detect objects in point clouds. In this work, we return to first principles…

    Submitted 22 August, 2019; v1 submitted 21 April, 2019; originally announced April 2019.

    Comments: ICCV 2019

  48. arXiv:1812.02415  [pdf, other]

    cs.CV

    Self-supervised Learning of Dense Shape Correspondence

    Authors: Oshri Halimi, Or Litany, Emanuele Rodolà, Alex Bronstein, Ron Kimmel

    Abstract: We introduce the first completely unsupervised correspondence learning approach for deformable 3D shapes. Key to our model is the understanding that natural deformations (such as changes in pose) approximately preserve the metric structure of the surface, yielding a natural criterion to drive the learning process toward distortion-minimizing predictions. On this basis, we overcome the need for ann…

    Submitted 6 December, 2018; originally announced December 2018.

  49. Class-Aware Fully-Convolutional Gaussian and Poisson Denoising

    Authors: Tal Remez, Or Litany, Raja Giryes, Alex M. Bronstein

    Abstract: We propose a fully-convolutional neural-network architecture for image denoising which is simple yet powerful. Its structure allows it to exploit the gradual nature of the denoising process, in which shallow layers handle local noise statistics, while deeper layers recover edges and enhance textures. Our method advances the state-of-the-art when trained for different noise levels and distributions (b…

    Submitted 20 August, 2018; originally announced August 2018.

  50. arXiv:1806.00770  [pdf, other]

    cs.LG cs.AI stat.ML

    Dual-Primal Graph Convolutional Networks

    Authors: Federico Monti, Oleksandr Shchur, Aleksandar Bojchevski, Or Litany, Stephan Günnemann, Michael M. Bronstein

    Abstract: In recent years, there has been a surge of interest in developing deep learning methods for non-Euclidean structured data such as graphs. In this paper, we propose Dual-Primal Graph CNN, a graph convolutional architecture that alternates convolution-like operations on the graph and its dual. Our approach allows learning both vertex and edge features and generalizes the previous graph attention (G…

    Submitted 3 June, 2018; originally announced June 2018.