Showing 1–36 of 36 results for author: Novotny, D

  1. arXiv:2407.02599

    cs.CV cs.AI cs.GR cs.LG

    Meta 3D Gen

    Authors: Raphael Bensadoun, Tom Monnier, Yanir Kleiman, Filippos Kokkinos, Yawar Siddiqui, Mahendra Kariya, Omri Harosh, Roman Shapovalov, Benjamin Graham, Emilien Garreau, Animesh Karnewar, Ang Cao, Idan Azuri, Iurii Makarov, Eric-Tuan Le, Antoine Toisoul, David Novotny, Oran Gafni, Natalia Neverova, Andrea Vedaldi

    Abstract: We introduce Meta 3D Gen (3DGen), a new state-of-the-art, fast pipeline for text-to-3D asset generation. 3DGen offers 3D asset creation with high prompt fidelity and high-quality 3D shapes and textures in under a minute. It supports physically-based rendering (PBR), necessary for 3D asset relighting in real-world applications. Additionally, 3DGen supports generative retexturing of previously gener…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2407.02445

    cs.CV cs.AI cs.GR

    Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials

    Authors: Yawar Siddiqui, Tom Monnier, Filippos Kokkinos, Mahendra Kariya, Yanir Kleiman, Emilien Garreau, Oran Gafni, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny

    Abstract: We present Meta 3D AssetGen (AssetGen), a significant advancement in text-to-3D generation which produces faithful, high-quality meshes with texture and material control. Compared to works that bake shading in the 3D object's appearance, AssetGen outputs physically-based rendering (PBR) materials, supporting realistic relighting. AssetGen generates first several views of the object with factored s…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Project Page: https://assetgen.github.io

  3. arXiv:2404.19760

    cs.CV cs.GR

    Lightplane: Highly-Scalable Components for Neural 3D Fields

    Authors: Ang Cao, Justin Johnson, Andrea Vedaldi, David Novotny

    Abstract: Contemporary 3D research, particularly in reconstruction and generation, heavily relies on 2D images for inputs or supervision. However, current designs for these 2D-3D mapping are memory-intensive, posing a significant bottleneck for existing methods and hindering new applications. In response, we propose a pair of highly scalable components for 3D neural fields: Lightplane Render and Splatter, w…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Project Page: https://lightplane.github.io/ Code: https://github.com/facebookresearch/lightplane

  4. arXiv:2403.17103

    cs.CV

    Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos

    Authors: Remy Sabathier, Niloy J. Mitra, David Novotny

    Abstract: We present a method to build animatable dog avatars from monocular videos. This is challenging as animals display a range of (unpredictable) non-rigid movements and have a variety of appearance details (e.g., fur, spots, tails). We develop an approach that links the video frames via a 4D solution that jointly solves for animal's pose variation, and its appearance (in a canonical pose). To this end…

    Submitted 25 March, 2024; originally announced March 2024.

  5. arXiv:2403.01807

    cs.CV

    ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models

    Authors: Lukas Höllein, Aljaž Božič, Norman Müller, David Novotny, Hung-Yu Tseng, Christian Richardt, Michael Zollhöfer, Matthias Nießner

    Abstract: 3D asset generation is getting massive amounts of attention, inspired by the recent success of text-guided 2D content creation. Existing text-to-3D methods use pretrained text-to-image diffusion models in an optimization problem or fine-tune them on synthetic data, which often results in non-photorealistic 3D objects without backgrounds. In this paper, we present a method that leverages pretrained…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024, project page: https://lukashoel.github.io/ViewDiff/, video: https://www.youtube.com/watch?v=SdjoCqHzMMk, code: https://github.com/facebookresearch/ViewDiff

  6. arXiv:2312.08744

    cs.CV cs.GR

    GOEnFusion: Gradient Origin Encodings for 3D Forward Diffusion Models

    Authors: Animesh Karnewar, Andrea Vedaldi, Niloy J. Mitra, David Novotny

    Abstract: The recently introduced Forward-Diffusion method allows to train a 3D diffusion model using only 2D images for supervision. However, it does not easily generalise to different 3D representations and requires a computationally expensive auto-regressive sampling process to generate the underlying 3D scenes. In this paper, we propose GOEn: Gradient Origin Encoding (pronounced "gone"). GOEn can encode…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: project page at: https://holodiffusion.github.io/goenfusion

  7. arXiv:2312.04563

    cs.CV cs.RO

    Visual Geometry Grounded Deep Structure From Motion

    Authors: Jianyuan Wang, Nikita Karaev, Christian Rupprecht, David Novotny

    Abstract: Structure-from-motion (SfM) is a long-standing problem in the computer vision community, which aims to reconstruct the camera poses and 3D structure of a scene from a set of unconstrained 2D images. Classical frameworks solve this problem in an incremental manner by detecting and matching keypoints, registering images, triangulating 3D points, and conducting bundle adjustment. Recent research effo…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: 8 figures. Project page: https://vggsfm.github.io/

  8. arXiv:2308.14244

    cs.CV cs.GR

    HoloFusion: Towards Photo-realistic 3D Generative Modeling

    Authors: Animesh Karnewar, Niloy J. Mitra, Andrea Vedaldi, David Novotny

    Abstract: Diffusion-based image generators can now produce high-quality and diverse samples, but their success has yet to fully translate to 3D generation: existing diffusion methods can either generate low-resolution but 3D consistent outputs, or detailed 2D views of 3D objects but with potential structural defects and lacking view consistency or realism. We present HoloFusion, a method that combines the b…

    Submitted 27 August, 2023; originally announced August 2023.

    Comments: ICCV 2023 conference; project page at: https://holodiffusion.github.io/holofusion

  9. arXiv:2307.12067

    cs.CV

    Replay: Multi-modal Multi-view Acted Videos for Casual Holography

    Authors: Roman Shapovalov, Yanir Kleiman, Ignacio Rocco, David Novotny, Andrea Vedaldi, Changan Chen, Filippos Kokkinos, Ben Graham, Natalia Neverova

    Abstract: We introduce Replay, a collection of multi-view, multi-modal videos of humans interacting socially. Each scene is filmed in high production quality, from different viewpoints with several static cameras, as well as wearable action cameras, and recorded with a large array of microphones at different positions in the room. Overall, the dataset contains over 4000 minutes of footage and over 7 million…

    Submitted 22 July, 2023; originally announced July 2023.

    Comments: Accepted for ICCV 2023. Roman, Yanir, and Ignacio contributed equally

  10. arXiv:2306.15667

    cs.CV

    PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment

    Authors: Jianyuan Wang, Christian Rupprecht, David Novotny

    Abstract: Camera pose estimation is a long-standing computer vision problem that to date often relies on classical methods, such as handcrafted keypoint matching, RANSAC and bundle adjustment. In this paper, we propose to formulate the Structure from Motion (SfM) problem inside a probabilistic diffusion framework, modelling the conditional distribution of camera poses given input images. This novel view of…

    Submitted 24 January, 2024; v1 submitted 27 June, 2023; originally announced June 2023.

    Comments: ICCV Camera Ready: revised Introduction and Related work, added a metric mAA (AUC), added some quantitative results, and added Appendix

  11. arXiv:2303.16509

    cs.CV cs.GR

    HoloDiffusion: Training a 3D Diffusion Model using 2D Images

    Authors: Animesh Karnewar, Andrea Vedaldi, David Novotny, Niloy Mitra

    Abstract: Diffusion models have emerged as the best approach for generative modeling of 2D images. Part of their success is due to the possibility of training them on millions if not billions of images with a stable learning objective. However, extending these models to 3D remains difficult for two reasons. First, finding a large quantity of 3D training data is much more complex than for 2D images. Second,…

    Submitted 21 May, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: CVPR 2023 conference; project page at: https://holodiffusion.github.io/

  12. arXiv:2303.11898

    cs.CV cs.GR

    Real-time volumetric rendering of dynamic humans

    Authors: Ignacio Rocco, Iurii Makarov, Filippos Kokkinos, David Novotny, Benjamin Graham, Natalia Neverova, Andrea Vedaldi

    Abstract: We present a method for fast 3D reconstruction and real-time rendering of dynamic humans from monocular videos with accompanying parametric body fits. Our method can reconstruct a dynamic human in less than 3h using a single GPU, compared to recent state-of-the-art alternatives that take up to 72h. These speedups are obtained by using a lightweight deformation model solely based on linear blend sk…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: Project page: https://real-time-humans.github.io/

  13. arXiv:2212.03236

    cs.CV

    Self-Supervised Correspondence Estimation via Multiview Registration

    Authors: Mohamed El Banani, Ignacio Rocco, David Novotny, Andrea Vedaldi, Natalia Neverova, Justin Johnson, Benjamin Graham

    Abstract: Video provides us with the spatio-temporal consistency needed for visual learning. Recent approaches have utilized this signal to learn correspondence estimation from close-by frame pairs. However, by only relying on close-by frame pairs, those approaches miss out on the richer long-range consistency between distant overlapping frames. To address this, we propose a self-supervised approach for cor…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: Accepted to WACV 2023. Project page: https://mbanani.github.io/syncmatch/

  14. arXiv:2211.03889

    cs.CV

    Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories

    Authors: Samarth Sinha, Roman Shapovalov, Jeremy Reizenstein, Ignacio Rocco, Natalia Neverova, Andrea Vedaldi, David Novotny

    Abstract: Obtaining photorealistic reconstructions of objects from sparse views is inherently ambiguous and can only be achieved by learning suitable reconstruction priors. Earlier works on sparse rigid object reconstruction successfully learned such priors from large datasets such as CO3D. In this paper, we extend this approach to dynamic objects. We use cats and dogs as a representative example and introd…

    Submitted 7 November, 2022; originally announced November 2022.

  15. arXiv:2206.01916

    cs.CV

    Nerfels: Renderable Neural Codes for Improved Camera Pose Estimation

    Authors: Gil Avraham, Julian Straub, Tianwei Shen, Tsun-Yi Yang, Hugo Germain, Chris Sweeney, Vasileios Balntas, David Novotny, Daniel DeTone, Richard Newcombe

    Abstract: This paper presents a framework that combines traditional keypoint-based camera pose optimization with an invertible neural rendering mechanism. Our proposed 3D scene representation, Nerfels, is locally dense yet globally sparse. As opposed to existing invertible neural rendering systems which overfit a model to the entire scene, we adopt a feature-driven approach for representing scene-agnostic,…

    Submitted 4 June, 2022; originally announced June 2022.

    Comments: Published at CVPRW with supplementary material

  16. arXiv:2204.02296

    cs.RO cs.CV

    iSDF: Real-Time Neural Signed Distance Fields for Robot Perception

    Authors: Joseph Ortiz, Alexander Clegg, Jing Dong, Edgar Sucar, David Novotny, Michael Zollhoefer, Mustafa Mukadam

    Abstract: We present iSDF, a continual learning system for real-time signed distance field (SDF) reconstruction. Given a stream of posed depth images from a moving camera, it trains a randomly initialised neural network to map input 3D coordinate to approximate signed distance. The model is self-supervised by minimising a loss that bounds the predicted signed distance using the distance to the closest sampl…

    Submitted 4 May, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: Published in Robotics: Science and Systems (RSS) 2022. Project page: https://joeaortiz.github.io/iSDF/

  17. arXiv:2109.00512

    cs.CV

    Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction

    Authors: Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, David Novotny

    Abstract: Traditional approaches for learning 3D object categories have been predominantly trained and evaluated on synthetic datasets due to the unavailability of real 3D-annotated category-centric data. Our main goal is to facilitate advances in this field by collecting real-world data in a magnitude similar to the existing synthetic counterparts. The principal contribution of this work is thus a large-sc…

    Submitted 1 September, 2021; originally announced September 2021.

    Journal ref: International Conference on Computer Vision, 2021

  18. arXiv:2109.00033

    cs.CV

    DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension

    Authors: Roman Shapovalov, David Novotny, Benjamin Graham, Patrick Labatut, Andrea Vedaldi

    Abstract: We tackle the problem of monocular 3D reconstruction of articulated objects like humans and animals. We contribute DensePose 3D, a method that can learn such reconstructions in a weakly supervised fashion from 2D image annotations only. This is in stark contrast with previous deformable reconstruction methods that use parametric models such as SMPL pre-trained on a large dataset of 3D object scans…

    Submitted 31 August, 2021; originally announced September 2021.

    Comments: Accepted for ICCV 2021

    ACM Class: I.4.5

  19. arXiv:2108.08931

    cs.CV cs.GR cs.LG

    Augmenting Implicit Neural Shape Representations with Explicit Deformation Fields

    Authors: Matan Atzmon, David Novotny, Andrea Vedaldi, Yaron Lipman

    Abstract: Implicit neural representation is a recent approach to learn shape collections as zero level-sets of neural networks, where each shape is represented by a latent code. So far, the focus has been shape reconstruction, while shape generalization was mostly left to generic encoder-decoder or auto-decoder regularization. In this paper we advocate deformation-aware regularization for implicit neural…

    Submitted 19 August, 2021; originally announced August 2021.

  20. arXiv:2106.09758

    cs.CV

    Discovering Relationships between Object Categories via Universal Canonical Maps

    Authors: Natalia Neverova, Artsiom Sanakoyeu, Patrick Labatut, David Novotny, Andrea Vedaldi

    Abstract: We tackle the problem of learning the geometry of multiple categories of deformable objects jointly. Recent work has shown that it is possible to learn a unified dense pose predictor for several categories of related objects. However, training such models requires to initialize inter-category correspondences by hand. This is suboptimal and the resulting models fail to maintain correct corresponden…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: Accepted at CVPR 2021; Project page: https://gdude.de/discovering-3d-obj-rel

  21. arXiv:2106.09431

    cs.CV

    NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go

    Authors: Marvin Eisenberger, David Novotny, Gael Kerchenbaum, Patrick Labatut, Natalia Neverova, Daniel Cremers, Andrea Vedaldi

    Abstract: We present NeuroMorph, a new neural network architecture that takes as input two 3D shapes and produces in one go, i.e. in a single feed forward pass, a smooth interpolation and point-to-point correspondences between them. The interpolation, expressed as a deformation field, changes the pose of the source shape to resemble the target, but leaves the object identity unchanged. NeuroMorph uses an el…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: Published at the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021

  22. arXiv:2103.16552

    cs.CV cs.LG

    Unsupervised Learning of 3D Object Categories from Videos in the Wild

    Authors: Philipp Henzler, Jeremy Reizenstein, Patrick Labatut, Roman Shapovalov, Tobias Ritschel, Andrea Vedaldi, David Novotny

    Abstract: Our goal is to learn a deep network that, given a small number of images of an object of a given category, reconstructs it in 3D. While several recent works have obtained analogous results using synthetic data or assuming the availability of 2D primitives such as keypoints, we are interested in working with challenging real data and with no manual annotations. We thus focus on learning a model fro…

    Submitted 30 March, 2021; originally announced March 2021.

  23. arXiv:2011.12438

    cs.CV

    Continuous Surface Embeddings

    Authors: Natalia Neverova, David Novotny, Vasil Khalidov, Marc Szafraniec, Patrick Labatut, Andrea Vedaldi

    Abstract: In this work, we focus on the task of learning and representing dense correspondences in deformable object categories. While this problem has been considered before, solutions so far have been rather ad-hoc for specific object types (i.e., humans), often with significant manual work involved. However, scaling the geometry understanding to all objects in nature requires more automated approaches th…

    Submitted 24 November, 2020; originally announced November 2020.

    Comments: NeurIPS, 2020

  24. arXiv:2011.10359

    cs.CV cs.LG eess.IV

    RidgeSfM: Structure from Motion via Robust Pairwise Matching Under Depth Uncertainty

    Authors: Benjamin Graham, David Novotny

    Abstract: We consider the problem of simultaneously estimating a dense depth map and camera pose for a large set of images of an indoor scene. While classical SfM pipelines rely on a two-step approach where cameras are first estimated using a bundle adjustment in order to ground the ensuing multi-view stereo stage, both our poses and dense reconstructions are a direct output of an altered bundle adjuster. T…

    Submitted 20 November, 2020; originally announced November 2020.

    Comments: Presenting at 3DV 2020. Source code released at https://github.com/facebookresearch/RidgeSfM

  25. arXiv:2011.00980

    cs.CV

    3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data

    Authors: Benjamin Biggs, Sébastien Ehrhardt, Hanbyul Joo, Benjamin Graham, Andrea Vedaldi, David Novotny

    Abstract: We consider the problem of obtaining dense 3D reconstructions of humans from single and partially occluded views. In such cases, the visual evidence is usually insufficient to identify a 3D reconstruction uniquely, so we aim at recovering several plausible reconstructions compatible with the input data. We suggest that ambiguities can be modelled more effectively by parametrizing the possible body…

    Submitted 2 November, 2020; originally announced November 2020.

    Comments: NeurIPS 2020 Spotlight; 14 pages including supplementary

  26. arXiv:2008.12709

    cs.CV

    Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction

    Authors: David Novotny, Roman Shapovalov, Andrea Vedaldi

    Abstract: We propose the Canonical 3D Deformer Map, a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects. Our method builds in a novel way on concepts from parametric deformation models, non-parametric 3D reconstruction, and canonical embeddings, combining their individual advantages. In particular, it learns to associate…

    Submitted 6 December, 2020; v1 submitted 28 August, 2020; originally announced August 2020.

    Comments: Published at NeurIPS 2020

    ACM Class: I.4.5; I.4.10

  27. arXiv:2007.08501

    cs.CV cs.GR cs.LG

    Accelerating 3D Deep Learning with PyTorch3D

    Authors: Nikhila Ravi, Jeremy Reizenstein, David Novotny, Taylor Gordon, Wan-Yen Lo, Justin Johnson, Georgia Gkioxari

    Abstract: Deep learning has significantly improved 2D image recognition. Extending into 3D may advance many new applications including autonomous vehicles, virtual and augmented reality, authoring 3D content, and even improving 2D recognition. However despite growing interest, 3D deep learning remains relatively underexplored. We believe that some of this disparity is due to the engineering challenges invol…

    Submitted 16 July, 2020; originally announced July 2020.

    Comments: tech report

  28. arXiv:1909.02533

    cs.CV cs.GR

    C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion

    Authors: David Novotny, Nikhila Ravi, Benjamin Graham, Natalia Neverova, Andrea Vedaldi

    Abstract: We propose C3DPO, a method for extracting 3D models of deformable objects from 2D keypoint annotations in unconstrained images. We do so by learning a deep network that reconstructs a 3D object from a single view at a time, accounting for partial occlusions, and explicitly factoring the effects of viewpoint changes and object deformations. In order to achieve this factorization, we introduce a nov…

    Submitted 15 October, 2019; v1 submitted 5 September, 2019; originally announced September 2019.

    Comments: Added a link to the source code into the abstract

    Journal ref: IEEE/CVF International Conference on Computer Vision 2019

  29. Detecting and Receiving Phase Modulated Signals with a Rydberg Atom-Based Mixer

    Authors: Christopher L. Holloway, Matthew T. Simons, Joshua A. Gordon, David Novotny

    Abstract: Recently, we introduced a Rydberg-atom based mixer capable of detecting and measuring the phase of a radio-frequency field through the electromagnetically induced transparency (EIT) and Autler-Townes (AT) effect. The ability to measure phase with this mixer allows for an atom-based receiver to detect digital modulated communication signals. In this paper, we demonstrate detection and reception of…

    Submitted 25 March, 2019; originally announced March 2019.

    Comments: 5 pages, 6 figures

  30. arXiv:1807.10712

    cs.CV

    Semi-convolutional Operators for Instance Segmentation

    Authors: David Novotny, Samuel Albanie, Diane Larlus, Andrea Vedaldi

    Abstract: Object detection and instance segmentation are dominated by region-based methods such as Mask RCNN. However, there is a growing interest in reducing these problems to pixel labeling tasks, as the latter could be more efficient, could be integrated seamlessly in image-to-image network architectures as used in many other tasks, and could be more accurate for objects that are not well approximated by…

    Submitted 27 July, 2018; originally announced July 2018.

    Comments: Accepted as a conference paper at ECCV 2018

  31. arXiv:1804.01552

    cs.CV

    Self-supervised Learning of Geometrically Stable Features Through Probabilistic Introspection

    Authors: David Novotny, Samuel Albanie, Diane Larlus, Andrea Vedaldi

    Abstract: Self-supervision can dramatically cut back the amount of manually-labelled data required to train deep neural networks. While self-supervision has usually been considered for tasks such as image classification, in this paper we aim at extending it to geometry-oriented tasks such as semantic matching and part detection. We do so by building on several recent ideas in unsupervised landmark detection…

    Submitted 4 April, 2018; originally announced April 2018.

    Comments: In 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018)

  32. arXiv:1705.03951

    cs.CV

    Learning 3D Object Categories by Looking Around Them

    Authors: David Novotny, Diane Larlus, Andrea Vedaldi

    Abstract: Traditional approaches for learning 3D object categories use either synthetic data or manual supervision. In this paper, we propose a method which does not require manual annotations and is instead cued by observing objects from a moving vantage point. Our system builds on two innovations: a Siamese viewpoint factorization network that robustly aligns different videos together without explicitly c…

    Submitted 2 December, 2021; v1 submitted 10 May, 2017; originally announced May 2017.

    Comments: Proceedings of the International Conference on Computer Vision, 2017

  33. arXiv:1704.04749

    cs.CV

    AnchorNet: A Weakly Supervised Network to Learn Geometry-sensitive Features For Semantic Matching

    Authors: David Novotny, Diane Larlus, Andrea Vedaldi

    Abstract: Despite significant progress of deep learning in recent years, state-of-the-art semantic matching methods still rely on legacy features such as SIFT or HoG. We argue that the strong invariance properties that are key to the success of recent deep architectures on the classification task make them unfit for dense correspondence tasks, unless a large amount of supervision is used. In this work, we p…

    Submitted 16 April, 2017; originally announced April 2017.

    Comments: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017

  34. arXiv:1607.01205

    cs.CV

    Learning the semantic structure of objects from Web supervision

    Authors: David Novotny, Diane Larlus, Andrea Vedaldi

    Abstract: While recent research in image understanding has often focused on recognizing more types of objects, understanding more about the objects is just as important. Recognizing object parts and attributes has been extensively studied before, yet learning large space of such concepts remains elusive due to the high cost of providing detailed object annotations for supervision. The key contribution of th…

    Submitted 2 December, 2021; v1 submitted 5 July, 2016; originally announced July 2016.

  35. arXiv:1504.07029

    cs.CV

    Cascaded Sparse Spatial Bins for Efficient and Effective Generic Object Detection

    Authors: David Novotny, Jiri Matas

    Abstract: A novel efficient method for extraction of object proposals is introduced. Its "objectness" function exploits deep spatial pyramid features, a novel fast-to-compute HoG-based edge statistic and the EdgeBoxes score. The efficiency is achieved by the use of spatial bins in a novel combination with sparsity-inducing group normalized SVM. State-of-the-art recall performance is achieved on Pascal VOC07…

    Submitted 13 October, 2015; v1 submitted 27 April, 2015; originally announced April 2015.

    Comments: Accepted to ICCV15

  36. arXiv:1504.04763

    cs.CV

    Understanding the Fisher Vector: a multimodal part model

    Authors: David Novotný, Diane Larlus, Florent Perronnin, Andrea Vedaldi

    Abstract: Fisher Vectors and related orderless visual statistics have demonstrated excellent performance in object detection, sometimes superior to established approaches such as the Deformable Part Models. However, it remains unclear how these models can capture complex appearance variations using visual codebooks of limited sizes and coarse geometric information. In this work, we propose to interpret Fish…

    Submitted 18 April, 2015; originally announced April 2015.