-
OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation
Authors:
Gonca Yilmaz,
Songyou Peng,
Francis Engelmann,
Marc Pollefeys,
Hermann Blum
Abstract:
The advent of Vision Language Models (VLMs) transformed image understanding from closed-set classifications to dynamic image-language interactions, enabling open-vocabulary segmentation. Despite this flexibility, VLMs often fall behind closed-set classifiers in accuracy due to their reliance on ambiguous image captions and lack of domain-specific knowledge. We, therefore, introduce a new task doma…
▽ More
The advent of Vision Language Models (VLMs) transformed image understanding from closed-set classifications to dynamic image-language interactions, enabling open-vocabulary segmentation. Despite this flexibility, VLMs often fall behind closed-set classifiers in accuracy due to their reliance on ambiguous image captions and lack of domain-specific knowledge. We, therefore, introduce a new task domain adaptation for open-vocabulary segmentation, enhancing VLMs with domain-specific priors while preserving their open-vocabulary nature. Existing adaptation methods, when applied to segmentation tasks, improve performance on training queries but can reduce VLM performance on zero-shot text inputs. To address this shortcoming, we propose an approach that combines parameter-efficient prompt tuning with a triplet-loss-based training strategy. This strategy is designed to enhance open-vocabulary generalization while adapting to the visual domain. Our results outperform other parameter-efficient adaptation strategies in open-vocabulary segment classification tasks across indoor and outdoor datasets. Notably, our approach is the only one that consistently surpasses the original VLM on zero-shot queries. Our adapted VLMs can be plug-and-play integrated into existing open-vocabulary segmentation pipelines, improving OV-Seg by +6.0% mIoU on ADE20K, and OpenMask3D by +4.1% AP on ScanNet++ Offices without any changes to the methods.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds
Authors:
Oliver Lemke,
Zuria Bauer,
René Zurbrügg,
Marc Pollefeys,
Francis Engelmann,
Hermann Blum
Abstract:
In recent years, modern techniques in deep learning and large-scale datasets have led to impressive progress in 3D instance segmentation, grasp pose estimation, and robotics. This allows for accurate detection directly in 3D scenes, object- and environment-aware grasp prediction, as well as robust and repeatable robotic manipulation. This work aims to integrate these recent methods into a comprehe…
▽ More
In recent years, modern techniques in deep learning and large-scale datasets have led to impressive progress in 3D instance segmentation, grasp pose estimation, and robotics. This allows for accurate detection directly in 3D scenes, object- and environment-aware grasp prediction, as well as robust and repeatable robotic manipulation. This work aims to integrate these recent methods into a comprehensive framework for robotic interaction and manipulation in human-centric environments. Specifically, we leverage 3D reconstructions from a commodity 3D scanner for open-vocabulary instance segmentation, alongside grasp pose estimation, to demonstrate dynamic picking of objects, and opening of drawers. We show the performance and robustness of our model in two sets of real-world experiments including dynamic object retrieval and drawer opening, reporting a 51% and 82% success rate respectively. Code of our framework as well as videos are available on: https://spot-compose.github.io/.
△ Less
Submitted 18 April, 2024;
originally announced April 2024.
-
OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views
Authors:
Francis Engelmann,
Fabian Manhardt,
Michael Niemeyer,
Keisuke Tateno,
Marc Pollefeys,
Federico Tombari
Abstract:
Large visual-language models (VLMs), like CLIP, enable open-set image segmentation to segment arbitrary concepts from an image in a zero-shot manner. This goes beyond the traditional closed-set assumption, i.e., where models can only segment classes from a pre-defined training set. More recently, first works on open-set segmentation in 3D scenes have appeared in the literature. These methods are h…
▽ More
Large visual-language models (VLMs), like CLIP, enable open-set image segmentation to segment arbitrary concepts from an image in a zero-shot manner. This goes beyond the traditional closed-set assumption, i.e., where models can only segment classes from a pre-defined training set. More recently, first works on open-set segmentation in 3D scenes have appeared in the literature. These methods are heavily influenced by closed-set 3D convolutional approaches that process point clouds or polygon meshes. However, these 3D scene representations do not align well with the image-based nature of the visual-language models. Indeed, point cloud and 3D meshes typically have a lower resolution than images and the reconstructed 3D scene geometry might not project well to the underlying 2D image sequences used to compute pixel-aligned CLIP features. To address these challenges, we propose OpenNeRF which naturally operates on posed images and directly encodes the VLM features within the NeRF. This is similar in spirit to LERF, however our work shows that using pixel-wise VLM features (instead of global CLIP features) results in an overall less complex architecture without the need for additional DINO regularization. Our OpenNeRF further leverages NeRF's ability to render novel views and extract open-set VLM features from areas that are not well observed in the initial posed images. For 3D point cloud segmentation on the Replica dataset, OpenNeRF outperforms recent open-vocabulary methods such as LERF and OpenScene by at least +4.9 mIoU.
△ Less
Submitted 4 April, 2024;
originally announced April 2024.
-
SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs
Authors:
Yang Miao,
Francis Engelmann,
Olga Vysotska,
Federico Tombari,
Marc Pollefeys,
Dániel Béla Baráth
Abstract:
We introduce a novel problem, i.e., the localization of an input image within a multi-modal reference map represented by a database of 3D scene graphs. These graphs comprise multiple modalities, including object-level point clouds, images, attributes, and relationships between objects, offering a lightweight and efficient alternative to conventional methods that rely on extensive image databases.…
▽ More
We introduce a novel problem, i.e., the localization of an input image within a multi-modal reference map represented by a database of 3D scene graphs. These graphs comprise multiple modalities, including object-level point clouds, images, attributes, and relationships between objects, offering a lightweight and efficient alternative to conventional methods that rely on extensive image databases. Given the available modalities, the proposed method SceneGraphLoc learns a fixed-sized embedding for each node (i.e., representing an object instance) in the scene graph, enabling effective matching with the objects visible in the input query image. This strategy significantly outperforms other cross-modal methods, even without incorporating images into the map embeddings. When images are leveraged, SceneGraphLoc achieves performance close to that of state-of-the-art techniques depending on large image databases, while requiring three orders-of-magnitude less storage and operating orders-of-magnitude faster. The code will be made public.
△ Less
Submitted 30 March, 2024;
originally announced April 2024.
-
OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding
Authors:
Francis Engelmann,
Ayca Takmaz,
Jonas Schult,
Elisabetta Fedele,
Johanna Wald,
Songyou Peng,
Xi Wang,
Or Litany,
Siyu Tang,
Federico Tombari,
Marc Pollefeys,
Leonidas Guibas,
Hongbo Tian,
Chunjie Wang,
Xiaosheng Yan,
Bingwen Wang,
Xuanyang Zhang,
Xiao Liu,
Phuc Nguyen,
Khoi Nguyen,
Anh Tran,
Cuong Pham,
Zhening Huang,
Xiaoyang Wu,
Xi Chen
, et al. (3 additional authors not shown)
Abstract:
This report provides an overview of the challenge hosted at the OpenSUN3D Workshop on Open-Vocabulary 3D Scene Understanding held in conjunction with ICCV 2023. The goal of this workshop series is to provide a platform for exploration and discussion of open-vocabulary 3D scene understanding tasks, including but not limited to segmentation, detection and map**. We provide an overview of the chall…
▽ More
This report provides an overview of the challenge hosted at the OpenSUN3D Workshop on Open-Vocabulary 3D Scene Understanding held in conjunction with ICCV 2023. The goal of this workshop series is to provide a platform for exploration and discussion of open-vocabulary 3D scene understanding tasks, including but not limited to segmentation, detection and map**. We provide an overview of the challenge hosted at the workshop, present the challenge dataset, the evaluation methodology, and brief descriptions of the winning methods. For additional details, please see https://opensun3d.github.io/index_iccv23.html.
△ Less
Submitted 17 March, 2024; v1 submitted 23 February, 2024;
originally announced February 2024.
-
ICGNet: A Unified Approach for Instance-Centric Gras**
Authors:
René Zurbrügg,
Yifan Liu,
Francis Engelmann,
Suryansh Kumar,
Marco Hutter,
Vaishakh Patil,
Fisher Yu
Abstract:
Accurate gras** is the key to several robotic tasks including assembly and household robotics. Executing a successful grasp in a cluttered environment requires multiple levels of scene understanding: First, the robot needs to analyze the geometric properties of individual objects to find feasible grasps. These grasps need to be compliant with the local object geometry. Second, for each proposed…
▽ More
Accurate gras** is the key to several robotic tasks including assembly and household robotics. Executing a successful grasp in a cluttered environment requires multiple levels of scene understanding: First, the robot needs to analyze the geometric properties of individual objects to find feasible grasps. These grasps need to be compliant with the local object geometry. Second, for each proposed grasp, the robot needs to reason about the interactions with other objects in the scene. Finally, the robot must compute a collision-free grasp trajectory while taking into account the geometry of the target object. Most grasp detection algorithms directly predict grasp poses in a monolithic fashion, which does not capture the composability of the environment. In this paper, we introduce an end-to-end architecture for object-centric gras**. The method uses pointcloud data from a single arbitrary viewing direction as an input and generates an instance-centric representation for each partially observed object in the scene. This representation is further used for object reconstruction and grasp detection in cluttered table-top scenes. We show the effectiveness of the proposed method by extensively evaluating it against state-of-the-art methods on synthetic datasets, indicating superior performance for gras** and reconstruction. Additionally, we demonstrate real-world applicability by decluttering scenes with varying numbers of objects.
△ Less
Submitted 9 May, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels
Authors:
Rui Huang,
Songyou Peng,
Ayca Takmaz,
Federico Tombari,
Marc Pollefeys,
Shiji Song,
Gao Huang,
Francis Engelmann
Abstract:
Current 3D scene segmentation methods are heavily dependent on manually annotated 3D training datasets. Such manual annotations are labor-intensive, and often lack fine-grained details. Importantly, models trained on this data typically struggle to recognize object classes beyond the annotated classes, i.e., they do not generalize well to unseen domains and require additional domain-specific annot…
▽ More
Current 3D scene segmentation methods are heavily dependent on manually annotated 3D training datasets. Such manual annotations are labor-intensive, and often lack fine-grained details. Importantly, models trained on this data typically struggle to recognize object classes beyond the annotated classes, i.e., they do not generalize well to unseen domains and require additional domain-specific annotations. In contrast, 2D foundation models demonstrate strong generalization and impressive zero-shot abilities, inspiring us to incorporate these characteristics from 2D models into 3D models. Therefore, we explore the use of image segmentation foundation models to automatically generate training labels for 3D segmentation. We propose Segment3D, a method for class-agnostic 3D scene segmentation that produces high-quality 3D segmentation masks. It improves over existing 3D segmentation models (especially on fine-grained masks), and enables easily adding new training data to further boost the segmentation performance -- all without the need for manual training labels.
△ Less
Submitted 28 December, 2023;
originally announced December 2023.
-
ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction
Authors:
Silvan Weder,
Francis Engelmann,
Johannes L. Schönberger,
Akihito Seki,
Marc Pollefeys,
Martin R. Oswald
Abstract:
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames. Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality. To overcome the inherent challenges of online methods, we make two main contributions. First, to effectively extract information from the…
▽ More
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames. Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality. To overcome the inherent challenges of online methods, we make two main contributions. First, to effectively extract information from the input RGB-D video stream, we jointly estimate geometry and semantic labels per frame in 3D. A key focus of our approach is to reason about semantic entities both in the 2D input and the local 3D domain to leverage differences in spatial context and network architectures. Our method predicts 2D features using an off-the-shelf segmentation network. The extracted 2D features are refined by a lightweight 3D network to enable reasoning about the local 3D structure. Second, to efficiently deal with an infinite stream of input RGB-D frames, a subsequent network serves as a temporal expert predicting the incremental scene updates by leveraging 2D, 3D, and past information in a learned manner. These updates are then integrated into a global scene representation. Using these main contributions, our method can enable scenarios with real-time constraints and can scale to arbitrary scene sizes by processing and updating the scene only in a local region defined by the new measurement. Our experiments demonstrate improved results compared to existing online methods that purely operate in local regions and show that complementary sources of information can boost the performance. We provide a thorough ablation study on the benefits of different architectural as well as algorithmic design decisions. Our method yields competitive results on the popular ScanNet benchmark and SceneNN dataset.
△ Less
Submitted 3 December, 2023; v1 submitted 29 November, 2023;
originally announced November 2023.
-
LABELMAKER: Automatic Semantic Label Generation from RGB-D Trajectories
Authors:
Silvan Weder,
Hermann Blum,
Francis Engelmann,
Marc Pollefeys
Abstract:
Semantic annotations are indispensable to train or evaluate perception models, yet very costly to acquire. This work introduces a fully automated 2D/3D labeling framework that, without any human intervention, can generate labels for RGB-D scans at equal (or better) level of accuracy than comparable manually annotated datasets such as ScanNet. Our approach is based on an ensemble of state-of-the-ar…
▽ More
Semantic annotations are indispensable to train or evaluate perception models, yet very costly to acquire. This work introduces a fully automated 2D/3D labeling framework that, without any human intervention, can generate labels for RGB-D scans at equal (or better) level of accuracy than comparable manually annotated datasets such as ScanNet. Our approach is based on an ensemble of state-of-the-art segmentation models and 3D lifting through neural rendering. We demonstrate the effectiveness of our LabelMaker pipeline by generating significantly better labels for the ScanNet datasets and automatically labelling the previously unlabeled ARKitScenes dataset. Code and models are available at https://labelmaker.org
△ Less
Submitted 20 November, 2023;
originally announced November 2023.
-
OpenMask3D: Open-Vocabulary 3D Instance Segmentation
Authors:
Ayça Takmaz,
Elisabetta Fedele,
Robert W. Sumner,
Marc Pollefeys,
Federico Tombari,
Francis Engelmann
Abstract:
We introduce the task of open-vocabulary 3D instance segmentation. Current approaches for 3D instance segmentation can typically only recognize object categories from a pre-defined closed set of classes that are annotated in the training datasets. This results in important limitations for real-world applications where one might need to perform tasks guided by novel, open-vocabulary queries related…
▽ More
We introduce the task of open-vocabulary 3D instance segmentation. Current approaches for 3D instance segmentation can typically only recognize object categories from a pre-defined closed set of classes that are annotated in the training datasets. This results in important limitations for real-world applications where one might need to perform tasks guided by novel, open-vocabulary queries related to a wide variety of objects. Recently, open-vocabulary 3D scene understanding methods have emerged to address this problem by learning queryable features for each point in the scene. While such a representation can be directly employed to perform semantic segmentation, existing methods cannot separate multiple object instances. In this work, we address this limitation, and propose OpenMask3D, which is a zero-shot approach for open-vocabulary 3D instance segmentation. Guided by predicted class-agnostic 3D instance masks, our model aggregates per-mask features via multi-view fusion of CLIP-based image embeddings. Experiments and ablation studies on ScanNet200 and Replica show that OpenMask3D outperforms other open-vocabulary methods, especially on the long-tail distribution. Qualitative experiments further showcase OpenMask3D's ability to segment object properties based on free-form queries describing geometry, affordances, and materials.
△ Less
Submitted 29 October, 2023; v1 submitted 23 June, 2023;
originally announced June 2023.
-
AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation
Authors:
Yuanwen Yue,
Sabarinath Mahadevan,
Jonas Schult,
Francis Engelmann,
Bastian Leibe,
Konrad Schindler,
Theodora Kontogianni
Abstract:
During interactive segmentation, a model and a user work together to delineate objects of interest in a 3D point cloud. In an iterative process, the model assigns each data point to an object (or the background), while the user corrects errors in the resulting segmentation and feeds them back into the model. The current best practice formulates the problem as binary classification and segments obj…
▽ More
During interactive segmentation, a model and a user work together to delineate objects of interest in a 3D point cloud. In an iterative process, the model assigns each data point to an object (or the background), while the user corrects errors in the resulting segmentation and feeds them back into the model. The current best practice formulates the problem as binary classification and segments objects one at a time. The model expects the user to provide positive clicks to indicate regions wrongly assigned to the background and negative clicks on regions wrongly assigned to the object. Sequentially visiting objects is wasteful since it disregards synergies between objects: a positive click for a given object can, by definition, serve as a negative click for nearby objects. Moreover, a direct competition between adjacent objects can speed up the identification of their common boundary. We introduce AGILE3D, an efficient, attention-based model that (1) supports simultaneous segmentation of multiple 3D objects, (2) yields more accurate segmentation masks with fewer user clicks, and (3) offers faster inference. Our core idea is to encode user clicks as spatial-temporal queries and enable explicit interactions between click queries as well as between them and the 3D scene through a click attention module. Every time new clicks are added, we only need to run a lightweight decoder that produces updated segmentation masks. In experiments with four different 3D point cloud datasets, AGILE3D sets a new state-of-the-art. Moreover, we also verify its practicality in real-world setups with real user studies.
△ Less
Submitted 10 April, 2024; v1 submitted 1 June, 2023;
originally announced June 2023.
-
3D Segmentation of Humans in Point Clouds with Synthetic Data
Authors:
Ayça Takmaz,
Jonas Schult,
Irem Kaftan,
Mertcan Akçay,
Bastian Leibe,
Robert Sumner,
Francis Engelmann,
Siyu Tang
Abstract:
Segmenting humans in 3D indoor scenes has become increasingly important with the rise of human-centered robotics and AR/VR applications. To this end, we propose the task of joint 3D human semantic segmentation, instance segmentation and multi-human body-part segmentation. Few works have attempted to directly segment humans in cluttered 3D scenes, which is largely due to the lack of annotated train…
▽ More
Segmenting humans in 3D indoor scenes has become increasingly important with the rise of human-centered robotics and AR/VR applications. To this end, we propose the task of joint 3D human semantic segmentation, instance segmentation and multi-human body-part segmentation. Few works have attempted to directly segment humans in cluttered 3D scenes, which is largely due to the lack of annotated training data of humans interacting with 3D scenes. We address this challenge and propose a framework for generating training data of synthetic humans interacting with real 3D scenes. Furthermore, we propose a novel transformer-based model, Human3D, which is the first end-to-end model for segmenting multiple human instances and their body-parts in a unified manner. The key advantage of our synthetic data generation framework is its ability to generate diverse and realistic human-scene interactions, with highly accurate ground truth. Our experiments show that pre-training on synthetic data improves performance on a wide variety of 3D human segmentation tasks. Finally, we demonstrate that Human3D outperforms even task-specific state-of-the-art 3D segmentation methods.
△ Less
Submitted 18 August, 2023; v1 submitted 1 December, 2022;
originally announced December 2022.
-
Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries
Authors:
Yuanwen Yue,
Theodora Kontogianni,
Konrad Schindler,
Francis Engelmann
Abstract:
We address 2D floorplan reconstruction from 3D scans. Existing approaches typically employ heuristically designed multi-stage pipelines. Instead, we formulate floorplan reconstruction as a single-stage structured prediction task: find a variable-size set of polygons, which in turn are variable-length sequences of ordered vertices. To solve it we develop a novel Transformer architecture that genera…
▽ More
We address 2D floorplan reconstruction from 3D scans. Existing approaches typically employ heuristically designed multi-stage pipelines. Instead, we formulate floorplan reconstruction as a single-stage structured prediction task: find a variable-size set of polygons, which in turn are variable-length sequences of ordered vertices. To solve it we develop a novel Transformer architecture that generates polygons of multiple rooms in parallel, in a holistic manner without hand-crafted intermediate stages. The model features two-level queries for polygons and corners, and includes polygon matching to make the network end-to-end trainable. Our method achieves a new state-of-the-art for two challenging datasets, Structured3D and SceneCAD, along with significantly faster inference than previous methods. Moreover, it can readily be extended to predict additional information, i.e., semantic room types and architectural elements like doors and windows. Our code and models are available at: https://github.com/ywyue/RoomFormer.
△ Less
Submitted 27 March, 2023; v1 submitted 28 November, 2022;
originally announced November 2022.
-
Mask3D: Mask Transformer for 3D Semantic Instance Segmentation
Authors:
Jonas Schult,
Francis Engelmann,
Alexander Hermans,
Or Litany,
Siyu Tang,
Bastian Leibe
Abstract:
Modern 3D semantic instance segmentation approaches predominantly rely on specialized voting mechanisms followed by carefully designed geometric clustering techniques. Building on the successes of recent Transformer-based methods for object detection and image segmentation, we propose the first Transformer-based approach for 3D semantic instance segmentation. We show that we can leverage generic T…
▽ More
Modern 3D semantic instance segmentation approaches predominantly rely on specialized voting mechanisms followed by carefully designed geometric clustering techniques. Building on the successes of recent Transformer-based methods for object detection and image segmentation, we propose the first Transformer-based approach for 3D semantic instance segmentation. We show that we can leverage generic Transformer building blocks to directly predict instance masks from 3D point clouds. In our model called Mask3D each object instance is represented as an instance query. Using Transformer decoders, the instance queries are learned by iteratively attending to point cloud features at multiple scales. Combined with point features, the instance queries directly yield all instance masks in parallel. Mask3D has several advantages over current state-of-the-art approaches, since it neither relies on (1) voting schemes which require hand-selected geometric properties (such as centers) nor (2) geometric grou** mechanisms requiring manually-tuned hyper-parameters (e.g. radii) and (3) enables a loss that directly optimizes instance masks. Mask3D sets a new state-of-the-art on ScanNet test (+6.2 mAP), S3DIS 6-fold (+10.1 mAP), STPLS3D (+11.2 mAP) and ScanNet200 test (+12.4 mAP).
△ Less
Submitted 12 April, 2023; v1 submitted 6 October, 2022;
originally announced October 2022.
-
4D-StOP: Panoptic Segmentation of 4D LiDAR using Spatio-temporal Object Proposal Generation and Aggregation
Authors:
Lars Kreuzberg,
Idil Esen Zulfikar,
Sabarinath Mahadevan,
Francis Engelmann,
Bastian Leibe
Abstract:
In this work, we present a new paradigm, called 4D-StOP, to tackle the task of 4D Panoptic LiDAR Segmentation. 4D-StOP first generates spatio-temporal proposals using voting-based center predictions, where each point in the 4D volume votes for a corresponding center. These tracklet proposals are further aggregated using learned geometric features. The tracklet aggregation method effectively genera…
▽ More
In this work, we present a new paradigm, called 4D-StOP, to tackle the task of 4D Panoptic LiDAR Segmentation. 4D-StOP first generates spatio-temporal proposals using voting-based center predictions, where each point in the 4D volume votes for a corresponding center. These tracklet proposals are further aggregated using learned geometric features. The tracklet aggregation method effectively generates a video-level 4D scene representation over the entire space-time volume. This is in contrast to existing end-to-end trainable state-of-the-art approaches which use spatio-temporal embeddings that are represented by Gaussian probability distributions. Our voting-based tracklet generation method followed by geometric feature-based aggregation generates significantly improved panoptic LiDAR segmentation quality when compared to modeling the entire 4D volume using Gaussian probability distributions. 4D-StOP achieves a new state-of-the-art when applied to the SemanticKITTI test dataset with a score of 63.9 LSTQ, which is a large (+7%) improvement compared to current best-performing end-to-end trainable methods. The code and pre-trained models are available at: https://github.com/LarsKreuzberg/4D-StOP.
△ Less
Submitted 29 September, 2022;
originally announced September 2022.
-
Evaluating the Future Device Security Risk Indicator for Hundreds of IoT Devices
Authors:
Pascal Oser,
Felix Engelmann,
Stefan Lüders,
Frank Kargl
Abstract:
IoT devices are present in many, especially corporate and sensitive, networks and regularly introduce security risks due to slow vendor responses to vulnerabilities and high difficulty of patching. In this paper, we want to evaluate to what extent the development of future risk of IoT devices due to new and unpatched vulnerabilities can be predicted based on historic information. For this analysis…
▽ More
IoT devices are present in many, especially corporate and sensitive, networks and regularly introduce security risks due to slow vendor responses to vulnerabilities and high difficulty of patching. In this paper, we want to evaluate to what extent the development of future risk of IoT devices due to new and unpatched vulnerabilities can be predicted based on historic information. For this analysis, we build on existing prediction algorithms available in the SAFER framework (prophet and ARIMA) which we evaluate by means of a large data-set of vulnerabilities and patches from 793 IoT devices. Our analysis shows that the SAFER framework can predict a correct future risk for 91% of the devices, demonstrating its applicability. We conclude that this approach is a reliable means for network operators to efficiently detect and act on risks emanating from IoT devices in their networks.
△ Less
Submitted 16 September, 2022; v1 submitted 8 September, 2022;
originally announced September 2022.
-
Box2Mask: Weakly Supervised 3D Semantic Instance Segmentation Using Bounding Boxes
Authors:
Julian Chibane,
Francis Engelmann,
Tuan Anh Tran,
Gerard Pons-Moll
Abstract:
Current 3D segmentation methods heavily rely on large-scale point-cloud datasets, which are notoriously laborious to annotate. Few attempts have been made to circumvent the need for dense per-point annotations. In this work, we look at weakly-supervised 3D semantic instance segmentation. The key idea is to leverage 3D bounding box labels which are easier and faster to annotate. Indeed, we show tha…
▽ More
Current 3D segmentation methods heavily rely on large-scale point-cloud datasets, which are notoriously laborious to annotate. Few attempts have been made to circumvent the need for dense per-point annotations. In this work, we look at weakly-supervised 3D semantic instance segmentation. The key idea is to leverage 3D bounding box labels which are easier and faster to annotate. Indeed, we show that it is possible to train dense segmentation models using only bounding box labels. At the core of our method, \name{}, lies a deep model, inspired by classical Hough voting, that directly votes for bounding box parameters, and a clustering method specifically tailored to bounding box votes. This goes beyond commonly used center votes, which would not fully exploit the bounding box annotations. On ScanNet test, our weakly supervised model attains leading performance among other weakly supervised approaches (+18 mAP@50). Remarkably, it also achieves 97% of the mAP@50 score of current fully supervised models. To further illustrate the practicality of our work, we train Box2Mask on the recently released ARKitScenes dataset which is annotated with 3D bounding boxes only, and show, for the first time, compelling 3D instance segmentation masks.
△ Less
Submitted 31 October, 2023; v1 submitted 2 June, 2022;
originally announced June 2022.
-
Confidential Token-Based License Management
Authors:
Felix Engelmann,
Jan Philip Speichert,
Ralf God,
Frank Kargl,
Christoph Bösch
Abstract:
In a global economy with many competitive participants, licensing and tracking of 3D printed parts is desirable if not mandatory for many use-cases. We investigate a blockchain-based approach, as blockchains provide many attractive features, like decentralized architecture and high security assurances. An often neglected aspect of the product life-cycle management is the confidentiality of transac…
▽ More
In a global economy with many competitive participants, licensing and tracking of 3D printed parts is desirable if not mandatory for many use-cases. We investigate a blockchain-based approach, as blockchains provide many attractive features, like decentralized architecture and high security assurances. An often neglected aspect of the product life-cycle management is the confidentiality of transactions to hide valuable business information from competitors. To solve the combined problem of trust and confidentiality, we present a confidential licensing and tracking system which works on any publicly verifiable, token-based blockchain that supports tokens of different types representing licenses or attributes of parts. Together with the secure integration of a unique eID in each part, our system provides an efficient, immutable and authenticated transaction log scalable to thousands of transactions per second. With our confidential Token-Based License Management system (cTLM), large industries such as automotive or aviation can license and trace all parts confidentially.
△ Less
Submitted 11 October, 2021;
originally announced October 2021.
-
Mix3D: Out-of-Context Data Augmentation for 3D Scenes
Authors:
Alexey Nekrasov,
Jonas Schult,
Or Litany,
Bastian Leibe,
Francis Engelmann
Abstract:
We present Mix3D, a data augmentation technique for segmenting large-scale 3D scenes. Since scene context helps reasoning about object semantics, current works focus on models with large capacity and receptive fields that can fully capture the global context of an input 3D scene. However, strong contextual priors can have detrimental implications like mistaking a pedestrian crossing the street for…
▽ More
We present Mix3D, a data augmentation technique for segmenting large-scale 3D scenes. Since scene context helps reasoning about object semantics, current works focus on models with large capacity and receptive fields that can fully capture the global context of an input 3D scene. However, strong contextual priors can have detrimental implications like mistaking a pedestrian crossing the street for a car. In this work, we focus on the importance of balancing global scene context and local geometry, with the goal of generalizing beyond the contextual priors in the training set. In particular, we propose a "mixing" technique which creates new training samples by combining two augmented scenes. By doing so, object instances are implicitly placed into novel out-of-context environments and therefore making it harder for models to rely on scene context alone, and instead infer semantics from local structure as well. We perform detailed analysis to understand the importance of global context, local structures and the effect of mixing scenes. In experiments, we show that models trained with Mix3D profit from a significant performance boost on indoor (ScanNet, S3DIS) and outdoor datasets (SemanticKITTI). Mix3D can be trivially used with any existing method, e.g., trained with Mix3D, MinkowskiNet outperforms all prior state-of-the-art methods by a significant margin on the ScanNet test benchmark 78.1 mIoU. Code is available at: https://nekrasov.dev/mix3d/
△ Less
Submitted 29 November, 2021; v1 submitted 5 October, 2021;
originally announced October 2021.
-
PeQES: A Platform for Privacy-enhanced Quantitative Empirical Studies
Authors:
Dominik Meißner,
Felix Engelmann,
Frank Kargl,
Benjamin Erb
Abstract:
Empirical sciences and in particular psychology suffer a methodological crisis due to the non-reproducibility of results, and in rare cases, questionable research practices. Pre-registered studies and the publication of raw data sets have emerged as effective countermeasures. However, this approach represents only a conceptual procedure and may in some cases exacerbate privacy issues associated wi…
▽ More
Empirical sciences and in particular psychology suffer a methodological crisis due to the non-reproducibility of results, and in rare cases, questionable research practices. Pre-registered studies and the publication of raw data sets have emerged as effective countermeasures. However, this approach represents only a conceptual procedure and may in some cases exacerbate privacy issues associated with data publications. We establish a novel, privacy-enhanced workflow for pre-registered studies. We also introduce PeQES, a corresponding platform that technically enforces the appropriate execution while at the same time protecting the participants' data from unauthorized use or data repurposing. Our PeQES prototype proves the overall feasibility of our privacy-enhanced workflow while introducing only a negligible performance overhead for data acquisition and data analysis of an actual study. Using trusted computing mechanisms, PeQES is the first platform to enable privacy-enhanced studies, to ensure the integrity of study protocols, and to safeguard the confidentiality of participants' data at the same time.
△ Less
Submitted 9 March, 2021;
originally announced March 2021.
-
From Points to Multi-Object 3D Reconstruction
Authors:
Francis Engelmann,
Konstantinos Rematas,
Bastian Leibe,
Vittorio Ferrari
Abstract:
We propose a method to detect and reconstruct multiple 3D objects from a single RGB image. The key idea is to optimize for detection, alignment and shape jointly over all objects in the RGB image, while focusing on realistic and physically plausible reconstructions. To this end, we propose a keypoint detector that localizes objects as center points and directly predicts all object properties, incl…
▽ More
We propose a method to detect and reconstruct multiple 3D objects from a single RGB image. The key idea is to optimize for detection, alignment and shape jointly over all objects in the RGB image, while focusing on realistic and physically plausible reconstructions. To this end, we propose a keypoint detector that localizes objects as center points and directly predicts all object properties, including 9-DoF bounding boxes and 3D shapes -- all in a single forward pass. The proposed method formulates 3D shape reconstruction as a shape selection problem, i.e. it selects among exemplar shapes from a given database. This makes it agnostic to shape representations, which enables a lightweight reconstruction of realistic and visually-pleasing shapes based on CAD-models, while the training objective is formulated around point clouds and voxel representations. A collision-loss promotes non-intersecting objects, further increasing the reconstruction realism. Given the RGB image, the presented approach performs lightweight reconstruction in a single-stage, it is real-time capable, fully differentiable and end-to-end trainable. Our experiments compare multiple approaches for 9-DoF bounding box estimation, evaluate the novel shape-selection mechanism and compare to recent methods in terms of 3D bounding box estimation and 3D shape reconstruction quality.
△ Less
Submitted 21 June, 2021; v1 submitted 21 December, 2020;
originally announced December 2020.
-
SAMP: Shape and Motion Priors for 4D Vehicle Reconstruction
Authors:
Francis Engelmann,
Jörg Stückler,
Bastian Leibe
Abstract:
Inferring the pose and shape of vehicles in 3D from a movable platform still remains a challenging task due to the projective sensing principle of cameras, difficult surface properties e.g. reflections or transparency, and illumination changes between images. In this paper, we propose to use 3D shape and motion priors to regularize the estimation of the trajectory and the shape of vehicles in sequ…
▽ More
Inferring the pose and shape of vehicles in 3D from a movable platform still remains a challenging task due to the projective sensing principle of cameras, difficult surface properties e.g. reflections or transparency, and illumination changes between images. In this paper, we propose to use 3D shape and motion priors to regularize the estimation of the trajectory and the shape of vehicles in sequences of stereo images. We represent shapes by 3D signed distance functions and embed them in a low-dimensional manifold. Our optimization method allows for imposing a common shape across all image observations along an object track. We employ a motion model to regularize the trajectory to plausible object motions. We evaluate our method on the KITTI dataset and show state-of-the-art results in terms of shape reconstruction and pose estimation accuracy.
△ Less
Submitted 2 May, 2020;
originally announced May 2020.
-
DualConvMesh-Net: Joint Geodesic and Euclidean Convolutions on 3D Meshes
Authors:
Jonas Schult,
Francis Engelmann,
Theodora Kontogianni,
Bastian Leibe
Abstract:
We propose DualConvMesh-Nets (DCM-Net) a family of deep hierarchical convolutional networks over 3D geometric data that combines two types of convolutions. The first type, geodesic convolutions, defines the kernel weights over mesh surfaces or graphs. That is, the convolutional kernel weights are mapped to the local surface of a given mesh. The second type, Euclidean convolutions, is independent o…
▽ More
We propose DualConvMesh-Nets (DCM-Net) a family of deep hierarchical convolutional networks over 3D geometric data that combines two types of convolutions. The first type, geodesic convolutions, defines the kernel weights over mesh surfaces or graphs. That is, the convolutional kernel weights are mapped to the local surface of a given mesh. The second type, Euclidean convolutions, is independent of any underlying mesh structure. The convolutional kernel is applied on a neighborhood obtained from a local affinity representation based on the Euclidean distance between 3D points. Intuitively, geodesic convolutions can easily separate objects that are spatially close but have disconnected surfaces, while Euclidean convolutions can represent interactions between nearby objects better, as they are oblivious to object surfaces. To realize a multi-resolution architecture, we borrow well-established mesh simplification methods from the geometry processing domain and adapt them to define mesh-preserving pooling and unpooling operations. We experimentally show that combining both types of convolutions in our architecture leads to significant performance gains for 3D semantic segmentation, and we report competitive results on three scene segmentation benchmarks. Our models and code are publicly available.
△ Less
Submitted 2 April, 2020;
originally announced April 2020.
-
3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation
Authors:
Francis Engelmann,
Martin Bokeloh,
Alireza Fathi,
Bastian Leibe,
Matthias Nießner
Abstract:
We present 3D-MPA, a method for instance segmentation on 3D point clouds. Given an input point cloud, we propose an object-centric approach where each point votes for its object center. We sample object proposals from the predicted object centers. Then, we learn proposal features from grouped point features that voted for the same object center. A graph convolutional network introduces inter-propo…
▽ More
We present 3D-MPA, a method for instance segmentation on 3D point clouds. Given an input point cloud, we propose an object-centric approach where each point votes for its object center. We sample object proposals from the predicted object centers. Then, we learn proposal features from grouped point features that voted for the same object center. A graph convolutional network introduces inter-proposal relations, providing higher-level feature learning in addition to the lower-level point features. Each proposal comprises a semantic label, a set of associated points over which we define a foreground-background mask, an objectness score and aggregation features. Previous works usually perform non-maximum-suppression (NMS) over proposals to obtain the final object detections or semantic instances. However, NMS can discard potentially correct predictions. Instead, our approach keeps all proposals and groups them together based on the learned aggregation features. We show that grou** proposals improves over NMS and outperforms previous state-of-the-art methods on the tasks of 3D object detection and semantic instance segmentation on the ScanNetV2 benchmark and the S3DIS dataset.
△ Less
Submitted 30 March, 2020;
originally announced March 2020.
-
Dilated Point Convolutions: On the Receptive Field Size of Point Convolutions on 3D Point Clouds
Authors:
Francis Engelmann,
Theodora Kontogianni,
Bastian Leibe
Abstract:
In this work, we propose Dilated Point Convolutions (DPC). In a thorough ablation study, we show that the receptive field size is directly related to the performance of 3D point cloud processing tasks, including semantic segmentation and object classification. Point convolutions are widely used to efficiently process 3D data representations such as point clouds or graphs. However, we observe that…
▽ More
In this work, we propose Dilated Point Convolutions (DPC). In a thorough ablation study, we show that the receptive field size is directly related to the performance of 3D point cloud processing tasks, including semantic segmentation and object classification. Point convolutions are widely used to efficiently process 3D data representations such as point clouds or graphs. However, we observe that the receptive field size of recent point convolutional networks is inherently limited. Our dilated point convolutions alleviate this issue, they significantly increase the receptive field size of point convolutions. Importantly, our dilation mechanism can easily be integrated into most existing point convolutional networks. To evaluate the resulting network architectures, we visualize the receptive field and report competitive scores on popular point cloud benchmarks.
△ Less
Submitted 23 May, 2020; v1 submitted 28 July, 2019;
originally announced July 2019.
-
3D-BEVIS: Bird's-Eye-View Instance Segmentation
Authors:
Cathrin Elich,
Francis Engelmann,
Theodora Kontogianni,
Bastian Leibe
Abstract:
Recent deep learning models achieve impressive results on 3D scene analysis tasks by operating directly on unstructured point clouds. A lot of progress was made in the field of object classification and semantic segmentation. However, the task of instance segmentation is less explored. In this work, we present 3D-BEVIS, a deep learning framework for 3D semantic instance segmentation on point cloud…
▽ More
Recent deep learning models achieve impressive results on 3D scene analysis tasks by operating directly on unstructured point clouds. A lot of progress was made in the field of object classification and semantic segmentation. However, the task of instance segmentation is less explored. In this work, we present 3D-BEVIS, a deep learning framework for 3D semantic instance segmentation on point clouds. Following the idea of previous proposal-free instance segmentation approaches, our model learns a feature embedding and groups the obtained feature space into semantic instances. Current point-based methods scale linearly with the number of points by processing local sub-parts of a scene individually. However, to perform instance segmentation by clustering, globally consistent features are required. Therefore, we propose to combine local point geometry with global context information from an intermediate bird's-eye view representation.
△ Less
Submitted 1 August, 2019; v1 submitted 3 April, 2019;
originally announced April 2019.
-
Know What Your Neighbors Do: 3D Semantic Segmentation of Point Clouds
Authors:
Francis Engelmann,
Theodora Kontogianni,
Jonas Schult,
Bastian Leibe
Abstract:
In this paper, we present a deep learning architecture which addresses the problem of 3D semantic segmentation of unstructured point clouds. Compared to previous work, we introduce grou** techniques which define point neighborhoods in the initial world space and the learned feature space. Neighborhoods are important as they allow to compute local or global point features depending on the spatial…
▽ More
In this paper, we present a deep learning architecture which addresses the problem of 3D semantic segmentation of unstructured point clouds. Compared to previous work, we introduce grou** techniques which define point neighborhoods in the initial world space and the learned feature space. Neighborhoods are important as they allow to compute local or global point features depending on the spatial extend of the neighborhood. Additionally, we incorporate dedicated loss functions to further structure the learned point feature space: the pairwise distance loss and the centroid loss. We show how to apply these mechanisms to the task of 3D semantic segmentation of point clouds and report state-of-the-art performance on indoor and outdoor datasets.
△ Less
Submitted 8 December, 2018; v1 submitted 2 October, 2018;
originally announced October 2018.
-
Coloured Ring Confidential Transactions
Authors:
Felix Engelmann,
Frank Kargl,
Christoph Bösch
Abstract:
Privacy in block-chains is considered second to functionality, but a vital requirement for many new applications, e.g., in the industrial environment. We propose a novel transaction type, which enables privacy preserving trading of independent assets on a common block-chain. This is achieved by extending the ring confidential transaction with an additional commitment to a colour and a publicly ver…
▽ More
Privacy in block-chains is considered second to functionality, but a vital requirement for many new applications, e.g., in the industrial environment. We propose a novel transaction type, which enables privacy preserving trading of independent assets on a common block-chain. This is achieved by extending the ring confidential transaction with an additional commitment to a colour and a publicly verifiable proof of conservation. With our coloured confidential ring signatures, new token types can be introduced and transferred by any participant using the same sized anonymity set as single-token privacy aware block-chains. Thereby, our system facilitates tracking assets on an immutable ledger without compromising the confidentiality of transactions.
△ Less
Submitted 27 July, 2018;
originally announced July 2018.
-
Exploring Spatial Context for 3D Semantic Segmentation of Point Clouds
Authors:
Francis Engelmann,
Theodora Kontogianni,
Alexander Hermans,
Bastian Leibe
Abstract:
Deep learning approaches have made tremendous progress in the field of semantic segmentation over the past few years. However, most current approaches operate in the 2D image space. Direct semantic segmentation of unstructured 3D point clouds is still an open research problem. The recently proposed PointNet architecture presents an interesting step ahead in that it can operate on unstructured poin…
▽ More
Deep learning approaches have made tremendous progress in the field of semantic segmentation over the past few years. However, most current approaches operate in the 2D image space. Direct semantic segmentation of unstructured 3D point clouds is still an open research problem. The recently proposed PointNet architecture presents an interesting step ahead in that it can operate on unstructured point clouds, achieving encouraging segmentation results. However, it subdivides the input points into a grid of blocks and processes each such block individually. In this paper, we investigate the question how such an architecture can be extended to incorporate larger-scale spatial context. We build upon PointNet and propose two extensions that enlarge the receptive field over the 3D scene. We evaluate the proposed strategies on challenging indoor and outdoor datasets and show improved results in both scenarios.
△ Less
Submitted 18 December, 2019; v1 submitted 5 February, 2018;
originally announced February 2018.
-
Towards an Economic Analysis of Routing in Payment Channel Networks
Authors:
Felix Engelmann,
Florian Glaser,
Henning Kopp,
Frank Kargl,
Christof Weinhardt
Abstract:
Payment channel networks are supposed to overcome technical scalability limitations of blockchain infrastructure by employing a special overlay network with fast payment confirmation and only sporadic settlement of netted transactions on the blockchain. However, they introduce economic routing constraints that limit decentralized scalability and are currently not well understood. In this paper, we…
▽ More
Payment channel networks are supposed to overcome technical scalability limitations of blockchain infrastructure by employing a special overlay network with fast payment confirmation and only sporadic settlement of netted transactions on the blockchain. However, they introduce economic routing constraints that limit decentralized scalability and are currently not well understood. In this paper, we model the economic incentives for participants in payment channel networks. We provide the first formal model of payment channel economics and analyze how the cheapest path can be found. Additionally, our simulation assesses the long-term evolution of a payment channel network. We find that even for small routing fees, sometimes it is cheaper to settle the transaction directly on the blockchain.
△ Less
Submitted 7 November, 2017;
originally announced November 2017.
-
Blackchain: Scalability for Resource-Constrained Accountable Vehicle-to-X Communication
Authors:
Rens Wouter van der Heijden,
Felix Engelmann,
David Mödinger,
Franziska Schönig,
Frank Kargl
Abstract:
In this paper, we propose a new Blockchain-based message and revocation accountability system called Blackchain. Combining a distributed ledger with existing mechanisms for security in V2X communication systems, we design a distributed event data recorder (EDR) that satisfies traditional accountability requirements by providing a compressed global state. Unlike previous approaches, our distributed…
▽ More
In this paper, we propose a new Blockchain-based message and revocation accountability system called Blackchain. Combining a distributed ledger with existing mechanisms for security in V2X communication systems, we design a distributed event data recorder (EDR) that satisfies traditional accountability requirements by providing a compressed global state. Unlike previous approaches, our distributed ledger solution provides an accountable revocation mechanism without requiring trust in a single misbehavior authority, instead allowing a collaborative and transparent decision making process through Blackchain. This makes Blackchain an attractive alternative to existing solutions for revocation in a Security Credential Management System (SCMS), which suffer from the traditional disadvantages of PKIs, notably including centralized trust. Our proposal becomes scalable through the use of hierarchical consensus: individual vehicles dynamically create clusters, which then provide their consensus decisions as input for road-side units (RSUs), which in turn publish their results to misbehavior authorities. This authority, which is traditionally a single entity in the SCMS, responsible for the integrity of the entire V2X network, is now a set of authorities that transparently perform a revocation, whose result is then published in a global Blackchain state. This state can be used to prevent the issuance of certificates to previously malicious users, and also prevents the authority from misbehaving through the transparency implied by a global system state.
△ Less
Submitted 24 October, 2017;
originally announced October 2017.
-
A computational investigation of sources of variability in sentence comprehension difficulty in aphasia
Authors:
Paul Mätzig,
Shravan Vasishth,
Felix Engelmann,
David Caplan
Abstract:
We present a computational evaluation of three hypotheses about sources of deficit in sentence comprehension in aphasia: slowed processing, intermittent deficiency, and resource reduction. The ACT-R based Lewis and Vasishth (2005) model is used to implement these three proposals. Slowed processing is implemented as slowed default production-rule firing time; intermittent deficiency as increased ra…
▽ More
We present a computational evaluation of three hypotheses about sources of deficit in sentence comprehension in aphasia: slowed processing, intermittent deficiency, and resource reduction. The ACT-R based Lewis and Vasishth (2005) model is used to implement these three proposals. Slowed processing is implemented as slowed default production-rule firing time; intermittent deficiency as increased random noise in activation of chunks in memory; and resource reduction as reduced goal activation. As data, we considered subject vs. object rela- tives whose matrix clause contained either an NP or a reflexive, presented in a self-paced listening modality to 56 individuals with aphasia (IWA) and 46 matched controls. The participants heard the sentences and carried out a picture verification task to decide on an interpretation of the sentence. These response accuracies are used to identify the best parameters (for each participant) that correspond to the three hypotheses mentioned above. We show that controls have more tightly clustered (less variable) parameter values than IWA; specifically, compared to controls, among IWA there are more individuals with low goal activations, high noise, and slow default action times. This suggests that (i) individual patients show differential amounts of deficit along the three dimensions of slowed processing, intermittent deficient, and resource reduction, (ii) overall, there is evidence for all three sources of deficit playing a role, and (iii) IWA have a more variable range of parameter values than controls. In sum, this study contributes a proof of concept of a quantitative implementation of, and evidence for, these three accounts of comprehension deficits in aphasia.
△ Less
Submitted 31 May, 2017; v1 submitted 14 March, 2017;
originally announced March 2017.
-
Keyframe-Based Visual-Inertial Online SLAM with Relocalization
Authors:
Anton Kasyanov,
Francis Engelmann,
Jörg Stückler,
Bastian Leibe
Abstract:
Complementing images with inertial measurements has become one of the most popular approaches to achieve highly accurate and robust real-time camera pose tracking. In this paper, we present a keyframe-based approach to visual-inertial simultaneous localization and map** (SLAM) for monocular and stereo cameras. Our visual-inertial SLAM system is based on a real-time capable visual-inertial odomet…
▽ More
Complementing images with inertial measurements has become one of the most popular approaches to achieve highly accurate and robust real-time camera pose tracking. In this paper, we present a keyframe-based approach to visual-inertial simultaneous localization and map** (SLAM) for monocular and stereo cameras. Our visual-inertial SLAM system is based on a real-time capable visual-inertial odometry method that provides locally consistent trajectory and map estimates. We achieve global consistency in the estimate through online loop-closing and non-linear optimization. Furthermore, our system supports relocalization in a map that has been previously obtained and allows for continued SLAM operation. We evaluate our approach in terms of accuracy, relocalization capability and run-time efficiency on public indoor benchmark datasets and on newly recorded outdoor sequences. We demonstrate state-of-the-art performance of our system compared to a visual-inertial odometry method and baseline visual SLAM approaches in recovering the trajectory of the camera.
△ Less
Submitted 2 March, 2017; v1 submitted 7 February, 2017;
originally announced February 2017.