-
Novel View Synthesis with Neural Radiance Fields for Industrial Robot Applications
Authors:
Markus Hillemann,
Robert Langendörfer,
Max Heiken,
Max Mehltretter,
Andreas Schenk,
Martin Weinmann,
Stefan Hinz,
Christian Heipke,
Markus Ulrich
Abstract:
Neural Radiance Fields (NeRFs) have become a rapidly growing research field with the potential to revolutionize typical photogrammetric workflows, such as those used for 3D scene reconstruction. As input, NeRFs require multi-view images with corresponding camera poses as well as the interior orientation. In the typical NeRF workflow, the camera poses and the interior orientation are estimated in a…
▽ More
Neural Radiance Fields (NeRFs) have become a rapidly growing research field with the potential to revolutionize typical photogrammetric workflows, such as those used for 3D scene reconstruction. As input, NeRFs require multi-view images with corresponding camera poses as well as the interior orientation. In the typical NeRF workflow, the camera poses and the interior orientation are estimated in advance with Structure from Motion (SfM). But the quality of the resulting novel views, which depends on different parameters such as the number and distribution of available images, as well as the accuracy of the related camera poses and interior orientation, is difficult to predict. In addition, SfM is a time-consuming pre-processing step, and its quality strongly depends on the image content. Furthermore, the undefined scaling factor of SfM hinders subsequent steps in which metric information is required. In this paper, we evaluate the potential of NeRFs for industrial robot applications. We propose an alternative to SfM pre-processing: we capture the input images with a calibrated camera that is attached to the end effector of an industrial robot and determine accurate camera poses with metric scale based on the robot kinematics. We then investigate the quality of the novel views by comparing them to ground truth, and by computing an internal quality measure based on ensemble methods. For evaluation purposes, we acquire multiple datasets that pose challenges for reconstruction typical of industrial applications, like reflective objects, poor texture, and fine structures. We show that the robot-based pose determination reaches similar accuracy as SfM in non-demanding cases, while having clear advantages in more challenging scenarios. Finally, we present first results of applying the ensemble method to estimate the quality of the synthetic novel view in the absence of a ground truth.
△ Less
Submitted 7 May, 2024;
originally announced May 2024.
-
RANRAC: Robust Neural Scene Representations via Random Ray Consensus
Authors:
Benno Buschmann,
Andreea Dogaru,
Elmar Eisemann,
Michael Weinmann,
Bernhard Egger
Abstract:
Learning-based scene representations such as neural radiance fields or light field networks, that rely on fitting a scene model to image observations, commonly encounter challenges in the presence of inconsistencies within the images caused by occlusions, inaccurately estimated camera parameters or effects like lens flare. To address this challenge, we introduce RANdom RAy Consensus (RANRAC), an e…
▽ More
Learning-based scene representations such as neural radiance fields or light field networks, that rely on fitting a scene model to image observations, commonly encounter challenges in the presence of inconsistencies within the images caused by occlusions, inaccurately estimated camera parameters or effects like lens flare. To address this challenge, we introduce RANdom RAy Consensus (RANRAC), an efficient approach to eliminate the effect of inconsistent data, thereby taking inspiration from classical RANSAC based outlier detection for model fitting. In contrast to the down-weighting of the effect of outliers based on robust loss formulations, our approach reliably detects and excludes inconsistent perspectives, resulting in clean images without floating artifacts. For this purpose, we formulate a fuzzy adaption of the RANSAC paradigm, enabling its application to large scale models. We interpret the minimal number of samples to determine the model parameters as a tunable hyperparameter, investigate the generation of hypotheses with data-driven models, and analyze the validation of hypotheses in noisy environments. We demonstrate the compatibility and potential of our solution for both photo-realistic robust multi-view reconstruction from real-world images based on neural radiance fields and for single-shot reconstruction based on light-field networks. In particular, the results indicate significant improvements compared to state-of-the-art robust methods for novel-view synthesis on both synthetic and captured scenes with various inconsistencies including occlusions, noisy camera pose estimates, and unfocused perspectives. The results further indicate significant improvements for single-shot reconstruction from occluded images. Project Page: https://bennobuschmann.com/ranrac/
△ Less
Submitted 19 April, 2024; v1 submitted 15 December, 2023;
originally announced December 2023.
-
The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024
Authors:
Benjamin Kiefer,
Lojze Žust,
Matej Kristan,
Janez Perš,
Matija Teršek,
Arnold Wiliem,
Martin Messmer,
Cheng-Yen Yang,
Hsiang-Wei Huang,
Zhongyu Jiang,
Heng-Cheng Kuo,
Jie Mei,
Jenq-Neng Hwang,
Daniel Stadler,
Lars Sommer,
Kaer Huang,
Aiguo Zheng,
Weitu Chong,
Kanokphan Lertniphonphan,
Jun Xie,
Feng Chen,
Jian Li,
Zhepeng Wang,
Luca Zedda,
Andrea Loddo
, et al. (24 additional authors not shown)
Abstract:
The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenges categories are considered: (i) UAV-based Maritime Object Tracking with Re-identification, (ii) USV-based Maritime Obstacle Segmentation and Detection, (iii) USV-based Maritime Boat Tracking. The USV-based Maritime Obst…
▽ More
The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenges categories are considered: (i) UAV-based Maritime Object Tracking with Re-identification, (ii) USV-based Maritime Obstacle Segmentation and Detection, (iii) USV-based Maritime Boat Tracking. The USV-based Maritime Obstacle Segmentation and Detection features three sub-challenges, including a new embedded challenge addressing efficicent inference on real-world embedded devices. This report offers a comprehensive overview of the findings from the challenges. We provide both statistical and qualitative analyses, evaluating trends from over 195 submissions. All datasets, evaluation code, and the leaderboard are available to the public at https://macvi.org/workshop/macvi24.
△ Less
Submitted 23 November, 2023;
originally announced November 2023.
-
Neural inverse procedural modeling of knitting yarns from images
Authors:
Elena Trunz,
Jonathan Klein,
Jan Müller,
Lukas Bode,
Ralf Sarlette,
Michael Weinmann,
Reinhard Klein
Abstract:
We investigate the capabilities of neural inverse procedural modeling to infer high-quality procedural yarn models with fiber-level details from single images of depicted yarn samples. While directly inferring all parameters of the underlying yarn model based on a single neural network may seem an intuitive choice, we show that the complexity of yarn structures in terms of twisting and migration c…
▽ More
We investigate the capabilities of neural inverse procedural modeling to infer high-quality procedural yarn models with fiber-level details from single images of depicted yarn samples. While directly inferring all parameters of the underlying yarn model based on a single neural network may seem an intuitive choice, we show that the complexity of yarn structures in terms of twisting and migration characteristics of the involved fibers can be better encountered in terms of ensembles of networks that focus on individual characteristics. We analyze the effect of different loss functions including a parameter loss to penalize the deviation of inferred parameters to ground truth annotations, a reconstruction loss to enforce similar statistics of the image generated for the estimated parameters in comparison to training images as well as an additional regularization term to explicitly penalize deviations between latent codes of synthetic images and the average latent code of real images in the latent space of the encoder. We demonstrate that the combination of a carefully designed parametric, procedural yarn model with respective network ensembles as well as loss functions even allows robust parameter inference when solely trained on synthetic data. Since our approach relies on the availability of a yarn database with parameter annotations and we are not aware of such a respectively available dataset, we additionally provide, to the best of our knowledge, the first dataset of yarn images with annotations regarding the respective yarn parameters. For this purpose, we use a novel yarn generator that improves the realism of the produced results over previous approaches.
△ Less
Submitted 28 February, 2023;
originally announced March 2023.
-
Efficient 3D Reconstruction, Streaming and Visualization of Static and Dynamic Scene Parts for Multi-client Live-telepresence in Large-scale Environments
Authors:
Leif Van Holland,
Patrick Stotko,
Stefan Krumpen,
Reinhard Klein,
Michael Weinmann
Abstract:
Despite the impressive progress of telepresence systems for room-scale scenes with static and dynamic scene entities, expanding their capabilities to scenarios with larger dynamic environments beyond a fixed size of a few square-meters remains challenging.
In this paper, we aim at sharing 3D live-telepresence experiences in large-scale environments beyond room scale with both static and dynamic…
▽ More
Despite the impressive progress of telepresence systems for room-scale scenes with static and dynamic scene entities, expanding their capabilities to scenarios with larger dynamic environments beyond a fixed size of a few square-meters remains challenging.
In this paper, we aim at sharing 3D live-telepresence experiences in large-scale environments beyond room scale with both static and dynamic scene entities at practical bandwidth requirements only based on light-weight scene capture with a single moving consumer-grade RGB-D camera. To this end, we present a system which is built upon a novel hybrid volumetric scene representation in terms of the combination of a voxel-based scene representation for the static contents, that not only stores the reconstructed surface geometry but also contains information about the object semantics as well as their accumulated dynamic movement over time, and a point-cloud-based representation for dynamic scene parts, where the respective separation from static parts is achieved based on semantic and instance information extracted for the input frames. With an independent yet simultaneous streaming of both static and dynamic content, where we seamlessly integrate potentially moving but currently static scene entities in the static model until they are becoming dynamic again, as well as the fusion of static and dynamic data at the remote client, our system is able to achieve VR-based live-telepresence at close to real-time rates. Our evaluation demonstrates the potential of our novel approach in terms of visual quality, performance, and ablation studies regarding involved design choices.
△ Less
Submitted 13 February, 2024; v1 submitted 25 November, 2022;
originally announced November 2022.
-
BoundED: Neural Boundary and Edge Detection in 3D Point Clouds via Local Neighborhood Statistics
Authors:
Lukas Bode,
Michael Weinmann,
Reinhard Klein
Abstract:
Extracting high-level structural information from 3D point clouds is challenging but essential for tasks like urban planning or autonomous driving requiring an advanced understanding of the scene at hand. Existing approaches are still not able to produce high-quality results consistently while being fast enough to be deployed in scenarios requiring interactivity. We propose to utilize a novel set…
▽ More
Extracting high-level structural information from 3D point clouds is challenging but essential for tasks like urban planning or autonomous driving requiring an advanced understanding of the scene at hand. Existing approaches are still not able to produce high-quality results consistently while being fast enough to be deployed in scenarios requiring interactivity. We propose to utilize a novel set of features describing the local neighborhood on a per-point basis via first and second order statistics as input for a simple and compact classification network to distinguish between non-edge, sharp-edge, and boundary points in the given data. Leveraging this feature embedding enables our algorithm to outperform the state-of-the-art techniques in terms of quality and processing time.
△ Less
Submitted 24 October, 2022;
originally announced October 2022.
-
Beyond single receptive field: A receptive field fusion-and-stratification network for airborne laser scanning point cloud classification
Authors:
Yongqiang Mao,
Kaiqiang Chen,
Wenhui Diao,
Xian Sun,
Xiaonan Lu,
Kun Fu,
Martin Weinmann
Abstract:
The classification of airborne laser scanning (ALS) point clouds is a critical task of remote sensing and photogrammetry fields. Although recent deep learning-based methods have achieved satisfactory performance, they have ignored the unicity of the receptive field, which makes the ALS point cloud classification remain challenging for the distinguishment of the areas with complex structures and ex…
▽ More
The classification of airborne laser scanning (ALS) point clouds is a critical task of remote sensing and photogrammetry fields. Although recent deep learning-based methods have achieved satisfactory performance, they have ignored the unicity of the receptive field, which makes the ALS point cloud classification remain challenging for the distinguishment of the areas with complex structures and extreme scale variations. In this article, for the objective of configuring multi-receptive field features, we propose a novel receptive field fusion-and-stratification network (RFFS-Net). With a novel dilated graph convolution (DGConv) and its extension annular dilated convolution (ADConv) as basic building blocks, the receptive field fusion process is implemented with the dilated and annular graph fusion (DAGFusion) module, which obtains multi-receptive field feature representation through capturing dilated and annular graphs with various receptive regions. The stratification of the receptive fields with point sets of different resolutions as the calculation bases is performed with Multi-level Decoders nested in RFFS-Net and driven by the multi-level receptive field aggregation loss (MRFALoss) to drive the network to learn in the direction of the supervision labels with different resolutions. With receptive field fusion-and-stratification, RFFS-Net is more adaptable to the classification of regions with complex structures and extreme scale variations in large-scale ALS point clouds. Evaluated on the ISPRS Vaihingen 3D dataset, our RFFS-Net significantly outperforms the baseline approach by 5.3% on mF1 and 5.4% on mIoU, accomplishing an overall accuracy of 82.1%, an mF1 of 71.6%, and an mIoU of 58.2%. Furthermore, experiments on the LASDU dataset and the 2019 IEEE-GRSS Data Fusion Contest dataset show that RFFS-Net achieves a new state-of-the-art classification performance.
△ Less
Submitted 20 July, 2022;
originally announced July 2022.
-
Incomplete Gamma Kernels: Generalizing Locally Optimal Projection Operators
Authors:
Patrick Stotko,
Michael Weinmann,
Reinhard Klein
Abstract:
We present incomplete gamma kernels, a generalization of Locally Optimal Projection (LOP) operators. In particular, we reveal the relation of the classical localized $ L_1 $ estimator, used in the LOP operator for surface reconstruction from noisy point clouds, to the common Mean Shift framework via a novel kernel. Furthermore, we generalize this result to a whole family of kernels that are built…
▽ More
We present incomplete gamma kernels, a generalization of Locally Optimal Projection (LOP) operators. In particular, we reveal the relation of the classical localized $ L_1 $ estimator, used in the LOP operator for surface reconstruction from noisy point clouds, to the common Mean Shift framework via a novel kernel. Furthermore, we generalize this result to a whole family of kernels that are built upon the incomplete gamma function and each represents a localized $ L_p $ estimator. By deriving various properties of the kernel family concerning distributional, Mean Shift induced, and other aspects such as strict positive definiteness, we obtain a deeper understanding of the operator's projection behavior. From these theoretical insights, we illustrate several applications ranging from an improved Weighted LOP (WLOP) density weighting scheme and a more accurate Continuous LOP (CLOP) kernel approximation to the definition of a novel set of robust loss functions. These incomplete gamma losses include the Gaussian and LOP loss as special cases and can be applied for reconstruction tasks such as normal filtering. We demonstrate the effects of each application in a range of quantitative and qualitative experiments that highlight the benefits induced by our modifications.
△ Less
Submitted 2 May, 2022;
originally announced May 2022.
-
Occlusion Fields: An Implicit Representation for Non-Line-of-Sight Surface Reconstruction
Authors:
Javier Grau,
Markus Plack,
Patrick Haehn,
Michael Weinmann,
Matthias Hullin
Abstract:
Non-line-of-sight reconstruction (NLoS) is a novel indirect imaging modality that aims to recover objects or scene parts outside the field of view from measurements of light that is indirectly scattered off a directly visible, diffuse wall. Despite recent advances in acquisition and reconstruction techniques, the well-posedness of the problem at large, and the recoverability of objects and their s…
▽ More
Non-line-of-sight reconstruction (NLoS) is a novel indirect imaging modality that aims to recover objects or scene parts outside the field of view from measurements of light that is indirectly scattered off a directly visible, diffuse wall. Despite recent advances in acquisition and reconstruction techniques, the well-posedness of the problem at large, and the recoverability of objects and their shapes in particular, remains an open question. The commonly employed Fermat path criterion is rather conservative with this regard, as it classifies some surfaces as unrecoverable, although they contribute to the signal.
In this paper, we use a simpler necessary criterion for an opaque surface patch to be recoverable. Such piece of surface must be directly visible from some point on the wall, and it must occlude the space behind itself. Inspired by recent advances in neural implicit representations, we devise a new representation and reconstruction technique for NLoS scenes that unifies the treatment of recoverability with the reconstruction itself. Our approach, which we validate on various synthetic and experimental datasets, exhibits interesting properties. Unlike memory-inefficient volumetric representations, ours allows to infer adaptively tessellated surfaces from time-of-flight measurements of moderate resolution. It can further recover features beyond the Fermat path criterion, and it is robust to significant amounts of self-occlusion. We believe that this is the first time that these properties have been achieved in one system that, as an additional benefit, is trainable and hence suited for data-driven approaches.
△ Less
Submitted 22 March, 2022; v1 submitted 16 March, 2022;
originally announced March 2022.
-
FaSS-MVS -- Fast Multi-View Stereo with Surface-Aware Semi-Global Matching from UAV-borne Monocular Imagery
Authors:
Boitumelo Ruf,
Martin Weinmann,
Stefan Hinz
Abstract:
With FaSS-MVS, we present an approach for fast multi-view stereo with surface-aware Semi-Global Matching that allows for rapid depth and normal map estimation from monocular aerial video data captured by UAVs. The data estimated by FaSS-MVS, in turn, facilitates online 3D map**, meaning that a 3D map of the scene is immediately and incrementally generated while the image data is acquired or bein…
▽ More
With FaSS-MVS, we present an approach for fast multi-view stereo with surface-aware Semi-Global Matching that allows for rapid depth and normal map estimation from monocular aerial video data captured by UAVs. The data estimated by FaSS-MVS, in turn, facilitates online 3D map**, meaning that a 3D map of the scene is immediately and incrementally generated while the image data is acquired or being received. FaSS-MVS is comprised of a hierarchical processing scheme in which depth and normal data, as well as corresponding confidence scores, are estimated in a coarse-to-fine manner, allowing to efficiently process large scene depths which are inherent to oblique imagery captured by low-flying UAVs. The actual depth estimation employs a plane-sweep algorithm for dense multi-image matching to produce depth hypotheses from which the actual depth map is extracted by means of a surface-aware semi-global optimization, reducing the fronto-parallel bias of SGM. Given the estimated depth map, the pixel-wise surface normal information is then computed by reprojecting the depth map into a point cloud and calculating the normal vectors within a confined local neighborhood. In a thorough quantitative and ablative study we show that the accuracies of the 3D information calculated by FaSS-MVS is close to that of state-of-the-art approaches for offline multi-view stereo, with the error not even being one magnitude higher than that of COLMAP. At the same time, however, the average run-time of FaSS-MVS to estimate a single depth and normal map is less than 14 % of that of COLMAP, allowing to perform an online and incremental processing of Full-HD imagery at 1-2 Hz.
△ Less
Submitted 1 December, 2021;
originally announced December 2021.
-
Spline-PINN: Approaching PDEs without Data using Fast, Physics-Informed Hermite-Spline CNNs
Authors:
Nils Wandel,
Michael Weinmann,
Michael Neidlin,
Reinhard Klein
Abstract:
Partial Differential Equations (PDEs) are notoriously difficult to solve. In general, closed-form solutions are not available and numerical approximation schemes are computationally expensive. In this paper, we propose to approach the solution of PDEs based on a novel technique that combines the advantages of two recently emerging machine learning based approaches. First, physics-informed neural n…
▽ More
Partial Differential Equations (PDEs) are notoriously difficult to solve. In general, closed-form solutions are not available and numerical approximation schemes are computationally expensive. In this paper, we propose to approach the solution of PDEs based on a novel technique that combines the advantages of two recently emerging machine learning based approaches. First, physics-informed neural networks (PINNs) learn continuous solutions of PDEs and can be trained with little to no ground truth data. However, PINNs do not generalize well to unseen domains. Second, convolutional neural networks provide fast inference and generalize but either require large amounts of training data or a physics-constrained loss based on finite differences that can lead to inaccuracies and discretization artifacts. We leverage the advantages of both of these approaches by using Hermite spline kernels in order to continuously interpolate a grid-based state representation that can be handled by a CNN. This allows for training without any precomputed training data using a physics-informed loss function only and provides fast, continuous solutions that generalize to unseen domains. We demonstrate the potential of our method at the examples of the incompressible Navier-Stokes equation and the damped wave equation. Our models are able to learn several intriguing phenomena such as Karman vortex streets, the Magnus effect, Doppler effect, interference patterns and wave reflections. Our quantitative assessment and an interactive real-time demo show that we are narrowing the gap in accuracy of unsupervised ML based methods to industrial CFD solvers while being orders of magnitude faster.
△ Less
Submitted 22 March, 2022; v1 submitted 15 September, 2021;
originally announced September 2021.
-
Pose Normalization of Indoor Map** Datasets Partially Compliant to the Manhattan World Assumption
Authors:
Patrick Hübner,
Martin Weinmann,
Sven Wursthorn,
Stefan Hinz
Abstract:
In this paper, we present a novel pose normalization method for indoor map** point clouds and triangle meshes that is robust against large fractions of the indoor map** geometries deviating from an ideal Manhattan World structure. In the case of building structures that contain multiple Manhattan World systems, the dominant Manhattan World structure supported by the largest fraction of geometr…
▽ More
In this paper, we present a novel pose normalization method for indoor map** point clouds and triangle meshes that is robust against large fractions of the indoor map** geometries deviating from an ideal Manhattan World structure. In the case of building structures that contain multiple Manhattan World systems, the dominant Manhattan World structure supported by the largest fraction of geometries is determined and used for alignment. In a first step, a vertical alignment orienting a chosen axis to be orthogonal to horizontal floor and ceiling surfaces is conducted. Subsequently, a rotation around the resulting vertical axis is determined that aligns the dataset horizontally with the coordinate axes. The proposed method is evaluated quantitatively against several publicly available indoor map** datasets. Our implementation of the proposed procedure along with code for reproducing the evaluation will be made available to the public upon acceptance for publication.
△ Less
Submitted 16 July, 2021;
originally announced July 2021.
-
FaDIV-Syn: Fast Depth-Independent View Synthesis using Soft Masks and Implicit Blending
Authors:
Andre Rochow,
Max Schwarz,
Michael Weinmann,
Sven Behnke
Abstract:
Novel view synthesis is required in many robotic applications, such as VR teleoperation and scene reconstruction. Existing methods are often too slow for these contexts, cannot handle dynamic scenes, and are limited by their explicit depth estimation stage, where incorrect depth predictions can lead to large projection errors. Our proposed method runs in real time on live streaming data and avoids…
▽ More
Novel view synthesis is required in many robotic applications, such as VR teleoperation and scene reconstruction. Existing methods are often too slow for these contexts, cannot handle dynamic scenes, and are limited by their explicit depth estimation stage, where incorrect depth predictions can lead to large projection errors. Our proposed method runs in real time on live streaming data and avoids explicit depth estimation by efficiently war** input images into the target frame for a range of assumed depth planes. The resulting plane sweep volume (PSV) is directly fed into our network, which first estimates soft PSV masks in a self-supervised manner, and then directly produces the novel output view. This improves efficiency and performance on transparent, reflective, thin, and feature-less scene parts. FaDIV-Syn can perform both interpolation and extrapolation tasks at 540p in real-time and outperforms state-of-the-art extrapolation methods on the large-scale RealEstate10k dataset. We thoroughly evaluate ablations, such as removing the Soft-Masking network, training from fewer examples as well as generalization to higher resolutions and stronger depth discretization. Our implementation is available.
△ Less
Submitted 13 May, 2022; v1 submitted 24 June, 2021;
originally announced June 2021.
-
ReS2tAC -- UAV-Borne Real-Time SGM Stereo Optimized for Embedded ARM and CUDA Devices
Authors:
Boitumelo Ruf,
Jonas Mohrs,
Martin Weinmann,
Stefan Hinz,
Jürgen Beyerer
Abstract:
With the emergence of low-cost robotic systems, such as unmanned aerial vehicle, the importance of embedded high-performance image processing has increased. For a long time, FPGAs were the only processing hardware that were capable of high-performance computing, while at the same time preserving a low power consumption, essential for embedded systems. However, the recently increasing availability…
▽ More
With the emergence of low-cost robotic systems, such as unmanned aerial vehicle, the importance of embedded high-performance image processing has increased. For a long time, FPGAs were the only processing hardware that were capable of high-performance computing, while at the same time preserving a low power consumption, essential for embedded systems. However, the recently increasing availability of embedded GPU-based systems, such as the NVIDIA Jetson series, comprised of an ARM CPU and a NVIDIA Tegra GPU, allows for massively parallel embedded computing on graphics hardware. With this in mind, we propose an approach for real-time embedded stereo processing on ARM and CUDA-enabled devices, which is based on the popular and widely used Semi-Global Matching algorithm. In this, we propose an optimization of the algorithm for embedded CUDA GPUs, by using massively parallel computing, as well as using the NEON intrinsics to optimize the algorithm for vectorized SIMD processing on embedded ARM CPUs. We have evaluated our approach with different configurations on two public stereo benchmark datasets to demonstrate that they can reach an error rate as low as 3.3%. Furthermore, our experiments show that the fastest configuration of our approach reaches up to 46 FPS on VGA image resolution. Finally, in a use-case specific qualitative evaluation, we have evaluated the power consumption of our approach and deployed it on the DJI Manifold 2-G attached to a DJI Matrix 210v2 RTK unmanned aerial vehicle (UAV), demonstrating its suitability for real-time stereo processing onboard a UAV.
△ Less
Submitted 15 June, 2021;
originally announced June 2021.
-
Real-time dense 3D Reconstruction from monocular video data captured by low-cost UAVs
Authors:
Max Hermann,
Boitumelo Ruf,
Martin Weinmann
Abstract:
Real-time 3D reconstruction enables fast dense map** of the environment which benefits numerous applications, such as navigation or live evaluation of an emergency. In contrast to most real-time capable approaches, our approach does not need an explicit depth sensor. Instead, we only rely on a video stream from a camera and its intrinsic calibration. By exploiting the self-motion of the unmanned…
▽ More
Real-time 3D reconstruction enables fast dense map** of the environment which benefits numerous applications, such as navigation or live evaluation of an emergency. In contrast to most real-time capable approaches, our approach does not need an explicit depth sensor. Instead, we only rely on a video stream from a camera and its intrinsic calibration. By exploiting the self-motion of the unmanned aerial vehicle (UAV) flying with oblique view around buildings, we estimate both camera trajectory and depth for selected images with enough novel content. To create a 3D model of the scene, we rely on a three-stage processing chain. First, we estimate the rough camera trajectory using a simultaneous localization and map** (SLAM) algorithm. Once a suitable constellation is found, we estimate depth for local bundles of images using a Multi-View Stereo (MVS) approach and then fuse this depth into a global surfel-based model. For our evaluation, we use 55 video sequences with diverse settings, consisting of both synthetic and real scenes. We evaluate not only the generated reconstruction but also the intermediate products and achieve competitive results both qualitatively and quantitatively. At the same time, our method can keep up with a 30 fps video for a resolution of 768x448 pixels.
△ Less
Submitted 21 April, 2021;
originally announced April 2021.
-
FAIR1M: A Benchmark Dataset for Fine-grained Object Recognition in High-Resolution Remote Sensing Imagery
Authors:
Xian Sun,
Pei** Wang,
Zhiyuan Yan,
Feng Xu,
Rui** Wang,
Wenhui Diao,
** Chen,
Jihao Li,
Yingchao Feng,
Tao Xu,
Martin Weinmann,
Stefan Hinz,
Cheng Wang,
Kun Fu
Abstract:
With the rapid development of deep learning, many deep learning-based approaches have made great achievements in object detection task. It is generally known that deep learning is a data-driven method. Data directly impact the performance of object detectors to some extent. Although existing datasets have included common objects in remote sensing images, they still have some limitations in terms o…
▽ More
With the rapid development of deep learning, many deep learning-based approaches have made great achievements in object detection task. It is generally known that deep learning is a data-driven method. Data directly impact the performance of object detectors to some extent. Although existing datasets have included common objects in remote sensing images, they still have some limitations in terms of scale, categories, and images. Therefore, there is a strong requirement for establishing a large-scale benchmark on object detection in high-resolution remote sensing images. In this paper, we propose a novel benchmark dataset with more than 1 million instances and more than 15,000 images for Fine-grAined object recognItion in high-Resolution remote sensing imagery which is named as FAIR1M. All objects in the FAIR1M dataset are annotated with respect to 5 categories and 37 sub-categories by oriented bounding boxes. Compared with existing detection datasets dedicated to object detection, the FAIR1M dataset has 4 particular characteristics: (1) it is much larger than other existing object detection datasets both in terms of the quantity of instances and the quantity of images, (2) it provides more rich fine-grained category information for objects in remote sensing images, (3) it contains geographic information such as latitude, longitude and resolution, (4) it provides better image quality owing to a careful data cleaning procedure. To establish a baseline for fine-grained object recognition, we propose a novel evaluation method and benchmark fine-grained object detection tasks and a visual classification task using several State-Of-The-Art (SOTA) deep learning-based models on our FAIR1M dataset. Experimental results strongly indicate that the FAIR1M dataset is closer to practical application and it is considerably more challenging than existing datasets.
△ Less
Submitted 24 March, 2021; v1 submitted 9 March, 2021;
originally announced March 2021.
-
Towards Tangible Cultural Heritage Experiences -- Enriching VR-Based Object Inspection with Haptic Feedback
Authors:
Stefan Krumpen,
Reinhard Klein,
Michael Weinmann
Abstract:
VR/AR technology is a key enabler for new ways of immersively experiencing cultural heritage artifacts based on their virtual counterparts obtained from a digitization process. In this paper, we focus on enriching VR-based object inspection by additional haptic feedback, thereby creating tangible cultural heritage experiences. For this purpose, we present an approach for interactive and collaborat…
▽ More
VR/AR technology is a key enabler for new ways of immersively experiencing cultural heritage artifacts based on their virtual counterparts obtained from a digitization process. In this paper, we focus on enriching VR-based object inspection by additional haptic feedback, thereby creating tangible cultural heritage experiences. For this purpose, we present an approach for interactive and collaborative VR-based object inspection and annotation. Our system supports high-quality 3D models with accurate reflectance characteristics while additionally providing haptic feedback regarding the object shape features based on a 3D printed replica. The digital object model in terms of a printable representation of the geometry as well as reflectance characteristics are stored in a compact and streamable representation on a central server, which streams the data to remotely connected users/clients. The latter can jointly perform an interactive inspection of the object in VR with additional haptic feedback through the 3D printed replica. Evaluations regarding system performance, visual quality of the considered models as well as insights from a user study indicate an improved interaction, assessment and experience of the considered objects.
△ Less
Submitted 10 February, 2021;
originally announced February 2021.
-
Teaching the Incompressible Navier-Stokes Equations to Fast Neural Surrogate Models in 3D
Authors:
Nils Wandel,
Michael Weinmann,
Reinhard Klein
Abstract:
Physically plausible fluid simulations play an important role in modern computer graphics and engineering. However, in order to achieve real-time performance, computational speed needs to be traded-off with physical accuracy. Surrogate fluid models based on neural networks have the potential to achieve both, fast fluid simulations and high physical accuracy. However, these approaches rely on massi…
▽ More
Physically plausible fluid simulations play an important role in modern computer graphics and engineering. However, in order to achieve real-time performance, computational speed needs to be traded-off with physical accuracy. Surrogate fluid models based on neural networks have the potential to achieve both, fast fluid simulations and high physical accuracy. However, these approaches rely on massive amounts of training data, require complex pipelines for training and inference or do not generalize to new fluid domains.
In this work, we present significant extensions to a recently proposed deep learning framework, which addresses the aforementioned challenges in 2D. We go from 2D to 3D and propose an efficient architecture to cope with the high demands of 3D grids in terms of memory and computational complexity. Furthermore, we condition the neural fluid model on additional information about the fluid's viscosity and density which allows simulating laminar as well as turbulent flows based on the same surrogate model.
Our method allows to train fluid models without requiring fluid simulation data beforehand. Inference is fast and simple, as the fluid model directly maps a fluid state and boundary conditions at a moment t to a subsequent fluid state at t+dt. We obtain real-time fluid simulations on a 128x64x64 grid that include various fluid phenomena such as the Magnus effect or Karman vortex streets and generalize to domain geometries not considered during training. Our method indicates strong improvements in terms of accuracy, speed and generalization capabilities over current 3D NN-based fluid models.
△ Less
Submitted 23 March, 2021; v1 submitted 22 December, 2020;
originally announced December 2020.
-
Self-Supervised Learning for Monocular Depth Estimation from Aerial Imagery
Authors:
Max Hermann,
Boitumelo Ruf,
Martin Weinmann,
Stefan Hinz
Abstract:
Supervised learning based methods for monocular depth estimation usually require large amounts of extensively annotated training data. In the case of aerial imagery, this ground truth is particularly difficult to acquire. Therefore, in this paper, we present a method for self-supervised learning for monocular depth estimation from aerial imagery that does not require annotated training data. For t…
▽ More
Supervised learning based methods for monocular depth estimation usually require large amounts of extensively annotated training data. In the case of aerial imagery, this ground truth is particularly difficult to acquire. Therefore, in this paper, we present a method for self-supervised learning for monocular depth estimation from aerial imagery that does not require annotated training data. For this, we only use an image sequence from a single moving camera and learn to simultaneously estimate depth and pose information. By sharing the weights between pose and depth estimation, we achieve a relatively small model, which favors real-time application. We evaluate our approach on three diverse datasets and compare the results to conventional methods that estimate depth maps based on multi-view geometry. We achieve an accuracy δ1.25 of up to 93.5 %. In addition, we have paid particular attention to the generalization of a trained model to unknown data and the self-improving capabilities of our approach. We conclude that, even though the results of monocular depth estimation are inferior to those achieved by conventional methods, they are well suited to provide a good initialization for methods that rely on image matching or to provide estimates in regions where image matching fails, e.g. occluded or texture-less regions.
△ Less
Submitted 17 August, 2020;
originally announced August 2020.
-
Learning Incompressible Fluid Dynamics from Scratch -- Towards Fast, Differentiable Fluid Models that Generalize
Authors:
Nils Wandel,
Michael Weinmann,
Reinhard Klein
Abstract:
Fast and stable fluid simulations are an essential prerequisite for applications ranging from computer-generated imagery to computer-aided design in research and development. However, solving the partial differential equations of incompressible fluids is a challenging task and traditional numerical approximation schemes come at high computational costs. Recent deep learning based approaches promis…
▽ More
Fast and stable fluid simulations are an essential prerequisite for applications ranging from computer-generated imagery to computer-aided design in research and development. However, solving the partial differential equations of incompressible fluids is a challenging task and traditional numerical approximation schemes come at high computational costs. Recent deep learning based approaches promise vast speed-ups but do not generalize to new fluid domains, require fluid simulation data for training, or rely on complex pipelines that outsource major parts of the fluid simulation to traditional methods.
In this work, we propose a novel physics-constrained training approach that generalizes to new fluid domains, requires no fluid simulation data, and allows convolutional neural networks to map a fluid state from time-point t to a subsequent state at time t + dt in a single forward pass. This simplifies the pipeline to train and evaluate neural fluid models. After training, the framework yields models that are capable of fast fluid simulations and can handle various fluid phenomena including the Magnus effect and Karman vortex streets. We present an interactive real-time demo to show the speed and generalization capabilities of our trained models. Moreover, the trained neural networks are efficient differentiable fluid solvers as they offer a differentiable update step to advance the fluid simulation in time. We exploit this fact in a proof-of-concept optimal control experiment. Our models significantly outperform a recent differentiable fluid solver in terms of computational speed and accuracy.
△ Less
Submitted 2 March, 2021; v1 submitted 15 June, 2020;
originally announced June 2020.
-
Voxel-Based Indoor Reconstruction From HoloLens Triangle Meshes
Authors:
P. Hübner,
M. Weinmann,
S. Wursthorn
Abstract:
Current mobile augmented reality devices are often equipped with range sensors. The Microsoft HoloLens for instance is equipped with a Time-Of-Flight (ToF) range camera providing coarse triangle meshes that can be used in custom applications. We suggest to use the triangle meshes for the automatic generation of indoor models that can serve as basis for augmenting their physical counterpart with lo…
▽ More
Current mobile augmented reality devices are often equipped with range sensors. The Microsoft HoloLens for instance is equipped with a Time-Of-Flight (ToF) range camera providing coarse triangle meshes that can be used in custom applications. We suggest to use the triangle meshes for the automatic generation of indoor models that can serve as basis for augmenting their physical counterpart with location-dependent information. In this paper, we present a novel voxel-based approach for automated indoor reconstruction from unstructured three-dimensional geometries like triangle meshes. After an initial voxelization of the input data, rooms are detected in the resulting voxel grid by segmenting connected voxel components of ceiling candidates and extruding them downwards to find floor candidates. Semantic class labels like 'Wall', 'Wall Opening', 'Interior Object' and 'Empty Interior' are then assigned to the room voxels in-between ceiling and floor by a rule-based voxel sweep algorithm. Finally, the geometry of the detected walls and their openings is refined in voxel representation. The proposed approach is not restricted to Manhattan World scenarios and does not rely on room surfaces being planar.
△ Less
Submitted 18 February, 2020;
originally announced February 2020.
-
Bonn Activity Maps: Dataset Description
Authors:
Julian Tanke,
Oh-Hun Kwon,
Patrick Stotko,
Radu Alexandru Rosu,
Michael Weinmann,
Hassan Errami,
Sven Behnke,
Maren Bennewitz,
Reinhard Klein,
Andreas Weber,
Angela Yao,
Juergen Gall
Abstract:
The key prerequisite for accessing the huge potential of current machine learning techniques is the availability of large databases that capture the complex relations of interest. Previous datasets are focused on either 3D scene representations with semantic information, tracking of multiple persons and recognition of their actions, or activity recognition of a single person in captured 3D environ…
▽ More
The key prerequisite for accessing the huge potential of current machine learning techniques is the availability of large databases that capture the complex relations of interest. Previous datasets are focused on either 3D scene representations with semantic information, tracking of multiple persons and recognition of their actions, or activity recognition of a single person in captured 3D environments. We present Bonn Activity Maps, a large-scale dataset for human tracking, activity recognition and anticipation of multiple persons. Our dataset comprises four different scenes that have been recorded by time-synchronized cameras each only capturing the scene partially, the reconstructed 3D models with semantic annotations, motion trajectories for individual people including 3D human poses as well as human activity annotations. We utilize the annotations to generate activity likelihoods on the 3D models called activity maps.
△ Less
Submitted 13 December, 2019;
originally announced December 2019.
-
Orthogonal Wasserstein GANs
Authors:
Jan Müller,
Reinhard Klein,
Michael Weinmann
Abstract:
Wasserstein-GANs have been introduced to address the deficiencies of generative adversarial networks (GANs) regarding the problems of vanishing gradients and mode collapse during the training, leading to improved convergence behaviour and improved image quality. However, Wasserstein-GANs require the discriminator to be Lipschitz continuous. In current state-of-the-art Wasserstein-GANs this constra…
▽ More
Wasserstein-GANs have been introduced to address the deficiencies of generative adversarial networks (GANs) regarding the problems of vanishing gradients and mode collapse during the training, leading to improved convergence behaviour and improved image quality. However, Wasserstein-GANs require the discriminator to be Lipschitz continuous. In current state-of-the-art Wasserstein-GANs this constraint is enforced via gradient norm regularization. In this paper, we demonstrate that this regularization does not encourage a broad distribution of spectral-values in the discriminator weights, hence resulting in less fidelity in the learned distribution. We therefore investigate the possibility of substituting this Lipschitz constraint with an orthogonality constraint on the weight matrices. We compare three different weight orthogonalization techniques with regards to their convergence properties, their ability to ensure the Lipschitz condition and the achieved quality of the learned distribution. In addition, we provide a comparison to Wasserstein-GANs trained with current state-of-the-art methods, where we demonstrate the potential of solely using orthogonality-based regularization. In this context, we propose an improved training procedure for Wasserstein-GANs which utilizes orthogonalization to further increase its generalization capability. Finally, we provide a novel metric to evaluate the generalization capabilities of the discriminators of different Wasserstein-GANs.
△ Less
Submitted 14 December, 2019; v1 submitted 29 November, 2019;
originally announced November 2019.
-
Efficient Surface-Aware Semi-Global Matching with Multi-View Plane-Sweep Sampling
Authors:
Boitumelo Ruf,
Thomas Pollok,
Martin Weinmann
Abstract:
Online augmentation of an oblique aerial image sequence with structural information is an essential aspect in the process of 3D scene interpretation and analysis. One key aspect in this is the efficient dense image matching and depth estimation. Here, the Semi-Global Matching (SGM) approach has proven to be one of the most widely used algorithms for efficient depth estimation, providing a good tra…
▽ More
Online augmentation of an oblique aerial image sequence with structural information is an essential aspect in the process of 3D scene interpretation and analysis. One key aspect in this is the efficient dense image matching and depth estimation. Here, the Semi-Global Matching (SGM) approach has proven to be one of the most widely used algorithms for efficient depth estimation, providing a good trade-off between accuracy and computational complexity. However, SGM only models a first-order smoothness assumption, thus favoring fronto-parallel surfaces. In this work, we present a hierarchical algorithm that allows for efficient depth and normal map estimation together with confidence measures for each estimate. Our algorithm relies on a plane-sweep multi-image matching followed by an extended SGM optimization that allows to incorporate local surface orientations, thus achieving more consistent and accurate estimates in areasmade up of slanted surfaces, inherent to oblique aerial imagery. We evaluate numerous configurations of our algorithm on two different datasets using an absolute and relative accuracy measure. In our evaluation, we show that the results of our approach are comparable to the ones achieved by refined Structure-from-Motion (SfM) pipelines, such as COLMAP, which are designed for offline processing. In contrast, however, our approach only considers a confined image bundle of an input sequence, thus allowing to perform an online and incremental computation at 1Hz-2Hz.
△ Less
Submitted 21 September, 2019;
originally announced September 2019.
-
Efficient 3D Reconstruction and Streaming for Group-Scale Multi-Client Live Telepresence
Authors:
Patrick Stotko,
Stefan Krumpen,
Michael Weinmann,
Reinhard Klein
Abstract:
Sharing live telepresence experiences for teleconferencing or remote collaboration receives increasing interest with the recent progress in capturing and AR/VR technology. Whereas impressive telepresence systems have been proposed on top of on-the-fly scene capture, data transmission and visualization, these systems are restricted to the immersion of single or up to a low number of users into the…
▽ More
Sharing live telepresence experiences for teleconferencing or remote collaboration receives increasing interest with the recent progress in capturing and AR/VR technology. Whereas impressive telepresence systems have been proposed on top of on-the-fly scene capture, data transmission and visualization, these systems are restricted to the immersion of single or up to a low number of users into the respective scenarios. In this paper, we direct our attention on immersing significantly larger groups of people into live-captured scenes as required in education, entertainment or collaboration scenarios. For this purpose, rather than abandoning previous approaches, we present a range of optimizations of the involved reconstruction and streaming components that allow the immersion of a group of more than 24 users within the same scene - which is about a factor of 6 higher than in previous work - without introducing further latency or changing the involved consumer hardware setup. We demonstrate that our optimized system is capable of generating high-quality scene reconstructions as well as providing an immersive viewing experience to a large group of people within these live-captured scenes.
△ Less
Submitted 13 January, 2020; v1 submitted 8 August, 2019;
originally announced August 2019.
-
A VR System for Immersive Teleoperation and Live Exploration with a Mobile Robot
Authors:
Patrick Stotko,
Stefan Krumpen,
Max Schwarz,
Christian Lenz,
Sven Behnke,
Reinhard Klein,
Michael Weinmann
Abstract:
Applications like disaster management and industrial inspection often require experts to enter contaminated places. To circumvent the need for physical presence, it is desirable to generate a fully immersive individual live teleoperation experience. However, standard video-based approaches suffer from a limited degree of immersion and situation awareness due to the restriction to the camera view,…
▽ More
Applications like disaster management and industrial inspection often require experts to enter contaminated places. To circumvent the need for physical presence, it is desirable to generate a fully immersive individual live teleoperation experience. However, standard video-based approaches suffer from a limited degree of immersion and situation awareness due to the restriction to the camera view, which impacts the navigation. In this paper, we present a novel VR-based practical system for immersive robot teleoperation and scene exploration. While being operated through the scene, a robot captures RGB-D data that is streamed to a SLAM-based live multi-client telepresence system. Here, a global 3D model of the already captured scene parts is reconstructed and streamed to the individual remote user clients where the rendering for e.g. head-mounted display devices (HMDs) is performed. We introduce a novel lightweight robot client component which transmits robot-specific data and enables a quick integration into existing robotic systems. This way, in contrast to first-person exploration systems, the operators can explore and navigate in the remote site completely independent of the current position and view of the capturing robot, complementing traditional input devices for teleoperation. We provide a proof-of-concept implementation and demonstrate the capabilities as well as the performance of our system regarding interactive object measurements and bandwidth-efficient data streaming and visualization. Furthermore, we show its benefits over purely video-based teleoperation in a user study revealing a higher degree of situation awareness and a more precise navigation in challenging environments.
△ Less
Submitted 3 February, 2020; v1 submitted 8 August, 2019;
originally announced August 2019.
-
Automatic Co-Registration of Aerial Imagery and Untextured Model Data Utilizing Average Shading Gradients
Authors:
Sylvia Schmitz,
Martin Weinmann,
Boitumelo Ruf
Abstract:
The comparison of current image data with existing 3D model data of a scene provides an efficient method to keep models up to date. In order to transfer information between 2D and 3D data, a preliminary co-registration is necessary. In this paper, we present a concept to automatically co-register aerial imagery and untextured 3D model data. To refine a given initial camera pose, our algorithm comp…
▽ More
The comparison of current image data with existing 3D model data of a scene provides an efficient method to keep models up to date. In order to transfer information between 2D and 3D data, a preliminary co-registration is necessary. In this paper, we present a concept to automatically co-register aerial imagery and untextured 3D model data. To refine a given initial camera pose, our algorithm computes dense correspondence fields using SIFT flow between gradient representations of the model and camera image, from which 2D-3D correspondences are obtained. These correspondences are then used in an iterative optimization scheme to refine the initial camera pose by minimizing the reprojection error. Since it is assumed that the model does not contain texture information, our algorithm is built up on an existing method based on Average Shading Gradients (ASG) to generate gradient images based on raw geometry information only. We apply our algorithm for the co-registering of aerial photographs to an untextured, noisy mesh model. We have investigated different magnitudes of input error and show that the proposed approach can reduce the final reprojection error to a minimum of 1.27 plus-minus 0.54 pixels, which is less than 10 % of its initial value. Furthermore, our evaluation shows that our approach outperforms the accuracy of a standard Iterative Closest Point (ICP) implementation.
△ Less
Submitted 21 September, 2019; v1 submitted 26 June, 2019;
originally announced June 2019.
-
Computational Parquetry: Fabricated Style Transfer with Wood Pixels
Authors:
Julian Iseringhausen,
Michael Weinmann,
Weizhen Huang,
Matthias B. Hullin
Abstract:
Parquetry is the art and craft of decorating a surface with a pattern of differently colored veneers of wood, stone or other materials. Traditionally, the process of designing and making parquetry has been driven by color, using the texture found in real wood only for stylization or as a decorative effect. Here, we introduce a computational pipeline that draws from the rich natural structure of st…
▽ More
Parquetry is the art and craft of decorating a surface with a pattern of differently colored veneers of wood, stone or other materials. Traditionally, the process of designing and making parquetry has been driven by color, using the texture found in real wood only for stylization or as a decorative effect. Here, we introduce a computational pipeline that draws from the rich natural structure of strongly textured real-world veneers as a source of detail in order to approximate a target image as faithfully as possible using a manageable number of parts. This challenge is closely related to the established problems of patch-based image synthesis and stylization in some ways, but fundamentally different in others. Most importantly, the limited availability of resources (any piece of wood can only be used once) turns the relatively simple problem of finding the right piece for the target location into the combinatorial problem of finding optimal parts while avoiding resource collisions. We introduce an algorithm that allows to efficiently solve an approximation to the problem. It further addresses challenges like gamut map**, feature characterization and the search for fabricable cuts. We demonstrate the effectiveness of the system by fabricating a selection of "photo-realistic" pieces of parquetry from different kinds of unstained wood veneer.
△ Less
Submitted 19 December, 2019; v1 submitted 9 April, 2019;
originally announced April 2019.
-
A Study of Material Sonification in Touchscreen Devices
Authors:
Rodrigo Martín,
Michael Weinmann,
Matthias B. Hullin
Abstract:
Even in the digital age, designers largely rely on physical material samples to illustrate their products, as existing visual representations fail to sufficiently reproduce the look and feel of real world materials. Here, we investigate the use of interactive material sonification as an additional sensory modality for communicating well-established material qualities like softness, pleasantness or…
▽ More
Even in the digital age, designers largely rely on physical material samples to illustrate their products, as existing visual representations fail to sufficiently reproduce the look and feel of real world materials. Here, we investigate the use of interactive material sonification as an additional sensory modality for communicating well-established material qualities like softness, pleasantness or value. We developed a custom application for touchscreen devices that receives tactile input and translate it into material rubbing sound using granular synthesis. We used this system to perform a psychophysical study, in which the ability of the user to rate subjective material qualities is evaluated, with the actual material samples serving as reference stimulus. Our experimental results indicate that the considered audio cues do not significantly contribute to the perception of material qualities but are able to increase the level of immersion when interacting with digital samples.
△ Less
Submitted 26 September, 2018; v1 submitted 3 July, 2018;
originally announced July 2018.
-
SLAMCast: Large-Scale, Real-Time 3D Reconstruction and Streaming for Immersive Multi-Client Live Telepresence
Authors:
Patrick Stotko,
Stefan Krumpen,
Matthias B. Hullin,
Michael Weinmann,
Reinhard Klein
Abstract:
Real-time 3D scene reconstruction from RGB-D sensor data, as well as the exploration of such data in VR/AR settings, has seen tremendous progress in recent years. The combination of both these components into telepresence systems, however, comes with significant technical challenges. All approaches proposed so far are extremely demanding on input and output devices, compute resources and transmiss…
▽ More
Real-time 3D scene reconstruction from RGB-D sensor data, as well as the exploration of such data in VR/AR settings, has seen tremendous progress in recent years. The combination of both these components into telepresence systems, however, comes with significant technical challenges. All approaches proposed so far are extremely demanding on input and output devices, compute resources and transmission bandwidth, and they do not reach the level of immediacy required for applications such as remote collaboration. Here, we introduce what we believe is the first practical client-server system for real-time capture and many-user exploration of static 3D scenes. Our system is based on the observation that interactive frame rates are sufficient for capturing and reconstruction, and real-time performance is only required on the client site to achieve lag-free view updates when rendering the 3D model. Starting from this insight, we extend previous voxel block hashing frameworks by overcoming internal dependencies and introducing, to the best of our knowledge, the first thread-safe GPU hash map data structure that is robust under massively concurrent retrieval, insertion and removal of entries on a thread level. We further propose a novel transmission scheme for volume data that is specifically targeted to Marching Cubes geometry reconstruction and enables a 90% reduction in bandwidth between server and exploration clients. The resulting system poses very moderate requirements on network bandwidth, latency and client-side computation, which enables it to rely entirely on consumer-grade hardware, including mobile devices. We demonstrate that our technique achieves state-of-the-art representation accuracy while providing, for any number of clients, an immersive and fluid lag-free viewing experience even during network outages.
△ Less
Submitted 12 April, 2019; v1 submitted 9 May, 2018;
originally announced May 2018.
-
Deep cross-domain building extraction for selective depth estimation from oblique aerial imagery
Authors:
Boitumelo Ruf,
Laurenz Thiel,
Martin Weinmann
Abstract:
With the technological advancements of aerial imagery and accurate 3d reconstruction of urban environments, more and more attention has been paid to the automated analyses of urban areas. In our work, we examine two important aspects that allow live analysis of building structures in city models given oblique aerial imagery, namely automatic building extraction with convolutional neural networks (…
▽ More
With the technological advancements of aerial imagery and accurate 3d reconstruction of urban environments, more and more attention has been paid to the automated analyses of urban areas. In our work, we examine two important aspects that allow live analysis of building structures in city models given oblique aerial imagery, namely automatic building extraction with convolutional neural networks (CNNs) and selective real-time depth estimation from aerial imagery. We use transfer learning to train the Faster R-CNN method for real-time deep object detection, by combining a large ground-based dataset for urban scene understanding with a smaller number of images from an aerial dataset. We achieve an average precision (AP) of about 80% for the task of building extraction on a selected evaluation dataset. Our evaluation focuses on both dataset-specific learning and transfer learning. Furthermore, we present an algorithm that allows for multi-view depth estimation from aerial imagery in real-time. We adopt the semi-global matching (SGM) optimization strategy to preserve sharp edges at object boundaries. In combination with the Faster R-CNN, it allows a selective reconstruction of buildings, identified with regions of interest (RoIs), from oblique aerial imagery.
△ Less
Submitted 21 September, 2019; v1 submitted 23 April, 2018;
originally announced April 2018.