Search | arXiv e-print repository

An Efficient yet High-Performance Method for Precise Radar-Based Imaging of Human Hand Poses

Authors: Johanna Bräunig, Vanessa Wirth, Marc Stamminger, Ingrid Ullmann, Martin Vossiek

Abstract: Contactless hand pose estimation requires sensors that provide precise spatial information and low computational complexity for real-time processing. Unlike vision-based systems, radar offers lighting independence and direct motion assessments. Yet, there is limited research balancing real-time constraints, suitable frame rates for motion evaluations, and the need for precise 3D data. To address this, we extend the ultra-efficient two-tone hand imaging method from our prior work to a three-tone approach. Maintaining high frame rates and real-time constraints, this approach significantly enhances reconstruction accuracy and precision. We assess these measures by evaluating reconstruction results for different hand poses obtained by an imaging radar. Accuracy is assessed against ground truth from a spatially calibrated photogrammetry setup, while precision is measured using 3D-printed hand poses. The results emphasize the method's great potential for future radar-based hand sensing.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 4 pages, 4 figures, accepted at European Microwave Week (EuMW 2024) to the topic "R28 Human Activity Monitoring, including Gesture Recognition" (EuRAD)

arXiv:2403.16862 [pdf, other]

INPC: Implicit Neural Point Clouds for Radiance Field Rendering

Authors: Florian Hahlbohm, Linus Franke, Moritz Kappel, Susana Castillo, Marc Stamminger, Marcus Magnor

Abstract: We introduce a new approach for reconstruction and novel-view synthesis of unbounded real-world scenes. In contrast to previous methods using either volumetric fields, grid-based models, or discrete point cloud proxies, we propose a hybrid scene representation, which implicitly encodes a point cloud in a continuous octree-based probability field and a multi-resolution hash grid. In doing so, we combine the benefits of both worlds by retaining favorable behavior during optimization: Our novel implicit point cloud representation and differentiable bilinear rasterizer enable fast rendering while preserving fine geometric detail without depending on initial priors like structure-from-motion point clouds. Our method achieves state-of-the-art image quality on several common benchmark datasets. Furthermore, we achieve fast inference at interactive frame rates, and can extract explicit point clouds to further enhance performance.

Submitted 25 March, 2024; originally announced March 2024.

Comments: Project page: https://fhahlbohm.github.io/inpc/

arXiv:2403.10981 [pdf, other]

Automatic Spatial Calibration of Near-Field MIMO Radar With Respect to Optical Sensors

Authors: Vanessa Wirth, Johanna Bräunig, Danti Khouri, Florian Gutsche, Martin Vossiek, Tim Weyrich, Marc Stamminger

Abstract: Despite an emerging interest in MIMO radar, the utilization of its complementary strengths in combination with optical sensors has so far been limited to far-field applications, due to the challenges that arise from mutual sensor calibration in the near field. In fact, most related approaches in the autonomous industry propose target-based calibration methods using corner reflectors that have proven to be unsuitable for the near field. In contrast, we propose a novel, joint calibration approach for optical RGB-D sensors and MIMO radars that is designed to operate in the radar's near-field range, within decimeters from the sensors. Our pipeline consists of a bespoke calibration target, allowing for automatic target detection and localization, followed by the spatial calibration of the two sensor coordinate systems through target registration. We validate our approach using two different depth sensing technologies from the optical domain. The experiments show the efficiency and accuracy of our calibration for various target displacements, as well as the robustness of our localization in terms of signal ambiguities.

Submitted 16 March, 2024; originally announced March 2024.

Comments: 8 pages, 9 figures

arXiv:2401.06003 [pdf, other]

TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering

Authors: Linus Franke, Darius Rückert, Laura Fink, Marc Stamminger

Abstract: Point-based radiance field rendering has demonstrated impressive results for novel view synthesis, offering a compelling blend of rendering quality and computational efficiency. However, even the latest approaches in this domain are not without their shortcomings. 3D Gaussian Splatting [Kerbl and Kopanas et al. 2023] struggles when tasked with rendering highly detailed scenes, due to blurring and cloudy artifacts. On the other hand, ADOP [Rückert et al. 2022] can accommodate crisper images, but the neural reconstruction network decreases performance, it grapples with temporal instability, and it is unable to effectively address large gaps in the point cloud. In this paper, we present TRIPS (Trilinear Point Splatting), an approach that combines ideas from both Gaussian Splatting and ADOP. The fundamental concept behind our novel technique involves rasterizing points into a screen-space image pyramid, with the selection of the pyramid layer determined by the projected point size. This approach allows rendering arbitrarily large points using a single trilinear write. A lightweight neural network is then used to reconstruct a hole-free image including detail beyond splat resolution. Importantly, our render pipeline is entirely differentiable, allowing for automatic optimization of both point sizes and positions. Our evaluation demonstrates that TRIPS surpasses existing state-of-the-art methods in terms of rendering quality while maintaining a real-time frame rate of 60 frames per second on readily available hardware. This performance extends to challenging scenarios, such as scenes featuring intricate geometry, expansive landscapes, and auto-exposed footage. The project page is located at: https://lfranke.github.io/trips/

Submitted 26 March, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

ACM Class: I.3; I.4

arXiv:2401.02281 [pdf, other]

PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation

Authors: Lukas Meyer, Floris Erich, Yusuke Yoshiyasu, Marc Stamminger, Noriaki Ando, Yukiyasu Domae

Abstract: We introduce Physically Enhanced Gaussian Splatting Simulation System (PEGASUS) for 6DOF object pose dataset generation, a versatile dataset generator based on 3D Gaussian Splatting. Environment and object representations can be easily obtained using commodity cameras to reconstruct with Gaussian Splatting. PEGASUS allows the composition of new scenes by merging the respective underlying Gaussian Splatting point cloud of an environment with one or multiple objects. Leveraging a physics engine enables the simulation of natural object placement within a scene through interaction between meshes extracted for the objects and the environment. Consequently, a large number of new scenes - static or dynamic - can be created by combining different environments and objects. By rendering scenes from various perspectives, diverse data points such as RGB images, depth maps, semantic masks, and 6DoF object poses can be extracted. Our study demonstrates that training on data generated by PEGASUS enables pose estimation networks to successfully transfer from synthetic data to real-world data. Moreover, we introduce the Ramen dataset, comprising 30 Japanese cup noodle items. This dataset includes spherical scans that capture images from both object hemispheres and the Gaussian Splatting reconstruction, making them compatible with PEGASUS.

Submitted 4 January, 2024; originally announced January 2024.

Comments: Project Page: https://meyerls.github.io/pegasus_web

arXiv:2311.16668 [pdf, other]

doi 10.1145/3610548.3618213

LiveNVS: Neural View Synthesis on Live RGB-D Streams

Authors: Laura Fink, Darius Rückert, Linus Franke, Joachim Keinert, Marc Stamminger

Abstract: Existing real-time RGB-D reconstruction approaches, like Kinect Fusion, lack real-time photo-realistic visualization. This is due to noisy, oversmoothed or incomplete geometry and blurry textures which are fused from imperfect depth maps and camera poses. Recent neural rendering methods can overcome many of such artifacts but are mostly optimized for offline usage, hindering the integration into a live reconstruction pipeline. In this paper, we present LiveNVS, a system that allows for neural novel view synthesis on a live RGB-D input stream with very low latency and real-time rendering. Based on the RGB-D input stream, novel views are rendered by projecting neural features into the target view via a densely fused depth map and aggregating the features in image-space to a target feature map. A generalizable neural network then translates the target feature map into a high-quality RGB image. LiveNVS achieves state-of-the-art neural rendering quality of unknown scenes during capturing, allowing users to virtually explore the scene and assess reconstruction quality in real-time.

Submitted 29 November, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: main paper: 8 pages, total number of pages: 15, 13 figures, to be published in SIGGRAPH Asia 2023 Conference Papers; edits: link was fixed

ACM Class: I.3.2; I.3.6; I.4.5

arXiv:2311.15415 [pdf, other]

doi 10.1007/978-3-031-39059-3_28

GAN-Based LiDAR Intensity Simulation

Authors: Richard Marcus, Felix Gabel, Niklas Knoop, Marc Stamminger

Abstract: Realistic vehicle sensor simulation is an important element in developing autonomous driving. As physics-based implementations of visual sensors like LiDAR are complex in practice, data-based approaches promise solutions. Using pairs of camera images and LiDAR scans from real test drives, GANs can be trained to translate between them. For this process, we contribute two additions. First, we exploit the camera images, acquiring segmentation data and dense depth maps as additional input for training. Second, we test the performance of the LiDAR simulation by testing how well an object detection network generalizes between real and synthetic point clouds to enable evaluation without ground truth point clouds. Combining both, we simulate LiDAR point clouds and demonstrate their realism.

Submitted 26 November, 2023; originally announced November 2023.

Journal ref: Deep Learning Theory and Applications, 4th International Conference, DeLTA 2023, Rome, Italy, 2023, Proceedings, pp 419-433

arXiv:2311.04634 [pdf, other]

doi 10.1145/3610548.3618212

VET: Visual Error Tomography for Point Cloud Completion and High-Quality Neural Rendering

Authors: Linus Franke, Darius Rückert, Laura Fink, Matthias Innmann, Marc Stamminger

Abstract: In the last few years, deep neural networks opened the doors for big advances in novel view synthesis. Many of these approaches are based on a (coarse) proxy geometry obtained by structure from motion algorithms. Small deficiencies in this proxy can be fixed by neural rendering, but larger holes or missing parts, as they commonly appear for thin structures or for glossy regions, still lead to distracting artifacts and temporal instability. In this paper, we present a novel neural-rendering-based approach to detect and fix such deficiencies. As a proxy, we use a point cloud, which allows us to easily remove outlier geometry and to fill in missing geometry without complicated topological operations. Keys to our approach are (i) a differentiable, blending point-based renderer that can blend out redundant points, as well as (ii) the concept of Visual Error Tomography (VET), which allows us to lift 2D error maps to identify 3D-regions lacking geometry and to spawn novel points accordingly. Furthermore, (iii) by adding points as nested environment maps, our approach allows us to generate high-quality renderings of the surroundings in the same pipeline. In our results, we show that our approach can improve the quality of a point cloud obtained by structure from motion and thus increase novel view synthesis quality significantly. In contrast to point growing techniques, the approach can also fix large-scale holes and missing thin structures effectively. Rendering quality outperforms state-of-the-art methods and temporal stability is significantly improved, while rendering is possible at real-time frame rates.

Submitted 8 November, 2023; originally announced November 2023.

ACM Class: I.3; I.4

arXiv:2309.06945 [pdf, ps, other]

doi 10.1109/ISM.2018.00063

Improving HEVC Encoding of Rendered Video Data Using True Motion Information

Authors: Christian Herglotz, David Müller, Andreas Weinlich, Frank Bauer, Michael Ortner, Marc Stamminger, André Kaup

Abstract: This paper shows that motion vectors representing the true motion of an object in a scene can be exploited to improve the encoding process of computer generated video sequences. Therefore, a set of sequences is presented for which the true motion vectors of the corresponding objects were generated on a per-pixel basis during the rendering process. In addition to conventional motion estimation methods, it is proposed to exploit the computer generated motion vectors to enhance the rate-distortion performance. To this end, a motion vector mapping method including disocclusion handling is presented. It is shown that mean rate savings of 3.78% can be achieved.

Submitted 13 September, 2023; originally announced September 2023.

Comments: 4 pages, 4 figures

Journal ref: Proc. 2018 IEEE International Symposium on Multimedia (ISM)

arXiv:2307.15412 [pdf, other]

A Realistic Radar Ray Tracing Simulator for Hand Pose Imaging

Authors: Johanna Bräunig, Christian Schüßler, Vanessa Wirth, Marc Stamminger, Ingrid Ullmann, Martin Vossiek

Abstract: With the increasing popularity of human-computer interaction applications, there is also growing interest in generating sufficiently large and diverse data sets for automatic radar-based recognition of hand poses and gestures. Radar simulations are a vital approach to generating training data (e.g., for machine learning). Therefore, this work applies a ray tracing method to radar imaging of the hand. The performance of the proposed simulation approach is verified by a comparison of simulation and measurement data based on an imaging radar with a high lateral resolution. In addition, the surface material model incorporated into the ray tracer is highlighted in more detail and parameterized for radar hand imaging. Measurements and simulations show a very high similarity between synthetic and real radar image captures. The presented results demonstrate that it is possible to generate very realistic simulations of radar measurement data even for complex radar hand pose imaging systems.

Submitted 28 July, 2023; originally announced July 2023.

Comments: 4 pages, 5 figures, accepted at European Microwave Week (EuMW 2023) to the topic "R28 Human Activity Monitoring, including Gesture Recognition"

arXiv:2305.14176 [pdf, ps, other]

doi 10.1109/RadarConf2351548.2023.10149641

Achieving Efficient and Realistic Full-Radar Simulations and Automatic Data Annotation by exploiting Ray Meta Data of a Radar Ray Tracing Simulator

Authors: Christian Schüßler, Marcel Hoffmann, Vanessa Wirth, Björn Eskofier, Tim Weyrich, Marc Stamminger, Martin Vossiek

Abstract: In this work, a novel radar simulation concept is introduced that makes it possible to simulate realistic radar data for range, Doppler, and arbitrary antenna positions in an efficient way. Further, it makes it possible to automatically annotate the simulated radar signal by decomposing it into different parts. This approach not only makes almost perfect annotations possible, but also allows the annotation of exotic effects, such as multi-path effects, and the labeling of signal parts originating from different parts of an object. This is possible by adapting the computation process of a Monte Carlo shooting and bouncing rays (SBR) simulator. By considering the hits of each simulated ray, various meta data can be stored, such as hit position, mesh pointer, object IDs, and many more. This collected meta data can then be utilized to predict the change of path lengths introduced by object motion to obtain Doppler information, or to apply specific ray filter rules in order to obtain radar signals that only fulfil specific conditions, such as multiple bounces or containing specific object IDs. Using this approach, perfect and otherwise almost impossible annotation schemes can be realized.

Submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted for IEEE RadarConf 2023

arXiv:2304.04708 [pdf, other]

doi 10.1109/CVPRW59228.2023.00664

CherryPicker: Semantic Skeletonization and Topological Reconstruction of Cherry Trees

Authors: Lukas Meyer, Andreas Gilson, Oliver Scholz, Marc Stamminger

Abstract: In plant phenotyping, accurate trait extraction from 3D point clouds of trees is still an open problem. For automatic modeling and trait extraction of tree organs such as blossoms and fruits, the semantically segmented point cloud of a tree and the tree skeleton are necessary. Therefore, we present CherryPicker, an automatic pipeline that reconstructs photo-metric point clouds of trees, performs semantic segmentation and extracts their topological structure in the form of a skeleton. Our system combines several state-of-the-art algorithms to enable automatic processing for further usage in 3D-plant phenotyping applications. Within this pipeline, we present a method to automatically estimate the scale factor of a monocular reconstruction to overcome scale ambiguity and obtain metrically correct point clouds. Furthermore, we propose a semantic skeletonization algorithm built upon Laplacian-based contraction. We also show that, by weighting different tree organs semantically, our approach can effectively remove artifacts induced by occlusion and structural size variations. CherryPicker obtains high-quality topology reconstructions of cherry trees with precise details.

Submitted 17 August, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

Comments: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

arXiv:2303.10042 [pdf, other]

ShaRPy: Shape Reconstruction and Hand Pose Estimation from RGB-D with Uncertainty

Authors: Vanessa Wirth, Anna-Maria Liphardt, Birte Coppers, Johanna Bräunig, Simon Heinrich, Sigrid Leyendecker, Arnd Kleyer, Georg Schett, Martin Vossiek, Bernhard Egger, Marc Stamminger

Abstract: Despite their potential, markerless hand tracking technologies are not yet applied in practice to the diagnosis or monitoring of the activity in inflammatory musculoskeletal diseases. One reason is that the focus of most methods lies in the reconstruction of coarse, plausible poses, whereas in the clinical context, accurate, interpretable, and reliable results are required. Therefore, we propose ShaRPy, the first RGB-D Shape Reconstruction and hand Pose tracking system, which provides uncertainty estimates of the computed pose, e.g., when a finger is hidden or its estimate is inconsistent with the observations in the input, to guide clinical decision-making. Besides pose, ShaRPy approximates a personalized hand shape, promoting a more realistic and intuitive understanding of its digital twin. Our method requires only a light-weight setup with a single consumer-level RGB-D camera, yet it is able to distinguish similar poses with only small joint angle deviations in a metrically accurate space. This is achieved by combining a data-driven dense correspondence predictor with traditional energy minimization. To bridge the gap between interactive visualization and biomedical simulation, we leverage a parametric hand model in which we incorporate biomedical constraints and optimize for both its pose and hand shape. We evaluate ShaRPy on a keypoint detection benchmark and show qualitative results of hand function assessments for activity monitoring of musculoskeletal diseases.

Submitted 12 September, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

Comments: Accepted at ICCVW (CVAMD) 2023

arXiv:2208.03130 [pdf, other]

doi 10.5220/0011309100003277

A Lightweight Machine Learning Pipeline for LiDAR-simulation

Authors: Richard Marcus, Niklas Knoop, Bernhard Egger, Marc Stamminger

Abstract: Virtual testing is a crucial task to ensure safety in autonomous driving, and sensor simulation is an important task in this domain. Most current LiDAR simulations are very simplistic and are mainly used to perform initial tests, while the majority of insights are gathered on the road. In this paper, we propose a lightweight approach for more realistic LiDAR simulation that learns a real sensor's behavior from test drive data and transforms this to the virtual domain. The central idea is to cast the simulation into an image-to-image translation problem. We train our pix2pix based architecture on two real world data sets, namely the popular KITTI data set and the Audi Autonomous Driving Dataset, which provide both RGB and LiDAR images. We apply this network on synthetic renderings and show that it generalizes sufficiently from real images to simulated images. This strategy makes it possible to skip the sensor-specific, expensive and complex LiDAR physics simulation in our synthetic world and avoids oversimplification and a large domain-gap through the clean synthetic environment.

Submitted 5 August, 2022; originally announced August 2022.

Comments: Conference: DeLTA 22; ISBN 978-989-758-584-5; ISSN 2184-9277; publisher: SciTePress, organization: INSTICC

Journal ref: Proceedings of the 3rd International Conference on Deep Learning Theory and Applications - DeLTA, 2022, pages 176-183

arXiv:2110.06635 [pdf, other]

ADOP: Approximate Differentiable One-Pixel Point Rendering

Authors: Darius Rückert, Linus Franke, Marc Stamminger

Abstract: In this paper we present ADOP, a novel point-based, differentiable neural rendering pipeline. Like other neural renderers, our system takes as input calibrated camera images and a proxy geometry of the scene, in our case a point cloud. To generate a novel view, the point cloud is rasterized with learned feature vectors as colors and a deep neural network fills the remaining holes and shades each output pixel. The rasterizer renders points as one-pixel splats, which makes it very fast and allows us to compute gradients with respect to all relevant input parameters efficiently. Furthermore, our pipeline contains a fully differentiable physically-based photometric camera model, including exposure, white balance, and a camera response function. Following the idea of inverse rendering, we use our renderer to refine its input in order to reduce inconsistencies and optimize the quality of its output. In particular, we can optimize structural parameters like the camera pose, lens distortions, point positions and features, and a neural environment map, but also photometric parameters like camera response function, vignetting, and per-image exposure and white balance. Because our pipeline includes photometric parameters, e.g., exposure and camera response function, our system can smoothly handle input images with varying exposure and white balance, and generates high-dynamic range output. We show that due to the improved input, we can achieve high render quality, also for difficult input, e.g., with imperfect camera calibrations, inaccurate proxy geometry, or varying exposure. As a result, a simpler and thus faster deep neural network is sufficient for reconstruction. In combination with the fast point rasterization, ADOP achieves real-time rendering rates even for models with well over 100M points. https://github.com/darglein/ADOP

Submitted 3 May, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

arXiv:2007.14808 [pdf, other]

Face2Face: Real-time Face Capture and Reenactment of RGB Videos

Authors: Justus Thies, Michael Zollhöfer, Marc Stamminger, Christian Theobalt, Matthias Nießner

Abstract: We present Face2Face, a novel approach for real-time facial reenactment of a monocular target video sequence (e.g., YouTube video). The source sequence is also a monocular video stream, captured live with a commodity webcam. Our goal is to animate the facial expressions of the target video by a source actor and re-render the manipulated output video in a photo-realistic fashion. To this end, we first address the under-constrained problem of facial identity recovery from monocular video by non-rigid model-based bundling. At run time, we track facial expressions of both source and target video using a dense photometric consistency measure. Reenactment is then achieved by fast and efficient deformation transfer between source and target. The mouth interior that best matches the re-targeted expression is retrieved from the target sequence and warped to produce an accurate fit. Finally, we convincingly re-render the synthesized target face on top of the corresponding video stream such that it seamlessly blends with the real-world illumination. We demonstrate our method in a live setup, where YouTube videos are reenacted in real time.

Submitted 29 July, 2020; originally announced July 2020.

Comments: https://justusthies.github.io/posts/acm-research-highlight/

Journal ref: CVPR2016

arXiv:1903.03837 [pdf, other]

LumiPath -- Towards Real-time Physically-based Rendering on Embedded Devices

Authors: Laura Fink, Sing Chun Lee, Jie Ying Wu, Xingtong Liu, Tianyu Song, Yordanka Stoyanova, Marc Stamminger, Nassir Navab, Mathias Unberath

Abstract: With the increasing computational power of today's workstations, real-time physically-based rendering is within reach and is rapidly gaining attention across a variety of domains. It has quickly been applied to medicine, where it is a powerful tool for intuitive 3D data visualization. Embedded devices such as optical see-through head-mounted displays (OST HMDs) have become a trend in medical augmented reality. However, leveraging the obvious benefits of physically-based rendering remains challenging on these devices because of limited computational power, memory usage, and power consumption. We navigate the compromise between device limitations and image quality to achieve reasonable rendering results by introducing a novel light field that can be sampled in real-time on embedded devices. We demonstrate its applications in medicine and discuss limitations of the proposed method. An open-source version of this project is available at https://github.com/lorafib/LumiPath which provides full insight into the implementation along with example demonstration material.

Submitted 16 August, 2019; v1 submitted 9 March, 2019; originally announced March 2019.

Comments: To be presented at MICCAI 2019

arXiv:1901.03910 [pdf, other]

NRMVS: Non-Rigid Multi-View Stereo

Authors: Matthias Innmann, Kihwan Kim, Jinwei Gu, Matthias Niessner, Charles Loop, Marc Stamminger, Jan Kautz

Abstract: Scene reconstruction from unorganized RGB images is an important task in many computer vision applications. Multi-view Stereo (MVS) is a common solution in photogrammetry applications for the dense reconstruction of a static scene. The static scene assumption, however, limits the general applicability of MVS algorithms, as many day-to-day scenes undergo non-rigid motion, e.g., clothes, faces, or human bodies. In this paper, we open up a new challenging direction: dense 3D reconstruction of scenes with non-rigid changes observed from arbitrary, sparse, and wide-baseline views. We formulate the problem as a joint optimization of deformation and depth estimation, using deformation graphs as the underlying representation. We propose a new sparse 3D to 2D matching technique, together with a dense patch-match evaluation scheme to estimate deformation and depth with photometric consistency. We show that creating a dense 4D structure from a few RGB images with non-rigid changes is possible, and demonstrate that our method can be used to interpolate novel deformed scenes from various combinations of these deformation estimates derived from the sparse views.

Submitted 12 January, 2019; originally announced January 2019.

arXiv:1811.10720 [pdf, other]

IGNOR: Image-guided Neural Object Rendering

Authors: Justus Thies, Michael Zollhöfer, Christian Theobalt, Marc Stamminger, Matthias Nießner

Abstract: We propose a learned image-guided rendering technique that combines the benefits of image-based rendering and GAN-based image synthesis. The goal of our method is to generate photo-realistic re-renderings of reconstructed objects for virtual and augmented reality applications (e.g., virtual showrooms, virtual tours & sightseeing, the digital inspection of historical artifacts). A core component of our work is the handling of view-dependent effects. Specifically, we directly train an object-specific deep neural network to synthesize the view-dependent appearance of an object. As input data, we use an RGB video of the object. This video is used to reconstruct a proxy geometry of the object via multi-view stereo. Based on this 3D proxy, the appearance of a captured view can be warped into a new target view as in classical image-based rendering. This warping assumes diffuse surfaces; in the case of view-dependent effects, such as specular highlights, it leads to artifacts. To this end, we propose EffectsNet, a deep neural network that predicts view-dependent effects. Based on these estimations, we are able to convert observed images to diffuse images. These diffuse images can be projected into other views. In the target view, our pipeline reinserts the new view-dependent effects. To composite multiple reprojected images to a final output, we learn a composition network that outputs photo-realistic results. Using this image-guided approach, the network does not have to allocate capacity on "remembering" object appearance; instead, it learns how to combine the appearance of captured images. We demonstrate the effectiveness of our approach both qualitatively and quantitatively on synthetic as well as on real data.

Submitted 15 January, 2020; v1 submitted 26 November, 2018; originally announced November 2018.

Comments: Video: https://youtu.be/s79HG9yn7QM

arXiv:1805.11729 [pdf, other]

doi 10.1145/3197517.3201350

HeadOn: Real-time Reenactment of Human Portrait Videos

Authors: Justus Thies, Michael Zollhöfer, Christian Theobalt, Marc Stamminger, Matthias Nießner

Abstract: We propose HeadOn, the first real-time source-to-target reenactment approach for complete human portrait videos that enables transfer of torso and head motion, face expression, and eye gaze. Given a short RGB-D video of the target actor, we automatically construct a personalized geometry proxy that embeds a parametric head, eye, and kinematic torso model. A novel real-time reenactment algorithm employs this proxy to photo-realistically map the captured motion from the source actor to the target actor. On top of the coarse geometric proxy, we propose a video-based rendering technique that composites the modified target portrait video via view- and pose-dependent texturing, and creates photo-realistic imagery of the target actor under novel torso and head poses, facial expressions, and gaze directions. To this end, we propose a robust tracking of the face and torso of the source actor. We extensively evaluate our approach and show significant improvements in enabling much greater flexibility in creating realistic reenacted output videos.

Submitted 29 May, 2018; originally announced May 2018.

Comments: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at Siggraph'18

arXiv:1610.03151 [pdf, other]

FaceVR: Real-Time Facial Reenactment and Eye Gaze Control in Virtual Reality

Authors: Justus Thies, Michael Zollhöfer, Marc Stamminger, Christian Theobalt, Matthias Nießner

Abstract: We propose FaceVR, a novel image-based method that enables video teleconferencing in VR based on self-reenactment. State-of-the-art face tracking methods in the VR context are focused on the animation of rigged 3D avatars. While they achieve good tracking performance, the results look cartoonish and not real. In contrast to these model-based approaches, FaceVR enables VR teleconferencing using an image-based technique that results in nearly photo-realistic outputs. The key component of FaceVR is a robust algorithm to perform real-time facial motion capture of an actor who is wearing a head-mounted display (HMD), as well as a new data-driven approach for eye tracking from monocular videos. Based on reenactment of a prerecorded stereo video of the person without the HMD, FaceVR incorporates photo-realistic re-rendering in real time, thus allowing artificial modifications of face and eye appearances. For instance, we can alter facial expressions or change gaze directions in the prerecorded target video. In a live setup, we apply these newly-introduced algorithmic components.

Submitted 21 March, 2018; v1 submitted 10 October, 2016; originally announced October 2016.

Comments: Video: https://youtu.be/jIlujM5avU8 Presented at Siggraph'18

arXiv:1603.08161 [pdf, other]

VolumeDeform: Real-time Volumetric Non-rigid Reconstruction

Authors: Matthias Innmann, Michael Zollhöfer, Matthias Nießner, Christian Theobalt, Marc Stamminger

Abstract: We present a novel approach for the reconstruction of dynamic geometric shapes using a single hand-held consumer-grade RGB-D sensor at real-time rates. Our method does not require a pre-defined shape template to start with and builds up the scene model from scratch during the scanning process. Geometry and motion are parameterized in a unified manner by a volumetric representation that encodes a distance field of the surface geometry as well as the non-rigid space deformation. Motion tracking is based on a set of extracted sparse color features in combination with a dense depth-based constraint formulation. This enables accurate tracking and drastically reduces drift inherent to standard model-to-depth alignment. We cast finding the optimal deformation of space as a non-linear regularized variational optimization problem by enforcing local smoothness and proximity to the input constraints. The problem is tackled in real-time at the camera's capture rate using a data-parallel flip-flop optimization strategy. Our results demonstrate robust tracking even for fast motion and scenes that lack geometric features.

Submitted 30 July, 2016; v1 submitted 26 March, 2016; originally announced March 2016.

Showing 1–22 of 22 results for author: Stamminger, M