-
Reducing the Memory Footprint of 3D Gaussian Splatting
Authors:
Panagiotis Papantonakis,
Georgios Kopanas,
Bernhard Kerbl,
Alexandre Lanvin,
George Drettakis
Abstract:
3D Gaussian splatting provides excellent visual quality for novel view synthesis, with fast training and real-time rendering; unfortunately, the memory requirements of this method for storing and transmission are unreasonably high. We first analyze the reasons for this, identifying three main areas where storage can be reduced: the number of 3D Gaussian primitives used to represent a scene, the nu…
▽ More
3D Gaussian splatting provides excellent visual quality for novel view synthesis, with fast training and real-time rendering; unfortunately, the memory requirements of this method for storing and transmission are unreasonably high. We first analyze the reasons for this, identifying three main areas where storage can be reduced: the number of 3D Gaussian primitives used to represent a scene, the number of coefficients for the spherical harmonics used to represent directional radiance, and the precision required to store Gaussian primitive attributes. We present a solution to each of these issues. First, we propose an efficient, resolution-aware primitive pruning approach, reducing the primitive count by half. Second, we introduce an adaptive adjustment method to choose the number of coefficients used to represent directional radiance for each Gaussian primitive, and finally a codebook-based quantization method, together with a half-float representation for further memory reduction. Taken together, these three components result in a 27 reduction in overall size on disk on the standard datasets we tested, along with a 1.7 speedup in rendering speed. We demonstrate our method on standard datasets and show how our solution results in significantly reduced download times when using the method on a mobile device.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
-
Taming 3DGS: High-Quality Radiance Fields with Limited Resources
Authors:
Saswat Subhajyoti Mallick,
Rahul Goel,
Bernhard Kerbl,
Francisco Vicente Carrasco,
Markus Steinberger,
Fernando De La Torre
Abstract:
3D Gaussian Splatting (3DGS) has transformed novel-view synthesis with its fast, interpretable, and high-fidelity rendering. However, its resource requirements limit its usability. Especially on constrained devices, training performance degrades quickly and often cannot complete due to excessive memory consumption of the model. The method converges with an indefinite number of Gaussians -- many of…
▽ More
3D Gaussian Splatting (3DGS) has transformed novel-view synthesis with its fast, interpretable, and high-fidelity rendering. However, its resource requirements limit its usability. Especially on constrained devices, training performance degrades quickly and often cannot complete due to excessive memory consumption of the model. The method converges with an indefinite number of Gaussians -- many of them redundant -- making rendering unnecessarily slow and preventing its usage in downstream tasks that expect fixed-size inputs. To address these issues, we tackle the challenges of training and rendering 3DGS models on a budget. We use a guided, purely constructive densification process that steers densification toward Gaussians that raise the reconstruction quality. Model size continuously increases in a controlled manner towards an exact budget, using score-based densification of Gaussians with training-time priors that measure their contribution. We further address training speed obstacles: following a careful analysis of 3DGS' original pipeline, we derive faster, numerically equivalent solutions for gradient computation and attribute updates, including an alternative parallelization for efficient backpropagation. We also propose quality-preserving approximations where suitable to reduce training time even further. Taken together, these enhancements yield a robust, scalable solution with reduced training times, lower compute and memory requirements, and high quality. Our evaluation shows that in a budgeted setting, we obtain competitive quality metrics with 3DGS while achieving a 4--5x reduction in both model size and training time. With more generous budgets, our measured quality surpasses theirs. These advances open the door for novel-view synthesis in constrained environments, e.g., mobile devices.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets
Authors:
Bernhard Kerbl,
Andréas Meuleman,
Georgios Kopanas,
Michael Wimmer,
Alexandre Lanvin,
George Drettakis
Abstract:
Novel view synthesis has seen major advances in recent years, with 3D Gaussian splatting offering an excellent level of visual quality, fast training and real-time rendering. However, the resources needed for training and rendering inevitably limit the size of the captured scenes that can be represented with good visual quality. We introduce a hierarchy of 3D Gaussians that preserves visual qualit…
▽ More
Novel view synthesis has seen major advances in recent years, with 3D Gaussian splatting offering an excellent level of visual quality, fast training and real-time rendering. However, the resources needed for training and rendering inevitably limit the size of the captured scenes that can be represented with good visual quality. We introduce a hierarchy of 3D Gaussians that preserves visual quality for very large scenes, while offering an efficient Level-of-Detail (LOD) solution for efficient rendering of distant content with effective level selection and smooth transitions between levels.We introduce a divide-and-conquer approach that allows us to train very large scenes in independent chunks. We consolidate the chunks into a hierarchy that can be optimized to further improve visual quality of Gaussians merged into intermediate nodes. Very large captures typically have sparse coverage of the scene, presenting many challenges to the original 3D Gaussian splatting training method; we adapt and regularize training to account for these issues. We present a complete solution, that enables real-time rendering of very large scenes and can adapt to available resources thanks to our LOD method. We show results for captured scenes with up to tens of thousands of images with a simple and affordable rig, covering trajectories of up to several kilometers and lasting up to one hour. Project Page: https://repo-sam.inria.fr/fungraph/hierarchical-3d-gaussians/
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Cinematic Gaussians: Real-Time HDR Radiance Fields with Depth of Field
Authors:
Chao Wang,
Krzysztof Wolski,
Bernhard Kerbl,
Ana Serrano,
Mojtaba Bemana,
Hans-Peter Seidel,
Karol Myszkowski,
Thomas Leimkühler
Abstract:
Radiance field methods represent the state of the art in reconstructing complex scenes from multi-view photos. However, these reconstructions often suffer from one or both of the following limitations: First, they typically represent scenes in low dynamic range (LDR), which restricts their use to evenly lit environments and hinders immersive viewing experiences. Secondly, their reliance on a pinho…
▽ More
Radiance field methods represent the state of the art in reconstructing complex scenes from multi-view photos. However, these reconstructions often suffer from one or both of the following limitations: First, they typically represent scenes in low dynamic range (LDR), which restricts their use to evenly lit environments and hinders immersive viewing experiences. Secondly, their reliance on a pinhole camera model, assuming all scene elements are in focus in the input images, presents practical challenges and complicates refocusing during novel-view synthesis. Addressing these limitations, we present a lightweight method based on 3D Gaussian Splatting that utilizes multi-view LDR images of a scene with varying exposure times, apertures, and focus distances as input to reconstruct a high-dynamic-range (HDR) radiance field. By incorporating analytical convolutions of Gaussians based on a thin-lens camera model as well as a tonemap** module, our reconstructions enable the rendering of HDR content with flexible refocusing capabilities. We demonstrate that our combined treatment of HDR and depth of field facilitates real-time cinematic rendering, outperforming the state of the art.
△ Less
Submitted 21 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
StopThePop: Sorted Gaussian Splatting for View-Consistent Real-time Rendering
Authors:
Lukas Radl,
Michael Steiner,
Mathias Parger,
Alexander Weinrauch,
Bernhard Kerbl,
Markus Steinberger
Abstract:
Gaussian Splatting has emerged as a prominent model for constructing 3D representations from images across diverse domains. However, the efficiency of the 3D Gaussian Splatting rendering pipeline relies on several simplifications. Notably, reducing Gaussian to 2D splats with a single view-space depth introduces pop** and blending artifacts during view rotation. Addressing this issue requires acc…
▽ More
Gaussian Splatting has emerged as a prominent model for constructing 3D representations from images across diverse domains. However, the efficiency of the 3D Gaussian Splatting rendering pipeline relies on several simplifications. Notably, reducing Gaussian to 2D splats with a single view-space depth introduces pop** and blending artifacts during view rotation. Addressing this issue requires accurate per-pixel depth computation, yet a full per-pixel sort proves excessively costly compared to a global sort operation. In this paper, we present a novel hierarchical rasterization approach that systematically resorts and culls splats with minimal processing overhead. Our software rasterizer effectively eliminates pop** artifacts and view inconsistencies, as demonstrated through both quantitative and qualitative measurements. Simultaneously, our method mitigates the potential for cheating view-dependent effects with pop**, ensuring a more authentic representation. Despite the elimination of cheating, our approach achieves comparable quantitative results for test images, while increasing the consistency for novel view synthesis in motion. Due to its design, our hierarchical approach is only 4% slower on average than the original Gaussian Splatting. Notably, enforcing consistency enables a reduction in the number of Gaussians by approximately half with nearly identical quality and view-consistency. Consequently, rendering performance is nearly doubled, making our approach 1.6x faster than the original Gaussian Splatting, with a 50% reduction in memory requirements.
△ Less
Submitted 24 May, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Training and Predicting Visual Error for Real-Time Applications
Authors:
João Libório Cardoso,
Bernhard Kerbl,
Lei Yang,
Yury Uralsky,
Michael Wimmer
Abstract:
Visual error metrics play a fundamental role in the quantification of perceived image similarity. Most recently, use cases for them in real-time applications have emerged, such as content-adaptive shading and shading reuse to increase performance and improve efficiency. A wide range of different metrics has been established, with the most sophisticated being capable of capturing the perceptual cha…
▽ More
Visual error metrics play a fundamental role in the quantification of perceived image similarity. Most recently, use cases for them in real-time applications have emerged, such as content-adaptive shading and shading reuse to increase performance and improve efficiency. A wide range of different metrics has been established, with the most sophisticated being capable of capturing the perceptual characteristics of the human visual system. However, their complexity, computational expense, and reliance on reference images to compare against prevent their generalized use in real-time, restricting such applications to using only the simplest available metrics. In this work, we explore the abilities of convolutional neural networks to predict a variety of visual metrics without requiring either reference or rendered images. Specifically, we train and deploy a neural network to estimate the visual error resulting from reusing shading or using reduced shading rates. The resulting models account for 70%-90% of the variance while achieving up to an order of magnitude faster computation times. Our solution combines image-space information that is readily available in most state-of-the-art deferred shading pipelines with reprojection from previous frames to enable an adequate estimate of visual errors, even in previously unseen regions. We describe a suitable convolutional network architecture and considerations for data preparation for training. We demonstrate the capability of our network to predict complex error metrics at interactive rates in a real-time application that implements content-adaptive shading in a deferred pipeline. Depending on the portion of unseen image regions, our approach can achieve up to $2\times$ performance compared to state-of-the-art methods.
△ Less
Submitted 13 October, 2023;
originally announced October 2023.
-
3D Gaussian Splatting for Real-Time Radiance Field Rendering
Authors:
Bernhard Kerbl,
Georgios Kopanas,
Thomas Leimkühler,
George Drettakis
Abstract:
Radiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos. However, achieving high visual quality still requires neural networks that are costly to train and render, while recent faster methods inevitably trade off speed for quality. For unbounded and complete scenes (rather than isolated objects) and 1080p resolution rendering, no c…
▽ More
Radiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos. However, achieving high visual quality still requires neural networks that are costly to train and render, while recent faster methods inevitably trade off speed for quality. For unbounded and complete scenes (rather than isolated objects) and 1080p resolution rendering, no current method can achieve real-time display rates. We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and importantly allow high-quality real-time (>= 30 fps) novel-view synthesis at 1080p resolution. First, starting from sparse points produced during camera calibration, we represent the scene with 3D Gaussians that preserve desirable properties of continuous volumetric radiance fields for scene optimization while avoiding unnecessary computation in empty space; Second, we perform interleaved optimization/density control of the 3D Gaussians, notably optimizing anisotropic covariance to achieve an accurate representation of the scene; Third, we develop a fast visibility-aware rendering algorithm that supports anisotropic splatting and both accelerates training and allows realtime rendering. We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets.
△ Less
Submitted 8 August, 2023;
originally announced August 2023.
-
GPU-Accelerated LOD Generation for Point Clouds
Authors:
Markus Schütz,
Bernhard Kerbl,
Philip Klaus,
Michael Wimmer
Abstract:
About: We introduce a GPU-accelerated LOD construction process that creates a hybrid voxel-point-based variation of the widely used layered point cloud (LPC) structure for LOD rendering and streaming. The massive performance improvements provided by the GPU allow us to improve the quality of lower LODs via color filtering while still increasing construction speed compared to the non-filtered, CPU-…
▽ More
About: We introduce a GPU-accelerated LOD construction process that creates a hybrid voxel-point-based variation of the widely used layered point cloud (LPC) structure for LOD rendering and streaming. The massive performance improvements provided by the GPU allow us to improve the quality of lower LODs via color filtering while still increasing construction speed compared to the non-filtered, CPU-based state of the art.
Background: LOD structures are required to render hundreds of millions to trillions of points, but constructing them takes time.
Results: LOD structures suitable for rendering and streaming are constructed at rates of about 1 billion points per second (with color filtering) to 4 billion points per second (sample-picking/random sampling, state of the art) on an RTX 3090 -- an improvement of a factor of 80 to 400 times over the CPU-based state of the art (12 million points per second). Due to being in-core, model sizes are limited to about 500 million points per 24GB memory.
Discussion: Our method currently focuses on maximizing in-core construction speed on the GPU. Issues such as out-of-core construction of arbitrarily large data sets are not addressed, but we expect it to be suitable as a component of bottom-up out-of-core LOD construction schemes.
△ Less
Submitted 28 February, 2023;
originally announced February 2023.
-
Software Rasterization of 2 Billion Points in Real Time
Authors:
Markus Schütz,
Bernhard Kerbl,
Michael Wimmer
Abstract:
We propose a software rasterization pipeline for point clouds that is capable of brute-force rendering up to two billion points in real time (60fps). Improvements over the state of the art are achieved by batching points in a way that a number of batch-level optimizations can be computed before rasterizing the points within the same rendering pass. These optimizations include frustum culling, leve…
▽ More
We propose a software rasterization pipeline for point clouds that is capable of brute-force rendering up to two billion points in real time (60fps). Improvements over the state of the art are achieved by batching points in a way that a number of batch-level optimizations can be computed before rasterizing the points within the same rendering pass. These optimizations include frustum culling, level-of-detail rendering, and choosing the appropriate coordinate precision for a given batch of points directly within a compute workgroup. Adaptive coordinate precision, in conjunction with visibility buffers, reduces the number of loaded bytes for the majority of points down to 4, thus making our approach several times faster than the bandwidth-limited state of the art. Furthermore, support for LOD rendering makes our software-rasterization approach suitable for rendering arbitrarily large point clouds, and to meet the increased performance demands of virtual reality rendering.
△ Less
Submitted 4 April, 2022;
originally announced April 2022.
-
Gaussian Mixture Convolution Networks
Authors:
Adam Celarek,
Pedro Hermosilla,
Bernhard Kerbl,
Timo Ropinski,
Michael Wimmer
Abstract:
This paper proposes a novel method for deep learning based on the analytical convolution of multidimensional Gaussian mixtures. In contrast to tensors, these do not suffer from the curse of dimensionality and allow for a compact representation, as data is only stored where details exist. Convolution kernels and data are Gaussian mixtures with unconstrained weights, positions, and covariance matric…
▽ More
This paper proposes a novel method for deep learning based on the analytical convolution of multidimensional Gaussian mixtures. In contrast to tensors, these do not suffer from the curse of dimensionality and allow for a compact representation, as data is only stored where details exist. Convolution kernels and data are Gaussian mixtures with unconstrained weights, positions, and covariance matrices. Similar to discrete convolutional networks, each convolution step produces several feature channels, represented by independent Gaussian mixtures. Since traditional transfer functions like ReLUs do not produce Gaussian mixtures, we propose using a fitting of these functions instead. This fitting step also acts as a pooling layer if the number of Gaussian components is reduced appropriately. We demonstrate that networks based on this architecture reach competitive accuracy on Gaussian mixtures fitted to the MNIST and ModelNet data sets.
△ Less
Submitted 18 February, 2022;
originally announced February 2022.
-
Rendering Point Clouds with Compute Shaders and Vertex Order Optimization
Authors:
Markus Schütz,
Bernhard Kerbl,
Michael Wimmer
Abstract:
While commodity GPUs provide a continuously growing range of features and sophisticated methods for accelerating compute jobs, many state-of-the-art solutions for point cloud rendering still rely on the provided point primitives (GL_POINTS, POINTLIST, ...) of graphics APIs for image synthesis. In this paper, we present several compute-based point cloud rendering approaches that outperform the hard…
▽ More
While commodity GPUs provide a continuously growing range of features and sophisticated methods for accelerating compute jobs, many state-of-the-art solutions for point cloud rendering still rely on the provided point primitives (GL_POINTS, POINTLIST, ...) of graphics APIs for image synthesis. In this paper, we present several compute-based point cloud rendering approaches that outperform the hardware pipeline by up to an order of magnitude and achieve significantly better frame times than previous compute-based methods. Beyond basic closest-point rendering, we also introduce a fast, high-quality variant to reduce aliasing. We present and evaluate several variants of our proposed methods with different flavors of optimization, in order to ensure their applicability and achieve optimal performance on a range of platforms and architectures with varying support for novel GPU hardware features. During our experiments, the observed peak performance was reached rendering 796 million points (12.7GB) at rates of 62 to 64 frames per second (50 billion points per second, 802GB/s) on an RTX 3090 without the use of level-of-detail structures.
We further introduce an optimized vertex order for point clouds to boost the efficiency of GL_POINTS by a factor of 5x in cases where hardware rendering is compulsory. We compare different orderings and show that Morton sorted buffers are faster for some viewpoints, while shuffled vertex buffers are faster in others. In contrast, combining both approaches by first sorting according to Morton-code and shuffling the resulting sequence in batches of 128 points leads to a vertex buffer layout with high rendering performance and low sensitivity to viewpoint changes.
△ Less
Submitted 15 April, 2021;
originally announced April 2021.
-
On-the-fly Vertex Reuse for Massively-Parallel Software Geometry Processing
Authors:
Michael Kenzel,
Bernhard Kerbl,
Wolfgang Tatzgern,
Elena Ivanchenko,
Dieter Schmalstieg,
Markus Steinberger
Abstract:
Compute-mode rendering is becoming more and more attractive for non-standard rendering applications, due to the high flexibility of compute-mode execution. These newly designed pipelines often include streaming vertex and geometry processing stages. In typical triangle meshes, the same transformed vertex is on average required six times during rendering. To avoid redundant computation, a post-tran…
▽ More
Compute-mode rendering is becoming more and more attractive for non-standard rendering applications, due to the high flexibility of compute-mode execution. These newly designed pipelines often include streaming vertex and geometry processing stages. In typical triangle meshes, the same transformed vertex is on average required six times during rendering. To avoid redundant computation, a post-transform cache is traditionally suggested to enable reuse of vertex processing results. However, traditional caching neither scales well as the hardware becomes more parallel, nor can be efficiently implemented in a software design. We investigate alternative strategies to reusing vertex shading results on-the-fly for massively parallel software geometry processing. Forming static and dynamic batching on the data input stream, we analyze the effectiveness of identifying potential local reuse based on sorting, hashing, and efficient intra-thread-group communication. Altogether, we present four vertex reuse strategies, tailored to modern parallel architectures. Our simulations showcase that our batch-based strategies significantly outperform parallel caches in terms of reuse. On actual GPU hardware, our evaluation shows that our strategies not only lead to good reuse of processing results, but also boost performance by $2-3\times$ compared to naïvely ignoring reuse in a variety of practical applications.
△ Less
Submitted 22 May, 2018;
originally announced May 2018.