Rong, Rui, Yue, Meida, Andrew
University of Southern California
Institute for Creative Technologies
Los Angeles, CA, USA
AtomGS: Atomizing Gaussian Splatting for High-Fidelity Radiance Field
Abstract
3D Gaussian Splatting (3DGS) has recently advanced radiance field reconstruction by offering superior capabilities for novel view synthesis and real-time rendering speed. However, its strategy of blending optimization and adaptive density control might lead to sub-optimal results; it can sometimes yield noisy geometry and blurry artifacts because it prioritizes optimizing large Gaussians over adequately densifying smaller ones. To address this, we introduce AtomGS, consisting of Atomized Proliferation and Geometry-Guided Optimization. Atomized Proliferation constrains ellipsoid Gaussians of various sizes into more uniform-sized Atom Gaussians. This strategy enhances the representation of areas with fine features by placing greater emphasis on densification in accordance with scene details. In addition, we propose a Geometry-Guided Optimization approach that incorporates an Edge-Aware Normal Loss. This optimization method effectively smooths flat surfaces while preserving intricate details. Our evaluation shows that AtomGS outperforms existing state-of-the-art methods in rendering quality. Additionally, it achieves competitive accuracy in geometry reconstruction and offers a significant improvement in training speed over other SDF-based methods. More interactive demos can be found on our website (https://rongliu-leo.github.io/AtomGS/).
1 Introduction
Multi-view 3D reconstruction and novel view synthesis remain significant challenges in computer vision and graphics. A successful reconstruction requires both high-quality visual renderings from new viewpoints and precise capture of 3D geometry. These attributes ensure that models are visually appealing and accurate, making them valuable across various applications such as video games, VR/AR, digital twins, 3D mapping, simulation, scan-to-BIM, and more. Neural Radiance Fields (NeRF) [Mildenhall et al.(2021)Mildenhall, Srinivasan, Tancik, Barron, Ramamoorthi, and Ng] has achieved significant progress in producing photorealistic renderings through an implicit 3D representation; however, its limited rendering speed prevents its use in real-world applications. In response, 3D Gaussian Splatting (3DGS) [Kerbl et al.(2023)Kerbl, Kopanas, Leimkuehler, and Drettakis] has emerged as a promising alternative, offering an explicit method that achieves fast rendering speeds while maintaining high rendering quality.
Existing work on 3DGS has primarily emphasized either improving rendering quality or enhancing 3D geometry accuracy [Lu et al.(2023)Lu, Yu, Xu, Xiangli, Wang, Lin, and Dai, Huang et al.(2024b)Huang, Bai, Guo, Li, and Guo, Yan et al.(2023)Yan, Low, Chen, and Lee, Guédon and Lepetit(2023), Chen et al.(2023)Chen, Li, and Lee]. Efforts that enhance rendering quality focus less on geometric precision, while those that aim to refine geometry often lead to reduced rendering quality. To address these challenges, we introduce AtomGS, an approach that enhances geometric precision in areas with fine details through our proposed Atomized Proliferation process and Edge-Aware Normal Loss. This enhancement in geometric detail consequently leads to improved rendering quality, as shown in our experiments. Figure 1 compares the results of AtomGS with those of existing methods, showcasing improvements in rendering quality and surface normals.
Figure 1: (a) Ours; (b) 3DGS.
Our proposed AtomGS refines 3DGS by strategically deploying Atom Gaussians to ensure detailed coverage of complex scenes through the Atomized Proliferation process. In contrast to 3DGS, AtomGS provides precise guidance on where to focus Gaussians for better 3D geometry optimization. Intuitively, smaller anisotropic Gaussians are constrained into uniformly-sized Atom Gaussians, followed by a progressive split schedule to provide better coverage of scene details. Meanwhile, larger anisotropic Gaussians are retained to represent the background or geometric areas with few features. To facilitate this process, we introduce an Edge-Aware Normal Loss that imposes stricter constraints on the positioning of Gaussians aligned with flat surfaces while allowing more flexibility for those on irregularly shaped areas. Specifically, we integrate weights derived from a 2D edge detector into the curvature map to compute this loss. In addition, during the subsequent pruning phase, Gaussians covering large, less detailed surfaces are merged, resulting in a similar or even reduced number of Gaussians compared to the original 3DGS while retaining competitive rendering quality. To summarize, the main contributions of our paper are as follows:
1. We introduced an Atomized Proliferation strategy aimed at enhancing rendering quality by refining 3D geometric precision in areas with fine details.
2. We designed an Edge-Aware Normal Loss to enhance the reconstruction accuracy by preserving details in areas with irregular shapes while reducing noise on flat surfaces.
3. Our proposed AtomGS has achieved state-of-the-art performance on several benchmark datasets, excelling in both rendering quality and geometric precision.
2 Related work
2.1 Novel View Synthesis
Volumetric methods, traditionally utilized for 3D scene representation [Mildenhall et al.(2021)Mildenhall, Srinivasan, Tancik, Barron, Ramamoorthi, and Ng, Wang et al.(2022)Wang, Chai, He, Chen, and Liao, Wang et al.(2021b)Wang, Wu, Xie, Chen, and Prisacariu, Yen-Chen et al.(2021)Yen-Chen, Florence, Barron, Rodriguez, Isola, and Lin, Srinivasan et al.(2021)Srinivasan, Deng, Zhang, Tancik, Mildenhall, and Barron, Deng et al.(2020)Deng, Lewis, Jeruzalski, Pons-Moll, Hinton, Norouzi, and Tagliasacchi, Jang and Agapito(2021), Liu et al.(2021)Liu, Zhang, Zhang, Zhang, Zhu, and Russell, Noguchi et al.(2021)Noguchi, Sun, Lin, and Harada], involve subdividing space into discrete units known as voxels, allowing for detailed modeling of internal structures but often suffering from resolution limitations and high computational costs. In contrast, implicit neural representations [Jiang et al.(2020)Jiang, Sud, Makadia, Huang, Nießner, Funkhouser, et al., Wu et al.(2022)Wu, Liu, Chen, Li, Zheng, Cai, and Zheng, Ran et al.(2023)Ran, Zeng, He, Chen, Li, Chen, Lee, and Ye] have revolutionized 3D modeling by using continuous mathematical functions, learned by neural networks, to represent complex geometries and appearances without the need for spatial discretization.
Building on these advancements, NeRF [Mildenhall et al.(2021)Mildenhall, Srinivasan, Tancik, Barron, Ramamoorthi, and Ng] employs a coordinate-based neural network to encode volumetric scenes, providing unprecedented detail and realism in novel view synthesis, and is particularly effective in capturing complex light interactions and intricate details. Building upon the original NeRF framework, InstantNGP [Müller et al.(2022)Müller, Evans, Schied, and Keller] introduces a multiresolution hash encoding that efficiently stores and retrieves neural network data, significantly boosting both training and inference speeds while striking a balance between performance and accuracy. On the other hand, Mip-NeRF 360 [Barron et al.(2022)Barron, Mildenhall, Verbin, Srinivasan, and Hedman] extends NeRF’s capabilities to render large-scale, unbounded 360-degree scenes with consistent quality, effectively managing varied lighting conditions in expansive environments.
Turning to alternative rendering solutions, 3DGS [Kerbl et al.(2023)Kerbl, Kopanas, Leimkuehler, and Drettakis] has gained popularity for its ability to enhance visual rendering effects and speed through the optimized use of anisotropic 3D Gaussian ellipsoids and rasterized splatting techniques. Diverging from traditional 3DGS methods, in which Gaussians freely drift and split, Lu et al.’s Scaffold-GS [Lu et al.(2023)Lu, Yu, Xu, Xiangli, Wang, Lin, and Dai] leverages scene structure to guide the distribution of 3D Gaussians, allowing for adaptive modifications based on varying viewing angles and distances. Additionally, Huang et al. [Huang et al.(2024b)Huang, Bai, Guo, Li, and Guo] focus on analyzing and reducing the artifacts caused by 3DGS errors, aiming to optimize rendering quality. For different levels of detail (LOD) in 3DGS scenes, Yan et al. introduced a multi-scale approach [Yan et al.(2023)Yan, Low, Chen, and Lee] to enable selective rendering that yields faster and more precise outcomes.
While the methods above focus on enhancing the visual accuracy of rendered images, they often lack the geometric constraints necessary for high-quality surface reconstruction. In contrast, our method balances visual and geometric accuracy by refining the underlying geometry alignment of Gaussians.
2.2 Multi-View 3D Mesh Reconstruction
Inspired by NeRF [Yen-Chen et al.(2021)Yen-Chen, Florence, Barron, Rodriguez, Isola, and Lin], NeuS [Wang et al.(2021a)Wang, Liu, Liu, Theobalt, Komura, and Wang] integrates a Signed Distance Function (SDF) into the radiance field to learn a neural SDF representation from multi-view images, thereby accurately representing object surfaces through volumetric rendering. Other concurrent works such as VolSDF [Yariv et al.(2021)Yariv, Gu, Kasten, and Lipman] and UNISURF [Oechsle et al.(2021)Oechsle, Peng, and Geiger] enhance surface reconstruction by improving ray sampling accuracy and simplifying the reconstruction process, respectively. Based on NeuS, Neuralangelo [Li et al.(2023)Li, Müller, Evans, Taylor, Unberath, Liu, and Lin] proposes coarse-to-fine optimization on the hash grids and examines higher-order derivatives to reconstruct surfaces, leading to improved geometry accuracy and fine detail.
While SDF-based methods have greatly enhanced geometric surface reconstruction, they often result in poorer visual rendering performance and reduced reconstruction speeds due to the integration of SDF. Moreover, sphere initialization [Atzmon and Lipman(2020)] is crucial for model convergence, limiting the application to datasets where the object is not centrally located.
Inspired by 3DGS, newer methods tackle precise geometric reconstruction with explicit representation. SuGaR [Guédon and Lepetit(2023)] proposes geometry constraints to regularize 3DGS, achieving geometry improvement, while NeuSG [Chen et al.(2023)Chen, Li, and Lee] introduces a scale regularizer to ensure the accuracy of the reconstructed surfaces by enforcing the 3D Gaussians to be extremely thin.
Despite significant progress in methods for accurate geometric reconstruction, they often lead to a reduction in rendering quality. Our method builds on better alignment between Gaussians and the inherent geometry and designs subsequent optimization processes for improved visual quality.
3 Preliminary for 3D Gaussian Splatting
Proposed as an alternative to NeRF-based methods, 3DGS combines differentiable optimization and non-differentiable adaptive density control for modeling the radiance field.
Ellipsoidal Gaussian Primitive: A 3D Gaussian Primitive is defined as $\mathcal{G} = \{\mu, \alpha, q, s, f\}$, where $\mu$ represents the center of the Gaussian, also referred to as its position; $\alpha$ signifies its opacity; $q$ denotes the quaternion; $s$ stands for the scale; and $f$ represents its learnable features. The color is determined using spherical harmonics based on $f$. The 3D covariance matrix is computed as $\Sigma = R S S^{\top} R^{\top}$, where $R$ and $S$ are the rotation and scale matrices derived from $q$ and $s$, respectively.
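As an illustration of the covariance construction, the following NumPy sketch builds the covariance in the standard 3DGS form $\Sigma = R S S^{\top} R^{\top}$ from a quaternion and a per-axis scale. The function names and the (w, x, y, z) quaternion ordering are assumptions made for this example and are not taken from the AtomGS implementation.

```python
import numpy as np

def quaternion_to_rotation(q):
    """Convert a quaternion given as (w, x, y, z) into a 3x3 rotation matrix."""
    q = np.asarray(q, dtype=np.float64)
    w, x, y, z = q / np.linalg.norm(q)  # normalize so the result is a valid rotation
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def covariance_from_quat_scale(q, s):
    """Return Sigma = R S S^T R^T for quaternion q and per-axis scale s."""
    R = quaternion_to_rotation(q)
    S = np.diag(np.asarray(s, dtype=np.float64))
    M = R @ S
    return M @ M.T  # symmetric positive semi-definite by construction

# Identity rotation: the covariance is diagonal with the squared scales.
sigma = covariance_from_quat_scale([1.0, 0.0, 0.0, 0.0], [0.1, 0.2, 0.3])
```

Factoring the covariance this way keeps it positive semi-definite for any values of the quaternion and scale, which is why these two quantities, rather than the matrix entries, are the optimized parameters.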
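During training, all Gaussian properties are optimized with the loss function $\mathcal{L} = (1 - \lambda)\,\mathcal{L}_1 + \lambda\,\mathcal{L}_{\text{D-SSIM}}$, where $\mathcal{L}_1$ prioritizes pixel-wise accuracy, $\mathcal{L}_{\text{D-SSIM}}$ emphasizes structural similarity and perceptual quality, and $\lambda$ serves as the weighting factor.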
Adaptive Density Control: This stage inserts, splits, or prunes existing Gaussian primitives to better represent the 3D scene.
1. SfM Initialization: Given the SfM points, 3DGS calculates the mean distance to the closest three points, denoted as $\{d_i\}_{i=1}^{N}$, where $N$ is the number of initialized SfM points. It then employs this distance to initialize isotropic Gaussians. Additionally, considering the camera centers $\{c_j\}_{j=1}^{M}$, where $M$ is the number of training camera poses, 3DGS determines the scene radius using $r = 1.1 \cdot \max_j \lVert c_j - \bar{c} \rVert_2$, where $\bar{c}$ represents the mean of all camera centers. Subsequently, it sets the scale threshold to $\tau_s = 0.01\,r$, which decides whether to clone or split a Gaussian once the gradient condition is satisfied.
2. Densification: 3DGS adaptively densifies Gaussians to enhance scene detail capture. This densification process occurs regularly, targeting Gaussians with view-space positional gradients equal to or greater than the gradient threshold $\tau_g$. Following the gradient condition $\lVert \nabla_{\mu} \mathcal{L} \rVert \geq \tau_g$, it then checks if $\max(s) > \tau_s$. If the scale condition is met, the Gaussian is identified as over-reconstructed and split (creating two Gaussians with positions sampled from a normal distribution based on the original). If not, it is classified as under-reconstructed and an identical Gaussian is cloned.
3. Prune: The point pruning phase removes redundant or less important Gaussians by deleting those whose opacity falls below a specified threshold. Moreover, to prevent producing noisy Gaussians near input cameras, the alpha values are gradually set closer to zero after a certain number of iterations. This adjustment facilitates the densification of necessary Gaussians while eliminating redundant ones.
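The density-control steps above can be sketched as follows. This is a minimal NumPy illustration and not the reference implementation: the helper names are hypothetical, and the `margin=1.1` and `percent_dense=0.01` defaults are assumed here to mirror the values commonly used by the public 3DGS code.

```python
import numpy as np

def scene_radius(camera_centers, margin=1.1):
    """Radius of the camera rig: largest distance to the mean camera center, times a margin."""
    centers = np.asarray(camera_centers, dtype=np.float64)
    mean_center = centers.mean(axis=0)
    return margin * np.linalg.norm(centers - mean_center, axis=1).max()

def classify_for_densification(grad_norm, scales, grad_threshold, scale_threshold):
    """Return boolean masks (clone, split) over all Gaussians.

    A Gaussian is selected when its view-space gradient norm reaches the
    gradient threshold; selected Gaussians whose largest scale exceeds the
    scale threshold are split, the remaining selected ones are cloned.
    """
    grad_norm = np.asarray(grad_norm, dtype=np.float64)
    max_scale = np.asarray(scales, dtype=np.float64).max(axis=1)
    selected = grad_norm >= grad_threshold
    split = selected & (max_scale > scale_threshold)
    clone = selected & ~split
    return clone, split

def prune_mask(opacity, min_opacity):
    """Mask of Gaussians whose opacity is below the pruning threshold."""
    return np.asarray(opacity, dtype=np.float64) < min_opacity

# Toy example: four cameras on a unit circle and three Gaussians.
cams = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]]
percent_dense = 0.01  # assumed fraction of the scene radius
tau_s = percent_dense * scene_radius(cams)
clone, split = classify_for_densification(
    grad_norm=[1e-4, 1e-3, 1e-3],
    scales=[[0.5, 0.5, 0.5], [0.005, 0.005, 0.005], [0.5, 0.01, 0.01]],
    grad_threshold=2e-4,
    scale_threshold=tau_s,
)
```

Because the scale threshold is tied only to the camera rig, two scenes captured with the same trajectory receive the same threshold regardless of how much fine detail they contain, which is the limitation discussed in the Challenges paragraph.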
Challenges: 3DGS presents a hybrid optimization approach by integrating differentiable backpropagation with non-differentiable adaptive density control. However, it faces several challenges that impact its effectiveness. First, the optimization process lacks prioritization: the method may focus on enlarging large Gaussians instead of densifying smaller ones to fill in gaps in geometry, or it might replicate transparent Gaussians to mimic a solid surface rather than enhancing the alignment and opacity of existing ones. Second, the absence of geometric regularization leads to misalignment of Gaussians with the underlying geometry, creating noisy artifacts that require opacity adjustments to clean up. Lastly, the simplistic scale threshold depends more on camera pose radii than on scene complexity, which restricts the method’s ability to finely tune the splitting and cloning of Gaussians based on the detail needed for effective scene representation. Consequently, while 3DGS produces a high-quality RGB radiance field, it may not adhere to the underlying geometric structures, which leads to noisy 3D meshes, blurry artifacts, and slower convergence.
4 Proposed Method
To resolve the aforementioned issues, we propose a two-part approach. First, Atomized Proliferation is introduced to enhance geometric precision in areas with intricate details. Second, a geometry-guided optimization is used to compactly model smooth surfaces while retaining enough primitives for fine details.
4.1 Atomized Proliferation
When handling SfM points, 3DGS alternates between densification and optimization to enhance scene representation. In contrast, our method initially constrains Gaussians that represent fine details into Atom Gaussians and prioritizes their proliferation to quickly align with the scene’s inherent geometry. This is followed by a pruning strategy that merges the Gaussians representing large and smooth surfaces while preserving those representing detailed complexities. Figure 2 compares the resulting Gaussians of our method and 3DGS at 7k iterations.
Atom Gaussian Primitive: Our process begins by analyzing the input SfM points to establish the Atom scale $s_a$, calculated from the first percentile of the ordered distances. This scale distinguishes between Gaussians capturing fine details (Atom Gaussians) and those covering broader background elements (traditional Gaussians). Atom Gaussians are distinct in being isotropic spheroids with a uniform size ($s = s_a$), in contrast to traditional Gaussians, which are anisotropic ellipsoids of varying sizes. The constant size of Atom Gaussians prioritizes densifying them to accurately fill gaps in the geometry, rather than optimizing large Gaussians that merely cover these voids. The uniform size of Atom Gaussians ensures a closer alignment with the actual 3D geometry of the scene, leading to more accurate surface representations.
Atomization: This step regularly checks whether a Gaussian's scale has fallen below the Atom scale $s_a$. If so, the Gaussian is designated as an Atom Gaussian with its size set to $s_a$. Once Gaussians are categorized as Atom Gaussians, their scales are fixed and no longer optimized through backpropagation; instead, they follow a geometric progression: $s_a^{(k)} = s_a \cdot \beta^{k/K}$, where $k \in \{0, \dots, K\}$ and $K$ represents the total number of atomization iterations, and $\beta$ is the final proportion relative to the initial $s_a$. This ensures that the atom scale decreases over iterations, progressively enhancing the representation of fine details.
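To make the Atom scale and its decay concrete, the sketch below estimates an initial Atom scale from a point cloud and generates a geometrically decaying schedule. Several details are assumptions made for illustration: the "ordered distances" are taken to be the mean distances to the three nearest SfM neighbors, the schedule is written as `initial_scale * final_ratio ** (k / num_steps)`, and all function and parameter names are hypothetical. The brute-force neighbor search is only suitable for small point sets.

```python
import numpy as np

def mean_knn_distance(points, k=3):
    """Mean distance from each point to its k nearest neighbors (brute force, O(N^2))."""
    pts = np.asarray(points, dtype=np.float64)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude each point's distance to itself
    nearest = np.sort(dist, axis=1)[:, :k]
    return nearest.mean(axis=1)

def initial_atom_scale(points, percentile=1.0, k=3):
    """Atom scale taken as a low percentile of the neighbor distances."""
    return float(np.percentile(mean_knn_distance(points, k), percentile))

def atom_scale_schedule(initial_scale, final_ratio, num_steps):
    """Geometric progression from initial_scale down to final_ratio * initial_scale."""
    k = np.arange(num_steps + 1)
    return initial_scale * final_ratio ** (k / num_steps)

# Example: random points standing in for an SfM reconstruction.
rng = np.random.default_rng(0)
sfm_points = rng.uniform(-1.0, 1.0, size=(200, 3))
s_a = initial_atom_scale(sfm_points)
schedule = atom_scale_schedule(s_a, final_ratio=0.25, num_steps=4)
```

In a geometric progression the ratio between consecutive scales is constant, so the Atom Gaussians shrink by the same relative amount at every atomization step instead of by a fixed absolute amount.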
Densification: This step is similar to the original densification strategy, with a modification that allows larger Gaussians to be cloned. Moreover, we apply a linear warm-up to the split gradient threshold, increasing the probability that a Gaussian will divide into smaller Atom Gaussians. Together with atomization, this strategy primarily aims to bridge the gaps in geometry.
Prune: This step is similar to the low-opacity removal method but we increase the frequency of opacity resetting in the training. The focus is to merge Gaussians that depict large, simple surfaces, instead of eliminating the noisy ones near the camera. This step concludes the atom Gaussian strategy, allowing Gaussians to adapt their scales according to the complexity of the scenes. Through this refinement process, we retain Gaussians that capture intricate high-frequency details, while merging those that represent broad and smooth surfaces.
4.2 Geometry-Guided Optimization
Figure 3: (a) Normal; (b) Curvature; (c) Edge; (d) Normal Loss.
To address the issue of Gaussians not always representing actual geometric structures, we utilize a geometry-guided optimization, which comprises our proposed Edge-Aware Normal Loss and a revised multi-scale SSIM loss. This optimization method ensures that enhancements focus on maintaining geometric accuracy without affecting the fidelity of the RGB field.
We first compute the normal map, $N$, which visually represents the surface orientations derived from a radiance field’s geometry (shown in Figure 3a). This is calculated from an unprojected depth map, $D$, using the cross product of the depth map’s gradients:

$$N = \frac{\nabla_x D \times \nabla_y D}{\lVert \nabla_x D \times \nabla_y D \rVert} \qquad (1)$$
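A minimal NumPy sketch of this normal computation follows. It assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` and uses central finite differences (`np.gradient`) as the gradient operator; these choices, the function names, and the resulting sign convention of the normals are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def unproject_depth(depth, fx, fy, cx, cy):
    """Lift a depth image of shape (H, W) to camera-space 3D points of shape (H, W, 3)."""
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row indices
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

def normal_map_from_depth(depth, fx, fy, cx, cy, eps=1e-12):
    """Unit normals from the cross product of the horizontal and vertical point gradients."""
    points = unproject_depth(depth, fx, fy, cx, cy)
    d_du = np.gradient(points, axis=1)  # change along image columns
    d_dv = np.gradient(points, axis=0)  # change along image rows
    normals = np.cross(d_du, d_dv)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True) + eps
    return normals

# A fronto-parallel plane yields normals along the camera z-axis.
plane = np.full((6, 8), 2.0)
normals = normal_map_from_depth(plane, fx=100.0, fy=100.0, cx=4.0, cy=3.0)
```

Whether the normals point toward or away from the camera depends on the order of the two gradients in the cross product and on the camera coordinate convention; only consistency matters for a curvature-based loss.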
Following this, the curvature map $C = \lVert \nabla N \rVert$ is derived, representing the gradient magnitude of the normal map $N$ (shown in Figure 3b). This map indicates the rate of change in surface normals, with higher values suggesting greater variability and lower values indicating smoothness. To enhance the geometric smoothness, one could optimize the curvature map directly. However, this might inadvertently smooth out high-frequency details such as sharp edges or fine structures, leading to an oversmoothed appearance.
To mitigate this, an edge map $E = \lVert \nabla I \rVert$, derived from the gradient magnitude of the ground truth RGB image $I$, is also computed (shown in Figure 3c). The edge map helps preserve high-frequency details by excluding them from the smoothing process applied to the curvature map, thus maintaining essential geometric features while flat surfaces are smoothed.
A weight function $\omega(x) = (x - 1)^{q}$, where $q$ is a positive even integer, is introduced to finely balance the influence of the edge map on the curvature map. This weighting function is designed such that regions with low gradients (smooth areas) receive higher weights, promoting more smoothing, whereas regions with high gradients (sharp edges) receive lower weights to preserve detail. The tolerance $q$ determines the level of sensitivity to gradients, effectively controlling the extent to which details are either preserved or smoothed out.
The Edge-Aware Normal Loss (shown in Figure 3d) is formulated as follows:

$$\mathcal{L}_{N} = \frac{1}{HW} \sum_{p} \omega(E_p)\, C_p \qquad (2)$$

where the sum runs over all $H \times W$ pixels $p$.
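The following NumPy sketch shows one way the Edge-Aware Normal Loss could be evaluated for a single view. It is a forward-only illustration under stated assumptions: finite differences stand in for the 2D edge detector, the edge map is normalized to [0, 1] before weighting, the weight is taken as `(edge - 1) ** q` with even `q`, and the loss is averaged over pixels. In actual training the same computation would be written in an automatic differentiation framework so that gradients reach the Gaussians.

```python
import numpy as np

def gradient_magnitude(img):
    """Per-pixel gradient magnitude of an (H, W) or (H, W, C) image via central differences."""
    img = np.asarray(img, dtype=np.float64)
    if img.ndim == 2:
        img = img[..., None]
    gy = np.gradient(img, axis=0)
    gx = np.gradient(img, axis=1)
    return np.sqrt((gx ** 2 + gy ** 2).sum(axis=-1))

def edge_aware_normal_loss(normals, rgb, q=2):
    """Curvature of the normal map, down-weighted where the RGB image has strong edges."""
    curvature = gradient_magnitude(normals)  # curvature map from the normal map
    edge = gradient_magnitude(rgb)           # edge map from the ground truth image
    edge = edge / (edge.max() + 1e-12)       # assumed normalization to [0, 1]
    weight = (edge - 1.0) ** q               # close to 1 on flat regions, close to 0 on edges
    return float((weight * curvature).mean())

# A crease in the normals that coincides with an image edge is barely penalized,
# while the same crease under a textureless image is penalized fully.
h, w = 8, 8
normals = np.zeros((h, w, 3))
normals[:, : w // 2] = [0.0, 0.0, 1.0]
normals[:, w // 2 :] = [1.0, 0.0, 0.0]
rgb_flat = np.zeros((h, w, 3))
rgb_edge = np.zeros((h, w, 3))
rgb_edge[:, w // 2 :] = 1.0
loss_flat = edge_aware_normal_loss(normals, rgb_flat)
loss_edge = edge_aware_normal_loss(normals, rgb_edge)
```

An even exponent keeps the weight non-negative, and larger values of `q` make the weight fall off faster, so that even weak image edges exempt a region from smoothing.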
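For improved perceptual performance, we have replaced the SSIM loss with a Multi-Scale SSIM (MS-SSIM) loss [Wang et al.(2003)Wang, Simoncelli, and Bovik] to capture a richer variety of camera view variations. The composite loss function is formulated as follows:

$$\mathcal{L} = \mathcal{L}_1 + \lambda_{S}\,\mathcal{L}_{\text{MS-SSIM}} + \lambda_{N}\,\mathcal{L}_{N} \qquad (3)$$

where $\lambda_{S}$ and $\lambda_{N}$ are hyperparameters that determine the respective contributions of the MS-SSIM loss $\mathcal{L}_{\text{MS-SSIM}}$ and the Edge-Aware Normal Loss $\mathcal{L}_{N}$ to the overall loss function.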
5 Experiments
This section presents comprehensive evaluations of AtomGS, comparing its performance in both rendering quality and 3D geometry precision against previous state-of-the-art methods. Our quantitative tables distinguish between methods that explicitly represent the scene as Gaussian primitives and non-explicit methods that encode appearance or geometric information within a network. We adopt the metrics reported in the original papers whenever possible for consistency and comparability.
Datasets and Metrics: In our experiments, we evaluated the proposed AtomGS method using three datasets: Mip-NeRF360 [Barron et al.(2022)Barron, Mildenhall, Verbin, Srinivasan, and Hedman], Tanks&Temples [Knapitsch et al.(2017)Knapitsch, Park, Zhou, and Koltun], and DTU [Aanæs et al.(2016)Aanæs, Jensen, Vogiatzis, Tola, and Dahl]. We assessed rendering quality using three commonly employed metrics—Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM) [Wang et al.(2004)Wang, Bovik, Sheikh, and Simoncelli], and Learned Perceptual Image Patch Similarity (LPIPS) [Zhang et al.(2018)Zhang, Isola, Efros, Shechtman, and Wang]—on Mip-NeRF360 and Tanks&Temples datasets. Additionally, we measured the geometry precision using the chamfer distance on the DTU dataset.
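For reference, two of these metrics can be sketched in a few lines of NumPy. The PSNR definition below is the standard one for images scaled to `[0, max_val]`. The Chamfer distance shown is a simplified symmetric variant between two point sets; the official DTU evaluation follows its own protocol (including masking and outlier handling), so this sketch is an illustrative assumption and not the evaluation code used for the reported numbers.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB between two images of equal shape."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbor distance, averaged over both directions."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # brute force, O(|a| * |b|)
    return float(0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean()))

# Example: a constant error of 0.1 on a unit-range image.
value = psnr(np.full((4, 4, 3), 0.1), np.zeros((4, 4, 3)))
```

Higher PSNR indicates a closer match to the reference image, whereas lower Chamfer distance indicates a closer match to the reference geometry.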
2D Rendering Quality: Table 1 provides a quantitative evaluation of rendering quality for the selected methods. Our approach surpasses all other explicit methods in all three metrics and consistently ranks in the top two among all methods evaluated. Figure 4 offers a qualitative comparison of our approach against two other explicit methods: 3DGS and SuGaR. In the flower scene, both 3DGS and SuGaR show varying levels of blurriness in areas with high-frequency details, while our method maintains sharpness. In the kitchen scene, 3DGS presents noticeable artifacts close to the camera. Although SuGaR enhances surface smoothness and reduces noise, it causes unsightly distortions in the geometry of the table mat. In contrast, our method not only reproduces the smooth surface of the kitchen table but also preserves the intricate details of both the table mat and the Lego.
3D Geometry Precision: Table 2 presents a quantitative comparison between our approach and other methods aimed at improving 3D reconstruction accuracy. Our approach not only surpasses other explicit methods in geometric precision but also competes favorably with SDF-based implicit methods. Note that, compared with the implicit methods, our approach also benefits from faster training, thereby increasing its practical applicability in real-world scenarios. Figure 5 presents a qualitative comparison of mesh reconstruction using NeuS, SuGaR, and our method on DTU scene 24. SuGaR tends to generate flat disks due to its regularization approach; however, these disks are not always perfectly aligned with the underlying geometry. NeuS employs a signed distance function, resulting in smoother surfaces, but can also lose high-frequency details due to strong smoothness priors. In contrast, our method maintains a balance between smoothness and detail preservation.
Ablation Study: We conducted three ablation studies on the Tanks&Temples dataset [Knapitsch et al.(2017)Knapitsch, Park, Zhou, and Koltun] to evaluate the impact of different configurations on our model’s performance. The first configuration removed Atomized Proliferation from our model. In the other two, we kept Atomized Proliferation but discarded one of the two loss functions during training. The results from these configurations and the full model are shown in Table 3. The results indicate that Atomized Proliferation is crucial to our model’s performance: excluding it resulted in the most significant performance decline. This suggests that simply incorporating geometry regularization in the loss functions tends to merely approximate the scene with elongated Gaussian ellipsoids instead of aligning accurately with the underlying geometry. Additionally, when each of the two loss components was individually removed, there was a slight decrease in performance. This demonstrates the importance of each component in achieving optimal results.
Table 1: Rendering quality on Mip-NeRF360 and Tanks&Temples (PSNR and SSIM: higher is better; LPIPS: lower is better).
Type | Method | Mip-NeRF360 PSNR | Mip-NeRF360 SSIM | Mip-NeRF360 LPIPS | Tanks&Temples PSNR | Tanks&Temples SSIM | Tanks&Temples LPIPS
---|---|---|---|---|---|---|---
non-explicit | Plenoxels [Fridovich-Keil et al.(2022)Fridovich-Keil, Yu, Tancik, Chen, Recht, and Kanazawa] | 23.08 | 0.626 | 0.463 | 21.08 | 0.719 | 0.379
non-explicit | Instant-NGP [Müller et al.(2022)Müller, Evans, Schied, and Keller] | 25.59 | 0.699 | 0.331 | 21.92 | 0.745 | 0.305
non-explicit | Mip-NeRF360 [Barron et al.(2022)Barron, Mildenhall, Verbin, Srinivasan, and Hedman] | 27.69 | 0.792 | 0.237 | 22.22 | 0.759 | 0.257
non-explicit | TRIPS [Franke et al.(2024)Franke, Rückert, Fink, and Stamminger] | 25.94 | 0.772 | 0.233 | 24.64 | 0.808 | 0.213
explicit | 3DGS [Kerbl et al.(2023)Kerbl, Kopanas, Leimkuehler, and Drettakis] | 27.21 | 0.815 | 0.214 | 23.14 | 0.841 | 0.183
explicit | SuGaR [Guédon and Lepetit(2023)] | 25.51 | 0.756 | 0.268 | 22.68 | 0.794 | 0.217
explicit | 2DGS [Huang et al.(2024a)Huang, Yu, Chen, Geiger, and Gao] | 27.02 | 0.804 | 0.238 | - | - | -
explicit | GES [Hamdi et al.(2024)Hamdi, Melas-Kyriazi, Qian, Mai, Liu, Vondrick, Ghanem, and Vedaldi] | 26.91 | 0.794 | 0.250 | 23.35 | 0.836 | 0.198
explicit | AtomGS (Ours) | 27.38 | 0.816 | 0.211 | 23.70 | 0.849 | 0.166
Table 2: Chamfer distance on DTU scans (lower is better), with the mean over all scans and the training time.
Type | Method | 24 | 37 | 40 | 55 | 63 | 65 | 69 | 83 | 97 | 105 | 106 | 110 | 114 | 118 | 122 | Mean | Train
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
implicit | NeRF [Mildenhall et al.(2021)Mildenhall, Srinivasan, Tancik, Barron, Ramamoorthi, and Ng] | 1.90 | 1.60 | 1.85 | 0.58 | 2.28 | 1.27 | 1.47 | 1.67 | 2.05 | 1.07 | 0.88 | 2.53 | 1.06 | 1.15 | 0.96 | 1.49 | 4h
implicit | VolSDF [Yariv et al.(2021)Yariv, Gu, Kasten, and Lipman] | 1.14 | 1.26 | 0.81 | 0.49 | 1.25 | 0.7 | 0.72 | 1.29 | 1.18 | 0.7 | 0.66 | 1.08 | 0.42 | 0.61 | 0.55 | 0.86 | 6h
implicit | NeuS [Wang et al.(2021a)Wang, Liu, Liu, Theobalt, Komura, and Wang] | 1.00 | 1.37 | 0.93 | 0.43 | 1.10 | 0.65 | 0.57 | 1.48 | 1.09 | 0.83 | 0.52 | 1.2 | 0.35 | 0.49 | 0.54 | 0.84 | 6h
implicit | Neuralangelo [Li et al.(2023)Li, Müller, Evans, Taylor, Unberath, Liu, and Lin] | 0.37 | 0.72 | 0.35 | 0.35 | 0.87 | 0.54 | 0.53 | 1.29 | 0.97 | 0.73 | 0.47 | 0.74 | 0.32 | 0.41 | 0.43 | 0.61 | 12h
explicit | 3DGS [Kerbl et al.(2023)Kerbl, Kopanas, Leimkuehler, and Drettakis] | 2.14 | 1.53 | 2.08 | 1.68 | 3.49 | 2.21 | 1.43 | 2.07 | 2.22 | 1.75 | 1.79 | 2.55 | 1.53 | 1.52 | 1.50 | 1.96 | 0.19h
explicit | SuGaR [Guédon and Lepetit(2023)] | 1.47 | 1.33 | 1.13 | 0.61 | 2.25 | 1.71 | 1.15 | 1.63 | 1.62 | 1.07 | 0.79 | 2.45 | 0.98 | 0.88 | 0.79 | 1.33 | 1.28h
explicit | 2DGS [Huang et al.(2024a)Huang, Yu, Chen, Geiger, and Gao] | 0.48 | 0.91 | 0.39 | 0.39 | 1.01 | 0.83 | 0.81 | 1.36 | 1.27 | 0.76 | 0.7 | 1.40 | 0.40 | 0.76 | 0.52 | 0.80 | 0.31h
explicit | AtomGS (Ours) | 0.51 | 0.77 | 0.53 | 0.4 | 1.07 | 0.81 | 0.87 | 1.21 | 1.14 | 0.47 | 0.70 | 1.36 | 0.36 | 0.58 | 0.43 | 0.75 | 0.07h
Figure 5: (a) Ours; (b) NeuS; (c) SuGaR.
Table 3: Ablation study on Tanks&Temples.
Configuration | PSNR | SSIM | LPIPS
---|---|---|---
No Atomized Proliferation | 23.25 | 0.815 | 0.228
No | 23.58 | 0.840 | 0.181
No | 23.58 | 0.837 | 0.185
Full model | 23.70 | 0.849 | 0.166
6 Conclusion
In this paper, we introduced AtomGS, an approach that enhances radiance field reconstruction by focusing on uniform densification through Atomized Proliferation and refining surface details via Geometry-Guided Optimization. Our approach significantly reduces noisy geometry and blurry artifacts that are common in the previous 3DGS methods.
Nonetheless, AtomGS has its own limitations. Similar to previous methods, our method may not produce accurate geometry for highly specular or semi-transparent materials. While our method generally requires fewer Gaussian primitives than the original 3DGS method to achieve improved visual quality, our proliferation strategy can sometimes produce more primitives in order to represent all details of highly complex environments. In the future, we aim to develop an improved pruning strategy to achieve a more compact result.
References
- [Aanæs et al.(2016)Aanæs, Jensen, Vogiatzis, Tola, and Dahl] Henrik Aanæs, Rasmus Ramsbøl Jensen, George Vogiatzis, Engin Tola, and Anders Bjorholm Dahl. Large-scale data for multiple-view stereopsis. International Journal of Computer Vision, pages 1–16, 2016.
- [Atzmon and Lipman(2020)] Matan Atzmon and Yaron Lipman. Sal: Sign agnostic learning of shapes from raw data. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
- [Barron et al.(2022)Barron, Mildenhall, Verbin, Srinivasan, and Hedman] Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields, 2022.
- [Chen et al.(2023)Chen, Li, and Lee] Hanlin Chen, Chen Li, and Gim Hee Lee. Neusg: Neural implicit surface reconstruction with 3d gaussian splatting guidance, 2023.
- [Deng et al.(2020)Deng, Lewis, Jeruzalski, Pons-Moll, Hinton, Norouzi, and Tagliasacchi] Boyang Deng, John P Lewis, Timothy Jeruzalski, Gerard Pons-Moll, Geoffrey Hinton, Mohammad Norouzi, and Andrea Tagliasacchi. Nasa neural articulated shape approximation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VII 16, pages 612–628. Springer, 2020.
- [Franke et al.(2024)Franke, Rückert, Fink, and Stamminger] Linus Franke, Darius Rückert, Laura Fink, and Marc Stamminger. Trips: Trilinear point splatting for real-time radiance field rendering. In Computer Graphics Forum, page e15012. Wiley Online Library, 2024.
- [Fridovich-Keil et al.(2022)Fridovich-Keil, Yu, Tancik, Chen, Recht, and Kanazawa] Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5501–5510, 2022.
- [Guédon and Lepetit(2023)] Antoine Guédon and Vincent Lepetit. Sugar: Surface-aligned gaussian splatting for efficient 3d mesh reconstruction and high-quality mesh rendering, 2023.
- [Hamdi et al.(2024)Hamdi, Melas-Kyriazi, Qian, Mai, Liu, Vondrick, Ghanem, and Vedaldi] Abdullah Hamdi, Luke Melas-Kyriazi, Guocheng Qian, Jinjie Mai, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, and Andrea Vedaldi. Ges: Generalized exponential splatting for efficient radiance field rendering. arXiv preprint arXiv:2402.10128, 2024.
- [Huang et al.(2024a)Huang, Yu, Chen, Geiger, and Gao] Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically accurate radiance fields. arXiv preprint arXiv:2403.17888, 2024a.
- [Huang et al.(2024b)Huang, Bai, Guo, Li, and Guo] Letian Huang, Jiayang Bai, Jie Guo, Yuanqi Li, and Yanwen Guo. On the error analysis of 3d gaussian splatting and an optimal projection strategy, 2024b.
- [Jang and Agapito(2021)] Wonbong Jang and Lourdes Agapito. Codenerf: Disentangled neural radiance fields for object categories. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12949–12958, 2021.
- [Jiang et al.(2020)Jiang, Sud, Makadia, Huang, Nießner, Funkhouser, et al.] Chiyu Jiang, Avneesh Sud, Ameesh Makadia, Jingwei Huang, Matthias Nießner, Thomas Funkhouser, et al. Local implicit grid representations for 3d scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6001–6010, 2020.
- [Kazhdan et al.(2006)Kazhdan, Bolitho, and Hoppe] Michael Kazhdan, Matthew Bolitho, and Hugues Hoppe. Poisson surface reconstruction. In Proceedings of the fourth Eurographics symposium on Geometry processing, volume 7, 2006.
- [Kerbl et al.(2023)Kerbl, Kopanas, Leimkuehler, and Drettakis] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkuehler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics (TOG), 42(4):1–14, 2023.
- [Knapitsch et al.(2017)Knapitsch, Park, Zhou, and Koltun] Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics, 36(4), 2017.
- [Li et al.(2023)Li, Müller, Evans, Taylor, Unberath, Liu, and Lin] Zhaoshuo Li, Thomas Müller, Alex Evans, Russell H Taylor, Mathias Unberath, Ming-Yu Liu, and Chen-Hsuan Lin. Neuralangelo: High-fidelity neural surface reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8456–8465, 2023.
- [Liu et al.(2021)Liu, Zhang, Zhang, Zhang, Zhu, and Russell] Steven Liu, Xiuming Zhang, Zhoutong Zhang, Richard Zhang, Jun-Yan Zhu, and Bryan Russell. Editing conditional radiance fields. In Proceedings of the IEEE/CVF international conference on computer vision, pages 5773–5783, 2021.
- [Lu et al.(2023)Lu, Yu, Xu, Xiangli, Wang, Lin, and Dai] Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, and Bo Dai. Scaffold-gs: Structured 3d gaussians for view-adaptive rendering, 2023.
- [Mildenhall et al.(2021)Mildenhall, Srinivasan, Tancik, Barron, Ramamoorthi, and Ng] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
- [Müller et al.(2022)Müller, Evans, Schied, and Keller] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (ToG), 41(4):1–15, 2022.
- [Noguchi et al.(2021)Noguchi, Sun, Lin, and Harada] Atsuhiro Noguchi, Xiao Sun, Stephen Lin, and Tatsuya Harada. Neural articulated radiance field. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5762–5772, 2021.
- [Oechsle et al.(2021)Oechsle, Peng, and Geiger] Michael Oechsle, Songyou Peng, and Andreas Geiger. Unisurf: Unifying neural implicit surfaces and radiance fields for multi-view reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5589–5599, 2021.
- [Ran et al.(2023)Ran, Zeng, He, Chen, Li, Chen, Lee, and Ye] Yunlong Ran, Jing Zeng, Shibo He, Jiming Chen, Lincheng Li, Yingfeng Chen, Gimhee Lee, and Qi Ye. Neurar: Neural uncertainty for autonomous 3d reconstruction with implicit neural representations. IEEE Robotics and Automation Letters, 8(2):1125–1132, 2023.
- [Srinivasan et al.(2021)Srinivasan, Deng, Zhang, Tancik, Mildenhall, and Barron] Pratul P Srinivasan, Boyang Deng, Xiuming Zhang, Matthew Tancik, Ben Mildenhall, and Jonathan T Barron. Nerv: Neural reflectance and visibility fields for relighting and view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7495–7504, 2021.
- [Wang et al.(2022)Wang, Chai, He, Chen, and Liao] Can Wang, Menglei Chai, Mingming He, Dongdong Chen, and Jing Liao. Clip-nerf: Text-and-image driven manipulation of neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3835–3844, 2022.
- [Wang et al.(2021a)Wang, Liu, Liu, Theobalt, Komura, and Wang] Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang. Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. arXiv preprint arXiv:2106.10689, 2021a.
- [Wang et al.(2003)Wang, Simoncelli, and Bovik] Zhou Wang, Eero P Simoncelli, and Alan C Bovik. Multiscale structural similarity for image quality assessment. In The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, volume 2, pages 1398–1402. Ieee, 2003.
- [Wang et al.(2004)Wang, Bovik, Sheikh, and Simoncelli] Zhou Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004. 10.1109/TIP.2003.819861.
- [Wang et al.(2021b)Wang, Wu, Xie, Chen, and Prisacariu] Zirui Wang, Shangzhe Wu, Weidi Xie, Min Chen, and Victor Adrian Prisacariu. Nerf–: Neural radiance fields without known camera parameters. arXiv preprint arXiv:2102.07064, 2021b.
- [Wu et al.(2022)Wu, Liu, Chen, Li, Zheng, Cai, and Zheng] Qianyi Wu, Xian Liu, Yuedong Chen, Kejie Li, Chuanxia Zheng, Jianfei Cai, and Jianmin Zheng. Object-compositional neural implicit surfaces. In European Conference on Computer Vision, pages 197–213. Springer, 2022.
- [Yan et al.(2023)Yan, Low, Chen, and Lee] Zhiwen Yan, Weng Fei Low, Yu Chen, and Gim Hee Lee. Multi-scale 3d gaussian splatting for anti-aliased rendering. arXiv preprint arXiv:2311.17089, 2023.
- [Yariv et al.(2021)Yariv, Gu, Kasten, and Lipman] Lior Yariv, Jiatao Gu, Yoni Kasten, and Yaron Lipman. Volume rendering of neural implicit surfaces, 2021.
- [Yen-Chen et al.(2021)Yen-Chen, Florence, Barron, Rodriguez, Isola, and Lin] Lin Yen-Chen, Pete Florence, Jonathan T Barron, Alberto Rodriguez, Phillip Isola, and Tsung-Yi Lin. inerf: Inverting neural radiance fields for pose estimation. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1323–1330. IEEE, 2021.
- [Zhang et al.(2018)Zhang, Isola, Efros, Shechtman, and Wang] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018.
Appendix A Atomized Proliferation Algorithm
The Atomized Proliferation algorithm is summarized in Algorithm 1. It begins by setting the Clone Threshold, Split Threshold, Prune Threshold, and Atom Scale, and by defining iteration limits for the atomized proliferation stage and the warm-up phase. The algorithm then iterates over the properties of each Gaussian in the Gaussian set. A Gaussian is pruned if its opacity falls below the Prune Threshold or its covariance is excessively large. If the gradient of the loss exceeds the Clone Threshold, the Gaussian is cloned to potentially bridge geometry gaps. The Gaussian is split when the gradient meets a Split Threshold that is dynamically adjusted according to the warm-up progress and the norm of its scale exceeds the Atom Scale. Atomization takes place when the minimum norm of its scale is less than or equal to the Atom Scale and the current iteration lies within the proliferation timeframe, ensuring detail refinement before the proliferation endpoint.
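The decision logic described above can be sketched for a single Gaussian as follows. This is only an illustrative sketch, not Algorithm 1 itself: the names `Gaussian`, `proliferation_actions`, the `cfg` keys, the `max_scale` bound used for "excessively large", the linear warm-up ramp on the Split Threshold, and all numeric values are assumptions introduced for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    opacity: float  # blending opacity of the Gaussian
    grad: float     # magnitude of the loss gradient for this Gaussian
    scale: tuple    # per-axis scales (sx, sy, sz)


def proliferation_actions(g, it, cfg):
    """Return the actions triggered for one Gaussian at iteration `it`."""
    # Prune: opacity too low, or the Gaussian is excessively large.
    if g.opacity < cfg["prune_thr"] or max(g.scale) > cfg["max_scale"]:
        return ["prune"]

    actions = []

    # Clone: loss gradient above the Clone Threshold.
    if g.grad > cfg["clone_thr"]:
        actions.append("clone")

    # Split: the threshold is ramped up during warm-up (hypothetical linear
    # schedule), and the Gaussian must be larger than the Atom Scale.
    ramp = min(1.0, it / cfg["warmup_until"])
    scale_norm = math.sqrt(sum(s * s for s in g.scale))
    if g.grad >= cfg["split_thr"] * ramp and scale_norm > cfg["atom_scale"]:
        actions.append("split")

    # Atomize: smallest axis at or below the Atom Scale, within the
    # atomized proliferation timeframe.
    if min(g.scale) <= cfg["atom_scale"] and it < cfg["atom_until"]:
        actions.append("atomize")

    return actions


# Illustrative values only; these are not the paper's settings.
cfg = {
    "clone_thr": 2e-4, "split_thr": 4e-4, "prune_thr": 5e-3,
    "max_scale": 10.0, "atom_scale": 1e-2,
    "atom_until": 7000, "warmup_until": 1000,
}
```

With a ramp that starts at zero, almost every sufficiently large Gaussian is split early on, which mirrors the aggressive early densification attributed to the warm-up strategy; a different schedule would change only the `ramp` line.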
Appendix B Gaussian Proliferation Trend
In Figure 6, we illustrate the Gaussian Proliferation Trend, which tracks the count of Gaussians across iterations for nine different scenes within the Mip-NeRF360 dataset. The depicted curve represents the average number of Gaussians across these scenes, with the curve’s width indicating the standard deviation. The fluctuations observed highlight the effectiveness of the opacity resetting strategy in eliminating redundant Gaussians. Initially, the 3DGS method struggles to densify Gaussians, as indicated by an increasing standard deviation, and it appears unable to stabilize by the end of the proliferation stage. In contrast, our method employs a warm-up strategy that aggressively densifies Gaussians at the initial stage, followed by a phase where Gaussians begin to merge, leading to a declining and stabilizing trend in Gaussian proliferation.
On the Mip-NeRF360 dataset, our method demonstrates efficiency with an average training time of 0.28 hours and a final model size of 749MB, compared to 3DGS, which takes 0.40 hours for training and results in a model size of 869MB. This indicates that our approach achieves superior quality without compromising on training time or model size.
Appendix C Implementation Details
Codebase: We have developed AtomGS based on the 3D Gaussian Splatting (3DGS) framework [Kerbl et al.(2023)Kerbl, Kopanas, Leimkuehler, and Drettakis]. To facilitate Edge-Aware Normal Loss computation and Poisson mesh extraction, we have implemented an additional feature renderer. This renderer generates various maps, including accumulation, median and mean depth, normal, and curvature maps. Additionally, we’ve developed an interactive real-time viewer that allows for the monitoring of these features, providing a detailed analysis of Gaussians in terms of both RGB and geometric information. For a detailed derivation of these implementations, please refer to Section D.
Hyper Parameter Settings: Following the 3DGS, we set the Clone Threshold at , Split Threshold at , and Prune Threshold at . For the Atom-related settings, we set Atom Scale at the first percentile of distances from the input SfM points , Atomized Proliferation until iteration at , and Warm-Up until iteration at . For optimization, the weights for SSIM and normal calculations are both set at . When working with object-centered datasets that lack extensive backgrounds, we advise setting the scale learning rate to maximize geometric accuracy.
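As a rough illustration of how an Atom Scale could be derived from the SfM points, the sketch below takes a low percentile of nearest-neighbor distances. Interpreting "distances" as nearest-neighbor distances is an assumption, as are the function name `atom_scale_from_points` and the brute-force distance computation, which is only suitable for small point sets.

```python
import numpy as np


def atom_scale_from_points(points, percentile=1.0):
    """Percentile of nearest-neighbor distances in an (N, 3) point set."""
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances (O(N^2) memory; fine for a small demo).
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Ignore the zero distance of each point to itself.
    np.fill_diagonal(dist, np.inf)
    nearest = dist.min(axis=1)
    return float(np.percentile(nearest, percentile))


# Four collinear points with nearest-neighbor distances 1, 1, 2, 4.
demo_points = np.array([[0.0, 0, 0], [1.0, 0, 0], [3.0, 0, 0], [7.0, 0, 0]])
demo_scale = atom_scale_from_points(demo_points)
```

For real SfM clouds a spatial index (for example a k-d tree) would replace the dense distance matrix.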
Mesh Extraction: Mesh extraction involves rendering depth maps from training views, which use median depth values from splats projected onto pixels. These maps are then converted back into 3D space to derive corresponding normal maps. The oriented colored point cloud generated from the RGB image, depth map, and normal map serves as the input for the Poisson extraction method [Kazhdan et al.(2006)Kazhdan, Bolitho, and Hoppe], which is used to create the textured mesh. This process is illustrated in Figure 7.
Hardware: All experiments are conducted on a single NVIDIA GeForce RTX 4090 GPU.
Appendix D Gaussian Splatting and Additional Feature Rendering
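Splatting: During this stage, 3D Gaussians are projected into the 2D image space to facilitate rendering. Given the viewing transformation $W$ and the 3D covariance matrix $\Sigma$, the projected 2D covariance matrix $\Sigma'$ is computed through $\Sigma' = J W \Sigma W^{T} J^{T}$, where $J$ is the Jacobian of the affine approximation of the projective transformation. The same transformation is used to compute the projected mean $\mu'$ in 2D space. Given the position of a pixel $\mathbf{x}$, 3D Gaussian splatting can be formulated as follows:
$$G'(\mathbf{x}) = \exp\left(-\tfrac{1}{2}(\mathbf{x}-\mu')^{T}\,\Sigma'^{-1}\,(\mathbf{x}-\mu')\right) \tag{4}$$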
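To make the projection step concrete, the following NumPy sketch assumes the standard 3DGS formulation, in which the 2D covariance is $J W \Sigma W^{T} J^{T}$ with $J$ the Jacobian of a pinhole projection. The function names, the 2x3 form of the Jacobian, and the focal lengths `fx`, `fy` are choices made for this example, not details taken from the AtomGS code.

```python
import numpy as np


def perspective_jacobian(t, fx, fy):
    """Jacobian of (fx * x / z, fy * y / z) at camera-space point t."""
    x, y, z = t
    return np.array([
        [fx / z, 0.0, -fx * x / z**2],
        [0.0, fy / z, -fy * y / z**2],
    ])


def project_covariance(cov3d, W, J):
    """2D covariance J W Sigma W^T J^T (J is 2x3, W is 3x3)."""
    return J @ W @ cov3d @ W.T @ J.T


def gaussian_2d(x, mean2d, cov2d):
    """Evaluate the unnormalized projected Gaussian at pixel position x."""
    d = np.asarray(x, dtype=float) - np.asarray(mean2d, dtype=float)
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov2d, d)))


# Axis-aligned Gaussian two units in front of an identity-rotation camera.
cov3d = np.diag([0.04, 0.01, 0.09])
J = perspective_jacobian((0.0, 0.0, 2.0), fx=100.0, fy=100.0)
cov2d = project_covariance(cov3d, np.eye(3), J)
```

In this example the projected covariance is diagonal (variances 100 and 25 in pixel units), so the response falls to `exp(-0.5)` ten pixels from the mean along the first axis.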
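Rendering: Upon receiving the position of a pixel $\mathbf{x}$, the distances to all overlapping Gaussians are computed using the viewing transformation $W$, thereby generating a sorted list of Gaussians $\mathcal{N}$. Subsequently, alpha compositing is employed to compute the accumulated weight of each Gaussian for each pixel:
$$w_i = T_i\,\alpha_i, \qquad T_i = \prod_{j=1}^{i-1}(1-\alpha_j) \tag{5}$$
where $\alpha_i$ is the opacity contribution of the $i$-th Gaussian at $\mathbf{x}$ and $T_i$ is the transmittance.
Using the above weight function, several key maps can be derived for each pixel:
1. RGB Color Map:
   $$C = \sum_{i \in \mathcal{N}} w_i\, c_i \tag{6}$$
   accumulates the RGB colors $c_i$, each weighted by the respective $w_i$, to produce the final color output for each pixel.
2. Accumulation Map:
   $$A = \sum_{i \in \mathcal{N}} w_i \tag{7}$$
   which aggregates the computed weights across all Gaussians.
3. Mean (Expected) Depth Map:
   $$D_{\text{mean}} = \sum_{i \in \mathcal{N}} w_i\, d_i \tag{8}$$
   where $d_i$ represents the depth associated with each Gaussian, weighted by $w_i$.
4. Median Depth Map:
   $$D_{\text{median}} = d_k \tag{9}$$
   calculates the median depth, where $k$ is the first Gaussian in the sorted list at which the transmittance crosses 0.5.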
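The four maps above can be sketched for a single pixel as follows. This assumes the usual front-to-back alpha compositing with per-Gaussian opacities already evaluated at the pixel; the function name `composite_pixel` and the exact median convention (the first Gaussian after which the remaining transmittance drops below 0.5) are assumptions made for the example.

```python
import numpy as np


def composite_pixel(alphas, colors, depths):
    """Composite depth-sorted Gaussians at one pixel.

    alphas: (N,) opacities at the pixel, colors: (N, 3), depths: (N,).
    Returns (color, accumulation, mean_depth, median_depth).
    """
    alphas = np.asarray(alphas, dtype=float)
    colors = np.asarray(colors, dtype=float)
    depths = np.asarray(depths, dtype=float)

    # Transmittance before each Gaussian: product of (1 - alpha_j), j < i.
    T = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    w = T * alphas  # per-Gaussian blending weights

    color = w @ colors
    accumulation = w.sum()
    mean_depth = w @ depths

    # Assumed median convention: first Gaussian after which the remaining
    # transmittance is below 0.5; NaN if the ray never becomes that opaque.
    T_after = T * (1.0 - alphas)
    hit = np.nonzero(T_after < 0.5)[0]
    median_depth = depths[hit[0]] if hit.size else float("nan")

    return color, accumulation, mean_depth, median_depth


color, acc, d_mean, d_med = composite_pixel(
    alphas=[0.4, 0.5, 0.9],
    colors=[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    depths=[1.0, 2.0, 3.0],
)
```

Here the weights are 0.4, 0.3, and 0.27, so the mean depth is pulled toward the back of the ray while the median depth snaps to the second Gaussian, which is one reason median depth is often preferred for surface extraction.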
Depth Map Unprojection: Given a depth map of size $H \times W$, where $(u, v)$ are pixel coordinates and $d$ is the depth at pixel $(u, v)$, the unprojection steps are as follows:
1. Normalization: The pixel coordinates are normalized to the range $[-1, 1]$ using the image width and height.
2. Homogeneous Coordinates in Camera Space: The normalized coordinates are then lifted to homogeneous camera-space coordinates.
3. Depth Scaling: Using two elements of the camera projection matrix, the depth values are rescaled, and the camera-space coordinates are adjusted accordingly.
4. World Space Transformation: The adjusted camera-space coordinates are then multiplied by the inverse of the full projection transform matrix, yielding homogeneous world-space coordinates.
5. Discarding Homogeneous Coordinate: Finally, to obtain Cartesian coordinates, the homogeneous coordinate is discarded, leaving the 3D world-space coordinates for each pixel.
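The unprojection steps can be sketched in NumPy as below. Several details are assumptions for the sake of a runnable example: a row-vector convention (points multiply matrices from the left, as in the 3DGS codebase), a perspective matrix whose entries `P[2, 2]` and `P[3, 2]` carry the depth scaling, normalization of pixel centers to $[-1, 1]$, and the helper names `projection_matrix` and `unproject_depth`.

```python
import numpy as np


def projection_matrix(znear, zfar, tan_half_fovx, tan_half_fovy):
    """Perspective projection for row vectors: p_clip = p_cam @ P."""
    P = np.zeros((4, 4))
    P[0, 0] = 1.0 / tan_half_fovx
    P[1, 1] = 1.0 / tan_half_fovy
    P[2, 2] = zfar / (zfar - znear)
    P[2, 3] = 1.0
    P[3, 2] = -(zfar * znear) / (zfar - znear)
    return P


def unproject_depth(depth, P, world_to_cam):
    """Lift an (H, W) camera-space depth map to (H, W, 3) world points."""
    H, W = depth.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")

    # 1. Normalize pixel centers to [-1, 1].
    x_ndc = (u + 0.5) / W * 2.0 - 1.0
    y_ndc = (v + 0.5) / H * 2.0 - 1.0

    # 2-3. Homogeneous clip-space coordinates: the clip z comes from two
    # entries of P, and the homogeneous coordinate equals the depth.
    z_clip = depth * P[2, 2] + P[3, 2]
    clip = np.stack([x_ndc * depth, y_ndc * depth, z_clip, depth], axis=-1)

    # 4. Undo the full projection transform (view followed by projection).
    full = world_to_cam @ P
    world_h = clip @ np.linalg.inv(full)

    # 5. Discard the homogeneous coordinate.
    return world_h[..., :3] / world_h[..., 3:4]


P = projection_matrix(0.1, 100.0, 1.0, 1.0)
V = np.eye(4)
V[3, :3] = [0.5, -0.2, 1.0]  # row-vector translation: p_cam = p_world @ V
depth = np.full((4, 6), 2.0)
points = unproject_depth(depth, P, V)
```

Transforming `points` back with `V` should recover a camera-space z equal to the input depth at every pixel, which is a convenient sanity check when adapting the sketch to another camera convention.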
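Normal Map Calculation: Given an unprojected depth map $P$, we can output the corresponding normal map $N$ using the cross product of the depth map’s gradients:
$$N = \frac{\nabla_u P \times \nabla_v P}{\lVert \nabla_u P \times \nabla_v P \rVert} \tag{10}$$
where $\nabla_u$ and $\nabla_v$ denote the gradients along the horizontal and vertical image axes.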
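A minimal NumPy sketch of this normal computation is given below. It assumes finite-difference gradients of the unprojected point map and normalizes the result to unit length; the function name `normals_from_points` and the sign convention of the cross product are choices made for the example.

```python
import numpy as np


def normals_from_points(points):
    """Unit normals from an (H, W, 3) map of unprojected 3D points."""
    # Finite-difference gradients along the vertical (axis 0) and
    # horizontal (axis 1) image axes.
    grad_v, grad_u = np.gradient(points, axis=(0, 1))
    n = np.cross(grad_u, grad_v)
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.maximum(norm, 1e-12)  # guard against degenerate pixels


# Tilted plane z = 0.5 * x sampled on a 5 x 7 grid.
vv, uu = np.meshgrid(np.arange(5.0), np.arange(7.0), indexing="ij")
plane = np.stack([uu, vv, 0.5 * uu], axis=-1)
plane_normals = normals_from_points(plane)
```

On a planar patch every pixel receives the same normal, so any deviation in real data reflects depth noise or a genuine geometric edge.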
Appendix E Additional Results
We provide detailed per-scene metrics for the Mip-NeRF360 and Tanks and Temples datasets in Table 4. Additionally, we offer further insights through 2D rendering comparisons in Figure 8 and 3D mesh comparisons in Figure 9.
Metric | Method | Bicycle | Flowers | Garden | Stump | Treehill | Room | Counter | Kitchen | Bonsai | Truck | Train | Mean |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PSNR | 3DGS | 25.10 | 21.52 | 27.18 | 26.49 | 22.37 | 31.22 | 28.96 | 30.98 | 32.18 | 25.39 | 22.02 | 26.67 |
PSNR | SuGaR | 23.13 | 19.67 | 25.30 | 24.23 | 21.44 | 29.85 | 27.53 | 29.33 | 30.47 | 22.69 | 20.47 | 24.92 |
PSNR | Ours | 25.33 | 21.71 | 27.44 | 26.58 | 22.11 | 31.30 | 28.97 | 30.88 | 32.15 | 25.47 | 21.93 | 26.72 |
SSIM | 3DGS | 0.763 | 0.603 | 0.860 | 0.763 | 0.626 | 0.916 | 0.905 | 0.923 | 0.939 | 0.878 | 0.812 | 0.817 |
SSIM | SuGaR | 0.663 | 0.514 | 0.793 | 0.669 | 0.558 | 0.901 | 0.882 | 0.892 | 0.928 | 0.827 | 0.762 | 0.763 |
SSIM | Ours | 0.772 | 0.611 | 0.865 | 0.774 | 0.633 | 0.918 | 0.906 | 0.925 | 0.938 | 0.880 | 0.817 | 0.822 |
LPIPS | 3DGS | 0.205 | 0.332 | 0.107 | 0.213 | 0.326 | 0.219 | 0.200 | 0.127 | 0.204 | 0.148 | 0.208 | 0.208 |
LPIPS | SuGaR | 0.307 | 0.378 | 0.182 | 0.307 | 0.408 | 0.2395 | 0.222 | 0.167 | 0.211 | 0.175 | 0.259 | 0.260 |
LPIPS | Ours | 0.203 | 0.325 | 0.104 | 0.202 | 0.317 | 0.222 | 0.202 | 0.127 | 0.202 | 0.133 | 0.200 | 0.203 |
In Figure 8, the RGB rendering results of our AtomGS show enhanced detail compared to those of SuGaR. This is evident on the tree trunk in the "stump" scene and in the slender, curly hay near the dried grass ornament in the "garden" scene, as highlighted in the magnified areas. While AtomGS’s RGB renderings appear visually similar to those of 3DGS, the normal maps reveal that AtomGS better preserves geometric accuracy, for example on the tree trunk in the "stump" scene and on both the ground beneath the table and the surface of the soccer ball in the "garden" scene.
In Figure 9, NeuS, which uses Signed Distance Functions (SDF), produces the smoothest surfaces. However, it sometimes sacrifices sharp features, leading to overly smoothed surfaces. SuGaR attempts to convert every Gaussian into a 2D ellipsoidal disk, resulting in relatively smooth surfaces. However, the disks do not always align perfectly with the surfaces, creating noticeable disk-shaped artifacts and sometimes overfitting the background. In contrast, AtomGS achieves smooth surfaces while retaining detailed geometries.
[Figure panels. Rows: DTU 24, DTU 106, DTU 122, Lego, Chair, Hotdog. Columns: (a) Ours, (b) NeuS, (c) SuGaR.]