Rong, Rui, Yue, Meida, Andrew
University of Southern California, Institute for Creative Technologies, Los Angeles, CA, USA

AtomGS: Atomizing Gaussian Splatting for High-Fidelity Radiance Field

Abstract

3D Gaussian Splatting (3DGS) has recently advanced radiance field reconstruction by offering superior capabilities for novel view synthesis and real-time rendering speed. However, its strategy of blending optimization with adaptive density control can lead to sub-optimal results: it sometimes yields noisy geometry and blurry artifacts because it prioritizes optimizing large Gaussians over adequately densifying smaller ones. To address this, we introduce AtomGS, consisting of Atomized Proliferation and Geometry-Guided Optimization. Atomized Proliferation constrains ellipsoid Gaussians of various sizes into more uniform-sized Atom Gaussians. This strategy enhances the representation of areas with fine features by placing greater emphasis on densification in accordance with scene details. In addition, we propose a Geometry-Guided Optimization approach that incorporates an Edge-Aware Normal Loss. This optimization method effectively smooths flat surfaces while preserving intricate details. Our evaluation shows that AtomGS outperforms existing state-of-the-art methods in rendering quality. Additionally, it achieves competitive accuracy in geometry reconstruction and offers a significant improvement in training speed over SDF-based methods. More interactive demos can be found on our website (https://rongliu-leo.github.io/AtomGS/).

Figure 1: AtomGS outperforms existing methods in rendering quality and achieves competitive results in geometry accuracy by constraining Gaussians into Atom Gaussians and aligning them precisely with the natural geometry.

1 Introduction

Multi-view 3D reconstruction and novel view synthesis remain significant challenges in computer vision and graphics. A successful reconstruction requires both high-quality visual renderings from new viewpoints and precise capture of 3D geometry. These attributes ensure that models are visually appealing and accurate, making them valuable across various applications such as video games, VR/AR, digital twins, 3D mapping, simulation, and scan-to-BIM. Neural Radiance Fields (NeRF) [Mildenhall et al., 2021] has achieved significant progress in producing photorealistic renderings through an implicit 3D representation; however, its limited rendering speed prevents its use in real-world applications. In response, 3D Gaussian Splatting (3DGS) [Kerbl et al., 2023] has emerged as a promising alternative, offering an explicit method that achieves fast rendering speeds while maintaining high rendering quality.

Existing work on 3DGS has primarily emphasized either improving rendering quality or enhancing 3D geometry accuracy [Lu et al., 2023; Huang et al., 2024b; Yan et al., 2023; Guédon and Lepetit, 2023; Chen et al., 2023]. Efforts that enhance rendering quality focus less on geometric precision, while those that aim to refine geometry often lead to reduced rendering quality. To address these challenges, we introduce AtomGS, an approach that enhances geometric precision in areas with fine details through our proposed Atomized Proliferation process and Edge-Aware Normal Loss. This enhancement in geometric detail consequently leads to improved rendering quality, as shown in our experiments. Figure 1 compares the results of AtomGS with existing methods, showcasing improvements in rendering quality and in surface normals.

Figure 2: Rendering Comparison of 7k Results: (a) Ours vs. (b) 3DGS. We display images using both full-size and shrunken Gaussians, examining the rendering effects and Gaussian placements. Our approach results in more precise geometric alignments, visible in fine details like bicycle spokes and blades of grass.

Our proposed AtomGS refines 3DGS by strategically deploying Atom Gaussians to ensure detailed coverage of complex scenes through the Atomized Proliferation process. In contrast to 3DGS, AtomGS provides precise guidance on where to focus Gaussians for better 3D geometry optimization. Intuitively, smaller anisotropic Gaussians are constrained into uniformly sized Atom Gaussians, followed by a progressive split schedule that provides better coverage of scene details. Meanwhile, larger anisotropic Gaussians are retained to represent the background or geometric areas with few features. To facilitate this process, we introduce an Edge-Aware Normal Loss that imposes stricter constraints on the positioning of Gaussians aligned with flat surfaces while allowing more flexibility for those on irregularly shaped areas. Specifically, we integrate weights derived from a 2D edge detector into the curvature map to compute this loss. In addition, during the subsequent pruning phase, Gaussians covering large, less detailed surfaces are merged, resulting in a similar or even reduced number of Gaussians compared to the original 3DGS while retaining competitive rendering quality. To summarize, the main contributions of our paper are as follows:

  1. We introduced an Atomized Proliferation strategy aimed at enhancing rendering quality by refining 3D geometric precision in areas with fine details.

  2. We designed an Edge-Aware Normal Loss to enhance the reconstruction accuracy by preserving details in areas with irregular shapes while reducing noise on flat surfaces.

  3. Our proposed AtomGS has achieved state-of-the-art performance on several benchmark datasets, excelling in both rendering quality and geometric precision.

2 Related work

2.1 Novel View Synthesis

Volumetric methods, traditionally utilized for 3D scene representation [Mildenhall et al., 2021; Wang et al., 2022; Wang et al., 2021b; Yen-Chen et al., 2021; Srinivasan et al., 2021; Deng et al., 2020; Jang and Agapito, 2021; Liu et al., 2021; Noguchi et al., 2021], involve subdividing space into discrete units known as voxels, allowing for detailed modeling of internal structures but often suffering from resolution limitations and high computational costs. In contrast, implicit neural representations [Jiang et al., 2020; Wu et al., 2022; Ran et al., 2023] have revolutionized 3D modeling by using continuous mathematical functions, learned by neural networks, to represent complex geometries and appearances without the need for spatial discretization.

Building on these advancements, NeRF [Mildenhall et al., 2021] employs a coordinate-based neural network to encode volumetric scenes, providing unprecedented detail and realism in novel view synthesis, and is particularly effective at capturing complex light interactions and intricate details. Building upon the original NeRF framework, InstantNGP [Müller et al., 2022] introduces a multiresolution hash encoding that efficiently stores and retrieves neural network data, significantly boosting both training and inference speeds while striking a balance between performance and accuracy. On the other hand, Mip-NeRF 360 [Barron et al., 2022] extends NeRF's capabilities to render large-scale, unbounded 360-degree scenes with consistent quality, effectively managing varied lighting conditions in expansive environments.

Turning to alternative rendering solutions, 3DGS [Kerbl et al., 2023] has gained popularity for its ability to enhance visual rendering effects and speed through the optimized use of anisotropic 3D Gaussian ellipsoids and rasterized splatting techniques. Diverging from the original 3DGS, in which Gaussians drift and split freely, Scaffold-GS [Lu et al., 2023] leverages scene structure to guide the distribution of 3D Gaussians, allowing for adaptive modifications based on varying viewing angles and distances. Additionally, Huang et al. [Huang et al., 2024b] focus on analyzing and reducing the artifacts caused by 3DGS errors, aiming to optimize rendering quality. For different levels of detail (LOD) in 3DGS scenes, Yan et al. introduced a multi-scale approach [Yan et al., 2023] to enable selective rendering that yields faster and more precise outcomes.

While the methods above focus on enhancing the visual accuracy of rendered images, they often lack the geometric constraints necessary for high-quality surface reconstruction. In contrast, our method balances visual and geometric accuracy by refining the underlying geometry alignment of Gaussians.

2.2 Multi-View 3D Mesh Reconstruction

Inspired by NeRF [Yen-Chen et al., 2021], NeuS [Wang et al., 2021a] integrates a Signed Distance Function (SDF) into the radiance field to learn a neural SDF representation from multi-view images, thereby accurately representing object surfaces with volumetric rendering. Other concurrent works such as VolSDF [Yariv et al., 2021] and UNISURF [Oechsle et al., 2021] enhance surface reconstruction by improving ray sampling accuracy and simplifying the reconstruction process, respectively. Based on NeuS, Neuralangelo [Li et al., 2023] proposes coarse-to-fine optimization on the hash grids and examines higher-order derivatives to reconstruct surfaces, leading to improved geometry accuracy and fine detail.

While SDF-based methods have greatly enhanced geometric surface reconstruction, they often result in poorer visual rendering performance and reduced reconstruction speeds due to the integration of the SDF. Moreover, sphere initialization [Atzmon and Lipman, 2020] is crucial for model convergence, which limits their applicability on datasets where the object is not centrally located.

Inspired by 3DGS, newer methods tackle precise geometric reconstruction with explicit representation. SuGaR [Guédon and Lepetit, 2023] proposes geometry constraints to regularize 3DGS, achieving geometry improvement, while NeuSG [Chen et al., 2023] introduces a scale regularizer to ensure the accuracy of the reconstructed surfaces by enforcing the 3D Gaussians to be extremely thin.

Despite significant progress in methods for accurate geometric reconstruction, they often lead to a reduction in rendering quality. Our method builds on better alignment between Gaussians and the inherent geometry and designs subsequent optimization processes for improved visual quality.

3 Preliminary for 3D Gaussian Splatting

Proposed as an alternative to NeRF-based methods, 3DGS combines differentiable optimization and non-differentiable adaptive density control for modeling the radiance field.

Ellipsoidal Gaussian Primitive: A 3D Gaussian Primitive is defined as $G_{i} := \{\bm{\mu}, \alpha, \bm{q}, \bm{s}, \bm{f}\}$, where $\bm{\mu} \in \mathbb{R}^{3}$ represents the center of the Gaussian, also referred to as its position; $\alpha \in [0,1]$ signifies its opacity; $\bm{q} \in [-1,1]^{4}$ denotes the quaternion; $\bm{s} \in [0,\infty)^{3}$ stands for the scale; and $\bm{f} \in \mathbb{R}^{d}$ represents its learnable features. The color $\bm{c}$ is determined using spherical harmonics based on $\bm{f}$. The 3D covariance matrix is computed as $\bm{\Sigma} = \bm{R}\bm{S}\bm{S}^{\top}\bm{R}^{\top}$, where $\bm{R}$ and $\bm{S}$ are the rotation and scale matrices derived from $\bm{q}$ and $\bm{s}$, respectively.
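The covariance construction above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the 3DGS implementation: the function names are hypothetical, and the quaternion ordering $(w, x, y, z)$ is an assumption that the text does not specify.

```python
import numpy as np

def quaternion_to_rotation(q):
    """Convert a quaternion, assumed ordered (w, x, y, z), to a 3x3 rotation matrix."""
    q = np.asarray(q, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)  # normalize so the result is a valid rotation
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def covariance_from_qs(q, s):
    """Build Sigma = R S S^T R^T from a quaternion q and a per-axis scale s."""
    R = quaternion_to_rotation(q)
    S = np.diag(np.asarray(s, dtype=float))
    return R @ S @ S.T @ R.T

# With the identity rotation, Sigma is diagonal with the squared scales.
sigma = covariance_from_qs([1.0, 0.0, 0.0, 0.0], [0.5, 1.0, 2.0])
```

Factoring the covariance this way keeps $\bm{\Sigma}$ symmetric and positive semi-definite for any $\bm{q}$ and $\bm{s}$, which is why the rotation and scale, rather than the matrix entries, are the optimized parameters.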

During training, all Gaussian properties are optimized with the loss function $\mathcal{L} = (1-\lambda)\mathcal{L}_{1} + \lambda\mathcal{L}_{D-SSIM}$, where $\mathcal{L}_{1}$ prioritizes pixel-wise accuracy, $\mathcal{L}_{D-SSIM}$ emphasizes structural similarity and perceptual quality, and $\lambda$ serves as the weighting factor.

Adaptive Density Control: This stage inserts, splits, or prunes existing Gaussian primitives to better represent the 3D scene.

  1. SfM Initialization: Given the SfM points, 3DGS calculates, for each point, the mean distance to its three closest points, denoted as $\bm{d} \in [0,\infty)^{i}$, where $i$ is the number of initialized SfM points. It then uses this distance to initialize isotropic Gaussians. Additionally, considering camera centers $\bm{\pi} \in \mathbb{R}^{k}$, where $k$ is the number of training camera poses, 3DGS determines the scene radius using $r = \max(\bm{\pi} - \bar{\pi})$, where $\bar{\pi}$ represents the mean of all camera centers. Subsequently, it sets the scale threshold to $\tau_{s} = 0.01r$, which decides whether to clone or split a Gaussian once the gradient condition is satisfied.

  2. Densification: 3DGS adaptively densifies Gaussians to enhance scene detail capture. This densification process occurs regularly, targeting Gaussians with view-space positional gradients equal to or greater than the gradient threshold $\tau_{p}$. Once the gradient condition $\nabla_{p}\mathcal{L} \geq \tau_{p}$ holds, it checks whether $||\bm{S}||_{\max} \geq \tau_{s}$. If the scale condition is met, the Gaussian is identified as over-reconstructed and split (creating two Gaussians with positions normally sampled based on the original). If not, it is classified as under-reconstructed and cloned into an identical Gaussian.

  3. Prune: The point pruning phase removes redundant or less important Gaussians by deleting those whose opacity $\alpha$ falls below a specified threshold. Moreover, to prevent noisy Gaussians from accumulating near input cameras, the opacity values are periodically reset close to zero after a certain number of iterations. This adjustment facilitates the densification of necessary Gaussians while eliminating redundant ones.
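The clone-or-split rule in the steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the 3DGS code: the function names are hypothetical, camera centers are taken to be a $(k, 3)$ array, $\max(\bm{\pi} - \bar{\pi})$ is interpreted as the largest Euclidean distance to the mean camera center, and the value of $\tau_{p}$ is illustrative.

```python
import numpy as np

def scene_radius(camera_centers):
    """Largest Euclidean distance from any camera center to the mean camera center."""
    centers = np.asarray(camera_centers, dtype=float)
    return np.linalg.norm(centers - centers.mean(axis=0), axis=1).max()

def densify_decision(grad_norm, scales, tau_p, tau_s):
    """Label each Gaussian 'split', 'clone', or 'keep' from its gradient and scale."""
    grad_norm = np.asarray(grad_norm, dtype=float)
    max_scale = np.asarray(scales, dtype=float).max(axis=1)  # ||S||_max per Gaussian
    decision = np.full(grad_norm.shape, "keep", dtype=object)
    needs_densify = grad_norm >= tau_p                        # gradient condition
    decision[needs_densify & (max_scale >= tau_s)] = "split"  # over-reconstructed
    decision[needs_densify & (max_scale < tau_s)] = "clone"   # under-reconstructed
    return decision

# Four cameras on a circle of radius 10 around the origin.
cameras = [[10, 0, 0], [-10, 0, 0], [0, 10, 0], [0, -10, 0]]
r = scene_radius(cameras)
tau_s = 0.01 * r                     # scale threshold from the text
grads = [0.0001, 0.0005, 0.0005]     # view-space positional gradient magnitudes
scales = [[0.5, 0.5, 0.5], [0.5, 0.2, 0.1], [0.05, 0.02, 0.01]]
decisions = densify_decision(grads, scales, tau_p=0.0002, tau_s=tau_s)
# decisions -> ['keep', 'split', 'clone']
```

Note that $\tau_{s}$ here depends only on the camera layout, which is the limitation discussed under Challenges.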

Challenges: 3DGS presents a hybrid optimization approach by integrating differentiable backpropagation with non-differentiable adaptive density control. However, it faces several challenges that limit its effectiveness. First, the optimization process lacks prioritization: the method may focus on enlarging large Gaussians instead of densifying smaller ones to fill in gaps in geometry, or it might replicate transparent Gaussians to mimic a solid surface rather than enhancing the alignment and opacity of existing ones. Second, the absence of geometric regularization leads to misalignment of Gaussians with the underlying geometry, creating noisy artifacts that require opacity adjustments to clean up. Lastly, the simplistic scale threshold depends more on the camera pose radius than on scene complexity, which restricts the method's ability to finely tune the splitting and cloning of Gaussians based on the detail needed for effective scene representation. Consequently, while 3DGS produces a high-quality RGB radiance field, it may not adhere to the underlying geometric structures, which leads to noisy 3D meshes, blurry artifacts, and slower convergence.

4 Proposed Method

To resolve the aforementioned issues, we propose a two-part approach. First, Atomized Proliferation is introduced to enhance geometric precision in areas with intricate details. Second, a geometry-guided optimization is used to compactly model smooth surfaces while retaining enough primitives for fine details.

4.1 Atomized Proliferation

When handling SfM points, 3DGS alternates between densification and optimization to enhance scene representation. In contrast, our method initially constrains Gaussians that represent fine details into Atom Gaussians and prioritizes their proliferation to quickly align with the scene's inherent geometry. This is followed by a pruning strategy that merges the Gaussians representing large and smooth surfaces while preserving those representing detailed complexities. Figure 2 compares the resulting Gaussians at 7k iterations between our method and 3DGS.

Atom Gaussian Primitive: Our process begins by analyzing the input SfM points to establish the Atom scale $\mathcal{S}_{a} = P_{1}(\bm{d})$, calculated from the first percentile of ordered distances. This scale distinguishes between Gaussians capturing fine details (Atom Gaussians) and those covering broader background elements (traditional Gaussians). Atom Gaussians are distinct in being isotropic spheroids with a uniform size ($\bm{s}_{1} = \bm{s}_{2} = \dots = \bm{s}_{i} = (\mathcal{S}_{a})^{3}$), in contrast to traditional Gaussians, which are anisotropic ellipsoids of varying sizes. The constant size of Atom Gaussians imposes the priority of densifying them to accurately fill gaps in the geometry, rather than optimizing large Gaussians that merely cover these voids. The uniform size of Atom Gaussians ensures a closer alignment with the actual 3D geometry of the scene, leading to more accurate surface representations.

Atomization: This step regularly checks the condition $||\bm{S}||_{\min} \leq \mathcal{S}_{a}$. If it is met, the Gaussian is designated as an Atom Gaussian with its size set to $\bm{s} = (\mathcal{S}_{a})^{3}$. Once Gaussians are categorized as Atom Gaussians, their scales are fixed and no longer optimized through backpropagation; instead, they follow a geometric progression $\mathcal{S}_{an} = \mathcal{S}_{a} \cdot r^{n-1}$, where $\mathcal{S}_{an}$ is the Atom scale at the $n$-th atomization iteration, $r = \sqrt[t_{a}]{p}$, $t_{a}$ represents the total number of atomization iterations, and $p$ is the final proportion relative to the initial $\mathcal{S}_{a}$. This ensures that the Atom scale decreases over iterations, progressively enhancing the representation of fine details.
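The Atom scale and its decay schedule can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, the progression is assumed to start from the initial Atom scale $\mathcal{S}_{a}$, and the values of $p$ and $t_{a}$ in the example are placeholders rather than our settings.

```python
import numpy as np

def atom_scale(nn_distances, percentile=1.0):
    """Initial Atom scale: a low percentile (the first, by default) of the SfM distances d."""
    return float(np.percentile(np.asarray(nn_distances, dtype=float), percentile))

def atom_scale_schedule(s_a, p, t_a):
    """Scales s_a * r**(n-1) for n = 1..t_a, with r = p**(1/t_a)."""
    r = p ** (1.0 / t_a)
    n = np.arange(1, t_a + 1)
    return s_a * r ** (n - 1)

def atomize(scales, s_a):
    """Set every Gaussian whose smallest axis is <= s_a to the isotropic scale s_a."""
    scales = np.array(scales, dtype=float)  # copy so the input is left untouched
    is_atom = scales.min(axis=1) <= s_a
    scales[is_atom] = s_a                   # all three axes become s_a
    return scales, is_atom

schedule = atom_scale_schedule(s_a=1.0, p=0.5, t_a=4)  # starts at 1.0 and shrinks by r each step
new_scales, is_atom = atomize([[0.01, 0.5, 0.3], [0.2, 0.4, 0.3]], s_a=0.05)
# is_atom -> [True, False]; the first Gaussian becomes [0.05, 0.05, 0.05]
```

With the exponent $n-1$, the last scheduled value is $\mathcal{S}_{a} \cdot p / r$, so the scale reaches the proportion $p$ one step after the final scheduled iteration.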

Densification: This step is similar to the original densification strategy, with a modification that allows larger Gaussians to be cloned. Moreover, we apply a linear warm-up to the split gradient threshold, increasing the probability that a Gaussian will divide into smaller Atom Gaussians. Together with atomization, this strategy primarily aims to bridge the gaps in geometry.

Prune: This step is similar to the low-opacity removal method, but we reset opacity more frequently during training. The focus is to merge Gaussians that depict large, simple surfaces, instead of eliminating the noisy ones near the camera. This step concludes the Atom Gaussian strategy, allowing Gaussians to adapt their scales according to the complexity of the scene. Through this refinement process, we retain Gaussians that capture intricate high-frequency details, while merging those that represent broad and smooth surfaces.

4.2 Geometry-Guided Optimization

Figure 3: Illustration of Edge-Aware Normal Loss: (a) Normal, (b) Curvature, (c) Edge, (d) Normal Loss. The normal map is rendered from the 3DGS 30k result. Based on that, we compute the normal loss map, showing the areas that our proposed loss could optimize.

To address the issue of Gaussians not always representing actual geometric structures, we use a geometry-guided optimization, which comprises our proposed Edge-Aware Normal Loss and a revised multi-scale SSIM loss. This optimization method ensures that enhancements focus on maintaining geometric accuracy without affecting the fidelity of the RGB field.

We first compute the normal map, $\bm{N}$, which visually represents the surface orientations derived from a radiance field's geometry (shown in Figure 3a). This is calculated from an unprojected depth map, $\bm{D} \in \mathbb{R}^{H \times W \times 3}$, using the cross product of the depth map's gradients:

$$\bm{N} = \frac{\nabla_{x}\bm{D} \times \nabla_{y}\bm{D}}{||\nabla_{x}\bm{D} \times \nabla_{y}\bm{D}||} \tag{1}$$
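Equation (1) can be sketched with finite differences as follows. This is a minimal sketch, assuming the unprojected depth map is stored as an $(H, W, 3)$ array of 3D points and that `np.gradient` is an acceptable stand-in for the image-space gradients; the function name is hypothetical.

```python
import numpy as np

def normals_from_points(points, eps=1e-12):
    """Unit normals from an (H, W, 3) map of unprojected 3D points, following Eq. (1)."""
    points = np.asarray(points, dtype=float)
    d_dx = np.gradient(points, axis=1)   # finite difference along image x (columns)
    d_dy = np.gradient(points, axis=0)   # finite difference along image y (rows)
    cross = np.cross(d_dx, d_dy)         # cross product over the last (xyz) axis
    norm = np.linalg.norm(cross, axis=-1, keepdims=True)
    return cross / np.maximum(norm, eps) # guard against division by zero

# A fronto-parallel plane at depth 5: every normal points along +z.
ys, xs = np.mgrid[0:4, 0:5]
plane = np.stack([xs, ys, np.full_like(xs, 5)], axis=-1).astype(float)
normals = normals_from_points(plane)
```

The sign of the normal depends on the handedness of the image and camera axes, so a real pipeline may need to flip it to face the camera.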

Following this, the curvature map $|\nabla\bm{N}| \in [0,1]^{H \times W}$ is derived, representing the gradient magnitude of the normal map (shown in Figure 3b). This map indicates the rate of change in surface normals, with higher values suggesting greater variability and lower values indicating smoothness. To enhance the geometric smoothness, one could optimize the curvature map. However, this might inadvertently smooth out high-frequency details such as sharp edges or fine structures, leading to an oversmoothed appearance.

To mitigate this, an edge map $|\nabla\bm{I}| \in [0,1]^{H \times W}$, derived from the gradient magnitude of the ground truth RGB image, is also computed (shown in Figure 3c). The edge map helps preserve high-frequency details by excluding them from the smoothing process applied to the curvature map, thus maintaining essential geometric features on flat surfaces.

A weight function $\omega(x) = (x-1)^{q}$, where $q$ is a positive even integer, is introduced to finely balance the influence of the edge map on the curvature map. This weighting function is designed such that regions with low gradients (smooth areas) receive higher weights, promoting more smoothing, whereas regions with high gradients (sharp edges) receive lower weights to preserve detail. The tolerance $q$ determines the level of sensitivity to gradients, effectively controlling the extent to which details are either preserved or smoothed out.
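As a quick check of how the weight behaves, the values below evaluate $\omega$ at three edge strengths. The choices $q = 2$ and $q = 4$ are illustrative assumptions and not necessarily the settings used in our experiments.

```latex
\begin{align*}
q = 2 &: \quad \omega(0) = 1, \quad \omega(0.5) = (-0.5)^{2} = 0.25, \quad \omega(1) = 0, \\
q = 4 &: \quad \omega(0) = 1, \quad \omega(0.5) = (-0.5)^{4} = 0.0625, \quad \omega(1) = 0.
\end{align*}
```

For $x \in [0,1]$ and even $q$, the weight stays in $[0,1]$ and decreases monotonically, and a larger $q$ suppresses the weight at moderate edge strengths more strongly.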

The Edge-Aware Normal Loss (shown in Figure 3d) is formulated as follows:

$$\mathcal{L}_{Normal} = \frac{1}{HW} \sum_{i}^{H} \sum_{j}^{W} |\nabla\bm{N}| \otimes \omega(|\nabla\bm{I}|) \tag{2}$$
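The loss can be sketched in NumPy as follows. This is a simplified illustration under several assumptions: gradients are central finite differences rather than the output of a dedicated 2D edge detector, each map is rescaled to $[0,1]$ by its maximum, $q = 2$ is a placeholder, and the function names are hypothetical.

```python
import numpy as np

def gradient_magnitude(img):
    """Per-pixel gradient magnitude of an (H, W) or (H, W, C) map, rescaled to [0, 1]."""
    img = np.asarray(img, dtype=float)
    if img.ndim == 2:
        img = img[..., None]                 # treat grayscale as a single channel
    gy = np.gradient(img, axis=0)
    gx = np.gradient(img, axis=1)
    mag = np.sqrt((gx ** 2 + gy ** 2).sum(axis=-1))
    peak = mag.max()
    return mag / peak if peak > 0 else mag   # constant maps stay all-zero

def edge_aware_normal_loss(normals, image, q=2):
    """Mean over pixels of |grad N| * omega(|grad I|), with omega(x) = (x - 1)**q."""
    curvature = gradient_magnitude(normals)  # |grad N|
    edges = gradient_magnitude(image)        # |grad I|
    weight = (edges - 1.0) ** q              # high on smooth regions, low on edges
    return float((curvature * weight).mean())

# A crease in the normals at column 4 of an 8x8 map.
normals = np.zeros((8, 8, 3))
normals[:, :4] = [0.0, 0.0, 1.0]
normals[:, 4:] = [1.0, 0.0, 0.0]
flat_image = np.zeros((8, 8))                # no image edge: the crease is penalized
edge_image = np.zeros((8, 8))
edge_image[:, 4:] = 1.0                      # image edge at the crease: penalty removed
loss_flat = edge_aware_normal_loss(normals, flat_image)
loss_edge = edge_aware_normal_loss(normals, edge_image)
```

In this toy case the crease is penalized when the image is featureless and left alone when it coincides with an image edge, which is the behavior the weighting is meant to produce.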

For improved perceptual performance, we have replaced the SSIM loss with a Multi-Scale SSIM (MS-SSIM) loss [Wang et al., 2003] to capture a richer variety of camera view variations. The composite loss function is formulated as follows:

$$\mathcal{L} = (1-\lambda_{SSIM})\mathcal{L}_{1} + \lambda_{SSIM}\mathcal{L}_{MS-SSIM} + \lambda_{Normal}\mathcal{L}_{Normal}, \tag{3}$$

where $\lambda_{SSIM}$ and $\lambda_{Normal}$ are hyperparameters that determine the respective contributions of $\mathcal{L}_{MS-SSIM}$ and $\mathcal{L}_{Normal}$ to the overall loss function.

5 Experiments

Figure 4: Radiance Field Comparison on the Mip-NeRF360 Dataset.

This section presents comprehensive evaluations of AtomGS, comparing its rendering quality and 3D geometry precision against previous state-of-the-art methods. In our quantitative tables, a dashed line distinguishes between methods that explicitly represent the scene as Gaussian primitives and non-explicit methods that encode appearance or geometric information within a network. We adopt the metrics reported in the original papers whenever possible for consistency and comparability.

Datasets and Metrics: In our experiments, we evaluated the proposed AtomGS method using three datasets: Mip-NeRF360 [Barron et al., 2022], Tanks&Temples [Knapitsch et al., 2017], and DTU [Aanæs et al., 2016]. We assessed rendering quality on the Mip-NeRF360 and Tanks&Temples datasets using three commonly employed metrics: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM) [Wang et al., 2004], and Learned Perceptual Image Patch Similarity (LPIPS) [Zhang et al., 2018]. Additionally, we measured the geometry precision using the chamfer distance on the DTU dataset.
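For reference, two of these metrics have compact generic definitions, sketched below. This is not the evaluation code behind the reported numbers: the function names are hypothetical, images are assumed to lie in $[0,1]$, and the chamfer distance is a brute-force symmetric version without the masking and sampling steps that benchmark evaluation scripts typically apply.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB for images with values in [0, max_val]."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_val ** 2 / mse))

def chamfer_distance(a, b):
    """Symmetric chamfer distance between point sets a (Na, 3) and b (Nb, 3)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (Na, Nb) pairwise distances
    # Mean nearest-neighbor distance in each direction, averaged over both directions.
    return float(0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean()))

value_db = psnr(np.full((4, 4), 0.6), np.full((4, 4), 0.5))  # uniform error of 0.1
cd = chamfer_distance([[0.0, 0.0, 0.0]], [[3.0, 4.0, 0.0]])  # two single-point sets
```

The pairwise distance matrix costs $O(N_a N_b)$ memory, so dense meshes would need a spatial index such as a k-d tree instead.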

2D Rendering Quality: Table 1 provides a quantitative evaluation of rendering quality for the selected methods. Our approach surpasses all other explicit methods in the three key metrics and consistently maintains a top-two performance among all methods evaluated. Figure 4 offers a qualitative comparison of our approach against two other explicit methods: 3DGS and SuGaR. In the flower scene, both 3DGS and SuGaR show varying levels of blurriness in areas with high-frequency details, while our method maintains sharpness. In the kitchen scene, 3DGS presents noticeable artifacts close to the camera. Although SuGaR enhances surface smoothness and reduces noise, it causes unsightly distortions in the geometry of the table mat. In contrast, our method not only reproduces the smooth surface of the kitchen table but also preserves the intricate details of both the table mat and the Lego.

3D Geometry Precision: Table 2 presents a quantitative comparison between our approach and other methods aimed at improving 3D reconstruction accuracy. Our approach not only surpasses other explicit methods in geometric precision but also competes favorably with SDF-based implicit methods. Note that, compared with implicit methods, our approach also benefits from faster training speeds, thereby increasing its practical applicability in real-world scenarios. Figure 5 presents a qualitative comparison of mesh reconstruction using 3DGS, SuGaR, and our method on DTU scene 24. SuGaR tends to generate flat disks due to its regularization approach; however, these disks are not always perfectly aligned with the underlying geometry. NeuS employs a signed distance function, resulting in smoother surfaces, but can also lose high-frequency details due to strong smoothness priors. In contrast, our method maintains a balance between smoothness and detail preservation.

Ablation Study: We conducted three ablation studies on the Tanks&Temples dataset [Knapitsch et al.(2017)Knapitsch, Park, Zhou, and Koltun] to evaluate the impact of different configurations on our model’s performance. The first configuration removed Atomized Proliferation from our model. In the other two, we kept Atomized Proliferation but discarded one of the two loss functions during training. The results from these configurations and the full model are shown in Table 3. The results indicate that Atomized Proliferation is crucial to our model’s performance: excluding it resulted in the most significant performance decline. This suggests that simply incorporating geometry regularization in the loss functions tends to merely approximate the scene with elongated Gaussian ellipsoids instead of aligning accurately with the underlying geometry. Additionally, when each of the two loss components was individually removed, there was a slight decrease in performance. This demonstrates the importance of each component in achieving optimal results.

| Type | Methods | Mip-NeRF360 PSNR ↑ | Mip-NeRF360 SSIM ↑ | Mip-NeRF360 LPIPS ↓ | Tanks&Temples PSNR ↑ | Tanks&Temples SSIM ↑ | Tanks&Temples LPIPS ↓ |
|---|---|---|---|---|---|---|---|
| non-explicit | Plenoxels [Fridovich-Keil et al.(2022)Fridovich-Keil, Yu, Tancik, Chen, Recht, and Kanazawa] | 23.08 | 0.626 | 0.463 | 21.08 | 0.719 | 0.379 |
| non-explicit | Instant-NGP [Müller et al.(2022)Müller, Evans, Schied, and Keller] | 25.59 | 0.699 | 0.331 | 21.92 | 0.745 | 0.305 |
| non-explicit | Mip-NeRF360 [Barron et al.(2022)Barron, Mildenhall, Verbin, Srinivasan, and Hedman] | 27.69 | 0.792 | 0.237 | 22.22 | 0.759 | 0.257 |
| non-explicit | TRIPS [Franke et al.(2024)Franke, Rückert, Fink, and Stamminger] | 25.94 | 0.772 | 0.233 | 24.64 | 0.808 | 0.213 |
| explicit | 3DGS [Kerbl et al.(2023)Kerbl, Kopanas, Leimkuehler, and Drettakis] | 27.21 | 0.815 | 0.214 | 23.14 | 0.841 | 0.183 |
| explicit | SuGaR [Guédon and Lepetit(2023)] | 25.51 | 0.756 | 0.268 | 22.68 | 0.794 | 0.217 |
| explicit | 2DGS [Huang et al.(2024a)Huang, Yu, Chen, Geiger, and Gao] | 27.02 | 0.804 | 0.238 | - | - | - |
| explicit | GES [Hamdi et al.(2024)Hamdi, Melas-Kyriazi, Qian, Mai, Liu, Vondrick, Ghanem, and Vedaldi] | 26.91 | 0.794 | 0.250 | 23.35 | 0.836 | 0.198 |
| explicit | AtomGS (Ours) | 27.38 | 0.816 | 0.211 | 23.70 | 0.849 | 0.166 |

Table 1: Quantitative evaluation of 2D rendering results on the Mip-NeRF360 and Tanks&Temples datasets.
| Type | Methods | 24 | 37 | 40 | 55 | 63 | 65 | 69 | 83 | 97 | 105 | 106 | 110 | 114 | 118 | 122 | Mean ↓ | Train |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| implicit | NeRF [Mildenhall et al.(2021)Mildenhall, Srinivasan, Tancik, Barron, Ramamoorthi, and Ng] | 1.90 | 1.60 | 1.85 | 0.58 | 2.28 | 1.27 | 1.47 | 1.67 | 2.05 | 1.07 | 0.88 | 2.53 | 1.06 | 1.15 | 0.96 | 1.49 | ~4h |
| implicit | VolSDF [Yariv et al.(2021)Yariv, Gu, Kasten, and Lipman] | 1.14 | 1.26 | 0.81 | 0.49 | 1.25 | 0.7 | 0.72 | 1.29 | 1.18 | 0.7 | 0.66 | 1.08 | 0.42 | 0.61 | 0.55 | 0.86 | ~6h |
| implicit | NeuS [Wang et al.(2021a)Wang, Liu, Liu, Theobalt, Komura, and Wang] | 1.00 | 1.37 | 0.93 | 0.43 | 1.10 | 0.65 | 0.57 | 1.48 | 1.09 | 0.83 | 0.52 | 1.2 | 0.35 | 0.49 | 0.54 | 0.84 | ~6h |
| implicit | Neuralangelo [Li et al.(2023)Li, Müller, Evans, Taylor, Unberath, Liu, and Lin] | 0.37 | 0.72 | 0.35 | 0.35 | 0.87 | 0.54 | 0.53 | 1.29 | 0.97 | 0.73 | 0.47 | 0.74 | 0.32 | 0.41 | 0.43 | 0.61 | ~12h |
| explicit | 3DGS [Kerbl et al.(2023)Kerbl, Kopanas, Leimkuehler, and Drettakis] | 2.14 | 1.53 | 2.08 | 1.68 | 3.49 | 2.21 | 1.43 | 2.07 | 2.22 | 1.75 | 1.79 | 2.55 | 1.53 | 1.52 | 1.50 | 1.96 | 0.19h |
| explicit | SuGaR [Guédon and Lepetit(2023)] | 1.47 | 1.33 | 1.13 | 0.61 | 2.25 | 1.71 | 1.15 | 1.63 | 1.62 | 1.07 | 0.79 | 2.45 | 0.98 | 0.88 | 0.79 | 1.33 | 1.28h |
| explicit | 2DGS [Huang et al.(2024a)Huang, Yu, Chen, Geiger, and Gao] | 0.48 | 0.91 | 0.39 | 0.39 | 1.01 | 0.83 | 0.81 | 1.36 | 1.27 | 0.76 | 0.7 | 1.40 | 0.40 | 0.76 | 0.52 | 0.80 | 0.31h |
| explicit | AtomGS (Ours) | 0.51 | 0.77 | 0.53 | 0.4 | 1.07 | 0.81 | 0.87 | 1.21 | 1.14 | 0.47 | 0.70 | 1.36 | 0.36 | 0.58 | 0.43 | 0.75 | 0.07h |

Table 2: Quantitative evaluation of 3D geometry precision on the DTU dataset. Columns 24 to 122 are DTU scan IDs; Train is the training time.
Figure 5: Mesh Comparison on the DTU Dataset. Panels: (a) Ours, (b) NeuS, (c) SuGaR.
| Configuration (Tanks&Temples) | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|
| No atomization | 23.25 | 0.815 | 0.228 |
| No $L_{normal}$ | 23.58 | 0.840 | 0.181 |
| No $L_{ms-ssim}$ | 23.58 | 0.837 | 0.185 |
| Full model | 23.70 | 0.849 | 0.166 |

Table 3: Ablation Study.

6 Conclusion

In this paper, we introduced AtomGS, an approach that enhances radiance field reconstruction by focusing on uniform densification through Atomized Proliferation and refining surface details via Geometry-Guided Optimization. Our approach significantly reduces noisy geometry and blurry artifacts that are common in the previous 3DGS methods.

Nonetheless, AtomGS has its own limitations. Like previous methods, it may not produce accurate geometry for highly specular or semi-transparent materials. While our method generally requires fewer GS primitives than the original 3DGS method to achieve improved visual quality, our proliferation strategy can sometimes produce more GS primitives in order to represent all details in highly complex environments. In the future, we aim to develop an improved pruning strategy to achieve a more compact result.

References

  • [Aanæs et al.(2016)Aanæs, Jensen, Vogiatzis, Tola, and Dahl] Henrik Aanæs, Rasmus Ramsbøl Jensen, George Vogiatzis, Engin Tola, and Anders Bjorholm Dahl. Large-scale data for multiple-view stereopsis. International Journal of Computer Vision, pages 1–16, 2016.
  • [Atzmon and Lipman(2020)] Matan Atzmon and Yaron Lipman. Sal: Sign agnostic learning of shapes from raw data. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
  • [Barron et al.(2022)Barron, Mildenhall, Verbin, Srinivasan, and Hedman] Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields, 2022.
  • [Chen et al.(2023)Chen, Li, and Lee] Hanlin Chen, Chen Li, and Gim Hee Lee. Neusg: Neural implicit surface reconstruction with 3d gaussian splatting guidance, 2023.
  • [Deng et al.(2020)Deng, Lewis, Jeruzalski, Pons-Moll, Hinton, Norouzi, and Tagliasacchi] Boyang Deng, John P Lewis, Timothy Jeruzalski, Gerard Pons-Moll, Geoffrey Hinton, Mohammad Norouzi, and Andrea Tagliasacchi. Nasa neural articulated shape approximation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VII 16, pages 612–628. Springer, 2020.
  • [Franke et al.(2024)Franke, Rückert, Fink, and Stamminger] Linus Franke, Darius Rückert, Laura Fink, and Marc Stamminger. Trips: Trilinear point splatting for real-time radiance field rendering. In Computer Graphics Forum, page e15012. Wiley Online Library, 2024.
  • [Fridovich-Keil et al.(2022)Fridovich-Keil, Yu, Tancik, Chen, Recht, and Kanazawa] Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5501–5510, 2022.
  • [Guédon and Lepetit(2023)] Antoine Guédon and Vincent Lepetit. Sugar: Surface-aligned gaussian splatting for efficient 3d mesh reconstruction and high-quality mesh rendering, 2023.
  • [Hamdi et al.(2024)Hamdi, Melas-Kyriazi, Qian, Mai, Liu, Vondrick, Ghanem, and Vedaldi] Abdullah Hamdi, Luke Melas-Kyriazi, Guocheng Qian, Jinjie Mai, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, and Andrea Vedaldi. Ges: Generalized exponential splatting for efficient radiance field rendering. arXiv preprint arXiv:2402.10128, 2024.
  • [Huang et al.(2024a)Huang, Yu, Chen, Geiger, and Gao] Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically accurate radiance fields. arXiv preprint arXiv:2403.17888, 2024a.
  • [Huang et al.(2024b)Huang, Bai, Guo, Li, and Guo] Letian Huang, Jiayang Bai, Jie Guo, Yuanqi Li, and Yanwen Guo. On the error analysis of 3d gaussian splatting and an optimal projection strategy, 2024b.
  • [Jang and Agapito(2021)] Wonbong Jang and Lourdes Agapito. Codenerf: Disentangled neural radiance fields for object categories. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12949–12958, 2021.
  • [Jiang et al.(2020)Jiang, Sud, Makadia, Huang, Nießner, Funkhouser, et al.] Chiyu Jiang, Avneesh Sud, Ameesh Makadia, Jingwei Huang, Matthias Nießner, Thomas Funkhouser, et al. Local implicit grid representations for 3d scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6001–6010, 2020.
  • [Kazhdan et al.(2006)Kazhdan, Bolitho, and Hoppe] Michael Kazhdan, Matthew Bolitho, and Hugues Hoppe. Poisson surface reconstruction. In Proceedings of the fourth Eurographics symposium on Geometry processing, volume 7, 2006.
  • [Kerbl et al.(2023)Kerbl, Kopanas, Leimkuehler, and Drettakis] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkuehler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics (TOG), 42(4):1–14, 2023.
  • [Knapitsch et al.(2017)Knapitsch, Park, Zhou, and Koltun] Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics, 36(4), 2017.
  • [Li et al.(2023)Li, Müller, Evans, Taylor, Unberath, Liu, and Lin] Zhaoshuo Li, Thomas Müller, Alex Evans, Russell H Taylor, Mathias Unberath, Ming-Yu Liu, and Chen-Hsuan Lin. Neuralangelo: High-fidelity neural surface reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8456–8465, 2023.
  • [Liu et al.(2021)Liu, Zhang, Zhang, Zhang, Zhu, and Russell] Steven Liu, Xiuming Zhang, Zhoutong Zhang, Richard Zhang, Jun-Yan Zhu, and Bryan Russell. Editing conditional radiance fields. In Proceedings of the IEEE/CVF international conference on computer vision, pages 5773–5783, 2021.
  • [Lu et al.(2023)Lu, Yu, Xu, Xiangli, Wang, Lin, and Dai] Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, and Bo Dai. Scaffold-gs: Structured 3d gaussians for view-adaptive rendering, 2023.
  • [Mildenhall et al.(2021)Mildenhall, Srinivasan, Tancik, Barron, Ramamoorthi, and Ng] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
  • [Müller et al.(2022)Müller, Evans, Schied, and Keller] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (ToG), 41(4):1–15, 2022.
  • [Noguchi et al.(2021)Noguchi, Sun, Lin, and Harada] Atsuhiro Noguchi, Xiao Sun, Stephen Lin, and Tatsuya Harada. Neural articulated radiance field. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5762–5772, 2021.
  • [Oechsle et al.(2021)Oechsle, Peng, and Geiger] Michael Oechsle, Songyou Peng, and Andreas Geiger. Unisurf: Unifying neural implicit surfaces and radiance fields for multi-view reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5589–5599, 2021.
  • [Ran et al.(2023)Ran, Zeng, He, Chen, Li, Chen, Lee, and Ye] Yunlong Ran, Jing Zeng, Shibo He, Jiming Chen, Lincheng Li, Yingfeng Chen, Gimhee Lee, and Qi Ye. Neurar: Neural uncertainty for autonomous 3d reconstruction with implicit neural representations. IEEE Robotics and Automation Letters, 8(2):1125–1132, 2023.
  • [Srinivasan et al.(2021)Srinivasan, Deng, Zhang, Tancik, Mildenhall, and Barron] Pratul P Srinivasan, Boyang Deng, Xiuming Zhang, Matthew Tancik, Ben Mildenhall, and Jonathan T Barron. Nerv: Neural reflectance and visibility fields for relighting and view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7495–7504, 2021.
  • [Wang et al.(2022)Wang, Chai, He, Chen, and Liao] Can Wang, Menglei Chai, Mingming He, Dongdong Chen, and Jing Liao. Clip-nerf: Text-and-image driven manipulation of neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3835–3844, 2022.
  • [Wang et al.(2021a)Wang, Liu, Liu, Theobalt, Komura, and Wang] Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang. Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. arXiv preprint arXiv:2106.10689, 2021a.
  • [Wang et al.(2003)Wang, Simoncelli, and Bovik] Zhou Wang, Eero P Simoncelli, and Alan C Bovik. Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, volume 2, pages 1398–1402. IEEE, 2003.
  • [Wang et al.(2004)Wang, Bovik, Sheikh, and Simoncelli] Zhou Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004. 10.1109/TIP.2003.819861.
  • [Wang et al.(2021b)Wang, Wu, Xie, Chen, and Prisacariu] Zirui Wang, Shangzhe Wu, Weidi Xie, Min Chen, and Victor Adrian Prisacariu. Nerf--: Neural radiance fields without known camera parameters. arXiv preprint arXiv:2102.07064, 2021b.
  • [Wu et al.(2022)Wu, Liu, Chen, Li, Zheng, Cai, and Zheng] Qianyi Wu, Xian Liu, Yuedong Chen, Kejie Li, Chuanxia Zheng, Jianfei Cai, and Jianmin Zheng. Object-compositional neural implicit surfaces. In European Conference on Computer Vision, pages 197–213. Springer, 2022.
  • [Yan et al.(2023)Yan, Low, Chen, and Lee] Zhiwen Yan, Weng Fei Low, Yu Chen, and Gim Hee Lee. Multi-scale 3d gaussian splatting for anti-aliased rendering. arXiv preprint arXiv:2311.17089, 2023.
  • [Yariv et al.(2021)Yariv, Gu, Kasten, and Lipman] Lior Yariv, Jiatao Gu, Yoni Kasten, and Yaron Lipman. Volume rendering of neural implicit surfaces, 2021.
  • [Yen-Chen et al.(2021)Yen-Chen, Florence, Barron, Rodriguez, Isola, and Lin] Lin Yen-Chen, Pete Florence, Jonathan T Barron, Alberto Rodriguez, Phillip Isola, and Tsung-Yi Lin. inerf: Inverting neural radiance fields for pose estimation. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1323–1330. IEEE, 2021.
  • [Zhang et al.(2018)Zhang, Isola, Efros, Shechtman, and Wang] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018.

Appendix A Atomized Proliferation Algorithm

The Atomized Proliferation algorithm is summarized in Algorithm 1. It starts by setting the Clone Threshold ($\tau_c$), Split Threshold ($\tau_s$), Prune Threshold ($\epsilon$), and Atom Scale ($\mathcal{S}_a$), and by defining the duration limits for atomized proliferation ($t_a$) and the warm-up phase ($t_w$). The algorithm iterates over each Gaussian ($\bm{\mu}, \bm{\Sigma}, \bm{c}, \alpha$) in the Gaussian set ($\bm{M}, \bm{S}, \bm{C}, \bm{A}$). A Gaussian is pruned if its opacity $\alpha$ falls below the threshold $\epsilon$ or its covariance ($\bm{\Sigma}$) is excessively large. If the gradient of the loss ($\nabla_p \mathcal{L}$) reaches $\tau_c$, the Gaussian is cloned to potentially bridge geometry gaps. Additionally, the Gaussian is split when $\nabla_p \mathcal{L}$ reaches a dynamically adjusted threshold, $\min\left(\frac{i}{t_w}\tau_s, \tau_s\right)$ at training iteration $i$, which rises with the warm-up progress, and when its maximum scale norm $\|\bm{S}\|_{\max}$ exceeds the Atom Scale ($\mathcal{S}_a$). Atomization takes place when the minimum scale norm $\|\bm{S}\|_{\min}$ is less than or equal to the Atom Scale $\mathcal{S}_a$ and the iteration is within the proliferation timeframe ($i < t_a$), ensuring detail refinement before the proliferation endpoint.

Algorithm 1 Atomized Proliferation
Require: Clone Threshold $\tau_c$, Split Threshold $\tau_s$, Prune Threshold $\epsilon$, Atom Scale $\mathcal{S}_a$, Atomized Proliferation until $t_a$, Warm-Up until $t_w$
  for all Gaussian $(\bm{\mu}, \bm{\Sigma}, \bm{c}, \alpha)$ in $(\bm{M}, \bm{S}, \bm{C}, \bm{A})$ do
     if $\alpha < \epsilon$ or IsTooLarge$(\bm{\Sigma})$ then
        Prune$(\bm{\mu}, \bm{\Sigma}, \bm{c}, \alpha)$
     end if
     if $\nabla_p \mathcal{L} \geq \tau_c$ then
        Clone$(\bm{\mu}, \bm{\Sigma}, \bm{c}, \alpha)$
     end if
     if $\nabla_p \mathcal{L} \geq \min\left(\frac{i}{t_w}\tau_s, \tau_s\right)$ and $\|\bm{S}\|_{\max} > \mathcal{S}_a$ then
        Split$(\bm{\mu}, \bm{\Sigma}, \bm{c}, \alpha)$
     end if
     if $\|\bm{S}\|_{\min} \leq \mathcal{S}_a$ and $i < t_a$ then
        Atomize$(\bm{\mu}, \bm{\Sigma}, \bm{c}, \alpha)$
     end if
  end for
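The per-Gaussian decision logic of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative sketch, not the released implementation. The names `ProliferationConfig`, `split_threshold`, and `proliferation_actions` are hypothetical. The `max_scale` bound stands in for the unspecified IsTooLarge test, the scale norms are read as the largest and smallest per-axis scale, the default `atom_scale` is a placeholder, and returning early after a prune is our assumption. The threshold defaults follow the values reported in Appendix C.

```python
from dataclasses import dataclass


@dataclass
class ProliferationConfig:
    tau_c: float = 0.002      # clone threshold
    tau_s: float = 0.002      # split threshold
    eps: float = 0.005        # prune threshold on opacity
    atom_scale: float = 0.01  # S_a; placeholder value, scene-dependent in practice
    t_a: int = 7000           # atomized proliferation runs until this iteration
    t_w: int = 7000           # warm-up runs until this iteration
    max_scale: float = 1.0    # hypothetical stand-in for IsTooLarge


def split_threshold(i, cfg):
    """Warm-up schedule: ramps linearly from 0 to tau_s over the first t_w iterations."""
    return min(i / cfg.t_w * cfg.tau_s, cfg.tau_s)


def proliferation_actions(grad, alpha, scales, i, cfg):
    """Return the actions Algorithm 1 would apply to one Gaussian at iteration i.

    grad   -- magnitude of the positional loss gradient
    alpha  -- opacity
    scales -- per-axis scales (sx, sy, sz)
    """
    if alpha < cfg.eps or max(scales) > cfg.max_scale:
        return ["prune"]  # assumption: a pruned Gaussian is not processed further
    actions = []
    if grad >= cfg.tau_c:
        actions.append("clone")
    if grad >= split_threshold(i, cfg) and max(scales) > cfg.atom_scale:
        actions.append("split")
    if min(scales) <= cfg.atom_scale and i < cfg.t_a:
        actions.append("atomize")
    return actions
```

Early in training the split threshold is close to zero, so almost any Gaussian larger than the Atom Scale is split; by iteration $t_w$ the threshold has risen to $\tau_s$.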

Appendix B Gaussian Proliferation Trend

In Figure 6, we illustrate the Gaussian Proliferation Trend, which tracks the count of Gaussians across iterations for nine different scenes within the Mip-NeRF360 dataset. The depicted curve represents the average number of Gaussians across these scenes, with the curve’s width indicating the standard deviation. The fluctuations observed highlight the effectiveness of the opacity resetting strategy in eliminating redundant Gaussians. Initially, the 3DGS method struggles to densify Gaussians, as indicated by an increasing standard deviation, and it appears unable to stabilize by the end of the proliferation stage. In contrast, our method employs a warm-up strategy that aggressively densifies Gaussians at the initial stage, followed by a phase where Gaussians begin to merge, leading to a declining and stabilizing trend in Gaussian proliferation.

On the Mip-NeRF360 dataset, our method demonstrates efficiency with an average training time of 0.28 hours and a final model size of 749MB, compared to 3DGS, which takes 0.40 hours for training and results in a model size of 869MB. This indicates that our approach achieves superior quality without compromising on training time or model size.

Figure 6: Gaussian Proliferation Trend.

Appendix C Implementation Details

Codebase: We have developed AtomGS based on the 3D Gaussian Splatting (3DGS) framework [Kerbl et al.(2023)Kerbl, Kopanas, Leimkuehler, and Drettakis]. To facilitate Edge-Aware Normal Loss computation and Poisson mesh extraction, we have implemented an additional feature renderer. This renderer generates various maps, including accumulation, median and mean depth, normal, and curvature maps. Additionally, we have developed an interactive real-time viewer that allows for the monitoring of these features, providing a detailed analysis of Gaussians in terms of both RGB and geometric information. For a detailed derivation of these implementations, please refer to Appendix D.

Hyper Parameter Settings: Following 3DGS, we set the Clone Threshold to $\tau_c = 0.002$, the Split Threshold to $\tau_s = 0.002$, and the Prune Threshold to $\epsilon = 0.005$. For the Atom-related settings, we set the Atom Scale to the first percentile of distances from the input SfM points, $\mathcal{S}_a = P_1(\bm{d})$, Atomized Proliferation until iteration $t_a = 7000$, and Warm-Up until iteration $t_w = 7000$. For optimization, the weights for the SSIM and normal terms are both set to $\lambda_{SSIM} = \lambda_{Normal} = 0.1$. When working with object-centered datasets that lack extensive backgrounds, we advise setting the scale learning rate $\eta_s = 0$ to maximize geometric accuracy.
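As an illustration of the Atom Scale setting, the sketch below computes a first-percentile distance from a set of SfM points. The text does not spell out how the distances $\bm{d}$ are obtained; this sketch assumes $\bm{d}$ holds each point's distance to its nearest neighbor, and the function name `atom_scale_from_sfm` is hypothetical.

```python
import numpy as np


def atom_scale_from_sfm(points, percentile=1.0):
    """Estimate an Atom Scale as a low percentile of nearest-neighbor distances.

    points -- (N, 3) array of SfM point positions, N >= 2.
    Assumption: d is the nearest-neighbor distance of every point. The
    brute-force pairwise computation is O(N^2) in memory, so a spatial
    index (e.g. a KD-tree) would be needed for real SfM point clouds.
    """
    pts = np.asarray(points, dtype=np.float64)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)  # (N, N)
    np.fill_diagonal(dist, np.inf)  # ignore each point's zero distance to itself
    nearest = dist.min(axis=1)      # d: nearest-neighbor distance per point
    return float(np.percentile(nearest, percentile))
```

Under this reading, a low percentile ties the Atom Scale to the most densely sampled parts of the SfM point cloud.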

Mesh Extraction: Mesh extraction involves rendering depth maps from training views, which use median depth values from splats projected onto pixels. These maps are then converted back into 3D space to derive corresponding normal maps. The oriented colored point cloud generated from the RGB image, depth map, and normal map serves as the input for the Poisson extraction method [Kazhdan et al.(2006)Kazhdan, Bolitho, and Hoppe], which is used to create the textured mesh. This process is illustrated in Figure 7.

Figure 7: Poisson Mesh Extraction.
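The step that derives normals from the unprojected depth map can be sketched with finite differences. This is a minimal illustration, assuming the depth map has already been converted to an (H, W, 3) map of 3D points; the function name `normals_from_points` is hypothetical, and the sign of the resulting normals depends on the coordinate convention, so they may need to be flipped toward the camera before Poisson reconstruction.

```python
import numpy as np


def normals_from_points(points):
    """Per-pixel unit normals from an (H, W, 3) map of unprojected 3D points.

    The normal is the cross product of the surface tangents along the image
    columns and rows, estimated by finite differences.
    """
    p = np.asarray(points, dtype=np.float64)
    du = np.gradient(p, axis=1)  # tangent along image columns
    dv = np.gradient(p, axis=0)  # tangent along image rows
    n = np.cross(du, dv)
    length = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.clip(length, 1e-12, None)  # avoid division by zero
```

The points, their colors from the RGB image, and these normals together form the oriented colored point cloud that the Poisson step consumes.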

Hardware: All experiments are conducted on a single NVIDIA GeForce RTX 4090 GPU.

Appendix D Gaussian Splatting and Additional Feature Rendering

Splatting: During this stage, 3D Gaussians are projected into the 2D image space to facilitate rendering. Using the viewing transformation $\bm{W}$ and the 3D covariance matrix $\bm{\Sigma}$, the projected 2D covariance matrix $\bm{\Sigma}'$ is computed as $\bm{\Sigma}' = \bm{J}\bm{W}\bm{\Sigma}\bm{W}^{\top}\bm{J}^{\top}$, where $\bm{J}$ is the Jacobian of the affine approximation of the projective transformation. Additionally, we can use the same transformation to compute $\bm{\mu}' \in \mathbb{R}^2$ in the 2D projected space. Given the position of a pixel $\bm{x} \in \mathbb{R}^2$, 3D Gaussian splatting can be formulated as follows:

$$\mathcal{G}_i(\bm{x}) := \exp\left(-\frac{1}{2}(\bm{x}-\bm{\mu}_i')^{\top}\bm{\Sigma}_i'^{-1}(\bm{x}-\bm{\mu}_i')\right). \tag{4}$$
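The covariance projection and Equation 4 can be sketched in NumPy as follows. This is an illustration under simplifying assumptions: $\bm{J}$ is passed in as a 2×3 matrix so that the product is directly 2×2, the function names are hypothetical, and no low-pass filtering or numerical safeguards are applied.

```python
import numpy as np


def project_covariance(J, W, Sigma):
    """Sigma' = J W Sigma W^T J^T, with J of shape (2, 3) and W, Sigma of shape (3, 3)."""
    J = np.asarray(J, dtype=np.float64)
    W = np.asarray(W, dtype=np.float64)
    Sigma = np.asarray(Sigma, dtype=np.float64)
    return J @ W @ Sigma @ W.T @ J.T


def gaussian_2d(x, mu2d, Sigma2d):
    """Evaluate Equation 4 at pixel x for a splat with mean mu2d and covariance Sigma2d."""
    d = np.asarray(x, dtype=np.float64) - np.asarray(mu2d, dtype=np.float64)
    # Solve Sigma' y = d instead of forming the explicit inverse.
    return float(np.exp(-0.5 * d @ np.linalg.solve(Sigma2d, d)))
```

The value is 1 at the splat center and decays with the Mahalanobis distance of the pixel from $\bm{\mu}'$.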

Rendering: Upon receiving the position of a pixel $\bm{x}$, the distances to all overlapping Gaussians are computed using the viewing transformation $\bm{W}$, thereby generating a depth-sorted list of Gaussians $\mathcal{N} := \{G_1, \ldots, G_N\}$. Subsequently, alpha compositing is employed to compute the accumulated weight of each Gaussian at each pixel:

$$w_i(\bm{x}) = \alpha_i \mathcal{G}_i(\bm{x}) \prod_{j=1}^{i-1}\left(1 - \alpha_j \mathcal{G}_j(\bm{x})\right). \tag{5}$$
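Equation 5 can be sketched for a single pixel as follows, assuming the Gaussians are already sorted front to back; the function name `compositing_weights` is hypothetical.

```python
import numpy as np


def compositing_weights(alpha, g):
    """Front-to-back alpha-compositing weights w_i of Equation 5 for one pixel.

    alpha -- opacities alpha_i of the N depth-sorted Gaussians
    g     -- their values G_i(x) at this pixel
    """
    a = np.asarray(alpha, dtype=np.float64) * np.asarray(g, dtype=np.float64)
    # Transmittance reaching Gaussian i: product of (1 - a_j) over all j < i.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - a)[:-1]))
    return a * transmittance
```

For example, two Gaussians that each contribute an effective opacity of 0.5 receive weights 0.5 and 0.25, since only half of the light reaches the second one.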

Using the above weight function, several key maps can be derived for each pixel:

  1. RGB Color Map:

    $$\bm{C}(\bm{x}) = \sum_{i=1}^{N} c_i w_i(\bm{x}) \tag{6}$$

    accumulates the RGB colors $c_i$, each weighted by the respective $w_i(\bm{x})$, to produce the final color output for each pixel.

  2. Accumulation Map:

    $$\bm{A}(\bm{x}) = \sum_{i=1}^{N} w_i(\bm{x}), \tag{7}$$

    which aggregates the computed weights across all Gaussians.

  3. Mean (Expected) Depth Map:

    $$\bm{D}_{mean}(\bm{x}) = \sum_{i=1}^{N} z_i w_i(\bm{x}), \tag{8}$$

    where $z_i$ represents the depth associated with each Gaussian, weighted by $w_i(\bm{x})$.

  4. Median Depth Map:

    $$\bm{D}_{median}(\bm{x}) = z_i \quad \text{where} \quad i = \min_i \left\{ \prod_{j=1}^{i-1}\left(1 - \alpha_j \mathcal{G}_j(\bm{x})\right) > 0.5 \right\} \tag{9}$$

    calculates the median depth by identifying the first Gaussian for which the Transmittance value exceeds 0.5.
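The four maps above can be sketched for a single pixel as follows. This is an illustrative sketch; the function name `render_pixel_features` is hypothetical. For the median depth, the sketch picks the last Gaussian along the ray whose incoming transmittance is still above 0.5, that is, the Gaussian at which the transmittance crosses 0.5. This is one reading of Equation 9 and should be treated as an assumption.

```python
import numpy as np


def render_pixel_features(alpha_g, colors, depths):
    """Color, accumulation, mean depth, and median depth for one pixel.

    alpha_g -- effective opacities alpha_i * G_i(x), sorted front to back, shape (N,)
    colors  -- RGB colors c_i, shape (N, 3)
    depths  -- depths z_i, shape (N,)
    """
    a = np.asarray(alpha_g, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    depths = np.asarray(depths, dtype=np.float64)
    # Transmittance reaching each Gaussian, then the weights of Equation 5.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - a)[:-1]))
    w = a * transmittance
    color = (w[:, None] * colors).sum(axis=0)  # Equation 6
    accumulation = float(w.sum())              # Equation 7
    mean_depth = float((w * depths).sum())     # Equation 8
    # Assumed reading of Equation 9: last Gaussian with transmittance > 0.5.
    median_index = int(np.nonzero(transmittance > 0.5)[0].max())
    median_depth = float(depths[median_index])
    return color, accumulation, mean_depth, median_depth
```

Unlike the mean depth, the median depth always coincides with the depth of an actual Gaussian, so it does not blend foreground and background surfaces at occlusion boundaries.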

Depth Map Unprojection: Given a depth map $\bm{D}'$ of size $H \times W$, where $(i, j)$ are pixel coordinates and $d_{i,j}$ is the depth at pixel $(i, j)$, the unprojection steps are as follows:

  1. 1.

    Normalization: The pixel coordinates are normalized to the range [1,1]11[-1,1][ - 1 , 1 ] using xnorm=2jW11subscript𝑥norm2𝑗𝑊11x_{\text{norm}}=\frac{2j}{W-1}-1italic_x start_POSTSUBSCRIPT norm end_POSTSUBSCRIPT = divide start_ARG 2 italic_j end_ARG start_ARG italic_W - 1 end_ARG - 1 and ynorm=2iH11subscript𝑦norm2𝑖𝐻11y_{\text{norm}}=\frac{2i}{H-1}-1italic_y start_POSTSUBSCRIPT norm end_POSTSUBSCRIPT = divide start_ARG 2 italic_i end_ARG start_ARG italic_H - 1 end_ARG - 1.

  2. 2.

    Homogeneous Coordinates in Camera Space: The normalized coordinates are then transformed into homogeneous camera space coordinates 𝐩camera=[xnorm,ynorm,di,j]Tsubscript𝐩camerasuperscriptsubscript𝑥normsubscript𝑦normsubscript𝑑𝑖𝑗𝑇\mathbf{p}_{\text{camera}}=[x_{\text{norm}},y_{\text{norm}},d_{i,j}]^{T}bold_p start_POSTSUBSCRIPT camera end_POSTSUBSCRIPT = [ italic_x start_POSTSUBSCRIPT norm end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT norm end_POSTSUBSCRIPT , italic_d start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT.

  3. Depth Scaling: Using elements $f_{1}$ and $f_{2}$ from the camera projection matrix $\mathbf{K}$, the depth values are scaled as $s_{d_{i,j}}=\frac{f_{1}\cdot d_{i,j}+f_{2}}{d_{i,j}}$. The adjusted camera space coordinates are set to $\mathbf{p}_{\text{camera}}^{\prime}=[x_{\text{norm}},y_{\text{norm}},s_{d_{i,j}}]^{T}$.

  4. World Space Transformation: The transformed camera space coordinates are then multiplied by the inverse of the full projection transform matrix $\mathbf{T}$, resulting in $\mathbf{p}_{\text{world}}=\mathbf{T}^{-1}\times[x_{\text{norm}},y_{\text{norm}},s_{d_{i,j}},1]^{T}$.

  5. Discarding Homogeneous Coordinate: Finally, to obtain Cartesian coordinates, the first three components are divided by the homogeneous coordinate, which is then discarded: $\mathbf{p}_{\text{world}}^{\prime}=\frac{\mathbf{p}_{\text{world}}[0:3]}{\mathbf{p}_{\text{world}}[3]}$. This yields $\bm{D}=\mathbf{p}^{\prime}_{\text{world}}$, the 3D world-space coordinates of each pixel.

Normal Map Calculation: Given an unprojected depth map $\bm{D}\in\mathbb{R}^{H\times W\times 3}$, we compute the corresponding normal map from the cross product of the depth map's gradients:

$$\bm{N}=\frac{\nabla_{x}\bm{D}\times\nabla_{y}\bm{D}}{||\nabla_{x}\bm{D}\times\nabla_{y}\bm{D}||} \tag{10}$$
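Equation (10) can be sketched in NumPy as below. The gradient operator is not specified in the text, so this sketch assumes simple finite differences via `np.gradient`. The function name `normal_map` and the small `eps` guard against division by zero are additions for illustration only.

```python
import numpy as np

def normal_map(D, eps=1e-8):
    """Per-pixel unit normals from an (H, W, 3) unprojected depth map (sketch).

    Gradients are finite differences (an assumption); eps is a hypothetical
    safeguard for degenerate pixels where the cross product vanishes.
    """
    grad_x = np.gradient(D, axis=1)  # change along image columns (x)
    grad_y = np.gradient(D, axis=0)  # change along image rows (y)
    n = np.cross(grad_x, grad_y)     # cross product over the last axis
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.maximum(norm, eps)
```

For a fronto-parallel plane whose points are `[j, i, 0]`, the two gradients are `[1, 0, 0]` and `[0, 1, 0]`, so every normal comes out as `[0, 0, 1]`.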

Appendix E Additional Results

We provide detailed per-scene metrics for the Mip-NeRF360 and Tank & Temple datasets in Table 4. Additionally, we offer further insights through 2D rendering comparisons in Figure 8 and 3D mesh comparisons in Figure 9.

| Metric | Method | Bicycle | Flowers | Garden | Stump | Treehill | Room | Counter | Kitchen | Bonsai | Truck | Train | Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PSNR | 3DGS | 25.10 | 21.52 | 27.18 | 26.49 | 22.37 | 31.22 | 28.96 | 30.98 | 32.18 | 25.39 | 22.02 | 26.67 |
| | SuGaR | 23.13 | 19.67 | 25.30 | 24.23 | 21.44 | 29.85 | 27.53 | 29.33 | 30.47 | 22.69 | 20.47 | 24.92 |
| | Ours | 25.33 | 21.71 | 27.44 | 26.58 | 22.11 | 31.30 | 28.97 | 30.88 | 32.15 | 25.47 | 21.93 | 26.72 |
| SSIM | 3DGS | 0.763 | 0.603 | 0.860 | 0.763 | 0.626 | 0.916 | 0.905 | 0.923 | 0.939 | 0.878 | 0.812 | 0.817 |
| | SuGaR | 0.663 | 0.514 | 0.793 | 0.669 | 0.558 | 0.901 | 0.882 | 0.892 | 0.928 | 0.827 | 0.762 | 0.763 |
| | Ours | 0.772 | 0.611 | 0.865 | 0.774 | 0.633 | 0.918 | 0.906 | 0.925 | 0.938 | 0.880 | 0.817 | 0.822 |
| LPIPS | 3DGS | 0.205 | 0.332 | 0.107 | 0.213 | 0.326 | 0.219 | 0.200 | 0.127 | 0.204 | 0.148 | 0.208 | 0.208 |
| | SuGaR | 0.307 | 0.378 | 0.182 | 0.307 | 0.408 | 0.2395 | 0.222 | 0.167 | 0.211 | 0.175 | 0.259 | 0.260 |
| | Ours | 0.203 | 0.325 | 0.104 | 0.202 | 0.317 | 0.222 | 0.202 | 0.127 | 0.202 | 0.133 | 0.200 | 0.203 |

Table 4: $\text{PSNR}^{\uparrow}$, $\text{SSIM}^{\uparrow}$, and $\text{LPIPS}^{\downarrow}$ metrics for the Mip-NeRF360 and Tank & Temple datasets.
Figure 8: Radiance Field Comparison on the Mip-NeRF360 Dataset.

In Figure 8, the RGB renderings of our AtomGS show more detail than those of SuGaR. This is evident on the tree trunk in the "stump" scene and in the slender, curly hay near the dried grass ornament in the "garden" scene, as highlighted in the magnified areas. While AtomGS's RGB renderings appear visually similar to those of 3DGS, the normal maps reveal that AtomGS better preserves geometric accuracy, for example on the tree trunk in the "stump" scene and on both the ground beneath the table and the surface of the soccer ball in the "garden" scene.

In Figure 9, NeuS, which uses Signed Distance Functions (SDF), produces the smoothest surfaces. However, it sometimes sacrifices sharp features, leading to overly smoothed surfaces. SuGaR attempts to convert every Gaussian into a 2D ellipsoidal disk, resulting in relatively smooth surfaces. However, the disks do not always align with the underlying surfaces, which creates noticeable disk-shaped artifacts and sometimes overfits the background. In contrast, AtomGS achieves smooth surfaces while retaining detailed geometry.

Figure 9: Mesh Comparison on DTU Dataset and NeRF Synthetic Dataset [Mildenhall et al.(2021)Mildenhall, Srinivasan, Tancik, Barron, Ramamoorthi, and Ng]. Rows: DTU 24, DTU 106, DTU 122, Lego, Chair, Hotdog. Columns: (a) Ours, (b) NeuS, (c) SuGaR.