Gaussian Splatting with Localized Points Management

Haosen Yang1 Chenhao Zhang1,* Wenqing Wang1 Marco Volino1 Adrian Hilton1 Li Zhang2 Xiatian Zhu1

1University of Surrey 2Fudan University

https://surrey-uplab.github.io/research/LPM
* Equal Contribution
Abstract

Point management is a critical component in optimizing 3D Gaussian Splatting (3DGS) models, as the point initialization (e.g., via structure from motion) is distributionally inappropriate. Typically, the Adaptive Density Control (ADC) algorithm is applied, leveraging view-averaged gradient magnitude thresholding for point densification, opacity thresholding for pruning, and a regular all-points opacity reset. However, we reveal that this strategy is limited in tackling intricate or special image regions (e.g., transparent ones), as it is unable to identify all the 3D zones that require point densification and lacks an appropriate mechanism to handle ill-conditioned points with negative impacts (e.g., occlusion due to falsely high opacity). To address these limitations, we propose a Localized Point Management (LPM) strategy, capable of identifying the error-contributing zones in the highest demand for both point addition and geometry calibration. Zone identification is achieved by leveraging the underlying multiview geometry constraints, with the guidance of image rendering errors. We apply point densification in the identified zones, whilst resetting the opacity of the points residing in front of these zones so that a new opportunity is created to correct ill-conditioned points. Serving as a versatile plugin, LPM can be seamlessly integrated into existing 3D Gaussian Splatting models. Experimental evaluation across both static 3D and dynamic 4D scenes validates the efficacy of our LPM strategy in boosting a variety of existing 3DGS models both quantitatively and qualitatively. Notably, LPM improves both vanilla 3DGS and SpaceTimeGS to achieve state-of-the-art rendering quality while retaining real-time speeds, outperforming prior methods on challenging datasets such as Tanks & Temples and the Neural 3D Video Dataset.

1 Introduction

Neural rendering has emerged as a generalizable, flexible, and powerful approach for photorealistic novel view synthesis (NVS) from arbitrary camera poses [24], underpinning a wide variety of applications in augmented/virtual/mixed reality [9], robotics [39], and generation [26], among others.

For example, taking a learning-based parametric approach, Neural Radiance Fields (NeRFs) [24] implicitly represent scene radiance of varying complexity using neural networks (e.g., MLPs), without the tedious handcrafting of models to account for scene variations in geometry, texture, and illumination. Despite their high-quality representational performance, they are typically computationally inefficient in view rendering due to heavy ray sampling, and thus struggle to scale to high-resolution content and large-scale scene modeling [30, 33].

Figure 1: Visualization of points behavior. 3DGS produces ill-conditioned Gaussians (red box) that occlude other valid points, resulting in noticeably incorrect depth estimation. LPM handles these ill-conditioned points to reduce negative impacts and further calibrate the geometry.

In this context, 3D Gaussian Splatting (3DGS) [13] has emerged as a more efficient alternative with much faster model optimization and real-time neural rendering. As an explicit representation model, this approach begins by initializing a set of 3D Gaussian points with Structure from Motion (SfM), followed by optimizing the parameters of these points via a view reconstruction loss, with the view output produced using differentiable splatting-based rasterization. As the point initialization is coarse and error-prone, a point management mechanism, Adaptive Density Control (ADC), is typically applied during optimization to deal with point distributional issues such as under-population (i.e., not enough points) or over-population (i.e., too many points) in the 3D space. However, we find several limitations with ADC: (1) Simply thresholding the average gradient to decide the regions for point densification tends to overlook under-optimized points. For example, larger Gaussian points, which frequently appear across various views in screen space, typically exhibit lower average gradients. (2) Point sparsity makes it difficult to add sufficient and reliable points to comprehensively cover the scene. (3) Falsely optimized Gaussian points can cause negative effects, e.g., occluding other good points and leading to incorrect depth estimates (see the erroneous placements on the windows in Fig. 1).

To overcome the aforementioned limitations, in this paper we propose a novel Localized Point Management (LPM) approach. Our idea is intuitive: identify the 3D Gaussian points that lead to rendering errors. We thus start with an image rendering error map of a specific view. To obtain the error-contributing 3D points, we leverage the region correspondence between different views via feature mapping, subject to the multiview geometry constraint. For each pair of corresponding regions, we cast rays through them from their respective camera views in a cone shape, and consider their intersection as the error source zone. Within each such zone, we consider two situations: (1) In the presence of points, we further apply point densification at a lower threshold to complement the original counterpart locally; (2) If there are no points due to point sparsity, we add new Gaussian points. Concurrently, we reset the opacity of points that have high estimated opacity and reside in front of these zones, as they can have a high impact on view rendering. This provides an opportunity to correct potentially ill-conditioned points whilst tuning the newly added ones in the subsequent optimization. To minimize model expansion, we prune the points by opacity in a density-aware manner.

We summarize the contributions as follows: (1) With in-depth analysis, we have discovered that the standard point management mechanism used in 3DGS has several limitations that impede model optimization. (2) We present Localized Point Management (LPM) for these issues by identifying error-contributing 3D zones and implementing appropriate operations for point densification and opacity reset. (3) Extensive experiments validate the benefits of our LPM in improving a diversity of existing 3DGS models in novel view synthesis on both static and dynamic scenes.

2 Related Work

Neural Scene Representations have long been an important direction in novel view synthesis. Previous methods allocate neural features to structures such as volumes [21, 28], textures [32], and point clouds [1]. The pioneering work of NeRF [24] proposes integrating neural networks with 3D volumetric representations to convert a 3D scene into a learnable density field, enabling high-quality novel view synthesis without requiring explicit modeling of the 3D scene and illumination. Later on, numerous works emerged to boost the quality and efficiency of volume rendering: [3, 37, 5] refine the point sampling strategy in ray marching, and [4, 35] reparameterize the scene to produce a more compact representation. Additionally, regularization terms [8, 42] can be incorporated to constrain the scene representation, resulting in a closer approximation to real-world geometry. Despite their high-quality representational performance, these methods are typically computationally inefficient for view rendering due to the extensive ray sampling required and the use of Multi-Layer Perceptrons (MLPs) to represent the scene, which complicates the computation and optimization of any point within the scene. To address this, several works have proposed novel scene representations aimed at accelerating the rendering process. These representations replace MLPs with sparse voxels [20], hash tables [25], or triplanes [6], significantly enhancing rendering speed. However, real-time rendering remains challenging due to the inherent complexity of the ray marching strategy in volume rendering.

3D Gaussian Splatting represents a recent advancement in novel view synthesis, enabling real-time high-quality rendering. It relies on splatting-based rasterization, computing pixel colors through depth sorting and $\alpha$-blending of projected 2D Gaussians, thereby avoiding the complex sampling strategies of ray marching and achieving real-time performance. Owing to these real-time high-quality rendering capabilities, 3DGS has been applied to various domains, including autonomous driving, content generation [31], and 4D dynamic scenes [18, 36, 40], among others. Despite these advancements, 3DGS still has some drawbacks, such as the storage cost of Gaussians and the handling of multiple resolutions. Several works have enhanced 3DGS by improving the Gaussian representation, including techniques such as low-pass filtering [41], multiscale Gaussian representations [38], and interpolating Gaussian attributes from structured grid features [22]. However, these works often overlook the importance of point management, specifically Adaptive Density Control, which is typically applied during optimization to address issues like under-population or over-population in the 3D space. Only a few works have focused on point management. For example, GaussianPro [7] directly tackles densification limitations, bridging gaps from SfM-based initialization. Pixel-GS [43] proposes a gradient scaling strategy to suppress artifacts near the camera. Additionally, [27] introduces an auxiliary per-pixel error function to implicitly supervise point contributions.

Although these methods improve densification, they are still unable to identify all the 3D zones that require point densification and lack a proper mechanism to handle ill-conditioned points with negative impacts. In our paper, we propose a novel approach, Localized Point Management, capable of identifying error-contributing zones with the highest demand for both point addition and geometry calibration.

3 Method

3.1 Preliminaries: 3D Gaussian Splatting

Gaussian Splatting builds upon concepts from EWA splatting [44] and proposes modeling a 3D scene as a collection of 3D Gaussian points $\{G_i \mid i = 1, \ldots, K\}$, rendered through volume splatting. Each 3D Gaussian $G$ is defined by the equation:

$$G(x) = e^{-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)},$$

where $\mu \in \mathbb{R}^{3 \times 1}$ represents the mean vector, and $\Sigma \in \mathbb{R}^{3 \times 3}$ denotes its covariance matrix. To maintain the positive semi-definite nature of $\Sigma$ during optimization, it is represented as $\Sigma = RSS^{T}R^{T}$, with the orthogonal rotation matrix $R \in \mathbb{R}^{3 \times 3}$ and the diagonal scale matrix $S \in \mathbb{R}^{3 \times 3}$.

To render an image from a specific viewpoint, the color of each pixel $p$ is determined by blending $N$ ordered Gaussians $\{G_i \mid i = 1, \ldots, N\}$ that overlap $p$, using the formula:

$$c(p) = \sum_{i=1}^{N} c_i \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j),$$

where $\alpha_i$ is derived by evaluating a projected 2D Gaussian from $G_i$ at pixel $p$ combined with a learned opacity for $G_i$, and $c_i$ is the learnable, view-dependent color modeled using spherical harmonics in 3DGS. Gaussians that influence $p$ are arranged in ascending order based on their depth from the current viewpoint. Employing differentiable rendering techniques allows for the end-to-end optimization of all Gaussian attributes through training view reconstruction.
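The blending formula can be illustrated with a minimal sketch for a single pixel. It assumes the Gaussians are already depth-sorted (nearest first) and that each $\alpha_i$ has already been evaluated at the pixel; the function name `composite_pixel` is hypothetical and not part of any 3DGS codebase.

```python
def composite_pixel(colors, alphas):
    """Front-to-back alpha blending of depth-sorted Gaussians at one pixel.

    colors: list of (r, g, b) tuples, nearest Gaussian first.
    alphas: list of per-Gaussian alpha values in [0, 1], same order.
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # running product of (1 - alpha_j) for j < i
    for color, alpha in zip(colors, alphas):
        weight = alpha * transmittance  # alpha_i * prod_{j<i} (1 - alpha_j)
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return pixel


# Two half-transparent Gaussians: red in front of green.
blended = composite_pixel([(1, 0, 0), (0, 1, 0)], [0.5, 0.5])

# A fully opaque front Gaussian hides everything behind it.
occluded = composite_pixel([(1, 0, 0), (0, 1, 0)], [1.0, 0.5])
```

The second call shows why a front Gaussian with a falsely high opacity is harmful: its weight consumes all the transmittance, so the Gaussians behind it contribute nothing to the pixel.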

Point management. Since existing 3DGS variants start by initializing 3D Gaussian points using Structure from Motion (SfM), the points are often coarse and error-prone. During optimization, a point management mechanism, Adaptive Density Control (ADC), is typically applied to manage point distribution issues.

Specifically, thresholding the average gradient is used to decide on point densification. For each Gaussian point $G_i$, 3DGS tracks the magnitude of the positional gradient $\frac{\partial L_{\pi}}{\partial \mu_i}$ across all rendered views, which is then averaged to a quantity $T_i$. During each training iteration, if the gradient $T_i$ surpasses a predefined threshold, the point is considered to inadequately represent the corresponding 3D region. With the scale of the Gaussian as the size measure, a large Gaussian will be split into two, while a small one leads to point cloning.

However, this commonly used ADC strategy is unable to identify all the 3D zones with an underlying need for point densification. This is because the local complexity of scene geometry often varies significantly, which is beyond the reach of any single-value thresholding. Besides, ADC lacks a proper mechanism to handle ill-conditioned points with negative impacts (e.g., wrong opacity values estimated during training for scattered points).
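The densification rule just described can be sketched as a small decision function. This is an illustration only: the function name `adc_decision`, its parameters, and the numeric thresholds in the usage lines are assumptions chosen for the example, not values taken from this paper.

```python
def adc_decision(avg_grad, scale, grad_threshold, scale_threshold):
    """Decide what ADC does with one Gaussian point.

    avg_grad: view-averaged positional gradient magnitude (T_i in the text).
    scale: size measure of the Gaussian.
    grad_threshold, scale_threshold: hypothetical global thresholds.
    """
    if avg_grad <= grad_threshold:
        return "keep"  # not flagged for densification
    # Flagged points are split when large and cloned when small.
    return "split" if scale > scale_threshold else "clone"


# Illustrative values only.
small_active = adc_decision(avg_grad=0.0005, scale=0.005,
                            grad_threshold=0.0002, scale_threshold=0.01)
large_active = adc_decision(avg_grad=0.0005, scale=0.05,
                            grad_threshold=0.0002, scale_threshold=0.01)
large_quiet = adc_decision(avg_grad=0.0001, scale=0.05,
                           grad_threshold=0.0002, scale_threshold=0.01)
```

The last call illustrates the failure mode discussed in the text: a large Gaussian whose averaged gradient stays below the single global threshold is never densified, even if it covers an under-reconstructed region.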

3.2 Localized Gaussian Point Management

To address the aforementioned issues, we introduce a novel model-agnostic point management approach, Localized Point Management (LPM), which leverages multiview geometry constraints to identify error-contributing 3D points, with the guidance of image rendering errors. This approach can be seamlessly integrated with existing 3DGS models without the need for architectural modification.

As illustrated in Figure 2, we begin with an image rendering error map for a specific view. Under the multiview geometry constraint, the corresponding regions in the referred view are matched via feature mapping. For each pair of corresponding regions, we then cast rays through them from their respective camera views in a cone and identify their intersection as the error source zone. Within each zone, we perform localized point manipulation.

Error map generation

To accurately localize the zones in 3D space that require point densification and geometry calibration, we begin by rendering the current view image through the splatting of 3D Gaussians. We then generate an error map (Figure 2(a)) for this specific view against the ground-truth image using an error function [18].
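As a rough illustration of this step, the sketch below computes a per-pixel error map and lists the high-error pixels. It assumes a plain absolute difference on grayscale values in [0, 1] and a hypothetical threshold; the actual error function follows [18] and may differ.

```python
def error_map(rendered, target):
    """Per-pixel absolute error between two same-sized grayscale images.

    Images are nested lists of floats in [0, 1]. The absolute difference is
    an assumption made for this sketch, not the error function of [18].
    """
    return [[abs(r - t) for r, t in zip(row_r, row_t)]
            for row_r, row_t in zip(rendered, target)]


def high_error_pixels(err, threshold):
    """Return (row, col) coordinates whose error exceeds the threshold."""
    return [(y, x)
            for y, row in enumerate(err)
            for x, e in enumerate(row)
            if e > threshold]


rendered = [[0.1, 0.9], [0.5, 0.5]]
target = [[0.1, 0.2], [0.5, 0.4]]
err = error_map(rendered, target)
regions = high_error_pixels(err, threshold=0.3)  # hypothetical threshold
```

Grouping the returned pixels into connected regions would give the 2D error regions that are later matched across views.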

Figure 2: Overview of the Localized Point Management (LPM). (a) We start with an image rendering error map of the current view against the ground truth. Concurrently, matching points are identified between the current view and a referred view, sampled as an adjacent view, via off-the-shelf feature mapping. (b) Cross-view region mapping is then employed to locate the corresponding region in the referred view. (c) For each pair of corresponding regions, we cast rays through them from their respective camera views in a cone shape, and consider their intersection as the error source zone. The final step involves identifying under-optimized or ill-conditioned points within these zones: under-optimized or empty places are densified, and ill-conditioned points are reset.

Error contributing 3D zone identification

To project this rendering error back to the 3D space, we leverage the region correspondence between different views under multiview geometry constraints. This involves the following two key steps.

(i) Cross-view region mapping. We select a neighboring view as the referred image and follow LightGlue [19], which predicts a partial assignment between two sets of local features extracted from two view images $A$ and $B$. Each set consists of 2D feature positions $\{F_i \mid (x_i, y_i) \in [0,1]^2\}$, normalized by the image size. The images $A$ and $B$ contain $M$ and $N$ local features, respectively. LightGlue outputs a set of correspondences $\mathcal{M} = \{(i,j)\} \subseteq A \times B$. Since the 2D rendering error regions in the current view may not all appear in the referred image, we select the paired region $(R_e, R_e')$ (Figure 2(b)) through the matching points. Additionally, this paired region undergoes multiview adaptive adjustments based on the error map throughout the optimization process.
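One way to picture the region pairing is sketched below. It assumes the matches are already available as pairs of normalized keypoint positions (for example, from an offline feature matcher) and that an error region is an axis-aligned box; both the box representation and the function name `paired_region` are simplifications introduced for this example, and no matcher API is called.

```python
def paired_region(matches, region_a):
    """Map an error region in view A to a region in view B via matched points.

    matches: list of ((xa, ya), (xb, yb)) keypoint pairs in [0, 1]^2.
    region_a: (x_min, y_min, x_max, y_max) box in view A.
    Returns the bounding box of the matched keypoints in view B, or None
    when no match falls inside the region (the region may not be visible
    in the referred view).
    """
    x0, y0, x1, y1 = region_a
    hits = [pb for pa, pb in matches
            if x0 <= pa[0] <= x1 and y0 <= pa[1] <= y1]
    if not hits:
        return None
    xs = [p[0] for p in hits]
    ys = [p[1] for p in hits]
    return (min(xs), min(ys), max(xs), max(ys))


matches = [((0.20, 0.20), (0.30, 0.25)),
           ((0.25, 0.30), (0.35, 0.30)),
           ((0.80, 0.80), (0.90, 0.90))]
region_b = paired_region(matches, (0.1, 0.1, 0.4, 0.4))
unmatched = paired_region(matches, (0.5, 0.5, 0.6, 0.6))
```

Returning `None` for unmatched regions mirrors the observation that not every error region of the current view appears in the referred image.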

(ii) 2D-to-3D projection. After obtaining the paired regions with rendering errors, we project each 2D error region to 3D space via multiview geometry constraints. Specifically, we cast the rays $\mathcal{C}$ in a cone shape for region $R_e$ from the camera's center of projection $o$ along the direction $d$, which aligns with the pixel's center (Figure 2(c)). The apex of this cone is located at $o$, and its radius is specified at the image plane $o + d$; the cone is denoted $\mathcal{C}$. The radius $r_{Cone}$ is set to match the radius of the smallest circumscribed circle of the 2D error region, creating a cone in 3D space that can trace the Gaussian points contributing to the 2D error region. Concurrently, a corresponding cone, denoted $\mathcal{C}'$, belonging to region $R_e'$ is similarly projected. Subsequently, we compute the intersection points of these rays. To regionalize these points, we directly use the smallest sphere that contains them as the error source 3D zone $R_{zone}$.
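The geometry of this step can be sketched with two small helpers. This is a simplified illustration under stated assumptions: `in_cone` tests whether a 3D point lies inside a cone with apex $o$, axis $d$, and radius $r$ at the image plane $o + d$; `bounding_sphere` uses the centroid and the farthest point, which encloses the points but is not guaranteed to be the smallest enclosing sphere described in the text. Both function names are hypothetical.

```python
import math


def in_cone(p, o, d, r):
    """Return True if point p lies inside the cone with apex o and axis d.

    r is the cone radius at the image plane, i.e. at distance |d| from o.
    """
    v = [p[i] - o[i] for i in range(3)]
    d_len = math.sqrt(sum(c * c for c in d))
    t = sum(v[i] * d[i] for i in range(3)) / d_len  # depth along the axis
    if t <= 0:
        return False  # behind the camera
    perp_sq = sum(c * c for c in v) - t * t  # squared distance to the axis
    return perp_sq <= (r * t / d_len) ** 2   # radius grows linearly with depth


def bounding_sphere(points):
    """Centroid-based enclosing sphere (an approximation, not the minimal one)."""
    n = len(points)
    center = [sum(p[i] for p in points) / n for i in range(3)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius


# Two cameras looking at the same area from orthogonal directions.
cone_a = ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 0.1)   # (o, d, r)
cone_b = ((5.0, 0.0, 5.0), (-1.0, 0.0, 0.0), 0.1)
candidates = [(0.0, 0.0, 5.0), (0.4, 0.0, 5.0), (1.0, 0.0, 5.0)]
shared = [p for p in candidates
          if in_cone(p, *cone_a) and in_cone(p, *cone_b)]
zone_center, zone_radius = bounding_sphere(shared)
```

Only the candidates that fall inside both cones are kept, and the sphere around them plays the role of $R_{zone}$ in this sketch.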

Points manipulation

Recall that in existing 3DGS, point management relies only on the view-averaged gradient magnitude $\tau$ to determine point densification globally. In addition to this, we further perform localized point addition and geometry calibration within the identified error source 3D zone $R_{zone}$. For the point addition, we consider two common situations: (1) In the presence of points, we apply point densification to locally complement the original counterparts. We set a lower threshold to select the points that need densification, aiming to enhance the geometric details. The densification rule is consistent with 3DGS, but it focuses on the local 3D zones that need it most. Specifically, for small Gaussians, our strategy involves cloning the Gaussians while maintaining their size and repositioning them along the positional gradient to better capture emerging geometrical features. Conversely, larger Gaussians situated in areas of high variance are split into smaller points to more accurately represent the underlying geometry. (2) In cases of point sparsity, we add new Gaussian points at the center of the 3D zone.

In the context of $\alpha$-blending in 3DGS, if the points at the forefront of the identified 3D zone $R_{zone}$ have the highest opacity, they may occlude valid points, leading to incorrect depth estimation, as shown in Figure 1. To deal with such issues, we treat these points as potentially ill-conditioned. We reset these points to provide an opportunity for correction, further calibrating the geometry.
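The two point-addition cases can be sketched as follows, assuming the zone is a sphere and each point carries its position and view-averaged gradient. The function name `localized_point_addition`, the dictionary layout, and the threshold value are hypothetical choices for this example.

```python
import math


def localized_point_addition(points, zone_center, zone_radius, local_threshold):
    """Select points to densify inside a zone, or propose a new point.

    points: list of dicts with 'pos' (x, y, z) and 'avg_grad'.
    local_threshold: hypothetical threshold, lower than the global one.
    """
    in_zone = [p for p in points
               if math.dist(p["pos"], zone_center) <= zone_radius]
    if not in_zone:
        # Case (2): the zone is empty, so a new point goes at its center.
        return {"densify": [], "new_points": [tuple(zone_center)]}
    # Case (1): densify zone points whose gradient exceeds the lower threshold.
    to_densify = [p for p in in_zone if p["avg_grad"] > local_threshold]
    return {"densify": to_densify, "new_points": []}


points = [
    {"pos": (0.0, 0.0, 0.0), "avg_grad": 0.00015},
    {"pos": (0.1, 0.0, 0.0), "avg_grad": 0.00005},
    {"pos": (5.0, 5.0, 5.0), "avg_grad": 0.00100},
]
populated = localized_point_addition(points, (0.0, 0.0, 0.0), 0.5, 0.0001)
empty = localized_point_addition(points, (10.0, 10.0, 10.0), 0.5, 0.0001)
```

The selected points would then go through the usual split-or-clone rule, so this sketch only covers the selection that LPM localizes.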
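A minimal sketch of the reset step is given below. It assumes the candidate points have already been restricted to the view cone and carry a depth along the viewing direction; the function name `reset_occluders`, the opacity threshold, and the reset value are hypothetical parameters for illustration.

```python
def reset_occluders(points, zone_near_depth,
                    opacity_threshold=0.5, reset_value=0.01):
    """Reset high-opacity points that sit in front of an error zone.

    points: list of dicts with 'depth' and 'opacity', assumed to lie inside
            the view cone of the error region.
    zone_near_depth: depth at which the zone begins along the view direction.
    Returns the number of points that were reset (modified in place).
    """
    n_reset = 0
    for p in points:
        if p["depth"] < zone_near_depth and p["opacity"] > opacity_threshold:
            p["opacity"] = reset_value  # give the optimizer a fresh start
            n_reset += 1
    return n_reset


pts = [
    {"depth": 1.0, "opacity": 0.95},  # in front, opaque: reset
    {"depth": 1.5, "opacity": 0.10},  # in front, already faint: kept
    {"depth": 4.0, "opacity": 0.90},  # inside or behind the zone: kept
]
count = reset_occluders(pts, zone_near_depth=3.0)
```

After the reset, the formerly occluding point has almost no blending weight, so the points behind it receive gradients again in the following iterations.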

To minimize model expansion, we adaptively prune points based on their opacity values, starting from low to high. The number of points pruned is determined by the density of points in the zone. This strategic reduction ensures that our point management remains cost efficient and adaptive to the evolving needs of the scene representation.
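The density-aware pruning can be sketched as below. The budget rule (a maximum point density times the zone volume) is an assumption made for this example, since the text only states that the number of pruned points depends on the zone's point density; `prune_zone` and `max_density` are hypothetical names.

```python
import math


def prune_zone(opacities, zone_radius, max_density):
    """Return indices of zone points to prune, lowest opacity first.

    opacities: opacity of each point inside the spherical zone.
    max_density: hypothetical cap on points per unit volume.
    """
    volume = 4.0 / 3.0 * math.pi * zone_radius ** 3
    budget = int(max_density * volume)           # points the zone may keep
    n_prune = max(0, len(opacities) - budget)    # excess points to remove
    order = sorted(range(len(opacities)), key=lambda i: opacities[i])
    return sorted(order[:n_prune])


opacities = [0.9, 0.05, 0.4, 0.02, 0.7]
crowded = prune_zone(opacities, zone_radius=1.0, max_density=0.75)
sparse = prune_zone(opacities, zone_radius=1.0, max_density=10.0)
```

A crowded zone loses its least opaque points first, while a sparse zone is left untouched.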

4 Experiment

Datasets and metrics

We conducted an extensive evaluation using both static and dynamic scenes derived from publicly available datasets. For static scenes, our approach was applied to the 13 scenes used in the 3DGS framework [13]: nine scenes from Mip-NeRF360 [3], two from Tanks&Temples [14], and two from DeepBlending [12]. For dynamic scenes, our approach was tested on six scenes from the Neural 3D Video Dataset [16].

To evaluate novel view synthesis performance, we follow standard protocols by selecting one out of every eight images as test images, with the remaining used for training in static scenes. For each dynamic scene within the Neural 3D Video Dataset, one view was designated for testing while the others were allocated for training purposes. Evaluation metrics included the peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and the learned perceptual image patch similarity (LPIPS), which are broadly recognized standards in the field.
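For concreteness, the split protocol and the PSNR metric can be sketched as follows. The choice of which image in each group of eight is held out (index 0 here) is an assumption, and the helper names are hypothetical; PSNR follows its standard definition for signals with peak value `max_val`.

```python
import math


def split_every_eighth(images):
    """Hold out one of every eight images for testing (index % 8 == 0 here)."""
    test = [img for i, img in enumerate(images) if i % 8 == 0]
    train = [img for i, img in enumerate(images) if i % 8 != 0]
    return train, test


def psnr(rendered, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two flat pixel lists."""
    mse = sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(rendered)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


train, test = split_every_eighth(list(range(16)))
score = psnr([0.5, 0.5], [0.6, 0.4])  # mean squared error of about 0.01
```

Higher PSNR and SSIM indicate better reconstructions, whereas lower LPIPS is better.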

Baseline and Implementation

Vanilla 3D Gaussian Splatting (3DGS) [13] and its variant SpacetimeGS (STGS) [18] are selected as our main baselines for their established SOTA performance in novel view synthesis. For the static 3D benchmark, we also report the results of Mip-NeRF360 [3], iNGP [25], and Plenoxels [11] as in [13]. For the dynamic 4D benchmark, we also compare with other systems, such as DyNeRF [17], StreamRF [15], and K-planes [10]. In alignment with the approach described in 3DGS and STGS, our models are trained for 30k iterations across all scenes, following the same training schedule and hyperparameters. In addition to the original Gaussian densification strategies used in 3DGS and SpaceTime Gaussian, we also perform localized point management, including addition, reset, and pruning. We maintain the same thresholds for splitting and cloning points as in the original 3DGS and SpaceTime Gaussian. For point matching, we perform offline extraction to save computational cost. All experiments were conducted on an RTX 3090 GPU with 24GB of memory.

Figure 3: Qualitative evaluation of our LPM across diverse static datasets [4, 12, 14]. Our LPM improves 3DGS [13] on these challenging scenarios, e.g. (a) Light effect, (b) Completeness in the distance, (c) Intricate details and (d) Transparent. Patches that highlight the visual differences are emphasized with red insets for clearer visibility.

4.1 Main Results

Results on static 3D datasets

Table 1: Comparison of various methods across different scenes on the Mip-NeRF 360 dataset, Tanks&Temples and Deep Blending. 3DGS* indicates the retrained model from the official implementation. Bold represents best, underline indicates second best.

| Method | Mip-NeRF 360 PSNR | Mip-NeRF 360 SSIM | Mip-NeRF 360 LPIPS | Tanks&Temples PSNR | Tanks&Temples SSIM | Tanks&Temples LPIPS | Deep Blending PSNR | Deep Blending SSIM | Deep Blending LPIPS |
|---|---|---|---|---|---|---|---|---|---|
| Plenoxels [11] | 23.08 | 0.625 | 0.463 | 21.08 | 0.719 | 0.379 | 23.06 | 0.795 | 0.510 |
| INGP-Big [25] | 25.59 | 0.699 | 0.331 | 21.92 | 0.745 | 0.305 | 24.96 | 0.817 | 0.390 |
| Mip-NeRF 360 [3] | 27.69 | 0.792 | 0.237 | 22.22 | 0.759 | 0.257 | 29.40 | 0.901 | 0.245 |
| 3DGS [13] | 27.21 | 0.815 | 0.214 | 23.14 | 0.841 | 0.183 | 29.41 | 0.903 | 0.243 |
| 3DGS* | 27.47 | 0.816 | 0.216 | 23.67 | 0.849 | 0.177 | 29.55 | 0.904 | 0.245 |
| 3DGS + LPM | 27.59 | 0.820 | 0.216 | 23.83 | 0.850 | 0.181 | 29.76 | 0.908 | 0.241 |

The quantitative results (PSNR, SSIM, and LPIPS) on the Mip-NeRF 360, Tanks & Temples, and DeepBlending datasets are presented in Table 1. We retrained the 3DGS model (referred to as 3DGS*) as it yields better performance compared to the original 3DGS. Our approach achieves results comparable to the state-of-the-art on the Mip-NeRF360 dataset and further enhances 3DGS using our point management technique. Additionally, LPM improves 3DGS to set new state-of-the-art results on the Tanks & Temples and DeepBlending datasets, effectively capturing more challenging environments (e.g., light effects, transparency). These results quantitatively validate the effectiveness of our method in improving the quality of reconstruction.

In Figure 3, we present a comparison between 3DGS [13] and 3DGS+LPM. A variety of improvements can be observed, particularly in challenging cases such as light effects, completeness at a distance, intricate details, and transparency. Our LPM significantly reduces artifacts in specific regions on top of 3DGS, particularly in the tree and flowers in the second and third rows. These regions require more points to be adequately populated; adding them leads to a more precise and detailed reconstruction. Additionally, the tablecloth and window regions in the first and last rows are affected by ill-conditioned points. Our geometry calibration with LPM provides an opportunity to correct these potentially ill-conditioned points, enhancing the overall reconstruction accuracy.

Results on dynamic 4D datasets

Figure 4: Qualitative evaluation on the dynamic Neural 3D Video dataset [16]. LPM improves STGS [18] in both transparent regions (e.g., window) and dynamic movements (e.g., dog’s tongue).
Table 2: Quantitative comparisons on the Neural 3D Video dataset. “FPS” is measured at a resolution of 1352 × 1014. Some methods only report results for a subset of scenes. For a fair comparison, we report LPM’s results under two pre-existing settings. ¹ Only includes the Flame Salmon scene. Bold represents best, underline indicates second best.

| Method | PSNR | DSSIM1 | DSSIM2 | LPIPS | FPS |
|---|---|---|---|---|---|
| LLFF [23] ¹ | 23.24 | - | 0.076 | 0.235 | - |
| DyNeRF [17] ¹ | 29.58 | 0.020 | 0.083 | 0.063 | 0.015 |
| Dynamic-4DGS [36] ¹ | - | - | - | - | 30 |
| 4DGS [40] ¹ | 29.38 | - | - | - | 114 |
| STGS [18] ¹ | 29.58 | 0.038 | 0.022 | 0.063 | 103 |
| STGS* ¹ | 29.48 | 0.038 | 0.023 | 0.066 | 110 |
| STGS ¹ + LPM | 29.84 | 0.036 | 0.022 | 0.062 | 105 |
| StreamRF [15] | 28.26 | - | - | 0.039 | 10.9 |
| NeRFPlayer [29] | 30.69 | 0.034 | - | 0.111 | 0.05 |
| HyperReal [2] | 31.10 | 0.036 | - | 0.096 | 2 |
| K-planes [10] | 31.63 | 0.018 | - | 0.31 | 3 |
| MixVoxels-X [34] | 31.73 | 0.015 | - | 0.064 | 4.6 |
| Dynamic-4DGS [36] | 31.15 | - | 0.016 | 0.049 | 30 |
| 4DGS [40] | 32.01 | - | - | 0.055 | 114 |
| STGS [18] | 32.05 | 0.026 | 0.014 | 0.044 | 140 |
| STGS* | 31.99 | 0.026 | 0.015 | 0.045 | 145 |
| STGS + LPM | 32.40 | 0.025 | 0.014 | 0.045 | 140 |

Table 2 presents a quantitative evaluation on the Neural 3D Video Dataset. Following established practices, training and evaluation are conducted at half resolution, with the first camera held out for evaluation [17]. Integrating our LPM into SpaceTimeGS yields the highest PSNR among all compared methods, both on the Flame Salmon scene alone and on the full set of scenes. Notably, on the challenging Flame Salmon scene, our method improves PSNR from 29.58 to 29.84 over STGS [18]. Our approach not only surpasses previous methods in PSNR but also maintains comparable rendering speed.

In addition to the quantitative assessment, we provide qualitative comparisons on the Flame Salmon and Flame Steak scenes, as illustrated in Figure 4. The quality of synthesis in both static and dynamic regions markedly outperforms STGS. Several intricate details, including the tree behind the window and the fine features like the dog’s tongue, are faithfully reproduced with higher accuracy compared to STGS [18]. Both examples indicate that LPM improves upon STGS for superior scene modeling.

4.2 Ablation study

Figure 5: Effect of key operations of LPM. We show that the point addition operation effectively captures the geometric details in the scene; the point reset operation based on the error map further calibrates the geometry.

We conducted ablation studies on two of the more challenging scenes: PlayRoom from Deep Blending [12] and Truck from Tanks&Temples [14].

Effectiveness and efficiency of the LPM

We hypothesize that Adaptive Density Control (ADC) tends to overlook under-optimized points because it simply thresholds the average gradient. The most direct way to capture all such points is to lower the densification threshold. Although this can reduce blurring in specific regions, such as the toy (red box) illustrated in Figure 5, it still has limitations. As shown in Table 3, lowering the threshold for 3DGS more than doubles the number of Gaussian points and substantially increases training time. Additionally, PSNR drops on PlayRoom (30.03 to 29.69) and is nearly unchanged on Truck (25.42 to 25.45), which we attribute to the introduction of unnecessary points in already dense areas. In contrast, LPM generates points in the areas indicated by the error map, leading to more accurate and detailed reconstructions while maintaining real-time rendering speed. As demonstrated by the qualitative comparison in Figure 5, 3DGS with LPM achieves superior qualitative results.

Table 3: Efficiency analysis. Timings of all methods are measured on our machine.

| Scene | Method | PSNR | LPIPS | Gaussians | Training time |
|---|---|---|---|---|---|
| PlayRoom | 3DGS* [13] | 30.03 | 0.244 | 232k | 22 min |
| PlayRoom | 3DGS* (lower threshold) | 29.69 | 0.240 | 523k | 36 min |
| PlayRoom | 3DGS + LPM | 30.22 | 0.241 | 186k | 23 min |
| Truck | 3DGS* [13] | 25.42 | 0.146 | 257k | 19 min |
| Truck | 3DGS* (lower threshold) | 25.45 | 0.127 | 635k | 35 min |
| Truck | 3DGS + LPM | 25.61 | 0.154 | 265k | 21 min |

Individual points manipulation

We study the effect of the individual point manipulations of LPM, namely point addition and the reset of ill-conditioned points. The results in Table 4 show that: (1) Each manipulation yields a PSNR gain, suggesting that every component of LPM is meaningful. (2) The point addition operation densifies the under-optimized points that may be overlooked by 3DGS, further capturing geometric details (e.g., the detail of the toy and the leaves of the tree, see Fig. 5). (3) Resetting points in certain zones provides the opportunity to correct ill-conditioned points and achieve geometry calibration (e.g., the window of the truck, see Fig. 5).

Table 4: Performance comparison for different configurations

| Method | PlayRoom PSNR | PlayRoom LPIPS | PlayRoom SSIM | Truck PSNR | Truck LPIPS | Truck SSIM |
|---|---|---|---|---|---|---|
| Full LPM | 30.22 | 0.241 | 0.910 | 25.61 | 0.154 | 0.883 |
| w/o point addition | 30.10 | 0.241 | 0.910 | 25.43 | 0.153 | 0.883 |
| w/o reset | 30.07 | 0.243 | 0.908 | 25.52 | 0.144 | 0.883 |
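
The two manipulations ablated above can be sketched in a toy setting. This is a simplified illustration under strong assumptions, not the paper's implementation. The zone is modelled as a cone segment seen from a single camera. Point addition is modelled as cloning in-zone points with a small jitter. The reset clamps the opacity of points lying in front of the zone to a low value. All function names and parameters (`half_angle`, `near`, `far`, `jitter`, `reset_to`) are hypothetical.

```python
import numpy as np

def split_by_zone(points, cam_center, axis, half_angle, near, far):
    """Classify points relative to a cone-shaped zone (assumed zone model).

    Returns (in_zone, in_front): boolean masks over `points`.
    """
    axis = axis / np.linalg.norm(axis)
    rel = points - cam_center
    depth = rel @ axis                      # distance along the cone axis
    dist = np.linalg.norm(rel, axis=1)
    # Cosine of the angle between each point direction and the cone axis.
    cos_angle = np.divide(depth, dist, out=np.zeros_like(depth), where=dist > 0)
    in_cone = (depth > 0) & (cos_angle >= np.cos(half_angle))
    in_zone = in_cone & (depth >= near) & (depth <= far)
    in_front = in_cone & (depth < near)     # potential occluders of the zone
    return in_zone, in_front

def manage_points(points, opacity, in_zone, in_front, rng,
                  jitter=0.01, reset_to=0.01):
    """Clone in-zone points with jitter and lower the opacity of occluders."""
    new_points = points[in_zone] + rng.normal(scale=jitter,
                                              size=(int(in_zone.sum()), 3))
    new_opacity = opacity[in_zone]
    opacity = opacity.copy()
    opacity[in_front] = np.minimum(opacity[in_front], reset_to)
    return np.vstack([points, new_points]), np.concatenate([opacity, new_opacity])

points = np.array([[0.0, 0.0, 1.0],   # in front of the zone
                   [0.0, 0.0, 3.0],   # inside the zone
                   [0.0, 0.0, 5.0],   # behind the zone
                   [4.0, 0.0, 3.0]])  # outside the cone
opacity = np.array([0.9, 0.8, 0.7, 0.6])

in_zone, in_front = split_by_zone(points, np.zeros(3), np.array([0.0, 0.0, 1.0]),
                                  np.radians(10.0), near=2.0, far=4.0)
pts2, op2 = manage_points(points, opacity, in_zone, in_front,
                          np.random.default_rng(0))
```

In this toy scene only the second point is densified and only the first point has its opacity lowered. Points behind the zone or outside the cone are left untouched, which is the sense in which the management is localized.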

Robustness to sparse training images

We conduct a further ablation study on the impact of the number of training images. In Table 5, we present the results of training 3DGS and our method on randomly selected subsets comprising 25%, 50%, 75%, and 100% of the training images. Our method achieves higher PSNR than 3DGS at nearly every training-view ratio, the one exception being PlayRoom at 75%.

Table 5: Effect of different training view ratios in the PlayRoom and Truck scenes.

| Scene | Method | 25% PSNR | 25% LPIPS | 50% PSNR | 50% LPIPS | 75% PSNR | 75% LPIPS | 100% PSNR | 100% LPIPS |
|---|---|---|---|---|---|---|---|---|---|
| PlayRoom | 3DGS [13] | 25.33 | 0.313 | 27.37 | 0.270 | 29.16 | 0.253 | 30.03 | 0.244 |
| PlayRoom | 3DGS + LPM | 25.43 | 0.313 | 27.42 | 0.267 | 29.06 | 0.252 | 30.22 | 0.241 |
| Truck | 3DGS [13] | 22.46 | 0.177 | 24.15 | 0.154 | 24.86 | 0.150 | 25.42 | 0.146 |
| Truck | 3DGS + LPM | 22.95 | 0.173 | 24.55 | 0.157 | 25.14 | 0.152 | 25.61 | 0.154 |
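
A sparse-view protocol of this kind can be reproduced with a seeded random selection of training views. The sketch below is an assumption about how such subsets might be drawn. The helper name, the seed, and the view count of 200 are hypothetical. The text does not say whether smaller subsets are nested inside larger ones, and in this sketch they are not.

```python
import random

def sample_training_views(view_ids, ratio, seed=0):
    """Return a reproducible random subset holding `ratio` of the views."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    k = max(1, round(len(view_ids) * ratio))
    rng = random.Random(seed)          # fixed seed so both methods see the same views
    return sorted(rng.sample(list(view_ids), k))

views = list(range(200))               # hypothetical scene with 200 training views
subsets = {r: sample_training_views(views, r) for r in (0.25, 0.5, 0.75, 1.0)}

for ratio, subset in subsets.items():
    print(f"{ratio:.0%}: {len(subset)} views")
```

Fixing the seed matters for a fair comparison: the baseline and the LPM variant should be trained on exactly the same subset at each ratio.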

5 Conclusions and limitations

We propose Localized Point Management (LPM), a novel point management approach to address the limitations of the Adaptive Density Control (ADC) mechanism in 3D Gaussian Splatting (3DGS). The core idea of LPM is identifying the error-contributing 3D zones that require both point addition and geometry calibration under multiview geometry constraints, guided by image rendering errors. We implement appropriate operations for point densification and opacity reset. As a versatile plugin, LPM can be seamlessly integrated into existing 3DGS-based rendering methods. Extensive experiments across both static 3D and dynamic 4D scenes validate the efficacy of LPM in enhancing existing ADC mechanisms both quantitatively and qualitatively. While our method identifies the 3D Gaussian points that lead to rendering errors, it still follows the densification rules of 3DGS [13]. This approach may not be optimal for under-optimized points, and we leave this aspect for further investigation.

References

  • [1] K.-A. Aliev, A. Sevastopolsky, M. Kolos, D. Ulyanov, and V. Lempitsky. Neural point-based graphics. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXII 16, pages 696–712. Springer, 2020.
  • [2] B. Attal, J.-B. Huang, C. Richardt, M. Zollhöfer, J. Kopf, M. O’Toole, and C. Kim. Hyperreel: High-fidelity 6-dof video with ray-conditioned sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16610–16620, June 2023.
  • [3] J. T. Barron, B. Mildenhall, M. Tancik, P. Hedman, R. Martin-Brualla, and P. P. Srinivasan. Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5855–5864, 2021.
  • [4] J. T. Barron, B. Mildenhall, D. Verbin, P. P. Srinivasan, and P. Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5470–5479, 2022.
  • [5] J. T. Barron, B. Mildenhall, D. Verbin, P. P. Srinivasan, and P. Hedman. Zip-nerf: Anti-aliased grid-based neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 19697–19705, 2023.
  • [6] A. Chen, Z. Xu, A. Geiger, J. Yu, and H. Su. Tensorf: Tensorial radiance fields. In European Conference on Computer Vision, pages 333–350. Springer, 2022.
  • [7] K. Cheng, X. Long, K. Yang, Y. Yao, W. Yin, Y. Ma, W. Wang, and X. Chen. Gaussianpro: 3d gaussian splatting with progressive propagation. arXiv preprint arXiv:2402.14650, 2024.
  • [8] K. Deng, A. Liu, J.-Y. Zhu, and D. Ramanan. Depth-supervised nerf: Fewer views and faster training for free. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12882–12891, 2022.
  • [9] N. Deng, Z. He, J. Ye, B. Duinkharjav, P. Chakravarthula, X. Yang, and Q. Sun. Fov-nerf: Foveated neural radiance fields for virtual reality. IEEE Transactions on Visualization and Computer Graphics, 28(11):3854–3864, 2022.
  • [10] S. Fridovich-Keil, G. Meanti, F. R. Warburg, B. Recht, and A. Kanazawa. K-planes: Explicit radiance fields in space, time, and appearance. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12479–12488, 2023.
  • [11] S. Fridovich-Keil, A. Yu, M. Tancik, Q. Chen, B. Recht, and A. Kanazawa. Plenoxels: Radiance fields without neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5501–5510, 2022.
  • [12] P. Hedman, J. Philip, T. Price, J.-M. Frahm, G. Drettakis, and G. Brostow. Deep blending for free-viewpoint image-based rendering. ACM Transactions on Graphics (ToG), 37(6):1–15, 2018.
  • [13] B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4):1–14, 2023.
  • [14] A. Knapitsch, J. Park, Q.-Y. Zhou, and V. Koltun. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics (ToG), 36(4):1–13, 2017.
  • [15] L. Li, Z. Shen, Z. Wang, L. Shen, and P. Tan. Streaming radiance fields for 3d video synthesis. In Advances in Neural Information Processing Systems, 2022.
  • [16] T. Li, M. Slavcheva, M. Zollhoefer, S. Green, C. Lassner, C. Kim, T. Schmidt, S. Lovegrove, M. Goesele, R. Newcombe, et al. Neural 3d video synthesis from multi-view video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5521–5531, 2022.
  • [17] T. Li, M. Slavcheva, M. Zollhoefer, S. Green, C. Lassner, C. Kim, T. Schmidt, S. Lovegrove, M. Goesele, R. Newcombe, and Z. Lv. Neural 3d video synthesis from multi-view video. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5511–5521, 2022.
  • [18] Z. Li, Z. Chen, Z. Li, and Y. Xu. Spacetime gaussian feature splatting for real-time dynamic view synthesis. arXiv preprint arXiv:2312.16812, 2023.
  • [19] P. Lindenberger, P.-E. Sarlin, and M. Pollefeys. Lightglue: Local feature matching at light speed. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17627–17638, 2023.
  • [20] L. Liu, J. Gu, K. Zaw Lin, T.-S. Chua, and C. Theobalt. Neural sparse voxel fields. Advances in Neural Information Processing Systems, 33:15651–15663, 2020.
  • [21] S. Lombardi, T. Simon, J. Saragih, G. Schwartz, A. Lehrmann, and Y. Sheikh. Neural volumes: Learning dynamic renderable volumes from images. arXiv preprint arXiv:1906.07751, 2019.
  • [22] T. Lu, M. Yu, L. Xu, Y. Xiangli, L. Wang, D. Lin, and B. Dai. Scaffold-gs: Structured 3d gaussians for view-adaptive rendering. arXiv preprint arXiv:2312.00109, 2023.
  • [23] B. Mildenhall, P. P. Srinivasan, R. Ortiz-Cayon, N. K. Kalantari, R. Ramamoorthi, R. Ng, and A. Kar. Local light field fusion: practical view synthesis with prescriptive sampling guidelines. ACM Trans. Graph., 38(4), jul 2019.
  • [24] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
  • [25] T. Müller, A. Evans, C. Schied, and A. Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM transactions on graphics (TOG), 41(4):1–15, 2022.
  • [26] B. Poole, A. Jain, J. T. Barron, and B. Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988, 2022.
  • [27] S. Rota Bulò, L. Porzi, and P. Kontschieder. Revising densification in gaussian splatting. arXiv e-prints, pages arXiv–2404, 2024.
  • [28] V. Sitzmann, J. Thies, F. Heide, M. Nießner, G. Wetzstein, and M. Zollhofer. Deepvoxels: Learning persistent 3d feature embeddings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2437–2446, 2019.
  • [29] L. Song, A. Chen, Z. Li, Z. Chen, L. Chen, J. Yuan, Y. Xu, and A. Geiger. Nerfplayer: A streamable dynamic scene representation with decomposed neural radiance fields. IEEE Transactions on Visualization and Computer Graphics, 29(5):2732–2742, 2023.
  • [30] M. Tancik, V. Casser, X. Yan, S. Pradhan, B. Mildenhall, P. P. Srinivasan, J. T. Barron, and H. Kretzschmar. Block-nerf: Scalable large scene neural view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8248–8258, 2022.
  • [31] J. Tang, J. Ren, H. Zhou, Z. Liu, and G. Zeng. Dreamgaussian: Generative gaussian splatting for efficient 3d content creation. arXiv preprint arXiv:2309.16653, 2023.
  • [32] J. Thies, M. Zollhöfer, and M. Nießner. Deferred neural rendering: Image synthesis using neural textures. Acm Transactions on Graphics (TOG), 38(4):1–12, 2019.
  • [33] H. Turki, D. Ramanan, and M. Satyanarayanan. Mega-nerf: Scalable construction of large-scale nerfs for virtual fly-throughs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12922–12931, 2022.
  • [34] F. Wang, S. Tan, X. Li, Z. Tian, Y. Song, and H. Liu. Mixed neural voxels for fast multi-view video synthesis. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 19649–19659, 2023.
  • [35] P. Wang, Y. Liu, Z. Chen, L. Liu, Z. Liu, T. Komura, C. Theobalt, and W. Wang. F2-nerf: Fast neural radiance field training with free camera trajectories. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4150–4159, 2023.
  • [36] G. Wu, T. Yi, J. Fang, L. Xie, X. Zhang, W. Wei, W. Liu, Q. Tian, and X. Wang. 4d gaussian splatting for real-time dynamic scene rendering. arXiv preprint arXiv:2310.08528, 2023.
  • [37] Q. Xu, Z. Xu, J. Philip, S. Bi, Z. Shu, K. Sunkavalli, and U. Neumann. Point-nerf: Point-based neural radiance fields. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5438–5448, 2022.
  • [38] Z. Yan, W. F. Low, Y. Chen, and G. H. Lee. Multi-scale 3d gaussian splatting for anti-aliased rendering. arXiv preprint arXiv:2311.17089, 2023.
  • [39] Z. Yang, Y. Chen, J. Wang, S. Manivasagam, W.-C. Ma, A. J. Yang, and R. Urtasun. Unisim: A neural closed-loop sensor simulator. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1389–1399, 2023.
  • [40] Z. Yang, H. Yang, Z. Pan, X. Zhu, and L. Zhang. Real-time photorealistic dynamic scene representation and rendering with 4d gaussian splatting. In ICLR, 2024.
  • [41] Z. Yu, A. Chen, B. Huang, T. Sattler, and A. Geiger. Mip-splatting: Alias-free 3d gaussian splatting. arXiv preprint arXiv:2311.16493, 2023.
  • [42] Z. Yu, S. Peng, M. Niemeyer, T. Sattler, and A. Geiger. Monosdf: Exploring monocular geometric cues for neural implicit surface reconstruction. Advances in neural information processing systems, 35:25018–25032, 2022.
  • [43] Z. Zhang, W. Hu, Y. Lao, T. He, and H. Zhao. Pixel-gs: Density control with pixel-aware gradient for 3d gaussian splatting. arXiv preprint arXiv:2403.15530, 2024.
  • [44] M. Zwicker, H. Pfister, J. Van Baar, and M. Gross. Ewa volume splatting. In Proceedings Visualization, 2001. VIS’01., pages 29–538. IEEE, 2001.

Appendix A Appendix

A.1 Additional Results

Per-scene Result of Static 3D

We provide additional per-scene quantitative results for all three datasets. Tables 6, 7, 8, 9, 10, and 11 present the metrics for each scene in the Mip-NeRF360 [3], Tanks&Temples [14], and DeepBlending [12] datasets. Our method improves 3DGS [13] scene modeling in the vast majority of scenarios.

Table 6: Performance comparison of different methods on various scenes (PSNR ↑). (Part 1).

| Method | Bicycle | Flowers | Garden | Stump | Treehill | Room |
|---|---|---|---|---|---|---|
| Plenoxels [11] | 21.912 | 20.097 | 23.4947 | 20.661 | 22.487 | 27.594 |
| INGP-Big [25] | 22.171 | 20.652 | 25.069 | 23.466 | 22.373 | 29.690 |
| Mip-NeRF 360 [3] | 24.37 | 21.73 | 26.98 | 26.40 | 22.87 | 31.63 |
| 3DGS [13] | 25.246 | 21.520 | 27.410 | 26.550 | 22.490 | 30.632 |
| 3DGS* | 25.166 | 21.576 | 27.388 | 26.637 | 22.487 | 31.53 |
| 3DGS + LPM | 25.4 | 21.73 | 27.43 | 26.81 | 22.78 | 31.58 |

Table 7: Performance comparison of different methods on various scenes (PSNR ↑). (Part 2).

| Method | Counter | Kitchen | Bonsai | Dr Johnson | Playroom | Truck | Train |
|---|---|---|---|---|---|---|---|
| Plenoxels [11] | 23.624 | 23.420 | 24.669 | 23.142 | 22.980 | 23.221 | 18.927 |
| INGP-Big [25] | 26.691 | 29.479 | 30.685 | 28.257 | 21.665 | 23.383 | 20.456 |
| Mip-NeRF 360 [3] | 29.55 | 32.23 | 33.46 | 29.140 | 29.657 | 24.912 | 19.523 |
| 3DGS [13] | 28.700 | 30.317 | 31.980 | 28.766 | 30.044 | 25.187 | 21.097 |
| 3DGS* | 28.90 | 31.43 | 32.14 | 29.08 | 30.03 | 25.42 | 21.91 |
| 3DGS + LPM | 28.91 | 31.45 | 32.20 | 29.30 | 30.22 | 25.61 | 22.05 |

Table 8: Performance comparison of different methods on various scenes (LPIPS ↓). (Part 1).

| Method | Bicycle | Flowers | Garden | Stump | Treehill | Room |
|---|---|---|---|---|---|---|
| Plenoxels [11] | 0.506 | 0.521 | 0.3864 | 0.503 | 0.540 | 0.4186 |
| INGP-Big [25] | 0.446 | 0.441 | 0.257 | 0.421 | 0.450 | 0.261 |
| Mip-NeRF 360 [3] | 0.301 | 0.344 | 0.170 | 0.261 | 0.339 | 0.211 |
| 3DGS [13] | 0.205 | 0.336 | 0.103 | 0.210 | 0.317 | 0.220 |
| 3DGS* | 0.211 | 0.336 | 0.107 | 0.215 | 0.324 | 0.218 |
| 3DGS + LPM | 0.203 | 0.337 | 0.108 | 0.224 | 0.347 | 0.209 |

Table 9: Performance comparison of different methods on various scenes (LPIPS ↓). (Part 2).

| Method | Counter | Kitchen | Bonsai | Dr Johnson | Playroom | Truck | Train |
|---|---|---|---|---|---|---|---|
| Plenoxels [11] | 0.441 | 0.447 | 0.398 | 0.521 | 0.499 | 0.335 | 0.422 |
| INGP-Big [25] | 0.306 | 0.195 | 0.205 | 0.352 | 0.428 | 0.249 | 0.360 |
| Mip-NeRF 360 [3] | 0.204 | 0.127 | 0.176 | 0.237 | 0.252 | 0.159 | 0.354 |
| 3DGS [13] | 0.204 | 0.129 | 0.205 | 0.244 | 0.241 | 0.148 | 0.218 |
| 3DGS* | 0.200 | 0.126 | 0.204 | 0.245 | 0.244 | 0.146 | 0.207 |
| 3DGS + LPM | 0.200 | 0.125 | 0.202 | 0.241 | 0.241 | 0.154 | 0.209 |

Table 10: Performance comparison of different methods on various scenes (SSIM ↑). (Part 1).

| Method | Bicycle | Flowers | Garden | Stump | Treehill | Room |
|---|---|---|---|---|---|---|
| Plenoxels [11] | 0.496 | 0.431 | 0.6063 | 0.523 | 0.509 | 0.8417 |
| INGP-Big [25] | 0.512 | 0.486 | 0.701 | 0.594 | 0.542 | 0.871 |
| Mip-NeRF 360 [3] | 0.685 | 0.583 | 0.813 | 0.744 | 0.632 | 0.913 |
| 3DGS [13] | 0.771 | 0.605 | 0.868 | 0.775 | 0.638 | 0.914 |
| 3DGS* | 0.765 | 0.606 | 0.867 | 0.773 | 0.634 | 0.920 |
| 3DGS + LPM | 0.776 | 0.609 | 0.870 | 0.781 | 0.636 | 0.923 |

Table 11: Performance comparison of different methods on various scenes (SSIM ↑). (Part 2).

| Method | Counter | Kitchen | Bonsai | Dr Johnson | Playroom | Truck | Train |
|---|---|---|---|---|---|---|---|
| Plenoxels [11] | 0.759 | 0.648 | 0.814 | 0.787 | 0.802 | 0.774 | 0.663 |
| INGP-Big [25] | 0.817 | 0.858 | 0.906 | 0.854 | 0.779 | 0.800 | 0.689 |
| Mip-NeRF 360 [3] | 0.894 | 0.920 | 0.941 | 0.901 | 0.900 | 0.857 | 0.660 |
| 3DGS [13] | 0.905 | 0.922 | 0.938 | 0.899 | 0.906 | 0.879 | 0.802 |
| 3DGS* | 0.908 | 0.927 | 0.942 | 0.901 | 0.907 | 0.882 | 0.815 |
| 3DGS + LPM | 0.909 | 0.929 | 0.943 | 0.905 | 0.910 | 0.883 | 0.817 |
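
As a quick sanity check of the per-scene PSNR results, the short script below compares the 3DGS* and 3DGS + LPM rows of Tables 6 and 7. The values are copied from those tables. The script and its variable names are illustrative only and not part of the evaluation pipeline.

```python
# PSNR per scene, copied from the 3DGS* and 3DGS + LPM rows of Tables 6 and 7.
baseline = {
    "Bicycle": 25.166, "Flowers": 21.576, "Garden": 27.388, "Stump": 26.637,
    "Treehill": 22.487, "Room": 31.53, "Counter": 28.90, "Kitchen": 31.43,
    "Bonsai": 32.14, "Dr Johnson": 29.08, "Playroom": 30.03, "Truck": 25.42,
    "Train": 21.91,
}
lpm = {
    "Bicycle": 25.4, "Flowers": 21.73, "Garden": 27.43, "Stump": 26.81,
    "Treehill": 22.78, "Room": 31.58, "Counter": 28.91, "Kitchen": 31.45,
    "Bonsai": 32.20, "Dr Johnson": 29.30, "Playroom": 30.22, "Truck": 25.61,
    "Train": 22.05,
}

# Per-scene gain in dB, rounded to the precision reported in the tables.
gains = {scene: round(lpm[scene] - baseline[scene], 3) for scene in baseline}
wins = sum(gain > 0 for gain in gains.values())
mean_gain = sum(gains.values()) / len(gains)

print(f"scenes improved: {wins}/{len(gains)}")
print(f"mean PSNR gain: {mean_gain:.3f} dB")
```

In terms of PSNR, LPM improves on the reproduced 3DGS* baseline in all 13 scenes. The LPIPS and SSIM tables can be checked the same way and are more mixed on some scenes.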

Per-scene Result of Dynamic 4D

In Table 12, we provide the PSNR on different scenes. The quantitative results clearly show that LPM improves STGS [18], allowing it to faithfully capture subtle static and dynamic information.

Table 12: Performance comparison of different methods on various scenes (PSNR ↑).

| Method | Coffee Martini | Spinach | Cut Beef | Flame Salmon | Flame Steak | Sear Steak |
|---|---|---|---|---|---|---|
| K-Planes-explicit [10] | 28.74 | 32.19 | 31.93 | 28.71 | 31.80 | 31.89 |
| K-Planes-hybrid [10] | 29.99 | 32.60 | 31.82 | 30.44 | 32.38 | 32.52 |
| MixVoxels [34] | 29.36 | 31.61 | 31.30 | 29.92 | 31.21 | 31.43 |
| NeRFPlayer [29] | 31.53 | 30.56 | 29.35 | 31.65 | 31.93 | 29.12 |
| HyperReel [2] | 28.37 | 32.30 | 32.92 | 28.26 | 32.20 | 32.57 |
| Dynamic-4D [36] | 27.34 | 32.46 | 32.90 | 29.20 | 32.51 | 32.49 |
| 4DGS [40] | 28.33 | 32.93 | 33.85 | 29.38 | 34.03 | 33.51 |
| STGS [18] | 28.61 | 33.18 | 33.52 | 29.48 | 33.64 | 33.89 |
| STGS* | 28.48 | 33.05 | 33.40 | 29.48 | 33.74 | 33.80 |
| STGS + LPM | 28.93 | 33.27 | 33.90 | 29.84 | 34.26 | 34.20 |

Appendix B More visualizations

Figure 6 provides more examples on the static 3D and dynamic 4D datasets.

Figure 6: More qualitative comparisons on the static 3D and dynamic 4D datasets.