email: {yhchen.ee, wylin}@sjtu.edu.cn,
{qianyi.wu, jianfei.cai, mehrtash.harandi}@monash.edu
HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression
Abstract
3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity. However, the large number of Gaussians and their associated attributes necessitates effective compression techniques, while the sparse and unorganized nature of the point cloud of Gaussians (or anchors in our paper) makes compression challenging. To address this, we make use of the relations between the unorganized anchors and the structured hash grid, leveraging their mutual information for context modeling, and propose a Hash-grid Assisted Context (HAC) framework for highly compact 3DGS representation. Our approach introduces a binary hash grid to establish continuous spatial consistencies, allowing us to unveil the inherent spatial relations of anchors through a carefully designed context model. To facilitate entropy coding, we utilize Gaussian distributions to accurately estimate the probability of each quantized attribute, where an adaptive quantization module is proposed to enable high-precision quantization of these attributes for improved fidelity restoration. Additionally, we incorporate an adaptive masking strategy to eliminate invalid Gaussians and anchors. Importantly, our work is the first to explore context-based compression for 3DGS representation, resulting in a remarkable size reduction compared to vanilla 3DGS while simultaneously improving fidelity, as well as a significant size reduction over the SOTA 3DGS compression approach Scaffold-GS. Our code is available here.
Keywords: 3D Gaussian Splatting · Compression · Context Models
1 Introduction
Over the past few years, significant advancements have been made in 3D scene representations for novel view synthesis. Neural Radiance Field (NeRF) [26] proposes rendering colors by accumulating RGB values along sampling rays using an implicit Multilayer Perceptron (MLP), aiming at reconstructing photo-realistic images. However, the extensive sampling of ray points has been a bottleneck, affecting both the speed of training and rendering. Recent advances of NeRF [28, 5, 11] introduce feature grids to enhance the rendering process, facilitating faster rendering speeds by reducing the MLP size. Despite the improvement, these approaches still suffer from relatively slow rendering speeds due to frequent ray point sampling.
In this context, very recently, a new paradigm of 3D representation, 3D Gaussian Splatting (3DGS) [17], emerged. 3DGS introduces learnable Gaussians to directly represent 3D space explicitly. These Gaussians, initialized from Structure-from-Motion (SfM) [33] and endowed with learnable shape and appearance parameters, can be directly splatted onto 2D planes for rapid and differentiable rendering within imperceptible intervals using tile-based rasterization [19]. As such, the time-consuming volume rendering used in NeRF can be completely removed. The advantages of rapid differentiable rendering with high photo-realistic fidelity have stimulated the fast and widespread adoption of 3DGS in the field.
However, 3DGS is not the ultimate solution. One major drawback is that it requires a considerable number of 3D Gaussians to represent a large-scale scene well (e.g., millions of Gaussians for city-scale scenes) and needs a large storage space (e.g., a few gigabytes (GB)) to store the associated Gaussian attributes for each scene [40]. This motivates us to investigate effective compression techniques for 3DGS.
Due to their sparse and unorganized nature, 3D Gaussians are challenging to compress. Therefore, most existing 3DGS compression approaches focus solely on parameter “values” but overlook their structural relations. For example, as illustrated in Fig. 1 middle, parameter pruning can be used to mask out the Gaussians whose parameter values are below a certain threshold [20, 10]. Another straightforward technique is to apply vector quantization to cluster parameters with similar “values”. Such an approach enables the direct compression of parameters by retaining only the more representative ones while maintaining reconstruction fidelity [20, 30, 29, 10]. Nevertheless, solely concentrating on “values” fails to eliminate structural redundancies, which are pivotal for compact representations. To exploit such spatial relations of Gaussians, Scaffold-GS [25] introduces anchors to cluster related nearby 3D Gaussians and neural-predict their attributes from the anchors’ attributes, resulting in significant storage savings. Despite the improvement, Scaffold-GS still treats each anchor independently, and a large number of anchors remain sparse, unorganized, and hard to compress due to their point-cloud nature.
To further push the boundary of 3DGS compression, we draw inspiration from the NeRF series [26], contemplating the idea of representing 3D space using well-organized feature grids [28, 5]. We pose the question: Are there inherent relations between the attributes of unorganized anchors in Scaffold-GS and the structured feature grids? Our answer is affirmative since we observe large mutual information between anchor attributes and the hash grid features. Based on this observation, we propose a Hash-grid Assisted Context (HAC) framework, where our core idea is to jointly learn a structured compact hash grid (binarized for each hash parameter) and use it for context modeling of anchor attributes. Specifically, with Scaffold-GS [25] as our base model, for each anchor, we query the hash grid by the anchor location to obtain an interpolated hash feature, which is then used to predict the value distributions of anchor attributes, facilitating entropy coding such as Arithmetic Coding (AE) [39] for a highly compact representation of the model. Note that we employ Scaffold-GS as our base model as its anchor-centered design provides a good foundation to establish relations with these interpolated hash features. Furthermore, we introduce an Adaptive Quantization Module (AQM), which dynamically adjusts the quantization step sizes for different anchor attributes to retain their original information. Learnable masks are also employed to mask out invalid Gaussians and anchors, further enhancing the compression ratio. Our main contributions can be summarized as follows:
1. To our knowledge, we are the first to model contexts for 3DGS compression, i.e., using a structured hash grid to exploit the inherent consistencies among unorganized 3D Gaussians (or anchors in Scaffold-GS).
2. To facilitate efficient entropy encoding of anchor attributes, we propose to use the interpolated hash feature to neural-predict both the value distribution of anchor attributes and the quantization step refinement used by AQM. We also employ learnable masks to prune out ineffective Gaussians and anchors.
3. Extensive experiments on five datasets demonstrate the effectiveness of our HAC framework and each technical component. Averaged over all datasets, we achieve a substantial compression ratio over both our base model Scaffold-GS and the vanilla 3DGS model, with comparable or even improved fidelity.
2 Related Work
Neural Radiance Field and its compression. The emergence of Neural Radiance Field (NeRF) [26] has significantly advanced novel view synthesis by employing a single learnable implicit MLP to generate arbitrary views of 3D scenes through $\alpha$-composed accumulation of RGB values along a ray. However, the dense querying of sampling points and the utilization of a large MLP hinder real-time rendering. To address this problem, subsequent approaches such as Instant-NGP [28], TensoRF [5], K-planes [11], and DVGO [37] adopt explicit grid-based representations to facilitate faster training and rendering by reducing the size of the MLP, which however comes at the cost of increased storage space.
To mitigate the storage increase, compression techniques focusing on reducing the size of explicit representations have been developed, which can be categorized into either “value”-based or structural-relation-based approaches. The former category includes pruning [23, 9], codebooks [23, 24], and methods like quantization or entropy constraint employed in BiRF [35] and SHACIRA [13]. On the other hand, the latter category explores structural relations via wavelet decomposition [32], rank-residual decomposition [38], or spatial prediction [36] to eliminate spatial redundancy, thanks to the well-structured characteristics of these feature grids. CNC [6] provides a solid proof of concept by sufficiently utilizing such structural information, achieving remarkable RD performance gain.
3D Gaussian Splatting and its compression. 3DGS [17] has innovatively addressed the challenge of slow training and rendering in NeRF while maintaining high-fidelity quality by representing 3D scenes with 3D Gaussians endowed with learnable shape and appearance attributes. By adopting differentiable splatting and tile-based rasterization [19], 3D Gaussians are optimized during training to best fit their local 3D regions. Despite its advantages, the large number of Gaussians and their associated attributes necessitates effective compression techniques.
Unlike NeRF-based feature grids, 3D Gaussians in 3DGS are sparse and unorganized, presenting significant challenges for establishing structural relations. Consequently, compression approaches have primarily focused on the “value” of model parameters, employing techniques such as pruning [20, 10], codebooks [20, 30, 10, 29], and entropy constraints [12]. To our knowledge, only Scaffold-GS [25] and Morgenstern et al. [27] have explored the relations of Gaussians. In [25], the authors introduce anchor-centered features to reduce the number of parameters, while in [27] dimension collapsing is considered to compress Gaussians in an ordered 2D space. However, their investigation of spatial redundancy remains insufficient.
In this paper, we emphasize that leveraging such structural relations is crucial for compression. For instance, approaches in image compression [7, 15, 14] and video compression [21, 22, 34] have demonstrated the effectiveness of eliminating structural redundancy by excavating spatial and temporal relations, thanks to their well-organized data structure. Motivated by this, with Scaffold-GS as our base model, we introduce a well-structured hash grid as context to model the inherent consistencies of the sparse and unorganized anchors, achieving a much more compact 3DGS representation.
3 Methods
In Fig. 2, we conceptualize our HAC framework. In particular, HAC is based on the baseline Scaffold-GS [25] (Fig. 2 top), which introduces anchors with their attributes $\mathcal{A}$ (feature, scaling and offsets) to cluster and neural-predict 3D Gaussian attributes (opacity, RGB, scale, and quaternion). At the core of our HAC, we propose to jointly learn a structured compact hash grid (binarized for each parameter) that can be queried at any anchor location to obtain the interpolated hash feature $\mathbf{f}^h$ (Fig. 2 middle). Instead of directly substituting the anchor feature, $\mathbf{f}^h$ is used as context to predict the value distributions of anchor attributes, which is essential for the subsequent entropy coding, i.e., Arithmetic Coding (AE). Our context model (Fig. 2 bottom) is a simple MLP that takes $\mathbf{f}^h$ as input and outputs $r$ for the adaptive quantization module (AQM), which quantizes anchor attribute values into a finite set, and the Gaussian parameters ($\boldsymbol{\mu}$ and $\boldsymbol{\sigma}$) for modeling the value distributions of anchor attributes, from which we can compute the probability of each quantized attribute value for AE. Note that we draw two MLPs ($\mathrm{MLP_q}$ and $\mathrm{MLP_c}$) in Fig. 2 for easy explanation but they actually share the same MLP layers with outputs at different dimensions. Besides, an adaptive offset masking module (Fig. 2 top-left) is adopted to prune redundant Gaussians and anchors. In the following, we first introduce the background and then delve into the detailed technical components of our HAC.
3.1 Preliminaries
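3D Gaussian Splatting (3DGS) [17] represents a 3D scene using numerous Gaussians and renders viewpoints through differentiable splatting and tile-based rasterization. Each Gaussian is initialized from SfM and defined by a 3D covariance matrix $\boldsymbol{\Sigma} \in \mathbb{R}^{3\times 3}$ and location (mean) $\boldsymbol{\mu} \in \mathbb{R}^{3}$,
$$G(\mathbf{x}) = \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right), \tag{1}$$
where $\mathbf{x} \in \mathbb{R}^{3}$ is a random 3D point, and $\boldsymbol{\Sigma}$ is defined by a diagonal matrix $\mathbf{S} \in \mathbb{R}^{3\times 3}$ representing scaling and a rotation matrix $\mathbf{R} \in \mathbb{R}^{3\times 3}$ to guarantee its positive semi-definite characteristics, such that $\boldsymbol{\Sigma} = \mathbf{R}\mathbf{S}\mathbf{S}^{\top}\mathbf{R}^{\top}$. To render an image from a random viewpoint, 3D Gaussians are first splatted to 2D, and the pixel value $\mathbf{C} \in \mathbb{R}^{3}$ is rendered using $\alpha$-composed blending,
$$\mathbf{C} = \sum_{i=1}^{I} \mathbf{c}_i \alpha_i \prod_{j=1}^{i-1} (1-\alpha_j), \tag{2}$$
where $\alpha \in \mathbb{R}^{1}$ measures the opacity of each Gaussian after 2D projection, $\mathbf{c} \in \mathbb{R}^{3}$ is view-dependent color modeled by Spherical Harmonic (SH) coefficients, and $I$ is the number of sorted Gaussians contributing to the rendering.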
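As an illustration of Eqs. (1) and (2), the following is a minimal pure-Python sketch, not the tile-based CUDA rasterizer used by 3DGS. The function names `gaussian_weight` and `alpha_blend` are hypothetical, the inverse covariance is assumed to be given, and the Gaussians are assumed to be already sorted front to back.

```python
import math

def gaussian_weight(x, mu, cov_inv):
    """Unnormalized Gaussian of Eq. (1): exp(-0.5 * d^T * Sigma^{-1} * d), d = x - mu."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    quad = sum(d[r] * cov_inv[r][c] * d[c] for r in range(3) for c in range(3))
    return math.exp(-0.5 * quad)

def alpha_blend(colors, alphas):
    """Front-to-back compositing of Eq. (2) over depth-sorted Gaussians."""
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # running product of (1 - alpha_j) for j < i
    for color, alpha in zip(colors, alphas):
        weight = alpha * transmittance
        pixel = [p + weight * ch for p, ch in zip(pixel, color)]
        transmittance *= 1.0 - alpha
    return pixel

# Toy inputs (assumed values, for illustration only).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w = gaussian_weight([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], identity)
pixel = alpha_blend([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.5, 0.5])
```

With two half-transparent Gaussians, the second one contributes only a quarter of its color, because half of the light is already absorbed by the first.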
Scaffold-GS [25] adheres to the framework of 3DGS and introduces a more storage-friendly and fidelity-satisfying anchor-based approach. It utilizes anchors to cluster Gaussians and deduce their attributes from the attributes of attached anchors through MLPs, rather than directly storing them. Specifically, each anchor consists of a location $\mathbf{x}^a \in \mathbb{R}^{3}$ and anchor attributes $\mathcal{A} = \{\mathbf{f}^a \in \mathbb{R}^{D^a}, \mathbf{l} \in \mathbb{R}^{6}, \mathbf{o} \in \mathbb{R}^{3K}\}$, where each component represents anchor feature, scaling and offsets (for $K$ Gaussians per anchor), respectively. During rendering, $\mathbf{f}^a$ is inputted into MLPs to generate attributes for Gaussians, whose locations are determined by adding $\mathbf{x}^a$ and $\mathbf{o}$, where $\mathbf{l}$ is utilized to regularize both locations and shapes of the Gaussians. While Scaffold-GS has demonstrated effectiveness via this anchor-centered design, we contend there is still significant redundancy among inherent consistencies of anchors that we can fully exploit for a more compact 3DGS representation.
3.2 Bridging Anchors and Hash Grid
Our main idea is to leverage the well-structured hash grid to unveil the inherent spatial consistencies of the unorganized anchors. To verify their mutual information, we first explore substituting anchor features $\mathbf{f}^a$ with hash features $\mathbf{f}^h$ that are acquired by interpolation using the anchor location $\mathbf{x}^a$ on the hash grid $\mathcal{H}$, defined as $\mathbf{f}^h := \mathrm{Interp}(\mathbf{x}^a, \mathcal{H})$. Here, $\mathcal{H} = \{\boldsymbol{\theta}_i^l \in \mathbb{R}^{D^h} \mid i = 1, \dots, T^l,\ l = 1, \dots, L\}$ represents the hash grid, where $D^h$ is the dimension of vector $\boldsymbol{\theta}_i^l$, $T^l$ is the table size of the grid for level $l$, and $L$ is the total number of levels. We conduct a preliminary experiment on the Synthetic-NeRF dataset [26] to assess its performance, as shown in the right panel of Fig. 3. Direct substitution using hash features appears to yield inferior fidelity and introduces drawbacks such as unstable training (due to its impact on anchor spawning processes) and decreased testing FPS (owing to the extra interpolation operation). These results may further degrade if $\mathbf{l}$ and $\mathbf{o}$ are also substituted for a more compact model. Nonetheless, we find the fidelity degradation remains moderate, suggesting the existence of rich mutual information between $\mathbf{f}^h$ and $\mathbf{f}^a$. This prompts us to ask: Can we exploit such mutual relation and use the compact hash features to model the context of anchor attributes $\mathcal{A}$? This leads to the context modeling as a conditional probability:
$$p(\mathcal{A}, \mathbf{x}^a, \mathcal{H}) = p(\mathcal{A} \mid \mathbf{x}^a, \mathcal{H}) \times p(\mathbf{x}^a, \mathcal{H}) \sim p(\mathcal{A} \mid \mathbf{f}^h) \times p(\mathcal{H}), \tag{3}$$
where $p(\mathbf{x}^a)$ is omitted in the last term: we assume $\mathbf{x}^a$ and $\mathcal{H}$ are independent (an anchor can be anywhere), so that $p(\mathbf{x}^a, \mathcal{H}) = p(\mathbf{x}^a)\,p(\mathcal{H})$, and we do not employ entropy constraints on $\mathbf{x}^a$. According to information theory [8], a higher probability corresponds to lower uncertainty (entropy) and fewer bits consumed. Thus, the large mutual information between $\mathcal{A}$ and $\mathbf{f}^h$ ensures a large $p(\mathcal{A} \mid \mathbf{f}^h)$. Our goal is to devise a solution to effectively leverage this relationship. Furthermore, $p(\mathcal{H})$ signifies that the size of the hash grid itself should also be compressed, which can be done by adopting the existing solution for Instant-NGP compression [6].
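The link between the factorization in (3) and storage can be made explicit with a short worked step. It assumes an ideal entropy coder whose code length equals the negative base-2 log-probability, and writes $\mathcal{A}$ for the anchor attributes, $\mathbf{f}^h$ for the interpolated hash feature and $\mathcal{H}$ for the hash grid:

```latex
\begin{aligned}
\mathrm{bits}(\mathcal{A}, \mathcal{H})
  &\approx -\log_2\!\big[\, p(\mathcal{A} \mid \mathbf{f}^h)\; p(\mathcal{H}) \,\big] \\
  &= \underbrace{-\log_2 p(\mathcal{A} \mid \mathbf{f}^h)}_{\text{anchor attributes}}
   \;+\; \underbrace{-\log_2 p(\mathcal{H})}_{\text{hash grid}} .
\end{aligned}
```

For example, an attribute value predicted with probability 0.5 costs 1 bit, while one predicted with probability 0.99 costs about 0.0145 bits, which is why a sharper context-conditioned prediction translates directly into a smaller file.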
We underscore the significance of this conditional-probability-based approach: it leaves both the rendering speed and the fidelity upper bound unaffected, since it only utilizes hash features to estimate the entropy of anchor attributes for entropy coding and does not modify the original Scaffold-GS structure. In the following subsections, we delve into the technical details of our context models.
3.3 HAC: Hash-Grid Assisted Context Framework
The principal objective of HAC is to minimize the entropy of anchor attributes $\mathcal{A}$ with the assistance of the hash feature $\mathbf{f}^h$ (i.e., maximize $p(\mathcal{A} \mid \mathbf{f}^h)$), facilitating bit reduction when encoding anchor attributes using entropy coding like AE [39]. As shown in Fig. 2, anchor locations $\mathbf{x}^a$ are first inputted into the hash grid for interpolation, and the obtained $\mathbf{f}^h$ are then employed as context for $\mathcal{A}$.
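The interpolation step can be sketched as follows. This is a simplified illustration rather than the implementation used in the paper: it covers only 3D levels (not the mixed 3D-2D grid), always hashes instead of using dense indexing at coarse levels, and uses the spatial-hash primes popularized by Instant-NGP. The helper names `hash_index`, `interp_level` and `hash_feature` are hypothetical, and anchor locations are assumed to be normalized to the unit cube.

```python
import random

PRIMES = (1, 2654435761, 805459861)  # spatial-hash primes from Instant-NGP

def hash_index(ix, iy, iz, table_size):
    """Map an integer grid vertex to a slot of the hash table."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])) % table_size

def interp_level(x, table, resolution):
    """Trilinear interpolation of one grid level at a point x in [0, 1]^3."""
    dim = len(table[0])
    scaled = [min(max(c, 0.0), 1.0) * (resolution - 1) for c in x]
    base = [min(int(s), resolution - 2) for s in scaled]  # lower cell corner
    frac = [s - b for s, b in zip(scaled, base)]
    out = [0.0] * dim
    for corner in range(8):
        offs = [(corner >> k) & 1 for k in range(3)]
        weight = 1.0
        for k in range(3):
            weight *= frac[k] if offs[k] else 1.0 - frac[k]
        idx = hash_index(base[0] + offs[0], base[1] + offs[1],
                         base[2] + offs[2], len(table))
        out = [o + weight * t for o, t in zip(out, table[idx])]
    return out

def hash_feature(x, levels):
    """Concatenate per-level features; `levels` is a list of (table, resolution)."""
    feature = []
    for table, resolution in levels:
        feature.extend(interp_level(x, table, resolution))
    return feature

# Toy binary (+1 / -1) table with 64 slots and 4 features per slot.
random.seed(0)
table = [[random.choice((-1.0, 1.0)) for _ in range(4)] for _ in range(64)]
f_h = hash_feature((0.3, 0.7, 0.2), [(table, 16), (table, 32)])
```

Because the eight interpolation weights sum to one, each interpolated entry of a binary table stays within [-1, 1], so the stored grid can be binary while the queried feature is continuous in the anchor location.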
Adaptive Quantization Module. To facilitate entropy coding, values of $\mathcal{A}$ must be quantized to a finite set. Our empirical studies reveal that binarization, as in BiRF [35], is unsuitable for $\mathcal{A}$ as it fails to preserve sufficient information. Thus, we opt for rounding them to maintain their comprehensive features. To ensure backpropagation, we utilize the “adding noise” operation during training and “rounding” during testing, as described in [1].
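Nevertheless, the conventional rounding is essentially a quantization with a step size of “1”, which is inappropriate for the scaling $\mathbf{l}$ and the offset $\mathbf{o}$, since they are usually decimal values. To address this, we further introduce an Adaptive Quantization Module (AQM), which adaptively determines quantization steps. In particular, for the $i$th anchor $\mathbf{x}^a_i$, we denote $\mathbf{f}_i$ as any of its $\mathcal{A}_i$’s components: $\mathbf{f}_i \in \{\mathbf{f}^a_i, \mathbf{l}_i, \mathbf{o}_i\} \in \mathbb{R}^{D}$, $D \in \{D^a, 6, 3K\}$, where $D$ is its respective dimension. The quantization can be written as,
$$\hat{\mathbf{f}}_i = \begin{cases} \mathbf{f}_i + \mathcal{U}\left(-\frac{1}{2}, \frac{1}{2}\right) \times q_i, & \text{for training} \\ \mathrm{Round}(\mathbf{f}_i / q_i) \times q_i, & \text{for testing} \end{cases} \tag{4}$$
where
$$q_i = Q_0 \times \left(1 + \mathrm{Tanh}(r_i)\right), \quad r_i = \mathrm{MLP_q}(\mathbf{f}^h_i). \tag{5}$$
We use a simple MLP-based context model $\mathrm{MLP_q}$ to predict from hash feature $\mathbf{f}^h_i$ a refinement $r_i \in \mathbb{R}^{1}$, which is used to adjust the predefined quantization step size $Q_0$. Note that $Q_0$ varies for $\mathbf{f}^a$, $\mathbf{l}$, and $\mathbf{o}$. (5) essentially restricts the quantization step size to be chosen within $(0, 2Q_0)$, enabling $\hat{\mathbf{f}}_i$ to closely resemble the original characteristics of $\mathbf{f}_i$, maintaining a high rendering fidelity.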
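The two quantization modes and the adaptive step can be sketched in a few lines. This is an illustrative sketch only: the refinement `r` is passed in as a plain number rather than predicted by a context MLP, the base step `q0` is an arbitrary example value, and the function names are hypothetical.

```python
import math
import random

def quant_step(q0, r):
    """Adaptive step q = Q0 * (1 + tanh(r)); in exact arithmetic it lies in (0, 2 * Q0)."""
    return q0 * (1.0 + math.tanh(r))

def quantize(values, q, training, rng=random):
    """Training: add uniform noise of width q. Testing: round to the nearest multiple of q."""
    if training:
        return [v + rng.uniform(-0.5, 0.5) * q for v in values]
    # Python's round() resolves exact ties to the nearest even integer.
    return [round(v / q) * q for v in values]

q = quant_step(q0=0.2, r=0.0)            # r = 0 leaves the base step unchanged
hard = quantize([0.33, -0.07], q, training=False)
random.seed(0)
soft = quantize([0.33, -0.07], q, training=True)
```

The noisy value never moves further than half a step from the input, which matches the maximum error of rounding and keeps the training-time surrogate differentiable with respect to the attribute.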
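Gaussian Distribution Modeling. To measure the bit consumption of $\hat{\mathbf{f}}_i$ during training, its probability needs to be estimated in a differentiable manner. As shown in Fig. 3 left, all three components of anchor attributes exhibit statistical tendencies of Gaussian distributions, where $\mathbf{l}$ displays a single-sided pattern due to Sigmoid activation (we define $\mathbf{l}$ as the one after Sigmoid activation, which is slightly different from [25]). This observation establishes a lower bound for probability prediction when all $\hat{\mathbf{f}}_i$s in $\mathcal{A}$ are estimated using the respective $\mu^s$ and $\sigma^s$ of the statistical Gaussian distribution of $\mathbf{f}^a$, $\mathbf{l}$, and $\mathbf{o}$. Nevertheless, employing a single set of $\mu^s$ and $\sigma^s$ for all attributes may lack accuracy. Therefore, we assume the values of anchor attributes $\mathcal{A}$ are independent, and construct their respective Gaussian distributions, where their individual $\boldsymbol{\mu}$ and $\boldsymbol{\sigma}$ are estimated by a simple MLP-based context model $\mathrm{MLP_c}$ from $\mathbf{f}^h$. Specifically, for the $i$th anchor and its quantized anchor attribute vector $\hat{\mathbf{f}}_i$, with the estimated $\boldsymbol{\mu}_i \in \mathbb{R}^{D}$ and $\boldsymbol{\sigma}_i \in \mathbb{R}^{D}$, we can compute the probability of $\hat{\mathbf{f}}_i$ as,
$$p(\hat{\mathbf{f}}_i) = \int_{\hat{\mathbf{f}}_i - \frac{1}{2} q_i}^{\hat{\mathbf{f}}_i + \frac{1}{2} q_i} \phi_{\boldsymbol{\mu}_i, \boldsymbol{\sigma}_i}(x)\, dx = \Phi_{\boldsymbol{\mu}_i, \boldsymbol{\sigma}_i}\left(\hat{\mathbf{f}}_i + \tfrac{1}{2} q_i\right) - \Phi_{\boldsymbol{\mu}_i, \boldsymbol{\sigma}_i}\left(\hat{\mathbf{f}}_i - \tfrac{1}{2} q_i\right), \quad \boldsymbol{\mu}_i, \boldsymbol{\sigma}_i = \mathrm{MLP_c}(\mathbf{f}^h_i), \tag{6}$$
where $\phi$ and $\Phi$ represent the probability density function and the cumulative distribution function, respectively. Consequently, we define an entropy loss as the summation of bit consumption over all $\hat{\mathbf{f}}_i$s:
$$L_{entropy} = \sum_{\mathbf{f} \in \{\mathbf{f}^a, \mathbf{l}, \mathbf{o}\}} \sum_{i=1}^{N} \sum_{j=1}^{D} \left(-\log_2 p(\hat{f}_{i,j})\right), \tag{7}$$
where $N$ is the number of anchors and $\hat{f}_{i,j}$ is the $j$-th dimension value of $\hat{\mathbf{f}}_i$. Minimizing the entropy loss will encourage a high probability estimation for $p(\hat{\mathbf{f}}_i)$, which in turn encourages an accurate $\boldsymbol{\mu}_i$ and a small $\boldsymbol{\sigma}_i$ and thus guides the learning of the context model.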
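The probability of a quantized value and the resulting bit estimate can be sketched with the standard library alone. This is a hedged illustration: the mean, standard deviation and step are passed in as plain numbers instead of being predicted by a context model, the clamp constant `floor` is an assumption added for numerical safety, and the function names are hypothetical.

```python
import math

def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_quantized(f_hat, mu, sigma, q):
    """Probability mass of the bin [f_hat - q/2, f_hat + q/2] under N(mu, sigma^2)."""
    return gaussian_cdf(f_hat + q / 2, mu, sigma) - gaussian_cdf(f_hat - q / 2, mu, sigma)

def bits(f_hat, mu, sigma, q, floor=1e-9):
    """Estimated bits for one attribute vector: sum of -log2 p over its dimensions."""
    total = 0.0
    for v, m, s in zip(f_hat, mu, sigma):
        total += -math.log2(max(prob_quantized(v, m, s, q), floor))
    return total

# A sharp, well-centered prediction versus a broad, off-center one (toy values).
sharp = bits([0.4], mu=[0.4], sigma=[0.05], q=0.2)
broad = bits([0.4], mu=[0.0], sigma=[1.0], q=0.2)
```

The sharp prediction costs a small fraction of a bit while the broad one costs several bits for the same value, which illustrates why an accurate mean and a small standard deviation reduce the entropy loss.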
Adaptive Offset Masking. From Fig. 3 left, we can also see that $\mathbf{o}$ exhibits an impulse at zero, suggesting the occurrence of substantial unnecessary Gaussians. Thus, we employ the technique introduced by Lee et al. [20] to prune invalid offsets $\mathbf{o}$ by utilizing straight-through [3] estimated binary masks. Specifically, we apply the same masking loss as in [20] to encourage masking as many Gaussians as possible. This process effectively masks out invalid offsets and saves storage space directly. Additionally, we implement anchor pruning: if all the offsets $\mathbf{o}$ attached to an anchor are pruned, then this anchor no longer contributes to rendering and should be pruned entirely (including its $\mathbf{f}^a$ and $\mathbf{l}$).
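The forward pass of the masking and the anchor-pruning rule can be sketched as below. This is an illustration only: the learnable logits `mask_logits`, the threshold `eps` and all helper names are hypothetical, and the straight-through gradient used during training is not shown.

```python
import math

def sigmoid(m):
    """Numerically stable logistic function."""
    if m >= 0:
        return 1.0 / (1.0 + math.exp(-m))
    e = math.exp(m)
    return e / (1.0 + e)

def binary_masks(mask_logits, eps=0.01):
    """Forward pass only: 1 keeps an offset (and its Gaussian), 0 prunes it."""
    return [[1 if sigmoid(m) > eps else 0 for m in row] for row in mask_logits]

def prune_anchors(anchor_ids, masks):
    """Drop every anchor whose offsets are all masked out."""
    return [(a, row) for a, row in zip(anchor_ids, masks) if any(row)]

# Two anchors with three offsets each (toy logits).
logits = [[-10.0, -10.0, -10.0], [5.0, -10.0, 2.0]]
masks = binary_masks(logits)
kept = prune_anchors(["anchor_0", "anchor_1"], masks)
```

Here the first anchor loses all three offsets and is removed together with its feature and scaling, while the second anchor survives with two of its three offsets.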
Hash Grid Compression. As shown in (3), the size of the hash grid also significantly influences the final storage size. To this end, we binarize the hash table to $\{-1, +1\}$ using straight-through estimation (STE) [35] and calculate the occurrence frequency $h_f$ [6] of the symbol “$+1$” to estimate its bit consumption:
$$L_{hash} = M_{+} \times \left(-\log_2(h_f)\right) + M_{-} \times \left(-\log_2(1 - h_f)\right), \tag{8}$$
where $M_{+}$ and $M_{-}$ are the total numbers of “$+1$” and “$-1$” in the hash grid.
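The frequency-based bit estimate for a binary table can be sketched as follows. The function name `hash_grid_bits` is hypothetical, the table is flattened to a list of +1 and -1 values, and the zero-count guards are an addition to avoid evaluating the logarithm of zero.

```python
import math

def hash_grid_bits(table):
    """Estimated bits for a +1/-1 table from the occurrence frequency of +1."""
    m_plus = sum(1 for v in table if v > 0)
    m_minus = len(table) - m_plus
    h_f = m_plus / len(table)
    total = 0.0
    if m_plus:
        total += m_plus * -math.log2(h_f)
    if m_minus:
        total += m_minus * -math.log2(1.0 - h_f)
    return total

balanced = hash_grid_bits([1, -1, 1, -1, 1, -1, 1, -1])
skewed = hash_grid_bits([1, 1, 1, -1])
constant = hash_grid_bits([1, 1, 1, 1])
```

A balanced table costs one bit per entry, a skewed table costs less than one bit per entry on average, and a constant table costs nothing under this estimate.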
3.4 Training and Coding Process
During training, we incorporate both the rendering fidelity loss and the entropy loss to ensure the model improves rendering quality while controlling total bitrate consumption in a differentiable manner. Our overall loss is
$$L = L_{Scaffold} + \lambda_e \frac{1}{N (D^a + 6 + 3K)} \left(L_{entropy} + L_{hash}\right) + \lambda_m L_m. \tag{9}$$
Here, $L_{Scaffold}$ represents the rendering loss as defined in [25], which includes two fidelity penalty loss terms and one regularization term for the scaling $\mathbf{l}$. The second part in (9) is the estimated controllable bit consumption, including the estimated bits $L_{entropy}$ for anchor attributes and $L_{hash}$ for the hash grid. The last term $L_m$ in (9) is the masking loss adopted from [20] to regularize the adaptive offset masking module. $\lambda_e$ and $\lambda_m$ are trade-off hyperparameters used to balance the loss components. Note that we incorporate different techniques or loss items at different iterations to stabilize the training process. Please refer to the supplementary Sec. A for more details.
For the encoding/decoding process, the binary hash grid $\mathcal{H}$ is first encoded/decoded using AE with $h_f$. Then, the hash feature $\mathbf{f}^h$ is obtained through interpolation based on $\mathcal{H}$ and $\mathbf{x}^a$. Once $\mathbf{f}^h$ is acquired, the context models $\mathrm{MLP_q}$ and $\mathrm{MLP_c}$ are employed to estimate the quantization refinement term $r$ and the parameters of the Gaussian distribution (i.e., $\boldsymbol{\mu}$ and $\boldsymbol{\sigma}$) to derive the probability $p(\hat{\mathbf{f}})$ for entropy encoding/decoding with AE.
4 Experiments
In this section, we first present our HAC framework’s implementation details and then conduct evaluation experiments to compare with existing 3DGS compression approaches. Additionally, we include ablation studies to demonstrate the effectiveness of each technical component of our method. Finally, we visualize the bit allocation map for better understanding.
4.1 Implementation Details
We implement our HAC based on the Scaffold-GS repository [25] using the PyTorch framework [31] and train the model on a single NVIDIA RTX 4090 GPU. We increase the dimension of the Scaffold-GS anchor feature (i.e., $D^a$) to 50, and disable its feature bank as we found it may lead to unstable training. For the hash grid $\mathcal{H}$, we utilize a mixed 3D-2D structured binary hash grid, with 12 levels of 3D embeddings ranging from 16 to 512 resolutions, and 4 levels of 2D embeddings ranging from 128 to 1024 resolutions. The 3D and 2D grids each have their own maximum hash table size and share the same feature dimension $D^h$. We keep $\lambda_m$ fixed and vary $\lambda_e$ to obtain variable bitrates. $Q_0$ is set separately for $\mathbf{f}^a$, $\mathbf{l}$ and $\mathbf{o}$. We combine $\mathrm{MLP_q}$ and $\mathrm{MLP_c}$ into a single 3-layer MLP with ReLU activation.
Datasets | Synthetic-NeRF [26] | | | | Mip-NeRF360 [2] | | | | Tanks&Temples [18] | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Methods | PSNR | SSIM | LPIPS | Size (MB) | PSNR | SSIM | LPIPS | Size (MB) | PSNR | SSIM | LPIPS | Size (MB) |
3DGS [17] | 33.80 | 0.970 | 0.031 | 68.46 | 27.49 | 0.813 | 0.222 | 744.7 | 23.69 | 0.844 | 0.178 | 431.0 |
Scaffold-GS [25] | 33.41 | 0.966 | 0.035 | 19.36 | 27.50 | 0.806 | 0.252 | 253.9 | 23.96 | 0.853 | 0.177 | 86.50 |
Lee et al. [20] | 33.33 | 0.968 | 0.034 | 5.54 | 27.08 | 0.798 | 0.247 | 48.80 | 23.32 | 0.831 | 0.201 | 39.43 |
Compressed3D [30] | 32.94 | 0.967 | 0.033 | 3.68 | 26.98 | 0.801 | 0.238 | 28.80 | 23.32 | 0.832 | 0.194 | 17.28 |
EAGLES[12] | 32.54 | 0.965 | 0.039 | 5.74 | 27.15 | 0.808 | 0.238 | 68.89 | 23.41 | 0.840 | 0.200 | 34.00 |
LightGaussian [10] | 32.73 | 0.965 | 0.037 | 7.84 | 27.00 | 0.799 | 0.249 | 44.54 | 22.83 | 0.822 | 0.242 | 22.43 |
Morgenstern et al. [27] | 31.05 | 0.955 | 0.047 | 2.20 | 26.01 | 0.772 | 0.259 | 23.90 | 22.78 | 0.817 | 0.211 | 13.05 |
Navaneet et al. [29] | 33.09 | 0.967 | 0.036 | 4.42 | 27.16 | 0.808 | 0.228 | 50.30 | 23.47 | 0.840 | 0.188 | 27.97 |
Ours-lowrate | 33.24 | 0.967 | 0.037 | 1.18 | 27.53 | 0.807 | 0.238 | 15.26 | 24.04 | 0.846 | 0.187 | 8.10 |
Ours-highrate | 33.71 | 0.968 | 0.034 | 1.86 | 27.77 | 0.811 | 0.230 | 21.87 | 24.40 | 0.853 | 0.177 | 11.24 |
Datasets | DeepBlending [16] | | | | BungeeNeRF [40] | | | |
---|---|---|---|---|---|---|---|---|
Methods | PSNR | SSIM | LPIPS | Size (MB) | PSNR | SSIM | LPIPS | Size (MB) |
3DGS [17] | 29.42 | 0.899 | 0.247 | 663.9 | 24.87 | 0.841 | 0.205 | 1616 |
Scaffold-GS [25] | 30.21 | 0.906 | 0.254 | 66.00 | 26.62 | 0.865 | 0.241 | 183.0 |
Lee et al. [20] | 29.79 | 0.901 | 0.258 | 43.21 | 23.36 | 0.788 | 0.251 | 82.60 |
Compressed3D [30] | 29.38 | 0.898 | 0.253 | 25.30 | 24.13 | 0.802 | 0.245 | 55.79 |
EAGLES[12] | 29.91 | 0.910 | 0.250 | 62.00 | 25.24 | 0.843 | 0.221 | 117.1 |
LightGaussian [10] | 27.01 | 0.872 | 0.308 | 33.94 | 24.52 | 0.825 | 0.255 | 87.28 |
Morgenstern et al. [27] | 28.92 | 0.891 | 0.276 | 8.40 | - | - | - | - |
Navaneet et al. [29] | 29.75 | 0.903 | 0.247 | 42.77 | 24.63 | 0.823 | 0.239 | 104.3 |
Ours-lowrate | 29.98 | 0.902 | 0.269 | 4.35 | 26.48 | 0.845 | 0.250 | 18.49 |
Ours-highrate | 30.34 | 0.906 | 0.258 | 6.35 | 27.08 | 0.872 | 0.209 | 29.72 |
4.2 Experiment Evaluation
Baselines. We compare our HAC with existing 3DGS compression approaches. Notably, [20, 30, 29, 10] mainly adopt codebook-based or parameter pruning strategies, while Scaffold-GS [25] explores Gaussian relations for compact representation. Additionally, EAGLES [12] and Morgenstern et al. [27] employ non-contextual entropy constraints and dimension collapse techniques, respectively.
Datasets. We follow Scaffold-GS to perform evaluations on multiple datasets, including the small-scale Synthetic-NeRF [26] and four large-scale real-scene datasets: BungeeNeRF [40], DeepBlending [16], Mip-NeRF360 [2], and Tanks&Temples [18]. Note that we evaluate all 9 scenes from the Mip-NeRF360 dataset [2]. Covering diverse scenarios, these datasets allow us to comprehensively demonstrate the effectiveness of our approach.
Metrics. To comprehensively evaluate compression Rate-Distortion (RD) performance, we calculate relative rate (size) change of our approach over others under a similar fidelity. Note that BD-rate [4] is incalculable as other methods can typically only output a single rate, while four are needed for its calculation.
Results. Quantitative results are shown in Tab. 1 and Fig. 4, and the qualitative outputs are presented in Fig. 5. Please refer to the supplementary Sec. C for the detailed metrics of each scene. Our proposed HAC demonstrates a significant size reduction compared to the vanilla 3DGS [17], with even improved fidelity, as well as a substantial size reduction over the base model Scaffold-GS [25]. Notably, our highest fidelity surpasses Scaffold-GS, primarily due to two factors: 1) the entropy loss effectively regularizes the model to prevent overfitting, and 2) we increase the dimension of the anchor feature (i.e., $D^a$) to 50, resulting in a larger model volume. Although other compression approaches (middle rows of Tab. 1) can reduce the model size to some extent by primarily using pruning and codebooks, they still exhibit significant spatial redundancy. Specifically, Morgenstern et al. [27] achieve a comparably small size, but significantly sacrifice fidelity due to the dimension collapsing. Note that the relative size changes must be measured under similar fidelity quality.
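The relative size changes can be recomputed directly from the sizes listed in Tab. 1. The snippet below is a plain arithmetic sketch: the helper `mean_ratio` is hypothetical, and it averages per-dataset ratios against the Ours-lowrate column without checking that fidelity is matched, so the result is only indicative.

```python
# Sizes in MB copied from Tab. 1: (3DGS, Scaffold-GS, Ours-lowrate, Ours-highrate).
SIZES_MB = {
    "Synthetic-NeRF": (68.46, 19.36, 1.18, 1.86),
    "Mip-NeRF360": (744.7, 253.9, 15.26, 21.87),
    "Tanks&Temples": (431.0, 86.50, 8.10, 11.24),
    "DeepBlending": (663.9, 66.00, 4.35, 6.35),
    "BungeeNeRF": (1616.0, 183.0, 18.49, 29.72),
}

def mean_ratio(baseline_col, ours_col):
    """Average of per-dataset size ratios baseline / ours."""
    ratios = [row[baseline_col] / row[ours_col] for row in SIZES_MB.values()]
    return sum(ratios) / len(ratios)

vs_3dgs = mean_ratio(0, 2)       # vanilla 3DGS over Ours-lowrate
vs_scaffold = mean_ratio(1, 2)   # Scaffold-GS over Ours-lowrate
```

With the table values above, the two averages come out at roughly 80 and roughly 14, respectively. The per-dataset ratios vary widely, with DeepBlending giving the largest reduction relative to 3DGS.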
Bitstream. Our bitstream consists of five components: anchor attributes $\mathcal{A}$ (comprising $\mathbf{f}^a$, $\mathbf{l}$ and $\mathbf{o}$), the binary hash grid $\mathcal{H}$, offset masks, anchor locations $\mathbf{x}^a$ and MLPs. Among them, $\mathcal{A}$ is encoded using the entropy codec AE [39] with the probabilities estimated by HAC, and accounts for the dominant portion of the storage. The hash grid and the masks are binary data and are encoded by AE using their respective occurrence frequencies. The last two components are stored directly in 16 and 32 bits, respectively. On the most challenging BungeeNeRF dataset [40], the five components take 14.90MB (8.76MB, 2.52MB, 3.62MB), 0.15MB, 0.52MB, 2.77MB, and 0.16MB, respectively, which sums to about 18.5MB and is consistent with the Ours-lowrate entry in Tab. 1. As scenes become simpler, the storage share of $\mathcal{A}$ decreases as its value distribution becomes easier to predict.
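The reported component sizes can be checked with simple arithmetic. The sketch below only restates the BungeeNeRF numbers given above; the dictionary keys are descriptive labels chosen for illustration.

```python
# Component sizes in MB for BungeeNeRF, as reported in the Bitstream paragraph.
components_mb = {
    "anchor attributes": 14.90,
    "binary hash grid": 0.15,
    "offset masks": 0.52,
    "anchor locations": 2.77,
    "MLPs": 0.16,
}
attribute_parts_mb = (8.76, 2.52, 3.62)  # the three parts of the anchor attributes

total_mb = sum(components_mb.values())
shares = {name: size / total_mb for name, size in components_mb.items()}
```

The three attribute parts add up to the 14.90MB reported for the anchor attributes, and the anchor attributes account for roughly four fifths of the total, while the hash grid that provides the context contributes less than one percent.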
4.3 Ablation Study
In this subsection, we conduct ablation studies to demonstrate the effectiveness of each technical component. We conduct experiments on both the most challenging large-scale BungeeNeRF dataset [40] and the small-scale Synthetic-NeRF dataset [26] to support convincing and solid results. We assess the effectiveness of individual technical components by disabling one of the following: 1) mutual information from the hash grid, 2) the adaptive quantization module, 3) adaptive offset masking. The results are presented in Fig. 6. Firstly, we set the hash grid to all zeros to eliminate mutual information. This leads to a degradation of the conditional probability from $p(\mathcal{A} \mid \mathbf{f}^h)$ to $p(\mathcal{A})$, which indicates that the probability of $\mathcal{A}$ can only be estimated by the statistical $\mu^s$ and $\sigma^s$ from the left part of Fig. 3. Consequently, the bit consumption drastically increases as the probability can no longer be accurately estimated. Regarding the latter two components, they contribute from different perspectives. Disabling AQM (we remove $r$ while retaining $Q_0$ to ensure a necessary decimal quantization step) results in a significant drop in fidelity, especially in more complex scenes or at higher rates, as $\hat{\mathbf{f}}$ fails to retain sufficient information for rendering after quantization. In contrast, offset masking can achieve remarkable rate savings in simpler scenes or lower rate segments due to more significant positional redundancy in Gaussians. Overall, all three components provide a worthwhile tradeoff for improved RD performance.
4.4 Visualization of Bit Allocation
While HAC measures the parameters’ bit consumption, we are interested in the bit allocation across different local areas in the space. We utilize the Synthetic-NeRF dataset [26] for observation, as its object-centered scenes are well-suited for visualization. As shown in Fig. 7, the bit allocation is represented by voxelized colored balls. As observed from the 2nd column of visualized sub-figures, the model tends to allocate more total bits to areas with complex appearances or sharp edges. For instance, specular objects in “materials” and instrument stands in “drums” exhibit higher total bit consumption due to the complexity of textures for those regions. The analysis of the 4th column from an averaging viewpoint reveals varied trends in bit consumption per anchor. In high bit-consumption voxels, creating more anchors for precise modeling can average the bit rate per anchor, smoothing or reducing their bit consumption. This phenomenon aligns with our assumption that anchors demonstrate inherent consistency in the 3D space where nearby anchors exhibit similar values of attributes, making it easier for the hash grid to accurately estimate their value distribution probabilities.
4.5 Training and Execution Time
Training time. Integration of additional models in HAC results in an increase in training time, approximately longer than Scaffold-GS. Specifically, for the challenging city-scale BungeeNeRF dataset [40], the training times are 38.2 minutes for 3DGS [17], 15.1 minutes for Scaffold-GS [25] and 27.6 minutes for our model. For the small-scale Synthetic-NeRF dataset [26], training times are 3.4 minutes, 4.4 minutes and 9.0 minutes, respectively. This increase of training time in our model over Scaffold-GS is our main limitation, but it is still fast.
Coding time. The encoding/decoding process takes approximately 0.87 seconds and 26.7 seconds on Synthetic-NeRF and BungeeNeRF dataset under , respectively. The dominant time consumption occurs during Codec execution of AE on the CPU (over ), as we only use a single thread.
Inference time. The inference process benefits from the design of the context modeling: the hash grid can be removed once the anchor attributes are fully decoded. Consequently, no additional operations are required during rendering, resulting in an FPS similar to that of Scaffold-GS.
5 Conclusion
In this paper, we have pioneered an investigation into the relationship between unorganized, sparse Gaussians (or anchors in our paper) and well-structured hash grids, leveraging their mutual information for compact 3DGS representations. Our proposed Hash-grid Assisted Context (HAC) framework achieves SOTA compression performance with a remarkable lead over concurrent works. Extensive experiments demonstrate the effectiveness of HAC and its technical components. Overall, our work mitigates the major challenge of the 3DGS model, i.e., its large storage requirement, enabling its adoption in large-scale scenes and on diverse devices.
References
- [1] Ballé, J., Minnen, D., Singh, S., Hwang, S.J., Johnston, N.: Variational image compression with a scale hyperprior. In: International Conference on Learning Representations (2018)
- [2] Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5470–5479 (2022)
- [3] Bengio, Y., Léonard, N., Courville, A.: Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432 (2013)
- [4] Bjontegaard, G.: Calculation of average psnr differences between rd-curves. ITU SG16 Doc. VCEG-M33 (2001)
- [5] Chen, A., Xu, Z., Geiger, A., Yu, J., Su, H.: Tensorf: Tensorial radiance fields. In: European Conference on Computer Vision. pp. 333–350. Springer (2022)
- [6] Chen, Y., Wu, Q., Harandi, M., Cai, J.: How far can we compress instant-ngp-based nerf? In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2024)
- [7] Cheng, Z., Sun, H., Takeuchi, M., Katto, J.: Learned image compression with discretized gaussian mixture likelihoods and attention modules. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 7939–7948 (2020)
- [8] Cover, T.M.: Elements of information theory. John Wiley & Sons (1999)
- [9] Deng, C.L., Tartaglione, E.: Compressing explicit voxel grid representations: fast nerfs become also small. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 1236–1245 (2023)
- [10] Fan, Z., Wang, K., Wen, K., Zhu, Z., Xu, D., Wang, Z.: Lightgaussian: Unbounded 3d gaussian compression with 15x reduction and 200+ fps. arXiv preprint arXiv:2311.17245 (2023)
- [11] Fridovich-Keil, S., Meanti, G., Warburg, F.R., Recht, B., Kanazawa, A.: K-planes: Explicit radiance fields in space, time, and appearance. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12479–12488 (2023)
- [12] Girish, S., Gupta, K., Shrivastava, A.: Eagles: Efficient accelerated 3d gaussians with lightweight encodings. arXiv preprint arXiv:2312.04564 (2023)
- [13] Girish, S., Shrivastava, A., Gupta, K.: Shacira: Scalable hash-grid compression for implicit neural representations. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 17513–17524 (2023)
- [14] He, D., Yang, Z., Peng, W., Ma, R., Qin, H., Wang, Y.: Elic: Efficient learned image compression with unevenly grouped space-channel contextual adaptive coding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5718–5727 (2022)
- [15] He, D., Zheng, Y., Sun, B., Wang, Y., Qin, H.: Checkerboard context model for efficient learned image compression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 14771–14780 (2021)
- [16] Hedman, P., Philip, J., Price, T., Frahm, J.M., Drettakis, G., Brostow, G.: Deep blending for free-viewpoint image-based rendering. ACM Transactions on Graphics (ToG) 37(6), 1–15 (2018)
- [17] Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics 42(4) (2023)
- [18] Knapitsch, A., Park, J., Zhou, Q.Y., Koltun, V.: Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics (ToG) 36(4), 1–13 (2017)
- [19] Lassner, C., Zollhofer, M.: Pulsar: Efficient sphere-based neural rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1440–1449 (2021)
- [20] Lee, J.C., Rho, D., Sun, X., Ko, J.H., Park, E.: Compact 3d gaussian representation for radiance field. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2024)
- [21] Li, J., Li, B., Lu, Y.: Deep contextual video compression. Advances in Neural Information Processing Systems 34, 18114–18125 (2021)
- [22] Li, J., Li, B., Lu, Y.: Hybrid spatial-temporal entropy modelling for neural video compression. In: Proceedings of the 30th ACM International Conference on Multimedia. pp. 1503–1511 (2022)
- [23] Li, L., Shen, Z., Wang, Z., Shen, L., Bo, L.: Compressing volumetric radiance fields to 1 mb. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4222–4231 (2023)
- [24] Li, L., Wang, Z., Shen, Z., Shen, L., Tan, P.: Compact real-time radiance fields with neural codebook. In: ICME (2023)
- [25] Lu, T., Yu, M., Xu, L., Xiangli, Y., Wang, L., Lin, D., Dai, B.: Scaffold-gs: Structured 3d gaussians for view-adaptive rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2024)
- [26] Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM 65(1), 99–106 (2021)
- [27] Morgenstern, W., Barthel, F., Hilsmann, A., Eisert, P.: Compact 3d scene representation via self-organizing gaussian grids. arXiv preprint arXiv:2312.13299 (2023)
- [28] Müller, T., Evans, A., Schied, C., Keller, A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (ToG) 41(4), 1–15 (2022)
- [29] Navaneet, K., Meibodi, K.P., Koohpayegani, S.A., Pirsiavash, H.: Compact3d: Compressing gaussian splat radiance field models with vector quantization. arXiv preprint arXiv:2311.18159 (2023)
- [30] Niedermayr, S., Stumpfegger, J., Westermann, R.: Compressed 3d gaussian splatting for accelerated novel view synthesis. arXiv preprint arXiv:2401.02436 (2023)
- [31] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019)
- [32] Rho, D., Lee, B., Nam, S., Lee, J.C., Ko, J.H., Park, E.: Masked wavelet representation for compact neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20680–20690 (2023)
- [33] Schonberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2016)
- [34] Sheng, X., Li, J., Li, B., Li, L., Liu, D., Lu, Y.: Temporal context mining for learned video compression. IEEE Transactions on Multimedia (2022)
- [35] Shin, S., Park, J.: Binary radiance fields. Advances in neural information processing systems (2023)
- [36] Song, Z., Duan, W., Zhang, Y., Wang, S., Ma, S., Gao, W.: Spc-nerf: Spatial predictive compression for voxel based radiance field. arXiv preprint arXiv:2402.16366 (2024)
- [37] Sun, C., Sun, M., Chen, H.T.: Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5459–5469 (2022)
- [38] Tang, J., Chen, X., Wang, J., Zeng, G.: Compressible-composable nerf via rank-residual decomposition. Advances in Neural Information Processing Systems 35, 14798–14809 (2022)
- [39] Witten, I.H., Neal, R.M., Cleary, J.G.: Arithmetic coding for data compression. Communications of the ACM 30(6), 520–540 (1987)
- [40] Xiangli, Y., Xu, L., Pan, X., Zhao, N., Rao, A., Theobalt, C., Dai, B., Lin, D.: Bungeenerf: Progressive neural radiance field for extreme multi-scale scene rendering. In: European conference on computer vision. pp. 106–122. Springer (2022)
–Supplementary Material–
A More Implementation Details
A.1 Training Process
We provide a detailed overview of the training process for our HAC framework, as illustrated in Fig. A.
During the initial 3000 iterations, no additional techniques are applied to impact the original training process of Scaffold-GS [25], ensuring a stable start of the anchor attribute training and anchor spawning.
From iteration 3000 to 10000, we introduce “adding noise” operations to the anchor attributes, which allows the model to adapt to the quantization process. Note that, in this stage, we apply only the base quantization step, without the refinement term, so the hash grid is not needed. Specifically, we pause the anchor spawning process between iterations 3000 and 4000 as a transitional period, since the sudden introduction of quantization may destabilize spawning. Once the parameters have adapted to the quantization after iteration 4000, we re-enable the spawning process. We deliberately keep the hash grid out of this stage (i.e., before iteration 10000) to give the anchor attributes and the spawning process time to adapt to the quantization operation, which makes training more stable when the hash grid is incorporated in later iterations.
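The “adding noise” operation can be illustrated with a short sketch. It assumes uniform noise within half a quantization step, as is common in learned compression [1]; the function name `quantize` and its arguments are hypothetical and are not taken from our implementation.

```python
import numpy as np

def quantize(x, step, training, rng=None):
    """Noise-based quantization proxy (illustrative assumption).

    training=True : add uniform noise in [-step/2, step/2) so that the
                    operation stays differentiable with respect to x.
    training=False: round x to the nearest multiple of `step`.
    """
    x = np.asarray(x, dtype=np.float64)
    if training:
        rng = np.random.default_rng() if rng is None else rng
        noise = rng.uniform(-0.5, 0.5, size=x.shape)
        return x + noise * step
    return np.round(x / step) * step
```

At test time the noise is replaced by actual rounding, so the perturbation seen during training has the same magnitude as the rounding error introduced by quantization.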
After iteration 10000, assuming the 3D model is adequately fitted to the quantization, we fully integrate our HAC framework to jointly train the binary hash grid. Notably, the bound of the hash grid is determined using the maximum and minimum anchor locations at the 10000th iteration, which are then used to normalize anchor locations for interpolation in the hash grid. This pipeline ensures a stable training process that reduces the model size via entropy constraints while maintaining high fidelity.
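The three-stage schedule and the bound-based normalization can be summarized in the sketch below. The function names and the exact inclusive or exclusive handling of the boundary iterations are assumptions made for illustration; only the iteration thresholds (3000, 4000, 10000) come from the description above.

```python
import numpy as np

def training_stage(iteration):
    """Return which HAC components are active at a given iteration (sketch)."""
    return {
        # Noise-based quantization starts at iteration 3000.
        "add_noise": iteration >= 3000,
        # HAC pauses anchor spawning during the transitional period.
        "spawning_allowed": not (3000 <= iteration < 4000),
        # The binary hash grid is trained jointly from iteration 10000.
        "hash_grid": iteration >= 10000,
    }

def hash_grid_bounds(anchor_xyz):
    """Bounds taken from the anchor locations at the 10000th iteration."""
    anchor_xyz = np.asarray(anchor_xyz, dtype=np.float64)
    return anchor_xyz.min(axis=0), anchor_xyz.max(axis=0)

def normalize_locations(anchor_xyz, lower, upper):
    """Map anchor locations into the hash-grid range using fixed bounds."""
    anchor_xyz = np.asarray(anchor_xyz, dtype=np.float64)
    extent = np.maximum(upper - lower, 1e-12)  # guard against a flat axis
    return (anchor_xyz - lower) / extent
```

Because the bounds are fixed at the 10000th iteration, anchors spawned later may fall slightly outside the unit range. The text does not specify how such cases are handled, so the sketch leaves them unclipped.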
A.2 Sampling Strategy
During training, using all anchors for entropy training in each iteration could lead to prolonged training time and potential out-of-memory (OOM) issues. Therefore, in each iteration we randomly sample only 5% of the anchors used for rendering and apply the entropy loss to this subset. This approach speeds up training while still preserving satisfactory RD performance.
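A minimal sketch of this sampling step is given below. The helper name `sample_anchors_for_entropy` is hypothetical, and sampling without replacement is an assumption; the text only fixes the 5% ratio and the restriction to anchors used for rendering.

```python
import numpy as np

def sample_anchors_for_entropy(rendered_idx, ratio=0.05, rng=None):
    """Pick a random subset of the rendered anchors for the entropy loss.

    rendered_idx: 1D array with the indices of anchors used for rendering
                  in the current iteration.
    """
    rng = np.random.default_rng() if rng is None else rng
    rendered_idx = np.asarray(rendered_idx)
    if rendered_idx.size == 0:
        return rendered_idx
    # Keep at least one anchor so the entropy loss is always defined.
    k = max(1, int(round(ratio * rendered_idx.size)))
    return rng.choice(rendered_idx, size=k, replace=False)
```

With 1000 rendered anchors, for example, the entropy loss in that iteration is evaluated on 50 of them.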
B Additional Experiments
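Table A: Total size and per-parameter size of each anchor attribute.
Dataset | Feature size (MB) | Scaling size (MB) | Offsets size (MB) | Feature (bit/param) | Scaling (bit/param) | Offsets (bit/param) |
---|---|---|---|---|---|---|
BungeeNeRF [40] | 8.76 | 2.53 | 3.62 | 3.03 | 7.27 | 4.56 |
Synthetic-NeRF [26] | 0.31 | 0.09 | 0.12 | 1.33 | 3.58 | 3.76 |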
We investigate bit allocation among the anchor’s three attributes, as shown in Table A. In terms of total size, the feature contributes the most because it has the highest dimensionality. However, as it must be fed into MLPs to extract Gaussian attributes, it exhibits the most significant dimensional redundancy, resulting in the smallest per-parameter bit consumption. This is not the case for scaling and offsets, which are used directly for rendering and therefore carry far less dimensional redundancy. Additionally, as scaling and offsets always require higher decimal precision, their value distributions are more difficult to predict accurately, resulting in higher per-parameter bit consumption.
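To make the link between prediction accuracy and per-parameter bits concrete, the sketch below estimates the bits of one quantized value under a Gaussian model, using the probability mass of its quantization bin. It follows the common formulation in learned compression [1] and is assumed, not guaranteed, to match the formulation in the main text; the function names are hypothetical.

```python
import math

def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def estimated_bits(x_hat, mu, sigma, step):
    """Bits needed to code the quantized value x_hat (illustrative).

    The probability is the Gaussian mass inside the quantization bin
    [x_hat - step/2, x_hat + step/2].
    """
    upper = gaussian_cdf(x_hat + step / 2.0, mu, sigma)
    lower = gaussian_cdf(x_hat - step / 2.0, mu, sigma)
    p = max(upper - lower, 1e-12)  # avoid log(0) for very unlikely values
    return -math.log2(p)
```

A value whose estimated mean is accurate and whose standard deviation is small relative to the quantization step costs close to zero bits, whereas a poorly predicted value or a finer quantization step raises the cost.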
C Quantitative Results of Each Scene
C.1 Detailed Results of Our HAC Framework
C.2 Detailed Results of the Base Models
D Notation Table
Please refer to Tab. I for detailed notation explanations.
Per-scene results of our HAC framework on the Synthetic-NeRF dataset [26] for different values of the rate tradeoff parameter λ.
λ | Scene | PSNR | SSIM | LPIPS | SIZE (MB) |
---|---|---|---|---|---|
0.004 | chair | 34.02 | 0.981 | 0.018 | 0.82 |
0.004 | drums | 26.20 | 0.950 | 0.044 | 1.23 |
0.004 | ficus | 34.27 | 0.983 | 0.016 | 0.71 |
0.004 | hotdog | 36.44 | 0.979 | 0.033 | 0.51 |
0.004 | lego | 34.25 | 0.976 | 0.027 | 0.97 |
0.004 | materials | 30.20 | 0.959 | 0.045 | 1.07 |
0.004 | mic | 35.39 | 0.989 | 0.011 | 0.55 |
0.004 | ship | 31.24 | 0.902 | 0.124 | 1.82 |
0.004 | AVG | 32.75 | 0.965 | 0.040 | 0.96 |
0.003 | chair | 34.33 | 0.982 | 0.017 | 0.89 |
0.003 | drums | 26.26 | 0.951 | 0.043 | 1.42 |
0.003 | ficus | 34.57 | 0.984 | 0.015 | 0.82 |
0.003 | hotdog | 36.70 | 0.980 | 0.031 | 0.54 |
0.003 | lego | 34.65 | 0.977 | 0.024 | 1.07 |
0.003 | materials | 30.29 | 0.960 | 0.043 | 1.20 |
0.003 | mic | 35.62 | 0.990 | 0.010 | 0.62 |
0.003 | ship | 31.32 | 0.903 | 0.121 | 1.86 |
0.003 | AVG | 32.97 | 0.966 | 0.038 | 1.05 |
0.002 | chair | 34.73 | 0.984 | 0.016 | 1.03 |
0.002 | drums | 26.32 | 0.952 | 0.043 | 1.45 |
0.002 | ficus | 34.90 | 0.985 | 0.014 | 0.94 |
0.002 | hotdog | 37.11 | 0.981 | 0.029 | 0.64 |
0.002 | lego | 35.04 | 0.979 | 0.022 | 1.25 |
0.002 | materials | 30.53 | 0.961 | 0.041 | 1.45 |
0.002 | mic | 35.92 | 0.990 | 0.010 | 0.67 |
0.002 | ship | 31.38 | 0.903 | 0.119 | 1.99 |
0.002 | AVG | 33.24 | 0.967 | 0.037 | 1.18 |
0.001 | chair | 35.21 | 0.985 | 0.014 | 1.32 |
0.001 | drums | 26.38 | 0.952 | 0.041 | 1.95 |
0.001 | ficus | 35.37 | 0.986 | 0.013 | 1.20 |
0.001 | hotdog | 37.47 | 0.983 | 0.026 | 0.79 |
0.001 | lego | 35.51 | 0.981 | 0.019 | 1.61 |
0.001 | materials | 30.58 | 0.961 | 0.040 | 1.62 |
0.001 | mic | 36.25 | 0.991 | 0.009 | 0.81 |
0.001 | ship | 31.48 | 0.904 | 0.116 | 2.50 |
0.001 | AVG | 33.53 | 0.968 | 0.035 | 1.47 |
0.0005 | chair | 35.49 | 0.986 | 0.013 | 1.67 |
0.0005 | drums | 26.45 | 0.952 | 0.041 | 2.32 |
0.0005 | ficus | 35.30 | 0.986 | 0.013 | 1.53 |
0.0005 | hotdog | 37.87 | 0.984 | 0.024 | 0.97 |
0.0005 | lego | 35.67 | 0.981 | 0.019 | 1.90 |
0.0005 | materials | 30.70 | 0.962 | 0.039 | 2.07 |
0.0005 | mic | 36.71 | 0.992 | 0.008 | 1.01 |
0.0005 | ship | 31.52 | 0.904 | 0.115 | 3.39 |
0.0005 | AVG | 33.71 | 0.968 | 0.034 | 1.86 |
Per-scene results of our HAC framework on the Mip-NeRF360 dataset [2] for different values of the rate tradeoff parameter λ.
λ | Scene | PSNR | SSIM | LPIPS | SIZE (MB) |
---|---|---|---|---|---|
0.004 | bicycle | 25.05 | 0.742 | 0.264 | 27.54 |
0.004 | garden | 27.28 | 0.842 | 0.151 | 22.69 |
0.004 | stump | 26.58 | 0.762 | 0.269 | 18.11 |
0.004 | room | 31.55 | 0.921 | 0.208 | 5.53 |
0.004 | counter | 29.35 | 0.911 | 0.195 | 7.26 |
0.004 | kitchen | 31.16 | 0.923 | 0.131 | 8.05 |
0.004 | bonsai | 32.28 | 0.942 | 0.189 | 8.56 |
0.004 | flower | 21.26 | 0.572 | 0.381 | 19.59 |
0.004 | treehill | 23.30 | 0.645 | 0.356 | 20.04 |
0.004 | AVG | 27.53 | 0.807 | 0.238 | 15.26 |
0.003 | bicycle | 25.05 | 0.742 | 0.261 | 30.02 |
0.003 | garden | 27.36 | 0.844 | 0.148 | 24.62 |
0.003 | stump | 26.64 | 0.763 | 0.265 | 19.85 |
0.003 | room | 31.71 | 0.922 | 0.206 | 5.72 |
0.003 | counter | 29.54 | 0.913 | 0.191 | 7.93 |
0.003 | kitchen | 31.22 | 0.925 | 0.128 | 8.84 |
0.003 | bonsai | 32.50 | 0.944 | 0.186 | 9.40 |
0.003 | flower | 21.26 | 0.571 | 0.383 | 20.67 |
0.003 | treehill | 23.26 | 0.645 | 0.356 | 22.08 |
0.003 | AVG | 27.62 | 0.808 | 0.236 | 16.57 |
0.002 | bicycle | 25.10 | 0.742 | 0.262 | 33.14 |
0.002 | garden | 27.43 | 0.847 | 0.143 | 27.52 |
0.002 | stump | 26.59 | 0.761 | 0.268 | 21.75 |
0.002 | room | 31.87 | 0.925 | 0.201 | 6.47 |
0.002 | counter | 29.65 | 0.915 | 0.189 | 8.88 |
0.002 | kitchen | 31.46 | 0.928 | 0.125 | 10.05 |
0.002 | bonsai | 32.70 | 0.945 | 0.184 | 10.51 |
0.002 | flower | 21.32 | 0.576 | 0.377 | 23.73 |
0.002 | treehill | 23.34 | 0.647 | 0.350 | 24.83 |
0.002 | AVG | 27.72 | 0.809 | 0.233 | 18.54 |
0.001 | bicycle | 25.11 | 0.742 | 0.259 | 39.15 |
0.001 | garden | 27.46 | 0.849 | 0.139 | 32.17 |
0.001 | stump | 26.59 | 0.763 | 0.264 | 25.26 |
0.001 | room | 31.90 | 0.926 | 0.198 | 7.85 |
0.001 | counter | 29.74 | 0.918 | 0.184 | 10.44 |
0.001 | kitchen | 31.63 | 0.930 | 0.122 | 12.07 |
0.001 | bonsai | 32.97 | 0.948 | 0.180 | 12.72 |
0.001 | flower | 21.27 | 0.575 | 0.377 | 27.55 |
0.001 | treehill | 23.26 | 0.648 | 0.345 | 29.65 |
0.001 | AVG | 27.77 | 0.811 | 0.230 | 21.87 |
0.0005 | bicycle | 25.05 | 0.742 | 0.258 | 44.01 |
0.0005 | garden | 27.50 | 0.850 | 0.139 | 36.27 |
0.0005 | stump | 26.57 | 0.762 | 0.264 | 28.93 |
0.0005 | room | 32.19 | 0.929 | 0.194 | 9.16 |
0.0005 | counter | 29.75 | 0.918 | 0.185 | 12.22 |
0.0005 | kitchen | 31.81 | 0.931 | 0.120 | 13.96 |
0.0005 | bonsai | 33.16 | 0.949 | 0.178 | 14.90 |
0.0005 | flower | 21.28 | 0.575 | 0.376 | 31.24 |
0.0005 | treehill | 23.22 | 0.646 | 0.346 | 34.42 |
0.0005 | AVG | 27.83 | 0.811 | 0.229 | 25.01 |
Per-scene results of our HAC framework on the Tank&Temples dataset [18] for different values of the rate tradeoff parameter λ.
λ | Scene | PSNR | SSIM | LPIPS | SIZE (MB) |
---|---|---|---|---|---|
0.004 | truck | 25.88 | 0.878 | 0.158 | 9.26 |
0.004 | train | 22.19 | 0.815 | 0.216 | 6.94 |
0.004 | AVG | 24.04 | 0.846 | 0.187 | 8.10 |
0.003 | truck | 25.99 | 0.880 | 0.153 | 9.80 |
0.003 | train | 22.49 | 0.817 | 0.213 | 7.59 |
0.003 | AVG | 24.24 | 0.849 | 0.183 | 8.70 |
0.002 | truck | 25.99 | 0.881 | 0.153 | 11.15 |
0.002 | train | 22.66 | 0.819 | 0.210 | 8.64 |
0.002 | AVG | 24.33 | 0.850 | 0.181 | 9.90 |
0.001 | truck | 26.02 | 0.883 | 0.147 | 12.42 |
0.001 | train | 22.78 | 0.823 | 0.207 | 10.07 |
0.001 | AVG | 24.40 | 0.853 | 0.177 | 11.24 |
0.0005 | truck | 26.00 | 0.883 | 0.146 | 15.12 |
0.0005 | train | 22.49 | 0.823 | 0.206 | 11.19 |
0.0005 | AVG | 24.25 | 0.853 | 0.176 | 13.16 |
Per-scene results of our HAC framework on the DeepBlending dataset [16] for different values of the rate tradeoff parameter λ.
λ | Scene | PSNR | SSIM | LPIPS | SIZE (MB) |
---|---|---|---|---|---|
0.004 | playroom | 30.44 | 0.902 | 0.272 | 3.15 |
0.004 | drjohnson | 29.53 | 0.903 | 0.265 | 5.55 |
0.004 | AVG | 29.98 | 0.902 | 0.269 | 4.35 |
0.003 | playroom | 30.61 | 0.903 | 0.269 | 3.66 |
0.003 | drjohnson | 29.67 | 0.904 | 0.261 | 5.71 |
0.003 | AVG | 30.14 | 0.903 | 0.265 | 4.69 |
0.002 | playroom | 30.66 | 0.905 | 0.265 | 4.12 |
0.002 | drjohnson | 29.69 | 0.905 | 0.258 | 6.51 |
0.002 | AVG | 30.17 | 0.905 | 0.262 | 5.32 |
0.001 | playroom | 30.84 | 0.906 | 0.262 | 5.03 |
0.001 | drjohnson | 29.85 | 0.906 | 0.255 | 7.67 |
0.001 | AVG | 30.34 | 0.906 | 0.258 | 6.35 |
0.0005 | playroom | 30.66 | 0.906 | 0.259 | 6.08 |
0.0005 | drjohnson | 29.76 | 0.906 | 0.255 | 9.09 |
0.0005 | AVG | 30.21 | 0.906 | 0.257 | 7.58 |
Per-scene results of our HAC framework on the BungeeNeRF dataset [40] for different values of the rate tradeoff parameter λ.
λ | Scene | PSNR | SSIM | LPIPS | SIZE (MB) |
---|---|---|---|---|---|
0.004 | amsterdam | 26.80 | 0.865 | 0.224 | 22.49 |
0.004 | bilbao | 27.65 | 0.864 | 0.231 | 17.14 |
0.004 | hollywood | 24.25 | 0.748 | 0.347 | 16.55 |
0.004 | pompidou | 25.16 | 0.829 | 0.266 | 20.40 |
0.004 | quebec | 29.33 | 0.918 | 0.192 | 15.06 |
0.004 | rome | 25.68 | 0.845 | 0.243 | 19.30 |
0.004 | AVG | 26.48 | 0.845 | 0.250 | 18.49 |
0.003 | amsterdam | 26.95 | 0.873 | 0.214 | 24.41 |
0.003 | bilbao | 27.82 | 0.872 | 0.218 | 18.76 |
0.003 | hollywood | 24.27 | 0.753 | 0.342 | 17.87 |
0.003 | pompidou | 25.34 | 0.837 | 0.255 | 22.49 |
0.003 | quebec | 29.67 | 0.924 | 0.185 | 16.15 |
0.003 | rome | 25.98 | 0.855 | 0.231 | 20.83 |
0.003 | AVG | 26.67 | 0.852 | 0.241 | 20.08 |
0.002 | amsterdam | 27.13 | 0.880 | 0.202 | 27.14 |
0.002 | bilbao | 28.02 | 0.880 | 0.205 | 20.91 |
0.002 | hollywood | 24.43 | 0.763 | 0.330 | 20.09 |
0.002 | pompidou | 25.27 | 0.842 | 0.249 | 24.85 |
0.002 | quebec | 29.98 | 0.929 | 0.175 | 17.90 |
0.002 | rome | 26.28 | 0.866 | 0.219 | 23.07 |
0.002 | AVG | 26.85 | 0.860 | 0.230 | 22.33 |
0.001 | amsterdam | 27.25 | 0.886 | 0.190 | 31.84 |
0.001 | bilbao | 27.98 | 0.886 | 0.190 | 24.38 |
0.001 | hollywood | 24.59 | 0.772 | 0.319 | 23.41 |
0.001 | pompidou | 25.58 | 0.851 | 0.236 | 29.19 |
0.001 | quebec | 30.30 | 0.934 | 0.163 | 21.23 |
0.001 | rome | 26.61 | 0.876 | 0.203 | 26.91 |
0.001 | AVG | 27.05 | 0.868 | 0.217 | 26.16 |
0.0005 | amsterdam | 27.24 | 0.891 | 0.180 | 36.31 |
0.0005 | bilbao | 28.09 | 0.891 | 0.181 | 27.72 |
0.0005 | hollywood | 24.60 | 0.778 | 0.313 | 26.00 |
0.0005 | pompidou | 25.60 | 0.853 | 0.231 | 33.55 |
0.0005 | quebec | 30.19 | 0.936 | 0.155 | 24.56 |
0.0005 | rome | 26.73 | 0.881 | 0.194 | 30.21 |
0.0005 | AVG | 27.08 | 0.872 | 0.209 | 29.72 |
Per-scene results of the base model 3DGS [17].
Dataset | Scene | PSNR | SSIM | LPIPS | SIZE (MB) |
---|---|---|---|---|---|
Synthetic-NeRF | chair | 35.65 | 0.988 | 0.010 | 115.77 |
Synthetic-NeRF | drums | 26.28 | 0.955 | 0.037 | 92.87 |
Synthetic-NeRF | ficus | 35.48 | 0.987 | 0.012 | 64.53 |
Synthetic-NeRF | hotdog | 38.05 | 0.985 | 0.020 | 43.37 |
Synthetic-NeRF | lego | 35.98 | 0.982 | 0.017 | 80.53 |
Synthetic-NeRF | materials | 30.48 | 0.960 | 0.037 | 38.50 |
Synthetic-NeRF | mic | 36.76 | 0.993 | 0.006 | 48.31 |
Synthetic-NeRF | ship | 31.73 | 0.907 | 0.107 | 63.82 |
Synthetic-NeRF | AVG | 33.80 | 0.970 | 0.031 | 68.46 |
Mip-NeRF360 | bicycle | 25.11 | 0.746 | 0.245 | 1336.45 |
Mip-NeRF360 | garden | 27.30 | 0.856 | 0.122 | 1327.99 |
Mip-NeRF360 | stump | 26.66 | 0.770 | 0.242 | 1070.92 |
Mip-NeRF360 | room | 31.74 | 0.926 | 0.197 | 353.10 |
Mip-NeRF360 | counter | 29.07 | 0.914 | 0.184 | 277.39 |
Mip-NeRF360 | kitchen | 31.47 | 0.931 | 0.117 | 414.33 |
Mip-NeRF360 | bonsai | 32.12 | 0.946 | 0.181 | 295.33 |
Mip-NeRF360 | flower | 21.36 | 0.588 | 0.360 | 814.24 |
Mip-NeRF360 | treehill | 22.62 | 0.636 | 0.347 | 812.63 |
Mip-NeRF360 | AVG | 27.49 | 0.813 | 0.222 | 744.71 |
Tank&Temples | truck | 25.38 | 0.877 | 0.148 | 606.99 |
Tank&Temples | train | 22.00 | 0.811 | 0.208 | 254.91 |
Tank&Temples | AVG | 23.69 | 0.844 | 0.178 | 430.95 |
DeepBlending | playroom | 29.83 | 0.900 | 0.247 | 551.93 |
DeepBlending | drjohnson | 29.02 | 0.898 | 0.247 | 775.91 |
DeepBlending | AVG | 29.42 | 0.899 | 0.247 | 663.92 |
BungeeNeRF | amsterdam | 26.03 | 0.874 | 0.170 | 1458.14 |
BungeeNeRF | bilbao | 26.35 | 0.864 | 0.191 | 1350.37 |
BungeeNeRF | hollywood | 23.44 | 0.767 | 0.241 | 1601.76 |
BungeeNeRF | pompidou | 21.20 | 0.772 | 0.266 | 2169.21 |
BungeeNeRF | quebec | 28.83 | 0.923 | 0.156 | 1468.76 |
BungeeNeRF | rome | 23.34 | 0.848 | 0.206 | 1649.12 |
BungeeNeRF | AVG | 24.87 | 0.841 | 0.205 | 1616.23 |
Per-scene results of the base model Scaffold-GS [25].
Dataset | Scene | PSNR | SSIM | LPIPS | SIZE (MB) |
---|---|---|---|---|---|
Synthetic-NeRF | chair | 34.96 | 0.985 | 0.013 | 15.50 |
Synthetic-NeRF | drums | 26.36 | 0.949 | 0.045 | 26.93 |
Synthetic-NeRF | ficus | 34.66 | 0.984 | 0.015 | 16.46 |
Synthetic-NeRF | hotdog | 37.82 | 0.984 | 0.022 | 11.31 |
Synthetic-NeRF | lego | 35.48 | 0.981 | 0.018 | 19.84 |
Synthetic-NeRF | materials | 30.37 | 0.958 | 0.043 | 23.12 |
Synthetic-NeRF | mic | 36.37 | 0.991 | 0.008 | 14.83 |
Synthetic-NeRF | ship | 31.27 | 0.896 | 0.119 | 26.90 |
Synthetic-NeRF | AVG | 33.41 | 0.966 | 0.035 | 19.36 |
Mip-NeRF360 | bicycle | 24.50 | 0.705 | 0.306 | 248.00 |
Mip-NeRF360 | garden | 27.17 | 0.842 | 0.146 | 271.00 |
Mip-NeRF360 | stump | 26.27 | 0.784 | 0.284 | 493.00 |
Mip-NeRF360 | room | 31.93 | 0.925 | 0.202 | 133.00 |
Mip-NeRF360 | counter | 29.34 | 0.914 | 0.191 | 194.00 |
Mip-NeRF360 | kitchen | 31.30 | 0.928 | 0.126 | 173.00 |
Mip-NeRF360 | bonsai | 32.70 | 0.946 | 0.185 | 258.00 |
Mip-NeRF360 | flower | 21.14 | 0.566 | 0.417 | 253.00 |
Mip-NeRF360 | treehill | 23.19 | 0.642 | 0.410 | 262.00 |
Mip-NeRF360 | AVG | 27.50 | 0.806 | 0.252 | 253.89 |
Tank&Temples | truck | 25.77 | 0.883 | 0.147 | 107.00 |
Tank&Temples | train | 22.15 | 0.822 | 0.206 | 66.00 |
Tank&Temples | AVG | 23.96 | 0.853 | 0.177 | 86.50 |
DeepBlending | playroom | 30.62 | 0.904 | 0.258 | 63.00 |
DeepBlending | drjohnson | 29.80 | 0.907 | 0.250 | 69.00 |
DeepBlending | AVG | 30.21 | 0.906 | 0.254 | 66.00 |
BungeeNeRF | amsterdam | 27.16 | 0.898 | 0.188 | 223.00 |
BungeeNeRF | bilbao | 26.60 | 0.857 | 0.257 | 178.00 |
BungeeNeRF | hollywood | 24.49 | 0.787 | 0.318 | 155.00 |
BungeeNeRF | pompidou | 24.94 | 0.839 | 0.271 | 209.00 |
BungeeNeRF | quebec | 30.28 | 0.936 | 0.190 | 159.00 |
BungeeNeRF | rome | 26.23 | 0.873 | 0.225 | 174.00 |
BungeeNeRF | AVG | 26.62 | 0.865 | 0.241 | 183.00 |
Notation | Shape | Definition |
---|---|---|
A random 3D point | ||
Location of Gaussians in 3DGS [17] | ||
Covariance matrix of Gaussians | ||
Scale matrix of Gaussians | ||
Rotation matrix of Gaussians | ||
Opacity of Gaussians after 2D projection | ||
View-dependent color of Gaussians | ||
Number of Gaussians contributed to the rendering | ||
The obtained pixel value after rendering | ||
Anchor location | ||
Feature of the anchor | ||
Scaling of the anchor | ||
Offsets of the anchor | ||
The set of anchor’s attributes including {, , } | ||
Dimension of | ||
Number of offsets per anchor | ||
A 3D-2D mixed binary hash grid | ||
Table size of the hash grid at each level | ||
Number of levels of the hash grid | ||
Dimension of the vectors of the hash grid | ||
A vector of the hash grid | ||
Feature obtained by interpolation of in | ||
Any of anchor’s attribute vectors | ||
Quantized version of | ||
Dimension of , which | ||
Quantization step of | ||
Quantization step refinement term | ||
Estimated mean value for distribution modeling | ||
Estimated standard deviation for distribution modeling | ||
Base quantization step, which varies for | ||
Total number of anchors | ||
Occurrence frequency of “+1” in | ||
Total number of “+1” | ||
Total number of “-1” | ||
The loss item used in Scaffold-GS [25] | ||
Entropy loss for measuring bits of | ||
Entropy loss for measuring bits of | ||
Masking loss | ||
The total loss | ||
Tradeoff parameter to achieve variable bitrate | ||
Tradeoff parameter to balance masking ratio | ||
The MLP to deduce from | ||
The MLP to deduce and from | ||
Probability density function of Gaussian distribution | ||
Cumulative distribution function of Gaussian distribution |