PPSurf: Combining Patches and Point Convolutions
for Detailed Surface Reconstruction
Abstract
3D surface reconstruction from point clouds is a key step in areas such as content creation, archaeology, digital cultural heritage, and engineering. Current approaches either try to optimize a non-data-driven surface representation to fit the points, or learn a data-driven prior over the distribution of commonly occurring surfaces and how they correlate with potentially noisy point clouds. Data-driven methods enable robust handling of noise and typically focus on either a global or a local prior, trading off robustness to noise on the global end against surface detail preservation on the local end. We propose PPSurf as a method that combines a global prior based on point convolutions and a local prior based on processing local point cloud patches. We show that this approach is robust to noise while recovering surface details more accurately than the current state-of-the-art. Our source code, pre-trained model and dataset are available at: https://github.com/cg-tuwien/ppsurf
CCS Concepts: Computing methodologies → Shape modeling; Computing methodologies → Machine learning
https://onlinelibrary.wiley.com/doi/10.1111/cgf.15000
1 Introduction
3D surface reconstruction from point clouds is a key step for workflows in areas such as content creation, archaeology, digital cultural heritage, and engineering. It converts raw 3D point scan data, like casual RGBD (color and depth) mobile phone images or more accurate range scans (e.g., from a laser range scanner), to surface-based 3D object representations that can be used in downstream applications.
Given the large practical interest, surface reconstruction has become a central problem in computer graphics and vision research. The problem is generally ill-defined, as different surfaces may correspond to similar point clouds. However, several approaches have been proposed to tackle this ambiguity. One research direction attempts to optimize surface representations with strong non-data-driven inductive biases to fit the point cloud [KBH06, WSS19, PJL21, BZYSM21]. This resolves the ambiguity, but is susceptible to deteriorating conditions of the input points, such as scan noise or regions with missing points, which cannot easily be corrected using a fixed inductive bias. Another line of research focuses on learning data-driven priors, usually over the distribution of commonly occurring surfaces and how they correlate with potentially noisy point clouds [PFS19, EGO20, PJL21, BM22]. The surface reconstruction ambiguity can then be resolved by finding a surface that has a high probability for the given point cloud under the learned prior. The prior in these data-driven methods can range from global, where the prior captures a distribution over full 3D object surfaces, to local, where the prior captures the distribution over local surface patches. Global priors are the least susceptible to noise and missing points, but have limited capability to capture fine local details. Local priors, on the other hand, can capture such fine details accurately, but are more susceptible to strong noise and missing points. Existing methods mostly focus their prior on a small range in this global-local spectrum. For example, DeepSDF [PFS19] uses a global prior, Points2Surf [EGO20] mostly focuses on a local prior, while POCO’s point convolutions [BM22] learn a prior in the medium range that is reasonably robust to deteriorating conditions, but still struggles to accurately capture local detail.
We propose PPSurf as a method that covers a wider range in the global-local spectrum of priors, by combining the local prior of a patch-based method like Points2Surf with a more global prior of a point convolution-based method like POCO. For this purpose, we design an architecture that has two branches: the first branch is based on POCO [BM22] and provides a global prior by applying several layers of point convolutions to a sparse set of support points. To reconstruct geometric details more accurately, we merge features from this first branch with features from a second branch, which processes a local patch of points with PointNet [QSMG17]. We additionally discovered that modifying the architecture of PointNet by replacing the sum aggregation with an attention-based aggregation improves performance. This results in a method that is robust to noise and missing points, while preserving details more accurately than previous methods.
In our experiments, we compare PPSurf to several previous state-of-the-art methods, both data-driven and non-data-driven, on synthetic as well as real-world data, and demonstrate improved performance on both in-distribution and out-of-distribution surface reconstruction tasks.
2 Related Work
Surface reconstruction from point clouds is an active area of research. We distinguish between data-driven methods that train on a large dataset, and non-data-driven methods that do not use machine learning or overfit to a single shape.
Non-data-driven methods.
Poisson reconstruction [KBH06, KH13] has for many years been the gold standard of non-data-driven approaches. Recent works have suggested optimizing the parameters of a neural network to predict the signed distance to the surface [AL20, SMB20, AL21] directly from a single point cloud. In particular, Atzmon and Lipman [AL20] introduced this concept for unoriented point clouds. They optimized the parameters of the neural network with a sign-agnostic loss and a geometric initialization of its parameters. Gropp et al. [GYH20] and Atzmon and Lipman [AL21] followed up on this work and included a gradient regularization in the loss. Later, Ma et al. [BZYSM21] introduced Neural-Pull, an optimization objective that directly uses the gradient of the optimized signed distance function (SDF) to move the query points to the closest point in the input point cloud. In follow-up work, this approach was extended by incorporating a network that classifies whether a point is on the surface or not [CHL23], and an additional loss that aligns the gradient direction between different level sets of the SDF [MZLH23]. In order to improve the quality of the final SDF, Yifan et al. [YWOSH20] and Zhou et al. [ZML22] proposed to iteratively augment the input point cloud with points sampled from the SDF optimized in the previous iteration. A different approach was proposed by Peng et al. [PJL21] (also used in LION [ZVW22]), based on a differentiable Poisson Surface Reconstruction operation that can be used for optimization-based or learned reconstructions. In contrast to previous methods, the set of points on the surface is optimized through the differentiable reconstruction instead of a neural network representing the SDF. Lin et al. proposed a parametric Gauss formula for reconstruction [LXSW22], which has quadratic memory complexity, leading to prohibitive costs for larger point clouds. VIPSS by Huang et al. [HCJ19] formulates reconstruction as a constrained quadratic optimization problem. iPSR by Hou et al. [HWW22] uses an iterative approach to Poisson reconstruction that progressively improves the surface while removing the need for given point normals. IsoPoisson by Xiao et al. [XSL23] incorporates an isovalue constraint into the Poisson equation, which helps with consistent normal orientation and consequently improves reconstruction.
Non-data-driven methods are sensitive to noise, which is usually present in real 3D scans. To address this limitation to some extent, a recent pre-print by Wang et al. [WWW23] proposed Neural-IMLS, a non-data-driven method that regularizes the smoothness of surface normals using an MLP with limited capacity. While this produces smooth surfaces, it also loses some geometric detail due to this non-data-driven regularization. Noise to Noise Mapping by Ma et al. [MLH23] focuses on the reconstruction of noisy point clouds in an unsupervised overfitting scheme. Additionally, these methods require significant reconstruction times because the optimization is performed for each shape individually, which can be a limiting factor for large scans.
Data-driven methods.
A recent line of research has approached the problem of shape reconstruction in a data-driven manner by using a large dataset to learn a prior over the distribution of commonly occurring surfaces and how they correlate with the input point cloud. These approaches are typically fast and robust to noisy inputs compared to non-data-driven approaches. However, in such methods, the resulting reconstruction highly depends on the quality of such priors.
Several works have proposed to use a global prior to capture the distribution over full 3D object surfaces [CZ19, MON19, PFS19]. These methods define such a prior as a single latent vector representing the shape, which is then used as a condition in a fully connected network to decode the SDF of a given query point. Usually, the decoder is trained on large data sets with a point-cloud encoder [MON19, CZ19]. However, Park et al. [PFS19] proposed to train the decoder directly on such data sets and then optimize the latent vector to match the noisy point cloud during inference. Recently, Zhang et al. [ZTNW23] proposed to use richer global priors. They introduced an encoder-decoder network that encodes the input point cloud using attention modules into a set of latent vectors representing the shape, which are then used to predict the SDF for a set of query points using cross-attention modules.
Other works have opted to condition their models with local priors. Siddiqui et al. [STM21] encoded the input point clouds in a set of latent scene patches. These latent vectors are used to query a database of latent vectors from patches obtained from the training set. The obtained patches are then blended together using an attention mechanism. Ma et al. [BYSZ22] incorporated local priors by including a network pre-trained on a large number of surface patches which classifies a point as being on the surface or not. This network is used to guide an optimization process that learns the shape’s SDF using another neural network. Jiang et al. [JSM20] pre-trained an SDF encoder-decoder on a large data set of object parts. Then, during the optimization process, only the latent codes of the different parts of the object are optimized. Chen et al. [CTFZ22] propose a dual contouring method learned on a small local prior.
Since global and local priors provide complementary information about the shape, a common approach is to use a prior in the medium range using a hierarchical encoder-decoder network. These approaches reduce the input point cloud to a simplified representation, e.g., a voxelization or a subsampled point cloud, which is then enriched by the global information provided by the bottleneck of the encoder-decoder architecture. Chibane et al. [CAPM20] and Peng et al. [PNM20] proposed a 3D CNN encoder-decoder network that encodes the sparse or noisy point cloud and later predicts the SDF for an arbitrary point around the surface. Chibane et al. [CMPM20] extended this work to predict an unsigned distance field, which allowed them to represent complex open surfaces. Tang et al. [TLX21] extended the work of Peng et al. [PNM20] to include test-time optimization to improve results on out-of-distribution point clouds. Ummenhofer and Koltun [UK21] proposed a CNN that works directly on an octree, from which the model was able to predict the SDF. Wang et al. [WLT22] also represented the input point cloud with an octree, from which they constructed a graph. This graph was further processed by a GCN encoder-decoder to generate an embedding for each octree node, from which the final SDF is predicted. Dai et al. [DDN20] instead used a 3D sparse encoder-decoder network to complete partial 3D scans and predict a complete SDF. Lionar et al. [LESP21] also developed an encoder-decoder network, but instead used the projection of the input point cloud onto a set of arbitrary 2D planes, from which the final SDF was predicted. Boulch and Marlet [BM22] recently proposed an encoder-decoder network that works directly with points, avoiding discretization artifacts from voxel-based representations. Although all these methods work relatively well when compared with methods that use global or local priors alone, they struggle to accurately capture fine local details of the shapes.
Erler et al. [EGO20] proposed to explicitly model global and local priors directly from point clouds using two different branches. Each branch used a PointNet [QSMG17] architecture, to process the local patch around the query point in the local branch, and a point cloud representing the complete shape in the global branch. While the local branch was able to capture high-frequency details relatively well, they used a weak global prior due to the small subset of points selected to represent the shape. Our approach addresses the limitations of all these methods by incorporating strong global and local priors.
3 Method
The goal of our method is to take as input an unoriented point cloud $P$ that was sampled from an unknown watertight surface $S$ with a noisy sampling process, and output a surface $\hat{S}$ that approximates $S$ as closely as possible. Similar to several previous approaches, we define the surface using an implicit representation, since this guarantees watertightness and naturally handles arbitrary surface topology in a smooth and differentiable way. More specifically, $\hat{S}$ is defined as a level set of an occupancy field $o$.
We train a network $f_\theta$ with parameters $\theta$ to model the field $o$ given a point cloud $P$:
$$ o(x) \approx f_\theta(x, P) \tag{1} $$
The network uses two branches: i) a global branch $f^g$ that performs point convolutions [BPM20] on a sparse random subset $P^s \subset P$ of points and effectively learns a global prior over the coarse shape of $S$ given the input points, and ii) a local branch $f^l$ that processes a small local patch of $P$ around the query point $x$ and effectively learns a local prior over the detailed shape of local surface patches. Each branch outputs a feature vector for a given query point $x$; the two vectors are combined into a single feature vector, which is then processed by a small MLP $h$ that outputs the occupancy probability:
$$ f_\theta(x, P) = h\big(f^g(x, P^s) \oplus f^l(x, P)\big) \tag{2} $$
where $\oplus$ is the operation used to combine the two feature vectors, a sum in our experiments. Here, we omit the parameters of the networks $f^g$, $f^l$, and $h$ to avoid a cluttered notation. Figure 1 illustrates our architecture.
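As a minimal sketch of how the two branch outputs could be merged, the following pure-Python snippet sums a global and a local feature vector and feeds the result to a tiny MLP with a sigmoid output. The function names, the toy weights and the two-layer head are hypothetical and only serve as illustration; the trained network is much larger.

```python
import math

def merge_sum(global_feat, local_feat):
    """Element-wise sum of two feature vectors of equal length."""
    if len(global_feat) != len(local_feat):
        raise ValueError("sum merging needs equal feature sizes")
    return [g + l for g, l in zip(global_feat, local_feat)]

def linear(weights, bias, x):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def occupancy_head(feat, w1, b1, w2, b2):
    """Hypothetical two-layer MLP: ReLU hidden layer, sigmoid output in (0, 1)."""
    hidden = [max(0.0, h) for h in linear(w1, b1, feat)]
    logit = linear(w2, b2, hidden)[0]
    return 1.0 / (1.0 + math.exp(-logit))

# Toy weights (assumptions for illustration, not trained parameters).
W1 = [[0.5, -0.25], [0.1, 0.3]]
B1 = [0.0, 0.1]
W2 = [[1.0, -1.0]]
B2 = [0.0]

merged = merge_sum([0.2, 0.4], [0.6, -0.4])     # global + local features
prob = occupancy_head(merged, W1, B1, W2, B2)   # occupancy probability
```

Summing keeps the input size of the final MLP independent of the number of branches, whereas concatenation would double it.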
In the following, we describe the architecture of PPSurf, including the global and local branches in Section 3.1, followed by a description of the training and inference setups in Sections 3.2 and 3.3, respectively.
3.1 Architecture
Global Branch
The global branch takes as input a random subset $P^s \subset P$ of the input point cloud and a 3D query point $x$ and outputs a global feature vector $z^g_x$ for the point $x$, which encodes information about the coarse shape of the point cloud. We implement the global branch using POCO [BM22], which consists of two main components: i) a point convolution module $c$ that computes a feature vector $z_j$ for each sparse point $p_j \in P^s$, followed by ii) an interpolation module that interpolates the feature vectors $z_j$ to get the global feature vector $z^g_x$ at point $x$.
The point convolution module uses FKAConv [BPM20] to process the sparse point cloud into a feature vector for each point:
$$ Z = c(P^s) \tag{3} $$
where $Z = \{z_j\}$ is the set of feature vectors at each sparse point. Due to limitations both in performance and network capacity, convolutions can only be performed on the sparse subset $P^s$ instead of the full point cloud $P$; the size of $P^s$ is fixed in our experiments. This module consists of 10 layers of convolutions. Each layer uses a convolution kernel that operates over the 16 nearest neighbors of each point.
Given a query point $x$, the interpolation module interpolates the feature vectors at the nearest neighbors $\mathcal{N}(x) \subset P^s$ of the query point to get the global feature vector $z^g_x$ using an attention-based weighting:
$$ z^g_x = g_2\Big(\frac{1}{K} \sum_{k=1}^{K} \sum_{p_j \in \mathcal{N}(x)} a^k_j \, \tilde{z}_j\Big), \qquad \tilde{z}_j = g_1\big(z_j \,\Vert\, (x - p_j)\big) \tag{4} $$
$$ a^k_j = \frac{\exp\big(w_k(\tilde{z}_j)\big)}{\sum_{p_i \in \mathcal{N}(x)} \exp\big(w_k(\tilde{z}_i)\big)} \tag{5} $$
where $\Vert$ denotes concatenation, $g_1$, $g_2$ are two MLPs that transform the feature vectors before and after the weighted sum, and $w_k$ are learned weighting functions, each implemented as a single linear layer. Analogous to the attention heads in multi-head attention, $K$ different weighting functions are used as a form of ensemble learning. Note that when evaluating multiple query points for a point cloud, the point convolution module only needs to be evaluated once, while the interpolation module needs to be evaluated once per query point.
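The attention-based interpolation can be sketched in a few lines of pure Python. This is an illustration under assumptions, not the POCO implementation: the MLPs before and after the weighted sum are left out (treated as identity), each head is a plain weight vector standing in for a single linear layer, and the heads are averaged. The names `softmax` and `interpolate` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def interpolate(neighbor_feats, heads):
    """Attention-weighted interpolation of neighbor features.

    neighbor_feats: one feature vector per nearest neighbor of the query.
    heads: one weight vector per head (a bias-free linear scoring layer).
    Returns the head-averaged, attention-weighted sum of the features.
    """
    dim = len(neighbor_feats[0])
    out = [0.0] * dim
    for w in heads:
        # One scalar score per neighbor, normalized over the neighborhood.
        scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in neighbor_feats]
        attn = softmax(scores)
        for a, f in zip(attn, neighbor_feats):
            for d in range(dim):
                out[d] += a * f[d] / len(heads)
    return out

feats = [[1.0, 0.0], [0.0, 1.0]]
uniform = interpolate(feats, [[0.0, 0.0]])   # zero scores: plain average
sharp = interpolate(feats, [[10.0, 0.0]])    # strongly prefers the first neighbor
```

Because the weights of each head sum to one, the result is always a convex combination of the neighbor features.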
Local Branch
The local branch processes a local patch around the query point $x$ and outputs a local feature vector $z^l_x$ for the point $x$, which encodes information about the detailed shape of the point cloud near $x$. We base the local branch on the popular PointNet [QSMG17] architecture, which has been successfully applied in various methods that process local point cloud patches [GKOM18, RLBG19]. We modify the architecture with an attention-based aggregation, instead of the original max- or sum-based aggregation, which we found to improve performance.
We define the local patch $P_x$ as the $k$ nearest neighbors of the query point $x$ in the input point cloud $P$. We normalize the patch by centering it at the origin and scaling it to fit into a unit sphere, obtaining the normalized patch $\bar{P}_x$. Subsequently, we apply PointNet with attention-based aggregation similar to Eqs. 4 and 5, but without using multiple attention heads:
$$ z^l_x = m_2\Big(\sum_{p_i \in \bar{P}_x} b_i \, m_1(p_i)\Big) \tag{6} $$
$$ b_i = \frac{\exp\big(v(m_1(p_i))\big)}{\sum_{p_n \in \bar{P}_x} \exp\big(v(m_1(p_n))\big)} \tag{7} $$
where $v$ is a learned weighting function implemented as a linear layer, and $m_1$, $m_2$ are two MLPs that transform the feature vectors before and after the weighted aggregation.
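The patch extraction and normalization can be illustrated with a brute-force sketch. It assumes that "centering" means translating the query point to the origin, and it uses a linear-time neighbor search instead of a spatial index; `knn_patch` and `normalize_patch` are hypothetical names.

```python
import math

def knn_patch(points, query, k):
    """Return the k points closest to the query (brute force)."""
    return sorted(points, key=lambda p: math.dist(p, query))[:k]

def normalize_patch(patch, query):
    """Move the query to the origin and scale the patch into the unit sphere."""
    centered = [tuple(c - q for c, q in zip(p, query)) for p in patch]
    radius = max(math.hypot(*p) for p in centered)
    if radius == 0.0:  # degenerate patch: all points coincide with the query
        return centered
    return [tuple(c / radius for c in p) for p in centered]

cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
query = (0.0, 0.0, 0.0)
patch = knn_patch(cloud, query, k=3)        # drops the far point (5, 5, 5)
normalized = normalize_patch(patch, query)  # farthest patch point has norm 1
```

Normalizing every patch to the same scale lets the local branch see comparable inputs regardless of the sampling density around the query point.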
3.2 Training Setup
We train our network with a binary cross-entropy loss supervised by the ground-truth occupancy at the query points defined by the Points2Surf ABC var-noise training set [EGO20]. We train with AdamW (lr=0.001, betas=(0.9, 0.999), eps=1e-5, weight_decay=1e-2, amsgrad=False) for 150 epochs, with learning-rate scheduler steps at epochs 75 and 125. On our training machine, we fully utilize all 4 NVIDIA A40 GPUs with distributed data-parallel training, using a total batch size of 50 and 48 workers. The other hyperparameters are mostly based on POCO, namely the number of manifold points, a network decoder of 64 and 2 output classes. One change is the increased latent size of 128, which was 32 in POCO. The additional hyperparameters for the local branch are a PointNet latent size of 256 and a patch size of 50. Training takes about 5 hours.
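To make the loss and the step schedule concrete, here is a small framework-free sketch. The decay factor `gamma=0.1` is an assumption, since the text only gives the milestone epochs; the function names are hypothetical as well.

```python
import math

def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean BCE between predicted occupancy probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)

def step_lr(epoch, base_lr=1e-3, milestones=(75, 125), gamma=0.1):
    """Learning rate after multiplying by gamma at each milestone reached.

    gamma=0.1 is an assumed decay factor, not a value from the text.
    """
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed

loss = binary_cross_entropy([0.9, 0.2, 0.6], [1, 0, 1])
schedule = [step_lr(e) for e in (0, 74, 75, 125, 149)]
```

In a PyTorch setup, the same schedule would typically be expressed with `torch.optim.AdamW` and `torch.optim.lr_scheduler.MultiStepLR`.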
3.3 Inference Setup
We use the inference setup from POCO [BM22], which differs from the training setup in two main aspects: First, we perform test-time augmentation in our global branch to obtain more reliable results. Second, we sample query points in a grid and use a variant of marching cubes to reconstruct a mesh. We describe both in more detail below.
Test-time augmentation.
The sparse subsample $P^s$ used for the global branch may miss important geometric detail. To improve robustness, we compute the per-point feature vectors for multiple different random subsamples of the input point cloud $P$, until each point in $P$ is included in at least a minimum number of subsamples. The different feature vectors obtained for each point in $P$ are then averaged before performing the interpolation step.
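The coverage loop behind this test-time augmentation can be sketched as follows. The sketch assumes uniform random subsampling and scalar per-point features for brevity; the actual sampling strategy may differ, and all names and parameter values are hypothetical.

```python
import random

def coverage_subsamples(num_points, subsample_size, min_count, seed=0):
    """Draw random index subsamples until every point index has
    appeared in at least min_count of them."""
    rng = random.Random(seed)
    counts = [0] * num_points
    subsamples = []
    while min(counts) < min_count:
        idx = rng.sample(range(num_points), subsample_size)
        for i in idx:
            counts[i] += 1
        subsamples.append(idx)
    return subsamples, counts

def average_features(num_points, subsamples, per_subsample_feats):
    """Average the feature each point received over all subsamples
    that contained it (scalar features for brevity)."""
    sums = [0.0] * num_points
    hits = [0] * num_points
    for idx, feats in zip(subsamples, per_subsample_feats):
        for i, f in zip(idx, feats):
            sums[i] += f
            hits[i] += 1
    return [s / h for s, h in zip(sums, hits)]

subs, counts = coverage_subsamples(num_points=20, subsample_size=5, min_count=3)
# Stand-in features: each point's "feature" is simply its own index.
feats = [[float(i) for i in idx] for idx in subs]
averaged = average_features(20, subs, feats)
```

The number of subsamples is not fixed in advance; it grows until the least-covered point reaches the requested count.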
Mesh reconstruction.
We place query points in a regular grid and use a variant of marching cubes [LC87] proposed in POCO to obtain a mesh from the predicted occupancy field. That variant uses a region-growing strategy starting from the input points to avoid the costly evaluation at all grid points, and super-samples marching-cube edges that intersect a surface to get a more accurate estimate of the intersection point.
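The edge super-sampling idea can be illustrated with a short sketch: the edge is sampled at several evenly spaced points, and the crossing is linearly interpolated inside the first sub-interval that brackets the level. This is a simplified stand-in for the POCO variant, not its implementation; the level of 0.5, the sample count and the analytic sphere-like occupancy are assumptions for the example.

```python
import math

def refine_crossing(occupancy, a, b, level, samples=8):
    """Estimate where `occupancy` crosses `level` on the segment a-b.

    Returns the interpolated crossing point, or None if the sampled
    values never bracket the level.
    """
    def lerp(t):
        return tuple(pa + t * (pb - pa) for pa, pb in zip(a, b))

    ts = [i / samples for i in range(samples + 1)]
    vals = [occupancy(lerp(t)) - level for t in ts]
    for t0, t1, v0, v1 in zip(ts, ts[1:], vals, vals[1:]):
        if v0 == 0.0:
            return lerp(t0)
        if v0 * v1 < 0.0:
            # Linear interpolation inside the bracketing sub-interval.
            return lerp(t0 + (t1 - t0) * v0 / (v0 - v1))
    if vals[-1] == 0.0:
        return lerp(1.0)
    return None

def sphere_occupancy(p):
    """Smooth toy occupancy: close to 1 inside the unit sphere, close to 0 outside."""
    return 1.0 / (1.0 + math.exp(10.0 * (math.hypot(*p) - 1.0)))

hit = refine_crossing(sphere_occupancy, (0.0, 0.0, 0.0), (3.0, 0.0, 0.0), level=0.5)
miss = refine_crossing(sphere_occupancy, (0.0, 0.0, 0.0), (0.5, 0.0, 0.0), level=0.5)
```

With a single interpolation over the whole edge, the estimate for this toy field would be noticeably farther from the true crossing at x = 1 than with eight sub-intervals.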
4 Results
We evaluate PPSurf by comparing our surface reconstruction performance to several state-of-the-art methods, both data-driven and non-data-driven. We show both quantitative and qualitative comparisons in Section 4.1. Additionally, we provide an ablation to empirically validate our main design choices in Section 4.2.
Metrics
We use three well-known metrics to evaluate the error of our reconstructed surfaces: the Chamfer distance, the F1 score, and the normal error. We evaluate each metric on random samples: surface samples for the Chamfer distance and the normal error, and volume samples for the F1 score. Because the samples are random, the results vary slightly between different runs.
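The Chamfer distance [BTBW77, FSG17] measures the distance between two point sets. We use it to measure the distance between reconstructed and ground-truth (GT) surface samples. It is defined as:
$$ d_{CD}(A, B) = \frac{1}{|A|} \sum_{a \in A} \min_{b \in B} \lVert a - b \rVert_2 + \frac{1}{|B|} \sum_{b \in B} \min_{a \in A} \lVert b - a \rVert_2 \tag{8} $$
where $A$ and $B$ are point sets of equal size sampled on the surface of the GT object and the reconstructed object.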
The F1 Score [TH15] measures the overlap between the ground truth surface and the region enclosed by the reconstructed surface, similar to the IoU. It weights precision and recall equally.
The normal error measures the difference between the normals of the reconstructed surface and the ground truth normals. We sample points uniformly on the ground truth mesh $M_{gt}$ and the reconstructed mesh $M_{rec}$, storing the normals of their originating faces. Then, we pair each sample on one mesh with its closest sample on the other mesh. We report the average angle between the normals of these point pairs: $\frac{1}{N} \sum_{i=1}^{N} \arccos\big(n^{gt}_i \cdot n^{rec}_i\big)$, where $n^{gt}_i$ and $n^{rec}_i$ are the ground truth and reconstructed unit normals of the $i$-th pair, respectively.
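A brute-force sketch of the three metrics is given below. It assumes an unsquared, symmetric Chamfer distance and the usual harmonic-mean definition of the F1 score; the exact variants used in the evaluation may differ, and the function names are hypothetical.

```python
import math

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbor distance
    from a to b plus the same from b to a (unsquared, an assumption)."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall, weighting both equally."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

def normal_angle(n1, n2):
    """Angle in radians between two normals (normalized internally)."""
    dot = sum(x * y for x, y in zip(n1, n2))
    norm = math.hypot(*n1) * math.hypot(*n2)
    return math.acos(max(-1.0, min(1.0, dot / norm)))

gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
rec = [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0)]
cd = chamfer_distance(gt, rec)                        # both directions contribute
angle = normal_angle((0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
```

For large sample counts, the quadratic nearest-neighbor search should be replaced by a spatial index such as a k-d tree.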
Datasets
We evaluate our method on the set of dataset variants introduced in P2S [EGO20]:
- The ABC dataset, in the version provided by P2S [EGO20], which we use for training and testing.
- The Famous [EGO20] dataset consists of 22 diverse well-known meshes, including the Stanford Bunny, the Utah Teapot, and the Armadillo. We use this dataset for testing only.
- A subset of shapes from the Thingi10k [ZJ16] dataset is used as an additional test set. The Thingi10k dataset contains a variety of CAD shapes, but also more organic shapes like statues.
- The real-world dataset [EGO20] consists of 3 point clouds.
All synthetic point clouds were created with the simulated scanner BlenSor [GKUP11] at a fixed scanner resolution, using a random number of scans per shape drawn from a fixed range. Each dataset comes in up to five variants:
- no noise: A version without noise.
- med. noise: A version with noise whose standard deviation is a fixed fraction of $L$, where $L$ is the largest side of the object’s bounding box.
- high noise: A version with noise using a larger standard deviation.
- var. noise: A version with variable noise, where the amount of noise used for a given shape is sampled uniformly from a fixed range, and the number of scans is drawn from a fixed range.
- sparse: A version with medium noise where all shapes use only a small number of scans, resulting in smaller point clouds.
- dense: A version with medium noise where all shapes use a larger number of scans, resulting in larger point clouds.
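The noise levels are defined relative to the largest side of the bounding box. As a rough illustration, the sketch below perturbs every coordinate independently with Gaussian noise of that relative scale. This is a simplification: a scanner simulation may apply noise differently (for example to the measured depth), and the value `0.01` in the usage lines is a hypothetical noise level, not one of the dataset settings.

```python
import random

def add_gaussian_noise(points, rel_sigma, seed=0):
    """Perturb each coordinate with Gaussian noise whose standard deviation
    is rel_sigma times the largest side of the axis-aligned bounding box."""
    rng = random.Random(seed)
    dims = range(len(points[0]))
    largest_side = max(
        max(p[d] for p in points) - min(p[d] for p in points) for d in dims
    )
    sigma = rel_sigma * largest_side
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]

clean = [(0.0, 0.0, 0.0), (2.0, 1.0, 0.5)]
unchanged = add_gaussian_noise(clean, 0.0)       # zero noise keeps the points
noisy = add_gaussian_noise(clean, 0.01, seed=3)  # hypothetical noise level
```

Tying the standard deviation to the bounding box makes the noise level comparable across shapes of different size.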
For a fair comparison, we train all data-driven methods on the ABC var. noise dataset and evaluate them with each test set. Some point cloud examples of these data sets are illustrated in Figure 2.
4.1 Comparisons
We compare PPSurf to several recent data-driven and non-data-driven reconstruction methods. PGR [LXSW22], Neural-IMLS (IMLS) [WWW23] and Shape as Points (SAP-O) [PJL21] are non-data-driven methods that do not train on a large dataset and instead directly fit a surface to the input point cloud. Shape as Points also has a data-driven variant (SAP) that uses a trained network. Additionally, we use Points2Surf (P2S) [EGO20] and POCO [BM22] as data-driven methods. We took the best available variants and settings for each method: For PGR, we use the default parameters wmin=0.0015, alpha=1.05 for no noise, med. noise and var. noise. We use the following adapted parameters for the other datasets: wmin=0.03, alpha=2.0 for high noise, and wmin=0.03, alpha=1.5 for dense and sparse. We use thingi-noisy for SAP-O, vanilla for P2S, and 10k-FKAConv-InterpAttentionKHeadsNet for POCO. We used the provided noise-large configuration for SAP. For IMLS, we used the results provided by the authors (results for the high noise datasets were not provided). Note that IMLS was developed concurrently with our work.
Qualitative Comparison
Figure 3 shows comparisons for one example of each dataset variant. While non-data-driven methods give competitive results on low-noise inputs, PPSurf has a clear advantage with sparse and noisy point clouds.
We show examples on real-world point clouds in Figure 4, where PPSurf produces clearer edges and finer details.
Quantitative Comparison
Table 1 shows the performance of PPSurf on all dataset variants. We report the average over all shapes in the test set. Similar to the qualitative results, POCO, PPSurf and the non-data-driven methods share the first place in most low-noise dataset variants, but PPSurf 50NN takes the lead in almost all other dataset variants. This confirms that adding the local branch does indeed improve the local reconstruction.
Table 1: Chamfer distance (CD, x100), F1 score (F1) and normal error (NE) of each method on all dataset variants.
Dataset | CD IMLS | CD PGR | CD SAP-O | CD SAP | CD P2S | CD POCO | CD PPSurf | F1 IMLS | F1 PGR | F1 SAP-O | F1 SAP | F1 P2S | F1 POCO | F1 PPSurf | NE IMLS | NE PGR | NE SAP-O | NE SAP | NE P2S | NE POCO | NE PPSurf |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ABC var. noise | 1.08 | 1.60 | 1.18 | 1.18 | 0.84 | 0.70 | 0.66 | 0.78 | 0.50 | 0.67 | 0.79 | 0.83 | 0.89 | 0.90 | 0.55 | 1.29 | 1.11 | 0.52 | 0.65 | 0.32 | 0.30 |
ABC no noise | 0.48 | 0.53 | 0.63 | 1.08 | 0.61 | 0.50 | 0.48 | 0.92 | 0.92 | 0.88 | 0.80 | 0.88 | 0.94 | 0.94 | 0.20 | 0.26 | 0.30 | 0.51 | 0.31 | 0.19 | 0.19 |
Famous no noise | 0.35 | 0.36 | 0.35 | 0.99 | 0.46 | 0.39 | 0.37 | 0.95 | 0.95 | 0.95 | 0.84 | 0.93 | 0.95 | 0.96 | 0.44 | 0.48 | 0.44 | 0.75 | 0.57 | 0.46 | 0.46 |
Thingi10k no noise | 0.40 | 0.30 | 0.43 | 0.89 | 0.39 | 0.33 | 0.33 | 0.93 | 0.95 | 0.92 | 0.85 | 0.93 | 0.95 | 0.96 | 0.25 | 0.19 | 0.23 | 0.49 | 0.32 | 0.18 | 0.19 |
Mean no noise | 0.41 | 0.40 | 0.47 | 0.99 | 0.49 | 0.41 | 0.39 | 0.93 | 0.94 | 0.92 | 0.83 | 0.91 | 0.95 | 0.95 | 0.30 | 0.31 | 0.33 | 0.58 | 0.40 | 0.28 | 0.28 |
Famous med. noise | 0.54 | 0.95 | 0.58 | 1.06 | 0.52 | 0.49 | 0.48 | 0.91 | 0.60 | 0.90 | 0.83 | 0.92 | 0.94 | 0.93 | 0.57 | 1.35 | 0.91 | 0.78 | 0.63 | 0.53 | 0.54 |
Thingi10k med. noise | 0.58 | 0.93 | 0.56 | 0.93 | 0.44 | 0.39 | 0.38 | 0.90 | 0.57 | 0.89 | 0.85 | 0.92 | 0.94 | 0.94 | 0.37 | 1.32 | 0.78 | 0.50 | 0.38 | 0.24 | 0.25 |
Mean med. noise | 0.56 | 0.94 | 0.57 | 0.99 | 0.48 | 0.44 | 0.43 | 0.91 | 0.59 | 0.89 | 0.84 | 0.92 | 0.94 | 0.94 | 0.47 | 1.33 | 0.85 | 0.64 | 0.50 | 0.38 | 0.39 |
ABC high noise | – | 1.90 | 1.96 | 1.51 | 1.24 | 1.00 | 0.97 | – | 0.42 | 0.49 | 0.75 | 0.78 | 0.84 | 0.85 | – | 1.35 | 1.42 | 0.65 | 0.99 | 0.43 | 0.41 |
Famous high noise | – | 1.86 | 1.80 | 1.62 | 1.14 | 1.11 | 1.01 | – | 0.50 | 0.59 | 0.78 | 0.84 | 0.84 | 0.85 | – | 1.35 | 1.39 | 0.91 | 1.04 | 0.76 | 0.72 |
Thingi10k high noise | – | 1.94 | 1.89 | 1.45 | 1.08 | 0.92 | 0.83 | – | 0.51 | 0.60 | 0.80 | 0.84 | 0.87 | 0.88 | – | 1.31 | 1.36 | 0.64 | 0.90 | 0.47 | 0.43 |
Mean high noise | – | 1.90 | 1.88 | 1.53 | 1.16 | 1.01 | 0.94 | – | 0.32 | 0.56 | 0.78 | 0.82 | 0.85 | 0.86 | – | 1.34 | 1.39 | 0.73 | 0.98 | 0.55 | 0.52 |
Famous sparse | 0.90 | 0.88 | 0.71 | 1.24 | 0.77 | 0.67 | 0.64 | 0.86 | 0.88 | 0.88 | 0.74 | 0.89 | 0.92 | 0.92 | 0.68 | 0.75 | 0.86 | 0.89 | 0.71 | 0.60 | 0.61 |
Thingi10k sparse | 0.82 | 0.89 | 0.86 | 1.35 | 0.78 | 0.63 | 0.63 | 0.85 | 0.86 | 0.84 | 0.73 | 0.87 | 0.90 | 0.90 | 0.48 | 0.53 | 0.76 | 0.73 | 0.51 | 0.37 | 0.39 |
Mean sparse | 0.86 | 0.88 | 0.79 | 1.29 | 0.77 | 0.65 | 0.63 | 0.86 | 0.87 | 0.86 | 0.73 | 0.88 | 0.91 | 0.91 | 0.58 | 0.64 | 0.81 | 0.81 | 0.61 | 0.48 | 0.50 |
Famous dense | 0.45 | 0.70 | 0.53 | 0.96 | 0.41 | 0.42 | 0.40 | 0.93 | 0.43 | 0.90 | 0.86 | 0.94 | 0.95 | 0.95 | 0.52 | 1.33 | 1.00 | 0.74 | 0.59 | 0.49 | 0.48 |
Thingi10k dense | 0.49 | 0.67 | 0.54 | 0.88 | 0.36 | 0.35 | 0.33 | 0.91 | 0.47 | 0.89 | 0.87 | 0.94 | 0.95 | 0.96 | 0.30 | 1.23 | 0.84 | 0.47 | 0.33 | 0.21 | 0.21 |
Mean dense | 0.47 | 0.69 | 0.53 | 0.92 | 0.39 | 0.39 | 0.37 | 0.92 | 0.45 | 0.89 | 0.86 | 0.94 | 0.95 | 0.96 | 0.41 | 1.28 | 0.92 | 0.60 | 0.46 | 0.35 | 0.34 |
Mean overall | 0.61 | 1.04 | 0.93 | 1.16 | 0.70 | 0.61 | 0.58 | 0.89 | 0.66 | 0.80 | 0.81 | 0.89 | 0.91 | 0.92 | 0.43 | 0.98 | 0.88 | 0.66 | 0.61 | 0.40 | 0.40 |
Computation Time and Memory Consumption
Training PPSurf on the ABC var-noise training set took 5 hours on 4 NVIDIA A40 GPUs and 48 AMD EPYC-Milan cores. We reconstruct all shapes in our test sets on a single A40 and 48 CPU cores. Table 2 lists the timings and memory consumption. While non-data-driven methods tend to be faster than data-driven ones, SAP is a lightning-fast exception. PPSurf with small patch sizes has a negligible impact on resources compared to POCO. The Neural-IMLS authors do not report timings, and since it is concurrent work, we could not do our own measurements. PGR is fast, but its memory usage varies strongly with point cloud size, from a few GB to running out of memory (>46 GB) on 21 shapes.
Table 2: Reconstruction time per shape and maximum GPU memory consumption.
Method | Time per Shape | Max GPU Memory |
---|---|---|
PGR | 1.9 min | >46 GB |
SAP-O | 1.1 min | 3.8 GB |
SAP | 0.8 s | 3.1 GB |
P2S | 13.5 min | 14.3 GB |
POCO | 1.6 min | 9.0 GB |
PPSurf 10NN | 1.6 min | 9.1 GB |
PPSurf 25NN | 1.7 min | 9.1 GB |
PPSurf 50NN | 1.9 min | 9.3 GB |
PPSurf 100NN | 2.6 min | 13.7 GB |
PPSurf 200NN | 3.5 min | 13.2 GB |
Discussion
For dense and noise-free point clouds, non-data-driven methods such as PGR, SAP-O and especially IMLS are a good option. However, their performance is limited in the presence of typical point-cloud artifacts, due to missing data-driven priors. Data-driven methods such as SAP, P2S, POCO and PPSurf can better deal with such artifacts. SAP is the fastest method but lacks accuracy, possibly due to its very small network. A bigger version could perhaps produce competitive results but would require non-trivial changes to the method.
P2S employs a relatively simple PointNet for global shape encoding, which results in a weak global prior that cannot reach the quality of a more efficient encoder such as FKAConv. Furthermore, it reconstructs noisy surfaces, which is reflected in the relatively high normal error, even with noise-free inputs.
Apart from some noise-free datasets, only POCO is close to PPSurf’s quality. PPSurf achieves similar results on low-noise point clouds, but significantly better reconstructions for noisy point clouds. When predicting the occupancy at the query points, POCO has no direct access to the full point cloud, only to a coarse latent representation. This inability to accurately represent local information is likely the reason why POCO tends to produce blobby structures and over-smooth the reconstructed surfaces. We avoid this by providing a latent code that captures local detail more accurately by adding a local branch that directly encodes dense local patches of the point cloud.
4.2 Ablation
We investigate several design choices in an ablation study on the ABC var-noise test set. Most importantly, Table 3 shows that having both global and local branches yields a major advantage. According to Table 4, local patch sizes between 25 and 100 nearest neighbors perform almost identically, while 10 and 200 neighbors are worse. Further, Table 5 shows that attention is a better symmetric operation than max, and that concatenating features is similar to summing them. Please see the supplementary for an evaluation of the most relevant variants on all datasets. We compare the following variants of our method:
- Full is the full method as described in Section 3.
- For Only Local, we set the global features to zeros, disabling this branch. Based on the results of this experiment, we conclude that this model cannot reliably encode any surface, since it lacks global knowledge of the surface to reconstruct.
- Only Global is similar to POCO, as it omits the local branch. The results show that a global prior can help to obtain reliable reconstructions, but with lower performance due to the missing fine details.
- For Sym Max, we replace the attention-based interpolation used in the local branch with a max operation, effectively making this branch a PointNet [QSMG17]. The results show an advantage for attention.
- In Merge Cat, we concatenate the features of both branches instead of summing them, which leads to twice the input size for the final MLP. The results show that this is slightly worse than Full.
- The QPoints variant is the same as Merge Cat, but additionally, we concatenate the query point coordinates to the input of the learned weighting function. However, this results in slightly worse performance than Full and even Merge Cat.
- For the xNN variants, we take the x nearest neighbors for the local subsample. Full is equal to 50NN.
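The patch selection, pooling, and merging choices compared in these variants can be illustrated with a small sketch. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the feature vectors are toy values, and the `scores` argument merely stands in for the output of a learned weighting function.

```python
import math

def knn_indices(points, query, k):
    """Indices of the k points closest to `query` (xNN patch selection)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return order[:k]

def pool_max(features):
    """Sym Max: channel-wise maximum over all points of the patch."""
    return [max(channel) for channel in zip(*features)]

def pool_attention(features, scores):
    """Attention-style pooling: softmax(scores)-weighted sum of the features.

    `scores` is an assumption standing in for a learned weighting function.
    """
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * f for w, f in zip(weights, channel))
            for channel in zip(*features)]

def merge_sum(global_feat, local_feat):
    """Full: element-wise sum, output size equals the input size."""
    return [g + l for g, l in zip(global_feat, local_feat)]

def merge_cat(global_feat, local_feat):
    """Merge Cat: concatenation, output size is doubled."""
    return list(global_feat) + list(local_feat)

# Toy usage with two points and two feature channels.
patch_features = [[1.0, 0.0], [0.0, 2.0]]
local_max = pool_max(patch_features)                     # [1.0, 2.0]
local_att = pool_attention(patch_features, [0.0, 0.0])   # [0.5, 1.0]
merged = merge_sum([1.0, 1.0], local_att)                # [1.5, 2.0]
```

Both pooling functions are symmetric, meaning that reordering the points of a patch (together with their scores) does not change the result. The difference is that max keeps only the strongest response per channel, whereas the weighted sum lets every point contribute.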
Table 3: Ablation of the global and local branches on the ABC var-noise test set.

Model | Chamfer (x100) | F1 Score | Normal Error |
---|---|---|---|
Only Local | 2.69 | 0.36 | 1.56 |
Only Global | 0.70 | 0.89 | 0.33 |
PPSurf Full | 0.66 | 0.90 | 0.30 |

Table 4: Ablation of the local patch size (number of nearest neighbors) on the ABC var-noise test set.

Model | Chamfer (x100) | F1 Score | Normal Error |
---|---|---|---|
PPSurf 10NN | 1.10 | 0.90 | 0.40 |
PPSurf 25NN | 0.66 | 0.90 | 0.31 |
PPSurf Full | 0.66 | 0.90 | 0.30 |
PPSurf 100NN | 0.66 | 0.90 | 0.30 |
PPSurf 200NN | 0.67 | 0.89 | 0.31 |

Table 5: Ablation of the symmetric operation and the feature merging on the ABC var-noise test set.

Model | Chamfer (x100) | F1 Score | Normal Error |
---|---|---|---|
PPSurf Sym Max | 1.11 | 0.90 | 0.40 |
PPSurf QPoints | 0.67 | 0.89 | 0.31 |
PPSurf Merge Cat | 0.66 | 0.90 | 0.30 |
PPSurf Full | 0.66 | 0.90 | 0.30 |
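For readers unfamiliar with the Chamfer and F1 columns, the following sketch shows one common way to compute such metrics between two point sets sampled from the reconstructed and the ground-truth surfaces. It is an illustration only: the exact normalization, distance threshold, and sample counts used for the tables are not stated in this section, so the function names and the `threshold` parameter are assumptions.

```python
import math

def nearest_distances(src, dst):
    """For each point in `src`, the distance to its closest point in `dst`."""
    return [min(math.dist(p, q) for q in dst) for p in src]

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbor distance in both
    directions, summed. Other variants use squared distances or the maximum."""
    d_ab = nearest_distances(a, b)
    d_ba = nearest_distances(b, a)
    return sum(d_ab) / len(d_ab) + sum(d_ba) / len(d_ba)

def f1_score(pred, gt, threshold):
    """Harmonic mean of precision and recall at a distance threshold."""
    precision = sum(d <= threshold for d in nearest_distances(pred, gt)) / len(pred)
    recall = sum(d <= threshold for d in nearest_distances(gt, pred)) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy usage: two nearly identical point sets.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1)]
cd = chamfer_distance(pred, gt)
f1 = f1_score(pred, gt, threshold=0.2)
```

The brute-force nearest-neighbor search is quadratic in the number of points. A spatial data structure such as a k-d tree would be the usual choice for realistic sample counts.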
4.3 Limitations
Reconstruction times are still non-interactive, due to the need to evaluate the occupancy at a large number of samples. Possibilities for speed-ups include more efficient sampling strategies to use fewer query points.
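As a rough illustration of how a sampling strategy can reduce the number of occupancy queries, the sketch below compares a dense grid with a simple coarse-to-fine scheme that only refines coarse cells whose corners disagree in occupancy. This is a hypothetical example and not part of PPSurf: the analytic sphere stands in for the network, and all function names and parameters are assumptions.

```python
import itertools

def sphere_occupancy(point, radius=0.5):
    """Toy occupancy function: 1 inside a sphere, 0 outside.
    Stand-in for an expensive network evaluation."""
    return 1 if sum(c * c for c in point) <= radius * radius else 0

def dense_query_count(resolution):
    """Queries needed to evaluate a dense grid of resolution^3 samples."""
    return resolution ** 3

def coarse_to_fine_query_count(occupancy, coarse, factor):
    """Queries needed when only surface-crossing coarse cells are refined.

    The cube [-1, 1]^3 is split into coarse^3 cells. Occupancy is evaluated
    at all coarse grid vertices, and each cell with mixed corner values
    receives factor^3 additional fine queries.
    """
    step = 2.0 / coarse
    vertex_occ = {
        idx: occupancy(tuple(-1.0 + i * step for i in idx))
        for idx in itertools.product(range(coarse + 1), repeat=3)
    }
    queries = len(vertex_occ)
    offsets = list(itertools.product((0, 1), repeat=3))
    for cell in itertools.product(range(coarse), repeat=3):
        corner_values = {
            vertex_occ[(cell[0] + dx, cell[1] + dy, cell[2] + dz)]
            for dx, dy, dz in offsets
        }
        if len(corner_values) > 1:  # the surface passes through this cell
            queries += factor ** 3
    return queries

dense = dense_query_count(32)
sparse = coarse_to_fine_query_count(sphere_occupancy, coarse=8, factor=4)
print(f"dense: {dense} queries, coarse-to-fine: {sparse} queries")
```

The trade-off of such a scheme is that features thinner than a coarse cell can be missed when all corners of the cell agree, so the coarse resolution has to be chosen with the expected detail size in mind.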
As our learned priors were trained on noisy data to make PPSurf more robust to noise, they also bias the reconstructed surface to some extent towards the distributions learned by the priors. This results in some loss of accuracy when applied to noise-free point clouds compared to some of the non-data-driven methods (see Figure 5). Learning a prior that is specialized to noise-free point clouds, or including more noise-free point clouds in our training set would alleviate this issue.
While PPSurf is better than the baselines in filling scan shadows, it is not a generative method and cannot generate new geometric detail in large missing regions. This limits the size of missing regions that can be filled with plausible geometry. Combining PPSurf with a generative model would be an interesting direction for future work. See Figure 6 for an example of inaccurately filled scan shadows.
5 Conclusion
In this paper, we have introduced PPSurf as a method for surface reconstruction from raw, unoriented point clouds. In contrast to previous methods, PPSurf incorporates strong local and global priors learned from data. Whilst our global prior is based on a point convolutional neural network that processes the point cloud as a whole, fine details are preserved through the local prior based on dense local point cloud patches. We have shown in extensive studies that PPSurf is able to achieve better surface reconstructions than previous data-driven and non-data-driven methods, being more robust to noise in the input point cloud and preserving fine details at the same time.
In the future, we would like to investigate how modern techniques borrowed from generative models could improve the obtained reconstruction from sparse point clouds where large parts of the shape are missing.
6 Acknowledgements
This work has been supported by the FWF projects P24600-N23 and P32418-N31, the WWTF project ICT19-009 and the EU MSCA-ITN project EVOCATION (grant agreement 813170).
References
- [AL20] Atzmon M., Lipman Y.: Sal: Sign agnostic learning of shapes from raw data. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020).
- [AL21] Atzmon M., Lipman Y.: SALD: sign agnostic learning with derivatives. In International Conference on Learning Representations, ICLR (2021).
- [BM22] Boulch A., Marlet R.: Poco: Point convolution for surface reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2022), pp. 6302–6314.
- [BPM20] Boulch A., Puy G., Marlet R.: FKAConv: Feature-Kernel Alignment for Point Cloud Convolution. In 15th Asian Conference on Computer Vision (ACCV 2020) (2020).
- [BTBW77] Barrow H. G., Tenenbaum J. M., Bolles R. C., Wolf H. C.: Parametric correspondence and chamfer matching: Two new techniques for image matching. In Proceedings of the 5th International Joint Conference on Artificial Intelligence - Volume 2 (San Francisco, CA, USA, 1977), IJCAI’77, Morgan Kaufmann Publishers Inc., pp. 659–663. URL: http://dl.acm.org/citation.cfm?id=1622943.1622971.
- [BYSZ22] Ma B., Liu Y.-S., Han Z.: Reconstructing surfaces for sparse point clouds with on-surface priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022).
- [BZYSM21] Ma B., Han Z., Liu Y.-S., Zwicker M.: Neural-pull: Learning signed distance functions from point clouds by learning to pull space onto surfaces. In International Conference on Machine Learning (ICML) (2021).
- [CAPM20] Chibane J., Alldieck T., Pons-Moll G.: Implicit functions in feature space for 3d shape reconstruction and completion. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020).
- [CHL23] Chen C., Han Z., Liu Y.-S.: Unsupervised inference of signed distance functions from single sparse point clouds without learning priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023).
- [CMPM20] Chibane J., Mir A., Pons-Moll G.: Neural unsigned distance fields for implicit function learning. In Advances in Neural Information Processing Systems (NeurIPS) (December 2020).
- [CTFZ22] Chen Z., Tagliasacchi A., Funkhouser T., Zhang H.: Neural dual contouring. ACM Transactions on Graphics (Special Issue of SIGGRAPH) 41, 4 (2022).
- [CZ19] Chen Z., Zhang H.: Learning implicit fields for generative shape modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019).
- [DDN20] Dai A., Diller C., Nießner M.: Sg-nn: Sparse generative neural networks for self-supervised scene completion of rgb-d scans. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE (2020).
- [EGO20] Erler P., Guerrero P., Ohrhallinger S., Mitra N. J., Wimmer M.: Points2Surf: Learning implicit surfaces from point clouds. In European Conference on Computer Vision (ECCV) (2020).
- [FSG17] Fan H., Su H., Guibas L. J.: A point set generation network for 3d object reconstruction from a single image. In Proceedings of the IEEE conference on computer vision and pattern recognition (2017), pp. 605–613.
- [GKOM18] Guerrero P., Kleiman Y., Ovsjanikov M., Mitra N. J.: PCPNet: Learning local shape properties from raw point clouds. Computer Graphics Forum 37, 2 (2018), 75–85. doi:10.1111/cgf.13343.
- [GKUP11] Gschwandtner M., Kwitt R., Uhl A., Pree W.: Blensor: Blender sensor simulation toolbox. In International Symposium on Visual Computing (2011), Springer, pp. 199–208.
- [GYH20] Gropp A., Yariv L., Haim N., Atzmon M., Lipman Y.: Implicit geometric regularization for learning shapes. In International Conference on Machine Learning (ICML) (2020).
- [HCJ19] Huang Z., Carr N., Ju T.: Variational implicit point set surfaces. ACM Trans. Graph. 38, 4 (jul 2019). URL: https://doi.org/10.1145/3306346.3322994, doi:10.1145/3306346.3322994.
- [HWW22] Hou F., Wang C., Wang W., Qin H., Qian C., He Y.: Iterative poisson surface reconstruction (ipsr) for unoriented points. ACM Trans. Graph. 41, 4 (jul 2022). URL: https://doi.org/10.1145/3528223.3530096, doi:10.1145/3528223.3530096.
- [JSM20] Jiang C., Sud A., Makadia A., Huang J., Nießner M., Funkhouser T.: Local implicit grid representations for 3d scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2020).
- [KBH06] Kazhdan M., Bolitho M., Hoppe H.: Poisson surface reconstruction. In Proc. of the Eurographics symposium on Geometry processing (2006).
- [KH13] Kazhdan M., Hoppe H.: Screened poisson surface reconstruction. ACM Transactions on Graphics (ToG) 32, 3 (2013), 29.
- [KMJ19] Koch S., Matveev A., Jiang Z., Williams F., Artemov A., Burnaev E., Alexa M., Zorin D., Panozzo D.: Abc: A big cad model dataset for geometric deep learning. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2019).
- [LC87] Lorensen W. E., Cline H. E.: Marching cubes: A high resolution 3d surface construction algorithm. In ACM siggraph computer graphics (1987), vol. 21, ACM, pp. 163–169.
- [LESP21] Lionar S., Emtsev D., Svilarkovic D., Peng S.: Dynamic plane convolutional occupancy networks. In Winter Conference on Applications of Computer Vision (WACV) (2021).
- [LXSW22] Lin S., Xiao D., Shi Z., Wang B.: Surface reconstruction from point clouds without normals by parametrizing the gauss formula. ACM Trans. Graph. 42, 2 (oct 2022). URL: https://doi.org/10.1145/3554730, doi:10.1145/3554730.
- [MLH23] Ma B., Liu Y.-S., Han Z.: Learning signed distance functions from noisy 3d point clouds via noise to noise mapping. In International Conference on Machine Learning (ICML) (2023).
- [MON19] Mescheder L., Oechsle M., Niemeyer M., Nowozin S., Geiger A.: Occupancy networks: Learning 3d reconstruction in function space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019).
- [MZLH23] Ma B., Zhou J., Liu Y.-S., Han Z.: Towards better gradient consistency for neural signed distance functions via level set alignment. In Conference on Computer Vision and Pattern Recognition (CVPR) (2023).
- [PFS19] Park J. J., Florence P., Straub J., Newcombe R., Lovegrove S.: Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (2019), pp. 165–174.
- [PJL21] Peng S., Jiang C. M., Liao Y., Niemeyer M., Pollefeys M., Geiger A.: Shape as points: A differentiable poisson solver. In Advances in Neural Information Processing Systems (NeurIPS) (2021).
- [PNM20] Peng S., Niemeyer M., Mescheder L., Pollefeys M., Geiger A.: Convolutional occupancy networks. In European Conference on Computer Vision (ECCV) (2020).
- [QSMG17] Qi C. R., Su H., Mo K., Guibas L. J.: Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (2017), pp. 652–660.
- [RLBG19] Rakotosaona M.-J., La Barbera V., Guerrero P., Mitra N. J., Ovsjanikov M.: Pointcleannet: Learning to denoise and remove outliers from dense point clouds. Computer Graphics Forum (2019).
- [SMB20] Sitzmann V., Martel J. N., Bergman A. W., Lindell D. B., Wetzstein G.: Implicit neural representations with periodic activation functions. In Advances in Neural Information Processing Systems (NeurIPS) (2020).
- [STM21] Siddiqui Y., Thies J., Ma F., Shan Q., Nießner M., Dai A.: Retrievalfuse: Neural 3d scene reconstruction with a database. In International Conference on Computer Vision (ICCV) (2021).
- [TH15] Taha A. A., Hanbury A.: Metrics for evaluating 3d medical image segmentation: analysis, selection, and tool. BMC medical imaging 15, 1 (2015), 1–28.
- [TLX21] Tang J., Lei J., Xu D., Ma F., Jia K., Zhang L.: Sa-convonet: Sign-agnostic optimization of convolutional occupancy networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2021).
- [UK21] Ummenhofer B., Koltun V.: Adaptive surface reconstruction with multiscale convolutional kernels. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021).
- [WLT22] Wang P.-S., Liu Y., Tong X.: Dual octree graph networks for learning adaptive volumetric shape representations. ACM Transactions on Graphics (2022).
- [WSS19] Williams F., Schneider T., Silva C. T., Zorin D., Bruna J., Panozzo D.: Deep geometric prior for surface reconstruction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019), pp. 10130–10139.
- [WWW23] Wang Z., Wang P., Wang P., Dong Q., Gao J., Chen S., Xin S., Tu C., Wang W.: Neural-imls: Self-supervised implicit moving least-squares network for surface reconstruction. IEEE Transactions on Visualization and Computer Graphics (2023), 1–16. doi:10.1109/TVCG.2023.3284233.
- [XSL23] Xiao D., Shi Z., Li S., Deng B., Wang B.: Point normal orientation and surface reconstruction by incorporating isovalue constraints to poisson equation. Computer Aided Geometric Design 103 (2023), 102195. URL: https://www.sciencedirect.com/science/article/pii/S0167839623000274, doi:10.1016/j.cagd.2023.102195.
- [YWOSH20] Yifan W., Wu S., Oztireli C., Sorkine-Hornung O.: Iso-points: Optimizing neural implicit surfaces with hybrid representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020).
- [ZJ16] Zhou Q., Jacobson A.: Thingi10k: A dataset of 10,000 3d-printing models. arXiv preprint arXiv:1605.04797 (2016).
- [ZML22] Zhou J., Ma B., Liu Y.-S., Fang Y., Han Z.: Learning consistency-aware unsigned distance functions progressively from raw point clouds. In Advances in Neural Information Processing Systems (NeurIPS) (2022).
- [ZTNW23] Zhang B., Tang J., Nießner M., Wonka P.: 3dshape2vecset: A 3d shape representation for neural fields and generative diffusion models. ACM Trans. Graph. 42, 4 (jul 2023). URL: https://doi.org/10.1145/3592442, doi:10.1145/3592442.
- [ZVW22] Zeng X., Vahdat A., Williams F., Gojcic Z., Litany O., Fidler S., Kreis K.: Lion: Latent point diffusion models for 3d shape generation. In Advances in Neural Information Processing Systems (NeurIPS) (2022).