License: CC BY 4.0
arXiv:2401.08518v2 [cs.CV] 08 Feb 2024
[Uncaptioned image]

We present PPSurf, a method to reconstruct surfaces from noisy point clouds. Unlike previous methods, our approach combines two strong data-driven priors, one prior over local surface details, and a second prior over the coarse shape of larger surface regions. This makes PPSurf robust to noise, while reconstructing surface detail better than current methods.

PPSurf: Combining Patches and Point Convolutions
for Detailed Surface Reconstruction

P. Erler$^{1}$ (ORCID 0000-0002-2790-9279), L. Fuentes-Perez$^{2}$ (ORCID 0000-0003-1096-2871), P. Hermosilla$^{1}$ (ORCID 0000-0003-3586-4741), P. Guerrero$^{3}$ (ORCID 0000-0002-7568-2849), R. Pajarola$^{2}$ (ORCID 0000-0002-6724-526X), and M. Wimmer$^{1}$ (ORCID 0000-0002-9370-2663)
$^{1}$TU Wien, Austria; $^{2}$University of Zürich, Switzerland; $^{3}$Adobe Research, United Kingdom
Abstract

3D surface reconstruction from point clouds is a key step in areas such as content creation, archaeology, digital cultural heritage, and engineering. Current approaches either try to optimize a non-data-driven surface representation to fit the points, or learn a data-driven prior over the distribution of commonly occurring surfaces and how they correlate with potentially noisy point clouds. Data-driven methods enable robust handling of noise and typically focus on either a global or a local prior, trading off robustness to noise on the global end against surface detail preservation on the local end. We propose PPSurf as a method that combines a global prior based on point convolutions and a local prior based on processing local point cloud patches. We show that this approach is robust to noise while recovering surface details more accurately than the current state-of-the-art. Our source code, pre-trained model and dataset are available at: https://github.com/cg-tuwien/ppsurf

CCS Concepts: Computing methodologies → Shape modeling; Computing methodologies → Machine learning
Published in Computer Graphics Forum (Jan 2024): https://onlinelibrary.wiley.com/doi/10.1111/cgf.15000

1 Introduction

3D surface reconstruction from point clouds is a key step for workflows in areas such as content creation, archaeology, digital cultural heritage, and engineering. It converts raw 3D point scan data, such as casual RGBD (color and depth) mobile phone images or more accurate range scans (e.g., from laser range scanners), into surface-based 3D object representations that can be used in downstream applications.

Given the large practical interest, surface reconstruction has become a central problem in computer graphics and vision research. The problem is generally ill-defined, as different surfaces may correspond to similar point clouds. However, several approaches have been proposed to tackle this ambiguity. One research direction attempts to optimize surface representations with strong non-data-driven inductive biases to fit the point cloud [KBH06, WSS*19, PJL*21, BZYSM21]. This resolves the ambiguity, but is susceptible to deteriorating conditions of the input points, such as scan noise or regions with missing points, which cannot easily be corrected using a fixed inductive bias. Another line of research focuses on learning data-driven priors, usually over the distribution of commonly occurring surfaces and how they correlate with potentially noisy point clouds [PFS*19, EGO*20, PJL*21, BM22]. The surface reconstruction ambiguity can then be resolved by finding a surface that has a high probability for the given point cloud under the learned prior. The prior in these data-driven methods can range from global, where the prior captures a distribution over full 3D object surfaces, to local, where the prior captures the distribution over local surface patches. Global priors are the least susceptible to noise and missing points, but have limited capability to capture fine local details. Local priors, on the other hand, can capture such fine details accurately, but are more susceptible to strong noise and missing points. Existing methods mostly focus their prior on a small range in this global-local spectrum. For example, DeepSDF [PFS*19] uses a global prior, Points2Surf [EGO*20] mostly focuses on a local prior, while POCO's point convolutions [BM22] learn a prior in the medium range that is reasonably robust to deteriorating conditions, but still struggles to accurately capture local detail.

We propose PPSurf as a method that covers a wider range in the global-local spectrum of priors, by combining the local prior of a patch-based method like Points2Surf with a more global prior of a point convolution-based method like POCO. For this purpose, we design an architecture that has two branches: the first branch is based on POCO [BM22] and provides a global prior by applying several layers of point convolutions to a sparse set of support points. To reconstruct geometric details more accurately, we merge features from this first branch with features from a second branch, which processes a local patch of points with PointNet [QSMG17]. We additionally discovered that modifying the architecture of PointNet by replacing the sum aggregation with an attention-based aggregation improves performance. This results in a method that is robust to noise and missing points, while preserving details more accurately than previous methods.

In our experiments, we compare PPSurf to several previous state-of-the-art methods, both data-driven and non-data-driven, on synthetic as well as real-world data, and demonstrate improved performance on both in-distribution and out-of-distribution surface reconstruction tasks.

2 Related Work

Surface reconstruction from point clouds is an active area of research. We distinguish between data-driven methods that train on a large dataset, and non-data-driven methods that do not use machine learning or overfit to a single shape.

Non-data-driven methods.

Poisson reconstruction [KBH06, KH13] has for many years been the gold standard of non-data-driven approaches. Recent works have suggested optimizing the parameters of a neural network to predict the signed distance to the surface, i.e., a signed distance function (SDF), directly from a single point cloud [AL20, SMB*20, AL21]. In particular, Atzmon and Lipman [AL20] introduced this concept for unoriented point clouds. They optimized the parameters of the neural network with a sign-agnostic loss and a geometric initialization of its parameters. Gropp et al. [GYH*20] and Atzmon and Lipman [AL21] followed up on this work and included a gradient regularization in the loss. Later, Ma et al. [BZYSM21] introduced Neural-Pull, an optimization objective that directly uses the gradient of the optimized SDF to move the query points to the closest point in the input point cloud. In follow-up work, this approach was extended by incorporating a network that classifies whether a point lies on the surface [CHL23], and an additional loss that aligns the gradient direction between different level sets of the SDF [MZLH23]. In order to improve the quality of the final SDF, Yifan et al. [YWOSH20] and Zhou et al. [ZML*22] proposed to iteratively augment the input point cloud with points sampled from the SDF optimized in the previous iteration. A different approach was proposed by Peng et al. [PJL*21] (also used in LION [ZVW*22]), based on a differentiable Poisson Surface Reconstruction operation that can be used for optimization-based or learned reconstructions. In contrast to previous methods, the set of points on the surface is optimized through the differentiable reconstruction instead of a neural network representing the SDF. Lin et al. proposed a parametric Gauss formula for reconstruction [LXSW22], which has quadratic memory complexity, leading to prohibitive costs for larger point clouds. VIPSS by Huang et al. [HCJ19] formulates reconstruction as a constrained quadratic optimization problem. iPSR by Hou et al. [HWW*22] uses an iterative approach to Poisson reconstruction that progressively improves the surface while removing the need for given point normals. IsoPoisson by Xiao et al. [XSL*23] incorporates an isovalue constraint into the Poisson equation, which helps with consistent normal orientation and consequently improves reconstruction.

Non-data-driven methods are sensitive to noise, which is usually present in real 3D scans. To address this limitation to some extent, a recent pre-print from Wang et al. [WWW*23] proposed Neural-IMLS, a non-data-driven method that regularizes the smoothness of surface normals using an MLP with limited capacity. While this produces smooth surfaces, it also loses some geometric detail due to this non-data-driven regularization. Noise to Noise Mapping by Ma et al. [MLH23] focuses on the reconstruction of noisy point clouds in an unsupervised overfitting scheme. Additionally, these methods require significant reconstruction times because the optimization is performed for each shape individually, which can be a limiting factor for large scans.

Data-driven methods.

A recent line of research has approached the problem of shape reconstruction in a data-driven manner by using a large dataset to learn a prior over the distribution of commonly occurring surfaces and how they correlate with the input point cloud. These approaches are typically fast and robust to noisy inputs compared to non-data-driven approaches. However, in such methods, the resulting reconstruction highly depends on the quality of such priors.

Several works have proposed to use a global prior to capture the distribution over full 3D object surfaces [CZ19, MON*19, PFS*19]. These methods define such a prior as a single latent vector representing the shape, which is then used as a condition in a fully connected network to decode the SDF of a given query point. Usually, the decoder is trained on large data sets with a point-cloud encoder [MON*19, CZ19]. However, Park et al. [PFS*19] proposed to train the decoder directly on such data sets and then optimize the latent vector to match the noisy point cloud during inference. Recently, Zhang et al. [ZTNW23] proposed to use richer global priors. They introduced an encoder-decoder network that encodes the input point cloud using attention modules into a set of latent vectors representing the shape, which are then used to predict the SDF for a set of query points using cross-attention modules.

Other works have opted to condition their models with local priors. Siddiqui et al. [STM*21] encoded the input point clouds in a set of latent scene patches. These latent vectors are used to query a database of latent vectors from patches obtained from the training set. The obtained patches are then blended together using an attention mechanism. Ma et al. [BYSZ22] incorporated local priors by including a network pre-trained on a large number of surface patches which classifies a point as being on the surface or not. This network is used to guide an optimization process that learns the shape's SDF using another neural network. Jiang et al. [JSM*20] pre-trained an SDF encoder-decoder on a large data set of object parts. Then, during the optimization process, only the latent codes of the different parts of the object are optimized. Chen et al. [CTFZ22] propose a dual contouring method learned on a small local prior.

Since global and local priors provide complementary information about the shape, a common approach is to use a prior in the medium range using a hierarchical encoder-decoder network. These approaches reduce the input point cloud to a simplified representation, e.g., a voxelization or a subsampled point cloud, which is then enriched by the global information provided by the bottleneck of the encoder-decoder architecture. Chibane et al. [CAPM20] and Peng et al. [PNM*20] proposed a 3D CNN encoder-decoder network that encodes the sparse or noisy point cloud to later predict the SDF for an arbitrary point around the surface. Chibane et al. [CMPM20] extended this work to predict an unsigned distance field, which allowed them to represent complex open surfaces. Tang et al. [TLX*21] extended the work of Peng et al. [PNM*20] to include test-time optimization to improve results on out-of-distribution point clouds. Ummenhofer and Koltun [UK21] proposed a CNN that works directly on an octree, from which the model was able to predict the SDF. Wang et al. [WLT22] also represented the input point cloud with an octree, from which they constructed a graph. This graph was further processed by a GCN encoder-decoder to generate an embedding for each octree node, from which the final SDF is predicted. Dai et al. [DDN20] instead used a 3D sparse encoder-decoder network to complete partial 3D scans and predict a complete SDF. Lionar et al. [LESP21] also developed an encoder-decoder network, but instead used the projection of the input point cloud onto a set of arbitrary 2D planes, from which the final SDF was predicted. Boulch and Marlet [BM22] recently proposed an encoder-decoder network that works directly with points, avoiding discretization artifacts from voxel-based representations. Although all these methods work relatively well when compared with methods that use global or local priors alone, they struggle to accurately capture fine local details of the shapes.

Erler et al. [EGO*20] proposed to explicitly model global and local priors directly from point clouds using two different branches. Each branch used a PointNet [QSMG17] architecture, to process the local patch around the query point in the local branch, and a point cloud representing the complete shape in the global branch. While the local branch was able to capture high-frequency details relatively well, they used a weak global prior due to the small subset of points selected to represent the shape. Our approach addresses the limitations of all these methods by incorporating strong global and local priors.

Figure 1: PPSurf computes the occupancy probability at a query point $\mathbf{x}$ given a noisy point cloud $P$. A global branch processes a sparse subset $P^{\prime}\subseteq P$ using point convolutions, followed by an attention-based interpolation to get features at $\mathbf{x}$ that capture the coarse shape of the point cloud. A local branch processes a local patch $P_{\mathbf{x}}\subset P$ using a PointNet [QSMG17] with attention-based aggregation to get features at $\mathbf{x}$ that capture the detailed shape of the point cloud near $\mathbf{x}$. Global and local features are aggregated to compute the occupancy probability at $\mathbf{x}$.

3 Method

The goal of our method is to take as input an unoriented point cloud $P=\{\mathbf{p}_{1},\mathbf{p}_{2},\dots,\mathbf{p}_{n}\}$ that was sampled from an unknown watertight surface $\mathcal{S}^{\text{gt}}$ with a noisy sampling process, and output a surface $\mathcal{S}$ that approximates $\mathcal{S}^{\text{gt}}$ as closely as possible. Similar to several previous approaches, we define the surface $\mathcal{S}$ using an implicit representation, since this guarantees watertightness and naturally handles arbitrary surface topology in a smooth and differentiable way. More specifically, $\mathcal{S}$ is defined as the $0.5$-level set of an occupancy field $o(\mathbf{x})$: $\mathcal{S}\coloneqq\{\mathbf{x}\ |\ o(\mathbf{x})=0.5\}$.

We train a network $f_{\theta}(\mathbf{x},P)$ with parameters $\theta$ to model the field $o$ given a point cloud $P$:

$$o(\mathbf{x})\coloneqq f_{\theta}(\mathbf{x},P) \tag{1}$$

The network $f$ uses two branches: i) a global branch $f^{g}(\mathbf{x},P^{\prime})$ that performs point convolutions [BPM20] on a sparse random subset of points $P^{\prime}\subseteq P$ and effectively learns a global prior over the coarse shape of $\mathcal{S}$ given the input points $P$, and ii) a local branch $f^{l}(\mathbf{x},P_{\mathbf{x}})$ that processes a small local patch $P_{\mathbf{x}}\subset P$ around $\mathbf{x}$ and effectively learns a local prior over the detailed shape of local surface patches. Each branch outputs a feature vector for a given query point $\mathbf{x}$. The two feature vectors are combined into a single feature vector before being processed by a small MLP $f^{o}$ that outputs the occupancy probability $o(\mathbf{x})$:

$$f_{\theta}(\mathbf{x},P)\coloneqq f^{o}\big(f^{g}(\mathbf{x},P^{\prime})\oplus f^{l}(\mathbf{x},P_{\mathbf{x}})\big), \tag{2}$$

where $\oplus$ is the operation used to combine the two feature vectors, a sum in our experiments. Here, we omit the parameters of the networks $f^{o}$, $f^{g}$, and $f^{l}$ to avoid a cluttered notation. Figure 1 illustrates our architecture.
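As a rough illustration of Eq. 2, the following sketch combines two per-query feature vectors with an element-wise sum and maps the result to an occupancy probability. The feature values, the single-layer head `occupancy_head`, and its weights are hypothetical placeholders chosen for readability; they are not the trained networks $f^{g}$, $f^{l}$, and $f^{o}$.

```python
import math

def combine_sum(feat_global, feat_local):
    """The combination operator of Eq. 2, here an element-wise sum."""
    if len(feat_global) != len(feat_local):
        raise ValueError("sum aggregation needs feature vectors of equal length")
    return [g + l for g, l in zip(feat_global, feat_local)]

def occupancy_head(features, weights, bias):
    """Hypothetical stand-in for f^o: one linear layer followed by a sigmoid."""
    logit = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Made-up branch outputs for a single query point x
feat_global = [0.2, -1.0, 0.5]   # would come from f^g(x, P')
feat_local = [0.1, 0.4, -0.5]    # would come from f^l(x, P_x)

combined = combine_sum(feat_global, feat_local)
occupancy = occupancy_head(combined, weights=[1.0, 1.0, 1.0], bias=0.0)

# The surface is the 0.5-level set, so 0.5 separates the two sides of it
above_level = occupancy > 0.5
print(combined, round(occupancy, 4), above_level)
```

Because the sum requires both branches to emit feature vectors of the same length, the sketch checks this explicitly; a concatenation would lift that constraint at the cost of a wider input to the head.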

In the following, we describe the architecture of PPSurf, including the global and local branches in Section 3.1, followed by a description of the training and inference setups in Sections 3.2 and 3.3, respectively.

3.1 Architecture

Global Branch

The global branch $f^{g}(\mathbf{x},P^{\prime})$ takes as input a random subset $P^{\prime}\subseteq P$ and a 3D query point $\mathbf{x}$ and outputs a global feature vector for the point $\mathbf{x}$, which encodes information about the coarse shape of the point cloud. We implement the global branch using POCO [BM22], which consists of two main components: i) a point convolution module that computes a feature vector $\mathbf{z}^{\prime}_{i}$ for each sparse point $\mathbf{p}^{\prime}_{i}\in P^{\prime}$, followed by ii) an interpolation module that interpolates the feature vectors $\mathbf{z}^{\prime}_{i}$ to get the global feature vector at point $\mathbf{x}$.

The point convolution module uses FKAConv [BPM20] to process the sparse point cloud $P^{\prime}$ into a feature vector for each point:

$$Z^{\prime}=\text{FKAConv}(P^{\prime}), \tag{3}$$

where $Z^{\prime}=\{\mathbf{z}^{\prime}_{1},\mathbf{z}^{\prime}_{2},\dots,\mathbf{z}^{\prime}_{|P^{\prime}|}\}$ is the set of feature vectors at each sparse point. Due to limitations both in performance and network capacity, convolutions can only be performed on the sparse subset $P^{\prime}$ instead of the full point cloud $P$, with $|P^{\prime}|=10\text{k}$ in our experiments. This module consists of 10 layers of convolutions. Each layer uses a convolution kernel that operates over the 16 nearest neighbors of each point.

Given a query point $\mathbf{x}$, the interpolation module interpolates the feature vectors $\mathbf{z}^{\prime}_{i}$ at the nearest neighbors $\mathcal{N}^{\prime}_{\mathbf{x}}$ of the query point to get the global feature vector using an attention-based weighting:

$$f^{g}(\mathbf{x},P^{\prime})\coloneqq f^{gb}\Big(\sum_{j\in\mathcal{N}^{\prime}_{\mathbf{x}}}w_{\mathbf{x},j}\ f^{ga}\big((\mathbf{x}-\mathbf{p}^{\prime}_{j})\,\|\,\mathbf{z}^{\prime}_{j}\big)\Big) \tag{4}$$

$$\text{with } w_{\mathbf{x},j}\coloneqq\frac{1}{64}\sum_{k=1}^{64}\text{softmax}_{j}\ f^{gw}_{k}\big((\mathbf{x}-\mathbf{p}^{\prime}_{j})\,\|\,\mathbf{z}^{\prime}_{j}\big), \tag{5}$$

where $\|$ denotes concatenation, $f^{ga}$ and $f^{gb}$ are two MLPs that transform the feature vectors before and after the weighted sum, and $f^{gw}_{k}$ are learned weighting functions, each implemented as a single linear layer. Analogous to the attention heads in multi-head attention, multiple different weighting functions are used as a form of ensemble learning, $64$ in our experiments. Note that when evaluating multiple query points $\mathbf{x}$ for a point cloud, the point convolution module only needs to be evaluated once, while the interpolation module needs to be evaluated once per query point.
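The weighting in Eqs. 4 and 5 can be sketched in a few lines of plain Python. This is a minimal illustration under simplifying assumptions: each weighting function is a single linear layer given as a `(weight_vector, bias)` pair, the MLPs $f^{ga}$ and $f^{gb}$ default to the identity, and only two heads and two neighbors are used instead of the 64 heads in the text. The function names are hypothetical and do not correspond to the POCO or PPSurf code.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def interpolation_weights(inputs, heads):
    """Eq. 5: softmax over the neighbours for each head, then the mean over heads."""
    per_head = []
    for w, b in heads:
        scores = [sum(wi * ui for wi, ui in zip(w, u)) + b for u in inputs]
        per_head.append(softmax(scores))
    return [sum(h[j] for h in per_head) / len(heads) for j in range(len(inputs))]

def interpolate(query, neighbours, features, heads,
                f_ga=lambda u: u, f_gb=lambda v: v):
    """Eq. 4 with identity stand-ins for the MLPs f^ga and f^gb."""
    # Per-neighbour input: relative position concatenated with the point feature
    inputs = [[q - p for q, p in zip(query, pt)] + list(z)
              for pt, z in zip(neighbours, features)]
    weights = interpolation_weights(inputs, heads)
    transformed = [f_ga(u) for u in inputs]
    dim = len(transformed[0])
    pooled = [sum(w * t[d] for w, t in zip(weights, transformed))
              for d in range(dim)]
    return f_gb(pooled), weights

query = (0.0, 0.0, 0.0)
neighbours = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
features = [[1.0], [3.0]]                      # one scalar feature per neighbour
heads = [([0.0, 0.0, 0.0, 0.0], 0.0),          # head 1: uniform weights
         ([0.0, 0.0, 0.0, 1.0], 0.0)]          # head 2: prefers large features
feature, weights = interpolate(query, neighbours, features, heads)
print(weights, feature)
```

Averaging the per-head softmax outputs keeps the final weights non-negative and summing to one, so the interpolated feature stays a convex combination of the transformed neighbor features.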

Local Branch

The local branch $f^{l}(\mathbf{x},P_{\mathbf{x}})$ processes a local patch $P_{\mathbf{x}}$ around the query point $\mathbf{x}$ and outputs a local feature vector for the point $\mathbf{x}$, which encodes information about the detailed shape of the point cloud near $\mathbf{x}$. We base the local branch on the popular PointNet [QSMG17] architecture, which has been successfully applied in various methods that process local point cloud patches [GKOM18, RLBG*19]. We modify the architecture with an attention-based aggregation, instead of the original max- or sum-based aggregation, which we found to improve performance.

We define the local patch $P_{\mathbf{x}}$ as the $50$ nearest neighbors of the query point $\mathbf{x}$. We normalize the patch by centering it at the origin and scaling it to fit into a unit sphere, obtaining the normalized patch $\bar{P}_{\mathbf{x}}$. Subsequently, we apply PointNet with attention-based aggregation similar to Eqs. 4 and 5, but without using multiple attention heads:

$$f^{l}(\mathbf{x},P_{\mathbf{x}})\coloneqq f^{lb}\Big(\sum_{\bar{\mathbf{p}}_{j}\in\bar{P}_{\mathbf{x}}}v_{j}\ f^{la}(\bar{\mathbf{p}}_{j})\Big) \tag{6}$$

$$\text{with } v_{j}\coloneqq\text{softmax}_{j}\ f^{lv}\big(f^{la}(\bar{\mathbf{p}}_{j})\big), \tag{7}$$

where $f^{lv}$ is a learned weighting function implemented as a linear layer, and $f^{la}$, $f^{lb}$ are two MLPs that transform the feature vectors before and after the weighted aggregation.
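The patch construction and the aggregation of Eqs. 6 and 7 can be sketched as follows. Several details are assumptions made for illustration: the patch is centered on the query point (the text only says it is centered at the origin), the nearest-neighbor search is brute force, the per-point network $f^{la}$ is replaced by the identity so that the features are just the normalized coordinates, and $f^{lb}$ is omitted. The helper names are hypothetical.

```python
import math
import random

def local_patch(points, query, k=50):
    """Brute-force k nearest neighbours of the query point."""
    return sorted(points, key=lambda p: math.dist(p, query))[:k]

def normalize_patch(patch, query):
    """Center on the query point (an assumption) and scale into the unit sphere."""
    centered = [tuple(c - q for c, q in zip(p, query)) for p in patch]
    radius = max(math.hypot(*p) for p in centered)
    if radius == 0.0:
        return centered
    return [tuple(c / radius for c in p) for p in centered]

def attention_pool(features, score_weights, score_bias=0.0):
    """Eqs. 6-7 without f^lb: softmax-weighted sum of per-point features."""
    scores = [sum(w * f for w, f in zip(score_weights, feat)) + score_bias
              for feat in features]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(v * feat[d] for v, feat in zip(weights, features))
              for d in range(dim)]
    return pooled, weights

rng = random.Random(0)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
query = (0.5, 0.5, 0.5)

patch = normalize_patch(local_patch(cloud, query), query)
max_norm = max(math.hypot(*p) for p in patch)

# Zero scores give uniform weights, i.e. plain mean pooling
pooled_uniform, v_uniform = attention_pool(patch, score_weights=[0.0, 0.0, 0.0])
# Non-zero scores shift weight towards points with a large x coordinate
pooled_peaked, v_peaked = attention_pool(patch, score_weights=[5.0, 0.0, 0.0])
print(len(patch), max_norm, pooled_uniform[0], pooled_peaked[0])
```

With all-zero scores the aggregation reduces to mean pooling, and as the scores become more peaked it concentrates on the highest-scoring point, which is one way to see why a learned weighting can sit between the fixed aggregations mentioned in the text.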

3.2 Training Setup

We train our network with a binary cross-entropy loss $\text{BCE}(o(\mathbf{x}),o^{\text{gt}}(\mathbf{x}))$ supervised by the ground-truth occupancy $o^{\text{gt}}(\mathbf{x})$ on query points defined by the Points2Surf ABC var-noise training set [EGO*20]. We train with AdamW (lr=0.001, betas=(0.9, 0.999), eps=1e-5, weight_decay=1e-2, amsgrad=False) for 150 epochs with scheduler steps at 75 and 125 epochs. On our training machine, we can fully utilize all 4 NVIDIA A40 GPUs with distributed data-parallel training using a total batch size of 50 and 48 workers. The other hyperparameters are mostly based on POCO, namely $10k$ manifold points, a network decoder $k$ of 64, and 2 output classes. One change is the increased latent size of 128, which was 32 in POCO. The additional hyperparameters for the local branch are a PointNet latent size of 256 and a patch size of 50. The training takes about 5 hours.
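For readers who want to see the loss and the step schedule spelled out, here is a small sketch in plain Python. The clamp `eps` in `bce` and the decay factor `gamma=0.1` in `learning_rate` are assumptions made for this example; the text states the milestones (epochs 75 and 125) but not the factor by which the learning rate is reduced.

```python
import math

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy for one query point; pred is clamped for stability."""
    p = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def learning_rate(epoch, base_lr=0.001, milestones=(75, 125), gamma=0.1):
    """Step schedule: multiply by gamma at each milestone (gamma is assumed)."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops

# An undecided prediction costs log(2); a confident wrong one costs much more
loss_undecided = bce(0.5, 1.0)
loss_good = bce(0.9, 1.0)
loss_bad = bce(0.1, 1.0)

schedule = [learning_rate(e) for e in (0, 74, 75, 124, 125, 149)]
print(loss_undecided, loss_good, loss_bad, schedule)
```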

3.3 Inference Setup

We use the inference setup from POCO [BM22], which differs from the training setup in two main aspects: First, we perform test-time augmentation in our global branch to obtain more reliable results. Second, we sample query points in a grid and use a variant of marching cubes to reconstruct a mesh. We describe both in more detail below.

Test-time augmentation.

The sparse subsample $P' \subseteq P$ used for the global branch may miss important geometric detail. To improve robustness, we compute the per-point feature vectors $\mathbf{z}'_i$ for multiple different random subsamples $P'_1, P'_2, \dots$, until each point in $P$ is included in at least $10$ subsamples. The $\geq 10$ different feature vectors for each point in $P$ are then averaged before performing the interpolation step.
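The covering and averaging logic can be sketched as follows. This is a simplified illustration that assumes uniform random subsampling without replacement and represents features as plain lists; the function names and the way features are passed in are hypothetical and do not reflect the POCO implementation.

```python
import random

def covering_subsamples(num_points, subsample_size, min_count=10, seed=0):
    """Draw random subsamples until every point index occurs at least min_count times."""
    rng = random.Random(seed)
    counts = [0] * num_points
    subsamples = []
    while min(counts) < min_count:
        idx = rng.sample(range(num_points), subsample_size)
        for i in idx:
            counts[i] += 1
        subsamples.append(idx)
    return subsamples, counts

def average_features(num_points, subsamples, per_subsample_features):
    """Average the feature vectors that each point received across subsamples.

    per_subsample_features[s][k] is the feature vector of point subsamples[s][k].
    Assumes every point occurs in at least one subsample.
    """
    sums = [None] * num_points
    counts = [0] * num_points
    for idx, feats in zip(subsamples, per_subsample_features):
        for i, f in zip(idx, feats):
            sums[i] = list(f) if sums[i] is None else [a + b for a, b in zip(sums[i], f)]
            counts[i] += 1
    return [[v / counts[i] for v in sums[i]] for i in range(num_points)]
```

In practice, the features for each subsample would come from one forward pass of the global branch on that subsample.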

Mesh reconstruction.

We place query points in a $257^3$ grid and use a variant of marching cubes [LC87] proposed in POCO to obtain a mesh from the occupancy field $o(\mathbf{x})$. That marching cubes variant uses a region-growing strategy starting from the input points to avoid the costly evaluation at all grid points, and super-samples marching-cube edges that intersect a surface to get a more accurate estimate of the intersection point.
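To give an intuition for refining the intersection point on a grid edge, the sketch below locates the occupancy crossing along a segment. It uses plain bisection as a stand-in for the super-sampling used in POCO, so it should be read as an illustration of the idea rather than the actual procedure. The callable `occupancy`, the threshold of 0.5 and the number of steps are assumptions.

```python
def refine_crossing(occupancy, a, b, steps=8, threshold=0.5):
    """Estimate where the occupancy crosses the threshold on the segment a-b.

    occupancy: callable mapping a 3D point (tuple) to a value in [0, 1].
    a, b:      edge endpoints lying on opposite sides of the surface.
    Uses bisection, which is a swapped-in technique for this sketch.
    """
    inside_a = occupancy(a) > threshold
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        p = tuple(x + mid * (y - x) for x, y in zip(a, b))
        if (occupancy(p) > threshold) == inside_a:
            lo = mid  # still on the same side as a
        else:
            hi = mid  # already on the side of b
    t = 0.5 * (lo + hi)
    return tuple(x + t * (y - x) for x, y in zip(a, b))
```

Each step halves the uncertainty along the edge, at the cost of one additional occupancy evaluation.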

4 Results

We evaluate PPSurf by comparing our surface reconstruction performance to several state-of-the-art methods, both data-driven and non-data-driven. We show both quantitative and qualitative comparisons in Section 4.1. Additionally, we provide an ablation to empirically validate our main design choices in Section 4.2.

Metrics

We use three well-known metrics to evaluate the error of our reconstructed surfaces: the Chamfer distance, the F1-score, and the normal error. We evaluate each metric at $100k$ random surface samples for the Chamfer distance and normal error, or volume samples for the IoU. This results in roughly $\pm 0.5\%$ variance between different runs.

The Chamfer distance [BTBW77, FSG17] measures the distance between two point sets. We use it to measure the distance between reconstructed and GT surface samples. It is defined as:

$$\frac{1}{|A|}\sum_{\mathbf{p}_i \in A}\min_{\mathbf{p}_j \in B}\|\mathbf{p}_i - \mathbf{p}_j\|^2_2 \ + \frac{1}{|B|}\sum_{\mathbf{p}_j \in B}\min_{\mathbf{p}_i \in A}\|\mathbf{p}_j - \mathbf{p}_i\|^2_2, \quad (8)$$

where $A$ and $B$ are point sets of size $100k$ sampled on the surface of the GT object and the reconstructed object, respectively.
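Eq. 8 can be written down directly as a brute-force computation. The following sketch is meant for small point sets only; for $100k$ samples, a spatial search structure would be needed, which is left out here. The function name is illustrative.

```python
def chamfer_distance(A, B):
    """Symmetric Chamfer distance with squared Euclidean distances (Eq. 8).

    A, B: non-empty lists of points given as equally long tuples.
    Brute force, quadratic in the number of points.
    """
    def sq_dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    def one_sided(X, Y):
        # Mean over X of the squared distance to the nearest point in Y.
        return sum(min(sq_dist(p, q) for q in Y) for p in X) / len(X)

    return one_sided(A, B) + one_sided(B, A)
```

For example, with A = [(0,0,0), (1,0,0)] and B = [(0,0,0)], the first term is 0.5 and the second term is 0.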

The F1 Score [TH15] measures the overlap between the ground truth surface and the region enclosed by the reconstructed surface, similar to the IoU. It weights precision and recall equally.
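As a small worked example, the F1 score is the harmonic mean of precision and recall. The sketch below assumes that overlap is evaluated on binary inside/outside labels at a shared set of volume samples; this sampling setup is an assumption of the example.

```python
def f1_score(pred, gt):
    """F1 score of binary predictions against binary ground truth.

    pred, gt: equally long sequences of 0/1 (or False/True) labels,
    e.g. inside/outside labels at volume samples (assumed setup).
    """
    tp = sum(1 for p, g in zip(pred, gt) if p and g)
    fp = sum(1 for p, g in zip(pred, gt) if p and not g)
    fn = sum(1 for p, g in zip(pred, gt) if not p and g)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)
```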

The normal error measures the difference between the normals of the reconstructed surface and the ground truth normals. We sample $100k$ points uniformly on the ground truth mesh $A$ and the reconstructed mesh $B$, storing the normals of their originating faces. Then, we find the closest neighbor of each point $b \in B$ in $A$. We report the average angle between the normals of these point pairs: $\frac{1}{n_s}\sum_{i=1}^{n_s}\arccos(\mathbf{n}^A_i \cdot \mathbf{n}^B_i)$, where $\mathbf{n}^A_i$ and $\mathbf{n}^B_i$ are ground truth and reconstructed normals, respectively.
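The normal error described above can be sketched in a few lines. The example assumes unit-length normals and uses a brute-force nearest-neighbor search, which is only practical for small sample counts. The function name and argument layout are illustrative.

```python
import math

def normal_error(points_a, normals_a, points_b, normals_b):
    """Mean angle (radians) between each normal in B and the normal of its nearest point in A.

    points_a, normals_a: ground-truth samples and their unit normals.
    points_b, normals_b: reconstructed samples and their unit normals.
    """
    total = 0.0
    for pb, nb in zip(points_b, normals_b):
        # Index of the closest ground-truth sample (brute force).
        j = min(range(len(points_a)),
                key=lambda k: sum((x - y) ** 2 for x, y in zip(points_a[k], pb)))
        dot = sum(x * y for x, y in zip(normals_a[j], nb))
        # Clamp to the valid arccos domain to guard against rounding.
        total += math.acos(max(-1.0, min(1.0, dot)))
    return total / len(points_b)
```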

Datasets

We evaluate our method on the set of dataset variants introduced in P2S [EGO*20]:

Figure 2: Point cloud examples of the data sets used in our evaluation.

All synthetic point clouds were created with the simulated scanner BlenSor [GKUP11] with a scanner resolution of $176 \times 144$, using a random number of scans between $5$ and $30$. Each dataset comes in up to five variants:

  • no noise: A version without noise.

  • med. noise: A version with noise using a standard deviation of $0.01L$, where $L$ is the largest side of the object's bounding box.

  • high noise: A version with noise using a standard deviation of $0.05L$.

  • var. noise: A version with variable noise, where the amount of noise used for a given shape is sampled uniformly in $[0, 0.05L]$ and the number of scans in $[5, 30]$.

  • sparse: A version with medium noise where all shapes use only $5$ scans, resulting in point clouds between $2k$ and $22k$ points.

  • dense: A version with medium noise where all shapes use $30$ scans, resulting in point clouds between $5k$ and $112k$ points.
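The noise levels of these variants can be made concrete with a short sketch that derives the standard deviation from the bounding box. Note that the datasets were generated with the BlenSor scanner simulation; the isotropic Gaussian perturbation in `add_gaussian_noise` is a simplifying assumption for illustration, and all function names are hypothetical. The sparse and dense variants use the medium noise level.

```python
import random

def bounding_box_extent(points):
    """Largest side L of the axis-aligned bounding box of 3D points."""
    return max(max(p[d] for p in points) - min(p[d] for p in points) for d in range(3))

def noise_sigma(points, variant, rng):
    """Noise standard deviation for a dataset variant, relative to L."""
    L = bounding_box_extent(points)
    if variant == "var. noise":
        # Noise amount sampled uniformly in [0, 0.05 L] per shape.
        return rng.uniform(0.0, 0.05) * L
    factors = {"no noise": 0.0, "med. noise": 0.01, "high noise": 0.05}
    return factors[variant] * L

def add_gaussian_noise(points, sigma, rng):
    """Perturb every coordinate with zero-mean Gaussian noise (simplified model)."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]
```

For an object whose largest bounding-box side is 2, the medium noise level corresponds to a standard deviation of 0.02.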

For a fair comparison, we train all data-driven methods on the ABC var. noise dataset and evaluate them with each test set. Some point cloud examples of these data sets are illustrated in Figure 2.

4.1 Comparisons

We compare PPSurf to several recent data-driven and non-data-driven reconstruction methods. PGR [LXSW22], Neural-IMLS (IMLS) [WWW*23] and Shape as Points (SAP-O) are non-data-driven methods that do not train on a large dataset and instead directly fit a surface to the input point cloud. Shape as Points also has a data-driven variant (SAP) that uses a trained network. Additionally, we use Points2Surf (P2S) [EGO*20] and POCO [BM22] as data-driven methods. We took the best available variants and settings for each method: For PGR, we use the default parameters wmin=0.0015, alpha=1.05 for no noise, med. noise and var. noise. We use the following adapted parameters for the other datasets: wmin=0.03, alpha=2.0 for high noise, and wmin=0.03, alpha=1.5 for dense and sparse. We use thingi-noisy for SAP-O, vanilla for P2S, and 10k-FKAConv-InterpAttentionKHeadsNet for POCO. We used the provided noise-large configuration for SAP. For IMLS, we used the results provided by the authors (high noise datasets were not provided by the authors). Note that IMLS was developed concurrently with our work.

Qualitative Comparison

Figure 3 shows comparisons for one example of each dataset variant. While non-data-driven methods give competitive results on low-noise inputs, PPSurf has a clear advantage with sparse and noisy point clouds.

Figure 3: Qualitative comparison to all baselines. We evaluate one example from each dataset variant (except for the no-noise variants, where we only show one example due to space constraints). Colors show the distance of the reconstructed surface to the ground-truth surface. Due to our combined local and global branches, PPSurf reconstructs details more accurately than the baselines, especially in the presence of strong input noise. Note that results for Neural IMLS are not provided by the authors for the high-noise dataset variants. See the supplementary material for a qualitative comparison on all shapes in our test sets.

We show examples on real-world point clouds in Figure 4, where PPSurf produces clearer edges and finer details.

Figure 4: Real-world reconstructions. We compare to all baselines on the two point clouds that were obtained from real-world objects.

Quantitative Comparison

Table 1 shows the performance of PPSurf on all dataset variants. We report the average over all shapes in the test set. Similar to the qualitative results, POCO, PPSurf and the non-data-driven methods share the first place in most low-noise dataset variants, but PPSurf 50NN takes the lead in almost all other dataset variants. This confirms that adding the local branch does indeed improve the local reconstruction.

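Table 1: Comparison of reconstruction errors. We show the Chamfer distance, F1 Score and normal error between reconstructed and ground-truth surfaces averaged over all shapes in a dataset. Apart from a few noise-free datasets, PPSurf consistently performs similarly to or better than the baselines. Note that the mean performance of Neural IMLS does not include results of the high noise datasets, which are likely to favour PPSurf. Due to out-of-memory errors, PGR could not reconstruct all shapes, which are ignored here. Best results per row are marked in bold and the second-best results are underlined.

Chamfer Distance (x100), lower is better

| Dataset | IMLS | PGR | SAP-O | SAP | P2S | POCO | PPSurf |
|---|---|---|---|---|---|---|---|
| ABC var. noise | 1.08 | 1.60 | 1.18 | 1.18 | 0.84 | 0.70 | 0.66 |
| ABC no noise | 0.48 | 0.53 | 0.63 | 1.08 | 0.61 | 0.50 | 0.48 |
| Famous no noise | 0.35 | 0.36 | 0.35 | 0.99 | 0.46 | 0.39 | 0.37 |
| Thingi10k no noise | 0.40 | 0.30 | 0.43 | 0.89 | 0.39 | 0.33 | 0.33 |
| Mean no noise | 0.41 | 0.40 | 0.47 | 0.99 | 0.49 | 0.41 | 0.39 |
| Famous med. noise | 0.54 | 0.95 | 0.58 | 1.06 | 0.52 | 0.49 | 0.48 |
| Thingi10k med. noise | 0.58 | 0.93 | 0.56 | 0.93 | 0.44 | 0.39 | 0.38 |
| Mean med. noise | 0.56 | 0.94 | 0.57 | 0.99 | 0.48 | 0.44 | 0.43 |
| ABC high noise | - | 1.90 | 1.96 | 1.51 | 1.24 | 1.00 | 0.97 |
| Famous high noise | - | 1.86 | 1.80 | 1.62 | 1.14 | 1.11 | 1.01 |
| Thingi10k high noise | - | 1.94 | 1.89 | 1.45 | 1.08 | 0.92 | 0.83 |
| Mean high noise | - | 1.90 | 1.88 | 1.53 | 1.16 | 1.01 | 0.94 |
| Famous sparse | 0.90 | 0.88 | 0.71 | 1.24 | 0.77 | 0.67 | 0.64 |
| Thingi10k sparse | 0.82 | 0.89 | 0.86 | 1.35 | 0.78 | 0.63 | 0.63 |
| Mean sparse | 0.86 | 0.88 | 0.79 | 1.29 | 0.77 | 0.65 | 0.63 |
| Famous dense | 0.45 | 0.70 | 0.53 | 0.96 | 0.41 | 0.42 | 0.40 |
| Thingi10k dense | 0.49 | 0.67 | 0.54 | 0.88 | 0.36 | 0.35 | 0.33 |
| Mean dense | 0.47 | 0.69 | 0.53 | 0.92 | 0.39 | 0.39 | 0.37 |
| Mean overall | 0.61 | 1.04 | 0.93 | 1.16 | 0.70 | 0.61 | 0.58 |

F1 Score, higher is better

| Dataset | IMLS | PGR | SAP-O | SAP | P2S | POCO | PPSurf |
|---|---|---|---|---|---|---|---|
| ABC var. noise | 0.78 | 0.50 | 0.67 | 0.79 | 0.83 | 0.89 | 0.90 |
| ABC no noise | 0.92 | 0.92 | 0.88 | 0.80 | 0.88 | 0.94 | 0.94 |
| Famous no noise | 0.95 | 0.95 | 0.95 | 0.84 | 0.93 | 0.95 | 0.96 |
| Thingi10k no noise | 0.93 | 0.95 | 0.92 | 0.85 | 0.93 | 0.95 | 0.96 |
| Mean no noise | 0.93 | 0.94 | 0.92 | 0.83 | 0.91 | 0.95 | 0.95 |
| Famous med. noise | 0.91 | 0.60 | 0.90 | 0.83 | 0.92 | 0.94 | 0.93 |
| Thingi10k med. noise | 0.90 | 0.57 | 0.89 | 0.85 | 0.92 | 0.94 | 0.94 |
| Mean med. noise | 0.91 | 0.59 | 0.89 | 0.84 | 0.92 | 0.94 | 0.94 |
| ABC high noise | - | 0.42 | 0.49 | 0.75 | 0.78 | 0.84 | 0.85 |
| Famous high noise | - | 0.50 | 0.59 | 0.78 | 0.84 | 0.84 | 0.85 |
| Thingi10k high noise | - | 0.51 | 0.60 | 0.80 | 0.84 | 0.87 | 0.88 |
| Mean high noise | - | 0.32 | 0.56 | 0.78 | 0.82 | 0.85 | 0.86 |
| Famous sparse | 0.86 | 0.88 | 0.88 | 0.74 | 0.89 | 0.92 | 0.92 |
| Thingi10k sparse | 0.85 | 0.86 | 0.84 | 0.73 | 0.87 | 0.90 | 0.90 |
| Mean sparse | 0.86 | 0.87 | 0.86 | 0.73 | 0.88 | 0.91 | 0.91 |
| Famous dense | 0.93 | 0.43 | 0.90 | 0.86 | 0.94 | 0.95 | 0.95 |
| Thingi10k dense | 0.91 | 0.47 | 0.89 | 0.87 | 0.94 | 0.95 | 0.96 |
| Mean dense | 0.92 | 0.45 | 0.89 | 0.86 | 0.94 | 0.95 | 0.96 |
| Mean overall | 0.89 | 0.66 | 0.80 | 0.81 | 0.89 | 0.91 | 0.92 |

Normal Error, lower is better

| Dataset | IMLS | PGR | SAP-O | SAP | P2S | POCO | PPSurf |
|---|---|---|---|---|---|---|---|
| ABC var. noise | 0.55 | 1.29 | 1.11 | 0.52 | 0.65 | 0.32 | 0.30 |
| ABC no noise | 0.20 | 0.26 | 0.30 | 0.51 | 0.31 | 0.19 | 0.19 |
| Famous no noise | 0.44 | 0.48 | 0.44 | 0.75 | 0.57 | 0.46 | 0.46 |
| Thingi10k no noise | 0.25 | 0.19 | 0.23 | 0.49 | 0.32 | 0.18 | 0.19 |
| Mean no noise | 0.30 | 0.31 | 0.33 | 0.58 | 0.40 | 0.28 | 0.28 |
| Famous med. noise | 0.57 | 1.35 | 0.91 | 0.78 | 0.63 | 0.53 | 0.54 |
| Thingi10k med. noise | 0.37 | 1.32 | 0.78 | 0.50 | 0.38 | 0.24 | 0.25 |
| Mean med. noise | 0.47 | 1.33 | 0.85 | 0.64 | 0.50 | 0.38 | 0.39 |
| ABC high noise | - | 1.35 | 1.42 | 0.65 | 0.99 | 0.43 | 0.41 |
| Famous high noise | - | 1.35 | 1.39 | 0.91 | 1.04 | 0.76 | 0.72 |
| Thingi10k high noise | - | 1.31 | 1.36 | 0.64 | 0.90 | 0.47 | 0.43 |
| Mean high noise | - | 1.34 | 1.39 | 0.73 | 0.98 | 0.55 | 0.52 |
| Famous sparse | 0.68 | 0.75 | 0.86 | 0.89 | 0.71 | 0.60 | 0.61 |
| Thingi10k sparse | 0.48 | 0.53 | 0.76 | 0.73 | 0.51 | 0.37 | 0.39 |
| Mean sparse | 0.58 | 0.64 | 0.81 | 0.81 | 0.61 | 0.48 | 0.50 |
| Famous dense | 0.52 | 1.33 | 1.00 | 0.74 | 0.59 | 0.49 | 0.48 |
| Thingi10k dense | 0.30 | 1.23 | 0.84 | 0.47 | 0.33 | 0.21 | 0.21 |
| Mean dense | 0.41 | 1.28 | 0.92 | 0.60 | 0.46 | 0.35 | 0.34 |
| Mean overall | 0.43 | 0.98 | 0.88 | 0.66 | 0.61 | 0.40 | 0.40 |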

Computation Time and Memory Consumption

Training PPSurf on the ABC var-noise training set took 5 hours on 4 NVIDIA A40 GPUs and 48 AMD EPYC-Milan cores. We reconstruct all shapes in our test sets on a single A40 and 48 CPU cores. Table 2 lists the timings and memory consumption. While non-data-driven methods tend to be faster than data-driven ones, SAP is a lightning-fast exception. PPSurf with small patch sizes adds negligible overhead compared to POCO. Neural IMLS does not report timings. As it is concurrent work, we could not do our own measurements. PGR is fast, but its memory usage varies strongly with point cloud size, from a few GB to running out of memory (>46GB) on 21 shapes.

Table 2: Comparison of reconstruction times and memory usage. We show the mean reconstruction time per shape and the maximum GPU-memory consumption for each method on the ABC var-noise dataset. 200NN uses a reconstruction batch size of $25k$ instead of $50k$. PGR went out of memory on 21 shapes.

| Method | Time per Shape | Max GPU Memory |
|---|---|---|
| PGR | 1.9 min | >46 GB |
| SAP-O | 1.1 min | 3.8 GB |
| SAP | 0.8 sec | 3.1 GB |
| P2S | 13.5 min | 14.3 GB |
| POCO | 1.6 min | 9.0 GB |
| PPSurf 10NN | 1.6 min | 9.1 GB |
| PPSurf 25NN | 1.7 min | 9.1 GB |
| PPSurf 50NN | 1.9 min | 9.3 GB |
| PPSurf 100NN | 2.6 min | 13.7 GB |
| PPSurf 200NN | 3.5 min | 13.2 GB |

Discussion

For dense and noise-free point clouds, non-data-driven methods such as PGR, SAP-O and especially IMLS are a good option. However, their performance is limited in the presence of typical point-cloud artifacts, due to missing data-driven priors. Data-driven methods such as SAP, P2S, POCO and PPSurf can better deal with such artifacts. SAP is the fastest method but lacks accuracy, possibly due to its very small network. A bigger version could perhaps produce competitive results but would require non-trivial changes to the method.

P2S employs a relatively simple PointNet for global shape encoding, which results in a weak global prior that cannot reach the quality of a more efficient encoder such as FKAConv. Furthermore, it reconstructs noisy surfaces, which is reflected in the relatively high normal error, even with noise-free inputs.

Apart from some noise-free datasets, only POCO is close to PPSurf’s quality. PPSurf achieves similar results on low-noise point clouds, but significantly better reconstructions for noisy point clouds. When predicting the occupancy at the query points, POCO has no direct access to the full point cloud, only to a coarse latent representation. This inability to accurately represent local information is likely the reason why POCO tends to produce blobby structures and over-smooth the reconstructed surfaces. We avoid this by providing a latent code that captures local detail more accurately by adding a local branch that directly encodes dense local patches of the point cloud.

4.2 Ablation

We investigate several design choices in an ablation study on the ABC var-noise test set. Most importantly, Table 3 shows that having both global and local branches yields a major advantage. According to Table 4, the optimal local patch size lies in the range of 25NN to 100NN. Further, Table 5 shows that attention is a better symmetric operation than max, and that concatenating features performs similarly to summing them. Please see the supplementary material for an evaluation of the most relevant variants on all datasets. We compare the following variants of our method:

  • Full is the full method as described in Section 3.

  • For Only Local, we set the global features to zeros, disabling this branch. Based on the results of this experiment, we conclude that this model cannot reliably encode any surface since it lacks global knowledge of the surface to reconstruct.

  • Only Global is similar to POCO as it omits the local branch. The results show that a global prior can help to obtain reliable reconstructions but with lower performance due to the missing fine details.

  • For Sym Max, we replace the attention-based interpolation used in the local branch with the max, effectively making this branch a PointNet [QSMG17]. The results show an advantage for attention.

  • In Merge Cat, we concatenate the features of both branches instead of summing them, which leads to twice the input size for the final MLP. Results show that this is slightly worse than Full.

  • The QPoints variant is the same as Merge Cat, but additionally, we concatenate query point coordinates to the input of the learned weighting function $f^{lv}$. However, this results in a slightly worse performance than Full and even Merge Cat.

  • For the xNN variants, we take the $x$ nearest neighbors for the local subsample. Full is equal to 50NN.
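The structural difference between some of these variants can be illustrated with plain lists. The sketch assumes that the global and local feature vectors have already been brought to the same length before merging; the function names mirror the variant names but are otherwise hypothetical.

```python
def merge_sum(global_feat, local_feat):
    """Merge Sum (used in Full): element-wise sum, output length is unchanged."""
    return [g + l for g, l in zip(global_feat, local_feat)]

def merge_cat(global_feat, local_feat):
    """Merge Cat: concatenation, which doubles the input size of the final MLP."""
    return list(global_feat) + list(local_feat)

def sym_max(features):
    """Sym Max: channel-wise maximum over all points of a patch, as in PointNet."""
    return [max(channel) for channel in zip(*features)]
```

Both `sym_max` and the attention-based aggregation are invariant to the order of the points in a patch; they differ in whether each channel keeps a single point's value or a weighted combination of all points.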

Table 3: Branch Ablation Study. Using the ABC var-noise test set, we compare PPSurf Full to variants with disabled branches. The only-local variant failed to produce some meshes, which are ignored in the metrics. The best results per column are marked in bold.

| Model | Chamfer (x100), lower is better | F1 Score, higher is better | Normal Error, lower is better |
|---|---|---|---|
| Only Local | 2.69 | 0.36 | 1.56 |
| Only Global | 0.70 | 0.89 | 0.33 |
| PPSurf Full | 0.66 | 0.90 | 0.30 |

Table 4: Patch Size Ablation Study. Using the ABC var-noise test set, we compare PPSurf Full (which is 50NN) to variants with different patch sizes. The best results per column are marked in bold.

| Model | Chamfer (x100), lower is better | F1 Score, higher is better | Normal Error, lower is better |
|---|---|---|---|
| PPSurf 10NN | 1.10 | 0.90 | 0.40 |
| PPSurf 25NN | 0.66 | 0.90 | 0.31 |
| PPSurf Full | 0.66 | 0.90 | 0.30 |
| PPSurf 100NN | 0.66 | 0.90 | 0.30 |
| PPSurf 200NN | 0.67 | 0.89 | 0.31 |

Table 5: Miscellaneous Ablation Study. Using the ABC var-noise test set, we compare PPSurf Full (which uses Merge Sum and Sym Att) to more variants. The best results per column are marked in bold.

| Model | Chamfer (x100), lower is better | F1 Score, higher is better | Normal Error, lower is better |
|---|---|---|---|
| PPSurf Sym Max | 1.11 | 0.90 | 0.40 |
| PPSurf QPoints | 0.67 | 0.89 | 0.31 |
| PPSurf Merge Cat | 0.66 | 0.90 | 0.30 |
| PPSurf Full | 0.66 | 0.90 | 0.30 |

4.3 Limitations

Figure 5: Limitations. Our method has difficulty recovering the edges of clean point clouds due to training with noisy point clouds.
Figure 6: Limitations. Our method struggles with reconstructions of large missing areas in the input point cloud since we did not incorporate any generative model capabilities.

Reconstruction times are still non-interactive, due to the need to evaluate the occupancy at a large number of samples. Possibilities for speed-ups include more efficient sampling strategies to use fewer query points.

As our learned priors were trained on noisy data to make PPSurf more robust to noise, they also bias the reconstructed surface to some extent towards the distributions learned by the priors. This results in some loss of accuracy when applied to noise-free point clouds compared to some of the non-data-driven methods (see Figure 5). Learning a prior that is specialized to noise-free point clouds, or including more noise-free point clouds in our training set would alleviate this issue.

While PPSurf is better than the baselines in filling scan shadows, it is not a generative method and cannot generate new geometric detail in large missing regions. This limits the size of missing regions that can be filled with plausible geometry. Combining PPSurf with a generative model would be an interesting direction for future work. See Figure 6 for an example of inaccurately filled scan shadows.

5 Conclusion

In this paper, we have introduced PPSurf as a method for surface reconstruction from raw, unoriented point clouds. In contrast to previous methods, PPSurf incorporates strong local and global priors learned from data. Whilst our global prior is based on a point convolutional neural network that processes the point cloud as a whole, fine details are preserved through the local prior based on dense local point cloud patches. We have shown in extensive studies that PPSurf is able to achieve better surface reconstructions than previous data-driven and non-data-driven methods, being more robust to noise in the input point cloud and preserving fine details at the same time.

In the future, we would like to investigate how modern techniques borrowed from generative models could improve the obtained reconstruction from sparse point clouds where large parts of the shape are missing.

6 Acknowledgements

This work has been supported by the FWF projects P24600-N23 and P32418-N31, the WWTF project ICT19-009 and the EU MSCA-ITN project EVOCATION (grant agreement 813170).

References

  • [AL20] Atzmon M., Lipman Y.: Sal: Sign agnostic learning of shapes from raw data. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020).
  • [AL21] Atzmon M., Lipman Y.: SALD: sign agnostic learning with derivatives. In International Conference on Learning Representations, ICLR (2021).
  • [BM22] Boulch A., Marlet R.: Poco: Point convolution for surface reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2022), pp. 6302–6314.
  • [BPM20] Boulch A., Puy G., Marlet R.: FKAConv: Feature-Kernel Alignment for Point Cloud Convolution. In 15th Asian Conference on Computer Vision (ACCV 2020) (2020).
  • [BTBW77] Barrow H. G., Tenenbaum J. M., Bolles R. C., Wolf H. C.: Parametric correspondence and chamfer matching: Two new techniques for image matching. In Proceedings of the 5th International Joint Conference on Artificial Intelligence - Volume 2 (San Francisco, CA, USA, 1977), IJCAI’77, Morgan Kaufmann Publishers Inc., pp. 659–663. URL: http://dl.acm.org/citation.cfm?id=1622943.1622971.
  • [BYSZ22] Baorui M., Yu-Shen L., Zhizhong H.: Reconstructing surfaces for sparse point clouds with on-surface priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022).
  • [BZYSM21] Baorui M., Zhizhong H., Yu-Shen L., Matthias Z.: Neural-pull: Learning signed distance functions from point clouds by learning to pull space onto surfaces. In International Conference on Machine Learning (ICML) (2021).
  • [CAPM20] Chibane J., Alldieck T., Pons-Moll G.: Implicit functions in feature space for 3d shape reconstruction and completion. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020).
  • [CHL23] Chen C., Han Z., Liu Y.-S.: Unsupervised inference of signed distance functions from single sparse point clouds without learning priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023).
  • [CMPM20] Chibane J., Mir A., Pons-Moll G.: Neural unsigned distance fields for implicit function learning. In Advances in Neural Information Processing Systems (NeurIPS) (December 2020).
  • [CTFZ22] Chen Z., Tagliasacchi A., Funkhouser T., Zhang H.: Neural dual contouring. ACM Transactions on Graphics (Special Issue of SIGGRAPH) 41, 4 (2022).
  • [CZ19] Chen Z., Zhang H.: Learning implicit fields for generative shape modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019).
  • [DDN20] Dai A., Diller C., Nießner M.: Sg-nn: Sparse generative neural networks for self-supervised scene completion of rgb-d scans. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE (2020).
  • [EGO*20] Erler P., Guerrero P., Ohrhallinger S., Mitra N. J., Wimmer M.: Points2Surf: Learning implicit surfaces from point clouds. In European Conference on Computer Vision (ECCV) (2020).
  • [FSG17] Fan H., Su H., Guibas L. J.: A point set generation network for 3d object reconstruction from a single image. In Proceedings of the IEEE conference on computer vision and pattern recognition (2017), pp. 605–613.
  • [GKOM18] Guerrero P., Kleiman Y., Ovsjanikov M., Mitra N. J.: PCPNet: Learning local shape properties from raw point clouds. Computer Graphics Forum 37, 2 (2018), 75–85. doi:10.1111/cgf.13343.
  • [GKUP11] Gschwandtner M., Kwitt R., Uhl A., Pree W.: Blensor: Blender sensor simulation toolbox. In International Symposium on Visual Computing (2011), Springer, pp. 199–208.
  • [GYH*20] Gropp A., Yariv L., Haim N., Atzmon M., Lipman Y.: Implicit geometric regularization for learning shapes. In International Conference on Machine Learning (ICML) (2020).
  • [HCJ19] Huang Z., Carr N., Ju T.: Variational implicit point set surfaces. ACM Trans. Graph. 38, 4 (jul 2019). URL: https://doi.org/10.1145/3306346.3322994, doi:10.1145/3306346.3322994.
  • [HWW*22] Hou F., Wang C., Wang W., Qin H., Qian C., He Y.: Iterative poisson surface reconstruction (ipsr) for unoriented points. ACM Trans. Graph. 41, 4 (jul 2022). URL: https://doi.org/10.1145/3528223.3530096, doi:10.1145/3528223.3530096.
  • [JSM*20] Jiang C., Sud A., Makadia A., Huang J., Nießner M., Funkhouser T.: Local implicit grid representations for 3d scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2020).
  • [KBH06] Kazhdan M., Bolitho M., Hoppe H.: Poisson surface reconstruction. In Proc. of the Eurographics symposium on Geometry processing (2006).
  • [KH13] Kazhdan M., Hoppe H.: Screened poisson surface reconstruction. ACM Transactions on Graphics (ToG) 32, 3 (2013), 29.
  • [KMJ*19] Koch S., Matveev A., Jiang Z., Williams F., Artemov A., Burnaev E., Alexa M., Zorin D., Panozzo D.: Abc: A big cad model dataset for geometric deep learning. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2019).
  • [LC87] Lorensen W. E., Cline H. E.: Marching cubes: A high resolution 3d surface construction algorithm. In ACM siggraph computer graphics (1987), vol. 21, ACM, pp. 163–169.
  • [LESP21] Lionar S., Emtsev D., Svilarkovic D., Peng S.: Dynamic plane convolutional occupancy networks. In Winter Conference on Applications of Computer Vision (WACV) (2021).
  • [LXSW22] Lin S., Xiao D., Shi Z., Wang B.: Surface reconstruction from point clouds without normals by parametrizing the gauss formula. ACM Trans. Graph. 42, 2 (oct 2022). URL: https://doi.org/10.1145/3554730, doi:10.1145/3554730.
  • [MLH23] Ma B., Liu Y.-S., Han Z.: Learning signed distance functions from noisy 3d point clouds via noise to noise mapping. In International Conference on Machine Learning (ICML) (2023).
  • [MON*19] Mescheder L., Oechsle M., Niemeyer M., Nowozin S., Geiger A.: Occupancy networks: Learning 3d reconstruction in function space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019).
  • [MZLH23] Ma B., Zhou J., Liu Y.-S., Han Z.: Towards better gradient consistency for neural signed distance functions via level set alignment. In Conference on Computer Vision and Pattern Recognition (CVPR) (2023).
  • [PFS*19] Park J. J., Florence P., Straub J., Newcombe R., Lovegrove S.: Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (2019), pp. 165–174.
  • [PJL*21] Peng S., Jiang C. M., Liao Y., Niemeyer M., Pollefeys M., Geiger A.: Shape as points: A differentiable poisson solver. In Advances in Neural Information Processing Systems (NeurIPS) (2021).
  • [PNM*20] Peng S., Niemeyer M., Mescheder L., Pollefeys M., Geiger A.: Convolutional occupancy networks. In European Conference on Computer Vision (ECCV) (2020).
  • [QSMG17] Qi C. R., Su H., Mo K., Guibas L. J.: Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (2017), pp. 652–660.
  • [RLBG*19] Rakotosaona M.-J., La Barbera V., Guerrero P., Mitra N. J., Ovsjanikov M.: Pointcleannet: Learning to denoise and remove outliers from dense point clouds. Computer Graphics Forum (2019).
  • [SMB*20] Sitzmann V., Martel J. N., Bergman A. W., Lindell D. B., Wetzstein G.: Implicit neural representations with periodic activation functions. In Advances in Neural Information Processing Systems (NeurIPS) (2020).
  • [STM*21] Siddiqui Y., Thies J., Ma F., Shan Q., Nießner M., Dai A.: Retrievalfuse: Neural 3d scene reconstruction with a database. In International Conference on Computer Vision (ICCV) (2021).
  • [TH15] Taha A. A., Hanbury A.: Metrics for evaluating 3d medical image segmentation: analysis, selection, and tool. BMC medical imaging 15, 1 (2015), 1–28.
  • [TLX*21] Tang J., Lei J., Xu D., Ma F., Jia K., Zhang L.: Sa-convonet: Sign-agnostic optimization of convolutional occupancy networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2021).
  • [UK21] Ummenhofer B., Koltun V.: Adaptive surface reconstruction with multiscale convolutional kernels. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021).
  • [WLT22] Wang P.-S., Liu Y., Tong X.: Dual octree graph networks for learning adaptive volumetric shape representations. ACM Transactions on Graphics (2022).
  • [WSS*19] Williams F., Schneider T., Silva C. T., Zorin D., Bruna J., Panozzo D.: Deep geometric prior for surface reconstruction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019), pp. 10130–10139.
  • [WWW*23] Wang Z., Wang P., Wang P., Dong Q., Gao J., Chen S., Xin S., Tu C., Wang W.: Neural-imls: Self-supervised implicit moving least-squares network for surface reconstruction. IEEE Transactions on Visualization and Computer Graphics (2023), 1–16. doi:10.1109/TVCG.2023.3284233.
  • [XSL*23] Xiao D., Shi Z., Li S., Deng B., Wang B.: Point normal orientation and surface reconstruction by incorporating isovalue constraints to poisson equation. Computer Aided Geometric Design 103 (2023), 102195. URL: https://www.sciencedirect.com/science/article/pii/S0167839623000274, doi:10.1016/j.cagd.2023.102195.
  • [YWOSH20] Yifan W., Wu S., Oztireli C., Sorkine-Hornung O.: Iso-points: Optimizing neural implicit surfaces with hybrid representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020).
  • [ZJ16] Zhou Q., Jacobson A.: Thingi10k: A dataset of 10,000 3d-printing models. arXiv preprint arXiv:1605.04797 (2016).
  • [ZML*22] Zhou J., Ma B., Liu Y.-S., Fang Y., Han Z.: Learning consistency-aware unsigned distance functions progressively from raw point clouds. In Advances in Neural Information Processing Systems (NeurIPS) (2022).
  • [ZTNW23] Zhang B., Tang J., Nießner M., Wonka P.: 3dshape2vecset: A 3d shape representation for neural fields and generative diffusion models. ACM Trans. Graph. 42, 4 (jul 2023). URL: https://doi.org/10.1145/3592442, doi:10.1145/3592442.
  • [ZVW*22] Zeng X., Vahdat A., Williams F., Gojcic Z., Litany O., Fidler S., Kreis K.: Lion: Latent point diffusion models for 3d shape generation. In Advances in Neural Information Processing Systems (NeurIPS) (2022).