Multi-View Neural 3D Reconstruction of Micro-/Nanostructures
with Atomic Force Microscopy

Shuo Chen${}^{1}$, Mao Peng${}^{2}$, Yijin Li${}^{1}$, Bing-Feng Ju${}^{2}$, Hujun Bao${}^{1}$, Yuan-Liu Chen${}^{2*}$, Guofeng Zhang${}^{1*}$

${}^{1}$ State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, China
${}^{2}$ State Key Lab of Fluid Power&Mechatronic Systems, Zhejiang University, Hangzhou 310027, China
{chenshuo.eric, pengmao99, eugenelee, mbfju, baohujun, yuanliuchen, zhangguofeng}@zju.edu.cn

Abstract

Atomic Force Microscopy (AFM) is a widely employed tool for micro-/nanoscale topographic imaging. However, conventional AFM scanning struggles to reconstruct complex 3D micro-/nanostructures precisely due to limitations such as incomplete sample topography capturing and tip-sample convolution artifacts. Here, we propose a multi-view neural-network-based framework with AFM (MVN-AFM), which accurately reconstructs surface models of intricate micro-/nanostructures. Unlike previous works, MVN-AFM does not depend on any specially shaped probes or costly modifications to the AFM system. To achieve this, MVN-AFM uniquely employs an iterative method to align multi-view data and eliminate AFM artifacts simultaneously. Furthermore, we pioneer the application of neural implicit surface reconstruction in nanotechnology and achieve markedly improved results. Extensive experiments show that MVN-AFM effectively eliminates artifacts present in raw AFM images and reconstructs various micro-/nanostructures including complex geometrical microstructures printed via Two-photon Lithography and nanoparticles such as PMMA nanospheres and ZIF-67 nanocrystals. This work presents a cost-effective tool for micro-/nanoscale 3D analysis.

${}^{*}$ Corresponding authors.

1 Introduction

The investigation of the three-dimensional (3D) structure plays a vital role in nanotechnology research, encompassing areas like nanofabrications [1, 2], nanorobots [3, 4], and nanomedicines [5, 6], given its critical relevance to the functional properties of micro-/nanoscale objects. Currently, the Scanning Electron Microscope (SEM) [7] is a prevalent tool for observing the 3D geometry of micro-/nanostructures. This technique involves irradiating the sample with an electron beam and capturing a 2D image by detecting the intensity of secondary electrons emitted from the sample surface. Despite its widespread use and multiple advantages, SEM is a destructive method [8], requires a vacuum environment, and cannot provide accurate height information in the images. In contrast, the Atomic Force Microscope (AFM) [9] acquires precise height information of the sample surface through the forces between its probe and the sample. Moreover, AFM can operate in various environments, is insensitive to the sample material, and is non-destructive.

Nonetheless, conventional AFM comes with its own set of challenges. One primary limitation is that conventional AFM can only capture 2.5D information instead of a complete 3D representation of the sample because the position feedback in conventional AFM systems is confined to the vertical direction [9]. Another significant challenge is the issue of tip-sample convolution [10]. This phenomenon arises from geometrical interactions between the AFM tip and the surface features of the sample (Supplementary Fig. 1). These interactions often lead to artifacts [10, 11, 12] in the scanning results that are inherently difficult to differentiate from the actual sample geometry. Such limitations impede the effective use of AFM to investigate intricate 3D micro-/nanostructures and catalyze the development of advanced AFM technologies, i.e., 3D-AFM.

The advancement of 3D-AFM technology predominantly follows two distinct trajectories. The first approach involves the design of specialized probe shapes aimed at enabling the measurement of structures that are inaccessible with conventional AFM scanning. As an example, critical dimension AFM (CD-AFM) [13, 14], currently a prevalent method for semiconductor structures, utilizes flared tips that enable lateral dithering in addition to the vertical oscillation of the cantilever. These designs equip the AFM with the capability to image not only vertical but also undercut sidewall features of samples. Additionally, there are other creative probe designs, such as the introduction of hinge structures [15], orthogonal cantilevers [16], and probes made of carbon nanotubes with high aspect ratios [17]. However, the extra cost and complexity of manufacturing special probes and customized AFM scanning systems present substantial challenges to the widespread adoption of these methods.

Another common technique for 3D-AFM is the practice of tilting either the probe [18, 19, 20, 21, 22] or the sample [23, 24] to scan micro-/nanostructures from multiple directions. These methods avoid the necessity for specialized flared tips, instead relying on integrating multiple scans into a complete 3D model. As a result, their effectiveness relies on the precision with which the multiple scans are stitched together. Previous methods [18, 19, 20, 21, 22, 23, 24] predominantly apply to simple, well-defined structures, such as gratings. The grating’s relatively straightforward structure facilitates the manual removal of artifacts resulting from tip-sample convolution and simplifies the problem of the tilting method. However, significant challenges arise when the tilting method is applied to micro-/nanostructures of unknown and intricate shapes, such as those created by Two-photon Lithography (TPL) [25, 26, 27, 28] or comprised of diverse nanoparticles [5, 6, 29], a scenario frequently encountered in nanotechnology research. Firstly, the complex surface geometries of these structures make it difficult to manually identify and remove artifacts from the AFM images. Secondly, the existence of these unremoved artifacts in the scans impedes the accuracy of the stitching process. Thirdly, owing to the complex overlapping relationships among multi-view data, simply stitching these data is insufficient for the construction of a clear and accurate 3D model.

In this study, we propose MVN-AFM, a framework that is able to reconstruct the 3D surface model of a wide range of complex-shaped micro-/nanostructures without any specially shaped probes or costly modifications to the AFM system. Our framework leverages the concept of tilting samples, but we extend its application for complex structures beyond the limitations of existing methods. Specifically, we first propose an iterative optimization algorithm to automatically remove AFM artifacts and improve the alignment accuracy of multi-view data from intricate micro-/nanostructures. Subsequently, in order to reconstruct the 3D model of these structures by multi-view AFM data, we draw inspiration from multi-view depth fusion techniques [30, 31, 32, 33, 34, 35] in computer vision. We introduce the neural implicit surface reconstruction methods [36, 37, 38, 39, 40, 41], the recent advance in this field, to utilize a neural network to represent the 3D model of micro-/nanostructures. By employing differentiable volume rendering to train the neural implicit function with multi-view AFM data supervision, we fuse the multi-view scanning results into an accurate and comprehensive 3D model. Furthermore, we conduct extensive experiments to evaluate the capabilities of the MVN-AFM framework. In detail, we utilize the TPL technique to fabricate various 3D microstructures with distinct geometrical characteristics and prepare specimens of commonly used nanoparticles, including polymethyl methacrylate (PMMA) nanospheres [42, 43, 44] and Zeolitic imidazolate framework (ZIF)-67 nanocrystals [45, 46, 47, 48]. MVN-AFM effectively eliminates artifacts present in raw AFM images and successfully reconstructs not only the overall shape but also specific hidden details that are not discernible in conventional AFM scans. 
The ability of MVN-AFM to provide detailed and accurate 3D reconstructions of a broad spectrum of micro-/nanostructures, coupled with its low implementation cost, positions it as a potentially valuable tool in nanotechnology research.

2 Results

2.1 Pipeline of MVN-AFM

Figure 1: The pipeline of MVN-AFM. a First, we place various micro-/nanostructures on a rotatable tilt stage. Second, we rotate the turntable and measure the vertical heights by a conventional AFM, resulting in a set of multi-view AFM images with many artifacts. b Input raw AFM images with artifacts and iterate two sub-steps. In the data alignment process, data judged as artifacts are eliminated before alignment, and the poses of multi-view images are updated. In the mask-solving process, the solved poses transform the multi-view data, and the data consistency is cross-validated to solve the mask of artifacts. c The posed and masked multi-view AFM images are used to train a neural network representing a signed distance field in space by the differentiable volume rendering technique (Supplementary Fig. 5). d The 3D surface model extracted from the signed distance field, and corresponding topography images without artifacts.

The objective of MVN-AFM is to construct a generalized process that can be used for 3D reconstruction of unknown-shaped complex micro-/nanostructures based on multi-view AFM scanning data, relying only on conventional AFM systems and standard probes. MVN-AFM consists of three main steps (Fig. 1): Multi-view AFM Scanning, Data Alignment and Mask Solving, and Neural Implicit Surface Reconstruction.

The step of Multi-view AFM Scanning (Fig. 1a) captures multi-view AFM images of 3D micro-/nanostructures, providing essential geometric information for the subsequent reconstruction process. Previous tilting methods [18, 19, 20, 21, 22, 23, 24] acquire complete geometric information with only two AFM scans towards each sidewall of the grating. However, in nanotechnology research, the prior knowledge of the sample’s shape and orientation is often unknown. To address this, we design a standardized AFM scanning process that is independent of the sample’s shape. This process aims to comprehensively acquire the surface geometric information of unknown and complex-shaped structures. We use the sample-tilting approach to collect multi-view data, thus avoiding modifications to the mechanics of conventional AFM. For this purpose, we designed a rotatable stage with a tilt angle (Supplementary Fig. 2). We carefully designed the size of the whole stage so that it can be used in the limited activity space of a commercial AFM without any collision. The sample is placed on a turntable in the center of the stage so that multi-view scans around the sample can be acquired as it rotates to different directions.

In the step of Data Alignment and Mask Solving (Fig. 1b), we iteratively align multi-view AFM data to a unified coordinate system and remove the artifacts in AFM images. A critical process in the tilting method involves establishing the spatial relationship among multi-view data. This requires determining a coordinate transformation, denoted as pose $T$ , to align data from different views within the same coordinate system (Supplementary Fig. 3). To achieve this, some methods [18, 21, 22] employ designs with high-cost components to enable precise control of probe scanning direction, which allows direct access to $T$ . Others [19, 24, 23] utilize the Iterative Closest Point (ICP) algorithm [49] to solve $T$ by minimizing the distance between AFM data points. The ICP algorithm relies heavily on data free from artifacts that do not represent the actual sample shape. Consequently, these previous methods [19, 24, 23] manually remove highly recognizable artifacts from AFM data of simple structures before using the ICP algorithm. However, for multi-view images obtained by a conventional AFM system on intricate structures, there are two challenges: eliminating artifacts and solving for the pose. Here, we define the label of whether each data point is an artifact as a latent variable, mask $M$ . To simultaneously solve for $T$ and $M$ , we propose an iterative EM-like algorithm [50, 51]. Initially, we consider all AFM data artifact-free, i.e., $M_{0}$ is all zeros, and directly apply the ICP algorithm to obtain a set of coarse poses, $T_{0}$ . In the E-step, we project multi-view data onto each other using $T_{i-1}$ from the previous iteration $i-1$ . We then conduct cross-validation of the projected data to identify areas of inconsistency in multi-view data. These regions are then labeled as ones, and we obtain the updated $M_{i}$ . The motivation for the cross-validation is that artifacts vary with the probe-sample angle, so they are inconsistent in multi-view data. 
In contrast, the sample’s geometric surface remains consistent across different views, regardless of the probe scanning directions. In the M-step, we erase the artifact through $M_{i}$ and apply the ICP algorithm again to compute the updated $T_{i}$ . Iterating the EM steps, the data filtered out of most artifacts by $M$ yields a more precise $T$ . The improved $T$ also makes the cross-validation accurate. Two steps are iteratively performed to enhance each other. After $k$ iterations, we obtain the accurate pose $T_{k}$ and the artifact mask $M_{k}$ for each viewpoint of the AFM data.
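The alternation between mask solving and alignment can be illustrated with a deliberately simplified sketch. It is not the MVN-AFM implementation: each view is a 1D height profile, the pose $T$ is reduced to a single vertical offset, and a mean-difference estimate stands in for the ICP algorithm. The function names, the median-based cross-validation rule, and the threshold are all assumptions made for illustration.

```python
from statistics import median

def estimate_offsets(views, masks, ref=0):
    """M-step stand-in for ICP: vertical offset of each view relative to the
    reference view, using only samples unmasked in both views."""
    offsets = []
    for view, mask in zip(views, masks):
        diffs = [a - b for a, b, ma, mb
                 in zip(view, views[ref], mask, masks[ref]) if not ma and not mb]
        offsets.append(sum(diffs) / len(diffs))
    return offsets

def solve_masks(views, offsets, threshold):
    """E-step: align the views with the current offsets, then flag samples
    that disagree with the cross-view median by more than `threshold`."""
    aligned = [[h - off for h in view] for view, off in zip(views, offsets)]
    consensus = [median(column) for column in zip(*aligned)]
    return [[abs(h - c) > threshold for h, c in zip(row, consensus)]
            for row in aligned]

def align_and_mask(views, iterations=3, threshold=0.3):
    masks = [[False] * len(v) for v in views]           # M_0: no artifacts assumed
    offsets = estimate_offsets(views, masks)            # T_0: coarse poses
    for _ in range(iterations):
        masks = solve_masks(views, offsets, threshold)  # E-step gives M_i
        offsets = estimate_offsets(views, masks)        # M-step gives T_i
    return offsets, masks

def make_toy_views():
    """Hypothetical data: a step profile seen in five views, each with its own
    offset and its own block of +0.8 artifact samples."""
    surface = [0.0] * 5 + [1.0] * 10 + [0.0] * 5
    true_offsets = [0.0, 0.3, -0.2, 0.1, -0.4]
    artifact_ranges = [(0, 2), (3, 8), (8, 10), (10, 16), (16, 19)]
    views, truth = [], []
    for off, (lo, hi) in zip(true_offsets, artifact_ranges):
        flags = [lo <= j < hi for j in range(len(surface))]
        views.append([h + off + (0.8 if f else 0.0) for h, f in zip(surface, flags)])
        truth.append(flags)
    return views, true_offsets, truth
```

On this toy data the coarse offsets are biased by the artifacts, and the offsets recovered after masking match the true ones. The sketch has no guard for a view whose samples are all masked, which a real implementation would need.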

The step of Neural Implicit Surface Reconstruction (Fig. 1c) utilizes the aligned and masked AFM data to train an implicit function represented by a neural network and extract the final 3D surface model of micro-/nanostructures from the network. Specifically, we follow previous work [52] and model the geometry surface of the sample as a neural network encoded Signed Distance Field (SDF): $s(x;\theta):\mathbb{R}^{3}\rightarrow\mathbb{R}$ , where $x$ denotes a 3D position and $\theta$ is the parameters of a Multilayer Perceptron (MLP). The SDF defines a scalar field where each point in space is associated with the shortest distance to a surface. This distance is positive if the point is outside the surface and negative if it is inside. Previously, the neural implicit surface reconstruction methods [41] were developed for posed images from cameras in the macroscopic world, not for nanotechnology and AFM data applications. To adapt this method to our reconstruction process, we convert AFM images into depth maps as captured by virtual orthogonal cameras (Supplementary Fig. 4). Each pixel in AFM images transformed by pose $T$ and filtered by mask $M$ represents a sample ray. The loss function is the disparity between the AFM data and the depth value derived from differentiable volume rendering along the ray. We then optimize the MLP network parameters $\theta$ through back propagation [53]. Moreover, we also use the multiresolution hash encoding technique [54] to accelerate the training process. Upon completing the training, we can query the SDF value of any spatial point by inferring the network. Based on the fact that the zero set of SDF represents the structure surface, the Marching Cubes algorithm [55] is finally utilized to extract the 3D surface model of the micro-/nanostructures (Fig. 1d).
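To make the rendering step concrete, the sketch below computes the expected depth along one orthographic ray through an SDF, using the alpha weights of NeuS [41]. It is a minimal illustration under stated assumptions: an analytic sphere replaces the MLP $s(x;\theta)$, the sampling is uniform, and the `sharpness` value, sample count, and function names are hypothetical choices rather than the settings used in MVN-AFM.

```python
import math

def sphere_sdf(p, radius=0.5):
    """Signed distance to a sphere at the origin: > 0 outside, < 0 inside."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def render_depth(sdf, origin, direction, t_far=4.0, n_samples=512, sharpness=100.0):
    """Expected depth and accumulated weight along one ray (NeuS-style weights)."""
    dt = t_far / n_samples
    ts = [i * dt for i in range(n_samples + 1)]
    # Logistic CDF of the SDF value at every sample point on the ray
    cdf = [sigmoid(sharpness * sdf([o + t * d for o, d in zip(origin, direction)]))
           for t in ts]
    transmittance, depth, total_weight = 1.0, 0.0, 0.0
    for i in range(n_samples):
        # Opacity is non-zero only where the SDF decreases, i.e. entering the surface
        alpha = max((cdf[i] - cdf[i + 1]) / (cdf[i] + 1e-12), 0.0)
        weight = transmittance * alpha
        depth += weight * 0.5 * (ts[i] + ts[i + 1])
        total_weight += weight
        transmittance *= 1.0 - alpha
    return depth, total_weight
```

For an AFM pixel, the ray would start above the sample and point along the scanning axis of that view, and the training loss would compare the rendered depth with the depth implied by the measured height. Here a ray from `(0, 0, 2)` along `(0, 0, -1)` meets the sphere at depth 1.5, while a ray that misses the sphere accumulates almost no weight.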

2.2 Reconstruction of Two-photon Lithography Structures

In this section, we evaluate the proposed MVN-AFM on microstructures printed by TPL technology. The TPL technology, which focuses a femtosecond laser into tiny voxels in a photosensitive resist, enables 3D printing of a given Computer-Aided Design (CAD) model with sub-100 nm resolution through the two-photon polymerization (TPP) process [56]. To fully demonstrate the performance of MVN-AFM on complex 3D microstructures, we printed a set of samples with different geometrical features. Specifically, we printed six structures (Supplementary Fig. 6): cylinder, undercut, spiral, gear, monkey, and house. For centrosymmetric structures, we incorporated three small cones around each microstructure to indicate their orientation, as depicted in the first four rows of Fig. 2. This step is unnecessary for non-centrosymmetric structures, such as the monkey and the house. The height of these microstructures varies between 2 µm and 3.5 µm. We performed AFM scans in tapping mode, with a scan size of 10 µm $\times$ 10 µm and 256 lines of 256 points for each AFM image.

The cylinder (Fig. 2a) is a representative structure that challenges conventional AFM scanning [57] and previous tilting methods. Unlike grating structures, the vertical annular sidewall cannot be divided into distinct left and right sections. Next, the undercut (Fig. 2b) is a prevalent structural feature in semiconductor manufacturing [58]. This structure differs from the cylinder by having a sloped sidewall. We further constructed the gear (Fig. 2c), a mechanical structure frequently encountered in Micro-Electro-Mechanical Systems (MEMS) [59]. The spiral (Fig. 2d) is distinguished by an intricate array of rotating curved concave and convex structures on its sidewall. Furthermore, we conducted tests using the Suzanne Monkey (Fig. 2e), a standard model in computer graphics [60]. Unlike the previous columnar structures, this model poses unique challenges due to its curved features and the indistinct boundary between its top surface and sidewalls. Finally, we designed a house structure (Fig. 2f) that includes shapes with both planar and curved features, along with detailed elements such as grooves on the sidewalls that represent doors and windows.

In conventional AFM scanning, the results are a mixture of incomplete surface geometry and artifacts, which do not accurately represent the sample surface. As illustrated in Fig. 2g and h, despite the vast difference in sidewall geometry, the scanning result of the undercut is indistinguishable from that of the cylinder model. Some detailed features are also virtually invisible, such as the doors and windows in the house model (Fig. 2l). The cross-sectional profiles reveal significant distortion of these scanning results, which may lead researchers to misjudge the actual shape of these samples. Moreover, manually separating artifacts from the AFM scans of these intricate structures is clearly impractical.

In contrast, the proposed MVN-AFM framework effectively eliminates artifacts while precisely merging geometric information from multi-view AFM scanning into accurate and comprehensive 3D models. The reconstructed models agree with the SEM photos and faithfully represent the surfaces of these samples. They clearly differentiate between the cylinder (Fig. 2m) and undercut (Fig. 2n) structures, precisely reconstruct the gear’s teeth (Fig. 2o), and capture the correct orientation of the spiral threads (Fig. 2p) and the monkey’s subtly inward-curving side faces (Fig. 2q). Even the minutely detailed grooves (Fig. 2r) on the house sidewalls are observable.

2.3 Reconstruction of Nanoparticles

To further demonstrate the generalization of MVN-AFM on structures with smaller sizes and different geometry features, we selected some widely used nanoparticles, including PMMA nanospheres and ZIF-67 nanocrystals.

The sphere [61] is a typical nanoparticle shape with extensive applications. The characteristics of nanospheres depend significantly on their size and surface structure [6], making accurate 3D reconstruction valuable for their research. To evaluate the effectiveness of our proposed method on spherical structures, we chose PMMA [42, 43, 44] with a diameter of about 500 nm, a widely used type of polymeric nanosphere. In Fig. 3g and h, it is evident that the artifacts in the conventional AFM scanning data are seamlessly connected with the top curved surface of the nanospheres, and the overall shape does not exhibit a spherical appearance. Under these circumstances, manually distinguishing artifact boundaries in AFM scans, as in previous methods, becomes unachievable, and the details on the sides of the nanospheres are entirely lost, making it difficult for researchers to accurately determine the size and structure of these nanospheres. In contrast, MVN-AFM accurately reconstructs several adherent nanospheres, each mirroring the shape observed in SEM photographs (Fig. 3c, d). Importantly, the tilt scanning feature of MVN-AFM captures the curved surface information on the sides of nanospheres. This information is seamlessly integrated, resulting in complete, artifact-free spherical reconstructions (Fig. 3k, l).

Next, we selected ZIF-67, a cubic symmetric nanocrystal, as a representative crystal-like nanoparticle to assess the effectiveness of our method. ZIF-67 [45] and its derivatives exhibit various excellent properties, leading to their extensive attention and research [46, 47, 48]. The morphological characteristics and size of nanocrystals can be tailored by manipulating experimental conditions during synthesis, leading to variations in their properties [62, 63, 64]. Therefore, obtaining accurate 3D surface models of nanocrystals is of paramount importance. In the SEM photos (Fig. 3e, f), the ZIF-67 nanocrystals exhibit a distinct polyhedral shape, ranging in size from about 100 nm to 500 nm. However, conventional AFM results (Fig. 3i, j) only partially demonstrate the top surface of the crystals, resulting in an overall blurred representation of the particles’ shape. Furthermore, in scenarios where multiple crystals aggregate, as illustrated in our example, only the uppermost crystal in the stack is visible in the conventional AFM scanning (Fig. 3i), with the underlying crystal completely obscured by the top crystal and associated probe artifacts. On the contrary, the surface model reconstructed by our method (Fig. 3m, n) accurately captures the polyhedral shape of the ZIF-67 crystals, delineating their side planes and edges with precision. Even in cases where the particles are stacked up, the MVN-AFM method successfully reveals the bottom crystal (Fig. 3m), typically obscured in conventional scans, and accurately represents the arrangement of the particles in the stack, aligning with the SEM photograph (Fig. 3e).

In our nanoparticle experiments, PMMA nanospheres and ZIF-67 nanocrystals differ significantly from the previous TPL microstructures in terms of material compositions, geometric features, and particle sizes. MVN-AFM precisely reconstructs these diverse samples by the exact same procedure and parameters, showcasing its outstanding generalizability and potential applicability in a broad spectrum of micro-/nanostructure research.

2.4 Evaluation on Simulated Data

MAE (µm)	Cylinder	Undercut	Gear	Spiral	Monkey	House	Average
Conventional AFM	0.2587	0.2642	0.1750	0.2456	0.2043	0.2414	0.2315
MVN-AFM	0.0131	0.0137	0.0204	0.0128	0.0122	0.0173	0.0149

Table 1: The error comparison of conventional AFM images and MVN-AFM. The mean absolute error (lower is better) of the input conventional AFM images and the topography images from 3D models reconstructed by MVN-AFM.

To complement the previous qualitative comparisons on real experimental data, we embarked on quantitative evaluations using a set of simulated AFM data. We generated these data based on the CAD models of structures in the TPL experiment. The simulation environment allows for the precise determination of the spatial relationships between multi-view AFM data and access to an accurate surface model of the sample, a feat challenging to achieve in real-world experiments. To ensure that the simulated data closely mimics real AFM scanning conditions, we developed a simulated probe model. This model is based on the quadrilateral pyramid probe (Fig. 4a) utilized in our TPL experiments. Considering the nanoscale curvature of the actual AFM probe is negligible compared to the microscale dimensions of the TPL samples, we simplified the probe representation into a pyramid shape (Fig. 4b). The simulation of AFM scanning was then carried out by modeling the rigid body collision [65] between the probe and the sample models. As depicted in Fig. 4c and d, the simulated data exhibit a high degree of similarity to the real AFM data in terms of the overall shape and the presence of artifacts.
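The effect of the probe shape on a simulated scan can be sketched in one dimension. The code below is a simplified stand-in for the rigid-body collision simulation described above: the sample is a height profile, the tip is a symmetric wedge with a given half-angle, and the recorded height at each pixel is the lowest apex position at which the wedge does not penetrate the profile. The function name and the numerical values are illustrative assumptions.

```python
import math

def simulate_scan(surface, dx, half_angle_deg):
    """Height recorded at each pixel when a wedge-shaped tip (apex down) is
    lowered until it first touches the surface profile."""
    # Rise of the tip wall per unit of lateral distance from the apex
    slope = 1.0 / math.tan(math.radians(half_angle_deg))
    n = len(surface)
    # The apex at pixel i must stay above surface[j] - slope * |x_j - x_i| for all j
    return [max(surface[j] - slope * dx * abs(j - i) for j in range(n))
            for i in range(n)]

# A 1 µm tall step sampled every 0.1 µm, scanned with a 20 degree half-angle tip
step = [0.0] * 10 + [1.0] * 10 + [0.0] * 10
image = simulate_scan(step, dx=0.1, half_angle_deg=20.0)
```

The simulated image is never lower than the true profile, and next to the vertical sidewall it records the slope of the tip instead of the wall, which is the kind of artifact the framework is designed to remove.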

First, we focus on showcasing the enhancements MVN-AFM brings to the alignment accuracy of multi-view AFM data. We quantified the error in this alignment process by comparing the rotation component $R$ and the translation component $t$ of pose $T$ with the accurate $\overline{R}$ and $\overline{t}$ for each viewpoint acquired in the simulation environment. As illustrated in Fig. 5, we present a comparative analysis between the alignment method in MVN-AFM and the direct ICP alignment of raw AFM data, which includes artifacts. The analysis reveals that MVN-AFM achieves a substantial improvement in alignment accuracy, evidenced by an impressive average reduction of 46% in rotation errors and 27% in translation errors. These results not only demonstrate the negative impact of artifacts present in AFM data on the precision of data alignment but also highlight the efficacy of MVN-AFM in mitigating these challenges.
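One common way to express such pose errors is the geodesic angle between the two rotations and the Euclidean distance between the two translations. The sketch below shows these two metrics; the text does not state which exact error definitions were used, so this choice and the function names are assumptions.

```python
import math

def rotation_error_deg(R, R_ref):
    """Geodesic angle between two 3x3 rotation matrices, in degrees."""
    # tr(R^T R_ref) equals the element-wise (Frobenius) inner product
    trace = sum(R[k][i] * R_ref[k][i] for i in range(3) for k in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def translation_error(t, t_ref):
    """Euclidean distance between two translation vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, t_ref)))

def rot_z(deg):
    """Rotation about the z axis, used here only to build test inputs."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
```

For example, `rotation_error_deg(rot_z(10), rot_z(0))` evaluates to approximately 10 degrees.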

In the subsequent analysis, we compare the models reconstructed by two prominent multi-view depth fusion techniques: the neural implicit method and TSDF (Truncated Signed Distance Function) Fusion [30]. Our method interprets AFM images as depth images from virtual orthogonal cameras, framing the challenge as the depth fusion problem in computer vision. Depth fusion techniques are categorized into traditional [30, 31, 32] and neural implicit methods [33, 34, 35, 40, 41]. TSDF Fusion is a widely used traditional method that efficiently fuses multi-view depth data by dividing the 3D space into weighted discrete voxels and updating these weights according to the depth information along the pixel ray. However, multi-view AFM scanning of micro-/nanostructures presents unique challenges, particularly the uneven sampling density (Supplementary Fig. 9a) due to restricted tilt angles and limited viewpoints during the scanning process. This limitation often leads to regions with sparse sampling, such as the sidewall grooves of the spiral model (Fig. 6c). In TSDF Fusion (Supplementary Fig. 9b), unintersected voxels in sparsely sampled regions appear as voids in the reconstructed model, a limitation evident in Fig. 6a. Conversely, the neural implicit method, which represents the 3D model as a continuous neural network, constructs a smooth and complete surface model even with limited sample points, as depicted in Fig. 6b. This capability to handle sparse data and reconstruct intricate surfaces makes the neural implicit method more suitable for the 3D reconstruction of multi-view AFM data in the MVN-AFM framework.
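The origin of the voids can be seen in a toy version of TSDF Fusion. The sketch below fuses height observations into a small 2D voxel grid (columns by height) with a running weighted average, then extracts the surface from the zero crossing in each column. It is a simplified illustration with assumed names and parameters, not the TSDF Fusion implementation of [30]; columns that no ray intersects keep zero weight and yield no surface.

```python
def tsdf_fuse(observations, n_cols, n_z, dz, trunc):
    """Fuse (column index, measured surface height) pairs into a TSDF grid."""
    tsdf = [[0.0] * n_z for _ in range(n_cols)]
    weight = [[0] * n_z for _ in range(n_cols)]
    for col, height in observations:
        for k in range(n_z):
            sdf = k * dz - height          # > 0 above the surface (outside)
            if sdf < -trunc:
                continue                   # far behind the surface: not observed
            value = min(1.0, sdf / trunc)  # truncate to [-1, 1]
            w = weight[col][k]
            tsdf[col][k] = (w * tsdf[col][k] + value) / (w + 1)
            weight[col][k] = w + 1
    return tsdf, weight

def extract_heights(tsdf, weight, dz):
    """Zero crossing per column by linear interpolation; None marks a void."""
    heights = []
    for d, w in zip(tsdf, weight):
        height = None
        for k in range(len(d) - 1):
            if w[k] and w[k + 1] and d[k] < 0.0 <= d[k + 1]:
                height = (k + d[k] / (d[k] - d[k + 1])) * dz
                break
        heights.append(height)
    return heights

# Two noisy observations in every column except the unsampled columns 3 and 4
observations = [(c, h) for c in (0, 1, 2, 5, 6, 7) for h in (0.50, 0.54)]
tsdf, weight = tsdf_fuse(observations, n_cols=8, n_z=20, dz=0.1, trunc=0.3)
heights = extract_heights(tsdf, weight, dz=0.1)
```

Observed columns recover the averaged height of 0.52, while columns 3 and 4 stay empty. A continuous representation such as an MLP has no per-voxel storage to leave unfilled, so it returns a value at every query point.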

Finally, we evaluated the accuracy of the topography in the 3D surface models reconstructed by MVN-AFM. Our simulation environment enables the capture of precise surface topography unaffected by the probe’s shape. The difference between accurate surface topography and the AFM images reveals substantial artifacts (Fig. 6e), particularly around the edges and at the sharper geometric features of the structure in the raw AFM data. These results underscore the complexity of artifacts in AFM images of intricate structures and highlight the challenges associated with their manual removal. The visualization of the difference between the topography images from 3D models of MVN-AFM and the accurate topography images (Fig. 6f) clearly indicates that MVN-AFM is highly effective in eliminating the artifacts present in the AFM data. Moreover, it successfully integrates accurate surface geometric information from various viewpoints, significantly diminishing the surface topography error. To quantify these improvements, we calculated the average of the absolute pixel error values across multiple viewpoints for each model. As summarized in Tab. 1, this analysis reveals that MVN-AFM achieved an exceptional average reduction of 94% in topography error for each structure, affirming the high accuracy of the 3D models reconstructed by MVN-AFM.
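The metric and the headline figure can be reproduced from the values in Tab. 1. The sketch below defines the mean absolute error between two topography images and recomputes the averages and the relative reduction; the per-structure numbers are copied from the table, while the function and variable names are illustrative.

```python
def mean_absolute_error(image, reference):
    """Mean of |image - reference| over all pixels of two equally sized 2D grids."""
    errors = [abs(a - b) for row_a, row_b in zip(image, reference)
              for a, b in zip(row_a, row_b)]
    return sum(errors) / len(errors)

# Per-structure MAE values in µm, copied from Table 1
conventional = {"Cylinder": 0.2587, "Undercut": 0.2642, "Gear": 0.1750,
                "Spiral": 0.2456, "Monkey": 0.2043, "House": 0.2414}
mvn_afm = {"Cylinder": 0.0131, "Undercut": 0.0137, "Gear": 0.0204,
           "Spiral": 0.0128, "Monkey": 0.0122, "House": 0.0173}

avg_conventional = sum(conventional.values()) / len(conventional)
avg_mvn_afm = sum(mvn_afm.values()) / len(mvn_afm)
# Relative reduction of the average error
reduction = 1.0 - avg_mvn_afm / avg_conventional
```

The averages round to 0.2315 µm and 0.0149 µm, and the ratio of the two gives a reduction of about 93.6%, consistent with the reported 94%.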

3 Discussion

In this work, we introduce MVN-AFM, a framework for 3D surface reconstruction of intricate micro-/nanostructures using multi-view AFM scanning data. We propose a novel iterative optimization method to simultaneously align the multi-view data and remove artifacts in the AFM image, achieving higher alignment accuracy. To the best of our knowledge, we are the first to utilize the neural implicit surface reconstruction technique in the field of nanotechnology, which enables fusing spatially overlapping multi-view AFM data into an accurate 3D model. MVN-AFM shows considerable practical value. Extensive experiments demonstrate the superior capability of MVN-AFM on diverse micro-/nanostructures, including microstructures printed by TPL, PMMA nanospheres, and ZIF-67 nanocrystals. The 3D models reconstructed by MVN-AFM provide researchers with a more comprehensive representation of micro-/nanostructures than what is achievable with conventional AFM scanning and 2D SEM images. The success of MVN-AFM across these varied samples, each with distinct geometries, types, and sizes, robustly affirms its effectiveness and broad applicability in nanofabrication, nanoparticles, and many other fields. Importantly, MVN-AFM only requires a conventional AFM system and a standard AFM probe to achieve these results. This aspect makes MVN-AFM a more accessible and cost-effective option for researchers to analyze intricate 3D micro-/nanostructures.

Our framework is efficient and flexible. While multi-view AFM data provides more surface information, it also increases the time for AFM scanning. We tested the effect on reconstruction of using different numbers of tilted AFM scans (Supplementary Fig. 10 and Supplementary Tab. 1). We found that for the structures in the TPL experiment, the reconstruction quality converged with only eight tilt scans. The scanning time for one AFM image is about 4.5 minutes, and considering the time required to switch to different scanning directions, multi-view scanning takes about 2 hours for a single structure. Next, given a set of multi-view AFM data, our framework takes about 10 minutes to complete the 3D reconstruction. Notably, the basis of our algorithm is the multi-view consistency of the accurate surface topography and the multi-view inconsistency of image artifacts in multi-view AFM data. Because our algorithm does not take parameters such as the number of scans, tilt angle, and probe shape as prior information, users have the flexibility to adjust these parameters according to their requirements.

Here, our study underscores the significant potential of integrating nanotechnology with neural implicit representations [66, 67, 68], an emerging and rapidly evolving field in computer vision. Specifically, we employ neural implicit surface reconstruction methods, where a neural network effectively represents a continuous SDF in space. Because of the continuous nature of neural networks, it is more suitable for representing geometric surface models that are inherently continuous in space than traditional discrete methods, as demonstrated in numerous recent works [33, 34, 35]. Our research further reveals the successful application of this technique in the reconstruction of 3D micro-/nanostructures with multi-view AFM data.

Our methodology’s foundational assumption is that the sample remains static during the multi-view AFM scanning process because the multi-view AFM data alignment step depends on the consistency of geometric features across different views. Therefore, our method is unsuitable for dynamic samples, such as living cells or samples prone to deformation during scanning. Precisely reconstructing the deformation process of nanostructures is in demand in many research areas, which points to a promising direction for future work. One possible solution is applying our method to High-Speed Atomic Force Microscopy (HS-AFM) [69], which allows observing the dynamic action of nano-objects.

4 Methods

4.1 Hardware and Software Requirements of MVN-AFM

In our experiments, all the code of MVN-AFM was run on a computer with an Intel i9-13900KF CPU, an Nvidia RTX4090 GPU, 64 gigabytes of RAM, and a Linux operating system with kernel version 5.15.0, which is a typical configuration for a current lab workstation. Running the MVN-AFM code requires at least one graphics card with more than 12 gigabytes of memory. We use Open3D [70] 0.17.0, an open-source Python library, to handle the 3D data. Our implementation of neural implicit surface reconstruction is based on an open-source repository [71] of hash encoding [54] and NeuS [41], and the network is built on the deep learning framework PyTorch [72] 1.13.1.

4.2 Multi-View AFM Scanning

All multi-view AFM images in our experiments were acquired with a commercial AFM (Dimension ICON, Bruker). The AFM probe was the TESPA-V2 (Bruker), which has a height of 15 µm, an overall shape of a quadrilateral pyramid, a front angle of 25${}^{\circ}$, a back angle of 17.5${}^{\circ}$, and a side angle of 20${}^{\circ}$. The stage has a 24 mm $\times$ 24 mm square bottom and a height of 16 mm, with a 30${}^{\circ}$ tilt angle. The turntable can hold a 4 mm $\times$ 4 mm sample. The whole stage can be placed directly into a commercial AFM and does not collide with any part of the AFM during scanning. In our experiments, we rotate the sample by 45${}^{\circ}$ between two adjacent scans and obtain eight tilt scans around the sample. Together with a conventional overhead view, nine scans are acquired per sample. For every view, we obtain an AFM image with 256 lines of 256 points, with the AFM working in tapping mode at a frequency of 1 Hz.

Precisely localizing the same region across multi-view AFM scans is a critical step in the data capture process. The methods for achieving this localization are diverse and can be tailored to the characteristics of the experimental sample. In our TPL experiments, we used polymer grid markers printed around the sample to assist localization with the optical microscope in the AFM system. For the nanoparticle experiments, we made scored markers on mica bases. These are just examples of the strategies that can be adopted for localization; other available methods include the use of a Transmission Electron Microscopy (TEM) index grid [73, 74] or the creation of noticeable artificial markers [75]. The common goal of these techniques is to ensure that the specific structure for 3D reconstruction can be precisely and efficiently located within the AFM system.

4.3 Data Alignment and Mask Solving

First, we define some basic concepts used in this step. Each AFM image is equivalent to a set of 3D points in an AFM coordinate system (Supplementary Fig. 3), with the z-axis being the position feedback direction of the AFM and the x-y plane being the probe scanning plane. We define the AFM coordinate system of the overhead view image of the sample as the target sample coordinate system. Moreover, we define a corresponding virtual orthogonal depth camera for each AFM image (Supplementary Fig. 4). Given a set of raw AFM data, we convert the AFM height information $h$ into a depth value $d$ for a virtual orthogonal camera parallel to the x-y plane, $d=\alpha-h$, where $\alpha$ is the assumed height of the camera. The value of $\alpha$ is chosen simply to ensure that all $d$ are positive. Each AFM data point is treated as a ray $r=o+d\vec{v}$, originating from the pixel position $o$ on the imaging plane and extending along the direction $\vec{v}$ of the camera to the depth $d$.
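The height-to-depth conversion and the ray construction described above can be sketched as follows. This is a minimal illustration rather than the MVN-AFM implementation: the function names, the `margin` used to choose $\alpha$, the placement of pixel centres, and the choice of the camera looking along the negative z-axis are all assumptions made for the example.

```python
import numpy as np

def afm_to_rays(height, scan_size, margin=1.0):
    """Convert an AFM height map (rows x cols) into orthographic rays.

    Returns origins o (N x 3), the shared direction v (3,), and depths d (N,).
    `margin` is a hypothetical offset that keeps every depth positive.
    """
    height = np.asarray(height, dtype=float)
    rows, cols = height.shape
    # Camera height alpha: any value above the highest point gives d > 0.
    alpha = height.max() + margin
    depth = alpha - height                      # d = alpha - h
    # Pixel centres on the imaging plane z = alpha (x-y is the scanning plane).
    xs = (np.arange(cols) + 0.5) * scan_size / cols
    ys = (np.arange(rows) + 0.5) * scan_size / rows
    gx, gy = np.meshgrid(xs, ys)
    origins = np.stack([gx.ravel(), gy.ravel(), np.full(gx.size, alpha)], axis=1)
    direction = np.array([0.0, 0.0, -1.0])      # assumed viewing direction
    return origins, direction, depth.ravel()

def ray_endpoints(origins, direction, depth):
    """Evaluate r = o + d * v for every ray."""
    return origins + depth[:, None] * direction
```

Evaluating `ray_endpoints` on the output of `afm_to_rays` recovers the original 3D surface points, which is a convenient sanity check on the sign convention of $d$.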

In the initialization and the M-step, we apply the point-to-plane ICP algorithm [49] to align the data points of the AFM images filtered by the mask $M$ and obtain a set of transformations $T$. In the E-step, we compute the artifact mask $M$ of each AFM image by a cross-validation method. In detail, we first transform each camera ray $r$ to the sample coordinate system to obtain the ray $r^{\prime}$ using the currently solved $T=\{(R,t)\mid R\in\mathbb{R}^{3\times 3},\,t\in\mathbb{R}^{3}\}$, where $r^{\prime}=o^{\prime}+d\vec{v}^{\prime}$, $o^{\prime}=Ro+t$, and $\vec{v}^{\prime}=R\vec{v}$. Subsequently, we generate $n$ meshes by connecting the spatial points corresponding to neighboring pixels in each of the $n$ AFM images within the sample coordinate system. Next, we compute the intersection of each ray with these meshes and obtain $n$ depth values, $D=\{d_{1},d_{2},...,d_{n}\}$. Tip-sample convolution artifacts expand the overall topography [10], so the height measured by AFM is larger than the actual sample height, which is equivalent to a smaller depth value. Therefore, we consider a pixel an artifact when $D_{\text{max}}-d>\phi$, where $D_{\text{max}}$ is the largest value in $D$, $d$ is the measured depth of the pixel, and $\phi$ is a threshold that is initially set to 3% of the AFM scan size and linearly reduced to 1% over the iterations; this schedule was determined experimentally and applied consistently across all our experiments. The small threshold makes the algorithm robust to noise in the AFM data and to inaccurate $T$ during the iteration process. The E-step and M-step reinforce each other. After a fixed number of iterations, five in our experiments, the resolved poses $T$ and masks $M$ are saved for subsequent steps. For a set of multi-view AFM data, this process takes about 2 minutes.
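The ray transformation and the artifact criterion of the E-step can be sketched as below. The sketch assumes the ray-mesh intersection depths have already been computed by some other routine; the function names, the NaN convention for rays that miss a mesh, and the exact interpolation of $\phi$ between its first and last iteration are assumptions for illustration, not details confirmed by the text.

```python
import numpy as np

def transform_ray(o, v, R, t):
    """Map a ray (o, v) into the sample frame: o' = R o + t, v' = R v."""
    return R @ o + t, R @ v

def phi_schedule(iteration, n_iterations, scan_size, start=0.03, end=0.01):
    """Linearly decay phi from 3% to 1% of the scan size (assumed endpoints)."""
    frac = iteration / max(n_iterations - 1, 1)
    return scan_size * (start + (end - start) * frac)

def artifact_mask(measured_depth, mesh_depths, phi):
    """Flag pixels whose measured depth is smaller than D_max by more than phi.

    measured_depth: (N,) depth d of each pixel along its own ray.
    mesh_depths:    (n_views, N) depths at which that ray hits each view's mesh;
                    NaN marks a ray that misses a mesh (hypothetical convention).
    """
    d_max = np.nanmax(mesh_depths, axis=0)
    return (d_max - measured_depth) > phi
```

A pixel is masked only when some other view observes the surface noticeably deeper along the same ray, which matches the observation that tip-sample convolution can only make the measured surface appear higher.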

4.4 Neural Implicit Surface Reconstruction

In the neural implicit surface reconstruction step, we train a multi-resolution hash table with learnable parameters and an MLP neural network, named the SDF network, on the aligned and masked multi-view AFM data (Supplementary Fig. 5). During training, we sample 3D points along the ray $r^{\prime}$ of each pixel that passes the mask $M$. First, the 3D point coordinate is encoded by multiresolution hash encoding [54]. Here, we use 16 resolution levels, each yielding a 2-dimensional feature vector. Concatenating the hash encoding and the 3D coordinate, we get a 35-dimensional feature (16 $\times$ 2 hash features plus 3 coordinates) as the input of the SDF network. The SDF network is an MLP with one hidden layer of 64 units and ReLU activation, which maps the input feature to an SDF value at that 3D point. The SDF value of each point is converted to a density value through the unbiased and occlusion-aware weight function proposed by NeuS [41]. Then, the density values of the sample points along the ray are accumulated by differentiable volume rendering to obtain the depth value $\widehat{d}$ of that ray. The loss function $L$ consists of a depth error term $L_{\text{depth}}$ and a regularization term $L_{\text{reg}}$:
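To illustrate how a depth value can be rendered from SDF samples along one ray, the following NumPy sketch uses the discrete opacity formulation of NeuS [41], in which the opacity of each ray section is derived from a logistic CDF of the SDF. It is a simplified, non-differentiable illustration: the fixed sharpness `s`, the use of section midpoints as depth samples, and the function names are assumptions, and the SDF values are supplied directly instead of coming from the hash encoding and the SDF network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def render_depth(sdf, t, s=64.0):
    """Render the depth of one ray from SDF samples, NeuS-style.

    sdf: (m,) SDF values at sample distances t (m,), increasing along the ray,
         positive in front of the surface.
    s:   sharpness of the logistic CDF (learnable in NeuS; fixed here).
    """
    cdf = sigmoid(s * sdf)
    # Discrete opacity of each section between consecutive samples.
    alpha = np.clip((cdf[:-1] - cdf[1:]) / (cdf[:-1] + 1e-10), 0.0, 1.0)
    # Transmittance: fraction of the ray reaching each section unoccluded.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = trans * alpha
    t_mid = 0.5 * (t[:-1] + t[1:])
    return np.sum(weights * t_mid), weights

# Example: a planar surface 2.0 units away from the ray origin.
t = np.linspace(0.0, 4.0, 1025)
depth, weights = render_depth(2.0 - t, t)
```

In this example the weights concentrate around the zero crossing of the SDF, so the rendered depth lies close to the true surface distance.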

L_{\text{depth}}=\frac{1}{b}\sum_{p}^{b}(\widehat{d_{p}}-d_{p})^{2},

(1)

L_{\text{reg}}=\frac{1}{bm}\sum_{p,q}^{b,m}\left(\|\mathbf{n}_{pq}\|-1\right)^{2}.

(2)

$L_{\text{depth}}$ is the mean squared error (MSE) between the rendered depth value of each pixel and the AFM data supervision, where $b$ is the batch size and $p$ is the ray index. The regularization term [52] is used to constrain the SDF field represented by the network, where $\mathbf{n}$ is the normal of the sample point, $m$ is the number of sample points along a ray, and $q$ is the sample index. $L_{\text{reg}}$ encourages a smooth and natural surface and is commonly used in SDF-based neural implicit surface reconstruction methods. The total loss is

L=L_{\text{depth}}+\lambda L_{\text{reg}}.

(3)
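As a concrete reading of Eqs. (1)-(3), the loss can be written in a few lines of NumPy. This is only a sketch of the arithmetic: in training, the normals are the gradients of the SDF network and the loss is minimized with automatic differentiation, which is not shown here. The function names and array layouts are assumptions.

```python
import numpy as np

def depth_loss(d_rendered, d_measured):
    """Eq. (1): mean squared error over the b rays of a batch."""
    return np.mean((d_rendered - d_measured) ** 2)

def reg_loss(normals):
    """Eq. (2): mean of (||n_pq|| - 1)^2 over b rays and m samples per ray.

    normals: (b, m, 3) SDF gradients at the sample points.
    """
    return np.mean((np.linalg.norm(normals, axis=-1) - 1.0) ** 2)

def total_loss(d_rendered, d_measured, normals, lam):
    """Eq. (3): L = L_depth + lambda * L_reg."""
    return depth_loss(d_rendered, d_measured) + lam * reg_loss(normals)
```

The regularization term vanishes exactly when every gradient has unit length, which is the defining property of a signed distance function.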

The weight $\lambda$ is 0.1 in our experiments. During network training, the Adam optimizer updates the network parameters with a learning rate of 0.001 to minimize the loss over 20,000 iterations. In each iteration, we randomly select 256 rays with 1024 sample points along each ray. Notably, each set of network parameters can only represent the 3D model of one structure, so the network must be trained from scratch on the multi-view AFM images of each sample. The whole training takes about 8 minutes on an Nvidia RTX4090 GPU. To visualize the 3D model, we divide the space into 256 $\times$ 256 $\times$ 256 voxels. Subsequently, the SDF value of each voxel is obtained through neural network inference, followed by the extraction of meshes using the Marching Cubes algorithm [55]. Unlike the traditional discrete voxel-based representation [30], which requires the voxel resolution to be determined in advance, neural implicit surface representations do not have a resolution limitation. The network can infer SDF values at any location, enabling the generation of a mesh representation with arbitrary resolution.
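The resolution-independent querying described above can be illustrated by sampling an SDF on a voxel grid of any chosen size. In this sketch an analytic sphere stands in for the trained SDF network, and the bounding cube, function names, and default values are assumptions made for the example.

```python
import numpy as np

def sample_sdf_grid(sdf_fn, resolution, bound=1.0):
    """Evaluate sdf_fn at the centres of resolution^3 voxels in [-bound, bound]^3."""
    step = 2.0 * bound / resolution
    axis = -bound + (np.arange(resolution) + 0.5) * step
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    points = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    return sdf_fn(points).reshape(resolution, resolution, resolution)

def sphere_sdf(points, radius=0.5):
    """Stand-in for the trained SDF network: signed distance to a sphere."""
    return np.linalg.norm(points, axis=-1) - radius

# The same function can be queried at any resolution.
coarse = sample_sdf_grid(sphere_sdf, 32)
fine = sample_sdf_grid(sphere_sdf, 64)
```

The resulting grid can then be passed to a Marching Cubes implementation, for example `skimage.measure.marching_cubes(fine, level=0.0)` from scikit-image, to extract the zero level set as a mesh.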

4.5 Constructing the Two-photon Lithography Structures

We used a commercial photoresist, IP-Dip2 (Nanoscribe GmbH), as our material. The IP-Dip2 was dropped on a glass substrate with a thickness of 170-190 µm (Borosilicate substrates, Nanoscribe GmbH) for fabricating the structures. We used a commercial Direct Laser Writing setup (Photonic Professional GT2, Nanoscribe GmbH) equipped with a 780 nm femtosecond laser (a repetition rate of 80 MHz, a pulse duration of 80-100 fs) and a 63 $\times$, numerical aperture (NA) = 1.4 oil immersion objective to print the microstructures. We imported the STL files into Describe 2.7 (Nanoscribe GmbH) to generate the executable job files. We set the slice and hatching distances to 0.1 µm for microstructures, the highest accuracy this machine can achieve. These distances were set to 0.3 µm for grid markers because they are only used for optical microscope localization, which does not require high printing accuracy. The printing parameters were set to 30 mW of laser power and 10,000 µm/s scanning speed. Then, we imported the executable job files into Nanowrite 1.8 (Nanoscribe GmbH) to start the job. After the printing process, the printed structures were developed at room temperature with propylene glycol methyl ether acetate (PGMEA) for 20 minutes and isopropyl alcohol (IPA) for 5 minutes to wash out the unpolymerized resist and leave the microstructures on the substrate.

4.6 Constructing the Nanoparticle Samples

PMMA nanosphere dispersion (500 nm) was purchased from Jiangsu Zhichuan Technology Co., Ltd (China). ZIF-67 powder (300 nm) was purchased from Nanjing Xianfeng Nano Co., Ltd (China). We used ethanol to dilute these nanoparticles, sonicated them for 10 minutes, and then deposited the suspension onto a 4 mm $\times$ 4 mm mica base. We performed multi-view localization of the particles of interest based on the markers on the mica surface around the region. We used the same view number, tilt angle, AFM scanning mode, and AFM probe as in the TPL experiment. The AFM scan size was 2 µm $\times$ 2 µm for PMMA nanospheres and 1.5 µm $\times$ 1.5 µm for ZIF-67 nanocrystals. We obtained SEM images of these micro-/nanostructures by sputter-coating the samples with platinum using a sputtering apparatus (MCIOO, Hitachi) and then observing them with a field-emission scanning electron microscope (GeminiSEM 300, ZEISS).

4.7 Constructing the Simulated Data

The simulated data was generated using the 3D design software Blender [60] 3.3. Within Blender, we constructed models of the structures as well as the AFM probe. To mimic the conditions of our real-world multi-view AFM scanning, we set up orthogonal cameras within the software, positioned to align with the scanning directions in our real experiment. Furthermore, to replicate the real-world experimental setup more accurately, we rotated the probe model by 11${}^{\circ}$. This adjustment accounts for the inherent angle between the working cantilever of the AFM holder and the scanning plane in a real AFM system [57, 10]. Next, we orthogonally projected the surface model onto these cameras and performed a convolution with the probe shape to generate simulated AFM images. For the quantitative evaluation of the accuracy of the solved poses, we extracted the precise poses of these cameras directly from Blender and evaluated the absolute pose error with the open-source tool EVO [76]. We implemented the TSDF Fusion method based on an open-source repository [77] and added support for the orthogonal camera projection model, which allows the fusion of multi-view AFM data. We divided the space into 256 $\times$ 256 $\times$ 256 voxels to stay consistent with the setup of the mesh model extraction step in the neural implicit surface reconstruction. To evaluate the topography images of the 3D models reconstructed by MVN-AFM, we use the Mean Absolute Error (MAE) as our metric.

\text{MAE}=\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left|\bar{h}_{ij}-h_{ij}\right|,

(4)

where $n$ denotes the number of multi-view images, $m$ is the pixel number in each image, $\bar{h}$ is the accurate height value of a pixel, and $h$ denotes the value of pixels in raw AFM images or topography images from MVN-AFM.
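For reference, Eq. (4) amounts to averaging the absolute height difference over all pixels of all views. A minimal NumPy sketch, with an assumed function name and assumed array layout, is:

```python
import numpy as np

def mean_absolute_error(h_true, h_est):
    """Eq. (4): MAE over n images of m pixels each.

    h_true, h_est: arrays of shape (n, rows, cols) or (n, m), in the same unit.
    """
    h_true = np.asarray(h_true, dtype=float)
    h_est = np.asarray(h_est, dtype=float)
    if h_true.shape != h_est.shape:
        raise ValueError("height stacks must have the same shape")
    return np.mean(np.abs(h_true - h_est))
```

Because every image has the same number of pixels, the mean over the stacked array equals the double sum in Eq. (4) divided by $mn$.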

5 Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

6 Code Availability

The source code of MVN-AFM is available at https://github.com/zju3dv/MVN-AFM.

References

[1] Gates, B. et al. New approaches to nanofabrication: Molding, printing, and other techniques. Chemical Reviews 105, 1171–1196 (2005).
[2] Quake, S. & Scherer, A. From micro- to nanofabrication with soft materials. Science 290, 1536–1540 (2000).
[3] Douglas, S. M., Bachelet, I. & Church, G. M. A logic-gated nanorobot for targeted transport of molecular payloads. Science 335, 831–834 (2012).
[4] Li, S. et al. A DNA nanorobot functions as a cancer therapeutic in response to a molecular trigger in vivo. Nature Biotechnology 36, 258+ (2018).
[5] Gratton, S. E. A. et al. The effect of particle design on cellular internalization pathways. Proceedings of the National Academy of Sciences 105, 11613–11618 (2008).
[6] Wang, J., Byrne, J. D., Napier, M. E. & DeSimone, J. M. More effective nanomedicines through particle design. small 7, 1919–1931 (2011).
[7] Seiler, H. Secondary-electron emission in the scanning electron-microscope. Journal of Applied Physics 54, R1–R18 (1983).
[8] Egerton, R., Li, P. & Malac, M. Radiation damage in the TEM and SEM. Micron 35, 399–409 (2004).
[9] Binnig, G., Quate, C. F. & Gerber, C. Atomic force microscope. Physical Review Letters 56, 930–933 (1986).
[10] Golek, F., Mazur, P., Ryszka, Z. & Zuber, S. AFM image artifacts. Applied Surface Science 304, 11–19 (2014).
[11] Velegol, S., Pardi, S., Li, X., Velegol, D. & Logan, B. AFM imaging artifacts due to bacterial cell height and AFM tip geometry. Langmuir 19, 851–857 (2003).
[12] Westra, K., Mitchell, A. & Thomson, D. Tip artifacts in atomic-force microscope imaging of thin-film surfaces. Journal of Applied Physics 74, 3608–3610 (1993).
[13] Martin, Y. & Wickramasinghe, H. K. Method for imaging sidewalls by atomic-force microscopy. Applied Physics Letters 64, 2498–2500 (1994).
[14] Orji, N. G. & Dixson, R. G. Higher order tip effects in traceable CD-AFM-based linewidth measurements. Measurement Science and Technology 18, 448–455 (2007).
[15] Thiesler, J., Tutsch, R., Fromm, K. & Dai, G. True 3D-AFM sensor for nanometrology. Measurement Science and Technology 31, 074012 (2020).
[16] Geng, J., Zhang, H., Meng, X., Rong, W. & Xie, H. Sidewall imaging of microarray-based biosensor using an orthogonal cantilever probe. IEEE Transactions on Instrumentation and Measurement 70, 1–8 (2021).
[17] Nguyen, C. et al. Carbon nanotube scanning probe for profiling of deep-ultraviolet and 193 nm photoresist patterns. Applied Physics Letters 81, 901–903 (2002).
[18] Cho, S.-J. et al. Three-dimensional imaging of undercut and sidewall structures by atomic force microscopy. Review of Scientific Instruments 82 (2011).
[19] Kizu, R., Misumi, I., Hirai, A., Kinoshita, K. & Gonda, S. Development of a metrological atomic force microscope with a tip-tilting mechanism for 3D nanometrology. Measurement Science and Technology 29, 075005 (2018).
[20] Xie, H., Hussain, D., Yang, F. & Sun, L. Development of three-dimensional atomic force microscope for sidewall structures imaging with controllable scanning density. IEEE/ASME Transactions on Mechatronics 21, 316–328 (2016).
[21] Wu, J.-W. et al. Effective tilting angles for a dual probes AFM system to achieve high-precision scanning. IEEE/ASME Transactions on Mechatronics 21, 2512–2521 (2016).
[22] Xie, H., Hussain, D., Yang, F. & Sun, L. Atomic force microscope caliper for critical dimension measurements of micro and nanostructures through sidewall scanning. Ultramicroscopy 158, 8–16 (2015).
[23] Zhao, X., Fu, J., Chu, W., Nguyen, C. & Vorburger, T. V. An image stitching method to eliminate the distortion of the sidewall in linewidth measurement. In Metrology, Inspection, and Process Control for Microlithography XVIII, vol. 5375, 363–373 (2004).
[24] Pan, S.-P., Liou, H.-C., Chen, C.-C. A., Chen, J.-R. & Liu, T.-S. Precision measurement of sub-50 nm linewidth by stitching double-tilt images. Japanese Journal of Applied Physics 49, 06GK06 (2010).
[25] Kawata, S., Sun, H., Tanaka, T. & Takada, K. Finer features for functional microdevices - micromachines can be created with higher resolution using two-photon absorption. Nature 412, 697–698 (2001).
[26] Jaiswal, A. et al. Two decades of two-photon lithography: Materials science perspective for additive manufacturing of 2D/3D nano-microstructures. Iscience 26 (2023).
[27] Li, J. & Pumera, M. 3D printing of functional microrobots. Chemical Society Reviews 50, 2794–2838 (2021).
[28] Dabbagh, S. R. et al. 3D-printed microrobots from design to translation. Nature Communications 13 (2022).
[29] Jun, Y.-w., Choi, J.-s. & Cheon, J. Shape control of semiconductor and metal oxide nanocrystals through nonhydrolytic colloidal routes. Angewandte Chemie International Edition 45, 3414–3439 (2006).
[30] Izadi, S. et al. Kinectfusion: Real-time 3D reconstruction and interaction using a moving depth camera. In Proceedings of the 24th annual ACM symposium on User interface software and technology, 559–568 (2011).
[31] Curless, B. & Levoy, M. A volumetric method for building complex models from range images. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, 303–312 (1996).
[32] Nießner, M., Zollhöfer, M., Izadi, S. & Stamminger, M. Real-time 3D reconstruction at scale using voxel hashing. ACM Transactions on Graphics (ToG) 32, 1–11 (2013).
[33] Xie, Y. et al. Neural fields in visual computing and beyond. Computer Graphics Forum 41, 641–676 (2022).
[34] Weder, S., Schonberger, J. L., Pollefeys, M. & Oswald, M. R. NeuralFusion: Online depth fusion in latent space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3162–3172 (2021).
[35] Li, K., Tang, Y., Prisacariu, V. A. & Torr, P. H. BNV-Fusion: Dense 3D reconstruction using bi-level neural volume fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6166–6175 (2022).
[36] Mildenhall, B. et al. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM 65, 99–106 (2021).
[37] Oechsle, M., Peng, S. & Geiger, A. UNISURF: Unifying neural implicit surfaces and radiance fields for multi-view reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 5589–5599 (2021).
[38] Yariv, L. et al. Multiview neural surface reconstruction by disentangling geometry and appearance. Advances in Neural Information Processing Systems 33, 2492–2502 (2020).
[39] Yariv, L., Gu, J., Kasten, Y. & Lipman, Y. Volume rendering of neural implicit surfaces. Advances in Neural Information Processing Systems 34, 4805–4815 (2021).
[40] Sucar, E., Liu, S., Ortiz, J. & Davison, A. J. iMAP: Implicit mapping and positioning in real-time. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 6229–6238 (2021).
[41] Wang, P. et al. NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. arXiv preprint arXiv:2106.10689 (2021).
[42] Bettencourt, A. & Almeida, A. J. Poly(methyl methacrylate) particulate carriers in drug delivery. Journal of microencapsulation 29, 353–367 (2012).
[43] Tang, E., Cheng, G., Pang, X., Ma, X. & Xing, F. Synthesis of nano-ZnO/poly(methyl methacrylate) composite microsphere through emulsion polymerization and its UV-shielding property. Colloid and Polymer Science 284, 422–428 (2006).
[44] Zhu, A., Shi, Z., Cai, A., Zhao, F. & Liao, T. Synthesis of core-shell PMMA-SiO2 nanoparticles with suspension-dispersion-polymerization in an aqueous system and its effect on mechanical properties of PVC composites. Polymer testing 27, 540–547 (2008).
[45] Zhong, G., Liu, D. & Zhang, J. The application of ZIF-67 and its derivatives: Adsorption, separation, electrochemistry and catalysts. Journal of Materials Chemistry A 6, 1887–1899 (2018).
[46] Qian, J., Sun, F. & Qin, L. Hydrothermal synthesis of zeolitic imidazolate framework-67 (ZIF-67) nanocrystals. Materials Letters 82, 220–223 (2012).
[47] Wang, L. et al. Flexible solid-state supercapacitor based on a metal-organic framework interwoven by electrochemically-deposited PANI. Journal of the American Chemical Society 137, 4920–4923 (2015).
[48] Yang, J. et al. Hollow Zn/Co ZIF particles derived from core-shell ZIF-67@ZIF-8 as selective catalyst for the semi-hydrogenation of acetylene. Angewandte Chemie-international Edition 54, 10889–10893 (2015).
[49] Rusinkiewicz, S. & Levoy, M. Efficient variants of the ICP algorithm. In Proceedings third international conference on 3-D digital imaging and modeling, 145–152 (2001).
[50] Do, C. B. & Batzoglou, S. What is the expectation maximization algorithm? Nature Biotechnology 26, 897–899 (2008).
[51] Moon, T. The expectation-maximization algorithm. IEEE Signal Processing Magazine 13, 47–60 (1996).
[52] Gropp, A., Yariv, L., Haim, N., Atzmon, M. & Lipman, Y. Implicit geometric regularization for learning shapes. arXiv preprint arXiv:2002.10099 (2020).
[53] Hecht-Nielsen, R. Theory of the backpropagation neural network. In Neural networks for perception, 65–93 (1992).
[54] Müller, T., Evans, A., Schied, C. & Keller, A. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (ToG) 41, 1–15 (2022).
[55] Lorensen, W. E. & Cline, H. E. Marching cubes: A high resolution 3D surface construction algorithm. In Seminal graphics: pioneering efforts that shaped the field, 347–353 (1998).
[56] Selimis, A., Mironov, V. & Farsari, M. Direct laser writing: Principles and materials for scaffold 3D printing. Microelectronic Engineering 132, 83–89 (2015).
[57] Shen, J., Zhang, D., Zhang, F.-H. & Gan, Y. AFM tip-sample convolution effects for cylinder protrusions. Applied Surface Science 422, 482–491 (2017).
[58] Lee, J. H. et al. Electrically pumped sub-wavelength metallo-dielectric pedestal pillar lasers. Optics Express 19, 21524–21531 (2011).
[59] Chaubey, S. K. & Jain, N. K. State-of-art review of past research on manufacturing of meso and micro cylindrical gears. Precision engineering 51, 702–728 (2018).
[60] Community, B. O. Blender - a 3D modelling and rendering package. Blender Foundation, Stichting Blender Foundation, Amsterdam. http://www.blender.org (2018).
[61] Reis, C. P., Neufeld, R. J., Ribeiro, A. J. & Veiga, F. Nanoencapsulation I. Methods for preparation of drug-loaded polymeric nanoparticles. Nanomedicine: Nanotechnology, Biology and Medicine 2, 8–21 (2006).
[62] Saliba, D., Ammar, M., Rammal, M., Al-Ghoul, M. & Hmadeh, M. Crystal growth of ZIF-8, ZIF-67, and their mixed-metal derivatives. Journal of the American Chemical Society 140, 1812–1823 (2018).
[63] Nordin, N. A. H. M., Ismail, A. F., Mustafa, A., Murali, R. S. & Matsuura, T. The impact of ZIF-8 particle size and heat treatment on CO 2/CH 4 separation using asymmetric mixed matrix membrane. RSC Advances 4, 52530–52541 (2014).
[64] Xia, Y., Xiong, Y., Lim, B. & Skrabalak, S. E. Shape-controlled synthesis of metal nanocrystals: simple chemistry meets complex physics? Angewandte Chemie International Edition 48, 60–103 (2009).
[65] Amyot, R. & Flechsig, H. BioAFMviewer: An interactive interface for simulated AFM scanning of biomolecular structures and dynamics. PLoS computational biology 16 (2020).
[66] Sitzmann, V., Martel, J., Bergman, A., Lindell, D. & Wetzstein, G. Implicit neural representations with periodic activation functions. Advances in neural information processing systems 33, 7462–7473 (2020).
[67] Chen, Y., Liu, S. & Wang, X. Learning continuous image representation with local implicit image function. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 8628–8638 (2021).
[68] Pumarola, A., Corona, E., Pons-Moll, G. & Moreno-Noguer, F. D-NeRF: Neural radiance fields for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10318–10327 (2021).
[69] Uchihashi, T., Kodera, N. & Ando, T. Guide to video recording of structure dynamics and dynamic processes of proteins by high-speed atomic force microscopy. Nature Protocols 7, 1193–1206 (2012).
[70] Zhou, Q.-Y., Park, J. & Koltun, V. Open3D: A modern library for 3D data processing. arXiv:1801.09847 (2018).
[71] Guo, Y.-C. Instant neural surface reconstruction. Github. https://github.com/bennyguo/instant-nsr-pl (2022).
[72] Paszke, A. et al. PyTorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019).
[73] Markiewicz, P. & Goh, M. Identifying locations on a substrate for the repeated positioning of AFM samples. Ultramicroscopy 68, 215–221 (1997).
[74] Abu Quba, A. A., Schaumann, G. E., Karagulyan, M. & Diehl, D. A new approach for repeated tip-sample relocation for AFM imaging of nano and micro sized particles and cells in liquid environment. Ultramicroscopy 211 (2020).
[75] Liu, Z. et al. Mechanically engraved mica surface using the atomic force microscope tip facilitates return to a specific sample location. Microscopy research and technique 66, 156–162 (2005).
[76] Grupp, M. evo: Python package for the evaluation of odometry and SLAM. Github. https://github.com/MichaelGrupp/evo (2017).
[77] Zeng, A. et al. Volumetric TSDF Fusion of RGB-D images in python. Github. https://github.com/andyzeng/tsdf-fusion-python (2017).

7 Acknowledgements

We thank A. Ren, L. Ma, and C. Wu for assistance in the two-photon lithography experiment. We are grateful to Y. Wang, J. Tang, and M. Duan for helpful discussions. We also thank the staff of the Analysis Center of Agrobiology and Environmental Sciences, Zhejiang University, for their support in SEM imaging. This work was partially supported by the National Natural Science Foundation of China (No.61932003 received by G.Z., No.51975522 and No.U22A20207 received by Y.-L.C.).

8 Author Contributions

S.C. conceived the idea and proposed this project; S.C. and M.P. performed experiments; S.C. wrote code and processed data; S.C. and Y.L. wrote the draft of the manuscript, and all co-authors proofread and revised the manuscript; G.Z., Y.-L.C., H.B., and B.-F.J. provided valuable suggestions including the experiment design and writing. G.Z. and Y.-L.C. supervised this project, including the framework design and improvement.

9 Competing Interests

The authors declare no competing interests.

Multi-View Neural 3D Reconstruction of Micro-/Nanostructures
with Atomic Force Microscopy
Supplementary Information

L1 Chamfer (µm)	Cylinder	Undercut	Gear	Spiral	Monkey	House	Average	Time (hour)
2-tilt	0.0177	0.0223	0.0364	0.0199	0.0371	0.0537	0.0312	$\approx$ 0.5
4-tilt	0.0116	0.0120	0.0175	0.0151	0.0238	0.0231	0.0172	$\approx$ 1
8-tilt	0.0106	0.0111	0.0133	0.0109	0.0193	0.0189	0.0140	$\approx$ 2
16-tilt	0.0107	0.0106	0.0132	0.0118	0.0180	0.0181	0.0137	$\approx$ 4

Table 1: Reconstruction error and scanning time for different numbers of AFM tilt scans. The table shows the L1 chamfer distance (lower is better) between the ground truth models and the models reconstructed by MVN-AFM from different numbers of tilt scans. The 8-tilt setting is the one used in our experiments. The time column shows the approximate time required to complete the given number of AFM tilt scans in real experiments, including the time for AFM scanning and for switching between viewpoints. More views generally reduce the error, but because AFM scanning is slow, the number of views cannot be increased indefinitely. For the structures in the TPL experiments, moving from 8-tilt to 16-tilt scans does not considerably reduce the reconstruction error but increases the data acquisition time by about 2 hours. Therefore, our real-world experiments used eight tilt scans for each sample.
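For readers unfamiliar with the metric, a chamfer distance between two surfaces is typically computed on points sampled from each surface. The sketch below follows one common convention, the average of the two directed mean nearest-neighbour Euclidean distances; this convention, the function name, and the brute-force search are assumptions, and the exact definition and sampling used for Table 1 are not specified here.

```python
import numpy as np

def chamfer_distance(points_a, points_b):
    """Symmetric chamfer distance between two point sets (N x 3 and M x 3)."""
    diff = points_a[:, None, :] - points_b[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)      # (N, M) pairwise distances
    a_to_b = dist.min(axis=1).mean()          # each point of A to nearest in B
    b_to_a = dist.min(axis=0).mean()          # each point of B to nearest in A
    return 0.5 * (a_to_b + b_to_a)
```

The brute-force pairwise matrix is only practical for small point sets; a spatial index such as a k-d tree would normally replace it for dense samples.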
