Automated forest inventory: analysis of high-density airborne LiDAR point clouds with 3D deep learning

Binbin Xiang, Maciej Wielgosz, Theodora Kontogianni, Torben Peters, Stefano Puliti, Rasmus Astrup, Konrad Schindler
Photogrammetry and Remote Sensing, ETH Zürich, 8093 Zürich, Switzerland
Norwegian Institute of Bioeconomy Research (NIBIO), 1433 Ås, Norway

Abstract

Detailed forest inventories are critical for sustainable and flexible management of forest resources, to conserve various ecosystem services. Modern airborne laser scanners deliver high-density point clouds with great potential for fine-scale forest inventory and analysis, but automatically partitioning those point clouds into meaningful entities like individual trees or tree components remains a challenge. The present study aims to fill this gap and introduces a deep learning framework, termed ForAINet, that is able to perform such a segmentation across diverse forest types and geographic regions. From the segmented data, we then derive relevant biophysical parameters of individual trees as well as stands. The system has been tested on FOR-Instance, a dataset of point clouds that have been acquired in five different countries using surveying drones. The segmentation back-end achieves over 85% F-score for individual trees, and over 73% mean IoU across five semantic categories: ground, low vegetation, stems, live branches and dead branches. Building on the segmentation results, our pipeline then densely calculates biophysical features of each individual tree (height, crown diameter, crown volume, DBH, and location) and properties per stand (digital terrain model and stand density). Crown-related features in particular are in most cases retrieved with high accuracy, whereas the estimates for DBH and location are less reliable, due to the airborne scanning setup.

Keywords: automated forestry inventory, individual tree segmentation, high density ALS point cloud, 3D semantic segmentation, individual tree component segmentation

Journal: Remote Sensing of Environment


1 Introduction

Forests offer multiple ecosystem functions and services such as timber production, carbon sequestration, valuable habitats to safeguard biodiversity, and recreation (Li et al., 2023; Maes et al., 2023). Forest inventories are intended to provide the required information to support sustainable, multi-functional forest management, so as to ensure the continued provision of those services. Up-to-date and fine-scale (i.e., tree level) information represents a cornerstone for the transition towards a small-scale, targeted, adaptive, and multi-functional management.

Collecting and maintaining forest inventories was one of the first applications to adopt airborne laser scanning (ALS) as an operational tool, at local (Næsset et al., 2004), regional and national scales (Kangas et al., 2018). The two main approaches to extract forest inventory data from Light Detection and Ranging (LiDAR) point clouds are the area-based approach (ABA, Næsset, 2002) and individual tree detection (ITD, Hyyppä et al., 2001). ABA operates by estimating statistical indicators from the points within a given area (e.g., a grid cell). In contrast, ITD (also referred to as single tree inventory) relies on the detection and segmentation of individual trees in the point cloud (Persson et al., 2002), and consequently requires data at higher resolution (Coomes et al., 2017). Several methods have been developed for ITD, whose accuracies differ depending on the forest conditions and the point density (Vauhkonen et al., 2011; Kaartinen et al., 2012). Most ITD algorithms operate on a rasterised canopy height model (CHM) derived from the highest ALS returns, representing the canopy top layer. Consequently, they are limited to the dominant canopy trees and tend to miss an important share of intermediate and understory trees. Despite the large body of research on the topic, the operational uptake of ITD inventories has so far been limited, due to (1) the bias introduced by the relatively low detection rates, caused by the omission of non-dominant trees, and (2) a lack of transferability across different forest conditions and datasets.

Traditional, low-resolution ALS-based inventories have relied on rather sparse point clouds (0.5–5 pts/m²) for ABA, and slightly denser ones (10–20 pts/m²) for ITD. With the rapid development of sensor technology, very high density ALS (ALS-HD) has become available, with point cloud densities ranging from 500 up to 10,000 pts/m², offering new opportunities (Kellner et al., 2019; Puliti et al., 2022). While also collected with airborne platforms such as drones (as in the present study) or helicopters (Hyyppä et al., 2022; Persson et al., 2022), ALS-HD data differ from conventional ALS data in that they are acquired from lower altitude (and at lower flight speed), and usually with larger overlap between adjacent flight strips. Despite covering smaller geographical regions (e.g., forest stands), ALS-HD represents an emerging alternative data source for ITD inventories due to its (for airborne data) unprecedented level of detail, closing the gap between large-area ALS and high-resolution terrestrial laser scanning (TLS).

The increased amount of 3D structural detail, with distinctively visible individual trees throughout the vertical canopy profile, has sparked renewed interest in the pursuit of single tree inventories, and possibly even completely airborne inventories based on dense retrievals of the variables of interest from the point clouds (Liang et al., 2019; Puliti et al., 2020). In addition to the measurement of common forestry variables like the diameter at breast height (DBH, Kuželka et al., 2020) and tree biomass (Brede et al., 2019), recent research has highlighted the potential to derive a suite of useful tree analytics and insights from per-tree ALS-HD data, such as species (Hakula et al., 2023), stem curve (Hyyppä et al., 2022), growth trajectory (Puliti et al., 2022), and breeding-related traits (du Toit et al., 2023). The promise of high-resolution scans for ITD is evident, but there is a bottleneck: to measure the properties of each tree, one first needs a sufficiently accurate and transferable method to segment the raw scan data into individual trees.

Parallel to the rise of high-density ALS, we have seen an explosion of interest in deep machine learning algorithms for processing 3D point clouds in general, and forest scans in particular (Table 1). Over the last decade, the superior capabilities of deep neural networks when it comes to representing and analysing structured data have brought about transformative changes across a range of science and engineering disciplines (Wang et al., 2023a). Deep learning also holds great promise for tree segmentation, and indeed there has been a recent trend to adopt it for the ongoing efforts towards automated forest inventories (Xi et al., 2018; Chen et al., 2021b; Krisanski et al., 2021b; Sun et al., 2022; Chang et al., 2022; Wang and Bryson, 2023; Zhang et al., 2023; Kim et al., 2023; Jiang et al., 2023; Wielgosz et al., 2023; Straker et al., 2023). The review in Table 1 indicates that most methods either apply 2D CNNs to the CHM or use PointNet++ (Qi et al., 2017) to directly process point clouds. Networks that operate on the CHM share the limitations of traditional 2.5D ITD algorithms, i.e., they disregard understory trees. The seminal PointNet++ architecture does consider all scan points, including those below the main canopy. However, newer network designs, particularly sparse 3D convolutional networks (Choy et al., 2019), offer higher accuracy as well as computational advantages, making them more suitable for high-density point clouds.

Table 1: Representative deep learning based methods for single and multiple tasks to segment trees based on point clouds

Task	Highlights of method	Remarks	References
Single task: individual tree crown segmentation (main output is individual tree crown delineation or crown width)
CHM based	(1) Deep learning method to recognize trees and then use height-related gradient information to accomplish individual tree crown delineation.	Data collected by Unmanned aerial vehicle (UAV) LiDAR in Chizhou City, China; Data not open; Code not open.	(Chen et al., 2021b)
	(2) Deep learning object detection algorithm to identify individual tree crowns in height maps generated from point cloud.	Data collected by ALS in Nan**g, China; Data open (Contact the author); Code not open.	(Sun et al., 2022)
	(3) CHM based segmentation by using YOLOv5 network.	FOR-instance dataset; Code open.	(Straker et al., 2023)
Single task: individual tree segmentation (main output are point-wise instance IDs)
CHM based	(4) RandLA-Net to remove non-tree points; 2D detection-based deep learning for tree instance segmentation, followed by post-processing refinement.	Data collected by TLS in Evo, Finland and Guigang, China; Other output is tree position, tree height, tree DBH, crown diameter; Data not open; Code not open.	(Chang et al., 2022)
Point based	(5) Top-down instance segmentation deep learning and self-adaptive mean shift clustering.	Data collected by ALS in Washington, U.S. and Bretten, Germany; Data open; Code not open.	(Zhang et al., 2023)
Single task: individual tree component segmentation (main output are point-wise semantic labels)
Deep learning based	(6) FCN to classify wood, branch and others.	Data collected by TLS in Canada; Data not open; Code not open.	(Xi et al., 2018)
	(7) Classify terrain, vegetation, coarse woody debris (CWD) and stem based on PointNet++.	Data collected by TLS, ALS, MLS and UAV-based aerial photogrammetry (UAS_AP) in Australia and New Zealand; Other output is DTM; Data open (contact the author); Code open.	(Krisanski et al., 2021b)
	(8) Classify ground, understorey, tree stem, tree foliage based on RandLA-Net.	Data collected by backpack laser scanner; Data not open; Code not open.	(Kaijaluoto et al., 2022)
	(9) Similar to (7). PointNet++ model for segmenting the canopy, trunk, and branches of tree.	Data collected by TLS in Korea; Data not open; Code not open.	(Kim et al., 2023)
	(10) Similar to (7). Leaf-wood separation network.	Data collected by TLS in eastern Cameroon; Data open; Code not open.	(Jiang et al., 2023)
Hybrid features based	(11) Recurrent Neural Networks (RNNs) that directly estimates the geometric parameters of individual tree stems.	Data collected by ALS and TLS in Australia and New Zealand; Data not open; Code not open.	(Wang and Bryson, 2023)
Multiple tasks
	(12) Step-by-step method: ground removal first, then raster-based Faster R-CNN for individual tree segmentation, then a 3D FCN for stem segmentation.	Data collected by TLS in Australia; Tasks include DTM generation, individual tree segmentation, stem points extraction and stem reconstruction; Data not open; Code not open.	(Windrim and Bryson, 2020)
	(13) PointNet++ for semantic segmentation. The remaining vegetation points are assigned to be the same tree as the nearest cylinder measurement in X and Y coordinates.	Data collected by TLS in Western Australia; Tasks include DTM generation, semantic segmentation (terrain, vegetation, CWD, stems), individual tree segmentation and tree attributes calculation (height, DBH); Data not open; Code open.	(Krisanski et al., 2021a)
	(14) PointNet++ semantic segmentation; Graph-based approach for instance segmentation.	Data collected by MLS in south-east Norway; Tasks include semantic segmentation (vegetation, terrain, stem, CWD), individual tree segmentation and derivation of attributes (height); Data not open; Code open.	(Wielgosz et al., 2023)

Past research has predominantly focused on semantic segmentation tasks such as segmenting leaf and wood points or separating points on trees from those on understory (Krisanski et al., 2021b). Comparatively few studies look at tree instance segmentation (Windrim and Bryson, 2020; Krisanski et al., 2021a), and those mostly rely on sequences of ad-hoc steps (possibly including deep learning modules). Such pipelines are difficult to transfer to a new location or sensor setup, because the hyper-parameters must be set correctly for every step of the sequence, and the settings of different steps frequently depend on each other (Wielgosz et al., 2023).

Most previous studies on the analysis of forest scans have focused on one specific sub-task, for instance only the localisation of individual trees, or the segmentation of tree crowns, or the segmentation of a given, individual tree into its components (Table 1). Individual tree localisation aims to identify a representative spatial location, typically the centre coordinates of the stem or crown (Zhang et al., 2019; de Paula Pires et al., 2022). Generally, the analysis of (static or mobile) TLS data tends to extract stem centres, and stem attributes like the DBH (Zhang et al., 2019; de Paula Pires et al., 2022); whereas in ALS data (from drones as well as manned aircraft) it is common to look for the crown centre (Zörner et al., 2018). Crown segmentation aims to delineate individual tree crowns, often based on 2D projections onto the ground, and to derive crown attributes such as the area, width or overlap (Strîmbu and Strîmbu, 2015; Dalponte and Coomes, 2016). Individual tree segmentation refers to finding all points that belong to the same tree instance (including crown, stem and isolated branches) (Dai et al., 2018; Yan et al., 2020), whereas tree component segmentation consists of predicting fine-grained semantic labels (e.g., wood, foliage, stem) for all points of an individual tree (Xu et al., 2021; Xi et al., 2018).

Clearly, many or all of the mentioned tasks must be solved to meet the requirements of an inventory that supports forest management. A few studies have proposed pipelines that greedily carry out several tasks step by step (Wielgosz et al., 2023; Windrim and Bryson, 2020), or have collected existing solutions for individual tasks into a common software package (Roussel et al., 2020). See Table 1. We note that different segmentation tasks (tree vs. non-tree, individual tree instances, per-instance components) are inherently related to each other. Solving them simultaneously may offer significant synergies and result in better performance, while at the same time reducing the computational overhead. On the other hand, such a shared model is particularly demanding with respect to the underlying representation, as it must be universal enough to support multiple different tasks.

Data augmentation synthetically enhances the diversity of the training data and prevents overfitting, thus improving model accuracy without the need to collect additional field data (Chen et al., 2021b; Zhang et al., 2023). The methods in Table 1 usually employ basic data augmentation techniques like rotation, mirroring and coordinate jittering (Chen et al., 2021b; Krisanski et al., 2021b; Kaijaluoto et al., 2022). Yet, we are not aware of data augmentation strategies that take into account the specific characteristics of LiDAR point clouds captured in forests.

Another significant barrier to the advancement of automated forest inventory tools has been the lack of large reference datasets. Table 1 highlights the limited availability of open data that, moreover, often provide only semantic category labels but not tree instance labels. The scarcity of labeled data for supervised learning presents a bottleneck for the use of deep learning. The emergence of comprehensive labeled datasets like FOR-Instance is an important prerequisite to enable 3D deep learning in support of forest inventories.

In the present work, we consolidate data-driven segmentation and structuring of forest point clouds into a single, integrated deep learning framework. We argue that a key ingredient for an efficient, automatic airborne forest inventory and management system is a comprehensive segmentation engine, tuned specifically to the data characteristics of forest scans. In this way, reference data for different tasks can be jointly used to supervise the model training, so as to obtain a generic, versatile representation of point patterns that are informative about trees. Besides its proven abilities in terms of representation learning, the deep learning paradigm also offers end-to-end training from raw point cloud data, and thus a single, small set of hyper-parameters to tune the model to new forest and/or sensor characteristics. In our view, the segmentation problem is the critical part of the analysis. Once it has been solved with good accuracy, biophysical variables like tree height, DBH and crown attributes can be extracted with dedicated geometric algorithms, thus ensuring transparency and interpretability of the retrieval system. To validate the proposed approach, in the present study we:

1. Develop a deep network, ForAINet, that ingests ALS-HD data and jointly performs semantic segmentation at plot/stand level (dominant canopy trees / intermediate and understory trees / low vegetation / ground), tree instance segmentation, and semantic component segmentation at tree level (stem / live branches / dead branches).

2. Complement the segmentation back-end with geometric algorithms to retrieve several important biophysical attributes from the segmentation results, including both individual tree attributes (height, crown diameter, crown volume, DBH, location), and plot-level attributes (stand density and terrain model).

3. Experimentally validate the performance on a database containing multiple plots from forests in different geographic regions, with varying characteristics. We also compare to a widely used point cloud processing software, and find that the proposed deep learning approach greatly outperforms that baseline.

All data used in our study is publicly available, and our source code will be made available under a permissive license. We hope that our work, together with the growing access to high-density LiDAR observations, will contribute to accelerating the development of automatic forestry inventories.

2 Materials and methods

2.1 Dataset

Figure 1: Illustration of our segmentation and retrieval framework. It operates in two steps: the first step segments points into semantic categories as well as individual trees, for details see Figure 2 and Section 2.2. The second step retrieves tree parameters and stand structure from the segmentation results, see Section 2.3.

Table 2: Characteristics of the FOR-Instance dataset in different geographic regions.

Region name	Forest type	Tree species	Sensor	Number of plots (Train / Eval / Test)	Number of trees	Average point density (pts/m²)	Country
CULS	Coniferous dominated temperate forest	Pinus sylvestris	Riegl VUX-1 UAV			2585	Czech Republic
NIBIO	Coniferous dominated boreal forest	Picea abies (dominated), Pinus sylvestris (few), Betula pendula (few)	Riegl MiniVUX-1 UAV		575	9529	Norway
RMIT	Native dry sclerophyll eucalypt forest	Eucalyptus pulchella	Riegl MiniVUX-1 UAV		223	498	Australia
SCION	Non-native pure coniferous temperate forest	Pinus radiata	Riegl MiniVUX-1 UAV		135	4576	New Zealand
TUWIEN	Deciduous dominated alluvial forest	Deciduous species	Riegl VUX-1 UAV		150	1717	Austria

The FOR-Instance dataset (Puliti et al., 2023) is a recently published, machine learning-ready collection of 3D point clouds with manually annotated per-point semantic class labels and tree IDs. The dataset also has DBH values acquired by field measurement for selected trees. The point clouds were collected with drones and helicopters equipped with survey-grade laser scanners (Riegl VUX-1 UAV and Mini-VUX) at 67 plots across various geographic regions and forest types. For our study, points with the reference label outpoints, corresponding to incomplete or partially observed trees along plot borders, were removed because they lack instance labels. This leaves us with five semantic categories: low vegetation, ground, stem points, live branches and dead branches, see Figure 1. Points on stems, live branches and dead branches are considered tree points, the other two categories are non-tree points. The data is split into 42 training plots, 14 validation plots and 11 test plots, in such a way that all five geographic regions are present in both the training and the test portions. (The NIBIO2 region of FOR-Instance had to be excluded from the quantitative experiments, because it lacks DBH reference data, and because a significant number of understory trees are not annotated in the reference.) Table 2 lists important characteristics of the FOR-Instance dataset, separately per geographic region. For further information about the data we refer the reader to Puliti et al. (2023).

One specific challenge of the FOR-Instance data is a strong geographic imbalance: while the NIBIO forest region comprises 37 training plots, CULS, RMIT, and TUWIEN each only feature one training plot. The different regions vary greatly in terms of tree species composition and terrain. For example, the CULS region is predominantly made up of Pinus sylvestris, whereas TUWIEN mainly consists of deciduous trees. These species differences result in a wide range of crown shapes and of the amount of live branches. Furthermore, the regions exhibit marked differences in point density. Particularly NIBIO is sparser than the other regions near and on the ground. Also, some regions (including NIBIO) have pronounced topography, whereas others are fairly flat. Finally, FOR-Instance features realistic, complex tree structures, including bent stems, occluded or sparsely sampled stems, and fallen trees lying on the ground.

2.2 Deep learning framework for multiple segmentation tasks

Our forest inventory system consists of two parts, see Figure 1. First, a deep neural network labels the individual 3D points, simultaneously assigning semantic labels and instance IDs, using a shared feature extraction backbone (Figure 2). For brevity, we call that component ForAINet, short for Forest Automatic Inventory Neural Network. In a second step, learning-free geometric methods operate on the segmented data to retrieve a suite of forestry-related biophysical variables at the per-tree and per-stand levels.

2.2.1 Data augmentation and balancing strategies

The input to the segmentation network is a point set $P\in\mathbb{R}^{N\times C}$, where $N$ is the number of points, and $C$ is the dimension of the per-point attributes. In our base setting, the attributes are only the $x$, $y$, $z$ coordinates, with the origin at the point cloud’s centre of gravity. If further useful information is available, e.g., additional sensor readings or forest-specific hand-crafted descriptors (Wang, 2020), it can be appended as additional attributes. For the present study, we have explored the point-wise intensity, the return number and the scan angle rank, as given by the FOR-Instance dataset. Moreover, we have experimented with traditional, hand-crafted geometric features derived from the eigenvalues of the local second moment matrix, including their sum, omnivariance, eigenentropy, anisotropy, planarity, linearity, surface variation, sphericity (Hackel et al., 2016) and verticality (Guinard and Landrieu, 2017), cf. Section 2.5. For efficiency, the entire point cloud is subsampled on a voxel grid to a single, randomly chosen point per voxel, so as to thin out overly dense regions and achieve a homogeneous (maximum) point density. The filter voxel size is set to 20 $\times$ 20 $\times$ 20 cm³, which corresponds to a maximum density of 125 pts/m³.
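The voxel grid subsampling described above can be illustrated with a minimal, standard-library sketch. The function name `voxel_subsample` and the `seed` parameter are hypothetical and only serve the illustration; the actual implementation used in the study may differ.

```python
import math
import random
from collections import defaultdict


def voxel_subsample(points, voxel_size=0.2, seed=0):
    """Keep one randomly chosen point per occupied voxel.

    points: list of (x, y, z) tuples in metres.
    voxel_size: edge length of the cubic voxel in metres (0.2 m as in the text).
    seed: hypothetical parameter, only used to make the sketch reproducible.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for p in points:
        # Integer voxel index along each axis.
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    # One random representative per occupied voxel.
    return [rng.choice(bucket) for bucket in buckets.values()]
```

With 20 cm voxels there are 5 × 5 × 5 = 125 voxels per cubic metre, hence at most 125 retained points per m³.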

Training samples are drawn following a class-balanced random sampling scheme: 3D points are sampled with probability $P_{i}$ inversely proportional to the square root of the class frequency $N_{i}$ ($P_{i}\propto\sqrt{1/N_{i}}$), where the subscript $i$ indicates the five different semantic classes. The sampled point defines the axis of a vertical cylinder of fixed radius, and the set of all points within the cylinder forms one training sample. In our implementation the radius is set to 8 m, enough to ensure almost every tree in the FOR-Instance data is contained in a single cylinder (for other biomes this value may have to be adapted).
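As a rough sketch of this sampling scheme, the snippet below weights every point by $\sqrt{1/N_{i}}$ of its class, draws one seed point, and collects all points inside the vertical cylinder around it. It assumes that $P_{i}$ is applied as a per-point weight; all function names are hypothetical.

```python
import math
import random
from collections import Counter


def class_balanced_probs(labels):
    """Per-point sampling probabilities, proportional to sqrt(1 / N_i)."""
    counts = Counter(labels)
    weights = [math.sqrt(1.0 / counts[label]) for label in labels]
    total = sum(weights)
    return [w / total for w in weights]


def cylinder_sample(points, centre_xy, radius=8.0):
    """Indices of all points inside a vertical cylinder around centre_xy."""
    cx, cy = centre_xy
    return [i for i, (x, y, _z) in enumerate(points)
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2]


def draw_training_sample(points, labels, radius=8.0, rng=random):
    """Draw a seed point with class-balanced probability, return its cylinder."""
    probs = class_balanced_probs(labels)
    seed_idx = rng.choices(range(len(points)), weights=probs, k=1)[0]
    x, y, _z = points[seed_idx]
    return cylinder_sample(points, (x, y), radius)
```

Because the weight grows only with the inverse square root of the class count, rare classes such as dead branches are favoured as seed points without completely suppressing frequent classes.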

In addition to the class-balanced sampling, we also experimented with three additional balancing strategies, in order to determine and adopt the most effective one. Class weighting assigns weights inversely proportional to the square root of the per-class point count when computing the (semantic) segmentation loss, such that classes that are rare within a given sample receive higher weights. Height weighting assigns higher weights to low points (small $z$-coordinate), to reflect the uneven scan density along the vertical. The height values are normalized by their mean and the weight is inversely proportional to the logarithm of that height. Finally, for more balanced sampling across different regions, we test a region weighting scheme, where each geographic region is assigned a sampling probability, calculated from the region-wide class frequencies with the same inverse square root scheme as for class balancing.

For each sampled cylinder region, various data augmentation techniques are applied during training: additive Gaussian random noise on the point coordinates (jittering), random rotations around the cylinder axis, random anisotropic scaling by factors $s\in[0.9,1.1]$, and random reflection along the $y$-axis. In addition, a subsampling strategy is implemented for all input points: for every training sample, an unbiased coin flip (50% probability) decides whether to apply subsampling; if yes, then 40% of the points in the sample are randomly discarded. That procedure aims to boost robustness against occlusions and missing points, especially for parts obscured by foliage and for the part near the ground, which tends to be sparser in ALS point clouds. Moreover, elastic deformation is applied, with the aim to enhance the recognition of curved tree stems. To that end, a field of Gaussian random deformations with magnitude $\in[0.4,1.6]\,$m is sampled on a regular grid of granularity $\in[0.2,0.8]\,$m. The field is then smoothed to ensure spatial coherence and applied to the point coordinates via trilinear interpolation, causing local bending of previously straight structures such as stems.
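The basic augmentations (without the elastic deformation) could look roughly as follows. This is a sketch under several assumptions that are not stated in the text: the coordinates are centred on the cylinder axis, the reflection negates the $y$-coordinate, the jitter standard deviation `jitter_sigma` is a placeholder value, and the operations are applied in the order shown.

```python
import math
import random


def augment_cylinder(points, rng, jitter_sigma=0.01, drop_prob=0.5, drop_frac=0.4):
    """Rotate, scale, reflect and jitter a cylinder sample, then maybe thin it out.

    points: list of (x, y, z) tuples, assumed centred on the cylinder axis.
    jitter_sigma: hypothetical noise level in metres (not given in the text).
    """
    angle = rng.uniform(0.0, 2.0 * math.pi)          # rotation about the vertical axis
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    sx, sy, sz = (rng.uniform(0.9, 1.1) for _ in range(3))  # anisotropic scaling
    flip = rng.random() < 0.5                        # random reflection
    out = []
    for x, y, z in points:
        xr = sx * (cos_a * x - sin_a * y)
        yr = sy * (sin_a * x + cos_a * y)
        zr = sz * z
        if flip:
            yr = -yr
        out.append((xr + rng.gauss(0.0, jitter_sigma),
                    yr + rng.gauss(0.0, jitter_sigma),
                    zr + rng.gauss(0.0, jitter_sigma)))
    # Coin flip: with probability drop_prob, discard drop_frac of the points.
    if rng.random() < drop_prob:
        keep = round(len(out) * (1.0 - drop_frac))
        out = rng.sample(out, keep)
    return out
```

For a sample of 100 points, the output therefore contains either all 100 points or a random subset of 60.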

Inspired by the idea behind Mix3D (Nekrasov et al., 2021) and CoSMix (Saltori et al., 2022), we propose TreeMix, a dedicated data augmentation strategy for forest point clouds. As illustrated in Figure 3, TreeMix synthetically mixes two cylindrical training samples: tree instances in the target sample are randomly removed and replaced with instances taken from the source sample, by copying them to the appropriate locations. To further enhance diversity, the newly added trees are augmented with random noise, rotation, reflection and scaling, as described above. A newly inserted tree is accepted only if it has low overlap with the points in the target sample (overlap below 10%). This check ensures that synthetic samples comply with the logic of manually labeled forest point clouds, where different tree IDs do not overlap, since inter-penetrations between different canopies cannot be labeled precisely. By mixing trees from different locations within a plot, or even from different plots or countries, TreeMix increases the diversity of the training data.
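The acceptance test for an inserted tree can be sketched as below. The text does not specify how the overlap is measured; this sketch assumes, purely for illustration, that it is the fraction of voxels occupied by the inserted tree that are also occupied by points of the target sample. The function names and the voxel size are hypothetical.

```python
import math


def occupied_voxels(points, voxel_size=0.2):
    """Set of integer voxel indices occupied by the given (x, y, z) points."""
    return {tuple(math.floor(c / voxel_size) for c in p) for p in points}


def accept_inserted_tree(tree_points, target_points, max_overlap=0.10, voxel_size=0.2):
    """Accept the tree only if its voxel overlap with the target stays below max_overlap."""
    tree_vox = occupied_voxels(tree_points, voxel_size)
    if not tree_vox:
        return False
    shared = tree_vox & occupied_voxels(target_points, voxel_size)
    return len(shared) / len(tree_vox) < max_overlap
```

A rejected tree would simply be skipped (or re-placed), so that the mixed sample keeps the non-overlapping instance structure of the manually labeled data.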

2.2.2 Network architecture

The point cloud segmentation network consists of a shared 3D feature extraction backbone, followed by three parallel prediction heads (Figure 2). The first head is dedicated to semantic segmentation. To address the complexity of segmenting trees in densely packed forests, it proved beneficial to use the other two heads to extract two complementary embeddings of each point, which serve as the basis for tree instance clustering.

As feature extraction backbone we use a sparse 3D convolutional neural network (CNN), implemented in the MinkowskiEngine library (Choy et al., 2019). It offers a favourable trade-off between performance and computational cost (Xiang et al., 2023b). Essentially, the backbone is a 3D version of the U-Net architecture that processes voxelised point clouds with sub-manifold sparse convolutions, generating per-point feature vectors of length 16. These features are then passed to the three prediction heads.

The semantic segmentation branch estimates a semantic label for each point. First, the per-point features are fed through a multi-layer perceptron (MLP) to obtain 5-class semantic scores $F_{s}\in\mathbb{R}^{N\times 5}$. Those scores are further processed with two branches. One simply applies a softmax activation to obtain 5-class probabilities for the point. The other has one more hidden layer, also followed by softmax activation, to obtain binary tree/non-tree probabilities. Both outputs are trained with standard cross-entropy losses. Points labeled as non-tree are excluded from the subsequent individual tree segmentation.

The instance segmentation branch assigns point-wise individual tree IDs by clustering the outputs of the two corresponding prediction branches. One of them estimates, for each point, a 3D offset vector pointing to the centre of its tree instance, and is supervised with a loss consisting of (i) the cosine distance between the true and predicted offset vectors, to align the directions of the predicted and true offset vectors, irrespective of object size and distance from the object centre; and (ii) the $L_{1}$ distance between their endpoints, to directly minimize the deviation from the true offset. The 3D centre offset has been advocated by several previous studies (Jiang et al., 2020; Chen et al., 2021a; Vu et al., 2022; Zhong et al., 2022). A perfect prediction would mean that the offsets contract each instance to a single point. The other branch maps each point to a 5-dimensional embedding space, such that points on the same tree instance form clusters in that space. Using 5 dimensions is in line with existing literature (Wang et al., 2019a; Engelmann et al., 2020; He et al., 2020) and corroborated by our own experiments, where higher values did not produce significant accuracy gains. This branch is supervised with a contrastive loss function that aims to minimise distances between points on the same tree, and to maximise distances between points on different trees (Wang et al., 2019a). Notably, the two additional dimensions of the embedding space make it possible to represent further object properties beyond 3D geometric offsets. Empirically, using both the embedding and the direct offset prediction allows for more accurate tree instance segmentation.
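To make the two terms of the offset loss concrete, here is a minimal sketch in plain Python. It assumes that both terms are weighted equally and averaged over all points, which the text does not state; the function name and the `eps` stabiliser are hypothetical. Since the predicted and true offsets start at the same point, the $L_{1}$ distance between their endpoints equals the $L_{1}$ distance between the offset vectors themselves.

```python
import math


def offset_loss(pred_offsets, true_offsets, eps=1e-8):
    """Mean over points of (cosine distance + L1 distance) between offsets.

    pred_offsets, true_offsets: equally long lists of (dx, dy, dz) tuples.
    """
    total = 0.0
    for p, t in zip(pred_offsets, true_offsets):
        dot = sum(a * b for a, b in zip(p, t))
        norm_p = math.sqrt(sum(a * a for a in p))
        norm_t = math.sqrt(sum(b * b for b in t))
        # Direction term: 0 for parallel vectors, 2 for opposite vectors.
        cosine_distance = 1.0 - dot / (norm_p * norm_t + eps)
        # Magnitude term: deviation of the predicted endpoint from the true one.
        l1_distance = sum(abs(a - b) for a, b in zip(p, t))
        total += cosine_distance + l1_distance
    return total / len(pred_offsets)
```

The cosine term is insensitive to the offset length, so it still provides a useful gradient for points far from the tree centre, where the $L_{1}$ term alone is dominated by large magnitudes.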

The two embeddings are separately clustered into tree candidates in an unsupervised fashion: offsets are explicitly added to the point coordinates and the resulting, shifted points are clustered with simple region growing, as recommended by Zhao et al. (2021), with the threshold for the maximum distance set to 0.3 m. Clusters in the 5D embedding space, where distances do not have a direct geometric interpretation, are found with the mean-shift method, as in several studies (Wang et al., 2019a; Lahoud et al., 2019). Mean-shift (Comaniciu and Meer, 2002) has only a single bandwidth parameter, which we always keep at an empirically determined value of 0.6, a setting we found to work robustly across city as well as forest scenes (Xiang et al., 2023b, a). In line with previous work, we find that conventional clustering as a post-process works better than learning cluster assignments within the neural network.
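The region growing on offset-shifted points can be sketched as a breadth-first search over a distance-threshold neighbourhood. The brute-force neighbour search below is quadratic in the number of points and only meant for illustration; a practical implementation would use a spatial index. Function names are hypothetical.

```python
import math
from collections import deque


def region_growing(points, max_dist=0.3):
    """Cluster points by chaining neighbours that are at most max_dist apart."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(len(points)):
                if labels[j] == -1 and math.dist(points[i], points[j]) <= max_dist:
                    labels[j] = current
                    queue.append(j)
        current += 1
    return labels


def cluster_shifted(points, offsets, max_dist=0.3):
    """Add the predicted offsets to the coordinates, then run region growing."""
    shifted = [tuple(c + o for c, o in zip(p, off)) for p, off in zip(points, offsets)]
    return region_growing(shifted, max_dist)
```

If the offsets are predicted well, all points of a tree collapse near its centre, so even a small threshold such as 0.3 m separates neighbouring trees whose crowns touch in the original coordinates.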

The two (redundant) sets of tree instance candidates are merged and filtered with another small neural network, called ScoreNet (Jiang et al., 2020), that predicts how well each instance candidate matches a ground truth (GT) tree instance. ScoreNet is a small 3D U-Net followed by max-pooling and a fully connected layer, and is trained to regress the maximal expected intersection over union (IoU) between an instance candidate and any of the actual trees. The candidates are then ordered by their score and pruned with greedy non-maximum suppression to obtain the final set of individual tree instances. All non-tree points are assigned the instance label $-1$. See Xiang et al. (2023a) for further details.
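The pruning step can be sketched as follows, with each candidate represented as a set of point indices. The IoU threshold of 0.3 and the function names are assumptions for illustration; the text does not state the threshold that was used.

```python
def iou(a, b):
    """Intersection over union of two sets of point indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def greedy_nms(candidates, scores, iou_threshold=0.3):
    """Keep candidates in order of decreasing score, dropping overlapping ones.

    Assumption: iou_threshold is a hypothetical value chosen for this example.
    Returns the indices of the kept candidates.
    """
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(candidates[i], candidates[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


candidates = [{0, 1, 2, 3}, {1, 2, 3, 4}, {10, 11}]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(candidates, scores))
```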

At test time, cylindrical blocks are not sampled at random, but on a regular $(x,y)$ -grid, to ensure even coverage of the plot. Overlapping blocks are merged by re-assigning instance IDs such that they are globally unique, while greedily fusing instances that were split across more than one block (Xiang et al., 2023a). We slightly modify the original block merging scheme and fuse the instances whose intersection has the highest overlap with the smaller instance (rather than with the union). After block merging, the points discarded by the initial voxel grid subsampling are reinserted and labeled with the nearest-neighbour method, such that every observed scan point has a semantic class and an instance ID.

2.3 Automated retrieval of tree parameters and stand structure

After segmenting a 3D point cloud, important individual tree attributes and stand structure characteristics can be extracted with straightforward geometric computations (see Figure 1). These biophysical attributes are the actual variables of interest for the forest inventory, because they typically serve as input for management decisions and ecological analysis. Compared to field measurements, tree parameters retrieved from point clouds allow for complete and consistent coverage of much larger areas. For some parameters (e.g., tree height), LiDAR retrievals are also the most accurate method; while for others (e.g., DBH) they are not as accurate as field measurements, and one has to accept a lower reliability of the individual measurement in exchange for the better coverage. In the present study, we extract the following exemplary forest attributes: at plot level, a digital terrain model (DTM) of the plot and its stand density; and for each individual tree the height, crown diameter, crown volume, crown volume of live branches, DBH and location.

The DTM (a raster of heights above some reference, in m) is routinely retrieved from LiDAR observations, due to the high cost of field surveys. In our study, the relevant points for DTM fitting have already been filtered by segmenting a separate ground class. Computing the DTM reduces to setting up a regular $(x,y)$ -raster and interpolating the heights of the raster points. We choose a grid spacing of 0.5 m $\times$ 0.5 m. Due to the high point density, nearest neighbour interpolation is sufficient.
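A minimal sketch of the rasterisation step is given below. It uses a brute-force nearest-neighbour search and takes the raster extent from the bounding box of the ground points; both choices, as well as the function name, are assumptions for illustration rather than the implementation used in the study.

```python
import math


def dtm_nearest_neighbour(ground_points, spacing=0.5):
    """Rasterise ground points (x, y, z) by nearest-neighbour interpolation.

    Returns the raster origin (x0, y0) and a row-major list of height rows.
    Assumption: the raster spans the bounding box of the ground points.
    """
    xs = [p[0] for p in ground_points]
    ys = [p[1] for p in ground_points]
    x0, y0 = min(xs), min(ys)
    n_cols = int(math.floor((max(xs) - x0) / spacing)) + 1
    n_rows = int(math.floor((max(ys) - y0) / spacing)) + 1
    raster = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            gx, gy = x0 + c * spacing, y0 + r * spacing
            # Height of the ground point closest to the raster node in (x, y).
            nearest = min(ground_points,
                          key=lambda p: (p[0] - gx) ** 2 + (p[1] - gy) ** 2)
            row.append(nearest[2])
        raster.append(row)
    return (x0, y0), raster


ground = [(0.0, 0.0, 10.0), (1.0, 0.0, 11.0), (0.0, 1.0, 12.0), (1.0, 1.0, 13.0)]
origin, raster = dtm_nearest_neighbour(ground)
print(origin, len(raster), len(raster[0]))
```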

Stand density (trees/ha) is another important plot-level feature, reflecting a forest’s growth state and influencing factors such as light availability, resource competition, and overall forest health (Liang et al., 2018). Stand density is calculated by counting the number of individual trees and dividing it by the surface area they cover. That area is estimated by projecting all tree points down to the $(x,y)$ -plane and computing the area of the 2D convex hull around them (Figure 1).
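The computation can be sketched with a standard monotone-chain convex hull and the shoelace formula for the polygon area. The function names are illustrative, and coordinates are assumed to be in metres so that the area can be converted to hectares.

```python
def convex_hull(points):
    """2D convex hull (Andrew's monotone chain), counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly):
    """Area of a simple polygon via the shoelace formula."""
    pairs = zip(poly, poly[1:] + poly[:1])
    return 0.5 * abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs))


def stand_density(tree_points_xy, n_trees):
    """Trees per hectare; coordinates are assumed to be in metres."""
    area_ha = polygon_area(convex_hull(tree_points_xy)) / 10000.0
    return n_trees / area_ha


xy = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0), (50.0, 50.0)]
print(stand_density(xy, n_trees=250))
```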

The tree height (m above ground) is defined as the elevation difference between the highest point of a tree and the ground level at the bottom of the same tree instance (Figure 1). The ground elevation is found by interpolating the DTM value at the $(x,y)$ -location of the tree. For a more reliable estimate of the height, we found it advantageous to filter out potential outliers above the canopy (e.g., scanning artefacts, isolated, protruding twigs). To that end we cluster all points labeled as members of the individual tree with HDBSCAN (McInnes et al., 2017) based on their 3D coordinates, find the dominant cluster, and return its highest point as the tree top.

There are several different ways to define the crown diameter (m), i.e., it may be the smallest enclosing circle, or the average between the major and minor axes of an enclosing ellipse. Some authors find the two longest perpendicular distances from the center line to the convex hull, compute their average and double it to approximate the diameter (Trochta et al., 2017; Chen et al., 2021b). Others use the height and width of an axis-aligned bounding box (Sun et al., 2022) or the mean radius of a circular ring that encloses the crown boundary (called a “donut”) Zhang et al. (2015). Here we simply project the crown points (live branches and dead branches) to the $(x,y)$ -plane and find their smallest enclosing circle with Welzl’s algorithm (Welzl, 1991). The diameter of this enclosing circle is our estimated crown diameter (Figure 1).
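For illustration, the smallest enclosing circle can be computed with the randomised incremental formulation of Welzl's algorithm. The sketch below is a generic textbook variant, not the code used in the study; the helper names and the fixed random seed are assumptions.

```python
import math
import random


def circle_from_two(a, b):
    """Circle with segment ab as diameter, as (cx, cy, radius)."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0, math.dist(a, b) / 2.0)


def circle_from_three(a, b, c):
    """Circumcircle of three points, or None if they are (nearly) collinear."""
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    sa, sb, sc = (a[0] ** 2 + a[1] ** 2, b[0] ** 2 + b[1] ** 2, c[0] ** 2 + c[1] ** 2)
    ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
    uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
    return (ux, uy, math.dist((ux, uy), a))


def inside(circle, p, eps=1e-9):
    return circle is not None and math.dist(circle[:2], p) <= circle[2] + eps


def smallest_enclosing_circle(points, seed=0):
    """Randomised incremental (Welzl-style) smallest enclosing circle."""
    pts = list(points)
    random.Random(seed).shuffle(pts)
    circle = None
    for i, p in enumerate(pts):
        if inside(circle, p):
            continue
        circle = (p[0], p[1], 0.0)           # p must lie on the boundary
        for j, q in enumerate(pts[:i]):
            if inside(circle, q):
                continue
            circle = circle_from_two(p, q)   # p and q on the boundary
            for r in pts[:j]:
                if not inside(circle, r):
                    three = circle_from_three(p, q, r)
                    if three is not None:
                        circle = three
    return circle


def crown_diameter(crown_points_xy):
    return 2.0 * smallest_enclosing_circle(crown_points_xy)[2]


print(crown_diameter([(0.0, 0.0), (2.0, 0.0), (1.0, 0.5), (1.0, -0.5)]))
```

The enclosing circle is sensitive to single far-out points, so any outlier filtering should be applied to the crown points before this step.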

Compared to the 2D crown diameter, crown volume more accurately describes the effective size of the tree’s canopy, essential to assess photosynthesis potential and vitality. It enables the derivation of important ecological variables (e.g., growth) predictive of forest health and functionality. To calculate the crown volume (m ${}^{3}$ ), we find all crown points belonging to the same individual tree and filter them with HDBSCAN as described above for the height estimation to discard isolated protrusions. For the dominant cluster we then calculate the 3D convex hull, i.e., the smallest convex polyhedron containing all points, as a simple and robust approximation of the volume, valid across varying point densities. The volume of that polyhedron is our estimate of the crown volume. We compute two variants, one that includes live and dead branches and one that includes only live branches, where the latter is a more accurate description of the volume of the photosynthetic component of the tree and thus useful for its downstream use in modelling tree biophysical attributes such as tree volume and tree growth.

Traditional, manual field measurements of DBH (cm) are done with steel calipers. Often it is measured along two perpendicular directions and the two values are averaged (Liang et al., 2018). The breast height is usually set to 1.3 m above ground. For automatic measurements of DBH based on the segmented point cloud, we follow the standard approach: find all stem points within a given height range around the breast height, project them onto the $(x,y)$ -plane, and fit a circle to them (Figure 1). Specifically, we use a default height interval of $\pm$ 0.5 m around the breast height, but increase that interval if necessary to ensure that at least 10 points are found. Additionally, we filter the projected 2D stem points with HDBSCAN clustering to remove isolated points due to stem segmentation errors. For additional robustness, the circle fit to the remaining points is performed with the RANSAC method (Fischler and Bolles, 1981). The circle diameter corresponds to the DBH, and the circle center determines the location of the tree, according to its definition as the stem center at breast height (Liang et al., 2018).
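The robust circle fit can be sketched as a basic RANSAC loop over three-point samples. The inlier tolerance, the number of iterations, the fixed seed and the function names are assumptions chosen for this example; the text does not report the settings that were used.

```python
import math
import random


def circle_through(a, b, c):
    """Circle through three 2D points as (cx, cy, r), or None if collinear."""
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    sa, sb, sc = (a[0] ** 2 + a[1] ** 2, b[0] ** 2 + b[1] ** 2, c[0] ** 2 + c[1] ** 2)
    ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
    uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
    return (ux, uy, math.dist((ux, uy), a))


def ransac_circle(points_xy, n_iter=200, inlier_tol=0.02, seed=0):
    """Return the sampled circle supported by the most inliers.

    Assumption: inlier_tol (m) and n_iter are hypothetical example values.
    """
    rng = random.Random(seed)
    best, best_inliers = None, -1
    for _ in range(n_iter):
        circle = circle_through(*rng.sample(points_xy, 3))
        if circle is None:
            continue
        cx, cy, r = circle
        inliers = sum(abs(math.dist((cx, cy), p) - r) <= inlier_tol for p in points_xy)
        if inliers > best_inliers:
            best, best_inliers = circle, inliers
    return best


# Synthetic stem slice: 12 points on a 30 cm stem plus two outliers.
stem = [(5.0 + 0.15 * math.cos(math.radians(a)), 3.0 + 0.15 * math.sin(math.radians(a)))
        for a in range(0, 360, 30)]
stem += [(5.5, 3.5), (4.2, 3.0)]
cx, cy, r = ransac_circle(stem)
print("DBH (cm):", 2.0 * r * 100.0, "location:", (cx, cy))
```

The circle diameter gives the DBH and the circle center the tree location, in line with the definitions above.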

2.4 Evaluation metrics

For better readability, all metrics, including confusion matrices, mIoU scores and F-score, are shown as percentages, rounded to one decimal place.

2.4.1 Metrics for point cloud segmentation

Standard metrics are computed to evaluate semantic segmentation: confusion matrices, overall accuracy, mean per-class accuracy, and mean intersection-over-union (mIoU). Additionally, we assess the geometric accuracy of the estimated DTM.

Table 3: Metrics used to evaluate individual tree segmentation. In machine learning, including computer vision, several of these metrics are also in common use but have different names.

Forestry term	Equation	Machine learning term
Completeness; tree detection accuracy (DA); producer’s accuracy (PAdetect)	$C$ = TP/N	Recall ($r$)
Omission error	$e_{om}$ = FN/N	1-Recall
Commission error	$e_{com}$ = FP/(TP+FP)	1-Precision
F-score	$F = 2rp/(r+p)$	F1-score

When comparing the literature, we found that forest researchers (Yin and Wang, 2016; Liang et al., 2018) and machine learning practitioners, including those specialising in computer vision (Gu et al., 2022; Wang et al., 2019a), often use different names for the same quality metric. To make this explicit and avoid misinterpretations, we list them in Table 3. The two fields differ in their naming conventions, but apply the same evaluation criteria. In Table 3, $N$ is the total number of reference trees, TP is the number of true positives (correctly detected trees), FN is the number of reference trees that are not detected, FP is the number of false positives (detections not corresponding to any reference tree), and TP+FP thus equals the total number of trees predicted by the system; $r$ and $p$ denote recall and precision, respectively.
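The relations in Table 3 translate directly into code. The following sketch uses illustrative counts (20 correct detections, no omissions, 3 wrong detections) and assumes that every reference tree is either detected or omitted, so that $N$ = TP + FN; the function name is hypothetical.

```python
def detection_metrics(tp, fn, fp):
    """Tree detection metrics of Table 3, returned as fractions in [0, 1]."""
    n_ref = tp + fn                      # N: total number of reference trees
    recall = tp / n_ref                  # completeness C
    precision = tp / (tp + fp)
    return {
        "completeness": recall,
        "omission_error": fn / n_ref,            # 1 - recall
        "commission_error": fp / (tp + fp),      # 1 - precision
        "f_score": 2.0 * recall * precision / (recall + precision),
    }


metrics = detection_metrics(tp=20, fn=0, fp=3)
for name, value in metrics.items():
    print(f"{name}: {100.0 * value:.1f}%")
```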

A subtle source of potential differences is the procedure used to match predicted trees to reference trees. In forest research this is frequently done based on geometric distances, i.e., each reference tree is matched to the detection that is closest to it in terms of (stem) location (Yan et al., 2020; Huo et al., 2022; Hao et al., 2022). Sometimes additional exclusion criteria are employed to ensure the detection is sufficiently close, has similar height, etc. A related variant is to consider a detection correct if its location falls within the crown boundary of the reference tree (Chen et al., 2022). Some studies on individual tree detection match by IoU score between projected 2D crown polygons (Dietenberger et al., 2023). In our study we follow the standard procedure in computer vision, which reflects the 3-dimensional nature of the problem. Our method matches predicted tree instances to reference trees based on the IoU score between their respective point sets. This approach ensures robustness against variations in tree and stem densities across different plots. Predictions that do not reach an IoU of at least 0.5 with any reference tree are counted as false positives.

To go beyond verifying the presence of tree instances and evaluate how precisely they are delineated, we introduce the coverage metric, which quantifies the agreement with the ground truth instance boundaries. Denoting the set of ground truth trees as $\{I_{i}^{\text{gt}},i\in\{1,...,N_{\text{gt}}\}\}$ and the set of predicted trees as $\{I_{j}^{\text{pre}},j\in\{1,...,N_{\text{pre}}\}\}$ , we compare the two sets on a per-instance basis. For each ground truth tree we find the predicted instance with the highest IoU score, formally:

\text{maxIoU}(I_{i}^{\text{gt}})=\max_{j\in\{1,...,N_{\text{pre}}\}}\text{IoU}\big(I_{i}^{\text{gt}},I_{j}^{\text{pre}}\big)\;.

(1)

The coverage is then defined as the average over all reference trees:

\text{Cov}=\frac{1}{N_{\text{gt}}}\sum_{i=1}^{N_{\text{gt}}}\text{maxIoU}(I_{i}^{\text{gt}})\;.

(2)
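Equations (1) and (2), together with the IoU-based matching at a threshold of 0.5, can be sketched as follows. Instances are represented as sets of point indices; the function names are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two sets of point indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def coverage(gt_instances, pred_instances):
    """Eq. (2): mean over reference trees of the best IoU with any prediction."""
    best = [max((iou(g, p) for p in pred_instances), default=0.0)
            for g in gt_instances]
    return sum(best) / len(gt_instances)


def match_counts(gt_instances, pred_instances, iou_threshold=0.5):
    """Count TP, FP and FN with IoU-based matching of point sets."""
    tp = sum(1 for p in pred_instances
             if any(iou(p, g) >= iou_threshold for g in gt_instances))
    fp = len(pred_instances) - tp
    fn = sum(1 for g in gt_instances
             if not any(iou(g, p) >= iou_threshold for p in pred_instances))
    return tp, fp, fn


gt = [{0, 1, 2, 3}, {4, 5, 6, 7}]
pred = [{0, 1, 2}, {4, 5, 6, 7, 8}, {20, 21}]
print(coverage(gt, pred), match_counts(gt, pred))
```

Unlike the detection metrics, the coverage also rewards precise delineation: a tree that is detected with an IoU of 0.55 counts fully towards completeness, but contributes only 0.55 to the coverage.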

2.4.2 Metrics for individual tree features and stand structure

For the DTM we use the two most common metrics (Liang et al., 2018). The root mean square error (RMSE) measures the average (squared) vertical deviation from the reference. The coverage indicates what portion of the plot area is covered by the estimated DTM: our nearest neighbour interpolation is based on a Voronoi tessellation (Shewchuk, 1996, 2002), as implemented in the pynn function. It may thus happen that no DTM value can be derived, if the grid location falls outside the Voronoi cells.

As in most existing studies, the accuracies of the extracted tree height, crown diameter, crown volume, live crown volume and DBH are evaluated by scatterplots of matched individual trees (Gu et al., 2022). From those, several numerical metrics can be extracted: the slope of the regression line, which should be close to 1; the coefficient of determination ( $R^{2}$ ), which should also be close to 1; and the p-value, which should be close to 0, indicating a highly significant predictive relation between the estimates and the reference values. Moreover, we compute the RMSE of the estimated dimensions. Finally, for crown-related variables, we complement the RMSE with a relative version, RMSE%, which is normalised by the average reference value of the variable. In this way, the errors are set in relation to the absolute magnitude of the estimates: e.g., in a region with small trees of $\approx$ 20 m ${}^{3}$ crown volume, an error of $\pm$ 10 m ${}^{3}$ is a lot, whereas in an area of tall trees with much higher volume, it is very little. For the tree location, we calculate the RMSE in $x$ and $y$ direction, respectively, and create positional accuracy plots to visualize the deviations from the reference coordinates.
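The RMSE and its relative version can be written down in a few lines. The sketch below reproduces the crown volume example from the text, with hypothetical values; RMSE% is returned as a fraction of the mean reference value.

```python
import math


def rmse(estimates, references):
    """Root mean square error between matched estimates and references."""
    squared = [(e - r) ** 2 for e, r in zip(estimates, references)]
    return math.sqrt(sum(squared) / len(squared))


def relative_rmse(estimates, references):
    """RMSE normalised by the mean reference value (RMSE%, as a fraction)."""
    return rmse(estimates, references) / (sum(references) / len(references))


# Same absolute error of 10 m^3, very different relative error.
print(relative_rmse([30.0, 10.0], [20.0, 20.0]))       # small crowns
print(relative_rmse([210.0, 190.0], [200.0, 200.0]))   # large crowns
```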

2.4.3 Evaluation for understory and suppressed trees

In many studies, trees smaller than a certain threshold (i.e., understory and suppressed trees) are excluded from the reference data collection. The criterion for what is considered “too small” may differ across studies; typically it is determined by a lower bound on the DBH (e.g., 10 cm is a typical value in temperate regions). Nevertheless, omitting these smaller trees constitutes a loss of information about the understory, and introduces a bias when assessing the accuracy of remote sensing techniques: suppressed trees that are detected in the point cloud are counted as false positives, since there are no matching reference annotations. In the FOR-Instance data, understory and suppressed trees are not individually identified in the reference data, but are categorized as “low vegetation”. Detected trees in the understory would thus be counted against our method, although the discrepancy is due to limitations of the ground truth, not the model.

We found that, due to the vertical airborne view through the canopy, ALS-based DBH estimates are relatively inaccurate compared to other tree variables. We therefore follow (Huo et al., 2022) and use a height threshold instead: trees taller than 1/3 of the tallest tree in a plot are regarded as dominant, trees below that threshold constitute the understory. Only the dominant trees enter into the computation of the quality metrics in Section 2.4.1, whereas the understory can only be assessed by visual inspection, because there is no appropriate reference.
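The height-based split into dominant and understory trees amounts to a simple threshold. In the sketch below, the dictionary of tree heights and the function name are hypothetical; trees exactly at the threshold are assigned to the understory, which is an assumption.

```python
def split_by_height(tree_heights, fraction=1.0 / 3.0):
    """Split trees of one plot into dominant and understory sets.

    tree_heights maps a tree ID to its height in metres. Trees taller than
    `fraction` of the tallest tree in the plot are regarded as dominant.
    """
    threshold = fraction * max(tree_heights.values())
    dominant = {tid for tid, h in tree_heights.items() if h > threshold}
    understory = set(tree_heights) - dominant
    return dominant, understory


heights = {1: 30.0, 2: 12.0, 3: 9.0, 4: 5.0}
print(split_by_height(heights))
```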

2.5 Implementation details

All code for our study was written in Python, and is publicly available at https://github.com/bxiang233/ForAINet. All experiments were conducted on a machine with 8-core Intel CPU, 8 GB of memory per core, and one Nvidia Titan RTX GPU with 24 GB of on-board memory. The code for extracting hand-crafted geometric features is a slightly modified version of the code published by Guinard and Landrieu (2017). The point cloud segmentation network was constructed with the help of the Torch-Point3D library (Chaton et al., 2020). All hyper-parameters for HDBSCAN filtering were determined by grid search on the validation set.

3 Results

3.1 3D point cloud segmentation

A series of experiments was conducted to examine whether various loss designs, weighting strategies and data augmentations could enhance segmentation performance for forest scenes. The quantitative results for individual tree segmentation and semantic segmentation, across different settings, are summarized in Table 4. The classification results for each semantic class, as well as the binary tree vs. non-tree classification accuracy, are shown in Table 5. The basic setting is the panoptic segmentation network we developed in earlier work (Xiang et al., 2023a), which we expanded to distinguish five semantic classes instead of only two (tree and non-tree). The other tested settings (Tables 4 and 5) are obtained by applying a single modification to the base setting.

Table 4: Instance segmentation and semantic segmentation results under different settings. In each column, bold font indicates the best results.

Settings	Instance segmentation (%)					Semantic segmentation (%)
Settings	Completeness	Omission error	Commission error	F-score	Cov	oAcc	mAcc	mIoU
Basic setting	79.3	20.8	21.2	79.0	77.0	92.6	81.2	73.0
+ binary semantic loss	83.0	17.0	21.6	80.6	79.8	92.5	80.4	72.4
+ class weights	78.9	21.1	22.5	78.2	77.1	90.9	83.9	70.7
+ height weights	80.8	19.2	21.4	79.7	78.0	91.8	78.4	70.4
+ region weights	80.5	19.5	20.0	80.2	77.4	92.3	79.0	71.4
+ intensity	78.9	21.1	14.7	82.0	76.7	93.0	78.9	72.7
+ return number	81.7	18.3	13.5	84.1	78.2	92.9	81.6	73.7
+ scan angle rank	81.7	18.3	18.5	81.6	79.0	93.2	81.8	74.3
+ hand-crafted features	81.4	18.6	18.1	81.7	78.9	93.6	82.6	75.7
+ elastic distortion and subsampling	83.3	16.7	16.5	83.4	79.5	92.6	80.5	72.4
+ TreeMix	82.0	18.0	11.7	85.1	78.1	93.0	80.8	73.5

Table 5: Per-class IoU values under different settings. Bold values in each column indicate the best results.

Settings	Multi class semantic segmentation (%)						Tree vs. non-tree (%)
Settings	Low veg.	Ground	Stem	Live branches	Dead branches	mIoU	Non-tree	Tree	mIoU
Basic setting	86.3	73.7	54.8	93.5	56.5	73.0	93.6	97.1	95.3
+ binary semantic loss	84.8	70.2	54.3	94.1	58.8	72.4	96.0	98.1	97.1
+ class weights	78.9	65.1	58.2	93.4	57.7	70.7	95.9	98.1	97.0
+ height weights	84.0	67.6	50.6	93.3	56.6	70.4	92.6	96.6	94.6
+ region weights	85.5	72.5	48.7	93.4	56.7	71.4	95.0	97.7	96.4
+ intensity	89.3	78.8	50.1	93.4	51.9	72.7	92.9	96.7	94.8
+ return number	86.5	74.5	54.6	94.1	58.8	73.7	93.8	97.1	95.4
+ scan angle rank	88.2	77.1	53.4	94.0	58.6	74.3	96.7	98.4	97.6
+ hand-crafted features	89.1	78.7	55.6	94.3	60.7	75.7	94.2	97.3	95.7
+ elastic distortion and subsampling	85.8	73.9	50.8	93.9	57.8	72.4	92.6	96.6	94.6
+ TreeMix	87.7	76.0	52.3	94.0	57.4	73.5	95.9	98.1	97.0

The tested settings can be categorized into three groups. The first group modifies the loss function or the sampling of training examples, i.e., the available input data is not modified, but used in a different manner. One way to do this is to add explicit supervision for the binary tree vs. non-tree task. It is implemented by adding a fully connected layer after the five-class semantic segmentation branch, supervised with the binary classification loss. At first glance this may appear superfluous, since the binary reference is derived by simply aggregating the fine-grained labels (stems, live branches and dead branches into the tree class, ground and low vegetation into the non-tree class). Still, the aggregation adds information that is otherwise unavailable to the network, namely the hierarchical structure of the label space. E.g., as soon as a point is part of a tree, it can no longer be confused with the low vegetation, hence the classifier can invest its capacity into difficult decisions within the tree class, such as separating live from dead branches. Indeed, we empirically find that adding the binary labels increases the IoU for tree vs. non-tree by 1.8 percentage points (pp, Table 5). A more correct set of tree points, in turn, better supports instance discrimination, so that the F-score for individual tree segmentation also improves by 1.6 pp (see Table 4). The price to pay is a slightly higher confusion between the non-tree classes ground and low vegetation.

Different sampling and reweighting strategies also fall into this group. Sampling based on class frequency significantly improves the segmentation of the under-represented stem class (+3.4 pp), and also of the dead branches (+1.2 pp). However, the associated accuracy loss for the ground ( $-$ 2.3 pp) hurts overall performance. Reweighting based on point height is meant to compensate for the sampling bias of the ALS recording geometry, vertically through the canopy. Reweighting by the regional class distribution is a coarser version of class balancing, based on regional forest composition. The latter two strategies, however, both result in a moderate decline in IoU, despite small gains in regions with little training data.

The second group of modifications concerns the input data to the model. We tested different additional input observations beyond the point coordinates: LiDAR intensity, return number, scan angle rank, as well as hand-crafted statistical features as described in Section 2.2.1. Compared to the basic setting, additional input features can to some extent enhance individual tree segmentation, see Table 4. Among them, the return number stands out with a 5.1 pp gain in F-score. For semantic segmentation, all additional features except for the intensity bring some improvement. In particular, the hand-crafted descriptors reach the highest semantic segmentation mIoU of 75.7 $\%$ . This may indicate that FOR-Instance is still too small to optimally train the network, as the network would, in principle, be able to (approximately) learn the extraction of the descriptors from raw data.

The third group comprises tree-specific data augmentation strategies to synthetically increase the size and diversity of the training set. We have experimented with two approaches: elastic distortion with subsampling, and TreeMix. The idea behind the use of elastic distortion is to better account for curved and non-vertical stems, which are too rare to be learned well, but may still occur in the test data. The reasoning behind subsampling, i.e., removing points from the training examples, is to simulate points lost due to occlusion, scattering, etc., and make the network more robust against missing data. These low-level augmentation methods increase the instance segmentation F-score by 4.4 pp, but slightly reduce the semantic segmentation performance. TreeMix is a more informed augmentation strategy, where samples are synthesised by mixing semantically meaningful entities (i.e., trees) from different samples. It achieves the best single tree segmentation result of 85.1% F-score (+6.1 pp), while also improving semantic segmentation.

Based on these results, we choose the network with TreeMix augmentation as the best configuration for the forest inventory task and conduct a series of detailed analyses with it. The confusion matrix (computed across all 11 plots of the test set) shows minimal confusion between tree (stem, live branches, dead branches) and non-tree points (low vegetation and ground), see Figure 4. As expected, confusions occur mainly between semantically similar categories, i.e., between low vegetation and ground, and between different tree components. Among them, the most significant confusion is that points on stems and dead branches are misclassified as live branches.

Table 6 presents individual tree segmentation results for each of the 11 test plots. Aside from Plot 8 (RMIT forest region) and Plot 11 (TUWIEN forest region), which show relatively poor results, all plots achieve F-scores above 82 $\%$ . A potential reason for the weaker performance may be that RMIT has the lowest point density (on average, 498 pts/m²), and TUWIEN the second-lowest (1717 pts/m²), while all other forests have densities above 2585 pts/m², with a maximum of 9529 pts/m² for NIBIO. The sparsity renders segmentation visibly more difficult, as can be seen in the third row of Figure 5.

In addition, NIBIO, CULS, and SCION are pure coniferous or conifer-dominated forests. Coniferous trees are generally straighter and have slimmer crowns with smaller branches. In contrast, RMIT is native dry sclerophyll Eucalypt forest, where eucalyptus trees have few, but thick branches and expansive crowns. TUWIEN is a deciduous-dominated alluvial forest, where trees have many branches and low crowns, varying greatly in size. As expected, our results show that segmenting tall, straight and slender trees is easier, even when they are located close to each other (Table 6 and Figure 5), whereas individual tree segmentation is more challenging for trees with dispersed crowns and complex branch structure. A further reason for this discrepancy is the limited diversity of training data for the forest characteristics of RMIT and TUWIEN, which each only have a single plot of training data (Figure 2).

Table 6: Individual tree segmentation results for each plot in the test set.

Plot ID (Region)	Reference trees	Trees with DBH field data	Detected trees	Corr. detected trees (Completeness)	Omitted trees (Omission error)	Wrong detections (Commission error)	F-score	Cov
1 (CULS)				20 (100.0%)	0 (0.0%)	3 (13.0%)	93.0%	98.2%
2 (NIBIO)				(70.3%)	(29.7%)	(0.0%)	82.5%	62.4%
3 (NIBIO)				(96.7%)	(3.3%)	(0.0%)	98.3%	88.3%
4 (NIBIO)				(96.3%)	(3.7%)	(0.0%)	98.1%	84.6%
5 (NIBIO)				(90.0%)	(10.0%)	(0.0%)	94.7%	84.4%
6 (NIBIO)				(92.9%)	(7.1%)	(10.3%)	91.2%	81.7%
7 (NIBIO)				(84.2%)	(15.8%)	(11.1%)	86.5%	75.2%
8 (RMIT)				(64.1%)	(35.9%)	(24.1%)	69.5%	60.6%
9 (SCION)				(92.0%)	(8.0%)		92.0%	86.7%
10 (SCION)				(83.3%)	(16.7%)	(0.0%)	90.9%	79.4%
11 (TUWIEN)				(71.4%)	(28.6%)	(32.4%)	69.4%	58.3%

3.2 Tree features and stand attributes

Based on the segmentation results individual tree features and stand-wise attributes were calculated, as described in Section 2.3. Quality metrics obtained by comparing the predicted values to the reference are shown in Table 7. Figure 6 shows the scatterplots for individual tree height, crown diameter, crown volume, crown volume of live branches, and individual tree DBH, and the positional accuracy plot for the matched trees from all five forest regions in one graph. Figure 7 shows scatterplots of matched individual trees and positional accuracy plots for each of the five forest regions.

As shown in Table 7, the estimated DTM achieves high coverage ( $>$ 98 %) and low RMSE ( $<$ 26 cm) for all plots. (Footnote 2: Note that the limiting factor for the DTM is largely not measurement or fitting accuracy, but the considerable definition uncertainty of the forest floor.) Tree height predictions are also accurate, with all RMSE $\%$ values at most 0.06 (see Table 7). As shown in the scatterplots in Figure 6(a) and in the first row of Figure 7, estimated and reference tree heights correlate very well, across all forest regions. This finding is in line with previous studies, e.g., Wang et al. (2019b).

For the prediction of tree crown attributes, i.e., crown diameter, crown volume, and crown volume of live branches, both accurate individual tree segmentation and individual tree component classification are required. Crown diameter predictions are comparatively poor for RMIT and TUWIEN, with an RMSE% of 0.26 and 0.20, respectively. For example, two large, umbrella-shaped tree crowns in RMIT have been over-segmented, leading to incorrect crown diameter estimates, see Figure 8(a). Two examples from TUWIEN can be seen in Figure 8(b), where under-segmentation has led to large differences between estimated values and the reference. As the tree crowns in TUWIEN are large, inaccurately segmenting them causes significant mis-estimation of the crown diameter. Crown volume estimates (both over all branches and over only live branches) are relatively poor for the RMIT and SCION regions. As shown in Figure 8 (examples 1 and 2), the estimates are again too low due to over-segmentation. For SCION (example 5) the predicted value is too large due to under-segmentation. In addition, tree component segmentation has a significant influence on the results: even where the individual tree segmentation is good, some dead branches are misclassified as live branches, so that the crown volume of live branches is greatly over-estimated. This exposes a conceptual issue of this common variable: the crown volume of a tree has a considerable definition uncertainty, and a minor change in what is regarded as the crown can lead to large changes of the volume.

DBH and tree location are generally less accurate, which can be attributed to the sparsity and noise of the point clouds around breast height, which in turn limit the accuracy of circle fitting. This is somewhat expected given the airborne recording geometry of the FOR-Instance data. Figure 8(d) shows some examples from each region. It can be seen that, except for the CULS region, where the circle shape is relatively clear, the sparsity of the data makes accurate measurements difficult. Sometimes there may be no stem points at all, making it impossible to perform fitting. In those cases the average $x$ , $y$ coordinates of all the points on a tree are used as its location. The sparsity of stem points highlights a limitation of ALS data. A solution could be to adopt a different method of estimating DBH, for instance via airborne allometric models (Jucker et al., 2017).

Table 7: Results for individual tree features and stand-wise attributes, for each plot in the test set.

Plot ID (Region)	Stand density (trees/ha)	DTM Cov (%)	DTM RMSE (cm)	Tree height RMSE (m)	Tree height RMSE%	DBH RMSE with GT (cm)	DBH RMSE with field data (cm)	Crown diameter RMSE (m)	Crown diameter RMSE%	Crown volume RMSE, live branches (m$^{3}$) (RMSE%)	Crown volume RMSE, all branches (m$^{3}$) (RMSE%)	Location RMSE (cm)
1 (CULS)	2079	100.0		0.5	0.02			0.1	0.01	13.1 (0.08)	14.3 (0.07)	
2 (NIBIO)	4771	98.8		0.6	0.02			0.5	0.10	45.8 (0.32)	47.6 (0.23)	
3 (NIBIO)	4572	99.2		0.3	0.01			0.7	0.14	30.6 (0.34)	30.9 (0.19)	
4 (NIBIO)	4612	99.6		0.8	0.03			0.4	0.08	25.6 (0.33)	16.5 (0.13)	
5 (NIBIO)	3075	99.1		0.1	0.01			0.6	0.10	39.3 (0.26)	24.7 (0.11)	
6 (NIBIO)	4015	99.0		0.9	0.03			0.6	0.15	27.0 (0.28)	32.2 (0.20)	
7 (NIBIO)	3012	99.4		0.4	0.01			0.6	0.10	55.4 (0.33)	47.1 (0.22)	
8 (RMIT)	7283	100.0		0.5	0.06			0.7	0.26	9.2 (0.96)	10.0 (0.94)	
9 (SCION)	2758	99.9		0.1	0.00			1.0	0.14	81.4 (0.35)	115.1 (0.37)	
10 (SCION)	2273	100.0		0.0	0.00			0.9	0.11	114.9 (0.38)	242.8 (0.68)	151	104
11 (TUWIEN)	3419	100.0		0.4	0.02			1.4	0.20	137.8 (0.60)	142.2 (0.57)	117

4 Discussion

4.1 Baseline comparisons

We have compared our proposed pipeline to three alternatives. Our first baseline is an example of current standard practice, available in popular cloud processing software: Treeiso (Xi and Hopkinson, 2022) is a recently published, unsupervised algorithm to segment individual trees, which is available as a plug-in for the CloudCompare software (https://www.cloudcompare.org, last accessed 09/2023). Treeiso requires point clouds consisting exclusively of tree points. We therefore feed it only the points assigned to trees by our semantic segmentation module, under the TreeMix setting. The user parameters of Treeiso must be tuned for optimal performance, but are quite user-friendly and understandable also for non-experts in forestry.

The method consists of three steps. For the first step, an initial segmentation with a 3D cut-pursuit algorithm, we found the default values to be effective. In the second and third steps, there are three parameters that influence the segmentation results, as shown in Table 9: the number $K$ of nearest neighbors to search in step 2, a 2D cut-pursuit algorithm; the strength $\lambda$ of the regularization in that step; and the height/length ratio $Th_{h}$ for step 3, a global refinement. Larger values of $K$ and $\lambda$ , and smaller values of $Th_{h}$ , result in larger segments. I.e., $K$ and/or $\lambda$ must be decreased or $Th_{h}$ must be increased in case of under-segmentation, and vice versa if over-segmentation occurs.

Table 8 shows the best individual tree segmentation accuracy that we could achieve with careful tuning. We note that, due to the interactive procedure, it is impossible to exhaust all parameter combinations. We started from the default values and iteratively adjusted them ( $K$ and $\lambda$ in steps of 5, $Th_{h}$ in steps of 0.05) to obtain the visually best output. In this way we managed to gain almost 8 pp in F-score compared to the default parameters, see Table 8. Still a significant gap ( $\approx$ 17 pp) remains compared to the performance of our deep learning method.

Figure 5 shows a qualitative comparison between Treeiso and our segmentation method. In the CULS region, both algorithms achieve excellent individual tree segmentation results. In all other regions we find that the deep learning-based method has advantages over Treeiso in certain cases (marked by white ellipses). For plots where trees vary greatly in size, such as RMIT and TUWIEN, we could not find a way to tune the parameters of Treeiso such that it correctly segments trees of different sizes within the plot. It either over-segments large trees or merges small trees, whereas the learned segmentation model handles these situations better. In addition, for coniferous trees with jagged shapes such as those in the NIBIO and SCION examples, it is difficult to find suitable parameters for handling tightly spaced trees. Protruding branches in particular are over-segmented, and none of the parameter settings we tried could rectify this.

Table 8: Quantitative comparison to individual tree segmentation with Treeiso. All values are percentages (%). “default” are results obtained with the recommended default parameters of Treeiso, while “best” are those obtained with parameters that we individually tuned for each plot. Refer to Table 9 for those optimal values.

Method	Completeness	Omission error	Commission error	F-score	Cov
Treeiso (default)	64.1	35.9	43.1	60.3	67.8
Treeiso (best)	70.0	30.0	33.5	68.2	71.7
Ours	82.0	18.0	11.7	85.1	78.1

Table 9: Default and best Treeiso parameter combinations for each plot in FOR-Instance.

Plot ID	1	2	3	4	5	6	7	8	9	10	11	Default
$K$ (Number of nearest neighbors to search in step 2: 2D cut-pursuit algorithm)	50	20	20	10	15	20	15	20	20	20	20	20
$\lambda$ (Regularization strength in step 2)	20	10	25	20	10	10	15	10	20	20	40	20
$Th_{h}$ (Relative height length ratio in step 3: global refinement)	0.5	0.3	0.5	0.5	0.3	0.5	0.3	0.7	0.6	0.05	0.5	0.5

Another comparison was made against a different deep learning-based method (Straker et al., 2023). That method involves constructing a CHM from the point cloud and converting it to pseudo-colour images of 640 $\times$ 640 pixels, which are then fed into a 2D object detection network, YOLOv5 (Jocher et al., 2022). They used the published weights of the YOLOv5l-seg model and chose the same hyperparameters as described by Jocher et al. (2022), except for changing the initial learning rate to 0.001. The quantitative comparison is given in Table 10. Our segmentation-based approach achieved higher F-scores in all forest regions except for CULS. Consistent with the findings reported by Straker et al. (2023), both methods reach higher quality in coniferous forests, such as CULS, SCION, and NIBIO, whereas the quality somewhat deteriorates in forests with strongly varying tree heights and crown sizes, as in RMIT and TUWIEN. Still, also in those challenging regions our segmentation method had an edge. On the particularly challenging TUWIEN plot, it more than doubles the F-score, from 30.0% to 69.4%.
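To illustrate the first step of such a CHM-based approach, the sketch below rasterizes a point cloud into a canopy height model by keeping the maximum height per grid cell. It is not the implementation of Straker et al. (2023): the function name `rasterize_chm`, the cell size of 0.5 m, and the assumption that heights are already normalized relative to the ground are all choices made for this example. The conversion to 640 $\times$ 640 pseudo-colour images and the detection network are not reproduced here.

```python
import numpy as np


def rasterize_chm(points, cell_size=0.5):
    """Rasterize height-normalized points (N x 3: x, y, height) into a CHM.

    Each cell stores the maximum height of the points falling into it;
    cells without points are NaN.
    """
    points = np.asarray(points, dtype=float)
    xy_min = points[:, :2].min(axis=0)
    # Integer column/row index of every point.
    idx = np.floor((points[:, :2] - xy_min) / cell_size).astype(int)
    n_cols, n_rows = idx.max(axis=0) + 1
    chm = np.full((n_rows, n_cols), -np.inf)
    # Unbuffered maximum: repeated cell indices are handled correctly.
    np.maximum.at(chm, (idx[:, 1], idx[:, 0]), points[:, 2])
    chm[chm == -np.inf] = np.nan
    return chm
```

Because the CHM retains only the top surface, points beneath the canopy do not influence the raster, which is one reason why 2.5D approaches struggle with multi-layered stands.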

Finally, we also compare our individual tree segmentation results to those of Point2Tree (Wielgosz et al., 2023), another graph-based method that was originally developed for terrestrial mobile laser scanning (MLS) point clouds. Point2Tree consists of two stages. First it employs a PointNet++ model to obtain semantic labels for the points. The semantic segmentation serves as a basis for an unsupervised, graph-based algorithm. That instance segmentation algorithm sequentially identifies individual trees in a graph: the segmented point cloud is divided into small segments that form the nodes of the graph, constructed based on trunk information. For optimal results, various hyper-parameters must be selected individually for each forest type, including slice size, minimum number of stem points, and stem height.

Point2Tree is particularly sensitive to low point density near the ground. This sensitivity arises because the method was developed for terrestrial point clouds: graph construction starts from the bottom, and the PointNet++ model has been trained on MLS data. If the point cloud is too sparse there, the method tends to break, which makes it less suitable for our ALS data. For instances with adequate density at lower height, Point2Tree is effective. When applied to FOR-Instance, the method was only successful for 3 of the test plots, shown in Table 11. This shows that our method, adapted to the characteristics of ALS data, indeed offers a significant advantage when it comes to handling plots with varying point densities. For completeness we note that the failure cases could possibly be alleviated by re-training Point2Tree for ALS data and then adapting the graph-based part to the new data characteristics.

Table 10: Quantitative comparison to CHM-based YOLOv5 (Straker et al., 2023) on the individual tree segmentation task. All values are percentages (%).

Region	Method	Completeness	Omission error	Commission error	F-score
CULS	YOLOv5	100.0	0.0	0.0	100.0
CULS	Ours	100.0	0.0	13.0	93.0
NIBIO	YOLOv5	72.0	28.0	13.0	79.0
NIBIO	Ours	88.4	11.6	3.6	92.4
RMIT	YOLOv5	62.0	38.0	30.0	65.0
RMIT	Ours	64.1	35.9	24.1	69.5
SCION	YOLOv5	91.0	9.0	9.0	91.0
SCION	Ours	87.7	12.3	4.0	91.5
TUWIEN	YOLOv5	23.0	77.0	59.0	30.0
TUWIEN	Ours	71.4	28.6	33.4	69.4

Table 11: Quantitative comparison to Point2Tree (Wielgosz et al., 2023) on the individual tree segmentation task. Point2Tree was only successful on three of the test plots. All values are percentages (%).

Plot ID (Region)	Method	Completeness	Omission error	Commission error	F-score
CULS	Point2Tree	93.4	6.6	48.8	61.5
CULS	Ours	100.0	0.0	13.0	93.0
NIBIO plot 3	Point2Tree	76.0	24.0	68.5	40.0
NIBIO plot 3	Ours	96.7	3.3	0.0	98.3
NIBIO plot 4	Point2Tree	80.5	19.5	64.9	47.2
NIBIO plot 4	Ours	96.3	3.7	0.0	98.1

4.2 Dominant vs. understory trees

Besides the 11 plots in the test set, the FOR-Instance data includes another 15 plots also collected in Norway, which we refer to as NIBIO2. The data from NIBIO2 have not been included in the test set for two reasons. First, NIBIO2 does not have reference measurements of DBH. Second, in this area the reference data exhibits evident confusion between the low vegetation class and understory trees: many small trees are not annotated in the reference, which would introduce a bias in the quantitative results. We therefore separately evaluate on NIBIO2, using 1/3 of the maximum tree height as the threshold to separate dominant from understory trees, as described in Section 2.4.3. Table 12 and Figure 9 present the corresponding results for NIBIO2.
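The height-based split can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure of Section 2.4.3: the function name `split_by_height` is hypothetical, the maximum is assumed to be taken over the trees passed in (for example, all trees of one plot), and trees exactly at the threshold are assigned to the dominant group.

```python
def split_by_height(tree_heights, fraction=1.0 / 3.0):
    """Split tree IDs into dominant and understory by a relative height threshold.

    tree_heights maps a tree ID to its height in metres. Trees below
    fraction * (maximum height) are treated as understory.
    """
    if not tree_heights:
        return [], []
    threshold = fraction * max(tree_heights.values())
    dominant = [t for t, h in tree_heights.items() if h >= threshold]
    understory = [t for t, h in tree_heights.items() if h < threshold]
    return dominant, understory
```

For example, with a tallest tree of 30 m the threshold is 10 m, so a 12 m tree counts as dominant and an 8 m tree as understory.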

As shown in Table 12, the performance metrics are much lower across all trees than when measured only across the dominant trees (72.8 $\%$ vs. 82.9 $\%$ in F-score). When measured only across understory trees, the F-score drops to 39.3 $\%$, showing their substantial impact on the aggregate metrics. As can be seen in Figure 9, the gap results from trees that are (correctly) identified by the network, but not annotated in the reference, resulting in commission errors that are, in fact, due to limitations of the reference data. In the last two columns of Figure 9, understory trees that are present in the reference but missed by the segmentation network are circled in red, whereas those found by the network but missing in the reference are circled in black. It is clear that the latter are plausible tree detections that should not be penalised. This discrepancy highlights the need for more complete ground truth annotations, especially for understory trees, to properly assess tree segmentation.
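The metrics in Table 12 can be recomputed from the raw tree counts. The sketch below assumes the usual detection definitions, which are not restated in this section: completeness is the share of reference trees that were correctly detected, commission error is the share of detections that are wrong, and the F-score is the harmonic mean of completeness and (1 - commission error). The function name `detection_metrics` is hypothetical.

```python
def detection_metrics(n_reference, n_detected, n_correct):
    """Return completeness, omission, commission and F-score in percent."""
    completeness = n_correct / n_reference              # recall
    omission = 1.0 - completeness
    commission = (n_detected - n_correct) / n_detected  # 1 - precision
    # Harmonic mean of recall and precision, written in terms of counts.
    f_score = 2.0 * n_correct / (n_reference + n_detected)
    return tuple(round(100.0 * v, 1) for v in
                 (completeness, omission, commission, f_score))


# Dominant trees in Table 12: 571 reference, 490 detected, 440 correct.
print(detection_metrics(571, 490, 440))  # (77.1, 22.9, 10.2, 82.9)
```

With these assumed definitions the dominant and all-tree rows of Table 12 are reproduced exactly. For the understory row the F-score comes out as 39.2 rather than 39.3, a last-digit difference that may stem from rounding of intermediate values.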

Our method treats dominant and understory trees in a unified manner, in principle ensuring equal effectiveness for both. Still, understory trees may exhibit lower point density and suffer from missing data caused by severe occlusions, due to the nature of airborne sensor data acquisition. Consequently, their segmentation is usually less accurate compared to canopy-level trees (Dong et al., 2020; Jarron et al., 2020; Wang et al., 2023b). We are unable to quantify the difference in segmentation performance between dominant and understory trees for our pipeline, because of the lack of precise ground truth annotations in the dataset. This uncertainty again emphasizes the need for more complete and accurate ground truth. Reference datasets with carefully annotated understory trees will be indispensable to overcome the current limitations and unlock the full potential of segmentation algorithms for forestry.

Table 12: Quantitative results of individual tree segmentation in the NIBIO2 region. One can clearly see the effect of the understory, and associated labeling ambiguities, on the performance metrics.

Groups	Reference trees	Detected trees	Correctly detected trees (Completeness)	Omitted trees (Omission error)	Wrong detections (Commission error)	F-score
Dominant	571	490	440 (77.1%)	131 (22.9%)	50 (10.2%)	82.9%
Understory trees	266	157	83 (31.2%)	183 (68.8%)	74 (47.1%)	39.3%
All trees	837	647	540 (64.5%)	297 (35.5%)	107 (16.5%)	72.8%

4.3 Implications and limitations

The pipeline proposed in this paper sets a new state of the art for automated individual tree segmentation and semantic segmentation in ALS point clouds (at least for the forest characteristics covered by FOR-Instance, including boreal coniferous forest, temperate coniferous forest, temperate deciduous forest and sclerophyll forest).

Our method has also demonstrated potential for detecting understory trees. While contributing minimally to biomass, they play a crucial role for the ecosystem by ensuring canopy succession, stand development, soil erosion protection, and habitat for wildlife (Hamraz et al., 2017). The detection of understory trees has recently received more attention within the research community (Hamraz et al., 2017; Dong et al., 2020; Jarron et al., 2020; Wang et al., 2023b; Penner et al., 2023). Current methods often follow a stratified approach that iteratively removes higher canopy layers, then detects lower ones (Dong et al., 2020; Jarron et al., 2020; Wang et al., 2023b). In contrast, our method directly processes 3D point clouds, without the need to derive a 2.5D height model and to update it for each layer. We argue that directly analyzing the 3D forest structure is a more natural approach to effectively and efficiently detect understory trees. Moreover, advances in LiDAR technology (e.g., single-photon LiDAR) can be expected to further increase point cloud density and to promote the adoption of 3D methods.

Based on the segmentation, our pipeline also delivers accurate estimates of key variables like tree height, crown width, and crown volume, as well as stand attributes like stem density and the terrain of the forest floor. These outputs provide a strong basis for a tree-level forest inventory, and consequently for fine-scale forest management. Complete per-tree information opens up the possibility to monitor tree growth dynamics over extended areas. It may also contribute to more targeted timber production, safeguarding of young trees, and maintenance of biodiversity within forest ecosystems. Moreover, the segmentation results may enhance the automatic estimation of ecosystem service indicators as well as aggregate timber volume, carbon stock and sequestration, in support of ecological management and protection.

Of course, our proof-of-concept also has limitations. In particular, it tends to reach lower segmentation accuracy for more complex forest structure. Among the forest types contained in the FOR-Instance dataset, the CULS region consists solely of even-aged trees of a single tree species (Scots pine). The NIBIO area also features mature forest, dominated mainly by Picea abies. Although the SCION region has high tree density ($\approx$ 930 trees per hectare), it is a productive plantation forest also dominated by a single species (Pinus radiata). The segmentation accuracy in these three regions is high, which can be attributed to their relatively uniform species composition, with even-aged and mature trees creating a regular, single-layered forest structure. Despite the overlapping and intertwined crowns of the closely spaced radiata pines with many branches, the segmentation accuracy in the SCION area remains high.

In contrast, the TUWIEN and RMIT regions are structurally more complex. TUWIEN features complicated vegetation structures (woody debris, lying and standing deadwood, and dense understory), a large diversity of tree species, and considerable variations in canopy layering and tree size (DBH up to 1.3 m). RMIT encompasses trees with a wide range of ages and heights (from 5 m to 17 m), with an understory consisting of low shrubs ($\leq$ 2 m in height) and native grasses. For these two structurally more intricate forests our method exhibits markedly lower accuracy (Figure 6). It is worth noting again that TUWIEN and RMIT both only have a single training plot in the current training data. It is thus not clear to what extent the lower performance stems from the higher complexity and not simply from insufficient training data. It will be interesting to see how 3D deep learning copes with more varied forests once sufficiently large training sets become available.

Another factor is point cloud density. We have conducted a simple experiment to study the impact of point density on the segmentation accuracy of ForAINet: The original FOR-Instance data is sub-sampled to 7 different point densities (i.e., 10, 25, 50, 75, 100, 500, and 1000 pts/m²). That range spans from the typical densities of our ALS-HD data (500-1000 pts/m²), through conventional high-density ALS (up to 100 pts/m²), to traditional ALS (approximately 10 pts/m²). We then run our best configuration (with TreeMix augmentation, see Section 3.1) on the sub-sampled datasets. The results indicate that the performance decreases as the point density is reduced (Figure 10). The drop in instance segmentation F-score remains within 5 percentage points for densities $\geq$ 75 pts/m². Below that value the omission error markedly increases. The mIoU for semantic segmentation follows a similar pattern. In conclusion, our 3D learning method is challenged by point densities below 100 pts/m², as obtained for instance from traditional ALS.
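A density experiment of this kind can be approximated with simple random thinning. The sketch below is an assumption-laden illustration, not the sub-sampling procedure used for the experiment: the function name `subsample_to_density` is hypothetical, points are dropped uniformly at random, and the plot area is approximated by the axis-aligned bounding box in the horizontal plane.

```python
import numpy as np


def subsample_to_density(points, target_density, seed=0):
    """Randomly thin an N x 3 point array to roughly target_density pts/m^2.

    Density is estimated as N divided by the area of the axis-aligned
    bounding box in the xy-plane (an assumption of this sketch).
    """
    points = np.asarray(points, dtype=float)
    extent = points[:, :2].max(axis=0) - points[:, :2].min(axis=0)
    area = float(extent[0] * extent[1])
    if area <= 0.0:
        raise ValueError("point cloud has no extent in the xy-plane")
    n_keep = int(round(target_density * area))
    if n_keep >= len(points):
        return points  # already at or below the target density
    rng = np.random.default_rng(seed)
    keep = rng.choice(len(points), size=n_keep, replace=False)
    # Sorting the indices preserves the original point order.
    return points[np.sort(keep)]
```

Uniform random thinning lowers the density everywhere by the same factor. It does not mimic the different occlusion patterns of a sensor that natively records fewer returns, which should be kept in mind when interpreting such experiments.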

Another important point is that our method builds on fully supervised machine learning. Its performance is strongly influenced by the data seen during training, which means that it cannot necessarily be deployed out-of-the-box to unfamiliar forest types (e.g., tropical environments with high overlap between trees) or sensor characteristics. Empirically, even moderate changes of the data characteristics can greatly affect neural network estimates. To sidestep extensive data collection efforts it may be useful to explore transfer learning and domain adaptation strategies (Triess et al., 2021). To that end, further empirical studies are needed, which in turn call for even larger and more diverse reference datasets.

Our current experiments are restricted to ALS-HD data. While that acquisition setup proved to be an excellent choice not only for tree height, but also for crown-related attributes, retrievals of DBH and tree location were less accurate. The circle fitting used for those tasks is compromised by the lower point density of ALS on lower parts of the stem. We see two potential solutions: (i) The use of multi-source LiDAR, by complementing ALS with terrestrial (possibly mobile) scans. We are confident that this could largely resolve the problem, but it would come at the cost of a larger scanning effort, and poses the additional technical challenge of co-registering terrestrial and airborne data to centimeter level. (ii) A practically perhaps more attractive solution could be to avoid explicit, geometric estimation of stem parameters, and instead retrieve DBH and tree location directly from the ALS data. This could potentially be done either by augmenting the segmentation network with an appropriate regression branch, or by deriving dedicated allometric relations.
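To make the dependence of circle fitting on stem point density concrete, the following sketch fits a circle to the horizontal coordinates of stem points with a plain algebraic least-squares formulation. It is an illustration only and not necessarily the fitting procedure used in our pipeline: the function name `fit_circle` is hypothetical, the input is assumed to be a thin horizontal slice of stem points around breast height, and no outlier handling is included.

```python
import numpy as np


def fit_circle(xy):
    """Algebraic least-squares circle fit to an N x 2 array of stem points.

    Solves x^2 + y^2 = 2*a*x + 2*b*y + c for (a, b, c), where (a, b) is
    the centre and c = r^2 - a^2 - b^2. Returns (centre, diameter).
    """
    xy = np.asarray(xy, dtype=float)
    if len(xy) < 3:
        raise ValueError("at least three points are needed")
    A = np.column_stack([2.0 * xy[:, 0], 2.0 * xy[:, 1], np.ones(len(xy))])
    rhs = (xy ** 2).sum(axis=1)
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    radius = np.sqrt(c + a ** 2 + b ** 2)
    return np.array([a, b]), 2.0 * radius
```

The returned centre serves as the tree location estimate and the diameter as the DBH estimate. With only a handful of noisy points covering a short arc of the stem, both become poorly constrained, which matches the reduced accuracy observed for these two attributes.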

5 Conclusion

We have assembled a point cloud processing pipeline aimed at extracting a complete inventory of per-tree attributes from ALS-HD data. The pipeline starts by extracting semantic class labels and individual trees with ForAINet, which then facilitate the automatic estimation of tree- and plot-level attributes. In experiments on the FOR-Instance dataset the proposed method achieved excellent segmentation performance, with 85.1 $\%$ F-score for individual tree segmentation and a 73.5 $\%$ mIoU for 5-class semantic segmentation. The good segmentation quality translates to good predictive skill for several important biophysical properties, including tree height, crown width, and crown volume. Retrievals of DBH and tree location were less accurate, because the scanning geometry of ALS results in an unfavourable point density on the lower stems. Overall, the proposed method shows good performance across a variety of forest types, including the detection of many understory trees. Segmentation quality does, however, noticeably deteriorate if a forest exhibits complex structure, and also if the point density falls below $\approx$ 100 pts/m². While further research is needed to address these remaining issues, we believe that fully automated tree-level forest inventories based on remotely sensed data are within reach.

References

Brede et al. (2019) Brede, B., Calders, K., Lau, A., Raumonen, P., Bartholomeus, H.M., Herold, M., Kooistra, L., 2019. Non-destructive tree volume estimation through quantitative structure modelling: Comparing UAV laser scanning with terrestrial LIDAR. Remote Sensing of Environment 233, 111355. https://doi.org/10.1016/j.rse.2019.111355.
Chang et al. (2022) Chang, L., Fan, H., Zhu, N., Dong, Z., 2022. A two-stage approach for individual tree segmentation from TLS point clouds. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15, 8682–8693. https://doi.org/10.1109/JSTARS.2022.3212445.
Chaton et al. (2020) Chaton, T., Chaulet, N., Horache, S., Landrieu, L., 2020. Torch-Points3D: A modular multi-task framework for reproducible deep learning on 3D point clouds, in: International Conference on 3D Vision (3DV), pp. 1–10. https://doi.org/10.1109/3DV50981.2020.00029.
Chen et al. (2022) Chen, Q., Gao, T., Zhu, J., Wu, F., Li, X., Lu, D., Yu, F., 2022. Individual tree segmentation and tree height estimation using leaf-off and leaf-on UAV-LiDAR data in dense deciduous forests. Remote Sensing 14, 2787. https://doi.org/10.3390/rs14122787.
Chen et al. (2021a) Chen, S., Fang, J., Zhang, Q., Liu, W., Wang, X., 2021a. Hierarchical aggregation for 3D instance segmentation, in: IEEE/CVF International Conference on Computer Vision, pp. 15447–15456. https://doi.org/10.1109/ICCV48922.2021.01518.
Chen et al. (2021b) Chen, X., Jiang, K., Zhu, Y., Wang, X., Yun, T., 2021b. Individual tree crown segmentation directly from UAV-borne LiDAR data using the PointNet of deep learning. Forests 12, 131. https://doi.org/10.3390/f12020131.
Choy et al. (2019) Choy, C., Gwak, J., Savarese, S., 2019. 4D spatio-temporal convnets: Minkowski convolutional neural networks, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3075–3084. https://doi.org/10.1109/CVPR.2019.00319.
Comaniciu and Meer (2002) Comaniciu, D., Meer, P., 2002. Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 603–619. https://doi.org/10.1109/34.1000236.
Coomes et al. (2017) Coomes, D.A., Dalponte, M., Jucker, T., Asner, G.P., Banin, L.F., Burslem, D.F., Lewis, S.L., Nilus, R., Phillips, O.L., Phua, M.H., Qie, L., 2017. Area-based vs tree-centric approaches to mapping forest carbon in Southeast Asian forests from airborne laser scanning data. Remote Sensing of Environment 194, 77–88. https://doi.org/10.1016/j.rse.2017.03.017.
Dai et al. (2018) Dai, W., Yang, B., Dong, Z., Shaker, A., 2018. A new method for 3D individual tree extraction using multispectral airborne LiDAR point clouds. ISPRS Journal of Photogrammetry and Remote Sensing 144, 400–411. https://doi.org/10.1016/j.isprsjprs.2018.08.010.
Dalponte and Coomes (2016) Dalponte, M., Coomes, D.A., 2016. Tree-centric mapping of forest carbon density from airborne laser scanning and hyperspectral data. Methods in Ecology and Evolution 7, 1236–1245. https://doi.org/10.1111/2041-210X.12575.
Dietenberger et al. (2023) Dietenberger, S., Mueller, M.M., Bachmann, F., Nestler, M., Ziemer, J., Metz, F., Heidenreich, M.G., Koebsch, F., Hese, S., Dubois, C., Thiel, C., 2023. Tree stem detection and crown delineation in a structurally diverse deciduous forest combining leaf-on and leaf-off UAV-SfM data. Remote Sensing 15, 4366. https://doi.org/10.3390/rs15184366.
Dong et al. (2020) Dong, T., Zhang, X., Ding, Z., Fan, J., 2020. Multi-layered tree crown extraction from LiDAR data using graph-based segmentation. Computers and Electronics in Agriculture 170, 105213. https://doi.org/10.1016/j.compag.2020.105213.
Engelmann et al. (2020) Engelmann, F., Bokeloh, M., Fathi, A., Leibe, B., Nießner, M., 2020. 3D-MPA: Multi proposal aggregation for 3D semantic instance segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9031–9040. https://doi.org/10.1109/CVPR42600.2020.00905.
Fischler and Bolles (1981) Fischler, M.A., Bolles, R.C., 1981. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24, 381–395. https://doi.org/10.1145/358669.358692.
Gu et al. (2022) Gu, W., Bai, S., Kong, L., 2022. A review on 2D instance segmentation based on deep neural networks. Image and Vision Computing 120, 104401. https://doi.org/10.1016/j.imavis.2022.104401.
Guinard and Landrieu (2017) Guinard, S., Landrieu, L., 2017. Weakly supervised segmentation-aided classification of urban scenes from 3D LiDAR point clouds, in: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, pp. 151–157. https://doi.org/10.5194/isprs-archives-XLII-1-W1-151-2017.
Hackel et al. (2016) Hackel, T., Wegner, J., Schindler, K., 2016. Fast semantic segmentation of 3D point clouds with strongly varying density, in: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Copernicus GmbH. pp. 177–184. https://doi.org/10.5194/isprs-annals-III-3-177-2016.
Hakula et al. (2023) Hakula, A., Ruoppa, L., Lehtomäki, M., Yu, X., Kukko, A., Kaartinen, H., Taher, J., Matikainen, L., Hyyppä, E., Luoma, V., Holopainen, M., Kankare, V., Hyyppä, J., 2023. Individual tree segmentation and species classification using high-density close-range multispectral laser scanning data. ISPRS Open Journal of Photogrammetry and Remote Sensing 9, 100039. https://doi.org/10.1016/j.ophoto.2023.100039.
Hamraz et al. (2017) Hamraz, H., Contreras, M., Zhang, J., 2017. Forest understory trees can be segmented accurately within sufficiently dense airborne laser scanning point clouds. Scientific Reports 7, 6770. https://doi.org/10.1038/s41598-017-07200-0.
Hao et al. (2022) Hao, Y., Widagdo, F.R.A., Liu, X., Liu, Y., Dong, L., Li, F., 2022. A hierarchical region-merging algorithm for 3-D segmentation of individual trees using UAV-LiDAR point clouds. IEEE Transactions on Geoscience and Remote Sensing 60, 1–16. https://doi.org/10.1109/TGRS.2021.3121419.
He et al. (2020) He, T., Gong, D., Tian, Z., Shen, C., 2020. Learning and memorizing representative prototypes for 3D point cloud semantic and instance segmentation, in: European Conference on Computer Vision (ECCV), pp. 564–580. https://doi.org/10.1007/978-3-030-58523-5_33.
Huo et al. (2022) Huo, L., Lindberg, E., Holmgren, J., 2022. Towards low vegetation identification: A new method for tree crown segmentation from LiDAR data based on a symmetrical structure detection algorithm (SSD). Remote Sensing of Environment 270, 112857. https://doi.org/10.1016/j.rse.2021.112857.
Hyyppä et al. (2022) Hyyppä, E., Kukko, A., Kaartinen, H., Yu, X., Muhojoki, J., Hakala, T., Hyyppä, J., 2022. Direct and automatic measurements of stem curve and volume using a high-resolution airborne laser scanning system. Science of Remote Sensing 5, 100050. https://doi.org/10.1016/j.srs.2022.100050.
Hyyppa et al. (2001) Hyyppa, J., Kelle, O., Lehikoinen, M., Inkinen, M., 2001. A segmentation-based method to retrieve stem volume estimates from 3-D tree height models produced by laser scanners. IEEE Transactions on Geoscience and Remote Sensing 39, 969–975. https://doi.org/10.1109/36.921414.
Jarron et al. (2020) Jarron, L.R., Coops, N.C., MacKenzie, W.H., Tompalski, P., Dykstra, P., 2020. Detection of sub-canopy forest structure using airborne LiDAR. Remote Sensing of Environment 244, 111770. https://doi.org/10.1016/j.rse.2020.111770.
Jiang et al. (2020) Jiang, L., Zhao, H., Shi, S., Liu, S., Fu, C.W., Jia, J., 2020. PointGroup: Dual-set point grouping for 3D instance segmentation, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4866–4875. https://doi.org/10.1109/CVPR42600.2020.00492.
Jiang et al. (2023) Jiang, T., Zhang, Q., Liu, S., Liang, C., Dai, L., Zhang, Z., Sun, J., Wang, Y., 2023. LWSNet: A point-based segmentation network for leaf-wood separation of individual trees. Forests 14, 1303. https://doi.org/10.3390/f14071303.
Jocher et al. (2022) Jocher, G., Chaurasia, A., Stoken, A., Borovec, J., NanoCode012, Kwon, Y., Michael, K., TaoXie, Fang, J., imyhxy, Lorna, Zeng, Y., Wong, C., V, A., Montes, D., Wang, Z., Fati, C., Nadar, J., Laughing, UnglvKitDe, Sonck, V., tkianai, yxNONG, Skalski, P., Hogan, A., Nair, D., Strobel, M., Jain, M., 2022. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation Software. https://doi.org/10.5281/zenodo.7347926.
Jucker et al. (2017) Jucker, T., Caspersen, J., Chave, J., Antin, C., Barbier, N., Bongers, F., Dalponte, M., van Ewijk, K.Y., Forrester, D.I., Haeni, M., Higgins, S.I., Holdaway, R.J., Iida, Y., Lorimer, C., Marshall, P.L., Momo, S., Moncrieff, G.R., Ploton, P., Poorter, L., Rahman, K.A., Schlund, M., Sonké, B., Sterck, F.J., Trugman, A.T., Usoltsev, V.A., Vanderwel, M.C., Waldner, P., Wedeux, B.M.M., Wirth, C., Wöll, H., Woods, M., Xiang, W., Zimmermann, N.E., Coomes, D.A., 2017. Allometric equations for integrating remote sensing imagery into forest monitoring programmes. Global Change Biology 23, 177–190. https://doi.org/10.1111/gcb.13388.
Kaartinen et al. (2012) Kaartinen, H., Hyyppä, J., Yu, X., Vastaranta, M., Hyyppä, H., Kukko, A., Holopainen, M., Heipke, C., Hirschmugl, M., Morsdorf, F., Næsset, E., Pitkänen, J., Popescu, S., Solberg, S., Wolf, B.M., Wu, J.C., 2012. An international comparison of individual tree detection and extraction using airborne laser scanning. Remote Sensing 4, 950–974. https://doi.org/10.3390/rs4040950.
Kaijaluoto et al. (2022) Kaijaluoto, R., Kukko, A., El Issaoui, A., Hyyppä, J., Kaartinen, H., 2022. Semantic segmentation of point cloud data using raw laser scanner measurements and deep neural networks. ISPRS Open Journal of Photogrammetry and Remote Sensing 3, 100011. https://doi.org/10.1016/j.ophoto.2021.100011.
Kangas et al. (2018) Kangas, A., Astrup, R., Breidenbach, J., Fridman, J., Gobakken, T., Korhonen, K.T., Maltamo, M., Nilsson, M., Nord-Larsen, T., Næsset, E., et al., 2018. Remote sensing and forest inventories in Nordic countries–roadmap for the future. Scandinavian Journal of Forest Research 33, 397–412. https://doi.org/10.1080/02827581.2017.1416666.
Kellner et al. (2019) Kellner, J.R., Armston, J., Birrer, M., Cushman, K., Duncanson, L., Eck, C., Falleger, C., Imbach, B., Král, K., Krůček, M., et al., 2019. New opportunities for forest remote sensing through ultra-high-density drone lidar. Surveys in Geophysics 40, 959–977. https://doi.org/10.1007/s10712-019-09529-9.
Kim et al. (2023) Kim, D.H., Ko, C.U., Kim, D.G., Kang, J.T., Park, J.M., Cho, H.J., 2023. Automated segmentation of individual tree structures using deep learning over LiDAR point cloud data. Forests 14, 1159. https://doi.org/10.3390/f14061159.
Krisanski et al. (2021a) Krisanski, S., Taskhiri, M.S., Gonzalez Aracil, S., Herries, D., Muneri, A., Gurung, M.B., Montgomery, J., Turner, P., 2021a. Forest structural complexity tool—an open source, fully-automated tool for measuring forest point clouds. Remote Sensing 13, 4677. https://doi.org/10.3390/rs13224677.
Krisanski et al. (2021b) Krisanski, S., Taskhiri, M.S., Gonzalez Aracil, S., Herries, D., Turner, P., 2021b. Sensor agnostic semantic segmentation of structurally diverse and complex forest point clouds using deep learning. Remote Sensing 13, 1413. https://doi.org/10.3390/rs13081413.
Kuželka et al. (2020) Kuželka, K., Slavík, M., Surový, P., 2020. Very high density point clouds from UAV laser scanning for automatic tree stem detection and direct diameter measurement. Remote Sensing 12, 1236. https://doi.org/10.3390/rs12081236.
Lahoud et al. (2019) Lahoud, J., Ghanem, B., Pollefeys, M., Oswald, M., 2019. 3D instance segmentation via multi-task metric learning, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9255–9265. https://doi.org/10.1109/ICCV.2019.00935.
Li et al. (2023) Li, S., Brandt, M., Fensholt, R., Kariryaa, A., Igel, C., Gieseke, F., Nord-Larsen, T., Oehmcke, S., Carlsen, A.H., Junttila, S., Tong, X., d’Aspremont, A., Ciais, P., 2023. Deep learning enables image-based tree counting, crown segmentation, and height prediction at national scale. PNAS Nexus 2, 1–16. https://doi.org/10.1093/pnasnexus/pgad076.
Liang et al. (2018) Liang, X., Hyyppä, J., Kaartinen, H., Lehtomäki, M., Pyörälä, J., Pfeifer, N., Holopainen, M., Brolly, G., Francesco, P., Hackenberg, J., Huang, H., Jo, H.W., Katoh, M., Liu, L., Mokroš, M., Morel, J., Olofsson, K., Poveda-Lopez, J., Trochta, J., Wang, D., Wang, J., Xi, Z., Yang, B., Zheng, G., Kankare, V., Luoma, V., Yu, X., Chen, L., Vastaranta, M., Saarinen, N., Wang, Y., 2018. International benchmarking of terrestrial laser scanning approaches for forest inventories. ISPRS Journal of Photogrammetry and Remote Sensing 144, 137–179. https://doi.org/10.1016/j.isprsjprs.2018.06.021.
Liang et al. (2019) Liang, X., Wang, Y., Pyörälä, J., Lehtomäki, M., Yu, X., Kaartinen, H., Kukko, A., Honkavaara, E., Issaoui, A.E.I., Nevalainen, O., Vaaja, M., Virtanen, J.P., Katoh, M., Deng, S., 2019. Forest in situ observations using unmanned aerial vehicle as an alternative of terrestrial measurements. Forest Ecosystems 6. https://doi.org/10.1186/s40663-019-0173-3.
Maes et al. (2023) Maes, J., Bruzón, A.G., Barredo, J.I., Vallecillo, S., Vogt, P., Rivero, I.M., Santos-Martín, F., 2023. Accounting for forest condition in Europe based on an international statistical standard. Nature Communications 14, 3723. https://doi.org/10.1038/s41467-023-39434-0.
McInnes et al. (2017) McInnes, L., Healy, J., Astels, S., 2017. hdbscan: Hierarchical density based clustering. The Journal of Open Source Software 2, 205. https://doi.org/10.21105/joss.00205.
Næsset (2002) Næsset, E., 2002. Predicting forest stand characteristics with airborne scanning laser using a practical two-stage procedure and field data. Remote Sensing of Environment 80, 88–99. https://doi.org/10.1016/S0034-4257(01)00290-5.
Næsset et al. (2004) Næsset, E., Gobakken, T., Holmgren, J., Hyyppä, H.H.J., Maltamo, M., Nilsson, M., Olsson, H., Persson, Å., Söderman, U., 2004. Laser scanning of forest resources: the nordic experience. Scandinavian Journal of Forest Research 19, 482–499. https://doi.org/10.1080/02827580410019553.
Nekrasov et al. (2021) Nekrasov, A., Schult, J., Litany, O., Leibe, B., Engelmann, F., 2021. Mix3D: Out-of-context data augmentation for 3D scenes, in: International Conference on 3D Vision (3DV), pp. 116–125. https://doi.org/10.1109/3DV53792.2021.00022.
de Paula Pires et al. (2022) de Paula Pires, R., Olofsson, K., Persson, H.J., Lindberg, E., Holmgren, J., 2022. Individual tree detection and estimation of stem attributes with mobile laser scanning along boreal forest roads. ISPRS Journal of Photogrammetry and Remote Sensing 187, 211–224. https://doi.org/10.1016/j.isprsjprs.2022.03.004.
Penner et al. (2023) Penner, M., White, J.C., Woods, M.E., 2023. Automated characterization of forest canopy vertical layering for predicting forest inventory attributes by layer using airborne LiDAR data. Forestry: An International Journal of Forest Research 97, 59–75. https://doi.org/10.1093/forestry/cpad033.
Persson et al. (2002) Persson, Å., Holmgren, J., Söderman, U., 2002. Detecting and measuring individual trees using an airborne laser scanner. Photogrammetric Engineering and Remote Sensing 68, 925–932.
Persson et al. (2022) Persson, H.J., Olofsson, K., Holmgren, J., 2022. Two-phase forest inventory using very-high-resolution laser scanning. Remote Sensing of Environment 271, 112909. https://doi.org/10.1016/j.rse.2022.112909.
Puliti et al. (2020) Puliti, S., Breidenbach, J., Astrup, R., 2020. Estimation of forest growing stock volume with UAV laser scanning data: Can it be done without field data? Remote Sensing 12, 1245. https://doi.org/10.3390/rs12081245.
Puliti et al. (2022) Puliti, S., McLean, J.P., Cattaneo, N., Fischer, C., Astrup, R., 2022. Tree height-growth trajectory estimation using uni-temporal UAV laser scanning data and deep learning. Forestry: An International Journal of Forest Research 96, 37–48. https://doi.org/10.1093/forestry/cpac026.
Puliti et al. (2023) Puliti, S., Pearse, G., Surový, P., Wallace, L., Hollaus, M., Wielgosz, M., Astrup, R., 2023. FOR-instance: a UAV laser scanning benchmark dataset for semantic and instance segmentation of individual trees. https://doi.org/10.48550/arXiv.2309.01279.
Qi et al. (2017) Qi, C.R., Yi, L., Su, H., Guibas, L., 2017. PointNet++: Deep hierarchical feature learning on point sets in a metric space, in: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), pp. 5105–5114.
Roussel et al. (2020) Roussel, J.R., Auty, D., Coops, N.C., Tompalski, P., Goodbody, T.R., Meador, A.S., Bourdon, J.F., de Boissieu, F., Achim, A., 2020. lidR: An R package for analysis of Airborne Laser Scanning (ALS) data. Remote Sensing of Environment 251, 112061. https://doi.org/10.1016/j.rse.2020.112061.
Saltori et al. (2022) Saltori, C., Galasso, F., Fiameni, G., Sebe, N., Ricci, E., Poiesi, F., 2022. CoSMix: Compositional semantic mix for domain adaptation in 3D LiDAR segmentation, in: European Conference on Computer Vision (ECCV), Springer. pp. 586–602. https://doi.org/10.1007/978-3-031-19827-4_34.
Shewchuk (1996) Shewchuk, J.R., 1996. Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator, in: Lin, M.C., Manocha, D. (Eds.), Applied Computational Geometry: Towards Geometric Engineering. Springer-Verlag. volume 1148 of Lecture Notes in Computer Science, pp. 203–222. From the First ACM Workshop on Applied Computational Geometry.
Shewchuk (2002) Shewchuk, J.R., 2002. Delaunay refinement algorithms for triangular mesh generation. Computational Geometry: Theory and Applications 22, 21–74. https://doi.org/10.1016/S0925-7721(01)00047-5.
Straker et al. (2023) Straker, A., Puliti, S., Breidenbach, J., Kleinn, C., Pearse, G., Astrup, R., Magdon, P., 2023. Instance segmentation of individual tree crowns with YOLOv5: A comparison of approaches using the ForInstance benchmark LiDAR dataset. ISPRS Open Journal of Photogrammetry and Remote Sensing 9, 100045. https://doi.org/10.1016/j.ophoto.2023.100045.
Strîmbu and Strîmbu (2015) Strîmbu, V.F., Strîmbu, B.M., 2015. A graph-based segmentation algorithm for tree crown extraction using airborne LiDAR data. ISPRS Journal of Photogrammetry and Remote Sensing 104, 30–43. https://doi.org/10.1016/j.isprsjprs.2015.01.018.
Sun et al. (2022) Sun, C., Huang, C., Zhang, H., Chen, B., An, F., Wang, L., Yun, T., 2022. Individual tree crown segmentation and crown width extraction from a heightmap derived from aerial laser scanning data using a deep learning framework. Frontiers in Plant Science 13, 914974. https://doi.org/10.3389/fpls.2022.914974.
du Toit et al. (2023) du Toit, F., Coops, N.C., Ratcliffe, B., El-Kassaby, Y.A., Lucieer, A., 2023. Modelling internal tree attributes for breeding applications in Douglas-fir progeny trials using RPAS-ALS. Science of Remote Sensing 7, 100072. https://doi.org/10.1016/j.srs.2022.100072.
Triess et al. (2021) Triess, L.T., Dreissig, M., Rist, C.B., Zollner, J.M., 2021. A survey on deep domain adaptation for LiDAR perception, in: IEEE Intelligent Vehicles Symposium Workshops (IV Workshops), pp. 350–357. https://doi.org/10.1109/IVWorkshops54471.2021.9669228.
Trochta et al. (2017) Trochta, J., Krůček, M., Vrška, T., Král, K., 2017. 3D Forest: An application for descriptions of three-dimensional forest structures using terrestrial LiDAR. PLoS ONE 12, e0176871. https://doi.org/10.1371/journal.pone.0176871.
Vauhkonen et al. (2011) Vauhkonen, J., Ene, L., Gupta, S., Heinzel, J., Holmgren, J., Pitkänen, J., Solberg, S., Wang, Y., Weinacker, H., Hauglin, K.M., Lien, V., Packalén, P., Gobakken, T., Koch, B., Næsset, E., Tokola, T., Maltamo, M., 2011. Comparative testing of single-tree detection algorithms under different types of forest. Forestry: An International Journal of Forest Research 85, 27–40. https://doi.org/10.1093/forestry/cpr051.
Vu et al. (2022) Vu, T., Kim, K., Luu, T.M., Nguyen, T., Yoo, C.D., 2022. SoftGroup for 3D instance segmentation on point clouds, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2698–2707. https://doi.org/10.1109/CVPR52688.2022.00273.
Wang (2020) Wang, D., 2020. Unsupervised semantic and instance segmentation of forest point clouds. ISPRS Journal of Photogrammetry and Remote Sensing 165, 86–97. https://doi.org/10.1016/j.isprsjprs.2020.04.020.
Wang and Bryson (2023) Wang, F., Bryson, M., 2023. Tree segmentation and parameter measurement from point clouds using deep and handcrafted features. Remote Sensing 15, 1086. https://doi.org/10.3390/rs15041086.
Wang et al. (2023a) Wang, H., Fu, T., Du, Y., Gao, W., Huang, K., Liu, Z., Chandak, P., Liu, S., Van Katwyk, P., Deac, A., Anandkumar, A., Bergen, K., Gomes, C.P., Ho, S., Kohli, P., Lasenby, J., Leskovec, J., Liu, T.Y., Manrai, A., Marks, D., Ramsundar, B., Song, L., Sun, J., Tang, J., Veličković, P., Welling, M., Zhang, L., Coley, C.W., Bengio, Y., Zitnik, M., 2023a. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60. https://doi.org/10.1038/s41586-023-06221-2.
Wang et al. (2019a) Wang, X., Liu, S., Shen, X., Shen, C., Jia, J., 2019a. Associatively segmenting instances and semantics in point clouds, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4091–4100. https://doi.org/10.1109/CVPR.2019.00422.
Wang et al. (2019b) Wang, Y., Lehtomäki, M., Liang, X., Pyörälä, J., Kukko, A., Jaakkola, A., Liu, J., Feng, Z., Chen, R., Hyyppä, J., 2019b. Is field-measured tree height as reliable as believed – a comparison study of tree height estimates from field measurement, airborne laser scanning and terrestrial laser scanning in a boreal forest. ISPRS Journal of Photogrammetry and Remote Sensing 147, 132–145. https://doi.org/10.1016/j.isprsjprs.2018.11.008.
Wang et al. (2023b) Wang, Z., Li, P., Cui, Y., Lei, S., Kang, Z., 2023b. Automatic detection of individual trees in forests based on airborne LiDAR data with a tree region-based convolutional neural network (RCNN). Remote Sensing 15, 1024. https://doi.org/10.3390/rs15041024.
Welzl (1991) Welzl, E., 1991. Smallest enclosing disks (balls and ellipsoids), in: Maurer, H. (Ed.), New Results and New Trends in Computer Science, Springer Berlin Heidelberg, Berlin, Heidelberg. pp. 359–370. https://doi.org/10.1007/BFb0038202.
Wielgosz et al. (2023) Wielgosz, M., Puliti, S., Wilkes, P., Astrup, R., 2023. Point2Tree(P2T)—framework for parameter tuning of semantic and instance segmentation used with mobile laser scanning data in coniferous forest. Remote Sensing 15, 3737. https://doi.org/10.3390/rs15153737.
Windrim and Bryson (2020) Windrim, L., Bryson, M., 2020. Detection, segmentation, and model fitting of individual tree stems from airborne laser scanning of forests using deep learning. Remote Sensing 12, 1469. https://doi.org/10.3390/rs12091469.
Xi and Hopkinson (2022) Xi, Z., Hopkinson, C., 2022. 3D graph-based individual-tree isolation (Treeiso) from terrestrial laser scanning point clouds. Remote Sensing 14, 6116. https://doi.org/10.3390/rs14236116.
Xi et al. (2018) Xi, Z., Hopkinson, C., Chasmer, L., 2018. Filtering stems and branches from terrestrial laser scanning point clouds using deep 3-D fully convolutional networks. Remote Sensing 10, 1215. https://doi.org/10.3390/rs10081215.
Xiang et al. (2023a) Xiang, B., Peters, T., Kontogianni, T., Vetterli, F., Puliti, S., Astrup, R., Schindler, K., 2023a. Towards accurate instance segmentation in large-scale LiDAR point clouds, in: Geospatial Week Laser Scanning Workshop – ISPRS Annals X-1-W1. https://doi.org/10.5194/isprs-annals-X-1-W1-2023-605-2023.
Xiang et al. (2023b) Xiang, B., Yue, Y., Peters, T., Schindler, K., 2023b. A review of panoptic segmentation for mobile mapping point clouds. ISPRS Journal of Photogrammetry and Remote Sensing 203, 373–391. https://doi.org/10.1016/j.isprsjprs.2023.08.008.
Xu et al. (2021) Xu, S., Zhou, K., Sun, Y., Yun, T., 2021. Separation of wood and foliage for trees from ground point clouds using a novel least-cost path model. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14, 6414–6425. https://doi.org/10.1109/JSTARS.2021.3090502.
Yan et al. (2020) Yan, W., Guan, H., Cao, L., Yu, Y., Li, C., Lu, J., 2020. A self-adaptive mean shift tree-segmentation method using UAV LiDAR data. Remote Sensing 12, 515. https://doi.org/10.3390/rs12030515.
Yin and Wang (2016) Yin, D., Wang, L., 2016. How to assess the accuracy of the individual tree-based forest inventory derived from remotely sensed data: a review. International Journal of Remote Sensing 37, 4521–4553. https://doi.org/10.1080/01431161.2016.1214302.
Zhang et al. (2015) Zhang, C., Zhou, Y., Qiu, F., 2015. Individual tree segmentation from LiDAR point clouds for urban forest inventory. Remote Sensing 7, 7892–7913. https://doi.org/10.3390/rs70607892.
Zhang et al. (2019) Zhang, W., Wan, P., Wang, T., Cai, S., Chen, Y., Jin, X., Yan, G., 2019. A novel approach for the detection of standing tree stems from plot-level terrestrial laser scanning data. Remote Sensing 11, 211. https://doi.org/10.3390/rs11020211.
Zhang et al. (2023) Zhang, Y., Liu, H., Liu, X., Yu, H., 2023. Towards intricate stand structure: A novel individual tree segmentation method for ALS point cloud based on extreme offset deep learning. Applied Sciences 13, 6853. https://doi.org/10.3390/app13116853.
Zhao et al. (2021) Zhao, Y., Zhang, X., Huang, X., 2021. A technical survey and evaluation of traditional point cloud clustering methods for LiDAR panoptic segmentation, in: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pp. 2464–2473. https://doi.org/10.1109/ICCVW54120.2021.00279.
Zhong et al. (2022) Zhong, M., Chen, X., Chen, X., Zeng, G., Wang, Y., 2022. MaskGroup: Hierarchical point grouping and masking for 3D instance segmentation, in: International Conference on Multimedia and Expo, pp. 1–6. https://doi.org/10.1109/ICME52920.2022.9859996.
Zörner et al. (2018) Zörner, J., Dymond, J.R., Shepherd, J.D., Wiser, S.K., Jolly, B., 2018. LiDAR-based regional inventory of tall trees—Wellington, New Zealand. Forests 9, 702. https://doi.org/10.3390/f9110702.

Appendix A List of abbreviations

Table 13:

Abbreviations and symbols	Meaning
ABA	area-based approach
ALS	airborne laser scanning
ALS-HD	very high density ALS point clouds
CHM	canopy height model
CWD	coarse woody debris
DBH	diameter at breast height
DEM	digital elevation model
DLS	drone laser scanning
DSM	digital surface model
DTM	digital terrain model
FCN	fully convolutional network
GT	ground truth
GUI	graphical user interface
CV	computer vision
IoU	intersection over union
ITC	individual tree crown
ITD	individual tree detection
LiDAR	Light Detection and Ranging
MLP	multi-layer perceptron
MLS	mobile laser scanning
NMS	non-maximum suppression
RMSE	root mean square error
TLS	terrestrial laser scanning
UAV	unmanned aerial vehicle
ULS	unmanned laser scanning
