1 German Cancer Research Center (DKFZ), Heidelberg, Division of Medical Image Computing, Germany
2 Faculty of Mathematics and Computer Science, Heidelberg University, Germany
3 HIDSS4Health - Helmholtz Information and Data Science School for Health, Karlsruhe/Heidelberg, Germany
4 Medical Faculty Heidelberg, Heidelberg University, Heidelberg, Germany
5 Helmholtz Imaging, German Cancer Research Center, Heidelberg, Germany
6 Department of Neuroradiology, Heidelberg University Hospital, Heidelberg, Germany
7 Division for Computational Neuroimaging, Department of Neuroradiology, Heidelberg University Hospital, Heidelberg, Germany
8 Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany
9 Cancer Research Center Cologne Essen (CCCE), West German Cancer Center Essen, University Hospital Essen, Essen, Germany
10 Pattern Analysis and Learning Group, Department of Radiation Oncology, Heidelberg University Hospital

Skeleton Recall Loss for Connectivity Conserving and Resource Efficient Segmentation of Thin Tubular Structures

Yannick Kirchhoff^* (1,2,3), Maximilian R. Rokuss^* (1,2), Saikat Roy^* (1,2), Balint Kovacs (1,4), Constantin Ulrich (1,4), Tassilo Wald (1,2,5), Maximilian Zenk (1,4), Philipp Vollmuth (1,6,7), Jens Kleesiek (8,9), Fabian Isensee (1,5), Klaus Maier-Hein (1,10)

^* Contributed equally. All co-primary authors agree that they may designate themselves as the lead author in their profiles.

Abstract

Accurately segmenting thin tubular structures, such as vessels, nerves, roads or concrete cracks, is a crucial task in computer vision. Standard deep learning-based segmentation loss functions, such as Dice or Cross-Entropy, focus on volumetric overlap, often at the expense of preserving structural connectivity or topology. This can lead to segmentation errors that adversely affect downstream tasks, including flow calculation, navigation, and structural inspection. Although current topology-focused losses mark an improvement, they introduce significant computational and memory overheads. This is particularly relevant for 3D data, rendering these losses infeasible for larger volumes as well as increasingly important multi-class segmentation problems. To mitigate this, we propose a novel Skeleton Recall Loss, which effectively addresses these challenges by circumventing intensive GPU-based calculations with inexpensive CPU operations. It demonstrates overall superior performance to current state-of-the-art approaches on five public datasets for topology-preserving segmentation, while substantially reducing computational overheads by more than $90\%$. In doing so, we introduce the first multi-class capable loss function for thin structure segmentation, excelling in both efficiency and efficacy for topology preservation. Our code is available to the community, providing a foundation for further advancements at: [will be shared later]

Keywords:

Segmentation · Topology · Tubular Structures · Loss Function

1 Introduction

The precise segmentation of thin tubular structures is a critical task across diverse domains in engineering and medical applications (Fig. 1). Topological correctness is fundamental for facilitating downstream tasks such as analyzing blood flow dynamics, delineating neuronal boundaries in Electron Microscopy imagery, evaluating risk factors for vascular pathologies, aiding in surgical planning, and optimizing route planning [21, 1, 32, 40, 6, 5]. Classical approaches for the automated segmentation of thin curvilinear structures include image transforms [25, 34], mathematical morphology [41, 29], filtering [13, 10, 15], differential operators [33], and others [8, 16, 19, 3, 2]. Deep learning-based techniques have played an increasing role in recent years, with standard segmentation networks such as UNet [27] being popular. Standard overlap-based losses (e.g., the Dice similarity coefficient [43]) enable such networks to segment large structures, but the networks often struggle with small elongated ones [18], as shown in Fig. 2.

Multiple methods have been introduced in recent years to address the challenges of segmenting thin curvilinear structures, but they are often domain-specific or require the use of specialized networks [39, 23, 24, 22, 17, 4, 9]. Recently, centerlineDice [31] (clDice) was introduced, encompassing both a loss function and a metric for measuring connectivity in the segmentation of thin structures. Effectively, it incorporates the skeleton of a segmentation into the Dice calculation. While the clDice metric uses the exact skeleton, the clDice loss works with a differentiable approximation of the skeleton. It enables architecture-independent, topology-aware segmentation of thin tubular structures and is considered the state of the art. However, despite its advantages, it introduces a large computational overhead. Furthermore, the differentiable Soft Skeleton used in the loss calculation can often be jagged, as depicted in Fig. 3, leading to inaccuracies in segmentation. While a follow-up approach [20] attempted to address this limitation by introducing a topologically correct differentiable skeleton, it did so at an even higher computational cost. This limitation becomes particularly pronounced when dealing with large volumes or multi-class segmentation problems common to 3D medical image segmentation, rendering training on multi-class 3D datasets challenging to infeasible even on modern hardware.

In response to these challenges, we propose Skeleton Recall Loss, a novel loss function tailored to address the intricacies associated with thin structures in segmentation tasks. Skeleton Recall Loss demonstrates the following strengths:

1. Minimal training time: The Tubed Skeleton required by Skeleton Recall Loss can be computed with simple CPU-based operations using common image processing frameworks (e.g. scikit-image [38]) as part of data loading, or even be precomputed. It is then used in a simple additional soft recall loss with the prediction, thus requiring very little additional training time.
2. Minimal training memory: Skeleton Recall Loss entails minimal GPU memory overhead. Unlike approaches reliant on a differentiable skeleton of the prediction or ground truth during training, Skeleton Recall Loss sidesteps the computationally taxing GPU-based skeletonization process, thus necessitating only a marginal increase in GPU training memory.
3. Domain and architecture agnostic: Skeleton Recall Loss exhibits inherent plug-and-play characteristics, seamlessly integrating into a wide array of 2D and 3D segmentation tasks without imposing architectural constraints. It operates without the need for specialized networks or modifications to underlying segmentation architectures.
4. Multi-class compatibility: Skeleton Recall Loss integrates seamlessly with multi-class labels, while competing methods like clDice Loss often face near-insurmountable computational challenges on such problems.

Skeleton Recall Loss yields overall superior results to a baseline network without topological losses, as well as to clDice Loss as a state-of-the-art topological loss. We demonstrate this effectiveness in an extensive multi-domain evaluation on 5 publicly available datasets. Notably, our loss function feasibly supports multi-class segmentation problems and can thus be considered a new state of the art for delineating thin curvilinear structures in natural as well as medical images.

2 Related Work

Deep learning-based approaches for segmentation of thin curvilinear structures often involve specialized networks. In [22], a joint network was trained with a shared encoder and 2 decoders to simultaneously segment and score a tubular path. Similarly, in [4], a joint network was proposed to simultaneously learn features as well as global topology. In [17], a pair of sequential UNets was used, the first of which performed a coarse prediction while the other detected missing or false splits in the structure. A similar end-to-end approach in [24] used a channel and spatial attention module, which was incorporated within the bottleneck of an encoder-decoder network for segmenting thin structures in medical images. In another work [26], an oriented derivative of stick (ODoS) filter output was refined using a succession of UNets to obtain effective segmentation of curvilinear objects. Alternative approaches instead utilize specialized loss functions to assist the network in preserving topology in segmentation outputs. In [7], topological priors were incorporated into network training by using a persistent-homology-based differentiable topological prior as a loss function. Another work [11] attempted to preserve topological information by enforcing that the prediction and ground truth have the same Betti number via a novel loss function. Recently, the use of a differentiable skeleton has emerged as the predominant method for topology-aware segmentation of thin structures following the introduction of the centerlineDice (clDice) loss function [31], outperforming persistent-homology-based approaches. This method is complemented by the introduction of the clDice metric, a well-established measure of connectedness. The differentiable skeleton proposed by [31] was improved in [37], where a Soft-Persistent Skeleton was proposed for coronary artery tracking. Alternatively, in [28], the differentiable skeleton was predicted by a secondary network in addition to the primary segmentation output. Most recently, a topologically correct differentiable skeletonization algorithm was introduced in [20], overcoming previous approximations of skeletons, while still requiring massive computational resources to do so.

3 Methodology

3.1 The challenges of Differentiable Skeletons

The usage of a differentiable skeleton based loss [31, 37, 28, 20] in training a deep neural network to segment thin tubular structures is an intuitive approach to preserve connectivity. However, it is fraught with challenges which can be multi-faceted in nature. One of the most easily demonstrable issues is shown in Fig. 3. As mentioned earlier, the so-called Soft skeletonization of clDice Loss can lead to perforated and jagged skeletons, especially in 3D, which results in inaccuracies for the clDice Loss calculation. This is in addition to the enormous GPU memory and training time overheads that are a natural part of a GPU-based differentiable skeletonization process on both the ground truth as well as the network prediction. This overhead is demonstrated in Fig. 7, and can render effective training almost infeasible in multi-class datasets with large 3D input volumes such as TopCoW [40] (which is used in this work) without access to significant computational resources. While follow-up work in [20] allowed for a relative improvement in topologically accurate differentiable skeletonization, it further aggravated the issues with excessive resource utilization.

3.2 Skeleton Recall Loss: Connectivity conservation on thin structures without differentiable skeletons

Skeleton Recall Loss is a loss function designed to preserve connectivity in thin tubular structures without incurring massive computational overheads, and it is applicable to both 2D and 3D inputs. It achieves this by avoiding GPU-based soft skeletonization on the prediction and ground truth. Instead, a tubed skeletonization is performed on the ground truth, followed by a soft recall loss against the predicted segmentation output. This is illustrated in Fig. 4 and further discussed in the following sections.

3.2.1 Tubed Skeletonization

The usage of a skeleton for the preservation of connectivity is an effective method, but it does not need to be differentiable. In this work, we extract a tubed skeleton from the ground truth as demonstrated in Algorithm 1. Initially, we binarize the ground truth segmentation mask and compute its skeleton using methods outlined in [42] for 2D and [14] for 3D inputs. Subsequently, we dilate the skeleton with a diamond kernel of radius $2$ to make it tubular, thereby enlarging the effective area for loss computation around the otherwise thin, single-pixel-wide skeleton. This enhances the stability of the loss by incorporating signals from a greater number of pixels, particularly those in close proximity to the skeleton, which are vital for connectivity. Lastly, for multi-class problems, we multiply the tubed skeleton with the ground truth mask, effectively assigning parts of the skeleton to their respective classes. All of these operations are computationally inexpensive and can be carried out on the CPU during data loading or pre-computed using libraries such as scikit-image [38].

3.2.2 Soft Recall on Tubed Skeleton

Following the extraction of our tubed skeleton, we incentivize the network to include as much of this skeleton as possible as part of the prediction. This is performed simply by using a soft recall loss $\mathcal{L}_{SkelRecall}$, in addition to any existing generic loss $\mathcal{L}_{generic}$ used by the network (for example, Dice Loss, Cross Entropy Loss, etc.), as seen in Algorithm 2. This vastly improves the connectivity of thin curvilinear structures predicted by a network trained using this loss (Sec. 5.1). Additionally, Skeleton Recall Loss is computationally inexpensive in comparison to the use of a differentiable skeletonization, requiring only fractionally more GPU memory and additional time during training (Sec. 5.3). This facilitates training multi-class segmentation problems as opposed to current differentiable skeleton methods which incur infeasible overheads (Sec. 5.3.2).

Algorithm 1 Tubed Skeletonization

Require: $Y$ are $K$-classed hard targets where $Y_{i,j(,k)}\in[0,K]$
1: $Y_{\text{bin}}\leftarrow Y>0$ % Binarize to foreground and background labels
2: $Y_{\text{skel}}\leftarrow\text{skeletonize}(Y_{\text{bin}})$ % Extract binarized skeleton
3: $Y_{\text{skel}}\leftarrow\text{dilate}(Y_{\text{skel}})$ % Dilate to create tubed skeleton
4: $Y_{\text{mc-skel}}\leftarrow Y_{\text{skel}}\times Y$ % De-binarize to create multi-class tubed skeleton
5: return $Y_{\text{mc-skel}}$
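The steps of Algorithm 1 can be sketched on the CPU with NumPy and scikit-image. This is a minimal illustration under stated assumptions, not the authors' released implementation: the function name `tubed_skeleton` is hypothetical, and using an octahedron as the 3D counterpart of the radius-2 diamond kernel is our assumption, since the text only names a diamond kernel.

```python
import numpy as np
from skimage.morphology import diamond, dilation, octahedron, skeletonize


def tubed_skeleton(y: np.ndarray, radius: int = 2) -> np.ndarray:
    """Return a multi-class tubed skeleton for an integer label map.

    y: 2D or 3D array with 0 = background and 1..K = foreground classes.
    """
    # Step 1: binarize to foreground / background.
    y_bin = y > 0
    # Step 2: exact (non-differentiable) skeleton. The "> 0" normalizes the
    # output, since some scikit-image versions return 0/255 for 3D input.
    y_skel = skeletonize(y_bin) > 0
    # Step 3: dilate the one-pixel-wide skeleton into a tube.
    # Assumption: an octahedron serves as the 3D analogue of the diamond.
    footprint = diamond(radius) if y.ndim == 2 else octahedron(radius)
    y_tube = dilation(y_skel.astype(np.uint8), footprint) > 0
    # Step 4: multiply with the labels to assign tube pixels to classes.
    return y_tube.astype(y.dtype) * y


# Toy example: one horizontal bar split into two classes.
labels = np.zeros((20, 20), dtype=np.uint8)
labels[8:13, 2:10] = 1
labels[8:13, 10:18] = 2
tube = tubed_skeleton(labels)
```

Because the last step multiplies with the label map, the tube never extends beyond the ground truth mask, and every tube pixel inherits the class of the structure it lies in. The whole function runs without a GPU, so it could be called inside a data-loading worker or once ahead of training.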

Algorithm 2 Skeleton Recall Loss

Require: target labels ($Y$), network predictions ($Y_{\text{pred}}$), $K$-classes, weighing factor $w$
% Generic loss computation, e.g. Cross Entropy, Dice Similarity (GPU)
1: $Y_{\text{onehot}}\leftarrow\text{onehot}(Y)$
2: $\mathcal{L}_{generic}\leftarrow\mathbf{GenericLoss}(Y_{\text{onehot}},Y_{\text{pred}})$
% Precomputed using CPU Operations
3: $Y_{\text{mc-skel}}\leftarrow\text{TubedSkeletonization}(Y)$ % Extract tubed skeleton using Alg. 1
% Skeleton Recall Loss computation (GPU)
4: $Y_{\text{mc-skel}}\leftarrow\text{onehot}(Y_{\text{mc-skel}})$
5: $L_{\text{mc-skel}}=\{\}$ % Soft Recall with tubed skeletons. Practically implemented as vector operations
6: for all $class\in[1,K]$ do
7: $L_{\text{mc-skel}}[class]\leftarrow-\frac{\mathbf{sum}(Y_{\text{mc-skel}}[class]\times Y_{\text{pred}}[class])}{\mathbf{sum}(Y_{\text{mc-skel}}[class])}$
8: end for
9: $\mathcal{L}_{SkelRecall}\leftarrow\mathbf{mean}(L_{\text{mc-skel}})$
10: $\mathcal{L}\leftarrow\mathcal{L}_{generic}+w\cdot\mathcal{L}_{SkelRecall}$
11: return $\mathcal{L}$
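The soft recall term of Algorithm 2 can be sketched in vectorized form. The snippet below uses plain NumPy to stay framework-independent; in actual training the same arithmetic would be written with an autograd tensor library so that gradients flow through the prediction. The helper names `one_hot`, `skeleton_recall_loss` and `combined_loss` are hypothetical, and the small `eps` in the denominator is our addition to guard against classes that are absent from a sample; it does not appear in Algorithm 2.

```python
import numpy as np


def one_hot(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Integer label map -> (K + 1, ...) array; channel 0 is background."""
    return np.stack(
        [labels == c for c in range(num_classes + 1)]
    ).astype(np.float32)


def skeleton_recall_loss(y_mc_skel: np.ndarray, y_pred: np.ndarray,
                         eps: float = 1e-8) -> float:
    """Negative mean soft recall of the tubed skeleton.

    y_mc_skel: integer multi-class tubed skeleton with values in 0..K.
    y_pred:    class probabilities of shape (K + 1, ...), e.g. a softmax.
    """
    num_classes = y_pred.shape[0] - 1
    skel = one_hot(y_mc_skel, num_classes)[1:]  # foreground classes only
    pred = y_pred[1:]
    spatial_axes = tuple(range(1, pred.ndim))
    # Per-class soft recall: predicted probability mass on the skeleton,
    # divided by the size of the skeleton (eps is an assumption, see above).
    recall = (skel * pred).sum(axis=spatial_axes) / (
        skel.sum(axis=spatial_axes) + eps
    )
    return float(-recall.mean())


def combined_loss(generic: float, skel_recall: float, w: float) -> float:
    """Total loss: generic loss plus the weighted skeleton recall term."""
    return generic + w * skel_recall


# Toy example with K = 2 classes on a 2 x 4 image.
skel = np.array([[0, 1, 1, 0],
                 [0, 2, 2, 0]])
perfect_prediction = one_hot(skel, 2)
loss_perfect = skeleton_recall_loss(skel, perfect_prediction)
```

With this sign convention the term is bounded between -1 (every skeleton pixel predicted with probability 1) and 0 (no probability mass on the skeleton), so minimizing it pushes the network to cover the tubed skeleton.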

4 Experimental Setup

4.1 Datasets

Table 1: Details of the datasets used for training and evaluation. Our datasets show wide coverage over a number of thin structure segmentation tasks in natural and medical images. The TopCoW dataset is used both in binary and multi-class settings, in line with the original challenge.

Dataset	Dims	Type	# Samples	Target Structure
Roads[21]	2D	binary	804	Roads on aerial images
DRIVE[32]	2D	binary	20	Blood vessels on retina images
Cracks[36]	2D	binary	5388	Cracks on concrete structure images
ToothFairy[6, 5]	3D	binary	138	Inferior Alveolar Canal
TopCoW[40]	3D	multi-class	200	Circle of Willis vessels in the brain

We employ five public datasets featuring thin structures for validating the proposed Skeleton Recall Loss. The datasets span natural as well as medical images, covering a range of segmentation challenges, including both binary and multi-class segmentation problems in 2D as well as 3D contexts. An overview of the datasets can be found in Tab. 1. Among the three 2D datasets used in this study, the Digital Retinal Images for Vessel Extraction (DRIVE) dataset [32] focuses on retinal vessel segmentation. Additionally, structural inspection images designed for concrete crack segmentation (Cracks) [36] and aerial images of Massachusetts for road segmentation (Roads) [21] were included, highlighting the diversity of thin structures in natural and constructed environments. In the 3D domain, we incorporated two recent medical image segmentation challenge datasets. The first is ToothFairy (https://toothfairy.grand-challenge.org/), a segmentation challenge on 3D Cone-Beam CTs [6, 5] featuring the inferior alveolar canal as the target structure. The second is the TopCoW (https://topcow23.grand-challenge.org/) dataset for topology-aware 3D segmentation of vessels in the Circle of Willis for CTA and MRA data [40], encompassing binary as well as multi-class segmentation on 13 different subtypes of vessels. This diverse set of datasets enables a comprehensive evaluation of the proposed Skeleton Recall Loss, demonstrating generalizability of the method to a wide range of thin structure segmentation challenges in both 2D and 3D contexts.

4.2 Evaluation Metrics

We use multiple metrics, including overlap, connectivity and topological measures, for a thorough evaluation of our proposed loss function. An interesting dichotomy of clDice [31] is that while it makes for an inefficient loss function for training deep neural networks, it is an effective metric for measuring connectivity. Therefore, following existing guidelines [18] for semantic segmentation of tubular structures, we use clDice as a metric in conjunction with the Dice similarity coefficient as our connectivity- and overlap-based measures. Similar to previous work, we also report topology-based metrics, namely the absolute Betti number errors for the 0th and 1st Betti numbers, $\beta_{0}$ and $\beta_{1}$. However, in contrast to other work [11, 31], we calculate the Betti errors on whole volumes instead of small, randomly extracted patches. Our evaluation strategy is more intuitive and offers better interpretability of the measure, which is especially relevant in medical segmentation tasks.
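To make the Betti errors concrete: $\beta_{0}$ counts connected components and $\beta_{1}$ counts loops (holes, in 2D). The sketch below computes both on a whole 2D binary mask with `scipy.ndimage.label` and reports the absolute differences between prediction and ground truth. It is an illustration only and covers the 2D case; the function names are hypothetical, and the connectivity convention (8-connected foreground, 4-connected background) is our assumption, as the text does not state which one was used.

```python
import numpy as np
from scipy import ndimage


def betti_numbers_2d(mask: np.ndarray) -> tuple[int, int]:
    """Return (beta_0, beta_1) of a 2D binary mask."""
    mask = np.asarray(mask, dtype=bool)
    # beta_0: number of 8-connected foreground components (assumption).
    _, beta_0 = ndimage.label(mask, structure=np.ones((3, 3), dtype=int))
    # beta_1: number of holes, i.e. 4-connected background components
    # minus the single outer background, which padding keeps connected.
    padded = np.pad(mask, 1, constant_values=False)
    _, n_background = ndimage.label(~padded)
    return int(beta_0), int(n_background - 1)


def betti_errors_2d(pred: np.ndarray, gt: np.ndarray) -> tuple[int, int]:
    """Absolute beta_0 and beta_1 errors between prediction and ground truth."""
    p0, p1 = betti_numbers_2d(pred)
    g0, g1 = betti_numbers_2d(gt)
    return abs(p0 - g0), abs(p1 - g1)


# Ground truth: a closed ring (one component, one hole).
gt = np.zeros((7, 7), dtype=bool)
gt[1:6, 1:6] = True
gt[2:5, 2:5] = False
# Prediction: the same ring cut in two places, so it falls into two
# pieces and the hole opens up.
pred = gt.copy()
pred[1:6, 3] = False
errors = betti_errors_2d(pred, gt)
```

In this toy case the prediction overlaps most of the ground truth, yet it has one extra component and one missing loop, which is exactly the kind of connectivity error that overlap metrics such as Dice tend to underweight.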

4.3 Baseline Loss Functions

We benchmark our proposed loss function against state-of-the-art loss functions targeting thin structure segmentation on the five datasets detailed in Sec. 4.1. Specifically, we compare against: 1) clDice Loss [31], the leading method in the field, which utilizes approximate differentiable skeletons; and 2) a modification of clDice Loss in which we replace the differentiable skeletonization of the original publication with that of a follow-up work [20]. This second method, called Topo-clDice Loss in our evaluations, produces topologically accurate differentiable skeletons at the cost of even higher computational requirements. We exclude loss functions based on persistent homology [22, 11] from our evaluation; while related, they were surpassed by clDice Loss [31].

4.4 Training

We implement the baseline loss functions (Sec. 4.3), as well as our proposed Skeleton Recall Loss, in a powerful medical image segmentation network (nnUNet [12]) and a state-of-the-art natural image segmentation network (HRNet [35]), pretrained on ImageNet [30]. We use the examined loss functions for connectivity conservation ($\mathcal{L}_{connectivity}$) in addition to the underlying generic loss ($\mathcal{L}_{generic}$) of our training framework, which is a combination of Cross-Entropy and Soft Dice Loss. The connectivity loss is weighted by an additional parameter $w$ as shown in Eq. 1.

$$\mathcal{L}=\mathcal{L}_{generic}+w\cdot\mathcal{L}_{connectivity} \tag{1}$$

Our experiments are restricted to two weight configurations $w\in\{0.1,1.0\}$ in order to curb the influence of extensive hyperparameter tuning. We show a more detailed analysis of the effect of the weight parameter in the Appendix. Additionally, the full set of hyperparameters, optimizers and configurations of nnUNet and HRNet used for training on the different datasets are also provided in the Appendix.

5 Results and Discussion

5.1 Skeleton Recall Loss enables state-of-the-art segmentation of thin structures

The results in Tab. 2 show that our proposed Skeleton Recall Loss surpasses previous thin structure segmentation losses on almost all datasets. For concrete crack segmentation [36], the results indicate better Dice and clDice performance at the cost of slightly worse Betti errors than clDice Loss. For retinal vessel segmentation [32], Skeleton Recall Loss demonstrates the best clDice and Betti errors, with a Dice score just marginally behind clDice Loss. Notably, for the three datasets with an independent test set available, specifically Roads [21] and both of the 3D datasets, ToothFairy [6, 5] and TopCoW [40], we observe superior performance of our proposed Skeleton Recall Loss. This is further demonstrated by the qualitative results given in Fig. 5. Skeleton Recall Loss is also better than the baselines in both the binary and the multi-class setting of TopCoW, as elaborated in the following sections. We obtain this state-of-the-art performance while being architecture agnostic (Sec. 5.2) as well as highly resource efficient (Sec. 5.3).

Table 2: State-of-the-art segmentation of thin structures. Quantitative results obtained by incorporating our proposed Skeleton Recall Loss as well as existing thin structure segmentation losses into the loss function of a generic nnUNet backbone. Results are reported on the test set, except for the DRIVE and Cracks datasets, where we report 5-fold cross-validation results due to the unavailability of an independent test set.

Dataset	Loss configuration	Dice $\uparrow$	clDice $\uparrow$	$\beta_{0}$ error $\downarrow$	$\beta_{1}$ error $\downarrow$
Roads [21]	Default nnUNet	78.99	88.79	5.769	84.62
Roads [21]	+ clDice Loss	79.15	89.00	6.539	82.00
Roads [21]	+ Topo-clDice Loss	78.94	88.62	11.00	85.92
Roads [21]	+ Skeleton Recall Loss (Ours)	79.25	89.06	4.846	83.69
DRIVE [32]	Default nnUNet	80.87	80.26	57.00	22.80
DRIVE [32]	+ clDice Loss	81.05	80.68	44.50	23.35
DRIVE [32]	+ Topo-clDice Loss	80.80	80.19	46.35	23.70
DRIVE [32]	+ Skeleton Recall Loss (Ours)	80.99	80.83	38.75	21.50
Cracks [36]	Default nnUNet	94.59	95.76	0.147	0.0033
Cracks [36]	+ clDice Loss	94.80	95.96	0.142	0.0033
Cracks [36]	+ Topo-clDice Loss	94.83	96.00	0.159	0.0035
Cracks [36]	+ Skeleton Recall Loss (Ours)	94.88	96.04	0.148	0.0035
ToothFairy [6, 5]	Default nnUNet	71.80	89.16	0.900	0.0200
ToothFairy [6, 5]	+ clDice Loss	72.36	89.67	0.620	0.0200
ToothFairy [6, 5]	+ Skeleton Recall Loss (Ours)	74.42	92.05	0.540	0.0200
TopCoW binary [40]	Default nnUNet	93.55	98.25	0.743	1.800
TopCoW binary [40]	+ clDice Loss	93.64	98.35	0.514	1.986
TopCoW binary [40]	+ Skeleton Recall Loss (Ours)	93.72	98.48	0.500	1.586
TopCoW multi-class	Default nnUNet	85.36	93.68	0.137	0.0571
TopCoW multi-class	+ clDice Loss	Out Of Memory	Out Of Memory	Out Of Memory	Out Of Memory
TopCoW multi-class	+ Skeleton Recall Loss (Ours)	86.59	94.35	0.151	0.056

5.2 Skeleton Recall Loss is architecture agnostic

While Skeleton Recall Loss demonstrates state-of-the-art performance using nnUNet as a backbone framework in Tab. 2, it is not restricted to specialized architectures. We highlight this in Tab. 3, where HRNet [35], a state-of-the-art 2D architecture for natural image segmentation, is used as the backbone. This leads to similar benefits for connectivity conservation when using Skeleton Recall Loss during training on our 2D datasets. Skeleton Recall Loss exceeds the connectivity-conserving performance of clDice Loss (as measured by the clDice metric) on 2 out of 3 datasets, while being comparable on the remaining one. This overall superiority across metrics demonstrates that Skeleton Recall Loss is architecture agnostic and can be used as a loss in training arbitrary deep architectures for connectivity-conserving segmentation of thin structures.

Table 3: Skeleton Recall Loss is architecture agnostic. Quantitative results using HRNet, a state-of-the-art 2D network, on all examined 2D datasets. Skeleton Recall Loss demonstrates accurate segmentation including effective connectivity conservation, without explicit reliance on a particular deep neural network architecture.

Dataset	Loss configuration	Dice $\uparrow$	clDice $\uparrow$	$\beta_{0}$ error $\downarrow$	$\beta_{1}$ error $\downarrow$
Roads	HRNet	77.91	87.63	18.08	83.15
Roads	+ clDice Loss	78.05	88.21	8.846	82.31
Roads	+ Skeleton Recall Loss (Ours)	78.25	88.14	15.38	81.31
DRIVE	HRNet	80.34	77.83	149.3	51.25
DRIVE	+ clDice Loss	80.63	81.65	65.75	46.25
DRIVE	+ Skeleton Recall Loss (Ours)	76.08	84.20	27.25	38.50
Cracks	HRNet	94.09	95.39	0.1985	0.0037
Cracks	+ clDice Loss	94.67	96.01	0.1558	0.0037
Cracks	+ Skeleton Recall Loss (Ours)	95.03	96.26	0.1596	0.0019

5.3 Connectivity conservation with minimal overheads

5.3.1 Efficient binary segmentation of thin structures

A plurality of tasks in the segmentation of thin curvilinear structures have historically been binary in nature. As competing state-of-the-art differentiable skeleton methods were developed for the binary scenario, we consider this to be where such methods should be most competitive. However, Skeleton Recall Loss not only provides state-of-the-art connectivity-conserving thin structure segmentation performance, as seen in Sec. 5.1 and Sec. 5.2, but does so while adding only a fraction of the GPU memory and training time overhead of existing methods, as shown in Fig. 6. Differentiable skeleton based methods require a GPU-based skeleton computation [31] or prediction [28]. For our differentiable skeleton baseline, clDice Loss, this leads to approximately $88\%$ additional training time and $52\%$ more VRAM consumption compared to the plain nnUNet backbone when averaged across our 5 datasets (excluding multi-class TopCoW). In contrast, Skeleton Recall Loss requires only an additional $\mathbf{8\%}$ training time and $\mathbf{2\%}$ higher VRAM consumption. This illustrates that, in terms of resource efficiency, Skeleton Recall Loss clearly outperforms differentiable skeleton-based methods in the binary setting for which they were developed.
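One way to relate these percentages to the "more than $90\%$" reduction quoted in the abstract is to compare the overheads that each loss adds on top of the plain backbone. The short calculation below does this; reading the abstract's figure as a relative reduction of the added overhead is our interpretation, since the text does not spell out the computation, and the helper name `overhead_reduction` is hypothetical.

```python
def overhead_reduction(baseline_overhead_pct: float,
                       ours_overhead_pct: float) -> float:
    """Relative reduction (in %) of the overhead added on top of the backbone."""
    saved = baseline_overhead_pct - ours_overhead_pct
    return 100.0 * saved / baseline_overhead_pct


# Added training time: clDice Loss +88%, Skeleton Recall Loss +8%.
time_reduction = overhead_reduction(88, 8)
# Added VRAM: clDice Loss +52%, Skeleton Recall Loss +2%.
vram_reduction = overhead_reduction(52, 2)

print(f"training time overhead reduced by {time_reduction:.1f}%")
print(f"VRAM overhead reduced by {vram_reduction:.1f}%")
```

Under this reading, the added training time shrinks by roughly 90.9% and the added VRAM by roughly 96.2%, which is consistent with the abstract's claim.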

5.3.2 Enabling multi-class segmentation of thin structures

Binary segmentation has historically sufficed for many image analysis tasks across various domains. However, as the demand for finer-grained analysis grows, transitioning to multi-class segmentation becomes increasingly vital. This shift is especially pertinent in medical contexts due to the prevalence of thin structures where binary segmentation may not adequately capture the complexity of anatomical features. For instance, the recent TopCoW challenge [40] revealed that binary segmentation of brain vessels can be considered sufficiently solved, approaching inter-rater agreement in the Dice score. However, differentiating between the individual vessels remains a challenging task. Our Skeleton Recall Loss demonstrates powerful multi-class segmentation capabilities in addition to standard binary settings. Tab. 2 shows the results of multi-class segmentation on 13 different brain vessel classes of the TopCoW dataset using both standard nnUNet and our proposed loss. While nnUNet exhibits a slightly better $\beta_{0}$ error, our Skeleton Recall Loss improves the Dice and clDice scores by about one point each. Moreover, it performs on par in terms of $\beta_{1}$ error, ultimately yielding a superior overall result.

Fig. 7 shows the multi-class resource utilization with respect to the number of classes of our proposed Loss in comparison to clDice Loss. We demonstrate significant training time and memory savings with near-constant additional overhead despite the increasing number of classes. In contrast, the plots underscore the approximately linear growth in memory consumption and training time associated with clDice Loss. We note that the inefficiency of clDice Loss rendered it infeasible on all 13 classes as it exceeded the memory capacity of an A100 40GB GPU. In summary, Skeleton Recall Loss can be employed for an arbitrary number of classes with minimal computational cost.

6 Conclusion

This paper proposes a novel loss function, Skeleton Recall Loss, designed for connectivity-preserving semantic segmentation. It is domain and architecture agnostic and, unlike existing methods, requires minimal additional training time and memory. Through extensive evaluation on five publicly available datasets, we demonstrate that Skeleton Recall Loss shows overall superior performance over existing state-of-the-art topology-aware loss functions. Moreover, it stands as the first loss function designed for computationally manageable thin structure segmentation within the increasingly significant but hitherto unaddressed multi-class context. In essence, Skeleton Recall Loss represents a significant advancement in the field of thin structure segmentation, offering both efficiency and efficacy. The public availability of our code further facilitates its adoption and serves as a foundation for future advancements in this critical area of study.

Acknowledgements

The present contribution is supported by the Helmholtz Association under the joint research school "HIDSS4Health – Helmholtz Information and Data Science School for Health."

References

[1] Arganda-Carreras, I., Turaga, S.C., Berger, D.R., Cireşan, D., Giusti, A., Gambardella, L.M., Schmidhuber, J., Laptev, D., Dwivedi, S., Buhmann, J.M., et al.: Crowdsourcing the creation of image segmentation algorithms for connectomics. Frontiers in neuroanatomy 9, 142 (2015)
[2] Bibiloni, P., González-Hidalgo, M., Massanet, S.: A survey on curvilinear object segmentation in multiple applications. Pattern Recognition 60, 949–970 (2016)
[3] Chambon, S., Moliard, J.M.: Automatic road pavement assessment with image processing: Review and comparison. International Journal of Geophysics 2011 (2011)
[4] Cheng, M., Zhao, K., Guo, X., Xu, Y., Guo, J.: Joint topology-preserving and feature-refinement network for curvilinear structure segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 7147–7156 (2021)
[5] Cipriano, M., Allegretti, S., Bolelli, F., Di Bartolomeo, M., Pollastri, F., Pellacani, A., Minafra, P., Anesi, A., Grana, C.: Deep segmentation of the mandibular canal: a new 3d annotated dataset of cbct volumes. IEEE Access 10, 11500–11510 (2022)
[6] Cipriano, M., Allegretti, S., Bolelli, F., Pollastri, F., Grana, C.: Improving segmentation of the inferior alveolar nerve through deep label propagation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 21137–21146 (2022)
[7] Clough, J.R., Byrne, N., Oksuz, I., Zimmer, V.A., Schnabel, J.A., King, A.P.: A topological loss function for deep-learning based image segmentation using persistent homology. IEEE transactions on pattern analysis and machine intelligence 44(12), 8766–8778 (2020)
[8] Fraz, M.M., Remagnino, P., Hoppe, A., Uyyanonvara, B., Rudnicka, A.R., Owen, C.G., Barman, S.A.: Blood vessel segmentation methodologies in retinal images–a survey. Computer methods and programs in biomedicine 108(1), 407–433 (2012)
[9] He, Y., Sun, H., Yi, Y., Chen, W., Kong, J., Zheng, C.: Curv-net: curvilinear structure segmentation network based on selective kernel and multi-bi-convlstm. Medical Physics 49(5), 3144–3158 (2022)
[10] Hoover, A., Kouznetsova, V., Goldbaum, M.: Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Transactions on Medical imaging 19(3), 203–210 (2000)
[11] Hu, X., Li, F., Samaras, D., Chen, C.: Topology-preserving deep image segmentation. Advances in neural information processing systems 32 (2019)
[12] Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods 18(2), 203–211 (2021)
[13] Koller, T.M., Gerig, G., Szekely, G., Dettwiler, D.: Multiscale detection of curvilinear structures in 2-D and 3-D image data. In: Proceedings of IEEE International Conference on Computer Vision. pp. 864–869. IEEE (1995)
[14] Lee, T.C., Kashyap, R.L., Chu, C.N.: Building skeleton models via 3-D medial surface axis thinning algorithms. CVGIP: Graphical Models and Image Processing 56(6), 462–478 (1994)
[15] Lemaitre, C., Perdoch, M., Rahmoune, A., Matas, J., Mitéran, J.: Detection and matching of curvilinear structures. Pattern recognition 44(7), 1514–1527 (2011)
[16] Lesage, D., Angelini, E.D., Bloch, I., Funka-Lea, G.: A review of 3D vessel lumen segmentation techniques: Models, features and extraction schemes. Medical image analysis 13(6), 819–845 (2009)
[17] Lin, M., Zepf, K., Christensen, A.N., Bashir, Z., Svendsen, M.B.S., Tolsgaard, M., Feragen, A.: DTU-Net: Learning topological similarity for curvilinear structure segmentation. In: International Conference on Information Processing in Medical Imaging. pp. 654–666. Springer (2023)
[18] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M.D., Buettner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: recommendations for image analysis validation. Nature methods pp. 1–18 (2024)
[19] Mena, J.B.: State of the art on automatic road extraction for GIS update: a novel classification. Pattern recognition letters 24(16), 3037–3058 (2003)
[20] Menten, M.J., Paetzold, J.C., Zimmer, V.A., Shit, S., Ezhov, I., Holland, R., Probst, M., Schnabel, J.A., Rueckert, D.: A skeletonization algorithm for gradient-based optimization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 21394–21403 (2023)
[21] Mnih, V.: Machine learning for aerial image labeling. University of Toronto (Canada) (2013)
[22] Mosinska, A., Koziński, M., Fua, P.: Joint segmentation and path classification of curvilinear structures. IEEE transactions on pattern analysis and machine intelligence 42(6), 1515–1521 (2019)
[23] Mou, L., Zhao, Y., Chen, L., Cheng, J., Gu, Z., Hao, H., Qi, H., Zheng, Y., Frangi, A., Liu, J.: CS-Net: Channel and spatial attention network for curvilinear structure segmentation. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part I 22. pp. 721–730. Springer (2019)
[24] Mou, L., Zhao, Y., Fu, H., Liu, Y., Cheng, J., Zheng, Y., Su, P., Yang, J., Chen, L., Frangi, A.F., et al.: CS2-Net: Deep learning segmentation of curvilinear structures in medical imaging. Medical image analysis 67, 101874 (2021)
[25] Palti-Wasserman, D., Brukstein, A.M., Beyar, R.P.: Identifying and tracking a guide wire in the coronary arteries during angioplasty from X-ray images. IEEE Transactions on Biomedical Engineering 44(2), 152–164 (1997)
[26] Peng, Y., Pan, L., Luan, P., Tu, H., Li, X.: Curvilinear object segmentation in medical images based on ODoS filter and deep learning network. arXiv preprint arXiv:2301.07475 (2023)
[27] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. pp. 234–241. Springer (2015)
[28] Rougé, P., Passat, N., Merveille, O.: Cascaded multitask U-Net using topological loss for vessel segmentation and centerline extraction. arXiv preprint arXiv:2307.11603 (2023)
[29] Roychowdhury, S., Koozekanani, D.D., Parhi, K.K.: Iterative vessel segmentation of fundus images. IEEE Transactions on Biomedical Engineering 62(7), 1738–1749 (2015)
[30] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: ImageNet large scale visual recognition challenge. International journal of computer vision 115, 211–252 (2015)
[31] Shit, S., Paetzold, J.C., Sekuboyina, A., Ezhov, I., Unger, A., Zhylka, A., Pluim, J.P., Bauer, U., Menze, B.H.: clDice - a novel topology-preserving loss function for tubular structure segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 16560–16569 (2021)
[32] Staal, J., Abràmoff, M.D., Niemeijer, M., Viergever, M.A., Van Ginneken, B.: Ridge-based vessel segmentation in color images of the retina. IEEE transactions on medical imaging 23(4), 501–509 (2004)
[33] Steger, C.: Extracting curvilinear structures: A differential geometric approach. In: Computer Vision—ECCV’96: 4th European Conference on Computer Vision Cambridge, UK, April 15–18, 1996 Proceedings, Volume I 4. pp. 630–641. Springer (1996)
[34] Subirats, P., Dumoulin, J., Legeay, V., Barba, D.: Automation of pavement surface crack detection using the continuous wavelet transform. In: 2006 International Conference on Image Processing. pp. 3037–3040. IEEE (2006)
[35] Sun, K., Zhao, Y., Jiang, B., Cheng, T., Xiao, B., Liu, D., Mu, Y., Wang, X., Liu, W., Wang, J.: High-resolution representations for labeling pixels and regions. arXiv preprint arXiv:1904.04514 (2019)
[36] Tomaszkiewicz, K., Owerko, T.: A pre-failure narrow concrete cracks dataset for engineering structures damage classification and segmentation. Scientific Data 10(1), 925 (2023)
[37] Viti, M., Talbot, H., Abdallah, B., Perot, E., Gogin, N.: Coronary artery centerline tracking with the morphological skeleton loss. In: 2022 IEEE International Conference on Image Processing (ICIP). pp. 2741–2745. IEEE (2022)
[38] Van der Walt, S., Schönberger, J.L., Nunez-Iglesias, J., Boulogne, F., Warner, J.D., Yager, N., Gouillart, E., Yu, T.: scikit-image: image processing in Python. PeerJ 2, e453 (2014)
[39] Wang, F., Gu, Y., Liu, W., Yu, Y., He, S., Pan, J.: Context-aware spatio-recurrent curvilinear structure segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12648–12657 (2019)
[40] Yang, K., Musio, F., Ma, Y., Juchler, N., Paetzold, J.C., Al-Maskari, R., Höher, L., Li, H.B., Hamamci, I.E., Sekuboyina, A., et al.: Benchmarking the CoW with the TopCoW challenge: Topology-aware anatomical segmentation of the Circle of Willis for CTA and MRA. arXiv preprint arXiv:2312.17670 (2023)
[41] Zana, F., Klein, J.C.: Segmentation of vessel-like patterns using mathematical morphology and curvature evaluation. IEEE transactions on image processing 10(7), 1010–1019 (2001)
[42] Zhang, T.Y., Suen, C.Y.: A fast parallel algorithm for thinning digital patterns. Communications of the ACM 27(3), 236–239 (1984)
[43] Zijdenbos, A.P., Dawant, B.M., Margolin, R.A., Palmer, A.C.: Morphometric analysis of white matter lesions in MR images: method and validation. IEEE transactions on medical imaging 13(4), 716–724 (1994)

Appendix 0.A Influence of the Loss Weight Parameter $\bm{w}$

Appendix 0.B Model Configurations

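Table 4: Configuration of nnUNet and HRNet on the five datasets. nnUNet employs patch-based training and inference, while HRNet operates on the whole image, so the patch size listed for HRNet is the full image size. HRNet is designed specifically for 2D data, whereas nnUNet supports both 2D and 3D images; a dash (–) therefore marks HRNet settings that do not apply to the 3D datasets (ToothFairy and TopCoW). $\mu$ denotes the SGD momentum, and PolyLR(1e-2) a polynomial learning rate schedule with an initial learning rate of $10^{-2}$.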

| Dataset | Network | Batch Size | Patch Size | Optimizer | LR Schedule |
|---|---|---|---|---|---|
| Roads | nnUNet | 12 | $512\!\times\!512$ | SGD, $\mu=0.99$ | PolyLR(1e-2) |
| Roads | HRNet | 2 | $1500\!\times\!1500$ | SGD, $\mu=0.9$ | PolyLR(1e-2) |
| DRIVE | nnUNet | 2 | $512\!\times\!512$ | SGD, $\mu=0.99$ | PolyLR(1e-2) |
| DRIVE | HRNet | 2 | $565\!\times\!584$ | SGD, $\mu=0.9$ | PolyLR(1e-2) |
| Cracks | nnUNet | 65 | $224\!\times\!224$ | SGD, $\mu=0.99$ | PolyLR(1e-2) |
| Cracks | HRNet | 64 | $224\!\times\!224$ | SGD, $\mu=0.9$ | PolyLR(1e-2) |
| ToothFairy | nnUNet | 2 | $80\!\times\!160\!\times\!192$ | SGD, $\mu=0.99$ | PolyLR(1e-2) |
| ToothFairy | HRNet | – | – | – | – |
| TopCoW | nnUNet | 2 | $80\!\times\!192\!\times\!160$ | SGD, $\mu=0.99$ | PolyLR(1e-2) |
| TopCoW | HRNet | – | – | – | – |
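
To make the LR Schedule column concrete, the following is a minimal sketch of a polynomial learning rate decay starting at $10^{-2}$. It assumes that PolyLR(1e-2) refers to the schedule $\mathrm{lr}(t) = \mathrm{lr}_0 \cdot (1 - t/T)^{p}$. The exponent $p = 0.9$, the total number of steps $T$, and the function name `poly_lr` are illustrative assumptions and are not stated in Table 4.

```python
def poly_lr(step: int, max_steps: int, initial_lr: float = 1e-2,
            exponent: float = 0.9) -> float:
    """Polynomial decay: initial_lr * (1 - step / max_steps) ** exponent.

    initial_lr=1e-2 follows Table 4; exponent=0.9 is an assumed default.
    """
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    if not 0 <= step <= max_steps:
        raise ValueError("step must lie in [0, max_steps]")
    return initial_lr * (1.0 - step / max_steps) ** exponent


if __name__ == "__main__":
    # Hypothetical training length of 1000 epochs, used for illustration only.
    total_epochs = 1000
    for epoch in (0, 250, 500, 750, 1000):
        print(f"epoch {epoch:4d}: lr = {poly_lr(epoch, total_epochs):.6f}")
```

Under these assumptions the learning rate starts at $10^{-2}$, decreases monotonically, and reaches zero at the final step.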