1 German Cancer Research Center (DKFZ), Heidelberg, Division of Medical Image Computing, Germany
2 Faculty of Mathematics and Computer Science, Heidelberg University, Germany
3 HIDSS4Health - Helmholtz Information and Data Science School for Health, Karlsruhe/Heidelberg, Germany
4 Medical Faculty Heidelberg, Heidelberg University, Heidelberg, Germany
5 Helmholtz Imaging, German Cancer Research Center, Heidelberg, Germany
6 Department of Neuroradiology, Heidelberg University Hospital, Heidelberg, Germany
7 Division for Computational Neuroimaging, Department of Neuroradiology, Heidelberg University Hospital, Heidelberg, Germany
8 Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany
9 Cancer Research Center Cologne Essen (CCCE), West German Cancer Center Essen, University Hospital Essen, Essen, Germany
10 Pattern Analysis and Learning Group, Department of Radiation Oncology, Heidelberg University Hospital

Skeleton Recall Loss for Connectivity Conserving and Resource Efficient Segmentation of Thin Tubular Structures

Yannick Kirchhoff* (1, 2, 3), Maximilian R. Rokuss* (1, 2), Saikat Roy* (1, 2), Balint Kovacs (1, 4), Constantin Ulrich (1, 4), Tassilo Wald (1, 2, 5), Maximilian Zenk (1, 4), Philipp Vollmuth (1, 6, 7), Jens Kleesiek (8, 9), Fabian Isensee (1, 5), Klaus Maier-Hein (1, 10)

* Contributed equally. All co-primary authors agree that they may designate themselves as the lead author in their profiles.

Abstract

Accurately segmenting thin tubular structures, such as vessels, nerves, roads or concrete cracks, is a crucial task in computer vision. Standard deep learning-based segmentation loss functions, such as Dice or Cross-Entropy, focus on volumetric overlap, often at the expense of preserving structural connectivity or topology. This can lead to segmentation errors that adversely affect downstream tasks, including flow calculation, navigation, and structural inspection. Although current topology-focused losses mark an improvement, they introduce significant computational and memory overheads. This is particularly relevant for 3D data, rendering these losses infeasible for larger volumes as well as increasingly important multi-class segmentation problems. To mitigate this, we propose a novel Skeleton Recall Loss, which effectively addresses these challenges by circumventing intensive GPU-based calculations with inexpensive CPU operations. It demonstrates overall superior performance to current state-of-the-art approaches on five public datasets for topology-preserving segmentation, while substantially reducing computational overheads by more than 90%. In doing so, we introduce the first multi-class capable loss function for thin structure segmentation, excelling in both efficiency and efficacy for topology-preservation. Our code is available to the community, providing a foundation for further advancements at: [will be shared later]

Keywords:
Segmentation Topology Tubular Structures Loss Function

1 Introduction

The precise segmentation of thin tubular structures is a critical task across diverse domains in engineering and medical applications (Fig. 1). Topological correctness is fundamental for facilitating downstream tasks such as analyzing blood flow dynamics, delineating neuronal boundaries in Electron Microscopy imagery, evaluating risk factors for vascular pathologies, aiding in surgical planning, and optimizing route planning [21, 1, 32, 40, 6, 5]. Classical approaches for the automated segmentation of thin curvilinear structures include image transforms [25, 34], mathematical morphologies [41, 29], filtering [13, 10, 15], and differential operators [33], among others [8, 16, 19, 3, 2]. Deep learning-based techniques have played an increasing role in recent years, with standard segmentation networks such as UNet [27] being popular. Standard overlap-based losses (e.g. the Dice similarity coefficient [43]) enable such networks to segment large structures, but they often struggle with small elongated ones [18], as shown in Fig. 2.

Multiple methods have been introduced in recent years to address the challenges of segmenting thin curvilinear structures, but they are often domain-specific or require the use of specialized networks [39, 23, 24, 22, 17, 4, 9]. Recently, centerlineDice [31] (clDice) was introduced, encompassing both a loss function and a metric for measuring connectivity in the segmentation of thin structures. Effectively, it incorporates the skeleton of a segmentation into the Dice calculation. While the clDice metric uses the exact skeleton, the clDice loss works with a differentiable approximation of the skeleton. It enables architecture-independent, topology-aware segmentation of thin tubular structures and is considered state-of-the-art. However, despite its advantages, it introduces a large computational overhead. Furthermore, the differentiable Soft Skeleton used in the loss calculation can often be jagged, as depicted in Fig. 3, leading to inaccuracies in segmentation. While a follow-up approach [20] attempted to address this limitation by introducing a topologically correct differentiable skeleton, it did so at an even higher computational cost. This limitation becomes particularly pronounced when dealing with large volumes or multi-class segmentation problems common to 3D medical image segmentation, rendering training on multi-class 3D datasets challenging to infeasible even on modern hardware.

Figure 1: Diversity of thin structures. Segmentation of thin structures is a challenging task in engineering and medical imaging. This is highlighted by the 5 diverse datasets used in this work, which cover the segmentation of: (a) Roads in satellite imagery (Roads), (b) Retinal vessels (DRIVE), (c) Cracks in concrete structures (Cracks), (d) Inferior alveolar canal in facial CTs (ToothFairy), and (e) Circle of Willis arterial vessel components (TopCoW).

In response to these challenges, we propose Skeleton Recall Loss, a novel loss function tailored to address the intricacies associated with thin structures in segmentation tasks. Skeleton Recall Loss demonstrates the following strengths:

Figure 2: Comparison of state-of-the-art loss functions on the task of thin structure segmentation. Top: Our Skeleton Recall Loss efficiently addresses connectivity conservation, unlike standard Dice loss, without the overhead of clDice Loss, making it ideal for multi-class problems as well. Bottom: Qualitative results on the TopCoW [40] dataset. Due to its computational cost, clDice Loss cannot be used for multi-class segmentation.

  1. Minimal training time: The Tubed Skeleton required by Skeleton Recall Loss can be computed with simple CPU-based operations using common image processing frameworks (e.g. scikit-image [38]) as part of data-loading or even be precomputed. It is then used in a simple additional soft recall loss with the prediction, thus requiring very little additional training time.

  2. Minimal training memory: The utilization of Skeleton Recall Loss entails minimal GPU memory overhead. Unlike approaches reliant on a differentiable skeleton in prediction or ground truth during training, Skeleton Recall Loss sidesteps the computationally taxing GPU-based skeletonization process, thus necessitating only a marginal increase in GPU training memory.

  3. Domain and architecture agnostic: Skeleton Recall Loss exhibits inherent plug-and-play characteristics, seamlessly integrating into a wide array of 2D and 3D segmentation tasks without imposing architectural constraints. It operates without the need for specialized networks or modifications to underlying segmentation architectures.

  4. Multi-class compatibility: Skeleton Recall Loss integrates seamlessly with multi-class labels, while competing methods like clDice Loss often face nearly insurmountable computational challenges on such problems.

Skeleton Recall Loss yields overall superior results compared to a baseline network without topological losses, as well as to clDice Loss, a state-of-the-art topological loss. We demonstrate this in an extensive multi-domain evaluation on 5 publicly available datasets. Notably, our loss function feasibly supports multi-class segmentation problems and can thus be considered a new state-of-the-art for delineating thin curvilinear structures in natural as well as medical images.

Figure 3: The challenges of Differentiable Skeletons. Panels: (a) MRA Image, (b) GT segmentation, (c) Soft Skeleton [31], (d) Tubed Skeleton. Visual comparison of (c) the soft skeleton used for the calculation of the clDice Loss [31] and (d) the proposed tubed skeleton used for Skeleton Recall Loss for (a) an image and the corresponding (b) ground truth segmentation, originating from the TopCoW dataset [40].

2 Related Work

Deep learning-based approaches for segmentation of thin curvilinear structures often involve specialized networks. In [22], a joint network with a shared encoder and two decoders was trained to simultaneously segment and score a tubular path. Similarly, in [4], a joint network was proposed to simultaneously learn features as well as global topology. In [17], a pair of sequential UNets was used, the first of which performed a coarse prediction while the other detected missing or false splits in the structure. A similar end-to-end approach in [24] used a channel and spatial attention module, which was incorporated within the bottleneck of an encoder-decoder network for segmenting thin structures in medical images. In another work [26], an oriented derivative of stick (ODoS) filter output was refined using a succession of UNets to obtain effective segmentation of curvilinear objects. In contrast, other approaches utilize specialized loss functions to assist the network in preserving topology in segmentation outputs. In [7], topological priors were incorporated into network training by using a persistent-homology based differentiable topological prior as a loss function. Another work [11] attempted to preserve topological information by enforcing, via a novel loss function, that the prediction and ground truth have the same Betti number. Recently, the use of a differentiable skeleton has emerged as the predominant method for topology-aware segmentation of thin structures following the introduction of the centerlineDice (clDice) loss function [31], outperforming persistent-homology based approaches. This method is complemented by the introduction of the clDice metric, a well-established measure of connectedness. The differentiable skeleton proposed by [31] was improved in [37], where a Soft-Persistent Skeleton was proposed for coronary artery tracking. Alternatively, in [28], the differentiable skeleton was predicted by a secondary network in addition to the primary segmentation output. Most recently, a topologically correct differentiable skeletonization algorithm was introduced in [20], overcoming the previous approximation of skeletons, while still requiring massive computational resources to do so.

3 Methodology

3.1 The challenges of Differentiable Skeletons

The usage of a differentiable skeleton based loss [31, 37, 28, 20] in training a deep neural network to segment thin tubular structures is an intuitive approach to preserve connectivity. However, it is fraught with challenges which can be multi-faceted in nature. One of the most easily demonstrable issues is shown in Fig. 3. As mentioned earlier, the so-called Soft skeletonization of clDice Loss can lead to perforated and jagged skeletons, especially in 3D, which results in inaccuracies for the clDice Loss calculation. This is in addition to the enormous GPU memory and training time overheads that are a natural part of a GPU-based differentiable skeletonization process on both the ground truth as well as the network prediction. This overhead is demonstrated in Fig. 7, and can render effective training almost infeasible in multi-class datasets with large 3D input volumes such as TopCoW [40] (which is used in this work) without access to significant computational resources. While follow-up work in [20] allowed for a relative improvement in topologically accurate differentiable skeletonization, it further aggravated the issues with excessive resource utilization.
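
To make the source of this overhead concrete, the following is a minimal NumPy/SciPy sketch of an iterative min/max-filter soft skeleton in the spirit of [31]. It is an illustration under stated assumptions, not the reference implementation: the function names, the cross-shaped erosion footprint, the 3-wide dilation window, the boundary mode and the default iteration count are all choices made here. The point it shows is that every iteration filters the full-resolution input several times, and that a differentiable variant on the GPU has to do this for both the prediction and the ground truth.

```python
import numpy as np
from scipy import ndimage


def soft_erode(x):
    # Minimum over a cross-shaped neighbourhood (assumed footprint).
    cross = ndimage.generate_binary_structure(x.ndim, 1)
    return ndimage.minimum_filter(x, footprint=cross, mode="nearest")


def soft_dilate(x):
    # Maximum over a 3-wide box neighbourhood (assumed footprint).
    return ndimage.maximum_filter(x, size=3, mode="nearest")


def soft_open(x):
    return soft_dilate(soft_erode(x))


def soft_skeleton(x, iterations=10):
    """Approximate skeleton of a 2D or 3D probability map with values in [0, 1]."""
    x = np.asarray(x, dtype=np.float64)
    skel = np.maximum(x - soft_open(x), 0.0)
    for _ in range(iterations):
        # Each pass erodes the whole array and opens it again.
        x = soft_erode(x)
        delta = np.maximum(x - soft_open(x), 0.0)
        skel = skel + np.maximum(delta - skel * delta, 0.0)
    return skel
```

In an autodiff framework, every intermediate array of these passes would additionally have to be kept for the backward pass, which is consistent with the memory overhead described above.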

3.2 Skeleton Recall Loss: Connectivity conservation on thin structures without differentiable skeletons

Skeleton Recall Loss is a loss function designed to preserve connectivity in thin tubular structures without incurring massive computational overheads. It is universally applicable, regardless of whether the inputs are 2D or 3D. It does so by avoiding the GPU-based soft-skeletonization on the prediction and ground truth. Instead, a tubed skeletonization is performed on the ground truth, followed by a soft recall loss against the predicted segmentation output. This is illustrated in Fig. 4 and further discussed in the following sections.

Figure 4: Overview of our method in comparison to differentiable skeleton based approaches. Initially, a segmentation network (green) predicts a segmentation mask. Our proposed Skeleton Recall Loss (blue) calculates the soft recall of the prediction on the precomputed Tubed Skeleton of the ground truth. In doing so, we mitigate the massive overheads introduced by differentiable skeleton based methods (red).

3.2.1 Tubed Skeletonization

The usage of a skeleton for the preservation of connectivity is an effective method, but it does not need to be differentiable. In this work, we extract a tubed skeleton from the ground truth as demonstrated in Algorithm 1. Initially, we binarize the ground truth segmentation mask and compute its skeleton using methods outlined in [42] for 2D and [14] for 3D inputs. Subsequently, we dilate the skeleton with a diamond kernel of radius 2 to make it tubular, thereby enlarging the effective area for loss computation around the otherwise thin, single-pixel-wide skeleton. This enhances the stability of the loss by incorporating signals from a greater number of pixels, particularly those in close proximity to the skeleton, which are vital for connectivity. Lastly, for multi-class problems, we multiply the tubed skeleton with the ground truth mask, effectively assigning parts of the skeleton to their respective classes. All of these operations are computationally inexpensive and can be carried out on the CPU during data loading or pre-computed using libraries such as scikit-image [38].

3.2.2 Soft Recall on Tubed Skeleton

Following the extraction of our tubed skeleton, we incentivize the network to include as much of this skeleton as possible as part of the prediction. This is performed simply by using a soft recall loss $\mathcal{L}_{SkelRecall}$, in addition to any existing generic loss $\mathcal{L}_{generic}$ used by the network (for example, Dice Loss, Cross Entropy Loss, etc.), as seen in Algorithm 2. This vastly improves the connectivity of thin curvilinear structures predicted by a network trained using this loss (Sec. 5.1). Additionally, Skeleton Recall Loss is computationally inexpensive in comparison to the use of a differentiable skeletonization, requiring only fractionally more GPU memory and additional time during training (Sec. 5.3). This facilitates training multi-class segmentation problems as opposed to current differentiable skeleton methods which incur infeasible overheads (Sec. 5.3.2).

Algorithm 1 Tubed Skeletonization
Require: $Y$ are $K$-classed hard targets where $Y_{i,j(,k)} \in [0, K]$
1: $Y_{\text{bin}} \leftarrow Y > 0$  % Binarize to foreground and background labels
2: $Y_{\text{skel}} \leftarrow \text{skeletonize}(Y_{\text{bin}})$  % Extract binarized skeleton
3: $Y_{\text{skel}} \leftarrow \text{dilate}(Y_{\text{skel}})$  % Dilate to create tubed skeleton
4: $Y_{\text{mc-skel}} \leftarrow Y_{\text{skel}} \times Y$  % De-binarize to create multi-class tubed skeleton
5: return $Y_{\text{mc-skel}}$

Algorithm 2 Skeleton Recall Loss
Require: target labels ($Y$), network predictions ($Y_{\text{pred}}$), $K$ classes, weighing factor $w$
% Generic loss computation, e.g. Cross Entropy, Dice Similarity (GPU)
1: $Y_{\text{onehot}} \leftarrow \text{onehot}(Y)$
2: $\mathcal{L}_{generic} \leftarrow \mathbf{GenericLoss}(Y_{\text{onehot}}, Y_{\text{pred}})$
% Precomputed using CPU operations
3: $Y_{\text{mc-skel}} \leftarrow \text{TubedSkeletonization}(Y)$  % Extract tubed skeleton using Alg. 1
% Skeleton Recall Loss computation (GPU)
4: $Y_{\text{mc-skel}} \leftarrow \text{onehot}(Y_{\text{mc-skel}})$
5: $L_{\text{mc-skel}} = \{\}$
% Soft Recall with tubed skeletons. Practically implemented as vector operations
6: for all $class \in [1, K]$ do
7:    $L_{\text{mc-skel}}[class] \leftarrow -\frac{\mathbf{sum}(Y_{\text{mc-skel}}[class] \times Y_{\text{pred}}[class])}{\mathbf{sum}(Y_{\text{mc-skel}}[class])}$
8: end for
9: $\mathcal{L}_{SkelRecall} \leftarrow \mathbf{mean}(L_{\text{mc-skel}})$
10: $\mathcal{L} \leftarrow \mathcal{L}_{generic} + w \cdot \mathcal{L}_{SkelRecall}$
11: return $\mathcal{L}$
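
The soft recall of Algorithm 2 can be illustrated with a few vectorized array operations. The sketch below covers the forward computation only and uses NumPy for readability; in training, the same arithmetic would be written in an autodiff framework so that gradients reach the network. The names `onehot`, `skeleton_recall_loss` and `total_loss`, the single-sample channel-first layout (channel 0 = background) and the small `eps` guarding against classes that are absent from a sample are assumptions of this sketch.

```python
import numpy as np


def onehot(labels, num_classes):
    """Integer label map -> array of shape (num_classes + 1, ...), channel 0 = background."""
    return np.stack([labels == c for c in range(num_classes + 1)]).astype(np.float64)


def skeleton_recall_loss(y_pred, y_mc_skel, num_classes, eps=1e-8):
    """Negative soft recall of the prediction on the multi-class tubed skeleton.

    y_pred:    class probabilities, shape (num_classes + 1, ...).
    y_mc_skel: integer tubed skeleton (e.g. the output of Alg. 1), spatial shape only.
    """
    skel = onehot(y_mc_skel, num_classes)[1:]  # foreground classes 1..K
    pred = y_pred[1:]
    spatial_axes = tuple(range(1, skel.ndim))
    intersection = (skel * pred).sum(axis=spatial_axes)
    skel_size = skel.sum(axis=spatial_axes)
    recall = intersection / (skel_size + eps)  # eps is an assumption, see lead-in
    return -recall.mean()


def total_loss(generic_loss, skel_recall, w):
    """Line 10 of Alg. 2: generic loss plus weighted Skeleton Recall Loss."""
    return generic_loss + w * skel_recall
```

Since the prediction is only multiplied with a fixed mask and summed, the extra cost per step is a handful of element-wise operations, in contrast to the repeated filtering passes a differentiable skeleton requires.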

4 Experimental Setup

4.1 Datasets

Table 1: Details of the datasets used for training and evaluation. Our datasets show wide coverage over a number of thin structure segmentation tasks in natural and medical images. The TopCoW dataset is used both in binary and multi-class settings, in line with the original challenge.

| Dataset | Dims | Type | # Samples | Target Structure |
|---|---|---|---|---|
| Roads [21] | 2D | binary | 804 | Roads on aerial images |
| DRIVE [32] | 2D | binary | 20 | Blood vessels on retina images |
| Cracks [36] | 2D | binary | 5388 | Cracks on concrete structure images |
| ToothFairy [6, 5] | 3D | binary | 138 | Inferior Alveolar Canal |
| TopCoW [40] | 3D | multi-class | 200 | Circle of Willis vessels in the brain |

We employ five public datasets featuring thin structures for validating the proposed Skeleton Recall Loss. The datasets span natural as well as medical images, covering a range of segmentation challenges, including both binary and multi-class segmentation problems in 2D as well as 3D contexts. An overview of the datasets can be found in Tab. 1. The three 2D datasets comprise the Digital Retinal Images for Vessel Extraction (DRIVE) dataset [32] for retinal vessel segmentation, structural inspection images for concrete crack segmentation (Cracks [36]), and aerial images of Massachusetts for road segmentation (Roads [21]), highlighting the diversity of thin structures in natural and constructed environments. In the 3D domain, we incorporated two cutting-edge medical image segmentation challenge datasets. The first is ToothFairy (https://toothfairy.grand-challenge.org/), a segmentation challenge on 3D Cone-Beam CTs [6, 5] featuring the inferior alveolar canal as the target structure. The second is the TopCoW (https://topcow23.grand-challenge.org/) dataset for topology-aware 3D segmentation of vessels in the Circle of Willis for CTA and MRA data [40], encompassing binary as well as multi-class segmentation on 13 different subtypes of vessels. This diverse set of datasets enables a comprehensive evaluation of the proposed Skeleton Recall Loss, demonstrating generalizability of the method to a wide range of thin structure segmentation challenges in both 2D and 3D contexts.

4.2 Evaluation Metrics

We use multiple metrics including overlap, connectivity and topological measures for thorough evaluation of our proposed loss function. An interesting dichotomy of clDice [31] is that while it makes for an inefficient loss function for training deep neural networks, it is an effective metric for measuring connectivity. Therefore, following existing guidelines [18] for semantic segmentation of tubular structures, we use clDice as a metric in conjunction with the Dice similarity coefficient, as our connectivity- and overlap-based measures. Similar to previous work, we also report on topology-based metrics, namely, absolute Betti Number Errors of the 0th and 1st Betti Numbers, $\beta_0$ and $\beta_1$. However, in contrast to other work [11, 31], we calculate the Betti Errors on whole volumes instead of small, randomly extracted patches. Our evaluation strategy is more intuitive in nature and offers better interpretability of the measure, which is especially relevant in medical segmentation tasks.
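
As an illustration of the topology-based metrics, the sketch below computes absolute Betti errors on a whole 2D binary mask. It is restricted to 2D and assumes a common digital-topology convention (8-connected foreground, 4-connected background), so that $\beta_0$ is the number of foreground components and $\beta_1$ the number of enclosed background regions; the function names and this connectivity choice are assumptions of the sketch and are not taken from the evaluation code behind the reported numbers.

```python
import numpy as np
from scipy import ndimage


def betti_numbers_2d(mask):
    """Return (beta_0, beta_1) of a 2D binary mask."""
    mask = np.asarray(mask, dtype=bool)
    # beta_0: connected foreground components (8-connectivity).
    _, b0 = ndimage.label(mask, structure=np.ones((3, 3), dtype=int))
    # beta_1: holes, i.e. background components that do not touch the outside.
    # Padding with background merges everything outside into one component.
    background = np.pad(~mask, 1, constant_values=True)
    _, n_background = ndimage.label(background)  # default: 4-connectivity in 2D
    return b0, n_background - 1


def betti_errors_2d(pred, gt):
    """Absolute Betti number errors computed on the whole image."""
    p0, p1 = betti_numbers_2d(pred)
    g0, g1 = betti_numbers_2d(gt)
    return abs(p0 - g0), abs(p1 - g1)
```

For example, a closed ring has one component and one hole; cutting it in two places yields two components and no hole, giving a $\beta_0$ error of 1 and a $\beta_1$ error of 1.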

4.3 Baseline Loss Functions

We benchmark our proposed loss function against state-of-the-art loss functions targeting thin structure segmentation on the five datasets detailed in Sec. 4.1. Specifically, we compare against: 1) clDice Loss [31], the leading method in the field that utilizes approximate differentiable skeletons. 2) Additionally, we also compare against a modification of clDice Loss, where we replace the differentiable skeletonization of the original publication by a follow-up of this work [20]. This new method, called Topo-clDice Loss in our evaluations, produces topologically-accurate differentiable skeletons at the cost of even higher computational requirements. We note that loss functions based on persistent homologies [22, 11] are excluded from our evaluation which, while related, were surpassed by the clDice Loss [31].

4.4 Training

We implement the baseline loss functions (Sec. 4.3), as well as our proposed Skeleton Recall Loss in a powerful medical image segmentation network (nnUNet [12]) and a state-of-the-art natural image segmentation network (HRNet [35]), pretrained on ImageNet [30]. We use the examined loss functions for connectivity conservation ($\mathcal{L}_{connectivity}$) in addition to the underlying generic loss ($\mathcal{L}_{generic}$) of our training framework, a combination of Cross-Entropy and Soft Dice Loss. The connectivity loss is weighted by an additional parameter $w$ as shown in Eq. 1.

$$\mathcal{L} = \mathcal{L}_{generic} + w \cdot \mathcal{L}_{connectivity} \qquad (1)$$

Our experiments are restricted to two weight configurations $w \in \{0.1, 1.0\}$ in order to curb the influence of extensive hyperparameter tuning. We show a more detailed analysis of the effect of the weight parameter in the Appendix. Additionally, the full set of hyperparameters, optimizers and configurations of nnUNet and HRNet used for training on the different datasets are also provided in the Appendix.

5 Results and Discussion

5.1 Skeleton Recall Loss enables state-of-the-art segmentation of thin structures

The results in Tab. 2 show that our proposed Skeleton Recall Loss surpasses previous thin structure segmentation losses on almost all datasets. For concrete crack segmentation [36], the results indicate better Dice and clDice performance at the cost of slightly worse Betti errors than clDice Loss. For retinal vessel segmentation [32], Skeleton Recall Loss achieves the best clDice and Betti errors, with a Dice score just marginally behind clDice Loss. Notably, for the three datasets with an independent test set available, specifically Roads [21] and both of the 3D datasets, ToothFairy [6, 5] and TopCoW [40], we observe superior performance of our proposed Skeleton Recall Loss. This is further demonstrated by the qualitative results given in Fig. 5. Skeleton Recall Loss is also seen to be better than the baselines on both the binary and the multi-class setting of TopCoW, as elaborated in the following sections. We obtain this state-of-the-art performance while being architecture agnostic (Sec. 5.2) as well as overwhelmingly resource efficient (Sec. 5.3).

Table 2: State-of-the-art segmentation of thin structures. Quantitative results obtained by incorporating our proposed Skeleton Recall Loss as well as existing thin structure segmentation losses into the loss function of a generic nnUNet backbone. Results are reported on the test set, except for the DRIVE and Cracks datasets, where we report 5-fold cross validation results due to unavailability of an independent test set.

| Dataset | Loss configuration | Dice ↑ | clDice ↑ | $\beta_0$ error ↓ | $\beta_1$ error ↓ |
|---|---|---|---|---|---|
| Roads [21] | Default nnUNet | 78.99 | 88.79 | 5.769 | 84.62 |
| | + clDice Loss | 79.15 | 89.00 | 6.539 | 82.00 |
| | + Topo-clDice Loss | 78.94 | 88.62 | 11.00 | 85.92 |
| | + Skeleton Recall Loss (Ours) | 79.25 | 89.06 | 4.846 | 83.69 |
| DRIVE [32] | Default nnUNet | 80.87 | 80.26 | 57.00 | 22.80 |
| | + clDice Loss | 81.05 | 80.68 | 44.50 | 23.35 |
| | + Topo-clDice Loss | 80.80 | 80.19 | 46.35 | 23.70 |
| | + Skeleton Recall Loss (Ours) | 80.99 | 80.83 | 38.75 | 21.50 |
| Cracks [36] | Default nnUNet | 94.59 | 95.76 | 0.147 | 0.0033 |
| | + clDice Loss | 94.80 | 95.96 | 0.142 | 0.0033 |
| | + Topo-clDice Loss | 94.83 | 96.00 | 0.159 | 0.0035 |
| | + Skeleton Recall Loss (Ours) | 94.88 | 96.04 | 0.148 | 0.0035 |
| ToothFairy [6, 5] | Default nnUNet | 71.80 | 89.16 | 0.900 | 0.0200 |
| | + clDice Loss | 72.36 | 89.67 | 0.620 | 0.0200 |
| | + Skeleton Recall Loss (Ours) | 74.42 | 92.05 | 0.540 | 0.0200 |
| TopCoW binary [40] | Default nnUNet | 93.55 | 98.25 | 0.743 | 1.800 |
| | + clDice Loss | 93.64 | 98.35 | 0.514 | 1.986 |
| | + Skeleton Recall Loss (Ours) | 93.72 | 98.48 | 0.500 | 1.586 |
| TopCoW multi-class | Default nnUNet | 85.36 | 93.68 | 0.137 | 0.0571 |
| | + clDice Loss | Out Of Memory | | | |
| | + Skeleton Recall Loss (Ours) | 86.59 | 94.35 | 0.151 | 0.056 |

5.2 Skeleton Recall Loss is architecture agnostic

While Skeleton Recall Loss demonstrates state-of-the-art performance using nnUNet as a backbone framework in Tab. 2, it is not restricted to specialized architectures. We highlight this in Tab. 3 where HRNet [35], a state-of-the-art 2D architecture for natural image segmentation, is used as the backbone. This leads to similar benefits on connectivity conservation using Skeleton Recall Loss during training on our 2D datasets. Skeleton Recall Loss is seen to exceed the connectivity conserving performance (as demonstrated by the clDice metric) on 2 out of 3 datasets, while being comparable on the remaining one. Our overall superiority over all metrics demonstrates that Skeleton Recall Loss is architecture agnostic and can be used as a loss in training arbitrary deep architectures for connectivity-conserving segmentation of thin structures.

Figure 5: Connectivity conservation in qualitative results on 5 datasets. Columns (left to right): Image, Ground Truth, nnUNet, + clDice Loss, + Ours. nnUNet with conventional segmentation losses performs well in adequately delineating general structures, particularly thicker ones. However, challenges arise in accurately capturing thin structures and maintaining connectivity within the segmentation. This is demonstrated on examples from (top to bottom) Roads, DRIVE, Cracks, ToothFairy and TopCoW datasets. Augmenting the model with clDice Loss yields some improvement but falls short in addressing connectivity issues. In contrast, our proposed Skeleton Recall Loss demonstrates enhanced preservation of topology and improved connectivity in segmentation outputs.
Table 3: Skeleton Recall Loss is architecture agnostic. Quantitative results using HRNet, a state-of-the-art 2D network, on all examined 2D datasets. Skeleton Recall Loss demonstrates accurate segmentation including effective connectivity conservation, without explicit reliance on a particular deep neural network architecture.

| Dataset | Loss configuration | Dice ↑ | clDice ↑ | $\beta_0$ error ↓ | $\beta_1$ error ↓ |
|---|---|---|---|---|---|
| Roads | HRNet | 77.91 | 87.63 | 18.08 | 83.15 |
| | + clDice Loss | 78.05 | 88.21 | 8.846 | 82.31 |
| | + Skeleton Recall Loss (Ours) | 78.25 | 88.14 | 15.38 | 81.31 |
| DRIVE | HRNet | 80.34 | 77.83 | 149.3 | 51.25 |
| | + clDice Loss | 80.63 | 81.65 | 65.75 | 46.25 |
| | + Skeleton Recall Loss (Ours) | 76.08 | 84.20 | 27.25 | 38.50 |
| Cracks | HRNet | 94.09 | 95.39 | 0.1985 | 0.0037 |
| | + clDice Loss | 94.67 | 96.01 | 0.1558 | 0.0037 |
| | + Skeleton Recall Loss (Ours) | 95.03 | 96.26 | 0.1596 | 0.0019 |

5.3 Connectivity conservation with minimal overheads

Figure 6: Efficient Resource Utilization for Binary Segmentation. The figures depict the additional training time per epoch and memory requirements caused by employing Skeleton Recall Loss and clDice Loss compared to standard network training (nnUNet, dashed lines) across all assessed datasets. While Skeleton Recall Loss shows a minimal increase in VRAM usage and negligible changes in epoch duration, clDice Loss introduces notable overhead in both time and memory. For example, clDice Loss more than doubles the epoch duration for DRIVE and almost doubles VRAM usage for ToothFairy.

5.3.1 Efficient binary segmentation of thin structures

Many tasks in the segmentation of thin curvilinear structures have historically been binary in nature. As competing state-of-the-art differentiable skeleton methods were developed for the binary scenario, we consider this to be the setting where such methods should be most competitive. However, Skeleton Recall Loss not only provides state-of-the-art connectivity-conserving thin structure segmentation performance, as seen in Sec. 5.1 and Sec. 5.2, but does so with only a fraction of the GPU memory and training time overhead of existing methods, as shown in Fig. 6. Differentiable skeleton-based methods require a GPU-based skeleton computation [31] or prediction [28]. For our differentiable skeleton baseline clDice Loss, this leads to approximately 88% additional training time and 52% more VRAM consumption compared to the plain nnUNet backbone when averaged across our 5 datasets (excluding multi-class TopCoW). Remarkably, our method Skeleton Recall Loss achieves the same at only 8% additional training time and 2% higher VRAM consumption. This illustrates that, in terms of resource efficiency, Skeleton Recall Loss categorically outperforms traditional differentiable skeleton-based methods in the binary setting for which they were developed.
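As a quick sanity check on these figures, the relative saving in overhead can be worked out directly from the four percentages quoted above. The sketch below is only arithmetic on the reported averages; the helper name `relative_reduction` is hypothetical and not part of any released code.

```python
def relative_reduction(baseline_overhead: float, new_overhead: float) -> float:
    """Percentage by which the added overhead shrinks relative to the baseline."""
    return 100.0 * (baseline_overhead - new_overhead) / baseline_overhead


# Average overheads over plain nnUNet, in percent, as reported in the text.
CLDICE_TIME, CLDICE_VRAM = 88.0, 52.0
SKELETON_RECALL_TIME, SKELETON_RECALL_VRAM = 8.0, 2.0

time_saving = relative_reduction(CLDICE_TIME, SKELETON_RECALL_TIME)
vram_saving = relative_reduction(CLDICE_VRAM, SKELETON_RECALL_VRAM)

print(f"training-time overhead reduced by {time_saving:.1f}%")  # 90.9%
print(f"VRAM overhead reduced by {vram_saving:.1f}%")           # 96.2%
```

Both values are in line with the overhead reduction of more than 90% stated in the abstract.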

5.3.2 Enabling multi-class segmentation of thin structures

Binary segmentation has historically sufficed for many image analysis tasks across various domains. However, as the demand for finer-grained analysis grows, transitioning to multi-class segmentation becomes increasingly vital. This shift is especially pertinent in medical contexts due to the prevalence of thin structures where binary segmentation may not adequately capture the complexity of anatomical features. For instance, the recent TopCoW challenge [40] revealed that binary segmentation of brain vessels can be considered sufficiently solved, approaching inter-rater agreement in the Dice score. However, differentiating between the individual vessels remains a challenging task. Our Skeleton Recall Loss demonstrates powerful multi-class segmentation capabilities in addition to standard binary settings. Tab. 2 showcases the results of multi-class segmentation on 13 different brain vessel classes of the TopCoW dataset using both standard nnUNet and our proposed loss. The results demonstrate that while nnUNet exhibits a slightly better $\beta_0$ error, our Skeleton Recall Loss significantly improves Dice and clDice scores. Moreover, it performs on par in terms of $\beta_1$ error, ultimately yielding a superior overall result.

Figure 7: Resource Utilization for Multi-class Segmentation. Skeleton Recall Loss requires minimal additional GPU memory and training time for an increasing number of classes, as it avoids a differentiable skeleton computation. Competing methods like clDice Loss, on the other hand, incur enormous overheads which make multi-class training on datasets such as TopCoW [40] infeasible. Analysis was performed for two different batch sizes (BS) on a single A100 40GB GPU, averaged over 5 training epochs, for ease of comparability.

Fig. 7 shows the multi-class resource utilization of our proposed loss with respect to the number of classes, in comparison to clDice Loss. We demonstrate significant training time and memory savings, with near-constant additional overhead despite the increasing number of classes. In contrast, the plots show approximately linear growth in memory consumption and training time for clDice Loss. We note that this inefficiency rendered clDice Loss infeasible on all 13 classes, as it exceeded the memory capacity of an A100 40GB GPU. In summary, Skeleton Recall Loss can be employed for an arbitrary number of classes with minimal computational cost.
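To give an intuition for why the per-class cost can stay small, here is a minimal sketch, assuming (as the name suggests) that the loss is a soft recall of the predicted class probabilities on ground-truth skeleton masks that were precomputed outside the training graph. This is an illustration, not the released implementation: the function name, the flattened list layout and the `eps` smoothing term are hypothetical, and the CPU skeletonization step itself (for instance with a thinning routine from scikit-image [38]) is omitted.

```python
from typing import Sequence


def soft_skeleton_recall_loss(
    probs: Sequence[Sequence[float]],
    skeletons: Sequence[Sequence[int]],
    eps: float = 1e-8,
) -> float:
    """Negative mean per-class soft recall of predictions on skeleton pixels.

    probs[c][i]     -- predicted probability of class c at (flattened) pixel i
    skeletons[c][i] -- 1 if pixel i lies on the precomputed skeleton of class c
    """
    recalls = []
    for class_probs, class_skeleton in zip(probs, skeletons):
        # Probability mass the network places on skeleton pixels of this class.
        hit = sum(p * s for p, s in zip(class_probs, class_skeleton))
        total = sum(class_skeleton)
        recalls.append((hit + eps) / (total + eps))
    # Higher recall is better, so the loss is the negated mean over classes.
    return -sum(recalls) / len(recalls)


# Toy example: two foreground classes, four pixels each (assumed values).
probs = [[0.9, 0.1, 0.8, 0.0], [0.2, 0.7, 0.1, 0.6]]
skeletons = [[1, 0, 1, 0], [0, 1, 0, 1]]
loss = soft_skeleton_recall_loss(probs, skeletons)
print(f"{loss:.2f}")  # -0.75
```

Under this assumption, each additional class only contributes one masked sum over the prediction, and no skeleton has to be extracted from the network output during training.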

6 Conclusion

This paper proposes a novel loss function, Skeleton Recall Loss, designed for connectivity-preserving semantic segmentation. It is domain and architecture agnostic and, unlike existing methods, requires minimal additional training time and memory. Through extensive evaluation on five publicly available datasets, we demonstrate that Skeleton Recall Loss shows overall superior performance compared to existing state-of-the-art topology-aware loss functions. Moreover, it stands as the first loss function designed for computationally manageable thin structure segmentation within the increasingly significant but hitherto unaddressed multi-class context. In essence, Skeleton Recall Loss represents a significant advancement in the field of thin structure segmentation, offering both efficiency and efficacy. The public availability of our code further facilitates its adoption and serves as a foundation for future advancements in this critical area of study.

Acknowledgements

The present contribution is supported by the Helmholtz Association under the joint research school "HIDSS4Health – Helmholtz Information and Data Science School for Health."

References

  • [1] Arganda-Carreras, I., Turaga, S.C., Berger, D.R., Cireşan, D., Giusti, A., Gambardella, L.M., Schmidhuber, J., Laptev, D., Dwivedi, S., Buhmann, J.M., et al.: Crowdsourcing the creation of image segmentation algorithms for connectomics. Frontiers in neuroanatomy 9, 142 (2015)
  • [2] Bibiloni, P., González-Hidalgo, M., Massanet, S.: A survey on curvilinear object segmentation in multiple applications. Pattern Recognition 60, 949–970 (2016)
  • [3] Chambon, S., Moliard, J.M.: Automatic road pavement assessment with image processing: Review and comparison. International Journal of Geophysics 2011 (2011)
  • [4] Cheng, M., Zhao, K., Guo, X., Xu, Y., Guo, J.: Joint topology-preserving and feature-refinement network for curvilinear structure segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 7147–7156 (2021)
  • [5] Cipriano, M., Allegretti, S., Bolelli, F., Di Bartolomeo, M., Pollastri, F., Pellacani, A., Minafra, P., Anesi, A., Grana, C.: Deep segmentation of the mandibular canal: a new 3d annotated dataset of cbct volumes. IEEE Access 10, 11500–11510 (2022)
  • [6] Cipriano, M., Allegretti, S., Bolelli, F., Pollastri, F., Grana, C.: Improving segmentation of the inferior alveolar nerve through deep label propagation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 21137–21146 (2022)
  • [7] Clough, J.R., Byrne, N., Oksuz, I., Zimmer, V.A., Schnabel, J.A., King, A.P.: A topological loss function for deep-learning based image segmentation using persistent homology. IEEE transactions on pattern analysis and machine intelligence 44(12), 8766–8778 (2020)
  • [8] Fraz, M.M., Remagnino, P., Hoppe, A., Uyyanonvara, B., Rudnicka, A.R., Owen, C.G., Barman, S.A.: Blood vessel segmentation methodologies in retinal images–a survey. Computer methods and programs in biomedicine 108(1), 407–433 (2012)
  • [9] He, Y., Sun, H., Yi, Y., Chen, W., Kong, J., Zheng, C.: Curv-net: curvilinear structure segmentation network based on selective kernel and multi-bi-convlstm. Medical Physics 49(5), 3144–3158 (2022)
  • [10] Hoover, A., Kouznetsova, V., Goldbaum, M.: Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Transactions on Medical imaging 19(3), 203–210 (2000)
  • [11] Hu, X., Li, F., Samaras, D., Chen, C.: Topology-preserving deep image segmentation. Advances in neural information processing systems 32 (2019)
  • [12] Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods 18(2), 203–211 (2021)
  • [13] Koller, T.M., Gerig, G., Szekely, G., Dettwiler, D.: Multiscale detection of curvilinear structures in 2-d and 3-d image data. In: Proceedings of IEEE International Conference on Computer Vision. pp. 864–869. IEEE (1995)
  • [14] Lee, T.C., Kashyap, R.L., Chu, C.N.: Building skeleton models via 3-d medial surface axis thinning algorithms. CVGIP: Graphical Models and Image Processing 56(6), 462–478 (1994)
  • [15] Lemaitre, C., Perdoch, M., Rahmoune, A., Matas, J., Mitéran, J.: Detection and matching of curvilinear structures. Pattern recognition 44(7), 1514–1527 (2011)
  • [16] Lesage, D., Angelini, E.D., Bloch, I., Funka-Lea, G.: A review of 3d vessel lumen segmentation techniques: Models, features and extraction schemes. Medical image analysis 13(6), 819–845 (2009)
  • [17] Lin, M., Zepf, K., Christensen, A.N., Bashir, Z., Svendsen, M.B.S., Tolsgaard, M., Feragen, A.: Dtu-net: Learning topological similarity for curvilinear structure segmentation. In: International Conference on Information Processing in Medical Imaging. pp. 654–666. Springer (2023)
  • [18] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M.D., Buettner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: recommendations for image analysis validation. Nature methods pp. 1–18 (2024)
  • [19] Mena, J.B.: State of the art on automatic road extraction for gis update: a novel classification. Pattern recognition letters 24(16), 3037–3058 (2003)
  • [20] Menten, M.J., Paetzold, J.C., Zimmer, V.A., Shit, S., Ezhov, I., Holland, R., Probst, M., Schnabel, J.A., Rueckert, D.: A skeletonization algorithm for gradient-based optimization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 21394–21403 (2023)
  • [21] Mnih, V.: Machine learning for aerial image labeling. University of Toronto (Canada) (2013)
  • [22] Mosinska, A., Koziński, M., Fua, P.: Joint segmentation and path classification of curvilinear structures. IEEE transactions on pattern analysis and machine intelligence 42(6), 1515–1521 (2019)
  • [23] Mou, L., Zhao, Y., Chen, L., Cheng, J., Gu, Z., Hao, H., Qi, H., Zheng, Y., Frangi, A., Liu, J.: Cs-net: Channel and spatial attention network for curvilinear structure segmentation. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part I 22. pp. 721–730. Springer (2019)
  • [24] Mou, L., Zhao, Y., Fu, H., Liu, Y., Cheng, J., Zheng, Y., Su, P., Yang, J., Chen, L., Frangi, A.F., et al.: Cs2-net: Deep learning segmentation of curvilinear structures in medical imaging. Medical image analysis 67, 101874 (2021)
  • [25] Palti-Wasserman, D., Brukstein, A.M., Beyar, R.P.: Identifying and tracking a guide wire in the coronary arteries during angioplasty from x-ray images. IEEE Transactions on Biomedical Engineering 44(2), 152–164 (1997)
  • [26] Peng, Y., Pan, L., Luan, P., Tu, H., Li, X.: Curvilinear object segmentation in medical images based on odos filter and deep learning network. arXiv preprint arXiv:2301.07475 (2023)
  • [27] Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. pp. 234–241. Springer (2015)
  • [28] Rougé, P., Passat, N., Merveille, O.: Cascaded multitask u-net using topological loss for vessel segmentation and centerline extraction. arXiv preprint arXiv:2307.11603 (2023)
  • [29] Roychowdhury, S., Koozekanani, D.D., Parhi, K.K.: Iterative vessel segmentation of fundus images. IEEE Transactions on Biomedical Engineering 62(7), 1738–1749 (2015)
  • [30] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: Imagenet large scale visual recognition challenge. International journal of computer vision 115, 211–252 (2015)
  • [31] Shit, S., Paetzold, J.C., Sekuboyina, A., Ezhov, I., Unger, A., Zhylka, A., Pluim, J.P., Bauer, U., Menze, B.H.: cldice-a novel topology-preserving loss function for tubular structure segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 16560–16569 (2021)
  • [32] Staal, J., Abràmoff, M.D., Niemeijer, M., Viergever, M.A., Van Ginneken, B.: Ridge-based vessel segmentation in color images of the retina. IEEE transactions on medical imaging 23(4), 501–509 (2004)
  • [33] Steger, C.: Extracting curvilinear structures: A differential geometric approach. In: Computer Vision—ECCV’96: 4th European Conference on Computer Vision Cambridge, UK, April 15–18, 1996 Proceedings, Volume I 4. pp. 630–641. Springer (1996)
  • [34] Subirats, P., Dumoulin, J., Legeay, V., Barba, D.: Automation of pavement surface crack detection using the continuous wavelet transform. In: 2006 International Conference on Image Processing. pp. 3037–3040. IEEE (2006)
  • [35] Sun, K., Zhao, Y., Jiang, B., Cheng, T., Xiao, B., Liu, D., Mu, Y., Wang, X., Liu, W., Wang, J.: High-resolution representations for labeling pixels and regions. arXiv preprint arXiv:1904.04514 (2019)
  • [36] Tomaszkiewicz, K., Owerko, T.: A pre-failure narrow concrete cracks dataset for engineering structures damage classification and segmentation. Scientific Data 10(1), 925 (2023)
  • [37] Viti, M., Talbot, H., Abdallah, B., Perot, E., Gogin, N.: Coronary artery centerline tracking with the morphological skeleton loss. In: 2022 IEEE International Conference on Image Processing (ICIP). pp. 2741–2745. IEEE (2022)
  • [38] Van der Walt, S., Schönberger, J.L., Nunez-Iglesias, J., Boulogne, F., Warner, J.D., Yager, N., Gouillart, E., Yu, T.: scikit-image: image processing in python. PeerJ 2, e453 (2014)
  • [39] Wang, F., Gu, Y., Liu, W., Yu, Y., He, S., Pan, J.: Context-aware spatio-recurrent curvilinear structure segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12648–12657 (2019)
  • [40] Yang, K., Musio, F., Ma, Y., Juchler, N., Paetzold, J.C., Al-Maskari, R., Höher, L., Li, H.B., Hamamci, I.E., Sekuboyina, A., et al.: Benchmarking the cow with the topcow challenge: Topology-aware anatomical segmentation of the circle of willis for cta and mra. arXiv preprint arXiv:2312.17670 (2023)
  • [41] Zana, F., Klein, J.C.: Segmentation of vessel-like patterns using mathematical morphology and curvature evaluation. IEEE transactions on image processing 10(7), 1010–1019 (2001)
  • [42] Zhang, T.Y., Suen, C.Y.: A fast parallel algorithm for thinning digital patterns. Communications of the ACM 27(3), 236–239 (1984)
  • [43] Zijdenbos, A.P., Dawant, B.M., Margolin, R.A., Palmer, A.C.: Morphometric analysis of white matter lesions in mr images: method and validation. IEEE transactions on medical imaging 13(4), 716–724 (1994)

Appendix A Influence of the Loss Weight Parameter $w$

Figure 8: Evaluation of weight parameter $w$: The nnUNet baseline performance on Roads is depicted in red. Altering the weight $w$ influences the impact of the additional loss. The figure shows that our Skeleton Recall Loss (green) consistently surpasses clDice Loss (blue) irrespective of the weight parameter.
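The role of $w$ can be made explicit with a generic weighted sum. This is a sketch under the assumption that the additional loss (Skeleton Recall Loss or clDice Loss) is simply added to the base segmentation loss with a scalar weight; the symbols $\mathcal{L}_{\text{base}}$ and $\mathcal{L}_{\text{add}}$ are placeholders introduced here for illustration.

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{base}} + w \cdot \mathcal{L}_{\text{add}}, \qquad w \ge 0
```

Under this reading, $w = 0$ recovers the nnUNet baseline, and larger values of $w$ give the additional term more influence on training.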

Appendix B Model Configurations

Table 4: Configuration of nnUNet and HRNet on the five datasets: nnUNet employs patch-based training and inference, while HRNet uses the whole image. HRNet is designed specifically for 2D data, while nnUNet supports both 2D and 3D images.

| Dataset | Network | Batch Size | Patch Size | Optimizer | LR Schedule |
|---|---|---|---|---|---|
| Roads | nnUNet | 12 | 512 × 512 | SGD, $\mu = 0.99$ | PolyLR(1e-2) |
| | HRNet | 2 | 1500 × 1500 | SGD, $\mu = 0.9$ | PolyLR(1e-2) |
| DRIVE | nnUNet | 2 | 512 × 512 | SGD, $\mu = 0.99$ | PolyLR(1e-2) |
| | HRNet | 2 | 565 × 584 | SGD, $\mu = 0.9$ | PolyLR(1e-2) |
| Cracks | nnUNet | 65 | 224 × 224 | SGD, $\mu = 0.99$ | PolyLR(1e-2) |
| | HRNet | 64 | 224 × 224 | SGD, $\mu = 0.9$ | PolyLR(1e-2) |
| ToothFairy | nnUNet | 2 | 80 × 160 × 192 | SGD, $\mu = 0.99$ | PolyLR(1e-2) |
| | HRNet | - | - | - | - |
| TopCoW | nnUNet | 2 | 80 × 192 × 160 | SGD, $\mu = 0.99$ | PolyLR(1e-2) |
| | HRNet | - | - | - | - |
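For readers unfamiliar with the PolyLR entry in Tab. 4, the following is a minimal sketch of a polynomial learning-rate decay starting from 1e-2. The exponent of 0.9 and the horizon of 1000 epochs are assumptions based on common nnUNet defaults and are not stated in the table; the function name `poly_lr` is hypothetical.

```python
def poly_lr(initial_lr: float, epoch: int, max_epochs: int, exponent: float = 0.9) -> float:
    """Polynomial decay from initial_lr at epoch 0 down to 0 at max_epochs."""
    return initial_lr * (1.0 - epoch / max_epochs) ** exponent


INITIAL_LR = 1e-2   # from Tab. 4
MAX_EPOCHS = 1000   # assumed, not given in the table

checkpoints = (0, 250, 500, 750, 1000)
schedule = [poly_lr(INITIAL_LR, e, MAX_EPOCHS) for e in checkpoints]
for epoch, lr in zip(checkpoints, schedule):
    print(f"epoch {epoch:4d}: lr = {lr:.5f}")
```

The learning rate starts at the tabulated value and decreases monotonically to zero at the final epoch; only the starting value is taken from the table.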