1 University of Calabria, Mathematics and Computer Science, Rende, Italy
2 University of Calabria, Department of Physics, Rende, Italy
3 INFN, Frascati, Italy
4 University of Calabria, DiBEST, Rende, Italy
5 DLVSystem, Srl, Rende, Italy
6 STAR Lab, Rende, Italy

μ-Net: A Deep Learning-Based Architecture for μ-CT Segmentation

Pierangela Bruno (1)*, Edoardo De Rose (1)*, Carlo Adornetto (1), Francesco Calimeri (1,5), Sandro Donato (2,3,6), Raffaele Giuseppe Agostino (2,6), Daniela Amelio (4), Riccardo Barberi (2,6), Maria Carmela Cerra (4), Maria Caterina Crocco (2,6), Mariacristina Filice (4), Raffaele Filosa (2,6), Gianluigi Greco (1), Sandra Imbrogno (4), Vincenzo Formoso (2,6)
* First two authors contributed equally to this paper.
Abstract

X-ray computed microtomography (μ-CT) is a non-destructive technique that can generate high-resolution 3D images of the internal anatomy of medical and biological samples. These images enable clinicians to examine internal anatomy and gain insights into disease or anatomical morphology. However, extracting relevant information from 3D images requires semantic segmentation of the regions of interest, which is usually done manually and is time-consuming and tedious. In this work, we propose a novel framework that uses a convolutional neural network (CNN) to automatically segment the full morphology of the heart of Carassius auratus. The framework employs an optimized 2D CNN architecture that can infer a 3D segmentation of the sample, avoiding the high computational cost of a 3D CNN architecture. We tackle the challenges of handling large, high-resolution image data (over a thousand pixels in each dimension) and a small training database (only three samples) by proposing a standard protocol for data normalization and processing. Moreover, we investigate how the reconstruction technique, which depends on the number of input images, affects the noise, contrast, and spatial resolution of the sample as well as the training of the architecture. Experiments show that our framework significantly reduces the time required to segment new samples, allowing a faster microtomography analysis of the Carassius auratus heart shape. Furthermore, our framework can work with any bio-image (biological and medical) from μ-CT with high resolution and small dataset size.

Keywords: Computer Vision · Deep Learning · Micro Tomography · Segmentation

1 Introduction

X-ray computed tomography (CT) is a powerful and widely used imaging tool that provides 3D digital gray-scale images of an object's internal structure; such images can be quantitatively analyzed to identify specific components of the 3D morphology. Modern CT is a valuable diagnostic tool that provides meaningful information while reducing X-ray doses. μ-CT is an even more powerful technique used in the study of human and animal anatomy in research and medicine [1], as it achieves higher resolutions. Computer-based approaches can ease and enhance the extraction of information and patterns from μ-CT, leveraging, for instance, accurate semantic segmentation of the anatomical parts. Moreover, in recent years, image segmentation algorithms have proved promising in facilitating the analysis and detection of abnormalities [2, 3]. These methods can be applied voxel-wise in a 3D context as well as pixel-wise, slice by slice [4]; however, instrumental noise, non-uniform intensity, and pixel discretization can limit the resolution of the image and obscure finer details. Traditional segmentation methods such as thresholding and morphological filters are sensitive to parameter changes, leading to potential loss of detail, and they struggle with variations in phase/absorption contrast intensity [4]. Generally, 3D segmentation methods lack flexibility and adaptability, and determining the best method for a specific application is challenging, especially in medical imaging, due to the heterogeneity of image characteristics and distributions [4].

Deep Learning (DL) approaches such as CNNs rapidly became the state of the art (SOTA) for medical image segmentation, classification, recognition, and report generation [5, 6], and have been widely applied in the field of CT. However, few attempts have been made on μ-CT, where the wealth of information presents a significant challenge in terms of analysis and interpretation. This is particularly evident in semantic segmentation tasks such as those for kidney [7], cartilage [8], temporal bone [9], lung [10], and mouse thorax μ-CT [1]. The same applies to cardiac imaging, which is crucial for patient-specific intervention planning [11]; here, the primary datasets are mainly magnetic resonance imaging (MRI), but tomography datasets, which have a higher resolution, have started to be acquired. Notable contributions include a DL segmentation structure for MRI cardiac datasets [12], a U-net variant for short-axis MRI [13], a DL approach for ECG-gated CT data [14], and a novel pipeline for whole-heart CT segmentation [15].

This work aims at defining a general framework for DL-based processing of high-resolution μ-CT images in the presence of small datasets, a prevalent scenario in the medical domain. The underlying rationale is that we can improve the performance and reliability of image segmentation by considering each class individually. Our approach not only contributes to enhancing μ-CT segmentation performance, but also fosters the use of more lightweight models, in contrast to the current widespread use of foundation models. The main contributions of this work can be summarized as follows.
-- We build a new dataset consisting of μ-CT images of the heart of C. auratus, a teleost fish also known as goldfish.
-- We design a novel DL-based framework for extracting, enhancing, and analyzing information from μ-CT. It extends SOTA semantic segmentation by defining multiple models and ensembling strategies. We present an implementation of the framework and assess it through an extensive experimental campaign over the newly introduced dataset. Results show that our proposal outperforms the SOTA methods, exhibiting improved performance and reduced misclassification errors.
-- We show how applying 2D CNNs followed by custom post-processing to achieve 3D continuity reduces computational costs compared with 3D CNNs.
-- We design an approach for feasible and robust segmentation, explicitly suited for use cases in which a limited number of labeled samples is available.
-- We study how the quality of 3D tomographic images affects the performance of the architecture, given that obtaining such images requires collecting multiple projection images over a wide range of projection angles.

To the best of our knowledge, this is the first approach that proposes a combination of multiple DL-based models and a comprehensive ablation study to assess the benefits of different architectures and parameter components in the context of μ-CT image segmentation, even with limited prior knowledge.

2 Proposed approach

We present μ-Net, a novel DL-based framework for the analysis and semantic segmentation of μ-CT images. As already introduced, although the high resolution of μ-CTs offers many advantages, images can either be too rich in detail or show little variability between different tissues; this can negatively affect the generalization capability of CNNs, resulting in misclassifications, and is further exacerbated when only a few images are available. The proposed framework addresses these challenges by automatically extracting meaningful information from μ-CT images. μ-Net tackles different tasks with different specialized models: each model is trained to solve a small part of the whole task by segmenting a different area of the heart, and an ad-hoc ensembling procedure combines the results. One of the key advantages of our approach is versatility: the framework can be applied to any μ-CT images in the medical domain, regardless of the organs or tissues involved, making it a valuable tool for researchers across various disciplines. It is flexible and adaptable, as it can be configured with different architectures and customized according to the dataset; moreover, it is specifically tailored to preprocess and postprocess images of this kind with suitable filters, taking into account the 3D nature of the image.

Figure 1: Initially, μ-Net employs a CNN to identify the ventricles. Following this, two separate DL architectures carry out binary segmentation of various areas. The final result is obtained by applying an ensemble strategy to the different segmentations produced by each model.

Fig. 1 shows the architecture of μ-Net, whose aim is to automatically segment heart morphology in μ-CT images of goldfish. We propose the following five-step procedure for performing semantic segmentation of the goldfish μ-CT images:
1. Data acquisition and preparation: biological samples are collected and properly stained; subsequently, μ-CT projections are acquired and the volume is reconstructed and normalized (see Section 3).
2. Data preprocessing: filters are applied to images according to the different tasks.
3. Segmentation model: we define and train three different models by dividing the segmentation problem into sub-problems (see Section 4.4).
4. Data post-processing: different filters and volumetric techniques are used to obtain 3D coherence starting from 2D CNN models.
5. Ensembling models: we define a set of ensembling rules to merge the results and obtain the final, complete semantic segmentation (see Section 4.4).

We experimentally validated the proposed methodology on this image segmentation task and compared the results with SOTA methods (see Section 4.4).

3 Dataset building and description

In our experiments, we use μ-CT scans of the heart of C. auratus (Linnaeus, 1758), a teleost fish also known as goldfish. The scans were manually annotated under the supervision of 4 expert biologists for supervised learning.

The goldfish heart comprises four main components: sinus venosus, atrium, ventricle, and bulbus arteriosus [16]. Notably, the atrium features a spacious cavity with a muscular rim and a network of thin elastin and collagen fibers. Meanwhile, the ventricle consists of two distinct layers: the outer compacta, rich in blood vessels and muscle bundles oriented in various directions, and the inner spongiosa, which lacks blood vessels but contains numerous fibers. Sample preparation and dataset acquisition consist of several steps, and the entire procedure requires several hours. Given the anonymous nature of the submission, additional details on data sources will be disclosed in case of publication. The X-ray acquisition proved challenging, as the biological nature of the samples posed several technical issues. A staining procedure was applied to enhance the contrast of the samples, and the absorption contrast technique was used to reconstruct 3D images. Each sample rotates with an angular step $\Delta\theta$ and is penetrated by an X-ray beam at each step; the intensity of the attenuated X-ray beam is measured, creating a sinogram. The 3D structure is thus converted into a stack of 2D sinograms that are fed to a reconstruction algorithm. We used the Filtered Back-Projection (FBP) algorithm for its speed and simplicity. The resulting 2D image stack from FBP defines the 3D twin of the μ-CT sample. μ-CTs are manually segmented to define the ground truth labels, which hold clinical significance and include atrium, ventricle, Bulbus arteriosus, compacta, and lacunary spaces.
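To make the link between the measured beam intensity and a sinogram entry concrete, the following minimal sketch applies the Beer-Lambert law, the standard model for absorption contrast. It is not taken from the acquisition software used in this work; the function names, the attenuation coefficients, and the step length are hypothetical values chosen only for illustration.

```python
import math

def transmitted_intensity(i0, mu_values, step):
    """Beer-Lambert law along one ray: I = I0 * exp(-sum(mu_i * step))."""
    line_integral = sum(mu * step for mu in mu_values)
    return i0 * math.exp(-line_integral)

def projection_value(i0, intensity):
    """Sinogram entry p = -ln(I / I0), i.e. the line integral of mu."""
    return -math.log(intensity / i0)

# Hypothetical ray crossing three voxels (mu in 1/mm, step in mm).
mu_along_ray = [0.0, 0.8, 0.2]
step_mm = 0.5

i_meas = transmitted_intensity(1000.0, mu_along_ray, step_mm)
p = projection_value(1000.0, i_meas)
print(f"measured intensity: {i_meas:.2f}, sinogram value: {p:.3f}")
```

Repeating this for every detector pixel and every rotation angle yields the sinograms that a reconstruction algorithm such as FBP inverts to recover the attenuation map.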

The analysis of such a dataset is of considerable interest for studying cardiac pathologies. When subjected to oxygen deprivation [17, 18, 19], the goldfish accelerates its cardiac functions and, despite its small size, exhibits electrical activities similar to those of large mammals, which makes it relevant for translational research [20]. Moreover, the goldfish heart is influenced by many hormones and peptides acting as cardiac modulators in mammals under normal and stressful conditions [21]. These features make the goldfish an attractive model for exploring the mechanisms that give high flexibility to the heart, especially when facing internal and external challenges.

4 Experimental Design

4.1 Data acquisition, 3D reconstruction and data preparation

Given the difficulties of data acquisition and preparation described in Sec. 3, we acquired only 3 samples. For each sample, a total of $N_p = 3600$ projections with an angular resolution of 0.1 degrees were acquired. Samples were reconstructed as described in Sec. 3, using a different projection dose for each reconstruction. For each of the three samples, we used $N_p$, $\frac{N_p}{2}$, and $\frac{N_p}{3}$ projections to reconstruct three datasets, $D_1$, $D_2$, and $D_3$, respectively. This allowed us to test how well our architecture can handle tasks with varying levels of projection data. We also analyzed the impact of input dimensions on the model's performance by generating new, cropped stacks from the original data, focusing on the region of interest. This enabled us to evaluate the adaptability of our model to tasks with varying input sizes. Each reconstructed sample consists of approximately 1500 slices of 1300 × 1300 pixels, where each voxel corresponds to 5.55 μm.

We implemented a normalization process for the dataset to ensure uniformity without introducing any additional bias. In tomography, a specific range of absorption values is selected during the reconstruction phase, so that each sample appears self-normalized. For our 16-bit images, any pixel in the reconstructed image that falls below this range is assigned a value of 0, while any pixel above it is assigned the maximum 16-bit value ($2^{16}-1$). Various strategies exist in the literature for this process, but we chose to reconstruct all our samples using the same range of absorption values, which serves as the normalization factor in our methodology.

Furthermore, in our experiments we also considered the 2D trans-axial projections of the μ-CTs: the Axial View (the XY plane, perpendicular to the rotation axis Z), the Sagittal View (XZ plane), and the Coronal View (YZ plane). We then split the entire dataset of 9 stacks of images (3 datasets × 3 stacks), composed of $D_1$, $D_2$, and $D_3$, into $D_{1,train}$, $D_{1,test}$, $D_{2,train}$, $D_{2,test}$, $D_{3,train}$, and $D_{3,test}$. In this setting, each train subset consists of 2 stacks (2 heart samples), while the remaining heart sample is held out for testing; it is the same across all the datasets, except for the number of projections ($N_p$) used to reconstruct it. To reduce redundancy, we select only one image out of every three from the input stacks for training, since adjacent images are nearly identical. For each training iteration, we randomly select one tile from each 2D slice to reduce the input size. The training set is split into 70% for training and 30% for validation to monitor progress and prevent overfitting.
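The data preparation steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the linear mapping inside the absorption range, the random shuffling before the 70/30 split, the fixed seed, and all function names are hypothetical and not taken from the original pipeline; the slice count, slice size, and tile size follow the values reported in the text.

```python
import random

U16_MAX = 2**16 - 1  # largest value of an unsigned 16-bit pixel

def window_to_uint16(value, lo, hi):
    """Map an absorption value to 16 bits using a fixed [lo, hi] range.

    Values below the range become 0 and values above it become U16_MAX;
    the linear mapping in between is an assumption of this sketch.
    """
    if value <= lo:
        return 0
    if value >= hi:
        return U16_MAX
    return round((value - lo) / (hi - lo) * U16_MAX)

def every_third(slice_indices):
    """Keep one slice out of every three to reduce redundancy."""
    return slice_indices[::3]

def random_tile_origin(height, width, tile, rng):
    """Top-left corner of a random tile lying fully inside the slice."""
    top = rng.randrange(height - tile + 1)
    left = rng.randrange(width - tile + 1)
    return top, left

def train_val_split(items, train_fraction, rng):
    """Shuffle the items and split them into training and validation parts."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
kept = every_third(list(range(1500)))            # about 1500 slices per stack
top, left = random_tile_origin(1300, 1300, 400, rng)
train_ids, val_ids = train_val_split(kept, 0.7, rng)
print(len(kept), len(train_ids), len(val_ids), (top, left))
```

Using the same `lo` and `hi` for every sample is what makes the reconstruction range act as a shared normalization factor.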

4.2 Training phase and evaluation metrics

The framework was developed using PyTorch (v1.13.0). A high-performance computing node with two Tesla V100-PCIE-16GB GPUs, an Intel® Xeon® Gold 5118 CPU (2.30GHz), and 512GB of RAM was used for training. The Jaccard index, also known as the Intersection over Union (IoU) coefficient, is used as the evaluation metric during training (1 means a perfect prediction, 0 the worst prediction) [22, 23]. To assess the overall performance on the test set, after reconstructing the entire volume, we employed an IoU weighted by the frequency of each class in the 3D image.
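As an illustration of the metric, the sketch below computes the per-class IoU and a frequency-weighted IoU on flattened label lists. It assumes that the class weights are the class frequencies in the ground-truth volume and that every class present in the ground truth is included; the text does not specify these details, and the function names and toy labels are hypothetical.

```python
from collections import Counter

def class_iou(y_true, y_pred, cls):
    """Jaccard index for one class: |intersection| / |union|."""
    inter = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    union = sum(1 for t, p in zip(y_true, y_pred) if t == cls or p == cls)
    return inter / union if union else 1.0

def frequency_weighted_iou(y_true, y_pred):
    """Per-class IoU averaged with weights equal to each class's share of voxels."""
    counts = Counter(y_true)
    total = len(y_true)
    return sum(n / total * class_iou(y_true, y_pred, c) for c, n in counts.items())

# Toy flattened volume with three classes (0, 1, 2).
truth = [0, 0, 0, 0, 1, 1, 1, 2]
pred = [0, 0, 0, 1, 1, 1, 2, 2]

print(class_iou(truth, pred, 0))            # 3 shared voxels out of 4 in the union
print(frequency_weighted_iou(truth, pred))  # 4/8*0.75 + 3/8*0.5 + 1/8*0.5
```

Weighting by frequency means that large structures dominate the overall score, which is why per-class values are also worth reporting.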

4.3 Ablation study

In the ablation study, we explored:

  • (A.1) Hyperparameter space: learning rate, tile size, model architecture, preprocessing, and postprocessing. For each aspect, we compare several options and report the results on the validation set.

  • (A.2) DL-based models: Segnet [24], DeepLabV3 [25] and U-net [26].

  • (A.3) Input parameters: normalization type, tile size, number of slices for each stack (i.e., the number of images fed into the model at once), preprocessing and postprocessing methods. To reduce the computational cost of processing large images and to avoid losing small-scale details relevant to our samples, we applied random cropping of sub-images, called tiles. We also experimented with different filters on input images (i.e., histogram equalization, median, and unsharp mask filters).

  • (A.4) Number of projections: variation of the projection dose and, consequently, of the generated 3D images (see Sec. 3); the number of projections affects the quality of the image in terms of noise, spatial resolution, and artifacts. We found it useful to examine how performance changed with spatial resolution.
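For item (A.4), the following sketch shows one way the reduced projection sets could be derived from a full acquisition. It assumes uniform subsampling (keeping every second or third projection) over a full 360-degree rotation, which is consistent with 3600 projections at 0.1 degrees but is not stated explicitly in the text; the function name is hypothetical.

```python
def projection_subset(n_total, factor, full_rotation_deg=360.0):
    """Indices kept and resulting angular step when keeping every `factor`-th projection."""
    indices = list(range(0, n_total, factor))
    step_deg = full_rotation_deg / len(indices)
    return indices, step_deg

N_P = 3600  # projections acquired per sample
subsets = {
    name: projection_subset(N_P, factor)
    for name, factor in (("D1", 1), ("D2", 2), ("D3", 3))
}

for name, (indices, step_deg) in subsets.items():
    print(f"{name}: {len(indices)} projections, angular step {step_deg:.1f} degrees")
```

Under these assumptions the three datasets use 3600, 1800, and 1200 projections, i.e. angular steps of 0.1, 0.2, and 0.3 degrees.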

4.4 Experiments

As mentioned, we split the semantic segmentation task into separate sub-problems, each focused on specific classes. We therefore conducted several experiments:
(1) Semantic Segmentation of Atrium, Ventricle, and Bulbus arteriosus. The chosen model is Segnet [24] (see Sec. 4.3). Training ran for 150 epochs with a learning rate of 0.0001, the Adam optimizer, and a tile size of 400 × 400 pixels. We trained our model on the XY view of the sample and ran inference on all three views (XY, XZ, and YZ). To obtain the final prediction and the 3D continuity, we first took the pixel-wise mode of the predictions where the views intersect and then applied a hole-filling algorithm.
(2) Binary Segmentation of lacunary spaces. We used the Segnet model with the same parameters as in the previous step, except for a tile size of 224 × 224 pixels. The model received as input only the ventricle part of the image obtained from the previous step. To enhance the contrast between lacunary spaces and tissue, we applied an unsharp mask filter to the input image.
(3) Binary Segmentation of compacta. Similarly, we used a Segnet model with the same parameters and tile size. Also in this case, we used as input only the ventricle part of the image.
(4) Overall Semantic Segmentation via Ensembling. Once the results of the three experiments had been obtained, an ensembling strategy was applied to produce a single segmentation result. The strategy consists of a set of rules: (I) the Atrium class is always chosen over all other classes; (II) the Bulbus arteriosus class is preferred over the lacunary spaces and compacta classes; (III) the compacta class is selected in preference to the Ventricle and lacunary spaces classes; (IV) the lacunary spaces class is favored over the Ventricle class.
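The view fusion of experiment (1) and the ensembling rules (I)-(IV) can be sketched as follows on flattened label lists. Several details are assumptions of this sketch rather than facts from the text: the integer label encoding, the background class having the lowest priority, the fallback to the XY prediction when all three views disagree, and the function names. The hole-filling step is omitted.

```python
from collections import Counter

BACKGROUND, VENTRICLE, LACUNARY, COMPACTA, BULBUS, ATRIUM = range(6)

# Rules (I)-(IV) expressed as one ranking; the higher value wins.
# Placing BACKGROUND at the bottom is an assumption of this sketch.
PRIORITY = {BACKGROUND: 0, VENTRICLE: 1, LACUNARY: 2, COMPACTA: 3, BULBUS: 4, ATRIUM: 5}

def vote_views(xy, xz, yz):
    """Per-voxel mode over the three view predictions.

    If all three views disagree there is no mode; this sketch then keeps
    the XY prediction, the view the model was trained on (an assumption).
    """
    fused = []
    for votes in zip(xy, xz, yz):
        label, count = Counter(votes).most_common(1)[0]
        fused.append(label if count >= 2 else votes[0])
    return fused

def ensemble(anatomy, lacunary_mask, compacta_mask):
    """Merge the multi-class map with the two binary masks using PRIORITY."""
    merged = []
    for base, is_lac, is_comp in zip(anatomy, lacunary_mask, compacta_mask):
        candidates = [base]
        if is_lac:
            candidates.append(LACUNARY)
        if is_comp:
            candidates.append(COMPACTA)
        merged.append(max(candidates, key=PRIORITY.get))
    return merged

# Toy example with five voxels.
xy = [ATRIUM, VENTRICLE, VENTRICLE, BULBUS, BACKGROUND]
xz = [ATRIUM, VENTRICLE, BACKGROUND, BULBUS, VENTRICLE]
yz = [VENTRICLE, VENTRICLE, VENTRICLE, BACKGROUND, ATRIUM]
fused = vote_views(xy, xz, yz)
final = ensemble(fused, [1, 1, 1, 1, 0], [0, 1, 0, 0, 0])
print(fused)
print(final)
```

Because rules (II) and (III) chain together, the four rules induce a single ordering (Atrium, Bulbus arteriosus, compacta, lacunary spaces, Ventricle), which is why a priority table is enough to resolve every conflict.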

Comparison approaches. To evaluate our approach, we compared it with two SOTA anatomical image segmentation tools: Biomedisa [27] and nnU-net [28]. Biomedisa is a platform designed for semi-automatic and automatic segmentation of large volumetric images, using smart interpolation of sparsely pre-segmented slices. Note that we used the automatic image segmentation version of Biomedisa, which takes 3D images as input. nnU-net, on the other hand, is a DL-based method that self-configures for any new segmentation task, covering preprocessing, network architecture, training, and post-processing.

5 Results and Discussion

Among the tested architectures (see Sec. 4.3 (A.2)), we discarded DeepLabV3 due to its poor accuracy. Although U-net achieved good performance, Segnet obtained the best results on the test set; we therefore selected Segnet as the model for each experiment. Also, as described in Sec. 4.3 (A.3), we tested two data normalization techniques: (a) normalizing the data based on the mean and standard deviation of each stack, and (b) normalizing during the reconstruction process. Technique (b) yielded better results in the ablation study, so we adopted it. As described in Sec. 4.3 (A.4), we trained different models, varying the number of projections used to reconstruct the stacks.
Results of the first experiment (see Sec. 4.4) show that as the number of projections decreases, the spatial resolution deteriorates [29]. The models were then evaluated on various test sets with different numbers of projections. Results are reported in Tab. 1. The models were tested on the test sets of each of the 3 datasets, performing 3-fold cross-validation over the 3 stacks in each dataset.

Table 1: Performance achieved by μ-Net on the test set according to different training sets with a different number of projection doses.

Train set                      IoU (%) on test set
                               $D_{1,test}$    $D_{2,test}$    $D_{3,test}$
$D_{1,train}$                  87.9 (±3.7)     77.6 (±4.5)     44.6 (±7.8)
$D_{2,train}$                  44.5 (±7.8)     73.7 (±3.4)     81.1 (±3.5)
$D_{3,train}$                  30.3 (±8.5)     81.7 (±3.6)     86.8 (±3.6)
$D_{1,train} + D_{2,train}$    87.9 (±3.7)     86.4 (±3.4)     88.6 (±3.7)

Table 2: Comparing IoU scores (%) on the test set for our proposal, nnU-net, and Biomedisa. The best result in each column is marked with an asterisk (*).

Method       Ventricle       Bulbus arteriosus   Atrium          Compacta        Lacunary spaces   Total
μ-Net        94.5 (±3.4)*    77.5 (±3.7)         87.9 (±2.4)*    77.2 (±2.4)*    84.8 (±3.6)*      87.6 (±3.7)*
nnU-net      80.2 (±3.4)     80.1 (±3.4)*        61.1 (±5.2)     64.8 (±4.2)     72.9 (±4.4)       76.8 (±5.2)
Biomedisa    63.5 (±5.4)     16.3 (±8.4)         20.5 (±7.4)     50.2 (±5.6)     56.9 (±5.4)       43.8 (±8.4)

Tab. 1 shows that models trained on images with high spatial resolution perform well on a high-resolution test set but worse on a low-resolution test set (see the first row of Tab. 1). Training on medium- and low-resolution images yields better performance on test sets of similar resolution than on the high-resolution one (see the second and third rows of Tab. 1). This indicates that as the number of projections (and hence the spatial resolution) decreases, the performance of the network deteriorates due to a lack of information. However, training a model on images with different spatial resolutions results in a more stable performance, meaning that the network generalizes better across resolutions.
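The stability claim can be checked directly from the mean IoU values reported in Tab. 1. The short sketch below computes, for each training set, the mean over the three test sets and the gap between the best and worst test set; the summary statistics and the function name are our own illustrative choice, not part of the original evaluation protocol.

```python
# Mean IoU (%) from Tab. 1, ordered as D1,test / D2,test / D3,test.
table1 = {
    "D1,train": [87.9, 77.6, 44.6],
    "D2,train": [44.5, 73.7, 81.1],
    "D3,train": [30.3, 81.7, 86.8],
    "D1,train + D2,train": [87.9, 86.4, 88.6],
}

def summarize(scores):
    """Mean IoU across test sets and gap between best and worst test set."""
    mean = sum(scores) / len(scores)
    spread = max(scores) - min(scores)
    return round(mean, 1), round(spread, 1)

summary = {name: summarize(scores) for name, scores in table1.items()}
for name, (mean, spread) in summary.items():
    print(f"{name}: mean IoU {mean}, max-min spread {spread}")
```

The models trained on a single dataset show gaps of 43.3, 36.6, and 56.5 points between their best and worst test set, whereas the model trained on mixed resolutions stays within 2.2 points.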
Tab. 2 reports the IoU for each experiment performed (see Sec. 4.4). The results largely hinge on the outcome of the initial experiment: the better the ventricle is detected in the first experiment, the better the compacta and lacunary spaces are detected within that region. Our workflow achieved an overall IoU of 87.6%, where a very good identification of the ventricle (94.5%) allows a good detection of compacta and lacunary spaces (77.2% and 84.8%, respectively). We compared our proposal with two SOTA semantic segmentation models, nnU-net and Biomedisa. We trained both models to segment atrium, ventricle, and Bulbus arteriosus regions only; our workflow achieved a higher IoU than both, which reached 76.8% (nnU-net) and 43.8% (Biomedisa). The lower performance of nnU-net may be attributed to its automatic selection of patch size, filters, and normalization. We used the 2D configuration of nnU-net, as the 3D one was not feasible due to the limited number of samples (fewer than 3) and the high computational demand. Biomedisa, in turn, performed poorly due to its use of a 3D U-net that standardizes each sample size, leading to a reduction in resolution and distortion of shapes.

Figure 2: Visualization of the ground truth, the μ-Net results, and the results of the comparison methods. Each image represents a slice taken from the same quartile of slices within a single 3D stack.

A visual inspection of the results is shown in Fig. 2, where we compare, from left to right, the manual segmentation (labels), the segmentation predicted by μ-Net, and the results of the comparison methods. We can observe that μ-Net segments small and medium lacunary spaces very well, while it tends to confuse larger ones with the background. As for the other anatomical regions, μ-Net notably outperforms the comparison methods. In accordance with our hypothesis, these results highlight how the strategy underlying our framework, which decomposes the problem into simpler sub-problems before applying ensemble techniques, is more effective than tackling the problem as a whole, as nnU-Net and Biomedisa do. Finally, our workflow can speed up the processing of new scans and is adaptable to additional segmentation tasks. The models can be trained further on more goldfish hearts or used for transfer learning on other high-resolution μ-CT tasks. The data from our automatic segmentation can be quantitatively analyzed in the same way as manually segmented data.

6 Conclusions

We introduced μ-Net, a novel workflow built on Segnet for the semantic segmentation of biological μ-CT images. Training was performed on a new dataset consisting of manually segmented 3D μ-CT scans of goldfish hearts. The experiments showed that μ-Net enhances the efficiency and dependability of image segmentation by treating each class separately. μ-Net significantly outperformed existing methods, setting the stage for a more precise and automated examination of goldfish heart morphology and potential diseases, thus facilitating translational studies.

References

  • [1] Malimban, J., Lathouwers, D., Qian, H., Verhaegen, F., Wiedemann, J., Brandenburg, S., Staring, M.: Deep learning-based segmentation of the thorax in mouse micro-ct scans. Scientific Reports 12(1), 1822 (2022)
  • [2] Roß, T., Bruno, P., Reinke, A., Wiesenfarth, M., Koeppel, L., Full, P.M., Pekdemir, B., Godau, P., Trofimova, D., Isensee, F., et al.: Beyond rankings: Learning (more) from algorithm validation. Medical image analysis 86, 102765 (2023)
  • [3] Bruno, P., Spadea, M.F., Scaramuzzino, S., De Rosa, S., Indolfi, C., Gargiulo, G., Giugliano, G., Esposito, G., Calimeri, F., Zaffino, P.: Assessing vascular complexity of paod patients by deep learning-based segmentation and fractal dimension. Neural Computing and Applications 34(24), 22015–22022 (2022)
  • [4] Fu, Y., Lei, Y., Wang, T., Curran, W.J., Liu, T., Yang, X.: A review of deep learning based methods for medical image multi-organ segmentation. Physica Medica 85, 107–122 (2021)
  • [5] Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., Van Der Laak, J.A., Van Ginneken, B., Sánchez, C.I.: A survey on deep learning in medical image analysis. Medical image analysis 42, 60–88 (2017)
  • [6] Adornetto, C., Guzzo, A., Vasile, A.: Automatic medical report generation via latent space conditioning and transformers. In: 2023 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). pp. 0428–0435 (2023). https://doi.org/10.1109/DASC/PiCom/CBDCom/Cy59711.2023.10361320
  • [7] da Cruz, L.B., Araújo, J.D.L., Ferreira, J.L., Diniz, J.O.B., Silva, A.C., de Almeida, J.D.S., de Paiva, A.C., Gattass, M.: Kidney segmentation from computed tomography images using deep neural network. Computers in Biology and Medicine 123, 103906 (2020). https://doi.org/10.1016/j.compbiomed.2020.103906
  • [8] Matula, J., Polakova, V., Salplachta, J., Tesarova, M., Zikmund, T., Kaucka, M., Adameyko, I., Kaiser, J.: Resolving complex cartilage structures in developmental biology via deep learning-based automatic segmentation of x-ray computed microtomography images. Scientific Reports 12(1), 8728 (2022)
  • [9] Nikan, S., Van Osch, K., Bartling, M., Allen, D.G., Rohani, S.A., Connors, B., Agrawal, S.K., Ladak, H.M.: PWD-3DNet: a deep learning-based fully-automated segmentation of multiple structures on temporal bone CT scans. IEEE Transactions on Image Processing 30, 739–753 (2020)
  • [10] Sforazzini, F., Salome, P., Moustafa, M., Zhou, C., Schwager, C., Rein, K., Bougatf, N., Kudak, A., Woodruff, H., Dubois, L., et al.: Deep learning–based automatic lung segmentation on multiresolution CT scans from healthy and fibrotic lungs in mice. Radiology: Artificial Intelligence 4(2), e210095 (2022)
  • [11] Clark, D., Badea, C.: Advances in micro-CT imaging of small animals. Physica Medica 88, 175–192 (2021)
  • [12] Liu, T., Tian, Y., Zhao, S., Huang, X., Wang, Q.: Residual convolutional neural network for cardiac image segmentation and heart disease diagnosis. IEEE Access 8, 82153–82161 (2020)
  • [13] Zheng, Q., Delingette, H., Duchateau, N., Ayache, N.: 3-D consistent and robust segmentation of cardiac images by deep learning with spatial propagation. IEEE Transactions on Medical Imaging 37(9), 2137–2148 (2018)
  • [14] Sharobeem, S., Le Breton, H., Lalys, F., Lederlin, M., Lagorce, C., Bedossa, M., Boulmier, D., Leurent, G., Haigron, P., Auffret, V.: Validation of a whole heart segmentation from computed tomography imaging using a deep-learning approach. Journal of Cardiovascular Translational Research 15(2), 427–437 (2022)
  • [15] Xu, Z., Wu, Z., Feng, J.: CFUN: Combining Faster R-CNN and U-Net network for efficient whole heart segmentation. arXiv preprint arXiv:1812.04914 (2018)
  • [16] Filice, M., Cerra, M.C., Imbrogno, S.: The goldfish Carassius auratus: an emerging animal model for comparative cardiac research. Journal of Comparative Physiology B 192(1), 27–48 (2022)
  • [17] Filice, M., Gattuso, A., Imbrogno, S., Tota, B., Cerra, M.: Functional, structural, and molecular remodelling of the goldfish (Carassius auratus) heart under moderate hypoxia. Fish Physiology and Biochemistry (2024). https://doi.org/10.1007/s10695-024-01297-7
  • [18] Filice, M., Mazza, R., Leo, S., Gattuso, A., Cerra, M.C., Imbrogno, S.: The hypoxia tolerance of the goldfish (Carassius auratus) heart: The NOS/NO system and beyond. Antioxidants 9(6) (2020). https://doi.org/10.3390/antiox9060555
  • [19] Imbrogno, S., Capria, C., Tota, B., Jensen, F.: Nitric oxide improves the hemodynamic performance of the hypoxic goldfish (Carassius auratus) heart. Nitric Oxide 42, 24–31 (2014). https://doi.org/10.1016/j.niox.2014.08.012
  • [20] Bazmi, M., Escobar, A.L.: Excitation–contraction coupling in the goldfish (Carassius auratus) intact heart. Frontiers in Physiology 11, 1103 (2020)
  • [21] Imbrogno, S., Filice, M., Cerra, M.C.: Exploring cardiac plasticity in teleost: The role of humoral modulation. General and Comparative Endocrinology 283, 113236 (2019)
  • [22] Eelbode, T., Bertels, J., Berman, M., Vandermeulen, D., Maes, F., Bisschops, R., Blaschko, M.B.: Optimization for medical image segmentation: Theory and practice when evaluating with Dice score or Jaccard index. IEEE Transactions on Medical Imaging 39(11), 3679–3690 (2020). https://doi.org/10.1109/TMI.2020.3002417
  • [23] Maier-Hein, et al.: Metrics reloaded: recommendations for image analysis validation. Nature Methods 21, 195–212 (2024). https://doi.org/10.1038/s41592-023-02151-z
  • [24] Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(12), 2481–2495 (2017)
  • [25] Yurtkulu, S.C., Şahin, Y.H., Unal, G.: Semantic segmentation with extended DeepLabv3 architecture. In: 2019 27th Signal Processing and Communications Applications Conference (SIU). pp. 1–4. IEEE (2019)
  • [26] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. pp. 234–241. Springer (2015)
  • [27] Lösel, P.D., Van de Kamp, T., Jayme, A., Ershov, A., Faragó, T., Pichler, O., Tan Jerome, N., Aadepu, N., Bremer, S., Chilingaryan, S.A., Heethoff, M., Kopmann, A., Odar, J., Schmelzle, S., Zuber, M., Wittbrodt, J., Baumbach, T., Heuveline, V.: Introducing Biomedisa as an open-source online platform for biomedical image segmentation. Nature Communications 11(1), 5577 (2020). https://doi.org/10.1038/s41467-020-19303-w
  • [28] Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods 18(2), 203–211 (2021)
  • [29] Villarraga-Gómez, H., Smith, S.T.: Effect of the number of projections on dimensional measurements with X-ray computed tomography. Precision Engineering 66, 445–456 (2020)