H-SynEx: Using synthetic images and ultra-high resolution ex vivo MRI for hypothalamus subregion segmentation

Livia Rodrigues Martina Bocchetta Oula Puonti Douglas Greve Ana Carolina Londe Marcondes França Simone Appenzeller Juan Eugenio Iglesias Leticia Rittner
Abstract

The hypothalamus is a small structure located in the center of the brain and is involved in significant functions such as sleep, temperature, and appetite control. Various neurological disorders are also associated with hypothalamic abnormalities. Automated image analysis of this structure from brain MRI is thus highly desirable to study the hypothalamus in vivo. However, most automated segmentation tools currently available focus exclusively on T1w images. In this study, we introduce H-SynEx, a machine learning method for automated segmentation of hypothalamic subregions that generalizes across different MRI sequences and resolutions without retraining. H-SynEx was trained with synthetic images built from label maps derived from ultra-high resolution ex vivo MRI scans, which enable finer-grained manual segmentation than 1 mm isotropic in vivo images. We validated our method using the Dice coefficient (DSC) and the average Hausdorff distance (AVD) on in vivo images from six different datasets with six different MRI sequences (T1, T2, proton density, quantitative T1, fractional anisotropy, and FLAIR). Statistical analysis compared hypothalamic subregion volumes in controls, Alzheimer’s disease (AD), and behavioral variant frontotemporal dementia (bvFTD) subjects using the area under the receiver operating characteristic curve (AUROC) and the Wilcoxon rank-sum test. Our results show that H-SynEx successfully leverages information from ultra-high resolution scans to segment in vivo images from different MRI sequences. Our automated segmentation was able to discriminate controls from Alzheimer’s disease patients on FLAIR images with 5 mm slice spacing. H-SynEx is openly available at https://github.com/liviamarodrigues/hsynex.

keywords:
Hypothalamus segmentation, ex vivo MRI, domain randomization

Affiliations:
[1] Universidade Estadual de Campinas, School of Electrical and Computer Engineering
[2] Massachusetts General Hospital, Harvard Medical School
[3] Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, United Kingdom
[4] Centre for Cognitive and Clinical Neuroscience, Division of Psychology, Department of Life Sciences, College of Health, Medicine and Life Sciences, Brunel University London, United Kingdom
[5] Universidade Estadual de Campinas - School of Medical Sciences
[6] Centre for Medical Image Computing, University College London
[7] Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology

Highlights

The development of a fully automated segmentation method, trained on synthetic images derived from ex vivo MRI label maps, that identifies hypothalamic subregions across various MRI sequences and resolutions, including clinical acquisitions with large slice spacing.

Using ultra-high resolution ex vivo images to build the label maps yields a highly accurate model of hypothalamic anatomy.

H-SynEx outperforms other state-of-the-art methods in two patient-control comparisons conducted in this study and is currently the only method capable of segmenting hypothalamic subregions on MRI sequences other than T1w and T2w.

1 Introduction

The hypothalamus is a small, cone-shaped, gray-matter structure located in the central part of the brain. It is composed of subnuclei containing the cell bodies of multiple neuron subtypes. Despite its small dimensions, the hypothalamus plays a significant role in controlling sleep, body temperature, appetite, and emotions, among other functions [1, 2]. Several studies have linked the hypothalamus as a whole to neurodegenerative diseases such as Alzheimer’s disease [3], Huntington’s disease [4, 5], behavioral variant frontotemporal dementia (bvFTD) [6, 7], and amyotrophic lateral sclerosis (ALS) [8, 9], among others [10, 11, 12, 13]. Some studies suggest a differential involvement of the hypothalamic subregions across conditions [6], which suggests that studying these subregions individually is essential for a better understanding of these conditions.

MRI enables the study of the human brain in vivo, but many analyses (e.g., volumetry) require manual segmentations, which are challenging and time-consuming. For the hypothalamus, manual segmentation is particularly prone to high inter- and intra-rater variability due to its small size and low contrast with neighboring tissue [14, 15, 16]. Even with the help of semi-automated methods, segmenting a single scan can take up to 40 minutes [17], making large-scale studies impractical at most research sites. To better understand the role of the hypothalamus, several studies have used different MRI sequences [8, 10, 12, 3, 18]. However, these studies are limited to a few sites and require specialists with neuroanatomical knowledge to perform manual annotation.

Numerous supervised methods have been proposed for automated hypothalamus segmentation on T1w [19, 14, 16, 15, 20] and T2w images [16]. However, none of these methods can segment images at anisotropic resolution (often the case in clinical MRI) or in sequences other than those they were trained on (T1w/T2w). All of them require retraining, and hence more labeled data, to work across different sequences and resolutions. Semi-supervised models improve the generalization of networks on medical images without necessarily increasing the amount of annotated data [21, 22]. However, most of these models work on a single type of MRI sequence and usually need retraining to adapt to other sequences. Synthetic images, in turn, allow the construction of training datasets with perfectly aligned ground truth [23, 24] and the development of methods capable of generalizing across different MRI sequences [25, 24].

So far, all automated hypothalamus segmentation methods have been trained on manual segmentations of in vivo images with resolutions between 0.8 mm and 1 mm. Because the hypothalamus is a small structure, its delineation is strongly affected by partial volume effects, even in high-resolution images (e.g., 0.8 mm). Recently, ultra-high resolution ex vivo MRI has proven beneficial for segmenting small structures such as the hippocampus, amygdala, and thalamus [26, 27, 28], as it allows better visualization of their anatomical boundaries and therefore more accurate manual annotation.

In this article, we train a model using synthetic images derived from label maps built from ultra-high resolution ex vivo MRI. Synthetic images provide robustness against changes in MRI contrast, while building the label maps from ex vivo images provides a more accurate, higher-resolution delineation of the hypothalamus, which improves automated segmentation quality.

H-SynEx, our automated method for hypothalamic subregion segmentation, is robust to different MRI contrasts and resolutions. In our experiments, we evaluate it on T1w, T2w, PD, qT1, FA, and FLAIR sequences, as well as on data with 5 mm slice spacing.

2 Data

2.1 Training Data

The data used for training H-SynEx consist of synthetic images derived from 3D segmentations (label maps). These label maps are built from a dataset of 10 post mortem MRI acquisitions of brain hemispheres [29] from 5 male and 5 female specimens who died of natural causes with no clinical diagnoses or neuropathology. The voxel resolution ranges from 120 to 150 μm. The age at the time of death ranges from 54 to 79 years, with an average of 66.4 ± 8.46 years. The dataset is publicly available at the Distributed Archives for Neurophysiology Data Integration (DANDI Archive; https://dandiarchive.org/dandiset/000026/draft/files?location=) [30] (Figure 1).

Figure 1: ex vivo MR images: Examples of three images used during the method development

2.2 Test Data

The method evaluation relies on in vivo images from 6 different datasets (Table 1):

  • 1.

    FreeSurfer Maintenance (FSM) [20]: Composed of 29 subjects, of which 7 were used for validation and 22 for testing. For each subject, we have T1-weighted (T1w), T2-weighted (T2w), proton density (PD), fractional anisotropy (FA), and quantitative T1 (qT1) acquisitions (Figure 2). In all cases, the voxel resolution is 1 mm isotropic. FSM contains manual labels for the whole hypothalamus and its subregions (right and left anterior-superior, anterior-inferior, tuberal-superior, tuberal-inferior, and posterior). The manual segmentation was performed on in vivo images and therefore has limited accuracy. This dataset was approved by the Massachusetts General Hospital Internal Review Board for the protection of human subjects, and all subjects gave written informed consent.

  • 2.

    MiLI [15]: The MICLab-LNI Initiative comprises manual and automated segmentations of the entire hypothalamus performed on T1w images with slice thickness between 0.9 mm and 1.2 mm; it does not include segmentations of hypothalamic subregions. It includes subjects from various open datasets, such as MiLI, OASIS [31], and IXI [32]. We used only the manually segmented images: 55 from MiLI (30 controls and 25 ataxia patients), 23 from OASIS, and 19 from IXI. Since IXI also includes T2w and proton density (PD) acquisitions, we incorporated these modalities in our experiments.

  • 3.

    ADNI [33]: We used a total of 572 controls (280 male and 292 female, with average ages of 75.5 ± 6.4 and 73.6 ± 6.01 years, respectively) and 271 Alzheimer’s disease (AD) patients (143 male and 98 female, with average ages of 75.34 ± 7.6 and 73.8 ± 7.6 years, respectively), each with both T1w (1 mm isotropic) and FLAIR (0.85 mm × 0.85 mm × 5 mm) acquisitions. The ADNI dataset does not include manual segmentations of the hypothalamus.

  • 4.

    NIFD [34]: From the Neuroimaging in Frontotemporal Dementia dataset, we used 111 controls (49 male and 62 female, with average ages of 61.8 ± 7.4 and 63.4 ± 7.8 years, respectively) and 74 behavioral variant frontotemporal dementia (bvFTD) patients (51 male and 23 female, with average ages of 61.16 ± 5.8 and 62.4 ± 7.7 years, respectively). The voxel resolution is 1 mm isotropic. The NIFD dataset does not include manual segmentations of the hypothalamus.

Figure 2: Example of different modalities (FSM dataset)
Table 1: Datasets used for model validation and testing; WS: Whole Structure, SR: Subregion
| Split | Dataset | Sequence type | No. of acquisitions | No. of subjects | Voxel resolution | Manual segmentation content | Segmentation protocol |
|---|---|---|---|---|---|---|---|
| Validation | FSM | T1w, T2w, PD, FA, qT1 | 35 | 7 controls | 1 mm isotropic | WS/SR | WS: Author [20]; SR: Bocchetta et al. [6] |
| Testing | FSM | T1w, T2w, PD, FA, qT1 | 110 | 22 controls | 1 mm isotropic | WS/SR | WS: Author [20]; SR: Bocchetta et al. [6] |
| Testing | MiLI | T1w | 55 | 30 controls, 25 patients | slice thickness between 0.9 mm and 1.2 mm | WS | WS: Rodrigues et al. [15] |
| Testing | MiLI-OASIS | T1w | 23 | 23 controls | slice thickness between 0.9 mm and 1.2 mm | WS | WS: Rodrigues et al. [15] |
| Testing | MiLI-IXI | T1w, T2w, PD | 57 | 19 controls | slice thickness between 0.9 mm and 1.2 mm | WS | WS: Rodrigues et al. [15] |
| Testing | ADNI | T1w, FLAIR | 1686 | 572 controls, 271 AD patients | 1 mm isotropic (T1w); 0.85 × 0.85 × 5 mm (FLAIR) | No manual segmentation | n/a |
| Testing | NIFD | T1w | 185 | 111 controls, 74 bvFTD patients | 1 mm isotropic | No manual segmentation | n/a |

3 Methods

3.1 Preprocessing of ex vivo MRI

We train our neural networks with synthetic images generated from label maps. To create these label maps, we first applied the following operations to the ex vivo scans:

  • 1.

    Preprocessing: we reoriented the images to the positive RAS convention, flipped the right hemispheres, removed all non-brain voxels, and performed bias field correction. We also resampled the images to 0.3 mm voxels as a balance between high resolution and computational cost (Figure 3(a,b)).

  • 2.

    Creation of label maps: Starting from the preprocessed MRI data, we manually delineated the hypothalamus and its subregions. We also needed a whole-brain segmentation to provide anatomical context around the hypothalamus. Since the other brain structures are not segmentation targets, their labels need not be manual: they may be noisy and need not correspond exactly to anatomical structures. We therefore generated automated whole-brain segmentations using k-means, with k varying from 4 to 9 to introduce more variability into the dataset. Lastly, we merged the manual and automated hemisphere segmentations (Figure 3(c)) and mirrored the result to generate a complete whole-brain label map L [D × H × W] (Figure 3(d)). The mirroring uses an optimization technique that minimizes gaps and overlaps. More details on the label map creation can be found in Rodrigues et al. [35].

  • 3.

    Find MNI coordinates: Given that the hypothalamus is a small structure, spatial priors are helpful during training. To provide them, we attach MNI coordinates to each input voxel by registering the label maps to MNI space. From each label map L [D × H × W], we generate a Gaussian image G [D × H × W] that simulates a T1w MRI. We then register G to MNI space using NiftyReg [36] and obtain the MNI coordinates C [3 × D × H × W] of the registered image. C serves as an additional input channel during training to support the network.

  • 4.

    Crop: We crop L and C around the hypothalamus, resulting in two standardized arrays, L_crop [200 × 200 × 200] and C_crop [3 × 200 × 200 × 200], which correspond to a field of view of 60 × 60 × 60 mm.

  • 5.

    One-hot array: We convert L_crop into a one-hot array L_one [V × 200 × 200 × 200], where V is the number of labels present in L_crop. V varies with the number of k-means classes k used in the whole-brain segmentation.
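As an illustration only, the k-means labelling, cropping, and one-hot steps above can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the H-SynEx implementation: the helper names (`kmeans_1d`, `build_label_map`, `crop_around`, `one_hot`) are hypothetical, the k-means is a plain intensity-only clustering, manual hypothalamus labels are offset by k so that they never collide with k-means classes, and the registration to MNI space and the mirroring optimization are not shown.

```python
import numpy as np


def kmeans_1d(values, k, n_iter=50, rng=None):
    """Cluster 1D intensities into k classes; returns labels 0..k-1 ordered by centroid intensity."""
    rng = np.random.default_rng(rng)
    # Initialise centroids at k distinct intensities drawn at random
    centroids = rng.choice(np.unique(values), size=k, replace=False).astype(float)
    for _ in range(n_iter):
        # Assign every voxel to its nearest centroid
        assign = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        # Recompute centroids (keep the old one if a cluster becomes empty)
        new = np.array([values[assign == j].mean() if np.any(assign == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Relabel so that class 0 is the darkest and class k-1 the brightest
    remap = np.empty(k, dtype=int)
    remap[np.argsort(centroids)] = np.arange(k)
    return remap[assign]


def build_label_map(image, brain_mask, hypo_labels, k, rng=None):
    """Hypothetical merge: k-means classes 1..k inside the brain, manual labels offset by k on top."""
    labels = np.zeros(image.shape, dtype=np.int32)
    labels[brain_mask] = kmeans_1d(image[brain_mask], k, rng=rng) + 1  # 0 stays background
    manual = hypo_labels > 0
    labels[manual] = hypo_labels[manual] + k  # manual labels override k-means classes
    return labels


def crop_around(volume, center, size=200):
    """Crop a size^3 cube centred on `center` over the last three axes, zero-padding at the borders."""
    half = size // 2
    pad = [(0, 0)] * (volume.ndim - 3) + [(half, half)] * 3
    padded = np.pad(volume, pad)
    # After padding by `half`, original index c sits at c + half, so the window starts at c
    window = tuple(slice(c, c + size) for c in center)
    return padded[(Ellipsis,) + window]


def one_hot(labels):
    """Return a (V, D, H, W) one-hot array and the V label values present in the crop."""
    values = np.unique(labels)
    return (labels[None] == values[:, None, None, None]).astype(np.float32), values
```

In practice the crop centre could be taken as the centroid of the manual labels, e.g. `np.round(np.argwhere(hypo_labels > 0).mean(axis=0)).astype(int)`, and `crop_around` works unchanged on the 3-channel coordinate array C thanks to the leading `Ellipsis`.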

Figure 3: Image preprocessing and label map creation. (a) Original ex vivo image. (b) Preprocessed image. (c) Automated brain segmentation (k = 4) merged with the manual hypothalamus segmentation. (d) The final label map, cropped around the hypothalamus (yellow box) to generate the synthetic images.

3.2 Manual segmentation of the hypothalamus in training data: ex vivo images

H-SynEx segments 10 hypothalamic subregions: the right and left anterior-inferior, anterior-superior, tuberal-inferior, tuberal-superior, and posterior subregions. However, since each ex vivo image contains only one hemisphere, the manual segmentation was performed on only one side of the hypothalamus. The whole-structure and subregion segmentation protocols are based on Rodrigues et al. [15] and Bocchetta et al. [6], respectively. We also delineate the fornix automatically, using morphological closing. The details of the manual segmentation protocol used to delineate the hypothalamus are described in [35].

3.3 Training

3.3.1 Synthetic Images Generation

The synthetic image generation (Figure 5(a)) is performed on the fly during training. At each iteration, one of the training label maps, L, is randomly selected. L goes through the preprocessing presented in Section 3.1, which results in the cropped label map (L_crop) and MNI coordinates (C_crop). Then, we apply aggressive geometric augmentation, encompassing random cropping, rotation, and elastic transformation, to both L_crop and C_crop. Next, we use the generative model proposed by SynthSeg [24], based on Gaussian mixture models conditioned on the transformed L_crop, with randomized contrast and resolution parameters, to create the final synthetic image. The transformed L_crop is the target used to train the network. To assist training, we use a Euclidean distance map (E) derived from the target, which has been shown to help locate boundary features in segmentation tasks [37]. E is used only in the loss function during training and is not needed at inference. The final input to the network is the concatenation of the synthetic image and the transformed C_crop.
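A simplified, SynthSeg-style generator can be sketched as below. It samples one Gaussian per label (random mean and standard deviation), simulates a random, possibly anisotropic acquisition resolution by blurring, downsampling and upsampling back to the 0.3 mm grid, and applies a random gamma. The intensity ranges, the resolution range, the blur rule, and the function names are assumptions for illustration; the actual SynthSeg generator also models bias fields and other effects that are omitted here, and the geometric augmentation step is not shown.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom


def synthesize_image(labels, rng=None, voxel_mm=0.3, max_mm=3.0):
    """SynthSeg-style sketch: random GMM contrast + random (anisotropic) resolution + random gamma."""
    rng = np.random.default_rng(rng)
    n_labels = int(labels.max()) + 1
    # 1) Gaussian mixture conditioned on the label map: one random mean/std per label
    means = rng.uniform(0, 255, size=n_labels)
    stds = rng.uniform(1, 25, size=n_labels)
    image = rng.normal(means[labels], stds[labels])
    # 2) Simulate a coarser acquisition: blur, downsample, then upsample back to the label grid
    target_mm = rng.uniform(voxel_mm, max_mm, size=3)  # per-axis spacing, may be anisotropic
    image = gaussian_filter(image, sigma=0.5 * target_mm / voxel_mm)  # heuristic anti-alias blur
    low = zoom(image, voxel_mm / target_mm, order=1)
    image = zoom(low, np.array(labels.shape) / np.array(low.shape), order=1)
    # 3) Normalise to [0, 1] and apply a random gamma transform
    image = (image - image.min()) / (image.max() - image.min() + 1e-8)
    image = image ** np.exp(rng.normal(0.0, 0.25))
    return image.astype(np.float32)


def network_input(image, coords):
    """Concatenate the synthetic image (D, H, W) with the MNI coordinate channels (3, D, H, W)."""
    return np.concatenate([image[None], coords], axis=0)
```

Because every call draws new means, standard deviations, spacing, and gamma, the same label map yields a differently looking image at each training iteration, which is what drives the contrast and resolution invariance.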

Figure 4: Examples of coronal slices from 3D synthetic images used as input. The images were generated from label maps cropped around the hypothalamus. Aggressive data augmentation combined with random contrast values in the generative model results in large variability in the appearance of the input images.

3.3.2 Training architecture

Two distinct sub-models were trained separately, one for the entire hypothalamus (M_hyp) and another for its subregions (M_sub) (Figure 5(b)). Both M_hyp and M_sub are 3D U-Nets [38, 39]; however, in both cases we added a skip connection from the input channels holding the transformed C_crop to the final convolutional block, so that the original positional encoding is readily available at full resolution in the decoder as well. M_hyp receives the input I (the synthetic image concatenated with the transformed C_crop) and outputs O_hyp. The input of M_sub is defined as I_sub = I * O_hyp. While O_hyp is a 2-channel array representing the hypothalamus and the background, O_sub, the output of M_sub, is a 13-channel array encompassing the 10 subregions, the right and left fornices, and the background.
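The cascade input I_sub = I * O_hyp can be sketched in NumPy as follows, assuming that channel 1 of O_hyp holds the hypothalamus probability and that the resulting mask is broadcast over every input channel. Whether H-SynEx multiplies by soft probabilities or by a thresholded mask is not stated in the text, so the helper (a hypothetical name) supports both; the coordinate skip connection is an architectural detail of the U-Net and is not reproduced here.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax over the channel axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def subregion_input(I, logits_hyp, hard=False):
    """I_sub = I * O_hyp. I: (C, D, H, W) network input; logits_hyp: (2, D, H, W) raw M_hyp output."""
    fg = softmax(logits_hyp, axis=0)[1]  # assumed: channel 1 = hypothalamus, channel 0 = background
    if hard:
        fg = (fg > 0.5).astype(I.dtype)  # optional: binarise instead of using soft probabilities
    return I * fg[None]                  # broadcast the same mask over every input channel
```

Masking the input this way lets M_sub concentrate its capacity on partitioning voxels that M_hyp already considers hypothalamic.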

Figure 5: Training flowchart. (a) Generation of synthetic images: the synthetic images S are generated from the label maps of the ex vivo images. (b) Model training: there are two training blocks, one for the entire hypothalamus and another specialized in subregion segmentation. The two blocks are trained sequentially: we first trained the whole-structure segmentation model (M_hyp) and then the subregion segmentation model (M_sub). The output of M_hyp is used to build the input of M_sub during training.

3.3.3 Loss function and training details

The loss function applied to M_hyp (Eq. 1) combines the Dice loss (DL) and the mean squared error (MSE), whereas the loss function applied to M_sub (Eq. 2) combines DL and cross-entropy (CE). Although our goal is to optimize the Dice coefficient, the Dice loss has flat gradients far from the optimum, as is the case at initialization. This issue is mitigated by combining it with other loss functions, such as MSE and CE, which provide better gradient information and improve training efficiency.

$$L_{\text{hyp}} = \alpha \cdot \mathrm{DL}\left(T, T_{\text{pred}}\right) + \beta \cdot \mathrm{MSE}\left(E, E_{\text{pred}}\right) \tag{1}$$
$$L_{\text{sub}} = \alpha \cdot \mathrm{DL}\left(T, T_{\text{pred}}\right) + \beta \cdot \mathrm{CE}\left(T, T_{\text{pred}}\right) \tag{2}$$
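Equations (1) and (2) can be written out as a NumPy sketch. The soft Dice formulation, the voxel averaging, and the choice of E as the unsigned distance of each foreground voxel to the nearest background voxel (computed with `scipy.ndimage.distance_transform_edt`) are assumptions; a signed distance map would be an equally plausible reading of the text. The defaults α = 0.3 and β = 0.7 follow the values reported in the training details, and in a real network E_pred would come from an additional output head, which is not shown.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def dice_loss(T, P, eps=1e-6):
    """Soft Dice loss averaged over channels. T: one-hot target, P: probabilities, both (V, D, H, W)."""
    axes = tuple(range(1, T.ndim))
    inter = (T * P).sum(axis=axes)
    denom = T.sum(axis=axes) + P.sum(axis=axes)
    return 1.0 - np.mean((2.0 * inter + eps) / (denom + eps))


def cross_entropy(T, P, eps=1e-7):
    """Voxel-averaged categorical cross-entropy."""
    return -np.mean(np.sum(T * np.log(np.clip(P, eps, 1.0)), axis=0))


def distance_map(mask):
    """Assumed form of E: distance of each foreground voxel to the nearest background voxel (0 elsewhere)."""
    return distance_transform_edt(mask)


def loss_hyp(T, P, E, E_pred, alpha=0.3, beta=0.7):
    """Eq. (1): alpha * DL(T, T_pred) + beta * MSE(E, E_pred)."""
    return alpha * dice_loss(T, P) + beta * np.mean((E - E_pred) ** 2)


def loss_sub(T, P, alpha=0.3, beta=0.7):
    """Eq. (2): alpha * DL(T, T_pred) + beta * CE(T, T_pred)."""
    return alpha * dice_loss(T, P) + beta * cross_entropy(T, P)
```

With a uniform prediction of 0.5 everywhere, the CE term already contributes a gradient that is independent of the overlap, which is the reason the text gives for pairing the Dice loss with a second term.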

For both models, we used the Adam optimizer with a learning rate of 5 × 10^-5, a batch size of 32, and α = 0.3 and β = 0.7. As the stopping criterion for M_hyp, we simply trained for 40,000 steps without a validation set. For M_sub, however, we used 35 images from FSM (5 acquisitions with different MRI sequences from each of 7 distinct subjects) as the validation set (Table 1), with early stopping based on the validation DSC and a minimum improvement δ_min = 0.001. Training stopped after 28,000 steps. Both 3D U-Net modules have an encoder with 5 levels of 24, 48, 96, 192, and 384 feature maps. Each convolutional block is composed of three layers: group normalization, convolution, and activation function (ReLU).
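The early-stopping rule on the validation DSC can be expressed as a small helper. The text specifies only δ_min = 0.001; the class name and the `patience` value (how many validation checks without sufficient improvement are tolerated before stopping) are hypothetical additions for illustration.

```python
class EarlyStopping:
    """Stop when validation DSC has not improved by more than min_delta for `patience` checks."""

    def __init__(self, min_delta=0.001, patience=5):
        self.min_delta = min_delta
        self.patience = patience  # hypothetical; not reported in the text
        self.best = float("-inf")
        self.bad_checks = 0

    def step(self, val_dsc):
        """Record one validation DSC; return True if training should stop."""
        if val_dsc > self.best + self.min_delta:
            self.best = val_dsc
            self.bad_checks = 0
        else:
            # Improvements smaller than min_delta count as no improvement
            self.bad_checks += 1
        return self.bad_checks >= self.patience
```

For example, with `patience=2`, a sequence of validation DSCs 0.50, 0.60, 0.6005, 0.6008 triggers a stop at the fourth check, because neither of the last two values exceeds 0.60 by more than 0.001.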

3.4 Inference and Post processing

The inference process is summarized in Figure 6. The first step is preprocessing, in which we find the MNI coordinates (C_inf) of the input image using a fast deep learning registration method, EasyReg [25]. The input of M_hyp, denoted A_inf, is obtained by cropping and concatenating C_inf and the original image (I_inf). The input of M_sub, in contrast, is the product of A_inf, the output of M_hyp, and the ventral diencephalon (VDC) label, which is derived from the whole-brain segmentation produced by EasyReg [25]. We include the VDC label because we found that it reduces false positives within the anterior subregion. The post-processing phase comprises two sequential steps: rescaling the final segmentation to match the voxel size of the original image (I_inf), and excluding voxels that belong to the third ventricle, using the whole-brain segmentation obtained from EasyReg [25].
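The inference-time input construction and post-processing can be sketched as follows. This assumes that the EasyReg whole-brain segmentation uses FreeSurfer lookup-table labels (14 for the third ventricle, 28 and 60 for the left and right ventral DC), that it has already been brought to the same field of view as the prediction, and that nearest-neighbour interpolation is used when rescaling. The VDC product is implemented literally as an element-wise mask, which is one possible reading of the text, and the function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import zoom

# Assumed FreeSurfer lookup-table IDs in the whole-brain segmentation
THIRD_VENTRICLE = 14
VENTRAL_DC = (28, 60)  # Left-VentralDC, Right-VentralDC


def subregion_input_inf(A_inf, hyp_mask, brain_seg):
    """Literal reading of the text: element-wise product of A_inf, the M_hyp output and the VDC mask."""
    vdc = np.isin(brain_seg, VENTRAL_DC)
    return A_inf * (hyp_mask * vdc)[None]


def postprocess(subregions, brain_seg_orig):
    """Resample labels to the original voxel grid (nearest neighbour), then drop third-ventricle voxels."""
    factors = np.array(brain_seg_orig.shape) / np.array(subregions.shape)
    out = zoom(subregions, factors, order=0)  # order=0 keeps integer label values intact
    out[brain_seg_orig == THIRD_VENTRICLE] = 0
    return out
```

Nearest-neighbour resampling is chosen here because linear interpolation of integer label IDs would create labels that do not exist.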

Figure 6: Inference flowchart. The inference image I_inf goes through a preprocessing step that produces the input array A_inf. A_inf is then fed to the whole-structure segmentation model (M_hyp). Finally, using the VDC label, A_inf, and the output of M_hyp (O_hyp_inf), we create the input of the subregion segmentation model (M_sub) and obtain the final subregion segmentation.

3.5 Statistical Analysis

The statistical analysis used AVD and DSC, combined with Wilcoxon signed-rank tests, to assess the statistical significance of performance differences across methods. We also compared the ability of H-SynEx and competing methods to detect statistical differences in hypothalamic subregion volumes between controls and patients (AD and bvFTD). Since the datasets have few subjects and we cannot reliably assume that the distributions are Gaussian, all analyses used non-parametric tests. We used the Wilcoxon rank-sum test to assess differences in medians between groups and the area under the receiver operating characteristic curve (AUROC) as a non-parametric effect size between groups. Finally, we used the DeLong test to compare AUROCs across methods operating on the same sample. All statistical tests were conducted at a significance level of 0.05 (p < 0.05).
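The group comparison can be sketched with SciPy's `ranksums` (two-sided Wilcoxon rank-sum test) together with an AUROC computed directly as the probability that a randomly chosen control has a larger volume than a randomly chosen patient, with ties counted as one half; this equals the Mann-Whitney U statistic divided by the number of control-patient pairs. Orienting the AUROC so that smaller patient volumes give values above 0.5 is an assumption, the helper names are hypothetical, and the DeLong test is not part of SciPy and is not shown.

```python
import numpy as np
from scipy.stats import ranksums


def auroc(controls, patients):
    """P(control > patient) + 0.5 * P(tie) over all control-patient pairs (= U / (n_c * n_p))."""
    c = np.asarray(controls, dtype=float)[:, None]
    p = np.asarray(patients, dtype=float)[None, :]
    return float(np.mean((c > p) + 0.5 * (c == p)))


def compare_groups(controls, patients):
    """Return (AUROC, two-sided Wilcoxon rank-sum p-value) for one subregion volume."""
    _, p_value = ranksums(controls, patients)
    return auroc(controls, patients), p_value
```

An AUROC of 0.5 means the two volume distributions are indistinguishable, while 1.0 means every control volume exceeds every patient volume, which makes it a convenient scale-free effect size alongside the p-value.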

4 Experiments and Results

H-SynEx was trained using synthetic images derived from ultra-high resolution ex vivo label maps. While the synthetic approach increases the network's ability to generalize across different sequences, the use of ex vivo images improves its ability to delineate the hypothalamus thanks to their ultra-high resolution. Our experiments were therefore structured to assess the method's applicability under diverse conditions (Table 2).

Table 2: Summary of conducted experiments
| Experiment | Objective | Testing dataset | Acquisitions per MRI sequence | MRI sequence(s) |
|---|---|---|---|---|
| Inter-rater metrics | To establish a baseline for evaluation metrics | FSM | 10 | T1w |
| Direct comparison with manual segmentation on different sequences | To assess whether the method can segment different MRI sequences | FSM | 22 | T1w, T2w, PD, FA, qT1 |
| | | IXI | 19 | T1w, T2w, PD |
| Comparing against state-of-the-art methods using only T1 images | To compare H-SynEx against other available state-of-the-art methods | MiLI | 55 | T1w |
| | | MiLI-OASIS | 23 | T1w |
| | | MiLI-IXI | 19 | T1w |
| | | FSM | 22 | T1w |
| Application in group studies | To assess the method's usability in group studies | ADNI | 843 | T1w |
| | | NIFD | 185 | T1w |
| Resilience to large slice spacing | To assess usability on diverse MRI resolutions | ADNI | 843 | T1w, FLAIR |

4.1 Consistency between labeling protocols

One of the primary challenges in analyzing our results is that each test dataset has a distinct manual segmentation protocol, none of which matches the one used to train H-SynEx because of the differences between in vivo and ex vivo images [35]. Therefore, our first experiment establishes reference values for DSC and AVD, i.e., the best agreement that can reasonably be expected between segmentations that follow different protocols, by computing inter-rater metrics on T1w images. We compare manual segmentations of 10 FSM images delineated by two different raters: the first uses the FSM protocol (Table 1), while the second uses the protocol employed to build the label maps (Section 3.2).

The results (Table 3) show that although AVD is influenced by the difference in protocols, its values remain small, reaching at most 0.43 mm. DSC, however, is affected both by the variations in segmentation protocol and by the small size of the hypothalamic subregions, with values of 0.66 or lower.

Table 3: Inter-rater metrics (median) for 10 subjects from FSM
| Subregion | DSC | AVD (mm) |
|---|---|---|
| Anterior | 0.63 | 0.41 |
| Tuberal | 0.66 | 0.43 |
| Posterior | 0.66 | 0.38 |
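For reference, DSC and AVD between two binary masks can be computed as below. The AVD here is the symmetric variant (mean of the two directed average distances, expressed in mm through the voxel spacing); some works instead take the maximum of the two directed averages, and the exact variant used in this study is not stated. Both masks are assumed to be non-empty, and the function names are illustrative.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0


def avg_hausdorff(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric average Hausdorff distance (in mm if `spacing` is in mm); masks must be non-empty."""
    a, b = a.astype(bool), b.astype(bool)
    # For every voxel: Euclidean distance to the nearest voxel of b (resp. a)
    dist_to_b = distance_transform_edt(~b, sampling=spacing)
    dist_to_a = distance_transform_edt(~a, sampling=spacing)
    # Average the two directed mean distances (A -> B and B -> A)
    return 0.5 * (dist_to_b[a].mean() + dist_to_a[b].mean())
```

For a 3 × 3 × 3 voxel cube shifted by a single voxel, `dice` gives 2/3 while `avg_hausdorff` gives 1/3 of a voxel, which illustrates why small boundary disagreements penalize DSC far more than AVD for structures as small as the hypothalamic subregions.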

4.2 Direct comparison with manual segmentation on different sequences

In this experiment, we evaluate the ability of H-SynEx to segment the hypothalamic subregions in different MRI sequences. We used five sequences from FSM (T1w, T2w, proton density (PD), fractional anisotropy (FA), and quantitative T1 (qT1)) and three from IXI (T1w, T2w, and PD). Since other methods from the literature operate exclusively on T1w images, a quantitative comparison with H-SynEx was not possible in this experiment.

H-SynEx performs best on T1w images (Figure 8), yet it is capable of segmenting the hypothalamus and its subregions in all of the tested MRI sequences, as shown in Figure 7.

Figure 7: Qualitative results of H-SynEx on different datasets, sequences, and resolutions. Other methods return no results when applied to sequences other than T1w.
Figure 8: DSC and AVD for H-SynEx across sequences. Top row: IXI dataset, which provides only the segmentation of the whole structure (excluding the mammillary bodies). Bottom row: FSM dataset, which provides the segmentation of the hypothalamus and its subregions.
Figure 9: Hypothalamus volume: H-SynEx and manual segmentation (target) volumes for FSM dataset.

4.3 Comparing against other state-of-the-art methods

To compare H-SynEx with other state-of-the-art models [14, 15, 20], we used T1w images from the MiLI and FSM datasets and analyzed the segmentation of the whole hypothalamus. Because the MiLI segmentation protocol does not include the mammillary bodies, we excluded the posterior subregion before computing the metrics for this dataset. Similarly, because HypAST does not segment the posterior subregion, we excluded it from the FSM segmentations before computing the metrics for that method.
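The exclusion step amounts to merging the subregion labels into a single binary mask while leaving out the posterior label, and applying the same merge to the label maps being compared before computing DSC and AVD. The sketch below assumes a hypothetical label encoding for the five subregions named in Table 6; the values used by the released tool may differ.

```python
import numpy as np

# Hypothetical label values for the five subregions (names as in Table 6);
# the actual values in the released H-SynEx output may differ.
SUBREGION_LABELS = {"a-sHyp": 1, "a-iHyp": 2, "supTub": 3, "infTub": 4, "postHyp": 5}


def whole_hypothalamus(label_map: np.ndarray, exclude=("postHyp",)) -> np.ndarray:
    """Binary whole-structure mask built from subregion labels, minus `exclude`."""
    keep = [value for name, value in SUBREGION_LABELS.items() if name not in exclude]
    return np.isin(label_map, keep)


labels = np.array([0, 1, 2, 3, 4, 5])
print(whole_hypothalamus(labels).astype(int))  # [0 1 1 1 1 0]
```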

Because Billot et al. [14] works only on T1w images, we compared its subregion segmentations with those of H-SynEx on 22 T1w images from FSM (Table 4). Finally, to compare H-SynEx with ScLimbic [20] and Rodrigues et al. [15] (HypAST), we used the whole structure (Table 5).

Table 4: AVD and DSC (median) for H-SynEx and Billot et al. on the hypothalamic subregions of the FSM dataset. † indicates statistical significance on a two-sided Wilcoxon rank-sum test with Bonferroni correction (p < 0.05).

Metric | Subregion | H-SynEx | Billot et al.
AVD (mm) | Anterior | 0.54 | 1.32
AVD (mm) | Tuberal | 0.49 | 0.66
AVD (mm) | Posterior | 0.33 | 0.52
DSC | Anterior | 0.53 | 0.33
DSC | Tuberal | 0.59 | 0.58
DSC | Posterior | 0.67 | 0.55
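The significance marks in Tables 4 and 5 come from a two-sided Wilcoxon rank-sum test with Bonferroni correction. A minimal sketch of that procedure, assuming per-subject metric values grouped by subregion and one correction factor equal to the number of subregions compared, could look as follows; the data are synthetic and only illustrate the mechanics. The rank-sum test is computed with SciPy's Mann-Whitney U implementation, which is the same test.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def ranksum_bonferroni(metric_a: dict, metric_b: dict, alpha: float = 0.05) -> dict:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) per subregion, Bonferroni-corrected.

    metric_a / metric_b map a subregion name to per-subject metric values
    (e.g. DSC) of two methods. Returns name -> (adjusted p, significant?).
    """
    m = len(metric_a)  # number of comparisons corrected for (assumption)
    results = {}
    for name in metric_a:
        p = mannwhitneyu(metric_a[name], metric_b[name], alternative="two-sided").pvalue
        p_adj = min(1.0, p * m)  # Bonferroni: scale p by the number of tests
        results[name] = (p_adj, p_adj < alpha)
    return results


# Synthetic per-subject DSC values, for illustration only
rng = np.random.default_rng(0)
h_synex = {r: rng.normal(0.6, 0.05, 22) for r in ("anterior", "tuberal", "posterior")}
baseline = {
    "anterior": h_synex["anterior"] - 0.2,  # clearly worse
    "tuberal": h_synex["tuberal"].copy(),  # identical
    "posterior": h_synex["posterior"] - 0.1,
}
for name, (p_adj, sig) in ranksum_bonferroni(h_synex, baseline).items():
    print(f"{name}: p_adj={p_adj:.3g} significant={sig}")
```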
Table 5: AVD and DSC (median) for H-SynEx, HypAST [15], ScLimbic [20], and Billot et al. [14] on different datasets (MiLI, IXI, OASIS, and FSM) for the entire hypothalamus (excluding the mammillary bodies). Symbols indicate statistical significance on a two-sided Wilcoxon rank-sum test with Bonferroni correction (p < 0.05): (*) Billot vs. H-SynEx; () ScLimbic vs. H-SynEx; () Billot vs. ScLimbic. Since ScLimbic was trained on the FSM dataset, its FSM results are not reported. Similarly, since HypAST was trained on data from MiLI and IXI and with the same segmentation protocol as OASIS, its results on those datasets are not reported.

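Metric | Model | MiLI | IXI | OASIS | FSM
AVD (mm) | Billot | 0.46 | 0.61* | 0.47 | 0.40
AVD (mm) | HypAST | - | - | - | 0.41
AVD (mm) | ScLimbic | 0.39 | 0.44 | 0.49 | -
AVD (mm) | H-SynEx | 0.45 | 0.45 | 0.5 | 0.43
DSC | Billot | 0.66* | 0.6 | 0.65* | 0.68
DSC | HypAST | - | - | - | 0.69
DSC | ScLimbic | 0.67 | 0.64 | 0.59 | -
DSC | H-SynEx | 0.63 | 0.62 | 0.58 | 0.65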

4.4 Application to group studies

In this experiment, we apply H-SynEx to images from both patient and control groups to simulate how physicians would use the method in practice. We also use the network's ability to separate the groups as a proxy for its performance on datasets without ground-truth segmentations.

Several studies report hypothalamic atrophy in both AD and bvFTD patients [6, 40]. To evaluate group differences, we therefore compared the hypothalamic subregion volumes of patients and controls from ADNI (AD subjects) and NIFD (bvFTD subjects). We normalized each volume by dividing it by the total intracranial volume (TIV) estimated with SynthSeg [24], a common practice in volumetric brain MRI studies. For comparison, we repeated the analysis with Billot et al. and compared its AUROCs with those of H-SynEx using the DeLong test [41].
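The group analysis can be sketched in a few lines: compute each subregion volume from a label map, divide by TIV, and summarize the separation between groups with the AUROC and a rank-sum test. This is a minimal illustration under assumptions, not the authors' analysis script: the subject data below are invented, and the AUROC is oriented so that smaller normalized volumes indicate disease (atrophy), computed as the probability that a control volume exceeds a patient volume (ties count one half), which equals the Mann-Whitney U statistic divided by the number of control-patient pairs.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def label_volume_mm3(label_map: np.ndarray, label: int, voxel_size_mm) -> float:
    """Volume of one label: voxel count times voxel volume."""
    return float(np.count_nonzero(label_map == label) * np.prod(voxel_size_mm))


def auroc_smaller_in_patients(controls, patients) -> float:
    """AUROC = P(control value > patient value), ties counted as 1/2."""
    c = np.asarray(controls, dtype=float)[:, None]
    p = np.asarray(patients, dtype=float)[None, :]
    return float(((c > p) + 0.5 * (c == p)).mean())


# Hypothetical subject-level data (volumes and TIV in mm^3)
ctrl_vol = np.array([1000.0, 1050.0, 980.0, 1020.0])
ctrl_tiv = np.array([1.50e6, 1.55e6, 1.45e6, 1.52e6])
pat_vol = np.array([900.0, 930.0, 1010.0, 880.0])
pat_tiv = np.array([1.48e6, 1.50e6, 1.60e6, 1.46e6])

ctrl_norm = ctrl_vol / ctrl_tiv  # TIV-normalized volumes
pat_norm = pat_vol / pat_tiv
auc = auroc_smaller_in_patients(ctrl_norm, pat_norm)
p = mannwhitneyu(ctrl_norm, pat_norm, alternative="two-sided").pvalue
print(f"AUROC={auc:.2f}, rank-sum p={p:.3f}")
```

Normalizing by TIV before the comparison removes differences that only reflect head size, so that the AUROC reflects relative rather than absolute volume.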

Regarding group studies (Table 6), H-SynEx achieved statistical significance (p < 0.05) in the Wilcoxon rank-sum test in all hypothalamic subregions when comparing AD patients with controls, whereas Billot et al. did not detect a difference in the inferior tuberal region. In some cases, H-SynEx also achieved a higher AUROC with p < 0.05 in the DeLong test, indicating that it better discerns the two groups in this dataset. For NIFD, both models gave similar results, except in the inferior tuberal region.
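Because both methods are applied to the same subjects, their AUROCs are correlated, which is why the DeLong test [41] is used instead of comparing two independent AUROCs. The sketch below is a straightforward (quadratic-time) implementation of that test from its structural components; it is an illustration, not the code used for Table 6. It assumes that higher scores indicate disease, so for atrophy one would pass the negated normalized volumes as scores. The small example uses made-up scores.

```python
import numpy as np
from scipy.stats import norm


def _structural_components(pos: np.ndarray, neg: np.ndarray):
    """AUROC and DeLong placement values for one score."""
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0) + 0.5 * (diff == 0)  # Mann-Whitney kernel
    # psi.mean(axis=1): one value per positive; psi.mean(axis=0): one per negative
    return psi.mean(), psi.mean(axis=1), psi.mean(axis=0)


def delong_test(score_a, score_b, is_patient):
    """Two-sided DeLong test for two correlated AUROCs on the same subjects."""
    y = np.asarray(is_patient, dtype=bool)
    a = np.asarray(score_a, dtype=float)
    b = np.asarray(score_b, dtype=float)
    auc_a, v10_a, v01_a = _structural_components(a[y], a[~y])
    auc_b, v10_b, v01_b = _structural_components(b[y], b[~y])
    m, n = y.sum(), (~y).sum()
    s10 = np.cov(np.vstack([v10_a, v10_b]))  # 2x2 covariance over patients
    s01 = np.cov(np.vstack([v01_a, v01_b]))  # 2x2 covariance over controls
    s = s10 / m + s01 / n
    var = s[0, 0] + s[1, 1] - 2 * s[0, 1]  # variance of AUC_a - AUC_b
    z = (auc_a - auc_b) / np.sqrt(var)
    return auc_a, auc_b, z, 2 * norm.sf(abs(z))


y = [1, 1, 1, 0, 0, 0]
auc_a, auc_b, z, p = delong_test([3, 4, 5, 0, 1, 2], [3, 1, 5, 0, 4, 2], y)
print(f"AUC_a={auc_a:.3f} AUC_b={auc_b:.3f} z={z:.3f} p={p:.3f}")
```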

4.5 Resilience to large slice spacing

In this experiment, we applied H-SynEx to FLAIR images from the ADNI dataset acquired with a slice spacing (and thickness) of 5 mm in the axial plane. Our goal is to evaluate whether the method can identify hypothalamic atrophy at larger slice spacings, which are common in clinical MRI. Since no other method in the literature works with FLAIR images, we analyzed only H-SynEx segmentations of the 5 mm FLAIR images from the same ADNI subjects used in Experiment 4.4. Comparing TIV-normalized volumes of patients and controls, H-SynEx yields statistically significant differences (Table 6) in all subregions except the posterior one.
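A back-of-the-envelope sketch helps explain why the posterior subregion suffers most at 5 mm spacing. Treating each axial slice as a sample at its centre (ignoring partial-volume effects), a structure whose through-plane extent is smaller than the spacing is hit by at most one slice, and by none for some slice offsets. The 4 mm extent below is a hypothetical value chosen for illustration, not a measurement from this work.

```python
import numpy as np


def n_slices_through(extent_mm: float, spacing_mm: float, offset_mm: float) -> int:
    """Number of slice centres (offset, offset + spacing, ...) inside [0, extent)."""
    return len(np.arange(offset_mm, extent_mm, spacing_mm))


EXTENT_MM = 4.0  # hypothetical through-plane extent of a small structure
for spacing in (1.0, 5.0):
    offsets = np.arange(0.0, spacing, 0.25)  # possible slice-grid positions
    counts = [n_slices_through(EXTENT_MM, spacing, o) for o in offsets]
    print(f"{spacing} mm spacing: between {min(counts)} and {max(counts)} slices")
# prints "1.0 mm spacing: between 4 and 4 slices"
# then   "5.0 mm spacing: between 0 and 1 slices"
```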

Figure 10: Correlation between normalized volumes from FLAIR and T1w images (ADNI dataset) using H-SynEx segmentations. Top: control subjects; bottom: AD patients. Except for the posterior subregion, FLAIR and T1w normalized volumes are positively correlated.
Table 6: AUROC values for patients vs. controls for the H-SynEx and Billot methods on the ADNI and NIFD datasets. For ADNI, we also analyze our method applied to FLAIR images with 5 mm spacing. Stars indicate the level of statistical significance between the two cohorts (two-sided Wilcoxon rank-sum test; * p < 0.05, ** p < 0.01). () indicates statistical significance on the DeLong test (p < 0.05) between the H-SynEx and Billot methods. () indicates statistical significance on the DeLong test (p < 0.05) between H-SynEx applied to T1w and H-SynEx applied to FLAIR images.
Subregion | ADNI H-SynEx FLAIR | ADNI H-SynEx T1w | ADNI Billot T1w | NIFD H-SynEx T1w | NIFD Billot T1w
Whole | 0.66** | 0.74** | 0.65** | 0.79** | 0.74**
a-sHyp | 0.60** | 0.69** | 0.72** | 0.76** | 0.75**
a-iHyp | 0.60** | 0.64** | 0.55* | 0.72** | 0.62**
supTub | 0.68** | 0.60** | 0.67** | 0.76** | 0.76**
infTub | 0.67** | 0.73** | 0.52 | 0.74** | 0.59*
postHyp | 0.52 | 0.72** | 0.70** | 0.70** | 0.73**

5 Discussion and Conclusion

Due to the small size of the hypothalamus and its low contrast with neighboring tissues, its manual segmentation is challenging and variable both between and within raters. These difficulties persist across MRI sequences. To address this issue, we introduced H-SynEx, a novel automated segmentation method for the hypothalamus and its subregions. To the best of our knowledge, H-SynEx is the first method to combine ultra-high-resolution ex vivo MRI and synthetic images. This combination allowed us to develop a method capable of segmenting small structures, such as the hypothalamic subregions, across various MRI sequences and resolutions, including FLAIR images with a spacing of 5 mm.

Typically, to evaluate how well a segmentation method generalizes, one compares it with existing methods on a dataset that none of them saw during training. However, when these methods were trained with different segmentation protocols, the comparison can be biased in favor of the method whose training protocol matches that of the test images. Because the H-SynEx training set was built from ex vivo images, its segmentation protocol differs from that of every in vivo dataset. Consequently, the main challenge in analyzing the results is the mismatch between training and test protocols. To quantify it, we compared the manual segmentations of two raters who followed distinct protocols on 10 T1w images from the FSM dataset and found inter-rater DSC values of at most 0.66 and AVD values of at least 0.38 mm. We use these values as a baseline for interpreting the metrics in the subsequent experiments.

In Experiment 4.2, we analyzed the usability of H-SynEx across MRI sequences and found that T1w images yielded the best results. However, although DSC was lower and AVD higher for the other sequences, it is important to emphasize that the manual segmentations of the hypothalamic subregions in both FSM and IXI were delineated on T1w images and therefore do not reflect the contrasts of the other sequences. Additionally, while the FSM images of each subject are already co-registered, the IXI images are not; the manual segmentations therefore had to be registered to the other sequence acquisitions of each subject. Both the registration and the use of a different sequence for manual segmentation may degrade the final metrics. Finally, we observed high variability in both metrics, which may be explained by the small size of the hypothalamus. This hypothesis is supported by comparing the volumes delineated by H-SynEx and by manual segmentation in the FSM dataset (Figure 9): the posterior and anterior subregions, which show greater variability in DSC and AVD, are considerably smaller than the tuberal subregion. Furthermore, the variability of the volumes across sequences and subregions appears less pronounced than that of the metrics. For instance, the anterior subregion shows a large variability in DSC that is less pronounced in both AVD and the volumetric analysis, suggesting that its small size affects the DSC values. The same reasoning applies to the posterior subregion.

When comparing H-SynEx with other state-of-the-art methods, H-SynEx outperforms Billot et al. in almost every metric for subregion segmentation (Table 4). Although the DSC values may seem low at first glance, they are not far from those observed in the inter-rater analysis. The AVD values of H-SynEx are even closer to the inter-rater AVD; in the posterior subregion they are lower (0.33 mm vs. 0.38 mm). They are also substantially lower than those of Billot et al. For the whole structure (Table 5), H-SynEx outperforms Billot et al. and performs similarly to HypAST [15] and ScLimbic [20] in terms of AVD, although it does not achieve the best DSC. However, for small structures with complex boundaries, distance metrics such as AVD are more suitable for comparing methods [42]. It is also important to emphasize that all other methods were trained exclusively on in vivo T1w images and therefore did not have to bridge a domain gap. Although H-SynEx does not achieve the highest quantitative results on T1w images, it offers a distinct advantage: because it is built on well-established domain randomization methods, it generalizes across MRI sequences, which makes it adaptable to imaging conditions that T1w-only methods cannot handle.
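To illustrate what domain randomization means in practice, the sketch below generates synthetic images from a label map in the spirit of the SynthSeg generative model [24]: each label receives a random mean intensity and noise level, and the image is then randomly blurred and multiplied by a smooth bias field, so that every sample has a different contrast. This is a deliberately simplified assumption-laden version; the parameter ranges are arbitrary, and components of the actual H-SynEx training pipeline (for example spatial deformation and resampling) are not reproduced here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def synth_image(labels: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Simplified domain-randomized synthetic image from an integer label map."""
    img = np.zeros(labels.shape, dtype=np.float64)
    for lab in np.unique(labels):
        mask = labels == lab
        mean = rng.uniform(0.0, 255.0)  # random contrast per label (arbitrary range)
        std = rng.uniform(1.0, 25.0)  # random within-label noise (arbitrary range)
        img[mask] = rng.normal(mean, std, size=int(mask.sum()))
    # Random blur, loosely mimicking varying resolution and partial volume
    img = gaussian_filter(img, sigma=rng.uniform(0.5, 1.5))
    # Smooth multiplicative bias field
    field = gaussian_filter(rng.normal(0.0, 1.0, labels.shape), sigma=4.0)
    img *= np.exp(0.3 * field / (np.abs(field).max() + 1e-8))
    # Min-max normalize to [0, 1]
    return (img - img.min()) / (img.max() - img.min() + 1e-8)


rng = np.random.default_rng(42)
labels = np.zeros((32, 32, 32), dtype=int)
labels[8:24, 8:24, 8:24] = 1
labels[12:20, 12:20, 12:20] = 2
samples = [synth_image(labels, rng) for _ in range(3)]
# Contrast between label 2 and background differs from sample to sample
print([round(float(s[labels == 2].mean() - s[labels == 0].mean()), 2) for s in samples])
```

Because the network never sees the same contrast twice, it cannot rely on any particular intensity profile, which is the property that lets it segment unseen sequences.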

When comparing hypothalamic volumes of patient and control groups on T1w images, we confirmed that our method detects the expected differences in all subregions in both the ADNI and NIFD datasets, with whole-hypothalamus AUROCs of 0.74 and 0.79, respectively, and p < 0.05 in the Wilcoxon rank-sum test in both cases. Notably, the AUROC values for NIFD are higher than those for ADNI (Table 6). This is expected, since bvFTD patients tend to exhibit more pronounced hypothalamic atrophy than AD patients (10-12% volume loss in AD and 15-20% in bvFTD) [43]. Additionally, the AUROCs of H-SynEx differ significantly from those of Billot et al. (DeLong test, p < 0.05) for the entire hypothalamus and for most subregions in the ADNI dataset.
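The link between the amount of atrophy and the AUROC can be made explicit under a simplifying assumption that is not part of the original analysis: suppose TIV-normalized volumes are normally distributed in both groups with a common standard deviation σ, with control mean μ_c and patient mean μ_p = (1 − r)μ_c for a relative volume loss r. Since V_c − V_p is then normal with mean rμ_c and variance 2σ², the AUROC is

```latex
\mathrm{AUROC} = P(V_c > V_p)
  = \Phi\!\left(\frac{\mu_c - \mu_p}{\sigma\sqrt{2}}\right)
  = \Phi\!\left(\frac{r}{c\sqrt{2}}\right),
\qquad c = \frac{\sigma}{\mu_c},
```

where Φ is the standard normal cumulative distribution function and c the coefficient of variation of the control volumes. For a fixed c, the AUROC increases monotonically with r, which is consistent with the larger AUROCs observed for bvFTD than for AD under this idealized model.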

Finally, we analyzed the same ADNI subjects used in Experiment 4.4, but using FLAIR images with a spacing of 5 mm. As with T1w images, the method differentiated patients from controls in all subregions except the posterior one. This may be explained by the 5 mm spacing of the FLAIR images, which causes many images to miss the mammillary bodies entirely or to capture them in only a single slice; low AUROC values in this subregion are therefore expected. We also plotted the correlation between T1w and FLAIR normalized volumes (Figure 10) to investigate whether H-SynEx is consistent across the two sequences. For controls and AD subjects, respectively, the anterior subregion shows moderate correlations (r = 0.40 and r = 0.50) and the tuberal subregions show strong correlations (r = 0.79 and r = 0.80). As expected, the posterior correlation is weak in both groups (r = 0.11 and r = 0.22). These results support the hypothesis that the method can be used at challenging resolutions and still detect differences between groups.
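The consistency check between sequences reduces to a Pearson correlation between the TIV-normalized volumes obtained from the T1w and FLAIR segmentations of the same subjects. A minimal sketch follows; the volumes are invented, and using a single TIV per subject for both sequences is an assumption of this sketch rather than a detail stated in the text.

```python
import numpy as np
from scipy.stats import pearsonr


def cross_sequence_r(vol_t1, vol_flair, tiv):
    """Pearson r (and p-value) between TIV-normalized volumes from two sequences."""
    tiv = np.asarray(tiv, dtype=float)  # assumption: one TIV per subject
    t1 = np.asarray(vol_t1, dtype=float) / tiv
    fl = np.asarray(vol_flair, dtype=float) / tiv
    r, p = pearsonr(t1, fl)
    return float(r), float(p)


# Hypothetical volumes (mm^3) for five subjects
tiv = [1.50e6, 1.42e6, 1.61e6, 1.55e6, 1.47e6]
vol_t1 = [1010.0, 950.0, 1100.0, 990.0, 970.0]
vol_flair = [1040.0, 930.0, 1150.0, 1000.0, 940.0]
print(cross_sequence_r(vol_t1, vol_flair, tiv))
```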

Although H-SynEx leverages randomized synthetic images to mitigate training bias, some limitations remain. The model's accuracy on unseen data can still be affected by image contrast. In Experiment 4.2, for instance, both IXI and FSM provide only one manual segmentation per subject, delineated on T1w images; the reference segmentations used for the quantitative results therefore do not account for the contrasts of the other sequences, which may affect the results. We also observed that the smallest subregions (anterior and posterior) had the greatest variability, especially in DSC, an overlap measure known to be sensitive to small structures [42].

To the best of our knowledge, we have presented the first automated method for hypothalamic subregion segmentation capable of working across different in vivo MRI sequences and resolutions without retraining. By producing reliable and consistent segmentations, H-SynEx facilitates the analysis of the hypothalamus in various pre-existing datasets, whether in research or clinical settings. Our tool is publicly available and has the potential to increase our understanding of the roles played by the hypothalamus and its subregions in neurodegenerative diseases and other related conditions.

6 Acknowledgements

L. Rodrigues acknowledges the Coordination for the Improvement of Higher Education Personnel (88887.716540/2022-00). M. Bocchetta is supported by a Fellowship award from the Alzheimer's Society, UK (AS-JF-19a-004-517). J. E. Iglesias acknowledges NIH 1RF1MH123195, 1R01AG070988, and a grant from the Jack Satter Foundation. L. Rittner acknowledges CNPq 313598/2020-7 and FAPESP 2013/07559-3. S. Appenzeller acknowledges CAPES Print, CAPES 001, and BRAINN.

References

  • [1] C. Neudorfer, J. Germann, G. J. Elias, R. Gramer, A. Boutet, A. M. Lozano, A high-resolution in vivo magnetic resonance imaging atlas of the human hypothalamic region, Scientific Data 7 (1) (2020) 305.
  • [2] C. B. Saper, B. B. Lowell, The hypothalamus, Current Biology 24 (23) (2014) R1111–R1116.
  • [3] R. Piyush, S. Ramakrishnan, Analysis of sub-anatomic volume changes in Alzheimer brain using diffusion tensor imaging, in: 2014 40th Annual Northeast Bioengineering Conference (NEBEC), IEEE, 2014, pp. 1–2.
  • [4] S. Gabery, N. Georgiou-Karistianis, et al., Volumetric analysis of the hypothalamus in Huntington disease using 3T MRI: The IMAGE-HD study, PLoS One 10 (2) (2015) e0117593.
  • [5] D. M. Bartlett, A. Reyes, et al., Investigating the relationships between hypothalamic volume and measures of circadian rhythm and habitual sleep in premanifest Huntington's disease, Neurobiology of Sleep and Circadian Rhythms 6 (2019) 1–8.
  • [6] M. Bocchetta, E. Gordon, et al., Detailed volumetric analysis of the hypothalamus in behavioral variant frontotemporal dementia, Journal of Neurology 262 (2015) 2635–2642.
  • [7] O. Piguet, Å. Petersén, B. Yin Ka Lam, S. Gabery, K. Murphy, J. R. Hodges, G. M. Halliday, Eating and hypothalamus changes in behavioral-variant frontotemporal dementia, Annals of Neurology 69 (2) (2011) 312–319.
  • [8] M. Gorges, P. Vercruysse, et al., Hypothalamic atrophy is related to body mass index and age at onset in amyotrophic lateral sclerosis, Journal of Neurology, Neurosurgery & Psychiatry 88 (12) (2017) 1033–1041.
  • [9] R. M. Ahmed, F. Steyn, L. Dupuis, Hypothalamus and weight loss in amyotrophic lateral sclerosis, Handbook of Clinical Neurology 180 (2021) 327–338.
  • [10] J. Seong, J. Y. Kang, J. S. Sun, K. W. Kim, Hypothalamic inflammation and obesity: a mechanistic review, Archives of pharmacal research 42 (2019) 383–392.
  • [11] S. Modi, D. Thaploo, et al., Individual differences in trait anxiety are associated with gray matter alterations in hypothalamus: Preliminary neuroanatomical evidence, Psychiatry Research: Neuroimaging 283 (2019) 45–54.
  • [12] F. H. Wolfe, G. Auzias, et al., Focal atrophy of the hypothalamus associated with third ventricle enlargement in autism spectrum disorder, Neuroreport 26 (17) (2015) 1017–1022.
  • [13] M. Gutierrez, M. Garcia, J. Rodriguez, S. Rivero, S. Jacobelli, Hypothalamic-pituitary-adrenal axis function and prolactin secretion in systemic lupus erythematosus, Lupus 7 (6) (1998) 404–408.
  • [14] B. Billot, M. Bocchetta, et al., Automated segmentation of the hypothalamus and associated subunits in brain MRI, NeuroImage 223 (2020) 117287.
  • [15] L. Rodrigues, T. J. R. Rezende, G. Wertheimer, Y. Santos, M. França, L. Rittner, A benchmark for hypothalamus segmentation on T1-weighted MR images, NeuroImage 264 (2022) 119741.
  • [16] S. Estrada, D. Kügler, E. Bahrami, P. Xu, D. Mousa, M. Breteler, N. A. Aziz, M. Reuter, FastSurfer-HypVINN: Automated sub-segmentation of the hypothalamus and adjacent structures on high-resolutional brain MRI, arXiv preprint arXiv:2308.12736 (2023).
  • [17] J. Wolff, S. Schindler, et al., A semi-automated algorithm for hypothalamus volumetry in 3 Tesla magnetic resonance images, Psychiatry Research: Neuroimaging 277 (2018) 45–51.
  • [18] E. A. Schur, S. J. Melhorn, S.-K. Oh, J. M. Lacy, K. E. Berkseth, S. J. Guyenet, J. A. Sonnen, V. Tyagi, M. Rosalynn, B. De Leon, et al., Radiologic evidence that hypothalamic gliosis is associated with obesity and insulin resistance in humans, Obesity 23 (11) (2015) 2142–2148.
  • [19] L. Rodrigues, T. Rezende, et al., Hypothalamus fully automatic segmentation from MR images using a U-Net based architecture, in: 15th SIPAIM, Vol. 11330, International Society for Optics and Photonics, 2020, p. 113300J.
  • [20] D. N. Greve, B. Billot, D. Cordero, A. Hoopes, M. Hoffmann, A. V. Dalca, B. Fischl, J. E. Iglesias, J. C. Augustinack, A deep learning toolbox for automatic segmentation of subcortical limbic structures from MRI images, NeuroImage 244 (2021) 118610.
  • [21] A. R. Fayjie, R. Dutta, P. Kashyap, U. R. Kumar, P. Vandewalle, Semi-supervised adversarial few-shot learning for medical image segmentation (2022).
  • [22] G. Bortsova, F. Dubost, L. Hogeweg, I. Katramados, M. De Bruijne, Semi-supervised medical image segmentation via learning consistency under transformations, in: Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part VI 22, Springer, 2019, pp. 810–818.
  • [23] V. Thambawita, P. Salehi, S. A. Sheshkal, S. A. Hicks, H. L. Hammer, S. Parasa, T. d. Lange, P. Halvorsen, M. A. Riegler, SinGAN-Seg: Synthetic training data generation for medical image segmentation, PLoS One 17 (5) (2022) e0267976.
  • [24] B. Billot, D. N. Greve, O. Puonti, A. Thielscher, K. Van Leemput, B. Fischl, A. V. Dalca, J. E. Iglesias, et al., SynthSeg: Segmentation of brain MRI scans of any contrast and resolution without retraining, Medical Image Analysis 86 (2023) 102789.
  • [25] J. E. Iglesias, EasyReg: A ready-to-use deep learning tool for symmetric affine and nonlinear brain MRI registration (2023).
  • [26] J. E. Iglesias, J. C. Augustinack, K. Nguyen, C. M. Player, A. Player, M. Wright, N. Roy, M. P. Frosch, A. C. McKee, L. L. Wald, et al., A computational atlas of the hippocampal formation using ex vivo, ultra-high resolution MRI: application to adaptive segmentation of in vivo MRI, NeuroImage 115 (2015) 117–137.
  • [27] Z. M. Saygin, D. Kliemann, J. E. Iglesias, A. J. van der Kouwe, E. Boyd, M. Reuter, A. Stevens, K. Van Leemput, A. McKee, M. P. Frosch, et al., High-resolution magnetic resonance imaging reveals nuclei of the human amygdala: manual segmentation to automatic atlas, Neuroimage 155 (2017) 370–382.
  • [28] J. E. Iglesias, R. Insausti, G. Lerma-Usabiaga, M. Bocchetta, K. Van Leemput, D. N. Greve, A. Van der Kouwe, B. Fischl, C. Caballero-Gaudes, P. M. Paz-Alonso, et al., A probabilistic atlas of the human thalamic nuclei combining ex vivo MRI and histology, NeuroImage 183 (2018) 314–326.
  • [29] I. Costantini, L. Morgan, J. Yang, Y. Balbastre, D. Varadarajan, L. Pesce, M. Scardigli, G. Mazzamuto, V. Gavryusev, F. M. Castelli, et al., A cellular resolution atlas of Broca's area, Science Advances 9 (41) (2023) eadg3844.
  • [30] Distributed archives for neurophysiology data integration, doi:10.5281/zenodo.7041535.
  • [31] P. J. LaMontagne, T. L. Benzinger, J. C. Morris, S. Keefe, R. Hornbeck, C. Xiong, E. Grant, J. Hassenstab, K. Moulder, A. Vlassenko, et al., OASIS-3: longitudinal neuroimaging, clinical, and cognitive dataset for normal aging and Alzheimer disease, MedRxiv (2019).
  • [32] IXI Dataset, https://brain-development.org/ixi-dataset/, accessed: 2023-11-29.
  • [33] S. G. Mueller, M. W. Weiner, L. J. Thal, R. C. Petersen, C. Jack, W. Jagust, J. Q. Trojanowski, A. W. Toga, L. Beckett, The Alzheimer's disease neuroimaging initiative, Neuroimaging Clinics 15 (4) (2005) 869–877.
  • [34] NIFD Dataset, https://ida.loni.usc.edu/collaboration/access/appLicense.jsp, accessed: 2023-11-29.
  • [35] L. Rodrigues, M. Bocchetta, O. Puonti, D. Greve, A. C. Londe, M. França, S. Appenzeller, L. Rittner, J. E. Iglesias, High-resolution segmentations of the hypothalamus and its subregions for training of segmentation models (2024). arXiv:2406.19492.
  • [36] M. Modat, J. McClelland, S. Ourselin, Lung registration using the NiftyReg package, Medical Image Analysis for the Clinic: A Grand Challenge 2010 (2010) 33–42.
  • [37] X. Liu, L. Yang, J. Chen, S. Yu, K. Li, Region-to-boundary deep learning model with multi-scale feature fusion for medical image segmentation, Biomedical Signal Processing and Control 71 (2022) 103165.
  • [38] A. Wolny, L. Cerrone, A. Vijayan, R. Tofanelli, A. V. Barro, M. Louveaux, C. Wenzl, S. Strauss, D. Wilson-Sánchez, R. Lymbouridou, S. S. Steigleder, C. Pape, A. Bailoni, S. Duran-Nebreda, G. W. Bassel, J. U. Lohmann, M. Tsiantis, F. A. Hamprecht, K. Schneitz, A. Maizel, A. Kreshuk, Accurate and versatile 3D segmentation of plant tissues at cellular resolution, eLife 9 (2020) e57613. doi:10.7554/eLife.57613.
  • [39] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, O. Ronneberger, 3D U-Net: learning dense volumetric segmentation from sparse annotation, in: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, October 17-21, 2016, Proceedings, Part II 19, Springer, 2016, pp. 424–432.
  • [40] A. Tao, Z. Myslinski, Y. Pan, C. Iadecola, J. Dyke, G. Chiang, M. Ishii, Hypothalamic atrophy in Alzheimer's disease (1819) (2021).
  • [41] E. R. DeLong, D. M. DeLong, D. L. Clarke-Pearson, Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach, Biometrics (1988) 837–845.
  • [42] A. A. Taha, A. Hanbury, Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool, BMC medical imaging 15 (1) (2015) 1–28.
  • [43] P. Vercruysse, D. Vieau, et al., Hypothalamic alterations in neurodegenerative diseases and their relation to abnormal energy metabolism, Front. Mol. Neurosci. 11 (2018) 2.