Mask-Guided Attention U-Net for Enhanced Neonatal Brain Extraction and Image Preprocessing

Bahram Jafrasteh¹, Simón Pedro Lubián-López, Emiliano Trimarco, Macarena Román Ruiz, Carmen Rodríguez Barrios, Yolanda Marín Almagro, Isabel Benavente-Fernández

¹Present address: Weill Cornell Medicine, Department of Radiology.

Biomedical Research and Innovation Institute of Cádiz (INiBICA) Research Unit, Puerta del Mar University, Cádiz, Spain
Division of Neonatology, Department of Pediatrics, Puerta del Mar University Hospital, Cádiz, Spain
Area of Pediatrics, Department of Child and Mother Health and Radiology, Medical School, University of Cádiz, Cádiz, Spain

Abstract

In this study, we introduce MGA-Net, a novel mask-guided attention neural network that extends the U-Net model for neonatal brain imaging. MGA-Net is designed to separate the brain from surrounding non-brain structures and to reconstruct high-quality brain images. The network employs a shared encoder and two decoders: one for brain mask extraction and the other for brain image reconstruction. A key feature of MGA-Net is its high-level mask-guided attention module, which uses features from the brain mask decoder to enhance image reconstruction. To enable the same encoder and decoders to process both MRI and ultrasound (US) images, MGA-Net integrates sinusoidal positional encoding, which assigns distinct positional values to MRI and US images and allows the model to learn effectively from both modalities. Consequently, features learned from one modality can aid learning in a modality with less available data, such as US. We extensively validated MGA-Net on diverse datasets from varied clinical settings and neonatal age groups. The assessment metrics included the DICE similarity coefficient, mean surface distance, recall, and accuracy for brain extraction; structural similarity and peak signal-to-noise ratio for image reconstruction; and root mean squared error and R² for total brain volume estimation from 3D ultrasound images. Our results show that MGA-Net outperforms the compared methods (BET, SynthStrip, and NPP) on most datasets, offering better brain extraction and segmentation while achieving high precision in image reconstruction and volumetric analysis. Thus, MGA-Net is a robust and effective preprocessing tool for MRI and 3D ultrasound images that can support both research and clinical diagnostics in the neonatal period and beyond.

Keywords:

Deep Learning, Mask guided attention, U-net Architecture, Multimodal Image Processing, Brain Volume Estimation

1 Introduction

Neuroimaging studies are crucial for advancing our understanding of brain development and function. Preprocessing directly influences the quality and effectiveness of subsequent analyses (Biessmann et al., 2011). This is particularly true in neonatal neuroimaging, where data must be handled precisely and sensitively to accommodate the unique challenges of this population. Preprocessing for neonates, and even more so for preterm-born neonates, is a critical area of research given the vulnerability of their developing brains and the clinical implications of these studies (Hintz et al., 2015; Hinojosa-Rodríguez et al., 2017). During the neonatal period, Magnetic Resonance Imaging (MRI) and 3D ultrasound (US) are the primary modalities for brain analysis. Both techniques offer distinct advantages and are often complementary. MRI provides high-resolution, detailed structural images of the brain, which are crucial for identifying developmental anomalies and guiding therapeutic interventions (Counsell et al., 2019). 3D ultrasound, on the other hand, is a safe, real-time, and more accessible option that can be used at the bedside to monitor cerebral anatomy and vascular structures dynamically (Stanojevic et al., 2002; Beijst et al., 2020). Integrating these two modalities enables a comprehensive approach to neonatal brain imaging, allowing assessments that are not possible with either modality alone. However, effective integration of MRI and ultrasound data requires sophisticated preprocessing techniques that can handle diverse data types, enhance image quality, and prepare datasets for further analysis. There is therefore a need for advanced preprocessing tools that leverage the strengths of both MRI and ultrasound. Such tools must not only improve the clarity and usability of individual images but also harmonize data from different sources to provide a more holistic view of neonatal brain health.

Neuroimaging data preprocessing involves tasks such as brain extraction, bias field correction, noise removal, spacing adjustment, and brain alignment. Brain extraction, commonly referred to as skull stripping, plays a fundamental role in most preprocessing workflows; it removes non-brain tissues, including the skull and neck fat, from MRI scans (Thakur et al., 2020). Because it also removes facial features, this step supports compliance with privacy regulations such as the Health Insurance Portability and Accountability Act of 1996 (HIPAA) and the General Data Protection Regulation of 2016 (GDPR). Furthermore, brain extraction significantly affects many downstream analyses, including brain tumor and white matter lesion segmentation, cortical surface reconstruction, surgical interventions, neurodegeneration studies, radiation therapy planning, image registration, and prediction of diseases such as Alzheimer's disease and multiple sclerosis (Zhao et al., 2010; Gitler et al., 2017; Radue et al., 2015).

Manual brain delineation, the prevailing approach to skull stripping, is a labor-intensive process in which an expert traces the brain border. This step is crucial for further brain analysis, such as tissue segmentation (Jafrasteh et al., 2024), which allows detailed examination of brain morphology and pathology. However, manual delineation is time-consuming, susceptible to inconsistencies between raters, and difficult to reproduce. In contrast, the subsequent preprocessing steps can be executed automatically with less supervision. There is therefore growing emphasis on developing and using automatic methods. In recent years, deep learning techniques, particularly Convolutional Neural Networks (CNNs), have emerged as state-of-the-art tools for image segmentation (Hwang et al., 2019; Yu et al., 2022; Hoopes et al., 2022; Ranjbar et al., 2022; Pei et al., 2022).

While various preprocessing methods for MRI exist, most have targeted images acquired with strictly standardized protocols, neglecting those gathered in less controlled settings. Additionally, preprocessing of MRI data from preterm neonates, especially those with pathological conditions, remains largely unexplored. Recent studies such as Iglesias et al. (Iglesias et al., 2023) have proposed artificial-intelligence-based methods to improve the quality of MRI images; however, their primary focus is not neonates. In 3D US imaging, the focus has predominantly been on fetal US, with few studies addressing the preprocessing of images from neonates, including those born preterm (Namburete et al., 2018; Jafrasteh et al., 2023). This lack of automated methods for ultrasound preprocessing hinders the use of ultrasound in broader neuroimaging studies (Moser et al., 2022).

Although separate deep learning models could be trained to preprocess MRI and US images, this approach faces significant challenges, primarily the scarcity of large datasets, particularly for ultrasound. Deep learning models require substantial amounts of data to perform robustly, so the limited availability of comprehensive US datasets severely restricts the effectiveness of models tailored to a single modality. Conversely, training a unified model on paired MRI and US images may seem attractive, but collecting enough paired data is often infeasible, making this strategy impractical for widespread use. Given these challenges, there is a compelling need for a deep learning approach that effectively leverages the available data across imaging modalities.

Recently, mask-guided attention has improved performance in tasks such as image classification (Wang et al., 2021), re-identification (Cai et al., 2019), and occluded pedestrian detection (Pang et al., 2019). However, to the best of our knowledge, few studies have applied mask-guided attention to image reconstruction, and in particular to brain image preprocessing. In this study, we developed a novel mask-guided attention module that uses brain masks to preprocess and reconstruct meaningful MRI and 3D neonatal ultrasound images. Moreover, sinusoidal positional encoding is used to enhance the model's ability to process distinct modalities within the same framework. MGA-Net is designed to perform critical tasks such as brain extraction, bias field correction, and noise removal for MRI, as well as similar preprocessing steps and total brain volume estimation for 3D neonatal US images. This integrated approach addresses data scarcity and enhances the model's applicability in clinical settings. MGA-Net was rigorously validated on diverse MRI datasets, including data collected under less standardized conditions, demonstrating the model's robustness to varying data quality (Figure 1). Additionally, a dataset of real 3D ultrasound images from preterm neonates was used to further validate and refine the ultrasound preprocessing capabilities of MGA-Net. Together, these choices make better use of the available data and improve the efficacy of preprocessing tasks across modalities.

Figure 1: Example of brain extraction and preprocessing of T1w MRI images using MGA-Net across various ages, from neonates (25 to 50 weeks postmenstrual age) to eight-year-old children.

1.1 Related studies

1.1.1 MRI

Among traditional brain extraction tools, the widely used Brain Extraction Tool (BET) introduced by Smith (Smith, 2002) deserves mention. As a robust algorithm, BET provides a baseline for comparison with more advanced deep learning methods. Pei et al. (Pei et al., 2022) propose an ensemble neural network (EnNet) based on 3D Convolutional Neural Networks (3DCNN) for skull stripping in multiparametric MRI scans. Assessing the performance of various combinations of image modalities, the authors present a comprehensive dataset for validation and a fully automated skull-stripping method. Similarly, Ranjbar et al. (Ranjbar et al., 2022) focus on skull stripping in MRI images of brain tumor patients. Using the Dense-Vnet model, they address the limited availability of public training data, highlighting the potential of weakly supervised deep learning for MRI brain extraction even in the presence of pathologies.

Yu et al. (Yu et al., 2022) introduce the Brain Extraction Net (BEN), a domain-adaptive, semi-supervised deep neural network designed to extract brain tissue across species, MRI modalities, and MR scanners; BEN was evaluated on a diverse set of datasets. Hoopes et al. (Hoopes et al., 2022) present SynthStrip, a learning-based skull-stripping tool trained on synthetic data to handle different imaging protocols. SynthStrip does not require images of specific target contrasts during training, and it substantially improves accuracy over existing skull-stripping baselines across diverse MRI scenarios. Beyond preprocessing itself, studies such as those by Zhao et al. (Zhao et al., 2010), Gitler et al. (Gitler et al., 2017), and Radue et al. (Radue et al., 2015) address applications that depend on it, such as neurodegenerative diseases, cortical surface reconstruction, and radiation therapy.

1.1.2 Ultrasound imaging

In ultrasound imaging, most recent studies have relied on automatic methods. Namburete et al. (Namburete et al., 2018) use a stack of 16 2D axial slices in a multitask learning model to extract the brain from fetal ultrasound images. Moser et al. (Moser et al., 2020) use a 3D U-Net architecture to extract fetal brains from 3D ultrasound images of healthy fetuses with gestational ages from 14 to 31 weeks. Moser et al. (Moser et al., 2022) introduce the Brain Extraction and Alignment Network (BEAN), a convolutional neural network (CNN) that automates the initial steps of neuroimage analysis, namely brain extraction and alignment, for 3D ultrasound scans of the fetal brain. BEAN is a multi-task CNN with two independent branches for these tasks and has been evaluated on images of fetuses at different postmenstrual ages (PMAs). As Moser et al. (Moser et al., 2022) highlight, the literature on ultrasound preprocessing remains limited, so further research in this area is needed.

2 The Proposed MGA-Net

We propose a novel architecture based on U-Net for brain extraction and image preprocessing. The network takes either an MRI or a 3D ultrasound (US) image as input. To account for the distinct spatial characteristics of the two modalities, sinusoidal positional encoding is incorporated into the network architecture: the encoding position is set to 1 for MRI images and -1 for US images. This strategy was inspired by Vaswani et al. (Vaswani et al., 2017), who introduced positional encoding in the Transformer architecture for sequence-to-sequence tasks. By tailoring the positional encoding to MRI and US images, we aim to enable the network to capture and use modality-specific information. Convolutional layers are used for down-sampling throughout the U-Net, and nearest-neighbor interpolation is used for up-sampling. In addition, an attention module in the bottleneck layer improves the robustness of the model. After the bottleneck layer, there are two decoders. The first decoder generates non-strict brain boundaries using a signed distance transform (SDT), similar to Hoopes et al. (Hoopes et al., 2022). Each voxel's distance to the brain mask boundary is positive if the voxel lies outside the brain and negative if it lies inside. To reconstruct the image, we keep the region within three voxels of the boundary and remove everything else.
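A minimal sketch of the modality encoding follows, assuming the standard Transformer formula PE(p, 2i) = sin(p / 10000^(2i/d)) and PE(p, 2i+1) = cos(p / 10000^(2i/d)) evaluated at the scalar position p = 1 (MRI) or p = -1 (US). How MGA-Net injects the code into the network is not stated here; broadcasting it onto the channels of a feature map, the channel count, and the helper names `modality_encoding` and `add_modality_encoding` are illustrative assumptions.

```python
import numpy as np

def modality_encoding(position: float, dim: int) -> np.ndarray:
    """Sinusoidal encoding of a scalar position, following Vaswani et al. (2017)."""
    if dim % 2:
        raise ValueError("dim must be even")
    i = np.arange(dim // 2)
    freqs = 1.0 / 10000.0 ** (2 * i / dim)   # one frequency per sin/cos pair
    angles = position * freqs
    enc = np.empty(dim)
    enc[0::2] = np.sin(angles)                # even channels: sine
    enc[1::2] = np.cos(angles)                # odd channels: cosine
    return enc

def add_modality_encoding(features: np.ndarray, position: float) -> np.ndarray:
    """Broadcast-add the encoding over a (C, D, H, W) feature map (assumed injection scheme)."""
    enc = modality_encoding(position, features.shape[0])
    return features + enc[:, None, None, None]

MRI, US = 1.0, -1.0                           # positions given in the text
feats = np.zeros((8, 4, 4, 4))                # toy feature map with 8 channels
mri_feats = add_modality_encoding(feats, MRI)
us_feats = add_modality_encoding(feats, US)
```

Because sine is odd and cosine is even, the MRI and US codes differ only in their sine channels, where they have opposite signs; the cosine channels are identical.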

The second decoder generates the final preprocessed image. Both decoders use skip connections to preserve and propagate important features. The second decoder also employs a key component, mask-guided attention, which improves the network's handling of complex image features and structures.

Mask-guided attention enhances the network’s ability to focus on relevant regions when processing images. This attention mechanism dynamically adjusts the importance of different parts of the image based on both local features extracted from the brain extraction decoder and global features extracted from the image reconstruction branch. Specifically, it consists of query (Q), key (K), and value (V) matrices.

The key and value matrices are derived from the features extracted by the brain extraction decoder, which encodes the spatial information relevant to the brain structures. The query matrix incorporates features from the image reconstruction branch, which allows the attention mechanism to consider the overall context of the image during processing.

By integrating mask-guided attention, the network can effectively attend to important regions for both brain extraction and image preprocessing tasks, thus improving its ability to capture relevant information and enhance feature representation.

In our implementation, we adopted a four-head attention mechanism to capture intricate relationships and enhance the model’s capacity to handle diverse features effectively.
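The mask-guided attention described above can be sketched as multi-head cross-attention in which the queries come from the reconstruction branch and the keys and values from the brain mask decoder. This NumPy version is illustrative only: the random projection matrices stand in for learned weights, voxels are assumed to be flattened into N tokens of C channels, the sizes are hypothetical, and any residual connection or normalization used in MGA-Net is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mask_guided_attention(rec_feats, mask_feats, w_q, w_k, w_v, w_o, num_heads=4):
    """Cross-attention: Q from the reconstruction branch, K and V from the mask decoder.

    rec_feats, mask_feats: (N, C) arrays of flattened voxel tokens.
    w_q, w_k, w_v, w_o: (C, C) projections (random stand-ins for learned weights).
    """
    n, c = rec_feats.shape
    if c % num_heads:
        raise ValueError("channels must be divisible by num_heads")
    d = c // num_heads
    # Project, then split channels into heads: (heads, N, d)
    q = (rec_feats @ w_q).reshape(n, num_heads, d).transpose(1, 0, 2)
    k = (mask_feats @ w_k).reshape(n, num_heads, d).transpose(1, 0, 2)
    v = (mask_feats @ w_v).reshape(n, num_heads, d).transpose(1, 0, 2)
    # Scaled dot-product attention computed independently per head
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d), axis=-1)  # (heads, N, N)
    heads = weights @ v                                                 # (heads, N, d)
    out = heads.transpose(1, 0, 2).reshape(n, c) @ w_o                 # merge heads
    return out, weights

rng = np.random.default_rng(0)
N, C = 64, 16          # e.g. a 4x4x4 feature map with 16 channels (hypothetical)
rec = rng.standard_normal((N, C))
msk = rng.standard_normal((N, C))
w_q, w_k, w_v, w_o = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(4))
out, attn = mask_guided_attention(rec, msk, w_q, w_k, w_v, w_o, num_heads=4)
```

With four heads, each head attends over the mask-decoder tokens in its own 4-channel subspace, which is one way the module could capture several kinds of relationship at once.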

The network output consists of two components: the brain extraction decoder output and the preprocessed image decoder output. Figure 2 illustrates the architecture of the proposed MGA-Net, highlighting the integration of mask-guided attention for improved performance.

To preprocess MRI images, we applied noise reduction with the Advanced Normalization Tools (ANTs) package, N4 bias field correction (also from ANTs), and contrast-limited adaptive histogram equalization (CLAHE) to enhance the image histogram. US images were preprocessed in the same way, except that bias field correction was not applied.
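For reference, this conventional pipeline could be sketched with ANTsPy and scikit-image as below. This is an assumption-laden sketch rather than the authors' script: it calls `ants.denoise_image`, `ants.n4_bias_field_correction`, and `skimage.exposure.equalize_adapthist` with their default parameters, and the hypothetical helper `rescale01` maps intensities to [0, 1] before CLAHE. The imports sit inside the function so the module loads even when those libraries are not installed.

```python
import numpy as np

def rescale01(arr):
    """Linearly map intensities to [0, 1]; the epsilon avoids division by zero."""
    lo, hi = float(arr.min()), float(arr.max())
    return (arr - lo) / (hi - lo + 1e-8)

def preprocess_volume(path, modality="MRI"):
    """Denoise, bias-correct (MRI only), and apply CLAHE to a 3D image file."""
    import ants                      # ANTsPy (pip package `antspyx`)
    from skimage import exposure     # scikit-image

    img = ants.image_read(path)
    img = ants.denoise_image(img)    # ANTs adaptive non-local means denoising
    if modality == "MRI":            # US images skip bias field correction
        img = ants.n4_bias_field_correction(img)
    arr = rescale01(img.numpy().astype(np.float64))
    arr = exposure.equalize_adapthist(arr)   # CLAHE; recent versions accept 3D arrays
    return img.new_image_like(arr.astype(np.float32))
```

Rescaling before CLAHE keeps the input in the bounded range the histogram equalizer expects, independent of the scanner's raw intensity scale.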

We define the loss function $\mathcal{L}$ as the sum of the mean squared error (MSE) for brain extraction, $\mathcal{L}_{\text{mask}}$, the MSE of image reconstruction, $\mathcal{L}_{\text{MSE}}$, and a structural similarity index (SSIM) term for image reconstruction, $\mathcal{L}_{\text{SSIM}}$ (Wang et al., 2004):

$$\mathcal{L}=\mathcal{L}_{\text{mask}}+\mathcal{L}_{\text{MSE}}+\mathcal{L}_{\text{SSIM}}.$$
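As a concrete reading of this loss, the sketch below assumes that the brain-extraction MSE is computed on the SDT output, that the SSIM term enters as 1 - SSIM (so minimizing the loss maximizes similarity), and that all three terms have unit weight. For brevity it uses a single global SSIM window instead of the local Gaussian windows of Wang et al. (2004); `global_ssim` and `mga_loss` are hypothetical names.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM over the whole volume (a simplification of windowed SSIM)."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def mga_loss(sdt_pred, sdt_true, img_pred, img_true):
    """L = L_mask + L_MSE + L_SSIM, with L_SSIM = 1 - SSIM (assumed form)."""
    l_mask = np.mean((sdt_pred - sdt_true) ** 2)     # MSE on the brain-extraction output
    l_mse = np.mean((img_pred - img_true) ** 2)      # MSE on the reconstructed image
    l_ssim = 1.0 - global_ssim(img_pred, img_true)   # structural dissimilarity
    return l_mask + l_mse + l_ssim

rng = np.random.default_rng(0)
sdt = rng.standard_normal((16, 16, 16))
img = rng.uniform(0.0, 1.0, (16, 16, 16))
loss = mga_loss(sdt, sdt, img, img)   # perfect prediction gives a loss of about 0
```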

At the core of our augmentation strategy lies histogram matching, a technique that aligns the intensity distributions of input images with reference distributions, enhancing the model's ability to learn features specific to non-brain regions. We compute histograms of brain and non-brain tissues in a given image and synthesize reference histograms from pre-existing intensity data; the resulting augmented images present non-brain structures with realistic intensity distributions while remaining consistent with the existing segmentation masks. The intensity data were pre-computed from the average histograms of the T1-weighted and T2-weighted images in the training sets, so this augmentation is applied only to T1w and T2w images. For other image sequences and for US images, we randomly apply histogram equalization instead.

Because the intensity distributions are adjusted without altering spatial relationships, the model learns to reconstruct images free of non-brain structures with improved accuracy and reliability.
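A minimal sketch of the non-brain histogram matching follows, assuming classic CDF-based matching (the same idea as scikit-image's `match_histograms`) applied only to voxels outside the brain mask. The `reference_intensities` sample stands in for the pre-computed average T1w/T2w histograms; how those references are synthesized is not reproduced here, and the function names and toy data are hypothetical.

```python
import numpy as np

def match_to_reference(values, reference):
    """Map 1D `values` so their empirical CDF follows that of `reference` (ties kept together)."""
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    src_cdf = np.cumsum(counts) / values.size
    ref_vals, ref_counts = np.unique(reference, return_counts=True)
    ref_cdf = np.cumsum(ref_counts) / reference.size
    mapped = np.interp(src_cdf, ref_cdf, ref_vals)   # invert the reference CDF
    return mapped[inverse.ravel()]

def augment_non_brain(image, brain_mask, reference_intensities):
    """Histogram-match only non-brain voxels; brain voxels and geometry are unchanged."""
    out = image.astype(np.float64).copy()
    outside = ~brain_mask.astype(bool)
    out[outside] = match_to_reference(out[outside], np.asarray(reference_intensities))
    return out

rng = np.random.default_rng(1)
img = rng.uniform(0.0, 1.0, size=(8, 8, 8))
mask = np.zeros(img.shape, dtype=bool)
mask[2:6, 2:6, 2:6] = True                   # toy "brain"
ref = rng.normal(3.0, 0.5, size=5000)        # hypothetical reference intensities
aug = augment_non_brain(img, mask, ref)
```

Since the mask is never modified, the same ground truth label remains valid for every augmented copy, which is the property the paragraph above relies on.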

Other data augmentation techniques include sagittal, axial, and coronal rotations from -10 to 10 degrees and image zooming by varying the image spacing from 0.5 to a maximum of 4, while ensuring that the image size is not reduced to less than half of the original size. In addition, we perform image cropping by varying the SDT threshold of the ground truth mask from 0 to 4, and we introduce motion blur with kernel sizes of 5 and 11 to make the network robust to motion blur effects. Each of these augmentations generates new images together with their corresponding ground truth labels, improving the network's generalizability. Finally, we retain the noise removed during the preprocessing steps and add it back to the image during training, scaled by a factor randomly drawn from a normal distribution, to simulate image noise.
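Two of these augmentations can be sketched in a few lines under stated assumptions: motion blur is modeled here as a uniform line kernel (size 5 or 11, as in the text) along one axis, and the noise term is taken as the difference between the raw and denoised images, added back with a factor drawn from a normal distribution whose standard deviation `sigma` is a hypothetical choice. MGA-Net may use other blur kernels or noise models; the helper names are illustrative.

```python
import numpy as np

def motion_blur(volume, kernel_size=5, axis=0):
    """Uniform line blur of `kernel_size` voxels along `axis` (zero padding at the edges)."""
    if volume.shape[axis] < kernel_size:
        raise ValueError("axis shorter than the blur kernel")
    kernel = np.ones(kernel_size) / kernel_size
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), axis, volume)

def reinject_noise(image, raw, denoised, rng, sigma=1.0):
    """Add back the noise removed by denoising, scaled by a factor drawn from N(0, sigma^2)."""
    noise = raw - denoised                    # noise estimate retained from preprocessing
    return image + rng.normal(0.0, sigma) * noise

rng = np.random.default_rng(2)
raw = rng.normal(1.0, 0.1, size=(16, 16, 16))
denoised = np.full_like(raw, 1.0)             # toy "denoised" volume
blurred = motion_blur(denoised, kernel_size=11, axis=2)
noisy = reinject_noise(denoised, raw, denoised, rng)
```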

Finally, our model produces preprocessed MRI/US images that extend slightly beyond the brain boundary: images are reconstructed within an SDT of 3, while the brain boundary itself corresponds to an SDT of zero, which provides flexibility in brain extraction.
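The SDT convention can be sketched with SciPy's Euclidean distance transform, assuming distances in voxel units: values are positive outside the brain and negative inside, so thresholding at 0 recovers the brain mask and thresholding at 3 keeps a three-voxel margin. The helper names and the toy spherical "brain" are hypothetical; the same thresholding with values from 0 to 4 could also serve for the cropping augmentation.

```python
import numpy as np
from scipy import ndimage

def signed_distance(mask):
    """Signed distance in voxels: > 0 outside the brain, < 0 inside."""
    mask = mask.astype(bool)
    outside = ndimage.distance_transform_edt(~mask)  # background voxel -> nearest brain voxel
    inside = ndimage.distance_transform_edt(mask)    # brain voxel -> nearest background voxel
    return outside - inside

def region_from_sdt(sdt, threshold=3.0):
    """Voxels kept for reconstruction: the brain plus a margin of `threshold` voxels."""
    return sdt <= threshold

zz, yy, xx = np.mgrid[:24, :24, :24]
brain = (zz - 12) ** 2 + (yy - 12) ** 2 + (xx - 12) ** 2 <= 6 ** 2   # toy spherical brain
sdt = signed_distance(brain)
recon_region = region_from_sdt(sdt, 3.0)
image = np.random.default_rng(3).uniform(size=brain.shape)
cropped = np.where(recon_region, image, 0.0)   # content beyond the margin removed
```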

3 Experimental settings

Our training dataset comprised a diverse set of medical images, totaling 710 distinct MRI sequences and 148 ultrasound images, as detailed in Table 1. Because the available neonatal data are insufficient for training deep learning models, we incorporated various MRI sequences from multiple datasets, allowing the proposed model to learn from a diverse range of data and thereby improving its generalizability. The IXI dataset (available at http://brain-development.org/ixi-dataset) provides a comprehensive range of MRI contrasts and modalities, including T1-weighted (T1w), T2-weighted (T2w), proton density-weighted (PDw), magnetic resonance angiography (MRA), and diffusion-weighted imaging (DWI). We integrated this dataset into our training data using a randomly selected DWI image for training, following the approach of Hoopes et al. (Hoopes et al., 2022). We also incorporated data from the FSM dataset (Greve et al., 2021), which includes standard acquisitions and quantitative T1 maps (qT1). Furthermore, our training dataset includes pseudo-continuous arterial spin labeling (PCASL) scans acquired as stacks of 2D-EPI slices with low resolution and a limited field of view (FOV), as described in (Dai et al., 2008; Hoopes et al., 2022). We also included data from the QIN dataset drawn from previous studies (Mamonov and Kalpathy-Cramer, 2016; Clark et al., 2013; Prah et al., 2015), consisting of pre-contrast clinical image stacks with thick slices from patients newly diagnosed with glioblastoma. Finally, we incorporated datasets from the University Hospital Puerta del Mar (HUPM), which include T1w images of preterm infants and 3D US images. The ground truth for these images was established by medical experts using the MELAGE toolbox (Jafrasteh et al., 2023).

3D ultrasound images may not capture the entire brain because of the narrow field of view through the anterior fontanel. Consequently, the ground truth mask may not represent the total brain volume (TBV). We observed that between 5

We extensively validated the proposed method on neonatal datasets. Specifically, our validation data include 228 T1-weighted and 266 T2-weighted images from the Developing Human Connectome Project (DHCP), all from neonatal subjects (Makropoulos et al., 2018). We also included T1w and T2w images from the Albert neonatal dataset (Gousias et al., 2012, 2013), obtained from the repository maintained by Gousias et al. (available at https://brain-development.org/brain-atlases/neonatal-brain-atlases/neonatal-brain-atlas-gousias/). In addition, we used 70 neonatal T1w images and ten 3D neonatal ultrasound images from the University Hospital Puerta del Mar (HUPM), obtained from patients not included in the training dataset. Finally, the BrainWeb dataset (Cocosco, 1997; Collins et al., 1998), which includes 21 simulated T1 MRI images with corresponding ground truth, was used to validate the model's performance on adult MRI images.

We also used 712 ground truth TBV values from the HUPM dataset (Jafrasteh et al., 2023; Benavente-Fernández et al., 2021) to evaluate the performance of the proposed model on 3D ultrasound images; the PMA of these patients ranged from 25 to 39 weeks. Additionally, we conducted a comparative analysis in which MRI and US images acquired on the same day were processed with MGA-Net to estimate TBV. In total, 49 paired MRI and US images were available for this analysis, from patients with PMAs of 25.4 to 52.7 weeks. We compared the proposed method with BET (Smith, 2002), Neural Preprocessing (NPP) (He et al., 2023), and SynthStrip (Hoopes et al., 2022) on the aforementioned datasets. Table 1 summarizes the training and test datasets.

Table 1: Dataset, Modality, Resolution (mm), and Number of Samples for Training and Test Data

Dataset	Modality	Resolution (mm)	No. (Training)	No. (Test)
IXI	T1w-T2w-PDw	0.9 $\times$ 0.9 $\times$ 1.2	148	-
	MRA	0.5 $\times$ 0.5 $\times$ 0.8	50	-
	DWI	1.8 $\times$ 1.8 $\times$ 2.0	32	-
FSM	T1w-T2w-PDw-qT1	1.0 $\times$ 1.0 $\times$ 1.0	138	-
ASL	T1w MPRAGE	1.0 $\times$ 1.0 $\times$ 1.0	43	-
	PCASL 2D-EPI	3.4 $\times$ 3.4 $\times$ 5.0	43	-
QIN	T1w-T2-FLAIR	0.4 $\times$ 0.4 $\times$ 0.6	71	-
	T2w 2D-FSE	1.0 $\times$ 1.0 $\times$ 5.0	39	-
Infant	T1w MPRAGE	1.0 $\times$ 1.0 $\times$ 1.0	16	-
HUPM	T1w - neonate	0.9 $\times$ 0.9 $\times$ 0.9	78	70
	3D US	0.7 $\times$ 0.7 $\times$ 0.7	148	10
	T1w - 8 years	0.9 $\times$ 0.9 $\times$ 0.9	42	-
BrainWeb	T1w	1.0 $\times$ 1.0 $\times$ 1.0	-	21
DHCP	T1w MRI	0.5 $\times$ 0.5 $\times$ 0.5	-	228
	T2w MRI	0.5 $\times$ 0.5 $\times$ 0.5	-	266
Albert	T1w	0.8 $\times$ 0.8 $\times$ 0.8	-	20
	T2w	0.8 $\times$ 0.8 $\times$ 0.8	-	20

The network inputs and outputs have dimensions of $128\times 128\times 128$. The batch size was set to $4$, and the network was trained for 1000 steps. All experiments were executed on a workstation equipped with two NVIDIA A5000 GPUs, two Intel Xeon Gold 5220R CPUs, and 128 GB of RAM. The DICE coefficient, mean surface distance (MSD), recall, and accuracy were used to assess segmentation accuracy. The structural similarity index (SSIM) and peak signal-to-noise ratio (PSNR) were used to evaluate image reconstruction performance, and the root mean square error (RMSE) and the coefficient of determination ($R^{2}$) were used to evaluate total brain volume (TBV) estimation.
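The overlap and reconstruction metrics can be written compactly from voxel-wise counts. This sketch assumes binary masks and voxel-wise accuracy over the whole volume (which voxels enter the accuracy is not stated), and it computes PSNR with the reference image's intensity range as the peak value; MSD is omitted because surface-distance definitions vary. For binary masks the DICE coefficient is identical to the F1 score, which is why the tables label it DICE(F1).

```python
import numpy as np

def confusion(pred, truth):
    """Voxel-wise true/false positive/negative counts for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth)
    return tp, fp, fn, tn

def dice(pred, truth):
    tp, fp, fn, _ = confusion(pred, truth)
    return 2 * tp / (2 * tp + fp + fn)        # identical to the F1 score

def recall(pred, truth):
    tp, _, fn, _ = confusion(pred, truth)
    return tp / (tp + fn)

def accuracy(pred, truth):
    tp, fp, fn, tn = confusion(pred, truth)
    return (tp + tn) / (tp + fp + fn + tn)

def psnr(pred, ref, data_range=None):
    """PSNR in dB; the peak defaults to the reference intensity range (assumption)."""
    ref = np.asarray(ref, dtype=np.float64)
    data_range = (ref.max() - ref.min()) if data_range is None else data_range
    mse = np.mean((np.asarray(pred, dtype=np.float64) - ref) ** 2)
    return 10 * np.log10(data_range ** 2 / mse)
```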

4 Results and discussion

Tables 2 and 3 compare the proposed method with BET, NPP, and SynthStrip on the test datasets considered in this study, using the DICE, MSD, recall, and accuracy criteria. The proposed method performed best in most cases: it generally achieved higher DICE coefficient (F1) scores and lower mean surface distance (MSD) values than the other methods, indicating better segmentation accuracy and spatial agreement with the ground truth labels.

Table 2 uses the DICE coefficient (F1) and MSD to evaluate segmentation accuracy and spatial agreement with the ground truth labels. The proposed method achieved the highest DICE scores and the lowest MSD values on most datasets.

On the DHCP (T1) and HUPM (T1) datasets, the proposed method obtained clearly better results than the other methods (e.g., DICE of 0.96 vs. 0.94 for SynthStrip on DHCP (T1) and 0.97 vs. 0.89 on HUPM (T1)). On the DHCP (T2) dataset, it matched SynthStrip in DICE (0.97) but had a slightly higher MSD (2.18 vs. 1.84). On the Albert (T1) and Albert (T2) datasets, the proposed method outperformed the others, demonstrating high segmentation accuracy and spatial agreement. On the BrainWeb dataset, it matched SynthStrip and BET in DICE (0.97) but had a slightly higher MSD (2.36 vs. 2.05 for SynthStrip and 2.0 for BET). Table 3 compares the methods in terms of recall and accuracy.

On every dataset, the proposed method achieved the highest accuracy or tied for it, whereas its recall was comparable to, and in some cases slightly lower than, that of SynthStrip and NPP. On the DHCP (T1) dataset, it achieved the highest accuracy (0.97 vs. 0.95 for SynthStrip), although its recall (0.95) was lower than that of SynthStrip (0.99) and NPP (0.97). On the DHCP (T2) dataset, it matched NPP in recall (0.97) and exceeded all other methods in accuracy (0.98). On the HUPM (T1) dataset, its accuracy (0.98) was much higher than that of the other methods (0.91 or lower), with a recall of 0.98 against 0.99 for the others. On the Albert (T1) and Albert (T2) datasets, it again achieved the highest accuracy (0.98), with recall equal to or slightly below that of SynthStrip and NPP. On the BrainWeb dataset, it had slightly lower recall than SynthStrip and NPP but matched SynthStrip and BET in accuracy (0.97). The combination of very high recall and low accuracy for NPP on the neonatal datasets (e.g., recall 0.99 and accuracy 0.66 on HUPM (T1)) suggests that its errors are mostly false positives, that is, non-brain tissue labeled as brain. Overall, the proposed method delivers the best or near-best performance across datasets and metrics; the few cases in which other methods performed slightly better do not change this general trend.

Table 2: Performance comparison on various datasets using DICE and MSD criteria.

Dataset	BET DICE(F1)	BET MSD	SynthStrip DICE(F1)	SynthStrip MSD	NPP DICE(F1)	NPP MSD	Ours DICE(F1)	Ours MSD
DHCP (T1)	0.82(0.07)	8.82(3.02)	0.94(0.02)	3.88(0.83)	0.86(0.03)	8.22(1.35)	0.96(0.01)	2.54(0.4)
DHCP (T2)	0.73(0.1)	13.82(5.59)	0.97(0.01)	1.84(0.34)	0.9(0.01)	6.57(1.46)	0.97(0.0)	2.18(0.3)
HUPM (T1)	0.75(0.09)	7.44(2.32)	0.89(0.06)	2.9(1.22)	0.69(0.09)	9.06(1.92)	0.97(0.01)	0.94(0.19)
Albert (T1)	0.92(0.02)	2.75(0.83)	0.95(0.02)	1.74(0.54)	0.74(0.05)	10.63(2.06)	0.98(0.0)	0.74(0.13)
Albert (T2)	0.86(0.06)	5.07(2.33)	0.95(0.02)	1.72(0.57)	0.69(0.03)	12.08(2.0)	0.97(0.0)	1.01(0.15)
BrainWeb	0.97(0.01)	2.0(0.36)	0.97(0.01)	2.05(0.37)	0.93(0.01)	3.42(0.44)	0.97(0.01)	2.36(0.39)

Table 3: Performance comparison on various datasets using Recall and Accuracy criteria.

Dataset	BET Recall	BET Accuracy	SynthStrip Recall	SynthStrip Accuracy	NPP Recall	NPP Accuracy	Ours Recall	Ours Accuracy
DHCP (T1)	0.71(0.1)	0.88(0.04)	0.99(0.01)	0.95(0.02)	0.97(0.02)	0.86(0.03)	0.95(0.02)	0.97(0.01)
DHCP (T2)	0.59(0.13)	0.82(0.06)	0.95(0.02)	0.97(0.01)	0.97(0.01)	0.9(0.01)	0.97(0.01)	0.98(0.0)
HUPM (T1)	0.99(0.01)	0.74(0.11)	0.99(0.01)	0.91(0.05)	0.99(0.01)	0.66(0.12)	0.98(0.02)	0.98(0.01)
Albert (T1)	0.97(0.02)	0.93(0.02)	0.99(0.01)	0.96(0.02)	1.0(0.0)	0.71(0.08)	0.98(0.01)	0.98(0.0)
Albert (T2)	0.83(0.1)	0.89(0.04)	0.99(0.01)	0.96(0.02)	0.99(0.01)	0.62(0.05)	0.99(0.01)	0.98(0.0)
BrainWeb	0.97(0.01)	0.97(0.01)	0.99(0.01)	0.97(0.0)	1.0(0.0)	0.94(0.01)	0.98(0.01)	0.97(0.0)

Regarding image reconstruction, Table 4 compares the proposed method with NPP and with the conventional preprocessing pipeline described in Section 2, using the PSNR and SSIM metrics. MGA-Net achieved the highest PSNR on the DHCP, Albert, and HUPM (US) datasets. On HUPM (T1), its PSNR (32.88) was lower than that of both the pipeline (34.54) and NPP (33.95), but its SSIM (0.99) equaled that of the pipeline and exceeded that of NPP (0.88). On BrainWeb, NPP achieved the highest PSNR, and both NPP and the pipeline achieved a slightly higher SSIM (0.98 vs. 0.97). In general, NPP yielded lower SSIM values than our method on the neonatal datasets, possibly because NPP was trained on adult rather than neonatal data. Compared with the pipeline, the proposed method achieved equal or higher SSIM on all neonatal datasets and higher PSNR on all datasets except HUPM (T1) and BrainWeb, indicating better overall reconstruction quality.

Table 4: Image reconstruction performance of the preprocessing pipeline, NPP, and the proposed method, measured by PSNR (dB) and SSIM across different datasets. Values in parentheses are standard deviations.

Dataset	PSNR Pipeline	PSNR NPP	PSNR Ours	SSIM Pipeline	SSIM NPP	SSIM Ours
DHCP (T1)	18.56(1.02)	25.06(1.87)	30.26(2.32)	0.76(0.05)	0.91(0.02)	0.97(0.01)
DHCP (T2)	25.77(1.52)	22.56(1.43)	30.33(1.2)	0.96(0.01)	0.89(0.02)	0.97(0.01)
HUPM (T1)	34.54(3.39)	33.95(1.7)	32.88(5.21)	0.99(0.0)	0.88(0.05)	0.99(0.01)
Albert (T1)	19.51(2.95)	28.36(1.85)	33.42(4.01)	0.73(0.03)	0.75(0.06)	0.98(0.01)
Albert (T2)	30.86(2.35)	27.15(1.42)	36.68(0.91)	0.95(0.02)	0.63(0.05)	0.97(0.0)
BrainWeb (T1)	26.49(3.11)	29.82(3.94)	25.65(3.08)	0.98(0.0)	0.98(0.01)	0.97(0.01)
HUPM (US)	36.58(1.19)	-	42.71(2.56)	0.99(0.0)	-	0.99(0.0)

Across all test datasets combined (Table 5), our method achieved the highest DICE (F1) score (0.97), the lowest MSD (2.11), and the highest accuracy (0.98), with recall (0.97) equal to that of SynthStrip and NPP, highlighting its robustness across diverse segmentation tasks. Figure 3 illustrates a 3D MRI image along with the corresponding reconstruction and segmentation outcomes obtained with the proposed MGA-Net.

Table 5: Performance comparison on all the used datasets with different criteria.

Method	BET	SynthStrip	NPP	Ours
DICE(F1)	0.79(0.1)	0.95(0.03)	0.85(0.08)	0.97(0.01)
MSD	10.28(5.44)	2.7(1.18)	7.63(2.14)	2.11(0.65)
Accuracy	0.85(0.08)	0.96(0.03)	0.85(0.1)	0.98(0.01)
Recall	0.71(0.17)	0.97(0.02)	0.97(0.02)	0.97(0.02)
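The overlap metrics in Table 5 can be derived from voxel-wise confusion counts between a predicted and a reference brain mask. The NumPy sketch below is illustrative only; it assumes both masks are boolean arrays of equal shape and that accuracy is counted over all voxels in the volume, background included. MSD is not reproduced because its exact definition (surface extraction, distance units) is not restated in this section.

```python
import numpy as np

def overlap_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Dice (F1), recall and voxel accuracy for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)    # brain voxels correctly kept
    fp = np.count_nonzero(pred & ~truth)   # non-brain voxels wrongly kept
    fn = np.count_nonzero(~pred & truth)   # brain voxels wrongly removed
    tn = np.count_nonzero(~pred & ~truth)  # non-brain voxels correctly removed
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    accuracy = (tp + tn) / pred.size
    return {"dice": dice, "recall": recall, "accuracy": accuracy}

truth = np.zeros((4, 4, 4), dtype=bool)
truth[1:3, 1:3, 1:3] = True   # 8 brain voxels
pred = truth.copy()
pred[1, 1, 1] = False         # one missed brain voxel (FN)
pred[0, 0, 0] = True          # one spurious voxel (FP)
print(overlap_metrics(pred, truth))
# {'dice': 0.875, 'recall': 0.875, 'accuracy': 0.96875}
```

For binary masks, Dice and F1 are the same quantity, which is why the tables label it DICE(F1). Counting background voxels in accuracy tends to inflate it when the background dominates the volume.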

Figure 4 compares observed and predicted total brain volumes (TBV) from 3D ultrasound at various postmenstrual ages (PMAs). The RMSE is 14.17 and the $R^{2}$ is 0.96.

Figure 5 presents the results for 49 paired MRI and US images, with an RMSE of 22.01 and an $R^{2}$ of 0.97. These results indicate a high degree of concordance between the TBV values obtained from the two modalities.
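The agreement statistics reported for TBV can be computed as sketched below. This is a minimal illustration under an assumption: $R^{2}$ is taken as the coefficient of determination $1 - SS_{res}/SS_{tot}$ of predicted against observed volumes, whereas some studies report the squared Pearson correlation instead, and the section does not state which was used. RMSE is expressed in the same units as the volumes.

```python
import numpy as np

def agreement(observed, predicted):
    """RMSE and coefficient of determination between two volume series."""
    observed = np.asarray(observed, dtype=np.float64)
    predicted = np.asarray(predicted, dtype=np.float64)
    residuals = observed - predicted
    rmse = np.sqrt(np.mean(residuals ** 2))
    ss_res = np.sum(residuals ** 2)                   # residual sum of squares
    ss_tot = np.sum((observed - observed.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return float(rmse), float(r2)

# Hypothetical toy volumes, not study data.
rmse, r2 = agreement([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(round(rmse, 3), round(r2, 3))  # 0.577 0.5
```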

These results show a strong correlation between observed and predicted total brain volume, indicating that the proposed method accurately estimates total brain volume from 3D ultrasound images. In addition, Table 6 reports the segmentation performance on the HUPM 3D ultrasound data, showing a high DICE (F1) score, low MSD, and high recall and accuracy. Figure 6 shows a real 3D ultrasound (US) image from the HUPM dataset together with the reconstruction and segmentation produced by MGA-Net. Visualization was performed using the MELAGE software.

Table 6: Performance evaluation using HUPM 3D ultrasound data.

DICE(F1)	MSD	Recall	Accuracy
0.95 (0.02)	1.49 (0.45)	0.96 (0.02)	0.96 (0.01)

Overall, the results demonstrate the strong performance of the proposed method in medical image segmentation and reconstruction across datasets and evaluation metrics, with the exceptions noted above for PSNR on HUPM (T1) and BrainWeb.

4.1 Sensitivity analysis

In this subsection, we assess how the threshold parameter affects the segmentation performance of the proposed method on different datasets. Figure 7 shows the results of this sensitivity analysis, in which we systematically varied the threshold and evaluated its effect on segmentation performance. As the threshold increases, recall increases and the DICE coefficient decreases. In general, a threshold of zero provides a good balance between DICE and recall.
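A threshold sweep of this kind can be sketched as follows. The example uses a hypothetical continuous output `score` (for instance, logits) binarized with the rule `score > t`; this rule is an assumption, since the section does not specify how the threshold is applied in MGA-Net, so the toy values below are not meant to reproduce Figure 7.

```python
import numpy as np

def sweep_thresholds(score, truth, thresholds):
    """Binarize `score` at each threshold and report (t, Dice, recall)."""
    truth = np.asarray(truth, dtype=bool)
    results = []
    for t in thresholds:
        pred = np.asarray(score) > t  # assumed binarization rule
        tp = np.count_nonzero(pred & truth)
        fp = np.count_nonzero(pred & ~truth)
        fn = np.count_nonzero(~pred & truth)
        denom = 2 * tp + fp + fn
        dice = 2 * tp / denom if denom else 1.0
        recall = tp / (tp + fn) if (tp + fn) else 1.0
        results.append((float(t), dice, recall))
    return results

# Hypothetical toy logits for 6 voxels; the first 3 are brain.
score = np.array([2.0, 0.5, -0.2, 0.3, -1.0, -3.0])
truth = np.array([True, True, True, False, False, False])
for t, dice, recall in sweep_thresholds(score, truth, [-0.5, 0.0, 0.5]):
    print(f"t={t:+.1f}  Dice={dice:.3f}  recall={recall:.3f}")
```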

4.2 Ablation study and image anomalies

In this subsection, we compare the proposed method with a 3D U-Net (the ablated model). Table 7 reports this comparison on all test datasets used in this study: the proposed method achieves higher DICE, recall, and accuracy and lower MSD than the U-Net, demonstrating its higher segmentation accuracy.

Table 7: Results of ablation study

Method	DICE(F1)	MSD	Recall	Accuracy
U-Net	0.94(0.02)	3.4(1.27)	0.95(0.02)	0.9(0.04)
Ours	0.97(0.01)	2.11(0.65)	0.98(0.01)	0.97(0.02)

Figure 9 presents three views of the image of an eight-year-old preterm-born patient, reconstructed using NPP, U-Net, and the proposed method. The proposed method successfully reconstructs the image despite the presence of an anomaly, whereas NPP and U-Net fail to reconstruct the entire image (red circle in Figure 9).

The dataset obtained from Hospital Universitario Puerta del Mar (HUPM) comes from clinical practice and includes preterm neonates who may present with neurological abnormalities, such as abnormal tissue structures or atypical brain morphology (Figure 8). Some of these patients were included in the training set, while others were allocated to the test set to evaluate the model's generalizability and robustness. Despite the challenges posed by these variations, the proposed method outperformed conventional methods on the test set. This suggests that the proposed method can accommodate clinical abnormalities. We attribute this robustness to the network architecture, which improves its ability to generalize to uncommon and atypical anatomical features.

For the ultrasound (US) datasets, Figure 10 presents coronal, sagittal, and axial views of a 3D ultrasound volume from the HUPM dataset before and after processing with MGA-Net. The figure illustrates how MGA-Net reconstructs the brain region and improves visual quality and detail, which may support diagnostic evaluation.

5 Conclusions

This study introduced the mask-guided attention neural network (MGA-Net), a novel framework designed to address the inherent complexities of preprocessing MRI and 3D ultrasound images. Our comparative analysis shows that MGA-Net outperforms other state-of-the-art methods, such as BET, NPP, and SynthStrip. MGA-Net not only improves the precision of brain extraction and segmentation but also reconstructs high-quality images. Notably, MGA-Net estimated total brain volume from 3D ultrasound images with high accuracy, an area in which existing methods often falter. A sensitivity analysis of the threshold parameter further demonstrates the robustness of the model under varied operating conditions. The integration of sinusoidal positional encoding and mask-guided attention within MGA-Net represents a significant advance in this field. These features allow the model to handle the different statistical properties of MRI and ultrasound images, yielding strong performance across multiple critical metrics. The mask-guided attention allows the network to focus on the most relevant parts of the image when generating preprocessed images. In addition, we use extensive data augmentation, including a novel histogram-based augmentation, to improve the robustness of the proposed method. Overall, MGA-Net is a freely available tool for improving the accuracy and efficiency of neuroimaging data analysis in research and clinical settings.
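To illustrate the idea of giving MRI and US inputs distinct sinusoidal codes, the sketch below applies the Transformer-style sinusoidal encoding of Vaswani et al. (2017) to an integer modality index. The mapping (0 for MRI, 1 for US), the embedding size `dim`, and how such a vector would be injected into the encoder are assumptions for illustration; this is not MGA-Net's actual implementation, which is available in the authors' repository.

```python
import numpy as np

def sinusoidal_encoding(position: int, dim: int = 8, base: float = 10000.0) -> np.ndarray:
    """Sinusoidal encoding of a scalar position (Vaswani et al., 2017).

    Even indices use sin and odd indices use cos, with geometrically
    spaced frequencies 1 / base**(2i / dim).
    """
    if dim % 2:
        raise ValueError("dim must be even")
    i = np.arange(dim // 2)
    freqs = 1.0 / base ** (2 * i / dim)
    angles = position * freqs
    enc = np.empty(dim)
    enc[0::2] = np.sin(angles)
    enc[1::2] = np.cos(angles)
    return enc

# Hypothetical modality indices: 0 = MRI, 1 = US.
mri_code = sinusoidal_encoding(0)
us_code = sinusoidal_encoding(1)
print(mri_code)                        # alternating 0 (sin) and 1 (cos)
print(np.allclose(mri_code, us_code))  # False: the modalities get distinct codes
```

One appeal of a fixed sinusoidal code is that it adds no trainable parameters while still letting a shared encoder distinguish the input modality.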

Code Availability

The network code and weights are available through https://github.com/BahramJafrasteh/MGA-Net

Acknowledgements

This study was funded by the Instituto de Salud Carlos III (ISCIII) through the project "DTS22/00142" and co-funded by the European Union. We acknowledge the use of data from the Developing Human Connectome Project (dHCP), KCL-Imperial-Oxford Consortium, funded by the European Research Council under the European Union Seventh Framework Program (FP/2007-2013) / ERC Grant Agreement no. [319456]. We thank the families who generously supported this trial. The dHCP data were used to evaluate model performance.

References

Biessmann et al. (2011) F. Biessmann, S. Plis, F. C. Meinecke, T. Eichele, K.-R. Muller, Analysis of multimodal neuroimaging data, IEEE reviews in biomedical engineering 4 (2011) 26–58.
Hintz et al. (2015) S. R. Hintz, P. D. Barnes, D. Bulas, T. L. Slovis, N. N. Finer, L. A. Wrage, A. Das, J. E. Tyson, D. K. Stevenson, W. A. Carlo, et al., Neuroimaging and neurodevelopmental outcome in extremely preterm infants, Pediatrics 135 (2015) e32–e42.
Hinojosa-Rodríguez et al. (2017) M. Hinojosa-Rodríguez, T. Harmony, C. Carrillo-Prado, J. D. Van Horn, A. Irimia, C. Torgerson, Z. Jacokes, Clinical neuroimaging in the preterm infant: diagnosis and prognosis, NeuroImage: Clinical 16 (2017) 355–368.
Counsell et al. (2019) S. J. Counsell, T. Arichi, S. Arulkumaran, M. A. Rutherford, Fetal and neonatal neuroimaging, Handbook of clinical neurology 162 (2019) 67–103.
Stanojevic et al. (2002) M. Stanojevic, T. Hafner, A. Kurjak, Three-dimensional ultrasound-a useful imaging technique in the assessment of neonatal brain (2002).
Beijst et al. (2020) C. Beijst, J. Dudink, R. Wientjes, I. Benavente-Fernandez, F. Groenendaal, M. J. Brouwer, I. Išgum, H. W. de Jong, L. S. de Vries, Two-dimensional ultrasound measurements vs. magnetic resonance imaging-derived ventricular volume of preterm infants with germinal matrix intraventricular haemorrhage, Pediatric Radiology 50 (2020) 234–241.
Thakur et al. (2020) S. Thakur, J. Doshi, S. Pati, S. Rathore, C. Sako, M. Bilello, S. M. Ha, G. Shukla, A. Flanders, A. Kotrotsou, et al., Brain extraction on mri scans in presence of diffuse glioma: Multi-institutional performance evaluation of deep learning methods and robust modality-agnostic training, Neuroimage 220 (2020) 117081.
Zhao et al. (2010) L. Zhao, U. Ruotsalainen, J. Hirvonen, J. Hietala, J. Tohka, Automatic cerebral and cerebellar hemisphere segmentation in 3d mri: adaptive disconnection algorithm, Medical image analysis 14 (2010) 360–372.
Gitler et al. (2017) A. D. Gitler, P. Dhillon, J. Shorter, Neurodegenerative disease: models, mechanisms, and a new hope, Disease models & mechanisms 10 (2017) 499–502.
Radue et al. (2015) E.-W. Radue, F. Barkhof, L. Kappos, T. Sprenger, D. A. Häring, A. de Vera, P. von Rosenstiel, J. R. Bright, G. Francis, J. A. Cohen, Correlation between brain volume loss and clinical and mri outcomes in multiple sclerosis, Neurology 84 (2015) 784–793.
Jafrasteh et al. (2024) B. Jafrasteh, M. Lubián-Gutiérrez, S. P. Lubián-López, I. Benavente-Fernández, Enhanced spatial fuzzy c-means algorithm for brain tissue segmentation in t1 images, Neuroinformatics (2024) 1–14.
Hwang et al. (2019) H. Hwang, H. Z. U. Rehman, S. Lee, 3d u-net for skull stripping in brain mri, Applied Sciences 9 (2019) 569.
Yu et al. (2022) Z. Yu, X. Han, W. Xu, J. Zhang, C. Marr, D. Shen, T. Peng, X.-Y. Zhang, J. Feng, A generalizable brain extraction net (ben) for multimodal mri data from rodents, nonhuman primates, and humans, Elife 11 (2022) e81217.
Hoopes et al. (2022) A. Hoopes, J. S. Mora, A. V. Dalca, B. Fischl, M. Hoffmann, Synthstrip: Skull-stripping for any brain image, NeuroImage 260 (2022) 119474.
Ranjbar et al. (2022) S. Ranjbar, K. W. Singleton, L. Curtin, C. R. Rickertsen, L. E. Paulson, L. S. Hu, J. R. Mitchell, K. R. Swanson, Weakly supervised skull stripping of magnetic resonance imaging of brain tumor patients, Frontiers in Neuroimaging 1 (2022) 832512.
Pei et al. (2022) L. Pei, M. Ak, N. H. M. Tahon, S. Zenkin, S. Alkarawi, A. Kamal, M. Yilmaz, L. Chen, M. Er, N. Ak, et al., A general skull stripping of multiparametric brain mris using 3d convolutional neural network, Scientific Reports 12 (2022) 10826.
Iglesias et al. (2023) J. E. Iglesias, B. Billot, Y. Balbastre, C. Magdamo, S. E. Arnold, S. Das, B. L. Edlow, D. C. Alexander, P. Golland, B. Fischl, Synthsr: A public ai tool to turn heterogeneous clinical brain scans into high-resolution t1-weighted images for 3d morphometry, Science advances 9 (2023) eadd3607.
Namburete et al. (2018) A. I. Namburete, W. Xie, M. Yaqub, A. Zisserman, J. A. Noble, Fully-automated alignment of 3d fetal brain ultrasound to a canonical reference space using multi-task learning, Medical image analysis 46 (2018) 1–14.
Jafrasteh et al. (2023) B. Jafrasteh, S. P. Lubián-López, I. Benavente-Fernández, A deep sift convolutional neural networks for total brain volume estimation from 3d ultrasound images, Computer Methods and Programs in Biomedicine (2023) 107805.
Moser et al. (2022) F. Moser, R. Huang, B. W. Papież, A. I. Namburete, INTERGROWTH-21st Consortium, et al., Bean: Brain extraction and alignment network for 3d fetal neurosonography, NeuroImage 258 (2022) 119341.
Wang et al. (2021) J. Wang, X. Yu, Y. Gao, Mask guided attention for fine-grained patchy image classification, in: 2021 IEEE International Conference on Image Processing (ICIP), IEEE, 2021, pp. 1044–1048.
Cai et al. (2019) H. Cai, Z. Wang, J. Cheng, Multi-scale body-part mask guided attention for person re-identification, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, 2019.
Pang et al. (2019) Y. Pang, J. Xie, M. H. Khan, R. M. Anwer, F. S. Khan, L. Shao, Mask-guided attention network for occluded pedestrian detection, in: Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 4967–4975.
Smith (2002) S. M. Smith, Fast robust automated brain extraction, Human brain mapping 17 (2002) 143–155.
Moser et al. (2020) F. Moser, R. Huang, A. T. Papageorghiou, B. W. Papież, A. I. Namburete, Automated fetal brain extraction from clinical ultrasound volumes using 3d convolutional neural networks, in: Medical Image Understanding and Analysis: 23rd Conference, MIUA 2019, Liverpool, UK, July 24–26, 2019, Proceedings 23, Springer, 2020, pp. 151–163.
Vaswani et al. (2017) A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in neural information processing systems 30 (2017).
Wang et al. (2004) Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE transactions on image processing 13 (2004) 600–612.
Greve et al. (2021) D. N. Greve, B. Billot, D. Cordero, A. Hoopes, M. Hoffmann, A. V. Dalca, B. Fischl, J. E. Iglesias, J. C. Augustinack, A deep learning toolbox for automatic segmentation of subcortical limbic structures from mri images, Neuroimage 244 (2021) 118610.
Dai et al. (2008) W. Dai, D. Garcia, C. De Bazelaire, D. C. Alsop, Continuous flow-driven inversion for arterial spin labeling using pulsed radio frequency and gradient fields, Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine 60 (2008) 1488–1497.
Mamonov and Kalpathy-Cramer (2016) A. Mamonov, J. Kalpathy-Cramer, Data from qin gbm treatment response, The Cancer Imaging Archive (2016).
Clark et al. (2013) K. Clark, B. Vendt, K. Smith, J. Freymann, J. Kirby, P. Koppel, S. Moore, S. Phillips, D. Maffitt, M. Pringle, et al., The cancer imaging archive (tcia): maintaining and operating a public information repository, Journal of digital imaging 26 (2013) 1045–1057.
Prah et al. (2015) M. Prah, S. Stufflebeam, E. Paulson, J. Kalpathy-Cramer, E. Gerstner, T. Batchelor, D. Barboriak, B. Rosen, K. Schmainda, Repeatability of standardized and normalized relative cbv in patients with newly diagnosed glioblastoma, American Journal of Neuroradiology 36 (2015) 1654–1661.
Jafrasteh et al. (2023) B. Jafrasteh, S. P. L. López, I. B. Fernández, Melage: A purely python based neuroimaging software (neonatal), arXiv preprint arXiv:2309.07175 (2023).
Makropoulos et al. (2018) A. Makropoulos, E. C. Robinson, A. Schuh, R. Wright, S. Fitzgibbon, J. Bozek, S. J. Counsell, J. Steinweg, K. Vecchiato, J. Passerat-Palmbach, et al., The developing human connectome project: A minimal processing pipeline for neonatal cortical surface reconstruction, Neuroimage 173 (2018) 88–112.
Gousias et al. (2012) I. S. Gousias, A. D. Edwards, M. A. Rutherford, S. J. Counsell, J. V. Hajnal, D. Rueckert, A. Hammers, Magnetic resonance imaging of the newborn brain: manual segmentation of labelled atlases in term-born and preterm infants, Neuroimage 62 (2012) 1499–1509.
Gousias et al. (2013) I. S. Gousias, A. Hammers, S. J. Counsell, L. Srinivasan, M. A. Rutherford, R. A. Heckemann, J. V. Hajnal, D. Rueckert, A. D. Edwards, Magnetic resonance imaging of the newborn brain: automatic segmentation of brain images into 50 anatomical regions, PloS one 8 (2013) e59990.
Cocosco (1997) C. A. Cocosco, Brainweb: Online interface to a 3d mri simulated brain database (1997).
Collins et al. (1998) D. L. Collins, A. P. Zijdenbos, V. Kollokian, J. G. Sled, N. J. Kabani, C. J. Holmes, A. C. Evans, Design and construction of a realistic digital brain phantom, IEEE transactions on medical imaging 17 (1998) 463–468.
Benavente-Fernández et al. (2021) I. Benavente-Fernández, E. Ruiz-González, M. Lubian-Gutiérrez, S. P. Lubián-Fernández, Y. Cabrales Fontela, C. Roca-Cornejo, P. Olmo-Duran, S. P. Lubián-López, Ultrasonographic estimation of total brain volume: 3d reliability and 2d estimation. enabling routine estimation during nicu admission in the preterm infant, Frontiers in Pediatrics (2021) 740.
He et al. (2023) X. He, A. Q. Wang, M. R. Sabuncu, Neural pre-processing: A learning framework for end-to-end brain mri pre-processing, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2023, pp. 258–267.