Abstract
Background: Magnetic resonance imaging (MRI) offers excellent soft tissue contrast essential for diagnosis and treatment, but its long acquisition times can cause patient discomfort and motion artifacts.
Purpose: To propose a self-supervised deep learning-based compressed sensing MRI (DL-based CS-MRI) method named “Adaptive Self-Supervised Consistency Guided Diffusion Model (ASSCGD)” to accelerate data acquisition without requiring fully sampled datasets.
Materials and Methods: We used the fastMRI multi-coil brain axial T2-weighted (T2-w) dataset from 1,376 cases and single-coil brain quantitative magnetization prepared 2 rapid acquisition gradient echoes (MP2RAGE) T1 maps from 318 cases to train and test our model. Robustness against domain shift was evaluated using two out-of-distribution (OOD) datasets: a multi-coil brain axial postcontrast T1-weighted (T1c) dataset from 50 cases and an axial T1-weighted (T1-w) dataset from 50 patients. Data were retrospectively subsampled at acceleration rates R = 2, 4, and 8. ASSCGD partitions a random sampling pattern into two disjoint sets, ensuring data consistency during training. We compared our method with ReconFormer Transformer and SS-MRI, assessing performance using normalized mean squared error (NMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM). Statistical tests included one-way analysis of variance (ANOVA) and multi-comparison Tukey’s Honestly Significant Difference (HSD) tests.
Results: ASSCGD preserved fine structures and brain abnormalities visually better than the comparative methods for both multi-coil and single-coil datasets. It achieved the lowest NMSE at R = 4 and 8, and the highest PSNR and SSIM values at all acceleration rates for the multi-coil dataset. Similar trends were observed for the single-coil dataset, though SSIM values were comparable to ReconFormer at R = 2 and 8. These results were further confirmed by the voxel-wise correlation scatter plots. OOD results showed statistically significant improvements in undersampled image quality after reconstruction.
Conclusions: ASSCGD successfully reconstructs fully sampled images without utilizing them in the training step, potentially reducing imaging costs and enhancing image quality crucial for diagnosis and treatment.
Adaptive Self-Supervised Consistency-Guided Diffusion Model for Accelerated MRI Reconstruction
Mojtaba Safari1, Zach Eidex1, Shaoyan Pan1, Richard L.J. Qiu1, Xiaofeng Yang, PhD1,‡
1Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA 30322, United States of America
‡Corresponding Author
Author to whom correspondence should be addressed. email: [email protected]
keywords: k-space sampling, fastMRI, accelerated MRI, reconstruction, adaptive partitioning
1 Introduction
Magnetic resonance imaging (MRI) provides excellent soft tissue contrast, playing a vital role in diagnosis, treatment, and follow-up. However, the highly sampled k-space required for high-resolution MRI prolongs acquisition, which reduces imaging throughput and increases patient discomfort and motion artifacts that compromise image quality [3]. In addition, the recent Lancet Oncology Commission highlighted severe shortages of MRI and other medical imaging devices in low- and middle-income countries [1], potentially resulting in 2.5 million deaths worldwide [2]. Globally, only seven MRI scanners are installed per million people, primarily due to the high cost of installation, operation, and maintenance.
MRI acquisition can be accelerated by reducing the sampled k-space data, but this is limited by the Nyquist criterion. Compressed sensing (CS) and parallel imaging (PI) techniques aim to recover fully sampled images from under-sampled images by exploiting data in a sparse transformed space and redundant data acquired using uncorrelated radiofrequency coils, respectively [4, 5]. However, at high acceleration rates, PI and CS methods suffer from noise amplification [6] and residual artifacts [7], respectively.
Deep learning (DL) algorithms have been extensively used to reconstruct accelerated high-resolution MRI images. DL-based compressed sensing MRI (DL-based CS-MRI) approaches are roughly divided into two categories: data-driven and physics-guided [8]. Data-driven approaches map the under-sampled k-space to the fully sampled k-space [9, 10, 11]. Physics-guided approaches incorporate knowledge of the forward encoding operator, such as the coil sensitivity and under-sampling pattern, to solve ill-posed inverse problems. They are mainly divided into two groups: unrolled optimization methods and data consistency (DC) layer methods. Unrolled methods unroll an iterative reconstruction that alternates between data consistency and regularization to solve the objective in a fixed number of iterations [12, 13, 14]. The DC layer enables end-to-end training of the unrolled models and is usually the last layer of the network [15, 16, 17].
These methods are typically trained under supervised frameworks in which reference fully sampled images are used to train a model. However, obtaining fully sampled images might be impractical in imaging scenarios such as cardiovascular MRI, due to excessive involuntary movements, or diffusion MRI with echo planar imaging, due to quick T2* signal decay [18]. Additionally, acquiring high-resolution anatomical brain MRI images can be prohibitively long.
In this study, we propose an adaptive self-supervised consistency-guided diffusion (ASSCGD) model to reconstruct fully sampled images without requiring them in the training step. Our proposed ASSCGD model is based on an adversarial mapper to reconstruct fully sampled MRI images. Our proposed method was evaluated using both single-coil and multi-coil MRI data, as well as two out-of-distribution (OOD) datasets. Our method leverages a recently proposed ReconFormer transformer [19] as a generator and is compared with two state-of-the-art models. Our contributions are as follows:
- To our knowledge, ASSCGD is the first study proposing a self-supervised method using an adversarial mapper.
- The proposed method performs the backward diffusion process in smaller steps, which improves sampling efficiency.
- The proposed method’s robustness against domain shift was evaluated at test time.
- To our knowledge, it is the first self-supervised method aimed at reconstructing fully sampled quantitative magnetization prepared 2 rapid acquisition gradient echoes (MP2RAGE) T1 maps.
2 Materials and Methods
2.1 Compressed sensing MRI
Let $y$ represent the observed subsampled k-space measurement and $x$ the unobserved fully sampled image. Compressed sensing is formulated as follows:
$$ y = A x + \eta \tag{1} $$
where $\eta$ is the additive acquisition noise and $A$ represents the encoding operator. $A$ is composed of a coil sensitivity map $S$, a Fourier transform $F$, and a sampling map $M_\Omega$ with the specified pattern $\Omega$ controlling the acceleration rate (R). The mathematical expression for the encoding operator is $A = M_\Omega F S$. The MRI reconstruction is formulated as an unconstrained optimization problem as follows [14, 17, 20],
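$$ \hat{x} = \arg\min_{x} \; \left\| x - f_\theta(x_u) \right\|_2^2 + \lambda \left\| A x - y \right\|_2^2 \tag{2} $$
where $f_\theta$ represents our proposed model parametrized by $\theta$ and applied to the under-sampled input $x_u$. The first and second terms represent the regularization and data consistency, respectively, and $\lambda$ is a scalar regularization weight that balances the data consistency and regularization terms.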
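The encoding operator described above (coil sensitivity weighting, then a Fourier transform, then a sampling mask) can be illustrated with a short NumPy sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function names `encode` and `zero_filled`, the centered orthonormal FFT convention, and the column-wise 1D mask layout are all hypothetical choices.

```python
import numpy as np


def encode(image, sens_maps, mask):
    """Sketch of the encoding operator: sampling mask * FFT * coil maps.

    image:     (H, W) complex image
    sens_maps: (C, H, W) complex coil sensitivity maps (C = 1 for single-coil)
    mask:      (W,) binary mask selecting the sampled k-space columns
    """
    coil_images = sens_maps * image[None, ...]  # weight the image by each coil
    kspace = np.fft.fftshift(
        np.fft.fft2(np.fft.ifftshift(coil_images, axes=(-2, -1)), norm="ortho"),
        axes=(-2, -1),
    )  # centered, orthonormal 2D FFT per coil (assumed convention)
    return kspace * mask[None, None, :]  # zero out the unsampled columns


def zero_filled(kspace, sens_maps):
    """Adjoint of `encode` for already-masked k-space (coil-combined image)."""
    coil_images = np.fft.fftshift(
        np.fft.ifft2(np.fft.ifftshift(kspace, axes=(-2, -1)), norm="ortho"),
        axes=(-2, -1),
    )
    return np.sum(np.conj(sens_maps) * coil_images, axis=0)
```

With a mask of all ones and sensitivity maps whose squared magnitudes sum to one at every voxel, `zero_filled(encode(x, S, mask), S)` returns `x`; with an under-sampling mask it returns the aliased zero-filled image that a reconstruction network takes as input.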
2.2 Diffusion model
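The diffusion model, inspired by nonequilibrium thermodynamics, aims to approximate complex and intractable distributions with a tractable one such as a standard Gaussian [21]. It consists of two processes: the forward process and the reverse process.
Forward process:
The forward process utilizes a noise scheduler to add Gaussian noise to the noise-free $x_0$ using a first-order Markov process in a large number of steps $T$, eventually converting $x_0$ to a standard multivariate Gaussian $x_T \sim \mathcal{N}(0, I)$. This step employs the first-order Markov process to calculate $x_t$ as follows [22]:
$$ \begin{aligned} q(x_{1:T} \mid x_0) &= \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \\ q(x_t \mid x_{t-1}) &= \mathcal{N}\left(x_t; \sqrt{\alpha_t}\, x_{t-1}, \beta_t I\right) \end{aligned} \tag{3} $$
where $\alpha_t = 1 - \beta_t$ and $t \in \{1, \dots, T\}$, with $\beta_t$ the noise variance. Assuming additive Gaussian noise, sampling at an arbitrary step $t$ can be calculated in closed form as follows:
$$ x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon \tag{4} $$
where $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$ and $\epsilon \sim \mathcal{N}(0, I)$.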
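The closed-form forward sampling step can be sketched in a few lines of NumPy. The sketch below is illustrative only: the linear noise schedule, its end points, and the function names `make_schedule` and `q_sample` are assumptions, since the text does not state which noise scheduler was used.

```python
import numpy as np


def make_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linear noise schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)  # per-step noise variance
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # cumulative product of alphas up to each step
    return betas, alphas, alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw a noisy sample at step t directly from the noise-free x0."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return xt, noise
```

Because the cumulative product decreases monotonically toward zero, the signal term shrinks and the noise term grows with the step index, so a late step is close to pure Gaussian noise.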
Reverse process:
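The reverse process gradually learns to remove the noise added to $x_T$ to recover the noise-free $x_0$; that is, a network is trained to generate $x_0$ from Gaussian noise $x_T$. The reverse process follows the forward process trajectories, but in the reverse direction, for small $\beta_t$ values as follows [21, 23, 24]:
$$ \mathcal{L}_{t-1} = \mathbb{E}_{q(x_t \mid x_0)} \left[ D_{\mathrm{KL}}\left( q(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t) \right) \right] \tag{5} $$
This is a denoising matching term, in which the learned denoising transition step $p_\theta(x_{t-1} \mid x_t)$ approximates the tractable, ground-truth denoising transition step $q(x_{t-1} \mid x_t, x_0)$ given in (6), which is modeled as a Gaussian.
$$ \begin{aligned} q(x_{t-1} \mid x_t, x_0) &= \mathcal{N}\left(x_{t-1}; \tilde{\mu}_t(x_t, x_0), \sigma_t^2 I\right), \\ \tilde{\mu}_t(x_t, x_0) &= \frac{\sqrt{\alpha_t}\,(1 - \bar{\alpha}_{t-1})\, x_t + \sqrt{\bar{\alpha}_{t-1}}\,(1 - \alpha_t)\, x_0}{1 - \bar{\alpha}_t}, \\ \sigma_t^2 &= \frac{(1 - \alpha_t)(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} \end{aligned} \tag{6} $$
Here $\alpha_t$ and $\bar{\alpha}_t$ are the noise schedule terms of the forward process. Furthermore, because all the variance terms are fixed at each timestep, it was shown that the loss function given in (5) becomes:
$$ \mathcal{L}_{t-1} = \mathbb{E}_{q(x_t \mid x_0)} \left[ \frac{1}{2 \sigma_t^2} \left\| \tilde{\mu}_t(x_t, x_0) - \mu_\theta(x_t, t) \right\|_2^2 \right] \tag{7} $$
where $\mu_\theta$ is the estimated average recovered image as follows:
$$ \mu_\theta(x_t, t) = \frac{\sqrt{\alpha_t}\,(1 - \bar{\alpha}_{t-1})\, x_t + \sqrt{\bar{\alpha}_{t-1}}\,(1 - \alpha_t)\, \hat{x}_\theta(x_t, t)}{1 - \bar{\alpha}_t} \tag{8} $$
where $\hat{x}_\theta$ is parameterized by our DL model to recover $x_0$ from the noisy image $x_t$ at a given step $t$.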
We employed an adversarial mapper to implicitly capture the conditional distribution for the reverse process steps. The generator is used to sample . At the same time, a discriminator differentiates between samples and actual sample sampled from the true denoising distribution . Our discriminator was coupled with a gradient penalty to improve learning [15, 24]:
(9) | ||||
where is our generator parameterized by to reconstruct mean and variance. The generator loss becomes:
(10) |
2.3 Proposed self-supervised framework and training details
Acquiring fully sampled data can be impractical due to constraints such as voluntary and involuntary motions, lengthy acquisition times, and signal decay. These constraints hinder the application of supervised DL-based CS-MRI approaches. Thus, we propose a self-supervised approach that randomly divides the sampling pattern into two sets, a training set and a loss set, as given in (11). These two sets have no elements in common except the center of k-space.
(11) |
We used the training under-sampling pattern to train our proposed model and defined our DC layer as follows.
(12) |
where capital letters refer to the Fourier transform of the corresponding parameters. In other words, our method updates the k-space lines that were not acquired and keeps the original k-space lines that were sampled during image acquisition.
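A hard data consistency step of this kind can be sketched as follows. This is a minimal, noise-free illustration, assuming a column-wise binary mask that broadcasts over the k-space rows; the function name `data_consistency` and its argument layout are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def data_consistency(pred_kspace, measured_kspace, mask):
    """Keep measured k-space where it was sampled, use the prediction elsewhere.

    pred_kspace:     (H, W) k-space of the network output
    measured_kspace: (H, W) acquired k-space (zeros at unsampled columns)
    mask:            (W,) binary mask, 1 where a column was acquired
    """
    sampled = mask.astype(bool)
    # np.where broadcasts the (W,) mask across all H rows
    return np.where(sampled, measured_kspace, pred_kspace)
```

A soft variant that blends the measured and predicted values with a weight is also common when the measurements are noisy; the hard replacement above is the simplest case.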
Finally, our proposed method defined the loss function given in (7) using the loss sampling pattern as follows:
(13) |
The final loss function is composed of discriminator loss, generator loss, and reconstruction loss as follows:
(14) |
where a scalar weight was used to scale and control the ratio between the losses. Our proposed method used the k-space indices in the training set to reconstruct k-space at the indices in the loss set (Figure 1 illustrates the flowchart of our proposed method). The loss function was then calculated at the locations indicated by the loss pattern. In other words, our self-supervised network was trained to decrease the discrepancy between the predicted images and the acquired measurements that were not seen by the network during training. At the inference step, the unseen test data, under-sampled with the full sampling pattern, were used to reconstruct the fully sampled data.
This self-supervised scenario is similar to the cross-validation concept used to reduce bias and the likelihood of overfitting. Cross-validation partitions the dataset into at least two sets where one set is used to train a model and another set is used to evaluate the model. However, unlike cross-validation in machine learning, which performs partitioning once for all data, our method performs random partitioning per image slice.
In the inference step, we did not start from complete noise. Instead, we set the Gaussian noise covariance to 0.1 and used the resulting noisy images directly as the initial inputs. This approach has been previously employed to reconstruct high-resolution MR images from under-sampled k-space data and remove MRI motion artifacts [25, 26].
We used the original implementation of the recently proposed transformer, named ReconFormer (https://github.com/guopengf/ReconFormer), as a generator. It incorporates pyramid structures, enabling scale processing at each pyramid unit, while the globally columnar structure maintains high-resolution information [19]. The discriminator consisted of four convolution layers with kernel size and padding one. Each convolution layer was followed by a ReLU activation layer and a batch normalization layer.
The networks were trained using the Adam optimizer with a learning rate to minimize the loss function with a batch size of four over 25 epochs. All training was performed on our server using the PyTorch framework [27], version 2.1.2, and NVIDIA A100 GPUs with CUDA Toolkit 12.2.
2.4 Sampling masks
We employed a 1D random sampling pattern in which the k-space center was excluded from random under-sampling. The center fraction excluded from under-sampling was set to a fixed percentage of the k-space lines in the horizontal direction. Our proposed ASSCGD divides the acquired sampling mask into two disjoint sets, a training mask and a loss mask, which share only four k-space lines at the center of k-space. Figure 2 illustrates the acquired, training, and loss sampling masks for varying partitioning ratios.
The sampling pattern was drawn randomly for each slice to train the model. Therefore, data subsampled with the training mask can simulate the ghosting that is present in zero-filled data retrospectively subsampled with the acquired masks. We investigated a uniformly random selection among the elements of the acquired mask to create the training and loss patterns. The sampling ratio, defined as the number of elements in the training mask divided by the number of elements in the acquired mask, was varied to train and test the proposed model. For example, a ratio of 0.3 means that 30% of the acquired lines were randomly assigned to the training mask and the rest were assigned to the loss mask, which defines the loss function given in Equation (13).
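The mask generation and the per-slice partitioning described in this section can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names `random_mask_1d` and `partition_mask`, the 8% center fraction used in the usage note, and the choice to apply the ratio to the non-center lines only are assumptions made for the example.

```python
import numpy as np


def random_mask_1d(num_cols, accel, center_fraction, rng):
    """1D random column mask with a fully retained k-space center."""
    num_center = int(round(num_cols * center_fraction))
    mask = np.zeros(num_cols, dtype=bool)
    start = (num_cols - num_center) // 2
    mask[start:start + num_center] = True  # central lines are always kept
    target = int(round(num_cols / accel))  # total number of lines to acquire
    remaining = max(target - num_center, 0)
    candidates = np.flatnonzero(~mask)
    mask[rng.choice(candidates, size=remaining, replace=False)] = True
    return mask


def partition_mask(mask, ratio, rng, shared_center=4):
    """Split an acquired mask into training and loss masks.

    The two masks are disjoint except for `shared_center` central lines.
    `ratio` is applied to the non-center lines (an assumption of this sketch).
    """
    num_cols = mask.size
    start = (num_cols - shared_center) // 2
    center = np.zeros(num_cols, dtype=bool)
    center[start:start + shared_center] = True
    center &= mask  # only lines that were actually acquired
    free = np.flatnonzero(mask & ~center)
    n_train = int(round(ratio * free.size))
    train_mask = center.copy()
    train_mask[rng.choice(free, size=n_train, replace=False)] = True
    loss_mask = (mask & ~train_mask) | center
    return train_mask, loss_mask
```

Calling `partition_mask` with a fresh random draw for every slice, for example with `ratio=0.3` on a mask from `random_mask_1d(320, accel=4, center_fraction=0.08, rng=rng)`, gives a different training and loss split per slice, which is the behavior the text contrasts with a single fixed partition.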
2.5 Dataset
We utilized publicly available single-coil and multi-coil brain MRI datasets. Both datasets were retrospectively under-sampled using the random sampling masks provided in the fastMRI database, with acceleration rates (R) equal to 2, 4, and 8 [28, 29].
2.5.1 Single-coil data:
The brain MPI-Leipzig Mind-Brain-Body dataset [30, 31] was used to train and test our proposed model. We utilized high-resolution MP2RAGE T1 maps of 318 patients that were split into two non-overlapping training and test sets with 250 and 68 patients, respectively. The volumetric MP2RAGE T1 maps with 176 slices were acquired in the sagittal orientation with the imaging parameters: TR ms, TE ms, TI1 ms, TI2 ms, FA1 = , FA2 = , pre-scan normalization, echo spacing ms, bandwidth Hz/pixel, FOV mm, voxel size mm3 isotropic, GRAPPA acceleration factor 3, slice order = interleaved, duration = 8 min 22 s.
2.5.2 Multi-coil data:
The brain datasets used in the preparation of this article were obtained from the NYU fastMRI Initiative database (fastmri.med.nyu.edu), which was approved by the NYU School of Medicine Institutional Review Board [28, 29]. We utilized T2-weighted (T2-w) images of 1051 patients’ data for training and 325 patients’ unseen data for testing. In addition, we evaluated the robustness of our model to domain shift using two out-of-distribution (OOD) unseen datasets, T1-weighted (T1-w) and postcontrast T1-w (T1c), using 50 patients’ data for each MRI sequence.
The sensitivity maps were generated from the center of k-space using ESPIRiT (https://mrirecon.github.io/bart/) [32] with a kernel size and calibration-matrix and eigenvalue-decomposition thresholds of 0.02 and 0.95, respectively.
2.6 Quantitative analysis
To evaluate the performance of the proposed ASSCGD model, we compared it against two benchmark models: SS-MRI [33] and ReconFormer [19], which were trained under our proposed sampling approach.
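Three quantitative metrics were employed: normalized mean square error (NMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM), computed with the PyTorch Image Quality library (https://piq.readthedocs.io/en/latest/index.html) [34]. The NMSE compares the reconstructed image $\hat{x}$ with the ground truth $x$ as follows:
$$ \mathrm{NMSE}(\hat{x}, x) = \frac{\left\| \hat{x} - x \right\|_2^2}{\left\| x \right\|_2^2} \tag{15} $$
where $\|\cdot\|_2^2$ is the squared Euclidean distance. Lower NMSE values indicate better image reconstruction. However, NMSE may favor blurry images [28].
The PSNR defined below utilizes a logarithmic scaling that makes the quantification results more aligned with human perception [35].
$$ \mathrm{PSNR}(\hat{x}, x) = 10 \log_{10} \frac{\max(x)^2}{\mathrm{MSE}(\hat{x}, x)} \tag{16} $$
where $\max(x)$ is the maximum signal intensity of the ground truth images and $\mathrm{MSE}$ is the mean squared error. Higher PSNR indicates better image reconstruction.
The SSIM, which quantifies the structural similarity between the reconstructed and ground truth images, is defined as follows:
$$ \mathrm{SSIM}(\hat{x}, x) = \frac{\left(2 \mu_{\hat{x}} \mu_x + c_1\right)\left(2 \sigma_{\hat{x} x} + c_2\right)}{\left(\mu_{\hat{x}}^2 + \mu_x^2 + c_1\right)\left(\sigma_{\hat{x}}^2 + \sigma_x^2 + c_2\right)} \tag{17} $$
where $\mu_{\hat{x}}$ and $\mu_x$ are the average voxel values of the $\hat{x}$ and $x$ images, $\sigma_{\hat{x}}^2$ and $\sigma_x^2$ are the variances, and $\sigma_{\hat{x} x}$ is the covariance between the $\hat{x}$ and $x$ images. The constants $c_1$ and $c_2$ stabilize the division. SSIM ranges between -1 and +1, with the best similarity achieved by an SSIM equal to +1.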
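The three metrics can be sketched directly in NumPy. The code below is a simplified illustration, not the library implementation used in the study: `global_ssim` evaluates the SSIM expression once over the whole image instead of over local windows, and the stabilizing constants follow the commonly used defaults `k1 = 0.01` and `k2 = 0.03`, which are assumptions here.

```python
import numpy as np


def nmse(recon, ref):
    """Normalized mean squared error between a reconstruction and its reference."""
    return np.sum(np.abs(recon - ref) ** 2) / np.sum(np.abs(ref) ** 2)


def psnr(recon, ref):
    """Peak signal-to-noise ratio in dB, using the reference maximum as the peak."""
    mse = np.mean(np.abs(recon - ref) ** 2)
    return 10.0 * np.log10(np.max(np.abs(ref)) ** 2 / mse)


def global_ssim(recon, ref, k1=0.01, k2=0.03):
    """SSIM computed once over the whole image (no sliding window)."""
    data_range = ref.max() - ref.min()
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2  # assumed constants
    mu_x, mu_y = recon.mean(), ref.mean()
    var_x, var_y = recon.var(), ref.var()
    cov = np.mean((recon - mu_x) * (ref - mu_y))
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den
```

Windowed SSIM implementations average this expression over local patches, so their values generally differ from the global version above.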
2.7 Statistical analysis
The quantitative metrics were compared using one-way analysis of variance (ANOVA) to evaluate the null hypothesis that the mean values of each method were the same. Differences with p-values below the significance threshold were considered statistically significant. Additionally, a multi-comparison Tukey’s honestly significant difference (HSD) test was performed to evaluate pairwise differences between the methods, with p-values below the corresponding threshold indicating statistical significance.
We reported the average values of the quantitative metrics. In addition, we calculated the 95% confidence intervals (CIs) on the average values using the bootstrap method with bias-corrected and accelerated (BCa) percentile intervals [36].
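A bootstrap confidence interval on a mean metric value can be sketched as below. Note that this is the plain percentile bootstrap, shown only to illustrate the resampling idea; it omits the bias correction and acceleration adjustment of the BCa method, and the function name `bootstrap_ci`, the default of 1000 resamples, and the fixed seed are assumptions for the example.

```python
import numpy as np


def bootstrap_ci(values, num_iter=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # resample with replacement: one row of indices per bootstrap iteration
    idx = rng.integers(0, values.size, size=(num_iter, values.size))
    means = values[idx].mean(axis=1)
    lower, upper = np.percentile(
        means, [100 * (1 - level) / 2, 100 * (1 + level) / 2]
    )
    return values.mean(), lower, upper
```

Applied to the per-slice PSNR values of one method, the function returns the average and the interval bounds in the same "mean (lower - upper)" form used in the result tables.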
3 Results
In this section, we present the comprehensive evaluation of our proposed ASSCGD model for MRI reconstruction. The results are organized to highlight the model’s performance across different datasets and under various conditions.
We assess the within-domain multi-coil T2-w reconstruction and single-coil MP2RAGE T1 map reconstruction capabilities of ASSCGD using both qualitative and quantitative metrics. The model’s ability to preserve fine structures and reduce artifacts is demonstrated through visual comparisons and statistical analysis. The impact of loss mask partitioning on the performance of the proposed method is also reported.
We also evaluate the robustness of ASSCGD against domain shifts using OOD datasets. This analysis illustrates the model’s adaptability to new and unseen data.
3.1 Within-domain reconstruction
Our proposed ASSCGD was evaluated for within-domain reconstruction at R = 2, 4, and 8 with a fixed partitioning rate. We conducted both qualitative and quantitative comparisons with the ReconFormer Transformer and SS-MRI models.
3.1.1 Qualitative Results
Multi-coil dataset:
Figures 3 and 4 present the qualitative results of within-domain T2-w multi-coil images for two subjects. Figure 3(a) shows results for a healthy volunteer, highlighting that our method recovers more details, particularly in the gold boxed regions, compared to the ReconFormer model trained with our framework. Difference maps in Figure 3(b) illustrate that our method better preserves gray matter and edges, indicated by the gold and red boxes.
Figure 4(a) displays results for a patient with brain abnormalities. Our proposed method could recover the abnormality boundary shown by a red box close to the ground truth. Furthermore, our method reconstructed details better (gold boxes) compared with the SS-MRI method. Difference maps in Figure 4(b) confirm these observations.
Single-coil dataset:
Figure 5 presents the qualitative results for the high-resolution single-coil MP2RAGE T1 quantitative map of a healthy volunteer at R = 2, 4, and 8 with a fixed partitioning rate. Our method preserved the small structures shown by gold boxes, as well as the putamen and caudate nuclei shown by red boxes, with better spatial contrast than the comparative models, as shown in Figure 5(a) and the difference map in Figure 5(b).
3.1.2 Quantitative Results
Tables 1 and 2 summarize the quantitative metrics for multi-coil and single-coil datasets, respectively, at different acceleration rates. The ANOVA test indicated statistically significant differences between the average values of the methods. The results of Tukey’s HSD multi-comparison are presented in the text and tables.
Multi-coil dataset:
The quantitative results at R = 2, 4, and 8 are listed in Table 1. The ANOVA test indicated statistically significant differences between the average values for all metrics. Our method achieved the lowest NMSE for all acceleration rates except R = 2. ReconFormer achieved the lowest NMSE value at R = 2; nonetheless, it was not statistically significantly different from our method. Although our method achieved the lowest NMSE value at R = 8, it was not statistically significantly different from ReconFormer. Our proposed self-supervised method achieved the highest PSNR and SSIM values for all acceleration rates, demonstrating the lowest remaining spatial distortion, such as ghosting, and the highest structural similarity between the ground truth and reconstructed images, respectively.
| Metric | R | Zero filled | ReconFormer | SS-MRI | Ours |
|---|---|---|---|---|---|
| NMSE (95% CI) [%] | 2 | 14.99 (12.95 - 17.14) | 0.47 (0.45 - 0.50)∗ | 0.52 (0.50 - 0.56)∗ | 0.51 (0.48 - 0.54) |
| | 4 | 26.38 (24.43 - 28.41) | 1.51 (1.42 - 1.64) | 4.87 (4.76 - 4.98) | 1.26 (1.18 - 1.35) |
| | 8 | 30.78 (28.49 - 33.11) | 3.34 (3.14 - 3.63)∗ | 5.16 (5.06 - 5.26) | 3.26 (3.08 - 3.51) |
| PSNR (95% CI) [dB] | 2 | 17.75 (17.64 - 17.86) | 38.80 (38.65 - 38.96) | 39.05 (38.90 - 39.20) | 39.93 (39.78 - 40.09) |
| | 4 | 17.64 (17.53 - 17.77) | 34.23 (34.10 - 34.37) | 27.93 (27.79 - 28.06) | 35.44 (35.31 - 35.58) |
| | 8 | 18.75 (18.62 - 18.87) | 27.54 (27.43 - 27.65) | 30.65 (30.52 - 30.76) | 31.67 (31.55 - 31.79) |
| SSIM (95% CI) [%] | 2 | 81.53 (81.06 - 81.98) | 96.10 (96.02 - 96.17) | 96.33 (96.15 - 96.49) | 98.14 (98.06 - 98.21) |
| | 4 | 70.89 (70.44 - 71.36) | 93.36 (93.19 - 93.51) | 76.72 (76.32 - 77.15) | 95.55 (95.38 - 95.69) |
| | 8 | 65.17 (64.65 - 65.68) | 89.72 (89.49 - 89.93) | 77.01 (76.63 - 77.39) | 91.67 (91.45 - 91.89) |

∗ indicates no statistically significant difference from our method in Tukey’s HSD test.
Single-coil dataset:
The quantitative metrics for the within-domain single-coil high-resolution MP2RAGE T1 map are summarized in Table 2. One-way ANOVA tests indicated significant differences between methods for all metrics. Tukey’s HSD test confirmed that ASSCGD performed significantly better than the comparative methods for most metrics. Our method achieved the lowest NMSE values, which were statistically different from those of the comparative methods, for all acceleration rates except R = 2, where its performance did not differ statistically significantly from ReconFormer. Our method achieved the highest PSNR values for all acceleration rates, which differed statistically significantly from those of the comparative methods. Furthermore, our proposed method achieved the highest SSIM at R = 4, which was statistically significantly higher than that of the other methods. Nonetheless, our method and ReconFormer performed similarly in terms of the SSIM index at R = 2 and R = 8.
| Metric | R | Zero filled | ReconFormer | SS-MRI | Ours |
|---|---|---|---|---|---|
| NMSE (95% CI) [%] | 2 | 15.37 (13.46 - 17.49) | 0.53 (0.52 - 0.55)∗ | 5.18 (4.97 - 5.40) | 0.56 (0.54 - 0.57) |
| | 4 | 22.43 (20.65 - 24.33) | 2.04 (1.99 - 2.09) | 8.75 (8.36 - 9.18) | 1.87 (1.82 - 1.93) |
| | 8 | 33.17 (31.79 - 34.72) | 3.99 (3.90 - 4.10) | 10.76 (10.42 - 11.13) | 3.65 (3.56 - 3.75) |
| PSNR (95% CI) [dB] | 2 | 15.46 (15.39 - 15.53) | 35.42 (35.31 - 35.53) | 23.07 (22.98 - 23.16) | 36.15 (36.03 - 36.26) |
| | 4 | 15.35 (15.28 - 15.43) | 29.46 (29.37 - 29.56) | 21.31 (21.21 - 21.40) | 30.51 (30.42 - 30.61) |
| | 8 | 16.31 (16.23 - 16.40) | 26.54 (26.45 - 26.65) | 20.75 (20.66 - 20.85) | 27.94 (27.84 - 28.04) |
| SSIM (95% CI) [%] | 2 | 73.83 (73.43 - 74.21) | 95.75 (95.68 - 95.81)∗ | 85.12 (84.57 - 85.65) | 95.31 (95.25 - 95.38) |
| | 4 | 68.62 (68.29 - 68.94) | 89.11 (88.96 - 89.26) | 80.47 (80.06 - 80.89) | 89.80 (89.63 - 89.95) |
| | 8 | 66.13 (65.83 - 66.42) | 84.55 (84.32 - 84.78)∗ | 77.17 (76.75 - 77.62) | 84.77 (84.53 - 84.99) |

∗ indicates no statistically significant difference from our method in Tukey’s HSD test.
3.1.3 Voxel-wise correlation
We visualized scatter plots for 80 randomly selected within-domain multi-coil and single-coil data to quantify the agreement between the reconstructed images and the ground truth reference images (see Figure 6). For the single-coil dataset, our method (Figure 6(c)) reconstructed images with markedly better conformity with the references than SS-MRI (Figure 6(a)) and comparable conformity to ReconFormer (Figure 6(b)). Similar results were achieved for the multi-coil data, where SS-MRI performed better than it did on the single-coil dataset. Still, there is a noticeable gap between our proposed method (Figure 6(f)) and SS-MRI (Figure 6(d)). Furthermore, the voxel-wise agreement with the ground truth shown in Figure 6(e) and (f) suggests that our method generates images that are more similar to the ground truth.
3.1.4 Effect of Sampling Mask Ratio
Our proposed method split the sampling mask randomly into two non-overlapping masks, a training mask and a loss mask, which were used in the train and loss paths, respectively (see Figure 1). The partitioning ratio plays a role in the network’s performance. We trained and tested the network for varying ratios (see Figure 2 for the sampling masks). The effect of training the network with different ratios on the quantitative metrics NMSE, PSNR, and SSIM is shown in Figure 7 at three acceleration rates for multi-coil axial T2-w images. Our method achieved better performance at in terms of PSNR and SSIM at and . However, it performed similarly to at . Furthermore, NMSE metrics indicate that our method achieved the best performance when trained using than other splitting ratios .
3.2 Out-of-domain reconstruction
We evaluated our model performance in OOD reconstructions, where our proposed method was trained using multi-coil axial T2-w fastMRI images and then tested using axial T1-w and T1c images, shown in the first and second rows of Figure 8, respectively. Our method statistically significantly improved the quantitative metrics.
Our method achieved significantly lower NMSE values, indicating high voxel-wise similarity between the reconstructed and ground truth images for T1-w and T1c, illustrated in Figures 8(a) and (d), respectively. In addition, high PSNR values confirm minimal remaining spatial distortion in the reconstructed images for T1-w and T1c, illustrated in Figures 8(b) and (e), respectively. Finally, the high SSIM values demonstrate a remarkable resemblance between the reconstructed and ground truth images (see Figures 8(c) and (f)).
4 Discussion
This study introduces the Adaptive Self-supervised Consistency-guided Diffusion (ASSCGD) model, a novel self-supervised deep learning-based compressed sensing MRI (DL-based CS-MRI) method designed to mitigate the challenges associated with prolonged MRI acquisition times and the necessity for fully sampled datasets. Our approach leverages the synergy between CS and DL to accelerate MRI acquisition without compromising image quality.
The ASSCGD model’s primary innovation lies in its use of an adversarial mapper within a self-supervised framework, eliminating the dependence on fully sampled training datasets. This advancement is particularly significant for clinical scenarios where acquiring fully sampled data is impractical. By integrating a diffusion model, our method enhances sampling efficiency and reconstruction quality. The backward diffusion process, executed in smaller steps, contributes to robust and efficient image sampling. Extensive testing on OOD datasets demonstrated significant improvements in NMSE, PSNR, and SSIM metrics. These results highlight the model’s ability to generalize across various MRI sequences and patient-specific conditions, underscoring its versatility for clinical use.
The majority of DL-based CS-MRI methods use supervised learning to train networks [14, 16, 17, 19, 25]. However, acquiring fully sampled data can be challenging in some practical applications due to the long acquisition time, physiological constraints, and signal decay. For instance, fully sampled high-resolution brain MP2RAGE T1 maps can take around 24 minutes to acquire, which is impractical for large-scale studies and may lead to patient discomfort unless acceleration approaches are used. Such a long acquisition time increases the likelihood of patient movements, which could substantially reduce the image quality. Thus, self-supervised DL-based CS-MRI approaches are needed to broaden the range of applications in which acquiring such data is challenging.
Several self-supervised studies have been proposed to train models without using fully sampled data. For instance, a data-driven method of de-aliasing was proposed for single-coil data that performed an image-to-image translation [37]. However, it did not encode the operator and used similar sampling patterns for training and loss that increased noise during test time. Alternatively, a self-supervised study was proposed, assuming data acquired using two different sampling patterns [38]. Furthermore, physics-driven methods that unroll the training process were also proposed [20]. However, the unrolling nature of the method might increase the test burden and time.
Our self-supervised method uses a DC layer that was trained end-to-end and utilizes the sampling pattern that is feasible to implement clinically. Our experiment using the multi-coil and single-coil datasets enabled us to recover target ground truth images at markedly lower acquisition times (see Figures 3 - 5). Specifically, our method could recover fine details highlighted by gold and red boxes better than the comparative models. The negligible amount of remaining aliasing in the reconstructed images was confirmed by Figures 3 - 5. Furthermore, this observation was confirmed by the low NMSE and high PSNR achieved in Tables 1 and 2.
Although our method achieved its best performance at a specific partitioning ratio, Figure 7 suggests that it is resilient to variations of this hyper-parameter around that value. In addition, the out-of-domain reconstruction shown in Figure 8, with significantly better performance, indicates reasonable robustness against domain shifts. However, the results indicate that our method performed better on T1-w images (first row in Figure 8) than on T1c images (second row in Figure 8). That might be due to contrast agent enhancement in T1c in image regions that are not evident in T2-w images.
Despite the promising outcomes, several limitations need to be addressed. Our model was not tested on prospectively undersampled raw k-space datasets acquired under parallel imaging frameworks. Future work should explore the application of ASSCGD to these datasets to validate its clinical utility further. In addition, we did not train our method using raw multi-coil high-resolution 3D MRI, such as T1 magnetization-prepared rapid acquisition gradient echoes MRI images, because they are not readily available to the end-user. Expanding the training dataset to include such data could enhance the model’s performance.
5 Conclusions
The Adaptive Self-supervised Consistency-guided Diffusion (ASSCGD) model significantly improves the quality of reconstructed MRI images from undersampled data without requiring fully sampled training datasets, offering a promising solution for accelerating MRI acquisition. The proposed method has the potential to enhance clinical MRI practices by reducing scan times and improving image quality, which is crucial for accurate diagnosis and treatment planning. Our method is particularly useful in radiation oncology by reducing the imaging time and the likelihood of motion artifacts, specifically for MRI-linear accelerators.
Conflicts of interest
The authors declare no conflicts of interest.
Acknowledgement
This research is supported in part by the National Institutes of Health under Award Numbers R56EB033332, R01EB032680, and R01CA272991.
Data availability
Brain MRI data were obtained from the New York University fastMRI initiative database (https://fastmri.med.nyu.edu/) and the OpenNeuro MPI-Leipzig Mind-Brain-Body dataset (https://openneuro.org/datasets/ds000221/versions/00002). The datasets were acquired with the relevant institutional review board approvals.
References
- [1] Hedvig Hricak, May Abdel-Wahab, Rifat Atun, Miriam Mikhail Lette, Diana Paez, James A Brink, Lluís Donoso-Bach, Guy Frija, Monika Hierath, Ola Holmberg, et al. Medical imaging and nuclear medicine: a lancet oncology commission. The Lancet Oncology, 22(4):e136–e172, 2021.
- [2] Yilong Liu, Alex TL Leong, Yujiao Zhao, Linfang Xiao, Henry KF Mak, Anderson Chun On Tsang, Gary KK Lau, Gilberto KK Leung, and Ed X Wu. A low-cost and shielding-free ultra-low-field brain mri scanner. Nature communications, 12(1):7238, 2021.
- [3] Mojtaba Safari, Xiaofeng Yang, Chih-Wei Chang, Richard LJ Qiu, Ali Fatemi, and Louis Archambault. Unsupervised mri motion artifact disentanglement: introducing maudgan. Physics in Medicine and Biology, 2024.
- [4] Justin P Haldar, Diego Hernando, and Zhi-Pei Liang. Compressed-sensing mri with random encoding. IEEE transactions on Medical Imaging, 30(4):893–903, 2010.
- [5] Anagha Deshmane, Vikas Gulani, Mark A Griswold, and Nicole Seiberlich. Parallel mr imaging. Journal of Magnetic Resonance Imaging, 36(1):55–72, 2012.
- [6] Ricardo Otazo, Daniel Kim, Leon Axel, and Daniel K Sodickson. Combination of compressed sensing and parallel imaging for highly accelerated first-pass cardiac perfusion mri. Magnetic resonance in medicine, 64(3):767–776, 2010.
- [7] Yuchou Chang, Dong Liang, and Leslie Ying. Nonlinear grappa: A kernel approach to parallel mri reconstruction. Magnetic Resonance in Medicine, 68(3):730–740, 2012.
- [8] Mojtaba Safari, Zach Eidex, Chih-Wei Chang, Richard LJ Qiu, and Xiaofeng Yang. Fast mri reconstruction using deep learning-based compressed sensing: A systematic review. arXiv preprint arXiv:2405.00241, 2024.
- [9] Zhaoyang Jin and Qing-San Xiang. Improving accelerated mri by deep learning with sparsified complex data. Magnetic Resonance in Medicine, 89(5):1825–1838, 2023.
- [10] Chentao Cao, Zhuo-Xu Cui, Yue Wang, Shaonan Liu, Taijin Chen, Hairong Zheng, Dong Liang, and Yanjie Zhu. High-frequency space diffusion model for accelerated mri. IEEE Transactions on Medical Imaging, 2024.
- [11] Dominik Narnhofer, Kerstin Hammernik, Florian Knoll, and Thomas Pock. Inverse gans for accelerated mri reconstruction. In Wavelets and Sparsity XVIII, volume 11138, pages 381–392. SPIE, 2019.
- [12] Andreas Kofler, Marie-Christine Pali, Tobias Schaeffter, and Christoph Kolbitsch. Deep supervised dictionary learning by algorithm unrolling—application to fast 2d dynamic mr image reconstruction. Medical Physics, 50(5):2939–2960, 2023.
- [13] Biao Qu, Jialue Zhang, Taishan Kang, Jianzhong Lin, Meijin Lin, Huajun She, Qingxia Wu, Meiyun Wang, and Gaofeng Zheng. Radial magnetic resonance image reconstruction with a deep unrolled projected fast iterative soft-thresholding network. Computers in Biology and Medicine, 168:107707, 2024.
- [14] Mojtaba Safari, Xiaofeng Yang, and Ali Fatemi. Mri data consistency guided conditional diffusion probabilistic model for mr imaging acceleration. In Medical Imaging 2024: Clinical and Biomedical Imaging, volume 12930, pages 202–205. SPIE, 2024.
- [15] Alper Güngör, Salman UH Dar, Şaban Öztürk, Yilmaz Korkmaz, Hasan A Bedel, Gokberk Elmas, Muzaffer Ozbey, and Tolga Çukur. Adaptive diffusion priors for accelerated mri reconstruction. Medical Image Analysis, 88:102872, 2023.
- [16] Maarten L Terpstra, Matteo Maspero, Joost JC Verhoeff, and Cornelis AT van den Berg. Accelerated respiratory-resolved 4d-mri with separable spatio-temporal neural networks. Medical physics, 50(9):5331–5342, 2023.
- [17] Jo Schlemper, Jose Caballero, Joseph V Hajnal, Anthony N Price, and Daniel Rueckert. A deep cascade of convolutional neural networks for dynamic mr image reconstruction. IEEE transactions on Medical Imaging, 37(2):491–503, 2017.
- [18] Kawin Setsompop, R Kimmlingen, E Eberlein, Thomas Witzel, Julien Cohen-Adad, Jennifer A McNab, Boris Keil, M Dylan Tisdall, P Hoecht, Peter Dietz, et al. Pushing the limits of in vivo diffusion mri for the human connectome project. Neuroimage, 80:220–233, 2013.
- [19] Pengfei Guo, Yiqun Mei, Jinyuan Zhou, Shanshan Jiang, and Vishal M Patel. Reconformer: Accelerated mri reconstruction using recurrent transformer. IEEE transactions on medical imaging, 2023.
- [20] Burhaneddin Yaman, Seyed Amir Hossein Hosseini, Steen Moeller, Jutta Ellermann, Kâmil Uğurbil, and Mehmet Akçakaya. Self-supervised learning of physics-guided reconstruction neural networks without fully sampled reference data. Magnetic resonance in medicine, 84(6):3172–3191, 2020.
- [21] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International conference on machine learning, pages 2256–2265. PMLR, 2015.
- [22] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.
- [23] Calvin Luo. Understanding diffusion models: A unified perspective. arXiv preprint arXiv:2208.11970, 2022.
- [24] Stanley H Chan. Tutorial on diffusion models for imaging and vision. arXiv preprint arXiv:2403.18103, 2024.
- [25] Jiahao Huang, Angelica I Aviles-Rivero, Carola-Bibiane Schönlieb, and Guang Yang. Cdiffmr: Can we replace the gaussian noise with k-space undersampling for fast mri? In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 3–12. Springer, 2023.
- [26] Mojtaba Safari, Xiaofeng Yang, Ali Fatemi, and Louis Archambault. Mri motion artifact reduction using a conditional diffusion probabilistic model (mar-cdpm). Medical Physics, 51(4):2598–2610, 2024.
- [27] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
- [28] Jure Zbontar, Florian Knoll, Anuroop Sriram, Tullie Murrell, Zhengnan Huang, Matthew J. Muckley, Aaron Defazio, Ruben Stern, Patricia Johnson, Mary Bruno, Marc Parente, Krzysztof J. Geras, Joe Katsnelson, Hersh Chandarana, Zizhao Zhang, Michal Drozdzal, Adriana Romero, Michael Rabbat, Pascal Vincent, Nafissa Yakubova, James Pinkerton, Duo Wang, Erich Owens, C. Lawrence Zitnick, Michael P. Recht, Daniel K. Sodickson, and Yvonne W. Lui. fastMRI: An open dataset and benchmarks for accelerated MRI, 2018.
- [29] Florian Knoll, Jure Zbontar, Anuroop Sriram, Matthew J Muckley, Mary Bruno, Aaron Defazio, Marc Parente, Krzysztof J Geras, Joe Katsnelson, Hersh Chandarana, et al. fastmri: A publicly available raw k-space and dicom dataset of knee images for accelerated mr image reconstruction using machine learning. Radiology: Artificial Intelligence, 2(1):e190007, 2020.
- [30] Anahit Babayan, Blazeij Baczkowski, Roberto Cozatl, Maria Dreyer, Haakon Engen, Miray Erbey, Marcel Falkiewicz, Nicolas Farrugia, Michael Gaebler, Johannes Golchert, Laura Golz, Krzysztof Gorgolewski, Philipp Haueis, Julia Huntenburg, Rebecca Jost, Yelyzaveta Kramarenko, Sarah Krause, Deniz Kumral, Mark Lauckner, Daniel S. Margulies, Natacha Mendes, Katharina Ohrnberger, Sabine Oligschläger, Anastasia Osoianu, Jared Pool, Janis Reichelt, Andrea Reiter, Josefin Röbbig, Lina Schaare, Jonathan Smallwood, and Arno Villringer. “mpi-leipzig-mind-brain-body”, 2018.
- [31] Anahit Babayan, Miray Erbey, Deniz Kumral, Janis D Reinelt, Andrea MF Reiter, Josefin Röbbig, H Lina Schaare, Marie Uhlig, Alfred Anwander, Pierre-Louis Bazin, et al. A mind-brain-body dataset of mri, eeg, cognition, emotion, and peripheral physiology in young and old adults. Scientific data, 6(1):1–21, 2019.
- [32] Martin Uecker, Peng Lai, Mark J Murphy, Patrick Virtue, Michael Elad, John M Pauly, Shreyas S Vasanawala, and Michael Lustig. Espirit—an eigenvalue approach to autocalibrating parallel mri: where sense meets grappa. Magnetic resonance in medicine, 71(3):990–1001, 2014.
- [33] Chen Hu, Cheng Li, Haifeng Wang, Qiegen Liu, Hairong Zheng, and Shanshan Wang. Self-supervised learning for mri reconstruction with a parallel network training framework. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part VI 24, pages 382–391. Springer, 2021.
- [34] Sergey Kastryulin, Jamil Zakirov, Denis Prokopenko, and Dmitry V. Dylov. Pytorch image quality: Metrics for image quality assessment, 2022.
- [35] Zach Eidex, Jing Wang, Mojtaba Safari, Eric Elder, Jacob Wynne, Tonghe Wang, Hui-Kuo Shu, Hui Mao, and Xiaofeng Yang. High-resolution 3t to 7t adc map synthesis with a hybrid cnn-transformer model. Medical Physics, 2024.
- [36] Bradley Efron and Robert J Tibshirani. Confidence intervals based on bootstrap percentiles. In An introduction to the bootstrap, pages 168–177. Springer, 1993.
- [37] Ortal Senouf, Sanketh Vedula, Tomer Weiss, Alex Bronstein, Oleg Michailovich, and Michael Zibulevsky. Self-supervised learning of inverse problem solvers in medical imaging. In Domain Adaptation and Representation Transfer and Medical Image Learning with Less Labels and Imperfect Data: First MICCAI Workshop, DART 2019, and First International Workshop, MIL3ID 2019, Shenzhen, Held in Conjunction with MICCAI 2019, Shenzhen, China, October 13 and 17, 2019, Proceedings 1, pages 111–119. Springer, 2019.
- [38] Jiaming Liu, Yu Sun, Cihat Eldeniz, Weijie Gan, Hongyu An, and Ulugbek S Kamilov. Rare: Image reconstruction using deep priors learned without groundtruth. IEEE Journal of Selected Topics in Signal Processing, 14(6):1088–1099, 2020.