Abstract

Background: Magnetic resonance imaging (MRI) offers excellent soft tissue contrast essential for diagnosis and treatment, but its long acquisition times can cause patient discomfort and motion artifacts.

Purpose: To propose a self-supervised deep learning-based compressed sensing MRI (DL-based CS-MRI) method named “Adaptive Self-Supervised Consistency Guided Diffusion Model (ASSCGD)” to accelerate data acquisition without requiring fully sampled datasets.

Materials and Methods: We used the fastMRI multi-coil brain axial $T_2$-weighted ($T_2$-w) dataset from 1,376 cases and single-coil brain quantitative magnetization prepared 2 rapid acquisition gradient echoes (MP2RAGE) $T_1$ maps from 318 cases to train and test our model. Robustness against domain shift was evaluated using two out-of-distribution (OOD) datasets: a multi-coil brain axial postcontrast $T_1$-weighted ($T_1$c) dataset from 50 cases and an axial $T_1$-weighted ($T_1$-w) dataset from 50 patients. Data were retrospectively subsampled at acceleration rates $R \in \{2\times, 4\times, 8\times\}$. ASSCGD partitions a random sampling pattern into two disjoint sets, ensuring data consistency during training. We compared our method with ReconFormer Transformer and SS-MRI, assessing performance using normalized mean squared error (NMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM). Statistical tests included one-way analysis of variance (ANOVA) and multi-comparison Tukey’s Honestly Significant Difference (HSD) tests.

Results: ASSCGD preserved fine structures and brain abnormalities visually better than comparative methods at $R = 8\times$ for both multi-coil and single-coil datasets. It achieved the lowest NMSE at $R \in \{4\times, 8\times\}$, and the highest PSNR and SSIM values at all acceleration rates for the multi-coil dataset. Similar trends were observed for the single-coil dataset, though SSIM values were comparable to ReconFormer at $R \in \{2\times, 8\times\}$. These results were further confirmed by the voxel-wise correlation scatter plots. OOD results showed significant ($p \ll 10^{-5}$) improvements in undersampled image quality after reconstruction.

Conclusions: ASSCGD successfully reconstructs fully sampled images without utilizing them in the training step, potentially reducing imaging costs and enhancing image quality crucial for diagnosis and treatment.

Adaptive Self-Supervised Consistency-Guided Diffusion Model for Accelerated MRI Reconstruction
Mojtaba Safari1, Zach Eidex1, Shaoyan Pan1, Richard L.J. Qiu1, Xiaofeng Yang, PhD1,‡
1Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA 30322, United States of America
Corresponding Author

Author to whom correspondence should be addressed. email: [email protected]

keywords: k-space sampling, fastMRI, accelerated MRI, reconstruction, adaptive partitioning

1 Introduction

Magnetic resonance imaging (MRI) provides excellent soft tissue contrast, playing a vital role in diagnosis, treatment, and follow-up. However, the prolonged acquisition times can lead to patient discomfort and motion artifacts, which compromise image quality. In addition, the recent Lancet Oncology Commission highlighted severe MRI and other medical imaging device shortages in low- and middle-income countries [1], potentially resulting in 2.5 million deaths worldwide [2]. Globally, only seven MRI scanners are installed per million people, primarily due to the high cost of installation, operation, and maintenance. In addition, the highly sampled k-space required for high-resolution MRI images prohibitively increases acquisition time. Long acquisition time reduces imaging throughput and increases patient discomfort and motion artifacts [3].

MRI acquisition can be accelerated by reducing the sampled k-space data, but this is limited by the Nyquist criterion. Compressed sensing (CS) and parallel imaging (PI) techniques aim to recover fully sampled images from under-sampled data by exploiting, respectively, sparsity in a transform domain and redundant data acquired using uncorrelated radiofrequency coils [4, 5]. However, at high acceleration rates, PI and CS methods suffer from noise amplification [6] and residual artifacts [7], respectively.

Deep learning (DL) algorithms have been extensively used to reconstruct accelerated high-resolution MRI images. DL-based compressed sensing-MRI (DL-based CS-MRI) approaches are roughly divided into two categories: data-driven and physics-guided [8]. Data-driven approaches map the under-sampled k-space to the fully sampled k-space [9, 10, 11]. Physics-guided approaches incorporate knowledge of the forward encoding operator, such as the coil sensitivity and under-sampling pattern, to solve ill-posed inverse problems. They are mainly divided into two groups: unrolled optimization methods and data consistency (DC) layers. Unrolled methods unroll an iterative reconstruction approach that alternates between data consistency and regularization to solve the objective in a fixed number of iterations [12, 13, 14]. The DC layer is an end-to-end way of training the unrolled models and is usually the last layer of the network [15, 16, 17].

These methods are typically trained under supervised frameworks in which reference fully sampled images are used to train a model. However, obtaining fully sampled images might be impractical in imaging scenarios such as cardiovascular MRI, due to excessive involuntary movements, or diffusion MRI with echo planar imaging, due to quick $T_2^*$ signal decay [18]. Additionally, acquiring high-resolution anatomical brain MRI images can be prohibitively long.

In this study, we propose an adaptive self-supervised consistency-guided diffusion (ASSCGD) model to reconstruct fully sampled images without requiring them in the training step. Our proposed ASSCGD model is based on an adversarial mapper to reconstruct fully sampled MRI images. Our proposed method was evaluated using both single-coil and multi-coil MRI data, as well as two out-of-distribution (OOD) datasets. Our method leverages a recently proposed ReconFormer transformer [19] as a generator and is compared with two state-of-the-art models. Our contributions are as follows:

  • To our knowledge, ASSCGD is the first study proposing a self-supervised method using an adversarial mapper.

  • The proposed method performs the backward diffusion process in smaller steps that improve sampling efficiency.

  • The proposed method’s robustness against domain shift was evaluated at test time.

  • To our knowledge, it is the first self-supervised method aimed at reconstructing fully sampled quantitative magnetization prepared 2 rapid acquisition gradient echoes (MP2RAGE) $T_1$ maps.

2 Materials and Methods

2.1 Compressed sensing MRI

Let $y \in \mathbb{C}^{N}$ represent the observed subsampled k-space measurement and $x \in \mathbb{C}^{M}$ the unobserved fully sampled data. Compressed sensing is formulated as follows:

$$y = \mathcal{A}_{\Omega} x + \delta \tag{1}$$

where $\delta \in \mathbb{C}^{N}$ is the additive acquisition noise and $\mathcal{A}_{\Omega} \in \mathbb{C}^{N \times M}$, which maps $\mathbb{C}^{M} \to \mathbb{C}^{N}$ with $N \ll M$, represents the encoding operator. $\mathcal{A}_{\Omega}$ is composed of a coil sensitivity map $\mathcal{S}$, a Fourier transform $\mathcal{F}$, and a sampling map with the specified pattern $\Omega$ controlling the acceleration rate ($R$). The mathematical expression for the encoding operator is $\mathcal{A}_{\Omega} = \Omega \odot \mathcal{F} \odot \mathcal{S}$. The MRI reconstruction is formulated as an unconstrained optimization problem as follows [14, 17, 20]:

$$\underset{x,\varphi}{\arg\min} \parallel x - f_{\text{ASSCGD}}(y \mid \varphi) \parallel_2^2 + \lambda \parallel y - \mathcal{A}_{\Omega} x \parallel_2^2 \tag{2}$$

where $f_{\text{ASSCGD}}$ represents our proposed model parametrized by $\varphi$. The first and second terms represent the regularization and data consistency, and $\lambda > 0$ is a scalar regularization weight that balances between the data consistency and regularization terms.
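
As an illustration of Eq. (1), the encoding operator can be sketched in NumPy as three element-wise or transform steps applied in sequence. This is a minimal sketch rather than the authors' implementation: the function name `encode`, the array shapes, the regular every-fourth-column mask, and the noise level are all assumptions made for the example, and the k-space centering shifts used in practice are omitted.

```python
import numpy as np

def encode(x, sens_maps, mask):
    """Apply A_Omega = Omega (.) F (.) S to an image x (hypothetical helper).

    x         : (H, W) complex image
    sens_maps : (C, H, W) complex coil sensitivity maps (S)
    mask      : (H, W) binary sampling pattern (Omega)
    returns   : (C, H, W) under-sampled multi-coil k-space
    """
    coil_images = sens_maps * x[None, ...]            # S: weight the image by each coil
    kspace = np.fft.fft2(coil_images, norm="ortho")   # F: 2D Fourier transform per coil
    return mask[None, ...] * kspace                   # Omega: keep sampled locations only

rng = np.random.default_rng(0)
H, W, C = 8, 8, 2
x = rng.standard_normal((H, W)) + 1j * rng.standard_normal((H, W))
sens = rng.standard_normal((C, H, W)) + 1j * rng.standard_normal((C, H, W))

# Assumed toy pattern: keep every 4th phase-encoding column (R = 4).
mask = np.zeros((H, W))
mask[:, ::4] = 1.0

# Additive acquisition noise delta, present only where data are acquired.
delta = 0.01 * (rng.standard_normal((C, H, W)) + 1j * rng.standard_normal((C, H, W)))
y = encode(x, sens, mask) + mask[None, ...] * delta
```

Here `y` has zeros at every unsampled k-space location, which is the information the reconstruction network has to fill in.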

2.2 Diffusion model

The diffusion model, inspired by nonequilibrium thermodynamics, aims to approximate complex and intractable distributions with a tractable one such as a standard Gaussian [21]. It consists of two processes: the forward process and the reverse process.

Forward process:

The forward process uses a noise scheduler to add Gaussian noise to the noise-free $y_0$ through a first-order Markov process $q(y_t \mid y_{t-1})$ over a large number of steps $T$, so that $y_T$ eventually follows a standard multivariate Gaussian, $y_T \sim \mathcal{N}(y_T; \boldsymbol{0}, \mathbf{I})$. Each transition $q(y_t \mid y_{t-1})$ is calculated as follows [22]:

$$\begin{aligned}
y_t &= \sqrt{1-\beta_t}\, y_{t-1} + \sqrt{\beta_t}\, \epsilon \\
q(y_t \mid y_{t-1}) &= \mathcal{N}(y_t; \sqrt{1-\beta_t}\, y_{t-1}, \beta_t \mathbf{I})
\end{aligned} \tag{3}$$

where $\epsilon \sim \mathcal{N}(\epsilon; \boldsymbol{0}, \mathbf{I})$ and $\beta_t \in (0,1)$, with $\beta_1 = 10^{-4}$, is the noise variance. Assuming additive Gaussian noise, sampling $y_t$ at an arbitrary step $t$ can be calculated in closed form as follows:

$$q(y_t \mid y_0) = \mathcal{N}(y_t; \sqrt{\bar{\alpha}_t}\, y_0, (1-\bar{\alpha}_t)\mathbf{I}) \tag{4}$$

where $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$.
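
The closed form in Eq. (4) means a noisy sample at any step can be drawn directly from $y_0$ without iterating through the chain. The sketch below illustrates this under stated assumptions: the linear schedule, its end value of 0.02, and $T = 1000$ are not given in the text (only $\beta_1 = 10^{-4}$ is), the helper name `q_sample` is hypothetical, and real-valued arrays stand in for complex MRI data.

```python
import numpy as np

T = 1000
# Assumed linear noise schedule; only beta_1 = 1e-4 comes from the text.
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)          # bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(y0, t, rng):
    """Draw y_t ~ q(y_t | y_0) via Eq. (4); t is 1-indexed."""
    a_bar = alpha_bars[t - 1]
    eps = rng.standard_normal(y0.shape)
    y_t = np.sqrt(a_bar) * y0 + np.sqrt(1.0 - a_bar) * eps
    return y_t, eps

rng = np.random.default_rng(0)
y0 = np.ones((64, 64))
y_early, _ = q_sample(y0, 1, rng)        # almost identical to y0
y_late, _ = q_sample(y0, T, rng)         # close to pure Gaussian noise
```

With this schedule, $\bar{\alpha}_T$ is close to zero, so `y_late` retains almost none of the original signal, which matches the statement that $y_T$ approaches a standard Gaussian.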

Reverse process:

The reverse process gradually learns to remove the added noise in $y_T$ to recover the noise-free $y_0$. This is done by training a network $p_{\varphi}$ to generate $y_0$ from Gaussian noise $y_T$. For small $\beta$ values, the reverse process follows the forward process trajectories in the reverse direction, as follows [21, 23, 24]:

$$\underset{\varphi}{\arg\min} \sum_{t=2}^{T} \mathbb{E}_{q(y_t \mid y_0)} \left[ D_{\mathbb{KL}} \left( q(y_{t-1} \mid y_t, y_0) \parallel p_{\varphi}(y_{t-1} \mid y_t) \right) \right] \tag{5}$$

This is a denoising matching term: it learns the desired denoising transition step $p_{\varphi}(y_{t-1} \mid y_t)$ as an approximation to the tractable, ground-truth denoising transition step $q(y_{t-1} \mid y_t, y_0)$ given in (6), which is modeled as a Gaussian.

$$\begin{aligned}
q(y_{t-1} \mid y_t, y_0) &= \mathcal{N}\left(y_{t-1}; \mu_q(y_t, y_0), \sigma_q^2(t)\mathbf{I}\right) \\
\mu_q(y_t, y_0) &= \dfrac{\sqrt{\alpha_t}(1-\bar{\alpha}_{t-1})\, y_t + \sqrt{\bar{\alpha}_{t-1}}(1-\alpha_t)\, y_0}{1-\bar{\alpha}_t} \\
\sigma_q^2(t) &= \dfrac{(1-\alpha_t)(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}
\end{aligned} \tag{6}$$

Furthermore, because all the $\alpha$ terms are fixed at each timestep, it was shown that the loss function given in (5) reduces to:

$$\text{\L{}} = \underset{\varphi}{\arg\min}\, \dfrac{1}{2\sigma_q^2(t)} \left[ \parallel \mu_q(y_t, y_0) - \mu_{\varphi}(y_t) \parallel_2^2 \right] \tag{7}$$

where $\mu_{\varphi}(y_t)$ is the estimated mean of the recovered image:

$$\mu_{\varphi}(y_t) = \dfrac{\sqrt{\alpha_t}(1-\bar{\alpha}_{t-1})\, y_t + \sqrt{\bar{\alpha}_{t-1}}(1-\alpha_t)\, y_{\varphi}(y_t)}{1-\bar{\alpha}_t} \tag{8}$$

where $y_{\varphi}(y_t)$ is parameterized by our DL model to recover $y_0$ from the noisy image $y_t$ at a given step $t$.
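
To make the posterior in Eq. (6) concrete, the sketch below computes its mean and variance using the standard DDPM form, in which the $y_0$ term of the mean carries a factor $(1-\alpha_t)$. It then checks numerically that the same mean can be written in terms of the noise $\epsilon$ that produced $y_t$. The helper name `posterior_mean_var`, the linear schedule, and the toy array sizes are assumptions for illustration only.

```python
import numpy as np

def posterior_mean_var(y_t, y0, t, alphas, alpha_bars):
    """Mean and variance of q(y_{t-1} | y_t, y_0) for 2 <= t <= T (t is 1-indexed)."""
    a_t = alphas[t - 1]
    ab_t = alpha_bars[t - 1]
    ab_prev = alpha_bars[t - 2]
    mean = (np.sqrt(a_t) * (1.0 - ab_prev) * y_t
            + np.sqrt(ab_prev) * (1.0 - a_t) * y0) / (1.0 - ab_t)
    var = (1.0 - a_t) * (1.0 - ab_prev) / (1.0 - ab_t)
    return mean, var

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)    # assumed schedule, not from the text
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

t = 500
y0 = rng.standard_normal((4, 4))
eps = rng.standard_normal((4, 4))
y_t = np.sqrt(alpha_bars[t - 1]) * y0 + np.sqrt(1.0 - alpha_bars[t - 1]) * eps

mean, var = posterior_mean_var(y_t, y0, t, alphas, alpha_bars)

# Equivalent noise-based form of the same mean.
mean_from_eps = (y_t - (1.0 - alphas[t - 1]) / np.sqrt(1.0 - alpha_bars[t - 1]) * eps) \
    / np.sqrt(alphas[t - 1])
```

The two expressions for the mean agree to numerical precision, which is why a network can be trained to predict either $y_0$ or $\epsilon$.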

We employed an adversarial mapper to implicitly capture the conditional distribution for the reverse process steps. The generator $f_{\text{ASSCGD}}$ is used to sample $\hat{y}_t \sim p_{\varphi}(y_t \mid y_{t+k})$. At the same time, a discriminator $\mathcal{D}_{\theta}$ differentiates between the generated samples $\hat{y}_t$ and actual samples $y_t$ drawn from the true denoising distribution $q(y_t \mid y_{t+k})$. Our discriminator was coupled with a gradient penalty to improve learning [15, 24]:

$$\begin{aligned}
L_D = \sum_{t \geq 0} \Big( &\mathbb{E}_{q(y_0, y_t)} \mathbb{E}_{q(y_{t+k} \mid y_t)} \left[ -\log\left( \mathcal{D}_{\theta}(y_t, y_{t+k}) \right) \right] \\
&+ \mathbb{E}_{q(y_{t+k})} \mathbb{E}_{\mathcal{N}_{\varphi}(\mu_{\varphi}, \sigma_{\varphi})} \left[ -\log\left( 1 - \mathcal{D}_{\theta}(\hat{y}_t, y_{t+k}) \right) \right] \\
&+ \mathbb{E}_{q(y_0, y_t)} \mathbb{E}_{q(y_{t+k} \mid y_t)} \left[ \dfrac{1}{2} \parallel \nabla_{y_t} \mathcal{D}_{\theta}(\hat{y}_t, y_{t+k}) \parallel_2^2 \right] \Big)
\end{aligned} \tag{9}$$

where $\mathcal{N}_{\varphi}(\mu_{\varphi}, \sigma_{\varphi})$ is our generator parameterized by $\varphi$ to reconstruct the mean and variance. The generator loss, evaluated on the generated samples $\hat{y}_t$, becomes:

$$L_G = \sum_{t \geq 0} \mathbb{E}_{q(y_{t+k})} \mathbb{E}_{\mathcal{N}_{\varphi}(\mu_{\varphi}, \sigma_{\varphi})} \left[ -\log\left( \mathcal{D}_{\theta}(\hat{y}_t, y_{t+k}) \right) \right] \tag{10}$$
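
The structure of the two adversarial losses can be sketched for a single step $t$ as plain functions of the discriminator outputs. This is an illustrative simplification, not the training code: the function names are hypothetical, the discriminator outputs are assumed to be probabilities in $(0, 1)$, and the gradient used in the penalty is passed in as an array because computing it requires an automatic differentiation framework.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, grad=None, eps=1e-12):
    """Single-step version of Eq. (9).

    d_real : D(y_t, y_{t+k}) on actual samples, shape (B,)
    d_fake : D(hat{y}_t, y_{t+k}) on generated samples, shape (B,)
    grad   : optional gradient of D w.r.t. its first input, shape (B, ...)
    """
    loss = -np.log(d_real + eps).mean() - np.log(1.0 - d_fake + eps).mean()
    if grad is not None:
        # Gradient penalty: half the squared L2 norm, averaged over the batch.
        flat = grad.reshape(grad.shape[0], -1)
        loss += 0.5 * np.mean(np.sum(flat ** 2, axis=1))
    return loss

def generator_loss(d_fake, eps=1e-12):
    """Single-step version of Eq. (10), evaluated on generated samples."""
    return -np.log(d_fake + eps).mean()

d_real = np.array([0.5, 0.5])
d_fake = np.array([0.5, 0.5])
L_D = discriminator_loss(d_real, d_fake)
L_G = generator_loss(d_fake)
L_D_gp = discriminator_loss(d_real, d_fake, grad=np.ones((2, 3)))
```

When the discriminator cannot tell real from generated samples (all outputs 0.5), the first two terms of the discriminator loss sum to $2\ln 2$ and the generator loss equals $\ln 2$.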

2.3 Proposed self-supervised framework and training details

Acquiring fully sampled data can be impractical due to constraints such as voluntary and involuntary motions, lengthy acquisition times, and signal decay. These constraints hinder the application of supervised DL-based CS-MRI approaches. Thus, we proposed a self-supervised approach that randomly divided the sampling pattern $\Omega$ into two sets $\aleph$ and $\Upsilon$ as given in (11). These two sets have no elements in common except the center of k-space.

$$\Omega = \aleph \vee \Upsilon \tag{11}$$
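
One way to realize the partition in Eq. (11) is sketched below for a one-dimensional column mask. The text does not specify how large the shared center is or how the remaining samples are divided, so `center_frac` and `rho` are hypothetical parameters, and `partition_mask` is an illustrative helper rather than the authors' routine.

```python
import numpy as np

def partition_mask(omega, center_frac=0.08, rho=0.5, rng=None):
    """Split a boolean column mask omega into aleph and upsilon.

    center_frac : assumed fraction of central columns shared by both sets
    rho         : assumed fraction of the remaining sampled columns given to aleph
    """
    rng = np.random.default_rng() if rng is None else rng
    W = omega.size
    n_center = max(1, int(round(center_frac * W)))
    start = (W - n_center) // 2
    center = np.zeros(W, dtype=bool)
    center[start:start + n_center] = True
    center &= omega                        # share only columns that were acquired

    outer = np.flatnonzero(omega & ~center)
    rng.shuffle(outer)                     # random split, redrawn on every call
    n_aleph = int(round(rho * outer.size))

    aleph = center.copy()
    aleph[outer[:n_aleph]] = True
    upsilon = center.copy()
    upsilon[outer[n_aleph:]] = True
    return aleph, upsilon

rng = np.random.default_rng(0)
W = 64
omega = rng.random(W) < 0.25               # toy random sampling pattern
omega[28:36] = True                        # fully sampled center region
aleph, upsilon = partition_mask(omega, rng=rng)
overlap = aleph & upsilon
```

Calling `partition_mask` once per slice gives a fresh random split each time, in line with the per-slice partitioning described in this section.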

We used the under-sampling pattern $\aleph$ to train our proposed model and defined our DC layer as follows:

$$\hat{Y}_t^{\text{ASSCGD}}(k) = \begin{cases} \hat{Y}_t^{\text{ASSCGD}}(k), & \text{if } k \in \aleph \\ X(k), & \text{if } k \notin \aleph \end{cases} \tag{12}$$

where capital letters refer to the Fourier transform of the corresponding parameters. In other words, our method updates the k-space lines that were under-sampled and keeps the original k-space lines that were not sampled during image acquisition.
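
Read literally, Eq. (12) is an element-wise selection in k-space between two arrays, controlled by the mask $\aleph$. The sketch below implements exactly that selection and nothing more. The function name `dc_layer` and the toy arrays are assumptions, and the mask is taken to be a boolean vector over k-space columns.

```python
import numpy as np

def dc_layer(Y_hat, X, aleph):
    """Element-wise selection of Eq. (12): Y_hat where k is in aleph, X elsewhere.

    Y_hat : (H, W) complex k-space estimate
    X     : (H, W) complex k-space used at the remaining locations
    aleph : (W,) boolean column mask, broadcast over rows
    """
    return np.where(aleph, Y_hat, X)

H, W = 4, 6
Y_hat = np.ones((H, W), dtype=complex)
X = np.zeros((H, W), dtype=complex)
aleph = np.array([True, False, True, False, False, True])
out = dc_layer(Y_hat, X, aleph)
```

Because the operation is a pure selection, it adds no trainable parameters and can be placed after the network output at every step.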

Finally, our proposed method defined the loss function given in (7) using the sampling pattern $\Upsilon$ as follows:

$$\text{\L{}}^{\Upsilon} = \underset{\varphi}{\arg\min}\, \dfrac{1}{2\sigma_q^2(t)} \dfrac{(1-\alpha_t)^2}{(1-\bar{\alpha}_t)\alpha_t} \left[ \parallel \epsilon - \hat{\epsilon}_{\varphi}^{\Upsilon}(y_t) \parallel_2^2 \right] \tag{13}$$
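
The weight multiplying the noise term in (13) can be traced with a short derivation. The sketch below assumes that the network is re-parameterized to predict the noise $\hat{\epsilon}_{\varphi}$ instead of $y_0$, and that the posterior mean has the standard DDPM form in which the $y_0$ term carries a factor $(1-\alpha_t)$.

```latex
\begin{aligned}
y_0 &= \frac{y_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon}{\sqrt{\bar{\alpha}_t}}
  && \text{(Eq. (4) solved for } y_0\text{)} \\
\mu_q(y_t, y_0) &= \frac{1}{\sqrt{\alpha_t}}\, y_t
  - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}\,\sqrt{\alpha_t}}\,\epsilon
  && \text{(substituted into the posterior mean)} \\
\mu_{\varphi}(y_t) &= \frac{1}{\sqrt{\alpha_t}}\, y_t
  - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}\,\sqrt{\alpha_t}}\,\hat{\epsilon}_{\varphi}(y_t)
  && \text{(same form with predicted noise)} \\
\parallel \mu_q(y_t, y_0) - \mu_{\varphi}(y_t) \parallel_2^2
  &= \frac{(1-\alpha_t)^2}{(1-\bar{\alpha}_t)\,\alpha_t}
  \parallel \epsilon - \hat{\epsilon}_{\varphi}(y_t) \parallel_2^2
\end{aligned}
```

Multiplying the last line by $1/(2\sigma_q^2(t))$ gives the same weighting as in (13), with the superscript $\Upsilon$ indicating that the loss is evaluated on the held-out sampling pattern.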

The final loss function is composed of discriminator loss, generator loss, and reconstruction loss as follows:

$$L_{\text{final}} = \text{\L{}}^{\Upsilon} + \lambda (L_D + L_G) \tag{14}$$

where $\lambda = 0.1$ was used to scale and control the ratio between the losses. Our proposed method used the indices specified by $\aleph$ to reconstruct k-space at the indices given by $\Upsilon$. The loss function was then calculated at the locations indicated by the $\Upsilon$ pattern. In other words, our self-supervised network was trained to decrease the discrepancy between the predicted images $y^{\aleph}$ and the acquired measurement $y^{\Upsilon}$ that was not seen in training. At the inference step, unseen test data under-sampled with the sampling pattern $\Omega$ were used to reconstruct the fully sampled data. Figure 1 illustrates the flowchart of our proposed method.

This self-supervised scenario is similar to the cross-validation concept used to reduce bias and the likelihood of overfitting. Cross-validation partitions the dataset into at least two sets where one set is used to train a model and another set is used to evaluate the model. However, unlike cross-validation in machine learning, which performs partitioning once for all data, our method performs random partitioning per image slice.

In the inference step, we did not start from complete noise. Instead, we set the Gaussian noise covariance to 0.1 and used the resulting samples directly as initial inputs. This approach has previously been employed to reconstruct high-resolution MR images from under-sampled k-space data and to remove MRI motion artifacts [25, 26].

We used the original implementation of the recently proposed transformer ReconFormer (https://github.com/guopengf/ReconFormer) as the generator. It incorporates pyramid structures, enabling scale processing at each pyramid unit, while its globally columnar structure maintains high-resolution information [19]. The discriminator consisted of four convolution layers with a kernel size of $3\times 3$ and a padding of one. Each convolution layer was followed by a ReLU activation layer and a batch normalization layer.

The networks were trained using the Adam optimizer with a learning rate of $2\times 10^{-4}$ to minimize the loss function, with a batch size of four over 25 epochs. All training was performed on our server using the PyTorch [27] framework version 2.1.2 and NVIDIA A100 GPUs with CUDA Toolkit 12.2.

Figure 1: Flowchart of our proposed self-supervised adaptive diffusion model. The coil sensitivity $\mathcal{S}$, the random sampling mask $\Omega$, and the k-space are the inputs. First, the random sampling pattern $\Omega$ is divided into two non-overlapping sampling masks $\aleph$ and $\Upsilon$, which are used in the training path and the loss path, as given in (13), respectively. Then, the adversarial mapper is trained as given in (9) and (10) using the data sampled at step $t+k$. The sample $y_{t+k}$ is used to recover $\hat{y}_0$, which is then used to calculate $\hat{y}_t$ at a given step $t$ using Equation (4).

2.4 Sampling masks

We employed a 1D random sampling pattern $\Omega$ in which the k-space center was excluded from sampling. The center fraction excluded from sampling was set to $4\%$ of the k-space lines in the horizontal direction. Our proposed ASSCGD divides the acquired sampling mask $\Omega$ into two disjoint sets $\aleph$ and $\Upsilon$, except for four k-space lines at the center. Figure 2 illustrates the $\Omega$, $\aleph$, and $\Upsilon$ sampling masks for $\rho \in \{0.3, 0.5, 0.7\}$ at $R = 4\times$.

The sampling pattern $\Upsilon$ was randomly drawn for each slice to train the model. Therefore, data subsampled using $\Upsilon$ can simulate the ghosting that is present in zero-filled datasets retrospectively subsampled using $\Omega$ masks. We investigated a uniformly random selection among the elements of $\Omega$ to create the subsampled patterns $\aleph$ and $\Upsilon$. The sampling ratio $\rho = \frac{|\aleph|}{|\Upsilon|} \in \{0.3, 0.5, 0.7\}$ was used to train and test the proposed model, where $|\cdot|$ denotes the number of elements. In other words, $\rho = 0.3$ means that 30% of $\Omega$ was randomly assigned to $\aleph$ to train the model, and the rest was assigned to the $\Upsilon$ sampling pattern to define the loss function given in Equation (13).
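The per-slice partitioning can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `split_mask` is hypothetical, `rho` is interpreted as the fraction of the non-center lines of $\Omega$ assigned to $\aleph$ (following the "30% of $\Omega$" wording above), and the center lines are assumed to be kept in both sets.

```python
import random


def split_mask(omega, rho, center=(), seed=None):
    """Split sampled line indices `omega` into two sets (aleph, upsilon).

    `rho` is the fraction of the non-center lines assigned to aleph.
    Lines listed in `center` are kept in both sets (an assumption of this sketch).
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    rng = random.Random(seed)
    center_set = set(center)
    candidates = sorted(set(omega) - center_set)
    rng.shuffle(candidates)  # uniformly random selection among elements of omega
    n_aleph = round(rho * len(candidates))
    aleph = set(candidates[:n_aleph]) | center_set
    upsilon = set(candidates[n_aleph:]) | center_set
    return aleph, upsilon


# Toy example: 24 sampled lines, 4 of which are center lines.
omega = list(range(0, 40, 2)) + [19, 21, 23, 25]
center = [19, 20, 21, 22]
aleph, upsilon = split_mask(omega, rho=0.3, center=center, seed=0)
```

Calling `split_mask` with a different seed for each slice gives a new random partition per slice, which is the behavior described in the text.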

Figure 2: The sampling masks for three different splitting ratios $\rho = \frac{|\aleph|}{|\Upsilon|} \in \{0.3, 0.5, 0.7\}$ at $R = 4\times$.

2.5 Dataset

We utilized publicly available single-coil and multi-coil brain MRI datasets. Both datasets were retrospectively under-sampled using a random sampling mask provided with the fastMRI database, with acceleration rates (R) equal to 2, 4, and 8 [28, 29].

2.5.1 Single-coil data:

The brain MPI-Leipzig Mind-Brain-Body dataset [30, 31] was used to train and test our proposed model. We utilized high-resolution MP2RAGE T1 maps of 318 patients, which were split into two non-overlapping training and test sets with 250 and 68 patients, respectively. The volumetric MP2RAGE T1 maps were acquired in the sagittal orientation with 176 slices and the following imaging parameters: TR = 5000 ms, TE = 2.92 ms, TI1 = 700 ms, TI2 = 2500 ms, FA1 = 4°, FA2 = 5°, pre-scan normalization, echo spacing = 6.9 ms, bandwidth = 240 Hz/pixel, FOV = 256 mm, voxel size = 1 mm³ isotropic, GRAPPA acceleration factor 3, slice order = interleaved, duration = 8 min 22 s.

2.5.2 Multi-coil data:

The brain datasets used in the preparation of this article were obtained from the NYU fastMRI Initiative database (fastmri.med.nyu.edu), which was approved by the NYU School of Medicine Institutional Review Board [28, 29]. We utilized T2-weighted (T2-w) images of 1051 patients for training and the unseen data of 325 patients for testing. In addition, we evaluated the robustness of our model to domain shift using two unseen out-of-distribution (OOD) datasets, T1-weighted (T1-w) and postcontrast T1-w (T1c), with data from 50 patients for each MRI sequence.

The sensitivity maps $\mathcal{S}$ were generated from the $24\times 24$ center of k-space using ESPIRiT (https://mrirecon.github.io/bart/) [32] with a kernel size of $6\times 6$ and thresholds of 0.02 and 0.95 for the calibration matrix and the eigenvalue decomposition, respectively.

2.6 Quantitative analysis

To evaluate the performance of the proposed ASSCGD model, we compared it against two benchmark models: SS-MRI [33] and ReconFormer [19], which were trained under our proposed sampling approach.

Three quantitative metrics were employed: normalized mean square error (NMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM), computed using the PyTorch Image Quality library (https://piq.readthedocs.io/en/latest/index.html) [34]. The NMSE compares the reconstructed $\hat{y}$ with the ground truth $y$ as follows:

$$\text{NMSE}(y,\hat{y}) = \dfrac{\| y - \hat{y} \|_2^2}{\| y \|_2^2} \tag{15}$$

where $\|\cdot\|_2^2$ is the squared Euclidean norm. Lower NMSE values indicate better image reconstruction. However, NMSE may favor blurry images [28].
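As a worked illustration of Equation (15), the following sketch computes the NMSE for two small intensity vectors. It assumes images flattened into lists of numbers, and the function name `nmse` is hypothetical; the study itself used a library implementation.

```python
def nmse(reference, reconstruction):
    """Normalized mean squared error: ||y - y_hat||^2 / ||y||^2."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same length")
    numerator = sum(abs(r - x) ** 2 for r, x in zip(reference, reconstruction))
    denominator = sum(abs(r) ** 2 for r in reference)
    if denominator == 0:
        raise ValueError("reference image has zero energy")
    return numerator / denominator


reference = [1.0, 2.0, 2.0]
reconstruction = [1.0, 2.0, 1.0]
error = nmse(reference, reconstruction)  # squared error 1 over reference energy 9
```

Multiplying the result by 100 gives the percentage form used in the tables.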

The PSNR defined below utilizes a logarithmic scaling that makes the quantification results more aligned with human perception [35].

$$\text{PSNR}(y,\hat{y}) = 10\log_{10}\left( \dfrac{y_{\max}^2}{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2} \right) \tag{16}$$

where $y_{\max}$ is the maximum signal intensity of the ground truth images and $N$ is the number of voxels. Higher PSNR indicates better image reconstruction.
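A short sketch of the PSNR computation in decibels is given below. It assumes the standard $10\log_{10}$ scaling, images flattened into lists of real numbers, and the maximum ground truth intensity as the peak value; the function name `psnr_db` is hypothetical.

```python
import math


def psnr_db(reference, reconstruction):
    """PSNR in decibels, using the maximum reference intensity as the peak."""
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    peak = max(reference)
    return 10.0 * math.log10(peak ** 2 / mse)


reference = [0.0, 1.0, 2.0, 4.0]
reconstruction = [0.0, 1.0, 2.0, 3.0]
value = psnr_db(reference, reconstruction)  # mse = 0.25, peak = 4
```

In this toy case the ratio inside the logarithm is 16 / 0.25 = 64, so the PSNR is about 18.06 dB.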

The SSIM, which quantifies the structural similarity between the reconstructed $\hat{y}$ and the ground truth $y$, is defined as follows:

$$\text{SSIM}(y,\hat{y}) = \dfrac{(2\mu_y\mu_{\hat{y}} + c_1)(2\sigma_{y\hat{y}} + c_2)}{(\mu_y^2 + \mu_{\hat{y}}^2 + c_1)(\sigma_y^2 + \sigma_{\hat{y}}^2 + c_2)} \tag{17}$$

where $\mu_y$ and $\mu_{\hat{y}}$ are the average voxel values of the $y$ and $\hat{y}$ images, $\sigma_y^2$ and $\sigma_{\hat{y}}^2$ are their variances, and $\sigma_{y\hat{y}}$ is the covariance between the $y$ and $\hat{y}$ images. The constants $c_1$ and $c_2$ stabilize the division; we used $c_1 = 0.01\,y_{\max}$ and $c_2 = 0.03\,y_{\max}$. SSIM ranges between $-1$ and $+1$, with the best similarity achieved at an SSIM of $+1$.
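Equation (17) can be evaluated directly from global image statistics, as in the sketch below. This is a simplification: library implementations typically compute SSIM over local windows and average the result, whereas this hypothetical `ssim_global` uses a single window covering the whole image. The default constants follow the values stated in the text.

```python
def ssim_global(y, y_hat, c1=None, c2=None):
    """SSIM of Equation (17) computed from global means, variances, and covariance."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y)
    y_max = max(y)
    c1 = 0.01 * y_max if c1 is None else c1
    c2 = 0.03 * y_max if c2 is None else c2
    mu_y = sum(y) / n
    mu_h = sum(y_hat) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    var_h = sum((v - mu_h) ** 2 for v in y_hat) / n
    cov = sum((a - mu_y) * (b - mu_h) for a, b in zip(y, y_hat)) / n
    numerator = (2 * mu_y * mu_h + c1) * (2 * cov + c2)
    denominator = (mu_y ** 2 + mu_h ** 2 + c1) * (var_y + var_h + c2)
    return numerator / denominator


image = [0.0, 1.0, 2.0, 3.0]
same = ssim_global(image, image)
flipped = ssim_global(image, image[::-1])  # negative covariance lowers the score
```

Identical inputs give a value of one, while the reversed image has a negative covariance with the original and therefore a negative score.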

2.7 Statistical analysis

The quantitative metrics were compared using one-way analysis of variance (ANOVA) to evaluate the null hypothesis that the mean values of all methods were the same. Differences with p $< 0.05$ were considered statistically significant. Additionally, a multi-comparison Tukey's honestly significant difference (HSD) test was performed to evaluate pairwise differences between the methods, with p $< 0.05$ indicating statistical significance.

We reported the average values of the quantitative metrics. In addition, we calculated the $95\%$ confidence intervals (CIs) of the average values using the bootstrap method (with $n = 10000$ iterations) with bias-corrected and accelerated (BCa) percentile intervals [36].
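The resampling idea behind these confidence intervals can be sketched with a plain percentile bootstrap. Note the simplification: this sketch omits the bias-correction and acceleration adjustments, so it is not the exact procedure used in the study. The function name `bootstrap_mean_ci` is hypothetical and the scores are made-up values for illustration only.

```python
import random


def bootstrap_mean_ci(values, n_boot=10000, alpha=0.05, seed=None):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_boot):
        # Resample with replacement and record the mean of the resample.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Made-up per-slice scores, for illustration only.
scores = [35.1, 34.8, 36.0, 35.5, 34.2, 35.9, 36.3, 34.9, 35.4, 35.7]
low, high = bootstrap_mean_ci(scores, n_boot=2000, seed=42)
```

The interval is read off the sorted bootstrap means at the 2.5th and 97.5th percentiles; a fixed seed makes the result reproducible.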

3 Results

In this section, we present the comprehensive evaluation of our proposed ASSCGD model for MRI reconstruction. The results are organized to highlight the model’s performance across different datasets and under various conditions.

We assess the within-domain multi-coil T2-w reconstruction and single-coil MP2RAGE T1 map reconstruction capabilities of ASSCGD using both qualitative and quantitative metrics. The model's ability to preserve fine structures and reduce artifacts is demonstrated through visual comparisons and statistical analysis. The impact of loss mask partitioning on the performance of the proposed method is also demonstrated.

We also evaluate the robustness of ASSCGD against domain shifts using OOD datasets. This analysis illustrates the model’s adaptability to new and unseen data.

3.1 Within-domain reconstruction

Our proposed ASSCGD was evaluated for within-domain reconstruction at $R = 2\times$, $4\times$, and $8\times$ with a partitioning rate of $\rho = 0.5$. We conducted both qualitative and quantitative comparisons with the ReconFormer Transformer and SS-MRI models.

3.1.1 Qualitative Results

Multi-coil dataset:

Figures 3 and 4 present the qualitative results of within-domain T2-w multi-coil images for two subjects. Figure 3(a) shows the results for a healthy volunteer, highlighting that our method recovers more details, particularly in the gold boxed regions, than the ReconFormer model trained with our framework. The difference maps in Figure 3(b) illustrate that our method better preserves gray matter and edges, as indicated by the gold and red boxes.

Figure 3: Within-domain axial T2-w image reconstruction at $R = 2\times$, $4\times$, and $8\times$ and $\rho = 0.5$. (a) Results for a healthy subject. (b) Difference maps between the reconstructed and ground truth images.
Figure 4: Within-domain axial T2-w image reconstruction at $R = 2\times$, $4\times$, and $8\times$ and $\rho = 0.5$. (a) Results for a subject with an evident abnormality, shown by a red box. (b) Difference maps between the reconstructed and ground truth images.

Figure 4(a) displays results for a patient with brain abnormalities. Our proposed method could recover the abnormality boundary shown by a red box close to the ground truth. Furthermore, our method reconstructed details better (gold boxes) compared with the SS-MRI method. Difference maps in Figure 4(b) confirm these observations.

Single-coil dataset:

Figure 5 presents the qualitative results for the high-resolution single-coil MP2RAGE T1 quantitative map of a healthy volunteer at $R = 2\times$, $4\times$, and $8\times$ with a partitioning rate of $\rho = 0.5$. Our method preserved the small structures shown by gold boxes, as well as the putamen and caudate nuclei shown by red boxes, with better spatial contrast than the comparative models, as shown in Figure 5(a) and in the difference maps in Figure 5(b).

Figure 5: Within-domain MP2RAGE T1 quantitative maps reconstructed at $R \in \{2\times, 4\times, 8\times\}$ and $\rho = 0.5$. (a) Results for a subject. (b) Difference maps between the reconstructed and ground truth images.

3.1.2 Quantitative Results

Tables 1 and 2 summarize the quantitative metrics for the multi-coil and single-coil datasets, respectively, at different acceleration rates. The ANOVA test indicated statistically significant differences (p $\ll 10^{-5}$) between the average values of the methods. The results of the Tukey's HSD multi-comparison are presented in the text and tables.

Multi-coil dataset:

The quantitative results at $R = 2\times$, $4\times$, and $8\times$ and $\rho = 0.5$ are listed in Table 1. The ANOVA test yielded p $\ll 10^{-5}$ for all metrics, indicating statistically significant differences between the average values. Our method achieved the lowest NMSE at all acceleration rates except $R = 2\times$. ReconFormer achieved the lowest NMSE at $R = 2\times$; nonetheless, it was not statistically significantly different from our method (p $= 0.65$). Although our method achieved the lowest NMSE at $R = 8\times$, it was not statistically significantly different from ReconFormer (p $= 0.83$). Our proposed self-supervised method achieved the highest PSNR and SSIM values at all acceleration rates, demonstrating the lowest remaining spatial distortion, such as ghosting, and the highest structural similarity between the ground truth and reconstructed images, respectively.

Table 1: Within-domain performance for multi-coil axial T2-w fastMRI at $R \in \{2\times, 4\times, 8\times\}$ and $\rho = 0.5$. The arrows indicate the direction of better performance.

| Metric | R | Zero filled | ReconFormer | SS-MRI | Ours |
|---|---|---|---|---|---|
| NMSE (95% CI) [%] ↓ | 2× | 14.99 (12.95 - 17.14) | 0.47 (0.45 - 0.50) | 0.52 (0.50 - 0.56) | 0.51 (0.48 - 0.54) |
| | 4× | 26.38 (24.43 - 28.41) | 1.51 (1.42 - 1.64) | 4.87 (4.76 - 4.98) | 1.26 (1.18 - 1.35) |
| | 8× | 30.78 (28.49 - 33.11) | 3.34 (3.14 - 3.63) | 5.16 (5.06 - 5.26) | 3.26 (3.08 - 3.51) |
| PSNR (95% CI) [dB] ↑ | 2× | 17.75 (17.64 - 17.86) | 38.80 (38.65 - 38.96) | 39.05 (38.90 - 39.20) | 39.93 (39.78 - 40.09) |
| | 4× | 17.64 (17.53 - 17.77) | 34.23 (34.10 - 34.37) | 27.93 (27.79 - 28.06) | 35.44 (35.31 - 35.58) |
| | 8× | 18.75 (18.62 - 18.87) | 27.54 (27.43 - 27.65) | 30.65 (30.52 - 30.76) | 31.67 (31.55 - 31.79) |
| SSIM (95% CI) [%] ↑ | 2× | 81.53 (81.06 - 81.98) | 96.10 (96.02 - 96.17) | 96.33 (96.15 - 96.49) | 98.14 (98.06 - 98.21) |
| | 4× | 70.89 (70.44 - 71.36) | 93.36 (93.19 - 93.51) | 76.72 (76.32 - 77.15) | 95.55 (95.38 - 95.69) |
| | 8× | 65.17 (64.65 - 65.68) | 89.72 (89.49 - 89.93) | 77.01 (76.63 - 77.39) | 91.67 (91.45 - 91.89) |

indicates p-value $> 0.05$ of Tukey's HSD.

Single-coil dataset:

The quantitative metrics for the within-domain single-coil high-resolution MP2RAGE T1 map are summarized in Table 2. One-way ANOVA tests indicated significant differences (p $\ll 10^{-5}$) between the methods for all metrics. Tukey's HSD test confirmed that ASSCGD performed significantly better than the comparative methods for most metrics. Our method achieved the lowest NMSE values, which were statistically significantly different from those of the comparative methods, at all acceleration rates except $R = 2\times$, where its performance did not differ statistically significantly (p $= 0.97$) from that of ReconFormer. Our method achieved the highest PSNR values at all acceleration rates, and they were statistically significantly different (p $< 0.05$) from those of the comparative methods. Furthermore, our proposed method achieved the highest SSIM at $R = 4\times$, which was statistically significantly higher (p $< 0.05$) than that of the other methods. Nonetheless, our method and ReconFormer performed similarly in terms of SSIM at $R = 2\times$ and $8\times$, with p values of $0.15$ and $0.61$, respectively.

Table 2: Within-domain performance for single-coil high-resolution MP2RAGE T1 map at $R \in \{2\times, 4\times, 8\times\}$ and $\rho = 0.5$. The arrows indicate the direction of better performance.

| Metric | R | Zero filled | ReconFormer | SS-MRI | Ours |
|---|---|---|---|---|---|
| NMSE (95% CI) [%] ↓ | 2× | 15.37 (13.46 - 17.49) | 0.53 (0.52 - 0.55) | 5.18 (4.97 - 5.40) | 0.56 (0.54 - 0.57) |
| | 4× | 22.43 (20.65 - 24.33) | 2.04 (1.99 - 2.09) | 8.75 (8.36 - 9.18) | 1.87 (1.82 - 1.93) |
| | 8× | 33.17 (31.79 - 34.72) | 3.99 (3.90 - 4.10) | 10.76 (10.42 - 11.13) | 3.65 (3.56 - 3.75) |
| PSNR (95% CI) [dB] ↑ | 2× | 15.46 (15.39 - 15.53) | 35.42 (35.31 - 35.53) | 23.07 (22.98 - 23.16) | 36.15 (36.03 - 36.26) |
| | 4× | 15.35 (15.28 - 15.43) | 29.46 (29.37 - 29.56) | 21.31 (21.21 - 21.40) | 30.51 (30.42 - 30.61) |
| | 8× | 16.31 (16.23 - 16.40) | 26.54 (26.45 - 26.65) | 20.75 (20.66 - 20.85) | 27.94 (27.84 - 28.04) |
| SSIM (95% CI) [%] ↑ | 2× | 73.83 (73.43 - 74.21) | 95.75 (95.68 - 95.81) | 85.12 (84.57 - 85.65) | 95.31 (95.25 - 95.38) |
| | 4× | 68.62 (68.29 - 68.94) | 89.11 (88.96 - 89.26) | 80.47 (80.06 - 80.89) | 89.80 (89.63 - 89.95) |
| | 8× | 66.13 (65.83 - 66.42) | 84.55 (84.32 - 84.78) | 77.17 (76.75 - 77.62) | 84.77 (84.53 - 84.99) |

indicates p-value $> 0.05$ of Tukey's HSD.

3.1.3 Voxel-wise correlation

We plotted scatter plots for 80 randomly selected within-domain multi-coil and single-coil data to visualize and quantify the agreement between the images reconstructed at $R = 8\times$ and $\rho = 0.5$ and the ground truth reference images (see Figure 6). For the single-coil dataset, the images reconstructed by our method, shown in Figure 6(c), conformed to the references markedly better than those of SS-MRI, shown in Figure 6(a), and comparably to those of ReconFormer, shown in Figure 6(b). Similar results were obtained for the multi-coil data, where SS-MRI performed better than it did on the single-coil dataset. Still, there is a noticeable gap between our proposed method (see Figure 6(f)) and SS-MRI (see Figure 6(d)). Furthermore, a comparison of the voxel-wise agreement with the ground truth in Figures 6(e) and (f) suggests that our method may generate images that are more similar to the ground truth.
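One common way to attach a number to such a scatter plot is the Pearson correlation coefficient between ground truth and reconstructed voxel values. The text does not state which statistic was used, so the following is only an illustrative sketch with a hypothetical `pearson_r` helper and toy data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least two")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


ground_truth = [1.0, 2.0, 3.0, 4.0]
scaled = [2.0, 4.0, 6.0, 8.0]    # perfectly linear in the ground truth
reversed_ = [8.0, 6.0, 4.0, 2.0]  # perfectly anti-correlated
r_pos = pearson_r(ground_truth, scaled)
r_neg = pearson_r(ground_truth, reversed_)
```

A value close to one corresponds to points lying tightly along a rising line in the scatter plot; it does not by itself show that the points lie on the identity line.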

Figure 6: Scatter plots between the ground truth reference images and the images reconstructed at $R = 8\times$ and $\rho = 0.5$ for the single-coil high-resolution MP2RAGE T1 maps (first row) and the multi-coil axial T2-w images (second row). Single-coil: (a) SS-MRI, (b) ReconFormer, and (c) Ours. Multi-coil: (d) SS-MRI, (e) ReconFormer, and (f) Ours.

3.1.4 Effect of Sampling Mask Ratio

Our proposed method splits the sampling mask $\Omega$ randomly into two non-overlapping masks $\aleph$ and $\Upsilon$, which are used in the training and loss paths, respectively (see Figure 1). The ratio $\rho = \frac{|\aleph|}{|\Upsilon|}$ plays a role in the network's performance. We trained and tested the network for $\rho \in \{0.3, 0.5, 0.7\}$ (see Figure 2 for the sampling masks). The effect of training the network with different values of $\rho$ on the quantitative metrics NMSE, PSNR, and SSIM is shown in Figure 7 at three acceleration rates $R \in \{2, 4, 8\}$ for the multi-coil axial T2-w images. Our method achieved better performance at $\rho = 0.5$ in terms of PSNR and SSIM at $R = 2\times$ and $R = 4\times$. However, it performed similarly to $\rho = 0.7$ at $R = 8\times$. Furthermore, the NMSE values indicate that our method performed better when trained using $\rho = 0.5$ than with the other splitting ratios $\rho \in \{0.3, 0.7\}$.

Figure 7: The effect of network training with varying $\rho = \frac{|\aleph|}{|\Upsilon|}$ for within-domain multi-coil axial T2-w fastMRI images.

3.2 Out-of-domain reconstruction

We evaluated the performance of our model in OOD reconstruction, where our proposed method was trained using multi-coil axial T2-w fastMRI images and then tested using axial T1-w and T1c images, shown in the first and second rows of Figure 8, respectively, at $R \in \{2\times, 4\times, 8\times\}$. Our method statistically significantly (p $\ll 10^{-5}$) improved the quantitative metrics.

Our method achieved significantly lower NMSE values (p $\ll 10^{-5}$), indicating high voxel-wise similarity between the reconstructed and ground truth images for T1-w and T1c, illustrated in Figures 8(a) and (d), respectively. In addition, high PSNR values confirm minimal remaining spatial distortion in the reconstructed images for T1-w and T1c, illustrated in Figures 8(b) and (e), respectively. Finally, the high SSIM values demonstrate a remarkable resemblance between the reconstructed and ground truth images (see Figures 8(c) and (f)).

Figure 8: Out-of-distribution quantitative reconstruction results for multi-coil axial T1-w and T1c in the first and second rows, respectively, at three different acceleration rates.

4 Discussion

This study introduces the Adaptive Self-supervised Consistency-guided Diffusion (ASSCGD) model, a novel self-supervised deep learning-based compressed sensing MRI (DL-based CS-MRI) method designed to mitigate the challenges associated with prolonged MRI acquisition times and the necessity for fully sampled datasets. Our approach leverages the synergy between CS and DL to accelerate MRI acquisition without compromising image quality.

The ASSCGD model’s primary innovation lies in its use of an adversarial mapper within a self-supervised framework, eliminating the dependence on fully sampled training datasets. This advancement is particularly significant for clinical scenarios where acquiring fully sampled data is impractical. By integrating a diffusion model, our method enhances sampling efficiency and reconstruction quality. The backward diffusion process, executed in smaller steps, contributes to robust and efficient image sampling. Extensive testing on OOD datasets demonstrated significant improvements in NMSE, PSNR, and SSIM metrics. These results highlight the model’s ability to generalize across various MRI sequences and patient-specific conditions, underscoring its versatility for clinical use.

The majority of DL-based CS-MRI methods use supervised learning to train networks [14, 16, 17, 19, 25]. However, acquiring fully sampled data can be challenging in some practical applications due to the long acquisition time, physiological constraints, and signal decay. For instance, fully sampled high-resolution brain MP2RAGE T1 maps can take around 24 minutes to acquire, which is impractical for large-scale studies and may lead to patient discomfort unless acceleration approaches are used. Such a long acquisition time increases the likelihood of patient movement, which could substantially reduce the image quality. Thus, self-supervised DL-based CS-MRI approaches are imperative to broaden applications where acquiring such data is challenging.

Several self-supervised studies have been proposed to train models without using fully sampled data. For instance, a data-driven method of de-aliasing was proposed for single-coil data that performed an image-to-image translation [37]. However, it did not encode the operator and used similar sampling patterns for training and loss that increased noise during test time. Alternatively, a self-supervised study was proposed, assuming data acquired using two different sampling patterns [38]. Furthermore, physics-driven methods that unroll the training process were also proposed [20]. However, the unrolling nature of the method might increase the test burden and time.

Our self-supervised method uses a DC layer that was trained end-to-end and utilizes the sampling pattern that is feasible to implement clinically. Our experiment using the multi-coil and single-coil datasets enabled us to recover target ground truth images at markedly lower acquisition times (see Figures 3 - 5). Specifically, our method could recover fine details highlighted by gold and red boxes better than the comparative models. The negligible amount of remaining aliasing in the reconstructed images was confirmed by Figures 3 - 5. Furthermore, this observation was confirmed by the low NMSE and high PSNR achieved in Tables 1 and 2.

Although our method achieved the best performance for $\rho = 0.5$, Figure 7 suggests that it might be resilient to variations in the hyper-parameter $\rho$ around $\rho = 0.5$. In addition, the significantly better performance (p $\ll 10^{-5}$) in the out-of-domain reconstruction shown in Figure 8 indicates reasonable robustness against domain shifts. However, the results indicate that our method performed better on T1-w images (first row in Figure 8) than on T1c images (second row in Figure 8). That might be due to contrast agent enhancement in T1c in image regions that are not evident in T2-w images.

Despite the promising outcomes, several limitations need to be addressed. Our model was not tested on prospectively undersampled raw k-space datasets acquired under parallel imaging frameworks. Future work should explore the application of ASSCGD to these datasets to validate its clinical utility further. In addition, we did not train our method using raw multi-coil high-resolution 3D MRI, such as T1 magnetization-prepared rapid acquisition gradient echoes MRI images, because they are not readily available to the end-user. Expanding the training dataset to include such data could enhance the model's performance.

5 Conclusions

The Adaptive Self-supervised Consistency-guided Diffusion (ASSCGD) model significantly improves the quality of reconstructed MRI images from undersampled data without requiring fully sampled training datasets, offering a promising solution for accelerating MRI acquisition. The proposed method has the potential to enhance clinical MRI practices by reducing scan times and improving image quality, which is crucial for accurate diagnosis and treatment planning. Our method is particularly useful in radiation oncology by reducing the imaging time and the likelihood of motion artifacts, specifically for MRI-linear accelerators.

Conflicts of interest

There are no conflicts of interest declared by the authors.

Acknowledgement

This research is supported in part by the National Institutes of Health under Award Numbers R56EB033332, R01EB032680, and R01CA272991.

Data availability

Brain MRI data were obtained from the New York University fastMRI initiative database (https://fastmri.med.nyu.edu/) and the OpenNeuro MPI-Leipzig Mind-Brain-Body dataset (https://openneuro.org/datasets/ds000221/versions/00002). The datasets were acquired with the relevant institutional review board approvals.

References

  • [1] Hedvig Hricak, May Abdel-Wahab, Rifat Atun, Miriam Mikhail Lette, Diana Paez, James A Brink, Lluís Donoso-Bach, Guy Frija, Monika Hierath, Ola Holmberg, et al. Medical imaging and nuclear medicine: a lancet oncology commission. The Lancet Oncology, 22(4):e136–e172, 2021.
  • [2] Yilong Liu, Alex TL Leong, Yujiao Zhao, Linfang Xiao, Henry KF Mak, Anderson Chun On Tsang, Gary KK Lau, Gilberto KK Leung, and Ed X Wu. A low-cost and shielding-free ultra-low-field brain mri scanner. Nature communications, 12(1):7238, 2021.
  • [3] Mojtaba Safari, Xiaofeng Yang, Chih-Wei Chang, Richard LJ Qiu, Ali Fatemi, and Louis Archambault. Unsupervised mri motion artifact disentanglement: introducing maudgan. Physics in Medicine and Biology, 2024.
  • [4] Justin P Haldar, Diego Hernando, and Zhi-Pei Liang. Compressed-sensing mri with random encoding. IEEE transactions on Medical Imaging, 30(4):893–903, 2010.
  • [5] Anagha Deshmane, Vikas Gulani, Mark A Griswold, and Nicole Seiberlich. Parallel mr imaging. Journal of Magnetic Resonance Imaging, 36(1):55–72, 2012.
  • [6] Ricardo Otazo, Daniel Kim, Leon Axel, and Daniel K Sodickson. Combination of compressed sensing and parallel imaging for highly accelerated first-pass cardiac perfusion mri. Magnetic resonance in medicine, 64(3):767–776, 2010.
  • [7] Yuchou Chang, Dong Liang, and Leslie Ying. Nonlinear grappa: A kernel approach to parallel mri reconstruction. Magnetic Resonance in Medicine, 68(3):730–740, 2012.
  • [8] Mojtaba Safari, Zach Eidex, Chih-Wei Chang, Richard LJ Qiu, and Xiaofeng Yang. Fast mri reconstruction using deep learning-based compressed sensing: A systematic review. arXiv preprint arXiv:2405.00241, 2024.
  • [9] Zhaoyang Jin and Qing-San Xiang. Improving accelerated mri by deep learning with sparsified complex data. Magnetic Resonance in Medicine, 89(5):1825–1838, 2023.
  • [10] Chentao Cao, Zhuo-Xu Cui, Yue Wang, Shaonan Liu, Taijin Chen, Hairong Zheng, Dong Liang, and Yanjie Zhu. High-frequency space diffusion model for accelerated mri. IEEE Transactions on Medical Imaging, 2024.
  • [11] Dominik Narnhofer, Kerstin Hammernik, Florian Knoll, and Thomas Pock. Inverse gans for accelerated mri reconstruction. In Wavelets and Sparsity XVIII, volume 11138, pages 381–392. SPIE, 2019.
  • [12] Andreas Kofler, Marie-Christine Pali, Tobias Schaeffter, and Christoph Kolbitsch. Deep supervised dictionary learning by algorithm unrolling—application to fast 2d dynamic mr image reconstruction. Medical Physics, 50(5):2939–2960, 2023.
  • [13] Biao Qu, Jialue Zhang, Taishan Kang, Jianzhong Lin, Meijin Lin, Huajun She, Qingxia Wu, Meiyun Wang, and Gaofeng Zheng. Radial magnetic resonance image reconstruction with a deep unrolled projected fast iterative soft-thresholding network. Computers in Biology and Medicine, 168:107707, 2024.
  • [14] Mojtaba Safari, Xiaofeng Yang, and Ali Fatemi. Mri data consistency guided conditional diffusion probabilistic model for mr imaging acceleration. In Medical Imaging 2024: Clinical and Biomedical Imaging, volume 12930, pages 202–205. SPIE, 2024.
  • [15] Alper Güngör, Salman UH Dar, Şaban Öztürk, Yilmaz Korkmaz, Hasan A Bedel, Gokberk Elmas, Muzaffer Ozbey, and Tolga Çukur. Adaptive diffusion priors for accelerated mri reconstruction. Medical Image Analysis, 88:102872, 2023.
  • [16] Maarten L Terpstra, Matteo Maspero, Joost JC Verhoeff, and Cornelis AT van den Berg. Accelerated respiratory-resolved 4d-mri with separable spatio-temporal neural networks. Medical physics, 50(9):5331–5342, 2023.
  • [17] Jo Schlemper, Jose Caballero, Joseph V Hajnal, Anthony N Price, and Daniel Rueckert. A deep cascade of convolutional neural networks for dynamic mr image reconstruction. IEEE transactions on Medical Imaging, 37(2):491–503, 2017.
  • [18] Kawin Setsompop, R Kimmlingen, E Eberlein, Thomas Witzel, Julien Cohen-Adad, Jennifer A McNab, Boris Keil, M Dylan Tisdall, P Hoecht, Peter Dietz, et al. Pushing the limits of in vivo diffusion mri for the human connectome project. Neuroimage, 80:220–233, 2013.
  • [19] Pengfei Guo, Yiqun Mei, Jinyuan Zhou, Shanshan Jiang, and Vishal M Patel. Reconformer: Accelerated mri reconstruction using recurrent transformer. IEEE transactions on medical imaging, 2023.
  • [20] Burhaneddin Yaman, Seyed Amir Hossein Hosseini, Steen Moeller, Jutta Ellermann, Kâmil Uğurbil, and Mehmet Akçakaya. Self-supervised learning of physics-guided reconstruction neural networks without fully sampled reference data. Magnetic resonance in medicine, 84(6):3172–3191, 2020.
  • [21] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International conference on machine learning, pages 2256–2265. PMLR, 2015.
  • [22] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.
  • [23] Calvin Luo. Understanding diffusion models: A unified perspective. arXiv preprint arXiv:2208.11970, 2022.
  • [24] Stanley H Chan. Tutorial on diffusion models for imaging and vision. arXiv preprint arXiv:2403.18103, 2024.
  • [25] Jiahao Huang, Angelica I Aviles-Rivero, Carola-Bibiane Schönlieb, and Guang Yang. Cdiffmr: Can we replace the gaussian noise with k-space undersampling for fast mri? In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 3–12. Springer, 2023.
  • [26] Mojtaba Safari, Xiaofeng Yang, Ali Fatemi, and Louis Archambault. Mri motion artifact reduction using a conditional diffusion probabilistic model (mar-cdpm). Medical Physics, 51(4):2598–2610, 2024.
  • [27] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
  • [28] Jure Zbontar, Florian Knoll, Anuroop Sriram, Tullie Murrell, Zhengnan Huang, Matthew J. Muckley, Aaron Defazio, Ruben Stern, Patricia Johnson, Mary Bruno, Marc Parente, Krzysztof J. Geras, Joe Katsnelson, Hersh Chandarana, Zizhao Zhang, Michal Drozdzal, Adriana Romero, Michael Rabbat, Pascal Vincent, Nafissa Yakubova, James Pinkerton, Duo Wang, Erich Owens, C. Lawrence Zitnick, Michael P. Recht, Daniel K. Sodickson, and Yvonne W. Lui. fastMRI: An open dataset and benchmarks for accelerated MRI, 2018.
  • [29] Florian Knoll, Jure Zbontar, Anuroop Sriram, Matthew J Muckley, Mary Bruno, Aaron Defazio, Marc Parente, Krzysztof J Geras, Joe Katsnelson, Hersh Chandarana, et al. fastmri: A publicly available raw k-space and dicom dataset of knee images for accelerated mr image reconstruction using machine learning. Radiology: Artificial Intelligence, 2(1):e190007, 2020.
  • [30] Anahit Babayan, Blazeij Baczkowski, Roberto Cozatl, Maria Dreyer, Haakon Engen, Miray Erbey, Marcel Falkiewicz, Nicolas Farrugia, Michael Gaebler, Johannes Golchert, Laura Golz, Krzysztof Gorgolewski, Philipp Haueis, Julia Huntenburg, Rebecca Jost, Yelyzaveta Kramarenko, Sarah Krause, Deniz Kumral, Mark Lauckner, Daniel S. Margulies, Natacha Mendes, Katharina Ohrnberger, Sabine Oligschläger, Anastasia Osoianu, Jared Pool, Janis Reichelt, Andrea Reiter, Josefin Röbbig, Lina Schaare, Jonathan Smallwood, and Arno Villringer. “mpi-leipzig-mind-brain-body”, 2018.
  • [31] Anahit Babayan, Miray Erbey, Deniz Kumral, Janis D Reinelt, Andrea MF Reiter, Josefin Röbbig, H Lina Schaare, Marie Uhlig, Alfred Anwander, Pierre-Louis Bazin, et al. A mind-brain-body dataset of mri, eeg, cognition, emotion, and peripheral physiology in young and old adults. Scientific data, 6(1):1–21, 2019.
  • [32] Martin Uecker, Peng Lai, Mark J Murphy, Patrick Virtue, Michael Elad, John M Pauly, Shreyas S Vasanawala, and Michael Lustig. Espirit—an eigenvalue approach to autocalibrating parallel mri: where sense meets grappa. Magnetic resonance in medicine, 71(3):990–1001, 2014.
  • [33] Chen Hu, Cheng Li, Haifeng Wang, Qiegen Liu, Hairong Zheng, and Shanshan Wang. Self-supervised learning for mri reconstruction with a parallel network training framework. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part VI 24, pages 382–391. Springer, 2021.
  • [34] Sergey Kastryulin, Jamil Zakirov, Denis Prokopenko, and Dmitry V. Dylov. Pytorch image quality: Metrics for image quality assessment, 2022.
  • [35] Zach Eidex, Jing Wang, Mojtaba Safari, Eric Elder, Jacob Wynne, Tonghe Wang, Hui-Kuo Shu, Hui Mao, and Xiaofeng Yang. High-resolution 3t to 7t adc map synthesis with a hybrid cnn-transformer model. Medical Physics, 2024.
  • [36] Bradley Efron and Robert J Tibshirani. Confidence intervals based on bootstrap percentiles. In An introduction to the bootstrap, pages 168–177. Springer, 1993.
  • [37] Ortal Senouf, Sanketh Vedula, Tomer Weiss, Alex Bronstein, Oleg Michailovich, and Michael Zibulevsky. Self-supervised learning of inverse problem solvers in medical imaging. In Domain Adaptation and Representation Transfer and Medical Image Learning with Less Labels and Imperfect Data: First MICCAI Workshop, DART 2019, and First International Workshop, MIL3ID 2019, Shenzhen, Held in Conjunction with MICCAI 2019, Shenzhen, China, October 13 and 17, 2019, Proceedings 1, pages 111–119. Springer, 2019.
  • [38] Jiaming Liu, Yu Sun, Cihat Eldeniz, Weijie Gan, Hongyu An, and Ulugbek S Kamilov. Rare: Image reconstruction using deep priors learned without groundtruth. IEEE Journal of Selected Topics in Signal Processing, 14(6):1088–1099, 2020.