Cross-domain Denoising for Low-dose Multi-frame Spiral Computed Tomography

Yucheng Lu, Zhixin Xu, Moon Hyung Choi, Jimin Kim, and Seung-Won Jung, Senior Member, IEEE

This work was supported by the Korea Medical Device Development Fund grant funded by the Korea government (the Ministry of Science and ICT, the Ministry of Trade, Industry, and Energy, the Ministry of Health & Welfare, the Ministry of Food and Drug Safety) (Project Number: 1711139124, KMDF_PR_20200901_0096). (Corresponding author: Seung-Won Jung.)

Y. Lu is with the Education and Research Center for Socialware IT, Korea University, Seoul, Korea; and the Department of Datalogi, IT University of Copenhagen, Copenhagen, Denmark (e-mail: [email protected]).

Z. Xu and S.-W. Jung are with the Department of Electrical Engineering, Korea University, Seoul, Korea (e-mail: [email protected]; [email protected]).

M. H. Choi and J. Kim are with the Department of Radiology, Eunpyeong St. Mary’s Hospital, College of Medicine, The Catholic University of Korea (e-mail: [email protected]; [email protected]).
Abstract

Computed tomography (CT) is used worldwide as a non-invasive test to assist in diagnosis. However, the ionizing nature of X-ray exposure raises concerns about potential health risks such as cancer. The desire for lower radiation doses has driven researchers to improve reconstruction quality. Although previous studies on low-dose computed tomography (LDCT) denoising have demonstrated the effectiveness of learning-based methods, most were developed on simulated data. The real-world scenario, however, differs significantly from the simulation domain, especially when using the multi-slice spiral scanner geometry. This paper proposes a two-stage method for commercially available multi-slice spiral CT scanners that better exploits the complete reconstruction pipeline for LDCT denoising across different domains. Our approach makes good use of the high redundancy of multi-slice projections and the volumetric reconstructions while alleviating the over-smoothing problem in conventional cascaded frameworks caused by aggressive denoising. The dedicated design also provides a more explicit interpretation of the data flow. Extensive experiments on various datasets showed that the proposed method could remove up to 70% of noise without compromising spatial resolution, and subjective evaluations by two experienced radiologists further supported its superior performance against state-of-the-art methods in clinical practice. Code is available at https://github.com/YCL92/TMD-LDCT.

Index Terms

Deep learning, low-dose computed tomography, image and video denoising

1 Introduction

Computed tomography (CT) is one of the most popular tools used in clinical examinations today owing to its non-invasive and volumetric data acquisition. Unlike conventional X-ray studies that project all volumetric information onto a single planar image, CT restores stacks of axial structures through reconstruction, providing a rich data representation that gives access to finer patterns for further evaluation and diagnosis.

Despite its great convenience and performance, there have been significant concerns about the potential health hazard to patients. Cell and organ damage may occur due to excessive exposure if proper measures against ionizing radiation are not considered. Even though the dosage of CT is relatively low, exposure over a protracted time can still increase the risk of developing cancer [1]. Therefore, minimizing radiation exposure is treated with a sense of urgency in both science and public opinion [2].

Since it is currently impractical to exclude CT from general health examinations, engineers have been working actively on reducing the radiation exposed to the subjects through various techniques, such as enlarging the source field of view, increasing the sensor resolution both horizontally and vertically, improving the detector sensitivity, and increasing the table speed for faster scanning. These efforts have led to considerable performance improvement over the past decades.

In addition to hardware-based solutions, researchers have also paid attention to dosage reduction through software-assisted technology. Two representative methods are sparse-view CT, which uses a reduced number of projections per gantry rotation, and low-dose CT (LDCT), which uses a reduced intensity of the X-ray source. Whereas the former tries to compensate for the artifacts introduced by the missing views, the latter gives rise to an image denoising problem, receiving more attention as similar tasks in other areas (e.g., low-light image enhancement) have been extensively studied [3, 4, 5].

With the success of convolutional neural networks (CNNs) in low-level computer vision tasks such as denoising [6], deblurring [7], and super-resolution [8], they have been rapidly adopted in medical imaging applications. Early studies have shown the promising potential of CNNs on LDCT denoising compared to conventional handcrafted regularizers [9, 10, 11, 12]. However, the protocol of LDCT denoising significantly differs from that of conventional image denoising, requiring further optimization of the obtained data representation (i.e., projections) before the final reconstruction. Unfortunately, many existing works directly borrow ideas from conventional image denoising and apply them to the reconstructed CT slices as a post-processing stage, which can be sub-optimal for LDCT denoising. Although there are a few works in the literature that handle projection data [13, 14, 15, 16, 17], they either operate on simulated 1D parallel-beam projection data that is not aligned with the modern CT scanner geometry, or try to incorporate the entire reconstruction pipeline into a single black-box model. All the above limitations hinder these works from being more optimal and transparent.

To address these problems, we propose a two-stage denoising framework dedicated to multi-slice spiral CT scanners in this paper. Specifically, in the first stage, a projection domain denoising network takes as input the successive projection slices and estimates sequential noise components, which are then rebinned and used by the image restoration network in the second stage for further refinement. This two-stage design considers the domain-specific characteristics while avoiding information degradation in common cascaded structures, yielding objectively and subjectively more satisfactory image quality. In summary, the main contributions of this paper are as follows:

  • We propose a two-stage framework for LDCT denoising. The proposed method works across both the projection and image domains. It is specifically optimized for CT scanners with multi-slice helical geometry.

  • We model each stage’s physical properties of noise and artifacts based on the data acquisition process in the reconstruction pipeline. This design improves the denoising performance and gives end-users richer interpretation ability and transparency.

  • We demonstrate through experiments on patient data that our method significantly outperforms existing works both quantitatively and qualitatively. An extensive analysis of phantom scans further supports that the proposed method has achieved state-of-the-art performance.

The remaining sections of this paper are organized as follows: Section II reviews some existing works related to our topic and briefly discusses their limitations, Section III presents the proposed method in full detail, Section IV provides the experiment setup and compares our results with those of several representative works, and finally, Section V concludes the paper.

2 Related Work

Since the data acquisition and image restoration protocol of CT differs from that of conventional digital cameras, the design of an LDCT denoiser can be highly flexible depending on the appearance of the data to be processed. Compared to early works based on handcrafted image priors, such as total variation [18], non-local means [19], dictionary learning [20], block matching and 3D filtering [21], etc., some pioneering works [9, 10, 11, 12] have shown the superior potential of CNNs in LDCT denoising. Thus, we mainly focus on CNN-based methods and classify the most recent research into three categories: post-reconstruction image denoising, model-based iterative image reconstruction and denoising, and cross-domain joint optimization.

2.1 Post-reconstruction Image Denoising

Post-reconstruction image denoising aims at removing noise directly from the reconstructed CT images, which can be formulated as:

$$I_{pr}=\mathcal{G}_{I}\left(I_{n}\right), \tag{1}$$

where $I_{n}$ and $I_{pr}$ are the low-dose noisy input and noise-suppressed output predicted by the denoiser $\mathcal{G}_{I}$, respectively. A significant merit of this approach is that it works in the 2D image domain as a post-processing step, so there is no need to alter the reconstruction pipeline in clinical practice. A considerable number of works fall within this category: Fan et al. [22] replaced the conventional 2D convolution [11] with the quadratic representation. Zavala et al. [23] revisited the perfect reconstruction conditions of the encoder-decoder-structured CNNs with soft-shrinkage and proposed a learnable shrinkage layer to handle decomposed wavelet frames, which was later extended using the over-complete Haar wavelet transform [24]. Liang et al. [25] employed a perceptual loss from a pre-trained VGGNet to their densely connected denoiser. Matsuura et al. [26] extracted context-aware features from images reconstructed with various parameter presets as data augmentation to enhance the quality of the denoised result. Tao et al. [27] observed the specific patterns lying in the stacked view-by-view back-projection tensors and developed a tensor singular value decomposition-based algorithm for LDCT, which was extended to the field of deep learning-based LDCT denoising [28]. Xu et al. [29] applied dynamic filters predicted from a CNN to the extracted image features such that the non-uniformly distributed noise can be separated. Kim et al. [30] proposed a progressive denoising method to remove noise in an iterative manner while injecting synthetic noise into the projections on the fly. Zhang et al. [31] designed a framework consisting of two parallel networks to handle the low-frequency and high-frequency components, where the popular and powerful Transformer architecture [32] was adopted. Similarly, Wang et al. [33] introduced reshaping, dilated unfolding, and cyclic shifting operations between successive Transformer blocks to share information across different patches, reaching better performance.

Unfortunately, clean observations can never be reached due to the statistical uncertainty of CT. Hence, the paired dataset used for training actually consists of routine-dose images, which serve as approximations of their noise-free counterparts, and low-dose images, which can be simulated via projection-domain noise injection or image-domain superimposition [34]. As the noise statistics in CT images still follow some common properties such as zero-mean and zero discrepancies, a few researchers adopted the ideas of Noise2Noise [35] and Noise2Void [36] to train their models without clean images: Zhang et al. [37] proposed using adjacent slices to approximate self-similarity, whereas Niu et al. [38] further extended it to handle uncorrelated noise and structural artifacts with a broader range of patch-searching volumes. Alternatively, adversarial learning is also an option for unsupervised learning, which is usually achieved by generative adversarial networks (GANs) [39]: Shan et al. [40] employed 3D convolution in the design of the encoder-decoder network and supervised the transfer learning from a pre-trained 2D variant using the Wasserstein distance. Yang et al. [41] combined supervised and unsupervised learning with a hybrid loss term consisting of an adversarial loss and a perceptual loss to allow denoising while maintaining structural details, which was further improved by Li et al. [42] with the help of the self-attention module as well as the self-supervised perceptual loss. Ghahermani et al. [43] introduced an adversarial distortion learning method that considers the element-wise discrimination loss, reconstruction loss, pyramidal texture loss, and histogram loss in the supervision. Zhang et al. [44] employed an artifact and noise attention network and used an edge feature extraction path to compensate for the over-smoothed details. Gu et al. [45] adopted the cycle consistency and proposed a CycleGAN [46]-based model with adaptive instance normalization layers, achieving improved performance over the conventional CycleGAN with only about half of the parameters. Similarly, Lee et al. [47] introduced a pseudo network along with the CycleGAN framework and added a bypass consistency to prevent the generator from learning to embed blind information of noise into the output.

2.2 Iterative Image Reconstruction and Denoising

Although the post-reconstruction CT image denoising is simple and fast, an obvious limitation is that the high-pass filter (e.g., ramp filter) in the back-projection operation inevitably amplifies the noise component and introduces signal-dependent artifacts, which makes denoising more challenging and thus deteriorates the performance. To cope with this problem, model-based image reconstruction (MBIR) comes to the rescue, given by:

$$I_{pr}=\mathop{\arg\min}_{I}\left\|\psi I-P_{n}\right\|_{2}^{2}+\lambda\phi\left(I\right), \tag{2}$$

where $\psi$ is the forward-projection operation that maps the reconstructed image $I$ back to the projection domain, $P_{n}$ represents the corresponding low-dose noisy observation, $\phi$ is a regularization function, and $\lambda$ is a balancing parameter.

The solution of (2) is typically obtained in an iterative manner that updates the reconstructed image by comparing the forward-projection result with the measurement under some constraints to stabilize the optimization. Many methods have been presented that embed pre-trained CNN denoisers as a part of the update protocol and achieve better performance than post-reconstruction denoising: Gupta et al. [48] utilized a CNN to project the objective function onto the data manifold and proposed a relaxed version of the projected gradient descent method that guarantees the convergence of the optimization. Kang et al. [49] reviewed the denoising task under the low-rank Hankel structured matrix constraint and presented a wavelet residual network that learns to impose the low-rankness. Chen et al. [50] proposed to replace the generalized regularization term referred to as the fields of experts [51] with a three-layer CNN, in which the trainable parameters are independent at each iteration. Similar work was presented by Aggarwal et al. [52] with their proposed conjugate gradient optimization-based data consistency layer, enabling the training of the unrolled model to be performed in an end-to-end manner with minimal memory cost. Inspired by [50], Xia et al. [53] further employed a learned graph convolutional network as an additional constraint to enhance non-local topological features in the low-dimensional patch manifold. He et al. [54] reformulated the problem as a dual-domain optimization task and modified the iteration of the alternating direction method of multipliers (ADMM) by using CNNs to represent the gradient, resulting in a parameterized plug-and-play ADMM optimization scheme. Chun et al. [55] introduced BCD-Net [56] to LDCT reconstruction and applied the accelerated proximal gradient method as a fast numerical solver. Ye et al. [57] took both the supervised regularization and the unsupervised regularization into account and proposed an optimization scheme to alternately update the reconstruction result under specific constraints, where the experiments on several publicly available MBIR-based methods showed improved performance against their vanilla counterparts.
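As a rough illustration of how an objective of the form (2) can be minimized iteratively, the sketch below runs plain gradient descent on a toy problem. The random matrix `psi`, the Tikhonov regularizer $\phi(I)=\|I\|_2^2$, the step-size rule, and all function names are assumptions made for illustration; they do not correspond to the forward operator or regularizer of any method cited above.

```python
import numpy as np


def mbir_gradient_descent(psi, p_n, lam=0.1, n_iter=2000):
    """Minimize ||psi @ I - p_n||^2 + lam * ||I||^2 by gradient descent.

    psi : (num_rays, num_pixels) toy forward-projection matrix (assumption).
    p_n : (num_rays,) noisy measurement vector.
    """
    image = np.zeros(psi.shape[1])
    # Step 1/L, where L = 2 * (sigma_max^2 + lam) bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * (np.linalg.norm(psi, 2) ** 2 + lam))
    for _ in range(n_iter):
        residual = psi @ image - p_n                      # data-fidelity mismatch
        grad = 2.0 * psi.T @ residual + 2.0 * lam * image
        image -= step * grad
    return image


def objective(psi, p_n, image, lam):
    """Value of the toy objective for a given image estimate."""
    return float(np.sum((psi @ image - p_n) ** 2) + lam * np.sum(image ** 2))


rng = np.random.default_rng(0)
psi = rng.normal(size=(40, 16))                 # hypothetical system matrix
truth = rng.uniform(size=16)
p_n = psi @ truth + 0.05 * rng.normal(size=40)  # noisy projections
estimate = mbir_gradient_descent(psi, p_n)
```

For this quadratic regularizer the minimizer has a closed form, so the iterate can be checked against it. The learned regularizers discussed above have no such closed form, which is one reason unrolled or plug-and-play iterations are used instead.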

2.3 Cross-domain Joint Optimization

MBIR methods generally yield higher reconstruction quality. However, they significantly increase the reconstruction time and typically occupy more computational resources. To achieve fast inference speed while retaining access to raw projection data, a promising solution is to apply denoising across different data domains that cover the complete reconstruction pipeline, as follows:

$$I_{pr}=\mathcal{G}_{I}\left(\varphi\left(\mathcal{G}_{P}\left(P_{n}\right)\right)\right), \tag{3}$$

where $\mathcal{G}_{P}$ and $\mathcal{G}_{I}$ are denoisers in the projection domain and image domain, respectively. $\varphi$ is a projection-to-image operation that can be performed by conventional algorithms such as filtered back-projection (FBP) or learned models.

Several dedicated works fall into this category: Würfl et al. [13] utilized a multi-layer perceptron to model the behavior of filtering, back-projection, and non-negative constraint for sinogram-to-image reconstruction. Li et al. [15] designed a model named iCT-Net consisting of a novel back-projection layer for both LDCT denoising and sparse-view reconstruction. He et al. [16] presented a deep learning-based Radon inversion framework, where the filtered sinogram is resampled by a sinusoidal back-projection layer, followed by a typical CNN. Zhang et al. [58] adopted the Fourier feature representation [59] in their proposed sinogram prediction module and designed an iterative optimization scheme via forward and backward projection. Another work [17] combined two 3D residual U-Nets (ResUNets) for the projection domain and image domain denoising and trained them using cross-domain supervision and adversarial learning, demonstrating state-of-the-art performance.

The methods above have contributed to LDCT denoising to some extent; however, their limitations leave room for improvement. For image-domain denoising, the lack of direct access to the projection data makes it harder to distinguish subtle structures from signal-dependent artifacts. For iterative image reconstruction and denoising, the simulated scanner, i.e., planar scanning with a parallel trajectory, is not aligned with multi-slice CT scanners rotating in helical mode. Adapting these methods to the actual geometry will likely introduce extra complexity to the optimization or even affect model convergence. As for cross-domain methods, a single image reconstruction on modern scanners typically requires thousands of projections, which makes the training of $\mathcal{G}_{P}$ and $\mathcal{G}_{I}$ heavily unbalanced; moreover, some 3D operations in the reconstruction, such as the Feldkamp-like weighted FBP [60], are difficult to replace with learned models. Consequently, attempts at end-to-end optimization become even more challenging, whereas cascaded models are easily affected by the over-smoothing problem caused by the aggressive denoising of separately trained sub-networks.

Unlike existing works, our proposed framework takes the complete characteristics of the CT image reconstruction pipeline into account and performs joint projection-domain denoising and image-domain refinement, avoiding both the complexity and difficulty of end-to-end fine-tuning and the performance degradation observed in other cascaded designs. To the best of our knowledge, there are only very few works closely related to ours in the literature: Yin et al. [14] proposed to use two 3D sub-networks for sinogram and image denoising, respectively, where each sub-network was trained separately to take as input volumetric frames and estimate noise using 3D convolutions. However, this cascaded design still struggles to recover from aggressive denoising. Also, the effect of rebinning has not been considered. As a result, performance improvement is limited. In comparison, our method decomposes the reconstruction pipeline into several learning-based optimization problems according to the characteristics of data representation, which yields higher transparency for end-users. This two-stage design not only improves the overall performance by a significant margin but also strengthens the system’s robustness to different gantry geometries.

3 Proposed Method

In this section, we first discuss the intuition behind the design to offer the readers a brief overview and then provide more details of the proposed framework.

Figure 1: Overview of the proposed multi-stage hierarchical framework. The curly brackets indicate the concatenation operation. Note that only a single stream is shown in the projection-domain denoising, and all the noise components are amplified for better visibility.

3.1 Reconstruction Revisit

Pixels in a CT projection are obtained through the line integrals along the attenuating path. Without considering other effects (e.g., beam hardening), the ideal clean measurement $p_{c}$ is given as:

$$p_{c}=-\ln\left(\frac{N}{N_{0}}\right), \tag{4}$$

where $N_{0}$ and $N$ are the incident and received intensity, respectively. However, noise is inevitable due to the quantum effects of photons. According to [61], a measurement in a clinical environment, denoted as $p_{n}$, can be approximated by:

$$p_{n}=p_{c}+\frac{x}{\sqrt{N_{0}\exp\left(-p_{c}\right)}}, \tag{5}$$

where $x\sim\mathcal{N}\left(0,1\right)$ is a unit Gaussian random variable. Similar to the light field camera [62], two extra variables further parameterize each measurement in the detector array to represent the spatial information of the ray, namely the ray distance $d$ to the isocenter and the ray angle $\alpha$ to the table. Hence, a complete ray representation is given as $p_{n}=P_{n}\left(d,\alpha\right)$. Note that this process is general enough regardless of the source geometry (e.g., fan-beam or parallel-beam).
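The approximation in (5) implies that the noise standard deviation scales with $1/\sqrt{N_{0}}$, so reducing the incident intensity by a factor of four should roughly double the noise. The following minimal sketch checks this numerically; the function name, the line-integral value of 2.0, and the two values of $N_{0}$ are hypothetical choices for illustration, not settings taken from the paper.

```python
import numpy as np


def add_low_dose_noise(p_clean, n0, rng):
    """Apply the approximation in (5): p_n = p_c + x / sqrt(N0 * exp(-p_c))."""
    p_clean = np.asarray(p_clean, dtype=float)
    x = rng.standard_normal(p_clean.shape)     # x ~ N(0, 1)
    received = n0 * np.exp(-p_clean)           # expected received intensity N
    return p_clean + x / np.sqrt(received)


rng = np.random.default_rng(0)
p_clean = np.full(100_000, 2.0)                # hypothetical clean line integral
full_dose = add_low_dose_noise(p_clean, n0=1e5, rng=rng)
quarter_dose = add_low_dose_noise(p_clean, n0=2.5e4, rng=rng)

# Ratio of empirical noise levels between quarter dose and full dose.
ratio = quarter_dose.std() / full_dose.std()
```

Because the noise term also depends on $\exp(-p_{c})$, rays that pass through more attenuating material are noisier at the same dose, which is why the noise in the projection domain is signal-dependent.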

Modern CT scanners usually operate in helical trajectories, which requires an additional step called rebinning to convert raw projections to pseudo-parallel geometry via the following slicing operation:

$$\hat{P}_{n}\left(i\right)=P_{n}\left(d_{i},\alpha_{i}\right), \tag{6}$$

where $i$ is an element index in the rebinned projection data. As the slicing indices are usually fractions, this operation is basically a 2D interpolation.
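To make the slicing in (6) concrete, the sketch below samples a toy projection at fractional $(d, \alpha)$ indices. Bilinear interpolation is assumed here as one common choice of 2D interpolation; the function name and the toy data are hypothetical, and the real rebinning indices depend on the scanner geometry, which is not modeled.

```python
import numpy as np


def rebin_bilinear(projection, d_idx, a_idx):
    """Sample projection[d, alpha] at fractional indices via bilinear interpolation."""
    projection = np.asarray(projection, dtype=float)
    d_idx = np.asarray(d_idx, dtype=float)
    a_idx = np.asarray(a_idx, dtype=float)
    # Integer corner below each query, clipped so the upper corner stays in range.
    d0 = np.clip(np.floor(d_idx).astype(int), 0, projection.shape[0] - 2)
    a0 = np.clip(np.floor(a_idx).astype(int), 0, projection.shape[1] - 2)
    fd = d_idx - d0                      # fractional offsets for in-range queries
    fa = a_idx - a0
    # The four weights below are non-negative and add up to one.
    return ((1 - fd) * (1 - fa) * projection[d0, a0]
            + (1 - fd) * fa * projection[d0, a0 + 1]
            + fd * (1 - fa) * projection[d0 + 1, a0]
            + fd * fa * projection[d0 + 1, a0 + 1])


# Toy projection that is linear in both indices, so bilinear sampling is exact.
grid_d, grid_a = np.meshgrid(np.arange(4), np.arange(5), indexing="ij")
projection = 10.0 * grid_d + grid_a
rebinned = rebin_bilinear(projection, d_idx=[0.5, 2.25], a_idx=[1.5, 3.0])
```

Each output element is a weighted average of four neighboring measurements, so rebinning mixes the noise of adjacent samples.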

After that, all the resulting projections are transformed into image representation through CT reconstruction methods, where FBP is a popular choice, simplified as:

$$I_{n}\left(u,v\right)=\frac{\pi}{M}\sum_{m=1}^{M}\mathcal{\hat{P}}_{n}\left(u\cos\theta_{m}+v\sin\theta_{m}\right), \tag{7}$$

where $(u,v)$ denotes the image pixel location, $\mathcal{\hat{P}}$ represents the filtered result of $\hat{P}$, $\theta_{m}$ is the $m$-th projection angle, and $M$ is the number of rebinned projections. Readers are referred to [63] for a more comprehensive understanding of image reconstruction on multi-slice spiral CT.
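The back-projection sum in (7) can be sketched in a few lines. The example below assumes the projections have already been rebinned and ramp-filtered, and it uses linear interpolation along the detector; the function name, the detector sampling, and the constant placeholder data are assumptions for illustration only.

```python
import numpy as np


def back_project(filtered, thetas, det_pos, grid):
    """Discrete form of (7): I(u, v) = (pi / M) * sum_m P_m(u cos(theta_m) + v sin(theta_m))."""
    u, v = np.meshgrid(grid, grid, indexing="xy")
    image = np.zeros_like(u, dtype=float)
    for proj, theta in zip(filtered, thetas):
        s = u * np.cos(theta) + v * np.sin(theta)   # detector coordinate hit by each pixel
        image += np.interp(s, det_pos, proj)        # linear interpolation along the detector
    return image * np.pi / len(thetas)


thetas = np.linspace(0.0, np.pi, 180, endpoint=False)   # M = 180 projection angles
det_pos = np.linspace(-1.5, 1.5, 65)                    # hypothetical detector sampling
filtered = np.ones((len(thetas), len(det_pos)))         # placeholder for filtered data
image = back_project(filtered, thetas, det_pos, grid=np.linspace(-1.0, 1.0, 32))
```

With constant unit projections, every pixel accumulates $M$ ones scaled by $\pi/M$, so the output is $\pi$ everywhere. This serves only as a sanity check of the normalization, not as a realistic reconstruction.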

3.2 Framework Overview

An overview of the proposed framework is presented in Fig. 1. It mainly performs the projection-domain denoising and the image-domain refinement, embodied by two multi-frame-based neural networks, named MPD-Net and MIR-Net, respectively.

Let $S_{n*}^{P}=\left[P_{n*}^{1},P_{n*}^{2},\cdots,P_{n*}^{K}\right]$ denote a sequence of $K$ consecutive noisy projections, where $*\in\left\{l,r\right\}$ denotes the left or the right candidate to be sampled in (6), and the superscript represents the time step, which can be omitted when unnecessary. Given $S_{n*}^{P}$ as the input, MPD-Net performs multi-frame noise estimation using a sliding window of size $2F+1$ for every projection, resulting in a denoised sequence $S_{d*}^{P}=\left[P_{d*}^{F+1},P_{d*}^{F+2},\cdots,P_{d*}^{K-F}\right]$ with a noise level similar to routine dose (i.e., full dose), as follows:

$$\begin{aligned} P_{d*}^{t} &= P_{n*}^{t}+R_{*}^{t} \\ &= P_{n*}^{t}+\mathcal{G}_{MPD}\left(P_{n*}^{t-F},\cdots,P_{n*}^{t},\cdots,P_{n*}^{t+F}\right), \end{aligned} \tag{8}$$

where $\mathcal{G}_{MPD}$ represents MPD-Net, $R_{*}^{t}$ represents its output, $t\in\left\{F+1,F+2,\cdots,K-F\right\}$, and $K\gg F$.
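The sliding-window indexing in (8) can be sketched as follows. The residual predictor `mean_residual` is a hypothetical stand-in used only to make the example runnable; it is not MPD-Net and makes no claim about the network's behavior. The sequence length, frame size, and noise level are likewise arbitrary toy values.

```python
import numpy as np


def denoise_sequence(noisy, half_window, residual_fn):
    """Apply (8): P_d^t = P_n^t + R^t, where R^t is predicted from frames t-F..t+F."""
    num_frames = len(noisy)
    denoised = []
    # 0-based t runs over F .. K-F-1, i.e. the 1-based range F+1 .. K-F in the text.
    for t in range(half_window, num_frames - half_window):
        window = noisy[t - half_window:t + half_window + 1]   # 2F+1 frames
        denoised.append(noisy[t] + residual_fn(window))
    return np.stack(denoised)


def mean_residual(window):
    """Hypothetical stand-in for the network: pulls the centre frame toward the window mean."""
    centre = window[len(window) // 2]
    return window.mean(axis=0) - centre


rng = np.random.default_rng(0)
noisy = 1.0 + 0.1 * rng.standard_normal((20, 8, 8))   # K = 20 toy projections
denoised = denoise_sequence(noisy, half_window=2, residual_fn=mean_residual)
```

With $K=20$ and $F=2$, the output contains $K-2F=16$ frames, matching the index range of $t$ given above: the first and last $F$ projections have incomplete windows and are not denoised.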

The rebinned projection sequence, denoted as $\hat{S}_{d}^{P}=\left[\hat{P}_{d}^{F+1},\hat{P}_{d}^{F+2},\cdots,\hat{P}_{d}^{K-F}\right]$, is then obtained by the weighted summation over adjacent projections as follows:

$$\hat{P}_{d}^{t}=\omega_{00}P_{dl}^{t}+\omega_{01}P_{dr}^{t}+\omega_{10}P_{dl}^{t+1}+\omega_{11}P_{dr}^{t+1}, \tag{9}$$

where ω00subscript𝜔00\omega_{00}italic_ω start_POSTSUBSCRIPT 00 end_POSTSUBSCRIPT, ω01subscript𝜔01\omega_{01}italic_ω start_POSTSUBSCRIPT 01 end_POSTSUBSCRIPT, ω10subscript𝜔10\omega_{10}italic_ω start_POSTSUBSCRIPT 10 end_POSTSUBSCRIPT, and ω11subscript𝜔11\omega_{11}italic_ω start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT are interpolation weights that add up to one.
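The weighted summation in (9) can be read as a blend of four neighboring samples. The following is a minimal NumPy sketch, not the authors' implementation: the function name `weighted_rebin`, the fractional offsets `a` and `b`, and the bilinear form of the weights are illustrative assumptions, since the text only requires that the four weights add up to one.

```python
import numpy as np

def weighted_rebin(p_l_t, p_r_t, p_l_t1, p_r_t1, a, b):
    """Blend four neighboring projections as in Eq. (9).

    a: fractional offset along the view axis (0 -> view t, 1 -> view t+1).
    b: fractional offset along the channel axis (0 -> left, 1 -> right).
    Bilinear weights are an assumption; Eq. (9) only needs them to sum to one.
    """
    w00 = (1.0 - a) * (1.0 - b)  # left neighbor, view t
    w01 = (1.0 - a) * b          # right neighbor, view t
    w10 = a * (1.0 - b)          # left neighbor, view t+1
    w11 = a * b                  # right neighbor, view t+1
    return w00 * p_l_t + w01 * p_r_t + w10 * p_l_t1 + w11 * p_r_t1

# Toy example: four constant 2x3 "projections" with values 1, 2, 3, 4.
shape = (2, 3)
neighbors = [np.full(shape, v) for v in (1.0, 2.0, 3.0, 4.0)]
blended = weighted_rebin(*neighbors, a=0.25, b=0.5)
```

Because the weights sum to one, a constant input is reproduced unchanged, which is a quick sanity check for any choice of weights.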

After that, the conjugate projections with angles $\tilde{\theta}_{j}=\theta_{j}+k\pi$ are filtered and back-projected onto the orthogonal 2D plane, forming a back-projection view $V_{\theta_{j}}$ as follows:

$$V_{\theta_{j}}^{z}\left(u,v\right)=\frac{1}{H_{\theta_{j}}}\sum_{k}h\left(z_{j,k}-z\right)\hat{\mathcal{P}}_{d}\left(u\cos\theta_{j}+v\sin\theta_{j}\right), \tag{10}$$

where $z_{j,k}-z$ is the axial offset of the ray to the reconstruction center $z$, $h$ is a non-linear weighting function related to the multi-slice spiral geometry, and $H_{\theta_{j}}$ is the sum of $h$ over $k$.

A complete reconstruction can then be derived once a half-turn $L$ is reached:

$$I_{d}^{z}\left(u,v\right)=\frac{\pi}{L}\sum_{j=1}^{L}V_{\theta_{j}}^{z}\left(u,v\right). \tag{11}$$
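The structure of (10) and (11) can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions rather than the reconstruction code used in the paper: each conjugate projection is given a single axial position, the weighting function is reduced to a triangular profile without the row-dependent term, and the names `tri_weight`, `backproject_view`, and `reconstruct` are hypothetical.

```python
import numpy as np

def tri_weight(dz, D):
    """Triangular axial weight; stands in for h without the row term w(r)."""
    return np.maximum(0.0, 1.0 - np.abs(dz) / D)

def backproject_view(projs, det_pos, z_rays, z, theta, grid_u, grid_v, D):
    """One back-projection view V for a single angle theta, as in Eq. (10).

    projs:   (K, C) filtered conjugate projections sharing theta (mod pi).
    det_pos: (C,) increasing detector coordinates of the C channels.
    z_rays:  (K,) axial position assigned to each conjugate projection
             (a simplification; in practice it varies per ray).
    """
    t = grid_u * np.cos(theta) + grid_v * np.sin(theta)  # detector coordinate per pixel
    weights = tri_weight(z_rays - z, D)                  # h(z_{j,k} - z)
    H = weights.sum()
    view = np.zeros(t.shape, dtype=float)
    if H == 0.0:
        return view  # no projection is close enough to this slice
    for w, p in zip(weights, projs):
        # Linear interpolation along the detector channels.
        view += w * np.interp(t.ravel(), det_pos, p).reshape(t.shape)
    return view / H

def reconstruct(views):
    """Sum L views over a half-turn with the pi / L factor of Eq. (11)."""
    views = np.asarray(views)
    return np.pi / len(views) * views.sum(axis=0)

# Toy example: constant projections should give a constant image.
u, v = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5))
det_pos = np.linspace(-2, 2, 9)
angles = np.linspace(0.0, np.pi, 6, endpoint=False)
projs = np.full((2, det_pos.size), 3.0)   # two conjugate projections
z_rays = np.array([-0.2, 0.3])
views = [backproject_view(projs, det_pos, z_rays, 0.0, th, u, v, D=1.0)
         for th in angles]
image = reconstruct(views)
```

The normalization by the summed weights keeps each view on the same scale regardless of how many conjugate projections contribute to it.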

Similar to the projection domain denoising, when a sequence of $Q$ reconstructed images is collected, MIR-Net takes the sequence $S_{d}^{I}=\left[I_{d}^{z_{1}},I_{d}^{z_{2}},\cdots,I_{d}^{z_{Q}}\right]$ as input and generates the refined image as the final result using a sliding window of size $2F+1$, given as:

$$\begin{aligned}
I_{r}^{z_{q}} &= I_{d}^{z_{q}}+R_{r}^{z_{q}} \\
&= I_{d}^{z_{q}}+\mathcal{G}_{MIR}\left(I_{d}^{z_{q-F}},\cdots,I_{d}^{z_{q}},\cdots,I_{d}^{z_{q+F}}\right),
\end{aligned} \tag{12}$$

where $\mathcal{G}_{MIR}$ represents MIR-Net, $R_{r}^{z_{q}}$ corresponds to its output, and $q\in\left\{F+1,F+2,\cdots,Q-F\right\}$. The input sequence to MIR-Net has a stride of $F+1$, i.e., $q\equiv 1 \pmod{F+1}$, which will be explained in Section 3.4.
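The index bookkeeping of the sliding window and its stride can be made concrete with a short Python sketch. The helper names `window_centers` and `window` are hypothetical; indices are 1-based to match the notation of (12).

```python
def window_centers(Q, F):
    """Center indices q with F+1 <= q <= Q-F and q = 1 (mod F+1)."""
    return [q for q in range(F + 1, Q - F + 1) if (q - 1) % (F + 1) == 0]

def window(q, F):
    """Indices of the 2F+1 frames centered at q."""
    return list(range(q - F, q + F + 1))

centers = window_centers(Q=10, F=2)        # [4, 7]
windows = [window(q, 2) for q in centers]  # [[2, 3, 4, 5, 6], [5, 6, 7, 8, 9]]
```

With a stride of $F+1$, only every $(F+1)$-th image is refined, while the frames in between serve as additional input.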

Figure 2: A sample clip from the raw projections obtained by a multi-slice spiral CT scanner.

3.3 Multi-frame-based Projection Denoising

We observe that modern scanners with multi-slice detectors provide intra-view (i.e., detector array) and inter-view (i.e., neighboring views) redundancy, both of which are beneficial for denoising. As can be seen from the example in Fig. 2, the intra-view similarity provides the projected structural details of the objects, and the inter-view similarity presents the relative motion between objects. As discussed in Section 3.1, the dominant source of noise is the photon noise that follows a Poisson distribution, which can be alleviated by averaging over multiple independent measurements. We thus reformulate the task as a burst imaging problem and propose MPD-Net to capture both the intra-view and inter-view features. MPD-Net has two significant merits: on the one hand, noise reduction based on the statistics of burst imaging can be realized via implicit alignment and fusion along views; on the other hand, it also considers intra-view similarities so that structures within a view can be well preserved.

The main structure of MPD-Net is depicted in Fig. 3. Inspired by [64], a multi-step denoising model consisting of two modified ResUNets is adopted. Each ResUNet takes $2F+1$ frames as input and predicts a denoised version of the middle frame via residual learning. Different from the original, where the same noise prior is used at each step [64], we take the predicted residual from the first step along with the untouched frames as input of the second step to avoid potential accumulated artifacts. Furthermore, the adaptive mix-up from [65] is also employed, as it empirically boosts performance.

Figure 3: The structure of MPD-Net: (a) MPD-Net with $F=2$, where ResUNets with different colors have individual parameter sets; (b) the structure of ResUNet. Details of E1-E2 and D1-D2 are given in Fig. 5.

The above intuition based on multi-frame denoising could be sub-optimal since the rebinning operation defined in (6) generates pseudo-parallel projections obtained across time. In addition, some advanced scanners apply the flying focal spot technique to improve axial resolution, where the rebinning process comes with another step that interleaves rows from two focal spots. All these operations lead to inconsistent correlations along detector channels. As we will demonstrate in Section 4.2, applying rebinning after MPD-Net results in unsmooth projections due to the absence of long-term consistency. However, if rebinning is performed ahead of MPD-Net, the element-wise noise independence will be violated because rebinning is essentially an interpolation operation, which degrades the denoising performance, as reported by [66, 67].

To guarantee element-wise noise independence while preserving rebinning consistency, we decompose the rebinning process into two steps, namely integer slicing and weighted summation, and bridge them with MPD-Net. The integer slicing extracts four untouched neighbors from 2D raw projections, whereas the weighted summation performs rebinning on the denoised results. This modification not only retains both the noise property and intra-view smoothness but also avoids complicated long-term memory mechanisms in the design.
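Why interpolation before denoising breaks element-wise noise independence can be illustrated with a toy experiment. The sketch below is only a demonstration under simplifying assumptions (Gaussian noise instead of the actual projection noise, and equal weights of 0.5 for two adjacent channels); it is not part of the proposed pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
# i.i.d. noise: rows are independent samples, columns are detector channels.
noise = rng.standard_normal((4096, 64))

# Interpolation-style rebinning: each output mixes two adjacent channels.
interp = 0.5 * noise[:, :-1] + 0.5 * noise[:, 1:]

def neighbor_corr(x):
    """Mean correlation coefficient between adjacent channels."""
    a, b = x[:, :-1], x[:, 1:]
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    corr = (a * b).mean(axis=0) / (a.std(axis=0) * b.std(axis=0))
    return float(np.mean(corr))

raw_corr = neighbor_corr(noise)     # close to 0: elements are independent
mixed_corr = neighbor_corr(interp)  # close to 0.5: neighbors share one input sample
```

Integer slicing leaves the raw samples untouched, so the denoiser sees noise like `noise` rather than `interp`; the weighted summation is applied only afterwards.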

Figure 4: Two examples of inter-slice (study: Siemens-L291) and intra-slice (study: Siemens-L291, slice: 9,500) dose levels. It can be seen that the dose level not only varies dramatically within the scan but also has a non-uniform distribution across detector columns.

Besides, as shown in Fig. 4, due to the widely applied Automatic Exposure Control (AEC) [63], the exposure levels of projections oscillate dramatically among table locations compared to those of images, and the noise is distributed non-uniformly within each projection. Although an existing work [68] demonstrated a certain level of tolerance of CNNs against dose levels in the image domain, blind denoisers tend to learn a more aggressive strategy when the noise variance becomes large, resulting in more blurry predictions [69]. Hence, we provide an external noise prior and train MPD-Net to recover the refined noise map. Given two sets of projections under low dose (dubbed $P_{l}$) and target full dose (dubbed $P_{f}$), let $N_{l_{0}}$ and $N_{f_{0}}$ be their corresponding source intensities. Letting $X$ be a realization of $x$ in (5), $P_{l}$ can be approximated using $P_{f}$ by:

$$P_{l}=P_{f}+\sqrt{\left(\frac{1}{N_{l_{0}}}-\frac{1}{N_{f_{0}}}\right)\exp\left(P_{f}\right)}X. \tag{13}$$

Replacing $P_{l}$ by $P_{f}+\Delta_{P}$, (13) can be rewritten as:

$$\begin{aligned}
P_{f} &= P_{l}-\sqrt{\left(\frac{1}{N_{l_{0}}}-\frac{1}{N_{f_{0}}}\right)\exp\left(P_{l}\right)}\cdot\sqrt{\exp\left(-\Delta_{P}\right)}X \\
&= P_{l}-\Phi\cdot\omega.
\end{aligned} \tag{14}$$

Interestingly, the second part in (14) can be viewed as the multiplication of a constant term $\Phi$, which we define as the noise prior, and a weighting term $\omega$, which we define as the weight map. Although noise-irrelevant, $\Phi$ keeps changing among views when AEC is enabled. As a result, noise estimation becomes more challenging due to the extra uncertainty from $\Phi$. Fortunately, the per-channel source intensity $N_{0}$ is required to compute the attenuation ratio in (4) and is therefore available alongside the projections. To alleviate the difficulty of noise estimation, we thus feed this noise prior to MPD-Net to help predict a more accurate noise map. The updated workflow defined in (8) is then given by:

$$P_{d*}^{t}=P_{n*}^{t}+\mathcal{G}_{MPD}\left(\hat{P}_{n*}^{t-F},\cdots,\hat{P}_{n*}^{t},\cdots,\hat{P}_{n*}^{t+F}\right), \tag{15}$$

where $\hat{P}_{n*}^{t}=\left[P_{n*}^{t},\Phi_{*}^{t}\right]$ is the concatenation of the noisy projection $P_{n*}^{t}$ and the corresponding noise prior $\Phi_{*}^{t}$.
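The relation between (13), (14), and the concatenated network input can be checked numerically. The NumPy sketch below rests on several assumptions that are not stated in the text: $X$ is drawn from a standard normal distribution, the source intensities are toy values, and the function names `noise_prior` and `simulate_low_dose` are hypothetical.

```python
import numpy as np

def noise_prior(p_l, n_l0, n_f0):
    """Phi in Eq. (14): sqrt((1/N_l0 - 1/N_f0) * exp(P_l))."""
    return np.sqrt((1.0 / n_l0 - 1.0 / n_f0) * np.exp(p_l))

def simulate_low_dose(p_f, n_l0, n_f0, rng):
    """Eq. (13): add signal-dependent noise to a full-dose projection."""
    x = rng.standard_normal(p_f.shape)  # realization X (standard normal assumed)
    p_l = p_f + np.sqrt((1.0 / n_l0 - 1.0 / n_f0) * np.exp(p_f)) * x
    return p_l, x

rng = np.random.default_rng(0)
p_f = rng.uniform(0.5, 4.0, size=(16, 32))  # toy line integrals (rows x channels)
n_f0 = np.full(32, 1.0e5)                   # hypothetical per-channel full-dose intensity
n_l0 = n_f0 / 4.0                           # quarter dose
p_l, x = simulate_low_dose(p_f, n_l0, n_f0, rng)

phi = noise_prior(p_l, n_l0, n_f0)          # noise prior, computed from P_l only
delta_p = p_l - p_f
omega = np.sqrt(np.exp(-delta_p)) * x       # weight map in Eq. (14)
p_f_recovered = p_l - phi * omega           # should reproduce p_f

# Network input for one frame: projection and prior stacked as two channels.
p_hat = np.stack([p_l, phi], axis=0)        # shape (2, 16, 32)
```

Note that `phi` depends only on quantities available at inference time (the low-dose projection and the two source intensities), whereas `omega` contains the unknown noise realization and is what the network has to estimate.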

3.4 Multi-frame-based Image Refinement

Although the proposed MPD-Net can significantly reduce the noise of multi-slice projections, it is still far from satisfactory for two reasons: First, MPD-Net does not capture the structural features of the final reconstructed image because the reconstruction plane is parallel to the ray trajectories. Second, the remaining noise in the results will be amplified by the high-pass filter and lead to streak artifacts after reconstruction. To tackle these problems, we introduce a second network called MIR-Net to refine the LDCT image further.

Fig. 5 presents the structure of MIR-Net, which consists of a single ResUNet. MIR-Net takes a reconstructed image sequence $S_{r}^{I}$ as input and produces the residual $R_{r}$ as the output. The hourglass design enables an expanding receptive field that better captures structural features without large kernels, and we observe better performance than with straight (i.e., without down/up-sampling) networks.

Figure 5: Structure of MIR-Net. The number in each layer represents the output channel size, and $\bigoplus$ represents the adaptive mix-up operation. E1-E2 and D1-D2 represent encoding and decoding blocks, respectively. All layers use ReLU as the activation function.

Although MPD-Net benefits from multi-frame processing, it is challenging for MIR-Net to discover the full potential of multi-frame input. Let $I^{z_{q}}$ and $I^{z_{q+1}}$ be two consecutive CT images, and $D$ be the slice thickness. As can be seen in Fig. 6(a), when $\left|z_{q+1}-z_{q}\right|\geq D$, the (ideal) reconstructed CT images do not share objects between slices, meaning that the multi-frame input only features structural similarity instead of redundant observations. Besides, a 2D CT reconstruction represents a 3D volume in reality. Recalling (10) and (11), artifacts from conjugate projections are combined in each view. In contrast, a stack of views over a half-turn is projected onto the 2D image plane, resulting in compressed artifact patterns that are more difficult to remove.

We propose a simple yet effective approach to alleviate these problems by introducing overlapped slices as intermediate representations. According to (10), the reconstruction is obtained by averaging nearby back-projections through a weighting function $h$. We refer to the reconstruction method in [60], where $h$ is given as:

$$h\left(\Delta z\right)=\max\left(0,1-\frac{\left|\Delta z\right|}{D}\right)w\left(r\right), \tag{16}$$

where $r$ denotes the detector row index, and $\Delta z=\left|z_{q+1}-z_{q}\right|$ represents the distance between the projection and the reconstruction center along the table direction.

Figure 6: Illustration of slice relations: (a) a pair of non-overlapped slices when $\Delta z>D$; the dotted boxes are their projection data grabbing range, the red lines are corresponding weights that contribute to $h$ (w/o $w\left(r\right)$), and the grid in between indicates shared projections; (b) the proposed multi-slice redundancy ($F=1$).

Without considering $w\left(r\right)$, the range of projection data required for a complete reconstruction, defined by $h\left(\Delta z\right)>0$, simply lies in $\left(z-D,z+D\right)$, indicating that the extent of the projections is wider than the slice. In other words, there may still be a small number of shared projections used in reconstructing both $I^{z}$ and $I^{z+1}$. These shared features provide redundant observations that are beneficial for multi-frame-based refinement. However, as Fig. 6(a) shows, the weights of shared projections become insignificant as $\Delta z$ increases; one could insert more slices in between to better emphasize these weak signals. To this end, we reconstruct $F$ slices as intermediate observations between each pair of adjacent slices and collect $2F+1$ images as the multi-frame input. Fig. 6(b) presents the proposed solution when $F=1$; by doing so, the emphasized projections offer a different realization of the compressed artifacts. It is worth mentioning that these intermediate representations cannot be obtained via interpolation from adjacent slices as the complete form of the weighting function is non-linear. Also, the shared features among adjacent slices are pixel-wise aligned; thus, a single ResUNet is sufficient to handle multi-frame inputs.
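How an intermediate slice emphasizes shared projections can be seen from a small numeric example. The Python sketch below makes two assumptions beyond the text: the weighting function is reduced to its triangular part (without the row-dependent factor), and the intermediate slice centers are placed at evenly spaced positions; the helper names are hypothetical.

```python
def tri_weight(dz, D):
    """h from Eq. (16) without the row-dependent factor w(r)."""
    return max(0.0, 1.0 - abs(dz) / D)

def with_intermediate_slices(z_positions, F):
    """Insert F evenly spaced slice centers between each adjacent pair."""
    out = []
    for z0, z1 in zip(z_positions[:-1], z_positions[1:]):
        step = (z1 - z0) / (F + 1)
        out.extend(z0 + i * step for i in range(F + 1))
    out.append(z_positions[-1])
    return out

D = 1.0
z_ray = 0.5  # a projection halfway between two slice centers at 0.0 and 1.0

w_left = tri_weight(z_ray - 0.0, D)   # 0.5: weak contribution to the slice at 0.0
w_right = tri_weight(z_ray - 1.0, D)  # 0.5: weak contribution to the slice at 1.0
w_mid = tri_weight(z_ray - 0.5, D)    # 1.0: full weight in the intermediate slice

z_all = with_intermediate_slices([0.0, 1.0, 2.0], F=1)  # [0.0, 0.5, 1.0, 1.5, 2.0]
```

The projection that contributes only weakly to either neighbor receives the full weight in the intermediate slice, which is the sense in which the inserted slices emphasize the shared data.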

Besides, we notice a certain degree of degradation in high frequencies due to the aggressive denoising in individual sub-networks, which commonly occurs in cascaded designs. A straightforward solution is to fine-tune the entire chain in an end-to-end manner. However, as discussed in Section 2, end-to-end training is expensive and impractical for multi-slice spiral CT. Alternatively, instead of obtaining the denoised results from MPD-Net, we use their concatenation form, i.e., $\left[P_{n*}^{t},R_{*}^{t}\right]$. This not only compensates for the missing high frequencies but also provides a decoupled reference of structural artifacts for further refinement. In short, the image domain refinement defined in (12) is rewritten as:

$$I_{r}^{z_{q}}=I_{n}^{z_{q}}+\mathcal{G}_{MIR}\left(\hat{I}_{d}^{z_{q-F}},\cdots,\hat{I}_{d}^{z_{q}},\cdots,\hat{I}_{d}^{z_{q+F}}\right), \tag{17}$$

where $\hat{I}_{d}^{z_{q}}=\left[I_{n}^{z_{q}},R_{n}^{z_{q}}\right]$ is the concatenation of the low-dose image $I_{n}^{z_{q}}$ and the residual $R_{n}^{z_{q}}$ reconstructed using the raw projections and MPD-Net predictions, respectively.

Table 1: Dataset Partition Summary

| Partition | Siemens Scanner Subset | GE Scanner Subset |
|---|---|---|
| Training | L004, L006, L019, L033, L057, L064, L071, L072, L081, L107, L110, L114, L116, L125, L131, L134, L150, L160, L170, L175, L178, L179, L193, L203, L210, L212, L220, L221, L232, L237, L248, L273, L299 | L012, L024, L027, L030, L036, L044, L045, L048, L079, L082, L094, L111, L113, L121, L127, L129, L133, L136, L138, L143, L147, L154, L163, L166, L171, L172, L181, L183, L185, L188, L196, L216 |
| Validation | L077, L148, L229 | L043, L213, L238 |
| Testing | L014, L056, L058, L075, L123, L145, L186, L187, L209, L219, L241, L266, L277 | L218, L228, L231, L234, L235, L244, L250, L251, L257, L260, L267, L269, L288 |

4 Experiments and Analysis

In this section, we present evaluation results and analysis of the proposed framework. We first provide implementation details and ablation studies to verify our design. We then evaluate its performance both quantitatively and qualitatively.

4.1 Implementation Details

The proposed framework was implemented in PyTorch and trained on an RTX 3090 GPU with an i9-10980XE CPU and 128 GB of RAM. We chose the Adam optimizer to update the model parameters. The Low Dose CT Image and Projection Data V6 [70], obtained from two scanners (Siemens SOMATOM Definition Flash, dubbed Siemens dataset, and GE Discovery CT750i, dubbed GE dataset), was used for training, validation, and testing. Both Siemens and GE datasets contain paired full-dose (7.6-28.8 mGy for Siemens studies and 9.2-21.6 mGy for GE studies) and quarter-dose (1.9-7.2 mGy for Siemens studies and 2.3-5.4 mGy for GE studies) data. The detailed data partitions are given in Table 1. The test dataset of the 2016 Low-dose CT AAPM Grand Challenge [71] was also employed in subjective evaluation since it contains scans from the control group, in which the dose level ranges from 0.8 mGy to 4.5 mGy. We followed the methods described in [72, 60, 73] for rebinning, filtering, back-projection, and weighted summation. The default Shepp–Logan filter was chosen as the reconstruction kernel, and the slice thickness from metadata was adopted.

During network training, we used the L1 loss to supervise both MPD-Net and MIR-Net. The initial learning rate was set to $1\times 10^{-4}$ and then reduced to $1\times 10^{-5}$ if the model performance on the validation dataset showed no further improvement after a certain number of steps. The complete convergence of the two CNNs took about 40 and 150 epochs, respectively. For both networks, we set $F=2$ for the sliding window, i.e., a window of $2F+1=5$ frames.
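The training schedule described above can be sketched in plain Python without any framework dependency. The class name `PlateauSchedule` and the `patience` value are hypothetical, since the text does not state how many steps without improvement trigger the reduction; in PyTorch, a scheduler such as `torch.optim.lr_scheduler.ReduceLROnPlateau` could play a similar role.

```python
def l1_loss(pred, target):
    """Mean absolute error over flat sequences of equal length."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

class PlateauSchedule:
    """Drop the learning rate once validation loss stops improving."""

    def __init__(self, lr_init=1e-4, lr_final=1e-5, patience=3):
        self.lr = lr_init
        self.lr_final = lr_final
        self.patience = patience  # hypothetical; the text does not give a value
        self.best = float("inf")
        self.stale = 0

    def step(self, val_loss):
        """Update with the latest validation loss and return the current rate."""
        if val_loss < self.best:
            self.best, self.stale = val_loss, 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr = self.lr_final
        return self.lr

sched = PlateauSchedule(patience=2)
lrs = [sched.step(v) for v in (1.0, 0.8, 0.81, 0.82, 0.79)]
# lrs == [1e-4, 1e-4, 1e-4, 1e-5, 1e-5]
```

In this sketch the rate is lowered only once and stays at the lower value even if the validation loss improves again afterwards.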

4.2 Ablation Studies

We conducted experiments to show how the image quality is progressively improved by each component in the proposed framework, namely MPD-Net, MIR-Net, multi-frame input, external noise prior, and decoupled input (i.e., separated $P_{n*}^{t}$ and $R_{*}^{t}$). Ablation studies were performed on the Siemens test dataset; we report the measured mean square error (MSE) and structural similarity (SSIM) [74] in Table 2.

Table 2: Results of Ablation Studies
MPD MIR Multi Prior Decoupled MSE↓ SSIM↑
774.46 0.9577
499.93 0.9685
418.54 0.9730
309.79 0.9774
236.60 0.9819
184.98 0.9881

It can be seen that both MPD-Net and MIR-Net play essential roles in enhancing the reconstruction quality of LDCT. For MPD-Net, introducing external noise prior leads to quality improvement, whereas placing the model before the rebinning operation results in sub-optimal performance, where we observe artifacts as shown in Fig. 7. Similarly, it is noticed that the summation (i.e., $P_{n*}^{t}+R_{*}^{t}$) in (8) affects the reconstruction, mainly due to over-smoothing, whereas volumetric input brings extra improvement to the final image quality.

We also tested the performance of MIR-Net as a standalone image-domain denoiser. Although the results regarding MSE (215.62) and SSIM (0.9807) seem promising, further inspection of the noise power spectrum (NPS), shown in Fig. 8, indicates an obvious over-smoothing problem evidenced by the shifted peak with a higher level of noise power in low frequencies. In short, all these results align well with our analysis above.
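For readers unfamiliar with the NPS, a commonly used estimator averages the squared magnitude of the Fourier transform over noise-only regions of a uniform phantom section. The NumPy sketch below follows that general recipe and is not the evaluation code behind Fig. 8; the ROI extraction is assumed to have been done beforehand, and the function names, pixel size, and bin count are hypothetical.

```python
import numpy as np

def nps_2d(rois, pixel_size):
    """2D noise power spectrum averaged over noise-only ROIs.

    rois: (M, N, N) array of square regions from a uniform phantom area.
    pixel_size: pixel spacing in mm (assumed isotropic).
    """
    rois = np.asarray(rois, dtype=float)
    rois = rois - rois.mean(axis=(1, 2), keepdims=True)  # remove the mean of each ROI
    n = rois.shape[-1]
    power = np.abs(np.fft.fft2(rois)) ** 2               # |DFT|^2 per ROI
    return power.mean(axis=0) * pixel_size ** 2 / (n * n)

def radial_profile(nps, pixel_size, n_bins=16):
    """Average the 2D NPS over rings of equal spatial frequency."""
    n = nps.shape[0]
    f = np.fft.fftfreq(n, d=pixel_size)
    fr = np.hypot(*np.meshgrid(f, f))
    edges = np.linspace(0.0, f.max(), n_bins + 1)
    # Frequencies beyond the last edge are folded into the last bin.
    idx = np.clip(np.digitize(fr.ravel(), edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=nps.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, sums / np.maximum(counts, 1)

# Toy check with white Gaussian noise (standard deviation 2, pixel size 0.5 mm).
rng = np.random.default_rng(0)
rois = rng.normal(0.0, 2.0, size=(64, 32, 32))
nps = nps_2d(rois, pixel_size=0.5)
freqs, profile = radial_profile(nps, pixel_size=0.5)
variance_from_nps = nps.sum() / (32 * 0.5) ** 2  # integral of the NPS over frequency
```

For white noise the radial profile is flat, and the integral of the NPS recovers the noise variance; an over-smoothed result instead concentrates its remaining noise power at low frequencies, which is the peak shift referred to above.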

Figure 7: The interaction between the rebinning process and MPD-Net: (a) the result obtained from MPD-Net placed before the rebinning operation; (b) the result of the proposed method. Images are contrast-enhanced for better visibility.
Figure 8: NPS results of ablation study on ACR CT phantom.
Table 3: Objective Evaluation Summary (Quarter-dose)
Siemens Test Dataset: $\mathrm{CTDI_{vol}}$ 1.9-7.2 mGy, WL: 40, WW: 300. GE Test Dataset: $\mathrm{CTDI_{vol}}$ 2.3-5.4 mGy, WL: 40, WW: 400. ACR CT Phantom: $\mathrm{CTDI_{vol}}$ 3.4 mGy, full-range.

| Method | Siemens MSE ↓ | Siemens SSIM ↑ | GE MSE ↓ | GE SSIM ↑ | ACR MSE ↓ | ACR SSIM ↑ |
| --- | --- | --- | --- | --- | --- | --- |
| Baseline | 774.46 ± 811.18 | 0.9577 ± 0.0323 | 1768.27 ± 1040.13 | 0.9152 ± 0.0389 | 187.16 ± 26.89 | 0.9842 ± 0.0016 |
| BM3D | 670.30 ± 755.00 | 0.9631 ± 0.0296 | 1176.68 ± 902.82 | 0.9424 ± 0.0339 | 184.61 ± 26.52 | 0.9844 ± 0.0016 |
| NLM | 724.50 ± 780.64 | 0.9605 ± 0.0304 | 1268.01 ± 944.06 | 0.9391 ± 0.0347 | 187.09 ± 26.86 | 0.9842 ± 0.0016 |
| RED-CNN | 210.50 ± 208.29 | 0.9863 ± 0.0122 | 477.36 ± 216.44 | 0.9713 ± 0.0126 | 77.40 ± 20.93 | 0.9925 ± 0.0018 |
| WGAN | 453.66 ± 313.47 | 0.9691 ± 0.0168 | 1004.27 ± 473.89 | 0.9449 ± 0.0236 | 179.32 ± 122.46 | 0.9874 ± 0.0025 |
| CPCE-3D | 324.67 ± 284.80 | 0.9797 ± 0.0156 | 761.76 ± 368.73 | 0.9589 ± 0.0184 | 130.91 ± 96.21 | 0.9906 ± 0.0025 |
| QAE | 241.36 ± 274.95 | 0.9853 ± 0.0134 | 526.67 ± 236.40 | 0.9699 ± 0.0132 | 87.41 ± 23.91 | 0.9913 ± 0.0024 |
| DP-ResNet | 270.77 ± 122.73 | 0.9795 ± 0.0080 | 451.32 ± 202.43 | 0.9734 ± 0.0117 | 153.08 ± 193.50 | 0.9927 ± 0.0033 |
| EDCNN | 240.39 ± 275.13 | 0.9848 ± 0.0144 | 523.12 ± 253.46 | 0.9694 ± 0.0139 | 84.48 ± 18.74 | 0.9919 ± 0.0019 |
| TransCT | 250.96 ± 226.68 | 0.9836 ± 0.0141 | 512.06 ± 221.15 | 0.9700 ± 0.0126 | 98.43 ± 72.26 | 0.9915 ± 0.0026 |
| DU-GAN | 286.49 ± 309.19 | 0.9820 ± 0.0162 | 584.49 ± 275.07 | 0.9667 ± 0.0150 | 108.70 ± 23.07 | 0.9898 ± 0.0018 |
| CTformer | 242.06 ± 282.92 | 0.9847 ± 0.0147 | 785.66 ± 501.63 | 0.9550 ± 0.0221 | 80.16 ± 19.46 | 0.9925 ± 0.0015 |
| Ours | **184.98 ± 106.53** | **0.9881 ± 0.0067** | **422.39 ± 201.62** | **0.9751 ± 0.0115** | **51.62 ± 14.67** | **0.9951 ± 0.0014** |
Figure 9: Visualization of a selected region with high contrast. The blue arrow indicates a continuous high-density structure (Slice ID: Siemens L266-002, WL: 40, WW: 300).

4.3 Objective Evaluation

We conducted an objective evaluation to compare the proposed framework with nine state-of-the-art learning-based methods, namely RED-CNN [11], WGAN [41], CPCE-3D [40], QAE [22], DP-ResNet [14], EDCNN [25], TransCT [31], DU-GAN [75], and CTformer [76]. We retrained all these methods using our training dataset for a fair comparison. Specifically, as the overall volume of the training dataset differs from their original setups, we retrained each model with more iterations. The retraining was terminated when the validation performance saturated, which also applied to GAN-based methods as they still employed pixel-wise paired supervision such as L1 or L2. The performance of two traditional denoising methods, i.e., BM3D [4] and non-local means (NLM) [3], was also tested as references. We employed MSE and SSIM as quantitative evaluation metrics. The results are reported in Table 3.
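As a rough illustration of how a display window (the WL/WW settings listed in Table 3) can enter a pixel-wise metric, the sketch below clips both images to the window before computing the MSE. Whether the reported scores were computed in exactly this way is an assumption, and `apply_window` and `windowed_mse` are hypothetical helper names introduced only for this example.

```python
import numpy as np

def apply_window(hu, level, width):
    """Clip HU values to the display window [level - width/2, level + width/2]."""
    lo, hi = level - width / 2.0, level + width / 2.0
    return np.clip(np.asarray(hu, dtype=np.float64), lo, hi)

def windowed_mse(pred_hu, ref_hu, level=None, width=None):
    """MSE between two HU images; full-range if no window is given (assumed protocol)."""
    pred = np.asarray(pred_hu, dtype=np.float64)
    ref = np.asarray(ref_hu, dtype=np.float64)
    if level is not None and width is not None:
        pred = apply_window(pred, level, width)
        ref = apply_window(ref, level, width)
    return float(np.mean((pred - ref) ** 2))

# Toy 2x2 example: differences outside the window do not contribute.
pred = [[-1000.0, 40.0], [200.0, 500.0]]
ref = [[-1000.0, 40.0], [190.0, 190.0]]
print(windowed_mse(pred, ref, level=40, width=300))  # window is [-110, 190] HU
print(windowed_mse(pred, ref))                       # full-range
```

In this toy case the windowed score is zero because both images saturate at the upper window limit, which shows why the choice of window matters when comparing scores across datasets.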

Our method reduces the MSE score by up to 70%, outperforming the others by a significant margin in all studies. Furthermore, our results on the two scanners are highly consistent, whereas some methods, such as DP-ResNet, EDCNN, TransCT, and CTformer, are less robust. To better visualize the image quality, we present two sample slices with their corresponding zoomed-in patches. In Fig. 9, the proposed method successfully recovers the continuous structure, whereas most methods fail to remove the heavy streak artifacts. In Fig. 10, the branches of the low-contrast structure have better visibility and sharpness in our results. Although the results from DP-ResNet are also promising, a certain degree of blurriness is observed.

Figure 10: Visualization of a selected region with severe streak artifacts. The blue arrow indicates a low-contrast fine structure (Slice ID: GE L235-097, WL: 40, WW: 300).

4.4 Subjective Test

Although the quantitative evaluation shows a significant improvement of the proposed framework over the other methods, the evaluation metrics (i.e., MSE and SSIM) might not reflect the real-world application scenario. In other words, a professional radiologist may pay special attention to certain aspects rather than to general image quality metrics when reviewing CT images. As the main application is to assist clinical diagnosis, two radiologists with 12 and 9 years of clinical experience were invited to perform a series of subjective evaluations.

Table 4: Results of Lesion Detection

| Method | Recall | Precision | Accuracy | F1 |
| --- | --- | --- | --- | --- |
| RED-CNN | 0.5128 | 0.5714 | 0.3929 | 0.5405 |
| WGAN | 0.4250 | 0.8947 | 0.4444 | 0.5763 |
| CPCE-3D | 0.4390 | 0.8182 | 0.4375 | 0.5714 |
| QAE | 0.4595 | 0.6296 | 0.4000 | 0.5312 |
| DP-ResNet | 0.5000 | 0.7407 | 0.4375 | 0.5970 |
| EDCNN | 0.5122 | 0.7500 | 0.4375 | 0.6087 |
| TransCT | 0.4872 | 0.7037 | 0.4510 | 0.5758 |
| DU-GAN | 0.3500 | 0.6667 | 0.3400 | 0.4590 |
| CTformer | 0.5000 | 0.8400 | 0.5000 | 0.6269 |
| Ours | 0.6098 | 0.9615 | 0.6304 | 0.7463 |

The first subjective evaluation was performed using the 2016 Low-dose CT AAPM Grand Challenge test dataset, which consists of 16 patient scans with lesions and 4 healthy references. The radiologists were asked to mark all the lesions without prior knowledge of the patient's diagnosis. By comparing the radiologists' annotations with the ground truth provided by the challenge committee, we calculated precision, recall, accuracy, and F1 score for each method. The results are listed in Table 4. All the metrics of the proposed framework show significant improvements compared to the others. On the one hand, our result achieves the highest precision (above 0.9), suggesting that it does not produce artifacts that could affect diagnostic acceptability. On the other hand, the higher recall indicates better diagnostic sensitivity than the others. The overall measurements in precision and the F1 score further confirm the value of the proposed framework in clinical exams.
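For reference, the four detection metrics follow from the counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The sketch below uses the standard definitions; the raw counts are not reported in this paper, so the counts in the example are hypothetical and were chosen only because they round to the values in the "Ours" row of Table 4. The function name `detection_metrics` is likewise an illustrative assumption.

```python
def detection_metrics(tp, fp, fn, tn):
    """Standard detection metrics from confusion counts (denominators assumed nonzero)."""
    recall = tp / (tp + fn)                      # fraction of true lesions found
    precision = tp / (tp + fp)                   # fraction of marks that are true lesions
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # fraction of all decisions that are correct
    f1 = 2 * tp / (2 * tp + fp + fn)             # harmonic mean of precision and recall
    return {"recall": recall, "precision": precision, "accuracy": accuracy, "f1": f1}

# Hypothetical counts, NOT taken from the study's annotations.
metrics = detection_metrics(tp=25, fp=1, fn=16, tn=4)
print({name: round(value, 4) for name, value in metrics.items()})
```

Because F1 combines precision and recall, a method with very high precision but low recall (few false marks, many missed lesions) still receives a moderate F1 score.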

Table 5: Results of Subjective Quality Evaluation

| Criterion | RED-CNN | WGAN | CPCE-3D | QAE | DP-ResNet | EDCNN | TransCT | DU-GAN | CTformer | Ours |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Noise suppression | 4.38 | 3.08 | 3.19 | 3.65 | 4.38 | 4.15 | 4.35 | 3.31 | 3.69 | 4.62 |
| Contrast retention | 3.88 | 3.77 | 3.96 | 3.73 | 3.42 | 4.15 | 3.62 | 4.12 | 3.85 | 4.46 |
| Margin sharpness | 2.65 | 2.85 | 2.81 | 2.58 | 2.46 | 2.73 | 2.46 | 2.69 | 2.54 | 3.38 |
| Diagnostic acceptability | 3.58 | 3.31 | 3.58 | 3.31 | 3.73 | 3.81 | 3.50 | 3.46 | 3.35 | 4.19 |

The second subjective evaluation assessed the perceptual quality of each method using the Siemens test dataset. During the test, three images, namely a normal-dose (full-dose) CT, a low-dose CT, and a refined low-dose CT, were presented to the reviewer. For each study, the reviewers were requested to assign scores for noise suppression, contrast retention, margin sharpness, and diagnostic acceptability (all studies in the Siemens dataset contain at least one lesion). A five-point scale was employed, where the lowest score (1) corresponded to the low-dose CT and the highest score (5) to the full-dose CT. All processed studies from each method were de-identified and shuffled to avoid biased judgments. Since there were no absolutely clean references and the full-dose counterparts still contained a certain amount of noise, the radiologists were asked to assign scores no higher than 5, even when a sample appeared to be of higher quality than the full-dose reference. The results are summarized in Table 5. Again, the proposed method receives the highest scores by a significant margin in all aspects, which is well aligned with the results of the previous evaluations.

Figure 11: Illustration of the two samples, where the box regions in red were identified as (a) metastasis pancreas and (b) aorta. Their zoom-in comparisons are presented in Figs. 12-13. (WL: 10, WW: 400, source: the AAPM Grand Challenge committee)
Figure 12: Zoom-in visualization of the red box region in Fig. 11(a). The low-contrast lesion is marked by the yellow arrow (Slice ID: L593-050, WL: 10, WW: 400, source of diagnosis: the AAPM Grand Challenge committee).
Figure 13: Zoom-in visualization of the red box region in Fig. 11(b). The yellow arrow marks the uniformly distributed aorta (Slice ID: L548-050, WL: 10, WW: 300).

Interestingly, RED-CNN, DP-ResNet, and TransCT receive high noise suppression scores but low margin sharpness scores due to aggressive denoising (i.e., over-smoothing). Conversely, introducing detail preservation constraints (e.g., perceptual loss in CPCE-3D and adversarial loss in WGAN and DU-GAN) leads to compromised denoising performance and reduced diagnostic sensitivity. In comparison, the proposed method strikes a good balance between these aspects, yielding results that the radiologists prefer.

Last, we show two representative examples from these evaluations. Fig. 11 presents an overview of the two full-dose samples, and their zoomed-in crops are compared in Figs. 12-13. Fig. 12 depicts a lesion that is barely noticeable by experts due to its low contrast. Like the full-dose reference, our method transfers details faithfully and maintains the contrast, but with an even lower noise level. In comparison, some methods either fail to suppress noise (e.g., WGAN, DU-GAN) or introduce artifacts (e.g., CTformer), making them less reliable in clinical exams. Fig. 13 visualizes the cross-section of the aorta, where the higher uniformity in our result indicates better denoising quality.

Figure 14: NPS and TTF results of learning-based methods on ACR CT phantom.
Figure 15: NPS and TTF results of our method on various simulated dose levels on ACR CT phantom.

4.5 Phantom Examination

To further analyze the standardized CT properties, we conducted an examination using the ACR 464 CT phantom. In this experiment, two real scans of the same phantom, representing the full dose (13.5 mGy) and the quarter dose (3.4 mGy), were obtained. Because the same scanner model was also used previously (i.e., Siemens SOMATOM Definition Flash), we directly applied all the learning-based models to these scans without retraining. The results in terms of MSE and SSIM are reported in Table 3, which shows that the proposed method, although trained using synthetic data, still works well on real clinical scans. After that, the slices of the phantom's first and third layers were used to analyze the task-based transfer function (TTF) and NPS. The methods and default settings from [77] were employed to obtain the results.

First, we show the NPS and TTF plots of each method in Fig. 14. Every candidate achieves a certain degree of noise reduction on the low-dose phantom scan. However, many do not show satisfactory noise suppression at low frequencies. Their lower peak frequency, higher noise magnitude, and degraded transfer function indicate increased blurriness and decreased spatial resolution in the resulting images. In contrast, the proposed method produces a much smoother NPS with a lower noise level than the full-dose version. Although its noise power in the low-frequency band is slightly higher than that of the full-dose reference, its improved TTF implies that details are reproduced more faithfully, with only marginal sharpness degradation.
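To make the NPS quantities discussed here concrete, the following is a minimal sketch of a textbook ensemble NPS estimate from uniform-region ROIs, followed by a radial average. It is not the implementation of the software in [77]; the function names, the per-ROI mean subtraction used as detrending, and the binning scheme are all assumptions made for illustration.

```python
import numpy as np

def nps_2d(rois, pixel_mm):
    """Ensemble-averaged 2D NPS from a stack of ROIs with shape (n_roi, ny, nx)."""
    rois = np.asarray(rois, dtype=np.float64)
    _, ny, nx = rois.shape
    # Subtract each ROI's mean so that only noise fluctuations remain.
    detrended = rois - rois.mean(axis=(1, 2), keepdims=True)
    power = np.abs(np.fft.fft2(detrended, axes=(1, 2))) ** 2
    # Normalization makes the integral of the NPS equal to the noise variance.
    return power.mean(axis=0) * pixel_mm * pixel_mm / (nx * ny)

def radial_profile(nps, pixel_mm, n_bins=32):
    """Average the 2D NPS over rings of constant radial frequency (up to Nyquist)."""
    ny, nx = nps.shape
    fx = np.fft.fftfreq(nx, d=pixel_mm)
    fy = np.fft.fftfreq(ny, d=pixel_mm)
    radius = np.hypot(*np.meshgrid(fx, fy))
    edges = np.linspace(0.0, 0.5 / pixel_mm, n_bins + 1)
    index = np.digitize(radius.ravel(), edges) - 1
    valid = (index >= 0) & (index < n_bins)
    sums = np.bincount(index[valid], weights=nps.ravel()[valid], minlength=n_bins)
    counts = np.bincount(index[valid], minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, sums / np.maximum(counts, 1)

# Synthetic white-noise ROIs stand in for phantom measurements.
rng = np.random.default_rng(0)
rois = rng.normal(0.0, 10.0, size=(32, 64, 64))
nps = nps_2d(rois, pixel_mm=0.5)
freqs, profile = radial_profile(nps, pixel_mm=0.5, n_bins=16)
peak_frequency = freqs[np.argmax(profile)]
```

For white noise the radial profile is roughly flat. An over-smoothed image would instead concentrate its noise power at low frequencies, which shifts `peak_frequency` toward zero, the behavior described in the text as a lower peak frequency.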

Second, we analyzed the robustness under various dose levels. Following the noise insertion method described in [68], we generated a set of simulated low-dose projections ranging from 17% to 80% of the full dose level. Fig. 15 presents the dose-dependent NPS and TTF plots. The resultant NPS plots of the proposed method show consistent denoising performance across a wide range of dose levels, and the TTF plots of the two inserts further support its high robustness.
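The idea behind projection-domain noise insertion can be sketched with a simplified Poisson-only model. This is not the exact procedure of [68]: the sketch ignores electronic noise and the bowtie filter, and the incident photon count `n0_full` and the function name `insert_noise` are assumptions. Under this model, a line integral $p$ measured with $N_0$ incident photons has variance of about $e^{p}/N_0$, so lowering the dose to a fraction $a$ requires adding zero-mean Gaussian noise with variance $\frac{1-a}{a}\,\frac{e^{p}}{N_0}$.

```python
import numpy as np

def insert_noise(proj_full, dose_fraction, n0_full, rng=None):
    """Simulate a lower-dose projection from a full-dose one (Poisson-only approximation)."""
    if not 0.0 < dose_fraction <= 1.0:
        raise ValueError("dose_fraction must be in (0, 1]")
    rng = np.random.default_rng() if rng is None else rng
    proj_full = np.asarray(proj_full, dtype=np.float64)
    # Extra variance needed to go from dose N0 to dose a * N0.
    extra_var = (1.0 - dose_fraction) / dose_fraction * np.exp(proj_full) / n0_full
    return proj_full + rng.normal(size=proj_full.shape) * np.sqrt(extra_var)

# Example: a constant line integral of 2.0 at a quarter of the full dose.
rng = np.random.default_rng(0)
full = np.full(200_000, 2.0)
quarter = insert_noise(full, dose_fraction=0.25, n0_full=1e4, rng=rng)
```

Sweeping `dose_fraction` over values such as 0.17 to 0.80 would yield a family of simulated projections comparable in spirit to the dose levels examined here.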

Table 6: CT Number Accuracy under Various Dose Levels

| Dose Level | Air (-1000 HU) | Bone (955 HU) | Acrylic (120 HU) | Polyethylene (-95 HU) |
| --- | --- | --- | --- | --- |
| 17% | -968.31 | 847.61 | 116.65 | -84.84 |
| 20% | -968.28 | 847.80 | 116.67 | -84.85 |
| 25% | -968.26 | 847.96 | 116.68 | -84.85 |
| 30% | -968.24 | 848.01 | 116.70 | -84.84 |
| 38% | -968.25 | 847.97 | 116.71 | -84.83 |
| 48% | -968.25 | 847.90 | 116.69 | -84.80 |
| 62% | -968.25 | 847.88 | 116.67 | -84.77 |
| 80% | -968.24 | 847.91 | 116.65 | -84.73 |
| 100% (a) | -968.38 | 847.34 | 116.60 | -84.69 |

(a) Reference full-dose images without denoising.

Finally, we measured the CT number accuracy of the processed images against the full-dose references. The results are listed in Table 6. These results confirm that no particular bias is introduced into the reconstructed images, regardless of the dose level.
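The size of the deviations in Table 6 can be checked with a few lines of arithmetic. The sketch below transcribes the table values and reports, for each insert, the largest absolute difference between any processed dose level and the full-dose reference row; the variable and function names are illustrative.

```python
# CT numbers (HU) transcribed from Table 6, for dose levels 17% to 80%.
measured = {
    "Air": [-968.31, -968.28, -968.26, -968.24, -968.25, -968.25, -968.25, -968.24],
    "Bone": [847.61, 847.80, 847.96, 848.01, 847.97, 847.90, 847.88, 847.91],
    "Acrylic": [116.65, 116.67, 116.68, 116.70, 116.71, 116.69, 116.67, 116.65],
    "Polyethylene": [-84.84, -84.85, -84.85, -84.84, -84.83, -84.80, -84.77, -84.73],
}
# Full-dose reference row (100%, without denoising).
reference = {"Air": -968.38, "Bone": 847.34, "Acrylic": 116.60, "Polyethylene": -84.69}

def max_abs_deviation(values, ref):
    """Largest absolute difference between processed values and the reference."""
    return max(abs(value - ref) for value in values)

deviations = {name: max_abs_deviation(vals, reference[name]) for name, vals in measured.items()}
for name, dev in deviations.items():
    print(f"{name}: {dev:.2f} HU")
```

With these values, the largest deviation from the full-dose reference stays below 1 HU for every insert (0.67 HU for bone), which is consistent with the statement that denoising does not shift the CT numbers.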

5 Conclusions

The main goal of this paper is to improve the quality of LDCT images for multi-slice spiral CT. We comprehensively discussed the proposed two-stage processing pipeline across both the projection and image domains, and analyzed the impact of rebinning and filtered back-projection on the final reconstruction. To fully utilize the intra-slice and inter-slice similarity inherent in the acquired projection volume, we transformed the task into a multi-frame-based denoising and refinement problem, in which conventional rebinning and reconstruction methods link the two networks. We performed several studies to verify the effectiveness of our method and showed its superiority over state-of-the-art methods through qualitative and quantitative performance evaluations. Future work might include better detail preservation via higher-level loss functions rather than a simple L1 penalty.

Acknowledgment

The authors would like to thank the anonymous reviewers for their valuable feedback and constructive suggestions, which have greatly improved the quality and clarity of this manuscript.

References

  • [1] A. F. H. Ortiz, L. J. F. Beaujon, S. Y. G. Villamizar, and F. F. F. López, “Magnetic resonance versus computed tomography for the detection of retroperitoneal lymph node metastasis due to testicular cancer: A systematic literature review,” European Journal of Radiology Open, vol. 8, p. 100372, 2021.
  • [2] D. J. Brenner and E. J. Hall, “Computed tomography—an increasing source of radiation exposure,” New England Journal of Medicine, vol. 357, no. 22, pp. 2277–2284, 2007.
  • [3] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, vol. 2, 2005, pp. 60–65.
  • [4] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-d transform-domain collaborative filtering,” IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, 2007.
  • [5] Y. Lu and S.-W. Jung, “Progressive joint low-light enhancement and noise removal for raw images,” IEEE Trans. Image Process., vol. 31, pp. 2390–2404, 2022.
  • [6] A. Abdelhamed, M. Afifi, R. Timofte, and M. S. Brown, “Ntire 2020 challenge on real image denoising: Dataset, methods and results,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 496–497.
  • [7] S. Nah, S. Son, S. Lee, R. Timofte, and K. M. Lee, “Ntire 2021 challenge on image deblurring,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 149–165.
  • [8] J. Cai, S. Gu, R. Timofte, and L. Zhang, “Ntire 2019 challenge on real image super-resolution: Methods and results,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019, pp. 2211–2223.
  • [9] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. Image Process., vol. 26, no. 9, pp. 4509–4522, 2017.
  • [10] E. Kang, J. Min, and J. C. Ye, “A deep convolutional neural network using directional wavelets for low-dose x-ray ct reconstruction,” Medical Physics, vol. 44, no. 10, pp. e360–e375, 2017.
  • [11] H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang, “Low-dose ct with a residual encoder-decoder convolutional neural network,” IEEE Trans. Med. Imag., vol. 36, no. 12, pp. 2524–2535, 2017.
  • [12] D. Wu, K. Kim, G. El Fakhri, and Q. Li, “Iterative low-dose ct reconstruction with priors trained by artificial neural network,” IEEE Trans. Med. Imag., vol. 36, no. 12, pp. 2479–2486, 2017.
  • [13] T. Würfl, F. C. Ghesu, V. Christlein, and A. Maier, “Deep learning computed tomography,” in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 2016, pp. 432–440.
  • [14] X. Yin, Q. Zhao, J. Liu, W. Yang, J. Yang, G. Quan, Y. Chen, H. Shu, L. Luo, and J.-L. Coatrieux, “Domain progressive 3D residual convolution network to improve low-dose CT imaging,” IEEE Trans. Med. Imag., vol. 38, no. 12, pp. 2903–2913, 2019.
  • [15] Y. Li, K. Li, C. Zhang, J. Montoya, and G.-H. Chen, “Learning to reconstruct computed tomography images directly from sinogram data under a variety of data acquisition conditions,” IEEE Trans. Med. Imag., vol. 38, no. 10, pp. 2469–2481, 2019.
  • [16] J. He, Y. Wang, and J. Ma, “Radon inversion via deep learning,” IEEE Trans. Med. Imag., vol. 39, no. 6, pp. 2076–2087, 2020.
  • [17] Y. Zhang, D. Hu, Q. Zhao, G. Quan, J. Liu, Q. Liu, Y. Zhang, G. Coatrieux, Y. Chen, and H. Yu, “CLEAR: comprehensive learning enabled adversarial reconstruction for subtle structure enhanced low-dose CT imaging,” IEEE Trans. Med. Imag., vol. 40, no. 11, pp. 3089–3101, 2021.
  • [18] Y. Zhang, Y. Wang, W. Zhang, F. Lin, Y. Pu, and J. Zhou, “Statistical iterative reconstruction using adaptive fractional order regularization,” Biomedical Optics Express, vol. 7, no. 3, pp. 1015–1029, 2016.
  • [19] Y. Chen, D. Gao, C. Nie, L. Luo, W. Chen, X. Yin, and Y. Lin, “Bayesian statistical reconstruction for low-dose x-ray computed tomography using an adaptive-weighting nonlocal prior,” Computerized Medical Imaging and Graphics, vol. 33, no. 7, pp. 495–500, 2009.
  • [20] Q. Xu, H. Yu, X. Mou, L. Zhang, J. Hsieh, and G. Wang, “Low-dose x-ray ct reconstruction via dictionary learning,” IEEE Trans. Med. Imag., vol. 31, no. 9, pp. 1682–1697, 2012.
  • [21] P. F. Feruglio, C. Vinegoni, J. Gros, A. Sbarbati, and R. Weissleder, “Block matching 3d random noise filtering for absorption optical projection tomography,” Physics in Medicine & Biology, vol. 55, no. 18, p. 5401, 2010.
  • [22] F. Fan, H. Shan, M. K. Kalra, R. Singh, G. Qian, M. Getzin, Y. Teng, J. Hahn, and G. Wang, “Quadratic autoencoder (Q-AE) for low-dose CT denoising,” IEEE Trans. Med. Imag., vol. 39, no. 6, pp. 2035–2050, 2019.
  • [23] L. A. Zavala-Mondragon, P. Rongen, J. O. Bescos, P. H. De With, and F. Van der Sommen, “Noise reduction in CT using learned wavelet-frame shrinkage networks,” IEEE Trans. Med. Imag., vol. 41, no. 8, pp. 2048–2066, 2022.
  • [24] L. A. Zavala-Mondragon, P. H. de With, and F. van der Sommen, “Image noise reduction based on a fixed wavelet frame and CNNs applied to CT,” IEEE Trans. Image Process., vol. 30, pp. 9386–9401, 2021.
  • [25] T. Liang, Y. Jin, Y. Li, and T. Wang, “EDCNN: Edge enhancement-based densely connected network with compound loss for low-dose CT denoising,” in Proceedings of the IEEE International Conference on Signal Processing, vol. 1, 2020, pp. 193–198.
  • [26] M. Matsuura, J. Zhou, N. Akino, and Z. Yu, “Feature-aware deep-learning reconstruction for context-sensitive x-ray computed tomography,” IEEE Trans. Radiat. Plasma Med. Sci., vol. 5, no. 1, pp. 99–107, 2020.
  • [27] X. Tao, H. Zhang, Y. Wang, G. Yan, D. Zeng, W. Chen, and J. Ma, “VVBP-tensor in the FBP algorithm: its properties and application in low-dose CT reconstruction,” IEEE Trans. Med. Imag., vol. 39, no. 3, pp. 764–776, 2019.
  • [28] X. Tao, Y. Wang, L. Lin, Z. Hong, and J. Ma, “Learning to reconstruct CT images from the VVBP-tensor,” IEEE Trans. Med. Imag., vol. 40, no. 11, pp. 3030–3041, 2021.
  • [29] L. Xu, Y. Zhang, Y. Liu, D. Wang, M. Zhou, J. Ren, J. Wei, and Z. Ye, “Low-dose CT denoising using a structure-preserving kernel prediction network,” in Proceedings of the IEEE International Conference on Image Processing, 2021, pp. 1639–1643.
  • [30] B. Kim, H. Shim, and J. Baek, “Weakly-supervised progressive denoising with unpaired CT images,” Medical Image Analysis, vol. 71, p. 102065, 2021.
  • [31] Z. Zhang, L. Yu, X. Liang, W. Zhao, and L. Xing, “TransCT: dual-path transformer for low dose computed tomography,” in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 2021, pp. 55–64.
  • [32] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Proceedings of the Advances in Neural Information Processing Systems, vol. 30, pp. 6000–6010, 2017.
  • [33] D. Wang, Z. Wu, and H. Yu, “TED-net: Convolution-free T2T vision transformer-based encoder-decoder dilation network for low-dose ct denoising,” in Proceedings of the International Workshop on Machine Learning in Medical Imaging, 2021, pp. 416–425.
  • [34] N. Huber, T. Anderson, A. Missert, M. Adkins, S. Leng, J. Fletcher, C. McCollough, L. Yu, and K. N. Glazebrook, “Clinical evaluation of a phantom-based deep convolutional neural network for whole-body-low-dose and ultra-low-dose ct skeletal surveys,” Skeletal Radiology, vol. 51, no. 1, pp. 145–151, 2022.
  • [35] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila, “Noise2noise: Learning image restoration without clean data,” arXiv preprint arXiv:1803.04189, 2018.
  • [36] A. Krull, T.-O. Buchholz, and F. Jug, “Noise2void-learning denoising from single noisy images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 2129–2137.
  • [37] Z. Zhang, X. Liang, W. Zhao, and L. Xing, “Noise2context: Context-assisted learning 3d thin-layer for low-dose ct,” Medical Physics, vol. 48, no. 10, pp. 5794–5803, 2021.
  • [38] C. Niu, M. Li, F. Fan, W. Wu, X. Guo, Q. Lyu, and G. Wang, “Noise suppression with similarity-based self-supervised deep learning,” IEEE Trans. Med. Imag., vol. 42, no. 6, pp. 1590–1602, 2022.
  • [39] X. Yi, E. Walia, and P. Babyn, “Generative adversarial network in medical imaging: A review,” Medical Image Analysis, vol. 58, p. 101552, 2019.
  • [40] H. Shan, Y. Zhang, Q. Yang, U. Kruger, M. K. Kalra, L. Sun, W. Cong, and G. Wang, “3-d convolutional encoder-decoder network for low-dose ct via transfer learning from a 2-d trained network,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1522–1534, 2018.
  • [41] Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, Y. Zhang, L. Sun, and G. Wang, “Low-dose ct image denoising using a generative adversarial network with wasserstein distance and perceptual loss,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1348–1357, 2018.
  • [42] M. Li, W. Hsu, X. Xie, J. Cong, and W. Gao, “Sacnn: Self-attention convolutional neural network for low-dose ct denoising with self-supervised perceptual loss network,” IEEE Trans. Med. Imag., vol. 39, no. 7, pp. 2289–2301, 2020.
  • [43] M. Ghahremani, M. Khateri, A. Sierra, and J. Tohka, “Adversarial distortion learning for medical image denoising,” arXiv preprint arXiv:2204.14100, 2022.
  • [44] X. Zhang, Z. Han, H. Shangguan, X. Han, X. Cui, and A. Wang, “Artifact and detail attention generative adversarial networks for low-dose ct denoising,” IEEE Trans. Med. Imag., vol. 40, no. 12, pp. 3901–3918, 2021.
  • [45] J. Gu and J. C. Ye, “AdaIN-based tunable CycleGAN for efficient unsupervised low-dose CT denoising,” IEEE Trans. Comput. Imaging, vol. 7, pp. 73–85, 2021.
  • [46] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2017, pp. 2223–2232.
  • [47] K. Lee and W.-K. Jeong, “ISCL: Interdependent self-cooperative learning for unpaired image denoising,” IEEE Trans. Med. Imag., vol. 40, no. 11, pp. 3238–3248, 2021.
  • [48] H. Gupta, K. H. Jin, H. Q. Nguyen, M. T. McCann, and M. Unser, “CNN-based projected gradient descent for consistent CT image reconstruction,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1440–1453, 2018.
  • [49] E. Kang, W. Chang, J. Yoo, and J. C. Ye, “Deep convolutional framelet denosing for low-dose CT via wavelet residual network,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1358–1369, 2018.
  • [50] H. Chen, Y. Zhang, Y. Chen, J. Zhang, W. Zhang, H. Sun, Y. Lv, P. Liao, J. Zhou, and G. Wang, “LEARN: Learned experts’ assessment-based reconstruction network for sparse-data ct,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1333–1347, 2018.
  • [51] S. Roth and M. J. Black, “Fields of experts: A framework for learning image priors,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, vol. 2, 2005, pp. 860–867.
  • [52] H. K. Aggarwal, M. P. Mani, and M. Jacob, “MoDL: Model-based deep learning architecture for inverse problems,” IEEE Trans. Med. Imag., vol. 38, no. 2, pp. 394–405, 2018.
  • [53] W. Xia, Z. Lu, Y. Huang, Z. Shi, Y. Liu, H. Chen, Y. Chen, J. Zhou, and Y. Zhang, “MAGIC: Manifold and graph integrative convolutional network for low-dose ct reconstruction,” IEEE Trans. Med. Imag., vol. 40, no. 12, pp. 3459–3472, 2021.
  • [54] J. He, Y. Yang, Y. Wang, D. Zeng, Z. Bian, H. Zhang, J. Sun, Z. Xu, and J. Ma, “Optimizing a parameterized plug-and-play ADMM for iterative low-dose CT reconstruction,” IEEE Trans. Med. Imag., vol. 38, no. 2, pp. 371–382, 2018.
  • [55] I. Y. Chun, X. Zheng, Y. Long, and J. A. Fessler, “BCD-Net for low-dose CT reconstruction: Acceleration, convergence, and generalization,” in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 2019, pp. 31–40.
  • [56] Y. Chun and J. A. Fessler, “Deep BCD-net using identical encoding-decoding CNN structures for iterative image recovery,” in Proceedings of the IEEE Image, Video, and Multidimensional Signal Processing Workshop, 2018, pp. 1–5.
  • [57] S. Ye, Z. Li, M. T. McCann, Y. Long, and S. Ravishankar, “Unified supervised-unsupervised (SUPER) learning for X-ray CT image reconstruction,” IEEE Trans. Med. Imag., vol. 40, no. 11, pp. 2986–3001, 2021.
  • [58] G. Zang, R. Idoughi, R. Li, P. Wonka, and W. Heidrich, “IntraTomo: Self-supervised learning-based tomography via sinogram synthesis and prediction,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 1960–1970.
  • [59] M. Tancik, P. Srinivasan, B. Mildenhall, S. Fridovich-Keil, N. Raghavan, U. Singhal, R. Ramamoorthi, J. Barron, and R. Ng, “Fourier features let networks learn high frequency functions in low dimensional domains,” Proceedings of the Advances in Neural Information Processing Systems, vol. 33, pp. 7537–7547, 2020.
  • [60] K. Stierstorfer, A. Rauscher, J. Boese, H. Bruder, S. Schaller, and T. Flohr, “Weighted FBP—a simple approximate 3D FBP algorithm for multislice spiral CT with good dose usage for arbitrary pitch,” Physics in Medicine & Biology, vol. 49, no. 11, p. 2209, 2004.
  • [61] B. R. Whiting, P. Massoumzadeh, O. A. Earl, J. A. O’Sullivan, D. L. Snyder, and J. F. Williamson, “Properties of preprocessed sinogram data in x-ray computed tomography,” Medical Physics, vol. 33, no. 9, pp. 3290–3303, 2006.
  • [62] R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan, “Light field photography with a hand-held plenoptic camera,” Ph.D. dissertation, Stanford University, 2005.
  • [63] J. Hsieh, Computed tomography: principles, design, artifacts, and recent advances. SPIE Press, 2003.
  • [64] M. Tassano, J. Delon, and T. Veit, “FastDVDnet: Towards real-time deep video denoising without flow estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 1354–1363.
  • [65] H. Wu, Y. Qu, S. Lin, J. Zhou, R. Qiao, Z. Zhang, Y. Xie, and L. Ma, “Contrastive learning for compact single image dehazing,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 10 551–10 560.
  • [66] T. Brooks, B. Mildenhall, T. Xue, J. Chen, D. Sharlet, and J. T. Barron, “Unprocessing images for learned raw denoising,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 11 036–11 045.
  • [67] L. Bao, Z. Yang, S. Wang, D. Bai, and J. Lee, “Real image denoising based on multi-scale residual dense block and cascaded u-net with block-connection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 448–449.
  • [68] R. Zeng, C. Y. Lin, Q. Li, L. Jiang, M. Skopec, J. A. Fessler, and K. J. Myers, “Performance of a deep learning-based ct image denoising method: Generalizability over dose, reconstruction kernel, and slice thickness,” Medical Physics, vol. 49, no. 2, pp. 836–853, 2022.
  • [69] J. Xu, L. Zhang, and D. Zhang, “External prior guided internal prior learning for real-world noisy image denoising,” IEEE Trans. Image Process., vol. 27, no. 6, pp. 2996–3010, 2018.
  • [70] T. R. Moen, B. Chen, D. R. Holmes III, X. Duan, Z. Yu, L. Yu, S. Leng, J. G. Fletcher, and C. H. McCollough, “Low-dose CT image and projection dataset,” Medical Physics, vol. 48, no. 2, pp. 902–911, 2021.
  • [71] American Association of Physicists in Medicine. (2016) 2016 low dose ct grand challenge. [Online]. Available: https://www.aapm.org/grandchallenge/lowdosect/
  • [72] T. Flohr, K. Stierstorfer, S. Ulzheimer, H. Bruder, A. Primak, and C. McCollough, “Image reconstruction and image quality evaluation for a 64-slice CT scanner with z-flying focal spot,” Medical Physics, vol. 32, no. 8, pp. 2536–2547, 2005.
  • [73] J. M. Hoffman, F. Noo, S. Young, S. S. Hsieh, and M. McNitt-Gray, “FreeCT_ICD: An open-source implementation of a model-based iterative reconstruction method using coordinate descent optimization for CT imaging investigations,” Medical Physics, vol. 45, no. 8, pp. 3591–3603, 2018.
  • [74] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
  • [75] Z. Huang, J. Zhang, Y. Zhang, and H. Shan, “DU-GAN: Generative adversarial networks with dual-domain u-net-based discriminators for low-dose ct denoising,” IEEE Trans. Instrum. Meas., vol. 71, pp. 1–12, 2021.
  • [76] D. Wang, F. Fan, Z. Wu, R. Liu, F. Wang, and H. Yu, “Ctformer: Convolution-free token2token dilated vision transformer for low-dose ct denoising,” arXiv preprint arXiv:2202.13517, 2022.
  • [77] J. Greffier, Y. Barbotteau, and F. Gardavaud, “iQMetrix-CT: New software for task-based image quality assessment of phantom ct images,” Diagnostic and Interventional Imaging, vol. 103, no. 11, pp. 555–562, 2022.