Cross-domain Denoising for Low-dose Multi-frame Spiral Computed Tomography

Yucheng Lu, Zhixin Xu, Moon Hyung Choi, Jimin Kim, and Seung-Won Jung, Senior Member, IEEE

This work was supported by the Korea Medical Device Development Fund grant funded by the Korea government (the Ministry of Science and ICT, the Ministry of Trade, Industry, and Energy, the Ministry of Health & Welfare, the Ministry of Food and Drug Safety) (Project Number: 1711139124, KMDF\_\_20200901\_0096). (Corresponding author: Seung-Won Jung.)

Y. Lu is with the Education and Research Center for Socialware IT, Korea University, Seoul, Korea; and the Department of Datalogi, IT University of Copenhagen, Copenhagen, Denmark (e-mail: [email protected]).

Z. Xu and S.-W. Jung are with the Department of Electrical Engineering, Korea University, Seoul, Korea (e-mail: [email protected]; [email protected]).

M. H. Choi and J. Kim are with the Department of Radiology, Eunpyeong St. Mary’s Hospital, College of Medicine, The Catholic University of Korea (e-mail: [email protected]; [email protected]).

Abstract

Computed tomography (CT) has been used worldwide as a non-invasive test to assist in diagnosis. However, the ionizing nature of X-ray exposure raises concerns about potential health risks such as cancer. The desire for lower radiation doses has driven researchers to improve reconstruction quality. Although previous studies on low-dose computed tomography (LDCT) denoising have demonstrated the effectiveness of learning-based methods, most were developed on simulated data, and the real-world scenario differs significantly from the simulation domain, especially when using the multi-slice spiral scanner geometry. This paper proposes a two-stage method for commercially available multi-slice spiral CT scanners that better exploits the complete reconstruction pipeline for LDCT denoising across different domains. Our approach makes good use of the high redundancy of multi-slice projections and the volumetric reconstructions while alleviating the over-smoothing problem in conventional cascaded frameworks caused by aggressive denoising. The dedicated design also provides a more explicit interpretation of the data flow. Extensive experiments on various datasets showed that the proposed method could remove up to 70% of noise without compromising spatial resolution, and subjective evaluations by two experienced radiologists further supported its superior performance against state-of-the-art methods in clinical practice. Code is available at https://github.com/YCL92/TMD-LDCT.

Index Terms

Deep learning, low-dose computed tomography, image and video denoising

1 Introduction

Computed tomography (CT) is one of the most popular tools in clinical examinations today owing to its non-invasive and volumetric data acquisition. Unlike conventional X-ray studies that project all volumetric information onto a single planar image, CT restores stacks of axial structures through reconstruction, providing a rich data representation that helps reveal finer patterns for further evaluation and diagnosis.

Despite its great convenience and performance, there have been significant concerns about the potential health hazard to patients. Cell and organ damage may occur due to excessive exposure if proper measures against ionizing radiation are not considered. Even though the dosage of CT is relatively low, exposure over a protracted time can still increase the risk of developing cancer [1]. Therefore, minimizing radiation exposure has become an urgent matter for both the scientific community and the public [2].

Since it is currently impractical to exclude CT from general health examinations, engineers have been working actively on reducing the radiation dose delivered to subjects through various techniques, such as enlarging the source field of view, increasing the sensor resolution both horizontally and vertically, improving the detector sensitivity, and increasing the table speed for faster scanning. These efforts have led to considerable performance improvement over the past decades.

In addition to hardware-based solutions, researchers have also paid attention to dosage reduction through software-assisted technology. Two representative methods are sparse-view CT, which uses a reduced number of projections per gantry rotation, and low-dose CT (LDCT), which uses a reduced intensity of the X-ray source. Whereas the former tries to compensate for the artifacts introduced by the missing views, the latter gives rise to an image denoising problem, receiving more attention as similar tasks in other areas (e.g., low-light image enhancement) have been extensively studied [3, 4, 5].

With the success of convolutional neural networks (CNNs) in low-level computer vision tasks such as denoising [6], deblurring [7], and super-resolution [8], they have been rapidly adopted in medical imaging applications. Early studies have shown the promising potential of CNNs on LDCT denoising compared to conventional handcrafted regularizers [9, 10, 11, 12]. However, the protocol of LDCT denoising significantly differs from that of conventional image denoising, requiring further optimization of the obtained data representation (i.e., projections) before the final reconstruction. Unfortunately, many existing works directly borrow ideas from conventional image denoising and apply them to the reconstructed CT slices as a post-processing stage, which can be sub-optimal for LDCT denoising. Although a few works in the literature handle projection data [13, 14, 15, 16, 17], they either operate on simulated 1D parallel-beam projection data, which are not aligned with the modern CT scanner geometry, or try to incorporate the entire reconstruction pipeline into a single black-box model. These limitations make such works less optimal and less transparent.

To address these problems, we propose a two-stage denoising framework dedicated to multi-slice spiral CT scanners in this paper. Specifically, in the first stage, a projection domain denoising network takes as input the successive projection slices and estimates sequential noise components, which are then rebinned and used by the image restoration network in the second stage for further refinement. This two-stage design considers the domain-specific characteristics while avoiding information degradation in common cascaded structures, yielding objectively and subjectively more satisfactory image quality. In summary, the main contributions of this paper are as follows:

• We propose a two-stage framework for LDCT denoising. The proposed method works across both the projection and image domains. It is specifically optimized for CT scanners with multi-slice helical geometry.

• We model each stage’s physical properties of noise and artifacts based on the data acquisition process in the reconstruction pipeline. This design improves the denoising performance and gives end-users richer interpretation ability and transparency.

• We demonstrate through experiments on patient data that our method significantly outperforms existing works both quantitatively and qualitatively. An extensive analysis of phantom scans further supports that the proposed method has achieved state-of-the-art performance.

The remaining sections of this paper are organized as follows: Section 2 reviews some existing works related to our topic and briefly discusses their limitations, Section 3 presents the proposed method in full detail, Section 4 provides the experiment setup and compares our results with those of several representative works, and finally, Section 5 concludes the paper.

2 Related Work

Since the data acquisition and image restoration protocol of CT differs from that of conventional digital cameras, the design of an LDCT denoiser can be highly flexible depending on the appearance of the data to be processed. Compared to early works based on handcrafted image priors, such as total variation [18], non-local means [19], dictionary learning [20], block matching and 3D filtering [21], etc., some pioneering works [9, 10, 11, 12] have shown the superior potential of CNNs in LDCT denoising. Thus, we mainly focus on CNN-based methods and classify the most recent research into three categories: post-reconstruction image denoising, model-based iterative image reconstruction and denoising, and cross-domain joint optimization.

2.1 Post-reconstruction Image Denoising

Post-reconstruction image denoising aims at removing noise directly from the reconstructed CT images, which can be formulated as:

I_{pr}=\mathcal{G}_{I}\left(I_{n}\right),

(1)

where $I_{n}$ and $I_{pr}$ are the low-dose noisy input and noise-suppressed output predicted by the denoiser $\mathcal{G}_{I}$, respectively. A significant merit of this approach is that it works in the 2D image domain as a post-processing step, so there is no need to alter the reconstruction pipeline in clinical practice. A considerable number of works fall within this category: Fan et al. [22] replaced the conventional 2D convolution [11] with the quadratic representation. Zavala et al. [23] revisited the perfect reconstruction conditions of the encoder-decoder-structured CNNs with soft-shrinkage and proposed a learnable shrinkage layer to handle decomposed wavelet frames, which was later extended using the over-complete Haar wavelet transform [24]. Liang et al. [25] employed a perceptual loss from a pre-trained VGGNet to their densely connected denoiser. Matsuura et al. [26] extracted context-aware features from images reconstructed with various parameter presets as data augmentation to enhance the quality of the denoised result. Tao et al. [27] observed the specific patterns lying in the stacked view-by-view back-projection tensors and developed a tensor singular value decomposition-based algorithm for LDCT, which was extended to the field of deep learning-based LDCT denoising [28]. Xu et al. [29] applied dynamic filters predicted from a CNN to the extracted image features such that the non-uniformly distributed noise can be separated. Kim et al. [30] proposed a progressive denoising method to remove noise in an iterative manner while injecting synthetic noise into the projections on the fly. Zhang et al. [31] designed a framework consisting of two parallel networks to handle the low-frequency and high-frequency components, where the popular and powerful Transformer architecture [32] was adopted. Similarly, Wang et al. [33] introduced reshaping, dilated unfolding, and cyclic shifting operations between successive Transformer blocks to share information across different patches, reaching better performance.

Unfortunately, clean observations can never be reached due to the statistical uncertainty of CT. Hence, the paired dataset used for training actually consists of routine-dose images, which serve as approximations of their noise-free counterparts, and low-dose images, which can be simulated via projection-domain noise injection or image-domain superimposition [34]. As the noise statistics in CT images still follow some common properties such as zero-mean and zero discrepancies, a few researchers adopted the idea of Noise2Noise [35] and Noise2Void [36] in training their models without clean images: Zhang et al. [37] proposed using adjacent slices to approximate self-similarity, whereas Niu et al. [38] further extended it to handle uncorrelated noise and structural artifacts with a broader range of patch-searching volumes. Alternatively, adversarial learning is also an option for unsupervised learning, which is usually achieved by generative adversarial networks (GANs) [39]: Shan et al. [40] employed 3D convolution in the design of the encoder-decoder network and supervised the transfer learning from a pre-trained 2D variant using the Wasserstein distance. Yang et al. [41] combined the supervised and unsupervised learning with a hybrid loss term consisting of an adversarial loss and a perceptual loss to allow denoising while maintaining structural details, which was further improved by Li et al. [42] with the help of the self-attention module as well as the self-supervised perceptual loss. Ghahermani et al. [43] introduced an adversarial distortion learning method that considers the element-wise discrimination loss, reconstruction loss, pyramidal texture loss, and histogram loss in the supervision. Zhang et al. [44] employed an artifact and noise attention network and used an edge feature extraction path to compensate for the over-smoothed details. Gu et al. [45] adopted the cycle consistency and proposed a CycleGAN [46]-based model with adaptive instance normalization layers, achieving improved performance over the conventional CycleGAN at the cost of about only half of the parameters. Similarly, Lee et al. [47] introduced a pseudo network along with the CycleGAN framework and added a bypass consistency to prevent the generator from learning to embed blind information of noise into the output.

2.2 Iterative Image Reconstruction and Denoising

Although the post-reconstruction CT image denoising is simple and fast, an obvious limitation is that the high-pass filter (e.g., ramp filter) in the back-projection operation inevitably amplifies the noise component and introduces signal-dependent artifacts, which makes denoising more challenging and thus deteriorates the performance. To cope with this problem, model-based image reconstruction (MBIR) comes to the rescue, given by:

I_{pr}=\mathop{\arg\min}_{I}\left\|\psi I-P_{n}\right\|_{2}^{2}+\lambda\phi\left(I\right),

(2)

where $\psi$ is the forward-projection operation that maps the reconstructed image $I$ back to the projection domain, $P_{n}$ represents the corresponding low-dose noisy observation, $\phi$ is a regularization function, and $\lambda$ is a balancing parameter.

The solution of (2) is typically obtained in an iterative manner that updates the reconstructed image by comparing the forward-projection result with the measurement under some constraints to stabilize the optimization. Many methods have been presented to embed pre-trained CNN denoisers as a part of the update protocol and achieved better performance against post-reconstruction denoising: Gupta et al. [48] utilized a CNN to project the objective function onto the data manifold and proposed a relaxed version of the projected gradient descent method that guarantees the convergence of the optimization. Kang et al. [49] reviewed the denoising task under the low-rank Hankel structured matrix constraint and presented a wavelet residual network that learns to impose the low-rankness. Chen et al. [50] proposed to replace the generalized regularization term referred to as the fields of experts [51] with a three-layer CNN, in which the trainable parameters are independent at each iteration. Similar work was presented by Aggarwal et al. [52] with their proposed conjugate gradient optimization-based data consistency layer, enabling the training of the unrolled model to be performed in an end-to-end manner with minimal memory cost. Inspired by [50], Xia et al. [53] further employed a learned graph convolutional network as an additional constraint to enhance non-local topological features in the low-dimensional patch manifold. He et al. [54] reformulated the problem as a dual-domain optimization task and modified the iteration of the alternating direction method of multipliers (ADMM) by using CNNs to represent the gradient, resulting in a parameterized plug-and-play ADMM optimization scheme. Chun et al. [55] introduced BCD-Net [56] to LDCT reconstruction and applied the accelerated proximal gradient method as a fast numerical solver. Ye et al. [57] took both the supervised regularization and the unsupervised regularization into account and proposed an optimization scheme to alternately update the reconstruction result under specific constraints, where the experiments on several publicly available MBIR-based methods showed improved performance against their vanilla counterparts.
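To make the iterative nature of (2) concrete, the following minimal sketch solves a toy instance by plain gradient descent. It is only an illustration under stated assumptions: the forward projector $\psi$ is a small random matrix rather than a CT system model, and the regularizer $\phi$ is replaced by a simple quadratic penalty $\|I\|_{2}^{2}$ instead of a learned prior. The function and variable names are hypothetical.

```python
import numpy as np

def mbir_gradient_descent(psi, p_n, lam, step, iters):
    """Minimise ||psi @ I - p_n||^2 + lam * ||I||^2 by gradient descent.

    The quadratic regulariser is an assumption standing in for phi in Eq. (2).
    """
    image = np.zeros(psi.shape[1])
    for _ in range(iters):
        # Gradient of the data-fidelity term plus gradient of the regulariser.
        grad = 2.0 * psi.T @ (psi @ image - p_n) + 2.0 * lam * image
        image -= step * grad
    return image

rng = np.random.default_rng(0)
psi = rng.standard_normal((40, 10))        # hypothetical forward-projection matrix
truth = rng.standard_normal(10)            # hypothetical ground-truth image (flattened)
p_n = psi @ truth + 0.01 * rng.standard_normal(40)
lam = 0.1

# Step size 1/L, where L is the Lipschitz constant of the gradient.
step = 1.0 / (2.0 * np.linalg.norm(psi, 2) ** 2 + 2.0 * lam)
estimate = mbir_gradient_descent(psi, p_n, lam, step, iters=500)

# Closed-form solution of the same quadratic problem, used as a reference.
closed_form = np.linalg.solve(psi.T @ psi + lam * np.eye(10), psi.T @ p_n)
```

Each iteration needs one forward projection and one back-projection, which is why MBIR-style methods cost considerably more than a single post-reconstruction pass.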

2.3 Cross-domain Joint Optimization

MBIR methods generally yield higher reconstruction quality. However, they significantly increase the reconstruction time and typically occupy more computational resources. To achieve fast inference speed while retaining access to raw projection data, a promising solution is to apply denoising across different data domains that cover the complete reconstruction pipeline, as follows:

I_{pr}=\mathcal{G}_{I}\left(\varphi\left(\mathcal{G}_{P}\left(P_{n}\right)\right)\right),

(3)

where $\mathcal{G}_{P}$ and $\mathcal{G}_{I}$ are denoisers in the projection domain and image domain, respectively. $\varphi$ is a projection-to-image operation that can be performed by conventional algorithms such as filtered back-projection (FBP) or learned models.

Several dedicated works fall into this category: Würfl et al. [13] utilized a multi-layer perceptron to model the behavior of filtering, back-projection, and non-negative constraint for sinogram-to-image reconstruction. Li et al. [15] designed a model named iCT-Net consisting of a novel back-projection layer for both LDCT denoising and sparse-view reconstruction. He et al. [16] presented a deep learning-based Radon inversion framework, where the filtered sinogram is resampled by a sinusoidal back-projection layer, followed by a typical CNN. Zhang et al. [58] adopted the Fourier feature representation [59] in their proposed sinogram prediction module and designed an iterative optimization scheme via forward and backward projection. Another work [17] combined two 3D residual U-Nets (ResUNets) for the projection domain and image domain denoising and trained them using cross-domain supervision and adversarial learning, demonstrating state-of-the-art performance.

The methods above have contributed to LDCT denoising to some extent; however, their limitations leave room for improvement. For image domain denoising, lacking direct access to the projection data increases the difficulty of distinguishing between subtle structures and signal-dependent artifacts. For iterative image reconstruction and denoising, the simulated scanner, i.e., planar scanning with a parallel trajectory, is not aligned with multi-slice CT scanners rotating in helical mode. Adapting these methods to the actual geometry will likely introduce extra complexity to the optimization or even affect model convergence. As to cross-domain methods, a single image reconstruction for modern scanners typically requires thousands of projections, which makes the training of $\mathcal{G}_{P}$ and $\mathcal{G}_{I}$ heavily unbalanced; in addition, some 3D operations in the reconstruction, such as the Feldkamp-like weighted FBP [60], are difficult to replace with learned models. Consequently, attempts at end-to-end optimization become even more challenging, whereas cascaded models can be easily affected by the over-smoothing problem due to the aggressive denoising of separately trained sub-networks.

Unlike existing works, our proposed framework takes the complete characteristics of the CT image reconstruction pipeline into account and performs joint projection-domain denoising and image-domain refinement while avoiding both the complexity and difficulty of end-to-end fine-tuning and the performance degradation observed in other cascaded designs. To the best of our knowledge, only a few works in the literature are closely related to ours: Yin et al. [14] proposed to use two 3D sub-networks for sinogram and image denoising, respectively, where each sub-network was trained separately to take as input volumetric frames and estimate noise using 3D convolutions. However, this cascaded design still struggles to recover from aggressive denoising. Also, the effect of rebinning has not been considered. As a result, performance improvement is limited. In comparison, our method decomposes the reconstruction pipeline into several learning-based optimization problems according to the characteristics of data representation, which yields higher transparency to end-users. This two-stage design not only improves the overall performance by a significant margin but also strengthens the system’s robustness subject to different gantry geometries.

3 Proposed Method

In this section, we first discuss the intuition behind the design to offer the readers a brief overview and then provide more details of the proposed framework.

Figure 1: Overview of the proposed multi-stage hierarchical framework. The curly brackets indicate the concatenation operation. Note that only a single stream is shown in the projection-domain denoising, and all the noise components are amplified for better visibility.

3.1 Reconstruction Revisit

Pixels in a CT projection are obtained through the line integrals along the attenuating path. Without considering other effects (e.g., beam hardening), the ideal clean measurement $p_{c}$ is given as:

p_{c}=-\ln\left(\frac{N}{N_{0}}\right),

(4)

where $N_{0}$ and $N$ are the incident and received intensities, respectively. However, noise is inevitable due to the quantum effects of photons. According to [61], a measurement in a clinical environment, denoted as $p_{n}$, can be approximated by:

p_{n}=p_{c}+\frac{x}{\sqrt{N_{0}\exp\left(-p_{c}\right)}},

(5)

where $x\sim\mathcal{N}\left(0,1\right)$ is a unit Gaussian random variable. Similar to the light field camera [62], two extra variables further parameterize each measurement in the detector array to represent the spatial information of the ray, namely the ray distance $d$ to the isocenter and the ray angle $\alpha$ to the table. Hence, a complete ray representation is given as $p_{n}=P_{n}\left(d,\alpha\right)$ . Note that this process is general enough regardless of the source geometry (e.g., fan-beam or parallel-beam).
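As a rough illustration of (4) and (5), the sketch below draws noisy measurements for a constant attenuation value and compares the empirical noise level with the standard deviation $1/\sqrt{N_{0}\exp\left(-p_{c}\right)}$ implied by (5). The intensity and attenuation values are hypothetical and chosen only for the demonstration.

```python
import numpy as np

def clean_measurement(n_received, n_incident):
    """Eq. (4): line-integral value from received and incident intensities."""
    return -np.log(n_received / n_incident)

def noisy_measurement(p_clean, n_incident, rng):
    """Eq. (5): add Gaussian noise with std 1 / sqrt(N0 * exp(-p_c))."""
    x = rng.standard_normal(np.shape(p_clean))
    return p_clean + x / np.sqrt(n_incident * np.exp(-p_clean))

rng = np.random.default_rng(0)
n0 = 1e4                              # hypothetical incident intensity
p_c = np.full(200_000, 2.0)           # hypothetical constant clean measurement
p_n = noisy_measurement(p_c, n0, rng)

# Standard deviation predicted by Eq. (5) for this p_c and N0.
expected_std = 1.0 / np.sqrt(n0 * np.exp(-2.0))
empirical_std = p_n.std()
```

The noise grows with attenuation and shrinks with the source intensity, which is why lowering the dose directly raises the noise level in the projections.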

Modern CT scanners usually operate in helical trajectories, which requires an additional step called rebinning to convert raw projections to pseudo-parallel geometry via the following slicing operation:

\hat{P}_{n}\left(i\right)=P_{n}\left(d_{i},\alpha_{i}\right),

(6)

where $i$ is an element index in the rebinned projection data. As the slicing indices are usually fractional, this operation is essentially a 2D interpolation.

After that, all the resulting projections are transformed into image representation through CT reconstruction methods, where FBP is a popular choice, simplified as:

I_{n}\left(u,v\right)=\frac{\pi}{M}\sum_{m=1}^{M}\mathcal{\hat{P}}_{n}\left(u\cos\theta_{m}+v\sin\theta_{m}\right),

(7)

where $(u,v)$ denotes the image pixel location, $\mathcal{\hat{P}}$ represents the filtered result of $\hat{P}$, $\theta_{m}$ is the $m$-th projection angle, and $M$ is the number of rebinned projections. Readers are referred to [63] for a more comprehensive understanding of image reconstruction on multi-slice spiral CT.
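The back-projection sum in (7) can be sketched as follows for the simplified 2D parallel-beam case. This is a minimal illustration, not the reconstruction code used in this work: the projections are assumed to be already filtered, detector positions are sampled with linear interpolation (`np.interp`), and all names are hypothetical.

```python
import numpy as np

def back_project(filtered, thetas, det_coords, u, v):
    """Eq. (7): I(u, v) = (pi / M) * sum_m Phat_m(u cos(theta_m) + v sin(theta_m)).

    filtered   : (M, num_detectors) array of already-filtered projections
    thetas     : (M,) projection angles in radians
    det_coords : (num_detectors,) increasing detector positions
    u, v       : arrays of pixel coordinates (same shape)
    """
    image = np.zeros(np.broadcast(u, v).shape)
    for proj, theta in zip(filtered, thetas):
        # Detector position hit by the ray through (u, v) at this angle.
        s = u * np.cos(theta) + v * np.sin(theta)
        image += np.interp(s, det_coords, proj)
    return np.pi / len(thetas) * image

M = 180
thetas = np.linspace(0.0, np.pi, M, endpoint=False)
det_coords = np.linspace(-2.0, 2.0, 65)
filtered = np.full((M, det_coords.size), 3.0)   # constant toy projections
u, v = np.meshgrid(np.linspace(-1.0, 1.0, 8), np.linspace(-1.0, 1.0, 8))
image = back_project(filtered, thetas, det_coords, u, v)
```

With constant filtered projections of value $c$, every pixel evaluates to $\pi c$, which gives a quick sanity check of the $\pi/M$ normalization.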

3.2 Framework Overview

An overview of the proposed framework is presented in Fig. 1. It mainly performs the projection-domain denoising and the image-domain refinement, embodied by two multi-frame-based neural networks, named MPD-Net and MIR-Net, respectively.

Let $S_{n*}^{P}=\left[P_{n*}^{1},P_{n*}^{2},\cdots,P_{n*}^{K}\right]$ denote a sequence of $K$ consecutive noisy projections, where $*\in\left\{l,r\right\}$ denotes the left or the right candidate to be sampled in (6), and the superscript represents the time step, which can be omitted when unnecessary. Given $S_{n*}^{P}$ as the input, MPD-Net performs multi-frame noise estimation using a sliding window of size $2F+1$ for every projection, resulting in a denoised sequence $S_{d*}^{P}=\left[P_{d*}^{F+1},P_{d*}^{F+2},\cdots,P_{d*}^{K-F}\right]$ with a noise level similar to routine dose (i.e., full dose), as follows:

\begin{aligned}
P_{d*}^{t} &= P_{n*}^{t}+R_{*}^{t} \\
&= P_{n*}^{t}+\mathcal{G}_{MPD}\left(P_{n*}^{t-F},\cdots,P_{n*}^{t},\cdots,P_{n*}^{t+F}\right),
\end{aligned}

(8)

where $\mathcal{G}_{MPD}$ represents MPD-Net, $R_{*}^{t}$ represents its output, $t\in\left\{F+1,F+2,\cdots,K-F\right\}$ , and $K\gg F$ .
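The sliding-window residual scheme in (8) can be sketched as below. MPD-Net itself is not reproduced here: `mean_residual` is a hypothetical stand-in that simply pulls the center frame toward the window average, so the example only illustrates the indexing and the residual formulation.

```python
import numpy as np

def mean_residual(window):
    """Hypothetical stand-in for G_MPD: residual moving the centre frame to the window mean."""
    centre = window[len(window) // 2]
    return window.mean(axis=0) - centre

def sliding_window_denoise(frames, F, residual_fn):
    """Eq. (8): P_d^t = P_n^t + G(P_n^{t-F}, ..., P_n^{t+F}) for every valid t."""
    K = len(frames)
    out = [frames[t] + residual_fn(frames[t - F:t + F + 1])
           for t in range(F, K - F)]      # 0-based t; K - 2F outputs in total
    return np.stack(out)

rng = np.random.default_rng(0)
frames = 1.0 + 0.1 * rng.standard_normal((20, 16, 32))   # K = 20 toy projections
denoised = sliding_window_denoise(frames, F=2, residual_fn=mean_residual)
```

Because the first and last $F$ frames lack a full window, the output sequence has $K-2F$ elements, matching the index range of $t$ given above.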

The rebinned projection sequence, denoted as $\hat{S}_{d}^{P}=\left[\hat{P}_{d}^{F+1},\hat{P}_{d}^{F+2},\cdots,\hat{P}_{d}^{K-F}\right]$, is then obtained by the weighted summation over adjacent projections as follows:

\hat{P}_{d}^{t}=\omega_{00}P_{dl}^{t}+\omega_{01}P_{dr}^{t}+\omega_{10}P_{dl}^{t+1}+\omega_{11}P_{dr}^{t+1},

(9)

where $\omega_{00}$ , $\omega_{01}$ , $\omega_{10}$ , and $\omega_{11}$ are interpolation weights that add up to one.
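One way to obtain weights that sum to one is bilinear interpolation, sketched below. The bilinear form is an assumption made for illustration; the text only requires that the four weights in (9) add up to one. The fractional offsets `frac_side` (between the left and right candidates) and `frac_time` (between time steps $t$ and $t+1$) are hypothetical names.

```python
import numpy as np

def bilinear_weights(frac_side, frac_time):
    """Return (w00, w01, w10, w11) for the (left, right) x (t, t+1) neighbours."""
    w00 = (1.0 - frac_time) * (1.0 - frac_side)   # left candidate at time t
    w01 = (1.0 - frac_time) * frac_side           # right candidate at time t
    w10 = frac_time * (1.0 - frac_side)           # left candidate at time t + 1
    w11 = frac_time * frac_side                   # right candidate at time t + 1
    return w00, w01, w10, w11

def rebin(p_dl_t, p_dr_t, p_dl_t1, p_dr_t1, frac_side, frac_time):
    """Eq. (9): weighted summation over the four denoised neighbours."""
    w00, w01, w10, w11 = bilinear_weights(frac_side, frac_time)
    return w00 * p_dl_t + w01 * p_dr_t + w10 * p_dl_t1 + w11 * p_dr_t1

weights = bilinear_weights(frac_side=0.25, frac_time=0.6)
constant = rebin(2.0, 2.0, 2.0, 2.0, frac_side=0.25, frac_time=0.6)
```

The same functions accept NumPy arrays for the four neighbors and for per-element fractional offsets, since every operation broadcasts element-wise.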

After that, the conjugate projections with angles $\tilde{\theta}_{j}=\theta_{j}+k\pi$ are filtered and back-projected onto the orthogonal 2D plane, forming a back-projection view $V_{\theta_{j}}$ as follows:

V_{\theta_{j}}^{z}\left(u,v\right)=\frac{1}{H_{\theta_{j}}}\sum_{k}h\left(z_{j,k}-z\right)\mathcal{\hat{P}}_{d}\left(u\cos\theta_{j}+v\sin\theta_{j}\right),

(10)

where $z_{j,k}-z$ is the axial offset of the ray to the reconstruction center $z$ , $h$ is a non-linear weighting function related to the multi-slice spiral geometry, and $H_{\theta_{j}}$ is the sum of $h$ over $k$ .

A complete reconstruction can then be derived once a half-turn $L$ is reached:

I_{d}^{z}\left(u,v\right)=\frac{\pi}{L}\sum_{j=1}^{L}V_{\theta_{j}}^{z}\left(u,v\right).

(11)

Similar to the projection domain denoising, when a sequence of $Q$ reconstructed images is collected, MIR-Net takes the sequence $S_{d}^{I}=\left[I_{d}^{z_{1}},I_{d}^{z_{2}},\cdots,I_{d}^{z_{Q}}\right]$ as input and generates the refined image as the final result using a sliding window of size $2F+1$ , given as:

\begin{aligned}
I_{r}^{z_{q}} &= I_{d}^{z_{q}}+R_{r}^{z_{q}} \\
&= I_{d}^{z_{q}}+\mathcal{G}_{MIR}\left(I_{d}^{z_{q-F}},\cdots,I_{d}^{z_{q}},\cdots,I_{d}^{z_{q+F}}\right),
\end{aligned}

(12)

where $\mathcal{G}_{MIR}$ represents MIR-Net, $R_{r}^{z_{q}}$ corresponds to its output, and $q\in\left\{F+1,F+2,\cdots,Q-F\right\}$. The input sequence to MIR-Net has a stride of $F+1$, i.e., $q\equiv 1\pmod{F+1}$, which will be explained in Section 3.4.

3.3 Multi-frame-based Projection Denoising

We observe that modern scanners with multi-slice detectors provide intra-view (i.e., detector array) and inter-view (i.e., neighboring views) redundancy that are both beneficial for denoising. As can be seen from the example in Fig. 2, the intra-view similarity provides the projected structural details of the objects, and the inter-view similarity presents the relative motion between objects. As discussed in Section 3.1, the dominant source of noise is the photon noise that follows a Poisson distribution, which can be alleviated by averaging over multiple independent measurements. We thus consider reformulating the task as a burst imaging problem and propose MPD-Net to capture both the intra-view and inter-view features. It features two significant merits: On the one hand, noise reduction based on the statistics of burst imaging can be realized via implicit alignment and fusion along views; on the other hand, it also considers intra-view similarities so that structures within a view can be well preserved.

The main structure of MPD-Net is depicted in Fig. 3. Inspired by [64], a multi-step denoising model consisting of two modified ResUNets is adopted. Each ResUNet takes $2F+1$ frames as input and predicts a denoised version of the middle frame via residual learning. Different from the original, where the same noise prior is used at each step [64], we take the predicted residual from the first step along with the untouched frames as input of the second step to avoid potential accumulated artifacts. Furthermore, the adaptive mix-up from [65] is also employed, as it empirically boosts performance.

The above intuition based on multi-frame denoising could be sub-optimal since the rebinning operation defined in (6) generates pseudo-parallel projections obtained across time. In addition, some advanced scanners apply the flying focal spot technique to improve axial resolution, where the rebinning process comes with another step that interleaves rows from two focal spots. All these operations lead to inconsistent correlations along detector channels. As we will demonstrate in Section 4.2, applying rebinning after MPD-Net results in unsmooth projections due to the absence of long-term consistency. However, if rebinning is performed ahead of MPD-Net, the element-wise noise independence will be violated because rebinning is essentially an interpolation operation, which degrades the denoising performance, as reported by [66, 67].
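The loss of element-wise noise independence under interpolation can be checked numerically. The sketch below is a simplified stand-in for rebinning: it applies 1D linear interpolation at half-sample offsets to independent Gaussian noise and measures the correlation between neighboring elements before and after. The setup is hypothetical and far simpler than the actual 2D rebinning.

```python
import numpy as np

rng = np.random.default_rng(0)
noise = rng.standard_normal((4000, 65))    # independent noise per detector element

# Linear interpolation at half-sample offsets, used here as a stand-in for rebinning.
interp = 0.5 * (noise[:, :-1] + noise[:, 1:])

def neighbour_corr(a):
    """Pearson correlation between horizontally adjacent elements."""
    return np.corrcoef(a[:, :-1].ravel(), a[:, 1:].ravel())[0, 1]

corr_raw = neighbour_corr(noise)       # close to 0: elements are independent
corr_interp = neighbour_corr(interp)   # close to 0.5: neighbours share one raw sample
```

Adjacent interpolated elements share one raw sample, so their noise becomes correlated, which is the property that denoisers assuming independent noise cannot rely on after interpolation.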

To guarantee element-wise noise independence while preserving rebinning consistency, we decompose the rebinning process into two steps, namely integer slicing and weighted summation, and bridge them with MPD-Net. The integer slicing extracts four untouched neighbors from 2D raw projections, whereas the weighted summation performs rebinning to the denoised results. This modification not only retains both the noise property and intra-view smoothness but also avoids complicated long-term memory mechanisms in the design.

Besides, as shown in Fig. 4, due to the widely applied Automatic Exposure Control (AEC) [63], the exposure levels of projections oscillate dramatically across table locations compared to those of images, and the noise is also distributed non-uniformly within each projection. Although an existing work [68] demonstrated a certain level of tolerance of CNNs against dose levels in the image domain, blind denoisers tend to learn a more aggressive strategy when the noise variance becomes large, resulting in more blurry predictions [69]. Hence, we provide an external noise prior and train MPD-Net to recover the refined noise map. Given two sets of projections under low dose (dubbed $P_{l}$) and target full dose (dubbed $P_{f}$), let $N_{l_{0}}$ and $N_{f_{0}}$ be their corresponding source intensities. Letting $X$ be a realization of $x$ in (5), $P_{l}$ can be approximated using $P_{f}$ by:

P_{l}=P_{f}+\sqrt{\left(\frac{1}{N_{l_{0}}}-\frac{1}{N_{f_{0}}}\right)\exp\left(P_{f}\right)}X.

(13)

Replacing $P_{l}$ by $P_{f}+\Delta_{P}$ , (13) can be rewritten as:

\begin{aligned}
P_{f} &= P_{l}-\sqrt{\left(\frac{1}{N_{l_{0}}}-\frac{1}{N_{f_{0}}}\right)\exp\left(P_{l}\right)}\cdot\sqrt{\exp\left(-\Delta_{P}\right)}X \\
&= P_{l}-\Phi\cdot\omega.
\end{aligned}

(14)

Interestingly, the second part in (14) can be viewed as the multiplication of a constant term $\Phi$, which we define as the noise prior, and a weighting term $\omega$, which we define as the weight map. Although noise-irrelevant, $\Phi$ keeps changing among views when AEC is enabled. As a result, noise estimation becomes more challenging due to the extra uncertainty from $\Phi$. Fortunately, the per-channel source intensity $N_{0}$ is required to compute the attenuation ratio in (4) and is therefore already available. To alleviate the difficulty of noise estimation, we thus feed this noise prior to MPD-Net to help predict a more accurate noise map. The updated workflow defined in (8) is then given by:
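The scale factor in (13) follows from (5): the noise variance at source intensity $N_{0}$ is $\exp\left(p_{c}\right)/N_{0}$, so going from full dose to low dose adds a variance of $\left(1/N_{l_{0}}-1/N_{f_{0}}\right)\exp\left(P_{f}\right)$. A minimal sketch of the low-dose simulation in (13) and of the noise prior $\Phi$ in (14) is given below. Scalar source intensities and a constant projection are hypothetical simplifications; with AEC the intensities would vary per view and per channel, which the same code handles through broadcasting.

```python
import numpy as np

def simulate_low_dose(p_full, n_low, n_full, rng):
    """Eq. (13): inject the extra noise separating low dose from full dose."""
    scale = np.sqrt((1.0 / n_low - 1.0 / n_full) * np.exp(p_full))
    return p_full + scale * rng.standard_normal(p_full.shape)

def noise_prior(p_low, n_low, n_full):
    """Phi in Eq. (14), computed from the observed low-dose projection."""
    return np.sqrt((1.0 / n_low - 1.0 / n_full) * np.exp(p_low))

rng = np.random.default_rng(0)
p_f = np.full((64, 256), 3.0)        # hypothetical full-dose projection
n_f0, n_l0 = 1e5, 2.5e4              # hypothetical intensities (25% of full dose)
p_l = simulate_low_dose(p_f, n_l0, n_f0, rng)
phi = noise_prior(p_l, n_l0, n_f0)

# Standard deviation of the injected noise predicted by Eq. (13).
expected_std = np.sqrt((1.0 / n_l0 - 1.0 / n_f0) * np.exp(3.0))
```

Note that $\Phi$ depends only on quantities known at acquisition time (the low-dose projection and the two source intensities), so it can be computed without access to the full-dose data.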

P_{d*}^{t}=P_{n*}^{t}+\mathcal{G}_{MPD}\left(\hat{P}_{n*}^{t-F},\cdots,\hat{P}_{n*}^{t},\cdots,\hat{P}_{n*}^{t+F}\right),

(15)

where $\hat{P}_{n*}^{t}=\left[P_{n*}^{t},\Phi_{*}^{t}\right]$ is the concatenation of the noisy projection $P_{n*}^{t}$ and the corresponding noise prior $\Phi_{*}^{t}$ .

3.4 Multi-frame-based Image Refinement

Although the proposed MPD-Net can significantly reduce the noise of multi-slice projections, it is still far from satisfactory for two reasons: First, MPD-Net does not capture the structural features of the final reconstructed image because the reconstruction plane is in parallel with the ray trajectories. Second, the remaining noise in the results will be amplified by the high-pass filter and lead to streak artifacts after reconstruction. To tackle these problems, we introduce a second network called MIR-Net to refine the LDCT image further.

Fig. 5 presents the structure of MIR-Net, which consists of a single ResUNet. MIR-Net takes a reconstructed image sequence $S_{r}^{I}$ as input and produces the residual $R_{r}$ as output. The hourglass design provides an expanding receptive field that captures structural features without large kernels, and we observed better performance than with straight networks (i.e., those without down/up-sampling).

Although MPD-Net benefits from multi-frame processing, it is challenging for MIR-Net to exploit the full potential of multi-frame input. Let $I^{z_{q}}$ and $I^{z_{q+1}}$ be two consecutive CT images, and $D$ be the slice thickness. As can be seen in Fig. 6(a), when $\left|z_{q+1}-z_{q}\right|\geq D$, the (ideal) reconstructed CT images do not share objects between slices, meaning that the multi-frame input only features structural similarity instead of redundant observations. Besides, a 2D CT reconstruction represents a 3D volume in reality. Recalling (10) and (11), artifacts from conjugate projections are combined in each view. In contrast, a stack of views over a half-turn is projected onto the 2D image plane, resulting in compressed artifact patterns that are more difficult to remove.

We propose a simple yet effective approach to alleviate these problems by introducing overlapped slices as intermediate representations. According to (10), the reconstruction is obtained by averaging nearby back-projections through a weighting function $h$ . We refer to the reconstruction method in [60], where $h$ is given as:

$$h\left(\Delta z\right)=\max\left(0,1-\frac{\left|\Delta z\right|}{D}\right)w\left(r\right), \tag{16}$$

where $r$ denotes the detector row index, and $\Delta z=\left|z_{q+1}-z_{q}\right|$ represents the distance between the projection and the reconstruction center along the table direction.

Without considering $w\left(r\right)$, the projection data required for a complete reconstruction, defined by $h\left(\Delta z\right)>0$, simply lies in $\left(z-D,z+D\right)$, indicating that the extent of the contributing projections is wider than the slice. In other words, there may still be a small number of shared projections used in reconstructing both $I^{z}$ and $I^{z+1}$. These shared features provide redundant observations that are beneficial for multi-frame-based refinement. However, as Fig. 6(a) shows, the weights of shared projections become insignificant as $\Delta z$ increases; inserting more slices in between emphasizes these weak signals. To this end, we reconstruct $F$ slices as intermediate observations between each pair of adjacent slices and collect $2F+1$ images as the multi-frame input. Fig. 6(b) presents the proposed solution when $F=1$; by doing so, the emphasized projections offer a different realization of the compressed artifacts. It is worth mentioning that these intermediate representations cannot be obtained via interpolation from adjacent slices, as the complete form of the weighting function is non-linear. Also, the shared features among adjacent slices are pixel-wise aligned; thus, a single ResUNet is sufficient to handle multi-frame inputs.
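The effect of the intermediate slices can be illustrated with the triangular part of (16). The sketch below ignores $w\left(r\right)$ and assumes that adjacent slices are spaced one slice thickness $D$ apart and that the $F$ intermediate slices are evenly spaced; both are assumptions for illustration, and the helper names are hypothetical.

```python
def h_tri(delta_z, D):
    """Triangular part of the weighting function in (16), ignoring w(r)."""
    return max(0.0, 1.0 - abs(delta_z) / D)


def window_centers(z_q, D, F):
    """Centers of the 2F+1 slices around z_q, assuming an even spacing of D/(F+1)."""
    step = D / (F + 1)
    return [z_q + k * step for k in range(-F, F + 1)]


def shared_weight(z_proj, z_a, z_b, D):
    """Smaller of the two weights that a projection at z_proj receives in slices a and b."""
    return min(h_tri(z_proj - z_a, D), h_tri(z_proj - z_b, D))


D = 1.0  # slice thickness in arbitrary units
# A projection located at z = 0.25, seen from the slice centered at z = 0:
adjacent = shared_weight(0.25, 0.0, 1.0, D)      # neighbor one thickness away
intermediate = shared_weight(0.25, 0.0, 0.5, D)  # inserted neighbor for F = 1
print(adjacent, intermediate)
print(window_centers(0.0, D, 1))
```

Under these assumptions the same projection contributes a weight of 0.25 to the slice one thickness away but 0.75 to the inserted slice, which is the sense in which the intermediate slices emphasize the shared projections.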

Besides, we notice a certain degree of high-frequency degradation caused by aggressive denoising in the individual sub-networks, which commonly occurs in cascaded designs. A straightforward solution is to fine-tune the entire chain end-to-end. However, as discussed in Section 2, end-to-end training is expensive and impractical for multi-slice spiral CT. Instead of passing on the denoised (summed) results of MPD-Net, we therefore pass the noisy projection and the predicted residual in concatenated form, i.e., $\left[P_{n*}^{t},R_{*}^{t}\right]$. This not only compensates for the missing high frequencies but also provides a decoupled reference of structural artifacts for further refinement. In short, the image domain refinement defined in (12) is rewritten as:

$$I_{r}^{z_{q}}=I_{n}^{z_{q}}+\mathcal{G}_{MIR}\left(\hat{I}_{d}^{z_{q-F}},\cdots,\hat{I}_{d}^{z_{q}},\cdots,\hat{I}_{d}^{z_{q+F}}\right), \tag{17}$$

where $\hat{I}^{z_{q}}_{d}=\left[I^{z_{q}}_{n},R^{z_{q}}_{n}\right]$ is the concatenation of the low-dose image $I^{z_{q}}_{n}$ and the residual $R^{z_{q}}_{n}$ reconstructed using the raw projections and MPD-Net predictions, respectively.
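The data flow of (17) can be sketched with NumPy arrays. The channel layout (image and residual interleaved along a leading channel axis), the array sizes, and the zero-output stand-in for $\mathcal{G}_{MIR}$ are all assumptions for illustration; the actual network and tensor layout are defined in the released code.

```python
import numpy as np


def build_mir_input(low_dose_slices, residual_slices):
    """Stack the pair [I_n, R_n] of each of the 2F+1 slices along a channel axis."""
    pairs = [np.stack([i_n, r_n], axis=0)
             for i_n, r_n in zip(low_dose_slices, residual_slices)]
    return np.concatenate(pairs, axis=0)  # shape: (2 * (2F + 1), H, W)


def refine(center_low_dose, mir_input, g_mir):
    """Residual refinement as in (17): I_r = I_n + G_MIR(decoupled inputs)."""
    return center_low_dose + g_mir(mir_input)


F, H, W = 2, 8, 8  # hypothetical window half-size and image size
rng = np.random.default_rng(0)
low = [rng.normal(size=(H, W)) for _ in range(2 * F + 1)]  # I_n for each slice
res = [rng.normal(size=(H, W)) for _ in range(2 * F + 1)]  # R_n for each slice

mir_input = build_mir_input(low, res)

# Hypothetical stand-in for the trained network: predicts a zero residual.
placeholder_g_mir = lambda t: np.zeros(t.shape[1:])
refined = refine(low[F], mir_input, placeholder_g_mir)
print(mir_input.shape, refined.shape)
```

Keeping the low-dose image and the residual in separate channels, rather than summing them first, is what lets the second stage see the original high-frequency content alongside the first stage's noise estimate.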

Table 1: Dataset Partition Summary

Partition	Siemens Scanner Subset	GE Scanner Subset
Training	L004, L006, L019, L033, L057, L064, L071, L072, L081, L107, L110, L114, L116, L125, L131, L134, L150, L160, L170, L175, L178, L179, L193, L203, L210, L212, L220, L221, L232, L237, L248, L273, L299	L012, L024, L027, L030, L036, L044, L045, L048, L079, L082, L094, L111, L113, L121, L127, L129, L133, L136, L138, L143, L147, L154, L163, L166, L171, L172, L181, L183, L185, L188, L196, L216
Validation	L077, L148, L229	L043, L213, L238
Testing	L014, L056, L058, L075, L123, L145, L186, L187, L209, L219, L241, L266, L277	L218, L228, L231, L234, L235, L244, L250, L251, L257, L260, L267, L269, L288

4 Experiments and Analysis

In this section, we present evaluation results and analysis of the proposed framework. We first provide implementation details and ablation studies to verify our design. We then evaluate its performance both quantitatively and qualitatively.

4.1 Implementation Details

The proposed framework was implemented in PyTorch and trained on an RTX 3090 GPU with an i9-10980XE CPU and 128 GB of RAM. We chose the Adam optimizer to update the model parameters. The Low Dose CT Image and Projection Data V6 [70], obtained from two scanners (Siemens SOMATOM Definition Flash, dubbed the Siemens dataset, and GE Discovery CT750i, dubbed the GE dataset), was used for training, validation, and testing. Both the Siemens and GE datasets contain paired full-dose (7.6-28.8 mGy for Siemens studies and 9.2-21.6 mGy for GE studies) and quarter-dose (1.9-7.2 mGy for Siemens studies and 2.3-5.4 mGy for GE studies) data. The detailed data partitions are given in Table 1. The test dataset of the 2016 Low-dose CT AAPM Grand Challenge [71] was also employed in the subjective evaluation since it contains scans from the control group, in which the dose level ranges from 0.8 mGy to 4.5 mGy. We followed the methods described in [72, 60, 73] for rebinning, filtering, back-projection, and weighted summation. The default Shepp–Logan filter was chosen as the reconstruction kernel, and the slice thickness from the metadata was adopted.

During network training, we used the L1 loss to supervise both MPD-Net and MIR-Net. The initial learning rate was set to $1\times 10^{-4}$ and then reduced to $1\times 10^{-5}$ once the model performance on the validation dataset showed no further improvement after a certain number of steps. Full convergence of the two CNNs took about 40 and 150 epochs, respectively. For both networks, we set $F=2$, i.e., a sliding window of $2F+1=5$ frames.
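The training rule above can be sketched in plain Python. The patience value is not stated in the text, so `patience=3` below is purely an assumption, as are the class and function names; in PyTorch one would typically combine `torch.nn.L1Loss` with a plateau-based scheduler instead of this hand-written version.

```python
def l1_loss(pred, target):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


class PlateauSchedule:
    """Drop the learning rate once validation loss stops improving.

    The two learning rates follow the text; the patience is an assumed value.
    """

    def __init__(self, lr=1e-4, reduced_lr=1e-5, patience=3):
        self.lr = lr
        self.reduced_lr = reduced_lr
        self.patience = patience
        self.best = float("inf")
        self.stale = 0  # consecutive validation checks without improvement

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr = self.reduced_lr
        return self.lr


schedule = PlateauSchedule(patience=3)
# Hypothetical validation losses; the rate drops at the third stale check.
history = [schedule.step(v) for v in [0.50, 0.40, 0.41, 0.42, 0.40, 0.39]]
print(history)
print(l1_loss([1.0, 2.0], [1.5, 1.0]))
```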

4.2 Ablation Studies

We conducted experiments to show how the image quality is progressively improved by each component in the proposed framework, namely MPD-Net, MIR-Net, multi-frame input, external noise prior, and decoupled input (i.e., separated $P_{n*}^{t}$ and $R_{*}^{t}$). Ablation studies were performed on the Siemens test dataset; we report the measured mean squared error (MSE) and structural similarity (SSIM) [74] in Table 2.

Table 2: Results of Ablation Studies

MPD	MIR	Multi	Prior	Decoupled	MSE $\downarrow$	SSIM $\uparrow$
✗	✗	✗	✗	✗	774.46	0.9577
✓	✗	✗	✗	✗	499.93	0.9685
✓	✓	✗	✗	✗	418.54	0.9730
✓	✓	✓	✗	✗	309.79	0.9774
✓	✓	✓	✓	✗	236.60	0.9819
✓	✓	✓	✓	✓	184.98	0.9881

It can be seen that both MPD-Net and MIR-Net play essential roles in enhancing the reconstruction quality of LDCT. For MPD-Net, introducing the external noise prior improves quality, whereas placing the model before the rebinning operation results in sub-optimal performance with visible artifacts, as shown in Fig. 7. Similarly, the summation (i.e., $P_{n*}^{t}+R_{*}^{t}$) in (8) degrades the reconstruction, mainly due to over-smoothing, whereas volumetric input brings extra improvement to the final image quality.
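To make the contribution of each component in Table 2 easier to read, the short script below computes the relative MSE reduction of every row with respect to the row above it, and the overall reduction from the baseline to the full model. The MSE values are copied from Table 2; the row labels are shorthand introduced here.

```python
# MSE values from Table 2, in the order the components are enabled.
ablation_mse = [
    ("baseline", 774.46),
    ("+ MPD-Net", 499.93),
    ("+ MIR-Net", 418.54),
    ("+ multi-frame input", 309.79),
    ("+ noise prior", 236.60),
    ("+ decoupled input", 184.98),
]


def relative_reduction(before, after):
    """Fraction by which the MSE drops when going from `before` to `after`."""
    return 1.0 - after / before


# Reduction of each configuration relative to the previous one.
step_gains = [
    (name, relative_reduction(prev, cur))
    for (_, prev), (name, cur) in zip(ablation_mse, ablation_mse[1:])
]
# Reduction of the full model relative to the unprocessed baseline.
total_gain = relative_reduction(ablation_mse[0][1], ablation_mse[-1][1])

for name, gain in step_gains:
    print(f"{name}: {100 * gain:.1f}% lower MSE than the previous row")
print(f"full model vs. baseline: {100 * total_gain:.1f}% lower MSE")
```

Every added component lowers the MSE of the preceding configuration, so no single row accounts for the overall gain on its own.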

We also tested the performance of MIR-Net as a standalone image-domain denoiser. Although the results regarding MSE (215.62) and SSIM (0.9807) seem promising, further inspection of the noise power spectrum (NPS), shown in Fig. 8, indicates an obvious over-smoothing problem evidenced by the shifted peak with a higher level of noise power in low frequencies. In short, all these results align well with our analysis above.

Table 3: Objective Evaluation Summary (Quarter-dose)

	Siemens Test Dataset		GE Test Dataset		ACR CT Phantom
	$\mathrm{CTDI_{vol}}$ : 1.9-7.2 mGy		$\mathrm{CTDI_{vol}}$ : 2.3-5.4 mGy		$\mathrm{CTDI_{vol}}$ : 3.4 mGy
	WL: 40, WW: 300		WL: 40, WW: 400		Full-range
	MSE $\downarrow$	SSIM $\uparrow$	MSE $\downarrow$	SSIM $\uparrow$	MSE $\downarrow$	SSIM $\uparrow$
Baseline	$774.46\pm 811.18$	$0.9577\pm 0.0323$	$1768.27\pm 1040.13$	$0.9152\pm 0.0389$	$187.16\pm 26.89$	$0.9842\pm 0.0016$
BM3D	$670.30\pm 755.00$	$0.9631\pm 0.0296$	$1176.68\pm 902.82$	$0.9424\pm 0.0339$	$184.61\pm 26.52$	$0.9844\pm 0.0016$
NLM	$724.50\pm 780.64$	$0.9605\pm 0.0304$	$1268.01\pm 944.06$	$0.9391\pm 0.0347$	$187.09\pm 26.86$	$0.9842\pm 0.0016$
RED-CNN	$210.50\pm 208.29$	$0.9863\pm 0.0122$	$477.36\pm 216.44$	$0.9713\pm 0.0126$	$77.40\pm 20.93$	$0.9925\pm 0.0018$
WGAN	$453.66\pm 313.47$	$0.9691\pm 0.0168$	$1004.27\pm 473.89$	$0.9449\pm 0.0236$	$179.32\pm 122.46$	$0.9874\pm 0.0025$
CPCE-3D	$324.67\pm 284.80$	$0.9797\pm 0.0156$	$761.76\pm 368.73$	$0.9589\pm 0.0184$	$130.91\pm 96.21$	$0.9906\pm 0.0025$
QAE	$241.36\pm 274.95$	$0.9853\pm 0.0134$	$526.67\pm 236.40$	$0.9699\pm 0.0132$	$87.41\pm 23.91$	$0.9913\pm 0.0024$
DP-ResNet	$270.77\pm 122.73$	$0.9795\pm 0.0080$	$451.32\pm 202.43$	$0.9734\pm 0.0117$	$153.08\pm 193.50$	$0.9927\pm 0.0033$
EDCNN	$240.39\pm 275.13$	$0.9848\pm 0.0144$	$523.12\pm 253.46$	$0.9694\pm 0.0139$	$84.48\pm 18.74$	$0.9919\pm 0.0019$
TransCT	$250.96\pm 226.68$	$0.9836\pm 0.0141$	$512.06\pm 221.15$	$0.9700\pm 0.0126$	$98.43\pm 72.26$	$0.9915\pm 0.0026$
DU-GAN	$286.49\pm 309.19$	$0.9820\pm 0.0162$	$584.49\pm 275.07$	$0.9667\pm 0.0150$	$108.70\pm 23.07$	$0.9898\pm 0.0018$
CTformer	$242.06\pm 282.92$	$0.9847\pm 0.0147$	$785.66\pm 501.63$	$0.9550\pm 0.0221$	$80.16\pm 19.46$	$0.9925\pm 0.0015$
Ours	$\textbf{184.98}\pm\textbf{106.53}$	$\textbf{0.9881}\pm\textbf{0.0067}$	$\textbf{422.39}\pm\textbf{201.62}$	$\textbf{0.9751}\pm\textbf{0.0115}$	$\textbf{51.62}\pm\textbf{14.67}$	$\textbf{0.9951}\pm\textbf{0.0014}$

4.3 Objective Evaluation

We conducted an objective evaluation to compare the proposed framework with nine state-of-the-art learning-based methods, namely RED-CNN [11], WGAN [41], CPCE-3D [40], QAE [22], DP-ResNet [14], EDCNN [25], TransCT [31], DU-GAN [75], and CTformer [76]. We retrained all these methods using our training dataset for a fair comparison. Specifically, as the overall volume of the training dataset differs from their original setups, we retrained each model with more iterations. The retraining was terminated when the validation performance saturated, which also applied to GAN-based methods as they still employed pixel-wise paired supervision such as L1 or L2. The performance of two traditional denoising methods, i.e., BM3D [4] and non-local means (NLM) [3], was also tested as references. We employed MSE and SSIM as quantitative evaluation metrics. The results are reported in Table 3.

Our method reduces the MSE by up to 70%, outperforming the others by a significant margin in all studies. Furthermore, our results on the two scanners are highly consistent, whereas some methods, such as DP-ResNet, EDCNN, TransCT, and CTformer, are less robust. To better visualize the image quality, we present two sample slices with their corresponding zoom-in patches. In Fig. 9, the proposed method successfully recovers the continuous structure, whereas most methods fail to remove the heavy streak artifacts. In Fig. 10, the branches of the low-contrast structure have better visibility and sharpness in our results. Although the results from DP-ResNet are also promising, a certain degree of blurriness is observed.

4.4 Subjective Test

Although the quantitative evaluation shows a significant improvement of the proposed framework over other methods, the evaluation metrics (i.e., MSE and SSIM) might not reflect the real-world application scenario. In other words, a professional radiologist may pay special attention to certain aspects rather than general image quality metrics when reviewing CT images. As the main application is to assist clinical diagnosis, two radiologists with 12 and 9 years of clinical experience were invited to perform a series of subjective evaluations.

Table 4: Results of Lesion Detection

Method	Recall	Precision	Accuracy	F1
RED-CNN	0.5128	0.5714	0.3929	0.5405
WGAN	0.4250	0.8947	0.4444	0.5763
CPCE-3D	0.4390	0.8182	0.4375	0.5714
QAE	0.4595	0.6296	0.4000	0.5312
DP-ResNet	0.5000	0.7407	0.4375	0.5970
EDCNN	0.5122	0.7500	0.4375	0.6087
TransCT	0.4872	0.7037	0.4510	0.5758
DU-GAN	0.3500	0.6667	0.3400	0.4590
CTformer	0.5000	0.8400	0.5000	0.6269
Ours	0.6098	0.9615	0.6304	0.7463

The first subjective evaluation was performed using the 2016 Low-dose CT AAPM Grand Challenge test dataset, which consists of 16 patient scans with lesions and 4 healthy references. The radiologists were asked to mark all the lesions without prior knowledge of the patient’s diagnosis. By comparing the radiologists’ annotations with the ground truth provided by the challenge committee, we calculated the precision, recall, accuracy, and F1 score for each method. The results are listed in Table 4. The proposed framework improves on all metrics compared to the others. On the one hand, it is the only method with a precision above 0.9, suggesting that it does not produce artifacts that could affect diagnostic acceptability. On the other hand, its higher recall indicates better diagnostic sensitivity than the others. The overall measurements in precision and the F1 score further confirm the value of the proposed framework in clinical exams.
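For reference, the four detection metrics can be computed from confusion counts as sketched below. The raw counts are not reported in the text; the values used here are hypothetical and were chosen only because they reproduce the "Ours" row of Table 4 to four decimals. The accuracy definition, (TP + TN) / all cases, is likewise an assumption.

```python
def detection_metrics(tp, fp, fn, tn):
    """Recall, precision, accuracy, and F1 score from confusion counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # assumed definition
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "recall": recall,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts: 25 lesions found, 1 false mark, 16 lesions missed,
# 4 healthy cases correctly left unmarked.
metrics = detection_metrics(tp=25, fp=1, fn=16, tn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because F1 is the harmonic mean of precision and recall, a method with high precision but low recall (such as WGAN in Table 4) still ends up with a modest F1 score.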

Table 5: Results of Subjective Quality Evaluation

	RED-CNN	WGAN	CPCE-3D	QAE	DP-ResNet	EDCNN	TransCT	DU-GAN	CTformer	Ours
Noise suppression	4.38	3.08	3.19	3.65	4.38	4.15	4.35	3.31	3.69	4.62
Contrast retention	3.88	3.77	3.96	3.73	3.42	4.15	3.62	4.12	3.85	4.46
Margin sharpness	2.65	2.85	2.81	2.58	2.46	2.73	2.46	2.69	2.54	3.38
Diagnostic acceptability	3.58	3.31	3.58	3.31	3.73	3.81	3.50	3.46	3.35	4.19

The second subjective evaluation assessed the perceptual quality of each method using the Siemens test dataset. During the test, three images, namely a normal-dose CT, a low-dose CT, and a refined low-dose CT, were presented to the reviewer. For each study, the reviewers were requested to assign scores for noise suppression, contrast retention, margin sharpness, and diagnostic acceptability (all studies in the Siemens dataset contain at least one lesion). A five-point scale was employed, where the lowest score (1) was assigned to the low-dose CT and the highest score (5) to the full-dose CT. All processed studies from each method were de-identified and shuffled to avoid biased judgments. Since there were no absolutely clean references and the full-dose counterparts still contained a certain amount of noise, the radiologists were asked to assign scores no higher than 5, even if a sample was of higher quality than the full-dose reference. The results are summarized in Table 5. Again, the proposed method receives the highest scores by a significant margin in all aspects, which is well aligned with the results of the previous evaluations.

Interestingly, RED-CNN, DP-ResNet, and TransCT get high noise suppression scores but low margin sharpness scores due to aggressive denoising (i.e., over-smoothing). On the contrary, introducing detail preservation constraints (e.g., perceptual loss in CPCE-3D and adversarial loss in WGAN and DU-GAN) leads to compromised denoising performance and reduced diagnostic sensitivity. In comparison, the proposed method can reach a pleasing balance between these aspects, yielding satisfactory results that radiologists prefer.

Last, we show two representative examples from these evaluations. Fig. 11 presents an overview of the two samples at full dose, and their zoom-in crops are compared in Figs. 12-13. Fig. 12 depicts a lesion that is barely noticeable by experts due to its low contrast. Similar to the full-dose references, our method could transfer details more faithfully and maintain the contrast but with an even lower noise level. In comparison, some methods either fail to suppress noise (e.g., WGAN, DU-GAN) or introduce artifacts (e.g., CTformer), making them less reliable in clinical exams. Fig. 13 visualizes the cross-section of the aorta, where the higher uniformity in our result indicates better denoising quality.

4.5 Phantom Examination

To further analyze the standardized CT properties, we conducted an examination using the ACR 464 CT phantom. In this experiment, two real scans of the same phantom, representing the full dose (13.5 mGy) and the quarter dose (3.4 mGy), were obtained. Because the same scanner model was also used previously (i.e., Siemens SOMATOM Definition Flash), we directly applied all the learning-based models to these scans without retraining. The results in terms of MSE and SSIM are reported in Table 3, which shows that the proposed method, trained using synthetic data, still works well on real scans. After that, the slices of the phantom’s first and third layers were used to analyze the task-based transfer function (TTF) and NPS. The methods and default settings from [77] were employed to obtain the results.
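For readers unfamiliar with the NPS, the sketch below implements a generic textbook 2D estimate from an ensemble of uniform-region ROIs: subtract each ROI mean, average the squared Fourier magnitudes, and scale by the pixel area over the number of pixels. It is not necessarily the procedure or the settings of [77]; the ROI size, pixel size, and synthetic white-noise input are assumptions for illustration.

```python
import numpy as np


def nps_2d(rois, pixel_size):
    """Generic 2D noise power spectrum estimate from a stack of square ROIs."""
    rois = np.asarray(rois, dtype=float)
    _, ny, nx = rois.shape
    # Remove the mean of each ROI (simple zeroth-order detrending).
    detrended = rois - rois.mean(axis=(1, 2), keepdims=True)
    # fft2 acts on the last two axes, i.e., on every ROI separately.
    spectra = np.abs(np.fft.fft2(detrended)) ** 2
    return spectra.mean(axis=0) * pixel_size ** 2 / (nx * ny)


rng = np.random.default_rng(0)
pixel_size = 0.5                                   # hypothetical, in mm
rois = rng.normal(0.0, 10.0, size=(16, 64, 64))    # synthetic white noise
nps = nps_2d(rois, pixel_size)

# Integrating the NPS over frequency should return the noise variance.
df = 1.0 / (64 * pixel_size)                       # frequency bin width
variance_from_nps = nps.sum() * df * df
mean_roi_variance = rois.var(axis=(1, 2)).mean()
print(variance_from_nps, mean_roi_variance)
```

The final two lines check the normalization: the area under the NPS matches the average pixel variance of the ROIs, so a lower curve at every frequency corresponds to a lower overall noise level.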

First, we show the NPS and TTF plots of each method in Fig. 14. Every candidate achieves a certain degree of noise reduction on the low-dose phantom scan. However, many do not show satisfactory noise suppression at low frequencies. Their lower peak frequency, higher noise magnitude, and degraded transfer function indicate increased blurriness and decreased spatial resolution in the resulting images. In contrast, the proposed method produces a much smoother NPS with a lower degree of noise than the full-dose version. Although its noise power in the low-frequency band is slightly higher than that of the full-dose reference, its improved TTF implies that the details are reproduced more faithfully with only marginal sharpness degradation.

Second, we analyzed the robustness under various dose levels. Following the noise insertion method described in [68], we generated a set of simulated low-dose projections ranging from 17% to 80% of the full dose level. Fig. 15 presents the dose-dependent NPS and TTF plots. The resultant NPS plots of the proposed method show consistent denoising performance across a wide range of dose levels, and the TTF plots of two inserts further support its high robustness.

Table 6: CT Number Accuracy under Various Dose Levels

Dose Level	Air (-1000 HU)	Bone (955 HU)	Acrylic (120 HU)	Polyethylene (-95 HU)
17%	-968.31	847.61	116.65	-84.84
20%	-968.28	847.80	116.67	-84.85
25%	-968.26	847.96	116.68	-84.85
30%	-968.24	848.01	116.70	-84.84
38%	-968.25	847.97	116.71	-84.83
48%	-968.25	847.90	116.69	-84.80
62%	-968.25	847.88	116.67	-84.77
80%	-968.24	847.91	116.65	-84.73
100%^a	-968.38	847.34	116.60	-84.69

^a Reference full-dose images without denoising.

Finally, we measured the CT number accuracy of the processed images against the full-dose references. The results are listed in Table 6. They confirm that no noticeable bias is present in the reconstructed images, and that this observation holds across dose levels.
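The size of the deviations in Table 6 can be checked directly. The script below takes the values from Table 6 and reports, for each material, the largest absolute difference between any processed dose level and the full-dose reference row; the variable names are introduced here for illustration.

```python
# Full-dose reference row of Table 6 (HU).
reference = {
    "Air": -968.38,
    "Bone": 847.34,
    "Acrylic": 116.60,
    "Polyethylene": -84.69,
}

# Processed results at 17%, 20%, 25%, 30%, 38%, 48%, 62%, and 80% dose (HU).
processed = {
    "Air": [-968.31, -968.28, -968.26, -968.24, -968.25, -968.25, -968.25, -968.24],
    "Bone": [847.61, 847.80, 847.96, 848.01, 847.97, 847.90, 847.88, 847.91],
    "Acrylic": [116.65, 116.67, 116.68, 116.70, 116.71, 116.69, 116.67, 116.65],
    "Polyethylene": [-84.84, -84.85, -84.85, -84.84, -84.83, -84.80, -84.77, -84.73],
}

# Largest absolute deviation from the reference across all dose levels.
max_deviation = {
    material: max(abs(value - reference[material]) for value in values)
    for material, values in processed.items()
}

for material, deviation in max_deviation.items():
    print(f"{material}: max |deviation| = {deviation:.2f} HU")
```

All four materials stay within 1 HU of the full-dose reference at every dose level, with bone showing the largest deviation.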

5 Conclusions

The main goal of this paper is to improve the quality of LDCT images for multi-slice spiral CT. We comprehensively discussed the proposed two-stage processing pipeline across both the projection and image domains, and analyzed the impact of rebinning and filtered back-projection on the final reconstruction. To fully utilize the intra-slice and inter-slice similarity inherent in the acquired projection volume, we transformed the task into a multi-frame-based denoising and refinement problem, in which the two networks are linked by conventional rebinning and reconstruction methods. We performed several studies to verify the effectiveness of our method and showed its superiority over state-of-the-art methods through qualitative and quantitative performance evaluations. Future work might include better detail preservation via higher-level loss functions other than a simple L1 penalty.

Acknowledgment

The authors would like to thank the anonymous reviewers for their valuable feedback and constructive suggestions, which have greatly improved the quality and clarity of this manuscript.

References

[1] A. F. H. Ortiz, L. J. F. Beaujon, S. Y. G. Villamizar, and F. F. F. López, “Magnetic resonance versus computed tomography for the detection of retroperitoneal lymph node metastasis due to testicular cancer: A systematic literature review,” European Journal of Radiology Open, vol. 8, p. 100372, 2021.
[2] D. J. Brenner and E. J. Hall, “Computed tomography—an increasing source of radiation exposure,” New England Journal of Medicine, vol. 357, no. 22, pp. 2277–2284, 2007.
[3] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, vol. 2, 2005, pp. 60–65.
[4] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-d transform-domain collaborative filtering,” IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, 2007.
[5] Y. Lu and S.-W. Jung, “Progressive joint low-light enhancement and noise removal for raw images,” IEEE Trans. Image Process., vol. 31, pp. 2390–2404, 2022.
[6] A. Abdelhamed, M. Afifi, R. Timofte, and M. S. Brown, “Ntire 2020 challenge on real image denoising: Dataset, methods and results,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 496–497.
[7] S. Nah, S. Son, S. Lee, R. Timofte, and K. M. Lee, “Ntire 2021 challenge on image deblurring,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 149–165.
[8] J. Cai, S. Gu, R. Timofte, and L. Zhang, “Ntire 2019 challenge on real image super-resolution: Methods and results,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019, pp. 2211–2223.
[9] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. Image Process., vol. 26, no. 9, pp. 4509–4522, 2017.
[10] E. Kang, J. Min, and J. C. Ye, “A deep convolutional neural network using directional wavelets for low-dose x-ray ct reconstruction,” Medical Physics, vol. 44, no. 10, pp. e360–e375, 2017.
[11] H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang, “Low-dose ct with a residual encoder-decoder convolutional neural network,” IEEE Trans. Med. Imag., vol. 36, no. 12, pp. 2524–2535, 2017.
[12] D. Wu, K. Kim, G. El Fakhri, and Q. Li, “Iterative low-dose ct reconstruction with priors trained by artificial neural network,” IEEE Trans. Med. Imag., vol. 36, no. 12, pp. 2479–2486, 2017.
[13] T. Würfl, F. C. Ghesu, V. Christlein, and A. Maier, “Deep learning computed tomography,” in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 2016, pp. 432–440.
[14] X. Yin, Q. Zhao, J. Liu, W. Yang, J. Yang, G. Quan, Y. Chen, H. Shu, L. Luo, and J.-L. Coatrieux, “Domain progressive 3D residual convolution network to improve low-dose CT imaging,” IEEE Trans. Med. Imag., vol. 38, no. 12, pp. 2903–2913, 2019.
[15] Y. Li, K. Li, C. Zhang, J. Montoya, and G.-H. Chen, “Learning to reconstruct computed tomography images directly from sinogram data under a variety of data acquisition conditions,” IEEE Trans. Med. Imag., vol. 38, no. 10, pp. 2469–2481, 2019.
[16] J. He, Y. Wang, and J. Ma, “Radon inversion via deep learning,” IEEE Trans. Med. Imag., vol. 39, no. 6, pp. 2076–2087, 2020.
[17] Y. Zhang, D. Hu, Q. Zhao, G. Quan, J. Liu, Q. Liu, Y. Zhang, G. Coatrieux, Y. Chen, and H. Yu, “CLEAR: comprehensive learning enabled adversarial reconstruction for subtle structure enhanced low-dose CT imaging,” IEEE Trans. Med. Imag., vol. 40, no. 11, pp. 3089–3101, 2021.
[18] Y. Zhang, Y. Wang, W. Zhang, F. Lin, Y. Pu, and J. Zhou, “Statistical iterative reconstruction using adaptive fractional order regularization,” Biomedical Optics Express, vol. 7, no. 3, pp. 1015–1029, 2016.
[19] Y. Chen, D. Gao, C. Nie, L. Luo, W. Chen, X. Yin, and Y. Lin, “Bayesian statistical reconstruction for low-dose x-ray computed tomography using an adaptive-weighting nonlocal prior,” Computerized Medical Imaging and Graphics, vol. 33, no. 7, pp. 495–500, 2009.
[20] Q. Xu, H. Yu, X. Mou, L. Zhang, J. Hsieh, and G. Wang, “Low-dose x-ray ct reconstruction via dictionary learning,” IEEE Trans. Med. Imag., vol. 31, no. 9, pp. 1682–1697, 2012.
[21] P. F. Feruglio, C. Vinegoni, J. Gros, A. Sbarbati, and R. Weissleder, “Block matching 3d random noise filtering for absorption optical projection tomography,” Physics in Medicine & Biology, vol. 55, no. 18, p. 5401, 2010.
[22] F. Fan, H. Shan, M. K. Kalra, R. Singh, G. Qian, M. Getzin, Y. Teng, J. Hahn, and G. Wang, “Quadratic autoencoder (Q-AE) for low-dose CT denoising,” IEEE Trans. Med. Imag., vol. 39, no. 6, pp. 2035–2050, 2019.
[23] L. A. Zavala-Mondragon, P. Rongen, J. O. Bescos, P. H. De With, and F. Van der Sommen, “Noise reduction in CT using learned wavelet-frame shrinkage networks,” IEEE Trans. Med. Imag., vol. 41, no. 8, pp. 2048–2066, 2022.
[24] L. A. Zavala-Mondragon, P. H. de With, and F. van der Sommen, “Image noise reduction based on a fixed wavelet frame and CNNs applied to CT,” IEEE Trans. Image Process., vol. 30, pp. 9386–9401, 2021.
[25] T. Liang, Y. Jin, Y. Li, and T. Wang, “EDCNN: Edge enhancement-based densely connected network with compound loss for low-dose CT denoising,” in Proceedings of the IEEE International Conference on Signal Processing, vol. 1, 2020, pp. 193–198.
[26] M. Matsuura, J. Zhou, N. Akino, and Z. Yu, “Feature-aware deep-learning reconstruction for context-sensitive x-ray computed tomography,” IEEE Trans. Radiat. Plasma Med. Sci., vol. 5, no. 1, pp. 99–107, 2020.
[27] X. Tao, H. Zhang, Y. Wang, G. Yan, D. Zeng, W. Chen, and J. Ma, “VVBP-tensor in the FBP algorithm: its properties and application in low-dose CT reconstruction,” IEEE Trans. Med. Imag., vol. 39, no. 3, pp. 764–776, 2019.
[28] X. Tao, Y. Wang, L. Lin, Z. Hong, and J. Ma, “Learning to reconstruct CT images from the VVBP-tensor,” IEEE Trans. Med. Imag., vol. 40, no. 11, pp. 3030–3041, 2021.
[29] L. Xu, Y. Zhang, Y. Liu, D. Wang, M. Zhou, J. Ren, J. Wei, and Z. Ye, “Low-dose CT denoising using a structure-preserving kernel prediction network,” in Proceedings of the IEEE International Conference on Image Processing, 2021, pp. 1639–1643.
[30] B. Kim, H. Shim, and J. Baek, “Weakly-supervised progressive denoising with unpaired CT images,” Medical Image Analysis, vol. 71, p. 102065, 2021.
[31] Z. Zhang, L. Yu, X. Liang, W. Zhao, and L. Xing, “TransCT: dual-path transformer for low dose computed tomography,” in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 2021, pp. 55–64.
[32] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Proceedings of the Advances in Neural Information Processing systems, vol. 30, pp. 6000–6010, 2017.
[33] D. Wang, Z. Wu, and H. Yu, “TED-net: Convolution-free T2T vision transformer-based encoder-decoder dilation network for low-dose ct denoising,” in Proceedings of the International Workshop on Machine Learning in Medical Imaging, 2021, pp. 416–425.
[34] N. Huber, T. Anderson, A. Missert, M. Adkins, S. Leng, J. Fletcher, C. McCollough, L. Yu, and K. N. Glazebrook, “Clinical evaluation of a phantom-based deep convolutional neural network for whole-body-low-dose and ultra-low-dose ct skeletal surveys,” Skeletal Radiology, vol. 51, no. 1, pp. 145–151, 2022.
[35] J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila, “Noise2noise: Learning image restoration without clean data,” arXiv preprint arXiv:1803.04189, 2018.
[36] A. Krull, T.-O. Buchholz, and F. Jug, “Noise2void-learning denoising from single noisy images,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 2129–2137.
[37] Z. Zhang, X. Liang, W. Zhao, and L. Xing, “Noise2context: Context-assisted learning 3d thin-layer for low-dose ct,” Medical Physics, vol. 48, no. 10, pp. 5794–5803, 2021.
[38] C. Niu, M. Li, F. Fan, W. Wu, X. Guo, Q. Lyu, and G. Wang, “Noise suppression with similarity-based self-supervised deep learning,” IEEE Trans. Med. Imag., vol. 42, no. 6, pp. 1590–1602, 2022.
[39] X. Yi, E. Walia, and P. Babyn, “Generative adversarial network in medical imaging: A review,” Medical Image Analysis, vol. 58, p. 101552, 2019.
[40] H. Shan, Y. Zhang, Q. Yang, U. Kruger, M. K. Kalra, L. Sun, W. Cong, and G. Wang, “3-d convolutional encoder-decoder network for low-dose ct via transfer learning from a 2-d trained network,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1522–1534, 2018.
[41] Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, Y. Zhang, L. Sun, and G. Wang, “Low-dose ct image denoising using a generative adversarial network with wasserstein distance and perceptual loss,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1348–1357, 2018.
[42] M. Li, W. Hsu, X. Xie, J. Cong, and W. Gao, “Sacnn: Self-attention convolutional neural network for low-dose ct denoising with self-supervised perceptual loss network,” IEEE Trans. Med. Imag., vol. 39, no. 7, pp. 2289–2301, 2020.
[43] M. Ghahremani, M. Khateri, A. Sierra, and J. Tohka, “Adversarial distortion learning for medical image denoising,” arXiv preprint arXiv:2204.14100, 2022.
[44] X. Zhang, Z. Han, H. Shangguan, X. Han, X. Cui, and A. Wang, “Artifact and detail attention generative adversarial networks for low-dose ct denoising,” IEEE Trans. Med. Imag., vol. 40, no. 12, pp. 3901–3918, 2021.
[45] J. Gu and J. C. Ye, “AdaIN-based tunable CycleGAN for efficient unsupervised low-dose CT denoising,” IEEE Trans. Comput. Imaging, vol. 7, pp. 73–85, 2021.
[46] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2017, pp. 2223–2232.
[47] K. Lee and W.-K. Jeong, “ISCL: Interdependent self-cooperative learning for unpaired image denoising,” IEEE Trans. Med. Imag., vol. 40, no. 11, pp. 3238–3248, 2021.
[48] H. Gupta, K. H. Jin, H. Q. Nguyen, M. T. McCann, and M. Unser, “CNN-based projected gradient descent for consistent CT image reconstruction,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1440–1453, 2018.
[49] E. Kang, W. Chang, J. Yoo, and J. C. Ye, “Deep convolutional framelet denosing for low-dose CT via wavelet residual network,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1358–1369, 2018.
[50] H. Chen, Y. Zhang, Y. Chen, J. Zhang, W. Zhang, H. Sun, Y. Lv, P. Liao, J. Zhou, and G. Wang, “LEARN: Learned experts’ assessment-based reconstruction network for sparse-data ct,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1333–1347, 2018.
[51] S. Roth and M. J. Black, “Fields of experts: A framework for learning image priors,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, vol. 2, 2005, pp. 860–867.
[52] H. K. Aggarwal, M. P. Mani, and M. Jacob, “MoDL: Model-based deep learning architecture for inverse problems,” IEEE Trans. Med. Imag., vol. 38, no. 2, pp. 394–405, 2018.
[53] W. Xia, Z. Lu, Y. Huang, Z. Shi, Y. Liu, H. Chen, Y. Chen, J. Zhou, and Y. Zhang, “MAGIC: Manifold and graph integrative convolutional network for low-dose CT reconstruction,” IEEE Trans. Med. Imag., vol. 40, no. 12, pp. 3459–3472, 2021.
[54] J. He, Y. Yang, Y. Wang, D. Zeng, Z. Bian, H. Zhang, J. Sun, Z. Xu, and J. Ma, “Optimizing a parameterized plug-and-play ADMM for iterative low-dose CT reconstruction,” IEEE Trans. Med. Imag., vol. 38, no. 2, pp. 371–382, 2018.
[55] I. Y. Chun, X. Zheng, Y. Long, and J. A. Fessler, “BCD-Net for low-dose CT reconstruction: Acceleration, convergence, and generalization,” in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 2019, pp. 31–40.
[56] I. Y. Chun and J. A. Fessler, “Deep BCD-net using identical encoding-decoding CNN structures for iterative image recovery,” in Proceedings of the IEEE Image, Video, and Multidimensional Signal Processing Workshop, 2018, pp. 1–5.
[57] S. Ye, Z. Li, M. T. McCann, Y. Long, and S. Ravishankar, “Unified supervised-unsupervised (SUPER) learning for X-ray CT image reconstruction,” IEEE Trans. Med. Imag., vol. 40, no. 11, pp. 2986–3001, 2021.
[58] G. Zang, R. Idoughi, R. Li, P. Wonka, and W. Heidrich, “IntraTomo: Self-supervised learning-based tomography via sinogram synthesis and prediction,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 1960–1970.
[59] M. Tancik, P. Srinivasan, B. Mildenhall, S. Fridovich-Keil, N. Raghavan, U. Singhal, R. Ramamoorthi, J. Barron, and R. Ng, “Fourier features let networks learn high frequency functions in low dimensional domains,” Proceedings of the Advances in Neural Information Processing Systems, vol. 33, pp. 7537–7547, 2020.
[60] K. Stierstorfer, A. Rauscher, J. Boese, H. Bruder, S. Schaller, and T. Flohr, “Weighted FBP—a simple approximate 3D FBP algorithm for multislice spiral CT with good dose usage for arbitrary pitch,” Physics in Medicine & Biology, vol. 49, no. 11, p. 2209, 2004.
[61] B. R. Whiting, P. Massoumzadeh, O. A. Earl, J. A. O’Sullivan, D. L. Snyder, and J. F. Williamson, “Properties of preprocessed sinogram data in x-ray computed tomography,” Medical Physics, vol. 33, no. 9, pp. 3290–3303, 2006.
[62] R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan, “Light field photography with a hand-held plenoptic camera,” Ph.D. dissertation, Stanford University, 2005.
[63] J. Hsieh, Computed tomography: principles, design, artifacts, and recent advances. SPIE press, 2003.
[64] M. Tassano, J. Delon, and T. Veit, “FastDVDnet: Towards real-time deep video denoising without flow estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 1354–1363.
[65] H. Wu, Y. Qu, S. Lin, J. Zhou, R. Qiao, Z. Zhang, Y. Xie, and L. Ma, “Contrastive learning for compact single image dehazing,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 10 551–10 560.
[66] T. Brooks, B. Mildenhall, T. Xue, J. Chen, D. Sharlet, and J. T. Barron, “Unprocessing images for learned raw denoising,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 11 036–11 045.
[67] L. Bao, Z. Yang, S. Wang, D. Bai, and J. Lee, “Real image denoising based on multi-scale residual dense block and cascaded U-Net with block-connection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 448–449.
[68] R. Zeng, C. Y. Lin, Q. Li, L. Jiang, M. Skopec, J. A. Fessler, and K. J. Myers, “Performance of a deep learning-based CT image denoising method: Generalizability over dose, reconstruction kernel, and slice thickness,” Medical Physics, vol. 49, no. 2, pp. 836–853, 2022.
[69] J. Xu, L. Zhang, and D. Zhang, “External prior guided internal prior learning for real-world noisy image denoising,” IEEE Trans. Image Process., vol. 27, no. 6, pp. 2996–3010, 2018.
[70] T. R. Moen, B. Chen, D. R. Holmes III, X. Duan, Z. Yu, L. Yu, S. Leng, J. G. Fletcher, and C. H. McCollough, “Low-dose CT image and projection dataset,” Medical Physics, vol. 48, no. 2, pp. 902–911, 2021.
[71] American Association of Physicists in Medicine. (2016) 2016 low dose CT grand challenge. [Online]. Available: https://www.aapm.org/grandchallenge/lowdosect/
[72] T. Flohr, K. Stierstorfer, S. Ulzheimer, H. Bruder, A. Primak, and C. McCollough, “Image reconstruction and image quality evaluation for a 64-slice CT scanner with z-flying focal spot,” Medical Physics, vol. 32, no. 8, pp. 2536–2547, 2005.
[73] J. M. Hoffman, F. Noo, S. Young, S. S. Hsieh, and M. McNitt-Gray, “FreeCT_ICD: An open-source implementation of a model-based iterative reconstruction method using coordinate descent optimization for CT imaging investigations,” Medical Physics, vol. 45, no. 8, pp. 3591–3603, 2018.
[74] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
[75] Z. Huang, J. Zhang, Y. Zhang, and H. Shan, “DU-GAN: Generative adversarial networks with dual-domain U-Net-based discriminators for low-dose CT denoising,” IEEE Trans. Instrum. Meas., vol. 71, pp. 1–12, 2021.
[76] D. Wang, F. Fan, Z. Wu, R. Liu, F. Wang, and H. Yu, “CTformer: Convolution-free token2token dilated vision transformer for low-dose CT denoising,” arXiv preprint arXiv:2202.13517, 2022.
[77] J. Greffier, Y. Barbotteau, and F. Gardavaud, “iQMetrix-CT: New software for task-based image quality assessment of phantom CT images,” Diagnostic and Interventional Imaging, vol. 103, no. 11, pp. 555–562, 2022.