¹ Sun Yat-sen University ² Aviation University of Air Force ³ Huawei Cloud
Email: {[email protected],guoyulan@sysu}.edu.cn

Preserving Full Degradation Details for Blind Image Super-Resolution

Hongda Liu¹, Longguang Wang², Ye Zhang¹, Kaiwen Xue³, Shunbo Zhou³, Yulan Guo¹

Abstract

The performance of image super-resolution relies heavily on the accuracy of degradation information, especially under blind settings. Due to the absence of true degradation models in real-world scenarios, previous methods learn distinct representations by distinguishing different degradations in a batch. However, the most significant degradation differences may provide shortcuts for the learning of representations such that subtle differences may be discarded. In this paper, we propose an alternative to learn degradation representations through reproducing degraded low-resolution (LR) images. By guiding the degrader to reconstruct input LR images, full degradation information can be encoded into the representations. In addition, we develop an energy distance loss to facilitate the learning of the degradation representations by introducing a bounded constraint. Experiments show that our representations can extract accurate and highly robust degradation information. Moreover, evaluations on both synthetic and real images demonstrate that our ReDSR achieves state-of-the-art performance for blind SR tasks.

Keywords: Image Super-Resolution · Preserving Degradation Information · Energy Distance Loss

1 Introduction

Single image super-resolution (SR) aims at reconstructing a high-resolution (HR) image from its low-resolution (LR) counterpart. As a typical inverse problem, SR is highly coupled with the degradation model [3]. Most early CNN-based methods [5, 40, 29, 52] were developed under the assumption that the degradation is known and fixed (e.g., bicubic downsampling). However, these methods suffer a severe performance drop when the real degradation differs from the assumed one. To remedy this, many efforts have been made to endow SR networks with the capability of handling various degradations [50, 43, 28, 20, 49, 39].

To ease the ill-posedness of the SR task under diverse degradations, numerous works first estimate the degradation and then use it as prior information for SR [13, 3, 20, 28]. However, these methods are sensitive to the estimated degradation, and the estimation error may be magnified by the SR network to produce severe artifacts. Inspired by contrastive learning [6, 15], Wang et al. [42] proposed to distinguish different degradations rather than explicitly estimate the degradation. Xia et al. [46] further enhanced the discriminability of degradation representations by employing knowledge distillation. However, the learned degradation representations cannot capture subtle degradation differences well, resulting in limited generalization performance.

Figure 1: An illustration of the degradation representation space. (a) Representations learned without a bounded constraint (e.g., DASR [42]). (b) Representations learned by our method with a bounded constraint.

In this paper, we propose an alternative to learn degradation representations by reproducing degraded LR images. Specifically, the LR image is first fed to an encoder to extract a degradation representation. Then, this degradation representation is leveraged by a degrader to reproduce the LR image from the HR image. By reproducing the LR image, full degradation details can be captured in a compact degradation representation. With the degradation information encoded, the degradation representation is passed to the generator to super-resolve the input LR image. Existing re-degradation methods [26, 21] only consider a single degradation type (e.g., blur) and cannot distinguish different degradations, resulting in representations with weak robustness. In real-world applications where images are corrupted by multiple degradations, different degradations must be distinguished so that the synthesized LR image maintains degradation consistency. To promote the learning of degradation representations, we introduce an energy distance loss to bound the learned representations in a pre-defined distribution (e.g., Gaussian distribution). Compared to previous approaches that learn representations in an unbounded space (Fig. 1(a)), our energy distance loss facilitates the extraction of more distinctive representations (Fig. 1(b)), especially for degradations unseen in the training set [2].

In summary, our contributions are three-fold:

• We propose an alternative to extract degradation information from LR images by learning degradation representation to guide the degrader to reproduce the input LR images.

• We introduce an energy distance loss to facilitate the learning of discriminative representations by constraining the representations in a bounded pre-defined space.

• Extensive experiments show that our method produces state-of-the-art performance on benchmark datasets.

2 Related Work

In this section, we first briefly review recent advances of blind image super-resolution methods. Then, we discuss energy-based models that are related to our work.

2.1 Blind Image Super-Resolution

Blind image super-resolution aims to super-resolve LR images with unknown degradations. Early methods [3, 13] commonly follow a two-step pipeline that first estimates the degradation model and then conducts image SR conditioned on the degradation. Specifically, Gu et al. [13] proposed an iterative kernel correction (IKC) method to alternately correct the estimated degradation and conduct image SR. Huang et al. [20] developed a deep alternating network (DAN) that iteratively estimates the degradation and restores the SR image. Liang et al. [28] proposed a mutual affine network (MANet) to exploit the interdependence between different channels by mutual affine transformation. Since numerous iterations are required to obtain accurate degradation information at test time, these methods are usually time-consuming.

Inspired by the developments of contrastive learning [6, 15], several efforts [42, 55, 44] have been made to leverage contrastive learning to extract discriminative representations to obtain degradation information. Wang et al. [42] first introduced degradation representation learning to distinguish different degradations in the representation space rather than explicit degradation estimation. Zhou et al. [55] proposed content-aware embedding to encode more information into the representations. Xia et al. [46] proposed to employ knowledge distillation to further improve the discriminability of degradation representations in a two-stage pipeline.

2.2 Energy-Based Model

Energy-based models (EBMs) have demonstrated great advantages in modeling data distributions for image generation [25, 24, 14, 53, 7]. Early EBMs [1, 17, 37, 38] formulated the energy function as a composition of latent and observable variables. Later EBMs [34, 16, 35] directly mapped image samples to representations in a certain distribution. However, the number of samples limits the quality of generated images [35]. To remedy this, Zhao et al. [53] combined GAN [10] and Auto-Encoder [18] to improve image quality. Previous works commonly minimize the L1/L2 distance between the data distribution and a pre-defined distribution for data modeling. Nevertheless, these methods suffer from slow convergence and low image quality [7].

Recently, Gretton et al. [11] proposed an empirical estimate of Maximum Mean Discrepancy (MMD) to measure the distance between two distributions. Compared with L1/L2 distance, MMD performs more stably and consistently in data distribution modeling [12, 27]. Later, Rizzo et al. [36] further simplified the MMD to an energy distance that is scalable to high-dimensional spaces and easier to implement [9].

3 Methodology

In this section, we first introduce the problem formulation of blind image SR. Then, we present our method in detail.

3.1 Problem Formulation

Generally, the degradation model of an LR image $I^{LR}$ can be formulated as follows:

$$I^{LR}=(I^{HR}\otimes k){\downarrow}_{s}+n, \tag{1}$$

where $I^{HR}$ is the HR image, $k$ is a blur kernel, $\otimes$ denotes the convolution operation, ${\downarrow}_{s}$ is the downsampling operation with scale factor $s$, and $n$ refers to Gaussian noise. Under blind settings, image SR aims at super-resolving input LR images without knowing the true degradation information.
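To make the degradation model in Eq. (1) concrete, the following is a minimal pure-Python sketch on a single-channel image stored as a list of lists. It is an illustration under stated assumptions, not the authors' data pipeline: the function names `isotropic_gaussian_kernel` and `degrade` are hypothetical, borders are handled by replicate padding, and plain subsampling stands in for the downsampling operator ${\downarrow}_{s}$, whose exact form the text does not specify.

```python
import math
import random


def isotropic_gaussian_kernel(size, sigma):
    """Return a normalized size x size isotropic Gaussian blur kernel."""
    c = (size - 1) / 2.0
    k = [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * sigma ** 2))
          for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]


def degrade(hr, kernel, scale, noise_sigma, rng):
    """Sketch of Eq. (1): blur, downsample by `scale`, add Gaussian noise."""
    h, w = len(hr), len(hr[0])
    ks = len(kernel)
    r = ks // 2
    blurred = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(ks):
                for dx in range(ks):
                    # Replicate padding: clamp coordinates at the border (assumption).
                    yy = min(max(y + dy - r, 0), h - 1)
                    xx = min(max(x + dx - r, 0), w - 1)
                    # Flipped kernel indices give a true convolution.
                    acc += kernel[ks - 1 - dy][ks - 1 - dx] * hr[yy][xx]
            blurred[y][x] = acc
    # Keeping every `scale`-th pixel stands in for the downsampler (assumption).
    return [[blurred[y][x] + rng.gauss(0.0, noise_sigma)
             for x in range(0, w, scale)]
            for y in range(0, h, scale)]
```

For example, `degrade(hr, isotropic_gaussian_kernel(21, 2.0), 4, 0.0, random.Random(0))` would produce a noise-free $\times 4$ LR image; changing the kernel or `noise_sigma` changes the degradation while the image content stays the same.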

3.2 Our Method

Our framework consists of an encoder, a degrader, and a generator, as illustrated in Fig. 2. During the training phase, the encoder and the degrader are employed to extract discriminative representations from LR images. Meanwhile, the generator incorporates the degradation representation to super-resolve the LR image. During the inference, only the encoder and the generator are employed to produce the SR result.

3.2.1 Degradation Representation Learning

Degradation representation learning aims to extract implicit degradation information from LR images in a self-supervised manner.
(1) Image Re-degradation: First, an LR image $I^{LR}_{1}$ is fed to an encoder to obtain the degradation representation $f\in\mathbb{R}^{C}$:

$$f={\rm Encoder}(I^{LR}_{1}), \tag{2}$$

where $C$ is the number of channels. The encoder consists of five $5\times 5$ convolutional layers without batch normalization (BN) layers.

Then, the degrader takes the extracted degradation representation $f$ and an HR image $I^{HR}_{2}\in\mathbb{R}^{H\times W\times 3}$ as input to reproduce the corresponding LR image $\hat{I}^{LR}_{2}\in\mathbb{R}^{\frac{H}{s}\times\frac{W}{s}\times 3}$:

$$\hat{I}_{2}^{LR}={\rm Degrader}(I_{2}^{HR},f), \tag{3}$$

where $s$ is the scale factor. Following DASR [42], we employ its SR module as our degrader, except that the last upscaler is replaced with a downscaler to produce pseudo LR images. Note that the degrader is only executed in the training phase.

Next, an L1 loss between the synthesized pseudo LR image and the input LR image is employed for optimization:

$$\mathcal{L}_{RD}=\|I^{LR}_{2}-\hat{I}^{LR}_{2}\|_{1}. \tag{4}$$

By encouraging the synthesized LR images to reproduce the diverse degradation details in the input LR images, full degradation information is captured in the degradation representations. Note that, to prevent the encoder from memorizing the degradation in $I^{LR}_{1}$ rather than learning general degradation information, $I^{LR}_{2}$ has different content from $I^{LR}_{1}$ but shares the same degradation.

(2) Energy Distance Loss: To constrain the learned representations in a bounded space for better generalization performance, an energy distance loss is introduced. Specifically, $b$ LR images are first randomly selected and encoded into $\{f_{1},f_{2},\dots,f_{b}\}$ using our encoder, where $f_{i}\in\mathbb{R}^{C}$ is the degradation representation of the $i^{\rm th}$ image. Then, we draw $m$ samples from a pre-defined distribution (e.g., Gaussian distribution), obtaining $\{t_{1},t_{2},\dots,t_{m}\}$, where $t_{j}\in\mathbb{R}^{C}$ is the $j^{\rm th}$ sample. Next, the energy distance loss is formulated as:

$$\mathcal{L}_{ED}=\frac{2}{bm}\sum_{i=1}^{b}\sum_{j=1}^{m}\|f_{i}-t_{j}\|_{2}-\frac{1}{b^{2}}\sum_{i=1}^{b}\sum_{j=1}^{b}\|f_{i}-f_{j}\|_{2}-\frac{1}{m^{2}}\sum_{i=1}^{m}\sum_{j=1}^{m}\|t_{i}-t_{j}\|_{2}. \tag{5}$$

The bounded constraint encourages the degradation representations to spread uniformly over the space. This prevents the degradation representations from crowding into a small region and helps the encoder better distinguish subtle degradation details.
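The energy distance loss in Eq. (5) can be evaluated directly from two sets of vectors. The sketch below is a plain-Python illustration rather than the authors' implementation: the helper names (`mean_pairwise_distance`, `energy_distance`, `sample_gaussian`) are hypothetical, and a standard Gaussian is assumed as the pre-defined distribution.

```python
import math
import random


def mean_pairwise_distance(a, b):
    """Mean Euclidean distance over all pairs (x, y), x in a and y in b."""
    total = 0.0
    for x in a:
        for y in b:
            total += math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
    return total / (len(a) * len(b))


def energy_distance(feats, targets):
    """Eq. (5): cross term minus the two within-set terms (i = j pairs included)."""
    return (2.0 * mean_pairwise_distance(feats, targets)
            - mean_pairwise_distance(feats, feats)
            - mean_pairwise_distance(targets, targets))


def sample_gaussian(m, channels, rng):
    """Draw m samples from a standard Gaussian in R^channels (assumed target)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(channels)] for _ in range(m)]
```

With `feats` holding the $b$ encoder outputs and `targets = sample_gaussian(64, C, rng)`, the value is zero when the two sets coincide and grows as the representations drift away from, or collapse within, the target distribution.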

(3) Discussion: Previous methods commonly use contrastive learning to extract discriminative representations by distinguishing different degradations [42]. As a result, these representations focus on the degradation differences in a batch while overlooking their common components. Since a finite batch size is not able to cover the whole degradation space, the learned representations cannot capture subtle degradation differences well. In contrast, representations learned by our method are expected to preserve full degradation details by reconstructing the input LR images, thereby obtaining more accurate degradation information.

3.2.2 Degradation-Aware SR

With degradation information being encoded into the representations, degradation-aware super-resolution is performed to super-resolve the input LR image conditioned on the degradation information, as illustrated in Fig. 2.

(1) Generator: The degradation-aware SR module in DASR [42] is employed as the generator to super-resolve the LR image. Specifically, the degradation representation is first fed to MLPs for feature compression and then passed to the generator to recover the missing details conditionally:

$$I^{SR}_{1}={\rm Generator}(I^{LR}_{1},f). \tag{6}$$

(2) Modulated Reconstruction Loss: As our generator performs conditional SR, the accuracy of the condition information (i.e., degradation information) determines the confidence of the sample. To make samples with higher confidence have more significant contributions to the optimization of our generator, we introduce a modulated reconstruction loss:

$$\mathcal{L}_{SR}=W\cdot\|I^{SR}_{1}-I_{1}^{HR}\|_{1}, \tag{7}$$

where $W$ is the modulation coefficient of the input sample and is defined as:

$$W=\frac{2}{1+C^{-1}}. \tag{8}$$

Here $C$ is the confidence of the input sample (not to be confused with the channel number $C$ in Eq. 2). Intuitively, we calculate the RMSE score $d$ between the reconstructed LR image $\hat{I}_{2}^{LR}$ and the input LR image $I_{2}^{LR}$, and then set $C=d^{-1}$ (i.e., we employ $d^{-1}$ as the confidence metric), which gives $W=2/(1+d)$. The more accurate the reproduced LR images are, the more accurate the degradation information is.
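As a worked illustration of Eq. (8), substituting $C=d^{-1}$ gives $W=2/(1+d)$, so a perfect re-degradation ($d=0$) yields $W=2$ and the weight decays toward 0 as the RMSE grows. The sketch below is only a hedged example: the function names are hypothetical, images are single-channel lists of lists, and the pixel value range (which affects the scale of $d$) is not specified in the text.

```python
import math


def rmse(a, b):
    """Root-mean-square error between two equally sized 2-D images."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a))


def modulation_weight(lr, lr_hat):
    """Eq. (8) with confidence C = 1/d, i.e. W = 2 / (1 + d)."""
    d = rmse(lr, lr_hat)
    # Written in terms of d so that d = 0 needs no division by zero.
    return 2.0 / (1.0 + d)
```

The weight returned here would multiply the L1 reconstruction term of Eq. (7) for the corresponding sample.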

4 Experiments

In this section, we first introduce the datasets and implementation details. Then, we conduct experiments on images with simple, complicated, and real degradations.

4.1 Datasets and Implementation Details

Following the protocol in [13, 42], we use 800 images in DIV2K [8] and 2650 images in Flickr2K [41] as the training set, and use 4 benchmark datasets (Set5 [4], Set14 [48], B100 [33] and Urban100 [19]) for evaluation. The kernel size is fixed to $21\times 21$. We first trained our method on isotropic Gaussian kernels. The range of the kernel width $\sigma$ was set to $[0.2,4.0]$ for $\times 4$ SR. Then, our framework was trained on more general degradations. Specifically, anisotropic Gaussian kernels with kernel widths $\sigma_{1},\sigma_{2}{\sim}U(0.2,4)$ and rotation angle $\theta{\sim}U(0,\pi)$ are employed. In addition, the range of the noise level is set to $[0,25]$. The size of the LR patch is set to $64\times 64$ for all experiments.
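The general degradation setting above can be sketched as follows. This is a minimal illustration and not the authors' code: `anisotropic_gaussian_kernel` and `sample_degradation` are hypothetical names, $\sigma_{1}$ and $\sigma_{2}$ are assumed to be standard deviations along the two principal axes, and the noise level is assumed to be drawn uniformly from $[0,25]$ (the text gives only the range).

```python
import math
import random


def anisotropic_gaussian_kernel(size, sigma1, sigma2, theta):
    """Normalized size x size Gaussian kernel with principal axes rotated by theta."""
    c = (size - 1) / 2.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kernel = []
    for y in range(size):
        row = []
        for x in range(size):
            u, v = x - c, y - c
            # Project the offset onto the rotated principal axes.
            a = cos_t * u + sin_t * v
            b = -sin_t * u + cos_t * v
            row.append(math.exp(-0.5 * ((a / sigma1) ** 2 + (b / sigma2) ** 2)))
        kernel.append(row)
    total = sum(sum(row) for row in kernel)
    return [[val / total for val in row] for row in kernel]


def sample_degradation(rng):
    """Draw one degradation setting from the ranges stated in the text."""
    return {
        "sigma1": rng.uniform(0.2, 4.0),
        "sigma2": rng.uniform(0.2, 4.0),
        "theta": rng.uniform(0.0, math.pi),
        "noise": rng.uniform(0.0, 25.0),  # uniform sampling is an assumption
    }
```

Calling `anisotropic_gaussian_kernel(21, d["sigma1"], d["sigma2"], d["theta"])` with `d = sample_degradation(rng)` gives one $21\times 21$ training kernel; setting `sigma1 == sigma2` recovers the isotropic case.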

Table 1: PSNR and SSIM results achieved on Gaussian8 kernels for $\times 4$ SR. Methods marked with ^∗ are degradation estimation based approaches, while other methods are degradation representation learning based approaches. Best and second best performance are in red and blue colors, respectively. Running time is averaged on Set14.

Methods	Param (M)	Time (ms)	Set5 PSNR	Set5 SSIM	Set14 PSNR	Set14 SSIM	B100 PSNR	B100 SSIM	Urban100 PSNR	Urban100 SSIM
MANet^∗ [28]	9.90	102	30.43	0.8213	27.40	0.7464	26.63	0.7231	24.88	0.7485
DARSR^∗ [54]	3.54	62160	28.45	0.7877	26.17	0.7143	25.20	0.6920	24.09	0.7135
DANv1^∗ [20]	4.33	151	31.89	0.8864	28.42	0.7687	27.51	0.7248	25.86	0.7721
DANv2^∗ [30]	4.71	152	32.00	0.8885	28.50	0.7715	27.56	0.7277	25.94	0.7748
DCLS^∗ [31]	13.63	133	32.12	0.8890	28.54	0.7728	27.60	0.7285	26.15	0.7809
DASR [42]	5.84	49	31.46	0.8789	28.11	0.7603	27.44	0.7214	25.36	0.7506
CDSR [55]	13.23	113	31.33	0.8328	27.90	0.7477	27.13	0.7046	25.25	0.7492
CMDSR [47]	1.48	40	29.10	0.8146	26.57	0.7239	26.19	0.6980	23.67	0.7211
MRDA [45]	5.84	57	31.98	0.8872	28.42	0.7671	27.55	0.7254	25.90	0.7734
KDSR [46]	5.80	63	32.02	0.8892	28.46	0.7761	27.52	0.7281	25.96	0.7760
ReDSR (Ours)	6.28	49	32.18	0.8918	28.50	0.7770	27.55	0.7285	26.27	0.8050
ReDSR-L (Ours)	10.43	93	32.30	0.9035	28.75	0.7930	27.79	0.7392	26.47	0.8102

During training, 16 HR images were randomly selected. Then, we randomly selected 16 degradation models from the above ranges to generate LR images. Next, 32 HR-LR patch pairs (2 patch pairs from each image, as illustrated in Sec. 3.2.1) were divided into 2 groups. One group was fed to the degrader while the other was fed to the generator and the encoder. The sample number $m$ in $\mathcal{L}_{ED}$ (Eq. 5) was set to 64. We adopted the Adam optimizer [22] with $\beta_{1}=0.9$ and $\beta_{2}=0.999$ for optimization. We trained the whole network for 1000 epochs. The initial learning rate was set to $0.0001$ and halved every 200 epochs. The overall loss function was defined as:

$$L=\lambda_{1}\mathcal{L}_{ED}+\mathcal{L}_{RD}+\mathcal{L}_{SR}, \tag{9}$$

where $\lambda_{1}=0.01$.
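The training schedule and the overall objective of Eq. (9) can be written down in a few lines. The following sketch is illustrative only: the function names are hypothetical, epochs are assumed to be counted from zero, and the three loss values are taken as already-computed scalars.

```python
def learning_rate(epoch, base_lr=1e-4, step=200, gamma=0.5):
    """Step decay: start at base_lr and halve it after every `step` epochs."""
    return base_lr * gamma ** (epoch // step)


def total_loss(l_ed, l_rd, l_sr, lambda_1=0.01):
    """Eq. (9): weighted energy distance loss plus the two reconstruction losses."""
    return lambda_1 * l_ed + l_rd + l_sr
```

Under these assumptions the learning rate over 1000 epochs takes five values, from $10^{-4}$ in epochs 0 to 199 down to $6.25\times 10^{-6}$ in epochs 800 to 999.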

4.2 Experiments on Simple Degradations

We first conduct experiments on simple degradations with only isotropic Gaussian kernels.

4.2.1 Performance Evaluation

We compare ReDSR to recent state-of-the-art blind SR methods, including MANet [28], DASR [42], CDSR [55], DAN [20], CMDSR [47], KDSR [46], MRDA [45] and DARSR [54]. Since MANet [28] is trained on anisotropic Gaussian kernels, we re-train it on isotropic Gaussian kernels for fair comparison. Since the pre-trained model of CDSR [55] is unavailable, we re-train it using the officially released code. MANet and DAN require the degradation information as supervision to estimate the degradation model of the LR image. Other methods extract degradation information from the LR images in a fully unsupervised manner. Quantitative results are listed in Table 1, with visualization results being provided in Fig. 3.

From Table 1 we can see that our method achieves the best performance, with ReDSR-L obtaining the highest PSNR and SSIM on all four datasets. Degradation estimation based methods (MANet and DAN) require numerous iterations to achieve accurate estimation of the degradation and suffer from relatively long inference time. In contrast, other methods achieve higher efficiency as they employ implicit degradation representations. Compared to DASR, our ReDSR produces significant accuracy improvements. This is because our re-degradation mechanism can better preserve degradation details such that superior performance is achieved. Figure 3 further compares the visualization results produced by different methods. As we can see, our ReDSR produces results with the best perceptual quality while other methods commonly suffer from blurring artifacts.

4.2.2 Model Analyses

(1) Re-degradation Loss $\mathcal{L}_{RD}$: The main idea of this paper is to extract degradation information by reproducing the LR images. To achieve this, the core design is the re-degradation loss $\mathcal{L}_{RD}$. To validate its effectiveness, we first introduce a model variant (model 1) by removing all proposed designs. Then, we add the re-degradation loss to model 1 to obtain model 2 for comparison. It can be observed from Table 2 that the re-degradation loss improves the performance of model 1 on all kernel widths. With our re-degradation loss, model 2 can extract accurate degradation information by reproducing the LR images such that higher PSNR scores are produced.

(2) Energy Distance Loss $\mathcal{L}_{ED}$: To promote the learning of degradation information, the energy distance loss $\mathcal{L}_{ED}$ is employed in our method. To demonstrate its contribution to the final performance, we add the energy distance loss to model 1 to obtain model 3 for comparison. As we can see, model 3 surpasses model 1 on all kernel widths. In addition, we further develop model 4 by adding the energy distance loss to model 2. With both $\mathcal{L}_{RD}$ and $\mathcal{L}_{ED}$, model 4 produces significantly higher PSNR scores than model 1.

We further visualize the degradation representations extracted from images with various degradations using the t-SNE method [32]. As we can see, models 1, 2, and 3 cannot distinguish different degradations in the representation space (Fig. 4(a-c)). In contrast, with both $\mathcal{L}_{RD}$ and $\mathcal{L}_{ED}$ , model 4 can well distinguish different degradations and gather them into discriminative clusters (Fig. 4(d)). We further conduct experiments to study the effect of different distribution types (i.e., Gaussian, uniform, and exponential distributions). As compared to models 5 and 6, model 4 produces comparable performance on different kernel widths. Moreover, the representations learned by model 4 are more discriminative. As a result, Gaussian distribution is used as the default setting for our method.

Table 2: PSNR results achieved on Urban100 for $\times 4$ SR. Note that we replace the encoder with 5 fully-connected layers to learn representations directly from the true degradation in model 9.

Method	$\mathcal{L}_{RD}$	$\mathcal{L}_{ED}$	$\mathcal{L}_{KL}$	Modulated Loss	Oracle Degradation	$\sigma=1.0$	$\sigma=1.8$	$\sigma=2.6$	$\sigma=3.4$
Model 1	✗	✗	✗	✗	✗	25.17	25.07	24.81	24.36
Model 2	✓	✗	✗	✗	✗	25.30	25.22	24.97	24.45
Model 3	✗	Gaussian	✗	✗	✗	25.26	25.20	24.88	24.42
Model 4	✓	Gaussian	✗	✗	✗	26.53	26.45	26.18	25.53
Model 5	✓	Uniform	✗	✗	✗	26.49	26.42	26.23	25.60
Model 6	✓	Exponential	✗	✗	✗	26.52	26.50	26.19	25.56
Model 7	✓	✗	Gaussian	✓	✗	26.37	26.30	25.98	25.41
Model 8 (Ours)	✓	Gaussian	✗	✓	✗	26.62	26.55	26.28	25.67
Model 9 (upper bound)	✓	Gaussian	✗	✓	✓	26.65	26.67	26.39	25.75

(3) Energy Distance Loss vs. KL Divergence Loss: The energy distance loss associates the representation distribution with a pre-defined distribution. As an alternative, KL divergence [23] is also capable of mapping the image space to a pre-defined distribution. To demonstrate the superiority of the energy distance loss, we replace $\mathcal{L}_{ED}$ with a KL divergence loss to obtain model 7 for comparison. Compared to model 7, our method (model 8) produces significant performance improvements, with PSNR gains of over 0.2 dB on all kernel widths. We also visualize the representations extracted from model 7 in Fig. 4(g). It can be observed that several degradations are not well distinguished. For example, representations for $\sigma=4.1$ (orange points) are mixed with the ones for $\sigma=5.0$ (purple points). In contrast, the representations learned by our method gather into several distinct clusters (Fig. 4(h)), which further demonstrates that accurate degradation information can be learned by our method. In addition, compared to DASR (Fig. 4(i)), our degradation representation can better distinguish subtle degradation differences.

(4) Evolution of Representation Clusters: During the training phase, ED loss helps the encoder to gradually distinguish different degradations (Fig. 5).

(5) Modulated Reconstruction Loss: The modulated reconstruction loss is developed to dynamically tune the weights of samples with different confidences to stabilize the optimization of the networks. To validate its effectiveness, we compare our model (model 8) to model 4, in which the modulated reconstruction loss is removed. As we can see, when the modulated reconstruction loss is removed, model 4 suffers a notable performance drop, especially on large kernel widths (e.g., 25.67 $\rightarrow$ 25.53 on $\sigma=3.4$).

(6) Oracle Degradation Representations: To study the upper bound of our method, we develop a model with oracle degradation information. Specifically, we replace the degradation encoder in the network with 5 fully-connected layers (model 9) to directly learn degradation representations from the true degradation models. It can be observed that model 9 produces the best results on all kernel widths. However, model 8 still achieves competitive performance against model 9, which demonstrates the high accuracy of the degradation representations learned by our method.

(7) Robustness of Degradation Representations: Our representation learning scheme aims at extracting content-invariant degradation information. To validate this, we conduct an experiment to study the effect of different image contents on our degradation representations. Specifically, given an HR image, we first generate an LR image $I_{1}$ using a Gaussian kernel $k$. Then we randomly select another 99 HR images to generate LR images using $k$. Afterwards, 100 degradation representations are extracted from these LR images, and each is used to super-resolve $I_{1}$. As shown in Fig. 6, our ReDSR achieves relatively stable performance. This further verifies that our ReDSR can extract degradation information from different image contents.

Table 3: PSNR results achieved on Urban100 for $\times 4$ SR, testing on anisotropic Gaussian blur and noise.

Method	#Params.	Time	Noise	Blur Kernel (nine anisotropic Gaussian kernels, one per column)
DnCNN [51] + DANv2 [30]	650K + 4.71M	155ms	0	25.57	25.45	25.40	25.27	25.17	25.22	25.03	24.90	24.70
			10	24.41	24.26	24.15	23.95	23.80	23.75	23.52	23.32	23.05
			20	23.65	23.49	23.37	23.20	23.01	22.78	22.55	22.30	22.04
DnCNN [51] +DCLS [31]	650K +19.05M	170ms	0	24.85	24.78	24.68	24.52	24.41	24.25	24.08	23.92	23.65
			10	23.83	23.75	23.60	23.42	23.22	23.01	22.77	22.49	22.15
			20	23.45	23.32	23.14	22.94	22.70	22.47	22.23	22.00	21.78
DASR [42]	5.84M	49ms	0	25.00	24.90	24.80	24.77	24.71	24.64	24.58	24.47	24.30
			10	24.07	23.93	23.77	23.56	23.37	23.20	23.02	22.82	22.63
			20	23.33	23.18	23.02	22.84	22.66	22.48	22.30	22.12	21.95
MRDA [45]	5.84M	57ms	0	25.43	25.38	25.29	25.19	25.13	25.03	24.93	24.74	24.52
			10	24.39	24.27	24.11	23.90	23.70	23.52	23.32	23.11	22.89
			20	23.57	23.44	23.28	23.10	22.93	22.75	22.56	22.38	22.20
KDSR [46]	5.80M	63ms	0	25.69	25.68	25.63	25.58	25.54	25.47	25.37	25.25	25.09
			10	24.58	24.48	24.33	24.13	23.93	23.75	23.54	23.32	23.12
			20	23.69	23.57	23.42	23.24	23.06	22.87	22.68	22.49	22.31
ReDSR (Ours)	6.28M	49ms	0	25.73	25.77	25.68	25.62	25.60	25.55	25.43	25.35	25.16
			10	24.74	24.63	24.50	24.31	24.11	23.92	23.71	23.50	23.28
			20	23.86	23.73	23.59	23.41	23.22	23.03	22.83	22.62	22.44

4.3 Experiments on General Degradations

We conduct experiments on general degradations with anisotropic Gaussian kernels and noises. Specifically, 9 anisotropic Gaussian kernels and different noise levels are employed.

(1) Performance Evaluation: It can be observed from Table 3 that our ReDSR outperforms other comparative methods on all blur kernels and noise levels. Specifically, DANv2 [30] performs favorably against another three methods (i.e., DCLS [31], DASR [42], MRDA [45]) but is time-consuming since numerous iterations are required. Compared to DAN, our method produces better performance in terms of both accuracy and efficiency. Visualization results achieved by different methods are illustrated in Fig. 7. As we can see, our ReDSR achieves better visual quality while other methods suffer from blurring artifacts.

(2) Visualization of Degradation Representation: Compared to simple degradations with only isotropic Gaussian kernels, general degradations are more difficult to distinguish. We further visualize the degradation representations in Fig. 9 using the t-SNE method. From the last two rows, we can see that DASR and KDSR cannot distinguish different degradations well. In particular, degradation representations extracted from images with different noise levels are mixed together in KDSR's representation space, while DASR is confused by different Gaussian kernels. In contrast, our ReDSR produces more distinctive clusters, which demonstrates its effectiveness in extracting accurate degradation information.

Our ReDSR employs an energy distance loss to associate the distribution of the learned representations with a pre-defined distribution. This bounded constraint enables the learning of a compact representation space with a bijective mapping to the degradation space. To demonstrate this, we synthesize LR images with six different degradations (random Gaussian blur kernels and Gaussian noise levels) and visualize their representations in Fig. 9. In addition, we also randomly sample 1000 representations from a pre-defined distribution. Without a bounded constraint, the representations extracted by DASR and KDSR collapse into a small subspace of the Gaussian distribution. In contrast, the representations learned by our ReDSR span the pre-defined distribution, establishing a bijective mapping and facilitating the learning of distinctive representations, especially for unseen degradations.

4.4 Experiments on Real Degradations

(1) Performance Evaluation: We further conduct experiments on real degradations to demonstrate the generalization capability of our method. Following [42], ReDSR trained on isotropic Gaussian kernels is used for evaluation on real images. Visualization results are shown in Fig. 10. It can be observed that ReDSR produces high-quality images with clearer details and fewer blurring artifacts. For example, in the first scene, the text in the SR results obtained by previous methods is blurry. In contrast, our ReDSR produces results with sharper edges and higher perceptual quality.

(2) Generalization Improvement by ED Loss: As shown in Fig. 9, degradation representations generated by DASR/KDSR collapse into a subspace of the Gaussian distribution. This indicates that subtle degradation differences cannot be distinguished. With our ED loss, the representations are embedded into a regular pre-defined space, which helps decouple different degradations and amplifies the differences between them. As a result, the differences between various degradations can be well recognized to generate more discriminative and robust representations.

Table 4: PSNR results on Urban100 for models trained on isotropic kernels. Note that, LR images are generated by anisotropic Gaussian kernels.

$\frac{\sigma_{1}}{\sigma_{2}}$	$\frac{0.5}{0.2}$	$\frac{1.4}{0.8}$	$\frac{2.3}{1.4}$	$\frac{3.2}{2.0}$	$\frac{4.1}{2.6}$	$\frac{5.0}{3.2}$
ReDSR (Ours)	26.57	26.57	26.41	25.85	24.71	23.68
KDSR [46]	26.35	26.08	25.25	24.21	23.28	22.52
DASR [42]	25.83	25.54	24.86	23.93	23.12	22.41
DASR [42] + ED loss	25.90	25.75	25.36	24.68	23.60	22.79

As shown in Fig. 11(a), we use model 8 in Table 2 (trained on isotropic Gaussian kernels) to extract representations from LR images degraded by anisotropic Gaussian kernels. These representations also form distinct clusters. To validate the universality of the ED loss, we retrain a contrastive learning method (i.e., DASR) on isotropic Gaussian kernels with a bounded constraint on the representations. Figure 11(c) shows more distinct clusters than the original DASR. Quantitative results are listed in Table 4. ReDSR is stable on unseen degradations while the other methods suffer from a severe performance drop. Besides, the ED loss helps DASR generate robust representations and obtain higher and more stable PSNR scores. The ED loss enhances the robustness of the representation space and is easy to couple with other representation-based methods.

5 Conclusion

In this paper, we propose an alternative to learn degradation representations by reproducing LR images. In addition, we introduce an energy distance loss to associate learned representations with a pre-defined distribution for superior generalization capability. It is demonstrated that our degradation representation learning scheme can extract discriminative representations to obtain accurate degradation information. Experimental results show that our network achieves state-of-the-art performance for blind SR with various degradations.

References

[1] Ackley, D.H., Hinton, G.E., Sejnowski, T.J.: A learning algorithm for Boltzmann machines. Cognitive Science 9(1), 147–169 (1985)
[2] Bao, J., Chen, D., Wen, F., Li, H., Hua, G.: CVAE-GAN: fine-grained image generation through asymmetric training. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2745–2754 (2017)
[3] Bell-Kligler, S., Shocher, A., Irani, M.: Blind super-resolution kernel estimation using an internal-GAN. Advances in Neural Information Processing Systems 32 (2019)
[4] Bevilacqua, M., Roumy, A., Guillemot, C., Morel, M.L.A.: Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In: British Machine Vision Conference (BMVC) (2012)
[5] Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(2), 295–307 (2015)
[6] Dosovitskiy, A., Springenberg, J.T., Riedmiller, M., Brox, T.: Discriminative unsupervised feature learning with convolutional neural networks. Advances in Neural Information Processing Systems 27 (2014)
[7] Du, Y., Mordatch, I.: Implicit generation and modeling with energy based models. Advances in Neural Information Processing Systems 32 (2019)
[8] Agustsson, E., Timofte, R.: Ntire 2017 challenge on single image super-resolution: dataset and study. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. pp. 126–135 (2017)
[9] Goldenberg, I., Webb, G.I.: Survey of distance measures for quantifying concept drift and shift in numeric data. Knowledge and Information Systems 60(2), 591–615 (2019)
[10] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. Advances in Neural Information Processing Systems 27 (2014)
[11] Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., Smola, A.: A kernel method for the two-sample-problem. Advances in Neural Information Processing Systems 19 (2006)
[12] Gritsenko, A., Salimans, T., van den Berg, R., Snoek, J., Kalchbrenner, N.: A spectral energy distance for parallel speech synthesis. Advances in Neural Information Processing Systems 33, 13062–13072 (2020)
[13] Gu, J., Lu, H., Zuo, W., Dong, C.: Blind super-resolution with iterative kernel correction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1604–1613 (2019)
[14] Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06). vol. 2, pp. 1735–1742. IEEE (2006)
[15] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9729–9738 (2020)
[16] Hinton, G., Osindero, S., Welling, M., Teh, Y.W.: Unsupervised discovery of nonlinear structure using contrastive backpropagation. Cognitive Science 30(4), 725–731 (2006)
[17] Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Computation 14(8), 1771–1800 (2002)
[18] Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
[19] Huang, J.B., Singh, A., Ahuja, N.: Single image super-resolution from transformed self-exemplars. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 5197–5206 (2015)
[20] Huang, Y., Li, S., Wang, L., Tan, T., et al.: Unfolding the alternating optimization for blind super resolution. Advances in Neural Information Processing Systems 33, 5632–5643 (2020)
[21] Kim, S.Y., Sim, H., Kim, M.: Koalanet: Blind super-resolution using kernel-oriented adaptive local adjustment. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10611–10620 (2021)
[22] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
[23] Kullback, S., Leibler, R.A.: On information and sufficiency. The Annals of Mathematical Statistics 22(1), 79–86 (1951)
[24] Kumar, S., Hebert, M.: Discriminative fields for modeling spatial dependencies in natural images. Advances in Neural Information Processing Systems 16 (2003)
[25] LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., Huang, F.: A tutorial on energy-based learning. Predicting Structured Data 1(0) (2006)
[26] Li, D., Zhang, Y., Cheung, K.C., Wang, X., Qin, H., Li, H.: Learning degradation representations for image deblurring. In: European Conference on Computer Vision. pp. 736–753. Springer (2022)
[27] Li, Y., Swersky, K., Zemel, R.: Generative moment matching networks. In: International Conference on Machine Learning. pp. 1718–1727. PMLR (2015)
[28] Liang, J., Sun, G., Zhang, K., Van Gool, L., Timofte, R.: Mutual affine network for spatially variant kernel estimation in blind image super-resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4096–4105 (2021)
[29] Lim, B., Son, S., Kim, H., Nah, S., Mu Lee, K.: Enhanced deep residual networks for single image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. pp. 136–144 (2017)
[30] Luo, Z., Huang, Y., Li, S., Wang, L., Tan, T.: End-to-end alternating optimization for real-world blind super resolution. International Journal of Computer Vision 131(12), 3152–3169 (2023)
[31] Luo, Z., Huang, H., Yu, L., Li, Y., Fan, H., Liu, S.: Deep constrained least squares for blind image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 17642–17652 (2022)
[32] Van der Maaten, L., Hinton, G.: Visualizing data using t-sne. Journal of Machine Learning Research 9(11) (2008)
[33] Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001. vol. 2, pp. 416–423. IEEE (2001)
[34] Mnih, A., Hinton, G.: Learning nonlinear constraints with contrastive backpropagation. In: Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005. vol. 2, pp. 1302–1307. IEEE (2005)
[35] Ranzato, M., Poultney, C., Chopra, S., LeCun, Y.: Efficient learning of sparse representations with an energy-based model. Advances in Neural Information Processing Systems 19 (2006)
[36] Rizzo, M.L., Székely, G.J.: Energy distance. Wiley Interdisciplinary Reviews: Computational Statistics 8(1), 27–38 (2016)
[37] Salakhutdinov, R., Hinton, G.: Deep boltzmann machines. In: Artificial Intelligence and Statistics. pp. 448–455. PMLR (2009)
[38] Salakhutdinov, R., Larochelle, H.: Efficient learning of deep boltzmann machines. In: Proceedings of The Thirteenth International Conference on Artificial Intelligence and Statistics. pp. 693–700. JMLR Workshop and Conference Proceedings (2010)
[39] Shocher, A., Cohen, N., Irani, M.: “zero-shot” super-resolution using deep internal learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3118–3126 (2018)
[40] Tai, Y., Yang, J., Liu, X.: Image super-resolution via deep recursive residual network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3147–3155 (2017)
[41] Timofte, R., Agustsson, E., Van Gool, L., Yang, M.H., Zhang, L.: Ntire 2017 challenge on single image super-resolution: Methods and results. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. pp. 114–125 (2017)
[42] Wang, L., Wang, Y., Dong, X., Xu, Q., Yang, J., An, W., Guo, Y.: Unsupervised degradation representation learning for blind super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10581–10590 (2021)
[43] Wang, X., Xie, L., Dong, C., Shan, Y.: Real-esrgan: Training real-world blind super-resolution with pure synthetic data. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 1905–1914 (2021)
[44] Wang, Y., Ming, J., Jia, X., Elder, J.H., Lu, H.: Blind image super-resolution with degradation-aware adaptation. In: Proceedings of the Asian Conference on Computer Vision. pp. 894–910 (2022)
[45] Xia, B., Tian, Y., Zhang, Y., Hang, Y., Yang, W., Liao, Q.: Meta-learning based degradation representation for blind super-resolution. IEEE Transactions on Image Processing (2023)
[46] Xia, B., Zhang, Y., Wang, Y., Tian, Y., Yang, W., Timofte, R., Van Gool, L.: Knowledge distillation based degradation estimation for blind super-resolution. In: The Eleventh International Conference on Learning Representations (2022)
[47] Yin, G., Wang, W., Yuan, Z., Ji, W., Yu, D., Sun, S., Chua, T.S., Wang, C.: Conditional hyper-network for blind super-resolution with multiple degradations. IEEE Transactions on Image Processing 31, 3949–3960 (2022)
[48] Zeyde, R., Elad, M., Protter, M.: On single image scale-up using sparse-representations. In: Curves and Surfaces: 7th International Conference, Avignon, France, June 24-30, 2010, Revised Selected Papers 7. pp. 711–730. Springer (2012)
[49] Zhang, K., Gool, L.V., Timofte, R.: Deep unfolding network for image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3217–3226 (2020)
[50] Zhang, K., Liang, J., Van Gool, L., Timofte, R.: Designing a practical degradation model for deep blind image super-resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4791–4800 (2021)
[51] Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L.: Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Transactions on Image Processing 26(7), 3142–3155 (2017)
[52] Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., Fu, Y.: Image super-resolution using very deep residual channel attention networks. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 286–301 (2018)
[53] Zhao, J., Mathieu, M., LeCun, Y.: Energy-based generative adversarial networks. In: International Conference on Learning Representations (2016)
[54] Zhou, H., Zhu, X., Zhu, J., Han, Z., Zhang, S.X., Qin, J., Yin, X.C.: Learning correction filter via degradation-adaptive regression for blind single image super-resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 12365–12375 (2023)
[55] Zhou, Y., Lin, C., Luo, D., Liu, Y., Tai, Y., Wang, C., Chen, M.: Joint learning content and degradation aware feature for blind super-resolution. In: Proceedings of the 30th ACM International Conference on Multimedia. pp. 2606–2616 (2022)