
(1) Sun Yat-sen University; (2) Aviation University of Air Force; (3) Huawei Cloud
Email: {[email protected],guoyulan@sysu}.edu.cn

Preserving Full Degradation Details for Blind Image Super-Resolution

Hongda Liu (1), Longguang Wang (2), Ye Zhang (1), Kaiwen Xue (3), Shunbo Zhou (3), Yulan Guo (1)
Abstract

The performance of image super-resolution relies heavily on the accuracy of degradation information, especially under blind settings. Due to the absence of true degradation models in real-world scenarios, previous methods learn distinct representations by distinguishing different degradations in a batch. However, the most significant degradation differences may provide shortcuts for representation learning, such that subtle differences may be discarded. In this paper, we propose an alternative that learns degradation representations by reproducing degraded low-resolution (LR) images. By guiding the degrader to reconstruct input LR images, full degradation information can be encoded into the representations. In addition, we develop an energy distance loss to facilitate the learning of the degradation representations by introducing a bounded constraint. Experiments show that our representations can extract accurate and highly robust degradation information. Moreover, evaluations on both synthetic and real images demonstrate that our ReDSR achieves state-of-the-art performance for blind SR tasks.

Keywords:
Image Super-Resolution · Preserving Degradation Information · Energy Distance Loss

1 Introduction

Single image super-resolution (SR) aims at reconstructing a high-resolution (HR) image from its low-resolution (LR) counterpart. As a typical inverse problem, SR is highly coupled with the degradation model [3]. Most early CNN-based methods [5, 40, 29, 52] were developed under the assumption that the degradation is known and fixed (e.g., bicubic downsampling). However, these methods suffer a severe performance drop when the real degradation differs from the assumed one. To remedy this, many efforts have been made to endow SR networks with the capability of handling various degradations [50, 43, 28, 20, 49, 39].

To ease the ill-posedness of the SR task under diverse degradations, numerous works first estimate the degradation and then use it as prior information for SR [13, 3, 20, 28]. However, these methods are sensitive to the estimated degradation, and the estimation error may be magnified by the SR network to produce severe artifacts. Inspired by contrastive learning [6, 15], Wang et al. [42] proposed to distinguish different degradations rather than explicitly estimate the degradation. Xia et al. [46] further enhanced the discriminability of degradation representations by employing knowledge distillation. However, the learned degradation representations cannot capture subtle degradation differences well, resulting in limited generalization performance.

Figure 1: An illustration of degradation representation space. (a) Representations learned without a bounded constraint (e.g., DASR [42]). (b) Representations learned by our method with a bounded constraint.

In this paper, we propose an alternative that learns degradation representations by reproducing degraded LR images. Specifically, the LR image is first fed to an encoder to extract a degradation representation. Then, this degradation representation is leveraged by a degrader to reproduce the LR image from the HR image. By reproducing the LR image, full degradation details can be captured in a compact degradation representation. With the degradation information encoded, the degradation representation is passed to the generator to super-resolve the input LR image. Existing re-degradation methods [26, 21] only consider a single degradation type (e.g., blur) and cannot distinguish different degradations, resulting in representations with weak robustness. In real-world applications where images are corrupted by multiple degradations, different degradations must be distinguished so that the synthesized LR image maintains degradation consistency. To promote the learning of degradation representations, we introduce an energy distance loss that bounds the learned representations within a pre-defined distribution (e.g., a Gaussian distribution). Compared to previous approaches that learn representations in an unbounded space (Fig. 1(a)), our energy distance loss facilitates the extraction of more distinctive representations (Fig. 1(b)), especially for degradations unseen in the training set [2].

In summary, our contributions are three-fold:

  • We propose an alternative to extract degradation information from LR images by learning degradation representation to guide the degrader to reproduce the input LR images.

  • We introduce an energy distance loss to facilitate the learning of discriminative representations by constraining the representations in a bounded pre-defined space.

  • Extensive experiments show that our method produces state-of-the-art performance on benchmark datasets.

2 Related Work

In this section, we first briefly review recent advances of blind image super-resolution methods. Then, we discuss energy-based models that are related to our work.

2.1 Blind Image Super-Resolution

Blind image super-resolution aims to super-resolve LR images with unknown degradations. Early methods [3, 13] commonly follow a two-step pipeline that first estimates the degradation model and then conducts image SR conditioned on the degradation. Specifically, Gu et al. [13] proposed an iterative kernel correction (IKC) method to alternately correct the estimated degradation and conduct image SR. Huang et al. [20] developed a deep alternating network (DAN) that iteratively estimates the degradation and restores the SR image. Liang et al. [28] proposed a mutual affine network (MANet) to exploit the interdependence between different channels by mutual affine transformation. Since numerous iterations are required to obtain accurate degradation information at test time, these methods are usually time-consuming.

Inspired by the developments of contrastive learning [6, 15], several efforts [42, 55, 44] have been made to leverage contrastive learning to extract discriminative representations to obtain degradation information. Wang et al. [42] first introduced degradation representation learning to distinguish different degradations in the representation space rather than explicit degradation estimation. Zhou et al. [55] proposed content-aware embedding to encode more information into the representations. Xia et al. [46] proposed to employ knowledge distillation to further improve the discriminability of degradation representations in a two-stage pipeline.

2.2 Energy-Based Model

Energy-based models (EBMs) have demonstrated great advantages in modeling data distributions for image generation [25, 24, 14, 53, 7]. Early EBMs [1, 17, 37, 38] formulated the energy function as a composition of latent and observable variables. Later EBMs [34, 16, 35] directly mapped image samples to representations in a certain distribution. However, the number of samples limits the quality of generated images [35]. To remedy this, Zhao et al. [53] combined GAN [10] and Auto-Encoder [18] to improve image quality. Previous works commonly minimize the L1/L2 distance between the data distribution and a pre-defined distribution for data modeling. Nevertheless, these methods suffer from slow convergence and low image quality [7].

Recently, Gretton et al. [11] proposed an empirical estimate of Maximum Mean Discrepancy (MMD) to measure the distance between two distributions. Compared with the L1/L2 distance, MMD performs more stably and consistently in data distribution modeling [12, 27]. Later, Rizzo et al. [36] further simplified the MMD to an energy distance that is scalable to high-dimensional spaces and easier to implement [9].

3 Methodology

In this section, we first introduce the problem formulation of blind image SR. Then, we present our method in detail.

3.1 Problem Formulation

Generally, the degradation model of an LR image $I^{LR}$ can be formulated as follows:

$$I^{LR} = (I^{HR} \otimes k){\downarrow}_{s} + n, \tag{1}$$

where $I^{HR}$ is the HR image, $k$ is a blur kernel, $\otimes$ denotes the convolution operation, ${\downarrow}_{s}$ is the downsampling operation controlled by the scale factor $s$, and $n$ refers to Gaussian noise. Under blind settings, image SR aims at super-resolving input LR images without knowing the true degradation information.
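As an illustration of Eq. (1), the following NumPy sketch blurs an HR image with a kernel, downsamples it, and adds Gaussian noise. It is a minimal sketch, not the authors' data pipeline: the function names `isotropic_gaussian_kernel` and `degrade`, the reflect padding, and the strided (direct) downsampling are assumptions made for illustration, since the text does not specify the downsampling operator.

```python
import numpy as np

def isotropic_gaussian_kernel(size=21, sigma=2.0):
    """Normalized isotropic Gaussian blur kernel k of shape (size, size)."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def degrade(hr, kernel, scale=4, noise_level=0.0, rng=None):
    """Eq. (1): (hr convolved with kernel), downsampled by `scale`, plus noise.

    hr is a float array of shape (H, W, 3); H and W should be divisible by scale.
    """
    rng = np.random.default_rng() if rng is None else rng
    pad = kernel.shape[0] // 2
    # Reflect padding (assumed) keeps the blurred image at the HR size.
    padded = np.pad(hr, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    # Windows have shape (H, W, 3, ksize, ksize).
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, kernel.shape, axis=(0, 1))
    # Flip the kernel so that the weighted sum is a true convolution.
    blurred = np.einsum("hwcij,ij->hwc", windows, kernel[::-1, ::-1])
    # Strided sampling is an assumed stand-in for the downsampling operator.
    lr = blurred[::scale, ::scale]
    if noise_level > 0:
        lr = lr + rng.normal(0.0, noise_level, size=lr.shape)
    return lr
```

For example, `degrade(hr, isotropic_gaussian_kernel(21, 2.0), scale=4, noise_level=10.0)` maps a 256×256 HR patch to a 64×64 LR patch.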

Figure 2: An overview of our ReDSR framework.

3.2 Our Method

Our framework consists of an encoder, a degrader, and a generator, as illustrated in Fig. 2. During the training phase, the encoder and the degrader are employed to extract discriminative representations from LR images. Meanwhile, the generator incorporates the degradation representation to super-resolve the LR image. During inference, only the encoder and the generator are employed to produce the SR result.

3.2.1 Degradation Representation Learning

Degradation representation learning aims to extract implicit degradation information from LR images in a self-supervised manner.
(1) Image Re-degradation: First, an LR image $I^{LR}_{1}$ is fed to an encoder to obtain the degradation representation $f \in \mathbb{R}^{C}$:

$$f = {\rm Encoder}(I^{LR}_{1}), \tag{2}$$

where $C$ is the number of channels. The encoder consists of five $5 \times 5$ convolutional layers without batch normalization (BN) layers.

Then, the degrader takes the extracted degradation representation $f$ and an HR image $I^{HR}_{2} \in \mathbb{R}^{H \times W \times 3}$ as input to reproduce the corresponding LR image $\hat{I}^{LR}_{2} \in \mathbb{R}^{\frac{H}{s} \times \frac{W}{s} \times 3}$:

$$\hat{I}_{2}^{LR} = {\rm Degrader}(I_{2}^{HR}, f), \tag{3}$$

where $s$ is the scale factor. Following DASR [42], we employ its SR module as our degrader, except that the last upscaler is replaced with a downscaler to produce pseudo LR images. Note that the degrader is only executed in the training phase.

Next, an L1 loss between the synthesized pseudo LR image and the input LR image is employed for optimization:

$$\mathcal{L}_{RD} = \|I^{LR}_{2} - \hat{I}^{LR}_{2}\|_{1}. \tag{4}$$

By encouraging the synthesized LR images to reproduce the diverse degradation details in the input LR images, full degradation information is captured in the degradation representations. Note that, to prevent the encoder from memorizing the degradation in $I^{LR}_{1}$ rather than learning general degradation information, $I^{LR}_{2}$ has different content from $I^{LR}_{1}$ but shares the same degradation.
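The data flow of Eqs. (2)-(4) can be sketched as follows. This is only an illustration of which image goes where: `encoder` and `degrader` are hypothetical callables standing in for the networks, and averaging the absolute error over pixels (rather than summing) is an assumption.

```python
import numpy as np

def l1_loss(a, b):
    """Mean absolute error (the averaging is an assumed reduction)."""
    return float(np.mean(np.abs(a - b)))

def re_degradation_loss(encoder, degrader, lr_1, hr_2, lr_2):
    """Re-degradation loss for one pair of patches sharing a degradation.

    lr_1 and lr_2 come from different contents degraded in the same way.
    """
    f = encoder(lr_1)               # Eq. (2): representation from patch 1
    lr_2_hat = degrader(hr_2, f)    # Eq. (3): reproduce patch 2 from its HR
    return l1_loss(lr_2, lr_2_hat)  # Eq. (4)
```

Because the representation is extracted from one patch and used to reproduce a different one, the only information shared between the two branches is the degradation itself.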

(2) Energy Distance Loss: To constrain the learned representations in a bounded space for better generalization performance, an energy distance loss is introduced. Specifically, $b$ LR images are first randomly selected and encoded into $\{f_{1}, f_{2}, \ldots, f_{b}\}$ using our encoder, where $f_{i} \in \mathbb{R}^{C}$ is the degradation representation of the $i^{\rm th}$ image. Then, we draw $m$ samples from a pre-defined distribution (e.g., Gaussian distribution), obtaining $\{t_{1}, t_{2}, \ldots, t_{m}\}$. Here, $t_{j} \in \mathbb{R}^{C}$ is the $j^{\rm th}$ sample. Next, the energy distance loss is formulated as:

$$\mathcal{L}_{ED} = \frac{2}{bm}\sum_{i=1}^{b}\sum_{j=1}^{m}\|f_{i} - t_{j}\|_{2} - \frac{1}{b^{2}}\sum_{i=1}^{b}\sum_{j=1}^{b}\|f_{i} - f_{j}\|_{2} - \frac{1}{m^{2}}\sum_{i=1}^{m}\sum_{j=1}^{m}\|t_{i} - t_{j}\|_{2}. \tag{5}$$

The bounded constraint ensures the uniformity of degradation representations in space. This prevents the degradation representations from crowding into a small region and helps the encoder better distinguish subtle degradation details.
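Eq. (5) can be computed directly from pairwise Euclidean distances. The NumPy sketch below is one possible implementation under the assumption that representations and target samples are stored as row vectors; the function names are illustrative and not taken from the authors' code.

```python
import numpy as np

def pairwise_l2(a, b):
    """Euclidean distances between rows of a (n, C) and rows of b (k, C)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def energy_distance_loss(f, t):
    """Eq. (5) for representations f (b, C) and target samples t (m, C)."""
    cross = pairwise_l2(f, t).mean()     # (1 / bm)  * sum ||f_i - t_j||
    within_f = pairwise_l2(f, f).mean()  # (1 / b^2) * sum ||f_i - f_j||
    within_t = pairwise_l2(t, t).mean()  # (1 / m^2) * sum ||t_i - t_j||
    return 2.0 * cross - within_f - within_t
```

A possible usage, with a hypothetical channel count of 256: `t = np.random.default_rng(0).standard_normal((64, 256))` draws the $m = 64$ Gaussian targets, and `energy_distance_loss(f, t)` is zero when the two sample sets coincide and grows as the representations drift away from the target distribution.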

(3) Discussion: Previous methods commonly use contrastive learning to extract discriminative representations by distinguishing different degradations [42]. As a result, these representations focus on the degradation differences in a batch while overlooking their common components. Since a finite batch cannot cover the whole degradation space, the learned representations cannot capture subtle degradation differences well. In contrast, representations learned by our method are expected to preserve full degradation details by reconstructing the input LR images, thereby obtaining more accurate degradation information.

3.2.2 Degradation-Aware SR

With degradation information being encoded into the representations, degradation-aware super-resolution is performed to super-resolve the input LR image conditioned on the degradation information, as illustrated in Fig. 2.

(1) Generator: The degradation-aware SR module in DASR [42] is employed as the generator to super-resolve the LR image. Specifically, the degradation representation is first fed to MLPs for feature compression and then passed to the generator to recover the missing details conditionally:

$$I^{SR}_{1} = {\rm Generator}(I^{LR}_{1}, f). \tag{6}$$

(2) Modulated Reconstruction Loss: As our generator performs conditional SR, the accuracy of the condition information (i.e., degradation information) determines the confidence of the sample. To make samples with higher confidence contribute more to the optimization of our generator, we introduce a modulated reconstruction loss:

$$\mathcal{L}_{SR} = W \cdot \|I^{SR}_{1} - I_{1}^{HR}\|_{1}, \tag{7}$$

where $W$ is the modulation coefficient of the input sample and is defined as:

$$W = \frac{2}{1 + C^{-1}}. \tag{8}$$

Here $C$ is the confidence of the input sample. Intuitively, we calculate the RMSE score $d$ between the reconstructed LR image $\hat{I}_{2}^{LR}$ and the input LR image $I_{2}^{LR}$, then set $C$ equal to $d^{-1}$ (i.e., we employ $d^{-1}$ as the confidence metric). The more accurate the reproduced LR images are, the more accurate the degradation information is.
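Substituting $C = d^{-1}$ into Eq. (8) gives $W = 2 / (1 + d)$, so $W$ approaches 2 for a perfectly reproduced LR image and decays toward 0 as the RMSE grows. The sketch below illustrates Eqs. (7) and (8) under stated assumptions: the small constant `eps` that guards against division by zero, the mean reduction of the L1 term, and the pixel range used for the RMSE are not specified in the text.

```python
import numpy as np

def modulation_weight(lr_hat, lr, eps=1e-8):
    """Eq. (8) with confidence C = 1 / d, where d is the RMSE of the degrader."""
    d = np.sqrt(np.mean((lr_hat - lr) ** 2))  # RMSE between reproduced and input LR
    confidence = 1.0 / (d + eps)              # C = d^{-1}; eps is an assumed guard
    return 2.0 / (1.0 + 1.0 / confidence)     # equals 2 / (1 + d + eps)

def modulated_sr_loss(sr, hr, lr_hat, lr):
    """Eq. (7): the L1 reconstruction error scaled by the sample weight W."""
    w = modulation_weight(lr_hat, lr)
    return w * np.mean(np.abs(sr - hr))
```

Whether $W$ is treated as a constant during backpropagation is not stated in the text; in an autograd framework that choice would need to be made explicitly.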

4 Experiments

In this section, we first introduce the datasets and implementation details. Then, we conduct experiments on images with simple, complicated, and real degradations.

4.1 Datasets and Implementation Details

Following the protocol in [13, 42], we use 800 images in DIV2K [8] and 2650 images in Flickr2K [41] as the training set, and include 4 benchmark datasets (Set5 [4], Set14 [48], B100 [33] and Urban100 [19]) for evaluation. The kernel size is fixed to $21 \times 21$. We first trained our method on isotropic Gaussian kernels. The range of kernel width $\sigma$ was set to $[0.2, 4.0]$ for $\times 4$ SR. Then, our framework was trained on more general degradations. Specifically, anisotropic Gaussian kernels with kernel widths $\sigma_{1}, \sigma_{2} \sim U(0.2, 4)$ and rotation angle $\theta \sim U(0, \pi)$ are employed. In addition, the range of noise level is set to $[0, 25]$. The size of LR patches is set to $64 \times 64$ for all experiments.
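The anisotropic degradation settings above can be sampled as in the following sketch. It assumes the common parameterization in which $\sigma_{1}$ and $\sigma_{2}$ are the standard deviations along the principal axes of a Gaussian rotated by $\theta$, and it assumes the noise level is drawn uniformly from $[0, 25]$; neither detail is spelled out in the text, and the function names are illustrative.

```python
import numpy as np

def anisotropic_gaussian_kernel(size=21, sigma1=2.0, sigma2=1.0, theta=0.0):
    """Normalized Gaussian kernel with axis widths sigma1, sigma2 rotated by theta."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    coords = np.stack([xx, yy], axis=-1)  # (size, size, 2)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    cov = rot @ np.diag([sigma1 ** 2, sigma2 ** 2]) @ rot.T
    inv_cov = np.linalg.inv(cov)
    # Quadratic form x^T inv_cov x evaluated at every kernel position.
    expo = np.einsum("hwi,ij,hwj->hw", coords, inv_cov, coords)
    k = np.exp(-0.5 * expo)
    return k / k.sum()

def sample_degradation(rng, size=21):
    """Draw one (kernel, noise_level) pair from the ranges given in the text."""
    sigma1, sigma2 = rng.uniform(0.2, 4.0, size=2)
    theta = rng.uniform(0.0, np.pi)
    noise_level = rng.uniform(0.0, 25.0)  # uniform sampling is an assumption
    return anisotropic_gaussian_kernel(size, sigma1, sigma2, theta), noise_level
```

Setting `sigma1 == sigma2` recovers the isotropic kernels used in the first training stage.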

Table 1: PSNR and SSIM results achieved on Gaussian8 kernels for $\times 4$ SR. Methods marked with are degradation estimation based approaches, while other methods are degradation representation learning based approaches. Best and second best performance are in red and blue colors, respectively. Running time is averaged on Set14.

| Methods | Param (M) | Time (ms) | Set5 PSNR | Set5 SSIM | Set14 PSNR | Set14 SSIM | B100 PSNR | B100 SSIM | Urban100 PSNR | Urban100 SSIM |
|---|---|---|---|---|---|---|---|---|---|---|
| MANet [28] | 9.90 | 102 | 30.43 | 0.8213 | 27.40 | 0.7464 | 26.63 | 0.7231 | 24.88 | 0.7485 |
| DARSR [54] | 3.54 | 62160 | 28.45 | 0.7877 | 26.17 | 0.7143 | 25.20 | 0.6920 | 24.09 | 0.7135 |
| DANv1 [20] | 4.33 | 151 | 31.89 | 0.8864 | 28.42 | 0.7687 | 27.51 | 0.7248 | 25.86 | 0.7721 |
| DANv2 [30] | 4.71 | 152 | 32.00 | 0.8885 | 28.50 | 0.7715 | 27.56 | 0.7277 | 25.94 | 0.7748 |
| DCLS [31] | 13.63 | 133 | 32.12 | 0.8890 | 28.54 | 0.7728 | 27.60 | 0.7285 | 26.15 | 0.7809 |
| DASR [42] | 5.84 | 49 | 31.46 | 0.8789 | 28.11 | 0.7603 | 27.44 | 0.7214 | 25.36 | 0.7506 |
| CDSR [55] | 13.23 | 113 | 31.33 | 0.8328 | 27.90 | 0.7477 | 27.13 | 0.7046 | 25.25 | 0.7492 |
| CMDSR [47] | 1.48 | 40 | 29.10 | 0.8146 | 26.57 | 0.7239 | 26.19 | 0.6980 | 23.67 | 0.7211 |
| MRDA [45] | 5.84 | 57 | 31.98 | 0.8872 | 28.42 | 0.7671 | 27.55 | 0.7254 | 25.90 | 0.7734 |
| KDSR [46] | 5.80 | 63 | 32.02 | 0.8892 | 28.46 | 0.7761 | 27.52 | 0.7281 | 25.96 | 0.7760 |
| ReDSR (Ours) | 6.28 | 49 | 32.18 | 0.8918 | 28.50 | 0.7770 | 27.55 | 0.7285 | 26.27 | 0.8050 |
| ReDSR-L (Ours) | 10.43 | 93 | 32.30 | 0.9035 | 28.75 | 0.7930 | 27.79 | 0.7392 | 26.47 | 0.8102 |

During training, 16 HR images were randomly selected. Then, we randomly selected 16 degradation models from the above ranges to generate LR images. Next, 32 HR-LR patch pairs (2 patch pairs from each image, as illustrated in Sec. 3.2.1) were divided into 2 groups. One group was fed to the degrader while the other was fed to the generator and the encoder. The sample number $m$ in $\mathcal{L}_{ED}$ (Eq. 5) was set to 64. We adopted the Adam optimizer [22] with momentum parameters $\beta_{1} = 0.9$ and $\beta_{2} = 0.999$ for optimization. We trained the whole network for 1000 epochs. The initial learning rate was set to $0.0001$ and halved every 200 epochs. The overall loss function was defined as:

$$L = \lambda_{1}\mathcal{L}_{ED} + \mathcal{L}_{RD} + \mathcal{L}_{SR}, \tag{9}$$

where $\lambda_{1} = 0.01$.
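The learning-rate schedule and the loss weighting of Eq. (9) amount to the following arithmetic. This is a sketch with illustrative function names; counting epochs from zero is an assumption.

```python
def learning_rate(epoch, base_lr=1e-4, step=200, gamma=0.5):
    """Step decay: the rate is halved after every `step` epochs (0-indexed)."""
    return base_lr * gamma ** (epoch // step)

def total_loss(l_ed, l_rd, l_sr, lambda_1=0.01):
    """Eq. (9): weighted sum of the three training losses."""
    return lambda_1 * l_ed + l_rd + l_sr
```

Under this schedule the rate is 1e-4 for epochs 0 to 199 and reaches 6.25e-6 for the final 200 of the 1000 epochs.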

4.2 Experiments on Simple Degradations

We first conduct experiments on simple degradations with only isotropic Gaussian kernels.

4.2.1 Performance Evaluation

We compare ReDSR to recent state-of-the-art blind SR methods, including MANet [28], DASR [42], CDSR [55], DAN [20], CMDSR [47], KDSR [46], MRDA [45] and DARSR [54]. Since MANet [28] is trained on anisotropic Gaussian kernels, we re-train it on isotropic Gaussian kernels for fair comparison. Since the pre-trained model of CDSR [55] is unavailable, we re-train it using the officially released code. MANet and DAN require the degradation information as supervision to estimate the degradation model of the LR image. Other methods extract degradation information from the LR images in a fully unsupervised manner. Quantitative results are listed in Table 1, with visualization results being provided in Fig. 3.

Figure 3: Visualization results produced by different methods on Urban100. Input LR images are produced with only isotropic Gaussian blur kernels.

From Table 1 we can see that our ReDSR-L achieves the best PSNR and SSIM on all four datasets, while ReDSR achieves the highest PSNR among the remaining methods on Set5 and Urban100. Degradation estimation based methods (MANet and DAN) require numerous iterations to achieve accurate estimation of the degradation and suffer from relatively long inference time. In contrast, other methods achieve higher efficiency as they employ implicit degradation representations. Compared to DASR, our ReDSR produces significant accuracy improvements (e.g., 26.27 vs. 25.36 PSNR on Urban100) at the same running time. This is because our re-degradation mechanism can better preserve degradation details such that superior performance is achieved. Figure 3 further compares the visualization results produced by different methods. As we can see, our ReDSR produces results with the best perceptual quality while other methods commonly suffer from blurring artifacts.

4.2.2 Model Analyses

(1) Re-degradation Loss $\mathcal{L}_{RD}$: The main idea of this paper is to extract degradation information by reproducing the LR images. To achieve this, the core design is the re-degradation loss $\mathcal{L}_{RD}$. To validate its effectiveness, we first introduce a model variant (model 1) by removing all designs. Then, we add the re-degradation loss to model 1 to obtain model 2 for comparison. It can be observed from Table 2 that the re-degradation loss improves the performance of model 1 on all kernel widths. With our re-degradation loss, model 2 can extract accurate degradation information by reproducing the LR images such that higher PSNR scores are produced.

(2) Energy Distance Loss $\mathcal{L}_{ED}$: To promote the learning of degradation information, the energy distance loss $\mathcal{L}_{ED}$ is employed in our method. To demonstrate its contribution to the final performance, we add the energy distance loss to model 1 to obtain model 3 for comparison. As we can see, model 3 surpasses model 1 with notable margins. In addition, we further develop model 4 by adding the energy distance loss to model 2. With both $\mathcal{L}_{RD}$ and $\mathcal{L}_{ED}$, model 4 produces significantly higher PSNR scores as compared to model 1.

We further visualize the degradation representations extracted from images with various degradations using the t-SNE method [32]. As we can see, models 1, 2, and 3 cannot distinguish different degradations in the representation space (Fig. 4(a-c)). In contrast, with both $\mathcal{L}_{RD}$ and $\mathcal{L}_{ED}$, model 4 can well distinguish different degradations and gather them into discriminative clusters (Fig. 4(d)). We further conduct experiments to study the effect of different distribution types (i.e., Gaussian, uniform, and exponential distributions). As compared to models 5 and 6, model 4 produces comparable performance on different kernel widths. Moreover, the representations learned by model 4 are more discriminative. As a result, the Gaussian distribution is used as the default setting for our method.

Table 2: PSNR results achieved on Urban100 for $\times 4$ SR. Note that we replace the encoder with 5 fully-connected layers to learn representations directly from the true degradation in model 9.

| Method | $\mathcal{L}_{RD}$ | $\mathcal{L}_{ED}$ | $\mathcal{L}_{KL}$ | Modulated Loss | Oracle Degradation | $\sigma$ = 1.0 | $\sigma$ = 1.8 | $\sigma$ = 2.6 | $\sigma$ = 3.4 |
|---|---|---|---|---|---|---|---|---|---|
| Model 1 | ✗ | ✗ | ✗ | ✗ | ✗ | 25.17 | 25.07 | 24.81 | 24.36 |
| Model 2 | ✓ | ✗ | ✗ | ✗ | ✗ | 25.30 | 25.22 | 24.97 | 24.45 |
| Model 3 | ✗ | Gaussian | ✗ | ✗ | ✗ | 25.26 | 25.20 | 24.88 | 24.42 |
| Model 4 | ✓ | Gaussian | ✗ | ✗ | ✗ | 26.53 | 26.45 | 26.18 | 25.53 |
| Model 5 | ✓ | Uniform | ✗ | ✗ | ✗ | 26.49 | 26.42 | 26.23 | 25.60 |
| Model 6 | ✓ | Exponential | ✗ | ✗ | ✗ | 26.52 | 26.50 | 26.19 | 25.56 |
| Model 7 | ✓ | ✗ | Gaussian | ✓ | ✗ | 26.37 | 26.30 | 25.98 | 25.41 |
| Model 8 (Ours) | ✓ | Gaussian | ✗ | ✓ | ✗ | 26.62 | 26.55 | 26.28 | 25.67 |
| Model 9 (upper bound) | ✓ | Gaussian | ✗ | ✓ | ✓ | 26.65 | 26.67 | 26.39 | 25.75 |
Figure 4: Visualization of representations extracted from LR images with different kernel widths $\sigma$. The settings of the model variants are shown in Table 2.

(3) Energy Distance Loss vs. KL Divergence Loss: The energy distance loss associates the representation distribution with a pre-defined distribution. As an alternative, KL divergence [23] is also capable of mapping the image space to a pre-defined distribution. To demonstrate the superiority of the energy distance loss, we replace $\mathcal{L}_{ED}$ with a KL divergence loss to obtain model 7 for comparison. As compared to model 7, our method (model 8) produces significant performance improvements with PSNR gains of over 0.2 on different kernel widths. We also visualize the representations extracted from model 7 in Fig. 4(g). It can be observed that several degradations are not well distinguished. For example, representations for $\sigma = 4.1$ (orange points) are mixed with the ones for $\sigma = 5.0$ (purple points). In contrast, the representations learned by our method gather into several distinct clusters (Fig. 4(h)), which further demonstrates that accurate degradation information can be learned by our method. In addition, as compared to DASR (Fig. 4(i)), our degradation representation can better distinguish the subtle degradation differences.

(4) Evolution of Representation Clusters: During the training phase, ED loss helps the encoder to gradually distinguish different degradations (Fig. 5).

Figure 5: Visualization of representations in different epochs.

(5) Modulated Reconstruction Loss: The modulated reconstruction loss is developed to dynamically tune the weights of samples with different confidences to stabilize the optimization of the networks. To validate its effectiveness, we develop model 4 by removing the modulated reconstruction loss and compare its performance to our model. As we can see, when the modulated reconstruction loss is removed, model 4 suffers a notable performance drop, especially on large kernel widths (e.g., 25.67 $\rightarrow$ 25.53 on $\sigma=3.4$).

(6) Oracle Degradation Representations: To study the upper bound of our method, we develop a model with oracle degradation information. Specifically, we replace the degradation encoder in the network with five fully-connected layers (model 9) to directly learn degradation representations from the true degradation models. It can be observed that model 9 produces the best results on all kernel widths. However, model 8 still achieves competitive performance against model 9, which demonstrates the high accuracy of the degradation representations learned by our method.

(7) Robustness of Degradation Representations: Our representation learning scheme aims at extracting content-invariant degradation information. To validate this, we conduct an experiment to study the effect of image content on our degradation representations. Specifically, given an HR image, we first generate an LR image $I_1$ using a Gaussian kernel $k$. Then we randomly select another 99 HR images to generate LR images using the same kernel $k$. Afterwards, 100 degradation representations are extracted from these LR images, and each is used to super-resolve $I_1$. As shown in Fig. 6, our ReDSR achieves relatively stable performance. This further verifies that our ReDSR can extract consistent degradation information from different image contents.
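The data side of this protocol can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the kernel size (21), kernel width (2.0), image size, and plain subsampling after blurring are all assumed values, the random arrays are hypothetical stand-ins for Urban100 images, and the degradation encoder and SR network are not reproduced.

```python
import numpy as np

def isotropic_gaussian_kernel(size, sigma):
    """Normalized isotropic Gaussian blur kernel of shape (size, size)."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def degrade(hr, kernel, scale):
    """Blur a grayscale HR image with `kernel`, then keep every `scale`-th pixel."""
    pad = kernel.shape[0] // 2
    padded = np.pad(hr, pad, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel.shape)
    blurred = (windows * kernel).sum(axis=(-2, -1))
    return blurred[::scale, ::scale]

rng = np.random.default_rng(0)
k = isotropic_gaussian_kernel(size=21, sigma=2.0)  # shared kernel k (assumed values)

# Hypothetical stand-ins for the target HR image and the 99 other HR images.
hr_images = [rng.uniform(0.0, 255.0, size=(32, 32)) for _ in range(100)]

# All 100 LR images share the same degradation but differ in content.
lr_images = [degrade(hr, k, scale=4) for hr in hr_images]
i1 = lr_images[0]  # the LR image I_1 to be super-resolved
```

In the experiment described above, each of the 100 LR images would be passed through the degradation encoder, and each resulting representation would be used in turn to super-resolve `i1`; a content-invariant encoder should then give nearly the same PSNR for all 100 runs.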

Table 3: PSNR results achieved on Urban100 for $\times 4$ SR, tested with anisotropic Gaussian blur and noise.
Method #Params. Time Noise Blur Kernel
(columns 1-9: nine anisotropic Gaussian blur kernels, shown as kernel images in the original table)
DnCNN [51] + DANv2 [30] 650K + 4.71M 155ms 0 25.57 25.45 25.40 25.27 25.17 25.22 25.03 24.90 24.70
10 24.41 24.26 24.15 23.95 23.80 23.75 23.52 23.32 23.05
20 23.65 23.49 23.37 23.20 23.01 22.78 22.55 22.30 22.04
DnCNN [51] + DCLS [31] 650K + 19.05M 170ms 0 24.85 24.78 24.68 24.52 24.41 24.25 24.08 23.92 23.65
10 23.83 23.75 23.60 23.42 23.22 23.01 22.77 22.49 22.15
20 23.45 23.32 23.14 22.94 22.70 22.47 22.23 22.00 21.78
DASR [42] 5.84M 49ms 0 25.00 24.90 24.80 24.77 24.71 24.64 24.58 24.47 24.30
10 24.07 23.93 23.77 23.56 23.37 23.20 23.02 22.82 22.63
20 23.33 23.18 23.02 22.84 22.66 22.48 22.30 22.12 21.95
MRDA [45] 5.84M 57ms 0 25.43 25.38 25.29 25.19 25.13 25.03 24.93 24.74 24.52
10 24.39 24.27 24.11 23.90 23.70 23.52 23.32 23.11 22.89
20 23.57 23.44 23.28 23.10 22.93 22.75 22.56 22.38 22.20
KDSR [46] 5.80M 63ms 0 25.69 25.68 25.63 25.58 25.54 25.47 25.37 25.25 25.09
10 24.58 24.48 24.33 24.13 23.93 23.75 23.54 23.32 23.12
20 23.69 23.57 23.42 23.24 23.06 22.87 22.68 22.49 22.31
ReDSR (Ours) 6.28M 49ms 0 25.73 25.77 25.68 25.62 25.60 25.55 25.43 25.35 25.16
10 24.74 24.63 24.50 24.31 24.11 23.92 23.71 23.50 23.28
20 23.86 23.73 23.59 23.41 23.22 23.03 22.83 22.62 22.44
Figure 6: PSNR results achieved on img_27.png in Urban100, using representations from various image contents with the same degradation.

4.3 Experiments on General Degradations

We conduct experiments on general degradations with anisotropic Gaussian kernels and noise. Specifically, nine anisotropic Gaussian kernels and three noise levels (0, 10, and 20) are employed.
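For reference, an anisotropic Gaussian kernel is commonly parameterized by two widths $(\sigma_1, \sigma_2)$ and a rotation angle $\theta$, with covariance $R(\theta)\,\mathrm{diag}(\sigma_1^2, \sigma_2^2)\,R(\theta)^\top$. The sketch below builds such a kernel and adds Gaussian noise. The kernel size, the 0-255 intensity scale for the noise level, and the function names are assumptions made for illustration; the parameter values mirror those quoted in the caption of Fig. 8.

```python
import numpy as np

def anisotropic_gaussian_kernel(size, sigma1, sigma2, theta):
    """Normalized anisotropic Gaussian kernel with widths (sigma1, sigma2) rotated by theta."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    cov = rot @ np.diag([sigma1 ** 2, sigma2 ** 2]) @ rot.T
    inv_cov = np.linalg.inv(cov)
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    coords = np.stack([xx, yy], axis=-1)
    # Quadratic form coords^T * inv_cov * coords evaluated at every pixel.
    expo = np.einsum("...i,ij,...j->...", coords, inv_cov, coords)
    kernel = np.exp(-0.5 * expo)
    return kernel / kernel.sum()

def add_gaussian_noise(img, noise_level, rng):
    """Add zero-mean Gaussian noise (std on an assumed 0-255 scale) and clip."""
    noisy = img + rng.normal(scale=noise_level, size=img.shape)
    return np.clip(noisy, 0.0, 255.0)

rng = np.random.default_rng(0)
kernel = anisotropic_gaussian_kernel(size=21, sigma1=2.4, sigma2=1.0, theta=np.pi / 4)
lr = rng.uniform(0.0, 255.0, size=(16, 16))   # hypothetical blurred, downsampled image
noisy_lr = add_gaussian_noise(lr, noise_level=10.0, rng=rng)
```

Varying one of $(\sigma_1, \sigma_2)$, $\theta$, or the noise level while fixing the others reproduces the kind of controlled degradation sweep used in the visualizations of this section.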

(1) Performance Evaluation: It can be observed from Table 3 that our ReDSR outperforms the other compared methods on all blur kernels and noise levels. Specifically, DANv2 [30] performs favorably against three other methods (i.e., DCLS [31], DASR [42], and MRDA [45]) but is time-consuming since numerous iterations are required. Compared to DANv2, our method produces better performance in terms of both accuracy and efficiency. Visual results achieved by different methods are illustrated in Fig. 7. As we can see, our ReDSR achieves better visual quality while other methods suffer from blurring artifacts.
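The PSNR values compared above follow the usual definition $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$. A minimal sketch is given below; the 8-bit peak value of 255 is an assumption, and evaluation details such as the color channel used or border cropping are not specified in this section and are therefore not modeled.

```python
import numpy as np

def psnr(reference, estimate, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0.0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

hr = np.full((16, 16), 100.0)   # hypothetical ground-truth image
sr = hr + 1.0                   # hypothetical SR output with unit error everywhere
score = psnr(hr, sr)
```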

Figure 7: Visual comparison achieved on Urban100. Noise level is set to 10 and 20 for these two images, respectively.
Figure 8: Visualization of representations extracted from images with various blur kernels and noise levels. (a)(d)(g): $\theta=\frac{\pi}{4}, n=10$; (b)(e)(h): $(\sigma_1,\sigma_2)=(2.4,1.0), n=10$; (c)(f)(i): $(\sigma_1,\sigma_2)=(2.4,1.0), \theta=\frac{\pi}{4}$.
Figure 9: Visualization of representations extracted from LR images and pre-defined distributions. The three encoders of ReDSR are trained with Gaussian, exponential, and uniform distributions, respectively.

(2) Visualization of Degradation Representation: Compared to simple degradations with only isotropic Gaussian kernels, general degradations are more difficult to distinguish. We further visualize the degradation representations in Fig. 8 using the t-SNE method. From the last two rows, we can see that DASR and KDSR cannot distinguish different degradations well. Particularly, degradation representations extracted from images with different noise levels are mixed together in KDSR's representation space, while DASR is confused by different Gaussian kernels. In contrast, our ReDSR produces more distinctive clusters, which demonstrates its effectiveness in extracting accurate degradation information.

Our ReDSR employs an energy distance loss to associate the distribution of the learned representations with a pre-defined distribution. This bounded constraint enables the learning of a compact representation space with a bijective mapping to the degradation space. To demonstrate this, we synthesize LR images with six different degradations (random Gaussian blur kernels and Gaussian noise levels) and visualize their representations in Fig. 9. In addition, we randomly sample 1000 representations from a pre-defined distribution. Without a bounded constraint, the representations extracted by DASR and KDSR collapse into a small subspace of the Gaussian distribution. In contrast, the representations learned by our ReDSR models span the pre-defined distribution, establishing a bijective mapping and facilitating the learning of distinctive representations, especially for unseen degradations.

4.4 Experiments on Real Degradations

(1) Performance Evaluation: We further conduct experiments on real degradations to demonstrate the generalization capability of our method. Following [42], ReDSR trained on isotropic Gaussian kernels is used for evaluation on real images. Visual results are shown in Fig. 10. It can be observed that ReDSR produces high-quality images with clearer details and fewer blurring artifacts. For example, in the first scene, the text in the SR results obtained by previous methods is blurry. In contrast, our ReDSR produces results with sharper edges and higher perceptual quality.

Figure 10: Visual results on real-world images.

(2) Generalization Improvement by ED Loss: As shown in Fig. 9, the degradation representations generated by DASR and KDSR collapse into a subspace of the Gaussian distribution. This indicates that subtle degradation differences cannot be distinguished. With our ED loss, the representations are embedded into a regular pre-defined space, which helps decouple different degradations and amplifies the differences between them. As a result, the differences between various degradations can be well recognized, yielding more discriminative and robust representations.

Table 4: PSNR results on Urban100 for models trained on isotropic kernels. Note that the LR images are generated by anisotropic Gaussian kernels.
$\frac{\sigma_1}{\sigma_2}$ $\frac{0.5}{0.2}$ $\frac{1.4}{0.8}$ $\frac{2.3}{1.4}$ $\frac{3.2}{2.0}$ $\frac{4.1}{2.6}$ $\frac{5.0}{3.2}$
ReDSR (Ours) 26.57 26.57 26.41 25.85 24.71 23.68
KDSR [46] 26.35 26.08 25.25 24.21 23.28 22.52
DASR [42] 25.83 25.54 24.86 23.93 23.12 22.41
DASR [42] + ED loss 25.90 25.75 25.36 24.68 23.60 22.79
Figure 11: Visualization of representations for models trained on isotropic kernels. Note that the degradation representations are extracted from LR images generated by anisotropic Gaussian kernels.

As shown in Fig. 11(a), we use model 8 in Table 2 (trained on isotropic Gaussian kernels) to extract representations from LR images degraded by anisotropic Gaussian kernels. These representations also form distinct clusters. To validate the universality of the ED loss, we retrain a contrastive learning method (i.e., DASR) on isotropic Gaussian kernels with a bounded constraint on the representations. The resulting representations (Fig. 11(c)) form more distinct clusters than those of the original DASR. Quantitative results are listed in Table 4. ReDSR remains comparatively stable on unseen degradations while the other methods suffer from severe performance drops. Besides, the ED loss helps DASR generate more robust representations, yielding higher and more stable PSNR scores (e.g., 24.68 vs. 23.93 at $\frac{3.2}{2.0}$). The ED loss thus enhances the robustness of the representation space and is easy to couple with other representation-based methods.

5 Conclusion

In this paper, we propose an alternative approach that learns degradation representations by reproducing LR images. In addition, we introduce an energy distance loss to associate the learned representations with a pre-defined distribution for superior generalization capability. It is demonstrated that our degradation representation learning scheme can extract discriminative representations to obtain accurate degradation information. Experimental results show that our network achieves state-of-the-art performance for blind SR with various degradations.

References

  • [1] Ackley, D.H., Hinton, G.E., Sejnowski, T.J.: A learning algorithm for boltzmann machines. Cognitive Science 9(1), 147–169 (1985)
  • [2] Bao, J., Chen, D., Wen, F., Li, H., Hua, G.: Cvae-gan: fine-grained image generation through asymmetric training. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 2745–2754 (2017)
  • [3] Bell-Kligler, S., Shocher, A., Irani, M.: Blind super-resolution kernel estimation using an internal-gan. Advances in Neural Information Processing Systems 32 (2019)
  • [4] Bevilacqua, M., Roumy, A., Guillemot, C., Morel, M.L.A.: Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In: British Machine Vision Conference (BMVC) (2012)
  • [5] Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(2), 295–307 (2015)
  • [6] Dosovitskiy, A., Springenberg, J.T., Riedmiller, M., Brox, T.: Discriminative unsupervised feature learning with convolutional neural networks. Advances in Neural Information Processing Systems 27 (2014)
  • [7] Du, Y., Mordatch, I.: Implicit generation and modeling with energy based models. Advances in Neural Information Processing Systems 32 (2019)
  • [8] Agustsson, E., Timofte, R.: Ntire 2017 challenge on single image super-resolution: dataset and study. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. pp. 126–135 (2017)
  • [9] Goldenberg, I., Webb, G.I.: Survey of distance measures for quantifying concept drift and shift in numeric data. Knowledge and Information Systems 60(2), 591–615 (2019)
  • [10] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. Advances in Neural Information Processing Systems 27 (2014)
  • [11] Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., Smola, A.: A kernel method for the two-sample-problem. Advances in Neural Information Processing Systems 19 (2006)
  • [12] Gritsenko, A., Salimans, T., van den Berg, R., Snoek, J., Kalchbrenner, N.: A spectral energy distance for parallel speech synthesis. Advances in Neural Information Processing Systems 33, 13062–13072 (2020)
  • [13] Gu, J., Lu, H., Zuo, W., Dong, C.: Blind super-resolution with iterative kernel correction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1604–1613 (2019)
  • [14] Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06). vol. 2, pp. 1735–1742. IEEE (2006)
  • [15] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9729–9738 (2020)
  • [16] Hinton, G., Osindero, S., Welling, M., Teh, Y.W.: Unsupervised discovery of nonlinear structure using contrastive backpropagation. Cognitive Science 30(4), 725–731 (2006)
  • [17] Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Computation 14(8), 1771–1800 (2002)
  • [18] Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
  • [19] Huang, J.B., Singh, A., Ahuja, N.: Single image super-resolution from transformed self-exemplars. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 5197–5206 (2015)
  • [20] Huang, Y., Li, S., Wang, L., Tan, T., et al.: Unfolding the alternating optimization for blind super resolution. Advances in Neural Information Processing Systems 33, 5632–5643 (2020)
  • [21] Kim, S.Y., Sim, H., Kim, M.: Koalanet: Blind super-resolution using kernel-oriented adaptive local adjustment. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10611–10620 (2021)
  • [22] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  • [23] Kullback, S., Leibler, R.A.: On information and sufficiency. The Annals of Mathematical Statistics 22(1), 79–86 (1951)
  • [24] Kumar, S., Hebert, M.: Discriminative fields for modeling spatial dependencies in natural images. Advances in Neural Information Processing Systems 16 (2003)
  • [25] LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., Huang, F.: A tutorial on energy-based learning. Predicting Structured Data 1(0) (2006)
  • [26] Li, D., Zhang, Y., Cheung, K.C., Wang, X., Qin, H., Li, H.: Learning degradation representations for image deblurring. In: European Conference on Computer Vision. pp. 736–753. Springer (2022)
  • [27] Li, Y., Swersky, K., Zemel, R.: Generative moment matching networks. In: International Conference on Machine Learning. pp. 1718–1727. PMLR (2015)
  • [28] Liang, J., Sun, G., Zhang, K., Van Gool, L., Timofte, R.: Mutual affine network for spatially variant kernel estimation in blind image super-resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4096–4105 (2021)
  • [29] Lim, B., Son, S., Kim, H., Nah, S., Mu Lee, K.: Enhanced deep residual networks for single image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. pp. 136–144 (2017)
  • [30] Luo, Z., Huang, Y., Li, S., Wang, L., Tan, T.: End-to-end alternating optimization for real-world blind super resolution. International Journal of Computer Vision 131(12), 3152–3169 (2023)
  • [31] Luo, Z., Huang, H., Yu, L., Li, Y., Fan, H., Liu, S.: Deep constrained least squares for blind image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 17642–17652 (2022)
  • [32] Van der Maaten, L., Hinton, G.: Visualizing data using t-sne. Journal of Machine Learning Research 9(11) (2008)
  • [33] Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001. vol. 2, pp. 416–423. IEEE (2001)
  • [34] Mnih, A., Hinton, G.: Learning nonlinear constraints with contrastive backpropagation. In: Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005. vol. 2, pp. 1302–1307. IEEE (2005)
  • [35] Ranzato, M., Poultney, C., Chopra, S., Cun, Y.: Efficient learning of sparse representations with an energy-based model. Advances in Neural Information Processing Systems 19 (2006)
  • [36] Rizzo, M.L., Székely, G.J.: Energy distance. Wiley Interdisciplinary Reviews: Computational Statistics 8(1), 27–38 (2016)
  • [37] Salakhutdinov, R., Hinton, G.: Deep boltzmann machines. In: Artificial Intelligence and Statistics. pp. 448–455. PMLR (2009)
  • [38] Salakhutdinov, R., Larochelle, H.: Efficient learning of deep boltzmann machines. In: Proceedings of The Thirteenth International Conference on Artificial Intelligence and Statistics. pp. 693–700. JMLR Workshop and Conference Proceedings (2010)
  • [39] Shocher, A., Cohen, N., Irani, M.: “zero-shot” super-resolution using deep internal learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3118–3126 (2018)
  • [40] Tai, Y., Yang, J., Liu, X.: Image super-resolution via deep recursive residual network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3147–3155 (2017)
  • [41] Timofte, R., Agustsson, E., Van Gool, L., Yang, M.H., Zhang, L.: Ntire 2017 challenge on single image super-resolution: Methods and results. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. pp. 114–125 (2017)
  • [42] Wang, L., Wang, Y., Dong, X., Xu, Q., Yang, J., An, W., Guo, Y.: Unsupervised degradation representation learning for blind super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10581–10590 (2021)
  • [43] Wang, X., Xie, L., Dong, C., Shan, Y.: Real-esrgan: Training real-world blind super-resolution with pure synthetic data. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 1905–1914 (2021)
  • [44] Wang, Y., Ming, J., Jia, X., Elder, J.H., Lu, H.: Blind image super-resolution with degradation-aware adaptation. In: Proceedings of the Asian Conference on Computer Vision. pp. 894–910 (2022)
  • [45] Xia, B., Tian, Y., Zhang, Y., Hang, Y., Yang, W., Liao, Q.: Meta-learning based degradation representation for blind super-resolution. IEEE Transactions on Image Processing (2023)
  • [46] Xia, B., Zhang, Y., Wang, Y., Tian, Y., Yang, W., Timofte, R., Van Gool, L.: Knowledge distillation based degradation estimation for blind super-resolution. In: The Eleventh International Conference on Learning Representations (2022)
  • [47] Yin, G., Wang, W., Yuan, Z., Ji, W., Yu, D., Sun, S., Chua, T.S., Wang, C.: Conditional hyper-network for blind super-resolution with multiple degradations. IEEE Transactions on Image Processing 31, 3949–3960 (2022)
  • [48] Zeyde, R., Elad, M., Protter, M.: On single image scale-up using sparse-representations. In: Curves and Surfaces: 7th International Conference, Avignon, France, June 24-30, 2010, Revised Selected Papers 7. pp. 711–730. Springer (2012)
  • [49] Zhang, K., Gool, L.V., Timofte, R.: Deep unfolding network for image super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3217–3226 (2020)
  • [50] Zhang, K., Liang, J., Van Gool, L., Timofte, R.: Designing a practical degradation model for deep blind image super-resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4791–4800 (2021)
  • [51] Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L.: Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Transactions on Image Processing 26(7), 3142–3155 (2017)
  • [52] Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., Fu, Y.: Image super-resolution using very deep residual channel attention networks. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 286–301 (2018)
  • [53] Zhao, J., Mathieu, M., LeCun, Y.: Energy-based generative adversarial networks. In: International Conference on Learning Representations (2016)
  • [54] Zhou, H., Zhu, X., Zhu, J., Han, Z., Zhang, S.X., Qin, J., Yin, X.C.: Learning correction filter via degradation-adaptive regression for blind single image super-resolution. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 12365–12375 (2023)
  • [55] Zhou, Y., Lin, C., Luo, D., Liu, Y., Tai, Y., Wang, C., Chen, M.: Joint learning content and degradation aware feature for blind super-resolution. In: Proceedings of the 30th ACM International Conference on Multimedia. pp. 2606–2616 (2022)