-
Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts
Authors:
Cansu Korkmaz,
A. Murat Tekalp,
Zafer Dogan
Abstract:
Super-resolution (SR) is an ill-posed inverse problem, where the size of the set of feasible solutions that are consistent with a given low-resolution image is very large. Many algorithms have been proposed to find a "good" solution among the feasible solutions that strike a balance between fidelity and perceptual quality. Unfortunately, all known methods generate artifacts and hallucinations while trying to reconstruct high-frequency (HF) image details. A fundamental question is: Can a model learn to distinguish genuine image details from artifacts? Although some recent works focused on the differentiation of details and artifacts, this is a very challenging problem and a satisfactory solution is yet to be found. This paper shows that the characterization of genuine HF details versus artifacts can be better learned by training GAN-based SR models using wavelet-domain loss functions compared to RGB-domain or Fourier-space losses. Although wavelet-domain losses have been used in the literature before, they have not been used in the context of the SR task. More specifically, we train the discriminator only on the HF wavelet sub-bands instead of on RGB images and the generator is trained by a fidelity loss over wavelet subbands to make it sensitive to the scale and orientation of structures. Extensive experimental results demonstrate that our model achieves better perception-distortion trade-off according to multiple objective measures and visual evaluations.
Submitted 29 February, 2024;
originally announced February 2024.
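The wavelet-domain idea in the abstract above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the choice of a single-level Haar transform, the L1 distance, the per-band weights, and the function names `haar_dwt2`, `wavelet_fidelity_loss`, and `hf_stack` are all assumptions made for illustration. It shows a fidelity loss summed over wavelet sub-bands and a stack of the three high-frequency sub-bands of the kind a discriminator could be fed instead of the RGB image.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level orthonormal 2D Haar transform of an (H, W) array.

    Returns the low-pass band and a tuple of three high-frequency bands
    (vertical difference, horizontal difference, diagonal difference).
    """
    img = np.asarray(img, dtype=np.float64)
    if img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("height and width must be even")
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0
    vert = (a + b - c - d) / 2.0   # top row minus bottom row
    horiz = (a - b + c - d) / 2.0  # left column minus right column
    diag = (a - b - c + d) / 2.0
    return ll, (vert, horiz, diag)

def wavelet_fidelity_loss(sr, hr, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of mean absolute errors over the four sub-bands.

    The weights are hypothetical; the abstract does not specify them.
    """
    ll_s, hf_s = haar_dwt2(sr)
    ll_h, hf_h = haar_dwt2(hr)
    bands_s = (ll_s,) + hf_s
    bands_h = (ll_h,) + hf_h
    return float(sum(w * np.mean(np.abs(s - h))
                     for w, s, h in zip(weights, bands_s, bands_h)))

def hf_stack(img):
    """Stack the three high-frequency sub-bands into a (3, H/2, W/2) array."""
    _, hf = haar_dwt2(img)
    return np.stack(hf, axis=0)
```

In a real training loop the same decomposition would be applied per color channel inside an autodiff framework; the NumPy version only makes the sub-band structure concrete.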
-
Trustworthy SR: Resolving Ambiguity in Image Super-resolution via Diffusion Models and Human Feedback
Authors:
Cansu Korkmaz,
Ege Cirakman,
A. Murat Tekalp,
Zafer Dogan
Abstract:
Super-resolution (SR) is an ill-posed inverse problem with a large set of feasible solutions that are consistent with a given low-resolution image. Various deterministic algorithms aim to find a single solution that balances fidelity and perceptual quality; however, this trade-off often causes visual artifacts that bring ambiguity in information-centric applications. On the other hand, diffusion models (DMs) excel in generating a diverse set of feasible SR images that span the solution space. The challenge is then how to determine the most likely solution among this set in a trustworthy manner. We observe that quantitative measures, such as PSNR, LPIPS, and DISTS, are not reliable indicators to resolve ambiguous cases. To this effect, we propose employing human feedback, where we ask human subjects to select a small number of likely samples and we ensemble the averages of selected samples. This strategy leverages the high-quality image generation capabilities of DMs, while recognizing the importance of obtaining a single trustworthy solution, especially in use cases, such as identification of specific digits or letters, where generating multiple feasible solutions may not lead to a reliable outcome. Experimental results demonstrate that our proposed strategy provides more trustworthy solutions when compared to state-of-the-art SR methods.
Submitted 12 February, 2024;
originally announced February 2024.
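The selection-and-averaging step described in the abstract above can be sketched as follows. This is one possible reading of "ensemble the averages of selected samples", not the authors' procedure: each subject's selected samples are averaged pixel-wise, and the per-subject averages are then averaged again. The function name `ensemble_selected` and the data layout (an array of N candidate SR images plus one list of chosen indices per subject) are assumptions.

```python
import numpy as np

def ensemble_selected(samples, selections):
    """Fuse candidate SR images using per-subject selections.

    samples:    array of shape (N, H, W) or (N, H, W, C) holding N candidates.
    selections: one list of chosen sample indices per human subject
                (hypothetical layout; empty lists are skipped).
    """
    samples = np.asarray(samples, dtype=np.float64)
    # Pixel-wise average of the samples each subject selected.
    per_subject = [samples[list(idx)].mean(axis=0)
                   for idx in selections if len(idx) > 0]
    if not per_subject:
        raise ValueError("no samples were selected")
    # Average the per-subject results into a single output image.
    return np.mean(per_subject, axis=0)
```

Averaging per subject first gives every subject equal weight regardless of how many samples they picked; pooling all selected samples into one average would instead weight subjects by the number of selections.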
-
Optimal Nonlinearities Improve Generalization Performance of Random Features
Authors:
Samet Demir,
Zafer Doğan
Abstract:
Random feature model with a nonlinear activation function has been shown to perform asymptotically equivalent to a Gaussian model in terms of training and generalization errors. Analysis of the equivalent model reveals an important yet not fully understood role played by the activation function. To address this issue, we study the "parameters" of the equivalent model to achieve improved generalization performance for a given supervised learning problem. We show that acquired parameters from the Gaussian model enable us to define a set of optimal nonlinearities. We provide two example classes from this set, e.g., second-order polynomial and piecewise linear functions. These functions are optimized to improve generalization performance regardless of the actual form. We experiment with regression and classification problems, including synthetic and real (e.g., CIFAR10) data. Our numerical results validate that the optimized nonlinearities achieve better generalization performance than widely-used nonlinear functions such as ReLU. Furthermore, we illustrate that the proposed nonlinearities also mitigate the so-called double descent phenomenon, which is known as the non-monotonic generalization performance regarding the sample size and the model size.
Submitted 28 September, 2023;
originally announced September 2023.
-
MMSR: Multiple-Model Learned Image Super-Resolution Benefiting From Class-Specific Image Priors
Authors:
Cansu Korkmaz,
A. Murat Tekalp,
Zafer Dogan
Abstract:
Assuming a known degradation model, the performance of a learned image super-resolution (SR) model depends on how well the variety of image characteristics within the training set matches those in the test set. As a result, the performance of an SR model varies noticeably from image to image over a test set depending on whether characteristics of specific images are similar to those in the training set or not. Hence, in general, a single SR model cannot generalize well enough for all types of image content. In this work, we show that training multiple SR models for different classes of images (e.g., for text, texture, etc.) to exploit class-specific image priors and employing a post-processing network that learns how to best fuse the outputs produced by these multiple SR models surpasses the performance of state-of-the-art generic SR models. Experimental results clearly demonstrate that the proposed multiple-model SR (MMSR) approach significantly outperforms a single pre-trained state-of-the-art SR model both quantitatively and visually. It even exceeds the performance of the best single class-specific SR model trained on similar text or texture images.
Submitted 18 September, 2022;
originally announced September 2022.
-
Perception-Distortion Trade-off in the SR Space Spanned by Flow Models
Authors:
Cansu Korkmaz,
A. Murat Tekalp,
Zafer Dogan,
Erkut Erdem,
Aykut Erdem
Abstract:
Flow-based generative super-resolution (SR) models learn to produce a diverse set of feasible SR solutions, called the SR space. Diversity of SR solutions increases with the temperature ($τ$) of latent variables, which introduces random variations of texture among sample solutions, resulting in visual artifacts and low fidelity. In this paper, we present a simple but effective image ensembling/fusion approach to obtain a single SR image eliminating random artifacts and improving fidelity without significantly compromising perceptual quality. We achieve this by benefiting from a diverse set of feasible photo-realistic solutions in the SR space spanned by flow models. We propose different image ensembling and fusion strategies which offer multiple paths to move sample solutions in the SR space to more desired destinations in the perception-distortion plane in a controllable manner depending on the fidelity vs. perceptual quality requirements of the task at hand. Experimental results demonstrate that our image ensembling/fusion strategy achieves more promising perception-distortion trade-off compared to sample SR images produced by flow models and adversarially trained models in terms of both quantitative metrics and visual quality.
Submitted 18 September, 2022;
originally announced September 2022.
-
Two-stage domain adapted training for better generalization in real-world image restoration and super-resolution
Authors:
Cansu Korkmaz,
A. Murat Tekalp,
Zafer Dogan
Abstract:
It is well-known that in inverse problems, end-to-end trained networks overfit the degradation model seen in the training set, i.e., they do not generalize to other types of degradations well. Recently, an approach to first map images downsampled by unknown filters to bicubically downsampled look-alike images was proposed to successfully super-resolve such images. In this paper, we show that any inverse problem can be formulated by first mapping the input degraded images to an intermediate domain, and then training a second network to form output images from these intermediate images. Furthermore, the best intermediate domain may vary according to the task. Our experimental results demonstrate that this two-stage domain-adapted training strategy does not only achieve better results on a given class of unknown degradations but can also generalize to other unseen classes of degradations better.
Submitted 1 June, 2021;
originally announced June 2021.
-
On the Computation of PSNR for a Set of Images or Video
Authors:
Onur Keleş,
M. Akın Yılmaz,
A. Murat Tekalp,
Cansu Korkmaz,
Zafer Dogan
Abstract:
When comparing learned image/video restoration and compression methods, it is common to report peak signal-to-noise ratio (PSNR) results. However, there does not exist a generally agreed-upon practice to compute PSNR for sets of images or video. Some authors report average of individual image/frame PSNR, which is equivalent to computing a single PSNR from the geometric mean of individual image/frame mean-square error (MSE). Others compute a single PSNR from the arithmetic mean of frame MSEs for each video. Furthermore, some compute the MSE/PSNR of Y-channel only, while others compute MSE/PSNR for RGB channels. This paper investigates different approaches to computing PSNR for sets of images, single video, and sets of video and the relation between them. We show that the difference between computing the PSNR based on arithmetic vs. geometric mean of MSE depends on the distribution of MSE over the set of images or video, and that this distribution is task-dependent. In particular, these two methods yield larger differences in restoration problems, where the MSE is exponentially distributed and smaller differences in compression problems, where the MSE distribution is narrower. We hope this paper will motivate the community to clearly describe how they compute reported PSNR values to enable consistent comparison.
Submitted 30 April, 2021;
originally announced April 2021.
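The equivalence stated in the abstract above follows from PSNR being a logarithm of the MSE: averaging per-image PSNR values averages log-MSE, which is the log of the geometric mean. A minimal sketch, assuming 8-bit images (peak value 255) and strictly positive MSE values; the function names `psnr_from_mse` and `set_psnr` are illustrative only.

```python
import numpy as np

def psnr_from_mse(mse, peak=255.0):
    """PSNR in dB for a given mean-square error (mse must be > 0)."""
    return 10.0 * np.log10(peak ** 2 / mse)

def set_psnr(mses, peak=255.0):
    """Three ways to summarize PSNR over a set of per-image MSE values.

    Returns (average of per-image PSNR,
             PSNR of the geometric-mean MSE,
             PSNR of the arithmetic-mean MSE).
    """
    mses = np.asarray(mses, dtype=np.float64)
    avg_of_psnr = psnr_from_mse(mses, peak).mean()
    geo_mean_mse = np.exp(np.log(mses).mean())
    psnr_geo = psnr_from_mse(geo_mean_mse, peak)
    psnr_arith = psnr_from_mse(mses.mean(), peak)
    return avg_of_psnr, psnr_geo, psnr_arith
```

Because the arithmetic mean is never smaller than the geometric mean, the PSNR computed from the arithmetic-mean MSE is never larger than the average of per-image PSNR values, and the gap widens as the MSE values become more spread out.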
-
Domain-Informed Spline Interpolation
Authors:
Hamid Behjat,
Zafer Doğan,
Dimitri Van De Ville,
Leif Sörnmo
Abstract:
Standard interpolation techniques are implicitly based on the assumption that the signal lies on a single homogeneous domain. In contrast, many naturally occurring signals lie on an inhomogeneous domain, such as brain activity associated to different brain tissue. We propose an interpolation method that instead exploits prior information about domain inhomogeneity, characterized by different, potentially overlapping, subdomains. As proof of concept, the focus is put on extending conventional shift-invariant B-spline interpolation. Given a known inhomogeneous domain, B-spline interpolation of a given order is extended to a domain-informed, shift-variant interpolation. This is done by constructing a domain-informed generating basis that satisfies stability properties. We illustrate example constructions of domain-informed generating basis, and show their property in increasing the coherence between the generating basis and the given inhomogeneous domain. By advantageously exploiting domain knowledge, we demonstrate the benefit of domain-informed interpolation over standard B-spline interpolation through Monte Carlo simulations across a range of B-spline orders. We also demonstrate the feasibility of domain-informed interpolation in a neuroimaging application where the domain information is available by a complementary image contrast. The results show the benefit of incorporating domain knowledge so that an interpolant consistent to the anatomy of the brain is obtained.
Submitted 27 June, 2019; v1 submitted 17 October, 2018;
originally announced October 2018.
-
Interpolation in the Presence of Domain Inhomogeneity
Authors:
Hamid Behjat,
Zafer Doğan,
Dimitri Van De Ville,
Leif Sörnmo
Abstract:
Standard interpolation techniques are implicitly based on the assumption that the signal lies on a homogeneous domain. In this letter, the proposed interpolation method instead exploits prior information about domain inhomogeneity, characterized by different, potentially overlapping, subdomains. By introducing a domain-similarity metric for each sample, the interpolation process is then based on a domain-informed consistency principle. We illustrate and demonstrate the feasibility of domain-informed linear interpolation in 1D, and also, on a real fMRI image in 2D. The results show the benefit of incorporating domain knowledge so that, for example, sharp domain boundaries can be recovered by the interpolation, if such information is available.
Submitted 13 April, 2017; v1 submitted 13 February, 2017;
originally announced February 2017.