Search | arXiv e-print repository

Does Diffusion Beat GAN in Image Super Resolution?

Authors: Denis Kuznedelev, Valerii Startsev, Daniil Shlenskii, Sergey Kastryulin

Abstract: There is a prevalent opinion in the recent literature that Diffusion-based models outperform GAN-based counterparts on the Image Super Resolution (ISR) problem. However, in most studies, Diffusion-based ISR models were trained longer and utilized larger networks than the GAN baselines. This raises the question of whether the superiority of Diffusion models is due to the Diffusion paradigm being be… ▽ More There is a prevalent opinion in the recent literature that Diffusion-based models outperform GAN-based counterparts on the Image Super Resolution (ISR) problem. However, in most studies, Diffusion-based ISR models were trained longer and utilized larger networks than the GAN baselines. This raises the question of whether the superiority of Diffusion models is due to the Diffusion paradigm being better suited for the ISR task or if it is a consequence of the increased scale and computational resources used in contemporary studies. In our work, we compare Diffusion-based and GAN-based Super Resolution under controlled settings, where both approaches are matched in terms of architecture, model and dataset size, and computational budget. We show that a GAN-based model can achieve results comparable to a Diffusion-based model. Additionally, we explore the impact of design choices such as text conditioning and augmentation on the performance of ISR models, showcasing their effect on several downstream tasks. We will release the inference code and weights of our scaled GAN. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2404.05666 [pdf, other]

YaART: Yet Another ART Rendering Technology

Authors: Sergey Kastryulin, Artem Konev, Alexander Shishenya, Eugene Lyapustin, Artem Khurshudov, Alexander Tselousov, Nikita Vinokurov, Denis Kuznedelev, Alexander Markovich, Grigoriy Livshits, Alexey Kirillov, Anastasiia Tabisheva, Liubov Chubarova, Marina Kaminskaia, Alexander Ustyuzhanin, Artemii Shvetsov, Daniil Shlenskii, Valerii Startsev, Dmitrii Kornilov, Mikhail Romanov, Artem Babenko, Sergei Ovcharenko, Valentin Khrulkov

Abstract: In the rapidly progressing field of generative models, the development of efficient and high-fidelity text-to-image diffusion systems represents a significant frontier. This study introduces YaART, a novel production-grade text-to-image cascaded diffusion model aligned to human preferences using Reinforcement Learning from Human Feedback (RLHF). During the development of YaART, we especially focus… ▽ More In the rapidly progressing field of generative models, the development of efficient and high-fidelity text-to-image diffusion systems represents a significant frontier. This study introduces YaART, a novel production-grade text-to-image cascaded diffusion model aligned to human preferences using Reinforcement Learning from Human Feedback (RLHF). During the development of YaART, we especially focus on the choices of the model and training dataset sizes, the aspects that were not systematically investigated for text-to-image cascaded diffusion models before. In particular, we comprehensively analyze how these choices affect both the efficiency of the training process and the quality of the generated images, which are highly important in practice. Furthermore, we demonstrate that models trained on smaller datasets of higher-quality images can successfully compete with those trained on larger datasets, establishing a more efficient scenario of diffusion models training. From the quality perspective, YaART is consistently preferred by users over many existing state-of-the-art models. △ Less

Submitted 8 April, 2024; originally announced April 2024.

Comments: Prompts and additional information are available on the project page, see https://ya.ru/ai/art/paper-yaart-v1

arXiv:2403.06866 [pdf, other]

QUASAR: QUality and Aesthetics Scoring with Advanced Representations

Authors: Sergey Kastryulin, Denis Prokopenko, Artem Babenko, Dmitry V. Dylov

Abstract: This paper introduces a new data-driven, non-parametric method for image quality and aesthetics assessment, surpassing existing approaches and requiring no prompt engineering or fine-tuning. We eliminate the need for expressive textual embeddings by proposing efficient image anchors in the data. Through extensive evaluations of 7 state-of-the-art self-supervised models, our method demonstrates sup… ▽ More This paper introduces a new data-driven, non-parametric method for image quality and aesthetics assessment, surpassing existing approaches and requiring no prompt engineering or fine-tuning. We eliminate the need for expressive textual embeddings by proposing efficient image anchors in the data. Through extensive evaluations of 7 state-of-the-art self-supervised models, our method demonstrates superior performance and robustness across various datasets and benchmarks. Notably, it achieves high agreement with human assessments even with limited data and shows high robustness to the nature of data and their pre-processing pipeline. Our contributions offer a streamlined solution for assessment of images while providing insights into the perception of visual information. △ Less

Submitted 20 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

arXiv:2208.14818 [pdf, other]

PyTorch Image Quality: Metrics for Image Quality Assessment

Authors: Sergey Kastryulin, Jamil Zakirov, Denis Prokopenko, Dmitry V. Dylov

Abstract: Image Quality Assessment (IQA) metrics are widely used to quantitatively estimate the extent of image degradation following some forming, restoring, transforming, or enhancing algorithms. We present PyTorch Image Quality (PIQ), a usability-centric library that contains the most popular modern IQA algorithms, guaranteed to be correctly implemented according to their original propositions and thorou… ▽ More Image Quality Assessment (IQA) metrics are widely used to quantitatively estimate the extent of image degradation following some forming, restoring, transforming, or enhancing algorithms. We present PyTorch Image Quality (PIQ), a usability-centric library that contains the most popular modern IQA algorithms, guaranteed to be correctly implemented according to their original propositions and thoroughly verified. In this paper, we detail the principles behind the foundation of the library, describe the evaluation strategy that makes it reliable, provide the benchmarks that showcase the performance-time trade-offs, and underline the benefits of GPU acceleration given the library is used within the PyTorch backend. PyTorch Image Quality is an open source software: https://github.com/photosynthesis-team/piq/. △ Less

Submitted 31 August, 2022; originally announced August 2022.

Comments: 20 pages with appendix; 4 Figures

arXiv:2203.07809 [pdf, other]

doi 10.1109/ACCESS.2023.3243466

Image Quality Assessment for Magnetic Resonance Imaging

Authors: Segrey Kastryulin, Jamil Zakirov, Nicola Pezzotti, Dmitry V. Dylov

Abstract: Image quality assessment (IQA) algorithms aim to reproduce the human's perception of the image quality. The growing popularity of image enhancement, generation, and recovery models instigated the development of many methods to assess their performance. However, most IQA solutions are designed to predict image quality in the general domain, with the applicability to specific areas, such as medical… ▽ More Image quality assessment (IQA) algorithms aim to reproduce the human's perception of the image quality. The growing popularity of image enhancement, generation, and recovery models instigated the development of many methods to assess their performance. However, most IQA solutions are designed to predict image quality in the general domain, with the applicability to specific areas, such as medical imaging, remaining questionable. Moreover, the selection of these IQA metrics for a specific task typically involves intentionally induced distortions, such as manually added noise or artificial blurring; yet, the chosen metrics are then used to judge the output of real-life computer vision models. In this work, we aspire to fill these gaps by carrying out the most extensive IQA evaluation study for Magnetic Resonance Imaging (MRI) to date (14,700 subjective scores). We use outputs of neural network models trained to solve problems relevant to MRI, including image reconstruction in the scan acceleration, motion correction, and denoising. Our emphasis is on reflecting the radiologist's perception of the reconstructed images, gauging the most diagnostically influential criteria for the quality of MRI scans: signal-to-noise ratio, contrast-to-noise ratio, and the presence of artifacts. Seven trained radiologists assess these distorted images, with their verdicts then correlated with 35 different image quality metrics (full-reference, no-reference, and distribution-based metrics considered). The top performers -- DISTS, HaarPSI, VSI, and FID-VGG16 -- are found to be efficient across three proposed quality criteria, for all considered anatomies and the target tasks. △ Less

Submitted 1 July, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

Comments: 13 pages, 8 figures, V2: under review in Medical Image Analysis (revised)

Journal ref: IEEE Access, V.11 pp. 14154-14168, 2023

arXiv:2103.02940 [pdf, ps, other]

doi 10.1007/978-3-030-87231-1_25

Towards Ultrafast MRI via Extreme k-Space Undersampling and Superresolution

Authors: Aleksandr Belov, Joel Stadelmann, Sergey Kastryulin, Dmitry V. Dylov

Abstract: We went below the MRI acceleration factors (a.k.a., k-space undersampling) reported by all published papers that reference the original fastMRI challenge, and then considered powerful deep learning based image enhancement methods to compensate for the underresolved images. We thoroughly study the influence of the sampling patterns, the undersampling and the downscaling factors, as well as the reco… ▽ More We went below the MRI acceleration factors (a.k.a., k-space undersampling) reported by all published papers that reference the original fastMRI challenge, and then considered powerful deep learning based image enhancement methods to compensate for the underresolved images. We thoroughly study the influence of the sampling patterns, the undersampling and the downscaling factors, as well as the recovery models on the final image quality for both the brain and the knee fastMRI benchmarks. The quality of the reconstructed images surpasses that of the other methods, yielding an MSE of 0.00114, a PSNR of 29.6 dB, and an SSIM of 0.956 at x16 acceleration factor. More extreme undersampling factors of x32 and x64 are also investigated, holding promise for certain clinical applications such as computer-assisted surgery or radiation planning. We survey 5 expert radiologists to assess 100 pairs of images and show that the recovered undersampled images statistically preserve their diagnostic value. △ Less

Submitted 4 March, 2021; originally announced March 2021.

Comments: Main text: 10 pages and 8 figures. 18 pages and 14 figures total (Supplementary material included)

Journal ref: MICCAI 2021. Lecture Notes in Computer Science, vol 12906, pp 254-264

arXiv:2004.07339 [pdf, other]

An Adaptive Intelligence Algorithm for Undersampled Knee MRI Reconstruction

Authors: Nicola Pezzotti, Sahar Yousefi, Mohamed S. Elmahdy, Jeroen van Gemert, Christophe Schülke, Mariya Doneva, Tim Nielsen, Sergey Kastryulin, Boudewijn P. F. Lelieveldt, Matthias J. P. van Osch, Elwin de Weerdt, Marius Staring

Abstract: Adaptive intelligence aims at empowering machine learning techniques with the additional use of domain knowledge. In this work, we present the application of adaptive intelligence to accelerate MR acquisition. Starting from undersampled k-space data, an iterative learning-based reconstruction scheme inspired by compressed sensing theory is used to reconstruct the images. We adopt deep neural netwo… ▽ More Adaptive intelligence aims at empowering machine learning techniques with the additional use of domain knowledge. In this work, we present the application of adaptive intelligence to accelerate MR acquisition. Starting from undersampled k-space data, an iterative learning-based reconstruction scheme inspired by compressed sensing theory is used to reconstruct the images. We adopt deep neural networks to refine and correct prior reconstruction assumptions given the training data. The network was trained and tested on a knee MRI dataset from the 2019 fastMRI challenge organized by Facebook AI Research and NYU Langone Health. All submissions to the challenge were initially ranked based on similarity with a known groundtruth, after which the top 4 submissions were evaluated radiologically. Our method was evaluated by the fastMRI organizers on an independent challenge dataset. It ranked #1, shared #1, and #3 on respectively the 8x accelerated multi-coil, the 4x multi-coil, and the 4x single-coil track. This demonstrates the superior performance and wide applicability of the method. △ Less

Submitted 27 October, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

Showing 1–7 of 7 results for author: Kastryulin, S