-
Joint Quality Assessment and Example-Guided Image Processing by Disentangling Picture Appearance from Content
Authors:
Abhinau K. Venkataramanan,
Cosmin Stejerean,
Ioannis Katsavounidis,
Hassene Tmar,
Alan C. Bovik
Abstract:
The deep learning revolution has strongly impacted low-level image processing tasks such as style/domain transfer, enhancement/restoration, and visual quality assessments. Despite often being treated separately, the aforementioned tasks share a common theme of understanding, editing, or enhancing the appearance of input images without modifying the underlying content. We leverage this observation…
▽ More
The deep learning revolution has strongly impacted low-level image processing tasks such as style/domain transfer, enhancement/restoration, and visual quality assessments. Despite often being treated separately, the aforementioned tasks share a common theme of understanding, editing, or enhancing the appearance of input images without modifying the underlying content. We leverage this observation to develop a novel disentangled representation learning method that decomposes inputs into content and appearance features. The model is trained in a self-supervised manner and we use the learned features to develop a new quality prediction model named DisQUE. We demonstrate through extensive evaluations that DisQUE achieves state-of-the-art accuracy across quality prediction tasks and distortion types. Moreover, we demonstrate that the same features may also be used for image processing tasks such as HDR tone map**, where the desired output characteristics may be tuned using example input-output pairs.
△ Less
Submitted 20 April, 2024;
originally announced April 2024.
-
Cut-FUNQUE: An Objective Quality Model for Compressed Tone-Mapped High Dynamic Range Videos
Authors:
Abhinau K. Venkataramanan,
Cosmin Stejerean,
Ioannis Katsavounidis,
Hassene Tmar,
Alan C. Bovik
Abstract:
High Dynamic Range (HDR) videos have enjoyed a surge in popularity in recent years due to their ability to represent a wider range of contrast and color than Standard Dynamic Range (SDR) videos. Although HDR video capture has seen increasing popularity because of recent flagship mobile phones such as Apple iPhones, Google Pixels, and Samsung Galaxy phones, a broad swath of consumers still utilize…
▽ More
High Dynamic Range (HDR) videos have enjoyed a surge in popularity in recent years due to their ability to represent a wider range of contrast and color than Standard Dynamic Range (SDR) videos. Although HDR video capture has seen increasing popularity because of recent flagship mobile phones such as Apple iPhones, Google Pixels, and Samsung Galaxy phones, a broad swath of consumers still utilize legacy SDR displays that are unable to display HDR videos. As result, HDR videos must be processed, i.e., tone-mapped, before streaming to a large section of SDR-capable video consumers. However, server-side tone-map** involves automating decisions regarding the choices of tone-map** operators (TMOs) and their parameters to yield high-fidelity outputs. Moreover, these choices must be balanced against the effects of lossy compression, which is ubiquitous in streaming scenarios. In this work, we develop a novel, efficient model of objective video quality named Cut-FUNQUE that is able to accurately predict the visual quality of tone-mapped and compressed HDR videos. Finally, we evaluate Cut-FUNQUE on a large-scale crowdsourced database of such videos and show that it achieves state-of-the-art accuracy.
△ Less
Submitted 20 April, 2024;
originally announced April 2024.
-
Subjective Quality Assessment of Compressed Tone-Mapped High Dynamic Range Videos
Authors:
Abhinau K. Venkataramanan,
Alan C. Bovik
Abstract:
High Dynamic Range (HDR) videos are able to represent wider ranges of contrasts and colors than Standard Dynamic Range (SDR) videos, giving more vivid experiences. Due to this, HDR videos are expected to grow into the dominant video modality of the future. However, HDR videos are incompatible with existing SDR displays, which form the majority of affordable consumer displays on the market. Because…
▽ More
High Dynamic Range (HDR) videos are able to represent wider ranges of contrasts and colors than Standard Dynamic Range (SDR) videos, giving more vivid experiences. Due to this, HDR videos are expected to grow into the dominant video modality of the future. However, HDR videos are incompatible with existing SDR displays, which form the majority of affordable consumer displays on the market. Because of this, HDR videos must be processed by tone-map** them to reduced bit-depths to service a broad swath of SDR-limited video consumers. Here, we analyze the impact of tone-map** operators on the visual quality of streaming HDR videos. To this end, we built the first large-scale subjectively annotated open-source database of compressed tone-mapped HDR videos, containing 15,000 tone-mapped sequences derived from 40 unique HDR source contents. The videos in the database were labeled with more than 750,000 subjective quality annotations, collected from more than 1,600 unique human observers. We demonstrate the usefulness of the new subjective database by benchmarking objective models of visual quality on it. We envision that the new LIVE Tone-Mapped HDR (LIVE-TMHDR) database will enable significant progress on HDR video tone map** and quality assessment in the future. To this end, we make the database freely available to the community at https://live.ece.utexas.edu/research/LIVE_TMHDR/index.html
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
A FUNQUE Approach to the Quality Assessment of Compressed HDR Videos
Authors:
Abhinau K. Venkataramanan,
Cosmin Stejerean,
Ioannis Katsavounidis,
Alan C. Bovik
Abstract:
Recent years have seen steady growth in the popularity and availability of High Dynamic Range (HDR) content, particularly videos, streamed over the internet. As a result, assessing the subjective quality of HDR videos, which are generally subjected to compression, is of increasing importance. In particular, we target the task of full-reference quality assessment of compressed HDR videos. The state…
▽ More
Recent years have seen steady growth in the popularity and availability of High Dynamic Range (HDR) content, particularly videos, streamed over the internet. As a result, assessing the subjective quality of HDR videos, which are generally subjected to compression, is of increasing importance. In particular, we target the task of full-reference quality assessment of compressed HDR videos. The state-of-the-art (SOTA) approach HDRMAX involves augmenting off-the-shelf video quality models, such as VMAF, with features computed on non-linearly transformed video frames. However, HDRMAX increases the computational complexity of models like VMAF. Here, we show that an efficient class of video quality prediction models named FUNQUE+ achieves SOTA accuracy. This shows that the FUNQUE+ models are flexible alternatives to VMAF that achieve higher HDR video quality prediction accuracy at lower computational cost.
△ Less
Submitted 13 December, 2023;
originally announced December 2023.
-
Style Transfer to Calvin and Hobbes comics using Stable Diffusion
Authors:
Sloke Shrestha,
Sundar Sripada V. S.,
Asvin Venkataramanan
Abstract:
This project report summarizes our journey to perform stable diffusion fine-tuning on a dataset containing Calvin and Hobbes comics. The purpose is to convert any given input image into the comic style of Calvin and Hobbes, essentially performing style transfer. We train stable-diffusion-v1.5 using Low Rank Adaptation (LoRA) to efficiently speed up the fine-tuning process. The diffusion itself is…
▽ More
This project report summarizes our journey to perform stable diffusion fine-tuning on a dataset containing Calvin and Hobbes comics. The purpose is to convert any given input image into the comic style of Calvin and Hobbes, essentially performing style transfer. We train stable-diffusion-v1.5 using Low Rank Adaptation (LoRA) to efficiently speed up the fine-tuning process. The diffusion itself is handled by a Variational Autoencoder (VAE), which is a U-net. Our results were visually appealing for the amount of training time and the quality of input data that went into training.
△ Less
Submitted 6 December, 2023;
originally announced December 2023.
-
Quality Modeling Under A Relaxed Natural Scene Statistics Model
Authors:
Abhinau K. Venkataramanan,
Alan C. Bovik
Abstract:
Information-theoretic image quality assessment (IQA) models such as Visual Information Fidelity (VIF) and Spatio-temporal Reduced Reference Entropic Differences (ST-RRED) have enjoyed great success by seamlessly integrating natural scene statistics (NSS) with information theory. The Gaussian Scale Mixture (GSM) model that governs the wavelet subband coefficients of natural images forms the foundat…
▽ More
Information-theoretic image quality assessment (IQA) models such as Visual Information Fidelity (VIF) and Spatio-temporal Reduced Reference Entropic Differences (ST-RRED) have enjoyed great success by seamlessly integrating natural scene statistics (NSS) with information theory. The Gaussian Scale Mixture (GSM) model that governs the wavelet subband coefficients of natural images forms the foundation for these algorithms. However, the explosion of user-generated content on social media, which is typically distorted by one or more of many possible unknown impairments, has revealed the limitations of NSS-based IQA models that rely on the simple GSM model. Here, we seek to elaborate the VIF index by deriving useful properties of the Multivariate Generalized Gaussian Distribution (MGGD), and using them to study the behavior of VIF under a Generalized GSM (GGSM) model.
△ Less
Submitted 26 November, 2023;
originally announced November 2023.
-
Integrating Visual and Semantic Similarity Using Hierarchies for Image Retrieval
Authors:
Aishwarya Venkataramanan,
Martin Laviale,
Cédric Pradalier
Abstract:
Most of the research in content-based image retrieval (CBIR) focus on develo** robust feature representations that can effectively retrieve instances from a database of images that are visually similar to a query. However, the retrieved images sometimes contain results that are not semantically related to the query. To address this, we propose a method for CBIR that captures both visual and sema…
▽ More
Most of the research in content-based image retrieval (CBIR) focus on develo** robust feature representations that can effectively retrieve instances from a database of images that are visually similar to a query. However, the retrieved images sometimes contain results that are not semantically related to the query. To address this, we propose a method for CBIR that captures both visual and semantic similarity using a visual hierarchy. The hierarchy is constructed by merging classes with overlap** features in the latent space of a deep neural network trained for classification, assuming that overlap** classes share high visual and semantic similarities. Finally, the constructed hierarchy is integrated into the distance calculation metric for similarity search. Experiments on standard datasets: CUB-200-2011 and CIFAR100, and a real-life use case using diatom microscopy images show that our method achieves superior performance compared to the existing methods on image retrieval.
△ Less
Submitted 16 August, 2023;
originally announced August 2023.
-
Gaussian Latent Representations for Uncertainty Estimation using Mahalanobis Distance in Deep Classifiers
Authors:
Aishwarya Venkataramanan,
Assia Benbihi,
Martin Laviale,
Cedric Pradalier
Abstract:
Recent works show that the data distribution in a network's latent space is useful for estimating classification uncertainty and detecting Out-of-distribution (OOD) samples. To obtain a well-regularized latent space that is conducive for uncertainty estimation, existing methods bring in significant changes to model architectures and training procedures. In this paper, we present a lightweight, fas…
▽ More
Recent works show that the data distribution in a network's latent space is useful for estimating classification uncertainty and detecting Out-of-distribution (OOD) samples. To obtain a well-regularized latent space that is conducive for uncertainty estimation, existing methods bring in significant changes to model architectures and training procedures. In this paper, we present a lightweight, fast, and high-performance regularization method for Mahalanobis distance-based uncertainty prediction, and that requires minimal changes to the network's architecture. To derive Gaussian latent representation favourable for Mahalanobis Distance calculation, we introduce a self-supervised representation learning method that separates in-class representations into multiple Gaussians. Classes with non-Gaussian representations are automatically identified and dynamically clustered into multiple new classes that are approximately Gaussian. Evaluation on standard OOD benchmarks shows that our method achieves state-of-the-art results on OOD detection with minimal inference time, and is very competitive on predictive probability calibration. Finally, we show the applicability of our method to a real-life computer vision use case on microorganism classification.
△ Less
Submitted 29 September, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Edge-Aware Image Color Appearance and Difference Modeling
Authors:
Abhinau K. Venkataramanan
Abstract:
The perception of color is one of the most important aspects of human vision. From an evolutionary perspective, the accurate perception of color is crucial to distinguishing friend from foe, and food from fatal poison. As a result, humans have developed a keen sense of color and are able to detect subtle differences in appearance, while also robustly identifying colors across illumination and view…
▽ More
The perception of color is one of the most important aspects of human vision. From an evolutionary perspective, the accurate perception of color is crucial to distinguishing friend from foe, and food from fatal poison. As a result, humans have developed a keen sense of color and are able to detect subtle differences in appearance, while also robustly identifying colors across illumination and viewing conditions. In this paper, we shall briefly review methods for adapting traditional color appearance and difference models to complex image stimuli, and propose mechanisms to improve their performance. In particular, we find that applying contrast sensitivity functions and local adaptation rules in an edge-aware manner improves image difference predictions.
△ Less
Submitted 20 April, 2023;
originally announced April 2023.
-
FUNQUE: Fusion of Unified Quality Evaluators
Authors:
Abhinau K. Venkataramanan,
Cosmin Stejerean,
Alan C. Bovik
Abstract:
Fusion-based quality assessment has emerged as a powerful method for develo** high-performance quality models from quality models that individually achieve lower performances. A prominent example of such an algorithm is VMAF, which has been widely adopted as an industry standard for video quality prediction along with SSIM. In addition to advancing the state-of-the-art, it is imperative to allev…
▽ More
Fusion-based quality assessment has emerged as a powerful method for develo** high-performance quality models from quality models that individually achieve lower performances. A prominent example of such an algorithm is VMAF, which has been widely adopted as an industry standard for video quality prediction along with SSIM. In addition to advancing the state-of-the-art, it is imperative to alleviate the computational burden presented by the use of a heterogeneous set of quality models. In this paper, we unify "atom" quality models by computing them on a common transform domain that accounts for the Human Visual System, and we propose FUNQUE, a quality model that fuses unified quality evaluators. We demonstrate that in comparison to the state-of-the-art, FUNQUE offers significant improvements in both correlation against subjective scores and efficiency, due to computation sharing.
△ Less
Submitted 6 July, 2022; v1 submitted 22 February, 2022;
originally announced February 2022.
-
Tackling Inter-Class Similarity and Intra-Class Variance for Microscopic Image-based Classification
Authors:
Aishwarya Venkataramanan,
Martin Laviale,
Cécile Figus,
Philippe Usseglio-Polatera,
Cédric Pradalier
Abstract:
Automatic classification of aquatic microorganisms is based on the morphological features extracted from individual images. The current works on their classification do not consider the inter-class similarity and intra-class variance that causes misclassification. We are particularly interested in the case where variance within a class occurs due to discrete visual changes in microscopic images. I…
▽ More
Automatic classification of aquatic microorganisms is based on the morphological features extracted from individual images. The current works on their classification do not consider the inter-class similarity and intra-class variance that causes misclassification. We are particularly interested in the case where variance within a class occurs due to discrete visual changes in microscopic images. In this paper, we propose to account for it by partitioning the classes with high variance based on the visual features. Our algorithm automatically decides the optimal number of sub-classes to be created and consider each of them as a separate class for training. This way, the network learns finer-grained visual features. Our experiments on two databases of freshwater benthic diatoms and marine plankton show that our method can outperform the state-of-the-art approaches for classification of these aquatic microorganisms.
△ Less
Submitted 24 September, 2021;
originally announced September 2021.
-
A Hitchhiker's Guide to Structural Similarity
Authors:
Abhinau K. Venkataramanan,
Chengyang Wu,
Alan C. Bovik,
Ioannis Katsavounidis,
Zafar Shahid
Abstract:
The Structural Similarity (SSIM) Index is a very widely used image/video quality model that continues to play an important role in the perceptual evaluation of compression algorithms, encoding recipes and numerous other image/video processing algorithms. Several public implementations of the SSIM and Multiscale-SSIM (MS-SSIM) algorithms have been developed, which differ in efficiency and performan…
▽ More
The Structural Similarity (SSIM) Index is a very widely used image/video quality model that continues to play an important role in the perceptual evaluation of compression algorithms, encoding recipes and numerous other image/video processing algorithms. Several public implementations of the SSIM and Multiscale-SSIM (MS-SSIM) algorithms have been developed, which differ in efficiency and performance. This "bendable ruler" makes the process of quality assessment of encoding algorithms unreliable. To address this situation, we studied and compared the functions and performances of popular and widely used implementations of SSIM, and we also considered a variety of design choices. Based on our studies and experiments, we have arrived at a collection of recommendations on how to use SSIM most effectively, including ways to reduce its computational burden.
△ Less
Submitted 30 January, 2021; v1 submitted 15 January, 2021;
originally announced January 2021.