Search | arXiv e-print repository

YouTube SFV+HDR Quality Dataset

Authors: Yilin Wang, Joong Gon Yim, Neil Birkbeck, Balu Adsumilli

Abstract: The popularity of Short form videos (SFV) has grown dramatically in the past few years, and has become a phenomenal video category with billions of viewers. Meanwhile, High Dynamic Range (HDR) as an advanced feature also becomes more and more popular on video sharing platforms. As a hot topic with huge impact, SFV and HDR bring new questions to video quality research: 1) is SFV+HDR quality assessment significantly different from traditional User Generated Content (UGC) quality assessment? 2) do objective quality metrics designed for traditional UGC still work well for SFV+HDR? To answer the above questions, we created the first large scale SFV+HDR dataset with reliable subjective quality scores, covering 10 popular content categories. Further, we also introduce a general sampling framework to maximize the representativeness of the dataset. We provided a comprehensive analysis of subjective quality scores for Short form SDR and HDR videos, and discuss the reliability of state-of-the-art UGC quality metrics and potential improvements.

Submitted 20 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

Comments: Accepted by 2024 IEEE International Conference on Image Processing. Dataset link: https://media.withyoutube.com/sfv-hdr

arXiv:2303.16163 [pdf, other]

Comparison of HDR quality metrics in Per-Clip Lagrangian multiplier optimisation with AV1

Authors: Vibhoothi, François Pitié, Angeliki Katsenou, Yeping Su, Balu Adsumilli, Anil Kokaram

Abstract: The complexity of modern codecs along with the increased need of delivering high-quality videos at low bitrates has reinforced the idea of a per-clip tailoring of parameters for optimised rate-distortion performance. While the objective quality metrics used for Standard Dynamic Range (SDR) videos have been well studied, the transitioning of consumer displays to support High Dynamic Range (HDR) videos, poses a new challenge to rate-distortion optimisation. In this paper, we review the popular HDR metrics DeltaE100 (DE100), PSNRL100, wPSNR, and HDR-VQM. We measure the impact of employing these metrics in per-clip direct search optimisation of the rate-distortion Lagrange multiplier in AV1. We report, on 35 HDR videos, average Bjontegaard Delta Rate (BD-Rate) gains of 4.675%, 2.226%, and 7.253% in terms of DE100, PSNRL100, and HDR-VQM. We also show that the inclusion of chroma in the quality metrics has a significant impact on optimisation, which can only be partially addressed by the use of chroma offsets.

Submitted 26 April, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

Comments: Accepted version for ICME 2023 Special Session, "Optimised Media Delivery"

arXiv:2303.06254 [pdf, other]

Rate-Distortion Optimization With Alternative References For UGC Video Compression

Authors: Xin Xiong, Eduardo Pavez, Antonio Ortega, Balu Adsumilli

Abstract: User generated content (UGC) refers to videos that are uploaded by users and shared over the Internet. UGC may have low quality due to noise and previous compression. When re-encoding UGC for streaming or downloading, a traditional video coding pipeline will perform rate-distortion (RD) optimization to choose coding parameters. However, in the UGC video coding case, since the input is not pristine, quality "saturation" (or even degradation) can be observed, i.e., increased bitrate only leads to improved representation of coding artifacts and noise present in the UGC input. In this paper, we study the saturation problem in UGC compression, where the goal is to identify and avoid during encoding, the coding parameters and rates that lead to quality saturation. We proposed a geometric criterion for saturation detection that works with rate-distortion optimization, and only requires a few frames from the UGC video. In addition, we show how to combine the proposed saturation detection method with existing video coding systems that implement rate-distortion optimization for efficient compression of UGC videos.

Submitted 10 March, 2023; originally announced March 2023.

Comments: 5 pages, 6 figures, accepted at International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2023

arXiv:2208.11150 [pdf, other]

doi 10.1117/12.2632272

Direct Optimisation of $\boldsymbol{\lambda}$ for HDR Content Adaptive Transcoding in AV1

Authors: Vibhoothi, François Pitié, Angeliki Katsenou, Daniel Joseph Ringis, Yeping Su, Neil Birkbeck, Jessie Lin, Balu Adsumilli, Anil Kokaram

Abstract: Since the adoption of VP9 by Netflix in 2016, royalty-free coding standards continued to gain prominence through the activities of the AOMedia consortium. AV1, the latest open source standard, is now widely supported. In the early years after standardisation, HDR video tends to be under served in open source encoders for a variety of reasons including the relatively small amount of true HDR content being broadcast and the challenges in RD optimisation with that material. AV1 codec optimisation has been ongoing since 2020 including consideration of the computational load. In this paper, we explore the idea of direct optimisation of the Lagrangian $\lambda$ parameter used in the rate control of the encoders to estimate the optimal Rate-Distortion trade-off achievable for a High Dynamic Range signalled video clip. We show that by adjusting the Lagrange multiplier in the RD optimisation process on a frame-hierarchy basis, we are able to increase the Bjontegaard difference rate gains by more than 3.98$\times$ on average without visually affecting the quality.

Submitted 7 October, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

Comments: SPIE2022:Applications of Digital Image Processing XLV accepted manuscript

arXiv:2206.14713 [pdf, other]

CONVIQT: Contrastive Video Quality Estimator

Authors: Pavan C. Madhusudana, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik

Abstract: Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms. Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner. Distortion type identification and degradation level determination is employed as an auxiliary task to train a deep learning model containing a deep Convolutional Neural Network (CNN) that extracts spatial features, as well as a recurrent unit that captures temporal information. The model is trained using a contrastive loss and we therefore refer to this training framework and resulting model as CONtrastive VIdeo Quality EstimaTor (CONVIQT). During testing, the weights of the trained model are frozen, and a linear regressor maps the learned features to quality scores in a no-reference (NR) setting. We conduct comprehensive evaluations of the proposed model on multiple VQA databases by analyzing the correlations between model predictions and ground-truth quality ratings, and achieve competitive performance when compared to state-of-the-art NR-VQA models, even though it is not trained on those databases. Our ablation experiments demonstrate that the learned representations are highly robust and generalize well across synthetic and realistic distortions. Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning. The implementations used in this work have been made available at https://github.com/pavancm/CONVIQT.

Submitted 29 June, 2022; originally announced June 2022.

arXiv:2205.10501 [pdf, ps, other]

doi 10.1109/LSP.2022.3162159

Making Video Quality Assessment Models Sensitive to Frame Rate Distortions

Authors: Pavan C. Madhusudana, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik

Abstract: We consider the problem of capturing distortions arising from changes in frame rate as part of Video Quality Assessment (VQA). Variable frame rate (VFR) videos have become much more common, and streamed videos commonly range from 30 frames per second (fps) up to 120 fps. VFR-VQA offers unique challenges in terms of distortion types as well as in making non-uniform comparisons of reference and distorted videos having different frame rates. The majority of current VQA models require compared videos to be of the same frame rate, but are unable to adequately account for frame rate artifacts. The recently proposed Generalized Entropic Difference (GREED) VQA model succeeds at this task, using natural video statistics models of entropic differences of temporal band-pass coefficients, delivering superior performance on predicting video quality changes arising from frame rate distortions. Here we propose a simple fusion framework, whereby temporal features from GREED are combined with existing VQA models, towards improving model sensitivity towards frame rate distortions. We find through extensive experiments that this feature fusion significantly boosts model performance on both HFR/VFR datasets as well as fixed frame rate (FFR) VQA databases. Our results suggest that employing efficient temporal representations can result in much more robust and accurate VQA models when frame rate variations can occur.

Submitted 21 May, 2022; originally announced May 2022.

Journal ref: IEEE Signal Processing Letters. 29 (2022) 897-901

arXiv:2204.00128 [pdf, other]

Perceptual Quality Assessment of UGC Gaming Videos

Authors: Xiangxu Yu, Zhengzhong Tu, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik

Abstract: In recent years, with the vigorous development of the video game industry, the proportion of gaming videos on major video websites like YouTube has dramatically increased. However, relatively little research has been done on the automatic quality prediction of gaming videos, especially on those that fall in the category of "User-Generated-Content" (UGC). Since current leading general-purpose Video Quality Assessment (VQA) models do not perform well on this type of gaming videos, we have created a new VQA model specifically designed to succeed on UGC gaming videos, which we call the Gaming Video Quality Predictor (GAME-VQP). GAME-VQP successfully predicts the unique statistical characteristics of gaming videos by drawing upon features designed under modified natural scene statistics models, combined with gaming specific features learned by a Convolution Neural Network. We study the performance of GAME-VQP on a very recent large UGC gaming video database called LIVE-YT-Gaming, and find that it both outperforms other mainstream general VQA models as well as VQA models specifically designed for gaming videos. The new model will be made public after paper being accepted.

Submitted 13 April, 2022; v1 submitted 31 March, 2022; originally announced April 2022.

arXiv:2203.12824 [pdf, other]

Subjective and Objective Analysis of Streamed Gaming Videos

Authors: Xiangxu Yu, Zhenqiang Ying, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik

Abstract: The rising popularity of online User-Generated-Content (UGC) in the form of streamed and shared videos, has hastened the development of perceptual Video Quality Assessment (VQA) models, which can be used to help optimize their delivery. Gaming videos, which are a relatively new type of UGC videos, are created when skilled gamers post videos of their gameplay. These kinds of screenshots of UGC gameplay videos have become extremely popular on major streaming platforms like YouTube and Twitch. Synthetically-generated gaming content presents challenges to existing VQA algorithms, including those based on natural scene/video statistics models. Synthetically generated gaming content presents different statistical behavior than naturalistic videos. A number of studies have been directed towards understanding the perceptual characteristics of professionally generated gaming videos arising in gaming video streaming, online gaming, and cloud gaming. However, little work has been done on understanding the quality of UGC gaming videos, and how it can be characterized and predicted. Towards boosting the progress of gaming video VQA model development, we conducted a comprehensive study of subjective and objective VQA models on UGC gaming videos. To do this, we created a novel UGC gaming video resource, called the LIVE-YouTube Gaming video quality (LIVE-YT-Gaming) database, comprised of 600 real UGC gaming videos. We conducted a subjective human study on this data, yielding 18,600 human quality ratings recorded by 61 human subjects. We also evaluated a number of state-of-the-art (SOTA) VQA models on the new database, including a new one, called GAME-VQP, based on both natural video statistics and CNN-learned features. To help support work in this field, we are making the new LIVE-YT-Gaming Database, publicly available through the link: https://live.ece.utexas.edu/research/LIVE-YT-Gaming/index.html

Submitted 23 March, 2022; originally announced March 2022.

arXiv:2203.03553 [pdf, other]

Compression of user generated content using denoised references

Authors: Eduardo Pavez, Enrique Perez, Xin Xiong, Antonio Ortega, Balu Adsumilli

Abstract: Video shared over the internet is commonly referred to as user generated content (UGC). UGC video may have low quality due to various factors including previous compression. UGC video is uploaded by users, and then it is re-encoded to be made available at various levels of quality. In a traditional video coding pipeline the encoder parameters are optimized to minimize a rate-distortion criterion, but when the input signal has low quality, this results in sub-optimal coding parameters optimized to preserve undesirable artifacts. In this paper we formulate the UGC compression problem as that of compression of a noisy/corrupted source. The noisy source coding theorem reveals that an optimal UGC compression system is comprised of optimal denoising of the UGC signal, followed by compression of the denoised signal. Since optimal denoising is unattainable and users may be against modification of their content, we propose encoding the UGC signal, and using denoised references only to compute distortion, so the encoding process can be guided towards perceptually better solutions. We demonstrate the effectiveness of the proposed strategy for JPEG compression of UGC images and videos.

Submitted 17 July, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

Comments: 5 pages, 6 figures, accepted at International Conference on Image Processing (ICIP) 2022

arXiv:2110.13266 [pdf, other]

doi 10.1109/TIP.2022.3181496

Image Quality Assessment using Contrastive Learning

Authors: Pavan C. Madhusudana, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik

Abstract: We consider the problem of obtaining image quality representations in a self-supervised manner. We use prediction of distortion type and degree as an auxiliary task to learn features from an unlabeled image dataset containing a mixture of synthetic and realistic distortions. We then train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve the auxiliary problem. We refer to the proposed training framework and resulting deep IQA model as the CONTRastive Image QUality Evaluator (CONTRIQUE). During evaluation, the CNN weights are frozen and a linear regressor maps the learned representations to quality scores in a No-Reference (NR) setting. We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art NR image quality models, even without any additional fine-tuning of the CNN backbone. The learned representations are highly robust and generalize well across images afflicted by either synthetic or authentic distortions. Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets. The implementations used in this paper are available at https://github.com/pavancm/CONTRIQUE.

Submitted 25 October, 2021; originally announced October 2021.

Journal ref: IEEE Transactions on Image Processing. 31 (2022) 4149 - 4161

arXiv:2109.12785 [pdf, other]

doi 10.1109/PCS50896.2021.9477462

High Frame Rate Video Quality Assessment using VMAF and Entropic Differences

Authors: Pavan C Madhusudana, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik

Abstract: The popularity of streaming videos with live, high-action content has led to an increased interest in High Frame Rate (HFR) videos. In this work we address the problem of frame rate dependent Video Quality Assessment (VQA) when the videos to be compared have different frame rate and compression factor. The current VQA models such as VMAF have superior correlation with perceptual judgments when videos to be compared have same frame rates and contain conventional distortions such as compression, scaling etc. However this framework requires additional pre-processing step when videos with different frame rates need to be compared, which can potentially limit its overall performance. Recently, Generalized Entropic Difference (GREED) VQA model was proposed to account for artifacts that arise due to changes in frame rate, and showed superior performance on the LIVE-YT-HFR database which contains frame rate dependent artifacts such as judder, strobing etc. In this paper we propose a simple extension, where the features from VMAF and GREED are fused in order to exploit the advantages of both models. We show through various experiments that the proposed fusion framework results in more efficient features for predicting frame rate dependent video quality. We also evaluate the fused feature set on standard non-HFR VQA databases and obtain superior performance than both GREED and VMAF, indicating the combined feature set captures complementary perceptual quality information.

Submitted 27 September, 2021; originally announced September 2021.

Journal ref: 2021 Picture Coding Symposium (PCS)

arXiv:2102.00155 [pdf, other]

Regression or Classification? New Methods to Evaluate No-Reference Picture and Video Quality Models

Authors: Zhengzhong Tu, Chia-Ju Chen, Li-Heng Chen, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik

Abstract: Video and image quality assessment has long been projected as a regression problem, which requires predicting a continuous quality score given an input stimulus. However, recent efforts have shown that accurate quality score regression on real-world user-generated content (UGC) is a very challenging task. To make the problem more tractable, we propose two new methods - binary, and ordinal classification - as alternatives to evaluate and compare no-reference quality models at coarser levels. Moreover, the proposed new tasks convey more practical meaning on perceptually optimized UGC transcoding, or for preprocessing on media processing platforms. We conduct a comprehensive benchmark experiment of popular no-reference quality models on recent in-the-wild picture and video quality datasets, providing reliable baselines for both evaluation methods to support further studies. We hope this work promotes coarse-grained perceptual modeling and its applications to efficient UGC processing.

Submitted 30 January, 2021; originally announced February 2021.

Comments: ICASSP2021

arXiv:2101.10955 [pdf, other]

doi 10.1109/OJSP.2021.3090333

RAPIQUE: Rapid and Accurate Video Quality Prediction of User Generated Content

Authors: Zhengzhong Tu, Xiangxu Yu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik

Abstract: Blind or no-reference video quality assessment of user-generated content (UGC) has become a trending, challenging, heretofore unsolved problem. Accurate and efficient video quality predictors suitable for this content are thus in great demand to achieve more intelligent analysis and processing of UGC videos. Previous studies have shown that natural scene statistics and deep learning features are both sufficient to capture spatial distortions, which contribute to a significant aspect of UGC video quality issues. However, these models are either incapable or inefficient for predicting the quality of complex and diverse UGC videos in practical applications. Here we introduce an effective and efficient video quality model for UGC content, which we dub the Rapid and Accurate Video Quality Evaluator (RAPIQUE), which we show performs comparably to state-of-the-art (SOTA) models but with orders-of-magnitude faster runtime. RAPIQUE combines and leverages the advantages of both quality-aware scene statistics features and semantics-aware deep convolutional features, allowing us to design the first general and efficient spatial and temporal (space-time) bandpass statistics model for video quality modeling. Our experimental results on recent large-scale UGC video quality databases show that RAPIQUE delivers top performances on all the datasets at a considerably lower computational expense. We hope this work promotes and inspires further efforts towards practical modeling of video quality problems for potential real-time and low-latency applications. To promote public usage, an implementation of RAPIQUE has been made freely available online: https://github.com/vztu/RAPIQUE.

Submitted 14 November, 2021; v1 submitted 26 January, 2021; originally announced January 2021.

Comments: IEEE Open Journal of Signal Processing 2021

arXiv:2010.13715 [pdf, other]

doi 10.1109/TIP.2021.3106801

ST-GREED: Space-Time Generalized Entropic Differences for Frame Rate Dependent Video Quality Prediction

Authors: Pavan C. Madhusudana, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik

Abstract: We consider the problem of conducting frame rate dependent video quality assessment (VQA) on videos of diverse frame rates, including high frame rate (HFR) videos. More generally, we study how perceptual quality is affected by frame rate, and how frame rate and compression combine to affect perceived quality. We devise an objective VQA model called Space-Time GeneRalized Entropic Difference (GREED) which analyzes the statistics of spatial and temporal band-pass video coefficients. A generalized Gaussian distribution (GGD) is used to model band-pass responses, while entropy variations between reference and distorted videos under the GGD model are used to capture video quality variations arising from frame rate changes. The entropic differences are calculated across multiple temporal and spatial subbands, and merged using a learned regressor. We show through extensive experiments that GREED achieves state-of-the-art performance on the LIVE-YT-HFR Database when compared with existing VQA models. The features used in GREED are highly generalizable and obtain competitive performance even on standard, non-HFR VQA databases. The implementation of GREED has been made available online: https://github.com/pavancm/GREED

Submitted 26 September, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

Journal ref: IEEE Transactions on Image Processing. 30 (2021) 7446 - 7457

arXiv:2009.10804 [pdf, other]

doi 10.1109/LSP.2020.3024985

Adaptive Debanding Filter

Authors: Zhengzhong Tu, Jessie Lin, Yilin Wang, Balu Adsumilli, Alan C. Bovik

Abstract: Banding artifacts, which manifest as staircase-like color bands on pictures or video frames, are a common distortion caused by compression of low-textured smooth regions. These false contours can be very noticeable even on high-quality videos, especially when displayed on high-definition screens. Yet, relatively little attention has been applied to this problem. Here we consider banding artifact removal as a visual enhancement problem, and accordingly, we solve it by applying a form of content-adaptive smoothing filtering followed by dithered quantization, as a post-processing module. The proposed debanding filter is able to adaptively smooth banded regions while preserving image edges and details, yielding perceptually enhanced gradient rendering with limited bit-depths. Experimental results show that our proposed debanding filter outperforms state-of-the-art false contour removing algorithms both visually and quantitatively.

Submitted 22 September, 2020; originally announced September 2020.

Comments: 4 pages, 7 figures, 1 table. Accepted to IEEE Signal Processing Letters

arXiv:2007.11634 [pdf, other]

doi 10.1109/ACCESS.2021.3100462

Subjective and Objective Quality Assessment of High Frame Rate Videos

Authors: Pavan C. Madhusudana, Xiangxu Yu, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik

Abstract: High frame rate (HFR) videos are becoming increasingly common with the tremendous popularity of live, high-action streaming content such as sports. Although HFR contents are generally of very high quality, high bandwidth requirements make them challenging to deliver efficiently, while simultaneously maintaining their quality. To optimize trade-offs between bandwidth requirements and video quality, in terms of frame rate adaptation, it is imperative to understand the intricate relationship between frame rate and perceptual video quality. Towards advancing progression in this direction we designed a new subjective resource, called the LIVE-YouTube-HFR (LIVE-YT-HFR) dataset, which is comprised of 480 videos having 6 different frame rates, obtained from 16 diverse contents. In order to understand the combined effects of compression and frame rate adjustment, we also processed videos at 5 compression levels at each frame rate. To obtain subjective labels on the videos, we conducted a human study yielding 19,000 human quality ratings obtained from a pool of 85 human subjects. We also conducted a holistic evaluation of existing state-of-the-art Full and No-Reference video quality algorithms, and statistically benchmarked their performance on the new database. The LIVE-YT-HFR database has been made available online for public use and evaluation purposes, with hopes that it will help advance research in this exciting video technology direction. It may be obtained at https://live.ece.utexas.edu/research/LIVE_YT_HFR/LIVE_YT_HFR/index.html

Submitted 26 September, 2021; v1 submitted 22 July, 2020; originally announced July 2020.

Journal ref: IEEE Access. 9 (2021) 108069 - 108082

arXiv:2006.11424 [pdf, other]

doi 10.1109/LSP.2020.3028687

Capturing Video Frame Rate Variations via Entropic Differencing

Authors: Pavan C. Madhusudana, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik

Abstract: High frame rate videos are increasingly getting popular in recent years, driven by the strong requirements of the entertainment and streaming industries to provide high quality of experiences to consumers. To achieve the best trade-offs between the bandwidth requirements and video quality in terms of frame rate adaptation, it is imperative to understand the effects of frame rate on video quality. In this direction, we devise a novel statistical entropic differencing method based on a Generalized Gaussian Distribution model expressed in the spatial and temporal band-pass domains, which measures the difference in quality between reference and distorted videos. The proposed design is highly generalizable and can be employed when the reference and distorted sequences have different frame rates. Our proposed model correlates very well with subjective scores in the recently proposed LIVE-YT-HFR database and achieves state of the art performance when compared with existing methodologies.

Submitted 20 October, 2020; v1 submitted 19 June, 2020; originally announced June 2020.

Journal ref: IEEE Signal Processing Letters. 27 (2020) 1809-1813

arXiv:2005.14354 [pdf, other]

doi 10.1109/TIP.2021.3072221

UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content

Authors: Zhengzhong Tu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik

Abstract: Recent years have witnessed an explosion of user-generated content (UGC) videos shared and streamed over the Internet, thanks to the evolution of affordable and reliable consumer capture devices, and the tremendous popularity of social media platforms. Accordingly, there is a great need for accurate video quality assessment (VQA) models for UGC/consumer videos to monitor, control, and optimize this vast content. Blind quality prediction of in-the-wild videos is quite challenging, since the quality degradations of UGC content are unpredictable, complicated, and often commingled. Here we contribute to advancing the UGC-VQA problem by conducting a comprehensive evaluation of leading no-reference/blind VQA (BVQA) features and models on a fixed evaluation architecture, yielding new empirical insights on both subjective video quality studies and VQA model design. By employing a feature selection strategy on top of leading VQA model features, we are able to extract 60 of the 763 statistical features used by the leading models to create a new fusion-based BVQA model, which we dub the VIDeo quality EVALuator (VIDEVAL), that effectively balances the trade-off between VQA performance and efficiency. Our experimental results show that VIDEVAL achieves state-of-the-art performance at considerably lower computational cost than other leading models. Our study protocol also defines a reliable benchmark for the UGC-VQA problem, which we believe will facilitate further research on deep learning-based VQA modeling, as well as perceptually-optimized efficient UGC video processing, transcoding, and streaming. To promote reproducible research and public evaluation, an implementation of VIDEVAL has been made available online: https://github.com/tu184044109/VIDEVAL_release.

Submitted 17 April, 2021; v1 submitted 28 May, 2020; originally announced May 2020.

Comments: IEEE Transactions on Image Processing 2021

arXiv:2004.02943 [pdf, other]

doi 10.1109/TIP.2021.3107213

Predicting the Quality of Compressed Videos with Pre-Existing Distortions

Authors: Xiangxu Yu, Neil Birkbeck, Yilin Wang, Christos G. Bampis, Balu Adsumilli, Alan C. Bovik

Abstract: Over the past decade, the online video industry has greatly expanded the volume of visual data that is streamed and shared over the Internet. Moreover, because of the increasing ease of video capture, many millions of consumers create and upload large volumes of User-Generated-Content (UGC) videos. Unlike streaming television or cinematic content produced by professional videographers and cinematographers, UGC videos are most commonly captured by naive users having limited skills and imperfect technique, and often are afflicted by highly diverse and mixed in-capture distortions. These UGC videos are then often uploaded for sharing onto cloud servers, where they are further compressed for storage and transmission. Our paper tackles the highly practical problem of predicting the quality of compressed videos (perhaps during the process of compression, to help guide it), with only (possibly severely) distorted UGC videos as references. To address this problem, we have developed a novel Video Quality Assessment (VQA) framework that we call 1stepVQA (to distinguish it from two-step methods that we discuss). 1stepVQA overcomes limitations of Full-Reference, Reduced-Reference and No-Reference VQA models by exploiting the statistical regularities of both natural videos and distorted videos. We show that 1stepVQA is able to more accurately predict the quality of compressed videos, given imperfect reference videos. We also describe a new dedicated video database which includes (typically distorted) UGC reference videos, and a large number of compressed versions of them. We show that the 1stepVQA model outperforms other VQA models in this scenario. We are providing the dedicated new database free of charge at https://live.ece.utexas.edu/research/onestep/index.html

Submitted 6 April, 2020; originally announced April 2020.

arXiv:2002.12275 [pdf, other]

Subjective Quality Assessment for YouTube UGC Dataset

Authors: Joong Gon Yim, Yilin Wang, Neil Birkbeck, Balu Adsumilli

Abstract: Due to the scale of social video sharing, User Generated Content (UGC) is getting more attention from academia and industry. To facilitate compression-related research on UGC, YouTube has released a large-scale dataset. The initial dataset only provided videos, limiting its use in quality assessment. We used a crowd-sourcing platform to collect subjective quality scores for this dataset. We analyzed the distribution of Mean Opinion Score (MOS) in various dimensions, and investigated some fundamental questions in video quality assessment, like the correlation between full video MOS and corresponding chunk MOS, and the influence of chunk variation in quality score aggregation.

Submitted 27 February, 2020; originally announced February 2020.

arXiv:2002.11891 [pdf, other]

BBAND Index: A No-Reference Banding Artifact Predictor

Authors: Zhengzhong Tu, Jessie Lin, Yilin Wang, Balu Adsumilli, Alan C. Bovik

Abstract: Banding artifact, or false contouring, is a common video compression impairment that tends to appear on large flat regions in encoded videos. These staircase-shaped color bands can be very noticeable in high-definition videos. Here we study this artifact, and propose a new distortion-specific no-reference video quality model for predicting banding artifacts, called the Blind BANding Detector (BBAND index). BBAND is inspired by human visual models. The proposed detector can generate a pixel-wise banding visibility map and output a banding severity score at both the frame and video levels. Experimental results show that our proposed method outperforms state-of-the-art banding detection algorithms and delivers better consistency with subjective evaluations.

Submitted 26 February, 2020; originally announced February 2020.

Comments: Accepted by ICASSP 2020

arXiv:1904.06457 [pdf, other]

doi 10.1109/MMSP.2019.8901772

YouTube UGC Dataset for Video Compression Research

Authors: Yilin Wang, Sasi Inguva, Balu Adsumilli

Abstract: Non-professional video, commonly known as User Generated Content (UGC), has become very popular in today's video sharing applications. However, traditional metrics used in compression and quality assessment, like BD-Rate and PSNR, are designed for pristine originals. Thus, their accuracy drops significantly when being applied on non-pristine originals (the majority of UGC). Understanding difficulties for compression and quality assessment in the scenario of UGC is important, but there are few public UGC datasets available for research. This paper introduces a large scale UGC dataset (1500 20-second video clips) sampled from millions of YouTube videos. The dataset covers popular categories like Gaming, Sports, and new features like High Dynamic Range (HDR). Besides a novel sampling method based on features extracted from encoding, challenges for UGC compression and quality evaluation are also discussed. Shortcomings of traditional reference-based metrics on UGC are addressed. We demonstrate a promising way to evaluate UGC quality by no-reference objective quality metrics, and evaluate the current dataset with three no-reference metrics (Noise, Banding, and SLEEQ).

Submitted 1 August, 2019; v1 submitted 12 April, 2019; originally announced April 2019.

Journal ref: 2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP)

Showing 1–22 of 22 results for author: Adsumilli, B