Search | arXiv e-print repository

AIGCOIQA2024: Perceptual Quality Assessment of AI Generated Omnidirectional Images

Authors: Liu Yang, Huiyu Duan, Long Teng, Yucheng Zhu, Xiaohong Liu, Menghan Hu, Xiongkuo Min, Guangtao Zhai, Patrick Le Callet

Abstract: In recent years, the rapid advancement of Artificial Intelligence Generated Content (AIGC) has attracted widespread attention. Among the AIGC, AI generated omnidirectional images hold significant potential for Virtual Reality (VR) and Augmented Reality (AR) applications, hence omnidirectional AIGC techniques have also been widely studied. AI-generated omnidirectional images exhibit unique distortions compared to natural omnidirectional images, however, there is no dedicated Image Quality Assessment (IQA) criteria for assessing them. This study addresses this gap by establishing a large-scale AI generated omnidirectional image IQA database named AIGCOIQA2024 and constructing a comprehensive benchmark. We first generate 300 omnidirectional images based on 5 AIGC models utilizing 25 text prompts. A subjective IQA experiment is conducted subsequently to assess human visual preferences from three perspectives including quality, comfortability, and correspondence. Finally, we conduct a benchmark experiment to evaluate the performance of state-of-the-art IQA models on our database. The database will be released to facilitate future research.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2302.04796 [pdf, other]

BASICS: Broad quality Assessment of Static point clouds In Compression Scenarios

Authors: Ali Ak, Emin Zerman, Maurice Quach, Aladine Chetouani, Aljosa Smolic, Giuseppe Valenzise, Patrick Le Callet

Abstract: Point clouds are now commonly used to represent 3D scenes in virtual world, in addition to 3D meshes. Their ease of capture enable various applications on mobile devices, such as smartphones or other microcontrollers. Point cloud compression is now at an advanced level and being standardized. Nevertheless, quality assessment databases, which is needed to develop better objective quality metrics, are still limited. In this work, we create a broad quality assessment database for static point clouds, mainly for telepresence scenario. For the sake of completeness, the created database is analyzed using the mean opinion scores, and it is used to benchmark several state-of-the-art quality estimators. The generated database is named Broad quality Assessment of Static point clouds In Compression Scenario (BASICS). Currently, the BASICS database is used as part of the ICIP 2023 Grand Challenge on Point Cloud Quality Assessment, and therefore only a part of the database has been made publicly available at the challenge website. The rest of the database will be made available once the challenge is over.

Submitted 9 February, 2023; originally announced February 2023.

Comments: Manuscript in preparation, 11 pages, 8 figures

arXiv:2206.09854 [pdf, other]

On the benefit of parameter-driven approaches for the modeling and the prediction of Satisfied User Ratio for compressed video

Authors: Jingwen Zhu, Patrick Le Callet, Anne-Flore Perrin, Sriram Sethuraman, Kumar Rahul

Abstract: The human eye cannot perceive small pixel changes in images or videos until a certain threshold of distortion. In the context of video compression, Just Noticeable Difference (JND) is the smallest distortion level from which the human eye can perceive the difference between reference video and the distorted/compressed one. Satisfied-User-Ratio (SUR) curve is the complementary cumulative distribution function of the individual JNDs of a viewer group. However, most of the previous works predict each point in SUR curve by using features both from source video and from compressed videos with assumption that the group-based JND annotations follow Gaussian distribution, which is neither practical nor accurate. In this work, we firstly compared various common functions for SUR curve modeling. Afterwards, we proposed a novel parameter-driven method to predict the video-wise SUR from video features. Besides, we compared the prediction results of source-only features based (SRC-based) models and source plus compressed videos features (SRC+PVS-based) models.

Submitted 20 June, 2022; originally announced June 2022.

arXiv:2205.08007 [pdf, other]

Perceptual Evaluation on Audio-visual Dataset of 360 Content

Authors: Randy F Fela, Andréas Pastor, Patrick Le Callet, Nick Zacharov, Toinon Vigier, Søren Forchhammer

Abstract: To open up new possibilities to assess the multimodal perceptual quality of omnidirectional media formats, we proposed a novel open source 360 audiovisual (AV) quality dataset. The dataset consists of high-quality 360 video clips in equirectangular (ERP) format and higher-order ambisonic (4th order) along with the subjective scores. Three subjective quality experiments were conducted for audio, video, and AV with the procedures detailed in this paper. Using the data from subjective tests, we demonstrated that this dataset can be used to quantify perceived audio, video, and audiovisual quality. The diversity and discriminability of subjective scores were also analyzed. Finally, we investigated how our dataset correlates with various objective quality metrics of audio and video. Evidence from the results of this study implies that the proposed dataset can benefit future studies on multimodal quality evaluation of 360 content.

Submitted 16 May, 2022; originally announced May 2022.

Comments: 6 pages, 5 figures, International Conference on Multimedia and Expo 2022

arXiv:2205.07565 [pdf, other]

A Framework to Map VMAF with the Probability of Just Noticeable Difference between Video Encoding Recipes

Authors: Jingwen Zhu, Suiyi Ling, Yoann Baveye, Patrick Le Callet

Abstract: Just Noticeable Difference (JND) model developed based on Human Vision System (HVS) through subjective studies is valuable for many multimedia use cases. In the streaming industries, it is commonly applied to reach a good balance between compression efficiency and perceptual quality when selecting video encoding recipes. Nevertheless, recent state-of-the-art deep learning based JND prediction model relies on large-scale JND ground truth that is expensive and time consuming to collect. Most of the existing JND datasets contain limited number of contents and are limited to a certain codec (e.g., H264). As a result, JND prediction models that were trained on such datasets are normally not agnostic to the codecs. To this end, in order to decouple encoding recipes and JND estimation, we propose a novel framework to map the difference of objective Video Quality Assessment (VQA) scores, i.e., VMAF, between two given videos encoded with different encoding recipes from the same content to the probability of having just noticeable difference between them. The proposed probability mapping model learns from DCR test data, which is significantly cheaper compared to standard JND subjective test. As we utilize objective VQA metric (e.g., VMAF that trained with contents encoded with different codecs) as proxy to estimate JND, our model is agnostic to codecs and computationally efficient. Throughout extensive experiments, it is demonstrated that the proposed model is able to estimate JND values efficiently.

Submitted 20 May, 2022; v1 submitted 16 May, 2022; originally announced May 2022.

arXiv:2205.03574 [pdf, other]

Utility-Oriented Underwater Image Quality Assessment Based on Transfer Learning

Authors: Weiling Chen, Rongfu Lin, Honggang Liao, Tiesong Zhao, Ke Gu, Patrick Le Callet

Abstract: The widespread image applications have greatly promoted the vision-based tasks, in which the Image Quality Assessment (IQA) technique has become an increasingly significant issue. For user enjoyment in multimedia systems, the IQA exploits image fidelity and aesthetics to characterize user experience; while for other tasks such as popular object recognition, there exists a low correlation between utilities and perceptions. In such cases, the fidelity-based and aesthetics-based IQA methods cannot be directly applied. To address this issue, this paper proposes a utility-oriented IQA in object recognition. In particular, we initialize our research in the scenario of underwater fish detection, which is a critical task that has not yet been perfectly addressed. Based on this task, we build an Underwater Image Utility Database (UIUD) and a learning-based Underwater Image Utility Measure (UIUM). Inspired by the top-down design of fidelity-based IQA, we exploit the deep models of object recognition and transfer their features to our UIUM. Experiments validate that the proposed transfer-learning-based UIUM achieves promising performance in the recognition task. We envision our research provides insights to bridge the researches of IQA and computer vision.

Submitted 7 May, 2022; originally announced May 2022.

arXiv:2204.04900 [pdf, other]

doi 10.1109/TIP.2022.3220404

Confusing Image Quality Assessment: Towards Better Augmented Reality Experience

Authors: Huiyu Duan, Xiongkuo Min, Yucheng Zhu, Guangtao Zhai, Xiaokang Yang, Patrick Le Callet

Abstract: With the development of multimedia technology, Augmented Reality (AR) has become a promising next-generation mobile platform. The primary value of AR is to promote the fusion of digital contents and real-world environments, however, studies on how this fusion will influence the Quality of Experience (QoE) of these two components are lacking. To achieve better QoE of AR, whose two layers are influenced by each other, it is important to evaluate its perceptual quality first. In this paper, we consider AR technology as the superimposition of virtual scenes and real scenes, and introduce visual confusion as its basic theory. A more general problem is first proposed, which is evaluating the perceptual quality of superimposed images, i.e., confusing image quality assessment. A ConFusing Image Quality Assessment (CFIQA) database is established, which includes 600 reference images and 300 distorted images generated by mixing reference images in pairs. Then a subjective quality perception study and an objective model evaluation experiment are conducted towards attaining a better understanding of how humans perceive the confusing images. An objective metric termed CFIQA is also proposed to better evaluate the confusing image quality. Moreover, an extended ARIQA study is further conducted based on the CFIQA study. We establish an ARIQA database to better simulate the real AR application scenarios, which contains 20 AR reference images, 20 background (BG) reference images, and 560 distorted images generated from AR and BG references, as well as the correspondingly collected subjective quality ratings. We also design three types of full-reference (FR) IQA metrics to study whether we should consider the visual confusion when designing corresponding IQA algorithms. An ARIQA metric is finally proposed for better evaluating the perceptual quality of AR images.

Submitted 31 October, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

arXiv:2103.05099 [pdf, other]

Subjective and Objective Quality Assessment of Mobile Gaming Video

Authors: Shaoguo Wen, Suiyi Ling, Junle Wang, Ximing Chen, Lizhi Fang, Yanqing Jing, Patrick Le Callet

Abstract: Nowadays, with the vigorous expansion and development of gaming video streaming techniques and services, the expectation of users, especially the mobile phone users, for higher quality of experience is also growing swiftly. As most of the existing research focuses on traditional video streaming, there is a clear lack of both subjective study and objective quality models that are tailored for quality assessment of mobile gaming content. To this end, in this study, we first present a brand new Tencent Gaming Video dataset containing 1293 mobile gaming sequences encoded with three different codecs. Second, we propose an objective quality framework, namely Efficient hard-RAnk Quality Estimator (ERAQUE), that is equipped with (1) a novel hard pairwise ranking loss, which forces the model to put more emphasis on differentiating similar pairs; (2) an adapted model distillation strategy, which could be utilized to compress the proposed model efficiently without causing significant performance drop. Extensive experiments demonstrate the efficiency and robustness of our model.

Submitted 27 January, 2021; originally announced March 2021.

Comments: 5 pages

MSC Class: 68U10 ACM Class: J.0

arXiv:2101.07451 [pdf, other]

doi 10.1109/TMM.2020.3032026

Wide Color Gamut Image Content Characterization: Method, Evaluation, and Applications

Authors: Junghyuk Lee, Toinon Vigier, Patrick Le Callet, Jong-Seok Lee

Abstract: In this paper, we propose a novel framework to characterize a wide color gamut image content based on perceived quality due to the processes that change color gamut, and demonstrate two practical use cases where the framework can be applied. We first introduce the main framework and implementation details. Then, we provide analysis for understanding of existing wide color gamut datasets with quantitative characterization criteria on their characteristics, where four criteria, i.e., coverage, total coverage, uniformity, and total uniformity, are proposed. Finally, the framework is applied to content selection in a gamut mapping evaluation scenario in order to enhance reliability and robustness of the evaluation results. As a result, the framework fulfils content characterization for studies where quality of experience of wide color gamut stimuli is involved.

Submitted 18 January, 2021; originally announced January 2021.

Journal ref: IEEE Transactions on Multimedia (2020)

arXiv:2101.07439 [pdf, other]

doi 10.1016/j.image.2021.116150

Ambiguity of Objective Image Quality Metrics: A New Methodology for Performance Evaluation

Authors: Manri Cheon, Toinon Vigier, Lukáš Krasula, Junghyuk Lee, Patrick Le Callet, Jong-Seok Lee

Abstract: Objective image quality metrics try to estimate the perceptual quality of the given image by considering the characteristics of the human visual system. However, it is possible that the metrics produce different quality scores even for two images that are perceptually indistinguishable by human viewers, which have not been considered in the existing studies related to objective quality assessment. In this paper, we address the issue of ambiguity of objective image quality assessment. We propose an approach to obtain an ambiguity interval of an objective metric, within which the quality score difference is not perceptually significant. In particular, we use the visual difference predictor, which can consider viewing conditions that are important for visual quality perception. In order to demonstrate the usefulness of the proposed approach, we conduct experiments with 33 state-of-the-art image quality metrics in the viewpoint of their accuracy and ambiguity for three image quality databases. The results show that the ambiguity intervals can be applied as an additional figure of merit when conventional performance measurement does not determine superiority between the metrics. The effect of the viewing distance on the ambiguity interval is also shown.

Submitted 18 January, 2021; originally announced January 2021.

Journal ref: Signal Processing: Image Communication (2021)

arXiv:1909.01738 [pdf, other]

Binocular Rivalry Oriented Predictive Auto-Encoding Network for Blind Stereoscopic Image Quality Measurement

Authors: Jiahua Xu, Wei Zhou, Zhibo Chen, Suiyi Ling, Patrick Le Callet

Abstract: Stereoscopic image quality measurement (SIQM) has become increasingly important for guiding stereo image processing and commutation systems due to the widespread usage of 3D contents. Compared with conventional methods which are relied on hand-crafted features, deep learning oriented measurements have achieved remarkable performance in recent years. However, most existing deep SIQM evaluators are not specifically built for stereoscopic contents and consider little prior domain knowledge of the 3D human visual system (HVS) in network design. In this paper, we develop a Predictive Auto-encoDing Network (PAD-Net) for blind/No-Reference stereoscopic image quality measurement. In the first stage, inspired by the predictive coding theory that the cognition system tries to match bottom-up visual signal with top-down predictions, we adopt the encoder-decoder architecture to reconstruct the distorted inputs. Besides, motivated by the binocular rivalry phenomenon, we leverage the likelihood and prior maps generated from the predictive coding process in the Siamese framework for assisting SIQM. In the second stage, quality regression network is applied to the fusion image for acquiring the perceptual quality prediction. The performance of PAD-Net has been extensively evaluated on three benchmark databases and the superiority has been well validated on both symmetrically and asymmetrically distorted stereoscopic images under various distortion types.

Submitted 1 November, 2020; v1 submitted 4 September, 2019; originally announced September 2019.

arXiv:1905.00161 [pdf, other]

State-of-the-art in 360° Video/Image Processing: Perception, Assessment and Compression

Authors: Chen Li, Mai Xu, Shanyi Zhang, Patrick Le Callet

Abstract: Nowadays, 360° video/image has been increasingly popular and drawn great attention. The spherical viewing range of 360° video/image accounts for huge data, which pose the challenges to 360° video/image processing in solving the bottleneck of storage, transmission, etc. Accordingly, the recent years have witnessed the explosive emergence of works on 360° video/image processing. In this paper, we review the state-of-the-art works on 360° video/image processing from the aspects of perception, assessment and compression. First, this paper reviews both datasets and visual attention modelling approaches for 360° video/image. Second, we survey the related works on both subjective and objective visual quality assessment (VQA) of 360° video/image. Third, we overview the compression approaches for 360° video/image, which either utilize the spherical characteristics or visual attention models. Finally, we summarize this overview paper and outlook the future research trends on 360° video/image processing.

Submitted 28 October, 2019; v1 submitted 30 April, 2019; originally announced May 2019.

Comments: Submitted to IEEE J-STSP SI of Perception-driven 360-degree video processing as an Invited Overview Paper

Showing 1–12 of 12 results for author: Callet, P L