Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models

You, Zhiyuan; Li, Zheyuan; Gu, ****; Yin, Zhenfei; Xue, Tianfan; Dong, Chao


arXiv:2312.08962 (cs)

[Submitted on 14 Dec 2023 (v1), last revised 10 Mar 2024 (this version, v2)]


Abstract: We introduce a Depicted image Quality Assessment method (DepictQA), overcoming the constraints of traditional score-based methods. DepictQA allows for detailed, language-based, human-like evaluation of image quality by leveraging Multi-modal Large Language Models (MLLMs). Unlike conventional Image Quality Assessment (IQA) methods that rely on scores, DepictQA interprets image content and distortions descriptively and comparatively, aligning closely with humans' reasoning process. To build the DepictQA model, we establish a hierarchical task framework and collect a multi-modal IQA training dataset. To tackle the challenges of limited training data and multi-image processing, we propose to use multi-source training data and specialized image tags. These designs result in better performance for DepictQA than for score-based approaches on multiple benchmarks. Moreover, compared with general MLLMs, DepictQA generates more accurate descriptive reasoning. Our work demonstrates the utility of our full-reference dataset in non-reference applications, and indicates that language-based IQA methods have the potential to be customized for individual preferences.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2312.08962 [cs.CV]
	(or arXiv:2312.08962v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2312.08962

Submission history

From: Zhiyuan You
[v1] Thu, 14 Dec 2023 14:10:02 UTC (5,579 KB)
[v2] Sun, 10 Mar 2024 09:18:17 UTC (5,811 KB)
