Search | arXiv e-print repository

Evaluating Image Review Ability of Vision Language Models

Authors: Shigeki Saito, Kazuki Hayashi, Yusuke Ide, Yusuke Sakai, Kazuma Onishi, Toma Suzuki, Seiji Gobara, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe

Abstract: Large-scale vision language models (LVLMs) are language models capable of processing both image and text inputs within a single model. This paper explores the use of LVLMs to generate review texts for images. The ability of LVLMs to review images is not fully understood, highlighting the need for a methodical evaluation of their review abilities. Unlike image captions, review texts can be written from various perspectives such as image composition and exposure. This diversity of review perspectives makes it difficult to uniquely determine a single correct review for an image. To address this challenge, we introduce an evaluation method based on rank correlation analysis, in which review texts are ranked by humans and by LVLMs, and the correlation between these rankings is then measured. We further validate this approach by creating a benchmark dataset aimed at assessing the image review ability of recent LVLMs. Our experiments with the dataset reveal that LVLMs, particularly those with proven superiority in other evaluative contexts, excel at distinguishing between high-quality and substandard image reviews.

Submitted 19 February, 2024; originally announced February 2024.

Comments: 9 pages, under review
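
The abstract above describes ranking review texts by humans and by LVLMs and then measuring the correlation between the two rankings. A minimal sketch of that idea follows. It assumes Spearman's rank correlation, which the abstract does not specify, and the function names and example rankings are hypothetical illustrations rather than anything taken from the paper.

```python
from statistics import mean


def average_ranks(scores):
    """Rank scores from 1 (lowest) upward; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of items tied with order[i].
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    Assumes both inputs have the same length and are not constant
    (a constant input would make the denominator zero).
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical rankings of five review texts for one image (1 = best).
human_ranking = [1, 2, 3, 4, 5]
model_ranking = [2, 1, 3, 5, 4]

rho = spearman(human_ranking, model_ranking)
print(f"Spearman rho = {rho:.2f}")
```

A value near 1 would indicate that the model orders the reviews much as the human annotators do, while a value near 0 would indicate no agreement.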

arXiv:2306.17399 [pdf, other]

Japanese Lexical Complexity for Non-Native Readers: A New Dataset

Authors: Yusuke Ide, Masato Mita, Adam Nohejl, Hiroki Ouchi, Taro Watanabe

Abstract: Lexical complexity prediction (LCP) is the task of predicting the complexity of words in a text on a continuous scale. It plays a vital role in simplifying or annotating complex words to assist readers. To study lexical complexity in Japanese, we construct the first Japanese LCP dataset. Our dataset provides separate complexity scores for Chinese/Korean annotators and others to address the readers' L1-specific needs. In the baseline experiment, we demonstrate the effectiveness of a BERT-based system for Japanese LCP.

Submitted 30 June, 2023; originally announced June 2023.

Comments: BEA 2023

arXiv:2305.13844 [pdf, other]

Arukikata Travelogue Dataset with Geographic Entity Mention, Coreference, and Link Annotation

Authors: Shohei Higashiyama, Hiroki Ouchi, Hiroki Teranishi, Hiroyuki Otomo, Yusuke Ide, Aitaro Yamamoto, Hiroyuki Shindo, Yuki Matsuda, Shoko Wakamiya, Naoya Inoue, Ikuya Yamada, Taro Watanabe

Abstract: Geoparsing is a fundamental technique for analyzing geo-entity information in text. We focus on document-level geoparsing, which considers geographic relatedness among geo-entity mentions, and present a Japanese travelogue dataset designed for evaluating document-level geoparsing systems. Our dataset comprises 200 travelogue documents with rich geo-entity information: 12,171 mentions, 6,339 coreference clusters, and 2,551 geo-entities linked to geo-database entries.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:1301.2369 [pdf, ps, other]

doi 10.1016/j.physa.2013.01.025

Combinatorial and approximative analyses in a spatially random division process

Authors: Yukio Hayashi, Takayuki Komaki, Yusuke Ide, Takuya Machida, Norio Konno

Abstract: Fat-tailed frequency distributions commonly appear in spatial characteristics such as the fragment size and mass of glass, the areas enclosed by city roads, and the pore size or volume in random packings. To give a new analytical approach to these distributions, we consider a simple model that constructs a fractal-like hierarchical network based on random divisions of rectangles. The stochastic process forms a Markov chain and corresponds to directional random walks with splitting into four particles. We derive a combinatorial analytical form and its continuous approximation for the distribution of rectangle areas, and numerically show a good fit to the actual distribution in the averaged behavior of the divisions.

Submitted 31 January, 2013; v1 submitted 10 January, 2013; originally announced January 2013.

Comments: 23 pages, 10 figures, 1 table

Journal ref: Physica A 392(9), 2212-2225, 2013

arXiv:1001.0136 [pdf, ps, other]

Spectral Properties of the Threshold Network Model

Authors: Yusuke Ide, Norio Konno, Nobuaki Obata

Abstract: We study the spectral distribution of the threshold network model. The results contain an explicit description and its asymptotic behaviour.

Submitted 31 December, 2009; originally announced January 2010.

Journal ref: Internet Mathematics, Vol.6, No.2, pp.173-187 (2010)

Showing 1–5 of 5 results for author: Ide, Y