Looking at words and points with attention: a benchmark for text-to-shape coherence

Amaduzzi, Andrea; Lisanti, Giuseppe; Salti, Samuele; Di Stefano, Luigi

arXiv:2309.07917 (cs)

[Submitted on 14 Sep 2023]

Abstract: While text-conditional 3D object generation and manipulation have seen rapid progress, the evaluation of coherence between generated 3D shapes and input textual descriptions lacks a clear benchmark. The reason is twofold: a) the low quality of the textual descriptions in the only publicly available dataset of text-shape pairs; b) the limited effectiveness of the metrics used to quantitatively assess such coherence. In this paper, we propose a comprehensive solution that addresses both weaknesses. First, we employ large language models to automatically refine the textual descriptions associated with shapes. Second, we propose a quantitative metric that assesses text-to-shape coherence through cross-attention mechanisms. To validate our approach, we conduct a user study and quantitatively compare our metric with existing ones. The refined dataset, the new metric, and a set of text-shape pairs validated by the user study comprise a novel, fine-grained benchmark that we publicly release to foster research on the text-to-shape coherence of text-conditioned 3D generative models. Benchmark available at this https URL.

Comments:	ICCV 2023 Workshop "AI for 3D Content Creation", Project page: this https URL, 26 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2309.07917 [cs.CV]
	(or arXiv:2309.07917v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2309.07917

Submission history

From: Andrea Amaduzzi
[v1] Thu, 14 Sep 2023 17:59:48 UTC (20,039 KB)
