Exploring the Dialogue Comprehension Ability of Large Language Models

She, Shuaijie; Huang, Shujian; Wang, Xingyun; Zhou, Yanke; Chen, Jiajun

arXiv:2311.07194v2 (cs)

[Submitted on 13 Nov 2023 (v1), revised 16 Nov 2023 (this version, v2), latest version 1 Apr 2024 (v3)]

Abstract: LLMs may interact with users in the form of dialogue and generate responses following their instructions, which naturally requires dialogue comprehension abilities. However, dialogue comprehension is a general language ability that is hard to evaluate directly. In this work, we propose to perform the evaluation with the help of the dialogue summarization task. Besides evaluating and analyzing the dialogue summarization performance (DIAC-Sum) of different LLMs, we also derive factual questions from the generated summaries and use them as a more flexible measurement of dialogue comprehension (DIAC-FactQA). Our evaluation shows that, on average, 27% of the summaries generated by LLMs contain factual inconsistencies. Even ChatGPT, the strongest model evaluated, has such errors in 16% of its summaries. On the more challenging task of answering the factual questions, the average error rate of all evaluated LLMs is 37.2%. Both results indicate serious deficiencies. Detailed analysis shows that understanding the subject/object of the conversation is still the most challenging problem for LLMs. Furthermore, to stimulate and enhance the dialogue comprehension ability of LLMs, we propose a fine-tuning paradigm with auto-constructed multi-task data. The experimental results demonstrate that our method achieves an error rate improvement of 10.9% on DIAC-FactQA.
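The percentages quoted in the abstract are error rates, that is, the share of evaluated items judged to contain an error. The sketch below shows one plausible way such a rate could be computed from per-item annotations. The function name `error_rate` and the toy flag lists are hypothetical illustrations and are not taken from the paper or its data.

```python
from typing import Sequence


def error_rate(has_error: Sequence[bool]) -> float:
    """Return the fraction of annotated items flagged as erroneous."""
    if not has_error:
        raise ValueError("need at least one annotated item")
    return sum(has_error) / len(has_error)


# Hypothetical toy annotations (NOT data from the paper):
# True means the item was judged to contain a factual error.
summary_flags = [True, False, False, True, False]  # one flag per generated summary
qa_flags = [False, True, True, False]              # one flag per answered factual question

print(f"summary error rate: {error_rate(summary_flags):.1%}")
print(f"factual QA error rate: {error_rate(qa_flags):.1%}")
```

Averaging such per-model rates across models would give a figure comparable in kind to the "on average" numbers above, assuming each model is weighted equally; the abstract does not state the weighting.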

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2311.07194 [cs.CL]
	(or arXiv:2311.07194v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2311.07194

Submission history

From: Shuaijie She
[v1] Mon, 13 Nov 2023 09:32:12 UTC (1,948 KB)
[v2] Thu, 16 Nov 2023 11:56:12 UTC (1,949 KB)
[v3] Mon, 1 Apr 2024 16:37:50 UTC (2,029 KB)