Evaluating Human-Language Model Interaction

Lee, Mina; Srivastava, Megha; Hardy, Amelia; Thickstun, John; Durmus, Esin; Paranjape, Ashwin; Gerard-Ursin, Ines; Li, Xiang Lisa; Ladhak, Faisal; Rong, Frieda; Wang, Rose E.; Kwon, Minae; Park, Joon Sung; Cao, Hancheng; Lee, Tony; Bommasani, Rishi; Bernstein, Michael; Liang, Percy

Computer Science > Computation and Language

arXiv:2212.09746v2 (cs)

[Submitted on 19 Dec 2022 (v1), revised 20 Dec 2022 (this version, v2), latest version 5 Jan 2024 (v5)]

Authors: Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, Ashwin Paranjape, Ines Gerard-Ursin, Xiang Lisa Li, Faisal Ladhak, Frieda Rong, Rose E. Wang, Minae Kwon, Joon Sung Park, Hancheng Cao, Tony Lee, Rishi Bommasani, Michael Bernstein, Percy Liang

Abstract: Many real-world applications of language models (LMs), such as code autocomplete and writing assistance, involve human-LM interaction. However, the main LM benchmarks are non-interactive in that a system produces output without human involvement. To evaluate human-LM interaction, we develop a new framework, Human-AI Language-based Interaction Evaluation (HALIE), that expands non-interactive evaluation along three dimensions, capturing (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality. We then design five tasks ranging from goal-oriented to open-ended to capture different forms of interaction. On four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21's J1-Jumbo), we find that non-interactive performance does not always result in better human-LM interaction and that first-person and third-party metrics can diverge, suggesting the importance of examining the nuances of human-LM interaction.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2212.09746 [cs.CL]
	(or arXiv:2212.09746v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2212.09746

Submission history

From: Mina Lee
[v1] Mon, 19 Dec 2022 18:59:45 UTC (10,573 KB)
[v2] Tue, 20 Dec 2022 18:53:53 UTC (10,573 KB)
[v3] Wed, 12 Jul 2023 16:29:28 UTC (8,547 KB)
[v4] Sun, 10 Sep 2023 13:31:08 UTC (8,552 KB)
[v5] Fri, 5 Jan 2024 22:09:26 UTC (8,552 KB)