LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration

Zhao, Jun; Zu, Can; Xu, Hao; Lu, Yi; He, Wei; Ding, Yiwen; Gui, Tao; Zhang, Qi; Huang, Xuanjing

arXiv:2402.11550 (cs)

[Submitted on 18 Feb 2024 (v1), last revised 13 Mar 2024 (this version, v2)]

Abstract: Large language models (LLMs) have demonstrated impressive performance in understanding language and executing complex reasoning tasks. However, LLMs with long context windows are notorious for their expensive training costs and high inference latency. Even the most advanced models, such as GPT-4 and Claude 2, often make mistakes when processing inputs of over 100k tokens, a phenomenon also known as "lost in the middle". In this paper, we propose LongAgent, a method based on multi-agent collaboration, which scales LLMs (e.g., LLaMA) to a context of 128k tokens and demonstrates potential superiority over GPT-4 in long-text processing. In LongAgent, a leader is responsible for understanding user intent and directing team members to acquire information from documents. Because members hallucinate, it is non-trivial for the leader to obtain accurate information from the responses of dozens to hundreds of members. To address this, we develop an inter-member communication mechanism that resolves response conflicts caused by hallucinations through information sharing. Our experimental results indicate that LongAgent offers a promising alternative for long-text processing. The agent team instantiated with LLaMA-7B achieves significant improvements over GPT-4 in tasks such as 128k-long text retrieval and multi-hop question answering.
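The leader/member workflow in the abstract can be illustrated with a toy Python sketch. This is not the authors' implementation: `split_into_chunks`, `run_team`, and `toy_member` are hypothetical names, the "member" is a regex stub standing in for a LLaMA-7B agent, and the inter-member communication step is simplified (as an assumption) to pairwise chunk sharing between members whose answers conflict, followed by a majority vote by the leader.

```python
import re
from collections import Counter
from typing import Callable, Optional

# Hypothetical member interface: reads (question, context) and returns an
# answer string, or None when its chunk holds nothing relevant.
Member = Callable[[str, str], Optional[str]]


def split_into_chunks(words: list[str], chunk_size: int) -> list[str]:
    """Cut a long input into consecutive chunks, one per team member."""
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def run_team(question: str, words: list[str], chunk_size: int, member: Member) -> Optional[str]:
    """Leader: query every member, then resolve conflicting answers."""
    chunks = split_into_chunks(words, chunk_size)
    answers: dict[int, str] = {}
    for idx, chunk in enumerate(chunks):
        answer = member(question, chunk)  # each member sees only its own chunk
        if answer is not None:
            answers[idx] = answer

    if len(set(answers.values())) <= 1:
        # No conflict: zero or one distinct answer.
        return next(iter(answers.values()), None)

    # Conflict: each pair of members with different answers shares its chunks
    # and re-answers over the combined text (simplified "communication").
    revised: Counter = Counter()
    ids = sorted(answers)
    for x, i in enumerate(ids):
        for j in ids[x + 1:]:
            if answers[i] != answers[j]:
                shared_answer = member(question, chunks[i] + " " + chunks[j])
                if shared_answer is not None:
                    revised[shared_answer] += 1
    # The leader keeps the most common revised answer.
    return revised.most_common(1)[0][0] if revised else None


def toy_member(question: str, context: str) -> Optional[str]:
    """Hypothetical stand-in for an LLM member (ignores `question`): it answers
    correctly when the fact is in view, but 'hallucinates' a code whenever the
    topic word appears without the fact."""
    match = re.search(r"code is (\d+)", context)
    if match:
        return match.group(1)
    return "0000" if "code" in context else None


if __name__ == "__main__":
    filler = ["lorem"] * 30
    words = (filler + "the secret code is 4217".split() + filler
             + "a code was mentioned".split() + filler)
    # Two members disagree ("4217" vs. "0000"); sharing chunks resolves it.
    print(run_team("What is the secret code?", words, chunk_size=20, member=toy_member))  # prints 4217
```

In this toy run, a plain majority vote over the two initial answers would tie; letting the conflicting members pool their chunks is what allows the grounded answer to win. The actual system relies on LLM agents and the paper's own communication protocol, whose details are not reproduced here.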

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2402.11550 [cs.CL]
	(or arXiv:2402.11550v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2402.11550

Submission history

From: Jun Zhao
[v1] Sun, 18 Feb 2024 11:46:52 UTC (610 KB)
[v2] Wed, 13 Mar 2024 07:16:42 UTC (610 KB)