Parallel Context Windows for Large Language Models

Ratner, Nir; Levine, Yoav; Belinkov, Yonatan; Ram, Ori; Magar, Inbal; Abend, Omri; Karpas, Ehud; Shashua, Amnon; Leyton-Brown, Kevin; Shoham, Yoav

arXiv:2212.10947 (cs)

[Submitted on 21 Dec 2022 (v1), last revised 1 Aug 2023 (this version, v3)]

Abstract: When applied to processing long text, Large Language Models (LLMs) are limited by their context window. Existing efforts to address this limitation involve training specialized architectures, and cannot be easily applied to off-the-shelf LLMs. We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training. The key to the approach is to carve a long context into chunks ("windows"), restrict the attention mechanism to apply only within each window, and re-use the positional embeddings across the windows. Our main results test the PCW approach on in-context learning with models that range in size between 750 million and 178 billion parameters, and show substantial improvements for tasks with diverse input and output spaces. We show additional benefits in other settings where long context windows may be beneficial: multi-hop questions and retrieval-augmented question answering with multiple retrieved documents. Our results highlight Parallel Context Windows as a promising method for applying off-the-shelf LLMs in a range of settings that require long text sequences. We make our code publicly available at this https URL.
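
The three steps named in the abstract (chunk the context into windows, confine attention to each window, reuse position indices across windows) can be illustrated with a small sketch. This is not the authors' released code: the function name `pcw_layout` and its parameters are hypothetical, and the handling of trailing "task" tokens (they attend to every window and take positions after the longest window) is an assumption that the abstract does not spell out.

```python
def pcw_layout(window_lengths, n_task_tokens=0):
    """Return (position_ids, mask) for a parallel-context-window layout.

    mask[i][j] is True when token i may attend to token j.
    Hypothetical helper for illustration only.
    """
    position_ids = []
    window_of = []  # window index per token; -1 marks a trailing task token
    for w, length in enumerate(window_lengths):
        position_ids.extend(range(length))  # positions restart in every window
        window_of.extend([w] * length)

    # Assumption: trailing tokens continue after the longest window's positions.
    start = max(window_lengths, default=0)
    position_ids.extend(range(start, start + n_task_tokens))
    window_of.extend([-1] * n_task_tokens)

    total = len(position_ids)
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(i + 1):  # causal: a token never attends to later tokens
            same_window = window_of[i] == window_of[j]
            is_task = window_of[i] == -1  # assumption: task tokens see all windows
            mask[i][j] = same_window or is_task
    return position_ids, mask


if __name__ == "__main__":
    ids, mask = pcw_layout([3, 2], n_task_tokens=2)
    print(ids)
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
```

Because position indices restart in each window, no token receives a position beyond the longest window plus the trailing tokens, which is how the layout can hold more total text than a single window while staying within the range of positions the model was trained on.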

Comments:	The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2212.10947 [cs.CL]
	(or arXiv:2212.10947v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2212.10947

Submission history

From: Yoav Levine
[v1] Wed, 21 Dec 2022 11:38:51 UTC (7,368 KB)
[v2] Mon, 15 May 2023 06:09:57 UTC (7,375 KB)
[v3] Tue, 1 Aug 2023 16:48:47 UTC (7,378 KB)
