Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs

Feucht, Sheridan; Atkinson, David; Wallace, Byron; Bau, David

arXiv:2406.20086 (cs)

[Submitted on 28 Jun 2024]

Abstract: LLMs process text as sequences of tokens that roughly correspond to words, where less common words are represented by multiple tokens. However, individual tokens are often semantically unrelated to the meanings of the words/concepts they comprise. For example, Llama-2-7b's tokenizer splits the word "northeastern" into the tokens ['_n', 'ort', 'he', 'astern'], none of which correspond to semantically meaningful units like "north" or "east." Similarly, the overall meanings of named entities like "Neil Young" and multi-word expressions like "break a leg" cannot be directly inferred from their constituent tokens. Mechanistically, how do LLMs convert such arbitrary groups of tokens into useful higher-level representations? In this work, we find that last token representations of named entities and multi-token words exhibit a pronounced "erasure" effect, where information about previous and current tokens is rapidly forgotten in early layers. Using this observation, we propose a method to "read out" the implicit vocabulary of an autoregressive LLM by examining differences in token representations across layers, and present results of this method for Llama-2-7b and Llama-3-8B. To our knowledge, this is the first attempt to probe the implicit vocabulary of an LLM.
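
The tokenization example in the abstract can be checked with a few lines of Python. This is only an illustrative sketch, not the authors' code: it uses the token list exactly as quoted above, assumes the leading "_" stands for the tokenizer's word-boundary marker, and the helper names `detokenize` and `aligned_morphemes` are hypothetical. It confirms that the pieces rejoin into "northeastern" while none of them matches "north" or "east".

```python
# Token pieces quoted in the abstract for "northeastern" (Llama-2-7b tokenizer).
# Assumption: the leading "_" stands for the tokenizer's word-boundary marker.
PIECES = ["_n", "ort", "he", "astern"]
MORPHEMES = ["north", "east"]  # the meaningful units named in the abstract


def detokenize(pieces):
    """Join the pieces and turn the boundary marker back into a space."""
    return "".join(pieces).replace("_", " ").strip()


def aligned_morphemes(pieces, morphemes):
    """Return the morphemes that appear verbatim as a single token piece."""
    surface = {p.lstrip("_") for p in pieces}
    return [m for m in morphemes if m in surface]


word = detokenize(PIECES)
matches = aligned_morphemes(PIECES, MORPHEMES)
print(word, matches)
```

An empty `matches` list is the point of the example: the token boundaries cut across the morpheme boundaries, so the word's meaning has to be assembled by the model rather than read off any single token.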

Comments:	13 pages, 14 figures. Code and data at this https URL
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
ACM classes:	I.2.7
Cite as:	arXiv:2406.20086 [cs.CL]
	(or arXiv:2406.20086v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.20086

Submission history

From: Sheridan Feucht
[v1] Fri, 28 Jun 2024 17:54:47 UTC (2,477 KB)
