Word Embeddings Are Steers for Language Models

Han, Chi; Xu, Jialiang; Li, Manling; Fung, Yi; Sun, Chenkai; Jiang, Nan; Abdelzaher, Tarek; Ji, Heng

arXiv:2305.12798 (cs)

[Submitted on 22 May 2023 (v1), last revised 6 Jun 2024 (this version, v2)]

Abstract: Language models (LMs) automatically learn word embeddings during pre-training on language corpora. Although word embeddings are usually interpreted as feature vectors for individual words, their roles in language model generation remain underexplored. In this work, we theoretically and empirically revisit output word embeddings and find that their linear transformations are equivalent to steering language model generation styles. We name such steers LM-Steers and find that they exist in LMs of all sizes. Steering each style requires learning parameters equal to 0.2% of the original LM's size. On tasks such as language model detoxification and sentiment control, LM-Steers achieve performance comparable or superior to state-of-the-art controlled generation methods while maintaining a better balance with generation quality. The learned LM-Steer serves as a lens on text styles: it reveals that word embeddings are interpretable when associated with language model generations and can highlight the text spans that most indicate style differences. An LM-Steer is transferable between different language models through an explicit-form calculation. One can also continuously steer LMs simply by scaling the LM-Steer, or compose multiple LM-Steers by adding their transformations. Our code is publicly available at this https URL.
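
The abstract's core idea (a linear transformation of the output word embeddings steers generation, can be scaled continuously, and can be composed by addition) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes the steer takes the form e'_v = (I + epsilon * W) e_v for each output embedding e_v, and it uses random toy matrices in place of a trained model. The names `steer_embeddings`, `next_token_probs`, `W_a`, `W_b`, and `epsilon` are hypothetical. The last lines check the 0.2% figure for an assumed example size (hidden size 1280, about 774M total parameters).

```python
import numpy as np


def steer_embeddings(E, W, epsilon):
    """Return steered output embeddings, assuming e'_v = (I + epsilon * W) e_v.

    E: (vocab, d) output embedding matrix, one row per token.
    W: (d, d) steer matrix (the only learned parameters in this sketch).
    epsilon: scalar steering strength.
    """
    d = E.shape[1]
    return E @ (np.eye(d) + epsilon * W).T


def next_token_probs(h, E):
    """Softmax over logits E @ h for a final hidden state h of shape (d,)."""
    logits = E @ h
    logits = logits - logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()


# Toy stand-ins for a real model (assumed values, not from the paper).
rng = np.random.default_rng(0)
vocab, d = 50, 8
E = rng.normal(size=(vocab, d))      # output word embeddings
h = rng.normal(size=d)               # hidden state at one decoding step
W_a = 0.1 * rng.normal(size=(d, d))  # hypothetical steer for style A
W_b = 0.1 * rng.normal(size=(d, d))  # hypothetical steer for style B

p_base = next_token_probs(h, E)
p_zero = next_token_probs(h, steer_embeddings(E, W_a, 0.0))     # no steering
p_steered = next_token_probs(h, steer_embeddings(E, W_a, 1.0))  # style A

# Scaling: the embedding shift is linear in epsilon.
shift_1 = steer_embeddings(E, W_a, 1.0) - E
shift_2 = steer_embeddings(E, W_a, 2.0) - E

# Composition: adding two steer matrices adds their embedding shifts.
shift_b = steer_embeddings(E, W_b, 1.0) - E
shift_ab = steer_embeddings(E, W_a + W_b, 1.0) - E

# Parameter cost of one d x d steer relative to an assumed model size.
steer_params = 1280 * 1280
param_ratio = steer_params / 774_000_000
```

With epsilon set to 0 the model's distribution is unchanged, so the steering strength can be moved smoothly away from the base model; under these assumed sizes a single d x d steer is roughly 0.2% of the model's parameters.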

Comments:	ACL 2024 Long Paper, 9 pages, 3 figures
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2305.12798 [cs.CL]
	(or arXiv:2305.12798v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.12798

Submission history

From: Chi Han
[v1] Mon, 22 May 2023 07:52:04 UTC (5,990 KB)
[v2] Thu, 6 Jun 2024 06:07:27 UTC (6,208 KB)
