Enhancing Interpretability using Human Similarity Judgements to Prune Word Embeddings

Manrique, Natalia Flechas; Bao, Wanqian; Herbelot, Aurelie; Hasson, Uri

arXiv:2310.10262 (cs)

[Submitted on 16 Oct 2023]

Abstract: Interpretability methods in NLP aim to provide insights into the semantics underlying specific system architectures. Focusing on word embeddings, we present a supervised-learning method that, for a given domain (e.g., sports, professions), identifies a subset of model features that strongly improve prediction of human similarity judgments. We show this method keeps only 20-40% of the original embeddings, for 8 independent semantic domains, and that it retains different feature sets across domains. We then present two approaches for interpreting the semantics of the retained features. The first obtains the scores of the domain words (co-hyponyms) on the first principal component of the retained embeddings, and extracts terms whose co-occurrence with the co-hyponyms tracks these scores' profile. This analysis reveals that humans differentiate, e.g., sports based on how gender-inclusive and international they are. The second approach uses the retained sets as variables in a probing task that predicts values along 65 semantically annotated dimensions for a dataset of 535 words. The features retained for professions are best at predicting cognitive, emotional and social dimensions, whereas features retained for fruits or vegetables best predict the gustation (taste) dimension. We discuss implications for alignment between AI systems and human knowledge.
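
Illustrative sketch (not the authors' implementation): the abstract describes selecting a subset of embedding features that improves prediction of human similarity judgments. One simple way to picture this is greedy backward elimination, where a dimension is dropped whenever doing so raises the correlation between model cosine similarities and human ratings. Everything below is an assumption made for illustration: the function names (cosine, pearson, fit_score, prune), the synthetic 8-dimensional embeddings, the simulated "human" ratings, and the in-sample scoring. A real analysis would use actual ratings and evaluate on held-out word pairs.

```python
import math
import random


def cosine(u, v, dims):
    """Cosine similarity of u and v restricted to the feature indices in dims."""
    dot = sum(u[d] * v[d] for d in dims)
    nu = math.sqrt(sum(u[d] ** 2 for d in dims))
    nv = math.sqrt(sum(v[d] ** 2 for d in dims))
    return dot / (nu * nv) if nu and nv else 0.0


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0


def fit_score(emb, human, dims):
    """Correlation between model similarities (on dims) and human ratings."""
    pairs = sorted(human)
    model = [cosine(emb[a], emb[b], dims) for a, b in pairs]
    target = [human[p] for p in pairs]
    return pearson(model, target)


def prune(emb, human, n_dims):
    """Greedy backward elimination: drop a feature only if the fit improves."""
    kept = list(range(n_dims))
    best = fit_score(emb, human, kept)
    improved = True
    while improved and len(kept) > 1:
        improved = False
        for d in list(kept):
            if len(kept) == 1:
                break
            trial = [k for k in kept if k != d]
            score = fit_score(emb, human, trial)
            if score > best:
                kept, best, improved = trial, score, True
    return kept, best


# Synthetic stand-in data (hypothetical): 6 words, 8 features.
rng = random.Random(0)
N_DIMS = 8
words = ["w%d" % i for i in range(6)]
emb = {w: [rng.gauss(0.0, 1.0) for _ in range(N_DIMS)] for w in words}

# Simulated "human" ratings that depend only on features 0-2.
human = {
    (a, b): cosine(emb[a], emb[b], [0, 1, 2])
    for i, a in enumerate(words)
    for b in words[i + 1:]
}

full_score = fit_score(emb, human, list(range(N_DIMS)))
kept, best = prune(emb, human, N_DIMS)
print("full-embedding fit:", round(full_score, 3))
print("retained features:", kept, "fit:", round(best, 3))
```

Because a feature is removed only when the fit strictly improves, the pruned fit is never worse than the full-embedding fit on the data used for selection; whether that gain generalizes has to be checked on pairs not used during pruning.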

Comments:	Accepted for presentation at the BlackboxNLP workshop at EMNLP 2023
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2310.10262 [cs.CL]
	(or arXiv:2310.10262v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.10262

Submission history

From: Uri Hasson
[v1] Mon, 16 Oct 2023 10:38:49 UTC (439 KB)
