Pragmatic Constraint on Distributional Semantics

Zhemchuzhina, Elizaveta; Filippov, Nikolai; Yamshchikov, Ivan P.

arXiv:2211.11041 (cs)

[Submitted on 20 Nov 2022]

Abstract: This paper studies the limits of language models' statistical learning in the context of Zipf's law. First, we demonstrate that a Zipf-law token distribution emerges irrespective of the chosen tokenization. Second, we show that the Zipf distribution is characterized by two distinct groups of tokens that differ both in their frequency and in their semantics. Namely, tokens that have a one-to-one correspondence with a single semantic concept have different statistical properties from those that are semantically ambiguous. Finally, we demonstrate how these properties interfere with statistical learning procedures motivated by distributional semantics.
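
A common way to check whether token counts look Zipf-like is to sort them by rank and fit a line to log frequency versus log rank. Under Zipf's law, f(r) ∝ 1/r^s, so the fitted slope is -s. The sketch below is a toy illustration under stated assumptions, not the authors' procedure. The two tokenizers (`word_tokens`, `char_bigrams`), the helper names, and the sample sentence are all hypothetical. An ordinary least-squares fit on a log-log plot is also only a rough estimator; maximum-likelihood fits are often recommended for real corpora.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return token counts sorted from most to least frequent (rank 1 first)."""
    return [count for _, count in Counter(tokens).most_common()]


def zipf_exponent(counts):
    """Estimate s in f(r) ~ C / r**s by least squares on log f versus log r.

    This is a rough log-log fit (an assumption for illustration), not a
    maximum-likelihood estimator.
    """
    if len(counts) < 2:
        raise ValueError("need at least two distinct tokens to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    # The fitted slope is -s, so negate it to report the Zipf exponent.
    return -cov / var


# Hypothetical tokenizers used only to compare two tokenization schemes.
def word_tokens(text):
    return text.lower().split()


def char_bigrams(text):
    t = text.lower()
    return [t[i:i + 2] for i in range(len(t) - 1)]


if __name__ == "__main__":
    # Placeholder text; a meaningful estimate needs a large real corpus.
    corpus = "the cat sat on the mat and the dog sat on the rug"
    for name, tokenize in [("words", word_tokens), ("char bigrams", char_bigrams)]:
        counts = rank_frequency(tokenize(corpus))
        print(name, round(zipf_exponent(counts), 2))
```

Running the same estimator over several tokenizations of one corpus is one simple way to probe the claim that the rank-frequency shape does not depend on tokenization. On a sentence this short the numbers are not meaningful.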

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
ACM classes:	E.4; H.1.1; I.2.7
Cite as:	arXiv:2211.11041 [cs.CL]
	(or arXiv:2211.11041v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2211.11041

Submission history

From: Ivan P Yamshchikov
[v1] Sun, 20 Nov 2022 17:51:06 UTC (629 KB)