Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models

Lin, Bill Yuchen; Lee, Seyeon; Khanna, Rahul; Ren, Xiang

Computer Science > Computation and Language

arXiv:2005.00683 (cs)

[Submitted on 2 May 2020 (v1), last revised 18 Sep 2020 (this version, v2)]

Abstract: Recent work shows that pre-trained language models (PTLMs), such as BERT, possess certain commonsense and factual knowledge. These findings suggest that it is promising to use PTLMs as "neural knowledge bases" via predicting masked words. Surprisingly, we find that this may not work for numerical commonsense knowledge (e.g., a bird usually has two legs). In this paper, we investigate whether and to what extent we can induce numerical commonsense knowledge from PTLMs, as well as the robustness of this process. To study this, we introduce a novel probing task with a diagnostic dataset, NumerSense, containing 13.6k masked-word-prediction probes (10.5k for fine-tuning and 3.1k for testing). Our analysis reveals that: (1) BERT and its stronger variant RoBERTa perform poorly on the diagnostic dataset prior to any fine-tuning; (2) fine-tuning with distant supervision brings some improvement; (3) the best supervised model still performs poorly compared with human performance (54.06% vs. 96.3% accuracy).
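The masked-word-prediction probes described above can be illustrated with a short Python sketch: each probe contains one masked slot, the slot is filled with whichever candidate number word scores highest, and accuracy is the fraction of probes whose prediction matches the gold word. This is not the authors' evaluation code. The candidate list ("no", "zero" through "ten"), the `<mask>` placeholder, and `toy_score` (a hypothetical lookup table standing in for a pre-trained model's scores) are all assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed candidate answers for the masked slot: number words "no", "zero" ... "ten".
NUMBER_WORDS: List[str] = [
    "no", "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine", "ten",
]
MASK = "<mask>"  # assumed placeholder marking the masked word in a probe

# A scorer maps (probe, candidate word) to a score; higher means more likely.
Scorer = Callable[[str, str], float]


def predict_number(probe: str, score: Scorer,
                   candidates: Sequence[str] = NUMBER_WORDS) -> str:
    """Fill the single masked slot with the highest-scoring candidate word."""
    if probe.count(MASK) != 1:
        raise ValueError("each probe must contain exactly one mask token")
    return max(candidates, key=lambda word: score(probe, word))


def probe_accuracy(probes: Sequence[Tuple[str, str]], score: Scorer) -> float:
    """Fraction of (probe, gold word) pairs whose prediction equals the gold word."""
    if not probes:
        raise ValueError("need at least one probe")
    correct = sum(predict_number(p, score) == gold for p, gold in probes)
    return correct / len(probes)


# Hypothetical scores standing in for a PTLM's log-probabilities (not real model output).
TOY_SCORES: Dict[str, Dict[str, float]] = {
    "a bird usually has <mask> legs.": {"four": -0.5, "two": -1.2},
    "a tricycle has <mask> wheels.": {"three": -0.3, "two": -2.0},
}


def toy_score(probe: str, word: str) -> float:
    # Candidates not listed for a probe get a low default score.
    return TOY_SCORES.get(probe, {}).get(word, -10.0)


if __name__ == "__main__":
    probes = [
        ("a bird usually has <mask> legs.", "two"),
        ("a tricycle has <mask> wheels.", "three"),
    ]
    print(predict_number(probes[0][0], toy_score))  # four (the "birds have four legs" error)
    print(probe_accuracy(probes, toy_score))  # 0.5
```

Restricting the argmax to a fixed set of number words, rather than the whole vocabulary, reduces each prediction to a simple string comparison with the gold answer. In practice `toy_score` would be replaced by scores from a masked language model such as BERT or RoBERTa; that wiring is not shown here.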

Comments:	To appear in Proceedings of EMNLP 2020. Project page: this http URL
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2005.00683 [cs.CL]
	(or arXiv:2005.00683v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2005.00683

Submission history

From: Bill Yuchen Lin
[v1] Sat, 2 May 2020 02:47:02 UTC (422 KB)
[v2] Fri, 18 Sep 2020 00:42:25 UTC (7,655 KB)