EnCore: Fine-Grained Entity Typing by Pre-Training Entity Encoders on Coreference Chains

Mtumbuka, Frank; Schockaert, Steven


arXiv:2305.12924 (cs)

[Submitted on 22 May 2023 (v1), last revised 27 Jan 2024 (this version, v2)]

Title: EnCore: Fine-Grained Entity Typing by Pre-Training Entity Encoders on Coreference Chains

Authors: Frank Mtumbuka, Steven Schockaert


Abstract: Entity typing is the task of assigning semantic types to the entities that are mentioned in a text. In the case of fine-grained entity typing (FET), a large set of candidate type labels is considered. Since obtaining sufficient amounts of manual annotations is then prohibitively expensive, FET models are typically trained using distant supervision. In this paper, we propose to improve on this process by pre-training an entity encoder such that embeddings of coreferring entities are more similar to each other than to the embeddings of other entities. The main problem with this strategy, which helps to explain why it has not previously been considered, is that predicted coreference links are often too noisy. We show that this problem can be addressed by using a simple trick: we only consider coreference links that are predicted by two different off-the-shelf systems. With this prudent use of coreference links, our pre-training strategy allows us to improve the state of the art on fine-grained entity typing benchmarks, as well as on traditional entity extraction.
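The two ideas in the abstract (keeping only coreference links on which two systems agree, and pulling embeddings of coreferring mentions together) can be illustrated with a minimal sketch. It assumes each coreference system returns a list of clusters, where a mention is a hypothetical `(start, end)` token span, and it uses an InfoNCE-style contrastive loss with a temperature as a stand-in objective. The function names, span format, loss form and temperature value are illustrative assumptions, not the authors' implementation.

```python
import math
from itertools import combinations


def coref_links(clusters):
    """Turn coreference clusters into a set of unordered mention pairs (links).

    Assumption: each mention is a hypothetical (start, end) token span.
    """
    links = set()
    for cluster in clusters:
        # Sorting makes each pair canonical, so (a, b) and (b, a) compare equal.
        for a, b in combinations(sorted(cluster), 2):
            links.add((a, b))
    return links


def agreed_links(clusters_a, clusters_b):
    """Keep only the links predicted by both coreference systems."""
    return coref_links(clusters_a) & coref_links(clusters_b)


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in objective).

    Small when the anchor embedding is more similar to its coreferring
    (positive) mention than to the other entities (negatives).
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability of picking the positive.
    return log_sum - logits[0]


# Usage: only the link between (0, 1) and (5, 6) is predicted by both systems.
sys_a = [[(0, 1), (5, 6), (9, 10)], [(2, 3), (12, 13)]]
sys_b = [[(0, 1), (5, 6)], [(9, 10), (12, 13)]]
print(agreed_links(sys_a, sys_b))  # {((0, 1), (5, 6))}
```

In this sketch, agreement is checked at the level of pairwise links rather than whole clusters, so a cluster that the two systems split differently still contributes the links they share.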

Comments:	To appear at EACL 2024
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2305.12924 [cs.CL]
	(or arXiv:2305.12924v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.12924

Submission history

From: Frank Mtumbuka
[v1] Mon, 22 May 2023 11:11:59 UTC (7,710 KB)
[v2] Sat, 27 Jan 2024 13:12:43 UTC (8,557 KB)
