Evaluating Sparse Interpretable Word Embeddings for Biomedical Domain

Samadi, Mohammad Amin; Akhondzadeh, Mohammad Sadegh; Zahabi, Sayed Jalal; Manshaei, Mohammad Hossein; Maleki, Zeinab; Adibi, Payman

Computer Science > Computation and Language

arXiv:2005.05114 (cs)

[Submitted on 11 May 2020]

Abstract: Word embeddings have found their way into a wide range of natural language processing tasks, including those in the biomedical domain. While these vector representations successfully capture semantic and syntactic word relations, as well as hidden patterns and trends in the data, they fail to offer interpretability. Interpretability is a key means of justification, which is integral to biomedical applications. We present a comprehensive study on the interpretability of word embeddings in the medical domain, focusing on the role of sparse methods. We provide qualitative and quantitative measurements and metrics for the interpretability of word vector representations. For the quantitative evaluation, we introduce an extensive categorized dataset that can be used to quantify interpretability based on category theory. Intrinsic and extrinsic evaluations of the studied methods are also presented. For the latter, we propose datasets that can be used for effective extrinsic evaluation of word vectors in the biomedical domain. Our experiments show that sparse word vectors are far more interpretable while preserving the performance of the original vectors in downstream tasks.
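The abstract describes quantifying interpretability with a categorized word dataset. The sketch below illustrates one simple category-based score in that spirit. It is an assumption for illustration only, not necessarily the metric used in the paper. For each embedding dimension, it takes the `top_n` words with the largest values, finds the category that covers the largest share of them, and averages that share over all dimensions. It only looks at the positive direction of each dimension. The function names, toy vectors, and category labels are hypothetical.

```python
def dimension_interpretability(embeddings, categories, dim, top_n):
    """Share of the top_n words on `dim` that belong to the best-matching category.

    Hypothetical helper: embeddings maps word -> list of floats,
    categories maps category name -> set of member words.
    """
    if top_n <= 0:
        raise ValueError("top_n must be positive")
    # Rank the vocabulary by its value along this dimension (largest first).
    ranked = sorted(embeddings, key=lambda w: embeddings[w][dim], reverse=True)
    top_words = ranked[:top_n]
    best = 0.0
    for members in categories.values():
        # Fraction of the top words that fall inside this category.
        hits = sum(1 for w in top_words if w in members)
        best = max(best, hits / top_n)
    return best


def interpretability_score(embeddings, categories, top_n):
    """Average the per-dimension scores over every dimension (hypothetical score)."""
    n_dims = len(next(iter(embeddings.values())))
    scores = [
        dimension_interpretability(embeddings, categories, d, top_n)
        for d in range(n_dims)
    ]
    return sum(scores) / n_dims


# Toy 2-dimensional vectors and categories (made up for illustration).
embeddings = {
    "insulin": [0.9, 0.0],
    "metformin": [0.8, 0.1],
    "aspirin": [0.7, 0.0],
    "liver": [0.0, 0.9],
    "kidney": [0.1, 0.8],
    "fever": [0.2, 0.3],
}
categories = {
    "drugs": {"insulin", "metformin", "aspirin"},
    "organs": {"liver", "kidney"},
}

score = interpretability_score(embeddings, categories, top_n=3)
print(round(score, 3))  # 0.833: dimension 0 scores 3/3, dimension 1 scores 2/3
```

Under this toy scoring, a vector space whose dimensions each concentrate on one category scores close to 1, which is the intuition behind comparing sparse vectors with their dense originals.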

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2005.05114 [cs.CL]
	(or arXiv:2005.05114v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2005.05114

Submission history

From: Mohammad Hossein Manshaei
[v1] Mon, 11 May 2020 13:56:58 UTC (660 KB)