A Simple Approach to Learning Unsupervised Multilingual Embeddings

Jawanpuria, Pratik; Meghwanshi, Mayank; Mishra, Bamdev

arXiv:2004.05991 (cs)

[Submitted on 10 Apr 2020 (v1), last revised 20 Apr 2020 (this version, v2)]

Abstract: Recent progress on unsupervised learning of cross-lingual embeddings in the bilingual setting has given impetus to learning a shared embedding space for several languages without any supervision. A popular framework for the latter problem jointly solves two sub-problems: 1) learning unsupervised word alignment between several pairs of languages, and 2) learning how to map the monolingual embeddings of every language to a shared multilingual space. In contrast, we propose a simple, two-stage framework in which we decouple these two sub-problems and solve them separately using existing techniques. The proposed approach obtains surprisingly good performance in various tasks such as bilingual lexicon induction, cross-lingual word similarity, multilingual document classification, and multilingual dependency parsing. When distant languages are involved, the proposed solution demonstrates robustness and outperforms existing unsupervised multilingual word embedding approaches. Overall, our experimental results encourage the development of multi-stage models for such challenging problems.
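
To make the decoupling in the abstract concrete, the following is a minimal sketch of what a second stage could look like once a first stage has already produced word alignments. It is not the authors' implementation: the abstract does not name the techniques used, so orthogonal Procrustes with a single pivot language is swapped in here as an assumption, and the first stage is stubbed out with a known identity alignment on synthetic data. The function names, the language codes "en" and "xx", and the toy embeddings are all hypothetical.

```python
import numpy as np


def procrustes_map(src, tgt):
    """Orthogonal W minimizing ||src @ W - tgt||_F; rows are aligned word pairs."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def map_to_shared_space(embeddings, lexicons, pivot):
    """Stage 2 (assumed variant): map every language into the pivot's space.

    embeddings: dict lang -> (vocab_size, dim) array of monolingual embeddings
    lexicons:   dict lang -> (n_pairs, 2) int array of (lang_index, pivot_index),
                assumed to come from a separate stage-1 unsupervised aligner
    """
    dim = embeddings[pivot].shape[1]
    maps = {pivot: np.eye(dim)}  # the pivot language stays where it is
    for lang, pairs in lexicons.items():
        src = embeddings[lang][pairs[:, 0]]
        tgt = embeddings[pivot][pairs[:, 1]]
        maps[lang] = procrustes_map(src, tgt)
    return {lang: embeddings[lang] @ maps[lang] for lang in maps}


# Synthetic toy data: "xx" is an exact rotation of the pivot "en" embeddings,
# and the stage-1 alignment is taken as known (word i pairs with word i).
rng = np.random.default_rng(0)
pivot_emb = rng.standard_normal((50, 8))
rotation, _ = np.linalg.qr(rng.standard_normal((8, 8)))
other_emb = pivot_emb @ rotation.T
identity_pairs = np.stack([np.arange(50), np.arange(50)], axis=1)

shared = map_to_shared_space(
    {"en": pivot_emb, "xx": other_emb},
    {"xx": identity_pairs},
    pivot="en",
)
# Largest coordinate-wise gap between the two languages after mapping.
max_error = float(np.abs(shared["xx"] - shared["en"]).max())
```

Because the two stages only communicate through the `lexicons` argument, either stage can be replaced independently, which is the property the abstract emphasizes. On real embeddings the alignment would be noisy and the recovered map only approximate.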

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2004.05991 [cs.CL]
	(or arXiv:2004.05991v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2004.05991

Submission history

From: Pratik Jawanpuria
[v1] Fri, 10 Apr 2020 05:54:10 UTC (264 KB)
[v2] Mon, 20 Apr 2020 15:17:01 UTC (36 KB)
