CLUSE: Cross-Lingual Unsupervised Sense Embeddings
Journal
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
Pages
271-281
Date Issued
2020
Author(s)
Chi, T.-C.
Abstract
This paper proposes a modularized sense induction and representation learning model that jointly learns bilingual sense embeddings that are well aligned in a shared vector space. The model exploits the cross-lingual signal in an English-Chinese parallel corpus to capture the collocation and distributed characteristics of the language pair. The model is evaluated on the Stanford Contextual Word Similarity (SCWS) dataset to verify the quality of its monolingual sense embeddings. In addition, we introduce Bilingual Contextual Word Similarity (BCWS), a large, high-quality dataset for evaluating cross-lingual sense embeddings. BCWS is the first attempt at measuring whether the learned embeddings are indeed well aligned in the vector space. The proposed approach produces sense embeddings of superior quality under both monolingual and bilingual evaluation. © 2018 Association for Computational Linguistics
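The abstract does not spell out how SCWS or BCWS are scored. A protocol commonly used for contextual word similarity benchmarks is to pick, for each word, the sense vector closest to its context, take the cosine similarity of the two chosen senses, and report Spearman's rank correlation against the human ratings. The sketch below assumes that protocol. The function names (`select_sense`, `evaluate`), the tuple layout of a pair, the way a context vector is obtained, and the toy vectors are all illustrative assumptions, not the paper's code. For BCWS the two words come from different languages, so the cosine is only meaningful if both languages' senses live in one aligned space, which is exactly what the benchmark probes.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical pair layout: (senses of word A, context vector of A,
#                            senses of word B, context vector of B, human score)
Pair = Tuple[List[Vector], Vector, List[Vector], Vector, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_sense(senses: List[Vector], context: Vector) -> Vector:
    """Assumed MaxSimC-style choice: the sense vector closest to the context."""
    return max(senses, key=lambda s: cosine(s, context))


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def evaluate(pairs: List[Pair]) -> float:
    """Correlate model similarities of context-selected senses with human scores."""
    model_scores, human_scores = [], []
    for senses_a, ctx_a, senses_b, ctx_b, human in pairs:
        a = select_sense(senses_a, ctx_a)
        b = select_sense(senses_b, ctx_b)
        model_scores.append(cosine(a, b))
        human_scores.append(human)
    return spearman(model_scores, human_scores)


# Toy example: word A has two senses; its context decides which one is compared.
toy_pairs: List[Pair] = [
    ([[1, 0], [0, 1]], [1, 0], [[1, 0]], [1, 0], 9.0),  # context picks [1, 0]
    ([[1, 0]], [1, 0], [[1, 1]], [1, 1], 5.0),
    ([[1, 0], [0, 1]], [0, 1], [[1, 0]], [1, 0], 1.0),  # context picks [0, 1]
]
print(round(evaluate(toy_pairs), 6))
```

Using ranks rather than raw scores makes the metric insensitive to the scale of the cosine values, so only the ordering of pairs relative to human judgments matters.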
Other Subjects
Embeddings; Large dataset; Natural language processing systems; Vector spaces; Contextual words; Cross-lingual; Distributed characteristics; English-Chinese parallel corpora; High quality; Language pairs; Learning models; Modularized; Quality control
Type
conference paper