Cross-Language Encyclopedia Article Linking
Date Issued
2015
Date
2015
Author(s)
Wang, Yu-Chun
Abstract
Online encyclopedias such as Wikipedia are among the most widely used internet services in the world. Although Wikipedia has many language editions, their coverage is imbalanced relative to the number of speakers of each language, both online and offline. Furthermore, large alternative online encyclopedias exist for some languages, such as Baidu Baike for Chinese. Access to the knowledge in these sources could be improved by integrating multiple online encyclopedias into large multilingual knowledge bases. The main task in such a project is creating links between articles in different encyclopedias in different languages. Most research to date has focused on linking articles across the language editions of Wikipedia, and little work has been done on linking encyclopedias on other platforms. In this thesis, we develop a method for cross-language encyclopedia article linking (CLEAL) between encyclopedias on different platforms: English Wikipedia and Chinese Baidu Baike. We link articles between these two encyclopedias with an SVM model that uses bilingual topic model features and translation features. To evaluate our approach, we compile datasets from Baidu Baike articles and their corresponding English Wikipedia articles. The evaluation results show that our approach achieves a mean reciprocal rank (MRR) of 0.8252, outperforming the baseline system by 0.1745 (+26.82%). Our method does not depend heavily on specific platform formats or linguistic characteristics, so it could be easily extended to generate cross-language article links among other online encyclopedias in other languages and on other platforms.
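As an illustration of the evaluation metric reported above, the following minimal sketch shows how MRR is conventionally computed and how the relative gain follows from the two figures in the abstract. The function names and the toy rankings are hypothetical and are not taken from the thesis; the abstract does not describe how candidate articles are generated or ranked, so this shows only the standard definition of the metric.

```python
from typing import Hashable, Sequence


def reciprocal_rank(ranked: Sequence[Hashable], gold: Hashable) -> float:
    """Return 1/rank of the gold item in a ranked list, or 0.0 if it is absent."""
    for rank, candidate in enumerate(ranked, start=1):
        if candidate == gold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]], golds: Sequence[Hashable]
) -> float:
    """Average the reciprocal rank over all queries (standard MRR definition)."""
    if len(rankings) != len(golds):
        raise ValueError("rankings and golds must have the same length")
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank(r, g) for r, g in zip(rankings, golds))
    return total / len(rankings)


def relative_gain(score: float, absolute_gain: float) -> float:
    """Relative improvement over a baseline equal to score - absolute_gain."""
    baseline = score - absolute_gain
    return absolute_gain / baseline


# Hypothetical toy data: each inner list is a ranked list of candidate
# article identifiers for one query article; golds holds the correct match.
toy_rankings = [["a", "b"], ["x", "y", "z"], ["p"]]
toy_golds = ["a", "z", "q"]
toy_mrr = mean_reciprocal_rank(toy_rankings, toy_golds)

# Figures from the abstract: MRR 0.8252, absolute gain 0.1745.
gain_percent = round(relative_gain(0.8252, 0.1745) * 100, 2)
```

With the abstract's figures, the implied baseline MRR is 0.8252 - 0.1745 = 0.6507, and 0.1745 / 0.6507 gives the stated relative gain of 26.82%.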
Subjects
online encyclopedia article linking
cross-language
topic model
parenthetical translation
Type
thesis
File(s)
Name
ntu-104-D97922023-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):a77b398bd6a32285d0bece1f569975e8
