A Machine Learning Approach for Result Fusion in Multilingual Information Retrieval
Date Issued
2008
Author(s)
Wang, Yu-Ting
Abstract
Multilingual information retrieval (MLIR) aims to enable users to enter a query in one language and access relevant documents in various languages. MLIR is usually implemented by first running retrieval against each language collection separately, which yields one bilingual result list per collection. How to merge these bilingual lists is the main issue addressed in this work. We use a machine learning approach, FRank, to build a merge model, and we merge the bilingual lists using both the merge model score and the retrieval score. First, we identify factors that may influence the MLIR process at three levels: the general level, the translation level, and the document level. At the translation level, previous work has shown that translation quality is crucial for cross-language information retrieval. In addition, we classify each query term into one of a set of manually predefined categories. Our experiments show that some categories play a more important role in a query during retrieval, and that there are relationships between categories. The translation quality of those influential categories is crucial for MLIR. At the document level, we extract document length and document title length as measures of the amount of information a document carries. Across these levels, we extract 62 features in total; using these features, we not only train a merge model but also identify which features are effective for the MLIR merging process. In our experiments, our approach outperforms all of the traditional merging strategies compared, including raw-score merging, round-robin merging, normalized-by-top-K merging, logistic regression, and the 2-step re-indexing merging method. Furthermore, from the features that FRank selects as weak learners, we find that the translation quality of certain query term categories, the translatability of query terms, and the degree of ambiguity in translation are effective for MLIR merging.
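To make the merging problem concrete, the following is a minimal Python sketch of three of the traditional baselines named in the abstract: raw-score, round-robin, and normalized-by-top-K merging. It is not the thesis's implementation and does not cover FRank. The function names, the `(doc_id, score)` list layout, and the sample scores are all hypothetical, and dividing each score by the mean of the list's top-K scores is only one common formulation of the normalization, assumed here for illustration.

```python
from itertools import zip_longest

# Assumption: each per-language result list is a list of (doc_id, score)
# pairs, already sorted by descending retrieval score.

def raw_score_merge(lists):
    """Pool all results and sort by the original retrieval scores."""
    pooled = [pair for lst in lists for pair in lst]
    return sorted(pooled, key=lambda p: p[1], reverse=True)

def round_robin_merge(lists):
    """Interleave the lists rank by rank, ignoring the scores."""
    merged = []
    for tier in zip_longest(*lists):
        merged.extend(pair for pair in tier if pair is not None)
    return merged

def normalized_top_k_merge(lists, k=1):
    """Divide each score by the mean of its list's top-k scores, then pool.

    Assumed formulation for illustration; other normalizations exist.
    """
    pooled = []
    for lst in lists:
        top = lst[:k]
        if not top:
            continue  # skip empty result lists
        norm = sum(score for _, score in top) / len(top)
        if norm == 0:
            norm = 1.0  # avoid division by zero; leave scores unchanged
        pooled.extend((doc, score / norm) for doc, score in lst)
    return sorted(pooled, key=lambda p: p[1], reverse=True)

# Hypothetical result lists from two collections with different score scales.
english = [("en1", 12.0), ("en2", 9.0), ("en3", 3.0)]
chinese = [("zh1", 0.9), ("zh2", 0.6)]

raw_ids = [d for d, _ in raw_score_merge([english, chinese])]
rr_ids = [d for d, _ in round_robin_merge([english, chinese])]
norm_ids = [d for d, _ in normalized_top_k_merge([english, chinese], k=2)]
```

With these sample lists, raw-score merging ranks every English document above every Chinese one simply because the two collections score on different scales, which is the kind of incomparability that normalization and learned merge models try to correct.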
Subjects
Multilingual Information Retrieval
Data Fusion
Machine Learning
Type
thesis
File(s)
Name
ntu-97-R95922066-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):6083e4db4b56ee1b94ee6058b27753bc
