Query sampling for learning data fusion
Journal
International Conference on Information and Knowledge Management
Pages
141-146
Date Issued
2011
Author(s)
Lin, T.-C.
DOI
10.1145/2063576.2063601
Abstract
Data fusion merges the results of multiple independent retrieval models into a single ranked list. Several earlier studies have shown that combining different models can achieve better retrieval performance than any of the individual models alone. Although supervised fusion methods have produced many promising results, the sampling of training data has attracted little attention in previous work on data fusion. In evaluations on TREC and NTCIR datasets, we observed that the performance of a given model varied considerably from one training example to another, so not all training examples were equally effective. In this paper, we propose two novel approaches, a greedy approach and a boosting approach, which select effective training data by query sampling to improve the performance of supervised data fusion algorithms such as BayesFuse, probFuse and MAPFuse. Extensive experiments were conducted on five data sets: TREC-3, TREC-4, TREC-5, NTCIR-3 and NTCIR-4. The results show that our sampling approaches can significantly improve the retrieval performance of those data fusion methods. © 2011 ACM.
Subjects
adafuse; data fusion; query sampling
Other Subjects
adafuse; Boosting approach; Data fusion algorithm; Data fusion methods; Data sets; Fusion methods; Individual models; Learning data; Retrieval models; Retrieval performance; Training data; Training example; Information retrieval; Knowledge management; Data fusion
Type
conference paper