https://scholars.lib.ntu.edu.tw/handle/123456789/365020
Title: | Query sampling for learning data fusion | Authors: | Lin, T.-C.; PU-JEN CHENG |
Keywords: | adafuse; data fusion; query sampling | Issue Date: | 2011 | Pages: | 141-146 | Source: | International Conference on Information and Knowledge Management | Abstract: | Data fusion merges the results of multiple independent retrieval models into a single ranked list. Several earlier studies have shown that combining different models can improve retrieval performance beyond that of any individual model. Although supervised fusion methods have produced many promising results, training data sampling has attracted little attention in previous work on data fusion. By observing evaluations on TREC and NTCIR datasets, we found that the performance of a single model varied widely from one training example to another, so that not all training examples were equally effective. In this paper, we propose two novel approaches, greedy and boosting, which select effective training data by query sampling to improve the performance of supervised data fusion algorithms such as BayesFuse, probFuse and MAPFuse. Extensive experiments were conducted on five data sets: TREC-3, 4, 5 and NTCIR-3, 4. The results show that our sampling approaches can significantly improve the retrieval performance of those data fusion methods. © 2011 ACM. |
URI: | http://www.scopus.com/inward/record.url?eid=2-s2.0-83055187772&partnerID=MN8TOARS http://scholars.lib.ntu.edu.tw/handle/123456789/365020 |
DOI: | 10.1145/2063576.2063601 | SDG/Keywords: | adafuse; Boosting approach; Data fusion algorithm; Data fusion methods; Data sets; Fusion methods; Individual models; Learning data; Retrieval models; Retrieval performance; Training data; Training example; Information retrieval; Knowledge management; Data fusion |
Appears in Collections: | Department of Computer Science and Information Engineering |