https://scholars.lib.ntu.edu.tw/handle/123456789/632518
Title: | Novel association measures using web search with double checking | Authors: | HSIN-HSI CHEN; Lin M.-S; Wei Y.-C. |
Issue date: | 2006 | Volume: | 1 | Pages: | 1009-1016 | Source publication: | COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference | Abstract: | A web search with double checking model is proposed to explore the web as a live corpus. Five association measures including variants of Dice, Overlap Ratio, Jaccard, and Cosine, as well as Co-Occurrence Double Check (CODC), are presented. In the experiments on Rubenstein-Goodenough's benchmark data set, the CODC measure achieves a correlation coefficient of 0.8492, which competes with the performance (0.8914) of the model using WordNet. The experiments on link detection of named entities using the strategies of direct association, association matrix and scalar association matrix verify that the double-check frequencies are reliable. Further study on named entity clustering shows that the five measures are quite useful. In particular, the CODC measure is very stable on word-word and name-name experiments. The application of the CODC measure to expand community chains for personal name disambiguation achieves increases of 9.65% and 14.22% compared to the system without community expansion. All the experiments illustrate that the novel model of web search with double checking is feasible for mining associations from the web. © 2006 Association for Computational Linguistics. |
URI: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84860518427&doi=10.3115%2f1220175.1220302&partnerID=40&md5=6d76ab935d7935ec99b448c716275d2b https://scholars.lib.ntu.edu.tw/handle/123456789/632518 |
DOI: | 10.3115/1220175.1220302 | SDG/Keywords: | Benchmarking; Information retrieval; Natural language processing systems; Websites; Association matrix; Association measures; Benchmark data; Correlation coefficient; Mining associations; Named entities; Novel associations; Personal name disambiguation; Computational linguistics |
Appears in: | Department of Computer Science and Information Engineering |