A cascaded classification approach to disambiguating polysemous mentions with social chains
Journal
Expert Systems with Applications
Journal Volume
37
Journal Issue
7
Pages
5404-5414
Date Issued
2010
Author(s)
Abstract
This paper considers five features for personal name disambiguation: titles, community chains, terms, temporal expressions, and hostnames. Using nine test data sets covering three ambiguous personal names, we address the degree of awareness of an entity, the source of materials, and web pages from different areas. In a single-clusterer approach, employing all features achieves an average F-score of 0.635, better than the 0.502 obtained by employing contextual terms only. When community chains are expanded by using the web, the average F-score increases to 0.676. We also propose a multiple-clusterer approach, which cascades five clusterers corresponding to the five features; it further improves the average F-score to 0.684. Expanding community chains also raises the average F-score of the multiple-clusterer approach to 0.697. In summary, the proposed features are quite useful; the cascaded multiple-clusterer approach is better than the single-clusterer approach; and expanding community chains using the web has positive effects on personal name disambiguation. The experiments show that this approach yields significant improvements. © 2010 Elsevier Ltd. All rights reserved.
Subjects
Cascaded clusterers
Community chain
Name disambiguation
Single-clusterers
Type
journal article
