Effective string processing and matching for author disambiguation
Journal
Proceedings of the 2013 KDD Cup 2013 Workshop
End Page
9
ISBN
9781450324953
Date Issued
2013-01-01
Author(s)
Chin, Wei Sheng
Juan, Yu Chin
Zhuang, Yong
Wu, Felix
Tung, Hsiao Yu
Yu, Tong
Wang, Jui Pin
Chang, Cheng Xia
Yang, Chun Pai
Chang, Wei Cheng
Huang, Kuan Hao
Kuo, Tzu Ming
Lin, Shan Wei
Lin, Young San
Lu, Yu Chen
Su, Yu Chuan
Wei, Cheng Kuang
Yin, Tu Chun
Li, Chun Liang
Lin, Ting Wei
Tsai, Cheng Hao
Abstract
Track 2 of KDD Cup 2013 aims to identify duplicate authors in a data set from Microsoft Academic Search. This type of problem appears in many large-scale applications that compile information from different sources. This paper describes the solution developed at National Taiwan University, which won first prize in the competition. We propose an effective name-matching framework and realize two implementations of it. An important strategy in our approach is to handle Chinese and non-Chinese names separately because of their different naming conventions. Post-processing, including merging the results of two predictions, further boosts performance. Our approach achieves an F1 score of 0.99202 on the private leaderboard and 0.99195 on the public leaderboard. © 2013 ACM.
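The abstract does not spell out the matching rules, so the following is only a minimal illustrative sketch of the general idea of key-based name matching followed by merging two sets of predictions. It is not the authors' implementation: the functions `name_key`, `group_duplicates`, and `merge_predictions`, the normalization rules, and the sample names are all hypothetical, and the separate handling of Chinese and non-Chinese names described above is not reproduced.

```python
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Hypothetical normalization: strip accents, lowercase,
    replace punctuation with spaces, and sort the name tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(sorted(cleaned.split()))


def group_duplicates(authors: dict[int, str]) -> list[set[int]]:
    """Group author IDs whose names normalize to the same key."""
    groups: dict[str, set[int]] = defaultdict(set)
    for author_id, name in authors.items():
        groups[name_key(name)].add(author_id)
    return [ids for ids in groups.values() if len(ids) > 1]


def merge_predictions(first: list[set[int]], second: list[set[int]]) -> list[set[int]]:
    """Merge two lists of duplicate groups (assumed rule: groups that
    share any author ID are joined), using a small union-find."""
    parent: dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for group in list(first) + list(second):
        ids = sorted(group)
        for other in ids[1:]:
            parent[find(other)] = find(ids[0])

    merged: dict[int, set[int]] = defaultdict(set)
    for x in list(parent):
        merged[find(x)].add(x)
    return [ids for ids in merged.values() if len(ids) > 1]


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = {1: "Chih-Jen Lin", 2: "Lin, Chih Jen", 3: "José García",
              4: "Jose Garcia", 5: "Wei Wang"}
    print(group_duplicates(sample))
    print(merge_predictions([{1, 2}], [{2, 3}]))
```

Sorting the tokens makes the key insensitive to "Last, First" versus "First Last" ordering, which is a simplification chosen for this sketch; a real system would need more careful, convention-aware rules.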
Type
conference paper