Effective string processing and matching for author disambiguation

Authors: Chin, Wei Sheng; Juan, Yu Chin; Zhuang, Yong; Wu, Felix; Tung, Hsiao Yu; Yu, Tong; Wang, Jui Pin; Chang, Cheng Xia; Yang, Chun Pai; Chang, Wei Cheng; Huang, Kuan Hao; Kuo, Tzu Ming; Lin, Shan Wei; Lin, Young San; Lu, Yu Chen; Su, Yu Chuan; Wei, Cheng Kuang; Yin, Tu Chun; Li, Chun Liang; Lin, Ting Wei; Tsai, Cheng Hao; SHOU-DE LIN; HSUAN-TIEN LIN; CHIH-JEN LIN

Record dates: 2023-08-01; 2023-08-01; 2013-01-01
ISBN: 9781450324953
Handle: https://scholars.lib.ntu.edu.tw/handle/123456789/634398

Abstract: Track 2 of KDD Cup 2013 aims to identify duplicated authors in a data set from Microsoft Academic Search. This type of problem appears in many large-scale applications that compile information from different sources. This paper describes the solution developed at National Taiwan University that won first prize in the competition. We propose an effective name matching framework and realize two implementations. An important strategy in our approach is to consider Chinese and non-Chinese names separately because of their different naming conventions. Post-processing, including merging the results of the two predictions, further boosts performance. Our approach achieves an F1-score of 0.99202 on the private leaderboard and 0.99195 on the public leaderboard. © 2013 ACM.

SDGs: SDG17
Type: conference paper
DOI: 10.1145/2517288.2517295
Scopus ID: 2-s2.0-85146706624
Scopus API: https://api.elsevier.com/content/abstract/scopus_id/85146706624