Position-Weighted Measures for the Company Name-Matching Problem
Date Issued
2016
Date
2016
Author(s)
Li, Ching-Kuo
Abstract
This thesis focuses on the company name-matching problem. We analyze the common errors and complications that users introduce into company names and that make the problem difficult. Although company name matching is a type of name-matching problem, it has special features that make common name-matching methods hardly the best choice for it. Therefore, based on how company names are constructed, we propose a novel idea, position weight, to address the company name-matching problem. We then compare our proposed position-weighted measure with the Monge-Elkan measure and soft TF/IDF on a popular business data set and two data sets from a major semiconductor manufacturer. The results indicate that the position-weighted measure performs best overall, based on maximum F1 and our proposed rating measure. Beyond company names, the position-weighted measure can also be used in other name-matching problems where the names have a construction similar to that of company names.
Subjects
Company name
Name-matching problem
String similarity
Position weight
Data integration
Type
thesis
File(s)
Name
ntu-105-R03323037-1.pdf
Size
23.54 KB
Format
Adobe PDF
Checksum
(MD5):8e44c04e4d037512dc9b462819a6ea34