A Query-Efficient Algorithm for String-Similarity Search
Date Issued
2015
Date
2015
Author(s)
Huang, You-Sheng
Abstract
Edit distance, a measure determining the similarity between two strings, is a criterion that has been used widely. String similarity search finds strings in a dataset that are similar to a given query string. Edit-distance based string similarity search is exploited in many fields, e.g., database cleaning, error detection and correction and data retrieval. Most approaches toward string similarity search resort to filtering out as many strings in datasets as possible and verifying the remaining strings. However, these approaches use only one filtering method throughout the whole procedure, which makes the power of the method fluctuates during the whole procedure. To overcome this problem, we propose a data structure integrating different filtering methods and adopting a more efficient one on each phase. We also give corresponding algorithms for two important queries of string similarity search, range query and top-k query. Experimental results show that our approach is competitive for range query when thresholds are small enough.
Subjects
edit distance
string similarity search
range query
top-k query
Type
thesis
File(s)![Thumbnail Image]()
Loading...
Name
ntu-104-R02922064-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):70947453774e240ba6d1c972645f6119