A Query-Efficient Algorithm for String-Similarity Search

Huang, You-Sheng

A Query-Efficient Algorithm for String-Similarity Search

Date Issued

2015

Date

2015

Author(s)

Huang, You-Sheng

URI

http://ntur.lib.ntu.edu.tw//handle/246246/275479

Abstract

Edit distance, a measure determining the similarity between two strings, is a criterion that has been used widely. String similarity search finds strings in a dataset that are similar to a given query string. Edit-distance based string similarity search is exploited in many fields, e.g., database cleaning, error detection and correction and data retrieval. Most approaches toward string similarity search resort to filtering out as many strings in datasets as possible and verifying the remaining strings. However, these approaches use only one filtering method throughout the whole procedure, which makes the power of the method fluctuates during the whole procedure. To overcome this problem, we propose a data structure integrating different filtering methods and adopting a more efficient one on each phase. We also give corresponding algorithms for two important queries of string similarity search, range query and top-k query. Experimental results show that our approach is competitive for range query when thresholds are small enough.

Subjects

edit distance

string similarity search

range query

top-k query

Type

thesis

File(s)

Name

ntu-104-R02922064-1.pdf

Size

23.32 KB

Format

Adobe PDF

Checksum

(MD5):70947453774e240ba6d1c972645f6119

A Query-Efficient Algorithm for String-Similarity Search

關於 (About)

聯絡資訊 (Contact Us)

相關網站 (Useful Links)

關於開放取用 (Open Access, OA)

出版社期刊論文授權政策 (Copyright)

使用說明 (Instructions)

登入說明 (Sign-in)

匯入著作 (Submission)