Faster Algorithms for Computing Edit Distance and Alignment between DNA Strings
Date Issued
2009
Date
2009
Author(s)
Hsu, Kang-Hua
Abstract
A newly discovered DNA sequence is usually input as a query sequence to search the database, which contains many known sequences, for some sequences similar to the query sequence. The reason is that high similarity between two DNA sequences is thought that they are homology and they have similar functions. The edit distance and alignment between two DNA sequences are two elementary methods showing the similarity between two sequences. The dynamic programming method is mathematically optimal for these problems; however, it costs too much time and too much memory such that using a common PC for analyzing DNA sequences is impossible. Therefore, some fast heuristics are developed. This thesis can be divided into two parts. In the first part, we concentrate on the fast database search heuristics. Although the fast heuristics runs rapidly, the outcomes are usually not accurate. We propose the UDCR algorithm, which first maps the DNA sequences to the numeric domain and then employ the characteristics of the discrete correlation function to show the “better-aligned locations” between two sequences. According to the better-aligned locations, we then use the dynamic programming method for computing the edit distance (or similarity score) and the accompanying alignment between two sequences. In the second part, we propose a fast algorithm which aligns two sequences and computes the accurate edit distance. The proposed algorithm is based on the dynamic programming method accompanying with some rules to ignore unnecessary computations. Besides, the memory requirement is substantially reduced. It is noteworthy that the proposed algorithm can be used to further analyze the outcomes of the fast database search heuristics.
Subjects
Database search
Dynamic programming
Bioinformatics
Sequence alignment
Edit distance
Type
thesis
File(s)![Thumbnail Image]()
Loading...
Name
ntu-98-R96942097-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):fc4a2713a102f0bf6e7e5b883e7f98d8
