Efficient algorithms for locating the length-constrained heaviest segments, with applications to biomolecular sequence analysis
Resource
The 27th International Symposium on Mathematical Foundations of Computer Science, (MFCS 2002)
Journal
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Journal Volume
2420
Pages
459 - 470
Date Issued
2002
Date
2002
Author(s)
Abstract
We study two fundamental problems concerning the search for interesting regions in sequences: (i) given a sequence of real numbers of length n and an upper bound U, find a consecutive subsequence of length at most U with the maximum sum and (ii) given a sequence of real numbers of length n and a lower bound L, find a consecutive subsequence of length at least L with the maximum average. We present an O(n)- time algorithm for the first problem and an O(n log L)-time algorithm for the second. The algorithms have potential applications in several areas of biomolecular sequence analysis including locating GC-rich regions in a genomic DNA sequence, post-processing sequence alignments, annotating multiple sequence alignments, and computing length-constrained ungapped local alignment. Our preliminary tests on both simulated and real data demonstrate that the algorithms are very efficient and able to locate useful (such as GC-rich) regions. © Springer-Verlag Berlin Heidelberg 2002.
Event(s)
27th International Symposium on Mathematical Foundations of Computer Science, MFCS 2002
Subjects
Algorithm; Biomolecular sequence analysis; Efficiency; Length constraint; Maximum consecutive subsequence; Ungapped local alignment
Type
conference paper
File(s)![Thumbnail Image]()
Loading...
Name
16.pdf
Size
202.68 KB
Format
Adobe PDF
Checksum
(MD5):527d75b159a28338f46446ac2fe0a096
