Mining Maximal Sequential Patterns in Protein Databases
Date Issued
2005
Author(s)
Ling, Yu
Language
en-US
Abstract
Because of the close relationship between sequential patterns and protein function, systematically mining significant sequential patterns in protein databases has become an important research topic.
In this thesis, we propose a suffix-tree-based algorithm for discovering patterns in protein databases. The algorithm uses the occurrence information maintained in the suffix tree to mine closed frequent substrings, generate maximal frequent sequential patterns, and adjust the gaps within the patterns. To keep the output compact, it generates only maximal patterns rather than all patterns. The experimental results show that the algorithm finds not only the patterns recorded in the PROSITE database but also other patterns worthy of further biological study, such as longer patterns and the classifier pattern set. In our experiments, the algorithm also produces better results than Chang and Halgamuge’s method.
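As a rough illustration of what "maximal frequent" means for contiguous patterns, the sketch below counts, for each substring, how many sequences contain it, and then keeps only the frequent substrings that are not contained in a longer frequent one. It is not the thesis's algorithm: brute-force enumeration stands in for the suffix tree, gap adjustment is not covered, and the function names, the `max_len` cap, and the toy sequences are assumptions made for this example.

```python
from collections import defaultdict


def frequent_substrings(sequences, min_support, max_len=10):
    """Return {substring: support} for substrings found in at least
    min_support distinct sequences (brute force, not a suffix tree)."""
    containing = defaultdict(set)  # substring -> indices of sequences containing it
    for idx, seq in enumerate(sequences):
        for start in range(len(seq)):
            stop = min(len(seq), start + max_len)
            for end in range(start + 1, stop + 1):
                containing[seq[start:end]].add(idx)
    return {s: len(ids) for s, ids in containing.items() if len(ids) >= min_support}


def maximal_only(frequent):
    """Keep only frequent substrings that are not part of a longer frequent one."""
    patterns = list(frequent)
    return {
        p: frequent[p]
        for p in patterns
        if not any(p != q and p in q for q in patterns)
    }


if __name__ == "__main__":
    # Hypothetical toy sequences, not real protein data.
    toy = ["MKVLAG", "AKVLAT", "GKVLAC"]
    print(maximal_only(frequent_substrings(toy, min_support=3)))
```

With these toy sequences and a support threshold of 3, ten substrings are frequent (K, V, L, A, KV, VL, LA, KVL, VLA, KVLA), but only KVLA is maximal, which shows how restricting the output to maximal patterns keeps it compact.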
Subjects
protein
maximal sequential pattern
suffix tree
Type
other
File(s)
Name
ntu-94-R92725020-1.pdf
Size
23.31 KB
Format
Adobe PDF
Checksum
(MD5):155556d7d1b6e7a93781798c9aedd102
