Prediction of enzyme catalytic sites by sequential pattern mining
Date Issued
2008
Date
2008
Author(s)
Chien, Ting-Ying
Abstract
Large-scale automatic annotation of protein sequences remains challenging in the post-genomic era. This thesis aims to predict the catalytic sites of enzyme sequences using a repository of protein signatures. The sequence signatures are derived by a motif-based method. The blocks of a signature, also called conserved regions, are composed of the key residues shared among homologues. These blocks are conserved during evolution because of their importance to protein function. Biological experiments reveal that an enzyme catalytic site usually consists of residues that are widely separated in the sequence. To predict catalytic sites comprehensively, the signatures must therefore contain residues that are widely scattered along the sequence. For this reason, we employ WildSpan, a recently developed pattern mining algorithm, to generate enzyme sequence signatures. WildSpan is designed to discover sequence motifs that span a large number of unimportant positions. To measure the performance of our method, we collect the annotated catalytic sites of 831 enzymes from the Catalytic Site Atlas (CSA). The results show that our method identifies catalytic sites and catalytic residues more effectively than patterns derived from the PROSITE database. The proposed method has been implemented in a web server named E1DS (http://e1ds.csbb.ntu.edu.tw/). E1DS currently contains 5421 sequence signatures that together cover 932 four-digit EC numbers. On average, on the task of predicting catalytic sites, E1DS achieves a ‘correct’ rate of 35.5% and a ‘success’ rate of 49.6%, whereas PROSITE patterns achieve ‘correct’ and ‘success’ rates of 18.9% and 33.7%, respectively. On the task of predicting catalytic residues, the sensitivity of E1DS is 30.0%, higher than that of PROSITE (16.2%), although the specificity of E1DS (96.7%) is slightly lower than that of PROSITE (98.6%).
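The idea of a signature made of conserved blocks separated by long runs of unimportant positions, and the residue-level sensitivity and specificity quoted above, can be illustrated with a minimal Python sketch. It is not WildSpan or the E1DS implementation: the function names, the block and gap representation, the toy sequence, and the annotated positions are all hypothetical, and the sketch assumes every residue inside a matched block counts as a predicted catalytic residue, which may differ from the evaluation protocol used in the thesis.

```python
import re


def signature_to_regex(blocks, gaps):
    """Join conserved blocks with bounded wildcard gaps (hypothetical format).

    blocks: list of literal residue strings, e.g. ["GDS", "H"].
    gaps:   list of (min_len, max_len) tuples, one between consecutive blocks.
    """
    if len(gaps) != len(blocks) - 1:
        raise ValueError("need exactly one gap between consecutive blocks")
    parts = [f"({re.escape(blocks[0])})"]
    for (lo, hi), block in zip(gaps, blocks[1:]):
        parts.append(f".{{{lo},{hi}}}")          # run of unimportant positions
        parts.append(f"({re.escape(block)})")   # next conserved block
    return re.compile("".join(parts))


def predicted_positions(pattern, sequence):
    """Return the 0-based positions covered by the blocks of the first match."""
    match = pattern.search(sequence)
    if match is None:
        return set()
    positions = set()
    for group in range(1, pattern.groups + 1):
        positions.update(range(match.start(group), match.end(group)))
    return positions


def sensitivity_specificity(predicted, actual, seq_len):
    """Residue-level sensitivity TP/(TP+FN) and specificity TN/(TN+FP)."""
    tp = len(predicted & actual)
    fn = len(actual - predicted)
    fp = len(predicted - actual)
    tn = seq_len - tp - fn - fp
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Toy data, invented for illustration only.
toy_sequence = "MKAGDSLTAAAAAAAAHWQ"
toy_pattern = signature_to_regex(["GDS", "H"], [(5, 20)])
toy_predicted = predicted_positions(toy_pattern, toy_sequence)
toy_annotated = {5, 16}  # hypothetical catalytic Ser and His positions
toy_sens, toy_spec = sensitivity_specificity(
    toy_predicted, toy_annotated, len(toy_sequence)
)
```

In this toy case the two blocks lie ten positions apart, so a signature restricted to short gaps would miss the second residue. Counting whole blocks as predictions also shows why sensitivity can rise while specificity drops slightly: conserved but non-catalytic residues inside a block become false positives.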
Subjects
Sequential pattern mining
Catalytic site
Signature
EC number
Enzyme function
Type
thesis
File(s)
Name
ntu-97-R95922108-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):1b2dc9a7bc034414280e34b445f172a4
