On 3D Substructure Matching of Molecules and Its Applications in Protein Function Classification
Date Issued
2004
Date
2004
Author(s)
Chen, Chien-Cheng
Abstract
With the explosion of protein sequences and structures stored in databanks, it is highly desirable to explore alignment and classification methods that can assign newly found enzymes to their respective enzyme classes through an automated procedure. This matters because knowing which class an enzyme belongs to may help deduce its catalytic mechanism and specificity, giving clues to relevant biological functions. Traditional experimental methods are both time-consuming and costly. The classification task can instead be recast as a 3D common substructure matching problem among proteins. In this dissertation, I propose a two-pass, geometric-hashing-based framework that combines the Geometric Hashing and Iterative Closest Point algorithms to handle the 3D substructure matching problem of proteins. Two feature extraction techniques, the alpha shape and 3D reference frame schemes, are used to reduce the computation cost.
Our method aligns two molecules (not only proteins) based on 3D structural data. Two main experiments are conducted on data from the PDB. The first addresses the molecular alignment problem, where, for example, the similarity between the protein EFG and the non-protein EF-Tu/tRNA complex is calculated. We also compare our method with four popular tools (Yale, Dali, CE and Lund) on six protein pairs, and the results show that it improves over the others in terms of RMSD and the number of matched atom pairs, although it is computationally more expensive. In the second experiment, which is also an innovative application, active site residues in a protein are aligned and matched against other proteins with different enzyme classification numbers. Using atom type and binding surface area as discriminators, this method conservatively reaches an accuracy rate of 42.12% for enzymes classified in EC4, 79.06% for EC5, and over 60.0% for EC6; a higher upper bound on the accuracy rate is also evaluated. Both experiments demonstrate that the proposed methods are useful and versatile.
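To make the refinement stage named in the abstract more concrete, the following is a minimal sketch of a generic Iterative Closest Point loop that reports the RMSD over matched atom pairs. It is not the dissertation's implementation: the function names `kabsch` and `icp`, the brute-force nearest-neighbour search, and the convergence tolerance are all assumptions made for illustration, and the geometric hashing pass, alpha shape filtering, and atom-type discriminators are omitted. The sketch assumes a coarse initial pose is already available, since ICP only converges locally.

```python
import numpy as np


def kabsch(P, Q):
    """Least-squares rigid transform (R, t) such that P @ R.T + t approximates Q.

    P and Q are (n, 3) arrays of already paired coordinates.
    """
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)          # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t


def icp(source, target, max_iter=50, tol=1e-8):
    """Hypothetical ICP refinement: rigidly move `source` onto `target`.

    Returns the moved source points, the index of the matched target point
    for every source point, and the RMSD over those matched pairs.
    """
    moved = np.asarray(source, dtype=float).copy()
    target = np.asarray(target, dtype=float)
    prev_err = np.inf
    for _ in range(max_iter):
        # Brute-force closest target point for each source point (assumption:
        # the point sets are small enough that an O(n*m) search is acceptable).
        d2 = ((moved[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        idx = d2.argmin(axis=1)
        # Best rigid transform for the current correspondences, then apply it.
        R, t = kabsch(moved, target[idx])
        moved = moved @ R.T + t
        err = np.sqrt(((moved - target[idx]) ** 2).sum(axis=1).mean())
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return moved, idx, err


if __name__ == "__main__":
    # Toy data: a small lattice and a slightly rotated, shifted copy of it.
    grid = np.array([[3.0 * i, 3.0 * j, 3.0 * k]
                     for i in range(3) for j in range(3) for k in range(3)])
    a = np.deg2rad(3.0)
    Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
    shifted = grid @ Rz.T + np.array([0.2, -0.1, 0.15])
    _, matches, rmsd = icp(shifted, grid)
    print("matched pairs:", len(matches), "RMSD:", rmsd)
```

In a two-pass setting such as the one the abstract describes, a coarse alignment would be computed first and a loop of this kind would then refine it; for large structures a spatial index would normally replace the brute-force search.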
Subjects
活化區域
幾何雜湊法
疊代式最近點演算法
結構比對
蛋白質功能分類
structure alignment
protein function classification
active site
geometric hashing
Iterative Closest Point
Type
thesis
File(s)
Name
ntu-93-D84526008-1.pdf
Size
23.31 KB
Format
Adobe PDF
Checksum
(MD5):0ca5ebc9561dcb8a7bf83ee5bed31c48