Protein Structure Comparison Based on Encoding and Fast Indexing of Sphere Model
Date Issued
2007
Author(s)
Shiau, Yhi
Abstract
This thesis proposes new methods for protein structure comparison (PSC) based on encoding and fast indexing of a sphere model. First, we build a fast PSC tool, EMPSC (Ellipsoidal Model for Protein Structure Comparison), which aims to address some drawbacks of other algorithms and to provide local alignment capability. Second, we apply the local alignment capability of EMPSC to detect structure conservation. In doing so, we encounter the problem that the local alignment regions found vary in size. Third, to address this, we apply the NRS (Neighborhood Residues Sphere) concept to fix the size (10 Å) of the local alignment region. Using NRS sequence-structure clustering and comparisons, we again try to detect structure conservation. Applications built on these algorithms are shown to work for proteins within the same EC class. Finally, drawing on the NRS-related experiments, we propose another new PSC method, ESC (Environmental Signature Cluster). It aims to provide an indexing methodology for the three-dimensional geometry of protein local structure, so that the method is capable of massive database search and local alignment finding. The ESC method makes it possible to mine structure conservation quickly across the whole PDB database.
EMPSC: a fast PSC tool based on an ellipsoidal model
First, we propose a new method, EMPSC, for the well-known protein structure comparison (PSC) problem. EMPSC is a protein structural alignment algorithm based on an ellipsoidal model abstraction. We segment the protein 3D structure into two kinds of structures: secondary structure elements (SSEs) recognized by DSSP [Kabsch 1983] and the remaining coil/loop structures. The SSEs serve as the initial alignment centers for obtaining the transformation coordinate systems. Several heuristic filters and a geometric hashing based global alignment estimation are used to find good initial alignments quickly. In the refinement stage, a standard refinement algorithm is invoked to fine-tune the alignment produced by the first stage. Our experimental results show that EMPSC generally achieves comparable accuracy and better performance than existing PSC algorithms. We also analyze the factors that affect the performance of EMPSC and of SSE-based PSC algorithms. Multiple protein structure comparison and local structure comparison are left for further investigation.
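The segmentation step described above can be illustrated with a minimal sketch. It assumes the input is a per-residue DSSP secondary structure string and that the one-letter codes H, G, I count as helix and E, B as strand, with everything else (T, S, blank or '-') treated as coil/loop. This grouping, the function names, and the `min_len` filter are illustrative assumptions, not the thesis's actual rules.

```python
from itertools import groupby

# Assumed grouping of DSSP one-letter codes (not taken from the thesis).
HELIX_CODES = set("HGI")
STRAND_CODES = set("EB")


def classify(code):
    """Map one DSSP code to 'helix', 'strand', or 'coil'."""
    if code in HELIX_CODES:
        return "helix"
    if code in STRAND_CODES:
        return "strand"
    return "coil"


def segment(dssp):
    """Split a DSSP string into consecutive (kind, start, end) runs.

    Indices are 0-based and inclusive. Adjacent codes of the same kind
    (for example 'HHGG') are merged into a single run.
    """
    segments = []
    pos = 0
    for kind, run in groupby(dssp, key=classify):
        length = len(list(run))
        segments.append((kind, pos, pos + length - 1))
        pos += length
    return segments


def sse_only(segments, min_len=3):
    """Keep helix/strand runs of at least min_len residues (hypothetical filter)."""
    return [s for s in segments if s[0] != "coil" and s[2] - s[1] + 1 >= min_len]


segments = segment("--HHHHTT-EEEE-")
sses = sse_only(segments)
```

Each retained run would then be the unit that an ellipsoid is fitted to and used as a candidate alignment center. How the fit and the later filters are done is not specified in this record.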
ESC: another, faster PSC tool based on a sphere model
Second, we propose the Environmental Signature Cluster (ESC) method, which uses residue environmental signatures based on the Neighborhood Residues Sphere (NRS) concept to index the three-dimensional geometry of protein local structure. With NRS local geometry indexing, we digitize a protein structure into NRS environmental signatures, which lets the method handle massive database search and local alignment finding, including one-against-all protein comparisons. At present, ESC can quickly report the degree of similarity among proteins. It currently works best for constrained local structure alignment, and applying it to one-against-all comparison over the PDB (Protein Data Bank) is feasible. In a test of one-against-all search over the whole PDB with 50 randomly selected protein chains, alignment results were output in about 15 minutes on average. The experimental results show that the proposed method is capable of massive database search and is suited to local structure identification and the discovery of local structure conservation.
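To make the indexing idea concrete, the following is a minimal sketch of a sphere-based signature index. The record does not define the ESC signature, so the signature used here (the center residue type plus the sorted types of its neighbors) is a hypothetical stand-in. Treating the 10 Å size as a radius around one representative atom per residue is also an assumption, as are all function names.

```python
from collections import defaultdict
from math import dist

NRS_SIZE = 10.0  # angstroms; interpreted here as a radius (assumption)


def neighborhood(residues, center_idx, radius=NRS_SIZE):
    """Indices of residues whose coordinate lies within radius of the center.

    residues is a list of (residue_type, (x, y, z)) tuples, one coordinate
    per residue (for example the C-alpha atom).
    """
    center = residues[center_idx][1]
    return [i for i, (_, xyz) in enumerate(residues) if dist(center, xyz) <= radius]


def signature(residues, center_idx, radius=NRS_SIZE):
    """Hypothetical signature: center type plus sorted neighbor types."""
    idxs = neighborhood(residues, center_idx, radius)
    neighbors = sorted(residues[i][0] for i in idxs if i != center_idx)
    return (residues[center_idx][0], "".join(neighbors))


def build_index(chains):
    """Inverted index from signature to the (chain_id, residue_idx) sites that have it."""
    index = defaultdict(list)
    for chain_id, residues in chains.items():
        for i in range(len(residues)):
            index[signature(residues, i)].append((chain_id, i))
    return index


def query(index, residues):
    """Count, per indexed chain, how many query residues share a signature."""
    hits = defaultdict(int)
    for i in range(len(residues)):
        for chain_id, _ in index.get(signature(residues, i), ()):
            hits[chain_id] += 1
    return dict(hits)


# Toy data, not real protein coordinates.
chains = {
    "A": [("G", (0.0, 0.0, 0.0)), ("A", (3.8, 0.0, 0.0)), ("L", (20.0, 0.0, 0.0))],
    "B": [("G", (0.0, 0.0, 0.0)), ("A", (3.8, 0.0, 0.0))],
}
index = build_index(chains)
hits = query(index, chains["B"])
```

Because a query only looks up its own signatures in a dictionary, the cost of a one-against-all search grows with the size of the query rather than with a pairwise superposition against every database chain, which is the general reason an index of this kind can be fast. An exact-match signature like this one is far stricter than a practical method would need to be.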
Subjects
Protein Structure Comparison
EMPSC (Ellipsoidal Model for Protein Structure Comparison)
Structure Conservation
NRS (Neighborhood Residues Sphere)
ESC (Environmental Signature Cluster)
Structure Mining
PSC (Protein Structure Comparison)
structure conservation
geometric hashing
Type
thesis
File(s)
Name
ntu-96-D87526004-1.pdf
Size
23.31 KB
Format
Adobe PDF
Checksum
(MD5):12924f8416461c13a1e16159e8f6c901
