Tracking DNA route on protein structure by knowledge-based learning considering geometric propensity between side chains and bases
Date Issued
2009
Author(s)
Wang, Chien-Chih
Abstract
DNA-binding proteins carry out their functions through specific or non-specific protein-DNA recognition. Identifying DNA-binding residues with computational tools makes it possible to predict or validate protein functions at a high-throughput rate. The protein-DNA complexes available in the Protein Data Bank (PDB) further reveal how a DNA-binding protein recognizes its partners. Such information greatly helps biologists to determine or predict the binding elements in DNA sequences, such as transcription factor binding sites (TFBSs). In this way, accurate regulatory networks at whole-genome scale can be constructed more efficiently in the near future. While it remains a challenging task to understand the mechanism of protein-DNA interactions without crystal structures of the complexes, this thesis proposes an algorithm to predict the binding position and direction of DNA given a known protein structure. First, potential DNA-binding regions of a query protein are predicted by a sequential pattern mining tool, MAGIIC-PRO, which identifies functional regions of a protein by discovering concurrent conserved regions among its related protein sequences. After the functional regions are predicted, we extract the residues on the protein surface and use a hierarchical clustering algorithm to derive potential DNA-binding units, that is, compact conserved regions with high DNA-binding propensity. Afterward, principal component analysis (PCA) is applied to the collected atoms to predict the orientation of the DNA grooves. To derive the positions where DNA bases are likely to be present, we propose a knowledge-based learning procedure that constructs a predictive model considering the geometric propensity between protein side chains and DNA bases. The experiments conducted in the thesis show that the orientation of the DNA grooves around the selected conserved regions can be predicted with acceptably small errors.
Furthermore, with a scoring function that uses a radial basis function (RBF) as the kernel, we build spatial distributions of the positions where DNA bases are likely to be present. The computational outputs are expected to provide useful information for subsequent analyses such as protein-DNA docking and TFBS prediction.
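The abstract does not give implementation details, so the following is only a minimal sketch of two ideas it names: taking the first principal component of a set of atom coordinates as an orientation estimate, and scoring a candidate position with an RBF kernel centred on previously observed positions. The function names, the `sigma` width, and the use of power iteration in place of a library PCA routine are all assumptions made for illustration and are not taken from the thesis.

```python
import math


def principal_axis(points, iterations=200):
    """Unit vector along the first principal component of 3-D points.

    Assumption: power iteration on the 3x3 covariance matrix stands in
    for a library PCA routine. It can fail if the start vector happens
    to be orthogonal to the dominant axis.
    """
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [[p[i] - centroid[i] for i in range(3)] for p in points]
    # Covariance matrix of the centred coordinates.
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(3)]
           for i in range(3)]
    s = 1.0 / math.sqrt(3.0)
    v = [s, s, s]  # normalized start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate case: all points coincide
            break
        v = [x / norm for x in w]
    return v


def rbf_score(query, observed, sigma=1.0):
    """Mean Gaussian RBF kernel value between query and observed positions.

    `sigma` is a hypothetical kernel width, not a value from the thesis.
    """
    total = 0.0
    for o in observed:
        d2 = sum((q - x) ** 2 for q, x in zip(query, o))
        total += math.exp(-d2 / (2.0 * sigma ** 2))
    return total / len(observed)


# Toy coordinates lying roughly along the x axis (made-up values).
atoms = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, -0.1, 0.0),
         (3.0, 0.0, 0.1), (4.0, 0.0, -0.1)]
axis = principal_axis(atoms)

# Positions close to observed ones score higher than distant ones.
near = rbf_score((2.0, 0.0, 0.0), atoms)
far = rbf_score((2.0, 5.0, 5.0), atoms)
```

The sign of the returned axis is arbitrary, so any comparison with a reference direction should use the absolute value of the dot product.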
Subjects
DNA-binding sites
binding orientation
structure-based prediction
protein-DNA interactions
Type
thesis
File(s)
Name
ntu-98-R96631012-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):e5fa535b7cf69b115bc98c3d0ddbdceb
