Abstract: The objective of this project is to study the prediction and characterization of multifunctional proteins. The project aims to provide biologists with a protein functional annotation service that is both correct and complete. In our recent study, we successfully employed statistical models to improve the efficiency of traditional hierarchical clustering algorithms for protein family analysis. The proposed statistical-model-based algorithm also provides users with a summarized hierarchy that is much smaller than the original binary tree generated by traditional hierarchical clustering algorithms. In this project, we next investigate the possibility of using this information to predict multifunctional proteins.
Protein functional annotation still faces challenges, which come from two sources: the twilight zone of homology and multifunctional proteins. The noisy relationships inherent in the twilight zone of homology result in many false predictions when only similarity scores are employed during analysis. On the other hand, the existence of multifunctional proteins increases the complexity of the annotation procedure. Multifunctional proteins share many sequence properties with multi-domain proteins, which makes the prediction even more difficult. In this project, we will continue our recent work on designing a robust clustering algorithm based on statistical models. We employ the proposed statistical test to identify so-called homogeneous homology protein clusters and then use this information to identify multifunctional proteins based on the property of homogeneity. The predictions of multifunctional proteins are then fed back into the clustering hierarchy in order to provide correct and complete functional annotations.
The duration of this project is one year. In the first half of the year, we plan to develop a robust prediction procedure for multifunctional proteins and simultaneously characterize their sequences. In the second half of the year, the prediction results will be integrated with the generated protein hierarchy to provide biologists with a web service for automated functional annotation.