Prediction of Human Protein-Protein Interactions Using Support Vector Machines
Date Issued
2007
Date
2007
Author(s)
Huang, Tao-Wei
Language
en-US
Abstract
The recent increase in the use of high-throughput two-hybrid
analysis has generated a large amount of data on protein
interactions. In particular, the availability on the Internet of
experimentally determined protein-protein interactions and other
protein features enables human protein-protein interactions to be
predicted computationally from interactions conserved between
orthologous proteins in other species (interologs). Computational
methods are therefore needed to integrate these heterogeneous
biological data and maximize the accuracy of human protein
interaction prediction.
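The interolog idea can be illustrated with a minimal sketch. It assumes a set of known interactions in one source organism and a hypothetical mapping from each source protein to its human orthologs; every interaction is transferred to all pairs of human orthologs. The function names, the identifiers, and the simple "number of supporting organisms" count are illustrative assumptions, not the confidence score used in the thesis.

```python
from collections import Counter
from itertools import product


def transfer_interologs(source_ppis, orthologs_to_human):
    """Map each source-organism interaction (a, b) onto every pair of human
    orthologs (a', b'), returning candidate human interactions."""
    predicted = set()
    for a, b in source_ppis:
        for ha, hb in product(orthologs_to_human.get(a, ()),
                              orthologs_to_human.get(b, ())):
            if ha != hb:  # self-pairs (homodimers) are skipped in this sketch
                # Canonical order so that (x, y) and (y, x) count once.
                predicted.add(tuple(sorted((ha, hb))))
    return predicted


def support_counts(per_organism_predictions):
    """Naive proxy: number of source organisms supporting each human pair.
    (Hypothetical; not the confidence score defined in the thesis.)"""
    counts = Counter()
    for preds in per_organism_predictions.values():
        counts.update(preds)
    return counts


if __name__ == "__main__":
    # Hypothetical identifiers for illustration only.
    source = {("s1", "s2"), ("s2", "s3")}
    orthologs = {"s1": ["H1"], "s2": ["H2", "H3"]}
    print(sorted(transfer_interologs(source, orthologs)))
    # [('H1', 'H2'), ('H1', 'H3')]
```

Proteins without a known human ortholog (here `s3`) simply contribute no predictions.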
In the knowledge-based part of this study, we propose a relative
conservation score based on maximal quasi-cliques identified in
protein interaction networks, and combine it with other interaction
features to formulate a scoring method. The scoring method can be
used to rank candidate protein pairs by how likely they are to
interact. The predicted human protein-protein interactions, each with
an associated confidence score, are derived from six eukaryotic
organisms: rat, mouse, fly, worm, thale cress, and baker's yeast.
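The quasi-clique notion can be made concrete with a short sketch. It uses the common definition of a γ-quasi-clique: a vertex set S in which every vertex has at least γ·(|S| − 1) neighbours inside S. The abstract does not give the thesis's exact parameters or the formula of the relative conservation score, so only the membership test and a simplified maximality check are shown; the function names and the adjacency-dictionary representation are assumptions.

```python
def is_quasi_clique(vertices, adj, gamma):
    """True if every vertex in `vertices` has at least gamma * (|S| - 1)
    neighbours inside the set (the usual gamma-quasi-clique definition)."""
    s = set(vertices)
    if len(s) < 2:
        return False
    need = gamma * (len(s) - 1)
    return all(len(adj.get(v, set()) & s) >= need for v in s)


def cannot_grow_by_one(vertices, adj, gamma):
    """Simplified maximality test for 0 < gamma <= 1: no single outside
    vertex can be added while keeping the quasi-clique property.
    Quasi-cliques are not hereditary, so this is necessary but not
    sufficient for true maximality."""
    s = set(vertices)
    # Only neighbours of S can qualify when gamma > 0.
    candidates = set().union(*(adj.get(v, set()) for v in s)) - s
    return not any(is_quasi_clique(s | {u}, adj, gamma) for u in candidates)


if __name__ == "__main__":
    # Toy network: triangle a-b-c, plus d linked to a and b.
    adj = {"a": {"b", "c", "d"}, "b": {"a", "c", "d"},
           "c": {"a", "b"}, "d": {"a", "b"}}
    print(is_quasi_clique({"a", "b", "c", "d"}, adj, 1.0))  # False
    print(is_quasi_clique({"a", "b", "c", "d"}, adj, 0.6))  # True
```

With γ = 1 the definition reduces to an ordinary clique; lowering γ tolerates missing edges, which is useful for noisy interaction data.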
Evaluation of the proposed method using functional keyword and Gene
Ontology annotations indicates that the predicted interactions are
reasonably reliable. Comparisons with existing methods also indicate
that the proposed method predicts human protein-protein interactions
more accurately than other interolog-based methods.
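One common way to evaluate predicted interactions with annotations is to measure how often both partners share at least one annotation term (for example a Gene Ontology term or a functional keyword). The sketch below assumes that approach; the abstract does not state the thesis's exact evaluation statistic, and the function name and the term labels are hypothetical.

```python
def shared_annotation_rate(pairs, annotations):
    """Fraction of pairs sharing at least one annotation term, counted only
    over pairs where both proteins have at least one annotation.
    `annotations` maps a protein ID to a set of term IDs."""
    scored = [(a, b) for a, b in pairs
              if annotations.get(a) and annotations.get(b)]
    if not scored:
        return 0.0
    hits = sum(1 for a, b in scored if annotations[a] & annotations[b])
    return hits / len(scored)


if __name__ == "__main__":
    # Hypothetical proteins and term labels.
    ann = {"P1": {"T1"}, "P2": {"T1", "T2"}, "P3": {"T3"}}
    pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]  # P4 unannotated
    print(shared_annotation_rate(pairs, ann))  # 0.5
```

Comparing this rate for predicted pairs against randomly paired proteins gives a rough sense of whether the predictions are enriched for functionally related partners.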
This study also considers protein interaction features including
interologs, spatial proximity (sub-cellular localization and tissue
specificity), temporal synchronicity (cell-cycle stage), and
domain-domain pair combinations. These six protein features, together
with three sets of 16-dimensional features derived from the
hydrophobicity, charge, and volume of amino acids, are used to
construct committee models of support vector machines (SVMs). In the
final 5-fold cross-validation on 10 test sets of different sizes,
test-set accuracy exceeded 90%. Comparative analyses further suggest
that the proposed method is more accurate than other SVM-based
methods.
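The abstract does not say how each 16-dimensional amino acid feature set is built. One plausible construction, assumed here purely for illustration, groups the 20 amino acids into 4 classes by a property and counts the frequencies of the 4 × 4 = 16 possible class pairs among consecutive residues. The charge/polarity grouping below, the order-independent sum used for a protein pair, and the majority-vote committee (whose members stand in for trained SVMs, one per feature set) are all hypothetical.

```python
from collections import Counter

# Hypothetical 4-class grouping of the 20 amino acids by charge/polarity.
CHARGE_CLASS = {**{aa: 0 for aa in "KRH"},         # positively charged
                **{aa: 1 for aa in "DE"},          # negatively charged
                **{aa: 2 for aa in "STNQCY"},      # polar, uncharged
                **{aa: 3 for aa in "AVLIMFWGP"}}   # nonpolar


def class_pair_features(seq, classes):
    """16-dim vector: frequency of each (class_i, class_j) pair among
    consecutive residues. Residues missing from `classes` are dropped
    before pairing."""
    labels = [classes[aa] for aa in seq.upper() if aa in classes]
    counts = [0.0] * 16
    for x, y in zip(labels, labels[1:]):
        counts[4 * x + y] += 1
    total = max(len(labels) - 1, 1)
    return [c / total for c in counts]


def pair_features(seq_a, seq_b, classes):
    """Order-independent 16-dim feature for a protein pair (element-wise sum)."""
    fa = class_pair_features(seq_a, classes)
    fb = class_pair_features(seq_b, classes)
    return [x + y for x, y in zip(fa, fb)]


def committee_predict(members, feature_sets):
    """Majority vote: member i scores feature_sets[i] and returns +1
    (interacting) or -1 (non-interacting). Ties resolve to -1."""
    votes = Counter(m(f) for m, f in zip(members, feature_sets))
    return 1 if votes[1] > votes[-1] else -1


if __name__ == "__main__":
    f = pair_features("KDAVE", "RRSTG", CHARGE_CLASS)
    # Stand-ins for three trained SVMs (hydrophobicity, charge, volume).
    members = [lambda x: 1, lambda x: -1, lambda x: 1]
    print(committee_predict(members, [f, f, f]))  # 1
```

In a real pipeline each member would be an SVM trained on its own feature set (with the other protein features appended), and an odd number of members avoids ties in the vote.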
Subjects
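支援向量機 (support vector machine)
蛋白質交互作用 (protein-protein interaction)
同源蛋白質交互作用 (interolog; homologous protein-protein interaction)
疏水性 (hydrophobicity)
帶電性 (charge)
分子體積 (molecular volume)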
support vector machine
SVM
protein interaction
PPI
interolog
hydrophobic
charge
volume
Type
thesis