A Study of Hybrid Protein Sequence Clustering Algorithms

Basic Information

Project title

A Study of Hybrid Protein Sequence Clustering Algorithms

Project Number

93-2218-E-002-149

Translated Name (Chinese project title)

複合式蛋白質序列分群演算法之研究

Project Principal Investigator

CHIEN-YU CHEN

Funding Organization

National Science and Technology Council

Start date

01-10-2004

Expected Completion

30-09-2005

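Summary: This project studies hybrid protein clustering algorithms. The goal is to develop a hierarchical clustering algorithm based on statistical models, so that the resulting protein hierarchy has different properties at different heights. In the principal investigator's recent work, statistical models were successfully combined with hierarchical clustering. The resulting protein clustering algorithm not only has lower complexity than traditional hierarchical clustering algorithms, but also uses the statistical models to reduce the number of nodes in the original binary hierarchy, giving biologists more precise and practical clustering information.

Hierarchical protein clustering algorithms still face several difficulties that must be overcome before the accuracy of the protein hierarchy can be improved. This project continues the principal investigator's earlier research. It first addresses the limitation that traditional hierarchical clustering algorithms do not allow the same protein to occupy more than one position in the hierarchy. It then designs appropriate control mechanisms, tailored to the characteristics of protein families, so that the resulting hierarchy meets the needs of protein families of different sizes at different heights.

This is a one-year project. The first half of the year focuses on how to use the distribution curve of similarities between a given protein and all other proteins to identify the set of proteins that need to be duplicated at the bottom level of the hierarchy. The second half studies how to combine the previously developed statistical tests so that different clustering criteria are used at each stage of clustering, in order to achieve the project's goals.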

Abstract: The objective of this project is to study hybrid hierarchical protein sequence clustering algorithms. The project aims to provide biologists with a protein hierarchy whose different levels match protein families of different sizes. In our recent study, we successfully employed statistical models to improve the efficiency of traditional hierarchical clustering algorithms for protein family analysis. The proposed statistical-model-based algorithm also provides users with a summarized hierarchy that is much smaller than the original binary tree generated by traditional hierarchical clustering algorithms.

Several challenges remain for protein sequence clustering. In this project, we will continue our recent work and design a hybrid hierarchical clustering algorithm based on statistical models. To meet the demands of protein family analysis, the first problem to tackle is that some multi-function proteins should be placed at more than one position in the protein hierarchy. Second, protein families of different sizes have different properties: smaller families call for homogeneity, while larger families need to exploit transitivity in order to detect remote homology. The hierarchical clustering algorithm should therefore combine different criteria for controlling the formation of new clusters.
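One way to picture the contrast between homogeneity and transitivity is a two-stage agglomeration: a strict stage that merges clusters only when every cross pair is similar (complete linkage), followed by a looser stage in which a single strong link is enough (single linkage). The Python sketch below is an illustration under stated assumptions, not the project's algorithm: the function names, the thresholds `strict` and `loose`, and the toy similarity matrix `SIM` are all hypothetical, and the project's statistical tests are replaced here by plain thresholds.

```python
from itertools import combinations


def agglomerate(clusters, sim, link, threshold):
    """Merge the best-scoring pair of clusters until no pair reaches threshold.

    link is min (complete linkage) or max (single linkage), applied to the
    similarities of all cross pairs between two clusters.
    """
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        score, a, b = max(
            (link(sim[i][j] for i in clusters[x] for j in clusters[y]), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        if score < threshold:
            break
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters


def hybrid_cluster(sim, strict=0.8, loose=0.4):
    """Two-stage clustering; thresholds are illustrative assumptions."""
    singletons = [[i] for i in range(len(sim))]
    # Stage 1: complete linkage keeps small families homogeneous.
    families = agglomerate(singletons, sim, min, strict)
    # Stage 2: single linkage lets larger groups form through chains of
    # similar members (transitivity).
    superfamilies = agglomerate(families, sim, max, loose)
    return families, superfamilies


# Hypothetical pairwise similarities for five sequences (symmetric matrix).
SIM = [
    [1.00, 0.90, 0.10, 0.10, 0.05],
    [0.90, 1.00, 0.50, 0.20, 0.05],
    [0.10, 0.50, 1.00, 0.85, 0.05],
    [0.10, 0.20, 0.85, 1.00, 0.05],
    [0.05, 0.05, 0.05, 0.05, 1.00],
]

families, superfamilies = hybrid_cluster(SIM)
print(sorted(families))
print(sorted(superfamilies))
```

In this toy data, sequences 1 and 2 are only moderately similar, so the strict stage keeps their families apart while the looser stage can still join them through that single link.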

The duration of this project is one year. In the first half of the year, we plan to identify the proteins that should be duplicated at the bottom level of the hierarchy by examining the distribution of similarities between a particular protein and all other proteins. In the second half, different control criteria will be designed and applied at different stages of the clustering process to generate a hierarchy that better matches the protein families.
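The first-half plan can be illustrated with a simple heuristic, offered only as a sketch under assumptions since the abstract does not specify the actual test. For each protein, its similarities to all other proteins are sorted, the profile is cut at its largest gap, and the proteins above the cut are treated as its close relatives. If those relatives fall into two or more groups that are not similar to each other, the protein is flagged as a candidate for duplication. The names `strong_neighbors`, `linked_groups`, and `duplication_candidates`, the parameters `floor` and `link`, and the matrix `SIM` are hypothetical.

```python
def strong_neighbors(sim, p, floor=0.3):
    """Cut p's sorted similarity profile at its largest gap; keep the high side."""
    ranked = sorted(((s, q) for q, s in enumerate(sim[p]) if q != p), reverse=True)
    if len(ranked) < 2:
        return [q for s, q in ranked if s >= floor]
    gaps = [ranked[k][0] - ranked[k + 1][0] for k in range(len(ranked) - 1)]
    cut = gaps.index(max(gaps))
    # The floor (an assumed parameter) discards neighbors that are weak in
    # absolute terms, e.g. for a protein unrelated to everything else.
    return [q for s, q in ranked[: cut + 1] if s >= floor]


def linked_groups(members, sim, link):
    """Connected components of members under the relation similarity >= link."""
    groups = []
    for m in members:
        touching = [g for g in groups if any(sim[m][x] >= link for x in g)]
        rest = [g for g in groups if g not in touching]
        groups = rest + [sorted([m] + [x for g in touching for x in g])]
    return sorted(groups)


def duplication_candidates(sim, floor=0.3, link=0.4):
    """Flag proteins whose close relatives split into mutually dissimilar groups."""
    flagged = []
    for p in range(len(sim)):
        groups = linked_groups(strong_neighbors(sim, p, floor), sim, link)
        if len(groups) >= 2:
            flagged.append((p, groups))
    return flagged


# Hypothetical data: protein 2 resembles both family {0, 1} and family {3, 4},
# which are dissimilar to each other; protein 5 is unrelated to all others.
SIM = [
    [1.00, 0.90, 0.70, 0.10, 0.10, 0.05],
    [0.90, 1.00, 0.70, 0.10, 0.10, 0.05],
    [0.70, 0.70, 1.00, 0.60, 0.60, 0.05],
    [0.10, 0.10, 0.60, 1.00, 0.90, 0.05],
    [0.10, 0.10, 0.60, 0.90, 1.00, 0.05],
    [0.05, 0.05, 0.05, 0.05, 0.05, 1.00],
]

candidates = duplication_candidates(SIM)
print(candidates)
```

A flagged protein could then be placed once under each of its groups at the bottom level of the hierarchy, which is the duplication the plan refers to.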

Keyword(s)

protein sequence
clustering algorithm
protein sequence clustering
