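2011-08-01
2024-05-17
https://scholars.lib.ntu.edu.tw/handle/123456789/685471

Summary: A biological phenomenon usually arises from several genes interacting through complex signal transduction pathways, gene networks, or biological functions, so analyzing gene sets with microarray data reflects biological evolutionary processes more faithfully. A gene set may represent a group of functionally related genes, or genes that belong to the same biological annotation category, and is classified mainly according to integrated gene annotation databases covering metabolic pathways, proteome structure, or Gene Ontology. By integrating the biological annotation of these genes with gene expression data, we can examine the association between genes and biological traits in greater depth. Gene Set Enrichment Analysis (GSEA) mainly assesses, in microarray experiments, the significance of expression for groups of genes related to metabolic pathways or cellular regulatory structures. However, few studies have examined whether biologically annotated gene sets classify biological traits and predict clinical outcomes better than arbitrary gene sets. This study integrates gene sets with gene expression data and aims to propose a reasonable and practical statistical method, Gene Set Classification Analysis (GSCA). In this project we will use the area under the ROC curve (AUC) to test whether a gene set has significant classification ability: we find the linear combination of gene expression values within the gene set that maximizes the AUC, propose an AUC-based statistic, and compute its p-value by random permutation as the criterion for judging whether the gene set has significant classification ability. In addition, the optimized linear combination coefficients of each gene set let us quantify the influence of each gene in the set. Each analysis step will be evaluated by computer simulation, and microarray data will be combined with known biological annotation from several large public databases to explore changes in gene set expression, with the aim of helping to explain complex biological evolutionary processes.

Abstract: Gene Set Enrichment Analysis (GSEA) uses the gene expression profiles of functionally related gene sets, drawn from Gene Ontology (GO) categories or a priori defined biological classes, to assess the significance of gene sets associated with clinical outcomes or phenotypes, and it is the most widely used method for gene analysis. However, little attention has been given to the discriminatory power of gene sets. It is therefore of great interest to identify which differential gene sets are strongly associated with the ability to distinguish phenotypic classes, by integrating gene expression data with prior biological knowledge. In this study, we plan to propose two non-parametric methods to identify differential gene sets using the area under the receiver operating characteristic (ROC) curve (AUC) of linear risk scores of gene sets, which are obtained through a maximization algorithm within gene sets. The p-values of AUC-based statistics and the AUC values obtained from cross-validation of the linear risk scores are calculated and used as indexes to identify differential gene sets. The discriminatory power of each gene set is summarized, and gene sets that possess discriminatory power are selected via a prescribed p-value threshold or a predefined cross-validation AUC threshold.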
Moreover, we further distinguish the impacts of individual genes within each gene set on discriminatory power, based on the absolute values of the linear combination coefficients. The proposed methods allow investigators to identify enriched gene sets with high discriminatory power and to discover the contributions of genes within a gene set via the corresponding linear combination coefficients. The performance of the proposed approach will be evaluated and compared with other methods through extensive numerical studies, including artificial datasets and several public gene expression datasets. We postulate that our proposed methods have the potential to detect enrichment and to provide an insightful alternative to gene set testing.

Keywords: gene set; gene ontology (GO); Gene Set Enrichment Analysis (GSEA); Gene Set Classification Analysis (GSCA); discriminatory power; the receiver operating characteristic (ROC); the area under the ROC curve (AUC); cross-validation.

Title: Gene expression classification analysis based on biologically annotated gene sets
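
The AUC-based permutation test outlined in the abstract can be sketched roughly as follows. This is a minimal illustration, not the project's method: the abstract mentions an AUC-maximizing algorithm without specifying it, so the weights here come from a simple stand-in (the per-gene difference in class means). All function names, the number of permutations, and the toy data are assumptions made for illustration.

```python
import random


def auc(scores, labels):
    """Empirical AUC: fraction of (case, control) pairs where the case
    scores higher, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_difference_weights(X, labels):
    """ASSUMPTION: stand-in for the unspecified AUC-maximizing step.
    Each gene's weight is its mean in class 1 minus its mean in class 0."""
    pos = [row for row, y in zip(X, labels) if y == 1]
    neg = [row for row, y in zip(X, labels) if y == 0]
    return [
        sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
        for j in range(len(X[0]))
    ]


def risk_scores(X, weights):
    """Linear risk score per sample: weighted sum of its gene expressions."""
    return [sum(w * x for w, x in zip(weights, row)) for row in X]


def gene_set_auc(X, labels):
    """Fit weights on the gene set, then return (AUC of risk scores, weights)."""
    weights = mean_difference_weights(X, labels)
    return auc(risk_scores(X, weights), labels), weights


def permutation_p_value(X, labels, n_perm=200, seed=0):
    """Shuffle class labels, refit the weights each time, and report how
    often the permuted AUC reaches the observed AUC."""
    rng = random.Random(seed)
    observed, weights = gene_set_auc(X, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        exceed += gene_set_auc(X, shuffled)[0] >= observed
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1), weights


if __name__ == "__main__":
    # Toy gene set: rows are samples, columns are two hypothetical genes.
    X = [[2.0, 0.1], [1.8, 0.0], [2.2, 0.2], [0.1, 0.1], [0.0, 0.2], [0.3, 0.0]]
    y = [1, 1, 1, 0, 0, 0]
    observed, p_value, weights = permutation_p_value(X, y)
    print(observed, p_value, [abs(w) for w in weights])
```

Refitting the weights inside every permutation keeps the null distribution comparable to the observed statistic, since the observed AUC is itself computed on weights fitted to the same data. The absolute values of the weights give a rough ranking of each gene's contribution, in the spirit of the coefficient-based comparison described in the abstract.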