Utilizing Cox Regression Model to Assess the Relations between Gene Sets and the Prognosis of Lung Adenocarcinoma
Date Issued
2010
Date
2010
Author(s)
Lu, Jo-Yang
Abstract
Lung cancer is the leading cause of cancer-related death worldwide. More than 30% of early-stage lung cancer patients who undergo complete surgical resection still die of relapse. Although many prognostic studies based on whole-genome profiling have been published, the prognostic genes reported by different research groups overlap very little, which makes clinical use infeasible. Moreover, because previous studies did not take the biological meaning and interactions of the prognostic genes into account, their results were difficult to explain biologically and comprehensively.
To overcome this bottleneck, we propose a novel method based on gene set enrichment analysis and the Cox proportional hazards regression model. Predefined gene sets derived from biological pathways and from in vitro and in vivo experiments were used as prognostic targets instead of single genes. The Cox proportional hazards regression model was then applied to assess the relation between survival outcome and gene set enrichment. Integrating predefined gene sets with the Cox model provides not only the power to predict survival outcome, but also a connection between biological functions and prognosis.
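As an illustration of the regression step, the following is a minimal sketch of a univariate Cox fit in which one enrichment score per patient is the only covariate. The abstract does not specify how enrichment scores were computed or how the model was fitted, so the function name `fit_univariate_cox`, the Breslow form of the partial likelihood, and the toy data are all assumptions made for this example.

```python
import math

def fit_univariate_cox(times, events, scores, n_iter=25, tol=1e-9):
    """Fit h(t | x) = h0(t) * exp(beta * x) by Newton-Raphson on the
    Breslow partial likelihood. Returns (beta, standard_error).
    Hypothetical helper written for illustration only."""
    beta, info = 0.0, 0.0
    for _ in range(n_iter):
        grad, info = 0.0, 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored patients contribute only through risk sets
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, scores):
                if t_j >= t_i:  # patient j is still at risk at time t_i
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean = s1 / s0
            grad += scores[i] - mean        # score function
            info += s2 / s0 - mean * mean   # observed information
        if info <= 0.0:
            break
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info) if info > 0.0 else float("inf")
    return beta, se

# Toy data (invented): follow-up time, event flag (1 = death, 0 = censored),
# and an assumed enrichment score of one gene set for each patient.
times = [2, 3, 5, 7, 8, 11, 13, 14]
events = [1, 1, 1, 0, 1, 1, 0, 1]
scores = [1.2, 0.3, 0.9, -0.5, 0.4, -0.8, -1.1, 0.1]

beta, se = fit_univariate_cox(times, events, scores)
wald_z = beta / se  # Wald statistic for H0: beta = 0
print(f"beta={beta:.3f}  se={se:.3f}  z={wald_z:.3f}")
```

A positive `beta` means that higher enrichment of the gene set goes with a higher hazard, that is, shorter survival. In practice an established survival analysis package would normally be used rather than a hand-written fit.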
Our method comprises three algorithms. First, a two-step hypothesis testing procedure was applied to select gene sets associated with survival outcome. Second, the prognostic gene sets were clustered, and a representative gene set was selected from each cluster. Finally, the similarities between the gene expression patterns of the representative gene sets were evaluated with a kernel matrix, and the results were illustrated as gene set association networks.
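The abstract does not state which kernel or similarity measure was used in the third step. As one possible reading, the sketch below builds a linear kernel matrix over patients for each gene set and compares two such matrices by kernel alignment (their normalized Frobenius inner product). The function names, the choice of kernel, and the expression values are assumptions for illustration and are not taken from the thesis.

```python
import math

def linear_kernel(samples):
    """Gram matrix K[a][b] = <sample_a, sample_b>, computed over the
    expression values of the genes in one gene set."""
    return [[sum(u * v for u, v in zip(a, b)) for b in samples]
            for a in samples]

def kernel_alignment(k1, k2):
    """Normalized Frobenius inner product of two kernel matrices."""
    def frob(a, b):
        return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return frob(k1, k2) / math.sqrt(frob(k1, k1) * frob(k2, k2))

# Invented expression values: rows are the same four patients,
# columns are the genes belonging to each (hypothetical) gene set.
set_a = [[1.0, 0.5, -0.2], [0.8, 0.4, 0.0], [-1.1, -0.6, 0.3], [-0.7, -0.2, 0.1]]
set_b = [[0.9, -0.1], [0.7, 0.2], [-1.0, 0.1], [-0.5, -0.3]]

k_a = linear_kernel(set_a)
k_b = linear_kernel(set_b)
similarity = kernel_alignment(k_a, k_b)
print(f"alignment(set_a, set_b) = {similarity:.3f}")
```

Under this reading, pairwise alignments between representative gene sets would form the weighted edges of a gene set association network, with an edge drawn whenever the alignment exceeds a chosen threshold.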
The biological functions associated with the survival outcome of lung adenocarcinoma can be divided into four categories: cell cycle regulation, energy dysregulation, post-translational modification, and the wound healing process. Although all four have been reported in the literature as essential to the progression of lung adenocarcinoma, our data indicate that they function in a coordinated manner.
Beyond clinical prognosis, as demonstrated here, our method can also be applied to assess the relations between gene sets and other types of continuous variables. By combining predefined gene sets with a regression model, the method can reveal both the gene sets associated with the variable under test and the biological interpretation connected with those gene sets.
Subjects
NSCLC
prognosis
survival analysis
functional enrichment
predefined gene set
Type
thesis
File(s)
Name
ntu-99-R97945003-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):cea4a1cc4cf6d00230714b36bbf0068a