Gene Selection and Regulatory Network Construction with Partial Coefficient of Intrinsic Dependence
Date Issued
2015
Author(s)
Hsiao, Ya-Chun
Abstract
The coefficient of intrinsic dependence (CID) can determine associations among variables without making distributional or functional assumptions about the random variables. The CID value of a target variable increases as more predictor variables are included. Consequently, the CID of a target variable given multiple predictors can appear significant simply because the most relevant predictor is among them, even when the other predictors are only weakly associated with the target. In this study, we developed the partial coefficient of intrinsic dependence (pCID) to facilitate the step-by-step selection of variables that are relevant to a target variable. We then applied the pCID method to stepwise variable selection and to the construction of gene regulatory networks. In stepwise variable selection, selecting relevant variables with the CID together with the pCID eliminates interference from other relevant variables. Simulation results showed that the proposed method is more sensitive to curvilinearity and more specific to linearity than the combination of Pearson's correlation coefficient and the partial correlation coefficient (PCC/pPCC). This property may make it possible to index different levels of curvilinearity according to CID/pCID outcomes. When applied to publicly available microarray data, the CID/pCID procedure successfully identified cold-responsive genes related to three C-repeat binding factors and was especially effective at identifying some sample-specific gene-gene interactions. The proposed strategy may therefore be useful in meta-analysis for distinguishing general forms of relationships from noise. For gene regulatory network construction, the CID/pCID strategy selects target nodes step by step and determines the corresponding source nodes while eliminating the influence of other relevant nodes.
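The abstract does not state the CID or pCID formulas, so the following is only a minimal Python sketch under explicit assumptions. Variables are discretized into equal-frequency bins (the `discretize` helper is hypothetical). CID is assumed to take a conditional-CDF form, sum over x of P(x) times sum over y of (F(y|x) - F(y))^2, divided by sum over y of F(y)(1 - F(y)), with x ranging over the joint categories of all predictors; under this form adding predictors can only refine the partition, which matches the increasing behaviour described above. The `pcid` function is a hypothetical definition by analogy with the partial coefficient of determination, (CID(Y|S plus X) - CID(Y|S)) / (1 - CID(Y|S)). The `threshold` stopping rule is likewise an assumption, not whatever significance criterion the thesis may use.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def discretize(values: Sequence[float], k: int = 3) -> List[int]:
    """Equal-frequency binning into k ordered categories (assumed preprocessing)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def _cdf(labels: Sequence[int], levels: Sequence[int]) -> List[float]:
    """Empirical CDF of discrete labels evaluated at each ordered level."""
    counts = Counter(labels)
    out, running = [], 0
    for level in levels:
        running += counts.get(level, 0)
        out.append(running / len(labels))
    return out


def cid(y: Sequence[int], xs: Sequence[Sequence[int]]) -> float:
    """Assumed CID(Y | X_1..X_m) on discretized data:
    sum_x P(x) sum_y (F(y|x) - F(y))^2 / sum_y F(y)(1 - F(y)),
    where x ranges over the joint categories of all predictors in xs."""
    levels = sorted(set(y))
    f = _cdf(y, levels)
    denom = sum(p * (1 - p) for p in f)
    if denom == 0:  # constant target: nothing to explain
        return 0.0
    groups: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i, yi in enumerate(y):
        groups[tuple(x[i] for x in xs)].append(yi)  # joint predictor category
    num = 0.0
    for members in groups.values():
        fx = _cdf(members, levels)
        num += len(members) / len(y) * sum((a - b) ** 2 for a, b in zip(fx, f))
    return num / denom


def pcid(y: Sequence[int], candidate: Sequence[int],
         selected: Sequence[Sequence[int]]) -> float:
    """Hypothetical pCID: share of the dependence not yet explained by
    `selected` that is gained by adding `candidate` (partial-R^2 analogy)."""
    base = cid(y, list(selected))
    if 1.0 - base < 1e-12:  # target already fully determined
        return 0.0
    return (cid(y, list(selected) + [candidate]) - base) / (1.0 - base)


def stepwise_select(y: Sequence[int], predictors: Dict[str, Sequence[int]],
                    threshold: float = 0.1,
                    max_vars: Optional[int] = None) -> List[str]:
    """Forward selection: the first pick is by CID (pCID given nothing),
    later picks by pCID given the variables already chosen."""
    chosen: List[str] = []
    remaining = dict(predictors)
    while remaining and (max_vars is None or len(chosen) < max_vars):
        given = [predictors[name] for name in chosen]
        scores = {name: pcid(y, x, given) for name, x in remaining.items()}
        best = max(scores, key=scores.get)
        if scores[best] < threshold:  # hypothetical stopping rule
            break
        chosen.append(best)
        del remaining[best]
    return chosen
```

On the toy U-shaped data y = [1, 1, 0, 0, 1, 1] with x1 = [0, 0, 1, 1, 2, 2], Pearson's correlation is exactly zero, yet `cid(y, [x1])` is 1 (up to rounding) because y is a function of x1. Adding an unrelated x2 = [0, 1, 0, 1, 0, 1] leaves nothing to explain, so `stepwise_select` returns `["x1"]`.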
Because CID/pCID values are asymmetric, we used this property to determine the direction of the relationship between two nodes. A pseudo network was used to evaluate the performance of the heuristic CID/pCID approach over one hundred replications with different sample sizes; the accuracy of the reconstructed pseudo network increased with sample size. The proposed approach was further applied to two microarray datasets. The first concerned the known cold signaling pathway in Arabidopsis, in which C-repeat binding factors induce a set of cold-regulated (COR) genes; the CID/pCID approach successfully recovered the connections between C-repeat binding factors and COR genes. The second concerned the basic helix-loop-helix (bHLH) gene family in rice, whose network has not yet been characterized biologically. We constructed this network from the CID/pCID outcomes to provide suggestions for biologists. In summary, the CID/pCID method efficiently identified relevant variables with various types of association, and the asymmetric CID/pCID values were used to infer the direction between two variables from a statistical viewpoint. The statistical outcomes of variable selection and gene regulatory network construction based on the CID/pCID method can therefore serve as references for biologists before they conduct experiments on plants.
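The direction rule described above can be sketched by comparing CID(B|A) with CID(A|B) and orienting the edge from the variable that predicts the other better. This is an assumption about how the asymmetry is used, not the thesis's exact rule. CID is again assumed to take the conditional-CDF form, sum over x of P(x) times sum over y of (F(y|x) - F(y))^2, divided by sum over y of F(y)(1 - F(y)), on discretized data. The `margin` parameter and the `edge_accuracy` score (fraction of ordered node pairs correctly classified as edge or non-edge, one possible way to score a reconstructed pseudo network) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Set, Tuple


def _cdf(labels: Sequence[int], levels: Sequence[int]) -> List[float]:
    """Empirical CDF of discrete labels evaluated at each ordered level."""
    counts = Counter(labels)
    out, running = [], 0
    for level in levels:
        running += counts.get(level, 0)
        out.append(running / len(labels))
    return out


def cid(y: Sequence[int], x: Sequence[int]) -> float:
    """Assumed CID(Y | X) for discretized data:
    sum_x P(x) sum_y (F(y|x) - F(y))^2 / sum_y F(y)(1 - F(y))."""
    levels = sorted(set(y))
    f = _cdf(y, levels)
    denom = sum(p * (1 - p) for p in f)
    if denom == 0:  # constant target: nothing to explain
        return 0.0
    groups: Dict[int, List[int]] = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    num = sum(len(m) / len(y) * sum((a - b) ** 2 for a, b in zip(_cdf(m, levels), f))
              for m in groups.values())
    return num / denom


def orient_edge(a: str, b: str, data: Dict[str, Sequence[int]],
                margin: float = 0.0) -> Optional[Tuple[str, str]]:
    """Orient a -> b if a predicts b better than b predicts a (hypothetical rule)."""
    forward = cid(data[b], data[a])   # CID(b | a)
    backward = cid(data[a], data[b])  # CID(a | b)
    if forward - backward > margin:
        return (a, b)
    if backward - forward > margin:
        return (b, a)
    return None  # asymmetry too small to decide


def edge_accuracy(predicted: Set[Tuple[str, str]], truth: Set[Tuple[str, str]],
                  nodes: Sequence[str]) -> float:
    """Share of ordered node pairs correctly labelled as edge / non-edge."""
    pairs = [(u, v) for u in nodes for v in nodes if u != v]
    correct = sum((p in predicted) == (p in truth) for p in pairs)
    return correct / len(pairs)
```

On the toy data x = [0, 0, 1, 1, 2, 2], y = [1, 1, 0, 0, 1, 1], CID(y|x) is 1 while CID(x|y) is 0.25, so the sketch orients the edge x to y. A `margin` above zero leaves weakly asymmetric pairs unoriented instead of forcing a direction.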
Subjects
Coefficient of intrinsic dependence
Partial coefficient of intrinsic dependence
Stepwise variable selection
Gene regulatory network
Type
thesis
File(s)
Name
ntu-104-D96621204-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):eb38dce597825be95c4d09d06189f430