Statistical Methods for Microarray Data (微陣列數據資料相關之統計方法)

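Author: 張啟仁
Record date: 2018-07-06
Year: 2002
URL: http://ntur.lib.ntu.edu.tw//handle/246246/22601

Among research groups at Taiwan's major teaching hospitals, advances in large bioinformatics databases mean that large genomic medicine data sets can now be readily obtained, incorporated, shared, and rapidly analyzed. In research-oriented teaching hospitals such as National Taiwan University Hospital (NTUH), many investigators, besides being able to quickly obtain and organize data from their own clinical experience, have mostly committed themselves to gene-related research, and considerable manpower and budget have gone into gene microarray experiments in particular. NTUH has established a well-functioning "research services" team that includes molecular biologists from the microarray laboratory and biostatisticians. Through this research service mechanism, new biomedical findings can spread quickly among all investigators, so many time-consuming and costly experiments need not be repeated. The bioinformatics and biostatistics data-processing research service also provides the data analysis work that has arisen with the generation of microarray data. Because the volume of these data is so large, the demand for data organization, error correction, archiving, and analysis has grown accordingly. Because the processing involved in microarray technology is quite complex, many errors are naturally introduced while the data are being generated. If the reliability and correctness of the data are not effectively controlled and examined during analysis, the resulting statistical findings will not be meaningful, so the importance of data filtering rises accordingly. We therefore propose applying the ACE (Alternating Conditional Expectation Transformation) statistical method to data filtering and compare it with other current methods.

Genomic medicine research in Taiwan has only just begun. Modern laboratory tools, advances in computer technology, and the ubiquity of the internet offer scientists an unprecedented opportunity to access, share, and analyze critical data and information stored in online databases. Scientific discovery can be expedited, and many wasteful and costly experiments avoided, if this vast body of information can be stored, shared, analyzed, and opened to research scientists in any clinical research institute. One successful example of collaboration among physicians, laboratory scientists, and biostatisticians has been established within the National Taiwan University Hospital Research Group, whose members have pursued their own research topics in collaboration with scientists in the Microarray laboratory and the Bioinformatics and Biostatistics laboratory. Recent developments in microarray technology have brought research scientists into a new era in their own fields. By collaborating with the scientists who generate microarray data, investigators can view research problems from a much broader perspective, that of the large data sets produced by array machines.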
However, because the technique is growing fast and its data output is astonishing, analysis of these new bioinformatics data has become an important issue in biomedical research. Understanding what bioinformatics can provide, and how, has given clinical physicians in the medical center a starting point for reconsidering a bioinformatics facility as a research service resource. A Microarray facility center has been established under the direction of Dr. Jeremy Chen, and it rapidly provided research services and education in the use of microarray machines and techniques within NTUH. Meanwhile, statistical data analysis support for microarray data has also been provided on the NTUH research campus. An NSC-supported grant, “Bioinformatics Research Services Facility Using Microarray Data (NSC89-2316-B-002-035)”, gave us a good starting point for setting up connections among microarray data analysts, microarray data generators, and research investigators. In addition to that previous year's support from the NSC, another NSC research project, “Statistical Mining Methods for Information Generated from Microarray (NSC90-2321-B002-001)”, was awarded; it deals with statistical methods and emphasizes developing a data filtering method. Huge data sets are easily generated by the high-speed machines, so data collection, cleaning, selection, and database management are strongly needed. At present, however, such problems can be tackled only with the limited programs attached to array management software such as Genecluster from MIT or Spotfire. Even with this good starting point for statistical support of microarray research data, potential problems remain in handling the data, from the early data management stage to the later statistical modeling and analysis stage.
Issues of interest in dealing with microarray data include 1) data filtering; 2) cluster analysis; and 3) discriminant classification analysis. Our goal is to study and develop practical, advanced statistical data mining methods for microarray data, especially when the data involve considerable uncertainty, which makes data filtering the major issue in this research proposal. Formal statistical consideration of the validity of huge data sets has so far been left to the automated software run by individual PIs and their assistants. Our experience is that data generated by microarray machines carry their own uncertainty. How to handle this error and how to correct the data is an important issue that has been discussed by many experts, such as Lee et al. and DeRisi et al. In a study that aims to identify possible clusters of genes, one needs to eliminate the falsely expressed genes. This can be done by including both cell populations, such as normal and abnormal, in the experiment and applying a two-stage procedure. First, normal vs. normal expression is used to detect which genes are sensitive to the noise or bias of the experiment, thereby identifying the “false expressed” genes. Second, after these “false expressed” genes are eliminated, one can plot normal vs. tumor expression to identify the influential genes. We propose using ACE (Alternating Conditional Expectation Transformation) to tackle the abnormality of the data generated from microarrays. Some discussion of this method is presented in this report. This report follows the report-writing guidelines for grants supported by the National Science Council.

Keywords: Research Services; Microarray; Biostatistics; Bioinformatics; ACE
Affiliation: 國立台灣大學醫學院臨床醫學研究所 (Graduate Institute of Clinical Medicine, College of Medicine, National Taiwan University)
Language: zh-TW
Type: journal article
Full text (PDF): http://ntur.lib.ntu.edu.tw/bitstream/246246/22601/1/902321B002001.pdf
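
The two-stage filtering procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the report's ACE-based method. It assumes a simple log2-ratio threshold as a stand-in for both stages, and the function name `two_stage_filter`, the cutoff values, and the intensity data are all hypothetical, chosen only to show the order of the steps.

```python
import numpy as np

def two_stage_filter(normal_a, normal_b, tumor, noise_cutoff=1.0, effect_cutoff=1.0):
    """Illustrative two-stage gene filter (hypothetical thresholds, not ACE).

    Each argument is a 1-D sequence of positive expression intensities,
    one value per gene, from two normal samples and one tumor sample.
    """
    normal_a = np.asarray(normal_a, dtype=float)
    normal_b = np.asarray(normal_b, dtype=float)
    tumor = np.asarray(tumor, dtype=float)

    # Stage 1: normal vs. normal. Both samples should agree, so a large
    # log-ratio suggests the gene is sensitive to experimental noise or bias.
    self_log_ratio = np.log2(normal_a / normal_b)
    falsely_expressed = np.abs(self_log_ratio) > noise_cutoff

    # Stage 2: normal vs. tumor, considered only for genes that passed stage 1.
    normal_mean = (normal_a + normal_b) / 2.0
    tumor_log_ratio = np.log2(tumor / normal_mean)
    influential = ~falsely_expressed & (np.abs(tumor_log_ratio) > effect_cutoff)

    return falsely_expressed, influential

# Made-up intensities for five genes (assumption: for illustration only).
normal_a = [100, 200, 50, 400, 80]
normal_b = [110, 190, 400, 410, 75]
tumor = [105, 900, 60, 100, 600]

flagged, influential = two_stage_filter(normal_a, normal_b, tumor)
print(np.flatnonzero(flagged).tolist())      # [2]
print(np.flatnonzero(influential).tolist())  # [1, 3, 4]
```

Gene 2 disagrees strongly between the two normal samples, so it is dropped before the tumor comparison even though its tumor value also differs from the normal mean. In the report's proposal, ACE would take the place of the fixed thresholds assumed here.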