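National Taiwan University: Graduate Institute of Engineering Science and Ocean Engineering
黃乾綱
黃駿逸
Huang, Chun-Yi
2013-03-27
2018-06-28
2010
http://ntur.lib.ntu.edu.tw//handle/246246/252491

MicroRNAs (miRNAs) are very short non-coding RNAs, about 21–23 nucleotides long, that strongly influence gene regulation and organismal development. In recent years, researchers have often used ab initio methods and machine learning techniques to predict microRNA precursors and have reported good performance figures; these methods are attractive because they can make predictions without sequence evolutionary information. These earlier prediction algorithms were all evaluated on particular sets of sequences, however, and it is not known whether they perform consistently in an actual whole-chromosome scan.

In this thesis, we benchmark prediction algorithms on whole-chromosome scanning, aiming to determine their performance on this task and to find a predictor suited to it. We propose a systematic sampling method that draws a small sample from the population such that the distributions of minimum free energy and number of paired bases in the sample resemble those of the population; a predictor's performance on the population can then be inferred from its performance on the small sample. We use this systematic sampling method to build a negative dataset in order to estimate how the various prediction algorithms perform in an actual whole-chromosome scan.

Finally, we benchmarked five prediction algorithms. For microRNA discovery by whole-chromosome scanning, Triplet-SVM has the best specificity on multi-loop structures, MiPred is better suited to single-loop sequences with higher minimum free energy, and miR-KDE is suited to single-loop sequences with lower minimum free energy.

MicroRNAs (miRNAs) are short non-coding RNAs (~21–23 nucleotides) that participate in post-transcriptional regulation of gene expression. There have been many efforts to discover miRNA precursors (pre-miRNAs) over the years. Recently, ab initio approaches have received more attention than comparative approaches because they do not require sequence alignment and can discover species-specific pre-miRNAs. Because systematically identifying miRNAs in a genome with existing experimental techniques is difficult, computational methods are a key factor in miRNA discovery. However, the success of ab initio approaches has not been well evaluated or extended to genome-wide miRNA discovery. In this study, a systematic analysis is performed to determine the theoretical sampling rate that makes the evaluation statistically significant. Furthermore, we propose an approach to reduce the negative set, and successfully generate a compact set that is smaller than the theoretical size yet still yields accurate performance evaluation. Considering that several negative sets are already in wide use, this study also proposes a mathematical model that can estimate realistic performance from results obtained with biased datasets.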
Finally, five pre-miRNA predictors are re-evaluated on the proposed benchmarks. The experimental results show that the proposed benchmarks can help researchers understand and compare the realistic performance of alternative methods.

2203707 bytes
application/pdf
en-US
microRNA; chromosome scan; systematic sampling; prediction algorithm; performance benchmark
microRNA; whole genome scan; systematic sampling; predictor; performance benchmark
微型核醣核酸預測工具對基因體實際預測效能之研究 (A Study of the Actual Genome-Wide Prediction Performance of MicroRNA Prediction Tools)
Towards Realistic Benchmarks for MicroRNA Precursor Discovery Algorithms
thesis
http://ntur.lib.ntu.edu.tw/bitstream/246246/252491/1/ntu-99-R97525076-1.pdf
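
The systematic sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual procedure. The record layout `(id, MFE, paired bases)`, the function names, and the choice to sort candidates by minimum free energy and then by paired-base count before taking evenly spaced ranks are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# Hypothetical record layout (an assumption for this sketch):
# (sequence_id, minimum_free_energy, number_of_paired_bases)
Candidate = Tuple[str, float, int]


def systematic_sample(population: Sequence[Candidate], sample_size: int) -> List[Candidate]:
    """Pick evenly spaced candidates after ordering by (MFE, paired bases).

    Ordering first and then stepping through the ranks at a fixed interval
    keeps the sample's MFE distribution close to the population's.
    """
    if not 0 < sample_size <= len(population):
        raise ValueError("sample_size must be between 1 and len(population)")
    ordered = sorted(population, key=lambda c: (c[1], c[2]))
    step = len(ordered) / sample_size
    # Take the midpoint rank of each of the sample_size equal-width intervals.
    return [ordered[int((i + 0.5) * step)] for i in range(sample_size)]


def mean_mfe(candidates: Sequence[Candidate]) -> float:
    """Average minimum free energy of a set of candidates."""
    return sum(c[1] for c in candidates) / len(candidates)


# Toy population with made-up values, used only to exercise the functions.
population = [(f"seq{i}", -0.1 * i, i % 40) for i in range(1000)]
sample = systematic_sample(population, 50)
print(mean_mfe(population), mean_mfe(sample))
```

In this toy run the sample is one twentieth of the population, and comparing the two printed means gives a quick check that the reduced negative set still resembles the full one on minimum free energy. A real benchmark would compare the full distributions of both features rather than a single mean.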