https://scholars.lib.ntu.edu.tw/handle/123456789/185004
Title: | The Effect of Sequencing Coverage on Mining Simple Sequence Repeats by Simulation |
Authors: | 王瀅翠 Wang, Ying-Tsui |
Keywords: | SSR mining; low coverage sequencing; simulation; 454sim; next generation sequencing technology |
Issue Date: | 2012 |
Abstract: | Microsatellites, also called simple sequence repeats (SSRs), are tandem repeats of 1 to 6 nucleotide motifs distributed across the genomes of all species. Because of their genomic abundance and high level of polymorphism, SSRs are often developed as molecular markers for use in a wide range of studies. In recent years, rapidly developing next generation sequencing technology (NGST) has changed the way SSRs are mined. Its high throughput and relatively low cost also give researchers the opportunity to discover novel SSRs. However, in a pilot study the budget is often limited, and one can usually afford only a low-coverage sequencing project for the genome of interest. This constraint becomes more severe as genome size increases. In this study, we used simulation to investigate the relationship between the number of mined SSRs and the sequencing depth, at low coverage, for a genome whose sequence is not yet available. The simulation had two steps. First, we used the whole rice genome to establish three databases. Second, we assembled a simulated genome for the species of interest by recombining rice subsequences from the three databases, so that the simulated genome has a DNA complexity similar to that of the original species. We then used the simulator 454sim to mimic 454 sequencing results at different coverages and mined SSRs from the simulated reads. The results showed that the number of mined SSRs increased as the sequencing depth increased. More importantly, this procedure provides a means to estimate the number of mined SSRs without a whole genome sequence, and hence helps in setting a budget in advance. |
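The relationship described in the abstract, more mined SSRs at higher sequencing depth, can be illustrated with a minimal sketch. It is not the thesis pipeline: it does not use 454sim or the rice-derived databases, and instead samples error-free, fixed-length reads uniformly from a toy genome. The minimum repeat thresholds, the 400 bp read length, the toy genome, and the function names `find_ssrs` and `recovered_ssr_count` are all assumptions made for illustration.

```python
import random
import re

# Hypothetical minimum repeat counts per motif length (1-6 nt); not taken from the thesis.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif) tuples for tandem repeats of 1-6 nt motifs."""
    hits = []
    for size, n in min_repeats.items():
        # One motif of `size` nt followed by at least n - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ACAC" is "AC" twice).
            if any(size % d == 0 and motif == motif[:d] * (size // d)
                   for d in range(1, size)):
                continue
            hits.append((m.start(), m.end(), motif))
    return sorted(hits)


def recovered_ssr_count(genome, coverage, read_len=400, seed=0):
    """Count SSR loci lying entirely inside at least one simulated read.

    Assumption: reads are error-free, of fixed length, and uniformly placed.
    This is a stand-in for a 454 read simulator, not a model of it.
    """
    rng = random.Random(seed)
    loci = find_ssrs(genome)
    n_reads = int(coverage * len(genome) / read_len)
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    found = {
        locus
        for locus in loci
        for s in starts
        if s <= locus[0] and locus[1] <= s + read_len
    }
    return len(found)


# Toy genome: random background with an SSR implanted after every 500 nt.
_rng = random.Random(1)
genome = "".join(
    "".join(_rng.choice("ACGT") for _ in range(500))
    + _rng.choice(["AC", "AGT", "GATA"]) * 8
    for _ in range(40)
)

for cov in (0.5, 1, 2, 4):
    print(cov, recovered_ssr_count(genome, cov))
```

Because the same seed is reused, the reads drawn at a lower coverage are a subset of those drawn at a higher one, so the recovered count in this sketch cannot decrease as coverage grows. Real 454 reads vary in length and contain homopolymer errors, which this sketch ignores.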
URI: | http://ntur.lib.ntu.edu.tw//handle/246246/253586 |
Appears in Collections: | Department of Agronomy |
File | Description | Size | Format |
---|---|---|---|
ntu-101-R99621205-1.pdf | | 23.32 kB | Adobe PDF |