Comparison of Next Generation Sequencing Simulators

Lin, Shu-Hung

Date Issued

2011

Author(s)

Lin, Shu-Hung

URI

http://ntur.lib.ntu.edu.tw//handle/246246/253617

Abstract

Next-generation sequencing (NGS) technologies can sequence large numbers of bases in a short time, advancing fundamental biological research. By sequencing at low cost, scientists can gain knowledge about genomes, transcriptomes and interactomes. Because NGS generates massive amounts of data, bioinformaticians and statisticians have to find new methods to process and analyze it. Many NGS data simulators have been proposed recently. If a simulator can produce data that are reasonably similar to real data, it can help in selecting adequate analysis methods and in designing experimental workflows. In this thesis, we compared five simulators (ART, FlowSim, MetaSim, SimSeq and wgsim) for simulating Roche 454 and Illumina platform data for the E. coli and rice (Oryza sativa) genomes. The simulated data were compared with publicly available real sequencing data by assembling them and by mapping them to the reference genome. For simulating Roche 454 data, FlowSim took the longest time; the computing times of the other simulators were comparatively shorter. ART generated the data most similar to the real data in terms of assembly and alignment results. For simulating Illumina sequencing data, SimSeq took the most time. For Illumina data from a small genome such as E. coli, all simulators reproduced the real assembly and alignment results well. However, for a larger genome such as rice, all simulators except ART gave overly optimistic estimates of N50 and maximum contig length. In this thesis, we analyzed the data only roughly, through assembly and alignment, which is not enough to judge the pros and cons of the simulators. Further research is therefore needed, and the characteristics of the target genome should be taken into account when selecting a simulator.
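N50 and maximum contig length are the assembly contiguity statistics on which most simulators gave over-optimistic results for rice. As a minimal sketch, assuming the common definition of N50 (the length L such that contigs of length at least L together cover at least half of the total assembled length), both statistics can be computed from a list of contig lengths as follows. The function names and the toy lengths are hypothetical and are not taken from the thesis or from any particular assembler.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty assembly).

    Assumes the common definition: sort contigs longest first and return
    the length of the contig at which the running total first reaches
    half of the total assembled length.
    """
    lengths = sorted((n for n in contig_lengths if n > 0), reverse=True)
    if not lengths:
        return 0
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return lengths[-1]  # not reached for positive lengths


def assembly_summary(contig_lengths):
    """Hypothetical helper returning the two statistics named in the abstract."""
    return {
        "n50": n50(contig_lengths),
        "max_contig": max(contig_lengths, default=0),
    }


# Toy contig lengths (made-up values, not data from the thesis)
toy = [100, 200, 300, 400, 500]
print(assembly_summary(toy))  # {'n50': 400, 'max_contig': 500}
```

In the toy example the total length is 1500, so half is 750; the running total reaches 900 at the 400 bp contig, making 400 the N50. A simulator that produces reads with fewer errors or less bias than a real run would tend to yield longer contigs and hence larger values of both statistics.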

Subjects

Next Generation Sequencing

Simulator

assemble

alignment

Type

thesis

File(s)

Name

ntu-100-R98621202-1.pdf

Size

23.32 KB

Format

Adobe PDF

Checksum

(MD5):4a25645e78e4876835e10552302c4b6a