Development of a Fast Algorithm for Pathogen Identification through RNA-seq

Wu, Chin-Ting

Date Issued

2015

Date

2015

Author(s)

Wu, Chin-Ting

URI

http://ntur.lib.ntu.edu.tw//handle/246246/272797

Abstract

Diagnosing viral, bacterial, or fungal infections in the early stages of infectious disease is an important issue in clinical research. Besides identifying strains or viruses through traditional, labor-intensive in vitro experiments, researchers have developed in silico methods for pathogen identification, made possible by advances in next-generation sequencing, and research groups around the world have proposed several such methods. However, these in silico methods remain time-consuming and compute-intensive, which creates practical obstacles to their use. To address these issues, we developed an accurate and efficient algorithm that identifies pathogens from RNA-seq data in four steps. First, sequencing reads were aligned to the human reference genome, and reads that could not be aligned were retained for subsequent analysis. Second, the retained reads were assembled into pathogen contigs based on the overlapping (repeated) regions they share. Third, a statistical model was applied to the putative transcript contigs to remove spurious contigs arising from random assembly. Finally, BLAST was applied to the contigs that passed the statistical test to identify the species and strains of the pathogens. To evaluate performance, we used both simulated data and real data sets containing samples with pathogen infections. Results on both the simulated and the real data show that our algorithm achieves high sensitivity and accuracy. We also compared our method with three other methods and found that our algorithm is more effective. Furthermore, we applied our method to cervical cancer, lung adenocarcinoma, and colorectal cancer data sets to identify possible pathogens associated with these three cancers. In summary, our method accurately and effectively detects pathogens in RNA-seq data from patient samples.
Moreover, its efficiency and short running time make it practical to analyze large data sets in pathogen studies.
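The four steps described in the abstract can be illustrated with a toy sketch. This is not the thesis implementation, and every substitution is an assumption: exact k-mer matching against the host sequence stands in for read alignment, greedy suffix-prefix merging stands in for the assembler, and a minimum read-support cutoff stands in for the statistical model, whose details are not given here. The function names (`subtract_host`, `greedy_assemble`, `pipeline`, `to_fasta`) and parameters (`k`, `min_overlap`, `min_support`) are hypothetical. BLAST itself is not reproduced; the sketch only writes the surviving contigs as FASTA text, the usual input for a BLAST search.

```python
"""Toy host-subtraction -> assembly -> filtering sketch (hypothetical, not the thesis code).

Stand-ins (assumptions): exact k-mer matching replaces read alignment,
greedy suffix-prefix merging replaces a real assembler, and a minimum
read-support cutoff replaces the thesis's unspecified statistical model.
"""
from typing import List, Set, Tuple

Contig = Tuple[str, int]  # (sequence, number of reads merged into it)


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def subtract_host(reads: List[str], host_genome: str, k: int = 8) -> List[str]:
    """Step 1 stand-in: keep only reads that share no k-mer with the host."""
    host = kmers(host_genome, k)
    return [r for r in reads if not (kmers(r, k) & host)]


def overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of a equal to a prefix of b, if at least min_len; else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: List[str], min_overlap: int = 5) -> List[Contig]:
    """Step 2 stand-in: repeatedly merge the pair with the longest overlap."""
    contigs: List[Contig] = [(r, 1) for r in reads]
    while True:
        best_len, best_i, best_j, contained = 0, -1, -1, False
        for i, (a, _) in enumerate(contigs):
            for j, (b, _) in enumerate(contigs):
                if i == j:
                    continue
                if b in a:  # b lies entirely inside a: absorb it
                    n, inside = len(b), True
                else:
                    n, inside = overlap(a, b, min_overlap), False
                if n > best_len:
                    best_len, best_i, best_j, contained = n, i, j, inside
        if best_len == 0:
            return contigs  # no overlaps left to merge
        a, sa = contigs[best_i]
        b, sb = contigs[best_j]
        merged = a if contained else a + b[best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append((merged, sa + sb))


def pipeline(reads: List[str], host_genome: str, k: int = 8,
             min_overlap: int = 5, min_support: int = 2) -> List[Contig]:
    """Steps 1-3; the step-3 stand-in keeps contigs built from >= min_support reads."""
    unmapped = subtract_host(reads, host_genome, k)
    contigs = greedy_assemble(unmapped, min_overlap)
    return [(seq, s) for seq, s in contigs if s >= min_support]


def to_fasta(contigs: List[Contig]) -> str:
    """Step 4 input: FASTA text that could be submitted to a BLAST search."""
    lines = []
    for idx, (seq, support) in enumerate(contigs, start=1):
        lines.append(f">contig_{idx} reads={support}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    host = "A" * 20  # toy host reference
    reads = ["AAAAAAAAAAAA", "CGTTGCACGGTC", "ACGGTCAGTCCT",
             "TCAGTCCTGA", "GATTACAGATTACA"]
    print(to_fasta(pipeline(reads, host)), end="")
```

In the demo, the all-`A` read matches the toy host and is removed in step 1, the three overlapping reads are merged into the single contig `CGTTGCACGGTCAGTCCTGA` (supported by three reads), and the isolated `GATTACAGATTACA` read, which overlaps nothing, is dropped by the support cutoff. The cutoff only crudely mimics the purpose of the statistical model: contigs that collect little read support are more likely to be artifacts of random assembly.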

Subjects

next generation sequencing

RNA-seq

pathogen

SDGs

SDG3 (Good Health and Well-being)

Type

thesis

File(s)

Name

ntu-104-R02945032-1.pdf

Size

23.32 KB

Format

Adobe PDF

Checksum

(MD5): 9a655c785c7521aaa4aa44f5a09c376f
