Development of a Fast Algorithm for Pathogen Identification through RNA-seq
Date Issued
2015
Date
2015
Author(s)
Wu, Chin-Ting
Abstract
Diagnosing viral, bacterial, or fungal infection at an early stage of infectious disease is an important issue in clinical research. In addition to strain or virus identification by traditional, labor-intensive in vitro experiments, in silico methods for pathogen identification have been developed following advances in next-generation sequencing. Research groups around the world have developed several such methods. However, these in silico methods remain time-consuming and compute-intensive, which poses practical obstacles. To address these issues, we developed an accurate and efficient algorithm that identifies pathogens from RNA-seq data in four steps. First, the sequencing reads are aligned to the human reference genome, and the reads that cannot be aligned are retained for subsequent analysis. Second, the retained reads are assembled into putative pathogen contigs using regions repeated across the retained reads. Third, a statistical model is applied to the putative transcript contigs to remove spurious contigs resulting from random assembly. Finally, BLAST is applied to the contigs that pass the statistical test to identify the species and strains of the pathogens. To evaluate performance, we used both simulated and real data sets containing samples with pathogen infections. The results on both simulated and real data show that our algorithm has high sensitivity and accuracy. We compared our method with three other methods and demonstrated that our algorithm is more effective. Furthermore, we applied our method to cervical cancer, lung adenocarcinoma, and colorectal cancer data sets to identify possible pathogens associated with these three cancers. In summary, our method is accurate and effective in detecting pathogens using RNA-seq data from patient samples.
Moreover, the efficiency and short running time of the proposed method enable the use of large data sets in pathogen studies.
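The first step described above (keeping only the reads that fail to align to the human reference genome) can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes the aligner's output is available as SAM text, and the function name `unmapped_reads` and the example records are hypothetical. The only fact relied on is that bit 0x4 of the SAM FLAG field marks an unmapped segment.

```python
# Hypothetical sketch of step 1 (host-read subtraction); not the thesis's code.
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_reads(sam_lines):
    """Yield (name, sequence) for reads that did not align to the host genome."""
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # SAM columns: 1 = QNAME, 2 = FLAG, 10 = SEQ.
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & UNMAPPED_FLAG:
            yield name, seq


# Made-up records for illustration: read1 aligned to the host, read2 did not.
example_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCAG\tIIIIIIII",
]
retained = list(unmapped_reads(example_sam))
print(retained)  # [('read2', 'TTGACCAG')]
```

The retained reads would then feed the assembly, statistical filtering, and BLAST steps, none of which are sketched here because the abstract does not specify them in enough detail.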
Subjects
next generation sequencing
RNA-seq
pathogen
Type
thesis
File(s)
Name
ntu-104-R02945032-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):9a655c785c7521aaa4aa44f5a09c376f