Incorporating sequence motifs to improve accuracy of predicting transcription factor binding sites using ChIP-seq data
Date Issued
2016
Author(s)
Wu, Ping-Cheng
Abstract
Transcription factors (TFs) regulate gene expression in living organisms and influence multiple biological processes. Chromatin immunoprecipitation sequencing (ChIP-seq) is a technology that has been widely used to find the transcription factor binding sites (TFBSs) of a specific TF in the DNA sequences of a genome. However, the accuracy of the TFBSs identified by ChIP-seq has not been systematically evaluated. This thesis therefore used TFBS information from the TRANSFAC database to validate the TFBSs identified by ChIP-seq alone under multiple false discovery rate (FDR) cutoffs. In addition, a method incorporating de novo motif discovery was proposed to improve the accuracy of the predicted TFBSs. ChIP-seq data from different cell lines were collected from the ENCODE database. In general, ~60% of the peak regions identified by ChIP-seq alone with a strict FDR cutoff (FDR = 0) contained at least one TFBS of the specific TF across multiple cell lines. The proposed method improved prediction accuracy over ChIP-seq alone, although the degree of improvement depended on the FDR cutoff used and the motifs discovered. In conclusion, this thesis identified the accuracy problem of the ChIP-seq platform through large-scale observation of the data and addressed it by proposing a method that incorporates de novo motif discovery. These results can serve as a foundation for developing bioinformatics tools for TFBS prediction in the future.
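The validation measure described in the abstract, the share of peak regions that contain at least one known TFBS, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `fraction_of_peaks_with_site`, the `(chrom, start, end)` tuple layout, the requirement that a site lie entirely inside a peak, and the toy coordinates are all assumptions made for illustration only.

```python
from collections import defaultdict


def fraction_of_peaks_with_site(peaks, sites):
    """Return the fraction of peaks that fully contain at least one site.

    Both arguments are iterables of (chrom, start, end) tuples using
    half-open coordinates. Full containment is an assumed criterion;
    the thesis may define overlap differently.
    """
    # Group annotated sites by chromosome so each peak only scans
    # the sites that could possibly fall inside it.
    sites_by_chrom = defaultdict(list)
    for chrom, start, end in sites:
        sites_by_chrom[chrom].append((start, end))

    peaks = list(peaks)
    if not peaks:
        return 0.0

    hits = 0
    for chrom, p_start, p_end in peaks:
        candidates = sites_by_chrom.get(chrom, ())
        if any(p_start <= s_start and s_end <= p_end
               for s_start, s_end in candidates):
            hits += 1
    return hits / len(peaks)


# Toy, made-up coordinates (not data from the thesis).
toy_peaks = [("chr1", 100, 300), ("chr1", 500, 700), ("chr2", 50, 150)]
toy_sites = [("chr1", 150, 160), ("chr2", 140, 155)]
toy_fraction = fraction_of_peaks_with_site(toy_peaks, toy_sites)
```

In this toy input only the first peak fully contains a site, since the chr2 site extends past the end of its peak. A linear scan per peak is adequate for a sketch; genome-scale data would normally call for sorted intervals or an interval tree.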
Subjects
Transcription factor
transcription factor binding site
motif discovery
Chromatin immunoprecipitation sequencing
Type
thesis
File(s)
Name
ntu-105-R03631048-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):72d4add2a32b3dc316a6bb2910184836