Parallel Architecture and Hardware Implementation of the Pre-Filter and Post-Processor for Sequence Assembly
Date Issued
2012
Date
2012
Author(s)
Kuo, Yuan-Hsiang
Abstract
In DNA sequence assembly, assembly tools align and merge high-throughput DNA sequences (reads) to reconstruct the original DNA sequence. The general assembly flow has three steps. First, the pre-processor filters out low-quality reads, which lowers both the error rate and the problem size. It then converts the reads from 2-base color coding into pseudo bases that general assembly tools accept. The second step is sequence assembly itself. In the third step, the post-processor converts the pseudo-base assembly results back into genome sequences. Each of the three steps takes about one third of the total computation time, so we first focus on improving the performance of the pre-filter and the post-processor.
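The color-to-pseudo-base conversion and its inverse can be sketched in a few lines of Python. This is a minimal sketch that assumes the standard SOLiD 2-base color code (A=0, C=1, G=2, T=3, where each color is the XOR of the codes of two adjacent bases) and the common pseudo-base mapping 0→A, 1→C, 2→G, 3→T. The function names are hypothetical. The decoder also assumes a single known first base. This is a simplification and not the thesis's post-processor method, which uses the first several reads together with the Velvet results.

```python
# Assumed convention: standard SOLiD 2-base color code with A=0, C=1, G=2, T=3;
# the color between two adjacent bases is the XOR of their 2-bit codes.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def colors_to_pseudo_bases(colors: str) -> str:
    """Map each color digit 0-3 to a pseudo base (0->A, 1->C, 2->G, 3->T).

    Missing calls such as '.' raise ValueError; a pre-filter would drop them.
    """
    return "".join(CODE_TO_BASE[int(c)] for c in colors)


def pseudo_to_real(pseudo: str, first_base: str) -> str:
    """Decode pseudo bases to real bases, given one known first base."""
    bases = [first_base]
    code = BASE_TO_CODE[first_base]
    for p in pseudo:
        code ^= BASE_TO_CODE[p]  # next base = previous base XOR color
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)


def real_to_colors(seq: str) -> str:
    """Encode a base sequence into color digits (useful for round-trip checks)."""
    codes = [BASE_TO_CODE[b] for b in seq]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))


colors = real_to_colors("ATGGC")          # "3103"
pseudo = colors_to_pseudo_bases(colors)   # "TCAT"
print(pseudo_to_real(pseudo, "A"))        # ATGGC
```

Each decoded base depends on the one before it, so a single wrong color corrupts every base that follows. This is one reason tools commonly assemble in pseudo-base space and decode only at the end.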
In this study, we propose a pre-filter and a post-processor hardware design for the assembly tool Velvet. The pre-filter uses the parallel processing and pipelining available in hardware to filter reads efficiently. Its execution time grows linearly with the number of reads but is independent of the read length. The architecture is also scalable: it can be sped up further by increasing the number of parallel computational units. For the post-processor, we propose a new approach that converts pseudo bases back to real bases. It uses only the first several short reads of the whole sequence together with the Velvet results, and it is likewise implemented as a pipeline.
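A small software model makes the scaling claim concrete. This is a sketch under stated assumptions, since the abstract does not give the filtering criterion. The function `passes_filter` uses a hypothetical rule: it rejects reads that contain a missing call ('.') or more than `max_low` calls below `min_qual`. The function `filter_cycles` assumes that each of `lanes` parallel units accepts one read per cycle through a `depth`-stage pipeline. Neither the rule nor the parameter names come from the thesis.

```python
import math
from dataclasses import dataclass


@dataclass
class Read:
    colors: str           # color calls, e.g. "0123"; '.' marks a missing call
    qualities: list[int]  # per-call quality values


def passes_filter(read: Read, min_qual: int = 10, max_low: int = 3) -> bool:
    """Hypothetical criterion: no missing calls and at most `max_low`
    calls below `min_qual`. In hardware all positions can be compared in
    parallel, so the check costs a fixed number of stages for any length."""
    if "." in read.colors:
        return False
    low = sum(q < min_qual for q in read.qualities)
    return low <= max_low


def filter_cycles(num_reads: int, lanes: int, depth: int) -> int:
    """Cycle count for `lanes` units, each taking one read per cycle through
    a `depth`-stage pipeline: linear in num_reads, independent of read length."""
    if num_reads == 0:
        return 0
    return math.ceil(num_reads / lanes) + depth - 1


reads = [Read("0123", [30, 30, 30, 30]), Read("01.3", [30, 30, 0, 30])]
print([passes_filter(r) for r in reads])  # [True, False]
print(filter_cycles(1000, 4, 5))          # 254
```

Doubling `lanes` roughly halves the cycle count. This mirrors the scalability the abstract describes, where adding parallel units gives a further speed-up.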
The chips are implemented in TSMC 90 nm technology. They can process up to 256 M reads with a maximum read length of 63. The two chips occupy 2,367,051 μm² and 774,724 μm², and both operate at 100 MHz. Compared with software approaches, the pre-filter and the post-processor achieve speed-ups of up to 11,000 times and 34,000 times, respectively.
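The chip figures translate into a few basic design quantities. The short worked calculation below assumes that "M" means 2^20; the result is the same 28 bits if it means 10^6. It also assumes that each color is stored in 2 bits. These storage choices are illustrative assumptions, not details reported for the chips.

```python
import math

CLOCK_HZ = 100e6          # operating frequency from the abstract
MAX_READS = 256 * 2**20   # "256 M" reads, assuming M = 2**20
MAX_READ_LEN = 63         # maximum read length from the abstract
BITS_PER_COLOR = 2        # assumption: colors 0-3 packed into 2 bits

index_bits = math.ceil(math.log2(MAX_READS))      # bits to address any read
read_bits = MAX_READ_LEN * BITS_PER_COLOR         # packed payload per read
cycle_ns = 1e9 / CLOCK_HZ                         # clock period in ns
areas_mm2 = [a / 1e6 for a in (2367051, 774724)]  # um^2 -> mm^2

print(index_bits, read_bits, cycle_ns)  # 28 126 10.0
```

Under these assumptions, a full-length read needs 126 bits and therefore fits in a single 128-bit word. The two die areas correspond to about 2.37 mm² and 0.77 mm².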
Subjects
Sequence Assembly
DNA read filtering
hardware implementation
Type
thesis
File(s)
Name
ntu-101-R99943092-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):561e6a35f58b5c89f957d1e99ee3b96b
