Exploiting Parallelism in Heterogeneous Hadoop System
Date Issued
2014
Author(s)
Lee, Yao-Cheng
Abstract
With the rise of big data, Apache Hadoop has been attracting increasing attention.
There are two primary components at the core of Apache Hadoop: the Hadoop Distributed File System (HDFS) and the MapReduce framework. MapReduce is a programming model for processing large datasets with a parallel, distributed algorithm on a cluster. However, the MapReduce framework is not efficient enough: although mappers run in parallel, the Hadoop MapReduce implementation does not fully exploit parallelism to enhance performance, because each mapper processes its input with a serial algorithm rather than a parallel one. To solve these problems, this thesis proposes a new Hadoop framework that fully exploits parallelism through parallel processing. For better performance, we utilize the computational power of GPGPUs to accelerate the program. In addition, to utilize both the CPU and the GPU to reduce overall execution time, we also propose a scheduling policy that dynamically dispatches computation to the appropriate device. Our experimental results show that our system achieves a speedup of 1.45X over Hadoop on the benchmarks.
Subjects
Big data
Heterogeneous systems
Type
thesis
File(s)
Name
ntu-103-R01944049-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):dfe31dd6fdff3d8be59a7bd386680d98
