Exploiting Parallelism in Heterogeneous Hadoop System
Date Issued
2014
Author(s)
Lee, Yao-Cheng
Abstract
With the rise of big data, Apache Hadoop has been attracting increasing attention.
There are two primary components at the core of Apache Hadoop: the Hadoop Distributed File System (HDFS) and the MapReduce framework. MapReduce is a programming model for processing large datasets with a parallel, distributed algorithm on a cluster. However, the MapReduce framework is not efficient enough: although mappers run in parallel, the Hadoop MapReduce implementation does not fully exploit parallelism to enhance performance, because each mapper processes its input with a serial algorithm rather than a parallel one. To solve these problems, this thesis proposes a new Hadoop framework that fully exploits parallelism through parallel processing. For better performance, we utilize the computational power of GPGPUs to accelerate the program. In addition, to utilize both the CPU and the GPU to reduce overall execution time, we also propose a scheduling policy that dynamically dispatches computation to the appropriate device. Our experimental results show that our system achieves a speedup of 1.45X over Hadoop on the benchmarks.
Subjects
Big data
Heterogeneous systems
Type
thesis
File(s)
Name
ntu-103-R01944049-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):dfe31dd6fdff3d8be59a7bd386680d98
