Exploiting Parallelism in Heterogeneous Hadoop System (異質Hadoop下的平行度利用)

Advisor: 廖世偉
Institution: National Taiwan University, Graduate Institute of Networking and Multimedia
Author: 李曜誠 (Lee, Yao-Cheng)
Record dates: 2014-11-29; 2018-07-05
Year: 2014
URI: http://ntur.lib.ntu.edu.tw//handle/246246/263437

Abstract

With the rise of big data, Apache Hadoop has attracted increasing attention. Apache Hadoop has two core components: the Hadoop Distributed File System (HDFS) and the MapReduce framework. MapReduce is a programming model for processing large datasets with a parallel, distributed algorithm on a cluster. However, the MapReduce framework is not efficient enough. Although the mapper offers parallelism, the Hadoop MapReduce implementation does not fully exploit it to improve performance: the mapper is implemented with a serial processing algorithm rather than a parallel one. To solve this problem, this thesis proposes a new Hadoop framework that fully exploits parallelism through parallel processing. For better performance, we use GPGPU computational power to accelerate the program. In addition, to use both the CPU and the GPU to reduce overall execution time, we propose a scheduling policy that dynamically dispatches computation to the appropriate device. Our experimental results show that our system achieves a speedup of 1.45X over Hadoop on the benchmarks.

Table of Contents

Oral Examination Committee Certification
Acknowledgements
Chinese Abstract
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
Chapter 2 Background
  2.1 Apache Hadoop
    2.1.1 Hadoop Distributed File System
    2.1.2 MapReduce
    2.1.3 Overall Workflow in Hadoop
  2.2 OpenCL
  2.3 Aparapi
Chapter 3 System Design and Implementation
  3.1 Design Goals
  3.2 System Architecture
    3.2.1 Extended TaskTracker
    3.2.2 Heterogeneous Mapper
  3.3 Integration of Heterogeneous Mapper and APARAPI
  3.4 Intra-Node Scheduling Policy
Chapter 4 Related Work
Chapter 5 Evaluation
  5.1 Experimental Setup
  5.2 Benchmarks
  5.3 Results
Chapter 6 Conclusion
REFERENCE

File size: 2909360 bytes
Format: application/pdf
Public release date: 2019/08/21
Usage rights: paid licensing authorized (royalties returned to the university)
Keywords: big data (巨量資料); heterogeneous system (異質系統)
Type: thesis
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/263437/1/ntu-103-R01944049-1.pdf