指導教授:陳銘憲臺灣大學:電信工程學研究所陳姵伶Chen, Pei-LingPei-LingChen2014-11-302018-07-052014-11-302018-07-052014http://ntur.lib.ntu.edu.tw//handle/246246/264287k-truss 是一種可以代表一個網路凝聚力大小的子圖,是在分析一個 社群網路的重要指標。然而,隨著巨量社群網路的出現,其構成的圖 會擁有百萬甚至上億個節點和邊,這導致k-truss 傳統單機版演算法所 需要的運算時間將會超乎想像的久;除此之外,如此大型的圖會無法 載入單一機器的記憶體,這也是另一個傳統演算法無法運作的原因。 目前,大資料的運算已經迫切的仰賴雲端運算,因此我們的目標是基 於雲端運算的框架上,設計出可以處理巨量資料的k-truss 演算法。在 本篇論文中,我們先就已經存在的MapReduce 版k-truss 演算法進行改 良。而由於MapReduce 的架構在處理分散式的圖運算時會因為太多的 迴圈而導致過多的IO 負載,我們轉而使用圖平行架構(graph-parallel anstractions) ,且提出一系列的理論基礎來設計一個k-truss 平行化演 算法的版本。實驗的結果顯示,從運算時間以及硬碟使用量的觀點來 看,我們基於圖平行架構所提出的k-truss 平行化演算法比其他基於 MapReduce 設計的版本來的更有效率。k-truss, a type of cohesive subgraphs of a network, is an important measure for a social network graph. However, with the emergence of large online social networks, the running time of the traditional batch algorithms for k-truss decomposition is usually prohibitively long on such a graph with billions of edges and millions of vertices. Moreover, the size of a graph becomes too large to load into the main memory of a single machine. Currently, cloud computing has become an imperative way to process the big data. Thus, our aim is to design a scalable algorithm of k-truss decomposition in the scenario of cloud computing. In this thesis, we first improve the existing distributed k-truss decomposition in the MapReduce framework. We then propose a series of theoretical basis for k-truss and use them to design an algorithm based on graph-parallel abstractions. Our experiment results show that our method in the graph-parallel abstraction significantly outperforms the methods based on MapReduce in terms of running time and disk usage.口試委員會審定書i 誌謝ii 摘要iii Abstract iv 1 Introduction 1 2 Related Work 4 2.1 Graph-Parallel Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2 Social Measures in Large Graphs . . . . . . . . . . . . . . . . . . . . . . 5 3 Preliminaries 6 3.1 k-Truss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.2 k-Truss Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 4 Distributed k-Truss Decomposition in MapReduce Framework 11 4.1 Basic Version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 4.2 Improved Version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 5 Distributed k-Truss Decomposition based on Bulk Synchronous Parallel Model 17 5.1 Definition and Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 5.2 Algorithm in Graph Parallel Abstraction . . . . . . . . . . . . . . . . . . 21 5.3 Correctness and Convergence . . . . . . . . . . . . . . . . . . . . . . . . 25 5.4 Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 6 Experimental Analysis 30 6.1 Experiments on Synthetic Data . . . . . . . . . . . . . . . . . . . . . . . 30 6.2 Experiments on Real World Data . . . . . . . . . . . . . . . . . . . . . . 32 7 Conclusion 39 Bibliography 402775831 bytesapplication/pdf論文公開時間:2014/08/21論文使用權限:同意有償授權(權利金給回饋學校)k叢集平行運算社群網路大數據k-Truss 分解之分散式演算法Distributed Algorithms for k-Truss Decompositionthesishttp://ntur.lib.ntu.edu.tw/bitstream/246246/264287/1/ntu-103-R01942069-1.pdf