指導教授:王偉仲臺灣大學:數學研究所朱哲明Chu, Che-MingChe-MingChu2014-11-302018-06-282014-11-302018-06-282014http://ntur.lib.ntu.edu.tw//handle/246246/264016隨著科技的進步,單核心電腦運算速度已經快要達到上限值,所以現在電腦都朝著多核心運算發展,因此科學計算也須跟著發展出平行運算。在科學計算中,大型稀疏線性系統一直以來都扮演了極為重要的角色。我們討論的方法是對一般稀疏矩陣,而我們使用的方法是利用巢狀分割法重排稀疏矩陣,以利於以舒爾法來平行解線性系統。但我們的只完成對稱正定矩陣的程式。 因為BLAS3對全滿矩陣運算在GPU上運算相當的快,也因為舒爾補矩陣幾乎都是全滿矩陣,又因為全滿矩陣通常較適合直接法,所以在本篇文章中,我們全部是以直接法解線性系統,並不會以疊代法解舒爾補矩陣。但這樣就需要計算出完整的舒爾補矩陣,這會帶來三點問題:第一,需要大量記憶體來儲存舒爾補矩陣;第二,分解舒爾補矩陣所需的時間會非常多。所以,我們需發展出一套方法來解決以上二點問題。 巢狀分割法能夠重排矩陣,使矩陣分成一部分可完全平行而另一部分不容易平行分解。而巢狀分割法重排矩陣有著特殊的性質,能夠使得舒爾補矩陣仍然與重排後的矩陣有相同的結構,也就是說舒爾補矩陣也有可以平行的部份(但可平行的效果會越來越差),因此我們想利用此一性質,來增加平行效果。CPUs have reached their clock rate limits due to physical constrains. Parallel computers are increasingly used to achieve higher computing performances. How computational algorithms and codes can take advantages of parallel computers thus become more and more important. In scientific computing, solving large sparse general linear systems is one of critical problem. The nested dissection recording can reorder a sparse matrix to allow parallelism of triangular factorization on block sub-matrices. The key of this approach is how we solve embedded linear system corresponding to the Schur complement. While existed methods solve this embedded linear system iteratively, we carefully study the hierarchical of the Schur complement so that we can use direct methods to solve the embedded linear system recursively. We demonstrate the advantage of our approach by focusing on an implementation on CPU-GPU hybrid system for symmetric positive definite linear systems. Numerical results suggest the proposed hierarchical Schur method is promising in small or large parallel computer cluster with many-core accelerators.Contents i List of Figures iv 中文摘要 vii Abstract viii 1 Introduction 1 1.1 Literature Review 2 1.2 Motivation 2 1.3 Purpose 3 2 Background 5 2.1 Nested Dissection for Sparse Matrix 5 2.1.1 One-level Nested Dissection 6 2.1.2 Two-level and Higher-level Nested Dissection 6 2.1.3 Separators Gathering 8 2.2 SchurComplementMethod 9 2.3 Nested Dissection Method and Schur Complement Method Combination 10 2.4 Direct Solver for Sparse SPD Matrix 12 2.5 GPU Accelerator for Scientific Computing 13 2.6 Challenge 13 2.6.1 The Linear System with the Schur Complement 13 2.6.2 The Sparse Multiple Right-hand-side 14 2.6.3 Communication Between Computer and Computer 14 3 Hierarchical Schur Complement Method 16 3.1 Schur Complement Matrix of NDG Matrices 17 3.1.1 Nonzero Submatrices of NDG Matrices 19 3.1.2 Nonzero Submatrices of the Schur Complement of NDG Matrices 21 3.2 Basic Idea of Hierarchical Schur Complement Method 24 3.2.1 The n Schur Complement Matrices 24 3.2.2 Solving the Linear System with G(n) 25 3.3 Communication Reduction and Improvement for Hierarchical Schur Complement Method 29 3.3.1 Numerical Data Storage Format 32 3.3.2 Compressing Matrices 33 3.3.3 Comparing with Basic Idea 44 3.4 Combining Right-hand-side 45 4 Numerical result 475581960 bytesapplication/pdf論文公開時間:2014/08/17論文使用權限:同意有償授權(權利金給回饋學校)線性系統圖形處理器舒爾法巢狀分割法直接法疊帶法以多層次舒爾法在 CPU-GPU 叢集上解大型稀疏線性系統Hybrid Hierarchical Schur Solvers for Large Sparse Linear Systems on CPU-GPU Clusterthesishttp://ntur.lib.ntu.edu.tw/bitstream/246246/264016/1/ntu-103-R00221036-1.pdf