CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System
Date Issued
2012
Author(s)
Yu, Chenhan D.
Abstract
Solving large-scale sparse linear systems is at the heart of many scientific and engineering computations. Among direct methods, we focus on the multifrontal method in particular. The multifrontal method uses an elimination tree (a column elimination tree for unsymmetric problems) to transform one large sparse linear system into many smaller dense frontal operations, which are well suited to hybrid CPU-GPU systems. We review the multifrontal method and analyze it from both algorithmic and implementation perspectives to see how one or more GPUs can be used to accelerate the computation. Problems are studied in symmetric positive definite (SPD), symmetric indefinite, and unsymmetric cases.
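The core dense operation at each node of the elimination tree is a partial factorization of a frontal matrix: the fully-summed block is factored with BLAS3 kernels, and the trailing block becomes the Schur complement passed to the parent front. A minimal SPD sketch (illustrative only; function and variable names are our own, not from the thesis):

```python
import numpy as np

def partial_cholesky(F, k):
    """Partially factor an SPD frontal matrix F.

    The leading k x k block holds the fully-summed variables and is
    factored in place (POTRF); the panel below it is solved with a
    triangular solve (TRSM); the trailing block is updated to form the
    Schur complement (SYRK) that is assembled into the parent front.
    """
    # Factor the pivot block: F11 = L11 * L11^T
    L11 = np.linalg.cholesky(F[:k, :k])
    # Panel solve: L21 = F21 * L11^{-T}
    L21 = np.linalg.solve(L11, F[k:, :k].T).T
    # Schur complement (contribution block): S = F22 - L21 * L21^T
    S = F[k:, k:] - L21 @ L21.T
    return L11, L21, S

# Example: a small SPD front with 2 fully-summed variables
A = np.array([[4.0, 2.0, 1.0],
              [2.0, 5.0, 2.0],
              [1.0, 2.0, 6.0]])
L11, L21, S = partial_cholesky(A, 2)
```

All three steps (POTRF, TRSM, SYRK) are BLAS3-rich, which is why large fronts map so well onto a GPU.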
We successfully carry out an implementation of the SPD multifrontal method that achieves nearly the peak performance of dense BLAS3 routines on the GPU (MAGMA's Cholesky), and the same symmetric structure yields a similar implementation and similar performance for symmetric indefinite problems. Unsymmetric problems, however, are harder to implement: runtime column pivoting splits two BLAS3 routines into several BLAS2 routines, and extra communication is inevitable. Because CPU-GPU communication can easily degrade performance, the thesis provides several strategies, for every kind of multifrontal problem, to reduce communication and accelerate the process. Furthermore, we analyze the total execution time of the SPD problem and derive a nearly optimal workload distribution for hybrid CPU-GPU cooperation.
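A workload split of this kind can be modeled by balancing CPU and GPU finish times, with the PCI-E transfer charged to the GPU side. A minimal sketch under assumed units (GFLOP, GFLOP/s, GB, GB/s); the function name and the rate figures below are illustrative, not values from the thesis:

```python
def gpu_fraction(flops, cpu_gflops, gpu_gflops, pcie_gb_s, bytes_moved):
    """Fraction r of the work to offload so CPU and GPU finish together.

    Solves  r * flops / gpu + transfer = (1 - r) * flops / cpu  for r,
    then clamps to [0, 1] in case the transfer cost dominates.
    """
    t_cpu_all = flops / cpu_gflops      # time if the CPU did everything
    t_gpu_all = flops / gpu_gflops      # time if the GPU did everything
    t_xfer = bytes_moved / pcie_gb_s    # one-off PCI-E transfer cost
    r = (t_cpu_all - t_xfer) / (t_cpu_all + t_gpu_all)
    return min(max(r, 0.0), 1.0)

# Example: 2000 GFLOP of work, a 100 GFLOP/s CPU, a 500 GFLOP/s GPU,
# 4 GB moved over an 8 GB/s link -> most of the work goes to the GPU.
r = gpu_fraction(2000.0, 100.0, 500.0, 8.0, 4.0)
```

The clamp captures the intuition that a small front is not worth offloading at all: when the transfer time exceeds the CPU-only time, the optimal GPU share is zero.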
For all kinds of problems, scalable algorithms are provided that adapt to multiple GPUs, up to four. To avoid analyzing cluster-network communication, whose speed is on a different scale from PCI-E communication, we focus on adapting the optimization strategies to a single box server. The extension and analysis of the optimization model for the new parallel scheme closely follow the single-GPU model: new factorization and communication costs for multiple GPUs are added, and the performance bounds and properties still hold.
Subjects
sparse matrix
GPU
parallel computing
direct method
linear system
high performance computing
Multifrontal
symmetric positive definite
Type
thesis
File(s)
Name
index.html
Size
23.49 KB
Format
HTML
Checksum
(MD5):4e7c323c768c15aa723d4e3ea24c15e8
