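2012-08-01
2024-05-14
https://scholars.lib.ntu.edu.tw/handle/123456789/659716

Abstract (translated from Chinese): Multicore central processing units (multicore CPUs), which contain several computing cores, have rapidly replaced single-core CPUs, and several graphics processing units (GPUs) with hundreds of computing cores have been released in recent years. These emerging architectures offer strong parallel processing capability and high data-transfer bandwidth, so every area of computing is eager to explore and try them. For example, in the November 2010 ranking of the world's top 500 supercomputers, the first and third fastest machines were parallel computers built from GPUs, and the second was a parallel computer built from multicore processors. This change in hardware has had a fundamental impact on the algorithms and software development frameworks of scientific computing. For instance, the MAGMA project (Matrix Algebra on GPU and Multicore Architectures), currently funded by the US Department of Energy, the National Science Foundation, and several companies, is comprehensively rewriting the existing LAPACK library in an effort to develop a library of basic matrix computation kernels suited to multicore CPU and GPU architectures. In other words, parallel computers are providing powerful computing capacity in an entirely new form, giving us a better chance to solve larger and harder computational problems, to explore broader and deeper scientific knowledge, and to develop more diverse practical applications. To make full use of this computing power, however, many scientific computing algorithms and software packages must be redeveloped and redesigned, which raises many challenging problems.

To respond to this change in the computing environment and to contribute to this important frontier, after a literature review and careful evaluation, this project selects four topics: (A) large sparse linear systems, (B) optimal experimental design, (C) medical image reconstruction, and (D) random number generators and their applications in bioinformatics. We study in depth how to use multicore CPUs and GPUs to accelerate their computation. The first two topics are fundamental core problems of scientific computing; the last two are important scientific computing applications. The two core issues in developing effective parallel algorithms and software are data parallelism and throughput-intensive floating-point computation. Our main research ideas are as follows. (A) The multifrontal method converts a large sparse linear system into a series of small dense matrix computations; we will accelerate these dense matrix computations. (B) The optimization problems in optimal experimental design have multiple local optima and a number of feasible solutions that grows exponentially with the number of variables; we plan to develop parallel direct search algorithms that use the massive parallelism of multicore CPUs and GPUs to search for the global optimum. (C) Three-dimensional medical image reconstruction and the related optimization problems require a large amount of fast computation in order to reconstruct high-resolution images and reduce the radiation dose; we will design new multilevel parallel algorithms that achieve acceleration through fast access to data in different types of memory and highly synchronized computation. (D) Developing fast and effective parallel random number generators can greatly improve the overall performance of statistical simulation sampling and secure communication, and thus increase their practical applicability. We also plan to analyze large volumes of genetic data, using parallelized algorithms to accelerate the computation and extract genetic information quickly.

We expect this project to propose innovative algorithms and to implement fast, effective software for these four topics on the emerging multicore CPU and GPU parallel computing environment, and thereby to have a positive impact on the deeper understanding and broad application of the related scientific problems.

Abstract: Two emerging computer architectures have quickly and widely affected the computational sciences: (i) multicore central processing units (multicore CPUs) and (ii) graphics processing units (GPUs) equipped with several hundred lightweight computing threads. These two types of new processors have demonstrated extremely appealing levels of floating-point performance and device-memory bandwidth.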
The November 2010 edition of the TOP500 supercomputer list shows that the first and third fastest computers are built from GPUs, while the second is built from multicore CPUs. This architectural evolution has created a strong need for new computational algorithms and software. One renowned example is MAGMA (Matrix Algebra on GPU and Multicore Architectures), a project supported by the US Department of Energy and the National Science Foundation. Although a growing number of success stories have been reported for these emerging techniques, the accelerations are not a free lunch. Straightforward ports to the new architectures may not yield satisfactory speedups; sophisticated algorithm development and careful implementation are necessary and challenging. In response to this architectural evolution, we surveyed the literature and chose the following four focus problems for this project: (A) multifrontal linear system solvers, (B) experimental designs, (C) medical image reconstruction, and (D) a chaos-based pseudo-random number generator and Bayesian variable selection in bioinformatics. The key requirements for efficient parallel computing on these platforms are data parallelism and throughput-intensive computation. We outline our main ideas for each focus problem and sketch why they are well suited to multicore CPUs and GPUs. (A) A multifrontal method transforms the solution of a large sparse linear system, for which high performance is hard to achieve on a GPU, into a sequence of dense matrix operations. These dense matrix operations have great potential to run efficiently on multicore CPUs and GPUs. (B) For the optimization problems arising in experimental designs, we propose stochastic meta-heuristic methods. The proposed algorithms evaluate function values at a large number of feasible points and are therefore throughput intensive.
The searches can be performed in a data-parallel manner. (C) For 3D medical imaging and related optimization problems, we plan to develop multilevel parallelization algorithms that optimize memory access and make the best use of multiple CPUs and GPUs. Direct search techniques will be used to solve the optimization problems in parallel. (D) We plan to apply delay feedback perturbation to the logistic map to remedy dynamic degradation, and to use a fast, effective, parallel PRNG to simulate the sampling process of Bayesian variable selection. The proposed PRNG is structured as a cascade of independent chaotic maps, which suits the SIMD nature of GPUs and multicore CPUs and also helps speed up Bayesian variable selection. We anticipate that the project will lead to innovative fast algorithms and efficient software implementations on the target multicore CPU and multi-GPU platforms. The outcomes have promising scientific implications, as they strengthen our ability to tackle important computational science problems that remain challenging today.

Keywords (translated from Chinese): multicore central processing unit; graphics processing unit; parallel computing; large sparse linear systems; optimal experimental design; medical image reconstruction; chaos-based random number generator; bioinformatics

Keywords: Multicore central processing unit; Graphic processing unit; Parallel computing; Multifrontal linear system solvers; Experimental designs; Medical image reconstruction; Chaos-based pseudo random number generator; Bayesian variable selection in bioinformati

Fast Computational Methods on Emerging Computer Architectures