https://scholars.lib.ntu.edu.tw/handle/123456789/152405
Title: | 資料平行GPU架構之記憶體存取最佳化 Memory Access Optimization for Data-parallel GPU Architectures |
Author: | 魏名鋒 Wei, Ming-Feng |
Keywords: | Memory Access Optimization | Issue date: | 2016 | Abstract: | Global memory accesses often incur latencies of hundreds of cycles, so the performance of applications running on heterogeneous multi-core systems can degrade significantly as the number of global memory accesses increases. In this thesis, we present a mathematical model that captures the global memory accesses made by a group of threads, together with a metric that identifies the degree of inefficient serial accesses in the GPU memory system. Based on an analysis of the serial accesses in the memory system caused by global memory accesses within a work-group and among work-groups, we propose an approach to the memory access problem in GPUs. Evaluation on various kernel functions shows that kernels running with the work-group size suggested by our methodology outperform the same kernels running with the work-group size provided by hardware vendors. Heterogeneous applications executing on GPUs can thus achieve better performance without any source code modification, solely through memory access optimization via the work-group sizing suggested by our methodology. |
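URI: | http://ntur.lib.ntu.edu.tw//handle/246246/276258 | Rights: | Thesis release date: 2018/2/16 Thesis usage permission: Paid licensing authorized (royalties returned to the school) |
Appears in: | Department of Electrical Engineering |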
File | Description | Size | Format | |
---|---|---|---|---|
ntu-105-R02921043-1.pdf | | 23.32 kB | Adobe PDF | View/Open |
Items in the IR system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.