Memory Access Optimization for Data-parallel GPU Architectures
Date Issued
2016
Author(s)
Wei, Ming-Feng
Abstract
Global memory accesses incur latencies of hundreds of cycles, so the performance of heterogeneous applications can degrade significantly as global memory accesses increase. In this thesis, we present a mathematical model that captures the global memory accesses issued within a group of threads, together with a metric that quantifies the degree of inefficient serial accesses in the GPU memory system. Based on an analysis of the serial accesses caused by global memory accesses both within a work-group and among work-groups, we propose an approach to the memory access problem in GPUs. Evaluation on various kernel functions shows that kernels run with the work-group size suggested by our methodology outperform the same kernels run with the work-group size recommended by hardware vendors. Heterogeneous applications executing on GPUs can thus gain better performance without any code modification, simply by applying the memory access optimization through the work-group sizing suggested by our methodology.
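To illustrate why work-group sizing requires no kernel code modification, the sketch below shows that the size is a launch-time parameter chosen on the host (here in CUDA terms, where a thread block corresponds to an OpenCL work-group). The kernel `scale`, the candidate sizes, and the brute-force timing loop are illustrative assumptions only; the thesis instead suggests a size analytically from its model of serial memory accesses.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Simple memory-bound kernel: each thread reads and writes one element of
// global memory. How accesses within a block (work-group) are serialized in
// the memory system depends on the launch configuration, not on this code.
__global__ void scale(const float *in, float *out, int n, float alpha) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = alpha * in[i];
}

int main() {
    const int n = 1 << 20;
    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    // Warm-up launch so the first timed candidate is not penalized.
    scale<<<(n + 255) / 256, 256>>>(in, out, n, 2.0f);
    cudaDeviceSynchronize();

    // Candidate work-group (block) sizes; only the launch parameters change,
    // the kernel itself is untouched.
    int candidates[] = {64, 128, 256, 512, 1024};
    for (int block : candidates) {
        int grid = (n + block - 1) / block;
        cudaEvent_t t0, t1;
        cudaEventCreate(&t0); cudaEventCreate(&t1);
        cudaEventRecord(t0);
        scale<<<grid, block>>>(in, out, n, 2.0f);
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, t0, t1);
        printf("block size %4d: %.3f ms\n", block, ms);
        cudaEventDestroy(t0); cudaEventDestroy(t1);
    }
    cudaFree(in); cudaFree(out);
    return 0;
}
```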
Subjects
Memory Access Optimization
Type
thesis
File(s)
Name
ntu-105-R02921043-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):c4fd326360b34eebc62cc6df3b4f3f34
