TLB-Aware Block Scheduler: Improving GPU Address Translation Performance for HSA Platform
Date Issued
2015
Author(s)
Wu, Bing-Xuan
Abstract
Processor vendors have already embraced heterogeneous systems. A key component is a shared unified address space, which efficiently utilizes system memory and brings the programmability benefits of virtual memory to integrated CPU-GPU architectures. However, although GPUs are latency-tolerant, several studies show that the performance of virtual-to-physical address translation remains critical. In this thesis, we characterize the performance of a TLB design with per-core L1 Translation Look-aside Buffers (TLBs) and a shared L2 TLB. We find that the current block scheduler tends to allocate blocks that use the same TLB entry to different SMs, causing a high L1 TLB miss rate. Experimental results show that some workloads cannot be improved even with a large TLB. We therefore design a TLB-aware block scheduler to improve GPU address translation performance. With our proposed software and hardware support, the block scheduler knows in advance which TLB entries a block uses, so it can assign suitable blocks to an SM to maximize TLB reuse opportunities. The results show the TLB-aware block scheduler reduces the global TLB miss rate by 21% on average. Finally, it improves performance by 10% on average and by up to 22%.
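The core idea in the abstract, assigning blocks that touch the same pages to the same SM so their L1 TLB entries are reused, can be illustrated with a minimal sketch. This is a hypothetical greedy policy, not the thesis's exact algorithm; the function name and the assumption that each block's page set is known in advance (via the proposed software/hardware support) are illustrative.

```python
# Hypothetical sketch of TLB-aware block-to-SM assignment. Assumes the
# scheduler already knows, per block, the set of virtual pages (TLB entries)
# the block will touch, as the thesis's software/hardware support provides.
# The greedy overlap heuristic below is illustrative, not the thesis's method.

def tlb_aware_assign(blocks, num_sms):
    """Greedily assign each block to the SM whose already-assigned blocks
    share the most pages with it, maximizing L1 TLB entry reuse.

    blocks: list of sets, each the virtual page numbers one block touches.
    Returns a list of block-id lists, one per SM."""
    sm_blocks = [[] for _ in range(num_sms)]
    sm_pages = [set() for _ in range(num_sms)]  # pages likely resident in each SM's L1 TLB
    for bid, pages in enumerate(blocks):
        # Score each SM by page overlap; break ties toward the least-loaded SM.
        best = max(range(num_sms),
                   key=lambda s: (len(sm_pages[s] & pages), -len(sm_blocks[s])))
        sm_blocks[best].append(bid)
        sm_pages[best] |= pages
    return sm_blocks
```

With four blocks where blocks 0 and 1 touch page 0 and blocks 2 and 3 touch page 1, two SMs each receive the pair that shares a page, so each SM's L1 TLB serves its blocks with a single entry instead of thrashing between two.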
Subjects
Heterogeneous System Architecture (HSA)
GPGPU
Unified address space
Address translation
Block (CTA) scheduler
Type
thesis
File(s)
Name
ntu-104-R02922094-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):8221cb297d0d77938841a47fdfb2bb7b