Few Data Shuffles to Upgrade Whole-Function Vectorization
Date Issued
2015
Author(s)
Han, Cheng-Ting
Abstract
General-purpose computation on GPUs, commonly abbreviated as GPGPU, has recently received great attention by virtue of GPUs' excellent parallel computing power. Once designed specifically for computer graphics and difficult to program, today's GPUs are general-purpose parallel processors that support accessible programming interfaces and industry-standard languages such as C. Among general-purpose parallel programming languages, OpenCL stands out as the first open standard for cross-platform parallel programming of heterogeneous systems. In 2011, researchers at Saarland University published Whole-Function Vectorization, a technique that lets OpenCL kernels run efficiently on CPUs. In 2012 the same authors published a follow-up, Improving Performance of OpenCL on CPUs, which further optimizes the vectorization process. By examining the kernels of many applications, we found that some branches exhibit static divergence arising from the OpenCL function get_global_id. Whole-Function Vectorization treats these static divergences as varying branches, so the compiled code is longer and runs less efficiently. In this thesis, we therefore propose a mechanism that uses a few data shuffles to upgrade Whole-Function Vectorization. Through a data-shuffle algorithm and several revisions to Whole-Function Vectorization, we change the treatment of static divergences from varying branches to uniform branches, which substantially reduces the execution time of kernels with static divergences. We applied this work to the version of Whole-Function Vectorization adapted by the CSE department of MediaTek and obtained 1.16-1.25x speedups on the Rodinia benchmark suite.
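The sketch below illustrates the idea of a static divergence. It is a hypothetical example and not the thesis's actual data-shuffle algorithm. It assumes a 4-wide SIMD unit, 16 work-items, and a kernel branch on `get_global_id(0) % 2 == 0`, modeled here by the made-up helper `takes_branch`.

- **Natural order:** in the order work-items are packed into vectors, every group mixes both branch outcomes, so a vectorizer must treat the branch as varying.
- **After a shuffle:** one possible reordering puts even ids before odd ids, and then every group is uniform.

```python
SIMD_WIDTH = 4   # assumed vector width (e.g. 4 x 32-bit lanes)
NUM_ITEMS = 16   # hypothetical global work size

def takes_branch(gid: int) -> bool:
    """Hypothetical kernel condition that depends only on get_global_id(0)."""
    return gid % 2 == 0

def uniform_groups(ids: list[int]) -> int:
    """Count SIMD groups whose lanes all take the same side of the branch."""
    count = 0
    for start in range(0, len(ids), SIMD_WIDTH):
        outcomes = {takes_branch(g) for g in ids[start:start + SIMD_WIDTH]}
        count += len(outcomes) == 1  # a single outcome means the group is uniform
    return count

natural = list(range(NUM_ITEMS))
# One possible shuffle (an assumption for illustration): even ids first, then odd ids.
shuffled = [g for g in natural if takes_branch(g)] + [g for g in natural if not takes_branch(g)]

print(uniform_groups(natural), uniform_groups(shuffled))  # 0 4
```

Because the condition depends only on the global id, the divergence pattern is known before the kernel runs. That is what makes a compile-time reordering possible at all. A data-dependent branch offers no such guarantee.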
Subjects
Whole-Function Vectorization
OpenCL on CPUs
SIMD instructions
Data Shuffle
Static Divergence
Optimization
Type
thesis
File(s)
Name
ntu-104-R00922070-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):622ce00fdcc179cfd92ed052aee623a2
