Advisor: 王勝德
Institution: National Taiwan University, Graduate Institute of Electrical Engineering
Author: 穆達新 (Mutlugün, Tahsin Türker)
Record dates: 2014-11-28, 2018-07-06
Year: 2014
Handle: http://ntur.lib.ntu.edu.tw//handle/246246/262940

Abstract:
High-Level Synthesis (HLS) targeting FPGAs has been widely used for high-performance computing. With the introduction of OpenCL, some HLS research has shifted towards bringing OpenCL to FPGAs. This thesis presents an OpenCL architecture for FPGAs and focuses on memory access improvements with the goal of achieving optimal performance. In OpenCL compute blocks, there is usually a linear relation between computation time and local memory access latency. This latency is normally hidden by increasing the parallel workload. However, with such an approach, the target FPGA device can easily run out of resources. In this work, conflict-free multiported memories are used instead to minimize local memory access latency. Experiments show that multiported memories increase computation speed and reduce the parallel workload required for maximum throughput to a practical level.

Table of contents:
Oral Examination Committee Certification
Acknowledgements (Chinese)
Acknowledgements
Abstract (Chinese)
Abstract
1 Introduction
  1.1 Motivation
  1.2 Related Work
  1.3 Contributions
2 Background
  2.1 OpenCL Architecture
    2.1.1 Platform Model
    2.1.2 Execution Model
    2.1.3 Memory Model
  2.2 Hardware Acceleration
    2.2.1 Memory Optimization
3 OpenCL on FPGA
  3.1 Host-Device Interface
  3.2 Execution
    3.2.1 Compute Unit
    3.2.2 Processing Element
  3.3 Memory Hierarchy
    3.3.1 Global Memory
    3.3.2 Local Memory
    3.3.3 Private Memory
  3.4 Multiported Memories
    3.4.1 Multipumping
    3.4.2 Live-Value Table
4 Experimental Results
  4.1 Experimental Setup
  4.2 Multiported Memory Results
  4.3 Performance
    4.3.1 Matrix Multiplication
    4.3.2 Sobel Edge Detector
    4.3.3 Discrete Cosine Transform
5 Conclusion
  5.1 Summary
  5.2 Future Work
Bibliography

File size: 3528233 bytes
Format: application/pdf
Thesis release date: 2016/03/09
Thesis usage rights: paid licensing authorized (royalties returned to the university)
Keywords: OpenCL; programmable array logic; multiported memory; high-level synthesis
SDGs: SDG16
Title (Chinese): OpenCL在可程式邏輯陣列上運算使用多埠共享記憶體
Title: OpenCL Computing on FPGA Using Multiported Shared Memory
Type: thesis
File: http://ntur.lib.ntu.edu.tw/bitstream/246246/262940/1/ntu-103-R00921087-1.pdf