楊佳玲
臺灣大學: 資訊工程學研究所 (National Taiwan University, Graduate Institute of Computer Science and Information Engineering)
楊登峰 (Yang, Teng-Feng)
2010-06-09
2018-07-05
2009
U0001-1708200912165800
http://ntur.lib.ntu.edu.tw//handle/246246/185429

Abstract (translated from the Chinese):
With advances in process technology, multi-core processors have become a major approach to building high-performance processors. In a multi-core architecture, each processor core may be equipped with its own private cache, and multiple cores may also share a large common cache. Because overall system performance is highly correlated with cache efficiency, optimizing data access patterns can improve system performance, and a well-designed task schedule can achieve this goal effectively. However, the complexity of the cache organization in multi-core systems makes it difficult to optimize task scheduling by hand, so a good automated scheduling-optimization tool is necessary. In this thesis, we propose a new task scheduling policy that improves cache efficiency by reducing capacity misses and coherence misses through better cache affinity, a smaller memory footprint, and reduced coherence traffic. We implement this policy in the task scheduler of a parallel programming model, Threading Building Blocks. Developers can specify each task's data footprint and sharing relationships through an application programming interface. Experimental results show that, compared with other task scheduling policies, the proposed policy effectively reduces program execution time and achieves higher system performance.

Abstract (English):
As technology shrinks and the number of transistors on a single chip increases, multi-core processors have become the major way to build high-performance processors. In multi-core processors, the processing cores may have separate private caches and/or share a large common cache. Since system performance highly depends on cache utilization, the data access pattern should be optimized to improve performance, and good task scheduling is an effective way to do so. However, the cache organizations of multi-core systems are quite complex, and it is hard to optimize the schedule manually; a good tool is therefore required. In this thesis, we try to minimize capacity and coherence misses through affinity improvement, footprint reduction, and coherence traffic minimization. We propose a scheduling policy that integrates these techniques to reduce cache misses effectively. We also implement the policy in the scheduler of a parallel programming model, Threading Building Blocks (TBB).
Programmers can easily specify the footprint and sharing group of each task through an API provided by TBB, and the scheduler then optimizes cache utilization accordingly. We believe this tool can reduce programming complexity by hiding the details of cache utilization optimization while still providing high performance.

Table of Contents
Abstract ... i
1 Introduction ... 1
 1.1 Overview of this Thesis ... 5
 1.2 Organization of this Thesis ... 6
2 Related Works ... 8
 2.1 Maximize data reuse ... 8
 2.2 Minimize memory footprint ... 12
 2.3 Minimize data sharing overhead ... 14
3 Cache Performance Consideration in CMP ... 17
 3.1 Data Reuse ... 17
 3.2 Memory Footprint ... 20
 3.3 Coherence ... 23
4 Cache-Aware Task Scheduling Policy ... 26
 4.1 Optimize private cache performance ... 27
 4.2 Optimize shared cache performance ... 28
 4.3 Optimize both private and shared cache performance ... 31
5 Implement Cache-Aware Task Scheduling Policy ... 35
 5.1 Threading Building Blocks ... 35
 5.2 Target Parallel Programming Model ... 38
 5.3 Detail Algorithm ... 40
6 Experimental Results and Evaluation ... 45
 6.1 Experimental Setup ... 45
 6.2 Evaluation ... 48
  6.2.1 Experimental Results on Intel Q9300 ... 49
  6.2.2 Experimental Results on Intel i7 ... 53
7 Conclusion ... 56
Bibliography ... 57

File: 2252159 bytes, application/pdf
Language: en-US
Keywords: 多核心 (Multi-core); 工作排程 (Task scheduling); 快取記憶體 (Cache)
Title: 多核心平台上之考慮快取記憶體之工作排程策略 (Cache-aware task scheduling for multi-core architectures)
Type: thesis
http://ntur.lib.ntu.edu.tw/bitstream/246246/185429/1/ntu-98-R96922040-1.pdf
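To make the idea in the abstract concrete, the following is a minimal, self-contained sketch of a cache-aware placement heuristic: tasks annotated with a footprint and a sharing group are packed onto the core already running their group (improving affinity and avoiding coherence traffic) unless that core's accumulated footprint would exceed the private-cache capacity (limiting capacity misses). All type and function names here are illustrative assumptions for exposition; this is not the thesis's actual TBB scheduler code or API.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical per-task annotation, standing in for what the thesis's API
// lets programmers specify: data footprint and sharing group.
struct Task {
    std::size_t footprint;  // bytes of data the task touches
    int sharing_group;      // tasks in the same group share data
};

// Greedy cache-aware placement (illustrative, not the thesis's algorithm):
// returns, for each task, the core it is assigned to.
std::vector<int> schedule(const std::vector<Task>& tasks,
                          int num_cores, std::size_t cache_capacity) {
    std::vector<std::size_t> load(num_cores, 0);  // accumulated footprint per core
    std::map<int, int> group_core;                // sharing group -> preferred core
    std::vector<int> placement(tasks.size());

    for (std::size_t i = 0; i < tasks.size(); ++i) {
        int core;
        auto it = group_core.find(tasks[i].sharing_group);
        if (it != group_core.end() &&
            load[it->second] + tasks[i].footprint <= cache_capacity) {
            // Reuse the group's core: shared data is likely still warm there.
            core = it->second;
        } else {
            // Otherwise pick the least-loaded core (footprint balancing).
            core = 0;
            for (int c = 1; c < num_cores; ++c)
                if (load[c] < load[core]) core = c;
            group_core[tasks[i].sharing_group] = core;
        }
        load[core] += tasks[i].footprint;
        placement[i] = core;
    }
    return placement;
}
```

With two cores and a 1000-byte capacity, two tasks of sharing group 0 land on the same core, while a task of group 1 goes to the other, less-loaded core; once a group's footprint would overflow the cache, later tasks of that group spill to other cores instead.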