Title: Trace-based Performance Analysis Tools on a Heterogeneous Multicore Platform (異質多核心系統上的效能追蹤分析工具)
Author: 孫天秀 (Soon, Thean-Siew)
Contributor: 洪士灝 (Hung, Shih-Hao)
Affiliation: National Taiwan University, Graduate Institute of Computer Science and Information Engineering (臺灣大學:資訊工程學研究所)
Year: 2009
Dates: 2010-05-17; 2018-07-05
Identifier: U0001-1908200910551400
URI: http://ntur.lib.ntu.edu.tw//handle/246246/183405
Type: thesis
Format: application/pdf, 1185242 bytes
Language: en-US
Keywords: tracing tools; performance evaluation; Cell processor; heterogeneous multicore; trace visualization
File: http://ntur.lib.ntu.edu.tw/bitstream/246246/183405/1/ntu-98-R96922143-1.pdf

Abstract (translated from Chinese):
Performance evaluation is a very important task when developing multicore programs. Although a few profiling tools and techniques are available for measuring performance on homogeneous multicore platforms, most of them rely on special hardware support on the platform. Analysis tools suited to heterogeneous multicore platforms are scarcer still. This thesis therefore presents ParallelTracer, a pure-software toolkit that instruments high-level-language source code to perform performance tracing and analysis on multicore platforms. Because ParallelTracer does not rely on hardware, it can be ported easily to other multicore platforms. In this thesis, we port ParallelTracer to IBM's Cell heterogeneous multicore platform to trace and analyze program performance and, through a graphical interface, help programmers understand program behavior and find potential performance bottlenecks. The thesis also examines the runtime overhead that ParallelTracer imposes on the original program. With reasonable use, ParallelTracer adds 9.37% to execution time and increases code size by about 9 KB.

Abstract:
Performance evaluation is key to optimizing applications on multicore systems. While many techniques and profiling tools are available for measuring performance on homogeneous multicore platforms, most of them depend on hardware support from the vendors. Very few analysis tools exist to help developers build applications on heterogeneous multicore systems. This thesis describes a pure software tracing toolkit, called ParallelTracer, that can be ported to a variety of platforms via code instrumentation at the source level. We use the IBM Cell processor as a case study to demonstrate the capabilities of ParallelTracer. Our results show that ParallelTracer provides useful information that helps programmers understand program behavior and identify potential performance bottlenecks via graphical visualization. We also discuss the runtime overhead of ParallelTracer. With proper usage, the performance and code size overheads introduced by our toolkit are about 9.73% and 9 KB, respectively, for the benchmark programs in the case study.

Table of contents:
Acknowledgements
Abstract (Chinese)
Abstract
List of Tables
List of Figures
1 Introduction
  1.1 Motivation
  1.2 The Cell Broadband Engine
  1.3 Trace Collection
  1.4 Trace Analysis
  1.5 Trace Visualization
  1.6 Thesis Organization
2 Related Works
  2.1 Performance Analysis Tools for Homogeneous Multicore Platforms
  2.2 Performance Analysis Tools for Heterogeneous Multicore Platforms
  2.3 Trace Collection and Trace Post-Processing Framework (TCPP)
    2.3.1 Preprocessing
    2.3.2 Collection
    2.3.3 Trace Interface
    2.3.4 Transformation
    2.3.5 Visualization
3 Trace Collection
  3.1 Overview
  3.2 Designing Trace Buffer Management Schemes
    3.2.1 N-PPE Threads
    3.2.2 One-PPE Thread
    3.2.3 Trace Gathering and Asynchronous I/O
  3.3 Implementation
    3.3.1 Communication
    3.3.2 Buffer Management
  3.4 Experiment
    3.4.1 Experiment Setup
    3.4.2 Experiment Results
4 Trace Analysis
  4.1 Analysis of Communication Patterns on the Cell Processor
    4.1.1 Edge Representation
    4.1.2 Node Representation
  4.2 Implementation
  4.3 Experimental Results
5 Trace Visualization
  5.1 Design of the Timeline Diagram Converter
    5.1.1 Timer Synchronization between Processors
    5.1.2 Resolving Two-Sided Communication Characteristic of Visualization Tool
  5.2 Implementation
  5.3 Results
6 Conclusion and Future Works
  6.1 Conclusion
  6.2 Future Works
Bibliography