洪士灝
National Taiwan University: Graduate Institute of Computer Science and Information Engineering
張筱薇 (Chang, Hsiao-Wei)
2007-11-26
2018-07-05
2007
http://ntur.lib.ntu.edu.tw//handle/246246/53882

The technique of increasing the clock rate to speed up application performance has reached bottlenecks such as power dissipation, design complexity, and diminishing returns from increasing Instruction Level Parallelism (ILP) support \cite{LDMoore}. Computer architects have therefore designed multi-core processors by placing two or more processing cores on the same chip. However, as the number of cores grows, simulation run time increases because of simulation complexity and code size. These long simulation times limit the ability to predict application performance during the design phase. In this study, we propose a performance evaluation framework that aims to give a quick estimate of performance during the early design phases. The framework achieves its speedup by capturing the architecture-independent characteristics of an application in an application model and simulating that model with a high-level architecture model. We use MiBench, a free, commercially representative embedded benchmark suite, as our evaluation test case and verify the results by comparing them with a robust cycle-accurate simulator, ARM SoC Designer. For homogeneous workloads on the single-core, dual-core, and 4-core systems, we obtained an average speedup of 2.1X over ARM SoC Designer. The average error rates were 0%, 5%, and 11% on the single-core, dual-core, and 4-core systems, respectively. The bitcount workload has the highest error rate of all benchmarks. We propose several schemes to reduce the errors as potential future work.

List of Figures
Chapter 1 Introduction
  1.1 Motivation
  1.2 Objective and Contributions
  1.3 Thesis Organization
Chapter 2 Background and Related Work
  2.1 Background
  2.2 Related Work
Chapter 3 Simulation Methodology
  3.1 General Description
    3.1.1 The Architecture-independent Characteristics
    3.1.2 Overview of Architecture Model
  3.2 Specifications and Application Model
    3.2.1 The Function Call Paths
    3.2.2 The Performance Metrics
    3.2.3 The Application Model
  3.3 The Architectural Model
  3.4 Summary
Chapter 4 Experimental Results on Embedded System
  4.1 Workloads and Evaluation Method
  4.2 Results and Analysis
    4.2.1 Benchmark: Bitcount Issues
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
References

327014 bytes
application/pdf
en-US
simulator, simulation, multi-core, multi-threading
設計與實作一個快速分析多執行緒應用程式效能之多核心系統模擬環境
A Rapid Simulation Environment for Application Performance Estimation on Parameterized Multi-core/Multi-threading Architecture Models
thesis
http://ntur.lib.ntu.edu.tw/bitstream/246246/53882/1/ntu-96-R94922117-1.pdf