Title: Design of On-chip Multi-Processor and its Verification Model (晶片上多核處理器與其驗證模型之設計)
Type: thesis
Contributor: 陳少傑 (Chen, Sao-Jie)
Contributor: National Taiwan University, Graduate Institute of Electrical Engineering (臺灣大學:電機工程學研究所)
Creator: 林光輝 (Lin, Guang-Huei)
Year: 2009
Record dates: 2010-07-01; 2018-07-06
Identifier: U0001-1208200913554600
Handle: http://ntur.lib.ntu.edu.tw//handle/246246/187998

Abstract (translated from Chinese):
This thesis presents the results of a research project aimed at developing an on-chip multi-core processor architecture for embedded multimedia systems. In recent years, on-chip multi-core processors have become a focus of digital circuit design, computer-aided design, and embedded system development. Our work centers on the design of a novel chip design platform based on the PLX single instruction multiple data (SIMD) instruction set architecture. This thesis examines several aspects of the research results, including various single-processor and multi-processor micro-architectures, system-level hardware/software co-design and co-verification, and parallelization methods.

Abstract:
This Dissertation presents the outcomes of a research project aimed at developing a multi-processor System-on-Chip (SoC) architecture for embedded multimedia systems. Since its inception a decade ago, SoC has captured the attention of application-specific integrated circuit (ASIC) design houses, computer-aided design (CAD) companies, and embedded system developers. In particular, the immense popularity of killer multimedia gadgets, such as the iPod and the smartphone, has fueled unprecedented interest in developing new-generation multimedia SoC systems.

We focused on the design of a novel SoC platform based on the PLX Subword-Parallel Single Instruction Multiple Data (SWP-SIMD) instruction set architecture. Most of the material in this Dissertation is drawn from the outcomes of our research project. Several single-processor and multi-processor micro-architectures are studied in depth and adapted to our design. The high level of integration, however, also poses great challenges to system designers: hardware and software design are converging and must proceed as fully concurrent efforts. System-level hardware/software co-design and co-verification methodologies are therefore also discussed in this Dissertation.

Table of contents:
ABSTRACT
LIST OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF CODES
CHAPTER 1. INTRODUCTION
CHAPTER 2. ASIP DESIGN
  2.1 PLX Processor Design
    2.1.1 SWP-SIMD
    2.1.2 Fixed Point
    2.1.3 Permutation
    2.1.4 Saturation Arithmetic
    2.1.5 Critical Path Analysis
  2.2 Implementation of ME on PLX
  2.3 PLX2 Processor Design
    2.3.1 MAC on VLIW
    2.3.2 Reconfigurable VLIW/SIMD
    2.3.3 VLIW Limitation
    2.3.4 SMT
    2.3.5 Power Efficiency Consideration
    2.3.6 PLX2 Performance
CHAPTER 3. SYSTEM LEVEL DESIGN AND VERIFICATION
  3.1 Memory Sharing
  3.2 Message Pass over Private Cache
  3.3 TLM
  3.4 OpenMP to TLM
CHAPTER 4. PARALLELIZATION
  4.1 Vectorization
    4.1.1 Dependence Analysis
    4.1.2 Loop Normalization
    4.1.3 Loop Transformation
    4.1.4 Dependence Removal
    4.1.5 Strongly Connected Component
    4.1.6 Loop Distribution
  4.2 SIMDization
    4.2.1 Control Flow Conversion
    4.2.2 Memory Alignment
    4.2.3 Permutation Optimization
    4.2.4 Subword Fusion
    4.2.5 Matrix Transposition
    4.2.6 Reduction
    4.2.7 Loop Unrolling
  4.3 ILP Scheduling
    4.3.1 Software Pipelining
    4.3.2 Basic Block Extension
  4.4 TLP Scheduling
    4.4.1 Profiling
    4.4.2 Structuring
  4.5 SIMDization for Memory Access Redundancy Optimization
    4.5.1 Spatial Image Filter
    4.5.2 SAD
    4.5.3 Matrix Multiplication
    4.5.4 Performance Analysis
CHAPTER 5. CONCLUSION
REFERENCES
BIOGRAPHY

Keywords: System-on-Chip (晶片上系統); Multi-core Processor (多核處理器); SoC; Multi-core; Processor
Language: en-US
Format: application/pdf (986258 bytes)
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/187998/1/ntu-98-D87921034-1.pdf