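陳良基 (Chen, Liang-Gee)
National Taiwan University: Graduate Institute of Electronics Engineering
程之奇 (Cheng, Chih-Chi)
2010-07-14
2018-07-10
2009
U0001-2807200910510100
http://ntur.lib.ntu.edu.tw//handle/246246/189148

Reducing power consumption while still meeting the system specification, that is, designing for high energy efficiency, is a very important direction in current very-large-scale integration (VLSI) design. We consider that architecture design in this area falls into three broad directions: optimization of signal processing circuits, low-power use of on-chip memory, and low-power use of off-chip memory. This thesis explores energy-efficient design techniques and presents the design and implementation of five chips as design examples. Optimization of signal processing circuits has already been widely discussed in the literature, so this thesis focuses on the low-power use of on-chip and off-chip memory.

The power consumed by off-chip memory access typically accounts for about 70% of total system power, so it is a very important part of energy-efficient chip design. We discuss three techniques for improving energy efficiency: embedded compression, data reuse, and system-on-chip integration. We designed and implemented three chips as examples of off-chip memory optimization: an embedded compression/decompression chip for power-aware video compression systems reduces off-chip memory access power in a video compression system by 62%; the update stage design in a scalable video compression system reduces off-chip memory access power by 61% through reuse of on-chip data; finally, the iVisual integrated intelligent video sensor system-on-chip uses system integration to completely eliminate the need for off-chip memory access.

We also propose three techniques for improving the energy efficiency of on-chip memory use: a minimum word-length analysis methodology, on-chip memory hierarchy design, and reduction of data buffering time. Word-length analysis for the two-dimensional multi-level discrete wavelet transform is our first design example; the proposed analysis method guarantees that signal overflow is avoided and achieves an analysis accuracy of 0.1 dB PSNR while greatly reducing analysis complexity. The proposed multiple-lifting two-dimensional discrete wavelet transform architecture uses an on-chip memory hierarchy to reduce on-chip memory power consumption by 78%. The last design example is a JPEG 2000 compression/decompression chip that uses reduced data buffering time, cutting memory power consumption by 95%.

Reducing power consumption while meeting the throughput requirement is important in VLSI design for most, if not all, applications. We think most power efficient VLSI design techniques can be classified into three categories: power optimization of signal processing circuits, on-chip memory power reduction techniques, and off-chip memory power reduction techniques.

In this thesis, these power efficient VLSI design techniques are explored with real design examples. Five chips are implemented and measured to verify the design techniques. Power optimization techniques for signal processing circuits have been thoroughly and systematically discussed in the literature. Therefore, we focus on design techniques for off-chip memory power reduction and on-chip memory power reduction.

Three design techniques for off-chip memory access reduction are discussed.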
According to data reported by TOSHIBA, off-chip memory access accounts for more than 67% of the total power consumption in a portable multimedia recording system. We discuss three techniques to reduce off-chip memory access: embedded compression, on-chip data reuse, and SoC integration. Three design examples are provided. A multi-mode embedded compression codec chip reduces off-chip memory power in video coding systems by 62%; an update-step engine in a scalable video encoder reduces off-chip memory power by 61% by combining different on-chip data reuse schemes; and the iVisual intelligent visual sensor SoC eliminates all off-chip pixel data traffic through its SoC architecture.

Another three design techniques, for on-chip memory power reduction, are discussed: on-chip memory bit-width requirement analysis, on-chip memory hierarchy design, and lifetime reduction of intermediate data. Four design examples are shown in this part. A bit-width analysis methodology for the multi-level 2-D discrete wavelet transform (DWT) provides a tight upper bound on dynamic range and an accurate round-off error analysis with 0.1 dB PSNR prediction error; a multiple-lifting 2-D DWT scheme reduces SRAM power by 78% with an on-chip memory hierarchy; the memory hierarchy design of iVisual reduces total power consumption by 62%; and a level-switching DWT scheme for a JPEG 2000 codec reduces SRAM power by 95% by shortening the lifetime of intermediate data.

In brief, we explore, design, and implement a series of power efficient VLSI design techniques. We hope that, through this systematic discussion, the thesis offers useful information to designers.

1 Introduction
  1.1 The Importance of Power Efficient Digital VLSI Design
  1.2 Overview of Power Efficient VLSI Design Techniques
  1.3 Power Optimization of Signal Processing Circuits
    1.3.1 Parallel/Pipelining Design
    1.3.2 Multi-Threshold Techniques
    1.3.3 Dynamic Voltage-Frequency Scaling
    1.3.4 Other Techniques
  1.4 Thesis Contributions
  1.5 Thesis Organization
I Off-Chip Memory Power Reduction
2 Embedded Compression (EC)
  2.1 Introduction
  2.2 Design Issues
  2.3 Design Example: Multi-Mode Embedded Compression Codec for Power-Aware Video Coding Systems
    2.3.1 EC in Video Coding Systems
    2.3.2 The Adopted SPIHT Algorithm
    2.3.3 The Proposed Four-Tree Pipelining Scheme
    2.3.4 Overall Scheduling of Four-Tree Pipelining Scheme
    2.3.5 VLSI Hardware Architecture Design
    2.3.6 Experimental Results
  2.4 Summary
3 On-Chip Data Reuse Schemes
  3.1 Introduction
  3.2 Design Issues
  3.3 Design Example: An Update Stage Engine in a Scalable Video Encoder
    3.3.1 The Update Stage in a Motion-Compensated Temporal Filter (MCTF)
    3.3.2 The Proposed Scheme of Deriving Inverse MVs
    3.3.3 The Proposed ME-based Level C+ MC
    3.3.4 VLSI Hardware Architecture
    3.3.5 Experimental Results
  3.4 Summary
4 SoC Integration
  4.1 Introduction
  4.2 Design Issues
  4.3 Design Example: iVisual: an Intelligent Visual Sensor SoC with 2790fps CMOS Image Sensor and 205 GOPS/mW Vision Processor
    4.3.1 The "Sensor and Pixel Everywhere" World
    4.3.2 System Architecture
    4.3.3 Module Architecture Design
    4.3.4 Physical Design
    4.3.5 Experimental Results
  4.4 Summary
II On-Chip Memory Power Reduction
5 Bit-Width Analysis Methodology for On-Chip Storage
  5.1 Introduction
  5.2 Bit-Width Analysis for a General Linear Time-Invariant (LTI) System
    5.2.1 Basic Principles of Dynamic Range Estimation in LTI Systems
    5.2.2 Basic Principles of Round-off Error Estimation in LTI Systems
  5.3 Design Issues
  5.4 Design Example: Bit-Width Analysis Methodology for 2-D Discrete Wavelet Transform (DWT)
    5.4.1 On-Chip Buffers in 2-D DWT
    5.4.2 Factors Influencing the Temporary Buffer Bit-Width
    5.4.3 The Proposed Dynamic Range Analysis Methodology
    5.4.4 The Proposed Round-off Error Analysis Methodology
    5.4.5 Experimental Results and Discussion
  5.5 Summary
6 On-Chip Memory Hierarchy Design
  6.1 Introduction
  6.2 Design Issues
  6.3 Design Example 1: Multi-Lifting Scheme for Memory Efficient VLSI Implementation for Line-Based 2-D DWT
    6.3.1 On-Chip Buffers in 2-D DWT
    6.3.2 Direct Implementation: Decrease Clock Rate by Parallel Processing
    6.3.3 Proposed N-Lifting Scheme
    6.3.4 Proposed M-Scan for Multiple-lifting Scheme
    6.3.5 Results and Discussions
    6.3.6 Chip Implementation
  6.4 Design Example 2: the PE Register File (PERF) in iVisual Chip
  6.5 Summary
7 Data Lifetime Reduction for Intermediate Data
  7.1 Introduction
  7.2 Design Issues
  7.3 Design Example: Level-Switching DWT in JPEG 2000 Codec with a Parallel Embedded Block Coding Architecture
    7.3.1 JPEG 2000 Standard
    7.3.2 The Level-Switching Scheduling
    7.3.3 Architecture of the level-switching DWT
    7.3.4 Experimental Results
    7.3.5 Impacts Made by the Developed Techniques
    7.3.6 Chip Implementation and Comparisons
  7.4 Summary
III Conclusion
8 Conclusion
  8.1 Summary
  8.2 The Effects Caused by the Developed Techniques
  8.3 The Correlations among the Discussed Techniques
Bibliography

3464868 bytes
application/pdf
en-US
digital; low power; integrated circuits; visual recognition; video compression; multimedia
VLSI; SIMD; Recognition; Video Coding; Multimedia
針對高能量使用效率為目標所進行之數位積體電路設計技巧探究 (An Exploration of Power Efficient Digital VLSI Design Techniques)
thesis
http://ntur.lib.ntu.edu.tw/bitstream/246246/189148/1/ntu-98-F93943033-1.pdf