針對MPEG-4編碼器及H.264移動估計之低功率電路設計

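陳良基
National Taiwan University: Graduate Institute of Electronics Engineering
林家平 (Lin, Chia-Ping)
2007-11-27
2018-07-10
2006
http://ntur.lib.ntu.edu.tw//handle/246246/57422

Low power consumption is an important requirement in systems with a limited power supply, such as mobile electronic devices. More and more functions are being added to today's mobile devices, and some of them, such as video compression, are computationally demanding. To implement these functions in a mobile device, low-power design becomes a very important consideration alongside compression efficiency and area. A video compression system can be roughly divided into several parts, including motion estimation, the discrete cosine transform, and entropy coding. Each has different characteristics in hardware implementation, so different algorithms and architectures must be adopted to meet the low-power requirement.

MPEG-4 is a video compression standard established in 1999. It is widely used in today's video compression systems, especially in mobile devices. It contains several basic but important video compression techniques, such as motion estimation, the discrete cosine transform, and entropy coding. In this thesis, we analyze several key components of the MPEG-4 encoder, such as motion estimation and the discrete cosine transform, and develop suitable low-power algorithms and hardware architectures for them. We increase data sharing in the motion estimation hardware to reduce memory reads, and we add dynamic computation to the discrete cosine transform hardware to save computation power. At the system level, we exploit the fact that a large share of the data is zero and use markers to eliminate reads and writes of zero-valued data. At the circuit level, we apply clock gating throughout the encoder to cut its power consumption efficiently. In the motion estimation hardware, memory reads fall to only 0.65%, with a quality drop of only about 0.05 dB. In the discrete cosine transform hardware, the computation is reduced by about 55%. The markers save 60% to 80% of the data reads and writes between quantization and entropy coding. Finally, the whole encoder was taped out in the TSMC 0.18 μm CMOS 1P6M process, with 201K logic gates and 4.56 KB of memory; at CIF 30 frames per second and a 1.3 V supply voltage, it meets the low-power target with only 5 mW.

H.264/AVC is a recently proposed video compression standard that has drawn wide attention for its outstanding compression ratio. Behind that compression ratio, however, lies far more computation than earlier standards required. To keep power consumption reasonable when encoding large H.264 frames, we design a low-power algorithm and architecture for integer motion estimation, the most computation-intensive part, reducing the encoder's power consumption and making it better suited to applications such as large-frame digital video cameras.

Low power consumption is an important requirement for battery-limited systems such as mobile devices. Many applications on mobile devices are computation-intensive and therefore consume a lot of power. Video encoding and decoding is one of them, and it needs dedicated low-power design at the algorithm and architecture levels to reduce power consumption. A video encoding system consists of several coding components, including motion estimation (ME), the discrete cosine transform (DCT), the inverse discrete cosine transform (IDCT), entropy coding, and others depending on the standard. These components have different computational characteristics, so different algorithms and architectures are needed to meet the low-power requirement while maintaining performance.

MPEG-4 is a video compression standard established in 1999.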
It has been widely adopted for video compression ever since. On mobile devices, MPEG-4 Simple Profile (SP) is a popular choice because of its simplicity and good coding performance. It contains basic but useful encoding components, such as ME, DCT, IDCT, AC/DC prediction, and variable length coding. We analyze key components of the MPEG-4 SP encoder, such as ME, DCT, and IDCT, and develop suitable low-power algorithms and architectures for them. After optimizing each module, we integrate them into a low-power MPEG-4 SP encoder. The power consumption of ME is reduced by a fast algorithm and a two-dimensional bandwidth sharing architecture. The power consumption of DCT and IDCT is reduced by content awareness. These algorithms achieve a large power reduction while maintaining tolerable coding performance. At the circuit level, fine-grained leaf-based clock gating is applied to most registers in the design.

A 2-D data sharing architecture is proposed for the ME design. To reduce computational complexity, a moving-window search with a modified predictor scheme is adopted. It reduces computation while degrading quality by less than 0.05 dB compared with full search. The final bandwidth requirement is reduced to 0.65% of that of full search without data sharing.

Adaptive DCT is proposed for content-aware computation. It combines several low-power techniques and solves the precision problem through coefficient scaling, a hybrid architecture, and the proposed content classification algorithm. The high probability of zero values is exploited in the IDCT and in the data transfer between quantization (Q) and variable length coding (VLC). Our IDCT adopts the earlier design proposed by Xanthopoulos [1], with coefficient scaling, and achieves low power in computations on zero-valued data. A zero marker scheme is proposed to avoid transferring zero-valued data. Zero-valued data are recorded in registers, so memory read and write operations for them can be avoided.
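The zero marker idea can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the hardware design of the thesis: the 64-coefficient block size, the one-bit-per-coefficient marker, the function names `write_block`, `read_block`, and `access_saving`, and the Python list standing in for the memory between Q and VLC are all hypothetical choices made for this example.

```python
from typing import List, Tuple

BLOCK_SIZE = 64  # assumption: one 8x8 block of quantized coefficients


def write_block(coeffs: List[int]) -> Tuple[int, List[int]]:
    """Producer side (after Q): return (marker, memory) for one block.

    marker is a bit mask held in a register (bit i set means coeffs[i] != 0).
    memory receives only the nonzero values, so zeros cost no memory write.
    """
    if len(coeffs) != BLOCK_SIZE:
        raise ValueError("expected %d coefficients" % BLOCK_SIZE)
    marker = 0
    memory: List[int] = []
    for i, c in enumerate(coeffs):
        if c != 0:
            marker |= 1 << i
            memory.append(c)
    return marker, memory


def read_block(marker: int, memory: List[int]) -> List[int]:
    """Consumer side (before VLC): rebuild the block.

    Zeros are regenerated from the marker bits; only nonzero values
    are fetched from memory.
    """
    stored = iter(memory)
    return [next(stored) if (marker >> i) & 1 else 0 for i in range(BLOCK_SIZE)]


def access_saving(coeffs: List[int]) -> float:
    """Fraction of memory writes avoided for this block in this model."""
    _, memory = write_block(coeffs)
    return 1 - len(memory) / BLOCK_SIZE
```

In this model, a block with 16 nonzero coefficients out of 64 avoids 75% of the memory writes. The actual saving depends on the video content and the quantizer, and the real design may organize the markers and storage differently.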
The zero marker scheme reduces memory access between Q and VLC by 60% to 80%.

Finally, the encoder chip is fabricated in the TSMC 0.18 µm CMOS 1P6M process. It contains 201K logic gates and 4.56 KB of SRAM. It supports CIF 30 fps encoding with acceptable performance and supports VGA 30 fps as an extended resolution. The post-layout gate-level power consumption estimated with Synopsys PrimePower is 5.9 mW for I-VOP encoding and 9.7 mW for P-VOP encoding at 1.8 V and CIF 30 fps. The estimated power of the real chip is 2.5 mW for I-VOP encoding and 5 mW for P-VOP encoding at 1.3 V and CIF 30 fps. This is a large power reduction compared with previous works.

H.264 is the newest video compression standard, developed by the Joint Video Team (JVT). It can reduce bit rate by 39%, 49%, and 64% compared with MPEG-4, H.263, and MPEG-2, respectively. Its excellent coding performance has led to wide adoption in commercial applications, including digital TV broadcasting, next-generation DVD, and network streaming. This performance makes H.264 suitable for high-resolution video compression, but it also brings a huge computation overhead and consumes considerable hardware resources and power. To address this problem, we focus on the integer motion estimation (IME) part of the H.264 encoder, which accounts for most of the computation, especially at high resolutions such as high-definition DV (HDTV). We propose a hierarchical ME algorithm that reduces computational complexity to 0.45% of that of full search and improves coding performance. The corresponding architecture can process block matching at three different levels and supports a good data sharing scheme at each of them. This makes it suitable for low-power H.264 encoder design for high-resolution applications.

1 INTRODUCTION
1.1 Video Coding Standards
1.2 Power Issues
1.3 MPEG-4 Standard Overview
1.4 H.264/MPEG-4 AVC Standard Overview
1.5 Thesis Organization
2 OVERVIEW OF MPEG-4 SIMPLE PROFILE
2.1 Temporal Prediction
2.1.1 Motion Estimation
2.1.2 Motion Vector Prediction
2.2 Block Engine
2.2.1 AC/DC Prediction
2.2.2 Transformation and Quantization
2.2.3 Reconstruction
2.2.4 Entropy Coding
3 RELATED LOW-POWER RESEARCHES OF MPEG-4 SYSTEM
3.1 Motion Estimation
3.1.1 Algorithms
3.1.2 Architectures
3.2 Discrete Cosine Transform
3.2.1 Multiplier-Accumulator Architecture
3.2.2 Distributed Arithmetic Architecture
4 LOW-POWER MPEG-4 SP ENCODER DESIGN
4.1 Motion Estimation
4.1.1 ME Algorithm
4.1.2 2-D data sharing architecture
4.1.3 Performance
4.2 Adaptive Discrete Cosine Transform
4.2.1 Content-Aware DCT Algorithm
4.2.2 Adaptive DCT Architecture
4.3 System Design
4.3.1 Motion Estimation
4.3.2 Content-Aware Block Engine
4.3.3 Power Reduction
5 CHIP IMPLEMENTATION
5.1 Low Power Techniques
5.1.1 Clock Gating
5.1.2 Voltage Scaling Down
5.2 Design Flow
5.3 Design for Test Considerations
5.3.1 Implementation Result of Low-Power MPEG-4 SP Encoder
6 CONCLUSION
7 APPENDIX: LOW-POWER H.264/AVC IME DESIGN FOR HIGH RESOLUTION APPLICATIONS
7.1 Introduction to H.264 Temporal Prediction
7.1.1 Multiple Reference Frames
7.2 Variable Block Sizes
7.2.1 Interpolation
7.2.2 Motion Vector Predictor
7.2.3 Skip Mode
7.3 Hierarchical-based low-Power IME Algorithm
7.4 Low-Power H.264/AVC IME Architecture
7.5 Summary

1666194 bytes
application/pdf
en-US
MPEG-4 encoder; H.264 motion estimation; H.264 IME; low power
Low-Power Architecture Design for MPEG-4 SP Encoder and H.264 Motion Estimation
thesis
http://ntur.lib.ntu.edu.tw/bitstream/246246/57422/1/ntu-95-R93943018-1.pdf