https://scholars.lib.ntu.edu.tw/handle/123456789/174382
Title: | H.264/AVC Standard Encoder Supporting 720-Scan-Line High-Definition Digital Video / Design and Implementation of H.264/MPEG-4 AVC Encoder for SDTV/HDTV Application |
Author: | 陳東杰 Chen, Tung-Chien |
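Keywords: | encoder; standard; integrated circuit; image compression; VLSI; JVT; standard; h.264; video; compression; AVC | Issue Date: | 2004 | Abstract: | H.264/MPEG-4 AVC is the latest-generation compression standard. It improves compression efficiency by nearly 50%, at the cost of a large increase in complexity. This thesis proposes a four-stage macroblock-pipelined system architecture for an H.264 video encoding system, together with hardware architectures for the core computations in each stage. We first analyze the H.264/MPEG-4 video coding algorithms with a software C model to identify the difficulties of hardware implementation, and then verify our system architecture with a Verilog C model. Four-stage macroblock pipelining, combined with hardware-oriented algorithms and scheduling, avoids the constraints imposed by the coding loop and by algorithmic dependencies. Different degrees of parallelism and different architectures are applied to the core computations of each stage according to the algorithm, which achieves high hardware utilization and maps the whole video coding algorithm onto hardware efficiently. In addition, sharing on-chip memories and transferring data locally reduces the bandwidth requirement from the tera-byte scale to 223 Mbyte/sec. Adjusting the Lagrangian mode decision further improves compression quality.
For the module architectures, many new hardware architectures and schedules are proposed to support the many new compression tools of H.264: in integer motion estimation, a Parallel SAD Tree architecture with eight sets of 128 processing elements and a snake-scan flow; in fractional motion estimation, a decomposition of the Lagrangian inter-mode decision loops and an architecture with 4x4 processing units and on-line interpolation; in intra prediction, a reconfigurable intra prediction unit with four-fold parallelism, an interleaved schedule of 4x4 and 16x16 blocks, and a partial distortion elimination algorithm; in the texture transform module, a multi-function reconfigurable transform unit; in the bitstream coding module, a CAVLC encoder with 4x4-block pipeline scheduling; and a deblocking filter with a built-in reconfigurable transpose register. Integrating these key modules with the proposed encoding system enables real-time H.264 Baseline Profile compression. At an operating frequency of 81 MHz, the design supports SDTV at 30 frames per second with four reference frames; at 108 MHz, it supports HDTV at 30 frames per second with one reference frame.
Finally, an H.264 encoder chip is implemented in UMC 0.18 μm 1P6M process technology. According to the synthesis and place-and-route results, the prototype chip has a total gate count of 969K, measures 9.92x4.93 mm^2, and reaches a maximum operating frequency of 120 MHz. Operating at 120 MHz and 1.8 V, it consumes 634.9 mW.
The new video coding standard, H.264/AVC, developed by the Joint Video Team (JVT), significantly outperforms previous standards in compression thanks to new features including motion estimation (ME) with variable block sizes and multiple reference frames, intra prediction, context-based adaptive variable length coding (CAVLC), context-based adaptive binary arithmetic coding (CABAC), an in-loop deblocking filter, and more. Compared with MPEG-4, H.263, and MPEG-2, H.264/AVC achieves bit-rate reductions of 39%, 49%, and 64%, respectively. The penalty is huge computational complexity: up to 3.6 tera-instructions per second of computation and 5.6 tera-bytes per second of memory access are required for baseline profile level 3.1 (one reference frame and H±64/V±32 full search). Hardware acceleration is therefore a must for real-time applications. 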
However, the reference software processes the blocks of each macroblock (MB) sequentially and creates data dependencies that are harmful to parallel processing and MB pipelining. A video coding system with the traditional two-stage MB pipeline, prediction (ME) and block engine (BE=MC+DCT+Q+IQ+IDCT+VLC), cannot be applied to H.264/MPEG-4 AVC efficiently because the prediction procedures are much more complex and the reconstruction loop should not be separated from prediction. In this thesis, the first H.264/MPEG-4 AVC VLSI encoding system is proposed. According to our analysis, five major functions, namely integer motion estimation (IME), fractional motion estimation (FME), intra prediction (INTRA), entropy coding (EC), and deblocking (DB), are mapped into four MB pipeline stages with hardware-oriented algorithms and sophisticated scheduling to enable parallel processing and MB pipelining. The bandwidth requirement is reduced by using shared memories and local data transmission. Compared with the reference software, the improved Lagrangian multiplier enhances the compressed video quality by up to 1.2 dB at high bitrates for large frame sizes with large motion. To support the new features of H.264/MPEG-4 AVC in each MB pipeline stage, several new architectures are proposed. In the IME stage, a parallel array of eight 128-PE SAD trees is designed with a snake-scan data flow to achieve 100% processing element (PE) utilization and low on-chip SRAM bandwidth. Reusing the overlapped search area saves 87.5% of the off-chip bandwidth. In the FME stage, we analyze the Lagrangian inter-mode decision loops and provide decomposition methodologies to obtain the optimized projection for hardware implementation. The proposed architecture, which provides 36-fold parallelism per reference frame, is characterized by a regular flow and high utilization. In the INTRA stage, a reconfigurable intra predictor generator and a parallel multi-transform engine are applied. 
In addition, an interleaved schedule and the proposed partial distortion elimination (PDE) scheme are used to meet the real-time constraint with only four-fold parallelism. In the DB stage, an interleaved memory organization and an 8x4-pixel array with a reconfigurable data path are used to support the 2-D filter with only one parallel-in parallel-out reconfigurable 1-D filter. Finally, in the EC stage, a highly utilized CAVLC engine is realized with dual scan buffers for 4x4-block-level pipelining. A 96-bit packer is also proposed to support conversion from raw byte sequence payload (RBSP) to encapsulated byte sequence payload (EBSP). A prototype chip is implemented with the Artisan 0.18 um standard CMOS cell library in UMC 0.18 um 1P6M technology. The total gate count is about 970K, synthesized at 120 MHz. The chip supports H.264/MPEG-4 AVC encoding in baseline profile level 3.0 with four reference frames at an operating frequency of 81 MHz, and level 3.1 with one reference frame at 108 MHz. The maximum processing capability is 108K MBs per second, that is, HDTV 720p (1280x720) 4:2:0 30 Hz video. In total, 34.72 Kbytes of on-chip memory and 3.11 Mbytes of off-chip memory are required. The core size is 7.68x4.13 mm^2. The average power dissipation is 635 mW when operating at 120 MHz with a 1.8 V power supply. |
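The throughput figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is not taken from the thesis. It assumes 16x16 macroblocks, takes 720x480 at 30 frames per second as the SDTV format (the abstract does not state the SDTV resolution), and uses the hypothetical helper names `macroblocks_per_second` and `cycles_per_macroblock`.

```python
MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples


def macroblocks_per_second(width, height, fps):
    """Macroblock rate for a frame size and frame rate.

    Frame dimensions are rounded up to a whole number of macroblocks.
    """
    mbs_wide = -(-width // MB_SIZE)   # ceiling division
    mbs_high = -(-height // MB_SIZE)
    return mbs_wide * mbs_high * fps


def cycles_per_macroblock(clock_hz, mb_rate):
    """Clock cycles available to process one macroblock in real time."""
    return clock_hz // mb_rate


# HDTV 720p at 30 Hz, encoded at 108 MHz (figures from the abstract).
hd_rate = macroblocks_per_second(1280, 720, 30)
hd_budget = cycles_per_macroblock(108_000_000, hd_rate)

# SDTV at 81 MHz; the 720x480 at 30 Hz format is an assumption.
sd_rate = macroblocks_per_second(720, 480, 30)
sd_budget = cycles_per_macroblock(81_000_000, sd_rate)

print(hd_rate, hd_budget)
print(sd_rate, sd_budget)
```

Under these assumptions, 720p at 30 Hz comes to 108,000 macroblocks per second, which matches the stated 108K MBs per second and leaves a budget of 1,000 clock cycles per macroblock at 108 MHz. The assumed SDTV format comes to 40,500 macroblocks per second, or 2,000 cycles per macroblock at 81 MHz.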
URI: | http://ntur.lib.ntu.edu.tw//handle/246246/57618 | Other Identifiers: | en-US |
Appears in Collections: | Graduate Institute of Electronics Engineering |
File | Description | Size | Format | |
---|---|---|---|---|
ntu-93-R91943022-1.pdf | | 23.31 kB | Adobe PDF | View/Open |
Items in the IR system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.