Author: Wu, Tung-Hsing (吳東興)
Advisor: Chien, Shao-Yi (簡韶逸)
National Taiwan University, Graduate Institute of Electronics Engineering, 2009
http://ntur.lib.ntu.edu.tw//handle/246246/189197

Abstract (translated from the Chinese):

Existing multimedia technology has deeply influenced daily life. Because of limits on computational complexity and transmission bandwidth, high-quality image and video processing still has room for improvement. Although the newest video compression standard, H.264/AVC, provides compression ratios from tens to hundreds, the final receiver of video is the human eye. Traditional standards use only the Peak Signal-to-Noise Ratio (PSNR) as the quality index for compressed video and take few properties of the human visual system (HVS) into account, so the bit allocation of video compression is not optimized for human perception. It is therefore important to allocate bit rate effectively to different video content within a limited bandwidth: with proper bit allocation, such as assigning more bits to important regions of a frame and fewer bits to unimportant ones, the bit rate can be reduced while better compressed-video quality is provided. The key is how to model and exploit the perceptual properties of the HVS; however, a system that models these properties and applies them in an encoder to reduce bit rate usually requires large computational complexity and system bandwidth, so it often cannot achieve real-time processing in an encoding system. We therefore propose a human-eye perception evaluation algorithm that improves the bit-allocation functionality of video encoders, and further propose an efficient hardware architecture design.

The main goal of this thesis is to model the perceptual properties of the HVS and to propose a perception evaluation engine that analyzes the content of the video frame currently being processed and determines how many bits these data should receive. First, we adopt and combine the structural similarity model, a visual attention model, and visual sensitivity models (such as the just-noticeable difference (JND) model and the contrast sensitivity function (CSF)), fusing them with our proposed scheme to determine the perceptual importance of every block in a video frame. We then integrate the proposed algorithm with the H.264 reference software JM14.0 and propose a method for determining the quantization parameter (QP) of the H.264 encoding system. After verifying the effectiveness of the proposed algorithm, we further develop a video content analyzer architecture suitable for hardware implementation. To save system bandwidth, we use macroblock-based processing with the Level-C data-reuse scheme, together with parallel hardware designs of the visual models, to achieve real-time performance.

The proposed algorithm achieves better bit allocation in the video encoding system by changing the quantization parameter of each macroblock (MB). Simulations combining the proposed algorithm with the H.264 encoder (JM14.0) and subjective visual-assessment experiments show that the algorithm saves 5-40% of the bit rate in the QP range 24-36 with almost no loss of subjective visual quality. For the hardware implementation, we designed a chip in a TSMC 0.18 μm process; the chip area is 3.3×3.3 mm², the power consumption is 83.9 mW, and the maximum processing capability is HDTV 720p (1280×720).

Abstract (English):

Existing multimedia technology deeply affects the life of human beings. Due to limits on computational complexity and transmission bandwidth, high-quality video processing still needs improvement. The newest video compression standard, H.264/AVC, offers compression ratios from tens to hundreds, yet the final receiver of the video information is the human eye. However, the traditional standard uses only the Peak Signal-to-Noise Ratio (PSNR) as the quality index for a compressed video bit stream.
The PSNR index does not consider the properties of the human visual system (HVS), so the bit allocation of the video bit stream is usually not optimized for human perception. Allocating the bit rate effectively to different video content within a limited bandwidth is therefore important. With proper bit allocation, such as more bits for important areas of a frame and fewer bits for indifferent areas, the bit rate can be reduced; in other words, the compressed video shows better perceptual quality than another compressed video at the same bit rate. The key point in bit allocation is taking human perception in the HVS into account, but a system that can model human perception and reduce the bit rate of the video bit stream while offering the same perceptual quality usually needs huge computational complexity and system bandwidth, so it cannot satisfy the real-time requirements of video encoding systems. We therefore propose a bio-inspired human-eye perception evaluation algorithm that improves the bit allocation of video encoders, together with an efficient hardware architecture.

The main target of this thesis is modeling the properties of the HVS. The perception evaluation engine must analyze the content of the current video frame and determine the bit allocation for these data. We adopt and combine the structural similarity model, visual attention models, and visual sensitivity models (including the just-noticeable-distortion (JND) model and the contrast sensitivity function (CSF)) to obtain a perceptual importance weighting for each macroblock (MB) of a video frame via a proper fusion algorithm. Co-operating with the H.264 video encoding system, we further developed an algorithm and system architecture suitable for hardware implementation to analyze the video content, and proposed a scheme to determine the quantization parameter in the encoding system.
To save system bandwidth, we employed macroblock-based processing with the Level-C data-reuse scheme as the basic unit of the processing flow, and parallel processing for the hardware of each visual model. The proposed algorithm achieves better bit allocation for video coding systems by changing quantization parameters at the MB level. Simulations combining the proposed evaluation engine with the H.264 encoder in JM14.0, together with subjective experiments, show that our algorithm achieves about 5-40% bit-rate saving in the QP range of 24-36 without perceptual (visual) quality degradation. For the hardware implementation of the proposed evaluation engine, the chip was taped out using TSMC 0.18 μm technology. The chip size is about 3.3×3.3 mm², the power consumption is 83.9 mW, and the processing capability is HDTV 720p (1280×720).

Table of Contents:
Abstract
1 INTRODUCTION
1.1 The Compression in Video Coding Standard
1.2 Human Visual System
1.3 Quality Index
1.4 Thesis Organization
2 THE RELATED RESEARCH WORKS OF VISUAL CONSIDERATION FOR VIDEO CODING
2.1 Efficient Coders with Visual Consideration
2.2 Visual Attention
2.3 Visual Sensitivity
2.4 Visual Quality Assessment
3 PROPOSED PERCEPTUAL MODEL
3.1 Overview of the Proposed Algorithm
3.2 Perceptual Model for Encoding Intra Frames
3.3 Perceptual Model for Encoding Inter Frames
3.3.1 Spatial Consideration
3.3.2 Temporal Consideration
3.4 Fusion of the Perceptual Models
3.4.1 QP of MB of Intra Frames
3.4.2 QP of MB of Inter Frames
4 HARDWARE IMPLEMENTATION OF THE PROPOSED PERCEPTUAL MODEL
4.1 Hardware Configuration
4.2 Color Contrast
4.3 Simplified Skin Color Detection
4.4 SSIM
4.5 JND
4.6 Chip Design Flow and Specification
5 FPGA VERIFICATION AND EMULATION
6 EXPERIMENTAL RESULTS OF THE PROPOSED PERCEPTUAL MODEL
6.1 Bit-rate Reduction
6.2 Subjective Perception Experiment
7 CONCLUSIONS
References

Keywords: H.264, video encoder, video coding, human visual system, human visual perception, attention model, perceptual model, contrast sensitivity function, contrast, SSIM, JND, motion, inter, intra

Title: 感知導向視訊編碼器:人眼感知分析引擎之硬體架構設計及其於H.264視訊編碼器之應用 (Perception-Aware Video Encoder: Hardware Architecture Design of Bio-Inspired Human Eyes Perception Evaluation Engine for H.264 Video Encoder)
Type: thesis
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/189197/1/ntu-98-R96943006-1.pdf