Algorithm and Architecture Design for Motion Estimation, H.264/AVC Standard, and Intelligent Video Signal Processing
Date Issued
2004
Date
2004
Author(s)
Huang, Yu-Wen
DOI
en-US
Abstract
Digital video technology has played an essential role in our daily life for entertainment, communication, surveillance, and intelligent human-machine interfaces. In this dissertation, algorithms and architectures of core techniques for both current and future video applications are discussed in three different parts: block matching motion estimation, H.264/AVC encoding systems, and intelligent video signal processing.
Motion estimation (ME) is the heart of video coding systems. It is the most important module and demands the most computing power and memory access in a video encoder. In Part I of this dissertation, we first conducted a comprehensive survey of ME algorithms and architectures from the last two decades (1981-2004). All fast block matching algorithms (BMAs) are classified into six categories, and many of them are compared in terms of video quality and computational complexity, which provides useful guidelines for software applications. Many architectures supporting full search or fast search are introduced, and representative designs are compared in six aspects using hexagonal plots for clear evaluation. Second, we proposed a global elimination algorithm (GEA) for fast block matching. The main concept of GEA is to divide block matching into an initial scan of all search positions using coarse matching, followed by fine matching of only those candidates that the initial scan identifies as promising. While preserving the same quality as full search, GEA requires less than 10% of the full search complexity. The corresponding GEA architecture is also developed; it comprises a systolic part to extract coarse features, a parallel sum of absolute differences (SAD) tree to perform matching operations, and a parallel comparator tree to find the potential candidates. Moreover, we proposed a parallel global elimination algorithm (PGEA) and its corresponding architecture for higher specifications. Our design is 10 times more area-speed efficient than full search architectures. Third, we proposed a computation-aware (CA) BMA to obtain better motion vectors under real-time constraints in a computation-limited and computation-variant environment. Unlike prior CA BMAs, in which random access of macroblocks is inevitable, our one-pass flow not only significantly reduces the memory size but also effectively uses the context information of neighboring macroblocks to achieve faster speed and better quality. Moreover, video quality can be further improved with the adaptive search strategy. Our one-pass algorithm saves 70% of the processing time while obtaining the same quality as prior CA BMAs.
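The coarse-then-fine idea described for GEA can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm. The coarse feature (pixel sums over s x s sub-blocks), the pruning rule (stop once the coarse cost, which is a lower bound on the SAD, can no longer beat the best SAD found), and all function names are assumptions chosen for illustration.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """SAD between the n x n block at (bx, by) in cur and at (rx, ry) in ref."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def subblock_sums(img, x, y, n, s):
    """Pixel sums of the s x s sub-blocks tiling the n x n block at (x, y).

    Assumes n is a multiple of s.
    """
    return [sum(img[y + v + j][x + u + i] for j in range(s) for i in range(s))
            for v in range(0, n, s) for u in range(0, n, s)]


def search_positions(bx, by, n, search, width, height):
    """All displacements within +/- search whose block lies inside the frame."""
    return [(dx, dy)
            for dy in range(-search, search + 1)
            for dx in range(-search, search + 1)
            if 0 <= bx + dx <= width - n and 0 <= by + dy <= height - n]


def full_search(cur, ref, bx, by, n, search):
    """Exhaustive block matching; returns (best_sad, dx, dy)."""
    height, width = len(ref), len(ref[0])
    return min((sad(cur, ref, bx, by, bx + dx, by + dy, n), dx, dy)
               for dx, dy in search_positions(bx, by, n, search, width, height))


def coarse_to_fine_search(cur, ref, bx, by, n, search, s):
    """Coarse scan of every position, then fine SAD on promising ones only.

    Returns ((best_sad, dx, dy), number_of_fine_sad_evaluations).
    """
    height, width = len(ref), len(ref[0])
    cur_feat = subblock_sums(cur, bx, by, n, s)

    # Stage 1: coarse cost for all search positions (hypothetical feature).
    coarse = []
    for dx, dy in search_positions(bx, by, n, search, width, height):
        ref_feat = subblock_sums(ref, bx + dx, by + dy, n, s)
        cost = sum(abs(a - b) for a, b in zip(cur_feat, ref_feat))
        coarse.append((cost, dx, dy))
    coarse.sort()

    # Stage 2: fine matching in order of increasing coarse cost.
    # |sum(a) - sum(b)| <= sum(|a - b|), so the coarse cost never exceeds
    # the SAD; once it reaches the best SAD, no later candidate can win.
    best, fine_count = None, 0
    for coarse_cost, dx, dy in coarse:
        if best is not None and coarse_cost >= best[0]:
            break
        fine_count += 1
        cand = (sad(cur, ref, bx, by, bx + dx, by + dy, n), dx, dy)
        if best is None or cand < best:
            best = cand
    return best, fine_count
```

Because the assumed coarse cost is a lower bound on the SAD, this sketch always returns the same minimum SAD as `full_search`, while `fine_count` shows how many full SAD evaluations were actually needed. The complexity figures quoted in the abstract refer to the dissertation's own design, not to this sketch.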
H.264/AVC is the latest international video coding standard. It can save 39%, 49%, and 64% of the bitrate in comparison with MPEG-4, H.263, and MPEG-2, respectively. In Part II of this dissertation, we first proposed a context-based adaptive method to speed up multi-frame ME, which is the most computationally intensive part of an H.264/AVC encoder. Statistical analysis is applied to the information available after intra prediction and after the block matching process for the previous reference frame. Context-based adaptive criteria are then derived to determine whether it is worth searching more reference frames. Full search quality can be maintained while 76%-96% of unnecessary reference frames are skipped. Second, we proposed a fast algorithm for H.264/AVC intra frame coding and an H.264/AVC intra coder architecture. Context-based decimation of unlikely candidates, subsampling of matching operations, and an interleaved full-search/partial-search strategy are adopted in the software implementation, which reduces the total computation by 45% while keeping the PSNR degradation below 0.3 dB. As for the hardware accelerator, a four-parallel system architecture is designed with comprehensive analysis. A prototype chip with a core size of 1.855 × 1.885 mm², which can process 16 megapixels per second at 54 MHz, is fabricated using 0.25 μm CMOS technology. Third, we proposed the first H.264/AVC single-chip encoder in the world. The core size is 7.68 × 4.13 mm² in 0.18 μm CMOS technology. A new four-stage macroblock pipelining architecture encodes HDTV 720p (1280×720) video at 30 frames/s in real time at 108 MHz. The new pipelining doubles the throughput and utilization of the conventional two-stage macroblock pipelining. The encoder contains five engines for integer motion estimation (IME), fractional motion estimation (FME), intra prediction (IP), entropy coding (EC), and deblocking (DB). We contributed many novel ideas to overcome the tough design challenges (3.6 TOPS of computing power and 5.6 TB/s of memory access on a processor).
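The real-time figures quoted for the single-chip encoder imply a cycle budget per macroblock, which the following short Python calculation makes explicit. The function name is hypothetical. The 16×16 macroblock size is the standard H.264/AVC value, and reading the result as a per-stage budget assumes each pipeline stage works on one macroblock at a time.

```python
def cycles_per_macroblock(width, height, fps, clock_hz, mb_size=16):
    """Clock cycles available per macroblock for real-time encoding."""
    mbs_per_row = -(-width // mb_size)   # ceiling division
    mb_rows = -(-height // mb_size)
    mbs_per_second = mbs_per_row * mb_rows * fps
    return clock_hz / mbs_per_second


# 1280x720 at 30 frames/s and 108 MHz, as stated in the abstract:
# 80 * 45 = 3600 macroblocks per frame, 108,000 macroblocks per second.
budget = cycles_per_macroblock(1280, 720, 30, 108_000_000)
print(budget)  # 1000.0
```

Under these assumptions, each macroblock has about 1000 clock cycles, which gives a sense of why the workload is split across several pipelined engines.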
Intelligent video signal processing is the driving force of advanced video applications, and video object segmentation is the most important pre-processing unit for object-based MPEG-4, object tracking, face recognition, sprite generation, MPEG-7 multimedia description, etc. In Part III of this dissertation, we first reviewed an efficient algorithm for video object segmentation. Its main idea is background registration, which can easily solve the still object problem and the uncovered background problem encountered by conventional change detection. With an optimized implementation, a 450 MHz Pentium III CPU can process 25 QCIF (176×144) frames per second. Moreover, the elimination of shadow effects, combination with predictive watersheds for more accurate object boundaries, and global motion compensation for slight camera motion are also considered as enhancements of the baseline mode. Second, we proposed a simple but effective algorithm for a pan-tilt camera to automatically track one moving object. The proposed tracking algorithm collects the background information at the grid points of camera positions and then compares the captured frame with the background at a grid point to determine the next grid point. A moving object is thus kept in the middle of the image. Block-based processing and skin color detection are used to reduce computation and to favor human faces, respectively. Many practical situations are tested, and our tracking algorithm has been successfully integrated into a commercial surveillance IP camera. Third, we proposed a low-complexity descriptor-based face recognition method. Descriptors with translation-, rotation-, and scaling-invariant properties are used as the input vectors to the feature extraction kernel instead of raster-scanned image pixels, making our method much more reliable than conventional pixel-based algorithms. What is more, the computational complexity and the memory requirement are reduced by a factor of millions due to the dimension reduction of the input vectors and the covariance matrix. The processing time to calculate the projection directions is reduced from tens of hours to a few seconds.
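The background registration idea for segmentation can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the algorithm reviewed in the dissertation: the class name, the thresholds, the per-pixel stationary counter, and the fallback to plain frame differencing before a background is registered are all hypothetical.

```python
class BackgroundRegistration:
    """Per-pixel background registration followed by background differencing."""

    def __init__(self, width, height, diff_threshold=15, stable_frames=3):
        self.diff_threshold = diff_threshold  # assumed change threshold
        self.stable_frames = stable_frames    # assumed frames needed to register
        self.prev = None                      # previous grayscale frame
        self.stationary = [[0] * width for _ in range(height)]
        self.background = [[None] * width for _ in range(height)]

    def update(self, frame):
        """Process one grayscale frame (list of rows); return an object mask."""
        height, width = len(frame), len(frame[0])
        mask = [[False] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                p = frame[y][x]
                changed = (self.prev is None or
                           abs(p - self.prev[y][x]) > self.diff_threshold)
                # Count how long the pixel has been unchanged frame to frame.
                self.stationary[y][x] = 0 if changed else self.stationary[y][x] + 1
                # A pixel that stays unchanged long enough becomes background.
                if self.stationary[y][x] >= self.stable_frames:
                    self.background[y][x] = p
                bg = self.background[y][x]
                if bg is not None:
                    # Compare with the registered background, not the last frame.
                    mask[y][x] = abs(p - bg) > self.diff_threshold
                elif self.prev is not None:
                    # No background yet: fall back to frame differencing.
                    mask[y][x] = changed
        self.prev = [row[:] for row in frame]
        return mask
```

Comparing against a registered background rather than the previous frame is what lets an object that pauses briefly stay in the mask, and what keeps newly uncovered background out of it. In this sketch an object that stays still for `stable_frames` frames is eventually absorbed into the background, so the threshold choice matters.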
In brief, this dissertation contributes digital video techniques in three directions. The proposed motion estimation can be applied in all video coding standards. The proposed H.264/AVC encoding system is the leading design in the world and brings many new concepts. The proposed video segmentation, object tracking, and face recognition will play key roles in structured video and intelligent surveillance systems. We sincerely hope that our research results can contribute to the convenience of human life.
Subjects
H.264/AVC標準
移動估計
智慧型視訊信號處理
H.264/AVC Standard
Intelligent Video Signal Processing
Motion Estimation
Type
thesis
File(s)
Name
ntu-93-F89921049-1.pdf
Size
23.31 KB
Format
Adobe PDF
Checksum
(MD5):5525438d6468ef036e40a037459db11d
