College of Electrical Engineering and Computer Science / Graduate Institute of Electronics Engineering

Algorithm and Architecture Design for Motion Estimation, H.264/AVC Standard, and Intelligent Video Signal Processing

Date Issued
2004
Date
2004
Author(s)
Huang, Yu-Wen
Language
en-US
URI
http://ntur.lib.ntu.edu.tw//handle/246246/57301
Abstract
Digital video technology plays an essential role in daily life, in entertainment, communication, surveillance, and intelligent human-machine interfaces. This dissertation discusses algorithms and architectures for core techniques of both current and future video applications in three parts: block matching motion estimation, H.264/AVC encoding systems, and intelligent video signal processing.

Motion estimation (ME) is the heart of video coding systems: it is the most important module of a video encoder and demands the most computing power and memory access. In Part I, we first surveyed ME algorithms and architectures from the last two decades (1981-2004). All fast block matching algorithms (BMAs) are classified into six categories, and many of them are compared in terms of video quality and computational complexity, which provides useful guidelines for software implementations. Many architectures supporting full search or fast search are introduced, and representative designs are compared along six aspects using hexagonal plots. Second, we proposed a global elimination algorithm (GEA) for fast block matching. GEA divides block matching into two stages: an initial scan of all search positions with coarse matching, followed by fine matching of only those candidates identified as promising in the initial scan. GEA preserves the quality of full search while requiring less than 10% of its computation. We also developed the corresponding GEA architecture, which comprises a systolic part to extract coarse features, a parallel sum of absolute differences (SAD) tree to perform matching operations, and a parallel comparator tree to find the promising candidates. Moreover, we proposed a parallel global elimination algorithm (PGEA) and its corresponding architecture for higher specifications.
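The two-stage idea behind GEA can be sketched as follows. This is a minimal illustration, not the dissertation's algorithm or architecture: the coarse feature (sums of 4x4 sub-blocks), the number of surviving candidates `m = 8`, and all function names are assumptions. The coarse cost is a lower bound on the true SAD, because |Σa − Σb| ≤ Σ|a − b| for every sub-block, so the exact best match always has a small coarse cost and tends to survive into the fine stage.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def full_search(cur, ref, bx, by, n=16, r=7):
    """Exhaustive block matching over all displacements in [-r, r] x [-r, r]."""
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def subblock_sums(img, x0, y0, n, s):
    """Coarse feature (assumed): pixel sums of the s x s sub-blocks of an n x n block."""
    return [sum(img[y0 + sy + y][x0 + sx + x] for y in range(s) for x in range(s))
            for sy in range(0, n, s) for sx in range(0, n, s)]


def gea_search(cur, ref, bx, by, n=16, r=7, s=4, m=8):
    """Two-stage search in the spirit of GEA: coarse scan of every position, fine SAD on the m best."""
    cur_feat = subblock_sums(cur, bx, by, n, s)
    coarse = []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            ref_feat = subblock_sums(ref, bx + dx, by + dy, n, s)
            # Never exceeds the true SAD at this position.
            cost = sum(abs(a - b) for a, b in zip(cur_feat, ref_feat))
            coarse.append((cost, dy, dx))
    coarse.sort()
    best = None
    for _, dy, dx in coarse[:m]:  # fine matching of the promising candidates only
        cost = sad(cur, ref, bx, by, dx, dy, n)
        if best is None or cost < best[0]:
            best = (cost, dx, dy)
    return best


def make_shifted_frames(w, h, dx, dy, seed=0):
    """Hypothetical test helper: random reference frame plus a current frame whose
    pixel (x, y) equals the reference pixel (x + dx, y + dy)."""
    rng = random.Random(seed)
    ref = [[rng.randrange(256) for _ in range(w)] for _ in range(h)]
    cur = [[ref[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
            for x in range(w)] for y in range(h)]
    return cur, ref
```

With a ±7 search range there are 225 candidates; the fine stage above evaluates full SADs for only 8 of them. This naive Python version recomputes the sub-block sums at every position, so it does not itself save work; the savings described in the abstract come from a hardware design that extracts and reuses those coarse features efficiently.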
The proposed architecture is 10 times more area-speed efficient than full search architectures. Third, we proposed a computation-aware (CA) BMA that obtains better motion vectors under real-time constraints in computation-limited and computation-variant environments. Unlike prior CA BMAs, in which random access to macroblocks is unavoidable, our one-pass flow not only significantly reduces the memory size but also exploits the context information of neighboring macroblocks to achieve higher speed and better quality. Video quality is further improved by an adaptive search strategy. Compared with prior CA BMAs, our one-pass algorithm saves 70% of the processing time at the same quality.

H.264/AVC is the latest international video coding standard. It saves 39%, 49%, and 64% of the bitrate compared with MPEG-4, H.263, and MPEG-2, respectively. In Part II, we first proposed a context-based adaptive method to speed up multi-frame ME, the most computationally intensive part of an H.264/AVC encoder. Statistical analysis is applied to the information available after intra prediction and after block matching in the previous reference frame, and context-based adaptive criteria are derived to decide whether searching more reference frames is worthwhile. Full search quality is maintained while 76%-96% of the unnecessary reference frames are skipped. Second, we proposed a fast H.264/AVC intra frame coding algorithm and an H.264/AVC intra coder architecture. The software implementation adopts context-based decimation of unlikely candidates, subsampling of matching operations, and an interleaved full-search/partial-search strategy, which reduces the total computation by 45% while keeping PSNR degradation below 0.3 dB. For the hardware accelerator, a four-parallel system architecture is designed based on a comprehensive analysis.
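Among the intra-coding speed-ups listed above, subsampled matching is the easiest to show in isolation. The sketch below is an assumption-laden simplification: H.264/AVC defines nine 4x4 luma intra prediction modes, but only modes 0-2 (vertical, horizontal, DC) are implemented here, both neighbor sets are assumed available, the cost is plain SAD with no rate term, and the checkerboard subsampling pattern is chosen for illustration. The context-based decimation rules of the dissertation are not reproduced.

```python
def predict_4x4(top, left, mode):
    """4x4 prediction for three H.264/AVC intra 4x4 modes.

    top: the 4 reconstructed pixels above the block; left: the 4 to its left."""
    if mode == "vertical":     # mode 0: copy the row above downwards
        return [list(top) for _ in range(4)]
    if mode == "horizontal":   # mode 1: copy the left column rightwards
        return [[left[y]] * 4 for y in range(4)]
    if mode == "dc":           # mode 2: rounded mean of the 8 neighbours
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


def sad_4x4(block, pred, subsample=False):
    """SAD over all 16 pixels, or over a checkerboard half (8 pixels) when subsampling."""
    return sum(abs(block[y][x] - pred[y][x])
               for y in range(4) for x in range(4)
               if not subsample or (x + y) % 2 == 0)


def choose_mode(block, top, left, subsample=False):
    """Pick the mode with the lowest (optionally subsampled) SAD; return it with all costs."""
    modes = ("vertical", "horizontal", "dc")
    costs = {m: sad_4x4(block, predict_4x4(top, left, m), subsample) for m in modes}
    best = min(modes, key=lambda m: costs[m])
    return best, costs
```

Subsampling halves the matching work per candidate mode at the risk of occasionally picking a slightly worse mode; in this sketch that trade-off is controlled by the single `subsample` flag.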
A prototype chip with a core size of 1.855 × 1.885 mm², fabricated in 0.25 μm CMOS technology, processes 16 megapixels per second at 54 MHz. Third, we proposed the world's first single-chip H.264/AVC encoder. Its core size is 7.68 × 4.13 mm² in 0.18 μm CMOS technology. A new four-stage macroblock pipelining architecture encodes HDTV 720p (1280 × 720) video at 30 frames/s in real time at 108 MHz; the new pipelining doubles the throughput and utilization of conventional two-stage macroblock pipelining. The encoder contains five engines: integer motion estimation (IME), fractional motion estimation (FME), intra prediction (IP), entropy coding (EC), and deblocking (DB). We contributed many novel ideas to overcome the demanding design challenges (3.6 TOPS of computing power and 5.6 TB/s of memory access if executed on a processor).

Intelligent video signal processing is the driving force of advanced video applications, and video object segmentation is the most important pre-processing unit for object-based MPEG-4 coding, object tracking, face recognition, sprite generation, MPEG-7 multimedia description, and so on. In Part III, we first reviewed an efficient video object segmentation algorithm. Its main idea is background registration, which readily solves the still-object and uncovered-background problems encountered by conventional change detection. With an optimized implementation, a 450 MHz Pentium III CPU processes 25 QCIF (176 × 144) frames per second. Shadow-effect elimination, combination with predictive watersheds for more accurate object boundaries, and global motion compensation for slight camera motion are also considered as enhancements of the baseline mode. Second, we proposed a simple but effective algorithm for a pan-tilt camera to automatically track one moving object.
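The background registration idea reviewed in Part III can be sketched per pixel as below. This is a simplified illustration under stated assumptions: frames are flattened grayscale lists, and the three thresholds (`diff_th`, `still_frames`, `bg_th`) and the class name are hypothetical choices, not values or code from the dissertation. A pixel that stays unchanged for `still_frames` consecutive frames is registered as background; objects are then found by comparing each frame with that registered background rather than with the previous frame.

```python
class BackgroundRegistration:
    """Simplified per-pixel video object segmentation by background registration.

    Frames are flattened grayscale lists. All thresholds are illustrative assumptions."""

    def __init__(self, size, diff_th=15, still_frames=5, bg_th=20):
        self.prev = None
        self.still = [0] * size    # consecutive frames each pixel has stayed unchanged
        self.bg = [None] * size    # registered background value (None = not yet registered)
        self.diff_th = diff_th
        self.still_frames = still_frames
        self.bg_th = bg_th

    def segment(self, frame):
        """Return a 0/1 object mask for `frame` and update the background."""
        if self.prev is None:      # first frame: nothing to compare against
            self.prev = list(frame)
            return [0] * len(frame)
        mask = []
        for i, (cur, prev) in enumerate(zip(frame, self.prev)):
            changed = abs(cur - prev) > self.diff_th   # frame difference (change detection)
            self.still[i] = 0 if changed else self.still[i] + 1
            if self.still[i] >= self.still_frames:     # stationary long enough: register it
                self.bg[i] = cur
            if self.bg[i] is None:
                # No background known yet: fall back on plain change detection.
                mask.append(1 if changed else 0)
            else:
                # Background difference keeps still objects and ignores uncovered background.
                mask.append(1 if abs(cur - self.bg[i]) > self.bg_th else 0)
        self.prev = list(frame)
        return mask
```

An object that stops moving stays in the mask (plain frame differencing would lose it after one frame), and background uncovered by a departing object is not flagged. In this simplified version, an object that stays still for `still_frames` frames or longer is eventually absorbed into the background.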
The proposed tracking algorithm collects background information at the grid points of camera positions, then compares each captured frame with the background stored for the current grid point to determine the next grid point. The moving object is thus kept in the middle of the image. Block-based processing is used to reduce computation, and skin color detection is used to favor human faces. The algorithm was tested in many practical situations and has been successfully integrated into a commercial surveillance IP camera. Third, we proposed a low-complexity descriptor-based face recognition method. Descriptors invariant to translation, rotation, and scaling, rather than raster-scanned image pixels, serve as the input vectors to the feature extraction kernel, making the method much more reliable than conventional pixel-based algorithms. Moreover, reducing the dimension of the input vectors and of the covariance matrix cuts the computational complexity and memory requirement by a factor of millions, and the time to calculate the projection directions drops from tens of hours to a few seconds.

In brief, this dissertation contributes digital video techniques in three directions. The proposed motion estimation can be applied to all video coding standards. The proposed H.264/AVC encoding system is the world's leading design and introduces many new concepts. The proposed video segmentation, object tracking, and face recognition techniques will play key roles in structured video and intelligent surveillance systems. We hope that these results will contribute to the convenience of human life.
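The grid-point tracking loop of Part III, combined with block-based processing, might look like the sketch below. Everything concrete here is an assumption made for illustration: the 4x4 block size, the mean-difference threshold, the dead zone around the image center, moving by at most one grid step per axis per frame, the convention that a positive tilt step points the camera downward, and all function names. Skin color weighting is omitted.

```python
def foreground_blocks(frame, background, block=4, th=20):
    """Centres of block x block tiles whose mean absolute difference from the stored
    background exceeds th. Assumes frame dimensions are multiples of block."""
    h, w = len(frame), len(frame[0])
    centres = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            diff = sum(abs(frame[y][x] - background[y][x])
                       for y in range(by, by + block)
                       for x in range(bx, bx + block))
            if diff > th * block * block:
                centres.append((bx + block / 2, by + block / 2))
    return centres


def next_grid_point(grid, frame, backgrounds, grid_shape, dead_zone=0.15):
    """Return the (pan, tilt) grid point to move to next.

    backgrounds maps each grid point to the background frame captured there."""
    fg = foreground_blocks(frame, backgrounds[grid])
    if not fg:
        return grid  # nothing moving: stay at the current grid point
    h, w = len(frame), len(frame[0])
    cx = sum(c[0] for c in fg) / len(fg)
    cy = sum(c[1] for c in fg) / len(fg)
    # Offset of the foreground centroid from the image centre, in [-0.5, 0.5].
    ox, oy = cx / w - 0.5, cy / h - 0.5
    pan, tilt = grid
    if abs(ox) > dead_zone:
        pan += 1 if ox > 0 else -1
    if abs(oy) > dead_zone:
        tilt += 1 if oy > 0 else -1  # assumed: positive tilt step points the camera down
    # Clamp to the available grid of camera positions.
    pan = min(max(pan, 0), grid_shape[0] - 1)
    tilt = min(max(tilt, 0), grid_shape[1] - 1)
    return (pan, tilt)
```

Because each grid point has its own pre-collected background, the comparison is a simple frame-versus-background difference even though the camera moves, and working on blocks instead of pixels keeps the per-frame cost low.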
Subjects
H.264/AVC Standard
Intelligent Video Signal Processing
Motion Estimation
Type
thesis
File(s)
Name

ntu-93-F89921049-1.pdf

Size

23.31 KB

Format

Adobe PDF

Checksum

(MD5):5525438d6468ef036e40a037459db11d
