Design and Implementation of High Performance JPEG 2000 Encoding System
Date Issued
2005
Date
2005
Author(s)
Fang, Hung-Chi
DOI
en-US
Abstract
JPEG 2000 is a new still image coding standard that offers not only better coding efficiency but also many useful features. However, its high computational complexity and memory requirements have hindered its entry into the market. In this dissertation, we propose a high performance JPEG 2000 encoding system to solve this problem.
For the embedded block coding, we propose a parallel architecture that increases the throughput by processing one coefficient at a time. As a result, the state variable memory and the code-block memory are eliminated. This greatly reduces the hardware cost, since these memories occupy more than 80% of the area of the embedded block coding engine in conventional architectures. Moreover, the processing speed is increased by more than 6 times compared with the best result in the literature. The proposed parallel architecture therefore combines high speed with low cost.
Rate-distortion optimization is an important function of JPEG 2000. However, the post-compression rate-distortion optimization algorithm recommended in the reference software requires the original image to be losslessly coded regardless of the target bit rate. This wastes computational power and time on unnecessary data, and requires a large memory to buffer the lossless bit stream. To solve this problem, we propose a pre-compression rate-distortion optimization algorithm, which performs the rate-distortion optimization before the embedded block coding. Thus, the embedded block coding only needs to process the necessary data. This greatly reduces the processing time and computational power of the embedded block coding. Moreover, the bit stream no longer needs to be buffered. The proposed pre-compression rate-distortion optimization algorithm therefore offers low power, high speed, and low cost.
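To make the contrast concrete, the following is a minimal sketch of the conventional post-compression approach in its general Lagrangian form, not of the pre-compression algorithm proposed here, which this abstract does not describe in detail. The function names, the bisection bounds, and the rate/distortion numbers are illustrative assumptions. The point it shows is that every candidate truncation point of every code-block must already be coded and measured before any selection can happen.

```python
from typing import List, Tuple

# One code-block = list of candidate truncation points (rate_bytes, distortion),
# ordered by increasing rate; index 0 means "send nothing".
# All numbers below are made up for illustration.
Block = List[Tuple[int, float]]


def pick_truncation(block: Block, lam: float) -> int:
    """Index of the truncation point minimizing distortion + lam * rate."""
    return min(range(len(block)), key=lambda i: block[i][1] + lam * block[i][0])


def total_rate(blocks: List[Block], lam: float) -> int:
    """Total bytes spent when every block is truncated for multiplier lam."""
    return sum(b[pick_truncation(b, lam)][0] for b in blocks)


def post_compression_rdo(blocks: List[Block], budget: int, iters: int = 50) -> List[int]:
    """Bisect on the Lagrange multiplier until the total rate fits the budget."""
    lo, hi = 0.0, 1e9  # hi is assumed large enough that nothing is sent
    for _ in range(iters):
        mid = (lo + hi) / 2
        if total_rate(blocks, mid) <= budget:
            hi = mid   # fits: try a smaller multiplier (more bytes)
        else:
            lo = mid   # too many bytes: penalize rate more
    return [pick_truncation(b, hi) for b in blocks]


blocks = [
    [(0, 100.0), (10, 40.0), (20, 10.0), (30, 0.0)],
    [(0, 50.0), (10, 30.0), (20, 18.0), (30, 10.0)],
]
print(post_compression_rdo(blocks, budget=40))
```

In this toy case the lists of `(rate, distortion)` pairs stand in for the fully coded bit stream that the post-compression scheme has to buffer.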
Based on the above two techniques, a high speed parallel JPEG 2000 encoder chip is implemented. It can encode HDTV 720p video in real time. For the discrete wavelet transform, we adopt a multi-level line-based 2-D architecture. The memory bandwidth requirement of this chip is therefore minimized, that is, each pixel is read exactly once. The chip is fabricated in TSMC 0.25 µm CMOS technology, and the core area is 5.5 mm². The power consumption is 348 mW at 81 MHz. This encoder has the highest throughput and the smallest silicon area among the encoders in the literature.
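As a rough sanity check on the real-time claim, the arithmetic below relates the 81 MHz clock to the sample rate of 720p video. The frame rate of 30 frames per second and the chroma formats are assumptions made for illustration; the abstract states neither, and the function names are hypothetical.

```python
CLOCK_HZ = 81_000_000      # operating frequency stated in the abstract
WIDTH, HEIGHT = 1280, 720  # 720p frame dimensions


def samples_per_second(fps: float, samples_per_pixel: float) -> float:
    """Samples the encoder must absorb per second (assumed fps and chroma format)."""
    return WIDTH * HEIGHT * fps * samples_per_pixel


def cycles_per_sample(fps: float, samples_per_pixel: float) -> float:
    """Clock cycles available for each input sample at CLOCK_HZ."""
    return CLOCK_HZ / samples_per_second(fps, samples_per_pixel)


# Average samples per pixel for common chroma formats.
for name, spp in [("4:2:0", 1.5), ("4:2:2", 2.0), ("4:4:4", 3.0)]:
    print(name, round(cycles_per_sample(30, spp), 3))
```

Under these assumptions the budget is on the order of one to two clock cycles per sample, which is consistent with an architecture that consumes one coefficient per cycle.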
Finally, we propose a stripe pipeline scheme for large tile sizes. With this scheme, the on-chip memory requirement of a JPEG 2000 encoder is proportional to the square root of the tile size, whereas it is proportional to the tile size in previous works. For a tile size of 256×256, the tile memory requirement is reduced to only 8.5% of that in previous works. To realize the stripe pipeline scheme, the level switch discrete wavelet transform and the code-block switch embedded block coding are proposed. The level switch discrete wavelet transform is a multi-level block-based scan architecture, and the code-block switch embedded block coding can process 13 code-blocks in parallel. As a result, the hardware cost of this pipeline architecture is about 30% of that of the parallel encoder when the tile size is 256×256, and the area saving grows as the tile size increases.
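The scaling argument can be illustrated with a small calculation. For an n×n tile, a memory proportional to the tile size grows as n², while a memory proportional to its square root grows as n, so their ratio falls as 1/n. The sketch below anchors that ratio at the 8.5% figure given for 256×256 and extrapolates to other sizes; it assumes the constant factors stay the same across tile sizes, which the abstract does not state, so values other than the 256×256 one are illustrative only.

```python
def tile_memory_ratio(n: int, ref_n: int = 256, ref_ratio: float = 0.085) -> float:
    """Stripe-pipeline tile memory relative to a full-tile buffer for an n x n tile.

    Assumes pure O(n) versus O(n^2) scaling, anchored at the abstract's
    8.5% figure for 256 x 256. Other sizes are extrapolations, not results.
    """
    return ref_ratio * ref_n / n


for n in (128, 256, 512, 1024):
    print(n, f"{tile_memory_ratio(n):.2%}")
```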
With the algorithms and architectures proposed in this dissertation, the cost of a JPEG 2000 encoder can be reduced to only several times that of a JPEG encoder. Moreover, all the features and functionalities of JPEG 2000 are retained. Therefore, we believe that JPEG 2000 will start to take the place of JPEG as the core technology of still image coding systems in the near future.
Subjects
digital video
architecture design
optimization algorithm
digital signal processing
image processing
image coding
signal processing
algorithm
Architecture design
image coding
chip implementation
optimization
JPEG 2000
Type
thesis
File(s)
Name
ntu-94-F90943013-1.pdf
Size
23.31 KB
Format
Adobe PDF
Checksum
(MD5):f656ca1c3d8a61ea7371a85973fbc640