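簡韶逸
National Taiwan University: Graduate Institute of Electronics Engineering
陸家恆 (Lok, Ka-Hang)
2010-07-14
2018-07-10
2009
U0001-2307200921051700
http://ntur.lib.ntu.edu.tw//handle/246246/189194

In recent years the market for handheld mobile devices has grown rapidly. At the same time, 3D graphics has become increasingly important in mobile and handheld devices, where power consumption is the foremost challenge. Evidence shows that external memory access is the most decisive factor in reducing power. As visual quality improves, more and more data is processed by mobile 3D graphics processors (GPUs), and external memory bandwidth is gradually exhausted. Heavy external memory access in a graphics system therefore causes not only an energy problem but also a bottleneck for overall system performance, so compression techniques that save external memory bandwidth are becoming increasingly important. Moreover, 3D graphics processors in mobile devices face greater challenges than desktop graphics processors. First, the smaller device size limits the available hardware resources. Second, the distance from the eye to a mobile screen is very short, so the angle between the eye and each pixel is larger than on a desktop device, and the visual quality required on mobile devices is therefore higher than on desktops.

In this thesis, to keep hardware cost as low as possible, we propose a universal compression and decompression algorithm for buffer data. Because DXTC is already supported by graphics programming interfaces such as DirectX and OpenGL, a DXTC decompression module usually exists in the graphics processor, so the algorithm is designed mainly to handle color and depth data. The same concept is nevertheless added to DXTC to give better visual quality. Integrating these modules is believed to be the future trend for mobile 3D graphics processors.

Combining the techniques above, we implement a low-power 3D graphics processor for mobile multimedia devices. The processor supports multimedia stream processing and is realized as a system-on-chip platform. The prototype chip is fabricated in UMC 90 nm technology, with an area of 4.1 × 4.1 mm², an operating frequency of 143 MHz, and a maximum power consumption of 158 mW (of which the proposed compression and decompression module consumes about 28 mW).

In recent years, the market for mobile electronics has grown rapidly. Meanwhile, 3D graphics is becoming more and more important in mobile and portable devices, where energy efficiency is the most important design challenge. It has been shown that the amount of external memory access is the most crucial factor in reducing power consumption. For better visual quality, more and more data is processed by mobile graphics processing units (GPUs), which therefore demand higher and higher external memory bandwidth. The large amount of external memory access in graphics systems thus causes not only an energy consumption problem but also a performance bottleneck, and compression techniques aimed at saving memory bandwidth are becoming more and more important. Furthermore, mobile GPUs face greater challenges than their desktop counterparts. First, the small form factor restricts the hardware resources available. Moreover, the distance from the display of a mobile device to the human eye is quite short, resulting in a rather large average eye-to-pixel angle compared with that of a desktop.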
For this reason, graphics on mobile devices must offer better visual quality than on their desktop counterparts.

In this thesis, we present a universal compression and decompression algorithm for buffer data, aiming to keep the hardware cost as low as possible. Since DXTC is supported by graphics APIs such as DirectX and OpenGL, a DXTC unit is generally present in graphics processors, so the universal algorithm is designed primarily for color and depth data. However, some modifications based on the same concept used for color and depth data have been added to the DXTC method to enable better visual quality. Unifying these units is expected to be the future trend in designing mobile GPUs.

Integrated with the proposed unit, a low-power graphics processing unit for mobile multimedia applications is implemented in this thesis. The prototype chip is fabricated in UMC 90 nm technology, and the chip size is 4.1 × 4.1 mm². The designed working frequency is 143 MHz, and the worst-case power consumption is 158 mW (with about 28 mW consumed by the proposed codec unit).

Abstract
1 Introduction
  1.1 Basic Concept of Mobile Graphic Processing Units
  1.2 Analysis of the External Memory Access of Mobile Graphic Processing Unit
  1.3 Motivation
  1.4 Thesis Organization
2 State-of-the-art Buffer Compression
  2.1 Depth/Color Buffer Compression Algorithm
    2.1.1 Fix-Rate Buffer Compression Algorithm
    2.1.2 Variable-Rate Buffer Compression Algorithm
  2.2 Lossy Texture Compression Algorithm
  2.3 Analysis and Design Challenges
3 Proposed Buffer Compression and Decompression Algorithm
  3.1 Compression of the Color Data
    3.1.1 Color Transform
    3.1.2 Spatial Predictor
    3.1.3 Group-Based Variable Length Coding
    3.1.4 Bit Stream Rearrangement
    3.1.5 Lossy Color Compression
  3.2 Compression of the Depth Data
  3.3 Lossy Compression of the Texture Data
4 Architecture Design of Proposed Buffer Compression and Decompression Unit
  4.1 Architecture overview
  4.2 ROP Unit
  4.3 Register File Control Unit
  4.4 Compression Unit and Decompression Unit
    4.4.1 Compression Unit
    4.4.2 Decompression Unit
  4.5 Configurable Buffer Cache Unit
  4.6 Modified DXT5 Unit
5 Experimental Results
  5.1 Performance of Lossless Color and Depth Buffer Compression
  5.2 Performance of Lossy Color Buffer Compression
  5.3 Performance of Lossless Texture Compression
  5.4 Performance of Lossy Texture Compression
  5.5 Performance of Reconfigurable Buffer Cache Scheme
6 The Implementation of Low Power Graphics Processing Units for Mobile Multimedia Applications
  6.1 Architecture Overview
    6.1.1 Graphics Processing Units with Stream Processing Architecture
    6.1.2 Integrated Multimedia System-on-a-Chip using Low Power Graphics Processing Units
  6.2 Design Flow
  6.3 Chip Layout and Specification
  6.4 Comparison
7 Conclusion
Reference

9477052 bytes
application/pdf
en-US
graphics processor
compression
GPU
[SDGs]SDG7
適用於三維繪圖系統之通用緩衝存儲器壓縮與解壓縮模組之硬體架構設計與實現
Design and Implementation of Universal Buffer Compression and Decompression Unit for Mobile 3D Graphic System
thesis
http://ntur.lib.ntu.edu.tw/bitstream/246246/189194/1/ntu-98-R95943174-1.pdf