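簡韶逸
National Taiwan University: Graduate Institute of Electronics Engineering
吳其玲
Wu, Chi-Ling
2007-11-27
2018-07-10
2006
http://ntur.lib.ntu.edu.tw//handle/246246/57650

In recent years, with the rapid development of the industries built around 3D graphics, demand for graphics hardware accelerators has grown accordingly. The flourishing entertainment industry has carried 3D graphics applications into every market segment, and 3D graphics now appears everywhere, in personal computers and consumer electronics alike. However, as expectations for entertainment quality rise, 3D models and scenes become increasingly complex. The 3D graphics rendering system must process more data, and this data keeps the external memory bandwidth of the system high, which limits rendering performance.

In this thesis, we propose a hardware-oriented visibility testing algorithm to reduce unnecessary external memory bandwidth. At the algorithm level, the proposed visibility test uses coverage masks to reduce the bandwidth spent on visibility data. Besides saving bandwidth, coverage masks make it easy to take more samples and thereby achieve antialiasing. We also store the coverage masks hierarchically, so that the visibility of some regions can be decided quickly at a coarse level, which accelerates the visibility test. Importantly, the proposed visibility test does not need any mechanism by which the hardware returns visibility information during rendering. Because the algorithm relies on coverage masks, triangles must be sorted from near to far. Static scenes and models are often sorted in advance with binary space partitioning trees to accelerate rendering in ordinary depth-buffer systems. Under the same conditions, for an already sorted triangle queue, the proposed algorithm not only accelerates rendering but also saves a large amount of unnecessary external memory bandwidth. The proposed algorithm therefore offers a low-bandwidth acceleration method for sorted static scenes and models.

At the architecture level, the proposed visibility testing algorithm can be implemented in a 3D graphics hardware rendering system, and the hardware implementation can take scalability into account. With scalability, the proposed rendering system can be extended to a variety of graphics applications.

Experimental results show that up to 80% of the external memory bandwidth can be saved without antialiasing, and up to 97% with antialiasing. We implemented the proposed visibility testing algorithm as a prototype chip of a 3D graphics rendering system with a hardware visibility testing engine, using TSMC 0.18 µm 1P6M technology with a chip size of 2.57 x 2.57 mm^2.

In recent years, the 3D graphics industry has grown rapidly, and the demand for graphics hardware has grown with it. With the flourishing development of the entertainment industry, 3D graphics applications have become widespread. However, as models and scenes become more complex and rendering quality requirements rise, 3D graphics rendering systems must process more data and suffer from high external memory bandwidth. In this thesis, we propose a hardware-oriented visibility testing algorithm to reduce the external memory bandwidth. At the algorithm level, the proposed visibility testing algorithm adopts coverage masks to reduce the visibility data bandwidth and to integrate antialiasing with oversampling easily. In addition, the coverage masks can be organized into a hierarchical structure, which speeds up the visibility testing process.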
Moreover, the visibility tests are performed during rendering without occlusion queries. Because the algorithm relies on coverage masks, incoming primitives must be sorted in front-to-back order. Static scenes and models are usually sorted with BSP trees, which already accelerates rendering in Z-buffer systems. With the proposed algorithm, sorted primitives yield not only acceleration but also a further reduction in external memory bandwidth. The proposed visibility algorithm can thus be seen as an accelerator for static scenes or models. At the architecture level, the proposed algorithm can be integrated into 3D graphics hardware rendering systems, and its hardware implementation is scalable. With scalability, the proposed rendering system can be easily extended to various graphics applications. The experimental results show that up to $80\%$ of the external memory bandwidth can be saved without antialiasing, and up to $97\%$ with antialiasing.
The prototype chip of the proposed 3D graphics rendering system with a visibility testing engine is fabricated in TSMC 0.18 µm 1P6M technology, with a chip size of 2.57 x 2.57 mm^2.

Abstract
1 Introduction
1.1 3D Graphics Pipeline Overview
1.2 Motivation
1.3 Thesis Organization
2 Visible Surface Determination
2.1 Prior Arts of Occlusion Culling
2.1.1 Hierarchical Z Buffer
2.1.2 Hierarchical Occlusion Maps
2.1.3 Hardware Occlusion Queries
2.2 Challenges of Occlusion Culling
3 Proposed Visibility Testing Algorithm
3.1 Coverage Hierarchy
3.2 Primitive Triage Mask
3.2.1 Point-Triangle Relationship
3.2.2 Rect-Triangle Relationship
3.2.3 Construction
3.3 Proposed Visibility Test
3.4 Antialiasing
3.5 Working in Rendering Systems
4 Architecture Design of Proposed 3D Graphics Rendering System
4.1 Architecture Overview
4.2 Triangle Setup
4.3 Visibility Testing Engine
4.3.1 Primitive Triage Mask Processing Element
4.3.2 Visibility Testing Order and Memory Spatial Correlation
4.4 Shading
4.5 Arbiter
4.6 Experimental Results
5 Chip Implementation
5.1 Design Flow
5.2 Functional Verification
5.3 Test Consideration
5.3.1 Ad-hoc Testing
5.3.2 Scan Chain Insertion and ATPG
5.4 Chip Layout and Specification
6 Conclusion

3324905 bytes
application/pdf
en-US
三維圖學渲染
圖學硬體
可見度
3D graphics rendering
graphics hardware
visibility
三維圖學渲染系統可見度測試引擎之記憶體頻寬減少技術
Memory Bandwidth Reduction Technique with Visibility Testing Engine for 3D Graphics Rendering Systems
thesis
http://ntur.lib.ntu.edu.tw/bitstream/246246/57650/1/ntu-95-R93943025-1.pdf
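
The abstract's central idea, testing front-to-back sorted primitives against a coverage mask instead of a depth buffer, can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a single-level, one-bit-per-pixel mask with one sample at each pixel center, and it leaves out the coverage hierarchy, the primitive triage mask, and antialiasing. All names (`edge`, `covers`, `render_front_to_back`) are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def edge(a: Point, b: Point, p: Point) -> float:
    """Edge function: its sign tells on which side of the line a->b the point p lies."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covers(tri: Triangle, p: Point) -> bool:
    """True if p is inside or on the boundary of tri, for either winding order."""
    a, b, c = tri
    w0, w1, w2 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)


def render_front_to_back(tris: List[Triangle], width: int, height: int) -> List[List[int]]:
    """Resolve visibility for triangles that are ALREADY sorted nearest first.

    mask[y][x] is a one-bit coverage flag; no depth values are stored or compared.
    Returns, per pixel, the index of the visible triangle, or -1 if none covers it.
    """
    mask = [[False] * width for _ in range(height)]
    owner = [[-1] * width for _ in range(height)]
    for idx, tri in enumerate(tris):
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    # A nearer triangle already owns this pixel, so this one is hidden here.
                    continue
                if covers(tri, (x + 0.5, y + 0.5)):
                    mask[y][x] = True   # mark the pixel as covered
                    owner[y][x] = idx   # only visible pixels would go on to shading
    return owner


# Usage: a near triangle in front of a far triangle that spans the whole 4x4 grid.
near = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
far = ((0.0, 0.0), (8.0, 0.0), (0.0, 8.0))
owner = render_front_to_back([near, far], 4, 4)
```

Because the input order already encodes depth, a pixel whose mask bit is set can be skipped without reading any per-pixel depth value, which is the intuition behind the bandwidth saving the abstract reports. The sketch does not model memory traffic and makes no claim about the size of that saving.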