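楊佳玲
臺灣大學:資訊工程學研究所
施澤聰 (Shih, Tse-Tsung)
2007-11-26
2018-07-05
2004
http://ntur.lib.ntu.edu.tw//handle/246246/54113

Abstract (translated from Chinese):
Multimedia applications have become an important workload on modern computer systems. The latest-generation video compression standard, H.264/AVC, adopts many compression tools that improve compression efficiency and image quality but also add considerable implementation complexity. The increased computation and storage requirements make it challenging for general-purpose processors to play back video in real time. This study identifies the performance bottlenecks of running an H.264 decoder on a modern general-purpose processor; understanding the decoder's characteristics lets us tune both the hardware processor architecture and the software implementation for better performance. The analysis focuses on the instruction-level parallelism inherent in the H.264 decoder, memory performance, and the predictability of program control flow. In addition, I study which program features (video content, size, bitrate) and newly added coding tools (multiple reference frames, CABAC) directly affect the performance of superscalar processors. I adopt a software-simulation-based approach to characterize this workload, which makes it possible to explore the design space thoroughly and to evaluate different architectural enhancements flexibly. Important findings include: 1) the H.264 decoder does have significant instruction-level parallelism; 2) H.264 performance is not memory-bound, because data is reused at block granularity and stays in the data cache; 3) H.264 has poor branch prediction, caused by nested loops and content-dependent branches, and loop unrolling and absolute-value instructions can greatly reduce the time spent stalled on mispredictions. As for program features, video content and size have only a small effect on the cache, while a higher bitrate increases the execution time of entropy decoding. The newly added multiple-reference-frame (multi-ref frame) scheme has no direct effect on the data cache, because data referenced from earlier frames cannot stay in the data cache for reuse. CABAC has worse control-flow predictability than CAVLC, because it processes the bitstream bit by bit and each such branch has a one-in-two chance of being mispredicted.

Abstract (English original):
Multimedia applications have become important workloads for modern computer systems. The latest video coding standard, H.264/AVC, adopts many coding tools that improve coding efficiency and visual quality but also add considerable implementation complexity. The increased computation and storage requirements make it challenging to achieve real-time video playback on general-purpose processors (GPPs). In this thesis, I study and analyze the performance of a software implementation of the H.264/AVC decoder on GPPs. This study identifies the performance bottlenecks of running the H.264 decoder on a modern GPP. Understanding the characteristics of the H.264 decoder allows us to tune both the hardware processor architecture and the software implementation for performance. I analyze three important program characteristics: the intrinsic available instruction-level parallelism (ILP), program locality, and control flow predictability.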
Furthermore, I investigate which application features (sequence content, resolution, bitrate) and newly added coding tools (multi-ref frames, CABAC) have a direct impact on performance. In this study, I adopt a simulation-based approach to workload characterization, which allows us to explore the design space thoroughly and evaluate different architectural enhancements. The important findings of this study include: 1) The H.264 decoder does present significant instruction-level parallelism. 2) H.264 is computation-bound rather than memory-bound, because block-level data reuse can be captured by the data cache. 3) H.264 has poor branch predictability due to nested loops and content-dependent branches. Loop unrolling and absolute-value instructions can reduce branch stall time significantly. 4) For application features, video content with low motion and smaller resolution increases the opportunity for inter-frame prediction, thereby increasing cache miss rates. A higher bitrate increases the execution time of entropy coding. The newly added multi-ref frame tool does not have a direct impact on cache performance, since inter-frame reuse cannot be captured in the data cache.
CABAC has lower control flow predictability than CAVLC due to bit-wise access to the bitstream.

List of Tables
List of Figures
1 Introduction
2 Related Work
2.1 Multimedia Application Characteristics
2.2 Video Application Characteristic
2.3 Media Architectural Enhancements
3 H.264/AVC Overview
3.1 Motion Compensation
3.1.1 Intra Prediction
3.1.2 Inter Prediction
3.2 Integer Transform
3.3 Entropy Coding
3.4 Deblocking Filter
4 Tools and Methodology
5 Workload Characterization
5.1 ILP Analysis
5.2 Memory System Characteristic
5.3 Control Flow Behavior
6 Conclusion
Bibliography

547403 bytes
application/pdf
en-US
Workload Characterization; Video Compression; AVC; CABAC; H.264
MPEG-4 Part.10 AVC Workload Characterization
thesis
http://ntur.lib.ntu.edu.tw/bitstream/246246/54113/1/ntu-93-R91922099-1.pdf
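
Finding 3 in the abstract attributes branch stalls to nested loops and content-dependent branches, and names loop unrolling and an absolute-value instruction as remedies. The C sketch below illustrates that idea on a hypothetical kernel, a sum of absolute differences over four samples. The kernel and the function names (sad4_branchy, abs_branchless, sad4_unrolled) are assumptions made for illustration and are not taken from the thesis or from any particular decoder. The first version has one data-dependent branch per sample plus the loop branch; the second removes both by computing the absolute value arithmetically and unrolling the loop by hand.

```c
#include <stdint.h>

/* Baseline: one content-dependent branch per sample, plus the loop branch.
 * Whether (d < 0) is taken depends on pixel data, so it is hard to predict. */
static int sad4_branchy(const uint8_t *a, const uint8_t *b)
{
    int sum = 0;
    for (int i = 0; i < 4; i++) {
        int d = a[i] - b[i];
        if (d < 0)
            d = -d;
        sum += d;
    }
    return sum;
}

/* Branch-free absolute value (hypothetical helper).
 * mask is 0 when d >= 0 and -1 (all bits set) when d < 0, so
 * (d ^ mask) - mask yields d or -d without a conditional jump.
 * Valid for any d other than INT_MIN; pixel differences are in [-255, 255]. */
static inline int abs_branchless(int d)
{
    int mask = -(d < 0);
    return (d ^ mask) - mask;
}

/* Unrolled by hand: no loop counter, no loop branch, no data-dependent branch. */
static int sad4_unrolled(const uint8_t *a, const uint8_t *b)
{
    return abs_branchless(a[0] - b[0])
         + abs_branchless(a[1] - b[1])
         + abs_branchless(a[2] - b[2])
         + abs_branchless(a[3] - b[3]);
}
```

Both versions return the same value for any input; only the control flow differs. An optimizing compiler may already turn the branchy form into a conditional move or a dedicated absolute-value instruction on some targets, so the actual benefit would need to be measured on the processor of interest.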