Title: Speaker Localization for Comic-Styled Film Summarization (用於漫畫式影片摘要之對話角色定位)
Author: 李根逸 (Lee, Ken-Yi)
Advisor: 莊永裕
Affiliation: National Taiwan University: Graduate Institute of Computer Science and Information Engineering
Type: thesis
Year: 2006
Record dates: 2007-11-26; 2018-07-05
Handle: http://ntur.lib.ntu.edu.tw//handle/246246/53653
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/53653/1/ntu-95-R94922024-1.pdf
File: 784191 bytes, application/pdf
Language: en-US
Keywords: video summarization, face recognition, video browsing (影片摘要, 人臉辯識, 影片瀏覽)

Abstract

Comics are a static medium that nonetheless conveys temporal and speech information. Comic-styled film summarization is a new and useful way to present the motion and speech of a film in two-dimensional space. Transforming a movie into a comic book, however, poses many challenges. One of them is placing dialog balloons at the correct positions within each panel. We propose a semi-automatic speaker localization system for comic-styled film summarization that finds where the dialog balloons should be placed with minimal user hints. In a film, actors may perform complex motions and interactions, and environmental factors such as lighting may vary widely. For robustness, the system is based on a greedy face clustering algorithm, and its results are good enough for use in a practical comic-styled film summarization system.

Table of Contents

Oral Defense Committee Certification
Acknowledgements
Abstract (Chinese)
Abstract
1 Introduction
  1.1 Motivation
  1.2 Our Goal
2 Related Work
  2.1 Facial Expression Recognition
  2.2 Audio-Visual Synchronization
  2.3 Automatic Character Retrieval
  2.4 Automatic Cast Listing
3 The Speaker Localization System
  3.1 System Overview
  3.2 Shot Boundary Detection and Shot Selection
    3.2.1 Canny Edge Detector
    3.2.2 Edge Change Ratio Algorithm
    3.2.3 Correspondent Shot Selection
  3.3 Face Detection
    3.3.1 Viola-Jones Cascaded Algorithm
    3.3.2 Skin Color Detector
  3.4 First Face Tracking
    3.4.1 Continuously Adaptive Mean Shift (CAMSHIFT) Algorithm
    3.4.2 Track Construction
  3.5 Face Clustering
    3.5.1 Preprocessing
    3.5.2 Our Greedy Recognition Algorithm
    3.5.3 Distance Measure between Faces
    3.5.4 Distance Measure between Tracks
    3.5.5 Greedy Clustering Procedure
  3.6 Final Face Tracking
4 Results and Discussions
  4.1 Evaluation of Our System
  4.2 Performance Factors
    4.2.1 Face Detection
    4.2.2 Face Tracking
    4.2.3 Face Clustering
5 Conclusions and Future Work
  5.1 Future Work