Advisor: 吳家麟
Department: 臺灣大學資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering, National Taiwan University)
Author: 朱威達 (Chu, Wei-Ta)
Year: 2006
Record dates: 2007-11-26; 2018-07-05
Handle: http://ntur.lib.ntu.edu.tw//handle/246246/53908

Chinese abstract (translated):
Pushing content analysis toward the semantic level has been a rapidly growing research topic in the multimedia field in recent years. The results of such analysis better match users' needs and make content management and applications more efficient. Unlike conventional content-based retrieval, semantic analysis of digital content combines pattern recognition and machine learning techniques with specific production rules and domain knowledge to bridge the gap between low-level features and high-level semantics. Building on machine learning and pattern recognition, many systems combine the results of different classifiers, different features, or different modalities to perform semantic analysis. In this dissertation, we propose a general framework for such research, in which mid-level information between audiovisual features and semantic concepts is introduced to assist the analysis. We develop three systems that detect semantic concepts in movies, baseball videos, and general sports videos. In action movies, we detect semantic concepts such as gunplay and car chases through audio information, using statistical methods to model concepts and to map between different levels of semantics. In baseball games, we combine rule-based and model-based methods over visual and speech information for semantic concept detection; thirteen concepts in total, such as single, double, home run, and strikeout, can be detected, which enables a number of practical applications. In general sports videos, we propose using the ball trajectory to assist content analysis, so that new types of semantic concepts, such as a pitcher's pitch types in baseball, can be described and detected. All three lines of work are built on the proposed general framework and thus demonstrate its practicality for semantic concept detection.

Abstract:
Conducting content analysis approaching the semantic level is an emerging trend in multimedia research. Such analysis better matches users' needs and facilitates content management and utilization in a more effective and reasonable way. Unlike conventional content-based retrieval or indexing, work on semantic analysis integrates statistical pattern recognition and machine learning with specific production rules or domain knowledge to bridge the semantic gap between low-level features and high-level semantics. On the basis of machine learning and pattern recognition technologies, systems have been developed that combine analytical results from different classifiers, different features, or different modalities. In this dissertation, we propose a general framework that introduces a mid-level representation between audiovisual features and semantic concepts. Two types of techniques, statistical pattern recognition and rule-based decision, are combined to narrow the semantic gap. We develop three systems that conduct semantic concept detection in action movies, broadcast baseball games, and general sports videos, respectively. In action movies, we detect semantic concepts, such as gunplay and car-chasing scenes, by analyzing aural information. Statistical approaches are exploited to model concepts and to map between different semantic granularities. In baseball games, visual and speech information are combined, and a hybrid method that includes rule-based and statistical techniques is designed for semantic concept detection. Thirteen semantic concepts, such as single, double, home run, and strikeout, are explicitly detected, and several realistic applications can therefore be built. In general sports videos, we extract the ball trajectory as a new type of metadata for describing content characteristics. Novel semantic concepts, such as pitch types in baseball games, can therefore be modeled and detected. These studies are instances of the proposed general framework and demonstrate the realization of automatic semantic concept detection.

Table of Contents:
致謝 (Acknowledgements) i
Abstract iii
中文摘要 (Chinese Abstract) iv
List of Figures x
List of Tables xiii
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Related Works 2
1.2.1 Categorize by Modality 2
1.2.2 Categorize by Level of Analysis 3
1.2.3 Categorize by Processing Methods 4
1.2.4 Concerns from International Standards 4
1.3 Semantic Concept Detection 6
1.3.1 From Feature to Knowledge 6
1.3.2 Pattern Recognition vs. Semantic Concept Detection 8
1.4 Problem Statement 10
1.5 Summary of Contributions 10
1.5.1 Audio Semantic Concept Detection in Movies 10
1.5.2 Explicit Baseball Concept Detection 11
1.5.3 Trajectory-Based Analysis in Baseball Videos 11
1.6 Dissertation Organization 12
Chapter 2 A Unified Framework for Multimedia Semantic Analysis 13
2.1 Content Analysis and Concept Language 13
2.2 Content Chain Framework 14
2.2.1 Framework Overview 14
2.2.2 Deterministic Mapping Function 16
2.2.3 Nondeterministic Mapping Function 16
2.2.4 Generality of the Content Chain Framework 16
2.3 Framework Correspondence 18
2.3.1 Semantic Concept Detection in Movies 18
2.3.2 Semantic Concept Detection in Baseball Videos 19
2.3.3 Trajectory-based Analysis in Sports Videos 20
2.4 Summary 21
Chapter 3 Semantic Analysis in Movies through Audio Information 23
3.1 Introduction 23
3.2 Hierarchical Audio Models 24
3.2.1 Audio Event and Semantic Concept 25
3.2.2 Hierarchical Framework 26
3.3 Audio Feature Extraction 27
3.3.1 Short-Time Energy 27
3.3.2 Band Energy Ratio 28
3.3.3 Zero-Crossing Rate 28
3.3.4 Frequency Centroid 29
3.3.5 Bandwidth 29
3.3.6 Mel-Frequency Cepstral Coefficients 29
3.4 Audio Event Modeling 30
3.4.1 Model Size Estimation 30
3.4.2 Model Training 31
3.4.3 Specific and World Distribution 32
3.4.4 Pseudo-Semantic Features 33
3.5 Generative Modeling for Semantic Concept 35
3.5.1 Model Training 36
3.5.2 Semantic Concept Detection 36
3.6 Discriminative Modeling for Semantic Concept 36
3.6.1 Model Training 37
3.6.2 Semantic Concept Detection 38
3.7 Performance Evaluation 38
3.7.1 Evaluation of Audio Event Detection 39
3.7.1.1 Overall Performance 40
3.7.1.2 Performance Comparison 41
3.7.2 Evaluation of Semantic Concept Detection 42
3.7.3 Comparison with Baseline System 44
3.7.4 Discussion 46
3.7.5 Semantic Indexing Based on the Proposed Framework 46
3.8 Summary 47
Chapter 4 Semantic Analysis and Game Abstraction in Baseball Videos 49
4.1 Introduction 49
4.2 System Framework 51
4.2.1 Characteristics of Baseball Games 51
4.2.2 Overview of System Framework 52
4.3 Shot Classification 53
4.3.1 Procedure of Shot Classification 53
4.3.2 Adaptive Field Color Determination 54
4.3.3 Infield/Outfield Classification 55
4.3.4 Pitch Shot Detection 55
4.4 Concept Detection 56
4.4.1 Rule-based Concept Detection 56
4.4.1.1 Caption Feature Extraction 57
4.4.1.2 Feature Filtering 58
4.4.1.3 Concept Identification 59
4.4.2 Model-based Concept Detection 61
4.4.2.1 Shot Context Features 62
4.4.2.2 Modeling 63
4.4.3 Combine Visual Cues with Speech Information 63
4.4.3.1 Overview 63
4.4.3.2 Information Fusion 65
4.4.4 Results of Concept Detection 67
4.5 Extended Applications 71
4.5.1 Automatic Game Summarization 71
4.5.1.1 Significance Degree of Concepts 72
4.5.1.2 Selection of Summarization 72
4.5.1.3 Evaluation of Summarization 74
4.5.2 Automatic Highlight Generation 75
4.5.2.1 Significance Degree of Concepts 75
4.5.2.2 Highlight Selection Algorithm 77
4.5.2.3 Evaluation of Highlight 78
4.5.3 An Integrated Baseball System 80
4.6 Discussion and Summary 82
Chapter 5 Semantic Analysis in Sports Videos through Ball Trajectory 85
5.1 Introduction 85
5.2 System Overview 86
5.3 Ball Candidate Detection 87
5.4 Trajectory Forming Process 89
5.4.1 Trajectory Segments Generation 90
5.4.2 Trajectory Candidates Generation 92
5.4.3 Physical Model-Based Trajectory Validation 93
5.4.3.1 Physical Model of Ball Trajectory 93
5.4.3.2 Trajectory Validation via Physical Limitation 96
5.5 Trajectory-based Analysis in Different Sports 97
5.5.1 Pitch Type Recognition in Baseball Videos 97
5.5.1.1 Pitch Type Recognition 98
5.5.1.2 Evaluation of Trajectory Extraction 101
5.5.1.3 Evaluation of Pitch Type Recognition 102
5.5.2 Penalty Kick Analysis in Soccer Videos 103
5.5.2.1 Soccer Trajectory Extraction 103
5.5.2.2 Evaluation of Soccer Trajectory Extraction 105
5.5.3 Tactics Analysis in Tennis Videos 105
5.5.3.1 Tennis Trajectory Extraction 105
5.5.3.2 Evaluation of Tennis Trajectory Extraction 106
5.6 Discussion and Summary 107
Chapter 6 Future Research and Conclusions 109
6.1 Discussions 109
6.1.1 Content Adaptation Architecture 109
6.1.2 Content Adaptation Modeling 110
6.2 Future Research 112
6.3 Conclusions 113
Appendix A Hidden Markov Model 115
A.1 Specification 115
A.2 Inside HMM 116
A.2.1 Solution to the Evaluation Problem: The Forward Algorithm 117
A.2.2 Solution to the Decoding Problem: The Viterbi Algorithm 118
A.2.3 Solution to the Learning Problem: The Baum-Welch Algorithm 119
Appendix B Support Vector Machine 120
B.1 Introduction 120
B.2 Training and Testing 121
B.3 Multiclass SVM 122
Appendix C Computational Media Aesthetics 124
C.1 Film Grammar 124
C.2 Computational Media Aesthetics (CMA) 124
C.3 Examples of CMA Applications 126
C.3.1 Formulating Film Tempo [Dora02] 126
C.3.2 Horror Film Genre Typing and Scene Labeling via Audio Analysis [Monc03] 126
C.3.3 Pivot Vector Space Approach for Audio-Video Mixing [Mulh03] 126
C.4 Semantic Indexing vs. CMA 127
References 129
Curriculum Vitae 141

File: 2367078 bytes, application/pdf
Language: en-US
Keywords: 語意分析 (semantic analysis); 影片分析與組織 (video analysis and organization); 事件與概念偵測 (event and concept detection); 視訊檢索 (video indexing)
Title: 具語意基礎之電影與運動影片內容分析及組織 (Semantics-based Content Analysis and Organization in Movies and Sports Videos)
Type: thesis
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/53908/1/ntu-95-D91922016-1.pdf
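To make the audio pipeline outlined in Chapter 3 concrete (frame-level features such as MFCCs, per-event statistical models, pseudo-semantic features, and a discriminative concept classifier), the following Python sketch gives a minimal, assumed reconstruction rather than the dissertation's actual implementation; the event list, feature layout, and the use of scikit-learn GMMs and SVMs are illustrative choices only.

```python
# Minimal sketch (assumed, not the dissertation's code): per-event GMMs produce
# pseudo-semantic features; an SVM maps them to a semantic concept label.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

AUDIO_EVENTS = ["gunshot", "explosion", "engine", "speech"]  # hypothetical event set

def train_event_models(event_frames, n_components=8):
    """Fit one GMM per audio event on its frame-level feature vectors (e.g., MFCCs)."""
    models = {}
    for event, frames in event_frames.items():
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                              random_state=0)
        gmm.fit(frames)                      # frames: (num_frames, feature_dim)
        models[event] = gmm
    return models

def pseudo_semantic_feature(models, clip_frames):
    """Average per-event log-likelihoods over a clip: one value per audio event."""
    return np.array([models[e].score(clip_frames) for e in AUDIO_EVENTS])

def train_concept_classifier(models, clips, labels):
    """Discriminative concept modeling: an SVM over pseudo-semantic features."""
    X = np.vstack([pseudo_semantic_feature(models, c) for c in clips])
    clf = SVC(kernel="rbf", probability=True)
    clf.fit(X, labels)                       # labels: e.g., "gunplay" vs. "other"
    return clf
```

At detection time, a clip's pseudo-semantic feature vector would be passed to the classifier's predict (or predict_proba) method to decide whether a concept such as gunplay is present.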
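Chapter 4's rule-based branch infers concepts from changes in the superimposed caption (score, outs, base occupancy) around pitch shots. The sketch below conveys the flavor of such rules with a hypothetical, heavily simplified rule table; the Scoreboard fields and concept names are assumptions, and the actual system covers thirteen concepts and additionally fuses model-based and speech cues.

```python
# Simplified sketch (assumed rules, not the dissertation's full rule set):
# infer a coarse baseball concept from scoreboard changes between pitch shots.
from dataclasses import dataclass

@dataclass
class Scoreboard:
    outs: int       # 0-2
    score: int      # batting team's runs
    bases: tuple    # occupancy of (1st, 2nd, 3rd), e.g., (True, False, False)

def infer_concept(before: Scoreboard, after: Scoreboard) -> str:
    """Map a scoreboard transition to a coarse concept label."""
    if after.outs == before.outs + 1 and after.bases == before.bases:
        return "strikeout_or_out"     # finer distinction needs visual/speech cues
    if after.score > before.score and after.bases == (False, False, False):
        return "homerun_candidate"
    if not before.bases[0] and after.bases[0]:
        return "single_candidate"
    if not before.bases[1] and after.bases[1]:
        return "double_candidate"
    return "unknown"

# Example: bases empty -> runner on first, outs unchanged => single candidate.
print(infer_concept(Scoreboard(1, 0, (False, False, False)),
                    Scoreboard(1, 0, (True, False, False))))
```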
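Chapter 5 validates candidate ball trajectories against a physical model before using them for analysis such as pitch type recognition. The sketch below illustrates the general idea under the assumption of near-parabolic image-plane motion; the residual threshold, minimum track length, and fitting choice are hypothetical and do not reproduce the dissertation's actual physical model or limits.

```python
# Minimal sketch (assumed model and thresholds): validate a candidate ball
# trajectory by fitting a parabola and checking fit quality and curvature sign.
import numpy as np

def validate_trajectory(points, max_residual=2.0):
    """points: list of (frame_index, x, y) image positions of one candidate track.

    Fit y(t) = a*t^2 + b*t + c by least squares; accept the track if the fit is
    tight and the curvature a has the sign expected under gravity (y grows
    downward in image coordinates, so a should be positive)."""
    t = np.array([p[0] for p in points], dtype=float)
    y = np.array([p[2] for p in points], dtype=float)
    if len(t) < 5:                       # too short to be a reliable trajectory
        return False
    coeffs = np.polyfit(t, y, 2)         # [a, b, c]
    residual = np.sqrt(np.mean((np.polyval(coeffs, t) - y) ** 2))
    return coeffs[0] > 0 and residual < max_residual

# Example: a synthetic downward-curving track passes the check.
track = [(i, 10 * i, 0.5 * i ** 2 + 3 * i + 100) for i in range(12)]
print(validate_trajectory(track))        # True
```

Features of an accepted trajectory (its fitted coefficients, for instance) could then feed a classifier for concepts such as pitch type, in the spirit of Section 5.5.1.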