Development of the Joint Attention with a New Face Tracking Method for Multiple People (新型多人人臉追蹤方法之共同注意力發展)

Contributor: 黃漢邦
Institution: National Taiwan University, Graduate Institute of Mechanical Engineering (臺灣大學：機械工程學研究所)
Author: 簡宏景 (Jian, Hung-Jing)
Record dates: 2007-11-28; 2018-06-28
Year: 2007
URI: http://ntur.lib.ntu.edu.tw//handle/246246/61350

Abstract

This thesis studies a method for tracking the faces of multiple people and the development of joint attention between a humanoid robot and its users. We propose a new multi-object tracking method, Modified Multi-CAMSHIFT (MMC), which combines two main cues, color and shape, to locate and track every face in the image more effectively. The color cue is computed with MMC, our extension of the CAMSHIFT algorithm; the shape cue is computed with a Scharr kernel mask. From these we build a color histogram and an orientation histogram, respectively, and feed both into Adaptive Feature Selection to make the optimal tracking decision. To distinguish face regions from non-face regions, we add an Eyes-pair Fast Extracting mechanism. The whole method runs within an adaptive multi-resolution (AMR) framework, which reduces the image-processing computation. Experimental results show that, with these mechanisms in place, the proposed MMC tracks multiple faces satisfactorily.
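The color cue described above builds on CAMSHIFT, whose core is a mean-shift iteration that moves a search window toward the centroid of a color probability image. The following is a minimal Python sketch of that baseline step only, not of the thesis's MMC. The function name `mean_shift`, the list-of-lists image, and the fixed window size are assumptions made for illustration; full CAMSHIFT also adapts the window size from the zeroth moment, which is omitted here.

```python
def mean_shift(prob, window, max_iter=20):
    """Move a fixed-size window toward the centroid of `prob`.

    prob   : 2D list of non-negative weights (e.g. skin-color probability)
    window : (x, y, w, h) with (x, y) the top-left column and row
    Returns the window position after convergence or max_iter steps.
    """
    x, y, w, h = window
    rows, cols = len(prob), len(prob[0])
    for _ in range(max_iter):
        # Zeroth and first image moments inside the current window.
        m00 = m10 = m01 = 0.0
        for r in range(max(0, y), min(rows, y + h)):
            for c in range(max(0, x), min(cols, x + w)):
                p = prob[r][c]
                m00 += p
                m10 += c * p
                m01 += r * p
        if m00 == 0:
            break  # no probability mass under the window; nothing to follow
        cx, cy = m10 / m00, m01 / m00  # centroid (mass center)
        # Re-center the window on the centroid.
        new_x = int(round(cx - (w - 1) / 2))
        new_y = int(round(cy - (h - 1) / 2))
        if (new_x, new_y) == (x, y):
            break  # converged
        x, y = new_x, new_y
    return x, y, w, h
```

Called with a probability image and an initial window that overlaps part of a face-colored blob, the function returns a window of the same size centered on the blob. Tracking several faces at once would need one such window per face, which is the problem the multi-window method in the abstract addresses.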
After the faces are found, we estimate the direction of each face and study the resulting human-robot interaction, namely joint attention. We establish joint attention with a human by using both static and dynamic information. The static information is the edge image of the human face, extracted while he/she is gazing at an object. The dynamic information is the optical flow that the robot detects while observing a human who shifts his/her gaze from the robot to another object. The two kinds of information are complementary. The static information gives the exact direction of gaze, although it is difficult to interpret. The dynamic information gives only a rough direction, but it makes the relationship between the direction of the gaze shift and the motor output needed to follow the gaze easy to understand. We use a Support Vector Machine (SVM) as the learning model. Learning from both kinds of information, acquired by observing a human's gaze shift, enables the robot to acquire joint attention efficiently and to interact naturally with the human. The dynamic information accelerates the learning of joint attention, while the static information improves task performance.
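The dynamic information described above is optical flow measured while the user's gaze shifts. The abstract does not say which optical-flow algorithm is used, so the sketch below assumes a single-patch Lucas-Kanade estimate purely for illustration. The function name `lucas_kanade_patch` and the list-of-lists frames are hypothetical and are not taken from the thesis.

```python
def lucas_kanade_patch(frame1, frame2):
    """Estimate one flow vector (u, v) for a whole image patch.

    frame1, frame2 : 2D lists of floats with identical shape
    Returns (u, v) in pixels per frame (u along columns, v along rows),
    or None when the patch has too little texture to solve for the flow.
    """
    rows, cols = len(frame1), len(frame1[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Spatial gradients by central differences on the first frame.
            ix = (frame1[r][c + 1] - frame1[r][c - 1]) / 2.0
            iy = (frame1[r + 1][c] - frame1[r - 1][c]) / 2.0
            # Temporal gradient between the two frames.
            it = frame2[r][c] - frame1[r][c]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Solve [[sxx, sxy], [sxy, syy]] [u, v]^T = [-sxt, -syt]^T.
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None  # aperture problem: gradients do not span two directions
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v
```

Under these assumptions, the sign and size of `u` and `v` over the face region give the kind of rough direction cue the text attributes to the dynamic information, for example whether the head moved left or right between two frames.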
Experimental results show that the proposed Modified Multi-CAMSHIFT was successfully applied to multiple-face tracking and to the development of joint attention.

Table of Contents

Abstract (in Chinese)
Abstract
Contents
List of Tables
List of Figures
Chapter 1 Introduction
  1.1 Motivation
  1.2 Related Works
    1.2.1 Object Tracking
    1.2.2 Human-Robot Interaction (HRI)
  1.3 Objectives and Contributions
  1.4 Thesis Organization
Chapter 2 Background Knowledge
  2.1 Color Space Used for Skin Modeling
  2.2 The CAMSHIFT Algorithm
    2.2.1 Introduction to the CAMSHIFT Algorithm
    2.2.2 Mass Center Calculation
    2.2.3 Probability Distribution
  2.3 Joint Attention
  2.4 Edge Detector
  2.5 Optical Flow
    2.5.1 Optical Flow Computation
  2.6 Support Vector Machine (SVM)
    2.6.1 Structural Risk Minimization
    2.6.2 Introduction to SVMs
Chapter 3 Faces Tracking for Multiple People
  3.1 Skin Color Model
    3.1.1 Skin Color Probability Modeling
    3.1.2 Adaptive Skin Color Probability Model Update
  3.2 Modified CAMSHIFT
    3.2.1 Interested probability Enhancement
    3.2.2 Initial Block Searching in Small Resolution
    3.2.3 Search Window of CAMSHIFT
    3.2.4 Center Tendency
  3.3 Multi-CAMSHIFT Algorithm
    3.3.1 Sort Indexes of MCAMSHIFT
  3.4 Adaptive Multi-Resolution (AMR)
  3.5 Modified Multi-CAMSHIFT Algorithm
    3.5.1 Adaptive Feature Extraction
    3.5.2 Eyes-pair Fast Extracting
Chapter 4 Development of Joint Attention
  4.1 Model of Joint Attention
  4.2 Face Image Orientation Detector
    4.2.1 Edge Detector for Static Information
    4.2.2 Optical Flow for Dynamic Information
  4.3 Learning Module with SVM
  4.4 Joint Attention with Modify Multi-CAMSHIFT
    4.4.1 PID Control Theorem in Pan-Tilt System
Chapter 5 Experiment Results
  5.1 System Overview
  5.2 Performance of Modified MCAMSHIFT Tracking
  5.3 Performance of Joint Attention with Modified Multi-CAMSHIFT Tracking Experiments
    5.3.1 Static Orientation Detector Experiments
    5.3.2 Dynamic Orientation Detector Experiments
    5.3.3 Tracking Object of Joint Attention with PTU System Experiments
    5.3.4 Graphical user interfaces of System
Chapter 6 Conclusions
  6.1 Conclusions
  6.2 Future Works
References

Language: en-US
Keywords: Face Tracking; CAMSHIFT (Continuously Adaptive Mean Shift algorithm); Edge Detection; Optical Flow; Joint Attention; SVM (Support Vector Machine)
English title: Development of the Joint Attention with a New Face Tracking Method for Multiple People
Type: thesis