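Contributor: 洪一平
Contributor: 臺灣大學:資訊工程學研究所 (National Taiwan University, Graduate Institute of Computer Science and Information Engineering)
Author: 鄭先廷 (Cheng, Hsien-Ting)
Record dates: 2007-11-26; 2018-07-05
Year: 2005
URI: http://ntur.lib.ntu.edu.tw//handle/246246/53601

Abstract (translated from Chinese): In this thesis, on the premise that more information yields better recognition results, we propose combining face and voice information at the confidence level to perform biometric person identity verification. From a system perspective, we build an on-line identity verification system that offers easy user enrollment, a user-friendly interface, and protection against impostor intrusion. From a methodological perspective, we use the latest techniques currently recognized as the best to develop the face and voice modules. For the fusion, to exploit all available face information, we propose a "multi-face/single-sentence" strategy that lowers the risk of face detection errors and registration (alignment) errors. A support vector machine (SVM) is chosen as the binary classifier. Beyond the individual modules and the back-end fusion, this thesis also examines the problem of learning from an imbalanced dataset. In general, the more training data available the better; however, if the training data is distributed very unevenly, ordinary classification methods are affected and become biased toward the class with more data. This problem is quite common in classification, but in the field of identity recognition or identity verification we are the first to raise and address it. Experimental results show that handling it at different levels and by different methods alleviates the imbalance.

Abstract (English): Based on the idea that more information brings better performance, this thesis presents a confidence-level fusion method that combines face and voice information for biometric person identity verification. From a system perspective, we develop an on-line verification system with a lightweight enrollment process, a fraud-prevention mechanism, and an easy-to-use verification interface. From an algorithmic point of view, state-of-the-art techniques are used to build the face and voice experts. Moreover, a multi-face/single-sentence strategy is proposed to use all the available information and reduce the cost of face misdetection and misregistration, and a support vector machine (SVM) is employed as the binary fusion classifier. In addition to the individual experts and the fusion work, another important issue raised in this thesis is learning from a class-imbalanced dataset. To train a good classifier, we usually use as much training data as possible. However, in many fields involving classification, the training data is highly imbalanced across classes, and ordinary classification algorithms then favor the class with more training samples. In the field of identity verification, we are the first to identify this issue and attempt to handle it.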
Approaches at different levels are studied and implemented to reduce the influence of the imbalanced dataset, leading to better performance.

Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Proposed method
  1.2 Recognition, identification, or verification?
  1.3 Organization of the thesis
Chapter 2 Background and related works
  2.1 Biometric person identity verification
  2.2 Face recognition
    2.2.1 Introduction
    2.2.2 Background
    2.2.3 Local feature approach and holistic approach
  2.3 Speaker recognition
    2.3.1 Introduction
    2.3.2 Text-dependent and text-independent speaker verification
  2.4 Fusion of multiple modalities
    2.4.1 Introduction
    2.4.2 Background
Chapter 3 Multimodal fusion of face and speaker verification experts
  3.1 Introduction
  3.2 System design
    3.2.1 System overview
  3.3 Face verification expert
    3.3.1 Cascade-boosting face detector
    3.3.2 Eigen-eye localization technique
    3.3.3 Reliable facial feature extraction
    3.3.4 Incremental Kernel Fisher Discriminant learning
  3.4 Speaker verification expert
    3.4.1 Log-likelihood ratio detector
    3.4.2 Integration of speech recognition and speaker verification
  3.5 Information fusion
    3.5.1 Confidence level fusion
    3.5.2 Multi-face/single-sentence strategy
    3.5.3 Decision making
Chapter 4 Learning from imbalanced data sets
  4.1 Introduction
  4.2 Data level approaches
    4.2.1 Random re-sampling
    4.2.2 Synthetic minority over-sampling technique
    4.2.3 Asymmetric bagging
  4.3 Algorithmic level approaches
    4.3.1 Cost-sensitive learning
Chapter 5 Experiments
  5.1 Database configuration
    5.1.1 In-House database
  5.2 Results
    5.2.1 Evaluation measurement
    5.2.2 Multimodal vs. single modality
    5.2.3 Imbalanced dataset learning
Chapter 6 Conclusion and future work
  6.1 Conclusion
  6.2 Future work
Bibliography

Size: 748362 bytes
Format: application/pdf
Language: en-US
Keywords: 身份確認 (person identity verification); 人臉確認 (face verification); 語者確認 (speaker verification); 多模式整合 (multimodal fusion); 支持分類器 (SVM); 不平衡資料集 (class-imbalanced dataset)
Title: 以不平衡資料集之分類技術進行結合人臉與語音之身分確認 (Fusion of Face and Voice Information in Person Identity Verification with Class-Imbalanced Dataset)
Type: thesis
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/53601/1/ntu-94-R92922006-1.pdf