Malicious Call Detection Using Supervised Learning Algorithms (監督式學習演算法之惡意來電偵測)

Advisor: 雷欽隆
Institution: National Taiwan University: Graduate Institute of Electrical Engineering
Author: 陳亭霓 (Chen, Ting-Ni)
Dates: 2014-11-28, 2018-07-06
Year: 2014
Record URL: http://ntur.lib.ntu.edu.tw//handle/246246/262877

Abstract (translated from Chinese)

With the rapid progress of technology, the mobile phone has become indispensable in most people's lives. The wide availability of communication technology has also led to crimes such as telephone fraud, in which users may have personal data stolen or even suffer financial loss. As mobile phone user interfaces have improved, most users rely on the contact book and caller ID to identify callers. However, if the calling number is not recorded in the contact book, the user has no way of judging whether the call is malicious. Current methods for identifying malicious calls use blacklists, to which users report malicious phone numbers; detection accuracy therefore depends on the reporters and on how the blacklist is maintained. This thesis uses supervised learning algorithms and users' past behavior to classify malicious and normal users. Based on the assumption that normal and malicious users behave differently, it attempts to have a computer automatically judge whether the owner of a number is a malicious user, and analyzes the accuracy of this classifier. Experimental results show that this approach can discover malicious callers earlier than the traditional method, making automatic judgment and detection of malicious calls possible.

Abstract

With rapid advances in technology, mobile phones have become popular and indispensable. The growth of telecommunication has also given rise to malicious calling behavior, through which users may suffer identity theft or even financial loss. As mobile phone user interfaces have improved, most users rely on caller IDs linked to their contact book to identify callers. However, it is often difficult to tell whether an unknown ID is malicious without additional information. Current malicious caller identification relies on blacklists built from user reports. Detecting malicious callers in this fashion proves difficult and inefficient because user reports are inconsistent and unreliable. Since malicious and benign call patterns may differ, this study aims to automatically predict whether an unknown ID is malicious by observing its past call history. We collected phone call histories in two different countries and applied machine learning algorithms to classify unknown IDs as benign or malicious. We evaluated several classifiers and compared the experimental results with the conventional blacklist approach. Empirical results suggest that the proposed method is effective and can be a viable approach to detecting malicious calls.

Contents

Acknowledgements
Abstract (Chinese)
Abstract
1 Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Organization
2 Related Work
  2.1 Fraud and telemarketing detection
  2.2 Financial fraud detection
3 Machine Learning and Classification
  3.1 Machine Learning
    3.1.1 Supervised Learning
    3.1.2 Classification
  3.2 Supervised classification algorithms
  3.3 Support Vector Machine
  3.4 Evaluation of Data Results
    3.4.1 Cross Validation
    3.4.2 Confusion Matrix
    3.4.3 Receiver Operating Characteristics Curve (ROC Curve)
4 Method
  4.1 Problem Definition
  4.2 Data Collection
  4.3 Preprocessing
  4.4 Experiments
    4.4.1 Feature Extraction
    4.4.2 Experiment 1
    4.4.3 Experiment 2
    4.4.4 Experiment 3
    4.4.5 Experiment 4
5 Result and Analysis
  5.1 Experiment 1
    5.1.1 Accuracy
    5.1.2 False Positive and False Negative Rate
    5.1.3 ROC Curves
    5.1.4 Rationality of Feature Weight
    5.1.5 Expected Features
    5.1.6 Robustness of Features
  5.2 Experiment 2
    5.2.1 Accuracy
    5.2.2 False Positive and False Negative Rate
    5.2.3 ROC Curves
  5.3 Experiment 3
  5.4 Experiment 4
6 Conclusions and Future Work
A Recorded Attributes
B Extracted Features
Bibliography

File size: 1098995 bytes
Format: application/pdf
Thesis public release date: 2019/07/29
Thesis usage rights: paid licensing granted (royalties returned to the university)
Keywords: machine learning, supervised learning, classification, cross-validation
English title: Who's Calling? Malicious Call Detection Using Supervised Learning Algorithms
Type: thesis
File URL: http://ntur.lib.ntu.edu.tw/bitstream/246246/262877/1/ntu-103-R01921078-1.pdf