Advisor: 陳靜枝
National Taiwan University: Graduate Institute of Information Management
Author: 李永裕 (Lei, Weng-U)
Record dates: 2014-11-29; 2018-06-29
Year: 2014
Handle: http://ntur.lib.ntu.edu.tw//handle/246246/263483

Abstract: Credit assessment has become an important step for financial institutions in deciding whether to approve a customer's loan application. Using data mining techniques, a company can build stable and reliable assessment criteria and reduce the default risk of its lending. In practice, however, the data now come from a Big Data environment: large, messy data sources combined with complex data mining techniques make data processing and model application considerably harder. This study therefore proposes a Decision-Tree-Based Credit Assessment Approach (DTCAA). The decision tree is chosen as the main data mining model for its interpretability, its easily understood rules, and its competitive predictive performance; the risk rules it produces help a company understand its customers' characteristics and apply the results in practice. The approach also includes several data preprocessing methods that address the messy data definitions and sources found in Big Data, which lowers the barrier to practical implementation. Experiments on real car loan application data from one of the three biggest car collateral loan companies in Taiwan indicate that the decision tree remains competitive across various situations: even under different default ratios and changes in several factors, it provides predictive power close to that of other data mining methods.
Across the multiple factors examined, the experiments suggest that DTCAA is usable in practice.

Table of contents:
Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Motivation
  1.2 Objective
  1.3 Scope
Chapter 2 Literature Review
  2.1 Credit Assessment Problems
  2.2 Supervised Learning / Classification
  2.3 Decision Tree
  2.4 Conclusion
Chapter 3 Problem Description
  3.1 The Credit Assessment Problem
  3.2 Classification and Decision Tree
  3.3 The Big Data Environment
  3.4 Problem Statement
  3.5 Summary
Chapter 4 The Decision-Tree Based Credit Assessment Approach
  4.1 Step 1: Data Analysis and Preprocessing
    4.1.1 Defining the Target Variable
    4.1.2 Consolidating data
    4.1.3 Data Sampling and Attribute Selection
    4.1.4 Data Partition
  4.2 Step 2: Decision Tree Models Building
    4.2.1 Model Building
    4.2.2 Model Assessment
  4.3 Step 3: Data Prediction and Scoring
  4.4 Complexity
Chapter 5 Computational Analysis
  5.1 Data Description
  5.2 Factors
    5.2.1 Target Variable
    5.2.2 Different Multi-Class Approaches
    5.2.3 Variable Selection
  5.3 Experiments
    5.3.1 Case 1: Balance Dataset with 1 run
    5.3.2 Case 2: Balance Dataset with 30 runs
    5.3.3 Case 3: Imbalance Dataset with 1 run
    5.3.4 Case 4: Imbalance Dataset with 30 runs
  5.4 Summary
Chapter 6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work
Reference

File size: 10687378 bytes
Format: application/pdf
Thesis public release date: 2024/12/31
Thesis usage permission: paid licensing authorized (royalties returned to the school)
Keywords: credit assessment; decision tree; big data (巨量資料, 海量資料, 大數據); data mining; data integration
Title: A Decision Tree Classifier for Big Data Analytics on Credit Assessment Problem (應用於分析信用評估巨量資料的決策樹分類法)
Type: thesis
File: http://ntur.lib.ntu.edu.tw/bitstream/246246/263483/1/ntu-103-R01725019-1.pdf