Data Mining for the Classification Problem-The Inspiration of Genetic Programming

Huang, Jih-Jeng

Date Issued

2006

Date

2006

Author(s)

Huang, Jih-Jeng

Language

en-US

URI

http://ntur.lib.ntu.edu.tw//handle/246246/54442

Abstract

With the rapid development of storage technology, enterprises widely employ databases and data warehouses to extract useful information for supply chain management (SCM), enterprise resource planning (ERP), and customer relationship management (CRM). To extract the useful knowledge hidden in a database or data warehouse effectively, data mining plays a central role in the process of knowledge discovery in databases (KDD). Data mining can be considered the core of KDD: an iterative and interactive process for extracting valid, nontrivial, and interesting information and knowledge from large amounts of data. Data mining tasks can be divided into classification, regression, deviation detection, clustering, association rules, and sequential patterns. This dissertation focuses on the problem of data classification. Three models are developed to address the problems of conventional classification models. They incorporate the advantages of the discriminant-based and the induction-based methods by means of genetic programming (GP). The first model employs GP to build a classification model. GP is chosen because it can automatically and heuristically determine adequate discriminant functions and the valid attributes simultaneously. In addition, unlike artificial neural networks (ANNs), which are suited only to large data sets, GP can perform well even on small data sets. The second model, called IF-THEN ruled genetic programming (IF-THEN GP), is based on the principle of “divide and conquer.” A cut threshold is set, the indiscernible data are retrained with GP to form the second discriminant function, and further discriminant functions are obtained in the same way.
To combine the advantages of the discriminant-based and the induction-based methods, the third model we propose is two-stage genetic programming (2SGP), a hybrid model that integrates the function-based and the induction-based methods. First, IF-THEN rules are derived using GP. Next, the reduced data are fed into GP again to form a discriminant function that provides forecasting capability. We used two credit-scoring data sets to test the effectiveness of the proposed models and to compare them with conventional methods, including multi-layer perceptron (MLP), classification and regression tree (CART), C4.5, rough sets, and logistic regression (LR). On the basis of the numerical results, we conclude that the proposed methods outperform the other models and should be more suitable for real-life classification problems.
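
The "divide and conquer" idea behind the IF-THEN GP model in the abstract can be illustrated with a minimal sketch. It shows only the control flow: fit a discriminant function, treat samples whose score falls inside a cut threshold as indiscernible, and retrain on those. Everything here is an assumption made for illustration and is not taken from the dissertation: the names `fit_centroid_scorer`, `fit_cascade`, and `predict` are hypothetical, the toy data are invented, and a simple class-centroid discriminant stands in for the GP search, which this sketch does not implement.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]         # (features, label in {0, 1})
Scorer = Callable[[Sequence[float]], float]  # discriminant: > 0 means class 1


def fit_centroid_scorer(data: List[Sample]) -> Scorer:
    """Placeholder learner (NOT genetic programming): a linear score
    along the direction from the class-0 mean to the class-1 mean."""
    dim = len(data[0][0])

    def mean(cls: int):
        rows = [x for x, y in data if y == cls]
        if not rows:
            return None
        return [sum(r[i] for r in rows) / len(rows) for i in range(dim)]

    m0, m1 = mean(0), mean(1)
    if m0 is None or m1 is None:
        # Only one class present: always vote for that class.
        const = 1.0 if m0 is None else -1.0
        return lambda x: const
    w = [b - a for a, b in zip(m0, m1)]
    mid = [(a + b) / 2 for a, b in zip(m0, m1)]
    return lambda x: sum(wi * (xi - ci) for wi, xi, ci in zip(w, x, mid))


def fit_cascade(data: List[Sample], cut: float, max_stages: int = 3,
                fit: Callable[[List[Sample]], Scorer] = fit_centroid_scorer
                ) -> List[Scorer]:
    """Fit a discriminant, keep samples with |score| < cut as indiscernible,
    and retrain on them until nothing more is resolved."""
    stages: List[Scorer] = []
    remaining = list(data)
    for _ in range(max_stages):
        if not remaining:
            break
        f = fit(remaining)
        stages.append(f)
        unresolved = [s for s in remaining if abs(f(s[0])) < cut]
        if len(unresolved) == len(remaining):
            break  # this stage resolved nothing new; stop retraining
        remaining = unresolved
    return stages


def predict(stages: List[Scorer], x: Sequence[float], cut: float) -> int:
    """IF |f_k(x)| >= cut THEN decide with f_k, ELSE try the next stage.
    The last stage decides whatever its score is."""
    score = 0.0
    for f in stages:
        score = f(x)
        if abs(score) >= cut:
            break
    return 1 if score > 0 else 0


# Invented one-dimensional toy data: the two classes nearly touch at 4.4 / 4.6.
data = [([0.0], 0), ([1.0], 0), ([4.4], 0), ([4.6], 1), ([9.0], 1), ([10.0], 1)]
stages = fit_cascade(data, cut=5.0)
predictions = [predict(stages, x, cut=5.0) for x, _ in data]
print(len(stages), predictions)
```

In this toy run the first discriminant alone puts the sample at 4.6 on the wrong side, and the second one, trained only on the two indiscernible samples, separates them. Reusing a single cut value across stages is a simplification; how the threshold is actually chosen is not stated in the abstract.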

Subjects

Knowledge discovery

Data mining

Credit scoring

Classification models

genetic programming

artificial neural networks (ANNs)

decision tree

rough sets

logistic regression

Type

other

File(s)

Name

ntu-95-D91725010-1.pdf

Size

23.31 KB

Format

Adobe PDF

Checksum

(MD5):05136d5dcab26fc130f8a18c9dcc319a
