Efficient Classification for Mining Concept-Drifting Data Streams
Date Issued
2005
Date
2005
Author(s)
Lee, Yi-yao
DOI
en-US
Abstract
In this thesis we devise a concept-drift-driven classification algorithm, called SODA (Speedy Concept-Drift Detection Algorithm), to mine data streams with concept drift. SODA is an online incremental learning algorithm that keeps its model consistent with new concepts and processes each example in constant time. The contributions of SODA are manifold. We address the problem of detecting concept drift by inspecting the distribution of the single attribute that is most discriminative with respect to the target class. SODA captures concept drifts in data streams efficiently while attending to both the execution performance and the accuracy of the classifier. The empirical studies in Section 4 show that, by applying an efficient split-checking method, concept drift detection with statistical analysis, and an effective alternative-tree selection strategy, SODA outperforms prior work in execution efficiency, concept-drift detection performance, and economy of memory usage. The concepts in data streams can thus be captured and learned efficiently, and SODA is able to strike a balance between memory usage and classifier accuracy.
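The idea of detecting drift from the distribution of a single discriminative attribute can be illustrated with a minimal sketch. This is not the SODA procedure itself: the abstract does not say which statistical test the thesis uses, so the Pearson chi-square homogeneity test, the two-window comparison, and the names `chi_square_statistic` and `drift_detected` are all assumptions made for illustration. The default critical value 3.841 corresponds to one degree of freedom (an attribute with two values) at the 0.05 significance level.

```python
from collections import Counter


def chi_square_statistic(reference, recent):
    """Pearson chi-square statistic comparing the value distribution of one
    categorical attribute in two windows of a stream (illustrative only)."""
    if not reference or not recent:
        raise ValueError("both windows must contain at least one example")
    ref_counts, rec_counts = Counter(reference), Counter(recent)
    n_ref, n_rec = len(reference), len(recent)
    total = n_ref + n_rec
    stat = 0.0
    for value in set(ref_counts) | set(rec_counts):
        pooled = ref_counts[value] + rec_counts[value]
        for observed, n in ((ref_counts[value], n_ref), (rec_counts[value], n_rec)):
            # Expected count if both windows shared the same distribution.
            expected = pooled * n / total
            stat += (observed - expected) ** 2 / expected
    return stat


def drift_detected(reference, recent, critical_value=3.841):
    """Flag drift when the statistic exceeds the critical value.
    The default assumes a two-valued attribute and a 0.05 level (assumption)."""
    return chi_square_statistic(reference, recent) > critical_value


# Hypothetical windows of the attribute's values.
old_window = ["a"] * 50 + ["b"] * 50
new_window = ["a"] * 90 + ["b"] * 10
print(drift_detected(old_window, old_window))  # same distribution
print(drift_detected(old_window, new_window))  # shifted distribution
```

For an attribute with more than two values, the caller would need to pass the critical value for the corresponding degrees of freedom (number of distinct values minus one).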
Subjects
串流資料
概念遞移
決策樹
Data Streams
Concept Drift
Classification
Data Mining
Type
thesis
