Adaptive clustering for multiple evolving streams

Dai, Bi-Ru; Huang, Jen-Wei; Yeh, Mi-Yen; Chen, Ming-Syan

Title:	Adaptive clustering for multiple evolving streams
Authors:	Dai, Bi-Ru; Huang, Jen-Wei; Yeh, Mi-Yen; Chen, Ming-Syan
Keywords:	Data mining; clustering of multiple data streams; time-series clustering
Publication date:	September 2006
Publisher:	Taipei: National Taiwan University Dept Elect Engn
Volume:	18
Issue:	9
Pages:	1166-1180
Source publication:	IEEE Transactions on Knowledge and Data Engineering
Abstract:	In the data stream environment, the patterns generated at different time instances are different due to data evolution. As time progresses, the behavior and members of clusters usually change. Hence, clustering continuous data streams allows us to observe the changes of group behavior. In order to support flexible clustering requirements, we devise in this paper a Clustering on Demand framework, abbreviated as COD framework, to dynamically cluster multiple data streams. While providing a general framework of clustering on multiple data streams, the COD framework has two advantageous features, namely, one data scan for online statistics collection and compact multiresolution approximations, which are designed to address, respectively, the time and the space constraints in a data stream environment. The COD framework consists of two phases, i.e., the online maintenance phase and the offline clustering phase. The online maintenance phase provides an efficient mechanism to maintain summary hierarchies of data streams with multiple resolutions in time linear in both the number of streams and the number of data points in each stream. On the other hand, an adaptive clustering algorithm is devised for the offline phase to retrieve approximations of desired substreams from summary hierarchies according to clustering queries. We propose two summarization techniques, based on wavelet and regression analyses, to construct the summary hierarchies. The regression-based summary hierarchy approximates the data stream more precisely and provides better clustering results, at the cost of slightly more time and twice the storage space compared with the wavelet-based one. An adaptive version of the COD framework is designed to make a selection between a wavelet-based model and a regression-based model for building the summary hierarchy. With the adaptive COD, we can obtain clustering results with almost the same quality as the regression-based COD while using much less storage space for the summary hierarchy. As shown in the complexity analyses and also validated by our empirical studies, the COD framework performs very efficiently in the data stream environment while producing clustering results of very high quality.
URI:	http://ntur.lib.ntu.edu.tw//handle/246246/200611150121715 http://ntur.lib.ntu.edu.tw/bitstream/246246/200611150121715/1/687.pdf
Other identifiers:	246246/200611150121715
DOI:	10.1109/TKDE.2006.137
Appears in collections:	Department of Electrical Engineering

Files in this item:

File	Description	Size	Format
687.pdf		2.37 MB	Adobe PDF
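
Illustrative note: the two-phase idea described in the abstract (a one-pass online summary at multiple resolutions, then offline clustering on retrieved approximations) can be sketched roughly as follows. This is not the paper's algorithm. It assumes a plain pyramid of block averages (which match Haar wavelet approximation coefficients up to a scale factor) and a simple greedy threshold grouping as a stand-in for the clustering step. The names `AveragePyramid`, `approximation`, `cluster`, and the parameters `levels`, `keep`, `count`, and `threshold` are all hypothetical.

```python
from collections import deque
from math import dist


class AveragePyramid:
    """Hypothetical multi-resolution summary of one stream.

    Level k holds means of 2**k consecutive points; only the most
    recent `keep` entries per level are retained to bound storage.
    """

    def __init__(self, levels, keep):
        self.levels = [deque(maxlen=keep) for _ in range(levels)]
        self.pending = [None] * levels  # unpaired value waiting at each level

    def push(self, x):
        """Online phase: absorb one point in a single pass."""
        value = float(x)
        for k in range(len(self.levels)):
            self.levels[k].append(value)
            if self.pending[k] is None:
                self.pending[k] = value  # wait for a partner at this level
                return
            # Pair is complete: its mean becomes one entry of the next level.
            value = (self.pending[k] + value) / 2.0
            self.pending[k] = None

    def approximation(self, level, count):
        """Return the most recent `count` block means at `level`."""
        return list(self.levels[level])[-count:]


def cluster(pyramids, level, count, threshold):
    """Offline phase (stand-in): greedy grouping by Euclidean distance.

    A stream joins the first cluster whose first member lies within
    `threshold`; otherwise it starts a new cluster.
    """
    clusters = []
    for name, pyr in pyramids.items():
        vec = pyr.approximation(level, count)
        for members in clusters:
            rep = pyramids[members[0]].approximation(level, count)
            if dist(vec, rep) <= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Made-up toy streams, purely for illustration.
streams = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": [1, 2, 3, 4, 5, 6, 7, 9],
    "c": [8, 7, 6, 5, 4, 3, 2, 1],
}
pyramids = {}
for name, points in streams.items():
    pyr = AveragePyramid(levels=3, keep=4)
    for p in points:
        pyr.push(p)
    pyramids[name] = pyr

result = cluster(pyramids, level=2, count=2, threshold=1.0)
print(result)
```

Each `push` touches a level only when a pair completes, so the per-point cost is constant on average, and a clustering query can pick a coarser or finer `level` without rescanning the raw data.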
