Constrained Data Clustering

Dai, Bi-Ru


Date Issued

2006


Author(s)

Dai, Bi-Ru

Language

en-US

URI

http://ntur.lib.ntu.edu.tw//handle/246246/53565

Abstract

Among the various data mining capabilities, data clustering is a useful technique for investigating group behavior and is helpful in many applications. Since data mining is an application-dependent technology, domain knowledge is usually imposed on mining systems in the form of constraints. In this dissertation, we address the problem of clustering with numerical constraints, in which the constraint attribute values of any two data items in the same cluster are required to lie within the corresponding constraint range. Several algorithms are proposed to solve this clustering problem. Owing to the intrinsic nature of numerically constrained clustering, the clustering result depends on the order in which the data items are processed, and this order dependency degrades the clustering results in many cases. To remedy this drawback and improve the overall quality of the clustering results, we devise a progressive constraint relaxation technique. In addition to clustering static data sets, this dissertation also addresses the problem of clustering multiple data streams. We devise a Clustering on Demand framework, abbreviated as the COD framework, to dynamically cluster multiple data streams. The COD framework consists of two phases: an online maintaining phase and an offline clustering phase. The online maintaining phase provides an efficient mechanism for maintaining summary hierarchies of the data streams at multiple resolutions. For the offline phase, an adaptive clustering algorithm is devised to retrieve, according to the clustering queries, approximations of the desired sub-streams from the summary hierarchies. Finally, constraints and data streams are considered together. We devise a framework of Constrained Clustering for the Evolving Data Stream, abbreviated as the CCDS framework, to cluster a data stream under the pairwise range constraint. Two phases are designed: one to maintain the data points and one to generate the clusters.
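The pairwise range constraint in the abstract can be checked cheaply: for a single attribute, the largest difference between any two members of a cluster equals the cluster's maximum minus its minimum, so a cluster is feasible exactly when that spread is at most the range on every constraint attribute. The minimal Python sketch below assumes that every attribute of an item is a constraint attribute. Its first-fit assignment (`greedy_constrained_clustering`) is a hypothetical baseline, not one of the algorithms proposed in the dissertation, and it is included only to illustrate the order dependency the abstract mentions.

```python
from typing import List, Sequence, Tuple

Item = Tuple[float, ...]


def satisfies_range(cluster: List[Item], ranges: Sequence[float]) -> bool:
    """True if every pair in the cluster differs by at most ranges[k] on attribute k.

    For one attribute the largest pairwise difference is max - min, so checking
    the spread per attribute is equivalent to checking all pairs.
    """
    for k, r in enumerate(ranges):
        values = [item[k] for item in cluster]
        if values and max(values) - min(values) > r:
            return False
    return True


def greedy_constrained_clustering(items: Sequence[Item],
                                  ranges: Sequence[float]) -> List[List[Item]]:
    """Hypothetical first-fit baseline: put each item in the first cluster
    that still satisfies the constraint, otherwise open a new cluster."""
    clusters: List[List[Item]] = []
    for item in items:
        for cluster in clusters:
            if satisfies_range(cluster + [item], ranges):
                cluster.append(item)
                break
        else:  # no existing cluster can accept the item
            clusters.append([item])
    return clusters


if __name__ == "__main__":
    points = [(0.0,), (1.0,), (2.0,), (3.0,)]
    reordered = [(1.0,), (2.0,), (0.0,), (3.0,)]
    # Same data, same range of 1.0, different processing order.
    print(len(greedy_constrained_clustering(points, [1.0])))     # prints 2
    print(len(greedy_constrained_clustering(reordered, [1.0])))  # prints 3
```

In the second ordering, 1.0 and 2.0 are grouped first, which leaves 0.0 and 3.0 with no feasible partner, so an extra cluster appears. A relaxation strategy such as the one the abstract describes aims to reduce this kind of sensitivity.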
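The summary hierarchies with multiple resolutions mentioned for the online phase of the COD framework can be pictured as a simple averaging pyramid, in which each entry at one level is the mean of two adjacent entries at the level below. This is a minimal sketch under stated assumptions. `SummaryHierarchy`, `stream_distance`, and the pairwise-mean summary are hypothetical choices made for illustration, not the framework's actual data structure. For clarity, this version also keeps every level in full rather than bounding memory as an online mechanism would need to.

```python
import math
from typing import List


class SummaryHierarchy:
    """Hypothetical multi-resolution summary of one numeric stream.

    Level 0 stores raw values; each entry at level l + 1 is the mean of two
    consecutive entries at level l, so it summarizes 2 ** (l + 1) raw values.
    This sketch never discards old entries.
    """

    def __init__(self, max_level: int) -> None:
        self.max_level = max_level
        self.levels: List[List[float]] = [[] for _ in range(max_level + 1)]

    def append(self, value: float) -> None:
        # Insert at the finest level, then promote upward whenever a level
        # has just completed a pair of entries.
        self.levels[0].append(value)
        for level in range(self.max_level):
            current = self.levels[level]
            if len(current) % 2 != 0:
                break
            self.levels[level + 1].append((current[-2] + current[-1]) / 2.0)

    def approximate(self, level: int) -> List[float]:
        """Return the stream as seen at the given resolution."""
        return list(self.levels[level])


def stream_distance(a: SummaryHierarchy, b: SummaryHierarchy, level: int) -> float:
    """Euclidean distance between two streams compared at one resolution
    (zip truncates to the shorter approximation)."""
    xs, ys = a.approximate(level), b.approximate(level)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))


if __name__ == "__main__":
    h = SummaryHierarchy(max_level=3)
    for v in range(1, 9):
        h.append(float(v))
    print(h.approximate(1))  # prints [1.5, 3.5, 5.5, 7.5]
    print(h.approximate(3))  # prints [4.5]
```

Comparing streams at a coarse level touches fewer values than comparing the raw data. This is the cost-versus-accuracy trade-off a clustering query makes when it chooses a resolution.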

Subjects

Data Mining (資料探勘)

Data Clustering (資料叢集)

Data Stream (資料串流)

Type

thesis

File(s)

Name: ntu-95-F89921035-1.pdf
Size: 23.31 KB
Format: Adobe PDF
Checksum (MD5): c91797a2413fc4751027c5637cb5f275
