Categorical Data Clustering Using Gravitational Algorithm and Randomized Technique
Date Issued
2006
Author(s)
Wu, Tsung-Ho
Language
zh-TW
Abstract
Data clustering is an important research subject for identifying key properties of data sets. However, the existing literature has focused mainly on numerical data. In this thesis, we propose a method for clustering categorical data.
Our method introduces a similarity measure for categorical data based on information entropy: the difference between the entropy of the original data and the entropy after the data are merged into clusters. We use this measure as the distance between clusters and apply gravity theory to obtain the final clusters. To avoid becoming trapped in local optima, we incorporate randomization schemes into the algorithm. Digital search trees are also used to speed up the clustering process.
Experiments on data sets from the UCI Machine Learning Repository show that our algorithm produces results that are quite competitive with those reported in the existing literature.
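The entropy-based distance described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this example assumes one common formulation: the cost of merging two clusters is the size-weighted increase in the summed per-attribute Shannon entropy. The names `cluster_entropy` and `merge_cost` are hypothetical, and the gravitational and randomized steps of the thesis are not reproduced here.

```python
from collections import Counter
from math import log2


def cluster_entropy(rows):
    """Sum of per-attribute Shannon entropies (in bits) for one cluster.

    `rows` is a non-empty list of equal-length tuples of categorical values.
    """
    n = len(rows)
    total = 0.0
    for column in zip(*rows):  # iterate over attributes
        for count in Counter(column).values():
            p = count / n
            total -= p * log2(p)
    return total


def merge_cost(a, b):
    """Assumed distance: size-weighted entropy increase from merging a and b."""
    merged = a + b
    return (len(merged) * cluster_entropy(merged)
            - len(a) * cluster_entropy(a)
            - len(b) * cluster_entropy(b))


# Toy data: two attributes (colour, size) per record.
a = [("red", "small"), ("red", "small")]
b = [("red", "small")]
c = [("blue", "large")]

cost_similar = merge_cost(a, b)     # identical records: no entropy increase
cost_dissimilar = merge_cost(a, c)  # differing records: positive increase
```

Under this assumption, clusters whose records share attribute values have a small merge cost and are therefore "close", while merging dissimilar clusters raises the entropy and yields a larger distance.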
Subjects
Data Clustering (資料分群)
Categorical Data (類別性資料)
Categorical Data Clustering
Gravitational Algorithm
Type
thesis
File(s)
Name
ntu-95-R93922082-1.pdf
Size
23.31 KB
Format
Adobe PDF
Checksum
(MD5):3ede99f8ad87a6cdbdb1e431c3aaacff