Performance Evaluations of Clustering Algorithms for Categorical Variables, Illustrated with SNPs

Chen, Hung-Che

Date Issued

2016

Author(s)

Chen, Hung-Che

URI

http://ntur.lib.ntu.edu.tw//handle/246246/273820

Abstract

Digital data are being generated at an ever-increasing rate in modern life, and handling such large amounts of information has become an important issue for scientists. One way to reduce such a large volume of data is clustering. Clustering algorithms have a long history of development, but most of them target continuous observations such as age and weight. Few algorithms have been proposed for categorical data, and fewer still for large categorical data sets. In this paper we evaluate the performance of clustering algorithms for categorical variables. Specifically, we compare three algorithms: K-modes, Hamming distance-based clustering (HD cluster), and RObust Clustering using linKs (ROCK). We investigate how their performance is affected by the frequencies of the variables and by the correlation between variables. The evaluation criteria are the Rand Index (RI), the Adjusted Rand Index (ARI), the Number in Wrong Clusters, C-impurity, and Normalized Mutual Information (NMI). Simulation studies are conducted to compare the three algorithms. The results show that HD cluster performs at least as well as the other two algorithms in all tested cases. Finally, we discuss limitations of and future directions for the HD cluster algorithm.
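As a rough illustration of two quantities named in the abstract, the sketch below computes the Hamming distance between two categorical vectors and the Rand Index between two partitions. It is not taken from the thesis: the function names `hamming_distance` and `rand_index` are hypothetical, and the coding of SNP genotypes as 0, 1, 2 is an assumption made only for the example.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Count the positions at which two equal-length categorical vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rand_index(true_labels, pred_labels):
    """Return the fraction of object pairs on which two partitions agree.

    A pair agrees when both partitions place the two objects together,
    or both place them apart.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")
    n = len(true_labels)
    if n < 2:
        raise ValueError("at least two objects are required")
    agree = 0
    total = 0
    for i, j in combinations(range(n), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        agree += same_true == same_pred
        total += 1
    return agree / total


# Assumed coding for illustration only: each SNP genotype is 0, 1, or 2.
subject_a = [0, 1, 2, 1]
subject_b = [0, 2, 2, 0]
distance = hamming_distance(subject_a, subject_b)

# Cluster labels are arbitrary names, so a relabelled partition still matches.
ri_relabelled = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
ri_partial = rand_index([0, 0, 1, 1], [0, 1, 1, 1])
```

The Rand Index is computed over pairs rather than by matching labels directly, which is why swapping the cluster names in the second example leaves the score unchanged.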

Subjects

large volume

categorical data

Type

thesis

File(s)

Name

ntu-105-R02849031-1.pdf

Size

23.32 KB

Format

Adobe PDF

Checksum

(MD5):66138f48ffd632e7ec3fe5fe605da991
