A Data Clustering Algorithm Based on Multiple Pairwise Similarity Matrices
Date Issued
2016
Author(s)
Yang, Chun-Pai
Abstract
This thesis addresses the data clustering problem when only pairwise similarity information is available, in contrast to many conventional clustering methods that assume instance features are available. To make the setting more general, we take multiple similarity matrices as input. We propose a non-negative matrix factorization model, MDSC, to handle this task. The main concept is to learn a low-rank representation of the partition potentials shared by all the similarity matrices. Unlike other matrix factorization models, MDSC is formulated as a series of non-negative symmetric tri-factorizations, each approximating the similarity matrix of one dimension. In particular, the latent matrix is regarded as the clustering potential matrix and is shared across all dimensions. To fit every similarity matrix with the same potential matrix, middle matrices are introduced to adjust the bases and to absorb noise. An optimization framework is established to learn the model parameters. The experimental results show that the proposed model outperforms existing approaches on real-world datasets in terms of three common clustering performance metrics. We also show that our model converges quickly and reaches results of similar quality using less than 10% of the data.
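The shared-factor idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis's optimization framework, which the abstract does not specify. It assumes a squared Frobenius objective, the sum over dimensions d of ||S_d - H M_d H^T||^2 with H and every M_d non-negative, and minimizes it by projected gradient descent with a simple backtracking step size. The names `fit_shared_tri_factorization` and `reconstruction_loss`, the identity initialization of the middle matrices, and the argmax rule for reading cluster labels from H are all assumptions made for illustration.

```python
import numpy as np


def reconstruction_loss(similarities, H, Ms):
    """Sum of squared Frobenius errors ||S_d - H M_d H^T||^2 over all d."""
    return sum(np.linalg.norm(S - H @ M @ H.T) ** 2
               for S, M in zip(similarities, Ms))


def fit_shared_tri_factorization(similarities, n_clusters, n_iters=300, seed=0):
    """Illustrative fit of S_d ~ H M_d H^T with one shared non-negative H.

    Assumption: projected gradient descent with backtracking, chosen for
    simplicity; this is not the procedure described in the thesis.
    """
    rng = np.random.default_rng(seed)
    n = similarities[0].shape[0]
    H = rng.random((n, n_clusters))                   # shared potential matrix
    Ms = [np.eye(n_clusters) for _ in similarities]   # one middle matrix per S_d
    step = 1e-2
    loss = reconstruction_loss(similarities, H, Ms)
    history = [loss]

    for _ in range(n_iters):
        residuals = [S - H @ M @ H.T for S, M in zip(similarities, Ms)]
        # Gradients of the squared-error objective with respect to H and M_d.
        grad_H = sum(-2.0 * (R @ H @ M.T + R.T @ H @ M)
                     for R, M in zip(residuals, Ms))
        grad_Ms = [-2.0 * (H.T @ R @ H) for R in residuals]

        # Gradient step, then projection onto the non-negative orthant.
        H_new = np.maximum(H - step * grad_H, 0.0)
        Ms_new = [np.maximum(M - step * g, 0.0) for M, g in zip(Ms, grad_Ms)]
        new_loss = reconstruction_loss(similarities, H_new, Ms_new)

        if new_loss < loss:
            # Accept the step and cautiously enlarge the step size.
            H, Ms, loss = H_new, Ms_new, new_loss
            step *= 1.1
        else:
            # Reject the step and shrink the step size.
            step *= 0.5
        history.append(loss)

    # Assumption: each instance is assigned to its largest potential in H.
    labels = H.argmax(axis=1)
    return H, Ms, labels, history
```

Calling `fit_shared_tri_factorization([S1, S2], n_clusters=2)` on two square similarity matrices of the same size returns the shared matrix, one middle matrix per input, the label vector, and the loss history. Because a step is accepted only when it lowers the loss, the recorded history never increases, which makes the sketch convenient for inspecting convergence on small inputs.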
Subjects
Data Clustering
Machine Learning
Data Mining
Type
thesis
File(s)
Name
ntu-105-R03922039-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):9022b2216cc991494e815806eaea01e7