CDBSCAN: Cloud Based DBSCAN Clustering Algorithm
Date Issued
2011
Date
2011
Author(s)
Chen, Tze-Yu
Abstract
DBSCAN is a well-known density-based clustering algorithm that can identify clusters of arbitrary shape in noisy data. However, as data sets grow larger, DBSCAN cannot process them efficiently, because a single machine is difficult to scale up. Cloud computing has recently matured and offers a way to address this scalability problem. In this thesis, we propose CDBSCAN (cloud-based DBSCAN), a distributed version of DBSCAN implemented on the Hadoop platform. We use Map/Reduce jobs to cluster the partitioned data set and merge the individual clustering results. Experimental evaluations show that CDBSCAN is a highly parallel algorithm that requires only one Map/Reduce job and achieves near-linear scalability.
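The abstract does not say how the data set is partitioned or how the partial results are merged, so the following is only a minimal single-process sketch of the general "cluster each partition, then merge" idea, not the thesis's Hadoop implementation. It assumes partitioning into strips along the first coordinate with an overlap of width eps, and a merge rule that joins two local clusters when they share a point that is a core point in the partition that owns it. All function names and parameters (`local_dbscan`, `partition`, `cdbscan_sketch`, `cuts`) are hypothetical.

```python
from collections import defaultdict
from math import dist


def local_dbscan(points, eps, min_pts):
    """Plain DBSCAN on one partition; points maps id -> coordinates."""
    ids = list(points)
    neighbors = {i: [j for j in ids if dist(points[i], points[j]) <= eps] for i in ids}
    core = {i for i in ids if len(neighbors[i]) >= min_pts}
    labels, cluster = {}, 0
    for i in ids:
        if i in labels or i not in core:
            continue
        labels[i] = cluster
        stack = [i]  # only core points are ever expanded
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if q not in labels:
                    labels[q] = cluster
                    if q in core:
                        stack.append(q)
        cluster += 1
    return labels, core


def partition(points, eps, cuts):
    """Assumed scheme: strips along the first axis, each padded by eps."""
    bounds = [float("-inf"), *cuts, float("inf")]
    parts = []
    for lo, hi in zip(bounds, bounds[1:]):
        owned = {i for i, p in points.items() if lo <= p[0] < hi}
        local = {i: p for i, p in points.items() if lo - eps <= p[0] < hi + eps}
        parts.append((owned, local))
    return parts


def cdbscan_sketch(points, eps, min_pts, cuts):
    parent = {}

    def find(x):  # union-find over (partition index, local cluster id) keys
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # "Map" step: cluster every partition independently.
    results = [(owned, *local_dbscan(local, eps, min_pts))
               for owned, local in partition(points, eps, cuts)]

    # Core status is only trusted in the partition that owns the point,
    # because only there is the full eps-neighborhood visible.
    global_core = set()
    membership = defaultdict(set)
    for k, (owned, labels, core) in enumerate(results):
        global_core |= owned & core
        for i, c in labels.items():
            membership[i].add((k, c))

    # "Reduce" step: merge local clusters that share a core point.
    for i, keys in membership.items():
        keys = sorted(keys)
        root = find(keys[0])
        if i in global_core:
            for other in keys[1:]:
                parent[find(other)] = root

    # Relabel merged clusters 0, 1, 2, ...; noise points are left out.
    final, root_ids = {}, {}
    for i in sorted(membership):
        root = find(min(membership[i]))
        final[i] = root_ids.setdefault(root, len(root_ids))
    return final


# Toy data: a chain that straddles the cut at x = 2.5, a second cluster, one outlier.
pts = {i: (float(x), 0.0) for i, x in enumerate([0, 1, 2, 3, 4, 5, 20, 21, 22, 50])}
labels = cdbscan_sketch(pts, eps=1.5, min_pts=3, cuts=[2.5, 10.0])
```

In this toy run the chain of points around the first cut ends up in one cluster even though it was processed in two partitions, which is the property the merge step is meant to preserve. On a real Hadoop deployment the per-partition clustering and the merge would be expressed as map and reduce functions; how the thesis fits both into a single job is not stated in the abstract.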
Subjects
Clustering Algorithms
Parallel Algorithms
Distributed Algorithms
Cloud Computing
Type
thesis
File(s)
Name
ntu-100-R98921044-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):b36e2ec93c46bfc7afd6c4146d25bb5f
