Bipolar Person Name Identification of Topic Documents Using Principal Component Analysis
Date Issued
2010
Date
2010
Author(s)
Wu, Chen-Yuan
Abstract
In this paper, we propose an unsupervised approach for identifying bipolar named entities in a set of topic documents. We employ principal component analysis (PCA) to discover bipolar word usage patterns of named entities in the documents and show that the signs of the entries in the principal eigenvector of PCA partition the named entities into bipolar groups spontaneously. We present two techniques, called off-topic block elimination and weighted correlation coefficient, to reduce the effect of data sparseness on person name bipolarization. Empirical evaluations demonstrate the efficacy of the proposed approach in identifying the bipolar named entities of a topic. The approach is also language independent.
After PCA is applied, the signs of the entries in the principal eigenvector identify bipolar groups of person names in a set of chronological topic documents. To help readers discover the evolution of the bipolar groups, it is useful to analyze the activeness trend of each polarity. We demonstrate that the changes in activeness accurately reflect the activeness trend of a bipolar group.
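The sign-based partition described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `bipolarize`, the toy count matrix, and the use of a plain Pearson correlation matrix are all assumptions made for illustration, and the off-topic block elimination and weighted correlation coefficient steps are omitted.

```python
import numpy as np


def bipolarize(counts):
    """Split names into two groups by the sign of the principal eigenvector.

    counts: array of shape (n_blocks, n_names); entry [i, j] is how often
    name j occurs in text block i (hypothetical input layout).
    Returns a boolean array of length n_names; names that share a value
    fall in the same group. Which group is True is arbitrary, because an
    eigenvector is only defined up to sign.
    """
    x = np.asarray(counts, dtype=float)
    # Correlation between names (columns). Assumes no column is constant.
    corr = np.corrcoef(x, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so the last column is the principal eigenvector.
    _, eigvecs = np.linalg.eigh(corr)
    principal = eigvecs[:, -1]
    return principal >= 0


# Toy data (invented): names A and B co-occur, as do names C and D.
names = ["A", "B", "C", "D"]
counts = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [0, 0, 4, 3],
    [0, 1, 2, 3],
    [4, 3, 0, 0],
    [0, 0, 3, 2],
])
groups = bipolarize(counts)
for name, side in zip(names, groups):
    print(name, "group 1" if side else "group 2")
```

In this toy input, A and B end up on one side and C and D on the other. Only the grouping is meaningful, not which side is labelled "group 1".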
Subjects
text mining
opinion mining
clustering
bipolar person name identification
topic documents
principal component analysis
activeness timeline graph
Type
thesis
File(s)
Name
ntu-99-R97725035-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):f66305c57cc1ff19b2ec779f4ddbad73
