Spam Filtering on Social Media Posts Using Convolutional Neural Networks
Date Issued
2016
Date
2016
Author(s)
Chiu, Chien-Ching
Abstract
This thesis proposes a blog spam filtering system based on a convolutional neural network (CNN) for filtering blog posts on Pixnet. Filtering articles with this system not only gives readers a better reading experience, but also gives researchers a cleaner traditional Chinese corpus to use as source data. The CNN is trained on a Pixnet blog dataset, using pre-trained word vectors, for spam/non-spam classification. The CNN's output score can be treated as an index of spam level, and the CNN outperforms statistical classification methods (error rate of 8.8% versus 13.7%). The CNN configuration used to train a traditional Chinese text classifier is reported in detail. One observation from our experimental results is that the feature extracted by each filter in the convolutional layer is highly relevant to important keywords in the articles. In addition, the descriptors extracted from our CNN achieve acceptable performance on another text classification task, with results better than both a roughly tuned CNN and a bag-of-words method.
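The following is a minimal sketch of how a CNN text classifier of this general kind can turn word vectors into a spam score. It is not the thesis's configuration, which the abstract does not specify. It assumes a single convolutional layer with ReLU and max-over-time pooling followed by a sigmoid output. The function names (`conv_max_pool`, `spam_score`), the toy English vocabulary, the vector dimension, and the random weights are all hypothetical stand-ins for trained parameters and segmented Chinese text.

```python
import math
import random


def conv_max_pool(embedded, filt, bias):
    """Slide one filter over token windows; return (max activation, window start)."""
    width, dim = len(filt), len(filt[0])
    # Pad short inputs with zero vectors so at least one window exists.
    if len(embedded) < width:
        embedded = embedded + [[0.0] * dim] * (width - len(embedded))
    best_val, best_pos = float("-inf"), -1
    for start in range(len(embedded) - width + 1):
        total = bias
        for k in range(width):
            total += sum(w * x for w, x in zip(filt[k], embedded[start + k]))
        act = max(0.0, total)  # ReLU
        if act > best_val:
            best_val, best_pos = act, start
    return best_val, best_pos


def spam_score(tokens, vectors, filters, biases, out_weights, out_bias):
    """Return a score in (0, 1) plus each filter's (activation, window start)."""
    dim = len(next(iter(vectors.values())))
    # Unknown tokens map to a zero vector (an assumption of this sketch).
    embedded = [vectors.get(t, [0.0] * dim) for t in tokens]
    pooled = [conv_max_pool(embedded, f, b) for f, b in zip(filters, biases)]
    logit = out_bias + sum(w * v for w, (v, _) in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit)), pooled


# Toy setup: random values stand in for pre-trained vectors and trained weights.
random.seed(0)
dim = 4
widths = (2, 3)  # one filter spanning 2 tokens, one spanning 3
vocab = ["buy", "cheap", "followers", "now", "my", "trip", "to", "tainan"]
vectors = {w: [random.uniform(-1, 1) for _ in range(dim)] for w in vocab}
filters = [[[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
           for n in widths]
biases = [0.0 for _ in filters]
out_weights = [random.uniform(-1, 1) for _ in filters]

tokens = "buy cheap followers now".split()
score, pooled = spam_score(tokens, vectors, filters, biases, out_weights, 0.0)
print("score:", score)
for n, (act, pos) in zip(widths, pooled):
    print("filter width", n, "peaks on", tokens[pos:pos + n], "activation", act)
```

In this sketch the sigmoid output plays the role of the spam-level index, and the window where each filter peaks illustrates how a filter's feature can be traced back to specific words in a post.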
Subjects
Social network
Spam detection
Convolutional neural network
Deep learning
Type
thesis
File(s)
Name
ntu-105-R03525087-1.pdf
Size
23.54 KB
Format
Adobe PDF
Checksum
(MD5):a9783ae520cbc322221b7d4e7fb17cd2