Bi-perceptron for Chinese Web News Categorization
Date Issued
2016
Date
2016
Author(s)
Pan, Jian
Abstract
Mobile news, owing to its inherently high frequency, has become a popular area pursued by many commercial companies in China. News categorization is an important technology in automatic news processing. Many supervised learning methods can be applied in this area, among which the Support Vector Machine (SVM) achieves state-of-the-art performance with discrete features. This paper proposes bi-perceptron learning, a divide-and-conquer idea, to solve the binary classification problem in the hope of achieving results comparable to or even better than those of SVM, and realizes a basic approach to it. We divide the classification problem into three steps: data partition, base classification, and aggregation, and we compare different partition and aggregation methods. Moreover, we analyze the effect of word segmentation methods, the number of keywords, the regularization of the base classifiers, and the number of partitions on categorization performance. Finally, we find a bi-perceptron learning approach that is efficient in both time and memory consumption.
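The three steps named in the abstract (data partition, base classification, aggregation) can be illustrated with a minimal sketch. This is not the thesis's implementation: the round-robin partition, the plain perceptron used as base classifier, the majority-vote aggregation, the toy two-feature data (hypothetical keyword counts), and all function names are assumptions chosen only to show the divide-and-conquer shape.

```python
def train_perceptron(samples, epochs=20):
    """Train a plain perceptron on (features, label) pairs, label in {-1, +1}."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified (or on the boundary): update
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b


def predict_one(model, x):
    """Label a single feature vector with one base classifier."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def partition(samples, k):
    """Step 1: split the data into k parts (round-robin; an assumed strategy)."""
    return [samples[i::k] for i in range(k)]


def fit(samples, k=3):
    """Step 2: train one base classifier per partition."""
    return [train_perceptron(part) for part in partition(samples, k)]


def predict(models, x):
    """Step 3: aggregate base predictions by majority vote (assumed method)."""
    votes = sum(predict_one(m, x) for m in models)
    return 1 if votes > 0 else -1


# Hypothetical toy data: [count of keyword A, count of keyword B] -> class.
data = [
    ([3, 0], 1), ([0, 3], -1), ([2, 1], 1), ([1, 2], -1),
    ([4, 1], 1), ([1, 4], -1), ([3, 1], 1), ([1, 3], -1),
    ([5, 0], 1), ([0, 5], -1), ([2, 0], 1), ([0, 2], -1),
]
models = fit(data, k=3)
label = predict(models, [6, 0])  # classify an unseen document vector
```

Each base classifier sees only its own partition, so training memory scales with the partition size rather than the full data set; the partition and aggregation strategies are the interchangeable parts that the abstract says were compared.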
Subjects
bi-perceptron learning
Chinese web news categorization
text classification
supervised learning
Type
thesis
File(s)
Name
ntu-105-R03922141-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):a795f19df93baa6c20406fa4e7713ada