Advisor: Lin, Chih-Jen (林智仁)
Institution: National Taiwan University, Graduate Institute of Industrial Engineering (臺灣大學工業工程學研究所)
Author: Kao, Tzu-Hsiang (高子翔)
Dates: 2007-11-26; 2018-06-29; 2005
URL: http://ntur.lib.ntu.edu.tw//handle/246246/51256

ABSTRACT

This thesis studies Parametric Mixture Models (PMMs), efficient statistical models for the multi-label text categorization problem. Conventional machine learning methods usually train one binary classifier per label to handle multi-label problems. In contrast, PMMs handle multi-label text with a single statistical model. We propose an Advanced Parametric Mixture Model (APMM) based on PMMs. Its maximum likelihood estimation is a concave programming problem, and we design update rules whose iterations converge to a global maximum. Experiments on the real-world yahoo.com datasets under three common multi-label classification measures show that APMM is competitive.

TABLE OF CONTENTS

ABSTRACT ..................................................... ii
LIST OF FIGURES .............................................. v
LIST OF TABLES ............................................... vi

CHAPTER

I. Introduction .............................................. 1
II. Advanced Parametric Mixture Model ........................ 4
    2.1 Parametric Mixture Model ............................. 4
        2.1.1 PMM1 ........................................... 8
        2.1.2 PMM2 ........................................... 9
    2.2 Advanced Parametric Mixture Model .................... 10
    2.3 Training Update Formula .............................. 12
        2.3.1 PMM1 ........................................... 14
        2.3.2 APMM ........................................... 21
    2.4 Prediction Method .................................... 23
        2.4.1 PMM1 ........................................... 24
        2.4.2 APMM ........................................... 25
III. Experiments and Results ................................. 27
    3.1 Data Description ..................................... 27
    3.2 Evaluation Criteria .................................. 28
        3.2.1 Exact Match Ratio .............................. 28
        3.2.2 Labeling F-measure ............................. 29
        3.2.3 Retrieval F-measure ............................ 30
    3.3 Experimental Setting ................................. 30
    3.4 Results and Discussions .............................. 31
IV. Discussion and Conclusions ............................... 36

APPENDICES ................................................... 37
BIBLIOGRAPHY ................................................. 43

LIST OF FIGURES

Figure
3.1 The relation between stopping tolerances and performance ................ 33
3.2 Number of iterations versus stopping tolerances from 10 to 0.00001 ..... 33
3.3 (⣵76; 1) from 0.00001 to 10 ............................................ 34
3.4 (⣵76; 1) from 0.1 to 2 ................................................. 34

LIST OF TABLES

Table
3.1 Details of the yahoo.com Web page datasets. "#Text" is the number of texts in the dataset, "#Voc" is the number of vocabulary words (i.e., features), "#Tpc" is the number of topics, "#Lbl" is the number of labels, and "Label size Frequency" is the relative frequency of each label size ........ 28
3.2 Single-label document prediction performance. "Pr s" is the number of documents predicted as single-label, "Co s" is the number of single-label documents that have been predicted correctly, and "Co ratio" is the ratio of single-label documents correctly predicted ........ 31
3.3 Training and testing time of models with different stopping tolerances. Since the numbers in this table are averages over several problems, the numbers of iterations have decimal points
3.4 Performance using stopping tolerance 0.01, under the three evaluation criteria presented in Section 3.2. The Exact Match ratio of APMM is better than that of PMM1, but its Retrieval F-measure is lower than PMM1's. The Labeling F-measures of the two models are quite similar ........ 32
3.5 Performance using stopping tolerance 0.00001. The legend is the same as in Table 3.4 ........ 33
3.6 Prediction accuracy for different label sizes. "#label" is the label size, "num" is the total number of documents with that label size in the dataset, "Pr" is the number of documents predicted with that label size, "Co" is the number correctly predicted, and "Co ratio" is the ratio correctly predicted. Since Table 3.1 shows that the frequencies of label sizes larger than 4 are relatively small, we combine the correctly predicted ratios for label sizes larger than 4 ........ 35

Size: 270983 bytes
Format: application/pdf
Language: en-US
Keywords: parametric mixture model; multi-label classification; text categorization; machine learning; maximum likelihood
Title: Advanced Parametric Mixture Model for Multi-Label Text Categorization (進階參數混成模型於多標籤文件分類之應用)
Type: thesis
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/51256/1/ntu-94-R93546015-1.pdf
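The abstract describes the key idea behind PMMs: instead of one binary classifier per label, a multi-label document's word distribution is modeled as a mixture (in PMM1, the average) of its labels' per-label word distributions, and prediction searches over label sets. A minimal sketch of that assumption, with toy parameters and a greedy label-set search; the function names, the search strategy, and all numbers here are illustrative assumptions, not the thesis's actual formulation or code:

```python
import numpy as np

def pmm1_log_likelihood(x, theta, label_set):
    # PMM1's assumption: a document with label set y has word distribution
    # equal to the average of the per-label distributions theta[l], l in y.
    mix = theta[list(label_set)].mean(axis=0)
    return float(np.dot(x, np.log(mix)))  # log-likelihood of word counts x

def greedy_predict(x, theta):
    # Hypothetical greedy search over label sets: repeatedly add the label
    # that most improves the log-likelihood; stop when no addition helps.
    remaining = set(range(theta.shape[0]))
    chosen, best = set(), -np.inf
    while remaining:
        cand = max(remaining,
                   key=lambda l: pmm1_log_likelihood(x, theta, chosen | {l}))
        score = pmm1_log_likelihood(x, theta, chosen | {cand})
        if score <= best:
            break
        chosen.add(cand)
        remaining.remove(cand)
        best = score
    return chosen

# Toy setup: 3 labels over a 4-word vocabulary.
theta = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.1, 0.7, 0.1, 0.1],
                  [0.1, 0.1, 0.4, 0.4]])
x = np.array([5, 5, 0, 0])      # word counts favoring labels 0 and 1
print(greedy_predict(x, theta))  # → {0, 1}
```

The averaged distribution for {0, 1} explains the counts better than either label alone, so the greedy search returns both labels, which is the behavior a single mixture model (rather than independent binary classifiers) is meant to capture.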