Feature reinforcement approach to poly-lingual text categorization

CHIH-PING WEI; Shi H; Yang C.C.

Journal

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Journal Volume

4822 LNCS

Pages

99-108

Date Issued

2007

Author(s)

CHIH-PING WEI

Shi H

Yang C.C.

URI

https://www.scopus.com/inward/record.uri?eid=2-s2.0-38149032438&partnerID=40&md5=9dd84c93768c812a5dad88b56f0cc9e6

https://scholars.lib.ntu.edu.tw/handle/123456789/456505

https://www.scopus.com/inward/record.uri?eid=2-s2.0-38149032438&doi=10.1007%2f978-3-540-77094-7_17&partnerID=40&md5=bccbdef7a9bfd9bde631276ea10daa3f

Abstract

With the rapid emergence and proliferation of the Internet and the trend toward globalization, a tremendous number of textual documents written in different languages are electronically accessible online. Poly-lingual text categorization (PLTC) refers to the automatic learning of one or more text categorization models from a set of preclassified training documents written in different languages, and the subsequent assignment of unclassified poly-lingual documents to predefined categories on the basis of the induced model(s). Although PLTC can be approached as multiple independent monolingual text categorization problems, this naïve approach uses only the training documents of a given language to construct the monolingual classifier for that language, and so fails to exploit the opportunity offered by poly-lingual training documents. In this study, we propose a feature reinforcement approach to PLTC that takes into account the training documents of all languages when constructing a monolingual classifier for a specific language. Using the independent monolingual text categorization (MnTC) technique as the performance benchmark, our empirical evaluation shows that the proposed PLTC technique achieves higher classification accuracy than the benchmark on both the English and Chinese corpora. © Springer-Verlag Berlin Heidelberg 2007.
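
For orientation, the independent monolingual baseline (MnTC) described in the abstract can be sketched as follows: one classifier per language, each trained only on the documents of that language. This is a minimal sketch under stated assumptions and not the authors' implementation. The abstract does not name a learning algorithm, so the small Naive Bayes class here is a stand-in; the names `NaiveBayes`, `train_mntc`, and `classify`, as well as the toy documents, are hypothetical. The feature reinforcement step itself is not sketched, because the abstract gives no details of it.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing.

    Assumption: a stand-in learner; the abstract does not specify one.
    """

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.term_counts[label].update(tokens)
        self.vocab = {t for c in self.term_counts.values() for t in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for t in tokens:
                if t in self.vocab:  # ignore terms unseen in training
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_mntc(training_set):
    """Train one classifier per language from (language, tokens, category) triples."""
    by_lang = defaultdict(lambda: ([], []))
    for lang, tokens, category in training_set:
        by_lang[lang][0].append(tokens)
        by_lang[lang][1].append(category)
    return {
        lang: NaiveBayes().fit(docs, labels)
        for lang, (docs, labels) in by_lang.items()
    }


def classify(models, lang, tokens):
    """Route a document to the classifier of its own language only."""
    return models[lang].predict(tokens)


# Hypothetical toy corpus, for illustration only.
toy = [
    ("en", ["stock", "market", "rises"], "finance"),
    ("en", ["team", "wins", "match"], "sports"),
    ("zh", ["股市", "上漲"], "finance"),
    ("zh", ["球隊", "獲勝"], "sports"),
]
models = train_mntc(toy)
print(classify(models, "en", ["stock", "market"]))
print(classify(models, "zh", ["球隊"]))
```

In this baseline the English classifier never sees the Chinese training documents, and vice versa, which is the limitation the abstract says the proposed approach is meant to address.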

Other Subjects

Classification (of information); Information retrieval systems; Internet; Problem solving; Feature reinforcement; Poly-lingual text categorization (PLTC); Textual documents; Feature extraction

Type

conference paper
