Predicting morphological types of Chinese bi-character words by machine learning approaches

Huang T.-H.; Ku L.-W.; Chen H.-H.

Journal

7th International Conference on Language Resources and Evaluation

Pages

844-850

ISBN

9782951740860

Date Issued

2010

Author(s)

Huang T.-H.

Ku L.-W.

Chen H.-H.

URI

https://scholars.lib.ntu.edu.tw/handle/123456789/413166

URL

https://www.scopus.com/inward/record.uri?eid=2-s2.0-85037079802&partnerID=40&md5=2c163b6dba14d1529d00bb32a2db23af

Abstract

This paper presented an overview of the morphological types of Chinese bi-character words and proposed a set of features that let machine learning approaches predict these types from information about the composite characters. First, eight morphological types were defined, and 6,500 Chinese bi-character words were annotated with these types. After pre-processing, 6,178 words were selected to construct a corpus named Reduced Set. We analyzed Reduced Set and conducted an inter-annotator agreement test; the average kappa value of 0.67 indicates substantial agreement. Second, because this paper considers the morphological types of bi-character words to be strongly related to the parts of speech of their composite characters, we proposed a set of features that can be extracted directly from dictionaries to indicate each character's "tendency" toward particular parts of speech. Finally, we used these features with three machine learning algorithms, SVM, CRF, and Naïve Bayes, to predict the morphological types. On average, the best algorithm, CRF, achieved 75% of the annotators' performance.
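The agreement figure in the abstract can be illustrated with a small worked computation. The sketch below assumes Cohen's kappa averaged over annotator pairs; the abstract does not say which kappa variant the authors used, and the function names and the labels "T1" and "T2" are hypothetical placeholders, not the paper's eight morphological types.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label; agreement is total.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def average_pairwise_kappa(annotations):
    """Mean of Cohen's kappa over every pair of annotators."""
    pairs = list(combinations(annotations, 2))
    if not pairs:
        raise ValueError("need at least two annotators")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Toy data: three hypothetical annotators labelling four words.
annotator_1 = ["T1", "T1", "T2", "T2"]
annotator_2 = ["T1", "T1", "T2", "T1"]
annotator_3 = ["T1", "T1", "T2", "T2"]

print(cohen_kappa(annotator_1, annotator_2))
print(average_pairwise_kappa([annotator_1, annotator_2, annotator_3]))
```

Kappa discounts the agreement two annotators would reach by chance, which is why a value such as 0.67 is read as substantial even though it is well below 1.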

Type

conference paper
