Corpus-driven Linguistic Approaches to Sense Prediction

Hong, Jia-Fei

Date Issued

2010

Author(s)

Hong, Jia-Fei

URI

http://ntur.lib.ntu.edu.tw//handle/246246/257555

Abstract

In this study, I proposed corpus-driven distribution as the main method of sense prediction. I concentrated on individual semantic features to predict the senses of non-defined words, using corpora and lexical resources such as the Chinese Gigaword Corpus, HowNet, Chinese Wordnet, and Xiandai Hanyu Cidian (Xian Han). Using these resources, I determined the collocation clusters of four target words, chi1 “eat”, wan2 “play”, huan4 “change”, and shao1 “burn”, through character similarity and concept similarity. All four target words are transitive verbs, and each has more than two senses. Their collocation words play an important role in this sense prediction study. In the character similarity clustering analysis, collocation words that share an identical morpheme were grouped into the same cluster. The corpus-based and computational approach therefore rests on two main strategies: (1) character similarity clustering analysis; and (2) concept similarity clustering analysis, which, via HowNet, covers (a) similarity between sememes and (b) similarity between concepts. I first hypothesized that different clusters can represent different senses, and I examined the accuracy rates for the four target words under both the character similarity clustering analysis and the concept similarity clustering analysis. I then evaluated the four target words against the sense divisions in Chinese Wordnet and in Xiandai Hanyu Cidian, and was able to predict different senses for chi1 “eat”, wan2 “play”, huan4 “change”, and shao1 “burn” automatically by means of a computer program.
Following the corpus-based and computational analysis, I used off-line tasks to test participants’ intuitions; the results support the hypothesis that different clusters can represent different senses. To examine the related collocation words for the lexically ambiguous target words, I employed a multiple-choice task (Burton et al. 1991). Because the stimuli were collected from the character similarity clustering analysis, the results presented in this study also demonstrate the viability of that approach.
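
The character similarity clustering step described above can be illustrated with a minimal sketch. It assumes that "identical morpheme" can be approximated by "shared Chinese character" and that sharing is transitive, so words linked through a chain of shared characters end up in one cluster. The function name `cluster_by_shared_character` and the sample collocates are hypothetical illustrations, not taken from the thesis, and the thesis may define similarity differently.

```python
from collections import defaultdict


def cluster_by_shared_character(words):
    """Group words so that any two words sharing a character land in one cluster.

    Assumption: one character stands in for one morpheme, and clusters are the
    connected components of the "shares a character" relation (union-find).
    """
    words = list(dict.fromkeys(words))  # drop duplicates, keep order
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_word_with = {}  # character -> first word seen containing it
    for word in words:
        for ch in set(word):
            if ch in first_word_with:
                union(first_word_with[ch], word)
            else:
                first_word_with[ch] = word

    clusters = defaultdict(list)
    for word in words:
        clusters[find(word)].append(word)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical collocates of chi1 "eat" (illustrative only).
collocates = ["牛肉", "豬肉", "肉類", "午餐", "晚餐", "餐點"]
for members in cluster_by_shared_character(collocates):
    print(members)
```

With these sample words, the items containing 肉 "meat" form one cluster and the items containing 餐 "meal" form another. Under the hypothesis stated in the abstract, each cluster would then be treated as a candidate for a distinct sense of the target verb.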

Subjects

Lexical ambiguity

sense prediction

corpus-based approach

character similarity clustering approach

concept similarity clustering approach

experimental evaluation

Type

thesis

File(s)

Name

ntu-99-D95142001-1.pdf

Size

23.53 KB

Format

Adobe PDF

Checksum

(MD5):fdcff629a0a0169d5332a83a8d3671cf
