A Probabilistic Chunker
Resource
Proceedings of the 6th R.O.C. Computational Linguistics Conference VI, 99-117
Journal
6th R.O.C. Computational Linguistics Conference VI
Pages
99-117
Date Issued
1993
Author(s)
Abstract
This paper proposes a probabilistic partial parser, which we call a chunker. The chunker partitions the input sentence into segments. The idea is motivated by the observation that when we read a sentence, we read it chunk by chunk. We train the chunker on the Susanne Corpus, a modified and reduced version of the Brown Corpus, using an underlying bi-gram language model. The chunker is evaluated with an outside test and an inside test. Preliminary results show that the chunker achieves more than 98% chunk correct rate and 94% sentence correct rate in the outside test, and 99% chunk correct rate and 97% sentence correct rate in the inside test. The simple but effective chunker design has shown promise and can be extended to complete parsing and many applications. © 1993 Proceedings of Rocling 6th Computational Linguistics Conference, ROCLING 1993. All rights reserved.
Other Subjects
Computational linguistics; Bi-gram language models; Probabilistics; Simple++; Syntactics
Publisher
Taipei, Taiwan: ROCLING
Type
conference paper
