https://scholars.lib.ntu.edu.tw/handle/123456789/413152
Title: | Pause and stop labeling for Chinese sentence boundary detection | Authors: | Huang H.-H.; Chen H.-H. |
Issue Date: | 2011 | Start page/Pages: | 146-153 | Source: | International Conference Recent Advances in Natural Language Processing, RANLP | Abstract: | The fuzziness of Chinese sentence boundaries makes discourse analysis more challenging. Moreover, many articles posted on the Internet even lack punctuation marks. In this paper, we collect documents written by masters as a reference corpus and propose a model to label the punctuation marks for a given text. Conditional random field (CRF) models trained on the corpus determine the correct delimiter (a comma or a full stop) between each pair of successive clauses. Different tagging schemes and various features from different linguistic levels are explored. The results show that our segmenter achieves an accuracy of 77.48% for plain text, which is close to the human performance of 81.18%. For richly formatted text, our segmenter achieves an even better accuracy of 82.93%. |
Description: | 8th International Conference on Recent Advances in Natural Language Processing, RANLP 2011, 12 September 2011 through 14 September 2011, Hissar |
URI: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84858321116&partnerID=40&md5=78033f1f8d5a56b23c8ee30263e9f0e5 | ISSN: | 1313-8502 |
Appears in Collections: | Department of Computer Science and Information Engineering (資訊工程學系) |