https://scholars.lib.ntu.edu.tw/handle/123456789/413152
Title: | Pause and stop labeling for Chinese sentence boundary detection | Authors: | Huang H.-H.; Chen H.-H. |
Issue Date: | 2011 | Pages: | 146-153 | Source Publication: | International Conference Recent Advances in Natural Language Processing, RANLP | Abstract: | The fuzziness of Chinese sentence boundaries makes discourse analysis more challenging. Moreover, many articles posted on the Internet lack punctuation marks altogether. In this paper, we collect documents written by masters as a reference corpus and propose a model to label the punctuation marks for a given text. Conditional random field (CRF) models trained on the corpus determine the correct delimiter (a comma or a full stop) between each pair of successive clauses. Different tagging schemes and various features from different linguistic levels are explored. The results show that our segmenter achieves an accuracy of 77.48% on plain text, which is close to the human performance of 81.18%. On richly formatted text, our segmenter achieves an even better accuracy of 82.93%. |
Description: | 8th International Conference on Recent Advances in Natural Language Processing, RANLP 2011, 12 September 2011 through 14 September 2011, Hissar |
URI: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84858321116&partnerID=40&md5=78033f1f8d5a56b23c8ee30263e9f0e5 | ISSN: | 13138502 |
Appears in Collections: | Department of Computer Science and Information Engineering |