https://scholars.lib.ntu.edu.tw/handle/123456789/413103
Title: | A probabilistic framework for Chinese spelling check | Authors: | Chen K.-Y.; Wang H.-M.; HSIN-HSI CHEN |
Keywords: | Chinese;Language model;Probabilistic;Spelling check;Topic modeling | Issue Date: | 2015 | Volume: | 14 | Issue: | 4 | Source: | ACM Transactions on Asian and Low-Resource Language Information Processing | Abstract: | Chinese spelling check (CSC) is still an unsolved problem today since there are many homonymous or homomorphous characters. Recently, more and more CSC systems have been proposed. To the best of our knowledge, language modeling is one of the major components among these systems because of its simplicity and moderately good predictive power. After deeply analyzing the school of research, we are aware that most of the systems only employ the conventional n-gram language models. The contributions of this article are threefold. First, we propose a novel probabilistic framework for CSC, which naturally combines several important components, such as the substitution model and the language model, to inherit their individual merits as well as to overcome their limitations. Second, we incorporate the topic language models into the CSC system in an unsupervised fashion. The topic language models can capture the long-span semantic information from a word (character) string while the conventional n-gram language models can only preserve the local regularity information. Third, we further integrate Web resources with the proposed framework to enhance the overall performance. Our rigorously empirical experiments demonstrate the consistent and utility performance of the proposed framework in the CSC task. © 2015 ACM. |
URI: | https://scholars.lib.ntu.edu.tw/handle/123456789/413103 | ISSN: | 2375-4699 | DOI: | 10.1145/2826234 |
Appears in Collections: | Department of Computer Science and Information Engineering |