https://scholars.lib.ntu.edu.tw/handle/123456789/632417
Title: | Truncation on combined word-based and class-based language model using Kullback-Leibler distance criterion | Authors: | Yang K.-C.; Ho T.-H.; Lin J.-S.; Lin-Shan Lee |
Publication Date: | 1997 | Pages: | 335-344 | Source Publication: | Proceedings of the 10th Research on Computational Linguistics International Conference, ROCLING 1997 | Abstract: | In this paper we present a novel approach to truncating a combined word-based and class-based n-gram language model using a Kullback-Leibler distance criterion. First, we investigate a reliable backoff scheme for unseen n-grams using a class-based language model, which outperforms conventional approaches that back off to the (n-1)-gram, in perplexity on both training and testing data. For language model truncation, our approach uses dynamic thresholds for different words or word contexts, determined by the Kullback-Leibler distance criterion, as opposed to the conventional scheme which truncates the language model with a constant threshold. In our experiments, the combined word-based and class-based n-gram language model with the Kullback-Leibler distance truncation criterion reduces the number of parameters by 80%, while perplexity increases by only 1.6%, compared with the word bigram language model without any truncation. |
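The abstract describes two ingredients: backing off unseen n-grams to a class-based estimate rather than the (n-1)-gram, and pruning explicit n-gram entries whose removal costs little in Kullback-Leibler distance, with a threshold that may vary per word context. Below is a minimal Python sketch of that pruning step; the function kl_prune_bigrams, its inputs, and the simplified per-entry KL contribution P(h)·P(w|h)·log(P(w|h)/q(w|h)) are illustrative assumptions, not the paper's exact formulation.

```python
import math

def kl_prune_bigrams(bigram_probs, context_prob, backoff_prob, threshold_for):
    """Keep only the explicit bigrams whose removal would be costly in KL distance.

    Hypothetical inputs (not from the paper):
      bigram_probs : dict {(h, w): P(w|h)} of explicit bigram estimates
      context_prob : dict {h: P(h)}, marginal probability of each context word
      backoff_prob : callable (h, w) -> q(w|h), e.g. a class-based backoff
                     estimate; must return a positive probability
      threshold_for: callable h -> pruning threshold for context h, mirroring
                     the dynamic per-context thresholds the abstract describes
    """
    kept = {}
    for (h, w), p in bigram_probs.items():
        q = backoff_prob(h, w)
        # Approximate this entry's contribution to D_KL(full || pruned):
        # P(h) * P(w|h) * log(P(w|h) / q(w|h)).  A full treatment would also
        # renormalise the backoff weight of context h after each removal.
        contribution = context_prob[h] * p * math.log(p / q)
        if contribution >= threshold_for(h):
            kept[(h, w)] = p
    return kept

# Toy usage: prune bigrams of context "the" against a uniform stand-in backoff.
if __name__ == "__main__":
    bigrams = {("the", "cat"): 0.5, ("the", "dog"): 0.3, ("the", "ox"): 0.2}
    kept = kl_prune_bigrams(
        bigrams,
        context_prob={"the": 1.0},
        backoff_prob=lambda h, w: 1.0 / 3.0,   # stand-in class-based estimate
        threshold_for=lambda h: 0.05,          # constant threshold for the demo
    )
    print(kept)  # entries close to the backoff estimate are dropped
```

Passing a constant for threshold_for recovers the conventional fixed-threshold scheme, which is the baseline the abstract compares against.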
URI: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85121257022&partnerID=40&md5=8a1b12b844f4633cdb46d8d405ba27a6 https://scholars.lib.ntu.edu.tw/handle/123456789/632417 |
SDG/Keywords: | Backoffs; Class-based; Class-based language model; Conventional approach; Distance criterion; Kullback-Leibler distance; N-gram language models; N-grams; Training and testing; Computational linguistics |
Appears in Collections: | Department of Electrical Engineering
Items in the IR system are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.