https://scholars.lib.ntu.edu.tw/handle/123456789/632417
Title: | Truncation on combined word-based and class-based language model using Kullback-Leibler distance criterion | Authors: | Yang K.-C.; Ho T.-H.; Lin J.-S.; Lin-Shan Lee |
Publication Date: | 1997 | Pages: | 335-344 | Source Publication: | Proceedings of the 10th Research on Computational Linguistics International Conference, ROCLING 1997 | Abstract: | In this paper we present a novel approach to truncating a combined word-based and class-based n-gram language model using a Kullback-Leibler distance criterion. First, we investigate a reliable backoff scheme for unseen n-grams using a class-based language model, which outperforms conventional approaches that back off to the (n-1)-gram, in perplexity on both training and testing data. For language model truncation, our approach uses dynamic thresholds for different words or word contexts, determined by the Kullback-Leibler distance criterion, as opposed to the conventional scheme which truncates the language model with a constant threshold. In our experiments, the combined word-based and class-based n-gram language model with the Kullback-Leibler distance truncation criterion reduces the number of parameters by 80%, while perplexity increases by only 1.6%, compared with the word bigram language model without any truncation. |
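The abstract describes two ingredients: backing off unseen n-grams to a class-based estimate rather than the (n-1)-gram, and pruning explicit n-gram entries whose removal costs little in Kullback-Leibler distance, with a threshold that may vary per word context. Below is a minimal Python sketch of that pruning step; the function kl_prune_bigrams, its inputs, and the simplified per-entry KL contribution P(h)·P(w|h)·log(P(w|h)/q(w|h)) are illustrative assumptions, not the paper's exact formulation.

```python
import math

def kl_prune_bigrams(bigram_probs, context_prob, backoff_prob, threshold_for):
    """Keep only the explicit bigrams whose removal would be costly in KL distance.

    Hypothetical inputs (not from the paper):
      bigram_probs : dict {(h, w): P(w|h)} of explicit bigram estimates
      context_prob : dict {h: P(h)}, marginal probability of each context word
      backoff_prob : callable (h, w) -> q(w|h), e.g. a class-based backoff
                     estimate; must return a positive probability
      threshold_for: callable h -> pruning threshold for context h, mirroring
                     the dynamic per-context thresholds the abstract describes
    """
    kept = {}
    for (h, w), p in bigram_probs.items():
        q = backoff_prob(h, w)
        # Approximate this entry's contribution to D_KL(full || pruned):
        # P(h) * P(w|h) * log(P(w|h) / q(w|h)).  A full treatment would also
        # renormalise the backoff weight of context h after each removal.
        contribution = context_prob[h] * p * math.log(p / q)
        if contribution >= threshold_for(h):
            kept[(h, w)] = p
    return kept

# Toy usage: prune bigrams of context "the" against a uniform stand-in backoff.
if __name__ == "__main__":
    bigrams = {("the", "cat"): 0.5, ("the", "dog"): 0.3, ("the", "ox"): 0.2}
    kept = kl_prune_bigrams(
        bigrams,
        context_prob={"the": 1.0},
        backoff_prob=lambda h, w: 1.0 / 3.0,   # stand-in class-based estimate
        threshold_for=lambda h: 0.05,          # constant threshold for the demo
    )
    print(kept)  # entries close to the backoff estimate are dropped
```

Passing a constant for threshold_for recovers the conventional fixed-threshold scheme, which is the baseline the abstract compares against.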
URI: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85121257022&partnerID=40&md5=8a1b12b844f4633cdb46d8d405ba27a6 https://scholars.lib.ntu.edu.tw/handle/123456789/632417 |
SDG/Keywords: | Backoffs; Class-based; Class-based language model; Conventional approach; Distance criterion; Kullback-Leibler distance; N-gram language models; N-grams; Training and testing; Computational linguistics |
Appears in Collections: | Department of Electrical Engineering
Items in the IR system are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.