Dept. of Electr. Eng., National Taiwan Univ.
Authors: Yang, Kae-Cherng; Ho, Tai-Hsuan; Chienq, Lee-Feng; Lee, Lin-Shan
Record dates: 2007-04-19; 2018-07-06
Date issued: 1998-05
ISSN: 1520-6149
Handle: http://ntur.lib.ntu.edu.tw//handle/246246/2007041910021754
Scopus: https://www.scopus.com/inward/record.uri?eid=2-s2.0-0031623659&doi=10.1109%2fICASSP.1998.674394&partnerID=40&md5=86fe6a5a8b1fe598f3bafa35bf3f642c

Abstract: This paper presents a new direction for Chinese language modeling based on a different concept of the lexicon. Every Chinese character has its own meaning, Chinese sentences contain no blanks to serve as word boundaries, and the wording structure of the Chinese language is extremely flexible. As a result, "words" in Chinese are not well defined, and no commonly accepted lexicon exists. This makes language modeling for Chinese very difficult and makes the "out-of-vocabulary (OOV)" problem especially serious. A new concept for the lexicon is therefore proposed. The elements of this lexicon can be words or any other "segment patterns". They should be extracted from the training corpus by statistical approaches, with the goal of minimizing the overall perplexity. Language models can then be developed based on this new lexicon. Very encouraging experimental results have been obtained.
© 1998 IEEE.

Title: Statistics-based segment pattern lexicon: a new direction for Chinese language modeling
Type: conference paper
Language: en-US
Keywords: Computational linguistics; Signal processing; Algorithms; Computer simulation; Feature extraction; Parameter estimation; Chinese characters; Chinese language; Chinese language modeling; Chinese sentence; Language model; Segment pattern; Statistical approach; Training corpus; Natural language processing systems; Speech recognition; Out of vocabulary patterns; Segment pattern lexicon
DOI: 10.1109/ICASSP.1998.674394
Scopus EID: 2-s2.0-0031623659
Full text (PDF, 520,011 bytes): http://ntur.lib.ntu.edu.tw/bitstream/246246/2007041910021754/1/00674394.pdf
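The abstract states the selection criterion (keep lexicon entries that lower the overall perplexity of the training corpus) but not the extraction algorithm itself. The Python sketch below only illustrates that criterion, under assumptions that are ours and not the paper's: a maximum-likelihood unigram model over patterns, greedy forward maximum-matching segmentation with single characters always allowed (so no input is out of vocabulary), candidate patterns ranked by substring frequency, and perplexity normalized per character so that lexicons producing different numbers of tokens stay comparable. All function names and parameters (`max_len`, `min_count`) are hypothetical.

```python
import math
from collections import Counter


def segment(sentence, lexicon, max_len):
    """Greedy forward maximum matching (assumed, not from the paper).

    At each position take the longest lexicon entry; single characters
    are always accepted, so every sentence can be segmented.
    """
    out, i = [], 0
    while i < len(sentence):
        for n in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + n]
            if n == 1 or piece in lexicon:
                out.append(piece)
                i += n
                break
    return out


def char_perplexity(corpus, lexicon):
    """Per-character perplexity of an ML unigram model over segment patterns.

    Evaluated on the training corpus itself, which is enough to compare
    candidate lexicons in this sketch.
    """
    max_len = max((len(w) for w in lexicon), default=1)
    counts = Counter(p for s in corpus for p in segment(s, lexicon, max_len))
    total = sum(counts.values())
    log_prob = sum(c * math.log(c / total) for c in counts.values())
    n_chars = sum(len(s) for s in corpus)
    return math.exp(-log_prob / n_chars)


def candidate_patterns(corpus, max_len, min_count):
    """Multi-character substrings seen at least min_count times, most frequent first."""
    counts = Counter(
        s[i:i + n]
        for s in corpus
        for n in range(2, max_len + 1)
        for i in range(len(s) - n + 1)
    )
    return [p for p, c in counts.most_common() if c >= min_count]


def build_lexicon(corpus, max_len=4, min_count=2):
    """Greedily add a candidate pattern only if it lowers corpus perplexity."""
    lexicon = set()
    best = char_perplexity(corpus, lexicon)
    for pattern in candidate_patterns(corpus, max_len, min_count):
        trial = lexicon | {pattern}
        ppl = char_perplexity(corpus, trial)
        if ppl < best:
            lexicon, best = trial, ppl
    return lexicon, best
```

For example, `build_lexicon(["我們今天去台北", "我們明天去台北", "他們今天去台中"])` returns a set of multi-character patterns together with a per-character perplexity lower than that of the character-only lexicon. The greedy, one-pass acceptance rule was chosen for brevity; a real system would re-estimate after each change, use a smoothed model, and measure perplexity on held-out text.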