Statistics-based segment pattern lexicon - a new direction for Chinese language modeling
Resource
Acoustics, Speech, and Signal Processing, 1998. ICASSP '98. Proceedings of the 1998 IEEE International Conference on
Journal
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Journal Volume
1
Pages
169-172
Date Issued
1998-05
Date
1998-05
Author(s)
DOI
N/A
Abstract
This paper presents a new direction for Chinese language modeling based on a different concept of the lexicon. Because every Chinese character has its own meaning and there are no «blanks» in Chinese sentences serving as word boundaries, and because the wording structure in the Chinese language is extremely flexible, the «words» in Chinese are actually not well defined, and there does not exist a commonly accepted lexicon. This makes language modeling very sophisticated in the Chinese language, and the «out of vocabulary (OOV)» problem especially serious. A new concept for the lexicon is thus proposed. The elements of this lexicon can be words or any other «segment patterns». They should be extracted from the training corpus by statistical approaches with the goal of minimizing the overall perplexity. The language models can then be developed based on this new lexicon. Very encouraging experimental results have been obtained. © 1998 IEEE.
Event(s)
1998 23rd IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998
Other Subjects
Computational linguistics; Signal processing; Algorithms; Computer simulation; Feature extraction; Parameter estimation; Chinese characters; Chinese language; Chinese language modeling; Chinese sentence; Language model; Segment pattern; Statistical approach; Training corpus; Natural language processing systems; Speech recognition; Out of vocabulary patterns; Segment pattern lexicon
Type
conference paper
File(s)
Name
00674394.pdf
Size
507.82 KB
Format
Adobe PDF
Checksum
(MD5):c4330e6e2bbf5afef9f3c8187f99d6db
