An Approach of Using Multiple Dictionaries and Conditional Random Field in Chinese Segmentation and Part of Speech Tagging

Lo, Yong-Sheng

Date Issued

2008

Author(s)

Lo, Yong-Sheng

URI

http://ntur.lib.ntu.edu.tw//handle/246246/184939

Abstract

This paper proposes an approach to Chinese word segmentation and part-of-speech tagging that combines dictionaries with a conditional random field (CRF) model. The approach generates all probable candidate sentences by looking up the dictionaries and selects the best one with the CRF model. It can incorporate any number of dictionaries to handle new terms without retraining the model. A practical method is also proposed for adding terms to the system's dictionary without introducing inconsistencies in the segmentation rules. Most usefully, the approach can select dictionaries and segmentation settings according to the document type. The experiments use the training and testing collections of SIGHAN Bakeoff 1 and a medical document collection. The approach achieves an F-score of 0.964 in segmentation and 0.922 in part-of-speech tagging. The training process uses only 7,229 lines of the training file, which shows that the model can be built from a small amount of training data. Even when only 10 simplified parts of speech are used for training, the approach achieves an F-score of 0.954 in segmentation and 0.939 in part-of-speech tagging. The strengths of this approach are its simplicity, practicality, and flexibility.
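The generate-then-select idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the toy dictionaries, and the single-character fallback are assumptions made for illustration, and `toy_score` is a simple placeholder standing in for the trained CRF, whose features are not described in this record. The sketch covers segmentation only, not part-of-speech tagging.

```python
from typing import Callable, Iterable, List, Set


def candidate_segmentations(
    sentence: str, dictionaries: Iterable[Set[str]]
) -> List[List[str]]:
    """Enumerate every split of `sentence` into dictionary words.

    Single characters are always allowed as a fallback so that every
    sentence has at least one candidate (an assumption of this sketch).
    """
    # Merging dictionaries is a plain set union, so a new dictionary can
    # be added without touching the scoring model.
    vocab: Set[str] = set().union(*dictionaries)
    max_len = max((len(word) for word in vocab), default=1)
    results: List[List[str]] = []

    def extend(start: int, prefix: List[str]) -> None:
        if start == len(sentence):
            results.append(list(prefix))
            return
        last_end = min(len(sentence), start + max_len)
        for end in range(start + 1, last_end + 1):
            word = sentence[start:end]
            if end - start == 1 or word in vocab:
                prefix.append(word)
                extend(end, prefix)
                prefix.pop()

    extend(0, [])
    return results


def select_best(
    candidates: List[List[str]], score: Callable[[List[str]], float]
) -> List[str]:
    """Return the highest-scoring candidate under `score`."""
    return max(candidates, key=score)


def toy_score(words: List[str]) -> float:
    """Placeholder for the CRF score: prefer fewer, longer words."""
    return -float(len(words))


# Hypothetical dictionaries: a general one and a domain-specific one.
general_dict = {"中文", "分詞"}
medical_dict = {"心肌", "梗塞", "心肌梗塞"}

candidates = candidate_segmentations("心肌梗塞", [general_dict, medical_dict])
best = select_best(candidates, toy_score)
print(best)
```

Choosing dictionaries per document type, as the abstract describes, corresponds here to changing the list passed to `candidate_segmentations`; the scoring function stays the same.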

Subjects

Chinese word segmentation

part of speech tagging

dictionaries

conditional random field

CRF

linguistic rules

SIGHAN

Type

thesis

File(s)

Name

ntu-97-R95922009-1.pdf

Size

23.32 KB

Format

Adobe PDF

Checksum

(MD5):f9d9d4881ae7711253cbbc58772cd822
