Research on Intelligent Knowledge Acquisition Techniques and Applications (II), Subproject 1: Design and Construction of Corpora (II)

Kuang-hua Chen (陳光華)

Date Issued

1998-07-31

Author(s)

Kuang-hua Chen (陳光華)

DOI

872213E002023

URI

http://ntur.lib.ntu.edu.tw//handle/246246/20391

Abstract

A bilingual corpus carries many kinds of linguistic knowledge and can therefore be used for word-sense disambiguation, extraction of translation templates, identification of bilingual collocations, automatic translation of noun compounds, construction of bilingual dictionaries, and so on. For such applications, the most important task is to align the bilingual texts. To align a text means to show which parts of the first language correspond to which parts of the second language. This study presents an approach to word alignment in an English-Chinese corpus. Previous work on word alignment has seldom dealt with texts from different language families, such as English and Chinese. Our experimental material consists of two corpora: the ROCLING Text Corpus and the NTU Bilingual Corpus. Three language models are proposed for word alignment. In principle, the matching procedure first aligns each English word with its Chinese counterpart if that counterpart appears in the corresponding Chinese sentence, and then resolves conflicts so that no two different English words are aligned with the same Chinese word in that sentence. Precision and augmentation are used to evaluate system performance.
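The two-step matching procedure described in the abstract (initial alignment, then conflict resolution) might be sketched as follows. This is a minimal illustration and not the report's implementation: the bilingual lexicon, the function names `align_words` and `precision`, the substring test for "appears in the Chinese sentence", and the "fewest candidates first" conflict policy are all assumptions. Only precision is sketched, since the abstract does not define augmentation.

```python
def align_words(english_tokens, chinese_sentence, lexicon):
    """Map English token indices to Chinese words, using each Chinese word at most once.

    `lexicon` is a hypothetical dict: lowercase English word -> list of Chinese translations.
    """
    # Step 1 (initial alignment): keep every translation that occurs in the Chinese sentence.
    candidates = {}
    for i, word in enumerate(english_tokens):
        found = [zh for zh in lexicon.get(word.lower(), []) if zh in chinese_sentence]
        if found:
            candidates[i] = found

    # Step 2 (conflict resolution, assumed policy): tokens with the fewest
    # candidates choose first, so an ambiguous token cannot take the only
    # option of a less ambiguous one.
    alignment = {}
    used = set()
    for i in sorted(candidates, key=lambda k: (len(candidates[k]), k)):
        for zh in candidates[i]:
            if zh not in used:
                alignment[i] = zh
                used.add(zh)
                break
    return alignment


def precision(proposed, gold):
    """Fraction of proposed alignments that also appear in the gold alignment."""
    if not proposed:
        return 0.0
    correct = sum(1 for i, zh in proposed.items() if gold.get(i) == zh)
    return correct / len(proposed)


# Toy data, invented for illustration only.
lexicon = {
    "he": ["他"],
    "walks": ["散步", "走"],
    "along": ["沿著"],
    "river": ["岸", "河"],
    "bank": ["岸", "銀行"],
}
tokens = ["He", "walks", "along", "the", "river", "bank"]
chinese = "他沿著河岸散步"

result = align_words(tokens, chinese, lexicon)
print(result)
```

In this toy sentence both "river" and "bank" initially match 岸. Because "bank" has only one candidate present in the sentence, it is resolved first and "river" falls back to 河, so no Chinese word is claimed twice.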

Subjects

Word Alignment

Bilingual Corpus

Natural Language Processing

Publisher

Taipei: Department and Graduate Institute of Library and Information Science, National Taiwan University

Type

report

File(s)

Name

872213E002023.pdf

Size

40.33 KB

Format

Adobe PDF

Checksum

(MD5):87352a43a33fb6bebac5f7116c7e6290
