Research on Intelligent Knowledge Extraction Technologies and Applications (II): Subproject 1: Design and Construction of Corpora (II)
Date Issued
1998-07-31
Date
1998-07-31
Author(s)
DOI
872213E002023
Abstract
Bilingual corpora carry many kinds of linguistic knowledge that can be used for word-sense disambiguation, extracting translation templates, finding bilingual collocations, automatically translating noun compounds, building bilingual dictionaries, and so on. For such applications, the most important task is to align the bilingual texts. Aligning a text means showing which parts of the text in the first language correspond to which parts of its counterpart in the second language. This study presents an approach to word alignment in English-Chinese corpora. Previous work on word alignment has seldom addressed texts from different language families, such as English and Chinese. Our experimental material consists of two corpora: the ROCLING Text Corpus and the NTU Bilingual Corpus.
This study proposes three language models for word alignment. In principle, the matching procedure first aligns each English word with its Chinese counterpart if that counterpart appears in the corresponding Chinese sentence, and then resolves conflicts so that no two different English words are aligned to the same Chinese word in that sentence.
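The two-step matching procedure can be illustrated with a minimal Python sketch. It assumes that the Chinese sentence has already been segmented into words and that a bilingual dictionary (the hypothetical `lexicon` below) supplies candidate translations for each English word. The abstract does not say how conflicts are resolved, so the greedy rule used here (English words with fewer candidates choose first, each taking the earliest unused Chinese word) is a placeholder, not the report's method; `initial_matches`, `resolve_conflicts`, and `align` are illustrative names.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical lexicon type: lowercase English word -> set of Chinese translations.
Lexicon = Dict[str, Set[str]]


def initial_matches(english: List[str], chinese: List[str],
                    lexicon: Lexicon) -> Dict[int, List[int]]:
    """Step 1: link each English token to every Chinese token position whose
    word is one of its dictionary translations."""
    matches: Dict[int, List[int]] = {}
    for i, word in enumerate(english):
        translations = lexicon.get(word.lower(), set())
        candidates = [j for j, zh in enumerate(chinese) if zh in translations]
        if candidates:
            matches[i] = candidates
    return matches


def resolve_conflicts(matches: Dict[int, List[int]]) -> List[Tuple[int, int]]:
    """Step 2 (assumed heuristic): visit English tokens with the fewest
    candidates first (ties broken by position) and give each the earliest
    Chinese position not already taken, so no Chinese word is shared."""
    used: Set[int] = set()
    links: List[Tuple[int, int]] = []
    for i in sorted(matches, key=lambda k: (len(matches[k]), k)):
        for j in matches[i]:
            if j not in used:
                used.add(j)
                links.append((i, j))
                break
    return sorted(links)


def align(english: List[str], chinese: List[str],
          lexicon: Lexicon) -> List[Tuple[int, int]]:
    """Return (English index, Chinese index) links for one sentence pair."""
    return resolve_conflicts(initial_matches(english, chinese, lexicon))


if __name__ == "__main__":
    lexicon = {"he": {"他"}, "bought": {"買", "購買"}, "a": {"一"}, "book": {"書"}}
    english = ["He", "bought", "a", "book"]
    chinese = ["他", "買", "了", "一", "本", "書"]  # pre-segmented (assumed)
    print(align(english, chinese, lexicon))  # [(0, 0), (1, 1), (2, 3), (3, 5)]
```

With a pair such as "big large house" / "大 房子", where both "big" and "large" match 大, the sketch links only "big" and leaves "large" unaligned, which enforces the one-to-one constraint described above. A real system would presumably rank competing candidates with a statistical model rather than by candidate count and position.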
Precision and augmentation are used to
evaluate the system performance.
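Precision for word alignment is conventionally the share of proposed links that a human-annotated reference alignment also contains. The sketch below assumes that links are (English index, Chinese index) pairs and that such a gold standard is available; `alignment_precision` is a hypothetical helper, and the augmentation measure is not sketched because this record does not define it.

```python
from typing import Iterable, Tuple

Link = Tuple[int, int]  # (English token index, Chinese token index)


def alignment_precision(proposed: Iterable[Link], gold: Iterable[Link]) -> float:
    """Fraction of proposed links that also appear in the gold-standard links."""
    proposed_set = set(proposed)
    if not proposed_set:
        return 0.0  # convention assumed here: no proposals -> precision 0
    return len(proposed_set & set(gold)) / len(proposed_set)


if __name__ == "__main__":
    # One of the two proposed links is correct.
    print(alignment_precision([(0, 0), (2, 1)], [(0, 0), (1, 0), (2, 2)]))  # 0.5
```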
Subjects
Word Alignment
Bilingual
Corpus
Natural Language Processing
Publisher
Taipei: Department and Graduate Institute of Library and Information Science, National Taiwan University
Type
report
File(s)
Name
872213E002023.pdf
Size
40.33 KB
Format
Adobe PDF
Checksum
(MD5):87352a43a33fb6bebac5f7116c7e6290