Research on Intelligent Knowledge Extraction Techniques and Applications (II), Subproject 1: Design and Construction of Corpora (II)

陳光華

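Title:	Research on Intelligent Knowledge Extraction Techniques and Applications (II), Subproject 1: Design and Construction of Corpora (II)
Author:	陳光華
Keywords:	Word Alignment; Bilingual Corpus; Natural Language Processing
Issue date:	31 July 1998
Publisher:	Taipei: Department and Graduate Institute of Library and Information Science, National Taiwan University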
Abstract:	A bilingual corpus carries many kinds of linguistic knowledge and therefore supports applications such as word-sense disambiguation, extraction of translation templates, identification of bilingual collocations, automatic translation of noun compounds, and construction of bilingual dictionaries. The most important prerequisite for these applications is aligning the bilingual texts, that is, showing which parts of the text in the first language correspond to which parts in the second. Previous work on word alignment has seldom addressed parallel texts from different language families, such as English and Chinese. This study presents an approach to word alignment in an English-Chinese corpus. The experimental material consists of two corpora: the ROCLING Text Corpus (mainly its HP and Lotus Chinese-English bilingual texts) and the NTU Bilingual Corpus. Three language models are proposed, each consisting of two parts. The first takes sentence-aligned bilingual text and makes an initial alignment of each English word to its Chinese counterpart if that counterpart appears in the corresponding Chinese sentence. The second resolves conflicts in which two or more English words correspond to the same Chinese word in that sentence. System performance is evaluated by precision and augmentation.
URI:	http://ntur.lib.ntu.edu.tw//handle/246246/20391
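Other identifiers:	872213E002023
Rights:	Department and Graduate Institute of Library and Information Science, National Taiwan University
Appears in collections:	Department of Library and Information Science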
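
The two-part matching procedure summarized in the abstract can be illustrated with a minimal sketch. This is not the report's implementation: the record does not describe the three language models, so the sketch assumes a hypothetical bilingual lexicon (`lexicon`), a Chinese sentence that has already been segmented into tokens, and a simple greedy rule for conflict resolution (English words with the fewest candidate matches are placed first). The `precision` function assumes the conventional definition (correct links divided by proposed links); the augmentation measure is not defined in the record and is left out.

```python
from typing import Dict, List, Set, Tuple


def initial_candidates(
    english: List[str], chinese: List[str], lexicon: Dict[str, Set[str]]
) -> Dict[int, List[int]]:
    """Part 1: for each English word, list the positions of Chinese tokens
    that the (hypothetical) lexicon gives as its translations."""
    candidates: Dict[int, List[int]] = {}
    for i, word in enumerate(english):
        translations = lexicon.get(word.lower(), set())
        hits = [j for j, token in enumerate(chinese) if token in translations]
        if hits:
            candidates[i] = hits
    return candidates


def resolve_conflicts(candidates: Dict[int, List[int]]) -> Dict[int, int]:
    """Part 2: make sure no two English words link to the same Chinese token.
    Assumed heuristic: place the most constrained English words first."""
    taken: Set[int] = set()
    alignment: Dict[int, int] = {}
    for i in sorted(candidates, key=lambda k: (len(candidates[k]), k)):
        for j in candidates[i]:
            if j not in taken:
                alignment[i] = j
                taken.add(j)
                break
    return alignment


def align(
    english: List[str], chinese: List[str], lexicon: Dict[str, Set[str]]
) -> Dict[int, int]:
    """Run both parts on one sentence pair; returns English index -> Chinese index."""
    return resolve_conflicts(initial_candidates(english, chinese, lexicon))


def precision(proposed: Set[Tuple[int, int]], gold: Set[Tuple[int, int]]) -> float:
    """Fraction of proposed links that also appear in the reference alignment."""
    if not proposed:
        return 0.0
    return len(proposed & gold) / len(proposed)


# Toy sentence pair and lexicon, invented purely for illustration.
english = ["open", "file", "menu"]
chinese = ["開啟", "檔案", "功能表"]
lexicon = {"open": {"開啟"}, "file": {"檔案"}, "menu": {"功能表", "檔案"}}

links = align(english, chinese, lexicon)
```

In the toy example, "menu" initially matches both 檔案 and 功能表; because "file" has only one candidate, it is placed first and "menu" falls back to 功能表, so each Chinese token ends up linked to at most one English word.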

Files in this item:

File	Description	Size	Format
872213E002023.pdf		40.33 kB	Adobe PDF
