https://scholars.lib.ntu.edu.tw/handle/123456789/23750
Title: | Research on Intelligent Knowledge Acquisition Techniques and Applications (II), Subproject 1: Design and Construction of Corpora (II) (智慧型知識擷取技術與應用研究(II)─子計畫一:語料庫之設計與製作(II)) |
Author: | 陳光華 (Kuang-hua Chen) |
Keywords: | Word Alignment; Bilingual Corpus; Natural Language Processing |
Issue date: | 31-Jul-1998 |
Publisher: | Taipei: Department and Graduate Institute of Library and Information Science, National Taiwan University |
Abstract: | Bilingual corpora carry many kinds of linguistic information and therefore support applications such as word-sense disambiguation, extraction of translation templates, discovery of bilingual collocations, automatic translation of noun compounds, and construction of bilingual dictionaries. The most important prerequisite for these applications is aligning the bilingual texts, that is, showing which parts of the text in the first language correspond to which parts in the second. This study presents an approach to word alignment in an English-Chinese corpus. Previous work on word alignment has seldom addressed parallel texts from different language families, such as English and Chinese. The experimental material consists mainly of the HP and Lotus Chinese-English bilingual texts in the ROCLING Text Corpus, together with the NTU Bilingual Corpus. Three language models are proposed, each consisting of two parts. The first part makes an initial pass over the sentence-aligned bilingual text, aligning each English word with its Chinese counterpart if that counterpart appears in the corresponding Chinese sentence. The second part resolves conflicts in which two or more different English words are aligned to the same Chinese word in that sentence. System performance is evaluated by precision and augmentation rate. |
URI: | http://ntur.lib.ntu.edu.tw//handle/246246/20391 | Other identifiers: | 872213E002023 | Rights: | Department and Graduate Institute of Library and Information Science, National Taiwan University |
Appears in collections: | Department of Library and Information Science |
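The abstract describes the matching procedure only in outline, so the following is a minimal sketch of the two steps rather than the report's actual models. It assumes a toy bilingual lexicon, pre-segmented Chinese sentences, and a relative-position tie-break for conflict resolution; none of these details come from the record. The names `candidate_links`, `resolve_conflicts`, and `precision` are hypothetical, and `precision` uses the standard definition (correct links divided by proposed links), which is assumed to match the report's usage.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[int, int]  # (English token index, Chinese token index)


def candidate_links(english: List[str], chinese: List[str],
                    lexicon: Dict[str, Set[str]]) -> List[Link]:
    """Step 1: link an English word to every Chinese token in the paired
    sentence that the (hypothetical) lexicon lists as its translation."""
    links = []
    for i, e_word in enumerate(english):
        translations = lexicon.get(e_word.lower(), set())
        for j, c_word in enumerate(chinese):
            if c_word in translations:
                links.append((i, j))
    return links


def resolve_conflicts(links: List[Link], n_english: int,
                      n_chinese: int) -> List[Link]:
    """Step 2: keep at most one English word per Chinese token.
    Assumed tie-break (not from the source): prefer the link whose
    relative sentence positions are closest."""
    def distortion(link: Link) -> float:
        i, j = link
        return abs(i / max(n_english, 1) - j / max(n_chinese, 1))

    owner: Dict[int, int] = {}  # Chinese index -> chosen English index
    for i, j in sorted(links, key=distortion):
        if j not in owner:
            owner[j] = i
    return sorted((i, j) for j, i in owner.items())


def precision(proposed: List[Link], gold: Set[Link]) -> float:
    """Fraction of proposed links that appear in the gold alignment."""
    if not proposed:
        return 0.0
    return len(set(proposed) & gold) / len(proposed)


# Toy data, invented for illustration only.
lexicon = {"save": {"儲存"}, "document": {"文件", "檔案"}, "file": {"檔案"}}
english = ["Save", "the", "document", "file"]
chinese = ["儲存", "文件", "檔案"]

candidates = candidate_links(english, chinese, lexicon)
aligned = resolve_conflicts(candidates, len(english), len(chinese))
print(candidates)
print(aligned)
print(precision(aligned, {(0, 0), (2, 1), (3, 2)}))
```

In this toy pair, both "document" and "file" initially link to 檔案; the second step keeps only one of them, so that each Chinese token ends up with a single English partner. An English word may still link to more than one Chinese token, since the abstract states the constraint only on the Chinese side.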
File | Description | Size | Format | |
---|---|---|---|---|
872213E002023.pdf | | 40.33 kB | Adobe PDF | View/Open |
Unless a specific copyright license is indicated, all documents in the IR system are protected by copyright, with all rights reserved.