Analysis of Document Content: A Corpus-Based Model (文件內容之分析--語料庫為本的模型)

陳光華; 陳信希

Resource

台灣大學圖書館學刊，11，97-114

Journal

台灣大學圖書館學刊

Journal Issue

11

Pages

97-114

Date Issued

1996

Date

1996

Author(s)

陳光華

陳信希

URI

http://ntur.lib.ntu.edu.tw//handle/246246/29212

Abstract

Building a discourse structure from cohesion and coherence is an important step in understanding a text. However, building that structure in turn depends on fully understanding the text, so many efforts along this line are neither automatic nor successful. This paper proposes a corpus-based model of texts built on 1) repetition of words, 2) importance of words, and 3) collocational semantics. The model focuses on association norms for noun-noun relations and noun-verb relations, defined at the discourse level and the sentence level, respectively. Based on this model, a text partition algorithm is proposed to determine the boundaries of discourse structures, and a topic identification algorithm is also presented. The results of a series of experiments show that the proposed model is promising.
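The abstract gives no implementation details, so the following is only a rough sketch of the general idea of partitioning a text by word repetition and then labelling each segment with a topic word. It is not the authors' algorithm: it swaps in plain cosine similarity between adjacent sentences in place of the paper's association norms, and it ignores word importance and the noun-noun and noun-verb distinction. The function names `cosine`, `partition`, and `topic`, and the `threshold` parameter, are hypothetical and chosen for illustration.

```python
from __future__ import annotations

import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def partition(sentences: list[list[str]], threshold: float = 0.1) -> list[int]:
    """Return each index i where a boundary falls just before sentence i.

    Assumption: a boundary is placed wherever two adjacent sentences
    share too few repeated words (similarity below `threshold`).
    """
    boundaries = []
    for i in range(1, len(sentences)):
        if cosine(Counter(sentences[i - 1]), Counter(sentences[i])) < threshold:
            boundaries.append(i)
    return boundaries


def topic(segment: list[list[str]]) -> str:
    """Pick the most frequently repeated word in a segment as a crude topic label."""
    counts = Counter(word for sentence in segment for word in sentence)
    return counts.most_common(1)[0][0]


# Toy input: each sentence is already tokenized into content words.
sents = [
    ["corpus", "model", "corpus"],
    ["corpus", "text"],
    ["fish", "river"],
    ["river", "fish", "boat"],
]
cuts = partition(sents)
first_topic = topic(sents[: cuts[0]])
```

On this toy input the only adjacent pair with no shared words is the second and third sentence, so a single boundary is placed there. A model closer to the one described would replace raw counts with corpus-trained association scores.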

Subjects

Document content

Corpus

Publisher

Taipei: Department of Library and Information Science, National Taiwan University

Type

journal article

File(s)

Name

jlis1996.pdf

Size

79.87 KB

Format

Adobe PDF

Checksum

(MD5):aa9674c0bb17cba1f0e7072e419140e6
