Analysis of Document Content: A Corpus-Based Model (文件內容之分析--語料庫為本的模型)
Resource
台灣大學圖書館學刊,11,97-114
Journal
台灣大學圖書館學刊
Journal Issue
11
Pages
97-114
Date Issued
1996
Date
1996
Abstract
An important step in understanding a text is building its discourse structure through cohesion and
coherence. However, building the discourse structure in turn depends on fully understanding the text,
so many efforts along this line are neither automatic nor successful. This paper proposes a corpus-based
model of texts based on 1) repetition of words, 2) importance of words, and 3) collocational semantics.
It focuses on association norms of noun-noun relations and noun-verb relations, defined at the
discourse level and the sentence level, respectively. Based on this model, a text partition algorithm is
proposed to determine the boundaries of discourse structures, and a topic identification algorithm is also
presented. The results of a series of experiments show that the proposed model is promising.
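The abstract does not give the formulas behind the model, so the following is only a rough sketch of how a corpus-derived association norm and a partition step could fit together. It makes several assumptions that are not from the paper: association is measured with a pointwise-mutual-information score over document-level co-occurrence, word repetition earns a fixed bonus of 1.0, and a boundary is placed wherever the link between adjacent sentences falls below a hypothetical `threshold`. The function names `association_norms`, `link_strength`, and `partition` are illustrative only, and nouns are assumed to be extracted already.

```python
import math
from collections import Counter
from itertools import combinations


def association_norms(docs):
    """PMI-style association between nouns that co-occur in the same document.

    docs: list of documents, each a list of already-extracted nouns.
    Returns a dict mapping an alphabetically sorted noun pair to
    log2(P(x, y) / (P(x) * P(y))), with probabilities estimated from
    document frequencies. This scoring is an assumption, not the paper's.
    """
    n = len(docs)
    word_df = Counter()
    pair_df = Counter()
    for doc in docs:
        vocab = sorted(set(doc))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return {
        (x, y): math.log2(count * n / (word_df[x] * word_df[y]))
        for (x, y), count in pair_df.items()
    }


def link_strength(left, right, norms):
    """Cohesion between two adjacent sentences, each a list of nouns."""
    score = 0.0
    for a in left:
        for b in right:
            if a == b:
                score += 1.0  # assumed bonus for word repetition
            else:
                pair = tuple(sorted((a, b)))
                # Only positive associations contribute to cohesion.
                score += max(norms.get(pair, 0.0), 0.0)
    return score


def partition(sentences, norms, threshold=0.5):
    """Return each index i where a boundary falls just before sentence i."""
    return [
        i
        for i in range(1, len(sentences))
        if link_strength(sentences[i - 1], sentences[i], norms) < threshold
    ]
```

With a toy corpus in which "bank" and "loan" co-occur and "river" and "fish" co-occur, calling `partition` on the sentences `[["bank"], ["loan"], ["river"], ["fish"]]` places a single boundary before the third sentence, since no association links "loan" to "river".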
Subjects
Document content
Corpus
Publisher
Taipei: Department of Library and Information Science, National Taiwan University
Type
journal article
File(s)
Name
jlis1996.pdf
Size
79.87 KB
Format
Adobe PDF
Checksum
(MD5):aa9674c0bb17cba1f0e7072e419140e6
