Content Analysis - A Corpus Based Model
Resource
Journal of Library Science, n.12 pp.95-112
Journal
圖書館學刊 (Journal of Library Science)
Journal Issue
n.11 pp.95-112
Date Issued
1996-12
Abstract
An important step in understanding a text is building its discourse structure through cohesion and coherence. However, building the discourse structure in turn depends on a full understanding of the text, so many efforts along this line are neither automatic nor successful. This paper proposes a corpus-based model for texts based on 1) repetition of words, 2) importance of words, and 3) collocational semantics. The model focuses on association norms of noun-noun relations, defined at the discourse level, and of noun-verb relations, defined at the sentence level. Building on this model, the paper proposes a text partition algorithm that determines the boundaries of discourse structures, and also presents a topic identification algorithm. The results of a series of experiments show that the proposed model is promising.
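The record does not give the paper's formulas, so the following is only a minimal sketch of the two ideas the abstract names, under stated assumptions. Association norms are approximated here with pointwise mutual information over co-occurrence units (the classic "association ratio" style of score), and text partition is approximated by placing a boundary wherever word repetition between adjacent sentences drops below a threshold. The helper names `association_norms`, `repetition_score`, and `partition`, the cosine-similarity measure, and the `threshold` value are all hypothetical choices for illustration, not the authors' method.

```python
import math
from collections import Counter
from itertools import combinations


def association_norms(units):
    """Hypothetical helper: PMI-style association scores for word pairs.

    `units` is a list of word lists (e.g. the nouns of each sentence or
    paragraph). P(x) and P(x, y) are estimated from how many units contain
    x, or both x and y. Only pairs that co-occur at least once are scored.
    """
    n = len(units)
    word_df = Counter()
    pair_df = Counter()
    for unit in units:
        vocab = sorted(set(unit))  # sorted so each pair has one canonical order
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    scores = {}
    for (x, y), both in pair_df.items():
        # log2( P(x,y) / (P(x) * P(y)) ) = log2( both * n / (df(x) * df(y)) )
        scores[(x, y)] = math.log2(both * n / (word_df[x] * word_df[y]))
    return scores


def repetition_score(left, right):
    """Assumed cohesion measure: cosine similarity of word counts."""
    a, b = Counter(left), Counter(right)
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def partition(sentences, threshold=0.1):
    """Hypothetical boundary rule: start a new segment at sentence i + 1
    when repetition between sentences i and i + 1 falls below `threshold`."""
    boundaries = []
    for i in range(len(sentences) - 1):
        if repetition_score(sentences[i], sentences[i + 1]) < threshold:
            boundaries.append(i + 1)
    return boundaries


sentences = [
    ["library", "catalog", "book"],
    ["catalog", "book", "record"],
    ["network", "protocol", "packet"],
    ["packet", "network", "router"],
]
print(partition(sentences))  # [2]
print(association_norms(sentences)[("book", "catalog")])  # 1.0
```

In this toy input, the first two sentences share "catalog" and "book" and the last two share "network" and "packet", so the only low-repetition gap is before the third sentence. A fuller treatment along the lines of the abstract would also weight words by importance and use noun-verb associations within sentences, which this sketch omits.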
Subjects
Discourse Analysis
Information Retrieval
Natural Language Processing
Publisher
Department of Library and Information Science, National Taiwan University
Type
journal article
File(s)
Name
o11-5.pdf
Size
23.39 KB
Format
Adobe PDF
Checksum
MD5: ad77c97bf26e7d908b372910824383ee
