Content Analysis - A Corpus Based Model

陳光華; 陳信希

Resource

Journal of Library Science, n.12 pp.95-112

Journal

圖書館學刊 (Journal of Library Science)

Journal Issue

n.11 pp.95-112

Date Issued

1996-12

Author(s)

陳光華

陳信希

URI

http://ntur.lib.ntu.edu.tw//handle/246246/190607

Abstract

An important step in understanding a text is building its discourse structure through cohesion and coherence. However, building the discourse structure in turn depends on a full understanding of the text, so many efforts along this line are neither automatic nor successful. This paper proposes a corpus-based model built on 1) repetition of words, 2) importance of words, and 3) collocational semantics of texts. It focuses on association norms of noun-noun relations and noun-verb relations, defined at the discourse level and the sentence level, respectively. Based on this model, a text partition algorithm is proposed to determine the boundaries of discourse structures, and a topic identification algorithm is also presented. The results of a series of experiments show that the proposed model is promising.
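The abstract names three cues (word repetition, word importance, and collocational association) but gives no formulas. The following is a minimal sketch, under stated assumptions, of how such cues can be computed; it is not the authors' algorithm. `association_ratios` uses the Church and Hanks association ratio (pointwise mutual information) over co-occurrence units. `importance_weights` substitutes an idf-style weight for "importance of words". `partition` places a boundary wherever weighted word repetition across a gap falls to a threshold, loosely in the spirit of Hearst's TextTiling. `topic_words` ranks a segment's words by frequency times weight. All function names, the idf weighting, the window size `k`, and `threshold` are hypothetical. The paper's distinction between noun-noun relations (discourse level) and noun-verb relations (sentence level) is modeled only by letting the caller decide what a "unit" is.

```python
import math
from collections import Counter
from itertools import combinations


def association_ratios(units):
    """Church & Hanks style association ratio (PMI) for word pairs that
    co-occur in the same unit (a sentence, or a larger discourse span).

    units: list of token lists (e.g. nouns only, already extracted).
    Probabilities are estimated as the fraction of units containing the
    word or pair (a simplifying assumption). Keys are (x, y) with x < y.
    """
    n = len(units)
    if n == 0:
        return {}
    word_df = Counter()
    pair_df = Counter()
    for unit in units:
        types = sorted(set(unit))
        word_df.update(types)
        pair_df.update(combinations(types, 2))  # sorted input -> x < y
    scores = {}
    for (x, y), c_xy in pair_df.items():
        p_xy = c_xy / n
        p_x, p_y = word_df[x] / n, word_df[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


def importance_weights(units):
    """Hypothetical importance weight: idf over units, shifted to stay >= 1."""
    n = len(units)
    df = Counter(w for u in units for w in set(u))
    return {w: math.log(n / c) + 1.0 for w, c in df.items()}


def partition(units, weight, k=2, threshold=0.0):
    """Return gap indices i (a boundary falls before units[i]) where the
    importance-weighted sum of word types shared by the k units on each
    side of the gap is <= threshold."""
    boundaries = []
    for gap in range(1, len(units)):
        left = {w for u in units[max(0, gap - k):gap] for w in u}
        right = {w for u in units[gap:gap + k] for w in u}
        score = sum(weight[w] for w in left & right)
        if score <= threshold:
            boundaries.append(gap)
    return boundaries


def split(units, boundaries):
    """Cut the unit list at the given gap indices."""
    cuts = [0] + boundaries + [len(units)]
    return [units[a:b] for a, b in zip(cuts, cuts[1:])]


def topic_words(segment, weight, top=1):
    """Rank a segment's words by (frequency in segment) x (weight);
    ties are broken alphabetically."""
    tf = Counter(w for u in segment for w in u)
    ranked = sorted(tf, key=lambda w: (-tf[w] * weight[w], w))
    return ranked[:top]


if __name__ == "__main__":
    units = [
        ["retrieval", "query", "index"],
        ["query", "index", "ranking"],
        ["parser", "grammar", "sentence"],
        ["grammar", "sentence", "tree"],
    ]
    w = importance_weights(units)
    bounds = partition(units, w)
    print(bounds)  # [2]
    for seg in split(units, bounds):
        print(topic_words(seg, w, top=2))
```

On the toy input, no word is repeated across the gap before the third unit, so that is the only boundary; the first segment's top words are `index` and `query`, and the second's are `grammar` and `sentence`. A real system would also need noun and verb extraction, which this sketch assumes has already been done.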

Subjects

Discourse Analysis

Information Retrieval

Natural Language Processing

Publisher

Department of Library and Information Science, National Taiwan University

Type

journal article

File(s)

Name

o11-5.pdf

Size

23.39 KB

Format

Adobe PDF

Checksum

(MD5):ad77c97bf26e7d908b372910824383ee
