Content Analysis - A Corpus Based Model

Kuang-hua Chen (陳光華); Hsin-Hsi Chen (陳信希)

DC field	Value	Language
dc.contributor	國立臺灣大學圖書館學系; 國立臺灣大學資訊工程學系	zh-TW
dc.contributor	Department of Library Science, National Taiwan University; Department of Computer Science and Information Engineering, National Taiwan University	en
dc.contributor.author	陳光華	zh-TW
dc.contributor.author	陳信希	zh-TW
dc.creator	陳光華; 陳信希	-
dc.date	1996-12	en
dc.date.accessioned	2010-08-06T02:36:09Z	-
dc.date.accessioned	2018-05-30T04:23:00Z	-
dc.date.available	2010-08-06T02:36:09Z	-
dc.date.available	2018-05-30T04:23:00Z	-
dc.date.issued	1996-12	-
dc.identifier.uri	http://ntur.lib.ntu.edu.tw//handle/246246/190607	-
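dc.description.abstract	Research on information retrieval generally focuses on building retrieval models, query feedback mechanisms, studying search behavior, and the execution efficiency of retrieval systems. This paper instead returns the focus to the information, or the documents, themselves, with the aim of gaining a preliminary understanding of their content. Based on three factors, 1) repetition of words, 2) importance of words, and 3) collocational semantics, it proposes a model of document content analysis grounded in real corpora. The model focuses on the pairing relations between nouns and verbs and between nouns and nouns in a text. The paper also describes how the model is used for text segmentation and topic identification, and discusses the results of related experiments.	zh-TW
dc.description.abstract	An important step in understanding text is to build its discourse structure through cohesion and coherence. However, building the discourse structure in turn depends on a full understanding of the text, so many efforts along this line are neither automatic nor successful. This paper proposes a corpus-based model of texts built on 1) repetition of words, 2) importance of words, and 3) collocational semantics. It focuses on association norms of noun-noun relations and noun-verb relations, defined at the discourse level and the sentence level, respectively. Based on this model, a text partition algorithm is proposed to determine the boundaries of discourse structures, and a topic identification algorithm is also presented. The results of a series of experiments show that the proposed model is promising.	en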
dc.language	zh-TW	en
dc.language.iso	en_US	-
dc.publisher	國立臺灣大學圖書資訊學系	zh-TW
dc.publisher	Department of Library and Information Science, National Taiwan University	en
dc.relation	圖書館學刊, n.11 pp.95-112	zh-TW
dc.relation	Journal of Library Science, n.11 pp.95-112	en
dc.relation.ispartof	圖書館學刊	-
dc.subject	言談分析	zh-TW
dc.subject	資訊檢索	zh-TW
dc.subject	自然語言處理	zh-TW
dc.subject	Discourse Analysis	en
dc.subject	Information Retrieval	en
dc.subject	Natural Language Processing	en
dc.title	文件內容之分析─語料庫為本的模型	zh-TW
dc.title	Content Analysis - A Corpus Based Model	en
dc.type	journal article	en
dc.relation.pages	-	-
dc.relation.journalissue	n.11 pp.95-112	-
dc.identifier.uri.fulltext	http://ntur.lib.ntu.edu.tw/bitstream/246246/190607/-1/o11-5.pdf	-
item.openairecristype	http://purl.org/coar/resource_type/c_6501	-
item.openairetype	journal article	-
item.languageiso639-1	en_US	-
item.grantfulltext	open	-
item.cerifentitytype	Publications	-
item.fulltext	with fulltext	-
crisitem.author.dept	Library	-
crisitem.author.dept	Library and Information Science	-
crisitem.author.dept	Networking and Multimedia	-
crisitem.author.dept	Computer Science and Information Engineering	-
crisitem.author.orcid	0000-0003-0616-3815	-
crisitem.author.orcid	0000-0001-9757-9423	-
crisitem.author.parentorg	Administrative Unit	-
crisitem.author.parentorg	College of Liberal Arts	-
crisitem.author.parentorg	College of Electrical Engineering and Computer Science	-
crisitem.author.parentorg	College of Electrical Engineering and Computer Science	-
Appears in Collections:	Department of Library and Information Science

Files in This Item:

File	Description	Size	Format
o11-5.pdf		23.39 kB	Adobe PDF