https://scholars.lib.ntu.edu.tw/handle/123456789/24373
Title: | 控制詞彙之自動索引 Automatic Indexing for Controlled Vocabularies |
Authors: | 陳光華 伍健廷 |
Keywords: | 自動索引;控制詞彙;主題分析;Automatic Indexing;Controlled Vocabulary;Subject Analysis |
Issue Date: | 1997 |
Publisher: | 臺北市:國立臺灣大學圖書資訊學系 (Taipei: Department of Library and Information Science, National Taiwan University) |
Journal Issue: | 61 |
Start page/Pages: | 81-102 |
Source: | 中國圖書館學會會報 (Bulletin of the Library Association of China) |
Abstract: | Building on word-frequency statistics, this paper designs an automatic indexing model that draws on a large set of documents already indexed manually with controlled vocabulary, together with the semantic information that the controlled vocabulary provides. The model uses a new term-significance formula, TF×OSDF×CSIDF, to correct a weakness of the traditional TF×IDF: it cannot separate subject-specific words, which are the most useful for identifying a subject, from the other words in a collection of documents on the same or closely related subjects. The experiment covers 100 MeSH subject headings and uses the abstracts and titles of 60,400 documents for training and testing. The results show that the model performs well. On abstracts, indexing precision and indexing recall both exceed 90%. On titles, indexing recall holds at 70% when indexing precision is required to be 90%. The large lists of suggested controlled-vocabulary terms that the model generates can alleviate the problem of indexing consistency and increase the number of controlled-vocabulary index terms assigned to documents. This addresses a shortcoming of traditional controlled-vocabulary indexing, in which too few terms are assigned, so that retrieval precision is high but recall falls short of natural-language indexing. The time and cost saved should also raise the quality and quantity of controlled-vocabulary index terms and, indirectly, improve retrieval performance. |
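URI: | http://ntur.lib.ntu.edu.tw//handle/246246/29201 |
Rights: | 國立臺灣大學圖書資訊學系 (Department of Library and Information Science, National Taiwan University) |
Appears in Collections: | 圖書資訊學系 (Department of Library and Information Science) |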
File | Description | Size | Format | |
---|---|---|---|---|
blac1998.pdf | | 353.74 kB | Adobe PDF | View/Open |