Automatic Indexing for Controlled Vocabularies
Resource
中國圖書館學會會報,61,81-102
Journal
中國圖書館學會會報
Journal Issue
61
Pages
81-102
Date Issued
1997
Date
1997
Author(s)
伍健廷
Abstract
This paper proposes a new model for automatic controlled-vocabulary indexing, based on
word-frequency statistics and supported by the semantic information of controlled vocabularies. The
model introduces a new term-significance formula, TF×OSDF×CSIDF, which corrects a flaw of TF×IDF:
within a document collection on the same or a closely related subject, TF×IDF cannot distinguish
subject-specific words that are highly useful for subject identification from other words. In an
experiment involving 100 MeSH subject headings and 60,400 abstracts and titles, the model performs
well. For abstracts, indexing precision and recall both exceed 90%. For titles, indexing precision
reaches 90% while indexing recall stays at 70%. Consulting the many candidate controlled-vocabulary
terms generated by the model could alleviate the problem of indexer consistency. In addition, the
time and cost saved should directly improve the quality and quantity of controlled-vocabulary index
terms and, indirectly, retrieval performance.
Subjects
Automatic Indexing
Controlled Vocabulary
Subject Analysis
Publisher
臺北市:國立臺灣大學圖書資訊學系
Type
journal article
File(s)
Name
blac1998.pdf
Size
353.74 KB
Format
Adobe PDF
Checksum
(MD5):ed29eadc2e87486f964a08524d14f6c4