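2001-08-01
2024-05-18
https://scholars.lib.ntu.edu.tw/handle/123456789/700922

Abstract (translated from the Chinese version): Building on word-frequency statistics, this project uses a large collection of documents as training data to automatically assign controlled-vocabulary index terms. The approach aims to establish the relationship between a given controlled term and a set of related natural-language terms, and then to use that relationship to decide whether the controlled term should be assigned to a document. Controlled vocabularies may be supplied to the system by a subject headings list or by a thesaurus; in general, a subject headings list provides general vocabulary, whereas a thesaurus is a collection of controlled terms for a specific domain. This project adopts a thesaurus because it offers more refined terms and inter-term relations, which can effectively improve the performance of retrieval systems and supply semantic concepts to automatic summarization systems so that they produce higher-quality document summaries. Over three years, the project studies a controlled-vocabulary indexing system, a controlled-vocabulary classification system, and an integrated controlled-vocabulary system. The main work items are as follows:

1. The first year (ROC year 88 ~ 89, i.e. 1999 ~ 2000)
* Investigate subject analysis of documents and how machines can assist automatic indexing
* Study the similarities and differences between manual and automatic indexing
* Discuss the possibility of replacing manual indexing with automatic indexing
* Construct the relationship between controlled vocabulary and subject identification
* Evaluate the effectiveness of the model

2. The second year (ROC year 89 ~ 90, i.e. 2000 ~ 2001)
* Investigate the features of each node in the thesaurus hierarchy
* Study a model of feature inheritance between upper and lower levels of the hierarchy
* Apply the inter-term relations of the thesaurus to construct a browsing model
* Integrate the above

Abstract: Based on word-frequency statistics for controlled vocabularies, this project will use a large volume of training data to automatically assign controlled vocabulary terms for document indexing. The purpose of this research is to construct a mapping between each controlled vocabulary term and a set of natural-language terms, and then to use this mapping to assign controlled vocabulary terms to documents automatically. Controlled vocabularies may be provided by a subject headings list (for example, LCSH) or by a thesaurus (for example, the Thesaurus of Information Science and Librarianship). Because a thesaurus encodes richer relationships, its inter-term relations (hierarchical, equivalence, and associative) can be applied effectively to query expansion in IR systems and to summary generation in automatic summarization (AS) systems. Over three years, this project will study a system for controlled-vocabulary indexing, a system for controlled-vocabulary classification, and an integrated system for controlled vocabulary. The core tasks of the project are as follows:

1. The first year (1999 ~ 2000)
* Investigate the subject analysis of text and examine how machines can assist the indexing task
* Study the differences between manual indexing and automatic indexing
* Discuss the possibility of using automatic indexing rather than manual indexing
* Construct a bottom-up model for controlled vocabulary indexing
* Evaluate the effectiveness of the constructed model

2. The second year (2000 ~ 2001)
* Investigate how to extract the features of nodes in a thesaurus
* Study a model for inherited properties in a hierarchical thesaurus
* Apply the inter-term relations to constructing a browsing model for controlled vocabulary
* Construct a top-down automatic classification model based on the thesaurus
* Evaluate the effectiveness of the constructed model

3. The third year (2001 ~ 2002)
* Integrate the automatic classification model and the automatic indexing model
* Evaluate the effectiveness of the integrated model
* Identify the relation between a new term and the controlled vocabulary
* Determine the cluster for a new term suggested by the dynamic dictionary
* Support query expansion in IR systems via inter-term relations

Keywords: Automatic Summarization; Controlled Vocabulary; Information Extraction; Information Retrieval

Title: Research on Multilingual Information Retrieval and Extraction (3/3) - Subproject 2: Research on Automatic Retrieval and Extraction of Controlled Vocabulary (多語言資訊檢索與擷取之研究(3/3)-子計畫二:控制詞彙自動檢索與擷取之研究)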
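
The abstract describes learning, from training documents, a relation between each controlled term and a set of natural-language words, and then using that relation to decide whether to assign the term to a new document. The record does not specify the statistical model, so the following is only a minimal sketch under stated assumptions: the weight of a word for a controlled term is taken to be the fraction of training documents containing the word that carry that term, and a term is assigned when the frequency-weighted average of these weights reaches a threshold. The function names `train_associations` and `assign_terms`, the toy data, and the threshold of 0.5 are all hypothetical and are not taken from the project.

```python
from collections import Counter, defaultdict


def train_associations(training_docs):
    """Estimate weights[word][term]: the share of training documents
    containing `word` that were indexed with controlled term `term`.
    (Assumed model; the project's actual statistic is not given.)"""
    docs_with_word = Counter()
    docs_with_pair = defaultdict(Counter)
    for tokens, terms in training_docs:
        for word in set(tokens):  # count each word once per document
            docs_with_word[word] += 1
            for term in terms:
                docs_with_pair[word][term] += 1
    return {
        word: {term: n / docs_with_word[word] for term, n in pairs.items()}
        for word, pairs in docs_with_pair.items()
    }


def assign_terms(tokens, weights, threshold=0.5):
    """Return the controlled terms whose frequency-weighted average
    association with the document's known words is >= threshold."""
    counts = Counter(t for t in tokens if t in weights)
    total = sum(counts.values())
    if total == 0:  # no known words: nothing can be assigned
        return []
    scores = defaultdict(float)
    for word, freq in counts.items():
        for term, weight in weights[word].items():
            scores[term] += freq * weight / total
    return sorted(term for term, score in scores.items() if score >= threshold)


# Toy training set: (document tokens, manually assigned controlled terms).
training_docs = [
    (["query", "ranking", "search", "index"], {"Information Retrieval"}),
    (["search", "query", "relevance"], {"Information Retrieval"}),
    (["summary", "sentence", "abstract"], {"Automatic Summarization"}),
    (["sentence", "summary", "ranking"], {"Automatic Summarization"}),
]
weights = train_associations(training_docs)
new_doc = ["search", "query", "ranking", "query"]
print(assign_terms(new_doc, weights))  # ['Information Retrieval']
```

In this toy run the word "ranking" is shared by both controlled terms and so contributes only half its weight to each, which is why the new document clears the threshold for "Information Retrieval" alone. A real system would likely need smoothing, stop-word handling, and Chinese word segmentation, none of which are shown here.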