Contributor: 曹承礎
Affiliation: National Taiwan University: Graduate Institute of Information Management
Author: 楊瑋琳 (Yang, Wei-Lin)
Record dates: 2007-11-26; 2018-06-29
Year: 2006
URI: http://ntur.lib.ntu.edu.tw//handle/246246/54215

Abstract (translated from Chinese):
Keyword-based search engines are one of the main methods for retrieving relevant documents from large volumes of data. However, the returned search results (snippets) are not organized, and keywords are the only dimension of selection, so multi-faceted browsing is not possible. In the field of information retrieval, classification and clustering are two methods for automatically assigning semantic categories to a document collection. The former must first be trained on part of the document collection to form a classification model for automatic classification. The latter uses statistical methods to compute the similarity between documents in order to cluster them automatically. Because search results are dynamic and predefined categories are inflexible, this study adopts clustering as one of its tools.
This study proposes a virtual document warehouse system with multi-dimensional browsing as a way to browse search results from multiple perspectives. Combined with existing search engines, it uses the HAC+P hierarchical clustering algorithm to build a semantic hierarchy, that is, a semantics-based concept hierarchy. Through repeated searching and clustering, a personal conceptual knowledge map can be formed, which improves the user's browsing experience and makes it easier to find relevant topics and document content.

Abstract (English):
Keyword-based retrieval with search engines has limited ability to surface the most important and relevant knowledge. The retrieved search results are unorganized and lack dimensions. In the information retrieval (IR) field, text categorization, which includes classification and clustering, has been investigated for many years as a way to organize search results automatically into corresponding categories. In this thesis, we propose and describe the Virtual Document Warehouse System, which provides an integrated interface for multi-dimensional analysis in support of knowledge management and decision-making. The system extracts relevant documents by using search engines, and uses clustering algorithms to dynamically and automatically organize information retrieved from heterogeneous sources into hierarchical structures and to combine different concept hierarchies. Finally, we propose an approach that makes searching more convenient and multi-dimensional, and present the application of personalized conceptual knowledge maps.

Table of contents:
Chapter 1 Introduction
  1.1. Motivation
  1.2. Objective
  1.3. Organization
Chapter 2 Literature Review
  2.1. Text Classification Techniques
  2.2. Document Clustering Techniques
    2.2.1. Partitioning Clustering
    2.2.2. Hierarchical Clustering
    2.2.3. Agglomerative Hierarchical Clustering
  2.3. Document Warehouses
    2.3.1. Data Warehouses and Document Warehouses
    2.3.2. Virtual Data Warehouses
    2.3.3. Concept Hierarchy
    2.3.4. Dimensions
    2.3.5. Data Cube
    2.3.6. On-Line Analytical Processing Operations
  2.4. Knowledge Maps
Chapter 3 System Design
  3.1. System Architecture
  3.2. System Components
    3.2.1. Heterogeneous document sources
    3.2.2. Clustering-based Search Engine
    3.2.3. Warehouse Administrator
    3.2.4. Multi-dimensional Browser Engine
  3.3. System Flow
    3.3.1. Extracting, Transforming, and Loading (ETL) Function
    3.3.2. Virtual Document Warehousing and Cube Function
    3.3.3. Multi-dimensional Analysis Function
Chapter 4 System Implementation and Experiment Analysis
  4.1. Scenario
  4.2. Development Tools
  4.3. Hierarchical Clustering Experiment and Analysis
    4.3.1. Hierarchical clustering experiment
    4.3.2. Discussion and analysis
  4.4. Virtual Document Warehouse Implementation
    4.4.1. Data Source Format
    4.4.2. Virtual Document Warehouse Design
    4.4.3. Concept Hierarchy Design
    4.4.4. Documents Loading
    4.4.5. Dimensions and Cubes
  4.5. Clustering-based Search Engine
  4.6. Multi-dimensional Browser Engine
  4.7. Application of Knowledge Maps
  4.8. Analysis and Discussion
Chapter 5 Conclusion and Future Work
  5.1. Conclusion
  5.2. Future Work
Bibliography

File size: 5393869 bytes
Format: application/pdf
Language: en-US
Keywords: Information Retrieval; Hierarchical Clustering; Document Warehouse; Concept Hierarchy; Search Engine
Title: Developing a Virtual Document Warehouse with Dynamic Hierarchical Clustering Techniques (以動態階層分群技術為基礎建立虛擬文件倉儲系統)
Type: other
File: http://ntur.lib.ntu.edu.tw/bitstream/246246/54215/1/ntu-95-R93725037-1.pdf