Applying Text Classification Techniques in Multidimensional Document Warehouse System
Date Issued
2005
Author(s)
Wu, Shu-Fu
Language
zh-TW
Abstract
The development and growth of information technologies have led to a situation known as “information overload”. We therefore look for new tools that let us pose queries from multidimensional perspectives rather than rely on traditional keyword-based search engines. Data warehouse systems can store and analyze numerical data but lack the ability to handle document collections. To address these problems, we build a new system.
In this paper, we describe an automatic metadata extraction algorithm and build a document warehouse system. We define 15 kinds of metadata as 15 classes. Using support vector machines, we create 15 classifiers to extract metadata from a new document. Sentences in the document, together with their corresponding metadata, are saved in XML format. Next, we use a star schema to build a multidimensional document warehouse system. The metadata supports the process of loading documents into the document warehouse. We also provide client-side tools such as OLAP, a cube browser, and an MDX query interface.
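The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: it assumes a one-vs-rest arrangement (one binary classifier per metadata class), bag-of-words features, and a linear SVM trained with a Pegasos-style subgradient method. The three class names, the training sentences, and the XML element and attribute names are hypothetical stand-ins for the 15 classes and the XML layout, which the abstract does not specify.

```python
import xml.etree.ElementTree as ET

def featurize(sentence):
    """Bag-of-words counts for one sentence (assumed feature scheme)."""
    counts = {}
    for token in sentence.lower().split():
        counts[token] = counts.get(token, 0.0) + 1.0
    return counts

def score(weights, features):
    """Linear decision value w . x over sparse features."""
    return sum(weights.get(tok, 0.0) * val for tok, val in features.items())

def train_binary_svm(samples, targets, epochs=20, lam=0.01):
    """Linear SVM trained by Pegasos-style subgradient steps on the hinge loss."""
    weights = {}
    step = 0
    for _ in range(epochs):
        for features, y in zip(samples, targets):
            step += 1
            eta = 1.0 / (lam * step)          # decaying learning rate
            margin = y * score(weights, features)
            shrink = 1.0 - 1.0 / step         # equals 1 - eta * lam (regularization)
            for tok in weights:
                weights[tok] *= shrink
            if margin < 1.0:                  # hinge loss is active for this sample
                for tok, val in features.items():
                    weights[tok] = weights.get(tok, 0.0) + eta * y * val
    return weights

def train_one_vs_rest(labelled):
    """Train one binary classifier per metadata class."""
    classes = sorted({label for label, _ in labelled})
    samples = [featurize(text) for _, text in labelled]
    return {
        c: train_binary_svm(
            samples, [1.0 if label == c else -1.0 for label, _ in labelled]
        )
        for c in classes
    }

def classify(models, sentence):
    """Assign the class whose classifier gives the highest decision value."""
    features = featurize(sentence)
    return max(models, key=lambda c: score(models[c], features))

def to_xml(models, sentences):
    """Store each sentence with its predicted metadata class (hypothetical layout)."""
    root = ET.Element("document")
    for sentence in sentences:
        node = ET.SubElement(root, "metadata", {"class": classify(models, sentence)})
        node.text = sentence
    return ET.tostring(root, encoding="unicode")

# Hypothetical training data: three classes standing in for the 15 metadata kinds.
TRAINING = [
    ("title", "applying text classification techniques"),
    ("author", "written by wu shu-fu"),
    ("date", "issued 2005"),
    ("title", "multidimensional document warehouse system"),
    ("author", "thesis by student"),
    ("date", "published 2005"),
]

models = train_one_vs_rest(TRAINING)
xml_output = to_xml(models, ["text classification system", "by wu", "issued 2005"])
print(xml_output)
```

A production system would more likely rely on an established SVM library and richer features; the hand-written trainer here only keeps the sketch self-contained.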
Our experiments show that support vector machines can achieve high classification performance and that an SVM classifier can extract most metadata from a document. The prototype system built in this paper also demonstrates the fundamental components and processes of a document warehouse system. The OLAP and multidimensional query tools let users search and analyze documents from multiple points of view.
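As a rough illustration of the star schema and multidimensional queries mentioned in the abstract, the sketch below builds a small fact table surrounded by dimension tables in an in-memory SQLite database and rolls document counts up along chosen dimensions. All table names, column names, dimensions, and rows are assumptions made for the example; the abstract does not list the actual schema, and the SQL `GROUP BY` here only stands in for what an MDX query against an OLAP cube would express.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (hypothetical): one per analysis perspective.
CREATE TABLE dim_author (author_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_time   (time_id   INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_topic  (topic_id  INTEGER PRIMARY KEY, topic TEXT);

-- Fact table: one row per document, keyed to every dimension.
CREATE TABLE fact_document (
    doc_id         INTEGER PRIMARY KEY,
    author_id      INTEGER REFERENCES dim_author(author_id),
    time_id        INTEGER REFERENCES dim_time(time_id),
    topic_id       INTEGER REFERENCES dim_topic(topic_id),
    sentence_count INTEGER
);
""")

# Made-up rows, as if produced by the metadata-driven loading step.
conn.executemany("INSERT INTO dim_author VALUES (?, ?)", [(1, "A"), (2, "B")])
conn.executemany("INSERT INTO dim_time VALUES (?, ?)", [(1, 2004), (2, 2005)])
conn.executemany("INSERT INTO dim_topic VALUES (?, ?)", [(1, "SVM"), (2, "OLAP")])
conn.executemany(
    "INSERT INTO fact_document VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 2, 1, 12), (2, 2, 2, 1, 8), (3, 1, 1, 2, 5), (4, 2, 2, 2, 7)],
)

# Whitelist of dimension columns that may appear in a roll-up.
DIMENSIONS = {"name": "a.name", "year": "t.year", "topic": "p.topic"}

def rollup(connection, *dims):
    """Count documents and sum sentences grouped by the requested dimensions."""
    columns = ", ".join(DIMENSIONS[d] for d in dims)
    query = f"""
        SELECT {columns}, COUNT(*), SUM(f.sentence_count)
        FROM fact_document AS f
        JOIN dim_author AS a ON a.author_id = f.author_id
        JOIN dim_time   AS t ON t.time_id   = f.time_id
        JOIN dim_topic  AS p ON p.topic_id  = f.topic_id
        GROUP BY {columns}
        ORDER BY {columns}
    """
    return connection.execute(query).fetchall()

by_year_topic = rollup(conn, "year", "topic")
by_year = rollup(conn, "year")
print(by_year_topic)
print(by_year)
```

Grouping by fewer dimensions (for example only `year`) corresponds to rolling up the cube, while adding dimensions drills down into it.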
Subjects
Support Vector Machine
Metadata Extraction
Document Warehouse System
Online Analytical Processing
Type
other
File(s)
Name
ntu-94-R92725009-1.pdf
Size
23.31 KB
Format
Adobe PDF
Checksum
(MD5):ff9bd021192d8f44c24849393a46db58