Improving the Effectiveness and Scalability of a Sequence-Based Text Retrieval System
Date Issued
2005
Date
2005
Author(s)
Huang, Chun-Chih
Language
zh-TW
Abstract
The purpose of a text retrieval system is to locate, within a large collection of textual
documents, those that meet a user’s needs. The SIR system is one such system, based on
the sequence model. Because it was designed and implemented as a sequential rather than
a parallel application, it becomes less efficient as the data collection grows. Another
drawback of the SIR system is that the index must be rebuilt entirely whenever the data
collection is modified. In addition, compared with other models, query evaluation in the
sequence model is time-consuming. In this thesis, we seek improvements that address
these problems.
To facilitate parallel query processing, we implement three index partitioning schemes
in the system and evaluate their load-balancing characteristics. To improve the
scalability of index building, we design and implement a mechanism that allows the SIR
system to support incremental index updates. We also make other improvements, such as
support for queries with homophones and for more token types, that make the system
more flexible.
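
The abstract does not name the three partitioning schemes or describe the update mechanism, so the following is only a minimal sketch of the general ideas, not the SIR implementation. It assumes a plain term-to-document inverted index (not the sequence model), integer document ids, and two commonly used partitioning strategies, by document and by term. All function names are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted index mapping term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def add_document(index, doc_id, text):
    """Incremental update: add one document's postings without a full rebuild."""
    for term in text.lower().split():
        index[term].add(doc_id)


def partition_by_document(docs, n):
    """Document partitioning: each shard indexes its own subset of documents."""
    subsets = [dict() for _ in range(n)]
    for doc_id, text in docs.items():
        subsets[doc_id % n][doc_id] = text  # assumes integer doc ids
    return [build_index(subset) for subset in subsets]


def partition_by_term(index, n):
    """Term partitioning: each shard holds full postings for a subset of terms."""
    shards = [dict() for _ in range(n)]
    for term, postings in index.items():
        # Byte sum is a deterministic stand-in for a real hash function.
        shards[sum(term.encode("utf-8")) % n][term] = set(postings)
    return shards


def search(shards, term):
    """Query every shard and merge the matching document ids."""
    result = set()
    for shard in shards:
        result |= shard.get(term, set())
    return sorted(result)
```

Under document partitioning every shard takes part in each query, whereas under term partitioning only the shard that owns the term contributes postings. This difference is one reason load balance can vary between schemes.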
Subjects
Incremental Update
Index Partitioning Schemes
Information Retrieval
Parallel Inverted Index
Parallel Processing
Text Retrieval
Type
other
File(s)
Name
ntu-94-R92725027-1.pdf
Size
23.31 KB
Format
Adobe PDF
Checksum
(MD5):bbb9a9af12e54a17ef22fef490150664
