Improving the Effectiveness and Scalability of a Sequence-Based Text Retrieval System

Huang, Chun-Chih


Date Issued

2005


Author(s)

Huang, Chun-Chih

Language

zh-TW (Traditional Chinese)

URI

http://ntur.lib.ntu.edu.tw//handle/246246/54346

Abstract

The purpose of a text retrieval system is to locate, within a large collection of textual documents, those documents that meet a user's needs. The SIR system is one such system, built on the sequence model. Because it was designed and implemented as a sequential rather than a parallel application, it becomes less efficient as the data collection grows. A second drawback is that its index must be rebuilt from scratch whenever the data collection is modified. In addition, query evaluation under the sequence model is more time-consuming than under other models. In this thesis, we seek to address these problems. To facilitate parallel query processing, we implement three index partitioning schemes in the system and evaluate their load-balancing characteristics. To improve the scalability of index building, we design and implement a mechanism that allows the SIR system to support incremental index updates. We also make other improvements, such as support for queries containing homophones and for more token types, which make the system more flexible.
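The abstract states that three index partitioning schemes were implemented and compared for load balance, but it does not name them. As a hypothetical illustration only (not the thesis's schemes or code), the Python sketch below contrasts two widely used ways of splitting a positional inverted index across partitions, by document and by term, and scores each with a simple load-imbalance ratio (largest partition load divided by mean load). The function names, the modulo and CRC32 assignment rules, and the use of posting counts as "load" are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

def build_index(docs):
    """Positional inverted index: term -> list of (doc_id, position) postings."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for pos, term in enumerate(docs[doc_id]):
            index[term].append((doc_id, pos))
    return index

def document_partition(docs, n):
    """Hypothetical document partitioning: each partition indexes a disjoint
    subset of documents (assigned here by doc_id modulo n)."""
    parts = [defaultdict(list) for _ in range(n)]
    for doc_id in sorted(docs):
        part = parts[doc_id % n]
        for pos, term in enumerate(docs[doc_id]):
            part[term].append((doc_id, pos))
    return parts

def term_partition(docs, n):
    """Hypothetical term partitioning: each term's full posting list lives on
    exactly one partition (assigned by a deterministic CRC32 hash of the term)."""
    parts = [defaultdict(list) for _ in range(n)]
    for term, postings in build_index(docs).items():
        parts[zlib.crc32(term.encode("utf-8")) % n][term] = postings
    return parts

def imbalance(parts):
    """Max partition load over mean load, where load = number of postings.
    1.0 means perfectly balanced; larger values mean more skew."""
    loads = [sum(len(p) for p in part.values()) for part in parts]
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0
```

For example, with `docs = {0: ["a", "b", "a"], 1: ["b", "c"], 2: ["a", "c", "d"], 3: ["d"]}` and two partitions, document partitioning places 6 postings on one partition and 3 on the other, giving an imbalance of 6 / 4.5 ≈ 1.33. Document partitioning lets every partition answer part of every query independently, while term partitioning sends each query term to a single partition; which one balances better depends on the document-length and term-frequency distributions of the collection.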

Subjects

Incremental Update

Index Partitioning Schemes

Information Retrieval

Parallel Inverted Index

Parallel Processing

Text Retrieval

Type

other

File(s)

Name

ntu-94-R92725027-1.pdf

Size

23.31 KB

Format

Adobe PDF

Checksum

MD5: bbb9a9af12e54a17ef22fef490150664
