Efficient Peer-to-Peer Keyword Search Using Adaptive Space Partition
Date Issued
2004
Date
2004
Author(s)
Chen, Po-Hung
DOI
zh-TW
Abstract
Locating the node that stores a particular data item is a fundamental problem in peer-topeer
systems and is traditionally formulated as the construction of a distributed hash table
(DHT). Typical DHTs support only a simple map from keys to values, but not keyword
search. To support keyword search on a DHT, some distributed inverted index is needed.
The challenge is how to evenly distribute the inverted index entries over the peers of the network.
Because the keywords frequency in data items is usually of a zipf-distribution, simply
partitioning inverted index entries by keywords would cause unbalanced loads. Furthermore,
when there are more than one keywords in a query, some intermediate data have to be transmitted
across the network and joined with each other to get the final result, incuring much
network traffic overhead.
We propose a distributed indexing scheme, called Adaptive Space Partition (ASP), that
has a good load balancing and incurs little network traffic overhead. The ASP scheme is
designed to work on top of the CAN DHT. A CAN network is structured as a d-dimension
virtual coordinate space, where each peer node is assigned to a zone in the space. The ASP
scheme maps a keyword to a region, consisting of one or more zones. Each object related
to a given keyword is inserted into a peer randomly selected from the keyword’s region. To
lookup objects related to the same keyword, a peer is randomly selected from the same region
as a starting point, which then searches its neighborhood by flooding. Because objects with
the same keyword are inserted into the same region, objects have good keyword locality and
can be found easily. Furthermore, our scheme partitions inverted index entries according to
the number of peers in a region to achieve better scalability. The ASP scheme also optimizes
the query operation to have the same complexity as a CAN routing procedure.
systems and is traditionally formulated as the construction of a distributed hash table
(DHT). Typical DHTs support only a simple map from keys to values, but not keyword
search. To support keyword search on a DHT, some distributed inverted index is needed.
The challenge is how to evenly distribute the inverted index entries over the peers of the network.
Because the keywords frequency in data items is usually of a zipf-distribution, simply
partitioning inverted index entries by keywords would cause unbalanced loads. Furthermore,
when there are more than one keywords in a query, some intermediate data have to be transmitted
across the network and joined with each other to get the final result, incuring much
network traffic overhead.
We propose a distributed indexing scheme, called Adaptive Space Partition (ASP), that
has a good load balancing and incurs little network traffic overhead. The ASP scheme is
designed to work on top of the CAN DHT. A CAN network is structured as a d-dimension
virtual coordinate space, where each peer node is assigned to a zone in the space. The ASP
scheme maps a keyword to a region, consisting of one or more zones. Each object related
to a given keyword is inserted into a peer randomly selected from the keyword’s region. To
lookup objects related to the same keyword, a peer is randomly selected from the same region
as a starting point, which then searches its neighborhood by flooding. Because objects with
the same keyword are inserted into the same region, objects have good keyword locality and
can be found easily. Furthermore, our scheme partitions inverted index entries according to
the number of peers in a region to achieve better scalability. The ASP scheme also optimizes
the query operation to have the same complexity as a CAN routing procedure.
Subjects
同儕網路
關鍵字搜尋
keyword search
DHT
p2p
peer-to-peer
Type
other
File(s)![Thumbnail Image]()
Loading...
Name
ntu-93-R91725035-1.pdf
Size
23.31 KB
Format
Adobe PDF
Checksum
(MD5):ca331202a233f60557d8fcdcf8e34b4e