Efficient Peer-to-Peer Keyword Search Using Adaptive Space Partition

蔡益坤臺灣大學：資訊管理學研究所陳柏宏Chen, Po-HungPo-HungChen2007-11-262018-06-292007-11-262018-06-292004http://ntur.lib.ntu.edu.tw//handle/246246/54267Locating the node that stores a particular data item is a fundamental problem in peer-topeer systems and is traditionally formulated as the construction of a distributed hash table (DHT). Typical DHTs support only a simple map from keys to values, but not keyword search. To support keyword search on a DHT, some distributed inverted index is needed. The challenge is how to evenly distribute the inverted index entries over the peers of the network. Because the keywords frequency in data items is usually of a zipf-distribution, simply partitioning inverted index entries by keywords would cause unbalanced loads. Furthermore, when there are more than one keywords in a query, some intermediate data have to be transmitted across the network and joined with each other to get the final result, incuring much network traffic overhead. We propose a distributed indexing scheme, called Adaptive Space Partition (ASP), that has a good load balancing and incurs little network traffic overhead. The ASP scheme is designed to work on top of the CAN DHT. A CAN network is structured as a d-dimension virtual coordinate space, where each peer node is assigned to a zone in the space. The ASP scheme maps a keyword to a region, consisting of one or more zones. Each object related to a given keyword is inserted into a peer randomly selected from the keyword’s region. To lookup objects related to the same keyword, a peer is randomly selected from the same region as a starting point, which then searches its neighborhood by flooding. Because objects with the same keyword are inserted into the same region, objects have good keyword locality and can be found easily. Furthermore, our scheme partitions inverted index entries according to the number of peers in a region to achieve better scalability. The ASP scheme also optimizes the query operation to have the same complexity as a CAN routing procedure.1 Introduction 1 1.1 Motivation . . . . . . . . . . . . . 1 1.2 Objectives . . . . . . . . . . . . . 4 1.3 Thesis Outline . . . . . . . . . 4 2 Related Work 6 2.1 Content Addressable Network (CAN) 6 2.1.1 Routing in CAN . . . . 6 2.1.2 Network Construction in CAN . . 7 2.1.3 Zone Assignment . . . . . . . . 8 2.2 Gnutella .. . . . . . . . . . . . . . 10 2.3 Napster . . . . . . . . . . . . . . . . 11 2.4 Distributed Join Using Bloom Filters . . . . . 11 2.5 Peer-to-Peer Prefix Search . . . . . . . . . 13 2.6 Keyword Set Search (KSS) . . . . . . . . . 14 2.6.1 System Model . . . . . . . . . . . . . 14 2.6.2 Insert and Delete . . . . . . . . . 15 2.6.3 Query . . . . . . . . . . . . . . . . 15 2.6.4 Storage overhead . . . . . . . . . . 16 2.6.5 Load Balancing . . . . . . . . . . 16 2.7 PeerSearch . . . . . . . . . . . . . . . . . . 16 2.7.1 Vector Space Model (VSM) . . . . . . . 16 2.7.2 Latent Semantic Indexing (LSI) . . . . 17 2.7.3 Basic Algorithm . . . . . . . . . . . . . 17 2.8 Peer-to-Peer Keyword Search Using Keyword Relationship . . . . . . . . 18 2.8.1 KRDB . . .. . . . . . . . . . . . . 18 2.8.2 Search . . . . . . . . . . . . . . . 18 2.8.3 Distributed KRDB Updates . . . . . . . 18 3 Keyword Search Scheme 20 3.1 Resource Sharing in Peer-to-Peer Environments .. 20 3.2 Basic Idea . . . . . . . . . . . . . . 20 3.3 Challenges . .. . . . . . . . . . . . . . . . . 21 3.4 Region Assignment . . . . . . . . . . . . 22 ii 3.5 ASP Interative Deepening . . . . . . . . . . 24 3.6 Basic Insertion . . . . . . . . . . . . . 25 3.7 Basic Lookup . . . . . . . . . . . . . . . . 27 3.8 Adaptive Space Partition . . . . . . . . . 28 3.9 Step by Step Example . . . . . . . . . . . 33 3.10 Improved Query Operation . . . . . . . . . . 34 3.11 Persistence Object and Network Expansion . . . . . 36 4 Simulation and Analysis 39 4.1 Data Source . . . .. . . . . . . . . . . . . . 39 4.1.1 Keyword Distribution . . .. . . . . . . 40 4.2 System Parameter . . . . . . . . . . . . . . . 40 4.3 Insertion Overhead . . . .. . . . . . . . . . . 42 4.4 Query Overhead . . . . . . . . . . . . . . . 43 4.5 Peer Loading Distribution . . . . . .. . . . . . 46 4.6 Inverted Index Level Distribution . . . . . . 48 5 Conclusion 54 5.1 Contributions . . . . . . . . . . . . . . . . . 54 5.2 Furture Work . . . . . . . . . . . . . . . . . 55694396 bytesapplication/pdfen-US同儕網路關鍵字搜尋keyword searchDHTp2ppeer-to-peerEfficient Peer-to-Peer Keyword Search Using Adaptive Space Partitionotherhttp://ntur.lib.ntu.edu.tw/bitstream/246246/54267/1/ntu-93-R91725035-1.pdf