Publication: Web Data Retrieval, Management, and Analysis
Abstract
The World Wide Web is a popular, interactive medium for disseminating information, and it has become a huge and mostly unstructured data repository. Peer-to-peer (P2P) systems have likewise become a popular file-sharing platform in recent years. This dissertation considers three issues: capturing individual users' access patterns for Web data mining, the influence of users' clicking behavior and interests on Web structure mining, and the search policy for P2P systems. To capture individual users' access patterns, we design and implement an access pattern collection server for conducting data mining on the Web. By using the concept of page conversion, the proposed method resolves the difficulty imposed by proxy servers and captures Web user behavior effectively. Using the devised mechanism, traversal patterns are generated and compared with those produced by ordinary Web servers to validate our results. In addition, to account for page readers' contribution to Web structure mining, we discuss the influence of users' interests in the VIPAS system. We devise a new algorithm, called Adjustable Cluster-based VIPAS (AC-VIPAS), that adjusts Web pages' scores according to the recommendations of users with similar interests. An experiment is conducted to evaluate the performance of the content-based user clusters. Finally, to improve search performance in P2P systems, we propose a cluster-based P2P system called PeerCluster. In PeerCluster, all participating computers are grouped into interest clusters, each of which contains computers that share the same interests. To route and broadcast messages efficiently across and within interest clusters, a hypercube topology is employed. Moreover, we augment PeerCluster with a system recovery mechanism to make it robust against unpredictable computer and network failures.
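As a rough illustration of why a hypercube topology suits routing and broadcasting, the sketch below shows the generic textbook scheme: nodes carry binary identifiers, a message is routed by flipping one differing bit per hop, and a broadcast spreads along one dimension per round. It is not PeerCluster's actual protocol, which the abstract does not specify; the function names `route` and `broadcast`, the integer node identifiers, and the fully populated hypercube are all assumptions made for the example.

```python
def route(src: int, dst: int, dim: int) -> list[int]:
    """Return the node path from src to dst in a dim-dimensional hypercube.

    Assumes node ids are integers in [0, 2**dim) and every node is present.
    Each hop flips one bit in which the current node differs from dst.
    """
    path = [src]
    cur = src
    for bit in range(dim):
        mask = 1 << bit
        if (cur ^ dst) & mask:   # this bit still differs from the destination
            cur ^= mask          # move to the neighbor across this dimension
            path.append(cur)
    return path


def broadcast(root: int, dim: int) -> list[tuple[int, int]]:
    """Return (sender, receiver) pairs for a broadcast starting at root.

    In round k, every node that already holds the message forwards it to
    its neighbor across dimension k, so each node receives it exactly once.
    """
    sends = []
    informed = [root]
    for bit in range(dim):
        mask = 1 << bit
        newly_informed = []
        for node in informed:
            sends.append((node, node ^ mask))
            newly_informed.append(node ^ mask)
        informed.extend(newly_informed)
    return sends
```

Under these assumptions, a route takes at most `dim` hops, and a broadcast reaches all `2**dim` nodes in `dim` rounds using `2**dim - 1` messages, which is the property that makes the topology attractive for intra-cluster and inter-cluster messaging.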