Web Data Retrieval, Management, and Analysis
Date Issued
2005
Date
2005
Author(s)
Huang, Hsin-Mao
DOI
en-US
Abstract
The World Wide Web is a popular, interactive medium for disseminating information, and it has become a huge, mostly unstructured data repository. Peer-to-peer (P2P) systems have also become a popular file-sharing platform in recent years. In this dissertation, we consider three issues: capturing individual users' access patterns for Web data mining, the influence of users' clicking behavior and interests on Web structure mining, and the search policy for P2P systems.
To capture individual users' access patterns, we design and implement an access pattern collection server for data mining on the Web. Using the concept of page conversion, the proposed method resolves the difficulty imposed by proxy servers and captures Web user behavior effectively. With the devised mechanism, traversal patterns are generated and compared to those produced by ordinary Web servers to validate our results.
In addition, to account for page readers' contribution to Web structure mining, we discuss the influence of users' interests in the VIPAS system. We devise a new algorithm, called Adjustable Cluster based VIPAS (AC-VIPAS), which adjusts Web pages' scores according to the recommendations of users with similar interests. An experiment is conducted to evaluate the performance of the content-based user cluster.
Finally, to improve search performance in peer-to-peer systems, we propose a cluster-based peer-to-peer system called PeerCluster. In PeerCluster, all participating computers are grouped into interest clusters, each of which contains computers that share the same interests. To route and broadcast messages efficiently across and within interest clusters, a hypercube topology is employed. Moreover, we augment PeerCluster with a system recovery mechanism to make it robust against unpredictable computer and network failures.
Subjects
網頁資料探勘
分散式計算
點對點系統
Web Data Mining
Distributed Computing
Peer-to-Peer System
Type
thesis
File(s)
Name
ntu-94-D87921025-1.pdf
Size
23.31 KB
Format
Adobe PDF
Checksum
(MD5):e59df9b81bb7864080f4c7e4097600b0