Web Data Retrieval, Management, and Analysis

Huang, Hsin-Mao

Date Issued

2005

Date

2005

Author(s)

Huang, Hsin-Mao

DOI

en-US

URI

http://ntur.lib.ntu.edu.tw//handle/246246/53405

Abstract

The World Wide Web is a popular, interactive medium for disseminating information, and it has become a huge, mostly unstructured data repository. Peer-to-peer (P2P) systems have likewise become a popular file-sharing platform in recent years. In this dissertation, we consider three issues: capturing individual users' access patterns for Web data mining, the influence of users' clicking behavior and interests on Web structure mining, and the search policy for P2P systems. To capture individual users' access patterns, we design and implement an access pattern collection server for conducting data mining on the Web. By using the concept of page conversion, the proposed method resolves the difficulty imposed by proxy servers and captures Web user behavior effectively. Using this mechanism, traversal patterns are generated and compared with those produced by ordinary Web servers to validate our results. Next, to account for page readers' contribution to Web structure mining, we discuss the influence of users' interests in the VIPAS system. We devise a new algorithm, called Adjustable Cluster based VIPAS (AC-VIPAS), that adjusts Web pages' scores according to the recommendations of users with similar interests. An experiment is conducted to evaluate the performance of the content-based user clusters. Finally, to improve search performance in P2P systems, we propose a cluster-based peer-to-peer system called PeerCluster. In PeerCluster, all participating computers are grouped into interest clusters, each of which contains computers that share the same interests. To route and broadcast messages efficiently across and within interest clusters, a hypercube topology is employed. Moreover, we augment PeerCluster with a system recovery mechanism to make it robust against unpredictable computer and network failures.
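
The abstract does not spell out how messages travel over the hypercube, so the following is only a minimal sketch of the textbook technique, not the PeerCluster protocol itself. It assumes each node carries an integer ID whose binary digits are its hypercube coordinates, and that two nodes are neighbors when their IDs differ in exactly one bit. The function names `hypercube_route` and `hypercube_broadcast` are hypothetical and chosen for illustration.

```python
def hypercube_route(src: int, dst: int, dim: int) -> list[int]:
    """Route from src to dst by correcting differing bits, lowest bit first.

    Generic bit-fixing routing on a dim-dimensional hypercube; an
    illustrative assumption, not the routing rule defined in the thesis.
    """
    size = 1 << dim
    if not (0 <= src < size and 0 <= dst < size):
        raise ValueError("node IDs must lie in [0, 2**dim)")
    path = [src]
    current = src
    for bit in range(dim):
        mask = 1 << bit
        # Hop across this dimension only if the two IDs disagree on it.
        if (current ^ dst) & mask:
            current ^= mask
            path.append(current)
    return path


def hypercube_broadcast(root: int, dim: int) -> list[tuple[int, int]]:
    """Return (sender, receiver) pairs for a one-to-all broadcast from root.

    In round k, every node that already holds the message forwards it to
    its neighbor across dimension k, so each node receives it exactly once.
    """
    if not 0 <= root < (1 << dim):
        raise ValueError("root ID must lie in [0, 2**dim)")
    informed = [root]
    messages = []
    for bit in range(dim):
        mask = 1 << bit
        newly_informed = []
        for node in informed:
            peer = node ^ mask
            messages.append((node, peer))
            newly_informed.append(peer)
        informed.extend(newly_informed)
    return messages


if __name__ == "__main__":
    print(hypercube_route(0b000, 0b101, dim=3))
    print(len(hypercube_broadcast(0, dim=3)))
```

Under these assumptions a route takes at most `dim` hops and a broadcast reaches all `2**dim` nodes with `2**dim - 1` messages, which is the usual motivation for choosing a hypercube overlay.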

Subjects

網頁資料探勘

分散式計算

點對點系統

Web Data Mining

Distributed Computing

Peer-to-Peer System

Type

thesis

File(s)

Name

ntu-94-D87921025-1.pdf

Size

23.31 KB

Format

Adobe PDF

Checksum

(MD5):e59df9b81bb7864080f4c7e4097600b0
