Building Focused Crawler by Using the Tree-based Batch Mode Reinforcement Learning
Date Issued
2007
Date
2007
Author(s)
Wu, Shing-Bing
DOI
en-US
Abstract
The central problem in reinforcement learning is to find an optimal control policy using information gathered from the environment. Traditionally, Q-learning has represented the Q-function in tabular form (the Q-function is defined on the state-action space, and the optimal policy can be derived from it). When the state or action space is continuous or very large, the Q-function cannot be represented as a table. Tree-based batch mode reinforcement learning instead approximates the Q-function with an ensemble of trees built from a set of four-tuples (xt, ut, rt, xt+1), where xt is the current state, ut is the action taken, rt is the instantaneous reward, and xt+1 is the next state. Earlier related work approximates the Q-function from a set of four-tuples by solving a supervised learning problem. We study how to combine the extremely randomized trees algorithm with reinforcement learning to build a focused crawler.
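The batch procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis implementation: the names `build_tree`, `predict` and `fitted_q_iteration`, the discount factor `gamma`, the iteration count, and the simplified tree builder (one random cut per feature, best cut kept) are all hypothetical choices made for the example. States are tuples of numbers and actions are encoded as a single number.

```python
import random

def _sse(vals):
    """Sum of squared deviations from the mean."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def build_tree(X, y, rng, n_min=2):
    """Grow one simplified extremely randomized regression tree."""
    if len(y) < n_min or max(y) == min(y):
        return sum(y) / len(y)                 # leaf: mean of the targets
    best = None
    for f in range(len(X[0])):                 # one random cut per feature
        lo = min(row[f] for row in X)
        hi = max(row[f] for row in X)
        if lo == hi:
            continue                           # constant feature, cannot split
        cut = rng.uniform(lo, hi)
        left = [i for i, row in enumerate(X) if row[f] < cut]
        right = [i for i, row in enumerate(X) if row[f] >= cut]
        if not left or not right:
            continue
        cost = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
        if best is None or cost < best[0]:
            best = (cost, f, cut, left, right)
    if best is None:
        return sum(y) / len(y)
    _, f, cut, left, right = best
    return (f, cut,
            build_tree([X[i] for i in left], [y[i] for i in left], rng, n_min),
            build_tree([X[i] for i in right], [y[i] for i in right], rng, n_min))

def predict(tree, row):
    """Follow the cuts down to a leaf and return its value."""
    while isinstance(tree, tuple):
        f, cut, lt, rt = tree
        tree = lt if row[f] < cut else rt
    return tree

def fitted_q_iteration(tuples, actions, n_iter=20, gamma=0.9, n_trees=10, seed=0):
    """Approximate Q from four-tuples (x, u, r, x_next) with a tree ensemble."""
    rng = random.Random(seed)
    X = [list(x) + [u] for x, u, _, _ in tuples]   # inputs: state features + action
    q = lambda x, u: 0.0                           # Q_0 is identically zero
    for _ in range(n_iter):
        # Regression targets: reward plus discounted best next-state value.
        y = [r + gamma * max(q(x2, a) for a in actions) for _, _, r, x2 in tuples]
        forest = [build_tree(X, y, rng) for _ in range(n_trees)]
        q = lambda x, u, forest=forest: (
            sum(predict(t, list(x) + [u]) for t in forest) / len(forest))
    return q
```

Each iteration is an ordinary supervised regression problem: the inputs are the (state, action) pairs from the four-tuples and the targets are recomputed from the previous Q estimate. The sketch ignores terminal states; the greedy policy picks the action with the largest returned `q(x, u)`.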
Reinforcement learning has many applications, including optimal control, game playing, and Web crawling. A Web crawler (also known as a Web robot) is a program that browses the World Wide Web. Its main task is to create a copy of each visited page for later processing by a search engine. Because a huge number of pages must be selected and downloaded in a limited time, the crawler needs a policy that determines which pages to fetch. A focused crawler searches for documents relevant to a specific topic. In this work, we build a focused crawler using tree-based batch mode reinforcement learning and compare its performance with other methods. We aim to put tree-based batch mode reinforcement learning into practice and to identify its strengths and shortcomings through experiments. We use the dataset of the World Wide Knowledge Base (WebKB) project and provide an analysis of the experimental results.
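The idea of a policy that decides which page to fetch next can be illustrated with a best-first frontier. The sketch below is an assumption about how such a crawler might be organized, not the design used in the thesis: `focused_crawl`, the in-memory `graph` standing in for the Web, and the `score` function (which could be a learned Q-value estimate) are hypothetical names introduced for the example.

```python
import heapq

def focused_crawl(graph, seeds, score, budget):
    """Best-first crawl: always visit the frontier URL with the highest score."""
    # heapq is a min-heap, so scores are negated to pop the best URL first.
    frontier = [(-score(u), u) for u in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        visited.append(url)                    # stands in for downloading the page
        for link in graph.get(url, []):        # out-links found on the page
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score(link), link))
    return visited

# Toy link graph; URLs under "course/" are treated as on-topic.
graph = {
    "home": ["course/a", "staff/b"],
    "course/a": ["course/c"],
    "staff/b": ["staff/d"],
}
on_topic = lambda url: 1.0 if url.startswith("course") else 0.0
print(focused_crawl(graph, ["home"], on_topic, budget=3))
# prints ['home', 'course/a', 'course/c']
```

With a download budget of three pages, the crawler spends its fetches on the on-topic branch and never visits the off-topic pages.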
This work was partially supported by the National Science Council, ROC, under contract number NSC 94-2213-E-002-105.
Keywords: tree-based batch mode reinforcement learning, ensemble tree algorithm, supervised learning, Q-learning, optimal control, Web crawler, focused crawler.
Subjects
reinforcement learning
tree-based batch mode reinforcement learning
ensemble tree
ensemble tree algorithm
supervised learning
Q-learning
optimal control
Web crawler
focused crawler
Type
thesis
File(s)
Name
ntu-96-R94922100-1.pdf
Size
23.31 KB
Format
Adobe PDF
Checksum
(MD5):6ecff3ca66a9ba0ca6507e82c379305f
