Building Focused Crawler by Using the Tree-based Batch Mode Reinforcement Learning

Wu, Shing-Bing

Date Issued

2007

Author(s)

Wu, Shing-Bing

Language

en-US

URI

http://ntur.lib.ntu.edu.tw//handle/246246/54078

Abstract

The central problem of reinforcement learning is to find an optimal control policy using information gathered from the environment. Classical Q-learning represents the Q-function in tabular form (the Q-function is defined on the state-action space, and the optimal policy can be derived from it). When the state or action space is continuous or very large, the Q-function cannot be represented as a table. In tree-based batch mode reinforcement learning, the Q-function is approximated by an ensemble of trees built from a set of four-tuples (x_t, u_t, r_t, x_{t+1}), where x_t is the current state, u_t is the action taken, r_t is the instantaneous reward, and x_{t+1} is the next state. Earlier related work also approximates the Q-function from a set of four-tuples by solving a supervised learning problem. We study how to combine the extremely randomized trees algorithm with reinforcement learning to build a focused crawler. Reinforcement learning has many applications, including optimal control, game playing, and Web crawling. A Web crawler (also known as a Web robot) is a program that browses the World Wide Web. Its main task is to create a copy of each visited page, which is then processed by a search engine. Because a huge number of pages must be selected and downloaded within a limited time, the crawler needs a policy that determines which pages to fetch. A focused crawler searches for documents relevant to a specific given topic. In this thesis, we build a focused crawler using tree-based batch mode reinforcement learning and compare its performance with other methods. We aim to put tree-based batch mode reinforcement learning into practice and to identify its strengths and shortcomings through experiments. We use the dataset of the World Wide Knowledge Base (WebKB) project and provide an analysis of the experimental results.
This work was partially supported by the National Science Council, ROC, under contract number NSC 94-2213-E-002-105.

Keywords: tree-based batch mode reinforcement learning, ensemble tree algorithm, supervised learning, Q-learning, optimal control, Web crawler, focused crawler.
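
The batch-mode idea summarized in the abstract, fitting the Q-function to a fixed set of four-tuples by repeated supervised regression, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (build_tree, q_value, fitted_q_iteration), the parameter defaults, and the five-state chain environment are all hypothetical and chosen only for illustration. The trees here use a single random feature and a single random cut-point per node, which is a simplified variant of extremely randomized trees.

```python
import random

def build_tree(X, y, rng, n_min=2):
    """Grow one randomized regression tree (random feature, random cut-point)."""
    mean = sum(y) / len(y)
    if len(y) < n_min or min(y) == max(y):
        return mean  # leaf: too few samples or constant target
    # Only features that still vary inside this node can be split on.
    feats = [j for j in range(len(X[0]))
             if min(row[j] for row in X) < max(row[j] for row in X)]
    if not feats:
        return mean
    j = rng.choice(feats)
    cut = rng.uniform(min(row[j] for row in X), max(row[j] for row in X))
    left = [i for i, row in enumerate(X) if row[j] < cut]
    right = [i for i, row in enumerate(X) if row[j] >= cut]
    if not left or not right:
        return mean  # degenerate cut: stop here
    return (j, cut,
            build_tree([X[i] for i in left], [y[i] for i in left], rng, n_min),
            build_tree([X[i] for i in right], [y[i] for i in right], rng, n_min))

def predict(tree, x):
    """Follow the cut-points down to a leaf value."""
    while isinstance(tree, tuple):
        j, cut, below, above = tree
        tree = below if x[j] < cut else above
    return tree

def q_value(forest, state, action):
    """Ensemble estimate of Q(state, action): average over all trees."""
    x = list(state) + [action]
    return sum(predict(t, x) for t in forest) / len(forest)

def fitted_q_iteration(four_tuples, actions, gamma=0.9, n_iter=20,
                       n_trees=10, seed=0):
    """Approximate Q from four-tuples (x_t, u_t, r_t, x_{t+1})."""
    rng = random.Random(seed)
    X = [list(x) + [u] for x, u, _, _ in four_tuples]  # inputs: (state, action)
    forest = None
    for _ in range(n_iter):
        if forest is None:
            y = [r for _, _, r, _ in four_tuples]  # first targets: rewards only
        else:
            # Regression target: r_t + gamma * max_u Q(x_{t+1}, u)
            y = [r + gamma * max(q_value(forest, x_next, a) for a in actions)
                 for _, _, r, x_next in four_tuples]
        forest = [build_tree(X, y, rng) for _ in range(n_trees)]
    return forest

# Hypothetical toy problem: a chain of states 0..4, reward 1 on entering state 4.
actions = [-1, 1]
four_tuples = []
for s in range(5):
    for u in actions:
        s_next = min(4, max(0, s + u))
        four_tuples.append(((s,), u, 1.0 if s_next == 4 else 0.0, (s_next,)))

forest = fitted_q_iteration(four_tuples, actions)
policy = [max(actions, key=lambda a: q_value(forest, (s,), a)) for s in range(5)]
print(policy)
```

On this toy chain the greedy policy should move right (action 1) in every state. In a crawler setting, the state and action would instead be feature vectors describing pages and candidate links, but that encoding is outside the scope of this sketch.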

Subjects

reinforcement learning

ensemble tree

supervised learning

Q-learning

tree-based batch mode reinforcement learning

ensemble tree algorithm

optimal control

Web crawler

focused crawler

Type

thesis

File(s)

Name

ntu-96-R94922100-1.pdf

Size

23.31 KB

Format

Adobe PDF

Checksum

(MD5):6ecff3ca66a9ba0ca6507e82c379305f
