https://scholars.lib.ntu.edu.tw/handle/123456789/118412
Title: | Environment and Human Behavior Learning for Robot Motion Control |
Author: | Yu, Yueh-Chi |
Keywords: | environment; human behavior; learning; robot; control |
Date of Publication: | 2008 |
Abstract: | The Nearness Diagram (ND) method provides a reactive algorithm for robot motion control. It uses a decision tree to classify the environment into several situations, and a mapping function to generate control commands from those situations. However, the decision tree and the mapping function are pre-defined, many parameters need to be manually tuned, and the generated path is not human-like. Imitation learning is an approach that aims to make a robot behave like a human. It is based on the Markov decision process (MDP), a framework for modeling the environment: given a human's control behavior, it tries to extract the reward function in the MDP, and that reward function is then used to generate control commands that imitate the human's behavior. Unfortunately, the true reward function in a user's mind is hard for general users to describe, so it is difficult to compare the learned reward function with the ground truth. In this thesis, we combine the ND method and imitation learning. We neither use a pre-defined decision tree to classify the environment, as the ND method does, nor solve for the reward function, as imitation learning does. Instead, we try to find a mapping from the environment information to the human's control behavior. Our system is briefly described as follows. First, several users are asked to control the robot, and the environment information and the users' control data are gathered as training data. The incremental K-means method is then used to classify the training data into different situations, such as going straight or turning. Borrowing the concept of the scale-invariant feature transform (SIFT) from computer vision, we propose a SIFT-like temporal feature to mark the different situations and to eliminate noise. The Adaptive Boosting (AdaBoost) algorithm is applied to train one classifier for each situation. Finally, a nearest-neighbor controller is proposed to generate the control command. |
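The abstract's pipeline (cluster human demonstrations into situations, then reuse the closest demonstration's command) can be sketched in miniature. The sketch below is an illustration only, not the thesis's implementation: the data are synthetic, a plain two-cluster K-means stands in for the incremental K-means, and the SIFT-like temporal feature and per-situation AdaBoost classifiers are omitted, with nearest-cluster-center assignment standing in for the situation classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the training data described in the abstract:
# each row of `obs` is one environment observation (e.g. range readings),
# paired with the human control command (v, w) recorded at the same time.
obs = np.vstack([rng.normal(0.0, 0.3, size=(50, 4)),
                 rng.normal(3.0, 0.3, size=(50, 4))])
cmds = np.vstack([np.tile([1.0, 0.0], (50, 1)),   # "straight": forward, no turn
                  np.tile([0.3, 0.8], (50, 1))])  # "turn": slow, rotating

# 1) Plain two-cluster K-means over the observations
#    (deterministically seeded with one point from each regime).
centers = obs[[0, 50]].copy()
for _ in range(20):
    labels = np.argmin(np.linalg.norm(obs[:, None] - centers[None], axis=2), axis=1)
    centers = np.array([obs[labels == k].mean(axis=0) for k in range(2)])

# 2) Nearest-neighbor controller: assign a new observation to a situation via
#    the nearest cluster center, then reuse the command of the closest
#    training example within that situation.
def nn_control(x):
    s = np.argmin(np.linalg.norm(centers - x, axis=1))
    idx = np.flatnonzero(labels == s)
    nearest = idx[np.argmin(np.linalg.norm(obs[idx] - x, axis=1))]
    return cmds[nearest]

command = nn_control(np.full(4, 2.9))  # query near the "turn" cluster
```

Here `nn_control` never interpolates between demonstrations; like the controller in the abstract, it simply replays the command of the most similar recorded situation.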
URI: | http://ntur.lib.ntu.edu.tw//handle/246246/184961 |
Appears in Collections: | Department of Computer Science and Information Engineering |
File | Description | Size | Format | |
---|---|---|---|---|
ntu-97-R95922030-1.pdf | | 23.32 kB | Adobe PDF | View/Open |
All items in the IR system are protected by copyright, with all rights reserved, unless otherwise indicated by their specific copyright terms.