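鄭勝文
National Taiwan University, Graduate Institute of Engineering Science and Ocean Engineering
鄭曉鍾 (Cheng, Hsiao-Chung)
2007-11-26; 2018-06-28; 2007
http://ntur.lib.ntu.edu.tw//handle/246246/51144

Chinese abstract (translated)

This study proposes a fast on-line system that automatically corrects motion commands. The robot computes the relationship between its own sensors and actuators and uses reinforcement learning to correct rhythmic motions. The system cannot generate motion commands from scratch. It uses the signals returned by the sensors to account for all internal and external uncertainties, and it automatically modifies the original motion until it meets a user-defined goal. Users therefore do not need to run precise simulations and plan motion commands in the traditional way. They only need to enter a rough motion command and choose an appropriate feedback value, and the system finds a suitable new motion command.

The current case study is the walking motion of a biped robot. A feedback value is computed from the sensor signals, and a three-stage adaptive random search generates new motion commands within a specified range around the original command. A motion command for stable walking has been obtained. Based on it, motion commands with larger and smaller steps were roughly planned and searched separately at different walking speeds. This determined the largest step length that still allows stable forward walking at each speed, and it gives a theoretical maximum forward speed of 1386.207 mm/min for the robot hardware. In addition, the robot was set to learn walking on different surfaces by itself. It can now climb a ramp with a rise-to-run ratio of 1:20, an inclination of about 2.86 degrees.

The on-line motion correction system developed here has a simple architecture and low hardware and software requirements. The user provides a motion command that need not be very precise and defines a suitable feedback value. The system then automatically learns a motion command adapted to the current environment. This greatly reduces the workload and complexity of the traditional approach, which computes precise motion commands and then corrects them for uncertainties. The system is easy to embed in many kinds of intelligent robots and to extend to other application domains, so it has considerable practical value and development potential.

English abstract

In this thesis, an on-line learning system for a humanoid robot is developed to modify the robot's motion patterns. A moving humanoid robot is subject to many environmental and robot-internal uncertainties. By computing the sensor-motor relation, the learning system finds modifications of the motion pattern that overcome all of these uncertainties. The learning process is based on adaptive random search (ARS) with reinforcement learning, and the sensor signals recorded during a motion are used to calculate the fitness function for reinforcement learning. The robot carries one two-axis accelerometer and two single-axis gyroscopes on its head, plus four pressure sensors on its feet. The initial gait pattern always makes the robot fall down. After the learning process, a stable gait pattern was found. To find the fastest pattern, different gait patterns (step lengths) were then combined with different walking speeds. The resulting maximum walking velocity of the robot is 1386.207 mm/min, and the robot can walk up a ramp with a slope of about 2.86 degrees. The system is simple and easy to adjust to different conditions, and it deals with all uncertainties at once. The user only needs to give a rough initial pattern and a suitable fitness function. No exact simulation is required.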
The system can be applied in many fields and embedded in robots.

Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Background and Motivation
  1.2 Research Objectives
  1.3 Literature Review
  1.4 Thesis Organization
Chapter 2 System Architecture
  2.1 Overall Architecture
  2.2 Robot Mechanical System
  2.3 Robot Control System
  2.4 Robot Sensing System
    2.4.1 Accelerometer
    2.4.2 Gyroscope
    2.4.3 Pressure Sensors
    2.4.4 Sensor Signal Analysis and Integration
Chapter 3 On-line Learning Procedure and Theory
  3.1 On-line Learning Procedure
  3.2 Adaptive Random Search
  3.3 Reinforcement Learning Approach: Adaptive Random Search
Chapter 4 Experimental Verification
  4.1 Automatic Get-up Mechanism
    4.1.1 Recognition of the Static Fallen Posture
    4.1.2 Automatic Get-up Motion
  4.2 Stable Walking Motion
  4.3 Fastest Forward Motion: Relation between Walking Speed and Step Length
  4.4 Special-Terrain Motion: Walking Up a Ramp
Chapter 5 Conclusion
References
Appendix

en-US
Keywords: On-line learning; Adaptive Random Search; Reinforcement Learning; Humanoid robots; Gait pattern
運用可適隨機搜尋法之實體人型機器人進化學習之研究
Adaptive Random Search based Evolutionary Learning of a Humanoid Robot
Thesis
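The abstract describes searching for new motion commands within a specified range around a rough initial command, with each candidate scored by a sensor-derived feedback (fitness) value. This record does not give the thesis's actual three-stage rules, so the Python below is only a sketch of one simple form of adaptive random search, under assumptions that are not taken from the thesis. The sampling center moves to the best command found so far. The sampling half-width shrinks over three stages, and the scale factors `(1.0, 0.3, 0.1)` are assumed. Every parameter is clipped to the original command plus or minus `bound`. Only strict improvements are accepted. `demo_fitness` and `HYPOTHETICAL_GOOD_GAIT` are hypothetical synthetic stand-ins for the feedback value computed from the accelerometer, gyroscope and foot-pressure signals.

```python
import random
from typing import Callable, List, Sequence, Tuple


def adaptive_random_search(
    fitness: Callable[[Sequence[float]], float],
    initial: Sequence[float],
    radius: float,
    bound: float,
    stage_scales: Sequence[float] = (1.0, 0.3, 0.1),  # assumed 3-stage schedule
    trials_per_stage: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Maximize `fitness` with a staged adaptive random search (sketch).

    Assumptions (not from the thesis): candidates are drawn uniformly from a
    box centered on the best command found so far. In stage k the box
    half-width is radius * stage_scales[k]. Each parameter is clipped to
    initial[i] +/- bound so the search stays near the original command.
    A candidate is accepted only if its fitness is strictly higher.
    """
    rng = random.Random(seed)
    lower = [x - bound for x in initial]
    upper = [x + bound for x in initial]
    best = list(initial)
    best_score = fitness(best)
    for scale in stage_scales:
        half_width = radius * scale
        for _ in range(trials_per_stage):
            # Perturb the current best command and clip it to the allowed range.
            candidate = [
                min(max(x + rng.uniform(-half_width, half_width), lo), hi)
                for x, lo, hi in zip(best, lower, upper)
            ]
            score = fitness(candidate)
            if score > best_score:  # greedy acceptance: keep improvements only
                best, best_score = candidate, score
    return best, best_score


# Hypothetical stand-in for the sensor-derived feedback value. On the robot,
# each call would be one walking trial scored from sensor signals. Here it is
# a synthetic quadratic that peaks at an arbitrary parameter vector.
HYPOTHETICAL_GOOD_GAIT = (0.5, -0.2, 1.0)


def demo_fitness(params: Sequence[float]) -> float:
    return -sum((p - g) ** 2 for p, g in zip(params, HYPOTHETICAL_GOOD_GAIT))


if __name__ == "__main__":
    rough_command = [0.0, 0.0, 0.0]  # rough initial gait parameters (hypothetical)
    best, score = adaptive_random_search(
        demo_fitness, rough_command, radius=1.0, bound=1.5
    )
    print([round(x, 3) for x in best], round(score, 5))
```

With greedy acceptance, the returned command always scores at least as well as the rough initial command. On hardware, every fitness evaluation is a physical walking trial, so `trials_per_stage` would have to be far smaller than in this synthetic demo.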