Abstract
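Abstract (Chinese version): Real-world industrial problems are often highly complex and lack well-defined rules or structure, so they cannot be tackled with deep learning the way the Go program AlphaGo was trained, through self-play or by learning from historical game records. The research team will draw on its extensive theoretical background in dynamic programming, its industry-academia collaboration experience, and its preliminary research on deep learning. Building on these, it plans to extend deep learning methods to large-scale dynamic optimization problems that could not be solved in the past and to reduce the model-specificity problem of existing dynamic programming models. Together, these aims increase the practical value of the research results.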
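In Year 1, the project will use Learning within Models to solve very large dynamic programming problems that could not be solved before. Learning within models exploits the fact that different system states occur with different frequencies: high-frequency states are solved with high accuracy, and low-frequency states are computed at lower accuracy. Using the prediction and learning capabilities of deep neural networks, decisions computed at different resolutions are then interpolated and extrapolated to produce an overall dynamic control policy that is both effective and accurate. This overcomes the curse of dimensionality and the computational complexity of dynamic programming.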
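In Year 2, the project will use Learning between Models to address the model-specificity problem that dynamic programming commonly faces. In practice, system configurations change frequently, for example when new production machines are added or the product mix changes. After a configuration change, the dynamic decision problem is similar to, but not identical with, the previous one. Existing dynamic decision methods must therefore rebuild and re-solve the dynamic programming model for the new configuration, which limits their use in environments where configurations change often. When optimal solutions have already been obtained for dynamic problems with similar but different configurations, the Year 2 work will exploit the efficient predictive capability of deep learning to build a representative-policy-prediction space and quickly find the optimal dynamic policy for the new configuration. This overcomes the model-specificity limitation of dynamic programming models.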
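In Year 3, the project will use Learning beyond Models to quickly identify optimal decisions for dynamic problems that have never been solved, based on the characteristics of previously obtained dynamic programming solutions. For example, when a new type of machine or product is introduced into a production system, machine-related optimization model parameters such as reliability and processing characteristics are often incomplete, so a complete dynamic programming model cannot be built. In Year 3, therefore, when the model is incomplete and neither a complete probability model nor direct model parameters are available, the project will predict the optimal dynamic policy from non-model parameters such as machine, product, and material specifications. The goal is to respond quickly to changing environments and achieve an efficient, flexible, system-wide intelligent dynamic system.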
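The results of this research apply to all traditional dynamic programming problems and sequential decision problems under uncertainty, such as revenue management, production system management, and resource allocation.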
Abstract: Real-world dynamic optimization problems are often unstructured and cannot be solved by typical dynamic programming or deep learning algorithms. Building on the research team's past experience in dynamic optimization, this project will explore the use of deep learning (DL) for dynamic optimization problems that suffer from the curse of dimensionality and could not be solved in the past. Moreover, the project will learn across dynamic systems to overcome the “problem-specific model” limitation common to dynamic optimization models.
In Year 1, this project will propose a “Learning-within-Models” deep learning algorithm that learns from existing models whose computational complexity makes them intractable. The value iteration algorithm will be run at different resolution levels: states with high visiting probability will go through a high-resolution value iteration process, and the remaining states will be solved at a lower resolution. A deep learning engine will then learn from the resulting dynamic solutions and interpolate and extrapolate policies, providing dynamic control policies for large problems that are traditionally unsolvable.
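The multi-resolution idea can be illustrated with a minimal sketch. It assumes a toy one-dimensional MDP that is not the project's model: states 0..n-1, actions move left, stay, or move right (succeeding with probability `p`), and a hypothetical reward equal to the negative distance from a `target` state. States in a hypothetical high-frequency set `fine` receive exact Bellman backups. All other states are represented by a coarse grid of anchor states every `stride` steps and filled in by linear interpolation. Here linear interpolation stands in for the DNN interpolation and extrapolation described above.

```python
"""Sketch: multi-resolution value iteration on a toy 1-D MDP (hypothetical model)."""

ACTIONS = (-1, 0, 1)  # move left, stay, move right


def q_value(s, a, V, target, gamma, p):
    """One-step lookahead: hypothetical reward for state s plus discounted expected value."""
    n = len(V)
    nxt = min(max(s + a, 0), n - 1)   # moves are clamped at the boundaries
    reward = -abs(s - target)         # hypothetical cost: distance from the target state
    # with probability p the move succeeds, otherwise the system stays in s
    return reward + gamma * (p * V[nxt] + (1 - p) * V[s])


def interpolate(V, anchors):
    """Fill every non-anchor state by linear interpolation between neighbouring anchors."""
    for lo, hi in zip(anchors, anchors[1:]):
        for s in range(lo + 1, hi):
            w = (s - lo) / (hi - lo)
            V[s] = (1 - w) * V[lo] + w * V[hi]


def multires_value_iteration(n, target, fine, stride, gamma=0.9, p=0.8,
                             tol=1e-10, max_iter=100_000):
    """Exact backups on `fine` states and on a coarse anchor grid; the rest is interpolated."""
    anchors = sorted(set(fine) | set(range(0, n, stride)) | {n - 1})
    V = [0.0] * n
    for _ in range(max_iter):
        # synchronous (Jacobi) Bellman backup on anchor states only, using the old V
        new = {s: max(q_value(s, a, V, target, gamma, p) for a in ACTIONS)
               for s in anchors}
        delta = max(abs(new[s] - V[s]) for s in anchors)
        for s, v in new.items():
            V[s] = v
        interpolate(V, anchors)       # low-resolution states follow the anchors
        if delta < tol:
            break
    return V


def greedy_policy(V, target, gamma=0.9, p=0.8):
    """Greedy action in every state with respect to the value function V."""
    return [max(ACTIONS, key=lambda a: q_value(s, a, V, target, gamma, p))
            for s in range(len(V))]


if __name__ == "__main__":
    n, target = 41, 20
    # stride=1 with every state in `fine` is ordinary value iteration
    full = multires_value_iteration(n, target, fine=range(n), stride=1)
    # high resolution only near the target, where the optimal policy keeps the system
    approx = multires_value_iteration(n, target, fine=range(14, 27), stride=5)
    print(max(abs(full[s] - approx[s]) for s in range(14, 27)))
    print(greedy_policy(full, target) == greedy_policy(approx, target))
```

In this toy setting the optimal policy drives the system toward `target`, so values in the fine region depend only on other fine states and agree with full value iteration up to the convergence tolerance. Only the rarely visited coarse states carry interpolation error. Interpolation is a convex combination, so the combined update remains a `gamma`-contraction and the loop converges.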
In Year 2, “Learning-between-Models” will use a deep learning engine to learn across the optimal control policies of systems under different configurations. When a system's configuration changes, the deep learning engine will quickly generate new control policies from the representative-policy-prediction space generated by the Deep Neural Network (DNN).
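The "predict a new policy from previously solved configurations" step can be sketched under strong simplifying assumptions. Each solved configuration is summarized as a numeric vector (for example, number of machines and product-mix share). Its optimal policy is summarized as a short vector of policy parameters (for example, base-stock levels). Inverse-distance weighting is swapped in here as a simple stand-in for the DNN predictor. The function name `idw_predict`, the feature choices, and all numbers are hypothetical.

```python
import math


def idw_predict(library, config, power=2.0):
    """Predict policy parameters for `config` from previously solved configurations.

    library: list of (configuration_vector, policy_parameter_vector) pairs.
    Inverse-distance weighting is a hypothetical stand-in for the DNN predictor.
    """
    if not library:
        raise ValueError("library of solved configurations is empty")
    weights, params = [], []
    for cfg, p in library:
        d = math.dist(cfg, config)
        if d == 0.0:
            return list(p)            # configuration already solved: reuse its policy
        weights.append(1.0 / d ** power)
        params.append(p)
    total = sum(weights)
    # weighted average of each policy parameter; closer configurations count more
    return [sum(w * p[i] for w, p in zip(weights, params)) / total
            for i in range(len(params[0]))]


if __name__ == "__main__":
    # Hypothetical library: (machines, share of product A) -> (base-stock A, base-stock B)
    library = [((2, 0.5), [10.0, 10.0]),
               ((3, 0.5), [14.0, 14.0]),
               ((3, 0.7), [18.0, 9.0])]
    print(idw_predict(library, (3, 0.6)))   # a configuration not yet in the library
```

The predicted parameters could serve as the new policy directly or as a warm start for a few policy-iteration steps on the new configuration. In practice the configuration features would need to be scaled to comparable ranges before distances are meaningful.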
In Year 3, “Learning-beyond-Models” will learn from systems that lack a specific model, achieving the “model-free” goal of this study. When new products, machines, customers, or vehicles are introduced into an existing system, the model parameters or probabilistic behavior of these new components are often unclear. This research will use deep learning to predict control policies from specifications or other non-model parameters, generating effective dynamic control policies for systems without complete models.
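The Year 3 idea of predicting a policy from specifications alone can be sketched with ordinary least squares swapped in for the deep network. The sketch fits a linear map from specification features of existing machines, whose optimal policies are assumed to be known already (for example, from dynamic programming), to one policy parameter. It then predicts that parameter for a new machine from its specifications alone. The helper names, the features (rated speed, datasheet MTBF), the policy parameter (a preventive-maintenance threshold), and all numbers are hypothetical.

```python
def fit_linear(X, y):
    """Least-squares fit with intercept via the normal equations (X^T X) w = X^T y.

    A linear model is a hypothetical stand-in for the DNN; returns [intercept, w1, w2, ...].
    """
    rows = [[1.0, *x] for x in X]     # prepend an intercept column
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[piv][col]) < 1e-12:
            raise ValueError("features are collinear or there are too few samples")
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # back substitution on the upper-triangular system
    w = [0.0] * k
    for r in range(k - 1, -1, -1):
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, k))) / A[r][r]
    return w


def predict(w, x):
    """Predicted policy parameter for a specification vector x."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


if __name__ == "__main__":
    # Hypothetical existing machines: (rated speed [units/h], datasheet MTBF [h])
    specs = [(100, 800), (120, 1000), (150, 900), (130, 1200)]
    # Hypothetical DP-optimal preventive-maintenance thresholds for those machines
    thresholds = [5, 6, 4, 7]
    w = fit_linear(specs, thresholds)
    print(predict(w, (140, 1100)))    # new machine: only its specifications are known
```

A deep network would replace the linear map when the relation between specifications and optimal policies is nonlinear. The workflow stays the same: learn from solved systems, then predict for a system that has no complete model.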
Our research results apply to all fields that require dynamic optimization, particularly those involving sequential decisions under uncertainty, such as revenue management, production system control, and resource allocation.
Keyword(s)
Deep Learning (深層學習)
Deep Neural Networks (深層神經網路)
Dynamic Programming (動態規劃)
Policy Iteration (策略迭代)
Model-Free Optimization (無模型最佳化)