Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs

Liu, Guan Ting; Hu, En Pei; Cheng, Pu-Jen; Lee, Hung-Yi; Sun, Shao Hua

DC field	Value	Language
dc.contributor.author	Liu, Guan Ting	en_US
dc.contributor.author	Hu, En Pei	en_US
dc.contributor.author	Cheng, Pu-Jen	en_US
dc.contributor.author	Lee, Hung-Yi	en_US
dc.contributor.author	Sun, Shao Hua	en_US
dc.date.accessioned	2023-11-08T07:56:45Z	-
dc.date.available	2023-11-08T07:56:45Z	-
dc.date.issued	2023-01-01	-
dc.identifier.uri	https://scholars.lib.ntu.edu.tw/handle/123456789/636993	-
dc.description.abstract	Aiming to produce reinforcement learning (RL) policies that are human-interpretable and can generalize better to novel scenarios, Trivedi et al. (2021) present a method (LEAPS) that first learns a program embedding space to continuously parameterize diverse programs from a pre-generated program dataset, and then searches for a task-solving program in the learned program embedding space when given a task. Despite the encouraging results, the program policies that LEAPS can produce are limited by the distribution of the program dataset. Furthermore, during searching, LEAPS evaluates each candidate program solely based on its return, failing to precisely reward correct parts of programs and penalize incorrect parts. To address these issues, we propose to learn a meta-policy that composes a series of programs sampled from the learned program embedding space. By learning to compose programs, our proposed hierarchical programmatic reinforcement learning (HPRL) framework can produce program policies that describe out-of-distributionally complex behaviors and directly assign credit to programs that induce desired behaviors. The experimental results in the Karel domain show that our proposed framework outperforms baselines. The ablation studies confirm the limitations of LEAPS and justify our design choices.	en_US
dc.relation.ispartof	Proceedings of Machine Learning Research	en_US
dc.title	Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs	en_US
dc.type	conference paper	en_US
dc.identifier.scopus	2-s2.0-85174410787	-
dc.identifier.url	https://api.elsevier.com/content/abstract/scopus_id/85174410787	-
dc.relation.journalvolume	202	en_US
dc.relation.pageend	22337	en_US
item.openairetype	conference paper	-
item.openairecristype	http://purl.org/coar/resource_type/c_5794	-
item.fulltext	no fulltext	-
item.grantfulltext	none	-
item.cerifentitytype	Publications	-
crisitem.author.dept	Networking and Multimedia	-
crisitem.author.dept	Computer Science and Information Engineering	-
crisitem.author.orcid	0000-0001-5892-0385	-
crisitem.author.parentorg	College of Electrical Engineering and Computer Science	-
crisitem.author.parentorg	College of Electrical Engineering and Computer Science	-
Appears in:	Department of Electrical Engineering
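The abstract describes HPRL only at a high level. As a rough illustration of its credit-assignment argument, the toy Python sketch below has a meta-policy pick a sequence of programs, execute each one to completion, and record a separate reward for each choice. Everything here is a hypothetical stand-in and not the authors' implementation: `PROGRAM_LIBRARY` replaces programs decoded from the learned Karel program embedding space, `decode` replaces the learned decoder, the 1-D marker-picking world replaces the Karel environment, and the meta-policy is a fixed function rather than a learned one.

```python
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical toy program library; in the paper, programs are Karel DSL
# programs decoded from a learned continuous embedding space.
PROGRAM_LIBRARY: List[List[str]] = [
    ["move", "move"],
    ["turn_left"],
    ["pick_marker"],
    ["move", "pick_marker"],
]

State = Tuple[int, FrozenSet[int]]  # (agent position on a 1-D line, marker positions)


def decode(latent: int) -> List[str]:
    """Map a discrete 'latent' to a program (toy stand-in for a learned decoder)."""
    return PROGRAM_LIBRARY[latent % len(PROGRAM_LIBRARY)]


def execute(state: State, program: List[str]) -> Tuple[State, float]:
    """Run one program to completion; return the next state and the reward it earned."""
    pos, markers = state
    reward = 0.0
    for action in program:
        if action == "move":
            pos += 1
        elif action == "pick_marker" and pos in markers:
            markers = markers - {pos}  # frozenset minus set stays a frozenset
            reward += 1.0
        # "turn_left" is a no-op in this 1-D toy world
    return (pos, markers), reward


def rollout(meta_policy: Callable[[State], int], state: State, num_programs: int):
    """The meta-policy composes programs one at a time; each choice gets its own reward."""
    transitions = []
    for _ in range(num_programs):
        latent = meta_policy(state)
        next_state, reward = execute(state, decode(latent))
        transitions.append((state, latent, reward))  # per-program credit
        state = next_state
    return transitions


steps = rollout(lambda s: 3, (0, frozenset({1, 2})), num_programs=3)
print([r for _, _, r in steps])     # [1.0, 1.0, 0.0]
print(sum(r for _, _, r in steps))  # 2.0
```

In this toy run, the per-program rewards show that the third program contributed nothing. A search that scores a single candidate only by its total return, as the abstract says LEAPS does, would see just the aggregate 2.0 and could not tell which part of the behavior was useful.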
