Learning to Synthesize Programs as Interpretable and Generalizable Policies
Journal
Advances in Neural Information Processing Systems
Journal Volume
30
Pages
25146-25163
Date Issued
2021
Author(s)
Abstract
Recently, deep reinforcement learning (DRL) methods have achieved impressive performance on tasks in a variety of domains. However, neural network policies produced with DRL methods are not human-interpretable and often have difficulty generalizing to novel scenarios. To address these issues, prior works explore learning programmatic policies that are more interpretable and structured for generalization. Yet, these works either employ limited policy representations (e.g. decision trees, state machines, or predefined program templates) or require stronger supervision (e.g. input/output state pairs or expert demonstrations). We present a framework that instead learns to synthesize a program, which details the procedure to solve a task in a flexible and expressive manner, solely from reward signals. To alleviate the difficulty of learning to compose programs to induce the desired agent behavior from scratch, we propose to first learn a program embedding space that continuously parameterizes diverse behaviors in an unsupervised manner and then search over the learned program embedding space to yield a program that maximizes the return for a given task. Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines while producing interpretable and more generalizable policies. We also justify the necessity of the proposed two-stage learning scheme as well as analyze various methods for learning the program embedding. Website at https://clvrai.com/leaps. © 2021 Neural information processing systems foundation. All rights reserved.
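The two-stage scheme the abstract describes — first learning a continuous embedding space over programs, then searching that space for a program that maximizes task return — can be sketched with a cross-entropy-method (CEM) search over the latent space. This is a minimal illustration, not the paper's implementation: the `decode` and `reward` callables, the toy quadratic objective, and all parameter names are hypothetical stand-ins for a learned program decoder and an environment rollout.

```python
import numpy as np

def cem_search(decode, reward, dim=8, pop=50, elite=10, iters=20, seed=0):
    """Cross-entropy-method search over a latent program embedding space.

    decode: maps a latent vector to a program (any object `reward` accepts)
    reward: evaluates the decoded program's return on the task
    """
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        # Sample candidate embeddings around the current search distribution.
        z = rng.normal(mu, sigma, size=(pop, dim))
        scores = np.array([reward(decode(v)) for v in z])
        # Refit the distribution to the highest-return (elite) samples.
        top = z[np.argsort(scores)[-elite:]]
        mu, sigma = top.mean(axis=0), top.std(axis=0) + 1e-3
    return decode(mu), mu

# Toy stand-ins: the "decoder" is the identity map and the "task return"
# peaks at a known target embedding, so the search should land near it.
target = np.linspace(-1.0, 1.0, 8)
program, z_star = cem_search(decode=lambda z: z,
                             reward=lambda p: -np.sum((p - target) ** 2))
```

In the actual framework, `decode` would be the decoder of the pretrained program embedding model and `reward` would execute the decoded program in the environment; the key point is that search happens in the continuous latent space rather than directly over discrete program tokens.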
Other Subjects
Decision trees; Embeddings; Reinforcement learning; Generalization; Network policies; Neural networks; Performance; Program templates; Programmatics; Reinforcement learning methods; State machines; Deep learning
Type
conference paper