Authors: Hung-Yi Lee; Pei-Hung Chung; Yen-Chen Wu; Tzu-Hsiang Lin; Tsung-Hsien Wen; Lin-Shan Lee
Title: Interactive Spoken Content Retrieval by Deep Reinforcement Learning
Document type: journal article
Year: 2018
Date: 2019-10-24
ISSN: 2329-9290
DOI: 10.1109/TASLP.2018.2852739
Scopus EID: 2-s2.0-85049486998
URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85049486998&doi=10.1109%2fTASLP.2018.2852739&partnerID=40&md5=c168e272cfe6b94f02e0d3daae1d6d7c
Keywords: deep Q-learning; reinforcement learning; spoken content retrieval; user-machine interaction

Abstract: For text content retrieval, the user can easily scan through and select from a list of retrieved items. This is impossible for spoken content retrieval, because the retrieved items cannot easily be displayed on-screen. In addition, due to the high degree of uncertainty in speech recognition, retrieval results can be very noisy. One way to counter such difficulties is through user-machine interaction: the machine can take different actions to interact with the user and obtain better retrieval results before showing them, for example by requesting extra information from the user or returning a list of topics for the user to select from. In this paper, we propose using a deep Q-network (DQN) to determine the machine actions for interactive spoken content retrieval. The DQN bypasses the need to estimate hand-crafted states and directly determines the best action based on the present retrieval results, even without any human knowledge. It is shown to achieve significantly better performance compared with the previous approach based on hand-crafted states. We further find that double DQN and dueling DQN improve on the naive version. © 2014 IEEE.
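The abstract describes the core idea only at a high level: a DQN maps the current retrieval situation to the next machine action, and double DQN reduces the value-overestimation of the naive version. The following is a minimal sketch of that idea, not the paper's actual model; the action set, state features, layer sizes, and all names (DQN, select_action, double_dqn_target) are illustrative assumptions.

```python
# Minimal sketch of DQN-based action selection for interactive retrieval.
# Assumption: the dialogue state is summarized as a fixed-size feature vector
# (e.g., score statistics of the current ranked list); the actions below are
# hypothetical examples of machine responses, not the paper's action set.
import random
import torch
import torch.nn as nn

ACTIONS = ["request_more_info", "return_topic_list", "return_item_list", "show_results"]

class DQN(nn.Module):
    """Maps a retrieval-state feature vector to one Q-value per machine action."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def select_action(q_net: DQN, state: torch.Tensor, epsilon: float) -> int:
    """Epsilon-greedy choice over the machine's interaction actions."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1).item())

def double_dqn_target(online: DQN, target: DQN, next_state: torch.Tensor,
                      reward: torch.Tensor, done: torch.Tensor,
                      gamma: float = 0.99) -> torch.Tensor:
    """Double DQN target: the online net picks the next action, the target
    net evaluates it, which reduces the overestimation bias of naive DQN."""
    with torch.no_grad():
        a = online(next_state).argmax(dim=1, keepdim=True)
        q = target(next_state).gather(1, a).squeeze(1)
    return reward + gamma * (1.0 - done) * q

# Usage: a 32-dim state vector (dimension chosen arbitrarily for illustration).
q_net = DQN(state_dim=32, n_actions=len(ACTIONS))
state = torch.randn(32)
print("chosen action:", ACTIONS[select_action(q_net, state, epsilon=0.1)])
```

A dueling variant, also mentioned in the abstract, would replace the final linear layer with separate state-value and action-advantage streams that are recombined into Q-values; the training loop, reward design, and user simulation are beyond the scope of this sketch.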