DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering
Journal
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Journal Volume
2022-September
Date Issued
2022-01-01
Author(s)
Lin, Guan Ting
Chuang, Yung Sung
Chung, Ho Lam
Yang, Shu Wen
Chen, Hsuan Jui
Dong, Shuyan
Li, Shang Wen
Mohamed, Abdelrahman
Abstract
Spoken Question Answering (SQA) is the task of finding the answer to a question from a spoken document, which is crucial for personal assistants responding to user queries. Existing SQA methods all rely on Automatic Speech Recognition (ASR) transcripts. Not only does ASR need to be trained on massive annotated data that are time- and cost-prohibitive to collect for low-resource languages, but, more importantly, the answers to questions very often include named entities or out-of-vocabulary words that cannot be recognized correctly. Moreover, ASR aims to minimize recognition errors equally over all words, including many function words irrelevant to the SQA task. SQA without ASR transcripts (textless SQA) is therefore highly desirable, although known to be very difficult. This work proposes Discrete Spoken Unit Adaptive Learning (DUAL), which leverages unlabeled data for pre-training and is fine-tuned on the SQA downstream task. The time intervals of spoken answers can be predicted directly from spoken documents. We also release a new SQA benchmark corpus, NMSQA, covering more realistic data scenarios. We show empirically that DUAL yields results comparable to those obtained by cascading an ASR module with a text QA model, and that it is robust to real-world data.
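The abstract describes predicting answer time intervals directly over discrete spoken units rather than over ASR transcripts. The following is a minimal illustrative sketch, not the authors' implementation: it assumes the question and spoken document have already been quantized into discrete unit IDs (e.g., by clustering self-supervised speech representations) and shows how an encoder could predict a span over those units, which could then be mapped back to a time interval via the quantizer's frame rate. All class and parameter names here are hypothetical.

```python
# Hypothetical sketch of span prediction over discrete spoken units.
import torch
import torch.nn as nn

class UnitSpanPredictor(nn.Module):
    def __init__(self, num_units=128, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(num_units + 1, d_model)  # +1 for a separator unit
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.span_head = nn.Linear(d_model, 2)  # per-position start / end logits

    def forward(self, unit_ids):
        # unit_ids: (batch, seq_len) discrete unit IDs of question ++ [SEP] ++ document
        hidden = self.encoder(self.embed(unit_ids))
        start_logits, end_logits = self.span_head(hidden).unbind(dim=-1)
        return start_logits, end_logits

# Toy usage: the predicted unit positions would be converted to seconds using
# the (assumed) fixed frame rate of the unit quantizer.
model = UnitSpanPredictor()
units = torch.randint(0, 128, (1, 50))  # fake question + document unit sequence
start_logits, end_logits = model(units)
start, end = start_logits.argmax(-1).item(), end_logits.argmax(-1).item()
print(start, end)
```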
Subjects
Self-Supervised Representation | Spoken Question Answering | Textless NLP
Type
conference paper