https://scholars.lib.ntu.edu.tw/handle/123456789/633654
Title: | DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering | Authors: | Lin, Guan-Ting; Chuang, Yung-Sung; Chung, Ho-Lam; Yang, Shu-Wen; Chen, Hsuan-Jui; Dong, Shuyan; Li, Shang-Wen; Mohamed, Abdelrahman; Lee, Hung-Yi; Lee, Lin-Shan |
Keywords: | Self-Supervised Representation | Spoken Question Answering | Textless NLP | Publication Date: | 1-Jan-2022 | Volume: | 2022-September | Source Publication: | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | Abstract: | Spoken Question Answering (SQA) is the task of finding the answer to a question within a spoken document, which is crucial for personal assistants replying to user queries. Existing SQA methods all rely on Automatic Speech Recognition (ASR) transcripts. Not only does ASR need to be trained with massive annotated data that are time- and cost-prohibitive to collect for low-resourced languages, but more importantly, the answers to the questions very often include named entities or out-of-vocabulary words that cannot be recognized correctly. Also, ASR aims to minimize recognition errors equally over all words, including many function words irrelevant to the SQA task. Therefore, SQA without ASR transcripts (textless SQA) has always been highly desired, although it is known to be very difficult. This work proposes Discrete Spoken Unit Adaptive Learning (DUAL), which leverages unlabeled data for pre-training and is fine-tuned on the SQA downstream task. The time intervals of spoken answers can be directly predicted from spoken documents. We also release a new SQA benchmark corpus, NMSQA, covering data with more realistic scenarios. We empirically show that DUAL yields results comparable to those obtained by cascading an ASR module with a text QA model, and that it is robust to real-world data. |
URI: | https://scholars.lib.ntu.edu.tw/handle/123456789/633654 | ISSN: | 2308-457X | DOI: | 10.21437/Interspeech.2022-612 |
Appears in Collections: | Department of Electrical Engineering |
Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.