https://scholars.lib.ntu.edu.tw/handle/123456789/636983
Title: | Introducing Semantics into Speech Encoders | Authors: | Xu, Derek; Dong, Shuyan; Wang, Changhan; Kim, Suyoun; Lin, Zhaojiang; Liu, Bing; Shrivastava, Akshat; Li, Shang Wen; Tseng, Liang Hsuan; Lin, Guan Ting; Baevski, Alexei; HUNG-YI LEE; Sun, Yizhou; Wang, Wei |
Publication date: | 1-Jan-2023 | Volume: | 1 | Source publication: | Proceedings of the Annual Meeting of the Association for Computational Linguistics | Abstract: | Recent studies find existing self-supervised speech encoders contain primarily acoustic rather than semantic information. As a result, pipelined supervised automatic speech recognition (ASR) to large language model (LLM) systems achieve state-of-the-art results on semantic spoken language tasks by utilizing rich semantic representations from the LLM. These systems come at the cost of labeled audio transcriptions, which is expensive and time-consuming to obtain. We propose a task-agnostic unsupervised way of incorporating semantic information from LLMs into self-supervised speech encoders without labeled audio transcriptions. By introducing semantics, we improve existing speech encoder spoken language understanding (SLU) performance by over 5% on intent classification (IC), with modest gains in named entity resolution (NER) and slot filling (SF), and spoken question answering (SQA) FF1 score by over 2%. Our approach, which uses no ASR data, achieves similar performance as methods trained on over 100 hours of labeled audio transcripts, demonstrating the feasibility of unsupervised semantic augmentations to existing speech encoders. |
URI: | https://scholars.lib.ntu.edu.tw/handle/123456789/636983 | ISBN: | 9781959429722 | ISSN: | 0736587X |
Appears in collections: | Department of Electrical Engineering |