Integrating Self-Supervised Speech Model with Pseudo Word-Level Targets from Visually-Grounded Speech Model

Hung-Chieh Fang; Nai-Xuan Ye; Yi-Jen Shih; Puyuan Peng; Hsuan-Fu Wang; Layne Berry; Hung-Yi Lee; David Harwath

doi:10.1109/icasspw62465.2024.10625802

Part Of

2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 - Proceedings

Journal Volume

33

Start Page

645

End Page

649

ISBN (of the container)

979-835037451-3

Date Issued

2024-04-14

Author(s)

Hung-Chieh Fang

Nai-Xuan Ye

Yi-Jen Shih

Puyuan Peng

Hsuan-Fu Wang

Layne Berry

Hung-Yi Lee

David Harwath

DOI

10.1109/icasspw62465.2024.10625802

URI

https://scholars.lib.ntu.edu.tw/handle/123456789/722039

Abstract

Recent advances in self-supervised speech models have shown significant improvement in many downstream tasks. However, these models are predominantly centered on frame-level training objectives, which can fall short in spoken language understanding tasks that require semantic comprehension. Existing works often rely on additional speech-text data as intermediate targets, which is costly in real-world settings. To address this challenge, we propose Pseudo-Word HuBERT (PW-HuBERT), a framework that integrates pseudo word-level targets into the training process. The targets are derived from a visually-grounded speech model, which eliminates the need for speech-text paired data. Our experimental results on four spoken language understanding (SLU) benchmarks suggest the superiority of our model in capturing semantic information.

Event(s)

49th IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024

Publisher

IEEE

Type

conference paper
