Improving Generalizability of Distilled Self-Supervised Speech Processing Models Under Distorted Settings
Published in
2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
ISBN
9798350396904
Date Issued
2023-01-01
Author(s)
Huang, Kuan Po
Fu, Yu Kuan
Hsu, Tsu Yuan
Gutierrez, Fabian Ritter
Wang, Fan Lin
Tseng, Liang Hsuan
Zhang, Yu
Abstract
Self-supervised learning (SSL) speech pre-trained models perform well across various speech processing tasks. Distilled versions of SSL models have been developed to meet the needs of on-device speech applications. Although they achieve performance similar to the original SSL models, the distilled counterparts suffer even greater performance degradation than their originals in distorted environments. This paper proposes applying Cross-Distortion Mapping and Domain Adversarial Training to SSL models during knowledge distillation to narrow the performance gap caused by domain mismatch. Results show consistent performance improvements under both in-domain and out-of-domain distorted setups across different downstream tasks, while maintaining an efficient model size.
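To make the combination of knowledge distillation and Domain Adversarial Training concrete, the following is a minimal toy sketch, not the paper's method: the paper distills transformer SSL models, whereas here both "teacher" and "student" are small linear maps with hand-derived gradients. The sketch shows only the core mechanism, a student trained to match frozen teacher features while a gradient-reversed domain-classifier loss pushes the student's features to be indistinguishable between clean and distorted inputs. All names, shapes, and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Toy adversarial distillation with a gradient-reversal update (DAT-style).
# Hypothetical setup: linear student/teacher, logistic domain classifier.
rng = np.random.default_rng(0)
d_in, d_feat = 8, 4

params = {
    "W_s": rng.normal(scale=0.1, size=(d_feat, d_in)),  # student (trained)
    "w_d": rng.normal(scale=0.1, size=d_feat),          # domain classifier
}
W_t = rng.normal(size=(d_feat, d_in))                   # frozen "teacher"

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def distill_loss(W_s, x):
    # L2 distance between student and teacher features.
    return 0.5 * np.sum((W_s @ x - W_t @ x) ** 2)

def train_step(params, x, domain, lam=0.1, lr=0.05):
    f_s = params["W_s"] @ x            # student features
    diff = f_s - W_t @ x               # distillation residual (dL_distill/df_s)
    p = sigmoid(params["w_d"] @ f_s)   # P(distorted | features)
    g_logit = p - domain               # BCE gradient w.r.t. the logit

    # Domain classifier descends its own loss: learns to tell domains apart.
    params["w_d"] -= lr * g_logit * f_s
    # Student update: distillation gradient MINUS lam * domain gradient.
    # The flipped sign is the gradient reversal -- the student is pushed to
    # make its features hard to classify, i.e. domain-invariant.
    g_feat = diff - lam * g_logit * params["w_d"]
    params["W_s"] -= lr * np.outer(g_feat, x)

# Two "domains": clean inputs (label 0) and additively distorted ones (label 1).
clean = rng.normal(size=(100, d_in))
distorted = clean + rng.normal(scale=0.5, size=clean.shape)

loss_start = np.mean([distill_loss(params["W_s"], x) for x in clean])
for _ in range(5):
    for xc, xd in zip(clean, distorted):
        train_step(params, xc, 0.0)
        train_step(params, xd, 1.0)
loss_end = np.mean([distill_loss(params["W_s"], x) for x in clean])
```

In the full method the same idea applies at scale: the adversarial term is weighted against the distillation objective so the compressed student keeps the teacher's representations while shedding domain-specific cues from the distortions.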
Subjects
Distortions | Domain Adversarial Training | Domain-adaptive Pre-training | Self-supervised Learning | SUPERB
Type
conference paper