https://scholars.lib.ntu.edu.tw/handle/123456789/632480
Title: An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition
Authors: Chang X.; Maekaku T.; Guo P.; Shi J.; Lu Y.-J.; Subramanian A.S.; Wang T.; Yang S.-W.; Tsao Y.; HUNG-YI LEE; Watanabe S.
Keywords: End-to-End Speech Recognition; ESPnet; Representation Learning
Date Issued: 2021
Pages: 228-235
Source Publication: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Proceedings
Abstract: Self-supervised pretraining on speech data has made substantial progress. High-fidelity representations of the speech signal are learned from large amounts of untranscribed data and show promising performance. Recently, several works have focused on evaluating the quality of self-supervised pretrained representations on various tasks without domain restriction, e.g., SUPERB. However, such evaluations do not provide a comprehensive comparison across many ASR benchmark corpora. In this paper, we focus on the general application of pretrained speech representations to advanced end-to-end automatic speech recognition (E2E-ASR) models. We select several pretrained speech representations and present experimental results on various open-source and publicly available corpora for E2E-ASR. Without any modification of the back-end model architectures or training strategy, some of the experiments with pretrained representations, e.g., WSJ and WSJ0-2mix with HuBERT, reach or outperform current state-of-the-art (SOTA) recognition performance. Moreover, we further explore scenarios in which the pretrained representations are effective, such as cross-language or overlapped speech. The scripts, configurations and the trained models have been released in ESPnet so that the community can reproduce our experiments and improve on them. © 2021 IEEE.
URI: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85117658726&doi=10.1109%2fASRU51503.2021.9688137&partnerID=40&md5=8571bdf8efad31e67fec2c31782126fd
URI: https://scholars.lib.ntu.edu.tw/handle/123456789/632480
DOI: 10.1109/ASRU51503.2021.9688137
SDG/Keywords: Speech recognition; Comprehensive comparisons; End to end; End-to-end speech recognition; Espnet; High-fidelity; Performance; Pre-training; Representation learning; Speech data; Speech signals; Speech
Appears in Collections: Department of Electrical Engineering
Items in the IR system are protected by copyright, with all rights reserved, unless otherwise indicated.