https://scholars.lib.ntu.edu.tw/handle/123456789/632480
Title: An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition
Authors: Chang X.; Maekaku T.; Guo P.; Shi J.; Lu Y.-J.; Subramanian A.S.; Wang T.; Yang S.-W.; Tsao Y.; HUNG-YI LEE; Watanabe S.
Keywords: End-to-End Speech Recognition; ESPnet; Representation Learning
Date Issued: 2021
Pages: 228-235
Source Publication: 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Proceedings
Abstract: Self-supervised pretraining on speech data has made substantial progress. High-fidelity representations of the speech signal are learned from large amounts of untranscribed data and show promising performance. Recently, several works have focused on evaluating the quality of self-supervised pretrained representations on various tasks without domain restriction, e.g., SUPERB. However, such evaluations do not provide a comprehensive comparison across many ASR benchmark corpora. In this paper, we focus on the general application of pretrained speech representations to advanced end-to-end automatic speech recognition (E2E-ASR) models. We select several pretrained speech representations and present experimental results on various open-source and publicly available corpora for E2E-ASR. Without any modification of the back-end model architectures or training strategy, some of the experiments with pretrained representations, e.g., WSJ and WSJ0-2mix with HuBERT, reach or outperform current state-of-the-art (SOTA) recognition performance. Moreover, we further explore scenarios in which the pretrained representations are effective, such as cross-language or overlapped speech. The scripts, configurations and the trained models have been released in ESPnet so that the community can reproduce our experiments and improve on them. © 2021 IEEE.
URI: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85117658726&doi=10.1109%2fASRU51503.2021.9688137&partnerID=40&md5=8571bdf8efad31e67fec2c31782126fd
URI: https://scholars.lib.ntu.edu.tw/handle/123456789/632480
DOI: 10.1109/ASRU51503.2021.9688137
SDG/Keywords: Speech recognition; Comprehensive comparisons; End to end; End-to-end speech recognition; Espnet; High-fidelity; Performance; Pre-training; Representation learning; Speech data; Speech signals; Speech
Appears in Collections: Department of Electrical Engineering
Items in the IR system are protected by copyright, with all rights reserved, unless otherwise indicated.