Euro: Espnet Unsupervised ASR Open-Source Toolkit
Journal
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISBN
978-1-7281-6327-7
Date Issued
2023-01-01
Author(s)
Gao, Dongji
Shi, Jiatong
Chuang, Shun Po
Garcia, Leibny Paola
Watanabe, Shinji
Khudanpur, Sanjeev
Abstract
This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO adopts the state-of-the-art UASR learning method introduced by the Wav2vec-U, originally implemented at FAIRSEQ, which leverages self-supervised speech representations and adversarial training. In addition to wav2vec2, EURO extends the functionality and promotes reproducibility for UASR tasks by integrating S3PRL and k2, resulting in flexible frontends from 27 self-supervised models and various graph-based decoding strategies. EURO is implemented in ESPnet and follows its unified pipeline to provide UASR recipes with a complete setup. This improves the pipeline's efficiency and allows EURO to be easily applied to existing datasets in ESPnet. Extensive experiments on three mainstream self-supervised models demonstrate the toolkit's effectiveness and achieve state-of-the-art UASR performance on TIMIT and LibriSpeech datasets. EURO will be publicly available at https://github.com/espnet/espnet, aiming to promote this exciting and emerging research area based on UASR through open-source activity.
Subjects
ESPnet | S3PRL | self-supervised learning | unsupervised ASR
Type
conference paper
