Improving ASR in Reverberant Environments
Journal
2022 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022
ISBN
9798350397963
Date Issued
2022-01-01
Author(s)
Abstract
Automatic Speech Recognition (ASR) significantly reduces the effort required to create audio transcripts. Despite its convenience, ASR performance is unstable in adverse acoustic environments; for instance, indoor signals are usually corrupted by reverberation (reverb), which degrades recognition accuracy. One solution is to build an acoustic dereverberation (dereverb) model that pre-processes the original signals before they are passed to the ASR system. However, the acoustic properties of the dereverb model's output differ from those of the ASR training data, and this mismatch again lowers performance. This paper optimizes this pipeline in four respects: signal classification, reverberation removal, removal of data mismatch in ASR, and ensemble algorithms. With the proposed sentence-level fusion (SLF) and word-level fusion (WLF) ensemble algorithms, a character error rate (CER) of 7.23% was reached on the mixed test set of reverberated and clean Aishell1, a 20.72% reduction in CER compared to the single model.
Subjects
automatic speech recognition | dereverberation | model ensemble | new structure | string confusion network
SDGs
Type
conference paper
