STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model

Zezario R.E; Fu S.-W; Fuh C.-S; Tsao Y; Wang H.-M.

Title:	STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model
Authors:	Zezario R.E; Fu S.-W; CHIOU-SHANN FUH; Tsao Y; Wang H.-M.
Keywords:	Convolutional neural networks; Deep learning; Learning systems; Attention mechanisms; Correlation value; Good correlations; Input and outputs; Intelligibility assessment; Real-world scenario; Spectral feature; Speech utterance; Speech intelligibility
Publication date:	2020
Pages:	482-486
Source publication:	2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Proceedings
Abstract:	The calculation of most objective speech intelligibility assessment metrics requires clean speech as a reference. Such a requirement may limit the applicability of these metrics in real-world scenarios. To overcome this limitation, we propose a deep learning-based non-intrusive speech intelligibility assessment model, namely STOI-Net. The input and output of STOI-Net are speech spectral features and predicted STOI scores, respectively. The model is formed by the combination of a convolutional neural network and bidirectional long short-term memory (CNN-BLSTM) architecture with a multiplicative attention mechanism. Experimental results show that the STOI score estimated by STOI-Net has a good correlation with the actual STOI score when tested with noisy and enhanced speech utterances. The correlation values are 0.97 for the seen test condition (the test speakers and noise types are included in the training set) and 0.83 for the unseen test condition (the test speakers and noise types are not included in the training set). The results confirm the capability of STOI-Net to accurately predict the STOI scores without referring to clean speech. © 2020 APSIPA.
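The abstract reports agreement between predicted and actual STOI scores as a correlation value. The record does not say which correlation coefficient was used, so the sketch below assumes the Pearson linear correlation. The function name `pearson_correlation` and the score lists are hypothetical, chosen for illustration only; they are not data or code from the paper.

```python
import math


def pearson_correlation(predicted, reference):
    """Pearson linear correlation between two equal-length score lists."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_p = sum(predicted) / len(predicted)
    mean_r = sum(reference) / len(reference)
    # Covariance and variances are left unnormalized; the 1/n factors cancel.
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_r)


# Hypothetical per-utterance scores, for illustration only (not from the paper).
reference_stoi = [0.45, 0.60, 0.72, 0.81, 0.93]  # intrusive STOI, needs clean speech
predicted_stoi = [0.50, 0.58, 0.70, 0.85, 0.90]  # what a non-intrusive model might output

r = pearson_correlation(predicted_stoi, reference_stoi)
print(f"correlation: {r:.3f}")
```

A value near 1 means the predicted scores rank and scale utterances much like the reference metric does, which is the sense in which the abstract's 0.97 (seen) and 0.83 (unseen) figures can be read.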
URI:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85100916868&partnerID=40&md5=9da85546220f4a7d00d1a88b4d8ccda4 https://scholars.lib.ntu.edu.tw/handle/123456789/581297
Appears in collections:	Department of Computer Science and Information Engineering
