Improving Speech Assessment Using Deep Neural Networks
Date Issued
2016
Author(s)
Fan, Chun-Hao
Abstract
Pronunciation plays an important role in communication: similar but distinct pronunciations can convey different meanings, so correct pronunciation is an essential part of language learning. This thesis is divided into two parts. The first part describes the use of deep neural networks (DNNs) to classify phonemes. The second part explains how the DNN output can be used to perform speech assessment. Building a DNN-based speech assessment system is the main goal of this thesis. For the DNN, we compare MFCC and Mel-filter bank coefficient features and try a number of DNN configurations to find the best setting. Our main finding is that higher-dimensional features give better accuracy: in our experiments, the best recognition rate of the DNN models reaches 73.33% using high-dimensional MFCC features. For speech assessment, we propose two methods, max-gap and adaptive-k, that use the DNN’s output. A conventional HMM-GMM based speech assessment system is regarded as the baseline. Our experiments demonstrate that adaptive-k outperforms HMM-GMM for short-sentence assessment. For long sentences, adaptive-k and HMM-GMM have comparable performance. Overall, adaptive-k is still better than HMM-GMM for speech assessment.
Subjects
neural network
speech assessment
pronunciation scoring
computer assisted language learning (CALL)
computer assisted pronunciation training (CAPT)
Type
thesis
File(s)
Name
ntu-105-R03944018-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):ad263c48d841661e5daba26251dc3a8d