Advisor: 李琳山 (Lin-shan Lee)
Institution: National Taiwan University, Graduate Institute of Electrical Engineering (臺灣大學:電機工程學研究所)
Author: 王祐邦 (Wang, Yow-Bang)
Dates: 2014-11-28; 2018-07-06; 2014-11-28; 2018-07-06; 2014
URI: http://ntur.lib.ntu.edu.tw//handle/246246/262892

Abstract:
Pronunciation error patterns (EPs) are patterns of mispronunciation frequently produced by language learners, and they usually differ for different pairs of target and native languages. Accurate information about EPs can give learners helpful feedback for improving their language skills. However, the major difficulty of EP detection is that EPs are intrinsically similar to their corresponding canonical pronunciation, and EPs corresponding to the same canonical pronunciation are also intrinsically similar to each other. As a result, distinguishing EPs from their corresponding canonical pronunciation, and distinguishing between different EPs of the same phoneme, is a difficult task, perhaps even more difficult than distinguishing between different phonemes in one language. In addition, the cost of deriving all EPs for each pair of target and native languages is high, usually requiring extensive expert knowledge or high-quality annotated data. Unsupervised EP discovery from a corpus of learner recordings would thus be an attractive addition to the field. In this dissertation, we propose new frameworks for both supervised EP detection and unsupervised EP discovery. For supervised EP detection, we use hierarchical MLPs as the EP classifiers, integrated with the HMM/GMM baseline in a two-pass Viterbi decoding architecture. Experimental results show that the new framework enhances the power of EP diagnosis. For unsupervised EP discovery we propose the first known framework, using the hierarchical agglomerative clustering (HAC) algorithm to explore sub-segmental variation within phoneme segments and produce fixed-length segment-level feature vectors in order to distinguish different EPs.
We tested K-means (assuming a known number of EPs) and the Gaussian mixture model with the minimum description length principle (estimating an unknown number of EPs) for EP discovery. Preliminary experiments offered very encouraging results, although there is still a long way to go to approach the performance of human experts. We also propose using the universal phoneme posteriorgram (UPP), derived from an MLP trained on corpora of mixed languages, as frame-level features in both supervised detection and unsupervised discovery of EPs. Experimental results show that using UPP not only achieves the best performance but is also useful in analyzing the mispronunciations produced by language learners.

Table of contents:
Acknowledgements (誌謝)
Abstract
List of Tables
List of Figures
1 Introduction
1.1 Computer assisted language learning
1.2 Major contributions of this dissertation
1.3 Thesis organization
2 Background Review
2.1 Data collection
2.2 Error Pattern definition and labeling
2.3 Multi-layer perceptron in acoustic modeling
2.4 Universal Phoneme Posteriorgram (UPP)
3 Supervised Detection of Pronunciation Error Patterns
3.1 Acoustic modeling for EPs
3.2 EP detection framework based on the hybrid approach
3.3 Hierarchical MLPs as the EP classifiers
3.4 EP diagnosis confidence estimation
3.5 Evaluation metrics
3.6 Experimental setup
3.7 Experimental results and discussion
3.8 Complementarity analysis for the EP classifiers and EP AMs in the proposed framework
4 Unsupervised Discovery of Pronunciation Error Patterns
4.1 Framework overview
4.2 Hierarchical Agglomerative Clustering (HAC) and Segment-level Feature Vectors
4.3 Unsupervised Clustering Algorithms for EP Discovery
4.4 Evaluation Metrics
4.5 Experimental setup
4.6 Experimental results (I) – K-means with assumed known number of EPs
4.7 Experimental results (II) – GMM-MDL with automatically estimated number of EPs
4.8 Analysis for an example set of automatically discovered EPs
5 Conclusion and Future Work
5.1 Conclusion
5.2 Future work
References
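
File size: 2754846 bytes
Format: application/pdf
Thesis release date (論文公開時間): 2014/03/09
Thesis usage rights (論文使用權限): paid licensing approved, with royalties returned to the author
Keywords: computer-assisted language learning (電腦輔助語言學習); computer-assisted pronunciation training (電腦輔助發音訓練); error pattern detection (偏誤模式偵測); error pattern discovery (偏誤模式探勘); universal phoneme posteriorgram (宇集音素事後機率)
Title (Chinese): 發音偏誤模式之督導式偵測與非督導式探勘用於電腦輔助語言學習
Title (English): Supervised Detection and Unsupervised Discovery of Pronunciation Error Patterns for Computer-Assisted Language Learning
Type: thesis
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/262892/1/ntu-103-D98921028-1.pdf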
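
Illustrative note on the abstract: the abstract says that HAC is used to explore sub-segmental variation within a phoneme segment and to produce a fixed-length segment-level feature vector, but this record does not give the details. The following is only a minimal sketch of one plausible reading, under these assumptions: only adjacent groups of frames are merged, the merge criterion is the squared Euclidean distance between group means, and the final vector is the concatenation of the sub-segment means. The names `segment_feature` and `num_subsegments` are hypothetical and do not come from the thesis.

```python
from typing import List

Vector = List[float]


def mean(frames: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length frames."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def segment_feature(frames: List[Vector], num_subsegments: int = 3) -> Vector:
    """Map a variable-length phoneme segment to a fixed-length vector.

    Assumption (not from the thesis): agglomerative merging is restricted
    to temporally adjacent groups, so sub-segments stay contiguous in time.
    """
    if num_subsegments < 1 or len(frames) < num_subsegments:
        raise ValueError("need at least num_subsegments frames")
    # Start with one group per frame, in temporal order.
    groups = [[f] for f in frames]
    while len(groups) > num_subsegments:
        means = [mean(g) for g in groups]
        # Pick the adjacent pair whose means are closest and merge it.
        i = min(range(len(groups) - 1),
                key=lambda k: sq_dist(means[k], means[k + 1]))
        groups[i:i + 2] = [groups[i] + groups[i + 1]]
    # Concatenate sub-segment means: length = num_subsegments * frame_dim.
    feature: Vector = []
    for g in groups:
        feature.extend(mean(g))
    return feature
```

Under these assumptions, segments with different numbers of frames map to vectors of the same length, which is what lets a clustering algorithm such as K-means compare them directly.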