Tri-Articulatory Feature and Multi-input/Multi-target Deep Neural Network
Date Issued
2016
Date
2016
Author(s)
Yang, Chih-Hsiang
Abstract
The tri-articulatory feature (tri-AF) is a context-dependent articulatory feature. When we speak, the shape of the mouth changes continuously, so the same phone in different contexts should have different articulatory features. In this thesis, the articulatory features are categorized into eight groups, a context-dependent hidden Markov model is constructed for each group, and tri-AF labels are obtained from these models. In speech recognition, deep neural networks (DNNs) have been widely used as acoustic models, and multi-target DNN training has been shown to improve the acoustic model. Following this concept, this thesis uses triphones, tri-AFs, and graphemes as multiple targets to enhance the acoustic model. In addition, two-stage DNNs have become popular in recent years: the first stage acts as a feature extraction model, and the extracted features are concatenated with the acoustic features to form the input of the second stage. This thesis uses grapheme, tri-AF, monolingual bottleneck, and multilingual bottleneck features as extra inputs to realize a multi-input DNN. Finally, multi-target and multi-input training are combined into a multi-input/multi-target DNN, which gives the best recognition results.
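The multi-input/multi-target idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis's actual architecture: the layer sizes, the single shared hidden layer, the equal loss weights, and all names (`build_model`, `forward`, `multi_target_loss`) are assumptions chosen for illustration. Extra input features (standing in for bottleneck features) are concatenated with the acoustic features, and one shared hidden layer feeds separate softmax outputs for triphone, tri-AF, and grapheme targets.

```python
import numpy as np

def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_layer(rng, n_in, n_out):
    # Small random weights and zero biases (illustrative initialisation).
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def build_model(rng, acoustic_dim, extra_dim, hidden_dim, target_sizes):
    # One shared hidden layer plus one output layer per target type.
    shared = init_layer(rng, acoustic_dim + extra_dim, hidden_dim)
    heads = {name: init_layer(rng, hidden_dim, n)
             for name, n in target_sizes.items()}
    return {"shared": shared, "heads": heads}

def forward(model, acoustic, extra):
    # Multi-input: concatenate acoustic features with the extra features.
    x = np.concatenate([acoustic, extra], axis=1)
    W, b = model["shared"]
    h = np.maximum(0.0, x @ W + b)  # ReLU hidden activations
    # Multi-target: every head reads the same shared representation.
    return {name: softmax(h @ Wh + bh)
            for name, (Wh, bh) in model["heads"].items()}

def multi_target_loss(posteriors, labels, weights):
    # Weighted sum of per-target cross-entropy losses.
    total = 0.0
    for name, p in posteriors.items():
        picked = p[np.arange(len(labels[name])), labels[name]]
        total += weights[name] * -np.mean(np.log(picked))
    return total

rng = np.random.default_rng(0)
# Hypothetical class counts; real systems use far larger inventories.
sizes = {"triphone": 50, "tri_af": 20, "grapheme": 30}
model = build_model(rng, acoustic_dim=40, extra_dim=16,
                    hidden_dim=64, target_sizes=sizes)

acoustic = rng.normal(size=(8, 40))  # 8 frames of acoustic features
extra = rng.normal(size=(8, 16))     # 8 frames of extra input features
labels = {name: rng.integers(0, n, size=8) for name, n in sizes.items()}

posteriors = forward(model, acoustic, extra)
loss = multi_target_loss(posteriors, labels, {name: 1.0 for name in sizes})
print(loss)
```

In a multi-target setup of this kind, the auxiliary outputs typically serve only to shape the shared layers during training, and only the primary output is used for decoding. Whether the thesis follows that convention is not stated in the abstract.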
Subjects
articulatory feature
bottleneck feature
deep neural network (DNN)
multi-target DNN
multi-input DNN
Type
thesis
File(s)
Name
ntu-105-R03942066-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):949554cc84f885802711c90445abb5e1