Title: 三連發聲特徵與多輸入多目標之深層類神經網路
Title (English): Tri-Articulatory Feature and Multi-input/Multi-target Deep Neural Network
Author: 楊植翔 (Yang, Chih-Hsiang)
Advisor: 李琳山 (Lin-shan Lee)
Affiliation: College of Electrical Engineering and Computer Science: Graduate Institute of Communication Engineering
Type: thesis
Year: 2016
Record dates: 2017-03-06, 2018-07-05
URI: http://ntur.lib.ntu.edu.tw//handle/246246/276714
DOI: 10.6342/NTU201601024
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/276714/1/ntu-105-R03942066-1.pdf
Format: application/pdf, 7480550 bytes
Thesis public release date: 2016/8/2
Thesis usage rights: paid licensing authorized (royalties returned to the university)
Keywords: articulatory feature; bottleneck feature; deep neural network (DNN); multi-target DNN; multi-input DNN

Abstract:

The tri-articulatory feature (tri-AF) is a context-dependent articulatory feature. The shape of the mouth changes continuously during speech, so the same articulatory feature should be realized differently when the neighboring phones differ. In this thesis, articulatory features are divided into eight categories, and a context-dependent Hidden Markov Model is built for each category to obtain tri-AF labels.

In speech recognition, deep neural networks (DNNs) are widely used to build acoustic models, and DNNs trained with multiple targets have been shown to improve model performance. Building on this architecture, this thesis uses triphones, graphemes, and tri-AFs as multiple training targets to strengthen the acoustic model.

Two-stage DNN models have also become widely used in recent years. The first-stage DNN acts as a feature extractor, and the extracted features are concatenated with the acoustic features to form the input of the second-stage DNN. This thesis combines the acoustic features with grapheme, tri-AF, monolingual bottleneck, and multilingual bottleneck features to realize a multi-input DNN.

Finally, this thesis combines the two approaches into a multi-input/multi-target DNN. The two complement each other and yield the best recognition results in the experiments.