Crosslingual Acoustic Modeling in Speech Recognition Using Deep Learning
Date Issued
2016
Author(s)
Lu, Hsiang-Hung
Abstract
Speech signal processing technologies have matured alongside the arrival of the Big Data era, and speech draws wide attention today. These resources are no longer held by only a few large companies; they are shared by speakers of different languages across regions all over the world. Each kind of human speech has its own unique properties, yet all share the same purpose: people rely on speech to understand one another. This thesis focuses on combining speech data from different languages to enhance conventional monolingual speech recognition systems; latent crosslingual information can be discovered and exploited. We use the GlobalPhone corpus to investigate linguistic knowledge, data-driven methods, and model-sharing techniques. The research proceeds step by step, from coarse phone-level merging to fine-grained model-level sharing, achieving better results through crosslingual information. Once multilingual speech recognition systems are built, the models tend to become deep and cumbersome, and training requires more complex, time-consuming techniques. To transfer the generalization ability of these large models into small models suited to on-device, real-time use, one can apply Knowledge Distillation to extract their knowledge, thus achieving model compression.
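The Knowledge Distillation mentioned above can be illustrated with a minimal soft-target distillation sketch in PyTorch. This is not the thesis implementation; the layer sizes, feature dimension, number of output classes, temperature, and loss weighting are illustrative assumptions only.

```python
# Hypothetical Knowledge Distillation sketch: a large "teacher" acoustic model
# and a small "student" model both map acoustic feature frames to class logits.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      temperature=2.0, alpha=0.5):
    """Combine soft-target KL loss (teacher) with hard-label cross-entropy."""
    # Soften both distributions with the temperature, then match them with KL.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale so gradient magnitude is comparable
    hard_loss = F.cross_entropy(student_logits, hard_labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Toy setup: 40-dim acoustic features, 1000 output classes (assumed values).
teacher = nn.Sequential(nn.Linear(40, 2048), nn.ReLU(), nn.Linear(2048, 1000))
student = nn.Sequential(nn.Linear(40, 256), nn.ReLU(), nn.Linear(256, 1000))

features = torch.randn(32, 40)          # a batch of feature frames
labels = torch.randint(0, 1000, (32,))  # hard state labels for the batch

with torch.no_grad():
    teacher_logits = teacher(features)  # teacher is fixed during distillation
student_logits = student(features)

loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

In this sketch the student learns from the teacher's softened posterior distribution in addition to the hard labels, which is how the compressed model can inherit some of the large model's generalization ability.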
Subjects
Multilingual
Speech Recognition
Crosslingual Information
Deep Learning
Knowledge Distillation
Type
thesis
File(s)
Name
ntu-105-R03942039-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum (MD5)
9b5e1da87c3c6c1799a90804b53e8f5c