Lip Optical-flow Driven Automatic Continuous Speech Syllabification in Mandarin
Date Issued
2016
Date
2016
Author(s)
Lin, Yu-Chi
Abstract
Research on automatic speech recognition (ASR) with video can be divided into two categories according to the signal processing method used. One recognizes signals from the entire video stream; the other recognizes signals syllable by syllable. The latter approach analyzes the region where energy is concentrated in order to reduce noise interference. Recognizing syllables requires detecting syllable boundaries correctly in continuous speech signals. To this end, this study focuses on detecting syllable boundaries from the lip images of a continuous Mandarin corpus. The transitions between different lip shapes are the key information for detecting syllable boundaries. The proposed algorithm first locates the face, then uses dense optical flow to calculate the variance of the lip images between every two neighboring video frames. This variance is the basis for detecting syllable boundaries from lip images in continuous video frames. Because the audio and video are recorded simultaneously in this research, it is reasonable to assume that the boundary between two adjacent syllables is visible in the image information. The experimental results show that more than half of the syllable boundaries can be extracted from the variance of the lip images when the audio signals of the syllables cannot be separated by their energy distribution. Using both audio and video channels not only makes syllable boundary detection more stable but also makes the system more robust in noisy environments. Furthermore, the database recorded for this study, which consists of 2,480 clips (both reading and speaking) from 40 informants, will be made available for download to promote academic research on continuous Mandarin speech recognition.
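The boundary-detection idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not state the decision rule, so the sketch assumes that boundary candidates are peaks in the frame-to-frame lip motion. The input `motion`, the functions `moving_average` and `boundary_candidates`, and the parameters `threshold` and `window` are all hypothetical names introduced for illustration. The per-frame motion values are assumed to come from a dense optical-flow step that is not shown here.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int = 3) -> List[float]:
    """Smooth a sequence with a centered moving average.

    The window shrinks at both ends so the output has the same length
    as the input.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def boundary_candidates(
    motion: Sequence[float], threshold: float, window: int = 3
) -> List[int]:
    """Return indices where the smoothed lip motion has a local peak.

    Assumption (not from the thesis): motion[i] is the mean optical-flow
    magnitude over the lip region between frame i and frame i + 1, and a
    peak above `threshold` marks a transition between lip shapes.
    """
    s = moving_average(motion, window)
    peaks = []
    for i in range(1, len(s) - 1):
        # Strictly above the left neighbor, so a plateau is reported once.
        if s[i] > threshold and s[i] > s[i - 1] and s[i] >= s[i + 1]:
            peaks.append(i)
    return peaks
```

A caller would pass one motion value per pair of neighboring frames and tune `threshold` and `window` on held-out data. In a combined audio and video system, these candidates could then be compared with boundaries found from the audio energy.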
Subjects
Continuous lip reading
Continuous speech recognition
Automatic syllable boundary detection
Automatic syllable segmentation
Dense optical flow
Face localization
Type
thesis
File(s)
Name
ntu-105-R03525064-1.pdf
Size
23.54 KB
Format
Adobe PDF
Checksum
(MD5):535ce2d80ab78dacd97b7a4e29f98bf5