New Approaches for Detecting Edit Disfluencies in Transcribing Spontaneous Mandarin Speech
Date Issued
2009
Author(s)
Lin, Che-Kuang
Abstract
Detection of edit disfluencies is one of the keys to transcribing spontaneous utterances. In this dissertation, we present improved features and models that detect edit disfluencies and enhance the transcription of spontaneous Mandarin speech using hypothesized disfluency interruption points (IPs) and edit word detection. We develop a comprehensive set of prosodic features that takes into account the special characteristics of edit disfluencies in Mandarin, and we propose an improved model combining decision trees and maximum entropy to detect IPs. This model is further adapted to desired prosodic conditions by latent prosodic modeling, a probabilistic framework for analyzing speech prosody in terms of a set of latent prosodic states. These techniques contribute to higher recognition accuracy (by rescoring with the hypothesized IPs) and better edit word detection (using conditional random fields defined on Chinese characters) in the final transcription, as verified by experiments on a spontaneous Mandarin speech corpus. A detailed analysis of the latent states output by the proposed latent prosodic modeling is conducted. The relevance of the proposed prosodic features to each type of edit disfluency is also analyzed, offering further insight into the characteristics of the various disfluency categories.
Subjects
edit disfluency
interruption point detection
prosody
speech recognition
spontaneous speech
Type
thesis
File(s)
Name
ntu-98-F91942036-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):8094a42036fc988b9d894fdf82edd7bc
