Relevant Technologies For Improved Chinese Spoken Dialog Systems
Date Issued
2007
Date
2007
Author(s)
Wang, Nick Jui-Chang
Language
en-US
Abstract
Applying automatic speech recognition in spoken dialogue systems draws on technologies from several areas, including digital signal processing, robust speech feature extraction, acoustic-phonetic modeling, speaker adaptation, language modeling, language understanding, dialogue management, language generation, and speech synthesis. Each of these technologies contributes to the performance of a spoken dialogue system, which enables communication between humans and machines. This dissertation presents three technologies for improving Chinese spoken dialogue systems: the first concerns speaker adaptation and speaker identification, the second concerns speech understanding, and the third is an interactive open-vocabulary Chinese name input system.
The speaker adaptation technology in the first topic adapts acoustic models to a speaker's voice characteristics to improve speech recognition accuracy. The Eigen-MLLR approach uses Principal Component Analysis (PCA) to construct a subspace of the MLLR parameter space; as a result, it is more robust than conventional MLLR when only a small amount of enrollment data is available. Compared with the Eigenvoice approach, it requires less memory to store the model-adaptation estimates, which makes it more practical for speaker-independent spoken dialogue systems. The author compared Eigen-MLLR with MLLR and Eigenvoice, developed a fast algorithm for estimating Eigen-MLLR coefficients, and applied the Eigen-MLLR coefficients to speaker identification.
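The Eigen-MLLR idea can be illustrated with a minimal, hypothetical sketch; it is not the author's fast estimation algorithm. It assumes that each training speaker's MLLR mean transform W (a D x (D+1) matrix, flattened row-major) is treated as one vector, that PCA (done here by power iteration in plain Python) yields a mean transform and a few eigen-transforms E_k, and that a new speaker's transform is the mean transform plus the sum of c_k E_k. The coefficients c_k are fitted by least squares from (observation, aligned speaker-independent mean) pairs, which coincides with maximum likelihood only under the simplifying assumptions of identity covariances and a fixed frame-to-Gaussian alignment. The `identify` function, a nearest-neighbour match on coefficient vectors, is likewise an assumed stand-in for the speaker identification use; all function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Vec = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def pca_top_k(vectors: Sequence[Vec], k: int, iters: int = 100) -> Tuple[Vec, List[Vec]]:
    """Mean vector and top-k principal directions via power iteration with deflation."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(d)]
    cen = [[x - m for x, m in zip(v, mean)] for v in vectors]
    cov = [[sum(c[i] * c[j] for c in cen) / n for j in range(d)] for i in range(d)]
    basis: List[Vec] = []
    for _ in range(k):
        e = [1.0] * d
        for _ in range(iters):
            w = [dot(row, e) for row in cov]
            norm = dot(w, w) ** 0.5
            if norm < 1e-12:
                return mean, basis  # remaining covariance is numerically zero
            e = [x / norm for x in w]
        lam = dot(e, [dot(row, e) for row in cov])
        basis.append(e)
        # Deflate: remove the found component before searching for the next one.
        cov = [[cov[i][j] - lam * e[i] * e[j] for j in range(d)] for i in range(d)]
    return mean, basis


def apply_transform(w: Vec, mu: Vec) -> Vec:
    """Adapted mean W [1; mu] for a D x (D+1) transform W flattened row-major."""
    d = len(mu)
    ext = [1.0] + list(mu)
    return [dot(w[r * (d + 1):(r + 1) * (d + 1)], ext) for r in range(d)]


def solve(a: List[Vec], b: Vec) -> Vec:
    """Gaussian elimination with partial pivoting for a small K x K system."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - dot(m[r][r + 1:n], x[r + 1:n])) / m[r][r]
    return x


def estimate_coeffs(mean_w: Vec, basis: List[Vec], frames: Sequence[Tuple[Vec, Vec]]) -> Vec:
    """Least-squares coefficients c from (observation, aligned SI mean) pairs.

    Equals the ML estimate only if covariances are identity and the alignment
    is fixed (a simplifying assumption of this sketch).
    """
    k = len(basis)
    a = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for obs, mu in frames:
        resid = [o - p for o, p in zip(obs, apply_transform(mean_w, mu))]
        g = [apply_transform(e, mu) for e in basis]  # effect of each eigen-transform
        for i in range(k):
            b[i] += dot(g[i], resid)
            for j in range(k):
                a[i][j] += dot(g[i], g[j])
    return solve(a, b)


def adapted_transform(mean_w: Vec, basis: List[Vec], coeffs: Vec) -> Vec:
    """W = mean transform + sum_k c_k * E_k."""
    return [m + sum(c * e[i] for c, e in zip(coeffs, basis)) for i, m in enumerate(mean_w)]


def identify(coeffs: Vec, enrolled: Dict[str, Vec]) -> str:
    """Hypothetical speaker ID: enrolled speaker with the nearest coefficient vector."""
    return min(enrolled, key=lambda s: sum((x - y) ** 2 for x, y in zip(coeffs, enrolled[s])))
```

Because only K coefficients are estimated instead of all D(D+1) entries of a full MLLR transform, far fewer adaptation frames are needed to constrain the estimate. The stored eigen-transforms also scale with D(D+1) per basis vector rather than with the total number of Gaussian mean parameters, as an Eigenvoice supervector would.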
The second topic is speech understanding. Most speech understanding systems with medium to large vocabularies use a two-stage approach: a speech recognition component in the first stage, followed by a natural language understanding component in the second. Understanding performance is usually limited by speech recognition errors and out-of-grammar input, so robust speech understanding is essential. The proposed approach integrates a concept layer, the Key Semantic Chunk, into this two-stage system. A Key Semantic Chunk is a language unit between the sentence and the word; it is incorporated into both the speech recognition and language understanding components and serves as the interface between them. This makes the recognizer's language model more robust to data sparseness and also lets language understanding operate more robustly on the recognition output. As a result, the improved system reduced understanding errors by about 30%. In addition, it reduces the effort needed to build and maintain the language understanding grammars and the speech recognition n-gram models.
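To make the chunk idea concrete, here is a hypothetical sketch of how a chunk layer could sit between recognition and understanding. The lexicon `KEY_CHUNKS`, its English placeholder words, and the greedy longest-match rule are all assumptions for illustration; the thesis targets Chinese, and its actual chunk definitions and integration are not reproduced here. The same segmentation feeds both sides: `understand` keeps only the spotted chunks, skipping filler, misrecognized, or out-of-grammar words, while `lm_tokens` collapses each chunk into a single concept token such as `<origin>`, the kind of class-like unit that can ease n-gram data sparseness.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical chunk lexicon: word sequence -> (concept, value).
KEY_CHUNKS: Dict[Tuple[str, ...], Tuple[str, str]] = {
    ("from", "taipei"): ("origin", "Taipei"),
    ("from", "kaohsiung"): ("origin", "Kaohsiung"),
    ("to", "taipei"): ("destination", "Taipei"),
    ("to", "kaohsiung"): ("destination", "Kaohsiung"),
    ("tomorrow",): ("date", "tomorrow"),
    ("tomorrow", "morning"): ("date_time", "tomorrow morning"),
}
MAX_LEN = max(len(key) for key in KEY_CHUNKS)


def segment(words: List[str]) -> List[Tuple[Optional[str], str]]:
    """Greedy longest-match segmentation into (concept, value) chunks and plain words."""
    out: List[Tuple[Optional[str], str]] = []
    i = 0
    while i < len(words):
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            key = tuple(w.lower() for w in words[i:i + n])
            if key in KEY_CHUNKS:
                out.append(KEY_CHUNKS[key])
                i += n
                break
        else:
            out.append((None, words[i]))  # word outside any chunk
            i += 1
    return out


def understand(words: List[str]) -> List[Tuple[str, str]]:
    """Semantic frame: keep only the chunks; unmatched words are ignored."""
    return [(c, v) for c, v in segment(words) if c is not None]


def lm_tokens(words: List[str]) -> List[str]:
    """LM tokens: each matched chunk becomes one concept token like <origin>."""
    return [f"<{c}>" if c is not None else v for c, v in segment(words)]
```

A recognition error such as a repeated word only produces an extra unmatched token, so the spotted chunks, and therefore the semantic frame, are unaffected.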
The last topic is an interactive open-vocabulary Chinese name input system with an error correction mechanism. The motivation came from experience with the 104 directory-assistance service of Chunghwa Telecom, the largest commercial telephony service in Taiwan. It has the largest consumer base and is used frequently by telephone users, yet its function is simple: providing the telephone number of a person, a company, or a branch of a company. The difficulty of open-vocabulary Chinese name input lies in its huge vocabulary. For example, from a very short utterance, typically less than two seconds of speech, the system must recognize the target name among a billion names. It is unrealistic to expect high recognition accuracy from speech recognition alone. The experimental system therefore aims for an intelligent and user-friendly dialogue strategy that incorporates error correction to achieve a reasonably high success rate. In actual 104-service interactions, a human operator may ask the caller to describe ambiguous characters again. Accordingly, both character confirmation and character input mechanisms were designed into the experimental system, which achieved a success rate of 86.7%.
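The character confirmation and character input strategy can be sketched as a simple dialogue loop. This is a hypothetical illustration, not the experimental system itself: `nbest` holds per-character recognition candidates, `confirm` and `describe` are assumed callbacks standing in for the caller's spoken replies, and `DESCRIPTIONS` is a tiny illustrative lexicon of the familiar way Chinese speakers describe a character by its components or a common phrase (for example 耳東陳 for 陳). A real system would recognize the description with its own grammar and would need a fallback when both steps fail.

```python
from typing import Callable, Dict, List, Optional

# Illustrative description lexicon: spoken character description -> character.
DESCRIPTIONS: Dict[str, str] = {
    "耳東陳": "陳",
    "弓長張": "張",
    "雙木林": "林",
    "口天吳": "吳",
    "日月明": "明",
}


def input_name(
    nbest: List[List[str]],
    confirm: Callable[[int, str], bool],
    describe: Callable[[int], str],
    max_confirm: int = 2,
) -> Optional[str]:
    """Build a name character by character with confirmation and description fallback."""
    name: List[str] = []
    for pos, candidates in enumerate(nbest):
        chosen: Optional[str] = None
        # Character confirmation: offer the top candidates one at a time.
        for cand in candidates[:max_confirm]:
            if confirm(pos, cand):
                chosen = cand
                break
        # Character input: ask the caller to describe the character instead.
        if chosen is None:
            chosen = DESCRIPTIONS.get(describe(pos))
        if chosen is None:
            return None  # both steps failed; e.g. hand over to a human operator
        name.append(chosen)
    return "".join(name)
```

For instance, if the third character's candidates are 名 and 鳴 and the caller rejects both but then says 日月明, the loop resolves that position to 明.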
This dissertation presents several technologies for improving Chinese spoken dialogue systems, although the first two are also applicable to other languages. Through these research topics, the author aims to better understand spoken dialogue systems and to improve overall system performance. The author hopes to see speech recognition and dialogue system technologies widely and successfully deployed in real-world applications.
Subjects
spoken dialogue systems (語音對話系統)
automatic speech recognition (自動語音辨識)
speaker adaptation (語者調適技術)
speaker identification (語者識別技術)
natural language understanding (自然語言理解)
speech understanding (語音理解)
Chinese name speech input systems
Type
thesis
File(s)
Name
ntu-96-D87942013-1.pdf
Size
23.31 KB
Format
Adobe PDF
Checksum
(MD5):8e06a0cd4a4872d38bbacfbcffc882ed
