A multi-model method for short-utterance speaker recognition
Journal
APSIPA ASC 2011 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011
Pages
857 - 860
Date Issued
2011
Author(s)
Abstract
The length of the test speech strongly influences the performance of GMM-UBM based text-independent speaker recognition systems. When the valid speech is as short as 1 to 5 seconds, performance degrades significantly, because the GMM-UBM based method is a statistical one that depends on sufficient data. Considering that text information can help speaker recognition, a multi-model method is proposed to improve short-utterance speaker recognition (SUSR) in Chinese. Several phoneme class models are built for each speaker to represent different parts of the characteristic space, and the scores of the test data on these models are fused, with the aim of increasing the matching degree between the training models and the test utterance. Experimental results show that the proposed method achieves a relative EER reduction of about 26% compared with the traditional GMM-UBM method.
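The score fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, the phoneme classes, or how frames are assigned to classes, so everything here is an assumption: the function names `gmm_avg_loglik` and `fused_llr` are hypothetical, frames are assumed to be already labeled by phoneme class, each class has a diagonal-covariance speaker GMM and a matching UBM, and the per-class log-likelihood ratios are combined by a frame-count-weighted average.

```python
import numpy as np

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (K,); means and variances: (K, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                      # (T, K, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, K)
    log_weighted = np.log(weights) + log_comp
    # Numerically stable log-sum-exp over the K mixture components.
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.mean())

def fused_llr(frames_by_class, speaker_models, ubm_models):
    """Fuse per-class log-likelihood ratios (assumed rule: frame-count weighting)."""
    total_frames, weighted_sum = 0, 0.0
    for cls, frames in frames_by_class.items():
        n = len(frames)
        if n == 0:
            continue  # this phoneme class does not occur in the short utterance
        llr = (gmm_avg_loglik(frames, *speaker_models[cls])
               - gmm_avg_loglik(frames, *ubm_models[cls]))
        weighted_sum += n * llr
        total_frames += n
    return weighted_sum / total_frames if total_frames else 0.0
```

Each model is passed as a `(weights, means, variances)` tuple keyed by a class label. Weighting by frame count is only one plausible choice: it lets classes that are well represented in a short utterance dominate the decision, whereas the fusion actually used in the paper may differ.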
Event(s)
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011, APSIPA ASC 2011
Other Subjects
Class models; Matching degree; Multi-model method; Speaker recognition; Speaker recognition system; Test data; Text information; Training model; Data processing; Speech recognition
Type
conference paper