Speaker Identification System based on Long Term Average Spectrum and Speech Content Distribution
Date Issued
2012
Date
2012
Author(s)
Tsai, Zong-Syuan
Abstract
Timbre is an important characteristic that lets humans distinguish one speaker from another by voice. This thesis proposes a timbre feature that accounts for both the Long Term Average Spectrum (LTAS) and the speech content distribution, and applies it in a speaker identification system.
LTAS is influenced by both the characteristics of the speaker and the speech content, so the same speaker can produce inconsistent LTAS patterns. This inconsistency directly affects the accuracy of speaker identification that uses LTAS as its feature.
To increase accuracy by accounting for the effect of content, this thesis proposes the idea of Pseudo LTAS. All Taiwanese Mandarin phonemes are analyzed; the influential phonemes are then selected, and their average spectra are stored as the components of the speaker database. When a test speech signal is input, the system recognizes its content and synthesizes, for each individual speaker, a Pseudo LTAS weighted by that content. Because the Pseudo LTAS and the test speech signal share the same content, speaker identification that uses Pseudo LTAS as the decision pattern is expected to be more accurate than identification that uses plain LTAS, which ignores the influence of content.
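The synthesis step described above can be illustrated with a minimal sketch. The abstract does not specify how the content weights are computed or how patterns are compared, so the following assumes the weights are the relative phoneme counts in the recognized content and that the decision uses Euclidean distance; the function names, the toy three-bin spectra, and the speaker labels are all hypothetical.

```python
import math


def synthesize_pseudo_ltas(phoneme_spectra, phoneme_counts):
    """Blend one speaker's per-phoneme average spectra into a pseudo LTAS.

    phoneme_spectra: dict mapping phoneme -> list of spectral values (one per bin).
    phoneme_counts:  dict mapping phoneme -> occurrences in the recognized content.
    Assumption: each phoneme is weighted by its relative count.
    """
    total = sum(phoneme_counts.get(p, 0) for p in phoneme_spectra)
    if total == 0:
        raise ValueError("recognized content shares no phonemes with the database")
    n_bins = len(next(iter(phoneme_spectra.values())))
    pseudo = [0.0] * n_bins
    for phoneme, spectrum in phoneme_spectra.items():
        weight = phoneme_counts.get(phoneme, 0) / total
        for k, value in enumerate(spectrum):
            pseudo[k] += weight * value
    return pseudo


def identify_speaker(test_ltas, database, phoneme_counts):
    """Return the speaker whose pseudo LTAS is closest to the test LTAS.

    Assumption: closeness is measured with Euclidean distance.
    """
    best_name, best_dist = None, math.inf
    for name, spectra in database.items():
        pseudo = synthesize_pseudo_ltas(spectra, phoneme_counts)
        dist = math.sqrt(sum((t - p) ** 2 for t, p in zip(test_ltas, pseudo)))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


# Toy database: two hypothetical speakers, two phonemes, three frequency bins.
database = {
    "speaker_A": {"a": [1.0, 0.0, 0.0], "i": [0.0, 1.0, 0.0]},
    "speaker_B": {"a": [0.0, 0.0, 1.0], "i": [0.0, 1.0, 0.0]},
}
# Recognized content of the test utterance: "a" three times, "i" once.
counts = {"a": 3, "i": 1}
test_ltas = [0.7, 0.3, 0.0]

print(identify_speaker(test_ltas, database, counts))
```

Because each speaker's reference pattern is rebuilt from the content of the test utterance, the comparison is between spectra with matching phoneme mixes, which is the motivation the abstract gives for preferring Pseudo LTAS over plain LTAS.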
The speaker identification system using Pseudo LTAS achieves an accuracy of 94.2%.
Subjects
Long Term Average Spectrum
Spectrum synthesis
Speaker identification
Type
thesis
File(s)
Name
ntu-101-R99921062-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):e3283d3d0a33fba1fcceb5d327d61d98