National Taiwan University (臺灣大學): Graduate Institute of Electrical Engineering (電機工程學研究所)
Contributors: 陳永耀; 蔡宗軒 (Tsai, Zong-Syuan)
Record dates: 2013-03-27, 2018-07-06
Year: 2012
URI: http://ntur.lib.ntu.edu.tw//handle/246246/253885

Title: Speaker Identification System based on Long Term Average Spectrum and Speech Content Distribution (長時間平均頻譜與語音內容分布的語者辨識系統)
Type: thesis
Keywords: Long Term Average Spectrum (長時間平均頻譜); Spectrum synthesis (頻譜合成); Speaker identification (語者辨識)

Abstract:
Timbre is an important characteristic by which humans tell one person's voice from another's, so describing differences in timbre is a key problem in speaker identification. This thesis proposes a timbre feature that accounts for both the Long Term Average Spectrum (LTAS) and the distribution of speech content, and applies this feature to a speaker identification system.

LTAS is shaped by both the speaker's characteristics and the spoken content, so the same speaker does not produce a consistent LTAS pattern. This inconsistency directly lowers the accuracy of speaker identification systems that use LTAS as their feature.

To raise accuracy by accounting for the effect of content, this thesis proposes the idea of a Pseudo LTAS. All Taiwanese Mandarin phonemes are analyzed, the sufficiently influential phonemes are selected, and their average spectra are derived and stored in the speaker database. When a test speech signal is input, the system first recognizes its content and then synthesizes, for each speaker in the database, a Pseudo LTAS weighted by that content. Because the Pseudo LTAS and the average spectrum of the test signal share the same content, speaker identification that uses the Pseudo LTAS as the decision pattern performs better than identification that uses an LTAS ignoring the influence of content.

The speaker identification system using Pseudo LTAS achieves an accuracy of 94.2%.

Size: 3603615 bytes
Format: application/pdf
Language: en-US
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/253885/1/ntu-101-R99921062-1.pdf
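
The Pseudo LTAS procedure described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the dictionary layout of the speaker database, the use of phoneme counts as content weights, and the Euclidean distance used for the decision are all assumptions made for illustration. Phonemes without a stored spectrum are simply skipped, which is also an assumption.

```python
import numpy as np


def synthesize_pseudo_ltas(phoneme_spectra, content_weights):
    """Content-weighted average of one speaker's per-phoneme average spectra.

    phoneme_spectra: dict phoneme -> 1-D spectrum (hypothetical database layout).
    content_weights: dict phoneme -> count or duration recognized in the test
                     signal (assumed output of a separate content recognizer).
    """
    # Use only phonemes that have a stored average spectrum for this speaker.
    shared = [p for p in content_weights if p in phoneme_spectra]
    if not shared:
        raise ValueError("no recognized phoneme has a stored spectrum")
    weights = np.array([content_weights[p] for p in shared], dtype=float)
    if weights.sum() <= 0:
        raise ValueError("content weights must sum to a positive value")
    weights /= weights.sum()
    spectra = np.stack([np.asarray(phoneme_spectra[p], dtype=float) for p in shared])
    return weights @ spectra


def identify_speaker(speaker_db, test_ltas, content_weights):
    """Return the speaker whose Pseudo LTAS is closest to the test LTAS.

    Euclidean distance is an assumed decision rule, chosen for simplicity.
    """
    distances = {
        speaker: float(
            np.linalg.norm(synthesize_pseudo_ltas(spectra, content_weights) - test_ltas)
        )
        for speaker, spectra in speaker_db.items()
    }
    return min(distances, key=distances.get), distances


# Toy data: two speakers, two phonemes, three frequency bins (all made up).
speaker_db = {
    "spk_a": {"a": np.array([1.0, 0.0, 0.0]), "i": np.array([0.0, 1.0, 0.0])},
    "spk_b": {"a": np.array([0.0, 0.0, 1.0]), "i": np.array([0.0, 1.0, 1.0])},
}
content = {"a": 3, "i": 1}           # recognized content of the test utterance
test_ltas = np.array([0.7, 0.3, 0.0])  # average spectrum of the test utterance

best, distances = identify_speaker(speaker_db, test_ltas, content)
print(best)  # spk_a for this toy data
```

The point of the sketch is that each candidate speaker is compared against the test signal under the same content mix, so content differences do not count against the true speaker.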