Title: Implement Mandarin Speech Conversion on Mixed Excitation Linear Prediction (MELP) CODEC
Original title: 中文語音轉換在混合激發線性預測語音編碼器上之實現
Author: 蕭鈞榮 (Hsiao, Chun-Jung)
Contributors: 賴飛羆; National Taiwan University: Graduate Institute of Electrical Engineering
Type: thesis
Issued: 2006
Record dates: 2007-11-26; 2018-07-06
URI: http://ntur.lib.ntu.edu.tw//handle/246246/53088
Language: en-US
Format: application/pdf, 982140 bytes
File: http://ntur.lib.ntu.edu.tw/bitstream/246246/53088/1/ntu-95-P92921005-1.pdf
Keywords: Speech Conversion; Mixed Excitation Linear Prediction; MELP; Mandarin syllable

Abstract (translated from the Chinese):
The main research direction of this thesis is to build a speech conversion method on top of the 2.4 kbps low-bit-rate Mixed Excitation Linear Prediction (MELP) speech coder, so that it can be applied in practice to real-time communication, adding entertainment value and even a confidentiality function. Statistics gathered over a large speech corpus show that, for the same syllable spoken by the same speaker, the first- and second-stage parameters of the four-stage Line Spectrum Frequency (LSF) parameters produced by MELP encoder analysis tend to cluster around a small number of vector index values. This thesis proposes a syllable-based mapping approach that builds a table relating the vocal-tract spectral features of a source speaker to those of a target speaker, in order to reduce the discontinuous speech caused by selecting the wrong syllable. In addition, the pitch periods of the two speakers are adjusted linearly, which modifies the original excitation (residual) signal of the speaker's speech. Simulation results confirm that the source speaker's voice can indeed be converted to sound like the target speaker, and the quality of the synthesized speech is satisfactory.

Abstract (English):
In this work we focus on reusing the parameters of the 2.4 kbps Mixed Excitation Linear Prediction (MELP) voice coder to implement speech conversion from a source speaker to a specified target speaker. Analyzing speech with the MELP algorithm, we found statistically that, for the same phoneme spoken by the same speaker, the first- and second-stage indexes of the MELP four-stage vector-quantized Line Spectral Frequencies (LSF) tend to cluster around certain index values. We propose a method based on Mandarin syllables that builds a mapping table of these indexes between the spectral features of the source and target speakers. To avoid the discontinuous speech caused by syllable mismatches, we propose a new segmentation technique based on feature-vector frames. The pitch periods of the residual signal are also modified using a linear relationship. The simulation results show that the source speaker's voice can be converted to that of the target speaker, and that the quality of the synthesized speech is good.

Table of contents:
Chinese Abstract
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1 INTRODUCTION
  1.1. Motive
  1.2. Background and Related Works
  1.3. Research Methodology
  1.4. Organization
Chapter 2 RELATED TECHNOLOGIES OVERVIEW
  2.1. Fundamental Knowledge of Speech
    2.1.1. Vocal System
    2.1.2. Characteristic of Mandarin Speech
  2.2. Speech Conversion Introduction
  2.3. MELP Speech Coding Basics
    2.3.1. Encoder
    2.3.2. Decoder
Chapter 3 RESEARCH METHODOLOGY
  3.1. Source-Filter Model
    3.1.1. Vocal Tract Filter
    3.1.2. Excitation
  3.2. Method of Spectral Mapping
    3.2.1. Mandarin Syllable
    3.2.2. Dynamic Time Warping
  3.3. Syllable Segments
    3.3.1. Methodology
Chapter 4 SIMULATION AND RESULTS
  4.1. Simulation
    4.1.1. Input Speech Data
    4.1.2. Model Training
    4.1.3. Modified Speech
  4.2. Results
    4.2.1. Mono-syllables
    4.2.2. Continuous sentences
Chapter 5 CONCLUSIONS
REFERENCES
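
Illustrative note on the abstract: the two ideas it describes, a syllable-based mapping table over the first- and second-stage LSF indexes and a linear adjustment of the pitch period, could be sketched roughly as follows. This is only a minimal sketch under stated assumptions and is not the thesis's implementation. The function names, the choice of the most frequent target index pair as the mapped value, and the use of a mean-ratio scale factor for the pitch period are all hypothetical choices made for illustration; the record does not specify them.

```python
from collections import Counter
from statistics import mean


def dominant_pair(pairs):
    """Return the most frequent (stage-1, stage-2) LSF index pair in a list."""
    return Counter(pairs).most_common(1)[0][0]


def build_mapping_table(source, target):
    """Build a (syllable, source pair) -> target pair lookup.

    `source` and `target` map a syllable label to the list of
    (stage-1, stage-2) index pairs observed for that speaker.
    Assumption: each source pair maps to the target speaker's most
    frequent pair for the same syllable.
    """
    table = {}
    for syllable, src_pairs in source.items():
        if syllable not in target:
            continue  # no target data for this syllable; leave it unmapped
        tgt_pair = dominant_pair(target[syllable])
        for pair in set(src_pairs):
            table[(syllable, pair)] = tgt_pair
    return table


def convert_frame(table, syllable, pair):
    """Map one frame's index pair; unseen pairs pass through unchanged."""
    return table.get((syllable, pair), pair)


def pitch_scale(source_periods, target_periods):
    """Assumed linear relationship: ratio of the speakers' mean pitch periods."""
    return mean(target_periods) / mean(source_periods)


def convert_pitch(period, scale):
    """Scale a source pitch period toward the target speaker's range."""
    return period * scale


# Toy data with made-up index values, for illustration only.
source = {"ma": [(12, 40), (12, 40), (13, 41)]}
target = {"ma": [(30, 7), (30, 7), (31, 7)]}
table = build_mapping_table(source, target)
scale = pitch_scale([80, 80], [40, 40])
```

With this toy data, `convert_frame(table, "ma", (13, 41))` returns the target speaker's dominant pair for that syllable, and a pair never seen in training is returned unchanged, which is one simple way to avoid substituting an arbitrary index.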