Speech Emotion Recognition Using Bayesian Network and Adaptive Approach Methods

Date Issued
2011
Author(s)
Yu, Chih-Yuan
URI
http://ntur.lib.ntu.edu.tw//handle/246246/250107
Abstract
The objective of this study is to develop an automatic speech emotion recognition method based on a Bayesian network. By computing features that are relevant to emotional speech and comparing them against an emotion database, the speaker's emotional state can be identified. First, we compute statistical features of pitch, frame energy, formants, and mel-frequency cepstral coefficients (MFCCs). We then use the mean value of the neutral emotion in the corpus as a normalization factor for each feature and compute normalized pitch, frame energy, and formant features. Normalization reduces feature differences between speakers.

Each feature has a different ability to discriminate emotions. For example, the normalized pitch mean can separate sad from neutral, while happy and angry behave as a single cluster. Because no single feature clearly separates all four emotions, we recognize them layer by layer using different clusters. We group features with similar discriminative ability and establish the Multi-Layered Bayesian Network (MLBN) method for speech emotion recognition: the features in layer 1 separate two clusters of emotions, the features in layer 2 separate three clusters, and the features without obvious clusters are placed in layer 3 to recognize all four emotions. Since the features are correlated with one another, we further extend MLBN to the Multi-Layered Bayesian Network with Covariance (MLBNC) method, which takes these correlations into account.

The recognition rate degrades when the recognizer's training data do not contain the speaker's emotional speech. We therefore propose adaptive MLBN and MLBNC methods: whenever a recognition result is wrong, the means and standard deviations (or covariances) of the clusters in the MLBN or MLBNC database are adjusted to fit the speaker's actual emotional state.

To verify the proposed methods, we use the German emotional database (EMO-DB) as training and testing data for inside and outside tests of the KNN, SVM, MLBN, and MLBNC recognizers, and we use EMO-DB as training data with the ITRI emotional database as testing data for a cross-corpus test. In the adaptive tests, EMO-DB serves as training data and the ITRI database as adaptation and testing data for the adaptive KNN, MLBN, and MLBNC recognizers. The inside-test recognition rates of MLBN, MLBNC, and Bayesian Decision (BD) are 81.1%, 88.8%, and 70.8%, respectively, which shows that clustering the features layer by layer effectively increases the recognition rate and that results improve further when the relations between features are considered. In the outside test, the recognition rates of KNN, SVM, and MLBN are 78.2%, 89.1%, and 69.9% with the original features and 82.6%, 91.7%, and 77.6% with the normalized features, which shows that normalization reduces speaker differences and improves recognition. When the testing corpus differs from the training corpus, the recognition rates of KNN, SVM, MLBN, and MLBNC are 34.21%, 46.92%, 39.33%, and 52.08%, respectively; every recognizer performs poorly when the speaker's pronunciation or emotional expression differs from the training data.
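To make the normalization and clustering steps above concrete, here is a minimal Python sketch (not the thesis's implementation) that divides a speaker's features by that speaker's neutral-emotion mean and then picks the emotion cluster with the highest diagonal-Gaussian log-likelihood, a single-layer simplification of the layered Bayesian scheme; all function names, feature values, and cluster parameters are hypothetical.

```python
import numpy as np

# Hypothetical illustration: normalize a speaker's features by the mean of
# that speaker's neutral-emotion utterances, then pick the emotion cluster
# with the highest diagonal-Gaussian log-likelihood (a single-layer
# simplification of the layered Bayesian scheme described in the abstract).

def normalize_by_neutral(features, neutral_features):
    """Divide each feature by the speaker's neutral-emotion mean."""
    return features / neutral_features.mean(axis=0)

def gaussian_log_likelihood(x, mean, std):
    """Log-likelihood of x under an independent (diagonal) Gaussian."""
    return np.sum(-0.5 * np.log(2.0 * np.pi * std ** 2)
                  - 0.5 * ((x - mean) / std) ** 2)

def classify(x, clusters):
    """Return the cluster label with the highest log-likelihood."""
    scores = {label: gaussian_log_likelihood(x, p["mean"], p["std"])
              for label, p in clusters.items()}
    return max(scores, key=scores.get)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy feature vectors: [pitch mean, frame energy, first formant].
    neutral_utts = rng.normal([120.0, 0.5, 500.0], [5.0, 0.05, 20.0], size=(10, 3))
    test_utt = np.array([165.0, 0.9, 560.0])   # a more aroused utterance
    x = normalize_by_neutral(test_utt, neutral_utts)

    # Illustrative cluster parameters in the normalized feature space;
    # happy and angry are treated as one cluster, as in the abstract.
    clusters = {
        "neutral":     {"mean": np.array([1.00, 1.0, 1.00]), "std": np.array([0.05, 0.10, 0.05])},
        "sad":         {"mean": np.array([0.90, 0.8, 0.95]), "std": np.array([0.05, 0.10, 0.05])},
        "happy/angry": {"mean": np.array([1.35, 1.8, 1.10]), "std": np.array([0.10, 0.30, 0.08])},
    }
    print(classify(x, clusters))   # expected: "happy/angry"
```

In the full layered method this decision would be made per layer, with later layers splitting the clusters that earlier layers leave merged.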
In the adaptive emotion recognition test, the adaptive KNN method raises the recognition rate from 34.2% to 73.7%, the adaptive MLBN method from 37.8% to 82.4%, and the adaptive MLBNC method from 51.6% to 81.2%; the proposed adaptive MLBN and MLBNC methods therefore outperform adaptive KNN. As the number of adjustments increases, the recognition rate of MLBN rises from 39.3% to 88.9% and that of MLBNC from 52.1% to 90.0%. This shows that the adaptive MLBN and MLBNC methods can reflect the speaker's true emotional state and achieve good recognition results after appropriate adjustment.
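The adaptation step evaluated above can be sketched roughly as follows, assuming a simple running update that pulls the true emotion cluster's mean and standard deviation toward the speaker's normalized feature vector whenever the recognizer is wrong; the update rule and the learning rate are illustrative assumptions, not the thesis's exact procedure.

```python
import numpy as np

# Hypothetical sketch of the adaptation step: when the recognizer's output is
# wrong, pull the true emotion cluster's mean (and its standard deviation)
# toward the speaker's normalized feature vector.  The update rule and the
# learning rate are illustrative assumptions, not the thesis's exact values.

def adapt_cluster(cluster, x, learning_rate=0.2):
    """Shift a cluster's Gaussian parameters toward the corrected sample x."""
    cluster["mean"] = (1 - learning_rate) * cluster["mean"] + learning_rate * x
    deviation = np.abs(x - cluster["mean"])
    cluster["std"] = (1 - learning_rate) * cluster["std"] + learning_rate * deviation
    return cluster

if __name__ == "__main__":
    # Suppose "sad" was the speaker's true emotion but the recognizer said
    # "neutral": adjust the sad cluster toward the misrecognized sample.
    sad = {"mean": np.array([0.90, 0.8, 0.95]), "std": np.array([0.05, 0.10, 0.05])}
    sample = np.array([0.97, 0.92, 0.99])
    sad = adapt_cluster(sad, sample)
    print(sad["mean"], sad["std"])
```

Repeating such an update as more corrected utterances arrive is consistent with the abstract's observation that the recognition rate improves as the number of adjustments grows.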
Subjects
speech emotion recognition
features
normalization
MLBN
MLBNC
adaptive
File(s)
Name
ntu-100-D90543002-1.pdf
Size
23.54 KB
Format
Adobe PDF
Checksum (MD5)
fa7b7c057516fc31fce7267a2130d2de
