鄭士康
National Taiwan University: Graduate Institute of Electrical Engineering
陳錦翰 (Chen, Chin-Han)
2007-11-26
2018-07-06
2007
http://ntur.lib.ntu.edu.tw//handle/246246/53182

When playing music, typical multimedia player software often has built-in visual effects to accompany the music, but these effects are often patterns unrelated to the music. This thesis proposes a multimedia player show that combines vision and hearing. The work is divided into three parts. First, a database containing four hundred photos of travel and natural scenery and two hundred music clips is built, and emotion labels for these data are created over the Internet by about five hundred people. The second part examines how to classify emotion for the two media forms, music and photos, using their low-level features and an SVM as the classification tool to build an emotion classification framework. The last part describes the strategy used to combine the two media and proposes a hierarchical method. In the first stage, a piece of music is analyzed and decomposed into multiple basic units according to a beat tracking algorithm; the system then analyzes the emotion of each music segment, and photos with the corresponding emotion are selected as the candidate database. In the second stage, we convert the problem of combining music and photos into a problem of finding an optimal solution and handle it with the Viterbi algorithm; the spectral centroid and spectral flux of the music and the brightness and contrast of the photos become the conditions for combination. The results of the show indicate that combining the two media evokes more resonance in users.

Media player software often displays visual effects during music playback, but most of these effects are patterns unrelated to the musical content. This thesis presents a media player show that integrates auditory and visual content. The work is divided into three parts. First, a database of 400 travel and nature photos and 200 clips from film soundtracks is constructed, with emotion labels collected over the web from nearly five hundred users; these labels serve as the ground truth. The second part addresses automatic emotion detection for the two kinds of media: digital photos and music are analyzed with low-level features, and an SVM (Support Vector Machine) classifies the emotion of each item. The final part presents a strategy for combining the two media, for which a hierarchical method is proposed. In the first phase, a complete piece of music is analyzed and segmented according to a beat tracking algorithm. Music emotion detection labels each segment, and images with the same emotion become the candidate set. In the second phase, we formulate music and photo alignment as an optimization problem and solve it with a greedy algorithm. The spectral centroid and spectral flux of the music and the color brightness and contrast of the images are the features used for coordination.
Subjective feedback shows that users evaluated the results favorably.

Chapter 1 Introduction
  1.1 Motivation
  1.2 Literature Survey
  1.3 Approach
  1.4 Organization of Thesis
Chapter 2 Background
  2.1 Ground Truth Collection
  2.2 Music Features Extraction
  2.3 Image Features Extraction
  2.4 Emotion Learning
  2.5 Media Composition
Chapter 3 Dataset Collection
  3.1 Emotion Database Creation
    3.1.1 Emotion Checklist
    3.1.2 Labeling Process
  3.2 Gathering Statistics
Chapter 4 Music and Photo Emotion Recognition
  4.1 Music Emotion Detection
    4.1.1 Music features
    4.1.2 Emotion detection
    4.1.3 Results
  4.2 Photo Emotion Detection
    4.2.1 Image features
    4.2.2 Emotion detection
    4.2.3 Results
Chapter 5 Media Player Show Composition
  5.1 Analysis Level
    5.1.1 Music preprocessing
    5.1.2 Photo classification
  5.2 Considered Criteria for Composition
    5.2.1 Problem Formulation
    5.2.2 The consistence between music and photos
    5.2.3 The consistence between photo sequences
  5.3 Optimization method
    5.3.1 Use of the Greedy algorithm
Chapter 6 Results and Evaluation
  6.1 The Show Sequences
  6.2 User Evaluation
  6.3 Future work
Chapter 7 Conclusion

1070746 bytes
application/pdf
en-US
emotion
multimedia
media
以情緒為基礎的多媒體播放展示 (Emotion-based Media Player Show)
thesis
http://ntur.lib.ntu.edu.tw/bitstream/246246/53182/1/ntu-96-R94921040-1.pdf
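
The second-phase alignment named in the abstract (photos matched to music segments by a greedy choice over spectral centroid, spectral flux, brightness, and contrast) can be illustrated with a minimal sketch. This is not the thesis's implementation. The `Segment` and `Photo` types, the pairing of centroid with brightness and flux with contrast, the absolute-difference cost, the assumption that all features are normalized to [0, 1], and the rule that a photo is used at most once are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """A beat-aligned music segment (hypothetical type)."""
    emotion: str     # label from the music emotion classifier
    centroid: float  # spectral centroid, assumed normalized to [0, 1]
    flux: float      # spectral flux, assumed normalized to [0, 1]


@dataclass(frozen=True)
class Photo:
    """A photo with its emotion label and two features (hypothetical type)."""
    name: str
    emotion: str
    brightness: float  # assumed normalized to [0, 1]
    contrast: float    # assumed normalized to [0, 1]


def mismatch(seg: Segment, photo: Photo) -> float:
    # Assumed cost: centroid is compared with brightness, flux with contrast.
    return abs(seg.centroid - photo.brightness) + abs(seg.flux - photo.contrast)


def greedy_align(segments: List[Segment], photos: List[Photo]) -> List[Optional[Photo]]:
    """For each segment in order, pick the cheapest unused same-emotion photo."""
    used = set()
    show: List[Optional[Photo]] = []
    for seg in segments:
        # Candidate set: unused photos whose emotion label matches the segment.
        candidates = [p for p in photos
                      if p.emotion == seg.emotion and p.name not in used]
        if not candidates:
            show.append(None)  # no matching photo left for this segment
            continue
        best = min(candidates, key=lambda p: mismatch(seg, p))
        used.add(best.name)
        show.append(best)
    return show


# Made-up example data, for illustration only.
segments = [
    Segment("happy", 0.8, 0.6),
    Segment("happy", 0.3, 0.2),
    Segment("sad", 0.2, 0.1),
]
photos = [
    Photo("beach.jpg", "happy", 0.9, 0.7),
    Photo("forest.jpg", "happy", 0.4, 0.3),
    Photo("rain.jpg", "sad", 0.2, 0.2),
]
show = greedy_align(segments, photos)
print([p.name if p else None for p in show])
```

A greedy pass commits to the locally cheapest photo for each segment and never revisits earlier choices, so it is fast but does not guarantee the globally lowest total cost. A dynamic-programming method such as the Viterbi algorithm would instead score whole photo sequences, including a transition cost between consecutive photos.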