An Auditory Memory Model That Learns Auditory Expectation (可學習聽覺預期的聽覺記憶模型)

Advisor: 鄭士康
Department: National Taiwan University, Graduate Institute of Electrical Engineering (臺灣大學：電機工程學研究所)
Author: 李務熙 (Li, Wu-Hsi)
Repository record dates: 2007-11-26, 2018-07-06
Year: 2004
Handle: http://ntur.lib.ntu.edu.tw//handle/246246/53579

Abstract:
We propose an unsupervised auditory memory model that learns a basic form of auditory knowledge: “what usually happens in sequence” in an audio signal. The bottom layer of the model is a self-organizing map (SOM) in which each neuron responds to a specific acoustic feature. The input signal is mapped onto this acoustic feature map, so that a series of neurons is activated in sequence. The memory model then gains auditory knowledge indirectly: it observes the map and learns the sequential regularities of the neuron activities. A context buffer keeps track of the most recently activated neurons, and the model combines this context with the statistical regularities it has learned to anticipate the next active neuron. Since each neuron corresponds to a specific acoustic feature, predicting which neuron will be activated amounts to expecting which sound will be heard. We then compare what actually happens with what the model expects: when the model correctly predicts the active neuron, the sound it hears is expected; when the prediction is wrong, the sound is unexpected. Whether the input is expected or unexpected provides cues for further perceptual processing. For example, the model can use this information to segment the signal into sound units.
Moreover, we can use it to roughly estimate the information quantity carried by each short-time frame of the signal. Experiments on speech and music signals are conducted to demonstrate how the model learns to expect what it hears.

Contents:
Acknowledgements
Abstract
Abstract (Chinese)
Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Overview
  1.2 Approach
  1.3 Goal of thesis
  1.4 Organization of Thesis
Chapter 2 Background and Previous Work
  2.1 Self-Organizing Map
    2.1.1 Introduction
    2.1.2 SOM algorithm
    2.1.3 Previous work in audio processing using SOM
    2.1.4 Summary
  2.2 Auditory Perception and Memory
    2.2.1 Human auditory memory
    2.2.2 What drives the unsupervised learning for massive flow of sensory information?
    2.2.3 Sequence learning
    2.2.4 Summary
Chapter 3 Acoustic Feature Map using SOM
  3.1 Introduction
  3.2 Front-end processing
    3.2.1 Front-end processing for music signal
    3.2.2 Front-end processing for speech signal
  3.3 Implementation of SOM
    3.3.1 Parameter setting of SOM
    3.3.2 SOM Training Process
  3.4 Experiment on SOM using synthesized music signal
    3.4.1 Learning process of SOM
Chapter 4 A model that expects what to hear
  4.1 Introduction
  4.2 Sequence Prediction
    4.2.1 Deterministic vs. Probabilistic
    4.2.2 Difficulty of Sequence Prediction
  4.3 Auditory expectation model using lookup table
    4.3.1 Converting into a sequence prediction problem
    4.3.2 Implementation of the lookup table
    4.3.3 Operation flow of expectation model
  4.4 More of the sequence prediction algorithm
    4.4.1 Can we measure the performance of probabilistic sequence prediction?
    4.4.2 The dilemma between learning time and uncertainty
    4.4.3 Improvement of the sequence prediction algorithm
  4.5 Expectation Matching Analysis
    4.5.1 Introduction
    4.5.2 Expected Active Probability
    4.5.3 Expectation Matching Rank
    4.5.4 Short-time Information Quantity
  4.6 Results and Discussions
    4.6.1 The learning process
    4.6.2 The curve of expected active probability
    4.6.3 The curve of expectation matching rank
    4.6.4 The short-time information quantity curve
Chapter 5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work
References
Publication

File size: 658911 bytes
Format: application/pdf
Language: en-US
Keywords: self-organizing map (自我組織圖); auditory expectation (聽覺預期); auditory memory (聽覺記憶); redundancy; sequence learning
Chinese title: 可學習聽覺預期的聽覺記憶模型
English title: An SOM-based auditory memory model that learns to perform auditory expectation in an unsupervised manner
Type: thesis
Full text: http://ntur.lib.ntu.edu.tw/bitstream/246246/53579/1/ntu-93-R91921093-1.pdf
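The abstract describes the expectation step in words: a context buffer of recently active SOM neurons, learned sequential regularities, a prediction of the next active neuron, and a comparison with what actually arrives. The following is a minimal Python sketch of that idea under stated assumptions, not the thesis's implementation. It assumes the SOM has already turned each short-time frame into a winning-neuron index, and it makes choices the abstract does not specify: the class and parameter names (`ExpectationModel`, `context_len`) are hypothetical, the lookup table stores counts of the next neuron keyed by the last `context_len` neurons, the expected active probability uses add-one smoothing, the expectation matching rank lets ties share a rank, and the short-time information quantity is taken as -log2 of that probability.

```python
import math
from collections import Counter, defaultdict, deque


class ExpectationModel:
    """Hypothetical lookup-table expectation model over SOM neuron indices.

    Assumption: each input is the index of the winning SOM neuron for one
    short-time frame, in the range [0, num_neurons).
    """

    def __init__(self, num_neurons, context_len=2):
        self.num_neurons = num_neurons
        self.context_len = context_len
        # Lookup table: context tuple -> counts of the neuron that followed it.
        self.table = defaultdict(Counter)
        # Context buffer holding the most recently activated neurons.
        self.buffer = deque(maxlen=context_len)

    def _probability(self, context, neuron):
        # Assumed estimate: add-one (Laplace) smoothing over all neurons, so an
        # unseen neuron still gets a nonzero probability and finite surprise.
        counts = self.table[context]
        total = sum(counts.values())
        return (counts[neuron] + 1) / (total + self.num_neurons)

    def _rank(self, context, neuron):
        # Assumed rank: 1 + number of neurons seen strictly more often after
        # this context (ties share a rank; an unseen neuron ranks last).
        counts = self.table[context]
        return 1 + sum(1 for c in counts.values() if c > counts[neuron])

    def step(self, neuron):
        """Compare the expectation with the observed neuron, then learn from it.

        Returns None while the context buffer is still filling.
        """
        result = None
        if len(self.buffer) == self.context_len:
            context = tuple(self.buffer)
            p = self._probability(context, neuron)
            rank = self._rank(context, neuron)
            result = {
                # "Expected" = among the most frequent continuations seen so far.
                "expected": rank == 1 and self.table[context][neuron] > 0,
                "probability": p,
                "rank": rank,
                "information_bits": -math.log2(p),
            }
            # Learn the observed transition after evaluating the expectation.
            self.table[context][neuron] += 1
        self.buffer.append(neuron)
        return result


if __name__ == "__main__":
    model = ExpectationModel(num_neurons=4, context_len=2)
    for neuron in [0, 1, 2, 3] * 5:  # a repeating pattern of SOM neurons
        model.step(neuron)
    # Context (2, 3) was always followed by 0, so neuron 2 is a surprise.
    print(model.step(2))
```

Feeding a repeating pattern such as `[0, 1, 2, 3] * 5` makes the probability of each correctly predicted neuron rise with every pass, while a neuron that breaks the learned pattern gets a low probability, a rank above 1, and a high information value. Spikes of that kind are the sort of cue the abstract proposes for segmenting the signal into sound units; smoothing is used here only so that the information value stays finite for never-seen transitions.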