College of Electrical Engineering and Computer Science: Graduate Institute of Computer Science and Information Engineering
Advisors: 陳文進; 吳家麟
Author: 蘇兆為 (Su, Chao-Wei)
Dates: 2017-03-03; 2018-07-05
Year: 2016
URL: http://ntur.lib.ntu.edu.tw//handle/246246/275601
Abstract: Music information retrieval (MIR) plays an important role in today’s technology. Many music applications, such as Shazam, Soundhound, Spotify, and Apple Music, rely on techniques such as “query by humming” and “music recommendation”, all of which fall within the MIR domain. One of the most widely studied topics in MIR is source separation of music signals. When a mixed signal can be separated into its original components, the performance of subsequent music recognition, retrieval, and re-creation tasks can be greatly improved. Traditionally, source separation is performed by applying non-negative matrix factorization (NMF) to the short-time Fourier transform (STFT) of the signal. More recent research has shown that using a spectral transform based on the equivalent rectangular bandwidth (ERB) can further improve the performance of NMF. In addition, as computing power has improved, deep learning has achieved outstanding results in machine learning in recent years. Among these techniques, recurrent neural networks (RNNs) work especially well on data with temporal continuity, which matches the temporal characteristics of music signals, and they have therefore begun to be applied to audio source separation. This thesis studies both the older and the newer approach and proposes a structure that combines them.
The results show that, with an appropriate number of training iterations, the combination of the two methods achieves better separation performance and faster convergence.
Thesis access rights: Authorization not granted
Keywords: Music information retrieval; Source separation; Nonnegative matrix factorization; Equivalent rectangular bandwidth; Fourier transform; Deep learning; Neural network; Recurrent neural network
Title: 基於等效矩形頻寬非負矩陣分解與遞歸神經網路的人聲訊號分離方法 (Singing Voice Separation Using Equivalent Rectangular Bandwidth NMF and Recurrent Neural Network)
Type: thesis
DOI: 10.6342/NTU201601491