Singing Voice Separation Using Equivalent Rectangular Bandwidth NMF and Recurrent Neural Network
Date Issued
2016
Date
2016
Author(s)
Su, Chao-Wei
Abstract
Music information retrieval (MIR) plays an important role in today’s society. Many music applications, such as Shazam, SoundHound, Spotify, and Apple Music, rely on techniques such as “query by humming” or “music recommendation”, all of which fall within the MIR domain. One of the most widely discussed topics is source separation of music signals. When a mixed signal can be separated into its constituent components, the performance of music recognition, retrieval, or re-creation can be greatly improved. Traditionally, source separation is done by applying non-negative matrix factorization (NMF) to the short-time Fourier transform (STFT) of the signal. Recent research has shown that using spectra on the equivalent rectangular bandwidth (ERB) scale can further improve the performance of NMF. In addition, as computing power has improved, deep learning techniques have performed well in machine learning research. Among them, the recurrent neural network (RNN) gives better results on data with temporal continuity. This thesis studies both the older and the newer approaches and proposes an integrated structure that combines them. The results show that, within a proper number of iterations, the combination of the two methods achieves better performance and convergence time.
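As a rough illustration of the NMF step mentioned in the abstract, the sketch below factorizes a non-negative matrix V into W (spectral templates) and H (activations) using the standard multiplicative updates for the Euclidean cost. This is not the thesis's implementation: the function name `nmf`, the rank, the iteration count, and the synthetic matrix standing in for an STFT or ERB magnitude spectrogram are all assumptions made for the example.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Hypothetical helper: NMF via multiplicative updates (Euclidean cost).

    V is a non-negative (n_bins x n_frames) matrix, e.g. a magnitude
    spectrogram. Returns W (n_bins x rank) and H (rank x n_frames).
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    # Random non-negative initialization; eps avoids exact zeros.
    W = rng.random((n_bins, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Update activations, then templates; both stay non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic stand-in for a magnitude spectrogram (assumption: 64 bins,
# 100 frames, built from 2 non-negative components).
rng = np.random.default_rng(1)
V = rng.random((64, 2)) @ rng.random((2, 100))

W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In a separation setting, subsets of the columns of W and the matching rows of H would be assigned to each source (for example, voice and accompaniment) and used to reconstruct per-source spectrograms; how the thesis makes that assignment is not stated in this record.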
Subjects
Music information retrieval
Source separation
Nonnegative matrix factorization
Equivalent rectangular bandwidth
Fourier transform
Deep learning
Neural network
Recurrent neural network
Type
thesis
