A Humming Transcription Algorithm Based on Hidden Markov Models
Date Issued
2016
Date
2016
Author(s)
Chen, Yan-Hsing
Abstract
Segmentation and labelling are the core problems in humming transcription. The segmentation stage uses features such as energy, voicing, and abrupt changes in fundamental frequency (F0) to divide the whole song into a sequence of notes with proper boundaries. Because the F0 sequence varies widely and is not in absolute tuning, the labelling stage then assigns each note a pitch label, such as an integer MIDI note number. According to Ryynanen’s classification, hidden Markov models (HMMs) are among the methods that perform these two stages jointly, whereas SPiTH (Molina, 2015) is a cascade system that decides boundaries and pitches sequentially. HMM methods use probability distributions estimated from corpus data to model the conventional syntax of music; SPiTH, taking the view that music is constituted of notes, filters out the unstable pitch changes within each note and thereby obtains better note boundaries. In this paper, we propose a humming transcription system. For segmentation and labelling, interval-based segmentation (SPiTH) first divides the song into a set of notes. Second, an HMM trained on a collected corpus assigns a pitch label to each note. In our experiment, this method achieves a correct-note rate of 55%. The main reason for this advantage lies not in the proper note boundaries but in the prior pitch label: assigning a prior pitch shrinks the unstable pitch changes, which makes the tuning problem (singing out of tune) easier to handle. For evaluation, we collected 140 songs recorded by non-professional users and prepared the ground truth for each song in three steps. First, experts play the song on a MIDI keyboard and record it. Second, the MIDI file is aligned to the WAV file using the dynamic time warping (DTW) algorithm. Finally, an expert manually corrects the remaining errors. During ground-truth preparation, the pitch differences between the MIDI and WAV files highlight the tuning problem.
After reviewing the related literature, we also propose a principle for correcting pitch that is based on the difference in tolerance between singer and listener and on the error-propagation phenomenon.
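The labelling stage described in the abstract can be illustrated with a small sketch. This is not the thesis's model: it assumes that segmentation has already produced one list of voiced F0 values (in Hz) per note, and it replaces the corpus-trained HMM with a hand-written Gaussian emission score and a simple interval penalty. The function names `hz_to_midi` and `label_notes` and the parameters `sigma` and `tau` are hypothetical, chosen only for illustration.

```python
import math
from statistics import median


def hz_to_midi(f0_hz):
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def label_notes(segments, sigma=0.5, tau=4.0):
    """Viterbi-decode one integer MIDI label per note segment.

    segments: non-empty list of non-empty lists of voiced F0 values in Hz.
    sigma:    ASSUMED std. dev. (semitones) of sung pitch around its label.
    tau:      ASSUMED scale (semitones) of the interval penalty; a stand-in
              for transition probabilities that would be trained on a corpus.
    """
    # One observation per note: the median pitch of its frames, in semitones.
    obs = [median(hz_to_midi(f) for f in seg) for seg in segments]

    # Candidate labels: every integer MIDI note near the observed range.
    lo = math.floor(min(obs)) - 2
    hi = math.ceil(max(obs)) + 2
    states = list(range(lo, hi + 1))

    def emit(o, s):
        # Log-score of observing pitch o when the intended note is s.
        return -0.5 * ((o - s) / sigma) ** 2

    def trans(prev, cur):
        # Log-score of moving between labels; small intervals are preferred.
        return -abs(cur - prev) / tau

    score = [emit(obs[0], s) for s in states]
    back = []  # back[t][j] = best predecessor index of state j at note t + 1
    for o in obs[1:]:
        new_score, ptr = [], []
        for s in states:
            best = max(range(len(states)),
                       key=lambda i: score[i] + trans(states[i], s))
            new_score.append(score[best] + trans(states[best], s) + emit(o, s))
            ptr.append(best)
        score = new_score
        back.append(ptr)

    # Trace the best path backwards from the best final state.
    idx = max(range(len(states)), key=lambda i: score[i])
    path = [idx]
    for ptr in reversed(back):
        idx = ptr[idx]
        path.append(idx)
    path.reverse()
    return [states[i] for i in path]
```

For example, `label_notes([[440.0, 442.0, 438.0], [440.0 * 2 ** (2 / 12)], [440.0 * 2 ** (4 / 12)]])` returns `[69, 71, 73]`. In a trained system, the two scoring functions would be replaced by emission and transition distributions estimated from the corpus.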
Subjects
humming transcription
note segmentation and labelling
HMM
corpus
tuning
Type
thesis
File(s)
Name
ntu-105-R01943140-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):abcf8b8a9776d8d6c3260dd5f37a8152