A Humming Transcription Algorithm Based on Hidden Markov Models

Chen, Yan-Hsing

doi:10.6342/NTU201600513

Date Issued

2016

Author(s)

Chen, Yan-Hsing

DOI

10.6342/NTU201600513

URI

http://ntur.lib.ntu.edu.tw//handle/246246/276720

Abstract

Segmentation and labelling are the core problems in humming transcription. Using features such as energy, voicing, and abrupt changes in fundamental frequency (F0), the segmentation stage divides the whole song into a sequence of notes with proper boundaries. Because the F0 sequence varies widely and is not in absolute tuning, the labelling stage assigns a pitch label, such as an integer MIDI note number, to each note. According to Ryynanen’s classification, hidden Markov models (HMMs) are among the methods that perform these two stages jointly, whereas SPiTH (Molina, 2015) is a cascade system that decides boundaries and pitches sequentially. HMM methods use probability distributions, estimated from corpus data, to model the conventional syntax of music; SPiTH, taking the view that music is constituted of notes, filters out the unstable pitch changes within each note and thereby obtains better note boundaries. In this work we propose a humming transcription system. For segmentation and labelling, the interval-based segmentation of SPiTH first divides the song into a set of notes; an HMM trained on a collected corpus then assigns a pitch label to each note. In our experiments, this method achieves a correct-note rate of 55%. The main reason for this advantage lies not in the note boundaries but in the prior pitch label: assigning a prior pitch shrinks the unstable pitch changes, which makes the tuning problem (singing out of tune) easier to handle. For evaluation, we collected 140 songs recorded by non-professional users and created the ground truth for each song in three steps. First, experts play the song on a MIDI keyboard and record it. Second, the MIDI file is aligned to the WAV file using the dynamic time warping (DTW) algorithm. Finally, an expert manually corrects the remaining errors. While the ground truth is being made, the pitch differences between the MIDI and WAV files highlight the tuning problem.
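The DTW alignment step used for the ground truth can be illustrated with a minimal sketch. It assumes that both the MIDI performance and the sung recording have already been reduced to pitch sequences in semitones; the abstract does not say which features were aligned, so this representation, the absolute-difference cost, and the function name `dtw_path` are illustrative assumptions rather than the thesis's actual procedure.

```python
def dtw_path(ref, query):
    """Align two pitch sequences with classic dynamic time warping.

    Returns (total_cost, path), where path is a list of (ref_index,
    query_index) pairs. The local cost is the absolute pitch difference,
    an assumption made for this sketch.
    """
    if not ref or not query:
        raise ValueError("both sequences must be non-empty")
    n, m = len(ref), len(query)
    inf = float("inf")
    # cost[i][j] = best cost of aligning ref[:i] with query[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - query[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # advance both
                                 cost[i - 1][j],      # advance ref only
                                 cost[i][j - 1])      # advance query only
    # Backtrack from the end to the start along the cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path


# Three reference notes against a longer, frame-level sung sequence.
total, path = dtw_path([60, 62, 64], [60, 60, 62, 64, 64])
```

In this toy case each reference note is matched to the run of sung frames with the same pitch, which is the kind of correspondence an expert would then inspect and correct by hand.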
After reviewing the related literature, we also propose a principle for correcting pitch that is based on the difference in tolerance between singer and listener and on the error-propagation phenomenon.
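The labelling stage described in the abstract, in which an HMM assigns one integer MIDI note number to each pre-segmented note, might be sketched as follows. This is not the thesis's trained model: the Gaussian emission score, the interval-size transition penalty, and the parameters `sigma` and `interval_scale` are hypothetical stand-ins for the corpus-estimated distributions, and `label_notes` is an illustrative name.

```python
import math


def hz_to_midi(f0_hz):
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def median(values):
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else 0.5 * (s[mid - 1] + s[mid])


def label_notes(note_f0_tracks, states=range(48, 85), sigma=0.5, interval_scale=2.0):
    """Viterbi decoding of one integer MIDI label per segmented note.

    note_f0_tracks: one list of F0 values (Hz) per note, from a prior
    segmentation step. Emission and transition scores are hypothetical.
    """
    if not note_f0_tracks:
        return []
    states = list(states)
    # Observation per note: median pitch in fractional semitones.
    obs = [median([hz_to_midi(f) for f in track]) for track in note_f0_tracks]

    def emit(s, o):      # log-likelihood (up to a constant) of pitch o under state s
        return -0.5 * ((o - s) / sigma) ** 2

    def trans(a, b):     # smaller melodic intervals are assumed more likely
        return -abs(b - a) / interval_scale

    score = [emit(s, obs[0]) for s in states]
    back = []
    for o in obs[1:]:
        new_score, ptr = [], []
        for s in states:
            best = max(range(len(states)),
                       key=lambda i: score[i] + trans(states[i], s))
            new_score.append(score[best] + trans(states[best], s) + emit(s, o))
            ptr.append(best)
        score = new_score
        back.append(ptr)
    # Trace the best state sequence backwards.
    j = max(range(len(states)), key=lambda i: score[i])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [states[j] for j in path]


# Three notes sung close to A4, B4, and C5.
labels = label_notes([[438.0, 440.0, 441.0], [493.0, 494.0], [523.0, 523.5]])
```

Because each label is decoded jointly with its neighbours, a note sung slightly off pitch can still be pulled toward the label that best fits the melodic context, which is the role the abstract attributes to the prior pitch label.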

Subjects

humming transcription

note segmentation and labelling

HMM

corpus

tuning

Type

thesis

File(s)

Name

ntu-105-R01943140-1.pdf

Size

23.32 KB

Format

Adobe PDF

Checksum

(MD5):abcf8b8a9776d8d6c3260dd5f37a8152
