Acoustic-Phonetic Likelihood Models for Analysis of Accompanied Singing Audio
Date Issued
2016
Author(s)
Chien, Yu-Ren
Abstract
This dissertation addresses melodic and lyrics analysis of accompanied singing recordings. Central to my approach are likelihood models that integrate acoustic-phonetic knowledge and real-world data. These models are based on a timbral fitness score and a voicing fitness score evaluated for each fundamental frequency (F0) or vowel/voicing candidate. Timbral fitness is measured for the partial amplitudes of an F0 value, with respect to a small set of vocal timbre examples. This F0-specific measurement of timbral fitness depends on an acoustic-phonetic F0 modification of each timbre example, which preserves glottal pulse shape and formant frequencies. In the voicing part of the likelihood models, sinusoids are detected, tracked, and pruned to give loudness values that minimize interference from the accompaniment. A final F0 or syllable estimate is determined by a prior sequential model in addition to the likelihood model. The numerical parameters involved in my approach were optimized on several development sets from different sources before the system was evaluated on multiple test sets separate from these development sets. Controlled experiments show that use of the timbral fitness score accounts for a 13% difference in overall melodic accuracy, and a 7% difference in average normalized lyrics alignment error.
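As a rough illustration of the timbral fitness idea described in the abstract, the following Python sketch scores the partial amplitudes observed at an F0 candidate against a small set of timbre examples. It is a simplified stand-in, not the dissertation's method: the F0 modification here only resamples a linearly interpolated spectral envelope at the new harmonic frequencies (which keeps envelope peaks, and so approximate formant positions, in place) and does not model glottal pulse shape. The function names, the dB representation, the `fmax` cutoff, and the RMS distance are all assumptions made for this example.

```python
import numpy as np

def shift_timbre_example(amps_db, f0_src, f0_new, fmax=5000.0):
    """Resample a partial-amplitude envelope (in dB) at harmonics of f0_new.

    Assumption: the envelope between source partials is linear in dB.
    Envelope peaks stay at the same frequencies, so formant positions
    are approximately preserved while the harmonic spacing changes.
    """
    amps_db = np.asarray(amps_db, dtype=float)
    src_freqs = f0_src * np.arange(1, len(amps_db) + 1)
    # Keep only new harmonics that fall inside the measured envelope range.
    n_new = int(min(fmax, src_freqs[-1]) // f0_new)
    new_freqs = f0_new * np.arange(1, n_new + 1)
    # np.interp holds the endpoint value below the first source partial.
    return np.interp(new_freqs, src_freqs, amps_db)

def timbral_fitness(observed_db, f0, examples):
    """Return a score (higher is better) for an F0 candidate.

    observed_db: partial amplitudes (dB) measured at harmonics of f0.
    examples: list of (amps_db, f0_src) timbre examples.
    The score is the negated RMS dB distance to the closest shifted
    example, after removing the overall level difference.
    """
    observed_db = np.asarray(observed_db, dtype=float)
    best = -np.inf
    for ex_amps_db, ex_f0 in examples:
        ref = shift_timbre_example(ex_amps_db, ex_f0, f0)
        n = min(len(ref), len(observed_db))
        if n == 0:
            continue
        diff = observed_db[:n] - ref[:n]
        diff = diff - diff.mean()  # ignore overall loudness offset
        best = max(best, -np.sqrt(np.mean(diff ** 2)))
    return best

# Hypothetical timbre example: 10 partials at 100 Hz, falling 1 dB per partial.
example = (-np.arange(1.0, 11.0), 100.0)
shifted = shift_timbre_example(*example, f0_new=200.0)
matching = timbral_fitness(shifted + 3.0, 200.0, [example])   # same shape, louder
flat = timbral_fitness(np.zeros(5), 200.0, [example])         # different shape
```

In this toy case the level-shifted observation scores at the maximum of zero, while the flat spectrum scores lower, which is the kind of contrast a likelihood model could use to rank F0 candidates.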
Subjects
melody extraction
lyrics alignment
singing voice
acoustic phonetics
F0 modification
glottal pulse shape
formant frequency
Type
thesis
File(s)
Name
ntu-105-D98942017-1.pdf
Size
23.32 KB
Format
Adobe PDF
Checksum
(MD5):d12f765ac34f510724dedebac52c3d78
