Title: Supervised dictionary learning for music genre classification
Authors: Yeh, Chin Chia Michael; YI-HSUAN YANG
Type: conference paper
Dates: 2023-10-24; 2012-07-27
ISBN: 9781450313292
DOI: 10.1145/2324796.2324859
Scopus EID: 2-s2.0-84864120028
Handle: https://scholars.lib.ntu.edu.tw/handle/123456789/636481
Scopus API: https://api.elsevier.com/content/abstract/scopus_id/84864120028
Keywords: Dictionary learning | Genre classification | Sparse coding

Abstract: This paper concerns the development of a music codebook for summarizing local feature descriptors computed over time. Compared with a holistic representation, this text-like representation better captures the rich and time-varying information of music. We systematically compare a number of existing codebook generation techniques and also propose a new one that incorporates labeled data in the dictionary learning process. Several aspects of the encoding system, such as local feature extraction and codeword encoding, are also analyzed. Our results demonstrate the superiority of sparsity-enforced dictionary learning over conventional VQ-based or exemplar-based methods. With the new supervised dictionary learning algorithm and the optimal settings inferred from the performance study, we achieve state-of-the-art accuracy of music genre classification using just the log-power spectrogram as the local feature descriptor. The classification accuracies for benchmark datasets GTZAN and ISMIR2004Genre are 84.7% and 90.8%, respectively. Copyright © 2012 ACM.
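The abstract contrasts VQ-based encoding with sparsity-enforced encoding of frame-level descriptors against a codebook. The sketch below illustrates that contrast only; it is not the paper's supervised dictionary learning algorithm. Everything in it is an assumption made for illustration: the random matrix standing in for log-power spectrogram frames, the random unit-norm codebook `D`, the use of ISTA as the sparse solver, the penalty `lam`, and mean pooling over time.

```python
import numpy as np

def vq_encode(X, D):
    """Hard-assign each frame (row of X) to its nearest codeword (row of D)."""
    # Squared Euclidean distance between every frame and every codeword.
    d2 = ((X[:, None, :] - D[None, :, :]) ** 2).sum(axis=2)
    codes = np.zeros((X.shape[0], D.shape[0]))
    codes[np.arange(X.shape[0]), d2.argmin(axis=1)] = 1.0
    return codes

def objective(X, D, A, lam):
    """Lasso objective 0.5 * ||X - A D||_F^2 + lam * ||A||_1."""
    return 0.5 * np.sum((X - A @ D) ** 2) + lam * np.sum(np.abs(A))

def sparse_encode(X, D, lam=0.1, n_iter=200):
    """Minimize the lasso objective over A with ISTA (assumed solver)."""
    # Lipschitz constant of the gradient of the quadratic term.
    L = np.linalg.norm(D @ D.T, 2)
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T          # gradient of the quadratic term
        Z = A - grad / L                  # gradient step
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)  # soft threshold
    return A

def pool(codes):
    """Summarize frame-level codes as one clip-level vector (mean magnitude)."""
    return np.abs(codes).mean(axis=0)

# Hypothetical data: 50 frames of 16-dimensional descriptors, 8 codewords.
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 16))
D = rng.standard_normal((8, 16))
D /= np.linalg.norm(D, axis=1, keepdims=True)  # unit-norm codewords

vq_hist = pool(vq_encode(frames, D))   # histogram of codeword usage
sc_codes = sparse_encode(frames, D)
sc_vec = pool(sc_codes)                # pooled sparse-code magnitudes
```

Either pooled vector could then be passed to a standard classifier. With VQ each frame activates exactly one codeword, whereas the sparse code may combine several codewords per frame, which is the distinction the abstract draws between the two families of methods.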