Title: The Acoustic-Visual Emotion Gaussians Model for Automatic Generation of Music Video
Authors: Ju-Chiang Wang, Yi-Hsuan Yang, I-Hong Jhuo, Yen-Yu Lin, Hsin-Min Wang
Type: conference paper
Date issued: 2012-12-26
Date added to repository: 2023-10-24
ISBN: 9781450310895
Handle: https://scholars.lib.ntu.edu.tw/handle/123456789/636480
DOI: 10.1145/2393347.2396494
Scopus ID: 2-s2.0-84871398713 (https://api.elsevier.com/content/abstract/scopus_id/84871398713)
Keywords: cross-modal media retrieval | emotion recognition
SDGs: SDG4

Abstract: This paper presents a novel content-based system that uses the perceived emotion of multimedia content as a bridge to connect music and video. Specifically, we propose a machine learning framework, called Acoustic-Visual Emotion Gaussians (AVEG), to jointly learn the tripartite relationship among music, video, and emotion from an emotion-annotated corpus of music videos. For a music piece (or a video sequence), the AVEG model is applied to predict its emotion distribution in a stochastic emotion space from the corresponding low-level acoustic (resp. visual) features. Finally, music and video are matched by measuring the similarity between the two corresponding emotion distributions, based on a distance measure such as the KL divergence. © 2012 Authors.
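The abstract's final matching step can be illustrated with a short sketch. The record does not specify the form of the predicted emotion distributions (the full AVEG model may use richer distributions than a single Gaussian), so the sketch below assumes, purely for illustration, that each music piece and video is summarized by one 2D Gaussian over a hypothetical valence-arousal emotion space; the function names `gaussian_kl` and `match_video` are likewise illustrative, not the paper's API. It uses the standard closed-form KL divergence between two multivariate Gaussians and ranks candidate videos by a symmetrized KL distance to the music's distribution.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL(N0 || N1) between two multivariate Gaussians.

    KL = 0.5 * [tr(S1^-1 S0) + (m1-m0)^T S1^-1 (m1-m0) - k + ln(det S1 / det S0)]
    """
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (
        np.trace(cov1_inv @ cov0)
        + diff @ cov1_inv @ diff
        - k
        + np.log(np.linalg.det(cov1) / np.linalg.det(cov0))
    )

def match_video(music_gaussian, video_gaussians):
    """Rank candidate videos by symmetrized KL distance to the music's
    predicted emotion distribution (smallest distance first)."""
    mu_m, cov_m = music_gaussian

    def sym_kl(video):
        mu_v, cov_v = video
        return (gaussian_kl(mu_m, cov_m, mu_v, cov_v)
                + gaussian_kl(mu_v, cov_v, mu_m, cov_m))

    return sorted(range(len(video_gaussians)),
                  key=lambda i: sym_kl(video_gaussians[i]))

# Hypothetical example: one music piece, two candidate videos, each
# represented by (mean, covariance) in a 2D valence-arousal space.
music = (np.array([0.6, 0.3]), np.array([[0.05, 0.01], [0.01, 0.08]]))
videos = [
    (np.array([-0.4, 0.1]), np.eye(2) * 0.10),
    (np.array([0.5, 0.25]), np.eye(2) * 0.06),
]
print(match_video(music, videos))  # -> [1, 0]: the second video matches best
```

The symmetrized sum KL(P||Q) + KL(Q||P) is one common choice when neither distribution is a privileged reference; the paper's actual distance measure may differ.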