EMVGAN: Emotion-Aware Music-Video Common Representation Learning via Generative Adversarial Networks
Journal
MMArt-ACM 2022 - Proceedings of the 2022 International Joint Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia
ISBN
9781450392402
Date Issued
2022-06-27
Abstract
Music can enhance our emotional reactions to videos and images, while videos and images can enrich our emotional response to music. Cross-modal retrieval technology can be used to recommend appropriate music for a given video and vice versa. However, the heterogeneity gap, caused by inconsistent data distributions across modalities, makes it difficult to learn a common representation space shared by those modalities. Accordingly, we propose an emotion-aware music-video cross-modal generative adversarial network (EMVGAN) model that builds an affective common embedding space to bridge the heterogeneity gap between data modalities. Evaluation results show that EMVGAN learns affective common representations with convincing performance and outperforms other existing models. Encouraged by this performance, we further applied the proposed network to the bidirectional music-video retrieval task.
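The abstract does not say how retrieval is carried out once the common space is learned. The following is a minimal sketch that assumes music and video items have already been projected into a shared embedding space and that candidates are ranked by cosine similarity. Cosine ranking is a common generic choice and is not confirmed as EMVGAN's procedure. The helper names `cosine` and `rank` and the toy vectors are hypothetical. The sketch shows that a single shared space supports both retrieval directions (video to music and music to video) with the same ranking function.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors in the (assumed) shared space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank(query: Sequence[float], gallery: List[Sequence[float]]) -> List[int]:
    """Return gallery indices ordered from most to least similar to the query."""
    scores = [cosine(query, item) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy embeddings, assumed to come from some trained encoder that
# maps both modalities into one affective space (not EMVGAN's real outputs).
music_emb = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.6]]
video_emb = [[1.0, 0.0], [0.0, 1.0]]

# Video -> music: rank music tracks for the first video clip.
video_to_music = rank(video_emb[0], music_emb)
# Music -> video: rank video clips for the second music track.
music_to_video = rank(music_emb[1], video_emb)

print(video_to_music, music_to_video)  # prints "[0, 2, 1] [1, 0]"
```

Because both modalities live in one space, the same `rank` call serves either direction; only the roles of query and gallery swap.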
Subjects
common representation learning | cross-modal adversarial mechanism | cross-modal retrieval | emotion recognition | generative adversarial network
Type
conference paper