Title: Music driven human motion manipulation for characters in a video
Authors: Che-Hua Yeh; Yi-Hsuan Yang; Ming-Hsu Chang; Hong-Yuan Mark Liao
Type: conference paper
Date issued: 2015-02-05
Date accessioned: 2023-10-20
ISBN: 9781479943111
Handle: https://scholars.lib.ntu.edu.tw/handle/123456789/636331
DOI: 10.1109/ISM.2014.31
Scopus ID: 2-s2.0-84930453546 (https://api.elsevier.com/content/abstract/scopus_id/84930453546)
Keywords: Motion manipulation | music conducting | music emotion | robotics
SDGs: SDG16

Abstract: Multimedia content creation and manipulation have recently attracted growing attention owing to the demand for personalization. As a content-production application, we propose a novel system that fuses video and audio intelligence. The system comprises at least three core techniques: 1) the capability to process a video sequence to access the geometric and appearance information of meaningful, representative targets; 2) a systematic way to reliably classify and identify important emotions in the music; and 3) effective approaches to manipulate the video targets according to the extracted music emotions. In this paper, we report preliminary results of the proposed system. Specifically, we introduce the framework employed to manipulate the magnitude and speed of music-conducting gestures in a human-skeleton video sequence according to the emotion intensity and tempo of an arbitrary music excerpt, using state-of-the-art inverse kinematics and music information retrieval techniques. We present the details of the prototype system and validate its effectiveness with a video demonstrating how the music-conducting gestures can be manipulated according to the proposed rules.