Seethevoice: Learning from Music to Visual Storytelling of Shots
Journal
Proceedings - IEEE International Conference on Multimedia and Expo
Journal Volume
2018-July
ISBN
9781538617373
Date Issued
2018-10-08
Author(s)
Wei, Wen-Li
Lin, Jen-Chun
Liu, Tyng-Luh
Wang, Hsin-Min
Tyan, Hsiao-Rong
Liao, Hong-Yuan Mark
Abstract
Types of shots in the language of film are considered the key elements used by a director for visual storytelling. In filming a musical performance, manipulating shots can create desired effects such as conveying emotion or deepening the atmosphere. However, while this visual storytelling technique is often employed in creating professional recordings of a live concert, audience recordings of the same event usually lack such sophisticated manipulation. Thus, it would be useful to have a versatile system that can perform video mashup to create a refined video from such amateur clips. To this end, we propose to translate the music into a near-professional shot (type) sequence by learning the relation between music and the visual storytelling of shots. The resulting shot sequence can then be used to better portray the visual storytelling of a song and to guide the concert video mashup process. Our method introduces a novel probabilistic fusion approach, named multi-resolution fused recurrent neural networks (MF-RNNs) with film-language, which integrates multi-resolution fused RNNs and a film-language model to boost translation performance. The results of objective and subjective experiments demonstrate that MF-RNNs with film-language can generate an appealing shot sequence that provides a better viewing experience.
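To make the abstract's architecture concrete, the following is a minimal sketch (not the authors' released code) of the general idea: two RNN branches over audio features extracted at a coarse and a fine temporal resolution, a probabilistic fusion of their shot-type posteriors, and a bigram "film-language" transition prior applied when decoding the shot sequence. The feature dimensions, number of shot types, fusion rule, and greedy decoder are all assumptions made for illustration.

```python
# Hypothetical sketch of multi-resolution fusion with a shot-transition
# ("film-language") prior; dimensions and shot-type inventory are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_SHOT_TYPES = 5  # assumed, e.g. close-up, medium, long, ...

class ShotRNN(nn.Module):
    """One resolution branch: audio features -> per-segment shot-type log posteriors."""
    def __init__(self, feat_dim, hidden=128, num_types=NUM_SHOT_TYPES):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_types)

    def forward(self, x):  # x: (batch, time, feat_dim)
        h, _ = self.rnn(x)
        return F.log_softmax(self.head(h), dim=-1)

class MultiResolutionFusion(nn.Module):
    """Fuse shot-type posteriors from a coarse- and a fine-resolution branch."""
    def __init__(self, coarse_dim, fine_dim):
        super().__init__()
        self.coarse = ShotRNN(coarse_dim)
        self.fine = ShotRNN(fine_dim)

    def forward(self, x_coarse, x_fine):
        # Assumes both branches are aligned to the same number of segments.
        log_p = self.coarse(x_coarse) + self.fine(x_fine)  # product-of-experts style fusion
        return log_p - torch.logsumexp(log_p, dim=-1, keepdim=True)

def decode_with_film_language(log_post, log_trans):
    """Greedy left-to-right decoding with a bigram shot-transition prior.

    log_post:  (time, num_types) fused log posteriors for one song
    log_trans: (num_types, num_types) log probabilities of shot-type transitions
    """
    seq = [int(log_post[0].argmax())]
    for t in range(1, log_post.size(0)):
        scores = log_post[t] + log_trans[seq[-1]]
        seq.append(int(scores.argmax()))
    return seq
```

In this sketch the film-language model simply re-weights each segment's fused posterior by how plausible the transition from the previous shot type is; a sequence-level decoder such as Viterbi could be substituted for the greedy loop.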
Subjects
Language of film | live concert | recurrent neural networks | types of shots
Type
conference paper
