https://scholars.lib.ntu.edu.tw/handle/123456789/581258
Title: Diverse Audio-to-Image Generation via Semantics and Feature Consistency
Authors: Yang, P.-T.; Su, F.-G.; Wang, Yu-Chiang
Keywords: Audition; Semantics; Adversarial networks; Feature consistency; Generative model; Image generations; Image synthesis; Natural languages; State-of-the-art approach; Visual qualities; Image processing
Date: 2020
Pages: 1188-1192
Source publication: 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Proceedings
Abstract: Humans are capable of imagining scene images when hearing ambient sounds. Audio-to-image synthesis is therefore a challenging yet practical topic for both natural language comprehension and image content understanding. In this paper, we propose an audio-to-image generation network based on conditional generative adversarial networks. Specifically, we train such generative models with the proposed feature consistency and conditional adversarial losses, so that diverse image outputs with satisfactory visual quality can be synthesized from a single audio input. Experimental results on sports audio/visual data verify the effectiveness and practicality of the proposed method over state-of-the-art approaches to audio-to-image synthesis. © 2020 APSIPA.
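The abstract describes a conditional GAN whose generator is trained with a conditional adversarial loss plus a feature consistency loss, with diversity obtained by sampling different noise codes for the same audio input. The following is a minimal numpy sketch of that training objective, not the paper's implementation: the linear "networks", dimensions, and the loss weight `lam` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not taken from the paper).
AUDIO_DIM, NOISE_DIM, IMG_DIM = 8, 4, 16

# Toy linear stand-ins for the generator G, an image feature
# encoder E, and the conditional discriminator D.
W_g = rng.normal(size=(AUDIO_DIM + NOISE_DIM, IMG_DIM)) * 0.1
W_e = rng.normal(size=(IMG_DIM, AUDIO_DIM)) * 0.1
W_d = rng.normal(size=(IMG_DIM + AUDIO_DIM, 1)) * 0.1

def generate(audio_feat, z):
    """G: synthesize an image from an audio feature and a noise code z."""
    return np.tanh(np.concatenate([audio_feat, z]) @ W_g)

def encode(image):
    """E: extract a semantic feature from a generated image."""
    return image @ W_e

def discriminate(image, audio_feat):
    """D: real/fake score for an (image, audio) pair, conditioned on audio."""
    logit = np.concatenate([image, audio_feat]) @ W_d
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid

def generator_loss(audio_feat, z, lam=1.0):
    """Conditional adversarial loss plus feature consistency loss."""
    fake = generate(audio_feat, z)
    adv = -np.log(discriminate(fake, audio_feat) + 1e-8).item()
    # Feature consistency: the generated image's features should
    # stay close to the conditioning audio semantics.
    consistency = np.sum((encode(fake) - audio_feat) ** 2)
    return adv + lam * consistency

audio = rng.normal(size=AUDIO_DIM)
# Different noise codes z give diverse images for one audio input.
img1 = generate(audio, rng.normal(size=NOISE_DIM))
img2 = generate(audio, rng.normal(size=NOISE_DIM))
print(np.allclose(img1, img2))  # False: outputs differ across z
```

The consistency term is what lets a single audio clip drive many plausible images: the adversarial loss enforces visual realism, while the feature match keeps every sample semantically tied to the sound.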
URI: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85100938113&partnerID=40&md5=6f9148240dea0eb3ece9abe07fad52b6 https://scholars.lib.ntu.edu.tw/handle/123456789/581258 |
Appears in: Department of Electrical Engineering
Documents in this IR system are protected by copyright, with all rights reserved, unless their copyright terms state otherwise.