Face-based Voice Conversion: Learning the Voice behind a Face

Lu, Hsiao Han; Weng, Shao En; Yen, Ya Fan; Shuai, Hong Han; Cheng, Wen-Huang

doi:10.1145/3474085.3475198

Conference

MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia

ISBN

9781450386517

Date Issued

2021-10-17

Author(s)

Lu, Hsiao Han

Weng, Shao En

Yen, Ya Fan

Shuai, Hong Han

Cheng, Wen-Huang

DOI

10.1145/3474085.3475198

URI

https://scholars.lib.ntu.edu.tw/handle/123456789/628535

URL

https://api.elsevier.com/content/abstract/scopus_id/85119353981

Abstract

Zero-shot voice conversion (VC) trained on non-parallel data has attracted considerable attention in recent years. Previous methods usually extract speaker embeddings from audio and use them to convert voices into different voice styles. Since human faces and voices are strongly related, a promising approach is to synthesize voice characteristics from a face representation. We therefore introduce the novel idea of generating different voice styles from different human face photos, which can enable new applications such as personalized voice assistants. However, the audio-visual relationship is implicit. Moreover, existing VC methods are trained on laboratory-collected datasets that contain no speaker photos, whereas datasets that contain both photos and audio are collected in the wild. Directly replacing the target audio with the target photo and training on an in-the-wild dataset leads to noisy results. To address these issues, we propose a novel many-to-many voice conversion network, Face-based Voice Conversion (FaceVC), trained with a three-stage strategy. Quantitative and qualitative experiments on the LRS3-Ted dataset show that FaceVC successfully converts voices according to the target face photos. Audio samples are available on the demo website at https://facevc.github.io/.

Subjects

face-voice relationship | visual-audio generation | voice conversion

Type

conference paper