VQVC+: One-shot voice conversion by vector quantization and U-Net architecture

Wu D.-Y; Chen Y.-H; Lee H.-Y.

doi:10.21437/Interspeech.2020-1443

Journal

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Journal Volume

2020-October

Pages

4691-4695

Date Issued

2020

Author(s)

Wu D.-Y

Chen Y.-H

HUNG-YI LEE

DOI

10.21437/Interspeech.2020-1443

URI

https://www.scopus.com/inward/record.uri?eid=2-s2.0-85098233777&doi=10.21437%2fInterspeech.2020-1443&partnerID=40&md5=d336f3475206a343b83f5079ac99413c

https://scholars.lib.ntu.edu.tw/handle/123456789/580916

Abstract

Voice conversion (VC) is the task of transforming a source speaker's timbre, accent, and tone in audio into those of another speaker while preserving the linguistic content. It remains a challenging task, especially in the one-shot setting. Auto-encoder-based VC methods disentangle the speaker and the content in input speech without explicit information about the speaker's identity, so these methods can generalize to unseen speakers. The disentanglement capability is achieved by vector quantization (VQ), adversarial training, or instance normalization (IN). However, imperfect disentanglement may harm the quality of the output speech. In this work, to further improve audio quality, we use the U-Net architecture within an auto-encoder-based VC system. We find that a strong information bottleneck is necessary to leverage the U-Net architecture. The VQ-based method, which quantizes the latent vectors, can serve this purpose. Objective and subjective evaluations show that the proposed method performs well in both audio naturalness and speaker similarity. Copyright © 2020 ISCA
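
The quantization step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it only shows the generic idea of replacing each latent vector with its nearest codebook entry, which is what makes VQ act as an information bottleneck. The function names, the codebook, and the latent values below are all hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    latents: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Map each latent vector to its nearest codebook entry.

    Returns the chosen codebook indices and the quantized vectors.
    Only the index survives the bottleneck, so fine-grained detail
    in the latent is discarded.
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for z in latents:
        # Pick the codebook entry with the smallest squared distance to z.
        idx = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        indices.append(idx)
        quantized.append(list(codebook[idx]))
    return indices, quantized


# Hypothetical toy data: a 3-entry codebook of 2-dimensional vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7]]

indices, quantized = quantize(latents, codebook)
print(indices)    # [0, 1, 2]
print(quantized)  # [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
```

In a trained system the codebook would be learned and the latents would come from an encoder; here both are fixed by hand so the nearest-neighbor lookup can be checked by inspection.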

Subjects

Architecture; Learning systems; Signal encoding; Speech communication; Audio quality; Auto encoders; Explicit information; Information bottleneck; Latent vectors; NET architecture; Subjective evaluations; Voice conversion; Vector quantization

SDGs

SDG4

Type

conference paper
