One-Shot Voice Conversion by Vector Quantization

Authors: Wu, D.-Y.; Hung-Yi Lee
Year: 2020
Record date: 2021-05-05
Type: conference paper
ISSN: 15206149
DOI: 10.1109/ICASSP40776.2020.9053854
Scopus EID: 2-s2.0-85089227176
Scopus: https://www.scopus.com/inward/record.url?eid=2-s2.0-85089227176&partnerID=40&md5=250cdc69c2b513111c3181093ce099d7
Handle: https://scholars.lib.ntu.edu.tw/handle/123456789/558970
Keywords: disentangled representations; vector quantization; voice conversion
SDGs: SDG10

Abstract: In this paper, we propose a vector quantization (VQ) based one-shot voice conversion (VC) approach that requires no supervision from speaker labels. We model the content embedding as a series of discrete codes and take the difference between the vectors before and after quantization as the speaker embedding. We show that this approach has a strong ability to disentangle content and speaker information using a reconstruction loss only, and one-shot VC is thus achieved. © 2020 IEEE.
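
The decomposition described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the encoder and decoder are omitted, random arrays stand in for encoder outputs, and the names `quantize` and `split_content_speaker` are hypothetical. Averaging the residual over time and recombining content and speaker by addition are assumptions made for illustration only.

```python
import numpy as np

def quantize(frames, codebook):
    """Map each frame to its nearest codebook entry (Euclidean distance)."""
    # dists[t, k] = squared distance between frame t and code k
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    codes = dists.argmin(axis=1)      # discrete content codes, one per frame
    return codes, codebook[codes]     # indices and the quantized vectors

def split_content_speaker(frames, codebook):
    """Split frames into quantized content and a residual-based speaker vector."""
    codes, content = quantize(frames, codebook)
    residual = frames - content       # vector before minus vector after quantization
    speaker = residual.mean(axis=0)   # assumption: one vector per utterance via time average
    return codes, content, speaker

# Random stand-ins for encoder outputs (hypothetical sizes: 8 codes, 4 dimensions).
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))
source = rng.normal(size=(20, 4))     # utterance whose content is kept
target = rng.normal(size=(15, 4))     # single utterance from the target speaker

src_codes, src_content, _ = split_content_speaker(source, codebook)
_, _, tgt_speaker = split_content_speaker(target, codebook)

# Assumption: content and speaker are recombined by addition before decoding.
converted_latent = src_content + tgt_speaker
```

In a full system, `converted_latent` would be passed to a decoder trained with the reconstruction loss; only one target utterance is needed, which is what makes the conversion one-shot.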