https://scholars.lib.ntu.edu.tw/handle/123456789/633658
Title: | DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores | Authors: | Tseng, Wei Cheng; Kao, Wei Tsung; Lee, Hung-Yi |
Keywords: | MOS prediction | self-supervised learning | Issue Date: | 1-Jan-2022 | Journal Volume: | 2022-September | Source: | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | Abstract: | Mean opinion score (MOS) is a typical subjective evaluation metric for speech synthesis systems. Since collecting MOS is time-consuming, accurate MOS prediction models for automatic evaluation are desirable. In this work, we propose DDOS, a novel MOS prediction model. DDOS uses domain-adaptive pre-training to further pre-train self-supervised learning models on synthetic speech. A proposed module is added to model the opinion score distribution of each utterance. With the proposed components, DDOS outperforms previous works on the BVCC dataset, and the zero-shot transfer result on the BC2019 dataset is significantly improved. DDOS also won second place in the Interspeech 2022 VoiceMOS Challenge in terms of system-level score. |
URI: | https://scholars.lib.ntu.edu.tw/handle/123456789/633658 | ISSN: | 2308-457X | DOI: | 10.21437/Interspeech.2022-11247 |
Appears in Collections: | Department of Electrical Engineering |
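
The abstract refers to the opinion score distribution of each utterance and to a system-level score. This record does not describe how DDOS computes either, so the following is only a minimal sketch of how those quantities relate to MOS in general, not the authors' implementation. It assumes a 1 to 5 rating scale, and the function names (`score_distribution`, `expected_mos`, `system_level_mos`) are hypothetical.

```python
from statistics import mean

# Assumption: listeners rate each utterance on an integer scale from 1 to 5.
SCORES = (1, 2, 3, 4, 5)


def score_distribution(ratings):
    """Return the empirical distribution of ratings over SCORES."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in SCORES for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    total = len(ratings)
    return [sum(1 for r in ratings if r == s) / total for s in SCORES]


def expected_mos(distribution):
    """Mean opinion score as the expectation of a score distribution."""
    return sum(s * p for s, p in zip(SCORES, distribution))


def system_level_mos(utterance_mos_by_system):
    """Average utterance-level MOS values per synthesis system."""
    return {
        system: mean(values)
        for system, values in utterance_mos_by_system.items()
    }


if __name__ == "__main__":
    dist = score_distribution([4, 4, 5, 3])  # four listeners, one utterance
    print(dist, expected_mos(dist))
    print(system_level_mos({"sys_a": [3.0, 4.0], "sys_b": [2.5]}))
```

A predictor that outputs a distribution over the five scores can be reduced to a single MOS estimate with `expected_mos`, while the distribution itself retains how much the listeners disagreed.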