Title: DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores
Authors: Tseng, Wei Cheng; Kao, Wei Tsung; Lee, Hung-Yi
Type: conference paper
Dates: 2023-07-17; 2023-07-17; 2022-01-01
ISSN: 2308457X
Handle: https://scholars.lib.ntu.edu.tw/handle/123456789/633658
DOI: 10.21437/Interspeech.2022-11247
Scopus ID: 2-s2.0-85140066920
Scopus URL: https://api.elsevier.com/content/abstract/scopus_id/85140066920
Keywords: MOS prediction | self-supervised learning

Abstract: Mean opinion score (MOS) is a typical subjective evaluation metric for speech synthesis systems. Since collecting MOS is time-consuming, accurate MOS prediction models for automatic evaluation are desirable. In this work, we propose DDOS, a novel MOS prediction model. DDOS utilizes domain-adaptive pre-training to further pre-train self-supervised learning models on synthetic speech. A proposed module is added to model the opinion score distribution of each utterance. With the proposed components, DDOS outperforms previous works on the BVCC dataset, and the zero-shot transfer result on the BC2019 dataset is significantly improved. DDOS also won second place in the Interspeech 2022 VoiceMOS Challenge in terms of system-level score.