https://scholars.lib.ntu.edu.tw/handle/123456789/607152
Title: | Utilizing self-supervised representations for MOS prediction | Authors: | Tseng W.-C; Huang C.-Y; Kao W.-T; Lin Y. Y.; HUNG-YI LEE |
Keywords: | MOS prediction;Self-supervised learning;Speech quality assessment;Machine learning;Petroleum reservoir evaluation;Speech communication;Speech processing;Automatic evaluation;Critical issues;Evaluation approach;Ground truth data;Human perception;Parallel data;Voice conversion;Forecasting | Issue Date: | 2021 | Volume: | 5 | Pages: | 3521-3525 | Source: | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | Abstract: | Speech quality assessment has been a critical issue in speech processing for decades. Existing automatic evaluations usually require clean references or parallel ground truth data, which is infeasible when the amount of data soars. Subjective tests, on the other hand, do not need any additional clean or parallel data and correlate better with human perception. However, such tests are expensive and time-consuming because crowd work is necessary. It is thus highly desirable to develop an automatic evaluation approach that correlates well with human perception while not requiring ground truth data. In this paper, we use self-supervised pre-trained models for MOS prediction. We show that their representations can distinguish between clean and noisy audio. Then, we fine-tune these pre-trained models, followed by simple linear layers, in an end-to-end manner. The experimental results show that our framework outperforms the two previous state-of-the-art models by a significant margin on Voice Conversion Challenge 2018 and achieves comparable or superior performance on Voice Conversion Challenge 2016. We also conducted an ablation study to further investigate how each module benefits the task. The experiments are implemented with publicly available toolkits and are reproducible. © 2021 ISCA |
URI: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85117824559&doi=10.21437%2fInterspeech.2021-2013&partnerID=40&md5=e7feaac120637d3b7076227417997a18 https://scholars.lib.ntu.edu.tw/handle/123456789/607152 |
ISSN: | 2308457X | DOI: | 10.21437/Interspeech.2021-2013 |
Appears in Collections: | Department of Electrical Engineering |