Tone Labeling by Deep Learning-based Tone Recognizer for Mandarin Speech

Authors: Wu-Hao Li, Chen-Yu Chiang, TE-HSIN LIU
Type: conference paper
Date issued: 2023-10-31
Date added to repository: 2024-07-26
DOI: 10.1109/apsipaasc58517.2023.10317518
Scopus ID: 2-s2.0-85180010068
Handle: https://scholars.lib.ntu.edu.tw/handle/123456789/720012
Scopus record: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85180010068&doi=10.1109%2fAPSIPAASC58517.2023.10317518&partnerID=40&md5=c1092f96e0f13746611270d24226e075
SDGs: SDG4

Abstract:
Tone labeling of tone sandhi and polyphones is crucial when preparing a high-quality speech corpus for building a Mandarin text-to-speech system, because correct tone labels help ensure that the resulting system generates natural prosody. This paper proposes an iterative tone-labeling method built on a deep learning-based tone recognizer. Experimental results showed that the proposed method robustly labeled the tones of tone-sandhi syllables and polyphones in a multi-speaking-rate Mandarin speech corpus. Furthermore, this study found that syllables recognized as tones different from their lexical tones may reflect true tone realizations caused by coarticulation, position in the prosodic structure, and speaking rate. This study also provides a quantitative analysis of the relationship between labeled tones and prosodic structure, whose results conform to characteristics reported in previous linguistic studies.

© 2023 IEEE.
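The abstract does not spell out the iteration, so the following is only a minimal sketch of one generic iterative relabeling loop, under stated assumptions: labels start from lexical tones, a recognizer is retrained on the current labels, and a label is overridden only where the recognizer disagrees with a confidence margin above a threshold, repeating until no label changes. A nearest-centroid classifier stands in for the paper's deep learning-based recognizer; the two-dimensional features (F0 slope, F0 level), the function names `train_centroids`, `predict`, and `iterative_relabel`, the `margin_threshold` value, and the toy data are all hypothetical and not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[float, ...]


def train_centroids(feats: Sequence[Feature], labels: Sequence[int]) -> Dict[int, List[float]]:
    """Hypothetical stand-in for recognizer training: mean feature vector per tone label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(feats, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {t: [s / counts[t] for s in acc] for t, acc in sums.items()}


def predict(centroids: Dict[int, List[float]], x: Feature) -> Tuple[int, float]:
    """Return (closest tone, margin), where margin = 2nd-best minus best squared distance."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)), t) for t, c in centroids.items()
    )
    best_dist, best_tone = ranked[0]
    margin = ranked[1][0] - best_dist if len(ranked) > 1 else float("inf")
    return best_tone, margin


def iterative_relabel(feats, lexical_tones, margin_threshold=0.5, max_iters=10):
    """Start from lexical tones; retrain and relabel confident disagreements until stable."""
    labels = list(lexical_tones)
    for _ in range(max_iters):
        centroids = train_centroids(feats, labels)
        new_labels = []
        for x, y in zip(feats, labels):
            tone, margin = predict(centroids, x)
            # Override the current label only when the recognizer disagrees confidently.
            new_labels.append(tone if tone != y and margin > margin_threshold else y)
        if new_labels == labels:  # converged: no label changed in this pass
            break
        labels = new_labels
    return labels


# Hypothetical 2-D features per syllable: (F0 slope, F0 level), roughly normalized.
feats = [
    (0.0, 1.0), (0.1, 1.0), (-0.1, 1.0),       # tone 1: high level
    (1.0, 0.5), (0.9, 0.4), (1.1, 0.6),        # tone 2: rising
    (-0.2, -1.0), (-0.1, -1.1), (-0.3, -0.9),  # tone 3: low
    (-1.0, 0.5), (-0.9, 0.6), (-1.1, 0.4),     # tone 4: falling
    (1.0, 0.45),  # lexically tone 3, but realized as rising (e.g. tone-3 sandhi)
]
lexical = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 3]
relabeled = iterative_relabel(feats, lexical)
print(relabeled)  # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 2]
```

In this toy run only the last syllable changes, from its lexical tone 3 to a realized tone 2, mirroring the abstract's observation that disagreements with lexical tones can reflect true tone realizations. The margin threshold is a design choice that keeps the recognizer from overriding lexical labels on weak evidence; with a very large threshold the loop returns the lexical tones unchanged.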