Tone Labeling by Deep Learning-based Tone Recognizer for Mandarin Speech
Conference
2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Start Page
873
End Page
880
Date Issued
2023-10-31
Author(s)
Abstract
Tone labeling of tone sandhi and polyphones is crucial when preparing a high-quality speech corpus for building a Mandarin text-to-speech system, because correct tone labels help ensure that the resulting system generates natural prosody. This paper proposes an iterative tone-labeling method that uses a deep learning-based tone recognizer. The experimental results showed that the proposed method robustly labeled the tones of tone-sandhi syllables and polyphones in a Mandarin speech corpus recorded at multiple speaking rates. Furthermore, this study found that syllables recognized as tones different from their lexical tones may reflect actual tone realizations shaped by coarticulation, position in the prosodic structure, and speaking rate. This study also provides a quantitative analysis of the relationship between labeled tones and prosodic structure, whose results conform to characteristics found in previous linguistic studies. © 2023 IEEE.
Publisher
IEEE
Type
conference paper
