A set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese

Chou, F.-C.; Tseng, C.-Y.; Lee, L.-S.

doi:10.1109/TSA.2002.803437

Journal

IEEE Transactions on Speech and Audio Processing

Journal Volume

10

Journal Issue

7

Pages

481-494

Date Issued

2002

Author(s)

Chou, F.-C.

Tseng, C.-Y.

Lee, Lin-Shan

URI

https://scholars.lib.ntu.edu.tw/handle/123456789/498512

https://www.scopus.com/inward/record.uri?eid=2-s2.0-0036816067&doi=10.1109%2fTSA.2002.803437&partnerID=40&md5=5ea5822958e73a1db3355dd586624f4f

Abstract

This paper presents a set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese. A large speech corpus produced by a single speaker is used, and the speech output is synthesized from waveform units of variable lengths, with desired linguistic properties, retrieved from this corpus. Detailed methodologies were developed for designing "phonetically rich" and "prosodically rich" corpora by automatically selecting sentences from a large text corpus to include as many desired phonetic combinations and prosodic features as possible. Automatic phonetic labeling with iterative correction rules and automatic prosodic labeling with a multi-pass top-down procedure were also developed such that the labeling process for the corpora can be completely automatic. Hierarchical prosodic structure for an arbitrary desired text sentence is then generated based on the identification of different levels of break indices, and the prosodic feature sets and appropriate waveform units are finally selected and retrieved from the corpus, modified if necessary, and concatenated to produce the output speech. The special structure of Mandarin Chinese has been carefully considered in all these technologies, and preliminary assessments indicated very encouraging synthesized speech quality.
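
The abstract does not say how sentences are selected for the "phonetically rich" corpus. As an illustration only, the sketch below uses a generic greedy coverage heuristic: at each step it picks the sentence that adds the most units not yet covered. This is an assumption, not the authors' procedure. The function name `select_sentences`, the `budget` parameter, and the toy unit labels are all hypothetical.

```python
from typing import Dict, List, Set


def select_sentences(candidates: Dict[str, Set[str]], budget: int) -> List[str]:
    """Greedily pick up to `budget` sentences that cover the most new units.

    `candidates` maps a sentence id to the set of phonetic units it contains.
    The unit inventory is assumed to be given; it is not taken from the paper.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(candidates)
    while remaining and len(chosen) < budget:
        # Score each sentence by how many not-yet-covered units it would add.
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining sentence adds coverage
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen


# Toy data: sentence id -> hypothetical units (tonal syllable pairs).
toy = {
    "s1": {"ni3-hao3", "hao3-ma5"},
    "s2": {"ni3-hao3"},
    "s3": {"xie4-xie5", "hao3-ma5", "zai4-jian4"},
}
picked = select_sentences(toy, budget=2)
print(picked)
```

A greedy pass like this is a common approximation for coverage problems. A "prosodically rich" selection could, under the same assumption, reuse the function with prosodic feature labels in place of phonetic units.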

Subjects

Automatic labeling; Mandarin Chinese; Prosody; Synthesis; Text-to-speech

SDGs

SDG16

Other Subjects

Acoustic signal processing; Algorithms; Heuristic methods; Knowledge based systems; Linguistics; Natural language processing systems; Speech analysis; Speech recognition; Statistical methods; Text processing; Automatic phonetic labeling; Automatic prosodic labeling; Mandarin Chinese; Prosody; Speech corpus; Text analysis; Speech synthesis

Type

journal article
