OMPAL: Bridging Speech and Learning with an Open-Source Mandarin Pronunciation Assessment Corpus for Global Learners
Journal
Interspeech 2025
Series/Report No.
Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech
Start Page
2415
End Page
2419
ISSN
2308-457X
Date Issued
2025-08-17
Author(s)
Abstract
This paper introduces OMPAL, a new open-source Mandarin corpus designed specifically for non-native pronunciation assessment. The corpus comprises 1,768 Mandarin utterances from French L1 speakers learning Mandarin, each annotated at both the word and sentence levels by four experts with professional Mandarin teaching experience. We also provide a manual scoring system to assist researchers in constructing related corpora, and we release a publicly accessible baseline model for pronunciation assessment alongside the corpus. The OMPAL corpus is available for commercial and non-commercial use and is designed to support speech research across various applications. We believe that OMPAL will be a valuable resource for the speech research community.
Event(s)
26th Interspeech Conference 2025
Subjects
computer-aided pronunciation training (CAPT)
corpus
deep learning
Mandarin
second language (L2)
Publisher
ISCA
Type
conference paper
