Adapting Pretrained Speech Model for Mandarin Lyrics Transcription and Alignment

Wang, Jun You; Leong, Chon In; Lin, Angela Yu-Chen; Su, Li; Jang, Jyh-Shing

doi:10.1109/ASRU57964.2023.10389800

Journal

2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023

ISBN

9798350306897

Date Issued

2023-01-01

Author(s)

Wang, Jun You

Leong, Chon In

Lin, Angela Yu-Chen

Su, Li

Jang, Jyh-Shing

DOI

10.1109/ASRU57964.2023.10389800

URI

https://scholars.lib.ntu.edu.tw/handle/123456789/640004

URL

https://api.elsevier.com/content/abstract/scopus_id/85184668375

Abstract

Automatic lyrics transcription and lyrics alignment have improved significantly in the past few years. However, most previous work focuses only on English, for which large-scale datasets are available. In this paper, we address lyrics transcription and alignment of polyphonic Mandarin pop music in a low-resource setting. To deal with data scarcity, we adapt the pretrained Whisper model and fine-tune it on a monophonic Mandarin singing dataset. With data augmentation and a source separation model, the proposed method achieves a character error rate of less than 18% on a Mandarin polyphonic dataset for lyrics transcription, and a mean absolute error of 0.071 seconds for lyrics alignment. Our results demonstrate the potential of adapting a pretrained speech model for lyrics transcription and alignment in low-resource scenarios.
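
The two metrics named in the abstract, character error rate (CER) and mean absolute error (MAE), can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names are hypothetical, CER is assumed to be the Levenshtein distance divided by the reference length, and MAE is assumed to be taken over paired onset times in seconds. The record does not state the text normalization or the alignment granularity used in the paper, so treat both as assumptions.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # delete a reference symbol
                            curr[j - 1] + 1,       # insert a hypothesis symbol
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters (assumed definition)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def mean_absolute_error(ref_onsets: Sequence[float],
                        pred_onsets: Sequence[float]) -> float:
    """Mean absolute deviation between paired onset times, in seconds."""
    if not ref_onsets or len(ref_onsets) != len(pred_onsets):
        raise ValueError("onset lists must be non-empty and of equal length")
    total = sum(abs(r - p) for r, p in zip(ref_onsets, pred_onsets))
    return total / len(ref_onsets)


if __name__ == "__main__":
    # Toy values for illustration only; they are not results from the paper.
    print(character_error_rate("我愛你", "我愛妳"))
    print(mean_absolute_error([0.0, 1.0, 2.0], [0.1, 0.9, 2.2]))
```

Because Mandarin lyrics are scored per character, a Python string can be passed directly; each Chinese character is one element of the sequence.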

Subjects

automatic lyrics alignment | automatic lyrics transcription | data augmentation | model adaptation

Type

conference paper
