Publication:
On the preparation and validation of a large-scale dataset of singing transcription

Date

2021

Authors

Wang, J.-Y.
Jang, J.-S.R. (Jyh-Shing Jang)

Abstract

This paper proposes a large-scale dataset for singing transcription, along with methods for fine-tuning and validating its contents. The dataset, named MIR-ST500, consists of more than 160,000 notes from 500 pop songs. To create it, we set labeling criteria and ask non-experts to label the notes. We also adjust the annotations to correct minor errors. Finally, to validate the dataset, we train a singing transcription model on MIR-ST500 and evaluate it on various datasets. The results show that MIR-ST500, being properly labeled and validated, can be used to build a better singing transcription model for various purposes. © 2021 IEEE

Keywords

Automatic singing transcription, Dataset preparation, Dataset validation, Music information retrieval, Signal processing, Fine tuning, Large-scale dataset, Large dataset

Citation