Speech-to-singing conversion in an encoder-decoder framework

Parekh, Jayneel; Rao, Preeti; Yang, Yi-Hsuan

doi:10.1109/ICASSP40776.2020.9054473

Journal

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Journal Volume

2020-May

ISBN

9781509066315

Date Issued

2020-05-01

Author(s)

Parekh, Jayneel

Rao, Preeti

Yang, Yi-Hsuan

DOI

10.1109/ICASSP40776.2020.9054473

URI

https://scholars.lib.ntu.edu.tw/handle/123456789/636019

URL

https://api.elsevier.com/content/abstract/scopus_id/85091180372

Abstract

In this paper, our goal is to convert a set of spoken lines into sung ones. Unlike previous signal-processing-based methods, we take a learning-based approach to the problem. This allows us to automatically model various aspects of this transformation, thus overcoming the dependence on specific inputs such as high-quality singing templates or phoneme-score synchronization information. Specifically, we propose an encoder-decoder framework for our task. Given time-frequency representations of speech and a target melody contour, we learn encodings that enable us to synthesize singing that preserves the linguistic content and timbre of the speaker while adhering to the target melody. We also propose an objective based on multi-task learning to improve lyric intelligibility. We present a quantitative and qualitative analysis of our framework.
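The abstract does not spell out the form of the multi-task objective. The following is a minimal sketch under stated assumptions. It treats the main task as spectrogram reconstruction scored with an L1 loss, and the auxiliary task (added for lyric intelligibility) as frame-level phoneme classification scored with softmax cross-entropy. The two terms are combined as a weighted sum. The function names, both loss choices, the auxiliary task itself, and the default `aux_weight` of 0.1 are hypothetical illustrations, not the authors' formulation.

```python
import math
from typing import Sequence

# Hypothetical helpers: the abstract does not specify the loss terms,
# the auxiliary task, or how the terms are weighted.

def l1_reconstruction(pred: Sequence[Sequence[float]],
                      target: Sequence[Sequence[float]]) -> float:
    """Mean absolute error between predicted and target spectrogram frames."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of frames")
    total, count = 0.0, 0
    for p_frame, t_frame in zip(pred, target):
        if len(p_frame) != len(t_frame):
            raise ValueError("frames must have the same number of bins")
        for p, t in zip(p_frame, t_frame):
            total += abs(p - t)
            count += 1
    if count == 0:
        raise ValueError("empty spectrogram")
    return total / count


def cross_entropy(logits: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> float:
    """Mean softmax cross-entropy over frames; labels are class indices."""
    if len(logits) != len(labels) or not labels:
        raise ValueError("need one label per frame and at least one frame")
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[y]
    return total / len(labels)


def multitask_loss(pred_spec, target_spec, phoneme_logits, phoneme_labels,
                   aux_weight: float = 0.1) -> float:
    """Main reconstruction loss plus a weighted auxiliary (assumed)
    phoneme-classification loss."""
    return (l1_reconstruction(pred_spec, target_spec)
            + aux_weight * cross_entropy(phoneme_logits, phoneme_labels))


# Toy usage: two frames of two bins each, two phoneme classes.
loss = multitask_loss([[0.0, 1.0], [2.0, 3.0]], [[0.0, 1.0], [2.0, 2.0]],
                      [[0.0, 0.0], [0.0, 0.0]], [0, 1])
# L1 term = 1/4 = 0.25; cross-entropy term = ln 2 per frame,
# so loss == 0.25 + 0.1 * math.log(2), about 0.3193
```

The weight `aux_weight` trades off how closely the output spectrogram matches the target against how strongly the auxiliary objective pushes the model to keep phonetic content recognizable. In practice such a weight would be tuned on validation data.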

Subjects

Machine learning | Multi-task learning | Speech-to-singing transformation | Style transfer

Type

conference paper