https://scholars.lib.ntu.edu.tw/handle/123456789/580917
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Hsu P.-C. | en_US |
dc.contributor.author | HUNG-YI LEE | en_US |
dc.creator | Hsu P.-C.; Lee H.-Y. | - |
dc.date.accessioned | 2021-09-02T00:05:16Z | - |
dc.date.available | 2021-09-02T00:05:16Z | - |
dc.date.issued | 2020 | - |
dc.identifier.issn | 2308-457X | - |
dc.identifier.uri | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85098206854&doi=10.21437%2fInterspeech.2020-1736&partnerID=40&md5=ea7e26ff29eb02dbc24b649fa4dd0858 | - |
dc.identifier.uri | https://scholars.lib.ntu.edu.tw/handle/123456789/580917 | - |
dc.description.abstract | In this paper, we propose WG-WaveNet, a fast, lightweight, and high-quality waveform generation model. WG-WaveNet is composed of a compact flow-based model and a post-filter. The two components are jointly trained by maximizing the likelihood of the training data and optimizing loss functions in the frequency domain. As we design a flow-based model that is heavily compressed, the proposed model requires much less computational resources compared to other waveform generation models during both training and inference time; even though the model is highly compressed, the post-filter maintains the quality of the generated waveform. Our PyTorch implementation can be trained using less than 8 GB GPU memory and generates audio samples at a rate of more than 960 kHz on an NVIDIA 1080Ti GPU. Furthermore, even if synthesizing on a CPU, we show that the proposed method is capable of generating 44.1 kHz speech waveform 1.2 times faster than real-time. Experiments also show that the quality of the generated audio is comparable to that of other methods. Audio samples are publicly available online. © 2020 International Speech Communication Association. All rights reserved. | - |
dc.relation.ispartof | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.subject | Graphics processing unit; Speech synthesis; Computational resources; Flow-based models; Frequency domains; High-fidelity; Loss functions; Speech waveforms; Training data; Waveform generation; Speech communication | - |
dc.title | WG-WaveNet: Real-time high-fidelity speech synthesis without GPU | en_US |
dc.type | conference paper | en |
dc.identifier.doi | 10.21437/Interspeech.2020-1736 | - |
dc.identifier.scopus | 2-s2.0-85098206854 | - |
dc.relation.pages | 210-214 | - |
dc.relation.journalvolume | 2020-October | - |
item.fulltext | no fulltext | - |
item.openairecristype | http://purl.org/coar/resource_type/c_5794 | - |
item.cerifentitytype | Publications | - |
item.openairetype | conference paper | - |
item.grantfulltext | none | - |
crisitem.author.dept | Electrical Engineering | - |
crisitem.author.dept | Intel-NTU Connected Context Computing Center | - |
crisitem.author.dept | Communication Engineering | - |
crisitem.author.dept | Computer Science and Information Engineering | - |
crisitem.author.dept | Networking and Multimedia | - |
crisitem.author.dept | Center for Artificial Intelligence and Advanced Robotics | - |
crisitem.author.dept | Master's Program in Smart Medicine and Health Informatics (SMARTMHI) | - |
crisitem.author.orcid | 0000-0002-9654-5747 | - |
crisitem.author.parentorg | College of Electrical Engineering and Computer Science | - |
crisitem.author.parentorg | Others: University-Level Research Centers | - |
crisitem.author.parentorg | Others: International Research Centers | - |
crisitem.author.parentorg | College of Electrical Engineering and Computer Science | - |
crisitem.author.parentorg | College of Electrical Engineering and Computer Science | - |
crisitem.author.parentorg | College of Electrical Engineering and Computer Science | - |
crisitem.author.parentorg | Others: University-Level Research Centers | - |
crisitem.author.parentorg | International College | - |
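The abstract notes that the flow-based model and post-filter are trained jointly by maximizing likelihood while "optimizing loss functions in the frequency domain." As an illustration only (the record does not specify the exact loss; the function and FFT settings below are assumptions), a frequency-domain reconstruction loss of this kind can be sketched by comparing magnitude spectrograms of generated and reference waveforms at several STFT resolutions:

```python
import numpy as np

def stft_mag(x, n_fft, hop):
    """Magnitude spectrogram via a Hann-windowed short-time FFT."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

def multi_res_stft_loss(pred, target,
                        resolutions=((512, 128), (1024, 256))):
    """Mean L1 distance between magnitude spectrograms at several
    STFT resolutions -- a common frequency-domain training loss
    (illustrative; not necessarily the paper's exact formulation)."""
    losses = []
    for n_fft, hop in resolutions:
        p = stft_mag(pred, n_fft, hop)
        t = stft_mag(target, n_fft, hop)
        losses.append(np.mean(np.abs(p - t)))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
target = rng.standard_normal(4096)
print(multi_res_stft_loss(target, target))  # identical signals -> 0.0
```

In practice such a spectral loss is added to the flow model's negative log-likelihood, so the heavily compressed flow and the post-filter are optimized together, as the abstract describes.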
Appears in Collections: | Department of Electrical Engineering
Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.