https://scholars.lib.ntu.edu.tw/handle/123456789/580917
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Hsu P.-C. | en_US |
dc.contributor.author | HUNG-YI LEE | en_US |
dc.creator | Hsu P.-C.; Lee H.-Y. | - |
dc.date.accessioned | 2021-09-02T00:05:16Z | - |
dc.date.available | 2021-09-02T00:05:16Z | - |
dc.date.issued | 2020 | - |
dc.identifier.issn | 2308-457X | - |
dc.identifier.uri | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85098206854&doi=10.21437%2fInterspeech.2020-1736&partnerID=40&md5=ea7e26ff29eb02dbc24b649fa4dd0858 | - |
dc.identifier.uri | https://scholars.lib.ntu.edu.tw/handle/123456789/580917 | - |
dc.description.abstract | In this paper, we propose WG-WaveNet, a fast, lightweight, and high-quality waveform generation model. WG-WaveNet is composed of a compact flow-based model and a post-filter. The two components are jointly trained by maximizing the likelihood of the training data and optimizing loss functions in the frequency domain. As we design a flow-based model that is heavily compressed, the proposed model requires much less computational resources compared to other waveform generation models during both training and inference time; even though the model is highly compressed, the post-filter maintains the quality of the generated waveform. Our PyTorch implementation can be trained using less than 8 GB GPU memory and generates audio samples at a rate of more than 960 kHz on an NVIDIA 1080Ti GPU. Furthermore, even if synthesizing on a CPU, we show that the proposed method is capable of generating 44.1 kHz speech waveform 1.2 times faster than real-time. Experiments also show that the quality of the generated audio is comparable to that of other methods. Audio samples are publicly available online. © 2020 International Speech Communication Association. All rights reserved. | - |
dc.relation.ispartof | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.subject | Graphics processing unit; Speech synthesis; Computational resources; Flow-based models; Frequency domains; High-fidelity; Loss functions; Speech waveforms; Training data; Waveform generation; Speech communication | - |
dc.title | WG-WaveNet: Real-time high-fidelity speech synthesis without GPU | en_US |
dc.type | conference paper | en |
dc.identifier.doi | 10.21437/Interspeech.2020-1736 | - |
dc.identifier.scopus | 2-s2.0-85098206854 | - |
dc.relation.pages | 210-214 | - |
dc.relation.journalvolume | 2020-October | - |
item.fulltext | no fulltext | - |
item.openairecristype | http://purl.org/coar/resource_type/c_5794 | - |
item.cerifentitytype | Publications | - |
item.openairetype | conference paper | - |
item.grantfulltext | none | - |
crisitem.author.dept | Electrical Engineering | - |
crisitem.author.dept | Intel-NTU Connected Context Computing Center | - |
crisitem.author.dept | Communication Engineering | - |
crisitem.author.dept | Computer Science and Information Engineering | - |
crisitem.author.dept | Networking and Multimedia | - |
crisitem.author.dept | Center for Artificial Intelligence and Advanced Robotics | - |
crisitem.author.dept | Master's Program in Smart Medicine and Health Informatics (SMARTMHI) | - |
crisitem.author.orcid | 0000-0002-9654-5747 | - |
crisitem.author.parentorg | College of Electrical Engineering and Computer Science | - |
crisitem.author.parentorg | Others: University-Level Research Centers | - |
crisitem.author.parentorg | Others: International Research Centers | - |
crisitem.author.parentorg | College of Electrical Engineering and Computer Science | - |
crisitem.author.parentorg | College of Electrical Engineering and Computer Science | - |
crisitem.author.parentorg | College of Electrical Engineering and Computer Science | - |
crisitem.author.parentorg | Others: University-Level Research Centers | - |
crisitem.author.parentorg | International College | - |
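The abstract notes that the flow-based model and post-filter are trained jointly by maximizing likelihood while "optimizing loss functions in the frequency domain." As an illustration only (the record does not specify the exact loss; the function and FFT settings below are assumptions), a frequency-domain reconstruction loss of this kind can be sketched by comparing magnitude spectrograms of generated and reference waveforms at several STFT resolutions:

```python
import numpy as np

def stft_mag(x, n_fft, hop):
    """Magnitude spectrogram via a Hann-windowed short-time FFT."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

def multi_res_stft_loss(pred, target,
                        resolutions=((512, 128), (1024, 256))):
    """Mean L1 distance between magnitude spectrograms at several
    STFT resolutions -- a common frequency-domain training loss
    (illustrative; not necessarily the paper's exact formulation)."""
    losses = []
    for n_fft, hop in resolutions:
        p = stft_mag(pred, n_fft, hop)
        t = stft_mag(target, n_fft, hop)
        losses.append(np.mean(np.abs(p - t)))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
target = rng.standard_normal(4096)
print(multi_res_stft_loss(target, target))  # identical signals -> 0.0
```

In practice such a spectral loss is added to the flow model's negative log-likelihood, so the heavily compressed flow and the post-filter are optimized together, as the abstract describes.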
Appears in Collections: | Department of Electrical Engineering
Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.