Personalized acoustic modeling by weakly supervised multi-task deep learning using acoustic tokens discovered from unlabeled data

Wei, C.-K.;Chung, C.-T.;Lee, H.-Y.;Lee, L.-S.

doi:10.1109/ICASSP.2017.7953141

Journal

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Pages

5165-5169

Date Issued

2017

Author(s)

Wei, C.-K.

Chung, C.-T.

Lee, Hung-Yi

Lee, Lin-Shan

DOI

10.1109/ICASSP.2017.7953141

URI

https://www.scopus.com/inward/record.uri?eid=2-s2.0-85023743540&doi=10.1109%2fICASSP.2017.7953141&partnerID=40&md5=fd7a85ab4fcd6daaa74f73205b91b757

Abstract

It is well known that recognizers personalized to each user are much more effective than user-independent recognizers. With the popularity of smartphones today, although it is not difficult to collect a large set of audio data for each user, it is difficult to transcribe it. However, it is now possible to automatically discover acoustic tokens from unlabeled personal data in an unsupervised way. We therefore propose a multi-task deep learning framework called a phoneme-token deep neural network (PTDNN), jointly trained from unsupervised acoustic tokens discovered from unlabeled data and very limited transcribed data for personalized acoustic modeling. We term this scenario 'weakly supervised'. The underlying intuition is that the high degree of similarity between the HMM states of acoustic token models and phoneme models may help them learn from each other in this multi-task learning framework. Initial experiments performed over a personalized audio data set recorded from Facebook posts demonstrated that very good improvements can be achieved in both frame accuracy and word accuracy over commonly used baselines such as fDLR, speaker code and lightly supervised adaptation. This approach complements existing speaker adaptation approaches and can be used jointly with such techniques to yield improved results. © 2017 IEEE.
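
The multi-task idea in the abstract can be illustrated with a minimal NumPy sketch: one shared hidden layer feeds two softmax heads, one over phoneme HMM states and one over acoustic-token HMM states. This is not the authors' implementation. The function names (`init_params`, `forward`, `multitask_loss`), the single tanh layer, the layer sizes, and the `token_weight` parameter are all hypothetical choices made for illustration. The sketch assumes every frame carries a discovered token label, while only the few transcribed frames carry a phoneme label.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_params(n_in, n_hidden, n_phone, n_token, seed=0):
    # Hypothetical layout: one shared layer plus two task-specific heads.
    rng = np.random.default_rng(seed)
    return {
        "W_shared": rng.normal(0, 0.1, (n_in, n_hidden)),
        "b_shared": np.zeros(n_hidden),
        "W_phone": rng.normal(0, 0.1, (n_hidden, n_phone)),
        "b_phone": np.zeros(n_phone),
        "W_token": rng.normal(0, 0.1, (n_hidden, n_token)),
        "b_token": np.zeros(n_token),
    }

def forward(params, x):
    # Both heads read the same hidden representation.
    h = np.tanh(x @ params["W_shared"] + params["b_shared"])
    p_phone = softmax(h @ params["W_phone"] + params["b_phone"])
    p_token = softmax(h @ params["W_token"] + params["b_token"])
    return p_phone, p_token

def multitask_loss(params, x, token_ids, phone_ids, has_phone, token_weight=1.0):
    # Token cross-entropy uses every frame; phoneme cross-entropy uses only
    # frames where has_phone is 1 (the small transcribed subset).
    p_phone, p_token = forward(params, x)
    rows = np.arange(len(x))
    token_ce = -np.log(p_token[rows, token_ids])
    phone_ce = -np.log(p_phone[rows, phone_ids])
    n_labeled = max(has_phone.sum(), 1)  # avoid dividing by zero
    phone_loss = (phone_ce * has_phone).sum() / n_labeled
    return phone_loss + token_weight * token_ce.mean()
```

Because both losses backpropagate through `W_shared`, the abundant token labels can shape the representation that the phoneme head relies on, which is the intuition the abstract gives for the joint training. For frames without a transcription, `phone_ids` can hold any valid placeholder index, since the mask zeroes out their contribution.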

Subjects

deep neural network; multitask learning; speech adaptation; transfer learning; unsupervised learning

Type

conference paper
