https://scholars.lib.ntu.edu.tw/handle/123456789/498588
Title: Unsupervised discovery of linguistic structure including two-level acoustic patterns using three cascaded stages of iterative optimization
Authors: Chung, C.-T.; Chan, C.-A.; Lee, L.-S.
Keywords: hidden Markov models; iterative optimization; spoken term detection; unsupervised learning; zero resource speech recognition
Date of Publication: 2013
Pages: 8081-8085
Source Publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Conference: 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Abstract: Techniques for unsupervised discovery of acoustic patterns are increasingly attractive, because huge quantities of speech data are becoming available while manual annotations remain hard to acquire. In this paper, we propose an approach for unsupervised discovery of linguistic structure for a target spoken language given raw speech data. This linguistic structure includes two-level (subword-like and word-like) acoustic patterns, the lexicon of word-like patterns in terms of subword-like patterns, and the N-gram language model based on word-like patterns. All patterns, models, and parameters can be automatically learned from the unlabelled speech corpus. This is achieved by an initialization step followed by three cascaded stages of acoustic, linguistic, and lexical iterative optimization. The lexicon of word-like patterns defines the allowed consecutive sequences of HMMs for subword-like patterns. In each iteration, model training and decoding produce updated labels, from which the lexicon and HMMs can be further updated. In this way, model parameters and decoded labels are respectively optimized in each iteration, and knowledge about the linguistic structure is learned gradually, layer by layer. The proposed approach was tested in preliminary experiments on a corpus of Mandarin broadcast news, including a spoken term detection task whose performance was compared against a parallel test using models trained in a supervised way. Results show that the proposed system not only yields reasonable performance on its own, but is also complementary to existing large vocabulary ASR systems.
© 2013 IEEE.
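The alternation the abstract describes, where each iteration decodes labels from the current models and then re-estimates the models from those labels, can be illustrated with a minimal sketch. This is not the paper's actual HMM/lexicon system; as a hypothetical stand-in, scalar cluster means play the role of acoustic-pattern models and nearest-mean assignment plays the role of decoding (a k-means-style alternation on toy one-dimensional "frames").

```python
# Minimal sketch of the "decode labels, then retrain models" loop from the
# abstract. Models here are scalar means (NOT the paper's subword/word HMMs);
# decoding is nearest-mean assignment. All names are illustrative.

def iterative_pattern_discovery(frames, num_patterns, iterations=10):
    # Initialization step: seed each pattern "model" from an early frame.
    models = list(frames[:num_patterns])
    labels = [0] * len(frames)
    for _ in range(iterations):
        # Decoding: relabel each frame with its closest pattern model.
        labels = [min(range(num_patterns), key=lambda k: abs(f - models[k]))
                  for f in frames]
        # Training: re-estimate each model from the frames just assigned to it.
        for k in range(num_patterns):
            assigned = [f for f, l in zip(frames, labels) if l == k]
            if assigned:
                models[k] = sum(assigned) / len(assigned)
    return models, labels

# Toy "speech frames" with three rough clusters of values.
frames = [0.1, 0.2, 0.15, 5.0, 5.1, 4.9, 9.8, 10.0]
models, labels = iterative_pattern_discovery(frames, num_patterns=3)
```

As in the paper's cascaded stages, each pass improves the models given the labels and the labels given the models, so structure emerges gradually from unlabelled data.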
URI: https://scholars.lib.ntu.edu.tw/handle/123456789/498588
Scopus: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84890479779&doi=10.1109%2fICASSP.2013.6639239&partnerID=40&md5=a4f5d9a6165e24c81281bb1ab353dc6b
ISSN: 1520-6149
DOI: 10.1109/ICASSP.2013.6639239
SDG/Keywords: Initialization step; Iterative optimization; Large vocabulary; Linguistic structure; Manual annotation; Model parameters; N-gram language models; Spoken term detection; Computational linguistics; Hidden Markov models; Optimization; Signal processing; Speech; Speech recognition; Unsupervised learning; Iterative decoding
Appears in Collections: Department of Electrical Engineering
Items in the IR system are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.