Toward unsupervised model-based spoken term detection with spoken queries without annotated data

Chan, C.-A.; Chung, C.-T.; Kuo, Y.-H.; Lee, L.-S.

Title:	Toward unsupervised model-based spoken term detection with spoken queries without annotated data
Authors:	Chan, C.-A.; Chung, C.-T.; Kuo, Y.-H.; Lee, L.-S.
Keywords:	query-by-example; speech pattern discovery; Unsupervised spoken term detection; zero-resource
Issue Date:	2013
Start page/Pages:	8550 - 8554
Source:	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Conference:	2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Abstract:	We present a two-stage model-based approach to unsupervised query-by-example spoken term detection (STD) that requires no annotated data. Compared with the prevailing DTW approaches to unsupervised STD, the HMMs used by model-based approaches better capture the signal distributions and time trajectories of speech, with a more global view of the spoken archive; matching against model states also significantly reduces the computational load. The utterances in the spoken archive are first decoded offline into acoustic patterns that are discovered automatically, in an unsupervised way, from the archive itself. In the first stage, we propose a document state matching (DSM) approach, in which query frames are matched to the HMM state sequences of the spoken documents. For this matching, a novel duration-constrained Viterbi (DC-Vite) algorithm is proposed to avoid unrealistic speaking-rate distortion. In the second stage, pseudo-relevant and pseudo-irrelevant examples retrieved in the first stage are used to construct query and anti-query HMMs, respectively. Each spoken term hypothesis is then rescored with the likelihood ratio between these two HMMs. Experimental results on a Mandarin broadcast news corpus show an absolute improvement of 11.8% in mean average precision, with a reduction of more than 50% in computation time, compared with the segmental DTW approach. © 2013 IEEE.
URI:	https://scholars.lib.ntu.edu.tw/handle/123456789/498582 https://www.scopus.com/inward/record.uri?eid=2-s2.0-84890526441&doi=10.1109%2fICASSP.2013.6639334&partnerID=40&md5=cd1d9f75c9431093c16f8c89dec15998
ISSN:	1520-6149
DOI:	10.1109/ICASSP.2013.6639334
SDG/Keyword:	Model based approach; Precision improvement; Query-by-example; Signal distribution; Speech patterns; Spoken Term Detection (STD); Spoken term detections; zero-resource; Signal processing; Speech recognition; Viterbi algorithm; Query processing
Appears in Collections:	Department of Electrical Engineering
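
The abstract describes matching query frames to a document's HMM state sequence under a duration constraint. The record does not give the details of the DC-Vite algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a fixed left-to-right state sequence, a precomputed table `loglik[t][s]` of frame-level log-likelihoods, and per-state duration bounds `min_dur` and `max_dur` (each at least 1). The function name `dc_viterbi` and all parameters are hypothetical.

```python
from math import inf


def dc_viterbi(loglik, min_dur, max_dur):
    """Align T query frames to S states in order, with bounded state durations.

    loglik[t][s] is the log-likelihood of query frame t under state s.
    State s must absorb between min_dur[s] and max_dur[s] consecutive frames.
    Returns (best_score, durations), or (-inf, None) if no alignment is feasible.
    All names and the constraint form are illustrative assumptions.
    """
    T = len(loglik)
    S = len(min_dur)

    # prefix[s][t] holds the sum of loglik[0..t-1][s], so any run of frames
    # assigned to state s can be scored in constant time.
    prefix = [[0.0] * (T + 1) for _ in range(S)]
    for s in range(S):
        for t in range(T):
            prefix[s][t + 1] = prefix[s][t] + loglik[t][s]

    # best[s][t]: best score for aligning the first t frames to the first s states.
    best = [[-inf] * (T + 1) for _ in range(S + 1)]
    back = [[0] * (T + 1) for _ in range(S + 1)]
    best[0][0] = 0.0

    for s in range(1, S + 1):
        lo, hi = min_dur[s - 1], max_dur[s - 1]
        for t in range(1, T + 1):
            # Try every allowed duration d for state s-1 ending at frame t.
            for d in range(lo, min(hi, t) + 1):
                prev = best[s - 1][t - d]
                if prev == -inf:
                    continue
                score = prev + prefix[s - 1][t] - prefix[s - 1][t - d]
                if score > best[s][t]:
                    best[s][t] = score
                    back[s][t] = d

    if best[S][T] == -inf:
        return -inf, None

    # Trace back the chosen duration of each state, last state first.
    durations = []
    t = T
    for s in range(S, 0, -1):
        d = back[s][t]
        durations.append(d)
        t -= d
    durations.reverse()
    return best[S][T], durations
```

With loose bounds this reduces to an ordinary forced alignment; tightening `min_dur` and `max_dur` rules out alignments in which one state swallows nearly all frames or is skipped through in a single frame, which is one way to read "unrealistic speaking rate distortion". The cost is O(S · T · D), where D is the largest allowed duration.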
