Spoken cross-language access to image collection via captions

Title:	Spoken cross-language access to image collection via captions
Author:	HSIN-HSI CHEN
Publication date:	2003
Pages:	2749-2752
Source publication:	EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology
Abstract:	This paper presents a framework for using Chinese speech to access images via English captions. The formulation and the structure mapping rules of Chinese and English named entities are extracted from an NICT foreign location name corpus. For a named location, the name part and the keyword part are usually transliterated and translated, respectively. Keyword spotting identifies the keyword in a speech query and narrows down the search space of the image collection. A scoring function is proposed to compute the similarity between a speech query and the annotated captions in terms of the International Phonetic Alphabet. The experimental results show that the average rank and the mean reciprocal rank are 2.04 and 0.8322, respectively, which are very close to the best possible performance, i.e., 1, for both average rank and mean reciprocal rank.
URI:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-84882972670&partnerID=40&md5=32a0f00f2c6ca19acb1a50ca60ecc10d https://scholars.lib.ntu.edu.tw/handle/123456789/632522
SDG/Keywords:	Linguistics; Speech; Speech recognition; Chinese speech; Cross languages; English captions; Image collections; International Phonetic Alphabet; Keyword spotting; Mean reciprocal ranks; Scoring functions; Speech communication
Appears in:	Department of Computer Science and Information Engineering
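
The abstract reports two evaluation measures, average rank and mean reciprocal rank, both of which equal 1 when the correct image is always returned first. As a minimal sketch of how these measures are conventionally computed (not the paper's own evaluation code), the Python below takes the 1-based rank of the correct item for each query. The function names and the sample ranks are hypothetical and chosen only for illustration; they are not data from the paper.

```python
from typing import Sequence


def _check_ranks(ranks: Sequence[int]) -> None:
    """Reject empty input and ranks below 1 (ranks are 1-based)."""
    if len(ranks) == 0:
        raise ValueError("at least one rank is required")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks must be 1-based positive integers")


def average_rank(ranks: Sequence[int]) -> float:
    """Arithmetic mean of the rank of the correct item over all queries."""
    _check_ranks(ranks)
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Mean of 1/rank over all queries; 1.0 means every query ranked first."""
    _check_ranks(ranks)
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks for four queries (illustration only, not from the paper).
example_ranks = [1, 1, 2, 4]
print(average_rank(example_ranks))
print(mean_reciprocal_rank(example_ranks))
```

Because the reciprocal rank shrinks quickly as the rank grows, a few poorly ranked queries raise the average rank more noticeably than they lower the mean reciprocal rank.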
