https://scholars.lib.ntu.edu.tw/handle/123456789/632522
Title: | Spoken cross-language access to image collection via captions | Author: | HSIN-HSI CHEN | Issue date: | 2003 | Pages: | 2749-2752 | Source publication: | EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology | Abstract: | This paper presents a framework for using Chinese speech to access images via English captions. The formulation and the structure mapping rules of Chinese and English named entities are extracted from an NICT foreign location name corpus. For a named location, the name part and the keyword part are usually transliterated and translated, respectively. Keyword spotting identifies the keyword in a speech query and narrows down the search space of the image collection. A scoring function is proposed to compute the similarity between a speech query and the annotated captions in terms of the International Phonetic Alphabet. The experimental results show that the average rank and the mean reciprocal rank are 2.04 and 0.8322, respectively, both close to the best possible value of 1. |
URI: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84882972670&partnerID=40&md5=32a0f00f2c6ca19acb1a50ca60ecc10d https://scholars.lib.ntu.edu.tw/handle/123456789/632522 |
SDG/Keywords: | Linguistics; Speech; Speech recognition; Chinese speech; Cross languages; English captions; Image collections; International Phonetic Alphabet; Keyword spotting; Mean reciprocal ranks; Scoring functions; Speech communication |
Appears in collections: | Department of Computer Science and Information Engineering |
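The abstract reports two retrieval measures, average rank and mean reciprocal rank. As a minimal sketch of how these are conventionally computed (not the paper's own code), the snippet below assumes each query yields the 1-based rank at which the correct image appears. The function names and the sample ranks are hypothetical and chosen only for illustration.

```python
def average_rank(ranks):
    """Mean of the 1-based ranks at which the correct item was retrieved."""
    if not ranks or any(r < 1 for r in ranks):
        raise ValueError("ranks must be a non-empty list of values >= 1")
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Mean of 1/rank over all queries; 1.0 means every answer ranked first."""
    if not ranks or any(r < 1 for r in ranks):
        raise ValueError("ranks must be a non-empty list of values >= 1")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks for four queries (illustrative data, not from the paper).
sample_ranks = [1, 1, 2, 4]
print(average_rank(sample_ranks))          # 2.0
print(mean_reciprocal_rank(sample_ranks))  # 0.6875
```

Both measures equal 1 only when every query returns the correct image at the top of the list, which is the best case the abstract refers to.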