Spoken cross-language access to image collection via captions
Journal
EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology
Pages
2749-2752
Date Issued
2003
Author(s)
Abstract
This paper presents a framework for using Chinese speech to access images via their English captions. The formulation and structure mapping rules of Chinese and English named entities are extracted from an NICT foreign location name corpus. For a named location, the name part is usually transliterated and the keyword part is usually translated. Keyword spotting identifies the keyword in a speech query and narrows down the search space of the image collection. A scoring function is proposed to compute the similarity between a speech query and the annotated captions in terms of the International Phonetic Alphabet. The experimental results show that the average rank and the mean reciprocal rank are 2.04 and 0.8322, respectively, which are very close to the best performance, i.e., 1, for both average rank and mean reciprocal rank.
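The two evaluation measures named in the abstract follow standard definitions, which the minimal sketch below illustrates. It assumes each query has exactly one correct image and that its 1-based rank in the returned list is already known. The function names and the sample ranks are hypothetical and are not taken from the paper.

```python
def average_rank(ranks):
    """Arithmetic mean of the 1-based ranks of the correct answers."""
    _check(ranks)
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Mean of 1/rank over all queries; 1.0 means every answer ranked first."""
    _check(ranks)
    return sum(1.0 / r for r in ranks) / len(ranks)


def _check(ranks):
    # Ranks are 1-based, so an empty list or a value below 1 is invalid.
    if not ranks or any(r < 1 for r in ranks):
        raise ValueError("ranks must be a non-empty list of integers >= 1")


if __name__ == "__main__":
    # Hypothetical ranks for four queries (illustration only, not the paper's data).
    sample = [1, 1, 2, 4]
    print(average_rank(sample))
    print(mean_reciprocal_rank(sample))
```

Both measures equal 1 only when every correct image is ranked first, which is the best-case value the abstract refers to.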
Other Subjects
Linguistics; Speech; Speech recognition; Chinese speech; Cross languages; English captions; Image collections; International Phonetic Alphabet; Keyword spotting; Mean reciprocal ranks; Scoring functions; Speech communication
Type
conference paper
