Title: | Visual lifelog retrieval: humans and machines interpretation on first-person images |
Authors: | Yen, An-Zi; Fu, Min-Huan; Ang, Wei-Hong; Chu, Tai-Te; Tsai, Ssu-Hao; Huang, Hen-Hsen; Chen, Hsin-Hsi |
Keywords: | Interactive system | Lifelog | Visual lifelog retrieval |
Publication Date: | 1-Jan-2023 |
Source Publication: | Multimedia Tools and Applications |
Abstract: | People usually forget the details of their life experiences and encounter situations in which they need to recall their past. Lifelog retrieval has therefore become an emerging task in the AI community. Nowadays, people can record their life experiences by capturing images through wearable devices, writing blog posts, and so on. This personal big data, stored in digital format, can be treated as lifelogs for retrieval. In this work, we focus on constructing a visual lifelog retrieval system that can efficiently find related images given textual queries. The core challenge of visual lifelog retrieval with textual queries is the semantic gap between visual and textual data. We propose LifeConcept, an interactive lifelog search system aimed at both accelerating the retrieval process and fetching more precise results. To reduce the semantic gap, we incorporate visual and textual concepts from images into our system using pre-trained textual embeddings. Moreover, we propose a concept recommendation method that enables users to efficiently set up conditions matching their requirements and to retrieve the desired images with appropriate query terms based on the suggestions. Experimental results show that textual concepts detected in images by computer vision models improve retrieval performance. We further employ annotators to label image captions in order to investigate the differences between model-generated and human-labeled captions. The human-annotated dataset is released to facilitate future study of visual lifelog retrieval. Four research questions are discussed to explore the characteristics of how models and humans interpret first-person images captured by wearable cameras. The impacts of model-generated captions and human-labeled captions on visual lifelog retrieval are also examined. |
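The abstract's idea of bridging the semantic gap with pre-trained textual embeddings can be illustrated by matching query terms against concepts detected in images via embedding similarity. The sketch below is illustrative only: the toy embedding table, the `rank_images` helper, and the image IDs are hypothetical stand-ins, not the paper's actual LifeConcept implementation.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for pre-trained textual embeddings
# (a real system would load e.g. word2vec or GloVe vectors).
embeddings = {
    "coffee":   np.array([0.9, 0.1, 0.0]),
    "espresso": np.array([0.8, 0.2, 0.1]),
    "laptop":   np.array([0.0, 0.9, 0.3]),
    "beach":    np.array([0.1, 0.0, 0.9]),
}

def rank_images(query_terms, image_concepts):
    """Score each image by the best similarity between any query term
    and any concept a CV model detected in that image, then rank."""
    scores = {}
    for image_id, concepts in image_concepts.items():
        scores[image_id] = max(
            cosine(embeddings[q], embeddings[c])
            for q in query_terms
            for c in concepts
        )
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical detected concepts per lifelog image.
images = {
    "img_001": ["espresso", "laptop"],
    "img_002": ["beach"],
}
print(rank_images(["coffee"], images))  # img_001 ranks first
```

Because the query term "coffee" never appears verbatim among the detected concepts, exact keyword matching would fail; embedding similarity surfaces the semantically close "espresso" image instead, which is the gap-bridging behavior the abstract describes.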
URI: | https://scholars.lib.ntu.edu.tw/handle/123456789/629165 | ISSN: | 1380-7501 | DOI: | 10.1007/s11042-023-14344-x |
Appears in Collections: | Department of Computer Science and Information Engineering |
Items in the IR system are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.