Title: | Visual lifelog retrieval: humans and machines interpretation on first-person images |
Authors: | Yen, An-Zi; Fu, Min-Huan; Ang, Wei-Hong; Chu, Tai-Te; Tsai, Ssu-Hao; Huang, Hen-Hsen; Chen, Hsin-Hsi |
Keywords: | Interactive system | Lifelog | Visual lifelog retrieval |
Publication Date: | 1-Jan-2023 |
Source Publication: | Multimedia Tools and Applications |
Abstract: | People usually forget the details of their life experiences and encounter situations in which they need to recall their past. Lifelog retrieval has therefore become an emerging task in the AI community. Nowadays, people can record their life experiences by capturing images through wearable devices, writing blog posts, and so on. This personal big data, stored in digital format, can be treated as lifelogs for retrieval. In this work, we focus on constructing a visual lifelog retrieval system that can efficiently find related images given textual queries. The core challenge of visual lifelog retrieval with textual queries is the semantic gap between visual and textual data. We propose LifeConcept, an interactive lifelog search system aimed at both accelerating the retrieval process and fetching more precise results. To reduce the semantic gap, we incorporate visual and textual concepts from images into our system using pre-trained textual embeddings. Moreover, we propose a concept recommendation method that enables users to efficiently set up conditions matching their requirements and to retrieve the desired images with appropriate query terms based on the suggestions. Experimental results show that textual concepts detected in images by computer vision models improve retrieval performance. We further employ annotators to label image captions in order to investigate the differences between model-generated and human-labeled captions. The human-annotated dataset is released to facilitate future study of visual lifelog retrieval. Four research questions are discussed to explore the characteristics of how models and humans interpret first-person images captured by wearable cameras. The impacts of model-generated captions and human-labeled captions on visual lifelog retrieval are also examined. |
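The abstract's idea of bridging the semantic gap with pre-trained textual embeddings can be illustrated by matching query terms against concepts detected in images via embedding similarity. The sketch below is illustrative only: the toy embedding table, the `rank_images` helper, and the image IDs are hypothetical stand-ins, not the paper's actual LifeConcept implementation.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for pre-trained textual embeddings
# (a real system would load e.g. word2vec or GloVe vectors).
embeddings = {
    "coffee":   np.array([0.9, 0.1, 0.0]),
    "espresso": np.array([0.8, 0.2, 0.1]),
    "laptop":   np.array([0.0, 0.9, 0.3]),
    "beach":    np.array([0.1, 0.0, 0.9]),
}

def rank_images(query_terms, image_concepts):
    """Score each image by the best similarity between any query term
    and any concept a CV model detected in that image, then rank."""
    scores = {}
    for image_id, concepts in image_concepts.items():
        scores[image_id] = max(
            cosine(embeddings[q], embeddings[c])
            for q in query_terms
            for c in concepts
        )
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical detected concepts per lifelog image.
images = {
    "img_001": ["espresso", "laptop"],
    "img_002": ["beach"],
}
print(rank_images(["coffee"], images))  # img_001 ranks first
```

Because the query term "coffee" never appears verbatim among the detected concepts, exact keyword matching would fail; embedding similarity surfaces the semantically close "espresso" image instead, which is the gap-bridging behavior the abstract describes.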
URI: | https://scholars.lib.ntu.edu.tw/handle/123456789/629165 | ISSN: | 1380-7501 | DOI: | 10.1007/s11042-023-14344-x |
Appears in Collections: | Department of Computer Science and Information Engineering |
Items in the IR system are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.