Modular AR Framework for Vision-Language Tasks

Fischer R; Weng T.-H; LI-CHEN FU

doi:10.1145/3439133.3439142

Journal

ACM International Conference Proceeding Series

Pages

16-21

Date Issued

2020

Author(s)

Fischer R

Weng T.-H

LI-CHEN FU

DOI

10.1145/3439133.3439142

URI

https://www.scopus.com/inward/record.uri?eid=2-s2.0-85102859414&doi=10.1145%2f3439133.3439142&partnerID=40&md5=1fb17ee2b3f24b9e7633f880d7bf62fe

https://scholars.lib.ntu.edu.tw/handle/123456789/581389

Abstract

Mixed and augmented reality (AR) systems have become increasingly sophisticated in recent years, yet they still lack the ability to reason about the surrounding world. Computer vision research, meanwhile, has made many advances towards a more human-like reasoning process. This paper aims to bridge these two research areas by implementing a modular framework that interconnects an AR application with a deep-learning-based vision model, and it showcases a few potential use cases of the proposed system. The framework allows the application to use a variety of Vision-Language (V+L) models to gain additional understanding of the surrounding environment. The system is designed to be modular and expandable: it can connect any number of Python processes running V+L models to Unity apps that use AR technology. The system was evaluated in our university's smart home lab on daily-life use cases. With further extension of the framework through additional downstream tasks provided by V+L models and other computer vision systems, this framework should find wider adoption in AR applications. The increasing ability of applications to comprehend visual common sense and natural conversations would enable more intuitive interactions with users, who could come to perceive their device more as a virtual assistant and companion. © 2020 ACM.

Event(s)

4th International Conference on Artificial Intelligence and Virtual Reality, AIVR 2020

Subjects

Automation; Deep learning; Virtual reality; AR application; Computer vision system; Intuitive interaction; Modular framework; Reality systems; Reasoning process; Surrounding environment; Virtual assistants; Computer vision

Type

conference paper
