Data-Efficient 3D Visual Grounding via Order-Aware Referring
Part Of
Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025
Start Page
3107
End Page
3117
ISBN (of the container)
979-8-3315-1083-1
DOI (of the container)
10.1109/WACV61041.2025.00307
Date Issued
2025-02-26
Author(s)
Abstract
3D visual grounding aims to identify the target object within a 3D point cloud scene referred to by a natural language description. Previous works usually require significant amounts of data pairing point clouds with their descriptions to exploit the corresponding complicated verbo-visual relations. In our work, we introduce Vigor, a novel Data-Efficient 3D Visual Grounding framework via Order-aware Referring. Vigor leverages a large language model (LLM) to produce a desirable referential order from the input description for 3D visual grounding. With the proposed stacked object-referring blocks, the predicted anchor objects in this order allow one to locate the target object progressively, without supervision on the identities of anchor objects or the exact relations between anchor and target objects. We also present an order-aware warm-up training strategy, which augments referential orders for pre-training the visual grounding framework, allowing us to better capture the complex verbo-visual relations and benefiting the desired data-efficient learning scheme. Experimental results on the NR3D and ScanRefer datasets demonstrate our superiority in low-resource scenarios. In particular, Vigor surpasses current state-of-the-art frameworks by 9.3% and 7.6% grounding accuracy under the 1% and 10% data settings on the NR3D dataset, respectively.
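A minimal conceptual sketch of the order-aware referring idea summarized above, not the authors' implementation: an LLM-derived referential order is followed anchor by anchor, and each step grounds the next object relative to the previously grounded one. All names here (Scene, referential_order, refer_step, ground) and the toy proximity scoring are illustrative assumptions; in Vigor the object-referring blocks are learned modules operating on point cloud features.

from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Scene:
    # object id -> (x, y) position; a toy stand-in for point cloud object features
    objects: Dict[str, Tuple[float, float]]

def referential_order(description: str) -> List[str]:
    # In Vigor, an LLM parses the description into an ordered chain of
    # anchors ending at the target; hard-coded here for one example.
    # "the chair next to the desk by the window" -> window, desk, chair
    return ["window", "desk", "chair"]

def refer_step(scene: Scene,
               anchor: Optional[Tuple[float, float]],
               name: str) -> Tuple[str, Tuple[float, float]]:
    # One "object-referring block": pick the candidate of the given class
    # closest to the previously grounded anchor (a crude stand-in for the
    # learned verbo-visual attention described in the paper).
    candidates = [(oid, pos) for oid, pos in scene.objects.items()
                  if oid.startswith(name)]
    if anchor is None:
        return candidates[0]
    return min(candidates,
               key=lambda c: (c[1][0] - anchor[0]) ** 2
                           + (c[1][1] - anchor[1]) ** 2)

def ground(scene: Scene, description: str) -> str:
    # Progressively locate each object in the referential order; the last
    # grounded object is the target. No per-anchor supervision is assumed.
    anchor_pos: Optional[Tuple[float, float]] = None
    target_id = ""
    for name in referential_order(description):
        target_id, anchor_pos = refer_step(scene, anchor_pos, name)
    return target_id

scene = Scene(objects={
    "window_0": (0.0, 0.0),
    "desk_0": (1.0, 0.5),
    "desk_1": (9.0, 9.0),
    "chair_0": (1.5, 0.8),  # next to desk_0, which is by the window
    "chair_1": (8.5, 9.2),  # next to desk_1, far from the window
})
print(ground(scene, "the chair next to the desk by the window"))  # -> chair_0

Running the sketch resolves the chain window_0 -> desk_0 -> chair_0, illustrating how grounding each anchor in order disambiguates between the two chairs without ever labeling the anchors themselves.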
Event(s)
2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025
Publisher
IEEE
Type
conference paper
