MAViS: A Multi-Agent Approach for Training-Free Referring Video Object Segmentation

Peng, Tai; Chen, Chu-Song

doi:10.1109/tce.2025.3650288

Journal

IEEE Transactions on Consumer Electronics

Start Page

1

ISSN

0098-3063

1558-4127

Date Issued

2026-01-16

Author(s)

Peng, Tai

Chen, Chu-Song

DOI

10.1109/tce.2025.3650288

URI

https://www.scopus.com/record/display.uri?eid=2-s2.0-105027990731&origin=resultslist

https://scholars.lib.ntu.edu.tw/handle/123456789/736439

Abstract

In this paper, we introduce a simple but effective training-free pipeline for handling the task of text-to-video object segmentation. Our approach leverages open-source Multimodal Large Language Models (MLLMs) for segmenting objects in videos based on language descriptions. We design three multimodal reasoning agents that decompose the task into semantic, temporal, and spatial reasoning stages: a Video Summarization Agent to provide concise semantic context, a Keyframe Selection Agent employing a Binary-Logit Frame Scoring mechanism to identify informative frames, and an Object Grounding Agent predicting bounding boxes for the described objects. Finally, by providing high-quality prompts to a semantic-free segmentation tool, our method effectively handles spatiotemporal variations and reduces segmentation errors. Extensive experiments show that our training-free method significantly reduces resource requirements while achieving comparable or even better performance than supervised fine-tuning approaches.
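The abstract names a Binary-Logit Frame Scoring mechanism for keyframe selection but does not define it in this record. The sketch below shows one plausible reading, offered as an assumption rather than the authors' implementation: an MLLM is asked a yes/no question per frame (for example, whether the described object is visible), the logits of the "yes" and "no" answer tokens are turned into a relevance score by a two-way softmax, and the top-scoring frames are kept. The function names, the `(yes_logit, no_logit)` input format, and the top-k selection rule are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def binary_logit_score(yes_logit: float, no_logit: float) -> float:
    """Two-way softmax over the yes/no logits (sigmoid of their difference).

    Assumption: the score is P(yes) restricted to the two answer tokens.
    """
    diff = yes_logit - no_logit
    # Numerically stable sigmoid, so large logit gaps do not overflow.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def select_keyframes(
    frame_logits: Sequence[Tuple[float, float]], k: int
) -> List[int]:
    """Return the indices of the k highest-scoring frames, in temporal order.

    frame_logits holds one hypothetical (yes_logit, no_logit) pair per frame.
    """
    scores = [binary_logit_score(y, n) for y, n in frame_logits]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Made-up logits for four frames, for illustration only.
    logits = [(0.2, 1.5), (2.0, -1.0), (0.0, 0.0), (3.1, 0.4)]
    print(select_keyframes(logits, k=2))
```

In a full pipeline of the kind the abstract describes, the selected frames would then go to the grounding stage, whose bounding boxes serve as prompts for the segmentation tool; those stages are not sketched here because the record gives no further detail.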

Subjects

multi-agent system

multimodal large language models

reasoning segmentation

referring video object segmentation

segment anything

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Type

journal article
