Text-centric Alignment for Bridging Test-time Unseen Modality
Journal
EMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2025
Start Page
3826
End Page
3845
ISBN (of the container)
979-889176335-7
Date Issued
2025-11-04
Author(s)
Abstract
This paper addresses the challenge of handling unseen modalities and dynamic modality combinations at test time with our proposed text-centric alignment method. This training-free alignment approach unifies different input modalities into a single semantic text representation by leveraging in-context learning with Large Language Models and uni-modal foundation models. Our method significantly enhances the ability to manage unseen, diverse, and unpredictable modality combinations, making it suitable for adoption on top of both generative and discriminative models. Our extensive experiments, which primarily evaluate discriminative tasks, demonstrate that our approach is essential for LLMs to achieve robust modality alignment performance and that it surpasses the limitations of traditional fixed-modality frameworks in embedding representations. This study contributes to the field by offering a flexible and effective solution for real-world applications where modality availability is dynamic and uncertain.
Event(s)
30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025
Publisher
Association for Computational Linguistics
Type
conference paper
