MTSTRec: Multimodal Time-Aligned Shared Token Recommender
Journal
Proceedings of Machine Learning Research
Journal Volume
267
Start Page
23640
End Page
23661
ISSN
2640-3498
Date Issued
2025-07
Author(s)
Abstract
Sequential recommendation in e-commerce utilizes users’ anonymous browsing histories to personalize product suggestions without relying on private information. Existing item ID-based methods and multimodal models often overlook the temporal alignment of modalities such as textual descriptions, visual content, and prices in user browsing sequences. To address this limitation, this paper proposes the Multimodal Time-Aligned Shared Token Recommender (MTSTRec), a transformer-based framework with a single time-aligned shared token per product for efficient cross-modality fusion. MTSTRec preserves the distinct contributions of each modality while aligning them temporally to better capture user preferences. Extensive experiments demonstrate that MTSTRec achieves state-of-the-art performance across multiple sequential recommendation benchmarks, significantly improving upon existing multimodal fusion approaches. Our code is available at https://github.com/idssplab/MTSTRec.
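The abstract's core idea — a shared token per product that attends over that product's modality embeddings (text, image, price) at its position in the browsing sequence — can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the authors' implementation: the function name `fuse_timestep`, the random projection matrices, and the zero-initialized shared token are all hypothetical choices for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_timestep(shared_token, modality_embs, Wq, Wk, Wv):
    """Toy single-head attention fusing one product's modalities.

    shared_token: (d,)  — one learnable fusion token per product (zeros here)
    modality_embs: (M, d) — e.g. text, image, and price embeddings
    Returns the attended shared-token vector (d,) as the fused representation.
    """
    tokens = np.vstack([shared_token[None, :], modality_embs])     # (1+M, d)
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)        # (1+M, 1+M)
    fused = attn @ V
    return fused[0]   # row 0: the shared token, now mixing all modalities

d = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
# One browsing step: text, image, and price embeddings for a single product.
text_e, img_e, price_e = rng.standard_normal((3, d))
shared = np.zeros(d)  # learnable in a real model; fixed here for the sketch
fused = fuse_timestep(shared, np.stack([text_e, img_e, price_e]), Wq, Wk, Wv)
```

In a full model, one such fused vector per browsed product would form the time-aligned sequence fed to the transformer backbone, so each position carries all modalities of the product viewed at that moment.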
Event(s)
42nd International Conference on Machine Learning, ICML 2025
Publisher
ML Research Press
Type
conference paper
