MTSTRec: Multimodal Time-Aligned Shared Token Recommender
Journal
Proceedings of Machine Learning Research
Journal Volume
267
Start Page
23640
End Page
23661
ISSN
2640-3498
Date Issued
2025-07
Author(s)
Abstract
Sequential recommendation in e-commerce utilizes users’ anonymous browsing histories to personalize product suggestions without relying on private information. Existing item ID-based methods and multimodal models often overlook the temporal alignment of modalities such as textual descriptions, visual content, and prices in user browsing sequences. To address this limitation, this paper proposes the Multimodal Time-Aligned Shared Token Recommender (MTSTRec), a transformer-based framework with a single time-aligned shared token per product for efficient cross-modality fusion. MTSTRec preserves the distinct contributions of each modality while aligning them temporally to better capture user preferences. Extensive experiments demonstrate that MTSTRec achieves state-of-the-art performance across multiple sequential recommendation benchmarks, significantly improving upon existing multimodal fusion approaches. Our code is available at https://github.com/idssplab/MTSTRec.
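The abstract's core idea — a shared token per product that attends over that product's modality embeddings (text, image, price) at its position in the browsing sequence — can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the authors' implementation: the function name `fuse_timestep`, the random projection matrices, and the zero-initialized shared token are all hypothetical choices for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_timestep(shared_token, modality_embs, Wq, Wk, Wv):
    """Toy single-head attention fusing one product's modalities.

    shared_token: (d,)  — one learnable fusion token per product (zeros here)
    modality_embs: (M, d) — e.g. text, image, and price embeddings
    Returns the attended shared-token vector (d,) as the fused representation.
    """
    tokens = np.vstack([shared_token[None, :], modality_embs])     # (1+M, d)
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)        # (1+M, 1+M)
    fused = attn @ V
    return fused[0]   # row 0: the shared token, now mixing all modalities

d = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
# One browsing step: text, image, and price embeddings for a single product.
text_e, img_e, price_e = rng.standard_normal((3, d))
shared = np.zeros(d)  # learnable in a real model; fixed here for the sketch
fused = fuse_timestep(shared, np.stack([text_e, img_e, price_e]), Wq, Wk, Wv)
```

In a full model, one such fused vector per browsed product would form the time-aligned sequence fed to the transformer backbone, so each position carries all modalities of the product viewed at that moment.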
Event(s)
42nd International Conference on Machine Learning, ICML 2025
Publisher
ML Research Press
Type
conference paper
