Towards Multimodal Sentiment Analysis via Contrastive Cross-modal Retrieval Augmentation and Hierachical Prompts

August 11, 2025 · Declared Dead · 🏛 IEEE Transactions on Affective Computing

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Xianbing Zhao, Shengzun Yang, Buzhou Tang, Ronghuan Jiang arXiv ID 2508.07666 Category cs.MM: Multimedia Citations 0 Venue IEEE Transactions on Affective Computing Last Checked 4 months ago

Abstract

Multimodal sentiment analysis is a fundamental problem in the field of affective computing. Although significant progress has been made in cross-modal interaction, it remains a challenge due to the insufficient reference context in cross-modal interactions. Current cross-modal approaches primarily focus on leveraging modality-level reference context within a individual sample for cross-modal feature enhancement, neglecting the potential cross-sample relationships that can serve as sample-level reference context to enhance the cross-modal features. To address this issue, we propose a novel multimodal retrieval-augmented framework to simultaneously incorporate inter-sample modality-level reference context and cross-sample sample-level reference context to enhance the multimodal features. In particular, we first design a contrastive cross-modal retrieval module to retrieve semantic similar samples and enhance target modality. To endow the model to capture both inter-sample and intra-sample information, we integrate two different types of prompts, modality-level prompts and sample-level prompts, to generate modality-level and sample-level reference contexts, respectively. Finally, we design a cross-modal retrieval-augmented encoder that simultaneously leverages modality-level and sample-level reference contexts to enhance the target modality. Extensive experiments demonstrate the effectiveness and superiority of our model on two publicly available datasets.