Hybrid CNN-Mamba Enhancement Network for Robust Multimodal Sentiment Analysis
July 31, 2025 · Declared Dead · 🏛 arXiv.org
"Paper promises code 'coming soon'"
Evidence collected by the PWNC Scanner
Authors: Xiang Li, Xianfu Cheng, Xiaoming Zhang, Zhoujun Li
arXiv ID: 2507.23444
Category: cs.MM (Multimedia)
Citations: 0
Venue: arXiv.org
Last Checked: 1 month ago
Abstract
Multimodal Sentiment Analysis (MSA) with missing modalities has recently attracted increasing attention. Although existing research mainly focuses on designing complex model architectures to handle incomplete data, it still faces significant challenges in effectively aligning and fusing multimodal information. In this paper, we propose a novel framework called the Hybrid CNN-Mamba Enhancement Network (HCMEN) for robust multimodal sentiment analysis under missing modality conditions. HCMEN is designed around three key components: (1) hierarchical unimodal modeling, (2) cross-modal enhancement and alignment, and (3) multimodal mix-up fusion. First, HCMEN integrates the strengths of Convolutional Neural Network (CNN) for capturing local details and the Mamba architecture for modeling global contextual dependencies across different modalities. Furthermore, grounded in the principle of Mutual Information Maximization, we introduce a cross-modal enhancement mechanism that generates proxy modalities from mixed token-level representations and learns fine-grained token-level correspondences between modalities. The enhanced unimodal features are then fused and passed through the CNN-Mamba backbone, enabling local-to-global cross-modal interaction and comprehensive multimodal integration. Extensive experiments on two benchmark MSA datasets demonstrate that HCMEN consistently outperforms existing state-of-the-art methods, achieving superior performance across various missing modality scenarios. The code will be released publicly in the near future.
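Since the authors' code has not been released, the following is only a rough illustration of two ideas the abstract names: building a proxy representation by mixing token-level features from two modalities, and scoring token-level correspondence with a mutual-information-style objective. It is not the HCMEN implementation. The function names (`mix_tokens`, `info_nce`), the fixed mixing coefficient, the use of cosine similarity, and the choice of the InfoNCE loss as the mutual information estimator are all assumptions made for this sketch.

```python
import math
import random


def mix_tokens(a, b, lam):
    """Convex combination of two aligned token sequences (lists of vectors).

    Hypothetical stand-in for 'mixed token-level representations'.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same number of tokens")
    return [[lam * x + (1.0 - lam) * y for x, y in zip(ta, tb)]
            for ta, tb in zip(a, b)]


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv + 1e-12)


def info_nce(queries, keys, temperature=0.1):
    """InfoNCE loss where token i in `queries` should match token i in `keys`.

    For N tokens, log(N) minus this loss is a known lower bound on the
    mutual information between the two views (assumed estimator here).
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / temperature for k in keys]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(queries)


# Toy data: 4 tokens with 8 features each for two hypothetical modalities.
rng = random.Random(0)
text = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
audio = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]

proxy = mix_tokens(text, audio, lam=0.7)  # proxy leans toward the text tokens
loss = info_nce(proxy, text)              # lower means better token alignment
print(f"InfoNCE loss between proxy and text tokens: {loss:.4f}")
```

Minimizing a loss of this kind pulls each proxy token toward its counterpart in the other modality and away from the remaining tokens, which is one common way to approximate mutual information maximization. How the paper actually generates proxy modalities and estimates mutual information cannot be confirmed until the code or full method details are available.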
📜 Similar Papers
In the same crypt — Multimedia
R.I.P.
👻
Ghosted
🌅
Old Age
Quality Assessment of In-the-Wild Videos
R.I.P.
👻
Ghosted
Viewport-Adaptive Navigable 360-Degree Video Delivery
R.I.P.
👻
Ghosted
A Comprehensive Survey on Cross-modal Retrieval
R.I.P.
👻
Ghosted
An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges
R.I.P.
👻
Ghosted
A Convolutional Neural Network Approach for Post-Processing in HEVC Intra Coding
Died the same way — ⏳ Coming Soon™
R.I.P.
⏳
Coming Soon™
Exploring Simple Siamese Representation Learning
R.I.P.
⏳
Coming Soon™
An Analysis of Scale Invariance in Object Detection - SNIP
R.I.P.
⏳
Coming Soon™
Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection
R.I.P.
⏳
Coming Soon™