ProMSC-MIS: Prompt-based Multimodal Semantic Communication for Multi-Spectral Image Segmentation

August 27, 2025 · Declared Dead · 🏛 IEEE Open Journal of the Communications Society

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Haoshuo Zhang, Yufei Bo, Meixia Tao arXiv ID 2508.20057 Category cs.MM: Multimedia Citations 0 Venue IEEE Open Journal of the Communications Society Last Checked 4 months ago

Abstract

Multimodal semantic communication has great potential to enhance downstream task performance by integrating complementary information across modalities. This paper introduces ProMSC-MIS, a novel Prompt-based Multimodal Semantic Communication framework for Multi-Spectral Image Segmentation. It enables efficient task-oriented transmission of spatially aligned RGB and thermal images over band-limited channels. Our framework has two main design novelties. First, by leveraging prompt learning and contrastive learning, unimodal semantic encoders are pre-trained to learn diverse and complementary semantic representations by using features from one modality as prompts for another. Second, a semantic fusion module that combines cross-attention mechanism and squeeze-and-excitation (SE) networks is designed to effectively fuse cross-modal features. Experimental results demonstrate that ProMSC-MIS substantially outperforms conventional image transmission combined with state-of-the-art segmentation methods. Notably, it reduces the required channel bandwidth by 50%--70% at the same segmentation performance, while also decreasing the storage overhead and computational complexity by 26% and 37%, respectively. Ablation studies also validate the effectiveness of the proposed pre-training and semantic fusion strategies. Our scheme is highly suitable for applications such as autonomous driving and nighttime surveillance.