CrossFlowDG: Bridging the Modality Gap with Cross-modal Flow Matching for Domain Generalization

April 18, 2026 ยท Grace Period ยท ๐Ÿ› CVPRW 2026

โณ Grace Period
This paper is less than 90 days old. We give authors time to release their code before passing judgment.
Authors Antonios Kritikos, Nikolaos Spanos, Athanasios Voulodimos arXiv ID 2604.16892 Category cs.CV: Computer Vision Citations 0 Venue CVPRW 2026
Abstract
Domain generalization (DG) aims to maintain performance under domain shift, which in computer vision appears primarily as stylistic variations that cause models to overfit to domain-specific appearance cues rather than class semantics. To overcome this, recent methods use textual representations as stable, domain-invariant anchors. However, multimodal approaches that rely on cosine similarity-based contrastive alignment leave a modality gap where image and text embeddings remain geometrically separated despite semantic correspondence. We propose CrossFlowDG, a novel DG framework that addresses this residual gap using noise-free, cross-modal flow matching. By learning a continuous transformation in the joint Euclidean latent space, our framework explicitly transports domain-biased image embeddings toward domain-invariant text embeddings of the correct class. Using the efficient VMamba image encoder and CLIP's text encoder, CrossFlowDG is tested against four common DG benchmarks, and achieves competitive performance on several benchmarks and state-of-the-art on TerraIncognita. Code is available at: https://github.com/ajkrit/CrossFlowDG
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Computer Vision

๐ŸŒ… ๐ŸŒ… Old Age

Fast R-CNN

Ross Girshick

cs.CV ๐Ÿ› ICCV ๐Ÿ“š 27.7K cites 11 years ago