Robustness of Vision Foundation Models to Common Perturbations

April 16, 2026 · CVPR 2026 Workshop

Authors: Hongbin Liu, Zhengyuan Jiang, Cheng Hong, Neil Zhenqiang Gong
arXiv ID: 2604.14973
Category: cs.CR (Cryptography & Security)
Cross-listed: cs.CV
Citations: 0
Venue: CVPR 2026 Workshop

Abstract

A vision foundation model outputs an embedding vector for an image, and that embedding can be affected by common editing operations (e.g., JPEG compression, brightness adjustment, and contrast adjustment). These common perturbations alter the embedding vectors and may degrade the performance of downstream tasks that use them. In this work, we present the first systematic study of the robustness of vision foundation models to such perturbations. We propose three robustness metrics, formulate five desired mathematical properties for these metrics, and analyze which properties each metric satisfies or violates. Using these metrics, we evaluate six industry-scale foundation models from OpenAI and Meta across nine common perturbation categories and find them generally non-robust. We also show that common perturbations degrade downstream application performance (e.g., classification accuracy) and that robustness values can predict the performance impact. Finally, we propose a fine-tuning approach that improves robustness without sacrificing utility.
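To make the evaluation idea concrete, here is a minimal sketch of how the embedding drift caused by a common perturbation might be measured. The abstract does not define the paper's three metrics, so the mean cosine similarity used here is an assumption chosen for illustration, not the authors' metric. The names `toy_embed`, `adjust_brightness`, `adjust_contrast`, and `cosine_robustness` are hypothetical, and the random projection merely stands in for a real foundation model.

```python
import numpy as np

def toy_embed(image, proj):
    """Hypothetical stand-in for a foundation model: flatten, project, L2-normalize."""
    v = proj @ image.reshape(-1)
    return v / np.linalg.norm(v)

def adjust_brightness(image, delta):
    """Shift all pixel values by delta, keeping them in [0, 1]."""
    return np.clip(image + delta, 0.0, 1.0)

def adjust_contrast(image, factor):
    """Scale pixel deviations from the image mean by factor, keeping values in [0, 1]."""
    mean = image.mean()
    return np.clip((image - mean) * factor + mean, 0.0, 1.0)

def cosine_robustness(embed, images, perturb):
    """Assumed metric: mean cosine similarity between clean and perturbed embeddings."""
    sims = []
    for img in images:
        a, b = embed(img), embed(perturb(img))
        sims.append(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
    return float(np.mean(sims))

# Synthetic data: 8 random RGB images with pixel values in [0, 1].
rng = np.random.default_rng(0)
images = rng.random((8, 16, 16, 3))
proj = rng.standard_normal((32, 16 * 16 * 3))
embed = lambda img: toy_embed(img, proj)

identity_score = cosine_robustness(embed, images, lambda x: x)
bright_score = cosine_robustness(embed, images, lambda x: adjust_brightness(x, 0.2))
contrast_score = cosine_robustness(embed, images, lambda x: adjust_contrast(x, 0.5))

print("identity:  ", identity_score)
print("brightness:", bright_score)
print("contrast:  ", contrast_score)
```

A score near 1 means the perturbation barely moves the embedding; lower scores indicate larger drift. In a real evaluation the toy embedding would be replaced by an actual model's image encoder and the perturbations by the categories studied in the paper.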