Multi-attacks: Many images $+$ the same adversarial attack $\to$ many target labels

August 04, 2023 · arXiv.org

Repo contents: LICENSE, README.md, multiattack_demo.ipynb

Authors: Stanislav Fort · arXiv ID: 2308.03792 · Category: cs.CV (Computer Vision) · Cross-listed: cs.CR, cs.LG · Citations: 2 · Venue: arXiv.org · Repository: https://github.com/stanislavfort/multi-attacks (⭐ 10) · Last Checked: 3 months ago

Abstract

We show that we can easily design a single adversarial perturbation $P$ that changes the class of $n$ images $X_1,X_2,\dots,X_n$ from their original, unperturbed classes $c_1, c_2,\dots,c_n$ to desired (not necessarily all the same) classes $c^*_1,c^*_2,\dots,c^*_n$ for up to hundreds of images and target classes at once. We call these \textit{multi-attacks}. Characterizing the maximum $n$ we can achieve under different conditions such as image resolution, we estimate the number of regions of high class confidence around a particular image in the space of pixels to be around $10^{\mathcal{O}(100)}$, posing a significant problem for exhaustive defense strategies. We show several immediate consequences of this: adversarial attacks that change the resulting class based on their intensity, and scale-independent adversarial examples. To demonstrate the redundancy and richness of class decision boundaries in the pixel space, we look for its two-dimensional sections that trace images and spell words using particular classes. We also show that ensembling reduces susceptibility to multi-attacks, and that classifiers trained on random labels are more susceptible. Our code is available on GitHub.
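The core idea in the abstract, one perturbation $P$ shared by $n$ images with a separate target class $c^*_i$ for each, can be sketched as an optimization over $P$ alone. The example below is an illustrative toy, not the paper's code: the classifier is a small random tanh network rather than a trained vision model, the objective (mean cross-entropy toward the targets) and plain gradient descent are assumptions, and the names `forward`, `loss_and_grad`, and `multi_attack` are hypothetical. It does not use the repository's `multiattack_demo.ipynb`.

```python
import numpy as np

def forward(params, X):
    """Toy classifier (an assumption): one tanh hidden layer, then linear logits."""
    W1, b1, W2, b2 = params
    H = np.tanh(X @ W1 + b1)
    return H, H @ W2 + b2

def loss_and_grad(params, X, P, targets):
    """Mean cross-entropy of the images X + P toward `targets`, and its gradient w.r.t. P."""
    W1, b1, W2, b2 = params
    n = X.shape[0]
    H, logits = forward(params, X + P)          # P is broadcast: the same P for every image
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(n), targets].mean()
    d_logits = np.exp(log_probs)                # softmax probabilities
    d_logits[np.arange(n), targets] -= 1.0      # dL/dlogits = (probs - one_hot) / n
    d_logits /= n
    d_pre = (d_logits @ W2.T) * (1.0 - H ** 2)  # backprop through tanh
    grad_P = (d_pre @ W1.T).sum(axis=0)         # sum over images because P is shared
    return loss, grad_P

def multi_attack(params, X, targets, steps=300, lr=0.05):
    """Gradient descent on the shared perturbation; returns the lowest-loss P seen."""
    P = np.zeros(X.shape[1])
    best_P = P.copy()
    best_loss, _ = loss_and_grad(params, X, P, targets)
    for _ in range(steps):
        loss, grad = loss_and_grad(params, X, P, targets)
        if loss < best_loss:
            best_loss, best_P = loss, P.copy()
        P = P - lr * grad
    return best_P, best_loss

# Hypothetical setup: random "images" and a random network, for illustration only.
rng = np.random.default_rng(0)
d, hidden, num_classes, n = 64, 128, 10, 8
params = (
    rng.normal(size=(d, hidden)) / np.sqrt(d), np.zeros(hidden),
    rng.normal(size=(hidden, num_classes)) / np.sqrt(hidden), np.zeros(num_classes),
)
X = rng.normal(size=(n, d))
original = forward(params, X)[1].argmax(axis=1)
# Pick a target for each image that differs from its current predicted class.
targets = (original + 1 + rng.integers(0, num_classes - 1, size=n)) % num_classes

initial_loss, _ = loss_and_grad(params, X, np.zeros(d), targets)
P, final_loss = multi_attack(params, X, targets)
hits = int((forward(params, X + P)[1].argmax(axis=1) == targets).sum())
print(f"loss {initial_loss:.3f} -> {final_loss:.3f}; {hits}/{n} images reach their target class")
```

A nonlinear model is used deliberately: with a purely linear classifier, a shared $P$ shifts every image's logits by the same vector, so conflicting targets can be impossible to satisfy at once. How many images can be attacked simultaneously in this toy depends on the input dimension and network, and is not meant to reproduce the paper's numbers.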