R.I.P.
👻
Ghosted
Diagnostics for Deep Neural Networks with Automated Copy/Paste Attacks
November 18, 2022 · Entered Twilight · arXiv.org
Repo contents: README.md, cp_attack.py, data, download_data.sh, download_models.py, figs, get_confusion_matrix.py, get_latents.py, prep_data.py, prep_models.sh, requirements.txt
Authors
Stephen Casper, Kaivalya Hariharan, Dylan Hadfield-Menell
arXiv ID
2211.10024
Category
cs.LG: Machine Learning
Cross-listed
cs.AI, cs.CR
Citations
11
Venue
arXiv.org
Repository
https://github.com/thestephencasper/snafue
⭐ 5
Last Checked
2 months ago
Abstract
This paper considers the problem of helping humans exercise scalable oversight over deep neural networks (DNNs). Adversarial examples can be useful by helping to reveal weaknesses in DNNs, but they can be difficult to interpret or draw actionable conclusions from. Some previous works have proposed using human-interpretable adversarial attacks, including copy/paste attacks in which one natural image pasted into another causes an unexpected misclassification. We build on these with two contributions. First, we introduce Search for Natural Adversarial Features Using Embeddings (SNAFUE), which offers a fully automated method for finding copy/paste attacks. Second, we use SNAFUE to red team an ImageNet classifier. We reproduce copy/paste attacks from previous works and find hundreds of other easily describable vulnerabilities, all without a human in the loop. Code is available at https://github.com/thestephencasper/snafue.
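The core operation in a copy/paste attack, as the abstract describes it, is pasting one natural image into another and checking whether the classifier's prediction flips. A minimal sketch of that search loop is below; `paste_patch`, `find_copy_paste_attacks`, and the classifier interface are illustrative names for this sketch, not the API of the repo's `cp_attack.py`, and a real run would use an ImageNet classifier rather than the stub shown in the usage note.

```python
import numpy as np

def paste_patch(image, patch, top, left):
    """Return a copy of `image` (H, W, C) with `patch` pasted at (top, left)."""
    out = image.copy()
    h, w = patch.shape[:2]
    out[top:top + h, left:left + w] = patch
    return out

def find_copy_paste_attacks(classifier, images, patches, top=0, left=0):
    """Collect (image_index, patch_index) pairs where pasting a natural
    patch changes the classifier's predicted label."""
    attacks = []
    for i, img in enumerate(images):
        base_label = classifier(img)
        for j, patch in enumerate(patches):
            if classifier(paste_patch(img, patch, top, left)) != base_label:
                attacks.append((i, j))
    return attacks
```

For example, with a toy classifier that thresholds mean brightness, pasting a bright patch into a dark image flips the label, so `find_copy_paste_attacks` reports the pair. SNAFUE's contribution is automating the choice of *which* natural patches to try, by searching for them in an embedding space instead of enumerating all candidates.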
Similar Papers
In the same crypt: Machine Learning
XGBoost: A Scalable Tree Boosting System
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Semi-Supervised Classification with Graph Convolutional Networks
Proximal Policy Optimization Algorithms