A Sweet Rabbit Hole by DARCY: Using Honeypots to Detect Universal Trigger's Adversarial Attacks
November 20, 2020 · Declared Dead · 🏛 Annual Meeting of the Association for Computational Linguistics
"Paper promises code 'coming soon'"
Evidence collected by the PWNC Scanner
Authors
Thai Le, Noseong Park, Dongwon Lee
arXiv ID
2011.10492
Category
cs.CR: Cryptography & Security
Cross-listed
cs.CL, cs.LG
Citations
26
Venue
Annual Meeting of the Association for Computational Linguistics
Last Checked
2 months ago
Abstract
The Universal Trigger (UniTrigger) is a recently proposed, powerful adversarial textual attack method. Using a learning-based mechanism, UniTrigger generates a fixed phrase that, when added to any benign input, can drop the prediction accuracy of a textual neural network (NN) model to near zero on a target class. To defend against this attack, which can cause significant harm, we borrow the "honeypot" concept from the cybersecurity community and propose DARCY, a honeypot-based defense framework against UniTrigger. DARCY greedily searches for and injects multiple trapdoors into an NN model to "bait and catch" potential attacks. Through comprehensive experiments across four public datasets, we show that DARCY detects UniTrigger's adversarial attacks with up to 99% TPR and less than 2% FPR in most cases, while maintaining the prediction accuracy (in F1) for clean inputs within a 1% margin. We also demonstrate that DARCY with multiple trapdoors is robust to a diverse set of attack scenarios in which attackers have varying levels of knowledge and skills. Source code will be released upon the acceptance of this paper.
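For readers unfamiliar with the detection metrics quoted in the abstract, the following is a minimal sketch of how a true positive rate (TPR) and false positive rate (FPR) are computed for a binary detector. It is not taken from the paper: the function name `tpr_fpr`, the label convention (1 = adversarial, 0 = clean), and the toy data are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def tpr_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (TPR, FPR) for binary labels.

    Assumed convention (not from the paper): 1 = adversarial, 0 = clean.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean inputs flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean inputs passed

    # Guard against empty classes to avoid division by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical toy data: 4 adversarial inputs followed by 6 clean inputs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

tpr, fpr = tpr_fpr(y_true, y_pred)
print(f"TPR={tpr:.2f} FPR={fpr:.2f}")  # prints "TPR=0.75 FPR=0.17"
```

In this toy case the detector catches 3 of 4 attacks (TPR 0.75) and wrongly flags 1 of 6 clean inputs (FPR about 0.17). A detector matching the abstract's headline numbers would score close to 0.99 and below 0.02 respectively.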
📜 Similar Papers
In the same crypt — Cryptography & Security
R.I.P.
👻
Ghosted
R.I.P.
👻
Ghosted
Membership Inference Attacks against Machine Learning Models
R.I.P.
👻
Ghosted
The Limitations of Deep Learning in Adversarial Settings
R.I.P.
👻
Ghosted
Practical Black-Box Attacks against Machine Learning
R.I.P.
👻
Ghosted
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
R.I.P.
👻
Ghosted
Extracting Training Data from Large Language Models
Died the same way — ⏳ Coming Soon™
R.I.P.
⏳
Coming Soon™
Exploring Simple Siamese Representation Learning
R.I.P.
⏳
Coming Soon™
An Analysis of Scale Invariance in Object Detection - SNIP
R.I.P.
⏳
Coming Soon™
Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection
R.I.P.
⏳
Coming Soon™