Data-centric NLP Backdoor Defense from the Lens of Memorization
September 21, 2024 ยท Declared Dead ยท ๐ North American Chapter of the Association for Computational Linguistics
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Zhenting Wang, Zhizhi Wang, Mingyu Jin, Mengnan Du, Juan Zhai, Shiqing Ma
arXiv ID
2409.14200
Category
cs.CL: Computation & Language
Cross-listed
cs.CR,
cs.LG
Citations
5
Venue
North American Chapter of the Association for Computational Linguistics
Last Checked
4 months ago
Abstract
Backdoor attack is a severe threat to the trustworthiness of DNN-based language models. In this paper, we first extend the definition of memorization of language models from sample-wise to more fine-grained sentence element-wise (e.g., word, phrase, structure, and style), and then point out that language model backdoors are a type of element-wise memorization. Through further analysis, we find that the strength of such memorization is positively correlated to the frequency of duplicated elements in the training dataset. In conclusion, duplicated sentence elements are necessary for successful backdoor attacks. Based on this, we propose a data-centric defense. We first detect trigger candidates in training data by finding memorizable elements, i.e., duplicated elements, and then confirm real triggers by testing if the candidates can activate backdoor behaviors (i.e., malicious elements). Results show that our method outperforms state-of-the-art defenses in defending against different types of NLP backdoors.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computation & Language
๐
๐
Old Age
๐
๐
Old Age
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
๐
๐
Old Age
XLNet: Generalized Autoregressive Pretraining for Language Understanding
๐ฎ
๐ฎ
The Ethereal
Effective Approaches to Attention-based Neural Machine Translation
๐
๐
Old Age
A large annotated corpus for learning natural language inference
๐
๐
Old Age
HellaSwag: Can a Machine Really Finish Your Sentence?
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
๐ป
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
๐ป
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
๐ป
Ghosted