Digital Forgetting in Large Language Models: A Survey of Unlearning Methods
April 02, 2024 · The Cartographer · Artificial Intelligence Review
"No code URL or promise found in abstract"
"Title-pattern auto-detect: Digital Forgetting in Large Language Models: A Survey of Unlearning Methods"
Evidence collected by the PWNC Scanner
Authors
Alberto Blanco-Justicia, Najeeb Jebreel, Benet Manzanares, David Sánchez, Josep Domingo-Ferrer, Guillem Collell, Kuan Eeik Tan
arXiv ID
2404.02062
Category
cs.CR: Cryptography & Security
Cross-listed
cs.AI, cs.LG
Citations
45
Venue
Artificial Intelligence Review
Last Checked
2 days ago
Abstract
The objective of digital forgetting is, given a model with undesirable knowledge or behavior, to obtain a new model in which the detected issues are no longer present. The motivations for forgetting include privacy protection, copyright protection, elimination of biases and discrimination, and prevention of harmful content generation. Digital forgetting has to be effective (that is, the new model must actually have forgotten the undesired knowledge/behavior), retain the performance of the original model on the desirable tasks, and be scalable (in particular, forgetting has to be more efficient than retraining from scratch on just the tasks/data to be retained). This survey focuses on forgetting in large language models (LLMs). We first provide background on LLMs, including their components, the types of LLMs, and their usual training pipeline. Second, we describe the motivations, types, and desired properties of digital forgetting. Third, we introduce the approaches to digital forgetting in LLMs, among which unlearning methodologies stand out as the state of the art. Fourth, we provide a detailed taxonomy of machine unlearning methods for LLMs, and we survey and compare current approaches. Fifth, we detail the datasets, models, and metrics used to evaluate forgetting, retention, and runtime. Sixth, we discuss challenges in the area. Finally, we provide some concluding remarks.
Similar Papers
In the same crypt: Cryptography & Security
R.I.P. 👻 Ghosted
R.I.P. 👻 Ghosted
The Limitations of Deep Learning in Adversarial Settings
R.I.P. 👻 Ghosted
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
R.I.P. 👻 Ghosted
Spectre Attacks: Exploiting Speculative Execution
R.I.P. 👻 Ghosted
How To Backdoor Federated Learning
R.I.P. 👻 Ghosted