R.I.P.
๐ป
Ghosted
A Case Study on the Impact of Anonymization Along the RAG Pipeline
April 17, 2026 ยท Grace Period ยท ๐ IWSPA 2026
Authors
Andreea-Elena Bodea, Stephen Meisenbacher, Florian Matthes
arXiv ID
2604.15958
Category
cs.CR: Cryptography & Security
Cross-listed
cs.CL
Citations
0
Venue
IWSPA 2026
Abstract
Despite the considerable promise of Retrieval-Augmented Generation (RAG), many real-world use cases may create privacy concerns, where the purported utility of RAG-enabled insights comes at the risk of exposing private information to either the LLM or the end user requesting the response. As a potential mitigation, using anonymization techniques to remove personally identifiable information (PII) and other sensitive markers in the underlying data represents a practical and sensible course of action for RAG administrators. Despite a wealth of literature on the topic, no works consider the placement of anonymization along the RAG pipeline, i.e., asking the question, where should anonymization happen? In this case study, we systematically and empirically measure the impact of anonymization at two important points along the RAG pipeline: the dataset and generated answer. We show that differences in privacy-utility trade-offs can be observed depending on where anonymization took place, demonstrating the significance of privacy risk mitigation placement in RAG.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Cryptography & Security
R.I.P.
๐ป
Ghosted
The Limitations of Deep Learning in Adversarial Settings
R.I.P.
๐ป
Ghosted
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
R.I.P.
๐ป
Ghosted
Spectre Attacks: Exploiting Speculative Execution
R.I.P.
๐ป
Ghosted
How To Backdoor Federated Learning
R.I.P.
๐ป
Ghosted