Supporting Human Raters with the Detection of Harmful Content using Large Language Models
June 18, 2024 Β· Declared Dead Β· π IEEE Symposium on Security and Privacy
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Kurt Thomas, Patrick Gage Kelley, David Tao, Sarah Meiklejohn, Owen Vallis, Shunwen Tan, BlaΕΎ BrataniΔ, Felipe Tiengo Ferreira, Vijay Kumar Eranti, Elie Bursztein
arXiv ID
2406.12800
Category
cs.CR: Cryptography & Security
Citations
7
Venue
IEEE Symposium on Security and Privacy
Last Checked
3 months ago
Abstract
In this paper, we explore the feasibility of leveraging large language models (LLMs) to automate or otherwise assist human raters with identifying harmful content including hate speech, harassment, violent extremism, and election misinformation. Using a dataset of 50,000 comments, we demonstrate that LLMs can achieve 90% accuracy when compared to human verdicts. We explore how to best leverage these capabilities, proposing five design patterns that integrate LLMs with human rating, such as pre-filtering non-violative content, detecting potential errors in human rating, or surfacing critical context to support human rating. We outline how to support all of these design patterns using a single, optimized prompt. Beyond these synthetic experiments, we share how piloting our proposed techniques in a real-world review queue yielded a 41.5% improvement in optimizing available human rater capacity, and a 9--11% increase (absolute) in precision and recall for detecting violative content.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Cryptography & Security
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
The Limitations of Deep Learning in Adversarial Settings
R.I.P.
π»
Ghosted
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
R.I.P.
π»
Ghosted
Spectre Attacks: Exploiting Speculative Execution
R.I.P.
π»
Ghosted
How To Backdoor Federated Learning
R.I.P.
π»
Ghosted
Evasion Attacks against Machine Learning at Test Time
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted