Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models

February 09, 2025 · Declared Dead · 🏛 2025 IEEE/ACM Second International Conference on AI Foundation Models and Software Engineering (Forge)

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Marc Bruni, Fabio Gabrielli, Mohammad Ghafari, Martin Kropp
arXiv ID: 2502.06039
Category: cs.SE (Software Engineering)
Cross-listed: cs.AI, cs.CR
Citations: 18
Venue: 2025 IEEE/ACM Second International Conference on AI Foundation Models and Software Engineering (Forge)
Last Checked: 4 months ago

Abstract
Prompt engineering reduces reasoning mistakes in Large Language Models (LLMs). However, its effectiveness in mitigating vulnerabilities in LLM-generated code remains underexplored. To address this gap, we implemented a benchmark to automatically assess the impact of various prompt engineering strategies on code security. Our benchmark leverages two peer-reviewed prompt datasets and employs static scanners to evaluate code security at scale. We tested multiple prompt engineering techniques on GPT-3.5-turbo, GPT-4o, and GPT-4o-mini. Our results show that for GPT-4o and GPT-4o-mini, a security-focused prompt prefix can reduce the occurrence of security vulnerabilities by up to 56%. Additionally, all tested models demonstrated the ability to detect and repair between 41.9% and 68.7% of vulnerabilities in previously generated code when using iterative prompting techniques. Finally, we introduce a "prompt agent" that demonstrates how the most effective techniques can be applied in real-world development workflows.
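
The two techniques the abstract highlights, a security-focused prompt prefix and iterative detect-and-repair prompting, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the prefix wording, the function names, and the stub model and scanner are hypothetical and are not taken from the paper, which the listing above reports as having no public code. In practice `generate` would call an LLM and `scan` would wrap a static scanner.

```python
from typing import Callable, List

# Hypothetical prefix wording; the abstract does not give the paper's exact text.
SECURITY_PREFIX = (
    "You are a security-aware developer. "
    "Write code that avoids known vulnerabilities.\n"
)


def with_security_prefix(task: str) -> str:
    """Prepend the security-focused prefix to a coding task."""
    return SECURITY_PREFIX + task


def iterative_repair(
    task: str,
    generate: Callable[[str], str],    # assumed: prompt -> generated code
    scan: Callable[[str], List[str]],  # assumed: code -> list of findings
    max_rounds: int = 3,
) -> str:
    """Generate code, then feed scanner findings back until the scan is clean
    or the round limit is reached."""
    code = generate(with_security_prefix(task))
    for _ in range(max_rounds):
        findings = scan(code)
        if not findings:
            break
        repair_prompt = (
            with_security_prefix(task)
            + "\n\nPrevious code:\n" + code
            + "\n\nScanner findings:\n" + "\n".join(findings)
            + "\n\nRewrite the code to fix these findings."
        )
        code = generate(repair_prompt)
    return code


# Stub model and scanner so the sketch runs without network access or tools.
def fake_generate(prompt: str) -> str:
    if "Scanner findings" in prompt:
        return "value = int(user_input)"
    return "value = eval(user_input)"


def fake_scan(code: str) -> List[str]:
    return ["use of eval on untrusted input"] if "eval(" in code else []


if __name__ == "__main__":
    print(iterative_repair("Parse a number from user input.", fake_generate, fake_scan))
```

Passing the model and scanner in as plain callables keeps the loop independent of any particular API or scanning tool, so either can be swapped without touching the repair logic.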
