PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback

November 18, 2024 · Declared Dead · 🏛 2025 IEEE/ACM Second International Conference on AI Foundation Models and Software Engineering (Forge)

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Yun Peng, Akhilesh Deepak Gotmare, Michael Lyu, Caiming Xiong, Silvio Savarese, Doyen Sahoo
arXiv ID: 2412.03578
Category: cs.SE (Software Engineering)
Cross-listed: cs.AI, cs.CL, cs.PL
Citations: 29
Venue: 2025 IEEE/ACM Second International Conference on AI Foundation Models and Software Engineering (Forge)
Last Checked: 4 months ago

Abstract

Large Language Models (LLMs) are widely adopted for assisting in software development tasks, yet their performance evaluations have narrowly focused on the functional correctness of generated code. Human programmers, however, require LLM-generated code to be not only correct but also optimally efficient. We propose PerfCodeGen, a training-free framework that enhances the performance of LLM-generated code by incorporating feedback based on runtime during test case execution into the self-refinement iterations. With PerfCodeGen, we achieve speedups for a significantly higher proportion of problems compared to using the base LLM with sophisticated prompting techniques. Applied to open language models like Phi-3-mini, PerfCodeGen achieves runtime efficiency comparable to prompting powerful closed models like GPT-4. We achieve state-of-the-art runtime efficiency on benchmarks such as HumanEval, MBPP, and APPS, frequently surpassing the ground truth reference solutions with PerfCodeGen using GPT-3.5 and GPT-4. Additionally, we demonstrate the effectiveness of our approach in enhancing code quality across a range of open LLMs of varying sizes including Phi-3-mini, Llama 3 8B, Mixtral 8x7B, Command R, and Llama 3 70B.
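The abstract describes feeding runtime measurements from test case execution back into self-refinement iterations. The following is a minimal sketch of that general idea, not the authors' implementation: the names `measure`, `refine_for_speed`, and the `refine` callback are hypothetical, and `refine` stands in for whatever LLM call would rewrite the code given a textual feedback message. The sketch assumes a non-empty test list and keeps a new candidate only if it still passes every test and its slowest test runs faster than before.

```python
import time
from typing import Callable, List, Tuple

TestCase = Tuple[tuple, object]  # (positional arguments, expected result)


def measure(candidate: Callable, tests: List[TestCase], repeats: int = 3):
    """Run every test; return (all_passed, slowest_runtime_seconds, slowest_test_index)."""
    slowest_time, slowest_idx = 0.0, 0
    for idx, (args, expected) in enumerate(tests):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            try:
                result = candidate(*args)
            except Exception:
                return False, float("inf"), idx  # a crash counts as a failed test
            best = min(best, time.perf_counter() - start)
        if result != expected:
            return False, float("inf"), idx
        if best > slowest_time:
            slowest_time, slowest_idx = best, idx
    return True, slowest_time, slowest_idx


def refine_for_speed(initial: Callable, tests: List[TestCase],
                     refine: Callable[[Callable, str], Callable], rounds: int = 2):
    """Hypothetical refinement loop: `refine` is a placeholder for an LLM call."""
    best = initial
    ok, best_time, idx = measure(best, tests)
    if not ok:
        raise ValueError("initial candidate must pass all tests")
    for _ in range(rounds):
        # Feedback names the most time-consuming test, as a prompt might.
        feedback = (f"All tests pass. Slowest input {tests[idx][0]!r} "
                    f"took {best_time:.6f}s. Make the code faster.")
        candidate = refine(best, feedback)
        ok, runtime, cand_idx = measure(candidate, tests)
        if ok and runtime < best_time:  # accept only correct and faster code
            best, best_time, idx = candidate, runtime, cand_idx
    return best, best_time


def slow_sum(n: int) -> int:
    total = 0
    for i in range(n):
        total += i
    return total


def fast_sum(n: int) -> int:
    return n * (n - 1) // 2  # closed form for 0 + 1 + ... + (n - 1)


tests = [((10,), 45), ((200_000,), 19_999_900_000)]
# Stub "model" that always proposes the closed-form version.
chosen, runtime = refine_for_speed(slow_sum, tests, lambda current, feedback: fast_sum)
print(chosen.__name__, f"{runtime:.6f}s")
```

Taking the minimum over a few repeats reduces timing noise, and rejecting any candidate that fails a test keeps the loop from trading correctness for speed.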