RepairBench: Leaderboard of Frontier Models for Program Repair

September 27, 2024 · Declared Dead · 🏛 2025 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code)

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: André Silva, Martin Monperrus
arXiv ID: 2409.18952
Category: cs.SE (Software Engineering)
Cross-listed: cs.LG
Citations: 19
Venue: 2025 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code)
Last Checked: 4 months ago

Abstract
AI-driven program repair uses AI models to repair buggy software by producing patches. Rapid advancements in AI surely impact state-of-the-art performance of program repair. Yet, grasping this progress requires frequent and standardized evaluations. We propose RepairBench, a novel leaderboard for AI-driven program repair. The key characteristics of RepairBench are: 1) it is execution-based: all patches are compiled and executed against a test suite, 2) it assesses frontier models in a frequent and standardized way. RepairBench leverages two high-quality benchmarks, Defects4J and GitBug-Java, to evaluate frontier models against real-world program repair tasks. We publicly release the evaluation framework of RepairBench. We will update the leaderboard as new frontier models are released.