From Bugs to Benchmarks: A Comprehensive Survey of Software Defect Datasets
April 24, 2025 · The Cartographer · ACM Computing Surveys
"No code URL or promise found in abstract"
"Title-pattern auto-detect: From Bugs to Benchmarks: A Comprehensive Survey of Software Defect Datasets"
Evidence collected by the PWNC Scanner
Authors: Hao-Nan Zhu, Robert M. Furth, Michael Pradel, Cindy Rubio-González
arXiv ID: 2504.17977
Category: cs.SE (Software Engineering)
Citations: 1
Venue: ACM Computing Surveys
Last Checked: 4 days ago
Abstract
Software defect datasets, which are collections of software bugs, are essential resources to facilitate empirical research and enable standardized benchmarking for a wide range of software engineering techniques, including emerging areas like agentic AI-based software development. Over the years, numerous software defect datasets have been developed, providing rich resources for the community, yet making it increasingly difficult to navigate the landscape. This article provides a comprehensive survey of 151 software defect datasets, covering their scope, construction, availability, usability, and practical uses. We also suggest potential opportunities for future research based on our findings, such as addressing underrepresented kinds of defects. A complete catalog of all surveyed software defect datasets is available at https://defect-datasets.github.io/.
Similar Papers
In the same crypt: Software Engineering
Microservices: yesterday, today, and tomorrow
A Survey of Machine Learning for Big Code and Naturalness
An Overview on Smart Contracts: Challenges, Advances and Platforms
Slither: A Static Analysis Framework For Smart Contracts