Defectors: A Large, Diverse Python Dataset for Defect Prediction

March 08, 2023 · Declared Dead · 🏛 IEEE Working Conference on Mining Software Repositories

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Parvez Mahbub, Ohiduzzaman Shuvo, Mohammad Masudur Rahman
arXiv ID: 2303.04738
Category: cs.SE (Software Engineering)
Citations: 11
Venue: IEEE Working Conference on Mining Software Repositories
Last Checked: 4 months ago
Abstract
Defect prediction has been a popular research topic where machine learning (ML) and deep learning (DL) have found numerous applications. However, these ML/DL-based defect prediction models are often limited by the quality and size of their datasets. In this paper, we present Defectors, a large dataset for just-in-time and line-level defect prediction. Defectors consists of $\approx$ 213K source code files ($\approx$ 93K defective and $\approx$ 120K defect-free) that span across 24 popular Python projects. These projects come from 18 different domains, including machine learning, automation, and internet-of-things. Such a scale and diversity make Defectors a suitable dataset for training ML/DL models, especially transformer models that require large and diverse datasets. We also foresee several application areas of our dataset including defect prediction and defect explanation. Dataset link: https://doi.org/10.5281/zenodo.7708984
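The class split quoted in the abstract can be turned into a quick balance check. The sketch below is illustrative only: it uses the rounded counts from the abstract (about 93K defective and 120K defect-free files), and the function names and the inverse-frequency weighting scheme are assumptions made for this example, not something the paper or the dataset prescribes.

```python
# Approximate counts quoted in the abstract (rounded to the nearest thousand).
DEFECTIVE = 93_000
DEFECT_FREE = 120_000


def class_balance(defective: int, defect_free: int) -> dict:
    """Return the total file count and the share of each class."""
    total = defective + defect_free
    return {
        "total": total,
        "defective_share": defective / total,
        "defect_free_share": defect_free / total,
    }


def inverse_frequency_weights(defective: int, defect_free: int) -> dict:
    """Hypothetical helper: weight_c = total / (n_classes * count_c).

    This is one common heuristic for mildly imbalanced binary labels;
    it is an assumption of this sketch, not the authors' method.
    """
    total = defective + defect_free
    return {
        "defective": total / (2 * defective),
        "defect_free": total / (2 * defect_free),
    }


if __name__ == "__main__":
    balance = class_balance(DEFECTIVE, DEFECT_FREE)
    weights = inverse_frequency_weights(DEFECTIVE, DEFECT_FREE)
    print(f"total files:     {balance['total']}")
    print(f"defective share: {balance['defective_share']:.1%}")
    print(f"class weights:   {weights}")
```

With these rounded figures, defective files make up a little under 44% of the dataset, so the two classes are fairly close to balanced.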