Mitigating Data Imbalance for Software Vulnerability Assessment: Does Data Augmentation Help?

July 15, 2024 Β· Declared Dead Β· πŸ› International Symposium on Empirical Software Engineering and Measurement

πŸ‘» CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Triet H. M. Le, M. Ali Babar arXiv ID 2407.10722 Category cs.SE: Software Engineering Cross-listed cs.CR, cs.LG Citations 12 Venue International Symposium on Empirical Software Engineering and Measurement Last Checked 4 months ago
Abstract
Background: Software Vulnerability (SV) assessment is increasingly adopted to address the ever-increasing volume and complexity of SVs. Data-driven approaches have been widely used to automate SV assessment tasks, particularly the prediction of the Common Vulnerability Scoring System (CVSS) metrics such as exploitability, impact, and severity. SV assessment suffers from the imbalanced distributions of the CVSS classes, but such data imbalance has been hardly understood and addressed in the literature. Aims: We conduct a large-scale study to quantify the impacts of data imbalance and mitigate the issue for SV assessment through the use of data augmentation. Method: We leverage nine data augmentation techniques to balance the class distributions of the CVSS metrics. We then compare the performance of SV assessment models with and without leveraging the augmented data. Results: Through extensive experiments on 180k+ real-world SVs, we show that mitigating data imbalance can significantly improve the predictive performance of models for all the CVSS tasks, by up to 31.8% in Matthews Correlation Coefficient. We also discover that simple text augmentation like combining random text insertion, deletion, and replacement can outperform the baseline across the board. Conclusions: Our study provides the motivation and the first promising step toward tackling data imbalance for effective SV assessment.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

πŸ“œ Similar Papers

In the same crypt β€” Software Engineering

Died the same way β€” πŸ‘» Ghosted