Evaluating Large Language Models for Line-Level Vulnerability Localization

March 30, 2024 Β· Declared Dead Β· πŸ› IEEE Transactions on Software Engineering

πŸ‘» CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Jian Zhang, Chong Wang, Anran Li, Weisong Sun, Cen Zhang, Wei Ma, Yang Liu arXiv ID 2404.00287 Category cs.SE: Software Engineering Cross-listed cs.CR Citations 24 Venue IEEE Transactions on Software Engineering Last Checked 4 months ago
Abstract
Recently, Automated Vulnerability Localization (AVL) has attracted growing attention, aiming to facilitate diagnosis by pinpointing the specific lines of code responsible for vulnerabilities. Large Language Models (LLMs) have shown potential in various domains, yet their effectiveness in line-level vulnerability localization remains underexplored. In this work, we present the first comprehensive empirical evaluation of LLMs for AVL. Our study examines 19 leading LLMs suitable for code analysis, including ChatGPT and multiple open-source models, spanning encoder-only, encoder-decoder, and decoder-only architectures, with model sizes from 60M to 70B parameters. We evaluate three paradigms including few-shot prompting, discriminative fine-tuning, and generative fine-tuning with and without Low-Rank Adaptation (LoRA), on both a BigVul-derived dataset for C/C++ and a smart contract vulnerability dataset.} Our results show that discriminative fine-tuning achieves substantial performance gains over existing learning-based AVL methods when sufficient training data is available. In low-data settings, prompting advanced LLMs such as ChatGPT proves more effective. We also identify challenges related to input length and unidirectional context during fine-tuning, and propose two remedial strategies: a sliding window approach and right-forward embedding, both of which yield significant improvements. Moreover, we provide the first assessment of LLM generalizability in AVL, showing that certain models can transfer effectively across Common Weakness Enumerations (CWEs) and projects. However, performance degrades notably for newly discovered vulnerabilities containing unfamiliar lexical or structural patterns, underscoring the need for continual adaptation.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

πŸ“œ Similar Papers

In the same crypt β€” Software Engineering

Died the same way β€” πŸ‘» Ghosted