Evaluating Source Code Quality with Large Language Models: a comparative study
August 07, 2024 Β· Declared Dead Β· π Brazilian Symposium on Software Quality
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Igor Regis da Silva SimΓ΅es, Elaine Venson
arXiv ID
2408.07082
Category
cs.SE: Software Engineering
Citations
10
Venue
Brazilian Symposium on Software Quality
Last Checked
4 months ago
Abstract
Code quality is an attribute composed of various metrics, such as complexity, readability, testability, interoperability, reusability, and the use of good or bad practices, among others. Static code analysis tools aim to measure a set of attributes to assess code quality. However, some quality attributes can only be measured by humans in code review activities, readability being an example. Given their natural language text processing capability, we hypothesize that a Large Language Model (LLM) could evaluate the quality of code, including attributes currently not automatable. This paper aims to describe and analyze the results obtained using LLMs as a static analysis tool, evaluating the overall quality of code. We compared the LLM with the results obtained with the SonarQube software and its Maintainability metric for two Open Source Software (OSS) Java projects, one with Maintainability Rating A and the other B. A total of 1,641 classes were analyzed, comparing the results in two versions of models: GPT 3.5 Turbo and GPT 4o. We demonstrated that the GPT 3.5 Turbo LLM has the ability to evaluate code quality, showing a correlation with Sonar's metrics. However, there are specific aspects that differ in what the LLM measures compared to SonarQube. The GPT 4o version did not present the same results, diverging from the previous model and Sonar by assigning a high classification to codes that were assessed as lower quality. This study demonstrates the potential of LLMs in evaluating code quality. However, further research is necessary to investigate limitations such as LLM's cost, variability of outputs and explore quality characteristics not measured by traditional static analysis tools.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Software Engineering
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Microservices: yesterday, today, and tomorrow
π
π
The Cartographer
A Survey of Machine Learning for Big Code and Naturalness
R.I.P.
π»
Ghosted
An Overview on Smart Contracts: Challenges, Advances and Platforms
R.I.P.
π»
Ghosted
Slither: A Static Analysis Framework For Smart Contracts
R.I.P.
π»
Ghosted
ContractFuzzer: Fuzzing Smart Contracts for Vulnerability Detection
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted