Assessing the Code Clone Detection Capability of Large Language Models
July 02, 2024 Β· Declared Dead Β· π 2024 4th International Conference on Code Quality (ICCQ)
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Zixian Zhang, Takfarinas Saber
arXiv ID
2407.02402
Category
cs.SE: Software Engineering
Cross-listed
cs.AI
Citations
13
Venue
2024 4th International Conference on Code Quality (ICCQ)
Last Checked
4 months ago
Abstract
This study aims to assess the performance of two advanced Large Language Models (LLMs), GPT-3.5 and GPT-4, in the task of code clone detection. The evaluation involves testing the models on a variety of code pairs of different clone types and levels of similarity, sourced from two datasets: BigCloneBench (human-made) and GPTCloneBench (LLM-generated). Findings from the study indicate that GPT-4 consistently surpasses GPT-3.5 across all clone types. A correlation was observed between the GPTs' accuracy at identifying code clones and code similarity, with both GPT models exhibiting low effectiveness in detecting the most complex Type-4 code clones. Additionally, GPT models demonstrate a higher performance identifying code clones in LLM-generated code compared to humans-generated code. However, they do not reach impressive accuracy. These results emphasize the imperative for ongoing enhancements in LLM capabilities, particularly in the recognition of code clones and in mitigating their predisposition towards self-generated code clones--which is likely to become an issue as software engineers are more numerous to leverage LLM-enabled code generation and code refactoring tools.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Software Engineering
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Microservices: yesterday, today, and tomorrow
π
π
The Cartographer
A Survey of Machine Learning for Big Code and Naturalness
R.I.P.
π»
Ghosted
An Overview on Smart Contracts: Challenges, Advances and Platforms
R.I.P.
π»
Ghosted
Slither: A Static Analysis Framework For Smart Contracts
R.I.P.
π»
Ghosted
ContractFuzzer: Fuzzing Smart Contracts for Vulnerability Detection
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted