Towards Understanding the Capability of Large Language Models on Code Clone Detection: A Survey
August 02, 2023 ยท The Cartographer ยท ๐ arXiv.org
"No code URL or promise found in abstract"
"Title-pattern auto-detect: Towards Understanding the Capability of Large Language Models on Code Clone Detection: A Survey"
Evidence collected by the PWNC Scanner
Authors
Shihan Dou, Junjie Shan, Haoxiang Jia, Wenhao Deng, Zhiheng Xi, Wei He, Yueming Wu, Tao Gui, Yang Liu, Xuanjing Huang
arXiv ID
2308.01191
Category
cs.SE: Software Engineering
Citations
34
Venue
arXiv.org
Last Checked
2 days ago
Abstract
Code cloning, the duplication of code fragments, is common in software development. While some reuse aids productivity, excessive cloning hurts maintainability and introduces bugs. Hence, automatic code clone detection is vital. Meanwhile, large language models (LLMs) possess diverse code-related knowledge, making them versatile for various software engineering challenges. However, LLMs' performance in code clone detection is unclear and needs more study for accurate assessment. In this paper, we provide the first comprehensive evaluation of LLMs for clone detection, covering different clone types, languages, and prompts. We find advanced LLMs excel in detecting complex semantic clones, surpassing existing methods. Adding intermediate reasoning steps via chain-of-thought prompts noticeably enhances performance. Additionally, representing code as vector embeddings, especially with text encoders, effectively aids clone detection.Lastly, the ability of LLMs to detect code clones differs among various programming languages. Our study suggests that LLMs have potential for clone detection due to their language capabilities, offering insights for developing robust LLM-based methods to enhance software engineering.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Software Engineering
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
Microservices: yesterday, today, and tomorrow
๐
๐
The Cartographer
A Survey of Machine Learning for Big Code and Naturalness
R.I.P.
๐ป
Ghosted
An Overview on Smart Contracts: Challenges, Advances and Platforms
R.I.P.
๐ป
Ghosted
Slither: A Static Analysis Framework For Smart Contracts
R.I.P.
๐ป
Ghosted