Improving Retrieval-Augmented Code Comment Generation by Retrieving for Generation
August 07, 2024 Β· Declared Dead Β· π IEEE International Conference on Software Maintenance and Evolution
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Hanzhen Lu, Zhongxin Liu
arXiv ID
2408.03623
Category
cs.SE: Software Engineering
Citations
10
Venue
IEEE International Conference on Software Maintenance and Evolution
Last Checked
4 months ago
Abstract
Code comment generation aims to generate high-quality comments from source code automatically and has been studied for years. Recent studies proposed to integrate information retrieval techniques with neural generation models to tackle this problem, i.e., Retrieval-Augmented Comment Generation (RACG) approaches, and achieved state-of-the-art results. However, the retrievers in previous work are built independently of their generators. This results in that the retrieved exemplars are not necessarily the most useful ones for generating comments, limiting the performance of existing approaches. To address this limitation, we propose a novel training strategy to enable the retriever to learn from the feedback of the generator and retrieve exemplars for generation. Specifically, during training, we use the retriever to retrieve the top-k exemplars and calculate their retrieval scores, and use the generator to calculate a generation loss for the sample based on each exemplar. By aligning high-score exemplars retrieved by the retriever with low-loss exemplars observed by the generator, the retriever can learn to retrieve exemplars that can best improve the quality of the generated comments. Based on this strategy, we propose a novel RACG approach named JOINTCOM and evaluate it on two real-world datasets, JCSD and PCSD. The experimental results demonstrate that our approach surpasses the state-of-the-art baselines by 7.3% to 30.0% in terms of five metrics on the two datasets. We also conduct a human evaluation to compare JOINTCOM with the best-performing baselines. The results indicate that JOINTCOM outperforms the baselines, producing comments that are more natural, informative, and useful.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Software Engineering
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Microservices: yesterday, today, and tomorrow
π
π
The Cartographer
A Survey of Machine Learning for Big Code and Naturalness
R.I.P.
π»
Ghosted
An Overview on Smart Contracts: Challenges, Advances and Platforms
R.I.P.
π»
Ghosted
Slither: A Static Analysis Framework For Smart Contracts
R.I.P.
π»
Ghosted
ContractFuzzer: Fuzzing Smart Contracts for Vulnerability Detection
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted