Addressing Leakage in Self-Supervised Contextualized Code Retrieval

April 17, 2022 · Declared Dead · 🏛 International Conference on Computational Linguistics

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Johannes Villmow, Viola Campos, Adrian Ulges, Ulrich Schwanecke
arXiv ID: 2204.11594
Category: cs.SE (Software Engineering)
Cross-listed: cs.IR, cs.LG
Citations: 3
Venue: International Conference on Computational Linguistics
Last Checked: 4 months ago
Abstract
We address contextualized code retrieval, the search for code snippets helpful to fill gaps in a partial input program. Our approach facilitates a large-scale self-supervised contrastive training by splitting source code randomly into contexts and targets. To combat leakage between the two, we suggest a novel approach based on mutual identifier masking, dedentation, and the selection of syntax-aligned targets. Our second contribution is a new dataset for direct evaluation of contextualized code retrieval, based on a dataset of manually aligned subpassages of code clones. Our experiments demonstrate that our approach improves retrieval substantially, and yields new state-of-the-art results for code clone and defect detection.
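The abstract names three ingredients: a context/target split, dedentation, and mutual identifier masking. The Python sketch below is one plausible reading of those ideas, not the authors' implementation (no code was released). The `[GAP]` marker, the `VAR<i>` placeholders, and all function names are hypothetical. It finds identifiers with a simple regex, so it also matches words inside strings and comments, and it does not attempt syntax-aligned target selection: the caller chooses the target line range.

```python
import keyword
import re
import textwrap

GAP = "[GAP]"  # hypothetical marker for the removed span
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def identifiers(code):
    """Return the set of identifier-like tokens, excluding Python keywords."""
    return {
        m.group(0)
        for m in IDENT.finditer(code)
        if not keyword.iskeyword(m.group(0))
    }


def split_context_target(lines, start, end):
    """Cut lines[start:end] out as the target and leave a gap marker behind.

    The target is dedented so that its indentation level does not reveal
    where in the context it came from.
    """
    target = textwrap.dedent("\n".join(lines[start:end]))
    context = "\n".join(lines[:start] + [GAP] + lines[end:])
    return context, target


def mask_shared_identifiers(context, target, placeholder="VAR"):
    """Replace identifiers that occur on both sides with neutral placeholders.

    Assumption: only the target is rewritten here; the source does not say
    which side is masked or how placeholders are chosen.
    """
    shared = sorted(identifiers(context.replace(GAP, "")) & identifiers(target))
    mapping = {name: f"{placeholder}{i}" for i, name in enumerate(shared)}
    masked = IDENT.sub(lambda m: mapping.get(m.group(0), m.group(0)), target)
    return masked, mapping


if __name__ == "__main__":
    source = [
        "def f(xs):",
        "    total = 0",
        "    for x in xs:",
        "        total += x",
        "    return total",
    ]
    context, target = split_context_target(source, 2, 4)
    masked, mapping = mask_shared_identifiers(context, target)
    print(context)
    print(masked)
    print(mapping)
```

Masking the shared names is meant to prevent a retriever from matching context and target through surface overlap such as `total` or `xs`, which is the kind of leakage the abstract refers to.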
