R.I.P.
👻
Ghosted
CodeCSE: A Simple Multilingual Model for Code and Comment Sentence Embeddings
July 08, 2024 · Entered Twilight · arXiv.org
Repo contents: .gitignore, .gitmodules, .vscode, CodeBERT, LICENSE, README.md, codecse, example_code.json, example_nl.json, inference.py, requirements.txt
Authors
Anthony Varkey, Siyuan Jiang, Weijing Huang
arXiv ID
2407.06360
Category
cs.SE: Software Engineering
Citations
0
Venue
arXiv.org
Repository
https://github.com/emu-se/codecse
⭐ 5
Last Checked
3 months ago
Abstract
Pretrained language models for code token embeddings are used in code search, code clone detection, and other code-related tasks. Similarly, code function embeddings are useful in such tasks. However, there are no out-of-the-box models for function embeddings in the current literature. So, this paper proposes CodeCSE, a contrastive learning model that learns embeddings for functions and their descriptions in one space. We evaluated CodeCSE using code search. CodeCSE's multilingual zero-shot approach is as efficient as the models finetuned from GraphCodeBERT for specific languages. CodeCSE is open source at https://github.com/emu-se/codecse and the pretrained model is available at the HuggingFace public hub: https://huggingface.co/sjiang1/codecse
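To illustrate why embedding functions and their descriptions in one space is useful for code search, here is a minimal sketch in Python. It does not use the CodeCSE model or its API. The vectors are small hand-written stand-ins for real embeddings, and the names `cosine`, `rank_functions`, `function_vecs`, and `query_vec` are hypothetical, chosen only for this example. The idea shown is the generic one: embed the natural-language query, then rank candidate functions by cosine similarity to it.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_functions(query_vec, function_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in function_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy vectors (assumption): in practice these would come from a model that
# maps code and natural language into the same embedding space.
function_vecs = {
    "read_json_file": [0.9, 0.1, 0.0],
    "sort_list": [0.1, 0.9, 0.1],
    "http_get": [0.0, 0.2, 0.9],
}

# Stand-in for the embedding of a query such as "load a JSON file".
query_vec = [0.8, 0.2, 0.1]

ranking = rank_functions(query_vec, function_vecs)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, `read_json_file` ranks first because its vector points in nearly the same direction as the query vector. Because both sides live in one space, no language-specific retrieval step is needed beyond the similarity comparison.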
Similar Papers
In the same crypt: Software Engineering
R.I.P.
👻
Ghosted
Microservices: yesterday, today, and tomorrow
The Cartographer
A Survey of Machine Learning for Big Code and Naturalness
R.I.P.
👻
Ghosted
An Overview on Smart Contracts: Challenges, Advances and Platforms
R.I.P.
👻
Ghosted
Slither: A Static Analysis Framework For Smart Contracts
R.I.P.
👻
Ghosted