Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces
March 18, 2018 ยท Declared Dead ยท ๐ ESEC/SIGSOFT FSE
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Jordan Henkel, Shuvendu K. Lahiri, Ben Liblit, Thomas Reps
arXiv ID
1803.06686
Category
cs.SE: Software Engineering
Citations
81
Venue
ESEC/SIGSOFT FSE
Last Checked
1 month ago
Abstract
With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning technique can be applied. In this paper, we use abstractions of traces obtained from symbolic execution of a program as a representation for learning word embeddings. We trained a variety of word embeddings under hundreds of parameterizations, and evaluated each learned embedding on a suite of different tasks. In our evaluation, we obtain 93% top-1 accuracy on a benchmark consisting of over 19,000 API-usage analogies extracted from the Linux kernel. In addition, we show that embeddings learned from (mainly) semantic abstractions provide nearly triple the accuracy of those learned from (mainly) syntactic abstractions.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Software Engineering
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
GraphCodeBERT: Pre-training Code Representations with Data Flow
R.I.P.
๐ป
Ghosted
DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars
R.I.P.
๐ป
Ghosted
Microservices: yesterday, today, and tomorrow
R.I.P.
๐ป
Ghosted
Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks
R.I.P.
๐ป
Ghosted
A Survey of Machine Learning for Big Code and Naturalness
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted