Forklift: An Extensible Neural Lifter
April 01, 2024 Β· Declared Dead Β· π arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Jordi Armengol-EstapΓ©, Rodrigo C. O. Rocha, Jackson Woodruff, Pasquale Minervini, Michael F. P. O'Boyle
arXiv ID
2404.16041
Category
cs.PL: Programming Languages
Cross-listed
cs.AI,
cs.LG
Citations
3
Venue
arXiv.org
Last Checked
4 months ago
Abstract
The escalating demand to migrate legacy software across different Instruction Set Architectures (ISAs) has driven the development of assembly-to-assembly translators to map between their respective assembly languages. However, the development of these tools requires substantial engineering effort. State-of-the-art approaches use lifting, a technique where source assembly code is translated to an architecture-independent intermediate representation (IR) (for example, the LLVM IR) and use a pre-existing compiler to recompile the IR to the target ISA. However, the hand-written rules these lifters employ are sensitive to the particular compiler and optimization level used to generate the code and require significant engineering effort to support each new ISA. We propose Forklift, the first neural lifter that learns how to translate assembly to LLVM IR using a token-level encoder-decoder Transformer. We show how to incrementally add support to new ISAs by fine tuning the assembly encoder and freezing the IR decoder, improving the overall accuracy and efficiency. We collect millions of parallel LLVM IR, x86, ARM, and RISC-V programs across compilers and optimization levels to train Forklift and set up an input/output-based accuracy harness. We evaluate Forklift on two challenging benchmark suites and translate 2.5x more x86 programs than a state-of-the-art hand-written lifter and 4.4x more x86 programs than GPT-4 as well as enabling translation from new ISAs.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Programming Languages
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
R.I.P.
π»
Ghosted
Glow: Graph Lowering Compiler Techniques for Neural Networks
R.I.P.
π»
Ghosted
Learnable Programming: Blocks and Beyond
R.I.P.
π»
Ghosted
Scenic: A Language for Scenario Specification and Scene Generation
R.I.P.
π»
Ghosted
Vandal: A Scalable Security Analysis Framework for Smart Contracts
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted