Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs

September 21, 2023 Β· Declared Dead Β· πŸ› Proc. ACM Program. Lang.

πŸ‘» CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Alex Renda, Yi Ding, Michael Carbin arXiv ID 2309.11726 Category cs.PL: Programming Languages Cross-listed cs.LG, cs.SE Citations 2 Venue Proc. ACM Program. Lang. Last Checked 4 months ago
Abstract
Programmers and researchers are increasingly developing surrogates of programs, models of a subset of the observable behavior of a given program, to solve a variety of software development challenges. Programmers train surrogates from measurements of the behavior of a program on a dataset of input examples. A key challenge of surrogate construction is determining what training data to use to train a surrogate of a given program. We present a methodology for sampling datasets to train neural-network-based surrogates of programs. We first characterize the proportion of data to sample from each region of a program's input space (corresponding to different execution paths of the program) based on the complexity of learning a surrogate of the corresponding execution path. We next provide a program analysis to determine the complexity of different paths in a program. We evaluate these results on a range of real-world programs, demonstrating that complexity-guided sampling results in empirical improvements in accuracy.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

πŸ“œ Similar Papers

In the same crypt β€” Programming Languages

Died the same way β€” πŸ‘» Ghosted