Mens Sana In Corpore Sano: Sound Firmware Corpora for Vulnerability Research
April 18, 2024 Β· Declared Dead Β· π Network and Distributed System Security Symposium
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
RenΓ© Helmke, Elmar Padilla, Nils Aschenbruck
arXiv ID
2404.11977
Category
cs.CR: Cryptography & Security
Cross-listed
cs.DL
Citations
1
Venue
Network and Distributed System Security Symposium
Last Checked
3 months ago
Abstract
Firmware corpora for vulnerability research should be scientifically sound. Yet, several practical challenges complicate the creation of sound corpora: Sample acquisition, e.g., is hard and one must overcome the barrier of proprietary or encrypted data. As image contents are unknown prior analysis, it is hard to select high-quality samples that can satisfy scientific demands. Ideally, we help each other out by sharing data. But here, sharing is problematic due to copyright laws. Instead, papers must carefully document each step of corpus creation: If a step is unclear, replicability is jeopardized. This has cascading effects on result verifiability, representativeness, and, thus, soundness. Despite all challenges, how can we maintain the soundness of firmware corpora? This paper thoroughly analyzes the problem space and investigates its impact on research: We distill practical binary analysis challenges that significantly influence corpus creation. We use these insights to derive guidelines that help researchers to nurture corpus replicability and representativeness. We apply them to 44 top tier papers and systematically analyze scientific corpus creation practices. Our comprehensive analysis confirms that there is currently no common ground in related work. It shows the added value of our guidelines, as they discover methodical issues in corpus creation and unveil miniscule step stones in documentation. These blur visions on representativeness, hinder replicability, and, thus, negatively impact the soundness of otherwise excellent work. Finally, we show the feasibility of our guidelines and build a new, replicable corpus for large-scale analyses on Linux firmware: LFwC. We share rich meta data for good (and proven) replicability. We verify unpacking, deduplicate, identify contents, provide ground truth, and show LFwC's utility for research.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Cryptography & Security
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
The Limitations of Deep Learning in Adversarial Settings
R.I.P.
π»
Ghosted
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
R.I.P.
π»
Ghosted
Spectre Attacks: Exploiting Speculative Execution
R.I.P.
π»
Ghosted
How To Backdoor Federated Learning
R.I.P.
π»
Ghosted
Evasion Attacks against Machine Learning at Test Time
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted