Unsupervised clustering of file dialects according to monotonic decompositions of mixtures

February 09, 2023 · Declared Dead · 🏛 2023 IEEE Security and Privacy Workshops (SPW)

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Michael Robinson, Tate Altman, Denley Lam, Letitia W. Li
arXiv ID: 2304.09082
Category: cs.PL (Programming Languages)
Cross-listed: cs.CL, cs.IR
Citations: 0
Venue: 2023 IEEE Security and Privacy Workshops (SPW)
Last Checked: 4 months ago

Abstract

This paper proposes an unsupervised classification method that partitions a set of files into non-overlapping dialects based upon their behaviors, determined by messages produced by a collection of programs that consume them. The pattern of messages can be used as the signature of a particular kind of behavior, with the understanding that some messages are likely to co-occur, while others are not. Patterns of messages can be used to classify files into dialects. A dialect is defined by a subset of messages, called the required messages. Once files are conditioned upon dialect and its required messages, the remaining messages are statistically independent. With this definition of dialect in hand, we present a greedy algorithm that deduces candidate dialects from a dataset consisting of a matrix of file-message data, demonstrate its performance on several file formats, and prove conditions under which it is optimal. We show that an analyst needs to consider fewer dialects than distinct message patterns, which reduces their cognitive load when studying a complex format.
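To make the idea of a dialect as a set of required messages more concrete, here is a minimal Python sketch. It is not the paper's algorithm, whose details the abstract does not give. It is a simplified greedy heuristic written for illustration, and it does not test the statistical independence of the remaining messages. The function name `greedy_dialects`, the file names, and the message labels `m1` to `m6` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data: each file maps to the set of messages that the
# consuming programs emitted for it (one row of a file-message matrix).
TOY_FILES = {
    "a.bin": frozenset({"m1", "m2"}),
    "b.bin": frozenset({"m1", "m2", "m3"}),
    "c.bin": frozenset({"m1", "m2", "m4"}),
    "d.bin": frozenset({"m5"}),
    "e.bin": frozenset({"m5", "m6"}),
}


def greedy_dialects(files):
    """Partition files into non-overlapping groups (illustrative heuristic).

    Assumption: this is NOT the authors' method. At each step it takes the
    most frequent message among unassigned files as a seed, groups every
    unassigned file that produced it, and records the messages common to
    the whole group as that group's "required messages".
    """
    remaining = dict(files)
    dialects = []
    while remaining:
        counts = Counter(m for msgs in remaining.values() for m in msgs)
        if not counts:
            # Only files that produced no messages are left.
            dialects.append((frozenset(), sorted(remaining)))
            break
        # Most frequent message; ties are broken alphabetically.
        seed = min(counts, key=lambda m: (-counts[m], m))
        members = sorted(f for f, msgs in remaining.items() if seed in msgs)
        # Required messages: those shared by every member of the group.
        required = frozenset.intersection(*(remaining[f] for f in members))
        dialects.append((required, members))
        for f in members:
            del remaining[f]
    return dialects


if __name__ == "__main__":
    for required, members in greedy_dialects(TOY_FILES):
        print(sorted(required), members)
```

On this toy input the five files show five distinct message patterns, but the heuristic groups them into two candidate dialects, one requiring `m1` and `m2` and one requiring `m5`. That mirrors, on a small scale, the abstract's claim that an analyst needs to consider fewer dialects than message patterns.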

