The Sweet Danger of Sugar: Debunking Representation Learning for Encrypted Traffic Classification
July 22, 2025 Β· Declared Dead Β· π ACM SIGCOMM 2025
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Yuqi Zhao, Giovanni Dettori, Matteo Boffa, Luca Vassio, Marco Mellia
arXiv ID
2507.16438
Category
cs.NI: Networking & Internet
Cross-listed
cs.LG
Citations
0
Venue
ACM SIGCOMM 2025
Last Checked
3 months ago
Abstract
Recently we have witnessed the explosion of proposals that, inspired by Language Models like BERT, exploit Representation Learning models to create traffic representations. All of them promise astonishing performance in encrypted traffic classification (up to 98% accuracy). In this paper, with a networking expert mindset, we critically reassess their performance. Through extensive analysis, we demonstrate that the reported successes are heavily influenced by data preparation problems, which allow these models to find easy shortcuts - spurious correlation between features and labels - during fine-tuning that unrealistically boost their performance. When such shortcuts are not present - as in real scenarios - these models perform poorly. We also introduce Pcap-Encoder, an LM-based representation learning model that we specifically design to extract features from protocol headers. Pcap-Encoder appears to be the only model that provides an instrumental representation for traffic classification. Yet, its complexity questions its applicability in practical settings. Our findings reveal flaws in dataset preparation and model training, calling for a better and more conscious test design. We propose a correct evaluation methodology and stress the need for rigorous benchmarking.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Networking & Internet
R.I.P.
π»
Ghosted
π
π
The Cartographer
Federated Learning in Mobile Edge Networks: A Comprehensive Survey
π
π
The Cartographer
A Survey of Indoor Localization Systems and Technologies
R.I.P.
π»
Ghosted
Survey of Important Issues in UAV Communication Networks
π
π
The Cartographer
Network Function Virtualization: State-of-the-art and Research Challenges
π
π
The Cartographer
Applications of Deep Reinforcement Learning in Communications and Networking: A Survey
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted