The Sweet Danger of Sugar: Debunking Representation Learning for Encrypted Traffic Classification

July 22, 2025 · Declared Dead · 🏛 ACM SIGCOMM 2025

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Yuqi Zhao, Giovanni Dettori, Matteo Boffa, Luca Vassio, Marco Mellia
arXiv ID: 2507.16438
Category: cs.NI (Networking & Internet)
Cross-listed: cs.LG
Citations: 0
Venue: ACM SIGCOMM 2025
Last Checked: 3 months ago

Abstract

Recently we have witnessed an explosion of proposals that, inspired by language models like BERT, exploit representation learning models to create traffic representations. All of them promise astonishing performance in encrypted traffic classification (up to 98% accuracy). In this paper, with a networking expert mindset, we critically reassess their performance. Through extensive analysis, we demonstrate that the reported successes are heavily influenced by data preparation problems. These problems allow the models to find easy shortcuts (spurious correlations between features and labels) during fine-tuning, which unrealistically boost their performance. When such shortcuts are not present, as in real scenarios, these models perform poorly. We also introduce Pcap-Encoder, an LM-based representation learning model that we specifically designed to extract features from protocol headers. Pcap-Encoder appears to be the only model that provides an instrumental representation for traffic classification. Yet its complexity calls into question its applicability in practical settings. Our findings reveal flaws in dataset preparation and model training, calling for better and more conscious test design. We propose a correct evaluation methodology and stress the need for rigorous benchmarking.

📜 Similar Papers

In the same crypt — Networking & Internet

R.I.P. 👻 Ghosted

Efficient Multi-User Computation Offloading for Mobile-Edge Cloud Computing

Xu Chen, Lei Jiao, ... (+2 more)

cs.NI 🏛 IEEE/ACM ToN 📚 2.2K cites 10 years ago

📚 The Cartographer

Federated Learning in Mobile Edge Networks: A Comprehensive Survey

Wei Yang Bryan Lim, Nguyen Cong Luong, ... (+6 more)

cs.NI 🏛 IEEE COMST 📚 2.1K cites 6 years ago

📚 The Cartographer

A Survey of Indoor Localization Systems and Technologies

Faheem Zafari, Athanasios Gkelias, Kin Leung

cs.NI 🏛 IEEE COMST 📚 2.1K cites 8 years ago

R.I.P. 👻 Ghosted

Survey of Important Issues in UAV Communication Networks

Lav Gupta, Raj Jain, Gabor Vaszkun

cs.NI 🏛 IEEE COMST 📚 2.0K cites 10 years ago

📚 The Cartographer

Network Function Virtualization: State-of-the-art and Research Challenges

Rashid Mijumbi, Joan Serrat, ... (+4 more)

cs.NI 🏛 IEEE COMST 📚 1.8K cites 10 years ago

📚 The Cartographer

Applications of Deep Reinforcement Learning in Communications and Networking: A Survey

Nguyen Cong Luong, Dinh Thai Hoang, ... (+5 more)

cs.NI 🏛 IEEE COMST 📚 1.7K cites 7 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 8 years ago