Scalable and Generalizable Social Bot Detection through Data Selection

November 20, 2019 · Declared Dead · 🏛 AAAI Conference on Artificial Intelligence

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Kai-Cheng Yang, Onur Varol, Pik-Mai Hui, Filippo Menczer
arXiv ID: 1911.09179
Category: cs.CY (Computers & Society)
Cross-listed: cs.LG, cs.SI
Citations: 363
Venue: AAAI Conference on Artificial Intelligence
Last Checked: 2 months ago

Abstract

Efficient and reliable social bot classification is crucial for detecting information manipulation on social media. Despite rapid development, state-of-the-art bot detection models still face generalization and scalability challenges, which greatly limit their applications. In this paper we propose a framework that uses minimal account metadata, enabling efficient analysis that scales up to handle the full stream of public tweets of Twitter in real time. To ensure model accuracy, we build a rich collection of labeled datasets for training and validation. We deploy a strict validation system so that model performance on unseen datasets is also optimized, in addition to traditional cross-validation. We find that strategically selecting a subset of training data yields better model accuracy and generalization than exhaustively training on all available data. Thanks to the simplicity of the proposed model, its logic can be interpreted to provide insights into social bot characteristics.
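The data-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid classifier, the accuracy score, the two-feature toy accounts, and the dataset names ("clean", "noisy", "unseen") are all assumptions made for illustration. The sketch trains on every combination of labeled training datasets, scores each model on held-out datasets it never saw, and keeps the best-scoring combination.

```python
from itertools import combinations
from statistics import mean

# Each row is (features, label); label 0 = human, 1 = bot.
# The two features are hypothetical stand-ins for account metadata.

def fit_centroids(rows):
    """Toy model (an assumption): one mean feature vector per label."""
    centroids = {}
    for label in (0, 1):
        feats = [x for x, y in rows if y == label]
        centroids[label] = [mean(col) for col in zip(*feats)]
    return centroids

def predict(centroids, x):
    """Assign the label whose centroid is closest in squared distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

def accuracy(centroids, rows):
    return mean(1.0 if predict(centroids, x) == y else 0.0 for x, y in rows)

def select_training_subset(train_sets, holdout_sets):
    """Try every combination of training datasets and keep the one whose
    model scores best, on average, across the held-out datasets."""
    best = None
    names = sorted(train_sets)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            train_rows = [r for name in combo for r in train_sets[name]]
            if {y for _, y in train_rows} != {0, 1}:
                continue  # need both classes to fit the toy model
            model = fit_centroids(train_rows)
            score = mean(accuracy(model, h) for h in holdout_sets.values())
            if best is None or score > best[0]:
                best = (score, combo)
    return best

# Toy data: "noisy" has labels that conflict with the held-out accounts.
train_sets = {
    "clean": [((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.9, 0.8), 1), ((0.8, 0.9), 1)],
    "noisy": [((0.9, 0.9), 0), ((0.8, 0.8), 0), ((0.1, 0.1), 1), ((0.2, 0.2), 1)],
}
holdout_sets = {"unseen": [((0.15, 0.15), 0), ((0.85, 0.85), 1)]}

best_score, best_combo = select_training_subset(train_sets, holdout_sets)
print(best_score, best_combo)
```

In this toy setup, training on the "clean" dataset alone scores higher on the unseen accounts than training on both datasets together, which mirrors the abstract's claim that a selected subset can generalize better than all available data. Exhaustive enumeration is only practical for a handful of datasets.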


📜 Similar Papers

In the same crypt — Computers & Society

R.I.P. 👻 Ghosted

A Survey Of Methods For Explaining Black Box Models

Riccardo Guidotti, Anna Monreale, ... (+4 more)

cs.CY 🏛 ACM CSUR 📚 4.6K cites 8 years ago

R.I.P. 👻 Ghosted

Artificial Intelligence: the global landscape of ethics guidelines

Anna Jobin, Marcello Ienca, Effy Vayena

cs.CY 🏛 arXiv 📚 2.2K cites 6 years ago

R.I.P. 👻 Ghosted

The role of artificial intelligence in achieving the Sustainable Development Goals

Ricardo Vinuesa, Hossein Azizpour, ... (+8 more)

cs.CY 🏛 Nat. Commun. 📚 1.9K cites 6 years ago

R.I.P. 👻 Ghosted

Green AI

Roy Schwartz, Jesse Dodge, ... (+2 more)

cs.CY 🏛 arXiv 📚 1.5K cites 6 years ago

R.I.P. 👻 Ghosted

Principles alone cannot guarantee ethical AI

Brent Mittelstadt

cs.CY 🏛 Nat. Mach. Int. 📚 1.1K cites 6 years ago

R.I.P. 👻 Ghosted

Tackling Climate Change with Machine Learning

David Rolnick, Priya L. Donti, ... (+20 more)

cs.CY 🏛 ACM CSUR 📚 1.0K cites 6 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 5 years ago

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago