Sentiment Analysis of Cybersecurity Content on Twitter and Reddit

April 26, 2022 · Declared Dead · 🏛 Data Mining and Machine Learning

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Bipun Thapa
arXiv ID: 2204.12267
Category: cs.CL: Computation & Language
Cross-listed: cs.CR, cs.LG
Citations: 17
Venue: Data Mining and Machine Learning
Last Checked: 4 months ago

Abstract

Sentiment Analysis provides an opportunity to understand the subject(s), especially in the digital age, due to an abundance of public data and effective algorithms. Cybersecurity is a subject on which public opinions are plentiful and differing. This descriptive research analyzed cybersecurity content on Twitter and Reddit to measure its sentiment: positive, negative, or neutral. The data from Twitter and Reddit was collected via platform-specific APIs during a selected timeframe to create datasets, which were then analyzed individually for their sentiment by VADER, an NLP (Natural Language Processing) algorithm. A random sample of cybersecurity content (ten tweets and posts) was also classified for sentiment by twenty human annotators to evaluate the performance of VADER. Cybersecurity content on Twitter was at least 48% positive, and on Reddit at least 26.5% positive. Positive or neutral content far outweighed negative content on both platforms. When compared to human classification, which was considered the standard or source of truth, VADER produced 60% accuracy for Twitter and 70% for Reddit in assessing the sentiment; in other words, there was some agreement between the algorithm and the human classifiers. Overall, the goal was to explore an uninhibited research topic about cybersecurity sentiment.
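The evaluation described in the abstract, comparing VADER's labels against human annotation, can be sketched in a few lines of Python. This is a minimal illustration, not the author's code. It assumes the conventional VADER cutoffs on the compound score (0.05 and -0.05) and assumes the human reference label is the majority vote of the annotators; the abstract confirms neither. The function names and the sample scores and votes are hypothetical.

```python
from collections import Counter

# Conventional cutoffs suggested in the VADER documentation.
# Assumption: the abstract does not say which thresholds the study used.
POS_CUTOFF = 0.05
NEG_CUTOFF = -0.05


def label_from_compound(compound):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= POS_CUTOFF:
        return "positive"
    if compound <= NEG_CUTOFF:
        return "negative"
    return "neutral"


def majority_label(votes):
    """Return the most common annotator label.

    Assumption: majority vote defines the human reference label.
    Ties go to the label that appears first in `votes`.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    return Counter(votes).most_common(1)[0][0]


def accuracy(predicted, reference):
    """Fraction of items where the predicted label equals the reference."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    if not reference:
        raise ValueError("label lists must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical compound scores for five items (not data from the paper).
sample_scores = [0.62, -0.41, 0.0, 0.30, -0.02]

# Hypothetical annotator votes for the same five items.
sample_votes = [
    ["positive", "positive", "neutral"],
    ["negative", "negative", "negative"],
    ["positive", "neutral", "positive"],
    ["positive", "positive", "positive"],
    ["neutral", "neutral", "negative"],
]

predicted_labels = [label_from_compound(s) for s in sample_scores]
reference_labels = [majority_label(v) for v in sample_votes]
print(f"accuracy: {accuracy(predicted_labels, reference_labels):.0%}")
```

In a real pipeline the compound scores would come from a VADER implementation rather than a hard-coded list. The accuracy figure is simply the share of sampled items on which the algorithm and the reference label agree.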