Topo Goes Political: TDA-Based Controversy Detection in Imbalanced Reddit Political Data
March 05, 2025 · Declared Dead · The Web Conference
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Arvindh Arun, Karuna K Chandra, Akshit Sinha, Balakumar Velayutham, Jashn Arora, Manish Jain, Ponnurangam Kumaraguru
arXiv ID
2503.03500
Category
cs.SI: Social & Info Networks
Citations
2
Venue
The Web Conference
Last Checked
4 months ago
Abstract
The detection of controversial content in political discussions on the Internet is a critical challenge in maintaining healthy digital discourse. Unlike much of the existing literature that relies on synthetically balanced data, our work preserves the natural distribution of controversial and non-controversial posts. This real-world imbalance highlights a core challenge that needs to be addressed for practical deployment. Our study re-evaluates well-established methods for detecting controversial content. We curate our own dataset focusing on the Indian political context that preserves the natural distribution of controversial content, with only 12.9% of the posts in our dataset being controversial. This disparity reflects the true imbalance in real-world political discussions and highlights a critical limitation in the existing evaluation methods. Benchmarking on datasets that model data imbalance is vital for ensuring real-world applicability. Thus, in this work, (i) we release our dataset, with an emphasis on class imbalance, that focuses on the Indian political context, (ii) we evaluate existing methods from this domain on this dataset and demonstrate their limitations in the imbalanced setting, (iii) we introduce an intuitive metric to measure a model's robustness to class imbalance, (iv) we also incorporate ideas from the domain of Topological Data Analysis, specifically Persistent Homology, to curate features that provide richer representations of the data. Furthermore, we benchmark models trained with topological features against established baselines.
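To make the imbalance concrete, here is a minimal sketch of why plain accuracy misleads when only 12.9% of posts are controversial. It is not the robustness metric the abstract introduces, which is not specified here. It uses only standard accuracy and positive-class F1 on a hypothetical set of 1000 labels, and the function names and label data are assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_positive(y_true, y_pred):
    """F1 score for the positive (controversial) class; 0.0 if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 129 controversial posts out of 1000 (12.9%).
y_true = [1] * 129 + [0] * 871

# A trivial classifier that never flags anything as controversial.
always_negative = [0] * len(y_true)

acc = accuracy(y_true, always_negative)
f1 = f1_positive(y_true, always_negative)
print(f"accuracy={acc:.3f}, positive-class F1={f1:.3f}")
```

The trivial classifier scores high on accuracy while finding none of the controversial posts, which is the kind of gap that evaluation on synthetically balanced data can hide.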
Similar Papers
In the same crypt: Social & Info Networks
R.I.P. 👻 Ghosted
Fake News Detection on Social Media: A Data Mining Perspective
R.I.P. 👻 Ghosted
Natural Scales in Geographical Patterns
R.I.P. 👻 Ghosted
Representation Learning on Graphs: Methods and Applications
R.I.P. 👻 Ghosted
The COVID-19 Social Media Infodemic
R.I.P. 👻 Ghosted
OSMnx: New Methods for Acquiring, Constructing, Analyzing, and Visualizing Complex Street Networks
Died the same way: 👻 Ghosted
R.I.P. 👻 Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P. 👻 Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P. 👻 Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning