Evolution of Privacy Loss in Wikipedia
December 11, 2015 Β· Declared Dead Β· π Web Search and Data Mining
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Marian-Andrei Rizoiu, Lexing Xie, Tiberio Caetano, Manuel Cebrian
arXiv ID
1512.03523
Category
cs.SI: Social & Info Networks
Citations
22
Venue
Web Search and Data Mining
Last Checked
3 months ago
Abstract
The cumulative effect of collective online participation has an important and adverse impact on individual privacy. As an online system evolves over time, new digital traces of individual behavior may uncover previously hidden statistical links between an individual's past actions and her private traits. To quantify this effect, we analyze the evolution of individual privacy loss by studying the edit history of Wikipedia over 13 years, including more than 117,523 different users performing 188,805,088 edits. We trace each Wikipedia's contributor using apparently harmless features, such as the number of edits performed on predefined broad categories in a given time period (e.g. Mathematics, Culture or Nature). We show that even at this unspecific level of behavior description, it is possible to use off-the-shelf machine learning algorithms to uncover usually undisclosed personal traits, such as gender, religion or education. We provide empirical evidence that the prediction accuracy for almost all private traits consistently improves over time. Surprisingly, the prediction performance for users who stopped editing after a given time still improves. The activities performed by new users seem to have contributed more to this effect than additional activities from existing (but still active) users. Insights from this work should help users, system designers, and policy makers understand and make long-term design choices in online content creation systems.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Social & Info Networks
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Fake News Detection on Social Media: A Data Mining Perspective
R.I.P.
π»
Ghosted
Natural Scales in Geographical Patterns
R.I.P.
π»
Ghosted
Representation Learning on Graphs: Methods and Applications
R.I.P.
π»
Ghosted
The COVID-19 Social Media Infodemic
R.I.P.
π»
Ghosted
OSMnx: New Methods for Acquiring, Constructing, Analyzing, and Visualizing Complex Street Networks
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted