Race, Religion and the City: Twitter Word Frequency Patterns Reveal Dominant Demographic Dimensions in the United States

May 10, 2016 · Declared Dead · 🏛 Palgrave Communications

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Eszter Bokányi, Dániel Kondor, László Dobos, Tamás Sebők, József Stéger, István Csabai, Gábor Vattay
arXiv ID: 1605.02951
Category: physics.soc-ph
Cross-listed: cs.SI
Citations: 31
Venue: Palgrave Communications
Last Checked: 3 months ago
Abstract
Recently, numerous approaches have emerged in the social sciences to exploit the opportunities made possible by the vast amounts of data generated by online social networks (OSNs). Having access to information about users on such a scale opens up a range of possibilities, all without the limitations associated with often slow and expensive paper-based polls. A question that remains to be satisfactorily addressed, however, is how demography is represented in OSN content. Here, we study language use in the US using a corpus of text compiled from over half a billion geo-tagged messages from the online microblogging platform Twitter. Our intention is to reveal the most important spatial patterns in language use in an unsupervised manner and relate them to demographics. Our approach is based on Latent Semantic Analysis (LSA) augmented with the Robust Principal Component Analysis (RPCA) methodology. We find spatially correlated patterns that can be interpreted based on the words associated with them. The main language features can be related to slang use, urbanization, travel, religion and ethnicity, the patterns of which are shown to correlate plausibly with traditional census data. Our findings thus validate the concept of demography being represented in OSN language use and show that the traits observed are inherently present in the word frequencies without any previous assumptions about the dataset. Thus, they could form the basis of further research focusing on the evaluation of demographic data estimation from other big data sources, or on the dynamical processes that result in the patterns found here.
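No code accompanies the paper, so the following is not the authors' implementation. It is a minimal sketch, assuming NumPy, of the two ingredients the abstract names: RPCA via principal component pursuit (solved here with a standard ADMM / inexact augmented Lagrangian loop) to split a region-by-word matrix into a low-rank part plus sparse outliers, followed by LSA, i.e. a truncated SVD of the cleaned matrix. The preprocessing (relative word frequencies per region, z-scored per word), the order of the two steps, the hypothetical function names `rpca` and `lsa_components`, and the default parameters `lam = 1/sqrt(max(n1, n2))` and `mu` are all illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


def shrink(x, tau):
    """Elementwise soft-thresholding: sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svd_shrink(x, tau):
    """Singular value thresholding: soft-threshold the singular values of x."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * shrink(s, tau)) @ vt


def rpca(m, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Principal component pursuit: m ~ low_rank + sparse (ADMM iterations).

    Default lam and mu follow common PCP choices; they are assumptions here.
    """
    m = np.asarray(m, dtype=float)
    n1, n2 = m.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n1, n2))
    if mu is None:
        mu = n1 * n2 / (4.0 * np.abs(m).sum())
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    low_rank = np.zeros_like(m)
    norm_m = np.linalg.norm(m)
    for _ in range(max_iter):
        # Low-rank update: nuclear-norm proximal step.
        low_rank = svd_shrink(m - sparse + dual / mu, 1.0 / mu)
        # Sparse update: L1 proximal step.
        sparse = shrink(m - low_rank + dual / mu, lam / mu)
        residual = m - low_rank - sparse
        dual = dual + mu * residual
        if np.linalg.norm(residual) <= tol * norm_m:
            break
    return low_rank, sparse


def lsa_components(counts, k=2):
    """Hypothetical pipeline: region x word counts -> top-k LSA components.

    Returns region scores (n_regions x k), word loadings (k x n_words)
    and all singular values of the RPCA-cleaned matrix.
    """
    counts = np.asarray(counts, dtype=float)
    freqs = counts / counts.sum(axis=1, keepdims=True)  # relative frequencies
    z = (freqs - freqs.mean(axis=0)) / (freqs.std(axis=0) + 1e-12)  # per-word z-score
    low_rank, _ = rpca(z)
    u, s, vt = np.linalg.svd(low_rank, full_matrices=False)
    return u[:, :k] * s[:k], vt[:k], s
```

Each row of the returned scores belongs to one region, so a column can be mapped to look for spatial patterns, while the largest-magnitude entries in a row of the loadings point to the words driving that component. This mirrors the kind of interpretation the abstract describes, but reproducing the paper's results would require its own preprocessing and parameter choices.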

📜 Similar Papers

In the same crypt: physics.soc-ph

R.I.P. 👻 Ghosted

Scale-free networks are rare

Anna D. Broido, Aaron Clauset

physics.soc-ph πŸ› Nat. Commun. πŸ“š 988 cites 8 years ago

Died the same way: 👻 Ghosted