Universal and non-universal text statistics: Clustering coefficient for language identification

November 18, 2019 · Declared Dead · 🏛 Physica A: Statistical Mechanics and its Applications

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Diego Espitia, Hernán Larralde · arXiv ID: 1911.08915 · Category: physics.soc-ph · Cross-listed: cs.CL · Citations: 1 · Venue: Physica A: Statistical Mechanics and its Applications · Last Checked: 4 months ago

Abstract

In this work we analyze statistical properties of 91 relatively small texts in 7 different languages (Spanish, English, French, German, Turkish, Russian, Icelandic), as well as texts with randomly inserted spaces. Despite the small size of the texts (around 11260 different words), the well-known universal statistical laws, namely Zipf's and Herdan-Heaps' laws, are confirmed and agree closely with results obtained elsewhere. We also construct a word co-occurrence network for each text. While the degree distribution is again universal, we note that the distribution of clustering coefficients, which depend strongly on the local structure of networks, can be used to differentiate between languages, as well as to distinguish natural languages from random texts.
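The quantities named in the abstract can be illustrated with a minimal sketch. This is not the authors' code (none was found): the tokenizer, the choice to link only immediately adjacent words, and all function names are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(text):
    # Assumption: words are maximal runs of Unicode letters, lowercased.
    return re.findall(r"[^\W\d_]+", text.lower())


def rank_frequency(words):
    # Zipf-style data: frequencies sorted descending (index i is rank i + 1).
    return sorted(Counter(words).values(), reverse=True)


def vocabulary_growth(words):
    # Herdan-Heaps-style data: distinct words seen after each token.
    seen, growth = set(), []
    for w in words:
        seen.add(w)
        growth.append(len(seen))
    return growth


def cooccurrence_network(words):
    # Assumption: an undirected edge joins two different words
    # whenever they appear next to each other in the text.
    adj = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def clustering_coefficients(adj):
    # Local clustering coefficient: fraction of a node's neighbor pairs
    # that are themselves linked; defined as 0 for degree below 2.
    coeffs = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs[node] = 0.0
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj.get(u, set()))
        coeffs[node] = 2.0 * links / (k * (k - 1))
    return coeffs


if __name__ == "__main__":
    words = tokenize("the cat sat on the mat and the cat ran")
    print(rank_frequency(words))
    print(vocabulary_growth(words))
    print(clustering_coefficients(cooccurrence_network(words)))
```

Comparing the resulting coefficient distributions across texts would be the next step; the abstract does not say how that comparison is carried out, so it is left out here.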

