Universal and non-universal text statistics: Clustering coefficient for language identification
November 18, 2019 · Declared Dead · Physica A: Statistical Mechanics and its Applications
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Diego Espitia, Hernán Larralde
arXiv ID
1911.08915
Category
physics.soc-ph
Cross-listed
cs.CL
Citations
1
Venue
Physica A: Statistical Mechanics and its Applications
Last Checked
4 months ago
Abstract
In this work we analyze statistical properties of 91 relatively small texts in 7 different languages (Spanish, English, French, German, Turkish, Russian, Icelandic) as well as texts with randomly inserted spaces. Despite the size (around 11260 different words), the well known universal statistical laws -- namely Zipf and Herdan-Heap's laws -- are confirmed, and are in close agreement with results obtained elsewhere. We also construct a word co-occurrence network of each text. While the degree distribution is again universal, we note that the distribution of Clustering Coefficients, which depend strongly on the local structure of networks, can be used to differentiate between languages, as well as to distinguish natural languages from random texts.
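The quantities named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the co-occurrence network is built, so the code below assumes the simplest convention: two words are linked if they appear next to each other in the text. The helper names (`tokenize`, `rank_frequency`, `cooccurrence_graph`, `local_clustering`) and the sample sentence are hypothetical, and this is not the authors' implementation.

```python
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()[]") for w in text.lower().split())
    return [w for w in words if w]


def rank_frequency(words):
    """Word counts sorted from most to least frequent (the Zipf ranking)."""
    return [count for _, count in Counter(words).most_common()]


def cooccurrence_graph(words):
    """Undirected graph linking consecutive distinct words.

    Assumption: adjacency in the text defines co-occurrence.
    """
    neighbors = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:  # skip self-loops from repeated words
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


def local_clustering(neighbors):
    """Local clustering coefficient of every node.

    C_i = (links among the neighbors of i) / (k_i * (k_i - 1) / 2),
    taken as 0 for nodes with fewer than two neighbors.
    """
    coefficients = {}
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            coefficients[node] = 0.0
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
        coefficients[node] = 2.0 * links / (k * (k - 1))
    return coefficients


# Hypothetical sample text, far smaller than the texts in the paper.
sample = "the cat saw the dog and the dog saw the cat"
words = tokenize(sample)
graph = cooccurrence_graph(words)
clustering = local_clustering(graph)
```

Collecting the values of `clustering` over all nodes of a text gives the kind of clustering-coefficient distribution the abstract compares across languages; the degree of each word is `len(graph[word])`.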
Similar Papers
In the same crypt: physics.soc-ph
R.I.P.
Ghosted
Networks beyond pairwise interactions: structure and dynamics
R.I.P.
Ghosted
Statistical physics of human cooperation
R.I.P.
Ghosted
Vital nodes identification in complex networks
R.I.P.
Ghosted
Influence maximization in complex networks through optimal percolation
R.I.P.
Ghosted
Scale-free networks are rare
Died the same way: Ghosted
R.I.P.
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
Ghosted