Entropic selection of concepts unveils hidden topics in documents corpora

May 18, 2017 Β· Declared Dead Β· πŸ› arXiv.org

πŸ‘» CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Andrea Martini, Alessio Cardillo, Paolo De Los Rios arXiv ID 1705.06510 Category physics.soc-ph Cross-listed cs.CL, cs.DL, cs.SI Citations 2 Venue arXiv.org Last Checked 4 months ago
Abstract
The organization and evolution of science has recently become itself an object of scientific quantitative investigation, thanks to the wealth of information that can be extracted from scientific documents, such as citations between papers and co-authorship between researchers. However, only few studies have focused on the concepts that characterize full documents and that can be extracted and analyzed, revealing the deeper organization of scientific knowledge. Unfortunately, several concepts can be so common across documents that they hinder the emergence of the underlying topical structure of the document corpus, because they give rise to a large amount of spurious and trivial relations among documents. To identify and remove common concepts, we introduce a method to gauge their relevance according to an objective information-theoretic measure related to the statistics of their occurrence across the document corpus. After progressively removing concepts that, according to this metric, can be considered as generic, we find that the topic organization displays a correspondingly more refined structure.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

πŸ“œ Similar Papers

In the same crypt β€” physics.soc-ph

R.I.P. πŸ‘» Ghosted

Scale-free networks are rare

Anna D. Broido, Aaron Clauset

physics.soc-ph πŸ› Nat. Commun. πŸ“š 988 cites 8 years ago

Died the same way β€” πŸ‘» Ghosted