Evaluating the Linguistic Coverage of OpenAlex: An Assessment of Metadata Accuracy and Completeness
September 16, 2024 · Declared Dead · Journal of the Association for Information Science and Technology
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Lucía Céspedes, Diego Kozlowski, Carolina Pradier, Maxime Holmberg Sainte-Marie, Natsumi Solange Shokida, Pierre Benz, Constance Poitras, Anton Boudreau Ninkov, Saeideh Ebrahimy, Philips Ayeni, Sarra Filali, Bing Li, Vincent Larivière
arXiv ID
2409.10633
Category
cs.DL: Digital Libraries
Cross-listed
cs.DB
Citations
38
Venue
Journal of the Association for Information Science and Technology
Last Checked
2 months ago
Abstract
Clarivate's Web of Science (WoS) and Elsevier's Scopus have been for decades the main sources of bibliometric information. Although highly curated, these closed, proprietary databases are largely biased towards English-language publications, underestimating the use of other languages in research dissemination. Launched in 2022, OpenAlex promised comprehensive, inclusive, and open-source research information. While already in use by scholars and research institutions, the quality of its metadata is currently being assessed. This paper contributes to this literature by assessing the completeness and accuracy of OpenAlex's metadata related to language, through a comparison with WoS, as well as an in-depth manual validation of a sample of 6,836 articles. Results show that OpenAlex exhibits a far more balanced linguistic coverage than WoS. However, language metadata is not always accurate, which leads OpenAlex to overestimate the place of English while underestimating that of other languages. If used critically, OpenAlex can provide comprehensive and representative analyses of languages used for scholarly publishing. However, more work is needed at infrastructural level to ensure the quality of metadata on language.
Similar Papers
In the same crypt: Digital Libraries
Measuring academic influence: Not all citations are equal · R.I.P. 👻 Ghosted
The Open Access Advantage Considering Citation, Article Usage and Social Media Attention · R.I.P. 👻 Ghosted
A Bibliometric Review of Large Language Models Research from 2017 to 2023 · R.I.P. 👻 Ghosted
On the Performance of Hybrid Search Strategies for Systematic Literature Reviews in Software Engineering · R.I.P. 👻 Ghosted
A Systematic Identification and Analysis of Scientists on Twitter · R.I.P. 👻 Ghosted
Died the same way: 👻 Ghosted
Language Models are Few-Shot Learners · R.I.P. 👻 Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library · R.I.P. 👻 Ghosted
XGBoost: A Scalable Tree Boosting System · R.I.P. 👻 Ghosted