Ethnicity sensitive author disambiguation using semi-supervised learning
August 31, 2015 · Declared Dead · International Conference on Knowledge Engineering and the Semantic Web
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Gilles Louppe, Hussein Al-Natsheh, Mateusz Susik, Eamonn Maguire
arXiv ID
1508.07744
Category
cs.DL: Digital Libraries
Cross-listed
cs.IR, stat.ML
Citations
72
Venue
International Conference on Knowledge Engineering and the Semantic Web
Last Checked
2 months ago
Abstract
Author name disambiguation in bibliographic databases is the problem of grouping together scientific publications written by the same person, accounting for potential homonyms and/or synonyms. Among solutions to this problem, digital libraries are increasingly offering tools for authors to manually curate their publications and claim those that are theirs. Indirectly, these tools allow for the inexpensive collection of large annotated training data, which can be further leveraged to build a complementary automated disambiguation system capable of inferring patterns for identifying publications written by the same person. Building on more than 1 million publicly released crowdsourced annotations, we propose an automated author disambiguation solution exploiting this data (i) to learn an accurate classifier for identifying coreferring authors and (ii) to guide the clustering of scientific publications by distinct authors in a semi-supervised way. To the best of our knowledge, our analysis is the first to be carried out on data of this size and coverage. With respect to the state of the art, we validate the general pipeline used in most existing solutions, and improve by: (i) proposing phonetic-based blocking strategies, thereby increasing recall; and (ii) adding strong ethnicity-sensitive features for learning a linkage function, thereby tailoring disambiguation to non-Western author names whenever necessary.
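The blocking step mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which phonetic algorithm the authors use, so the choice of American Soundex here is an assumption, and the names `soundex` and `block_signatures` are hypothetical helpers written for this illustration, not the authors' code. The idea shown is only that signatures whose surnames sound alike land in the same block, so that the later pairwise classifier and clustering run within a block rather than over all pairs.

```python
from collections import defaultdict

# Soundex digit groups (assumption: American Soundex stands in for the
# paper's unspecified phonetic blocking scheme).
_CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        _CODES[ch] = str(digit)


def soundex(name):
    """Return the 4-character Soundex key of a surname ("" if no ASCII letters)."""
    # Non-ASCII letters are dropped; a real system would transliterate first.
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    encoded = first.upper()
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        # Emit a digit only when it differs from the previous coded letter.
        if code and code != prev:
            encoded += code
        # 'h' and 'w' do not separate letters that share a code; vowels do.
        if ch not in "hw":
            prev = code
    return (encoded + "000")[:4]


def block_signatures(signatures):
    """Group (signature_id, surname) pairs by the phonetic key of the surname."""
    blocks = defaultdict(list)
    for sig_id, surname in signatures:
        blocks[soundex(surname)].append(sig_id)
    return dict(blocks)


if __name__ == "__main__":
    # Made-up signatures for illustration only.
    sample = [(1, "Smith"), (2, "Smyth"), (3, "Schmidt"), (4, "Louppe")]
    for key, ids in block_signatures(sample).items():
        print(key, ids)
```

Grouping by a phonetic key instead of the exact surname string puts spelling variants of one name in the same block, which is the recall gain the abstract attributes to phonetic blocking. The cost is larger blocks, and therefore more candidate pairs for the linkage function to score.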
Similar Papers
In the same crypt: Digital Libraries
R.I.P. 👻 Ghosted
R.I.P. 👻 Ghosted
Measuring academic influence: Not all citations are equal
R.I.P. 👻 Ghosted
The Open Access Advantage Considering Citation, Article Usage and Social Media Attention
R.I.P. 👻 Ghosted
A Bibliometric Review of Large Language Models Research from 2017 to 2023
R.I.P. 👻 Ghosted
On the Performance of Hybrid Search Strategies for Systematic Literature Reviews in Software Engineering
R.I.P. 👻 Ghosted
A Systematic Identification and Analysis of Scientists on Twitter
Died the same way: 👻 Ghosted
R.I.P. 👻 Ghosted
Language Models are Few-Shot Learners
R.I.P. 👻 Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P. 👻 Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P. 👻 Ghosted