(Almost) All of Entity Resolution
August 10, 2020 Β· Declared Dead Β· π Science Advances
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Olivier Binette, Rebecca C. Steorts
arXiv ID
2008.04443
Category
stat.ME
Cross-listed
cs.DB,
stat.ML
Citations
84
Venue
Science Advances
Last Checked
1 month ago
Abstract
Whether the goal is to estimate the number of people that live in a congressional district, to estimate the number of individuals that have died in an armed conflict, or to disambiguate individual authors using bibliographic data, all these applications have a common theme - integrating information from multiple sources. Before such questions can be answered, databases must be cleaned and integrated in a systematic and accurate way, commonly known as record linkage, de-duplication, or entity resolution. In this article, we review motivational applications and seminal papers that have led to the growth of this area. Specifically, we review the foundational work that began in the 1940's and 50's that have led to modern probabilistic record linkage. We review clustering approaches to entity resolution, semi- and fully supervised methods, and canonicalization, which are being used throughout industry and academia in applications such as human rights, official statistics, medicine, citation networks, among others. Finally, we discuss current research topics of practical importance.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β stat.ME
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Performance Metrics (Error Measures) in Machine Learning Regression, Forecasting and Prognostics: Properties and Typology
R.I.P.
π»
Ghosted
External Validity: From Do-Calculus to Transportability Across Populations
R.I.P.
π»
Ghosted
Least Ambiguous Set-Valued Classifiers with Bounded Error Levels
R.I.P.
π»
Ghosted
Doubly Robust Policy Evaluation and Optimization
R.I.P.
π»
Ghosted
Comparison of Bayesian predictive methods for model selection
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Language Models are Few-Shot Learners
R.I.P.
π»
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
π»
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
π»
Ghosted