Personal Names Popularity Estimation and its Application to Record Linkage

November 13, 2018 · Declared Dead · 🏛 Symposium on Advances in Databases and Information Systems

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Ksenia Zhagorina, Pavel Braslavski, Vladimir Gusev
arXiv ID: 1811.05361
Category: cs.DB (Databases)
Citations: 2
Venue: Symposium on Advances in Databases and Information Systems
Last Checked: 4 months ago

Abstract
This study deals with a fairly simply formulated problem -- how to estimate the number of people bearing the same full name in a large population. Estimation of name popularity can leverage personal name matching in databases and be of interest for many other domains. A distinctive feature of large collections of names is that they contain a large number of unique items, which is challenging for statistical modeling. We investigate a number of statistical techniques and also propose a simple yet effective method aimed at obtaining more accurate count estimates. In our experiments we use a dataset containing about 20 million name occurrences that correspond to about 13 million real-world persons. We perform a thorough evaluation of the name count estimation methods and a record linkage experiment guided by name popularity estimates. Obtained results suggest that theoretically informed approaches outperform simple heuristics and can be useful in a variety of applications.
