Google Dataset Search by the Numbers

June 12, 2020 · Declared Dead · 🏛 International Workshop on the Semantic Web

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Omar Benjelloun, Shiyu Chen, Natasha Noy
arXiv ID: 2006.06894
Category: cs.IR (Information Retrieval)
Cross-listed: cs.DB
Citations: 73
Venue: International Workshop on the Semantic Web
Last Checked: 3 months ago
Abstract
Scientists, governments, and companies increasingly publish datasets on the Web. Google's Dataset Search extracts dataset metadata -- expressed using schema.org and similar vocabularies -- from Web pages in order to make datasets discoverable. Since we started the work on Dataset Search in 2016, the number of datasets described in schema.org has grown from about 500K to almost 30M. Thus, this corpus has become a valuable snapshot of data on the Web. To the best of our knowledge, this corpus is the largest and most diverse of its kind. We analyze this corpus and discuss where the datasets originate from, what topics they cover, which form they take, and what people searching for datasets are interested in. Based on this analysis, we identify gaps and possible future work to help make data more discoverable.
