A Short Survey on Sense-Annotated Corpora

February 13, 2018 · The Cartographer · International Conference on Language Resources and Evaluation

Authors: Tommaso Pasini, Jose Camacho-Collados
arXiv ID: 1802.04744
Category: cs.CL (Computation & Language)
Citations: 15
Venue: International Conference on Language Resources and Evaluation

Abstract

Large sense-annotated datasets are increasingly necessary for training deep supervised systems in Word Sense Disambiguation. However, gathering high-quality sense-annotated data for as many instances as possible is a laborious and expensive task. This has led to the proliferation of automatic and semi-automatic methods for overcoming the so-called knowledge-acquisition bottleneck. In this short survey we present an overview of sense-annotated corpora, annotated either manually or (semi-)automatically, that are currently available for different languages and that use distinct lexical resources as their sense inventory, e.g. WordNet, Wikipedia and BabelNet. Furthermore, we provide general statistics for each dataset and an analysis of its specific features.
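The abstract mentions general statistics for each dataset but does not define them here. As a hypothetical illustration only (not the survey's own methodology), the sketch below assumes a corpus is represented as a flat list of (lemma, sense) pairs. It computes a few common counts: annotated tokens, distinct lemmas, distinct senses, and the mean number of senses observed per lemma. The `Annotation` class, the `corpus_stats` function and the sense labels are invented for this example. Real corpora use WordNet or BabelNet identifiers and richer formats.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Annotation:
    """A single sense-tagged token: the lemma and the sense ID assigned to it."""
    lemma: str
    sense: str  # hypothetical sense label; real corpora use WordNet/BabelNet IDs


def corpus_stats(annotations: List[Annotation]) -> Dict[str, float]:
    """Compute a few simple statistics over a sense-annotated corpus."""
    senses_per_lemma = defaultdict(set)  # lemma -> set of senses observed for it
    for ann in annotations:
        senses_per_lemma[ann.lemma].add(ann.sense)
    n_lemmas = len(senses_per_lemma)
    observed = sum(len(s) for s in senses_per_lemma.values())
    return {
        "annotations": len(annotations),
        "distinct_lemmas": n_lemmas,
        "distinct_senses": len({ann.sense for ann in annotations}),
        # Mean number of distinct senses actually observed per lemma
        # (a corpus-level ambiguity measure, not the lexicon's polysemy).
        "avg_senses_per_lemma": observed / n_lemmas if n_lemmas else 0.0,
    }


# Toy corpus with made-up sense labels, for illustration only.
toy_corpus = [
    Annotation("bank", "bank_n_1"),
    Annotation("bank", "bank_n_2"),
    Annotation("bank", "bank_n_1"),
    Annotation("plant", "plant_n_1"),
]

print(corpus_stats(toy_corpus))
# {'annotations': 4, 'distinct_lemmas': 2, 'distinct_senses': 3, 'avg_senses_per_lemma': 1.5}
```

Averaging over the senses *observed* in the corpus, rather than all senses listed in the inventory, is one design choice among several. It reflects how much ambiguity the annotations actually cover, which can differ sharply from a lemma's full polysemy.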