Implications of Annotation Artifacts in Edge Probing Test Datasets
October 20, 2023 · Entered Twilight · Conference on Computational Natural Language Learning
Repo contents: .flake8, .gitignore, .pre-commit-config.yaml, .pre-commit-hooks.yaml, README.md, eptests, pyproject.toml, requirements-py38.txt, requirements.txt
Authors
Sagnik Ray Choudhury, Jushaan Kalra
arXiv ID
2310.13856
Category
cs.CL: Computation & Language
Citations
1
Venue
Conference on Computational Natural Language Learning
Repository
https://github.com/Josh1108/EPtest.git
Last Checked
3 months ago
Abstract
Edge probing (EP) tests are classification tasks that test for grammatical knowledge encoded in token representations coming from contextual encoders such as large language models (LLMs). Many LLM encoders have shown high performance in EP tests, leading to conjectures about their ability to encode linguistic knowledge. However, a large body of research claims that the tests do not necessarily measure the LLM's capacity to encode knowledge, but rather reflect the classifiers' ability to learn the problem. Much of this criticism stems from the fact that the classifiers often reach very similar accuracy whether an LLM or a random encoder is used. Consequently, several modifications to the tests have been suggested, including information-theoretic probes. We show that commonly used edge probing test datasets have various biases, including memorization. When these biases are removed, the LLM encoders do show a significant difference from the random ones, even with the simple non-information-theoretic probes.
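To make the memorization bias mentioned in the abstract concrete, here is a minimal sketch of one way such a bias could be measured. It is not the authors' procedure. It assumes a simplified dataset of (span text, label) pairs and asks how many test spans also occur in training, and how well a pure lookup table would score on the test set. The function name `memorization_overlap` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def memorization_overlap(train, test):
    """Estimate how much of a test set a lookup table could solve.

    `train` and `test` are lists of (span_text, label) pairs. This flat
    format is an assumption for illustration, not the paper's data format.
    """
    if not test:
        raise ValueError("test set must not be empty")

    # For every training span string, count the labels it was seen with.
    label_counts = defaultdict(Counter)
    for span, label in train:
        label_counts[span.lower()][label] += 1

    seen = 0            # test spans whose text also occurs in training
    lookup_correct = 0  # test spans a memorizing probe would get right
    for span, label in test:
        counts = label_counts.get(span.lower())
        if counts is None:
            continue
        seen += 1
        # A memorizing probe predicts the most frequent training label.
        if counts.most_common(1)[0][0] == label:
            lookup_correct += 1

    n = len(test)
    return {"seen": seen / n, "lookup_accuracy": lookup_correct / n}


# Toy data, invented for this sketch.
train = [("the dog", "NP"), ("ran quickly", "VP"), ("the dog", "NP")]
test = [("The dog", "NP"), ("a cat", "NP"), ("ran quickly", "NP"), ("slept", "VP")]
stats = memorization_overlap(train, test)
print(stats)
```

If the lookup accuracy comes close to the probe's reported accuracy, both an LLM encoder and a random encoder could reach similar scores through memorization alone, which is the kind of artifact the abstract describes.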
Similar Papers
In the same crypt: Computation & Language
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Old Age
XLNet: Generalized Autoregressive Pretraining for Language Understanding
The Ethereal
Effective Approaches to Attention-based Neural Machine Translation
Old Age
A large annotated corpus for learning natural language inference
Old Age