Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways
October 26, 2023 · Entered Twilight · Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning
Repo contents: .gitignore, LICENSE, README.md, Results.numbers, midi, tokenizers, train_clm.py, training_bevo.py
Authors
Venkata S Govindarajan, Juan Diego Rodriguez, Kaj Bostrom, Kyle Mahowald
arXiv ID
2310.17591
Category
cs.CL: Computation & Language
Citations
1
Venue
Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning
Repository
https://github.com/venkatasg/Lil-Bevo
⭐ 2
Last Checked
3 months ago
Abstract
We present Lil-Bevo, our submission to the BabyLM Challenge. We pretrained our masked language models with three ingredients: an initial pretraining with music data, training on shorter sequences before training on longer ones, and masking specific tokens to target some of the BLiMP subtasks. Overall, our baseline models performed above chance, but far below the performance levels of larger LLMs trained on more data. We found that training on short sequences performed better than training on longer sequences. Pretraining on music may help performance marginally, but, if so, the effect seems small. Our targeted Masked Language Modeling augmentation did not seem to improve model performance in general, but did seem to help on some of the specific BLiMP tasks that we were targeting (e.g., Negative Polarity Items). Training performant LLMs on small amounts of data is a difficult but potentially informative task. While some of our techniques showed some promise, more work is needed to explore whether they can improve performance more than the modest gains here. Our code is available at https://github.com/venkatasg/Lil-Bevo and our models at https://huggingface.co/collections/venkatasg/babylm-653591cdb66f4bf68922873a
Similar Papers
In the same crypt: Computation & Language
Old Age
Old Age
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Old Age
XLNet: Generalized Autoregressive Pretraining for Language Understanding
The Ethereal
Effective Approaches to Attention-based Neural Machine Translation
Old Age
A large annotated corpus for learning natural language inference
Old Age