APOLLO: A Simple Approach for Adaptive Pretraining of Language Models for Logical Reasoning
December 19, 2022 ยท Declared Dead ยท ๐ Annual Meeting of the Association for Computational Linguistics
Authors
Soumya Sanyal, Yichong Xu, Shuohang Wang, Ziyi Yang, Reid Pryzant, Wenhao Yu, Chenguang Zhu, Xiang Ren
arXiv ID
2212.09282
Category
cs.CL: Computation & Language
Cross-listed
cs.AI,
cs.LG
Citations
12
Venue
Annual Meeting of the Association for Computational Linguistics
Repository
https://github.com/INK-USC/APOLLO
Last Checked
1 month ago
Abstract
Logical reasoning of text is an important ability that requires understanding the information present in the text, their interconnections, and then reasoning through them to infer new conclusions. Prior works on improving the logical reasoning ability of language models require complex processing of training data (e.g., aligning symbolic knowledge to text), yielding task-specific data augmentation solutions that restrict the learning of general logical reasoning skills. In this work, we propose APOLLO, an adaptively pretrained language model that has improved logical reasoning abilities. We select a subset of Wikipedia, based on a set of logical inference keywords, for continued pretraining of a language model. We use two self-supervised loss functions: a modified masked language modeling loss where only specific parts-of-speech words, that would likely require more reasoning than basic language understanding, are masked, and a sentence-level classification loss that teaches the model to distinguish between entailment and contradiction types of sentences. The proposed training paradigm is both simple and independent of task formats. We demonstrate the effectiveness of APOLLO by comparing it with prior baselines on two logical reasoning datasets. APOLLO performs comparably on ReClor and outperforms baselines on LogiQA. The code base has been made publicly available.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computation & Language
๐
๐
Old Age
๐
๐
Old Age
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach
R.I.P.
๐ป
Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
R.I.P.
๐ป
Ghosted
Deep contextualized word representations
Died the same way โ ๐ 404 Not Found
R.I.P.
๐
404 Not Found
Deep High-Resolution Representation Learning for Visual Recognition
R.I.P.
๐
404 Not Found
HuggingFace's Transformers: State-of-the-art Natural Language Processing
R.I.P.
๐
404 Not Found
CCNet: Criss-Cross Attention for Semantic Segmentation
R.I.P.
๐
404 Not Found