๐ฎ
๐ฎ
The Ethereal
IsarStep: a Benchmark for High-level Mathematical Reasoning
June 13, 2020 ยท The Ethereal ยท ๐ arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Wenda Li, Lei Yu, Yuhuai Wu, Lawrence C. Paulson
arXiv ID
2006.09265
Category
cs.LO: Logic in CS
Cross-listed
cs.AI,
cs.CL,
cs.LG,
cs.PL,
stat.ML
Citations
10
Venue
arXiv.org
Last Checked
2 months ago
Abstract
A well-defined benchmark is essential for measuring and accelerating research progress of machine learning models. In this paper, we present a benchmark for high-level mathematical reasoning and study the reasoning capabilities of neural sequence-to-sequence models. We build a non-synthetic dataset from the largest repository of proofs written by human experts in a theorem prover. The dataset has a broad coverage of undergraduate and research-level mathematical and computer science theorems. In our defined task, a model is required to fill in a missing intermediate proposition given surrounding proofs. This task provides a starting point for the long-term goal of having machines generate human-readable proofs automatically. Our experiments and analysis reveal that while the task is challenging, neural models can capture non-trivial mathematical reasoning. We further design a hierarchical transformer that outperforms the transformer baseline.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Logic in CS
๐ฎ
๐ฎ
The Ethereal
Safe Reinforcement Learning via Shielding
๐ฎ
๐ฎ
The Ethereal
Formal Verification of Piece-Wise Linear Feed-Forward Neural Networks
๐ฎ
๐ฎ
The Ethereal
Heterogeneous substitution systems revisited
๐ฎ
๐ฎ
The Ethereal
Omega-Regular Objectives in Model-Free Reinforcement Learning
๐ฎ
๐ฎ
The Ethereal