Exploring the Long-Term Generalization of Counting Behavior in RNNs
November 29, 2022 · Declared Dead · arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Nadine El-Naggar, Pranava Madhyastha, Tillman Weyde
arXiv ID
2211.16429
Category
cs.NE: Neural & Evolutionary
Cross-listed
cs.FL, cs.LG
Citations
6
Venue
arXiv.org
Last Checked
4 months ago
Abstract
In this study, we investigate the generalization of LSTM, ReLU and GRU models on counting tasks over long sequences. Previous theoretical work has established that RNNs with ReLU activation and LSTMs have the capacity for counting with suitable configuration, while GRUs have limitations that prevent correct counting over longer sequences. Despite this and some positive empirical results for LSTMs on Dyck-1 languages, our experimental results show that LSTMs fail to learn correct counting behavior for sequences that are significantly longer than in the training data. ReLUs show much larger variance in behavior and in most cases worse generalization. The long sequence generalization is empirically related to validation loss, but reliable long sequence generalization seems not practically achievable through backpropagation with current techniques. We demonstrate different failure modes for LSTMs, GRUs and ReLUs. In particular, we observe that the saturation of activation functions in LSTMs and the correct weight setting for ReLUs to generalize counting behavior are not achieved in standard training regimens. In summary, learning generalizable counting behavior is still an open problem and we discuss potential approaches for further research.
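The abstract's point about "the correct weight setting for ReLUs" can be illustrated with a hand-built counter. The sketch below is not the authors' model or experimental setup. It assumes a single ReLU unit with the recurrence h_t = ReLU(w_rec * h_{t-1} + x_t), where x_t is w_open for "(" and w_close for ")". The names `relu_counter`, `nested`, `w_rec`, `w_open` and `w_close` are hypothetical and chosen only for this example.

```python
def relu(x: float) -> float:
    """Rectified linear unit for a single scalar."""
    return x if x > 0.0 else 0.0


def relu_counter(seq: str, w_rec: float = 1.0,
                 w_open: float = 1.0, w_close: float = -1.0) -> float:
    """Run a one-unit ReLU recurrence over a bracket string.

    Returns the final hidden state. With the default weights the state
    equals the current nesting depth (clamped at zero).
    """
    h = 0.0
    for ch in seq:
        if ch == "(":
            x = w_open
        elif ch == ")":
            x = w_close
        else:
            raise ValueError(f"unexpected symbol: {ch!r}")
        h = relu(w_rec * h + x)
    return h


def nested(n: int) -> str:
    """Build the balanced string of n opening then n closing brackets."""
    return "(" * n + ")" * n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        exact = relu_counter(nested(n))                  # ideal weights
        drifted = relu_counter(nested(n), w_close=-0.99)  # slightly off
        print(n, exact, drifted)
```

With the ideal weights the final state returns to zero for any balanced input. With `w_close` off by 0.01, the residual grows roughly as 0.01 times n, so an error that is negligible on short training sequences becomes large on much longer ones. This is one simple way to see why near-correct weights need not generalize to long sequences.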
Similar Papers
In the same crypt: Neural & Evolutionary
The Ethereal
R.I.P.
Ghosted
Deep Learning using Rectified Linear Units (ReLU)
R.I.P.
Ghosted
Generative Adversarial Text to Image Synthesis
R.I.P.
Ghosted
Regularized Evolution for Image Classifier Architecture Search
R.I.P.
Ghosted
Temporal Ensembling for Semi-Supervised Learning
Old Age
Learning Structured Sparsity in Deep Neural Networks
Died the same way: Ghosted
R.I.P.
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
Ghosted