Kernelized Bayesian Softmax for Text Generation
November 01, 2019 ยท Declared Dead ยท ๐ Neural Information Processing Systems
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Ning Miao, Hao Zhou, Chengqi Zhao, Wenxian Shi, Lei Li
arXiv ID
1911.00274
Category
cs.CL: Computation & Language
Cross-listed
cs.LG
Citations
4
Venue
Neural Information Processing Systems
Last Checked
4 months ago
Abstract
Neural models for text generation require a softmax layer with proper token embeddings during the decoding phase. Most existing approaches adopt single point embedding for each token. However, a word may have multiple senses according to different context, some of which might be distinct. In this paper, we propose KerBS, a novel approach for learning better embeddings for text generation. KerBS embodies two advantages: (a) it employs a Bayesian composition of embeddings for words with multiple senses; (b) it is adaptive to semantic variances of words and robust to rare sentence context by imposing learned kernels to capture the closeness of words (senses) in the embedding space. Empirical studies show that KerBS significantly boosts the performance of several text generation tasks.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computation & Language
๐
๐
Old Age
๐
๐
Old Age
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
๐
๐
Old Age
XLNet: Generalized Autoregressive Pretraining for Language Understanding
๐ฎ
๐ฎ
The Ethereal
Effective Approaches to Attention-based Neural Machine Translation
๐
๐
Old Age
A large annotated corpus for learning natural language inference
๐
๐
Old Age
HellaSwag: Can a Machine Really Finish Your Sentence?
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
๐ป
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
๐ป
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
๐ป
Ghosted