Exploring Neural Transducers for End-to-End Speech Recognition

July 24, 2017 · Declared Dead · 🏛 Automatic Speech Recognition & Understanding

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Eric Battenberg, Jitong Chen, Rewon Child, Adam Coates, Yashesh Gaur, Yi Li, Hairong Liu, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu
arXiv ID: 1707.07413
Category: cs.CL (Computation & Language)
Cross-listed: cs.NE
Citations: 233
Venue: Automatic Speech Recognition & Understanding
Last Checked: 3 months ago

Abstract

In this work, we perform an empirical comparison among the CTC, RNN-Transducer, and attention-based Seq2Seq models for end-to-end speech recognition. We show that, without any language model, Seq2Seq and RNN-Transducer models both outperform the best reported CTC models with a language model on the popular Hub5'00 benchmark. On our internal diverse dataset, these trends continue: RNN-Transducer models rescored with a language model after beam search outperform our best CTC models. These results simplify the speech recognition pipeline so that decoding can now be expressed purely as neural network operations. We also study how the choice of encoder architecture affects the performance of the three models, both when all encoder layers are forward-only and when encoders downsample the input representation aggressively.



📜 Similar Papers

In the same crypt — Computation & Language

🌅 Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL 🏛 NeurIPS 📚 166.0K cites 9 years ago

🌅 Old Age

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, ... (+2 more)

cs.CL 🏛 NAACL 📚 110.2K cites 7 years ago

🌅 Old Age

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Zhilin Yang, Zihang Dai, ... (+4 more)

cs.CL 🏛 NeurIPS 📚 9.2K cites 6 years ago

🔮 The Ethereal

Effective Approaches to Attention-based Neural Machine Translation

Minh-Thang Luong, Hieu Pham, Christopher D. Manning

cs.CL 🏛 EMNLP 📚 8.3K cites 10 years ago

🌅 Old Age

A large annotated corpus for learning natural language inference

Samuel R. Bowman, Gabor Angeli, ... (+2 more)

cs.CL 🏛 EMNLP 📚 4.6K cites 10 years ago

🌅 Old Age

HellaSwag: Can a Machine Really Finish Your Sentence?

Rowan Zellers, Ari Holtzman, ... (+3 more)

cs.CL 🏛 ACL 📚 3.7K cites 7 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 8 years ago