Input-independent Attention Weights Are Expressive Enough: A Study of Attention in Self-supervised Audio Transformers

June 09, 2020 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Tsung-Han Wu, Chun-Chen Hsieh, Yen-Hao Chen, Po-Han Chi, Hung-yi Lee
arXiv ID: 2006.05174
Category: eess.AS (Audio & Speech)
Cross-listed: cs.CL, cs.SD
Citations: 1
Venue: arXiv.org
Last Checked: 3 months ago

Abstract

In this paper, we seek solutions for reducing the computational complexity of transformer-based models for speech representation learning. We evaluate 10 attention algorithms; then, we pre-train the transformer-based model with those attention algorithms in a self-supervised fashion and treat them as feature extractors on downstream tasks, including phoneme classification and speaker classification. With the assistance of t-SNE, PCA, and some observations, the attention weights in self-supervised audio transformers can be categorized into four general cases. Based on these cases and some analyses, we are able to use a specific set of attention weights to initialize the model. Our approach shows comparable performance to typical self-attention yet requires 20% less time in both training and inference.
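No code was found for this paper, so the following is not the authors' implementation. It is a minimal, hypothetical Python sketch of the general idea behind the title: standard scaled dot-product self-attention computes its T×T weight matrix from the input, while an input-independent variant reuses a fixed T×T matrix and therefore skips the query/key projections and the Q·Kᵀ product. The banded pattern, the function names, and the toy shapes are assumptions for illustration only; they are not the four cases identified in the paper, and whether such weights are later fine-tuned is not stated in the abstract (here they are simply held fixed).

```python
import math


def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix, both given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over a single row."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.
    The T x T weights depend on the input x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d = len(w_k[0])
    scores = matmul(q, transpose(k))  # T x T, recomputed for every input
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights


def banded_weights(t, width=1):
    """Hypothetical input-independent pattern: each frame attends uniformly
    to the frames within `width` steps of itself. Every row sums to 1."""
    weights = []
    for i in range(t):
        lo, hi = max(0, i - width), min(t - 1, i + width)
        n = hi - lo + 1
        weights.append([1.0 / n if lo <= j <= hi else 0.0 for j in range(t)])
    return weights


def fixed_attention(x, weights, w_v):
    """Input-independent attention: apply a precomputed T x T matrix to the
    value projection only; no query/key projections or Q K^T are computed."""
    return matmul(weights, matmul(x, w_v))


# Toy usage: T = 3 frames with d = 2 features, identity projections.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, learned = self_attention(x, identity, identity, identity)
fixed = fixed_attention(x, banded_weights(3), identity)
print([round(v, 3) for v in fixed[0]])  # [0.5, 0.5]
```

The saving in this sketch comes from dropping two projections and the T×T score computation per layer; how that translates into the paper's reported 20% speed-up depends on details the abstract does not give.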



📜 Similar Papers

In the same crypt — Audio & Speech

R.I.P. 👻 Ghosted

ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection

Massimiliano Todisco, Xin Wang, ... (+8 more)

eess.AS 🏛 Interspeech 📚 736 cites 7 years ago

R.I.P. 👻 Ghosted

LPCNet: Improving Neural Speech Synthesis Through Linear Prediction

Jean-Marc Valin, Jan Skoglund

eess.AS 🏛 ICASSP 📚 489 cites 7 years ago

R.I.P. 👻 Ghosted

VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

Quan Wang, Hannah Muckenhirn, ... (+8 more)

eess.AS 🏛 Interspeech 📚 413 cites 7 years ago

R.I.P. 👻 Ghosted

TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech

Andy T. Liu, Shang-Wen Li, Hung-yi Lee

eess.AS 🏛 IEEE/ACM TASLP 📚 397 cites 5 years ago

R.I.P. 👻 Ghosted

Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders

Andy T. Liu, Shu-wen Yang, ... (+3 more)

eess.AS 🏛 ICASSP 📚 393 cites 6 years ago

R.I.P. 👻 Ghosted

Utterance-level Aggregation For Speaker Recognition In The Wild

Weidi Xie, Arsha Nagrani, ... (+2 more)

eess.AS 🏛 ICASSP 📚 365 cites 7 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 8 years ago