Role of Bias Terms in Dot-Product Attention

February 16, 2023 ยท Declared Dead ยท ๐Ÿ› IEEE International Conference on Acoustics, Speech, and Signal Processing

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Mahdi Namazifar, Devamanyu Hazarika, Dilek Hakkani-Tur arXiv ID 2302.08626 Category cs.NE: Neural & Evolutionary Citations 5 Venue IEEE International Conference on Acoustics, Speech, and Signal Processing Last Checked 4 months ago
Abstract
Dot-product attention is a core module in the present generation of neural network models, particularly transformers, and is being leveraged across numerous areas such as natural language processing and computer vision. This attention module is comprised of three linear transformations, namely query, key, and value linear transformations, each of which has a bias term. In this work, we study the role of these bias terms, and mathematically show that the bias term of the key linear transformation is redundant and could be omitted without any impact on the attention module. Moreover, we argue that the bias term of the value linear transformation has a more prominent role than that of the bias term of the query linear transformation. We empirically verify these findings through multiple experiments on language modeling, natural language understanding, and natural language generation tasks.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Neural & Evolutionary

๐Ÿ”ฎ ๐Ÿ”ฎ The Ethereal

LSTM: A Search Space Odyssey

Klaus Greff, Rupesh Kumar Srivastava, ... (+3 more)

cs.NE ๐Ÿ› IEEE TNNLS ๐Ÿ“š 6.0K cites 11 years ago

Died the same way โ€” ๐Ÿ‘ป Ghosted