Folded Context Condensation in Path Integral Formalism for Infinite Context Transformers
May 07, 2024 · Declared Dead · IEEE Access
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Won-Gi Paeng, Daesuk Kwon, Kyungwon Jeong, Honggyo Suh
arXiv ID
2405.04620
Category
hep-ph
Cross-listed
cs.AI, cs.CL, cs.LG, cs.NE
Citations
0
Venue
IEEE Access
Last Checked
3 months ago
Abstract
In this work, we present a generalized formulation of the Transformer algorithm by reinterpreting its core mechanisms within the framework of Path Integral formalism. In this perspective, the attention mechanism is recast as a process that integrates all possible transition paths leading to future token states, with temporal evolution governed by the Feed-Forward Network. By systematically mapping each component of the Transformer to its counterpart in the Path Integral formulation, we obtain a more compact and efficient representation, in which the contextual information of a sequence is condensed into memory-like segments. These segments are recurrently processed across Transformer layers, enabling more effective long-term information retention. We validate the effectiveness of this approach through the Passkey retrieval task and a summarization task, demonstrating that the proposed method preserves historical information while exhibiting memory usage that scales linearly with sequence length. This contrasts with the non-linear memory growth typically observed in standard attention mechanisms. We expect that this quantum-inspired generalization of the Transformer architecture will open new avenues for enhancing both the efficiency and expressiveness of future Transformer models.
Similar Papers
In the same crypt: hep-ph
R.I.P. 👻 Ghosted
CaloMan: Fast generation of calorimeter showers with density estimation on learned manifolds
R.I.P. 👻 Ghosted
An unfolding method based on conditional Invertible Neural Networks (cINN) using iterative training
R.I.P. 👻 Ghosted
PELICAN: Permutation Equivariant and Lorentz Invariant or Covariant Aggregator Network for Particle Physics
R.I.P. 👻 Ghosted
Stacking machine learning classifiers to identify Higgs bosons at the LHC
R.I.P. 👻 Ghosted
The Power of Genetic Algorithms: what remains of the pMSSM?
Died the same way: 👻 Ghosted
R.I.P. 👻 Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P. 👻 Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P. 👻 Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P. 👻 Ghosted