The dynamic interplay between in-context and in-weight learning in humans and neural networks

February 13, 2024 · Declared Dead · 🏛 Proceedings of the National Academy of Sciences of the United States of America

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Jacob Russin, Ellie Pavlick, Michael J. Frank
arXiv ID: 2402.08674
Category: cs.NE: Neural & Evolutionary
Cross-listed: cs.LG, q-bio.NC
Citations: 5
Venue: Proceedings of the National Academy of Sciences of the United States of America
Last Checked: 4 months ago

Abstract

Human learning embodies a striking duality: sometimes, we appear capable of following logical, compositional rules and benefit from structured curricula (e.g., in formal education), while other times, we rely on an incremental approach or trial-and-error, learning better from curricula that are randomly interleaved. Influential psychological theories explain this seemingly disparate behavioral evidence by positing two qualitatively different learning systems -- one for rapid, rule-based inferences and another for slow, incremental adaptation. It remains unclear how to reconcile such theories with neural networks, which learn via incremental weight updates and are thus a natural model for the latter type of learning, but are not obviously compatible with the former. However, recent evidence suggests that metalearning neural networks and large language models are capable of "in-context learning" (ICL) -- the ability to flexibly grasp the structure of a new task from a few examples. Here, we show that the dynamic interplay between ICL and default in-weight learning (IWL) naturally captures a broad range of learning phenomena observed in humans, reproducing curriculum effects on category-learning and compositional tasks, and recapitulating a tradeoff between flexibility and retention. Our work shows how emergent ICL can equip neural networks with fundamentally different learning properties that can coexist with their native IWL, thus offering a novel perspective on dual-process theories and human cognitive flexibility.