Size Doesn't Matter: Cosine-Scored Sparse Autoencoders

June 13, 2026 · Grace Period · ICML 2026, Spotlight at the Mechanistic Interpretability Workshop

โณ Grace Period
This paper is less than 90 days old. We give authors time to release their code before passing judgment.
Authors: Silen Naihin, Lev Stambler
arXiv ID: 2606.15054
Category: cs.LG (Machine Learning)
Citations: 0
Venue: ICML 2026, Spotlight at the Mechanistic Interpretability Workshop
Abstract
Sparse autoencoders (SAEs) detect features via inner product, so a feature's activation scales with both its directional alignment and the input's norm. Under BatchTopK, high-norm tokens inflate all pre-activations simultaneously, claiming dictionary slots regardless of content alignment. This matters because sublayer normalization has already discarded the magnitude the score measures, so the encoder detects a quantity the model does not read. We replace the score with a learned blend of cosine similarity and input magnitude, letting the optimizer choose how much norm to use; a per-feature extension lets each feature decide independently. In both regimes, training is free to recover inner product but never does, with no feature ever choosing more than half-magnitude dependence. At matched reconstruction, the cosine encoder learns features that align with human-recognizable concepts far more often than standard, filling dictionary slots that inner product wastes on norm detectors. Loss reweighting that equalizes gradients barely closes the gap, confirming forward-pass score geometry as the lever. The advantage is not universal across tasks or depths, but we believe cosine scoring should be the default for dictionary learning on normalized representations.
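The abstract does not state the formula for the blended score, so the following is only a minimal sketch under an assumed parameterization: score = cos(x, w) * ||x||^alpha, where alpha = 1 recovers the inner product against a unit-norm dictionary direction and alpha = 0 is pure cosine similarity. The names `blended_score`, `batch_topk`, and `alpha` are hypothetical and not taken from the paper. The BatchTopK step is likewise a simplified reading: it keeps the k * batch_size largest scores across the whole batch. The sketch illustrates how a high-norm token can claim every slot under inner-product scoring.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def blended_score(x, w, alpha, eps=1e-8):
    """Assumed blend: cos(x, w) * ||x||**alpha.

    alpha = 1.0 is (approximately) the inner product with a unit-norm w;
    alpha = 0.0 is pure cosine similarity. This form is an assumption,
    not the paper's stated definition.
    """
    x_norm = norm(x)
    cos = dot(x, w) / (x_norm * norm(w) + eps)
    return cos * x_norm ** alpha

def batch_topk(scores, k):
    """Keep the k * batch_size largest positive scores across the batch."""
    budget = k * len(scores)
    entries = [(s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row)]
    entries.sort(key=lambda e: -e[0])  # stable sort: ties keep batch order
    kept = [[0.0] * len(row) for row in scores]
    for s, i, j in entries[:budget]:
        if s > 0:
            kept[i][j] = s
    return kept

# Toy dictionary with two unit-norm feature directions.
dictionary = [(1.0, 0.0), (0.0, 1.0)]
# Token 0: low norm, perfectly aligned with feature 0.
# Token 1: high norm, only partially aligned with either feature.
tokens = [(1.0, 0.0), (10.0, 10.0)]

for alpha in (1.0, 0.0):
    scores = [[blended_score(x, w, alpha) for w in dictionary] for x in tokens]
    print(alpha, batch_topk(scores, k=1))
```

In this toy batch, alpha = 1.0 hands both available slots to the high-norm token and leaves the well-aligned token with no active feature, while alpha = 0.0 lets the aligned token keep feature 0. A per-feature variant could be sketched by passing a different `alpha` for each dictionary row.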