Towards White Box Deep Learning

March 14, 2024 · arXiv.org

Repo contents: .gitignore, LICENSE, README.md, env_setup.sh, jupyter, lib, models

Authors: Maciej Satkiewicz
arXiv ID: 2403.09863
Category: cs.LG (Machine Learning)
Cross-listed: cs.AI, cs.NE
Citations: 1
Venue: arXiv.org
Repository: https://github.com/314-Foundation/white-box-nn (⭐ 5)
Last checked: 4 months ago

Abstract

Deep neural networks learn fragile "shortcut" features, rendering them difficult to interpret (black box) and vulnerable to adversarial attacks. This paper proposes semantic features as a general architectural solution to this problem. The main idea is to make features locality-sensitive in the adequate semantic topology of the domain, thus introducing a strong regularization. The proof-of-concept network is lightweight and inherently interpretable, and it achieves almost human-level adversarial test metrics with no adversarial training! These results and the general nature of the approach warrant further research on semantic features. The code is available at https://github.com/314-Foundation/white-box-nn.