Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation

October 17, 2017 · Entered Twilight · 🏛 AAAI/ACM Conference on AI, Ethics, and Society

🌅 TWILIGHT: Old Age
Predates the code-sharing era – a pioneer of its time

"Last commit was 7.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, README.md, distillationsetup.pseudocode, process_chicago_police_ssl.R, process_compas_recidivism.R, process_lending_club_loan.R, process_nypd_stopandfrisk_weapon.R, utils.R

Authors: Sarah Tan, Rich Caruana, Giles Hooker, Yin Lou
arXiv ID: 1710.06169
Category: stat.ML: Machine Learning (Stat)
Cross-listed: cs.AI, cs.LG
Citations: 203
Venue: AAAI/ACM Conference on AI, Ethics, and Society
Repository: https://github.com/shftan/auditblackbox ⭐ 9
Last Checked: 2 months ago
Abstract
Black-box risk scoring models permeate our lives, yet are typically proprietary or opaque. We propose Distill-and-Compare, a model distillation and comparison approach to audit such models. To gain insight into black-box models, we treat them as teachers, training transparent student models to mimic the risk scores assigned by black-box models. We compare the student model trained with distillation to a second un-distilled transparent model trained on ground-truth outcomes, and use differences between the two models to gain insight into the black-box model. Our approach can be applied in a realistic setting, without probing the black-box model API. We demonstrate the approach on four public data sets: COMPAS, Stop-and-Frisk, Chicago Police, and Lending Club. We also propose a statistical test to determine if a data set is missing key features used to train the black-box model. Our test finds that the ProPublica data is likely missing key feature(s) used in COMPAS.
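The core idea in the abstract can be sketched in code: fit one transparent "mimic" model on the black-box's risk scores and a second un-distilled transparent model on the ground-truth outcomes, then inspect where they disagree. The paper's transparent model class is iGAMs; the sketch below is a minimal illustration assuming shallow scikit-learn regression trees as a stand-in, with fully synthetic data (the risk-score and outcome formulas are hypothetical, not from the paper).

```python
# Minimal sketch of the Distill-and-Compare idea.
# Assumptions (not from the paper): synthetic data, a made-up
# black-box risk score, and shallow decision trees standing in
# for the paper's transparent iGAM student models.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))

# Hypothetical black-box risk score: uses features 0 and 1 only.
risk_score = 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] - X[:, 1])))

# Ground-truth outcome: also depends on feature 2, plus noise.
outcome = (1.2 * X[:, 0] - X[:, 1] + 0.8 * X[:, 2]
           + rng.normal(scale=0.5, size=n) > 0).astype(float)

# Student 1: distilled mimic of the black-box risk scores
# (the black box is treated as a teacher, no API probing needed).
mimic = DecisionTreeRegressor(max_depth=3).fit(X, risk_score)

# Student 2: un-distilled transparent model on true outcomes.
truth = DecisionTreeRegressor(max_depth=3).fit(X, outcome)

# Differences between the two transparent models are the audit
# signal: where the black box deviates from the outcome process.
diff = mimic.predict(X) - truth.predict(X)
print("mean |difference|:", np.abs(diff).mean())
```

Because both students belong to the same transparent model class, their per-feature behavior can be compared directly, which is what makes the difference interpretable as an audit signal.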
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

📜 Similar Papers

In the same crypt: Machine Learning (Stat)

R.I.P. 👻 Ghosted

Graph Attention Networks

Petar Veličković, Guillem Cucurull, ... (+4 more)

stat.ML πŸ› ICLR πŸ“š 24.7K cites 8 years ago
R.I.P. πŸ‘» Ghosted

Layer Normalization

Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton

stat.ML πŸ› arXiv πŸ“š 12.0K cites 9 years ago