Evaluating Readability and Faithfulness of Concept-based Explanations

April 29, 2024 · Conference on Empirical Methods in Natural Language Processing

Repo contents: .gitignore, README.md, config.py, data, dataloaders, datasets_, evaluators, extractors, logger.py, main.py, metric_evaluators, models, run.sh, utils.py

Authors: Meng Li, Haoran Jin, Ruixuan Huang, Zhihao Xu, Defu Lian, Zijia Lin, Di Zhang, Xiting Wang
arXiv ID: 2404.18533
Category: cs.AI (Artificial Intelligence); cross-listed: cs.HC
Citations: 7
Venue: Conference on Empirical Methods in Natural Language Processing
Repository: https://github.com/hr-jin/Concept-Explanation-Evaluation

Abstract

With the growing popularity of general-purpose Large Language Models (LLMs) comes a need for more global explanations of model behaviors. Concept-based explanations arise as a promising avenue for explaining high-level patterns learned by LLMs. Yet their evaluation poses unique challenges, especially due to their non-local nature and high-dimensional representation in a model's hidden space. Current methods approach concepts from different perspectives and lack a unified formalization. This makes it challenging to evaluate the core measures of concepts, namely faithfulness and readability. To bridge the gap, we introduce a formal definition of concepts that generalizes to the settings of diverse concept-based explanations. Based on this, we quantify the faithfulness of a concept explanation via perturbation. We ensure adequate perturbation in the high-dimensional space for different concepts via an optimization problem. Readability is approximated via an automatic and deterministic measure that quantifies the coherence of patterns that maximally activate a concept while aligning with human understanding. Finally, based on measurement theory, we apply a meta-evaluation method for evaluating these measures, which is generalizable to other types of explanations or tasks as well. Extensive experimental analysis has been conducted to inform the selection of explanation evaluation measures.
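
The perturbation idea in the abstract can be illustrated with a toy sketch. This is not the paper's method or the repository's code: it assumes a concept is a single direction in hidden space, uses plain ablation (projecting the direction out) in place of the optimization-based perturbation the abstract mentions, and replaces the LLM with a hypothetical linear readout followed by a softmax. All names (`toy_model`, `ablate`, `faithfulness_by_ablation`) and all numbers are made up for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_model(hidden, readout):
    """Hypothetical stand-in for the layers after the hidden state."""
    return softmax([dot(row, hidden) for row in readout])

def ablate(hidden, direction):
    """Remove the component of `hidden` along the unit vector `direction`."""
    activation = dot(hidden, direction)
    return [h - activation * d for h, d in zip(hidden, direction)]

def faithfulness_by_ablation(hiddens, direction, readout):
    """Mean total-variation distance between outputs before and after ablation."""
    direction = normalize(direction)
    total = 0.0
    for h in hiddens:
        p = toy_model(h, readout)
        q = toy_model(ablate(h, direction), readout)
        total += 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))
    return total / len(hiddens)

# Assumed toy data: a 3-dimensional hidden space and a 2-class readout.
readout = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
hiddens = [[1.0, 0.5, 0.3], [-0.5, 0.2, 1.0], [2.0, -1.0, 0.0]]
used_direction = [1.0, 0.0, 0.0]    # the readout depends on this axis
unused_direction = [0.0, 0.0, 1.0]  # the readout ignores this axis

score_used = faithfulness_by_ablation(hiddens, used_direction, readout)
score_unused = faithfulness_by_ablation(hiddens, unused_direction, readout)
print(f"used direction:   {score_used:.3f}")
print(f"unused direction: {score_unused:.3f}")
```

In this toy setting, a direction the readout relies on gets a positive score, while a direction it ignores scores zero. That is the intuition behind perturbation-based faithfulness: a faithful concept is one whose removal changes the model's output.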