Evaluating Readability and Faithfulness of Concept-based Explanations
April 29, 2024 Β· Entered Twilight Β· π Conference on Empirical Methods in Natural Language Processing
Repo contents: .gitignore, README.md, config.py, data, dataloaders, datasets_, evaluators, extractors, logger.py, main.py, metric_evaluators, models, run.sh, utils.py
Authors
Meng Li, Haoran Jin, Ruixuan Huang, Zhihao Xu, Defu Lian, Zijia Lin, Di Zhang, Xiting Wang
arXiv ID
2404.18533
Category
cs.AI: Artificial Intelligence
Cross-listed
cs.HC
Citations
7
Venue
Conference on Empirical Methods in Natural Language Processing
Repository
https://github.com/hr-jin/Concept-Explanation-Evaluation
β 5
Last Checked
2 months ago
Abstract
With the growing popularity of general-purpose Large Language Models (LLMs), comes a need for more global explanations of model behaviors. Concept-based explanations arise as a promising avenue for explaining high-level patterns learned by LLMs. Yet their evaluation poses unique challenges, especially due to their non-local nature and high dimensional representation in a model's hidden space. Current methods approach concepts from different perspectives, lacking a unified formalization. This makes evaluating the core measures of concepts, namely faithfulness or readability, challenging. To bridge the gap, we introduce a formal definition of concepts generalizing to diverse concept-based explanations' settings. Based on this, we quantify the faithfulness of a concept explanation via perturbation. We ensure adequate perturbation in the high-dimensional space for different concepts via an optimization problem. Readability is approximated via an automatic and deterministic measure, quantifying the coherence of patterns that maximally activate a concept while aligning with human understanding. Finally, based on measurement theory, we apply a meta-evaluation method for evaluating these measures, generalizable to other types of explanations or tasks as well. Extensive experimental analysis has been conducted to inform the selection of explanation evaluation measures.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Artificial Intelligence
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI
R.I.P.
π»
Ghosted
Addressing Function Approximation Error in Actor-Critic Methods
R.I.P.
π»
Ghosted
Explanation in Artificial Intelligence: Insights from the Social Sciences
R.I.P.
π»
Ghosted
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
R.I.P.
π»
Ghosted