Structured Moral Reasoning in Language Models: A Value-Grounded Evaluation Framework
June 17, 2025 · Declared Dead · Conference on Empirical Methods in Natural Language Processing
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Mohna Chakraborty, Lu Wang, David Jurgens
arXiv ID
2506.14948
Category
cs.HC: Human-Computer Interaction
Citations
9
Venue
Conference on Empirical Methods in Natural Language Processing
Last Checked
4 months ago
Abstract
Large language models (LLMs) are increasingly deployed in domains requiring moral understanding, yet their reasoning often remains shallow and misaligned with human reasoning. Unlike humans, whose moral reasoning integrates contextual trade-offs, value systems, and ethical theories, LLMs often rely on surface patterns, leading to biased decisions in morally and ethically complex scenarios. To address this gap, we present a value-grounded framework for evaluating and distilling structured moral reasoning in LLMs. We benchmark 12 open-source models across four moral datasets using a taxonomy of prompts grounded in value systems, ethical theories, and cognitive reasoning strategies. Our evaluation is guided by four questions: (1) Does reasoning improve LLM decision-making over direct prompting? (2) Which types of value/ethical frameworks most effectively guide LLM reasoning? (3) Which cognitive reasoning strategies lead to better moral performance? (4) Can small-sized LLMs acquire moral competence through distillation? We find that prompting with explicit moral structure consistently improves accuracy and coherence, with first-principles reasoning and Schwartz's + care-ethics scaffolds yielding the strongest gains. Furthermore, our supervised distillation approach transfers moral competence from large to small models without additional inference cost. Together, our results offer a scalable path toward interpretable and value-grounded models.
Similar Papers
In the same crypt: Human-Computer Interaction
Improving fairness in machine learning systems: What do industry practitioners need? (R.I.P., Ghosted)
Identifying Stable Patterns over Time for Emotion Recognition from EEG (R.I.P., Ghosted)
Questioning the AI: Informing Design Practices for Explainable AI User Experiences (R.I.P., Ghosted)
Deep Learning for Sensor-based Human Activity Recognition: Overview, Challenges and Opportunities (R.I.P., Ghosted)
Educational data mining and learning analytics: An updated survey (R.I.P., Ghosted)
Died the same way: Ghosted
Federated Learning: Strategies for Improving Communication Efficiency (R.I.P., Ghosted)
In-Datacenter Performance Analysis of a Tensor Processing Unit (R.I.P., Ghosted)
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning (R.I.P., Ghosted)