From Legal Text to Executable Decision Models: Evaluating Structured Representations for Legal Decision Model Generation

April 18, 2026 ยท Grace Period ยท ๐Ÿ› ICAIL 2026

โณ Grace Period
This paper is less than 90 days old. We give authors time to release their code before passing judgment.
Authors David Graus arXiv ID 2604.17153 Category cs.CL: Computation & Language Cross-listed cs.AI Citations 0 Venue ICAIL 2026
Abstract
Transforming legal text into executable decision logic is a longstanding challenge in legal informatics. With the rise of LLMs, this task has gained renewed interest, but remains challenging due to requiring extensive manual coding and evaluation. We use a unique real-world dataset that pairs production-grade decision models with legal text from the Dutch Environment and Planning Act. These models power the Omgevingsloket government platform, where citizens check permit requirements for environmental activities. We study whether intermediate structured representations can improve LLM-based generation of executable decision models from legal text. We compare four input conditions: raw legal text, text enriched with semantic role labels, text enriched with input and output constraints, and text enriched with both. We evaluate along two dimensions: structural evaluation, through similarity to gold decision models with graph kernels and graphs' descriptive statistics, and outcome evaluation, through functional equivalence by executing models on pre-configured test scenarios. Our findings show that I/O constraints provide the dominant improvement (+37-54% similarity over baseline), while semantic role labels show modest improvements. Outcome evaluation shows that generated models match the gold standard on 51-53% of test scenarios, even though generated models are typically smaller and simpler. We find LLMs eliminate redundant pass-through logic that comprises up to 45-55% of nodes. Importantly, structural similarity and outcome equivalence are complementary: structural similarity does not guarantee outcome equivalence, and vice versa. To facilitate reproducibility, we publicly release our dataset of 95 production decision models with associated legal text and all experimental code.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Computation & Language

๐ŸŒ… ๐ŸŒ… Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL ๐Ÿ› NeurIPS ๐Ÿ“š 166.0K cites 9 years ago