An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications

April 17, 2024 · Declared Dead · 🏛 Empirical Software Engineering

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Mohannad Alhanahnah, Md Rashedul Hasan, Lisong Xu, Hamid Bagheri
arXiv ID: 2404.11050
Category: cs.SE (Software Engineering)
Citations: 14
Venue: Empirical Software Engineering
Last Checked: 4 months ago

Abstract

Automatic Program Repair (APR) has garnered significant attention as a practical research domain focused on automatically fixing bugs in programs. While existing APR techniques primarily target imperative programming languages such as C and Java, there is a growing need for effective solutions applicable to declarative software specification languages. This paper systematically investigates the capacity of Large Language Models (LLMs) to repair declarative specifications in Alloy, a declarative formal language used for software specification. We designed 12 different repair settings, encompassing single-agent and dual-agent paradigms and utilizing various LLMs. These configurations also incorporate different levels of feedback, including an auto-prompting mechanism in which LLMs generate prompts autonomously. Our study reveals that the dual-agent setup with auto-prompting outperforms the other settings, albeit with a marginal increase in the number of iterations and token usage. This dual-agent setup was also more effective than state-of-the-art Alloy APR techniques when evaluated on a comprehensive set of benchmarks. This work is the first to empirically evaluate the capability of LLMs to repair declarative specifications while taking into account recent LLM trends such as LLM-based agents, feedback, auto-prompting, and tools, thus paving the way for future agent-based techniques in software engineering.
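To make the dual-agent idea with auto-prompting concrete, here is a minimal sketch of one plausible repair loop. It is not the authors' implementation, and no code accompanies the paper. Everything in it is hypothetical: `dual_agent_repair`, `RepairResult`, and the `toy_*` stubs are illustrative names, the two "agents" are deterministic stand-ins for LLM calls, `toy_check` stands in for running the Alloy Analyzer, and the token count is only a whitespace-word proxy. The shape it shows: a repair agent proposes a candidate, a checker returns feedback, and a second agent turns that feedback into the next prompt until the candidate passes or the iteration budget runs out.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical: an "agent" is any function mapping a prompt to a response (an LLM in practice).
Agent = Callable[[str], str]
# Hypothetical oracle: returns None if the spec passes, otherwise a feedback message.
Checker = Callable[[str], Optional[str]]


@dataclass
class RepairResult:
    fixed_spec: Optional[str]  # None if no passing candidate was found
    iterations: int            # number of repair attempts made
    tokens: int                # rough proxy: whitespace-separated words exchanged


def count_tokens(text: str) -> int:
    return len(text.split())


def dual_agent_repair(spec: str, repair_agent: Agent, prompt_agent: Agent,
                      check: Checker, max_iters: int = 5) -> RepairResult:
    prompt = f"Fix this Alloy specification:\n{spec}"
    tokens = 0
    for i in range(1, max_iters + 1):
        candidate = repair_agent(prompt)          # agent 1 proposes a repair
        tokens += count_tokens(prompt) + count_tokens(candidate)
        feedback = check(candidate)               # validate the candidate
        if feedback is None:
            return RepairResult(candidate, i, tokens)
        # Auto-prompting: agent 2 writes the next repair prompt from the feedback.
        meta = (f"Candidate:\n{candidate}\nFeedback:\n{feedback}\n"
                "Write a better repair prompt.")
        prompt = prompt_agent(meta)
        tokens += count_tokens(meta) + count_tokens(prompt)
    return RepairResult(None, max_iters, tokens)


# Toy stand-ins (hypothetical): a spec whose quantifier should be "all", not "some".
BUGGY = "fact Acyclic { some n: Node | n !in n.^next }"


def toy_repair_agent(prompt: str) -> str:
    # Only fixes the bug when the prompt points at the quantifier.
    return BUGGY.replace("some n", "all n") if "quantifier" in prompt else BUGGY


def toy_prompt_agent(meta_prompt: str) -> str:
    # Turns a counterexample report into a targeted instruction.
    if "counterexample" in meta_prompt:
        return "Fix the quantifier in: " + BUGGY
    return meta_prompt


def toy_check(spec: str) -> Optional[str]:
    # Stand-in for the Alloy Analyzer: passes only the universally quantified fact.
    return None if "all n" in spec else "counterexample: a cycle exists"


if __name__ == "__main__":
    result = dual_agent_repair(BUGGY, toy_repair_agent, toy_prompt_agent, toy_check)
    print(result.fixed_spec, result.iterations)
```

With these stubs, the first attempt fails, the prompt agent rewrites the prompt from the counterexample, and the second attempt passes. This illustrates why an extra agent can cost additional iterations and tokens while still improving the repair rate. Swapping the stubs for real LLM clients and an Alloy Analyzer wrapper is left as an assumption of the reader's setup.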