Test Generation Strategies for Building Failure Models and Explaining Spurious Failures

December 09, 2023 · Declared Dead · 🏛 ACM Transactions on Software Engineering and Methodology

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Baharin Aliashrafi Jodat, Abhishek Chandar, Shiva Nejati, Mehrdad Sabetzadeh
arXiv ID: 2312.05631
Category: cs.SE (Software Engineering)
Citations: 10
Venue: ACM Transactions on Software Engineering and Methodology
Last Checked: 4 months ago

Abstract

Test inputs fail not only when the system under test is faulty but also when the inputs are invalid or unrealistic. Failures resulting from invalid or unrealistic test inputs are spurious. Avoiding spurious failures improves the effectiveness of testing in exercising the main functions of a system, particularly for compute-intensive (CI) systems where a single test execution takes significant time. In this paper, we propose to build failure models for inferring interpretable rules on test inputs that cause spurious failures. We examine two alternative strategies for building failure models: (1) machine learning (ML)-guided test generation and (2) surrogate-assisted test generation. ML-guided test generation infers boundary regions that separate passing and failing test inputs and samples test inputs from those regions. Surrogate-assisted test generation relies on surrogate models to predict labels for test inputs instead of exercising all the inputs. We propose a novel surrogate-assisted algorithm that uses multiple surrogate models simultaneously, and dynamically selects the prediction from the most accurate model. We empirically evaluate the accuracy of failure models inferred based on surrogate-assisted and ML-guided test generation algorithms. Using case studies from the domains of cyber-physical systems and networks, we show that our proposed surrogate-assisted approach generates failure models with an average accuracy of 83%, significantly outperforming ML-guided test generation and two baselines. Further, our approach learns failure-inducing rules that identify genuine spurious failures as validated against domain knowledge.
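
The surrogate-assisted idea in the abstract (keep several surrogate models, and take the label from whichever is currently most accurate instead of executing every input) can be illustrated with a minimal sketch. This is not the authors' algorithm. Everything below is a hypothetical stand-in chosen for illustration: the one-dimensional input, the `run_system` oracle and its 0.6 threshold, the two toy surrogates, and the `warmup` and `min_accuracy` parameters.

```python
import random

def run_system(x):
    """Hypothetical stand-in for an expensive test execution (assumed rule: fail above 0.6)."""
    return "fail" if x > 0.6 else "pass"

class NearestNeighbor:
    """Toy surrogate 1: copy the label of the closest executed input."""
    def predict(self, x, data):
        return min(data, key=lambda d: abs(d[0] - x))[1]

class Majority:
    """Toy surrogate 2: predict the most common label among executed inputs."""
    def predict(self, x, data):
        labels = [label for _, label in data]
        return max(set(labels), key=labels.count)

def label_inputs(inputs, surrogates, warmup=10, min_accuracy=0.9):
    executed = []                    # (input, true label) pairs from real runs
    labeled = []                     # (input, label, source) for every input
    hits = [0] * len(surrogates)     # correct predictions per surrogate
    scored = 0                       # executions on which surrogates were scored
    for x in inputs:
        accuracy = [h / scored if scored else 0.0 for h in hits]
        best = max(range(len(surrogates)), key=lambda i: accuracy[i])
        # Trust the currently most accurate surrogate once it is good enough.
        if len(executed) >= warmup and accuracy[best] >= min_accuracy:
            label = surrogates[best].predict(x, executed)
            labeled.append((x, label, "predicted"))
            continue
        # Otherwise pay for a real execution and use it to score every surrogate.
        truth = run_system(x)
        if executed:
            for i, s in enumerate(surrogates):
                hits[i] += s.predict(x, executed) == truth
            scored += 1
        executed.append((x, truth))
        labeled.append((x, truth, "executed"))
    return labeled

random.seed(0)
inputs = [random.random() for _ in range(200)]
labeled = label_inputs(inputs, [NearestNeighbor(), Majority()])
n_executed = sum(1 for _, _, source in labeled if source == "executed")
print(f"executed {n_executed} of {len(labeled)} inputs")
```

In this sketch the surrogates are scored only on inputs that were actually executed, so the accuracy estimates never depend on their own predictions. A fuller implementation would presumably keep re-checking the surrogates after they start being trusted, which this sketch does not do.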