CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification

February 12, 2025 · Declared Dead · 🏛 arXiv.org

"Paper promises code 'coming soon'"

Evidence collected by the PWNC Scanner

Authors: Jiacheng Xu, Bo Pang, Jin Qu, Hiroaki Hayashi, Caiming Xiong, Yingbo Zhou
arXiv ID: 2502.08806
Category: cs.SE (Software Engineering)
Cross-listed: cs.AI, cs.LG
Citations: 11
Venue: arXiv.org
Last Checked: 1 month ago

Abstract

Software testing is a critical aspect of software development, yet generating test cases remains a routine task for engineers. This paper presents a benchmark, CLOVER, to evaluate models' capabilities in generating and completing test cases under specific conditions. Spanning from simple assertion completions to writing test cases that cover specific code blocks across multiple files, these tasks are based on 12 Python repositories, analyzing 845 problems with context lengths ranging from 4k to 128k tokens. Utilizing code testing frameworks, we propose a method to construct retrieval contexts using coverage information. While models exhibit comparable performance with short contexts, notable differences emerge with 16k contexts. Notably, models like GPT-4o and Claude 3.5 can effectively leverage relevant snippets; however, all models score below 35% on the complex Task III, even with the oracle context provided, underscoring the benchmark's significance and the potential for model improvement. The benchmark is containerized for code execution across tasks, and we will release the code, data, and construction methodologies.



📜 Similar Papers

In the same crypt — Software Engineering

R.I.P. 👻 Ghosted

ImageJ2: ImageJ for the next generation of scientific image data

Curtis T. Rueden, Johannes Schindelin, ... (+5 more)

cs.SE 🏛 BMC Bioinformatics 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

GraphCodeBERT: Pre-training Code Representations with Data Flow

Daya Guo, Shuo Ren, ... (+16 more)

cs.SE 🏛 ICLR 📚 1.5K cites 5 years ago

R.I.P. 👻 Ghosted

DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars

Yuchi Tian, Kexin Pei, ... (+2 more)

cs.SE 🏛 ICSE 📚 1.4K cites 8 years ago

R.I.P. 👻 Ghosted

Microservices: yesterday, today, and tomorrow

Nicola Dragoni, Saverio Giallorenzo, ... (+5 more)

cs.SE 🏛 Present and Ulterior Software Engineering 📚 1.1K cites 9 years ago

R.I.P. 👻 Ghosted

Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks

Yaqin Zhou, Shangqing Liu, ... (+3 more)

cs.SE 🏛 NeurIPS 📚 1.0K cites 6 years ago

R.I.P. 👻 Ghosted

A Survey of Machine Learning for Big Code and Naturalness

Miltiadis Allamanis, Earl T. Barr, ... (+2 more)

cs.SE 🏛 ACM CSUR 📚 962 cites 8 years ago

Died the same way — ⏳ Coming Soon™

R.I.P. ⏳ Coming Soon™

Exploring Simple Siamese Representation Learning

Xinlei Chen, Kaiming He

cs.CV 🏛 CVPR 📚 4.8K cites 5 years ago

R.I.P. ⏳ Coming Soon™

An Analysis of Scale Invariance in Object Detection - SNIP

Bharat Singh, Larry S. Davis

cs.CV 🏛 CVPR 📚 795 cites 8 years ago

R.I.P. ⏳ Coming Soon™

Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection

Benjin Zhu, Zhengkai Jiang, ... (+3 more)

cs.CV 🏛 arXiv 📚 556 cites 6 years ago

R.I.P. ⏳ Coming Soon™

FSRNet: End-to-End Learning Face Super-Resolution with Facial Priors

Yu Chen, Ying Tai, ... (+3 more)

cs.CV 🏛 CVPR 📚 542 cites 8 years ago