Diff-XYZ: A Benchmark for Evaluating Diff Understanding
October 14, 2025 ยท Entered Twilight ยท ๐ arXiv.org
"Code repo scraped from project page (backfill)"
Evidence collected by the PWNC Scanner
Repo contents: .github, .gitignore, .isort.cfg, CHANGELOG.md, LICENSE, README.md, lora, pyproject.toml, setup.py, tests, tox.ini
Authors
Evgeniy Glukhov, Michele Conti, Egor Bogomolov, Yaroslav Golubev, Alexander Bezzubov
arXiv ID
2510.12487
Category
cs.SE: Software Engineering
Cross-listed
cs.LG
Citations
0
Venue
arXiv.org
Repository
https://github.com/jieter/python-lora
โญ 71
Last Checked
2 months ago
Abstract
Reliable handling of code diffs is central to agents that edit and refactor repositories at scale. We introduce Diff-XYZ, a compact benchmark for code-diff understanding with three supervised tasks: apply (old code $+$ diff $\rightarrow$ new code), anti-apply (new code $-$ diff $\rightarrow$ old code), and diff generation (new code $-$ old code $\rightarrow$ diff). Instances in the benchmark are triples $\langle \textit{old code}, \textit{new code}, \textit{diff} \rangle$ drawn from real commits in CommitPackFT, paired with automatic metrics and a clear evaluation protocol. We use the benchmark to do a focused empirical study of the unified diff format and run a cross-format comparison of different diff representations. Our findings reveal that different formats should be used depending on the use case and model size. For example, representing diffs in search-replace format performs best for larger models across most tasks, while structured udiff variants offer similar but slightly weaker performance. In contrast, smaller open models benefit little from any formatting choice. The Diff-XYZ benchmark is a reusable foundation for assessing and improving diff handling in LLMs that can aid future development of diff formats and models editing code. The dataset is published on HuggingFace Hub: https://huggingface.co/datasets/JetBrains-Research/diff-xyz.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Software Engineering
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
Microservices: yesterday, today, and tomorrow
๐
๐
The Cartographer
A Survey of Machine Learning for Big Code and Naturalness
R.I.P.
๐ป
Ghosted
An Overview on Smart Contracts: Challenges, Advances and Platforms
R.I.P.
๐ป
Ghosted
Slither: A Static Analysis Framework For Smart Contracts
R.I.P.
๐ป
Ghosted