Optimal Predicate Pushdown Synthesis

April 14, 2026 ยท Grace Period ยท ๐Ÿ› Proc. ACM Program. Lang. 10, PLDI, Article 234 (June 2026), 25 pages

โณ Grace Period
This paper is less than 90 days old. We give authors time to release their code before passing judgment.
Authors Robert Zhang, Eric Hayden Campbell, Dixin Tang, Isil Dillig arXiv ID 2604.13351 Category cs.PL: Programming Languages Citations 0 Venue Proc. ACM Program. Lang. 10, PLDI, Article 234 (June 2026), 25 pages
Abstract
Predicate pushdown is a long-standing performance optimization that filters data as early as possible in a computational workflow. In modern data pipelines, this transformation is especially important because much of the computation occurs inside user-defined functions (UDFs) written in general-purpose languages such as Python and Scala. These UDFs capture rich domain logic and complex aggregations and are among the most expensive operations in a pipeline. Moving filters ahead of such UDFs can yield substantial performance gains, but doing so requires semantic reasoning. This paper introduces a general semantic foundation for predicate pushdown over stateful fold-based computations. We view pushdown as a correspondence between two programs that process different subsets of input data, with correctness witnessed by a bisimulation invariant relating their internal states. Building on this foundation, we develop a sound and relatively complete framework for verification, alongside a synthesis algorithm that automatically constructs optimal pushdown decompositions by finding the strongest admissible pre-filters and weakest residual post-filters. We implement this approach in a tool called Pusharoo and evaluate it on 150 real-world pandas and Spark data-processing pipelines. Our evaluation shows that Pusharoo is significantly more expressive than prior work, producing optimal pushdown transformations with a median synthesis time of 1.6 seconds per benchmark. Furthermore, our experiments demonstrate that the discovered pushdown optimizations speed up end-to-end execution by an average of 2.4$\times$ and up to two orders of magnitude.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Programming Languages