AgentPLM: Agentic Protein Language Models with Reasoning-Augmented Decoding for Protein Sequence Design

June 01, 2026 · Workshop on Generative and Agentic AI for Biology, 43rd International Conference on Machine Learning (ICML 2026)

Authors: Sahil Rahman, Maxx Richard Rahman
arXiv ID: 2606.02386
Category: cs.AI (Artificial Intelligence)
Cross-listed: q-bio.QM
Citations: 0

Abstract

Protein language models (PLMs) are passive oracles: they generate sequences in a single forward pass with no mechanism to consult external biophysical feedback or redirect generation when a candidate violates thermodynamic or structural constraints. We introduce AgentPLM, which addresses this by equipping a pre-trained PLM with (i) Reasoning-Augmented Decoding (RAD), which interleaves autoregressive generation with tool calls (ESMFold, FoldX, AutoDock Vina), and (ii) Contrastive Agent Policy Optimisation (CAPO), a trajectory-level extension of direct preference optimisation that trains the policy end-to-end to learn when oracle feedback is informative rather than merely imitating high-fitness sequences. We evaluate AgentPLM on benchmark tasks spanning de novo enzyme design, antibody optimisation, thermostability, PPI interface design, and zero-shot fitness prediction, using standardised oracle APIs and controlled sequence-identity splits. AgentPLM achieves state-of-the-art results, including a gain in antibody top-10% hit rate over the strongest passive baseline, and our analysis provides mechanistic evidence of online error correction without explicit backtracking.
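The abstract describes CAPO only as a trajectory-level extension of direct preference optimisation, without giving the objective. As a rough illustration of what "trajectory-level" could mean, the sketch below applies the standard DPO loss to log-probabilities summed over a whole generation trajectory. This is an assumption for illustration, not the authors' CAPO objective: the function names, the `beta` default, and the choice of which tokens contribute to the sums (for example, whether tool outputs are masked) are all hypothetical.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def trajectory_dpo_loss(
    policy_chosen: Sequence[float],
    ref_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,  # hypothetical default, not taken from the paper
) -> float:
    """Standard DPO loss on per-token log-probs summed over each trajectory.

    Each argument holds the log-probabilities of the tokens the caller has
    decided to score (an assumption: the abstract does not say whether
    tool-call or tool-output tokens are included).
    """
    # Log-ratio of policy to frozen reference for the preferred trajectory.
    chosen_ratio = sum(policy_chosen) - sum(ref_chosen)
    # Same log-ratio for the dispreferred trajectory.
    rejected_ratio = sum(policy_rejected) - sum(ref_rejected)
    # DPO: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


if __name__ == "__main__":
    # Toy numbers only; these are not results from the paper.
    loss = trajectory_dpo_loss(
        policy_chosen=[-1.0, -0.5],
        ref_chosen=[-1.2, -0.9],
        policy_rejected=[-2.0, -1.0],
        ref_rejected=[-1.5, -0.8],
    )
    print(f"toy trajectory-level DPO loss: {loss:.4f}")
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2; it falls below that value as the policy shifts probability mass towards the preferred trajectory relative to the reference.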