PaCX-MAE: Physiology-Augmented Chest X-Ray Masked Autoencoder

June 01, 2026 · 🏛 the ICML 2026 3rd Workshop on Multi-modal Foundation Models and Large Language Models for Life Sciences

Authors: Yancheng Liu, Kenichi Maeda, Manan Pancholy
arXiv ID: 2606.01537
Category: cs.CV (Computer Vision)
Cross-listed: cs.LG
Citations: 0

Abstract

Clinical diagnosis often requires combining imaging with physiological measurements, yet deployed models typically operate on unimodal data. We present PaCX-MAE, a cross-modal distillation framework that injects physiological priors into chest X-ray (CXR) encoders while remaining strictly unimodal at inference. PaCX-MAE augments in-domain masked autoencoding with a dual contrastive-predictive objective, aligning CXR representations with paired ECG and laboratory embeddings. Extensive evaluation across nine benchmarks demonstrates consistent improvements over domain-specific MAE, particularly on physiology-dependent tasks (e.g., +2.7 AUROC on MedMod; +6.5 F1 on VinDr). The method proves highly label-efficient in the 1% regime and preserves anatomical fidelity, achieving parity with MAE on segmentation tasks. Zero-shot and attention analyses confirm that PaCX-MAE successfully learns to attend to physiological indicators, such as the cardiac silhouette, absent in standard visual pretraining.
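
As a rough illustration of how a masked-reconstruction loss might be combined with a dual contrastive-predictive alignment term, the following NumPy sketch may help. The abstract does not give the exact loss forms, so everything here is an assumption: a symmetric InfoNCE loss for the contrastive part, a cosine-distance loss for the predictive part, and the hypothetical weights `w_con` and `w_pred`. The function names are illustrative and are not taken from the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce(cxr, phys, temperature=0.1):
    # Assumed contrastive term: symmetric InfoNCE where row i of `cxr`
    # is the positive pair of row i of `phys` (ECG or lab embedding).
    logits = l2_normalize(cxr) @ l2_normalize(phys).T / temperature

    def diag_cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return float(0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T)))

def predictive_loss(pred, target):
    # Assumed predictive term: mean (1 - cosine similarity) between the
    # embedding predicted from the CXR and the physiological embedding.
    cos = np.sum(l2_normalize(pred) * l2_normalize(target), axis=1)
    return float(np.mean(1.0 - cos))

def masked_recon_loss(pred_patches, true_patches, mask):
    # MAE-style mean squared error over masked patches only (mask == 1).
    per_patch = ((pred_patches - true_patches) ** 2).mean(axis=-1)
    return float((per_patch * mask).sum() / max(mask.sum(), 1))

def total_loss(recon, cxr_emb, cxr_pred, phys_emb, w_con=1.0, w_pred=1.0):
    # Hypothetical weighting; the actual balance is not stated in the abstract.
    return (recon
            + w_con * info_nce(cxr_emb, phys_emb)
            + w_pred * predictive_loss(cxr_pred, phys_emb))
```

In this sketch the physiological embeddings appear only in the training loss, which is consistent with the abstract's claim that the encoder remains unimodal at inference: only the CXR encoder would be needed once training is done.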