A Distribution Matching Approach to Neural Piano Transcription with Optimal Transport

May 17, 2026 · ICASSP2026

Authors: Weixing Wei, Raynaldi Lalang, Dichucheng Li, Kazuyoshi Yoshii
arXiv ID: 2605.17405
Category: cs.SD (Sound); cross-listed in cs.MM
Citations: 0
Venue: ICASSP2026

Abstract

This paper describes a novel paradigm that formalizes automatic piano transcription (APT) as an optimal transport (OT) problem rather than as a frame-level multi-label binary classification problem. Our method learns to minimize the cost of transporting a predicted distribution of note events to the ground-truth distribution over time and frequency. The OT loss can thus accommodate temporal misalignment between predicted and ground-truth note events, leading to perceptually relevant optimization. We also propose a convolutional recurrent neural network (CRNN) with a harmonics-aware attention mechanism to capture the spectro-temporal dependencies inherent in music. Our experiments on the MAESTRO dataset showed that our method attained state-of-the-art performance in onset detection. We also confirmed the versatility of the OT loss by applying it to existing models.
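To illustrate why a transport-based loss tolerates temporal misalignment while a frame-level classification loss does not, the following is a minimal sketch and not the paper's actual loss. It assumes a single pitch, a one-dimensional grid of unit-spaced time frames, and onset activations treated as histograms; the function names and toy data are hypothetical. It uses the standard closed form of the 1-Wasserstein distance in one dimension (the sum of absolute differences between cumulative distributions), whereas the method described above transports mass over both time and frequency.

```python
import math


def normalize(activations):
    """Scale non-negative activations so they sum to 1 (a probability histogram)."""
    total = sum(activations)
    if total <= 0:
        raise ValueError("activations must have positive total mass")
    return [a / total for a in activations]


def wasserstein_1d(pred, target):
    """1-Wasserstein distance between two histograms on the same unit-spaced grid.

    In one dimension this equals the sum of absolute differences between
    the two cumulative distribution functions.
    """
    p = normalize(pred)
    q = normalize(target)
    cost = 0.0
    cdf_gap = 0.0
    for p_i, q_i in zip(p, q):
        cdf_gap += p_i - q_i
        cost += abs(cdf_gap)
    return cost


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean frame-level binary cross-entropy, with predictions clamped to [eps, 1 - eps]."""
    total = 0.0
    for p_i, t_i in zip(pred, target):
        p_i = min(max(p_i, eps), 1.0 - eps)
        total -= t_i * math.log(p_i) + (1 - t_i) * math.log(1 - p_i)
    return total / len(pred)


# Hypothetical toy data: one onset at frame 3 in the ground truth.
target = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
near_miss = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # predicted one frame late
far_miss = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]   # predicted five frames late

ot_near = wasserstein_1d(near_miss, target)
ot_far = wasserstein_1d(far_miss, target)
bce_near = binary_cross_entropy(near_miss, target)
bce_far = binary_cross_entropy(far_miss, target)

print("OT cost  (near, far):", ot_near, ot_far)
print("BCE loss (near, far):", bce_near, bce_far)
```

In this toy setting the transport cost grows with the size of the timing error, while the frame-level cross-entropy assigns essentially the same penalty to a one-frame miss and a five-frame miss, since both predictions are wrong in exactly two frames.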