Late multimodal fusion for image and audio music transcription
April 06, 2022 · Declared Dead · Expert systems with applications
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
María Alfaro-Contreras, Jose J. Valero-Mas, José M. Iñesta, Jorge Calvo-Zaragoza
arXiv ID
2204.03063
Category
cs.MM: Multimedia
Cross-listed
cs.CV, cs.IR, cs.SD, eess.AS
Citations
26
Venue
Expert systems with applications
Last Checked
2 months ago
Abstract
Music transcription, which deals with the conversion of music sources into a structured digital format, is a key problem for Music Information Retrieval (MIR). When addressing this challenge in computational terms, the MIR community follows two lines of research: music documents, which is the case of Optical Music Recognition (OMR), or audio recordings, which is the case of Automatic Music Transcription (AMT). The different nature of the aforementioned input data has conditioned these fields to develop modality-specific frameworks. However, their recent definition in terms of sequence labeling tasks leads to a common output representation, which enables research on a combined paradigm. In this respect, multimodal image and audio music transcription comprises the challenge of effectively combining the information conveyed by image and audio modalities. In this work, we explore this question at a late-fusion level: we study four combination approaches in order to merge, for the first time, the hypotheses regarding end-to-end OMR and AMT systems in a lattice-based search space. The results obtained for a series of performance scenarios -- in which the corresponding single-modality models yield different error rates -- showed interesting benefits of these approaches. In addition, two of the four strategies considered significantly improve the corresponding unimodal standard recognition frameworks.
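As an illustration only: the abstract does not describe the four combination approaches or how the lattice is built, so the sketch below is not the authors' method. It shows one generic form of late fusion, a weighted log-linear rescoring of the N-best hypothesis lists produced by two unimodal models. The function name `fuse_nbest`, the `weight` and `floor` parameters, the token names, and the scores are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# A hypothesis is a sequence of output tokens; the token names used
# below are placeholders, not the paper's symbol vocabulary.
Hypothesis = Tuple[str, ...]


def fuse_nbest(
    omr_scores: Dict[Hypothesis, float],
    amt_scores: Dict[Hypothesis, float],
    weight: float = 0.5,
    floor: float = -20.0,
) -> List[Tuple[Hypothesis, float]]:
    """Rank hypotheses by a weighted sum of per-modality log-scores.

    `weight` is the share given to the image (OMR) model; `floor` is an
    assumed log-score for a hypothesis missing from one model's list.
    """
    candidates = set(omr_scores) | set(amt_scores)
    fused = {
        hyp: weight * omr_scores.get(hyp, floor)
        + (1.0 - weight) * amt_scores.get(hyp, floor)
        for hyp in candidates
    }
    # Highest fused log-score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up N-best lists: each model is unsure about the last symbol.
omr_nbest = {
    ("C4", "D4", "E4"): math.log(0.6),
    ("C4", "D4", "F4"): math.log(0.4),
}
amt_nbest = {
    ("C4", "D4", "F4"): math.log(0.7),
    ("C4", "E4", "F4"): math.log(0.3),
}

best_hypothesis, best_score = fuse_nbest(omr_nbest, amt_nbest)[0]
print(best_hypothesis, round(best_score, 3))
```

In this toy case the fused ranking prefers the hypothesis that both models list, even though the image model alone ranks another one first. Setting `weight` to 1.0 or 0.0 falls back to a single modality, which is one simple way to mimic scenarios where the unimodal models have different error rates.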
Similar Papers
In the same crypt: Multimedia
R.I.P.
👻
Ghosted
Old Age
Quality Assessment of In-the-Wild Videos
R.I.P.
👻
Ghosted
Viewport-Adaptive Navigable 360-Degree Video Delivery
R.I.P.
👻
Ghosted
A Comprehensive Survey on Cross-modal Retrieval
R.I.P.
👻
Ghosted
An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges
R.I.P.
👻
Ghosted
A Convolutional Neural Network Approach for Post-Processing in HEVC Intra Coding
Died the same way: 👻 Ghosted
R.I.P.
👻
Ghosted
Language Models are Few-Shot Learners
R.I.P.
👻
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
👻
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
👻
Ghosted