A holistic approach to polyphonic music transcription with neural networks
October 26, 2019 · Entered Twilight · International Society for Music Information Retrieval Conference
"Last commit was 6.0 years ago (β₯5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: .gitignore, LICENSE, README.md, configs, data, deepspeech, environment.yml, multiproc.py, prepare.py, prepquartets.sh, runtest.sh, runtrain.sh, showmodel.py, test.py, train.py, transcribe.py, utils.py
Authors
Miguel A. RomΓ‘n, Antonio Pertusa, Jorge Calvo-Zaragoza
arXiv ID
1910.12086
Category
cs.SD: Sound
Cross-listed
cs.LG, eess.AS, stat.ML
Citations
33
Venue
International Society for Music Information Retrieval Conference
Repository
https://github.com/mangelroman/audio2score
★ 24
Last Checked
2 months ago
Abstract
We present a framework based on neural networks to extract music scores directly from polyphonic audio in an end-to-end fashion. Most previous Automatic Music Transcription (AMT) methods seek a piano-roll representation of the pitches, which can be further transformed into a score by incorporating tempo estimation, beat tracking, key estimation, or rhythm quantization. Unlike these methods, our approach generates music notation directly from the input audio in a single stage. For this, we use a Convolutional Recurrent Neural Network (CRNN) with a Connectionist Temporal Classification (CTC) loss function, which does not require annotated alignments of audio frames with the score's rhythmic information. We trained our model on Haydn, Mozart, and Beethoven string quartets and Bach chorales, synthesized with different tempos and expressive performances. The output is a textual representation of four-voice music scores based on the **kern format. Although the proposed approach is evaluated in a simplified scenario, the results show that the model can learn to transcribe scores directly from audio signals, opening a promising avenue towards complete AMT.
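The core idea — a CRNN whose per-frame outputs are aligned to a **kern character sequence by the CTC loss, so no frame-level rhythm annotations are needed — can be sketched compactly in PyTorch. The layer sizes, log-mel spectrogram input, and character-level vocabulary below are illustrative assumptions, not the configuration used in the paper or the linked repository:

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Convolutional recurrent network emitting per-frame character
    probabilities over a **kern vocabulary plus the CTC blank symbol."""

    def __init__(self, n_mels=80, vocab_size=100, rnn_hidden=256):
        super().__init__()
        # Convolutional front end: local spectro-temporal features;
        # frequency axis reduced 4x, time axis reduced 2x.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=(2, 2), padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=(2, 1), padding=1),
            nn.ReLU(),
        )
        # Bidirectional recurrence models longer-range temporal context.
        self.rnn = nn.GRU(32 * (n_mels // 4), rnn_hidden, num_layers=2,
                          bidirectional=True, batch_first=True)
        # Index 0 is reserved for the CTC blank symbol.
        self.fc = nn.Linear(2 * rnn_hidden, vocab_size + 1)

    def forward(self, spec):
        # spec: (batch, 1, n_mels, frames)
        x = self.conv(spec)
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)  # (batch, time, feat)
        x, _ = self.rnn(x)
        return self.fc(x).log_softmax(dim=-1)  # CTC expects log-probabilities

model = CRNN()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

# Dummy batch: two spectrogram clips and two encoded **kern character
# targets. CTC needs only sequence lengths, not frame-level alignments.
spec = torch.randn(2, 1, 80, 400)
targets = torch.randint(1, 101, (2, 50))  # character indices; 0 = blank
log_probs = model(spec)                   # (batch, frames, vocab + 1)
input_lens = torch.full((2,), log_probs.size(1), dtype=torch.long)
target_lens = torch.full((2,), targets.size(1), dtype=torch.long)

# nn.CTCLoss expects (frames, batch, classes).
loss = ctc(log_probs.permute(1, 0, 2), targets, input_lens, target_lens)
loss.backward()
```

At inference time, greedy CTC decoding — collapsing repeated characters and dropping blanks — recovers the predicted **kern text from the per-frame outputs.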
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt · Sound
CNN Architectures for Large-Scale Audio Classification
R.I.P. 👻 Ghosted
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation
R.I.P. 👻 Ghosted
Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification
R.I.P. 👻 Ghosted
WaveGlow: A Flow-based Generative Network for Speech Synthesis
R.I.P. 👻 Ghosted