MeloTrans: A Text to Symbolic Music Generation Model Following Human Composition Habit
October 17, 2024 ยท Declared Dead ยท ๐ arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Yutian Wang, Wanyin Yang, Zhenrong Dai, Yilong Zhang, Kun Zhao, Hui Wang
arXiv ID
2410.13419
Category
cs.SD: Sound
Cross-listed
cs.MM,
eess.AS
Citations
7
Venue
arXiv.org
Last Checked
3 months ago
Abstract
At present, neural network models show powerful sequence prediction ability and are used in many automatic composition models. In comparison, the way humans compose music is very different from it. Composers usually start by creating musical motifs and then develop them into music through a series of rules. This process ensures that the music has a specific structure and changing pattern. However, it is difficult for neural network models to learn these composition rules from training data, which results in a lack of musicality and diversity in the generated music. This paper posits that integrating the learning capabilities of neural networks with human-derived knowledge may lead to better results. To archive this, we develop the POP909$\_$M dataset, the first to include labels for musical motifs and their variants, providing a basis for mimicking human compositional habits. Building on this, we propose MeloTrans, a text-to-music composition model that employs principles of motif development rules. Our experiments demonstrate that MeloTrans excels beyond existing music generation models and even surpasses Large Language Models (LLMs) like ChatGPT-4. This highlights the importance of merging human insights with neural network capabilities to achieve superior symbolic music generation.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Sound
๐ฎ
๐ฎ
The Ethereal
R.I.P.
๐ป
Ghosted
Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks
R.I.P.
๐ป
Ghosted
The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines
R.I.P.
๐ป
Ghosted
TasNet: time-domain audio separation network for real-time, single-channel speech separation
R.I.P.
๐ป
Ghosted
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
R.I.P.
๐ป
Ghosted
MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
๐ป
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
๐ป
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
๐ป
Ghosted