M6(GPT)3: Generating Multitrack Modifiable Multi-Minute MIDI Music from Text using Genetic algorithms, Probabilistic methods and GPT Models in any Progression and Time Signature

September 19, 2024 · Declared Dead · 🏛 2025 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Jakub Poćwiardowski, Mateusz Modrzejewski, Marek S. Tatara
arXiv ID: 2409.12638
Category: cs.SD: Sound
Cross-listed: cs.HC, eess.AS
Citations: 1
Venue: 2025 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)
Last Checked: 4 months ago

Abstract

This work introduces the M6(GPT)3 composer system, capable of generating complete, multi-minute musical compositions with complex structures in any time signature, in the MIDI domain from input descriptions in natural language. The system utilizes an autoregressive transformer language model to map natural language prompts to composition parameters in JSON format. The defined structure includes time signature, scales, chord progressions, and valence-arousal values, from which accompaniment, melody, bass, motif, and percussion tracks are created. We propose a genetic algorithm for the generation of melodic elements. The algorithm incorporates mutations with musical significance and a fitness function based on normal distribution and predefined musical feature values. The values adaptively evolve, influenced by emotional parameters and distinct playing styles. The system for generating percussion in any time signature utilizes probabilistic methods, including Markov chains. Through both human and objective evaluations, we demonstrate that our music generation approach outperforms baselines on specific, musically meaningful metrics, offering a viable alternative to purely neural network-based systems.
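
The abstract describes a genetic algorithm whose fitness function is based on a normal distribution around predefined musical feature values. The sketch below shows one way such a scheme could look. It is not the authors' implementation: the two features (`mean_interval`, `pitch_range`), their target means and standard deviations, the two mutation operators, and all population parameters are hypothetical choices made only for illustration. Melodies are represented as lists of scale degrees in the range 0 to 14.

```python
import math
import random

# Hypothetical feature targets as (mean, standard deviation).
# The paper does not list these; the values are illustrative only.
TARGETS = {
    "mean_interval": (2.0, 1.0),  # average absolute step between notes
    "pitch_range": (7.0, 2.0),    # span between lowest and highest note
}


def features(melody):
    """Compute the illustrative features of a melody (list of scale degrees)."""
    steps = [abs(b - a) for a, b in zip(melody, melody[1:])]
    return {
        "mean_interval": sum(steps) / len(steps),
        "pitch_range": max(melody) - min(melody),
    }


def fitness(melody, targets=TARGETS):
    """Product of unnormalised Gaussian scores, one per feature.

    The score is 1.0 when every feature equals its target mean and
    decays towards 0 as features move away from their targets.
    """
    feats = features(melody)
    score = 1.0
    for name, (mu, sigma) in targets.items():
        score *= math.exp(-0.5 * ((feats[name] - mu) / sigma) ** 2)
    return score


def mutate(melody, rng):
    """Apply one mutation: move a note by one scale step, or swap two neighbours."""
    child = list(melody)
    i = rng.randrange(len(child) - 1)
    if rng.random() < 0.5:
        # Step the note up or down, clamped to the allowed degree range.
        child[i] = min(14, max(0, child[i] + rng.choice([-1, 1])))
    else:
        child[i], child[i + 1] = child[i + 1], child[i]
    return child


def evolve(length=8, pop_size=30, generations=60, seed=0):
    """Evolve a melody with elitist selection: keep the best half, mutate to refill."""
    rng = random.Random(seed)
    population = [[rng.randrange(15) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]
        offspring = [mutate(rng.choice(elite), rng) for _ in range(pop_size - len(elite))]
        population = elite + offspring
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(best, round(fitness(best), 3))
```

In this sketch, making the targets depend on valence-arousal values or a playing style, as the abstract suggests the system does, would amount to passing a different `targets` dictionary to `fitness`.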