JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models

August 09, 2023 · Declared Dead · 🏛 Conference on Algebraic Informatics

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Peike Li, Boyu Chen, Yao Yao, Yikai Wang, Allen Wang, Alex Wang
arXiv ID: 2308.04729
Category: cs.SD (Sound)
Cross-listed: cs.AI, cs.LG, cs.MM, eess.AS
Citations: 53
Venue: Conference on Algebraic Informatics
Last Checked: 2 months ago

Abstract

Music generation has attracted growing interest with the advancement of deep generative models. However, generating music conditioned on textual descriptions, known as text-to-music, remains challenging due to the complexity of musical structures and high sampling rate requirements. Despite the task's significance, prevailing generative models exhibit limitations in music quality, computational efficiency, and generalization. This paper introduces JEN-1, a universal high-fidelity model for text-to-music generation. JEN-1 is a diffusion model incorporating both autoregressive and non-autoregressive training. Through in-context learning, JEN-1 performs various generation tasks including text-guided music generation, music inpainting, and continuation. Evaluations demonstrate JEN-1's superior performance over state-of-the-art methods in text-music alignment and music quality while maintaining computational efficiency. Our demos are available at https://jenmusic.ai/audio-demos
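The abstract says only that JEN-1 covers text-guided generation, inpainting, and continuation "through in-context learning." One common way to cast all three as a single conditional diffusion task is to pass the model a binary mask over latent frames together with the unmasked audio as context. The sketch below assumes that framing for illustration only. `build_task_mask` and `masked_context` are hypothetical helper names, a plain list of floats stands in for a real latent tensor, and the actual JEN-1 conditioning scheme should be checked against the paper.

```python
from typing import List, Optional, Tuple


def build_task_mask(num_frames: int, task: str,
                    start: int = 0, end: Optional[int] = None) -> List[int]:
    """Per-frame mask (hypothetical helper): 1 = generate, 0 = given as context."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    end = num_frames if end is None else end
    if not 0 <= start < end <= num_frames:
        raise ValueError("need 0 <= start < end <= num_frames")
    if task == "generation":
        # Plain text-to-music: no audio context, every frame is generated.
        return [1] * num_frames
    if task == "inpainting":
        # Fill the gap [start, end); frames on both sides stay as context.
        return [1 if start <= i < end else 0 for i in range(num_frames)]
    if task == "continuation":
        # Keep the prefix [0, start) as context; generate every frame after it.
        # (`end` is not used for this task.)
        return [0 if i < start else 1 for i in range(num_frames)]
    raise ValueError(f"unknown task: {task!r}")


def masked_context(latent: List[float], mask: List[int]) -> List[Tuple[float, int]]:
    """Pair each frame's visible value (0.0 where masked) with its mask bit.

    Assumption: these pairs stand in for extra input channels concatenated
    to the noisy latent, so the denoiser sees what is known and what is not.
    """
    if len(latent) != len(mask):
        raise ValueError("latent and mask must have the same length")
    return [(x if m == 0 else 0.0, m) for x, m in zip(latent, mask)]


if __name__ == "__main__":
    print(build_task_mask(8, "inpainting", start=3, end=5))  # [0, 0, 0, 1, 1, 0, 0, 0]
    print(build_task_mask(8, "continuation", start=5))       # [0, 0, 0, 0, 0, 1, 1, 1]
```

Under this framing, one network and one training objective serve all three tasks, and only the mask changes at inference time.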

📜 Similar Papers

In the same crypt — Sound

R.I.P. 👻 Ghosted

WaveNet: A Generative Model for Raw Audio

Aaron van den Oord, Sander Dieleman, ... (+7 more)

cs.SD 🏛 Speech Synthesis 📚 8.0K cites 9 years ago

R.I.P. 👻 Ghosted

CNN Architectures for Large-Scale Audio Classification

Shawn Hershey, Sourish Chaudhuri, ... (+11 more)

cs.SD 🏛 ICASSP 📚 2.8K cites 9 years ago

R.I.P. 👻 Ghosted

Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation

Yi Luo, Nima Mesgarani

cs.SD 🏛 IEEE/ACM TASLP 📚 2.1K cites 7 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification

Justin Salamon, Juan Pablo Bello

cs.SD 🏛 IEEE SPL 📚 1.4K cites 9 years ago

R.I.P. 👻 Ghosted

WaveGlow: A Flow-based Generative Network for Speech Synthesis

Ryan Prenger, Rafael Valle, Bryan Catanzaro

cs.SD 🏛 ICASSP 📚 1.1K cites 7 years ago

R.I.P. 👻 Ghosted

Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks

Morten Kolbæk, Dong Yu, ... (+2 more)

cs.SD 🏛 IEEE/ACM TASLP 📚 763 cites 9 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 5 years ago

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago