๐ฎ
๐ฎ
The Ethereal
TEAdapter: Supply abundant guidance for controllable text-to-music generation
August 09, 2024 ยท Entered Twilight ยท ๐ IEEE International Conference on Multimedia and Expo
Repo contents: README.md, TEAdapter, TEAdapter_training, audioldm2, data, taming
Authors
Jialing Zou, Jiahao Mei, Xudong Nan, Jinghua Li, Daoguo Dong, Liang He
arXiv ID
2408.04865
Category
cs.SD: Sound
Cross-listed
cs.MM,
eess.AS
Citations
1
Venue
IEEE International Conference on Multimedia and Expo
Repository
https://github.com/Ashley1101/TEAdapter
โญ 2
Last Checked
3 months ago
Abstract
Although current text-guided music generation technology can cope with simple creative scenarios, achieving fine-grained control over individual text-modality conditions remains challenging as user demands become more intricate. Accordingly, we introduce the TEAcher Adapter (TEAdapter), a compact plugin designed to guide the generation process with diverse control information provided by users. In addition, we explore the controllable generation of extended music by leveraging TEAdapter control groups trained on data of distinct structural functionalities. In general, we consider controls over global, elemental, and structural levels. Experimental results demonstrate that the proposed TEAdapter enables multiple precise controls and ensures high-quality music generation. Our module is also lightweight and transferable to any diffusion model architecture. Available code and demos will be found soon at https://github.com/Ashley1101/TEAdapter.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Sound
R.I.P.
๐ป
Ghosted
Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks
R.I.P.
๐ป
Ghosted
The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines
R.I.P.
๐ป
Ghosted
TasNet: time-domain audio separation network for real-time, single-channel speech separation
R.I.P.
๐ป
Ghosted
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
R.I.P.
๐ป
Ghosted