The Ethereal
Vis2Mus: Exploring Multimodal Representation Mapping for Controllable Music Generation
November 10, 2022 · Entered Twilight · arXiv.org
Repo contents: README.md, code, license
Authors
Runbang Zhang, Yixiao Zhang, Kai Shao, Ying Shan, Gus Xia
arXiv ID
2211.05543
Category
cs.SD: Sound
Cross-listed
cs.LG, eess.AS
Citations
6
Venue
arXiv.org
Repository
https://github.com/ldzhangyx/vis2mus
⭐ 9
Last Checked
3 months ago
Abstract
In this study, we explore the representation mapping from the domain of visual arts to the domain of music, with which we can use visual arts as an effective handle to control music generation. Unlike most studies in multimodal representation learning, which are purely data-driven, we adopt an analysis-by-synthesis approach that combines deep music representation learning with user studies. Such an approach enables us to discover an interpretable representation mapping without a huge amount of paired data. In particular, we discover that the visual-to-music mapping has a property similar to equivariance. In other words, we can use various image transformations, such as changing brightness, changing contrast, or applying style transfer, to control the corresponding transformations in the music domain. In addition, we release the Vis2Mus system as a controllable interface for symbolic music generation.
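The equivariance-like property mentioned in the abstract can be read roughly as follows. This is only a sketch in notation that is assumed here and not taken from the paper: $f$ stands for the visual-to-music mapping, $x$ for an image, $T_{\mathrm{img}}$ for an image transformation (for example, a brightness change), and $T_{\mathrm{mus}}$ for the corresponding transformation in the music domain.

```latex
% Hypothetical notation: f, x, T_img, and T_mus are illustrative symbols.
% Transforming the image and then mapping it to music gives approximately
% the same result as mapping first and then transforming the music.
f\bigl(T_{\mathrm{img}}(x)\bigr) \approx T_{\mathrm{mus}}\bigl(f(x)\bigr)
```

The approximate sign reflects the abstract's wording that the property is "similar to" equivariance rather than exact.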
Similar Papers
In the same crypt: Sound
R.I.P.
👻
Ghosted
Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks
R.I.P.
👻
Ghosted
The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines
R.I.P.
👻
Ghosted
TasNet: time-domain audio separation network for real-time, single-channel speech separation
R.I.P.
👻
Ghosted
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
R.I.P.
👻
Ghosted