Vis2Mus: Exploring Multimodal Representation Mapping for Controllable Music Generation

November 10, 2022 · Entered Twilight · 🏛 arXiv.org

💤 TWILIGHT: Eternal Rest
Repo abandoned since publication

Repo contents: README.md, code, license

Authors: Runbang Zhang, Yixiao Zhang, Kai Shao, Ying Shan, Gus Xia
arXiv ID: 2211.05543
Category: cs.SD (Sound)
Cross-listed: cs.LG, eess.AS
Citations: 6
Venue: arXiv.org
Repository: https://github.com/ldzhangyx/vis2mus (⭐ 9)
Last Checked: 3 months ago
Abstract
In this study, we explore the representation mapping from the domain of visual arts to the domain of music, with which we can use visual arts as an effective handle to control music generation. Unlike most studies in multimodal representation learning that are purely data-driven, we adopt an analysis-by-synthesis approach that combines deep music representation learning with user studies. Such an approach enables us to discover interpretable representation mapping without a huge amount of paired data. In particular, we discover that visual-to-music mapping has a nice property similar to equivariance. In other words, we can use various image transformations, such as changing brightness, changing contrast, or style transfer, to control the corresponding transformations in the music domain. In addition, we released the Vis2Mus system as a controllable interface for symbolic music generation.
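The equivariance-like property mentioned in the abstract can be illustrated with a toy sketch. This is not the Vis2Mus model: the function names, the two-attribute feature tuples, and the linear coefficients (brighter image gives higher pitch, more contrast gives denser notes) are all hypothetical assumptions chosen only to show what "transform the image, then map" versus "map, then transform the music" means.

```python
# Toy illustration of an equivariance-like property; NOT the Vis2Mus model.
# All names, attributes, and coefficients below are hypothetical assumptions.

def image_to_music(features):
    """Hypothetical linear map from image attributes to music attributes."""
    brightness, contrast = features
    pitch_height = 2.0 * brightness   # assumed: brighter image -> higher pitch
    note_density = 0.5 * contrast     # assumed: more contrast -> denser notes
    return (pitch_height, note_density)


def brighten(features, amount):
    """Image-domain transformation: raise brightness by `amount`."""
    brightness, contrast = features
    return (brightness + amount, contrast)


def raise_pitch(music, amount):
    """Music-domain transformation paired with `brighten` under the toy map."""
    pitch_height, note_density = music
    return (pitch_height + 2.0 * amount, note_density)


def is_equivariant(features, amount, tol=1e-9):
    """Check that transform-then-map equals map-then-transform."""
    left = image_to_music(brighten(features, amount))
    right = raise_pitch(image_to_music(features), amount)
    return all(abs(a - b) <= tol for a, b in zip(left, right))
```

In this sketch, `is_equivariant((0.3, 0.8), 0.2)` compares the two paths for a single brightness change. The paper reports a property "similar to" equivariance, discovered through user studies, so a real mapping would hold only approximately and would not be a fixed linear rule like this one.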

📜 Similar Papers

In the same crypt: Sound