Nested Music Transformer: Sequentially Decoding Compound Tokens in Symbolic Music and Audio Generation

August 02, 2024 ยท Declared Dead ยท ๐Ÿ› International Society for Music Information Retrieval Conference

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Jiwoo Ryu, Hao-Wen Dong, Jongmin Jung, Dasaem Jeong arXiv ID 2408.01180 Category cs.SD: Sound Cross-listed cs.IR, cs.LG, eess.AS Citations 4 Venue International Society for Music Information Retrieval Conference Last Checked 3 months ago
Abstract
Representing symbolic music with compound tokens, where each token consists of several different sub-tokens representing a distinct musical feature or attribute, offers the advantage of reducing sequence length. While previous research has validated the efficacy of compound tokens in music sequence modeling, predicting all sub-tokens simultaneously can lead to suboptimal results as it may not fully capture the interdependencies between them. We introduce the Nested Music Transformer (NMT), an architecture tailored for decoding compound tokens autoregressively, similar to processing flattened tokens, but with low memory usage. The NMT consists of two transformers: the main decoder that models a sequence of compound tokens and the sub-decoder for modeling sub-tokens of each compound token. The experiment results showed that applying the NMT to compound tokens can enhance the performance in terms of better perplexity in processing various symbolic music datasets and discrete audio tokens from the MAESTRO dataset.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Sound

Died the same way โ€” ๐Ÿ‘ป Ghosted