⚰️ Sound

R.I.P. 👻 Ghosted

Bridging Cultural and Digital Divides: A Low-Latency JackTrip Framework for Equitable Music Education in the Global South

Tiange Zhou, Marco Bidin

cs.SD 🏛 ICCSTE 📚 1 cites 1 year ago

R.I.P. 👻 Ghosted

MotionRAG-Diff: A Retrieval-Augmented Diffusion Framework for Long-Term Music-to-Dance Generation

Mingyang Huang, Peng Zhang, Bang Zhang

cs.SD 🏛 arXiv 📚 1 cites 1 year ago

R.I.P. 👻 Ghosted

Segment-Factorized Full-Song Generation on Symbolic Piano Music

Ping-Yi Chen, Chih-Pin Tan, Yi-Hsuan Yang

cs.SD 🏛 arXiv 📚 1 cites 8 months ago

R.I.P. 👻 Ghosted

Prompt-aware classifier free guidance for diffusion models

Xuanhao Zhang, Chang Li

cs.SD 🏛 arXiv 📚 1 cites 8 months ago

R.I.P. 👻 Ghosted

Pay More Attention To Audio: Mitigating Imbalance of Cross-Modal Attention in Large Audio Language Models

Junyu Wang, Ziyang Ma, ... (+5 more)

cs.SD 🏛 arXiv 📚 1 cites 9 months ago

R.I.P. 👻 Ghosted

StereoFoley: Object-Aware Stereo Audio Generation from Video

Tornike Karchkhadze, Kuan-Lin Chen, ... (+5 more)

cs.SD 🏛 arXiv 📚 1 cites 9 months ago

R.I.P. 👻 Ghosted

PianoVAM: A Multimodal Piano Performance Dataset

Yonghyun Kim, Junhyung Park, ... (+5 more)

cs.SD 🏛 ISMIR 📚 1 cites 9 months ago

R.I.P. 👻 Ghosted

Multi-level SSL Feature Gating for Audio Deepfake Detection

Hoan My Tran, Damien Lolive, ... (+4 more)

cs.SD 🏛 ACM MM 📚 1 cites 9 months ago

R.I.P. 👻 Ghosted

MoTAS: MoE-Guided Feature Selection from TTS-Augmented Speech for Enhanced Multimodal Alzheimer's Early Screening

Yongqi Shao, Binxin Mei, ... (+3 more)

cs.SD 🏛 ACM MM 📚 1 cites 9 months ago

R.I.P. 👻 Ghosted

Face2VoiceSync: Lightweight Face-Voice Consistency for Text-Driven Talking Face Generation

Fang Kang, Yin Cao, Haoyu Chen

cs.SD 🏛 Interspeech 📚 1 cites 11 months ago

R.I.P. 👻 Ghosted

MLLM-based Speech Recognition: When and How is Multimodality Beneficial?

Yiwen Guan, Viet Anh Trinh, ... (+2 more)

cs.SD 🏛 arXiv 📚 1 cites 11 months ago

R.I.P. 👻 Ghosted

Improving BERT for Symbolic Music Understanding Using Token Denoising and Pianoroll Prediction

Jun-You Wang, Li Su

cs.SD 🏛 ISMIR 📚 1 cites 11 months ago

R.I.P. 👻 Ghosted

Exploring Classical Piano Performance Generation with Expressive Music Variational AutoEncoder

Jing Luo, Xinyu Yang, Jie Wei

cs.SD 🏛 SMC 📚 1 cites 11 months ago

R.I.P. 👻 Ghosted

On the Design of Diffusion-based Neural Speech Codecs

Pietro Foti, Andreas Brendel

cs.SD 🏛 EUSIPCO 📚 1 cites 1 year ago

R.I.P. 👻 Ghosted

Latent Swap Joint Diffusion for 2D Long-Form Latent Generation

Yusheng Dai, Chenxi Wang, ... (+8 more)

cs.SD 🏛 arXiv 📚 1 cites 1 year ago

R.I.P. 👻 Ghosted

Whisper-GPT: A Hybrid Representation Audio Large Language Model

Prateek Verma

cs.SD 🏛 arXiv 📚 1 cites 1 year ago

R.I.P. 👻 Ghosted

Aligner-Guided Training Paradigm: Advancing Text-to-Speech Models with Aligner Guided Duration

Haowei Lou, Helen Paik, ... (+2 more)

cs.SD 🏛 arXiv 📚 1 cites 1 year ago

R.I.P. 👻 Ghosted

Efficient VoIP Communications through LLM-based Real-Time Speech Reconstruction and Call Prioritization for Emergency Services

Danush Venkateshperumal, Rahman Abdul Rafi, ... (+2 more)

cs.SD 🏛 Measurement: Digitalization 📚 1 cites 1 year ago

R.I.P. 👻 Ghosted

MusicGen-Chord: Advancing Music Generation through Chord Progressions and Interactive Web-UI

Jongmin Jung, Andreas Jansson, Dasaem Jeong

cs.SD 🏛 arXiv 📚 1 cites 1 year ago

🌅 💤 Eternal Rest

Towards Speaker Identification with Minimal Dataset and Constrained Resources using 1D-Convolution Neural Network

Irfan Nafiz Shahan, Pulok Ahmed Auvi

cs.SD 🏛 arXiv 📚 1 cites 1 year ago

R.I.P. 👻 Ghosted

Generative AI for Music and Audio

Hao-Wen Dong

cs.SD 🏛 arXiv 📚 1 cites 1 year ago

R.I.P. 👻 Ghosted

Attention-guided Spectrogram Sequence Modeling with CNNs for Music Genre Classification

Aditya Sridhar

cs.SD 🏛 arXiv 📚 1 cites 1 year ago

R.I.P. 👻 Ghosted

Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio

Gongyu Chen, Haomin Zhang, ... (+3 more)

cs.SD 🏛 ICASSP 📚 1 cites 1 year ago

R.I.P. 👻 Ghosted

SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera

Yuhang He, Sangyun Shin, ... (+3 more)

cs.SD 🏛 WACV W 📚 1 cites 1 year ago

🏛️ The Sound Crypt