⚰️ Sound

R.I.P. 👻 Ghosted

DDFAD: Dataset Distillation Framework for Audio Data

Wenbo Jiang, Rui Zhang, ... (+4 more)

cs.SD 🏛 arXiv 📚 2 cites 1 year ago

R.I.P. 👻 Ghosted

Generative Semantic Communication for Text-to-Speech Synthesis

Jiahao Zheng, Jinke Ren, ... (+6 more)

cs.SD 🏛 GLOBECOM W 📚 2 cites 1 year ago

R.I.P. 👻 Ghosted

Content and Style Aware Audio-Driven Facial Animation

Qingju Liu, Hyeongwoo Kim, Gaurav Bharaj

cs.SD 🏛 BMVC 📚 2 cites 1 year ago

R.I.P. 👻 Ghosted

ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models

Bohan Li, Wenbin Huang, ... (+8 more)

cs.SD 🏛 arXiv 📚 2 cites 7 months ago

R.I.P. 👻 Ghosted

Towards Generalized Source Tracing for Codec-Based Deepfake Speech

Xuanjun Chen, I-Ming Lin, ... (+4 more)

cs.SD 🏛 arXiv 📚 2 cites 1 year ago

R.I.P. 👻 Ghosted

TCDiff++: An End-to-end Trajectory-Controllable Diffusion Model for Harmonious Music-Driven Group Choreography

Yuqin Dai, Wanlu Zhu, ... (+5 more)

cs.SD 🏛 IJCV 📚 2 cites 12 months ago

R.I.P. 👻 Ghosted

Video Echoed in Music: Semantic, Temporal, and Rhythmic Alignment for Video-to-Music Generation

Xinyi Tong, Yiran Zhu, ... (+10 more)

cs.SD 🏛 arXiv 📚 2 cites 7 months ago

R.I.P. 👻 Ghosted

FoleyGRAM: Video-to-Audio Generation with GRAM-Aligned Multimodal Encoders

Riccardo Fosco Gramaccioni, Christian Marinoni, ... (+4 more)

cs.SD 🏛 IJCNN 📚 2 cites 8 months ago

R.I.P. 👻 Ghosted

StereoSync: Spatially-Aware Stereo Audio Generation from Video

Christian Marinoni, Riccardo Fosco Gramaccioni, ... (+4 more)

cs.SD 🏛 IJCNN 📚 2 cites 8 months ago

R.I.P. 👻 Ghosted

Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual Transformers

Juncheng Wang, Chao Xu, ... (+6 more)

cs.SD 🏛 EMNLP 📚 2 cites 8 months ago

📚 The Cartographer

A Survey on Evaluation Metrics for Music Generation

Faria Binte Kader, Santu Karmaker

cs.SD 🏛 arXiv 📚 2 cites 10 months ago

R.I.P. 👻 Ghosted

MeMo: Attentional Momentum for Real-time Audio-visual Speaker Extraction under Impaired Visual Conditions

Junjie Li, Wenxuan Wu, ... (+5 more)

cs.SD 🏛 arXiv 📚 2 cites 11 months ago

📚 The Cartographer

Manipulated Regions Localization For Partially Deepfake Audio: A Survey

Jiayi He, Jiangyan Yi, ... (+3 more)

cs.SD 🏛 arXiv 📚 2 cites 1 year ago

R.I.P. 👻 Ghosted

Fretting-Transformer: Encoder-Decoder Model for MIDI to Tablature Transcription

Anna Hamberger, Sebastian Murgul, ... (+2 more)

cs.SD 🏛 arXiv 📚 2 cites 1 year ago

R.I.P. 👻 Ghosted

Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction

Wenxuan Wu, Shuai Wang, ... (+3 more)

cs.SD 🏛 Interspeech 📚 2 cites 1 year ago

R.I.P. 👻 Ghosted

Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation

Xilin Jiang, Junkai Wu, ... (+2 more)

cs.SD 🏛 ICASPAA W 📚 2 cites 1 year ago

R.I.P. 👻 Ghosted

Score Distillation Sampling for Audio: Source Separation, Synthesis, and Beyond

Jessie Richter-Powell, Antonio Torralba, Jonathan Lorraine

cs.SD 🏛 arXiv 📚 2 cites 1 year ago

R.I.P. 👻 Ghosted

SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding

Mingfei Chen, Israel D. Gebru, ... (+8 more)

cs.SD 🏛 CVPR 📚 2 cites 1 year ago

R.I.P. 👻 Ghosted

Cross-Modal Learning for Music-to-Music-Video Description Generation

Zhuoyuan Mao, Mengjie Zhao, ... (+5 more)

cs.SD 🏛 ICRLN W 📚 2 cites 1 year ago

R.I.P. 👻 Ghosted

AVE Speech: A Comprehensive Multi-Modal Dataset for Speech Recognition Integrating Audio, Visual, and Electromyographic Signals

Dongliang Zhou, Yakun Zhang, ... (+4 more)

cs.SD 🏛 IEEE THS 📚 2 cites 1 year ago

R.I.P. 👻 Ghosted

TSPE: Task-Specific Prompt Ensemble for Improved Zero-Shot Audio Classification

Nishit Anand, Ashish Seth, ... (+2 more)

cs.SD 🏛 ICASSP W 📚 2 cites 1 year ago

R.I.P. 👻 Ghosted

Stylus: Repurposing Stable Diffusion for Training-Free Music Style Transfer on Mel-Spectrograms

Heehwan Wang, Joonwoo Kwon, ... (+5 more)

cs.SD 🏛 arXiv 📚 2 cites 1 year ago

R.I.P. 👻 Ghosted

Improving Lip-synchrony in Direct Audio-Visual Speech-to-Speech Translation

Lucas Goncalves, Prashant Mathur, ... (+6 more)

cs.SD 🏛 ICASSP 📚 2 cites 1 year ago

R.I.P. 👻 Ghosted

GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification

Hui Yan, Zhenchun Lei, ... (+2 more)

cs.SD 🏛 ICASSP 📚 2 cites 1 year ago

🏛️ The Sound Crypt