⚰️ Sound

R.I.P. 👻 Ghosted

It Takes Two: Real-time Co-Speech Two-person's Interaction Generation via Reactive Auto-regressive Diffusion Model

Mingyi Shi, Dafei Qin, ... (+5 more)

cs.SD 🏛 arXiv 📚 4 cites 1 year ago

R.I.P. 👻 Ghosted

Hindi audio-video-Deepfake (HAV-DF): A Hindi language-based Audio-video Deepfake Dataset

Sukhandeep Kaur, Mubashir Buhari, ... (+3 more)

cs.SD 🏛 arXiv 📚 4 cites 1 year ago

R.I.P. 👻 Ghosted

Conditional GAN for Enhancing Diffusion Models in Efficient and Authentic Global Gesture Generation from Audios

Yongkang Cheng, Mingjiang Liang, ... (+4 more)

cs.SD 🏛 arXiv 📚 4 cites 1 year ago

R.I.P. 👻 Ghosted

UALM: Unified Audio Language Model for Understanding, Generation and Reasoning

Jinchuan Tian, Sang-gil Lee, ... (+12 more)

cs.SD 🏛 arXiv 📚 4 cites 8 months ago

R.I.P. 👻 Ghosted

STOPA: A Database of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution

Anton Firc, Manasi Chhibber, ... (+4 more)

cs.SD 🏛 Interspeech 📚 4 cites 1 year ago

R.I.P. 👻 Ghosted

Stereo Sound Event Localization and Detection with Onscreen/offscreen Classification

Kazuki Shimada, Archontis Politis, ... (+11 more)

cs.SD 🏛 arXiv 📚 4 cites 11 months ago

R.I.P. 👻 Ghosted

Can Large Language Models Predict Audio Effects Parameters from Natural Language?

Seungheon Doh, Junghyun Koo, ... (+4 more)

cs.SD 🏛 WASPAA 📚 4 cites 1 year ago

R.I.P. 👻 Ghosted

Representation Learning for Semantic Alignment of Language, Audio, and Visual Modalities

Parthasaarathy Sudarsanam, Irene Martín-Morató, Tuomas Virtanen

cs.SD 🏛 EUSIPCO 📚 4 cites 1 year ago

R.I.P. 👻 Ghosted

Text2midi-InferAlign: Improving Symbolic Music Generation with Inference-Time Alignment

Abhinaba Roy, Geeta Puri, Dorien Herremans

cs.SD 🏛 arXiv 📚 4 cites 1 year ago

R.I.P. 👻 Ghosted

SteerMusic: Enhanced Musical Consistency for Zero-shot Text-guided and Personalized Music Editing

Xinlei Niu, Kin Wai Cheuk, ... (+9 more)

cs.SD 🏛 arXiv 📚 4 cites 1 year ago

R.I.P. 👻 Ghosted

Do Audio-Visual Segmentation Models Truly Segment Sounding Objects?

Jia Li, Wenjie Zhao, ... (+3 more)

cs.SD 🏛 arXiv 📚 4 cites 1 year ago

R.I.P. 👻 Ghosted

FolAI: Synchronized Foley Sound Generation with Semantic and Temporal Alignment

Riccardo Fosco Gramaccioni, Christian Marinoni, ... (+5 more)

cs.SD 🏛 arXiv 📚 4 cites 1 year ago

R.I.P. 👻 Ghosted

Missing Melodies: AI Music Generation and its "Nearly" Complete Omission of the Global South

Atharva Mehta, Shivam Chauhan, Monojit Choudhury

cs.SD 🏛 arXiv 📚 4 cites 1 year ago

R.I.P. 👻 Ghosted

A Theory-Based Explainable Deep Learning Architecture for Music Emotion

Hortense Fong, Vineet Kumar, K. Sudhir

cs.SD 🏛 Marketing Science 📚 4 cites 1 year ago

R.I.P. 👻 Ghosted

Morse Code-Enabled Speech Recognition for Individuals with Visual and Hearing Impairments

Ritabrata Roy Choudhury

cs.SD 🏛 arXiv 📚 4 cites 1 year ago

R.I.P. 💀 404 Not Found

Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment

Aditya Chakravarty

cs.SD 🏛 arXiv 📚 4 cites 2 years ago

R.I.P. 👻 Ghosted

An Extended Variational Mode Decomposition Algorithm Developed Speech Emotion Recognition Performance

David Hason Rudd, Huan Huo, Guandong Xu

cs.SD 🏛 PAKDD 📚 4 cites 2 years ago

R.I.P. 💀 404 Not Found

DanceAnyWay: Synthesizing Beat-Guided 3D Dances with Randomized Temporal Contrastive Learning

Aneesh Bhattacharya, Manas Paranjape, ... (+2 more)

cs.SD 🏛 AAAI 📚 4 cites 3 years ago

R.I.P. 👻 Ghosted

Parallel and Limited Data Voice Conversion Using Stochastic Variational Deep Kernel Learning

Mohamadreza Jafaryani, Hamid Sheikhzadeh, Vahid Pourahmadi

cs.SD 🏛 EAAI 📚 4 cites 2 years ago

R.I.P. 👻 Ghosted

Knowledge-based Multimodal Music Similarity

Andrea Poltronieri

cs.SD 🏛 Extended Semantic Web Conference 📚 4 cites 2 years ago

R.I.P. 👻 Ghosted

Multi-view Temporal Alignment for Non-parallel Articulatory-to-Acoustic Speech Synthesis

Jose A. Gonzalez-Lopez, Miriam Gonzalez-Atienza, ... (+3 more)

cs.SD 🏛 IberSPEECH Conference 📚 4 cites 5 years ago

R.I.P. 👻 Ghosted

Improving the Classification of Rare Chords with Unlabeled Data

Marcelo Bortolozzo, Rodrigo Schramm, Claudio R. Jung

cs.SD 🏛 ICASSP 📚 4 cites 5 years ago

R.I.P. 👻 Ghosted

Semi-supervised Learning for Singing Synthesis Timbre

Jordi Bonada, Merlijn Blaauw

cs.SD 🏛 ICASSP 📚 4 cites 5 years ago

R.I.P. 👻 Ghosted

Speakerfilter-Pro: an improved target speaker extractor combines the time domain and frequency domain

Shulin He, Hao Li, Xueliang Zhang

cs.SD 🏛 ICCSLP 📚 4 cites 5 years ago

🏛️ The Sound Crypt