⚰️ Sound

R.I.P. 👻 Ghosted

MoLEx: Mixture of LoRA Experts in Speech Self-Supervised Models for Audio Deepfake Detection

Zihan Pan, Sailor Hardik Bhupendra, Jinyang Wu

cs.SD 🏛 arXiv 📚 3 cites 9 months ago

R.I.P. 👻 Ghosted

Teaching Physical Awareness to LLMs through Sounds

Weiguo Wang, Andy Nie, ... (+3 more)

cs.SD 🏛 ICML 📚 3 cites 1 year ago

R.I.P. 👻 Ghosted

MusFlow: Multimodal Music Generation via Conditional Flow Matching

Jiahao Song, Yuzhao Wang

cs.SD 🏛 ACM MM 📚 3 cites 1 year ago

📚 The Cartographer

A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives

Shuyu Li, Shulei Ji, ... (+4 more)

cs.SD 🏛 arXiv 📚 3 cites 1 year ago

R.I.P. 👻 Ghosted

$C^2$AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction

Wenxuan Wu, Xueyuan Chen, ... (+6 more)

cs.SD 🏛 IEEE JSTSP 📚 3 cites 1 year ago

R.I.P. 👻 Ghosted

VocalCrypt: Novel Active Defense Against Deepfake Voice Based on Masking Effect

Qingyuan Fei, Wenjie Hou, ... (+2 more)

cs.SD 🏛 arXiv 📚 3 cites 1 year ago

R.I.P. 👻 Ghosted

Bridging The Multi-Modality Gaps of Audio, Visual and Linguistic for Speech Enhancement

Meng-Ping Lin, Jen-Cheng Hou, ... (+5 more)

cs.SD 🏛 arXiv 📚 3 cites 1 year ago

R.I.P. 👻 Ghosted

Mouth Articulation-Based Anchoring for Improved Cross-Corpus Speech Emotion Recognition

Shreya G. Upadhyay, Ali N. Salman, ... (+2 more)

cs.SD 🏛 ICASSP 📚 3 cites 1 year ago

R.I.P. 👻 Ghosted

Enhanced Speech Emotion Recognition with Efficient Channel Attention Guided Deep CNN-BiLSTM Framework

Niloy Kumar Kundu, Sarah Kobir, ... (+3 more)

cs.SD 🏛 arXiv 📚 3 cites 1 year ago

R.I.P. 👻 Ghosted

Comparative Analysis of Mel-Frequency Cepstral Coefficients and Wavelet Based Audio Signal Processing for Emotion Detection and Mental Health Assessment in Spoken Speech

Idoko Agbo, Dr Hoda El-Sayed, M. D Kamruzzan Sarker

cs.SD 🏛 arXiv 📚 3 cites 1 year ago

R.I.P. 👻 Ghosted

Zero-Shot Mono-to-Binaural Speech Synthesis

Alon Levkovitch, Julian Salazar, ... (+5 more)

cs.SD 🏛 Interspeech 📚 3 cites 1 year ago

R.I.P. 👻 Ghosted

LatentSpeech: Latent Diffusion for Text-To-Speech Generation

Haowei Lou, Helen Paik, ... (+3 more)

cs.SD 🏛 arXiv 📚 3 cites 1 year ago

R.I.P. 👻 Ghosted

Multiple Choice Learning for Efficient Speech Separation with Many Speakers

David Perera, François Derrida, ... (+3 more)

cs.SD 🏛 ICASSP 📚 3 cites 1 year ago

R.I.P. 👻 Ghosted

How Private is Low-Frequency Speech Audio in the Wild? An Analysis of Verbal Intelligibility by Humans and Machines

Ailin Liu, Pepijn Vunderink, ... (+3 more)

cs.SD 🏛 Interspeech 📚 3 cites 1 year ago

R.I.P. 👻 Ghosted

Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation

Mengzhe Geng, Xurong Xie, ... (+8 more)

cs.SD 🏛 IEEE TASLP 📚 3 cites 1 year ago

R.I.P. 👻 Ghosted

LSTM-CNN Network for Audio Signature Analysis in Noisy Environments

Praveen Damacharla, Hamid Rajabalipanah, Mohammad Hosein Fakheri

cs.SD 🏛 ICCSCI 📚 3 cites 2 years ago

R.I.P. 👻 Ghosted

Keyword spotting -- Detecting commands in speech using deep learning

Sumedha Rai, Tong Li, Bella Lyu

cs.SD 🏛 arXiv 📚 3 cites 2 years ago

R.I.P. 👻 Ghosted

Affective social anthropomorphic intelligent system

Md. Adyelullahil Mamun, Hasnat Md. Abdullah, ... (+3 more)

cs.SD 🏛 MTA 📚 3 cites 3 years ago

R.I.P. 👻 Ghosted

Towards Improving Harmonic Sensitivity and Prediction Stability for Singing Melody Extraction

Keren Shao, Ke Chen, ... (+2 more)

cs.SD 🏛 ISMIR 📚 3 cites 2 years ago

R.I.P. 👻 Ghosted

NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis based on Frequency Modulation

Zhen Ye, Wei Xue, ... (+3 more)

cs.SD 🏛 IJCAI 📚 3 cites 3 years ago

R.I.P. 👻 Ghosted

Deep Learning Based Multimodal with Two-phase Training Strategy for Daily Life Video Classification

Lam Pham, Trang Le, ... (+4 more)

cs.SD 🏛 ICCMI 📚 3 cites 3 years ago

R.I.P. 👻 Ghosted

Improved MVDR Beamforming Using LSTM Speech Models to Clean Spatial Clustering Masks

Zhaoheng Ni, Felix Grezes, ... (+2 more)

cs.SD 🏛 arXiv 📚 3 cites 5 years ago

R.I.P. 👻 Ghosted

From Note-Level to Chord-Level Neural Network Models for Voice Separation in Symbolic Music

Patrick Gray, Razvan Bunescu

cs.SD 🏛 arXiv 📚 3 cites 5 years ago

R.I.P. 👻 Ghosted

Latent Vector Recovery of Audio GANs

Andrew Keyes, Nicky Bayat, ... (+2 more)

cs.SD 🏛 arXiv 📚 3 cites 5 years ago

🏛️ The Sound Crypt