⚰️ Audio & Speech

R.I.P. 💀 404 Not Found

M3SD: Multi-modal, Multi-scenario and Multi-language Speaker Diarization Dataset

Shilong Wu

eess.AS 🏛 arXiv 📚 0 cites 1 year ago

R.I.P. 👻 Ghosted

REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion

Ishan D. Biyani, Nirmesh J. Shah, ... (+3 more)

eess.AS 🏛 Interspeech 📚 0 cites 1 year ago

R.I.P. ⚰️ The Empty Tomb

MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation

Sungwoo Cho, Jeongsoo Choi, ... (+2 more)

eess.AS 🏛 arXiv 📚 0 cites 1 year ago

R.I.P. 👻 Ghosted

Meta-Learning-Based Delayless Subband Adaptive Filter using Complex Self-Attention for Active Noise Control

Pengxing Feng, Hing Cheung So

eess.AS 🏛 arXiv 📚 0 cites 1 year ago

🌅 💤 Eternal Rest

Speech Watermarking with Discrete Intermediate Representations

Shengpeng Ji, Ziyue Jiang, ... (+5 more)

eess.AS 🏛 arXiv 📚 0 cites 1 year ago

R.I.P. 👻 Ghosted

TACO: Training-free Sound Prompted Segmentation via Semantically Constrained Audio-visual CO-factorization

Hugo Malard, Michel Olvera, ... (+2 more)

eess.AS 🏛 arXiv 📚 0 cites 1 year ago

R.I.P. 👻 Ghosted

Automating Feedback Analysis in Surgical Training: Detection, Categorization, and Assessment

Firdavs Nasriddinov, Rafal Kocielnik, ... (+5 more)

eess.AS 🏛 ML4H@NeurIPS 📚 0 cites 1 year ago

R.I.P. 👻 Ghosted

Late fusion ensembles for speech recognition on diverse input audio representations

Marin Jezidžić, Matej Mihelčić

eess.AS 🏛 arXiv 📚 0 cites 1 year ago

R.I.P. 👻 Ghosted

Towards Maximum Likelihood Training for Transducer-based Streaming Speech Recognition

Hyeonseung Lee, Ji Won Yoon, ... (+2 more)

eess.AS 🏛 IEEE SPL 📚 0 cites 1 year ago

R.I.P. 👻 Ghosted

Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection

Tzu-Ting Yang, Hsin-Wei Wang, ... (+2 more)

eess.AS 🏛 SLT 📚 0 cites 1 year ago

R.I.P. 👻 Ghosted

A Context-Based Numerical Format Prediction for a Text-To-Speech System

Yaser Darwesh, Lit Wei Wern, Mumtaz Begum Mustafa

eess.AS 🏛 arXiv 📚 0 cites 1 year ago

R.I.P. 👻 Ghosted

Memory-Efficient Training for Text-Dependent SV with Independent Pre-trained Models

Seyed Ali Farokh, Hossein Zeinali

eess.AS 🏛 ROCLING 📚 0 cites 1 year ago

R.I.P. 👻 Ghosted

Cluster-to-Predict Affect Contours from Speech

Gökhan Kuşçu, Engin Erzin

eess.AS 🏛 arXiv 📚 0 cites 2 years ago

R.I.P. 👻 Ghosted

Improved Vocal Effort Transfer Vector Estimation for Vocal Effort-Robust Speaker Verification

Iván López-Espejo, Santi Prieto, ... (+2 more)

eess.AS 🏛 ICMLSP W 📚 0 cites 3 years ago

R.I.P. 👻 Ghosted

Speaker Diaphragm Excursion Prediction: deep attention and online adaptation

Yuwei Ren, Matt Zivney, ... (+4 more)

eess.AS 🏛 ICASSP 📚 0 cites 3 years ago

R.I.P. 👻 Ghosted

Improved Lossless Coding for Storage and Transmission of Multichannel Immersive Audio

Toni Hirvonen, Mahmoud Namazi

eess.AS 🏛 arXiv 📚 0 cites 2 years ago

R.I.P. 👻 Ghosted

Text-to-speech for the hearing impaired

Josef Schlittenlacher, Thomas Baer

eess.AS 🏛 arXiv 📚 0 cites 5 years ago

R.I.P. 👻 Ghosted

Comparison of Speaker Role Recognition and Speaker Enrollment Protocol for conversational Clinical Interviews

Rachid Riad, Hadrien Titeux, ... (+7 more)

eess.AS 🏛 arXiv 📚 0 cites 5 years ago

R.I.P. 👻 Ghosted

BERT for Joint Multichannel Speech Dereverberation with Spatial-aware Tasks

Yang Jiao

eess.AS 🏛 arXiv 📚 0 cites 5 years ago

R.I.P. 👻 Ghosted

End-to-End Trainable Self-Attentive Shallow Network for Text-Independent Speaker Verification

Hyeonmook Park, Jungbae Park, Sang Wan Lee

eess.AS 🏛 arXiv 📚 0 cites 5 years ago

R.I.P. 👻 Ghosted

Deep F-measure Maximization for End-to-End Speech Understanding

Leda Sarı, Mark Hasegawa-Johnson

eess.AS 🏛 Interspeech 📚 0 cites 5 years ago

R.I.P. 👻 Ghosted

Applying Speech Tempo-Derived Features, BoAW and Fisher Vectors to Detect Elderly Emotion and Speech in Surgical Masks

Gábor Gosztolya, László Tóth

eess.AS 🏛 arXiv 📚 0 cites 5 years ago

R.I.P. 👻 Ghosted

Weakly Supervised Construction of ASR Systems with Massive Video Data

Mengli Cheng, Chengyu Wang, ... (+3 more)

eess.AS 🏛 arXiv 📚 0 cites 5 years ago

R.I.P. 👻 Ghosted

Exploiting Cross-Lingual Knowledge in Unsupervised Acoustic Modeling for Low-Resource Languages

Siyuan Feng

eess.AS 🏛 arXiv 📚 0 cites 5 years ago

🏛️ The Audio & Speech Crypt