⚰️ Audio & Speech

R.I.P. 👻 Ghosted

Multimodal Fusion with Semi-Supervised Learning Minimizes Annotation Quantity for Modeling Videoconference Conversation Experience

Andrew Chang, Chenkai Hu, ... (+6 more)

eess.AS 🏛 Interspeech 📚 0 cites 1 year ago

R.I.P. 👻 Ghosted

Real-Time Auralization for First-Person Vocal Interaction in Immersive Virtual Environments

Mauricio Flores-Vargas, Enda Bates, Rachel McDonnell

eess.AS 🏛 arXiv 📚 0 cites 1 year ago

R.I.P. 👻 Ghosted

Detecting the terminality of speech-turn boundary for spoken interactions in French TV and Radio content

Rémi Uro, Marie Tahon, ... (+3 more)

eess.AS 🏛 Interspeech 📚 0 cites 2 years ago

R.I.P. 👻 Ghosted

Quality-Controlled Multimodal Emotion Recognition in Conversations with Identity-Based Transfer Learning and MAMBA Fusion

Zanxu Wang, Homayoon Beigi

eess.AS 🏛 arXiv 📚 0 cites 7 months ago

R.I.P. 👻 Ghosted

On the Difficulty of Token-Level Modeling of Dysfluency and Fluency Shaping Artifacts

Kashaf Gulzar, Dominik Wagner, ... (+4 more)

eess.AS 🏛 arXiv 📚 0 cites 7 months ago

R.I.P. 👻 Ghosted

Systematic Evaluation of Time-Frequency Features for Binaural Sound Source Localization

Davoud Shariat Panah, Alessandro Ragano, ... (+3 more)

eess.AS 🏛 arXiv 📚 0 cites 7 months ago

R.I.P. 👻 Ghosted

Codec2Vec: Self-Supervised Speech Representation Learning Using Neural Speech Codecs

Wei-Cheng Tseng, David Harwath

eess.AS 🏛 arXiv 📚 0 cites 6 months ago

R.I.P. 👻 Ghosted

How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-resource Transfer

Minu Kim, Ji Sub Um, Hoirin Kim

eess.AS 🏛 arXiv 📚 0 cites 7 months ago

R.I.P. 👻 Ghosted

Speech Recognition Model Improves Text-to-Speech Synthesis using Fine-Grained Reward

Guansu Wang, Peijie Sun

eess.AS 🏛 arXiv 📚 0 cites 7 months ago

R.I.P. 👻 Ghosted

Unifying Model and Layer Fusion for Speech Foundation Models

Yi-Jen Shih, David Harwath

eess.AS 🏛 arXiv 📚 0 cites 7 months ago

R.I.P. 👻 Ghosted

Quantizing Whisper-small: How design choices affect ASR performance

Arthur Söhler, Julian Irigoyen, Andreas Søeborg Kirkedal

eess.AS 🏛 arXiv 📚 0 cites 7 months ago

R.I.P. 👻 Ghosted

Pruning as Regularization: Sensitivity-Aware One-Shot Pruning in ASR

Julian Irigoyen, Arthur Söhler, Andreas Søeborg Kirkedal

eess.AS 🏛 arXiv 📚 0 cites 7 months ago

R.I.P. 👻 Ghosted

Lost in Phonation: Voice Quality Variation as an Evaluation Dimension for Speech Foundation Models

Harm Lameris, Shree Harsha Bokkahalli Satish, ... (+2 more)

eess.AS 🏛 arXiv 📚 0 cites 7 months ago

R.I.P. 👻 Ghosted

Data-Centric Lessons To Improve Speech-Language Pretraining

Vishaal Udandarao, Zhiyun Lu, ... (+7 more)

eess.AS 🏛 arXiv 📚 0 cites 7 months ago

R.I.P. 👻 Ghosted

Beyond Hearing: Learning Task-agnostic ExG Representations from Earphones via Physiology-informed Tokenization

Hyungjun Yoon, Seungjoo Lee, ... (+12 more)

eess.AS 🏛 arXiv 📚 0 cites 7 months ago

R.I.P. 👻 Ghosted

Can large audio language models understand child stuttering speech? speech summarization, and source separation

Chibuzor Okocha, Maya Bakri, Christan Grant

eess.AS 🏛 arXiv 📚 0 cites 7 months ago

R.I.P. 👻 Ghosted

StutterZero and StutterFormer: End-to-End Speech Conversion for Stuttering Transcription and Correction

Qianheng Xu

eess.AS 🏛 IEEE Access 📚 0 cites 7 months ago

R.I.P. 👻 Ghosted

Unsupervised lexicon learning from speech is limited by representations rather than clustering

Danel Slabbert, Simon Malan, Herman Kamper

eess.AS 🏛 arXiv 📚 0 cites 8 months ago

R.I.P. 👻 Ghosted

Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition

Yi-Cheng Lin, Yu-Hsuan Li Liang, ... (+5 more)

eess.AS 🏛 arXiv 📚 0 cites 8 months ago

R.I.P. 👻 Ghosted

TokenChain: A Discrete Speech Chain via Semantic Token Modeling

Mingxuan Wang, Satoshi Nakamura

eess.AS 🏛 arXiv 📚 0 cites 8 months ago

R.I.P. 👻 Ghosted

Index-MSR: A high-efficiency multimodal fusion framework for speech recognition

Jinming Chen, Lu Wang, ... (+2 more)

eess.AS 🏛 arXiv 📚 0 cites 8 months ago

R.I.P. 👻 Ghosted

PerformSinger: Multimodal Singing Voice Synthesis Leveraging Synchronized Lip Cues from Singing Performance Videos

Ke Gu, Zhicong Wu, ... (+6 more)

eess.AS 🏛 arXiv 📚 0 cites 8 months ago

R.I.P. 👻 Ghosted

Attentive AV-FusionNet: Audio-Visual Quality Prediction with Hybrid Attention

Ina Salaj, Arijit Biswas

eess.AS 🏛 arXiv 📚 0 cites 8 months ago

R.I.P. 👻 Ghosted

UniSLU: Unified Spoken Language Understanding from Heterogeneous Cross-Task Datasets

Zhichao Sheng, Shilin Zhou, ... (+2 more)

eess.AS 🏛 arXiv 📚 0 cites 11 months ago
