⚰️ Audio & Speech

R.I.P. 👻 Ghosted

CCATMos: Convolutional Context-aware Transformer Network for Non-intrusive Speech Quality Assessment

Yuchen Liu, Li-Chia Yang, ... (+2 more)

eess.AS 🏛 Interspeech 📚 6 cites 3 years ago

R.I.P. 👻 Ghosted

Towards Generating Diverse Audio Captions via Adversarial Training

Xinhao Mei, Xubo Liu, ... (+3 more)

eess.AS 🏛 IEEE/ACM TASLP 📚 6 cites 3 years ago

R.I.P. 👻 Ghosted

A Comparative Study of Data Augmentation Techniques for Deep Learning Based Emotion Recognition

Ravi Shankar, Abdouh Harouna Kenfack, ... (+2 more)

eess.AS 🏛 arXiv 📚 6 cites 3 years ago

R.I.P. 👻 Ghosted

Does Joint Training Really Help Cascaded Speech Translation?

Viet Anh Khoa Tran, David Thulke, ... (+3 more)

eess.AS 🏛 EMNLP 📚 6 cites 3 years ago

R.I.P. 👻 Ghosted

A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis

Qibing Bai, Tom Ko, Yu Zhang

eess.AS 🏛 Interspeech 📚 6 cites 3 years ago

R.I.P. 👻 Ghosted

PoeticTTS -- Controllable Poetry Reading for Literary Studies

Julia Koch, Florian Lux, ... (+7 more)

eess.AS 🏛 Interspeech 📚 6 cites 3 years ago

R.I.P. 👻 Ghosted

GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion

Magdalena Proszewska, Grzegorz Beringer, ... (+4 more)

eess.AS 🏛 Interspeech 📚 6 cites 3 years ago

R.I.P. 👻 Ghosted

Nonwords Pronunciation Classification in Language Development Tests for Preschool Children

Ilja Baumann, Dominik Wagner, ... (+2 more)

eess.AS 🏛 Interspeech 📚 6 cites 4 years ago

R.I.P. 👻 Ghosted

End-to-end speech recognition modeling from de-identified data

Martin Flechl, Shou-Chun Yin, ... (+2 more)

eess.AS 🏛 Interspeech 📚 6 cites 3 years ago

R.I.P. 👻 Ghosted

Defense against Adversarial Attacks on Hybrid Speech Recognition using Joint Adversarial Fine-tuning with Denoiser

Sonal Joshi, Saurabh Kataria, ... (+5 more)

eess.AS 🏛 arXiv 📚 6 cites 4 years ago

R.I.P. 👻 Ghosted

A Speech Representation Anonymization Framework via Selective Noise Perturbation

Minh Tran, Mohammad Soleymani

eess.AS 🏛 ICASSP 📚 6 cites 4 years ago

📚 The Cartographer

Voice Analysis for Stress Detection and Application in Virtual Reality to Improve Public Speaking in Real-time: A Review

Arushi, Roberto Dillon, ... (+2 more)

eess.AS 🏛 arXiv 📚 6 cites 3 years ago

R.I.P. 👻 Ghosted

Device-Directed Speech Detection: Regularization via Distillation for Weakly-Supervised Models

Vineet Garg, Ognjen Rudovic, ... (+6 more)

eess.AS 🏛 Interspeech 📚 6 cites 4 years ago

R.I.P. 👻 Ghosted

Literary and Colloquial Tamil Dialect Identification

M. Nanmalar, P. Vijayalakshmi, T. Nagarajan

eess.AS 🏛 Circuits, Systems, and Signal Processing 📚 5 cites 1 year ago

R.I.P. 👻 Ghosted

Comparative Analysis of Personalized Voice Activity Detection Systems: Assessing Real-World Effectiveness

Satyam Kumar, Sai Srujana Buddi, ... (+7 more)

eess.AS 🏛 Interspeech 📚 5 cites 2 years ago

R.I.P. 👻 Ghosted

Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech

Shivam Mehta, Harm Lameris, ... (+4 more)

eess.AS 🏛 Interspeech 📚 5 cites 2 years ago

📚 The Cartographer

A Comprehensive Survey on Generative AI for Video-to-Music Generation

Shulei Ji, Songruoyao Wu, ... (+3 more)

eess.AS 🏛 arXiv 📚 5 cites 1 year ago

R.I.P. 👻 Ghosted

Speech Recognition With LLMs Adapted to Disordered Speech Using Reinforcement Learning

Chirag Nagpal, Subhashini Venugopalan, ... (+4 more)

eess.AS 🏛 ICASSP 📚 5 cites 1 year ago

R.I.P. 👻 Ghosted

From KAN to GR-KAN: Advancing Speech Enhancement with KAN-Based Methodology

Haoyang Li, Yuchen Hu, ... (+4 more)

eess.AS 🏛 Interspeech 📚 5 cites 1 year ago

R.I.P. 👻 Ghosted

CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing

Yen-Ju Lu, Jing Liu, ... (+5 more)

eess.AS 🏛 NeurIPS 📚 5 cites 1 year ago

R.I.P. 👻 Ghosted

Open-Amp: Synthetic Data Framework for Audio Effect Foundation Models

Alec Wright, Alistair Carson, Lauri Juvela

eess.AS 🏛 ICASSP 📚 5 cites 1 year ago

R.I.P. 👻 Ghosted

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment

Ruiqi Li, Rongjie Huang, ... (+3 more)

eess.AS 🏛 ACL 📚 5 cites 3 years ago

R.I.P. 👻 Ghosted

Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis

Neeraj Kumar, Srishti Goel, ... (+2 more)

eess.AS 🏛 arXiv 📚 5 cites 5 years ago

R.I.P. 👻 Ghosted

Latent linguistic embedding for cross-lingual text-to-speech and voice conversion

Hieu-Thi Luong, Junichi Yamagishi

eess.AS 🏛 Blizzard Challenge / Voice Conversion Challenge 📚 5 cites 5 years ago

🏛️ The Audio & Speech Crypt