⚰️ Audio & Speech

R.I.P. 👻 Ghosted

RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions

Chung-Cheng Chiu, Arun Narayanan, ... (+9 more)

eess.AS 🏛 SLT 📚 44 cites 5 years ago

R.I.P. 👻 Ghosted

Unsupervised adversarial domain adaptation for acoustic scene classification

Shayan Gharib, Konstantinos Drossos, ... (+3 more)

eess.AS 🏛 DCASE W 📚 44 cites 7 years ago

R.I.P. 👻 Ghosted

Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition

Ye Bai, Jiangyan Yi, ... (+4 more)

eess.AS 🏛 Interspeech 📚 43 cites 5 years ago

R.I.P. 👻 Ghosted

Emotional Voice Conversion using Multitask Learning with Text-to-speech

Tae-Ho Kim, Sungjae Cho, ... (+3 more)

eess.AS 🏛 ICASSP 📚 43 cites 6 years ago

R.I.P. 👻 Ghosted

Contextual Speech Recognition with Difficult Negative Training Examples

Uri Alon, Golan Pundak, Tara N. Sainath

eess.AS 🏛 ICASSP 📚 43 cites 7 years ago

R.I.P. 👻 Ghosted

WaveCycleGAN: Synthetic-to-natural speech waveform conversion using cycle-consistent adversarial networks

Kou Tanaka, Takuhiro Kaneko, ... (+2 more)

eess.AS 🏛 SLT 📚 43 cites 7 years ago

R.I.P. 👻 Ghosted

SpeechLMScore: Evaluating speech generation using speech language model

Soumi Maiti, Yifan Peng, ... (+2 more)

eess.AS 🏛 ICASSP 📚 43 cites 3 years ago

R.I.P. 👻 Ghosted

Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input

Daisuke Niizumi, Daiki Takeuchi, ... (+3 more)

eess.AS 🏛 ICASSP 📚 43 cites 3 years ago

R.I.P. 👻 Ghosted

CASS-NAT: CTC Alignment-based Single Step Non-autoregressive Transformer for Speech Recognition

Ruchao Fan, Wei Chu, ... (+2 more)

eess.AS 🏛 ICASSP 📚 42 cites 5 years ago

R.I.P. 👻 Ghosted

Self-Supervised Representations Improve End-to-End Speech Translation

Anne Wu, Changhan Wang, ... (+2 more)

eess.AS 🏛 Interspeech 📚 42 cites 5 years ago

R.I.P. 👻 Ghosted

Maximum Voiced Frequency Estimation: Exploiting Amplitude and Phase Spectra

Thomas Drugman, Yannis Stylianou

eess.AS 🏛 IEEE SPL 📚 42 cites 5 years ago

R.I.P. 👻 Ghosted

FoleyGen: Visually-Guided Audio Generation

Xinhao Mei, Varun Nagaraja, ... (+5 more)

eess.AS 🏛 MLSP W 📚 41 cites 2 years ago

R.I.P. 👻 Ghosted

Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis

Chung-Ming Chien, Hung-yi Lee

eess.AS 🏛 SLT 📚 41 cites 5 years ago

R.I.P. 👻 Ghosted

TaL: a synchronised multi-speaker corpus of ultrasound tongue imaging, audio, and lip videos

Manuel Sam Ribeiro, Jennifer Sanger, ... (+5 more)

eess.AS 🏛 SLT 📚 41 cites 5 years ago

R.I.P. 👻 Ghosted

Relative Positional Encoding for Speech Recognition and Direct Translation

Ngoc-Quan Pham, Thanh-Le Ha, ... (+6 more)

eess.AS 🏛 Interspeech 📚 41 cites 5 years ago

R.I.P. 👻 Ghosted

Unsupervised Speech Recognition via Segmental Empirical Output Distribution Matching

Chih-Kuan Yeh, Jianshu Chen, ... (+2 more)

eess.AS 🏛 ICLR 📚 41 cites 7 years ago

R.I.P. 👻 Ghosted

Modeling of nonlinear audio effects with end-to-end deep neural networks

Marco A. Martínez Ramírez, Joshua D. Reiss

eess.AS 🏛 arXiv 📚 41 cites 7 years ago

R.I.P. 👻 Ghosted

Hybrid CTC-Attention based End-to-End Speech Recognition using Subword Units

Zhangyu Xiao, Zhijian Ou, ... (+2 more)

eess.AS 🏛 ISCSLP 📚 41 cites 7 years ago

R.I.P. 👻 Ghosted

End-to-End Multimodal Speech Recognition

Shruti Palaskar, Ramon Sanabria, Florian Metze

eess.AS 🏛 ICASSP 📚 41 cites 8 years ago

R.I.P. 👻 Ghosted

S3T: Self-Supervised Pre-training with Swin Transformer for Music Classification

Hang Zhao, Chen Zhang, ... (+3 more)

eess.AS 🏛 ICASSP 📚 41 cites 4 years ago

R.I.P. 👻 Ghosted

Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains

Won Jang, Dan Lim, Jaesam Yoon

eess.AS 🏛 arXiv 📚 40 cites 5 years ago

R.I.P. 👻 Ghosted

Tie Your Embeddings Down: Cross-Modal Latent Spaces for End-to-end Spoken Language Understanding

Bhuvan Agrawal, Markus Müller, ... (+4 more)

eess.AS 🏛 ICASSP 📚 40 cites 5 years ago

R.I.P. 👻 Ghosted

Multistream CNN for Robust Acoustic Modeling

Kyu J. Han, Jing Pan, ... (+3 more)

eess.AS 🏛 ICASSP 📚 40 cites 5 years ago

R.I.P. 👻 Ghosted

Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text

Murali Karthick Baskar, Shinji Watanabe, ... (+4 more)

eess.AS 🏛 arXiv 📚 40 cites 6 years ago

🏛️ The Audio & Speech Crypt