⚰️ Audio & Speech

R.I.P. 👻 Ghosted

Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder

Yi Zhao, Shinji Takaki, ... (+4 more)

eess.AS 🏛 IEEE Access 📚 67 cites 7 years ago

R.I.P. 👻 Ghosted

A Comparison of Modeling Units in Sequence-to-Sequence Speech Recognition with the Transformer on Mandarin Chinese

Shiyu Zhou, Linhao Dong, ... (+2 more)

eess.AS 🏛 ICONIP 📚 67 cites 7 years ago

R.I.P. 👻 Ghosted

You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation

Aleksandr Laptev, Roman Korostik, ... (+4 more)

eess.AS 🏛 2020 13th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) 📚 66 cites 5 years ago

R.I.P. 👻 Ghosted

Muse: Multi-modal target speaker extraction with visual cues

Zexu Pan, Ruijie Tao, ... (+2 more)

eess.AS 🏛 ICASSP 📚 65 cites 5 years ago

R.I.P. 👻 Ghosted

Diffusion-based Generative Speech Source Separation

Robin Scheibler, Youna Ji, ... (+4 more)

eess.AS 🏛 ICASSP 📚 65 cites 3 years ago

R.I.P. 👻 Ghosted

Audio Retrieval with WavText5K and CLAP Training

Soham Deshmukh, Benjamin Elizalde, Huaming Wang

eess.AS 🏛 Interspeech 📚 65 cites 3 years ago

R.I.P. 👻 Ghosted

Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory

Chunyang Wu, Yongqiang Wang, ... (+3 more)

eess.AS 🏛 Interspeech 📚 64 cites 5 years ago

R.I.P. 👻 Ghosted

A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet

David Ditter, Timo Gerkmann

eess.AS 🏛 ICASSP 📚 64 cites 6 years ago

R.I.P. 👻 Ghosted

Small-Footprint Keyword Spotting on Raw Audio Data with Sinc-Convolutions

Simon Mittermaier, Ludwig Kürzinger, ... (+2 more)

eess.AS 🏛 ICASSP 📚 64 cites 6 years ago

R.I.P. 👻 Ghosted

Quaternion Convolutional Neural Networks for Detection and Localization of 3D Sound Events

Danilo Comminiello, Marco Lella, ... (+2 more)

eess.AS 🏛 ICASSP 📚 63 cites 7 years ago

R.I.P. 👻 Ghosted

VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching

Yiwei Guo, Chenpeng Du, ... (+3 more)

eess.AS 🏛 ICASSP 📚 62 cites 2 years ago

R.I.P. 👻 Ghosted

PIANOTREE VAE: Structured Representation Learning for Polyphonic Music

Ziyu Wang, Yiyi Zhang, ... (+5 more)

eess.AS 🏛 ISMIR 📚 62 cites 5 years ago

R.I.P. 👻 Ghosted

Generative Adversarial Speaker Embedding Networks for Domain Robust End-to-End Speaker Verification

Gautam Bhattacharya, Joao Monteiro, ... (+2 more)

eess.AS 🏛 ICASSP 📚 62 cites 7 years ago

R.I.P. 👻 Ghosted

Low-resource expressive text-to-speech using data augmentation

Goeric Huybrechts, Thomas Merritt, ... (+4 more)

eess.AS 🏛 ICASSP 📚 61 cites 5 years ago

R.I.P. 👻 Ghosted

Deep Learning based Emotion Recognition System Using Speech Features and Transcriptions

Suraj Tripathi, Abhay Kumar, ... (+3 more)

eess.AS 🏛 arXiv 📚 61 cites 6 years ago

R.I.P. 👻 Ghosted

Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models

Minki Kang, Dongchan Min, Sung Ju Hwang

eess.AS 🏛 ICASSP 📚 61 cites 3 years ago

R.I.P. 👻 Ghosted

Speaker-invariant Affective Representation Learning via Adversarial Training

Haoqi Li, Ming Tu, ... (+3 more)

eess.AS 🏛 ICASSP 📚 60 cites 6 years ago

R.I.P. 👻 Ghosted

DiPCo -- Dinner Party Corpus

Maarten Van Segbroeck, Ahmed Zaid, ... (+8 more)

eess.AS 🏛 Interspeech 📚 60 cites 6 years ago

R.I.P. 👻 Ghosted

The DIRHA-English corpus and related tasks for distant-speech recognition in domestic environments

Mirco Ravanelli, Maurizio Omologo

eess.AS 🏛 ASRU 📚 60 cites 8 years ago

R.I.P. 👻 Ghosted

EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance

Yiwei Guo, Chenpeng Du, ... (+2 more)

eess.AS 🏛 ICASSP 📚 60 cites 3 years ago

R.I.P. 👻 Ghosted

Multi-stage Speaker Extraction with Utterance and Frame-Level Reference Signals

Meng Ge, Chenglin Xu, ... (+4 more)

eess.AS 🏛 ICASSP 📚 58 cites 5 years ago

R.I.P. 👻 Ghosted

Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment

Ethan A. Chi, Julian Salazar, Katrin Kirchhoff

eess.AS 🏛 NAACL 📚 58 cites 5 years ago

R.I.P. 👻 Ghosted

Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition

Qiujia Li, David Qiu, ... (+6 more)

eess.AS 🏛 ICASSP 📚 58 cites 5 years ago

R.I.P. 👻 Ghosted

The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion

Weicheng Cai, Haiwei Wu, ... (+2 more)

eess.AS 🏛 Interspeech 📚 58 cites 6 years ago
