| 201 | SpeedySpeech: Efficient Neural Speech Synthesis<br>Jan Vainer, Ondřej Dušek | 👻 Ghosted | eess.AS | 49 | 5 years ago |
| 202 | A New GAN-based End-to-End TTS Training Algorithm<br>Haohan Guo, Frank K. Soong, ... (+2 more) | 👻 Ghosted | cs.CL | 49 | 7 years ago |
| 203 | Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis<br>Bajibabu Bollepalli, Lauri Juvela, Paavo Alku | 👻 Ghosted | eess.AS | 49 | 7 years ago |
| 204 | An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling<br>Bi-Cheng Yan, Meng-Che Wu, ... (+2 more) | 👻 Ghosted | eess.AS | 48 | 5 years ago |
| 205 | Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation<br>Gakuto Kurata, Kartik Audhkhasi | 👻 Ghosted | cs.CL | 48 | 7 years ago |
| 206 | An Online Attention-based Model for Speech Recognition<br>Ruchao Fan, Pan Zhou, ... (+3 more) | 👻 Ghosted | cs.CL | 48 | 7 years ago |
| 207 | Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition<br>Pengcheng Guo, Haihua Xu, ... (+2 more) | 👻 Ghosted | cs.CL | 46 | 7 years ago |
| 208 | Structured-based Curriculum Learning for End-to-end English-Japanese Speech Translation<br>Takatomo Kano, Sakriani Sakti, Satoshi Nakamura | 👻 Ghosted | cs.CL | 46 | 8 years ago |
| 209 | Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities<br>Pranav Dheram, Murugesan Ramakrishnan, ... (+7 more) | 👻 Ghosted | cs.CL | 46 | 3 years ago |
| 210 | Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR<br>Thilo von Neumann, Christoph Boeddeker, ... (+5 more) | 👻 Ghosted | eess.AS | 45 | 5 years ago |
| 211 | Exploring Transformers for Large-Scale Speech Recognition<br>Liang Lu, Changliang Liu, ... (+2 more) | 👻 Ghosted | eess.AS | 45 | 5 years ago |
| 212 | Language learning using Speech to Image retrieval<br>Danny Merkx, Stefan L. Frank, Mirjam Ernestus | 👻 Ghosted | cs.CL | 45 | 6 years ago |
| 213 | Disfluencies and Human Speech Transcription Errors<br>Vicky Zayats, Trang Tran, ... (+3 more) | 👻 Ghosted | cs.CL | 45 | 7 years ago |
| 214 | Improved training for online end-to-end speech recognition systems<br>Suyoun Kim, Michael L. Seltzer, ... (+2 more) | 👻 Ghosted | cs.CL | 45 | 8 years ago |
| 215 | Multi-level Fusion of Wav2vec 2.0 and BERT for Multimodal Emotion Recognition<br>Zihan Zhao, Yanfeng Wang, Yu Wang | 👻 Ghosted | cs.CL | 45 | 3 years ago |
| 216 | Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces<br>Milind Rao, Anirudh Raju, ... (+3 more) | 👻 Ghosted | cs.CL | 44 | 5 years ago |
| 217 | Improving Speaker-Independent Lipreading with Domain-Adversarial Training<br>Michael Wand, Juergen Schmidhuber | 👻 Ghosted | cs.CV | 44 | 8 years ago |
| 218 | Comparison of Decoding Strategies for CTC Acoustic Models<br>Thomas Zenkel, Ramon Sanabria, ... (+5 more) | 👻 Ghosted | cs.CL | 44 | 8 years ago |
| 219 | Advances in Very Deep Convolutional Neural Networks for LVCSR<br>Tom Sercu, Vaibhava Goel | 👻 Ghosted | cs.CL | 44 | 10 years ago |
| 220 | Super-Human Performance in Online Low-latency Recognition of Conversational Speech<br>Thai-Son Nguyen, Sebastian Stueker, Alex Waibel | 👻 Ghosted | cs.CV | 43 | 5 years ago |
| 221 | Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition<br>Ye Bai, Jiangyan Yi, ... (+4 more) | 👻 Ghosted | eess.AS | 43 | 5 years ago |
| 222 | DNN driven Speaker Independent Audio-Visual Mask Estimation for Speech Separation<br>Mandar Gogate, Ahsan Adeel, ... (+3 more) | 👻 Ghosted | cs.SD | 43 | 7 years ago |
| 223 | Contextualized Attention-based Knowledge Transfer for Spoken Conversational Question Answering<br>Chenyu You, Nuo Chen, Yuexian Zou | 👻 Ghosted | cs.CL | 42 | 5 years ago |
| 224 | Self-Supervised Representations Improve End-to-End Speech Translation<br>Anne Wu, Changhan Wang, ... (+2 more) | 👻 Ghosted | eess.AS | 42 | 5 years ago |
| 225 | Transformer VQ-VAE for Unsupervised Unit Discovery and Speech Synthesis: ZeroSpeech 2020 Challenge<br>Andros Tjandra, Sakriani Sakti, Satoshi Nakamura | 👻 Ghosted | cs.CL | 42 | 5 years ago |
| 226 | Relative Positional Encoding for Speech Recognition and Direct Translation<br>Ngoc-Quan Pham, Thanh-Le Ha, ... (+6 more) | 👻 Ghosted | eess.AS | 41 | 5 years ago |
| 227 | Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition<br>Yonatan Belinkov, Ahmed Ali, James Glass | 👻 Ghosted | cs.CL | 41 | 6 years ago |
| 228 | ReMASC: Realistic Replay Attack Corpus for Voice Controlled Systems<br>Yuan Gong, Jian Yang, ... (+3 more) | 🌅 Old Age | cs.CR | 41 | 7 years ago |
| 229 | Conditional End-to-End Audio Transforms<br>Albert Haque, Michelle Guo, Prateek Verma | 👻 Ghosted | cs.SD | 41 | 8 years ago |
| 230 | Capturing Long-term Temporal Dependencies with Convolutional Networks for Continuous Emotion Recognition<br>Soheil Khorram, Zakaria Aldeneh, ... (+3 more) | 👻 Ghosted | cs.SD | 41 | 8 years ago |
| 231 | Embedding-Based Speaker Adaptive Training of Deep Neural Networks<br>Xiaodong Cui, Vaibhava Goel, George Saon | 👻 Ghosted | cs.CL | 41 | 8 years ago |
| 232 | Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition<br>Chao Weng, Chengzhu Yu, ... (+3 more) | 👻 Ghosted | cs.CL | 40 | 6 years ago |
| 233 | Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis<br>Noé Tits, Fengna Wang, ... (+3 more) | 👻 Ghosted | cs.CL | 40 | 7 years ago |
| 234 | Gate Activation Signal Analysis for Gated Recurrent Neural Networks and Its Correlation with Phoneme Boundaries<br>Yu-Hsuan Wang, Cheng-Tao Chung, Hung-yi Lee | 🌅 Old Age | cs.SD | 40 | 9 years ago |
| 235 | Speaker Recognition for Children's Speech<br>Saeid Safavi, Maryam Najafian, ... (+4 more) | 👻 Ghosted | cs.SD | 40 | 9 years ago |
| 236 | PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR<br>Yiwen Shao, Yiming Wang, ... (+2 more) | 👻 Ghosted | eess.AS | 39 | 5 years ago |
| 237 | JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment<br>Dan Lim, Won Jang, ... (+4 more) | 👻 Ghosted | eess.AS | 39 | 5 years ago |
| 238 | Deep speech inpainting of time-frequency masks<br>Mikolaj Kegler, Pierre Beckmann, Milos Cernak | 👻 Ghosted | cs.SD | 39 | 6 years ago |
| 239 | Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition<br>Ye Bai, Jiangyan Yi, ... (+3 more) | 👻 Ghosted | eess.AS | 39 | 6 years ago |
| 240 | Investigating Speech Features for Continuous Turn-Taking Prediction Using LSTMs<br>Matthew Roddy, Gabriel Skantze, Naomi Harte | 👻 Ghosted | cs.CL | 39 | 7 years ago |
| 241 | Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition<br>Ke Wang, Junbo Zhang, ... (+4 more) | 👻 Ghosted | cs.SD | 39 | 8 years ago |
| 242 | Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks<br>Herman Kamper, Benjamin van Niekerk | 👻 Ghosted | cs.CL | 38 | 5 years ago |
| 243 | Speaker Adaptation for Attention-Based End-to-End Speech Recognition<br>Zhong Meng, Yashesh Gaur, ... (+2 more) | 👻 Ghosted | cs.CL | 38 | 6 years ago |
| 244 | Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition<br>Naoyuki Kanda, Shota Horiguchi, ... (+4 more) | 👻 Ghosted | cs.CL | 38 | 6 years ago |
| 245 | Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech<br>Tobias Menne, Ilya Sklyar, ... (+2 more) | 👻 Ghosted | cs.SD | 38 | 6 years ago |
| 246 | Dialogue Session Segmentation by Embedding-Enhanced TextTiling<br>Yiping Song, Lili Mou, ... (+5 more) | 👻 Ghosted | cs.CL | 38 | 9 years ago |
| 247 | Multi-Modal Data Augmentation for End-to-End ASR<br>Adithya Renduchintala, Shuoyang Ding, ... (+2 more) | 👻 Ghosted | cs.CL | 37 | 8 years ago |
| 248 | Enhancing Monotonic Multihead Attention for Streaming ASR<br>Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara | 👻 Ghosted | eess.AS | 36 | 5 years ago |
| 249 | Lite Audio-Visual Speech Enhancement<br>Shang-Yi Chuang, Yu Tsao, ... (+2 more) | 👻 Ghosted | eess.AS | 35 | 5 years ago |
| 250 | Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency<br>Matt Whitehill, Shuang Ma, ... (+2 more) | 👻 Ghosted | cs.LG | 35 | 6 years ago |