| 51 | To BERT or Not To BERT: Comparing Speech and Language-based Approaches for Alzheimer's Disease Detection | Aparna Balagopalan, Benjamin Eyre, ... (+2 more) | 👻 Ghosted | cs.CL | 145 | 5 years ago |
| 52 | Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder | Kei Akuzawa, Yusuke Iwasawa, Yutaka Matsuo | 👻 Ghosted | cs.CL | 144 | 8 years ago |
| 53 | On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition | Jinyu Li, Yu Wu, ... (+4 more) | 👻 Ghosted | eess.AS | 142 | 6 years ago |
| 54 | Large-Scale Domain Adaptation via Teacher-Student Learning | Jinyu Li, Michael L. Seltzer, ... (+3 more) | 👻 Ghosted | cs.CL | 140 | 8 years ago |
| 55 | Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations | Ju-chieh Chou, Cheng-chieh Yeh, ... (+2 more) | 👻 Ghosted | eess.AS | 137 | 8 years ago |
| 56 | Deep Lip Reading: a comparison of models and an online application | Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman | 👻 Ghosted | cs.CV | 135 | 8 years ago |
| 57 | Towards Zero-Shot Frame Semantic Parsing for Domain Scaling | Ankur Bapna, Gokhan Tur, ... (+2 more) | 👻 Ghosted | cs.AI | 134 | 8 years ago |
| 58 | The IBM 2015 English Conversational Telephone Speech Recognition System | George Saon, Hong-Kwang J. Kuo, ... (+2 more) | 👻 Ghosted | cs.CL | 132 | 11 years ago |
| 59 | Automatic Dialect Detection in Arabic Broadcast Speech | Ahmed Ali, Najim Dehak, ... (+6 more) | 👻 Ghosted | cs.CL | 132 | 10 years ago |
| 60 | End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction | Zhong-Qiu Wang, Jonathan Le Roux, ... (+2 more) | 👻 Ghosted | cs.SD | 132 | 8 years ago |
| 61 | Robust Audio Event Recognition with 1-Max Pooling Convolutional Neural Networks | Huy Phan, Lars Hertel, ... (+2 more) | 👻 Ghosted | cs.NE | 128 | 10 years ago |
| 62 | Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese | Shiyu Zhou, Linhao Dong, ... (+2 more) | 👻 Ghosted | eess.AS | 127 | 8 years ago |
| 63 | Disfluency Detection using a Bidirectional LSTM | Vicky Zayats, Mari Ostendorf, Hannaneh Hajishirzi | 👻 Ghosted | cs.CL | 126 | 10 years ago |
| 64 | A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems | Layla El Asri, Jing He, Kaheer Suleman | 👻 Ghosted | cs.CL | 126 | 9 years ago |
| 65 | Personalizing ASR for Dysarthric and Accented Speech with Limited Data | Joel Shor, Dotan Emanuel, ... (+10 more) | 👻 Ghosted | cs.CL | 126 | 6 years ago |
| 66 | Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge | Benjamin van Niekerk, Leanne Nortje, Herman Kamper | 👻 Ghosted | eess.AS | 126 | 6 years ago |
| 67 | The Zero Resource Speech Challenge 2019: TTS without T | Ewan Dunbar, Robin Algayres, ... (+11 more) | 👻 Ghosted | cs.CL | 124 | 7 years ago |
| 68 | Vector-Quantized Autoregressive Predictive Coding | Yu-An Chung, Hao Tang, James Glass | 👻 Ghosted | eess.AS | 124 | 6 years ago |
| 69 | Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition | Shubham Toshniwal, Hao Tang, ... (+2 more) | 👻 Ghosted | cs.CL | 123 | 9 years ago |
| 70 | Progressive Neural Networks for Transfer Learning in Emotion Recognition | John Gideon, Soheil Khorram, ... (+3 more) | 👻 Ghosted | cs.LG | 123 | 9 years ago |
| 71 | Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices | Heiga Zen, Yannis Agiomyrgiannakis, ... (+3 more) | 👻 Ghosted | cs.SD | 122 | 9 years ago |
| 72 | Contextual RNN-T For Open Domain ASR | Mahaveer Jain, Gil Keren, ... (+4 more) | 👻 Ghosted | eess.AS | 121 | 6 years ago |
| 73 | MultiSpeech: Multi-Speaker Text to Speech with Transformer | Mingjian Chen, Xu Tan, ... (+6 more) | 👻 Ghosted | eess.AS | 119 | 6 years ago |
| 74 | Transfer Learning for Improving Speech Emotion Classification Accuracy | Siddique Latif, Rajib Rana, ... (+3 more) | 👻 Ghosted | cs.CV | 118 | 8 years ago |
| 75 | Speaker anonymisation using the McAdams coefficient | Jose Patino, Natalia Tomashenko, ... (+3 more) | 👻 Ghosted | eess.AS | 118 | 5 years ago |
| 76 | Joint Speech Recognition and Speaker Diarization via Sequence Transduction | Laurent El Shafey, Hagen Soltau, Izhak Shafran | 👻 Ghosted | cs.CL | 117 | 6 years ago |
| 77 | Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition | Shamane Siriwardhana, Andrew Reis, ... (+2 more) | 👻 Ghosted | eess.AS | 117 | 5 years ago |
| 78 | XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System | Peiling Lu, Jie Wu, ... (+3 more) | 👻 Ghosted | eess.AS | 113 | 6 years ago |
| 79 | Attention-based End-to-End Models for Small-Footprint Keyword Spotting | Changhao Shan, Junbo Zhang, ... (+2 more) | 👻 Ghosted | cs.SD | 112 | 8 years ago |
| 80 | Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability | Jinyu Li, Rui Zhao, ... (+9 more) | 👻 Ghosted | eess.AS | 112 | 5 years ago |
| 81 | BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer | Guan-Lin Chao, Ian Lane | 👻 Ghosted | cs.CL | 110 | 6 years ago |
| 82 | Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining | Wen-Chin Huang, Tomoki Hayashi, ... (+3 more) | 👻 Ghosted | eess.AS | 109 | 6 years ago |
| 83 | The IBM 2016 English Conversational Telephone Speech Recognition System | George Saon, Tom Sercu, ... (+2 more) | 👻 Ghosted | cs.CL | 107 | 10 years ago |
| 84 | Rethinking Evaluation in ASR: Are Our Models Robust Enough? | Tatiana Likhomanenko, Qiantong Xu, ... (+6 more) | 👻 Ghosted | cs.LG | 106 | 5 years ago |
| 85 | Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions | Awni Hannun, Ann Lee, ... (+2 more) | 👻 Ghosted | cs.CL | 105 | 7 years ago |
| 86 | Variational Autoencoders for Learning Latent Representations of Speech Emotion: A Preliminary Study | Siddique Latif, Rajib Rana, ... (+2 more) | 👻 Ghosted | cs.SD | 104 | 8 years ago |
| 87 | Recognizing Multi-talker Speech with Permutation Invariant Training | Dong Yu, Xuankai Chang, Yanmin Qian | 👻 Ghosted | cs.SD | 101 | 9 years ago |
| 88 | A Deterministic plus Stochastic Model of the Residual Signal for Improved Parametric Speech Synthesis | Thomas Drugman, Geoffrey Wilfart, Thierry Dutoit | 👻 Ghosted | cs.SD | 101 | 6 years ago |
| 89 | A Neural Parametric Singing Synthesizer | Merlijn Blaauw, Jordi Bonada | 👻 Ghosted | cs.SD | 99 | 9 years ago |
| 90 | An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented Dialog | Bing Liu, Ian Lane | 👻 Ghosted | cs.CL | 99 | 8 years ago |
| 91 | On the End-to-End Solution to Mandarin-English Code-switching Speech Recognition | Zhiping Zeng, Yerbolat Khassanov, ... (+4 more) | 👻 Ghosted | cs.CL | 97 | 7 years ago |
| 92 | End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning | Tao Tu, Yuan-Jui Chen, ... (+2 more) | 👻 Ghosted | cs.CL | 95 | 7 years ago |
| 93 | Learning Speaker Representations with Mutual Information | Mirco Ravanelli, Yoshua Bengio | 👻 Ghosted | eess.AS | 94 | 7 years ago |
| 94 | Multi-modal Attention for Speech Emotion Recognition | Zexu Pan, Zhaojie Luo, ... (+2 more) | 👻 Ghosted | eess.AS | 93 | 5 years ago |
| 95 | Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies | Alexander H. Liu, Yu-An Chung, James Glass | 👻 Ghosted | cs.CL | 93 | 5 years ago |
| 96 | Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers | Naoyuki Kanda, Yashesh Gaur, ... (+5 more) | 👻 Ghosted | eess.AS | 92 | 5 years ago |
| 97 | Towards Speech Emotion Recognition "in the wild" using Aggregated Corpora and Deep Multi-Task Learning | Jaebok Kim, Gwenn Englebienne, ... (+2 more) | 👻 Ghosted | cs.CL | 91 | 8 years ago |
| 98 | Speech recognition for medical conversations | Chung-Cheng Chiu, Anshuman Tripathi, ... (+12 more) | 👻 Ghosted | cs.CL | 91 | 8 years ago |
| 99 | PoCoNet: Better Speech Enhancement with Frequency-Positional Embeddings, Semi-Supervised Conversational Data, and Biased Loss | Umut Isik, Ritwik Giri, ... (+4 more) | 👻 Ghosted | eess.AS | 91 | 5 years ago |
| 100 | Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings | Shane Settle, Keith Levin, ... (+2 more) | 👻 Ghosted | cs.CL | 89 | 9 years ago |