| 401 | End-to-End Sound Source Separation Conditioned On Instrument Labels<br>Olga Slizovskaia, Leo Kim, ... (+2 more) | 👻 Ghosted | cs.SD | 34 | 7 years ago |
| 402 | Learning a Representation for Cover Song Identification Using Convolutional Neural Network<br>Zhesong Yu, Xiaoshuo Xu, ... (+2 more) | 👻 Ghosted | cs.MM | 34 | 6 years ago |
| 403 | Attentive Modality Hopping Mechanism for Speech Emotion Recognition<br>Seunghyun Yoon, Subhadeep Dey, ... (+2 more) | 👻 Ghosted | cs.LG | 34 | 6 years ago |
| 404 | Cooperative Learning via Federated Distillation over Fading Channels<br>Jin-Hyun Ahn, Osvaldo Simeone, Joonhyuk Kang | 👻 Ghosted | eess.SP | 34 | 6 years ago |
| 405 | Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD<br>Jianyu Wang, Hao Liang, Gauri Joshi | 👻 Ghosted | cs.LG | 34 | 6 years ago |
| 406 | Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer<br>Sanyuan Chen, Yu Wu, ... (+4 more) | 👻 Ghosted | cs.SD | 34 | 5 years ago |
| 407 | Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition<br>Wei Zhou, Simon Berger, ... (+2 more) | 👻 Ghosted | cs.CL | 34 | 5 years ago |
| 408 | A data set providing synthetic and real-world fisheye video sequences<br>Andrea Eichenseer, André Kaup | 👻 Ghosted | eess.IV | 34 | 3 years ago |
| 409 | Vision, Deduction and Alignment: An Empirical Study on Multi-modal Knowledge Graph Alignment<br>Yangning Li, Jiaoyan Chen, ... (+4 more) | 👻 Ghosted | cs.AI | 34 | 3 years ago |
| 410 | Extending Whisper with prompt tuning to target-speaker ASR<br>Hao Ma, Zhiyuan Peng, ... (+3 more) | 👻 Ghosted | cs.CL | 34 | 2 years ago |
| 411 | Semi-supervised and Transfer learning approaches for low resource sentiment classification<br>Rahul Gupta, Saurabh Sahu, ... (+2 more) | 👻 Ghosted | cs.IR | 33 | 8 years ago |
| 412 | Towards Pose-invariant Lip-Reading<br>Shiyang Cheng, Pingchuan Ma, ... (+5 more) | 👻 Ghosted | cs.CV | 33 | 6 years ago |
| 413 | Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech<br>Vatsal Aggarwal, Marius Cotescu, ... (+3 more) | 👻 Ghosted | cs.LG | 33 | 6 years ago |
| 414 | Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer<br>Suyoun Kim, Yuan Shangguan, ... (+5 more) | 👻 Ghosted | cs.CL | 33 | 5 years ago |
| 415 | Sign language segmentation with temporal convolutional networks<br>Katrin Renz, Nicolaj C. Stache, ... (+2 more) | 👻 Ghosted | cs.CV | 33 | 5 years ago |
| 416 | REDAT: Accent-Invariant Representation for End-to-End ASR by Domain Adversarial Training with Relabeling<br>Hu Hu, Xuesong Yang, ... (+7 more) | 👻 Ghosted | eess.AS | 33 | 5 years ago |
| 417 | Adaptive Bi-directional Attention: Exploring Multi-Granularity Representations for Machine Reading Comprehension<br>Nuo Chen, Fenglin Liu, ... (+3 more) | 👻 Ghosted | cs.CL | 33 | 5 years ago |
| 418 | Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization<br>Dongmei Wang, Xiong Xiao, ... (+3 more) | 👻 Ghosted | eess.AS | 33 | 3 years ago |
| 419 | Textless Direct Speech-to-Speech Translation with Discrete Speech Representation<br>Xinjian Li, Ye Jia, Chung-Cheng Chiu | 👻 Ghosted | cs.CL | 33 | 3 years ago |
| 420 | Stethoscope-guided Supervised Contrastive Learning for Cross-domain Adaptation on Respiratory Sound Classification<br>June-Woo Kim, Sangmin Bae, ... (+3 more) | 👻 Ghosted | cs.SD | 33 | 2 years ago |
| 421 | MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction<br>Jiajun He, Xiaohan Shi, ... (+2 more) | 👻 Ghosted | cs.CL | 33 | 2 years ago |
| 422 | Multi-centrality Graph Spectral Decompositions and their Application to Cyber Intrusion Detection<br>Pin-Yu Chen, Sutanay Choudhury, Alfred O. Hero | 👻 Ghosted | cs.SI | 32 | 10 years ago |
| 423 | Graph learning under sparsity priors<br>Hermina Petric Maretic, Dorina Thanou, Pascal Frossard | 👻 Ghosted | cs.LG | 32 | 8 years ago |
| 424 | Why Do Neural Dialog Systems Generate Short and Meaningless Replies? A Comparison between Dialog and Translation<br>Bolin Wei, Shuai Lu, ... (+5 more) | 👻 Ghosted | cs.CL | 32 | 8 years ago |
| 425 | Deep factorization for speech signal<br>Lantian Li, Dong Wang, ... (+4 more) | 👻 Ghosted | eess.AS | 32 | 8 years ago |
| 426 | Encrypted Speech Recognition using Deep Polynomial Networks<br>Shi-Xiong Zhang, Yifan Gong, Dong Yu | 👻 Ghosted | cs.CR | 32 | 7 years ago |
| 427 | Accurate and Scalable Version Identification Using Musically-Motivated Embeddings<br>Furkan Yesiler, Joan Serrà, Emilia Gómez | 👻 Ghosted | cs.SD | 32 | 6 years ago |
| 428 | A Simple but Effective BERT Model for Dialog State Tracking on Resource-Limited Systems<br>Tuan Manh Lai, Quan Hung Tran, ... (+2 more) | 👻 Ghosted | cs.CL | 32 | 6 years ago |
| 429 | What does a network layer hear? Analyzing hidden representations of end-to-end ASR through speech synthesis<br>Chung-Yi Li, Pei-Chieh Yuan, Hung-Yi Lee | 👻 Ghosted | cs.CL | 32 | 6 years ago |
| 430 | Single channel voice separation for unknown number of speakers under reverberant and noisy settings<br>Shlomo E. Chazan, Lior Wolf, ... (+2 more) | 👻 Ghosted | cs.SD | 32 | 5 years ago |
| 431 | CIF-based Collaborative Decoding for End-to-end Contextual Speech Recognition<br>Minglun Han, Linhao Dong, ... (+2 more) | 👻 Ghosted | cs.CL | 32 | 5 years ago |
| 432 | An Embarrassingly Simple Model for Dialogue Relation Extraction<br>Fuzhao Xue, Aixin Sun, ... (+3 more) | 👻 Ghosted | cs.CL | 32 | 5 years ago |
| 433 | ShaDocNet: Learning Spatial-Aware Tokens in Transformer for Document Shadow Removal<br>Xuhang Chen, Xiaodong Cun, ... (+2 more) | 👻 Ghosted | cs.CV | 32 | 3 years ago |
| 434 | From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition<br>Chao-Han Huck Yang, Bo Li, ... (+5 more) | 👻 Ghosted | cs.SD | 32 | 3 years ago |
| 435 | LOGO-Former: Local-Global Spatio-Temporal Transformer for Dynamic Facial Expression Recognition<br>Fuyan Ma, Bin Sun, Shutao Li | 👻 Ghosted | cs.CV | 32 | 3 years ago |
| 436 | VoiceLDM: Text-to-Speech with Environmental Context<br>Yeonghyeon Lee, Inmo Yeon, ... (+2 more) | 👻 Ghosted | eess.AS | 32 | 2 years ago |
| 437 | BER Analysis of the box relaxation for BPSK Signal Recovery<br>Christos Thrampoulidis, Ehsan Abbasi, ... (+2 more) | 👻 Ghosted | cs.IT | 31 | 10 years ago |
| 438 | Dialog Context Language Modeling with Recurrent Neural Networks<br>Bing Liu, Ian Lane | 👻 Ghosted | cs.CL | 31 | 9 years ago |
| 439 | Towards Unsupervised Single-Channel Blind Source Separation using Adversarial Pair Unmix-and-Remix<br>Yedid Hoshen | 👻 Ghosted | eess.SP | 31 | 7 years ago |
| 440 | Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality<br>Aakanksha Rana, Cagri Ozcinar, Aljoscha Smolic | 👻 Ghosted | cs.SD | 31 | 6 years ago |
| 441 | CNN-based Analog CSI Feedback in FDD MIMO-OFDM Systems<br>Mahdi Boloursaz Mashhadi, Qianqian Yang, Deniz Gunduz | 👻 Ghosted | cs.IT | 31 | 6 years ago |
| 442 | A Further Study of Unsupervised Pre-training for Transformer Based Speech Recognition<br>Dongwei Jiang, Wubo Li, ... (+6 more) | 👻 Ghosted | eess.AS | 31 | 6 years ago |
| 443 | Occluded Person Re-Identification via Relational Adaptive Feature Correction Learning<br>Minjung Kim, MyeongAh Cho, ... (+3 more) | 👻 Ghosted | cs.CV | 31 | 3 years ago |
| 444 | One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition<br>Samuele Cornell, Jee-weon Jung, ... (+2 more) | 👻 Ghosted | eess.AS | 31 | 2 years ago |
| 445 | SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis<br>Marco Comunità, Riccardo F. Gramaccioni, ... (+4 more) | 👻 Ghosted | cs.SD | 31 | 2 years ago |
| 446 | SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention<br>Junjie Li, Yiwei Guo, ... (+2 more) | 👻 Ghosted | cs.SD | 31 | 2 years ago |
| 447 | Self-Supervised Learning for Anomalous Sound Detection<br>Kevin Wilkinghoff | 👻 Ghosted | eess.AS | 31 | 2 years ago |
| 448 | Fine-grained Disentangled Representation Learning for Multimodal Emotion Recognition<br>Haoqin Sun, Shiwan Zhao, ... (+4 more) | 👻 Ghosted | cs.SD | 31 | 2 years ago |
| 449 | Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue<br>Guan-Ting Lin, Prashanth Gurunath Shivakumar, ... (+7 more) | 👻 Ghosted | cs.CL | 31 | 2 years ago |
| 450 | Single stream parallelization of generalized LSTM-like RNNs on a GPU<br>Kyuyeon Hwang, Wonyong Sung | 👻 Ghosted | cs.NE | 30 | 11 years ago |