⚰️ Sound

R.I.P. 👻 Ghosted

TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty

Xingchen Song, Di Wu, ... (+7 more)

cs.SD 🏛 ICASSP 📚 13 cites 3 years ago

R.I.P. 👻 Ghosted

Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism

Kun Wei, Pengcheng Guo, Ning Jiang

cs.SD 🏛 Interspeech 📚 13 cites 3 years ago

R.I.P. 👻 Ghosted

Real-time Speech Emotion Recognition Based on Syllable-Level Feature Extraction

Abdul Rehman, Zhen-Tao Liu, ... (+3 more)

cs.SD 🏛 arXiv 📚 13 cites 4 years ago

R.I.P. 👻 Ghosted

Parameter-Efficient Learning for Text-to-Speech Accent Adaptation

Li-Jen Yang, Chao-Han Huck Yang, Jen-Tzung Chien

cs.SD 🏛 Interspeech 📚 12 cites 3 years ago

R.I.P. 👻 Ghosted

Seeing Sound, Hearing Sight: Uncovering Modality Bias and Conflict of AI models in Sound Localization

Yanhao Jia, Ji Xie, ... (+4 more)

cs.SD 🏛 arXiv 📚 12 cites 1 year ago

R.I.P. 👻 Ghosted

Automatic source localization and spectra generation from sparse beamforming maps

Armin Goudarzi, Carsten Spehr, Steffen Herbold

cs.SD 🏛 J.ASA 📚 12 cites 5 years ago

R.I.P. 👻 Ghosted

Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training

Haohan Guo, Heng Lu, ... (+6 more)

cs.SD 🏛 arXiv 📚 12 cites 5 years ago

R.I.P. 👻 Ghosted

WaveNODE: A Continuous Normalizing Flow for Speech Synthesis

Hyeongju Kim, Hyeonseung Lee, ... (+4 more)

cs.SD 🏛 arXiv 📚 12 cites 6 years ago

R.I.P. 👻 Ghosted

Machine learning for the recognition of emotion in the speech of couples in psychotherapy using the Stanford Suppes Brain Lab Psychotherapy Dataset

Colleen E. Crangle, Rui Wang, ... (+4 more)

cs.SD 🏛 arXiv 📚 12 cites 7 years ago

R.I.P. 👻 Ghosted

Weakly Supervised Training of Speaker Identification Models

Martin Karu, Tanel Alumäe

cs.SD 🏛 The Speaker and Language Recognition Workshop 📚 12 cites 7 years ago

R.I.P. 👻 Ghosted

A Convex Approximation of the Relaxed Binaural Beamforming Optimization Problem

Andreas I. Koutrouvelis, Richard C. Hendriks, ... (+2 more)

cs.SD 🏛 IEEE/ACM TASLP 📚 12 cites 8 years ago

R.I.P. 👻 Ghosted

Deep Learning of Human Perception in Audio Event Classification

Yi Yu, Samuel Beuret, ... (+2 more)

cs.SD 🏛 ICM 📚 12 cites 7 years ago

R.I.P. 👻 Ghosted

On Residual CNN in text-dependent speaker verification task

Egor Malykh, Sergey Novoselov, Oleg Kudashev

cs.SD 🏛 ICSC 📚 12 cites 9 years ago

R.I.P. 👻 Ghosted

OBTAIN: Real-Time Beat Tracking in Audio Signals

Ali Mottaghi, Kayhan Behdin, ... (+3 more)

cs.SD 🏛 arXiv 📚 12 cites 9 years ago

R.I.P. 👻 Ghosted

Melody Generation for Pop Music via Word Representation of Musical Properties

Andrew Shin, Leopold Crestel, ... (+7 more)

cs.SD 🏛 arXiv 📚 12 cites 8 years ago

R.I.P. 👻 Ghosted

PCA/LDA Approach for Text-Independent Speaker Recognition

Zhenhao Ge, Sudhendu R. Sharma, Mark J. T. Smith

cs.SD 🏛 Defense + Commercial Sensing 📚 12 cites 10 years ago

R.I.P. 👻 Ghosted

Max-margin Metric Learning for Speaker Recognition

Lantian Li, Dong Wang, ... (+2 more)

cs.SD 🏛 ISCSLP 📚 12 cites 10 years ago

R.I.P. 👻 Ghosted

Speech Separation with Pretrained Frontend to Minimize Domain Mismatch

Wupeng Wang, Zexu Pan, ... (+3 more)

cs.SD 🏛 IEEE/ACM TASLP 📚 12 cites 1 year ago

R.I.P. 💀 404 Not Found

Contextual Cross-Modal Attention for Audio-Visual Deepfake Detection and Localization

Vinaya Sree Katamneni, Ajita Rattani

cs.SD 🏛 2024 IEEE International Joint Conference on Biometrics (IJCB) 📚 12 cites 1 year ago

R.I.P. 👻 Ghosted

Zero-Shot Fake Video Detection by Audio-Visual Consistency

Xiaolou Li, Zehua Liu, ... (+4 more)

cs.SD 🏛 Interspeech 📚 12 cites 2 years ago

R.I.P. 👻 Ghosted

Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual Target Speech Extraction

Zhaoxi Mu, Xinyu Yang

cs.SD 🏛 IJCAI 📚 12 cites 2 years ago

R.I.P. 👻 Ghosted

Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction

Zhaoxi Mu, Xinyu Yang, ... (+2 more)

cs.SD 🏛 AAAI 📚 12 cites 2 years ago

R.I.P. 👻 Ghosted

StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis

Xueyuan Chen, Xi Wang, ... (+5 more)

cs.SD 🏛 ICASSP 📚 12 cites 2 years ago

R.I.P. 👻 Ghosted

Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation

Kun Wei, Bei Li, ... (+4 more)

cs.SD 🏛 IEEE/ACM TASLP 📚 12 cites 2 years ago

🏛️ The Sound Crypt