| 1 | PARSA-Bench: A Comprehensive Persian Audio-Language Model Benchmark | Mohammad Javad Ranjbar Kalahroodi, Mohammad Amini, ... (+3 more) | cs.CL | 0 | 1 month ago |
| 2 | Controllable Accent Normalization via Discrete Diffusion | Qibing Bai, Yuhan Du, ... (+4 more) | eess.AS | 0 | 1 month ago |
| 3 | What Counts as Real? Speech Restoration and Voice Quality Conversion Pose New Challenges to Deepfake Detection | Shree Harsha Bokkahalli Satish, Harm Lameris, ... (+2 more) | cs.SD | 0 | 1 month ago |
| 4 | LLM-Guided Reinforcement Learning for Audio-Visual Speech Enhancement | Chih-Ning Chen, Jen-Cheng Hou, ... (+4 more) | cs.SD | 0 | 1 month ago |
| 5 | Causal Tracing of Audio-Text Fusion in Large Audio Language Models | Wei-Chih Chen, Chien-yu Huang, Hung-yi Lee | cs.SD | 0 | 1 month ago |
| 6 | VoXtream2: Full-stream TTS with dynamic speaking rate control | Nikita Torgashov, Gustav Eje Henter, Gabriel Skantze | eess.AS | 0 | 1 month ago |
| 7 | Mask2Flow-TSE: Two-Stage Target Speaker Extraction with Masking and Flow Matching | Junwon Moon, Hyunjin Choi, ... (+3 more) | cs.SD | 0 | 1 month ago |
| 8 | Self-Supervised Speech Models Encode Phonetic Context via Position-dependent Orthogonal Subspaces | Kwanghee Choi, Eunjung Yeo, ... (+3 more) | eess.AS | 0 | 1 month ago |
| 9 | Resurfacing Paralinguistic Awareness in Large Audio Language Models | Hao Yang, Minghan Wang, ... (+4 more) | cs.SD | 0 | 1 month ago |
| 10 | Affect Decoding in Phonated and Silent Speech Production from Surface EMG | Simon Pistrosch, Kleanthis Avramidis, ... (+5 more) | eess.AS | 0 | 1 month ago |
| 11 | Speak or Stay Silent: Context-Aware Turn-Taking in Multi-Party Dialogue | Kratika Bhagtani, Mrinal Anand, ... (+2 more) | cs.AI | 0 | 1 month ago |
| 12 | Uni-ASR: Unified LLM-Based Architecture for Non-Streaming and Streaming Automatic Speech Recognition | Yinfeng Xia, Jian Tang, ... (+3 more) | cs.SD | 0 | 1 month ago |
| 13 | Probabilistic Verification of Voice Anti-Spoofing Models | Evgeny Kushnir, Alexandr Kozodaev, ... (+4 more) | cs.SD | 0 | 1 month ago |
| 14 | AlphaFlowTSE: One-Step Generative Target Speaker Extraction via Conditional AlphaFlow | Duojia Li, Shuhan Zhang, ... (+6 more) | cs.SD | 0 | 1 month ago |
| 15 | G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition | Jing Peng, Ziyi Chen, ... (+8 more) | eess.AS | 0 | 1 month ago |
| 16 | Calibration-Reasoning Framework for Descriptive Speech Quality Assessment | Elizaveta Kostenok, Mathieu Salzmann, Milos Cernak | eess.AS | 0 | 1 month ago |
| 17 | How Contrastive Decoding Enhances Large Audio Language Models? | Tzu-Quan Lin, Wei-Ping Huang, ... (+2 more) | cs.SD | 0 | 1 month ago |
| 18 | DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization | Jianing Yang, Yusuke Fujita, Yui Sudo | cs.CL | 0 | 1 month ago |
| 19 | VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs | Hezhao Zhang, Huang-Cheng Chou, ... (+2 more) | cs.SD | 0 | 1 month ago |
| 20 | Benchmarking Language Modeling for Lossless Compression of Full-Fidelity Audio | Phillip Long, Zachary Novack, Chris Donahue | cs.SD | 0 | 1 month ago |
| 21 | Bootstrapping Audiovisual Speech Recognition in Zero-AV-Resource Scenarios with Synthetic Visual Data | Pol Buitrago, Pol Gàlvez, ... (+2 more) | eess.AS | 0 | 1 month ago |
| 22 | Quantifying Cross-Lingual Transfer in Paralinguistic Speech Tasks | Pol Buitrago, Oriol Pareras, ... (+2 more) | eess.AS | 0 | 1 month ago |
| 23 | DualTurn: Learning Turn-Taking from Dual-Channel Generative Speech Pretraining | Shangeth Rajaa | eess.AS | 0 | 1 month ago |
| 24 | Evolution Strategy-Based Calibration for Low-Bit Quantization of Speech Models | Lucas Rakotoarivony | cs.SD | 0 | 1 month ago |
| 25 | SoundWeaver: Semantic Warm-Starting for Text-to-Audio Diffusion Serving | Ayush Barik, Sofia Stoica, ... (+5 more) | cs.SD | 0 | 1 month ago |
| 26 | Adapting Self-Supervised Speech Representations for Cross-lingual Dysarthria Detection in Parkinson's Disease | Abner Hernandez, Eunjung Yeo, ... (+13 more) | cs.CL | 0 | 29 days ago |
| 27 | Disentangling Speaker Traits for Deepfake Source Verification via Chebyshev Polynomial and Riemannian Metric Learning | Xi Xuan, Wenxin Zhang, ... (+4 more) | eess.AS | 0 | 29 days ago |
| 28 | TaigiSpeech: A Low-Resource Real-World Speech Intent Dataset and Preliminary Results with Scalable Data Mining In-the-Wild | Kai-Wei Chang, Yi-Cheng Lin, ... (+10 more) | cs.CL | 0 | 29 days ago |
| 29 | ALICE: A Multifaceted Evaluation Framework of Large Audio-Language Models' In-Context Learning Ability | Yen-Ting Piao, Jay Chiehen Liao, ... (+6 more) | cs.SD | 0 | 1 month ago |
| 30 | Plug-and-Steer: Decoupling Separation and Selection in Audio-Visual Target Speaker Extraction | Doyeop Kwak, Suyeon Lee, Joon Son Chung | eess.AS | 0 | 1 month ago |
| 31 | CAF-Score: Calibrating CLAP with LALMs for Reference-free Audio Captioning Evaluation | Insung Lee, Taeyoung Jeong, ... (+3 more) | cs.SD | 0 | 1 month ago |
| 32 | On Optimizing Multimodal Jailbreaks for Spoken Language Models | Aravind Krishnan, Karolina Stańczak, Dietrich Klakow | cs.LG | 0 | 1 month ago |
| 33 | Voice Privacy from an Attribute-based Perspective | Mehtab Ur Rahman, Martha Larson, Cristian Tejedor García | cs.SD | 0 | 1 month ago |
| 34 | DiscoPhon: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units | Maxime Poli, Manel Khentout, ... (+4 more) | cs.CL | 0 | 1 month ago |
| 35 | Collecting Prosody in the Wild: A Content-Controlled, Privacy-First Smartphone Protocol and Empirical Evaluation | Timo K. Koch, Florian Bemmann, ... (+3 more) | cs.HC | 0 | 1 month ago |
| 36 | RECOVER: Robust Entity Correction via agentic Orchestration of hypothesis Variants for Evidence-based Recovery | Abhishek Kumar, Aashraya Sachdeva | cs.CL | 0 | 1 month ago |
| 37 | PyPhonPlan: Simulating phonetic planning with dynamic neural fields and task dynamics | Sam Kirkham | cs.CL | 0 | 1 month ago |
| 38 | Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech | Jaesung Bae, Xiuwen Zheng, ... (+3 more) | eess.AS | 0 | 1 month ago |
| 39 | Regularized Entropy Information Adaptation with Temporal-Awareness Networks for Simultaneous Speech Translation | Joseph Liu, Nameer Hirschkind, ... (+2 more) | cs.LG | 0 | 11 days ago |
| 40 | VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech | Yi-Cheng Lin, Yusuke Hirota, ... (+2 more) | eess.AS | 0 | 2 days ago |
| 41 | From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench | Ke Xu, Yuhao Wang, Yu Wang | cs.AI | 0 | 5 days ago |
| 42 | Towards Fine-grained Temporal Perception: Post-Training Large Audio-Language Models with Audio-Side Time Prompt | Yanfeng Shi, Pengfei Cai, ... (+6 more) | cs.SD | 0 | 6 days ago |