R.I.P.
👻
Ghosted
Dimensionality-Aware Anomaly Detection in Learned Representations of Self-Supervised Speech Models
May 04, 2026 · Grace Period · Interspeech 2026
Authors
Sandra Arcos-Holzinger, Sarah M. Erfani, James Bailey, Sanjeev Khudanpur
arXiv ID
2605.02715
Category
eess.AS: Audio & Speech
Cross-listed
cs.CR, cs.LG
Citations
0
Venue
Interspeech 2026
Abstract
Self-supervised speech models (S3Ms) achieve strong downstream performance, yet their learned representations remain poorly understood under natural and adversarial perturbations. Prior studies rely on representation similarity or global dimensionality, offering limited visibility into local geometric changes. We ask: how do perturbations deform local geometry, and do these shifts track downstream automatic speech recognition (ASR) degradation? To address this, we present GRIDS, a framework using Local Intrinsic Dimensionality (LID) across layer-wise representations in WavLM and wav2vec 2.0. We find that LID increases for all low signal-to-noise ratio (SNR) perturbations and diverges at high SNR: benign noise converges toward the clean profile, while adversarial inputs retain early-layer LID elevation. We show LID elevation co-occurs with increased word error rate (WER), and that layer-wise LID features enable anomaly detection (AUROC 0.78-1.00), opening the door to transcript-free monitoring in S3Ms.
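The abstract does not say which LID estimator GRIDS uses. As an illustration only, the sketch below assumes the common maximum-likelihood estimator over k-nearest-neighbour distances, LID(x) = -1 / mean_i log(r_i / r_k). The function name `lid_mle` and its parameters are hypothetical and are not taken from the paper; in a layer-wise setting, one would presumably call it once per layer on that layer's frame embeddings.

```python
import numpy as np

def lid_mle(query, reference, k=20, exclude_self=True, eps=1e-12):
    """MLE estimate of local intrinsic dimensionality for each row of `query`.

    Assumption: this is a generic kNN-based estimator, not necessarily the
    one used by GRIDS. `reference` is the point set the neighbours come from.
    """
    query = np.asarray(query, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Pairwise Euclidean distances, shape (n_query, n_reference).
    diffs = query[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists.sort(axis=1)

    # If query points are also in `reference`, skip the zero self-distance.
    start = 1 if exclude_self else 0
    knn = dists[:, start:start + k]

    # Log-ratio of each neighbour distance to the k-th (largest) one.
    r_max = knn[:, -1:]
    log_ratios = np.log(np.maximum(knn, eps) / np.maximum(r_max, eps))

    # Negative inverse of the mean log-ratio gives the LID estimate.
    return -1.0 / log_ratios.mean(axis=1)
```

Under these assumptions, points sampled from a 2-dimensional subspace embedded in a higher-dimensional space should yield estimates near 2, while full-rank data should yield larger values.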
Similar Papers
In the same crypt: Audio & Speech
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech
Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders