Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition
December 20, 2023 · Entered Twilight · IEEE International Conference on Acoustics, Speech, and Signal Processing
Authors
Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha
arXiv ID
2312.12783
Category
eess.AS: Audio & Speech
Cross-listed
cs.AI,
cs.CL,
cs.SD
Citations
2
Venue
IEEE International Conference on Acoustics, Speech, and Signal Processing
Repository
https://github.com/cs20s030/stable_distillation
⭐ 4
Last Checked
1 month ago
Abstract
Continued self-supervised (SSL) pre-training for adapting existing SSL models to the target domain has been shown to be extremely effective for low-resource Automatic Speech Recognition (ASR). This paper proposes Stable Distillation, a simple and novel approach for SSL-based continued pre-training that boosts ASR performance in target domains where both labeled and unlabeled data are limited. Stable Distillation employs self-distillation as regularization for continued pre-training, alleviating over-fitting, a common problem in continued pre-training when the source and target domains differ. Specifically, we first perform vanilla continued pre-training of an initial SSL pre-trained model on the target-domain ASR dataset and call the result the teacher. Next, we take the same initial pre-trained model as a student and perform continued pre-training while enforcing its hidden representations to stay close to those of the teacher (via an MSE loss). This student is then fine-tuned for downstream ASR on the target dataset. In practice, Stable Distillation outperforms all our baselines by 0.8 - 7 WER when evaluated in various experimental settings.
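The two-stage recipe in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: `TinyEncoder` is a hypothetical stand-in for the SSL speech encoder (the paper's actual models live in the linked repository), the real SSL pre-training objective is replaced by a placeholder, and the teacher is simulated by a second randomly initialized copy. Only the regularizer itself, an MSE loss pulling the student's hidden representations toward the frozen teacher's, is shown faithfully.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for an SSL speech encoder (e.g., wav2vec 2.0).
class TinyEncoder(nn.Module):
    def __init__(self, dim: int = 16):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)  # hidden representations

torch.manual_seed(0)

# Stage 1 (simulated): the "teacher" is the initial pre-trained model after
# vanilla continued pre-training on the target domain. It is frozen here.
teacher = TinyEncoder()
for p in teacher.parameters():
    p.requires_grad_(False)

# Stage 2: the "student" starts from the same initial pre-trained weights and
# is continued pre-trained with an extra MSE term that keeps its hidden
# representations close to the teacher's.
student = TinyEncoder()
mse = nn.MSELoss()
opt = torch.optim.SGD(student.parameters(), lr=0.1)

x = torch.randn(4, 16)  # a batch of placeholder input features
initial_gap = mse(student(x), teacher(x)).item()

for _ in range(50):
    opt.zero_grad()
    ssl_loss = torch.tensor(0.0)  # placeholder for the real SSL objective
    distill_loss = mse(student(x), teacher(x))
    (ssl_loss + distill_loss).backward()
    opt.step()

final_gap = mse(student(x), teacher(x)).item()
```

After training, `final_gap` is smaller than `initial_gap`: the regularizer has pulled the student's representations toward the teacher's while the (placeholder) SSL loss would otherwise adapt them freely to the target domain.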
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt · Audio & Speech
R.I.P.
👻
Ghosted
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
R.I.P.
👻
Ghosted
DiffWave: A Versatile Diffusion Model for Audio Synthesis
R.I.P.
👻
Ghosted
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
R.I.P.
👻
Ghosted
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
R.I.P.
👻
Ghosted