Learning to Jointly Transcribe and Subtitle for End-to-End Spontaneous Speech Recognition

October 14, 2022 Β· Declared Dead Β· πŸ› Spoken Language Technology Workshop

πŸ‘» CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Jakob Poncelet, Hugo Van hamme arXiv ID 2210.07771 Category eess.AS: Audio & Speech Cross-listed cs.CL, cs.SD Citations 5 Venue Spoken Language Technology Workshop Last Checked 3 months ago
Abstract
TV subtitles are a rich source of transcriptions of many types of speech, ranging from read speech in news reports to conversational and spontaneous speech in talk shows and soaps. However, subtitles are not verbatim (i.e. exact) transcriptions of speech, so they cannot be used directly to improve an Automatic Speech Recognition (ASR) model. We propose a multitask dual-decoder Transformer model that jointly performs ASR and automatic subtitling. The ASR decoder (possibly pre-trained) predicts the verbatim output and the subtitle decoder generates a subtitle, while sharing the encoder. The two decoders can be independent or connected. The model is trained to perform both tasks jointly, and is able to effectively use subtitle data. We show improvements on regular ASR and on spontaneous and conversational ASR by incorporating the additional subtitle decoder. The method does not require preprocessing (aligning, filtering, pseudo-labeling, ...) of the subtitles.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

πŸ“œ Similar Papers

In the same crypt β€” Audio & Speech

Died the same way β€” πŸ‘» Ghosted